Files
rigdoctor/README.md
T
jessey 2e6a981120
release / release (push) Successful in 13s
Release 0.0.5: health report (M4), installer (M9), update check (M13)
M4 — health report (the 0.0.4 CHANGELOG entry, folded into this release):
- core/health.py: scan journalctl (Xid/panic/OOM/MCE/AER/thermal), SMART,
  NVIDIA driver mismatch, journald persistence, live temps -> findings
- CLI `rigdoctor report` (text/JSON); GUI Health tab; scanner tests

M9 — installer (first cut):
- core/{catalog,sysenv,installer}.py; `rigdoctor install [--check] [-y]`
- GUI Setup tab: detect distro/GPU, show optional components, one-click
  install of missing apt packages via pkexec/sudo

M13 — update check (check half):
- core/updates.py; sidebar shows up-to-date / "Update to v…" / unavailable

Plus tests, version bump to 0.0.5, CHANGELOG, and doc status updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:36:11 +02:00

118 lines
5.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# RigDoctor
A **modular diagnostics, monitoring, and health-check toolkit for Linux gamers.**
> **Status:** 🟢 Phase 1 (MVP) complete. The **sensor core (M1)**, **crash-capture logger
> (M3)**, and **health report (M4)** all work — live `snapshot`/`monitor`, crash-safe `record`
> with a post-crash report, and `report` to scan logs/SMART/driver for likely causes. A
> desktop GUI (M10) ties them together (dashboard, recording, health). See `docs/ROADMAP.md`.
## Why this exists
Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the screen
suddenly going black mid-game, silent thermal/VRAM throttling, power transients,
driver/library mismatches, Proton quirks, and CPU governor / power-profile misconfiguration.
The data needed to diagnose them is scattered across `nvidia-smi`, `/sys/class/hwmon`,
`journalctl`, SMART, and more — and the most useful readings (the ones right before a hard
freeze) are usually lost because nothing flushed them to disk.
RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a
one-shot health report, and an interactive installer that only sets up the modules a given
user actually needs for their hardware.
**Seed use cases:** an RTX 3070 that intermittently "falls off the bus" under heavy GPU load
(Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black mid-game.
See `docs/SPEC.md` §1.
## How you run it
RigDoctor is **GUI-first** — the desktop app is the primary way in — but every feature is
also available headless:
- **Desktop GUI** — graphical dashboard, recording controls, log browser, reports. The
default interface for most users.
- **Tray applet** — a small top-menu-bar applet with quick actions and at-a-glance status.
- **CLI** — full functionality from the terminal; works over SSH and in scripts.
The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.
## Key decisions (settled)
| Topic | Decision |
|-------|----------|
| Name | **RigDoctor** |
| Language / stack | **Python 3 + Qt (PySide6)** — core/CLI/daemon stdlib-only; Qt only for GUI/tray |
| Primary distro | **Ubuntu** (Debian via apt); others best-effort later |
| Primary GPU | **NVIDIA** first; AMD, then Intel later |
| MVP | **Sensor core + crash logger + health report** (NVIDIA-only, CLI-first) |
| Distribution | **User-local install** (self-updating from the public repo, no root); **`.deb`** optional |
| Scope of action | **Read-only + suggestions** (no auto-apply yet) |
| Stress tests | **Out of scope** |
Full rationale and the still-open questions are in `docs/DECISIONS.md`.
## Repo layout
| Path | Purpose |
|------|---------|
| `docs/SPEC.md` | Product specification — vision, requirements, modules (the main planning doc) |
| `docs/ARCHITECTURE.md` | Technical design — core engine, front-ends, daemon, installer |
| `docs/MODULES.md` | Catalog of modules with scope, dependencies, status |
| `docs/ROADMAP.md` | Phased milestones |
| `docs/DECISIONS.md` | Decision log + remaining open questions |
| `src/rigdoctor/` | Source code — `core/` engine + sources, `cli.py`, `render.py` |
| `installer/` | Installer / `.deb` packaging (empty until Phase 4) |
| `tests/` | Tests (stdlib `unittest`) |
## Run it (dev)
Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):
```bash
PYTHONPATH=src python3 -m rigdoctor snapshot # one-shot sensor read
PYTHONPATH=src python3 -m rigdoctor snapshot --json
PYTHONPATH=src python3 -m rigdoctor monitor -n 1 # live view (Ctrl-C to quit)
PYTHONPATH=src python3 -m rigdoctor sources # list detected sensor sources
PYTHONPATH=src python3 -m unittest discover -s tests
```
### Crash-capture logger (M3)
A crash-safe background logger (JSONL, `fsync` per sample, bounded by rotation) for catching
the state right before a freeze:
```bash
rigdoctor record start # start logging in the background
rigdoctor record status # is it running? latest readings, sample count
rigdoctor record stop # stop it
rigdoctor record report # post-crash summary: peaks, events, last samples
rigdoctor record run # run in the foreground (the systemd-ready entrypoint)
```
Logs live in `~/.local/share/rigdoctor/logs/`. It detects GPU "lost"/hang (nvidia-smi query
timeout) and writes an event marker. Trigger modes (always-on / game-launch) and the
`systemd --user` service arrive in Phase 4.
### Desktop GUI (M10)
The GUI uses PySide6 (Qt) — the only part of RigDoctor that needs a non-stdlib dep:
```bash
pip install -e '.[gui]' # core + PySide6, gives `rigdoctor` and `rigdoctor-gui`
rigdoctor gui # or: rigdoctor-gui
```
It opens a dark-themed window with sidebar navigation and a **live dashboard** over the
same sensor core — circular gauges for the headline metrics plus collapsible per-subsystem
cards (GPU/CPU/memory/storage) with temperature-colored values (icey-blue → green → red).
The **Logs** and **Health** sections are full pages (recording controls + post-crash report;
and the kernel-log / SMART / driver scan). **Inventory** is a placeholder until M5 lands.
Without the GUI extra, `pip install -e .` gives just the stdlib-only CLI.
## Start here
1. Read `docs/SPEC.md` for what we're building.
2. Read `docs/ROADMAP.md` for the build order (Phase 1 = the MVP).
3. Read `docs/DECISIONS.md` for the settled decisions (D1D15).
</content>