# RigDoctor

A **modular diagnostics, monitoring, and health-check toolkit for Linux gamers.**

> **Status:** 🟢 Phase 1 (MVP) complete. The **sensor core (M1)**, **crash-capture logger
> (M3)**, and **health report (M4)** all work: live `snapshot`/`monitor`, crash-safe `record`
> with a post-crash report, and `report` to scan logs/SMART/driver for likely causes. A
> desktop GUI (M10) ties them together (dashboard, recording, health). See `docs/ROADMAP.md`.

## Why this exists

Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the
screen suddenly going black mid-game, silent thermal/VRAM throttling, power transients,
driver/library mismatches, Proton quirks, and CPU governor / power-profile
misconfiguration. The data needed to diagnose them is scattered across `nvidia-smi`,
`/sys/class/hwmon`, `journalctl`, SMART, and more, and the most useful readings (the ones
right before a hard freeze) are usually lost because nothing flushed them to disk.

RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a
one-shot health report, and an interactive installer that only sets up the modules a given
user actually needs for their hardware.

**Seed use cases:** an RTX 3070 that intermittently "falls off the bus" under heavy GPU
load (Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black
mid-game. See `docs/SPEC.md` §1.

## How you run it

RigDoctor is **GUI-first** (the desktop app is the primary way in), but every feature is
also available headless:

- **Desktop GUI**: graphical dashboard, recording controls, log browser, reports. The
  default interface for most users.
- **Tray applet**: a small top-menu-bar applet with quick actions and at-a-glance status.
- **CLI**: full functionality from the terminal; works over SSH and in scripts.

The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.
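One way an optional GUI can sit on top of a stdlib-only core is to probe for the Qt binding before choosing a front-end. The snippet below is a minimal sketch of that idea, not RigDoctor's actual dispatch code: `gui_available` and `pick_frontend` are hypothetical names, and the only assumption taken from this README is that the GUI depends on the `PySide6` package.

```python
import importlib.util


def gui_available(module_name: str = "PySide6") -> bool:
    """Return True if the optional GUI dependency can be found (hypothetical helper)."""
    # find_spec locates a top-level module without importing it.
    return importlib.util.find_spec(module_name) is not None


def pick_frontend(want_gui: bool, module_name: str = "PySide6") -> str:
    """Choose "gui" only when requested and installed; otherwise fall back to "cli"."""
    if want_gui and gui_available(module_name):
        return "gui"
    return "cli"
```

With a check like this, a headless machine without PySide6 simply lands on the CLI path instead of failing at import time.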
## Key decisions (settled)

| Topic | Decision |
|-------|----------|
| Name | **RigDoctor** |
| Language / stack | **Python 3 + Qt (PySide6)**: core/CLI/daemon stdlib-only; Qt only for GUI/tray |
| Primary distro | **Ubuntu** (Debian via apt); others best-effort later |
| Primary GPU | **NVIDIA** first; AMD, then Intel later |
| MVP | **Sensor core + crash logger + health report** (NVIDIA-only, CLI-first) |
| Distribution | **User-local install** (self-updating from the public repo, no root); **`.deb`** optional |
| Scope of action | **Read-only + suggestions** (no auto-apply yet) |
| Stress tests | **Out of scope** |

Full rationale and the still-open questions are in `docs/DECISIONS.md`.

## Repo layout

| Path | Purpose |
|------|---------|
| `docs/SPEC.md` | Product specification: vision, requirements, modules (the main planning doc) |
| `docs/ARCHITECTURE.md` | Technical design: core engine, front-ends, daemon, installer |
| `docs/MODULES.md` | Catalog of modules with scope, dependencies, status |
| `docs/ROADMAP.md` | Phased milestones |
| `docs/DECISIONS.md` | Decision log + remaining open questions |
| `src/rigdoctor/` | Source code: `core/` engine + sources, `cli.py`, `render.py` |
| `installer/` | Installer / `.deb` packaging (empty until Phase 4) |
| `tests/` | Tests (stdlib `unittest`) |

## Run it (dev)

Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):

```bash
PYTHONPATH=src python3 -m rigdoctor snapshot          # one-shot sensor read
PYTHONPATH=src python3 -m rigdoctor snapshot --json
PYTHONPATH=src python3 -m rigdoctor monitor -n 1      # live view (Ctrl-C to quit)
PYTHONPATH=src python3 -m rigdoctor sources           # list detected sensor sources
PYTHONPATH=src python3 -m unittest discover -s tests
```

### Crash-capture logger (M3)

A crash-safe background logger (JSONL, `fsync` per sample, bounded by rotation) for
catching the state right before a freeze:

```bash
rigdoctor record start    # start logging in the background
rigdoctor record status   # is it running? latest readings, sample count
rigdoctor record stop     # stop it
rigdoctor record report   # post-crash summary: peaks, events, last samples
rigdoctor record run      # run in the foreground (the systemd-ready entrypoint)
```

Logs live in `~/.local/share/rigdoctor/logs/`. The logger detects a GPU "lost"/hang
condition (an `nvidia-smi` query timeout) and writes an event marker. Trigger modes
(always-on / game-launch) and the `systemd --user` service arrive in Phase 4.

### Desktop GUI (M10)

The GUI uses PySide6 (Qt), the only part of RigDoctor that needs a non-stdlib dependency:

```bash
pip install -e '.[gui]'   # core + PySide6, gives `rigdoctor` and `rigdoctor-gui`
rigdoctor gui             # or: rigdoctor-gui
```

It opens a dark-themed window with sidebar navigation and a **live dashboard** over the
same sensor core: circular gauges for the headline metrics plus collapsible per-subsystem
cards (GPU/CPU/memory/storage) with temperature-colored values (icy-blue → green → red).
The **Logs** and **Health** sections are full pages: Logs holds the recording controls and
post-crash report, and Health holds the kernel-log / SMART / driver scan. **Inventory** is
a placeholder until M5 lands. Without the GUI extra, `pip install -e .` gives just the
stdlib-only CLI.

## Start here

1. Read `docs/SPEC.md` for what we're building.
2. Read `docs/ROADMAP.md` for the build order (Phase 1 = the MVP).
3. Read `docs/DECISIONS.md` for the settled decisions (D1–D15).
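For readers new to the idea behind the crash-capture logger (JSONL, `fsync` per sample, bounded by rotation), the general technique can be sketched in a few lines of stdlib Python. This is an illustration under stated assumptions, not the code in `src/rigdoctor/`: the function names, the `max_bytes` threshold, the single `.1` backup file, and the sample fields are all hypothetical.

```python
import json
import os


def append_sample(path: str, sample: dict, max_bytes: int = 1_000_000) -> None:
    """Append one JSON line and force it to disk before returning."""
    # Rotate first so the active file stays bounded (this sketch keeps one older file).
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".1")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(sample) + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to storage


def read_samples(path: str) -> list[dict]:
    """Read back complete lines, skipping a torn final line left by a crash."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                samples.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # partial write cut off by the freeze
    return samples
```

Syncing after every sample costs some write throughput, but it means that the readings taken just before a hard freeze have already been handed to storage. One record per line also keeps a truncated final write from corrupting the earlier samples.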