RigDoctor
A modular diagnostics, monitoring, and health-check toolkit for Linux gamers.
Status: 🟢 Phase 1 (MVP) complete. The sensor core (M1), crash-capture logger (M3), and health report (M4) all work: live snapshot/monitor, the crash-safe record command with a post-crash report, and the report command to scan logs/SMART/driver for likely causes. A desktop GUI (M10) ties them together (dashboard, recording, health). See docs/ROADMAP.md.
Why this exists
Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the screen
suddenly going black mid-game, silent thermal/VRAM throttling, power transients,
driver/library mismatches, Proton quirks, and CPU governor / power-profile misconfiguration.
The data needed to diagnose them is scattered across nvidia-smi, /sys/class/hwmon,
journalctl, SMART, and more — and the most useful readings (the ones right before a hard
freeze) are usually lost because nothing flushed them to disk.
RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a one-shot health report, and an interactive installer that only sets up the modules a given user actually needs for their hardware.
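The installer's "only what your hardware needs" step could start from GPU detection. Here is a minimal sketch. It assumes each GPU exposes a DRM card node under /sys/class/drm with a PCI vendor ID file (0x10de is NVIDIA, 0x1002 is AMD, 0x8086 is Intel). The module names in MODULES_FOR_VENDOR are hypothetical, not RigDoctor's real module list:

```python
from pathlib import Path

# Standard PCI vendor IDs as they appear in sysfs.
PCI_VENDORS = {"0x10de": "nvidia", "0x1002": "amd", "0x8086": "intel"}

# Hypothetical mapping from detected hardware to optional modules.
MODULES_FOR_VENDOR = {"nvidia": ["gpu-nvidia"], "amd": ["gpu-amd"], "intel": ["gpu-intel"]}


def detect_gpu_vendors(drm_root: Path = Path("/sys/class/drm")) -> set[str]:
    """Return the GPU vendors found under drm_root, e.g. {"nvidia"}."""
    vendors: set[str] = set()
    if not drm_root.is_dir():
        return vendors
    for card in sorted(drm_root.glob("card[0-9]*")):
        if "-" in card.name:
            continue  # connector nodes such as card0-HDMI-A-1, not GPUs
        try:
            vendor_id = (card / "device" / "vendor").read_text().strip().lower()
        except OSError:
            continue  # no PCI device behind this node, or unreadable
        if vendor_id in PCI_VENDORS:
            vendors.add(PCI_VENDORS[vendor_id])
    return vendors


def suggest_modules(vendors: set[str]) -> list[str]:
    """Always include the core, then one (hypothetical) GPU module per vendor."""
    modules = ["core"]
    for vendor in sorted(vendors):
        modules.extend(MODULES_FOR_VENDOR.get(vendor, []))
    return modules
```

Taking drm_root as a parameter lets the detection run against a fake directory tree in tests.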
Seed use cases: an RTX 3070 that intermittently "falls off the bus" under heavy GPU load
(Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black mid-game.
See docs/SPEC.md §1.
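The "falls off the bus" case leaves its trace in the kernel log. This minimal sketch shows the kind of pattern scan a health report could run over kernel-log lines. The two signatures are illustrative, and the finding names are hypothetical rather than RigDoctor's actual output:

```python
import re

# Illustrative signatures. The NVIDIA driver reports errors as
# "NVRM: Xid (PCI:...): <code>, ..."; Xid 79 means the GPU fell off the bus.
PATTERNS = {
    "gpu_fell_off_bus": re.compile(r"fallen off the bus", re.IGNORECASE),
    "nvidia_xid": re.compile(r"NVRM: Xid \([^)]*\): (?P<code>\d+)"),
}


def scan_kernel_log(lines):
    """Return (finding, line) pairs for every line matching a known signature."""
    findings = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if not match:
                continue
            if name == "nvidia_xid":
                name = f"nvidia_xid:{match.group('code')}"  # keep the Xid code
            findings.append((name, line.strip()))
    return findings
```

The lines could come from journalctl -k -b -1 --no-pager, which prints kernel messages from the previous boot. Messages written in the last moments before a hard freeze may never reach the journal.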
How you run it
RigDoctor is GUI-first — the desktop app is the primary way in — but every feature is also available headless:
- Desktop GUI — graphical dashboard, recording controls, log browser, reports. The default interface for most users.
- Tray applet — a small top-menu-bar applet with quick actions and at-a-glance status.
- CLI — full functionality from the terminal; works over SSH and in scripts.
The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.
Key decisions (settled)
| Topic | Decision |
|---|---|
| Name | RigDoctor |
| Language / stack | Python 3 + Qt (PySide6) — core/CLI/daemon stdlib-only; Qt only for GUI/tray |
| Primary distro | Ubuntu (Debian via apt); others best-effort later |
| Primary GPU | NVIDIA first; AMD, then Intel later |
| MVP | Sensor core + crash logger + health report (NVIDIA-only, CLI-first) |
| Distribution | User-local install (self-updating from the public repo, no root); .deb optional |
| Scope of action | Read-only + suggestions (no auto-apply yet) |
| Stress tests | Out of scope |
Full rationale and the still-open questions are in docs/DECISIONS.md.
Repo layout
| Path | Purpose |
|---|---|
| docs/SPEC.md | Product specification: vision, requirements, modules (the main planning doc) |
| docs/ARCHITECTURE.md | Technical design: core engine, front-ends, daemon, installer |
| docs/MODULES.md | Catalog of modules with scope, dependencies, status |
| docs/ROADMAP.md | Phased milestones |
| docs/DECISIONS.md | Decision log + remaining open questions |
| src/rigdoctor/ | Source code: core/ engine + sources, cli.py, render.py |
| installer/ | Installer / .deb packaging (empty until Phase 4) |
| tests/ | Tests (stdlib unittest) |
Install (user-local, no root)
RigDoctor installs into a private venv under ~/.local — no root, self-updating:
./install.sh # from a source checkout or the self-extracting .run
./install.sh --ref v0.0.6 # install a specific released tag (needs a token)
./install.sh --uninstall # remove it
This adds rigdoctor and rigdoctor-gui to ~/.local/bin, plus a desktop entry. Each release
also ships a one-file .run installer (download, chmod +x, run). Updates are limited to
accounts on the Git server and authenticate with a Personal Access Token. Save one via the GUI's
Setup → Update access panel or with rigdoctor login, then run rigdoctor update (or use the sidebar button).
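The update token is a credential, so other users should not be able to read it. This minimal sketch shows one way a login step might store it. It assumes a plain-text file at a hypothetical path under $XDG_CONFIG_HOME. RigDoctor's actual storage location and format are not described here:

```python
import os
from pathlib import Path


def default_token_path() -> Path:
    # Hypothetical location; falls back to ~/.config when XDG_CONFIG_HOME is unset.
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "rigdoctor" / "token"


def save_token(token: str, path: Path) -> None:
    """Write the token so only the current user can read it (mode 0600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating the file with an explicit 0600 mode means it is never briefly
    # world-readable (the umask can only remove permission bits, not add them).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token.strip() + "\n")
    os.chmod(path, 0o600)  # also tighten a pre-existing file with looser bits


def load_token(path: Path):
    """Return the saved token, or None if none has been saved."""
    try:
        return path.read_text().strip() or None
    except FileNotFoundError:
        return None
```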
Run it (dev)
Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):
PYTHONPATH=src python3 -m rigdoctor snapshot # one-shot sensor read
PYTHONPATH=src python3 -m rigdoctor snapshot --json
PYTHONPATH=src python3 -m rigdoctor monitor -n 1 # live view (Ctrl-C to quit)
PYTHONPATH=src python3 -m rigdoctor sources # list detected sensor sources
PYTHONPATH=src python3 -m unittest discover -s tests
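The hwmon part of a snapshot comes down to reading small sysfs files. This minimal sketch assumes the standard hwmon sysfs layout: a name file per chip, tempN_input in millidegrees Celsius, and an optional tempN_label. It is not RigDoctor's actual source code:

```python
from pathlib import Path


def read_hwmon_temps(root: Path = Path("/sys/class/hwmon")) -> dict[str, float]:
    """Map "<chip>/<label>" to a temperature in °C from the hwmon sysfs tree."""
    temps: dict[str, float] = {}
    if not root.is_dir():
        return temps
    for chip in sorted(root.iterdir()):  # hwmon0, hwmon1, ...
        try:
            chip_name = (chip / "name").read_text().strip()
        except OSError:
            chip_name = chip.name
        for temp_input in sorted(chip.glob("temp*_input")):
            sensor = temp_input.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{sensor}_label"
            label = label_file.read_text().strip() if label_file.exists() else sensor
            try:
                millidegrees = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but not readable right now
            temps[f"{chip_name}/{label}"] = millidegrees / 1000.0
    return temps
```

Taking root as a parameter lets the function run against a fake tree in a unittest, in the same stdlib-only style as the existing tests.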
Crash-capture logger (M3)
A crash-safe background logger (JSONL, fsync per sample, bounded by rotation) for catching
the state right before a freeze:
rigdoctor record start # start logging in the background
rigdoctor record status # is it running? latest readings, sample count
rigdoctor record stop # stop it
rigdoctor record report # post-crash summary: peaks, events, last samples
rigdoctor record run # run in the foreground (the systemd-ready entrypoint)
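The crash-safe properties described above (JSONL, fsync per sample, bounded by rotation) can be sketched in a few dozen lines. This sketch assumes a single writer process, and the class and function names are hypothetical. summarize only gestures at what record report shows:

```python
import json
import os
from pathlib import Path


class JsonlLogger:
    """Append one JSON object per line, fsync'd per sample, with size-based rotation."""

    def __init__(self, path: Path, max_bytes: int = 5_000_000, backups: int = 3):
        self.path, self.max_bytes, self.backups = path, max_bytes, backups
        path.parent.mkdir(parents=True, exist_ok=True)

    def _rotate(self) -> None:
        # log -> log.1 -> ... -> log.<backups>; the oldest file is overwritten.
        for i in range(self.backups - 1, 0, -1):
            src = self.path.with_name(f"{self.path.name}.{i}")
            if src.exists():
                os.replace(src, self.path.with_name(f"{self.path.name}.{i + 1}"))
        if self.path.exists():
            os.replace(self.path, self.path.with_name(f"{self.path.name}.1"))

    def write(self, record: dict) -> None:
        line = json.dumps(record, separators=(",", ":")) + "\n"
        size = len(line.encode("utf-8"))
        if self.path.exists() and self.path.stat().st_size + size > self.max_bytes:
            self._rotate()
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()             # Python's buffer -> kernel
            os.fsync(f.fileno())  # kernel page cache -> disk


def summarize(path: Path, metric: str, tail: int = 3) -> dict:
    """Peak of one metric, event markers, and the last few samples."""
    samples, events = [], []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            try:
                rec = json.loads(raw)
            except json.JSONDecodeError:
                continue  # e.g. a torn final line left by a hard freeze
            if isinstance(rec, dict):
                (events if "event" in rec else samples).append(rec)
    values = [s[metric] for s in samples if metric in s]
    return {"peak": max(values, default=None), "events": events, "last": samples[-tail:]}
```

Opening and fsyncing once per sample trades throughput for durability, which suits a logger that samples every second or so. A fuller version would also fsync the log directory after creating or rotating a file, so that the new directory entry itself survives a crash.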
Logs live in ~/.local/share/rigdoctor/logs/. The logger detects a lost or hung GPU (an nvidia-smi
query that times out) and writes an event marker. Trigger modes (always-on / game-launch) and the
systemd --user service arrive in Phase 4.
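Hang detection via a query timeout could look like the following sketch. The nvidia-smi flags are real. The chosen fields, the 5-second default, and the event names are assumptions:

```python
import subprocess
import time

NVIDIA_SMI = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,utilization.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def _num(text: str):
    try:
        return float(text)
    except ValueError:
        return None  # e.g. "[N/A]" for a field the GPU does not report


def poll_gpu(cmd=NVIDIA_SMI, timeout_s: float = 5.0) -> dict:
    """Return one sample, or an event marker if the query hangs or fails."""
    now = time.time()
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
    except OSError as exc:  # e.g. nvidia-smi not installed
        return {"ts": now, "event": "gpu_query_failed", "detail": str(exc)}
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            proc.wait(timeout=1.0)  # reap it if it dies promptly...
        except subprocess.TimeoutExpired:
            pass  # ...but never let a wedged child block the logger
        return {"ts": now, "event": "gpu_query_timeout"}
    lines = out.strip().splitlines()
    if proc.returncode != 0 or not lines:
        return {"ts": now, "event": "gpu_query_failed", "detail": f"exit {proc.returncode}"}
    # First GPU only; pad so a short line yields None values instead of an error.
    fields = [f.strip() for f in lines[0].split(",")] + ["", "", ""]
    temp, util, power = (_num(f) for f in fields[:3])
    return {"ts": now, "gpu_temp": temp, "gpu_util": util, "gpu_power_w": power}
```

This uses Popen rather than subprocess.run(timeout=...) because run() waits for the killed child after a timeout, and a process stuck inside a hung driver call may not exit promptly.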
Desktop GUI (M10)
The GUI uses PySide6 (Qt) — the only part of RigDoctor that needs a non-stdlib dep:
pip install -e '.[gui]' # core + PySide6, gives `rigdoctor` and `rigdoctor-gui`
rigdoctor gui # or: rigdoctor-gui
It opens a dark-themed window with sidebar navigation and a live dashboard over the same sensor core: circular gauges for the headline metrics plus collapsible per-subsystem cards (GPU/CPU/memory/storage) with temperature-colored values (icy blue → green → red). Logs and Health are full pages: Logs holds the recording controls and the post-crash report, and Health runs the kernel-log / SMART / driver scan. Inventory is a placeholder until M5 lands.
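The temperature coloring comes down to interpolating between color stops. This minimal sketch uses hypothetical thresholds (30, 60, 90 °C) and hypothetical RGB values. The real GUI's palette and breakpoints may differ:

```python
# Hypothetical color stops: (temperature in °C, RGB).
STOPS = [
    (30.0, (0x66, 0xCC, 0xFF)),  # icy blue
    (60.0, (0x33, 0xCC, 0x33)),  # green
    (90.0, (0xFF, 0x33, 0x33)),  # red
]


def temp_to_hex(temp_c: float, stops=STOPS) -> str:
    """Piecewise-linear blend between the two stops that bracket temp_c."""
    rgb = stops[-1][1]
    if temp_c <= stops[0][0]:
        rgb = stops[0][1]  # clamp below the coldest stop
    elif temp_c < stops[-1][0]:
        for (t0, c0), (t1, c1) in zip(stops, stops[1:]):
            if t0 <= temp_c <= t1:
                frac = (temp_c - t0) / (t1 - t0)
                rgb = tuple(round(a + (b - a) * frac) for a, b in zip(c0, c1))
                break
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

The resulting "#rrggbb" string can go straight into a Qt stylesheet, for example label.setStyleSheet(f"color: {temp_to_hex(t)}").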
Without the GUI extra, pip install -e . gives just the stdlib-only CLI.
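Because PySide6 is optional, the GUI entry point has to cope with it being absent. This is a minimal sketch built around a hypothetical launch_gui_or_explain helper. It is not how rigdoctor gui is actually implemented:

```python
import importlib.util
import sys


def gui_available() -> bool:
    """True if PySide6 (the optional GUI extra) can be imported."""
    return importlib.util.find_spec("PySide6") is not None


def launch_gui_or_explain() -> int:
    if not gui_available():
        print("The GUI needs the optional extra: pip install -e '.[gui]'", file=sys.stderr)
        return 1
    # Imported lazily so that importing this module never requires Qt.
    from PySide6.QtWidgets import QApplication, QLabel

    app = QApplication(sys.argv)
    window = QLabel("RigDoctor")  # placeholder for the real main window
    window.show()
    return app.exec()
```

Importing PySide6 inside the function, rather than at module top level, keeps the CLI importable on a headless install.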
Start here
- Read docs/SPEC.md for what we're building.
- Read docs/ROADMAP.md for the build order (Phase 1 = the MVP).
- Read docs/DECISIONS.md for the settled decisions (D1–D15).