2e6a981120
release / release (push) Successful in 13s
M4 — health report (the 0.0.4 CHANGELOG entry, folded into this release):
- core/health.py: scan journalctl (Xid/panic/OOM/MCE/AER/thermal), SMART,
NVIDIA driver mismatch, journald persistence, live temps -> findings
- CLI `rigdoctor report` (text/JSON); GUI Health tab; scanner tests
M9 — installer (first cut):
- core/{catalog,sysenv,installer}.py; `rigdoctor install [--check] [-y]`
- GUI Setup tab: detect distro/GPU, show optional components, one-click
install of missing apt packages via pkexec/sudo
M13 — update check (check half):
- core/updates.py; sidebar shows up-to-date / "Update to v…" / unavailable
Plus tests, version bump to 0.0.5, CHANGELOG, and doc status updates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
118 lines
5.3 KiB
Markdown
118 lines
5.3 KiB
Markdown
# RigDoctor
|
||
|
||
A **modular diagnostics, monitoring, and health-check toolkit for Linux gamers.**
|
||
|
||
> **Status:** 🟢 Phase 1 (MVP) complete. The **sensor core (M1)**, **crash-capture logger
|
||
> (M3)**, and **health report (M4)** all work — live `snapshot`/`monitor`, crash-safe `record`
|
||
> with a post-crash report, and `report` to scan logs/SMART/driver for likely causes. A
|
||
> desktop GUI (M10) ties them together (dashboard, recording, health). See `docs/ROADMAP.md`.
|
||
|
||
## Why this exists
|
||
|
||
Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the screen
|
||
suddenly going black mid-game, silent thermal/VRAM throttling, power transients,
|
||
driver/library mismatches, Proton quirks, and CPU governor / power-profile misconfiguration.
|
||
The data needed to diagnose them is scattered across `nvidia-smi`, `/sys/class/hwmon`,
|
||
`journalctl`, SMART, and more — and the most useful readings (the ones right before a hard
|
||
freeze) are usually lost because nothing flushed them to disk.
|
||
|
||
RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a
|
||
one-shot health report, and an interactive installer that only sets up the modules a given
|
||
user actually needs for their hardware.
|
||
|
||
**Seed use cases:** an RTX 3070 that intermittently "falls off the bus" under heavy GPU load
|
||
(Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black mid-game.
|
||
See `docs/SPEC.md` §1.
|
||
|
||
## How you run it
|
||
|
||
RigDoctor is **GUI-first** — the desktop app is the primary way in — but every feature is
|
||
also available headless:
|
||
- **Desktop GUI** — graphical dashboard, recording controls, log browser, reports. The
|
||
default interface for most users.
|
||
- **Tray applet** — a small top-menu-bar applet with quick actions and at-a-glance status.
|
||
- **CLI** — full functionality from the terminal; works over SSH and in scripts.
|
||
|
||
The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.
|
||
|
||
## Key decisions (settled)
|
||
|
||
| Topic | Decision |
|
||
|-------|----------|
|
||
| Name | **RigDoctor** |
|
||
| Language / stack | **Python 3 + Qt (PySide6)** — core/CLI/daemon stdlib-only; Qt only for GUI/tray |
|
||
| Primary distro | **Ubuntu** (Debian via apt); others best-effort later |
|
||
| Primary GPU | **NVIDIA** first; AMD, then Intel later |
|
||
| MVP | **Sensor core + crash logger + health report** (NVIDIA-only, CLI-first) |
|
||
| Distribution | **User-local install** (self-updating from the public repo, no root); **`.deb`** optional |
|
||
| Scope of action | **Read-only + suggestions** (no auto-apply yet) |
|
||
| Stress tests | **Out of scope** |
|
||
|
||
Full rationale and the still-open questions are in `docs/DECISIONS.md`.
|
||
|
||
## Repo layout
|
||
|
||
| Path | Purpose |
|
||
|------|---------|
|
||
| `docs/SPEC.md` | Product specification — vision, requirements, modules (the main planning doc) |
|
||
| `docs/ARCHITECTURE.md` | Technical design — core engine, front-ends, daemon, installer |
|
||
| `docs/MODULES.md` | Catalog of modules with scope, dependencies, status |
|
||
| `docs/ROADMAP.md` | Phased milestones |
|
||
| `docs/DECISIONS.md` | Decision log + remaining open questions |
|
||
| `src/rigdoctor/` | Source code — `core/` engine + sources, `cli.py`, `render.py` |
|
||
| `installer/` | Installer / `.deb` packaging (empty until Phase 4) |
|
||
| `tests/` | Tests (stdlib `unittest`) |
|
||
|
||
## Run it (dev)
|
||
|
||
Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):
|
||
|
||
```bash
|
||
PYTHONPATH=src python3 -m rigdoctor snapshot # one-shot sensor read
|
||
PYTHONPATH=src python3 -m rigdoctor snapshot --json
|
||
PYTHONPATH=src python3 -m rigdoctor monitor -n 1 # live view (Ctrl-C to quit)
|
||
PYTHONPATH=src python3 -m rigdoctor sources # list detected sensor sources
|
||
PYTHONPATH=src python3 -m unittest discover -s tests
|
||
```
|
||
|
||
### Crash-capture logger (M3)
|
||
|
||
A crash-safe background logger (JSONL, `fsync` per sample, bounded by rotation) for catching
|
||
the state right before a freeze:
|
||
|
||
```bash
|
||
rigdoctor record start # start logging in the background
|
||
rigdoctor record status # is it running? latest readings, sample count
|
||
rigdoctor record stop # stop it
|
||
rigdoctor record report # post-crash summary: peaks, events, last samples
|
||
rigdoctor record run # run in the foreground (the systemd-ready entrypoint)
|
||
```
|
||
|
||
Logs live in `~/.local/share/rigdoctor/logs/`. It detects GPU "lost"/hang (nvidia-smi query
|
||
timeout) and writes an event marker. Trigger modes (always-on / game-launch) and the
|
||
`systemd --user` service arrive in Phase 4.
|
||
|
||
### Desktop GUI (M10)
|
||
|
||
The GUI uses PySide6 (Qt) — the only part of RigDoctor that needs a non-stdlib dep:
|
||
|
||
```bash
|
||
pip install -e '.[gui]' # core + PySide6, gives `rigdoctor` and `rigdoctor-gui`
|
||
rigdoctor gui # or: rigdoctor-gui
|
||
```
|
||
|
||
It opens a dark-themed window with sidebar navigation and a **live dashboard** over the
|
||
same sensor core — circular gauges for the headline metrics plus collapsible per-subsystem
|
||
cards (GPU/CPU/memory/storage) with temperature-colored values (icey-blue → green → red).
|
||
The **Logs** and **Health** sections are full pages (recording controls + post-crash report;
|
||
and the kernel-log / SMART / driver scan). **Inventory** is a placeholder until M5 lands.
|
||
|
||
Without the GUI extra, `pip install -e .` gives just the stdlib-only CLI.
|
||
|
||
## Start here
|
||
|
||
1. Read `docs/SPEC.md` for what we're building.
|
||
2. Read `docs/ROADMAP.md` for the build order (Phase 1 = the MVP).
|
||
3. Read `docs/DECISIONS.md` for the settled decisions (D1–D15).
|
||
</content>
|