rigdoctor/docs/MODULES.md
jessey 67665974dc feat(m6): PowerMizer + Wine/Steam versions + non-Steam launchers — 0.22.0
M6 leftovers (the watcher defers to M9's trigger-mode work):
- gameenv: check_gpu_powermizer (NVIDIA, X; degrades when the gpu target won't
  resolve), check_wine (wine --version), check_steam_client (dpkg package version);
  steam.client_version() helper.
- core/launchers.py: detect Lutris (read-only SQLite pga.db) and Heroic (Epic
  legendary + GOG JSON) installed games; Game gained a `launcher` field.
- Games page + `rigdoctor games` list non-Steam games alongside Steam, tagged by
  launcher; Run Diagnostic works on them (auto-launch stays Steam-only).
- Tests for launchers (synthetic Lutris db + Heroic json).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:46:42 +02:00

# RigDoctor — Module Catalog (DRAFT v0.2)
Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
> Module set per D14, plus **M12 (session sharing, D16)** and **M13 (auto-update, D18)**.
> **M7 (stress/repro) was dropped (D7).** M10/M11 are the GUI and tray modules (D10/D11).
> In the GPU scope column, "all (NVIDIA first)" means the NVIDIA path ships first and other vendors follow via the vendor abstraction (D4).

| ID | Module | Bundle | Key deps | GPU scope | Priority | Status |
|----|--------|--------|----------|-----------|----------|--------|
| M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | ✅ |
| M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M2 | Live monitor (TUI) | Monitoring | none (stdlib curses) | all | P1 | ✅ |
| M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | ✅ |
| M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | ✅ |
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ✅ |
| M9 | Installer | (meta) | none | all | P1 | 🟨 |
| M12 | Session sharing / remote assist | Sharing | none (Tier 3: tmate/sshx) | all | P3 | 🟨 |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| ~~M7~~ | ~~Stress / repro~~ | — | — | — | — | ❌ dropped (D7) |
## Notes per module
- **M1 Sensor core** — the foundation everything else samples from. Stdlib-only. Abstracts
NVIDIA/AMD/Intel + hwmon behind one interface; **ship the NVIDIA + hwmon path first**.
- **M3 Crash-capture logger** — the highest-value piece for the seed use case. `fsync` per
sample; GPU-lost detection via query timeout; bounded rotation; `systemd --user` service
with a **user-selectable trigger mode** (always-on / game-launch / manual — D6).
*Implemented (manual trigger):* JSONL log with fsync-per-sample, size-based rotation
(`log_max_bytes`/`log_backups`), GPU-lost/recovered event markers, atomic status file, and
`rigdoctor record run|start|stop|status|report`. The foreground `run` is the systemd-ready
entrypoint. The **game-launch trigger** is implemented via the D12 wrapper
(`rigdoctor wrap %command%`, see M6 below); the `systemd --user` service unit, the always-on
trigger (D6), and the
zero-config watcher (D12) are still pending. Also fully driven from the GUI's Recording/Logs
page (M10) via shared `core.reccontrol`.
- **M4 Health report** — turns scattered logs into a prioritized, plain-language findings
list with **suggested** fixes (read-only, D9). Reuses M1 for a live snapshot. Also powers
the **guided diagnostic session** (with M3): pick a game → focused capture → scan →
findings (see SPEC §4). *Implemented:* journalctl scan (Xid/panic/OOM/MCE/AER/thermal/amdgpu),
SMART, NVIDIA driver-mismatch, journald-persistence + live-temp checks; `rigdoctor report`
(text/JSON) + GUI Health tab. GPU-firmware verification deferred.
- **M2 Live monitor** — the terminal "HWMonitor for Linux" face. *Implemented (`tui.py`):*
`rigdoctor monitor` is a stdlib **curses** dashboard — current / session-min / session-max
per sensor, grouped by subsystem, with temperature & utilization color bands; `q` quits,
`r` resets the min/max. Falls back to a plain redraw on a non-TTY (`--plain` forces it).
- **M5 / M6 Diagnostics** — inventory export + gaming-env checks; M6 flags risky settings and
suggests the fix command but does not apply it (D9). *M6 implemented (Steam detection first —
the D12 "pick a game" foundation):* discovers Steam installs + all library folders
(`libraryfolders.vdf`, multi-drive) and the games in each (`appmanifest_*.acf`), filtering
runtimes/Proton/redistributables — stdlib only. **Libraries are opt-in** (`steam_libraries`
config); the GUI **Games** page lists them with per-library counts and rescans in the
background on every launch, badging games installed since the last scan (cached in
`state/games.json`). CLI: `rigdoctor games` / `games libraries [--enable|--disable|--all]`.
*Env-check engine implemented* (`core/gameenv.py`): a read-only findings report (reusing the
M4 `Finding` model) over PCIe ASPM, NVIDIA persistence mode, CPU governor (the three seed-case
contributors to GPU bus-drop / Xid 79), GameMode, MangoHud, swappiness, shader cache, THP, CPU
mitigations, and installed Proton versions — each with the suggested fix command. CLI
`rigdoctor gameenv`; GUI **Environment** page. Per **D22**, the GUI adds **one-click apply**
for the runtime-reversible tunables (governor / NVIDIA persistence / PCIe ASPM / swappiness /
THP — dropdown + Apply via a single pkexec prompt, `core/fixes.py`) and **one-click install**
of optional tools (GameMode / MangoHud / cpupower, now in the M9 catalog). GRUB/mitigations
stay suggestion-only. *Guided diagnostic (D12 "pick a game", `core/diagnostic.py`):* a focused
capture tagged with a game → window-scoped report (capture summary + M4 findings), in the CLI
(`rigdoctor diagnose start/status/finish`) and GUI (per-game **Run Diagnostic** → recording
banner → results dialog). **Auto-capture** via the D12 wrapper (`rigdoctor wrap %command%`,
`core/wrap.py`; GUI "Auto-capture…" helper). **Hard crashes are detected**: a capture left
without a clean stop is flagged on next launch with a crash-boot kernel-log analysis
(`pending_crash`/`analyze_crash` + `health.check_previous_boot`). **Non-Steam launchers**
(Lutris SQLite + Heroic JSON, `core/launchers.py`) are detected and listed alongside Steam
games; env checks also cover **GPU PowerMizer** (X), **Wine** and **Steam-client** versions.
*Pending:* the zero-config watcher (D12 fallback) — landing with M9's trigger-mode work.
- **M8 Alerting** — threshold/event notifications; integrates with the tray applet (M11).
- **M10 Desktop GUI** — PySide6 graphical front-end over the core engine. Optional; adds the
Qt dependency. Dark-themed window with a **grouped sidebar** (Monitor / Diagnose / System /
App) over: **Dashboard** (live history graphs + per-subsystem cards), **Games** (M6 detection
+ Run Diagnostic), **Recordings** (recorder controls + view/report any captured log + analyze
a crash), **System Health** (M4 scan), **Tuning** (M6 gaming tunables + fixes), **Inventory**
(M5), **Settings** (components/deps + alerts + account + uninstall), and **Share** (M12). A
global recording badge shows on every page. GUI-first per D17.
- **M11 Tray applet** — `QSystemTrayIcon` menu-bar applet. *Implemented (`gui/tray.py`, D13):*
the menu shows live M1 readouts (CPU temp, GPU temp, memory used/total) + a status line
(Normal / Hot / GPU not responding), led by a **Run Diagnostic** submenu (per detected game →
the guided session), plus Open dashboard / Start-Stop recording / Snapshot-copy / Quit. It
shares the dashboard's sample stream (no extra sampling) and drives the existing MainWindow
flows. With a tray present, closing the window **hides to the tray** (Quit exits); `rigdoctor-gui
--tray` starts hidden for autostart. Optional; shares the Qt dependency with M10. *Needs a tray
host:* on GNOME that means the AppIndicator extension; degrades to a no-op if none is available.
- **M9 Installer** — interactive wizard layered on the `.deb` (D8); apt-first dependency
resolution; enables the logger service and trigger mode. *Implemented (first cut):*
distro/package-manager/GPU detection (`core/sysenv`), an optional-component catalog (`core/catalog`),
and dependency install via pkexec/sudo — `rigdoctor install [--check] [-y]` + GUI Setup tab.
The **user-local app install** is `install.sh` (private venv + `~/.local/bin` launchers +
desktop entry, no root; handles the `python3-venv` prerequisite) plus a self-extracting
**`.run`** (pure-Python self-extractor, `packaging/make_run.py`, built by CI). *Pending:*
config/module selection + `systemd --user` service enable.
- **M12 Session sharing / remote assist** (D16) — let a helper inspect a user's machine, in
an escalating ladder: (1) **diagnostic bundle export** (inventory + recent log + report,
one-way), (2) **live read-only view** over a user-chosen tunnel (Tailscale/cloudflared/SSH,
no hosted relay), (3) **gated interactive terminal** wrapping tmate/sshx (read-only by
default; read-write only on explicit consent — a deliberate exception to D9). Per-session
consent, ephemeral revocable tokens, audit log.
- **M13 Auto-update** (D18) — *check + auth implemented:* updates are **gated to Gitea account
holders** via a Personal Access Token, stored **encrypted in the OS keyring** (`secret-tool`)
with a 0600-file fallback (`config.load_token`/`save_token`/`token_backend`). `core/updates`
queries the releases API with the token; CLI `login`/`logout`/`update`; GUI Setup "Update
access" panel + sidebar states. The no-root **self-update apply** is implemented:
`rigdoctor update` runs an authenticated `pip install --upgrade "rigdoctor[gui] @
git+https://oauth2:<token>@…@<tag>"` into the user-local venv (GUI "Update to v…" button +
restart prompt; token scrubbed). Installed via the user-local **`install.sh`** /
self-extracting **`.run`** (M9).
*Original plan:* On launch, check the public Gitea releases API and
**self-update a user-local install with no root** (download → verify checksum/signature →
atomic symlink swap → restart, incl. the daemon). HTTPS-only, version-check-only (no
telemetry), opt-out-able. Surfaced in the GUI; `rigdoctor update` in the CLI. (`.deb` users
update via apt instead.)
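
The M3 note above describes a JSONL log with fsync-per-sample and size-based rotation. A minimal sketch of that pattern follows. Only the `log_max_bytes` and `log_backups` names come from this document; the `SampleLog` class and the `.1`, `.2` backup suffixes are assumptions for illustration, not RigDoctor's actual code or on-disk layout.

```python
import json
import os
from pathlib import Path


class SampleLog:
    """Append-only JSONL log: one fsync per sample, size-based rotation.

    Hypothetical class; the backup naming (<name>.1, <name>.2, ...) is assumed.
    """

    def __init__(self, path, log_max_bytes=1_000_000, log_backups=3):
        self.path = Path(path)
        self.log_max_bytes = log_max_bytes
        self.log_backups = log_backups

    def _backup(self, n):
        return self.path.with_name(f"{self.path.name}.{n}")

    def _rotate(self):
        if self.log_backups < 1:
            # No backups requested: simply start over.
            self.path.unlink(missing_ok=True)
            return
        # Shift oldest first so each rename overwrites only the file being dropped.
        for n in range(self.log_backups, 0, -1):
            src = self.path if n == 1 else self._backup(n - 1)
            if src.exists():
                os.replace(src, self._backup(n))

    def append(self, sample):
        data = (json.dumps(sample, separators=(",", ":")) + "\n").encode("utf-8")
        if self.path.exists() and self.path.stat().st_size + len(data) > self.log_max_bytes:
            self._rotate()
        with open(self.path, "ab") as fh:
            fh.write(data)
            fh.flush()             # push Python's buffer to the kernel
            os.fsync(fh.fileno())  # ask the kernel to commit the sample to storage
```

Syncing on every sample costs throughput, but it makes it likely that the last sample before a hard lock-up is already on disk, which is what a crash-capture log needs.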
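
The M6 note above says Steam games are discovered from `libraryfolders.vdf` and `appmanifest_*.acf` using only the standard library. The sketch below shows one way that could work. It assumes the current nested `libraryfolders.vdf` layout (numbered entries, each with a `path` key) and handles quoted tokens only. The function names and the name-prefix filter are hypothetical; the document does not say how RigDoctor actually filters runtimes, Proton, and redistributables.

```python
import re
from pathlib import Path

# A quoted string (with backslash escapes) or a single brace.
_TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|([{}])')

# Assumed filter: skip tool entries by name prefix (illustrative only).
SKIP_PREFIXES = ("Proton", "Steam Linux Runtime", "Steamworks Common Redistributables")


def parse_vdf(text):
    """Parse Valve KeyValues text into nested dicts (quoted tokens only)."""
    root = {}
    stack = [root]
    key = None
    for match in _TOKEN.finditer(text):
        string, brace = match.group(1), match.group(2)
        if brace == "{":
            child = {}
            stack[-1][key] = child   # the pending key names this block
            stack.append(child)
            key = None
        elif brace == "}":
            stack.pop()
        elif key is None:
            key = string             # first string of a pair is the key
        else:
            stack[-1][key] = string  # second string is its value
            key = None
    return root


def library_paths(libraryfolders_text):
    """Return every library folder listed in libraryfolders.vdf."""
    folders = parse_vdf(libraryfolders_text).get("libraryfolders", {})
    return [Path(entry["path"]) for entry in folders.values()
            if isinstance(entry, dict) and "path" in entry]


def installed_games(library):
    """Return (appid, name) pairs for the games in one library folder."""
    games = []
    for manifest in sorted((Path(library) / "steamapps").glob("appmanifest_*.acf")):
        text = manifest.read_text(encoding="utf-8", errors="replace")
        state = parse_vdf(text).get("AppState", {})
        name = state.get("name", "")
        if "appid" in state and name and not name.startswith(SKIP_PREFIXES):
            games.append((state["appid"], name))
    return games
```

Walking every path from `library_paths` and calling `installed_games` on each covers multi-drive setups, since each library folder carries its own `steamapps` directory.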
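
The M6 note also mentions reading Lutris's SQLite database read-only. A minimal sketch of a read-only query with the standard `sqlite3` module follows. The `games` table and its `name`, `slug`, `runner`, and `installed` columns are an assumption about the Lutris schema, and `lutris_games` is a hypothetical helper, not the function in `core/launchers.py`.

```python
import sqlite3
from pathlib import Path


def lutris_games(db_path):
    """List installed Lutris games without ever opening the database for writing."""
    # mode=ro makes SQLite refuse writes, so a diagnostic tool cannot alter pga.db.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            # Assumed schema: a `games` table with these four columns.
            "SELECT name, slug, runner FROM games WHERE installed = 1 ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [
        {"name": name, "slug": slug, "runner": runner, "launcher": "lutris"}
        for name, slug, runner in rows
    ]
```

Tagging each row with a `launcher` value mirrors the `launcher` field the commit message says `Game` gained, so Steam and non-Steam entries can share one list.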
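
The original M13 plan above lists "verify checksum" and "atomic symlink swap" as steps. A sketch of those two steps follows, assuming a layout where a `current` symlink points at a versioned install directory. Both helper names and that layout are hypothetical, and the shipped updater uses `pip install --upgrade` instead, as noted above.

```python
import hashlib
import hmac
import os
from pathlib import Path


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the expected hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())


def swap_symlink(link, new_target):
    """Point `link` at `new_target` with no moment where `link` is missing."""
    link = Path(link)
    tmp = link.with_name(link.name + ".new")
    if os.path.lexists(tmp):
        os.unlink(tmp)          # clear a leftover from an interrupted run
    os.symlink(new_target, tmp)
    os.replace(tmp, link)       # rename(2) replaces the old link in one step
```

Creating the new link under a temporary name and renaming it over the old one means a reader sees either the old version or the new one, never a dangling or absent link.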
## Bundles (final — D14)
- **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)*
- **Monitoring:** M2 + M8
- **Diagnostics:** M5 + M6
- **Desktop UI:** M10 + M11 *(adds PySide6)*
- **Sharing:** M12 *(session sharing / remote assist — D16)*
## MVP candidate — *confirmed (D5)*
**M1 + M3 + M4 (Essential), NVIDIA-only, CLI-first.** Gives a working tool that captures the
GPU crash and explains the logs — deliverable before the installer, GUI/tray, or multi-vendor
work.
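
To make "explains the logs" concrete, here is a minimal sketch of how kernel log lines could be turned into prioritized findings. The `Finding` fields, the rule table, and the severity labels are assumptions for illustration; the real M4 model and its full pattern set (AER, thermal, amdgpu, and so on) are not shown in this document. The input is any iterable of log lines, for example the output of a `journalctl -k -o cat` call.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for the M4 finding model."""
    severity: str
    category: str
    message: str


_SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2}

# (category, severity, pattern, plain-language template); illustrative subset only.
_RULES = [
    ("xid", "high", re.compile(r"NVRM: Xid \([^)]*\): (\d+)"),
     "NVIDIA driver reported Xid {0}"),
    ("panic", "critical", re.compile(r"Kernel panic - not syncing"),
     "Kernel panic recorded"),
    ("oom", "high", re.compile(r"Out of memory: Killed process (\d+)"),
     "OOM killer ended process {0}"),
    ("mce", "high", re.compile(r"mce: \[Hardware Error\]"),
     "Machine-check hardware error logged"),
]


def scan_lines(lines):
    """Return findings for matching lines, most severe first."""
    findings = []
    for line in lines:
        for category, severity, pattern, template in _RULES:
            match = pattern.search(line)
            if match:
                findings.append(Finding(severity, category, template.format(*match.groups())))
                break  # one finding per line is enough for this sketch
    return sorted(findings, key=lambda f: _SEVERITY_ORDER[f.severity])
```

A line such as `NVRM: Xid (PCI:0000:01:00): 79, ... GPU has fallen off the bus.` would yield a `xid` finding mentioning Xid 79, the bus-drop case the seed use case targets.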