RigDoctor — Module Catalog (DRAFT v0.2)
Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
Module set per D14, plus M12 (session sharing, D16) and M13 (auto-update, D18). M7 (stress/repro) was dropped (D7). M10/M11 are the GUI and tray modules (D10/D11). GPU scope reads "all (NVIDIA first)" — NVIDIA first, others via the vendor abstraction (D4).
| ID | Module | Bundle | Key deps | GPU scope | Priority | Status |
|---|---|---|---|---|---|---|
| M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | ✅ |
| M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M2 | Live monitor (TUI) | Monitoring | none (stdlib curses) | all | P1 | ✅ |
| M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | ✅ |
| M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | ✅ |
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
| M10 | Desktop GUI | Desktop UI | python3-pyside6 | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | python3-pyside6 (+ AppIndicator on GNOME) | all | P2 | ✅ |
| M9 | Installer | (meta) | none | all | P1 | 🟨 |
| M12 | Session sharing / remote assist | Sharing | none (Tier 3: tmate/sshx) | all | P3 | 🟨 |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| M7 | Stress / repro | — | — | — | — | ❌ dropped (D7) |
Notes per module
- M1 Sensor core — the foundation everything else samples from. Stdlib-only. Abstracts NVIDIA/AMD/Intel + hwmon behind one interface; ship the NVIDIA + hwmon path first.
- M3 Crash-capture logger — the highest-value piece for the seed use case.
`fsync` per sample; GPU-lost detection via query timeout; bounded rotation; `systemd --user` service with a user-selectable trigger mode (always-on / game-launch / manual — D6). Implemented (manual trigger): JSONL log with fsync-per-sample, size-based rotation (`log_max_bytes` / `log_backups`), GPU-lost/recovered event markers, atomic status file, and `rigdoctor record run|start|stop|status|report`. The foreground `run` is the systemd-ready entrypoint. The game-launch trigger is implemented via the D12 wrapper (`rigdoctor wrap %command%`, see M6/below); the `systemd --user` service unit + always-on trigger (D6) and the zero-config watcher (D12) are still pending. Also fully driven from the GUI's Recording/Logs page (M10) via shared `core.reccontrol`.
- M4 Health report — turns scattered logs into a prioritized, plain-language findings
list with suggested fixes (read-only, D9). Reuses M1 for a live snapshot. Also powers
the guided diagnostic session (with M3): pick a game → focused capture → scan →
findings (see SPEC §4). Implemented: journalctl scan (Xid/panic/OOM/MCE/AER/thermal/amdgpu),
SMART, NVIDIA driver-mismatch, journald-persistence + live-temp checks;
`rigdoctor report` (text/JSON) + GUI Health tab. GPU-firmware verification deferred.
- M2 Live monitor — the terminal "HWMonitor for Linux" face. Implemented (`tui.py`): `rigdoctor monitor` is a stdlib curses dashboard — current / session-min / session-max per sensor, grouped by subsystem, with temperature & utilization color bands; `q` quits, `r` resets the min/max. Falls back to a plain redraw on a non-TTY (`--plain` forces it).
- M5 / M6 Diagnostics — inventory export + gaming-env checks; M6 flags risky settings and
suggests the fix command but does not apply it (D9). M6 implemented (Steam detection first —
the D12 "pick a game" foundation): discovers Steam installs + all library folders
(`libraryfolders.vdf`, multi-drive) and the games in each (`appmanifest_*.acf`), filtering runtimes/Proton/redistributables — stdlib only.
Libraries are opt-in (`steam_libraries` config); the GUI Games page lists them with per-library counts and rescans in the background on every launch, badging games installed since the last scan (cached in `state/games.json`). CLI: `rigdoctor games` / `games libraries [--enable|--disable|--all]`.
Env-check engine implemented (`core/gameenv.py`): a read-only findings report (reusing the M4 `Finding` model) over PCIe ASPM, NVIDIA persistence mode, CPU governor (the three seed-case contributors to GPU bus-drop / Xid 79), GameMode, MangoHud, swappiness, shader cache, THP, CPU mitigations, and installed Proton versions — each with the suggested fix command. CLI `rigdoctor gameenv`; GUI Environment page.
Per D22, the GUI adds one-click apply for the runtime-reversible tunables (governor / NVIDIA persistence / PCIe ASPM / swappiness / THP — dropdown + Apply via a single pkexec prompt, `core/fixes.py`) and one-click install of optional tools (GameMode / MangoHud / cpupower, now in the M9 catalog). GRUB/mitigations stay suggestion-only.
Guided diagnostic (D12 "pick a game", `core/diagnostic.py`): a focused capture tagged with a game → window-scoped report (capture summary + M4 findings), in the CLI (`rigdoctor diagnose start/status/finish`) and GUI (per-game Run Diagnostic → recording banner → results dialog). Auto-capture via the D12 wrapper (`rigdoctor wrap %command%`, `core/wrap.py`; GUI "Auto-capture…" helper).
Hard crashes are detected (capture left without a clean stop) and flagged on next launch with a crash-boot kernel-log analysis (`pending_crash` / `analyze_crash` + `health.check_previous_boot`).
Non-Steam launchers (Lutris SQLite + Heroic JSON, `core/launchers.py`) are detected and listed alongside Steam games; env checks also cover GPU PowerMizer (X), Wine and Steam-client versions.
Pending: the zero-config watcher (D12 fallback) — landing with M9's trigger-mode work.
- M8 Alerting — threshold/event notifications; integrates with the tray applet (M11).
- M10 Desktop GUI — PySide6 graphical front-end over the core engine. Optional; adds the
Qt dependency. Dark-themed window with a grouped sidebar (Monitor / Diagnose / System /
App) over: Dashboard (live history graphs + per-subsystem cards), Games (M6 detection + Run Diagnostic), Recordings (recorder controls + view/report any captured log + analyze a crash), System Health (M4 scan), Tuning (M6 gaming tunables + fixes), Inventory (M5), Settings (components/deps + alerts + account + uninstall), and Share (M12). A global recording badge shows on every page. GUI-first per D17.
- M11 Tray applet — `QSystemTrayIcon` menu-bar applet. Implemented (`gui/tray.py`, D13): the menu shows live M1 readouts (CPU temp, GPU temp, memory used/total) + a status line (Normal / Hot / GPU not responding), led by a Run Diagnostic submenu (per detected game → the guided session), plus Open dashboard / Start-Stop recording / Snapshot-copy / Quit. It shares the dashboard's sample stream (no extra sampling) and drives the existing MainWindow flows. With a tray present, closing the window hides to the tray (Quit exits); `rigdoctor-gui --tray` starts hidden for autostart. Optional; shares the Qt dependency with M10. Needs a tray host — on GNOME that means the AppIndicator extension; degrades to no-op if none is available.
- M9 Installer — interactive wizard layered on the
`.deb` (D8); apt-first dependency resolution; enables the logger service and trigger mode. Implemented (first cut): distro / package-manager / GPU detection (`core/sysenv`), an optional-component catalog (`core/catalog`), and dependency install via pkexec/sudo — `rigdoctor install [--check] [-y]` + GUI Setup tab. The user-local app install is `install.sh` (private venv + `~/.local/bin` launchers + desktop entry, no root; handles the `python3-venv` prerequisite) plus a self-extracting `.run` (pure-Python self-extractor, `packaging/make_run.py`, built by CI). Pending: config/module selection + `systemd --user` service enable.
- M12 Session sharing / remote assist (D16) — let a helper inspect a user's machine, in an escalating ladder: (1) diagnostic bundle export (inventory + recent log + report, one-way), (2) live read-only view over a user-chosen tunnel (Tailscale/cloudflared/SSH, no hosted relay), (3) gated interactive terminal wrapping tmate/sshx (read-only by default; read-write only on explicit consent — a deliberate exception to D9). Per-session consent, ephemeral revocable tokens, audit log.
- M13 Auto-update (D18) — check + auth implemented: updates are gated to Gitea account
holders via a Personal Access Token, stored encrypted in the OS keyring (`secret-tool`) with a 0600-file fallback (`config.load_token` / `save_token` / `token_backend`). `core/updates` queries the releases API with the token; CLI `login` / `logout` / `update`; GUI Setup "Update access" panel + sidebar states. The no-root self-update apply is implemented: `rigdoctor update` runs an authenticated `pip install --upgrade "rigdoctor[gui] @ git+https://oauth2:<token>@…@<tag>"` into the user-local venv (GUI "Update to v…" button + restart prompt; token scrubbed). Installed via the user-local `install.sh` / self-extracting `.run` (M9).
Original plan: On launch, check the public Gitea releases API and self-update a user-local install with no root (download → verify checksum/signature → atomic symlink swap → restart, incl. the daemon). HTTPS-only, version-check-only (no telemetry), opt-out-able. Surfaced in the GUI; `rigdoctor update` in the CLI. (`.deb` users update via apt instead.)
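
The M3 note above names three mechanisms: fsync-per-sample JSONL logging, size-based rotation, and an atomic status file. The sketch below illustrates how they might fit together; it is not RigDoctor's actual code. The `SampleLog` class, the `write_status` helper, and the record fields are hypothetical names chosen for this example, and the `max_bytes` / `backups` parameters only stand in for the `log_max_bytes` / `log_backups` settings.

```python
import json
import os
from pathlib import Path


class SampleLog:
    """Hypothetical append-only JSONL log with fsync per sample and bounded rotation."""

    def __init__(self, path, max_bytes=1_000_000, backups=3):
        self.path = Path(path)
        self.max_bytes = max_bytes  # stand-in for log_max_bytes
        self.backups = backups      # stand-in for log_backups
        self._fh = open(self.path, "ab")
        self._size = self.path.stat().st_size

    def _backup(self, n):
        return Path(f"{self.path}.{n}")

    def write(self, record):
        data = (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")
        # Rotate before the write that would push the file past the limit.
        if self._size and self._size + len(data) > self.max_bytes:
            self._rotate()
        self._fh.write(data)
        self._fh.flush()
        os.fsync(self._fh.fileno())  # force the sample to disk before returning
        self._size += len(data)

    def _rotate(self):
        self._fh.close()
        # Shift log -> log.1 -> log.2 ...; the oldest backup is overwritten.
        for i in range(self.backups, 0, -1):
            src = self.path if i == 1 else self._backup(i - 1)
            if src.exists():
                os.replace(src, self._backup(i))
        self._fh = open(self.path, "wb")
        self._size = 0

    def close(self):
        self._fh.close()


def write_status(path, status):
    """Write the status file atomically: temp file, fsync, then rename over the target."""
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(status, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # readers see either the old or the new file, never a partial one
```

Syncing after every sample trades write throughput for durability: if the machine hard-locks, the last sample written before the hang should already be on disk, which is the point of a crash-capture log. An event marker such as a GPU-lost record could go through the same `write` call as an ordinary sample.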
Bundles (final — D14)
- Essential: M1 + M3 + M4 (the MVP, NVIDIA-only — D5)
- Monitoring: M2 + M8
- Diagnostics: M5 + M6
- Desktop UI: M10 + M11 (adds PySide6)
- Sharing: M12 (session sharing / remote assist — D16)
MVP candidate — confirmed (D5)
M1 + M3 + M4 (Essential), NVIDIA-only, CLI-first. Gives a working tool that captures the GPU crash and explains the logs — deliverable before the installer, GUI/tray, or multi-vendor work.
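
To make the "explains the logs" half of the MVP concrete, here is a minimal sketch of a rule-based kernel-log scan in the spirit of M4. It is an illustration under stated assumptions, not the project's implementation: the `Finding` dataclass is a hypothetical stand-in for the real M4 model, the `RULES` table and `scan` function are invented for this example, and the patterns cover only a few of the event classes listed for M4.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for the M4 Finding model."""
    severity: int  # lower number = more urgent
    title: str
    evidence: str  # the first log line that triggered the finding


# (severity, title, pattern), checked in order; the first matching rule wins per line.
RULES = [
    (0, "GPU fell off the bus (Xid 79)", re.compile(r"NVRM: Xid \(.*?\): 79\b")),
    (0, "Kernel panic", re.compile(r"Kernel panic - not syncing")),
    (1, "Out-of-memory kill", re.compile(r"Out of memory: Killed process")),
    (1, "Machine-check hardware error", re.compile(r"mce: \[Hardware Error\]")),
    (2, "Other NVIDIA Xid error", re.compile(r"NVRM: Xid \(.*?\): \d+")),
]


def scan(lines):
    """Return one Finding per rule that matched, most urgent first."""
    findings = []
    seen = set()
    for line in lines:
        for severity, title, pattern in RULES:
            if pattern.search(line):
                if title not in seen:
                    seen.add(title)
                    findings.append(Finding(severity, title, line.strip()))
                break
    # sorted() is stable, so findings of equal severity keep log order.
    return sorted(findings, key=lambda f: f.severity)
```

In practice the input lines could come from something like `journalctl -k -b -1 --no-pager` (kernel messages from the previous boot) captured with `subprocess.run`. The scan takes any iterable of strings so it can be exercised without a live journal.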