Files
rigdoctor/docs/MODULES.md
T
jessey 78cd417d0b
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 28s
feat(m9): .deb package + CI build/publish — 0.35.0
packaging/make_deb.py builds rigdoctor_<ver>_all.deb (Architecture: all) via
dpkg-deb, no debhelper: Depends python3; Recommends python3-pyside6/pyte (GUI by
default, --no-install-recommends = CLI only). Installs the package, both
launchers, desktop entry + icon; postinst refreshes the desktop database.
release.yml builds it as a release asset and optionally pushes to the Gitea apt
registry (REGISTRY_TOKEN). Verified locally: valid .deb, packaged launcher runs
'rigdoctor --version'. Docs/README/ROADMAP/MODULES updated; M9 complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:15:33 +02:00

156 lines
12 KiB
Markdown

# RigDoctor — Module Catalog (DRAFT v0.2)
Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
> Module set per D14, plus **M12 (session sharing, D16)**, **M13 (auto-update, D18)**,
> **M14 (AI assistant, D24)**, and **M15 (logging & reports, D25)**.
> **M7 (stress/repro) was dropped (D7).** M10/M11 are the GUI and tray modules (D10/D11).
> GPU scope reads "all (NVIDIA first)" — NVIDIA first, others via the vendor abstraction (D4).
| ID | Module | Bundle | Key deps | GPU scope | Priority | Status |
|----|--------|--------|----------|-----------|----------|--------|
| M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | ✅ |
| M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M2 | Live monitor (TUI) | Monitoring | none (stdlib curses) | all | P1 | ✅ |
| M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | ✅ |
| M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | ✅ |
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ✅ |
| M9 | Installer (+ `.deb`) | (meta) | none | all | P1 | ✅ |
| M12 | Session sharing (shared terminal) | Sharing | none (relay) | all | P3 | ✅ |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| M14 | AI assistant (explain diagnostics) | (optional) | none (stdlib urllib; Ollama or Claude) | all | P3 | ✅ |
| M15 | Logging & report bundles | (core) | none (stdlib logging + zip) | all | P3 | ✅ |
| ~~M7~~ | ~~Stress / repro~~ | — | — | — | — | ❌ dropped (D7) |
## Notes per module
- **M1 Sensor core** — the foundation everything else samples from. Stdlib-only. Abstracts
NVIDIA/AMD/Intel + hwmon behind one interface; **ship the NVIDIA + hwmon path first**.
- **M3 Crash-capture logger** — the highest-value piece for the seed use case. `fsync` per
sample; GPU-lost detection via query timeout; bounded rotation; `systemd --user` service
with a **user-selectable trigger mode** (always-on / game-launch / manual — D6).
*Implemented (manual trigger):* JSONL log with fsync-per-sample, size-based rotation
(`log_max_bytes`/`log_backups`), GPU-lost/recovered event markers, atomic status file, and
`rigdoctor record run|start|stop|status|report`. The foreground `run` is the systemd-ready
entrypoint. The **game-launch trigger** is implemented via the D12 wrapper (`rigdoctor wrap
%command%`, see M6/below); the `systemd --user` service unit + always-on trigger (D6) and the
zero-config watcher (D12) are still pending. Also fully driven from the GUI's Recording/Logs
page (M10) via shared `core.reccontrol`.
- **M4 Health report** — turns scattered logs into a prioritized, plain-language findings
list with **suggested** fixes (read-only, D9). Reuses M1 for a live snapshot. Also powers
the **guided diagnostic session** (with M3): pick a game → focused capture → scan →
findings (see SPEC §4). *Implemented:* journalctl scan (Xid/panic/OOM/MCE/AER/thermal/amdgpu),
SMART, NVIDIA driver-mismatch, journald-persistence + live-temp checks; `rigdoctor report`
(text/JSON) + GUI Health tab. GPU-firmware verification deferred.
- **M2 Live monitor** — the terminal "HWMonitor for Linux" face. *Implemented (`tui.py`):*
`rigdoctor monitor` is a stdlib **curses** dashboard — current / session-min / session-max
per sensor, grouped by subsystem, with temperature & utilization color bands; `q` quits,
`r` resets the min/max. Falls back to a plain redraw on a non-TTY (`--plain` forces it).
- **M5 / M6 Diagnostics** — inventory export + gaming-env checks; M6 flags risky settings and
suggests the fix command but does not apply it (D9). *M6 implemented (Steam detection first —
the D12 "pick a game" foundation):* discovers Steam installs + all library folders
(`libraryfolders.vdf`, multi-drive) and the games in each (`appmanifest_*.acf`), filtering
runtimes/Proton/redistributables — stdlib only. **Libraries are opt-in** (`steam_libraries`
config); the GUI **Games** page lists them with per-library counts and rescans in the
background on every launch, badging games installed since the last scan (cached in
`state/games.json`). CLI: `rigdoctor games` / `games libraries [--enable|--disable|--all]`.
*Env-check engine implemented* (`core/gameenv.py`): a read-only findings report (reusing the
M4 `Finding` model) over PCIe ASPM, NVIDIA persistence mode, CPU governor (the three seed-case
contributors to GPU bus-drop / Xid 79), GameMode, MangoHud, swappiness, shader cache, THP, CPU
mitigations, and installed Proton versions — each with the suggested fix command. CLI
`rigdoctor gameenv`; GUI **Environment** page. Per **D22**, the GUI adds **one-click apply**
for the runtime-reversible tunables (governor / NVIDIA persistence / PCIe ASPM / swappiness /
THP — dropdown + Apply via a single pkexec prompt, `core/fixes.py`) and **one-click install**
of optional tools (GameMode / MangoHud / cpupower, now in the M9 catalog). GRUB/mitigations
stay suggestion-only. *Guided diagnostic (D12 "pick a game", `core/diagnostic.py`):* a focused
capture tagged with a game → window-scoped report (capture summary + M4 findings), in the CLI
(`rigdoctor diagnose start/status/finish`) and GUI (per-game **Run Diagnostic** → recording
banner → results dialog). **Auto-capture** via the D12 wrapper (`rigdoctor wrap %command%`,
`core/wrap.py`; GUI "Auto-capture…" helper). **Hard crashes are detected** (capture left
without a clean stop) and flagged on next launch with a crash-boot kernel-log analysis
(`pending_crash`/`analyze_crash` + `health.check_previous_boot`). **Non-Steam launchers**
(Lutris SQLite + Heroic JSON, `core/launchers.py`) are detected and listed alongside Steam
games; env checks also cover **GPU PowerMizer** (X), **Wine** and **Steam-client** versions.
*Pending:* the zero-config watcher (D12 fallback) — landing with M9's trigger-mode work.
- **M8 Alerting** — threshold/event notifications; integrates with the tray applet (M11).
- **M10 Desktop GUI** — PySide6 graphical front-end over the core engine. Optional; adds the
Qt dependency. Dark-themed window with a **grouped sidebar** (Monitor / Diagnose / System /
App) over: **Dashboard** (live history graphs + per-subsystem cards), **Games** (M6 detection
+ Run Diagnostic), **Recordings** (recorder controls + view/report any captured log + analyze
a crash), **System Health** (M4 scan), **Tuning** (M6 gaming tunables + fixes), **Inventory**
(M5), **Settings** (components/deps + alerts + account + uninstall), and **Share** (M12). A
global recording badge shows on every page. GUI-first per D17.
- **M11 Tray applet** — `QSystemTrayIcon` menu-bar applet. *Implemented (`gui/tray.py`, D13):*
the menu shows live M1 readouts (CPU temp, GPU temp, memory used/total) + a status line
(Normal / Hot / GPU not responding), led by a **Run Diagnostic** submenu (per detected game →
the guided session), plus Open dashboard / Start-Stop recording / Snapshot-copy / Quit. It
shares the dashboard's sample stream (no extra sampling) and drives the existing MainWindow
flows. With a tray present, closing the window **hides to the tray** (Quit exits); `rigdoctor-gui
--tray` starts hidden for autostart. Optional; shares the Qt dependency with M10. *Needs a tray
host* — on GNOME that means the AppIndicator extension; degrades to no-op if none is available.
- **M9 Installer** — interactive wizard layered on the `.deb` (D8); apt-first dependency
resolution; enables the logger service and trigger mode. *Implemented (first cut):* distro/
package-manager/GPU detection (`core/sysenv`), an optional-component catalog (`core/catalog`),
and dependency install via pkexec/sudo — `rigdoctor install [--check] [-y]` + GUI Setup tab.
The **user-local app install** is `install.sh` (private venv + `~/.local/bin` launchers +
desktop entry, no root; handles the `python3-venv` prerequisite) plus a self-extracting
**`.run`** (pure-Python self-extractor, `packaging/make_run.py`, built by CI). *Pending:*
config/module selection + `systemd --user`
service enable.
- **M12 Session sharing / remote assist** (D16, scoped to terminal-only by **D23**) — a single
mode: a **host-consented shared terminal** over the relay. The host shares a real PTY running
their `$SHELL` (colors/theming preserved — fish etc.); the guest watches live and can type
**only if the host allows it** (otherwise read-only) — a deliberate, consent-gated exception
to D9. The host reads along and can type too (e.g. a sudo password, which stays local). Either
side can pop the terminal **full-screen**. Account-gated by the Gitea token. *The earlier
read-only stats view and `share serve` (Tier 1/2) were removed.*
- **M13 Auto-update** (D18) — *check + auth implemented:* updates are **gated to Gitea account
holders** via a Personal Access Token, stored **encrypted in the OS keyring** (`secret-tool`)
with a 0600-file fallback (`config.load_token`/`save_token`/`token_backend`). `core/updates`
queries the releases API with the token; CLI `login`/`logout`/`update`; GUI Setup "Update
access" panel + sidebar states. The no-root **self-update apply** is implemented:
`rigdoctor update` runs an authenticated `pip install --upgrade "rigdoctor[gui] @
git+https://oauth2:<token>@…@<tag>"` into the user-local venv (GUI "Update to v…" button +
restart prompt; token scrubbed). Installed via the user-local **`install.sh`** /
self-extracting **`.run`** (M9).
*Original plan:* On launch, check the public Gitea releases API and
**self-update a user-local install with no root** (download → verify checksum/signature →
atomic symlink swap → restart, incl. the daemon). HTTPS-only, version-check-only (no
telemetry), opt-out-able. Surfaced in the GUI; `rigdoctor update` in the CLI. (`.deb` users
update via apt instead.)
- **M14 AI assistant** (D24) — optional, **strictly opt-in, never automatic**: explains the
collected diagnostics in plain language only when the user presses **"Explain with AI"**
(`core/ai.py`, GUI button on the diagnostic dialog, `rigdoctor ai explain`). The user picks a
provider explicitly (no default): **Ollama** (local, private, no key) or **Claude** (Anthropic
Messages API, key in the keyring; consent prompt before sending). Answers are **grounded**
we pass the actual findings plus matched reference facts from a curated knowledge base
(`core/ai_knowledge.py`, "RAG-lite": exact keyword/code match, no embeddings, stdlib only),
which lifts a small local model and sharpens Claude. Stdlib `urllib` (no pip deps); output is
advisory (D9). Configure in **Settings → AI assistant**.
- **M15 Logging & report bundles** (D25) — opt-in via one `logging_enabled` toggle (default off):
application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage**
(`core/diagstore.py`) — each diagnostic gets its own `DATA_DIR/diagnostics/<id>/`: capture,
`result.json`, `report.txt`, the full **inventory** (M5: hardware/OS), scoped **game logs**
(`core/gamelogs.py`), scoped **system logs** (`core/syslogs.py``journalctl -k`,
`coredumpctl`, an `nvidia-smi -q` snapshot, and the X11/Wayland display-server log), and an
`ai/` record of every AI interaction (exact data sent, model, reply). **"Report"** zips one
into `DATA_DIR/reports/` (GUI button on the diagnostic dialog; CLI `rigdoctor bundle`). Logs
are session-scoped and fed to the AI on "Explain". Stays local; shareable on demand.
## Bundles (final — D14)
- **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)*
- **Monitoring:** M2 + M8
- **Diagnostics:** M5 + M6
- **Desktop UI:** M10 + M11 *(adds PySide6)*
- **Sharing:** M12 *(session sharing / remote assist — D16)*
- **AI:** M14 *(optional AI explanations — D24)*
## MVP candidate — *confirmed (D5)*
**M1 + M3 + M4 (Essential), NVIDIA-only, CLI-first.** Gives a working tool that captures the
GPU crash and explains the logs — deliverable before the installer, GUI/tray, or multi-vendor
work.
</content>