# RigDoctor β€” Module Catalog (DRAFT v0.2) Status: ⬜ not started Β· 🟦 designing Β· 🟨 in progress Β· βœ… done > Module set per D14, plus **M12 (session sharing, D16)**, **M13 (auto-update, D18)**, > **M14 (AI assistant, D24)**, and **M15 (logging & reports, D25)**. > **M7 (stress/repro) was dropped (D7).** M10/M11 are the GUI and tray modules (D10/D11). > GPU scope reads "all (NVIDIA first)" β€” NVIDIA first, others via the vendor abstraction (D4). | ID | Module | Bundle | Key deps | GPU scope | Priority | Status | |----|--------|--------|----------|-----------|----------|--------| | M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | βœ… | | M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | βœ… | | M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | βœ… | | M2 | Live monitor (TUI) | Monitoring | none (stdlib curses) | all | P1 | βœ… | | M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | βœ… | | M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | βœ… | | M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 | | M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | βœ… | | M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | βœ… | | M9 | Installer | (meta) | none | all | P1 | 🟨 | | M12 | Session sharing (shared terminal) | Sharing | none (relay) | all | P3 | βœ… | | M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | βœ… | | M14 | AI assistant (explain diagnostics) | (optional) | none (stdlib urllib; Ollama or Claude) | all | P3 | βœ… | | M15 | Logging & report bundles | (core) | none (stdlib logging + zip) | all | P3 | βœ… | | ~~M7~~ | ~~Stress / repro~~ | β€” | β€” | β€” | β€” | ❌ dropped (D7) | ## Notes per module - **M1 Sensor core** β€” the foundation everything else samples from. Stdlib-only. Abstracts NVIDIA/AMD/Intel + hwmon behind one interface; **ship the NVIDIA + hwmon path first**. - **M3 Crash-capture logger** β€” the highest-value piece for the seed use case. `fsync` per sample; GPU-lost detection via query timeout; bounded rotation; `systemd --user` service with a **user-selectable trigger mode** (always-on / game-launch / manual β€” D6). *Implemented (manual trigger):* JSONL log with fsync-per-sample, size-based rotation (`log_max_bytes`/`log_backups`), GPU-lost/recovered event markers, atomic status file, and `rigdoctor record run|start|stop|status|report`. The foreground `run` is the systemd-ready entrypoint. The **game-launch trigger** is implemented via the D12 wrapper (`rigdoctor wrap %command%`, see M6/below); the `systemd --user` service unit + always-on trigger (D6) and the zero-config watcher (D12) are still pending. Also fully driven from the GUI's Recording/Logs page (M10) via shared `core.reccontrol`. - **M4 Health report** β€” turns scattered logs into a prioritized, plain-language findings list with **suggested** fixes (read-only, D9). Reuses M1 for a live snapshot. Also powers the **guided diagnostic session** (with M3): pick a game β†’ focused capture β†’ scan β†’ findings (see SPEC Β§4). *Implemented:* journalctl scan (Xid/panic/OOM/MCE/AER/thermal/amdgpu), SMART, NVIDIA driver-mismatch, journald-persistence + live-temp checks; `rigdoctor report` (text/JSON) + GUI Health tab. GPU-firmware verification deferred. - **M2 Live monitor** β€” the terminal "HWMonitor for Linux" face. *Implemented (`tui.py`):* `rigdoctor monitor` is a stdlib **curses** dashboard β€” current / session-min / session-max per sensor, grouped by subsystem, with temperature & utilization color bands; `q` quits, `r` resets the min/max. Falls back to a plain redraw on a non-TTY (`--plain` forces it). - **M5 / M6 Diagnostics** β€” inventory export + gaming-env checks; M6 flags risky settings and suggests the fix command but does not apply it (D9). *M6 implemented (Steam detection first β€” the D12 "pick a game" foundation):* discovers Steam installs + all library folders (`libraryfolders.vdf`, multi-drive) and the games in each (`appmanifest_*.acf`), filtering runtimes/Proton/redistributables β€” stdlib only. **Libraries are opt-in** (`steam_libraries` config); the GUI **Games** page lists them with per-library counts and rescans in the background on every launch, badging games installed since the last scan (cached in `state/games.json`). CLI: `rigdoctor games` / `games libraries [--enable|--disable|--all]`. *Env-check engine implemented* (`core/gameenv.py`): a read-only findings report (reusing the M4 `Finding` model) over PCIe ASPM, NVIDIA persistence mode, CPU governor (the three seed-case contributors to GPU bus-drop / Xid 79), GameMode, MangoHud, swappiness, shader cache, THP, CPU mitigations, and installed Proton versions β€” each with the suggested fix command. CLI `rigdoctor gameenv`; GUI **Environment** page. Per **D22**, the GUI adds **one-click apply** for the runtime-reversible tunables (governor / NVIDIA persistence / PCIe ASPM / swappiness / THP β€” dropdown + Apply via a single pkexec prompt, `core/fixes.py`) and **one-click install** of optional tools (GameMode / MangoHud / cpupower, now in the M9 catalog). GRUB/mitigations stay suggestion-only. *Guided diagnostic (D12 "pick a game", `core/diagnostic.py`):* a focused capture tagged with a game β†’ window-scoped report (capture summary + M4 findings), in the CLI (`rigdoctor diagnose start/status/finish`) and GUI (per-game **Run Diagnostic** β†’ recording banner β†’ results dialog). **Auto-capture** via the D12 wrapper (`rigdoctor wrap %command%`, `core/wrap.py`; GUI "Auto-capture…" helper). **Hard crashes are detected** (capture left without a clean stop) and flagged on next launch with a crash-boot kernel-log analysis (`pending_crash`/`analyze_crash` + `health.check_previous_boot`). **Non-Steam launchers** (Lutris SQLite + Heroic JSON, `core/launchers.py`) are detected and listed alongside Steam games; env checks also cover **GPU PowerMizer** (X), **Wine** and **Steam-client** versions. *Pending:* the zero-config watcher (D12 fallback) β€” landing with M9's trigger-mode work. - **M8 Alerting** β€” threshold/event notifications; integrates with the tray applet (M11). - **M10 Desktop GUI** β€” PySide6 graphical front-end over the core engine. Optional; adds the Qt dependency. Dark-themed window with a **grouped sidebar** (Monitor / Diagnose / System / App) over: **Dashboard** (live history graphs + per-subsystem cards), **Games** (M6 detection + Run Diagnostic), **Recordings** (recorder controls + view/report any captured log + analyze a crash), **System Health** (M4 scan), **Tuning** (M6 gaming tunables + fixes), **Inventory** (M5), **Settings** (components/deps + alerts + account + uninstall), and **Share** (M12). A global recording badge shows on every page. GUI-first per D17. - **M11 Tray applet** β€” `QSystemTrayIcon` menu-bar applet. *Implemented (`gui/tray.py`, D13):* the menu shows live M1 readouts (CPU temp, GPU temp, memory used/total) + a status line (Normal / Hot / GPU not responding), led by a **Run Diagnostic** submenu (per detected game β†’ the guided session), plus Open dashboard / Start-Stop recording / Snapshot-copy / Quit. It shares the dashboard's sample stream (no extra sampling) and drives the existing MainWindow flows. With a tray present, closing the window **hides to the tray** (Quit exits); `rigdoctor-gui --tray` starts hidden for autostart. Optional; shares the Qt dependency with M10. *Needs a tray host* β€” on GNOME that means the AppIndicator extension; degrades to no-op if none is available. - **M9 Installer** β€” interactive wizard layered on the `.deb` (D8); apt-first dependency resolution; enables the logger service and trigger mode. *Implemented (first cut):* distro/ package-manager/GPU detection (`core/sysenv`), an optional-component catalog (`core/catalog`), and dependency install via pkexec/sudo β€” `rigdoctor install [--check] [-y]` + GUI Setup tab. The **user-local app install** is `install.sh` (private venv + `~/.local/bin` launchers + desktop entry, no root; handles the `python3-venv` prerequisite) plus a self-extracting **`.run`** (pure-Python self-extractor, `packaging/make_run.py`, built by CI). *Pending:* config/module selection + `systemd --user` service enable. - **M12 Session sharing / remote assist** (D16, scoped to terminal-only by **D23**) β€” a single mode: a **host-consented shared terminal** over the relay. The host shares a real PTY running their `$SHELL` (colors/theming preserved β€” fish etc.); the guest watches live and can type **only if the host allows it** (otherwise read-only) β€” a deliberate, consent-gated exception to D9. The host reads along and can type too (e.g. a sudo password, which stays local). Either side can pop the terminal **full-screen**. Account-gated by the Gitea token. *The earlier read-only stats view and `share serve` (Tier 1/2) were removed.* - **M13 Auto-update** (D18) β€” *check + auth implemented:* updates are **gated to Gitea account holders** via a Personal Access Token, stored **encrypted in the OS keyring** (`secret-tool`) with a 0600-file fallback (`config.load_token`/`save_token`/`token_backend`). `core/updates` queries the releases API with the token; CLI `login`/`logout`/`update`; GUI Setup "Update access" panel + sidebar states. The no-root **self-update apply** is implemented: `rigdoctor update` runs an authenticated `pip install --upgrade "rigdoctor[gui] @ git+https://oauth2:@…@"` into the user-local venv (GUI "Update to v…" button + restart prompt; token scrubbed). Installed via the user-local **`install.sh`** / self-extracting **`.run`** (M9). *Original plan:* On launch, check the public Gitea releases API and **self-update a user-local install with no root** (download β†’ verify checksum/signature β†’ atomic symlink swap β†’ restart, incl. the daemon). HTTPS-only, version-check-only (no telemetry), opt-out-able. Surfaced in the GUI; `rigdoctor update` in the CLI. (`.deb` users update via apt instead.) - **M14 AI assistant** (D24) β€” optional, **strictly opt-in, never automatic**: explains the collected diagnostics in plain language only when the user presses **"Explain with AI"** (`core/ai.py`, GUI button on the diagnostic dialog, `rigdoctor ai explain`). The user picks a provider explicitly (no default): **Ollama** (local, private, no key) or **Claude** (Anthropic Messages API, key in the keyring; consent prompt before sending). Answers are **grounded** β€” we pass the actual findings plus matched reference facts from a curated knowledge base (`core/ai_knowledge.py`, "RAG-lite": exact keyword/code match, no embeddings, stdlib only), which lifts a small local model and sharpens Claude. Stdlib `urllib` (no pip deps); output is advisory (D9). Configure in **Settings β†’ AI assistant**. - **M15 Logging & report bundles** (D25) β€” opt-in via one `logging_enabled` toggle (default off): application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage** (`core/diagstore.py`) β€” each diagnostic gets its own `DATA_DIR/diagnostics//` (capture, `result.json`, `report.txt`, scoped game logs, and an `ai/` record of every AI interaction: exact data sent, model, reply). **"Report"** zips one into `DATA_DIR/reports/` (GUI button on the diagnostic dialog; CLI `rigdoctor bundle`). Stays local; shareable on demand. ## Bundles (final β€” D14) - **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only β€” D5)* - **Monitoring:** M2 + M8 - **Diagnostics:** M5 + M6 - **Desktop UI:** M10 + M11 *(adds PySide6)* - **Sharing:** M12 *(session sharing / remote assist β€” D16)* - **AI:** M14 *(optional AI explanations β€” D24)* ## MVP candidate β€” *confirmed (D5)* **M1 + M3 + M4 (Essential), NVIDIA-only, CLI-first.** Gives a working tool that captures the GPU crash and explains the logs β€” deliverable before the installer, GUI/tray, or multi-vendor work.