# RigDoctor — Roadmap (DRAFT v0.2)

Phased so the seed use case (capturing the RTX 3070 crash / black-screen events) is solved
early, before the broader "tool for all Linux gamers" work. Stack: Python 3 + Qt/PySide6;
Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).

## Phase 0 — Workspace & spec *(done)*
- [x] Create repo + docs scaffold
- [x] Settle the foundational decisions D1–D11 (name, language, platform/GPU priority, MVP
  scope, trigger model, packaging, scope-of-action, GUI/tray)
- [x] Lock the MVP scope (M1 + M3 + M4, NVIDIA-only)

## Phase 1 — MVP: capture *this* crash (Essential bundle, NVIDIA-only, CLI)
- [x] M1 sensor core (NVIDIA via nvidia-smi + hwmon for CPU/RAM/NVMe), stdlib-only
- [x] M3 crash-capture logger (JSONL, fsync per sample, GPU-lost detection, size rotation)
- [x] Manual trigger mode (`rigdoctor record run/start/stop/status`); `systemd --user`
  service + other trigger modes in Phase 4 (`run` is already the service entrypoint)
- [x] M4 health report (Xid/panic/OOM/MCE/AER/thermal scan + SMART + driver-mismatch +
  journald-persistence + live temps, suggested fixes only — D9; GPU-firmware verify deferred)
- [x] `record report` post-crash summary (peak temps/power per subsystem, events, last N samples)
- **Exit criteria:** user can run it during gaming and, after a freeze/black-screen, see the
  last readings + a plausible cause.

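The M3 logger's "JSONL, fsync per sample, size rotation" behaviour can be illustrated with a minimal sketch. This is not the project's implementation: the class name `JsonlLogger`, the `max_bytes` default, and the single `.1` backup file are assumptions made for the example. It shows why a sample written just before a hard freeze should still be on disk.

```python
import json
import os
from pathlib import Path


class JsonlLogger:
    """Append one JSON object per line, fsync each one, rotate by size.

    Hypothetical sketch: the name, the max_bytes default and the single
    ".1" backup are illustrative assumptions, not RigDoctor's actual code.
    """

    def __init__(self, path, max_bytes=1_000_000):
        self.path = Path(path)
        self.max_bytes = max_bytes
        self._open()

    def _open(self):
        # Binary append mode so byte counts match what lands on disk.
        self._fh = open(self.path, "ab")
        self._size = self.path.stat().st_size

    def write_sample(self, sample):
        data = (json.dumps(sample, separators=(",", ":")) + "\n").encode("utf-8")
        # Rotate before writing so a single file never exceeds the cap
        # (unless one sample alone is larger than max_bytes).
        if self._size and self._size + len(data) > self.max_bytes:
            self._fh.close()
            os.replace(self.path, self.path.with_name(self.path.name + ".1"))
            self._open()
        self._fh.write(data)
        self._fh.flush()             # Python buffer -> kernel page cache
        os.fsync(self._fh.fileno())  # page cache -> storage device
        self._size += len(data)

    def close(self):
        self._fh.close()
```

Syncing on every sample costs throughput, but for a crash-capture log the last few lines before the freeze are the ones that matter, so durability is worth more than write speed here.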
## Phase 2 — Live monitor (terminal)
- [x] M2 TUI dashboard (`rigdoctor monitor`, `tui.py`): curses, current/min/max grouped by
  subsystem with temp/usage color bands; q quit / r reset; plain-redraw fallback on non-TTY
- [ ] M8 basic alerting (overheat/throttle/GPU-lost notifications)

## Phase 3 — Diagnostics breadth
- [ ] M5 system inventory + exportable report
- [~] M6 gaming environment checks (suggest-only) — *Steam game/library detection done*
  (multi-library `libraryfolders.vdf` discovery + `appmanifest` scan, opt-in libraries,
  launch-time background rescan with new-game badge; CLI `rigdoctor games`, GUI Games page).
  This is also the D12 "pick a game" foundation. *Env-check engine done* (`rigdoctor gameenv` +
  GUI Environment page): PCIe ASPM, NVIDIA persistence, CPU governor, GameMode, MangoHud,
  swappiness, shader cache, THP, mitigations, Proton versions — read-only with fix commands.
  Also: GPU PowerMizer (X), Wine + Steam-client versions, and non-Steam launchers
  (Lutris/Heroic, `core/launchers.py`). The zero-config watcher (D12 fallback) has since
  landed with M9's trigger-mode work (`core/watcher.py`, see Phase 4).
- [ ] SMART integration (smartmontools if present)

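As an illustration of the "read-only with fix commands" model used by the M6 environment checks, here is a minimal sketch covering three of the listed tunables. The `Finding` type, the function names, the swappiness threshold of 10, and the choice of `performance` and `madvise` as targets are all assumptions for the example, not RigDoctor's actual values. The `/sys` and `/proc` paths are the standard kernel locations.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Finding:
    check: str
    value: str
    suggestion: str  # shown to the user, never executed by this code


def _read(path):
    """Return the stripped file content, or None if missing/unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def active_thp_mode(text):
    """'always [madvise] never' -> 'madvise' (the bracketed word is active)."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def check_env(root="/"):
    """Run read-only checks under `root` (overridable so tests can fake /sys)."""
    root = Path(root)
    findings = []

    gov = _read(root / "sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
    if gov is not None and gov != "performance":
        findings.append(Finding("cpu_governor", gov,
                                "sudo cpupower frequency-set -g performance"))

    swp = _read(root / "proc/sys/vm/swappiness")
    # The threshold of 10 is an illustrative assumption.
    if swp is not None and swp.isdigit() and int(swp) > 10:
        findings.append(Finding("swappiness", swp, "sudo sysctl vm.swappiness=10"))

    thp = _read(root / "sys/kernel/mm/transparent_hugepage/enabled")
    # Treating "always" as worth flagging is likewise an assumption.
    if thp is not None and active_thp_mode(thp) == "always":
        findings.append(Finding(
            "thp", "always",
            "echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled"))
    return findings
```

Each check only reads a file and returns a suggested command as text, which keeps the engine safe to run without root. A missing file (for example, no cpufreq driver) simply produces no finding.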
## Phase 4 — Desktop UI & installer
- [x] M10 desktop GUI (PySide6: dashboard w/ history graphs, logs, health, games, environment,
  inventory, setup, notifications, share)
- [x] M11 tray / menu-bar applet (`gui/tray.py`: live CPU/GPU temp + memory readouts, status
  line, Run Diagnostic submenu per game, Open dashboard / Start-Stop recording / Snapshot /
  Quit — D13; close-to-tray, `--tray` autostart). Needs a tray host (AppIndicator on GNOME).
- [~] Guided diagnostic session (pick game → focused M3 capture → M4 scan → findings),
  shared by tray/GUI/CLI — *core + CLI + GUI done* (`core/diagnostic.py`, `rigdoctor
  diagnose start/status/finish`, and a **Run Diagnostic** button per game on the GUI Games
  page → recording banner → results dialog with the capture summary + findings). Tags a
  focused capture with the chosen game (own diagnostic log, window-scoped report) and
  combines the capture summary with the M4 findings. **Auto start/stop** via the D12
  wrapper is wired in, and a **hard-crash is detected** (capture left without a clean stop)
  → flagged on next launch with a deeper crash-boot log analysis. *Pending:* the tray (M11)
  entry point and the zero-config watcher.
- [~] Logger trigger modes: always-on + game-launch (D12) — *game-launch **wrapper** done:*
  `rigdoctor wrap %command%` (per-game Steam launch option / Lutris/Heroic wrapper field)
  auto-brackets a focused capture around the game; GUI "Auto-capture…" helper shows the
  launch-option string. *Also done (see M9 below):* the zero-config watcher (Steam
  RunningAppID + /proc) and the always-on `systemd --user` service. *Pending:* global Steam
  compat-tool registration and the GameMode hook.
- [~] M9 interactive installer — *done:* distro/GPU detection + optional-dependency install
  (`rigdoctor install`, GUI Settings); **user-local `install.sh` + self-extracting `.run`**
  (no-root venv install, handles python3-venv prereq, CI-built); **`systemd --user` trigger
  modes** (`core/service.py`, `rigdoctor service mode manual|always-on|game-launch` + GUI
  Settings "Recording trigger") incl. the zero-config **game-launch watcher**
  (`core/watcher.py`, `rigdoctor watch`). *Pending:* module-selection config during install.
- [ ] `.deb` packaging (D8) declaring per-bundle deps incl. python3-pyside6 for Desktop UI

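The zero-config watcher's core idea (poll Steam's RunningAppID and bracket a capture around each game) reduces to a small transition function. The sketch below is illustrative only: the function names are hypothetical, and it assumes the value is read from a Steam VDF text file containing a `"RunningAppID" "<id>"` pair, with `0` meaning no game is running.

```python
import re

# Assumed VDF shape: "RunningAppID"    "570"
_RUNNING = re.compile(r'"RunningAppID"\s+"(\d+)"')


def parse_running_appid(vdf_text):
    """Return the running app id, or 0 when absent or no game is running."""
    match = _RUNNING.search(vdf_text)
    return int(match.group(1)) if match else 0


def transitions(prev, cur):
    """Map one poll step to capture actions.

    0 -> N  starts a capture, N -> 0 stops it, and N -> M (a game switch
    seen between two polls) stops the old capture before starting the new one.
    """
    if prev == cur:
        return []
    actions = []
    if prev:
        actions.append(("stop", prev))
    if cur:
        actions.append(("start", cur))
    return actions


def replay(appids):
    """Fold a sequence of polled ids into the full list of actions."""
    actions, prev = [], 0
    for cur in appids:
        actions.extend(transitions(prev, cur))
        prev = cur
    return actions
```

Keeping the transition logic pure (no file access, no sleeping) makes it easy to unit-test separately from the polling loop, which only needs to call `parse_running_appid` on an interval and feed the result into `transitions`.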
## Phase 5 — Breadth (later)
- [ ] AMD GPU support in M1 (Steam Deck / Radeon)
- [ ] Intel GPU best-effort
- [x] M13 auto-update (D18) — launch-time version check (GUI sidebar) + no-root self-update
  apply (`rigdoctor update` / sidebar button → authenticated pip upgrade), token-gated.
  Restart-after-update is manual for now.
- [~] Optional auto-apply of suggested fixes behind explicit consent (D9 milestone) — *first
  cut shipped for M6 (D22):* one-click apply of runtime-reversible tunables (CPU governor,
  NVIDIA persistence, PCIe ASPM, swappiness, THP) via a single pkexec prompt, no reboot.
  GRUB-based fixes + CPU mitigations remain suggestion-only.

## Phase 6 — Session sharing / remote assist (M12, D16)
Escalating ladder, built in order:
- [ ] Tier 1: `share export` — diagnostic bundle (inventory + recent log + report) that the
  helper (B) opens in RigDoctor. One-way, safest.
- [x] Tier 2: live read-only view — `rigdoctor share serve` (stdlib HTTP, token-gated:
  sensors + health + inventory). Remote = user-chosen tunnel; GUI controls still to add.
- [x] Tier 3: host-consented interactive terminal — a real PTY shell shared over the relay
  (own `pty`, pyte-rendered guest), off by default; host reads along + can type (sudo).

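The Tier 2 "stdlib HTTP, token-gated" read-only view could look roughly like the sketch below. The source only states that the endpoint is token-gated, so the `Authorization: Bearer <token>` scheme, the helper names, and the JSON snapshot shape are assumptions for illustration.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def token_ok(header_value, expected):
    """Constant-time check of an 'Authorization: Bearer <token>' header."""
    prefix = "Bearer "
    if not header_value or not header_value.startswith(prefix):
        return False
    supplied = header_value[len(prefix):].encode("utf-8")
    return hmac.compare_digest(supplied, expected.encode("utf-8"))


def make_handler(token, snapshot):
    """Build a GET-only handler; `snapshot` is a callable returning a dict."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not token_ok(self.headers.get("Authorization"), token):
                self.send_error(401, "missing or invalid token")
                return
            body = json.dumps(snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the host's terminal quiet

    return Handler


def serve(token, snapshot, host="127.0.0.1", port=8765):
    """Blocking server loop; bound to loopback so a tunnel must be opted into."""
    ThreadingHTTPServer((host, port), make_handler(token, snapshot)).serve_forever()
```

Only `do_GET` is defined, so other HTTP methods are rejected by the base class, which matches the read-only intent. Binding to loopback by default leaves remote exposure to the user-chosen tunnel mentioned above.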
> **Out of scope:** stress/repro module (D7); multi-distro support and packaging beyond
> Ubuntu/apt + `.deb` (D15) — a thin seam is kept but not built out.

> **Dropped:** stress / repro module (D7) — not on the roadmap.