rigdoctor/docs/ROADMAP.md
jessey 03b2dd8363 feat: D12 Steam-launch wrapper for auto crash-capture + doc status fixes — 0.16.0
D12 "build first" wrapper: `rigdoctor wrap %command%` (Steam launch option /
Lutris/Heroic wrapper field) auto-brackets a focused diagnostic around a game —
start a game-tagged capture on launch, clean stop on exit; a hard freeze leaves
it unterminated → flagged as a crash next launch.

- core/wrap.py: game name from SteamAppId, PATH-proof launch_option(), run()
  that doesn't disturb an existing capture and returns the game's exit code.
- diagnostic.start() preserves an unanalyzed crash to diagnostic-crash.jsonl
  before clearing, so auto-relaunch can't wipe an unseen crash; pending_crash/
  analyze_crash check the archive first.
- GUI: "Auto-capture…" helper dialog (copyable launch-option string).
- Tests for wrap (name resolution, exit-code passthrough, no-double-start).
- docs: fix stale MODULES.md status column (M1/M3/M4/M5/M8/M10/M13 → done),
  update ROADMAP/MODULES for the wrapper + crash detection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:59:54 +02:00

RigDoctor — Roadmap (DRAFT v0.2)

Phased so the seed use case (capturing the RTX 3070 crash / black-screen events) is solved early, before the broader "tool for all Linux gamers" work. Stack: Python 3 + Qt/PySide6; Ubuntu + NVIDIA first; .deb distribution (see DECISIONS.md).

Phase 0 — Workspace & spec (done)

  • Create repo + docs scaffold
  • Settle the foundational decisions D1-D11 (name, language, platform/GPU priority, MVP scope, trigger model, packaging, scope-of-action, GUI/tray)
  • Lock the MVP scope (M1 + M3 + M4, NVIDIA-only)

Phase 1 — MVP: capture this crash (Essential bundle, NVIDIA-only, CLI)

  • M1 sensor core (NVIDIA via nvidia-smi + hwmon for CPU/RAM/NVMe), stdlib-only
  • M3 crash-capture logger (JSONL, fsync per sample, GPU-lost detection, size rotation)
  • Manual trigger mode (rigdoctor record run/start/stop/status); systemd --user service + other trigger modes in Phase 4 (run is already the service entrypoint)
  • M4 health report (Xid/panic/OOM/MCE/AER/thermal scan + SMART + driver-mismatch + journald-persistence + live temps, suggested fixes only — D9; GPU-firmware verify deferred)
  • record report post-crash summary (peak temps/power per subsystem, events, last N samples)
  • Exit criteria: user can run it during gaming and, after a freeze/black-screen, see the last readings + a plausible cause.
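
The M3 bullet above (JSONL, fsync per sample, size rotation) can be illustrated with a minimal stdlib-only sketch. This is not the project's logger: the class name `JsonlCaptureLog`, the `read_last_samples` helper, the `.1` rotation suffix, and the `max_bytes` threshold are all hypothetical, chosen only to show why a per-sample `fsync` lets the last readings survive a hard freeze.

```python
import json
import os
import tempfile
import time
from pathlib import Path


class JsonlCaptureLog:
    """Append-only JSONL log that fsyncs after every sample (hypothetical helper)."""

    def __init__(self, path, max_bytes=1_000_000):
        self.path = Path(path)
        self.max_bytes = max_bytes  # assumed rotation threshold, not a project default
        self._fh = open(self.path, "ab")

    def write_sample(self, sample):
        record = {"ts": time.time(), **sample}
        self._fh.write(json.dumps(record).encode("utf-8") + b"\n")
        # Flush Python's buffer, then ask the kernel to push the data to disk,
        # so the sample is durable even if the machine freezes right after.
        self._fh.flush()
        os.fsync(self._fh.fileno())
        if os.fstat(self._fh.fileno()).st_size >= self.max_bytes:
            self._rotate()

    def _rotate(self):
        # Keep exactly one previous file; an older ".1" file is overwritten.
        self._fh.close()
        os.replace(self.path, self.path.with_name(self.path.name + ".1"))
        self._fh = open(self.path, "ab")

    def close(self):
        self._fh.close()


def read_last_samples(path, n=3):
    """Parse the last n lines, skipping a torn final line left by a freeze."""
    samples = []
    with open(path, "rb") as fh:
        for line in fh.read().splitlines()[-n:]:
            try:
                samples.append(json.loads(line))
            except ValueError:  # covers JSONDecodeError and bad UTF-8
                continue
    return samples


# Demo in a temporary directory: six samples, rotation after the fourth.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "capture.jsonl"
    log = JsonlCaptureLog(log_path, max_bytes=150)
    for i in range(6):
        log.write_sample({"gpu_temp_c": 60 + i})
    log.close()
    # Simulate a write that was cut off mid-line by a hard freeze.
    with open(log_path, "ab") as fh:
        fh.write(b'{"ts": 1.0, "gpu_te')
    rotated = log_path.with_name("capture.jsonl.1").exists()
    last_temps = [s["gpu_temp_c"] for s in read_last_samples(log_path)]
```

The per-sample `fsync` costs some write throughput. That trade-off seems reasonable here because the samples written just before a crash are the ones the post-crash summary needs.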

Phase 2 — Live monitor (terminal)

  • M2 TUI dashboard (current/min/max, grouped, throttle highlighting)
  • M8 basic alerting (overheat/throttle/GPU-lost notifications)

Phase 3 — Diagnostics breadth

  • M5 system inventory + exportable report
  • [~] M6 gaming environment checks (suggest-only) — Steam game/library detection done (multi-library libraryfolders.vdf discovery + appmanifest scan, opt-in libraries, launch-time background rescan with new-game badge; CLI rigdoctor games, GUI Games page). This is also the D12 "pick a game" foundation. Env-check engine done (rigdoctor gameenv + GUI Environment page): PCIe ASPM, NVIDIA persistence, CPU governor, GameMode, MangoHud, swappiness, shader cache, THP, mitigations, Proton versions — read-only with fix commands. Pending: non-Steam launchers (Lutris/Heroic) + GPU power-profile (PowerMizer) checks.
  • SMART integration (smartmontools if present)
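
The "read-only with fix commands" shape of the M6 env checks can be sketched as pure functions that take a raw kernel value and return a finding with a suggested command. Nothing below is taken from the RigDoctor code base: the `Finding` dataclass, the function names, and the swappiness threshold of 10 are assumptions for illustration. The two paths are the standard Linux locations for the cpu0 governor and `vm.swappiness`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

GOVERNOR_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


@dataclass
class Finding:
    check: str
    status: str                        # "ok", "warn" or "unknown"
    current: Optional[str]
    fix_command: Optional[str] = None  # suggestion only, never executed here


def read_value(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        return None


def check_governor(raw):
    if raw is None:
        return Finding("cpu_governor", "unknown", None)
    if raw == "performance":
        return Finding("cpu_governor", "ok", raw)
    return Finding("cpu_governor", "warn", raw,
                   "sudo cpupower frequency-set -g performance")


def check_swappiness(raw, max_recommended=10):
    # max_recommended is an assumed threshold, not a RigDoctor default.
    if raw is None or not raw.isdigit():
        return Finding("swappiness", "unknown", raw)
    if int(raw) <= max_recommended:
        return Finding("swappiness", "ok", raw)
    return Finding("swappiness", "warn", raw,
                   f"sudo sysctl vm.swappiness={max_recommended}")


# Read the live values (either may be missing, e.g. inside a container).
findings = [
    check_governor(read_value(GOVERNOR_PATH)),
    check_swappiness(read_value(SWAPPINESS_PATH)),
]
for finding in findings:
    suggestion = f" -> try: {finding.fix_command}" if finding.fix_command else ""
    print(f"{finding.check}: {finding.status} ({finding.current}){suggestion}")
```

Separating the file read from the evaluation keeps each check testable without touching `/sys` or `/proc`, and the check itself never changes system state.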

Phase 4 — Desktop UI & installer

  • M10 desktop GUI (PySide6: dashboard, log browser, report viewer, logger controls)
  • M11 tray / menu-bar applet (QSystemTrayIcon: live M1 readouts + Run Diagnostic + supporting actions — D13)
  • [~] Guided diagnostic session (pick game → focused M3 capture → M4 scan → findings), shared by tray/GUI/CLI — core + CLI + GUI done (core/diagnostic.py, rigdoctor diagnose start/status/finish, and a Run Diagnostic button per game on the GUI Games page → recording banner → results dialog with the capture summary + findings). Tags a focused capture with the chosen game (own diagnostic log, window-scoped report) and combines the capture summary with the M4 findings. Auto start/stop via the D12 wrapper is wired in, and a hard-crash is detected (capture left without a clean stop) → flagged on next launch with a deeper crash-boot log analysis. Pending: the tray (M11) entry point and the zero-config watcher.
  • [~] Logger trigger modes: always-on + game-launch (D12) — game-launch wrapper done: rigdoctor wrap %command% (per-game Steam launch option / Lutris/Heroic wrapper field) auto-brackets a focused capture around the game; GUI "Auto-capture…" helper shows the launch-option string. Pending: global Steam compat-tool registration, the zero-config watcher (Steam RunningAppID + /proc), GameMode hook, and the always-on systemd --user service.
  • [~] M9 interactive installer — done: distro/GPU detection + optional-dependency install (rigdoctor install, GUI Setup tab); user-local install.sh + self-extracting .run (no-root venv install, handles python3-venv prereq, CI-built). Pending: module-selection config + systemd --user service enable + trigger-mode pick.
  • .deb packaging (D8) declaring per-bundle deps incl. python3-pyside6 for Desktop UI
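
The game-launch wrapper and hard-crash detection described above can be sketched as start/stop markers bracketing the game process. This is a simplified illustration, not the contents of `core/wrap.py`: the function names, the event format, and the `steam-app-<id>` naming are hypothetical. The only details taken from the roadmap are that the game is identified via `SteamAppId`, that the wrapper returns the game's exit code, and that a capture left without a clean stop is treated as a crash.

```python
import json
import os
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def game_name(env):
    # Placeholder naming; a real tool would likely map the app id to a title.
    app_id = env.get("SteamAppId")
    return f"steam-app-{app_id}" if app_id else "unknown-game"


def append_event(log_path, event):
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # the start marker must survive a hard freeze


def run_wrapped(argv, log_path, env=None):
    """Write a start marker, run the game, write a stop marker, return its exit code."""
    env = os.environ if env is None else env
    append_event(log_path, {"event": "start", "game": game_name(env), "ts": time.time()})
    exit_code = subprocess.run(argv).returncode
    append_event(log_path, {"event": "stop", "exit_code": exit_code, "ts": time.time()})
    return exit_code


def unterminated_game(log_path):
    """Return the game whose last start has no matching stop, else None."""
    open_game = None
    if not Path(log_path).exists():
        return None
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        try:
            event = json.loads(line)
        except ValueError:
            continue  # ignore a torn line
        if event.get("event") == "start":
            open_game = event.get("game")
        elif event.get("event") == "stop":
            open_game = None
    return open_game


# Demo: a stand-in "game" that exits with code 3, then a simulated freeze.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "diagnostic.jsonl"
    fake_game = [sys.executable, "-c", "import sys; sys.exit(3)"]
    exit_code = run_wrapped(fake_game, log_path, env={"SteamAppId": "1234"})
    after_clean_exit = unterminated_game(log_path)
    # A hard freeze would leave only the start marker behind.
    append_event(log_path, {"event": "start", "game": "steam-app-1234", "ts": time.time()})
    after_freeze = unterminated_game(log_path)
```

Passing the exit code through unchanged matters because the launcher treats the wrapper as the game process, so a swallowed exit code could hide a game failure from it.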

Phase 5 — Breadth (later)

  • AMD GPU support in M1 (Steam Deck / Radeon)
  • Intel GPU best-effort
  • M13 auto-update (D18) — launch-time version check (GUI sidebar) + no-root self-update apply (rigdoctor update / sidebar button → authenticated pip upgrade), token-gated. Restart-after-update is manual for now.
  • [~] Optional auto-apply of suggested fixes behind explicit consent (D9 milestone) — first cut shipped for M6 (D22): one-click apply of runtime-reversible tunables (CPU governor, NVIDIA persistence, PCIe ASPM, swappiness, THP) via a single pkexec prompt, no reboot. GRUB-based fixes + CPU mitigations remain suggestion-only.

Phase 6 — Session sharing / remote assist (M12, D16)

Escalating ladder, built in order:

  • Tier 1: share export, a diagnostic bundle (inventory + recent log + report) that the recipient (B) opens in RigDoctor. One-way, safest.
  • Tier 2: live read-only view — rigdoctor share serve (stdlib HTTP, token-gated: sensors + health + inventory). Remote = user-chosen tunnel; GUI controls still to add.
  • Tier 3: host-consented interactive terminal — a real PTY shell shared over the relay (own pty, pyte-rendered guest), off by default; host reads along + can type (sudo).
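
The Tier 2 idea (a stdlib HTTP server that exposes read-only data behind a token) could look roughly like the sketch below. It is an assumption-laden illustration rather than the actual `rigdoctor share serve` implementation: the `Authorization: Bearer` header scheme, the `make_handler` and `fetch` helpers, and the JSON payload are all hypothetical. The demo binds to loopback only and shuts the server down immediately.

```python
import hmac
import json
import secrets
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(token, snapshot):
    """Build a GET-only handler that serves snapshot() to token holders."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            supplied = self.headers.get("Authorization", "")
            expected = f"Bearer {token}"
            # Constant-time comparison avoids leaking the token via timing.
            if not hmac.compare_digest(supplied.encode(), expected.encode()):
                self.send_error(401, "missing or invalid token")
                return
            body = json.dumps(snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo quiet

    return Handler


def fetch(url, token=None):
    """Return (status, parsed JSON or None); ignores any proxy settings."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        return err.code, None


# Demo: serve one fake reading on an ephemeral loopback port.
token = secrets.token_urlsafe(16)
server = ThreadingHTTPServer(("127.0.0.1", 0),
                             make_handler(token, lambda: {"gpu_temp_c": 61}))
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

ok_status, payload = fetch(url, token)
denied_status, _ = fetch(url)

server.shutdown()
server.server_close()
```

Because the handler defines only `do_GET`, `http.server` answers other methods with 501, which keeps the view read-only. Plain HTTP sends the token in the clear, so a user-chosen tunnel such as the one the roadmap mentions would need to supply the encryption.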

Out of scope: stress/repro module (D7); multi-distro support and packaging beyond Ubuntu/apt + .deb (D15) — a thin seam is kept but not built out.

Dropped: stress / repro module (D7) — not on the roadmap.