RigDoctor — Roadmap (DRAFT v0.2)
Phased so the seed use case (capturing the RTX 3070 crash / black-screen events) is solved
early, before the broader "tool for all Linux gamers" work. Stack: Python 3 + Qt/PySide6;
Ubuntu + NVIDIA first; .deb distribution (see DECISIONS.md).
Phase 0 — Workspace & spec (done)
- Create repo + docs scaffold
- Settle the foundational decisions D1–D11 (name, language, platform/GPU priority, MVP scope, trigger model, packaging, scope-of-action, GUI/tray)
- Lock the MVP scope (M1 + M3 + M4, NVIDIA-only)
Phase 1 — MVP: capture this crash (Essential bundle, NVIDIA-only, CLI)
- M1 sensor core (NVIDIA via nvidia-smi + hwmon for CPU/RAM/NVMe), stdlib-only
- M3 crash-capture logger (JSONL, fsync per sample, GPU-lost detection, size rotation)
- Manual trigger mode (rigdoctor record run/start/stop/status); systemd --user service + other trigger modes in Phase 4 (run is already the service entrypoint)
- M4 health report (Xid/panic/OOM/MCE/AER/thermal scan + SMART + driver-mismatch + journald-persistence + live temps, suggested fixes only — D9; GPU-firmware verify deferred)
- record report post-crash summary (peak temps/power per subsystem, events, last N samples)
- Exit criteria: user can run it during gaming and, after a freeze/black-screen, see the last readings + a plausible cause.
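
The M3 logger's "JSONL, fsync per sample, size rotation" behaviour can be illustrated with a minimal sketch. This is not the project's actual code: the class name `SampleLogger`, the `max_bytes` default, and the single `.1` backup file are assumptions made for illustration only.

```python
import json
import os


class SampleLogger:
    """Append-only JSONL writer that fsyncs every sample (hypothetical sketch)."""

    def __init__(self, path, max_bytes=1_000_000):
        self.path = path
        self.max_bytes = max_bytes  # assumed rotation threshold
        self._fh = open(path, "ab")

    def write(self, sample):
        # Rotate before writing so a file never grows far past max_bytes.
        if self._fh.tell() >= self.max_bytes:
            self._rotate()
        self._fh.write((json.dumps(sample) + "\n").encode("utf-8"))
        self._fh.flush()             # push Python's buffer to the OS
        os.fsync(self._fh.fileno())  # ask the OS to commit it to disk

    def _rotate(self):
        self._fh.close()
        os.replace(self.path, self.path + ".1")  # keep one previous file
        self._fh = open(self.path, "ab")

    def close(self):
        self._fh.close()
```

Syncing after every line costs some throughput, but it means the last complete sample is likely to survive a hard freeze, which is the point of a crash-capture log.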
Phase 2 — Live monitor (terminal)
- M2 TUI dashboard (rigdoctor monitor, tui.py): curses, current/min/max grouped by subsystem with temp/usage color bands; q quit / r reset; plain-redraw fallback on non-TTY
- M8 basic alerting (overheat/throttle/GPU-lost notifications)
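
As a rough illustration of the dashboard's current/min/max tracking and color bands, the sketch below keeps per-sensor statistics and maps a temperature to a band. The names `MinMaxTracker` and `band`, and the 70/85 °C thresholds, are hypothetical and not taken from tui.py.

```python
class MinMaxTracker:
    """Tracks current/min/max per sensor name (illustrative only)."""

    def __init__(self):
        self.stats = {}

    def update(self, name, value):
        cur = self.stats.get(name)
        if cur is None:
            self.stats[name] = {"current": value, "min": value, "max": value}
        else:
            cur["current"] = value
            cur["min"] = min(cur["min"], value)
            cur["max"] = max(cur["max"], value)

    def reset(self):
        # What an "r reset" key press might trigger.
        self.stats.clear()


def band(temp_c, warn=70, hot=85):
    """Map a temperature to a color band; thresholds are assumed values."""
    if temp_c >= hot:
        return "hot"
    if temp_c >= warn:
        return "warn"
    return "ok"
```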
Phase 3 — Diagnostics breadth
- M5 system inventory + exportable report
- [~] M6 gaming environment checks (suggest-only) — Steam game/library detection done (multi-library libraryfolders.vdf discovery + appmanifest scan, opt-in libraries, launch-time background rescan with new-game badge; CLI rigdoctor games, GUI Games page). This is also the D12 "pick a game" foundation. Env-check engine done (rigdoctor gameenv + GUI Environment page): PCIe ASPM, NVIDIA persistence, CPU governor, GameMode, MangoHud, swappiness, shader cache, THP, mitigations, Proton versions — read-only with fix commands. Also: GPU PowerMizer (X), Wine + Steam-client versions, and non-Steam launchers (Lutris/Heroic, core/launchers.py). Pending: the zero-config watcher (D12 fallback, lands with M9's trigger-mode work).
- SMART integration (smartmontools if present)
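
The appmanifest scan mentioned for M6 could look roughly like the sketch below. Steam writes one appmanifest_<appid>.acf file per installed game into a library's steamapps directory, as quoted key/value pairs. The function names are hypothetical, and the regex is a simplification that ignores escaped quotes and nesting rather than a full VDF parser.

```python
import re
from pathlib import Path

# Matches one  "key"   "value"  pair on a line of an .acf file.
_PAIR = re.compile(r'"(\w+)"\s+"([^"]*)"')


def parse_appmanifest(text):
    """Extract appid, name and installdir from appmanifest text."""
    fields = {}
    for key, value in _PAIR.findall(text):
        fields.setdefault(key, value)  # first occurrence wins
    return {k: fields.get(k) for k in ("appid", "name", "installdir")}


def scan_library(steamapps_dir):
    """Return one record per appmanifest_*.acf found in a steamapps directory."""
    games = []
    for manifest in sorted(Path(steamapps_dir).glob("appmanifest_*.acf")):
        text = manifest.read_text(encoding="utf-8", errors="replace")
        games.append(parse_appmanifest(text))
    return games
```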
Phase 4 — Desktop UI & installer
- M10 desktop GUI (PySide6: dashboard w/ history graphs, logs, health, games, environment, inventory, setup, notifications, share)
- M11 tray / menu-bar applet (gui/tray.py: live CPU/GPU temp + memory readouts, status line, Run Diagnostic submenu per game, Open dashboard / Start-Stop recording / Snapshot / Quit — D13; close-to-tray, --tray autostart). Needs a tray host (AppIndicator on GNOME).
- [~] Guided diagnostic session (pick game → focused M3 capture → M4 scan → findings), shared by tray/GUI/CLI — core + CLI + GUI done (core/diagnostic.py, rigdoctor diagnose start/status/finish, and a Run Diagnostic button per game on the GUI Games page → recording banner → results dialog with the capture summary + findings). Tags a focused capture with the chosen game (own diagnostic log, window-scoped report) and combines the capture summary with the M4 findings. Auto start/stop via the D12 wrapper is wired in, and a hard-crash is detected (capture left without a clean stop) → flagged on next launch with a deeper crash-boot log analysis. Pending: the tray (M11) entry point and the zero-config watcher.
- [~] Logger trigger modes: always-on + game-launch (D12) — game-launch wrapper done: rigdoctor wrap %command% (per-game Steam launch option / Lutris/Heroic wrapper field) auto-brackets a focused capture around the game; GUI "Auto-capture…" helper shows the launch-option string. Pending: global Steam compat-tool registration, the zero-config watcher (Steam RunningAppID + /proc), GameMode hook, and the always-on systemd --user service.
- [~] M9 interactive installer — done: distro/GPU detection + optional-dependency install (rigdoctor install, GUI Settings); user-local install.sh + self-extracting .run (no-root venv install, handles python3-venv prereq, CI-built); systemd --user trigger modes (core/service.py, rigdoctor service mode manual|always-on|game-launch + GUI Settings "Recording trigger") incl. the zero-config game-launch watcher (core/watcher.py, rigdoctor watch); and a graphical first-run setup wizard (gui/setup_wizard.py): environment → dependency-bundle selection → install → recording trigger → readiness, auto-launched by install.sh and re-runnable from Settings; and a .deb (packaging/make_deb.py, Architecture: all, Depends: python3, Recommends: python3-pyside6/pyte) built + published in CI (release asset + optional Gitea apt registry). M9 complete.
- .deb packaging (D8) — built via dpkg-deb (no debhelper); GUI deps as Recommends so apt install rigdoctor includes the Desktop UI, --no-install-recommends = CLI only.
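
The game-launch wrapper idea (rigdoctor wrap %command% bracketing a capture around the game) can be sketched as follows. Steam substitutes the game's command line for %command%, so the wrapper receives it as its arguments. The function `run_wrapped` and its `start_capture` / `stop_capture` callbacks are hypothetical stand-ins, not the project's API.

```python
import subprocess


def run_wrapped(game_argv, start_capture, stop_capture):
    """Start a capture, run the game, and always stop the capture afterwards.

    game_argv is the command line the launcher passes in place of %command%.
    Returns the game's exit code so the launcher sees it unchanged.
    """
    start_capture()
    try:
        return subprocess.call(game_argv)
    finally:
        # Runs on normal exit and on exceptions; a hard system freeze
        # never reaches this line, which leaves the capture without a
        # clean stop.
        stop_capture()
```

A CLI entry point would typically end with `sys.exit(run_wrapped(sys.argv[1:], ...))` so the wrapper stays transparent to the launcher.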
Phase 5 — Breadth (later)
- AMD GPU support in M1 (Steam Deck / Radeon)
- Intel GPU best-effort
- M13 auto-update (D18) — launch-time version check (GUI sidebar) + no-root self-update apply (rigdoctor update / sidebar button → authenticated pip upgrade), token-gated. Restart-after-update is manual for now.
- [~] Optional auto-apply of suggested fixes behind explicit consent (D9 milestone) — first cut shipped for M6 (D22): one-click apply of runtime-reversible tunables (CPU governor, NVIDIA persistence, PCIe ASPM, swappiness, THP) via a single pkexec prompt, no reboot. GRUB-based fixes + CPU mitigations remain suggestion-only.
Phase 6 — Session sharing / remote assist (M12, D16 → scoped to terminal-only by D23)
- Shared terminal — a real PTY (host's $SHELL) shared over the relay, color-rendered (pyte), full-screen-able; the guest watches and may type only on host consent (D9 exception); host reads along + can type (sudo). The single share mode.
- [removed] The read-only stats view (share serve) and bundle export — dropped per D23; the shared terminal is the only sharing mode.
Phase 7 — AI assistant (M14, D24)
- Explain diagnostics with AI — opt-in, never automatic (core/ai.py, "Explain with AI" button + rigdoctor ai explain). Provider chosen explicitly: Ollama (local) or Claude (Anthropic). Grounded with a curated reference KB (core/ai_knowledge.py, RAG-lite, exact match — no embeddings); stdlib urllib. Settings → AI assistant.
- Possible follow-ups: interactive chat grounded in the data; more reference-KB entries; an "Explain" button on the System Health page.
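
"RAG-lite, exact match" grounding can be pictured as a plain dictionary lookup keyed on finding codes, with the matching notes prepended to the prompt. The sketch below is an assumption about the general shape only: the `KB` entries, the finding-code format, and the function names are illustrative and not taken from core/ai_knowledge.py.

```python
# Illustrative reference entries keyed by an exact finding code (assumed format).
KB = {
    "xid-79": "NVIDIA Xid 79: the GPU has fallen off the bus.",
    "oom-kill": "The kernel OOM killer terminated a process to free memory.",
}


def retrieve(finding_codes):
    """Return KB notes whose key exactly matches a finding code."""
    return [KB[code] for code in finding_codes if code in KB]


def build_prompt(summary, finding_codes):
    """Combine the diagnostic summary with any matching reference notes."""
    notes = retrieve(finding_codes)
    parts = ["Diagnostic summary:", summary]
    if notes:
        parts.append("Reference notes:")
        parts.extend(f"- {note}" for note in notes)
    return "\n".join(parts)
```

Exact-key lookup needs no embedding model or index, so it fits a stdlib-only tool, at the cost of missing findings that have no curated entry.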
Phase 8 — Logging & report bundles (M15, D25)
- Opt-in logging (one logging_enabled toggle): rotating app.log (core/applog.py) + per-diagnostic storage in its own directory (core/diagstore.py) — capture, result, report, scoped game logs, and AI-interaction records.
- Report bundle — zip a diagnostic (incl. exactly what was sent to the AI, the model, and its reply) into the reports folder. GUI button + rigdoctor bundle.
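
Since each diagnostic lives in its own directory, a report bundle can be as simple as zipping that directory into the reports folder. The sketch below assumes this layout. `make_bundle` and its arguments are hypothetical names, and the reports folder is assumed to sit outside the diagnostic directory.

```python
import zipfile
from pathlib import Path


def make_bundle(diag_dir, reports_dir):
    """Zip every file under diag_dir into reports_dir/<diag name>.zip."""
    diag_dir = Path(diag_dir)
    reports_dir = Path(reports_dir)
    reports_dir.mkdir(parents=True, exist_ok=True)
    out = reports_dir / f"{diag_dir.name}.zip"
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(diag_dir.rglob("*")):
            if f.is_file():
                # Store paths relative to the diagnostic directory.
                zf.write(str(f), f.relative_to(diag_dir).as_posix())
    return out
```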
Out of scope: stress/repro module (D7); multi-distro support and packaging beyond Ubuntu/apt + .deb (D15) — a thin seam is kept but not built out.
Dropped: stress / repro module (D7) — not on the roadmap.