ce5f830393
release / release (push) Successful in 2m13s
Crash-capture logger (M3): - crash-safe JSONL (fsync per sample), size-based rotation, GPU-lost/recovered markers, atomic status file - CLI: record run/start/stop/status/report (run = systemd-ready entrypoint) - shared core.reccontrol so CLI + GUI drive the same recorder - crashlog tests (writer, rotation, reader, summary, recorder) GUI: - Recording/Logs page: start/stop/interval controls, live status, post-crash report - shared render helpers (format_raw/headline, render_summary) Docs/decisions: - GUI-first (D17); CLI keeps full parity - D8 revised: user-local self-updating install primary, .deb optional - planned: M12 session sharing (D16), M13 no-root auto-update from public repo (D18) - versioning + CHANGELOG convention (D19) Infra: - .gitea/workflows/release.yml: build wheel+sdist and publish a Gitea release v<version> on push to main - align version to the 0.0.x release line; bump to 0.0.2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5.5 KiB
5.5 KiB
RigDoctor — Module Catalog (DRAFT v0.2)
Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
Module set per D14, plus M12 (session sharing, D16) and M13 (auto-update, D18). M7 (stress/repro) was dropped (D7). M10/M11 are the GUI and tray modules (D10/D11). GPU scope reads "all (NVIDIA first)" — NVIDIA first, others via the vendor abstraction (D4).
| ID | Module | Bundle | Key deps | GPU scope | Priority | Status |
|---|---|---|---|---|---|---|
| M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | ⬜ |
| M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | 🟨 |
| M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ⬜ |
| M2 | Live monitor (TUI) | Monitoring | none (stdlib curses) | all | P1 | ⬜ |
| M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | ⬜ |
| M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | ⬜ |
| M6 | Gaming env checks | Diagnostics | none | all | P2 | ⬜ |
| M10 | Desktop GUI | Desktop UI | python3-pyside6 | all | P2 | 🟨 |
| M11 | Tray / menu-bar applet | Desktop UI | python3-pyside6 (+ AppIndicator on GNOME) | all | P2 | ⬜ |
| M9 | Installer | (meta) | none | all | P1 | ⬜ |
| M12 | Session sharing / remote assist | Sharing | none (Tier 3: tmate/sshx) | all | P3 | ⬜ |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ⬜ |
| — | — | — | — | ❌ dropped (D7) |
Notes per module
- M1 Sensor core — the foundation everything else samples from. Stdlib-only. Abstracts NVIDIA/AMD/Intel + hwmon behind one interface; ship the NVIDIA + hwmon path first.
- M3 Crash-capture logger — the highest-value piece for the seed use case.
fsyncper sample; GPU-lost detection via query timeout; bounded rotation;systemd --userservice with a user-selectable trigger mode (always-on / game-launch / manual — D6). Implemented (manual trigger): JSONL log with fsync-per-sample, size-based rotation (log_max_bytes/log_backups), GPU-lost/recovered event markers, atomic status file, andrigdoctor record run|start|stop|status|report. The foregroundrunis the systemd-ready entrypoint; the service unit + always-on/game-launch triggers (D6/D12) land in Phase 4. Also fully driven from the GUI's Recording/Logs page (M10) via sharedcore.reccontrol. - M4 Health report — turns scattered logs into a prioritized, plain-language findings list with suggested fixes (read-only, D9). Reuses M1 for a live snapshot. Also powers the guided diagnostic session (with M3): pick a game → focused capture → scan → findings (see SPEC §4).
- M2 Live monitor — depends on M1; the terminal "HWMonitor for Linux" face. Stdlib-only.
- M5 / M6 Diagnostics — inventory export + gaming-env checks; M6 flags risky settings and suggests the fix command but does not apply it (D9).
- M8 Alerting — threshold/event notifications; integrates with the tray applet (M11).
- M10 Desktop GUI — PySide6 graphical front-end over the core engine (dashboard, log browser, report viewer, logger controls). Optional; adds the Qt dependency. Bootstrapped early (ahead of its Phase 4 slot) at the user's request: dark-themed window with sidebar nav, a live dashboard (circular gauges + collapsible per-subsystem cards, temperature- colored values), and a Recording/Logs page with full M3 controls (start/stop/status + post-crash report). Health/Inventory remain placeholders until M4/M5. GUI-first per D17.
- M11 Tray applet —
QSystemTrayIconmenu-bar applet. Dropdown shows live M1 readouts (CPU temp, GPU temp, memory used/total, status dot) and is led by a Run Diagnostic action (the guided diagnostic session), plus Open dashboard / Start-Stop recording / Snapshot / Quit (D13). Optional; shares the Qt dependency with M10. - M9 Installer — interactive wizard layered on the
.deb(D8); apt-first dependency resolution; enables the logger service and trigger mode. - M12 Session sharing / remote assist (D16) — let a helper inspect a user's machine, in an escalating ladder: (1) diagnostic bundle export (inventory + recent log + report, one-way), (2) live read-only view over a user-chosen tunnel (Tailscale/cloudflared/SSH, no hosted relay), (3) gated interactive terminal wrapping tmate/sshx (read-only by default; read-write only on explicit consent — a deliberate exception to D9). Per-session consent, ephemeral revocable tokens, audit log.
- M13 Auto-update (D18) — planned. On launch, check the public Gitea releases API and
self-update a user-local install with no root (download → verify checksum/signature →
atomic symlink swap → restart, incl. the daemon). HTTPS-only, version-check-only (no
telemetry), opt-out-able. Surfaced in the GUI;
rigdoctor updatein the CLI. (.debusers update via apt instead.)
Bundles (final — D14)
- Essential: M1 + M3 + M4 (the MVP, NVIDIA-only — D5)
- Monitoring: M2 + M8
- Diagnostics: M5 + M6
- Desktop UI: M10 + M11 (adds PySide6)
- Sharing: M12 (session sharing / remote assist — D16)
MVP candidate — confirmed (D5)
M1 + M3 + M4 (Essential), NVIDIA-only, CLI-first. Gives a working tool that captures the GPU crash and explains the logs — deliverable before the installer, GUI/tray, or multi-vendor work.