RigDoctor — Architecture (DRAFT v0.2)
Tech stack and key structural decisions are now settled (see
DECISIONS.md D2, D6, D8, D10, D11). Items still marked [OPEN] are tracked there.
1. Principles
- Modular core + plugins. A small engine; every capability is a module that can be installed/omitted independently.
- Capability detection over assumption. Probe what hardware/tools exist; degrade gracefully.
- Vendor & distro abstraction. GPU and package-manager differences live behind interfaces, not scattered through the code (NVIDIA + apt are the first concrete impls).
- One engine, many front-ends. CLI, TUI, GUI, and tray are all thin front-ends over the same core engine. Anything the GUI/tray can do is reachable headless from the CLI.
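The "capability detection over assumption" principle can be sketched as a PATH probe. This is a minimal illustration, not the project's actual probe code: the `CAPABILITY_TOOLS` map and both function names are hypothetical, and a real probe would likely also check sysfs paths rather than executables alone.

```python
import shutil

# Hypothetical capability -> candidate executables map (illustrative only).
CAPABILITY_TOOLS = {
    "gpu": ["nvidia-smi"],
    "cpu_temp": ["sensors"],
    "smart": ["smartctl"],
}


def detect_capabilities(tools=CAPABILITY_TOOLS, which=shutil.which):
    """Return {capability: path-or-None} using the first tool found on PATH."""
    found = {}
    for capability, candidates in tools.items():
        found[capability] = next(
            (path for path in map(which, candidates) if path), None
        )
    return found


def available(found):
    """Capabilities that can be used; missing ones are skipped, not fatal."""
    return sorted(cap for cap, path in found.items() if path)
```

A missing tool yields `None` for that capability instead of raising, which is one way to get the graceful degradation described above.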
2. Tech stack — DECIDED (D2)
- Language: Python 3 (target machine has Python 3.14).
- Core / CLI / daemon: stdlib only, no pip deps. Easy log/JSON/subprocess handling, tiny footprint, runs headless/over SSH.
- TUI (M2): stdlib curses / plain ANSI redraw (no deps).
- GUI (M10) + tray (M11): Qt via PySide6, one toolkit for both the desktop window and the QSystemTrayIcon menu-bar applet. PySide6 is a dependency of only these two modules, declared in the .deb; the core/daemon never import Qt.
- Installer bootstrap (M9): the .deb's maintainer scripts ensure Python is present, then hand off to the Python installer for module selection.
3. Component layout
                +--------------------------+
                |       core engine        |  (stdlib only)
                | sources → sampler → bus  |
                +--------------------------+
                        ^   ^   ^   ^
     +------------------+   |   |   +---------------+
     |             +--------+   +--+                |
+---------+   +----------+   +-----------+   +--------------+
|   CLI   |   |  daemon  |   |    GUI    |   | tray applet  |
| (stdlib)|   |  (M3,    |   | (M10, Qt) |   |  (M11, Qt)   |
| TUI(M2) |   | systemd) |   |           |   |              |
+---------+   +----------+   +-----------+   +--------------+
- The core engine is a stdlib-only library: sources → sampler loop → an internal bus that fans samples out to sinks (TUI renderer, CSV/JSON logger, alert engine, report builder).
- The daemon (M3) is a long-running, stdlib-only process managed by systemd --user.
- The GUI and tray import PySide6 and talk to the same engine; for live status they can read the daemon's output / a small status file or socket rather than re-sampling.
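One way the daemon could publish a small status file that the GUI and tray read without re-sampling is write-then-rename. The sketch below uses only the stdlib; the function names and the JSON payload shape are assumptions, not the project's actual format.

```python
import json
import os
import tempfile


def write_status(path, status):
    """Atomically replace the status file so readers never see a partial write."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(status, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk first
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_status(path):
    """Return the last published status, or None if none has been written."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return None
```

Because the temp file is created in the same directory as the target, the rename stays on one filesystem, which is what makes `os.replace` atomic here.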
4. Core engine
+-------------------+      +------------------+      +-------------------+
|  Sources (probe)  | ---> |   Sampler loop   | ---> |       Sinks       |
|  nvidia-smi/NVML  |      |  (interval, Hz)  |      | - TUI renderer    |
|  amdgpu sysfs     |      |  normalizes into |      | - CSV/JSON logger |
|  hwmon/lm-sensors |      |  Sample records  |      | - Alert engine    |
|  journalctl/SMART |      |                  |      | - Report builder  |
+-------------------+      +------------------+      | - GUI/tray feed   |
                                                     +-------------------+
- Sample record: { ts, source, metric, value, unit } flattened per tick into a row.
- Sources are pluggable; each declares which metrics it can provide and self-checks availability at startup. NVIDIA (nvidia-smi/NVML) + hwmon are the first implementations.
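The source → sampler → sink flow can be sketched in a few lines. This is an illustration under assumptions: `StaticSource`, `flatten`, and `tick` are hypothetical names, the `source.metric` column naming is a guess at how a flattened row might be keyed, and `probe()` is re-checked on every tick for brevity, whereas the design describes a startup self-check.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    ts: float
    source: str
    metric: str
    value: float
    unit: str


class StaticSource:
    """Hypothetical source with fixed readings (stands in for nvidia-smi/hwmon)."""

    name = "static"

    def __init__(self, readings):
        self.readings = readings  # {metric: (value, unit)}

    def probe(self):
        # A source with nothing to offer reports itself as unavailable.
        return bool(self.readings)

    def read(self, ts):
        return [Sample(ts, self.name, m, v, u) for m, (v, u) in self.readings.items()]


def flatten(samples):
    """Flatten one tick's samples into a single row keyed by 'source.metric'."""
    row = {"ts": samples[0].ts} if samples else {}
    for s in samples:
        row[f"{s.source}.{s.metric}"] = s.value
    return row


def tick(sources, sinks, now=time.time):
    """One sampler iteration: read available sources, fan the row out to sinks."""
    ts = now()
    samples = [s for src in sources if src.probe() for s in src.read(ts)]
    row = flatten(samples)
    for sink in sinks:
        sink(row)
    return row
```

Here a sink is any callable that accepts a row, so a logger, an alert check, and a GUI feed could all subscribe to the same tick.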
5. Module contract
Each module declares a manifest so the installer and engine can reason about it:
module:
  id: crash-logger
  name: "Crash-capture logger"
  provides: [logging]
  requires_sources: [gpu, cpu_temp]   # capabilities, not packages
  system_packages:                    # per package manager, optional
    apt: []                           # uses nvidia-smi + sysfs only
    pacman: []
    dnf: []
  python_deps: []                     # e.g. GUI/tray modules → [pyside6]
  optional_packages:
    apt: [smartmontools]              # enriches if present
  gpu_vendors: [nvidia, amd, intel]
  default_in_bundles: [essential]
Lifecycle hooks a module may implement: probe(), collect(sample), render(view),
report(), install_hint(). GUI/tray modules additionally declare python_deps: [pyside6].
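As a rough sketch of how the manifest and lifecycle hooks might map onto Python, the base class below gives every hook a no-op default so a module implements only what it needs. The class layout, the `Manifest` dataclass, and `missing_sources` are assumptions for illustration; only the hook names and manifest keys come from the contract above.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    id: str
    provides: list
    requires_sources: list = field(default_factory=list)
    python_deps: list = field(default_factory=list)


class Module:
    """Hypothetical base class: every hook is optional and defaults to a no-op."""

    manifest: Manifest

    def probe(self):
        return True

    def collect(self, sample):
        pass

    def render(self, view):
        pass

    def report(self):
        return None

    def install_hint(self):
        return ""


class CrashLogger(Module):
    manifest = Manifest(
        id="crash-logger",
        provides=["logging"],
        requires_sources=["gpu", "cpu_temp"],
    )

    def __init__(self):
        self.rows = []

    def collect(self, sample):
        self.rows.append(sample)

    def report(self):
        return {"samples": len(self.rows)}


def missing_sources(module, capabilities):
    """Required capabilities that were not detected on this machine."""
    return [c for c in module.manifest.requires_sources if c not in capabilities]
```

Checking `requires_sources` against detected capabilities, rather than against installed packages, matches the "capabilities, not packages" note in the manifest.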
6. Crash-logger daemon & trigger model — DECIDED (D6)
The logger (M3) runs as a systemd --user service. Three user-selectable trigger modes:
- Always-on: service enabled at login, samples continuously (bounded by rotation).
- Game-launch-triggered: starts when a game/Steam session begins, stops after. Detection is layered (D12), no root: a precise wrapper (rigdoctor wrap %command% - global Steam compat-tool) as primary; a zero-config watcher (Steam RunningAppID /proc heuristic) as fallback; GameMode D-Bus signals if gamemoded is present.
- Manual: started/stopped via the CLI (rigdoctor record start/stop) or the tray applet's quick action.
The selected mode is written to config by the installer and changeable later via CLI/GUI.
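A minimal sketch of how the configured trigger mode and the layered game detection might be represented follows. The config string values, the fallback to manual mode on an unknown value, and the layer names are all assumptions; in particular, how D12 orders GameMode signals relative to the watcher is a guess.

```python
from enum import Enum


class TriggerMode(Enum):
    # String values are hypothetical config spellings.
    ALWAYS_ON = "always-on"
    GAME_LAUNCH = "game-launch"
    MANUAL = "manual"


def parse_trigger_mode(value, default=TriggerMode.MANUAL):
    """Map a config string to a mode; unknown values fall back to the default."""
    try:
        return TriggerMode(value)
    except ValueError:
        return default


def detection_layers(wrapper_configured, gamemoded_present):
    """Ordered detection layers for game-launch mode (assumed ordering)."""
    layers = []
    if wrapper_configured:
        layers.append("wrapper")  # precise, primary
    layers.append("watcher")  # zero-config fallback, always available
    if gamemoded_present:
        layers.append("gamemode")  # extra signal when gamemoded is running
    return layers
```

Falling back to manual mode on a bad config value means a typo never starts an always-on recorder by accident; that default is a design choice of this sketch only.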
7. GUI & tray — DECIDED (D10/D11)
- GUI (M10): a PySide6 desktop app with a live dashboard (graphs/gauges), crash-log browser, health-report viewer, inventory view, and logger controls. Works under X11 and Wayland.
- Tray (M11): QSystemTrayIcon applet in the top menu bar (StatusNotifierItem; on Ubuntu/GNOME surfaced via the AppIndicator extension). Dropdown shows live M1 readouts (CPU temp, GPU temp, memory used/total, status dot) and actions led by Run Diagnostic (the guided diagnostic session, §7.1), plus Open dashboard / Start-Stop recording / Snapshot / Quit (D13).
- Both are optional: a headless/server install omits them and loses no diagnostic capability (everything is in the CLI).
7.1 Guided diagnostic session (orchestration)
The "Run Diagnostic" flow (exposed in tray, GUI, and CLI) is not a new module — it orchestrates existing ones: pick a game (D12 detection: Steam library / recently played / running process) → focused capture (M3 scoped to that game's session via the D12 wrapper/watcher) → scan & analyze (M4 over the captured window + system logs) → present prioritized findings with suggested fixes (read-only, D9). The engine exposes it as a single callable so all three front-ends share one implementation.
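The "single callable" idea can be sketched as a function that takes the three stages as parameters, so tray, GUI, and CLI can all call it with the same engine-provided steps. All names here (`run_diagnostic`, `Finding`, and the stage callables) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    priority: int  # lower number = more urgent
    title: str
    suggestion: str  # read-only advice; nothing is applied automatically


def run_diagnostic(pick_game, capture, analyze):
    """Orchestrate pick -> capture -> analyze and return prioritized findings."""
    game = pick_game()          # D12 detection in the real flow
    window = capture(game)      # M3 capture scoped to that game's session
    findings = analyze(window)  # M4 scan over the captured window
    return sorted(findings, key=lambda f: f.priority)
```

Injecting the stages also makes the orchestration testable without a GPU or a running game.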
8. Installer design (M9)
- Detect GPU vendor via lspci (NVIDIA first) and the package manager (apt first).
- Present a module menu grouped into bundles:
  - Essential (sensor core + crash logger + health report): the MVP, NVIDIA-only.
  - Monitoring (live TUI + alerts)
  - Diagnostics (inventory + gaming-env checks + SMART)
  - Desktop UI (GUI + tray applet; adds the PySide6 dependency)
  - Custom (pick individual modules)

  For each selection, show the exact packages that will be installed.
- Resolve dependencies: union of selected modules' system_packages + python_deps for the detected package manager; report-only if a package is missing and sudo unavailable.
- Install (with explicit confirmation), write config (~/.config/rigdoctor/), optionally enable the systemd --user logger service and choose its trigger mode (D6).
- Verify each installed module's probe() and print a readiness summary.
Module list/bundling is final (D14). Packaging: a user-local install is primary
(self-updating from the public repo, no root — D8/D18), with an optional .deb system
package; the wizard layers module selection on top of either.
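The dependency-resolution step (union of the selected modules' system_packages and python_deps, report-only without sudo) could look roughly like this. The function names and the plain-dict manifest shape are assumptions; the dicts mirror the manifest keys from §5 without the outer `module:` wrapper.

```python
def resolve_dependencies(manifests, selected_ids, package_manager="apt"):
    """Union the selected modules' system and Python deps for one package manager."""
    system, python = set(), set()
    for manifest in manifests:
        if manifest["id"] not in selected_ids:
            continue
        system.update(manifest.get("system_packages", {}).get(package_manager, []))
        python.update(manifest.get("python_deps", []))
    return sorted(system), sorted(python)


def plan(packages, installed, can_sudo):
    """Decide what to do about missing packages; never install without sudo."""
    missing = [p for p in packages if p not in installed]
    if not missing:
        action = "nothing-to-do"
    elif can_sudo:
        action = "install"  # still subject to explicit user confirmation
    else:
        action = "report-only"
    return {"missing": missing, "action": action}
```

Returning sorted lists keeps the "exact packages that will be installed" display stable between runs.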
9. GPU vendor abstraction
| Capability | NVIDIA (first) | AMD (later) | Intel (later) |
|---|---|---|---|
| Temps/clocks/power | nvidia-smi/NVML | /sys/class/drm/.../device + rocm-smi | /sys + intel_gpu_top |
| VRAM temp | mem-junction (often N/A on GeForce) | sysfs mem hwmon | n/a |
| Crash signature | Xid in dmesg | amdgpu: GPU reset / ring timeouts | i915 GPU hang |
| Power limit (read-only, D9) | nvidia-smi -pl (suggested, not applied) | sysfs power_dpm / pp_* | n/a |
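The "crash signature" row can be illustrated with one pattern per vendor behind a single lookup, which keeps vendor differences out of the calling code. The regular expressions below are loose approximations of the kernel messages named in the table, not a verified catalogue, and `CRASH_PATTERNS` and `find_crash_lines` are hypothetical names.

```python
import re

# Approximate per-vendor kernel-log signatures (illustrative, not exhaustive).
CRASH_PATTERNS = {
    "nvidia": re.compile(r"NVRM: Xid"),
    "amd": re.compile(r"amdgpu.*(GPU reset|ring \S+ timeout)"),
    "intel": re.compile(r"i915.*GPU HANG", re.IGNORECASE),
}


def find_crash_lines(vendor, log_lines):
    """Return the log lines that match the given vendor's crash signature."""
    pattern = CRASH_PATTERNS[vendor]
    return [line for line in log_lines if pattern.search(line)]
```

Adding a vendor then means adding one dictionary entry instead of touching the scanner.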
10. Data & config layout
~/.config/rigdoctor/config.toml # enabled modules, thresholds, interval, trigger mode
~/.local/share/rigdoctor/logs/ # rotated crash logs (CSV/JSON)
~/.local/state/rigdoctor/ # session/min-max state, daemon status feed
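The three locations above match the XDG base-directory defaults, so one plausible way to resolve them is to honor the XDG environment variables and fall back to the listed paths. Whether RigDoctor actually honors those variables is an assumption of this sketch, and the function names are hypothetical.

```python
import os
from pathlib import Path


def xdg_dir(env_var, fallback, env=os.environ, home=None):
    """Resolve one XDG base dir; relative or empty values are ignored."""
    home = Path(home) if home else Path.home()
    base = env.get(env_var)
    root = Path(base) if base and os.path.isabs(base) else home / fallback
    return root / "rigdoctor"


def rigdoctor_paths(env=os.environ, home=None):
    """Return the config file, log directory, and state directory."""
    return {
        "config": xdg_dir("XDG_CONFIG_HOME", ".config", env, home) / "config.toml",
        "logs": xdg_dir("XDG_DATA_HOME", ".local/share", env, home) / "logs",
        "state": xdg_dir("XDG_STATE_HOME", ".local/state", env, home),
    }
```

With no XDG variables set, this returns exactly the three paths listed above.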
11. Dependency package names — apt-only (D15)
We maintain package names for Ubuntu/apt only; no cross-distro mapping is built or maintained. The set is small (filled in per module as they land):
| Logical dep | apt package |
|---|---|
| SMART | smartmontools |
| lm-sensors | lm-sensors |
| DMI/inventory | dmidecode |
| GUI/tray (Qt) | python3-pyside6 |
| Tray on GNOME | gir1.2-appindicator3-0.1 (AppIndicator) |
| Desktop notifications | libnotify-bin |
Module manifests still declare deps under a system_packages.apt / python_deps key, so a
thin seam remains if another package manager is ever added — but multi-distro support is not
a planned deliverable (D15).