rigdoctor/docs/ARCHITECTURE.md
jessey ce5f830393
Release 0.0.2: M3 logger (CLI + GUI), GUI-first, CI release workflow
Crash-capture logger (M3):
- crash-safe JSONL (fsync per sample), size-based rotation, GPU-lost/recovered
  markers, atomic status file
- CLI: record run/start/stop/status/report (run = systemd-ready entrypoint)
- shared core.reccontrol so CLI + GUI drive the same recorder
- crashlog tests (writer, rotation, reader, summary, recorder)

GUI:
- Recording/Logs page: start/stop/interval controls, live status, post-crash report
- shared render helpers (format_raw/headline, render_summary)

Docs/decisions:
- GUI-first (D17); CLI keeps full parity
- D8 revised: user-local self-updating install primary, .deb optional
- planned: M12 session sharing (D16), M13 no-root auto-update from public repo (D18)
- versioning + CHANGELOG convention (D19)

Infra:
- .gitea/workflows/release.yml: build wheel+sdist and publish a Gitea release
  v<version> on push to main
- align version to the 0.0.x release line; bump to 0.0.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:16:41 +02:00

RigDoctor — Architecture (DRAFT v0.2)

Tech stack and key structural decisions are now settled (see DECISIONS.md D2, D6, D8, D10, D11). Items still marked [OPEN] are tracked there.

1. Principles

  • Modular core + plugins. A small engine; every capability is a module that can be installed/omitted independently.
  • Capability detection over assumption. Probe what hardware/tools exist; degrade gracefully.
  • Vendor & distro abstraction. GPU and package-manager differences live behind interfaces, not scattered through the code (NVIDIA + apt are the first concrete impls).
  • One engine, many front-ends. CLI, TUI, GUI, and tray are all thin front-ends over the same core engine. Anything the GUI/tray can do is reachable headless from the CLI.
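The capability-detection principle can be sketched as a probe that only reports what it finds and never raises. This is a minimal illustration, not the project's code: the function names, the capability keys, and the exact set of tools checked are assumptions made for the example.

```python
import shutil
from pathlib import Path


def probe_capabilities(which=shutil.which, exists=lambda p: Path(p).exists()):
    """Return {capability: bool}; missing tools are reported, not treated as errors.

    `which` and `exists` are injectable so the probe can be exercised
    without the real hardware or tools (hypothetical design choice).
    """
    return {
        "nvidia_smi": which("nvidia-smi") is not None,
        "sensors": which("sensors") is not None,
        "smartctl": which("smartctl") is not None,
        "hwmon": exists("/sys/class/hwmon"),
    }


def available_sources(caps):
    """List the capabilities that are present, so callers can degrade gracefully."""
    return sorted(name for name, present in caps.items() if present)
```

A front-end would call `available_sources(probe_capabilities())` once at startup and hide or grey out anything that is absent.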

2. Tech stack — DECIDED (D2)

  • Language: Python 3 (target machine has Python 3.14).
  • Core / CLI / daemon: stdlib only — no pip deps. Easy log/JSON/subprocess handling, tiny footprint, runs headless/over SSH.
  • TUI (M2): stdlib curses / plain ANSI redraw (no deps).
  • GUI (M10) + tray (M11): Qt via PySide6 — one toolkit for both the desktop window and the QSystemTrayIcon menu-bar applet. PySide6 is a dependency of only these two modules, declared in the .deb; the core/daemon never import Qt.
  • Installer bootstrap (M9): the .deb's maintainer scripts ensure Python is present, then hand off to the Python installer for module selection.

3. Component layout

                         +--------------------------+
                         |       core engine        |   (stdlib only)
                         |  sources → sampler → bus |
                         +--------------------------+
                            ^      ^      ^      ^
        +-------------------+      |      |      +--------------------+
        |              +-----------+      +-----------+              |
   +---------+    +----------+        +-----------+        +--------------+
   |   CLI   |    |  daemon  |        |    GUI    |        | tray applet  |
   | (stdlib)|    |  (M3,    |        | (M10,Qt)  |        | (M11, Qt)    |
   | TUI(M2) |    | systemd) |        |           |        |              |
   +---------+    +----------+        +-----------+        +--------------+
  • The core engine is a stdlib-only library: sources → sampler loop → an internal bus that fans samples out to sinks (TUI renderer, CSV/JSON logger, alert engine, report builder).
  • The daemon (M3) is a long-running, stdlib-only process managed by systemd --user.
  • The GUI and tray import PySide6 and talk to the same engine; for live status they can read the daemon's output / a small status file or socket rather than re-sampling.
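The fan-out from the internal bus to its sinks might look like the sketch below. The `Bus` class, its method names, and the choice to isolate sink failures are assumptions for illustration; the document only states that the bus delivers samples to several sinks.

```python
class Bus:
    """Deliver each row to every subscribed sink (hypothetical sketch)."""

    def __init__(self):
        self._sinks = []
        self.errors = []  # (sink, exception) pairs, kept for later inspection

    def subscribe(self, sink):
        """Register a callable that accepts one row."""
        self._sinks.append(sink)

    def publish(self, row):
        """Send a row to all sinks; one failing sink must not block the others."""
        for sink in self._sinks:
            try:
                sink(row)
            except Exception as exc:  # e.g. a renderer bug must not stop the logger
                self.errors.append((sink, exc))
```

Isolating failures per sink is one possible design choice: for a crash-capture tool it keeps the logger writing even if a display sink misbehaves.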

4. Core engine

+-------------------+      +------------------+      +--------------------+
|  Sources (probe)  | ---> |  Sampler loop    | ---> |  Sinks             |
|  nvidia-smi/NVML  |      |  (interval, Hz)  |      |  - TUI renderer    |
|  amdgpu sysfs     |      |  normalizes into |      |  - CSV/JSON logger |
|  hwmon/lm-sensors |      |  Sample records  |      |  - Alert engine    |
|  journalctl/SMART |      |                  |      |  - Report builder  |
+-------------------+      +------------------+      |  - GUI/tray feed   |
                                                     +--------------------+
  • Sample record: { ts, source, metric, value, unit } flattened per tick into a row.
  • Sources are pluggable; each declares which metrics it can provide and self-checks availability at startup. NVIDIA (nvidia-smi/NVML) + hwmon are the first implementations.
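The Sample record and the per-tick flattening can be sketched as follows. The field names come from the record shape above; the `source.metric` column naming, the use of the earliest timestamp for the row, and dropping `unit` from the row are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    ts: float      # timestamp of the reading
    source: str    # e.g. "gpu0"
    metric: str    # e.g. "temp"
    value: float
    unit: str      # e.g. "C"


def flatten_tick(samples):
    """Collapse all samples from one tick into a single row (dict).

    Columns are named "<source>.<metric>" (hypothetical convention).
    Units are not repeated per row in this sketch.
    """
    samples = list(samples)
    if not samples:
        return {}
    row = {"ts": min(s.ts for s in samples)}
    for s in samples:
        row[f"{s.source}.{s.metric}"] = s.value
    return row
```

For example, a tick containing a GPU and a CPU temperature would produce one row with the keys `ts`, `gpu0.temp`, and `cpu.temp`.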

5. Module contract

Each module declares a manifest so the installer and engine can reason about it:

module:
  id: crash-logger
  name: "Crash-capture logger"
  provides: [logging]
  requires_sources: [gpu, cpu_temp]      # capabilities, not packages
  system_packages:                       # per package manager, optional
    apt:    []                           # uses nvidia-smi + sysfs only
    pacman: []
    dnf:    []
  python_deps: []                        # e.g. GUI/tray modules → [pyside6]
  optional_packages:
    apt:    [smartmontools]              # enriches if present
  gpu_vendors: [nvidia, amd, intel]
  default_in_bundles: [essential]

Lifecycle hooks a module may implement: probe(), collect(sample), render(view), report(), install_hint(). GUI/tray modules additionally declare python_deps: [pyside6].
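Because the hooks are optional, one way to model the contract is a base class with no-op defaults that modules override selectively. The hook names are taken from the list above; the return types and the `CrashLogger` example class are assumptions for illustration.

```python
class Module:
    """Base class with no-op lifecycle hooks (hypothetical sketch)."""

    id = "base"

    def probe(self) -> bool:
        """Report whether this module can run on the current machine."""
        return True

    def collect(self, sample) -> None:
        """Receive one sample from the engine."""

    def render(self, view) -> None:
        """Draw into a front-end view, if the module has anything to show."""

    def report(self) -> dict:
        """Contribute a section to the health report."""
        return {}

    def install_hint(self) -> str:
        """Explain how to install whatever probe() found missing."""
        return ""


class CrashLogger(Module):
    id = "crash-logger"  # matches the manifest id above

    def __init__(self):
        self.rows = []

    def collect(self, sample):
        self.rows.append(sample)

    def report(self):
        return {"samples": len(self.rows)}
```

The engine can then call every hook on every module without checking which ones are implemented.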

6. Crash-logger daemon & trigger model — DECIDED (D6)

The logger (M3) runs as a systemd --user service. Three user-selectable trigger modes:

  1. Always-on — service enabled at login, samples continuously (bounded by rotation).
  2. Game-launch-triggered — starts when a game/Steam session begins, stops after. Detection is layered (D12), no root: a precise wrapper (rigdoctor wrap %command% + global Steam compat-tool) as primary; a zero-config watcher (Steam RunningAppID + /proc heuristic) as fallback; GameMode D-Bus signals if gamemoded is present.
  3. Manual — started/stopped via the CLI (rigdoctor record start/stop) or the tray applet's quick action.

The selected mode is written to config by the installer and changeable later via CLI/GUI.

7. GUI & tray — DECIDED (D10/D11)

  • GUI (M10): a PySide6 desktop app — live dashboard (graphs/gauges), crash-log browser, health-report viewer, inventory view, logger controls. Works under X11 and Wayland.
  • Tray (M11): QSystemTrayIcon applet in the top menu bar (StatusNotifierItem; on Ubuntu/GNOME surfaced via the AppIndicator extension). Dropdown shows live M1 readouts (CPU temp, GPU temp, memory used/total, status dot) and actions led by Run Diagnostic (the guided diagnostic session, §7.1), plus Open dashboard / Start-Stop recording / Snapshot / Quit (D13).
  • Both are optional — a headless/server install omits them and loses no diagnostic capability (everything is in the CLI).

7.1 Guided diagnostic session (orchestration)

The "Run Diagnostic" flow (exposed in tray, GUI, and CLI) is not a new module — it orchestrates existing ones: pick a game (D12 detection: Steam library / recently played / running process) → focused capture (M3 scoped to that game's session via the D12 wrapper/watcher) → scan & analyze (M4 over the captured window + system logs) → present prioritized findings with suggested fixes (read-only, D9). The engine exposes it as a single callable so all three front-ends share one implementation.
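The "single callable" idea can be sketched as a function that chains the existing steps and takes them as parameters, so each front-end supplies its own game picker while sharing the rest. The `Finding` fields, the function signature, and the numeric severity ordering are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    severity: int        # higher means more urgent (hypothetical scale)
    summary: str
    suggested_fix: str   # advice only; nothing is applied (read-only, D9)


def run_diagnostic(pick_game, capture, analyze):
    """Pick a game, capture its session, analyze it, return prioritized findings.

    pick_game() -> game, capture(game) -> window, analyze(window) -> findings.
    """
    game = pick_game()
    window = capture(game)
    findings = analyze(window)
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```

The tray, GUI, and CLI would each pass a different `pick_game` (menu, dialog, prompt) and render the returned list in their own way.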

8. Installer design (M9)

  1. Detect GPU vendor via lspci (NVIDIA first) and the package manager (apt first).
  2. Present a module menu grouped into bundles:
    • Essential (sensor core + crash logger + health report) — the MVP, NVIDIA-only.
    • Monitoring (live TUI + alerts)
    • Diagnostics (inventory + gaming-env checks + SMART)
    • Desktop UI (GUI + tray applet — adds the PySide6 dependency)
    • Custom (pick individual modules)
    For each selection, show the exact packages that will be installed.
  3. Resolve dependencies: take the union of the selected modules' system_packages and python_deps for the detected package manager. If a package is missing and sudo is unavailable, report it instead of attempting the install.
  4. Install (with explicit confirmation), write config (~/.config/rigdoctor/), optionally enable the systemd --user logger service and choose its trigger mode (D6).
  5. Verify each installed module's probe() and print a readiness summary.
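The dependency-resolution step (step 3) amounts to a set union over the selected manifests. A minimal sketch, assuming manifests have already been loaded into dicts shaped like the manifest in §5; the function name and return shape are hypothetical.

```python
def resolve_packages(manifests, selected, package_manager):
    """Return (system_packages, python_deps) needed by the selected modules.

    manifests: iterable of dicts with "id", optional "system_packages"
               (keyed by package manager) and optional "python_deps".
    selected:  set of module ids chosen in the menu.
    """
    system, python = set(), set()
    for manifest in manifests:
        if manifest["id"] not in selected:
            continue
        system.update(manifest.get("system_packages", {}).get(package_manager, []))
        python.update(manifest.get("python_deps", []))
    # Sorted lists give a stable order for the "exact packages" confirmation screen.
    return sorted(system), sorted(python)
```

The result can be shown to the user as-is before the explicit confirmation in step 4.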

Module list/bundling is final (D14). Packaging: a user-local install is primary (self-updating from the public repo, no root — D8/D18), with an optional .deb system package; the wizard layers module selection on top of either.

9. GPU vendor abstraction

| Capability | NVIDIA (first) | AMD (later) | Intel (later) |
|---|---|---|---|
| Temps/clocks/power | nvidia-smi/NVML | /sys/class/drm/.../device + rocm-smi | /sys + intel_gpu_top |
| VRAM temp | mem-junction (often N/A on GeForce) | sysfs mem hwmon | n/a |
| Crash signature | Xid in dmesg | amdgpu: GPU reset / ring timeouts | i915 GPU hang |
| Power limit (read-only, D9) | nvidia-smi -pl (suggested, not applied) | sysfs power_dpm / pp_* | n/a |
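The crash-signature row could sit behind one lookup keyed by vendor, as sketched below. The regular expressions are rough approximations of the kernel messages named in the table, not verified against every driver version, and the function name is hypothetical.

```python
import re

# Approximate patterns for the signatures listed above (assumptions, to be
# tuned against real dmesg/journalctl output).
CRASH_PATTERNS = {
    "nvidia": re.compile(r"NVRM: Xid", re.IGNORECASE),
    "amd": re.compile(r"amdgpu.*(GPU reset|ring \S+ timeout)", re.IGNORECASE),
    "intel": re.compile(r"i915.*GPU HANG", re.IGNORECASE),
}


def classify_crash_line(line):
    """Return the vendor whose crash signature matches the log line, else None."""
    for vendor, pattern in CRASH_PATTERNS.items():
        if pattern.search(line):
            return vendor
    return None
```

Keeping the patterns in one table means adding a vendor is a data change rather than a new code path.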

10. Data & config layout

~/.config/rigdoctor/config.toml      # enabled modules, thresholds, interval, trigger mode
~/.local/share/rigdoctor/logs/       # rotated crash logs (CSV/JSON)
~/.local/state/rigdoctor/            # session/min-max state, daemon status feed
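Resolving these locations in code might look like the sketch below. The layout above hard-codes the default paths; honoring the `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, and `XDG_STATE_HOME` overrides is an assumption added here, as are the function names.

```python
import os
from pathlib import Path


def xdg_dir(env_var, fallback, environ=os.environ, home=None):
    """Return <base>/rigdoctor, where base is $env_var if absolute, else ~/fallback."""
    home = Path(home) if home else Path.home()
    base = environ.get(env_var)
    # Relative values are ignored, as the XDG Base Directory spec requires.
    root = Path(base) if base and os.path.isabs(base) else home / fallback
    return root / "rigdoctor"


def rigdoctor_paths(environ=os.environ, home=None):
    """Map each role in the layout above to a concrete path."""
    return {
        "config": xdg_dir("XDG_CONFIG_HOME", ".config", environ, home) / "config.toml",
        "logs": xdg_dir("XDG_DATA_HOME", ".local/share", environ, home) / "logs",
        "state": xdg_dir("XDG_STATE_HOME", ".local/state", environ, home),
    }
```

With no overrides set, the result matches the three default paths listed above.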

11. Dependency package names — apt-only (D15)

We maintain package names for Ubuntu/apt only; no cross-distro mapping is built or maintained. The set is small (filled in per module as they land):

| Logical dep | apt package |
|---|---|
| SMART | smartmontools |
| lm-sensors | lm-sensors |
| DMI/inventory | dmidecode |
| GUI/tray (Qt) | python3-pyside6 |
| Tray on GNOME | gir1.2-appindicator3-0.1 (AppIndicator) |
| Desktop notifications | libnotify-bin |

Module manifests still declare deps under a system_packages.apt / python_deps key, so a thin seam remains if another package manager is ever added — but multi-distro support is not a planned deliverable (D15).