rigdoctor/docs/ARCHITECTURE.md
jessey 305b6c4497 Initial commit: docs, decisions, and M1 sensor core
Planning docs (SPEC, ARCHITECTURE, MODULES, ROADMAP, DECISIONS) with
decisions D1-D15 settled: RigDoctor name, Python 3 + Qt/PySide6 stack
(core/CLI/daemon stdlib-only), Ubuntu + NVIDIA first, .deb packaging,
read-only + suggestions, GUI + tray modules, stress module dropped.

First code: the M1 sensor core (stdlib-only) and a CLI.
- core engine: Reading/Sample model, Sampler, hwmon reader
- self-probing sources (NVIDIA first): nvidia-smi GPU, coretemp/k10temp
  CPU, /proc/meminfo + DDR5 SPD memory, NVMe storage
- CLI: snapshot (text/JSON), monitor, sources; record/report stubbed
- stdlib unittest smoke tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:40:21 +02:00


# RigDoctor — Architecture (DRAFT v0.2)
> Tech stack and key structural decisions are now settled (see `DECISIONS.md` D2, D6, D8,
> D10, D11). Items still marked **[OPEN]** are tracked there.
## 1. Principles
- **Modular core + plugins.** A small engine; every capability is a module that can be
installed/omitted independently.
- **Capability detection over assumption.** Probe what hardware/tools exist; degrade
gracefully.
- **Vendor & distro abstraction.** GPU and package-manager differences live behind
interfaces, not scattered through the code (NVIDIA + apt are the first concrete impls).
- **One engine, many front-ends.** CLI, TUI, GUI, and tray are all thin front-ends over the
same core engine. Anything the GUI/tray can do is reachable headless from the CLI.
## 2. Tech stack — *DECIDED (D2)*
- **Language:** Python 3 (target machine has Python 3.14).
- **Core / CLI / daemon:** **stdlib only** — no `pip` deps. Easy log/JSON/subprocess
handling, tiny footprint, runs headless/over SSH.
- **TUI (M2):** stdlib `curses` / plain ANSI redraw (no deps).
- **GUI (M10) + tray (M11):** **Qt via PySide6** — one toolkit for both the desktop window
and the `QSystemTrayIcon` menu-bar applet. PySide6 is a dependency of *only* these two
modules, declared in the `.deb`; the core/daemon never import Qt.
- **Installer bootstrap (M9):** the `.deb`'s maintainer scripts ensure Python is present,
then hand off to the Python installer for module selection.
## 3. Component layout
```
                +--------------------------+
                |       core engine        |  (stdlib only)
                | sources → sampler → bus  |
                +--------------------------+
                     ^     ^     ^     ^
     +---------------+     |     |     +------------+
     |             +-------+     +-+                |
+---------+   +----------+   +-----------+   +--------------+
|   CLI   |   |  daemon  |   |    GUI    |   | tray applet  |
| (stdlib)|   |  (M3,    |   | (M10, Qt) |   |  (M11, Qt)   |
| TUI(M2) |   | systemd) |   |           |   |              |
+---------+   +----------+   +-----------+   +--------------+
```
- The **core engine** is a stdlib-only library: sources → sampler loop → an internal bus
that fans samples out to sinks (TUI renderer, CSV/JSON logger, alert engine, report
builder).
- The **daemon** (M3) is a long-running, stdlib-only process managed by `systemd --user`.
- The **GUI** and **tray** import PySide6 and talk to the same engine; for live status they
can read the daemon's output (a small status file or socket) rather than re-sampling.
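
The fan-out described above can be sketched as a tiny publish/subscribe bus. This is a minimal illustration, not the project's actual code: the `Bus` class, the sink signature, and the 90 °C alert threshold are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical sink signature: a sink is any callable that accepts one sample.
Sink = Callable[[dict], None]


class Bus:
    """Fans each published sample out to every subscribed sink."""

    def __init__(self) -> None:
        self._sinks: list[Sink] = []

    def subscribe(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def publish(self, sample: dict) -> None:
        for sink in self._sinks:
            try:
                sink(sample)
            except Exception as exc:
                # One failing sink must not starve the others.
                print(f"sink {sink!r} failed: {exc}")


logged: list[dict] = []   # stands in for the CSV/JSON logger
alerts: list[str] = []    # stands in for the alert engine


def alert_sink(sample: dict) -> None:
    # Assumed threshold, for illustration only.
    if sample["metric"] == "gpu_temp" and sample["value"] >= 90:
        alerts.append(f"GPU hot: {sample['value']} {sample['unit']}")


bus = Bus()
bus.subscribe(logged.append)
bus.subscribe(alert_sink)
bus.publish(
    {"ts": 0.0, "source": "nvidia-smi", "metric": "gpu_temp", "value": 93.0, "unit": "C"}
)
```

Because sinks are plain callables, a front-end can attach or detach without the sampler knowing which consumers exist.
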
## 4. Core engine
```
+-------------------+      +------------------+      +-------------------+
| Sources (probe)   | ---> | Sampler loop     | ---> | Sinks             |
| nvidia-smi/NVML   |      | (interval, Hz)   |      | - TUI renderer    |
| amdgpu sysfs      |      | normalizes into  |      | - CSV/JSON logger |
| hwmon/lm-sensors  |      | Sample records   |      | - Alert engine    |
| journalctl/SMART  |      |                  |      | - Report builder  |
+-------------------+      +------------------+      | - GUI/tray feed   |
                                                     +-------------------+
```
- **Sample record:** each reading is `{ ts, source, metric, value, unit }`; the readings from one tick are flattened into a single row.
- **Sources** are pluggable; each declares which metrics it can provide and self-checks
availability at startup. NVIDIA (`nvidia-smi`/NVML) + hwmon are the first implementations.
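
A minimal sketch of the record shape and a self-checking source, assuming a dataclass for the sample and a `source.metric` column naming scheme for the flattened row. The class name `NvidiaSmiSource`, the `flatten` helper, and the column naming are hypothetical; only the five record fields come from the text above.

```python
import shutil
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    ts: float
    source: str
    metric: str
    value: float
    unit: str


class NvidiaSmiSource:
    """Hypothetical source; only the availability self-check is shown."""

    name = "nvidia-smi"
    metrics = ("gpu_temp", "gpu_power")  # metrics this source claims to provide

    def probe(self) -> bool:
        # Available only if the nvidia-smi binary is on PATH.
        return shutil.which("nvidia-smi") is not None


def flatten(samples: list[Sample]) -> dict:
    """Collapse one tick's samples into a single row keyed by 'source.metric'."""
    row: dict = {"ts": samples[0].ts} if samples else {}
    for s in samples:
        row[f"{s.source}.{s.metric}"] = s.value
    return row


now = time.time()
tick = [
    Sample(now, "nvidia-smi", "gpu_temp", 61.0, "C"),
    Sample(now, "hwmon", "cpu_temp", 48.5, "C"),
]
row = flatten(tick)
```

On a machine without the NVIDIA driver, `NvidiaSmiSource().probe()` simply returns `False`, which is the graceful-degradation path the principles call for.
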
## 5. Module contract
Each module declares a manifest so the installer and engine can reason about it:
```
module:
  id: crash-logger
  name: "Crash-capture logger"
  provides: [logging]
  requires_sources: [gpu, cpu_temp]   # capabilities, not packages
  system_packages:                    # per package manager, optional
    apt: []                           # uses nvidia-smi + sysfs only
    pacman: []
    dnf: []
  python_deps: []                     # e.g. GUI/tray modules → [pyside6]
  optional_packages:
    apt: [smartmontools]              # enriches if present
  gpu_vendors: [nvidia, amd, intel]
  default_in_bundles: [essential]
```
Lifecycle hooks a module may implement: `probe()`, `collect(sample)`, `render(view)`,
`report()`, `install_hint()`. GUI/tray modules additionally declare `python_deps: [pyside6]`.
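
One way to model "hooks a module may implement" is a base class whose hooks default to no-ops, so a module overrides only what it needs. The sketch below assumes that design; the `Module` base class, the `run_tick` helper, and the hook signatures are illustrative guesses, since the text names the hooks but not their exact shapes.

```python
class Module:
    """Hypothetical base class: every lifecycle hook is optional."""

    id = "base"

    def probe(self) -> bool:
        return True  # assume available unless a module says otherwise

    def collect(self, sample: dict) -> None:
        pass

    def render(self, view) -> None:
        pass

    def report(self) -> dict:
        return {}

    def install_hint(self) -> str:
        return ""


class CrashLogger(Module):
    id = "crash-logger"

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def collect(self, sample: dict) -> None:
        self.rows.append(sample)

    def report(self) -> dict:
        return {"module": self.id, "samples": len(self.rows)}


def run_tick(modules: list[Module], sample: dict) -> None:
    """Deliver one sample to every module whose probe() succeeds."""
    for module in modules:
        if module.probe():
            module.collect(sample)


mod = CrashLogger()
run_tick([mod], {"metric": "gpu_temp", "value": 70})
summary = mod.report()
```
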
## 6. Crash-logger daemon & trigger model — *DECIDED (D6)*
The logger (M3) runs as a `systemd --user` service. Three user-selectable trigger modes:
1. **Always-on** — service enabled at login, samples continuously (bounded by rotation).
2. **Game-launch-triggered** — starts when a game/Steam session begins, stops after.
Detection is layered (D12), no root: a precise **wrapper** (`rigdoctor wrap %command%`
+ global Steam compat-tool) as primary; a zero-config **watcher** (Steam `RunningAppID`
+ `/proc` heuristic) as fallback; **GameMode** D-Bus signals if `gamemoded` is present.
3. **Manual** — started/stopped via the CLI (`rigdoctor record start/stop`) or the tray
applet's quick action.

The selected mode is written to config by the installer and can be changed later via the CLI or GUI.
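
The three modes map naturally onto an enum plus one decision function. In this sketch the config key (`logger.trigger_mode`), the string values, and the fallback to manual mode on an invalid value are assumptions, not settled parts of the design.

```python
from enum import Enum


class TriggerMode(Enum):
    ALWAYS_ON = "always-on"
    GAME_LAUNCH = "game-launch"
    MANUAL = "manual"


def load_trigger_mode(config: dict) -> TriggerMode:
    """Read the mode from a parsed config; fall back to MANUAL if absent or invalid."""
    raw = config.get("logger", {}).get("trigger_mode", "manual")
    try:
        return TriggerMode(raw)
    except ValueError:
        return TriggerMode.MANUAL


def should_record(mode: TriggerMode, game_running: bool, manual_on: bool) -> bool:
    """Decide whether the logger should be sampling right now."""
    if mode is TriggerMode.ALWAYS_ON:
        return True
    if mode is TriggerMode.GAME_LAUNCH:
        return game_running
    return manual_on


mode = load_trigger_mode({"logger": {"trigger_mode": "game-launch"}})
```

Falling back to manual mode means a typo in the config never causes unexpected recording.
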
## 7. GUI & tray — *DECIDED (D10/D11)*
- **GUI (M10):** a PySide6 desktop app — live dashboard (graphs/gauges), crash-log browser,
health-report viewer, inventory view, logger controls. Works under X11 and Wayland.
- **Tray (M11):** `QSystemTrayIcon` applet in the top menu bar (StatusNotifierItem; on
Ubuntu/GNOME surfaced via the AppIndicator extension). Dropdown shows live M1 readouts
(CPU temp, GPU temp, memory used/total, status dot) and actions led by **Run Diagnostic**
(the guided diagnostic session, §7.1), plus Open dashboard / Start-Stop recording /
Snapshot / Quit (D13).
- Both are **optional** — a headless/server install omits them and loses no diagnostic
capability (everything is in the CLI).
### 7.1 Guided diagnostic session (orchestration)
The "Run Diagnostic" flow (exposed in tray, GUI, and CLI) is not a new module — it
orchestrates existing ones: **pick a game** (D12 detection: Steam library / recently played
/ running process) → **focused capture** (M3 scoped to that game's session via the D12
wrapper/watcher) → **scan & analyze** (M4 over the captured window + system logs) →
**present prioritized findings** with suggested fixes (read-only, D9). The engine exposes it
as a single callable so all three front-ends share one implementation.
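
The "single callable" idea can be sketched by injecting each stage as a function, so tray, GUI, and CLI all call the same entry point. Everything named here (`run_diagnostic`, `Finding`, the severity scale, the stub stages) is hypothetical and stands in for the real M3/M4 calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    severity: int     # higher = more urgent (hypothetical scale)
    message: str
    suggestion: str   # read-only: shown to the user, never applied


def run_diagnostic(
    pick_game: Callable[[], str],
    capture: Callable[[str], list[dict]],
    analyze: Callable[[list[dict]], list[Finding]],
) -> list[Finding]:
    """Pick a game, capture its session, analyze it, return findings by priority."""
    game = pick_game()
    window = capture(game)
    findings = analyze(window)
    return sorted(findings, key=lambda f: f.severity, reverse=True)


# Stub stages standing in for game detection, focused capture, and analysis.
def fake_pick() -> str:
    return "example-game"


def fake_capture(game: str) -> list[dict]:
    return [{"metric": "gpu_temp", "value": 95.0}, {"metric": "cpu_temp", "value": 60.0}]


def fake_analyze(window: list[dict]) -> list[Finding]:
    found = [Finding(1, "Capture complete", "No action needed")]
    if any(s["metric"] == "gpu_temp" and s["value"] >= 90 for s in window):
        found.append(Finding(3, "GPU ran hot", "Check case airflow and fan curve"))
    return found


results = run_diagnostic(fake_pick, fake_capture, fake_analyze)
```

Injecting the stages also makes the flow testable without a GPU or a running game.
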
## 8. Installer design (M9)
1. **Detect** GPU vendor via `lspci` (NVIDIA first) and the package manager (apt first).
2. **Present** a module menu grouped into bundles:
   - *Essential* (sensor core + crash logger + health report): the MVP, NVIDIA-only.
   - *Monitoring* (live TUI + alerts)
   - *Diagnostics* (inventory + gaming-env checks + SMART)
   - *Desktop UI* (GUI + tray applet; adds the PySide6 dependency)
   - *Custom* (pick individual modules)

   For each selection, show the exact packages that will be installed.
3. **Resolve** dependencies: take the union of the selected modules' `system_packages` and
   `python_deps` for the detected package manager. If a package is missing and sudo is
   unavailable, report it instead of attempting the install.
4. **Install** (with explicit confirmation), **write config** (`~/.config/rigdoctor/`),
optionally **enable** the `systemd --user` logger service and choose its trigger mode (D6).
5. **Verify** each installed module's `probe()` and print a readiness summary.

Module list/bundling is final (D14). Packaging is `.deb`-first (D8); the wizard layers
module selection on top of the package.
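
The resolve step reduces to a set union over the selected manifests. The sketch below assumes manifests are already parsed into dicts shaped like the one in §5; the `resolve` function and the two extra example manifests (`gui`, `smart-check`) are hypothetical.

```python
def resolve(manifests: list[dict], selected: list[str], pkg_manager: str = "apt") -> dict:
    """Union the system and Python deps of the selected modules."""
    system: set[str] = set()
    python: set[str] = set()
    for manifest in manifests:
        if manifest["id"] not in selected:
            continue
        system.update(manifest.get("system_packages", {}).get(pkg_manager, []))
        python.update(manifest.get("python_deps", []))
    # Sorted lists give a stable order for the "exact packages" preview.
    return {"system": sorted(system), "python": sorted(python)}


# Illustrative manifests; only crash-logger mirrors the example in section 5.
manifests = [
    {"id": "crash-logger", "system_packages": {"apt": []}, "python_deps": []},
    {"id": "gui", "system_packages": {"apt": []}, "python_deps": ["pyside6"]},
    {"id": "smart-check", "system_packages": {"apt": ["smartmontools"]}},
]

plan = resolve(manifests, selected=["crash-logger", "gui", "smart-check"])
```

The same function can produce the per-selection package preview in step 2 by calling it with a single module id.
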
## 9. GPU vendor abstraction
| Capability | NVIDIA (first) | AMD (later) | Intel (later) |
|------------|--------|-----|-------|
| Temps/clocks/power | `nvidia-smi`/NVML | `/sys/class/drm/.../device` + `rocm-smi` | `/sys` + `intel_gpu_top` |
| VRAM temp | mem-junction (often N/A on GeForce) | sysfs `mem` hwmon | n/a |
| Crash signature | Xid in dmesg | `amdgpu: GPU reset` / ring timeouts | i915 GPU hang |
| Power limit (read-only, D9) | `nvidia-smi -pl` (suggested, not applied) | sysfs `power_dpm` / `pp_*` | n/a |
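
A vendor interface with NVIDIA as the first implementation might look like the sketch below. The `GpuBackend` and `NvidiaBackend` names and the returned metric keys are assumptions; the `nvidia-smi` query fields used (`temperature.gpu`, `power.draw`, `clocks.sm`) are standard `--query-gpu` properties. Only the first GPU is read, and unsupported fields such as `[N/A]` become `None`.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Optional


class GpuBackend(ABC):
    """Hypothetical vendor-neutral interface; one subclass per GPU vendor."""

    @abstractmethod
    def read(self) -> dict:
        ...


def parse_nvidia_csv(line: str) -> dict:
    """Parse one 'temperature.gpu, power.draw, clocks.sm' CSV line (no units)."""

    def num(field: str) -> Optional[float]:
        try:
            return float(field)
        except ValueError:
            return None  # e.g. "[N/A]" when the GPU does not expose the value

    temp, power, clock = (field.strip() for field in line.split(","))
    return {"gpu_temp": num(temp), "gpu_power": num(power), "gpu_clock": num(clock)}


class NvidiaBackend(GpuBackend):
    CMD = [
        "nvidia-smi",
        "--query-gpu=temperature.gpu,power.draw,clocks.sm",
        "--format=csv,noheader,nounits",
    ]

    def read(self) -> dict:
        result = subprocess.run(
            self.CMD, capture_output=True, text=True, check=True, timeout=5
        )
        return parse_nvidia_csv(result.stdout.splitlines()[0])  # first GPU only


reading = parse_nvidia_csv("61, 120.50, 1905")
```

Keeping the parser separate from the subprocess call lets it be tested on machines without an NVIDIA GPU.
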
## 10. Data & config layout
```
~/.config/rigdoctor/config.toml # enabled modules, thresholds, interval, trigger mode
~/.local/share/rigdoctor/logs/ # rotated crash logs (CSV/JSON)
~/.local/state/rigdoctor/ # session/min-max state, daemon status feed
```
## 11. Dependency package names — apt-only (D15)
We maintain package names for **Ubuntu/apt only**; no cross-distro mapping is built or
maintained. The set is small (filled in per module as they land):
| Logical dep | apt package |
|-------------|-------------|
| SMART | `smartmontools` |
| lm-sensors | `lm-sensors` |
| DMI/inventory | `dmidecode` |
| GUI/tray (Qt) | `python3-pyside6` |
| Tray on GNOME | `gir1.2-appindicator3-0.1` (AppIndicator) |
| Desktop notifications | `libnotify-bin` |

Module manifests still declare deps under a `system_packages.apt` / `python_deps` key, so a
thin seam remains if another package manager is ever added — but multi-distro support is **not
a planned deliverable** (D15).