Initial commit: docs, decisions, and M1 sensor core

Planning docs (SPEC, ARCHITECTURE, MODULES, ROADMAP, DECISIONS) with decisions D1-D15 settled: RigDoctor name, Python 3 + Qt/PySide6 stack (core/CLI/daemon stdlib-only), Ubuntu + NVIDIA first, .deb packaging, read-only + suggestions, GUI + tray modules, stress module dropped.

First code: the M1 sensor core (stdlib-only) and a CLI.

- core engine: Reading/Sample model, Sampler, hwmon reader
- self-probing sources (NVIDIA first): nvidia-smi GPU, coretemp/k10temp CPU, /proc/meminfo + DDR5 SPD memory, NVMe storage
- CLI: snapshot (text/JSON), monitor, sources; record/report stubbed
- stdlib unittest smoke tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# RigDoctor — Architecture (DRAFT v0.2)

> Tech stack and key structural decisions are now settled (see `DECISIONS.md` D2, D6, D8,
> D10, D11). Items still marked **[OPEN]** are tracked there.
## 1. Principles
- **Modular core + plugins.** A small engine; every capability is a module that can be
  installed/omitted independently.
- **Capability detection over assumption.** Probe what hardware/tools exist; degrade
  gracefully.
- **Vendor & distro abstraction.** GPU and package-manager differences live behind
  interfaces, not scattered through the code (NVIDIA + apt are the first concrete impls).
- **One engine, many front-ends.** CLI, TUI, GUI, and tray are all thin front-ends over the
  same core engine. Anything the GUI/tray can do is reachable headless from the CLI.
## 2. Tech stack — *DECIDED (D2)*
- **Language:** Python 3 (target machine has Python 3.14).
- **Core / CLI / daemon:** **stdlib only**, no `pip` deps. The stdlib already covers log,
  JSON, and subprocess handling, which keeps the footprint tiny and lets it run headless/over SSH.
- **TUI (M2):** stdlib `curses` / plain ANSI redraw (no deps).
- **GUI (M10) + tray (M11):** **Qt via PySide6** — one toolkit for both the desktop window
  and the `QSystemTrayIcon` menu-bar applet. PySide6 is a dependency of *only* these two
  modules, declared in the `.deb`; the core/daemon never import Qt.
- **Installer bootstrap (M9):** the `.deb`'s maintainer scripts ensure Python is present,
  then hand off to the Python installer for module selection.
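Because only M10/M11 depend on PySide6, the stdlib-only core or CLI needs a way to tell whether those front-ends are usable without importing Qt. A minimal sketch, assuming a hypothetical helper named `qt_frontends_available()`: `importlib.util.find_spec()` locates a top-level package without executing it, so Qt never loads into the core process.

```python
import importlib.util


def qt_frontends_available() -> bool:
    """Hypothetical helper: True if PySide6 could be imported.

    find_spec() on a top-level name locates the package without running it,
    so PySide6 is never loaded into the core/CLI process.
    """
    return importlib.util.find_spec("PySide6") is not None


if __name__ == "__main__":
    if qt_frontends_available():
        print("GUI/tray front-ends can be launched")
    else:
        print("headless install: every diagnostic is still available via the CLI")
```

A headless install simply gets `False` here and keeps every diagnostic reachable through the CLI.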
## 3. Component layout
```
                +--------------------------+
                |       core engine        |   (stdlib only)
                | sources → sampler → bus  |
                +--------------------------+
                  ^  ^             ^     ^
     +------------+  |             |     +----------+
     |               |             |                |
+---------+   +----------+   +-----------+   +--------------+
|   CLI   |   |  daemon  |   |    GUI    |   | tray applet  |
| (stdlib)|   | (M3,     |   | (M10, Qt) |   |  (M11, Qt)   |
| TUI(M2) |   | systemd) |   |           |   |              |
+---------+   +----------+   +-----------+   +--------------+
```
- The **core engine** is a stdlib-only library: sources → sampler loop → an internal bus
  that fans samples out to sinks (TUI renderer, CSV/JSON logger, alert engine, report
  builder).
- The **daemon** (M3) is a long-running, stdlib-only process managed by `systemd --user`.
- The **GUI** and **tray** import PySide6 and talk to the same engine; for live status they
  can read the daemon's output / a small status file or socket rather than re-sampling.
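The bus that fans samples out to sinks could be as small as the sketch below. It is an illustration, not RigDoctor's code: the `Bus` class, its `subscribe()`/`publish()` methods, and the choice to log and skip a failing sink (so one broken logger does not starve the alert engine) are all assumptions.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("rigdoctor.bus")  # hypothetical logger name

Sink = Callable[[Any], None]


class Bus:
    """Minimal in-process bus: every published sample goes to every sink."""

    def __init__(self) -> None:
        self._sinks: list[Sink] = []

    def subscribe(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def publish(self, sample: Any) -> None:
        for sink in self._sinks:
            try:
                sink(sample)
            except Exception:
                # Log and skip a failing sink so the others keep receiving samples.
                log.exception("sink %r failed", sink)


if __name__ == "__main__":
    bus = Bus()
    logged: list[Any] = []
    bus.subscribe(logged.append)                 # stand-in for the CSV/JSON logger
    bus.subscribe(lambda s: print("render", s))  # stand-in for the TUI renderer
    bus.publish({"source": "hwmon", "metric": "cpu_temp", "value": 52.0, "unit": "°C"})
    print(len(logged))  # 1
```

Keeping the bus synchronous and in-process matches the stdlib-only constraint; a daemon feeding the GUI/tray would add a sink that writes the status file or socket.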
## 4. Core engine
```
+-------------------+      +------------------+      +-------------------+
| Sources (probe)   | ---> | Sampler loop     | ---> | Sinks             |
| nvidia-smi/NVML   |      | (interval, Hz)   |      | - TUI renderer    |
| amdgpu sysfs      |      | normalizes into  |      | - CSV/JSON logger |
| hwmon/lm-sensors  |      | Sample records   |      | - Alert engine    |
| journalctl/SMART  |      |                  |      | - Report builder  |
+-------------------+      +------------------+      | - GUI/tray feed   |
                                                     +-------------------+
```
- **Sample record:** `{ ts, source, metric, value, unit }` flattened per tick into a row.
- **Sources** are pluggable; each declares which metrics it can provide and self-checks
  availability at startup. NVIDIA (`nvidia-smi`/NVML) + hwmon are the first implementations.
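A sketch of the record shape and the source contract described above, assuming hypothetical `Source`, `HwmonTempSource`, `available()`, and `tick()` names. The hwmon reader relies on the kernel's hwmon sysfs convention that `temp*_input` files hold millidegrees Celsius; the flattened row keys (`source.metric`) are an assumed naming scheme.

```python
import os
import tempfile
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One reading in the { ts, source, metric, value, unit } shape."""
    ts: float
    source: str
    metric: str
    value: float
    unit: str


class Source:
    """Hypothetical plugin base: declares its metrics and self-checks availability."""

    name = "base"
    metrics: tuple[str, ...] = ()

    def probe(self) -> bool:
        """Return True only if the backing tool or sysfs path exists here."""
        return False

    def read(self) -> dict[str, tuple[float, str]]:
        """Return {metric: (value, unit)} for the current tick."""
        raise NotImplementedError


class HwmonTempSource(Source):
    """Reads one hwmon tempN_input file; the kernel reports millidegrees Celsius."""

    metrics = ("cpu_temp",)

    def __init__(self, path: str, name: str = "hwmon") -> None:
        self.path = path
        self.name = name

    def probe(self) -> bool:
        return os.path.isfile(self.path)

    def read(self) -> dict[str, tuple[float, str]]:
        with open(self.path) as f:
            return {"cpu_temp": (int(f.read().strip()) / 1000.0, "°C")}


def available(sources: Iterable[Source]) -> list[Source]:
    """Startup self-check: keep only the sources whose probe() succeeds."""
    return [s for s in sources if s.probe()]


def tick(sources: Iterable[Source], ts: Optional[float] = None) -> tuple[list[Sample], dict[str, float]]:
    """Collect one Sample per metric, then flatten the tick into a single row."""
    ts = time.time() if ts is None else ts
    samples = [
        Sample(ts, src.name, metric, value, unit)
        for src in sources
        for metric, (value, unit) in src.read().items()
    ]
    row = {"ts": ts}
    row.update({f"{s.source}.{s.metric}": s.value for s in samples})
    return samples, row


if __name__ == "__main__":
    # Demo against a fake sysfs file instead of /sys/class/hwmon/hwmon*/temp1_input.
    with tempfile.TemporaryDirectory() as d:
        fake = os.path.join(d, "temp1_input")
        with open(fake, "w") as f:
            f.write("54000\n")
        usable = available([HwmonTempSource(fake), HwmonTempSource("/nonexistent/temp1_input", "missing")])
        _, row = tick(usable, ts=0.0)
        print(row)  # {'ts': 0.0, 'hwmon.cpu_temp': 54.0}
```

The demo points the hwmon source at a temporary file so it runs on any machine; on real hardware the path would be a `/sys/class/hwmon/hwmon*/temp*_input` file, and the missing source is dropped by the probe step rather than failing the tick.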
## 5. Module contract
Each module declares a manifest so the installer and engine can reason about it:
```
module:
  id: crash-logger
  name: "Crash-capture logger"
  provides: [logging]
  requires_sources: [gpu, cpu_temp]   # capabilities, not packages
  system_packages:                    # per package manager, optional
    apt: []                           # uses nvidia-smi + sysfs only
    pacman: []
    dnf: []
  python_deps: []                     # e.g. GUI/tray modules → [pyside6]
  optional_packages:
    apt: [smartmontools]              # enriches if present
  gpu_vendors: [nvidia, amd, intel]
  default_in_bundles: [essential]
```
Lifecycle hooks a module may implement: `probe()`, `collect(sample)`, `render(view)`,
`report()`, `install_hint()`. GUI/tray modules additionally declare `python_deps: [pyside6]`.
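The manifest is YAML-like and the stdlib has no YAML parser, so this sketch assumes a loader has already turned it into a dict. It models the lifecycle hooks as a `typing.Protocol` (signatures guessed from the hook names) and shows one way the engine might decide whether a module can run: every `requires_sources` capability must be available and, if `gpu_vendors` is set, the detected vendor must be listed. `Module`, `check_manifest()`, and `CRASH_LOGGER` are hypothetical names.

```python
from typing import Any, Mapping, Protocol


class Module(Protocol):
    """Lifecycle hooks from the module contract; the exact signatures are guesses."""

    def probe(self) -> bool: ...
    def collect(self, sample: Any) -> None: ...
    def render(self, view: Any) -> None: ...
    def report(self) -> Mapping[str, Any]: ...
    def install_hint(self) -> str: ...


def check_manifest(manifest: Mapping[str, Any], capabilities: set[str], gpu_vendor: str) -> list[str]:
    """Return the manifest's unmet requirements; an empty list means the module can run."""
    unmet = [cap for cap in manifest.get("requires_sources", []) if cap not in capabilities]
    vendors = manifest.get("gpu_vendors", [])
    if vendors and gpu_vendor not in vendors:
        unmet.append(f"gpu_vendor:{gpu_vendor}")
    return unmet


# The crash-logger manifest above, as a (hypothetical) loader might return it.
CRASH_LOGGER: dict[str, Any] = {
    "id": "crash-logger",
    "name": "Crash-capture logger",
    "provides": ["logging"],
    "requires_sources": ["gpu", "cpu_temp"],
    "system_packages": {"apt": [], "pacman": [], "dnf": []},
    "python_deps": [],
    "optional_packages": {"apt": ["smartmontools"]},
    "gpu_vendors": ["nvidia", "amd", "intel"],
    "default_in_bundles": ["essential"],
}

if __name__ == "__main__":
    print(check_manifest(CRASH_LOGGER, {"gpu", "cpu_temp"}, "nvidia"))  # []
    print(check_manifest(CRASH_LOGGER, {"gpu"}, "nvidia"))              # ['cpu_temp']
```

Returning the unmet requirements instead of a bare boolean gives `install_hint()` and the installer's readiness summary something concrete to print.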
## 6. Crash-logger daemon & trigger model — *DECIDED (D6)*
The logger (M3) runs as a `systemd --user` service. Three user-selectable trigger modes:
1. **Always-on** — service enabled at login, samples continuously (bounded by rotation).
2. **Game-launch-triggered** — starts when a game/Steam session begins, stops after.
   Detection is layered (D12) and needs no root: a precise **wrapper**
   (`rigdoctor wrap %command%` + global Steam compat-tool) as primary; a zero-config
   **watcher** (Steam `RunningAppID` + `/proc` heuristic) as fallback; and **GameMode**
   D-Bus signals if `gamemoded` is present.
3. **Manual** — started/stopped via the CLI (`rigdoctor record start/stop`) or the tray
   applet's quick action.

The selected mode is written to config by the installer and can be changed later via the CLI or GUI.
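The primary detection path, `rigdoctor wrap %command%`, reduces to running the game's command line between a start and a stop of the recorder. A minimal sketch with hypothetical `start`/`stop` callbacks standing in for M3's real controls; the `finally` clause guarantees recording stops even if the game crashes or fails to launch.

```python
import subprocess
import sys
from typing import Callable, Sequence


def wrap(argv: Sequence[str], start: Callable[[], None], stop: Callable[[], None]) -> int:
    """Run the game command between start() and stop(); return its exit code."""
    if not argv:
        raise ValueError("no command to wrap")
    start()
    try:
        # The child inherits stdin/stdout/stderr, so the game runs as if launched directly.
        return subprocess.run(list(argv)).returncode
    finally:
        stop()  # also runs if the launch itself fails, e.g. FileNotFoundError


if __name__ == "__main__":
    # Demo: wrap a trivial Python child process instead of a real game.
    code = wrap(
        [sys.executable, "-c", "print('game running')"],
        start=lambda: print("recording started"),
        stop=lambda: print("recording stopped"),
    )
    print("exit code:", code)
```

In a Steam launch option, Steam replaces `%command%` with the game's command line, so the real CLI would pass its remaining arguments as `argv`. Returning the child's exit code keeps the wrapper transparent to Steam.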
## 7. GUI & tray — *DECIDED (D10/D11)*
- **GUI (M10):** a PySide6 desktop app — live dashboard (graphs/gauges), crash-log browser,
  health-report viewer, inventory view, logger controls. Works under X11 and Wayland.
- **Tray (M11):** `QSystemTrayIcon` applet in the top menu bar (StatusNotifierItem; on
  Ubuntu/GNOME surfaced via the AppIndicator extension). Dropdown shows live M1 readouts
  (CPU temp, GPU temp, memory used/total, status dot) and actions led by **Run Diagnostic**
  (the guided diagnostic session, §7.1), plus Open dashboard / Start-Stop recording /
  Snapshot / Quit (D13).
- Both are **optional** — a headless/server install omits them and loses no diagnostic
  capability (everything is in the CLI).
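The tray's "memory used/total" readout can come straight from `/proc/meminfo`, the same file the M1 memory source reads. A sketch, assuming "used" is defined as `MemTotal - MemAvailable`; the function names and label format are hypothetical. The kernel's `kB` units in this file are KiB.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines such as 'MemTotal:  16314232 kB' into {key: KiB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_readout(info: dict[str, int]) -> str:
    """Tray-style 'used / total' label; 'used' here means MemTotal - MemAvailable."""
    total_kib = info["MemTotal"]
    used_kib = total_kib - info.get("MemAvailable", info.get("MemFree", 0))
    gib = 1024 ** 2  # KiB -> GiB
    return f"RAM {used_kib / gib:.1f} / {total_kib / gib:.1f} GiB"


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(memory_readout(parse_meminfo(f.read())))
```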
### 7.1 Guided diagnostic session (orchestration)
The "Run Diagnostic" flow (exposed in tray, GUI, and CLI) is not a new module — it
orchestrates existing ones: **pick a game** (D12 detection: Steam library / recently played
/ running process) → **focused capture** (M3 scoped to that game's session via the D12
wrapper/watcher) → **scan & analyze** (M4 over the captured window + system logs) →
**present prioritized findings** with suggested fixes (read-only, D9). The engine exposes it
as a single callable so all three front-ends share one implementation.
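Exposing the session as a single callable could look like the sketch below: each stage is injected as a function, so the tray, GUI, and CLI all call the same `run_diagnostic()`. The `Finding` type, its integer severity scale, and the stage signatures are assumptions; findings carry suggested fixes only, in line with the read-only rule (D9).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Finding:
    severity: int                           # hypothetical scale: higher = more urgent
    title: str = field(compare=False)
    suggestion: str = field(compare=False)  # a suggested fix only; nothing is applied (D9)


def run_diagnostic(
    pick_game: Callable[[], str],
    capture: Callable[[str], str],
    analyze: Callable[[str], list[Finding]],
) -> list[Finding]:
    """One callable for tray, GUI, and CLI: pick -> capture -> analyze -> prioritize."""
    game = pick_game()               # D12 detection: Steam library / recent / running
    capture_ref = capture(game)      # M3 scoped to that game's session
    findings = analyze(capture_ref)  # M4 over the captured window + system logs
    return sorted(findings, reverse=True)  # most severe first


if __name__ == "__main__":
    # Placeholder stages; real ones would call D12 detection, M3, and M4.
    ranked = run_diagnostic(
        pick_game=lambda: "example-game",
        capture=lambda game: f"{game}.capture",
        analyze=lambda ref: [
            Finding(1, "example low-severity finding", "example suggestion"),
            Finding(3, "example high-severity finding", "example suggestion"),
        ],
    )
    for finding in ranked:
        print(finding.severity, finding.title)
```

Injecting the stages keeps the orchestration testable without hardware and lets each front-end supply its own "pick a game" UI.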
## 8. Installer design (M9)
1. **Detect** GPU vendor via `lspci` (NVIDIA first) and the package manager (apt first).
2. **Present** a module menu grouped into bundles, showing for each selection the exact
   packages that will be installed:
   - *Essential* (sensor core + crash logger + health report) — the MVP, NVIDIA-only.
   - *Monitoring* (live TUI + alerts)
   - *Diagnostics* (inventory + gaming-env checks + SMART)
   - *Desktop UI* (GUI + tray applet — adds the PySide6 dependency)
   - *Custom* (pick individual modules)
3. **Resolve** dependencies: the union of the selected modules' `system_packages` +
   `python_deps` for the detected package manager. If a package is missing and sudo is
   unavailable, report it instead of installing it.
4. **Install** (with explicit confirmation), **write config** (`~/.config/rigdoctor/`), and
   optionally **enable** the `systemd --user` logger service and choose its trigger mode (D6).
5. **Verify** each installed module's `probe()` and print a readiness summary.

Module list/bundling is final (D14). Packaging is `.deb`-first (D8); the wizard layers
module selection on top of the package.
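Steps 1 and 3 can be sketched with the stdlib alone. Assumptions: GPU vendors are inferred from display-class lines of plain `lspci` output, `apt-get` on `PATH` signals apt, and `python_deps` are mapped to apt names via the §11 table. `detect_gpu_vendors()`, `detect_package_manager()`, and `plan_install()` are hypothetical names, and how `have_sudo` is determined is left open.

```python
import shutil
import subprocess
from typing import Any, Iterable, Mapping, Optional

# python_deps -> apt names, from the apt-only table in section 11 (D15).
PYTHON_DEP_TO_APT = {"pyside6": "python3-pyside6"}
# lspci class names for display devices.
GPU_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")


def detect_gpu_vendors(lspci_output: str) -> set[str]:
    """Step 1: GPU vendors mentioned on display-class lines of plain `lspci` output."""
    vendors = set()
    for line in lspci_output.splitlines():
        if any(cls in line for cls in GPU_CLASSES):
            lowered = line.lower()
            vendors.update(v for v in ("nvidia", "amd", "intel") if v in lowered)
    return vendors


def detect_package_manager() -> Optional[str]:
    """Step 1: apt is the only package manager handled (D15)."""
    return "apt" if shutil.which("apt-get") else None


def plan_install(
    manifests: Iterable[Mapping[str, Any]],
    installed: Iterable[str],
    have_sudo: bool,
    pm: str = "apt",
) -> dict[str, list[str]]:
    """Step 3: union the selected modules' deps, then split into install vs report-only."""
    wanted = set()
    for manifest in manifests:
        wanted.update(manifest.get("system_packages", {}).get(pm, []))
        wanted.update(PYTHON_DEP_TO_APT.get(dep, dep) for dep in manifest.get("python_deps", []))
    missing = sorted(wanted - set(installed))
    if have_sudo:
        return {"install": missing, "report_only": []}
    return {"install": [], "report_only": missing}


if __name__ == "__main__":
    lspci = ""
    if shutil.which("lspci"):
        lspci = subprocess.run(["lspci"], capture_output=True, text=True, check=False).stdout
    print(detect_gpu_vendors(lspci), detect_package_manager())
```

Splitting the plan into `install` and `report_only` lists lets the wizard show the exact package set in step 2 before anything is confirmed in step 4.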
## 9. GPU vendor abstraction
| Capability | NVIDIA (first) | AMD (later) | Intel (later) |
|------------|----------------|-------------|---------------|
| Temps/clocks/power | `nvidia-smi`/NVML | `/sys/class/drm/.../device` + `rocm-smi` | `/sys` + `intel_gpu_top` |
| VRAM temp | mem-junction (often N/A on GeForce) | sysfs `mem` hwmon | n/a |
| Crash signature | Xid in dmesg | `amdgpu: GPU reset` / ring timeouts | i915 GPU hang |
| Power limit (read-only, D9) | `nvidia-smi -pl` (suggested, not applied) | sysfs `power_dpm` / `pp_*` | n/a |
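Finding the crash signatures in the table usually means scanning the kernel log. The patterns below are rough approximations of the messages named in the table (exact wording varies by driver version), and `classify()`/`Hit` are hypothetical names. The demo reads the previous boot's kernel messages with `journalctl -k -b -1`, which may require journal read access.

```python
import re
import subprocess
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    vendor: str
    kind: str
    detail: Optional[str]  # e.g. the Xid code for NVIDIA
    line: str


# Rough, assumed patterns for the signatures in the table; real wording varies by driver version.
SIGNATURES = [
    ("nvidia", "xid", re.compile(r"NVRM: Xid \([^)]*\): (\d+)")),
    ("amd", "gpu_reset", re.compile(r"amdgpu.*GPU reset", re.IGNORECASE)),
    ("amd", "ring_timeout", re.compile(r"ring \S+ timeout")),
    ("intel", "gpu_hang", re.compile(r"i915.*GPU HANG", re.IGNORECASE)),
]


def classify(lines: Iterable[str]) -> list[Hit]:
    """Return one Hit per kernel-log line that matches a known crash signature."""
    hits = []
    for line in lines:
        for vendor, kind, pattern in SIGNATURES:
            match = pattern.search(line)
            if match:
                detail = match.group(1) if pattern.groups else None
                hits.append(Hit(vendor, kind, detail, line.strip()))
                break  # first matching signature wins
    return hits


if __name__ == "__main__":
    try:
        # Kernel messages from the previous boot (the one that may have crashed).
        log = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True, text=True, check=False,
        ).stdout
    except FileNotFoundError:  # no systemd journal on this system
        log = ""
    for hit in classify(log.splitlines()):
        print(hit.vendor, hit.kind, hit.detail or "-", hit.line)
```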
## 10. Data & config layout
```
~/.config/rigdoctor/config.toml   # enabled modules, thresholds, interval, trigger mode
~/.local/share/rigdoctor/logs/    # rotated crash logs (CSV/JSON)
~/.local/state/rigdoctor/         # session/min-max state, daemon status feed
```
## 11. Dependency package names — apt-only (D15)
We maintain package names for **Ubuntu/apt only**; no cross-distro mapping is built or
maintained. The set is small (filled in per module as they land):

| Logical dep | apt package |
|-------------|-------------|
| SMART | `smartmontools` |
| lm-sensors | `lm-sensors` |
| DMI/inventory | `dmidecode` |
| GUI/tray (Qt) | `python3-pyside6` |
| Tray on GNOME | `gir1.2-appindicator3-0.1` (AppIndicator) |
| Desktop notifications | `libnotify-bin` |

Module manifests still declare deps under the `system_packages.apt` and `python_deps` keys, so a
thin seam remains if another package manager is ever added — but multi-distro support is **not
a planned deliverable** (D15).
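On Ubuntu, checking whether the table's packages are already present can use `dpkg-query`. A sketch, assuming the installer treats a package as present only when the last word of its `${Status}` field is `installed` (so `deinstall ok config-files` and `not-installed` both count as missing); `APT_PACKAGES`, `is_installed()`, and `missing_deps()` are hypothetical names mirroring the table.

```python
import subprocess

# Logical dep -> apt package, mirroring the table above.
APT_PACKAGES: dict[str, str] = {
    "SMART": "smartmontools",
    "lm-sensors": "lm-sensors",
    "DMI/inventory": "dmidecode",
    "GUI/tray (Qt)": "python3-pyside6",
    "Tray on GNOME": "gir1.2-appindicator3-0.1",
    "Desktop notifications": "libnotify-bin",
}


def is_installed(package: str) -> bool:
    """True if dpkg reports the package's status word as 'installed'."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Status}", package],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # not a dpkg-based system
        return False
    words = result.stdout.split()
    # e.g. "install ok installed" -> True; "deinstall ok config-files" -> False
    return result.returncode == 0 and bool(words) and words[-1] == "installed"


def missing_deps(logical: list[str]) -> list[str]:
    """apt package names for the given logical deps that are not installed yet."""
    return [APT_PACKAGES[name] for name in logical if not is_installed(APT_PACKAGES[name])]


if __name__ == "__main__":
    print(missing_deps(list(APT_PACKAGES)))
```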