# RigDoctor — Architecture (DRAFT v0.2)

> Tech stack and key structural decisions are now settled (see `DECISIONS.md` D2, D6, D8,
> D10, D11). Items still marked **[OPEN]** are tracked there.

## 1. Principles

- **Modular core + plugins.** A small engine; every capability is a module that can be
  installed/omitted independently.
- **Capability detection over assumption.** Probe what hardware/tools exist; degrade
  gracefully.
- **Vendor & distro abstraction.** GPU and package-manager differences live behind
  interfaces, not scattered through the code (NVIDIA + apt are the first concrete impls).
- **One engine, many front-ends.** CLI, TUI, GUI, and tray are all thin front-ends over the
  same core engine. Anything the GUI/tray can do is reachable headless from the CLI.

## 2. Tech stack — *DECIDED (D2)*

- **Language:** Python 3 (target machine has Python 3.14).
- **Core / CLI / daemon:** **stdlib only**, no `pip` deps. Easy log/JSON/subprocess
  handling, tiny footprint, runs headless/over SSH.
- **TUI (M2):** stdlib `curses` / plain ANSI redraw (no deps).
- **GUI (M10) + tray (M11):** **Qt via PySide6**, one toolkit for both the desktop window
  and the `QSystemTrayIcon` menu-bar applet. PySide6 is a dependency of *only* these two
  modules (declared in their manifests' `python_deps` and in the optional `.deb`); the
  core/daemon never import Qt.
- **Installer bootstrap (M9):** for the optional `.deb`, the maintainer scripts ensure Python
  is present, then hand off to the Python installer for module selection.

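
A minimal sketch of how a front-end launcher might keep Qt out of stdlib-only processes. The helpers `pyside6_installed()` and `choose_frontend()` are hypothetical, not part of the documented CLI: `importlib.util.find_spec` checks whether PySide6 is installed without importing it, and a GUI/tray request falls back to the CLI when it is absent.

```python
import importlib.util
import sys


def pyside6_installed() -> bool:
    # find_spec locates a top-level package without executing it, so this
    # check does not pull Qt into a stdlib-only process.
    return importlib.util.find_spec("PySide6") is not None


def choose_frontend(requested: str) -> str:
    """Return the front-end to start; GUI/tray fall back to the CLI without PySide6."""
    if requested in ("gui", "tray") and not pyside6_installed():
        print(f"{requested}: PySide6 is not installed; using the CLI instead",
              file=sys.stderr)
        return "cli"
    return requested
```

Because the fallback is decided before anything imports Qt, a headless install with no PySide6 still gets a working CLI.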
## 3. Component layout

```
              +--------------------------+
              |       core engine        |   (stdlib only)
              | sources → sampler → bus  |
              +--------------------------+
               ^    ^            ^     ^
     +---------+    |            |     +---------+
     |              |            |               |
+---------+  +----------+  +-----------+  +--------------+
|   CLI   |  |  daemon  |  |    GUI    |  | tray applet  |
| (stdlib)|  |   (M3,   |  | (M10, Qt) |  |  (M11, Qt)   |
| TUI(M2) |  | systemd) |  |           |  |              |
+---------+  +----------+  +-----------+  +--------------+
```

- The **core engine** is a stdlib-only library: sources → sampler loop → an internal bus
  that fans samples out to sinks (TUI renderer, CSV/JSON logger, alert engine, report
  builder).
- The **daemon** (M3) is a long-running, stdlib-only process managed by `systemd --user`.
- The **GUI** and **tray** import PySide6 and talk to the same engine; for live status they
  can read the daemon's output (a small status file or socket) rather than re-sampling.

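
The daemon's status file could be published atomically as sketched below. This assumes a JSON file; the file name `status.json` and the helpers `write_status`/`read_status` are hypothetical. The writer fills a temp file in the same directory and renames it over the target with `os.replace`, so a GUI or tray reader sees either the old or the new status, never a half-written one.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_status(path: Path, status: dict) -> None:
    """Atomically replace the status file so readers never see a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace() is a rename
    # within one filesystem, which POSIX guarantees to be atomic.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(status, f)
            f.flush()
            os.fsync(f.fileno())  # make the new contents durable before the rename
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)  # clean up the temp file if anything failed
        raise


def read_status(path: Path) -> Optional[dict]:
    """Return the last published status, or None if nothing has been written yet."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None
```

The daemon would call `write_status()` on each update under `~/.local/state/rigdoctor/`; front-ends only ever call `read_status()`, so they never need to sample hardware themselves.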
## 4. Core engine

```
+-------------------+      +------------------+      +-------------------+
| Sources (probe)   | ---> | Sampler loop     | ---> | Sinks             |
| nvidia-smi/NVML   |      | (interval, Hz)   |      | - TUI renderer    |
| amdgpu sysfs      |      | normalizes into  |      | - CSV/JSON logger |
| hwmon/lm-sensors  |      | Sample records   |      | - Alert engine    |
| journalctl/SMART  |      |                  |      | - Report builder  |
+-------------------+      +------------------+      | - GUI/tray feed   |
                                                     +-------------------+
```

- **Sample record:** `{ ts, source, metric, value, unit }`; all records from one tick are
  flattened into a single row.
- **Sources** are pluggable; each declares which metrics it can provide and self-checks
  availability at startup. NVIDIA (`nvidia-smi`/NVML) + hwmon are the first implementations.

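
A minimal sketch of one sampler tick, assuming hypothetical names (`StaticSource`, `active_sources`, `flatten`, `run_tick`) and a `source.metric` column-naming scheme. Sources that fail their startup self-check are dropped, each remaining source emits `Sample` records, the tick's records are flattened into one row, and the same row is handed to every sink, which plays the role of the bus fan-out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class Sample:
    ts: float     # Unix time of the tick
    source: str   # e.g. "nvidia", "hwmon"
    metric: str   # e.g. "gpu_temp"
    value: float
    unit: str     # e.g. "C", "W", "MiB"


class StaticSource:
    """Hypothetical stand-in for a real source such as nvidia-smi/NVML or hwmon."""

    def __init__(self, name: str, readings: Dict[str, Tuple[float, str]],
                 present: bool = True) -> None:
        self.name = name
        self._readings = readings  # metric -> (value, unit)
        self._present = present

    def available(self) -> bool:
        # A real source would check for its tool or sysfs node here.
        return self._present

    def read(self, ts: float) -> List[Sample]:
        return [Sample(ts, self.name, metric, value, unit)
                for metric, (value, unit) in self._readings.items()]


def active_sources(sources: Iterable[StaticSource]) -> List[StaticSource]:
    """Startup self-check: keep only sources that report themselves available."""
    return [s for s in sources if s.available()]


def flatten(ts: float, samples: Iterable[Sample]) -> Row:
    """Collapse one tick's samples into a single row keyed 'source.metric'."""
    row: Row = {"ts": ts}
    for s in samples:
        row[f"{s.source}.{s.metric}"] = s.value
    return row


def run_tick(sources: Iterable[StaticSource],
             sinks: Iterable[Callable[[Row], None]]) -> Row:
    ts = time.time()
    row = flatten(ts, [sample for src in sources for sample in src.read(ts)])
    for sink in sinks:  # the bus: every sink receives the same row
        sink(row)
    return row
```

Units stay on the `Sample` records rather than in the row; a logger sink could write them once as a header instead of repeating them every tick.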
## 5. Module contract

Each module declares a manifest so the installer and engine can reason about it:

```
module:
  id: crash-logger
  name: "Crash-capture logger"
  provides: [logging]
  requires_sources: [gpu, cpu_temp]   # capabilities, not packages
  system_packages:                    # per package manager, optional
    apt: []                           # uses nvidia-smi + sysfs only
    pacman: []
    dnf: []
  python_deps: []                     # e.g. GUI/tray modules → [pyside6]
  optional_packages:
    apt: [smartmontools]              # enriches if present
  gpu_vendors: [nvidia, amd, intel]
  default_in_bundles: [essential]
```

Lifecycle hooks a module may implement: `probe()`, `collect(sample)`, `render(view)`,
`report()`, `install_hint()`. GUI/tray modules additionally declare `python_deps: [pyside6]`.

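
The manifest and hooks could map onto Python roughly as follows. This is a sketch: the `Manifest` dataclass, the `Module` base class with no-op default hooks, and the `readiness()` helper (which mirrors the installer's verify step in §8) are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Manifest:
    id: str
    name: str
    provides: List[str] = field(default_factory=list)
    requires_sources: List[str] = field(default_factory=list)  # capabilities
    system_packages: Dict[str, List[str]] = field(default_factory=dict)
    python_deps: List[str] = field(default_factory=list)
    optional_packages: Dict[str, List[str]] = field(default_factory=dict)
    gpu_vendors: List[str] = field(default_factory=list)
    default_in_bundles: List[str] = field(default_factory=list)


class Module:
    """Base class; every lifecycle hook is optional, so the defaults are no-ops."""

    manifest: Manifest

    def probe(self) -> bool:
        return True

    def collect(self, sample) -> None:
        pass

    def render(self, view) -> None:
        pass

    def report(self) -> Optional[str]:
        return None

    def install_hint(self) -> Optional[str]:
        return None


class CrashLogger(Module):
    manifest = Manifest(
        id="crash-logger",
        name="Crash-capture logger",
        provides=["logging"],
        requires_sources=["gpu", "cpu_temp"],
        optional_packages={"apt": ["smartmontools"]},
        gpu_vendors=["nvidia", "amd", "intel"],
        default_in_bundles=["essential"],
    )


def readiness(modules: Iterable[Module], available: Set[str]) -> Dict[str, str]:
    """One status line per module: missing capabilities first, then probe()."""
    status = {}
    for mod in modules:
        missing = [c for c in mod.manifest.requires_sources if c not in available]
        if missing:
            status[mod.manifest.id] = "missing sources: " + ", ".join(missing)
        elif not mod.probe():
            status[mod.manifest.id] = mod.install_hint() or "probe failed"
        else:
            status[mod.manifest.id] = "ready"
    return status
```

Checking `requires_sources` against detected capabilities before calling `probe()` keeps the "capabilities, not packages" distinction from the manifest visible in the readiness output.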
## 6. Crash-logger daemon & trigger model — *DECIDED (D6)*

The logger (M3) runs as a `systemd --user` service. Three user-selectable trigger modes:

1. **Always-on:** the service is enabled at login and samples continuously (bounded by rotation).
2. **Game-launch-triggered:** recording starts when a game/Steam session begins and stops after it.
   Detection is layered (D12) and needs no root: a precise **wrapper** (`rigdoctor wrap %command%`
   plus a global Steam compat-tool) as primary; a zero-config **watcher** (Steam `RunningAppID`
   plus a `/proc` heuristic) as fallback; and **GameMode** D-Bus signals if `gamemoded` is present.
3. **Manual:** started/stopped via the CLI (`rigdoctor record start/stop`) or the tray
   applet's quick action.

The selected mode is written to config by the installer and can be changed later via the CLI/GUI.

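
One possible shape for the `/proc` part of the fallback watcher, sketched under a loud assumption: that Steam exports a `SteamAppId` variable into the environment of the games it launches. The function name and the `proc_root` parameter (which lets tests point it at a fake tree) are hypothetical; processes whose `environ` cannot be read, such as other users' processes or ones that already exited, are skipped.

```python
from pathlib import Path
from typing import Dict


def steam_games_running(proc_root: str = "/proc") -> Dict[int, str]:
    """Map pid -> Steam app id for processes whose environment sets SteamAppId.

    Assumption: Steam exports SteamAppId to the games it launches. This is a
    heuristic fallback, not a replacement for the wrapper.
    """
    found: Dict[int, str] = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():  # skip /proc/self, /proc/cpuinfo, ...
            continue
        try:
            raw = (entry / "environ").read_bytes()  # NUL-separated KEY=VALUE pairs
        except OSError:  # not ours (PermissionError) or already exited
            continue
        for var in raw.split(b"\0"):
            key, sep, value = var.partition(b"=")
            if sep and key == b"SteamAppId" and value:
                found[int(entry.name)] = value.decode(errors="replace")
                break
    return found
```

A watcher could poll this every few seconds and start the M3 recorder when the result becomes non-empty, stopping it once the game's pid disappears.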
## 7. GUI & tray — *DECIDED (D10/D11)*

- **GUI (M10):** a PySide6 desktop app with a live dashboard (graphs/gauges), crash-log
  browser, health-report viewer, inventory view, and logger controls. Works under X11 and
  Wayland.
- **Tray (M11):** a `QSystemTrayIcon` applet in the top menu bar (StatusNotifierItem; on
  Ubuntu/GNOME surfaced via the AppIndicator extension). Its dropdown shows live M1 readouts
  (CPU temp, GPU temp, memory used/total, status dot) and actions led by **Run Diagnostic**
  (the guided diagnostic session, §7.1), plus Open dashboard / Start-Stop recording /
  Snapshot / Quit (D13).
- Both are **optional**: a headless/server install omits them and loses no diagnostic
  capability (everything is reachable from the CLI).

### 7.1 Guided diagnostic session (orchestration)

The "Run Diagnostic" flow (exposed in the tray, GUI, and CLI) is not a new module; it
orchestrates existing ones: **pick a game** (D12 detection: Steam library / recently played /
running process) → **focused capture** (M3 scoped to that game's session via the D12
wrapper/watcher) → **scan & analyze** (M4 over the captured window + system logs) →
**present prioritized findings** with suggested fixes (read-only, D9). The engine exposes it
as a single callable so all three front-ends share one implementation.

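
Because the flow is one callable, it can be sketched as plain function composition. The `Finding` type, its integer severity scale, and the injected `pick_game`/`capture`/`analyze` callables are hypothetical stand-ins for the D12, M3, and M4 pieces; front-ends would differ only in how they render the returned list.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Finding:
    severity: int        # hypothetical scale: higher means more urgent
    title: str
    suggested_fix: str   # advice only; nothing is applied (read-only, D9)


def run_diagnostic(
    pick_game: Callable[[], str],
    capture: Callable[[str], Sequence[dict]],
    analyze: Callable[[Sequence[dict]], List[Finding]],
) -> List[Finding]:
    """Pick a game, capture its session, analyze it, return findings most-urgent first."""
    game = pick_game()           # D12 detection: library / recently played / running
    window = capture(game)       # M3 scoped to this game's session
    findings = analyze(window)   # M4 over the captured window + system logs
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```

Injecting the three steps keeps the orchestration testable with fakes, and lets the tray, GUI, and CLI pass in their own game pickers while sharing the capture/analyze path.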
## 8. Installer design (M9)

1. **Detect** the GPU vendor via `lspci` (NVIDIA first) and the package manager (apt first).
2. **Present** a module menu grouped into bundles:
   - *Essential* (sensor core + crash logger + health report): the MVP, NVIDIA-only.
   - *Monitoring* (live TUI + alerts)
   - *Diagnostics* (inventory + gaming-env checks + SMART)
   - *Desktop UI* (GUI + tray applet; adds the PySide6 dependency)
   - *Custom* (pick individual modules)

   For each selection, show the exact packages that will be installed.
3. **Resolve** dependencies: the union of the selected modules' `system_packages` and
   `python_deps` for the detected package manager. If a package is missing and sudo is
   unavailable, only report it instead of installing.
4. **Install** (with explicit confirmation), **write config** (`~/.config/rigdoctor/`), and
   optionally **enable** the `systemd --user` logger service and choose its trigger mode (D6).
5. **Verify** each installed module's `probe()` and print a readiness summary.

Module list/bundling is final (D14). Packaging: a **user-local install is primary**
(self-updating from the public repo, no root; D8/D18), with an **optional `.deb`** system
package; the wizard layers module selection on top of either.

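
Step 3 can be sketched as a set union plus a small planning step. The manifest dicts mirror the keys from §5; the function names, and the idea of passing in an already-computed `missing` list and a `sudo_available` flag, are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def resolve(manifests: Iterable[dict], pkg_manager: str) -> Tuple[List[str], List[str]]:
    """Union of system packages (for one package manager) and Python deps."""
    system, python = set(), set()
    for m in manifests:
        system.update(m.get("system_packages", {}).get(pkg_manager, []))
        python.update(m.get("python_deps", []))
    return sorted(system), sorted(python)


def plan(missing: List[str], sudo_available: bool) -> Dict[str, List[str]]:
    """Split missing packages into ones to install and ones to report only."""
    if sudo_available:
        return {"install": list(missing), "report": []}
    return {"install": [], "report": list(missing)}
```

The sorted union is also what the menu in step 2 would show as "the exact packages that will be installed", so both steps read from the same data.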
## 9. GPU vendor abstraction

| Capability | NVIDIA (first) | AMD (later) | Intel (later) |
|------------|--------|-----|-------|
| Temps/clocks/power | `nvidia-smi`/NVML | `/sys/class/drm/.../device` + `rocm-smi` | `/sys` + `intel_gpu_top` |
| VRAM temp | mem-junction (often N/A on GeForce) | sysfs `mem` hwmon | n/a |
| Crash signature | Xid in dmesg | `amdgpu: GPU reset` / ring timeouts | i915 GPU hang |
| Power limit (read-only, D9) | `nvidia-smi -pl` (suggested, not applied) | sysfs `power_dpm` / `pp_*` | n/a |

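
A rough classifier for the crash signatures in the table, assuming kernel log lines from `journalctl -k` or `dmesg`. The regular expressions are approximations of common message shapes (an NVIDIA `NVRM: Xid` line, amdgpu reset and ring-timeout lines, i915 `GPU HANG`); exact wording varies by driver and kernel version, so treat them as a starting point rather than a complete catalogue. The names `SIGNATURES`, `CrashEvent`, and `classify` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# (vendor, kind, pattern). Approximations of common kernel message shapes;
# exact wording varies by driver and kernel version.
SIGNATURES = [
    ("nvidia", "xid", re.compile(r"NVRM: Xid \((?P<dev>[^)]*)\): (?P<code>\d+)")),
    ("amd", "gpu-reset", re.compile(r"amdgpu.*GPU reset")),
    ("amd", "ring-timeout", re.compile(r"amdgpu.*ring \S+ timeout")),
    ("intel", "gpu-hang", re.compile(r"i915.*GPU HANG", re.IGNORECASE)),
]


class CrashEvent(NamedTuple):
    vendor: str
    kind: str
    code: Optional[str]  # the Xid number for NVIDIA, else None
    line: str


def classify(lines: Iterable[str]) -> List[CrashEvent]:
    """Return one CrashEvent per log line that matches a known signature."""
    events = []
    for line in lines:
        for vendor, kind, pattern in SIGNATURES:
            m = pattern.search(line)
            if m:
                code = m.group("code") if "code" in pattern.groupindex else None
                events.append(CrashEvent(vendor, kind, code, line.rstrip("\n")))
                break  # first matching signature wins
    return events
```

After a hard crash, the previous boot's kernel log (`journalctl -k -b -1`, if the journal is persistent) is the natural input; keeping the patterns in one table makes adding AMD/Intel signatures later a data change rather than a code change.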
## 10. Data & config layout

```
~/.config/rigdoctor/config.toml   # enabled modules, thresholds, interval, trigger mode
~/.local/share/rigdoctor/logs/    # rotated crash logs (CSV/JSON)
~/.local/state/rigdoctor/         # session/min-max state, daemon status feed
```

## 11. Dependency package names — apt-only (D15)

We maintain package names for **Ubuntu/apt only**; no cross-distro mapping is built or
maintained. The set is small (filled in per module as they land):

| Logical dep | apt package |
|-------------|-------------|
| SMART | `smartmontools` |
| lm-sensors | `lm-sensors` |
| DMI/inventory | `dmidecode` |
| GUI/tray (Qt) | `python3-pyside6` |
| Tray on GNOME | `gir1.2-appindicator3-0.1` (AppIndicator) |
| Desktop notifications | `libnotify-bin` |

Module manifests still declare deps under `system_packages.apt` / `python_deps` keys, so a
thin seam remains if another package manager is ever added, but multi-distro support is **not
a planned deliverable** (D15).