# RigDoctor

A **modular diagnostics, monitoring, and health-check toolkit for Linux gamers.**

> **Status:** 🟢 Phase 1 (MVP) complete. The **sensor core (M1)**, **crash-capture logger
> (M3)**, and **health report (M4)** all work — live `snapshot`/`monitor`, crash-safe `record`
> with a post-crash report, and `report` to scan logs/SMART/driver for likely causes. A
> desktop GUI (M10) ties them together (dashboard, recording, health). See `docs/ROADMAP.md`.

## Why this exists

Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the screen
suddenly going black mid-game, silent thermal/VRAM throttling, power transients,
driver/library mismatches, Proton quirks, and CPU governor / power-profile misconfiguration.
The data needed to diagnose them is scattered across `nvidia-smi`, `/sys/class/hwmon`,
`journalctl`, SMART, and more — and the most useful readings (the ones right before a hard
freeze) are usually lost because nothing flushed them to disk.

RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a
one-shot health report, and an interactive installer that only sets up the modules a given
user actually needs for their hardware.

**Seed use cases:** an RTX 3070 that intermittently "falls off the bus" under heavy GPU load
(Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black mid-game.
See `docs/SPEC.md` §1.

## How you run it

RigDoctor is **GUI-first** — the desktop app is the primary way in — but every feature is
also available headless:

- **Desktop GUI** — graphical dashboard, recording controls, log browser, reports. The
  default interface for most users.
- **Tray applet** — a small top-menu-bar applet with quick actions and at-a-glance status.
- **CLI** — full functionality from the terminal; works over SSH and in scripts.

The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.

## Key decisions (settled)

| Topic | Decision |
|-------|----------|
| Name | **RigDoctor** |
| Language / stack | **Python 3 + Qt (PySide6)** — core/CLI/daemon stdlib-only; Qt only for GUI/tray |
| Primary distro | **Ubuntu** (Debian via apt); others best-effort later |
| Primary GPU | **NVIDIA** first; AMD, then Intel later |
| MVP | **Sensor core + crash logger + health report** (NVIDIA-only, CLI-first) |
| Distribution | **User-local install** (self-updating from the public repo, no root); **`.deb`** optional |
| Scope of action | **Read-only + suggestions** (no auto-apply yet) |
| Stress tests | **Out of scope** |

Full rationale and the still-open questions are in `docs/DECISIONS.md`.

## Repo layout

| Path | Purpose |
|------|---------|
| `docs/SPEC.md` | Product specification — vision, requirements, modules (the main planning doc) |
| `docs/ARCHITECTURE.md` | Technical design — core engine, front-ends, daemon, installer |
| `docs/MODULES.md` | Catalog of modules with scope, dependencies, status |
| `docs/ROADMAP.md` | Phased milestones |
| `docs/DECISIONS.md` | Decision log + remaining open questions |
| `src/rigdoctor/` | Source code — `core/` engine + sources, `cli.py`, `render.py` |
| `installer/` | Installer / `.deb` packaging (empty until Phase 4) |
| `tests/` | Tests (stdlib `unittest`) |

## Install (user-local, no root)

RigDoctor installs into a private venv under `~/.local` — no root, self-updating:

```bash
./install.sh                 # from a source checkout or the self-extracting .run
./install.sh --ref v0.0.6    # install a specific released tag (needs a token)
./install.sh --uninstall     # remove it
```

This adds `rigdoctor` / `rigdoctor-gui` to `~/.local/bin` and a desktop entry. Each release
also ships a one-file **`.run`** installer (download, `chmod +x`, run). Updates are restricted
to accounts on the Git server and require a Personal Access Token: save one via the GUI
**Setup → Update access** panel or `rigdoctor login`, then run `rigdoctor update` (or use the
sidebar button).

## Run it (dev)

Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):

```bash
PYTHONPATH=src python3 -m rigdoctor snapshot        # one-shot sensor read
PYTHONPATH=src python3 -m rigdoctor snapshot --json
PYTHONPATH=src python3 -m rigdoctor monitor -n 1    # live view (Ctrl-C to quit)
PYTHONPATH=src python3 -m rigdoctor sources         # list detected sensor sources
PYTHONPATH=src python3 -m unittest discover -s tests
```
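
For orientation, a one-shot read of one of the sources named earlier (`/sys/class/hwmon`)
could look roughly like the sketch below. It is an illustration only, not RigDoctor's code:
`read_hwmon_temps` and its `root` parameter are hypothetical names. The only thing it relies
on is the kernel's hwmon sysfs layout, where each `temp*_input` file holds an integer in
millidegrees Celsius.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"<chip>/<label>": degrees_celsius} for every readable temp input."""
    readings = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        try:
            chip = (chip_dir / "name").read_text().strip()
        except OSError:
            chip = chip_dir.name  # fall back to "hwmon0", "hwmon1", ...
        for temp_file in sorted(chip_dir.glob("temp*_input")):
            label_file = temp_file.with_name(temp_file.name.replace("_input", "_label"))
            try:
                label = label_file.read_text().strip()
            except OSError:
                label = temp_file.name[: -len("_input")]  # e.g. "temp1"
            try:
                # The kernel reports millidegrees Celsius as an integer.
                readings[f"{chip}/{label}"] = int(temp_file.read_text()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor unreadable right now; skip it
    return readings


if __name__ == "__main__":
    for name, value in read_hwmon_temps().items():
        print(f"{name}: {value:.1f} °C")
```

Passing `root` explicitly keeps the function testable against a fake directory tree, which
suits a stdlib-only `unittest` suite.
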

### Crash-capture logger (M3)

A crash-safe background logger (JSONL, `fsync` per sample, bounded by rotation) for catching
the state right before a freeze:

```bash
rigdoctor record start     # start logging in the background
rigdoctor record status    # is it running? latest readings, sample count
rigdoctor record stop      # stop it
rigdoctor record report    # post-crash summary: peaks, events, last samples
rigdoctor record run       # run in the foreground (the systemd-ready entrypoint)
```
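
The crash-safety idea behind the logger (one JSON object per line, `fsync` after each sample,
size-bounded rotation) can be sketched as follows. This is a minimal illustration, not
RigDoctor's actual logger: the `JsonlRecorder` class, the `samples.jsonl` file name, and the
`max_bytes` / `keep` parameters are all assumptions made for the example.

```python
import json
import os
from pathlib import Path


class JsonlRecorder:
    """Append samples as JSON lines, fsync each one, rotate by size."""

    def __init__(self, directory, max_bytes=1_000_000, keep=3):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.path = self.directory / "samples.jsonl"  # hypothetical file name
        self.max_bytes = max_bytes
        self.keep = keep  # number of rotated files to retain (>= 1)

    def _numbered(self, i):
        return self.path.with_name(f"{self.path.name}.{i}")

    def _rotate(self):
        # Drop the oldest file, then shift .1 -> .2, ... and current -> .1.
        oldest = self._numbered(self.keep)
        if oldest.exists():
            oldest.unlink()
        for i in range(self.keep - 1, 0, -1):
            src = self._numbered(i)
            if src.exists():
                src.replace(self._numbered(i + 1))
        self.path.replace(self._numbered(1))

    def write(self, sample):
        line = json.dumps(sample, separators=(",", ":")) + "\n"
        if self.path.exists():
            projected = self.path.stat().st_size + len(line.encode("utf-8"))
            if projected > self.max_bytes:
                self._rotate()
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(line)
            fh.flush()             # push Python's buffer to the OS
            os.fsync(fh.fileno())  # ask the OS to commit it to disk
```

Syncing after every sample costs some I/O, but it means the last line written before a hard
freeze has the best chance of surviving on disk. Because each line is a complete JSON
object, a truncated final line can simply be skipped when reading the log back.
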

Logs live in `~/.local/share/rigdoctor/logs/`. The logger detects a GPU "lost"/hang condition
(an `nvidia-smi` query timeout) and writes an event marker. Trigger modes (always-on /
game-launch) and the `systemd --user` service arrive in Phase 4.
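
Treating a query timeout as a hang signal might look like the sketch below. The function
name `query_gpu` and the event dictionaries (`gpu_query_timeout` and so on) are hypothetical,
chosen for illustration. The `nvidia-smi` flags shown are standard query options, but the
exact query RigDoctor runs is an assumption.

```python
import subprocess
import time

# Assumed query; RigDoctor's real field list may differ.
NVIDIA_QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def query_gpu(cmd=NVIDIA_QUERY, timeout_s=5.0):
    """Run a sensor query; return (output, event). event is None on success."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The tool did not answer in time: treat it as a possible GPU hang.
        return None, {"event": "gpu_query_timeout", "ts": time.time(), "timeout_s": timeout_s}
    except FileNotFoundError:
        return None, {"event": "gpu_tool_missing", "ts": time.time()}
    if proc.returncode != 0:
        return None, {"event": "gpu_query_failed", "ts": time.time(), "stderr": proc.stderr.strip()}
    return proc.stdout.strip(), None


if __name__ == "__main__":
    output, event = query_gpu()
    print(event if event else output)
```

Returning the event as data, rather than raising, lets the caller append it to the same log
as the regular samples so the marker sits in sequence with the readings around it.
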

### Desktop GUI (M10)

The GUI uses PySide6 (Qt) — the only part of RigDoctor that needs a non-stdlib dep:

```bash
pip install -e '.[gui]'    # core + PySide6, gives `rigdoctor` and `rigdoctor-gui`
rigdoctor gui              # or: rigdoctor-gui
```

It opens a dark-themed window with sidebar navigation and a **live dashboard** over the
same sensor core — circular gauges for the headline metrics plus collapsible per-subsystem
cards (GPU/CPU/memory/storage) with temperature-colored values (icy-blue → green → red).
The **Logs** and **Health** sections are full pages: Logs holds the recording controls and
post-crash report, and Health holds the kernel-log / SMART / driver scan. **Inventory** is a
placeholder until M5 lands.
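
One way to produce a blue → green → red temperature gradient like the one described is linear
interpolation between three color stops. The sketch below is a guess at the idea, not the
GUI's code: `temp_color`, the threshold temperatures (30/60/90 °C), and the RGB values are
all made up for illustration.

```python
def temp_color(temp_c, cold=30.0, warm=60.0, hot=90.0):
    """Map a temperature to an '#rrggbb' string: icy-blue -> green -> red."""
    stops = [
        (cold, (0x8F, 0xD8, 0xFF)),  # icy blue (assumed shade)
        (warm, (0x4C, 0xC3, 0x5B)),  # green (assumed shade)
        (hot, (0xE5, 0x39, 0x35)),   # red (assumed shade)
    ]
    if temp_c <= stops[0][0]:
        rgb = stops[0][1]  # clamp below the coldest stop
    elif temp_c >= stops[-1][0]:
        rgb = stops[-1][1]  # clamp above the hottest stop
    else:
        for (t0, c0), (t1, c1) in zip(stops, stops[1:]):
            if t0 <= temp_c <= t1:
                # Blend each channel linearly between the two surrounding stops.
                f = (temp_c - t0) / (t1 - t0)
                rgb = tuple(round(a + (b - a) * f) for a, b in zip(c0, c1))
                break
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

The returned string can be used anywhere a hex color is accepted, for example in a Qt style
sheet. Keeping the mapping free of Qt imports means it can be unit-tested headless.
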

Without the GUI extra, `pip install -e .` gives just the stdlib-only CLI.
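
Keeping Qt optional usually comes down to checking for the dependency before taking the GUI
path. The sketch below shows that pattern under stated assumptions: `has_module` and `launch`
are hypothetical helpers, and the real `rigdoctor` entry point may be organised differently.

```python
import importlib.util


def has_module(name):
    """True if the top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def launch(argv, gui_module="PySide6"):
    """Pick a front-end: the GUI only when asked for and Qt is installed."""
    wants_gui = bool(argv) and argv[0] == "gui"
    if wants_gui and not has_module(gui_module):
        return "error: GUI extra not installed; try: pip install -e '.[gui]'"
    return "gui" if wants_gui else "cli"


if __name__ == "__main__":
    import sys

    print(launch(sys.argv[1:]))
```

Because the check uses `find_spec` rather than a top-level `import`, CLI-only commands never
load Qt, and a headless install fails only when the GUI is explicitly requested.
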

## Start here

1. Read `docs/SPEC.md` for what we're building.
2. Read `docs/ROADMAP.md` for the build order (Phase 1 = the MVP).
3. Read `docs/DECISIONS.md` for the settled decisions (D1–D15).