rigdoctor/docs/DECISIONS.md
jessey 09cbc57b8c
feat: in-app uninstaller, changelog viewer, version automation (0.1.0)
First milestone release — a complete, installable, self-updating RigDoctor:
live monitoring, crash capture + health report, desktop GUI, user-local
install/uninstall, and token-gated self-update with real release notes.

- feat(gui): in-app uninstaller — Setup "Uninstall RigDoctor" button and
  `rigdoctor uninstall [--purge]`; removes venv/launchers/desktop entry
  (detached so it can delete its own venv), with optional purge of
  settings/token/logs (core/uninstall.py)
- feat(gui): in-app changelog — sidebar "Changelog" link listing release
  history fetched from the update server (updates.list_releases)
- chore: versioning rules + automation (D21) — git-cliff --bumped-version,
  packaging/bump.sh, cliff.toml [bump] (pre-1.0: breaking -> minor)
- chore(release): stamp 0.1.0; milestone policy recorded in D19

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:42:29 +02:00

# RigDoctor — Decisions & Open Questions
Format: each item is **OPEN** (needs a call) or **DECIDED** (with date + rationale).
Decisions D1–D19 are settled (D1–D15 on 2026-05-21); the original open questions are kept
below with their resolutions so the reasoning is traceable. No tracked decisions are
currently open.
## Decided
### D1 — Project name — *DECIDED 2026-05-21*
**RigDoctor.** Confirmed as the final name (repo, package, and CLI command `rigdoctor`).
Alternatives (RigWatch, GameDoc, Penguin Pit Crew, LGD) dropped.
### D2 — Language / runtime — *DECIDED 2026-05-21*
**Python 3 + Qt (PySide6).**
- *Why Python:* fastest AI-assisted development (largest codegen corpus) and a perfect fit
for the real workload — parsing `nvidia-smi`/sysfs/`journalctl`, CSV/JSON, subprocess.
- *Why Qt/PySide6:* one toolkit covers **both** the desktop GUI and the system-tray applet.
- *Layering that preserves "low overhead":* the **core engine, CLI, and crash-logger daemon
stay stdlib-only** (no hard deps, tiny footprint); **only the GUI and tray modules pull in
PySide6**. This maps cleanly onto the modular installer — a headless/server user never
installs Qt.
- *Trade-off accepted:* the GUI carries a Qt runtime dependency (not a single static binary).
Mitigated by shipping a `.deb` that declares `python3` + `python3-pyside6` (see D8).
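
A minimal sketch of how the stdlib-only layering could be enforced at launch: the core probes for PySide6 without importing it and falls back to the CLI when Qt is absent. The function names and the `rigdoctor.gui` module path mentioned in the comment are assumptions for illustration, not the project's actual layout.

```python
import importlib.util


def gui_available() -> bool:
    """Return True if PySide6 is installed, without importing it."""
    return importlib.util.find_spec("PySide6") is not None


def choose_front_end(prefer_gui: bool = True) -> str:
    """Pick a front end; core code never imports Qt at module level."""
    if prefer_gui and gui_available():
        # A deferred import would go here, e.g. `from rigdoctor.gui import main`
        # (hypothetical module path), so headless installs never touch Qt.
        return "gui"
    return "cli"
```

Because the probe uses `find_spec`, a headless machine pays no import cost and never needs Qt installed.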
### D3 — Distro priority order — *DECIDED 2026-05-21*
**Ubuntu first**, by an explicit margin. Debian comes along for free via `apt`. Arch
(`pacman`) / Fedora (`dnf`) / openSUSE (`zypper`) are best-effort later. The package-manager
and distro abstraction stays in the design so other distros can be added, but all primary
development, testing, and packaging target Ubuntu.
### D4 — GPU vendor priority — *DECIDED 2026-05-21*
**NVIDIA first.** It's the seed hardware (RTX 3070) and the source of the motivating crash.
AMD and Intel come later behind the vendor abstraction; nothing should hard-code NVIDIA in a
way that blocks them.
### D5 — MVP scope — *DECIDED 2026-05-21*
**M1 + M3 + M4 (the *Essential* bundle), NVIDIA-only.** This was the first build target — it
captures the seed crash and explains the logs before any installer, multi-vendor, or other follow-on work.
*(The MVP was built CLI-first; per D17 the GUI is now the primary interface going forward —
the CLI keeps full parity.)*
### D6 — Crash-logger trigger model — *DECIDED 2026-05-21*
**Let the user choose.** All three modes are supported and selectable (installer + config):
1. **Always-on** `systemd --user` service.
2. **Game-launch-triggered** (auto-start when a game/Steam session starts, stop after).
3. **Manual** (CLI command, or the tray applet's "start recording" button).
*Open when decided, since resolved:* the exact game-launch detection mechanism — see D12.
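
The three selectable modes could be modelled as a small enum that the installer and config loader share. This is a sketch only: the string values and the choice of `MANUAL` as the fallback are assumptions, not settings the project defines.

```python
from enum import Enum


class TriggerMode(Enum):
    ALWAYS_ON = "always-on"      # systemd --user service
    GAME_LAUNCH = "game-launch"  # start/stop around a game session
    MANUAL = "manual"            # CLI command or tray button


def parse_trigger_mode(
    value: str, default: TriggerMode = TriggerMode.MANUAL
) -> TriggerMode:
    """Map a config string to a mode, tolerating case and underscores."""
    normalized = value.strip().lower().replace("_", "-")
    for mode in TriggerMode:
        if mode.value == normalized:
            return mode
    return default  # unknown values fall back instead of raising
```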
### D7 — Stress / repro module — *DECIDED 2026-05-21*
**Out of scope. Module M7 is dropped.** RigDoctor will not build or bundle stress/load
generators. Users who want to reproduce load can run existing tools (gpu-burn, vkmark,
stress-ng) themselves alongside the logger.
### D8 — Distribution / packaging — *DECIDED 2026-05-21; revised 2026-05-21 (see D18)*
**Primary: a user-local install** (pipx/venv or a versioned bundle under `~/.local`, owned by
the user) so the app can **self-update from the public Gitea releases with no root** (D18). A
**`.deb` remains an optional** system-install channel for users who prefer it (updated via
apt). *Why the revision:* the repo is public and we want frictionless, GUI-first self-updates,
which a root-owned system package can't apply silently. The interactive installer (M9) layers
module selection on top of either channel. AUR / Flatpak / COPR still later, if warranted.
### D9 — Scope of action (read-only vs apply-fixes) — *DECIDED 2026-05-21*
**Read-only + suggestions.** RigDoctor diagnoses, monitors, and **suggests** actions in
plain language (with the exact command where possible), but does **not** apply changes
itself in this stage. Auto-applying fixes (governor, power profile, etc.) is a deliberate
later milestone, gated behind explicit user consent when it lands.
### D10 — GUI is a first-class deliverable — *DECIDED 2026-05-21*
The app must run **three ways**: (a) **CLI-only / headless** (full functionality from the
terminal, works over SSH), (b) a **desktop GUI**, and (c) a **system-tray / top-menu-bar
applet** with quick actions. This supersedes the original "terminal-first, GUI maybe later"
non-goal. GUI and tray are separate optional modules over the shared core engine.
### D11 — Tray / menu-bar applet — *DECIDED 2026-05-21*
A small always-available applet in the Linux top menu bar (system tray / StatusNotifierItem,
via Qt's `QSystemTrayIcon`; on Ubuntu/GNOME this surfaces through the AppIndicator
extension). Provides quick actions and at-a-glance status.
*Open when decided, since resolved:* the exact set of quick actions/indicators — see D13.
### D12 — Game-launch detection mechanism — *DECIDED 2026-05-21*
**Layered approach, no root** (logger stays a `systemd --user` service):
1. **Wrapper (precise, primary):** `rigdoctor wrap %command%` for per-game Steam launch
options, plus an installer helper that registers RigDoctor as a **global Steam
compatibility tool** (covers all Proton games without per-game edits). The same wrapper
field works in Lutris/Heroic. Deterministic start/stop, knows the title, needs no
watcher daemon. *Build first.*
2. **Zero-config watcher (fallback):** low-frequency poll of Steam's `RunningAppID`
(`~/.steam/registry.vdf`) plus a `/proc` heuristic for non-Steam launchers, for users
who won't edit launch options. *Build later.*
3. **GameMode (opportunistic):** if Feral `gamemoded` is present, use its D-Bus
`GameRegistered`/`GameUnregistered` signals (via `gdbus`/`busctl` — no Python dbus dep).
- *Explicitly rejected:* root-only kernel mechanisms (proc-connector netlink `PROC_EVENTS`,
eBPF) — they'd force the logger to run as root.
- *Phasing:* wrapper ships with the game-launch trigger mode (Phase 4); watcher + GameMode
follow.
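
For the zero-config watcher (layer 2), reading `RunningAppID` could be as simple as a regex over the text of `registry.vdf`. The sketch below assumes the key appears as a quoted key/value pair and that a value of `0` means no game is running; both should be checked against a real Steam install.

```python
import re
from typing import Optional

# Matches a VDF pair such as:  "RunningAppID"    "730"
_RUNNING_APP_RE = re.compile(r'"RunningAppID"\s+"(\d+)"')


def running_app_id(vdf_text: str) -> Optional[int]:
    """Return the running Steam app ID, or None if no game is running."""
    match = _RUNNING_APP_RE.search(vdf_text)
    if match is None:
        return None
    app_id = int(match.group(1))
    # Assumption: Steam writes "0" when nothing is running.
    return app_id if app_id != 0 else None
```

A poller would read the file at a low frequency and start or stop the logger when the returned value changes.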
### D13 — Tray / menu-bar applet: actions & indicators — *DECIDED 2026-05-21*
**Live readouts (from M1) + a Run Diagnostic action.**
- **At-a-glance live data** shown inline in the tray dropdown, refreshed periodically:
**CPU temp, GPU temp, memory used/total** (e.g. "14 GB / 32 GB"). A status dot
(normal / throttling / alert) is proposed alongside.
- **Run Diagnostic** — the primary action. Launches the **guided diagnostic session**
(SPEC §4): prompts *which game to focus on*, starts a focused log collection for that
game's session (M3, scoped via the D12 game detection), then scans/analyzes (M4) and
presents the findings.
- **Supporting actions (proposed minimal set):** Open dashboard (M10), Start/Stop recording
(manual trigger), Snapshot now, Quit.
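
The memory readout shown in the tray (for example "14 GB / 32 GB") could be derived from `/proc/meminfo` with the standard library alone. This is a sketch: treating "used" as `MemTotal - MemAvailable` and rounding to whole units are assumptions about how the readout should behave.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; sizes are in kB (1024 bytes)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_readout(meminfo_text: str) -> str:
    """Format memory as 'used GB / total GB' for the tray dropdown."""
    info = parse_meminfo(meminfo_text)
    total_kb = info["MemTotal"]
    used_kb = total_kb - info["MemAvailable"]  # assumption: used = total - available
    kb_per_gb = 1024 * 1024
    return f"{round(used_kb / kb_per_gb)} GB / {round(total_kb / kb_per_gb)} GB"
```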
### D14 — Final installer module list & bundles — *DECIDED 2026-05-21*
**Use the current `MODULES.md` catalog and bundles as final.** Modules: M1, M2, M3, M4, M5,
M6, M8, M9, M10, M11 (M7 dropped). Bundles: Essential / Monitoring / Diagnostics /
Desktop UI (+ Custom). No further additions planned for v1.
### D15 — Distro package-name mapping → apt-only — *DECIDED 2026-05-21*
*What it was:* RigDoctor's optional modules need a few system packages (smartmontools,
lm-sensors, dmidecode, python3-pyside6, AppIndicator). The same tool is named differently
per distro (e.g. `lm-sensors` on apt vs `lm_sensors` on pacman/dnf; Qt is `python3-pyside6`
on apt). Supporting multiple distros would require a table mapping each logical dependency to
the right package name per package manager.
*Decision:* **apt-only.** We maintain package names for **Ubuntu/apt only** and do **not**
build or maintain mappings for other package managers. A thin seam is left in the design so
another package manager *could* be added later, but multi-distro support is **not** a planned
deliverable. Revisit only if Ubuntu-only proves too narrow.
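
Under the apt-only decision, the "thin seam" could be a single mapping from logical dependency to apt package name, used to build a suggested command (in keeping with the read-only stance in D9). The logical keys below are hypothetical; the package names are the ones listed above, and AppIndicator is omitted because its apt package name is not given here.

```python
from typing import Iterable

# Logical dependency -> apt package name. Keys are illustrative only.
APT_PACKAGES = {
    "smart": "smartmontools",
    "sensors": "lm-sensors",
    "dmi": "dmidecode",
    "qt": "python3-pyside6",
}


def suggest_apt_command(dependencies: Iterable[str]) -> str:
    """Return the apt command to suggest; nothing is executed."""
    packages = sorted({APT_PACKAGES[name] for name in dependencies})
    return "sudo apt install " + " ".join(packages)
```

Adding another package manager later would mean adding a second mapping behind the same function, which is the seam the decision leaves open.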
### D16 — Session sharing / remote assist (M12) — *DECIDED 2026-05-21*
Build a **session-sharing / remote-assist** capability (new module **M12**) so a user (A)
can let a helper (B) inspect their machine. **Full ladder, built in order:**
1. **Diagnostic bundle export** — `share export` packages inventory (M5) + recent capture
log (M3) + a report into one file A sends to B; B opens it in RigDoctor. One-way, no live
connection. Safest; build first.
2. **Live read-only view** — a small local server serving the live dashboard + logs
read-only, reached over a **user-chosen tunnel** (Tailscale / cloudflared / SSH reverse
tunnel — *no RigDoctor-hosted relay*, to keep the no-telemetry promise). Token-gated,
short TTL, A approves and can kill instantly. No terminal.
3. **Gated interactive terminal** — wrap an existing trusted tool (`tmate`/`sshx`) rather
than rolling our own; **read-only link by default**, read-write requires explicit
per-session consent. This is a deliberate, consent-gated exception to the read-only stance
(D9) — it's full machine access and must be treated as such.
*Cross-cutting principles:* explicit per-session consent; ephemeral, revocable tokens;
clear permission escalation (view ≠ shell); no mandatory central relay; session audit log.
*Note:* this adds M12 on top of the "final" list from D14; the catalog is updated accordingly.
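
The "ephemeral, revocable tokens" principle could look like the following sketch. The class name, field names, and 32-byte token length are assumptions; M12's real token format is still to be specified.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShareToken:
    """A short-lived token that user A can revoke at any time."""

    ttl_seconds: float
    value: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    issued_at: float = field(default_factory=time.monotonic)
    revoked: bool = False

    def revoke(self) -> None:
        self.revoked = True

    def is_valid(self, presented: str, now: Optional[float] = None) -> bool:
        """Check the presented token in constant time, then TTL and revocation."""
        current = time.monotonic() if now is None else now
        expired = current - self.issued_at > self.ttl_seconds
        matches = secrets.compare_digest(presented.encode(), self.value.encode())
        return matches and not expired and not self.revoked
```

Using a monotonic clock keeps the TTL immune to wall-clock changes during a session.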
### D17 — GUI-first interface emphasis — *DECIDED 2026-05-21*
The **desktop GUI (M10) is the primary, default interface** for end users — it's the more
user-friendly way in, and **every capability** (recording, reports, status, …) must be
reachable from it. This **supersedes the earlier "CLI-first / terminal-first" framing**
(updates D5 and the SPEC wording).
- *The CLI is not removed:* it keeps **full functionality** for headless / SSH / server /
scripting use, and it's the engine the background daemon runs on.
- *No change to layering (D2):* the core, CLI, and daemon stay **stdlib-only** and must run
without Qt. "GUI-first" is about emphasis and front-end parity, not dropping headless support.
### D18 — Auto-update (M13) — *PLANNED 2026-05-21; mechanism revised 2026-05-21*
RigDoctor should **check for a newer version on launch and self-update** (new module **M13**).
**Mechanism (revised): user-local, no-root self-update over authenticated HTTP (token).**
*Why revised:* the Gitea instance requires sign-in for **all** access (anonymous requests to
the repo page, releases feed, raw files, and API all return 303/403), so the original "public HTTP" plan can't
work. Updates are therefore **gated to people with an account on the Gitea server**, which is
desirable — access control is delegated to Gitea.
- *Auth:* each user creates a **Personal Access Token** (scope `read:repository`); RigDoctor
stores it at `~/.config/rigdoctor/token` (mode 0600) or reads `RIGDOCTOR_TOKEN`. Requests
send `Authorization: token <PAT>`. Finer-grained access is managed through repo visibility and collaborators on Gitea.
- *Check:* `GET /api/v1/repos/jessey/rigdoctor/releases/latest` with the token; compare tags.
- *Apply:* `pip install --upgrade "git+https://oauth2:<token>@…/rigdoctor.git@<tag>"` into the
user-local venv, then restart (incl. the daemon). No root.
- *States surfaced:* no-token → "connect to update server"; auth error → "access denied";
newer → "Update to v…"; else "up-to-date".
- *Original (now-superseded) plan was anonymous public HTTP:*
- *Install model (D8 revised):* primary install is **user-local** (`~/.local`), so the running
app can replace its own files and update with **no apt, no root, no password prompt**.
- *Check:* on launch, query the **public Gitea releases API**
(`/api/v1/repos/jessey/rigdoctor/releases/latest`) over HTTPS; compare to the running version.
- *Apply:* download the new release bundle, **verify checksum/signature**, stage it
(e.g. `~/.local/share/rigdoctor/versions/x.y.z`), swap a symlink atomically, then restart
(including the `systemd --user` daemon).
- *GUI-first (D17):* a non-intrusive "update available" prompt + one-click apply; `rigdoctor
update` in the CLI.
- *Security:* HTTPS only; verify checksum/signature before swapping; never run unverified code.
- *Privacy (no telemetry):* version-check only — no tracking; auto-check is opt-out-able.
- *`.deb` users:* the optional `.deb` channel updates via apt instead; auto-update targets the
user-local install.
- *Caveat (to confirm before building):* the Gitea instance currently **requires sign-in for
API calls** (`"Only signed in user is allowed to call APIs."`), so anonymous version checks
need the instance/repo set to allow anonymous access — or a separate public version endpoint
(e.g. a static file or a mirror).
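
The revised, token-gated check in D18 could be sketched with the standard library as below. The base URL is a placeholder, checking the environment variable before the token file is an assumed order, and treating 401 alongside 403 as "access denied" is also an assumption.

```python
import os
import urllib.request
from pathlib import Path
from typing import Mapping, Optional

BASE_URL = "https://gitea.example.invalid"  # placeholder, not the real server
RELEASES_PATH = "/api/v1/repos/jessey/rigdoctor/releases/latest"


def load_token(
    env: Optional[Mapping[str, str]] = None, token_file: Optional[Path] = None
) -> Optional[str]:
    """Read the PAT from RIGDOCTOR_TOKEN, else from the user's token file."""
    env = os.environ if env is None else env
    token = env.get("RIGDOCTOR_TOKEN", "").strip()
    if token:
        return token
    path = token_file or Path.home() / ".config" / "rigdoctor" / "token"
    try:
        return path.read_text().strip() or None
    except OSError:
        return None


def build_check_request(token: str) -> urllib.request.Request:
    """Build the authenticated 'latest release' request (not sent here)."""
    return urllib.request.Request(
        BASE_URL + RELEASES_PATH,
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
    )


def update_state(
    token: Optional[str], http_status: Optional[int], newer_tag: Optional[str]
) -> str:
    """Map the check result onto the user-facing states listed in D18."""
    if not token:
        return "connect to update server"
    if http_status in (401, 403):
        return "access denied"
    if newer_tag:
        return f"Update to v{newer_tag.lstrip('vV')}"
    return "up-to-date"
```

Keeping request construction separate from state mapping lets both be tested without touching the network.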
### D19 — Versioning & changelog — *DECIDED 2026-05-21*
**Track a version number on every change.** SemVer-style `MAJOR.MINOR.PATCH` (pre-1.0: bump
PATCH for ordinary changes, MINOR for larger milestones). `__version__`
(`rigdoctor/__init__.py`) and `pyproject.toml` are the single source of truth and **must match
the git release tag** so the auto-updater (D18) can compare versions. Every change updates
`CHANGELOG.md` — now generated from **Conventional Commits** via git-cliff (see D20).
*Milestone policy (pre-1.0):* **0.0.x** = early development; **0.1.0** = first complete,
installable, self-updating release (reached 2026-05-21); **0.x.0** = each later milestone
(AMD/Intel, unattended logger auto-start, session sharing…); **1.0.0** = broadly stable
(multi-vendor/distro, no major caveats). PATCH (`0.x.PATCH`) for fixes/small changes. *Note:* an early placeholder `0.1.0` was corrected to
follow the released **0.0.x** line — first release was **V0.0.1**; current is **0.0.2**.
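
Because the updater compares the running `__version__` with the release tag, the comparison has to be numeric rather than string-based (otherwise `0.10.0` would sort below `0.9.0`). A minimal sketch, assuming tags are plain `MAJOR.MINOR.PATCH` with an optional leading `v` or `V`:

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Turn 'v0.1.0', 'V0.0.1' or '0.0.2' into a comparable tuple."""
    major, minor, patch = tag.strip().lstrip("vV").split(".")
    return int(major), int(minor), int(patch)


def is_newer(release_tag: str, running_version: str) -> bool:
    """True when the release tag is strictly newer than the running version."""
    return parse_version(release_tag) > parse_version(running_version)
```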
### D20 — Automated changelog & release notes — *DECIDED 2026-05-21*
**Release notes are generated from our changes, surfaced in the auto-updater.**
- *Release body:* CI sets each Gitea release's `body` from the matching `CHANGELOG.md`
section (was a hardcoded "Automated release for…"). The updater fetches the release `body`
and shows **"What's new"** — a dialog before applying (GUI) and in `rigdoctor update` (CLI).
- *Generation:* adopt **Conventional Commits** (`feat:`/`fix:`/`docs:`/`chore:` …) and
**git-cliff** (`cliff.toml`, `packaging/changelog.sh`) to generate `CHANGELOG.md` from
commit history. Refines D19's "hand-write CHANGELOG" to "generate it from conventional
commits"; `__version__`/`pyproject.toml`/tag still the source of truth for the version.
- *CI does not auto-commit the changelog* (avoids push loops) — it's regenerated by the dev
via the script when cutting a version; CI only reads the section for the release body.
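
The CI step that reads the matching `CHANGELOG.md` section for the release body could work along these lines. The sketch assumes version headings of the form `## [0.1.0] - <date>` (or `## 0.1.0`); the actual heading format depends on the `cliff.toml` template.

```python
import re


def changelog_section(changelog: str, version: str) -> str:
    """Return the body under the heading for `version`, or '' if absent."""
    heading = re.compile(r"^##\s+\[?v?" + re.escape(version) + r"\]?(\s|$)")
    body = []
    capturing = False
    for line in changelog.splitlines():
        if line.startswith("## "):
            if capturing:
                break  # reached the next version's heading
            capturing = bool(heading.match(line))
            continue
        if capturing:
            body.append(line)
    return "\n".join(body).strip()
```

An empty result is a useful signal for CI to fail the release rather than publish empty notes.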
### D21 — Versioning rules & automation — *DECIDED 2026-05-21*
The next version is **determined by the Conventional Commit types** since the last release
(D20), so it can be auto-computed instead of guessed:
- `fix:` / `perf:` → bump **PATCH**.
- `feat:` → bump **MINOR** (pre-1.0: `0.MINOR.0`).
- breaking (`feat!:` / `BREAKING CHANGE:`) → pre-1.0: bump **MINOR** (not major); post-1.0: MAJOR.
- `docs:` / `chore:` / `refactor:` / `ci:` / `test:` / `style:` alone → **PATCH** (no feature release).
- Milestone overrides by hand are allowed (e.g., jumping to `1.0.0`); see the milestone policy in D19.
*Automation:* `git-cliff --bumped-version` computes the next version from history;
`packaging/bump.sh` writes it into `__init__.py` + `pyproject.toml`. Rules live in
`cliff.toml [bump]` (pre-1.0: `breaking_always_bump_major = false`).
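
The bump rules above can be expressed directly in a few lines. This sketch mirrors the rules as written, not git-cliff's internal implementation, and the function names are illustrative.

```python
import re
from typing import Iterable, Tuple

Version = Tuple[int, int, int]

# Conventional Commit header: type, optional (scope), optional "!" marker.
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?:")


def bump_level(messages: Iterable[str]) -> str:
    """Return 'breaking', 'minor' or 'patch' for commits since the last release."""
    level = "patch"  # fix/perf/docs/chore/... all bump PATCH
    for message in messages:
        header = message.splitlines()[0] if message else ""
        match = _HEADER.match(header)
        if "BREAKING CHANGE:" in message or (match and match.group("bang")):
            return "breaking"
        if match and match.group("type") == "feat":
            level = "minor"
    return level


def next_version(current: Version, messages: Iterable[str]) -> Version:
    """Apply the rules: pre-1.0, a breaking change bumps MINOR, not MAJOR."""
    major, minor, patch = current
    level = bump_level(messages)
    if level == "breaking":
        return (major + 1, 0, 0) if major >= 1 else (0, minor + 1, 0)
    if level == "minor":
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)
```

Milestone overrides by hand (for example jumping to `1.0.0`) would bypass this function entirely.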
## Open
None currently — all tracked decisions (D1–D21) are resolved. New questions will be added
here as they arise. Remaining detail to flesh out during build: the tray's supporting-action
set (D13), per-module apt package names, M12's tunnel/token specifics, and M13's
update mechanism (APT repo vs. self-installed `.deb`).