New optional module (D24): explains the collected findings in plain language;
the model is contacted ONLY on an explicit user action (never automatic).
- core/ai.py: provider chosen explicitly (no default) — ollama (local) or claude
(Anthropic Messages API via stdlib urllib; key in keyring). Grounded prompt;
HTTP error parsing; one-shot (no thinking/caching — snappy).
- core/ai_knowledge.py: curated reference KB (Xid/SMART/Proton/tunables),
exact keyword/code match ("RAG-lite", no embeddings) injected into the prompt —
lifts local models, sharpens Claude. No fine-tuning.
- config: ai_provider/ai_model/ai_endpoint + keyring-backed AI key (generalized
the token keyring helpers).
- GUI: Settings → AI assistant (provider radios, model/endpoint/key, Save/Test);
"Explain with AI" button on the diagnostic dialog (consent prompt for cloud).
- CLI: `rigdoctor ai status|test|explain`.
- Docs: D24, SPEC/MODULES/ROADMAP (Phase 7); tests for providers/grounding/parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RigDoctor — Decisions & Open Questions
Format: each item is OPEN (needs a call) or DECIDED (with date + rationale). Decisions D1–D24 are settled (D1–D21 on 2026-05-21, D22–D24 on 2026-05-22); the original open questions are kept below with their resolutions so the reasoning is traceable. No tracked decisions are currently open.
Decided
D1 — Project name — DECIDED 2026-05-21
RigDoctor. Confirmed as the final name (repo, package, and CLI command rigdoctor).
Alternatives (RigWatch, GameDoc, Penguin Pit Crew, LGD) dropped.
D2 — Language / runtime — DECIDED 2026-05-21
Python 3 + Qt (PySide6).
- Why Python: fastest AI-assisted development (largest codegen corpus) and a perfect fit for the real workload: parsing nvidia-smi/sysfs/journalctl, CSV/JSON, subprocess.
- Why Qt/PySide6: one toolkit covers both the desktop GUI and the system-tray applet.
- Layering that preserves "low overhead": the core engine, CLI, and crash-logger daemon stay stdlib-only (no hard deps, tiny footprint); only the GUI and tray modules pull in PySide6. This maps cleanly onto the modular installer: a headless/server user never installs Qt.
- Trade-off accepted: the GUI carries a Qt runtime dependency (not a single static binary). Mitigated by shipping a .deb that declares python3 + python3-pyside6 (see D8).
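The layering rule above (stdlib-only core, Qt only behind the GUI) can be illustrated with a deferred import. This is a minimal sketch, not RigDoctor's actual entry point: the function names and the "gui" subcommand are assumptions made for illustration.

```python
import importlib.util
import sys


def gui_available() -> bool:
    """True if PySide6 is installed; checks without importing Qt."""
    return importlib.util.find_spec("PySide6") is not None


def main(argv: list[str]) -> int:
    # Hypothetical dispatcher: only the "gui" subcommand touches Qt.
    if argv[:1] == ["gui"]:
        if not gui_available():
            print("GUI module not installed; the CLI still works.", file=sys.stderr)
            return 1
        # Deferred import: Qt is loaded only on this code path.
        from PySide6.QtWidgets import QApplication
        app = QApplication(argv)
        # ... build and show the dashboard window here ...
        return app.exec()
    print("headless mode: stdlib only")
    return 0
```

Because the PySide6 import sits inside the GUI branch, a headless install never needs Qt on the import path.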
D3 — Distro priority order — DECIDED 2026-05-21
Ubuntu first, by an explicit margin. Debian comes along for free via apt. Arch
(pacman) / Fedora (dnf) / openSUSE (zypper) are best-effort later. The package-manager
and distro abstraction stays in the design so other distros can be added, but all primary
development, testing, and packaging target Ubuntu.
D4 — GPU vendor priority — DECIDED 2026-05-21
NVIDIA first. It's the seed hardware (RTX 3070) and the source of the motivating crash. AMD and Intel come later behind the vendor abstraction; nothing should hard-code NVIDIA in a way that blocks them.
D5 — MVP scope — DECIDED 2026-05-21
M1 + M3 + M4 (the Essential bundle), NVIDIA-only. This was the first build target — it captures the seed crash and explains the logs before any installer, multi-vendor, etc. work. (The MVP was built CLI-first; per D17 the GUI is now the primary interface going forward — the CLI keeps full parity.)
D6 — Crash-logger trigger model — DECIDED 2026-05-21
Let the user choose. All three modes are supported and selectable (installer + config):
- Always-on systemd --user service.
- Game-launch-triggered (auto-start when a game/Steam session starts, stop after).
- Manual (CLI command, or the tray applet's "start recording" button).
Still open at the time: the exact game-launch detection mechanism (since resolved in D12).
D7 — Stress / repro module — DECIDED 2026-05-21
Out of scope. Module M7 is dropped. RigDoctor will not build or bundle stress/load generators. Users who want to reproduce load can run existing tools (gpu-burn, vkmark, stress-ng) themselves alongside the logger.
D8 — Distribution / packaging — DECIDED 2026-05-21; revised 2026-05-21 (see D18)
Primary: a user-local install (pipx/venv or a versioned bundle under ~/.local, owned by
the user) so the app can self-update from the public Gitea releases with no root (D18). A
.deb remains an optional system-install channel for users who prefer it (updated via
apt). Why the revision: the repo is public and we want frictionless, GUI-first self-updates,
which a root-owned system package can't apply silently. The interactive installer (M9) layers
module selection on top of either channel. AUR / Flatpak / COPR still later, if warranted.
D9 — Scope of action (read-only vs apply-fixes) — DECIDED 2026-05-21
Read-only + suggestions. RigDoctor diagnoses, monitors, and suggests actions in plain language (with the exact command where possible), but does not apply changes itself in this stage. Auto-applying fixes (governor, power profile, etc.) is a deliberate later milestone, gated behind explicit user consent when it lands.
D10 — GUI is a first-class deliverable — DECIDED 2026-05-21
The app must run three ways: (a) CLI-only / headless (full functionality from the terminal, works over SSH), (b) a desktop GUI, and (c) a system-tray / top-menu-bar applet with quick actions. This supersedes the original "terminal-first, GUI maybe later" non-goal. GUI and tray are separate optional modules over the shared core engine.
D11 — Tray / menu-bar applet — DECIDED 2026-05-21
A small always-available applet in the Linux top menu bar (system tray / StatusNotifierItem,
via Qt's QSystemTrayIcon; on Ubuntu/GNOME this surfaces through the AppIndicator
extension). Provides quick actions and at-a-glance status.
Still open at the time: the exact set of quick actions/indicators (since resolved in D13).
D12 — Game-launch detection mechanism — DECIDED 2026-05-21
Layered approach, no root (logger stays a systemd --user service):
- Wrapper (precise, primary): rigdoctor wrap %command% for per-game Steam launch options, plus an installer helper that registers RigDoctor as a global Steam compatibility tool (covers all Proton games without per-game edits). The same wrapper field works in Lutris/Heroic. Deterministic start/stop, knows the title, needs no watcher daemon. Build first.
- Zero-config watcher (fallback): low-frequency poll of Steam's RunningAppID (~/.steam/registry.vdf) plus a /proc heuristic for non-Steam launchers, for users who won't edit launch options. Build later.
- GameMode (opportunistic): if Feral gamemoded is present, use its D-Bus GameRegistered/GameUnregistered signals (via gdbus/busctl, no Python dbus dep).
- Explicitly rejected: root-only kernel mechanisms (proc-connector netlink PROC_EVENTS, eBPF); they'd force the logger to run as root.
- Phasing: wrapper ships with the game-launch trigger mode (Phase 4); watcher + GameMode follow.
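The wrapper idea (Steam substitutes the game's command line for %command%, so the wrapper sees it as its own arguments) might be sketched as below. The start/stop hooks are hypothetical placeholders; how the real wrapper signals the logger service is not specified here.

```python
import subprocess
import sys
from typing import Callable, Sequence


def wrap(command: Sequence[str],
         start: Callable[[], None],
         stop: Callable[[], None]) -> int:
    """Run the game command between recorder start/stop hooks."""
    if not command:
        raise ValueError("usage: rigdoctor wrap <game command...>")
    start()
    try:
        # Blocks until the launched process exits, then reports its exit code.
        return subprocess.run(list(command)).returncode
    finally:
        stop()  # always stop recording, even if the launch fails


def main(argv: Sequence[str] = ()) -> int:
    # Placeholder hooks; a real build would signal the logger service instead.
    return wrap(argv or sys.argv[1:],
                start=lambda: print("recording started", file=sys.stderr),
                stop=lambda: print("recording stopped", file=sys.stderr))
```

The try/finally is what makes start/stop deterministic: recording ends when the game process ends, with no polling daemon involved.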
D13 — Tray / menu-bar applet: actions & indicators — DECIDED 2026-05-21
Live readouts (from M1) + a Run Diagnostic action.
- At-a-glance live data shown inline in the tray dropdown, refreshed periodically: CPU temp, GPU temp, memory used/total (e.g. "14 GB / 32 GB"). A status dot (normal / throttling / alert) is proposed alongside.
- Run Diagnostic — the primary action. Launches the guided diagnostic session (SPEC §4): prompts which game to focus on, starts a focused log collection for that game's session (M3, scoped via the D12 game detection), then scans/analyzes (M4) and presents the findings.
- Supporting actions (proposed minimal set): Open dashboard (M10), Start/Stop recording (manual trigger), Snapshot now, Quit.
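As an illustration of the memory readout, the sketch below derives a "used / total" label from /proc/meminfo-style text. Treating "used" as MemTotal minus MemAvailable is an assumption, as are the function names; M1's real sampling code may differ.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_label(meminfo: dict[str, int]) -> str:
    """Format a tray label such as '14 GB / 32 GB'."""
    total = meminfo["MemTotal"]
    used = total - meminfo["MemAvailable"]  # assumption: used = total - available
    kb_per_gb = 1024 * 1024
    return f"{round(used / kb_per_gb)} GB / {round(total / kb_per_gb)} GB"
```

On a live system the input would come from reading /proc/meminfo on each periodic refresh.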
D14 — Final installer module list & bundles — DECIDED 2026-05-21
Use the current MODULES.md catalog and bundles as final. Modules: M1, M2, M3, M4, M5,
M6, M8, M9, M10, M11 (M7 dropped). Bundles: Essential / Monitoring / Diagnostics /
Desktop UI (+ Custom). No further additions planned for v1.
D15 — Distro package-name mapping → apt-only — DECIDED 2026-05-21
What it was: RigDoctor's optional modules need a few system packages (smartmontools,
lm-sensors, dmidecode, python3-pyside6, AppIndicator). The same tool is named differently
per distro (e.g. lm-sensors on apt vs lm_sensors on pacman/dnf; Qt is python3-pyside6
on apt). Supporting multiple distros would require a table mapping each logical dependency to
the right package name per package manager.
Decision: apt-only. We maintain package names for Ubuntu/apt only and do not
build or maintain mappings for other package managers. A thin seam is left in the design so
another package manager could be added later, but multi-distro support is not a planned
deliverable. Revisit only if Ubuntu-only proves too narrow.
D16 — Session sharing / remote assist (M12) — DECIDED 2026-05-21
Build a session-sharing / remote-assist capability (new module M12) so a user (A) can let a helper (B) inspect their machine. Full ladder, built in order:
- Diagnostic bundle export: the share export command packages inventory (M5) + recent capture log (M3) + a report into one file A sends to B; B opens it in RigDoctor. One-way, no live connection. Safest; build first.
- Live read-only view: a small local server serving the live dashboard + logs read-only, reached over a user-chosen tunnel (Tailscale / cloudflared / SSH reverse tunnel; no RigDoctor-hosted relay, to keep the no-telemetry promise). Token-gated, short TTL, A approves and can kill instantly. No terminal.
- Gated interactive terminal: wrap an existing trusted tool (tmate/sshx) rather than rolling our own; read-only link by default, read-write requires explicit per-session consent. This is a deliberate, consent-gated exception to the read-only stance (D9): it's full machine access and must be treated as such.
Cross-cutting principles: explicit per-session consent; ephemeral, revocable tokens; clear permission escalation (view ≠ shell); no mandatory central relay; session audit log. Note: this adds M12 on top of the "final" list from D14; the catalog is updated accordingly.
D17 — GUI-first interface emphasis — DECIDED 2026-05-21
The desktop GUI (M10) is the primary, default interface for end users — it's the more user-friendly way in, and every capability (recording, reports, status, …) must be reachable from it. This supersedes the earlier "CLI-first / terminal-first" framing (updates D5 and the SPEC wording).
- The CLI is not removed: it keeps full functionality for headless / SSH / server / scripting use, and it's the engine the background daemon runs on.
- No change to layering (D2): the core, CLI, and daemon stay stdlib-only and must run without Qt. "GUI-first" is about emphasis and front-end parity, not dropping headless support.
D18 — Auto-update (M13) — PLANNED 2026-05-21; mechanism revised 2026-05-21
RigDoctor should check for a newer version on launch and self-update (new module M13). Mechanism (revised): user-local, no-root self-update over authenticated HTTP (token). Why revised: the Gitea instance requires sign-in for all access (the repo page, releases feed, raw files, and API all return 303/403 to anonymous requests), so the original "public HTTP" plan can't work. Updates are therefore gated to people with an account on the Gitea server, which is desirable: access control is delegated to Gitea.
- Auth: each user creates a Personal Access Token (scope read:repository); RigDoctor stores it at ~/.config/rigdoctor/token (mode 0600) or reads RIGDOCTOR_TOKEN. Requests send Authorization: token <PAT>. Finer access = repo visibility/collaborators on Gitea.
- Check: GET /api/v1/repos/jessey/rigdoctor/releases/latest with the token; compare tags.
- Apply: pip install --upgrade "git+https://oauth2:<token>@…/rigdoctor.git@<tag>" into the user-local venv, then restart (incl. the daemon). No root.
- States surfaced: no-token → "connect to update server"; auth error → "access denied"; newer → "Update to v…"; else "up-to-date".
- Original (now-superseded) plan was anonymous public HTTP:
  - Install model (D8 revised): primary install is user-local (~/.local), so the running app can replace its own files and update with no apt, no root, no password prompt.
  - Check: on launch, query the public Gitea releases API (/api/v1/repos/jessey/rigdoctor/releases/latest) over HTTPS; compare to the running version.
  - Apply: download the new release bundle, verify checksum/signature, stage it (e.g. ~/.local/share/rigdoctor/versions/x.y.z), swap a symlink atomically, then restart (including the systemd --user daemon).
  - GUI-first (D17): a non-intrusive "update available" prompt + one-click apply; rigdoctor update in the CLI.
  - Security: HTTPS only; verify checksum/signature before swapping; never run unverified code.
  - Privacy (no telemetry): version-check only, no tracking; auto-check is opt-out-able.
  - .deb users: the optional .deb channel updates via apt instead; auto-update targets the user-local install.
  - Caveat (to confirm before building): the Gitea instance currently requires sign-in for API calls ("Only signed in user is allowed to call APIs."), so anonymous version checks need the instance/repo set to allow anonymous access, or a separate public version endpoint (e.g. a static file or a mirror).
D19 — Versioning & changelog — DECIDED 2026-05-21
Track a version number on every change. SemVer-style MAJOR.MINOR.PATCH (pre-1.0: bump
PATCH for ordinary changes, MINOR for larger milestones). __version__
(rigdoctor/__init__.py) and pyproject.toml are the single source of truth and must match
the git release tag so the auto-updater (D18) can compare versions. Every change updates
CHANGELOG.md — now generated from Conventional Commits via git-cliff (see D20).
Milestone policy (pre-1.0): 0.0.x = early development; 0.1.0 = first complete,
installable, self-updating release (reached 2026-05-21); 0.x.0 = each later milestone
(AMD/Intel, unattended logger auto-start, session sharing…); 1.0.0 = broadly stable
(multi-vendor/distro, no major caveats). PATCH (0.x.PATCH) for fixes/small changes.
Note: an early placeholder 0.1.0 was corrected to follow the released 0.0.x line: the first release was v0.0.1; the current version is 0.0.2.
D20 — Automated changelog & release notes — DECIDED 2026-05-21
Release notes are generated from our changes, surfaced in the auto-updater.
- Release body: CI sets each Gitea release's body from the matching CHANGELOG.md section (was a hardcoded "Automated release for…"). The updater fetches the release body and shows "What's new": a dialog before applying (GUI) and in rigdoctor update (CLI).
- Generation: adopt Conventional Commits (feat:/fix:/docs:/chore:…) and git-cliff (cliff.toml, packaging/changelog.sh) to generate CHANGELOG.md from commit history. Refines D19's "hand-write CHANGELOG" to "generate it from conventional commits"; __version__/pyproject.toml/tag still the source of truth for the version.
- CI does not auto-commit the changelog (avoids push loops); it's regenerated by the dev via the script when cutting a version; CI only reads the section for the release body.
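Reading one version's section out of CHANGELOG.md for the release body might be done as sketched here. The heading shape ("## [x.y.z] - date") is an assumption about the generated changelog, and the function name is hypothetical; the actual CI step is not shown in this document.

```python
import re

# Assumed version heading, e.g. "## [0.0.2] - 2026-05-21" or "## v0.0.2"
_HEADING = re.compile(r"^## \[?v?(\d+\.\d+\.\d+)\]?")


def changelog_section(changelog: str, version: str) -> str:
    """Return the text under the heading for `version`, or '' if absent."""
    wanted = version.lstrip("v")
    body, capturing = [], False
    for line in changelog.splitlines():
        heading = _HEADING.match(line)
        if heading:
            if capturing:
                break  # reached the next version's heading
            capturing = heading.group(1) == wanted
            continue
        if capturing:
            body.append(line)
    return "\n".join(body).strip()
```

An empty result is a useful signal for CI to fall back to a generic release body rather than publish a blank one.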
D21 — Versioning rules & automation — DECIDED 2026-05-21
The next version is determined by the Conventional Commit types since the last release (D20), so it can be auto-computed instead of guessed:
- fix: / perf: → bump PATCH.
- feat: → bump MINOR (pre-1.0: 0.MINOR.0).
- breaking (feat!: / BREAKING CHANGE:) → pre-1.0: bump MINOR (not major); post-1.0: MAJOR.
- docs:/chore:/refactor:/ci:/test:/style: alone → PATCH (no feature release).
- Milestone overrides by hand are allowed (e.g., jumping to 1.0.0); see the milestone policy in D19.
Automation: git-cliff --bumped-version computes the next version from history;
packaging/bump.sh writes it into __init__.py + pyproject.toml. Rules live in
cliff.toml [bump] (pre-1.0: breaking_always_bump_major = false).
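The bump rules listed in this decision can be restated as a small function. This is an illustrative re-implementation of the rules as written, not git-cliff's own logic; ignoring commits that do not follow the Conventional Commits header format is an assumption.

```python
import re

# "type(scope)!: subject" header of a Conventional Commit
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?:")


def next_version(current: str, commits: list[str]) -> str:
    """Compute the next MAJOR.MINOR.PATCH from commit messages since the last release."""
    major, minor, patch = (int(p) for p in current.split("."))
    level = 0  # 0 = no bump, 1 = PATCH, 2 = MINOR, 3 = MAJOR
    for message in commits:
        header = _HEADER.match(message)
        if header is None:
            continue  # assumption: non-conventional commits are ignored
        if header["bang"] or "BREAKING CHANGE:" in message:
            # Pre-1.0, a breaking change bumps MINOR instead of MAJOR.
            level = max(level, 3 if major >= 1 else 2)
        elif header["type"] == "feat":
            level = max(level, 2)
        else:
            level = max(level, 1)  # fix, perf, docs, chore, ...
    if level == 3:
        return f"{major + 1}.0.0"
    if level == 2:
        return f"{major}.{minor + 1}.0"
    if level == 1:
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, under these rules a release at 0.0.2 followed by one docs: commit and one feat: commit would become 0.1.0.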
D22 — Limited live apply of fixes (M6) — DECIDED 2026-05-22; realizes the D9 milestone
D9 deferred auto-applying fixes to "a deliberate later milestone, gated behind explicit user consent." That milestone lands here, scoped tightly to stay safe:
- Only runtime-reversible settings are applyable from the gaming-environment report (M6): CPU governor, NVIDIA persistence mode, PCIe ASPM policy, vm.swappiness, Transparent HugePages. Each takes effect immediately, needs no reboot, and reverts on reboot.
- How: a dropdown of the live options + an Apply button per finding (core/fixes.py). Applying runs a single pkexec-elevated command (one auth prompt); the chosen value is validated against the live options first; writes target sysfs/procfs or nvidia-smi, never the GRUB cmdline or a persistent config file.
- Still suggestion-only (the read-only stance holds for these): GRUB-based pcie_aspm=off, CPU mitigations changes (security-sensitive, need a reboot), and the shader-cache env var.
- Everything remains CLI-discoverable (rigdoctor gameenv still prints the exact commands); the apply UI is an additive convenience in the GUI, not the only path. Installing optional tools (GameMode/MangoHud/cpupower) reuses the M9 installer and is likewise one-click.
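The "validate against the live options, then run one pkexec command" flow might look like this for the CPU governor. It is a sketch only: the function names are hypothetical, the shell one-liner is one possible way to write every core's governor, and the command is built but not executed here.

```python
import shlex

# Standard cpufreq sysfs locations; the glob is expanded by the shell.
AVAILABLE_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors"
GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"


def parse_options(text: str) -> list[str]:
    """Split the space-separated contents of scaling_available_governors."""
    return text.split()


def governor_command(choice: str, available: list[str]) -> list[str]:
    """Return the single elevated command that applies `choice`."""
    if choice not in available:
        # Refuse anything the kernel did not offer, before asking for auth.
        raise ValueError(f"unsupported governor: {choice!r}")
    script = f"echo {shlex.quote(choice)} | tee {GOVERNOR_GLOB} > /dev/null"
    return ["pkexec", "sh", "-c", script]
```

A caller would read AVAILABLE_PATH, pass its contents through parse_options, and hand the resulting list to subprocess.run only after the user presses Apply.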
D23 — Session sharing scoped to a shared terminal only — DECIDED 2026-05-22; amends D16
D16's escalating ladder (export → read-only stats view → terminal) is cut down to just the shared terminal. Rationale: the terminal is the only mode the owner wants; the stats view duplicated what the GUI already shows and added surface area. Concretely:
- Removed: the read-only stats view + its HTTP server (core/share.py, rigdoctor share serve) and the (never-built) bundle export. The share CLI command is gone.
- Kept & finished: the relay shared terminal (host PTY of $SHELL), now color-rendered (preserves fish/ls/git theming), full-screen-able, with the guest read-only unless the host ticks "Allow the guest to type" (the D9 consent exception). Account-gated by the Gitea token.
D24 — AI assistant module (M14) — DECIDED 2026-05-22; adds to D14
A new optional module that explains the collected diagnostics in plain language (likely root cause + suggested next steps). Adds M14 to the D14 set.
- Strictly opt-in, never automatic. The model is contacted only on an explicit user action (an "Explain with AI" button / rigdoctor ai explain), never on launch, after a diagnostic, in the sample/record loop, or in the background. Configuring a provider does not trigger any call.
- Local-first. Defaults to a local Ollama server (data never leaves the machine, no key, stdlib urllib). An OpenAI-compatible endpoint (cloud or local) can be used with a key (stored in the keyring like the update token). Cloud use shows a "this sends your data to X" consent before the first call.
- Grounded & advisory. The prompt carries only the findings we collected; output is framed as suggestions (consistent with D9: it explains/recommends, applying fixes stays consent-gated). No new runtime dependency (HTTP via stdlib).
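A grounded prompt with exact-keyword reference notes, sent to a local Ollama server over stdlib urllib, could be sketched as follows. The knowledge-base entries, the prompt wording, and the function names are illustrative assumptions; the request is built but not sent, and the model name is left to the caller.

```python
import json
import re
import urllib.request

# Illustrative entries only; the real knowledge base is not shown in this document.
KB = {
    "Xid 79": "NVIDIA Xid 79 means the GPU has fallen off the bus.",
    "vm.swappiness": "vm.swappiness controls how readily the kernel swaps out memory.",
}


def matching_notes(findings, kb=KB):
    """Return notes whose key appears as a whole token in any finding."""
    notes = []
    for key, note in kb.items():
        pattern = re.compile(rf"(?<!\w){re.escape(key)}(?!\w)", re.IGNORECASE)
        if any(pattern.search(finding) for finding in findings):
            notes.append(note)
    return notes


def build_prompt(findings, kb=KB):
    """Assemble a prompt that carries only the findings plus matched notes."""
    lines = ["Explain these diagnostic findings in plain language.",
             "Use only the findings and reference notes below.",
             "", "Findings:"]
    lines += [f"- {finding}" for finding in findings]
    notes = matching_notes(findings, kb)
    if notes:
        lines += ["", "Reference notes:"] + [f"- {note}" for note in notes]
    return "\n".join(lines)


def ollama_request(prompt, model, endpoint="http://localhost:11434"):
    """Build a one-shot (non-streaming) request for Ollama's /api/generate."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{endpoint}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Keeping request construction separate from sending makes it easy to guarantee that nothing leaves the machine until the user explicitly asks for an explanation.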
Open
None currently — all tracked decisions (D1–D24) are resolved. New questions will be added
here as they arise. Remaining detail to flesh out during build: the tray's supporting-action
set (D13), per-module apt package names, M12's tunnel/token specifics, and M13's
update mechanism (APT repo vs. self-installed .deb).