# RigDoctor — Decisions & Open Questions

Format: each item is **OPEN** (needs a call) or **DECIDED** (with date + rationale).
Decisions D1–D23 are settled (D1–D21 on 2026-05-21, D22–D23 on 2026-05-22); the original open questions are kept
below with their resolutions so the reasoning is traceable. No tracked decisions are
currently open.

## Decided

### D1 — Project name — *DECIDED 2026-05-21*
**RigDoctor.** Confirmed as the final name (repo, package, and CLI command `rigdoctor`).
Alternatives (RigWatch, GameDoc, Penguin Pit Crew, LGD) dropped.

### D2 — Language / runtime — *DECIDED 2026-05-21*
**Python 3 + Qt (PySide6).**
- *Why Python:* fastest AI-assisted development (largest codegen corpus) and a perfect fit
  for the real workload — parsing `nvidia-smi`/sysfs/`journalctl`, CSV/JSON, subprocess.
- *Why Qt/PySide6:* one toolkit covers **both** the desktop GUI and the system-tray applet.
- *Layering that preserves "low overhead":* the **core engine, CLI, and crash-logger daemon
  stay stdlib-only** (no hard deps, tiny footprint); **only the GUI and tray modules pull in
  PySide6**. This maps cleanly onto the modular installer — a headless/server user never
  installs Qt.
- *Trade-off accepted:* the GUI carries a Qt runtime dependency (not a single static binary).
  Mitigated by shipping a `.deb` that declares `python3` + `python3-pyside6` (see D8).

### D3 — Distro priority order — *DECIDED 2026-05-21*
**Ubuntu first**, by an explicit margin. Debian comes along for free via `apt`. Arch
(`pacman`) / Fedora (`dnf`) / openSUSE (`zypper`) are best-effort later. The package-manager
and distro abstraction stays in the design so other distros can be added, but all primary
development, testing, and packaging target Ubuntu.

### D4 — GPU vendor priority — *DECIDED 2026-05-21*
**NVIDIA first.** It's the seed hardware (RTX 3070) and the source of the motivating crash.
AMD and Intel come later behind the vendor abstraction; nothing should hard-code NVIDIA in a
way that blocks them.

### D5 — MVP scope — *DECIDED 2026-05-21*
**M1 + M3 + M4 (the *Essential* bundle), NVIDIA-only.** This was the first build target — it
captures the seed crash and explains the logs before any installer or multi-vendor work.
*(The MVP was built CLI-first; per D17 the GUI is now the primary interface going forward —
the CLI keeps full parity.)*

### D6 — Crash-logger trigger model — *DECIDED 2026-05-21*
**Let the user choose.** All three modes are supported and selectable (installer + config):
1. **Always-on** `systemd --user` service.
2. **Game-launch-triggered** (auto-start when a game/Steam session starts, stop after).
3. **Manual** (CLI command, or the tray applet's "start recording" button).
*Resolved in D12:* the exact game-launch detection mechanism, which was still open when D6 was written.

### D7 — Stress / repro module — *DECIDED 2026-05-21*
**Out of scope. Module M7 is dropped.** RigDoctor will not build or bundle stress/load
generators. Users who want to reproduce load can run existing tools (gpu-burn, vkmark,
stress-ng) themselves alongside the logger.

### D8 — Distribution / packaging — *DECIDED 2026-05-21; revised 2026-05-21 (see D18)*
**Primary: a user-local install** (pipx/venv or a versioned bundle under `~/.local`, owned by
the user) so the app can **self-update from the Gitea releases with no root** (D18; the
update check is token-authenticated because the Gitea instance requires sign-in). A
**`.deb` remains an optional** system-install channel for users who prefer it (updated via
apt). *Why the revision:* the repo is public and we want frictionless, GUI-first self-updates,
which a root-owned system package can't apply silently. The interactive installer (M9) layers
module selection on top of either channel. AUR / Flatpak / COPR still later, if warranted.

### D9 — Scope of action (read-only vs apply-fixes) — *DECIDED 2026-05-21*
**Read-only + suggestions.** RigDoctor diagnoses, monitors, and **suggests** actions in
plain language (with the exact command where possible), but does **not** apply changes
itself in this stage. Auto-applying fixes (governor, power profile, etc.) is a deliberate
later milestone, gated behind explicit user consent when it lands.

### D10 — GUI is a first-class deliverable — *DECIDED 2026-05-21*
The app must run **three ways**: (a) **CLI-only / headless** (full functionality from the
terminal, works over SSH), (b) a **desktop GUI**, and (c) a **system-tray / top-menu-bar
applet** with quick actions. This supersedes the original "terminal-first, GUI maybe later"
non-goal. GUI and tray are separate optional modules over the shared core engine.

### D11 — Tray / menu-bar applet — *DECIDED 2026-05-21*
A small always-available applet in the Linux top menu bar (system tray / StatusNotifierItem,
via Qt's `QSystemTrayIcon`; on Ubuntu/GNOME this surfaces through the AppIndicator
extension). Provides quick actions and at-a-glance status.
*Resolved in D13:* the exact set of quick actions/indicators, which was still open when D11 was written.

### D12 — Game-launch detection mechanism — *DECIDED 2026-05-21*
**Layered approach, no root** (logger stays a `systemd --user` service):
1. **Wrapper (precise, primary):** `rigdoctor wrap %command%` for per-game Steam launch
   options, plus an installer helper that registers RigDoctor as a **global Steam
   compatibility tool** (covers all Proton games without per-game edits). The same wrapper
   field works in Lutris/Heroic. Deterministic start/stop, knows the title, needs no
   watcher daemon. *Build first.*
2. **Zero-config watcher (fallback):** low-frequency poll of Steam's `RunningAppID`
   (`~/.steam/registry.vdf`) plus a `/proc` heuristic for non-Steam launchers, for users
   who won't edit launch options. *Build later.*
3. **GameMode (opportunistic):** if Feral `gamemoded` is present, use its D-Bus
   `GameRegistered`/`GameUnregistered` signals (via `gdbus`/`busctl` — no Python dbus dep).
- *Explicitly rejected:* root-only kernel mechanisms (proc-connector netlink `PROC_EVENTS`,
  eBPF) — they'd force the logger to run as root.
- *Phasing:* wrapper ships with the game-launch trigger mode (Phase 4); watcher + GameMode
  follow.

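The wrapper idea in D12 (deterministic start/stop around the game process) can be illustrated with a minimal stdlib-only sketch. The function name `run_wrapped` and the two hook parameters are hypothetical and are not taken from the RigDoctor code base; the real `rigdoctor wrap` may be structured quite differently.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_wrapped(
    command: Sequence[str],
    on_start: Callable[[], None],
    on_stop: Callable[[int], None],
) -> int:
    """Run a game command, bracketing it with start/stop hooks.

    Hypothetical helper: on_start would begin a capture session and
    on_stop would end it, receiving the game's exit code.
    """
    if not command:
        raise ValueError("no command given to wrap")
    on_start()
    exit_code = 1  # reported to on_stop if the launch itself fails
    try:
        # Blocks until the wrapped process exits, so the stop hook
        # fires exactly when the game session ends.
        exit_code = subprocess.call(list(command))
    finally:
        on_stop(exit_code)
    return exit_code
```

Because the wrapper is the parent of the game process, it needs neither polling nor a watcher daemon; this is the property D12 relies on when it calls the wrapper "deterministic".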
### D13 — Tray / menu-bar applet: actions & indicators — *DECIDED 2026-05-21*
**Live readouts (from M1) + a Run Diagnostic action.**
- **At-a-glance live data** shown inline in the tray dropdown, refreshed periodically:
  **CPU temp, GPU temp, memory used/total** (e.g. "14 GB / 32 GB"). A status dot
  (normal / throttling / alert) is proposed alongside.
- **Run Diagnostic** — the primary action. Launches the **guided diagnostic session**
  (SPEC §4): prompts *which game to focus on*, starts a focused log collection for that
  game's session (M3, scoped via the D12 game detection), then scans/analyzes (M4) and
  presents the findings.
- **Supporting actions (proposed minimal set):** Open dashboard (M10), Start/Stop recording
  (manual trigger), Snapshot now, Quit.

### D14 — Final installer module list & bundles — *DECIDED 2026-05-21*
**Use the current `MODULES.md` catalog and bundles as final.** Modules: M1, M2, M3, M4, M5,
M6, M8, M9, M10, M11 (M7 dropped). Bundles: Essential / Monitoring / Diagnostics /
Desktop UI (+ Custom). No further additions planned for v1.

### D15 — Distro package-name mapping → apt-only — *DECIDED 2026-05-21*
*What it was:* RigDoctor's optional modules need a few system packages (smartmontools,
lm-sensors, dmidecode, python3-pyside6, AppIndicator). The same tool is named differently
per distro (e.g. `lm-sensors` on apt vs `lm_sensors` on pacman/dnf; Qt is `python3-pyside6`
on apt). Supporting multiple distros would require a table mapping each logical dependency to
the right package name per package manager.
*Decision:* **apt-only.** We maintain package names for **Ubuntu/apt only** and do **not**
build or maintain mappings for other package managers. A thin seam is left in the design so
another package manager *could* be added later, but multi-distro support is **not** a planned
deliverable. Revisit only if Ubuntu-only proves too narrow.

### D16 — Session sharing / remote assist (M12) — *DECIDED 2026-05-21; amended 2026-05-22 (see D23)*
Build a **session-sharing / remote-assist** capability (new module **M12**) so a user (A)
can let a helper (B) inspect their machine. **Full ladder, built in order:**
1. **Diagnostic bundle export** — `share export` packages inventory (M5) + recent capture
   log (M3) + a report into one file A sends to B; B opens it in RigDoctor. One-way, no live
   connection. Safest; build first.
2. **Live read-only view** — a small local server serving the live dashboard + logs
   read-only, reached over a **user-chosen tunnel** (Tailscale / cloudflared / SSH reverse
   tunnel — *no RigDoctor-hosted relay*, to keep the no-telemetry promise). Token-gated,
   short TTL, A approves and can kill instantly. No terminal.
3. **Gated interactive terminal** — wrap an existing trusted tool (`tmate`/`sshx`) rather
   than rolling our own; **read-only link by default**, read-write requires explicit
   per-session consent. This is a deliberate, consent-gated exception to the read-only stance
   (D9) — it's full machine access and must be treated as such.

*Cross-cutting principles:* explicit per-session consent; ephemeral, revocable tokens;
clear permission escalation (view ≠ shell); no mandatory central relay; session audit log.
*Note:* this adds M12 on top of the "final" list from D14; the catalog is updated accordingly.

### D17 — GUI-first interface emphasis — *DECIDED 2026-05-21*
The **desktop GUI (M10) is the primary, default interface** for end users — it's the more
user-friendly way in, and **every capability** (recording, reports, status, …) must be
reachable from it. This **supersedes the earlier "CLI-first / terminal-first" framing**
(updates D5 and the SPEC wording).
- *The CLI is not removed:* it keeps **full functionality** for headless / SSH / server /
  scripting use, and it's the engine the background daemon runs on.
- *No change to layering (D2):* the core, CLI, and daemon stay **stdlib-only** and must run
  without Qt. "GUI-first" is about emphasis and front-end parity, not dropping headless support.

### D18 — Auto-update (M13) — *PLANNED 2026-05-21; mechanism revised 2026-05-21*
RigDoctor should **check for a newer version on launch and self-update** (new module **M13**).
**Mechanism (revised): user-local, no-root self-update over authenticated HTTP (token).**
*Why revised:* the Gitea instance requires sign-in for **all** access (the repo page,
releases feed, raw files, and API all return 303/403 to anonymous requests), so the original
"public HTTP" plan can't work. Updates are therefore **gated to people with an account on the
Gitea server**, which is desirable — access control is delegated to Gitea.
- *Auth:* each user creates a **Personal Access Token** (scope `read:repository`); RigDoctor
  stores it at `~/.config/rigdoctor/token` (mode 0600) or reads `RIGDOCTOR_TOKEN`. Requests
  send `Authorization: token <PAT>`. Finer-grained access is controlled through repo
  visibility and collaborators on Gitea.
- *Check:* `GET /api/v1/repos/jessey/rigdoctor/releases/latest` with the token; compare tags.
- *Apply:* `pip install --upgrade "git+https://oauth2:<token>@…/rigdoctor.git@<tag>"` into the
  user-local venv, then restart (incl. the daemon). No root.
- *States surfaced:* no-token → "connect to update server"; auth error → "access denied";
  newer → "Update to v…"; else "up-to-date".
- *Original (now-superseded) plan was anonymous public HTTP:*
  - *Install model (D8 revised):* primary install is **user-local** (`~/.local`), so the running
    app can replace its own files and update with **no apt, no root, no password prompt**.
  - *Check:* on launch, query the **public Gitea releases API**
    (`/api/v1/repos/jessey/rigdoctor/releases/latest`) over HTTPS; compare to the running version.
  - *Apply:* download the new release bundle, **verify checksum/signature**, stage it
    (e.g. `~/.local/share/rigdoctor/versions/x.y.z`), swap a symlink atomically, then restart
    (including the `systemd --user` daemon).
  - *GUI-first (D17):* a non-intrusive "update available" prompt + one-click apply; `rigdoctor
    update` in the CLI.
  - *Security:* HTTPS only; verify checksum/signature before swapping; never run unverified code.
  - *Privacy (no telemetry):* version-check only — no tracking; auto-check is opt-out-able.
  - *`.deb` users:* the optional `.deb` channel updates via apt instead; auto-update targets the
    user-local install.
  - *Caveat (to confirm before building):* the Gitea instance currently **requires sign-in for
    API calls** (`"Only signed in user is allowed to call APIs."`), so anonymous version checks
    need the instance/repo set to allow anonymous access — or a separate public version endpoint
    (e.g. a static file or a mirror).

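As a rough illustration of the revised check (token header, tag comparison, surfaced states), here is a stdlib-only sketch. The API path, header format, and state strings come from D18; the helper names, the `base_url` parameter, and the choice to treat HTTP 401/403 as "auth error" are assumptions made for the example, not the actual M13 implementation.

```python
import urllib.request
from typing import Optional, Tuple

# Path taken from D18; the server base URL is supplied by the caller.
RELEASES_PATH = "/api/v1/repos/jessey/rigdoctor/releases/latest"


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v0.0.2', 'V0.0.2' or '0.0.2' into (0, 0, 2)."""
    return tuple(int(part) for part in tag.strip().lstrip("vV").split("."))


def build_check_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the authenticated 'latest release' request."""
    return urllib.request.Request(
        base_url.rstrip("/") + RELEASES_PATH,
        headers={"Authorization": f"token {token}"},
    )


def update_state(
    token: Optional[str],
    http_status: Optional[int],
    latest_tag: Optional[str],
    current_version: str,
) -> str:
    """Map the outcome of a check onto the states listed in D18."""
    if not token:
        return "connect to update server"
    if http_status in (401, 403):  # assumption: these codes mean "auth error"
        return "access denied"
    if latest_tag and parse_version(latest_tag) > parse_version(current_version):
        return f"Update to {latest_tag}"
    return "up-to-date"
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as `"0.0.10" < "0.0.9"`.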
### D19 — Versioning & changelog — *DECIDED 2026-05-21*
**Track a version number on every change.** SemVer-style `MAJOR.MINOR.PATCH` (pre-1.0: bump
PATCH for ordinary changes, MINOR for larger milestones). `__version__`
(`rigdoctor/__init__.py`) and `pyproject.toml` are the single source of truth and **must match
the git release tag** so the auto-updater (D18) can compare versions. Every change updates
`CHANGELOG.md` — now generated from **Conventional Commits** via git-cliff (see D20).
*Milestone policy (pre-1.0):* **0.0.x** = early development; **0.1.0** = first complete,
installable, self-updating release (reached 2026-05-21); **0.x.0** = each later milestone
(AMD/Intel, unattended logger auto-start, session sharing…); **1.0.0** = broadly stable
(multi-vendor/distro, no major caveats). PATCH (`0.x.PATCH`) for fixes/small changes.
*Note:* an early placeholder `0.1.0` was corrected to
follow the released **0.0.x** line — first release was **V0.0.1**; current is **0.0.2**.

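The "must match the git release tag" rule lends itself to a small pre-release check. The sketch below is only an illustration: the function names are hypothetical, the regexes assume simple `__version__ = "x.y.z"` and `version = "x.y.z"` lines, and stripping a leading `v`/`V` from the tag is an assumption based on the `V0.0.1` tag mentioned above.

```python
import re
from typing import Optional


def _first_match(pattern: str, text: str) -> Optional[str]:
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def read_init_version(init_text: str) -> Optional[str]:
    """Extract the value of __version__ from the text of __init__.py."""
    return _first_match(r'^__version__\s*=\s*["\']([^"\']+)["\']', init_text)


def read_pyproject_version(pyproject_text: str) -> Optional[str]:
    """Extract the first top-of-line `version = "..."` from pyproject.toml text."""
    return _first_match(r'^version\s*=\s*["\']([^"\']+)["\']', pyproject_text)


def versions_match(init_text: str, pyproject_text: str, tag: str) -> bool:
    """True only if both files and the release tag carry the same version."""
    init_version = read_init_version(init_text)
    if init_version is None:
        return False
    return (
        init_version
        == read_pyproject_version(pyproject_text)
        == tag.strip().lstrip("vV")
    )
```

A check like this could run before tagging, so the updater in D18 never compares against a mismatched version.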
### D20 — Automated changelog & release notes — *DECIDED 2026-05-21*
**Release notes are generated from our changes, surfaced in the auto-updater.**
- *Release body:* CI sets each Gitea release's `body` from the matching `CHANGELOG.md`
  section (was a hardcoded "Automated release for…"). The updater fetches the release `body`
  and shows **"What's new"** — a dialog before applying (GUI) and in `rigdoctor update` (CLI).
- *Generation:* adopt **Conventional Commits** (`feat:`/`fix:`/`docs:`/`chore:` …) and
  **git-cliff** (`cliff.toml`, `packaging/changelog.sh`) to generate `CHANGELOG.md` from
  commit history. Refines D19's "hand-write CHANGELOG" to "generate it from conventional
  commits"; `__version__`/`pyproject.toml`/tag still the source of truth for the version.
- *CI does not auto-commit the changelog* (avoids push loops) — it's regenerated by the dev
  via the script when cutting a version; CI only reads the section for the release body.

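"CI only reads the section for the release body" amounts to slicing one version's block out of `CHANGELOG.md`. A minimal sketch follows, assuming second-level headings of the form `## [0.0.2] - <date>` (a common git-cliff layout, but the project's `cliff.toml` template may differ); `changelog_section` is a hypothetical name.

```python
import re


def changelog_section(changelog: str, version: str) -> str:
    """Return the body under the '## [version]' heading, or '' if absent."""
    heading = re.compile(r"^##\s+\[?v?" + re.escape(version) + r"\]?(\s|$)")
    body = []
    inside = False
    for line in changelog.splitlines():
        if line.startswith("## "):
            if inside:
                break  # reached the next version's heading
            inside = bool(heading.match(line))
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()
```

Returning an empty string for an unknown version lets the caller decide whether to fail the release job or fall back to a generic body.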
### D21 — Versioning rules & automation — *DECIDED 2026-05-21*
The next version is **determined by the Conventional Commit types** since the last release
(D20), so it can be auto-computed instead of guessed:
- `fix:` / `perf:` → bump **PATCH**.
- `feat:` → bump **MINOR** (pre-1.0: `0.MINOR.0`).
- breaking (`feat!:` / `BREAKING CHANGE:`) → pre-1.0: bump **MINOR** (not major); post-1.0: MAJOR.
- `docs:` / `chore:` / `refactor:` / `ci:` / `test:` / `style:` alone → **PATCH** (no feature release).
- Milestone overrides by hand are allowed (e.g., jumping to `1.0.0`); see the milestone policy in D19.

*Automation:* `git-cliff --bumped-version` computes the next version from history;
`packaging/bump.sh` writes it into `__init__.py` + `pyproject.toml`. Rules live in
`cliff.toml [bump]` (pre-1.0: `breaking_always_bump_major = false`).

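The bump rules above can be restated as a few lines of Python. This is only an illustration of the rules as written in D21; the real computation is done by `git-cliff --bumped-version`, and `next_version` is a hypothetical helper. Treating commits that don't follow the Conventional Commits format as PATCH-level is an assumption of the sketch.

```python
import re
from typing import Iterable

# Matches headers such as "feat:", "fix(core):" or "feat!:".
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?:")


def next_version(current: str, commit_messages: Iterable[str]) -> str:
    """Apply the D21 bump rules to a 'MAJOR.MINOR.PATCH' version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    breaking = feature = any_commit = False
    for message in commit_messages:
        any_commit = True
        header = _HEADER.match(message)
        if (header and header.group("bang")) or "BREAKING CHANGE:" in message:
            breaking = True
        if header and header.group("type") == "feat":
            feature = True
    if not any_commit:
        return current  # nothing to release
    if breaking and major >= 1:
        return f"{major + 1}.0.0"  # post-1.0: breaking bumps MAJOR
    if breaking or feature:
        return f"{major}.{minor + 1}.0"  # pre-1.0 breaking, or any feat
    return f"{major}.{minor}.{patch + 1}"  # fix/perf/docs/chore/... only
```

For example, `next_version("0.0.2", ["docs: tidy", "feat: tray applet"])` yields `"0.1.0"`, while a lone `fix:` commit on the same base yields `"0.0.3"`.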
### D22 — Limited live apply of fixes (M6) — *DECIDED 2026-05-22; realizes the D9 milestone*
D9 deferred auto-applying fixes to "a deliberate later milestone, gated behind explicit user
consent." That milestone lands here, **scoped tightly to stay safe**:
- **Only runtime-reversible settings** can be applied from the gaming-environment report (M6):
  **CPU governor, NVIDIA persistence mode, PCIe ASPM policy, vm.swappiness, Transparent
  HugePages.** Each takes effect immediately, needs **no reboot**, and reverts on reboot.
- **How:** a dropdown of the live options + an Apply button per finding (`core/fixes.py`).
  Applying runs a **single pkexec-elevated command** (one auth prompt); the chosen value is
  validated against the live options first; writes target **sysfs/procfs or `nvidia-smi`** —
  never the GRUB cmdline or a persistent config file.
- **Still suggestion-only** (the read-only stance holds for these): GRUB-based `pcie_aspm=off`,
  CPU **mitigations** changes (security-sensitive, need a reboot), and the shader-cache env var.
- Everything remains **CLI-discoverable** (`rigdoctor gameenv` still prints the exact commands);
  the apply UI is an additive convenience in the GUI, not the only path. Installing optional
  tools (GameMode/MangoHud/cpupower) reuses the M9 installer and is likewise one-click.

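"Validated against the live options first" could look roughly like the sketch below. It assumes the bracketed sysfs format used by files such as `/sys/kernel/mm/transparent_hugepage/enabled` (for example `always [madvise] never`, where the brackets mark the active value). Both function names are hypothetical and are not taken from `core/fixes.py`.

```python
from typing import List, Optional, Tuple


def parse_bracketed_options(text: str) -> Tuple[List[str], Optional[str]]:
    """Parse 'always [madvise] never' into (all options, current option)."""
    options: List[str] = []
    current: Optional[str] = None
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            word = word[1:-1]
            current = word  # brackets mark the value currently in effect
        options.append(word)
    return options, current


def validate_choice(choice: str, options: List[str]) -> str:
    """Reject any value the kernel did not advertise before elevating."""
    if choice not in options:
        raise ValueError(f"{choice!r} is not one of {options}")
    return choice
```

Validating against what the system itself reports means the elevated command only ever receives a value the kernel already listed as acceptable.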
### D23 — Session sharing scoped to a shared terminal only — *DECIDED 2026-05-22; amends D16*
D16's escalating ladder (export → read-only stats view → terminal) is **cut down to just the
shared terminal.** Rationale: the terminal is the only mode the owner wants; the stats view
duplicated what the GUI already shows and added surface area. Concretely:
- **Removed:** the read-only stats view + its HTTP server (`core/share.py`, `rigdoctor share
  serve`) and the (never-built) bundle export. The `share` CLI command is gone.
- **Kept & finished:** the relay **shared terminal** (host PTY of `$SHELL`) — now color-rendered
  (preserves fish/ls/git theming), full-screen-able, with the guest read-only unless the host
  ticks "Allow the guest to type" (the D9 consent exception). Account-gated by the Gitea token.

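The read-only-by-default rule for the guest can be pictured as a gate in front of the host PTY. The class below is a hypothetical sketch, not the actual share-page code: guest keystrokes are dropped unless the host has enabled typing.

```python
class GuestInputGate:
    """Drops guest input unless the host has ticked 'Allow the guest to type'."""

    def __init__(self) -> None:
        self.allow_typing = False  # read-only until the host opts in

    def filter(self, data: bytes) -> bytes:
        """Return the bytes that may be forwarded to the host's PTY."""
        return data if self.allow_typing else b""
```

Enforcing the gate on the host side, rather than in the guest's UI, keeps the consent decision with the person whose shell is being shared.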
## Open

None currently — all tracked decisions (D1–D23) are resolved. New questions will be added
here as they arise. Remaining detail to flesh out during build: the tray's supporting-action
set (D13) and per-module apt package names. M12's scope is settled by D23 (shared terminal
only) and M13's update mechanism by D18 (user-local, token-authenticated self-update).