Each NVMe drive's Inventory entry now shows its negotiated PCIe link (e.g.
'· PCIe Gen4 x4') from sysfs (current/max link speed+width), and flags drives
running below their capability ('Gen3 x4 (capable of Gen4 x4)') — so you can
confirm a Gen4 SSD is in a Gen4 slot. SATA disks show no PCIe link. Renders in
the GUI Inventory, CLI, and the Markdown/JSON export automatically. +tests.
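
A minimal sketch of the link check described above, assuming the standard PCI sysfs attributes (`current_link_speed`, `current_link_width`, `max_link_speed`, `max_link_width`). The names `describe_link` and `read_link` are hypothetical and not taken from the codebase, and the exact output wording is illustrative.

```python
import re
from pathlib import Path
from typing import Optional

# PCIe per-lane transfer rate in GT/s mapped to its generation.
_GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}


def _gen(speed: str) -> Optional[int]:
    """Parse a sysfs speed string such as '16.0 GT/s PCIe' into a generation."""
    m = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed)
    return _GEN_BY_GTS.get(float(m.group(1))) if m else None


def describe_link(cur_speed, cur_width, max_speed, max_width) -> Optional[str]:
    """Return 'PCIe Gen4 x4', or flag a link running below its capability."""
    cur_gen, max_gen = _gen(cur_speed), _gen(max_speed)
    if cur_gen is None:
        return None
    cur = f"Gen{cur_gen} x{int(cur_width)}"
    if max_gen is None:
        return f"PCIe {cur}"
    best = f"Gen{max_gen} x{int(max_width)}"
    below = cur_gen < max_gen or int(cur_width) < int(max_width)
    return f"PCIe {cur} (capable of {best})" if below else f"PCIe {cur}"


def read_link(ctrl: str = "nvme0") -> Optional[str]:
    """Read the link attributes for one NVMe controller (hypothetical helper)."""
    dev = Path("/sys/class/nvme") / ctrl / "device"
    names = ("current_link_speed", "current_link_width",
             "max_link_speed", "max_link_width")
    try:
        return describe_link(*[(dev / n).read_text().strip() for n in names])
    except (OSError, ValueError):
        return None  # SATA disk, missing attribute, or unparsable width
```

Keeping the formatting pure and the sysfs read separate makes the downgrade logic testable without hardware.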
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rigdoctor update assumed a pip/venv install and ran 'python -m pip install', which
fails on a .deb (system python has no pip; you can't pip-upgrade a dpkg package).
Add updates.install_kind() (dpkg ownership / venv / source-checkout detection,
cached) and route apply_update: pip self-updates in place; apt and source installs
return guidance instead. CLI and the GUI Update button show the apt/git command.
Adds tests/test_updates.py.
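
A rough sketch of how the install-kind detection could look. The precedence (dpkg, then source checkout, then venv), the return labels, and the helper names are assumptions for illustration; only `install_kind()` is named in the commit.

```python
import subprocess
import sys
from functools import lru_cache
from pathlib import Path


def classify(dpkg_owned: bool, in_checkout: bool, in_venv: bool) -> str:
    """Pure decision step; the precedence used here is an assumption."""
    if dpkg_owned:
        return "apt"      # upgrade through the package manager
    if in_checkout:
        return "source"   # upgrade with git
    if in_venv:
        return "pip"      # safe to self-update in place
    return "unknown"


def _dpkg_owns(path: Path) -> bool:
    """True when `dpkg -S` reports a package owning this file."""
    try:
        r = subprocess.run(["dpkg", "-S", str(path)],
                           capture_output=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return False      # dpkg missing, e.g. on a non-Debian system
    return r.returncode == 0


def _in_checkout(path: Path) -> bool:
    """True when any parent directory of the file contains a .git entry."""
    return any((p / ".git").exists() for p in path.resolve().parents)


@lru_cache(maxsize=1)
def install_kind() -> str:
    """Detect once per process; later calls hit the cache."""
    here = Path(__file__)
    in_venv = sys.prefix != sys.base_prefix
    return classify(_dpkg_owns(here), _in_checkout(here), in_venv)
```

With this split, `apply_update` can branch on the returned label: run pip for `"pip"`, and return the apt or git command as guidance otherwise.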
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AlertMonitor now scans the kernel log (journalctl -k) every ~30s and fires
one-shot, cooldown-gated desktop alerts on critical events: NVIDIA Xid, OOM
kills, CPU machine-checks, PCIe AER, and disk I/O errors — so users are warned
the moment something goes wrong, not only on a temperature threshold. Disk I/O
errors come from the kernel log (no root needed, unlike smartctl). Edge/spam
protection reuses the existing cooldown model. syslogs.scan_critical() does the
matching; init seeds last-scan to "now" so old boot logs don't alert on launch.
Tests for the matcher + monitor gating/cooldown; Settings note updated.
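
A simplified sketch of the matcher idea. The patterns below are illustrative guesses at typical kernel messages and are not the patterns shipped in `syslogs.scan_critical()`; `kernel_lines_since` is a hypothetical helper.

```python
import re
import subprocess
import time

# Illustrative patterns; real kernel messages vary by driver and kernel version.
_PATTERNS = [
    ("nvidia_xid", re.compile(r"NVRM: Xid")),
    ("oom_kill", re.compile(r"Out of memory: Killed process")),
    ("mce", re.compile(r"mce: \[Hardware Error\]")),
    ("pcie_aer", re.compile(r"\bAER: ")),
    ("disk_io", re.compile(r"I/O error, dev \w+")),
]


def scan_critical(lines):
    """Return (kind, line) for each kernel-log line matching a critical pattern."""
    hits = []
    for line in lines:
        for kind, pattern in _PATTERNS:
            if pattern.search(line):
                hits.append((kind, line.strip()))
                break  # one classification per line
    return hits


def kernel_lines_since(epoch: float):
    """Fetch kernel-log lines newer than `epoch`; [] if journalctl is unavailable."""
    cmd = ["journalctl", "-k", "--no-pager", "-o", "cat",
           "--since", f"@{int(epoch)}"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10).stdout
    except (OSError, subprocess.SubprocessError):
        return []
    return out.splitlines()


# Seed to "now" so messages from before launch never raise an alert.
last_scan = time.time()
```

Each poll would call `scan_critical(kernel_lines_since(last_scan))`, then advance `last_scan` and pass any hits through the existing cooldown gate.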
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ai.explain_stream(findings_text, on_chunk) streams token deltas and returns
(ok, full_text). Ollama: stream=True NDJSON; Claude: stream=True SSE (parse
content_block_delta text deltas). The diagnostic dialog opens an explanation
window immediately and fills it token-by-token via a _chunk signal, then
re-renders the finished answer as Markdown — no more multi-second freeze on a
local model. Non-streaming explain() kept for the CLI. Tests for both parsers;
verified live against qwen2.5:7b.
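
The two stream formats can be sketched as pure parsers over an iterable of text lines. This assumes Ollama's `/api/generate` NDJSON shape (a `response` field per line) and Anthropic's `content_block_delta` SSE events; `ollama_deltas`, `claude_deltas`, and `drain` are hypothetical names, and the real `ai.explain_stream` may differ.

```python
import json


def ollama_deltas(lines):
    """Yield text pieces from Ollama NDJSON lines (assumes /api/generate)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        obj = json.loads(raw)
        if obj.get("response"):
            yield obj["response"]
        if obj.get("done"):
            break


def claude_deltas(lines):
    """Yield text pieces from Anthropic Messages SSE lines."""
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("data:"):
            continue  # skip 'event:' lines and blank separators
        obj = json.loads(raw[5:])
        if obj.get("type") == "content_block_delta":
            delta = obj.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
        elif obj.get("type") == "message_stop":
            break


def drain(deltas, on_chunk):
    """Forward each delta to the callback and return (ok, full_text)."""
    parts = []
    for text in deltas:
        on_chunk(text)
        parts.append(text)
    return True, "".join(parts)
```

In a Qt GUI, `on_chunk` would typically emit a signal so the window is updated on the main thread rather than from the network thread.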
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expand diagnostic/report collection (all stored per-diagnostic, in the Report zip;
logs also fed to the AI on "Explain"):
- syslogs: nvidia-smi -q snapshot (driver/throttle/clocks/power/temps/PCIe/ECC/
retired pages) + display-server log auto-detected — Xorg.0.log on X11, or the
compositor user-journal slice (gnome-shell/kwin/sway/gamescope) on Wayland.
- diagstore: include the full M5 inventory (inventory.txt + .json) — invaluable
for larger/shared debugging. inventory.collect() degrades gracefully (no root
prompt). Best-effort throughout.
- Tests for nvidia/display + inventory in store; docs (M15/SPEC).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core/syslogs.py gathers, scoped to the diagnostic window:
- kernel-log slice (journalctl -k): Xid, OOM, MCE, PCIe AER, thermal, hung tasks
- crashed-process records (coredumpctl): exe, signal, when
Stored as syslogs.txt in the diagnostic dir, included in the Report bundle, and
fed to the AI on "Explain" alongside the game logs. Best-effort (degrades if the
tools are missing/denied); treats journalctl's "-- No entries --" as empty.
Tests + docs (M15/SPEC).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One `logging_enabled` toggle (default off) gates everything (D25):
- core/applog.py: rotating app.log (no-op unless enabled); setup() at GUI/CLI start.
- core/diagstore.py: each diagnostic stored in DATA_DIR/diagnostics/<id>/ (capture,
result.json, report.txt, scoped gamelogs, ai/ records of exactly what was sent to
the model + which model + the reply). make_report() zips a diagnostic (+ app.log)
into DATA_DIR/reports/.
- diagnostic.finish()/analyze_crash() store when enabled; DiagnosticResult.dir.
- GUI: Settings → Logging toggle; "Report" button on the diagnostic dialog; AI
interactions recorded into the diagnostic dir on "Explain with AI".
- CLI: `rigdoctor bundle` (the name `report` is already taken by the M4 health report).
- Tests for store/record_ai/make_report + applog gating; docs (D25, M15, Phase 8).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The model guessed "Rainbow Six Siege" for appID 2694490 (Path of Exile 2). We
already know the names locally, so ground it: steam.appid_names() maps appid→name
from the scanned library, and ai.build_prompt scans the text for app IDs and
injects a resolved glossary. Only locally-known IDs are listed; no network, no
fine-tuning. Tests + verified live (2694490 = Path of Exile 2).
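
A small sketch of the glossary step, assuming the appid-to-name map is a plain `dict[int, str]`. The function name `appid_glossary` and the output line format are hypothetical.

```python
import re


def appid_glossary(text, names):
    """Return glossary lines for Steam app IDs in `text` that are known locally.

    `names` maps appid -> game name from the scanned library. IDs that are
    not in the map are ignored, so nothing is guessed or fetched.
    """
    found = []
    for match in re.finditer(r"\b\d{2,10}\b", text):
        appid = int(match.group())
        if appid in names and appid not in found:
            found.append(appid)
    return [f"appID {appid} = {names[appid]}" for appid in found]
```

The resulting lines can be prepended to the prompt so the model resolves IDs from local facts instead of guessing.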
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The user ran a game ~20s with no crash but the AI dredged up old log lines,
guessed the wrong game, and gave Windows advice. Fixes:
- Prompt now includes the real game name + capture duration + outcome (clean vs
crash), so the model uses the known game instead of guessing from log paths.
- gamelogs.collect(since=…): scope Steam-console lines by timestamp and skip a
stale per-app Proton log (mtime before the session) — no unrelated past run.
- ai_knowledge: flag benign Steam/Proton lines (libnvidia-ml.so.1 assertion,
routine minidumps, "fork without exec") as non-causal.
- System prompt: Linux-only steps (no "run as administrator"); don't manufacture
a problem on a clean run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1) The explanation popup rendered raw Markdown (### / **). Switched to
QTextEdit.setMarkdown and told the model to answer in Markdown.
2) On "Explain with AI", also collect recent Proton (~/steam-*.log) and Steam
console logs (core/gamelogs.py — tail-read, size-bounded) and include them in
the prompt so the model can correlate log errors with findings and pinpoint
when things went wrong. Reference-fact matching runs over the logs too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New optional module (D24): explains the collected findings in plain language,
contacted ONLY on an explicit user action (never automatic).
- core/ai.py: provider chosen explicitly (no default) — ollama (local) or claude
(Anthropic Messages API via stdlib urllib; key in keyring). Grounded prompt;
HTTP error parsing; one-shot (no thinking/caching — snappy).
- core/ai_knowledge.py: curated reference KB (Xid/SMART/Proton/tunables),
exact keyword/code match ("RAG-lite", no embeddings) injected into the prompt —
lifts local models, sharpens Claude. No fine-tuning.
- config: ai_provider/ai_model/ai_endpoint + keyring-backed AI key (generalized
the token keyring helpers).
- GUI: Settings → AI assistant (provider radios, model/endpoint/key, Save/Test);
"Explain with AI" button on the diagnostic dialog (consent prompt for cloud).
- CLI: `rigdoctor ai status|test|explain`.
- Docs: D24, SPEC/MODULES/ROADMAP (Phase 7); tests for providers/grounding/parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The full installer experience as a GUI wizard (gui/setup_wizard.py): environment
summary → pick dependency bundles (from the catalog, grouped) → install missing
apt packages → choose recording trigger → readiness summary.
- Shown on first launch (config setup_done) and via `rigdoctor-gui --setup`;
re-runnable from Settings → Run setup wizard.
- install.sh launches it after a fresh install when a desktop session is present.
- catalog.by_bundle() groups components; config gains setup_done.
- Tests: by_bundle grouping + wizard construction smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scope M12 down to a single shared-terminal mode (D23, amends D16):
- Share page rewritten terminal-only: host shares their PTY/shell; guest watches
and may type only if the host ticks "Allow the guest to type" (read-only
otherwise — the D9 consent exception). Terminal is larger; either side can pop
it full-screen (Esc to exit).
- Removed the read-only stats view + HTTP server (core/share.py) and the
`rigdoctor share serve` CLI; deleted their tests.
- Docs: D23 added; SPEC/MODULES/ROADMAP updated (M12 → done, terminal-only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
M6 leftovers (the watcher defers to M9's trigger-mode work):
- gameenv: check_gpu_powermizer (NVIDIA, X; degrades when the gpu target won't
resolve), check_wine (wine --version), check_steam_client (dpkg package version);
steam.client_version() helper.
- core/launchers.py: detect Lutris (read-only SQLite pga.db) and Heroic (Epic
legendary + GOG JSON) installed games; Game gained a `launcher` field.
- Games page + `rigdoctor games` list non-Steam games alongside Steam, tagged by
launcher; Run Diagnostic works on them (auto-launch stays Steam-only).
- Tests for launchers (synthetic Lutris db + Heroic json).
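
A hedged sketch of the read-only Lutris lookup. It assumes a `games` table with `name`, `runner`, and `installed` columns, which should be checked against the real `pga.db` schema; `lutris_games` is a hypothetical name.

```python
import sqlite3
from pathlib import Path


def lutris_games(db_path):
    """List installed Lutris games as (name, runner) without writing to the db."""
    # mode=ro opens the database read-only, so Lutris' data cannot be modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        con = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return []  # no Lutris install, or the file is unreadable
    try:
        rows = con.execute(
            "SELECT name, runner FROM games WHERE installed = 1 ORDER BY name"
        ).fetchall()
    except sqlite3.Error:
        rows = []  # unexpected schema: degrade to "no games found"
    finally:
        con.close()
    return [(name, runner) for name, runner in rows]
```

A synthetic database with the same three columns is enough to exercise this in tests.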
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upgrade `rigdoctor monitor` from a basic redraw to a stdlib curses dashboard
(tui.py): current / session-min / session-max per sensor, grouped by subsystem,
with temperature & utilization color bands (GPU-lost flagged red). q quits,
r resets min/max. Plain full-screen redraw fallback on a non-TTY (--plain forces
it). Pure track()/band() helpers are unit-tested; curses path verified in a pty.
Completes the Monitoring bundle (M2 + M8).
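
The pure helpers might look roughly like this. The signatures and band names are assumptions, and thresholds are passed in by the caller instead of being hard-coded.

```python
def track(stats, name, value):
    """Update and return (current, session_min, session_max) for one sensor."""
    _, lo, hi = stats.get(name, (value, value, value))
    stats[name] = (value, min(lo, value), max(hi, value))
    return stats[name]


def band(value, warn, crit):
    """Map a reading to a colour band; a missing reading (GPU lost) is red."""
    if value is None:
        return "red"
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"
```

Resetting min/max (the `r` key) then amounts to clearing the `stats` dict.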
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reshape the GUI's information architecture (IA) so it reads by intent instead of as a flat pile of pages.
- Grouped sidebar: Monitor / Diagnose / System / App (section headers).
- Renames: Health → System Health, Environment → Tuning, Logs → Recordings,
Setup → Settings.
- Settings absorbs Notifications (alerts) as a section; Notifications dropped as a
separate page (notifications_page.py removed; SetupPage gains the alerts card +
`changed` signal wired to the live alert monitor).
- Recordings is now a hub: a source dropdown to view any captured log (always-on /
last diagnostic / preserved crash) + Analyze-crash in place, plus the recorder
controls; status line now shows the captured game.
- main_window nav is data-driven (_NAV groups → _PAGES order → stack); show_page,
badges, and tray flows updated. GUI smoke test asserts the new page set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QSystemTrayIcon applet (gui/tray.py, D13): menu with live CPU/GPU temp + memory
used/total, a status line, a Run Diagnostic submenu per detected game, plus Open
dashboard / Start-Stop recording / Snapshot-copy / Quit. Reuses the dashboard's
sample stream; drives existing MainWindow flows.
- MainWindow creates the tray when one is available; closing the window hides to
tray (Quit exits); setQuitOnLastWindowClosed(False) so dialogs don't quit it.
- app: `--tray` starts hidden for autostart.
- tests/test_gui_smoke.py: construct MainWindow headless + exercise the tray, so
a startup crash (like the 0.18.0 import bug) fails the build. Skips if no PySide6.
- docs: M10/M11 marked done in MODULES/ROADMAP.
Completes the Desktop UI bundle (M10 + M11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D12 "build first" wrapper: `rigdoctor wrap %command%` (Steam launch option /
Lutris/Heroic wrapper field) auto-brackets a focused diagnostic around a game —
start a game-tagged capture on launch, clean stop on exit; a hard freeze leaves
it unterminated → flagged as a crash next launch.
- core/wrap.py: game name from SteamAppId, PATH-proof launch_option(), run()
that doesn't disturb an existing capture and returns the game's exit code.
- diagnostic.start() preserves an unanalyzed crash to diagnostic-crash.jsonl
before clearing, so auto-relaunch can't wipe an unseen crash; pending_crash/
analyze_crash check the archive first.
- GUI: "Auto-capture…" helper dialog (copyable launch-option string).
- Tests for wrap (name resolution, exit-code passthrough, no-double-start).
- docs: fix stale MODULES.md status column (M1/M3/M4/M5/M8/M10/M13 → done),
update ROADMAP/MODULES for the wrapper + crash detection.
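
A minimal sketch of the bracketing behaviour, with the capture hooks injected as callables. The signature is hypothetical and differs from the real `core/wrap.py`.

```python
import subprocess


def run(argv, start_capture, stop_capture, capture_active):
    """Run the game command bracketed by a capture; return its exit code."""
    # Do not disturb a capture that someone else already started.
    owns = not capture_active()
    if owns:
        start_capture()
    try:
        code = subprocess.call(argv)
    except OSError:
        code = 127  # command not found or not executable
    # Reached only when the game exits; a hard freeze never gets here,
    # which leaves the session unterminated for the next launch to flag.
    if owns:
        stop_capture()
    return code
```

Returning the child's exit code unchanged keeps launchers such as Steam seeing the same result they would without the wrapper.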
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A focused capture that ends without a clean stop (no session-stop, no live
recorder) is treated as a likely hard freeze.
- core/diagnostic.py: pending_crash() detects the unterminated session;
acknowledge_crash() dismisses it; analyze_crash() combines the captured window
(final readings + GPU-lost) with a focused scan of the PREVIOUS (crashed) boot
+ SMART/driver/persistence/temps.
- health.check_previous_boot() scans `journalctl -k -b -1`; run_health_checks
gained include_journal to avoid double-scanning for the crash path.
- GUI: Games page shows a warning banner on launch for an interrupted diagnostic
with Analyze crash / Dismiss → results dialog.
- Tests for crash detection / clean-stop / acknowledge / in-progress.
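
The detection rule can be sketched over a JSON-lines capture. The `event` and `game` field names and the `session-start` marker are assumptions; only `session-stop` is named in the commit.

```python
import json


def pending_crash(lines, recorder_alive):
    """Return the game of an unterminated session, or None if there is none."""
    game, open_session = None, False
    for raw in lines:
        try:
            event = json.loads(raw)
        except ValueError:
            continue  # a freeze can leave a torn final line
        if not isinstance(event, dict):
            continue
        kind = event.get("event")
        if kind == "session-start":
            open_session, game = True, event.get("game")
        elif kind == "session-stop":
            open_session = False
    # A live recorder means the capture is still in progress, not crashed.
    if open_session and not recorder_alive:
        return game or "unknown"
    return None
```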
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The seed use case end to end, orchestrating M3 + M4 (ARCHITECTURE §7.1).
- core/diagnostic.py: start(game) runs a focused, game-tagged capture into a
dedicated diagnostic log (window-scoped report, separate from the always-on
crash log); finish() stops it and combines the capture summary (M3) with the
health findings (M4). Game recorded as a log event so it survives crash+reboot.
- CLI: rigdoctor diagnose start --game/--appid | status | finish.
- recorder/record run gained an optional --game tag; reccontrol passes it through.
- Tests for game recovery + the finish() combination.
GUI/tray "Run Diagnostic" button and auto start/stop (D12 wrapper) come next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the environment report actionable, not just advisory.
Install (reuses M9 installer):
- Add GameMode, MangoHud, cpupower to the component catalog (so they also show
on the Setup page); catalog.by_id() lookup.
- "tool not installed" findings (GameMode/MangoHud) get an Install button.
Apply runtime-reversible tunables (D22, realizing the D9 consent-gated milestone):
- core/fixes.py: dropdown of live options + Apply for CPU governor, NVIDIA
persistence, PCIe ASPM policy, vm.swappiness, THP. One pkexec command each,
no reboot, reverts on reboot; chosen value validated against live options;
writes go to sysfs/procfs/nvidia-smi, never GRUB. GRUB/mitigations stay
suggestion-only.
- Finding gained optional action (install) + fix (apply) ids; shared
finding_card renders the matching control; Environment page wires both and
re-checks after a change.
Tests for fixes (parse, command builders, value validation, gameenv wiring).
Docs: D22 added (amends D9); SPEC/MODULES/ROADMAP updated. 0.9.0 -> 0.10.0.
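
A sketch of the parse, validate, and build steps for a bracketed sysfs tunable such as THP or the ASPM policy. The function names and the exact pkexec invocation are assumptions, not the contents of `core/fixes.py`.

```python
def parse_bracketed(text):
    """Parse 'always [madvise] never' into (options, current)."""
    options, current = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            current = token
        options.append(token)
    return options, current


def build_write_command(path, value, live_options):
    """Return a pkexec argv that writes `value` to `path`, after validation."""
    if value not in live_options:
        raise ValueError(f"{value!r} is not one of {live_options}")
    # The value and path are passed as positional arguments ($1, $2),
    # never spliced into the shell string.
    return ["pkexec", "sh", "-c", 'printf %s "$1" > "$2"', "sh", value, path]
```

Validating against the options read live from the kernel means a stale or tampered value never reaches the privileged command.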
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The evaluate-and-suggest half of M6: a read-only findings report (D9) over
system settings that affect gaming stability/performance, each with the exact
fix command.
- core/gameenv.py: PCIe ASPM, NVIDIA persistence mode, CPU governor (the three
seed-case contributors to GPU bus-drop / Xid 79), GameMode, MangoHud,
vm.swappiness, shader disk cache, THP, CPU mitigations, Proton versions.
Pure evaluate_* helpers split from IO for testing; reuses the M4 Finding model.
- steam.proton_versions(): surfaces installed Proton builds for the report.
- CLI: rigdoctor gameenv (text / --json); render_health() gained a title arg.
- GUI: new Environment page; extracted a shared finding_card widget and switched
the Health page to it.
- Tests for the pure evaluators + aggregate.
Also fix: desktop notifications now use the RigDoctor icon (installed theme copy
-> bundled asset -> stock fallback) instead of a generic stock icon, matching
the app/dock icon.
Docs (MODULES/ROADMAP) updated; version 0.8.0 -> 0.9.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first slice of M6 (gaming-environment checks): detect a user's Steam
libraries and the games installed in each — also the D12 "pick a game"
foundation.
- core/steam.py: multi-install/library discovery (libraryfolders.vdf, symlink
dedupe, native/Flatpak/Snap), appmanifest_*.acf scan with runtime/Proton/
redist filtering, scan cache + new-game diff. Stdlib only. VDF keys read
case-insensitively (e.g. lastupdated vs SizeOnDisk).
- Libraries are opt-in (config steam_libraries); the flat TOML writer now
emits list/array values.
- GUI Games page: library checkboxes with per-library counts, game list,
background rescan on every launch, NEW badge + sidebar count for games
installed since the last scan (acknowledged when viewed).
- CLI: rigdoctor games / games libraries [--enable|--disable|--all|--json]
(headless-complete, D17).
- Tests for VDF parse, scan, tool filter, cache diff, config list round-trip.
- Docs (MODULES/ROADMAP) updated; version 0.7.3 -> 0.8.0.
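
A compact sketch of a stdlib-only KeyValues (VDF/ACF) parser that lower-cases keys, which is one way to get case-insensitive lookups. It does not unescape backslashes and is not the parser in `core/steam.py`.

```python
import re

# A quoted string (with backslash escapes) or a single brace.
_TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|([{}])')


def parse_vdf(text):
    """Parse Valve KeyValues text into nested dicts with lower-cased keys."""
    tokens = [(m.group(1), m.group(2)) for m in _TOKEN.finditer(text)]
    pos = 0

    def block():
        nonlocal pos
        out = {}
        while pos < len(tokens):
            string, brace = tokens[pos]
            pos += 1
            if brace == "}":
                break
            if brace == "{":
                continue  # stray opening brace; ignore it
            key = string.lower()
            if pos < len(tokens) and tokens[pos][1] == "{":
                pos += 1
                out[key] = block()          # nested section
            elif pos < len(tokens) and tokens[pos][1] is None:
                out[key] = tokens[pos][0]   # plain string value
                pos += 1
        return out

    return block()
```

With keys normalised, `state["sizeondisk"]` works whether the file wrote `SizeOnDisk` or `sizeondisk`.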
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- feat(share): host-consented interactive terminal over the relay. The host shares
a real PTY shell (core/pty_session.py); the guest renders it with pyte and sends
keystrokes (gui/terminal_widget.py) — vim/top/tab-completion/Ctrl-C work. Runs as
the host's user (never root). The host reads along live and can type too, e.g. a
sudo password, which stays local and is never sent to the guest. Off by default.
Guest also pulls inventory on join (req_full).
- fix(gui): style all form controls (QLineEdit/QPlainTextEdit/spin boxes/combo/
terminals) with light text on a dark background; Fusion defaulted them to
unreadable light-on-light.
- replaces the command/response shell with the full PTY; adds pyte to the gui extra.
Verified end-to-end against the deployed relay (guest keystroke ran on host PTY).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a Share tab that hosts or joins a read-only live session through the
rigdoctor-relay over WebSocket (QtWebSockets), gated by the Gitea access token.
- gui/share_page.py: Start shared session (host: get a code, stream snapshot +
health + inventory) and Enter share code (guest: view a host's data read-only)
- core/share.py: host_full_frame / host_snapshot_frame + guest_html renderer
- config: relay_url (default wss://rigdoctor.jesseyvanofferen.com)
- setup: token now powers updates AND sharing — hint asks for read:user +
read:repository scopes (relay validates the account via Gitea)
- main_window: Share nav tab + socket cleanup on close
- tests for the relay frame builders and guest HTML
Verified end-to-end against the deployed relay (host code -> guest frame).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- feat(alerts): desktop notifications (notify-send) for overheat (GPU/CPU past a
configurable threshold), GPU-lost, and a new-version-available alert (once per
version). Edge-triggered with cooldown so it doesn't spam (core/alerts.py)
- feat(gui): Notifications page to configure alerts (enable, GPU/CPU thresholds,
Send test); changes apply live and persist via config.save_config/update_config
- feat(gui): ship a RigDoctor icon; the GUI self-registers the icon + .desktop on
launch and sets the Wayland app-id, so the dock shows it after an update + relaunch
(no installer re-run); installer/uninstaller updated to manage the icon
- config: alerts_enabled, gpu_temp_alert, cpu_temp_alert; flat-TOML writer
- tests for the alert monitor and config round-trip
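
The edge-trigger plus cooldown model could be sketched as below. The class name and fields are hypothetical; the real logic lives in `core/alerts.py`.

```python
class EdgeAlert:
    """Fire once when a value crosses a threshold, then respect a cooldown."""

    def __init__(self, threshold, cooldown_s):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._above = False
        self._last_fired = None

    def update(self, value, now):
        """Return True when an alert should be sent for this sample."""
        above = value >= self.threshold
        rising = above and not self._above
        self._above = above
        if not rising:
            return False  # still above, or below: no new edge
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return False  # edge seen, but still inside the cooldown window
        self._last_fired = now
        return True
```

A temperature hovering around the threshold produces many rising edges; the cooldown is what keeps those from turning into repeated notifications.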
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>