Compare commits

53 Commits

Author SHA1 Message Date
jessey 31178bace8 Merge pull request 'feat(memory): flag RAM below rated speed (XMP/EXPO not enabled) — 0.40.0' (#44) from feat/ram-speed into main
release / test (push) Successful in 13s
release / release (push) Successful in 16s
Reviewed-on: #44
2026-05-22 15:00:25 +00:00
jessey 04e8d72bce feat(memory): flag RAM below rated speed (XMP/EXPO not enabled) — 0.40.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
Inventory shows configured RAM speed + the rated speed when lower
('4800 MT/s (rated 5600)'); System Health flags it with the fix (enable
XMP/EXPO in BIOS). With the profile off dmidecode only reports the JEDEC base,
so the rated speed comes from dmidecode's max OR the part number, matched against
known DDR5 speed grades to avoid false positives. inventory.module_speed() shared
by both; needs dmidecode (root/launch elevation). +tests (incl. the user's
CMK..5600 kit → (4800, 5600)). Completes the underperforming-hardware trio with
PCIe gen + refresh rate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 17:00:02 +02:00
jessey fb468e83c2 Merge pull request 'feat(displays): monitors w/ resolution+refresh in Inventory; flag sub-max refresh in Health — 0.39.0' (#43) from feat/displays into main
release / test (push) Successful in 12s
release / release (push) Successful in 15s
Reviewed-on: #43
2026-05-22 14:56:15 +00:00
jessey b006fa6b8d feat(displays): monitors w/ resolution+refresh in Inventory; flag sub-max refresh in Health — 0.39.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
New core/displays.py reads connected monitors via GNOME Mutter DisplayConfig over
D-Bus (busctl --json; works on X11 + Wayland), falling back to xrandr on other X11
desktops. Inventory's Display section now lists each monitor's resolution + current
refresh (e.g. 'DP-1 · Samsung LC34G55T: 3440x1440 @ 165 Hz'). System Health
(check_displays) flags a monitor running below its max refresh AT THE CURRENT
resolution (e.g. 165 Hz panel set to 60 Hz) — never suggests lowering resolution.
+tests (Mutter JSON + xrandr parsers, health check).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:55:33 +02:00
jessey b20e8dfc3a Merge pull request 'docs: public apt setup — one-line source with arch=all (no token, no notices)' (#42) from docs/public-registry into main
release / test (push) Successful in 11s
release / release (push) Successful in 16s
Reviewed-on: #42
2026-05-22 14:52:02 +00:00
jessey 9fe9a6576f Merge pull request 'feat(health): flag NVMe PCIe links below capability (lane-sharing) — 0.38.0' (#41) from feat/inventory-pcie into main
release / test (push) Successful in 12s
release / release (push) Successful in 15s
Reviewed-on: #41
2026-05-22 14:51:12 +00:00
jessey 07bc722209 feat(health): flag NVMe PCIe links below capability (lane-sharing) — 0.38.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
check_pcie_links() warns when an NVMe drive negotiates fewer lanes than it
supports — almost always motherboard lane-sharing (a GPU/second card or another
M.2 stealing lanes), the case the user asked about — and reports speed-only
reductions as info (slower slot / idle ASPM). GPU is excluded: NVIDIA drops its
PCIe gen+width at idle, so a snapshot would false-alarm. Reuses inventory
read_link/nvme_controllers (refactored to public). Wired into run_health_checks;
+tests. Folded into the 0.38.0 PCIe work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:49:47 +02:00
jessey d405bf7caf Merge pull request 'feat(inventory): show NVMe PCIe link gen/width, flag downtrains — 0.38.0' (#40) from feat/inventory-pcie into main
release / test (push) Successful in 13s
release / release (push) Successful in 15s
Reviewed-on: #40
2026-05-22 14:45:46 +00:00
jessey 9bb0f9a684 feat(inventory): show NVMe PCIe link gen/width, flag downtrains — 0.38.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
Each NVMe drive's Inventory entry now shows its negotiated PCIe link (e.g.
'· PCIe Gen4 x4') from sysfs (current/max link speed+width), and flags drives
running below their capability ('Gen3 x4 (capable of Gen4 x4)') — so you can
confirm a Gen4 SSD is in a Gen4 slot. SATA disks show no PCIe link. Renders in
the GUI Inventory, CLI, and the Markdown/JSON export automatically. +tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:45:08 +02:00
jessey 4bbc0fa97e Merge pull request 'docs: add Dashboard/Inventory/Share screenshots to the README' (#39) from docs/readme-screenshots into main
release / test (push) Successful in 12s
release / release (push) Successful in 15s
Reviewed-on: #39
2026-05-22 14:43:13 +00:00
jessey a0f8055328 docs: add Dashboard/Inventory/Share screenshots to the README
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
Adds assets/screenshots/{dashboard,inventory,share}.png and a Screenshots section
(Dashboard + Inventory side by side, Share below).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:40:47 +02:00
jessey 323451428b Merge pull request 'fix(update): route the self-update by install kind (apt/pip/source) — 0.37.1' (#38) from fix/updater-by-install into main
release / test (push) Successful in 11s
release / release (push) Successful in 16s
Reviewed-on: #38
2026-05-22 14:40:19 +00:00
jessey 479189ee4e fix(update): route the self-update by install kind (apt/pip/source) — 0.37.1
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
rigdoctor update assumed a pip/venv install and ran 'python -m pip install', which
fails on a .deb (system python has no pip; you can't pip-upgrade a dpkg package).
Add updates.install_kind() (dpkg ownership / venv / source-checkout detection,
cached) and route apply_update: pip self-updates in place; apt and source installs
return guidance instead. CLI and the GUI Update button show the apt/git command.
Adds tests/test_updates.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:39:42 +02:00
jessey 51133e4042 Merge pull request 'feat(gui): scrollable pages + version footer — 0.37.0' (#37) from fix/scrollable-pages into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #37
2026-05-22 14:29:56 +00:00
jessey bcf6ac2656 feat(gui): scrollable pages + version footer — 0.37.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 31s
Wrap each page (except self-scrolling Dashboard/Health/Inventory and the Share
terminal) in a QScrollArea, so long pages scroll when too tall (Settings'
Uninstall is reachable again) and the window is no longer pinned to the tallest
page's height — min height drops from >screen to ~600px, so it can be resized
smaller. Add a bottom footer showing 'RigDoctor v<version>' bottom-right (moved
out of the sidebar); themed #Footer with a top border.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:29:14 +02:00
jessey 81c7757546 docs: public apt setup — one-line source with arch=all (no token, no notices)
tests / core (pull_request) Successful in 11s
tests / gui-smoke (pull_request) Successful in 27s
Registry is public now: drop the token/auth.conf.d, use the signed-by one-line
source with arch=all so apt doesn't emit 'doesn't support architecture amd64'
notices (our package is Architecture: all). Drop the curl|sh bootstrap idea.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:05:03 +02:00
jessey d59261f021 Merge pull request 'docs: registry is public now — drop the token/auth.conf.d from apt setup' (#36) from docs/public-registry into main
release / test (push) Successful in 13s
release / release (push) Successful in 15s
Reviewed-on: #36
2026-05-22 13:58:13 +00:00
jessey 44923b771a docs: registry is public now — drop the token/auth.conf.d from apt setup
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
REQUIRE_SIGNIN_VIEW is off and the repo is public, so anonymous apt works. The
apt instructions no longer need a read:package token or auth.conf.d — just the
signing key + a deb822 Signed-By source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:57:40 +02:00
jessey eaaf14c58a Merge pull request 'fix(cli): correct the missing-PySide6 hint to the real apt packages — 0.36.1' (#35) from docs/apt-proper into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #35
2026-05-22 13:49:28 +00:00
jessey 7779131cf9 Merge branch 'main' into docs/apt-proper
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
2026-05-22 13:48:36 +00:00
jessey 87fa678ccb fix(cli): correct the missing-PySide6 hint to the real apt packages — 0.36.1
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 26s
rigdoctor gui suggested 'apt install python3-pyside6' (no such package on
Debian/Ubuntu). Point to the split modules instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:48:20 +02:00
jessey c5e24b3984 Merge pull request 'docs: document the proper (GPG-verified, deb822) apt setup' (#34) from docs/apt-proper into main
release / test (push) Successful in 12s
release / release (push) Successful in 14s
Reviewed-on: #34
2026-05-22 13:46:10 +00:00
jessey 21cc6a4813 docs: document the proper (GPG-verified, deb822) apt setup
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 27s
Replace the trusted=yes apt instructions with the proper method: read:package
token, registry signing key dearmored into /etc/apt/keyrings, credentials in
auth.conf.d, and a modern deb822 .sources file with Signed-By + Architectures:
all. Keeps the trusted=yes one-liner as a noted fallback for unsigned registries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:44:41 +02:00
jessey ee73049248 Merge pull request 'fix(deb): auto-install all deps — correct PySide6 names + bundle tools — 0.36.0' (#33) from fix/deb-pyside6-deps into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #33
2026-05-22 13:39:01 +00:00
jessey 3a8ad5bd5d fix(deb): auto-install all deps — correct PySide6 names + bundle tools — 0.36.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 29s
The old Recommends named python3-pyside6 (no such package on Debian/Ubuntu —
PySide6 is split per module), so apt skipped it and the GUI couldn't start.
Now Recommends the real modules (python3-pyside6.qt{widgets,gui,websockets,svg}
+ python3-pyte) AND the optional diagnostic/gaming tools (smartmontools,
lm-sensors, dmidecode, pciutils, libnotify-bin, libsecret-tools, gamemode,
mangohud), so 'apt install rigdoctor' sets up the whole toolset automatically —
no manual installs. cpupower -> Suggests. Verified all candidates resolve in apt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:38:12 +02:00
jessey e8b84bf046 Merge pull request 'docs: rewrite README to be user-first (install + use)' (#32) from docs/readme-users into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #32
2026-05-22 13:32:41 +00:00
jessey 2342dd83aa docs: rewrite README to be user-first (install + use)
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 29s
Lead with what RigDoctor does, then install (.deb/apt incl. the private-registry
auth.conf.d + trusted=yes notes, and the .run), then usage (GUI/tray/CLI),
requirements, and privacy. Move the dev content (from-source, tests, docs links)
into a short Development section at the end. Drops the stale status/decisions/
repo-layout planning sections from the top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:31:36 +02:00
jessey a028fe6d38 Merge pull request 'ci: make apt registry upload idempotent (tolerate 409)' (#31) from fix/apt-409 into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #31
2026-05-22 13:26:47 +00:00
jessey a6453335e9 ci: make apt registry upload idempotent (tolerate 409)
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 28s
Gitea's Debian registry is immutable, so re-uploading an existing version returns
409. With --fail that aborted the release on any re-run / repeat push at the same
version. Now we capture the HTTP code: 2xx = uploaded, 409 = already published
(skip), anything else = fail with the body. Also fixed the stale skip message
(REGISTRY_TOKEN, not PACKAGES_TOKEN).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:21:27 +02:00
jessey baec47dd4e Merge pull request 'assets: project avatar (gauge + heartbeat) for Gitea' (#30) from chore/avatar into main
release / test (push) Successful in 12s
release / release (push) Failing after 15s
Reviewed-on: #30
2026-05-22 13:18:59 +00:00
jessey 47ecb702e7 Merge branch 'main' into chore/avatar
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 28s
2026-05-22 13:17:28 +00:00
jessey 944945ce72 Merge pull request 'feat(m9): .deb package + CI build/publish — 0.35.0' (#29) from feat/deb-packaging into main
release / test (push) Successful in 13s
release / release (push) Successful in 19s
Reviewed-on: #29
2026-05-22 13:17:19 +00:00
jessey dc719f6a89 assets: project avatar (gauge + heartbeat) for Gitea
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 27s
512x512 PNG (assets/avatar.png) rendered from assets/avatar.svg, matching the app
icon's gauge-ring + heartbeat motif on a dark gradient. Upload as the repo avatar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:16:58 +02:00
jessey 78cd417d0b feat(m9): .deb package + CI build/publish — 0.35.0
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 28s
packaging/make_deb.py builds rigdoctor_<ver>_all.deb (Architecture: all) via
dpkg-deb, no debhelper: Depends python3; Recommends python3-pyside6/pyte (GUI by
default, --no-install-recommends = CLI only). Installs the package, both
launchers, desktop entry + icon; postinst refreshes the desktop database.
release.yml builds it as a release asset and optionally pushes to the Gitea apt
registry (REGISTRY_TOKEN). Verified locally: valid .deb, packaged launcher runs
'rigdoctor --version'. Docs/README/ROADMAP/MODULES updated; M9 complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:15:33 +02:00
jessey 856a3305ad Merge pull request 'feat(m8): event-based alerts — Xid/OOM/MCE/PCIe/disk from the kernel log — 0.34.0' (#28) from feat/event-alerts into main
release / test (push) Successful in 13s
release / release (push) Successful in 15s
Reviewed-on: #28
2026-05-22 12:48:41 +00:00
jessey 3b1a2e7393 Merge branch 'feat/event-alerts' of ssh://jesseyvanofferen.com:2222/jessey/rigdoctor into feat/event-alerts
tests / core (pull_request) Successful in 11s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-22 14:42:53 +02:00
jessey 2989e8e23e ci: run tests.yml on pull_request only (no push) to avoid double runs
A branch with an open PR triggered both the push and pull_request events, running
every job twice. Trigger on pull_request only; pushes to main are already tested
by release.yml's `test` job. No version bump (CI config only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:42:41 +02:00
jessey 670df23e06 Merge branch 'main' into feat/event-alerts
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 26s
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-22 12:41:34 +00:00
jessey 2ee7763d00 feat(m8): event-based alerts — Xid/OOM/MCE/PCIe/disk from the kernel log — 0.34.0
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 27s
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
AlertMonitor now scans the kernel log (journalctl -k) every ~30s and fires
one-shot, cooldown-gated desktop alerts on critical events: NVIDIA Xid, OOM
kills, CPU machine-checks, PCIe AER, and disk I/O errors — so users are warned
the moment something goes wrong, not only on a temperature threshold. Disk I/O
errors come from the kernel log (no root needed, unlike smartctl). Edge/spam
protection reuses the existing cooldown model. syslogs.scan_critical() does the
matching; init seeds last-scan to "now" so old boot logs don't alert on launch.
Tests for the matcher + monitor gating/cooldown; Settings note updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:41:13 +02:00
jessey bd6cad5a42 Merge pull request 'feat(ai): stream explanations live (Ollama NDJSON + Claude SSE) — 0.33.0' (#27) from feat/syslogs into main
release / test (push) Successful in 12s
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 25s
release / release (push) Successful in 15s
Reviewed-on: #27
2026-05-22 12:35:11 +00:00
jessey 7fa9b63661 Merge branch 'main' into feat/syslogs
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 25s
tests / core (pull_request) Successful in 11s
tests / gui-smoke (pull_request) Successful in 28s
2026-05-22 12:28:59 +00:00
jessey c443a8b9f8 ci: add tests workflow + gate releases on tests passing
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 38s
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 27s
- .gitea/workflows/tests.yml: run `unittest discover` on push + pull_request.
  `core` job (stdlib install, GUI tests skip) is bulletproof; `gui-smoke` job
  installs the GUI extra + offscreen Qt libs and runs the suite headless.
- release.yml: add a `test` job and `release: needs: test` so a push to main
  can't publish if the tests fail.

No version bump — CI config only; nothing in the shipped app changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:26:47 +02:00
jessey bbc22fa288 feat(ai): stream explanations live (Ollama NDJSON + Claude SSE) — 0.33.0
ai.explain_stream(findings_text, on_chunk) streams token deltas and returns
(ok, full_text). Ollama: stream=True NDJSON; Claude: stream=True SSE (parse
content_block_delta text deltas). The diagnostic dialog opens an explanation
window immediately and fills it token-by-token via a _chunk signal, then
re-renders the finished answer as Markdown — no more multi-second freeze on a
local model. Non-streaming explain() kept for the CLI. Tests for both parsers;
verified live against qwen2.5:7b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:23:15 +02:00
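
The two stream formats named in the commit above can be parsed roughly as follows. This is a minimal sketch, not the project's `ai.explain_stream()`: the function names `ollama_chunks` and `claude_chunks` are hypothetical, and the field names assume Ollama's generate-style NDJSON (`response`, `done`) and Anthropic's `content_block_delta` / `text_delta` SSE events.

```python
import json


def ollama_chunks(lines):
    """Yield text deltas from NDJSON lines (one JSON object per line).

    Assumption: each object carries a "response" string and a "done" flag.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        obj = json.loads(raw)
        text = obj.get("response", "")
        if text:
            yield text
        if obj.get("done"):
            break


def claude_chunks(lines):
    """Yield text deltas from SSE lines of the form 'data: {...}'."""
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("data:"):
            continue  # skip 'event:' lines and blank separators
        obj = json.loads(raw[len("data:"):])
        if obj.get("type") == "content_block_delta":
            delta = obj.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
```

Both generators take any iterable of decoded lines, so they can be fed from an HTTP response or from a list in a test.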
jessey 5502251789 Merge pull request 'feat(m15): collect session-scoped system logs (kernel + coredumps) — 0.31.0' (#26) from feat/syslogs into main
release / release (push) Successful in 15s
Reviewed-on: #26
2026-05-22 12:16:52 +00:00
jessey 4bd51a40c3 feat(m15): nvidia-smi snapshot + display logs + inventory in reports — 0.32.0
Expand diagnostic/report collection (all stored per-diagnostic, in the Report zip;
logs also fed to the AI on "Explain"):
- syslogs: nvidia-smi -q snapshot (driver/throttle/clocks/power/temps/PCIe/ECC/
  retired pages) + display-server log auto-detected — Xorg.0.log on X11, or the
  compositor user-journal slice (gnome-shell/kwin/sway/gamescope) on Wayland.
- diagstore: include the full M5 inventory (inventory.txt + .json) — invaluable
  for larger/shared debugging. inventory.collect() degrades gracefully (no root
  prompt). Best-effort throughout.
- Tests for nvidia/display + inventory in store; docs (M15/SPEC).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:16:23 +02:00
jessey 984292c368 feat(m15): collect session-scoped system logs (kernel + coredumps) — 0.31.0
core/syslogs.py gathers, scoped to the diagnostic window:
- kernel-log slice (journalctl -k): Xid, OOM, MCE, PCIe AER, thermal, hung tasks
- crashed-process records (coredumpctl): exe, signal, when
Stored as syslogs.txt in the diagnostic dir, included in the Report bundle, and
fed to the AI on "Explain" alongside the game logs. Best-effort (degrades if the
tools are missing/denied); treats journalctl's "-- No entries --" as empty.
Tests + docs (M15/SPEC).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:10:30 +02:00
jessey bffaf73ad4 Merge pull request 'fix(ai): analyse the actual session, not stale/benign logs — 0.28.1' (#25) from feat/m14-ai into main
release / release (push) Successful in 15s
Reviewed-on: #25
2026-05-22 11:57:03 +00:00
jessey 7f0ab9a635 feat(m15): opt-in logging + per-diagnostic storage + Report bundles — 0.30.0
One `logging_enabled` toggle (default off) gates everything (D25):
- core/applog.py: rotating app.log (no-op unless enabled); setup() at GUI/CLI start.
- core/diagstore.py: each diagnostic stored in DATA_DIR/diagnostics/<id>/ (capture,
  result.json, report.txt, scoped gamelogs, ai/ records of exactly what was sent to
  the model + which model + the reply). make_report() zips a diagnostic (+ app.log)
  into DATA_DIR/reports/.
- diagnostic.finish()/analyze_crash() store when enabled; DiagnosticResult.dir.
- GUI: Settings → Logging toggle; "Report" button on the diagnostic dialog; AI
  interactions recorded into the diagnostic dir on "Explain with AI".
- CLI: `rigdoctor bundle` (report is taken by the M4 health report).
- Tests for store/record_ai/make_report + applog gating; docs (D25, M15, Phase 8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:56:31 +02:00
jessey 12339c3282 feat(ai): resolve Steam app IDs from the library, don't make the model guess — 0.29.0
The model guessed "Rainbow Six Siege" for appID 2694490 (Path of Exile 2). We
already know the names locally, so ground it: steam.appid_names() maps appid→name
from the scanned library, and ai.build_prompt scans the text for app IDs and
injects a resolved glossary. Only locally-known IDs are listed; no network, no
fine-tuning. Tests + verified live (2694490 = Path of Exile 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:40:34 +02:00
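
The grounding step described in the commit above could look something like this. It is a sketch under assumptions: `build_glossary` is a hypothetical helper, not the project's `ai.build_prompt`, and the glossary wording is invented for illustration. Only IDs present in the locally known mapping are listed.

```python
import re


def build_glossary(text, known):
    """Return a glossary block for app IDs found in `text`.

    `known` maps appid (int) -> game name, e.g. from the scanned library.
    IDs that are not in `known` are left out rather than guessed.
    """
    ids = {int(m) for m in re.findall(r"(?<!\d)\d{3,10}(?!\d)", text)}
    hits = sorted(i for i in ids if i in known)
    if not hits:
        return ""
    rows = "\n".join(f"- {i} = {known[i]}" for i in hits)
    return "Known Steam app IDs:\n" + rows
```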
jessey c7e50ba4cb fix(ai): analyse the actual session, not stale/benign logs — 0.28.1
The user ran a game ~20s with no crash but the AI dredged up old log lines,
guessed the wrong game, and gave Windows advice. Fixes:
- Prompt now includes the real game name + capture duration + outcome (clean vs
  crash), so the model uses the known game instead of guessing from log paths.
- gamelogs.collect(since=…): scope Steam-console lines by timestamp and skip a
  stale per-app Proton log (mtime before the session) — no unrelated past run.
- ai_knowledge: flag benign Steam/Proton lines (libnvidia-ml.so.1 assertion,
  routine minidumps, "fork without exec") as non-causal.
- System prompt: Linux-only steps (no "run as administrator"); don't manufacture
  a problem on a clean run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:38:19 +02:00
jessey a3caabc0d5 Merge pull request 'feat(ai): pre-fill qwen2.5:7b when Ollama is selected — 0.27.1' (#24) from feat/m14-ai into main
release / release (push) Successful in 14s
Reviewed-on: #24
2026-05-22 11:32:59 +00:00
jessey b59f202891 feat(ai): render Markdown + feed game/Proton/Steam logs to the AI — 0.28.0
1) The explanation popup rendered raw Markdown (### / **). Switched to
   QTextEdit.setMarkdown and told the model to answer in Markdown.
2) On "Explain with AI", also collect recent Proton (~/steam-*.log) and Steam
   console logs (core/gamelogs.py — tail-read, size-bounded) and include them in
   the prompt so the model can correlate log errors with findings and pinpoint
   when things went wrong. Reference-fact matching runs over the logs too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:32:51 +02:00
jessey e6d94fbd59 feat(ai): pre-fill qwen2.5:7b when Ollama is selected — 0.27.1
Selecting the Ollama provider pre-fills the model field with the suggested
qwen2.5:7b (fits an 8 GB GPU at Q4; grounding makes a 7B sufficient). Won't
overwrite a model the user already typed. Constant ai.OLLAMA_SUGGESTED_MODEL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:25:04 +02:00
45 changed files with 2480 additions and 160 deletions
+39
@@ -11,7 +11,20 @@ on:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install (core only)
        run: python -m pip install -e .
      - name: Run tests
        run: python -m unittest discover -s tests -v

  release:
    needs: test # don't publish a release if the tests fail
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
@@ -30,6 +43,9 @@ jobs:
      - name: Build self-extracting installer (.run)
        run: python packaging/make_run.py

      - name: Build .deb
        run: python packaging/make_deb.py

      - name: Read version
        id: ver
        run: |
@@ -90,3 +106,26 @@ jobs:
              "${API}/releases/${rid}/assets?name=$(basename "$f")" >/dev/null
          done
          echo "Published ${TAG}."

      - name: Publish .deb to the Gitea apt registry (optional — needs REGISTRY_TOKEN)
        env:
          PKG_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
        run: |
          set -euo pipefail
          if [ -z "${PKG_TOKEN:-}" ]; then
            echo "REGISTRY_TOKEN not set — skipping apt publish (the .deb is still a release asset)."
            exit 0
          fi
          OWNER="${{ github.repository_owner }}"
          URL="${{ github.server_url }}/api/packages/${OWNER}/debian/pool/stable/main/upload"
          for f in dist/*.deb; do
            echo "Uploading $(basename "$f") to the apt registry…"
            code=$(curl -sS -o /tmp/apt_upload.txt -w '%{http_code}' \
              --user "${OWNER}:${PKG_TOKEN}" --upload-file "$f" "$URL" || true)
            case "$code" in
              2*) echo " uploaded ($code)";;
              409) echo " already published ($code) — skipping (registry versions are immutable)";;
              *) echo " upload failed ($code):"; cat /tmp/apt_upload.txt || true; exit 1;;
            esac
          done
          echo "apt source: deb ${{ github.server_url }}/api/packages/${OWNER}/debian stable main"
+44
@@ -0,0 +1,44 @@
name: tests
run-name: Run test suite

# Runs the unittest suite on pull requests (once per PR). Pushes to main are covered by the
# `test` job in release.yml, so we don't trigger on push here — that would double every run.
# Two jobs:
#   core      — stdlib-only install; the GUI tests skip (@skipUnless HAVE_QT). Bulletproof.
#   gui-smoke — installs the GUI extra + offscreen Qt libs and runs the same suite headless,
#               exercising the MainWindow/SetupWizard/DiagnosticDialog construction tests.
# Make `tests / core (pull_request)` a required status check on `main` so a PR can't merge red.

on:
  pull_request:

jobs:
  core:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install (core only — no PySide6)
        run: python -m pip install -e .
      - name: Run tests (GUI tests skip without PySide6)
        run: python -m unittest discover -s tests -v

  gui-smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: System libraries for offscreen Qt
        run: |
          sudo apt-get update
          sudo apt-get install -y libegl1 libgl1 libxkbcommon0 libdbus-1-3 libglib2.0-0
      - name: Install (with GUI extra)
        run: python -m pip install -e ".[gui]"
      - name: Run tests (headless)
        env:
          QT_QPA_PLATFORM: offscreen
        run: python -m unittest discover -s tests -v
+162
@@ -5,6 +5,168 @@ All notable changes to RigDoctor are recorded here. Format follows
(`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
release tag (so the auto-updater, D18, can compare versions).
## [0.40.0] - 2026-05-22
### Added
- **RAM speed / XMP-EXPO check.** Inventory now shows each module's configured speed and, when it's
below the rated speed, the rating (e.g. `4800 MT/s (rated 5600)`); **System Health** flags it
("RAM at 4800 MT/s (rated 5600 MT/s)") with the fix — enable XMP/EXPO in BIOS. With the profile
off, dmidecode only reports the JEDEC base, so the rated speed is taken from dmidecode's maximum or
the part number (matched against known DDR5 speed grades to avoid false positives). Needs dmidecode
(root / launch elevation). Completes the "underperforming hardware" trio with PCIe gen + refresh.
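
The part-number matching can be sketched as below. This is an illustration under assumptions, not the actual `inventory.module_speed()`: the grade list is a partial guess, the helper names are hypothetical, and the part number in the usage note is made up.

```python
import re

# Assumption: an illustrative subset of DDR5 speed grades in MT/s.
KNOWN_DDR5_GRADES = {4800, 5200, 5600, 6000, 6400, 6800, 7200}


def rated_from_part_number(part_number):
    """Return the highest known DDR5 grade embedded in a part number, else None."""
    # Take standalone 4-digit runs, then keep only values that are real grades,
    # so arbitrary numbers in a part number are not mistaken for a speed.
    candidates = [int(m) for m in re.findall(r"(?<!\d)(\d{4})(?!\d)", part_number)]
    grades = [c for c in candidates if c in KNOWN_DDR5_GRADES]
    return max(grades) if grades else None


def module_speed(configured, dmi_max, part_number):
    """Return (configured, rated); rated is None when unknown or not higher."""
    found = [s for s in (dmi_max, rated_from_part_number(part_number)) if s]
    rated = max(found) if found else None
    if rated is None or configured is None or rated <= configured:
        return (configured, None)
    return (configured, rated)
```

With a hypothetical part number such as `KIT32GX5-5600C36`, a module configured at 4800 MT/s whose dmidecode maximum also reads 4800 would still be reported as rated 5600, because the grade is recovered from the part number.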
## [0.39.0] - 2026-05-22
### Added
- **Displays in the Inventory.** A new `core/displays.py` lists each connected monitor with its
resolution and current/max refresh — e.g. `DP-1 · Samsung LC34G55T → 3440x1440 @ 165 Hz`. Reads
GNOME's Mutter `DisplayConfig` over D-Bus (works on X11 *and* Wayland), falling back to `xrandr`
on other X11 desktops.
- **System Health flags monitors below their max refresh.** If a monitor supports a higher refresh
at its current resolution (e.g. a 165 Hz panel set to 60 Hz — an easily-missed gaming setting),
Health reports it with the fix (raise it in Display settings). Max is computed at the *current*
resolution, so it never suggests dropping resolution.
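
The "max at the current resolution" rule can be shown with a small sketch. The function names, the mode tuple shape `(width, height, hz)`, and the 1 Hz tolerance are assumptions made for illustration; they are not taken from `core/displays.py`.

```python
def max_refresh_at_current(modes, current):
    """Highest refresh rate among modes that share the current resolution."""
    width, height, current_hz = current
    same_res = [hz for (w, h, hz) in modes if (w, h) == (width, height)]
    return max(same_res, default=current_hz)


def check_refresh(modes, current, tolerance_hz=1.0):
    """Return a finding string, or None when the monitor is already at its max."""
    best = max_refresh_at_current(modes, current)
    if best - current[2] > tolerance_hz:
        return (f"running at {current[2]:g} Hz, supports {best:g} Hz "
                f"at {current[0]}x{current[1]}")
    return None
```

Because only modes with the same width and height are compared, a faster mode at a lower resolution never triggers a finding.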
## [0.38.0] - 2026-05-22
### Added
- **PCIe link in the Inventory.** Each NVMe drive now shows its negotiated PCIe link next to the
model — e.g. `Samsung SSD 980 PRO 1TB (931.5G) · PCIe Gen4 x4` — read from sysfs
(`current/max_link_speed` + width). If a drive negotiates below its capability (a slower M.2
slot, lane-sharing, or a downtrain) it's flagged: `PCIe Gen3 x4 (capable of Gen4 x4)`. So you
can confirm a Gen4 SSD is actually in a Gen4 slot. (SATA disks show no PCIe link.)
- **System Health flags downtrained NVMe links.** A new check warns when an NVMe drive negotiates
fewer PCIe lanes than it supports (almost always motherboard **lane-sharing** — a GPU/second
card or another M.2 stealing lanes) and notes speed-only reductions as info (a slower slot or
idle ASPM). The GPU is deliberately excluded — NVIDIA drops its PCIe gen/width at idle, so a
snapshot would false-alarm.
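
A rough sketch of the warn/info split follows. It assumes the link values have already been read from sysfs as strings like `16.0 GT/s PCIe` plus integer widths; `parse_speed` and `classify_link` are hypothetical names, not the project's `check_pcie_links()`.

```python
# Per-lane transfer rate in GT/s mapped to PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def parse_speed(text):
    """'16.0 GT/s PCIe' -> 4, or None if the value is not recognised."""
    try:
        return GEN_BY_GTS.get(float(text.split()[0]))
    except (ValueError, IndexError):
        return None


def classify_link(cur_speed, max_speed, cur_width, max_width):
    """Return (severity, message) with severity in {'warn', 'info', 'ok'}."""
    cur_gen, max_gen = parse_speed(cur_speed), parse_speed(max_speed)
    if cur_width < max_width:
        # Fewer lanes than supported: the lane-sharing case, reported as a warning.
        return ("warn", f"x{cur_width} of x{max_width} lanes (possible lane-sharing)")
    if cur_gen and max_gen and cur_gen < max_gen:
        # Full width but a lower generation: informational only.
        return ("info", f"Gen{cur_gen} x{cur_width} (capable of Gen{max_gen} x{max_width})")
    return ("ok", f"Gen{cur_gen} x{cur_width}")
```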
## [0.37.1] - 2026-05-22
### Fixed
- **`rigdoctor update` now uses the right method for how RigDoctor was installed.** It detects
apt (`.deb`), pip (venv/`.run`), or source installs (`updates.install_kind()`); only pip
installs self-update in place. An apt install no longer fails with "No module named pip" —
it (and the GUI Update button) shows `sudo apt update && sudo apt install --only-upgrade
rigdoctor`; a source checkout points to `git pull`.
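
The routing described above can be sketched as a small lookup. `plan_update` and `in_venv` are hypothetical helpers, not the real `updates.install_kind()` or `apply_update`; the guidance strings are the ones quoted in this entry.

```python
import sys

# Guidance shown instead of self-updating, keyed by install kind.
GUIDANCE = {
    "apt": "sudo apt update && sudo apt install --only-upgrade rigdoctor",
    "source": "git pull",
}


def in_venv():
    """True when running inside a virtual environment (one possible pip signal)."""
    return sys.prefix != sys.base_prefix


def plan_update(kind):
    """Return (self_update, guidance) for an install kind."""
    if kind == "pip":
        return (True, None)  # only pip installs update in place
    if kind in GUIDANCE:
        return (False, GUIDANCE[kind])
    raise ValueError(f"unknown install kind: {kind!r}")
```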
## [0.37.0] - 2026-05-22
### Added
- **Version footer** — a footer across the bottom of the window shows `RigDoctor v<version>` in
the bottom-right (moved out of the sidebar).
### Fixed
- **Pages scroll when content doesn't fit, and the window is no longer pinned to the tallest
page's height.** Long pages (Settings, Tuning, …) get a scrollbar when too tall — so controls
like Uninstall are always reachable — and the window can now be resized smaller than the screen
(min height dropped from "taller than the screen" to ~600px). Pages that manage their own
scroll/fill (Dashboard, System Health, Inventory, Share) are unchanged.
## [0.36.1] - 2026-05-22
### Fixed
- `rigdoctor gui` printed the wrong fix when PySide6 is missing — it suggested the non-existent
`python3-pyside6` package. Now it names the real split modules
(`python3-pyside6.qt{widgets,gui,websockets,svg}` + `python3-pyte`).
## [0.36.0] - 2026-05-22
### Fixed
- **`.deb` now installs all dependencies automatically — no manual tool install.** The previous
`Recommends: python3-pyside6` named a package that doesn't exist on Debian/Ubuntu (PySide6 is
split per module), so apt silently skipped it and the GUI wouldn't start. Now it Recommends the
actual modules the GUI imports — `python3-pyside6.qt{widgets,gui,websockets,svg}` + `python3-pyte`.
### Changed
- **`apt install rigdoctor` sets up the whole toolset.** The `.deb` also Recommends the optional
diagnostic/gaming tools (smartmontools, lm-sensors, dmidecode, pciutils, libnotify-bin,
libsecret-tools, gamemode, mangohud) so they install by default — users never hand-install
tools. `cpupower` is a Suggests (kernel-tied); `--no-install-recommends` still gives CLI-only.
## [0.35.0] - 2026-05-22
### Added
- **`.deb` package (M9 / D8)** — `packaging/make_deb.py` builds a `rigdoctor_<version>_all.deb`
(pure-Python, `Architecture: all`) via `dpkg-deb`: `Depends: python3`, with the GUI deps
(`python3-pyside6`, `python3-pyte`) as **Recommends** so `sudo apt install ./rigdoctor_*.deb`
gives the full app and `--no-install-recommends` gives CLI-only. Installs the package, both
launchers, the desktop entry, and the icon. CI (`release.yml`) builds it as a **release asset**
every release, and optionally publishes it to the Gitea **apt registry** (set a `REGISTRY_TOKEN`
secret) for `sudo apt install rigdoctor`. **M9 is now complete.**
## [0.34.0] - 2026-05-22
### Added
- **Event-based alerts (M8).** Beyond temperature + GPU-lost, RigDoctor now notifies on
**critical kernel events** — Xid (GPU error), out-of-memory kills, CPU machine-checks, PCIe
AER errors, and disk I/O errors — scanned from the kernel log every ~30s while monitoring and
fired one-shot (cooldown-gated, so no spam). A proactive warning the moment something goes
wrong, not just on a temperature threshold. Included whenever desktop notifications are on.
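
One way to picture the cooldown gating is the sketch below. It is an assumption-laden illustration: the class `EventAlerter`, the pattern table, and the 300-second default are hypothetical, and the regular expressions are only examples of typical kernel-log wording, not RigDoctor's actual matcher set.

```python
from __future__ import annotations

import re

# Illustrative patterns only; the real matchers are not shown in this changelog.
PATTERNS = {
    "xid": re.compile(r"NVRM: Xid"),
    "oom": re.compile(r"Out of memory: Killed process"),
    "mce": re.compile(r"mce: \[Hardware Error\]"),
    "aer": re.compile(r"AER:"),
    "disk_io": re.compile(r"I/O error"),
}


class EventAlerter:
    """Fire each event kind at most once per cooldown window."""

    def __init__(self, cooldown_s: float = 300.0) -> None:
        self.cooldown_s = cooldown_s
        self._last_fired: dict[str, float] = {}

    def scan(self, lines: list[str], now: float) -> list[str]:
        """Return the event kinds that should notify for this batch of log lines."""
        fired: list[str] = []
        for line in lines:
            for kind, pattern in PATTERNS.items():
                if not pattern.search(line):
                    continue
                last = self._last_fired.get(kind)
                if last is None or now - last >= self.cooldown_s:
                    self._last_fired[kind] = now
                    fired.append(kind)
        return fired
```

Passing `now` in explicitly (instead of calling a clock inside `scan`) keeps the gating deterministic under test.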
## [0.33.0] - 2026-05-22
### Added
- **AI explanations stream live.** "Explain with AI" now fills token-by-token as the model
generates (Ollama NDJSON + Claude SSE, both via stdlib `urllib`) instead of a multi-second
freeze, then re-renders the finished answer as Markdown. `core/ai.explain_stream()`.
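
The NDJSON half of the streaming can be sketched like this. It assumes the Ollama-style stream shape, where each line is a JSON object carrying a `response` fragment and a final object has `done: true`. `iter_ndjson_tokens` is a hypothetical helper, not the actual `core/ai.explain_stream()`.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def iter_ndjson_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an NDJSON stream until a 'done' object arrives."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank keep-alive lines
        obj = json.loads(raw)
        token = obj.get("response", "")
        if token:
            yield token
        if obj.get("done"):
            break
```

Because the helper takes any iterable of byte lines, it could be fed a response object from `urllib.request.urlopen` (which can be iterated line by line) or a plain list in tests.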
## [0.32.0] - 2026-05-22
### Added
- **More for diagnostics & reports:**
- **`nvidia-smi -q` snapshot** — driver, throttle/clock-event reasons, clocks, power, temps,
PCIe link, ECC + retired pages (point-in-time at diagnostic time).
- **Display-server log** — auto-detected: `Xorg.0.log` on X11, or the compositor's user-journal
slice (gnome-shell/kwin/sway/gamescope) on Wayland.
- **Full system inventory** (M5 hardware/OS) is now included in each stored diagnostic and the
**Report** bundle — invaluable for larger/shared debugging.
These join the kernel log + coredump records in `syslogs.txt`/`inventory.*`, are saved per
diagnostic, included in the Report zip, and (logs) fed to the AI on "Explain".
## [0.31.0] - 2026-05-22
### Added
- **Diagnostics now collect session-scoped system logs** (`core/syslogs.py`): a kernel-log
slice (`journalctl -k` — Xid, OOM-killer, MCE, PCIe AER, thermal, hung tasks) and
**crashed-process records** (`coredumpctl` — which executable, signal, and when). They're saved
to the diagnostic directory (`syslogs.txt`), included in the **Report** bundle, and fed to the
AI on "Explain" alongside the game logs. Best-effort — degrades quietly if the tools are
missing or access is denied; scoped to the session window so it doesn't drag in old noise.
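
A minimal sketch of session-scoped, best-effort collection, assuming the session window is known as two `datetime` values. `kernel_log_cmd` and `collect` are hypothetical names; the real `core/syslogs.py` may build its commands differently.

```python
from __future__ import annotations

import shutil
import subprocess
from datetime import datetime


def kernel_log_cmd(start: datetime, end: datetime) -> list[str]:
    """Build a journalctl command limited to kernel messages inside the session window."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        "journalctl", "-k", "--no-pager", "-o", "short-iso",
        "--since", start.strftime(fmt),
        "--until", end.strftime(fmt),
    ]


def collect(cmd: list[str], timeout: float = 10.0) -> str:
    """Run a tool best-effort: return its stdout, or '' if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return ""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout
```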
## [0.30.0] - 2026-05-22
### Added
- **Logging & report bundles (M15, D25)** — opt-in via one **Settings → Logging** toggle
(default off). When on: the app logs to a rotating `app.log`, and **each diagnostic is stored
in its own folder** (`~/.local/share/rigdoctor/diagnostics/<id>/`) with the capture log, a
structured `result.json`, a readable `report.txt`, a session-scoped game-log snapshot, and an
`ai/` record of every AI interaction — **the exact data sent, which model, and its reply**.
- **Report** — a button on the diagnostic dialog (and `rigdoctor bundle`) zips a diagnostic's
folder plus `app.log` into `~/.local/share/rigdoctor/reports/<id>.zip` for sharing. Everything
stays local; the zip only leaves your machine if you share it. Available only when logging is on.
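
The bundling step could look roughly like the sketch below, using only the stdlib `zipfile` module. `zip_report` and its three-argument signature are assumptions made for illustration; the real helper is `diagstore.make_report(directory)` and its internals are not shown here.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def zip_report(diag_dir: Path, app_log: Path, reports_dir: Path) -> Path:
    """Zip one diagnostic folder plus app.log into reports_dir/<id>.zip."""
    reports_dir.mkdir(parents=True, exist_ok=True)
    out = reports_dir / f"{diag_dir.name}.zip"
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(diag_dir.rglob("*")):
            if path.is_file():
                # Store entries under "<id>/..." so the archive unpacks into one folder.
                arcname = Path(diag_dir.name) / path.relative_to(diag_dir)
                zf.write(path, arcname=arcname.as_posix())
        if app_log.is_file():
            zf.write(app_log, arcname="app.log")
    return out
```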
## [0.29.0] - 2026-05-22
### Added
- **AI now resolves Steam app IDs from your library instead of guessing.** When app IDs appear
in the logs/findings, RigDoctor looks them up in your scanned games (`steam.appid_names()`) and
injects an "App IDs (resolved from your installed games)" glossary into the prompt — so the
model names games correctly (e.g. `2694490 = Path of Exile 2`) rather than hallucinating. Only
IDs it can resolve locally are listed; no network, no model "training" needed.
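
As a sketch of the glossary step: assume a mapping from app ID to game name is already available (the changelog names `steam.appid_names()` as its source). `appid_glossary` is a hypothetical helper, and the number-matching regex is a simplification chosen for this example.

```python
from __future__ import annotations

import re


def appid_glossary(text: str, names: dict[int, str]) -> str:
    """List only the app IDs found in `text` that resolve locally; '' if none do."""
    resolved: list[int] = []
    for match in re.finditer(r"\b\d{3,8}\b", text):
        appid = int(match.group())
        if appid in names and appid not in resolved:
            resolved.append(appid)
    if not resolved:
        return ""
    lines = ["App IDs (resolved from your installed games):"]
    lines += [f"{appid} = {names[appid]}" for appid in resolved]
    return "\n".join(lines)
```

Unknown IDs are simply left out, so the prompt never contains a guessed name.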
## [0.28.1] - 2026-05-22
### Fixed
- **AI explanations were misreading stale/benign logs.** Three fixes so the model analyses the
*actual* session: (1) the prompt now states the **real game name, capture duration, and
outcome** (clean vs. crash) so the model stops guessing the game from log paths; (2) game logs
are **scoped to the session window** (Steam-console lines filtered by timestamp; a stale
per-app Proton log from an earlier game is skipped); (3) the reference KB flags common
**benign** Steam/Proton lines (`libnvidia-ml.so.1` assertion, routine minidump uploads, "fork
without exec") so they aren't reported as the cause. The system prompt also forbids
Windows-only advice (no "run as administrator") and tells the model not to invent a problem
when the run was clean.
## [0.28.0] - 2026-05-22
### Added
- **AI explanations now include recent game logs.** When you press "Explain with AI" on a
diagnostic, RigDoctor also gathers recent **Proton** (`~/steam-<appid>.log`) and **Steam**
console logs (`core/gamelogs.py`, tail-read + size-bounded) and passes them to the model, so
it can correlate log errors with the sensor findings and pinpoint *when* something went wrong.
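
A tail-read with a size bound might be sketched as below. `tail_text` and the 64 KiB default are assumptions for illustration; the real `core/gamelogs.py` reader is not shown in this changelog.

```python
from __future__ import annotations

from pathlib import Path


def tail_text(path: Path, max_bytes: int = 64 * 1024) -> str:
    """Return at most the last `max_bytes` of a log, starting on a line boundary."""
    try:
        with path.open("rb") as fh:
            fh.seek(0, 2)                      # jump to the end to learn the size
            size = fh.tell()
            fh.seek(max(0, size - max_bytes))  # read only the tail
            data = fh.read()
    except OSError:
        return ""                              # best-effort: missing or unreadable log
    text = data.decode("utf-8", errors="replace")
    if size > max_bytes:
        # The read probably started mid-line; drop the partial first line.
        _, _, text = text.partition("\n")
    return text
```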
### Fixed
- The AI explanation popup now **renders Markdown** (headings, bold, lists) instead of showing
raw `###`/`**` (via `QTextEdit.setMarkdown`), and the model is told to answer in Markdown.
## [0.27.1] - 2026-05-22
### Changed
- AI assistant: selecting **Ollama** now pre-fills the model field with **`qwen2.5:7b`** (a
strong 7B that fits an 8 GB GPU; our grounding makes a 7B sufficient). It won't overwrite a
model you've already entered, and you can change it freely.
## [0.27.0] - 2026-05-22
### Added
- **AI assistant (M14, D24)** — optional, **strictly opt-in, never automatic**. Explains your
+102 -98
View File
@@ -1,132 +1,136 @@
# RigDoctor
A **modular diagnostics, monitoring, and health-check toolkit for Linux gamers.**
**Hardware monitoring & crash diagnostics for Linux gamers.** Live sensors, crash-safe
logging, plain-language health reports, per-game diagnostics, and optional AI explanations —
in a desktop app, a tray applet, or the terminal. Ubuntu/Debian + NVIDIA first.
> **Status:** 🟢 Phase 1 (MVP) complete. The **sensor core (M1)**, **crash-capture logger
> (M3)**, and **health report (M4)** all work — live `snapshot`/`monitor`, crash-safe `record`
> with a post-crash report, and `report` to scan logs/SMART/driver for likely causes. A
> desktop GUI (M10) ties them together (dashboard, recording, health). See `docs/ROADMAP.md`.
Linux gaming faults are hard to pin down — GPUs falling off the PCIe bus, black screens
mid-game, silent thermal/VRAM throttling, driver/Proton mismatches. The useful data is
scattered across `nvidia-smi`, `/sys`, `journalctl`, and SMART, and the readings right before a
freeze are usually lost. RigDoctor pulls it together and keeps the evidence.
## Why this exists
## Features
Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the screen
suddenly going black mid-game, silent thermal/VRAM throttling, power transients,
driver/library mismatches, Proton quirks, and CPU governor / power-profile misconfiguration.
The data needed to diagnose them is scattered across `nvidia-smi`, `/sys/class/hwmon`,
`journalctl`, SMART, and more — and the most useful readings (the ones right before a hard
freeze) are usually lost because nothing flushed them to disk.
- **Live monitoring** — a dark desktop **dashboard** (history graphs + per-subsystem cards), a
**tray applet** with at-a-glance status, and a terminal view (`rigdoctor monitor`).
- **Crash-safe recording** — background logger that `fsync`s every sample, so the state right
before a hard freeze survives. Manual, always-on, or auto-start when a game launches.
- **Health report** — scans `journalctl`/SMART/driver for likely causes (Xid, OOM, disk
errors, throttling…) and explains them with suggested fixes.
- **Per-game diagnostics** — pick a game, capture while you play, get a focused report; hard
crashes are detected and analysed on next launch.
- **Gaming tune-ups** — flags risky settings (CPU governor, PCIe ASPM, persistence mode…) with
**one-click, reversible fixes**.
- **Proactive alerts** — desktop notifications on overheating and critical kernel events
(GPU-lost, Xid, out-of-memory, disk I/O).
- **AI explanations** *(optional, opt-in)* — explain a diagnostic in plain language with a
**local model (Ollama)** or **Claude**. Never automatic; only when you press the button.
- **Shareable reports** — zip a diagnostic (logs, inventory, AI transcript) to hand to someone,
or share a live **terminal session** for remote help.
- **Self-updating** — `apt upgrade`, or the in-app updater.
RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a
one-shot health report, and an interactive installer that only sets up the modules a given
user actually needs for their hardware.
## Screenshots
**Seed use cases:** an RTX 3070 that intermittently "falls off the bus" under heavy GPU load
(Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black mid-game.
See `docs/SPEC.md` §1.
| Dashboard | Inventory |
|---|---|
| ![Dashboard — live sensors](assets/screenshots/dashboard.png) | ![Inventory — hardware/OS](assets/screenshots/inventory.png) |
## How you run it
**Share** — a read-only or interactive terminal session over the relay, for remote help:
RigDoctor is **GUI-first** — the desktop app is the primary way in — but every feature is
also available headless:
- **Desktop GUI** — graphical dashboard, recording controls, log browser, reports. The
default interface for most users.
- **Tray applet** — a small top-menu-bar applet with quick actions and at-a-glance status.
- **CLI** — full functionality from the terminal; works over SSH and in scripts.
![Share — shared terminal session](assets/screenshots/share.png)
The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.
## Install
## Key decisions (settled)
### Debian / Ubuntu — `.deb`
| Topic | Decision |
|-------|----------|
| Name | **RigDoctor** |
| Language / stack | **Python 3 + Qt (PySide6)** — core/CLI/daemon stdlib-only; Qt only for GUI/tray |
| Primary distro | **Ubuntu** (Debian via apt); others best-effort later |
| Primary GPU | **NVIDIA** first; AMD, then Intel later |
| MVP | **Sensor core + crash logger + health report** (NVIDIA-only, CLI-first) |
| Distribution | **User-local install** (self-updating from the public repo, no root); **`.deb`** optional |
| Scope of action | **Read-only + suggestions** (no auto-apply yet) |
| Stress tests | **Out of scope** |
Full rationale and the still-open questions are in `docs/DECISIONS.md`.
## Repo layout
| Path | Purpose |
|------|---------|
| `docs/SPEC.md` | Product specification — vision, requirements, modules (the main planning doc) |
| `docs/ARCHITECTURE.md` | Technical design — core engine, front-ends, daemon, installer |
| `docs/MODULES.md` | Catalog of modules with scope, dependencies, status |
| `docs/ROADMAP.md` | Phased milestones |
| `docs/DECISIONS.md` | Decision log + remaining open questions |
| `src/rigdoctor/` | Source code — `core/` engine + sources, `cli.py`, `render.py` |
| `installer/` | Installer / `.deb` packaging (empty until Phase 4) |
| `tests/` | Tests (stdlib `unittest`) |
## Install (user-local, no root)
RigDoctor installs into a private venv under `~/.local` — no root, self-updating:
The simplest path: grab the latest **`rigdoctor_<version>_all.deb`** from the
[releases page](https://git.jesseyvanofferen.com/jessey/rigdoctor/releases) and install it —
apt pulls the GUI dependencies (PySide6, pyte) automatically:
```bash
./install.sh # from a source checkout or the self-extracting .run
./install.sh --ref v0.0.6 # install a specific released tag (needs a token)
./install.sh --uninstall # remove it
sudo apt install ./rigdoctor_*_all.deb # CLI only: add --no-install-recommends
```
This adds `rigdoctor` / `rigdoctor-gui` to `~/.local/bin` and a desktop entry. Each release
also ships a one-file **`.run`** installer (download, `chmod +x`, run). Updates are gated to
accounts on the Git server (a Personal Access Token); save one via the GUI **Setup → Update
access** panel or `rigdoctor login`, then `rigdoctor update` (or the sidebar button).
## Run it (dev)
Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):
**Or add the apt repository** for `apt install` + automatic updates (the registry is public and
GPG-signed — no token needed):
```bash
PYTHONPATH=src python3 -m rigdoctor snapshot # one-shot sensor read
PYTHONPATH=src python3 -m rigdoctor snapshot --json
PYTHONPATH=src python3 -m rigdoctor monitor -n 1 # live view (Ctrl-C to quit)
PYTHONPATH=src python3 -m rigdoctor sources # list detected sensor sources
PYTHONPATH=src python3 -m unittest discover -s tests
sudo curl https://git.jesseyvanofferen.com/api/packages/jessey/debian/repository.key -o /etc/apt/keyrings/gitea-jessey.asc
echo "deb [arch=all signed-by=/etc/apt/keyrings/gitea-jessey.asc] https://git.jesseyvanofferen.com/api/packages/jessey/debian stable main" | sudo tee /etc/apt/sources.list.d/gitea.list
sudo apt update
sudo apt install rigdoctor
```
### Crash-capture logger (M3)
Then `sudo apt upgrade` keeps it current.
A crash-safe background logger (JSONL, `fsync` per sample, bounded by rotation) for catching
the state right before a freeze:
### Any distro — self-extracting `.run` (no root)
Download **`rigdoctor-<version>-installer.run`** from the releases page and run it. It installs
into a private virtualenv under `~/.local` (no root), adds the launchers + desktop entry, and
opens the first-run setup wizard:
```bash
rigdoctor record start # start logging in the background
rigdoctor record status # is it running? latest readings, sample count
rigdoctor record stop # stop it
rigdoctor record report # post-crash summary: peaks, events, last samples
rigdoctor record run # run in the foreground (the systemd-ready entrypoint)
sh rigdoctor-*-installer.run
```
Logs live in `~/.local/share/rigdoctor/logs/`. It detects GPU "lost"/hang (nvidia-smi query
timeout) and writes an event marker. Trigger modes (always-on / game-launch) and the
`systemd --user` service arrive in Phase 4.
### Updating & removing
### Desktop GUI (M10)
- **`.deb`:** `sudo apt upgrade` (or reinstall a newer `.deb`).
- **`.run` / user-local:** the in-app **Update** button, or `rigdoctor update`.
- **Remove:** `sudo apt remove rigdoctor`, or `rigdoctor uninstall` for the user-local install.
The GUI uses PySide6 (Qt) — the only part of RigDoctor that needs a non-stdlib dep:
## Using it
Launch **RigDoctor** from your app menu, or:
```bash
pip install -e '.[gui]' # core + PySide6, gives `rigdoctor` and `rigdoctor-gui`
rigdoctor gui # or: rigdoctor-gui
rigdoctor-gui # desktop app (+ tray)
rigdoctor --help # everything from the terminal (works over SSH)
```
It opens a dark-themed window with sidebar navigation and a **live dashboard** over the
same sensor core — circular gauges for the headline metrics plus collapsible per-subsystem
cards (GPU/CPU/memory/storage) with temperature-colored values (icy-blue → green → red).
The **Logs** and **Health** sections are full pages (recording controls + post-crash report;
and the kernel-log / SMART / driver scan). **Inventory** is a placeholder until M5 lands.
Handy CLI commands:
Without the GUI extra, `pip install -e .` gives just the stdlib-only CLI.
```bash
rigdoctor snapshot # one-shot reading of every sensor
rigdoctor monitor # live terminal dashboard
rigdoctor report # health report (logs / SMART / driver)
rigdoctor diagnose start|finish # capture while gaming, then analyse
rigdoctor gameenv # flag risky gaming settings + fixes
rigdoctor inventory # hardware/OS inventory
rigdoctor ai explain # AI explanation of the current findings (opt-in)
rigdoctor bundle # zip the latest diagnostic into a shareable report
```
## Start here
## Requirements
1. Read `docs/SPEC.md` for what we're building.
2. Read `docs/ROADMAP.md` for the build order (Phase 1 = the MVP).
3. Read `docs/DECISIONS.md` for the settled decisions (D1–D15).
</content>
- **Linux** — Ubuntu/Debian first-class (the `.deb`); the `.run` works on any distro with
Python ≥ 3.11.
- **GPU** — NVIDIA fully supported (via `nvidia-smi`); AMD/Intel sensors are best-effort.
- **CLI/daemon** need only Python 3 (stdlib). The **GUI/tray** add **PySide6** (`python3-pyside6.qt{widgets,gui,websockets,svg}` + `python3-pyte`).
- Optional tools unlock more: `smartmontools`, `lm-sensors`, `gamemode`, `mangohud`. The setup
wizard offers to install them.
## Privacy
Everything stays on your machine — no telemetry, no phone-home. The AI assistant is **off by
default** and runs only when you explicitly trigger it; with Ollama nothing leaves the machine,
and the Claude option asks before sending. Reports are local files; they leave only if you share
the zip.
## Development
RigDoctor's core is stdlib-only Python; the GUI/tray use PySide6.
```bash
git clone https://git.jesseyvanofferen.com/jessey/rigdoctor && cd rigdoctor
pip install -e ".[gui]" # core + GUI; omit [gui] for CLI-only
python -m unittest discover -s tests # run the test suite
PYTHONPATH=src python3 -m rigdoctor snapshot # run without installing
```
Design docs live in `docs/`: `SPEC.md` (vision/requirements), `ARCHITECTURE.md`,
`MODULES.md` (module catalog), `ROADMAP.md`, and `DECISIONS.md` (the decision log).
Contributions: branch off `main`, keep tests green (CI runs them on PRs), and bump the version
+ `CHANGELOG.md` for shipped changes.
BIN
View File
Binary file not shown (42 KiB).
+17
View File
@@ -0,0 +1,17 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512">
<defs>
<radialGradient id="bg" cx="50%" cy="42%" r="78%">
<stop offset="0%" stop-color="#1b2230"/>
<stop offset="100%" stop-color="#0d0f13"/>
</radialGradient>
</defs>
<rect width="512" height="512" fill="url(#bg)"/>
<!-- gauge ring -->
<circle cx="256" cy="256" r="168" fill="none" stroke="#2a2f39" stroke-width="28"/>
<!-- accent sweep -->
<path d="M256 88 a168 168 0 1 1 -118.8 49.2" fill="none" stroke="#38bdf8"
stroke-width="28" stroke-linecap="round"/>
<!-- heartbeat / monitoring trace -->
<path d="M120 264 H200 L232 192 L280 336 L312 264 H392" fill="none" stroke="#e6e8eb"
stroke-width="28" stroke-linecap="round" stroke-linejoin="round"/>
</svg>


Binary file not shown (171 KiB).
Binary file not shown (141 KiB).
Binary file not shown (78 KiB).
+13 -1
View File
@@ -264,9 +264,21 @@ root cause + suggested next steps). Adds M14 to the D14 set.
as suggestions (consistent with D9 — it explains/recommends, applying fixes stays
consent-gated). No new runtime dependency (HTTP via stdlib).
### D25 — Logging & report bundles (M15) — *DECIDED 2026-05-22*
Opt-in logging + shareable diagnostic reports.
- **One combined `logging_enabled` toggle** (default off) controls both application logging
(rotating `app.log`) and per-diagnostic storage. Kept as a single switch for simplicity.
- **Each diagnostic is stored in its own directory** (`DATA_DIR/diagnostics/<id>/`): capture
log, structured `result.json`, human-readable `report.txt`, a scoped game-log snapshot, and an
`ai/` folder recording each AI interaction (**exact data sent, provider+model, and the reply**).
- **"Report"** zips one diagnostic directory (plus `app.log`) into `DATA_DIR/reports/` —
auto-saved there (no save dialog), shown with its path. Available only when logging is on
(nothing is stored otherwise). CLI: `rigdoctor bundle`.
- Everything stays local; the report only leaves the machine if the user shares the zip.
## Open
None currently — all tracked decisions (D1–D24) are resolved. New questions will be added
None currently — all tracked decisions (D1–D25) are resolved. New questions will be added
here as they arise. Remaining detail to flesh out during build: the tray's supporting-action
set (D13), per-module apt package names, M12's tunnel/token specifics, and M13's
update mechanism (APT repo vs. self-installed `.deb`).
+14 -2
View File
@@ -2,7 +2,8 @@
Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
> Module set per D14, plus **M12 (session sharing, D16)** and **M13 (auto-update, D18)**.
> Module set per D14, plus **M12 (session sharing, D16)**, **M13 (auto-update, D18)**,
> **M14 (AI assistant, D24)**, and **M15 (logging & reports, D25)**.
> **M7 (stress/repro) was dropped (D7).** M10/M11 are the GUI and tray modules (D10/D11).
> GPU scope reads "all (NVIDIA first)" — NVIDIA first, others via the vendor abstraction (D4).
@@ -17,10 +18,11 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ✅ |
| M9 | Installer | (meta) | none | all | P1 | 🟨 |
| M9 | Installer (+ `.deb`) | (meta) | none | all | P1 | ✅ |
| M12 | Session sharing (shared terminal) | Sharing | none (relay) | all | P3 | ✅ |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| M14 | AI assistant (explain diagnostics) | (optional) | none (stdlib urllib; Ollama or Claude) | all | P3 | ✅ |
| M15 | Logging & report bundles | (core) | none (stdlib logging + zip) | all | P3 | ✅ |
| ~~M7~~ | ~~Stress / repro~~ | — | — | — | — | ❌ dropped (D7) |
## Notes per module
@@ -128,6 +130,16 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
which lifts a small local model and sharpens Claude. Stdlib `urllib` (no pip deps); output is
advisory (D9). Configure in **Settings → AI assistant**.
- **M15 Logging & report bundles** (D25) — opt-in via one `logging_enabled` toggle (default off):
application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage**
(`core/diagstore.py`) — each diagnostic gets its own `DATA_DIR/diagnostics/<id>/`: capture,
`result.json`, `report.txt`, the full **inventory** (M5: hardware/OS), scoped **game logs**
(`core/gamelogs.py`), scoped **system logs** (`core/syslogs.py`: `journalctl -k`,
`coredumpctl`, an `nvidia-smi -q` snapshot, and the X11/Wayland display-server log), and an
`ai/` record of every AI interaction (exact data sent, model, reply). **"Report"** zips one
into `DATA_DIR/reports/` (GUI button on the diagnostic dialog; CLI `rigdoctor bundle`). Logs
are session-scoped and fed to the AI on "Explain". Stays local; shareable on demand.
## Bundles (final — D14)
- **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)*
- **Monitoring:** M2 + M8
+13 -3
View File
@@ -67,9 +67,12 @@ Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).
Settings "Recording trigger") incl. the zero-config **game-launch watcher**
(`core/watcher.py`, `rigdoctor watch`); and a **graphical first-run setup wizard**
(`gui/setup_wizard.py`): environment → dependency-bundle selection → install → recording
trigger → readiness, auto-launched by install.sh and re-runnable from Settings.
*Pending:* `.deb` packaging (next bullet).
- [ ] `.deb` packaging (D8) declaring per-bundle deps incl. python3-pyside6 for Desktop UI
trigger → readiness, auto-launched by install.sh and re-runnable from Settings; and a
**`.deb`** (`packaging/make_deb.py`, `Architecture: all`, `Depends: python3`,
`Recommends: python3-pyside6/pyte`) built + published in CI (release asset + optional
Gitea apt registry). **M9 complete.**
- [x] `.deb` packaging (D8) — built via `dpkg-deb` (no debhelper); GUI deps as Recommends so
`apt install rigdoctor` includes the Desktop UI, `--no-install-recommends` = CLI only.
## Phase 5 — Breadth (later)
- [ ] AMD GPU support in M1 (Steam Deck / Radeon)
@@ -97,6 +100,13 @@ Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).
- [ ] *Possible follow-ups:* interactive chat grounded in the data; more reference-KB entries;
an "Explain" button on the System Health page.
## Phase 8 — Logging & report bundles (M15, D25)
- [x] **Opt-in logging** (one `logging_enabled` toggle): rotating `app.log` (`core/applog.py`)
+ **per-diagnostic storage** in its own directory (`core/diagstore.py`) — capture,
result, report, scoped game logs, and AI-interaction records.
- [x] **Report** bundle — zip a diagnostic (incl. exactly what was sent to the AI, the model,
and its reply) into the reports folder. GUI button + `rigdoctor bundle`.
> **Out of scope:** stress/repro module (D7); multi-distro support and packaging beyond
> Ubuntu/apt + `.deb` (D15) — a thin seam is kept but not built out.
+12
View File
@@ -162,6 +162,18 @@ the actual findings plus matched reference facts from a curated, exact-match kno
("RAG-lite" — no embeddings/vector store, stdlib only); no fine-tuning. HTTP via stdlib `urllib`
(no new core dependency); output is advisory (consistent with D9).
### M15 — Logging & report bundles (D25)
Opt-in (one `logging_enabled` toggle, default off). When on: the application logs to a rotating
`app.log`, and **each diagnostic is stored in its own directory** (capture log, structured
result, human-readable report, the full **inventory** (M5 hardware/OS), session-scoped **game
logs** (Proton/Steam) and **system logs** (`journalctl -k`, `coredumpctl`, an `nvidia-smi -q`
snapshot, and the X11/Wayland display-server log), and a record of every AI interaction — the
exact data sent, the model, and its reply). The collected logs are also fed to the AI on
"Explain". Collection is best-effort (degrades if tools are missing/denied). A **Report** action zips one diagnostic's directory
(plus the app log) into a shareable bundle saved under the reports folder (GUI button; CLI
`rigdoctor bundle`). Everything stays local — a report only leaves the machine if the user
shares the zip. Stdlib only (`logging` + `zipfile`).
## 5. Non-functional requirements
- **Zero hard deps for the core/CLI/daemon** — Python stdlib + tools already present. **Qt
(PySide6) is required only by the GUI (M10) and tray (M11) modules**, declared in the
+121
View File
@@ -0,0 +1,121 @@
"""Build a `.deb` for RigDoctor (M9 / D8) — dependency-light, no debhelper.
Pure-Python app, so it's `Architecture: all`: we stage the package into dist-packages, drop the
two launchers in /usr/bin, install the desktop entry + icon, write a DEBIAN/control, and call
`dpkg-deb`. The core is stdlib (`Depends: python3`); everything else is **Recommends** so a
plain `apt install rigdoctor` sets up the whole toolset automatically (users never hand-install
deps) — the GUI modules (Debian/Ubuntu split PySide6 per module, so we name
`python3-pyside6.qt{widgets,gui,websockets,svg}`) + `python3-pyte`, plus the diagnostic/gaming
tools (smartmontools, lm-sensors, dmidecode, pciutils, libnotify-bin, libsecret-tools, gamemode,
mangohud). `--no-install-recommends` still yields a CLI-only install; `cpupower` is a Suggests
(kernel-tied/heavy).
Run: `python packaging/make_deb.py` → `dist/rigdoctor_<version>_all.deb`.
"""
from __future__ import annotations
import shutil
import subprocess
import sys
from pathlib import Path
ROOT = Path(__file__).resolve().parents[1]
DIST = ROOT / "dist"
MAINTAINER = "Jessey van Offeren <jjvanofferen@gmail.com>"
HOMEPAGE = "https://git.jesseyvanofferen.com/jessey/rigdoctor"


def _version() -> str:
    text = (ROOT / "src" / "rigdoctor" / "__init__.py").read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.startswith("__version__"):
            return line.split('"')[1]
    raise SystemExit("could not read __version__")

_LAUNCHER = """\
#!/usr/bin/python3
import sys
from {module} import main
sys.exit(main())
"""
_DESKTOP = """\
[Desktop Entry]
Type=Application
Name=RigDoctor
Comment=Hardware monitoring & crash diagnostics for Linux gamers
Exec=rigdoctor-gui
Icon=rigdoctor
Terminal=false
Categories=System;Monitor;Utility;
StartupWMClass=rigdoctor
"""
_CONTROL = """\
Package: rigdoctor
Version: {version}
Architecture: all
Maintainer: {maintainer}
Section: utils
Priority: optional
Depends: python3 (>= 3.11)
Recommends: python3-pyside6.qtwidgets, python3-pyside6.qtgui, python3-pyside6.qtwebsockets, python3-pyside6.qtsvg, python3-pyte, smartmontools, lm-sensors, dmidecode, pciutils, libnotify-bin, libsecret-tools, gamemode, mangohud
Suggests: linux-tools-generic
Homepage: {homepage}
Description: Hardware monitoring & crash diagnostics for Linux gamers
 RigDoctor monitors GPU/CPU temperatures, load, and sensors, captures crash
 diagnostics while gaming, scans logs (Xid/SMART/kernel) for problems, and can
 explain them in plain language. The CLI and background daemon are pure Python
 (stdlib only); the optional desktop GUI and system-tray applet use PySide6,
 pulled in via Recommends. Install with --no-install-recommends for CLI only.
"""


def _write(path: Path, text: str, mode: int = 0o644) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    path.chmod(mode)


def build() -> Path:
    version = _version()
    DIST.mkdir(exist_ok=True)
    stage = DIST / f"rigdoctor_{version}_all"
    if stage.exists():
        shutil.rmtree(stage)

    # Python package → dist-packages (importable system-wide), minus bytecode.
    pkg_dst = stage / "usr/lib/python3/dist-packages/rigdoctor"
    shutil.copytree(ROOT / "src" / "rigdoctor", pkg_dst,
                    ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))

    # Launchers.
    _write(stage / "usr/bin/rigdoctor", _LAUNCHER.format(module="rigdoctor.cli"), 0o755)
    _write(stage / "usr/bin/rigdoctor-gui", _LAUNCHER.format(module="rigdoctor.gui.app"), 0o755)

    # Desktop entry + icon.
    _write(stage / "usr/share/applications/rigdoctor.desktop", _DESKTOP)
    icon = ROOT / "src" / "rigdoctor" / "gui" / "assets" / "rigdoctor.svg"
    _write(stage / "usr/share/icons/hicolor/scalable/apps/rigdoctor.svg",
           icon.read_text(encoding="utf-8"))

    # Refresh the desktop database on install/remove (best-effort).
    _write(stage / "DEBIAN/postinst",
           "#!/bin/sh\nset -e\nupdate-desktop-database -q 2>/dev/null || true\n", 0o755)
    _write(stage / "DEBIAN/postrm",
           "#!/bin/sh\nset -e\nupdate-desktop-database -q 2>/dev/null || true\n", 0o755)
    _write(stage / "DEBIAN/control",
           _CONTROL.format(version=version, maintainer=MAINTAINER, homepage=HOMEPAGE))

    out = DIST / f"rigdoctor_{version}_all.deb"
    subprocess.run(["dpkg-deb", "--root-owner-group", "--build", str(stage), str(out)], check=True)
    shutil.rmtree(stage)
    return out


if __name__ == "__main__":
    path = build()
    print(f"built {path}")
    sys.exit(0)
+1 -1
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "rigdoctor"
version = "0.27.0"
version = "0.40.0"
description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
readme = "README.md"
requires-python = ">=3.11"
+1 -1
View File
@@ -1,3 +1,3 @@
"""RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""
__version__ = "0.27.0"
__version__ = "0.40.0"
+30 -2
View File
@@ -55,8 +55,9 @@ def cmd_gui(args) -> int:
        from .gui.app import main as gui_main
    except ImportError as exc:
        print("The GUI needs PySide6, which isn't installed.")
        print(" Install it with: pip install 'rigdoctor[gui]'")
        print(" or on Ubuntu: sudo apt install python3-pyside6")
        print(" Ubuntu/Debian: sudo apt install python3-pyside6.qtwidgets "
              "python3-pyside6.qtgui python3-pyside6.qtwebsockets python3-pyside6.qtsvg python3-pyte")
        print(" pip: pip install 'rigdoctor[gui]'")
        print(f" ({exc})")
        return 2
    return gui_main([sys.argv[0]])
@@ -262,6 +263,10 @@ def cmd_update(args) -> int:
print("\nWhat's new:\n" + "\n".join(" " + ln for ln in notes.splitlines()) + "\n")
if args.check:
return 0
kind = updates.install_kind()
if kind != "pip": # apt/source installs aren't pip-updatable — show the right command
print(updates.update_hint(kind))
return 0
print(f"Installing {tag}")
rc, out = updates.apply_update(tag)
print(out[-2000:])
@@ -472,6 +477,23 @@ def cmd_ai(args) -> int:
return 0 if ok else 1
def cmd_bundle(args) -> int:
"""Zip the latest stored diagnostic into a report bundle (M15) — needs logging enabled."""
from .core import diagstore
if not diagstore.enabled():
print("Logging is off. Enable it (Settings → Logging, or set logging_enabled) so "
"diagnostics are stored and can be reported.")
return 1
directory = diagstore.latest_dir()
if directory is None:
print("No stored diagnostics yet — run a diagnostic first.")
return 1
out = diagstore.make_report(directory)
print(f"Report written: {out}")
return 0
def cmd_gameenv(args) -> int:
from dataclasses import asdict
@@ -686,10 +708,16 @@ def build_parser() -> argparse.ArgumentParser:
ai_sub.add_parser("test", help="send a tiny probe to verify connectivity").set_defaults(func=cmd_ai)
ai_sub.add_parser("explain", help="explain the current health findings with AI").set_defaults(func=cmd_ai)
ai_p.set_defaults(func=cmd_ai, ai_cmd=None)
bundle_p = sub.add_parser("bundle", help="zip the latest stored diagnostic into a report bundle (M15)")
bundle_p.set_defaults(func=cmd_bundle)
return p
def main(argv: list[str] | None = None) -> int:
from .core import applog
applog.setup() # opt-in app logging (M15); no-op unless logging_enabled
args = build_parser().parse_args(argv)
return args.func(args)
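The `bundle` subcommand above follows the usual argparse dispatch pattern: each subparser stores its handler with `set_defaults(func=...)`, and `main` calls whatever handler was selected. A minimal, self-contained sketch of that pattern, assuming a hypothetical `demo` program whose `cmd_bundle` is a stand-in (it does not call `diagstore`):

```python
import argparse
from typing import List, Optional


def cmd_bundle(args: argparse.Namespace) -> int:
    # Stand-in handler: the real command zips the latest stored diagnostic.
    print("bundle requested")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)
    bundle_p = sub.add_parser("bundle", help="zip the latest stored diagnostic")
    # The handler travels with the parsed namespace, so main() needs no if/elif chain.
    bundle_p.set_defaults(func=cmd_bundle)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Returning the handler's integer lets the caller pass it straight to `sys.exit`, which matches how the exit codes 0, 1 and 2 are used in the commands above.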
+7
@@ -37,6 +37,12 @@ SPAWN_LOG = STATE_DIR / "recorder.out"
# not config: refreshed by the background scan on every launch).
GAMES_FILE = STATE_DIR / "games.json"
# Logging & reports (opt-in via `logging_enabled`). App log: rotating file of app events.
# Each diagnostic is stored under DIAGNOSTICS_DIR/<id>/; "Report" zips one into REPORTS_DIR.
APP_LOG = STATE_DIR / "app.log"
DIAGNOSTICS_DIR = DATA_DIR / "diagnostics"
REPORTS_DIR = DATA_DIR / "reports"
# Update access token (M13) — gates updates to Gitea account holders (D18).
# Stored in the OS keyring (Secret Service / GNOME Keyring) via `secret-tool` when
# available — encrypted at rest, unlocked with the login session — else a 0600 file.
@@ -190,6 +196,7 @@ DEFAULTS: dict = {
"ai_provider": "", # AI assistant (M14, D24): "" (unset) | "ollama" | "claude"
"ai_model": "", # model name (e.g. "llama3.1" for Ollama; blank = Claude default)
"ai_endpoint": "http://localhost:11434", # Ollama server base URL (Claude uses a fixed endpoint)
"logging_enabled": False, # opt-in: app logging + per-diagnostic storage + Report (M15)
}
+124 -9
View File
@@ -16,27 +16,40 @@ Answers are *grounded*: we pass the actual findings plus matched reference facts
from __future__ import annotations
import json
import re
import urllib.error
import urllib.request
from .. import config
from . import ai_knowledge
_APPID_RE = re.compile(r"\b\d{5,7}\b") # Steam app IDs are 5-7 digits
PROVIDERS = ("ollama", "claude")
OLLAMA_DEFAULT_ENDPOINT = "http://localhost:11434"
# Suggested Ollama model — strong instruction-following that fits an 8 GB GPU at Q4. Because we
# ground the prompt with reference facts, a 7B model is sufficient here.
OLLAMA_SUGGESTED_MODEL = "qwen2.5:7b"
CLAUDE_ENDPOINT = "https://api.anthropic.com/v1/messages"
CLAUDE_DEFAULT_MODEL = "claude-opus-4-7"
CLAUDE_MAX_TOKENS = 2000
ANTHROPIC_VERSION = "2023-06-01"
SYSTEM_PROMPT = (
"You are RigDoctor's hardware-diagnostics assistant for Linux gamers. You are given the "
"structured findings RigDoctor collected from this machine, and a set of reference facts. "
"Explain in plain language what the findings mean, identify the most likely root cause of "
"any problem, and give concrete, ordered next steps (exact commands where useful). Base "
"your reasoning ONLY on the findings and reference facts provided — do not invent readings, "
"hardware, or log lines. Be concise and practical. Present fixes as suggestions, and clearly "
"warn before any step that could cause data loss or instability."
"You are RigDoctor's hardware-diagnostics assistant for Linux gamers (Ubuntu + NVIDIA, games "
"via Steam/Proton). You are given session context, the structured findings RigDoctor "
"collected — which may include recent game/Proton/system log excerpts scoped to this session "
"— plus reference facts. Use the GAME NAME from the session context; never guess the game "
"from log paths or app IDs. Correlate log errors with the findings to pinpoint WHEN and WHY "
"things went wrong, identify the most likely root cause, and give concrete, ordered next "
"steps with exact Linux commands where useful.\n"
"Rules: Base your reasoning ONLY on the data and reference facts provided — never invent "
"readings, hardware, or log lines. This is LINUX: never suggest Windows-only steps (e.g. "
"'run as administrator', registry edits, toggling antivirus). Treat log lines flagged BENIGN "
"in the reference facts as non-causal. If no crash was recorded and there are no warning or "
"critical findings, say plainly that the session looks healthy and do NOT manufacture a "
"problem. Be concise. Present fixes as suggestions and warn before anything that risks data "
"loss or instability. Format your answer in Markdown."
)
@@ -79,10 +92,35 @@ def provider_label() -> str:
return "not configured"
def appid_glossary(text: str) -> str:
"""Resolve Steam app IDs that appear in `text` against the user's scanned library.
We don't teach the model app IDs — we look them up locally and hand it the mapping, so it
names games correctly instead of guessing. Only IDs we can resolve are listed.
"""
candidates = set(_APPID_RE.findall(text))
if not candidates:
return ""
try:
from . import steam
names = steam.appid_names()
except Exception: # never let a glossary lookup break an explanation
return ""
known = sorted((i, names[i]) for i in candidates if i in names)
if not known:
return ""
return "App IDs (resolved from your installed games):\n" + "\n".join(
f"- {appid} = {name}" for appid, name in known)
def build_prompt(findings_text: str) -> str:
"""The user-message content: matched reference facts + the collected findings."""
facts = ai_knowledge.relevant(findings_text)
"""The user-message content: app-ID glossary + matched reference facts + the findings."""
parts = []
glossary = appid_glossary(findings_text)
if glossary:
parts.append(glossary)
parts.append("")
facts = ai_knowledge.relevant(findings_text)
if facts:
parts.append("Reference facts (use these to interpret the findings):")
parts += [f"- {f}" for f in facts]
@@ -112,6 +150,24 @@ def explain(findings_text: str, timeout: float = 120.0) -> tuple[bool, str]:
return False, f"Unexpected response from the AI provider: {exc}"
def explain_stream(findings_text: str, on_chunk, timeout: float = 180.0) -> tuple[bool, str]:
"""Like :func:`explain`, but calls ``on_chunk(text_delta)`` as tokens arrive and returns
``(ok, full_text)`` at the end. Caller MUST be a direct user action (D24)."""
content = build_prompt(findings_text)
try:
if provider() == "claude":
return _claude_stream(content, on_chunk, timeout)
if provider() == "ollama":
return _ollama_stream(content, on_chunk, timeout)
return False, "No AI provider is configured (Settings → AI assistant)."
except urllib.error.HTTPError as exc:
return False, _http_error(exc)
except (urllib.error.URLError, OSError, TimeoutError) as exc:
return False, f"Couldn't reach the AI provider: {exc}"
except (ValueError, KeyError, IndexError) as exc:
return False, f"Unexpected response from the AI provider: {exc}"
def _post(url: str, payload: dict, headers: dict, timeout: float) -> dict:
req = urllib.request.Request(
url, data=json.dumps(payload).encode("utf-8"),
@@ -147,6 +203,65 @@ def _claude(content: str, timeout: float) -> tuple[bool, str]:
return True, text.strip() or "(the model returned no text)"
def _stream_request(url: str, payload: dict, headers: dict, timeout: float):
req = urllib.request.Request(
url, data=json.dumps(payload).encode("utf-8"),
headers={"Content-Type": "application/json", **headers})
return urllib.request.urlopen(req, timeout=timeout)
def _ollama_stream(content: str, on_chunk, timeout: float) -> tuple[bool, str]:
if not model():
return False, "No Ollama model is set (Settings → AI assistant)."
payload = {"model": model(), "system": SYSTEM_PROMPT, "prompt": content, "stream": True}
parts: list[str] = []
with _stream_request(endpoint().rstrip("/") + "/api/generate", payload, {}, timeout) as resp:
for raw in resp: # newline-delimited JSON objects
line = raw.decode("utf-8", "replace").strip()
if not line:
continue
obj = json.loads(line)
chunk = obj.get("response", "")
if chunk:
parts.append(chunk)
on_chunk(chunk)
if obj.get("done"):
break
return True, "".join(parts).strip() or "(the model returned an empty response)"
def _claude_stream(content: str, on_chunk, timeout: float) -> tuple[bool, str]:
key = config.load_ai_key()
if not key:
return False, "No Claude API key is set (Settings → AI assistant)."
payload = {
"model": model(), "max_tokens": CLAUDE_MAX_TOKENS, "system": SYSTEM_PROMPT,
"messages": [{"role": "user", "content": content}], "stream": True,
}
headers = {"x-api-key": key, "anthropic-version": ANTHROPIC_VERSION}
parts: list[str] = []
with _stream_request(CLAUDE_ENDPOINT, payload, headers, timeout) as resp:
for raw in resp: # SSE: parse `data:` lines, accumulate text deltas
line = raw.decode("utf-8", "replace").strip()
if not line.startswith("data:"):
continue
try:
event = json.loads(line[5:].strip())
except ValueError:
continue
etype = event.get("type")
if etype == "content_block_delta" and event.get("delta", {}).get("type") == "text_delta":
chunk = event["delta"].get("text", "")
if chunk:
parts.append(chunk)
on_chunk(chunk)
elif etype == "error":
return False, event.get("error", {}).get("message", "stream error")
elif etype == "message_stop":
break
return True, "".join(parts).strip() or "(the model returned no text)"
def _http_error(exc: urllib.error.HTTPError) -> str:
detail = ""
try:
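The Claude streaming path above reads a server-sent-event body line by line, keeps only `data:` lines, and joins the `text_delta` fragments. The sketch below isolates that parsing step so it can be exercised without a network call. `collect_text_deltas` is a hypothetical helper, not part of the module, and the sample events are hand-written to mirror the event types the hunk checks for.

```python
import json
from typing import Iterable


def collect_text_deltas(lines: Iterable[bytes]) -> str:
    """Join the text fragments carried by content_block_delta events."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8", "replace").strip()
        if not line.startswith("data:"):
            continue  # skips `event:` lines, comments and blank separators
        try:
            event = json.loads(line[5:].strip())
        except ValueError:
            continue  # tolerate a malformed line instead of aborting the stream
        etype = event.get("type")
        if etype == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif etype == "message_stop":
            break
    return "".join(parts)


SAMPLE = [
    b"event: content_block_delta\n",
    b'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Looks "}}\n',
    b"\n",
    b"data: not-json\n",
    b'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "healthy."}}\n',
    b'data: {"type": "message_stop"}\n',
    b'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "ignored"}}\n',
]
```

Keeping the parser separate from the HTTP request is a design choice of this sketch only; it makes the stop condition and the malformed-line handling easy to test.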
+12
@@ -64,6 +64,18 @@ ENTRIES: list[tuple[tuple[str, ...], str]] = [
(("nvidia persistence", "persistence mode"),
"NVIDIA persistence mode keeps the driver loaded when no app is using the GPU, avoiding "
"re-init stalls — harmless to enable."),
(("libnvidia-ml.so", "interface.h", "failed to load \"libnvidia-ml"),
"BENIGN: a Steam log assertion 'Failed to load libnvidia-ml.so.1' (from interface.h) is "
"logged on many normal launches — the Steam runtime sandbox can't see the host NVML library. "
"It is NOT by itself a crash cause. Only investigate the driver if the GPU is genuinely "
"undetected (nvidia-smi fails)."),
(("minidump", ".dmp", "uploading minidump"),
"BENIGN-by-default: a minidump upload line means a crash handler ran AND that the game/engine "
"routinely uploads dumps; it is not proof that THIS session crashed unless a hard freeze or "
"non-zero exit was also recorded. Don't treat a routine minidump line as the root cause."),
(("fork without exec", "skipping destruction"),
"BENIGN: 'pid X != Y, skipping destruction (fork without exec?)' is routine Steam/Proton "
"process bookkeeping, not an error."),
]
+41 -5
@@ -1,8 +1,9 @@
"""Desktop alerts (M8): notify on overheat / GPU-lost / new version via notify-send.
"""Desktop alerts (M8): notify on overheat / GPU-lost / critical kernel events / new version.
Edge-triggered: an alert fires when a condition becomes true (not every sample), and
can fire again only after it has cleared and a cooldown has passed — so a hot GPU or a
1-Hz sample loop doesn't spam notifications. Degrades to a no-op if notify-send is absent.
Edge-triggered: a sustained condition (hot GPU, GPU-lost) fires once when it becomes true and
can re-fire only after it clears + a cooldown; momentary **kernel events** (Xid, OOM-kill, MCE,
PCIe AER, disk I/O errors) are scanned from the kernel log every `event_interval` seconds and
fire one-shot (cooldown-gated). So a 1-Hz sample loop never spams. No-op if notify-send absent.
"""
from __future__ import annotations
@@ -57,13 +58,16 @@ def notify(title: str, message: str, urgency: str = "normal") -> bool:
class AlertMonitor:
"""Evaluate samples and raise edge-triggered desktop alerts."""
def __init__(self, gpu_temp: float = 90.0, cpu_temp: float = 95.0, cooldown: float = 300.0):
def __init__(self, gpu_temp: float = 90.0, cpu_temp: float = 95.0, cooldown: float = 300.0,
event_interval: float = 30.0):
self.gpu_temp = gpu_temp
self.cpu_temp = cpu_temp
self.cooldown = cooldown
self.event_interval = event_interval # how often to scan the kernel log
self.enabled = True
self._active: dict[str, bool] = {}
self._last: dict[str, float] = {}
self._last_kernel_scan = time.time() # only alert on events after the monitor starts
def _fire(self, key: str, title: str, message: str, urgency: str = "critical") -> None:
if self._active.get(key):
@@ -75,9 +79,39 @@ class AlertMonitor:
self._last[key] = now
notify(title, message, urgency)
def _notify_once(self, key: str, title: str, message: str, urgency: str = "critical") -> None:
"""One-shot alert for a momentary event (cooldown-gated, no active latch)."""
now = time.time()
if now - self._last.get(key, 0.0) < self.cooldown:
return
self._last[key] = now
notify(title, message, urgency)
def _clear(self, key: str) -> None:
self._active[key] = False
def _scan_kernel_events(self) -> None:
"""Periodically scan the kernel log for new critical events (Xid/OOM/MCE/PCIe/disk)."""
now = time.time()
if now - self._last_kernel_scan < self.event_interval:
return
since = self._last_kernel_scan
self._last_kernel_scan = now
try:
from . import syslogs
text = syslogs.kernel_log(since=since)
except Exception: # alerting must never crash the sample loop
return
if not text:
return
seen: set[str] = set()
for label, line in syslogs.scan_critical(text):
if label in seen: # one alert per category per scan
continue
seen.add(label)
self._notify_once(f"kernel:{label}", label, line[:180])
def check(self, sample: Sample) -> None:
if not self.enabled:
return
@@ -107,3 +141,5 @@ class AlertMonitor:
self._fire("gpu_lost", "GPU not responding", "nvidia-smi query timed out — the GPU may have dropped")
else:
self._clear("gpu_lost")
self._scan_kernel_events() # Xid / OOM / MCE / PCIe / disk I/O from the kernel log
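The module docstring above distinguishes two alert styles: a latched edge trigger for sustained conditions, and a cooldown-gated one-shot for momentary kernel events. A minimal sketch of that bookkeeping follows, under stated assumptions: `AlertGate` is a hypothetical class, the clock is passed in explicitly so the behaviour can be tested, and the cooldown check inside `sustained` is a guess at the part of `_fire` that the hunk does not show.

```python
class AlertGate:
    """Hypothetical stand-in for the latch/cooldown state, not RigDoctor's AlertMonitor."""

    def __init__(self, cooldown: float = 300.0) -> None:
        self.cooldown = cooldown
        self._active = {}  # key -> True while a sustained condition is latched
        self._last = {}    # key -> time of the last notification

    def _cooling_down(self, key: str, now: float) -> bool:
        return key in self._last and now - self._last[key] < self.cooldown

    def sustained(self, key: str, now: float) -> bool:
        """Fire once when the condition becomes true; re-fire only after clear() and cooldown."""
        if self._active.get(key) or self._cooling_down(key, now):
            return False
        self._active[key] = True
        self._last[key] = now
        return True

    def clear(self, key: str) -> None:
        self._active[key] = False

    def one_shot(self, key: str, now: float) -> bool:
        """Momentary event: no latch, only the cooldown limits repeats."""
        if self._cooling_down(key, now):
            return False
        self._last[key] = now
        return True
```

Unlike `_notify_once` above, which compares against a default of `0.0`, the sketch checks whether the key has fired at all, so it also behaves sensibly with small synthetic clock values.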
+63
@@ -0,0 +1,63 @@
"""Application logging (M15) — opt-in via the `logging_enabled` setting.
When enabled, app events/errors are written to a rotating file (`config.APP_LOG`); when
disabled, nothing is written (no file is created). All RigDoctor code logs through
``applog.get_logger(__name__)``; the handler is attached once at startup by :func:`setup`.
Stdlib ``logging`` only.
"""
from __future__ import annotations
import logging
from logging.handlers import RotatingFileHandler
from .. import config
_ROOT = "rigdoctor"
_configured = False
def setup(force: bool = False) -> bool:
"""Attach the file handler if logging is enabled. Idempotent. Returns whether it's on."""
global _configured
logger = logging.getLogger(_ROOT)
enabled = bool(config.load_config().get("logging_enabled", False))
if not enabled:
if force: # toggled off at runtime — detach so we stop writing
for h in list(logger.handlers):
logger.removeHandler(h)
h.close()
_configured = False
return False
if _configured and not force:
return True
for h in list(logger.handlers): # avoid duplicate handlers on re-setup
logger.removeHandler(h)
h.close()
try:
config.STATE_DIR.mkdir(parents=True, exist_ok=True)
handler = RotatingFileHandler(config.APP_LOG, maxBytes=2_000_000, backupCount=3,
encoding="utf-8")
handler.setFormatter(logging.Formatter(
"%(asctime)s %(levelname)-7s %(name)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False
_configured = True
logger.info("logging started (rigdoctor %s)", _version())
except OSError:
return False
return True
def get_logger(name: str) -> logging.Logger:
"""A child logger. Safe to call before setup — it just won't write until enabled."""
short = name.split(".")[-1]
return logging.getLogger(f"{_ROOT}.{short}")
def _version() -> str:
from .. import __version__
return __version__
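The opt-in behaviour described above (no file is created unless logging is enabled, and child loggers write through one handler on the package root) can be sketched with the standard library alone. The `configure` function and the `demo` logger name are assumptions made for illustration. The handler settings copy the values in the hunk, and the `NullHandler` for the disabled case is an addition of this sketch.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure(enabled: bool, log_path: Path, root: str = "demo") -> logging.Logger:
    """Attach a rotating file handler only when enabled; otherwise leave no file behind."""
    logger = logging.getLogger(root)
    for handler in list(logger.handlers):  # drop old handlers so re-configuring never duplicates
        logger.removeHandler(handler)
        handler.close()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not enabled:
        logger.addHandler(logging.NullHandler())  # swallow records quietly
        return logger
    log_path.parent.mkdir(parents=True, exist_ok=True)
    file_handler = RotatingFileHandler(log_path, maxBytes=2_000_000, backupCount=3,
                                       encoding="utf-8")
    file_handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)-7s %(name)s: %(message)s"))
    logger.addHandler(file_handler)
    return logger
```

A child logger such as `logging.getLogger("demo.alerts")` needs no handler of its own: its records propagate to `demo`, which is why modules can request a logger before `configure` has run.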
+20 -2
@@ -28,6 +28,7 @@ class DiagnosticResult:
game: str | None
summary: Summary # capture window: peak temps/power, events, last samples (M3)
findings: list[Finding] # health findings: Xid/SMART/driver/etc. (M4)
dir: str | None = None # storage directory when logging is on (M15); else None
@dataclass
@@ -97,7 +98,22 @@ def finish(last_n: int = 10, log_path=None) -> DiagnosticResult:
summary = summarize(path, last_n=last_n)
game = _game_from_summary(summary) or (reccontrol.read_status() or {}).get("game")
findings = run_health_checks()
return DiagnosticResult(game=game, summary=summary, findings=findings)
result = DiagnosticResult(game=game, summary=summary, findings=findings)
_store(result, path, summary)
return result
def _store(result: DiagnosticResult, capture_path, summary: Summary) -> None:
"""Persist the diagnostic to its own directory when logging is enabled (M15)."""
try:
from . import diagstore
since = (summary.start - 60) if summary.start else None
directory = diagstore.store(result, capture_path, since=since)
if directory:
result.dir = str(directory)
except Exception: # storage must never break a diagnostic
pass
# --- hard-crash detection & post-crash analysis -----------------------------------
@@ -184,4 +200,6 @@ def analyze_crash(last_n: int = 15) -> DiagnosticResult:
findings += check_previous_boot() # the crashed boot's kernel log
findings += run_health_checks(include_journal=False) # SMART/driver/persistence/temps
findings.sort(key=lambda f: _SEV_ORDER.get(f.severity, 9))
return DiagnosticResult(game=_game_from_summary(summary), summary=summary, findings=findings)
result = DiagnosticResult(game=_game_from_summary(summary), summary=summary, findings=findings)
_store(result, _crash_path(), summary)
return result
+152
@@ -0,0 +1,152 @@
"""Per-diagnostic storage + Report bundles (M15) — opt-in via `logging_enabled`.
When logging is on, each finished diagnostic is persisted to its own directory under
``config.DIAGNOSTICS_DIR/<id>/`` (capture log, structured result, human-readable report, a
game-log snapshot, and any AI interactions). "Report" zips one directory — including exactly
**what was sent to the AI, which model, and its reply** — into ``config.REPORTS_DIR``.
"""
from __future__ import annotations
import json
import shutil
import time
import zipfile
from dataclasses import asdict, is_dataclass
from pathlib import Path
from .. import config
def enabled() -> bool:
return bool(config.load_config().get("logging_enabled", False))
def _slug(name: str | None) -> str:
s = "".join(c if c.isalnum() else "-" for c in (name or "session").lower())
return s.strip("-")[:40] or "session"
def _new_dir(game: str | None) -> Path:
base = config.DIAGNOSTICS_DIR
stamp = time.strftime("%Y%m%d-%H%M%S")
name = f"{stamp}-{_slug(game)}"
target = base / name
n = 1
while target.exists():
target = base / f"{name}-{n}"
n += 1
target.mkdir(parents=True, exist_ok=True)
return target
def _as_dict(obj):
if is_dataclass(obj):
return asdict(obj)
return getattr(obj, "__dict__", {}) or str(obj)
def store(result, capture_path=None, since: float | None = None) -> Path | None:
"""Persist a finished diagnostic to its own directory. Returns the dir, or None if off."""
if not enabled():
return None
from ..render import render_summary
from . import ai, gamelogs, syslogs
target = _new_dir(getattr(result, "game", None))
if capture_path and Path(capture_path).exists():
try:
shutil.copyfile(capture_path, target / "capture.jsonl")
except OSError:
pass
payload = {
"game": getattr(result, "game", None),
"stored_at": time.time(),
"summary": _as_dict(result.summary),
"findings": [_as_dict(f) for f in result.findings],
}
_write(target / "result.json", json.dumps(payload, indent=2, default=str))
report = [f"Game: {getattr(result, 'game', None) or 'unknown'}", "",
render_summary(result.summary), "",
ai.format_findings(result.findings, header="Findings:")]
_write(target / "report.txt", "\n".join(report))
try:
logs = gamelogs.collect(since=since)
if logs:
_write(target / "gamelogs.txt", logs)
except OSError:
pass
try:
sys_logs = syslogs.collect(since=since)
if sys_logs:
_write(target / "syslogs.txt", sys_logs)
except OSError:
pass
try: # full hardware/OS inventory (M5) — invaluable for larger debugging in a shared report
from . import inventory
sections = inventory.collect()
_write(target / "inventory.txt", inventory.render_text(sections))
_write(target / "inventory.json", inventory.render_json(sections))
except Exception: # inventory probes vary by machine; never let it break storage
pass
return target
def record_ai(diag_dir, *, provider: str, model: str, system: str, prompt: str, response: str) -> None:
"""Save one AI interaction (exact data sent, model, reply) into the diagnostic's `ai/` dir."""
if not diag_dir:
return
out = Path(diag_dir) / "ai"
try:
out.mkdir(parents=True, exist_ok=True)
except OSError:
return
stamp = time.strftime("%Y%m%d-%H%M%S")
record = {
"timestamp": time.time(), "provider": provider, "model": model,
"system_prompt": system, "data_sent_to_model": prompt, "model_reply": response,
}
_write(out / f"explain-{stamp}.json", json.dumps(record, indent=2, default=str))
readable = (
f"Provider: {provider}\nModel: {model}\n\n"
f"=== System prompt ===\n{system}\n\n"
f"=== Data sent to the model ===\n{prompt}\n\n"
f"=== Model reply ===\n{response}\n"
)
_write(out / f"explain-{stamp}.txt", readable)
def make_report(diag_dir) -> Path:
"""Zip a diagnostic directory (plus the app log) into REPORTS_DIR; return the zip path."""
diag_dir = Path(diag_dir)
config.REPORTS_DIR.mkdir(parents=True, exist_ok=True)
out = config.REPORTS_DIR / f"report-{diag_dir.name}.zip"
with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
for path in sorted(diag_dir.rglob("*")):
if path.is_file():
zf.write(path, arcname=str(Path(diag_dir.name) / path.relative_to(diag_dir)))
if config.APP_LOG.exists(): # the application log, for context around the session
zf.write(config.APP_LOG, arcname=str(Path(diag_dir.name) / "app.log"))
return out
def latest_dir() -> Path | None:
try:
dirs = [d for d in config.DIAGNOSTICS_DIR.iterdir() if d.is_dir()]
except OSError:
return None
return max(dirs, key=lambda d: d.stat().st_mtime) if dirs else None
def _write(path: Path, text: str) -> None:
try:
path.write_text(text, encoding="utf-8")
except OSError:
pass
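`make_report` above zips a diagnostic directory so that every entry sits under one top-level folder named after that directory. A small sketch of the archive-naming step, assuming a hypothetical `zip_directory` helper and an output path outside the source directory (otherwise the half-written zip would be picked up by the walk):

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(src: Path, out: Path) -> Path:
    """Zip every file under `src`, prefixing archive names with the directory's own name."""
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                arcname = Path(src.name) / path.relative_to(src)
                zf.write(path, arcname=arcname.as_posix())
    return out
```

Prefixing with `src.name` means the archive unpacks into a single folder rather than scattering files into the recipient's working directory.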
+148
@@ -0,0 +1,148 @@
"""Connected displays (M5): resolution + current/max refresh per monitor.
GNOME exposes the authoritative data over D-Bus (Mutter `DisplayConfig.GetCurrentState`),
which works on both X11 and Wayland — read via `busctl --json`. Plain X11 desktops fall back
to `xrandr`. Other Wayland compositors (sway/KDE) aren't covered yet and degrade to empty.
Stdlib only; every probe fails soft. Max refresh is computed at the *current* resolution, so
"can go faster" never suggests dropping resolution.
"""
from __future__ import annotations
import json
import re
import shutil
import subprocess
from dataclasses import dataclass
# A few common PNP monitor-vendor IDs → friendly names (best-effort; unknown codes pass through).
_PNP = {
"SAM": "Samsung", "DEL": "Dell", "GSM": "LG", "LGD": "LG", "AUS": "ASUS", "ACR": "Acer",
"BNQ": "BenQ", "MSI": "MSI", "AOC": "AOC", "VSC": "ViewSonic", "HWP": "HP", "HPN": "HP",
"PHL": "Philips", "GBT": "Gigabyte", "APP": "Apple", "DGC": "Dell",
}
@dataclass
class Monitor:
connector: str # e.g. "DP-1"
name: str # e.g. "Samsung LC34G55T" ("" if unknown, e.g. xrandr)
width: int
height: int
refresh: float # current Hz
max_refresh: float # max Hz available at the current resolution
@property
def can_go_faster(self) -> bool:
"""True if a meaningfully higher refresh is available at the current resolution."""
return self.max_refresh - self.refresh > 1.0
def label(self) -> str:
return f"{self.connector} · {self.name}".rstrip(" ·") if self.name else self.connector
def _run(cmd: list[str], timeout: float = 8.0) -> str:
try:
proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
if proc.returncode == 0:
return proc.stdout
except (subprocess.SubprocessError, OSError):
pass
return ""
def _parse_mutter(out: str) -> list[Monitor]:
"""Parse `busctl --json` output of Mutter DisplayConfig.GetCurrentState.
data = [serial, monitors, logical_monitors, props]; each monitor is
[[connector, vendor, product, serial], [modes], props]; each mode is
[id, width, height, refresh, scale, [scales], {props}] where props may hold is-current.
"""
try:
data = json.loads(out)["data"]
raw_monitors = data[1]
except (json.JSONDecodeError, KeyError, IndexError, TypeError):
return []
monitors: list[Monitor] = []
for mon in raw_monitors:
try:
connector, vendor, product = mon[0][0], mon[0][1], mon[0][2]
modes = mon[1]
except (IndexError, TypeError):
continue
current = None
for m in modes:
props = m[6] if len(m) > 6 and isinstance(m[6], dict) else {}
if (props.get("is-current") or {}).get("data"):
current = m
break
if current is None:
continue
w, h, r = int(current[1]), int(current[2]), float(current[3])
max_r = max((float(m[3]) for m in modes if int(m[1]) == w and int(m[2]) == h), default=r)
name = f"{_PNP.get(vendor, vendor)} {product}".strip()
monitors.append(Monitor(connector, name, w, h, r, max_r))
return monitors
def _parse_xrandr(out: str) -> list[Monitor]:
"""Parse `xrandr --query`: an output line with the active WxH+x+y, then indented mode lines
whose rates carry `*` for the current one."""
monitors: list[Monitor] = []
out_re = re.compile(r"^(\S+) connected.*?(\d+)x(\d+)\+\d+\+\d+")
mode_re = re.compile(r"^\s+(\d+)x(\d+)\s+(.+)$")
name = ""
cw = ch = 0
cur_r = max_r = 0.0
def flush() -> None:
if name and cw and cur_r:
monitors.append(Monitor(name, "", cw, ch, cur_r, max_r or cur_r))
for line in out.splitlines():
mo = out_re.match(line)
if mo:
flush()
name, cw, ch = mo.group(1), int(mo.group(2)), int(mo.group(3))
cur_r = max_r = 0.0
continue
mm = mode_re.match(line)
if mm and name and int(mm.group(1)) == cw and int(mm.group(2)) == ch:
for tok in mm.group(3).split():
try:
rate = float(tok.rstrip("*+"))
except ValueError:
continue
max_r = max(max_r, rate)
if "*" in tok:
cur_r = rate
flush()
return monitors
def _mutter() -> list[Monitor]:
exe = shutil.which("busctl")
if not exe:
return []
out = _run([exe, "--user", "--json=short", "call", "org.gnome.Mutter.DisplayConfig",
"/org/gnome/Mutter/DisplayConfig", "org.gnome.Mutter.DisplayConfig",
"GetCurrentState"])
return _parse_mutter(out) if out.strip() else []
def _xrandr() -> list[Monitor]:
if not shutil.which("xrandr"):
return []
return _parse_xrandr(_run(["xrandr", "--query"]))
def collect() -> list[Monitor]:
"""Connected monitors, via the first backend that returns any (Mutter, then xrandr)."""
for backend in (_mutter, _xrandr):
try:
monitors = backend()
except Exception:
monitors = []
if monitors:
return monitors
return []
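Both parsers above apply the same rule: the maximum refresh rate is taken only from modes that match the current resolution, so a faster mode at a lower resolution never counts. A sketch of that rule on plain tuples, where the `(width, height, hz, is_current)` shape and the `refresh_headroom` name are assumptions made for the example:

```python
from typing import List, Optional, Tuple

Mode = Tuple[int, int, float, bool]  # (width, height, refresh_hz, is_current)


def refresh_headroom(modes: List[Mode]) -> Optional[Tuple[float, float]]:
    """Return (current_hz, max_hz_at_current_resolution), or None if no mode is current."""
    current = next((m for m in modes if m[3]), None)
    if current is None:
        return None
    width, height, hz, _ = current
    best = max(m[2] for m in modes if (m[0], m[1]) == (width, height))
    return hz, best


def can_go_faster(current_hz: float, max_hz: float) -> bool:
    # Same 1 Hz tolerance as Monitor.can_go_faster, so 59.94 vs 60.00 is not flagged.
    return max_hz - current_hz > 1.0
```

With a 3440x1440 panel at 60 Hz that also offers 165 Hz, the result is `(60.0, 165.0)` even when a 1920x1080 mode at 240 Hz is present.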
+116
@@ -0,0 +1,116 @@
"""Collect recent game / Proton / Steam logs to enrich an AI diagnostic (M14).
Reads logs that already exist on disk — no change to how the game is launched. Two reliable
sources: Proton's per-app log (``~/steam-<appid>.log``, written when ``PROTON_LOG=1``) and
Steam's own console log. Each is tail-read and size-bounded so the AI prompt stays small. The
text is fed to the AI alongside the findings so it can see *when* something went wrong (a
vkd3d/DXVK error, a crash line, the exit code) rather than only the sensor summary.
"""
from __future__ import annotations
import os
import re
import time
from pathlib import Path
# Steam keeps logs under its install root; ~/.steam/steam usually symlinks to the real one.
_STEAM_LOG_DIRS = ("~/.steam/steam/logs", "~/.local/share/Steam/logs", "~/.steam/root/logs")
_STEAM_LOG_FILES = ("console-linux.txt", "console_log.txt", "stderr.txt")
_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")
def _line_epoch(line: str) -> float | None:
m = _TS.match(line)
if not m:
return None
try:
return time.mktime(time.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
except ValueError:
return None
def _since_filter(text: str, since: float) -> str:
"""Keep lines from the first timestamp >= `since` onward (logs are chronological).
Untimestamped lines before the window are dropped; once inside the window every line is
kept (so multi-line entries survive). This scopes a long-lived Steam log to one session.
"""
out: list[str] = []
including = False
for line in text.splitlines():
epoch = _line_epoch(line)
if epoch is not None and epoch >= since:
including = True
if including:
out.append(line)
return "\n".join(out)
def _tail(path: Path, max_bytes: int) -> str:
"""Last ``max_bytes`` of a file, decoded leniently (empty string on error)."""
try:
size = path.stat().st_size
with path.open("rb") as fh:
if size > max_bytes:
fh.seek(size - max_bytes)
return fh.read().decode("utf-8", "replace")
except OSError:
return ""
def _proton_logs() -> list[Path]:
try:
logs = list(Path.home().glob("steam-*.log"))
except OSError:
return []
return sorted(logs, key=_mtime, reverse=True) # _mtime tolerates a log deleted mid-scan
def _steam_console() -> Path | None:
for directory in _STEAM_LOG_DIRS:
base = Path(os.path.expanduser(directory))
for name in _STEAM_LOG_FILES:
candidate = base / name
if candidate.exists():
return candidate
return None
def available() -> bool:
return bool(_proton_logs() or _steam_console())
def collect(since: float | None = None, max_bytes: int = 8000) -> str:
"""Recent Proton + Steam log tails as one labelled text block ('' if none).
With ``since`` (epoch), scope to that session: skip a Proton log not written during/after
the session (a stale per-app log from an earlier game), and keep only Steam-console lines
timestamped at/after ``since`` — so we don't feed the model an unrelated past session.
"""
sections: list[str] = []
protons = _proton_logs()
if protons:
log = protons[0]
fresh = since is None or _mtime(log) >= since
tail = _tail(log, max_bytes).strip() if fresh else ""
if tail:
sections.append(f"--- Proton log ({log.name}) ---\n{tail}")
console = _steam_console()
if console:
raw = _tail(console, 40000 if since else max_bytes)
if since is not None:
raw = _since_filter(raw, since)
raw = raw.strip()[-max_bytes:].strip()
if raw:
sections.append(f"--- Steam log ({console.name}) ---\n{raw}")
return "\n\n".join(sections)
def _mtime(path: Path) -> float:
try:
return path.stat().st_mtime
except OSError:
return 0.0
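The session scoping in `_since_filter` above drops everything before the first timestamp at or after `since`, then keeps every later line, including untimestamped continuation lines. A self-contained sketch of that behaviour, assuming the same `[YYYY-MM-DD HH:MM:SS]` prefix; `parse_epoch` and `lines_since` are illustrative names, and the log text is invented for the example:

```python
import re
import time
from typing import List

_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def parse_epoch(stamp: str) -> float:
    """Local-time 'YYYY-MM-DD HH:MM:SS' to epoch seconds."""
    return time.mktime(time.strptime(stamp, "%Y-%m-%d %H:%M:%S"))


def lines_since(text: str, since: float) -> List[str]:
    kept = []
    including = False
    for line in text.splitlines():
        match = _TS.match(line)
        if match and parse_epoch(match.group(1)) >= since:
            including = True  # once inside the window, never leave it
        if including:
            kept.append(line)
    return kept


LOG = (
    "[2026-05-22 14:00:00] earlier session\n"
    "  continuation of the earlier entry\n"
    "[2026-05-22 15:00:00] game start\n"
    "  stack line without a timestamp\n"
    "[2026-05-22 15:00:05] exit"
)
```

Because the window only opens and never closes, the filter relies on the log being chronological, as the original docstring notes.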
+75
@@ -251,6 +251,78 @@ def check_live_temps() -> list[Finding]:
)]
def check_pcie_links() -> list[Finding]:
"""Flag NVMe drives linked below their PCIe capability — a slower slot or, most often,
motherboard lane-sharing where a GPU/second card or another M.2 steals lanes from the slot.
Width reductions are reliable (reported as warnings); speed-only reductions are info (they can
also be normal link power management at idle). The GPU is intentionally not checked here:
NVIDIA drops its PCIe gen *and* width at idle, so a point-in-time snapshot is misleading.
"""
from . import inventory
findings: list[Finding] = []
for name, dev in inventory.nvme_controllers():
cur_g, cur_w, max_g, max_w = inventory.read_link(dev)
if not cur_g or not max_g:
continue
if max_w and cur_w and cur_w != max_w: # fewer lanes → almost always lane-sharing
findings.append(Finding(
WARNING, "PCIe", f"{name} linked at x{cur_w} (supports x{max_w})",
f"{name} negotiated PCIe Gen{cur_g} x{cur_w}, but the drive supports "
f"Gen{max_g} x{max_w}. Fewer lanes is usually motherboard lane-sharing — a GPU or a "
"second card in a PCIe slot, or another populated M.2, can steal lanes from this slot.",
"Check your board manual's lane-sharing table; move the drive to a full-x4 "
"(often CPU-attached) M.2 slot."))
elif cur_g < max_g: # full width but a lower generation → slower slot or idle ASPM
findings.append(Finding(
INFO, "PCIe", f"{name} linked at Gen{cur_g} (supports Gen{max_g})",
f"{name} negotiated PCIe Gen{cur_g} but supports Gen{max_g}. This can be a slower "
"(chipset or older) M.2 slot, or normal link power management (ASPM) at idle.",
"If you expect full speed, check the slot and the BIOS PCIe/ASPM settings."))
return findings
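Review note: the width-versus-generation split in `check_pcie_links` can be tried out without real hardware. The sketch below is a hypothetical standalone version (the names `gen` and `classify` are illustrative, not project functions) that takes raw sysfs-style strings. Unlike the check above, it compares widths as integers and only treats a strictly narrower link as a width reduction.

```python
# Link speed token (GT/s) -> PCIe generation, mirroring the table in this diff.
PCIE_GEN = {"2.5": 1, "5": 2, "5.0": 2, "8": 3, "8.0": 3,
            "16": 4, "16.0": 4, "32": 5, "32.0": 5}


def gen(speed: str):
    """Map a string like '16.0 GT/s PCIe' to its generation, or None if unreadable."""
    tok = speed.split()[0] if speed.strip() else ""
    return PCIE_GEN.get(tok)


def classify(cur_speed: str, cur_width: str, max_speed: str, max_width: str) -> str:
    """Return 'width', 'speed', 'ok', or 'unknown' for one device's link."""
    cur_g, max_g = gen(cur_speed), gen(max_speed)
    if not cur_g or not max_g:
        return "unknown"
    if cur_width.isdigit() and max_width.isdigit() and int(cur_width) < int(max_width):
        return "width"   # fewer lanes: the warning case
    if cur_g < max_g:
        return "speed"   # full width, lower generation: the info case
    return "ok"
```

Checking width before generation matches the ordering in the diff: a link that is both narrower and slower is reported once, as the more reliable width finding.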
def check_displays() -> list[Finding]:
"""Flag monitors running below their max refresh rate at the current resolution — e.g. a
165 Hz panel set to 60 Hz, a common and easily-missed gaming setting (read-only suggestion)."""
from . import displays
findings: list[Finding] = []
for m in displays.collect():
if m.can_go_faster:
findings.append(Finding(
INFO, "Display",
f"{m.connector} at {round(m.refresh)} Hz (supports {round(m.max_refresh)} Hz)",
f"{m.name or m.connector} is running at {round(m.refresh)} Hz at "
f"{m.width}x{m.height}, but supports {round(m.max_refresh)} Hz at that resolution.",
"Raise the refresh rate in your desktop's Display settings (GNOME: Settings → Displays)."))
return findings
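Review note: `core/displays.py` is not part of this excerpt, so the shape of the objects returned by `displays.collect()` can only be inferred from the attributes used above. The sketch below is a guess at a minimal record type. The field names come from this diff, while the 1 Hz tolerance in `can_go_faster` and the body of `label()` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    connector: str      # e.g. "DP-1"
    name: str           # panel name, may be empty
    width: int
    height: int
    refresh: float      # current refresh rate in Hz
    max_refresh: float  # highest rate offered at the current resolution

    @property
    def can_go_faster(self) -> bool:
        # Assumed tolerance: ignore sub-1 Hz gaps such as 59.94 vs 60.
        return self.max_refresh - self.refresh > 1.0

    def label(self) -> str:
        return f"{self.connector} · {self.name}" if self.name else self.connector
```

A tolerance of some kind seems necessary because modes are often reported with fractional rates, and a strict comparison would flag a panel that is effectively already at its maximum.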
def check_memory_speed() -> list[Finding]:
"""Flag RAM running below its rated speed — i.e. the XMP (Intel) / EXPO (AMD) profile isn't
enabled, leaving memory bandwidth on the table. Needs dmidecode (root); silent without it."""
from . import elevation, inventory
priv = elevation.privileged()
dmi = priv["dmidecode"] if (priv and priv.get("dmidecode")) else inventory._dmidecode()
worst: tuple[int, int] | None = None # (configured, rated) with the biggest gap
for m in dmi.get("memory", []):
configured, rated = inventory.module_speed(m)
if configured and rated and configured < rated:
if worst is None or (rated - configured) > (worst[1] - worst[0]):
worst = (configured, rated)
if worst is None:
return []
configured, rated = worst
return [Finding(
INFO, "Memory", f"RAM at {configured} MT/s (rated {rated} MT/s)",
f"Memory is running at {configured} MT/s but the modules are rated {rated} MT/s — the "
"XMP/EXPO profile isn't enabled, so you're leaving memory bandwidth on the table.",
"Enable XMP (Intel) or EXPO (AMD) in your BIOS/UEFI to run at the rated speed.")]
def run_health_checks(include_journal: bool = True) -> list[Finding]:
"""Run all checks and return findings sorted by severity (worst first).
@@ -273,5 +345,8 @@ def run_health_checks(include_journal: bool = True) -> list[Finding]:
else:
findings += check_smart()
findings += check_live_temps()
findings += check_pcie_links()
findings += check_displays()
findings += check_memory_speed() # uses elevation data if present, else dmidecode (root)
findings.sort(key=lambda f: _ORDER.get(f.severity, 9))
return findings
+109 -5
@@ -9,6 +9,7 @@ from __future__ import annotations
import json
import os
import platform
import re
import shutil
import subprocess
from dataclasses import dataclass
@@ -85,6 +86,35 @@ def _firmware(dmi: dict) -> Section:
return Section("Firmware", items)
# Common DDR5 XMP/EXPO speed grades (MT/s) — used to read a kit's rated speed from its part
# number, since with XMP/EXPO off dmidecode only reports the JEDEC base (e.g. 4800).
_DDR_SPEEDS = {4800, 5200, 5600, 6000, 6200, 6400, 6600, 6800, 7000, 7200, 7600, 8000, 8200, 8400}
def _mts(value: str) -> int | None:
"""Parse a dmidecode speed like '4800 MT/s' (or 'MHz') to its integer MT/s."""
m = re.match(r"\s*(\d+)", value or "")
return int(m.group(1)) if m else None
def _rated_from_part(part: str) -> int | None:
"""The highest known DDR speed-grade appearing as a 4-digit token in a part number."""
grades = [int(n) for n in re.findall(r"(?<!\d)(\d{4})(?!\d)", part or "") if int(n) in _DDR_SPEEDS]
return max(grades) if grades else None
def module_speed(m: dict) -> tuple[int | None, int | None]:
"""(configured, rated) MT/s for a dmidecode Memory Device.
Configured = what it's actually running at; rated = the highest of dmidecode's reported max
and the part-number speed-grade (so an unapplied XMP/EXPO profile is still detected).
"""
configured = _mts(m.get("Configured Memory Speed") or m.get("Configured Clock Speed") or m.get("Speed", ""))
candidates = [s for s in (_mts(m.get("Speed", "")), _rated_from_part(m.get("Part Number", ""))) if s]
rated = max(candidates) if candidates else None
return configured, rated
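Review note: a worked example of the three helpers above, copied into a standalone form so it runs outside the package. The sample dmidecode records are illustrative, not captured output. The second part number is a made-up string that encodes its grade in three digits, to show what the four-digit token match does not detect.

```python
import re

DDR_SPEEDS = {4800, 5200, 5600, 6000, 6200, 6400, 6600, 6800,
              7000, 7200, 7600, 8000, 8200, 8400}


def mts(value):
    """Leading integer of a speed string such as '4800 MT/s', or None."""
    m = re.match(r"\s*(\d+)", value or "")
    return int(m.group(1)) if m else None


def rated_from_part(part):
    """Highest known speed grade that appears as a standalone 4-digit token."""
    grades = [int(n) for n in re.findall(r"(?<!\d)(\d{4})(?!\d)", part or "")
              if int(n) in DDR_SPEEDS]
    return max(grades) if grades else None


def module_speed(m):
    configured = mts(m.get("Configured Memory Speed") or m.get("Speed", ""))
    candidates = [s for s in (mts(m.get("Speed", "")),
                              rated_from_part(m.get("Part Number", ""))) if s]
    return configured, (max(candidates) if candidates else None)


# Profile off: dmidecode reports the JEDEC base, the part number carries the grade.
kit = {"Speed": "4800 MT/s", "Configured Memory Speed": "4800 MT/s",
       "Part Number": "CMK32GX5M2B5600C36"}
print(module_speed(kit))      # (4800, 5600)

# Illustrative part number with a 3-digit grade: no 4-digit token, so nothing is flagged.
other = {"Speed": "4800 MT/s", "Configured Memory Speed": "4800 MT/s",
         "Part Number": "XX560C40-16"}
print(module_speed(other))    # (4800, 4800)
```

Restricting matches to a fixed set of grades trades coverage for fewer false positives: unrelated four-digit runs such as timing codes are ignored, at the cost of missing vendors that abbreviate the speed.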
def _memory(dmi: dict) -> Section:
items: list[tuple[str, str]] = []
try:
@@ -98,8 +128,12 @@ def _memory(dmi: dict) -> Section:
if modules:
items.append(("Modules", str(len(modules))))
for i, m in enumerate(modules):
desc = " · ".join(p for p in (m.get("Size"), m.get("Type"), m.get("Speed"), m.get("Part Number")) if p)
items.append((f"Slot {i}", desc))
configured, rated = module_speed(m)
speed = f"{configured} MT/s" if configured else m.get("Speed", "")
if rated and configured and rated > configured: # XMP/EXPO not applied
speed += f" (rated {rated})"
parts = (m.get("Size"), m.get("Type"), speed, m.get("Part Number"))
items.append((f"Slot {i}", " · ".join(p for p in parts if p)))
elif shutil.which("dmidecode"):
items.append(("Modules", "run with admin for module details"))
return Section("Memory", items)
@@ -123,6 +157,64 @@ def _gpu() -> Section:
return Section("GPU", [("Device", g) for g in gpus] or [("Device", "unknown")])
# PCIe link speed (GT/s) → generation.
_PCIE_GEN = {"2.5": 1, "5": 2, "5.0": 2, "8": 3, "8.0": 3, "16": 4, "16.0": 4, "32": 5, "32.0": 5}
def _gen(speed: str) -> int | None:
"""Map a sysfs link speed like '16.0 GT/s PCIe' to its PCIe generation (4)."""
tok = speed.strip().split()[0] if speed.strip() else ""
return _PCIE_GEN.get(tok)
def read_link(dev: Path) -> tuple[int | None, str, int | None, str]:
"""Negotiated/max PCIe link for a PCI device dir: (cur_gen, cur_width, max_gen, max_width).
Widths are the raw sysfs strings (e.g. '4'); gens are ints (4) or None when unreadable.
"""
def rd(name: str) -> str:
try:
return (dev / name).read_text().strip()
except OSError:
return ""
return (_gen(rd("current_link_speed")), rd("current_link_width"),
_gen(rd("max_link_speed")), rd("max_link_width"))
def _link_desc(dev: Path) -> str:
"""Describe a PCI device's negotiated PCIe link, noting if it's below its max.
e.g. 'PCIe Gen4 x4', or 'PCIe Gen3 x4 (capable of Gen4 x4)' when downtrained / in a
slower slot.
"""
cur_g, cur_w, max_g, max_w = read_link(dev)
if not cur_g or not cur_w:
return ""
desc = f"PCIe Gen{cur_g} x{cur_w}"
if max_g and (cur_g < max_g or (max_w and cur_w != max_w)):
desc += f" (capable of Gen{max_g} x{max_w})"
return desc
def nvme_controllers() -> list[tuple[str, Path]]:
"""Each NVMe controller as (name, pci-device-dir), e.g. ('nvme0', /sys/.../device)."""
base = Path("/sys/class/nvme")
try:
entries = [p for p in base.iterdir() if re.fullmatch(r"nvme\d+", p.name)]
except OSError:
return []
return sorted((p.name, p / "device") for p in entries)
def _nvme_link(block_name: str) -> str:
"""PCIe link for an NVMe block device (nvme0n1 → controller nvme0); '' for non-NVMe."""
m = re.match(r"(nvme\d+)", block_name)
if not m:
return ""
return _link_desc(Path("/sys/class/nvme") / m.group(1) / "device")
def _storage() -> Section:
items: list[tuple[str, str]] = []
# TYPE first so MODEL (which can contain spaces) is the trailing field.
@@ -133,15 +225,27 @@ def _storage() -> Section:
continue
name, size = parts[1], parts[2]
model = parts[3] if len(parts) > 3 else ""
items.append((name, f"{model} ({size})".strip()))
desc = f"{model} ({size})".strip()
link = _nvme_link(name) # NVMe PCIe gen/width (e.g. Gen4 x4), flags downtrains
if link:
desc += f" · {link}"
items.append((name, desc))
return Section("Storage", items or [("Disks", "unknown")])
def _display() -> Section:
return Section("Display", [
from . import displays
items = [
("Session", os.environ.get("XDG_SESSION_TYPE", "unknown")),
("Desktop", os.environ.get("XDG_CURRENT_DESKTOP") or os.environ.get("DESKTOP_SESSION", "unknown")),
])
]
for m in displays.collect():
val = f"{m.width}x{m.height} @ {round(m.refresh)} Hz"
if m.can_go_faster:
val += f" (supports {round(m.max_refresh)} Hz)"
items.append((m.label(), val))
return Section("Display", items)
def _dmidecode() -> dict:
+5
@@ -318,6 +318,11 @@ def cached_games() -> list[Game]:
return [Game(**{k: g[k] for k in Game.__dataclass_fields__ if k in g}) for g in cache.get("games", [])]
def appid_names() -> dict[str, str]:
"""{appid: name} for the user's scanned games — lets us resolve IDs seen in logs (M14)."""
return {g.appid: g.name for g in cached_games() if g.appid and g.name}
def rescan(cfg: dict | None = None) -> ScanResult:
"""Scan the selected libraries, diff against the cache, and persist the result.
+165
@@ -0,0 +1,165 @@
"""Session-scoped system logs for diagnostics (M15): kernel, coredumps, NVIDIA, display.
Covers what the *system* logged when something went wrong, so the report bundle and the AI both
see it:
* kernel ring-buffer slice (`journalctl -k`): Xid, OOM-killer, MCE, PCIe AER, thermal, hung tasks
* systemd-coredump records (`coredumpctl`): did the game/wine dump core (SIGSEGV/ABRT), when
* an `nvidia-smi -q` snapshot: driver, throttle/clock-event reasons, clocks, power, temps, PCIe,
  ECC + retired pages (point-in-time at diagnostic time)
* the display-server log: `Xorg.0.log` on X11, or the compositor's user-journal slice on Wayland
Best-effort and size-bounded: degrades silently if a tool is missing or access is denied. Stdlib only.
"""
from __future__ import annotations
import os
import re
import shutil
import subprocess
import time
from pathlib import Path
_MAX = 8000 # cap each log section so the prompt/report stays small
_NV_MAX = 10000 # nvidia-smi -q is structured + valuable; allow a bit more (head-truncated)
# Compositors whose user-journal entries are the "Wayland log" (OR-matched by journalctl).
_COMPOSITORS = ("gnome-shell", "mutter", "kwin_wayland", "Xwayland", "sway", "gamescope")
_XORG_LOGS = ("~/.local/share/xorg/Xorg.0.log", "/var/log/Xorg.0.log")
def _since_arg(since: float | None) -> str | None:
return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(since)) if since else None
def _run(cmd: list[str], timeout: float = 15.0) -> str:
try:
proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
except (OSError, subprocess.SubprocessError):
return ""
return (proc.stdout or "").strip()
def kernel_log(since: float | None = None, max_bytes: int = _MAX) -> str:
if not shutil.which("journalctl"):
return ""
cmd = ["journalctl", "-k", "--no-pager"]
since_arg = _since_arg(since)
if since_arg:
cmd += ["--since", since_arg]
out = _run(cmd)
if not out or out.strip().lower() == "-- no entries --": # journalctl's empty marker
return ""
return out[-max_bytes:]
def coredumps(since: float | None = None, max_bytes: int = _MAX) -> str:
if not shutil.which("coredumpctl"):
return ""
cmd = ["coredumpctl", "list", "--no-pager"]
since_arg = _since_arg(since)
if since_arg:
cmd += ["--since", since_arg]
out = _run(cmd)
if not out or "no coredumps" in out.lower():
return ""
return out[-max_bytes:]
def nvidia_snapshot(max_bytes: int = _NV_MAX) -> str:
"""Point-in-time `nvidia-smi -q` (head-truncated — driver/temps/clocks/ECC sit near the top)."""
if not shutil.which("nvidia-smi"):
return ""
out = _run(["nvidia-smi", "-q"])
return out[:max_bytes] if out else ""
def _xorg_log() -> Path | None:
for cand in _XORG_LOGS:
path = Path(os.path.expanduser(cand))
if path.exists():
return path
return None
def _session_type() -> str:
declared = os.environ.get("XDG_SESSION_TYPE", "").lower()
if declared in ("x11", "wayland"):
return declared
if os.environ.get("WAYLAND_DISPLAY"):
return "wayland"
return "x11" if _xorg_log() else "unknown"
def _tail_file(path: Path, max_bytes: int) -> str:
try:
size = path.stat().st_size
with path.open("rb") as fh:
if size > max_bytes:
fh.seek(size - max_bytes)
return fh.read().decode("utf-8", "replace")
except OSError:
return ""
def display_log(since: float | None = None, max_bytes: int = _MAX) -> str:
"""Xorg.0.log on X11, or the compositor's user-journal slice on Wayland ('' if none)."""
if _session_type() == "wayland":
if not shutil.which("journalctl"):
return ""
cmd = ["journalctl", "--user", "--no-pager"]
since_arg = _since_arg(since)
if since_arg:
cmd += ["--since", since_arg]
cmd += [f"_COMM={comp}" for comp in _COMPOSITORS] # OR-matched
out = _run(cmd)
if not out or out.strip().lower() == "-- no entries --":
return ""
return out[-max_bytes:]
log = _xorg_log() # X11: Xorg log isn't wall-clock-timestamped, so tail rather than scope
return _tail_file(log, max_bytes) if log else ""
# Kernel-log patterns worth alerting on in real time (M8 event alerts). (label, regex).
_CRITICAL = [
("GPU error (Xid)", re.compile(r"NVRM:\s*Xid", re.I)),
("Out of memory", re.compile(r"out of memory|oom-kill|killed process \d+", re.I)),
("CPU machine-check", re.compile(r"\bmce:|machine check", re.I)),
("PCIe error", re.compile(r"\bAER:|pcie bus error", re.I)),
("Disk I/O error", re.compile(
r"buffer i/o error|\bi/o error\b|critical medium error|ext4-fs error|"
r"blk_update_request:.*error|ata\d+.*(?:failed|error)", re.I)),
]
def scan_critical(text: str) -> list[tuple[str, str]]:
"""(label, line) for kernel lines matching a critical pattern (first match per line)."""
events: list[tuple[str, str]] = []
for line in text.splitlines():
for label, pat in _CRITICAL:
if pat.search(line):
events.append((label, line.strip()))
break
return events
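Review note: a short usage sketch for `scan_critical`, reproduced standalone with two of the patterns above. The kernel lines are illustrative samples written for this example, not captured logs.

```python
import re

CRITICAL = [
    ("GPU error (Xid)", re.compile(r"NVRM:\s*Xid", re.I)),
    ("Out of memory", re.compile(r"out of memory|oom-kill|killed process \d+", re.I)),
]


def scan_critical(text):
    """(label, line) for each line matching a pattern; first matching label wins."""
    events = []
    for line in text.splitlines():
        for label, pat in CRITICAL:
            if pat.search(line):
                events.append((label, line.strip()))
                break
    return events


sample = "\n".join([
    "kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=1234, GPU has fallen off the bus.",
    "kernel: usb 1-2: new high-speed USB device number 5",
    "kernel: Out of memory: Killed process 4242 (game.exe)",
])
for label, line in scan_critical(sample):
    print(f"{label}: {line}")
```

Because of the `break`, the order of the pattern list decides which label a line gets when more than one pattern could match it.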
def available() -> bool:
return bool(shutil.which("journalctl") or shutil.which("coredumpctl")
or shutil.which("nvidia-smi") or _xorg_log())
def collect(since: float | None = None) -> str:
"""Kernel + coredumps + NVIDIA snapshot + display log as one labelled block ('' if none)."""
sections: list[str] = []
kern = kernel_log(since)
if kern:
sections.append(f"--- Kernel log (journalctl -k) ---\n{kern}")
cores = coredumps(since)
if cores:
sections.append(f"--- Crashed processes (coredumpctl) ---\n{cores}")
nvidia = nvidia_snapshot()
if nvidia:
sections.append(f"--- NVIDIA snapshot (nvidia-smi -q) ---\n{nvidia}")
display = display_log(since)
if display:
sections.append(f"--- Display server log ({_session_type()}) ---\n{display}")
return "\n\n".join(sections)
+55 -3
@@ -8,11 +8,14 @@ state for the UI; `apply_update` performs the no-root self-update.
from __future__ import annotations
import functools
import json
import shutil
import subprocess
import sys
import urllib.error
import urllib.request
from pathlib import Path
from .. import __version__
from ..config import load_token
@@ -31,6 +34,50 @@ UP_TO_DATE = "up-to-date"
AVAILABLE = "available"
APT_PACKAGE = "rigdoctor"
def _dpkg_owns(path: Path) -> bool:
"""True if dpkg reports `path` belongs to a package (i.e. an apt/.deb install)."""
if not shutil.which("dpkg"):
return False
try:
r = subprocess.run(["dpkg", "-S", str(path)], capture_output=True, text=True, timeout=5)
except (subprocess.SubprocessError, OSError):
return False
return r.returncode == 0 and APT_PACKAGE in r.stdout
@functools.lru_cache(maxsize=1)
def install_kind() -> str:
"""How RigDoctor was installed: 'apt' (.deb), 'pip' (venv/.run), or 'dev' (source checkout).
Decides which updater to use: only 'pip' can self-update in place; apt is root/dpkg-managed
and source is VCS-managed, so those are guided rather than auto-applied.
"""
pkg = Path(__file__).resolve().parents[1] # .../rigdoctor
if _dpkg_owns(pkg / "__init__.py"):
return "apt"
if sys.prefix != sys.base_prefix: # inside a venv → the pip/.run install
return "pip"
if (pkg.parents[1] / "pyproject.toml").exists(): # repo checkout
return "dev"
if str(pkg).startswith("/usr/") or "/dist-packages/" in str(pkg):
return "apt" # system-managed but no dpkg record — still don't pip
return "pip"
def update_hint(kind: str | None = None) -> str:
"""Human guidance for installs that can't self-update via pip (apt / source)."""
kind = kind or install_kind()
if kind == "apt":
return ("Installed via apt — update with:\n"
f" sudo apt update && sudo apt install --only-upgrade {APT_PACKAGE}")
if kind == "dev":
return "Running from a source checkout — update with `git pull`."
return ""
def _parse(version: str) -> tuple[int, ...]:
return tuple(int(p) for p in version.lstrip("vV").split(".") if p.isdigit())
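Review note: `_parse` turns a tag into a tuple so versions compare numerically rather than as strings. The standalone copy below (named `parse` only for this example) shows the behaviour on a few inputs, including one the digit filter handles loosely.

```python
def parse(version: str) -> tuple:
    """Numeric components of a tag like 'v0.40.0'; non-numeric parts are dropped."""
    return tuple(int(p) for p in version.lstrip("vV").split(".") if p.isdigit())


print(parse("v0.40.0"))                      # (0, 40, 0)
print(parse("0.39.0") < parse("v0.40.0"))    # True
print(parse("0.9.0") < parse("0.10.0"))      # True (a plain string comparison would say False)
print(parse("1.2.3-rc1"))                    # (1, 2): the "3-rc1" component is not all digits
```

The last case means a tag with a pre-release suffix loses its final component and therefore sorts below `1.2.0` as well as below `1.2.3`. That is harmless if releases are always tagged with plain numeric versions, which appears to be the convention here.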
@@ -100,11 +147,16 @@ def list_releases(limit: int = 15, timeout: float = 6.0) -> tuple[list[tuple[str
def apply_update(tag: str) -> tuple[int, str]:
"""Self-update the current (user-local) install to `tag` via authenticated pip.
"""Update to `tag` using the method matching how RigDoctor was installed.
Installs `rigdoctor[gui] @ git+https://oauth2:<token>@/rigdoctor.git@<tag>` into
the running environment. Returns (exit_code, output) with the token scrubbed.
Only pip/venv installs are upgraded in place (authenticated pip install of
`rigdoctor[gui] @ git+https://oauth2:<token>@/rigdoctor.git@<tag>`). apt and source
installs can't be (root/dpkg- or VCS-managed), so they return guidance instead of
attempting pip. Returns (exit_code, output) with the token scrubbed.
"""
kind = install_kind()
if kind != "pip":
return (1, update_hint(kind))
token = load_token()
if not token:
return (1, "No update token configured. Run `rigdoctor login`.")
+4
@@ -17,6 +17,10 @@ ICON = Path(__file__).parent / "assets" / "rigdoctor.svg"
def main(argv: list[str] | None = None) -> int:
from ..core import applog
applog.setup() # opt-in app logging (M15); no-op unless logging_enabled
applog.get_logger(__name__).info("GUI starting")
desktop.ensure() # self-register icon + .desktop so updates show it without re-installing
app = QApplication(argv if argv is not None else sys.argv)
app.setApplicationName("RigDoctor")
+101 -18
@@ -5,7 +5,7 @@ from __future__ import annotations
import threading
from PySide6.QtCore import Qt, Signal
from PySide6.QtGui import QFont
from PySide6.QtGui import QFont, QTextCursor
from PySide6.QtWidgets import (
QDialog,
QFrame,
@@ -24,11 +24,15 @@ from .widgets import finding_card
class DiagnosticDialog(QDialog):
_explained = Signal(object) # (ok, text) from a user-triggered AI explanation
_chunk = Signal(str) # streamed token delta (worker thread -> GUI)
_explained = Signal(object) # (ok, full_text) when the AI stream finishes
def __init__(self, result, parent=None) -> None:
super().__init__(parent)
self._result = result
self._stream_view = None
self._stream_status = None
self._chunk.connect(self._on_chunk)
self._explained.connect(self._on_explained)
self.setWindowTitle(f"Diagnostic — {result.game}" if result.game else "Diagnostic")
self.resize(660, 680)
@@ -86,6 +90,10 @@ class DiagnosticDialog(QDialog):
from ..core import ai
self._explain_btn.setVisible(ai.is_configured()) # opt-in only; hidden if not set up
buttons.addWidget(self._explain_btn)
self._report_btn = QPushButton("Report") # zip this diagnostic's logs (M15)
self._report_btn.clicked.connect(self._make_report)
self._report_btn.setVisible(bool(result.dir)) # only when logging stored the session
buttons.addWidget(self._report_btn)
buttons.addStretch(1)
close = QPushButton("Close")
close.setObjectName("PrimaryButton")
@@ -93,7 +101,7 @@ class DiagnosticDialog(QDialog):
buttons.addWidget(close)
root.addLayout(buttons)
# --- AI explanation (M14, D24) — runs only on this button press ----------------
# --- AI explanation (M14, D24) — streamed; runs only on this button press ----------
def _explain_with_ai(self) -> None:
from ..core import ai
@@ -107,23 +115,97 @@ class DiagnosticDialog(QDialog):
if confirm != QMessageBox.StandardButton.Yes:
return
self._explain_btn.setEnabled(False)
self._explain_btn.setText("Asking the AI…")
dialog = self._open_stream_dialog()
threading.Thread(target=self._work_explain, daemon=True).start()
dialog.exec() # streaming fills the view live via signals during this nested loop
self._stream_view = self._stream_status = None
self._explain_btn.setEnabled(True)
def _work_explain(self) -> None:
from ..core import ai
from ..core import ai, gamelogs, syslogs
text = ai.format_findings(self._result.findings, header="Diagnostic findings:")
text += "\n\nCapture summary:\n" + render_summary(self._result.summary)
self._explained.emit(ai.explain(text))
result = self._result
summary = result.summary
events = {kind for _ts, kind, _detail in summary.events}
clean = "session-stop" in events
gpu_lost = "gpu-lost" in events
lines = [f"Game: {result.game or 'unknown'}"]
if summary.start and summary.end:
lines.append(f"Capture duration: ~{int(summary.end - summary.start)}s")
outcome = "ended cleanly (no crash detected)" if clean else \
"ended without a clean stop (possible crash/freeze)"
if gpu_lost:
outcome += "; a GPU-lost event was recorded"
lines.append(f"Outcome: {outcome}")
lines.append("")
lines.append(ai.format_findings(result.findings, header="Findings:"))
lines.append("\nCapture summary:\n" + render_summary(summary))
since = (summary.start - 60) if summary.start else None
logs = gamelogs.collect(since=since) # scoped to this session
if logs:
lines.append("\nGame/Proton/Steam logs for this session:\n" + logs)
sys_logs = syslogs.collect(since=since) # kernel log, coredumps, display log + a current NVIDIA snapshot
if sys_logs:
lines.append("\nSystem logs for this session (kernel, crashed processes, display log) "
"plus a current NVIDIA snapshot:\n" + sys_logs)
text = "\n".join(lines)
ok, reply = ai.explain_stream(text, on_chunk=lambda d: self._chunk.emit(d))
if result.dir: # record exactly what was sent, the model, and the reply (M15)
from ..core import diagstore
diagstore.record_ai(
result.dir, provider=ai.provider(), model=ai.model(),
system=ai.SYSTEM_PROMPT, prompt=ai.build_prompt(text),
response=reply if ok else f"[error] {reply}")
self._explained.emit((ok, reply))
def _on_chunk(self, delta: str) -> None:
if self._stream_view is None:
return
self._stream_view.moveCursor(QTextCursor.MoveOperation.End)
self._stream_view.insertPlainText(delta) # live plain text as tokens arrive
self._stream_view.ensureCursorVisible()
def _on_explained(self, result) -> None:
ok, text = result
self._explain_btn.setEnabled(True)
self._explain_btn.setText("Explain with AI")
self._show_explanation(text if ok else f"AI explanation failed:\n\n{text}")
if self._stream_view is not None:
if ok:
self._stream_view.setMarkdown(text) # re-render the finished answer as Markdown
else:
self._stream_view.setPlainText(f"AI explanation failed:\n\n{text}")
if self._stream_status is not None:
self._stream_status.setText(
"AI-generated suggestions — verify before acting, especially anything that changes "
"settings or data." if ok else "The request failed.")
def _show_explanation(self, text: str) -> None:
# --- Report bundle (M15) ------------------------------------------------------
def _make_report(self) -> None:
from PySide6.QtCore import QUrl
from PySide6.QtGui import QDesktopServices
from ..core import diagstore
self._report_btn.setEnabled(False)
try:
out = diagstore.make_report(self._result.dir)
except OSError as exc:
self._report_btn.setEnabled(True)
QMessageBox.warning(self, "Report failed", str(exc))
return
self._report_btn.setEnabled(True)
box = QMessageBox(self)
box.setWindowTitle("Report created")
box.setText(f"Saved report:\n{out}\n\nIt contains this diagnostic's logs and any AI "
"interaction (data sent, model, and reply).")
open_btn = box.addButton("Open folder", QMessageBox.ButtonRole.ActionRole)
box.addButton("OK", QMessageBox.ButtonRole.AcceptRole)
box.exec()
if box.clickedButton() is open_btn:
QDesktopServices.openUrl(QUrl.fromLocalFile(str(out.parent)))
def _open_stream_dialog(self) -> QDialog:
"""A live dialog the AI streams into; finalized to rendered Markdown when done."""
from ..core import ai
dlg = QDialog(self)
@@ -133,14 +215,15 @@ class DiagnosticDialog(QDialog):
view = QTextEdit()
view.setObjectName("Report")
view.setReadOnly(True)
view.setPlainText(text)
lay.addWidget(view)
note = QLabel("AI-generated suggestions — verify before acting, especially anything that changes settings or data.")
note.setObjectName("Muted")
note.setWordWrap(True)
lay.addWidget(note)
status = QLabel("Streaming from the model…")
status.setObjectName("Muted")
status.setWordWrap(True)
lay.addWidget(status)
close = QPushButton("Close")
close.setObjectName("PrimaryButton")
close.clicked.connect(dlg.accept)
lay.addWidget(close, alignment=Qt.AlignmentFlag.AlignRight)
dlg.exec()
self._stream_view = view
self._stream_status = status
return dlg
+39 -6
@@ -20,6 +20,7 @@ from PySide6.QtWidgets import (
QMainWindow,
QMessageBox,
QPushButton,
QScrollArea,
QStackedWidget,
QSystemTrayIcon,
QTextEdit,
@@ -51,6 +52,10 @@ _NAV = [
("App", ["Settings", "Share"]),
]
_PAGES = [name for _section, names in _NAV for name in names]
# Pages that manage their own scrolling (pinned header + inner scroll) or must fill the
# viewport (the Share terminal) — these are added to the stack as-is; every other page is
# wrapped in a QScrollArea so it scrolls when too tall and doesn't pin the window's height.
_NO_WRAP = {"Dashboard", "System Health", "Inventory", "Share"}
_ICON = Path(__file__).parent / "assets" / "rigdoctor.svg"
@@ -68,7 +73,11 @@ class MainWindow(QMainWindow):
central = QWidget()
self.setCentralWidget(central)
layout = QHBoxLayout(central)
outer = QVBoxLayout(central)
outer.setContentsMargins(0, 0, 0, 0)
outer.setSpacing(0)
body = QWidget()
layout = QHBoxLayout(body)
layout.setContentsMargins(0, 0, 0, 0)
layout.setSpacing(0)
@@ -100,11 +109,14 @@ class MainWindow(QMainWindow):
"Share": self.share_page,
}
for name in _PAGES:
self._stack.addWidget(self._pages[name])
page = self._pages[name]
self._stack.addWidget(page if name in _NO_WRAP else self._scrollable(page))
content_layout.addWidget(self._stack)
layout.addWidget(self._build_sidebar())
layout.addWidget(content, 1)
outer.addWidget(body, 1)
outer.addWidget(self._build_footer())
self._worker = SamplerWorker(interval=interval)
self._worker.sampled.connect(self.dashboard.update_sample)
@@ -216,9 +228,6 @@ class MainWindow(QMainWindow):
v.addStretch(1)
live = QLabel(f'<span style="color:{ACCENT};">●</span> <span style="color:{MUTED};">Live</span>')
v.addWidget(live)
version = QLabel(f"v{__version__}")
version.setObjectName("Muted")
v.addWidget(version)
changelog_btn = QPushButton("Changelog")
changelog_btn.setObjectName("LinkButton")
changelog_btn.setCursor(Qt.CursorShape.PointingHandCursor)
@@ -248,6 +257,27 @@ class MainWindow(QMainWindow):
v.addWidget(self._restart_btn)
return bar
def _scrollable(self, page: QWidget) -> QScrollArea:
"""Wrap a page so it scrolls when taller than the window — and so the window can shrink
below the page's natural height instead of being pinned to it."""
area = QScrollArea()
area.setWidget(page)
area.setWidgetResizable(True)
area.setFrameShape(QFrame.Shape.NoFrame)
area.setHorizontalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAlwaysOff)
return area
def _build_footer(self) -> QFrame:
bar = QFrame()
bar.setObjectName("Footer")
h = QHBoxLayout(bar)
h.setContentsMargins(14, 5, 16, 5)
h.addStretch(1)
version = QLabel(f"RigDoctor v{__version__}")
version.setObjectName("Muted")
h.addWidget(version)
return bar
def _restart(self) -> None:
gui = os.path.join(os.path.dirname(sys.executable), "rigdoctor-gui")
if os.path.exists(gui):
@@ -259,6 +289,9 @@ class MainWindow(QMainWindow):
def _apply_update(self) -> None:
if not self._latest_tag:
return
if updates.install_kind() != "pip": # apt/source: can't pip-update — show the command
QMessageBox.information(self, "Update RigDoctor", updates.update_hint())
return
box = QMessageBox(self)
box.setWindowTitle(f"Update to {self._latest_tag}")
box.setText(f"Update RigDoctor to {self._latest_tag}?")
@@ -424,7 +457,7 @@ class MainWindow(QMainWindow):
self._update_label.setText("update check unavailable")
elif state == updates.AVAILABLE:
self._update_label.setText(f'<span style="color:{GOOD};">{tag} available</span>')
self._update_btn.setText(f"Update to {tag}")
self._update_btn.setText(f"Update to {tag}" if updates.install_kind() == "pip" else "How to update")
self._update_btn.setVisible(True)
if self._alert_monitor.enabled and tag != self._notified_update_tag:
self._notified_update_tag = tag # once per version, not every poll
+30 -3
@@ -27,7 +27,7 @@ from PySide6.QtWidgets import (
)
from .. import config
from ..core import alerts, installer, service, sysenv, uninstall, updates
from ..core import ai, alerts, installer, service, sysenv, uninstall, updates
from .theme import GOOD, MUTED, WARN
@@ -114,7 +114,8 @@ class SetupPage(QWidget):
grid.addWidget(QLabel("CPU temperature alert"), 1, 0)
grid.addWidget(self._cpu_alert, 1, 1)
alerts_layout.addLayout(grid)
alerts_note = QLabel("GPU-lost and new-version alerts are included whenever notifications are enabled.")
alerts_note = QLabel("GPU-lost, critical kernel events (Xid, out-of-memory, disk I/O, PCIe), "
"and new-version alerts are included whenever notifications are enabled.")
alerts_note.setObjectName("Muted")
alerts_note.setWordWrap(True)
alerts_layout.addWidget(alerts_note)
@@ -188,7 +189,8 @@ class SetupPage(QWidget):
ai_layout.addLayout(prov_row)
self._ai_model = QLineEdit()
self._ai_model.setPlaceholderText("Model (e.g. llama3.1 for Ollama; blank = Claude default)")
self._ai_model.setPlaceholderText(
f"Model (e.g. {ai.OLLAMA_SUGGESTED_MODEL} for Ollama; blank = Claude default)")
ai_layout.addWidget(self._ai_model)
self._ai_endpoint = QLineEdit()
self._ai_endpoint.setPlaceholderText("Ollama server URL (default http://localhost:11434)")
@@ -214,6 +216,23 @@ class SetupPage(QWidget):
ai_layout.addWidget(self._ai_status)
root.addWidget(ai_card)
# Logging (M15): opt-in app logging + per-diagnostic storage (enables the Report bundle).
log_card, log_layout = _panel("Logging")
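log_desc = QLabel(
# The <b> tag makes QLabel treat this as rich text, where "\n" collapses to a space;
# use <br> so the two paths render on their own lines.
"Save application logs and store each diagnostic in its own folder so you can review "
"or <b>Report</b> it. Off by default; everything stays on your machine.<br>"
f"• Diagnostics: {config.DIAGNOSTICS_DIR}<br>"
f"• Reports: {config.REPORTS_DIR}"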
)
log_desc.setObjectName("Muted")
log_desc.setWordWrap(True)
log_layout.addWidget(log_desc)
self._logging = QCheckBox("Enable logging (application + diagnostics)")
self._logging.setChecked(config.load_config().get("logging_enabled", False))
self._logging.toggled.connect(self._toggle_logging)
log_layout.addWidget(self._logging)
root.addWidget(log_card)
# Account access (M13/M12): one Gitea token gates updates and session sharing.
upd_card, upd_layout = _panel("Account access")
hint = QLabel("A Gitea access token unlocks updates and session sharing. "
@@ -286,6 +305,8 @@ class SetupPage(QWidget):
self._ai_endpoint.setVisible(prov == "ollama")
self._ai_key.setVisible(prov == "claude")
self._ai_test_btn.setEnabled(prov != "")
if prov == "ollama" and not self._ai_model.text().strip():
self._ai_model.setText(ai.OLLAMA_SUGGESTED_MODEL) # suggested default; user can change
def _save_ai(self) -> None:
prov = self._ai_provider()
@@ -317,6 +338,12 @@ class SetupPage(QWidget):
self._ai_test_btn.setEnabled(True)
self._ai_status.setText(("" if ok else "") + (msg[:200] if msg else ""))
def _toggle_logging(self, on: bool) -> None:
from ..core import applog
config.update_config(logging_enabled=on)
applog.setup(force=True) # attach/detach the file handler immediately
def _run_wizard(self) -> None:
from .setup_wizard import SetupWizard
+2
@@ -68,6 +68,8 @@ QMainWindow, #ContentArea, #Page {{ background: {BG}; }}
QLabel {{ background: transparent; }}
#Sidebar {{ background: {SIDEBAR}; border-right: 1px solid {CARD_BORDER}; }}
#Footer {{ background: {SIDEBAR}; border-top: 1px solid {CARD_BORDER}; }}
#Footer QLabel {{ font-size: 11px; }}
#AppTitle {{ font-size: 17px; font-weight: 800; }}
#AppSubtitle {{ color: {MUTED}; font-size: 11px; }}
+63
@@ -62,6 +62,23 @@ class PromptTests(unittest.TestCase):
        text = ai.format_findings([F()])
        self.assertIn("[WARN] GPU: Hot — 92C", text)

    def test_appid_glossary_resolves_known_ids(self):
        from rigdoctor.core import steam
        with mock.patch.object(steam, "appid_names", return_value={"2694490": "Path of Exile 2"}):
            glossary = ai.appid_glossary("Steam log: removed AppID 2694490 ... pid 130544")
        self.assertIn("2694490 = Path of Exile 2", glossary)

    def test_appid_glossary_ignores_unknown_ids(self):
        from rigdoctor.core import steam
        with mock.patch.object(steam, "appid_names", return_value={"570": "Dota 2"}):
            self.assertEqual(ai.appid_glossary("pid 130544 used 8192 MiB"), "")  # not in library

    def test_build_prompt_includes_glossary(self):
        from rigdoctor.core import steam
        with mock.patch.object(steam, "appid_names", return_value={"2694490": "Path of Exile 2"}):
            prompt = ai.build_prompt("AppID 2694490 launched")
        self.assertIn("Path of Exile 2", prompt)


class ExplainTests(unittest.TestCase):
    def _cfg(self, **over):
@@ -97,5 +114,51 @@ class ExplainTests(unittest.TestCase):
        self.assertEqual(headers["x-api-key"], "sk-ant-x")


class _FakeResp:
    """A context-managed iterable of byte lines, like urlopen() returns."""

    def __init__(self, lines):
        self._lines = [l.encode("utf-8") for l in lines]

    def __enter__(self):
        return iter(self._lines)

    def __exit__(self, *a):
        return False


class StreamTests(unittest.TestCase):
    def _cfg(self, **over):
        base = {"ai_provider": "", "ai_model": "", "ai_endpoint": "http://localhost:11434"}
        base.update(over)
        return base

    def test_ollama_stream_accumulates_and_callbacks(self):
        lines = ['{"response": "It is ", "done": false}',
                 '{"response": "the PSU.", "done": false}',
                 '{"response": "", "done": true}']
        chunks = []
        with mock.patch.object(ai.config, "load_config",
                               return_value=self._cfg(ai_provider="ollama", ai_model="qwen2.5:7b")), \
             mock.patch.object(ai, "_stream_request", return_value=_FakeResp(lines)):
            ok, full = ai.explain_stream("Xid 79", on_chunk=chunks.append)
        self.assertTrue(ok)
        self.assertEqual(full, "It is the PSU.")
        self.assertEqual(chunks, ["It is ", "the PSU."])

    def test_claude_stream_parses_sse(self):
        lines = [
            'event: content_block_delta',
            'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Failing "}}',
            'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"disk."}}',
            'data: {"type":"message_stop"}',
        ]
        chunks = []
        with mock.patch.object(ai.config, "load_config", return_value=self._cfg(ai_provider="claude")), \
             mock.patch.object(ai.config, "load_ai_key", return_value="sk-ant-x"), \
             mock.patch.object(ai, "_stream_request", return_value=_FakeResp(lines)):
            ok, full = ai.explain_stream("SMART 197", on_chunk=chunks.append)
        self.assertTrue(ok)
        self.assertEqual(full, "Failing disk.")
        self.assertEqual(chunks, ["Failing ", "disk."])


if __name__ == "__main__":
    unittest.main()
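Reviewer note: the two stream tests above feed newline-delimited JSON (Ollama) and server-sent events (Claude) through the same entry point. The sketch below shows one way those two wire shapes could be turned into text fragments. `iter_chunks` is a hypothetical name used only for illustration; it is not taken from `rigdoctor.core.ai`, and the real `explain_stream` may be structured differently.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(provider: str, lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a streamed reply (hypothetical helper)."""
    for raw in lines:
        line = raw.decode("utf-8", errors="replace").strip()
        if not line:
            continue
        if provider == "ollama":
            # One JSON object per line; "done": true marks the end of the reply.
            obj = json.loads(line)
            if obj.get("response"):
                yield obj["response"]
            if obj.get("done"):
                return
        else:
            # SSE: only "data:" lines carry JSON; "event:" lines are skipped.
            if not line.startswith("data:"):
                continue
            obj = json.loads(line[len("data:"):].strip())
            if obj.get("type") == "message_stop":
                return
            if obj.get("type") == "content_block_delta":
                text = obj.get("delta", {}).get("text", "")
                if text:
                    yield text
```

Joining the yielded fragments gives the full reply, and passing each fragment to a callback as it arrives gives the `on_chunk` behaviour the tests assert on.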
+30
@@ -34,5 +34,35 @@ class AlertTests(unittest.TestCase):
        m.assert_called_once()


class KernelEventAlertTests(unittest.TestCase):
    @mock.patch.object(alerts, "notify")
    def test_kernel_event_fires_once_within_cooldown(self, m):
        mon = alerts.AlertMonitor(cooldown=300.0, event_interval=0.0)
        mon._last_kernel_scan = 0.0  # force a scan
        with mock.patch("rigdoctor.core.syslogs.kernel_log",
                        return_value="NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus"):
            mon._scan_kernel_events()
            mon._last_kernel_scan = 0.0  # force another scan — cooldown must suppress it
            mon._scan_kernel_events()
        self.assertEqual(m.call_count, 1)
        self.assertIn("Xid", m.call_args[0][0])

    @mock.patch.object(alerts, "notify")
    def test_no_alert_when_kernel_log_empty(self, m):
        mon = alerts.AlertMonitor(event_interval=0.0)
        mon._last_kernel_scan = 0.0
        with mock.patch("rigdoctor.core.syslogs.kernel_log", return_value=""):
            mon._scan_kernel_events()
        m.assert_not_called()

    @mock.patch.object(alerts, "notify")
    def test_scan_gated_by_interval(self, m):
        mon = alerts.AlertMonitor(event_interval=9999.0)  # just constructed → not due yet
        with mock.patch("rigdoctor.core.syslogs.kernel_log", return_value="NVRM: Xid 79") as kl:
            mon._scan_kernel_events()
        kl.assert_not_called()
        m.assert_not_called()


if __name__ == "__main__":
    unittest.main()
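Reviewer note: the first test above relies on a per-event cooldown, so a repeated Xid line inside the window notifies only once. A minimal sketch of that idea follows. The `Cooldown` class and its injectable `clock` are assumptions made for illustration; `AlertMonitor` may track this state differently.

```python
import time


class Cooldown:
    """Suppress repeats of the same key for `seconds` (hypothetical helper)."""

    def __init__(self, seconds: float, clock=time.monotonic):
        self._seconds = seconds
        self._clock = clock  # injectable so tests need not sleep
        self._last: dict[str, float] = {}

    def ready(self, key: str) -> bool:
        """Return True and start the window if `key` may fire now."""
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < self._seconds:
            return False  # still inside the window: suppress
        self._last[key] = now
        return True
```

Keying the window on the event label keeps an ongoing Xid storm from hiding an unrelated out-of-memory kill.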
+104
@@ -0,0 +1,104 @@
"""Tests for M15 per-diagnostic storage + Report bundles + app logging."""
import json
import tempfile
import unittest
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from unittest import mock

from rigdoctor.core import applog, diagstore


@dataclass
class FakeSummary:
    start: float = 1.0
    end: float = 2.0
    samples: int = 3
    events: list = field(default_factory=list)


@dataclass
class FakeFinding:
    severity: str = "ok"
    category: str = "GPU"
    title: str = "Looks fine"
    detail: str = "no issues"


@dataclass
class FakeResult:
    game: str = "Path of Exile 2"
    summary: FakeSummary = field(default_factory=FakeSummary)
    findings: list = field(default_factory=lambda: [FakeFinding()])
    dir: str | None = None


class StoreTests(unittest.TestCase):
    def setUp(self):
        self.tmp = Path(tempfile.mkdtemp())

    def test_disabled_returns_none(self):
        with mock.patch.object(diagstore, "enabled", return_value=False):
            self.assertIsNone(diagstore.store(FakeResult()))

    def test_store_writes_artifacts(self):
        with mock.patch.object(diagstore, "enabled", return_value=True), \
             mock.patch("rigdoctor.render.render_summary", return_value="SUMMARY-TEXT"), \
             mock.patch("rigdoctor.core.gamelogs.collect", return_value="LOG-TEXT"), \
             mock.patch("rigdoctor.core.syslogs.collect", return_value="SYS-LOG"), \
             mock.patch("rigdoctor.core.inventory.collect", return_value=[]), \
             mock.patch.object(diagstore.config, "DIAGNOSTICS_DIR", self.tmp / "diagnostics"):
            directory = diagstore.store(FakeResult())
        self.assertTrue((directory / "result.json").exists())
        self.assertTrue((directory / "report.txt").exists())
        self.assertEqual((directory / "gamelogs.txt").read_text(), "LOG-TEXT")
        self.assertEqual((directory / "syslogs.txt").read_text(), "SYS-LOG")
        self.assertTrue((directory / "inventory.txt").exists())  # inventory included for debugging
        data = json.loads((directory / "result.json").read_text())
        self.assertEqual(data["game"], "Path of Exile 2")
        self.assertEqual(len(data["findings"]), 1)

    def test_record_ai_then_report_includes_ai_and_applog(self):
        diag = self.tmp / "20260522-poe2"
        diag.mkdir()
        diagstore.record_ai(diag, provider="claude", model="claude-opus-4-7",
                            system="SYS", prompt="EXACT DATA SENT", response="THE REPLY")
        ai_files = list((diag / "ai").glob("explain-*.json"))
        self.assertTrue(ai_files)
        record = json.loads(ai_files[0].read_text())
        self.assertEqual(record["model"], "claude-opus-4-7")
        self.assertEqual(record["data_sent_to_model"], "EXACT DATA SENT")
        self.assertEqual(record["model_reply"], "THE REPLY")
        app_log = self.tmp / "app.log"
        app_log.write_text("app log line")
        with mock.patch.object(diagstore.config, "REPORTS_DIR", self.tmp / "reports"), \
             mock.patch.object(diagstore.config, "APP_LOG", app_log):
            out = diagstore.make_report(diag)
        self.assertTrue(out.exists())
        with zipfile.ZipFile(out) as zf:
            names = zf.namelist()
        self.assertTrue(any(n.endswith("app.log") for n in names))
        self.assertTrue(any("/ai/explain-" in n for n in names))


class AppLogTests(unittest.TestCase):
    def test_disabled_is_noop(self):
        with mock.patch.object(applog.config, "load_config", return_value={"logging_enabled": False}):
            self.assertFalse(applog.setup(force=True))

    def test_enabled_writes_file(self):
        tmp = Path(tempfile.mkdtemp())
        with mock.patch.object(applog.config, "load_config", return_value={"logging_enabled": True}), \
             mock.patch.object(applog.config, "STATE_DIR", tmp), \
             mock.patch.object(applog.config, "APP_LOG", tmp / "app.log"):
            self.assertTrue(applog.setup(force=True))
            applog.get_logger("test").info("hello world")
            applog.setup(force=True)  # cleanup path: re-run detaches/reattaches cleanly
        self.assertTrue((tmp / "app.log").exists())


if __name__ == "__main__":
    unittest.main()
+67
@@ -0,0 +1,67 @@
"""Tests for display detection (Mutter D-Bus JSON + xrandr parsers)."""
import unittest

from rigdoctor.core import displays

# Minimal Mutter GetCurrentState (busctl --json) shape: current mode is 60 Hz, panel max 165 Hz.
_MUTTER_60 = (
    '{"type":"x","data":[1,[[["DP-1","SAM","LC34G55T","S"],['
    '["3440x1440@60",3440,1440,60.0,1.0,[1.0],{"is-current":{"type":"b","data":true}}],'
    '["3440x1440@165",3440,1440,165.0,1.0,[1.0],{"is-preferred":{"type":"b","data":true}}]'
    '],{}]],[],{}]}'
)
_MUTTER_MAX = (
    '{"type":"x","data":[1,[[["DP-1","SAM","LC34G55T","S"],['
    '["3440x1440@165",3440,1440,165.0,1.0,[1.0],{"is-current":{"type":"b","data":true}}],'
    '["3440x1440@60",3440,1440,60.0,1.0,[1.0],{}]'
    '],{}]],[],{}]}'
)
_XRANDR_60 = """Screen 0: minimum 8 x 8, current 3440 x 1440, maximum 16384 x 16384
DP-1 connected primary 3440x1440+0+0 (normal left inverted right x axis y axis) 800mm x 335mm
3440x1440 60.00*+ 165.00 100.00
2560x1440 165.00 60.00
HDMI-1 disconnected (normal left inverted right x axis y axis)
"""


class MutterParseTests(unittest.TestCase):
    def test_parses_and_flags_higher_refresh(self):
        mons = displays._parse_mutter(_MUTTER_60)
        self.assertEqual(len(mons), 1)
        m = mons[0]
        self.assertEqual(m.connector, "DP-1")
        self.assertEqual(m.name, "Samsung LC34G55T")  # PNP code SAM mapped
        self.assertEqual((m.width, m.height), (3440, 1440))
        self.assertEqual(round(m.refresh), 60)
        self.assertEqual(round(m.max_refresh), 165)
        self.assertTrue(m.can_go_faster)

    def test_at_max_is_not_flagged(self):
        m = displays._parse_mutter(_MUTTER_MAX)[0]
        self.assertEqual(round(m.refresh), 165)
        self.assertFalse(m.can_go_faster)

    def test_garbage_returns_empty(self):
        self.assertEqual(displays._parse_mutter("not json"), [])
        self.assertEqual(displays._parse_mutter("{}"), [])


class XrandrParseTests(unittest.TestCase):
    def test_current_and_max_refresh(self):
        mons = displays._parse_xrandr(_XRANDR_60)
        self.assertEqual(len(mons), 1)  # disconnected output ignored
        m = mons[0]
        self.assertEqual(m.connector, "DP-1")
        self.assertEqual((m.width, m.height), (3440, 1440))
        self.assertEqual(round(m.refresh), 60)
        self.assertEqual(round(m.max_refresh), 165)
        self.assertTrue(m.can_go_faster)

    def test_empty_returns_empty(self):
        self.assertEqual(displays._parse_xrandr(""), [])


if __name__ == "__main__":
    unittest.main()
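Reviewer note: for readers without the `core/displays.py` hunk in front of them, here is a rough sketch of an xrandr parser that would satisfy `XrandrParseTests`. It assumes plain `xrandr` output (not `--verbose`), takes the `*` marker as the current mode, and takes the highest rate on that same mode line as the maximum. The `Monitor` field order is copied from the health tests; the 1 Hz tolerance in `can_go_faster` and the empty `name` are assumptions, not the module's confirmed behaviour.

```python
import re
from dataclasses import dataclass


@dataclass
class Monitor:
    connector: str
    name: str
    width: int
    height: int
    refresh: float
    max_refresh: float

    @property
    def can_go_faster(self) -> bool:
        # 1 Hz of slack so 59.94 vs 60.00 does not count as a slower mode.
        return self.max_refresh - self.refresh > 1.0


_CONNECTED = re.compile(r"^(\S+) connected\b")
_MODE = re.compile(r"^\s*(\d+)x(\d+)\s+(.+)$")


def parse_xrandr(text: str) -> list[Monitor]:
    """Return one Monitor per connected output that has a current mode."""
    monitors: list[Monitor] = []
    connector = None
    for line in text.splitlines():
        head = _CONNECTED.match(line)
        if head:
            connector = head.group(1)
            continue
        mode = _MODE.match(line)
        if mode is None:
            connector = None  # "disconnected" outputs and the Screen header
            continue
        if connector is None or "*" not in mode.group(3):
            continue
        rates, current = [], None
        for token in mode.group(3).split():
            try:
                rate = float(token.rstrip("*+"))
            except ValueError:
                continue  # stray "+" or other non-numeric token
            rates.append(rate)
            if "*" in token:
                current = rate
        if current is not None:
            monitors.append(Monitor(connector, "", int(mode.group(1)),
                                    int(mode.group(2)), current, max(rates)))
    return monitors
```

Comparing only rates at the current resolution avoids flagging a monitor just because a lower resolution offers a higher refresh rate.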
+77
@@ -0,0 +1,77 @@
"""Tests for M14 game/Proton/Steam log collection."""
import os
import tempfile
import time
import unittest
from pathlib import Path
from unittest import mock

from rigdoctor.core import gamelogs


class TailTests(unittest.TestCase):
    def test_tail_returns_last_bytes(self):
        path = Path(tempfile.mkdtemp()) / "x.log"
        path.write_text("A" * 100 + "TAIL")
        out = gamelogs._tail(path, 4)
        self.assertEqual(out, "TAIL")

    def test_tail_short_file(self):
        path = Path(tempfile.mkdtemp()) / "x.log"
        path.write_text("short")
        self.assertEqual(gamelogs._tail(path, 9999), "short")

    def test_tail_missing(self):
        self.assertEqual(gamelogs._tail(Path("/nope/x.log"), 10), "")


class CollectTests(unittest.TestCase):
    def test_collect_includes_proton_and_steam(self):
        tmp = Path(tempfile.mkdtemp())
        proton = tmp / "steam-570.log"
        proton.write_text("err: vkd3d device lost")
        console = tmp / "console-linux.txt"
        console.write_text("Game removed AppID 570 ... exit")
        with mock.patch.object(gamelogs, "_proton_logs", return_value=[proton]), \
             mock.patch.object(gamelogs, "_steam_console", return_value=console):
            out = gamelogs.collect()
        self.assertIn("Proton log", out)
        self.assertIn("vkd3d", out)
        self.assertIn("Steam log", out)
        self.assertIn("exit", out)

    def test_collect_empty_when_none(self):
        with mock.patch.object(gamelogs, "_proton_logs", return_value=[]), \
             mock.patch.object(gamelogs, "_steam_console", return_value=None):
            self.assertEqual(gamelogs.collect(), "")


class SinceScopingTests(unittest.TestCase):
    def test_since_filter_keeps_window_only(self):
        text = (
            "[2026-05-22 13:00:00] old session line\n"
            "[2026-05-22 13:00:01] another old line\n"
            "[2026-05-22 14:30:00] new session launch\n"
            "[2026-05-22 14:30:05] new session error\n"
        )
        since = time.mktime(time.strptime("2026-05-22 14:00:00", "%Y-%m-%d %H:%M:%S"))
        out = gamelogs._since_filter(text, since)
        self.assertIn("new session launch", out)
        self.assertIn("new session error", out)
        self.assertNotIn("old session", out)

    def test_collect_skips_stale_proton_log(self):
        tmp = Path(tempfile.mkdtemp())
        proton = tmp / "steam-9999.log"
        proton.write_text("stale proton output from an earlier game")
        old_mtime = time.time() - 3600
        os.utime(proton, (old_mtime, old_mtime))
        since = time.time() - 60  # session started a minute ago
        with mock.patch.object(gamelogs, "_proton_logs", return_value=[proton]), \
             mock.patch.object(gamelogs, "_steam_console", return_value=None):
            self.assertEqual(gamelogs.collect(since=since), "")  # stale log excluded


if __name__ == "__main__":
    unittest.main()
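Reviewer note: the tail and since-scoping tests pin down two small behaviours: read at most the last N bytes of a log, and keep only lines stamped inside the session window. One possible shape for both is sketched below. The names `tail` and `since_filter` stand in for the private helpers, and the `[YYYY-MM-DD HH:MM:SS]` stamp format is inferred from the test fixture only.

```python
import os
import re
import time
from pathlib import Path

# Stamp format assumed from the test fixture: "[2026-05-22 14:30:00] ..."
_STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def tail(path: Path, max_bytes: int) -> str:
    """Return at most the last max_bytes of a file, or "" if unreadable."""
    try:
        with path.open("rb") as fh:
            fh.seek(0, os.SEEK_END)  # jump to the end to learn the size
            size = fh.tell()
            fh.seek(max(0, size - max_bytes))
            return fh.read().decode("utf-8", errors="replace")
    except OSError:
        return ""


def since_filter(text: str, since: float) -> str:
    """Keep lines stamped at or after `since` (epoch seconds, local time)."""
    kept, keep = [], False
    for line in text.splitlines():
        m = _STAMP.match(line)
        if m:
            stamp = time.mktime(time.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
            keep = stamp >= since
        # Unstamped lines follow the decision made for the last stamped line.
        if keep:
            kept.append(line)
    return "\n".join(kept)
```

Seeking from the end keeps memory use bounded on large Proton logs, and carrying the keep flag forward preserves multi-line entries such as stack traces.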
+78 -1
@@ -1,8 +1,20 @@
"""Tests for the M4 health report's log scanner (synthetic input)."""
import unittest
from pathlib import Path
from unittest import mock

from rigdoctor.core.health import CRITICAL, WARNING, run_health_checks, scan_journal_text
from rigdoctor.core import displays, health
from rigdoctor.core.health import (
    CRITICAL,
    INFO,
    WARNING,
    check_displays,
    check_memory_speed,
    check_pcie_links,
    run_health_checks,
    scan_journal_text,
)


class HealthScanTests(unittest.TestCase):
@@ -42,5 +54,70 @@ class HealthScanTests(unittest.TestCase):
        self.assertEqual(ranks, sorted(ranks))


class PcieLinkCheckTests(unittest.TestCase):
    def _with_link(self, cur_g, cur_w, max_g, max_w):
        # one fake NVMe controller returning the given link tuple
        return (mock.patch("rigdoctor.core.inventory.nvme_controllers",
                           return_value=[("nvme0", Path("/x"))]),
                mock.patch("rigdoctor.core.inventory.read_link",
                           return_value=(cur_g, cur_w, max_g, max_w)))

    def test_reduced_width_is_a_warning_about_lane_sharing(self):
        ctrls, link = self._with_link(4, "2", 4, "4")  # Gen4 x2 but supports x4
        with ctrls, link:
            findings = check_pcie_links()
        self.assertEqual(len(findings), 1)
        self.assertEqual(findings[0].severity, WARNING)
        self.assertIn("lane-sharing", findings[0].detail)

    def test_reduced_speed_only_is_info(self):
        ctrls, link = self._with_link(3, "4", 4, "4")  # Gen3 x4 but supports Gen4
        with ctrls, link:
            findings = check_pcie_links()
        self.assertEqual(len(findings), 1)
        self.assertEqual(findings[0].severity, INFO)

    def test_full_speed_no_finding(self):
        ctrls, link = self._with_link(4, "4", 4, "4")
        with ctrls, link:
            self.assertEqual(check_pcie_links(), [])


class DisplayCheckTests(unittest.TestCase):
    def test_lower_than_max_refresh_is_flagged(self):
        mon = displays.Monitor("DP-1", "Samsung LC34G55T", 3440, 1440, 60.0, 165.0)
        with mock.patch("rigdoctor.core.displays.collect", return_value=[mon]):
            findings = check_displays()
        self.assertEqual(len(findings), 1)
        self.assertEqual(findings[0].severity, INFO)
        self.assertIn("165", findings[0].title)

    def test_at_max_refresh_no_finding(self):
        mon = displays.Monitor("DP-1", "Samsung LC34G55T", 3440, 1440, 165.0, 165.0)
        with mock.patch("rigdoctor.core.displays.collect", return_value=[mon]):
            self.assertEqual(check_displays(), [])


class MemorySpeedCheckTests(unittest.TestCase):
    def _dmi(self, configured, part):
        return {"memory": [{"Configured Memory Speed": configured, "Speed": configured,
                            "Part Number": part}]}

    def test_flags_unapplied_expo(self):
        dmi = self._dmi("4800 MT/s", "CMK32GX5M2B5600Z36")
        with mock.patch("rigdoctor.core.elevation.privileged", return_value=None), \
             mock.patch("rigdoctor.core.inventory._dmidecode", return_value=dmi):
            findings = check_memory_speed()
        self.assertEqual(len(findings), 1)
        self.assertEqual(findings[0].severity, INFO)
        self.assertIn("5600", findings[0].title)

    def test_no_flag_at_rated(self):
        dmi = self._dmi("5600 MT/s", "CMK32GX5M2B5600Z36")
        with mock.patch("rigdoctor.core.elevation.privileged", return_value=None), \
             mock.patch("rigdoctor.core.inventory._dmidecode", return_value=dmi):
            self.assertEqual(check_memory_speed(), [])


if __name__ == "__main__":
    unittest.main()
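Reviewer note: `PcieLinkCheckTests` encodes a severity rule: a narrower-than-supported link is a warning, a slower-but-full-width link is informational, and a full link produces nothing. The decision can be written as a small pure function, sketched below. `classify_link` and the string values of the severity constants are assumptions for illustration; the tuple layout (integer generations, string widths) follows the test fixture.

```python
from typing import Optional

# Placeholder severity values; the real constants live in rigdoctor.core.health.
WARNING = "warning"
INFO = "info"


def classify_link(cur_gen: int, cur_width: str,
                  max_gen: int, max_width: str) -> Optional[str]:
    """Map a (current, maximum) PCIe link tuple to a severity, or None."""
    if int(cur_width) < int(max_width):
        return WARNING  # fewer lanes than the device supports
    if cur_gen < max_gen:
        return INFO  # full width, but negotiated at a lower generation
    return None
```

Checking width before speed means a link that is both narrower and slower is reported once, at the higher severity.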
+46
@@ -1,6 +1,8 @@
"""Tests for the M5 system inventory (render + dict round-trip; collect on real system)."""
import tempfile
import unittest
from pathlib import Path

from rigdoctor.core import inventory
from rigdoctor.core.inventory import Section
@@ -26,5 +28,49 @@ class InventoryTests(unittest.TestCase):
        self.assertIn("- **Model:** Test CPU", md)


class PcieLinkTests(unittest.TestCase):
    def test_gen_mapping(self):
        self.assertEqual(inventory._gen("16.0 GT/s PCIe"), 4)
        self.assertEqual(inventory._gen("8.0 GT/s PCIe"), 3)
        self.assertIsNone(inventory._gen(""))

    def _fake_dev(self, cur_s, cur_w, max_s, max_w) -> Path:
        d = Path(tempfile.mkdtemp())
        (d / "current_link_speed").write_text(cur_s)
        (d / "current_link_width").write_text(cur_w)
        (d / "max_link_speed").write_text(max_s)
        (d / "max_link_width").write_text(max_w)
        return d

    def test_link_at_full_speed(self):
        dev = self._fake_dev("16.0 GT/s PCIe", "4", "16.0 GT/s PCIe", "4")
        self.assertEqual(inventory._link_desc(dev), "PCIe Gen4 x4")

    def test_link_downtrained_flags_capability(self):
        dev = self._fake_dev("8.0 GT/s PCIe", "4", "16.0 GT/s PCIe", "4")
        self.assertEqual(inventory._link_desc(dev), "PCIe Gen3 x4 (capable of Gen4 x4)")

    def test_non_nvme_has_no_link(self):
        self.assertEqual(inventory._nvme_link("sda"), "")


class MemorySpeedTests(unittest.TestCase):
    def test_rated_speed_from_part_number(self):
        self.assertEqual(inventory._rated_from_part("CMK32GX5M2B5600Z36"), 5600)
        self.assertEqual(inventory._rated_from_part("F5-6000J3038F16G"), 6000)
        self.assertIsNone(inventory._rated_from_part("NoSpeedHere"))

    def test_detects_unapplied_expo(self):
        # XMP/EXPO off: dmidecode only sees JEDEC 4800; the 5600 is in the part number.
        m = {"Configured Memory Speed": "4800 MT/s", "Speed": "4800 MT/s",
             "Part Number": "CMK32GX5M2B5600Z36"}
        self.assertEqual(inventory.module_speed(m), (4800, 5600))

    def test_at_rated_speed(self):
        m = {"Configured Memory Speed": "5600 MT/s", "Speed": "5600 MT/s",
             "Part Number": "CMK32GX5M2B5600Z36"}
        self.assertEqual(inventory.module_speed(m), (5600, 5600))


if __name__ == "__main__":
    unittest.main()
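Reviewer note: the commit message says the rated speed is recovered from the part number and "matched against known DDR5 speed grades to avoid false positives". A sketch of that matching, together with the GT/s to PCIe generation mapping that `test_gen_mapping` checks, is below. The `_DDR5_GRADES` allow-list is an assumed set chosen for the example rather than the table `inventory.py` ships, and `gen` and `rated_from_part` are stand-in names for the private helpers.

```python
import re
from typing import Optional

# PCIe per-lane transfer rate in GT/s mapped to the generation number.
_GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}

# Assumed allow-list of DDR5 speed grades in MT/s (illustrative only).
_DDR5_GRADES = {4800, 5200, 5600, 6000, 6200, 6400,
                6600, 6800, 7000, 7200, 7600, 8000}


def gen(speed: str) -> Optional[int]:
    """Parse a sysfs link speed such as "16.0 GT/s PCIe" into a generation."""
    m = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed)
    return _GEN_BY_RATE.get(float(m.group(1))) if m else None


def rated_from_part(part: str) -> Optional[int]:
    """Return the first exactly-four-digit run that is a known speed grade."""
    for run in re.findall(r"(?<!\d)\d{4}(?!\d)", part):
        if int(run) in _DDR5_GRADES:
            return int(run)
    return None
```

The allow-list is what keeps `3038` in `F5-6000J3038F16G` (a timing code) from being read as a speed.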
+114
@@ -0,0 +1,114 @@
"""Tests for M15 session-scoped system-log collection (kernel + coredumps)."""
import unittest
from unittest import mock

from rigdoctor.core import syslogs


class KernelLogTests(unittest.TestCase):
    def test_passes_since_and_tails(self):
        with mock.patch("shutil.which", return_value="/usr/bin/journalctl"), \
             mock.patch.object(syslogs, "_run", return_value="X" * 100 + "TAILLINE") as run:
            out = syslogs.kernel_log(since=1_000_000_000, max_bytes=8)
        self.assertEqual(out, "TAILLINE")
        cmd = run.call_args[0][0]
        self.assertIn("-k", cmd)
        self.assertIn("--since", cmd)

    def test_missing_tool_returns_empty(self):
        with mock.patch("shutil.which", return_value=None):
            self.assertEqual(syslogs.kernel_log(), "")


class CoredumpTests(unittest.TestCase):
    def test_empty_when_no_coredumps(self):
        with mock.patch("shutil.which", return_value="/usr/bin/coredumpctl"), \
             mock.patch.object(syslogs, "_run", return_value="No coredumps found."):
            self.assertEqual(syslogs.coredumps(), "")

    def test_returns_list(self):
        with mock.patch("shutil.which", return_value="/usr/bin/coredumpctl"), \
             mock.patch.object(syslogs, "_run", return_value="TIME PID SIG EXE\n... SEGV PathOfExile"):
            out = syslogs.coredumps()
        self.assertIn("PathOfExile", out)


class NvidiaTests(unittest.TestCase):
    def test_missing_tool(self):
        with mock.patch("shutil.which", return_value=None):
            self.assertEqual(syslogs.nvidia_snapshot(), "")

    def test_snapshot_head_truncated(self):
        with mock.patch("shutil.which", return_value="/usr/bin/nvidia-smi"), \
             mock.patch.object(syslogs, "_run", return_value="DRIVER\n" + "x" * 99999):
            out = syslogs.nvidia_snapshot(max_bytes=10)
        self.assertEqual(out, "DRIVER\nxxx")  # head, not tail


class DisplayTests(unittest.TestCase):
    def test_session_type_env(self):
        with mock.patch.dict("os.environ", {"XDG_SESSION_TYPE": "wayland"}):
            self.assertEqual(syslogs._session_type(), "wayland")

    def test_x11_tails_xorg_log(self):
        import tempfile
        from pathlib import Path
        log = Path(tempfile.mkdtemp()) / "Xorg.0.log"
        log.write_text("(EE) NVIDIA(GPU-0): something failed")
        with mock.patch.object(syslogs, "_session_type", return_value="x11"), \
             mock.patch.object(syslogs, "_xorg_log", return_value=log):
            out = syslogs.display_log()
        self.assertIn("(EE) NVIDIA", out)

    def test_wayland_uses_user_journal(self):
        with mock.patch.object(syslogs, "_session_type", return_value="wayland"), \
             mock.patch("shutil.which", return_value="/usr/bin/journalctl"), \
             mock.patch.object(syslogs, "_run", return_value="gnome-shell: GPU error") as run:
            out = syslogs.display_log(since=1_000_000_000)
        self.assertIn("GPU error", out)
        cmd = run.call_args[0][0]
        self.assertIn("--user", cmd)
        self.assertTrue(any(a.startswith("_COMM=") for a in cmd))


class ScanCriticalTests(unittest.TestCase):
    def test_matches_each_category(self):
        text = "\n".join([
            "NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus",
            "Out of memory: Killed process 1234 (PathOfExile)",
            "mce: [Hardware Error]: CPU 0",
            "pcieport 0000:00:01.0: AER: Corrected error received",
            "blk_update_request: I/O error, dev sda, sector 99",
            "this is a perfectly normal line",
        ])
        labels = {label for label, _ in syslogs.scan_critical(text)}
        self.assertEqual(labels, {
            "GPU error (Xid)", "Out of memory", "CPU machine-check",
            "PCIe error", "Disk I/O error"})

    def test_clean_log_no_events(self):
        self.assertEqual(syslogs.scan_critical("usb 1-2: new high-speed device\nsystemd: started"), [])


class CollectTests(unittest.TestCase):
    def test_collect_combines_sections(self):
        with mock.patch.object(syslogs, "kernel_log", return_value="NVRM: Xid 79"), \
             mock.patch.object(syslogs, "coredumps", return_value="game SIGSEGV"), \
             mock.patch.object(syslogs, "nvidia_snapshot", return_value="Driver Version 595"), \
             mock.patch.object(syslogs, "display_log", return_value="(EE) NVIDIA"):
            out = syslogs.collect()
        for needle in ("Kernel log", "Xid 79", "Crashed processes", "SIGSEGV",
                       "NVIDIA snapshot", "595", "Display server log"):
            self.assertIn(needle, out)

    def test_collect_empty_when_nothing(self):
        with mock.patch.object(syslogs, "kernel_log", return_value=""), \
             mock.patch.object(syslogs, "coredumps", return_value=""), \
             mock.patch.object(syslogs, "nvidia_snapshot", return_value=""), \
             mock.patch.object(syslogs, "display_log", return_value=""):
            self.assertEqual(syslogs.collect(), "")


if __name__ == "__main__":
    unittest.main()
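Reviewer note: `ScanCriticalTests` expects `scan_critical` to return `(label, line)` pairs for five categories of kernel message. A table-driven sketch that passes the same inputs is shown below. The labels are copied from the test; the regular expressions are guesses that fit the fixture lines and are not taken from `syslogs.py`.

```python
import re

# (label, pattern) pairs; patterns are assumptions fitted to the test fixture.
_PATTERNS = [
    ("GPU error (Xid)", re.compile(r"NVRM: Xid")),
    ("Out of memory", re.compile(r"Out of memory", re.IGNORECASE)),
    ("CPU machine-check", re.compile(r"mce: \[Hardware Error\]")),
    ("PCIe error", re.compile(r"\bAER:")),
    ("Disk I/O error", re.compile(r"I/O error")),
]


def scan_critical(text: str) -> list[tuple[str, str]]:
    """Return (label, line) for every line matching a critical pattern."""
    events = []
    for line in text.splitlines():
        for label, pattern in _PATTERNS:
            if pattern.search(line):
                events.append((label, line.strip()))
                break  # first matching category wins for this line
    return events
```

Keeping the patterns in a list makes adding a category a one-line change.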
+64
@@ -0,0 +1,64 @@
"""Tests for the M13 updater: install detection + routing the update to the right method."""
import unittest
from unittest import mock

from rigdoctor.core import updates


class InstallKindTests(unittest.TestCase):
    def setUp(self):
        updates.install_kind.cache_clear()

    def tearDown(self):
        updates.install_kind.cache_clear()

    def test_apt_when_dpkg_owns_the_package(self):
        with mock.patch.object(updates, "_dpkg_owns", return_value=True):
            self.assertEqual(updates.install_kind(), "apt")

    def test_pip_when_running_in_a_venv(self):
        with mock.patch.object(updates, "_dpkg_owns", return_value=False), \
             mock.patch.object(updates.sys, "prefix", "/opt/venv"), \
             mock.patch.object(updates.sys, "base_prefix", "/usr"):
            self.assertEqual(updates.install_kind(), "pip")


class ApplyUpdateRoutingTests(unittest.TestCase):
    def test_apt_returns_guidance_and_never_runs_pip(self):
        with mock.patch.object(updates, "install_kind", return_value="apt"), \
             mock.patch("subprocess.run") as run:
            rc, out = updates.apply_update("v9.9.9")
        self.assertEqual(rc, 1)
        self.assertIn("apt install --only-upgrade", out)
        run.assert_not_called()

    def test_dev_returns_guidance_and_never_runs_pip(self):
        with mock.patch.object(updates, "install_kind", return_value="dev"), \
             mock.patch("subprocess.run") as run:
            rc, out = updates.apply_update("v9.9.9")
        self.assertIn("git pull", out)
        run.assert_not_called()

    def test_pip_install_runs_pip(self):
        proc = mock.Mock(returncode=0, stdout="Successfully installed", stderr="")
        with mock.patch.object(updates, "install_kind", return_value="pip"), \
             mock.patch.object(updates, "load_token", return_value="TOK"), \
             mock.patch("subprocess.run", return_value=proc) as run:
            rc, _out = updates.apply_update("v1.2.3")
        self.assertEqual(rc, 0)
        cmd = run.call_args[0][0]
        self.assertIn("pip", cmd)
        self.assertIn("install", cmd)


class UpdateHintTests(unittest.TestCase):
    def test_apt_hint_names_the_apt_command(self):
        self.assertIn("apt install --only-upgrade rigdoctor", updates.update_hint("apt"))

    def test_dev_hint_says_git_pull(self):
        self.assertIn("git pull", updates.update_hint("dev"))


if __name__ == "__main__":
    unittest.main()