Compare commits

32 Commits

Author SHA1 Message Date
jessey 51133e4042 Merge pull request 'feat(gui): scrollable pages + version footer — 0.37.0' (#37) from fix/scrollable-pages into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #37
2026-05-22 14:29:56 +00:00
jessey bcf6ac2656 feat(gui): scrollable pages + version footer — 0.37.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 31s
Wrap each page (except self-scrolling Dashboard/Health/Inventory and the Share
terminal) in a QScrollArea, so long pages scroll when too tall (Settings'
Uninstall is reachable again) and the window is no longer pinned to the tallest
page's height — min height drops from >screen to ~600px, so it can be resized
smaller. Add a bottom footer showing 'RigDoctor v<version>' bottom-right (moved
out of the sidebar); themed #Footer with a top border.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 16:29:14 +02:00
jessey d59261f021 Merge pull request 'docs: registry is public now — drop the token/auth.conf.d from apt setup' (#36) from docs/public-registry into main
release / test (push) Successful in 13s
release / release (push) Successful in 15s
Reviewed-on: #36
2026-05-22 13:58:13 +00:00
jessey 44923b771a docs: registry is public now — drop the token/auth.conf.d from apt setup
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
REQUIRE_SIGNIN_VIEW is off and the repo is public, so anonymous apt works. The
apt instructions no longer need a read:package token or auth.conf.d — just the
signing key + a deb822 Signed-By source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:57:40 +02:00
jessey eaaf14c58a Merge pull request 'fix(cli): correct the missing-PySide6 hint to the real apt packages — 0.36.1' (#35) from docs/apt-proper into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #35
2026-05-22 13:49:28 +00:00
jessey 7779131cf9 Merge branch 'main' into docs/apt-proper
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 27s
2026-05-22 13:48:36 +00:00
jessey 87fa678ccb fix(cli): correct the missing-PySide6 hint to the real apt packages — 0.36.1
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 26s
rigdoctor gui suggested 'apt install python3-pyside6' (no such package on
Debian/Ubuntu). Point to the split modules instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:48:20 +02:00
jessey c5e24b3984 Merge pull request 'docs: document the proper (GPG-verified, deb822) apt setup' (#34) from docs/apt-proper into main
release / test (push) Successful in 12s
release / release (push) Successful in 14s
Reviewed-on: #34
2026-05-22 13:46:10 +00:00
jessey 21cc6a4813 docs: document the proper (GPG-verified, deb822) apt setup
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 27s
Replace the trusted=yes apt instructions with the proper method: read:package
token, registry signing key dearmored into /etc/apt/keyrings, credentials in
auth.conf.d, and a modern deb822 .sources file with Signed-By + Architectures:
all. Keeps the trusted=yes one-liner as a noted fallback for unsigned registries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:44:41 +02:00
jessey ee73049248 Merge pull request 'fix(deb): auto-install all deps — correct PySide6 names + bundle tools — 0.36.0' (#33) from fix/deb-pyside6-deps into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #33
2026-05-22 13:39:01 +00:00
jessey 3a8ad5bd5d fix(deb): auto-install all deps — correct PySide6 names + bundle tools — 0.36.0
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 29s
The old Recommends named python3-pyside6 (no such package on Debian/Ubuntu —
PySide6 is split per module), so apt skipped it and the GUI couldn't start.
Now Recommends the real modules (python3-pyside6.qt{widgets,gui,websockets,svg}
+ python3-pyte) AND the optional diagnostic/gaming tools (smartmontools,
lm-sensors, dmidecode, pciutils, libnotify-bin, libsecret-tools, gamemode,
mangohud), so 'apt install rigdoctor' sets up the whole toolset automatically —
no manual installs. cpupower -> Suggests. Verified all candidates resolve in apt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:38:12 +02:00
jessey e8b84bf046 Merge pull request 'docs: rewrite README to be user-first (install + use)' (#32) from docs/readme-users into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #32
2026-05-22 13:32:41 +00:00
jessey 2342dd83aa docs: rewrite README to be user-first (install + use)
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 29s
Lead with what RigDoctor does, then install (.deb/apt incl. the private-registry
auth.conf.d + trusted=yes notes, and the .run), then usage (GUI/tray/CLI),
requirements, and privacy. Move the dev content (from-source, tests, docs links)
into a short Development section at the end. Drops the stale status/decisions/
repo-layout planning sections from the top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:31:36 +02:00
jessey a028fe6d38 Merge pull request 'ci: make apt registry upload idempotent (tolerate 409)' (#31) from fix/apt-409 into main
release / test (push) Successful in 12s
release / release (push) Successful in 16s
Reviewed-on: #31
2026-05-22 13:26:47 +00:00
jessey a6453335e9 ci: make apt registry upload idempotent (tolerate 409)
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 28s
Gitea's Debian registry is immutable, so re-uploading an existing version returns
409. With --fail that aborted the release on any re-run / repeat push at the same
version. Now we capture the HTTP code: 2xx = uploaded, 409 = already published
(skip), anything else = fail with the body. Also fixed the stale skip message
(REGISTRY_TOKEN, not PACKAGES_TOKEN).
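
The status handling described above can be sketched in Python as follows. This is only an illustration of the 2xx / 409 / other split; the workflow itself does this in shell, and the function name `classify_upload` is a hypothetical helper, not something from the repository.

```python
def classify_upload(http_code: str) -> str:
    """Map the HTTP status of a registry upload to an action.

    Hypothetical helper mirroring the commit's rule:
    2xx = uploaded, 409 = already published (skip), anything else = fail.
    """
    if http_code.startswith("2"):
        return "uploaded"
    if http_code == "409":
        return "skip"  # the registry is immutable, so the version already exists
    return "fail"      # includes "000", which curl reports when no response arrived
```

Treating 409 as a skip rather than an error is what makes a re-run at the same version idempotent.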

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:21:27 +02:00
jessey baec47dd4e Merge pull request 'assets: project avatar (gauge + heartbeat) for Gitea' (#30) from chore/avatar into main
release / test (push) Successful in 12s
release / release (push) Failing after 15s
Reviewed-on: #30
2026-05-22 13:18:59 +00:00
jessey 47ecb702e7 Merge branch 'main' into chore/avatar
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 28s
2026-05-22 13:17:28 +00:00
jessey 944945ce72 Merge pull request 'feat(m9): .deb package + CI build/publish — 0.35.0' (#29) from feat/deb-packaging into main
release / test (push) Successful in 13s
release / release (push) Successful in 19s
Reviewed-on: #29
2026-05-22 13:17:19 +00:00
jessey dc719f6a89 assets: project avatar (gauge + heartbeat) for Gitea
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 27s
512x512 PNG (assets/avatar.png) rendered from assets/avatar.svg, matching the app
icon's gauge-ring + heartbeat motif on a dark gradient. Upload as the repo avatar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:16:58 +02:00
jessey 78cd417d0b feat(m9): .deb package + CI build/publish — 0.35.0
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 28s
packaging/make_deb.py builds rigdoctor_<ver>_all.deb (Architecture: all) via
dpkg-deb, no debhelper: Depends python3; Recommends python3-pyside6/pyte (GUI by
default, --no-install-recommends = CLI only). Installs the package, both
launchers, desktop entry + icon; postinst refreshes the desktop database.
release.yml builds it as a release asset and optionally pushes to the Gitea apt
registry (REGISTRY_TOKEN). Verified locally: valid .deb, packaged launcher runs
'rigdoctor --version'. Docs/README/ROADMAP/MODULES updated; M9 complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:15:33 +02:00
jessey 856a3305ad Merge pull request 'feat(m8): event-based alerts — Xid/OOM/MCE/PCIe/disk from the kernel log — 0.34.0' (#28) from feat/event-alerts into main
release / test (push) Successful in 13s
release / release (push) Successful in 15s
Reviewed-on: #28
2026-05-22 12:48:41 +00:00
jessey 3b1a2e7393 Merge branch 'feat/event-alerts' of ssh://jesseyvanofferen.com:2222/jessey/rigdoctor into feat/event-alerts
tests / core (pull_request) Successful in 11s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-22 14:42:53 +02:00
jessey 2989e8e23e ci: run tests.yml on pull_request only (no push) to avoid double runs
A branch with an open PR triggered both the push and pull_request events, running
every job twice. Trigger on pull_request only; pushes to main are already tested
by release.yml's `test` job. No version bump (CI config only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:42:41 +02:00
jessey 670df23e06 Merge branch 'main' into feat/event-alerts
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 26s
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-22 12:41:34 +00:00
jessey 2ee7763d00 feat(m8): event-based alerts — Xid/OOM/MCE/PCIe/disk from the kernel log — 0.34.0
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 27s
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
AlertMonitor now scans the kernel log (journalctl -k) every ~30s and fires
one-shot, cooldown-gated desktop alerts on critical events: NVIDIA Xid, OOM
kills, CPU machine-checks, PCIe AER, and disk I/O errors — so users are warned
the moment something goes wrong, not only on a temperature threshold. Disk I/O
errors come from the kernel log (no root needed, unlike smartctl). Edge/spam
protection reuses the existing cooldown model. syslogs.scan_critical() does the
matching; init seeds last-scan to "now" so old boot logs don't alert on launch.
Tests for the matcher + monitor gating/cooldown; Settings note updated.
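
A minimal sketch of the two pieces named above, pattern matching on kernel-log lines and cooldown gating, might look like this. The regular expressions are illustrative assumptions based on common kernel message shapes, and the `Cooldown` class is hypothetical; the real `syslogs.scan_critical()` and `AlertMonitor` may differ.

```python
import re
import time

# Illustrative patterns only (assumptions, not the project's actual matcher).
PATTERNS = {
    "xid": re.compile(r"NVRM: Xid"),
    "oom": re.compile(r"Out of memory: Killed process"),
    "mce": re.compile(r"mce: \[Hardware Error\]"),
    "pcie_aer": re.compile(r"\bAER:"),
    "disk_io": re.compile(r"I/O error, dev \w+"),
}


def scan_critical(lines):
    """Return a (kind, line) pair for each line matching a critical pattern."""
    hits = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((kind, line))
                break  # one classification per line is enough
    return hits


class Cooldown:
    """Hypothetical gate: allow one alert per kind per `seconds` window."""

    def __init__(self, seconds, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self._last = {}

    def allow(self, kind):
        now = self.clock()
        last = self._last.get(kind)
        if last is not None and now - last < self.seconds:
            return False  # still cooling down, suppress the repeat
        self._last[kind] = now
        return True
```

Injecting the clock keeps the gating testable without sleeping.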

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:41:13 +02:00
jessey bd6cad5a42 Merge pull request 'feat(ai): stream explanations live (Ollama NDJSON + Claude SSE) — 0.33.0' (#27) from feat/syslogs into main
release / test (push) Successful in 12s
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 25s
release / release (push) Successful in 15s
Reviewed-on: #27
2026-05-22 12:35:11 +00:00
jessey 7fa9b63661 Merge branch 'main' into feat/syslogs
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 25s
tests / core (pull_request) Successful in 11s
tests / gui-smoke (pull_request) Successful in 28s
2026-05-22 12:28:59 +00:00
jessey c443a8b9f8 ci: add tests workflow + gate releases on tests passing
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 38s
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 27s
- .gitea/workflows/tests.yml: run `unittest discover` on push + pull_request.
  `core` job (stdlib install, GUI tests skip) is bulletproof; `gui-smoke` job
  installs the GUI extra + offscreen Qt libs and runs the suite headless.
- release.yml: add a `test` job and `release: needs: test` so a push to main
  can't publish if the tests fail.

No version bump — CI config only; nothing in the shipped app changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:26:47 +02:00
jessey bbc22fa288 feat(ai): stream explanations live (Ollama NDJSON + Claude SSE) — 0.33.0
ai.explain_stream(findings_text, on_chunk) streams token deltas and returns
(ok, full_text). Ollama: stream=True NDJSON; Claude: stream=True SSE (parse
content_block_delta text deltas). The diagnostic dialog opens an explanation
window immediately and fills it token-by-token via a _chunk signal, then
re-renders the finished answer as Markdown — no more multi-second freeze on a
local model. Non-streaming explain() kept for the CLI. Tests for both parsers;
verified live against qwen2.5:7b.
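
The two stream formats mentioned above can be parsed roughly as follows. This is a sketch under stated assumptions, not the project's `ai.explain_stream`: it assumes Ollama's generate-style NDJSON objects with a `response` field and a final `done` flag, and Claude SSE `data:` lines carrying `content_block_delta` events with `text_delta` payloads. The function names are hypothetical, and the parsers take any iterable of decoded lines so no network is involved.

```python
import json


def ollama_deltas(lines):
    """Yield text deltas from an NDJSON stream (one JSON object per line)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        obj = json.loads(raw)
        if obj.get("response"):
            yield obj["response"]
        if obj.get("done"):
            break


def claude_deltas(lines):
    """Yield text deltas from an SSE stream, ignoring non-text events."""
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        obj = json.loads(raw[len("data:"):].strip())
        if obj.get("type") != "content_block_delta":
            continue
        delta = obj.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


def collect_stream(deltas, on_chunk):
    """Forward each delta to on_chunk and return (ok, full_text)."""
    parts = []
    for text in deltas:
        on_chunk(text)
        parts.append(text)
    return True, "".join(parts)
```

In a GUI, `on_chunk` would typically emit a signal so the text widget is updated on the UI thread.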

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:23:15 +02:00
jessey 5502251789 Merge pull request 'feat(m15): collect session-scoped system logs (kernel + coredumps) — 0.31.0' (#26) from feat/syslogs into main
release / release (push) Successful in 15s
Reviewed-on: #26
2026-05-22 12:16:52 +00:00
jessey 4bd51a40c3 feat(m15): nvidia-smi snapshot + display logs + inventory in reports — 0.32.0
Expand diagnostic/report collection (all stored per-diagnostic, in the Report zip;
logs also fed to the AI on "Explain"):
- syslogs: nvidia-smi -q snapshot (driver/throttle/clocks/power/temps/PCIe/ECC/
  retired pages) + display-server log auto-detected — Xorg.0.log on X11, or the
  compositor user-journal slice (gnome-shell/kwin/sway/gamescope) on Wayland.
- diagstore: include the full M5 inventory (inventory.txt + .json) — invaluable
  for larger/shared debugging. inventory.collect() degrades gracefully (no root
  prompt). Best-effort throughout.
- Tests for nvidia/display + inventory in store; docs (M15/SPEC).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:16:23 +02:00
jessey 984292c368 feat(m15): collect session-scoped system logs (kernel + coredumps) — 0.31.0
core/syslogs.py gathers, scoped to the diagnostic window:
- kernel-log slice (journalctl -k): Xid, OOM, MCE, PCIe AER, thermal, hung tasks
- crashed-process records (coredumpctl): exe, signal, when
Stored as syslogs.txt in the diagnostic dir, included in the Report bundle, and
fed to the AI on "Explain" alongside the game logs. Best-effort (degrades if the
tools are missing/denied); treats journalctl's "-- No entries --" as empty.
Tests + docs (M15/SPEC).
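
The best-effort behaviour described above could be sketched like this. The helper names are hypothetical and the actual `core/syslogs.py` may be structured differently; only the `journalctl` flags used here (`-k`, `--no-pager`, `--since`, `--until`) are standard.

```python
import subprocess

NO_ENTRIES = "-- No entries --"


def clean_journal_output(text):
    """Drop blank lines and journalctl's "-- No entries --" marker."""
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and ln.strip() != NO_ENTRIES
    ]
    return "\n".join(lines)


def kernel_log_slice(since, until):
    """Return the kernel log for a time window, or "" if it cannot be read."""
    cmd = ["journalctl", "-k", "--no-pager", "--since", since, "--until", until]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return ""  # tool missing, not executable, or hung: degrade quietly
    if proc.returncode != 0:
        return ""  # e.g. access denied
    return clean_journal_output(proc.stdout)
```

Returning an empty string in every failure case lets the caller treat "no logs" and "could not read logs" the same way when assembling the report.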

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:10:30 +02:00
25 changed files with 1000 additions and 144 deletions
+39
@@ -11,7 +11,20 @@ on:
     branches: [main]
 jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install (core only)
+        run: python -m pip install -e .
+      - name: Run tests
+        run: python -m unittest discover -s tests -v
   release:
+    needs: test   # don't publish a release if the tests fail
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
@@ -30,6 +43,9 @@ jobs:
       - name: Build self-extracting installer (.run)
         run: python packaging/make_run.py
+      - name: Build .deb
+        run: python packaging/make_deb.py
       - name: Read version
         id: ver
         run: |
@@ -90,3 +106,26 @@ jobs:
             "${API}/releases/${rid}/assets?name=$(basename "$f")" >/dev/null
           done
           echo "Published ${TAG}."
+      - name: Publish .deb to the Gitea apt registry (optional — needs REGISTRY_TOKEN)
+        env:
+          PKG_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
+        run: |
+          set -euo pipefail
+          if [ -z "${PKG_TOKEN:-}" ]; then
+            echo "REGISTRY_TOKEN not set — skipping apt publish (the .deb is still a release asset)."
+            exit 0
+          fi
+          OWNER="${{ github.repository_owner }}"
+          URL="${{ github.server_url }}/api/packages/${OWNER}/debian/pool/stable/main/upload"
+          for f in dist/*.deb; do
+            echo "Uploading $(basename "$f") to the apt registry…"
+            code=$(curl -sS -o /tmp/apt_upload.txt -w '%{http_code}' \
+              --user "${OWNER}:${PKG_TOKEN}" --upload-file "$f" "$URL" || true)
+            case "$code" in
+              2*) echo " uploaded ($code)";;
+              409) echo " already published ($code) — skipping (registry versions are immutable)";;
+              *) echo " upload failed ($code):"; cat /tmp/apt_upload.txt || true; exit 1;;
+            esac
+          done
+          echo "apt source: deb ${{ github.server_url }}/api/packages/${OWNER}/debian stable main"
+44
@@ -0,0 +1,44 @@
+name: tests
+run-name: Run test suite
+# Runs the unittest suite on pull requests (once per PR). Pushes to main are covered by the
+# `test` job in release.yml, so we don't trigger on push here — that would double every run.
+# Two jobs:
+#   core — stdlib-only install; the GUI tests skip (@skipUnless HAVE_QT). Bulletproof.
+#   gui-smoke — installs the GUI extra + offscreen Qt libs and runs the same suite headless,
+#     exercising the MainWindow/SetupWizard/DiagnosticDialog construction tests.
+# Make `tests / core (pull_request)` a required status check on `main` so a PR can't merge red.
+on:
+  pull_request:
+jobs:
+  core:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install (core only — no PySide6)
+        run: python -m pip install -e .
+      - name: Run tests (GUI tests skip without PySide6)
+        run: python -m unittest discover -s tests -v
+  gui-smoke:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: System libraries for offscreen Qt
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y libegl1 libgl1 libxkbcommon0 libdbus-1-3 libglib2.0-0
+      - name: Install (with GUI extra)
+        run: python -m pip install -e ".[gui]"
+      - name: Run tests (headless)
+        env:
+          QT_QPA_PLATFORM: offscreen
+        run: python -m unittest discover -s tests -v
+74
@@ -5,6 +5,80 @@ All notable changes to RigDoctor are recorded here. Format follows
 (`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
 release tag (so the auto-updater, D18, can compare versions).
+## [0.37.0] - 2026-05-22
+### Added
+- **Version footer** — a footer across the bottom of the window shows `RigDoctor v<version>` in
+  the bottom-right (moved out of the sidebar).
+### Fixed
+- **Pages scroll when content doesn't fit, and the window is no longer pinned to the tallest
+  page's height.** Long pages (Settings, Tuning, …) get a scrollbar when too tall — so controls
+  like Uninstall are always reachable — and the window can now be resized smaller than the screen
+  (min height dropped from "taller than the screen" to ~600px). Pages that manage their own
+  scroll/fill (Dashboard, System Health, Inventory, Share) are unchanged.
+## [0.36.1] - 2026-05-22
+### Fixed
+- `rigdoctor gui` printed the wrong fix when PySide6 is missing — it suggested the non-existent
+  `python3-pyside6` package. Now it names the real split modules
+  (`python3-pyside6.qt{widgets,gui,websockets,svg}` + `python3-pyte`).
+## [0.36.0] - 2026-05-22
+### Fixed
+- **`.deb` now installs all dependencies automatically — no manual tool install.** The previous
+  `Recommends: python3-pyside6` named a package that doesn't exist on Debian/Ubuntu (PySide6 is
+  split per module), so apt silently skipped it and the GUI wouldn't start. Now it Recommends the
+  actual modules the GUI imports — `python3-pyside6.qt{widgets,gui,websockets,svg}` + `python3-pyte`.
+### Changed
+- **`apt install rigdoctor` sets up the whole toolset.** The `.deb` also Recommends the optional
+  diagnostic/gaming tools (smartmontools, lm-sensors, dmidecode, pciutils, libnotify-bin,
+  libsecret-tools, gamemode, mangohud) so they install by default — users never hand-install
+  tools. `cpupower` is a Suggests (kernel-tied); `--no-install-recommends` still gives CLI-only.
+## [0.35.0] - 2026-05-22
+### Added
+- **`.deb` package (M9 / D8)** — `packaging/make_deb.py` builds a `rigdoctor_<version>_all.deb`
+  (pure-Python, `Architecture: all`) via `dpkg-deb`: `Depends: python3`, with the GUI deps
+  (`python3-pyside6`, `python3-pyte`) as **Recommends** so `sudo apt install ./rigdoctor_*.deb`
+  gives the full app and `--no-install-recommends` gives CLI-only. Installs the package, both
+  launchers, the desktop entry, and the icon. CI (`release.yml`) builds it as a **release asset**
+  every release, and optionally publishes it to the Gitea **apt registry** (set a `REGISTRY_TOKEN`
+  secret) for `sudo apt install rigdoctor`. **M9 is now complete.**
+## [0.34.0] - 2026-05-22
+### Added
+- **Event-based alerts (M8).** Beyond temperature + GPU-lost, RigDoctor now notifies on
+  **critical kernel events** — Xid (GPU error), out-of-memory kills, CPU machine-checks, PCIe
+  AER errors, and disk I/O errors — scanned from the kernel log every ~30s while monitoring and
+  fired one-shot (cooldown-gated, so no spam). A proactive warning the moment something goes
+  wrong, not just on a temperature threshold. Included whenever desktop notifications are on.
+## [0.33.0] - 2026-05-22
+### Added
+- **AI explanations stream live.** "Explain with AI" now fills token-by-token as the model
+  generates (Ollama NDJSON + Claude SSE, both via stdlib `urllib`) instead of a multi-second
+  freeze, then re-renders the finished answer as Markdown. `core/ai.explain_stream()`.
+## [0.32.0] - 2026-05-22
+### Added
+- **More for diagnostics & reports:**
+  - **`nvidia-smi -q` snapshot** — driver, throttle/clock-event reasons, clocks, power, temps,
+    PCIe link, ECC + retired pages (point-in-time at diagnostic time).
+  - **Display-server log** — auto-detected: `Xorg.0.log` on X11, or the compositor's user-journal
+    slice (gnome-shell/kwin/sway/gamescope) on Wayland.
+  - **Full system inventory** (M5 hardware/OS) is now included in each stored diagnostic and the
+    **Report** bundle — invaluable for larger/shared debugging.
+  These join the kernel log + coredump records in `syslogs.txt`/`inventory.*`, are saved per
+  diagnostic, included in the Report zip, and (logs) fed to the AI on "Explain".
+## [0.31.0] - 2026-05-22
+### Added
+- **Diagnostics now collect session-scoped system logs** (`core/syslogs.py`): a kernel-log
+  slice (`journalctl -k` — Xid, OOM-killer, MCE, PCIe AER, thermal, hung tasks) and
+  **crashed-process records** (`coredumpctl` — which executable, signal, and when). They're saved
+  to the diagnostic directory (`syslogs.txt`), included in the **Report** bundle, and fed to the
+  AI on "Explain" alongside the game logs. Best-effort — degrades quietly if the tools are
+  missing or access is denied; scoped to the session window so it doesn't drag in old noise.
 ## [0.30.0] - 2026-05-22
 ### Added
 - **Logging & report bundles (M15, D25)** — opt-in via one **Settings → Logging** toggle
+106 -102
@@ -1,132 +1,136 @@
# RigDoctor # RigDoctor
A **modular diagnostics, monitoring, and health-check toolkit for Linux gamers.** **Hardware monitoring & crash diagnostics for Linux gamers.** Live sensors, crash-safe
logging, plain-language health reports, per-game diagnostics, and optional AI explanations —
in a desktop app, a tray applet, or the terminal. Ubuntu/Debian + NVIDIA first.
> **Status:** 🟢 Phase 1 (MVP) complete. The **sensor core (M1)**, **crash-capture logger Linux gaming faults are hard to pin down — GPUs falling off the PCIe bus, black screens
> (M3)**, and **health report (M4)** all work — live `snapshot`/`monitor`, crash-safe `record` mid-game, silent thermal/VRAM throttling, driver/Proton mismatches. The useful data is
> with a post-crash report, and `report` to scan logs/SMART/driver for likely causes. A scattered across `nvidia-smi`, `/sys`, `journalctl`, and SMART, and the readings right before a
> desktop GUI (M10) ties them together (dashboard, recording, health). See `docs/ROADMAP.md`. freeze are usually lost. RigDoctor pulls it together and keeps the evidence.
## Why this exists ## Features
Linux gaming hardware faults are hard to diagnose: GPUs falling off the PCIe bus, the screen - **Live monitoring** — a dark desktop **dashboard** (history graphs + per-subsystem cards), a
suddenly going black mid-game, silent thermal/VRAM throttling, power transients, **tray applet** with at-a-glance status, and a terminal view (`rigdoctor monitor`).
driver/library mismatches, Proton quirks, and CPU governor / power-profile misconfiguration. - **Crash-safe recording** — background logger that `fsync`s every sample, so the state right
The data needed to diagnose them is scattered across `nvidia-smi`, `/sys/class/hwmon`, before a hard freeze survives. Manual, always-on, or auto-start when a game launches.
`journalctl`, SMART, and more — and the most useful readings (the ones right before a hard - **Health report** — scans `journalctl`/SMART/driver for likely causes (Xid, OOM, disk
freeze) are usually lost because nothing flushed them to disk. errors, throttling…) and explains them with suggested fixes.
- **Per-game diagnostics** — pick a game, capture while you play, get a focused report; hard
crashes are detected and analysed on next launch.
- **Gaming tune-ups** — flags risky settings (CPU governor, PCIe ASPM, persistence mode…) with
**one-click, reversible fixes**.
- **Proactive alerts** — desktop notifications on overheating and critical kernel events
(GPU-lost, Xid, out-of-memory, disk I/O).
- **AI explanations** *(optional, opt-in)* — explain a diagnostic in plain language with a
**local model (Ollama)** or **Claude**. Never automatic; only when you press the button.
- **Shareable reports** — zip a diagnostic (logs, inventory, AI transcript) to hand to someone,
or share a live **terminal session** for remote help.
- **Self-updating** — `apt upgrade`, or the in-app updater.
RigDoctor pulls all of that into one modular tool: live monitoring, crash-safe logging, a ## Install
one-shot health report, and an interactive installer that only sets up the modules a given
user actually needs for their hardware.
**Seed use cases:** an RTX 3070 that intermittently "falls off the bus" under heavy GPU load ### Debian / Ubuntu — `.deb`
(Path of Exile on Linux, Escape from Tarkov on Windows), and a monitor going black mid-game.
See `docs/SPEC.md` §1.
## How you run it The simplest path: grab the latest **`rigdoctor_<version>_all.deb`** from the
[releases page](https://git.jesseyvanofferen.com/jessey/rigdoctor/releases) and install it —
RigDoctor is **GUI-first** — the desktop app is the primary way in — but every feature is apt pulls the GUI dependencies (PySide6, pyte) automatically:
also available headless:
- **Desktop GUI** — graphical dashboard, recording controls, log browser, reports. The
default interface for most users.
- **Tray applet** — a small top-menu-bar applet with quick actions and at-a-glance status.
- **CLI** — full functionality from the terminal; works over SSH and in scripts.
The GUI/tray are optional modules; a headless (CLI-only) install loses no capability.
## Key decisions (settled)
| Topic | Decision |
|-------|----------|
| Name | **RigDoctor** |
| Language / stack | **Python 3 + Qt (PySide6)** — core/CLI/daemon stdlib-only; Qt only for GUI/tray |
| Primary distro | **Ubuntu** (Debian via apt); others best-effort later |
| Primary GPU | **NVIDIA** first; AMD, then Intel later |
| MVP | **Sensor core + crash logger + health report** (NVIDIA-only, CLI-first) |
| Distribution | **User-local install** (self-updating from the public repo, no root); **`.deb`** optional |
| Scope of action | **Read-only + suggestions** (no auto-apply yet) |
| Stress tests | **Out of scope** |
Full rationale and the still-open questions are in `docs/DECISIONS.md`.
## Repo layout
| Path | Purpose |
|------|---------|
| `docs/SPEC.md` | Product specification — vision, requirements, modules (the main planning doc) |
| `docs/ARCHITECTURE.md` | Technical design — core engine, front-ends, daemon, installer |
| `docs/MODULES.md` | Catalog of modules with scope, dependencies, status |
| `docs/ROADMAP.md` | Phased milestones |
| `docs/DECISIONS.md` | Decision log + remaining open questions |
| `src/rigdoctor/` | Source code — `core/` engine + sources, `cli.py`, `render.py` |
| `installer/` | Installer / `.deb` packaging (empty until Phase 4) |
| `tests/` | Tests (stdlib `unittest`) |
## Install (user-local, no root)
RigDoctor installs into a private venv under `~/.local` — no root, self-updating:
```bash ```bash
./install.sh # from a source checkout or the self-extracting .run sudo apt install ./rigdoctor_*_all.deb # CLI only: add --no-install-recommends
./install.sh --ref v0.0.6 # install a specific released tag (needs a token)
./install.sh --uninstall # remove it
``` ```
This adds `rigdoctor` / `rigdoctor-gui` to `~/.local/bin` and a desktop entry. Each release **Or add the apt repository** for `apt install` + automatic updates. The registry is public and
also ships a one-file **`.run`** installer (download, `chmod +x`, run). Updates are gated to GPG-signed — no token needed; just add the signing key and a deb822 source:
accounts on the Git server (a Personal Access Token); save one via the GUI **Setup → Update
access** panel or `rigdoctor login`, then `rigdoctor update` (or the sidebar button).
## Run it (dev)
Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):
```bash ```bash
PYTHONPATH=src python3 -m rigdoctor snapshot # one-shot sensor read # signing key → dearmored into the keyring
PYTHONPATH=src python3 -m rigdoctor snapshot --json sudo install -d -m 0755 /etc/apt/keyrings
PYTHONPATH=src python3 -m rigdoctor monitor -n 1 # live view (Ctrl-C to quit) curl -fsSL https://git.jesseyvanofferen.com/api/packages/jessey/debian/repository.key \
PYTHONPATH=src python3 -m rigdoctor sources # list detected sensor sources | sudo gpg --dearmor -o /etc/apt/keyrings/gitea-jessey.gpg
PYTHONPATH=src python3 -m unittest discover -s tests
# the source (modern deb822 format, GPG-verified, all-arch)
sudo tee /etc/apt/sources.list.d/rigdoctor.sources >/dev/null <<'EOF'
Types: deb
URIs: https://git.jesseyvanofferen.com/api/packages/jessey/debian
Suites: stable
Components: main
Architectures: all
Signed-By: /etc/apt/keyrings/gitea-jessey.gpg
EOF
sudo apt update && sudo apt install rigdoctor
``` ```
### Crash-capture logger (M3) Then `sudo apt upgrade` keeps it current.
A crash-safe background logger (JSONL, `fsync` per sample, bounded by rotation) for catching ### Any distro — self-extracting `.run` (no root)
the state right before a freeze:
Download **`rigdoctor-<version>-installer.run`** from the releases page and run it. It installs
into a private virtualenv under `~/.local` (no root), adds the launchers + desktop entry, and
opens the first-run setup wizard:
```bash ```bash
rigdoctor record start # start logging in the background sh rigdoctor-*-installer.run
rigdoctor record status # is it running? latest readings, sample count
rigdoctor record stop # stop it
rigdoctor record report # post-crash summary: peaks, events, last samples
rigdoctor record run # run in the foreground (the systemd-ready entrypoint)
``` ```
Logs live in `~/.local/share/rigdoctor/logs/`. It detects GPU "lost"/hang (nvidia-smi query ### Updating & removing
timeout) and writes an event marker. Trigger modes (always-on / game-launch) and the
`systemd --user` service arrive in Phase 4.
### Desktop GUI (M10) - **`.deb`:** `sudo apt upgrade` (or reinstall a newer `.deb`).
- **`.run` / user-local:** the in-app **Update** button, or `rigdoctor update`.
- **Remove:** `sudo apt remove rigdoctor`, or `rigdoctor uninstall` for the user-local install.
The GUI uses PySide6 (Qt) — the only part of RigDoctor that needs a non-stdlib dep: ## Using it
Launch **RigDoctor** from your app menu, or:
```bash ```bash
pip install -e '.[gui]' # core + PySide6, gives `rigdoctor` and `rigdoctor-gui` rigdoctor-gui # desktop app (+ tray)
rigdoctor gui # or: rigdoctor-gui rigdoctor --help # everything from the terminal (works over SSH)
``` ```
It opens a dark-themed window with sidebar navigation and a **live dashboard** over the Handy CLI commands:
same sensor core — circular gauges for the headline metrics plus collapsible per-subsystem
cards (GPU/CPU/memory/storage) with temperature-colored values (icey-blue → green → red).
The **Logs** and **Health** sections are full pages (recording controls + post-crash report;
and the kernel-log / SMART / driver scan). **Inventory** is a placeholder until M5 lands.
Without the GUI extra, `pip install -e .` gives just the stdlib-only CLI. ```bash
rigdoctor snapshot # one-shot reading of every sensor
rigdoctor monitor # live terminal dashboard
rigdoctor report # health report (logs / SMART / driver)
rigdoctor diagnose start|finish # capture while gaming, then analyse
rigdoctor gameenv # flag risky gaming settings + fixes
rigdoctor inventory # hardware/OS inventory
rigdoctor ai explain # AI explanation of the current findings (opt-in)
rigdoctor bundle # zip the latest diagnostic into a shareable report
```
## Start here ## Requirements
1. Read `docs/SPEC.md` for what we're building. - **Linux** — Ubuntu/Debian first-class (the `.deb`); the `.run` works on any distro with
2. Read `docs/ROADMAP.md` for the build order (Phase 1 = the MVP). Python ≥ 3.11.
3. Read `docs/DECISIONS.md` for the settled decisions (D1-D15). - **GPU** — NVIDIA fully supported (via `nvidia-smi`); AMD/Intel sensors are best-effort.
</content> - **CLI/daemon** need only Python 3 (stdlib). The **GUI/tray** add **PySide6** (`python3-pyside6`).
- Optional tools unlock more: `smartmontools`, `lm-sensors`, `gamemode`, `mangohud`. The setup
wizard offers to install them.
## Privacy
Everything stays on your machine — no telemetry, no phone-home. The AI assistant is **off by
default** and runs only when you explicitly trigger it; with Ollama nothing leaves the machine,
and the Claude option asks before sending. Reports are local files; they leave only if you share
the zip.
## Development
RigDoctor's core is stdlib-only Python; the GUI/tray use PySide6.
```bash
git clone https://git.jesseyvanofferen.com/jessey/rigdoctor && cd rigdoctor
pip install -e ".[gui]" # core + GUI; omit [gui] for CLI-only
python -m unittest discover -s tests # run the test suite
PYTHONPATH=src python3 -m rigdoctor snapshot # run without installing
```
Design docs live in `docs/`: `SPEC.md` (vision/requirements), `ARCHITECTURE.md`,
`MODULES.md` (module catalog), `ROADMAP.md`, and `DECISIONS.md` (the decision log).
Contributions: branch off `main`, keep tests green (CI runs them on PRs), and bump the version
+ `CHANGELOG.md` for shipped changes.
BIN
View File
Binary file not shown.


+17
View File
@@ -0,0 +1,17 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512">
<defs>
<radialGradient id="bg" cx="50%" cy="42%" r="78%">
<stop offset="0%" stop-color="#1b2230"/>
<stop offset="100%" stop-color="#0d0f13"/>
</radialGradient>
</defs>
<rect width="512" height="512" fill="url(#bg)"/>
<!-- gauge ring -->
<circle cx="256" cy="256" r="168" fill="none" stroke="#2a2f39" stroke-width="28"/>
<!-- accent sweep -->
<path d="M256 88 a168 168 0 1 1 -118.8 49.2" fill="none" stroke="#38bdf8"
stroke-width="28" stroke-linecap="round"/>
<!-- heartbeat / monitoring trace -->
<path d="M120 264 H200 L232 192 L280 336 L312 264 H392" fill="none" stroke="#e6e8eb"
stroke-width="28" stroke-linecap="round" stroke-linejoin="round"/>
</svg>


+8 -5
View File
@@ -18,7 +18,7 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 | | M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ | | M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ✅ | | M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ✅ |
| M9 | Installer | (meta) | none | all | P1 | 🟨 | | M9 | Installer (+ `.deb`) | (meta) | none | all | P1 | ✅ |
| M12 | Session sharing (shared terminal) | Sharing | none (relay) | all | P3 | ✅ | | M12 | Session sharing (shared terminal) | Sharing | none (relay) | all | P3 | ✅ |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ | | M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| M14 | AI assistant (explain diagnostics) | (optional) | none (stdlib urllib; Ollama or Claude) | all | P3 | ✅ | | M14 | AI assistant (explain diagnostics) | (optional) | none (stdlib urllib; Ollama or Claude) | all | P3 | ✅ |
@@ -132,10 +132,13 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
- **M15 Logging & report bundles** (D25) — opt-in via one `logging_enabled` toggle (default off): - **M15 Logging & report bundles** (D25) — opt-in via one `logging_enabled` toggle (default off):
application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage** application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage**
(`core/diagstore.py`) — each diagnostic gets its own `DATA_DIR/diagnostics/<id>/` (capture, (`core/diagstore.py`) — each diagnostic gets its own `DATA_DIR/diagnostics/<id>/`: capture,
`result.json`, `report.txt`, scoped game logs, and an `ai/` record of every AI interaction: `result.json`, `report.txt`, the full **inventory** (M5: hardware/OS), scoped **game logs**
exact data sent, model, reply). **"Report"** zips one into `DATA_DIR/reports/` (GUI button on (`core/gamelogs.py`), scoped **system logs** (`core/syslogs.py`: `journalctl -k`,
the diagnostic dialog; CLI `rigdoctor bundle`). Stays local; shareable on demand. `coredumpctl`, an `nvidia-smi -q` snapshot, and the X11/Wayland display-server log), and an
`ai/` record of every AI interaction (exact data sent, model, reply). **"Report"** zips one
into `DATA_DIR/reports/` (GUI button on the diagnostic dialog; CLI `rigdoctor bundle`). Logs
are session-scoped and fed to the AI on "Explain". Stays local; shareable on demand.
## Bundles (final — D14) ## Bundles (final — D14)
- **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)* - **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)*
+6 -3
View File
@@ -67,9 +67,12 @@ Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).
Settings "Recording trigger") incl. the zero-config **game-launch watcher** Settings "Recording trigger") incl. the zero-config **game-launch watcher**
(`core/watcher.py`, `rigdoctor watch`); and a **graphical first-run setup wizard** (`core/watcher.py`, `rigdoctor watch`); and a **graphical first-run setup wizard**
(`gui/setup_wizard.py`): environment → dependency-bundle selection → install → recording (`gui/setup_wizard.py`): environment → dependency-bundle selection → install → recording
trigger → readiness, auto-launched by install.sh and re-runnable from Settings. trigger → readiness, auto-launched by install.sh and re-runnable from Settings; and a
*Pending:* `.deb` packaging (next bullet). **`.deb`** (`packaging/make_deb.py`, `Architecture: all`, `Depends: python3`,
- [ ] `.deb` packaging (D8) declaring per-bundle deps incl. python3-pyside6 for Desktop UI `Recommends: python3-pyside6/pyte`) built + published in CI (release asset + optional
Gitea apt registry). **M9 complete.**
- [x] `.deb` packaging (D8) — built via `dpkg-deb` (no debhelper); GUI deps as Recommends so
`apt install rigdoctor` includes the Desktop UI, `--no-install-recommends` = CLI only.
## Phase 5 — Breadth (later) ## Phase 5 — Breadth (later)
- [ ] AMD GPU support in M1 (Steam Deck / Radeon) - [ ] AMD GPU support in M1 (Steam Deck / Radeon)
+5 -2
View File
@@ -165,8 +165,11 @@ the actual findings plus matched reference facts from a curated, exact-match kno
### M15 — Logging & report bundles (D25) ### M15 — Logging & report bundles (D25)
Opt-in (one `logging_enabled` toggle, default off). When on: the application logs to a rotating Opt-in (one `logging_enabled` toggle, default off). When on: the application logs to a rotating
`app.log`, and **each diagnostic is stored in its own directory** (capture log, structured `app.log`, and **each diagnostic is stored in its own directory** (capture log, structured
result, human-readable report, scoped game logs, and a record of every AI interaction — the result, human-readable report, the full **inventory** (M5 hardware/OS), session-scoped **game
exact data sent, the model, and its reply). A **Report** action zips one diagnostic's directory logs** (Proton/Steam) and **system logs** (`journalctl -k`, `coredumpctl`, an `nvidia-smi -q`
snapshot, and the X11/Wayland display-server log), and a record of every AI interaction — the
exact data sent, the model, and its reply). The collected logs are also fed to the AI on
"Explain". Collection is best-effort (degrades if tools are missing/denied). A **Report** action zips one diagnostic's directory
(plus the app log) into a shareable bundle saved under the reports folder (GUI button; CLI (plus the app log) into a shareable bundle saved under the reports folder (GUI button; CLI
`rigdoctor bundle`). Everything stays local — a report only leaves the machine if the user `rigdoctor bundle`). Everything stays local — a report only leaves the machine if the user
shares the zip. Stdlib only (`logging` + `zipfile`). shares the zip. Stdlib only (`logging` + `zipfile`).
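The Report action described above (zip one diagnostic's directory plus the app log, stdlib `zipfile` only) could be sketched roughly as follows. This is an illustration under assumptions, not the code in `core/diagstore.py`: the function name `bundle_diagnostic`, its parameters, and the archive layout are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional


def bundle_diagnostic(diag_dir: Path, reports_dir: Path,
                      app_log: Optional[Path] = None) -> Path:
    """Zip one diagnostic directory (plus the app log, if given) into reports_dir.

    Hypothetical helper: the name, signature, and archive layout are assumptions.
    """
    reports_dir.mkdir(parents=True, exist_ok=True)
    out = reports_dir / f"{diag_dir.name}.zip"
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(diag_dir.rglob("*")):
            if path.is_file():
                # Keep the diagnostic id as the top-level folder inside the zip.
                arcname = (Path(diag_dir.name) / path.relative_to(diag_dir)).as_posix()
                zf.write(path, arcname=arcname)
        if app_log is not None and app_log.is_file():
            zf.write(app_log, arcname="app.log")
    return out


# Usage on a throwaway directory tree (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    diag = root / "diagnostics" / "example-id"
    (diag / "ai").mkdir(parents=True)
    (diag / "result.json").write_text("{}", encoding="utf-8")
    (diag / "ai" / "001.json").write_text("{}", encoding="utf-8")
    bundle = bundle_diagnostic(diag, root / "reports")
    with zipfile.ZipFile(bundle) as zf:
        names = sorted(zf.namelist())
```

Nothing here touches the network, which matches the "stays local" property: the bundle is just a file under the reports folder until the user chooses to share it.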
+121
View File
@@ -0,0 +1,121 @@
"""Build a `.deb` for RigDoctor (M9 / D8) — dependency-light, no debhelper.
Pure-Python app, so it's `Architecture: all`: we stage the package into dist-packages, drop the
two launchers in /usr/bin, install the desktop entry + icon, write a DEBIAN/control, and call
`dpkg-deb`. The core is stdlib (`Depends: python3`); everything else is **Recommends** so a
plain `apt install rigdoctor` sets up the whole toolset automatically (users never hand-install
deps) — the GUI modules (Debian/Ubuntu split PySide6 per module, so we name
`python3-pyside6.qt{widgets,gui,websockets,svg}`) + `python3-pyte`, plus the diagnostic/gaming
tools (smartmontools, lm-sensors, dmidecode, pciutils, libnotify-bin, libsecret-tools, gamemode,
mangohud). `--no-install-recommends` still yields a CLI-only install; `cpupower` is a Suggests
(kernel-tied/heavy).
Run: `python packaging/make_deb.py` → `dist/rigdoctor_<version>_all.deb`.
"""
from __future__ import annotations
import shutil
import subprocess
import sys
from pathlib import Path
ROOT = Path(__file__).resolve().parents[1]
DIST = ROOT / "dist"
MAINTAINER = "Jessey van Offeren <jjvanofferen@gmail.com>"
HOMEPAGE = "https://git.jesseyvanofferen.com/jessey/rigdoctor"
def _version() -> str:
    text = (ROOT / "src" / "rigdoctor" / "__init__.py").read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.startswith("__version__"):
            return line.split('"')[1]
    raise SystemExit("could not read __version__")
_LAUNCHER = """\
#!/usr/bin/python3
import sys
from {module} import main
sys.exit(main())
"""
_DESKTOP = """\
[Desktop Entry]
Type=Application
Name=RigDoctor
Comment=Hardware monitoring & crash diagnostics for Linux gamers
Exec=rigdoctor-gui
Icon=rigdoctor
Terminal=false
Categories=System;Monitor;Utility;
StartupWMClass=rigdoctor
"""
_CONTROL = """\
Package: rigdoctor
Version: {version}
Architecture: all
Maintainer: {maintainer}
Section: utils
Priority: optional
Depends: python3 (>= 3.11)
Recommends: python3-pyside6.qtwidgets, python3-pyside6.qtgui, python3-pyside6.qtwebsockets, python3-pyside6.qtsvg, python3-pyte, smartmontools, lm-sensors, dmidecode, pciutils, libnotify-bin, libsecret-tools, gamemode, mangohud
Suggests: linux-tools-generic
Homepage: {homepage}
Description: Hardware monitoring & crash diagnostics for Linux gamers
 RigDoctor monitors GPU/CPU temperatures, load, and sensors, captures crash
 diagnostics while gaming, scans logs (Xid/SMART/kernel) for problems, and can
 explain them in plain language. The CLI and background daemon are pure Python
 (stdlib only); the optional desktop GUI and system-tray applet use PySide6,
 pulled in via Recommends. Install with --no-install-recommends for CLI only.
"""
def _write(path: Path, text: str, mode: int = 0o644) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    path.chmod(mode)
def build() -> Path:
    version = _version()
    DIST.mkdir(exist_ok=True)
    stage = DIST / f"rigdoctor_{version}_all"
    if stage.exists():
        shutil.rmtree(stage)

    # Python package → dist-packages (importable system-wide), minus bytecode.
    pkg_dst = stage / "usr/lib/python3/dist-packages/rigdoctor"
    shutil.copytree(ROOT / "src" / "rigdoctor", pkg_dst,
                    ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))

    # Launchers.
    _write(stage / "usr/bin/rigdoctor", _LAUNCHER.format(module="rigdoctor.cli"), 0o755)
    _write(stage / "usr/bin/rigdoctor-gui", _LAUNCHER.format(module="rigdoctor.gui.app"), 0o755)

    # Desktop entry + icon.
    _write(stage / "usr/share/applications/rigdoctor.desktop", _DESKTOP)
    icon = ROOT / "src" / "rigdoctor" / "gui" / "assets" / "rigdoctor.svg"
    _write(stage / "usr/share/icons/hicolor/scalable/apps/rigdoctor.svg",
           icon.read_text(encoding="utf-8"))

    # Refresh the desktop database on install/remove (best-effort).
    _write(stage / "DEBIAN/postinst",
           "#!/bin/sh\nset -e\nupdate-desktop-database -q 2>/dev/null || true\n", 0o755)
    _write(stage / "DEBIAN/postrm",
           "#!/bin/sh\nset -e\nupdate-desktop-database -q 2>/dev/null || true\n", 0o755)

    _write(stage / "DEBIAN/control",
           _CONTROL.format(version=version, maintainer=MAINTAINER, homepage=HOMEPAGE))

    out = DIST / f"rigdoctor_{version}_all.deb"
    subprocess.run(["dpkg-deb", "--root-owner-group", "--build", str(stage), str(out)], check=True)
    shutil.rmtree(stage)
    return out
if __name__ == "__main__":
    path = build()
    print(f"built {path}")
    sys.exit(0)
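As a small illustration of how the version lookup in `_version()` behaves, the same parsing can be written as a pure function over text, which makes it easy to exercise without the repo layout. The name `read_version` and the sample text are hypothetical; only the split-on-double-quote approach comes from the script above.

```python
def read_version(init_text: str) -> str:
    """Return the value of the first `__version__ = "..."` line in init_text."""
    for line in init_text.splitlines():
        if line.startswith("__version__"):
            # Same approach as the build script: take the text between the
            # first pair of double quotes on the line.
            return line.split('"')[1]
    raise ValueError("no __version__ line found")


sample = '"""RigDoctor."""\n__version__ = "0.37.0"\n'
version = read_version(sample)
deb_name = f"rigdoctor_{version}_all.deb"
```

One consequence of splitting on `"`: a single-quoted `__version__ = '0.37.0'` would raise `IndexError` rather than return a version, so this style of parsing relies on the double-quote convention being kept in `__init__.py`.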
+1 -1
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "rigdoctor" name = "rigdoctor"
version = "0.30.0" version = "0.37.0"
description = "Modular hardware monitoring & crash diagnostics for Linux gamers." description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
readme = "README.md" readme = "README.md"
requires-python = ">=3.11" requires-python = ">=3.11"
+1 -1
View File
@@ -1,3 +1,3 @@
"""RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers.""" """RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""
__version__ = "0.30.0" __version__ = "0.37.0"
+3 -2
View File
@@ -55,8 +55,9 @@ def cmd_gui(args) -> int:
from .gui.app import main as gui_main from .gui.app import main as gui_main
except ImportError as exc: except ImportError as exc:
print("The GUI needs PySide6, which isn't installed.") print("The GUI needs PySide6, which isn't installed.")
print(" Install it with: pip install 'rigdoctor[gui]'") print(" Ubuntu/Debian: sudo apt install python3-pyside6.qtwidgets "
print(" or on Ubuntu: sudo apt install python3-pyside6") "python3-pyside6.qtgui python3-pyside6.qtwebsockets python3-pyside6.qtsvg python3-pyte")
print(" pip: pip install 'rigdoctor[gui]'")
print(f" ({exc})") print(f" ({exc})")
return 2 return 2
return gui_main([sys.argv[0]]) return gui_main([sys.argv[0]])
+77
View File
@@ -150,6 +150,24 @@ def explain(findings_text: str, timeout: float = 120.0) -> tuple[bool, str]:
return False, f"Unexpected response from the AI provider: {exc}" return False, f"Unexpected response from the AI provider: {exc}"
def explain_stream(findings_text: str, on_chunk, timeout: float = 180.0) -> tuple[bool, str]:
    """Like :func:`explain`, but calls ``on_chunk(text_delta)`` as tokens arrive and returns
    ``(ok, full_text)`` at the end. Caller MUST be a direct user action (D24)."""
    content = build_prompt(findings_text)
    try:
        if provider() == "claude":
            return _claude_stream(content, on_chunk, timeout)
        if provider() == "ollama":
            return _ollama_stream(content, on_chunk, timeout)
        return False, "No AI provider is configured (Settings → AI assistant)."
    except urllib.error.HTTPError as exc:
        return False, _http_error(exc)
    except (urllib.error.URLError, OSError, TimeoutError) as exc:
        return False, f"Couldn't reach the AI provider: {exc}"
    except (ValueError, KeyError, IndexError) as exc:
        return False, f"Unexpected response from the AI provider: {exc}"
def _post(url: str, payload: dict, headers: dict, timeout: float) -> dict: def _post(url: str, payload: dict, headers: dict, timeout: float) -> dict:
req = urllib.request.Request( req = urllib.request.Request(
url, data=json.dumps(payload).encode("utf-8"), url, data=json.dumps(payload).encode("utf-8"),
@@ -185,6 +203,65 @@ def _claude(content: str, timeout: float) -> tuple[bool, str]:
return True, text.strip() or "(the model returned no text)" return True, text.strip() or "(the model returned no text)"
def _stream_request(url: str, payload: dict, headers: dict, timeout: float):
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers})
    return urllib.request.urlopen(req, timeout=timeout)


def _ollama_stream(content: str, on_chunk, timeout: float) -> tuple[bool, str]:
    if not model():
        return False, "No Ollama model is set (Settings → AI assistant)."
    payload = {"model": model(), "system": SYSTEM_PROMPT, "prompt": content, "stream": True}
    parts: list[str] = []
    with _stream_request(endpoint().rstrip("/") + "/api/generate", payload, {}, timeout) as resp:
        for raw in resp:  # newline-delimited JSON objects
            line = raw.decode("utf-8", "replace").strip()
            if not line:
                continue
            obj = json.loads(line)
            chunk = obj.get("response", "")
            if chunk:
                parts.append(chunk)
                on_chunk(chunk)
            if obj.get("done"):
                break
    return True, "".join(parts).strip() or "(the model returned an empty response)"


def _claude_stream(content: str, on_chunk, timeout: float) -> tuple[bool, str]:
    key = config.load_ai_key()
    if not key:
        return False, "No Claude API key is set (Settings → AI assistant)."
    payload = {
        "model": model(), "max_tokens": CLAUDE_MAX_TOKENS, "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": content}], "stream": True,
    }
    headers = {"x-api-key": key, "anthropic-version": ANTHROPIC_VERSION}
    parts: list[str] = []
    with _stream_request(CLAUDE_ENDPOINT, payload, headers, timeout) as resp:
        for raw in resp:  # SSE: parse `data:` lines, accumulate text deltas
            line = raw.decode("utf-8", "replace").strip()
            if not line.startswith("data:"):
                continue
            try:
                event = json.loads(line[5:].strip())
            except ValueError:
                continue
            etype = event.get("type")
            if etype == "content_block_delta" and event.get("delta", {}).get("type") == "text_delta":
                chunk = event["delta"].get("text", "")
                if chunk:
                    parts.append(chunk)
                    on_chunk(chunk)
            elif etype == "error":
                return False, event.get("error", {}).get("message", "stream error")
            elif etype == "message_stop":
                break
    return True, "".join(parts).strip() or "(the model returned no text)"
def _http_error(exc: urllib.error.HTTPError) -> str: def _http_error(exc: urllib.error.HTTPError) -> str:
detail = "" detail = ""
try: try:
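The two streaming readers in this file parse line-oriented formats: newline-delimited JSON for Ollama and SSE `data:` lines for Claude. A minimal offline sketch of both loops follows, fed from in-memory byte lines instead of an HTTP response. The function names and the sample payloads are made up for illustration; the field names (`response`, `done`, `content_block_delta`, `text_delta`, `message_stop`) are the ones the diff itself reads.

```python
import json
from typing import Callable, Iterable


def read_ndjson_stream(lines: Iterable[bytes], on_chunk: Callable[[str], None]) -> str:
    """Accumulate `response` fields from newline-delimited JSON until `done`."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8", "replace").strip()
        if not line:
            continue  # skip blank keep-alive lines
        obj = json.loads(line)
        chunk = obj.get("response", "")
        if chunk:
            parts.append(chunk)
            on_chunk(chunk)
        if obj.get("done"):
            break
    return "".join(parts)


def read_sse_stream(lines: Iterable[bytes], on_chunk: Callable[[str], None]) -> str:
    """Accumulate text deltas from SSE `data:` lines until `message_stop`."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8", "replace").strip()
        if not line.startswith("data:"):
            continue  # ignore `event:` lines and blank separators
        try:
            event = json.loads(line[5:].strip())
        except ValueError:
            continue  # tolerate non-JSON data lines
        etype = event.get("type")
        if etype == "content_block_delta" and event.get("delta", {}).get("type") == "text_delta":
            chunk = event["delta"].get("text", "")
            if chunk:
                parts.append(chunk)
                on_chunk(chunk)
        elif etype == "message_stop":
            break
    return "".join(parts)


# Made-up sample streams; anything after the stop marker is never read.
ndjson_lines = [
    b'{"response": "Hel"}\n',
    b"\n",
    b'{"response": "lo", "done": true}\n',
    b'{"response": "ignored"}\n',
]
sse_lines = [
    b"event: content_block_delta\n",
    b'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}}\n',
    b"\n",
    b"data: not json\n",
    b'data: {"type": "message_stop"}\n',
    b'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "late"}}\n',
]

ndjson_chunks = []
ndjson_text = read_ndjson_stream(ndjson_lines, ndjson_chunks.append)
sse_chunks = []
sse_text = read_sse_stream(sse_lines, sse_chunks.append)
```

Keeping the parsing separate from the HTTP call, as in this sketch, is one way such loops could be unit-tested without a provider; whether the project's tests do this is not shown in the diff.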
+41 -5
View File
@@ -1,8 +1,9 @@
"""Desktop alerts (M8): notify on overheat / GPU-lost / new version via notify-send. """Desktop alerts (M8): notify on overheat / GPU-lost / critical kernel events / new version.
Edge-triggered: an alert fires when a condition becomes true (not every sample), and Edge-triggered: a sustained condition (hot GPU, GPU-lost) fires once when it becomes true and
can fire again only after it has cleared and a cooldown has passed — so a hot GPU or a can re-fire only after it clears + a cooldown; momentary **kernel events** (Xid, OOM-kill, MCE,
1-Hz sample loop doesn't spam notifications. Degrades to a no-op if notify-send is absent. PCIe AER, disk I/O errors) are scanned from the kernel log every `event_interval` seconds and
fire one-shot (cooldown-gated). So a 1-Hz sample loop never spams. No-op if notify-send absent.
""" """
from __future__ import annotations from __future__ import annotations
@@ -57,13 +58,16 @@ def notify(title: str, message: str, urgency: str = "normal") -> bool:
class AlertMonitor: class AlertMonitor:
"""Evaluate samples and raise edge-triggered desktop alerts.""" """Evaluate samples and raise edge-triggered desktop alerts."""
def __init__(self, gpu_temp: float = 90.0, cpu_temp: float = 95.0, cooldown: float = 300.0): def __init__(self, gpu_temp: float = 90.0, cpu_temp: float = 95.0, cooldown: float = 300.0,
event_interval: float = 30.0):
self.gpu_temp = gpu_temp self.gpu_temp = gpu_temp
self.cpu_temp = cpu_temp self.cpu_temp = cpu_temp
self.cooldown = cooldown self.cooldown = cooldown
self.event_interval = event_interval # how often to scan the kernel log
self.enabled = True self.enabled = True
self._active: dict[str, bool] = {} self._active: dict[str, bool] = {}
self._last: dict[str, float] = {} self._last: dict[str, float] = {}
self._last_kernel_scan = time.time() # only alert on events after the monitor starts
def _fire(self, key: str, title: str, message: str, urgency: str = "critical") -> None: def _fire(self, key: str, title: str, message: str, urgency: str = "critical") -> None:
if self._active.get(key): if self._active.get(key):
@@ -75,9 +79,39 @@ class AlertMonitor:
self._last[key] = now self._last[key] = now
notify(title, message, urgency) notify(title, message, urgency)
    def _notify_once(self, key: str, title: str, message: str, urgency: str = "critical") -> None:
        """One-shot alert for a momentary event (cooldown-gated, no active latch)."""
        now = time.time()
        if now - self._last.get(key, 0.0) < self.cooldown:
            return
        self._last[key] = now
        notify(title, message, urgency)
def _clear(self, key: str) -> None: def _clear(self, key: str) -> None:
self._active[key] = False self._active[key] = False
    def _scan_kernel_events(self) -> None:
        """Periodically scan the kernel log for new critical events (Xid/OOM/MCE/PCIe/disk)."""
        now = time.time()
        if now - self._last_kernel_scan < self.event_interval:
            return
        since = self._last_kernel_scan
        self._last_kernel_scan = now
        try:
            from . import syslogs
            text = syslogs.kernel_log(since=since)
        except Exception:  # alerting must never crash the sample loop
            return
        if not text:
            return
        seen: set[str] = set()
        for label, line in syslogs.scan_critical(text):
            if label in seen:  # one alert per category per scan
                continue
            seen.add(label)
            self._notify_once(f"kernel:{label}", label, line[:180])
def check(self, sample: Sample) -> None: def check(self, sample: Sample) -> None:
if not self.enabled: if not self.enabled:
return return
@@ -107,3 +141,5 @@ class AlertMonitor:
self._fire("gpu_lost", "GPU not responding", "nvidia-smi query timed out — the GPU may have dropped") self._fire("gpu_lost", "GPU not responding", "nvidia-smi query timed out — the GPU may have dropped")
else: else:
self._clear("gpu_lost") self._clear("gpu_lost")
self._scan_kernel_events() # Xid / OOM / MCE / PCIe / disk I/O from the kernel log
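The cooldown gating used by `_notify_once` can be shown in isolation with an injectable clock, which makes the timing behaviour easy to check. This is a sketch, not the class from the diff: `CooldownGate`, its `clock` parameter, and the key names are hypothetical.

```python
import time


class CooldownGate:
    """Allow an event per key at most once per `cooldown` seconds."""

    def __init__(self, cooldown: float, clock=time.time):
        self.cooldown = cooldown
        self._clock = clock  # injectable so tests can control time
        self._last = {}      # key -> timestamp of the last allowed event

    def allow(self, key: str) -> bool:
        now = self._clock()
        # Like the diff, an unseen key counts as last fired at 0.0, so the
        # clock is assumed to be epoch-like (far larger than the cooldown).
        if now - self._last.get(key, 0.0) < self.cooldown:
            return False
        self._last[key] = now
        return True


# Fake clock: a one-element list the lambda reads from.
now = [1000.0]
gate = CooldownGate(cooldown=300.0, clock=lambda: now[0])

results = [gate.allow("kernel:Out of memory")]         # first event fires
now[0] = 1100.0
results.append(gate.allow("kernel:Out of memory"))     # 100 s later: suppressed
results.append(gate.allow("kernel:GPU error (Xid)"))   # other key: independent
now[0] = 1300.0
results.append(gate.allow("kernel:Out of memory"))     # 300 s elapsed: fires again
```

Unlike the latched `_fire` path for sustained conditions, this gate has no "active" state to clear, which is why it suits momentary events such as a single Xid line.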
+17 -1
View File
@@ -51,7 +51,7 @@ def store(result, capture_path=None, since: float | None = None) -> Path | None:
if not enabled(): if not enabled():
return None return None
from ..render import render_summary from ..render import render_summary
from . import ai, gamelogs from . import ai, gamelogs, syslogs
target = _new_dir(getattr(result, "game", None)) target = _new_dir(getattr(result, "game", None))
@@ -80,6 +80,22 @@ def store(result, capture_path=None, since: float | None = None) -> Path | None:
_write(target / "gamelogs.txt", logs) _write(target / "gamelogs.txt", logs)
except OSError: except OSError:
pass pass
    try:
        sys_logs = syslogs.collect(since=since)
        if sys_logs:
            _write(target / "syslogs.txt", sys_logs)
    except OSError:
        pass

    try:  # full hardware/OS inventory (M5) — invaluable for larger debugging in a shared report
        from . import inventory
        sections = inventory.collect()
        _write(target / "inventory.txt", inventory.render_text(sections))
        _write(target / "inventory.json", inventory.render_json(sections))
    except Exception:  # inventory probes vary by machine; never let it break storage
        pass
return target return target
+165
View File
@@ -0,0 +1,165 @@
"""Session-scoped system logs for diagnostics (M15): kernel, coredumps, NVIDIA, display.
Covers what the *system* logged when something went wrong, so the report bundle and the AI both
see it:
* kernel ring-buffer slice (`journalctl -k`) — Xid, OOM-killer, MCE, PCIe AER, thermal, hung tasks
* systemd-coredump records (`coredumpctl`) — did the game/wine dump core (SIGSEGV/ABRT), when
* an `nvidia-smi -q` snapshot — driver, throttle/clock-event reasons, clocks, power, temps, PCIe,
ECC + retired pages (point-in-time at diagnostic time)
* the display-server log — `Xorg.0.log` on X11, or the compositor's user-journal slice on Wayland
Best-effort and size-bounded: degrades silently if a tool is missing or access is denied. Stdlib only.
"""
from __future__ import annotations
import os
import re
import shutil
import subprocess
import time
from pathlib import Path
_MAX = 8000 # cap each log section so the prompt/report stays small
_NV_MAX = 10000 # nvidia-smi -q is structured + valuable; allow a bit more (head-truncated)
# Compositors whose user-journal entries are the "Wayland log" (OR-matched by journalctl).
_COMPOSITORS = ("gnome-shell", "mutter", "kwin_wayland", "Xwayland", "sway", "gamescope")
_XORG_LOGS = ("~/.local/share/xorg/Xorg.0.log", "/var/log/Xorg.0.log")
def _since_arg(since: float | None) -> str | None:
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(since)) if since else None


def _run(cmd: list[str], timeout: float = 15.0) -> str:
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.SubprocessError):
        return ""
    return (proc.stdout or "").strip()


def kernel_log(since: float | None = None, max_bytes: int = _MAX) -> str:
    if not shutil.which("journalctl"):
        return ""
    cmd = ["journalctl", "-k", "--no-pager"]
    since_arg = _since_arg(since)
    if since_arg:
        cmd += ["--since", since_arg]
    out = _run(cmd)
    if not out or out.strip().lower() == "-- no entries --":  # journalctl's empty marker
        return ""
    return out[-max_bytes:]


def coredumps(since: float | None = None, max_bytes: int = _MAX) -> str:
    if not shutil.which("coredumpctl"):
        return ""
    cmd = ["coredumpctl", "list", "--no-pager"]
    since_arg = _since_arg(since)
    if since_arg:
        cmd += ["--since", since_arg]
    out = _run(cmd)
    if not out or "no coredumps" in out.lower():
        return ""
    return out[-max_bytes:]


def nvidia_snapshot(max_bytes: int = _NV_MAX) -> str:
    """Point-in-time `nvidia-smi -q` (head-truncated — driver/temps/clocks/ECC sit near the top)."""
    if not shutil.which("nvidia-smi"):
        return ""
    out = _run(["nvidia-smi", "-q"])
    return out[:max_bytes] if out else ""


def _xorg_log() -> Path | None:
    for cand in _XORG_LOGS:
        path = Path(os.path.expanduser(cand))
        if path.exists():
            return path
    return None


def _session_type() -> str:
    declared = os.environ.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("x11", "wayland"):
        return declared
    if os.environ.get("WAYLAND_DISPLAY"):
        return "wayland"
    return "x11" if _xorg_log() else "unknown"


def _tail_file(path: Path, max_bytes: int) -> str:
    try:
        size = path.stat().st_size
        with path.open("rb") as fh:
            if size > max_bytes:
                fh.seek(size - max_bytes)
            return fh.read().decode("utf-8", "replace")
    except OSError:
        return ""


def display_log(since: float | None = None, max_bytes: int = _MAX) -> str:
    """Xorg.0.log on X11, or the compositor's user-journal slice on Wayland ('' if none)."""
    if _session_type() == "wayland":
        if not shutil.which("journalctl"):
            return ""
        cmd = ["journalctl", "--user", "--no-pager"]
        since_arg = _since_arg(since)
        if since_arg:
            cmd += ["--since", since_arg]
        cmd += [f"_COMM={comp}" for comp in _COMPOSITORS]  # OR-matched
        out = _run(cmd)
        if not out or out.strip().lower() == "-- no entries --":
            return ""
        return out[-max_bytes:]
    log = _xorg_log()  # X11: Xorg log isn't wall-clock-timestamped, so tail rather than scope
    return _tail_file(log, max_bytes) if log else ""
# Kernel-log patterns worth alerting on in real time (M8 event alerts). (label, regex).
_CRITICAL = [
    ("GPU error (Xid)", re.compile(r"NVRM:\s*Xid", re.I)),
    ("Out of memory", re.compile(r"out of memory|oom-kill|killed process \d+", re.I)),
    ("CPU machine-check", re.compile(r"\bmce:|machine check", re.I)),
    ("PCIe error", re.compile(r"\bAER:|pcie bus error", re.I)),
    ("Disk I/O error", re.compile(
        r"buffer i/o error|\bi/o error\b|critical medium error|ext4-fs error|"
        r"blk_update_request:.*error|ata\d+.*(?:failed|error)", re.I)),
]


def scan_critical(text: str) -> list[tuple[str, str]]:
    """(label, line) for kernel lines matching a critical pattern (first match per line)."""
    events: list[tuple[str, str]] = []
    for line in text.splitlines():
        for label, pat in _CRITICAL:
            if pat.search(line):
                events.append((label, line.strip()))
                break
    return events
def available() -> bool:
    return bool(shutil.which("journalctl") or shutil.which("coredumpctl")
                or shutil.which("nvidia-smi") or _xorg_log())


def collect(since: float | None = None) -> str:
    """Kernel + coredumps + NVIDIA snapshot + display log as one labelled block ('' if none)."""
    sections: list[str] = []
    kern = kernel_log(since)
    if kern:
        sections.append(f"--- Kernel log (journalctl -k) ---\n{kern}")
    cores = coredumps(since)
    if cores:
        sections.append(f"--- Crashed processes (coredumpctl) ---\n{cores}")
    nvidia = nvidia_snapshot()
    if nvidia:
        sections.append(f"--- NVIDIA snapshot (nvidia-smi -q) ---\n{nvidia}")
    display = display_log(since)
    if display:
        sections.append(f"--- Display server log ({_session_type()}) ---\n{display}")
    return "\n\n".join(sections)
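To make the pattern scan concrete, here is a self-contained sketch of the first-match-per-line classification together with the one-alert-per-category reduction the alert monitor applies on top of it. The pattern table is a subset copied from `_CRITICAL`; the helper names and the sample kernel lines are invented for illustration and are not captured from real hardware.

```python
import re

# Subset of the (label, regex) table from the module above.
PATTERNS = [
    ("GPU error (Xid)", re.compile(r"NVRM:\s*Xid", re.I)),
    ("Out of memory", re.compile(r"out of memory|oom-kill|killed process \d+", re.I)),
    ("PCIe error", re.compile(r"\bAER:|pcie bus error", re.I)),
]


def scan(text: str) -> list:
    """Return (label, line) for each line matching a pattern (first match wins)."""
    events = []
    for line in text.splitlines():
        for label, pat in PATTERNS:
            if pat.search(line):
                events.append((label, line.strip()))
                break
    return events


def first_per_label(events: list) -> dict:
    """Keep only the first line seen for each label, preserving order."""
    firsts = {}
    for label, line in events:
        firsts.setdefault(label, line)
    return firsts


# Invented sample lines in the general shape of kernel log output.
SAMPLE = """\
kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
kernel: usb 1-2: new high-speed USB device number 5 using xhci_hcd
kernel: Out of memory: Killed process 4242 (game.exe)
kernel: NVRM: Xid (PCI:0000:01:00): 31, Ch 00000008
"""

events = scan(SAMPLE)
labels = [label for label, _ in events]
alerts = first_per_label(events)
```

Because the table is ordered and the inner loop breaks on the first hit, a line that could match two categories is reported once, under whichever label comes first.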
+42 -16
View File
@@ -5,7 +5,7 @@ from __future__ import annotations
import threading import threading
from PySide6.QtCore import Qt, Signal from PySide6.QtCore import Qt, Signal
from PySide6.QtGui import QFont from PySide6.QtGui import QFont, QTextCursor
from PySide6.QtWidgets import ( from PySide6.QtWidgets import (
QDialog, QDialog,
QFrame, QFrame,
@@ -24,11 +24,15 @@ from .widgets import finding_card
class DiagnosticDialog(QDialog): class DiagnosticDialog(QDialog):
_explained = Signal(object) # (ok, text) from a user-triggered AI explanation _chunk = Signal(str) # streamed token delta (worker thread -> GUI)
_explained = Signal(object) # (ok, full_text) when the AI stream finishes
def __init__(self, result, parent=None) -> None: def __init__(self, result, parent=None) -> None:
super().__init__(parent) super().__init__(parent)
self._result = result self._result = result
self._stream_view = None
self._stream_status = None
self._chunk.connect(self._on_chunk)
self._explained.connect(self._on_explained) self._explained.connect(self._on_explained)
self.setWindowTitle(f"Diagnostic — {result.game}" if result.game else "Diagnostic") self.setWindowTitle(f"Diagnostic — {result.game}" if result.game else "Diagnostic")
self.resize(660, 680) self.resize(660, 680)
@@ -97,7 +101,7 @@ class DiagnosticDialog(QDialog):
buttons.addWidget(close) buttons.addWidget(close)
root.addLayout(buttons) root.addLayout(buttons)
# --- AI explanation (M14, D24) — runs only on this button press ---------------- # --- AI explanation (M14, D24) — streamed; runs only on this button press ----------
def _explain_with_ai(self) -> None: def _explain_with_ai(self) -> None:
from ..core import ai from ..core import ai
@@ -111,11 +115,14 @@ class DiagnosticDialog(QDialog):
         if confirm != QMessageBox.StandardButton.Yes:
             return
         self._explain_btn.setEnabled(False)
-        self._explain_btn.setText("Asking the AI…")
+        dialog = self._open_stream_dialog()
         threading.Thread(target=self._work_explain, daemon=True).start()
+        dialog.exec()  # streaming fills the view live via signals during this nested loop
+        self._stream_view = self._stream_status = None
+        self._explain_btn.setEnabled(True)

     def _work_explain(self) -> None:
-        from ..core import ai, gamelogs
+        from ..core import ai, gamelogs, syslogs

         result = self._result
         summary = result.summary
@@ -139,8 +146,12 @@ class DiagnosticDialog(QDialog):
         logs = gamelogs.collect(since=since)  # scoped to this session
         if logs:
             lines.append("\nGame/Proton/Steam logs for this session:\n" + logs)
+        sys_logs = syslogs.collect(since=since)  # kernel log + crashed-process records
+        if sys_logs:
+            lines.append("\nSystem logs for this session (kernel + crashed processes):\n" + sys_logs)
         text = "\n".join(lines)
-        ok, reply = ai.explain(text)
+
+        ok, reply = ai.explain_stream(text, on_chunk=lambda d: self._chunk.emit(d))
         if result.dir:  # record exactly what was sent, the model, and the reply (M15)
             from ..core import diagstore
             diagstore.record_ai(
@@ -149,11 +160,24 @@ class DiagnosticDialog(QDialog):
                 response=reply if ok else f"[error] {reply}")
         self._explained.emit((ok, reply))

+    def _on_chunk(self, delta: str) -> None:
+        if self._stream_view is None:
+            return
+        self._stream_view.moveCursor(QTextCursor.MoveOperation.End)
+        self._stream_view.insertPlainText(delta)  # live plain text as tokens arrive
+        self._stream_view.ensureCursorVisible()
+
     def _on_explained(self, result) -> None:
         ok, text = result
-        self._explain_btn.setEnabled(True)
-        self._explain_btn.setText("Explain with AI")
-        self._show_explanation(text if ok else f"AI explanation failed:\n\n{text}")
+        if self._stream_view is not None:
+            if ok:
+                self._stream_view.setMarkdown(text)  # re-render the finished answer as Markdown
+            else:
+                self._stream_view.setPlainText(f"AI explanation failed:\n\n{text}")
+        if self._stream_status is not None:
+            self._stream_status.setText(
+                "AI-generated suggestions — verify before acting, especially anything that changes "
+                "settings or data." if ok else "The request failed.")

     # --- Report bundle (M15) ------------------------------------------------------
     def _make_report(self) -> None:
@@ -180,7 +204,8 @@ class DiagnosticDialog(QDialog):
         if box.clickedButton() is open_btn:
             QDesktopServices.openUrl(QUrl.fromLocalFile(str(out.parent)))

-    def _show_explanation(self, text: str) -> None:
+    def _open_stream_dialog(self) -> QDialog:
+        """A live dialog the AI streams into; finalized to rendered Markdown when done."""
         from ..core import ai

         dlg = QDialog(self)
@@ -190,14 +215,15 @@ class DiagnosticDialog(QDialog):
         view = QTextEdit()
         view.setObjectName("Report")
         view.setReadOnly(True)
-        view.setMarkdown(text)  # the model replies in Markdown — render it
         lay.addWidget(view)
-        note = QLabel("AI-generated suggestions — verify before acting, especially anything that changes settings or data.")
-        note.setObjectName("Muted")
-        note.setWordWrap(True)
-        lay.addWidget(note)
+        status = QLabel("Streaming from the model…")
+        status.setObjectName("Muted")
+        status.setWordWrap(True)
+        lay.addWidget(status)
         close = QPushButton("Close")
         close.setObjectName("PrimaryButton")
         close.clicked.connect(dlg.accept)
         lay.addWidget(close, alignment=Qt.AlignmentFlag.AlignRight)
-        dlg.exec()
+        self._stream_view = view
+        self._stream_status = status
+        return dlg
+35 -5
@@ -20,6 +20,7 @@ from PySide6.QtWidgets import (
     QMainWindow,
     QMessageBox,
     QPushButton,
+    QScrollArea,
     QStackedWidget,
     QSystemTrayIcon,
     QTextEdit,
@@ -51,6 +52,10 @@ _NAV = [
     ("App", ["Settings", "Share"]),
 ]
 _PAGES = [name for _section, names in _NAV for name in names]
+# Pages that manage their own scrolling (pinned header + inner scroll) or must fill the
+# viewport (the Share terminal) — these are added to the stack as-is; every other page is
+# wrapped in a QScrollArea so it scrolls when too tall and doesn't pin the window's height.
+_NO_WRAP = {"Dashboard", "System Health", "Inventory", "Share"}
 _ICON = Path(__file__).parent / "assets" / "rigdoctor.svg"
@@ -68,7 +73,11 @@ class MainWindow(QMainWindow):
         central = QWidget()
         self.setCentralWidget(central)
-        layout = QHBoxLayout(central)
+        outer = QVBoxLayout(central)
+        outer.setContentsMargins(0, 0, 0, 0)
+        outer.setSpacing(0)
+        body = QWidget()
+        layout = QHBoxLayout(body)
         layout.setContentsMargins(0, 0, 0, 0)
         layout.setSpacing(0)
@@ -100,11 +109,14 @@ class MainWindow(QMainWindow):
             "Share": self.share_page,
         }
         for name in _PAGES:
-            self._stack.addWidget(self._pages[name])
+            page = self._pages[name]
+            self._stack.addWidget(page if name in _NO_WRAP else self._scrollable(page))
         content_layout.addWidget(self._stack)

         layout.addWidget(self._build_sidebar())
         layout.addWidget(content, 1)
+        outer.addWidget(body, 1)
+        outer.addWidget(self._build_footer())

         self._worker = SamplerWorker(interval=interval)
         self._worker.sampled.connect(self.dashboard.update_sample)
@@ -216,9 +228,6 @@ class MainWindow(QMainWindow):
         v.addStretch(1)
         live = QLabel(f'<span style="color:{ACCENT};">●</span> <span style="color:{MUTED};">Live</span>')
         v.addWidget(live)
-        version = QLabel(f"v{__version__}")
-        version.setObjectName("Muted")
-        v.addWidget(version)
         changelog_btn = QPushButton("Changelog")
         changelog_btn.setObjectName("LinkButton")
         changelog_btn.setCursor(Qt.CursorShape.PointingHandCursor)
@@ -248,6 +257,27 @@ class MainWindow(QMainWindow):
         v.addWidget(self._restart_btn)
         return bar

+    def _scrollable(self, page: QWidget) -> QScrollArea:
+        """Wrap a page so it scrolls when taller than the window — and so the window can shrink
+        below the page's natural height instead of being pinned to it."""
+        area = QScrollArea()
+        area.setWidget(page)
+        area.setWidgetResizable(True)
+        area.setFrameShape(QFrame.Shape.NoFrame)
+        area.setHorizontalScrollBarPolicy(Qt.ScrollBarPolicy.ScrollBarAlwaysOff)
+        return area
+
+    def _build_footer(self) -> QFrame:
+        bar = QFrame()
+        bar.setObjectName("Footer")
+        h = QHBoxLayout(bar)
+        h.setContentsMargins(14, 5, 16, 5)
+        h.addStretch(1)
+        version = QLabel(f"RigDoctor v{__version__}")
+        version.setObjectName("Muted")
+        h.addWidget(version)
+        return bar
+
     def _restart(self) -> None:
         gui = os.path.join(os.path.dirname(sys.executable), "rigdoctor-gui")
         if os.path.exists(gui):
+2 -1
@@ -114,7 +114,8 @@ class SetupPage(QWidget):
         grid.addWidget(QLabel("CPU temperature alert"), 1, 0)
         grid.addWidget(self._cpu_alert, 1, 1)
         alerts_layout.addLayout(grid)
-        alerts_note = QLabel("GPU-lost and new-version alerts are included whenever notifications are enabled.")
+        alerts_note = QLabel("GPU-lost, critical kernel events (Xid, out-of-memory, disk I/O, PCIe), "
+                             "and new-version alerts are included whenever notifications are enabled.")
         alerts_note.setObjectName("Muted")
         alerts_note.setWordWrap(True)
         alerts_layout.addWidget(alerts_note)
+2
@@ -68,6 +68,8 @@ QMainWindow, #ContentArea, #Page {{ background: {BG}; }}
 QLabel {{ background: transparent; }}

 #Sidebar {{ background: {SIDEBAR}; border-right: 1px solid {CARD_BORDER}; }}
+#Footer {{ background: {SIDEBAR}; border-top: 1px solid {CARD_BORDER}; }}
+#Footer QLabel {{ font-size: 11px; }}

 #AppTitle {{ font-size: 17px; font-weight: 800; }}
 #AppSubtitle {{ color: {MUTED}; font-size: 11px; }}
+46
@@ -114,5 +114,51 @@ class ExplainTests(unittest.TestCase):
         self.assertEqual(headers["x-api-key"], "sk-ant-x")


+class _FakeResp:
+    """A context-managed iterable of byte lines, like urlopen() returns."""
+    def __init__(self, lines):
+        self._lines = [l.encode("utf-8") for l in lines]
+    def __enter__(self):
+        return iter(self._lines)
+    def __exit__(self, *a):
+        return False
+
+
+class StreamTests(unittest.TestCase):
+    def _cfg(self, **over):
+        base = {"ai_provider": "", "ai_model": "", "ai_endpoint": "http://localhost:11434"}
+        base.update(over)
+        return base
+
+    def test_ollama_stream_accumulates_and_callbacks(self):
+        lines = ['{"response": "It is ", "done": false}',
+                 '{"response": "the PSU.", "done": false}',
+                 '{"response": "", "done": true}']
+        chunks = []
+        with mock.patch.object(ai.config, "load_config",
+                               return_value=self._cfg(ai_provider="ollama", ai_model="qwen2.5:7b")), \
+                mock.patch.object(ai, "_stream_request", return_value=_FakeResp(lines)):
+            ok, full = ai.explain_stream("Xid 79", on_chunk=chunks.append)
+        self.assertTrue(ok)
+        self.assertEqual(full, "It is the PSU.")
+        self.assertEqual(chunks, ["It is ", "the PSU."])
+
+    def test_claude_stream_parses_sse(self):
+        lines = [
+            'event: content_block_delta',
+            'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Failing "}}',
+            'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"disk."}}',
+            'data: {"type":"message_stop"}',
+        ]
+        chunks = []
+        with mock.patch.object(ai.config, "load_config", return_value=self._cfg(ai_provider="claude")), \
+                mock.patch.object(ai.config, "load_ai_key", return_value="sk-ant-x"), \
+                mock.patch.object(ai, "_stream_request", return_value=_FakeResp(lines)):
+            ok, full = ai.explain_stream("SMART 197", on_chunk=chunks.append)
+        self.assertTrue(ok)
+        self.assertEqual(full, "Failing disk.")
+        self.assertEqual(chunks, ["Failing ", "disk."])
+
+
 if __name__ == "__main__":
     unittest.main()
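The two stream tests above pin down the wire formats that `ai.explain_stream` has to handle: newline-delimited JSON objects carrying a `response` field for Ollama, and SSE `data:` lines carrying `content_block_delta` events for Claude. The implementation itself is not part of this diff, so the following is only a rough sketch of what such parsing could look like, written against the fixtures in the tests. The helper names `parse_ollama_stream` and `parse_claude_sse` are hypothetical and do not exist in the project.

```python
import json
from typing import Callable, Iterable


def parse_ollama_stream(lines: Iterable[bytes], on_chunk: Callable[[str], None]) -> str:
    """Hypothetical helper: accumulate text from newline-delimited JSON objects."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line:
            continue
        obj = json.loads(line)
        delta = obj.get("response", "")
        if delta:  # the final "done" object carries an empty delta; skip it
            parts.append(delta)
            on_chunk(delta)
        if obj.get("done"):
            break
    return "".join(parts)


def parse_claude_sse(lines: Iterable[bytes], on_chunk: Callable[[str], None]) -> str:
    """Hypothetical helper: accumulate text deltas from SSE "data:" lines."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # ignore "event:" lines and blank separators
        obj = json.loads(line[len("data:"):].strip())
        if obj.get("type") == "message_stop":
            break
        if obj.get("type") == "content_block_delta":
            delta = obj.get("delta", {}).get("text", "")
            if delta:
                parts.append(delta)
                on_chunk(delta)
    return "".join(parts)
```

Passing `chunks.append` as `on_chunk`, as the tests do, collects the deltas in arrival order, while the return value is the full concatenated reply.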
+30
@@ -34,5 +34,35 @@ class AlertTests(unittest.TestCase):
         m.assert_called_once()


+class KernelEventAlertTests(unittest.TestCase):
+    @mock.patch.object(alerts, "notify")
+    def test_kernel_event_fires_once_within_cooldown(self, m):
+        mon = alerts.AlertMonitor(cooldown=300.0, event_interval=0.0)
+        mon._last_kernel_scan = 0.0  # force a scan
+        with mock.patch("rigdoctor.core.syslogs.kernel_log",
+                        return_value="NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus"):
+            mon._scan_kernel_events()
+            mon._last_kernel_scan = 0.0  # force another scan — cooldown must suppress it
+            mon._scan_kernel_events()
+        self.assertEqual(m.call_count, 1)
+        self.assertIn("Xid", m.call_args[0][0])
+
+    @mock.patch.object(alerts, "notify")
+    def test_no_alert_when_kernel_log_empty(self, m):
+        mon = alerts.AlertMonitor(event_interval=0.0)
+        mon._last_kernel_scan = 0.0
+        with mock.patch("rigdoctor.core.syslogs.kernel_log", return_value=""):
+            mon._scan_kernel_events()
+        m.assert_not_called()
+
+    @mock.patch.object(alerts, "notify")
+    def test_scan_gated_by_interval(self, m):
+        mon = alerts.AlertMonitor(event_interval=9999.0)  # just constructed → not due yet
+        with mock.patch("rigdoctor.core.syslogs.kernel_log", return_value="NVRM: Xid 79") as kl:
+            mon._scan_kernel_events()
+        kl.assert_not_called()
+        m.assert_not_called()
+
+
 if __name__ == "__main__":
     unittest.main()
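The tests above exercise two separate gates: a scan interval (a freshly constructed monitor is "not due yet") and a per-event cooldown (a repeated Xid within the cooldown must not notify twice). `AlertMonitor` itself is not shown in this diff, so the sketch below is only one possible way to get that behaviour. The class `EventGate`, its method names, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List


class EventGate:
    """Hypothetical stand-in for the gating logic; not the project's AlertMonitor."""

    def __init__(self, cooldown: float = 300.0, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown = cooldown
        self.interval = interval
        self._clock = clock
        self._last_scan = clock()  # a fresh gate is not due until `interval` has passed
        self._last_fired: Dict[str, float] = {}

    def scan_due(self) -> bool:
        """Return True (and restart the interval) when enough time has passed since the last scan."""
        now = self._clock()
        if now - self._last_scan < self.interval:
            return False
        self._last_scan = now
        return True

    def filter_new(self, labels: List[str]) -> List[str]:
        """Keep only labels that have not fired within the cooldown window."""
        now = self._clock()
        fresh = []
        for label in labels:
            last = self._last_fired.get(label)
            if last is None or now - last >= self.cooldown:
                self._last_fired[label] = now
                fresh.append(label)
        return fresh
```

Injecting the clock keeps the gate testable without sleeping, which plays the same role as the tests resetting `_last_kernel_scan` to force a scan.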
+4
@@ -47,11 +47,15 @@ class StoreTests(unittest.TestCase):
         with mock.patch.object(diagstore, "enabled", return_value=True), \
                 mock.patch("rigdoctor.render.render_summary", return_value="SUMMARY-TEXT"), \
                 mock.patch("rigdoctor.core.gamelogs.collect", return_value="LOG-TEXT"), \
+                mock.patch("rigdoctor.core.syslogs.collect", return_value="SYS-LOG"), \
+                mock.patch("rigdoctor.core.inventory.collect", return_value=[]), \
                 mock.patch.object(diagstore.config, "DIAGNOSTICS_DIR", self.tmp / "diagnostics"):
             directory = diagstore.store(FakeResult())
         self.assertTrue((directory / "result.json").exists())
         self.assertTrue((directory / "report.txt").exists())
         self.assertEqual((directory / "gamelogs.txt").read_text(), "LOG-TEXT")
+        self.assertEqual((directory / "syslogs.txt").read_text(), "SYS-LOG")
+        self.assertTrue((directory / "inventory.txt").exists())  # inventory included for debugging
         data = json.loads((directory / "result.json").read_text())
         self.assertEqual(data["game"], "Path of Exile 2")
         self.assertEqual(len(data["findings"]), 1)
+114
@@ -0,0 +1,114 @@
+"""Tests for M15 session-scoped system-log collection (kernel + coredumps)."""
+
+import unittest
+from unittest import mock
+
+from rigdoctor.core import syslogs
+
+
+class KernelLogTests(unittest.TestCase):
+    def test_passes_since_and_tails(self):
+        with mock.patch("shutil.which", return_value="/usr/bin/journalctl"), \
+                mock.patch.object(syslogs, "_run", return_value="X" * 100 + "TAILLINE") as run:
+            out = syslogs.kernel_log(since=1_000_000_000, max_bytes=8)
+        self.assertEqual(out, "TAILLINE")
+        cmd = run.call_args[0][0]
+        self.assertIn("-k", cmd)
+        self.assertIn("--since", cmd)
+
+    def test_missing_tool_returns_empty(self):
+        with mock.patch("shutil.which", return_value=None):
+            self.assertEqual(syslogs.kernel_log(), "")
+
+
+class CoredumpTests(unittest.TestCase):
+    def test_empty_when_no_coredumps(self):
+        with mock.patch("shutil.which", return_value="/usr/bin/coredumpctl"), \
+                mock.patch.object(syslogs, "_run", return_value="No coredumps found."):
+            self.assertEqual(syslogs.coredumps(), "")
+
+    def test_returns_list(self):
+        with mock.patch("shutil.which", return_value="/usr/bin/coredumpctl"), \
+                mock.patch.object(syslogs, "_run", return_value="TIME PID SIG EXE\n... SEGV PathOfExile"):
+            out = syslogs.coredumps()
+        self.assertIn("PathOfExile", out)
+
+
+class NvidiaTests(unittest.TestCase):
+    def test_missing_tool(self):
+        with mock.patch("shutil.which", return_value=None):
+            self.assertEqual(syslogs.nvidia_snapshot(), "")
+
+    def test_snapshot_head_truncated(self):
+        with mock.patch("shutil.which", return_value="/usr/bin/nvidia-smi"), \
+                mock.patch.object(syslogs, "_run", return_value="DRIVER\n" + "x" * 99999):
+            out = syslogs.nvidia_snapshot(max_bytes=10)
+        self.assertEqual(out, "DRIVER\nxxx")  # head, not tail
+
+
+class DisplayTests(unittest.TestCase):
+    def test_session_type_env(self):
+        with mock.patch.dict("os.environ", {"XDG_SESSION_TYPE": "wayland"}):
+            self.assertEqual(syslogs._session_type(), "wayland")
+
+    def test_x11_tails_xorg_log(self):
+        import tempfile
+        from pathlib import Path
+        log = Path(tempfile.mkdtemp()) / "Xorg.0.log"
+        log.write_text("(EE) NVIDIA(GPU-0): something failed")
+        with mock.patch.object(syslogs, "_session_type", return_value="x11"), \
+                mock.patch.object(syslogs, "_xorg_log", return_value=log):
+            out = syslogs.display_log()
+        self.assertIn("(EE) NVIDIA", out)
+
+    def test_wayland_uses_user_journal(self):
+        with mock.patch.object(syslogs, "_session_type", return_value="wayland"), \
+                mock.patch("shutil.which", return_value="/usr/bin/journalctl"), \
+                mock.patch.object(syslogs, "_run", return_value="gnome-shell: GPU error") as run:
+            out = syslogs.display_log(since=1_000_000_000)
+        self.assertIn("GPU error", out)
+        cmd = run.call_args[0][0]
+        self.assertIn("--user", cmd)
+        self.assertTrue(any(a.startswith("_COMM=") for a in cmd))
+
+
+class ScanCriticalTests(unittest.TestCase):
+    def test_matches_each_category(self):
+        text = "\n".join([
+            "NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus",
+            "Out of memory: Killed process 1234 (PathOfExile)",
+            "mce: [Hardware Error]: CPU 0",
+            "pcieport 0000:00:01.0: AER: Corrected error received",
+            "blk_update_request: I/O error, dev sda, sector 99",
+            "this is a perfectly normal line",
+        ])
+        labels = {label for label, _ in syslogs.scan_critical(text)}
+        self.assertEqual(labels, {
+            "GPU error (Xid)", "Out of memory", "CPU machine-check",
+            "PCIe error", "Disk I/O error"})
+
+    def test_clean_log_no_events(self):
+        self.assertEqual(syslogs.scan_critical("usb 1-2: new high-speed device\nsystemd: started"), [])
+
+
+class CollectTests(unittest.TestCase):
+    def test_collect_combines_sections(self):
+        with mock.patch.object(syslogs, "kernel_log", return_value="NVRM: Xid 79"), \
+                mock.patch.object(syslogs, "coredumps", return_value="game SIGSEGV"), \
+                mock.patch.object(syslogs, "nvidia_snapshot", return_value="Driver Version 595"), \
+                mock.patch.object(syslogs, "display_log", return_value="(EE) NVIDIA"):
+            out = syslogs.collect()
+        for needle in ("Kernel log", "Xid 79", "Crashed processes", "SIGSEGV",
+                       "NVIDIA snapshot", "595", "Display server log"):
+            self.assertIn(needle, out)
+
+    def test_collect_empty_when_nothing(self):
+        with mock.patch.object(syslogs, "kernel_log", return_value=""), \
+                mock.patch.object(syslogs, "coredumps", return_value=""), \
+                mock.patch.object(syslogs, "nvidia_snapshot", return_value=""), \
+                mock.patch.object(syslogs, "display_log", return_value=""):
+            self.assertEqual(syslogs.collect(), "")
+
+
+if __name__ == "__main__":
+    unittest.main()
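`ScanCriticalTests` fixes the contract of `syslogs.scan_critical`: it takes kernel-log text and returns `(label, line)` pairs for five categories, and an empty list for a clean log. The module source is not included in this part of the diff, so here is a minimal sketch of a classifier that would satisfy those tests. The function name `scan_critical_sketch` and every regular expression in `_PATTERNS` are assumptions chosen to match the test fixtures, not the project's actual patterns.

```python
import re
from typing import List, Tuple

# Hypothetical pattern table: labels come from the tests, regexes are illustrative guesses.
_PATTERNS = [
    ("GPU error (Xid)", re.compile(r"NVRM: Xid")),
    ("Out of memory", re.compile(r"Out of memory|oom-kill", re.IGNORECASE)),
    ("CPU machine-check", re.compile(r"mce: \[Hardware Error\]")),
    ("PCIe error", re.compile(r"AER:|PCIe Bus Error")),
    ("Disk I/O error", re.compile(r"I/O error")),
]


def scan_critical_sketch(text: str) -> List[Tuple[str, str]]:
    """Return (label, line) for each line that matches a known critical pattern."""
    events = []
    for line in text.splitlines():
        for label, pattern in _PATTERNS:
            if pattern.search(line):
                events.append((label, line))
                break  # one label per line: the first matching pattern wins
    return events
```

Returning the matching line next to the label gives a caller such as the alert monitor something concrete to put in the notification body.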