Compare commits

8 Commits

Author SHA1 Message Date
jessey 944945ce72 Merge pull request 'feat(m9): .deb package + CI build/publish — 0.35.0' (#29) from feat/deb-packaging into main
release / test (push) Successful in 13s
release / release (push) Successful in 19s
Reviewed-on: #29
2026-05-22 13:17:19 +00:00
jessey 78cd417d0b feat(m9): .deb package + CI build/publish — 0.35.0
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 28s
packaging/make_deb.py builds rigdoctor_<ver>_all.deb (Architecture: all) via
dpkg-deb, no debhelper: Depends python3; Recommends python3-pyside6/pyte (GUI by
default, --no-install-recommends = CLI only). Installs the package, both
launchers, desktop entry + icon; postinst refreshes the desktop database.
release.yml builds it as a release asset and optionally pushes to the Gitea apt
registry (REGISTRY_TOKEN). Verified locally: valid .deb, packaged launcher runs
'rigdoctor --version'. Docs/README/ROADMAP/MODULES updated; M9 complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:15:33 +02:00
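The artifact name in the commit message above (rigdoctor_<ver>_all.deb) is derived from the package's `__version__` string. A minimal sketch of that lookup and naming, assuming the version is a double-quoted literal on a line starting with `__version__`; the helper names `read_version` and `deb_filename` are hypothetical and are not taken from packaging/make_deb.py:

```python
import re


def read_version(init_text: str) -> str:
    """Return the first double-quoted value on a line starting with __version__."""
    for line in init_text.splitlines():
        if line.startswith("__version__"):
            match = re.search(r'"([^"]+)"', line)
            if match:
                return match.group(1)
    raise ValueError("could not read __version__")


def deb_filename(package: str, version: str, arch: str = "all") -> str:
    """Debian binary packages are named <name>_<version>_<arch>.deb."""
    return f"{package}_{version}_{arch}.deb"


# Illustrative input only; the real file lives at src/rigdoctor/__init__.py.
init_text = '"""RigDoctor."""\n__version__ = "0.35.0"\n'
version = read_version(init_text)
artifact = deb_filename("rigdoctor", version)
print(artifact)
```

Raising on a missing version (rather than returning a default) keeps a release job from silently producing a wrongly named package.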
jessey 856a3305ad Merge pull request 'feat(m8): event-based alerts — Xid/OOM/MCE/PCIe/disk from the kernel log — 0.34.0' (#28) from feat/event-alerts into main
release / test (push) Successful in 13s
release / release (push) Successful in 15s
Reviewed-on: #28
2026-05-22 12:48:41 +00:00
jessey 3b1a2e7393 Merge branch 'feat/event-alerts' of ssh://jesseyvanofferen.com:2222/jessey/rigdoctor into feat/event-alerts
tests / core (pull_request) Successful in 11s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-22 14:42:53 +02:00
jessey 2989e8e23e ci: run tests.yml on pull_request only (no push) to avoid double runs
A branch with an open PR triggered both the push and pull_request events, running
every job twice. Trigger on pull_request only; pushes to main are already tested
by release.yml's `test` job. No version bump (CI config only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:42:41 +02:00
jessey 670df23e06 Merge branch 'main' into feat/event-alerts
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 26s
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-22 12:41:34 +00:00
jessey 2ee7763d00 feat(m8): event-based alerts — Xid/OOM/MCE/PCIe/disk from the kernel log — 0.34.0
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 27s
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 26s
AlertMonitor now scans the kernel log (journalctl -k) every ~30s and fires
one-shot, cooldown-gated desktop alerts on critical events: NVIDIA Xid, OOM
kills, CPU machine-checks, PCIe AER, and disk I/O errors — so users are warned
the moment something goes wrong, not only on a temperature threshold. Disk I/O
errors come from the kernel log (no root needed, unlike smartctl). Edge/spam
protection reuses the existing cooldown model. syslogs.scan_critical() does the
matching; init seeds last-scan to "now" so old boot logs don't alert on launch.
Tests for the matcher + monitor gating/cooldown; Settings note updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 14:41:13 +02:00
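The "one-shot, cooldown-gated" behaviour described in the commit message above can be sketched in isolation. This is an illustration under stated assumptions, not the AlertMonitor code: `CooldownGate` and its injectable `clock` parameter are hypothetical names, added here so the timing can be exercised without waiting in real time.

```python
import time
from typing import Callable


class CooldownGate:
    """Allow an event key through at most once per cooldown period."""

    def __init__(self, cooldown: float, clock: Callable[[], float] = time.time):
        self.cooldown = cooldown
        self._clock = clock
        self._last: dict[str, float] = {}  # key -> time the key last fired

    def allow(self, key: str) -> bool:
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down: suppress
        self._last[key] = now
        return True


# Drive the gate with a fake clock (values are arbitrary seconds).
fake_now = [1000.0]
gate = CooldownGate(300.0, clock=lambda: fake_now[0])

results = [gate.allow("kernel:GPU error (Xid)")]      # first sighting
fake_now[0] += 30
results.append(gate.allow("kernel:GPU error (Xid)"))  # 30 s later, same key
results.append(gate.allow("kernel:Out of memory"))    # different key
fake_now[0] += 300
results.append(gate.allow("kernel:GPU error (Xid)"))  # 330 s after the first
print(results)
```

Keying the gate per category means a burst of Xid lines cannot hide an unrelated OOM kill that happens in the same window.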
jessey bd6cad5a42 Merge pull request 'feat(ai): stream explanations live (Ollama NDJSON + Claude SSE) — 0.33.0' (#27) from feat/syslogs into main
release / test (push) Successful in 12s
tests / core (push) Successful in 12s
tests / gui-smoke (push) Successful in 25s
release / release (push) Successful in 15s
Reviewed-on: #27
2026-05-22 12:35:11 +00:00
14 changed files with 303 additions and 15 deletions
+20
@@ -43,6 +43,9 @@ jobs:
- name: Build self-extracting installer (.run)
run: python packaging/make_run.py
- name: Build .deb
run: python packaging/make_deb.py
- name: Read version
id: ver
run: |
@@ -103,3 +106,20 @@ jobs:
"${API}/releases/${rid}/assets?name=$(basename "$f")" >/dev/null
done
echo "Published ${TAG}."
- name: Publish .deb to the Gitea apt registry (optional — needs REGISTRY_TOKEN)
env:
PKG_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
run: |
set -euo pipefail
if [ -z "${PKG_TOKEN:-}" ]; then
echo "REGISTRY_TOKEN not set; skipping apt publish (the .deb is still a release asset)."
exit 0
fi
OWNER="${{ github.repository_owner }}"
URL="${{ github.server_url }}/api/packages/${OWNER}/debian/pool/stable/main/upload"
for f in dist/*.deb; do
echo "Uploading $(basename "$f") to the apt registry…"
curl -sS --fail --user "${OWNER}:${PKG_TOKEN}" --upload-file "$f" "$URL"
done
echo "apt source: deb ${{ github.server_url }}/api/packages/${OWNER}/debian stable main"
+4 -3
@@ -1,14 +1,15 @@
name: tests
run-name: Run test suite
# Runs the unittest suite on every push and pull request. Two jobs:
# Runs the unittest suite on pull requests (once per PR). Pushes to main are covered by the
# `test` job in release.yml, so we don't trigger on push here — that would double every run.
# Two jobs:
# core — stdlib-only install; the GUI tests skip (@skipUnless HAVE_QT). Bulletproof.
# gui-smoke — installs the GUI extra + offscreen Qt libs and runs the same suite headless,
# exercising the MainWindow/SetupWizard/DiagnosticDialog construction tests.
# Make `core` a required status check on `main` so a PR can't merge with failing tests.
# Make `tests / core (pull_request)` a required status check on `main` so a PR can't merge red.
on:
push:
pull_request:
jobs:
+18
@@ -5,6 +5,24 @@ All notable changes to RigDoctor are recorded here. Format follows
(`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
release tag (so the auto-updater, D18, can compare versions).
## [0.35.0] - 2026-05-22
### Added
- **`.deb` package (M9 / D8)** — `packaging/make_deb.py` builds a `rigdoctor_<version>_all.deb`
(pure-Python, `Architecture: all`) via `dpkg-deb`: `Depends: python3`, with the GUI deps
(`python3-pyside6`, `python3-pyte`) as **Recommends** so `sudo apt install ./rigdoctor_*.deb`
gives the full app and `--no-install-recommends` gives CLI-only. Installs the package, both
launchers, the desktop entry, and the icon. CI (`release.yml`) builds it as a **release asset**
every release, and optionally publishes it to the Gitea **apt registry** (set a `REGISTRY_TOKEN`
secret) for `sudo apt install rigdoctor`. **M9 is now complete.**
## [0.34.0] - 2026-05-22
### Added
- **Event-based alerts (M8).** Beyond temperature + GPU-lost, RigDoctor now notifies on
**critical kernel events** — Xid (GPU error), out-of-memory kills, CPU machine-checks, PCIe
AER errors, and disk I/O errors — scanned from the kernel log every ~30s while monitoring and
fired one-shot (cooldown-gated, so no spam). A proactive warning the moment something goes
wrong, not just on a temperature threshold. Included whenever desktop notifications are on.
## [0.33.0] - 2026-05-22
### Added
- **AI explanations stream live.** "Explain with AI" now fills token-by-token as the model
+20
@@ -78,6 +78,26 @@ also ships a one-file **`.run`** installer (download, `chmod +x`, run). Updates
accounts on the Git server (a Personal Access Token); save one via the GUI **Setup → Update
access** panel or `rigdoctor login`, then `rigdoctor update` (or the sidebar button).
## Install (`.deb`, system-wide)
Each release also ships a **`.deb`** (`Architecture: all`, M9/D8). Download it from the release
and install with apt (pulls the GUI deps — PySide6/pyte — via Recommends):
```bash
sudo apt install ./rigdoctor_<version>_all.deb # CLI-only: add --no-install-recommends
```
When the apt registry is enabled on the server, you can instead add it as a source and
`sudo apt update && sudo apt install rigdoctor` (with `apt upgrade` for updates):
```bash
curl -fsSL https://git.jesseyvanofferen.com/api/packages/jessey/debian/repository.key \
| sudo tee /etc/apt/keyrings/gitea-rigdoctor.asc > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/gitea-rigdoctor.asc] \
https://git.jesseyvanofferen.com/api/packages/jessey/debian stable main" \
| sudo tee /etc/apt/sources.list.d/rigdoctor.list
```
## Run it (dev)
Stdlib-only, no install needed (target is Python ≥ 3.11; tested on 3.14):
+1 -1
@@ -18,7 +18,7 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ✅ |
| M9 | Installer | (meta) | none | all | P1 | 🟨 |
| M9 | Installer (+ `.deb`) | (meta) | none | all | P1 | ✅ |
| M12 | Session sharing (shared terminal) | Sharing | none (relay) | all | P3 | ✅ |
| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| M14 | AI assistant (explain diagnostics) | (optional) | none (stdlib urllib; Ollama or Claude) | all | P3 | ✅ |
+6 -3
@@ -67,9 +67,12 @@ Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).
Settings "Recording trigger") incl. the zero-config **game-launch watcher**
(`core/watcher.py`, `rigdoctor watch`); and a **graphical first-run setup wizard**
(`gui/setup_wizard.py`): environment → dependency-bundle selection → install → recording
trigger → readiness, auto-launched by install.sh and re-runnable from Settings.
*Pending:* `.deb` packaging (next bullet).
- [ ] `.deb` packaging (D8) declaring per-bundle deps incl. python3-pyside6 for Desktop UI
trigger → readiness, auto-launched by install.sh and re-runnable from Settings; and a
**`.deb`** (`packaging/make_deb.py`, `Architecture: all`, `Depends: python3`,
`Recommends: python3-pyside6/pyte`) built + published in CI (release asset + optional
Gitea apt registry). **M9 complete.**
- [x] `.deb` packaging (D8) — built via `dpkg-deb` (no debhelper); GUI deps as Recommends so
`apt install rigdoctor` includes the Desktop UI, `--no-install-recommends` = CLI only.
## Phase 5 — Breadth (later)
- [ ] AMD GPU support in M1 (Steam Deck / Radeon)
+116
@@ -0,0 +1,116 @@
"""Build a `.deb` for RigDoctor (M9 / D8) — dependency-light, no debhelper.
Pure-Python app, so it's `Architecture: all`: we stage the package into dist-packages, drop the
two launchers in /usr/bin, install the desktop entry + icon, write a DEBIAN/control, and call
`dpkg-deb`. The core is stdlib (`Depends: python3`); the GUI/tray deps are **Recommends**
(`python3-pyside6`, `python3-pyte`) so `apt install rigdoctor` gives the full app by default,
while `--no-install-recommends` yields a CLI-only install.
Run: `python packaging/make_deb.py` → `dist/rigdoctor_<version>_all.deb`.
"""
from __future__ import annotations
import shutil
import subprocess
import sys
from pathlib import Path
ROOT = Path(__file__).resolve().parents[1]
DIST = ROOT / "dist"
MAINTAINER = "Jessey van Offeren <jjvanofferen@gmail.com>"
HOMEPAGE = "https://git.jesseyvanofferen.com/jessey/rigdoctor"
def _version() -> str:
text = (ROOT / "src" / "rigdoctor" / "__init__.py").read_text(encoding="utf-8")
for line in text.splitlines():
if line.startswith("__version__"):
return line.split('"')[1]
raise SystemExit("could not read __version__")
_LAUNCHER = """\
#!/usr/bin/python3
import sys
from {module} import main
sys.exit(main())
"""
_DESKTOP = """\
[Desktop Entry]
Type=Application
Name=RigDoctor
Comment=Hardware monitoring & crash diagnostics for Linux gamers
Exec=rigdoctor-gui
Icon=rigdoctor
Terminal=false
Categories=System;Monitor;Utility;
StartupWMClass=rigdoctor
"""
_CONTROL = """\
Package: rigdoctor
Version: {version}
Architecture: all
Maintainer: {maintainer}
Section: utils
Priority: optional
Depends: python3 (>= 3.11)
Recommends: python3-pyside6, python3-pyte
Homepage: {homepage}
Description: Hardware monitoring & crash diagnostics for Linux gamers
RigDoctor monitors GPU/CPU temperatures, load, and sensors, captures crash
diagnostics while gaming, scans logs (Xid/SMART/kernel) for problems, and can
explain them in plain language. The CLI and background daemon are pure Python
(stdlib only); the optional desktop GUI and system-tray applet use PySide6,
pulled in via Recommends. Install with --no-install-recommends for CLI only.
"""
def _write(path: Path, text: str, mode: int = 0o644) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(text, encoding="utf-8")
path.chmod(mode)
def build() -> Path:
version = _version()
DIST.mkdir(exist_ok=True)
stage = DIST / f"rigdoctor_{version}_all"
if stage.exists():
shutil.rmtree(stage)
# Python package → dist-packages (importable system-wide), minus bytecode.
pkg_dst = stage / "usr/lib/python3/dist-packages/rigdoctor"
shutil.copytree(ROOT / "src" / "rigdoctor", pkg_dst,
ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
# Launchers.
_write(stage / "usr/bin/rigdoctor", _LAUNCHER.format(module="rigdoctor.cli"), 0o755)
_write(stage / "usr/bin/rigdoctor-gui", _LAUNCHER.format(module="rigdoctor.gui.app"), 0o755)
# Desktop entry + icon.
_write(stage / "usr/share/applications/rigdoctor.desktop", _DESKTOP)
icon = ROOT / "src" / "rigdoctor" / "gui" / "assets" / "rigdoctor.svg"
_write(stage / "usr/share/icons/hicolor/scalable/apps/rigdoctor.svg",
icon.read_text(encoding="utf-8"))
# Refresh the desktop database on install/remove (best-effort).
_write(stage / "DEBIAN/postinst",
"#!/bin/sh\nset -e\nupdate-desktop-database -q 2>/dev/null || true\n", 0o755)
_write(stage / "DEBIAN/postrm",
"#!/bin/sh\nset -e\nupdate-desktop-database -q 2>/dev/null || true\n", 0o755)
_write(stage / "DEBIAN/control",
_CONTROL.format(version=version, maintainer=MAINTAINER, homepage=HOMEPAGE))
out = DIST / f"rigdoctor_{version}_all.deb"
subprocess.run(["dpkg-deb", "--root-owner-group", "--build", str(stage), str(out)], check=True)
shutil.rmtree(stage)
return out
if __name__ == "__main__":
path = build()
print(f"built {path}")
sys.exit(0)
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "rigdoctor"
version = "0.33.0"
version = "0.35.0"
description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
readme = "README.md"
requires-python = ">=3.11"
+1 -1
@@ -1,3 +1,3 @@
"""RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""
__version__ = "0.33.0"
__version__ = "0.35.0"
+41 -5
@@ -1,8 +1,9 @@
"""Desktop alerts (M8): notify on overheat / GPU-lost / new version via notify-send.
"""Desktop alerts (M8): notify on overheat / GPU-lost / critical kernel events / new version.
Edge-triggered: an alert fires when a condition becomes true (not every sample), and
can fire again only after it has cleared and a cooldown has passed — so a hot GPU or a
1-Hz sample loop doesn't spam notifications. Degrades to a no-op if notify-send is absent.
Edge-triggered: a sustained condition (hot GPU, GPU-lost) fires once when it becomes true and
can re-fire only after it clears + a cooldown; momentary **kernel events** (Xid, OOM-kill, MCE,
PCIe AER, disk I/O errors) are scanned from the kernel log every `event_interval` seconds and
fire one-shot (cooldown-gated). So a 1-Hz sample loop never spams. No-op if notify-send absent.
"""
from __future__ import annotations
@@ -57,13 +58,16 @@ def notify(title: str, message: str, urgency: str = "normal") -> bool:
class AlertMonitor:
"""Evaluate samples and raise edge-triggered desktop alerts."""
def __init__(self, gpu_temp: float = 90.0, cpu_temp: float = 95.0, cooldown: float = 300.0):
def __init__(self, gpu_temp: float = 90.0, cpu_temp: float = 95.0, cooldown: float = 300.0,
event_interval: float = 30.0):
self.gpu_temp = gpu_temp
self.cpu_temp = cpu_temp
self.cooldown = cooldown
self.event_interval = event_interval # how often to scan the kernel log
self.enabled = True
self._active: dict[str, bool] = {}
self._last: dict[str, float] = {}
self._last_kernel_scan = time.time() # only alert on events after the monitor starts
def _fire(self, key: str, title: str, message: str, urgency: str = "critical") -> None:
if self._active.get(key):
@@ -75,9 +79,39 @@ class AlertMonitor:
self._last[key] = now
notify(title, message, urgency)
def _notify_once(self, key: str, title: str, message: str, urgency: str = "critical") -> None:
"""One-shot alert for a momentary event (cooldown-gated, no active latch)."""
now = time.time()
if now - self._last.get(key, 0.0) < self.cooldown:
return
self._last[key] = now
notify(title, message, urgency)
def _clear(self, key: str) -> None:
self._active[key] = False
def _scan_kernel_events(self) -> None:
"""Periodically scan the kernel log for new critical events (Xid/OOM/MCE/PCIe/disk)."""
now = time.time()
if now - self._last_kernel_scan < self.event_interval:
return
since = self._last_kernel_scan
self._last_kernel_scan = now
try:
from . import syslogs
text = syslogs.kernel_log(since=since)
except Exception: # alerting must never crash the sample loop
return
if not text:
return
seen: set[str] = set()
for label, line in syslogs.scan_critical(text):
if label in seen: # one alert per category per scan
continue
seen.add(label)
self._notify_once(f"kernel:{label}", label, line[:180])
def check(self, sample: Sample) -> None:
if not self.enabled:
return
@@ -107,3 +141,5 @@ class AlertMonitor:
self._fire("gpu_lost", "GPU not responding", "nvidia-smi query timed out — the GPU may have dropped")
else:
self._clear("gpu_lost")
self._scan_kernel_events() # Xid / OOM / MCE / PCIe / disk I/O from the kernel log
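The scan scheduling in the hunk above (seed the last-scan time to "now", skip until `event_interval` has elapsed, then read the log from the previous scan time) can be shown on its own. A minimal sketch, assuming an injectable clock; `ScanWindow` and `due` are hypothetical names used only for this illustration:

```python
from __future__ import annotations

from typing import Callable


class ScanWindow:
    """Decide when a periodic log scan is due and which start time it should use."""

    def __init__(self, interval: float, clock: Callable[[], float]):
        self.interval = interval
        self._clock = clock
        # Seeding to "now" means entries logged before start-up are never scanned.
        self._last_scan = clock()

    def due(self) -> float | None:
        """Return the 'since' timestamp if a scan is due, else None."""
        now = self._clock()
        if now - self._last_scan < self.interval:
            return None
        since = self._last_scan
        self._last_scan = now  # the next window starts where this one ends
        return since


fake_now = [100.0]
window = ScanWindow(30.0, clock=lambda: fake_now[0])

observed = [window.due()]      # t=100: just constructed, not due
fake_now[0] = 129.0
observed.append(window.due())  # t=129: 29 s elapsed, still not due
fake_now[0] = 130.0
observed.append(window.due())  # t=130: due, scan from t=100
fake_now[0] = 165.0
observed.append(window.due())  # t=165: due again, scan from t=130
print(observed)
```

Because each window begins exactly where the previous one ended, consecutive scans neither overlap nor leave a gap in the log coverage.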
+24
@@ -13,6 +13,7 @@ Best-effort and size-bounded: degrades silently if a tool is missing or access i
from __future__ import annotations
import os
import re
import shutil
import subprocess
import time
@@ -118,6 +119,29 @@ def display_log(since: float | None = None, max_bytes: int = _MAX) -> str:
return _tail_file(log, max_bytes) if log else ""
# Kernel-log patterns worth alerting on in real time (M8 event alerts). (label, regex).
_CRITICAL = [
("GPU error (Xid)", re.compile(r"NVRM:\s*Xid", re.I)),
("Out of memory", re.compile(r"out of memory|oom-kill|killed process \d+", re.I)),
("CPU machine-check", re.compile(r"\bmce:|machine check", re.I)),
("PCIe error", re.compile(r"\bAER:|pcie bus error", re.I)),
("Disk I/O error", re.compile(
r"buffer i/o error|\bi/o error\b|critical medium error|ext4-fs error|"
r"blk_update_request:.*error|ata\d+.*(?:failed|error)", re.I)),
]
def scan_critical(text: str) -> list[tuple[str, str]]:
"""(label, line) for kernel lines matching a critical pattern (first match per line)."""
events: list[tuple[str, str]] = []
for line in text.splitlines():
for label, pat in _CRITICAL:
if pat.search(line):
events.append((label, line.strip()))
break
return events
def available() -> bool:
return bool(shutil.which("journalctl") or shutil.which("coredumpctl")
or shutil.which("nvidia-smi") or _xorg_log())
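Two rules from this change interact: a line is labelled by the first pattern that matches it, and a scan raises at most one alert per category. A self-contained sketch of both, using a trimmed copy of the patterns in the hunk above; the function names `first_match` and `alerts_for_scan` are hypothetical, and the second Xid line in the sample log is made up for illustration:

```python
from __future__ import annotations

import re

# Trimmed copy of the (label, regex) table; order decides which label wins.
PATTERNS = [
    ("GPU error (Xid)", re.compile(r"NVRM:\s*Xid", re.I)),
    ("Out of memory", re.compile(r"out of memory|oom-kill|killed process \d+", re.I)),
    ("PCIe error", re.compile(r"\bAER:|pcie bus error", re.I)),
    ("Disk I/O error", re.compile(r"buffer i/o error|\bi/o error\b", re.I)),
]


def first_match(line: str) -> str | None:
    """Label of the first pattern that matches the line, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def alerts_for_scan(text: str) -> dict[str, str]:
    """Keep only the first matching line per category for one scan."""
    seen: dict[str, str] = {}
    for line in text.splitlines():
        label = first_match(line)
        if label is not None and label not in seen:
            seen[label] = line.strip()
    return seen


log = "\n".join([
    "NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus",
    "NVRM: Xid (PCI:0000:01:00): 79, repeated report",
    "pcieport 0000:00:01.0: AER: Corrected error received",
    "blk_update_request: I/O error, dev sda, sector 99",
    "this is a perfectly normal line",
])
alerts = alerts_for_scan(log)
for label, line in alerts.items():
    print(f"{label}: {line}")
```

Since the first match wins, more specific patterns belong earlier in the table than broad ones such as the generic I/O error match.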
+2 -1
@@ -114,7 +114,8 @@ class SetupPage(QWidget):
grid.addWidget(QLabel("CPU temperature alert"), 1, 0)
grid.addWidget(self._cpu_alert, 1, 1)
alerts_layout.addLayout(grid)
alerts_note = QLabel("GPU-lost and new-version alerts are included whenever notifications are enabled.")
alerts_note = QLabel("GPU-lost, critical kernel events (Xid, out-of-memory, disk I/O, PCIe), "
"and new-version alerts are included whenever notifications are enabled.")
alerts_note.setObjectName("Muted")
alerts_note.setWordWrap(True)
alerts_layout.addWidget(alerts_note)
+30
@@ -34,5 +34,35 @@ class AlertTests(unittest.TestCase):
m.assert_called_once()
class KernelEventAlertTests(unittest.TestCase):
@mock.patch.object(alerts, "notify")
def test_kernel_event_fires_once_within_cooldown(self, m):
mon = alerts.AlertMonitor(cooldown=300.0, event_interval=0.0)
mon._last_kernel_scan = 0.0 # force a scan
with mock.patch("rigdoctor.core.syslogs.kernel_log",
return_value="NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus"):
mon._scan_kernel_events()
mon._last_kernel_scan = 0.0 # force another scan — cooldown must suppress it
mon._scan_kernel_events()
self.assertEqual(m.call_count, 1)
self.assertIn("Xid", m.call_args[0][0])
@mock.patch.object(alerts, "notify")
def test_no_alert_when_kernel_log_empty(self, m):
mon = alerts.AlertMonitor(event_interval=0.0)
mon._last_kernel_scan = 0.0
with mock.patch("rigdoctor.core.syslogs.kernel_log", return_value=""):
mon._scan_kernel_events()
m.assert_not_called()
@mock.patch.object(alerts, "notify")
def test_scan_gated_by_interval(self, m):
mon = alerts.AlertMonitor(event_interval=9999.0) # just constructed → not due yet
with mock.patch("rigdoctor.core.syslogs.kernel_log", return_value="NVRM: Xid 79") as kl:
mon._scan_kernel_events()
kl.assert_not_called()
m.assert_not_called()
if __name__ == "__main__":
unittest.main()
+19
@@ -72,6 +72,25 @@ class DisplayTests(unittest.TestCase):
self.assertTrue(any(a.startswith("_COMM=") for a in cmd))
class ScanCriticalTests(unittest.TestCase):
def test_matches_each_category(self):
text = "\n".join([
"NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus",
"Out of memory: Killed process 1234 (PathOfExile)",
"mce: [Hardware Error]: CPU 0",
"pcieport 0000:00:01.0: AER: Corrected error received",
"blk_update_request: I/O error, dev sda, sector 99",
"this is a perfectly normal line",
])
labels = {label for label, _ in syslogs.scan_critical(text)}
self.assertEqual(labels, {
"GPU error (Xid)", "Out of memory", "CPU machine-check",
"PCIe error", "Disk I/O error"})
def test_clean_log_no_events(self):
self.assertEqual(syslogs.scan_critical("usb 1-2: new high-speed device\nsystemd: started"), [])
class CollectTests(unittest.TestCase):
def test_collect_combines_sections(self):
with mock.patch.object(syslogs, "kernel_log", return_value="NVRM: Xid 79"), \