Compare commits


4 Commits

Author SHA1 Message Date
jessey 8d6ce47e87 Merge pull request 'feat: D12 Steam-launch wrapper for auto crash-capture + doc status fixes — 0.16.0' (#11) from feat/m6-steam-detection into main
release / release (push) Successful in 14s
Reviewed-on: #11
2026-05-22 07:01:44 +00:00
jessey 03b2dd8363 feat: D12 Steam-launch wrapper for auto crash-capture + doc status fixes — 0.16.0
D12 "build first" wrapper: `rigdoctor wrap %command%` (Steam launch option /
Lutris/Heroic wrapper field) auto-brackets a focused diagnostic around a game —
start a game-tagged capture on launch, clean stop on exit; a hard freeze leaves
it unterminated → flagged as a crash next launch.

- core/wrap.py: game name from SteamAppId, PATH-proof launch_option(), run()
  that doesn't disturb an existing capture and returns the game's exit code.
- diagnostic.start() preserves an unanalyzed crash to diagnostic-crash.jsonl
  before clearing, so auto-relaunch can't wipe an unseen crash; pending_crash/
  analyze_crash check the archive first.
- GUI: "Auto-capture…" helper dialog (copyable launch-option string).
- Tests for wrap (name resolution, exit-code passthrough, no-double-start).
- docs: fix stale MODULES.md status column (M1/M3/M4/M5/M8/M10/M13 → done),
  update ROADMAP/MODULES for the wrapper + crash detection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:59:54 +02:00
jessey ab89dda0b4 Merge pull request 'feat: detect a hard-crashed diagnostic + analyze the crash boot — 0.15.0' (#10) from feat/m6-steam-detection into main
release / release (push) Successful in 13s
Reviewed-on: #10
2026-05-22 06:53:13 +00:00
jessey 305c88ba09 feat: detect a hard-crashed diagnostic + analyze the crash boot — 0.15.0
A focused capture that ends without a clean stop (no session-stop, no live
recorder) is treated as a likely hard freeze.

- core/diagnostic.py: pending_crash() detects the unterminated session;
  acknowledge_crash() dismisses it; analyze_crash() combines the captured window
  (final readings + GPU-lost) with a focused scan of the PREVIOUS (crashed) boot
  + SMART/driver/persistence/temps.
- health.check_previous_boot() scans `journalctl -k -b -1`; run_health_checks
  gained include_journal to avoid double-scanning for the crash path.
- GUI: Games page shows a warning banner on launch for an interrupted diagnostic
  with Analyze crash / Dismiss → results dialog.
- Tests for crash detection / clean-stop / acknowledge / in-progress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:52:59 +02:00
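The detection rule from commit 305c88ba09 (a session start with no clean stop and no acknowledgement) can be sketched without RigDoctor's own `summarize()` helper. This is a minimal illustration, not the project's implementation: the function name `looks_crashed` is hypothetical, and it assumes each JSONL event line is an object with an `event` key, as in the record written by `acknowledge_crash()` further down this diff. The real `pending_crash()` additionally checks that no recorder is still running.

```python
import json


def looks_crashed(lines: list[str]) -> bool:
    """True if the log has a session-start but no clean stop and no acknowledgement."""
    kinds = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a hard freeze can leave a torn final line; skip it
        if isinstance(record, dict) and "event" in record:
            kinds.add(record["event"])
    if "session-start" not in kinds:
        return False  # nothing was recorded, so there is nothing to flag
    return not ({"session-stop", "diagnostic-acknowledged"} & kinds)
```

Skipping undecodable lines is a design choice for this sketch: the last line of a log cut off by a freeze may be incomplete, and that should not hide the crash.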
13 changed files with 509 additions and 22 deletions
+29
@@ -5,6 +5,35 @@ All notable changes to RigDoctor are recorded here. Format follows
(`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
release tag (so the auto-updater, D18, can compare versions).
## [0.16.0] - 2026-05-22
### Added
- **Automatic crash-capture via a Steam launch wrapper (M6/D12).** Set `rigdoctor wrap
%command%` as a game's Steam launch option (or in Lutris/Heroic's wrapper field) and RigDoctor
starts a focused, game-tagged capture when the game launches and stops it cleanly on exit — no
manual Run Diagnostic / Finish. A hard freeze leaves the capture unterminated, so it's flagged
as a crash next launch. The wrapper resolves the game name from Steam's `SteamAppId`, doesn't
disturb an existing capture, and returns the game's exit code. (`core/wrap.py`, `rigdoctor wrap`.)
- GUI **Auto-capture…** helper on the Games page: shows the exact launch-option line (absolute
path, copy button) and how to set it in Steam.
- Auto-capture preserves an unanalyzed crash (`diagnostic-crash.jsonl`) before starting a new
capture, so relaunching the game can't wipe a crash report you haven't seen yet.
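The game-name resolution mentioned above (Steam's `SteamAppId`, with a fallback when the id is unknown) can be sketched as a pure function. This is an illustration under assumptions: `resolve_game_name` and the `known` mapping are hypothetical stand-ins for the wrapper's lookup against the scanned Steam library.

```python
from __future__ import annotations

from collections.abc import Mapping


def resolve_game_name(env: Mapping[str, str], known: Mapping[str, str]) -> str | None:
    """Map Steam's app id (taken from the environment) to a display name."""
    appid = env.get("SteamAppId") or env.get("SteamGameId")
    if not appid:
        return None  # not launched by Steam; the caller can fall back to the executable name
    # Unknown ids still get a usable label instead of failing the launch.
    return known.get(str(appid), f"Steam app {appid}")
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, keeps the sketch easy to test.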
### Fixed
- `docs/MODULES.md` status column was stale — M1, M3, M4, M5, M8, M10, and M13 are done and now
marked ✅ (only M2 and M11 remain not-started; M6/M9/M12 in progress).
## [0.15.0] - 2026-05-22
### Added
- **Hard-crash detection & recovery for the guided diagnostic.** If a focused capture ends
without a clean stop (the recorder never wrote `session-stop` and isn't running), RigDoctor
treats it as a likely hard freeze. On launch the **Games** page shows a warning banner —
*"Your last diagnostic for <game> ended unexpectedly…"* — with **Analyze crash** / **Dismiss**.
- **Deeper crash analysis.** *Analyze crash* combines the captured window (final readings before
the freeze + any GPU-lost event) with a focused scan of the **previous (crashed) boot's kernel
log** (`journalctl -k -b -1`: Xid/panic/OOM/MCE/AER/thermal) plus SMART/driver/persistence/
live-temp checks — the full "what happened" picture. `core/diagnostic.py` gains
`pending_crash()` / `analyze_crash()`; `health.check_previous_boot()` +
`run_health_checks(include_journal=False)` back it.
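To make the kernel-log scan concrete, here is a rough sketch of matching fault signatures in log text. The patterns and the name `scan_kernel_log` are illustrative assumptions; they are not RigDoctor's actual `scan_journal_text` rules, and real kernel messages vary by driver and kernel version.

```python
import re

# Illustrative patterns only (assumed for this sketch).
SIGNATURES = {
    "gpu-xid": re.compile(r"NVRM: Xid"),
    "kernel-panic": re.compile(r"Kernel panic"),
    "oom": re.compile(r"Out of memory"),
    "mce": re.compile(r"Machine check|mce: \[Hardware Error\]", re.IGNORECASE),
}


def scan_kernel_log(text: str) -> dict[str, list[str]]:
    """Group matching log lines by signature name."""
    hits: dict[str, list[str]] = {}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(line.strip())
    return hits
```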
## [0.14.0] - 2026-05-22
### Changed
- **Dashboard headline tiles are now history trend graphs** instead of single-value gauges —
+19 -11
@@ -8,18 +8,18 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
| ID | Module | Bundle | Key deps | GPU scope | Priority | Status |
|----|--------|--------|----------|-----------|----------|--------|
-| M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | |
-| M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | 🟨 |
-| M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | 🟨 |
+| M1 | Sensor core | Essential | none (nvidia-smi, sysfs) | all (NVIDIA first) | P0 | ✅ |
+| M3 | Crash-capture logger | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
+| M4 | Health report (log scan) | Essential | none (opt: smartmontools) | all (NVIDIA first) | P0 | ✅ |
| M2 | Live monitor (TUI) | Monitoring | none (stdlib curses) | all | P1 | ⬜ |
-| M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | 🟨 |
-| M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | 🟨 |
+| M8 | Alerting | Monitoring | libnotify (opt) | all | P2 | ✅ |
+| M5 | System inventory | Diagnostics | none (opt: lm-sensors, dmidecode) | all | P1 | ✅ |
| M6 | Gaming env checks | Diagnostics | none | all | P2 | 🟨 |
-| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | 🟨 |
+| M10 | Desktop GUI | Desktop UI | **python3-pyside6** | all | P2 | ✅ |
| M11 | Tray / menu-bar applet | Desktop UI | **python3-pyside6** (+ AppIndicator on GNOME) | all | P2 | ⬜ |
| M9 | Installer | (meta) | none | all | P1 | 🟨 |
| M12 | Session sharing / remote assist | Sharing | none (Tier 3: tmate/sshx) | all | P3 | 🟨 |
-| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | 🟨 |
+| M13 | Auto-update | (core) | none (stdlib; user-local file swap) | all | P3 | ✅ |
| ~~M7~~ | ~~Stress / repro~~ | — | — | — | — | ❌ dropped (D7) |
## Notes per module
@@ -31,8 +31,10 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
*Implemented (manual trigger):* JSONL log with fsync-per-sample, size-based rotation
(`log_max_bytes`/`log_backups`), GPU-lost/recovered event markers, atomic status file, and
`rigdoctor record run|start|stop|status|report`. The foreground `run` is the systemd-ready
-entrypoint; the service unit + always-on/game-launch triggers (D6/D12) land in Phase 4.
-Also fully driven from the GUI's Recording/Logs page (M10) via shared `core.reccontrol`.
+entrypoint. The **game-launch trigger** is implemented via the D12 wrapper (`rigdoctor wrap
+%command%`, see M6/below); the `systemd --user` service unit + always-on trigger (D6) and the
+zero-config watcher (D12) are still pending. Also fully driven from the GUI's Recording/Logs
+page (M10) via shared `core.reccontrol`.
- **M4 Health report** — turns scattered logs into a prioritized, plain-language findings
list with **suggested** fixes (read-only, D9). Reuses M1 for a live snapshot. Also powers
the **guided diagnostic session** (with M3): pick a game → focused capture → scan →
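The "fsync-per-sample" JSONL logging named in the M3 note is what lets the final readings survive a hard freeze. A minimal sketch of the idea, assuming a hypothetical `append_sample` helper (this is not RigDoctor's `CrashLogWriter`, and it leaves out rotation):

```python
import json
import os


def append_sample(path: str, sample: dict) -> None:
    """Append one JSON line and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(sample) + "\n")
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to commit the data to storage
```

The trade-off is one sync per sample, which costs some I/O but means a freeze loses at most the sample being written.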
@@ -56,8 +58,14 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
for the runtime-reversible tunables (governor / NVIDIA persistence / PCIe ASPM / swappiness /
THP — dropdown + Apply via a single pkexec prompt, `core/fixes.py`) and **one-click install**
of optional tools (GameMode / MangoHud / cpupower, now in the M9 catalog). GRUB/mitigations
-stay suggestion-only. *Pending:* non-Steam launchers (Lutris/Heroic) and GPU power-profile
-(PowerMizer) checks.
+stay suggestion-only. *Guided diagnostic (D12 "pick a game", `core/diagnostic.py`):* a focused
+capture tagged with a game → window-scoped report (capture summary + M4 findings), in the CLI
+(`rigdoctor diagnose start/status/finish`) and GUI (per-game **Run Diagnostic** → recording
+banner → results dialog). **Auto-capture** via the D12 wrapper (`rigdoctor wrap %command%`,
+`core/wrap.py`; GUI "Auto-capture…" helper). **Hard crashes are detected** (capture left
+without a clean stop) and flagged on next launch with a crash-boot kernel-log analysis
+(`pending_crash`/`analyze_crash` + `health.check_previous_boot`). *Pending:* non-Steam
+launchers (Lutris/Heroic), GPU power-profile (PowerMizer) checks, and the zero-config watcher.
- **M8 Alerting** — threshold/event notifications; integrates with the tray applet (M11).
- **M10 Desktop GUI** — PySide6 graphical front-end over the core engine (dashboard, log
browser, report viewer, logger controls). Optional; adds the Qt dependency. *Bootstrapped
+10 -5
@@ -45,11 +45,16 @@ Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).
diagnose start/status/finish`, and a **Run Diagnostic** button per game on the GUI Games
page → recording banner → results dialog with the capture summary + findings). Tags a
focused capture with the chosen game (own diagnostic log, window-scoped report) and
-combines the capture summary with the M4 findings. *Pending:* the tray (M11) entry point,
-and auto start/stop via the D12 wrapper/watcher.
-- [ ] Logger trigger modes: always-on + game-launch (D12 — wrapper first:
-`rigdoctor wrap %command%` + global Steam compat-tool; zero-config watcher
-(Steam RunningAppID + /proc) and GameMode hook follow)
+combines the capture summary with the M4 findings. **Auto start/stop** via the D12
+wrapper is wired in, and a **hard-crash is detected** (capture left without a clean stop)
+→ flagged on next launch with a deeper crash-boot log analysis. *Pending:* the tray (M11)
+entry point and the zero-config watcher.
+- [~] Logger trigger modes: always-on + game-launch (D12) — *game-launch **wrapper** done:*
+`rigdoctor wrap %command%` (per-game Steam launch option / Lutris/Heroic wrapper field)
+auto-brackets a focused capture around the game; GUI "Auto-capture…" helper shows the
+launch-option string. *Pending:* global Steam compat-tool registration, the zero-config
+watcher (Steam RunningAppID + /proc), GameMode hook, and the always-on `systemd --user`
+service.
- [~] M9 interactive installer — *done:* distro/GPU detection + optional-dependency install
(`rigdoctor install`, GUI Setup tab); **user-local `install.sh` + self-extracting `.run`**
(no-root venv install, handles python3-venv prereq, CI-built). *Pending:* module-selection
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "rigdoctor"
-version = "0.14.0"
+version = "0.16.0"
description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
readme = "README.md"
requires-python = ">=3.11"
+1 -1
@@ -1,3 +1,3 @@
"""RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""
-__version__ = "0.14.0"
+__version__ = "0.16.0"
+12
@@ -417,6 +417,12 @@ def cmd_diagnose(args) -> int:
    return 0


def cmd_wrap(args) -> int:
    from .core import wrap

    return wrap.run(args.command)


def cmd_gameenv(args) -> int:
    from dataclasses import asdict
@@ -605,6 +611,12 @@ def build_parser() -> argparse.ArgumentParser:
    diag_finish.add_argument("--last", type=int, default=10, help="recent samples to show")
    diag_finish.set_defaults(func=cmd_diagnose)
    diag_p.set_defaults(func=cmd_diagnose, diagnose_cmd=None, last=10)

    wrap_p = sub.add_parser(
        "wrap", help="run a game with automatic crash-capture (Steam launch option, D12)")
    wrap_p.add_argument("command", nargs=argparse.REMAINDER,
                        help="the game command — use `rigdoctor wrap %%command%%` in Steam")
    wrap_p.set_defaults(func=cmd_wrap)

    return p
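The `wrap` subcommand relies on `nargs=argparse.REMAINDER` to pass the game's own arguments through untouched. The standalone sketch below shows that behaviour on a cut-down parser; the parser layout here is a simplified assumption, not the full RigDoctor CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rigdoctor")
    sub = parser.add_subparsers(dest="cmd")
    wrap_p = sub.add_parser("wrap", help="run a game with automatic crash-capture")
    # REMAINDER collects everything after `wrap`, including the game's own flags.
    # `%%` is needed because argparse %-formats help strings.
    wrap_p.add_argument("command", nargs=argparse.REMAINDER,
                        help="the game command (use `rigdoctor wrap %%command%%` in Steam)")
    return parser


args = build_parser().parse_args(["wrap", "/usr/bin/game", "--fullscreen", "-w", "1920"])
print(args.command)  # ['/usr/bin/game', '--fullscreen', '-w', '1920']
```

Without `REMAINDER`, argparse would try to interpret `--fullscreen` and `-w` as options of `rigdoctor` itself and reject them.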
+3
@@ -26,6 +26,9 @@ LOG_FILE = LOG_DIR / "capture.jsonl"
# Guided diagnostic (M6/D12): a focused capture writes here, separate from the always-on
# crash log, so its report covers only that session's window.
DIAG_LOG = LOG_DIR / "diagnostic.jsonl"
# A crashed (unterminated, unacknowledged) diagnostic is preserved here when a new capture
# starts, so auto-capture (the Steam wrapper) relaunching the game doesn't wipe it first.
DIAG_CRASH = LOG_DIR / "diagnostic-crash.jsonl"
STATUS_FILE = STATE_DIR / "recorder.json"
PID_FILE = STATE_DIR / "recorder.pid"
SPAWN_LOG = STATE_DIR / "recorder.out"
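The reason for a separate `DIAG_CRASH` path is the "preserve before overwrite" step: a crashed log is moved aside before the next capture clears the live file. A small sketch of that pattern, assuming a hypothetical helper `preserve_then_clear` and a caller that already knows whether the live log holds an unseen crash:

```python
from pathlib import Path


def preserve_then_clear(live: Path, archive: Path, has_pending_crash: bool) -> None:
    """Move an unseen crash log aside, then make room for a fresh capture."""
    if has_pending_crash and live.exists():
        try:
            live.replace(archive)  # rename; overwrites any older archive
        except OSError:
            pass  # best effort: bookkeeping should never block a game launch
    live.unlink(missing_ok=True)
```

`Path.replace` is a rename rather than a copy, so on the same filesystem the log is never left half-written in both places.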
+104 -1
@@ -11,13 +11,16 @@ The capture is **manually bracketed** (start/finish) for now; auto start/stop on
from __future__ import annotations
import json
import time
from dataclasses import dataclass
from .. import config
from . import reccontrol
from .crashlog import Summary, summarize
-from .health import Finding
+from .health import CRITICAL, OK, WARNING, Finding
_SEV_ORDER = {CRITICAL: 0, WARNING: 1, "info": 2, OK: 3}
@dataclass
@@ -27,6 +30,14 @@ class DiagnosticResult:
    findings: list[Finding]  # health findings: Xid/SMART/driver/etc. (M4)


@dataclass
class CrashInfo:
    game: str | None
    samples: int
    when: float | None  # ts of the last captured sample (≈ when the freeze hit)
    gpu_lost: bool


def _clear_diag_log() -> None:
    """Each diagnostic is a fresh focused capture — drop any previous session + segments."""
    base = config.DIAG_LOG
@@ -42,6 +53,11 @@ def start(game: str | None = None, interval: float | None = None) -> int | None:
    Returns the pid, or None if a capture is already running."""
    if reccontrol.running_pid():
        return None
    if _crash_from_log(config.DIAG_LOG):  # preserve an unanalyzed crash before overwriting it
        try:
            config.DIAG_LOG.replace(config.DIAG_CRASH)
        except OSError:
            pass
    _clear_diag_log()
    return reccontrol.start_background(interval=interval, out=str(config.DIAG_LOG), game=game)
@@ -82,3 +98,90 @@ def finish(last_n: int = 10, log_path=None) -> DiagnosticResult:
    game = _game_from_summary(summary) or (reccontrol.read_status() or {}).get("game")
    findings = run_health_checks()
    return DiagnosticResult(game=game, summary=summary, findings=findings)


# --- hard-crash detection & post-crash analysis -----------------------------------

def _crash_from_log(path) -> CrashInfo | None:
    """CrashInfo if `path` holds an abnormally-ended session (start, no stop, not acked)."""
    if not path.exists():
        return None
    summary = summarize(path)
    kinds = {kind for _ts, kind, _detail in summary.events}
    if "session-start" not in kinds:
        return None
    if "session-stop" in kinds or "diagnostic-acknowledged" in kinds:
        return None
    return CrashInfo(
        game=_game_from_summary(summary),
        samples=summary.samples,
        when=summary.end,
        gpu_lost="gpu-lost" in kinds,
    )


def _crash_path():
    """Where the pending crash lives: the preserved archive if present, else the live log."""
    return config.DIAG_CRASH if config.DIAG_CRASH.exists() else config.DIAG_LOG


def pending_crash() -> CrashInfo | None:
    """Detect a diagnostic that ended abnormally (no clean stop, no live recorder).

    A focused capture writes `session-start` (+ `game`) and, on a clean stop, `session-stop`.
    After a hard freeze that block never runs, so the log has a start with no stop and no
    live recorder — that's our hard-crash signal. A crash preserved across an auto-relaunch
    (`DIAG_CRASH`) is checked first. Returns None if a capture is running, none is recorded,
    it stopped cleanly, or the user already acknowledged it.
    """
    info = _crash_from_log(config.DIAG_CRASH)  # preserved across a relaunch (wrapper)
    if info is not None:
        return info
    if is_running():
        return None
    return _crash_from_log(config.DIAG_LOG)


def acknowledge_crash() -> None:
    """Mark the recorded crash as seen so it stops prompting."""
    try:
        config.DIAG_CRASH.unlink()  # drop the preserved archive, if any
    except OSError:
        pass
    try:
        config.DIAG_LOG.parent.mkdir(parents=True, exist_ok=True)
        with open(config.DIAG_LOG, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"ts": time.time(), "event": "diagnostic-acknowledged", "detail": ""}) + "\n")
    except OSError:
        pass


def _crash_headline(summary: Summary) -> Finding:
    gpu_lost = any(kind == "gpu-lost" for _ts, kind, _detail in summary.events)
    when = time.strftime("%H:%M:%S", time.localtime(summary.end)) if summary.end else "?"
    detail = (
        f"The capture stopped abruptly at {when} after {summary.samples} samples, with no clean "
        "shutdown recorded — consistent with a hard freeze or power loss."
    )
    if gpu_lost:
        detail += " A GPU-lost event was captured during the session."
    return Finding(
        CRITICAL if gpu_lost else WARNING,
        "Diagnostic",
        "Session ended without a clean stop (likely a hard crash)",
        detail,
        "Review the last readings (Capture, above) and the crash-boot findings below.",
    )


def analyze_crash(last_n: int = 15) -> DiagnosticResult:
    """Analyze a recorded hard crash: the captured window + the previous boot's kernel log
    + the rest of the health report (SMART/driver/persistence/temps)."""
    from .health import check_previous_boot, run_health_checks

    summary = summarize(_crash_path(), last_n=last_n)
    findings: list[Finding] = [_crash_headline(summary)]
    findings += check_previous_boot()  # the crashed boot's kernel log
    findings += run_health_checks(include_journal=False)  # SMART/driver/persistence/temps
    findings.sort(key=lambda f: _SEV_ORDER.get(f.severity, 9))
    return DiagnosticResult(game=_game_from_summary(summary), summary=summary, findings=findings)
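`analyze_crash()` merges findings from several sources and then orders them worst-first with a rank table. The sketch below isolates that sorting step; the severity string values and the two-field `Finding` are simplified assumptions (the real constants live in `health.py` and are not shown in this diff).

```python
from dataclasses import dataclass

# Assumed values for illustration; only the relative ranking matters here.
CRITICAL, WARNING, INFO, OK = "critical", "warning", "info", "ok"
SEV_ORDER = {CRITICAL: 0, WARNING: 1, INFO: 2, OK: 3}


@dataclass
class Finding:
    severity: str
    title: str


def worst_first(findings: list[Finding]) -> list[Finding]:
    # Unknown severities get rank 9, so they sink to the end instead of raising KeyError.
    return sorted(findings, key=lambda f: SEV_ORDER.get(f.severity, 9))
```

Because Python's sort is stable, findings of equal severity keep their insertion order, so the crash headline stays ahead of later findings at the same level.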
+22 -2
@@ -146,6 +146,22 @@ def check_journal() -> list[Finding]:
    return findings


def check_previous_boot() -> list[Finding]:
    """Scan the previous boot's kernel log — the boot that crashed — for fault signatures.

    Needs persistent journald (else the crashed boot's logs were lost on reboot, which the
    persistence check flags separately). Findings are framed as coming from that boot.
    """
    out = _journalctl(["-k", "-b", "-1", "--no-pager", "-o", "cat"])
    if not out or not out.strip():
        return []
    tagged = []
    for f in scan_journal_text(out):
        detail = ("Logged during the previous (crashed) boot. " + (f.detail or "")).strip()
        tagged.append(Finding(f.severity, f.category, f.title, detail, f.suggestion))
    return tagged


def check_journal_persistence() -> list[Finding]:
    if Path("/var/log/journal").is_dir():
        return []
@@ -235,17 +251,21 @@ def check_live_temps() -> list[Finding]:
     )]


-def run_health_checks() -> list[Finding]:
+def run_health_checks(include_journal: bool = True) -> list[Finding]:
     """Run all checks and return findings sorted by severity (worst first).

     SMART needs root; if the session collected it via launch elevation, use that
     instead of re-running smartctl (which would just report "needs root").
+
+    `include_journal=False` skips the 7-day kernel-journal scan — used by the crash
+    analysis, which scans the previous (crashed) boot specifically instead.
     """
     from . import elevation

     findings: list[Finding] = []
     findings += check_nvidia_driver()
-    findings += check_journal()
+    if include_journal:
+        findings += check_journal()
     findings += check_journal_persistence()
     priv = elevation.privileged()
     if priv is not None and priv.get("smart") is not None:
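`check_previous_boot()` depends on a `_journalctl` helper that this diff does not show. As a hedged sketch of what such a call might look like, the function below runs `journalctl -k -b -1` with the same flags and degrades to an empty string when the tool is missing, slow, or has no previous boot on record. The name `previous_boot_kernel_log` and the timeout are assumptions.

```python
import shutil
import subprocess


def previous_boot_kernel_log(timeout: float = 10.0) -> str:
    """Kernel messages from the previous boot, or "" if they can't be read."""
    if shutil.which("journalctl") is None:
        return ""  # not a systemd system, or journalctl is not on PATH
    try:
        proc = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager", "-o", "cat"],
            capture_output=True, text=True, errors="replace",
            timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    # A non-zero exit typically means no previous boot is stored (non-persistent journal).
    return proc.stdout if proc.returncode == 0 else ""
```

Returning an empty string on every failure path lets the caller treat "no data" uniformly, which matches the `if not out or not out.strip()` guard in the function above.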
+78
@@ -0,0 +1,78 @@
"""Steam-launch wrapper (D12): auto-bracket a focused diagnostic around a game.

Set as a per-game Steam launch option — `rigdoctor wrap %command%` — or in Lutris/Heroic's
wrapper field. Steam expands `%command%` to the real game command; we start a focused capture
(tagged with the game), run the game, and stop the capture cleanly when it exits. A hard
freeze means the game (and this wrapper) never returns, so the capture is left without a clean
stop — which RigDoctor then flags as a crash on next launch.

Deterministic and daemonless (D12 "build first"): no polling, and it knows the title.
"""
from __future__ import annotations

import os
import signal
import subprocess
import sys
from pathlib import Path


def game_name_from_env() -> str | None:
    """The launching game's name, resolved from Steam's SteamAppId env var via the scan."""
    appid = os.environ.get("SteamAppId") or os.environ.get("SteamGameId")
    if not appid:
        return None
    from . import steam

    games = steam.cached_games() or steam.scan_games(steam.selected_library_paths())
    for game in games:
        if game.appid == str(appid):
            return game.name
    return f"Steam app {appid}"


def launch_option() -> str:
    """The exact string to paste into Steam's Launch Options (absolute path → PATH-proof)."""
    exe = Path(sys.executable).with_name("rigdoctor")
    prog = str(exe) if exe.exists() else "rigdoctor"
    quoted = f'"{prog}"' if " " in prog else prog
    return f"{quoted} wrap %command%"


def run(command: list[str]) -> int:
    """Start a focused capture (unless one's already running), run the game, then stop it.

    Returns the game's exit code so Steam sees the right status."""
    from . import diagnostic, reccontrol

    if not command:
        print("usage: rigdoctor wrap %command% (set as a Steam launch option)", file=sys.stderr)
        return 2

    game = game_name_from_env() or os.path.basename(command[0])
    started = False
    if not reccontrol.running_pid():  # don't disturb an existing capture
        started = diagnostic.start(game=game) is not None

    proc: subprocess.Popen | None = None

    def _forward(signum, _frame):  # pass Steam's stop signal to the game
        if proc is not None and proc.poll() is None:
            try:
                proc.send_signal(signum)
            except OSError:
                pass

    previous = {sig: signal.signal(sig, _forward) for sig in (signal.SIGTERM, signal.SIGINT)}
    try:
        proc = subprocess.Popen(command)
        rc = proc.wait()
    except (OSError, ValueError, subprocess.SubprocessError) as exc:
        print(f"rigdoctor wrap: couldn't launch the game: {exc}", file=sys.stderr)
        rc = 1
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
        if started:
            reccontrol.stop_background()  # clean stop → no false crash flag
    return rc
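Stripped of Steam specifics, the wrapper is a start/run/stop bracket that hands back the child's exit code. The sketch below distills that shape with hypothetical `start` and `stop` callbacks in place of RigDoctor's capture control; it omits the signal forwarding that `run()` performs.

```python
import subprocess
import sys
from collections.abc import Callable


def run_bracketed(command: list[str], start: Callable[[], None],
                  stop: Callable[[], None]) -> int:
    """Call start(), run the command, always call stop(), return the exit code."""
    start()
    try:
        return subprocess.call(command)
    finally:
        stop()  # runs on normal exit and on exceptions, but not if the machine freezes


events: list[str] = []
rc = run_bracketed(
    [sys.executable, "-c", "raise SystemExit(3)"],
    start=lambda: events.append("start"),
    stop=lambda: events.append("stop"),
)
print(rc, events)  # 3 ['start', 'stop']
```

The last comment in `run_bracketed` is the whole point of the design: a hard freeze is the one case where `stop` never runs, and that missing stop marker is what gets flagged on the next launch.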
+112 -1
@@ -13,10 +13,13 @@ import time
from PySide6.QtCore import Qt, QTimer, Signal
from PySide6.QtWidgets import (
QApplication,
QCheckBox,
QDialog,
QFrame,
QHBoxLayout,
QLabel,
QLineEdit,
QMessageBox,
QPushButton,
QScrollArea,
@@ -26,7 +29,7 @@ from PySide6.QtWidgets import (
from ..config import load_config, update_config
from .diagnostic_dialog import DiagnosticDialog
-from .theme import ACCENT, GOOD, MUTED
+from .theme import ACCENT, GOOD, MUTED, WARN
def _game_row(name: str, sublabel: str, size: str, is_new: bool, appid: str = "", on_diagnose=None) -> QFrame:
@@ -99,6 +102,9 @@ class GamesPage(QWidget):
        self._status = QLabel("")
        self._status.setObjectName("Muted")
        header.addWidget(self._status)
        self._autocap_btn = QPushButton("Auto-capture…")
        self._autocap_btn.clicked.connect(self._show_autocapture)
        header.addWidget(self._autocap_btn)
        self._rescan_btn = QPushButton("Rescan")
        self._rescan_btn.setObjectName("PrimaryButton")
        self._rescan_btn.clicked.connect(self.refresh)
@@ -126,6 +132,27 @@ class GamesPage(QWidget):
        self._banner.hide()
        root.addWidget(self._banner)

        # Hard-crash banner: a previous diagnostic ended without a clean stop.
        self._crash_banner = QFrame()
        self._crash_banner.setObjectName("Card")
        self._crash_banner.setStyleSheet(f"#Card {{ border: 1px solid {WARN}; }}")
        crash_h = QHBoxLayout(self._crash_banner)
        crash_h.setContentsMargins(16, 10, 16, 10)
        crash_h.setSpacing(10)
        self._crash_label = QLabel("")
        self._crash_label.setWordWrap(True)
        self._crash_label.setStyleSheet(f"color: {WARN}; font-weight: 700; background: transparent;")
        crash_h.addWidget(self._crash_label, 1)
        self._analyze_btn = QPushButton("Analyze crash")
        self._analyze_btn.setObjectName("ActionButton")
        self._analyze_btn.clicked.connect(self._analyze_crash)
        crash_h.addWidget(self._analyze_btn)
        self._dismiss_btn = QPushButton("Dismiss")
        self._dismiss_btn.clicked.connect(self._dismiss_crash)
        crash_h.addWidget(self._dismiss_btn)
        self._crash_banner.hide()
        root.addWidget(self._crash_banner)

        self._diag_timer = QTimer(self)
        self._diag_timer.setInterval(1000)
        self._diag_timer.timeout.connect(self._poll_diag)
@@ -163,6 +190,7 @@ class GamesPage(QWidget):
        self._load_cached()  # instant display from the last scan
        QTimer.singleShot(400, self.refresh)  # then rescan in the background on launch
        self._check_crash()  # surface an interrupted (crashed) diagnostic
# --- loading ----------------------------------------------------------------------
@@ -357,8 +385,10 @@ class GamesPage(QWidget):
    def _on_diag_done(self, result) -> None:
        self._banner.hide()
        self._crash_banner.hide()
        self._finish_btn.setEnabled(True)
        self._discard_btn.setEnabled(True)
        self._analyze_btn.setEnabled(True)
        if result is None:
            QMessageBox.warning(self, "RigDoctor", "The diagnostic couldn't be analyzed.")
            return
@@ -371,6 +401,85 @@ class GamesPage(QWidget):
        reccontrol.stop_background()
        self._banner.hide()

    def _show_autocapture(self) -> None:
        from ..core import wrap

        option = wrap.launch_option()
        dlg = QDialog(self)
        dlg.setWindowTitle("Auto-capture in Steam")
        dlg.resize(580, 250)
        v = QVBoxLayout(dlg)
        v.setContentsMargins(20, 18, 20, 16)
        v.setSpacing(12)
        info = QLabel(
            "Capture automatically every time you launch a game — no need to click "
            "Run Diagnostic.\n\n"
            "1. In Steam, right-click the game → Properties → Launch Options.\n"
            "2. Paste the line below.\n\n"
            "RigDoctor starts a focused capture when the game launches and stops it on exit. "
            "If the game hard-freezes, you'll get a crash report next time you open RigDoctor."
        )
        info.setWordWrap(True)
        v.addWidget(info)
        row = QHBoxLayout()
        field = QLineEdit(option)
        field.setReadOnly(True)
        row.addWidget(field, 1)
        copy = QPushButton("Copy")
        copy.setObjectName("PrimaryButton")
        copy.clicked.connect(lambda: QApplication.clipboard().setText(option))
        row.addWidget(copy)
        v.addLayout(row)
        buttons = QHBoxLayout()
        buttons.addStretch(1)
        close = QPushButton("Close")
        close.clicked.connect(dlg.accept)
        buttons.addWidget(close)
        v.addLayout(buttons)
        dlg.exec()

    # --- hard-crash recovery ----------------------------------------------------------

    def _check_crash(self) -> None:
        from ..core import diagnostic

        info = diagnostic.pending_crash()
        if info is None:
            self._crash_banner.hide()
            return
        game = info.game or "your last game"
        extra = " · ⚠ GPU-lost was captured" if info.gpu_lost else ""
        self._crash_label.setText(
            f"⚠ Your last diagnostic for {game} ended unexpectedly — likely a hard crash "
            f"({info.samples} samples{extra}). Analyze it to see the final readings and the "
            f"likely cause from the system logs."
        )
        self._analyze_btn.setEnabled(True)
        self._crash_banner.show()

    def _analyze_crash(self) -> None:
        from ..core import diagnostic

        diagnostic.acknowledge_crash()  # don't prompt again for this one
        self._analyze_btn.setEnabled(False)
        self._crash_label.setText("Analyzing the crash (final readings + system logs)…")
        threading.Thread(target=self._work_analyze_crash, daemon=True).start()

    def _work_analyze_crash(self) -> None:
        from ..core import diagnostic

        try:
            result = diagnostic.analyze_crash()
        except Exception:
            result = None
        self._diag_done.emit(result)

    def _dismiss_crash(self) -> None:
        from ..core import diagnostic

        diagnostic.acknowledge_crash()
        self._crash_banner.hide()

    # --- nav badge integration --------------------------------------------------------

    def showEvent(self, event) -> None:  # noqa: N802 (Qt override)
@@ -392,3 +501,5 @@ class GamesPage(QWidget):
            self._banner.show()
            if not self._diag_timer.isActive():
                self._diag_timer.start()
        else:
            self._check_crash()  # re-surface an interrupted diagnostic if one is pending
+50
@@ -57,5 +57,55 @@ class FinishTests(unittest.TestCase):
self.assertTrue(any(kind == "gpu-lost" for _ts, kind, _d in result.summary.events))


class CrashDetectionTests(unittest.TestCase):
    def _diag_log(self, d) -> Path:
        return Path(d) / "diagnostic.jsonl"

    def test_unterminated_session_is_a_pending_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            _write_log(str(log), "Tarkov")  # has session-start + game, no session-stop
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.config, "DIAG_CRASH", log.with_suffix(".crash")), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                info = diagnostic.pending_crash()
            self.assertIsNotNone(info)
            self.assertEqual(info.game, "Tarkov")
            self.assertTrue(info.gpu_lost)  # _write_log writes a gpu-lost event

    def test_clean_stop_is_not_a_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            w = CrashLogWriter(str(log))
            w.write_event("session-start"); w.write_event("game", "X")
            w.write_sample(Sample(time.time(), [Reading("gpu", "temp", 60.0, "°C", "")]))
            w.write_event("session-stop", "samples=1")
            w.close()
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.config, "DIAG_CRASH", log.with_suffix(".crash")), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                self.assertIsNone(diagnostic.pending_crash())

    def test_acknowledge_clears_pending_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            _write_log(str(log), "Tarkov")
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.config, "DIAG_CRASH", log.with_suffix(".crash")), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                self.assertIsNotNone(diagnostic.pending_crash())
                diagnostic.acknowledge_crash()
                self.assertIsNone(diagnostic.pending_crash())

    def test_running_capture_is_not_a_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            _write_log(str(log), "Tarkov")
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.config, "DIAG_CRASH", log.with_suffix(".crash")), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=4321):
                self.assertIsNone(diagnostic.pending_crash())  # it's in-progress, not crashed


if __name__ == "__main__":
    unittest.main()
+68
@@ -0,0 +1,68 @@
"""Tests for the D12 Steam-launch wrapper (rigdoctor wrap %command%)."""

import unittest
from unittest import mock

from rigdoctor.core import wrap
from rigdoctor.core.steam import Game


class LaunchOptionTests(unittest.TestCase):
    def test_format(self):
        opt = wrap.launch_option()
        self.assertTrue(opt.endswith("wrap %command%"))
        self.assertIn("rigdoctor", opt)


class GameNameTests(unittest.TestCase):
    def test_resolves_from_steam_appid(self):
        g = Game(appid="570", name="Dota 2", library="/x", installdir="dota")
        with mock.patch.dict("os.environ", {"SteamAppId": "570"}), \
             mock.patch("rigdoctor.core.steam.cached_games", return_value=[g]):
            self.assertEqual(wrap.game_name_from_env(), "Dota 2")

    def test_unknown_appid_falls_back(self):
        with mock.patch.dict("os.environ", {"SteamAppId": "999"}), \
             mock.patch("rigdoctor.core.steam.cached_games", return_value=[]), \
             mock.patch("rigdoctor.core.steam.scan_games", return_value=[]):
            self.assertEqual(wrap.game_name_from_env(), "Steam app 999")

    def test_none_without_steam_env(self):
        with mock.patch.dict("os.environ", {}, clear=True):
            self.assertIsNone(wrap.game_name_from_env())


class RunTests(unittest.TestCase):
    def test_brackets_capture_and_returns_exit_code(self):
        with mock.patch("rigdoctor.core.reccontrol.running_pid", return_value=None), \
             mock.patch("rigdoctor.core.diagnostic.start", return_value=123) as start, \
             mock.patch("rigdoctor.core.reccontrol.stop_background") as stop, \
             mock.patch.dict("os.environ", {}, clear=True):
            rc = wrap.run(["true"])
        self.assertEqual(rc, 0)
        start.assert_called_once()
        stop.assert_called_once()

    def test_propagates_game_failure(self):
        with mock.patch("rigdoctor.core.reccontrol.running_pid", return_value=None), \
             mock.patch("rigdoctor.core.diagnostic.start", return_value=123), \
             mock.patch("rigdoctor.core.reccontrol.stop_background"), \
             mock.patch.dict("os.environ", {}, clear=True):
            self.assertEqual(wrap.run(["false"]), 1)

    def test_does_not_touch_an_existing_capture(self):
        with mock.patch("rigdoctor.core.reccontrol.running_pid", return_value=999), \
             mock.patch("rigdoctor.core.diagnostic.start") as start, \
             mock.patch("rigdoctor.core.reccontrol.stop_background") as stop, \
             mock.patch.dict("os.environ", {}, clear=True):
            rc = wrap.run(["true"])
        self.assertEqual(rc, 0)
        start.assert_not_called()
        stop.assert_not_called()

    def test_empty_command_is_usage_error(self):
        self.assertEqual(wrap.run([]), 2)


if __name__ == "__main__":
    unittest.main()
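The wrapper tests isolate themselves from any real Steam environment with `mock.patch.dict("os.environ", ..., clear=True)`. The short sketch below shows what that patching does on its own; `steam_appid` is a hypothetical stand-in for the lookup at the top of `game_name_from_env()`.

```python
from __future__ import annotations

import os
from unittest import mock


def steam_appid() -> str | None:
    """Read the app id the same way the wrapper does (stand-in for illustration)."""
    return os.environ.get("SteamAppId") or os.environ.get("SteamGameId")


with mock.patch.dict("os.environ", {"SteamAppId": "570"}, clear=True):
    inside = steam_appid()   # only the patched variable is visible here
with mock.patch.dict("os.environ", {}, clear=True):
    empty = steam_appid()    # the environment is empty inside this block
print(inside, empty)  # 570 None
```

`clear=True` matters when the test suite itself is started from a Steam session, where `SteamAppId` could otherwise leak into the test. The original environment is restored when each `with` block exits.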