Compare commits

10 Commits

Author SHA1 Message Date
jessey 8094a6f8c1 chore(release): v0.42.1
tests / core (pull_request) Successful in 14s
tests / gui-smoke (pull_request) Successful in 30s
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:33:25 +02:00
jessey a05fb0e9d6 fix(games): let the GUI Add-game dialog link a launcher & log folder
The "Add game…" button only prompted for a name (single-field QInputDialog),
so a custom game couldn't be given its launch command or log dir from the GUI —
the command/logdir were CLI-only, leaving SPT unlaunchable from the app. Replace
it with a proper dialog: name + an optional launch command/script (with a file
browser) + an optional log folder (auto-detected from the script's folder when
left blank), wired to the existing customgames.add(command=, logdir=).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:32:54 +02:00
jessey ac4863b0d4 Merge pull request 'feat(health): detect no-Xid GPU freezes (open-module VA-space faults)' (#46) from feat/gpu-vaspace-spt into main
release / test (push) Successful in 13s
release / release (push) Successful in 17s
Reviewed-on: #46
2026-05-29 14:10:58 +00:00
jessey b65f36bb2d Merge branch 'main' into feat/gpu-vaspace-spt
tests / core (pull_request) Successful in 12s
tests / gui-smoke (pull_request) Successful in 29s
2026-05-29 14:10:01 +00:00
jessey 0f9cb4b684 chore(release): v0.42.0
tests / core (pull_request) Successful in 17s
tests / gui-smoke (pull_request) Successful in 29s
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:09:02 +02:00
jessey b9bfec961c feat(games): manually add games (e.g. SPT) with launch + own logs
Some titles never show up in a Steam/Lutris/Heroic scan — standalone mod
launchers like SPT (Single-Player Tarkov), itch.io downloads, hand-installed
executables. Add a user-authored custom-games list (core/customgames.py) shown
alongside the other sources in `rigdoctor games` and the GUI.

Each entry can carry a launch command and a log directory:
  - `rigdoctor games add "SPT" --command .../tarkov.sh` (logs/ auto-detected)
  - `rigdoctor games play "SPT"` launches it under the crash-capture wrapper
    (wrap.run gains an explicit game-name override, since there's no SteamAppId)
  - the diagnostic now feeds the game's own logs to the analysis: gamelogs
    .collect(game=...) tails the registered log dir (SPT's server/launcher logs)
    alongside the kernel log, freshness-scoped by mtime.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:07:25 +02:00
jessey b1bc961b79 feat(health): detect no-Xid GPU freezes (open-module VA-space faults)
The kernel-log scanner only caught Xid codes, OOM, panic, MCE, AER, thermal,
and amdgpu resets — so a hard freeze that logs NO Xid slipped through entirely.
Add detection for the NVIDIA open-kernel-module VA-space mapping fault
(gpu_vaspace.c / dmaAllocMapping / NVKMS GEM-allocation failures), which can
storm for minutes and end in a freeze without the GPU ever "falling off the
bus". Also flag when the open kernel module (nvidia-*-open) is loaded — the
context behind these faults — and add an AI-knowledge entry so the assistant
distinguishes it from the Xid 79 hardware drop.
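
A minimal sketch of the matching step described above, assuming the kernel log is already available as one text blob. The marker strings are the ones this change names; `VA_SPACE_MARKERS` and `count_va_space_faults` are hypothetical names for illustration, not the scanner's actual API:

```python
VA_SPACE_MARKERS = (
    "gpu_vaspace.c",
    "_gvaspaceMappingInsert",
    "dmaAllocMapping",
    "NVKMS memory for GEM object",
)


def count_va_space_faults(journal_text: str) -> int:
    """Count kernel-log lines that carry any VA-space fault marker."""
    return sum(
        1
        for line in journal_text.splitlines()
        if any(marker in line for marker in VA_SPACE_MARKERS)
    )
```

A non-zero count is what would justify a finding; the count itself is worth reporting because these faults tend to repeat for minutes before the freeze.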

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:07:14 +02:00
jessey 410f8882ee Merge pull request 'feat(ai): import & analyze Windows crash dumps (.dmp) — 0.41.0' (#45) from feat/ram-speed into main
release / test (push) Successful in 12s
release / release (push) Successful in 14s
Reviewed-on: #45
2026-05-25 16:41:03 +00:00
jessey 1da7816741 Merge branch 'main' into feat/ram-speed
tests / core (pull_request) Successful in 13s
tests / gui-smoke (pull_request) Successful in 26s
2026-05-25 16:40:10 +00:00
jessey 33c554c29f feat(ai): import & analyze Windows crash dumps (.dmp) — 0.41.0
tests / core (pull_request) Successful in 16s
tests / gui-smoke (pull_request) Successful in 27s
Games page gains an "Import crash dump…" button (shown when an AI provider
is configured) that parses a Proton/Wine minidump and explains it via the
opt-in AI assistant. New stdlib core/minidump.py reads the MDMP streams
(crash reason, faulting module, OS/CPU, module list), optionally enriched
by minidump_stackwalk if installed. Adds ai_knowledge facts for exception
codes + faulting-module signatures, a MinidumpDialog, and CLI parity via
`rigdoctor ai dump <file>`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:39:52 +02:00
20 changed files with 1337 additions and 13 deletions
+44
@@ -5,6 +5,50 @@ All notable changes to RigDoctor are recorded here. Format follows
(`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
release tag (so the auto-updater, D18, can compare versions).
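
A rough sketch of the kind of comparison the version-match rule enables, assuming plain `MAJOR.MINOR.PATCH` strings with an optional leading `v`. `parse_version` and `is_newer` are hypothetical helpers for illustration, not the auto-updater's real functions:

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Turn 'v0.42.1' or '0.42.1' into (0, 42, 1)."""
    major, minor, patch = text.strip().removeprefix("v").split(".")
    return int(major), int(minor), int(patch)


def is_newer(tag: str, installed: str) -> bool:
    """True when the release tag is ahead of the installed version."""
    return parse_version(tag) > parse_version(installed)
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, `"0.9.0"` sorts after `"0.10.0"`.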
## [0.42.1] - 2026-05-29
### Fixed
- **GUI "Add game…" can now link a launcher.** The dialog only asked for a name, so a custom
game (e.g. SPT) couldn't be given its launch command or log folder from the app — those were
CLI-only, leaving it unlaunchable from the GUI. It's now a proper form: name + an optional
launch command/script (with a **Browse…** file picker) + an optional log folder (auto-detected
from the script's folder when left blank).
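
The log-folder auto-detection can be sketched roughly as follows, assuming the rule is "use a `logs/` directory next to the launch script if one exists". `infer_logdir` is a hypothetical name; the shipped logic lives in `customgames.add()` and may differ in detail:

```python
from __future__ import annotations

import os
import shlex


def infer_logdir(command: str) -> str | None:
    """Return '<script dir>/logs' if that directory exists, else None."""
    try:
        parts = shlex.split(command)
    except ValueError:
        parts = [command]  # unbalanced quote in a path: use the raw string
    if not parts:
        return None
    script = os.path.expanduser(parts[0])
    candidate = os.path.join(os.path.dirname(script), "logs")
    return candidate if os.path.isdir(candidate) else None
```

Only the first word of the command is treated as the script path, so launch arguments after it do not affect the lookup.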
## [0.42.0] - 2026-05-29
### Added
- **Detect hard freezes that log no Xid.** The kernel-log scanner caught Xid codes, OOM, panic,
MCE, PCIe AER, thermal events, and amdgpu resets — but a crash that logs *no* Xid slipped
through. It now flags the NVIDIA open-kernel-module **VA-space mapping fault** (`gpu_vaspace.c`
/ `dmaAllocMapping` assertions, NVKMS GEM-allocation failures) — a driver-internal error that
can storm for minutes and end in a freeze without the GPU ever "falling off the bus" (distinct
from Xid 79). A new `check_nvidia_module()` notes when the open module (`nvidia-*-open`) is
loaded — the context behind these faults — and a new `ai_knowledge` entry lets the assistant
tell the no-Xid freeze apart from the Xid 79 hardware drop.
- **Add games no launcher reports (e.g. SPT).** A user-authored custom-games list
(`core/customgames.py`) shows alongside Steam/Lutris/Heroic in `rigdoctor games` and the GUI
("Add game…"), for standalone mod launchers (Single-Player Tarkov), itch.io downloads, or any
hand-installed game. Each entry can carry a launch command and a log directory:
`rigdoctor games add "SPT" --command .../tarkov.sh` (a sibling `logs/` is auto-detected),
`rigdoctor games play "SPT"` launches it under the crash-capture wrapper (tagged with the real
name, not the script's), and the diagnostic now tails the game's *own* logs — SPT's
server/launcher logs — alongside the kernel log so the analysis sees what the game logged
before the freeze.
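
As an illustration of how a game's own logs might be pulled into a diagnostic, here is a minimal sketch that tails the newest `*.log` files in a directory and skips any not modified since the session started. `recent_log_tails` and its parameters are assumptions for this example, not the `gamelogs` module's actual interface:

```python
from __future__ import annotations

from pathlib import Path


def recent_log_tails(directory: str, since: float | None = None,
                     max_bytes: int = 8000, limit: int = 4) -> list[str]:
    """Tail the newest ``*.log`` files in ``directory``, skipping stale ones."""
    files = [p for p in Path(directory).glob("*.log") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    sections = []
    for log in files[:limit]:
        if since is not None and log.stat().st_mtime < since:
            continue  # not written during or after the session
        data = log.read_bytes()[-max_bytes:]
        tail = data.decode("utf-8", errors="replace").strip()
        if tail:
            sections.append(f"--- {log.name} ---\n{tail}")
    return sections
```

Scoping by file mtime avoids having to understand each game's timestamp format, at the cost of including older lines from a log that was merely appended to during the session.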
## [0.41.0] - 2026-05-25
### Added
- **Import a crash dump (`.dmp`) and explain it with AI.** The **Games** page gains an
"Import crash dump…" button (shown once an AI provider is configured) that opens a Windows
minidump — the kind a Proton/Wine game writes when it hard-crashes — parses it, and hands the
result to the opt-in AI assistant (D24; cloud sends still ask first). A new stdlib
`core/minidump.py` reads the `MDMP` streams with `struct` (no new deps): the exception / crash
reason (e.g. access violation `0xC0000005`), the **faulting module** (which DLL the crash
address lands in — `nvwgf2umx.dll`, `d3d11.dll`, an anticheat, the game's own `.exe`…), OS/CPU,
and the loaded-module list. If `minidump_stackwalk` (Breakpad) or `minidump-stackwalk`
(rust-minidump) is on PATH, its fuller report is appended best-effort. The model is told the
dump came from a Windows process under Proton, so fixes stay Linux/Proton-side (Proton version,
DXVK/VKD3D, driver, launch options) — never Windows admin/registry steps. New `ai_knowledge`
facts cover the common exception codes and faulting-module signatures. CLI parity:
`rigdoctor ai dump <file>`.
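
To make the "`MDMP` streams with `struct`" step concrete, here is a small sketch that validates the header and lists the stream directory. The two layouts follow the documented `MINIDUMP_HEADER` and `MINIDUMP_DIRECTORY` structures; `list_streams` is a hypothetical helper, not the module's public API:

```python
from __future__ import annotations

import struct

# MINIDUMP_HEADER: signature, version, stream count, directory RVA,
# checksum, timestamp, flags (little-endian, 32 bytes).
HEADER = struct.Struct("<4sIIIIIQ")
DIRECTORY = struct.Struct("<III")  # stream type, data size, data RVA


def list_streams(data: bytes) -> dict[int, tuple[int, int]]:
    """Map stream type -> (size, rva); raise ValueError if not a minidump."""
    if len(data) < HEADER.size or data[:4] != b"MDMP":
        raise ValueError("missing the 'MDMP' signature")
    _sig, _ver, count, dir_rva, _csum, _time, _flags = HEADER.unpack_from(data, 0)
    streams = {}
    for i in range(count):
        offset = dir_rva + i * DIRECTORY.size
        if offset + DIRECTORY.size > len(data):
            break  # truncated directory: keep what was readable
        stype, size, rva = DIRECTORY.unpack_from(data, offset)
        streams.setdefault(stype, (size, rva))  # first occurrence wins
    return streams
```

Each stream of interest (module list, exception, system info) is then read from its own `rva` offset inside the same byte buffer.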
## [0.40.0] - 2026-05-22
### Added
- **RAM speed / XMP-EXPO check.** Inventory now shows each module's configured speed and, when it's
+2 -1
@@ -24,7 +24,8 @@ freeze are usually lost. RigDoctor pulls it together and keeps the evidence.
- **Proactive alerts** — desktop notifications on overheating and critical kernel events
(GPU-lost, Xid, out-of-memory, disk I/O).
- **AI explanations** *(optional, opt-in)* — explain a diagnostic in plain language with a
**local model (Ollama)** or **Claude**. Never automatic; only when you press the button.
**local model (Ollama)** or **Claude**, or **import a Windows crash dump (`.dmp`)** from a
Proton game and have it parsed and analysed. Never automatic; only when you press the button.
- **Shareable reports** — zip a diagnostic (logs, inventory, AI transcript) to hand to someone,
or share a live **terminal session** for remote help.
- **Self-updating** — `apt upgrade`, or the in-app updater.
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "rigdoctor"
version = "0.40.0"
version = "0.42.1"
description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
readme = "README.md"
requires-python = ">=3.11"
+1 -1
@@ -1,3 +1,3 @@
"""RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""
__version__ = "0.40.0"
__version__ = "0.42.1"
+77 -2
@@ -466,6 +466,20 @@ def cmd_ai(args) -> int:
print(msg)
return 0 if ok else 1
if sub == "dump":
# Parse a Windows .dmp minidump (e.g. from a Proton game crash) and explain it.
from .core import minidump
report = minidump.parse(args.file)
if not report.ok:
print(f"Couldn't analyze the dump — {report.error}")
return 1
print(minidump.to_text(report))
print(f"\nAsking {ai.provider_label()} to explain {os.path.basename(args.file)}\n")
ok, msg = ai.explain(minidump.to_ai_text(report))
print(msg)
return 0 if ok else 1
# explain: gather the current health findings and ask the provider to explain them.
from .core import health
@@ -511,13 +525,13 @@ def cmd_gameenv(args) -> int:
def cmd_games(args) -> int:
from dataclasses import asdict
from .core import launchers, steam
from .core import customgames, launchers, steam
selected = steam.selected_library_paths()
result = steam.rescan() if selected else None
steam_games = result.games if result else []
extra = launchers.scan() # non-Steam (Lutris/Heroic)
all_games = list(steam_games) + list(extra)
all_games = list(steam_games) + list(extra) + customgames.scan() # + user-added (SPT etc.)
if args.json:
print(json.dumps({
@@ -582,6 +596,50 @@ def cmd_games_libraries(args) -> int:
return 0
def cmd_games_add(args) -> int:
from .core import customgames
if customgames.add(args.name, command=args.command, logdir=args.logdir):
print(f"Added '{args.name}' to your games (custom). It'll show in `rigdoctor games` "
"and the diagnostic game picker.")
entry = customgames.get(args.name) or {}
if entry.get("command"):
print(f" launch: {entry['command']} (run with: rigdoctor games play \"{args.name}\")")
if entry.get("logdir"):
print(f" logs: {entry['logdir']} (included in crash diagnostics)")
return 0
print(f"'{args.name}' is blank or already in your custom games.")
return 1
def cmd_games_play(args) -> int:
from .core import customgames, wrap
command = customgames.command(args.name)
if command is None:
if customgames.get(args.name) is None:
print(f"'{args.name}' isn't in your custom games. Add it: "
f"rigdoctor games add \"{args.name}\" --command <launch script>")
else:
print(f"'{args.name}' has no launch command. Set one: "
f"rigdoctor games remove \"{args.name}\" && rigdoctor games add \"{args.name}\" "
"--command <launch script>")
return 1
print(f"Launching '{args.name}' with crash-capture… (capture stops cleanly on exit; "
"a hard freeze is flagged next time you open RigDoctor)")
return wrap.run(command, game=args.name)
def cmd_games_remove(args) -> int:
from .core import customgames
if customgames.remove(args.name):
print(f"Removed '{args.name}' from your custom games.")
return 0
print(f"'{args.name}' isn't in your custom games. Current: {', '.join(customgames.names()) or '(none)'}")
return 1
def build_parser() -> argparse.ArgumentParser:
p = argparse.ArgumentParser(
prog="rigdoctor",
@@ -667,6 +725,20 @@ def build_parser() -> argparse.ArgumentParser:
lib_p.add_argument("--json", action="store_true", help="output JSON")
lib_p.set_defaults(func=cmd_games_libraries)
add_p = games_sub.add_parser("add", help="add a game no launcher reports (e.g. SPT)")
add_p.add_argument("name", help="game name, e.g. \"SPT\"")
add_p.add_argument("--command", default=None,
help="launch command/script (e.g. the path to tarkov.sh) — enables `games play`")
add_p.add_argument("--logdir", default=None,
help="the game's own log directory (auto-detected as <command dir>/logs if present)")
add_p.set_defaults(func=cmd_games_add)
play_p = games_sub.add_parser("play", help="launch a custom game with crash-capture (e.g. SPT)")
play_p.add_argument("name", help="game name to launch")
play_p.set_defaults(func=cmd_games_play)
rm_p = games_sub.add_parser("remove", help="remove a previously added custom game")
rm_p.add_argument("name", help="game name to remove")
rm_p.set_defaults(func=cmd_games_remove)
env_p = sub.add_parser("gameenv", help="gaming environment checks (M6): flag stability/perf settings")
env_p.add_argument("--json", action="store_true", help="output JSON instead of text")
env_p.set_defaults(func=cmd_gameenv)
@@ -707,6 +779,9 @@ def build_parser() -> argparse.ArgumentParser:
ai_sub.add_parser("status", help="show the configured provider (contacts nothing)").set_defaults(func=cmd_ai)
ai_sub.add_parser("test", help="send a tiny probe to verify connectivity").set_defaults(func=cmd_ai)
ai_sub.add_parser("explain", help="explain the current health findings with AI").set_defaults(func=cmd_ai)
dump_p = ai_sub.add_parser("dump", help="parse a Windows .dmp crash dump and explain it with AI")
dump_p.add_argument("file", help="path to the .dmp minidump (e.g. from a Proton game crash)")
dump_p.set_defaults(func=cmd_ai)
ai_p.set_defaults(func=cmd_ai, ai_cmd=None)
bundle_p = sub.add_parser("bundle", help="zip the latest stored diagnostic into a report bundle (M15)")
+3
@@ -36,6 +36,9 @@ SPAWN_LOG = STATE_DIR / "recorder.out"
# Gaming environment / game detection (M6) — cached Steam game scan (mutable state,
# not config: refreshed by the background scan on every launch).
GAMES_FILE = STATE_DIR / "games.json"
# User-added games that no launcher reports (e.g. SPT/standalone mod launchers). Authored
# by the user (not a refreshable cache), so it lives in DATA_DIR and persists across scans.
CUSTOM_GAMES_FILE = DATA_DIR / "custom-games.json"
# Logging & reports (opt-in via `logging_enabled`). App log: rotating file of app events.
# Each diagnostic is stored under DIAGNOSTICS_DIR/<id>/; "Report" zips one into REPORTS_DIR.
+37
@@ -30,6 +30,14 @@ ENTRIES: list[tuple[tuple[str, ...], str]] = [
(("xid 8", "xid 62", "xid 63", "xid 64"),
"These Xid codes commonly indicate VRAM/ECC or memory-training problems — suspect failing "
"VRAM or an unstable memory overclock."),
(("va-space mapping", "gpu_vaspace", "dmaallocmapping", "nvkms memory for gem",
"open kernel module", "nvidia open"),
"NVIDIA open-kernel-module VA-space mapping errors (gpu_vaspace.c / dmaAllocMapping / "
"'Failed to allocate NVKMS memory for GEM object') are a driver-internal fault on the open "
"module (nvidia-*-open). They can storm for minutes and end in a HARD FREEZE with NO Xid "
"logged — so the GPU never 'falls off the bus', and this is distinct from the Xid 79 "
"hardware drop. Fix path: switch from the open to the proprietary NVIDIA kernel module and "
"update to the latest driver branch."),
(("smart 197", "current_pending_sector", "pending sector"),
"SMART 197 (Current Pending Sector) > 0 = sectors the drive can't read and is waiting to "
"reallocate — early sign of a failing disk. Back up now and run an extended self-test."),
@@ -76,6 +84,35 @@ ENTRIES: list[tuple[tuple[str, ...], str]] = [
(("fork without exec", "skipping destruction"),
"BENIGN: 'pid X != Y, skipping destruction (fork without exec?)' is routine Steam/Proton "
"process bookkeeping, not an error."),
# --- crash-dump (.dmp) reasoning -------------------------------------------------
(("access violation", "0xc0000005", "0xc0000006"),
"Windows exception 0xC0000005 (access violation) = the game read/wrote/executed memory it "
"wasn't allowed to. A write/read to a low address (near 0x0) is a null-pointer dereference, "
"usually a game or graphics-driver bug; under Proton it's often a DXVK/VKD3D or Proton-version "
"issue. Identify the faulting MODULE to localize the fault."),
(("stack overflow", "0xc00000fd"),
"Windows exception 0xC00000FD (stack overflow) = unbounded recursion or a huge stack "
"allocation in the crashing module — almost always a software bug in that module."),
(("0xc0000409", "stack buffer overrun", "fast fail"),
"Windows 0xC0000409 (stack buffer overrun / __fastfail) = a security check tripped on memory "
"corruption; frequently anticheat or a DRM/overlay injecting into the game. Suspect overlays "
"(Steam/Discord/MSI Afterburner-equivalents) and anticheat compatibility under Proton."),
(("0xc0000374", "heap corruption"),
"Windows 0xC0000374 (heap corruption) = something scribbled over heap memory earlier; the "
"crash point is a symptom, not the cause. Often a mod, an injected overlay, or unstable RAM."),
(("nvwgf2umx", "nvoglv", "nvd3dum", "nvldumd"),
"A faulting NVIDIA user-mode driver DLL (nvwgf2umx/nvoglv/nvd3dum) means the crash happened "
"inside the GPU driver under Proton. On Linux this points at the NVIDIA driver + the "
"DXVK/VKD3D translation layer: try a different driver branch or Proton/Proton-GE version, "
"clear the DXVK shader cache, and revert any GPU overclock/undervolt."),
(("easyanticheat", "eac", "battleye", "beclient", "anticheat"),
"A faulting anticheat module (EasyAntiCheat/BattlEye) under Proton is usually a compatibility "
"problem: confirm the title's anticheat has Proton/Linux support enabled and try the Proton "
"version the community recommends for it (often Proton-GE or a specific Valve build)."),
(("d3d11.dll", "d3d12.dll", "dxgi.dll", "d3d9.dll", "dxvk", "vkd3d"),
"A crash in a Direct3D/DXGI module under Proton runs through DXVK (D3D9/10/11) or VKD3D-Proton "
"(D3D12). Try a known-good Proton version, update/override DXVK-VKD3D, clear the shader cache, "
"and check the GPU driver — these are the usual fixes for D3D faults on Linux."),
]
+113
@@ -0,0 +1,113 @@
"""User-added games (M6): a manual list for titles no launcher reports.
Some games never show up in a Steam/Lutris/Heroic scan — standalone mod launchers like
**SPT** (Single-Player Tarkov), itch.io downloads, or any hand-installed executable. This
module keeps a small user-authored list so those still appear in the game list and can be
picked for a focused diagnostic, in the same `steam.Game` shape as every other source.
Each entry is a name plus two optionals: a **launch command** (so `rigdoctor games play`
can start it under the auto-capture wrapper) and a **log directory** (so a crash diagnostic
can read the game's own logs — e.g. SPT's `logs/tarkov-latest.log`). Stored as JSON in
`config.CUSTOM_GAMES_FILE`; stdlib only; every reader degrades to [] on a missing/bad file.
"""
from __future__ import annotations
import json
import os
import shlex
from .. import config
from .steam import Game
LAUNCHER = "custom"
def _load() -> list[dict]:
try:
data = json.loads(config.CUSTOM_GAMES_FILE.read_text())
except (OSError, ValueError):
return []
games = data.get("games") if isinstance(data, dict) else None
return [g for g in games if isinstance(g, dict) and g.get("name")] if isinstance(games, list) else []
def _save(games: list[dict]) -> None:
config.CUSTOM_GAMES_FILE.parent.mkdir(parents=True, exist_ok=True)
config.CUSTOM_GAMES_FILE.write_text(json.dumps({"games": games}, indent=2, ensure_ascii=False) + "\n")
def names() -> list[str]:
"""Just the stored names (insertion order preserved)."""
return [str(g["name"]) for g in _load()]
def get(name: str) -> dict | None:
"""The stored entry (name + optional command/logdir) for a game, case-insensitive."""
name = (name or "").strip().lower()
return next((g for g in _load() if str(g["name"]).lower() == name), None)
def add(name: str, command: str | None = None, logdir: str | None = None) -> bool:
"""Add a game by name, with an optional launch command and log directory.
Returns False if the name is blank or already present (case-insensitive). When a command
is given but no logdir, a sibling `logs/` dir is inferred if it exists (covers SPT's layout).
"""
name = (name or "").strip()
if not name:
return False
if get(name):
return False
entry: dict = {"name": name}
command = (command or "").strip()
if command:
entry["command"] = command
if not logdir:
sibling = os.path.join(os.path.dirname(os.path.expanduser(_argv0(command))), "logs")
if os.path.isdir(sibling):
logdir = sibling
logdir = (logdir or "").strip()
if logdir:
entry["logdir"] = os.path.expanduser(logdir)
games = _load()
games.append(entry)
_save(games)
return True
def remove(name: str) -> bool:
"""Remove a game by name (case-insensitive). Returns True if one was removed."""
name = (name or "").strip().lower()
games = _load()
kept = [g for g in games if str(g["name"]).lower() != name]
if len(kept) == len(games):
return False
_save(kept)
return True
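def _argv0(command: str) -> str:
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quote (e.g. an apostrophe in a path): use the raw string
        return command
    return parts[0] if parts else command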
def command(name: str) -> list[str] | None:
"""The launch argv for a game (shlex-split), or None if it has no command."""
entry = get(name)
cmd = (entry or {}).get("command")
return shlex.split(cmd) if cmd else None
def log_dir(name: str) -> str | None:
"""The game's own log directory, or None if it isn't set / doesn't exist."""
entry = get(name)
path = (entry or {}).get("logdir")
return path if path and os.path.isdir(path) else None
def scan() -> list[Game]:
"""User-added games as `Game` objects (launcher='custom'), sorted by name."""
out = [Game(appid="", name=str(g["name"]), library="", installdir="", launcher=LAUNCHER)
for g in _load()]
return sorted(out, key=lambda g: g.name.lower())
+1 -1
@@ -75,7 +75,7 @@ def store(result, capture_path=None, since: float | None = None) -> Path | None:
_write(target / "report.txt", "\n".join(report))
try:
logs = gamelogs.collect(since=since)
logs = gamelogs.collect(since=since, game=getattr(result, "game", None))
if logs:
_write(target / "gamelogs.txt", logs)
except OSError:
+35 -2
@@ -81,15 +81,48 @@ def available() -> bool:
return bool(_proton_logs() or _steam_console())
def collect(since: float | None = None, max_bytes: int = 8000) -> str:
"""Recent Proton + Steam log tails as one labelled text block ('' if none).
def _custom_game_logs(game: str, since: float | None, max_bytes: int) -> list[str]:
"""Tail the recent ``*.log`` files in a custom game's own log dir (e.g. SPT's
``logs/tarkov-latest.log`` + ``server-latest.log``), newest first, freshness-scoped by mtime.
Custom-game logs use their own timestamp formats, so we scope by file mtime (like the Proton
log) rather than the ``[YYYY-MM-DD …]`` line filter used for the Steam console.
"""
from . import customgames
directory = customgames.log_dir(game)
if not directory:
return []
try:
files = [p for p in Path(directory).glob("*.log") if p.is_file()]
except OSError:
return []
files.sort(key=_mtime, reverse=True)
sections: list[str] = []
for log in files[:4]: # a session touches a handful (tarkov/server/launcher latest)
if since is not None and _mtime(log) < since:
continue
tail = _tail(log, max_bytes).strip()
if tail:
sections.append(f"--- {game} log ({log.name}) ---\n{tail}")
return sections
def collect(since: float | None = None, max_bytes: int = 8000, game: str | None = None) -> str:
"""Recent Proton + Steam (+ custom-game) log tails as one labelled text block ('' if none).
With ``since`` (epoch), scope to that session: skip a Proton log not written during/after
the session (a stale per-app log from an earlier game), and keep only Steam-console lines
timestamped at/after ``since`` — so we don't feed the model an unrelated past session.
``game`` (the diagnostic's focused title) pulls in that custom game's own logs if it has a
registered log dir — e.g. SPT's server/launcher logs, which Steam/Proton never see.
"""
sections: list[str] = []
if game:
sections += _custom_game_logs(game, since, max_bytes)
protons = _proton_logs()
if protons:
log = protons[0]
+73
@@ -116,6 +116,31 @@ def scan_journal_text(text: str) -> list[Finding]:
"Check power/thermals/driver; capture a session with `rigdoctor record`.",
))
# NVIDIA open-kernel-module VA-space mapping faults: a driver-internal failure that can
# storm for minutes and end in a HARD FREEZE with NO Xid logged — the GPU never "falls off
# the bus", so the Xid scan above misses it entirely. These code paths live in the open
# kernel module (nvidia-*-open); the proprietary module doesn't hit them.
nvrm_va = [
ln for ln in lines
if "gpu_vaspace.c" in ln
or "_gvaspaceMappingInsert" in ln
or "dmaAllocMapping" in ln
or "NVKMS memory for GEM object" in ln
]
if nvrm_va:
findings.append(Finding(
WARNING, "GPU", f"NVIDIA driver VA-space mapping errors ×{len(nvrm_va)}",
"The NVIDIA kernel module repeatedly failed to update the GPU's virtual address "
"space (gpu_vaspace / dmaAllocMapping assertions, NVKMS GEM-allocation failures). "
"This is a driver-internal fault that can recur for minutes and end in a hard freeze "
"with NO Xid logged — distinct from an Xid 79 hardware drop. These code paths are "
"specific to the open kernel module (nvidia-*-open).",
"If you're on the open module, switch to the proprietary NVIDIA driver "
"(install `nvidia-driver-###` instead of the `…-open` variant) and update to the "
"latest branch, then reboot. Capture a session with `rigdoctor record` to confirm "
"the errors precede the freeze.",
))
return findings
@@ -188,6 +213,53 @@ def check_nvidia_driver() -> list[Finding]:
return []
def _read_text(path: str) -> str | None:
try:
return Path(path).read_text()
except OSError:
return None
def _nvidia_module_is_open() -> bool | None:
"""Whether the *loaded* NVIDIA kernel module is the open-source flavor.
True = open (nvidia-*-open), False = proprietary, None = can't tell / no NVIDIA module.
/proc is authoritative for the loaded module and needs no external tool; modinfo's filename
(…/nvidia-###-open/nvidia.ko) is the fallback.
"""
proc = _read_text("/proc/driver/nvidia/version")
if proc:
low = proc.lower()
if "open kernel module" in low:
return True
if "kernel module" in low: # proprietary banner: "NVIDIA UNIX … Kernel Module …"
return False
if shutil.which("modinfo"):
try:
out = subprocess.run(["modinfo", "nvidia"], capture_output=True, text=True, timeout=10).stdout
except (subprocess.SubprocessError, OSError):
out = ""
for line in out.splitlines():
if line.startswith("filename:"):
return "-open" in line
return None
def check_nvidia_module() -> list[Finding]:
"""Note when the open-source NVIDIA kernel module is loaded — the context behind the no-Xid
VA-space freeze signature, which lives in the open module's code paths (suggestion-only)."""
if _nvidia_module_is_open() is not True:
return []
return [Finding(
INFO, "Driver", "NVIDIA open kernel module in use",
"The loaded NVIDIA driver is the open-source kernel module (nvidia-*-open). It's fine for "
"most setups, but on some GeForce cards it hits driver-internal faults (VA-space mapping "
"errors, hard freezes with no Xid) that the proprietary module doesn't.",
"If you get unexplained hard freezes with no Xid in the logs, try the proprietary NVIDIA "
"driver (`nvidia-driver-###` rather than the `…-open` variant) on the latest branch.",
)]
def _smart_devices() -> list[str]:
try:
proc = subprocess.run(["smartctl", "--scan"], capture_output=True, text=True, timeout=10)
@@ -336,6 +408,7 @@ def run_health_checks(include_journal: bool = True) -> list[Finding]:
findings: list[Finding] = []
findings += check_nvidia_driver()
findings += check_nvidia_module()
if include_journal:
findings += check_journal()
findings += check_journal_persistence()
+314
@@ -0,0 +1,314 @@
"""Parse a Windows crash dump (``.dmp`` minidump) into text the AI can reason over (M14).
Linux gamers get these from Windows games running under **Proton/Wine**: the game's
crash handler (Crashpad/Breakpad, Unreal/Unity, or Wine itself) writes a binary minidump
when the title hard-crashes. The file is binary, so we can't hand it to a model directly —
we parse the documented ``MDMP`` streams with stdlib :mod:`struct` (no pip deps, per the
core rule) and pull out the parts that actually diagnose a crash:
* the **exception / crash reason** (e.g. access violation 0xC0000005),
* the **faulting module** (which DLL the crash address lands in — ``nvwgf2umx.dll``,
``d3d11.dll``, an anticheat, the game's own .exe…),
* **OS / CPU** info, and the **loaded module list**.
If ``minidump_stackwalk`` (Breakpad) or ``minidump-stackwalk`` (rust-minidump) is on PATH,
its fuller report is appended best-effort; we never depend on it.
The result feeds the existing opt-in AI flow (:mod:`ai`) exactly like the sensor findings do.
"""
from __future__ import annotations
import shutil
import struct
import subprocess
import time
from dataclasses import dataclass, field
from pathlib import Path
from .health import CRITICAL, INFO, Finding
# --- MDMP on-disk layout (all little-endian, packed) --------------------------------
_SIGNATURE = b"MDMP"
_HEADER = struct.Struct("<4sIIIIIQ") # sig, ver, n_streams, dir_rva, csum, time, flags
_DIRECTORY = struct.Struct("<III") # stream_type, data_size, data_rva
_SYSINFO = struct.Struct("<HHHBBIIIII") # arch, lvl, rev, n_cpu, prod, maj, min, build, plat, csd
_MODULE_STRIDE = 108 # sizeof(MINIDUMP_MODULE)
# Stream types we read (MINIDUMP_STREAM_TYPE).
_MODULE_LIST = 4
_EXCEPTION = 6
_SYSTEM_INFO = 7
_COMMENT_A = 10
_COMMENT_W = 11
_ARCH = {0: "x86", 5: "ARM", 6: "IA-64", 9: "x86-64", 12: "ARM64", 0xFFFF: "unknown"}
_PLATFORM = {0x8101: "macOS", 0x8102: "iOS", 0x8201: "Linux", 0x8202: "Solaris",
             0x8203: "Android", 0x8204: "PS3", 0x8205: "NaCl", 0x8206: "Fuchsia"}
# Common Windows exception (NTSTATUS) codes — what the model needs named, not raw hex.
_EXCEPTION_NAMES = {
0x80000003: "Breakpoint",
0x80000004: "Single step",
0xC0000005: "Access violation",
0xC0000006: "In-page error",
0xC000001D: "Illegal instruction",
0xC0000025: "Noncontinuable exception",
0xC000008C: "Array bounds exceeded",
0xC000008E: "Float divide by zero",
0xC0000090: "Float invalid operation",
0xC0000094: "Integer divide by zero",
0xC0000095: "Integer overflow",
0xC0000096: "Privileged instruction",
0xC00000FD: "Stack overflow",
0xC0000135: "DLL not found",
0xC0000142: "DLL initialization failed",
0xC0000374: "Heap corruption",
0xC0000409: "Stack buffer overrun / fast fail",
0xC000041D: "Fatal user-callback exception",
0xE06D7363: "C++ exception (MSVC)",
}
_ACCESS = {0: "reading", 1: "writing", 8: "executing"} # AV ExceptionInformation[0]
_STACKWALK_BINS = ("minidump_stackwalk", "minidump-stackwalk")
_MODULES_SHOWN = 80 # cap the module list so the AI prompt stays bounded
@dataclass
class Module:
name: str # basename only
base: int
size: int
@dataclass
class MinidumpReport:
path: str
ok: bool = False
error: str = ""
crash_reason: str = ""
exception_code: int | None = None
exception_address: int | None = None
faulting_module: str | None = None
crashing_thread: int | None = None
os_name: str = ""
cpu_arch: str = ""
cpu_count: int = 0
timestamp: int | None = None
modules: list[Module] = field(default_factory=list)
comment: str = ""
stackwalk: str = ""
def parse(path, *, run_stackwalk: bool = True) -> MinidumpReport:
"""Parse a ``.dmp`` file. Never raises — a bad/unsupported file returns ``ok=False``."""
report = MinidumpReport(path=str(path))
try:
data = Path(path).read_bytes()
except OSError as exc:
report.error = f"can't read the file: {exc}"
return report
if len(data) < _HEADER.size or data[:4] != _SIGNATURE:
report.error = "not a Windows minidump (missing the 'MDMP' signature)."
return report
try:
_sig, _ver, n_streams, dir_rva, _csum, ts, _flags = _HEADER.unpack_from(data, 0)
report.timestamp = ts or None
streams = _streams(data, dir_rva, n_streams)
_read_system_info(data, streams.get(_SYSTEM_INFO), report)
report.modules = _read_modules(data, streams.get(_MODULE_LIST))
_read_exception(data, streams.get(_EXCEPTION), report)
report.comment = _read_comment(data, streams)
except (struct.error, ValueError, IndexError) as exc:
report.error = f"the minidump looks corrupt or unsupported: {exc}"
return report
if report.exception_address is not None:
report.faulting_module = _module_at(report.modules, report.exception_address)
report.ok = True
if run_stackwalk:
report.stackwalk = stackwalk(path)
return report
def _streams(data: bytes, dir_rva: int, n: int) -> dict[int, tuple[int, int]]:
"""Map stream_type -> (data_size, data_rva). First occurrence of each type wins."""
out: dict[int, tuple[int, int]] = {}
for i in range(n):
off = dir_rva + i * _DIRECTORY.size
if off + _DIRECTORY.size > len(data):
break
stype, size, rva = _DIRECTORY.unpack_from(data, off)
out.setdefault(stype, (size, rva))
return out
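A minimal, standalone sketch of the header and stream-directory walk that `_streams` performs. The two struct layouts are assumptions taken from the formats the synthetic-dump test packs with (`<4sIIIIIQ` and `<III`); the real `_HEADER` and `_DIRECTORY` definitions are outside this hunk, and `stream_table` is a hypothetical name.

```python
import struct

# Assumed layouts (not shown in this hunk): header fields, then one directory entry.
HEADER = struct.Struct("<4sIIIIIQ")  # signature, version, n_streams, dir_rva, checksum, timestamp, flags
DIRECTORY = struct.Struct("<III")    # stream_type, data_size, data_rva


def stream_table(data: bytes) -> dict:
    """Map stream_type -> (data_size, data_rva); the first entry of each type wins."""
    sig, _ver, n_streams, dir_rva, _csum, _ts, _flags = HEADER.unpack_from(data, 0)
    if sig != b"MDMP":
        raise ValueError("missing the 'MDMP' signature")
    table = {}
    for i in range(n_streams):
        off = dir_rva + i * DIRECTORY.size
        if off + DIRECTORY.size > len(data):  # truncated directory: stop, don't raise
            break
        stype, size, rva = DIRECTORY.unpack_from(data, off)
        table.setdefault(stype, (size, rva))
    return table


# A 64-byte buffer that claims 3 streams but only has room for 2 directory entries.
buf = bytearray(64)
HEADER.pack_into(buf, 0, b"MDMP", 0xA793, 3, 32, 0, 0, 0)
DIRECTORY.pack_into(buf, 32, 7, 28, 68)    # SystemInfoStream
DIRECTORY.pack_into(buf, 44, 7, 99, 999)   # duplicate type, ignored by setdefault
demo = stream_table(bytes(buf))
```

The bounds check before each `unpack_from` is what lets a truncated dump degrade to a partial table instead of raising `struct.error`.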
def _read_system_info(data: bytes, loc, report: MinidumpReport) -> None:
if not loc:
return
_size, rva = loc
arch, _lvl, _rev, n_cpu, _prod, major, minor, build, platform, _csd = \
_SYSINFO.unpack_from(data, rva)
report.cpu_arch = _ARCH.get(arch, f"arch 0x{arch:x}")
report.cpu_count = n_cpu
if platform == 2: # VER_PLATFORM_WIN32_NT
report.os_name = f"Windows {major}.{minor}.{build}"
elif platform in _PLATFORM:
ver = f" {major}.{minor}.{build}" if (major or minor or build) else ""
report.os_name = _PLATFORM[platform] + ver
else:
report.os_name = f"platform 0x{platform:x} {major}.{minor}.{build}"
def _read_modules(data: bytes, loc) -> list[Module]:
if not loc:
return []
_size, rva = loc
(count,) = struct.unpack_from("<I", data, rva)
base_off = rva + 4
modules: list[Module] = []
for i in range(count):
rec = base_off + i * _MODULE_STRIDE
if rec + _MODULE_STRIDE > len(data):
break
base, = struct.unpack_from("<Q", data, rec)
size, = struct.unpack_from("<I", data, rec + 8)
name_rva, = struct.unpack_from("<I", data, rec + 20)
modules.append(Module(_read_mdstring(data, name_rva), base, size))
return modules
def _read_exception(data: bytes, loc, report: MinidumpReport) -> None:
if not loc:
return
_size, rva = loc
thread_id, = struct.unpack_from("<I", data, rva) # MINIDUMP_EXCEPTION_STREAM
code, = struct.unpack_from("<I", data, rva + 8) # ExceptionRecord.ExceptionCode
address, = struct.unpack_from("<Q", data, rva + 24) # ExceptionRecord.ExceptionAddress
n_params, = struct.unpack_from("<I", data, rva + 32)
report.crashing_thread = thread_id
report.exception_code = code
report.exception_address = address
report.crash_reason = _describe_exception(data, rva, code, n_params)
def _describe_exception(data: bytes, rva: int, code: int, n_params: int) -> str:
name = _EXCEPTION_NAMES.get(code, "Unknown exception")
reason = f"{name} (0x{code:08X})"
if code in (0xC0000005, 0xC0000006) and n_params >= 2:
op = struct.unpack_from("<Q", data, rva + 40)[0] # ExceptionInformation[0]
addr = struct.unpack_from("<Q", data, rva + 48)[0] # ExceptionInformation[1]
reason += f" {_ACCESS.get(op, 'accessing')} 0x{addr:X}"
return reason
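The offsets used by `_read_exception` and `_describe_exception` can be checked in isolation. The sketch below rebuilds a 168-byte exception stream and decodes it the same way; `describe`, `NAMES` and `ACCESS` are trimmed, hypothetical stand-ins for the module-level names, and offsets are relative to the start of the stream rather than to a file RVA.

```python
import struct

NAMES = {0xC0000005: "Access violation", 0xC0000006: "In-page error"}
ACCESS = {0: "reading", 1: "writing", 8: "executing"}


def describe(stream: bytes) -> str:
    """Name the exception code; for access faults, append the access kind and address."""
    (code,) = struct.unpack_from("<I", stream, 8)       # ExceptionRecord.ExceptionCode
    (n_params,) = struct.unpack_from("<I", stream, 32)  # NumberParameters
    reason = f"{NAMES.get(code, 'Unknown exception')} (0x{code:08X})"
    if code in NAMES and n_params >= 2:
        op, addr = struct.unpack_from("<QQ", stream, 40)  # ExceptionInformation[0], [1]
        reason += f" {ACCESS.get(op, 'accessing')} 0x{addr:X}"
    return reason


stream = bytearray(168)
struct.pack_into("<I", stream, 8, 0xC0000005)
struct.pack_into("<I", stream, 32, 2)
struct.pack_into("<QQ", stream, 40, 1, 0x10)  # write to address 0x10
reason = describe(bytes(stream))
```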
def _read_mdstring(data: bytes, rva: int) -> str:
"""A MINIDUMP_STRING (u32 byte-length + UTF-16LE), returned as a basename."""
if not rva or rva + 4 > len(data):
return ""
length, = struct.unpack_from("<I", data, rva)
start = rva + 4
raw = data[start:start + length]
text = raw.decode("utf-16-le", "replace").strip("\x00")
return text.replace("\\", "/").rsplit("/", 1)[-1] or text
def _read_comment(data: bytes, streams: dict[int, tuple[int, int]]) -> str:
if _COMMENT_W in streams:
size, rva = streams[_COMMENT_W]
return data[rva:rva + size].decode("utf-16-le", "replace").strip("\x00").strip()
if _COMMENT_A in streams:
size, rva = streams[_COMMENT_A]
return data[rva:rva + size].decode("utf-8", "replace").strip("\x00").strip()
return ""
def _module_at(modules: list[Module], address: int) -> str | None:
for m in modules:
if m.base <= address < m.base + m.size:
return m.name
return None
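A small sketch of the two steps that turn an exception address into a module name: reducing a Windows path to its basename, then a half-open range check against each module. `Mod`, `basename` and `module_at` are hypothetical names that mirror `Module`, `_read_mdstring` and `_module_at`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mod:
    name: str
    base: int
    size: int


def basename(win_path: str) -> str:
    """Strip a Windows (or POSIX) directory prefix, falling back to the full text."""
    return win_path.replace("\\", "/").rsplit("/", 1)[-1] or win_path


def module_at(mods: list, address: int) -> Optional[str]:
    """Return the module whose [base, base + size) range contains the address."""
    for m in mods:
        if m.base <= address < m.base + m.size:
            return m.name
    return None


mods = [
    Mod(basename("C:\\Games\\game.exe"), 0x140000000, 0x100000),
    Mod(basename("nvwgf2umx.dll"), 0x180000000, 0x080000),
]
```

The upper bound is exclusive, so an address exactly at `base + size` belongs to no module and falls through to the "no module matched" branch in `to_text`.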
def stackwalk(path, timeout: float = 25.0, max_chars: int = 12000) -> str:
"""Best-effort fuller report from an external stackwalker, or '' if none is installed."""
exe = next((shutil.which(name) for name in _STACKWALK_BINS if shutil.which(name)), None)
if not exe:
return ""
try:
proc = subprocess.run(
[exe, str(path)], capture_output=True, text=True, timeout=timeout, check=False)
except (OSError, subprocess.SubprocessError):
return ""
return (proc.stdout or "").strip()[:max_chars]
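The optional-tool pattern used by `stackwalk` can be sketched on its own: look for the first installed candidate, run it with a timeout, and treat every failure as "no output". `optional_tool_output` is a hypothetical helper, not part of the module.

```python
import shutil
import subprocess


def optional_tool_output(candidates, args, timeout=25.0, max_chars=12000) -> str:
    """Run the first installed candidate and return its trimmed stdout, or '' on any failure."""
    exe = next((found for found in map(shutil.which, candidates) if found), None)
    if exe is None:
        return ""
    try:
        proc = subprocess.run(
            [exe, *args], capture_output=True, text=True, timeout=timeout, check=False)
    except (OSError, subprocess.SubprocessError):  # includes TimeoutExpired
        return ""
    return (proc.stdout or "").strip()[:max_chars]


missing = optional_tool_output(("no-such-stackwalker-binary",), ["crash.dmp"])
```

Using `map(shutil.which, ...)` resolves each candidate once, and `check=False` keeps a non-zero exit status from discarding whatever the tool did print.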
# --- rendering ----------------------------------------------------------------------
def to_text(report: MinidumpReport) -> str:
"""Human-readable structured summary (also shown in the GUI)."""
name = Path(report.path).name
lines = [f"Crash dump: {name}"]
if report.crash_reason:
lines.append(f"Crash reason: {report.crash_reason}")
if report.faulting_module:
lines.append(f"Faulting module: {report.faulting_module}")
elif report.exception_address is not None:
lines.append(f"Faulting address: 0x{report.exception_address:X} (no module matched)")
if report.crashing_thread is not None:
lines.append(f"Crashing thread: {report.crashing_thread}")
if report.os_name:
lines.append(f"OS: {report.os_name}")
if report.cpu_arch:
cpus = f" ({report.cpu_count} logical)" if report.cpu_count else ""
lines.append(f"CPU: {report.cpu_arch}{cpus}")
if report.timestamp:
lines.append("Captured: " + time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(report.timestamp)))
if report.modules:
shown = report.modules[:_MODULES_SHOWN]
more = len(report.modules) - len(shown)
lines.append(f"\nLoaded modules ({len(report.modules)}):")
lines += [f"- {m.name}" for m in shown if m.name]
if more > 0:
lines.append(f"- (+{more} more)")
if report.comment:
lines.append(f"\nDump comment:\n{report.comment[:1000]}")
return "\n".join(lines)
def to_ai_text(report: MinidumpReport) -> str:
"""The block sent to the model: Proton/Linux framing + summary + stackwalk."""
framing = (
"These findings come from a Windows crash minidump (.dmp) produced by a game running "
"under Proton/Wine on Linux. The faulting modules are Windows DLLs inside the Proton "
"prefix, so the crash is a Windows-process fault but the fixes are Linux/Proton-side "
"(Proton version, DXVK/VKD3D, GPU driver, launch options, shader cache) — never Windows "
"admin/registry steps."
)
parts = [framing, "", to_text(report)]
if report.stackwalk:
parts.append("\nminidump_stackwalk output:\n" + report.stackwalk)
return "\n".join(parts)
def to_findings(report: MinidumpReport) -> list[Finding]:
"""Render the dump as Finding cards for the GUI (mirrors the health report)."""
findings: list[Finding] = []
detail_bits = []
if report.faulting_module:
detail_bits.append(f"in {report.faulting_module}")
if report.exception_address is not None:
detail_bits.append(f"at 0x{report.exception_address:X}")
detail = (report.crash_reason or "Crash recorded")
if detail_bits:
detail += " " + " ".join(detail_bits) + "."
findings.append(Finding(
CRITICAL, "Crash dump",
f"Crash in {report.faulting_module}" if report.faulting_module else "Crash recorded",
detail,
"Use “Explain with AI” for likely causes and Proton-side fixes.",
))
env_bits = [b for b in (report.os_name, report.cpu_arch and f"{report.cpu_arch} CPU") if b]
if env_bits:
findings.append(Finding(
INFO, "Crash dump", "Dump environment", " · ".join(env_bits)))
return findings
+7 -3
@@ -40,16 +40,20 @@ def launch_option() -> str:
return f"{quoted} wrap %command%"
def run(command: list[str]) -> int:
def run(command: list[str], game: str | None = None) -> int:
"""Start a focused capture (unless one's already running), run the game, then stop it.
Returns the game's exit code so Steam sees the right status."""
Returns the game's exit code so Steam sees the right status.
`game` overrides name detection — used by `games play` for a custom game (e.g. SPT), where
there's no SteamAppId and the bare script name (tarkov.sh) wouldn't tag the capture usefully.
"""
from . import diagnostic, reccontrol
if not command:
print("usage: rigdoctor wrap %command% (set as a Steam launch option)", file=sys.stderr)
return 2
game = game_name_from_env() or os.path.basename(command[0])
game = game or game_name_from_env() or os.path.basename(command[0])
started = False
if not reccontrol.running_pid(): # don't disturb an existing capture
started = diagnostic.start(game=game) is not None
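The new precedence for the capture tag (explicit name, then the environment, then the script basename) can be sketched as a pure function. `resolve_game_name` and its `from_env` parameter are hypothetical; in the diff the middle step is whatever `game_name_from_env()` returns.

```python
import os
from typing import Optional


def resolve_game_name(command: list, explicit: Optional[str] = None,
                      from_env: Optional[str] = None) -> str:
    """Pick the capture tag: explicit override, then env-derived name, then script basename."""
    return explicit or from_env or os.path.basename(command[0])


cmd = ["/games/spt/tarkov.sh"]
tagged = resolve_game_name(cmd, explicit="SPT")
fallback = resolve_game_name(cmd)
```

Because the chain uses `or`, an empty string behaves like `None` and falls through to the next source.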
+1 -1
@@ -143,7 +143,7 @@ class DiagnosticDialog(QDialog):
lines.append("\nCapture summary:\n" + render_summary(summary))
since = (summary.start - 60) if summary.start else None
logs = gamelogs.collect(since=since) # scoped to this session
logs = gamelogs.collect(since=since, game=result.game) # scoped to this session
if logs:
lines.append("\nGame/Proton/Steam logs for this session:\n" + logs)
sys_logs = syslogs.collect(since=since) # kernel log + crashed-process records
+138 -1
@@ -16,6 +16,7 @@ from PySide6.QtWidgets import (
QApplication,
QCheckBox,
QDialog,
QFileDialog,
QFrame,
QHBoxLayout,
QLabel,
@@ -29,6 +30,7 @@ from PySide6.QtWidgets import (
from ..config import load_config, update_config
from .diagnostic_dialog import DiagnosticDialog
from .minidump_dialog import MinidumpDialog
from .theme import ACCENT, GOOD, MUTED, WARN
@@ -79,6 +81,7 @@ class GamesPage(QWidget):
_scanned = Signal(object) # steam.ScanResult
new_count_changed = Signal(int) # newly-installed game count (for the nav badge)
_diag_done = Signal(object) # DiagnosticResult — focused capture analyzed
_dump_parsed = Signal(object) # minidump.MinidumpReport — imported .dmp (or None)
def __init__(self) -> None:
super().__init__()
@@ -86,6 +89,7 @@ class GamesPage(QWidget):
self._libraries_ready.connect(self._render_libraries)
self._scanned.connect(self._render_games)
self._diag_done.connect(self._on_diag_done)
self._dump_parsed.connect(self._on_dump_parsed)
self._busy = False
self._new_appids: set[str] = set()
self._extra_games: list = [] # non-Steam (Lutris/Heroic), appended after a scan
@@ -103,9 +107,18 @@ class GamesPage(QWidget):
self._status = QLabel("")
self._status.setObjectName("Muted")
header.addWidget(self._status)
# Import a Windows crash dump (.dmp) from a Proton game and analyze it with AI.
# Shown only when an AI provider is configured (AI analysis is the point).
self._import_btn = QPushButton("Import crash dump…")
self._import_btn.clicked.connect(self._import_dump)
header.addWidget(self._import_btn)
self._autocap_btn = QPushButton("Auto-capture…")
self._autocap_btn.clicked.connect(self._show_autocapture)
header.addWidget(self._autocap_btn)
# Add a game no launcher reports (e.g. SPT / standalone mod launchers).
self._add_btn = QPushButton("Add game…")
self._add_btn.clicked.connect(self._add_custom_game)
header.addWidget(self._add_btn)
self._rescan_btn = QPushButton("Rescan")
self._rescan_btn.setObjectName("PrimaryButton")
self._rescan_btn.clicked.connect(self.refresh)
@@ -192,6 +205,7 @@ class GamesPage(QWidget):
self._load_cached() # instant display from the last scan
QTimer.singleShot(400, self.refresh) # then rescan in the background on launch
self._check_crash() # surface an interrupted (crashed) diagnostic
self._refresh_import_btn() # show Import only if AI is configured
# --- loading ----------------------------------------------------------------------
@@ -225,7 +239,9 @@ class GamesPage(QWidget):
]
self._libraries_ready.emit(libs)
try:
self._extra_games = launchers.scan() # Lutris / Heroic (non-Steam)
from ..core import customgames
# non-Steam: Lutris/Heroic + user-added games (SPT etc.)
self._extra_games = list(launchers.scan()) + customgames.scan()
except Exception:
self._extra_games = []
self._scanned.emit(steam.rescan())
@@ -413,6 +429,83 @@ class GamesPage(QWidget):
reccontrol.stop_background()
self._banner.hide()
def _add_custom_game(self) -> None:
"""Manually add a game no launcher reports (e.g. SPT): name + an optional launch
command/script (so it can be launched under crash-capture) and log folder."""
from ..core import customgames
dlg = QDialog(self)
dlg.setWindowTitle("Add game")
dlg.setMinimumWidth(560)
v = QVBoxLayout(dlg)
v.setContentsMargins(20, 18, 20, 16)
v.setSpacing(10)
intro = QLabel(
"Add a game no launcher reports — a standalone mod launcher like SPT, an itch.io "
"download, or any hand-installed game.")
intro.setWordWrap(True)
v.addWidget(intro)
name_edit = QLineEdit()
name_edit.setPlaceholderText("SPT")
v.addWidget(QLabel("Game name"))
v.addWidget(name_edit)
cmd_edit = QLineEdit()
cmd_edit.setPlaceholderText("e.g. /run/media/.../Escape-From-Tarkov/tarkov.sh")
cmd_row = QHBoxLayout()
cmd_row.addWidget(cmd_edit, 1)
cmd_browse = QPushButton("Browse…")
cmd_row.addWidget(cmd_browse, 0)
v.addWidget(QLabel("Launch command / script (optional — enables launch + auto-capture)"))
v.addLayout(cmd_row)
log_edit = QLineEdit()
log_edit.setPlaceholderText("auto-detected from the script's folder (its logs/ subfolder)")
log_row = QHBoxLayout()
log_row.addWidget(log_edit, 1)
log_browse = QPushButton("Browse…")
log_row.addWidget(log_browse, 0)
v.addWidget(QLabel("Log folder (optional — read into crash diagnostics)"))
v.addLayout(log_row)
def _pick_command() -> None:
path, _ = QFileDialog.getOpenFileName(dlg, "Select the launch script/executable")
if path:
cmd_edit.setText(path)
def _pick_logdir() -> None:
path = QFileDialog.getExistingDirectory(dlg, "Select the game's log folder")
if path:
log_edit.setText(path)
cmd_browse.clicked.connect(_pick_command)
log_browse.clicked.connect(_pick_logdir)
buttons = QHBoxLayout()
buttons.addStretch(1)
cancel = QPushButton("Cancel")
cancel.clicked.connect(dlg.reject)
buttons.addWidget(cancel)
add = QPushButton("Add")
add.setObjectName("PrimaryButton")
add.setDefault(True)
add.clicked.connect(dlg.accept)
buttons.addWidget(add)
v.addLayout(buttons)
if dlg.exec() != QDialog.DialogCode.Accepted:
return
name = name_edit.text().strip()
if not name:
return
if customgames.add(name, command=cmd_edit.text().strip() or None,
logdir=log_edit.text().strip() or None):
self.refresh()
else:
QMessageBox.information(self, "Add game", f"'{name}' is already in your games.")
def _show_autocapture(self) -> None:
from ..core import wrap
@@ -450,6 +543,49 @@ class GamesPage(QWidget):
v.addLayout(buttons)
dlg.exec()
# --- import a crash dump (.dmp) ---------------------------------------------------
def _refresh_import_btn(self) -> None:
from ..core import ai
self._import_btn.setVisible(ai.is_configured())
def _import_dump(self) -> None:
from ..core import ai
if not ai.is_configured():
QMessageBox.information(
self, "RigDoctor",
"Set up an AI provider first (Settings → AI assistant) to analyze a crash dump.")
return
path, _ = QFileDialog.getOpenFileName(
self, "Import crash dump", os.path.expanduser("~"),
"Crash dumps (*.dmp);;All files (*)")
if not path:
return
self._import_btn.setEnabled(False)
self._status.setText("Parsing crash dump…")
threading.Thread(target=self._work_import, args=(path,), daemon=True).start()
def _work_import(self, path: str) -> None:
from ..core import minidump
try:
report = minidump.parse(path) # parses + runs minidump_stackwalk if installed
except Exception:
report = None
self._dump_parsed.emit(report)
def _on_dump_parsed(self, report) -> None:
self._import_btn.setEnabled(True)
self._status.setText("")
if report is None or not report.ok:
detail = report.error if report is not None else "Couldn't read the file."
QMessageBox.warning(
self, "Import crash dump", f"Couldn't analyze the dump — {detail}")
return
MinidumpDialog(report, self).exec()
# --- hard-crash recovery ----------------------------------------------------------
def _check_crash(self) -> None:
@@ -498,6 +634,7 @@ class GamesPage(QWidget):
# Viewing the list acknowledges the new games: clear the sidebar badge. The NEW
# tags stay on the rows for this session so the user can still spot them.
super().showEvent(event)
self._refresh_import_btn() # AI may have been configured since this page was built
if self._new_appids:
from ..core import steam
+182
@@ -0,0 +1,182 @@
"""Results view for an imported crash dump (.dmp, M14): parsed summary + AI explanation.
Mirrors :class:`DiagnosticDialog` — the same opt-in, streamed "Explain with AI" flow (D24),
applied to a Windows minidump parsed by :mod:`core.minidump` instead of a sensor capture.
"""
from __future__ import annotations
import threading
from pathlib import Path
from PySide6.QtCore import Qt, Signal
from PySide6.QtGui import QFont, QTextCursor
from PySide6.QtWidgets import (
QDialog,
QFrame,
QHBoxLayout,
QLabel,
QMessageBox,
QPushButton,
QScrollArea,
QTextEdit,
QVBoxLayout,
QWidget,
)
from ..core import minidump
from .widgets import finding_card
class MinidumpDialog(QDialog):
_chunk = Signal(str) # streamed token delta (worker thread -> GUI)
_explained = Signal(object) # (ok, full_text) when the AI stream finishes
def __init__(self, report: minidump.MinidumpReport, parent=None) -> None:
super().__init__(parent)
self._report = report
self._stream_view = None
self._stream_status = None
self._chunk.connect(self._on_chunk)
self._explained.connect(self._on_explained)
name = Path(report.path).name
self.setWindowTitle(f"Crash dump — {name}")
self.resize(660, 680)
root = QVBoxLayout(self)
root.setContentsMargins(20, 18, 20, 16)
root.setSpacing(14)
title = QLabel(f"Crash dump — {name}")
title.setObjectName("PageTitle")
root.addWidget(title)
scroll = QScrollArea()
scroll.setWidgetResizable(True)
scroll.setFrameShape(QFrame.Shape.NoFrame)
scroll.setStyleSheet("background: transparent;")
body = QWidget()
col = QVBoxLayout(body)
col.setContentsMargins(0, 0, 0, 0)
col.setSpacing(10)
col.setAlignment(Qt.AlignmentFlag.AlignTop)
# Parsed summary (crash reason / faulting module / OS / CPU / modules) — monospace.
summary_head = QLabel("Dump summary")
summary_head.setStyleSheet("font-weight: 700; background: transparent;")
col.addWidget(summary_head)
summary = QLabel(minidump.to_text(report))
summary.setObjectName("Report")
summary.setFont(QFont("monospace"))
summary.setTextInteractionFlags(Qt.TextInteractionFlag.TextSelectableByMouse)
summary.setWordWrap(False)
summary.setStyleSheet(
"background: #0d0f13; color: #cfd3da; border: 1px solid #2a2f39; "
"border-radius: 8px; padding: 10px;"
)
col.addWidget(summary)
findings = minidump.to_findings(report)
find_head = QLabel(f"Findings ({len(findings)})")
find_head.setStyleSheet("font-weight: 700; background: transparent;")
col.addWidget(find_head)
for finding in findings:
col.addWidget(finding_card(finding))
if report.stackwalk: # only when an external stackwalker was available
sw_head = QLabel("minidump_stackwalk output")
sw_head.setStyleSheet("font-weight: 700; background: transparent;")
col.addWidget(sw_head)
sw = QTextEdit()
sw.setObjectName("Report")
sw.setReadOnly(True)
sw.setFont(QFont("monospace"))
sw.setPlainText(report.stackwalk)
sw.setMinimumHeight(160)
col.addWidget(sw)
scroll.setWidget(body)
root.addWidget(scroll, 1)
buttons = QHBoxLayout()
self._explain_btn = QPushButton("Explain with AI")
self._explain_btn.clicked.connect(self._explain_with_ai)
from ..core import ai
self._explain_btn.setVisible(ai.is_configured()) # opt-in only; hidden if not set up
buttons.addWidget(self._explain_btn)
buttons.addStretch(1)
close = QPushButton("Close")
close.setObjectName("PrimaryButton")
close.clicked.connect(self.accept)
buttons.addWidget(close)
root.addLayout(buttons)
# --- AI explanation (M14, D24) — streamed; runs only on this button press ----------
def _explain_with_ai(self) -> None:
from ..core import ai
if not ai.is_local(): # cloud provider → explicit consent before sending data
confirm = QMessageBox.question(
self, "Send to AI provider",
f"This sends the parsed crash dump to {ai.provider_label()}.\n\nContinue?",
QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No,
QMessageBox.StandardButton.No,
)
if confirm != QMessageBox.StandardButton.Yes:
return
self._explain_btn.setEnabled(False)
dialog = self._open_stream_dialog()
threading.Thread(target=self._work_explain, daemon=True).start()
dialog.exec() # streaming fills the view live via signals during this nested loop
self._stream_view = self._stream_status = None
self._explain_btn.setEnabled(True)
def _work_explain(self) -> None:
from ..core import ai
text = minidump.to_ai_text(self._report)
ok, reply = ai.explain_stream(text, on_chunk=lambda d: self._chunk.emit(d))
self._explained.emit((ok, reply))
def _on_chunk(self, delta: str) -> None:
if self._stream_view is None:
return
self._stream_view.moveCursor(QTextCursor.MoveOperation.End)
self._stream_view.insertPlainText(delta) # live plain text as tokens arrive
self._stream_view.ensureCursorVisible()
def _on_explained(self, result) -> None:
ok, text = result
if self._stream_view is not None:
if ok:
self._stream_view.setMarkdown(text) # re-render the finished answer as Markdown
else:
self._stream_view.setPlainText(f"AI explanation failed:\n\n{text}")
if self._stream_status is not None:
self._stream_status.setText(
"AI-generated suggestions — verify before acting, especially anything that changes "
"settings or data." if ok else "The request failed.")
def _open_stream_dialog(self) -> QDialog:
"""A live dialog the AI streams into; finalized to rendered Markdown when done."""
from ..core import ai
dlg = QDialog(self)
dlg.setWindowTitle(f"AI explanation — {ai.provider_label()}")
dlg.resize(620, 520)
lay = QVBoxLayout(dlg)
view = QTextEdit()
view.setObjectName("Report")
view.setReadOnly(True)
lay.addWidget(view)
status = QLabel("Streaming from the model…")
status.setObjectName("Muted")
status.setWordWrap(True)
lay.addWidget(status)
close = QPushButton("Close")
close.setObjectName("PrimaryButton")
close.clicked.connect(dlg.accept)
lay.addWidget(close, alignment=Qt.AlignmentFlag.AlignRight)
self._stream_view = view
self._stream_status = status
return dlg
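The streaming contract the dialog relies on (deltas arrive on a worker thread, then one final `(ok, full_text)` result replaces the live view) can be modelled without Qt. This is only a sketch: `fake_explain_stream` is a hypothetical stand-in for `ai.explain_stream`, and a `queue.Queue` plays the role of the `_chunk` and `_explained` signals.

```python
import queue
import threading


def fake_explain_stream(text, on_chunk):
    """Hypothetical stand-in: emit deltas through on_chunk, then return (ok, full_reply)."""
    pieces = ["Likely ", "a DXVK ", "issue."]
    for piece in pieces:
        on_chunk(piece)
    return True, "".join(pieces)


events = queue.Queue()  # worker -> "GUI" channel, in place of Qt signals


def worker():
    result = fake_explain_stream("dump summary", lambda d: events.put(("chunk", d)))
    events.put(("done", result))


thread = threading.Thread(target=worker, daemon=True)
thread.start()
thread.join()

view, ok = "", False
while not events.empty():
    kind, payload = events.get()
    if kind == "chunk":
        view += payload      # live plain text while tokens arrive
    else:
        ok, view = payload   # final text replaces the streamed view
```

Replacing the view with the final text, rather than trusting the accumulated deltas, matches how `_on_explained` re-renders the finished answer as Markdown.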
+85
@@ -0,0 +1,85 @@
"""Tests for user-added games (M6): add/remove/scan of titles no launcher reports (e.g. SPT)."""
import tempfile
import unittest
from pathlib import Path
from unittest import mock
from rigdoctor.core import customgames
class CustomGamesTests(unittest.TestCase):
def setUp(self):
self._tmp = tempfile.TemporaryDirectory()
self._file = Path(self._tmp.name) / "custom-games.json"
self._patch = mock.patch.object(customgames.config, "CUSTOM_GAMES_FILE", self._file)
self._patch.start()
def tearDown(self):
self._patch.stop()
self._tmp.cleanup()
def test_missing_file_scans_empty(self):
self.assertEqual(customgames.scan(), [])
self.assertEqual(customgames.names(), [])
def test_add_then_scan_returns_game(self):
self.assertTrue(customgames.add("SPT"))
games = customgames.scan()
self.assertEqual(len(games), 1)
self.assertEqual(games[0].name, "SPT")
self.assertEqual(games[0].launcher, "custom")
self.assertTrue(self._file.exists()) # persisted
def test_add_is_idempotent_case_insensitive(self):
self.assertTrue(customgames.add("SPT"))
self.assertFalse(customgames.add("spt")) # already present
self.assertFalse(customgames.add(" ")) # blank
self.assertEqual(customgames.names(), ["SPT"])
def test_remove(self):
customgames.add("SPT")
customgames.add("Minecraft")
self.assertTrue(customgames.remove("spt")) # case-insensitive
self.assertEqual(customgames.names(), ["Minecraft"])
self.assertFalse(customgames.remove("nope"))
def test_scan_sorted_by_name(self):
for n in ("Zomboid", "Apex", "SPT"):
customgames.add(n)
self.assertEqual([g.name for g in customgames.scan()], ["Apex", "SPT", "Zomboid"])
def test_command_and_logdir_stored_and_resolved(self):
logs = Path(self._tmp.name) / "logs"
logs.mkdir()
sh = Path(self._tmp.name) / "tarkov.sh"
sh.write_text("#!/bin/sh\n")
self.assertTrue(customgames.add("SPT", command=str(sh), logdir=str(logs)))
self.assertEqual(customgames.command("SPT"), [str(sh)])
self.assertEqual(customgames.log_dir("SPT"), str(logs))
def test_logdir_inferred_from_sibling_logs(self):
# A command with a sibling logs/ dir (SPT's layout) → logdir auto-detected.
sh = Path(self._tmp.name) / "tarkov.sh"
sh.write_text("#!/bin/sh\n")
(Path(self._tmp.name) / "logs").mkdir()
self.assertTrue(customgames.add("SPT", command=str(sh)))
self.assertEqual(customgames.log_dir("SPT"), str(Path(self._tmp.name) / "logs"))
def test_no_command_resolves_to_none(self):
customgames.add("SPT")
self.assertIsNone(customgames.command("SPT"))
self.assertIsNone(customgames.command("missing"))
self.assertIsNone(customgames.log_dir("SPT"))
def test_corrupt_file_degrades_to_empty(self):
self._file.parent.mkdir(parents=True, exist_ok=True)
self._file.write_text("{not json")
self.assertEqual(customgames.scan(), [])
# and a subsequent add still works (overwrites the garbage)
self.assertTrue(customgames.add("SPT"))
self.assertEqual(customgames.names(), ["SPT"])
if __name__ == "__main__":
unittest.main()
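The log-folder resolution these tests pin down (an explicit folder wins, otherwise a `logs/` directory next to the launch script, otherwise nothing) might look like the sketch below. It is not the real `core/customgames.py`, which is outside this hunk; `infer_logdir` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Optional


def infer_logdir(command: Optional[str], logdir: Optional[str]) -> Optional[str]:
    """Explicit logdir wins; else use a sibling logs/ folder of the script if it exists."""
    if logdir:
        return logdir
    if not command:
        return None
    candidate = Path(command).parent / "logs"
    return str(candidate) if candidate.is_dir() else None


with tempfile.TemporaryDirectory() as root:
    script = Path(root) / "tarkov.sh"
    script.write_text("#!/bin/sh\n")
    before = infer_logdir(str(script), None)   # no logs/ folder yet
    (Path(root) / "logs").mkdir()
    after = infer_logdir(str(script), None)
    expected = str(Path(root) / "logs")
```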
+30
@@ -47,6 +47,36 @@ class CollectTests(unittest.TestCase):
self.assertEqual(gamelogs.collect(), "")
class CustomGameLogTests(unittest.TestCase):
def test_collect_includes_custom_game_logs(self):
tmp = Path(tempfile.mkdtemp())
(tmp / "tarkov-latest.log").write_text(">>> Tarkov gone. clean exit")
(tmp / "server-latest.log").write_text("SPT server error: mod failed to load")
with mock.patch.object(gamelogs, "_proton_logs", return_value=[]), \
mock.patch.object(gamelogs, "_steam_console", return_value=None), \
mock.patch("rigdoctor.core.customgames.log_dir", return_value=str(tmp)):
out = gamelogs.collect(game="SPT")
self.assertIn("SPT log", out)
self.assertIn("server-latest.log", out)
self.assertIn("mod failed to load", out)
def test_custom_logs_skipped_when_stale(self):
tmp = Path(tempfile.mkdtemp())
old = tmp / "tarkov-latest.log"
old.write_text("an earlier session")
old_mtime = time.time() - 3600
os.utime(old, (old_mtime, old_mtime))
with mock.patch.object(gamelogs, "_proton_logs", return_value=[]), \
mock.patch.object(gamelogs, "_steam_console", return_value=None), \
mock.patch("rigdoctor.core.customgames.log_dir", return_value=str(tmp)):
self.assertEqual(gamelogs.collect(since=time.time() - 60, game="SPT"), "")
def test_no_game_means_no_custom_logs(self):
with mock.patch.object(gamelogs, "_proton_logs", return_value=[]), \
mock.patch.object(gamelogs, "_steam_console", return_value=None):
self.assertEqual(gamelogs.collect(), "") # game=None → custom lookup skipped
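The stale-log test above implies that custom-game logs are scoped to the session by file modification time. A minimal sketch of that filter, assuming an mtime comparison against `since` (the real `gamelogs` logic is not shown here, and `fresh_logs` is a hypothetical name):

```python
import os
import tempfile
import time
from pathlib import Path


def fresh_logs(folder, since=None) -> list:
    """Names of *.log files modified at or after `since` (all of them when since is None)."""
    return sorted(
        p.name for p in Path(folder).glob("*.log")
        if since is None or p.stat().st_mtime >= since
    )


with tempfile.TemporaryDirectory() as root:
    old = Path(root) / "tarkov-latest.log"
    old.write_text("an earlier session")
    stale = time.time() - 3600
    os.utime(old, (stale, stale))              # backdate by an hour
    (Path(root) / "server-latest.log").write_text("mod failed to load")
    window = fresh_logs(root, since=time.time() - 60)
    everything = fresh_logs(root)
```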
class SinceScopingTests(unittest.TestCase):
def test_since_filter_keeps_window_only(self):
text = (
+30
@@ -11,11 +11,19 @@ from rigdoctor.core.health import (
WARNING,
check_displays,
check_memory_speed,
check_nvidia_module,
check_pcie_links,
run_health_checks,
scan_journal_text,
)
# A real no-Xid freeze: the open-module VA-space storm captured on 2026-05-29.
_VASPACE_LOG = """\
NVRM: nvCheckFailedNoLog: Check failed: 0 == (pMapNode->gpuMask & gpuMask) @ gpu_vaspace.c:4547
NVRM: dmaAllocMapping_GM107: can't update VA space for mapping @vaddr=0x4be00000
[drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* Failed to allocate NVKMS memory for GEM object
"""
class HealthScanTests(unittest.TestCase):
def test_xid_79_is_critical(self):
@@ -44,6 +52,28 @@ class HealthScanTests(unittest.TestCase):
def test_clean_text_yields_no_findings(self):
self.assertEqual(scan_journal_text("usb 1-1: new high-speed USB device\nbluetooth: ok"), [])
def test_vaspace_freeze_detected_without_any_xid(self):
findings = scan_journal_text(_VASPACE_LOG)
gpu = [f for f in findings if f.category == "GPU"]
self.assertEqual(len(gpu), 1)
self.assertEqual(gpu[0].severity, WARNING)
self.assertIn("VA-space", gpu[0].title)
# It must NOT be misreported as an Xid finding (the log has no Xid at all).
self.assertNotIn("Xid", gpu[0].title)
self.assertIn("open kernel module", gpu[0].detail.lower())
def test_open_module_finding_when_open_loaded(self):
with mock.patch("rigdoctor.core.health._nvidia_module_is_open", return_value=True):
findings = check_nvidia_module()
self.assertEqual(len(findings), 1)
self.assertEqual(findings[0].severity, INFO)
self.assertEqual(findings[0].category, "Driver")
def test_no_module_finding_when_proprietary_or_absent(self):
for state in (False, None):
with mock.patch("rigdoctor.core.health._nvidia_module_is_open", return_value=state):
self.assertEqual(check_nvidia_module(), [])
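To illustrate what "no-Xid freeze" means for a journal scan, here is a rough sketch that separates Xid lines from the VA-space storm shown in `_VASPACE_LOG`. The two patterns and `classify` are hypothetical, chosen only to match the sample lines; the actual rules in `scan_journal_text` are not part of this diff view.

```python
import re

# Hypothetical patterns, derived from the sample log lines only.
XID = re.compile(r"NVRM: Xid")
VASPACE = re.compile(r"gpu_vaspace\.c|can't update VA space")


def classify(journal: str):
    """Return 'xid', 'vaspace', or None for a block of kernel log text."""
    lines = journal.splitlines()
    if any(XID.search(line) for line in lines):
        return "xid"
    if any(VASPACE.search(line) for line in lines):
        return "vaspace"
    return None


sample = (
    "NVRM: nvCheckFailedNoLog: Check failed: 0 == (pMapNode->gpuMask & gpuMask) "
    "@ gpu_vaspace.c:4547\n"
    "NVRM: dmaAllocMapping_GM107: can't update VA space for mapping @vaddr=0x4be00000\n"
)
kind = classify(sample)
```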
def test_run_health_checks_returns_findings(self):
# Runs against the real system; just assert it returns a sorted list of Findings.
findings = run_health_checks()
+163
@@ -0,0 +1,163 @@
"""Tests for the .dmp minidump parser (M14) — builds a synthetic MDMP, no external tools."""
import struct
import tempfile
import unittest
from pathlib import Path
from unittest import mock
from rigdoctor.core import minidump
def _synthetic_dump() -> bytes:
"""A minimal but valid MDMP: header + SystemInfo + Exception + 2-module ModuleList.
Layout (absolute file offsets): header@0, directory@32, SystemInfo@68, Exception@96,
ModuleList@264, name strings@484. Module0 spans the exception address, so it's faulting.
"""
buf = bytearray(600)
struct.pack_into("<4sIIIIIQ", buf, 0, b"MDMP", 0xA793, 3, 32, 0, 1_700_000_000, 0)
struct.pack_into("<III", buf, 32, 7, 28, 68) # SystemInfoStream
struct.pack_into("<III", buf, 44, 6, 168, 96) # ExceptionStream
struct.pack_into("<III", buf, 56, 4, 220, 264) # ModuleListStream
# SystemInfo: x86-64, 16 CPUs, Windows 10.0.19041 (PlatformId 2 = Win32 NT).
struct.pack_into("<HHHBBIIIII", buf, 68, 9, 0, 0, 16, 1, 10, 0, 19041, 2, 0)
# Exception: access violation (write) at 0x140001234.
struct.pack_into("<I", buf, 96, 4321) # ThreadId
struct.pack_into("<I", buf, 96 + 8, 0xC0000005) # ExceptionCode
struct.pack_into("<Q", buf, 96 + 24, 0x140001234) # ExceptionAddress
struct.pack_into("<I", buf, 96 + 32, 2) # NumberParameters
struct.pack_into("<Q", buf, 96 + 40, 1) # info[0] = write
struct.pack_into("<Q", buf, 96 + 48, 0x0) # info[1] = address being accessed (null)
# ModuleList: 2 modules.
struct.pack_into("<I", buf, 264, 2)
m0, m1 = 268, 268 + minidump._MODULE_STRIDE
struct.pack_into("<Q", buf, m0, 0x140000000) # base
struct.pack_into("<I", buf, m0 + 8, 0x100000) # size (spans the exception address)
struct.pack_into("<I", buf, m0 + 20, 484) # name RVA
struct.pack_into("<Q", buf, m1, 0x180000000)
struct.pack_into("<I", buf, m1 + 8, 0x080000)
struct.pack_into("<I", buf, m1 + 20, 522)
name0 = "C:\\Games\\game.exe".encode("utf-16-le")
struct.pack_into("<I", buf, 484, len(name0))
buf[488:488 + len(name0)] = name0
name1 = "nvwgf2umx.dll".encode("utf-16-le")
struct.pack_into("<I", buf, 522, len(name1))
buf[526:526 + len(name1)] = name1
return bytes(buf)
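The absolute offsets in the docstring follow from the struct sizes, which is easy to verify. In this sketch `MODULE_STRIDE = 108` is an assumption: it is the size of a MINIDUMP_MODULE record and the value the docstring offsets imply (264 + 4 + 2 × 108 = 484), but `minidump._MODULE_STRIDE` itself is defined outside this hunk. `layout` is a hypothetical helper.

```python
import struct

HEADER = struct.calcsize("<4sIIIIIQ")     # 32-byte MDMP header
DIR_ENTRY = struct.calcsize("<III")       # 12 bytes per directory entry
SYSINFO = struct.calcsize("<HHHBBIIIII")  # 28-byte SystemInfo prefix used by the test
EXCEPTION = 168                           # stream size declared in the directory entry
MODULE_STRIDE = 108                       # assumed value of minidump._MODULE_STRIDE


def layout(n_streams: int = 3, n_modules: int = 2) -> dict:
    """Absolute file offsets of each region, packed back to back."""
    directory = HEADER
    system_info = directory + n_streams * DIR_ENTRY
    exception = system_info + SYSINFO
    module_list = exception + EXCEPTION
    names = module_list + 4 + n_modules * MODULE_STRIDE  # u32 count, then the records
    return {"directory": directory, "system_info": system_info,
            "exception": exception, "module_list": module_list, "names": names}


offsets = layout()
name0 = "C:\\Games\\game.exe".encode("utf-16-le")
name1_rva = offsets["names"] + 4 + len(name0)  # u32 length prefix + UTF-16LE bytes
```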


class ParseTests(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.NamedTemporaryFile(suffix=".dmp", delete=False)
        self._tmp.write(_synthetic_dump())
        self._tmp.close()
        self.path = self._tmp.name

    def tearDown(self):
        Path(self.path).unlink(missing_ok=True)

    def _parse(self):
        return minidump.parse(self.path, run_stackwalk=False)

    def test_parses_exception_and_faulting_module(self):
        r = self._parse()
        self.assertTrue(r.ok, r.error)
        self.assertEqual(r.exception_code, 0xC0000005)
        self.assertIn("Access violation", r.crash_reason)
        self.assertIn("writing 0x0", r.crash_reason)
        self.assertEqual(r.faulting_module, "game.exe")  # basename, address inside module0
        self.assertEqual(r.crashing_thread, 4321)

    def test_parses_system_info_and_modules(self):
        r = self._parse()
        self.assertEqual(r.os_name, "Windows 10.0.19041")
        self.assertEqual(r.cpu_arch, "x86-64")
        self.assertEqual(r.cpu_count, 16)
        self.assertEqual([m.name for m in r.modules], ["game.exe", "nvwgf2umx.dll"])

    def test_to_text_and_ai_text(self):
        r = self._parse()
        text = minidump.to_text(r)
        self.assertIn("game.exe", text)
        self.assertIn("nvwgf2umx.dll", text)
        self.assertIn("Access violation", text)
        ai_text = minidump.to_ai_text(r)
        self.assertIn("Proton", ai_text)  # Linux/Proton framing for the model
        self.assertIn("Crash reason", ai_text)

    def test_to_findings(self):
        findings = minidump.to_findings(self._parse())
        self.assertEqual(findings[0].severity, minidump.CRITICAL)
        self.assertIn("game.exe", findings[0].title)

    def test_run_stackwalk_false_skips_external_tool(self):
        self.assertEqual(self._parse().stackwalk, "")


class RobustnessTests(unittest.TestCase):
    def test_non_minidump_file(self):
        with tempfile.NamedTemporaryFile(suffix=".dmp", delete=False) as fh:
            fh.write(b"not a dump at all")
            path = fh.name
        try:
            r = minidump.parse(path, run_stackwalk=False)
        finally:
            Path(path).unlink(missing_ok=True)
        self.assertFalse(r.ok)
        self.assertIn("signature", r.error)

    def test_missing_file(self):
        r = minidump.parse("/nonexistent/does-not-exist.dmp", run_stackwalk=False)
        self.assertFalse(r.ok)
        self.assertIn("can't read", r.error)

    def test_stackwalk_absent_returns_empty(self):
        with mock.patch.object(minidump.shutil, "which", return_value=None):
            self.assertEqual(minidump.stackwalk("/whatever.dmp"), "")


class CliDumpTests(unittest.TestCase):
    """`rigdoctor ai dump <file>` parses then explains via the configured provider."""

    def _args(self, **over):
        import argparse
        base = {"ai_cmd": "dump", "file": ""}
        base.update(over)
        return argparse.Namespace(**base)

    def test_dump_parses_and_explains(self):
        from rigdoctor.core import ai
        with tempfile.NamedTemporaryFile(suffix=".dmp", delete=False) as fh:
            fh.write(_synthetic_dump())
            path = fh.name
        try:
            with mock.patch.object(ai, "is_configured", return_value=True), \
                 mock.patch.object(ai, "provider_label", return_value="Claude (test)"), \
                 mock.patch.object(minidump, "stackwalk", return_value=""), \
                 mock.patch.object(ai, "explain", return_value=(True, "Likely DXVK.")) as explain:
                from rigdoctor import cli
                rc = cli.cmd_ai(self._args(file=path))
        finally:
            Path(path).unlink(missing_ok=True)
        self.assertEqual(rc, 0)
        sent = explain.call_args[0][0]
        self.assertIn("Proton", sent)  # the Linux/Proton framing reached the model
        self.assertIn("game.exe", sent)

    def test_dump_bad_file_returns_error(self):
        from rigdoctor.core import ai
        with mock.patch.object(ai, "is_configured", return_value=True):
            from rigdoctor import cli
            rc = cli.cmd_ai(self._args(file="/nope/missing.dmp"))
        self.assertEqual(rc, 1)


if __name__ == "__main__":
    unittest.main()