Compare commits
10 Commits
| SHA1 |
|---|
| ab89dda0b4 |
| 305c88ba09 |
| 82f3ea49de |
| 8d695227bc |
| 82bef0a08c |
| 73f347449e |
| 5cd51beadf |
| 934b489fec |
| 7a283dc338 |
| 5682878f22 |
@@ -5,6 +5,58 @@ All notable changes to RigDoctor are recorded here. Format follows
 (`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
 release tag (so the auto-updater, D18, can compare versions).

+## [0.15.0] - 2026-05-22
+### Added
+- **Hard-crash detection & recovery for the guided diagnostic.** If a focused capture ends
+  without a clean stop (the recorder never wrote `session-stop` and isn't running), RigDoctor
+  treats it as a likely hard freeze. On launch the **Games** page shows a warning banner —
+  *"Your last diagnostic for <game> ended unexpectedly…"* — with **Analyze crash** / **Dismiss**.
+- **Deeper crash analysis.** *Analyze crash* combines the captured window (final readings before
+  the freeze + any GPU-lost event) with a focused scan of the **previous (crashed) boot's kernel
+  log** (`journalctl -k -b -1`: Xid/panic/OOM/MCE/AER/thermal) plus SMART/driver/persistence/
+  live-temp checks — the full "what happened" picture. `core/diagnostic.py` gains
+  `pending_crash()` / `analyze_crash()`; `health.check_previous_boot()` +
+  `run_health_checks(include_journal=False)` back it.
+
+## [0.14.0] - 2026-05-22
+### Changed
+- **Dashboard headline tiles are now history trend graphs** instead of single-value gauges —
+  GPU temp, GPU load, CPU temp, and memory each plot their recent history (with the current
+  value, window min/max, and a dashed warning-threshold line), so you can see changes over time
+  rather than only the instantaneous reading. New `HistoryGraph` widget (QPainter, no new deps).
+
+## [0.13.0] - 2026-05-22
+### Added
+- **Run Diagnostic now explains itself and can launch the game.** Clicking Run Diagnostic shows
+  what to do — *play the game, reproduce the crash, then Finish & analyze* (and that data
+  survives a hard freeze + reboot) — and offers **Launch game & start** (asks Steam to run it by
+  appid) or **Start without launching**. The recording banner now spells out the next step
+  instead of just showing a sample count.
+### Fixed
+- Button labels containing "&" (e.g. "Finish & analyze") rendered as "Finish _analyze" because
+  Qt treated the "&" as a keyboard mnemonic — now escaped so the ampersand shows literally.
+
+## [0.12.0] - 2026-05-22
+### Added
+- **Guided diagnostic in the GUI.** Each game on the **Games** page now has a **Run Diagnostic**
+  button → a focused, game-tagged capture starts and a recording banner appears (live sample
+  count, GPU-lost indicator) with **Finish & analyze** / **Discard**. Finishing opens a results
+  dialog: the window-scoped capture summary (peak temps/power, events, last samples) plus the
+  health findings as cards. The banner persists/restores if you navigate away and back while a
+  capture is running. Shares `core/diagnostic.py` with the CLI (one flow, three front-ends).
+
+## [0.11.0] - 2026-05-22
+### Added
+- **Guided diagnostic session (CLI) — the seed use case, end to end.** `rigdoctor diagnose
+  start --game "<name>"` runs a **focused crash-capture tagged with that game** (its own
+  diagnostic log, so the report is scoped to just that session), `diagnose status` shows
+  progress, and `diagnose finish` stops it and prints a combined report: the **capture
+  summary** (peak temps/power, GPU-lost events, last samples — M3) plus the **health findings**
+  (Xid/SMART/driver/etc. — M4). The game can be given by `--game` or `--appid` (resolved from
+  the Steam scan), and is recorded as a log event so it survives a crash + reboot.
+- Shared orchestration lives in `core/diagnostic.py` (one callable for CLI/GUI/tray, per
+  ARCHITECTURE §7.1); the recorder/`record run` gained an optional `--game` tag.
+
 ## [0.10.2] - 2026-05-22
 ### Changed
 - When an Environment **Apply**/**Install** fails, the status now shows the **real reason**
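Reviewer note on the 0.13.0 "Fixed" entry: Qt reads a single `&` in a button label as a mnemonic marker and renders `&&` as one literal ampersand. A minimal sketch of the escaping step follows; the helper name `escape_mnemonic` is hypothetical and is not taken from this change set, which writes the doubled ampersand directly in the label string.

```python
def escape_mnemonic(label: str) -> str:
    """Double every ampersand so Qt shows it literally instead of as a mnemonic.

    Hypothetical helper for illustration only.
    """
    return label.replace("&", "&&")


if __name__ == "__main__":
    # The escaped text is what would be passed to a QPushButton.
    print(escape_mnemonic("Finish & analyze"))
```

A helper like this may be worth having if labels are built from user-visible strings such as game names, which can contain ampersands of their own.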
+7
-2
@@ -40,8 +40,13 @@ Ubuntu + NVIDIA first; `.deb` distribution (see `DECISIONS.md`).
 - [ ] M10 desktop GUI (PySide6: dashboard, log browser, report viewer, logger controls)
 - [ ] M11 tray / menu-bar applet (QSystemTrayIcon: live M1 readouts + Run Diagnostic +
       supporting actions — D13)
-- [ ] Guided diagnostic session (pick game → focused M3 capture → M4 scan → findings),
-      shared by tray/GUI/CLI
+- [~] Guided diagnostic session (pick game → focused M3 capture → M4 scan → findings),
+      shared by tray/GUI/CLI — *core + CLI + GUI done* (`core/diagnostic.py`, `rigdoctor
+      diagnose start/status/finish`, and a **Run Diagnostic** button per game on the GUI Games
+      page → recording banner → results dialog with the capture summary + findings). Tags a
+      focused capture with the chosen game (own diagnostic log, window-scoped report) and
+      combines the capture summary with the M4 findings. *Pending:* the tray (M11) entry point,
+      and auto start/stop via the D12 wrapper/watcher.
 - [ ] Logger trigger modes: always-on + game-launch (D12 — wrapper first:
       `rigdoctor wrap %command%` + global Steam compat-tool; zero-config watcher
       (Steam RunningAppID + /proc) and GameMode hook follow)
+1
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "rigdoctor"
-version = "0.10.2"
+version = "0.15.0"
 description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
 readme = "README.md"
 requires-python = ">=3.11"
@@ -1,3 +1,3 @@
 """RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""

-__version__ = "0.10.2"
+__version__ = "0.15.0"
@@ -86,6 +86,7 @@ def cmd_record_run(args) -> int:
         max_bytes=cfg["log_max_bytes"],
         backups=cfg["log_backups"],
         status_path=config.STATUS_FILE,
+        game=getattr(args, "game", None),
     )

     def _handle(_sig, _frame):
@@ -345,6 +346,77 @@ def cmd_report(args) -> int:
     return 0


+def _resolve_game(args) -> str | None:
+    """Game name from --game, or looked up from --appid via the Steam scan."""
+    if getattr(args, "game", None):
+        return args.game
+    if getattr(args, "appid", None):
+        from .core import steam
+
+        for g in steam.scan_games(steam.selected_library_paths()):
+            if g.appid == str(args.appid):
+                return g.name
+        return None
+    return None
+
+
+def cmd_diagnose(args) -> int:
+    from .core import diagnostic, reccontrol, steam
+
+    sub = args.diagnose_cmd or "status"
+
+    if sub == "start":
+        if reccontrol.running_pid():
+            print("A capture is already running — finish it with: rigdoctor diagnose finish")
+            return 1
+        game = _resolve_game(args)
+        if game is None and (args.game or args.appid):
+            print("Couldn't match that game in your selected Steam libraries.")
+            return 1
+        if game is None:
+            games = steam.cached_games() or steam.scan_games(steam.selected_library_paths())
+            if games:
+                print("Pick a game to focus on, then re-run with --game:")
+                for g in games:
+                    print(f" --game {g.name!r}")
+            else:
+                print("No games detected. Select a library: rigdoctor games libraries --all")
+            return 1
+        pid = diagnostic.start(game=game, interval=args.interval)
+        time.sleep(1.0)
+        if pid and reccontrol.pid_alive(pid):
+            print(f"Diagnostic capture started for {game!r} (pid {pid}).")
+            print(" Play your game. When you're done (or after a crash + reboot):")
+            print(" rigdoctor diagnose finish")
+            return 0
+        print(f"Capture failed to start; see {config.SPAWN_LOG}")
+        return 1
+
+    if sub == "status":
+        status = diagnostic.active()
+        if not status:
+            print("No diagnostic capture is running.")
+            return 0
+        game = status.get("game") or "—"
+        print(f"Capturing for {game!r}: {status.get('samples', 0)} samples"
+              + (" · GPU-lost seen" if status.get("gpu_lost") else ""))
+        return 0
+
+    # finish
+    if not reccontrol.running_pid() and not config.DIAG_LOG.exists():
+        print("No diagnostic to analyze. Start one with: rigdoctor diagnose start --game <name>")
+        return 1
+    print("Stopping capture and analyzing…\n")
+    result = diagnostic.finish(last_n=args.last)
+    from .render import render_health, render_summary
+
+    if result.game:
+        print(f"Diagnostic — {result.game}\n")
+    print(render_summary(result.summary, log_path=config.DIAG_LOG))
+    print("\n" + render_health(result.findings, title="Findings"))
+    return 0
+
+
 def cmd_gameenv(args) -> int:
     from dataclasses import asdict

@@ -470,6 +542,7 @@ def build_parser() -> argparse.ArgumentParser:
     run_p = rec_sub.add_parser("run", help="run the capture loop in the foreground (systemd-friendly)")
     run_p.add_argument("-n", "--interval", type=float, default=None, help="sampling interval (s)")
     run_p.add_argument("-o", "--out", default=None, help="log file path")
+    run_p.add_argument("--game", default=None, help="tag the capture with a game name (M6/diagnose)")
     run_p.set_defaults(func=cmd_record_run)

     start_p = rec_sub.add_parser("start", help="start recording in the background")
@@ -519,6 +592,19 @@ def build_parser() -> argparse.ArgumentParser:
     env_p = sub.add_parser("gameenv", help="gaming environment checks (M6): flag stability/perf settings")
     env_p.add_argument("--json", action="store_true", help="output JSON instead of text")
     env_p.set_defaults(func=cmd_gameenv)
+
+    diag_p = sub.add_parser("diagnose", help="guided diagnostic: capture while gaming, then analyze")
+    diag_sub = diag_p.add_subparsers(dest="diagnose_cmd")
+    diag_start = diag_sub.add_parser("start", help="start a focused capture for a game")
+    diag_start.add_argument("--game", default=None, help="game name to focus on")
+    diag_start.add_argument("--appid", default=None, help="Steam appid to focus on (resolved to a name)")
+    diag_start.add_argument("-n", "--interval", type=float, default=None, help="sampling interval (s)")
+    diag_start.set_defaults(func=cmd_diagnose)
+    diag_sub.add_parser("status", help="show the in-progress diagnostic").set_defaults(func=cmd_diagnose)
+    diag_finish = diag_sub.add_parser("finish", help="stop the capture and analyze it")
+    diag_finish.add_argument("--last", type=int, default=10, help="recent samples to show")
+    diag_finish.set_defaults(func=cmd_diagnose)
+    diag_p.set_defaults(func=cmd_diagnose, diagnose_cmd=None, last=10)
     return p


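Reviewer note on the `diagnose` parser: the fallback to `status` when no subcommand is given relies on `diagnose_cmd` staying `None` and on `last` having a default at the parent level. The standalone sketch below reproduces only that wiring with plain `argparse`; the program name and the `chosen_subcommand` helper are hypothetical, and the real parser carries more options than shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Cut-down parser mirroring only the diagnose subcommand layout."""
    p = argparse.ArgumentParser(prog="rigdoctor-sketch")  # hypothetical name
    sub = p.add_subparsers(dest="cmd")

    diag = sub.add_parser("diagnose")
    diag_sub = diag.add_subparsers(dest="diagnose_cmd")

    start = diag_sub.add_parser("start")
    start.add_argument("--game", default=None)
    start.add_argument("--appid", default=None)

    diag_sub.add_parser("status")

    finish = diag_sub.add_parser("finish")
    finish.add_argument("--last", type=int, default=10)

    # Parent-level defaults so a bare `diagnose` still has these attributes.
    diag.set_defaults(diagnose_cmd=None, last=10)
    return p


def chosen_subcommand(argv: list[str]) -> str:
    """Return the effective diagnose subcommand, defaulting to 'status'."""
    args = build_parser().parse_args(argv)
    return args.diagnose_cmd or "status"


if __name__ == "__main__":
    print(chosen_subcommand(["diagnose"]))
    print(chosen_subcommand(["diagnose", "finish", "--last", "3"]))
```

Values parsed by a subparser overwrite the parent defaults, so `--last 3` on `finish` wins over the parent's `last=10`.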
@@ -23,6 +23,9 @@ CONFIG_FILE = CONFIG_DIR / "config.toml"

 # Crash-capture logger (M3)
 LOG_FILE = LOG_DIR / "capture.jsonl"
+# Guided diagnostic (M6/D12): a focused capture writes here, separate from the always-on
+# crash log, so its report covers only that session's window.
+DIAG_LOG = LOG_DIR / "diagnostic.jsonl"
 STATUS_FILE = STATE_DIR / "recorder.json"
 PID_FILE = STATE_DIR / "recorder.pid"
 SPAWN_LOG = STATE_DIR / "recorder.out"
@@ -0,0 +1,162 @@
+"""Guided diagnostic session (SPEC §4 / ARCHITECTURE §7.1): orchestrate M3 + M4.
+
+The seed use case, one flow: **pick a game** → **focused crash-capture** scoped to that
+session (M3, tagged with the game) → on **finish**, **scan & analyze** (M4 health report)
+over the captured window + system logs → return a prioritized result. This is not a new
+module — it's a single shared callable so the CLI, GUI, and tray run the identical flow.
+
+The capture is **manually bracketed** (start/finish) for now; auto start/stop on game launch
+(the D12 wrapper/watcher) plugs in here later without changing the result shape.
+"""
+
+from __future__ import annotations
+
+import json
+import time
+from dataclasses import dataclass
+
+from .. import config
+from . import reccontrol
+from .crashlog import Summary, summarize
+from .health import CRITICAL, OK, WARNING, Finding
+
+_SEV_ORDER = {CRITICAL: 0, WARNING: 1, "info": 2, OK: 3}
+
+
+@dataclass
+class DiagnosticResult:
+    game: str | None
+    summary: Summary  # capture window: peak temps/power, events, last samples (M3)
+    findings: list[Finding]  # health findings: Xid/SMART/driver/etc. (M4)
+
+
+@dataclass
+class CrashInfo:
+    game: str | None
+    samples: int
+    when: float | None  # ts of the last captured sample (≈ when the freeze hit)
+    gpu_lost: bool
+
+
+def _clear_diag_log() -> None:
+    """Each diagnostic is a fresh focused capture — drop any previous session + segments."""
+    base = config.DIAG_LOG
+    for p in [base, *base.parent.glob(base.name + ".*")]:
+        try:
+            p.unlink()
+        except OSError:
+            pass
+
+
+def start(game: str | None = None, interval: float | None = None) -> int | None:
+    """Begin a focused capture, tagged with the game, into the dedicated diagnostic log.
+    Returns the pid, or None if a capture is already running."""
+    if reccontrol.running_pid():
+        return None
+    _clear_diag_log()
+    return reccontrol.start_background(interval=interval, out=str(config.DIAG_LOG), game=game)
+
+
+def is_running() -> bool:
+    return reccontrol.running_pid() is not None
+
+
+def active() -> dict | None:
+    """Status of the in-progress session (running flag, game, samples), or None if idle."""
+    if not is_running():
+        return None
+    return reccontrol.read_status()
+
+
+def _await_stopped(timeout: float = 6.0) -> None:
+    deadline = time.monotonic() + timeout
+    while reccontrol.running_pid() and time.monotonic() < deadline:
+        time.sleep(0.1)
+
+
+def _game_from_summary(summary: Summary) -> str | None:
+    """Recover the focused game from the log's 'game' event (survives a crash + reboot)."""
+    for _ts, kind, detail in reversed(summary.events):
+        if kind == "game" and detail:
+            return detail
+    return None
+
+
+def finish(last_n: int = 10, log_path=None) -> DiagnosticResult:
+    """Stop the capture (if running), summarize the window, and run the health report."""
+    from .health import run_health_checks
+
+    reccontrol.stop_background()
+    _await_stopped()
+    path = log_path or config.DIAG_LOG
+    summary = summarize(path, last_n=last_n)
+    game = _game_from_summary(summary) or (reccontrol.read_status() or {}).get("game")
+    findings = run_health_checks()
+    return DiagnosticResult(game=game, summary=summary, findings=findings)
+
+
+# --- hard-crash detection & post-crash analysis -----------------------------------
+
+def pending_crash() -> CrashInfo | None:
+    """Detect a diagnostic that ended abnormally (no clean stop, no live recorder).
+
+    A focused capture writes `session-start` (+ `game`) and, on a clean stop, `session-stop`.
+    After a hard freeze that block never runs, so the log has a start with no stop and no
+    live recorder — that's our hard-crash signal. Returns None if a capture is running, none
+    is recorded, it stopped cleanly, or the user already acknowledged it.
+    """
+    if is_running() or not config.DIAG_LOG.exists():
+        return None
+    summary = summarize(config.DIAG_LOG)
+    kinds = {kind for _ts, kind, _detail in summary.events}
+    if "session-start" not in kinds:
+        return None
+    if "session-stop" in kinds or "diagnostic-acknowledged" in kinds:
+        return None
+    return CrashInfo(
+        game=_game_from_summary(summary),
+        samples=summary.samples,
+        when=summary.end,
+        gpu_lost="gpu-lost" in kinds,
+    )
+
+
+def acknowledge_crash() -> None:
+    """Mark the recorded crash as seen so it stops prompting (appends a marker event)."""
+    try:
+        config.DIAG_LOG.parent.mkdir(parents=True, exist_ok=True)
+        with open(config.DIAG_LOG, "a", encoding="utf-8") as fh:
+            fh.write(json.dumps({"ts": time.time(), "event": "diagnostic-acknowledged", "detail": ""}) + "\n")
+    except OSError:
+        pass
+
+
+def _crash_headline(summary: Summary) -> Finding:
+    gpu_lost = any(kind == "gpu-lost" for _ts, kind, _detail in summary.events)
+    when = time.strftime("%H:%M:%S", time.localtime(summary.end)) if summary.end else "?"
+    detail = (
+        f"The capture stopped abruptly at {when} after {summary.samples} samples, with no clean "
+        "shutdown recorded — consistent with a hard freeze or power loss."
+    )
+    if gpu_lost:
+        detail += " A GPU-lost event was captured during the session."
+    return Finding(
+        CRITICAL if gpu_lost else WARNING,
+        "Diagnostic",
+        "Session ended without a clean stop (likely a hard crash)",
+        detail,
+        "Review the last readings (Capture, above) and the crash-boot findings below.",
+    )
+
+
+def analyze_crash(last_n: int = 15) -> DiagnosticResult:
+    """Analyze a recorded hard crash: the captured window + the previous boot's kernel log
+    + the rest of the health report (SMART/driver/persistence/temps)."""
+    from .health import check_previous_boot, run_health_checks
+
+    summary = summarize(config.DIAG_LOG, last_n=last_n)
+    findings: list[Finding] = [_crash_headline(summary)]
+    findings += check_previous_boot()  # the crashed boot's kernel log
+    findings += run_health_checks(include_journal=False)  # SMART/driver/persistence/temps
+    findings.sort(key=lambda f: _SEV_ORDER.get(f.severity, 9))
+    return DiagnosticResult(game=_game_from_summary(summary), summary=summary, findings=findings)
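Reviewer note on `pending_crash()`: the hard-crash signal is "a `session-start` with no `session-stop`, no acknowledgement, and no live recorder". The sketch below applies the same rule directly to JSONL lines so it can be exercised without the rest of the package. It assumes every event record has the `{"ts", "event", "detail"}` shape that `acknowledge_crash()` writes; the function names are hypothetical, and skipping unparsable lines is an assumption about how a torn final line would be handled, not something this diff shows.

```python
import json
from typing import Iterable


def event_kinds(lines: Iterable[str]) -> set[str]:
    """Collect the distinct 'event' values found in JSONL lines."""
    kinds: set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: a hard freeze may leave a torn last line
        if isinstance(record, dict) and "event" in record:
            kinds.add(record["event"])
    return kinds


def looks_like_hard_crash(lines: Iterable[str], recorder_running: bool = False) -> bool:
    """True when a session started but never stopped cleanly or got acknowledged."""
    if recorder_running:
        return False
    kinds = event_kinds(lines)
    if "session-start" not in kinds:
        return False
    return not ({"session-stop", "diagnostic-acknowledged"} & kinds)


if __name__ == "__main__":
    crashed = [
        '{"ts": 1.0, "event": "session-start", "detail": "interval=1s"}',
        '{"ts": 2.0, "event": "game", "detail": "Example Game"}',
        '{"ts": 3.0, "ev',  # torn line left by the freeze
    ]
    print(looks_like_hard_crash(crashed))
```

Keeping the rule as a pure function over lines makes the three "returns None" cases in the docstring easy to cover in a unit test.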
@@ -146,6 +146,22 @@ def check_journal() -> list[Finding]:
     return findings


+def check_previous_boot() -> list[Finding]:
+    """Scan the previous boot's kernel log — the boot that crashed — for fault signatures.
+
+    Needs persistent journald (else the crashed boot's logs were lost on reboot, which the
+    persistence check flags separately). Findings are framed as coming from that boot.
+    """
+    out = _journalctl(["-k", "-b", "-1", "--no-pager", "-o", "cat"])
+    if not out or not out.strip():
+        return []
+    tagged = []
+    for f in scan_journal_text(out):
+        detail = ("Logged during the previous (crashed) boot. " + (f.detail or "")).strip()
+        tagged.append(Finding(f.severity, f.category, f.title, detail, f.suggestion))
+    return tagged
+
+
 def check_journal_persistence() -> list[Finding]:
     if Path("/var/log/journal").is_dir():
         return []
@@ -235,16 +251,20 @@ def check_live_temps() -> list[Finding]:
     )]


-def run_health_checks() -> list[Finding]:
+def run_health_checks(include_journal: bool = True) -> list[Finding]:
     """Run all checks and return findings sorted by severity (worst first).

     SMART needs root; if the session collected it via launch elevation, use that
     instead of re-running smartctl (which would just report "needs root").
+
+    `include_journal=False` skips the 7-day kernel-journal scan — used by the crash
+    analysis, which scans the previous (crashed) boot specifically instead.
     """
     from . import elevation

     findings: list[Finding] = []
     findings += check_nvidia_driver()
+    if include_journal:
         findings += check_journal()
     findings += check_journal_persistence()
     priv = elevation.privileged()
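Reviewer note on the kernel-log scan and the severity sort: `scan_journal_text()` is not part of this diff, so the sketch below is only a guess at the general technique, matching each kernel-log line against a few fault signatures and ordering hits worst first. The three patterns, the `Hit` dataclass, and the severity strings are all assumptions made for illustration; the project's real signature list and `Finding` type may differ.

```python
import re
from dataclasses import dataclass

# Assumed severity labels; the project's CRITICAL/WARNING/OK constants may differ.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2, "ok": 3}

# Hypothetical signature table: (pattern, severity, title).
SIGNATURES = [
    (re.compile(r"NVRM: Xid"), "critical", "NVIDIA Xid error"),
    (re.compile(r"Kernel panic"), "critical", "Kernel panic"),
    (re.compile(r"Out of memory: Killed process"), "warning", "OOM kill"),
]


@dataclass
class Hit:
    severity: str
    title: str
    line: str


def scan_kernel_text(text: str) -> list[Hit]:
    """Return one Hit per matching line, sorted worst severity first."""
    hits: list[Hit] = []
    for line in text.splitlines():
        for pattern, severity, title in SIGNATURES:
            if pattern.search(line):
                hits.append(Hit(severity, title, line.strip()))
                break  # first matching signature wins for this line
    # Unknown severities sort last, mirroring the `.get(..., 9)` fallback above.
    hits.sort(key=lambda h: SEVERITY_ORDER.get(h.severity, 9))
    return hits


if __name__ == "__main__":
    sample = (
        "usb 1-1: new high-speed USB device\n"
        "Out of memory: Killed process 4242 (game)\n"
        "NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.\n"
    )
    for hit in scan_kernel_text(sample):
        print(hit.severity, hit.title)
```

In practice the input text would be the output of `journalctl -k -b -1 --no-pager -o cat`, which is empty unless journald storage is persistent.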
@@ -38,7 +38,9 @@ def read_status() -> dict | None:
         return None


-def start_background(interval: float | None = None, out: str | None = None) -> int | None:
+def start_background(
+    interval: float | None = None, out: str | None = None, game: str | None = None
+) -> int | None:
     """Spawn a detached `record run`. Returns the child pid, or None if already running."""
     if running_pid():
         return None
@@ -48,6 +50,8 @@ def start_background(interval: float | None = None, out: str | None = None) -> i
         cmd += ["--interval", str(interval)]
     if out:
         cmd += ["--out", out]
+    if game:
+        cmd += ["--game", game]
     out_fh = open(config.SPAWN_LOG, "a")
     proc = subprocess.Popen(
         cmd,
@@ -27,12 +27,14 @@ class Recorder:
         backups: int = 10,
         status_path=None,
         sampler: Sampler | None = None,
+        game: str | None = None,
     ) -> None:
         self.interval = interval
         self.sampler = sampler or Sampler(available_sources())
         self.writer = CrashLogWriter(log_path, max_bytes, backups)
         self.log_path = Path(log_path)
         self.status_path = Path(status_path) if status_path else None
+        self.game = game or None
         self.samples = 0
         self._stop = threading.Event()
         self._gpu_lost = False
@@ -43,6 +45,8 @@ class Recorder:

     def run(self) -> None:
         self.writer.write_event("session-start", f"interval={self.interval:g}s")
+        if self.game:
+            self.writer.write_event("game", self.game)  # tag the focused-diagnostic target
         self._write_status(running=True)
         try:
             while not self._stop.is_set():
@@ -81,6 +85,7 @@ class Recorder:
             "samples": self.samples,
             "updated": time.time(),
             "gpu_lost": self._gpu_lost,
+            "game": self.game,
         }
         if sample is not None:
             data["latest"] = headline(sample)
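Reviewer note on the `game` event: the tag only helps after a hard freeze if it reaches disk before the crash. `CrashLogWriter` is not shown in this diff, so the sketch below is a stand-in with hypothetical names (`append_event`, `last_game`), showing one way to append an event durably and read the tag back later. The explicit `flush()` plus `os.fsync()` is an assumption about what "survives a crash + reboot" would require, not a description of the existing writer.

```python
from __future__ import annotations

import json
import os
import time
from pathlib import Path


def append_event(path: Path, kind: str, detail: str = "") -> None:
    """Append one event record as a JSON line and force it to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"ts": time.time(), "event": kind, "detail": detail}) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # assumption: durability matters more than speed here


def last_game(path: Path) -> str | None:
    """Return the detail of the most recent non-empty 'game' event, if any."""
    game = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a torn or partial line
            if isinstance(record, dict) and record.get("event") == "game" and record.get("detail"):
                game = record["detail"]
    return game


if __name__ == "__main__":
    import tempfile

    log = Path(tempfile.mkdtemp()) / "logs" / "diagnostic.jsonl"
    append_event(log, "session-start", "interval=1s")
    append_event(log, "game", "Example Game")
    print(last_game(log))
```

Syncing on every sample could be costly at short intervals, so a real writer might sync only the session and game events.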
@@ -15,6 +15,8 @@ from __future__ import annotations

 import json
 import os
+import shutil
+import subprocess
 import time
 from dataclasses import asdict, dataclass
 from pathlib import Path
@@ -351,6 +353,24 @@ def acknowledge_new() -> None:

 # --- formatting -----------------------------------------------------------------------

+def launch_game(appid: str) -> bool:
+    """Best-effort: ask Steam to launch a game by appid (steam:// URL). Non-blocking."""
+    if not appid:
+        return False
+    url = f"steam://rungameid/{appid}"
+    for cmd in (["steam", url], ["xdg-open", url]):
+        if shutil.which(cmd[0]):
+            try:
+                subprocess.Popen(
+                    cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
+                    stdin=subprocess.DEVNULL, start_new_session=True,
+                )
+                return True
+            except (OSError, subprocess.SubprocessError):
+                continue
+    return False
+
+
 def human_size(num_bytes: int) -> str:
     if num_bytes <= 0:
         return "—"
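Reviewer note on `launch_game()`: the launcher choice (prefer `steam`, fall back to `xdg-open`) is the part worth testing, and it can be separated from the `Popen` call. A minimal sketch, assuming a hypothetical `pick_launcher` helper with an injectable `which` so the test does not depend on what is installed:

```python
import shutil
from typing import Callable, Optional


def steam_run_url(appid: str) -> str:
    """Build the steam:// URL used to ask Steam to run a game by appid."""
    return f"steam://rungameid/{appid}"


def pick_launcher(
    appid: str, which: Callable[[str], Optional[str]] = shutil.which
) -> Optional[list[str]]:
    """Return the first usable launch command, or None if none is available.

    `which` is injectable for tests; by default it is shutil.which.
    """
    if not appid:
        return None
    url = steam_run_url(appid)
    for cmd in (["steam", url], ["xdg-open", url]):
        if which(cmd[0]):
            return cmd
    return None


if __name__ == "__main__":
    # Pretend only xdg-open is installed.
    fake_which = lambda name: "/usr/bin/xdg-open" if name == "xdg-open" else None
    print(pick_launcher("570", which=fake_which))
```

The command returned here would then be handed to `subprocess.Popen` with the same detached, output-discarding arguments used in the diff.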
@@ -17,19 +17,19 @@ from PySide6.QtWidgets import (

 from ..core.sample import Sample
 from ..render import metric_label
-from .widgets import Card, MetricBar, MetricRow, StatGauge
+from .widgets import Card, HistoryGraph, MetricBar, MetricRow

 _GROUP_ORDER = ["gpu", "cpu", "memory", "storage"]
 _GROUP_TITLES = {"gpu": "GPU", "cpu": "CPU", "memory": "Memory", "storage": "Storage"}
 _BAR_METRICS = {"util", "mem_util", "fan", "used_pct"}


-def _gauge_card(gauge: StatGauge) -> QFrame:
+def _tile_card(widget: QWidget) -> QFrame:
     card = QFrame()
     card.setObjectName("Card")
     layout = QVBoxLayout(card)
-    layout.setContentsMargins(6, 14, 6, 8)
-    layout.addWidget(gauge)
+    layout.setContentsMargins(6, 10, 6, 8)
+    layout.addWidget(widget)
     return card


@@ -54,16 +54,16 @@ class Dashboard(QWidget):
         header.addWidget(self._updated)
         root.addLayout(header)

-        # Headline gauges
-        self._g_gpu_temp = StatGauge("GPU Temp", "°C", 100, "temp")
-        self._g_gpu_load = StatGauge("GPU Load", "%", 100, "accent")
-        self._g_cpu_temp = StatGauge("CPU Temp", "°C", 100, "temp")
-        self._g_mem = StatGauge("Memory", "%", 100, "usage")
-        gauges = QHBoxLayout()
-        gauges.setSpacing(14)
+        # Headline trend graphs (history over the session, not just the live value)
+        self._g_gpu_temp = HistoryGraph("GPU Temp", "°C", 30, 100, "temp")
+        self._g_gpu_load = HistoryGraph("GPU Load", "%", 0, 100, "accent")
+        self._g_cpu_temp = HistoryGraph("CPU Temp", "°C", 30, 100, "temp")
+        self._g_mem = HistoryGraph("Memory", "%", 0, 100, "usage")
+        graphs = QHBoxLayout()
+        graphs.setSpacing(14)
         for g in (self._g_gpu_temp, self._g_gpu_load, self._g_cpu_temp, self._g_mem):
-            gauges.addWidget(_gauge_card(g))
-        root.addLayout(gauges)
+            graphs.addWidget(_tile_card(g))
+        root.addLayout(graphs)

         # Per-subsystem cards (scrollable, 2-column grid)
         scroll = QScrollArea()
@@ -81,10 +81,10 @@ class Dashboard(QWidget):
         root.addWidget(scroll, 1)

     def update_sample(self, sample: Sample) -> None:
-        self._g_gpu_temp.set_value(self._val(sample, "gpu", "temp", ""))
-        self._g_gpu_load.set_value(self._val(sample, "gpu", "util"))
-        self._g_cpu_temp.set_value(self._cpu_temp(sample))
-        self._g_mem.set_value(self._val(sample, "memory", "used_pct"))
+        self._g_gpu_temp.add_value(self._val(sample, "gpu", "temp", ""))
+        self._g_gpu_load.add_value(self._val(sample, "gpu", "util"))
+        self._g_cpu_temp.add_value(self._cpu_temp(sample))
+        self._g_mem.add_value(self._val(sample, "memory", "used_pct"))

         keys = [r.key for r in sample.readings]
         if keys != self._built_keys:  # sources appeared/disappeared
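Reviewer note on `HistoryGraph.add_value()`: the widget itself is not in this diff, so the data side is sketched below without any Qt dependency. It assumes a fixed-size window of recent readings from which the current value and window min/max are derived, as the changelog describes. The class name `History`, the capacity of 120 samples, and the choice to ignore `None` readings are all assumptions.

```python
from collections import deque
from typing import Optional


class History:
    """Fixed-size window of recent readings (data model only, no painting)."""

    def __init__(self, capacity: int = 120) -> None:  # capacity is an assumed default
        self._values: deque[float] = deque(maxlen=capacity)

    def add_value(self, value: Optional[float]) -> None:
        """Append a reading; a missing source (None) leaves the window unchanged."""
        if value is None:
            return
        self._values.append(float(value))

    def stats(self) -> Optional[dict]:
        """Current value plus window min/max, or None before any reading arrives."""
        if not self._values:
            return None
        return {
            "current": self._values[-1],
            "min": min(self._values),
            "max": max(self._values),
        }


if __name__ == "__main__":
    h = History(capacity=3)
    for reading in (40, None, 55, 70, 65):
        h.add_value(reading)
    print(h.stats())
```

A `deque` with `maxlen` drops the oldest reading automatically, which keeps the per-sample update cheap for a widget that repaints on every tick.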
@@ -0,0 +1,81 @@
+"""Results view for a guided diagnostic session (M6/D12): capture summary + findings."""
+
+from __future__ import annotations
+
+from PySide6.QtCore import Qt
+from PySide6.QtGui import QFont
+from PySide6.QtWidgets import (
+    QDialog,
+    QFrame,
+    QHBoxLayout,
+    QLabel,
+    QPushButton,
+    QScrollArea,
+    QVBoxLayout,
+    QWidget,
+)
+
+from ..render import render_summary
+from .widgets import finding_card
+
+
+class DiagnosticDialog(QDialog):
+    def __init__(self, result, parent=None) -> None:
+        super().__init__(parent)
+        self.setWindowTitle(f"Diagnostic — {result.game}" if result.game else "Diagnostic")
+        self.resize(660, 680)
+
+        root = QVBoxLayout(self)
+        root.setContentsMargins(20, 18, 20, 16)
+        root.setSpacing(14)
+
+        title = QLabel(f"Diagnostic — {result.game}" if result.game else "Diagnostic")
+        title.setObjectName("PageTitle")
+        root.addWidget(title)
+
+        scroll = QScrollArea()
+        scroll.setWidgetResizable(True)
+        scroll.setFrameShape(QFrame.Shape.NoFrame)
+        scroll.setStyleSheet("background: transparent;")
+        body = QWidget()
+        col = QVBoxLayout(body)
+        col.setContentsMargins(0, 0, 0, 0)
+        col.setSpacing(10)
+        col.setAlignment(Qt.AlignmentFlag.AlignTop)
+
+        # Capture window summary (peaks / events / last samples) — monospace for the columns.
+        cap_head = QLabel("Capture")
+        cap_head.setStyleSheet("font-weight: 700; background: transparent;")
+        col.addWidget(cap_head)
+        summary = QLabel(render_summary(result.summary))
+        summary.setObjectName("Report")
+        summary.setFont(QFont("monospace"))
+        summary.setTextInteractionFlags(Qt.TextInteractionFlag.TextSelectableByMouse)
+        summary.setWordWrap(False)
+        summary.setStyleSheet(
+            "background: #0d0f13; color: #cfd3da; border: 1px solid #2a2f39; "
+            "border-radius: 8px; padding: 10px;"
+        )
+        col.addWidget(summary)
+
+        find_head = QLabel(f"Findings ({len(result.findings)})")
+        find_head.setStyleSheet("font-weight: 700; background: transparent;")
+        col.addWidget(find_head)
+        if result.findings:
+            for finding in result.findings:
+                col.addWidget(finding_card(finding))
+        else:
+            none = QLabel("No findings.")
+            none.setObjectName("Muted")
+            col.addWidget(none)
+
+        scroll.setWidget(body)
+        root.addWidget(scroll, 1)
+
+        buttons = QHBoxLayout()
+        buttons.addStretch(1)
+        close = QPushButton("Close")
+        close.setObjectName("PrimaryButton")
+        close.clicked.connect(self.accept)
+        buttons.addWidget(close)
+        root.addLayout(buttons)
@@ -17,6 +17,7 @@ from PySide6.QtWidgets import (
     QFrame,
     QHBoxLayout,
     QLabel,
+    QMessageBox,
     QPushButton,
     QScrollArea,
     QVBoxLayout,
@@ -24,10 +25,11 @@ from PySide6.QtWidgets import (
 )

 from ..config import load_config, update_config
-from .theme import ACCENT, GOOD, MUTED
+from .diagnostic_dialog import DiagnosticDialog
+from .theme import ACCENT, GOOD, MUTED, WARN


-def _game_row(name: str, sublabel: str, size: str, is_new: bool) -> QFrame:
+def _game_row(name: str, sublabel: str, size: str, is_new: bool, appid: str = "", on_diagnose=None) -> QFrame:
     card = QFrame()
     card.setObjectName("Card")
     h = QHBoxLayout(card)
@@ -59,6 +61,13 @@ def _game_row(name: str, sublabel: str, size: str, is_new: bool) -> QFrame:
     size_label.setMinimumWidth(80)
     size_label.setAlignment(Qt.AlignmentFlag.AlignRight | Qt.AlignmentFlag.AlignVCenter)
     h.addWidget(size_label, 0)
+
+    if on_diagnose is not None:
+        diag_btn = QPushButton("Run Diagnostic")
+        diag_btn.setObjectName("ActionButton")
+        diag_btn.setCursor(Qt.CursorShape.PointingHandCursor)
+        diag_btn.clicked.connect(lambda: on_diagnose(name, appid))
+        h.addWidget(diag_btn, 0)
     return card


@@ -66,14 +75,17 @@ class GamesPage(QWidget):
     _libraries_ready = Signal(object)  # list[dict(path, label, count, selected)]
     _scanned = Signal(object)  # steam.ScanResult
     new_count_changed = Signal(int)  # newly-installed game count (for the nav badge)
+    _diag_done = Signal(object)  # DiagnosticResult — focused capture analyzed

     def __init__(self) -> None:
         super().__init__()
         self.setObjectName("Page")
         self._libraries_ready.connect(self._render_libraries)
         self._scanned.connect(self._render_games)
+        self._diag_done.connect(self._on_diag_done)
         self._busy = False
         self._new_appids: set[str] = set()
+        self._diag_game: str | None = None

         root = QVBoxLayout(self)
         root.setContentsMargins(20, 18, 20, 18)
@@ -93,6 +105,52 @@ class GamesPage(QWidget):
        header.addWidget(self._rescan_btn)
        root.addLayout(header)

        # In-progress diagnostic banner (hidden until a focused capture is running).
        self._banner = QFrame()
        self._banner.setObjectName("Card")
        self._banner.setStyleSheet(f"#Card {{ border: 1px solid {ACCENT}; }}")
        banner_h = QHBoxLayout(self._banner)
        banner_h.setContentsMargins(16, 10, 16, 10)
        banner_h.setSpacing(10)
        self._banner_label = QLabel("")
        self._banner_label.setWordWrap(True)
        self._banner_label.setStyleSheet(f"color: {ACCENT}; font-weight: 700; background: transparent;")
        banner_h.addWidget(self._banner_label, 1)
        self._finish_btn = QPushButton("Finish && analyze")  # && → literal & (not a mnemonic)
        self._finish_btn.setObjectName("ActionButton")
        self._finish_btn.clicked.connect(self._finish_diagnostic)
        banner_h.addWidget(self._finish_btn)
        self._discard_btn = QPushButton("Discard")
        self._discard_btn.clicked.connect(self._discard_diagnostic)
        banner_h.addWidget(self._discard_btn)
        self._banner.hide()
        root.addWidget(self._banner)

        # Hard-crash banner: a previous diagnostic ended without a clean stop.
        self._crash_banner = QFrame()
        self._crash_banner.setObjectName("Card")
        self._crash_banner.setStyleSheet(f"#Card {{ border: 1px solid {WARN}; }}")
        crash_h = QHBoxLayout(self._crash_banner)
        crash_h.setContentsMargins(16, 10, 16, 10)
        crash_h.setSpacing(10)
        self._crash_label = QLabel("")
        self._crash_label.setWordWrap(True)
        self._crash_label.setStyleSheet(f"color: {WARN}; font-weight: 700; background: transparent;")
        crash_h.addWidget(self._crash_label, 1)
        self._analyze_btn = QPushButton("Analyze crash")
        self._analyze_btn.setObjectName("ActionButton")
        self._analyze_btn.clicked.connect(self._analyze_crash)
        crash_h.addWidget(self._analyze_btn)
        self._dismiss_btn = QPushButton("Dismiss")
        self._dismiss_btn.clicked.connect(self._dismiss_crash)
        crash_h.addWidget(self._dismiss_btn)
        self._crash_banner.hide()
        root.addWidget(self._crash_banner)

        self._diag_timer = QTimer(self)
        self._diag_timer.setInterval(1000)
        self._diag_timer.timeout.connect(self._poll_diag)

        # Libraries (opt-in checkboxes)
        lib_card = QFrame()
        lib_card.setObjectName("Card")
@@ -126,6 +184,7 @@ class GamesPage(QWidget):

        self._load_cached()  # instant display from the last scan
        QTimer.singleShot(400, self.refresh)  # then rescan in the background on launch
        self._check_crash()  # surface an interrupted (crashed) diagnostic

    # --- loading ----------------------------------------------------------------------

@@ -233,9 +292,151 @@ class GamesPage(QWidget):
                os.path.basename(g.library.rstrip("/")) or g.library,
                steam.human_size(g.size_bytes),
                g.appid in new_appids,
                appid=g.appid,
                on_diagnose=self._start_diagnostic,
            ))
        self._list.addStretch(1)

    # --- guided diagnostic (M6/D12) ---------------------------------------------------

    def _start_diagnostic(self, name: str, appid: str = "") -> None:
        from ..core import diagnostic, steam

        if diagnostic.is_running():
            QMessageBox.information(
                self, "RigDoctor",
                "A capture is already running — finish or discard it first.")
            return

        # Tell the user what the flow actually is, and offer to launch the game for them.
        box = QMessageBox(self)
        box.setIcon(QMessageBox.Icon.Information)
        box.setWindowTitle(f"Run Diagnostic — {name}")
        box.setText(f"Record a focused diagnostic while you play {name}?")
        box.setInformativeText(
            "RigDoctor will capture sensors in the background. Then:\n\n"
            "1. Play the game and try to reproduce the freeze / black screen / crash.\n"
            "2. When you're done — or after a hard freeze and reboot — come back here and "
            "click “Finish & analyze”.\n\n"
            "Your readings are saved continuously, so even a hard lock won't lose them."
        )
        launch_btn = box.addButton("Launch game && start", QMessageBox.ButtonRole.AcceptRole)
        start_btn = box.addButton("Start without launching", QMessageBox.ButtonRole.ActionRole)
        box.addButton("Cancel", QMessageBox.ButtonRole.RejectRole)
        if not appid:
            launch_btn.setEnabled(False)  # no appid → can't ask Steam to launch it
        box.exec()
        clicked = box.clickedButton()
        if clicked not in (launch_btn, start_btn):
            return

        if diagnostic.start(game=name) is None:
            QMessageBox.warning(self, "RigDoctor", "Couldn't start the capture.")
            return
        launched = steam.launch_game(appid) if clicked is launch_btn else False
        self._diag_game = name
        self._finish_btn.setEnabled(True)
        self._discard_btn.setEnabled(True)
        self._banner.show()
        self._diag_timer.start()
        self._poll_diag()
        if clicked is launch_btn and not launched:
            QMessageBox.information(
                self, "RigDoctor",
                "Recording started, but couldn't launch the game automatically — "
                "launch it yourself, then click “Finish & analyze” when you're done.")

    def _poll_diag(self) -> None:
        from ..core import diagnostic

        status = diagnostic.active()
        if not status:
            self._diag_timer.stop()  # recorder exited on its own
            return
        samples = status.get("samples", 0)
        lost = " · ⚠ GPU-lost detected" if status.get("gpu_lost") else ""
        game = status.get("game") or self._diag_game or "your game"
        self._banner_label.setText(
            f"● Recording {game} — play it and reproduce the problem, then click "
            f"“Finish & analyze”. ({samples} samples{lost})"
        )

    def _finish_diagnostic(self) -> None:
        self._diag_timer.stop()
        self._finish_btn.setEnabled(False)
        self._discard_btn.setEnabled(False)
        self._banner_label.setText("Analyzing… (running the health report)")
        threading.Thread(target=self._work_finish, daemon=True).start()

    def _work_finish(self) -> None:
        from ..core import diagnostic

        try:
            result = diagnostic.finish()
        except Exception:
            result = None
        self._diag_done.emit(result)

    def _on_diag_done(self, result) -> None:
        self._banner.hide()
        self._crash_banner.hide()
        self._finish_btn.setEnabled(True)
        self._discard_btn.setEnabled(True)
        self._analyze_btn.setEnabled(True)
        if result is None:
            QMessageBox.warning(self, "RigDoctor", "The diagnostic couldn't be analyzed.")
            return
        DiagnosticDialog(result, self).exec()

    def _discard_diagnostic(self) -> None:
        from ..core import reccontrol

        self._diag_timer.stop()
        reccontrol.stop_background()
        self._banner.hide()

    # --- hard-crash recovery ----------------------------------------------------------

    def _check_crash(self) -> None:
        from ..core import diagnostic

        info = diagnostic.pending_crash()
        if info is None:
            self._crash_banner.hide()
            return
        game = info.game or "your last game"
        extra = " · ⚠ GPU-lost was captured" if info.gpu_lost else ""
        self._crash_label.setText(
            f"⚠ Your last diagnostic for {game} ended unexpectedly — likely a hard crash "
            f"({info.samples} samples{extra}). Analyze it to see the final readings and the "
            f"likely cause from the system logs."
        )
        self._analyze_btn.setEnabled(True)
        self._crash_banner.show()

    def _analyze_crash(self) -> None:
        from ..core import diagnostic

        diagnostic.acknowledge_crash()  # don't prompt again for this one
        self._analyze_btn.setEnabled(False)
        self._crash_label.setText("Analyzing the crash (final readings + system logs)…")
        threading.Thread(target=self._work_analyze_crash, daemon=True).start()

    def _work_analyze_crash(self) -> None:
        from ..core import diagnostic

        try:
            result = diagnostic.analyze_crash()
        except Exception:
            result = None
        self._diag_done.emit(result)

    def _dismiss_crash(self) -> None:
        from ..core import diagnostic

        diagnostic.acknowledge_crash()
        self._crash_banner.hide()

    # --- nav badge integration --------------------------------------------------------

    def showEvent(self, event) -> None:  # noqa: N802 (Qt override)
@@ -247,3 +448,15 @@ class GamesPage(QWidget):

        threading.Thread(target=steam.acknowledge_new, daemon=True).start()
        self.new_count_changed.emit(0)

        # Reflect a capture that's still running (e.g. started earlier, navigated back).
        from ..core import diagnostic

        if diagnostic.is_running():
            status = diagnostic.active() or {}
            self._diag_game = status.get("game") or self._diag_game
            self._banner.show()
            if not self._diag_timer.isActive():
                self._diag_timer.start()
        else:
            self._check_crash()  # re-surface an interrupted diagnostic if one is pending

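Review note: `_work_finish` and `_work_analyze_crash` share one pattern: run the blocking analysis on a daemon thread, collapse any exception into `None`, and hand the outcome back to the UI thread through the `_diag_done` signal. The sketch below shows that hand-off without Qt. It is only an illustration under stated assumptions: `run_in_background` and `failing_analysis` are hypothetical names, and a `queue.Queue` stands in for the signal.

```python
import queue
import threading
from typing import Callable, Optional


def run_in_background(work: Callable[[], object],
                      results: "queue.Queue[Optional[object]]") -> threading.Thread:
    """Run `work` off the calling thread and post its result, or None on failure."""

    def _worker() -> None:
        try:
            result = work()
        except Exception:
            # Any failure becomes a single "couldn't be analyzed" outcome,
            # mirroring how the diff emits None through _diag_done.
            result = None
        results.put(result)  # hypothetical stand-in for self._diag_done.emit(result)

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread


def failing_analysis() -> object:
    """Hypothetical analysis step that always fails."""
    raise RuntimeError("simulated analysis failure")


results: "queue.Queue[Optional[object]]" = queue.Queue()
run_in_background(lambda: {"game": "Example"}, results).join()
run_in_background(failing_analysis, results).join()
first = results.get(timeout=1)   # the successful result
second = results.get(timeout=1)  # None, because the work raised
```

In the real page the consumer is the slot `_on_diag_done`, which Qt invokes on the GUI thread; the queue here only models the one-way delivery of a result or `None`.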
@@ -2,8 +2,10 @@

 from __future__ import annotations

-from PySide6.QtCore import QRectF, Qt
-from PySide6.QtGui import QColor, QFont, QPainter, QPen
+from collections import deque
+
+from PySide6.QtCore import QPointF, QRectF, Qt
+from PySide6.QtGui import QColor, QFont, QPainter, QPainterPath, QPen
 from PySide6.QtWidgets import (
     QComboBox,
     QFrame,
@@ -17,7 +19,19 @@ from PySide6.QtWidgets import (

 from ..core.sample import Reading
 from ..render import format_value
-from .theme import ACCENT, CRIT, GOOD, MUTED, TEXT, TRACK, WARN, gauge_color, temp_color
+from .theme import (
+    ACCENT,
+    CRIT,
+    GOOD,
+    MUTED,
+    TEMP_WARN,
+    TEXT,
+    TRACK,
+    USAGE_WARN,
+    WARN,
+    gauge_color,
+    temp_color,
+)

 _SEV = {
     "critical": ("CRITICAL", CRIT),
@@ -248,6 +262,117 @@ class StatGauge(QWidget):
        p.end()


class HistoryGraph(QWidget):
    """A headline metric as a trend: current value + window min/max + a history line.

    Replaces the at-a-glance gauge with changes-over-time. `kind` drives the color
    (temp band / usage / accent), matching StatGauge so the dashboard stays consistent.
    """

    def __init__(self, title: str, unit: str = "", vmin: float = 0.0, vmax: float = 100.0,
                 kind: str = "accent", history: int = 180) -> None:
        super().__init__()
        self._title = title
        self._unit = unit
        self._min = vmin
        self._max = vmax
        self._kind = kind  # "temp" | "usage" | "accent"
        self._values: deque[float | None] = deque(maxlen=history)
        self.setMinimumSize(160, 132)

    def add_value(self, value: float | None) -> None:
        self._values.append(value)
        self.update()

    def _fmt(self, value: float | None) -> str:
        if value is None:
            return "—"
        if self._unit == "°C":
            return f"{value:.0f}°"
        if self._unit == "%":
            return f"{value:.0f}%"
        return f"{value:.0f}{self._unit}"

    def paintEvent(self, event) -> None:  # noqa: N802 (Qt override)
        p = QPainter(self)
        p.setRenderHint(QPainter.RenderHint.Antialiasing)
        w, h = self.width(), self.height()
        pad = 10.0
        present = [v for v in self._values if v is not None]
        current = next((v for v in reversed(self._values) if v is not None), None)
        color = QColor(gauge_color(self._kind, current))

        ftitle = QFont()
        ftitle.setPointSizeF(10.0)
        ftitle.setBold(True)
        p.setFont(ftitle)
        p.setPen(QColor(MUTED))
        p.drawText(QRectF(pad, 6, w - 2 * pad, 18),
                   Qt.AlignmentFlag.AlignLeft | Qt.AlignmentFlag.AlignVCenter, self._title)

        fval = QFont()
        fval.setPointSizeF(21.0)
        fval.setBold(True)
        p.setFont(fval)
        p.setPen(color if current is not None else QColor(MUTED))
        p.drawText(QRectF(pad, 2, w - 2 * pad, 28),
                   Qt.AlignmentFlag.AlignRight | Qt.AlignmentFlag.AlignTop, self._fmt(current))

        if present:
            fsm = QFont()
            fsm.setPointSizeF(8.5)
            p.setFont(fsm)
            p.setPen(QColor(MUTED))
            p.drawText(QRectF(pad, 27, w - 2 * pad, 14), Qt.AlignmentFlag.AlignLeft,
                       f"min {self._fmt(min(present))} max {self._fmt(max(present))}")

        g_top, g_bot = 48.0, h - pad
        g_left, g_right = pad, w - pad
        span = self._max - self._min
        if g_bot - g_top < 12 or g_right - g_left < 12 or span <= 0:
            p.end()
            return

        def y_of(v: float) -> float:
            frac = (max(self._min, min(self._max, v)) - self._min) / span
            return g_bot - frac * (g_bot - g_top)

        warn = TEMP_WARN if self._kind == "temp" else (USAGE_WARN if self._kind == "usage" else None)
        if warn is not None and self._min <= warn <= self._max:
            pen = QPen(QColor(TRACK))
            pen.setWidthF(1.0)
            pen.setStyle(Qt.PenStyle.DashLine)
            p.setPen(pen)
            yw = y_of(warn)
            p.drawLine(QPointF(g_left, yw), QPointF(g_right, yw))

        maxlen = self._values.maxlen or 1
        step = (g_right - g_left) / max(1, maxlen - 1)
        n = len(self._values)
        # Build the line newest-at-right; break it where readings are missing.
        path = QPainterPath()
        drawing = False
        for i, v in enumerate(self._values):
            if v is None:
                drawing = False
                continue
            x = g_right - (n - 1 - i) * step
            y = y_of(v)
            if drawing:
                path.lineTo(x, y)
            else:
                path.moveTo(x, y)
                drawing = True
        if not path.isEmpty():
            pen = QPen(color)
            pen.setWidthF(2.0)
            pen.setCapStyle(Qt.PenCapStyle.RoundCap)
            pen.setJoinStyle(Qt.PenJoinStyle.RoundJoin)
            p.setPen(pen)
            p.drawPath(path)
        p.end()


class MetricBar(QWidget):
    """A label + value with a thin progress bar (for 0–100% metrics)."""


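Review note: the geometry in `HistoryGraph.paintEvent` can be checked without Qt. The sketch below reproduces the same mapping as plain functions: values are clamped to `[vmin, vmax]`, the newest sample is pinned to the right edge, the x step is fixed by the deque's `maxlen`, and the line breaks wherever a reading is `None`. The function name `plot_segments` and the list-of-segments return shape are assumptions made for illustration; the widget itself builds a `QPainterPath`.

```python
from collections import deque


def plot_segments(values, maxlen, vmin, vmax, left, right, top, bottom):
    """Map readings to (x, y) points, newest at the right, split at None gaps."""
    span = vmax - vmin
    if span <= 0:
        return []  # same guard as the widget: nothing sensible to draw
    step = (right - left) / max(1, maxlen - 1)
    n = len(values)
    segments = []
    current = []
    for i, v in enumerate(values):
        if v is None:
            # A missing reading ends the current polyline.
            if current:
                segments.append(current)
                current = []
            continue
        clamped = max(vmin, min(vmax, v))
        frac = (clamped - vmin) / span
        x = right - (n - 1 - i) * step       # newest sample sits at `right`
        y = bottom - frac * (bottom - top)   # larger values are drawn higher up
        current.append((x, y))
    if current:
        segments.append(current)
    return segments


history = deque([0.0, 50.0, None, 150.0], maxlen=5)
segments = plot_segments(history, history.maxlen, 0.0, 100.0,
                         left=0.0, right=100.0, top=0.0, bottom=100.0)
# segments == [[(25.0, 100.0), (50.0, 50.0)], [(100.0, 0.0)]]
```

One difference from the widget: a reading isolated between gaps comes back here as a one-point segment, whereas the `QPainterPath` version only issues a `moveTo` for it, so no line is drawn for that point.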
@@ -0,0 +1,107 @@
"""Tests for the guided diagnostic orchestration (M3+M4 glue)."""

import tempfile
import time
import unittest
from pathlib import Path
from unittest import mock

from rigdoctor.core import diagnostic
from rigdoctor.core.crashlog import CrashLogWriter, summarize
from rigdoctor.core.health import Finding
from rigdoctor.core.sample import Reading, Sample


def _write_log(path: str, game: str) -> None:
    w = CrashLogWriter(path)
    w.write_event("session-start", "interval=1s")
    w.write_event("game", game)
    for temp in (60.0, 72.0, 81.0):
        w.write_sample(Sample(ts=time.time(), readings=[Reading("gpu", "temp", temp, "°C", "")]))
    w.write_event("gpu-lost", "nvidia-smi query timed out")
    w.close()


class GameRecoveryTests(unittest.TestCase):
    def test_game_recovered_from_log_event(self):
        with tempfile.TemporaryDirectory() as d:
            log = str(Path(d) / "capture.jsonl")
            _write_log(log, "Path of Exile 2")
            summary = summarize(log)
            self.assertEqual(diagnostic._game_from_summary(summary), "Path of Exile 2")

    def test_no_game_event_returns_none(self):
        with tempfile.TemporaryDirectory() as d:
            log = str(Path(d) / "capture.jsonl")
            w = CrashLogWriter(log)
            w.write_event("session-start")
            w.close()
            self.assertIsNone(diagnostic._game_from_summary(summarize(log)))


class FinishTests(unittest.TestCase):
    def test_finish_combines_summary_and_findings(self):
        with tempfile.TemporaryDirectory() as d:
            log = Path(d) / "capture.jsonl"
            _write_log(str(log), "Satisfactory")
            fake = [Finding("warning", "GPU", "NVIDIA Xid 79 ×1", "fell off the bus")]
            with mock.patch("rigdoctor.core.health.run_health_checks", return_value=fake), \
                 mock.patch.object(diagnostic.reccontrol, "stop_background", return_value=False), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                result = diagnostic.finish(log_path=log)
            self.assertEqual(result.game, "Satisfactory")
            self.assertEqual(result.summary.samples, 3)
            self.assertEqual(result.findings, fake)
            # peak GPU temp captured in the window, GPU-lost event recorded
            self.assertEqual(result.summary.maxima["gpu.temp"][0], 81.0)
            self.assertTrue(any(kind == "gpu-lost" for _ts, kind, _d in result.summary.events))


class CrashDetectionTests(unittest.TestCase):
    def _diag_log(self, d) -> Path:
        return Path(d) / "diagnostic.jsonl"

    def test_unterminated_session_is_a_pending_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            _write_log(str(log), "Tarkov")  # has session-start + game, no session-stop
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                info = diagnostic.pending_crash()
            self.assertIsNotNone(info)
            self.assertEqual(info.game, "Tarkov")
            self.assertTrue(info.gpu_lost)  # _write_log writes a gpu-lost event

    def test_clean_stop_is_not_a_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            w = CrashLogWriter(str(log))
            w.write_event("session-start"); w.write_event("game", "X")
            w.write_sample(Sample(time.time(), [Reading("gpu", "temp", 60.0, "°C", "")]))
            w.write_event("session-stop", "samples=1")
            w.close()
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                self.assertIsNone(diagnostic.pending_crash())

    def test_acknowledge_clears_pending_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            _write_log(str(log), "Tarkov")
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=None):
                self.assertIsNotNone(diagnostic.pending_crash())
                diagnostic.acknowledge_crash()
                self.assertIsNone(diagnostic.pending_crash())

    def test_running_capture_is_not_a_crash(self):
        with tempfile.TemporaryDirectory() as d:
            log = self._diag_log(d)
            _write_log(str(log), "Tarkov")
            with mock.patch.object(diagnostic.config, "DIAG_LOG", log), \
                 mock.patch.object(diagnostic.reccontrol, "running_pid", return_value=4321):
                self.assertIsNone(diagnostic.pending_crash())  # it's in-progress, not crashed


if __name__ == "__main__":
    unittest.main()
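Review note: the three conditions these tests pin down (a session was started, no `session-stop` was written, and no recorder is running) can be stated as one small predicate. The sketch below is not RigDoctor's `pending_crash()`; it is a stand-alone illustration that assumes a hypothetical JSON-lines format with an `"event"` key per record, since the real `CrashLogWriter` format is not shown in this diff. It also leaves out the acknowledge step.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def looks_like_crash(log_path: Path, recorder_pid: Optional[int]) -> bool:
    """True when a capture started, never stopped cleanly, and nothing is recording."""
    if recorder_pid is not None:
        return False  # a live recorder means the capture is still in progress
    if not log_path.exists():
        return False
    kinds = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a hard freeze can leave a torn final line; skip it
        if isinstance(record, dict) and "event" in record:
            kinds.append(record["event"])
    return "session-start" in kinds and "session-stop" not in kinds


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "diagnostic.jsonl"
    # Hypothetical log: started, game recorded, then cut off mid-record.
    log.write_text('{"event": "session-start"}\n{"event": "game", "detail": "X"}\n{"ev',
                   encoding="utf-8")
    crashed = looks_like_crash(log, recorder_pid=None)
    still_running = looks_like_crash(log, recorder_pid=4321)
    with log.open("a", encoding="utf-8") as fh:
        fh.write('\n{"event": "session-stop"}\n')
    clean = looks_like_crash(log, recorder_pid=None)
```

Skipping unparseable lines is a deliberate choice in this sketch: a log that is appended to continuously and then interrupted by a freeze is likely to end in a partial record, and that should count as evidence of a crash rather than as a read error.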