Merge pull request 'feat(m15): collect session-scoped system logs (kernel + coredumps) — 0.31.0' (#26) from feat/syslogs into main
release / release (push) Successful in 15s

Reviewed-on: #26
This commit was merged in pull request #26.
This commit is contained in:
2026-05-22 12:16:52 +00:00
10 changed files with 296 additions and 10 deletions
+21
View File
@@ -5,6 +5,27 @@ All notable changes to RigDoctor are recorded here. Format follows
(`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git (`MAJOR.MINOR.PATCH`, pre-1.0). `__version__` and `pyproject.toml` must match the git
release tag (so the auto-updater, D18, can compare versions). release tag (so the auto-updater, D18, can compare versions).
## [0.32.0] - 2026-05-22
### Added
- **More for diagnostics & reports:**
- **`nvidia-smi -q` snapshot** — driver, throttle/clock-event reasons, clocks, power, temps,
PCIe link, ECC + retired pages (point-in-time at diagnostic time).
- **Display-server log** — auto-detected: `Xorg.0.log` on X11, or the compositor's user-journal
slice (gnome-shell/kwin/sway/gamescope) on Wayland.
- **Full system inventory** (M5 hardware/OS) is now included in each stored diagnostic and the
**Report** bundle — invaluable for larger/shared debugging.
These join the kernel log + coredump records in `syslogs.txt`/`inventory.*`, are saved per
diagnostic, included in the Report zip, and (logs) fed to the AI on "Explain".
## [0.31.0] - 2026-05-22
### Added
- **Diagnostics now collect session-scoped system logs** (`core/syslogs.py`): a kernel-log
slice (`journalctl -k` — Xid, OOM-killer, MCE, PCIe AER, thermal, hung tasks) and
**crashed-process records** (`coredumpctl` — which executable, signal, and when). They're saved
to the diagnostic directory (`syslogs.txt`), included in the **Report** bundle, and fed to the
AI on "Explain" alongside the game logs. Best-effort — degrades quietly if the tools are
missing or access is denied; scoped to the session window so it doesn't drag in old noise.
## [0.30.0] - 2026-05-22 ## [0.30.0] - 2026-05-22
### Added ### Added
- **Logging & report bundles (M15, D25)** — opt-in via one **Settings → Logging** toggle - **Logging & report bundles (M15, D25)** — opt-in via one **Settings → Logging** toggle
+7 -4
View File
@@ -132,10 +132,13 @@ Status: ⬜ not started · 🟦 designing · 🟨 in progress · ✅ done
- **M15 Logging & report bundles** (D25) — opt-in via one `logging_enabled` toggle (default off): - **M15 Logging & report bundles** (D25) — opt-in via one `logging_enabled` toggle (default off):
application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage** application logging to a rotating `app.log` (`core/applog.py`) and **per-diagnostic storage**
(`core/diagstore.py`) — each diagnostic gets its own `DATA_DIR/diagnostics/<id>/` (capture, (`core/diagstore.py`) — each diagnostic gets its own `DATA_DIR/diagnostics/<id>/`: capture,
`result.json`, `report.txt`, scoped game logs, and an `ai/` record of every AI interaction: `result.json`, `report.txt`, the full **inventory** (M5: hardware/OS), scoped **game logs**
exact data sent, model, reply). **"Report"** zips one into `DATA_DIR/reports/` (GUI button on (`core/gamelogs.py`), scoped **system logs** (`core/syslogs.py``journalctl -k`,
the diagnostic dialog; CLI `rigdoctor bundle`). Stays local; shareable on demand. `coredumpctl`, an `nvidia-smi -q` snapshot, and the X11/Wayland display-server log), and an
`ai/` record of every AI interaction (exact data sent, model, reply). **"Report"** zips one
into `DATA_DIR/reports/` (GUI button on the diagnostic dialog; CLI `rigdoctor bundle`). Logs
are session-scoped and fed to the AI on "Explain". Stays local; shareable on demand.
## Bundles (final — D14) ## Bundles (final — D14)
- **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)* - **Essential:** M1 + M3 + M4 *(the MVP, NVIDIA-only — D5)*
+5 -2
View File
@@ -165,8 +165,11 @@ the actual findings plus matched reference facts from a curated, exact-match kno
### M15 — Logging & report bundles (D25) ### M15 — Logging & report bundles (D25)
Opt-in (one `logging_enabled` toggle, default off). When on: the application logs to a rotating Opt-in (one `logging_enabled` toggle, default off). When on: the application logs to a rotating
`app.log`, and **each diagnostic is stored in its own directory** (capture log, structured `app.log`, and **each diagnostic is stored in its own directory** (capture log, structured
result, human-readable report, scoped game logs, and a record of every AI interaction — the result, human-readable report, the full **inventory** (M5 hardware/OS), session-scoped **game
exact data sent, the model, and its reply). A **Report** action zips one diagnostic's directory logs** (Proton/Steam) and **system logs** (`journalctl -k`, `coredumpctl`, an `nvidia-smi -q`
snapshot, and the X11/Wayland display-server log), and a record of every AI interaction — the
exact data sent, the model, and its reply). The collected logs are also fed to the AI on
"Explain". Collection is best-effort (degrades if tools are missing/denied). A **Report** action zips one diagnostic's directory
(plus the app log) into a shareable bundle saved under the reports folder (GUI button; CLI (plus the app log) into a shareable bundle saved under the reports folder (GUI button; CLI
`rigdoctor bundle`). Everything stays local — a report only leaves the machine if the user `rigdoctor bundle`). Everything stays local — a report only leaves the machine if the user
shares the zip. Stdlib only (`logging` + `zipfile`). shares the zip. Stdlib only (`logging` + `zipfile`).
+1 -1
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "rigdoctor" name = "rigdoctor"
version = "0.30.0" version = "0.32.0"
description = "Modular hardware monitoring & crash diagnostics for Linux gamers." description = "Modular hardware monitoring & crash diagnostics for Linux gamers."
readme = "README.md" readme = "README.md"
requires-python = ">=3.11" requires-python = ">=3.11"
+1 -1
View File
@@ -1,3 +1,3 @@
"""RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers.""" """RigDoctor — modular hardware monitoring & crash diagnostics for Linux gamers."""
__version__ = "0.30.0" __version__ = "0.32.0"
+17 -1
View File
@@ -51,7 +51,7 @@ def store(result, capture_path=None, since: float | None = None) -> Path | None:
if not enabled(): if not enabled():
return None return None
from ..render import render_summary from ..render import render_summary
from . import ai, gamelogs from . import ai, gamelogs, syslogs
target = _new_dir(getattr(result, "game", None)) target = _new_dir(getattr(result, "game", None))
@@ -80,6 +80,22 @@ def store(result, capture_path=None, since: float | None = None) -> Path | None:
_write(target / "gamelogs.txt", logs) _write(target / "gamelogs.txt", logs)
except OSError: except OSError:
pass pass
try:
sys_logs = syslogs.collect(since=since)
if sys_logs:
_write(target / "syslogs.txt", sys_logs)
except OSError:
pass
try: # full hardware/OS inventory (M5) — invaluable for larger debugging in a shared report
from . import inventory
sections = inventory.collect()
_write(target / "inventory.txt", inventory.render_text(sections))
_write(target / "inventory.json", inventory.render_json(sections))
except Exception: # inventory probes vary by machine; never let it break storage
pass
return target return target
+141
View File
@@ -0,0 +1,141 @@
"""Session-scoped system logs for diagnostics (M15): kernel, coredumps, NVIDIA, display.
Covers what the *system* logged when something went wrong, so the report bundle and the AI both
see it:
* kernel ring-buffer slice (`journalctl -k`) — Xid, OOM-killer, MCE, PCIe AER, thermal, hung tasks
* systemd-coredump records (`coredumpctl`) — did the game/wine dump core (SIGSEGV/ABRT), when
* an `nvidia-smi -q` snapshot — driver, throttle/clock-event reasons, clocks, power, temps, PCIe,
ECC + retired pages (point-in-time at diagnostic time)
* the display-server log — `Xorg.0.log` on X11, or the compositor's user-journal slice on Wayland
Best-effort and size-bounded: degrades silently if a tool is missing or access is denied. Stdlib only.
"""
from __future__ import annotations
import os
import shutil
import subprocess
import time
from pathlib import Path
_MAX = 8000 # cap each log section so the prompt/report stays small
_NV_MAX = 10000 # nvidia-smi -q is structured + valuable; allow a bit more (head-truncated)
# Compositors whose user-journal entries are the "Wayland log" (OR-matched by journalctl).
_COMPOSITORS = ("gnome-shell", "mutter", "kwin_wayland", "Xwayland", "sway", "gamescope")
_XORG_LOGS = ("~/.local/share/xorg/Xorg.0.log", "/var/log/Xorg.0.log")
def _since_arg(since: float | None) -> str | None:
return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(since)) if since else None
def _run(cmd: list[str], timeout: float = 15.0) -> str:
try:
proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
except (OSError, subprocess.SubprocessError):
return ""
return (proc.stdout or "").strip()
def kernel_log(since: float | None = None, max_bytes: int = _MAX) -> str:
if not shutil.which("journalctl"):
return ""
cmd = ["journalctl", "-k", "--no-pager"]
since_arg = _since_arg(since)
if since_arg:
cmd += ["--since", since_arg]
out = _run(cmd)
if not out or out.strip().lower() == "-- no entries --": # journalctl's empty marker
return ""
return out[-max_bytes:]
def coredumps(since: float | None = None, max_bytes: int = _MAX) -> str:
if not shutil.which("coredumpctl"):
return ""
cmd = ["coredumpctl", "list", "--no-pager"]
since_arg = _since_arg(since)
if since_arg:
cmd += ["--since", since_arg]
out = _run(cmd)
if not out or "no coredumps" in out.lower():
return ""
return out[-max_bytes:]
def nvidia_snapshot(max_bytes: int = _NV_MAX) -> str:
"""Point-in-time `nvidia-smi -q` (head-truncated — driver/temps/clocks/ECC sit near the top)."""
if not shutil.which("nvidia-smi"):
return ""
out = _run(["nvidia-smi", "-q"])
return out[:max_bytes] if out else ""
def _xorg_log() -> Path | None:
for cand in _XORG_LOGS:
path = Path(os.path.expanduser(cand))
if path.exists():
return path
return None
def _session_type() -> str:
declared = os.environ.get("XDG_SESSION_TYPE", "").lower()
if declared in ("x11", "wayland"):
return declared
if os.environ.get("WAYLAND_DISPLAY"):
return "wayland"
return "x11" if _xorg_log() else "unknown"
def _tail_file(path: Path, max_bytes: int) -> str:
try:
size = path.stat().st_size
with path.open("rb") as fh:
if size > max_bytes:
fh.seek(size - max_bytes)
return fh.read().decode("utf-8", "replace")
except OSError:
return ""
def display_log(since: float | None = None, max_bytes: int = _MAX) -> str:
"""Xorg.0.log on X11, or the compositor's user-journal slice on Wayland ('' if none)."""
if _session_type() == "wayland":
if not shutil.which("journalctl"):
return ""
cmd = ["journalctl", "--user", "--no-pager"]
since_arg = _since_arg(since)
if since_arg:
cmd += ["--since", since_arg]
cmd += [f"_COMM={comp}" for comp in _COMPOSITORS] # OR-matched
out = _run(cmd)
if not out or out.strip().lower() == "-- no entries --":
return ""
return out[-max_bytes:]
log = _xorg_log() # X11: Xorg log isn't wall-clock-timestamped, so tail rather than scope
return _tail_file(log, max_bytes) if log else ""
def available() -> bool:
return bool(shutil.which("journalctl") or shutil.which("coredumpctl")
or shutil.which("nvidia-smi") or _xorg_log())
def collect(since: float | None = None) -> str:
"""Kernel + coredumps + NVIDIA snapshot + display log as one labelled block ('' if none)."""
sections: list[str] = []
kern = kernel_log(since)
if kern:
sections.append(f"--- Kernel log (journalctl -k) ---\n{kern}")
cores = coredumps(since)
if cores:
sections.append(f"--- Crashed processes (coredumpctl) ---\n{cores}")
nvidia = nvidia_snapshot()
if nvidia:
sections.append(f"--- NVIDIA snapshot (nvidia-smi -q) ---\n{nvidia}")
display = display_log(since)
if display:
sections.append(f"--- Display server log ({_session_type()}) ---\n{display}")
return "\n\n".join(sections)
+4 -1
View File
@@ -115,7 +115,7 @@ class DiagnosticDialog(QDialog):
threading.Thread(target=self._work_explain, daemon=True).start() threading.Thread(target=self._work_explain, daemon=True).start()
def _work_explain(self) -> None: def _work_explain(self) -> None:
from ..core import ai, gamelogs from ..core import ai, gamelogs, syslogs
result = self._result result = self._result
summary = result.summary summary = result.summary
@@ -139,6 +139,9 @@ class DiagnosticDialog(QDialog):
logs = gamelogs.collect(since=since) # scoped to this session logs = gamelogs.collect(since=since) # scoped to this session
if logs: if logs:
lines.append("\nGame/Proton/Steam logs for this session:\n" + logs) lines.append("\nGame/Proton/Steam logs for this session:\n" + logs)
sys_logs = syslogs.collect(since=since) # kernel log + crashed-process records
if sys_logs:
lines.append("\nSystem logs for this session (kernel + crashed processes):\n" + sys_logs)
text = "\n".join(lines) text = "\n".join(lines)
ok, reply = ai.explain(text) ok, reply = ai.explain(text)
if result.dir: # record exactly what was sent, the model, and the reply (M15) if result.dir: # record exactly what was sent, the model, and the reply (M15)
+4
View File
@@ -47,11 +47,15 @@ class StoreTests(unittest.TestCase):
with mock.patch.object(diagstore, "enabled", return_value=True), \ with mock.patch.object(diagstore, "enabled", return_value=True), \
mock.patch("rigdoctor.render.render_summary", return_value="SUMMARY-TEXT"), \ mock.patch("rigdoctor.render.render_summary", return_value="SUMMARY-TEXT"), \
mock.patch("rigdoctor.core.gamelogs.collect", return_value="LOG-TEXT"), \ mock.patch("rigdoctor.core.gamelogs.collect", return_value="LOG-TEXT"), \
mock.patch("rigdoctor.core.syslogs.collect", return_value="SYS-LOG"), \
mock.patch("rigdoctor.core.inventory.collect", return_value=[]), \
mock.patch.object(diagstore.config, "DIAGNOSTICS_DIR", self.tmp / "diagnostics"): mock.patch.object(diagstore.config, "DIAGNOSTICS_DIR", self.tmp / "diagnostics"):
directory = diagstore.store(FakeResult()) directory = diagstore.store(FakeResult())
self.assertTrue((directory / "result.json").exists()) self.assertTrue((directory / "result.json").exists())
self.assertTrue((directory / "report.txt").exists()) self.assertTrue((directory / "report.txt").exists())
self.assertEqual((directory / "gamelogs.txt").read_text(), "LOG-TEXT") self.assertEqual((directory / "gamelogs.txt").read_text(), "LOG-TEXT")
self.assertEqual((directory / "syslogs.txt").read_text(), "SYS-LOG")
self.assertTrue((directory / "inventory.txt").exists()) # inventory included for debugging
data = json.loads((directory / "result.json").read_text()) data = json.loads((directory / "result.json").read_text())
self.assertEqual(data["game"], "Path of Exile 2") self.assertEqual(data["game"], "Path of Exile 2")
self.assertEqual(len(data["findings"]), 1) self.assertEqual(len(data["findings"]), 1)
+95
View File
@@ -0,0 +1,95 @@
"""Tests for M15 session-scoped system-log collection (kernel + coredumps)."""
import unittest
from unittest import mock
from rigdoctor.core import syslogs
class KernelLogTests(unittest.TestCase):
def test_passes_since_and_tails(self):
with mock.patch("shutil.which", return_value="/usr/bin/journalctl"), \
mock.patch.object(syslogs, "_run", return_value="X" * 100 + "TAILLINE") as run:
out = syslogs.kernel_log(since=1_000_000_000, max_bytes=8)
self.assertEqual(out, "TAILLINE")
cmd = run.call_args[0][0]
self.assertIn("-k", cmd)
self.assertIn("--since", cmd)
def test_missing_tool_returns_empty(self):
with mock.patch("shutil.which", return_value=None):
self.assertEqual(syslogs.kernel_log(), "")
class CoredumpTests(unittest.TestCase):
def test_empty_when_no_coredumps(self):
with mock.patch("shutil.which", return_value="/usr/bin/coredumpctl"), \
mock.patch.object(syslogs, "_run", return_value="No coredumps found."):
self.assertEqual(syslogs.coredumps(), "")
def test_returns_list(self):
with mock.patch("shutil.which", return_value="/usr/bin/coredumpctl"), \
mock.patch.object(syslogs, "_run", return_value="TIME PID SIG EXE\n... SEGV PathOfExile"):
out = syslogs.coredumps()
self.assertIn("PathOfExile", out)
class NvidiaTests(unittest.TestCase):
def test_missing_tool(self):
with mock.patch("shutil.which", return_value=None):
self.assertEqual(syslogs.nvidia_snapshot(), "")
def test_snapshot_head_truncated(self):
with mock.patch("shutil.which", return_value="/usr/bin/nvidia-smi"), \
mock.patch.object(syslogs, "_run", return_value="DRIVER\n" + "x" * 99999):
out = syslogs.nvidia_snapshot(max_bytes=10)
self.assertEqual(out, "DRIVER\nxxx") # head, not tail
class DisplayTests(unittest.TestCase):
def test_session_type_env(self):
with mock.patch.dict("os.environ", {"XDG_SESSION_TYPE": "wayland"}):
self.assertEqual(syslogs._session_type(), "wayland")
def test_x11_tails_xorg_log(self):
import tempfile
from pathlib import Path
log = Path(tempfile.mkdtemp()) / "Xorg.0.log"
log.write_text("(EE) NVIDIA(GPU-0): something failed")
with mock.patch.object(syslogs, "_session_type", return_value="x11"), \
mock.patch.object(syslogs, "_xorg_log", return_value=log):
out = syslogs.display_log()
self.assertIn("(EE) NVIDIA", out)
def test_wayland_uses_user_journal(self):
with mock.patch.object(syslogs, "_session_type", return_value="wayland"), \
mock.patch("shutil.which", return_value="/usr/bin/journalctl"), \
mock.patch.object(syslogs, "_run", return_value="gnome-shell: GPU error") as run:
out = syslogs.display_log(since=1_000_000_000)
self.assertIn("GPU error", out)
cmd = run.call_args[0][0]
self.assertIn("--user", cmd)
self.assertTrue(any(a.startswith("_COMM=") for a in cmd))
class CollectTests(unittest.TestCase):
def test_collect_combines_sections(self):
with mock.patch.object(syslogs, "kernel_log", return_value="NVRM: Xid 79"), \
mock.patch.object(syslogs, "coredumps", return_value="game SIGSEGV"), \
mock.patch.object(syslogs, "nvidia_snapshot", return_value="Driver Version 595"), \
mock.patch.object(syslogs, "display_log", return_value="(EE) NVIDIA"):
out = syslogs.collect()
for needle in ("Kernel log", "Xid 79", "Crashed processes", "SIGSEGV",
"NVIDIA snapshot", "595", "Display server log"):
self.assertIn(needle, out)
def test_collect_empty_when_nothing(self):
with mock.patch.object(syslogs, "kernel_log", return_value=""), \
mock.patch.object(syslogs, "coredumps", return_value=""), \
mock.patch.object(syslogs, "nvidia_snapshot", return_value=""), \
mock.patch.object(syslogs, "display_log", return_value=""):
self.assertEqual(syslogs.collect(), "")
if __name__ == "__main__":
unittest.main()