Changelog
All notable changes to the Elixir/Phoenix Claude Code plugin.
Format: Keep a Changelog. Versioning: Semantic Versioning.
[Unreleased]
Added
Changed
Fixed
[2.12.0] - 2026-06-16
Workflow-completion release driven by 400-session analysis: three new skills
(/phx:recall, /phx:deps-update, /phx:watch-pr), /phx:pr-review v2 that
closes the review loop (fetch → fix → reply → resolve), a route-intent.sh
UserPromptSubmit hook replacing ~0%-firing CLAUDE.md prose routing, four new
Iron Laws (#23–#26), and an eval-hardening pass that backfilled the
AskUserQuestion 4-option check, cross-file consistency tests, and untracked-file
detection. Law count 22 → 26.
Added
- Iron Law #26 — Comments aren't commit messages (session analysis found Oliver asking "remove unnecessary comments" on essentially every PR, 8+ sessions clustered June 2026). A change's reasoning — the bug, what it replaces, the task — belongs in the commit/PR/squash, which git persists; not in code comments. No issue-reference tags inline (`# ENA-1234`). Keep only durable intrinsic facts a future reader needs regardless of history: footguns, invariants, library quirks. Wired into CLAUDE.md, the `inject-iron-laws.sh` SubagentStart hook (code-writing subagents inherit it), the `iron-law-judge` agent as detection #19 (so `/phx:review` flags ticket tags, change-narration, and what-comments), and the `init` injectable template. Stops the comments being added during `/phx:work`/`/phx:quick` rather than stripping them at PR time. Law count 25 → 26.
- UserPromptSubmit routing hook (`route-intent.sh`) — injects one-line `/phx:` suggestions directly into Claude's context for three high-signal intents: GitHub PR URLs / review-feedback phrasing → `/phx:pr-review`, Tidewave `<context name="current-page">` blocks → `/phx:investigate`, Elixir stack-trace pastes → `/phx:investigate`. Replaces CLAUDE.md prose routing rules measured at ~0% firing rate across 400 sessions. One suggestion per category per session, silent on explicit slash commands, gated on `mix.exs`, always exits 0 (UserPromptSubmit exit 2 would erase the user's prompt).
- `/phx:recall` — session and history archaeology (git-archaeology sessions ran manual `git log`/`diff` pipelines with no plugin support). Three evidence layers, cheapest first: `.claude/solutions/` compound docs → git archaeology (`--grep`, `-S` pickaxe, `--follow`, `-L`) → ccrider MCP session search, gated with graceful degradation when the MCP is absent. ONE ccrider fetch = ONE subagent (3–15KB responses; writes a ≤30-line summary file). Every answer cites its evidence; clean misses are stated, then routed to `/phx:compound` so the next recall stops at layer 1. 100% trigger accuracy.
- `/phx:deps-update` — generic dependency freshness workflow (dependency maintenance was a recurring session pattern with no plugin support). Inventory via `mix hex.outdated` (exit 1 = normal "outdated" signal), changelog deltas via the built-in `mix hex.package diff <pkg> <v1>..<v2>` (no project-specific mix tasks), updates with coupled-group enforcement (Phoenix core, Ecto, Ash, Oban, telemetry families move together), breaking-change fixes, and PR splitting (patches bundled, minors by area, majors solo). Majors require an explicit `mix.exs` edit; `override: true` only when the per-package constraint table shows a transitive blocker. Hands off security to `/phx:deps-audit` (Mode B) and verification to `/phx:verify`. The only mutating deps skill — audit/vet stay read-only. 89% trigger accuracy.
- `/phx:watch-pr` — token-conscious PR/CI watching (replaces hand-rolled 60-min foreground `sleep` loops observed in session analysis). A quiet background watcher (`scripts/watch-pr.sh`, Monitor-tool-first with `run_in_background` fallback) polls `gh pr view --json` in its own process and emits ONE line per genuinely-new event (review, comment, CI conclusion, merged/closed, watchdog, gh-failure) — raw JSON never enters Claude's context, and Claude takes zero turns while idle (no cache-TTL straddling). `--checks-only` delegates to `gh pr checks --watch --fail-fast` (exit code is the signal). Routes actionable reviews to `/phx:pr-review` and CI failures to `/phx:investigate`. 100% trigger accuracy on the new fixture.
- `/phx:pr-review` v2 — closes the review loop (fetch → fix → reply → resolve). The old skill drafted replies but used REST endpoints that expose neither thread IDs nor resolved status, so it could never resolve a thread or skip handled ones. v2 fetches threads via GraphQL `reviewThreads` (thread ID + `isResolved` + `isOutdated`, paginated), replies via REST to the thread root, resolves via `resolveReviewThread`, and is idempotent across review rounds — GitHub's `isResolved` is the state. New flags: `--bots-only` (triage CI bot passes — Copilot/Codex/CodeRabbit detected via `__typename == "Bot"`), `--no-resolve`. New Iron Laws: never resolve without a reply, never claim a fix without a shown diff, bot findings get the same scrutiny as humans. New references: `gh-commands.md` (3 comment surfaces, pagination, bot detection), `bot-triage.md` (batch flow + Elixir false-positive patterns).
- Three new Iron Laws (#23–#25) from the 400-session analysis, wired into `elixir-idioms`, `liveview-patterns`, the `/phx:init` template, the SubagentStart injection hook, and `iron-law-judge` detection patterns:
  - #23 Mix tasks start only what they need — `Mix.Task.run("app.config")` + `Application.ensure_all_started/1`, never `Mix.Task.run("app.start")` (boots the full tree: endpoint binds the port, Oban starts consuming jobs). The `mix-tasks.md` reference previously taught the anti-pattern; now fixed.
  - #24 LiveView handlers match `{:error, %Ecto.Changeset{}}` explicitly — bare `{:error, _}` silently swallows form validation errors.
  - #25 Capture Gettext/CLDR locale before spawning Task/GenServer — locale is process-local; spawned processes reset to default.
- Pre-migration safety section in `ecto-patterns/references/migrations.md` — check duplicates (including soft-deleted rows) before unique indexes, with partial-index/data-fix/composite-key resolutions.
- Tidewave reliability guards in `tidewave-integration` — worktree/port verification (multi-worktree setups debug the wrong server), schema introspection before SQL, output-size caps, `browser_eval` server-side fallbacks, and a QA-walkthrough pattern for feature smoke tests.
- Eval: AskUserQuestion 4-option-limit check (`askuserquestion_option_limit` matcher) — the tool silently drops a 5th option; brainstorm shipped that way for months. Scans option lists after every AskUserQuestion mention (YAML `- label:` blocks and bullet/numbered runs), stops at headings, and skips sibling list items when the mention is itself inside a list. Backfilled into all 50 skill evals and the generator template; caught a real second instance in `/phx:plan`.
- Eval: cross-file consistency tests (`lab/eval/tests/test_consistency.py`) — two bug classes per-skill scoring can't see: references teaching anti-patterns their own Iron Laws ban (`mix-tasks.md` shipped the `app.start` pattern Iron Law #23 bans), and skill scripts using cwd-relative `.claude/` paths (the nested-state-dir bug class). The path lint caught a 4th live instance in `scripts/fetch-claude-docs.sh`.
- `make eval` now sees untracked files — brand-new skills/agents were invisible to the `git diff`-based changed-file detection until first commit; `git ls-files --others` is now merged into both detection paths.
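The routing decision described in the `route-intent.sh` entry can be sketched in Python. This is a minimal sketch, not the hook itself: the regexes, the category names, and the `already_suggested` set standing in for per-session state are all assumptions.

```python
from __future__ import annotations

import re

# Hypothetical patterns; the changelog names the intents but not the regexes.
INTENTS = {
    "pr-review": (
        re.compile(
            r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+"
            r"|\baddress (?:the )?review (?:comments|feedback)\b",
            re.IGNORECASE,
        ),
        "/phx:pr-review",
    ),
    "tidewave": (re.compile(r'<context name="current-page">'), "/phx:investigate"),
    # Elixir exception banners look like: ** (RuntimeError) boom
    "stack-trace": (re.compile(r"\*\* \([A-Z][\w.]*(?:Error|Exception)\)"), "/phx:investigate"),
}


def route(prompt: str, already_suggested: set[str]) -> list[str]:
    """Return one-line suggestions, at most one per category per session."""
    if prompt.lstrip().startswith("/"):
        return []  # explicit slash command: stay silent
    lines = []
    for category, (pattern, command) in INTENTS.items():
        if category not in already_suggested and pattern.search(prompt):
            already_suggested.add(category)
            lines.append(f"Suggestion: {command}")
    return lines
```

The caller keeps `already_suggested` for the lifetime of the session. A real hook would also persist that state between prompts, check for `mix.exs` first, and exit 0 on every path.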
Changed
- Workflow handoffs between phases — `/phx:investigate` now ends with a routing step (quick fix vs `/phx:plan` vs `/phx:compound`); `/phx:review` passes the review file path to `/phx:plan` for follow-up plans; `/phx:work` suggests `/phx:compound` after non-obvious fixes and re-verifies stale plans from earlier sessions.
- `/phx:full` deflects existing plan files — description and a usage guard route `.claude/plans/*/plan.md` arguments to `/phx:work` instead of re-planning.
- `intent-detection` hard guard — skips entirely when the message starts with any slash command; no more routing suggestions on top of explicit commands.
- `/phx:work` batches checkbox updates — one edit pass when several tasks complete together, not one Edit call per checkbox.
- `/phx:compound` write-block fallback — outputs the solution doc inline and points at `/phx:permissions` instead of silently dropping knowledge; `/phx:permissions` now always recommends workflow-artifact write grants (`.claude/plans/`, `.claude/solutions/`, `.claude/reviews/`).
- AskUserQuestion discipline in `brainstorm`/`triage` — decisions only, concrete impact per option; fixed brainstorm's Decision Point exceeding the tool's 4-option limit (5 options meant one was always silently dropped).
- `security-analyzer` — new end-to-end flow checks from the 400-session analysis: IDOR via `handle_params` URL params, data-flow through multi-step transforms, failure-path consistency in `Ecto.Multi`/`with` chains, soft-delete leakage in authz lookups — all bug classes external review bots caught after plugin review passed.
- `elixir-reviewer` — failure-path review section (`Multi`/`with` error branches, short-circuit side effects, multi-step transforms, soft-delete filters), known false-positive traps (`nil[:key]` is nil-safe via Access), and a diff-scoped reading rule to stop turn exhaustion on large PRs.
- `verification-runner` — compiles FIRST (turn 1 combines discovery + `mix compile`), maxTurns 10 → 15, earlier findings-file write; stops "compiling… let me check again" turn exhaustion observed on large PRs.
- `parallel-reviewer` + `/phx:audit` — rate-limit circuit breaker: when 2+ subagents fail with rate-limit/API errors, synthesize from existing outputs and tell the user to re-run after reset instead of dead-waiting on "continue".
- `ecto-schema-designer` — pre-UNIQUE-index migration safety check (duplicates + soft-deleted rows silently block production migrations).
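The rate-limit circuit breaker in the `parallel-reviewer` entry reduces to a threshold check over subagent results. A minimal sketch, assuming a hypothetical `SubagentResult` shape and substring-based detection of rate-limit/API errors (the changelog specifies neither):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumed markers for rate-limit/API failures; not taken from the plugin.
RATE_LIMIT_MARKERS = ("rate limit", "rate_limit", "429", "overloaded")


@dataclass
class SubagentResult:
    name: str
    output: Optional[str]  # None when the subagent failed
    error: Optional[str] = None


def is_rate_limited(result: SubagentResult) -> bool:
    error = (result.error or "").lower()
    return any(marker in error for marker in RATE_LIMIT_MARKERS)


def circuit_breaker(results: list[SubagentResult], threshold: int = 2) -> dict:
    """Synthesize from what finished once `threshold` subagents hit limits."""
    finished = [r.output for r in results if r.output is not None]
    limited = [r.name for r in results if is_rate_limited(r)]
    if len(limited) >= threshold:
        return {
            "action": "synthesize",
            "sections": finished,
            "note": f"Re-run {', '.join(limited)} after the rate limit resets.",
        }
    return {"action": "continue", "sections": finished}
```

Tripping only at two or more failures keeps a single flaky subagent from aborting the whole review.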
Fixed
- Iron Law verifier is now blame-aware — `iron-law-verifier.sh` scans only the content the current Edit/Write introduced (`new_string`/`content`), not the whole file. Pre-existing violations in untouched regions no longer force unrelated refactors.
- `block-dangerous-ops.sh` fails open on script errors — a corrupted hook file (e.g. merge-conflict markers) once blocked ALL Bash calls with no recovery; `hooks.json` now appends `|| exit 0` and the script documents the JSON-deny/exit-0 contract.
- Stop hook warns about uncommitted feature-branch changes — prevents the lost-work-after-rebase incident class observed in session analysis.
- `liveview-architect` + `ecto-schema-designer` missing Write — both agents still had the pre-v2.8.1 `disallowedTools: Write, ...` frontmatter and fell back to inline output when spawned as reviewers ("I only have Read, Grep, and Glob"). Write now allowed for their own findings file; Edit stays disallowed.
- `web-researcher` could never write its output file — research workers were asked to save findings but had Write disallowed; agents burned all turns on fetches then lost the output. Write allowed + reserve-last-turns-for-output guard.
- `/phx:plan` post-plan AskUserQuestion exceeded the 4-option limit — 5 options meant one was always silently dropped; "Review the plan" and "Adjust the plan" are now merged into one option. Fixed in the skill, `planning-orchestrator`, and both hook scripts that echo the list (`precompact-rules.sh`, `plan-stop-reminder.sh`).
- `scripts/fetch-claude-docs.sh` wrote its cache relative to cwd — anchored to `${CLAUDE_PROJECT_DIR:-$PWD}` like the other skill scripts.
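Scanning only the introduced text means reading the tool input rather than the file on disk. A minimal Python sketch, assuming the hook receives Claude Code's tool-call JSON with `tool_name` and `tool_input` (`new_string` for Edit, `content` for Write). The single rule shown is a hypothetical stand-in for the real Iron Law checks:

```python
import json
import re

# Hypothetical stand-in rule; the real verifier's Iron Law patterns differ.
RULES = {"String.to_atom/1 on external input": re.compile(r"String\.to_atom\(")}


def introduced_text(payload: dict) -> str:
    """Return only the text the current Edit/Write adds, never the whole file."""
    tool_input = payload.get("tool_input", {})
    if payload.get("tool_name") == "Edit":
        return tool_input.get("new_string", "")
    if payload.get("tool_name") == "Write":
        return tool_input.get("content", "")
    return ""


def violations(raw_json: str) -> list:
    text = introduced_text(json.loads(raw_json))
    return [name for name, pattern in RULES.items() if pattern.search(text)]
```

An Edit that removes a violation (its `old_string` contains it, its `new_string` does not) therefore produces no finding, and neither does an unrelated violation elsewhere in the same file.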
[2.11.0] - 2026-06-08
First-class Ash Framework support — an inline ash-framework skill (7 Iron
Laws, 100% trigger accuracy), three specialist agents, mix.exs auto-detection,
and Ash-aware output compression — plus the /phx:freeze scoped edit-lock and
per-skill eval + trigger coverage across all 47 skills.
Added
- `ash-framework` skill — Iron Laws, generator workflow, and tiered research protocol (Tidewave → `usage_rules` → WebFetch hexdocs.pm) for Ash Framework projects. Iron Laws: domain code interfaces, actor-on-query placement, generators first, codegen after resource changes, actions over functions, never edit resource snapshots, no direct `Repo.*`.
- `ash-resource-designer` agent (sonnet) — designs Ash resources with actions, policies, relationships, and domain code interfaces. Outputs a design doc with generator commands and code interface stubs.
- `ash-policy-reviewer` agent (sonnet) — audits Ash policy coverage, `authorize?: false` bypass patterns, actor placement at call sites, and check module correctness.
- `ash-query-optimizer` agent (sonnet) — detects N+1 load patterns and surfaces the "Ash way" across 8 Iron Laws and 9 anti-patterns: load + `length`/`count` → aggregates, `count > 0` → `exists` aggregate, post-load `Enum.filter` → query-customized loads, derived `Map.put` → calculations, multi-read + `Enum.uniq_by` → `Ash.Query.combination_of`, direct `Repo.*` → Ash actions/aggregates, wide-resource reads → `Ash.Query.select`.
- Ash auto-detection — `detect-ash.sh` SessionStart hook announces the `ash-framework` skill (and `usage_rules` setup) when `:ash` appears in `mix.exs` or `use Ash.Resource`/`use Ash.Domain` appears in `lib/`. `priv/resource_snapshots/**` added to the CLAUDE.md auto-load table with a reminder that snapshots are owned by `mix ash.codegen`.
- `mix-compression` Ash filters — added `[filters.mix-ash-codegen]` and `[filters.mix-ash-migrate]` to `references/rtk-filters.toml`. Matches the same compression model as `mix-ecto-migrate`: happy-path short-circuits to a one-liner, snapshot/migration file lists preserved verbatim, errors never stripped. Extends the documented 5-15% per-session token reduction to Ash workflows.
- `freeze` skill + `freeze-gate.sh` hook — scoped edit lock (`/phx:freeze`). Writes a `.claude/.freeze` sentinel (allow-list of path prefixes, or empty = freeze everything); a `PreToolUse` hook then denies `Edit`/`Write`/`NotebookEdit` outside the allow-list. Use for read-only investigation or to keep a refactor inside specific dirs.
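The freeze sentinel's semantics (a list of allowed path prefixes, where an empty file freezes everything) reduce to a prefix check. A minimal sketch, assuming one prefix per line in `.claude/.freeze` and paths compared relative to the project root. Both are assumptions about the file format, and the function names are illustrative:

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

GATED_TOOLS = {"Edit", "Write", "NotebookEdit"}


def load_allow_list(sentinel: Path) -> Optional[list[str]]:
    """None: no freeze active. Empty list: freeze everything."""
    if not sentinel.exists():
        return None
    lines = sentinel.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def is_denied(tool_name: str, rel_path: str, allow_list: Optional[list[str]]) -> bool:
    if allow_list is None or tool_name not in GATED_TOOLS:
        return False
    for prefix in allow_list:
        root = prefix.rstrip("/")
        # Match whole path segments so "lib/app" does not allow "lib/application.ex".
        if rel_path == root or rel_path.startswith(root + "/"):
            return False
    return True
```

Matching on whole path segments rather than a raw `startswith` keeps a prefix like `lib/app` from unlocking unrelated sibling files.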
Changed
- Ash callouts in existing skills — `ecto-constraint-debug`, `liveview-patterns`, `phoenix-contexts`, `security`, and `testing` now route Ash projects to the `ash-framework` skill. CLAUDE.md "Ash Framework Detection" rewritten to load the skill and research via `usage_rules` rather than deferring to external docs.
- `mix-compression` install docs — `rtk test` → `rtk verify` CLI syntax.
Fixed
[2.10.6] - 2026-06-04
Patch: Elixir 1.20 type-system awareness across the verify/review path, plus
contributor tooling — a /release skill and single-sourced markdownlint ignores.
Added
- Elixir 1.20 type-system awareness. Elixir v1.20 (2026-06-03) completed its first type-system milestone: the compiler now infers types and gradually type-checks every program without annotations, reporting dead code and verified bugs (guaranteed runtime failures) as `mix compile` warnings — built-in, no Dialyzer/PLT. The practical impact for the plugin: on 1.20+ (OTP 27+), `mix compile --warnings-as-errors` — which `/phx:verify`, `/phx:work` checkpoints, and the "fix CI" pattern run everywhere — now fails the build on type violations. Changes:
  - New reference `skills/elixir-idioms/references/elixir-120-type-system.md`: the `dynamic()` mental model (refinable range, disjoint-only flagging), guard/clause/map inference, how to read & fix a violation, and a compiler-checker-vs-Dialyzer comparison table.
  - `skills/elixir-idioms/SKILL.md`: reference pointer added.
  - `skills/verify/SKILL.md` + `agents/verification-runner.md`: note that `--warnings-as-errors` now surfaces type violations at the compile step, and to suspect a newly-detected verified bug (not a regression) when a previously-green build fails after a 1.20 bump.
  - `agents/elixir-reviewer.md`: new "Type Checking (Compiler vs Dialyzer)" note — the built-in checker is the first line of type safety, complementary to (not redundant with) Dialyzer.
- Protected-section invariant in the autoresearch loop (contributor tooling, not distributed). The `## Iron Laws` section of every SKILL.md is now append-only slow state: the loop may add a law but a delete/reword forces REVERT. Enforced as a hard gate via `checks.sh` check #7 (backed by `lab/autoresearch/scripts/protected_sections.py`, which diffs the working tree against git HEAD, prefix-stripped so renumbering is allowed), plus a "Protected Sections" declaration in `lab/autoresearch/program.md`. Borrowed from SkillOpt (arXiv 2605.23904), which measured this fast/slow guarantee at ~22 points on SpreadsheetBench. A live test confirmed the necessity: the 8-dimension scorer is blind to single-law deletion (composite and `safety` both stay 1.0, because the scorer is stateless and the `safety` dimension only checks section presence + min count) — so the old soft gate would have silently accepted dropping a security Iron Law. New tests: `lab/autoresearch/tests/test_protected_sections.py` (10 cases).
- `/release` contributor skill for cutting plugin releases — bumps `plugin.json`, finalizes the CHANGELOG, gates on `make ci`, tags `vX.Y.Z`, and runs `gh release create`. Encodes the Release/Versioning checklist and local gotchas as Iron Laws (`claude plugin tag` doesn't work for this marketplace layout; `plugin.json` == CHANGELOG heading == tag; confirm before the outward-facing publish; never force-push). `.claude/skills/release/` — contributor tooling, not distributed.
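The append-only rule for `## Iron Laws` amounts to this: after stripping list numbering, every law present at HEAD must still appear verbatim in the working tree. A minimal Python sketch of that comparison, assuming laws are numbered list items directly under the heading. The real `protected_sections.py` may parse differently, and fetching the HEAD copy (e.g. via `git show`) is left out:

```python
import re

NUMBER_PREFIX = re.compile(r"^\s*\d+\.\s+")


def iron_laws(markdown: str) -> list:
    """Numbered items under '## Iron Laws', with the numbering stripped."""
    laws, in_section = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Iron Laws"
            continue
        if in_section and NUMBER_PREFIX.match(line):
            laws.append(NUMBER_PREFIX.sub("", line).strip())
    return laws


def check_append_only(head_md: str, working_md: str) -> list:
    """HEAD laws that were deleted or reworded; non-empty means REVERT."""
    current = set(iron_laws(working_md))
    return [law for law in iron_laws(head_md) if law not in current]
```

Stripping the number prefix before comparing is what lets a newly inserted law renumber its neighbours without tripping the gate.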
Changed
- markdownlint ignores single-sourced via `.markdownlintignore` (gitignore syntax), de-duplicating the list across `package.json` and the Makefile and excluding untracked non-source dirs (`social/`, `.rtk/`) so `make ci` stays green on promo/cache content. Contributor tooling, not distributed.
[2.10.5] - 2026-05-25
Patch: route audit subagents to declared-model specialists instead of
general-purpose, cutting Opus subagent volume per /phx:audit run.
Changed
- `skills/audit/SKILL.md`: route 3 of 5 parallel audit subagents to declared-model plugin specialists instead of `general-purpose` (which inherits the parent session model, usually Opus). Architecture → `phoenix-patterns-analyst` (sonnet), Security → `security-analyzer` (opus), Test health → `testing-reviewer` (sonnet). Performance and Dependency tracks kept on `general-purpose` with TODO notes — no plugin specialist exists for project-wide perf or deps audit. Motivation: a JSONL analysis of 4,561 local sessions found 61.6% of Task/Agent invocations bypass the plugin via CC built-ins, materially explaining why Sonnet+Haiku combined are only ~7% of total token spend despite 18 of 22 plugin agents declaring those models.
[2.10.4] - 2026-05-21
Patch: fix force-push hook false-positive (issue #61) and the same scan-past-separator class in the two sibling rules.
Fixed
- `block-dangerous-ops.sh` (PreToolUse) — the force-push regex `git push.*(--force|-f)\b` matched `--force-with-lease` (in ERE, `\b` is a word boundary and the hyphen after `--force` is non-word, so the boundary triggered on the lease variant) AND scanned past shell command separators, so an unrelated `&& gh ... --force-with-lease` on the same line tripped the deny. The hook blocked the very command its `permissionDecisionReason` recommended as the safer alternative. Reported by @inou (issue #61) — hit on a Sprint 8 rebase cycle that stranded three rebased branches. The new ERE anchors on start-of-line or a shell separator (`;` `&` `|` `&&` `||`), keeps the scan inside the current command (`[^;|&]*`), and requires the flag to end at a word terminator (`([[:space:];&|]|$)`), so `--force-with-lease` is allowed while real `--force`/`-f` are still blocked. The same anchor fix is applied to the `mix ecto.(reset|drop)` and `MIX_ENV=prod mix` rules in the same file, which had the identical scan-past-separator failure mode (e.g. `echo "do not run mix ecto.reset" && mix test` used to be denied).
- New `plugins/elixir-phoenix/hooks/tests/block-dangerous-ops_test.sh` regression harness — 41 cases covering real force-push, the lease variant, scan-past-separator false positives, Elixir-only Ecto and MIX_ENV rules, and the `mix.exs`-gated cross-project bleed (#55). Run with `bash plugins/elixir-phoenix/hooks/tests/block-dangerous-ops_test.sh`.
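The anchoring rules in the `block-dangerous-ops.sh` entry can be checked quickly by translating them into Python's `re` syntax (`[[:space:]]` becomes `\s`, and a lookahead replaces the trailing terminator group). This is a sketch of the described behaviour, not the hook's literal pattern:

```python
import re

# Anchor on start of line or a shell separator, then `git push`, then stay inside
# the current command ([^;&|]*?), then require --force or -f followed by
# whitespace, a separator, or end of string. --force-with-lease fails the lookahead.
FORCE_PUSH = re.compile(
    r"(?:^|[;&|])\s*git\s+push\b[^;&|]*?\s(?:--force|-f)(?=[\s;&|]|$)",
    re.MULTILINE,
)


def is_force_push(command: str) -> bool:
    return FORCE_PUSH.search(command) is not None
```

The test cases mirror the classes listed for the regression harness: real force-push, the lease variant, and a separator-crossing false positive.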
[2.10.3] - 2026-05-20
Patch release bundling two unreleased changes since v2.10.2: CC hook-API adoption from PR #56 and the eval-framework multi-model trigger scorer.
Added
- `check-pending-plans.sh` (Stop hook) now surfaces `background_tasks[]` and `session_crons[]` from hook input as terminal warnings — catches forgotten `mix phx.server`, `iex -S mix`, `mix watch` processes and pending `/schedule` jobs at session stop (CC 2.1.145+ field).
- `block-dangerous-ops.sh` (PreToolUse) now emits structured JSON output with `permissionDecision: "deny"`, a user-facing reason, and `hookSpecificOutput.additionalContext` containing the safer alternative. Thanks to the CC 2.1.110 fix that preserves additionalContext on blocked tool calls, the safer alternative now persists into Claude's next turn instead of being a one-shot stderr message.
- CLAUDE.md documents the new `type: "mcp_tool"` hook (CC 2.1.118+) with its SessionStart caveat — MCP servers may not be connected at SessionStart, so detection probes stay on direct HTTP/`curl`; reserve `mcp_tool` for PreToolUse / PostToolUse / Stop where the connection is live.
- Release checklist documents that `claude plugin tag` (CC 2.1.118+) does NOT work for this repo's marketplace layout (it expects `.claude-plugin/plugin.json` at the repo root, but our plugin lives at `plugins/elixir-phoenix/.claude-plugin/plugin.json`). Manual `git tag vX.Y.Z` remains the canonical path.
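The structured deny in the `block-dangerous-ops.sh` entry can be sketched in Python. This minimal sketch assumes the field names given in that entry (`permissionDecision`, `permissionDecisionReason`, and `additionalContext` under `hookSpecificOutput`); the message strings are illustrative:

```python
import json


def deny(reason: str, safer_alternative: str) -> str:
    """Build a PreToolUse deny payload for the hook to print on stdout."""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
            "additionalContext": f"Safer alternative: {safer_alternative}",
        }
    })
```

The hook prints this string and exits 0; Claude Code reads the decision from stdout. This is the JSON-deny/exit-0 contract that the 2.12.0 fail-open fix relies on.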
Added (contributor)
- Multi-model trigger eval — `lab/eval/trigger_scorer.py` gained a `--model <alias_or_full_id>` flag (default `claude-haiku-4-5`, preserves all existing behavior). Aliases (`haiku`/`sonnet`/`opus`) canonicalize to full IDs so `--model haiku` and `--model claude-haiku-4-5` share one cache. Non-default models land in `lab/eval/triggers/results/by-model/{model}/`; per-result JSON records the `model` field so caches are self-describing.
- `lab/eval/compare_models.py` — N-way model comparator. Loads N `_aggregate.json` files via `--models alias,alias,…` or `--aggregates path…`, prints an ASCII table sorted by per-skill spread with `↕`/`⚠` markers at 10%/20% disagreement, plus an apples-to-apples intersection mean and pairwise delta when skill sets differ. `--format json` for machine consumption.
- `Makefile`: `MODEL=sonnet make eval-multimodel` (full per-model sweep), `MODELS=haiku,sonnet make eval-compare-models` (cached comparison). Foundation for verifying v3.0.0 multi-agent ports (Codex/OpenCode/Pi) on non-Claude routing judges. See issue #48, T1.3 Phase 1.
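The comparator's ranking can be sketched as a per-skill spread (max minus min trigger accuracy across models) over the skills every model was scored on. A minimal sketch, assuming each `_aggregate.json` has already been loaded into a `{skill: accuracy}` dict and that the markers apply at ≥0.10 and ≥0.20. The file schema and exact comparisons are assumptions:

```python
from __future__ import annotations


def compare(results: dict[str, dict[str, float]]) -> list[tuple[str, float, str]]:
    """Rows of (skill, spread, marker) over skills every model scored, widest first."""
    shared = set.intersection(*(set(scores) for scores in results.values()))
    rows = []
    for skill in shared:
        values = [scores[skill] for scores in results.values()]
        spread = round(max(values) - min(values), 4)  # round away float noise
        marker = "⚠" if spread >= 0.20 else "↕" if spread >= 0.10 else ""
        rows.append((skill, spread, marker))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def intersection_mean(scores: dict[str, float], shared: set[str]) -> float:
    """Apples-to-apples mean over the shared skill set only."""
    return sum(scores[skill] for skill in shared) / len(shared)
```

Rounding before the threshold comparison matters: `0.95 - 0.75` is slightly below `0.2` in binary floating point and would otherwise miss the `⚠` marker.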
Changed
- `block-dangerous-ops.sh` Elixir-specific branches (`mix ecto.reset`, `mix ecto.drop`, `MIX_ENV=prod`) now self-gate on `mix.exs` presence, matching the PR #55 cross-project-bleed pattern. The git force-push branch remains intentionally global.
- SessionStart welcome echo in `hooks.json` converted to `args: []` exec form (CC 2.1.139+) to eliminate nested shell quoting.
- `/phx:permissions` risk-classification flags that `Bash(find:*)` allow rules no longer auto-approve `find -exec`/`find -delete` (CC 2.1.113+ tightening).
[2.10.2] - 2026-05-20
Fixed
- `/phx:research`, `/phx:brainstorm`, `/phx:perf`, `/phx:pr-review` failing with "skill not listed" when invoked via slash command (issue #53, reported by @bigardone). Root cause: `disable-model-invocation: true` was still set on these four skills, triggering Claude Code bug #26251 where the model refuses to invoke a skill via the Skill tool even when the user typed the slash command. Removing the flag — matching the precedent established in commit `f1fc494` (plan/review/investigate) — restores reliable invocation across native CC and third-party CC wrappers (Conductor, OpenCode, etc.), and lets the model see these skills in its inventory so workflow chains (`/phx:brainstorm` → `/phx:plan`, `intent-detection` → `/phx:research`) resolve correctly.
[2.10.1] - 2026-05-20
Patch release fixing cross-project bleed when the plugin is enabled globally
(issue #55). All Elixir-specific hooks now self-gate on mix.exs presence —
they no-op cleanly in non-Elixir repos instead of firing Phoenix Iron Laws on
unrelated files. security-reminder.sh additionally tightens its filename
match to eliminate false positives on parent directory names and non-source
files.
Fixed
- Hooks now self-gate on `mix.exs` presence — no Iron Laws, security reminders, subagent context injection, `.claude/` directory creation, or plan-STOP messages in non-Elixir projects when the plugin is enabled globally. Affects: `security-reminder.sh`, `log-progress.sh`, `inject-iron-laws.sh`, `precompact-rules.sh`, `setup-dirs.sh`, `plan-stop-reminder.sh`, `format-elixir.sh`, `iron-law-verifier.sh`, `debug-statement-warning.sh` (#55).
- `security-reminder.sh` filename matching tightened — basename-only match with word-boundary separators (`_` `.` `-`) and restricted to Elixir source extensions (`.ex`/`.exs`/`.heex`/`.eex`/`.leex`). Eliminates false positives like `tokenizer.cpp` (`token`), `/admin_panel/foo.ex` (parent dir `admin`), `docs/session-notes.md` (wrong extension), and the reporter's `session-state.md` case (#55).
- `hooks.json` Edit|Write block — added `if:` extension filter for `security-reminder.sh` as defense in depth alongside the script's self-gating.
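The `mix.exs` gate and the tightened filename rule can be sketched in Python. This sketch assumes a hypothetical keyword list (`token`, `session`, `admin`, `auth`) and approximates the word-boundary separators by splitting the basename on `_`, `.` and `-`. The real hook is a shell script, and its keyword list is not reproduced here:

```python
from pathlib import Path

ELIXIR_EXTENSIONS = {".ex", ".exs", ".heex", ".eex", ".leex"}
KEYWORDS = {"token", "session", "admin", "auth"}  # hypothetical keyword list


def is_elixir_project(project_dir: str) -> bool:
    """Self-gate: Elixir-specific hooks no-op when mix.exs is absent."""
    return (Path(project_dir) / "mix.exs").is_file()


def should_remind(file_path: str) -> bool:
    path = Path(file_path)
    if path.suffix not in ELIXIR_EXTENSIONS:
        return False
    # Basename only: parent directories such as admin_panel/ never count.
    words = path.name.replace("-", "_").replace(".", "_").split("_")
    return any(word in KEYWORDS for word in words)
```

Whole-word matching is what keeps `tokenizer.ex` quiet while `token.ex` still triggers the reminder.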
Changed
- README install section: noted project-scope enable as a tidiness preference for multi-stack developers (global enable is now safe).
[2.10.0] - 2026-05-16
Adds a second, framework-agnostic companion plugin to the
oliver-kriska marketplace: catchup. It is a fully independent
plugin (own .claude-plugin/plugin.json, own version 0.1.0, own
README) — installed separately and not coupled to Elixir/Phoenix.
The elixir-phoenix bump to 2.10.0 is the marketplace release vehicle
(single root CHANGELOG); the only elixir-phoenix-internal changes
this release are the README companion section and a /phx:help
routing row. Implements GitHub issue #47.
Added
- `catchup` plugin — `/catchup` return-from-absence briefing. Standalone plugin at `plugins/catchup/`, second entry in `.claude-plugin/marketplace.json`. User-triggered skill (`disable-model-invocation`, slash-only). Fans out to GitHub (`gh`), git, Linear MCP, and Google Calendar MCP, then emits one prioritized brief in the 10-element Context Brief Framework scoped to a personal catch-up (Intent + ranked priorities, what moved, conflict risks, timeline). Flags: `--since` (incl. `last-session` mtime auto-detect), `--sources`, `--depth quick|standard|deep`, `--focus`. Writes `.claude/catchup/brief-<date>.md` + a ≤25-line inline summary.
- Impact-on-your-scope analysis (issue #47, @druyang). First-class brief block: intersects files moved on the default branch by others in the window with the reader's in-flight scope (open-PR files, local feature-branch diffs, working tree); classifies direct vs adjacent overlap; `--depth deep` reads incoming diffs for per-file semantic impact; `--focus impact` narrows the brief to only this. Answers "how do these changes affect my work", not just "what did I miss".
- Graceful-degradation contract. Sources are detected before query; a missing source becomes one honest line in the brief's Risks/assumptions block, never an error. `git log` is the always-available floor (valid minimum brief). No-Linear-MCP proxy: harvests `[A-Z]{2,}-\d+` ticket refs from commit/PR titles (labelled unverified). Privacy default is excerpt-only; Slack/Gmail are v2 opt-in. v2 surface (scheduling, `.claude/catchup.local.md`, cross-project rollup) is pinned in `references/config-schema.md` but not built.
- Timezone-correct windows. Calendar words (`friday`, `yesterday`, a date) resolve in the user's local TZ (the machine running `/catchup`), pivot through a single `SINCE_EPOCH`, then derive a UTC `SINCE_ISO`. Every source is compared on that one absolute instant, so colleagues in other timezones are included from your boundary ("since my Friday", not "since each author's local Friday"). Fixes a UTC-vs-local resolution bug (±14h). The brief's Timeline shows the anchor with its TZ abbrev.
- Sonnet delegation (cost/speed). The `/catchup` skill is now a thin orchestrator: it resolves the window + sources, then spawns a new `catchup-runner` agent (`model: sonnet`, `effort: medium`) for the `gh`/`git` fan-out, impact analysis, and brief assembly — so the caller's (often Opus) session no longer pays for the bulk I/O and summarization. MCP (Linear/Calendar) is still pulled in the caller's context (subagent MCP is unreliable) and passed to the agent. Skill `effort` lowered `high` → `medium`.
- Smarter default window — `last-active`. Replaces `last-session` as the default: takes the MAX of (newest Claude session mtime for this repo, your last own commit's committer-date, your last own PR/review activity). The latest footprint is the true "you were last here" instant; the brief records which signal won. New explicit values: `--since last-session` (sessions only), `--since last-commit`/`last-mine` (your git/PR only).
- `/ketchup` 🍅 easter-egg alias. A second slash-only skill (`skills/ketchup/`) that forwards verbatim to `/catchup` — same flags, same behavior, squeezier name.
- Verified end-to-end against a busy multi-developer production repo (Linear/Calendar MCP absent → degradation + proxy paths exercised; real direct file overlaps surfaced across local branches, a high-churn core module as the hotspot).
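The window resolution in the catchup entries works in three steps: resolve a local calendar word, pivot to one epoch, and derive a UTC ISO string. The `last-active` default then takes a max over footprint signals. A minimal sketch of both: the fixed `+02:00` offset stands in for the machine's local zone, the signal names are hypothetical, and treating "friday" on a Friday as the previous week's Friday is an assumption:

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta, timezone

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_since(word: str, now: datetime) -> tuple[int, str]:
    """Resolve a calendar word in now's local zone; return (SINCE_EPOCH, SINCE_ISO)."""
    today = now.date()
    if word == "yesterday":
        day = today - timedelta(days=1)
    elif word in WEEKDAYS:
        # Assumption: naming today's weekday means the same day last week.
        back = (today.weekday() - WEEKDAYS.index(word)) % 7 or 7
        day = today - timedelta(days=back)
    else:
        day = date.fromisoformat(word)  # e.g. "2026-05-10"
    local_midnight = datetime.combine(day, time.min, tzinfo=now.tzinfo)
    since_epoch = int(local_midnight.timestamp())
    utc = datetime.fromtimestamp(since_epoch, tz=timezone.utc)
    return since_epoch, utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def last_active(signals: dict[str, int]) -> tuple[str, int]:
    """MAX over footprint signals (epoch seconds), reporting which one won."""
    name = max(signals, key=signals.get)
    return name, signals[name]
```

Every source is then filtered against the single `since_epoch` value, so a colleague's commit is judged by the reader's boundary rather than the colleague's local calendar. In a real run `now` would come from `datetime.now().astimezone()`.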
Changed
- `elixir-phoenix` README: added a "Companion plugin: `catchup`" install section.
- `/phx:help`: added a "Returning after time off" routing row pointing to `/catchup`.
- `catchup` is repo-scoped by default. New `--scope repo|all` flag (default `repo`). Every GitHub signal — review-requested, notifications/mentions — is now filtered to the repo `/catchup` ran in: review-requests use `gh pr list --repo "$REPO" --search "review-requested:@me"` (was org-wide `gh search prs`), and pings use the repo-scoped `/repos/$REPO/notifications` endpoint (was cross-repo `/notifications?all=true`). `--scope all` re-enables cross-repo, but those hits are listed in a separate Other repos subsection and a Risks line, never folded into the repo's own lists.
Fixed
- `catchup` cross-repo leakage (production finding). A brief run inside one repo listed other repos' review queue, notifications, and mentions (org-wide `gh search`/cross-repo `/notifications`), contradicting the expectation that a per-repo catch-up is scoped to that repo. Now repo-scoped by default; cross-repo is opt-in and segregated (see Changed → `--scope`).
- `catchup-runner` turn budget (production finding, ccrider-verified). A busy real repo hit the agent's `maxTurns` mid-assembly so it never returned the inline summary, forcing the (often Opus) caller to re-summarize — defeating the Sonnet cost delegation. `maxTurns` 25 → 60, added a "Tool economy" section (batch shell, write the brief before risking the budget), and a skill-side `SendMessage` fallback that finishes the summary cheaply in Sonnet instead of in the caller.
- `catchup` correctness audit — 3 shell bugs. (1) `git log --name-only` over a range under-counts files ~70% due to history simplification (real repo: 44 vs 140 ground truth — missed a landed migration and a 14-file conflict); replaced with per-commit `git diff-tree --no-commit-id --name-only -r`. (2) `awk -F'|'` on commit subjects corrupts fields when a subject contains `|` (e.g. `feat(a|b):`); switched to TAB (`%x09`) — macOS awk does not accept `-F'\x1f'`. (3) Unbounded local-branch scan firehosed on a 400-branch repo; bounded to your own branches active in 60d, capped.
- `catchup` cross-repo timestamp discipline (`--scope all` production finding). On a narrow window a `--scope all` brief (1) printed GitHub's UTC `updatedAt` (`06:36:23Z`) with a local-TZ label (`06:36 CEST`, actually `08:36 CEST`), and (2) promoted a pre-window standing review request to "do first" by bundling it with an unrelated in-window issue update. `catchup-runner` + `source-adapters` now state two hard rules: judge each item on its own controlling timestamp ≥ `SINCE_EPOCH` (a related in-window object never drags a pre-window object into Top priorities — it goes to the "pre-window, for completeness" line), and convert `Z` → `LOCAL_TZ` before printing any clock time.
- `catchup` anonymization. Removed client repo/ticket identifiers from the distributed plugin and CHANGELOG; examples use generic `PROJ-####`/`lib/app*` placeholders.
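Both hard rules from the timestamp-discipline fix can be sketched together. Each item is kept or dropped on its own controlling timestamp, and GitHub's `Z` timestamps are converted before a clock time is printed. The item shape (`title`, `updated_at`) is hypothetical, and a fixed `+02:00` offset stands in for `LOCAL_TZ`:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=2), "CEST")  # stand-in for the machine's zone


def parse_github_ts(value: str) -> datetime:
    """GitHub returns UTC timestamps such as 2026-05-16T06:36:23Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def local_clock(value: str) -> str:
    """Convert Z to LOCAL_TZ before printing any clock time."""
    local = parse_github_ts(value).astimezone(LOCAL_TZ)
    return f"{local:%H:%M} {local.tzname()}"


def split_by_window(items: list[dict], since_epoch: int) -> tuple[list[dict], list[dict]]:
    """Judge each item on its own timestamp; related in-window items never pull it in."""
    in_window, pre_window = [], []
    for item in items:
        ts = parse_github_ts(item["updated_at"]).timestamp()
        (in_window if ts >= since_epoch else pre_window).append(item)
    return in_window, pre_window
```

With these inputs, the `06:36:23Z` timestamp from the incident prints as `08:36 CEST`, and a standing review request from before the window lands in the pre-window list whatever else moved.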
[2.9.0] - 2026-05-16
Ships the /phx:deps-audit + /phx:deps-vet Hex/Elixir supply-chain
suite. Built across five internal phases and two real-project dogfood
passes (two production apps) and consolidated into a single release —
none of the interim 2.10.0–2.12.0 bumps were ever tagged or shipped
(last release was v2.8.8).
Added
- /phx:deps-audit — Hex supply-chain audit. 8-rule MVP catalogue per dependency tarball: bidi Unicode (Trojan Source, CVE-2021-42574), Code.eval_* / :erlang.apply at module scope, compile-time System.cmd / :os.cmd / Port.open, :erlang.binary_to_term/1 without :safe, new :git / :path deps, maintainer rotation (Hex API), large base64 blobs, and Levenshtein typosquats (≤2 + 1000× download delta). Modes: B (working vs HEAD, default), C (vs --base <ref>), A (--preview, locked vs Hex latest). Wraps mix hex.audit (always), mix_audit (GHSA) and osv-scanner (OSV.dev) when present; never auto-installs. Output: markdown triage table + JSON sidecar (.claude/deps-audit/last-run.json) + SARIF 2.1.0 (--sarif). Eval composite 1.000.
- Differential CVE pass. scripts/diff_cves.py runs mix_audit against the OLD and NEW mix.lock (in a tmpdir copy; it never mutates the real lock) and reports patched / introduced / still_exposed. The 25-package virgil dogfood surfaced 4 CVEs patched in real time that the old single-state scan missed. Adds a GHSA freshness warning when the advisory cache is >24h old.
- /phx:deps-vet — hex_vet.exs audit ledger. cargo-vet-style trust ledger at project root; :safe_to_deploy / :safe_to_run / :does_not_implement_crypto. Vetted {pkg, version} pairs downgrade audit findings to INFO; on lock-vs-ledger disagreement, the lock wins. --seed imports a curated provenance baseline, --check cross-references mix.lock, --list renders the table.
- PreToolUse gate. Tiered deps-audit-gate.sh on mix deps.{get,update,compile} (Tier 0 lock-SHA cache → Tier 1 bidi + new-dep → Tier 2 full). Policy values: false | :new_only | :strict | :full (:new_only is the default). PHX_SKIP_DEPS_AUDIT=1 escape hatch.
- LLM triage (threshold-gated). hex-deps-triager (sonnet) triages packages scoring >10 into {confidence, verdict, rationale, fp_reasons[]}; context-supervisor (haiku) consolidates so the parent context sees only the verdict. Advisory-only (it affects ordering, not severity).
- Precision layers (optional, soft deps). Semgrep ruleset (priv/semgrep/elixir-supply-chain.yaml) and YARA rules (priv/yara/hex-malware.yar); skipped cleanly if absent.
- CI + lifecycle. --ci non-interactive mode (exit 0/1/2) with GitHub/CircleCI/GitLab/Drone samples. Monthly cassette- and seed-regen workflows with an org-policy 403 artifact fallback. EEF CNA real-CVE corpus (decimal, bandit, phoenix, postgrex, cowlib) + synthetic fixtures; full smoke harness (runner.sh + lib/detectors.sh + fixtures.d/).
- Solutions auto-feed. After a BLOCK finding, prompts (never auto-writes) for /phx:compound; future audits pre-elevate matching snippets from .claude/solutions/supply-chain/.
- Distributed imports v1. imports: allow-list in hex_vet.exs (only the plugin seed in v1), 24h TTL, renderer attribution.
- Routing wired into the /phx:help and /phx:intro cheat sheets; contributor references/skill-checklist.md.
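The typosquat rule in /phx:deps-audit pairs a small edit distance with a large popularity gap. A minimal Python sketch of that idea follows. It reads "≤2 + 1000× download delta" as "Levenshtein distance of at most 2, and the look-alike has at least 1000 times the downloads", which is an interpretation; the function names and the input shape (a name-to-downloads mapping) are hypothetical rather than the audit's real interface.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def typosquat_suspects(dep: str, dep_downloads: int, popular: dict,
                       max_distance: int = 2, ratio: int = 1000) -> list:
    """Popular package names that `dep` may be impersonating.

    `popular` maps package name -> download count (hypothetical input).
    """
    hits = []
    for name, downloads in popular.items():
        if name == dep:
            continue
        close = levenshtein(dep, name) <= max_distance
        much_more_popular = downloads >= ratio * max(dep_downloads, 1)
        if close and much_more_popular:
            hits.append(name)
    return sorted(hits)
```

For example, a hypothetical package named phoenx with 50 downloads is flagged against phoenix, while a package named plugs with tens of millions of downloads is not flagged against plug: the names are close, but the popularity gap is far below 1000×.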
Changed
- Cache architecture: persistent → per-run ephemeral. No .claude/deps-audit/cache/; each run gets a fresh mktemp -d ${AUDIT_TMPDIR} torn down via trap … EXIT. A "no vulnerabilities" verdict now reflects today's Hex + GHSA data, never a stale snapshot. Removed prune_cache() and the planned cache_signature.json. Persistent files retained: last-run.json (gate sidecar) and policy.exs (user-owned). New references/audit-tmpdir.md. Migration: rm -rf .claude/deps-audit/cache/ is safe.
- Default full 8-rule scan with streaming [N/M] pkg ver progress; --quick opts down to CVE + retirement (<10s). Removed the prompt that let users silently skip heuristics.
- Empty hex_vet.exs stub defaults to block_on_unvetted: :new_only.
Fixed
- CRITICAL — skill installed mix_audit when asked, violating the non-mutating contract (it added the dep to mix.exs/mix.lock). Iron Law #2 reworded to "NEVER install … even if asked"; consent-resistant guidance added. A skill that mutates "because the user asked" is, to a security reviewer, indistinguishable from one that mutates on its own.
- CRITICAL — gate policy parser silently downgraded enforcement. It took the first regex match in hex_vet.exs, so a commented example (# block_on_unvetted: false) beat a real :strict setting (fail-open). Now strips comments, takes the last uncommented match (Elixir last-assignment-wins), and warns on multiple keys.
- Phase 5 hardening (security/test review): the CVE-diff harness no longer silently SKIPs fixtures missing setup.sh/expected.txt; _mix_audit_run_with_lock unsets MIX_* env before running (defense-in-depth for Iron Law #2); a failed lock-copy now bubbles return 2 instead of a false-green "no vulnerabilities".
- Reference robustness (2026-05-16 virgil dogfood): cross-tool-call ${AUDIT_TMPDIR} handoff (export/functions/trap EXIT do not survive separate Bash tool calls) + the quoted-heredoc trap; tarball-fetcher baked fetch.sh (zsh has no export -f); python3 + urllib is now the canonical Hex API client (curl hit "Malformed input to a URL function"); mandatory 2>/dev/null on Code.eval_file("mix.lock"); the expected mix deps.audit recompile is documented. deps-vet: Iron Law #6 — confirmation counts are computed, not estimated (--seed showed 26/4 vs the real 23/7); dropped the false "top-100" seed label (~30 entries) and reframed it as a pinned provenance baseline, not current-lock certification.
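The parser fix in the second CRITICAL entry (strip comments, take the last uncommented match, warn on repeats) can be sketched in Python as below. This is an illustration, not the gate's shell implementation: the regex only recognises false or a bare atom after block_on_unvetted:, and comment stripping naively cuts at the first # on each line, so a # inside a string would confuse it.

```python
import re
import warnings

# Matches `block_on_unvetted: <value>` where value is `false` or an atom like `:strict`.
POLICY_RE = re.compile(r"block_on_unvetted:\s*(false|:[a-z_]+)")


def strip_comments(source: str) -> str:
    """Drop everything after the first '#' on each line (naive: ignores strings)."""
    return "\n".join(line.split("#", 1)[0] for line in source.splitlines())


def parse_policy(source: str, default: str = ":new_only") -> str:
    """Return the effective policy: last uncommented assignment wins."""
    matches = POLICY_RE.findall(strip_comments(source))
    if not matches:
        return default
    if len(matches) > 1:
        warnings.warn(f"block_on_unvetted set {len(matches)} times; using the last one")
    return matches[-1]
```

Taking the first match on the raw text is exactly what let a commented-out false win; stripping comments first and reading from the end makes the parser fail closed on that case.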
Out of scope (deferred)
- Companion phx_deps_vet Hex package — follow-up (separate repo).
- Multi-org distributed audit imports — after the single-import trust chain proves out.
- OTP-level CVE detection (SSH, inets, public_key) — needs an OTP version layer separate from Hex packages.
- Auto-refresh of the GHSA cache via PreToolUse hook.
- Regenerate hex_vet_seed.exs against current top-package versions (the bundled seed is Phoenix-1.7-era) — via the monthly seed-regen.yml CI or a dedicated reviewed pass.
[2.8.9] - 2026-05-08
Changed
- Skill descriptions tightened to reduce routing false positives:
  - audit — removed "security" from the listed scope (the security skill owns that signal); cleaner separation from focused security/boundaries asks.
  - assigns-audit — leads with the verb "Inspect" instead of "Audit" to disambiguate from the audit skill. Trigger accuracy 0.80 → 0.90.
  - challenge — added "OTP designs" to scope so OTP supervision tree challenges route correctly. Trigger accuracy 0.80 → 1.00.
  - document — clarified scope to @doc/@moduledoc only, not README or external docs. Trigger accuracy 0.80 → 1.00.
  - liveview-patterns — trigger prompts tightened with explicit "LiveView" / "phx-" markers. Trigger accuracy 0.625 → 0.75.
  - n1-check — added an explicit "NOT for unrelated Ecto questions or wider database performance" guard. Trigger accuracy 0.70 → 0.90.
- help Iron Law #5 — capitalized "NEVER block" / "DO NOT redirect" for the eval framework's safety matcher.
Fixed
- Eval set contamination — stripped 209 routing hint annotations (em-dash separators, arrows, parentheticals) from 38 of 42 trigger test files. Per Oren et al. (ICLR 2024) "Proving Test Set Contamination in Black Box Language Models," these annotations leaked the correct routing decision to haiku inside the test prompt itself, inflating behavioral scores by rewarding hint-following over real routing competence. Average accuracy held at 91% post-strip; composition shifted to honest baseline. Contributor-only — no user impact except cleaner future tournament inputs.
- README references — wrapped a 270-char attribution line that was breaking make eval-all lint.
Added (contributor)
- lab/eval/triggers/strip_hints.py — re-runnable script that strips hint annotations from trigger files. Idempotent, supports --dry-run and --stats. Regex tightened vs unmerged PR #24's original: it requires leading whitespace before separators (preserving inline em-dash punctuation) and matches only the rightmost annotation per pass (safer on prompts with multiple separator layers).
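The two regex constraints described for strip_hints.py can be sketched in Python as follows. The separator set (em dash, right arrow, ASCII ->) and the pattern itself are assumptions for illustration, not the script's actual regex, and parenthetical annotations are not handled here.

```python
import re

# Assumed separator set: em dash (U+2014), right arrow (U+2192), ASCII "->".
SEPARATORS = ["\u2014", "\u2192", "->"]
_SEP = "|".join(re.escape(s) for s in SEPARATORS)

# 1. `\s+` before the separator: inline punctuation with no leading
#    whitespace (e.g. a word joined by an em dash) is left alone.
# 2. The tail may not contain another whitespace-led separator, so only
#    the rightmost annotation can reach the `$` anchor.
HINT_RE = re.compile(rf"\s+(?:{_SEP})(?:(?!\s+(?:{_SEP})).)*$")


def strip_one(prompt: str) -> str:
    """Remove the rightmost hint annotation, if any (one pass)."""
    return HINT_RE.sub("", prompt, count=1)
```

Because each call removes only the last annotation, a prompt with two annotation layers keeps its inner one until a second pass, which is the conservative behaviour the entry describes.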
[2.8.8] - 2026-05-08
Added
- /phx:mix-compression skill (issue #40, Angle 1) — installs rtk filters that compress mix test / credo / dialyzer / compile / deps.get / ecto.migrate output before it reaches the transcript. The bundled references/rtk-filters.toml is the battle-tested filter set with embedded test fixtures: it short-circuits happy paths to one-liners (mix test: all pass, mix credo: clean) while preserving compile errors, test failures, and stack traces. Critical signals (** (CompileError), == Compilation error in, FAILURES, dialyzer warnings, file:line frames) are never stripped. Expected gain on mix-heavy sessions: 5-15% per-session token reduction. The skill walks through detection (which rtk), install (Homebrew + rtk init zsh shell hook), seeding .rtk/filters.toml, and verification via rtk test mix-test. Pointer added to the /phx:permissions "Related" section. Architecture note in the skill body: this lives in a skill rather than a PostToolUse hook because rtk's subprocess wrapping is the correct architectural layer; hook output cannot retroactively shrink what's already in the transcript.
- Retention@K convergence metric (issue #40, Angle 3) — new lab/autoresearch/retention.py module + retention CLI subcommand on run-iteration.py. Computes the overlap of top-K skills (by trigger accuracy) between consecutive iterations and appends it to lab/autoresearch/retention.jsonl (gitignored, append-only ledger). Defaults match the TACO paper (arXiv 2604.19572): K=30, threshold=0.9, streak=2. A new target --check-retention flag short-circuits to retention_converged when the top-K ranking has stabilized for two consecutive iterations, so autoresearch can stop running mutations once the skill pool stops reshuffling. The pure-function core (retention_at_k, compute_topk_by_trigger, is_converged) is testable without any I/O fixtures. Dev tooling only; zero impact on plugin users.
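Retention@K as described (top-K overlap between consecutive iterations, converged after two consecutive values at or above 0.9) can be sketched as pure functions. Two names echo the ones listed in the entry, but the bodies, the tie-breaking rule, and the denominator used when fewer than K skills exist are guesses, not the contents of lab/autoresearch/retention.py.

```python
def top_k(scores: dict, k: int) -> list:
    """Skill names ranked by trigger accuracy (descending), ties broken by name."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def retention_at_k(prev: list, curr: list, k: int = 30) -> float:
    """Share of the previous top-K that survives into the current top-K."""
    denom = min(k, len(prev), len(curr))  # assumption: shrink K for small pools
    if denom == 0:
        return 0.0
    return len(set(prev[:k]) & set(curr[:k])) / denom


def is_converged(history: list, threshold: float = 0.9, streak: int = 2) -> bool:
    """True once the last `streak` retention values all reach `threshold`."""
    if streak < 1:
        raise ValueError("streak must be at least 1")
    recent = history[-streak:]
    return len(recent) == streak and all(r >= threshold for r in recent)
```

Keeping the ranking, the overlap, and the stopping rule as separate pure functions is what allows them to be tested on plain dicts and lists, without fixtures.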
Notes
- Issue #40 Angle 2 deferred — evolving compound-docs into a rule pool depends on Angle 1 telemetry justifying the investment. With rtk carrying compression at the subprocess layer (and CC v2.1.121's hookSpecificOutput.updatedToolOutput opening a future hook path), there's no urgency to build a second compression layer in the plugin. Re-evaluate after a quarter of dogfooding rtk + Retention@K.
[2.8.7] - 2026-05-08
Changed
- Iron Law #1 — SEO/dead-render exception (issue #44) — iron-law-judge now uses 4-state detection instead of a binary CRITICAL on any Repo.* in mount. Cache-backed disconnected branches (Cache.*, :persistent_term, ETS) are recognised as the canonical SEO/dead-render pattern and pass cleanly. Uncached Repo.* in the disconnected branch downgrades from BLOCKER to SUGGESTION with an "if SEO, prefer cache-backed" hint. Updated wording in the liveview-patterns SKILL, the async-streams.md reference (new "SEO Dead-Render Pattern" section), liveview-architect, root CLAUDE.md, inject-iron-laws.sh, and intro/tutorial-content.md so all surfaces stay coherent. Verified end-to-end: 6 synthetic LiveView fixtures classified correctly (1 CRITICAL, 4 CLEAN incl. cache-backed/persistent_term, 1 SUGGESTION). Resolves the false positive flagged by @javiercr.
[2.8.6] - 2026-04-28
Changed
- CC changelog audit — bumped the tracked Claude Code version to v2.1.121 (.claude/cc-changelog/last-checked-version.txt) and refreshed the audit notes in memory/reference_cc_source_internals.md. No BREAKING or DEPRECATION items affecting the plugin. Highlights for plugin authors: PostToolUse hookSpecificOutput.updatedToolOutput now works for all tools (previously MCP-only), which opens the door for format-elixir.sh / error-critic.sh to rewrite mix output instead of only appending hints; --dangerously-skip-permissions no longer prompts on writes to .claude/skills/ | agents/ | commands/, which directly unblocks autoresearch and skill-creator loops; CLAUDE_CODE_FORK_SUBAGENT=1 now works in non-interactive claude -p sessions, enabling forked subagents in lab/eval/ and lab/autoresearch/ scripts; ${CLAUDE_EFFORT} is now substituted inside skill content, opening up effort-driven skill branches that align with the plugin's existing effort: frontmatter convention; claude ultrareview [target] now exists as a non-interactive subcommand with --json output. Reliability fixes worth noting: MCP servers now auto-retry 3× on transient startup errors, the Esc-during-stdio-MCP regression from 2.1.105 is fixed, several memory leaks are closed, and --resume now skips corrupted transcript lines instead of crashing (relevant to session-scan / session-deep-dive). Adding $schema to plugin.json / marketplace.json is now supported by claude plugin validate but is deferred until the canonical schema URL is published.
[2.8.5] - 2026-04-27
Changed
- CC changelog audit — bumped the tracked Claude Code version to v2.1.119 (.claude/cc-changelog/last-checked-version.txt) and refreshed the audit notes in memory/reference_cc_source_internals.md. Highlights for plugin authors: PostToolUse / PostToolUseFailure hook inputs now include duration_ms; async PostToolUse hooks emitting no response no longer write empty session-transcript entries (our log-progress.sh benefits silently; no code change needed); skills invoked before auto-compaction no longer re-execute against the next user message; --print mode and --agent <name> now honor agent tools: / disallowedTools: and permissionMode: for built-in agents (relevant for future headless agent runs in lab/eval/). No BREAKING or DEPRECATION items affecting plugin code.
[2.8.4] - 2026-04-24
Added
- /narrow-bare-rescue skill — new user-invocable skill for auditing bare rescue _ -> / rescue e -> clauses in Elixir and narrowing them to explicit exception-type lists, so programmer bugs (UndefinedFunctionError, KeyError, typos) propagate instead of being silently swallowed. Motivated by the Erlang Secure Coding Guide rule LNG-002 ("Do Not Use catch"). Ships with:
  - SKILL.md — Iron Laws (5 rules) + 4-step workflow (find → taxonomy lookup → apply → verify)
  - references/taxonomy.md — verified exception sets for 16 work categories (JSON, Ecto + Postgres, Money/Decimal, File I/O, Req, ExAws, ExCmd, Regex, atoms-from-strings, Phoenix forms, Plug, Phoenix LiveView HEEx/MDEx, NimbleCSV, DOCX/PDF extraction, explicit raise, plus a "programmer-bug exceptions to EXCLUDE" table). Validated against Elixir 1.19 / OTP 28.
  - references/patterns.md — special patterns: is_exception/1 replacement, Oban "log and reraise" (with __STACKTRACE__), ExCmd's ExCmd.Stream.AbnormalExit, module-attribute hoisting for ≥3 rescues sharing a taxonomy, partitioning ≥50-site cleanups into per-directory PR clusters, and the regression-prevention Credo check pattern.
  - lab/eval/triggers/narrow-bare-rescue.json — 10-prompt trigger test set.
- Invocation: /narrow-bare-rescue [file_path | directory | --all].
- Eval: composite score 0.968 (structural), 80% trigger accuracy, 100% trigger precision.
[2.8.3] - 2026-04-23
Added
- /phx:review cross-checks implementation against requirements (requested by Thiago Ferrari Pimentel on Slack, 2026-04-23). The review now emits a ## Requirements Coverage table with columns # | Requirement | Status | Evidence, classifying each stated requirement as MET / PARTIAL / UNMET / UNCLEAR. This formalizes the cross-check pattern already done manually in session ba3f7890 (2026-04-17, a production repo), where the table was titled "Cross-check against Linear PROJ-8931 acceptance criteria".
- Auto-detection of the requirements source (no argument required). /phx:review now tries, in priority order:
  - Explicit $ARGUMENTS (path to a .md file, PROJ-8931, or #42)
  - Conversation context (recent mcp__linear__get_issue / gh issue view results are reused; no re-fetch)
  - Git branch regex ([A-Za-z][A-Za-z0-9_]+-\d+, matching branches like proj-8278-extraction-scaffolding)
  - Commit subjects since main ([A-Z]+-\d+ or #\d+)
  - Most recently modified .claude/plans/*/plan.md (extracts only - [x] completed items)
  - None → emits NOT AVAILABLE with the sources tried (never silent).
- New requirements-verifier agent (sonnet, read-only, omitClaudeMd). Extracts requirements from the source, Greps the diff for evidence, and classifies each item. Spawned in parallel with the other review agents when a source is detected.
- New usage: /phx:review PROJ-8931, /phx:review #42, /phx:review --no-requirements.
- New reference: skills/review/references/requirements-detection.md documents sources, regexes, fetch commands, and failure handling.
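The branch and commit-subject regexes from the auto-detection list can be exercised with a short Python sketch of the fallback chain. It covers only three of the six sources (explicit argument, branch name, commit subjects); conversation context and plan files are omitted, and detect_source is a hypothetical helper rather than the skill's implementation.

```python
import re

BRANCH_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]+-\d+")
COMMIT_RE = re.compile(r"[A-Z]+-\d+|#\d+")


def from_branch(branch: str):
    """Ticket-like prefix of a branch name, or None."""
    match = BRANCH_RE.search(branch)
    return match.group(0) if match else None


def from_commits(subjects: list):
    """First ticket ID or issue number found in commit subjects since main."""
    for subject in subjects:
        match = COMMIT_RE.search(subject)
        if match:
            return match.group(0)
    return None


def detect_source(arguments: str, branch: str, subjects: list) -> str:
    """Priority: explicit argument, then branch name, then commit subjects."""
    if arguments.strip():
        return arguments.strip()
    return from_branch(branch) or from_commits(subjects) or "NOT AVAILABLE"
```

Note that the branch regex is case-insensitive on the prefix (proj-8278 matches), while the commit-subject regex only accepts upper-case ticket keys or #-prefixed issue numbers.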
Changed
- /phx:review verdict now considers Requirements Coverage: any UNMET escalates to REQUIRES CHANGES; PARTIAL downgrades PASS → PASS WITH WARNINGS. BLOCKED (Iron Law violations) still takes precedence.
- The review template places the coverage block before per-severity findings, since "did we deliver what we promised" is the user's first question.
Fixed
- log-progress.sh wrote entries to the wrong plan (issue #38, bigardone). The hook picked the most recently modified progress.md across ALL plans via ls -t | head -1, so with more than one plan in .claude/plans/ the [HH:MM] Modified: <file> lines landed in whichever plan had been touched last, often a completed plan rather than the plan /phx:work was actually running. The bug had existed since the init commit (2026-02-13) and surfaced once users accumulated parallel plans. The progress-file branch is removed entirely: the /phx:work skill already logs structured progress entries itself, so the hook-driven append was both redundant and structurally unsound (there is no reliable way to identify the active plan from inside a PostToolUse hook). The cross-project JSONL metrics branch is unchanged.
[2.8.2] - 2026-04-17
Changed
- Tournament-refined skill descriptions for plan, liveview-patterns, and intent-detection. Rewritten using concrete use-case phrases (billing, RBAC, Presence, assign_async, streams) instead of technical vocabulary. Matches the "users describe features, not mechanics" routing pattern observed in session analysis. Output from the first tournament run on skills with <75% trigger accuracy.
- Reframed the 250-char skill description target as plugin listing-budget discipline — CC raised MAX_LISTING_DESC_CHARS from 250 to 1,536 in v2.1.105, but our target stays at 250. The rationale is no longer "CC hard cap"; it's "~8K skill-listing budget divided across ~40 skills ≈ 200 chars per description". Longer descriptions would crowd out other skills in the listing and hurt routing accuracy across the whole plugin. Updated CLAUDE.md, lab/eval/matchers.py, lab/eval/scorer.py, lab/eval/generate_evals.py, lab/eval/evals/_template.json, lab/tournament/config.yaml, lab/tournament/prompts.py, and .claude/skills/cc-changelog/references/analysis-rules.md. Eval threshold unchanged (still 250), so no skill scores should move.
- /phx:intro tutorial gains a "Playing Nicely With Claude Code Built-Ins" subsection covering auto mode + xhigh effort (Opus 4.7), /focus, the recap feature, and /less-permission-prompts (all new in CC v2.1.108–2.1.111). The plugin's workflow commands pair with these rather than replace them.
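The listing-budget arithmetic behind the 250-char target is simple enough to check in code: about 8,000 characters shared by about 40 skills leaves roughly 200 per description. The sketch below uses those approximate figures and a hypothetical over_limit helper; it is not the eval matcher's code.

```python
LISTING_BUDGET = 8_000  # approximate ("~8K skill-listing budget")
SKILL_COUNT = 40        # approximate ("~40 skills")
DESC_LIMIT = 250        # the plugin's own target, not a Claude Code cap


def per_skill_share(budget: int = LISTING_BUDGET, skills: int = SKILL_COUNT) -> float:
    """Characters each description gets if the budget is split evenly."""
    return budget / skills


def over_limit(descriptions: dict, limit: int = DESC_LIMIT) -> list:
    """Names of skills whose description exceeds the target length."""
    return sorted(name for name, text in descriptions.items() if len(text) > limit)
```

The 250 ceiling sits slightly above the even 200-character share, so a few longer descriptions fit as long as most stay shorter.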
Internal (contributor tooling — not distributed)
- New lab/tournament/ module — pairwise LLM-judge tournaments on skill description variants, using held-out trigger prompts to pick winners once structural eval is saturated (composite = 1.000) but trigger accuracy lags (<75%). Includes config, prompts, LLM adapter, tournament core, and a pytest suite.
- make eval-tournament target + cached trigger-accuracy gate in lab/eval/run_eval.sh (reads triggers/results/ JSON, fails if any skill is <75%, points at make eval-tournament).
- Autoresearch tournament mode — find_weakest tournament mode + a new tournament subcommand that gates on structural 1.000 and journals the result.
- Held-out trigger test split — trigger JSON files gain a should_trigger_test field so tournament rounds judge on prompts the training set hasn't seen.
- Gitignore cleanup — ignore output/, raw/, scripts/imessage-state.json, lab/tournament/results/, .claude/research/. Fixed the .claude/cc-changelog/changelog-cache.md pattern (an inline # comment on the same line made it part of the pattern, so the file was never actually ignored).
[2.8.1] - 2026-04-11
Fixed
- /phx:review now actually writes findings files — Review agents (elixir-reviewer, testing-reviewer, iron-law-judge, security-analyzer, oban-specialist, deployment-validator, verification-runner, parallel-reviewer) previously declared disallowedTools: Write, Edit, NotebookEdit and could not write to disk. The skill told them to write findings to .claude/plans/{slug}/reviews/{agent}.md; the main context fell back to extracting from each agent's return message, producing the visible log line "Agent didn't write the file. Let me read its output to extract findings." Fixed by allowing Write (keeping Edit and NotebookEdit disallowed so source code stays protected), bumping maxTurns from 15 → 25 for the six non-mechanical reviewers (which burned turns on Read/Grep before writing on large diffs), and adding an explicit "write partial findings by turn ~12, refine later" instruction to each agent. Closes #33 — thanks @bigardone for the report.
- /phx:review skill passes an explicit output_file path to every agent — Step 2 now includes a per-agent file mapping (elixir.md, testing.md, iron-laws.md, security.md, oban.md, deploy.md, verification.md) so the orchestrator can read findings deterministically instead of reparsing agent messages.
- /phx:review skill Step 3 logs a scratchpad warning on a missing output file — When an agent completes but its expected findings file is missing (turn exhaustion, error, etc.), the skill now writes a timestamped warning to .claude/plans/{slug}/scratchpad.md and marks the extracted section as ⚠️ EXTRACTED FROM AGENT MESSAGE, making the failure auditable instead of silent.
- parallel-reviewer spawns real specialist agents instead of general-purpose impersonation — It previously used subagent_type: "general-purpose" with "You are acting as the X agent" prompts as a workaround for specialists lacking Write. Now that real reviewers can write, parallel-reviewer uses elixir-phoenix:elixir-reviewer, elixir-phoenix:security-analyzer, elixir-phoenix:testing-reviewer, and elixir-phoenix:verification-runner directly, carrying their domain checklists, skills, and Iron Laws automatically.
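The Step 3 behaviour (detect a missing findings file, log it to the scratchpad) can be sketched in Python. The agent-to-file pairing is inferred from the file names listed in the entry, and the warning line format is made up for illustration; only the directory layout (.claude/plans/{slug}/reviews/ and scratchpad.md) comes from the entry itself.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumed agent -> findings file pairing, inferred from the file names above.
OUTPUT_FILES = {
    "elixir-reviewer": "elixir.md",
    "testing-reviewer": "testing.md",
    "iron-law-judge": "iron-laws.md",
    "security-analyzer": "security.md",
    "oban-specialist": "oban.md",
    "deployment-validator": "deploy.md",
    "verification-runner": "verification.md",
}


def check_findings(plan_dir: Path, agents: list) -> list:
    """Return agents whose findings file is missing, logging each to scratchpad.md."""
    reviews = plan_dir / "reviews"
    missing = [a for a in agents if not (reviews / OUTPUT_FILES[a]).exists()]
    if missing:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        with (plan_dir / "scratchpad.md").open("a", encoding="utf-8") as fh:
            for agent in missing:
                # Hypothetical format; the real skill's wording may differ.
                fh.write(f"- [{stamp}] WARNING: {agent} wrote no "
                         f"reviews/{OUTPUT_FILES[agent]}; findings extracted "
                         "from agent message\n")
    return missing
```

Appending rather than overwriting keeps earlier scratchpad notes intact, so repeated review runs leave an audit trail.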
Changed
- Agent checklist in CLAUDE.md updated to reflect the new convention: review agents declare disallowedTools: Edit, NotebookEdit (not Write, Edit, NotebookEdit). Write is allowed for the agent's own findings file only; Edit blocks source code modification, upholding Review Iron Law #1.
[2.8.0] - 2026-04-03
Added
- /phx:brainstorm — Adaptive requirements gathering — New command skill implementing an interview-research-synthesis loop for ideation before planning. Asks context-aware questions one at a time across 6 dimensions (What, Why, Where, How, Edge cases, Scope), runs lightweight codebase scans between questions, and offers parallel research via a diverge-evaluate-converge pattern. Produces .claude/plans/{slug}/interview.md, which /phx:plan detects and consumes, skipping its own clarification phase. Inspired by Virgil EI, the ALFA framework (2502.14860), MediQ (2406.00922), and the LLM Discussion Framework (2405.06373). Closes #28 — thanks @bigardone for the feature request.
- /phx:plan interview detection — The plan skill now checks for brainstorm interview.md artifacts and skips clarification when one is found with Status: COMPLETE.
- /cc-changelog contributor skill — Automates Claude Code changelog auditing: fetches the CC changelog from GitHub, extracts new entries since the last check using semver comparison, and guides impact analysis against plugin components. Includes a fetch-cc-changelog.sh script with caching and diff support.
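/cc-changelog compares versions semantically rather than as strings, which matters because "2.1.105" sorts before "2.1.99" lexically. A minimal Python sketch, assuming the upstream changelog marks each release with a "## X.Y.Z" heading (an assumption about the format, not something the entry states) and using a hypothetical entries_since helper:

```python
import re

# Assumed heading format for each upstream release, e.g. "## 2.1.121".
VERSION_HEADING = re.compile(r"^##\s+v?(\d+)\.(\d+)\.(\d+)\s*$")


def parse_version(text: str) -> tuple:
    """'v2.1.119' or '2.1.119' -> (2, 1, 119), comparable as integers."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def entries_since(changelog: str, last_checked: str) -> list:
    """Versions in `changelog` strictly newer than `last_checked`, in file order."""
    floor = parse_version(last_checked)
    newer = []
    for line in changelog.splitlines():
        match = VERSION_HEADING.match(line)
        if match:
            version = tuple(int(g) for g in match.groups())
            if version > floor:
                newer.append(".".join(str(n) for n in version))
    return newer
```

Comparing integer tuples gives the numeric ordering that a plain string comparison would get wrong for multi-digit patch numbers.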
Fixed
- xref cycle detection uses --label compile — All 6 locations now use mix xref graph --format cycles --label compile instead of bare --format cycles. Prevents false-positive HIGH-severity findings from benign runtime cycles caused by the verified_routes() macro in standard Phoenix projects. Affected: xref-analyzer agent, boundaries skill, audit scoring, architecture-checks, call-tracing reference. Closes #30 — thanks @bigardone for the excellent bug report.
- 5 brainstorm issues from a real-world session — From the first test session (gettext performance brainstorm): enforce formal Decision Points with mandatory AskUserQuestion, ask Scope within the first 3-4 questions, improve plan handoff UX with an exact copy-paste command, cap the first research cycle at 2 agents (Iron Law #7), and track research iterations with a soft limit after 3 cycles.
Changed
- disableSkillShellExecution resilience — Converted executable bash fenced blocks to inline prose instructions across 18 skills (14 BROKEN, 4 DEGRADED). Skills now instruct Claude via prose ("Run mix compile", "Use Grep to search...") instead of fenced bash blocks that CC may block when disableSkillShellExecution is enabled (CC v2.1.91). Tool-replaceable commands (grep, cat, find, ls) converted to Claude tool references (Grep, Read, Glob). Documentation/example blocks unchanged.
- Removed disableModelInvocation from plan, review, investigate — The flag blocked programmatic Skill() calls during workflow transitions (brainstorm→plan, work→review). Confirmed in 3+ sessions. Kept on brainstorm, research, pr-review, perf, where unwanted auto-loading is a real risk.
[2.7.0] - 2026-04-02
Added
- Comprehensive Oban Pro support — Rewrote oban-pro-basics.md (80→358 lines) with accurate Pro.Worker APIs, args_schema, Workflows, Batches, Chunks, Relay, Smart Engine configuration, and a Pro plugin migration guide.
- Smart Engine gotchas — Documented two production-validated gotchas: one partition limiter per queue constraint, and snooze rolling back the attempt counter (which caused 72k+ orphaned jobs in a real production incident).
- Iron Law #7 (Oban) — "SMART ENGINE: NEVER USE attempt TO LIMIT SNOOZES" added to SKILL.md, the oban-specialist agent, and iron-law-judge detection rule #9b.
- Pro Testing patterns — Added an Oban Pro Testing section to testing-patterns.md with drain_jobs/1, workflow testing, and version-check notes.
- Smart Engine queue config — Added Smart Engine and Pro Plugin Config sections to queue-config.md with global/local/rate limit examples.
Changed
- Replace deprecated TaskOutput with Read — 5 orchestrator agents and 1 skill reference updated to use background agent notification + Read on output files instead of the deprecated TaskOutput tool (removed in CC v2.1.89).
- maxTurns for all 20 agents — Added turn limits to prevent runaway agents: maxTurns: 10 for haiku agents, maxTurns: 15 for sonnet/opus specialists. Previously only 5 orchestrators had limits.
- Conditional skill auto-loading via paths: — 6 reference skills now declare file patterns for automatic loading (CC v2.1.84): liveview-patterns (*_live.ex), ecto-patterns (migrations/*.exs), oban (*_worker.ex), security (*auth*.ex), testing (*_test.exs), deploy (Dockerfile, fly.toml). Addresses the #1 gap from session analysis (zero skill auto-loading in 137 sessions).
- claude plugin validate in CI — Added a make validate target that runs claude plugin validate for frontmatter + hooks.json schema checking.
- Oban skill description — Now mentions both perform/1 (OSS) and process/1 (Pro) for better routing when users work with Oban Pro workers.
- Oban specialist agent — Enhanced Pro-Specific Review checklist with partition constraint checks, snooze pattern detection, and new Pro Red Flags examples.
- Iron law judge — Added detection rule #9b for snooze + attempt guard infinite loop pattern in worker files (CRITICAL severity, DEFINITE confidence).
[2.6.1] - 2026-04-01
Added
- Structured scratchpad — check-scratchpad.sh auto-initializes a template with Dead Ends, Decisions, Open Questions, and Handoff sections, and highlights the dead-end count on session resume. precompact-rules.sh injects Dead Ends into the compaction context.
- Source quality tiers in web-researcher — T1-T5 tier classification for research output. Every source is tagged with a quality tier, and the synthesis notes source reliability.
Changed
- Hook if conditions — PostToolUse hooks now use declarative if filters (e.g., "if": "Edit(*.ex)") to skip non-Elixir files without spawning a shell. Split the single Edit|Write matcher into three targeted groups (Edit, Write, Edit|Write). PostToolUseFailure hooks use "if": "Bash(*mix*)" to only fire on mix failures.
- Async SessionStart hooks — detect-tidewave.sh and check-branch-freshness.sh now run with async: true, reducing session start time by up to 32 seconds.
- Skill descriptions optimized — Rewrote 32 skill descriptions to fit within Claude Code's internal 250-character listing budget (80% were previously truncated).
- Read-only agents get omitClaudeMd: true — 16 of 20 agents that can't modify code now skip CLAUDE.md loading, reducing subagent context overhead.
Fixed
- Stale command references: removed /phx:autoresearch from help/intro, fixed /phx:learn → /phx:learn-from-fix across 9 files.
Removed
- verify-elixir.sh — Dead hook (it was an exit 0 no-op). Compilation verification runs in /phx:work phase checkpoints.
[2.6.0] - 2026-03-27
Added
- /phx:help command — Interactive command advisor that recommends the right /phx: command based on the user's description or ambient context (git status, plans)
- /phx:permissions skill — Analyzes recent sessions, classifies Bash commands by risk (GREEN/YELLOW/RED), recommends safe additions to settings.json
- /phx:verify project-aware discovery — Reads mix.exs to detect installed tools (credo, dialyxir, sobelow, ex_check) and adapts the verification sequence. Uses composite aliases (mix ci, mix precommit) when available and falls back to individual steps if an alias fails locally
- 8-dimension eval framework (lab/eval/) — Deterministic scoring for skills (completeness, accuracy, conciseness, triggering, safety, clarity, specificity, behavioral) and agents (completeness, accuracy, conciseness, safety, consistency). 24 Python matchers, per-skill eval definitions for all 40 skills + 20 agents
- Behavioral trigger eval — Haiku-based trigger accuracy testing (8 prompts per skill). Measures whether Claude routes user requests to the correct skill. Cost: ~$1.50 per full sweep. Baseline: 84% average accuracy
- Autoresearch loop (lab/autoresearch/) — Self-improving skill that proposes mutations, evaluates them, and keeps/reverts via git. Wrapper script (run-iteration.py), structural checks (checks.sh), JSONL journal with ASI failure metadata, ideas backlog. Proven: 20+ iterations, 100% win rate
- Agent eval (lab/eval/agent_scorer.py) — 5-dimension scoring for all 20 agents. Checks tools validity, read-only enforcement, bypassPermissions, model/effort consistency. All 20 agents at perfect score
- CI Quality Gate — 5-job pipeline: markdown/YAML/JSON lint, Python lint (ruff), shell lint (shellcheck), security audits (npm audit, pip-audit), skill+agent eval. 52 pytest tests for the eval framework
- Makefile — Primary command interface: make eval, make test, make ci, make eval-fix (auto-fix + suggest autoresearch). Language-agnostic entry point
- plugin-dev-workflow local skill — Auto-triggers when editing plugin files. Guides contributors through eval commands, CLI syntax, pre-commit checklist
- Interesting findings log — lab/findings/interesting.jsonl captures metrics, research insights, bugs, and patterns during development. 45+ entries
- Dependabot for pip ecosystem + requirements.txt (PyYAML, pytest)
- Staged evaluation (from Hyperagents paper) — the /phx:autoresearch loop runs cheap checks first (compile 5s) and skips expensive checks (test 30s+) if the cheap ones fail
Changed
- 36 of 40 skill descriptions rewritten — Added "Use when..." clauses per Anthropic trigger optimization guide. Domain keywords added, vague words removed. Behavioral sweep improved plan (0%→100% recall), quick (0%→100%), boundaries, document, liveview-patterns, pr-review, security
- Iron Laws added to 6 skills missing them (hexdocs-fetcher, learn-from-fix, quick, init, boundaries, verify)
- Stale references fixed — /phx:learn → /phx:learn-from-fix across 3 skills. YAML frontmatter fixed in perf and permissions (unquoted brackets)
- Review Step 2 compressed from 49 to 37 lines
- Planning orchestrator — Research cache reuse expanded with glob discovery, keyword grep, freshness gate (48h), agent skip mapping
- deep-bug-investigator — effort: high → medium (matches sonnet model)
- no_dangerous_patterns matcher — Skips Iron Laws, Red Flags, Detection, Checklist, Confidence Levels sections (false positive fixes for anti-pattern docs)
- README — Updated counts (40 skills, 20 agents), added contributing guide with eval commands, roadmap section
- Permissions output format — Fixed deprecated Bash(name:*) → Bash(name *) per Claude Code docs
Fixed
- /phx:verify alias fallback — Discovery now validates aliases against mix.lock before using them. Falls back to individual steps if a composite command fails (e.g., mix check when ex_check is not installed locally)
- setup-dirs.sh — Added .claude/research/ to SessionStart directory creation
- learn-from-fix name mismatch — Frontmatter corrected to match directory
- CI yamllint — Ignores node_modules/ and .claude/ directories
- CI ruff — Ignores E402 (imports after sys.path.insert are intentional)
- Unused Python imports — Cleaned across agent_scorer, generate_evals, matchers
[2.5.0] - 2026-03-21
Added
- effort frontmatter on all 38 skills — Skills now declare an effort level (low/medium/high) per Claude Code v2.1.80. Mechanical skills (verify, quick, compound, brief) use low; reference skills (ecto-patterns, security) use medium; complex reasoning skills (plan, full, investigate, review) use high. Reduces token usage on simple tasks while preserving quality on complex ones
- effort frontmatter on all 20 agents — Agents declare effort matching their cognitive load. Haiku agents (context-supervisor, verification-runner, web-researcher, xref-analyzer) use low; sonnet specialists use medium; opus orchestrators and security-analyzer use high
- PostCompact hook (postcompact-verify.sh) — Verifies active plan state survived context compaction. Warns Claude to re-read plan and scratchpad files when unchecked tasks are detected post-compaction (Claude Code v2.1.76)
- StopFailure hook (stop-failure-log.sh) — Logs API failures to the plan scratchpad for resume detection. The next session's check-resume hook picks up the failure context and suggests /phx:work --continue (Claude Code v2.1.78)
- Plugin settings.json — Ships recommended defaults: effort: medium, showTurnDuration: true. Users inherit these unless overridden in their own settings (Claude Code v2.1.49)
- ${CLAUDE_PLUGIN_DATA} persistent storage — setup-dirs creates ${CLAUDE_PLUGIN_DATA}/skill-metrics/ for cross-project metrics that survive plugin updates. log-progress writes edit events as JSONL for cross-project aggregation (Claude Code v2.1.78)
- ${CLAUDE_SKILL_DIR} variable in 30 skills — Reference file paths now use ${CLAUDE_SKILL_DIR}/references/ instead of bare references/, making paths explicit and reliable across plugin cache locations (Claude Code v2.1.71)
Changed
- hooks.json — Added PostCompact and StopFailure hook events (now 9 hook types total, up from 7)
- setup-dirs.sh — Creates the persistent plugin data directory when ${CLAUDE_PLUGIN_DATA} is available
- log-progress.sh — Writes cross-project edit metrics as JSONL to the persistent plugin data directory
- /phx:permissions skill — Analyzes recent Claude Code sessions to identify frequently approved Bash commands, classifies them by risk (GREEN/YELLOW/RED), and recommends safe additions to settings.json. Inspired by Intercom's permission analyzer pattern. Includes 4 Iron Laws, --days and --dry-run flags, and reference docs for risk classification and settings format
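The classify-then-recommend step can be pictured with a small Python sketch. The prefix lists, the two-token grouping, and the min_count threshold are assumptions for illustration; the skill's actual risk rules live in its reference docs. Suggestions use the Bash(prefix *) rule syntax.

```python
from collections import Counter

# Illustrative risk buckets keyed by command prefix (hypothetical lists).
RED_PREFIXES = ("rm ", "git push --force", "mix ecto.drop", "mix ecto.reset")
GREEN_PREFIXES = ("mix compile", "mix test", "mix format", "git status", "git diff", "git log")


def classify(command: str) -> str:
    """Bucket a command as RED, GREEN, or YELLOW (unknown, needs a human)."""
    if command.startswith(RED_PREFIXES):
        return "RED"
    if command.startswith(GREEN_PREFIXES):
        return "GREEN"
    return "YELLOW"


def recommend(approved: list[str], min_count: int = 3) -> list[str]:
    """Suggest allow rules for GREEN commands approved at least min_count times."""
    prefixes = Counter(
        " ".join(cmd.split()[:2]) for cmd in approved if classify(cmd) == "GREEN"
    )
    return sorted(f"Bash({p} *)" for p, n in prefixes.items() if n >= min_count)
```

Only GREEN commands are ever turned into rules. YELLOW and RED commands stay behind an interactive approval no matter how often they were approved.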
[2.4.0] - 2026-03-19
Fixed
- Document: no-op pre-check — /phx:document now checks git diff for new .ex files before running the full audit. Prevents 35-message analysis sessions that conclude "PASS — nothing needed" (session bb0a0454)
- Challenge: dedup enforcement — Strengthened deduplication of prior findings to prevent the "3 challenges to clear" problem, where the same critical issues reappear across consecutive runs. Dedup is now MANDATORY, with an explicit SKIP for fixed issues and one-line PERSISTENT mentions
- Investigate: no confirmatory subagents — Added a rule against spawning parallel subagents when the root cause has already been identified in the main context (~80K tokens wasted in session c135330a)
- Audit: lean agent output — Added an output-efficiency rule to audit subagent prompts (report only issues, not clean checks)
- Full: Stronger no-narration enforcement — Post-PR validation (19 sessions, 5 days) showed that 30% of messages still had a "Let me now..." preamble. Upgraded from a soft suggestion to a HARD rule with explicit prohibited phrases and a self-correction instruction
- Review agents: Verify before claiming — Added a mandatory rule to elixir-reviewer and oban-specialist: never claim library behavior without first checking the source or docs. Prevents incorrect BLOCKER findings that inject wrong code (confirmed: in session f0242cf5, two agents independently made a wrong Oban Pro snooze claim, causing a revert and a user correction cycle)
Changed
- Review: Conditional agent spawning — iron-law-judge is now skipped when PostToolUse hooks have already verified all files; verification-runner is skipped when the work phase passed all tests. Saves 80-150K tokens per review (validated across 56 sessions: iron-law-judge used 78K tokens to find zero violations in R3 /phx:full; verification-runner was always redundant)
- Review: Lightweight path — For diffs under 200 changed lines, spawn only elixir-reviewer + security-analyzer. Saves 30-50K tokens per small review
- Review: Diff-scoped agents — All review agents now receive the git diff --name-only output with an instruction to focus on NEW code only. Pre-existing issues get one-line mentions. Eliminates 25-50% of false positives caused by flagging pre-existing code
- Iron-law-judge: Violations only — Removed the "Clean Checks" output section (62% of output, ~2,800 words of "checked and it's fine"). Now outputs only violations, plus one summary line for clean checks
- All review agents: No praise sections — Removed "What's Good" from elixir-reviewer, "Good Practices Observed" from testing-reviewer, and "N/A" category listings from security-analyzer. These consumed 16-56% of output tokens for zero actionable value
- Context-supervisor now mandatory for 4+ agents — Previously optional (and never used in any of 6 analyzed review sessions), now required. Prevents 12-20K tokens of raw agent output from flooding the parent context
- Plan: Skip research from review — New Iron Law #7: when planning from review/investigation output, skip research agents. The findings ARE the research. (56-session analysis: same finding discovered 3-4x across review→investigate→plan, wasting ~96K tokens)
- Work: Scoped verification — Per-task: compile only (format handled by hook). Per-phase: compile + scoped tests. Full suite only at final gate. Eliminates 40-50% of redundant verification runs
- Full: Lean review + no narration — Added Iron Laws #6 (skip redundant review agents) and #7 (no narration in autonomous mode). Execute tool calls directly without "Let me now..." preamble
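The spawning rules above can be written as a pure function. The ReviewContext fields and the five-agent pool are hypothetical stand-ins. In the plugin, the review skill's prompt makes this decision, not code.

```python
from dataclasses import dataclass

# Hypothetical full pool used for larger diffs.
FULL_POOL = [
    "elixir-reviewer",
    "security-analyzer",
    "testing-reviewer",
    "iron-law-judge",
    "verification-runner",
]


@dataclass
class ReviewContext:
    changed_lines: int
    hooks_verified_all_files: bool
    work_phase_tests_passed: bool


def select_reviewers(ctx: ReviewContext) -> list[str]:
    """Apply the lightweight-path and conditional-spawning rules."""
    if ctx.changed_lines < 200:
        return ["elixir-reviewer", "security-analyzer"]
    agents = list(FULL_POOL)
    if ctx.hooks_verified_all_files:
        agents.remove("iron-law-judge")
    if ctx.work_phase_tests_passed:
        agents.remove("verification-runner")
    return agents


def needs_context_supervisor(agents: list[str]) -> bool:
    """The context-supervisor becomes mandatory once 4 or more agents run."""
    return len(agents) >= 4
```

With both skip conditions met, a large review drops to three agents and no longer needs the supervisor.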
Added
- Skill eval framework (evals/) — 3-phase automated testing for plugin skills, with structural assertions (16 matcher types, zero API cost) and behavioral tests (LLM-as-judge with synthetic Phoenix scenarios)
- /eval command skill — Run structural, behavioral, A/B, and regression evals from Claude Code sessions
- 4 synthetic test scenarios — acme_shop (18 files, 4 bugs), demo_blog (10 files, 2 bugs), sample_crm (25 files, 3 bugs), tiny_api (6 files, greenfield)
- 9 structural assertion specs — compound, plan, review, work, verify, quick, ecto-patterns, liveview-patterns, security
- 5 behavioral specs — plan, review, investigate, compound, work
- eval-judge agent — Sonnet-based read-only judge for behavioral scoring
- Eval suite orchestrator (run_suite.py) — baseline management, regression detection, A/B comparison, trend tracking
- npm scripts: eval:structural, eval:structural:changed, eval:full
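Regression detection against a stored baseline reduces to a per-skill comparison. Here is a minimal sketch, assuming scores are normalized to 0..1 and a fixed tolerance is used. run_suite.py's actual baseline format and thresholds are not shown in this changelog.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Return {skill: (old, new)} for scores that dropped by more than tolerance."""
    regressions = {}
    for skill, old in baseline.items():
        new = current.get(skill)
        if new is None:
            continue  # skill was not evaluated in this run
        if old - new > tolerance:
            regressions[skill] = (old, new)
    return regressions
```

A small tolerance absorbs run-to-run noise from LLM-judged scores, so only meaningful drops get flagged.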
[2.3.1] - 2026-03-12
Changed
- Skill descriptions: full optimization pass — Applied Skill Creator methodology (trigger eval queries + train/test optimization) to all 12 auto-triggered reference skills. Average triggering accuracy improved from 15.0/20 to 19.3/20 (+29%). Key techniques: replaced generic terms with specific API/file keywords, added negative boundaries to prevent skill overlap, used user vocabulary instead of meta-language. Biggest wins: intent-detection (+10), assigns-audit (+7), oban (+6), elixir-idioms (+5)
[2.3.0] - 2026-03-11
Added
- Iron Law #22 — VERIFY BEFORE CLAIMING DONE: never say "should work" without running mix compile && mix test (inspired by the Superpowers plugin)
- PreToolUse block-dangerous-ops.sh hook — blocks mix ecto.reset/drop, git push --force, and MIX_ENV=prod commands before execution
- PostToolUse debug-statement-warning.sh hook — warns about IO.inspect, dbg(), and IO.puts left in production .ex files
- Review conventions system (references/conventions.md) — after a review, offers to suppress accepted patterns or enforce new conventions via .claude/conventions.md. Review agents read the conventions and skip suppressed patterns (inspired by the Carmack Council plugin)
- Pre-existing issue separation — review findings on unchanged code are marked PRE-EXISTING and excluded from the verdict (inspired by iterative-engineering)
Changed
- Review system: dynamic reviewer selection — analyzes the diff to select 3-5 agents from a pool instead of always spawning all 5. Always-on: elixir-reviewer, iron-law-judge, verification-runner. Conditional: security-analyzer, testing-reviewer, oban-specialist, deployment-validator (inspired by iterative-engineering)
- Review system: anti-over-recommendation filter — 5 noise-filtering questions applied to findings before writing review (inspired by Carmack Council)
- Review system: mandatory summary table — every review ends with an at-a-glance | # | Finding | Severity | Reviewer | File | New? | table
- Review system: lane discipline — explicit overlap-resolution rules between parallel review agents for consistent deduplication
- Skill descriptions: CSO audit — 4 skills (full, work, plan, compound) reworded to lead with trigger conditions instead of workflow summaries (inspired by Superpowers CSO discovery)
- Skill descriptions: anti-trigger patterns — ecto-patterns, security, and liveview-patterns now include "DO NOT load for..." conditions (inspired by the Anthropic Skills repo)
[2.2.0] - 2026-03-11
Fixed
- PreCompact hook (precompact-rules.sh) — Fixed a JSON validation failure that broke context preservation across compaction. Claude Code's schema validation rejects hookSpecificOutput with hookEventName: "PreCompact" (only PreToolUse/PostToolUse/UserPromptSubmit are valid). Switched to the top-level systemMessage field, which is schema-valid for all hook types
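Here is a minimal Python sketch of the output shape after the fix. The rule text and the function name are placeholders; only the use of a top-level systemMessage key reflects the change described above.

```python
import json


def precompact_output(rules: list[str]) -> str:
    """Build PreCompact hook stdout using the top-level systemMessage field."""
    lines = ["Rules to keep after compaction:"] + [f"- {rule}" for rule in rules]
    # Top-level key, not hookSpecificOutput: the latter with hookEventName
    # "PreCompact" failed Claude Code's schema validation.
    return json.dumps({"systemMessage": "\n".join(lines)})
```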
Changed
- web-researcher agent — Full rewrite as haiku fetch worker (was sonnet). Source-specific WebFetch extraction prompts (ElixirForum, HexDocs, GitHub, blogs) reduce token usage 30-50% per fetch. Parallel WebFetch calls in single response for 3-5x speedup. Removed unused tools (Read, Grep, Glob) and elixir-idioms skill preload (caused safety scanner false positives). Agent is now a focused data collector; synthesis stays with the caller
- research skill (/phx:research) — Added query decomposition (extracts 2-4 focused queries from long user input instead of passing raw text to WebSearch), a pre-flight cache check, and parallel worker spawning (1-3 web-researcher agents per topic cluster). New Iron Law: never pass raw user input as a WebSearch query. Removes duplicate searching (the skill searches OR the agent searches, not both)
- planning-orchestrator — Updated web-researcher spawn guidance: pass focused queries or pre-searched URLs, and spawn multiple agents for multi-topic research
- agent-selection reference — Added web-researcher spawn rules (model, URL limits, summary size, parallel spawning)
- research skill (/phx:research) — Added Tidewave-first routing: when the topic is an existing dependency, uses mcp__tidewave__get_docs (version-exact, zero web tokens) before falling through to web search
- planning-orchestrator — Added Phase 1c research cache reuse: checks .claude/research/ and .claude/plans/*/research/ for existing research before spawning web-researcher agents (prevents duplicate web research across planning sessions)
- intro tutorial — Updated the /phx:research description in the cheat sheet to reflect parallel workers and Tidewave-first routing
Added
- PostToolUse iron-law-verifier.sh hook — Programmatic code-content scanning for Iron Law violations after Edit/Write. Catches String.to_atom, :float for money, raw/1 with variables, implicit cross joins, bare GenServer.start_link, and assign_new misuse. Inspired by AutoHarness (Lou et al., 2026) "harness-as-action-verifier" pattern: code validates LLM output and feeds specific violation + line number back for targeted retry
- PostToolUseFailure error-critic.sh hook — Detects repeated mix command failures and escalates from generic hints (attempt 1) to structured critic analysis (attempt 3+). Tracks failure count per command, consolidates error history, and suggests /phx:investigate. Implements the Critic→Refiner pattern from AutoHarness: structured error consolidation before retry prevents debugging loops
- harness-patterns.md reference — New work skill reference documenting the critic-refiner pattern for error recovery, action verification hook architecture, and anti-patterns for unstructured retry loops
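The verifier's core loop can be sketched in Python. It scans the edited file line by line and returns line-numbered findings the model can act on directly. The three regexes cover only a subset of the checks listed above, in simplified form. They are assumptions, not the hook's actual patterns.

```python
import re

# Simplified, hypothetical versions of three checks named above.
CHECKS = [
    (re.compile(r"String\.to_atom\("), "String.to_atom/1 can exhaust the atom table"),
    (re.compile(r"\braw\(\s*[@a-z_]"), "raw/1 called with a variable or assign"),
    (re.compile(r"\bfield\s+:\w+,\s*:float\b"), ":float field; use :decimal for money"),
]


def scan(source: str) -> list[str]:
    """Return 'line N: message' findings so the retry can target exact lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, message in CHECKS:
            if pattern.search(line):
                findings.append(f"line {lineno}: {message}")
    return findings
```

Returning a specific line and reason, rather than a generic "Iron Law violated", is what lets the next attempt be a targeted fix instead of a rewrite.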
Changed
- fulltext-search.md — Rewritten with generated columns (preferred over triggers), trigram similarity (pg_trgm), hybrid search with RRF, multi-language support. Based on Search is Not Magic with PostgreSQL
- oban-pro-basics.md — Slimmed to essentials + official HexDocs links. Prevents stale static content; directs to upstream for latest API
- 5 skill descriptions improved — plan (--existing mode), research (--library flag), hexdocs-fetcher (wrapper purpose), examples (workflow demos), audit (5 specific areas)
- Official doc links added to otp-patterns.md, mix-tasks.md, elixir-118-features.md, oban-pro-basics.md, and testing-patterns.md, enabling fresh doc fetching
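Reciprocal rank fusion (RRF) is a small formula. For every document it contains, each result list contributes 1 / (k + rank), and documents are re-sorted by the summed score. The reference targets PostgreSQL. This Python sketch only illustrates the arithmetic, using k = 60, the commonly used default (assumed here, not taken from the reference).

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For a full-text ranking [a, b, c] and a trigram ranking [b, c, d], b and c appear in both lists and move ahead of a and d. RRF uses only ranks, so the two sources' raw scores never need to be on the same scale.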
Fixed
- work skill — Added a mandatory scratchpad read before implementing, plus a clarify-ambiguous-tasks Iron Law. Addresses the high correction rate (0.61) in skill-monitor data
- skill-monitor — Added skill-type weighting so analysis/check skills (verify, triage, perf, boundaries) use appropriate thresholds instead of a universal 0.5 cutoff
- perf, boundaries, pr-review — Added a "findings to plan" next-steps nudge so analysis results lead to actionable follow-up instead of getting lost
- full skill — Added the missing Iron Laws section (5 rules: verification, cycle limits, state transitions, discover-first, agent output boundaries)
- audit skill — Trimmed from 192 to 154 lines (was over the 185-line hard limit)
- review skill — Trimmed from 190 to 169 lines (was over the 185-line hard limit)
- boundaries skill — Trimmed from 170 to 145 lines (was over the 150-line hard limit)
- compute-metrics.py — Fixed a crash in trends caused by comparing the tz-naive datetime.min with aware datetimes, and fixed handling of the naive datetimes that fromisoformat returns for date-only strings
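Both datetime pitfalls are standard-library behavior. An ordering comparison between a naive datetime (such as datetime.min) and an aware one raises TypeError, and datetime.fromisoformat returns a naive value for a date-only string. Here is a minimal sketch of the normalization, assuming date-only timestamps should be read as UTC; the function names are hypothetical.

```python
from datetime import datetime, timezone

# Aware lower bound; a naive datetime.min here would make max() raise TypeError.
AWARE_MIN = datetime.min.replace(tzinfo=timezone.utc)


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 string (date-only allowed) into an aware datetime."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Date-only strings and offset-less timestamps come back naive; assume UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def latest(values: list[str]) -> datetime:
    """Return the newest timestamp, or AWARE_MIN for an empty list."""
    newest = AWARE_MIN
    for value in values:
        newest = max(newest, parse_ts(value))
    return newest
```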
Removed
- 3 unfinished deploy references — ci-templates.md, kubernetes-config.md, observability.md (undocumented, incomplete, not double-checked)
[2.1.0] - 2026-03-05
Added
- SubagentStart hook — injects all 21 Iron Laws into every spawned subagent via additionalContext (fixes the #1 session-analysis finding: zero skill auto-loading in subagents)
- PostToolUseFailure hook — injects Elixir-specific debugging hints via additionalContext when mix compile/test/credo/ecto commands fail
- Skill effectiveness monitoring (/skill-monitor) — per-skill metrics dashboard tracking action rate, friction, and corrections. Includes the skill-effectiveness-analyzer agent for improvement recommendations
- 9 new reference files — otp-patterns.md, js-interop.md, ci-templates.md, with-and-pipes.md, scopes-auth.md, advanced-patterns.md, documentation-patterns.md, briefing-guide.md, execution-guide.md
- Iron Laws sections added to skills: audit, document, investigate, research
- Changelog and semantic versioning
Fixed
- PostToolUse hooks broken for ~1 month (CRITICAL) — plan-stop-reminder, security-reminder, and format-elixir all wrote to stdout, which is shown only in verbose mode. They now use stderr + exit 2 so Claude actually receives the messages
- PreCompact rules never injected — stdout has no context-injection path for PreCompact. Rewritten to use JSON hookSpecificOutput.additionalContext
- SessionStart hooks running on /compact — split matchers so informational hooks (scratchpad, resume, branch freshness) only run on startup|resume
- compute-metrics.py O(n^2) bug — messages.index() replaced with enumerate for correct windowing and O(n) performance
- compute-metrics.py post_test_runs always 0 — ccrider-format messages have empty tool input; added a text-based detection fallback
- compute-metrics.py backfill schema gap — backfill_from_v1 now includes skill_effectiveness: {} for a consistent schema
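The windowing bug follows from how list.index() works. It rescans the list on every call (O(n) each, O(n^2) overall) and returns the position of the first equal element, so a repeated message gets the window of its earlier twin. Here is a minimal sketch of the enumerate-based version; the function name and default window size are hypothetical.

```python
from collections.abc import Callable


def windows_after(
    messages: list[dict],
    is_trigger: Callable[[dict], bool],
    size: int = 5,
) -> list[list[dict]]:
    """For each trigger message, collect the `size` messages that follow it."""
    windows = []
    # enumerate supplies each message's own position in a single O(n) pass.
    for i, message in enumerate(messages):
        if is_trigger(message):
            windows.append(messages[i + 1 : i + 1 + size])
    return windows
```

With two identical trigger messages, index() would return 0 for both, and the second trigger would get the first one's window. enumerate gives each its own position.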
Changed
- All 38 skill descriptions enriched for better auto-loading triggers (e.g., assigns-audit now triggers on "memory leaks", "slow LiveView renders")
- Updated CLAUDE.md hooks section with all 6 hook events and output patterns
- Updated README with /skill-monitor under session analysis tools
- Updated the /phx:intro tutorial hooks table with the new hooks
[2.0.0] - 2026-02-19
Added
- Iron Law #21: never use assign_new for values that must be refreshed on every mount
- VERIFYING phase in the /phx:full workflow (compile + format + credo + test between work and review)
- Behavioral rules in CLAUDE.md: auto-load patterns, skill loading by file type, Iron Laws enforcement protocol
- Elixir 1.18 deprecations reference, try/after patterns, mix tasks reference
- /phx:brief skill for interactive plan briefings with visual formatting
- /docs-check contributor tool for plugin compatibility validation
- Markdown linting with markdownlint + husky pre-commit hooks
- learn-from-fix rewritten to write to project memory (not plugin files)
Changed
- Agent model tiers optimized for Sonnet 4.6: most specialists moved from opus to sonnet, haiku for mechanical tasks (verification, compression)
- Planning workflow improved: agent blocking, session handoff for 5+ task plans, research synthesis
- Review, verify, testing, and Tidewave skills enhanced
- Intro tutorial split into 6 sections (was 5) to prevent content truncation
- Session analysis migrated to v2 pipeline (scan/deep-dive/trends with JSONL append-only ledger)
Fixed
- Challenge skill dedup and multiSelect support
- Parallel-reviewer and skill tool scoping permissions
- permissionMode: bypassPermissions applied to all 20 agents (without it, background agents failed with "Bash command permission check failed")
- Project name leaks in skill content
- Stale counts and intro tutorial accuracy
- Template placeholder filtering in session extraction
[1.0.0] - 2026-02-13
Added
- Initial release
- 20 specialist agents (orchestrators, reviewers, analysts)
- 38 skills covering full development lifecycle
- 20 Iron Laws (LiveView, Ecto, Oban, Security, OTP, Elixir)
- Plan-Work-Review-Compound workflow cycle
- PostToolUse hooks: format check, security reminder, progress logging
- SessionStart hooks: directory setup, Tidewave detection
- Stop hook: warn on uncompleted plan tasks
- PreCompact hook: rule preservation across context compaction
- Tidewave MCP integration (auto-detected)
- Context supervisor pattern for multi-agent output compression
- Plan namespaces (.claude/plans/{slug}/)
- Compound knowledge system (.claude/solutions/)