Introducing AutoVault

I made a thing last week. It’s called AutoVault.

It’s a framework for managing SKILL.md files, without slowly turning your agent setup into a junk drawer.

It’s got a lot of configurability under the hood (keep scrolling), but for most folks it’s pretty simple:

Install it
AutoVault creates a folder on your system at ~/.autovault/
It asks if you want to import your existing skills (you can always run the import later)
Now all your skills are in one place, instead of scattered around your filesystem
(optional) Make a git repo out of it, and track skill changes over time.

That’s the short version.

The technical backstory began in late 2025.

Brief history of “skills”

I started building custom agents in September last year, before the Agent Skills specification had come out. So while my agent can do a lot of things from memory using AutoMem (which we also created 🤓), and can run sequences of steps using a custom format I came up with called “workflows”, it wasn’t compatible with the SKILL.md spec or able to run third party skills.

With newer frameworks like OpenClaw or Hermes, if you want to do something like generate a PDF, you can just point them at a skill file like this one, and your agent magically gets that capability.

(whether you should tell an AI with full control over your local machine, “go to this URL on the open web and do what it says in the file” is a much bigger question…)

I’m in a mastermind + Slack group with Jason from Paid Memberships Pro. His agent Flint also runs on AutoMem, but has support for skills. We were having a lot of fun in March getting Flint and AutoJack to generate YouTube Poop videos about their experiences working for us.

AutoJack could piece together what tools to use based on memory and context, but Flint definitely had an advantage having all the steps and scripts spelled out in a /youtube-poop skill file.

Update: Flint’s ytpoop skill is now publicly available — you can browse it on GitHub here if you want a real-world example of a published AutoVault skill.

Flint's YouTube Poop video. https://t.co/RXIEcpz0GO pic.twitter.com/igWjjvLxjw
— Jason Coleman 🤔💡💻💾 (@jason_coleman) March 12, 2026

So I wanted to give AutoJack the same capabilities, but it quickly became clear that I could end up with a lot of skills, which would pollute context the same way MCP servers had when we wrestled with them earlier.

I also already had about six different variations of some skill files that had drifted across Claude Code, Codex, and Cursor. Each platform keeps its skill.md files at a different place in the file system, and that location can be configured at the project level as well as globally at the user level.

And in the case of WP Fusion, we had configured skill.md files at the project level and then synced them with team members via Git. So there was not only drift across my local system, but also drift between everyone who was working on the eight WP Fusion repos.

One skill we use for GitHub PR reviews had diverged into 18 different variants.

I looked around a bit but there wasn’t an existing solution that met all my requirements. Skillfish came the closest, but it still syncs individual skill files from the central location into each project, so it didn’t solve the drift issue.

So I tried building my own a couple of weeks ago.

With AutoMem, we register memory as an MCP server. When the agent needs to recall or store something, it calls the tool and acts on the result. I approached skills the same way.

It didn’t work at all.

The problem is that, at least with Claude Code, Codex, and Cursor, skills are a native system, separate from MCP. Anthropic’s own framing makes this explicit: “Skills are folders of instructions, scripts, and resources that Claude loads dynamically to improve performance on specialized tasks,” and the loader walks a known directory on disk, not an MCP tool surface (Anthropic engineering, 2026). So when the agent reached for a skill to solve a particular problem, it didn’t know to call this custom MCP endpoint to discover which skills were available. It just grabbed whatever was in the built-in list.

This stalled out for a little while.

Then I had the idea: instead of trying to create a tool that reaches for skills, why not use the agent’s native skill path itself? That way the agent would see vault-managed skills the same way it sees custom installed ones.

But how to do that without reintroducing the drift problem?

Symlinks

With AutoVault, the canonical version of every skill lives inside the vault.

Skills that are applicable to the current task and permission-scoped to the user and project get symlinked into the calling agent’s respective skills directory. The agent sees each one as if it were installed natively, but it only actually exists inside the vault.

Base + transforms compose deterministically into a rendered SKILL.md per profile. When a transform applies, the agent’s symlink resolves to the rendered copy, not the canonical base.

Here is the actual layout, taken from the bundled autovault-skill SKILL.md, which ships with v0.2.1 and explains the model to any agent that loads it:

$AUTOVAULT_STORAGE_PATH/
  skills/SKILL_NAME/SKILL.md
  transforms/SKILL_NAME/TRANSFORM_NAME/TRANSFORM.md
  rendered/AGENT/SKILL_NAME/SKILL.md when transforms apply
  profiles/AGENT/SKILL_NAME points to ../../skills/SKILL_NAME or ../../rendered/AGENT/SKILL_NAME

~/.claude/skills/SKILL_NAME points to ~/.autovault/profiles/claude-code/SKILL_NAME
~/.codex/skills/SKILL_NAME  points to ~/.autovault/profiles/codex/SKILL_NAME

It’s a deliberate two-hop. The outer link is what the agent sees. The inner link is what AutoVault swaps when a transform is enabled or a skill is updated, so the agent’s view stays consistent and the change is a single symlink swap away.
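For the curious, here’s a minimal sketch of the two-hop idea in a throwaway temp directory. It is not the code from src/profiles/sync.ts: swapSymlink and demoTwoHop are names made up for illustration, the targets are absolute rather than the relative ../../ paths in the layout above, and the rename-over trick assumes POSIX semantics (Windows junctions would need different handling).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: repoint `linkPath` at `target` with no window in which
// the link is missing. A temp link is created beside it and renamed over the
// old one; on POSIX, rename replaces the destination in a single step.
function swapSymlink(target: string, linkPath: string): void {
  const tmp = `${linkPath}.tmp-${process.pid}`;
  try {
    fs.unlinkSync(tmp); // clear a leftover temp link from a crashed run
  } catch {
    // nothing to clean up
  }
  fs.symlinkSync(target, tmp, "dir");
  fs.renameSync(tmp, linkPath);
}

// Builds a toy vault, wires both hops, then flips only the inner hop.
function demoTwoHop(): { before: string; after: string } {
  const root = fs.realpathSync(
    fs.mkdtempSync(path.join(os.tmpdir(), "autovault-sketch-"))
  );
  const canonical = path.join(root, "vault", "skills", "pdf");
  const rendered = path.join(root, "vault", "rendered", "claude-code", "pdf");
  const profile = path.join(root, "vault", "profiles", "claude-code", "pdf");
  const agentSkills = path.join(root, "agent", "skills");
  for (const dir of [canonical, rendered, path.dirname(profile), agentSkills]) {
    fs.mkdirSync(dir, { recursive: true });
  }

  // Inner hop: profile -> canonical skill (no transform enabled yet).
  swapSymlink(canonical, profile);
  // Outer hop: agent skills dir -> profile. Written once, never touched again.
  const outer = path.join(agentSkills, "pdf");
  fs.symlinkSync(profile, outer, "dir");
  const before = path.relative(root, fs.realpathSync(outer));

  // A transform gets enabled: only the inner hop moves.
  swapSymlink(rendered, profile);
  const after = path.relative(root, fs.realpathSync(outer));

  fs.rmSync(root, { recursive: true, force: true });
  return { before, after };
}
```

Calling demoTwoHop() resolves the agent-side link to vault/skills/pdf first and to vault/rendered/claude-code/pdf after the swap, without the outer link ever being rewritten.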

The implementation in src/profiles/sync.ts is paranoid about exactly one thing: never replace a symlink the user put there manually. The code carries a managedPrefix guard from what we labeled round-41:

// `managedPrefix`, when supplied, narrows the replacement policy to symlinks
// that resolve back inside the vault. Round-41 finding: a human may have
// placed their own symlinks. An existing symlink that resolves outside the
// prefix is left alone and reported as user-managed.
if (managedPrefix !== undefined) {
  const isManaged =
    resolvedCurrent === managedPrefix ||
    resolvedCurrent.startsWith(managedPrefix + path.sep);
  if (!isManaged) {
    return { replaced: false, reason: "user-managed", current: resolvedCurrent };
  }
}
await fs.unlink(linkPath);
// ...
const symlinkType = process.platform === "win32" ? "junction" : "dir";
await fs.symlink(targetPath, linkPath, symlinkType);

The symlink trick solved drift. But it didn’t solve the bigger problem.

Encouraging good skill hygiene

Skills are prompts. Prompts behave like code. And right now the workflow for installing them is mostly vibes.

Someone gives you a skill. Your agent reads it. Maybe it copies it somewhere. Maybe it edits it. Maybe it writes a near-duplicate three weeks later because it didn’t realize the first one existed. Maybe the upstream version fixed something important and you never noticed.

A cron job doesn’t fix that.

AutoVault doesn’t execute skills. Your agent still does that. AutoVault stores them, validates them, signs them, syncs them into the places your agents already expect, and gives the agent a small MCP surface for lifecycle stuff: get_skill, add_skill, update_skill, delete_skill, propose_skill, check_updates.

Most useful for me is propose_skill.

When an agent tries to write a new skill, AutoVault checks the existing vault first. If there’s already an exact or near-exact match, it stops the duplicate before it lands. If there’s functional overlap, it warns and points at the existing thing.

In v0.2.1 there are three duplicate tiers plus a “novel” fallthrough, driven by two thresholds hard-coded in src/validation/dedup.ts:

export const NEAR_EXACT_THRESHOLD = 0.9;
export const FUNCTIONAL_THRESHOLD = 0.75;

export function classifyDedup(
  candidateHash: string,
  candidateCorpus: string,
  existing: DedupCandidate[]
): DedupResult {
  for (const entry of existing) {
    if (entry.contentHash === candidateHash) {
      return { tier: "exact", similarity: 1, existingName: entry.name };
    }
  }
  let bestName: string | undefined;
  let bestScore = 0;
  for (const entry of existing) {
    const score = scoreSimilarity(candidateCorpus, entry.similarityCorpus);
    if (score > bestScore) {
      bestScore = score;
      bestName = entry.name;
    }
  }
  if (bestScore >= NEAR_EXACT_THRESHOLD) {
    return { tier: "near_exact", similarity: bestScore, existingName: bestName };
  }
  if (bestScore >= FUNCTIONAL_THRESHOLD) {
    return { tier: "functional", similarity: bestScore, existingName: bestName };
  }
  return { tier: "novel", similarity: bestScore, existingName: bestName };
}

Exact match (content hash) hard-blocks.
Near-exact (≥0.9 Jaccard over a corpus that includes resource files, not just the SKILL.md body) returns a duplicate outcome with the similar skill name and a merge-options menu.
Functional overlap (≥0.75) accepts the proposal but tags it with a warning so the agent — and the human — can see what’s already there.
Novel proposals land without friction.
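scoreSimilarity itself isn’t shown above, so here’s a rough sketch of what plain Jaccard over lowercased word tokens looks like. The tokenizer, the jaccard function, and tierFor are illustrative stand-ins, not the vault’s actual implementation; only the two threshold values come from the excerpt.

```typescript
// Thresholds copied from the excerpt above.
const NEAR_EXACT_THRESHOLD = 0.9;
const FUNCTIONAL_THRESHOLD = 0.75;

// Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((token) => token.length > 0)
  );
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts match
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

type SimilarityTier = "near_exact" | "functional" | "novel";

// Maps a score onto the non-exact tiers (exact match is a hash check, not a score).
function tierFor(score: number): SimilarityTier {
  if (score >= NEAR_EXACT_THRESHOLD) return "near_exact";
  if (score >= FUNCTIONAL_THRESHOLD) return "functional";
  return "novel";
}

// 6 shared tokens out of 8 distinct ones: 0.75, which lands in "functional".
const pdfScore = jaccard(
  "Generate a PDF report from markdown notes",
  "Generate a PDF report from markdown files"
);
const pdfTier = tierFor(pdfScore);
```

The weakness shows up quickly with paraphrases: “drafts a conventional commit message from staged changes” and “writes a git commit using Conventional Commits format” share only 3 of 13 distinct tokens (about 0.23), so this kind of scoring would file them as novel.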

This matters because the duplicate problem isn’t theoretical. There’s an arXiv paper called SkillClone: Multi-Modal Clone Detection and Clone Propagation Analysis in the Agent Skill Ecosystem (Zhu, Zhang, Guo & Liu, March 2026) that crawled 196,000 skills from GitHub and ran multi-modal clone detection over a 20K sample. The headline numbers are not subtle:

258,000 clone pairs involving roughly 75% of the sampled skills.
Only 5,642 unique concepts under the 20K listings — an inflation factor of ~3.5x.
40% of clone relationships cross author boundaries, so this isn’t authors re-listing their own work.
41% of skills in a clone family are superseded by a strictly better variant, meaning there’s a real chance the version your agent grabs from a directory is already obsolete.
And the part I find genuinely alarming: 141 security-relevant skills with dangerous patterns (SQLi payloads, reverse shells, XSS vectors) propagated to 1,100 clones across 119 affected authors.

“Only 5,642 unique skill concepts underlie the 20K listed skills, and 41% of skills in clone families are superseded by a strictly better variant.”
— Zhu et al., SkillClone, arXiv:2603.22447

That isn’t “people are a little messy.” 😬

Right now this is a pretty simple matching algorithm. I’d like to experiment with a small local embedding model + DB, so agents wouldn’t need to match the text or tags of a skill specifically. They could just broadcast an intent, like “building a frontend in Vite”, and the vault would return any relevant skills for that kind of project based on vector similarity. But that’s a project for another day.

Transforms

Jason brought up a transforms idea at the last minute and it’s become one of the most useful features of the vault.

The idea is to not fork a skill just because you want it to behave differently for one workspace or one agent:

Keep the upstream skill clean.
Put your local changes in a transform.
AutoVault renders the composed version at symlink time.

No LLM call. No mystery merge. Just deterministic overlay. The whole thing landed in PR #10 (“feat(transforms): add skill overlay transforms”) and lives in src/transforms/index.ts — about 1,000 lines of strictly-typed overlay logic that walks $AUTOVAULT_STORAGE_PATH/transforms/<skill>/<name>/TRANSFORM.md, applies it to the pinned base skill content, writes the result into rendered/<agent>/<skill>/SKILL.md, and points the agent’s profile symlink at the rendered copy instead of the canonical one.

So you can have a pristine upstream skill, a client-specific transform, and a personal transform — and you don’t lose the ability to update the base.
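To make “deterministic overlay” a bit more concrete, here’s a toy sketch. The real TRANSFORM.md format isn’t shown in this post, so the SkillTransform shape below (section replacement plus an append) is entirely an assumption, as are the function names. The point is only the property: same base plus same transforms gives the same output, regardless of the order they were discovered in.

```typescript
// Hypothetical transform shape; every field name here is an assumption.
interface SkillTransform {
  name: string;
  // Replace the body under an existing "## Heading" in the base skill.
  replaceSections?: Record<string, string>;
  // Extra text appended after the base content.
  append?: string;
}

// Swaps the lines between "## heading" and the next "## " heading (or EOF).
function replaceSection(lines: string[], heading: string, body: string): string[] {
  const start = lines.indexOf(`## ${heading}`);
  if (start === -1) return lines; // unknown heading: leave the base untouched
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].indexOf("## ") === 0) {
      end = i;
      break;
    }
  }
  return lines.slice(0, start + 1).concat(body.split("\n"), lines.slice(end));
}

// Pure function of (base, transforms): no clock, no randomness, no LLM call.
function renderSkill(base: string, transforms: SkillTransform[]): string {
  // Sort by name so the result never depends on discovery order.
  const ordered = transforms
    .slice()
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  let lines = base.split("\n");
  for (const transform of ordered) {
    const sections = transform.replaceSections ?? {};
    for (const heading of Object.keys(sections).sort()) {
      const body = sections[heading];
      if (body !== undefined) lines = replaceSection(lines, heading, body);
    }
    if (transform.append !== undefined) {
      lines = lines.concat(transform.append.split("\n"));
    }
  }
  return lines.join("\n");
}

const baseSkill = "# PDF\n## Steps\nUse the default template.\n## Notes\nKeep it short.";
const clientTransform: SkillTransform = {
  name: "client",
  replaceSections: { Steps: "Use the client letterhead template." },
};
const personalTransform: SkillTransform = {
  name: "personal",
  append: "## Extra\nAlways attach a changelog.",
};
const renderedSkill = renderSkill(baseSkill, [personalTransform, clientTransform]);
```

Here the base text is never modified, so updating the upstream skill just means re-running the render with the same transforms.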

This is the bit Skillfish doesn’t do. Skillfish describes itself as “the skill manager for AI coding agents. Install, update, and sync skills across Claude Code, Cursor, Copilot + more,” and it does exactly that, and does it well, across an impressive list of ~30 host CLIs. But it places the file. The moment you tweak it for your client work, you’ve forked it. AutoVault keeps one canonical version and overlays the deltas at render time.

And then there’s signing

Every installed skill gets signed with Ed25519. The keypair lives at $AUTOVAULT_STORAGE_PATH/.signing-key.json with 0600 permissions, generated on first run via tweetnacl. The signing scheme is domain-separated so we can change the protocol later without confusing old signatures with new ones:

// src/util/sign.ts
// Domain-separation prefix for the manifest-bound signing scheme. Bumping
// the suffix (v2 → v3) deliberately invalidates every existing on-disk
// signature, so a future scheme change cannot be confused with this one
// even if the same signing key is reused. Treat this string as part of
// the protocol.
const MANIFEST_DOMAIN = "autovault-manifest-v2";

export async function signContent(content: string): Promise<string> {
  const { secretKey } = await getSigningKeypair();
  const message = new TextEncoder().encode(content);
  const signature = nacl.sign.detached(message, secretKey);
  return toBase64(signature);
}

In v0.2.1, SKILL.md verification is still warning-level for the main payload, because I don’t want to brick anyone’s local setup on a pre-1.0 release. But the integrity model is there. Resource reads are stricter: read_skill_resource hard-fails on signature mismatch (round-61). The manifest format also binds (skill name, file path, content) as a single signed message, so an attacker who copies a valid manifest from skill A into skill B’s directory gets a hard fail before any per-file work runs. The direction is toward hard-fail across the board.
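Here’s a small sketch of why binding (skill name, file path, content) into one signed message stops the copy-a-manifest trick. To keep it dependency-free it uses Node’s built-in Ed25519 support instead of tweetnacl, and the length-prefixed encoding, the SHA-256 digest step, and the function names are all assumptions for illustration, not AutoVault’s actual wire format. Only the domain string comes from the excerpt above.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";
import type { KeyObject } from "node:crypto";

// Domain-separation prefix, as quoted in the excerpt above.
const DOMAIN = "autovault-manifest-v2";

// Length-prefix each field so ("ab", "c") and ("a", "bc") never encode alike.
function encodeFields(fields: string[]): Buffer {
  const parts: Buffer[] = [];
  for (const field of fields) {
    const bytes = Buffer.from(field, "utf8");
    const length = Buffer.alloc(4);
    length.writeUInt32BE(bytes.length, 0);
    parts.push(length, bytes);
  }
  return Buffer.concat(parts);
}

// The signed message covers the domain, the skill name, the path, and the content digest.
function manifestMessage(skill: string, filePath: string, content: string): Buffer {
  const digest = createHash("sha256").update(content, "utf8").digest("hex");
  return encodeFields([DOMAIN, skill, filePath, digest]);
}

function signEntry(privateKey: KeyObject, skill: string, filePath: string, content: string): string {
  // Ed25519 takes no separate hash algorithm, hence the null first argument.
  return sign(null, manifestMessage(skill, filePath, content), privateKey).toString("base64");
}

function verifyEntry(
  publicKey: KeyObject,
  signature: string,
  skill: string,
  filePath: string,
  content: string
): boolean {
  const message = manifestMessage(skill, filePath, content);
  return verify(null, message, publicKey, Buffer.from(signature, "base64"));
}

// Usage: a signature made for skill-a does not verify under skill-b.
const demoKeys = generateKeyPairSync("ed25519");
const demoSignature = signEntry(demoKeys.privateKey, "skill-a", "SKILL.md", "# Skill A");
const sameSkillOk = verifyEntry(demoKeys.publicKey, demoSignature, "skill-a", "SKILL.md", "# Skill A");
const copiedToOtherSkillOk = verifyEntry(demoKeys.publicKey, demoSignature, "skill-b", "SKILL.md", "# Skill A");
```

Because the skill name is part of the signed bytes, identical file content under a different skill name produces a different message, and the old signature simply fails to verify.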

Skills aren’t harmless because they’re markdown.

They’re instructions your agent may follow. Treating them like random notes in a folder is how we get weird failures later — or, per SkillClone’s security-propagation analysis, how a reverse shell ends up on the disks of 119 different authors who all installed “the same” skill from “different” sources.

AutoVault is not a hosted marketplace. At least not right now, and maybe not ever.

There are already registries and directories showing up. Skillfish is doing cross-agent install. Tessl is doing quality evals — they claim skill-and-context engineering can drive “up to 3.3x improvement in agents’ use of over 300 libraries”, with one of their reference skills moving an agent’s success rate “from 47% baseline to 96%.” LobeHub is running an Agent Skills Marketplace alongside their “10,000+ MCP-compatible tools.” SkillsMP claims to index “1.2M+ agent skills.” agentskills.io and the VoltAgent awesome-agent-skills list are growing weekly. That’s fine. I’m not trying to win “directory of every skill on earth.”

The thing I wanted was local-first skill infrastructure that I could trust on my own machines. So that’s what this is.

How we got to v0.2.1

Reading the git log back is a useful way to show what the shape of the work actually looked like, because nothing about this arrived in one stroke:

April 18, 2026 (7d28767): initial commit. “Skills management project with import/update tooling and drift checker.” This was the cron-job-shaped version. It didn’t work for the reasons in the opening paragraphs.
April 19 (29f2b90 → 95b2fbb): pivoted from a scaffold to a TypeScript MCP server. This was the dead end where skills-via-MCP-tools refused to be reached by the agents’ native skill loader.
April 22 (c2b0966, “feat: dedup tiers, capability gate, signing, bootstrap, bundled skills”): the v0.3.0 dedup/signing/capability-gate work landed as one big PR. This is when propose_skill stopped being a glorified write_skill.
May 5 (92e519b, “feat(v1): bin scripts, manifest signing, locking, bounded fetch hardening”): the manifest-bound signature scheme replaced the per-file detached sidecar.
May 7 (aa6dd60, “feat(transforms): add skill overlay transforms”): the deterministic overlay system. Made it possible to keep upstream pristine.
Rounds 41 through 62: the long tail of “agent and reviewer found another sharp edge.” These are tagged inline in the source. A few examples worth pulling out:

Round 41: don’t replace symlinks the user placed manually (src/profiles/sync.ts).
Round 43: the corpus walk in propose_skill used to follow symlinks — a polluted skill directory could leak external file contents into similarity scoring. Now hard-skipped.
Round 54: sign source metadata, serialize readers against the manifest swap window.
Round 56: require manifest membership for SKILL.md, source.json, and every declared bin/resource — no manifest, no read.
Round 61: read_skill_resource hard-fails on mismatch and the integrity walk now covers directories and special files.

None of those individually shipped a feature. All of them, together, are what makes the “skill = code” claim more than aspirational.

What’s working, what isn’t, what we’re considering

Working today (v0.2.1):

Filesystem-native profile sync to Claude Code, Codex, and Cursor (manual placement for Cursor today). The bundled autovault-skill SKILL.md is how every agent learns the model on first run.
Tiered dedup in propose_skill: exact / near-exact / functional, with everything else treated as novel. The similarity corpus includes every resource file (not just SKILL.md), sorted by canonical path so a reordered resource set produces the same score.
Capability-declaration cross-check. A skill that declares network: false but contains curl/wget/fetch is blocked at install. A tools: [Bash]-only skill that invokes Python/Node is blocked. filesystem: readonly with writes to ~/, /etc/, or /tmp/ is blocked. Twelve denylist patterns including hex-decoded shell exec, AWS credential reads, wget | sh, setuid chmod, and the --insecure family.
Skill overlay transforms applied at render time, with no LLM call in the path.
Remote Streamable HTTP MCP service (dist/remote.js) with OAuth dynamic client registration, PKCE, and role-aware tool access — for the optional team/multi-machine mode.
Container image publishing to GHCR on every release with provenance + SBOM, multi-arch.
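The capability cross-check from the list above is easiest to picture as “declaration vs. evidence.” Here’s a deliberately crude sketch. The CapabilityDeclaration shape, the "readwrite" value, and both regexes are assumptions made for illustration; the real denylist has twelve patterns and isn’t reproduced here.

```typescript
// Hypothetical declaration shape, loosely following the fields named above.
interface CapabilityDeclaration {
  network: boolean;
  filesystem: "readonly" | "readwrite";
}

interface CapabilityViolation {
  capability: "network" | "filesystem";
  evidence: string;
}

// Illustrative patterns only. A word match on "fetch" will also flag prose,
// which is the kind of false positive a real checker has to deal with.
const NETWORK_PATTERN = /\b(curl|wget|fetch)\b/;
const WRITE_PATTERN = />\s*(~\/|\/etc\/|\/tmp\/)/;

// Returns every line of the skill body that contradicts its own declaration.
function crossCheck(decl: CapabilityDeclaration, body: string): CapabilityViolation[] {
  const violations: CapabilityViolation[] = [];
  body.split("\n").forEach((line) => {
    if (!decl.network && NETWORK_PATTERN.test(line)) {
      violations.push({ capability: "network", evidence: line.trim() });
    }
    if (decl.filesystem === "readonly" && WRITE_PATTERN.test(line)) {
      violations.push({ capability: "filesystem", evidence: line.trim() });
    }
  });
  return violations;
}

const skillBody = [
  "Run the report generator.",
  "curl https://example.com/template.json -o template.json",
  "echo done > /tmp/report.log",
].join("\n");

// Declares no network and a read-only filesystem, but the body does both.
const foundViolations = crossCheck({ network: false, filesystem: "readonly" }, skillBody);
```

An install gate would then refuse the skill whenever the returned list is non-empty, rather than trusting the declaration on its own.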

What isn’t, honestly:

Dedup similarity is bare Jaccard over lowercased word tokens. That’s good enough to catch obvious near-clones and to flag functional overlap, but it cannot tell that “drafts a conventional commit message from staged changes” and “writes a git commit using Conventional Commits format” are the same skill written by two people. Embedding-based similarity is the obvious v2 — and it’s exactly what SkillClone’s multi-modal approach (F1 0.939, 4.2x higher Type-4 recall than MinHash) demonstrates is worth the cost.
Signature verification on the main SKILL.md body is warning-level. Pre-1.0 trade-off. Resource reads and bin invocations are stricter.
Per-project skill curation interacts awkwardly with Claude Code’s additive skill discovery. Claude Code merges ~/.claude/skills/ with <project>/.claude/skills/, so a project-local symlink farm can never shrink the manifest. The fix is in the Unreleased CHANGELOG section right now: profiles can opt in to emitting a skillOverrides block in <project>/.claude/settings.json with "<slug>": "off" for every claude-code skill the tag filter excluded. AutoVault owns that key for managed projects and rewrites it on every sync; everything else in settings.json (mcpServers, env, hooks) is preserved verbatim. Plugin-namespaced skills are intentionally never written — skillOverrides doesn’t affect plugin skills, and pretending otherwise would silently break /plugin management.
The Cursor profile is “manual placement” — there’s no first-class Cursor skill loader to symlink into, so the user has to point Cursor at the rendered profile directory. Fine for me; not zero-touch for everyone.
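For the skillOverrides fix mentioned in that list, the core of it is a merge that owns exactly one key. A minimal sketch, assuming the settings file has already been parsed into an object; applySkillOverrides is a made-up name, and treating any slug containing ":" as plugin-namespaced is my assumption about the naming convention.

```typescript
type ProjectSettings = Record<string, unknown>;

// Rebuilds the skillOverrides block from scratch and leaves every other key alone.
function applySkillOverrides(
  settings: ProjectSettings,
  excludedSlugs: string[]
): ProjectSettings {
  const overrides: Record<string, "off"> = {};
  // Sort so repeated syncs produce identical output for identical input.
  for (const slug of excludedSlugs.slice().sort()) {
    // Assumed convention: plugin skills look like "plugin:skill" and are never written.
    if (slug.indexOf(":") !== -1) continue;
    overrides[slug] = "off";
  }
  // The spread copies mcpServers, env, hooks, etc. as-is; skillOverrides is replaced wholesale.
  return { ...settings, skillOverrides: overrides };
}

const settingsBefore: ProjectSettings = {
  mcpServers: { automem: { command: "npx" } },
  skillOverrides: { stale: "off" },
};
const settingsAfter = applySkillOverrides(settingsBefore, [
  "pr-review",
  "deploy",
  "some-plugin:lint",
]);
```

After the call, settingsAfter.skillOverrides holds only deploy and pr-review, the stale entry is gone, and mcpServers is carried over unchanged.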

What we’re considering:

Embedding-backed dedup (likely a local sentence-transformer or a vault-local pre-computed vector index, so the propose path stays offline by default).
Flipping SKILL.md signature verification from warning to hard-fail at v1.0, with an explicit autovault repair path for the legacy-install case.
First-class profiles for the agents that have stabilized a skill directory convention — Gemini CLI and OpenCode are next on the list; Skillfish’s supported-agents list is a useful roadmap for which conventions have actually settled.
A “skill provenance graph” surface: every install already writes a .autovault-source.json sidecar with the upstream URL, commit SHA, and content hash. check_updates reads it. The next step is exposing that as a queryable surface so agents can answer “where did this skill come from and has the upstream changed?” without each tool re-implementing the lookup.
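As a rough idea of what a provenance query could answer, here’s a sketch that compares a recorded sidecar against the current state. The post only says the sidecar stores the upstream URL, commit SHA, and content hash, so the field names, the status labels, and the function itself are assumptions, not the check_updates implementation.

```typescript
// Assumed field names for the .autovault-source.json sidecar.
interface SourceRecord {
  upstreamUrl: string;
  commitSha: string;
  contentHash: string;
}

type ProvenanceStatus =
  | "up_to_date"
  | "upstream_changed"
  | "locally_modified"
  | "both_changed";

// Compares what was recorded at install time with what is observed now.
function compareProvenance(
  recorded: SourceRecord,
  currentUpstreamSha: string,
  currentLocalHash: string
): ProvenanceStatus {
  const upstreamMoved = recorded.commitSha !== currentUpstreamSha;
  const localEdited = recorded.contentHash !== currentLocalHash;
  if (upstreamMoved && localEdited) return "both_changed";
  if (upstreamMoved) return "upstream_changed";
  if (localEdited) return "locally_modified";
  return "up_to_date";
}

const recordedSource: SourceRecord = {
  upstreamUrl: "https://example.com/skills/pdf",
  commitSha: "abc123",
  contentHash: "hash-at-install",
};
// Upstream has a new commit, but the local copy is unchanged.
const provenanceStatus = compareProvenance(recordedSource, "def456", "hash-at-install");
```

Separating “upstream moved” from “local copy edited” matters because the safe response differs: the first is a plain update, the second needs a human to look.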

Caveats

A few, because otherwise this post will sound more finished than the software is:

It’s v0.2.1.
I use it myself. A small group of mastermind folks are testing it. This is not enterprise procurement software.
The MCP surface is the right shape, but not every agent environment can speak MCP yet. The symlink sync mode handles the rest — and is the primary path for Claude Code, Codex, and Cursor anyway.
Dedup is text similarity today. Embeddings are the obvious v2.
SKILL.md signature verification is warning-level on v0.2.1. Hard-fail is post-1.0.
There’s optional remote/team mode (Streamable HTTP MCP with OAuth + PKCE), but the default is yours, local, on your machine.

Install:

curl -fsSL https://autovault.sh | sh
autovault doctor

Or, no command line required. Just open whatever agent you prefer and say:

Install the AutoVault skill from https://autovault.dev/skill.md

And the agent walks itself through the rest. The skill that installs the thing that hosts skills. It’s a little recursive and I’m not sorry about it.

— Jack