The vault pattern: Why folders beat databases for personal tools

I’ve been building personal tools for years. Every time, I start with SQLite. Every time, I regret it.

Not because SQLite is bad (it’s fantastic), but because personal tools have different constraints than web apps, and databases optimize for the wrong things.

The problem with traditional databases

I built a journaling app in 2020. Used SQLite, like everyone said I should. Then I wanted to sync between devices. My options:

1. Build a sync server (now I’m running infrastructure)
2. Use a sync service (now I’m vendor-locked)
3. Sync the SQLite file itself with a file-sync tool (corruption city: the tool can copy the database mid-write)

I went with option 1. Six months later, I shut down the server. My journal died with it.

Why CRDTs change everything

Last year I discovered Yjs. It’s a CRDT library that makes distributed data structures feel like magic. Here’s the key insight that took me too long to realize:

With CRDTs, sync isn’t a feature you build. It’s a property of the data structure.

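// This is all the sync logic you need: forward every update to the other doc.
// Applying an update the doc already has is a no-op, so the two handlers don't loop.
doc1.on('update', update => {
  Y.applyUpdate(doc2, update)
})
doc2.on('update', update => {
  Y.applyUpdate(doc1, update)
})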

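That’s it. No conflict resolution code. No three-way merges. No server that has to understand your data: anything that can deliver the update bytes to the other device will do.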
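
To make the “property of the data structure” point concrete, here is a toy grow-only counter. It is not Yjs and not how Yjs works internally; it is just about the smallest CRDT I know of, and the class name and replica IDs are made up for illustration. The thing to notice is that merge takes a per-replica maximum, so it gives the same answer regardless of order or repetition.

```javascript
// Toy grow-only counter (G-Counter). NOT Yjs: only an illustration of why
// CRDT merges need no conflict handling.
class GCounter {
  constructor(replicaId) {
    this.replicaId = replicaId
    this.counts = {} // replicaId -> increments seen from that replica
  }

  increment() {
    this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + 1
  }

  // Per-replica maximum is commutative, associative, and idempotent,
  // so state can be exchanged in any order, any number of times.
  merge(other) {
    for (const [id, n] of Object.entries(other.counts)) {
      this.counts[id] = Math.max(this.counts[id] || 0, n)
    }
  }

  get value() {
    return Object.values(this.counts).reduce((sum, n) => sum + n, 0)
  }
}

const laptop = new GCounter('laptop')
const phone = new GCounter('phone')
laptop.increment()
laptop.increment()
phone.increment()

// Exchange state both ways; the repeated merge changes nothing
laptop.merge(phone)
phone.merge(laptop)
phone.merge(laptop)

console.log(laptop.value, phone.value) // both replicas report 3
```

Yjs gives you the same guarantee for maps, arrays, and text, which is a much harder problem than a counter.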

The vault pattern

Once you have CRDTs, you don’t need a database. You need a place to put files. Here’s the pattern:

~/YourApp/
├── service1/
│   ├── data.yjs      # The CRDT document
│   ├── assets/       # Binary files (images, etc)
│   └── config.json   # Service-specific settings
├── service2/
│   └── data.yjs
└── README.md         # Explain the structure

Each service owns a folder. Each folder contains:

One or more Yjs documents (the magic)
Regular files for assets
Human-readable configs
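
Setting up that layout takes only a few lines. This is a minimal sketch, not code from any particular app: ensureServiceFolder is a hypothetical helper name, and the empty config.json is a placeholder for whatever settings your service actually needs.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical helper: create the folder layout for one service in a vault.
// Safe to call on every startup; existing files are left untouched.
function ensureServiceFolder(vaultPath, serviceName) {
  const serviceDir = path.join(vaultPath, serviceName)
  // recursive: true creates the vault, the service folder, and assets/ as needed
  fs.mkdirSync(path.join(serviceDir, 'assets'), { recursive: true })

  // Write a placeholder config only if the user doesn't already have one
  const configPath = path.join(serviceDir, 'config.json')
  if (!fs.existsSync(configPath)) {
    fs.writeFileSync(configPath, JSON.stringify({}, null, 2) + '\n')
  }
  return serviceDir
}
```

You would call it with something like ensureServiceFolder(path.join(os.homedir(), 'YourApp'), 'notes') before touching any documents.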

One document per logical unit

The first time I used Yjs, I made one giant document for everything. Bad idea. Here’s why:

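Large documents mean a slow initial sync
Every change goes to every peer, even peers that only care about one note
Concurrent edits to unrelated items pile into one shared history that is hard to untangle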

Instead, use one document per logical unit:

One doc per note
One doc per email thread
One doc per project

Now syncs are fast, conflicts are isolated, and you can garbage collect old data.
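
A side effect of one file per document is that the directory listing becomes your table of contents. A minimal sketch, assuming each note lives in a file named after its ID with a .yjs extension; listNoteIds is a hypothetical helper:

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical helper: enumerate notes without opening a single CRDT document.
function listNoteIds(notesDir) {
  return fs.readdirSync(notesDir)
    .filter(name => name.endsWith('.yjs')) // skip assets/, configs, and other files
    .map(name => path.basename(name, '.yjs'))
    .sort()
}
```

Documents then get loaded only when the user actually opens them.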

The implementation

Here’s actual code from Epicenter’s note service:

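const fs = require('node:fs/promises')
const path = require('node:path')
const { randomUUID } = require('node:crypto')
const Y = require('yjs')

// Stand-in for the original generateId helper (assumption: any unique ID works)
const generateId = () => randomUUID()

class NoteService {
  constructor(vaultPath) {
    this.notesDir = path.join(vaultPath, 'notes')
    this.docs = new Map() // noteId -> loaded Y.Doc
  }

  async createNote(title) {
    const doc = new Y.Doc()
    const noteId = generateId()

    // Define the structure
    const note = doc.getMap('note')
    note.set('title', title)
    note.set('content', new Y.Text())
    note.set('created', Date.now())

    // Save to disk, creating the notes folder on first use
    await fs.mkdir(this.notesDir, { recursive: true })
    const filePath = path.join(this.notesDir, `${noteId}.yjs`)
    await fs.writeFile(filePath, Y.encodeStateAsUpdate(doc))

    this.docs.set(noteId, doc)
    return noteId
  }

  async loadNote(noteId) {
    // Reuse the open doc so two callers never edit separate copies
    if (this.docs.has(noteId)) return this.docs.get(noteId)

    const filePath = path.join(this.notesDir, `${noteId}.yjs`)
    const data = await fs.readFile(filePath)

    const doc = new Y.Doc()
    Y.applyUpdate(doc, data)

    this.docs.set(noteId, doc)
    return doc
  }
}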

Simple, right? No SQL schemas. No migrations. Just data structures and files.

Handling binary data

CRDTs are great for text and structured data. They’re terrible for binary blobs. So don’t put binaries in CRDTs. Put them in the filesystem:

// A method on NoteService
async addAttachment(noteId, file) {
  const doc = this.docs.get(noteId)
  if (!doc) throw new Error(`Note ${noteId} is not loaded`)

  // Hash the content for deduplication
  const hash = await hashFile(file)
  const ext = path.extname(file.name)

  // Store in assets folder; identical content always maps to the same file name
  const assetsDir = path.join(this.notesDir, 'assets')
  await fs.mkdir(assetsDir, { recursive: true })
  await fs.copyFile(file.path, path.join(assetsDir, `${hash}${ext}`))

  // Reference in the CRDT
  const attachments = doc.getArray('attachments')
  attachments.push([{
    hash,
    name: file.name,
    size: file.size,
    type: file.type
  }])
}

Now your attachments are just files. You can browse them with Finder. You can back them up with rsync. They’re not trapped in a database blob column.
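
The hashFile helper isn’t shown above. Here is one way it could look, as a sketch rather than the real implementation: it assumes SHA-256 and takes a file path, whereas the call site above passes a file object, so you would adapt the argument to whatever your app hands you.

```javascript
const crypto = require('node:crypto')
const fs = require('node:fs')

// Hypothetical hashFile: stream the file through SHA-256 so large attachments
// never have to fit in memory, and resolve with the hex digest.
function hashFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256')
    fs.createReadStream(filePath)
      .on('error', reject)
      .on('data', chunk => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')))
  })
}
```

Because the file name is derived from the content, attaching the same image to ten notes stores it once.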

Making it fast

CRDTs can get slow if you’re not careful. Here’s what works:

Keep garbage collection on: doc.gc is true by default in Yjs, so just don’t turn it off
Snapshot periodically: Save the full state every 1000 updates
Load incrementally: Start with the snapshot, then apply recent updates
Index separately: Keep a lightweight index for search
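
The “index separately” tip is the least obvious of the four, so here is a minimal sketch of what I mean. NoteIndex, its file format, and the title-only search are all assumptions for illustration. The index is derived data: if it is ever lost or stale, it can be rebuilt from the documents.

```javascript
const fs = require('node:fs')

// Hypothetical index: a small JSON file mapping note IDs to searchable
// metadata, so search never has to open a CRDT document.
class NoteIndex {
  constructor(indexPath) {
    this.indexPath = indexPath
    this.entries = fs.existsSync(indexPath)
      ? JSON.parse(fs.readFileSync(indexPath, 'utf8'))
      : {}
  }

  // Call whenever a note is created or its title changes
  upsert(noteId, title) {
    this.entries[noteId] = { title, updated: Date.now() }
    fs.writeFileSync(this.indexPath, JSON.stringify(this.entries, null, 2))
  }

  // Case-insensitive substring match on titles; returns matching note IDs
  search(query) {
    const needle = query.toLowerCase()
    return Object.keys(this.entries)
      .filter(id => this.entries[id].title.toLowerCase().includes(needle))
  }
}
```

Rewriting the whole JSON file on every change is fine for a few thousand notes; past that you would want something smarter.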

Here’s our snapshotting logic:

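const fs = require('node:fs/promises')
const Y = require('yjs')
const encoding = require('lib0/encoding') // lib0 ships as a Yjs dependency
const decoding = require('lib0/decoding')

// Read a file, returning null if it does not exist yet
async function readIfExists(filePath) {
  try {
    return await fs.readFile(filePath)
  } catch (err) {
    if (err.code === 'ENOENT') return null
    throw err
  }
}

class DocManager {
  constructor(filePath) {
    this.filePath = filePath
    this.snapshotPath = filePath + '.snapshot'
    this.updatesPath = filePath + '.updates'
    this.updateCount = 0
  }

  async save(doc, update) {
    // Append the update with a varint length prefix so load() can split the log
    const encoder = encoding.createEncoder()
    encoding.writeVarUint8Array(encoder, update)
    await fs.appendFile(this.updatesPath, Buffer.from(encoding.toUint8Array(encoder)))
    this.updateCount++

    // Snapshot every 1000 updates
    if (this.updateCount >= 1000) {
      // Write to a temp file and rename so a crash never leaves a half-written snapshot
      const tmpPath = this.snapshotPath + '.tmp'
      await fs.writeFile(tmpPath, Y.encodeStateAsUpdate(doc))
      await fs.rename(tmpPath, this.snapshotPath)
      // If we crash before this line, load() replays the old log on top of the
      // snapshot, which is harmless because applying a Yjs update twice is a no-op
      await fs.rm(this.updatesPath, { force: true })
      this.updateCount = 0
    }
  }

  async load() {
    const doc = new Y.Doc()

    // Load snapshot if it exists
    const snapshot = await readIfExists(this.snapshotPath)
    if (snapshot) Y.applyUpdate(doc, snapshot)

    // Apply recent updates: the log is a sequence of length-prefixed updates
    const updates = await readIfExists(this.updatesPath)
    if (updates) {
      const decoder = decoding.createDecoder(updates)
      while (decoding.hasContent(decoder)) {
        Y.applyUpdate(doc, decoding.readVarUint8Array(decoder))
        this.updateCount++ // keep counting from where the last session stopped
      }
    }

    return doc
  }
}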
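
The update log only works if every update is written with its length in front of it; otherwise there is no way to tell where one update ends and the next begins. If you are curious what that framing looks like without a library, here is a minimal sketch. The helper names are hypothetical and the varint layout (7 bits per byte, high bit as a continuation flag) is the common convention, not something specific to this project. With Yjs you would more likely use the lib0 encoding module that it already depends on.

```javascript
// Encode an unsigned integer as a varint: 7 bits per byte, low bits first,
// high bit set on every byte except the last.
function encodeVarUint(n) {
  const bytes = []
  while (n > 0x7f) {
    bytes.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return Uint8Array.from(bytes)
}

// Decode a varint at offset. Returns { value, size }, or null if the buffer
// ends in the middle of the number.
function decodeVarUint(buf, offset) {
  let value = 0
  let multiplier = 1
  let size = 0
  while (offset + size < buf.length) {
    const byte = buf[offset + size]
    size++
    value += (byte & 0x7f) * multiplier
    if (byte < 0x80) return { value, size }
    multiplier *= 128
  }
  return null
}

// Prefix one update with its length so it can be appended to a log file
function frameUpdate(update) {
  const prefix = encodeVarUint(update.length)
  const framed = new Uint8Array(prefix.length + update.length)
  framed.set(prefix, 0)
  framed.set(update, prefix.length)
  return framed
}

// Split a concatenated log back into individual updates. A truncated tail
// (for example from an interrupted write) is dropped instead of crashing.
function splitUpdates(log) {
  const updates = []
  let offset = 0
  while (offset < log.length) {
    const header = decodeVarUint(log, offset)
    if (!header) break
    offset += header.size
    if (offset + header.value > log.length) break
    updates.push(log.subarray(offset, offset + header.value))
    offset += header.value
  }
  return updates
}
```

Dropping a truncated tail is a design choice: the last update before a crash is lost, but everything before it still loads.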

The payoff

Here’s what you get with the vault pattern:

True offline-first: No sync server to be offline from
Debuggable: It’s just files. Use standard Unix tools
Portable: Copy the folder to move your data
Extensible: Third-party tools can read/write the format
Future-proof: Even if Yjs dies, your data is in documented formats

When not to use this

The vault pattern isn’t always right. Skip it if:

You need complex queries (use SQLite)
You have millions of items (use a real database)
You need ACID transactions (CRDTs are eventually consistent)
You’re building a web service (this is for personal tools)

Try it yourself

I extracted the core vault pattern into a library: vault-crdt. It handles the boring parts (snapshotting, corruption recovery) so you can focus on your app.

Or just steal the ideas. That’s why I’m writing this.

Personal tools should be personal. Your data should be yours. And a folder of files is the most personal, portable data format we’ve invented.

Time to stop renting databases for our own thoughts.