# Building a Knowledge Base That Speaks to Both Humans and AI

**Author:** Aaddrick Williams
**Date:** March 11, 2026
**URL:** https://nonconvexlabs.com/blog/building-a-knowledge-base-that-speaks-to-both-humans-and-ai

---

I started a Claude Code session to work on a client's GCP migration. Before I typed anything, the agent read yesterday's journal entry from my vault, found the note about the deployment that had failed a health check, traversed two hops through the knowledge graph to pull up the service agreement and the launch timeline, and asked whether I wanted to pick up where I'd left off.

I didn't paste anything. I didn't re-explain context. Same notes, same links, same search.

## What It Looks Like

Here's what both audiences get from the same 152 notes.

| | What the human gets | What the AI gets |
|---|---|---|
| **Notes** | Rendered markdown with syntax highlighting, Mermaid diagrams, clickable wikilinks. Broken links are styled distinctly so you see what's missing. | Raw content with frontmatter metadata, tags, visibility, and timestamps via `vault-read-note-tool`. |
| **Search** | A search bar with a semantic/text toggle. Type a concept, get results ranked by meaning. | `vault-semantic-search-tool` for meaning-based queries, `vault-search-tool` for substring matching. The tool descriptions tell the agent when to use each one. |
| **Graph** | An interactive D3 force-directed visualization. Nodes color-coded by folder, sized by connection count. Filter by folder or tag, zoom, pan, click for details. | `vault-neighborhood-tool` for multi-hop traversal, `vault-shortest-path-tool` to find how two notes connect, `vault-hub-notes-tool` for the most connected nodes, `vault-orphan-notes-tool` for isolated ones. |
| **Journal** | A calendar view. Click a date, read what happened. | Read `journal/2026-03-10` and know what was deployed, what broke, what decisions were made. All before the session starts. |
| **Writing** | A web form. Edit in the browser, see a rendered preview; version history tracks changes. | `vault-create-note-tool`, `vault-update-note-tool`, and `vault-edit-note-tool` for selective text replacement. Write meeting notes or investigation findings during a session. The next session finds them. |
| **Discovery** | Browse folders, filter by tag, follow wikilinks, explore the graph. | `vault-suggested-links-tool` uses embedding distance to surface notes that are semantically related but not yet linked. A recommendation engine for your own knowledge base. |

Sixteen MCP tools total. The vault launched with eleven tools and basic CRUD; within three days it had semantic search, five graph traversal tools, an interactive graph visualization, version history, and selective text editing. The whole system runs on Cloud SQL PostgreSQL 17 with pgvector. Zero new infrastructure. Embedding cost: $0 (Voyage AI's free tier).

The rest of this article is about the design decisions that make one data store serve both audiences.

## The Problem

Every conversation with an AI coding tool starts with amnesia. The tool knows nothing about your project history, your decisions, or what happened yesterday. `CLAUDE.md` and `.claude/` files help: they encode stable conventions. But they don't cover the knowledge that accumulates as you work. The database schema decision from last Tuesday. The meeting notes from a client call. The investigation that turned out to be a timezone issue. That knowledge lives in your head, or in a notes app your dev tools can't see. You end up as the bridge, copy-pasting context between two systems that don't talk to each other.

The common approaches each serve one audience. A dedicated vector store (Pinecone, Weaviate) optimizes for machine retrieval but gives humans nothing to browse. A wiki or Notion optimizes for human reading but gives AI tools nothing to query programmatically.
RAG pipelines bolt retrieval onto existing content but create a separate layer that diverges from how humans navigate the same information.

I wanted one system. One data store, one schema, two interfaces.

## Architecture Decisions

Three decisions shaped the design.

### PostgreSQL + pgvector, Not a Dedicated Vector DB

The site already runs on Cloud SQL (PostgreSQL 17). Every note (content, metadata, tags, version history, wikilinks, and embeddings) lives in the same database. Semantic search, graph traversal, access control, and full-text search all query the same rows. Adding semantic search meant adding a column, not a service.

pgvector is a PostgreSQL extension. You add a `vector(1024)` column to your existing table, create an IVFFlat index, and query with the `<=>` cosine distance operator. No separate vector store to sync. No ETL pipeline to keep two systems consistent.

```sql
ALTER TABLE vault_notes ADD COLUMN embedding vector(1024);

CREATE INDEX vault_notes_embedding_idx
ON vault_notes
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

The embeddings come from Voyage AI's `voyage-3.5` model (1024 dimensions), which has a free tier that covers my usage. Total infrastructure cost for semantic search: $0.

A dedicated vector database makes sense when you have millions of documents or need sub-millisecond retrieval. I have 152 notes. pgvector handles that without breaking a sweat.

### MCP as the AI Interface

The Model Context Protocol is an open standard for connecting AI tools to external data sources. Claude Code reads MCP server configurations natively: you point it at a server URL, it discovers the available tools, and it can call them during a session.

I could have built a custom tool that reads notes from a REST API. But MCP gives me something a custom tool doesn't: the tool descriptions are prompts.

```php
Tool::create('vault-semantic-search-tool', [
    'description' => 'Semantic search using AI embeddings for meaning-based matching. ' .
        'Finds conceptually related notes even when exact words differ. ' .
        'Use natural language queries (e.g. "how we handle authentication" ' .
        'rather than "auth"). For exact substring matching, use vault-search-tool instead.',
]);
```

That description is an instruction, not documentation. When Claude Code sees both `vault-search-tool` and `vault-semantic-search-tool`, the descriptions tell it when to use each one. The tool names, parameter descriptions, and behavioral guidance all load into the model's context as part of the MCP handshake. The AI reasons about the knowledge base before it makes a single call.

```mermaid
graph TD
    subgraph Data Layer
        PG[(PostgreSQL + pgvector)]
        VS[VaultService]
        PG --- VS
    end
    subgraph Human Interface
        WC[Web Controller] --> VS
        WEB[Browser UI]
        WEB --> WC
    end
    subgraph AI Interface
        MCP[MCP Tools] --> VS
        CC[Claude Code]
        CC --> MCP
    end
    subgraph Shared Capabilities
        VS --> WIKI[Wikilink Graph]
        VS --> SEM[Semantic Search]
        VS --> VER[Version History]
        VS --> TAGS[Tags & Metadata]
    end
```

### Wikilinks as a Free Knowledge Graph

Wikilinks (`[[note-path]]`) are a convention from Obsidian and other personal knowledge tools. You write `[[projects/ncl/roadmap]]` in a note, and it creates a link to that note. The link is bidirectional: the target note can discover all notes that link to it via backlinks.

In the database, wikilinks are stored in a `vault_links` table: `source_note_id`, `target_path`, `target_note_id`. A regex parser extracts them on every save. Unresolved links store `target_note_id = null` and get resolved automatically when a matching note is created later.

This is a graph. And PostgreSQL can traverse it.
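Before the graph can be traversed, it has to be populated. The save-time extraction is small enough to sketch; this is an illustration, not the vault's actual parser, and `extractWikilinks` is a hypothetical name:

```php
// A sketch of save-time wikilink extraction (illustrative, not the vault's
// actual code). Matches [[target]] and [[target|alias]], keeping the target.
function extractWikilinks(string $content): array
{
    preg_match_all('/\[\[([^\]|]+)(?:\|[^\]]+)?\]\]/', $content, $matches);

    // Dedupe and trim so each target path is stored once per note.
    return array_values(array_unique(array_map('trim', $matches[1])));
}

// extractWikilinks('See [[projects/ncl/roadmap]] and [[journal/2026-03-10|yesterday]].')
// returns ['projects/ncl/roadmap', 'journal/2026-03-10']
```

Each extracted path becomes a `vault_links` row; paths with no matching note are stored unresolved. With the table populated, the traversal itself is a recursive query.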
```sql
WITH RECURSIVE neighborhood AS (
    SELECT target_note_id AS note_id, 1 AS depth
    FROM vault_links
    WHERE source_note_id = :start
      AND target_note_id IS NOT NULL

    UNION

    SELECT vl.target_note_id, n.depth + 1
    FROM vault_links vl
    JOIN neighborhood n ON vl.source_note_id = n.note_id
    WHERE n.depth < :max_hops
      AND vl.target_note_id IS NOT NULL
)
SELECT note_id, MIN(depth) AS depth
FROM neighborhood
GROUP BY note_id;
```

A recursive CTE gives you multi-hop graph traversal with no new infrastructure. The same `vault_links` table powers the interactive D3 graph in the browser and the traversal tools the AI uses. The human sees nodes and edges; the agent gets structured results it can reason over. Same data, same query, different output.

### Frontmatter as the Contract Between Interfaces

Notes support optional YAML frontmatter (`title`, `tags`, `visibility` between `---` fences). Frontmatter values override API parameters. A note created by a human in the web editor and a note created by an AI via MCP produce the same format, and both interfaces parse the same metadata the same way. The format matches Obsidian conventions, so anyone who's used Obsidian already knows it.

Paths like `projects/ncl/roadmap` look like directories, but they're string columns in the database. Humans browse familiar folder hierarchies; the AI renames or reorganizes with a simple update, no filesystem to manage.

## Lessons From Production

The architecture was the easy part. Shipping it surfaced problems that no design document would have predicted.

**The scheduler debugging.** Cloud Run is serverless: containers boot on request and shut down when idle. Laravel's `schedule:run` needs to execute every minute; it checks internally which jobs are due. My first attempt: use Cloud Scheduler to hit an HTTP endpoint that triggers `schedule:run`. I pointed it at the wrong Cloud Run job, so the scheduler was calling the migration job instead. No errors, because the migration job ran successfully. It just wasn't running my scheduled tasks.
Cloud Scheduler showed green. My reindexing job never ran. I spent more time debugging this than I spent designing the entire data model.

The fix: a dedicated Cloud Run job (`scheduler`) that runs `php artisan schedule:run`, triggered every minute. It has to be every minute even on Cloud Run, because Laravel decides internally which jobs are due; run it less frequently and due jobs get skipped silently.

**Rate limiting the free tier.** Without rate limiting, the first reindex attempt tried to embed all 152 notes at once and Voyage AI throttled it. The fix: 10 notes per batch, with a 25-second delay between batches. The first run indexed 90 of 152 notes; the scheduler picked up the rest on subsequent runs. Boring, reliable, and free.

## Transferable Patterns

These patterns apply to any stack, not just Laravel or pgvector.

**Design your schema for both audiences from day one.** A note needs a title for display and a path for programmatic access. Tags need to be filterable in a UI and queryable via an API. Frontmatter that a human writes in YAML needs to be parseable by a service layer. Design for both from the start and you won't need a translation layer later.

**Expose the same service layer through multiple interfaces.** The vault has one service class (`VaultService`) with methods like `createNote()`, `searchNotes()`, `semanticSearch()`, and `getNeighborhood()`. The web controller calls these methods. The MCP tools call these methods. The controller and the MCP layer contain no business logic. When I added `editNote()` (selective text replacement), both interfaces got it automatically. One implementation, two interfaces, zero duplication.

**Wikilinks + embeddings = poor man's knowledge graph.** Wikilinks are explicit, human-authored connections. Embeddings are implicit, computed from meaning. Together they cover two kinds of relatedness without new infrastructure. A recursive CTE on the wikilinks table gives you graph traversal.
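The similarity half is a single query. A sketch of the shape, assuming the query text has already been embedded with the same `voyage-3.5` model and bound as `:query_embedding` (the `path` and `title` column names are assumptions):

```sql
-- Nearest notes by cosine distance; lower distance means more similar.
SELECT path, title, embedding <=> :query_embedding AS distance
FROM vault_notes
WHERE embedding IS NOT NULL
ORDER BY embedding <=> :query_embedding
LIMIT 10;
```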
Cosine distance on the embeddings column gives you similarity search. Both run on Postgres.

**MCP tool descriptions are prompts.** They shape how the AI uses your system. A description that says "use natural language queries" produces different behavior than one that says "search query string." A description that says "prefer semantic search by default" changes the agent's default tool selection. Write your tool descriptions like you're writing instructions for a capable but unfamiliar colleague. Because you are.

---

The vault has been running for a short time. It started as a data model and eleven tools, and it grew faster than I expected once the service layer was in place. If your team is losing time re-explaining decisions to AI tools that forget everything between sessions, or if you're maintaining separate systems for human knowledge and AI context, [I'd be happy to talk through the architecture](https://nonconvexlabs.com/contact).