Your File System Is Already A Graph Database

Original link: https://rumproarious.com/2026/04/04/your-file-system-is-already-a-graph-database/



Karpathy recently posted about using LLMs to build personal knowledge bases — collecting raw sources into a directory, having an LLM “compile” them into a wiki of interlinked markdown files, and viewing the whole thing in Obsidian. He followed it up with an “idea file,” a gist you can hand to your agent so it builds the system for you.

This is a great idea; I’ve been doing some form of it for over a decade. My Staff Eng co-host @davidnoelromas reached out after the tweet to ask for more details on how I’ve been using Obsidian and AI. This is an expanded version of what I told him.

I’ve collected possibly too many markdown files.

find . -type f | wc -l
52447

That’s my Obsidian vault, and I use it with AI every day without a special database, a vector store, or a RAG pipeline. It’s merely files on disk.

The problem this actually solves

Think about the context you carry around in your head for your job. The history of decisions on a project. What you discussed with your manager three months ago. The Slack thread where the team landed on an approach. The Google Doc someone shared in a meeting you half-remember. The slowly evolving understanding of how a system works that lives across fifteen people’s heads and nowhere else.

Now think about what happens when you need to produce something from all that context. A design doc. A perf packet. A project handoff. An onboarding guide for a new team member. You spend hours reassembling context from Slack, docs, emails, your own memory, and you still miss things.

The knowledge base turns this into a system instead of a scramble.

The architecture

A file system with markdown and wikilinks is already a graph database. Files are nodes. Wikilinks are semantic edges. Folders introduce taxonomy. You don’t need a special MCP server or plugin. The file system abstraction is the interface, and LLMs are surprisingly good at navigating it.

I use a structure borrowed from Tiago Forte’s Building a Second Brain, with the PARA taxonomy as a starting point, extended with categories that match how I actually work:

/projects/{name}
/areas/{topics}
/people/{slack_handle}
/daily/{year}/{month}/{day}/
/meetings/{year}/{month}/{day}/
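Bootstrapping the layout above is a one-time step. A minimal sketch, assuming a vault directory named `vault` (swap in your own Obsidian vault path):

```python
from pathlib import Path

# Hypothetical vault root; point this at your real Obsidian vault.
VAULT = Path("vault")

# The five top-level categories from the PARA-inspired layout above.
for top in ["projects", "areas", "people", "daily", "meetings"]:
    (VAULT / top).mkdir(parents=True, exist_ok=True)
```

The date-based subfolders under `daily/` and `meetings/` can be created lazily as notes arrive.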

Markdown files are nodes, wikilinks ([[target]]) are edges, the folder taxonomy is the schema, and the LLM is the query engine. A graph database with a natural language query interface. No infrastructure required.
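The nodes-and-edges claim is concrete enough to sketch. The following is an illustrative toy, not the author's tooling: a regex pulls wikilink targets out of each note, yielding an adjacency map over the vault. The note names and link targets are made up for the example:

```python
import re
from typing import Dict, Set

# Capture the wikilink target, stopping before any |alias or #heading suffix.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def edges(note_text: str) -> Set[str]:
    """Extract wikilink targets from one markdown note."""
    return {m.strip() for m in WIKILINK.findall(note_text)}

# Toy vault: two notes that link to each other and to a person.
notes = {
    "meetings/2024/05/01": "Synced with [[/people/jp|jp]] about [[/projects/rate-limiter]].",
    "projects/rate-limiter": "Approach chosen in [[/meetings/2024/05/01]].",
}
graph: Dict[str, Set[str]] = {name: edges(text) for name, text in notes.items()}
```

Every graph query an LLM runs against the vault reduces to walks like this, done with ordinary file reads.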

How it works day to day

After every meeting, the agent creates a note in daily/{year}/{month}/{day}/, downloads any attached Google Docs, and links everything to the long-running notes I keep for each person I interact with regularly. A note from a 1:1 with my boss JP gets a wikilink to [[/people/jp|jp]] and to whatever projects we discussed.
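The note-creation step the agent performs is simple enough to sketch by hand. A minimal version, with a fixed date and invented attendee/project names for illustration:

```python
from datetime import date
from pathlib import Path

def meeting_note(vault: Path, title: str, attendees: list[str], projects: list[str]) -> Path:
    """Create a dated meeting note that wikilinks each attendee and project."""
    d = date(2024, 5, 1)  # fixed date so the example is reproducible
    folder = vault / "daily" / f"{d.year}" / f"{d.month:02}" / f"{d.day:02}"
    folder.mkdir(parents=True, exist_ok=True)
    links = [f"[[/people/{a}|{a}]]" for a in attendees]
    links += [f"[[/projects/{p}]]" for p in projects]
    path = folder / f"{title}.md"
    path.write_text(f"# {title}\n\nAttendees and topics: " + ", ".join(links) + "\n")
    return path

note = meeting_note(Path("vault"), "1-1-jp", ["jp"], ["rate-limiter"])
```

Because the links use stable paths like `/people/jp`, the same person note accumulates backlinks from every meeting over time.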

Over months, each person’s note becomes a timeline of every conversation, decision, and open thread. Each project folder accumulates every relevant artifact. You don’t have to remember where things are. The graph remembers.

For a work project, I can point the agent at a starting doc and say: “Spider through every tool you have access to and pull down all the related context.” It grabs Slack threads, Google Docs, web resources, all rendered as markdown inside the project folder. From that assembled context, the agent can draft design docs, product vision statements, problem/solution analyses. The output is better than prompting cold because the LLM is working with the real history of the project, not your summary of it.
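Once the artifacts live in a project folder, "assembled context" is just concatenated files. A sketch of that step, under the assumption that everything has already been rendered to markdown (the folder and file names are invented):

```python
from pathlib import Path

def assemble_context(project_dir: Path, max_chars: int = 100_000) -> str:
    """Concatenate every markdown file under a project folder into one prompt block."""
    parts = []
    for md in sorted(project_dir.rglob("*.md")):
        parts.append(f"--- {md.relative_to(project_dir)} ---\n{md.read_text()}")
    return "\n\n".join(parts)[:max_chars]

# Example: a project folder with one design note.
proj = Path("vault/projects/rate-limiter")
proj.mkdir(parents=True, exist_ok=True)
(proj / "design.md").write_text("Token bucket, 100 rps per client.")
context = assemble_context(proj)
```

A real agent would be smarter about ordering and truncation, but the point stands: the "context engineering" layer is plain file I/O.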

This is the part Karpathy’s tweet hints at but doesn’t fully spell out: the knowledge base isn’t just for research. It’s a context engineering system. You’re building the exact input your LLM needs to do useful work.

What makes this different from just using an LLM

You might be thinking: I already ask Claude to help me write a design doc. True. But there’s a real difference between prompting “help me write a design doc for a rate limiting service” and prompting an LLM that has access to your project folder with six months of meeting notes, three prior design docs, the Slack thread where the team debated the approach, and your notes on the existing architecture.

The knowledge base is a context engineering system. You’re not building a wiki for the sake of having a wiki. You’re building the input layer that makes every future LLM interaction better. Every meeting note, every linked decision, every filed artifact improves the quality of every query that follows.

Where this is still hard

The piece I haven’t cracked is automated inbox processing. The idea is straightforward: web clippings, meeting notes, Slack saves, and random captures all land in an inbox folder. The agent processes everything new, applies progressive summarization, breaks content into atomic pieces, correlates each piece with the right project, area, or person.

I have a graveyard of experiments here. The LLM is good at summarizing and categorizing. The hard part is defining what “processed” means in a way that’s consistent enough to be useful six months later but flexible enough to handle the variety of stuff that lands in an inbox. Every attempt has been either too rigid (everything gets the same treatment) or too loose (the vault drifts into chaos).
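For what it's worth, the mechanical half of inbox processing (routing a note once it has been categorized) is easy; it's the classifier that's hard. A sketch where the LLM step is stubbed out with a trivial rule, to show the shape of the pipeline. Everything here is hypothetical, not a working solution to the problem described above:

```python
from pathlib import Path

def route(inbox: Path, vault: Path, classify) -> list[Path]:
    """Move each inbox note into the folder `classify` assigns it.
    `classify` stands in for the hard part: an LLM call returning a
    destination like 'projects/rate-limiter'."""
    placed = []
    for note in sorted(inbox.glob("*.md")):
        dest = vault / classify(note.read_text()) / note.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        note.rename(dest)
        placed.append(dest)
    return placed

# Stub classifier in place of the LLM step.
inbox = Path("vault/inbox")
inbox.mkdir(parents=True, exist_ok=True)
(inbox / "clip.md").write_text("Notes on rate limiting approaches.")
placed = route(inbox, Path("vault"),
               lambda text: "projects/rate-limiter" if "rate limit" in text else "areas/misc")
```

The unsolved part is everything the stub hides: consistent categories, atomic splitting, and summarization that still makes sense months later.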

If you’ve solved this, I’d genuinely like to hear about it.

Getting started

You don’t need 52,000 files to get value from this. Start with three things:

One: Create the folder structure. Projects, areas, people, daily. Even empty, the taxonomy gives you and the LLM a schema.

Two: After your next meeting, have the agent create a note and link it to the relevant person and project. Do this for a week. Watch the graph start to form.

Three: The next time you need to write something (a design doc, a status update, a perf self-review), point the agent at the relevant folders and ask it to draft from what’s there.

The difference is noticeable right away. Not because the LLM is smarter, but because it finally has the context to be useful.

Your work compounds. That’s the thing that feels genuinely new.
