I made an LLM agent build a spoiler-free wiki for a novel: the reader selects the latest chapter they've read and then sees pages written using only the contents of the book up to that chapter.
I've wanted something like this since I read The Wheel Of Time a decade ago. Reading epic fantasy and other long novels means keeping track of a lot of characters, places, and so on, sometimes over multiple years. But in all that time you can't look anything up in the various fan-built resources, because spoilers will be present.
A couple of years ago I started trying to build this with a fixed pipeline: named entity resolution followed by individual page updates. This time, instead of prescribing exactly how a chapter should be processed, and mostly as an excuse to have some fun, I gave the LLM a bunch of tools plus the ability to start sub-agents, and let it figure out on its own how to update the whole wiki for a given chapter. It was moderately successful! You can see an example of the output for Anathem by Neal Stephenson at Avout Archive.
Each chapter starts an agent with the system prompt and chapter prompt. While that chapter is being processed, all wiki pages written are considered part of that chapter, and the agent only has access to the text of that chapter and previous chapters. All the links are processed afterwards to point at other pages as of the same chapter. Once the agent returns control to the user, the current chapter in the system is advanced and a new agent is started for the next chapter.
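Roughly, the outer loop looks like this. It's a sketch with made-up names for the agent runner and database helpers, not the real functions from the repo (which is linked further down):

```python
# Sketch of the per-chapter loop; run_agent, the prompts, and the db helpers
# are hypothetical stand-ins, not the actual names from the repository.
def process_book(db, book, run_agent, system_prompt, chapter_prompt, tools):
    for chapter in book.chapters:                  # chapters are processed in order
        db.set_current_chapter(chapter.number)     # pages written now belong to this chapter
        run_agent(                                 # runs until the agent returns control
            system_prompt=system_prompt,
            user_prompt=chapter_prompt.format(chapter=chapter.number),
            tools=tools,                           # the nine tools described below
        )
        db.resolve_links(chapter.number)           # re-point links to pages as of this chapter
```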
The agent has access to the following nine tools:
ReadChapter: Responds with the text of the requested chapter. Originally I had it take an identifier for the chapter as input, but those ids were zero-indexed, and chapters in the book were one-indexed, and that confused the LLM a lot. I changed the tool to take an offset from the current chapter in case the LLM wanted to read a previous chapter again. That was entirely unnecessary - it never read any chapter other than the current one.
WriteWikiPage: Lets the agent rewrite a whole wiki page. The agent can omit fields to preserve their old values, but notably, there's no way to edit just part of a page's body. The title and body are self-explanatory. The summary is what's shown in search results. The names field is a list of names the agent associates with the page, used by the search tool. There's also a parameter that lets the agent delete the page and redirect all links currently pointing to it to some other page.
ReadWikiPage: Shows the agent the current contents of a specified wiki page.
SearchWikiByName: Given a list of names for the same entity, each name is compared against all the names registered on pages throughout the wiki with rapidfuzz, and the results for each query name are combined with reciprocal rank fusion. The top results are then returned to the agent along with their summaries. I probably should have implemented search as plain full-text search over the wiki, but fuzzy matching plus RRF was a holdover from how I had been thinking about doing this before deciding to have everything done by an agent. (There's a rough sketch of this search just after the tool list.)
SpawnAgent: Starts a new agent with the system prompt and whatever prompt this agent specified; more on that below. When that agent returns control to the user, the contents of the last assistant text message are given as the response to this tool.
WritePrompt: The prompts that SpawnAgent can use have to be stored in the database rather than written from scratch each time. Each prompt has a key, a short summary, and the full template. I used Python template strings as the format.
ListPrompts: Shows the agent all prompt keys in the system along with their summary.
ShowPrompt: Shows the agent the details for a given prompt key.
RequestExpertFeedback: This tool gives the LLM a place to write any questions, and then I can fill in the tool response with whatever I want. This slowed down processing a lot because inevitably it would ask a question as soon as I went to sleep, but gave some insight into common problems it encountered.
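For the curious, here's a rough sketch of the fuzzy-match-plus-RRF search behind SearchWikiByName. The page layout and function shape are simplified stand-ins rather than the actual implementation; only the rapidfuzz scoring and reciprocal rank fusion come from the description above.

```python
from collections import defaultdict
from rapidfuzz import fuzz

def search_wiki_by_name(query_names, pages, k=60, limit=5):
    """Sketch: pages is assumed to be a list of dicts with 'slug', 'summary', 'names'."""
    scores = defaultdict(float)
    for query in query_names:
        # Rank every page by its best fuzzy match against this query name.
        ranked = sorted(
            pages,
            key=lambda page: max(
                (fuzz.WRatio(query, name) for name in page["names"]), default=0.0
            ),
            reverse=True,
        )
        # Reciprocal rank fusion: each query contributes 1 / (k + rank) to a page's score.
        for rank, page in enumerate(ranked, start=1):
            scores[page["slug"]] += 1.0 / (k + rank)
    top = sorted(scores, key=scores.get, reverse=True)[:limit]
    by_slug = {page["slug"]: page for page in pages}
    return [(slug, by_slug[slug]["summary"]) for slug in top]
```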
I implemented context compression because I was worried the context windows would get too large, but that never happened. The largest chat was 150k tokens of the 400k limit for GPT-5.
Letting the whole process be guided by the LLM was pretty cool because it found cases I hadn't thought about yet. Notably in an early test run, it started using the search tool to make sure links were using valid url slugs. I added link validation to WriteWikiPage after that so it wouldn't burn tokens on such a simple check.
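The validation itself is simple. Here's a minimal sketch of that kind of check, assuming markdown-style [text](slug) links purely for illustration:

```python
import re

# Hypothetical sketch of the link validation added to WriteWikiPage: every
# [text](slug) link in the body must point at an existing page slug.
LINK_RE = re.compile(r"\[[^\]]+\]\(([^)]+)\)")

def invalid_links(body: str, existing_slugs: set[str]) -> list[str]:
    """Return the link targets in the body that don't resolve to any wiki page."""
    return [slug for slug in LINK_RE.findall(body) if slug not in existing_slugs]
```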
The code for generating a wiki is on GitHub. Two good entry points for reading the code are the main loop and the SQL schema.
Claude Code was used a lot in making this which was pretty fun. A few things that stood out about working with it for me:
- It wrote the fuzzy string matching with RRF across multiple queries in basically one shot. That was awesome.
- It regularly made a change I requested, saw tests fail, decided I must not have wanted what I asked for, and reverted the change. That was not awesome.
I ran the system multiple times: first, many times on a small demo book I had ChatGPT write for me during development; then a few times on a small novel; and finally on Anathem. Here's a review of that final run on Anathem.
To generate Avout Archive, I used GPT-5 with high reasoning, high verbosity, and the flex service tier. In total, it used 575 million input tokens and produced 40 million output tokens, for a total cost of $282. That took about four days, though it regularly sat idle waiting for feedback from me.
In total there were 45,158 tool calls and 4,262 individual agents. The distribution of each agent's context size, in tokens, at completion was:
- Max: 156,688
- 90%: 42,442
- 50%: 29,750
- Min: 10,271
The system used the RequestExpertFeedback tool 36 times. Here's a summary of the questions it asked:
- Seven times it was unsure if something should be one page or multiple pages. These were great questions! This is what I had hoped it would use the tool for.
- Nine times it got confused by WriteWikiPage modifying the names that it gave a page. Notably, I stripped any leading 'the' from the names to avoid duplicates, but it didn't like that the names changed.
- Nine times it got confused by absurdly bad rules it had put into its own prompts. Notable cases: making up rules for the URL slugs and worrying about moving them around; deciding at one point that two pages should never share any names; and artificially limiting the number of links allowed on a page, then not being sure what to do when updating pages with a lot of links.
- Four times it got confused because it had asked named entity resolution to provide a type for each entity, and the entity processor then discovered that type was incorrect.
- Three times, when spawning a new agent, the parent agent provided incomplete input data with an ellipsis at the end.
- One time it didn't actually ask for feedback; it politely told me it was annoyed by my automated reminders to use parallel tool calls.
- Three times it got confused because the prompt template ended up looking like this:
## Inputs
- $entity_display_name: Human-friendly display used for the entity in the chapter
...
But after values were applied, the template looked like this:
## Inputs
- Orth: Human-friendly display used for the entity in the chapter
...
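Because prompts are Python template strings, that confusing substitution is easy to reproduce; the placeholder name here is taken from the example above:

```python
from string import Template

# The "## Inputs" section documenting the placeholder is itself part of the
# template, so substitution rewrites the documentation line too.
template = Template(
    "## Inputs\n"
    "- $entity_display_name: Human-friendly display used for the entity in the chapter\n"
)
print(template.substitute(entity_display_name="Orth"))
# ## Inputs
# - Orth: Human-friendly display used for the entity in the chapter
```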
Here are the summaries of the final prompts the system wrote. You can see all the prompt versions on GitHub.
- Reads the latest started chapter and returns a strict JSON array of wiki‑worthy entities with attested aliases plus drop‑article variants, ASCII fallbacks, and singular/plural where appropriate; cross‑checks existing pages to unify nickname/formal aliases; excludes generic one‑offs; no commentary, no types.
- Processes one entity from the current chapter: search or create/update its wiki page with spoiler-free facts; STRICT names hygiene with fail‑closed enforcement (title string, display name, drop‑article base, ASCII fallbacks, singular/plural when attested), exclude non‑names (possessives/institutions) from Known names, robust discovery that prefers existing pages, duplicate merge via redirect, validated linking with base unarticled display text; concise quick‑reference focus.
- Dispatches per-entity processors in parallel for the current chapter’s NER entities and aggregates reports with coverage/duplicate checks; HARD names-hygiene enforcement; cross‑slug duplicate detection via overlapping aliases with optional merge via redirect; link‑display consistency notes; avoids slug canonicalization.
- Summarizes the latest started chapter and saves it to 'chapter-summary' with a concise overview and factual bullets; validates links, prefers base unarticled link text (including named events), avoids meta and composite slugs; uses ASCII display text for link display where diacritics/curly punctuation would otherwise appear.
- Checks that all entities from NER (no types) received valid processor reports; flags gaps/duplicates; returns a concise audit report without any slug canonicalization advice.
- Performs a focused quality check on a sample of pages updated/created this round and proposes prompt improvements without enforcing link-count limits or slug canonicalization.
- Reads an existing page by slug, creates a canonical page with a preferred slug/title and merged aliases, then redirects the old slug to the new one. Ensures links are valid and content remains spoiler-free.
- Reads a specific wiki page and applies targeted quality fixes, including Names hygiene, link hygiene, bullet normalization, and now optional encoding sanitization and summary text replacements.
- Reads a wiki page and applies safe, literal spelling corrections to the Body based on a provided replacements map, then updates the page without adding spoilers.
- Scans a page and removes links where a named feature slug is used generically (based on cue keywords), replacing with plain lower-case text.
- The summary of chapter five is pretty funny.
- It was pretty inconsistent with capitalization and definite articles in page titles, e.g., “suitsack”, “Farspark”, “the Fall of Baz”.
- I really should have added a way to manually start conversations that run separately from the main process, so I could ask it to fix things that were clearly problems. When it surfaced problems with RequestExpertFeedback, I would regularly tell it to fix its prompts, but it didn't want to because it was busy completing some other task, and my attempts to convince it that expert feedback should be prioritized more highly didn't work super well.
- If you're genuinely curious to see the full logs from the run, email me and I'll get you a version with the book text redacted.