AI that reads your documents
Ask a document any question — get a grounded answer
- 1 Go to Downloads (curriculum.32dots.de/share) and download 'Session 3 — AI that reads your documents'.
- 2 In n8n: click the ⋯ menu → Import from file. Select the downloaded JSON.
- 3 The workflow opens. Click 'Chat' (bottom right) to open the chat panel.
- 4 Ask: 'What is a transformer architecture?'
- 5 Ask: 'What is the difference between a language model and an embedding model?'
- 6 Ask: 'Who invented the Higgs boson?' (Not in the document — notice what happens.)
- 7 Open the Document URL node and change the URL to any Wikipedia page about your research topic.
You got grounded answers for the first two questions and a correct refusal (or honest 'not in document') for the third.
How document QA works in n8n
This pattern — fetch a document, stuff it into context, answer from it — works for any public URL. The limit is the model's context window: this workflow truncates the page to roughly 12,000 characters, so anything past that cut-off is invisible to the AI. For longer documents you need chunking and retrieval (RAG); Session 14 covers that.
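For orientation, here is a minimal sketch of what a Prepare Context step like this might do, written in the style of an n8n Code node. The use of a Code node and the field names (data, context, question) are assumptions; the imported workflow may wire this differently.

```javascript
// Sketch only: assumes Prepare Context is an n8n Code node and that the
// fetched page body arrives in a `data` field (field names may differ).
const pageText = $input.first().json.data || '';

// The question is pulled from the chat trigger node by name, because the item
// flowing into this node is the fetched page, not the chat message.
const question = $('When chat message received').first().json.chatInput;

// Truncate so the document fits the model's context window (~12,000 chars here).
const context = pageText.substring(0, 12000);

// Hand both to the AI Agent; its system prompt tells it to answer only from `context`.
return [{ json: { context, question } }];
```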
- ?What happens when you ask about something not in the document? Is the refusal consistent?
- ?Open the Prepare Context node. Where does $('When chat message received').first().json.chatInput appear — and why not just $json.chatInput? (A hint follows these questions.)
- ?What would break if you removed the Simple Memory node?
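As a hint for the second question: in n8n, $json refers to the item flowing into the current node, while $('Node Name') reaches back to a specific node's output by name. A minimal illustration, assuming the page body sits in a data field:

```javascript
// $json is whatever flows INTO the current node. Here the HTTP fetch sits
// directly upstream, so $json holds the fetched page, not the chat message.
const page = $json.data;   // assumption: the page body field is `data`

// $('Node Name') addresses a specific node's output by name, regardless of
// what is directly upstream, which is how Prepare Context recovers the question.
const question = $('When chat message received').first().json.chatInput;
```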
Point the workflow at your own document
Change the Document URL to a Wikipedia article, a PubMed abstract, or any public page in your field. Verify the AI answers correctly and refuses questions outside the document.
- 1 Open the Document URL node. Paste a URL for a page relevant to your research.
- 2 Open the chat. Ask three questions: one clearly answered by the page, one at the edge, one definitely outside.
- 3 In the Prepare Context node, change substring(0, 12000) to a larger or smaller value and test how this affects response quality (see the sketch after this list).
- 4 Change the system prompt in the AI Agent to add a citation format: 'Always end your answer with: Source: [section name]'.
- 5 Try two different types of document (e.g. a Wikipedia article and a PubMed abstract). Which one does the AI answer from more reliably, and why?
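For step 3, one way to make the limit easy to experiment with is to pull it into a constant at the top of the Prepare Context code. This is a sketch under the same assumptions as above (Code node, page body in a data field):

```javascript
// Try a few values and compare answer quality and failure modes.
const MAX_CHARS = 12000;   // e.g. 4000, 8000, 20000

const pageText = $input.first().json.data || '';
const question = $('When chat message received').first().json.chatInput;

return [{ json: { context: pageText.substring(0, MAX_CHARS), question } }];
```

For step 4, the citation instruction belongs in the AI Agent's system prompt, not here; note that the model can only cite section names that actually appear in the truncated context.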
Share a screenshot of three test questions (one inside, one edge, one outside) with a one-sentence explanation of why the edge case worked or failed.
✎This workflow always reads the full page on every question. What are the tradeoffs compared to chunking the document once and storing it in a vector database?
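To make the reflection concrete, here is a rough, illustrative sketch of the chunk-once approach (the RAG pattern Session 14 covers). Nothing here exists in the current workflow, and the embedding step is only a placeholder.

```javascript
// Illustrative only: split the document once into fixed-size chunks.
function chunkText(text, size = 1000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.substring(i, i + size));
  }
  return chunks;
}

// In a RAG setup each chunk is embedded once and stored in a vector database;
// at question time only the most similar chunks are retrieved, instead of
// re-fetching and truncating the whole page on every question.
// const vectors = await embedAll(chunkText(pageText));   // hypothetical helper
```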