Build a grounded chatbot over your own PDF

USE 0 - 12 min

Chat with a paper from your own field in under 15 minutes

Dify is an open-source studio for building production-ready LLM apps (RAG chatbots, agents, and multi-step workflows), assembled on a visual drag-and-drop canvas rather than written in code. The fastest way to feel what it does is to upload a single PDF and ask a question about it: Dify chunks and embeds the document into a managed knowledge base, then answers with a citation pulled straight from your file. For personal use, work on the Dify Cloud Sandbox (free) or a Docker self-host; the course instance at dify.32dots.de is currently paused.

1 Open the Dify Studio. On Dify Cloud the Sandbox (Free) tier costs $0 and gives you 200 message credits, 50 MB of knowledge storage, and 50 knowledge documents — plenty for this lesson. (Self-hosting via Docker needs roughly 4 GB of RAM.)
2 Click Create App → Chatbot. Chatbot is one of several Dify app types, alongside Agent and Workflow. Give it a name.
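3 Open the Knowledge tab and upload a PDF; a paper from your own field works best. Dify chunks the document and embeds it into a managed knowledge base for grounded Q&A. Then attach that knowledge base to your chatbot as its context, so the app has something to retrieve from.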
4 Hit Publish. Dify gives you a public chat URL (and an embeddable widget) for the app you just made.
5 Ask the deployed chatbot: "What is the main finding of this paper?" Read the answer: it should quote your document and show a source citation.
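
Once the app is published, the same question can also be asked from a script. The sketch below only builds the HTTP request and never sends it. It assumes the app is reachable through a chat-messages endpoint under https://api.dify.ai/v1 and that you have created an API key for the app. The key shown is a placeholder, and the helper name and user id are hypothetical, so check the API page of your own instance before relying on any of it.

```python
import json
import urllib.request

# Assumption: Dify Cloud base URL; a self-hosted instance would use its own host.
BASE_URL = "https://api.dify.ai/v1"


def build_chat_request(api_key, question, user_id="lesson-student"):
    """Build (but do not send) a POST request that asks the chatbot one question."""
    payload = {
        "inputs": {},                  # no extra app variables in this lesson
        "query": question,             # the question typed into the chat box
        "response_mode": "blocking",   # wait for the full answer, no streaming
        "user": user_id,               # hypothetical id for the person asking
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key: replace it with the key generated for your own app.
request = build_chat_request("app-your-key-here", "What is the main finding of this paper?")
print(request.get_method(), request.full_url)
print(json.loads(request.data)["query"])
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out here so the sketch runs without a key or a network connection.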

✓

Your published chatbot answered a question about your PDF and backed the answer with a citation from the document you uploaded.

UNDERSTAND 12 - 22 min

What RAG is doing under the hood

Your chatbot did not 'know' the paper — it retrieved relevant chunks of it at question time and fed them to the model. That pattern is RAG (retrieval-augmented generation), and it is what makes the answer grounded and citable instead of a confident guess.

Key concept

Dify's built-in RAG pipeline turns your uploaded files into a managed knowledge base: it splits each document into chunks, embeds them, and at question time retrieves the chunks most relevant to your prompt before the model writes its answer. The citation you saw is the chunk it retrieved. This is why the same chatbot, pointed at a different document, gives different grounded answers without any code change.
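
The chunk, embed, retrieve sequence can be sketched in a few lines of Python. This is a toy illustration, not Dify's implementation: it assumes sentence-sized chunks and uses plain word counts in place of a real embedding model, and the three-sentence "paper" is made up for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Split text into sentence-sized chunks (a stand-in for real chunking rules)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text):
    """Toy embedding: lowercase word counts instead of a neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


# Made-up three-sentence "paper", for illustration only.
document = (
    "We measured soil moisture across forty plots over two growing seasons. "
    "The main finding is that cover crops raised water retention in sandy soil. "
    "Funding was provided by a regional agricultural grant."
)
question = "What is the main finding of this paper?"

chunks = split_into_chunks(document)
context = retrieve(question, chunks)[0]

# The retrieved chunk is placed in the prompt, so the model answers from it.
prompt = f"Answer using only this source:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Swapping in a different document changes which chunk is retrieved and therefore what the model is shown, while the code itself stays the same.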

? If the chatbot answered well, which sentence in your PDF do you think it retrieved? Open the cited source to check.
? What would happen if you asked a question whose answer is NOT in the document? Try it and watch how the model behaves.
? On the Sandbox tier you have 200 message credits and 50 knowledge documents. Which limit would you hit first in a real classroom?
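
For the last question, a back-of-the-envelope calculation helps. The sketch below uses the Sandbox limits quoted in this lesson and assumes one message credit per question. The real cost per message may depend on the model, and the class numbers are hypothetical, so substitute your own.

```python
# Sandbox limits as stated in this lesson.
MESSAGE_CREDITS = 200
KNOWLEDGE_DOCUMENTS = 50


def classroom_usage(students, questions_each, pdfs_each, credits_per_question=1):
    """Estimate quota use for one class session and flag any limit exceeded."""
    credits_needed = students * questions_each * credits_per_question
    documents_needed = students * pdfs_each
    return {
        "credits_needed": credits_needed,
        "documents_needed": documents_needed,
        "credits_exceeded": credits_needed > MESSAGE_CREDITS,
        "documents_exceeded": documents_needed > KNOWLEDGE_DOCUMENTS,
    }


# Hypothetical class: 25 students, 10 questions and 1 PDF each.
usage = classroom_usage(students=25, questions_each=10, pdfs_each=1)
print(usage)
```

With these made-up numbers the class needs 250 credits against a limit of 200, while its 25 documents stay under the limit of 50.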

BUILD 22 - 30 min

Make the chatbot answer in your own voice

A grounded answer is only useful if it lands the way your reader needs. The chatbot's behaviour is set by its instructions, not by code — so you can reshape it in plain language.

Your task

Edit the chatbot's instructions so it answers as a concise study aid, then re-ask your question and confirm it still cites the source.

1 Open your chatbot's prompt/instructions field in the Studio.
2 Write an instruction such as: "Answer in three short bullet points aimed at a first-year student, and always name the source."
3 Re-publish, then ask "What is the main finding of this paper?" again.
4 Confirm the format changed AND a citation is still present — grounding should survive the rewrite.
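
The check in the last step can also be done mechanically once you have copied an answer out of the chat. The helper below is a hypothetical sketch: it assumes bullets start with "-", "*", "•" or a number, and that the citation mentions the file name somewhere in the answer. The sample answer and file name are made up for illustration.

```python
import re


def check_answer(answer, source_name, expected_bullets=3):
    """Check that an answer has the expected bullet count and names the source."""
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    # Count lines that start with a bullet marker or a list number.
    bullets = [line for line in lines if re.match(r"^([-*•]|\d+[.)])\s+", line)]
    return {
        "format_ok": len(bullets) == expected_bullets,
        "cites_source": source_name.lower() in answer.lower(),
    }


# Made-up answer text, pasted in by hand for illustration.
sample_answer = """- Cover crops raised water retention.
- The effect was strongest in sandy soil.
- Results held across two growing seasons.
Source: soil_paper.pdf"""

result = check_answer(sample_answer, "soil_paper.pdf")
print(result)
```

Both checks need to pass: an answer in the right format that no longer names the source means the new instruction cost you the grounding.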

Deliverable

A published chatbot whose answers follow your instruction format while still citing the uploaded paper.