32dots HEIDELBERG AI
Session 8

Multi-stage literature pipeline

medium
USE 0 - 15 min

Run a five-stage literature pipeline — query to comparison table

  1. Go to Downloads (curriculum.32dots.de/share) and download 'Session 8 — Multi-stage literature pipeline'.
  2. In n8n: ⋯ → Import from file. Open the chat panel.
  3. Type: 'mTOR inhibitor resistance mechanisms in breast cancer'.
  4. Wait — the pipeline runs 5 stages (watch the execution log on the right as each node lights up).
  5. Read the Markdown table in the response. Check: does the AI correctly identify methods and limitations?
  6. Run again with your own research topic.
  7. Click into the 'Stage 3 — AI Extract' node in the execution log. Read the raw JSON it returned.

You see a comparison table with at least 3-4 papers. You can identify which stage is Stage 3 (AI extraction) and describe what it does.

UNDERSTAND 15 - 60 min

Five-stage pipeline design

🔎 Stage 1 — Search: Code node + PubMed esearch
📥 Stage 2 — Fetch: Code node + PubMed efetch
🧩 Stage 3 — AI Extract: AI Agent (typeVersion 1.7) — structured extraction
📊 Stage 4+5 — Filter and Format: Code node — JSON parsing, filtering, Markdown formatting
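To make the last stages concrete, here is a hypothetical, simplified sketch of the kind of logic a Stage 4+5 Code node performs: drop records where extraction failed, then format the survivors as a Markdown table. Function and field names (`filterAndFormat`, `title`, `methods`, `limitations`) are illustrative, not the workflow's actual code.

```javascript
// Hypothetical sketch of Stage 4+5: filter extracted records, then
// format the survivors as a Markdown comparison table.
// Field names (title, methods, limitations) are illustrative.

function filterAndFormat(papers) {
  // Stage 4: drop records where extraction failed or a required field is missing
  const valid = papers.filter(p => p && p.title && p.methods);

  // Stage 5: build a Markdown table from the remaining records
  const header = '| Title | Methods | Limitations |\n|---|---|---|';
  const rows = valid.map(p =>
    `| ${p.title} | ${p.methods} | ${p.limitations || 'n/a'} |`
  );
  return [header, ...rows].join('\n');
}
```

Because each stage is a separate node, this formatting logic can be tested on hand-written sample records before any real abstracts flow through it.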
Key concept

Keep each stage's responsibility obvious and testable. You can copy one abstract into the Stage 3 AI Agent and run it alone to check extraction quality. Separation of concerns is not just good engineering — it is good scientific workflow design.
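The same "obvious and testable" idea applies to Stage 1. The sketch below shows the kind of logic its Code node performs — turning a chat query into a PubMed esearch URL — using the standard NCBI E-utilities endpoint; the `retmax` and `retmode` values are assumptions, not necessarily the workflow's settings.

```javascript
// Sketch of Stage 1's responsibility: chat query in, esearch URL out.
// Endpoint is the standard NCBI E-utilities esearch; retmax/retmode
// values here are assumptions.

function buildEsearchUrl(query, retmax = 5) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
  const params = new URLSearchParams({
    db: 'pubmed',
    term: query,
    retmax: String(retmax),
    retmode: 'json',
  });
  return `${base}?${params}`;
}
```

A pure function like this can be checked in isolation — no HTTP call needed — which is exactly the testability the stage split buys you.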

  1. What happens if the AI returns slightly malformed JSON in Stage 3? Open the Code node and find where this is handled.
  2. The extraction prompt asks for 5 fields. What happens if an abstract does not mention methodology? Is the result filtered out?
  3. How would you extend Stage 5 to also produce a BibTeX citation file alongside the Markdown table?
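Question 1 is about malformed model output. A common defensive pattern — sketched below as a generic example, not necessarily how this workflow handles it — is to strip any Markdown code fences the model may wrap around its JSON, then try/catch the parse and return null on failure so a later stage can filter the record out instead of crashing.

```javascript
// Generic defensive pattern for parsing LLM output that should be JSON:
// strip Markdown code fences the model may add, then try/catch the parse.
// Returning null lets a later stage filter the record out instead of crashing.

function safeParseExtraction(raw) {
  if (typeof raw !== 'string') return null;
  // Remove leading ```json / trailing ``` fences if present
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```$/, '')
    .trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    return null; // malformed JSON → signal failure, don't throw
  }
}
```

Compare this pattern with what you find in the workflow's Code node — the key design question is whether a bad record kills the run or is quietly dropped.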
BUILD 60 - 90 min

Add a sixth stage: citation counts

After Stage 2 (Fetch), add a Semantic Scholar API call that retrieves citation counts, then incorporate them into the Stage 5 table.

  1. After Stage 2 — Fetch (PubMed efetch), add an HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}?fields=citationCount — start with one PMID ($('Stage 2 — Fetch').first().json.ids.split(',')[0]).
  2. Add a Set node that extracts citationCount and passes it forward alongside the abstracts.
  3. Update the Stage 3 AI Extract system prompt: add a 'citations' field to the requested JSON (pass the count as context).
  4. Update the Stage 4+5 Code node to include a Citations column in the Markdown table.
  5. Test with a well-known paper. Does the count match Google Scholar?
  6. Test with a paper from 2024. What happens when citation data is not yet available?
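Steps 1 and 6 can be sketched in plain JavaScript before wiring up the nodes. The URL format below matches step 1; the 'n/a' fallback for missing counts is one reasonable choice for step 6, not something the exercise prescribes.

```javascript
// Sketch of the sixth stage: build the Semantic Scholar request for the
// first PMID (step 1) and normalise the response, falling back to 'n/a'
// when citation data is not yet available (common for very recent papers).

function citationUrl(idsCsv) {
  const pmid = idsCsv.split(',')[0].trim();
  return `https://api.semanticscholar.org/graph/v1/paper/PMID:${pmid}?fields=citationCount`;
}

function extractCitationCount(apiResponse) {
  const count = apiResponse && apiResponse.citationCount;
  return Number.isInteger(count) ? count : 'n/a';
}
```

Deciding what a missing count should look like in the final table ('n/a', 0, or an empty cell) is exactly the edge case step 6 asks you to observe.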
Deliverable

Screenshot of a comparison table with a Citations column, plus a one-sentence note on what happened with the newest paper.

Your pipeline runs in about 60 seconds for 5 papers. How would you adapt it to run nightly on a saved PubMed search and send you a Mattermost message when new papers appear — without triggering it manually?