Session 5

Pulling data from scientific databases

medium

USE 0 - 15 min

Query PubMed with natural language — no credentials needed

1 Go to Downloads (curriculum.32dots.de/share) and download 'Session 5 — Pulling data from scientific databases'.
2 In n8n: ⋯ → Import from file. Open the chat panel.
3 Type: 'Find papers about CRISPR base editing from 2024'.
4 Wait for the response. The workflow makes two PubMed API calls (esearch for IDs, then efetch for content) before the AI answers.
5 Type: 'What are the main limitations mentioned in those papers?'
6 Type: 'Find me papers about mRNA vaccine immunogenicity'. Note: a new search runs.
7 Look at the execution log on the right — click each node to see its output at each stage.

✓

You see AI-summarised paper lists for both queries. You can explain what each node in the execution log produced.

UNDERSTAND 15 - 60 min

The two-step PubMed API pattern

Key concept

Scientific databases are not magic: they are REST APIs returning structured data. For PubMed the pattern is search → get IDs → fetch details → extract fields. Once you know this pattern, connecting to UniProt, ChEMBL, or Semantic Scholar follows the same basic steps; only the endpoints, parameters, and response formats differ.
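
Outside n8n, the search → get IDs → fetch details steps can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual node code: the function names are hypothetical, the parameter choices (retmode=json for esearch, rettype=abstract with retmode=text for efetch) are assumptions about a reasonable setup, and the sample response uses made-up PMIDs so that nothing here calls the network.

```python
import json
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retmax: int = 5) -> str:
    """Step 1: search PubMed and ask for the matching IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def extract_ids(esearch_body: str) -> list[str]:
    """Step 2: pull the PMID list out of the esearch JSON response."""
    return json.loads(esearch_body)["esearchresult"]["idlist"]


def build_efetch_url(pmids: list[str]) -> str:
    """Step 3: request the abstracts for all PMIDs in a single call."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",  # assumption: plain-text abstracts are enough
        "retmode": "text",
    }
    # safe="," keeps the comma-separated ID list readable in the URL.
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params, safe=',')}"


# Placeholder body shaped like an esearch JSON response; the PMIDs are made up.
sample_body = '{"esearchresult": {"idlist": ["11111111", "22222222"]}}'

search_url = build_esearch_url("CRISPR base editing", retmax=5)
pmids = extract_ids(sample_body)
fetch_url = build_efetch_url(pmids)

print(search_url)
print(fetch_url)
```

The last step, extracting fields, depends on the format you request from efetch, so it is left out of the sketch.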

? What is the NCBI rate limit for unauthenticated API calls? Where in the workflow would you add a delay to avoid hitting it?
? Change retmax=5 to retmax=20 in the Build Search URL Code node. What happens to response quality vs. cost?
? UniProt also has a REST API. What would the esearch-equivalent URL look like to find all human proteins involved in apoptosis?

BUILD 60 - 90 min

Extend to a second database

Your task

With the PubMed workflow answering correctly, add a Semantic Scholar API call (placed before the AI step) to retrieve citation counts for the top papers, and include that data in the AI's response.

1 Note the PMIDs in the Extract IDs node output.
2 After the Fetch Abstracts node, add an HTTP Request: GET https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}?fields=citationCount
3 For simplicity, do this for just the first PMID (use $json.ids.split(',')[0]).
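4 Add a Set node: citationCount = $json.citationCount, paperId = $json.paperId. Note that paperId is Semantic Scholar's own identifier, not the PMID; to keep the PMID, reuse the value from step 3.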
5 Update the Prepare Context Set node to include the citation count.
6 Update the AI system prompt to mention citation counts in the summary.
7 Test: does the AI now include citation data in its response?
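
The lookup in steps 2 to 4 can be sketched in Python to show which values end up in the Set node. This is an illustration under stated assumptions rather than the workflow's own code: the function names are hypothetical, the sample response is a placeholder with made-up values, and adding externalIds to the fields parameter is an optional extra (not part of the steps above) that lets you read the PMID back from the response.

```python
import json
from urllib.parse import urlencode

# Paper lookup endpoint of the Semantic Scholar Graph API.
S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def build_citation_url(pmid: str) -> str:
    """Build the lookup URL for one paper, addressed by its PubMed ID."""
    # Assumption: externalIds is requested in addition to citationCount.
    query = urlencode({"fields": "citationCount,externalIds"}, safe=",")
    return f"{S2_BASE}/PMID:{pmid}?{query}"


def summarise_citations(response_body: str) -> dict:
    """Keep only the values the Set node needs from the JSON response."""
    paper = json.loads(response_body)
    return {
        # externalIds may be missing or null, so fall back to an empty dict.
        "pmid": (paper.get("externalIds") or {}).get("PubMed"),
        "paperId": paper.get("paperId"),  # Semantic Scholar's own ID
        "citationCount": paper.get("citationCount"),
    }


# Same idea as $json.ids.split(',')[0]: take only the first PMID.
ids = "11111111,22222222"  # made-up PMIDs
first_pmid = ids.split(",")[0]
url = build_citation_url(first_pmid)

# Placeholder response with made-up values, shaped like a paper lookup result.
sample = '{"paperId": "0123abcd", "externalIds": {"PubMed": "11111111"}, "citationCount": 42}'
row = summarise_citations(sample)

print(url)
print(row)
```

Keeping the PMID next to the citation count makes it easier for the Prepare Context node to attach the count to the right paper.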

Deliverable

Screenshot of a workflow run that includes citation count data in the AI's response.

Self-check · tick before you mark done

I can explain the two-step PubMed pattern: esearch for IDs, efetch for content.
I understand why the session key requires the full $('When chat message received') expression.
I connected to a second scientific API and included its data in the output.

✎ Your workflow runs a fresh PubMed query on every message. What would you need to change to cache results, so that repeat queries for the same topic don't burn rate limits?