32dots HEIDELBERG AI
Session 29

Project studio II — build and test

hard
  • Build the MVP end-to-end matching your card-28 blueprint
  • Run 5 realistic test cases and log every failure with root cause
  • Tell the difference between "runs" and "works"

Systems improve through iteration, not first drafts. Expect the first working version to fail on realistic inputs — that's not a setback, that's the data you came here to collect.

  • If your MVP passes 5/5 test cases on the first run, is that more likely success or an easy test set?
  • Why log failures with root cause, not just a pass/fail count?

This week focuses on implementation, debugging, and testing. Getting a workflow to run is only the first step; evaluating and tightening it is what makes it useful.

A real test case is one the system wasn't obviously designed for — messy formatting, edge cases, out-of-distribution inputs. If all 5 test cases look like the example in the prompt, you haven't tested anything.

A paper-comparison assistant works on clean abstracts but fails on preprints with unusual formatting — forcing a preprocessing node and a stricter prompt.
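To make that concrete, here is what a five-case set might look like for that assistant. The cases below are made up for illustration, not part of the assignment.

```python
# Hypothetical test set for the paper-comparison assistant.
# Only case 1 resembles the prompt's own example; the rest probe robustness.
test_cases = [
    {"id": 1, "kind": "clean",       "input": "two well-formatted journal abstracts"},
    {"id": 2, "kind": "messy",       "input": "preprint abstract with broken line wraps and inline LaTeX"},
    {"id": 3, "kind": "edge",        "input": "one 40-word abstract paired with one over 600 words"},
    {"id": 4, "kind": "out-of-dist", "input": "a German abstract paired with an English one"},
    {"id": 5, "kind": "adversarial", "input": "a press release formatted to look like an abstract"},
]
```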

▸ Use the instructor's finished build before you build yours: feel what "done" looks like — then recreate it

Not loading? https://dify.32dots.de/chat/uj6fnngSnkLAvYHV

Finished Project studio II — build and test
The finished app — use this as your target

Dify task

Build the cognition pieces of your project: the Chatbot / Agent / Workflow / Completion apps identified in card 28. Use Logs + the debug panel to inspect every failure.

Blocks needed
  • Dify: whichever app types your blueprint calls for
  • n8n: Trigger + HTTP Request (to Dify) + at least one Storage and one Output node
  • Postgres / Google Sheets: run log
n8n task
Open in n8n → 🔑 student@cos.32dots.de · cos2026

Build the plumbing: the trigger, the sources, the storage, the notifications. Connect to your Dify apps via HTTP Request. Run at least 5 realistic test cases end-to-end and record failures.

Nodes needed
  • Dify: whichever app types your blueprint calls for
  • n8n: Trigger + HTTP Request (to Dify) + at least one Storage and one Output node
  • Postgres / Google Sheets: run log

A real project uses both, split as in the architecture principle. Build the slice in each tool, connect via HTTP, debug separately. Keep the wiring thin — the goal is to find out which piece actually breaks first under realistic input.
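Rehearsing the Dify call outside n8n makes "debug separately" concrete. A minimal sketch, assuming a chat-type Dify app and a self-hosted API base at https://dify.32dots.de/v1 (check your app's API Access page for the real base URL and key); Workflow apps use /v1/workflows/run instead of /v1/chat-messages.

```python
import requests

DIFY_API_BASE = "https://dify.32dots.de/v1"  # assumption: verify on your app's API Access page
DIFY_API_KEY = "app-..."                     # per-app key, also from the API Access page

def ask_dify(query: str, user: str = "studio-test") -> str:
    """Send one blocking request to a chat-type Dify app and return its answer."""
    resp = requests.post(
        f"{DIFY_API_BASE}/chat-messages",
        headers={"Authorization": f"Bearer {DIFY_API_KEY}"},
        json={
            "inputs": {},                 # app-level variables, if your app defines any
            "query": query,               # the user message
            "response_mode": "blocking",  # wait for the full answer instead of streaming
            "user": user,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["answer"]
```

If the same payload works here but fails from the HTTP Request node, the bug is in the wiring, not the cognition.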

  • **Friendly test cases** — if your tests all look like the system prompt's example, you're measuring prompt quality, not robustness.
  • **Fixing silently** — each fix should be logged: what failed, what you changed, whether the test now passes. Otherwise tomorrow-you can't reproduce today-you's thinking.
  • **Skipping the end-to-end** — running only the Dify or only the n8n part leaves integration bugs for the demo.

Why is the first working version often still a bad system?

Build the MVP end-to-end (Dify apps + n8n orchestration). Test with at least 5 real cases and record every failure.
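One way to keep the 5-case run repeatable is a small harness that posts each case to the workflow's entry point and appends one log row per case. Everything named below is a placeholder to adapt: the webhook URL, the test_cases list from above, the pass check, and the CSV file standing in for your Postgres table or Google Sheet.

```python
import csv, datetime, os, requests

N8N_WEBHOOK = "https://n8n.example.org/webhook/project-mvp"  # placeholder: your Webhook trigger's URL

def looks_correct(case: dict, output: dict) -> bool:
    """Placeholder pass criterion; replace with checks specific to your blueprint."""
    return bool(output)

def run_suite(test_cases: list[dict], log_path: str = "run_log.csv") -> None:
    """POST each test case to the n8n entry point and append one log row per case."""
    new_file = not os.path.exists(log_path)
    with open(log_path, "a", newline="") as f:
        log = csv.writer(f)
        if new_file:
            log.writerow(["timestamp", "case_id", "kind", "passed", "failure", "root_cause", "fix"])
        for case in test_cases:
            try:
                resp = requests.post(N8N_WEBHOOK, json=case, timeout=300)
                resp.raise_for_status()
                passed = looks_correct(case, resp.json())
                failure = "" if passed else "output did not match expectation"
            except Exception as exc:  # timeouts, 5xx, malformed JSON, ...
                passed, failure = False, repr(exc)
            # root_cause and fix stay empty until you have actually diagnosed the failure
            log.writerow([datetime.datetime.now().isoformat(), case["id"], case["kind"],
                          passed, failure, "", ""])
```

Once root_cause and fix are filled in per case, this file is the failure log the deliverable asks for.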

Deliverable

Working MVP (URLs + n8n workflow export) + a 5-case failure log with root cause per case.

Which failure surprised you most — and what does it tell you about an assumption you made in the design week?