Project studio II — build and test
- Build the MVP end-to-end matching your card-28 blueprint
- Run 5 realistic test cases and log every failure with root cause
- Tell the difference between "runs" and "works"
Systems improve through iteration, not first drafts. Expect the first working version to fail on realistic inputs — that's not a setback, that's the data you came here to collect.
- If your MVP passes 5/5 test cases on the first run, is that more likely success or an easy test set?
- Why log failures with root cause, not just a pass/fail count?
This week focuses on implementation, debugging, and testing. Getting a workflow to run is only the first step; evaluating and tightening it is what makes it useful.
A real test case is one the system wasn't obviously designed for — messy formatting, edge cases, out-of-distribution inputs. If all 5 test cases look like the example in the prompt, you haven't tested anything.
A paper-comparison assistant works on clean abstracts but fails on preprints with unusual formatting, forcing you to add a preprocessing node and tighten the prompt.
▸ Try the instructor's finished build before you start yours: feel what "done" looks like, then recreate it.
Not loading? Open it directly: https://dify.32dots.de/chat/uj6fnngSnkLAvYHV
### Dify task
Build the cognition pieces of your project: the Chatbot / Agent / Workflow / Completion apps identified in card 28. Use Logs + the debug panel to inspect every failure.

### n8n task
Build the plumbing: the trigger, the sources, the storage, the notifications. Connect to your Dify apps via HTTP Request (see the sketch below). Run at least 5 realistic test cases end-to-end and record failures.

### Blocks needed
- Dify: whichever app types your blueprint calls for
- n8n: Trigger + HTTP Request (to Dify) + at least one Storage and one Output node
- Postgres / Google Sheets: run log
A real project uses both, split as in the architecture principle. Build the slice in each tool, connect via HTTP, debug separately. Keep the wiring thin — the goal is to find out which piece actually breaks first under realistic input.
- **Friendly test cases** — if your tests all look like the system prompt's example, you're measuring prompt quality, not robustness.
- **Fixing silently** — each fix should be logged: what failed, what you changed, whether the test now passes (one possible log shape is sketched after this list). Otherwise tomorrow-you can't reproduce today-you's thinking.
- **Skipping the end-to-end** — running only the Dify or only the n8n part leaves integration bugs for the demo.
Why is the first working version often still a bad system?
Build the MVP end-to-end (Dify apps + n8n orchestration). Test with at least 5 real cases and record every failure.
Working MVP (URLs + n8n workflow export) + a 5-case failure log with root cause per case.
Which failure surprised you most — and what does it tell you about an assumption you made in the design week?