Project studio II — build and test
- Build the MVP end-to-end matching your card-28 blueprint
- Run 5 realistic test cases and log every failure with root cause
- Tell the difference between "runs" and "works"
Systems improve through iteration, not first drafts. Expect the first working version to fail on realistic inputs — that's not a setback, that's the data you came here to collect.
- If your MVP passes 5/5 test cases on the first run, is that more likely success or an easy test set?
- Why log failures with root cause, not just a pass/fail count?
This week focuses on implementation, debugging, and testing. Getting a workflow to run is only the first step; evaluating and tightening it is what makes it useful.
A real test case is one the system wasn't obviously designed for — messy formatting, edge cases, out-of-distribution inputs. If all 5 test cases look like the example in the prompt, you haven't tested anything.
A paper-comparison assistant works on clean abstracts but fails on preprints with unusual formatting, forcing you to add a preprocessing node and tighten the prompt.
▸ Try the instructor's finished build before you start yours: feel what "done" looks like, then recreate it.
Not loading? Open it directly: https://dify.32dots.de/chat/uj6fnngSnkLAvYHV
### Dify task
Build the cognition pieces of your project: the Chatbot / Agent / Workflow / Completion apps identified in card 28. Use Logs + the debug panel to inspect every failure.

### n8n task
Build the plumbing: the trigger, the sources, the storage, the notifications. Connect to your Dify apps via HTTP Request (see the sketch below). Run at least 5 realistic test cases end-to-end and record failures.

### Blocks needed
- Dify: whichever app types your blueprint calls for
- n8n: Trigger + HTTP Request (to Dify) + at least one Storage and one Output node
- Postgres / Google Sheets: run log
A real project uses both, split as in the architecture principle. Build the slice in each tool, connect via HTTP, debug separately. Keep the wiring thin — the goal is to find out which piece actually breaks first under realistic input.
- **Friendly test cases** — if your tests all look like the system prompt's example, you're measuring prompt quality, not robustness.
- **Fixing silently** — each fix should be logged: what failed, what you changed, whether the test now passes (one possible log shape is sketched after this list). Otherwise tomorrow-you can't reproduce today-you's thinking.
- **Skipping the end-to-end** — running only the Dify or only the n8n part leaves integration bugs for the demo.
Why is the first working version often still a bad system?
Build the MVP end-to-end (Dify apps + n8n orchestration). Test with at least 5 real cases and record every failure.
Working MVP (URLs + n8n workflow export) + a 5-case failure log with root cause per case.
Which failure surprised you most — and what does it tell you about an assumption you made in the design week?