Observability and logs
- Log model-level AND pipeline-level events into one queryable Postgres table
- Reconstruct a past decision from logs alone (no guessing)
- Name the minimum fields every AI log row should contain
Systems cannot be improved if their behavior is invisible. Most bug reports for AI systems are unactionable because the logs aren't good enough to reconstruct what happened — fix that first, then fix everything else.
- If a user says "the bot gave a weird answer yesterday afternoon", what do you need in your logs to reproduce it?
- Why is "success/failure" a useless log field on its own?
Observability means capturing what the system did, what input it received, what outputs it produced, which tools were called, and where failures occurred. This is essential for debugging, accountability, and reproducibility.
Minimum fields per row: timestamp, input hash (not raw input, if sensitive), model + tokens, tool calls, output snippet, rubric score. Log before and after every LLM call — "what we sent" vs "what it returned" is the most valuable diff you'll ever have.
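A minimal sketch of building one such row in Python. The field names, the SHA-256 choice, and the 200-character snippet length are assumptions for illustration, not a fixed schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def build_log_row(raw_input, model, tokens, tool_calls, output, rubric_score):
    """Assemble one log row with the minimum fields listed above.
    The raw input is hashed so sensitive text never reaches the table."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_hash": hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
        "model": model,
        "tokens": tokens,
        "tool_calls": tool_calls,
        "output_snippet": output[:200],  # snippet only, not the full output
        "rubric_score": rubric_score,
    }

row = build_log_row(
    "When is the next seminar?", "gpt-4o-mini", 412,
    ["calendar_lookup"], "The next seminar is on Thursday at 14:00.", 3,
)
print(json.dumps(row, indent=2))
```

The hash still lets you find all rows for a repeated input ("the bot gave a weird answer yesterday") without ever storing the sensitive text itself.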
Your pipeline ran overnight. These 5 log lines appeared. Which require immediate action vs routine review vs ignore?
A log shows: abstract X was processed, classified as "methods", routed to branch 2, reviewed with score 3/4, and delivered to PI inbox at 14:07.
▸ Use the instructor's finished build before you build yours: feel what "done" looks like, then recreate it
Not loading? https://dify.32dots.de/chat/nRz3495614RET1H0
### Dify task
Use Dify's built-in **Logs** tab for any app (every run is captured: inputs, outputs, tokens, latency). Then extend: add a **Code** node at the end of your Workflow that POSTs `{run_id, input, output, rubric_score}` to a webhook. Inspect the Logs + your own webhook receiver side by side.

### n8n task
Take an existing workflow. Add a Set/Code node that builds a log object `{timestamp, input_hash, classification, model, tokens, output_snippet}`, and append it to a Google Sheet or Postgres table. Also screenshot n8n's built-in Executions list.

### Blocks needed
- Dify: built-in Logs tab + a Code/HTTP node for external logging
- n8n: Set/Code + Google Sheets / Postgres append + built-in Executions view
- External sink: same Postgres table shared by both
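The Dify-side extension (a Code node POSTing the run log to a webhook) can be sketched in Python. The webhook URL is a placeholder and the payload fields mirror the task above; this builds the request but leaves sending it to the caller:

```python
import json
import urllib.request
import uuid

WEBHOOK_URL = "https://example.com/log-sink"  # placeholder: point at your receiver

def post_run_log(run_id, input_text, output_text, rubric_score, url=WEBHOOK_URL):
    """Build a POST request carrying one run's log payload (Dify Code-node style)."""
    payload = json.dumps({
        "run_id": run_id,
        "input": input_text,
        "output": output_text,
        "rubric_score": rubric_score,
    }).encode("utf-8")
    # Returned unsent; the caller fires it with urllib.request.urlopen(req).
    return urllib.request.Request(
        url, data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = post_run_log(str(uuid.uuid4()), "abstract X ...", "classified as methods", 3)
print(req.full_url, req.get_method())
```

Generating `run_id` once at the start of the run and threading it through every node is what later lets you correlate this webhook row with Dify's own Logs entry.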
Dify's Logs are great for LLM-specific telemetry (tokens, latency, which tool was called). n8n's Executions are great for pipeline telemetry (which node failed, full payload at each step). Real systems need both: Dify logs for "what did the model do?", n8n logs for "what did the pipeline do?". Point both at the same Postgres and you have one queryable truth.
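Pointing both systems at one table makes the cross-system trace a single query. A runnable sketch, using in-memory sqlite3 in place of Postgres and an illustrative schema (column names are assumptions):

```python
import sqlite3

# sqlite3 stands in for Postgres so the sketch runs anywhere.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ai_log (
    run_id  TEXT,
    source  TEXT,     -- 'dify' or 'n8n'
    model   TEXT,     -- model-level telemetry (Dify rows)
    tokens  INTEGER,
    node    TEXT,     -- pipeline-level telemetry (n8n rows)
    status  TEXT
);
""")
con.executemany("INSERT INTO ai_log VALUES (?,?,?,?,?,?)", [
    ("r1", "dify", "gpt-4o-mini", 412, None, None),
    ("r1", "n8n", None, None, "classify", "ok"),
    ("r1", "n8n", None, None, "deliver", "ok"),
])

# One queryable truth: join model-level and pipeline-level rows on run_id.
rows = con.execute("""
    SELECT d.run_id, d.model, d.tokens, p.node, p.status
    FROM ai_log d JOIN ai_log p ON d.run_id = p.run_id
    WHERE d.source = 'dify' AND p.source = 'n8n'
""").fetchall()
for r in rows:
    print(r)
```

Each result row answers both questions at once: which model ran with how many tokens (Dify) and which pipeline node did what with it (n8n).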
- **Logging raw PII** — hash or redact sensitive inputs before they hit the log table, or the log itself becomes a compliance problem.
- **No correlation ID** — without a `run_id` linking Dify and n8n rows, you can't trace one request across both systems.
- **Logs that no one reads** — if you can't show a weekly summary from your log table, you're hoarding rows, not observing.
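The "logs that no one reads" pitfall has a cheap fix: a small script that turns the table into a weekly digest. A sketch over toy rows (the field names are assumptions matching the log object above):

```python
from collections import Counter
from datetime import date

# Toy rows standing in for one week of reads from the log table.
rows = [
    {"day": date(2024, 5, 6), "rubric_score": 4},
    {"day": date(2024, 5, 6), "rubric_score": 2},
    {"day": date(2024, 5, 7), "rubric_score": 3},
]

# Weekly summary: runs per day, plus how many scored below 3 (worth a look).
per_day = Counter(r["day"] for r in rows)
low_scores = sum(1 for r in rows if r["rubric_score"] < 3)
for day, n in sorted(per_day.items()):
    print(day, n, "runs")
print("low-scoring runs this week:", low_scores)
```

If this digest is empty of surprises every week, the logs are working; if you can't produce it at all, you're hoarding rows, not observing.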
Why is system logging part of scientific rigor?
Add logging to the seminar-inbox assistant (from card 16) — every run goes into a single shared Postgres table with enough fields to reconstruct the decision.
Workflow with explicit run log + screenshot of 10 logged runs in the sink.
For the system you'd most like to improve, what's the one field that *isn't* in your logs today that would let you diagnose the next bug?