Observability and logs
- Log model-level AND pipeline-level events into one queryable Postgres table
- Reconstruct a past decision from logs alone (no guessing)
- Name the minimum fields every AI log row should contain
Systems cannot be improved if their behavior is invisible. Most bug reports for AI systems are unactionable because the logs aren't good enough to reconstruct what happened — fix that first, then fix everything else.
- If a user says "the bot gave a weird answer yesterday afternoon", what do you need in your logs to reproduce it?
- Why is "success/failure" a useless log field on its own?
Observability means capturing what the system did, what input it received, what outputs it produced, which tools were called, and where failures occurred. This is essential for debugging, accountability, and reproducibility.
Minimum fields per row: timestamp, input hash (not raw input, if sensitive), model + tokens, tool calls, output snippet, rubric score. Log before and after every LLM call — "what we sent" vs "what it returned" is the most valuable diff you'll ever have.
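A minimal sketch of building one such row in Python. The field names, the SHA-256 choice, and the 200-character snippet length are assumptions for illustration, not a fixed schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def build_log_row(raw_input, model, tokens, tool_calls, output, rubric_score):
    """Assemble one log row with the minimum fields listed above.
    The raw input is hashed so sensitive text never reaches the table."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_hash": hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
        "model": model,
        "tokens": tokens,
        "tool_calls": tool_calls,
        "output_snippet": output[:200],  # snippet only, not the full output
        "rubric_score": rubric_score,
    }

row = build_log_row(
    "When is the next seminar?", "gpt-4o-mini", 412,
    ["calendar_lookup"], "The next seminar is on Thursday at 14:00.", 3,
)
print(json.dumps(row, indent=2))
```

The hash still lets you find all rows for a repeated input ("the bot gave a weird answer yesterday") without ever storing the sensitive text itself.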
Your pipeline ran overnight. These 5 log lines appeared. Which require immediate action vs routine review vs ignore?
A log shows: abstract X was processed, classified as "methods", routed to branch 2, reviewed with score 3/4, and delivered to PI inbox at 14:07.
▸ Use the instructor's finished build before you build yours: feel what "done" looks like, then recreate it
Not loading? https://dify.32dots.de/chat/nRz3495614RET1H0
### Dify task
Use Dify's built-in **Logs** tab for any app (every run is captured: inputs, outputs, tokens, latency). Then extend: add a **Code** node at the end of your Workflow that POSTs `{run_id, input, output, rubric_score}` to a webhook. Inspect the Logs + your own webhook receiver side by side.

### n8n task
Take an existing workflow. Add a Set/Code node that builds a log object `{timestamp, input_hash, classification, model, tokens, output_snippet}`, and append it to a Google Sheet or Postgres table. Also screenshot n8n's built-in Executions list.

### Blocks needed
- Dify: built-in Logs tab + a Code/HTTP node for external logging
- n8n: Set/Code + Google Sheets / Postgres append + built-in Executions view
- External sink: same Postgres table shared by both
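The Dify-side extension (a Code node POSTing the run log to a webhook) can be sketched in Python. The webhook URL is a placeholder and the payload fields mirror the task above; this builds the request but leaves sending it to the caller:

```python
import json
import urllib.request
import uuid

WEBHOOK_URL = "https://example.com/log-sink"  # placeholder: point at your receiver

def post_run_log(run_id, input_text, output_text, rubric_score, url=WEBHOOK_URL):
    """Build a POST request carrying one run's log payload (Dify Code-node style)."""
    payload = json.dumps({
        "run_id": run_id,
        "input": input_text,
        "output": output_text,
        "rubric_score": rubric_score,
    }).encode("utf-8")
    # Returned unsent; the caller fires it with urllib.request.urlopen(req).
    return urllib.request.Request(
        url, data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = post_run_log(str(uuid.uuid4()), "abstract X ...", "classified as methods", 3)
print(req.full_url, req.get_method())
```

Generating `run_id` once at the start of the run and threading it through every node is what later lets you correlate this webhook row with Dify's own Logs entry.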
Dify's Logs are great for LLM-specific telemetry (tokens, latency, which tool was called). n8n's Executions are great for pipeline telemetry (which node failed, full payload at each step). Real systems need both: Dify logs for "what did the model do?", n8n logs for "what did the pipeline do?". Point both at the same Postgres and you have one queryable truth.
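Pointing both systems at one table makes the cross-system trace a single query. A runnable sketch, using in-memory sqlite3 in place of Postgres and an illustrative schema (column names are assumptions):

```python
import sqlite3

# sqlite3 stands in for Postgres so the sketch runs anywhere.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ai_log (
    run_id  TEXT,
    source  TEXT,     -- 'dify' or 'n8n'
    model   TEXT,     -- model-level telemetry (Dify rows)
    tokens  INTEGER,
    node    TEXT,     -- pipeline-level telemetry (n8n rows)
    status  TEXT
);
""")
con.executemany("INSERT INTO ai_log VALUES (?,?,?,?,?,?)", [
    ("r1", "dify", "gpt-4o-mini", 412, None, None),
    ("r1", "n8n", None, None, "classify", "ok"),
    ("r1", "n8n", None, None, "deliver", "ok"),
])

# One queryable truth: join model-level and pipeline-level rows on run_id.
rows = con.execute("""
    SELECT d.run_id, d.model, d.tokens, p.node, p.status
    FROM ai_log d JOIN ai_log p ON d.run_id = p.run_id
    WHERE d.source = 'dify' AND p.source = 'n8n'
""").fetchall()
for r in rows:
    print(r)
```

Each result row answers both questions at once: which model ran with how many tokens (Dify) and which pipeline node did what with it (n8n).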
- **Logging raw PII** — hash or redact sensitive inputs before they hit the log table, or the log itself becomes a compliance problem.
- **No correlation ID** — without a `run_id` linking Dify and n8n rows, you can't trace one request across both systems.
- **Logs that no one reads** — if you can't show a weekly summary from your log table, you're hoarding rows, not observing.
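The "logs that no one reads" pitfall has a cheap fix: a small script that turns the table into a weekly digest. A sketch over toy rows (the field names are assumptions matching the log object above):

```python
from collections import Counter
from datetime import date

# Toy rows standing in for one week of reads from the log table.
rows = [
    {"day": date(2024, 5, 6), "rubric_score": 4},
    {"day": date(2024, 5, 6), "rubric_score": 2},
    {"day": date(2024, 5, 7), "rubric_score": 3},
]

# Weekly summary: runs per day, plus how many scored below 3 (worth a look).
per_day = Counter(r["day"] for r in rows)
low_scores = sum(1 for r in rows if r["rubric_score"] < 3)
for day, n in sorted(per_day.items()):
    print(day, n, "runs")
print("low-scoring runs this week:", low_scores)
```

If this digest is empty of surprises every week, the logs are working; if you can't produce it at all, you're hoarding rows, not observing.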
Why is system logging part of scientific rigor?
Add logging to the seminar-inbox assistant (from card 16) — every run goes into a single shared Postgres table with enough fields to reconstruct the decision.
Workflow with explicit run log + screenshot of 10 logged runs in the sink.
For the system you'd most like to improve, what's the one field that *isn't* in your logs today that would let you diagnose the next bug?