Safety
Evaluation, privacy, and trust
Trust-and-safety patterns for AI systems: output evaluation, observability, approval gates, GDPR-aware data handling, and the Model Context Protocol (MCP). Being rewritten for the new course shape; existing cards are still reachable by direct URL.
After this chapter you can
→ Write a rubric-based evaluation loop for any AI output
→ Add a human approval gate for borderline outputs
→ Identify what data must never enter an AI system
19 n8n
Evaluating outputs
Students need to judge quality rather than assume it. "It looks good" is not a quality signal — it's the absence of one. The generator+reviewer pattern is the smallest eval that actually works.
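A minimal sketch of that loop in Python, assuming a hypothetical call_model helper you wire to your own provider; the rubric text, JSON field names, and pass mark are illustrative placeholders, not course material:

```python
import json

# Illustrative rubric; replace the criteria with ones that match your task.
RUBRIC = """Score the DRAFT against each criterion from 1 (fail) to 5 (excellent):
1. Accuracy: no unsupported claims.
2. Completeness: answers every part of the request.
3. Tone: plain, professional language.
Return JSON only: {"scores": {"accuracy": 0, "completeness": 0, "tone": 0}, "feedback": ""}"""

def call_model(prompt: str) -> str:
    """Placeholder: wire this to whatever model provider you use."""
    raise NotImplementedError

def generate_with_review(request: str, max_attempts: int = 3, pass_mark: int = 4) -> str:
    """Generator+reviewer loop: regenerate until every rubric score passes."""
    feedback = "none yet"
    for _ in range(max_attempts):
        draft = call_model(f"Task: {request}\nReviewer feedback so far: {feedback}")
        review = json.loads(call_model(f"{RUBRIC}\n\nDRAFT:\n{draft}"))
        if all(score >= pass_mark for score in review["scores"].values()):
            return draft  # every criterion met; accept the draft
        feedback = review["feedback"]  # feed the critique into the next attempt
    raise RuntimeError(f"no draft passed the rubric after {max_attempts} attempts")
```

The design choice that matters: the reviewer's critique is fed back into the next generation attempt, so each retry is informed rather than a blind reroll.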
20 n8n
Observability and logs
Systems cannot be improved if their behavior is invisible. Most bug reports for AI systems are unactionable because the logs aren't good enough to reconstruct what happened — fix that first, then fix everything else.
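Good-enough logs usually mean one structured record per model call, verbatim prompt included. A Python sketch, assuming a plain JSON-lines file and invented field names such as trace_id:

```python
import json
import uuid
from datetime import datetime, timezone

def log_model_call(logfile: str, model: str, params: dict,
                   prompt: str, response: str, latency_ms: float) -> str:
    """Append one JSON line per call: enough to reconstruct and replay the request."""
    record = {
        "trace_id": str(uuid.uuid4()),                       # correlate downstream steps
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "params": params,                                    # temperature, max_tokens, ...
        "prompt": prompt,                                    # verbatim, so the call is reproducible
        "response": response,
        "latency_ms": round(latency_ms, 1),
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record["trace_id"]
```

With a trace_id attached to every output, a bug report stops being "the bot said something weird" and becomes "this trace produced X from prompt Y", which is something you can actually reconstruct and fix.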
21 n8n
Safety and approval gates
Action systems need controls. Automation without gates is how small bugs become medium incidents — and "the model seemed confident" is not a defence when the message has already been sent.
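A gate can be as small as a routing function that auto-approves only high-confidence, low-blast-radius actions and parks everything borderline in a human queue. A Python sketch; the thresholds and the touches_external_party flag are invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    AUTO_APPROVE = "auto_approve"   # safe to act without a human
    NEEDS_REVIEW = "needs_review"   # park in the human approval queue
    AUTO_REJECT = "auto_reject"     # never act on this

@dataclass
class GateResult:
    verdict: Verdict
    reason: str

def approval_gate(confidence: float, touches_external_party: bool,
                  approve_above: float = 0.9, reject_below: float = 0.5) -> GateResult:
    """Only high-confidence actions with no external side effects skip the human."""
    if confidence < reject_below:
        return GateResult(Verdict.AUTO_REJECT, f"confidence {confidence:.2f} below floor")
    if confidence >= approve_above and not touches_external_party:
        return GateResult(Verdict.AUTO_APPROVE, "high confidence, internal-only action")
    return GateResult(Verdict.NEEDS_REVIEW, "borderline confidence or external side effect")
```

Note the asymmetry: anything that leaves the building (an email, a ticket reply) needs a human regardless of confidence, because "the model seemed confident" does not unsend a message.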
22 n8n
Privacy and sensitive data
Scientific environments often handle sensitive material. Privacy is load-bearing infrastructure, not a disclaimer at the end of a README — and the cheapest control is a deterministic redaction step that runs before any model sees the data.
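A deterministic redaction step is just a fixed list of pattern-replacement rules that runs before any model call; with no model in the loop, its behaviour is auditable and testable. The Python patterns below are illustrative, not a complete PII catalogue:

```python
import re

# Ordered redaction rules; extend for the identifiers your environment actually holds.
REDACTION_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s()-]{7,}\d"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Run this before the model call; the model only ever sees the redacted string."""
    for pattern, placeholder in REDACTION_RULES:
        text = pattern.sub(placeholder, text)
    return text

assert redact("Mail anna@example.org or call +49 170 1234567") == "Mail [EMAIL] or call [PHONE]"
```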
23 n8n
MCP and connected systems
Students should understand how AI systems connect to tools and data in a standardized way. Connections are where a disproportionate share of production bugs live, and getting fluent in the vocabulary pays off the first time something breaks at 2am.
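To make the vocabulary concrete, here is a minimal MCP server sketch using the official Python SDK's FastMCP helper; the server name, the tool, and its inventory data are invented for illustration:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("lab-tools")  # the name a connecting client sees

@mcp.tool()
def lookup_sample(sample_id: str) -> str:
    """Return the storage location for a sample (hypothetical tool for illustration)."""
    inventory = {"S-001": "freezer 3, shelf B", "S-002": "freezer 1, shelf A"}
    return inventory.get(sample_id, "unknown sample")

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for any MCP client to discover and call
```

Any MCP-capable client can connect, discover lookup_sample from its signature and docstring, and call it without bespoke glue code, which is the point of the standard.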