Glossary
Definitions for readers who do not spend their days in code. Corrections and better explanations are always welcome.
LLM (Large Language Model)
Large Language Model — the technical word for what people casually call "AI". A language model is trained on large amounts of text and predicts, at a high level, the most likely next word, or more precisely the next token. Examples include Claude (Anthropic), GPT (OpenAI), Mistral, and Llama. There is a growing zoo of LLMs for language understanding, summarisation, code generation, translation, and reasoning across many domains. Development is moving fast.
Pattern completion
The mechanism by which an LLM "thinks": in response to an input it predicts the most likely next token, then the next one, then the next one. No reflection, no knowledge, no truth check — just statistical pattern completion based on training data. This explains why LLMs can produce fluent, plausible answers without understanding a topic. Pattern completion is also the reason for hallucinations: the most likely next token is not always the correct one, just the one that fits best.
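To make the mechanism tangible, here is a toy sketch. It is not how a real LLM is built (real models use neural networks over long contexts, not word-pair counts), and the tiny corpus and the names `train_bigrams` and `complete` are invented for this illustration. It only shows the principle: the most frequent continuation wins, whether or not it is true.

```python
from collections import Counter, defaultdict

# Tiny made-up "training corpus"; "paris" follows "is" more often than any other word.
CORPUS = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
)

def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words followed it in the training text."""
    follows: dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def complete(prompt: str, follows: dict[str, Counter], steps: int = 1) -> str:
    """Greedily append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(steps):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

MODEL = train_bigrams(CORPUS)
print(complete("the capital of spain is", MODEL))  # the capital of spain is paris
```

The completion is fluent and wrong: this toy model only looks at the last word, and after "is" the word "paris" was simply the most common.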
Hallucination
The standard term for a false but confidently phrased statement from an LLM. Confidence is the important part: the model has no built-in way to know "I do not know". It predicts likely tokens even when the answer is invented. Spotting hallucinations will probably become one of the key skills for users.
Token
The smallest unit in which an LLM processes text: often part of a word, sometimes a whole word, sometimes only a character. Usage is typically billed per token, and each request has a maximum token budget.
Context window
The maximum number of tokens an LLM can consider in one request, input and output together. Current models have context windows from roughly 8,000 to 1,000,000 tokens. Anything beyond that window is not seen, or the tool has to summarise older material (see Compaction). The window defines the agent's effective attention in a session. Important: larger windows do not solve security problems, they scale them. More material in the window means more untrusted input.
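A rough budget check can be sketched as follows. The figure of about four characters per token is only a common rule of thumb for English text and is an assumption here. Real tokenizers differ by model and language, and the function names are invented for this example.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; a real tokenizer gives the exact count."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """Input and output share the same window, so both are added up."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window

prompt = "a" * 400  # stands in for about 100 tokens of input
print(fits_context(prompt, max_output_tokens=50, context_window=200))   # True
print(fits_context(prompt, max_output_tokens=150, context_window=200))  # False
```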
Slop
Colloquial term for low-quality generated LLM output — plausible-sounding but empty or wrong text. Related to "bullshit" in the philosophical sense. This helps with mental sorting: some of what an LLM produces is slop. Not because the model is malicious, but because pattern completion sometimes produces nothing substantial.
Repository (repo)
A folder where a project lives — plus the full history of all changes to that folder. It is managed by a version-control tool, in practice almost always Git. A repo lets you see who changed what, when, and why, and restore earlier states. Full, elegant traceability. It does not prevent bad code from being written, but afterwards you can roll back.
Branch
A parallel line of work in a repository. Instead of working directly on the main line (main), you create a separate branch for each task, build there, and merge it only when the change is finished and checked.
Why it helps. Risk isolation. If a branch turns out badly, you throw it away — the main line stays untouched. An agent can experiment in its own branch without breaking the main line.
Commit
A single named change in the repository — like a save point with a note attached ("this changed"). A good commit is small, focused, and describable in one sentence. It makes the history of your work readable, enables rollback of individual steps, and creates clear accountability.
Diff
The difference between two states of a file or repository — usually shown line by line: red removed, green added. The diff is what you read when you want to evaluate a change. It focuses attention on what actually changed, not on the whole file.
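A small sketch of what a diff looks like in text form, using Python's standard `difflib` module. The two file versions and the file names are made up for the example. Git produces the same kind of output with its own tooling.

```python
import difflib

before = ["def greet(name):", "    print('Hello ' + name)", ""]
after = ["def greet(name):", "    print(f'Hello, {name}!')", ""]

diff_lines = list(
    difflib.unified_diff(before, after, fromfile="before.py", tofile="after.py", lineterm="")
)
print("\n".join(diff_lines))
```

Lines starting with `-` were removed and lines starting with `+` were added. Unchanged lines appear only as context.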
Linter
A tool that automatically scans code for known bad patterns — style issues, risky constructs, common bug sources. A linter does not execute the code; it reads it and reports matches.
Why it helps. Fast, cheap filter for common mistakes. Runs in seconds, often directly in the editor. What it does not do. It finds what was defined as a pattern beforehand. Logic errors, bad architecture, or subtle security issues are outside its reach. Examples. Python: ruff, flake8, pylint. JavaScript/TypeScript: eslint. A security-focused Python variant: bandit.
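The idea of "reading without executing" can be sketched with Python's standard `ast` module. This toy linter knows exactly two patterns, and the function name `lint` and the sample snippet are invented for illustration. Real linters such as ruff ship hundreds of rules.

```python
import ast

def lint(source: str) -> list[str]:
    """Parse the source (without running it) and report two known bad patterns."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Pattern 1: a bare "except:" swallows every error.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare 'except:' hides every error"))
        # Pattern 2: a direct call to eval().
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            findings.append((node.lineno, "call to eval()"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]

# The sample is only parsed, never run, so the undefined user_text does not matter.
SAMPLE = """
try:
    result = eval(user_text)
except:
    result = None
"""

for finding in lint(SAMPLE):
    print(finding)
```

Anything that is not one of the two predefined patterns passes silently, which is exactly the limit described above.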
Type checker
A tool that checks whether data types in a program fit together — for example, whether a function expecting a number is accidentally called with text. A type checker also does not execute the code; it analyses it statically.
Why it helps. Catches a whole class of errors that would otherwise appear only at runtime — often exactly when a user triggers the problem. Makes refactoring much safer. What it does not do. It says nothing about what the code does, only whether the parts fit together. Examples. Python: mypy, pyright. In languages with built-in types (Rust, Go, TypeScript), the check is part of the compiler or toolchain.
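A minimal illustration, with a made-up function `double`. Python does not enforce the annotations at runtime, so the wrong call below runs without any error. A static type checker such as mypy or pyright would be expected to flag it before the program ever runs.

```python
def double(amount: int) -> int:
    """Return twice the amount. The annotations are hints, not runtime checks."""
    return amount * 2

print(double(21))    # 42

# A str is passed where an int is declared. Python repeats the string
# instead of crashing, so the mistake surfaces only as a wrong result.
print(double("21"))  # 2121
```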
Tests
Code that checks other code. A test suite is a collection of small programs that state a concrete expectation ("if I call function X with Y, Z should come out") and automatically report whether that expectation is met.
Why at all — and why so often. Three reasons stand independently:
- Create a baseline.
- Self-check for the agent. If an agent claims to have built feature X and the test for X is still red, the agent has not understood the situation correctly.
- Protection against blind confidence. LLMs are often confident where they should not be. A running test is a piece of reality that cannot be negotiated with.
Test types that matter in security contexts:
- Ask your coding agent specifically for the relevant tests for your project...
What tests do not do. They only check what someone thought to check. Examples. Python: pytest (standard), hypothesis (property-based), atheris (fuzzing).
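What such a stated expectation looks like in practice, as a small sketch. The function `apply_discount` is hypothetical. The two test functions follow the `test_` naming convention that pytest collects, but they are plain Python and can also be called directly.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100

def test_ten_percent_off():
    # Expectation: 10% off 1000 cents leaves 900 cents.
    assert apply_discount(1000, 10) == 900

def test_rejects_negative_percent():
    # Expectation: nonsense input is refused, not silently processed.
    try:
        apply_discount(1000, -5)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative percentage")

test_ten_percent_off()
test_rejects_negative_percent()
print("all expectations met")
```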
Pre-commit hooks
Small programs that run automatically before a commit can be created. If one hook is red, the commit is rejected and the code has to be fixed first.
Why they help. Prevent bad, risky, or forbidden code from entering the repository history in the first place. Linters, type checkers, and secret scanners are typical pre-commit hooks.
Pre-push hook
A local Git hook that runs when pushing, right before changes leave your machine and are sent to a remote repository. It is the place for more expensive checks that you do not want to run on every commit: full test suite, broader security checks, integration checks.
Secret scanner
A specialised linter with one job: scan the code and the diff for patterns that look like secrets — API keys, passwords, tokens, private keys.
Why it helps. Prevents the most common kind of data leak: accidentally committed keys that then live forever in Git history. What it does not do. It recognises what looks like a secret by shape, length, or prefix. Obfuscated or unusual secrets can slip through. The tool works with a baseline: known harmless findings, such as examples in documentation, are whitelisted so the check does not turn red every time. Examples. detect-secrets, gitleaks, trufflehog.
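The "by shape, length, or prefix" idea, including the baseline, can be sketched in a few lines. This is a simplified illustration with only two patterns, not how detect-secrets or gitleaks are implemented. The key below is a fake placeholder built for the example.

```python
import re

PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan(text: str, baseline: frozenset[str] = frozenset()) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every match not listed in the baseline."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                if match.group(0) not in baseline:
                    findings.append((lineno, label))
    return findings

FAKE_KEY = "AKIA" + "X" * 16  # placeholder with the right shape, not a real key
SAMPLE = f'api_key = "{FAKE_KEY}"\nname = "demo"\n'

print(scan(SAMPLE))                                  # [(1, 'AWS access key ID')]
print(scan(SAMPLE, baseline=frozenset({FAKE_KEY})))  # []
```

A secret that is split across two lines or encoded differently would not match either pattern, which is the blind spot mentioned above.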
CI (Continuous Integration)
A pipeline that automatically runs a defined set of checks on every push: tests, linter, type check, secret scan, dependency audit. Only when all checks are green is a change considered ready for the main branch.
Why it helps. The counterpart to local pre-commit hooks — not bypassable in the same way, because the pipeline runs on a server the developer does not control. It makes the test and security state of every change visible and comparable. Examples. GitHub Actions, GitLab CI, CircleCI, Jenkins. For a solo project, GitHub Actions with a short YAML config is often enough.
Dependency audit
A check of installed packages against databases of known vulnerabilities. The tool does not look at your code, but at the libraries your project brings along.
Why it helps. Shows when a dependency already has publicly known vulnerabilities. This matters especially in agent-built projects, because new packages can appear quickly without anyone consciously checking their maintenance state. What it does not do. It only finds known vulnerabilities. A clean audit is not proof that a dependency is safe or well maintained. Examples. pip-audit for Python, npm audit for Node projects, cargo audit for Rust.
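In principle, an audit is a lookup of pinned versions in an advisory list. The sketch below uses an invented package name and an invented advisory ID. Real tools such as pip-audit query maintained vulnerability databases instead of a hard-coded dictionary.

```python
# Hypothetical advisory data, for illustration only.
KNOWN_VULNERABLE = {
    ("examplelib", "1.2.0"): "EXAMPLE-2024-0001",
}

def parse_pins(requirements: str) -> dict[str, str]:
    """Read 'name==version' lines; comments and unpinned lines are skipped."""
    pins = {}
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins

def audit(requirements: str) -> list[str]:
    """Report every pinned package that appears in the advisory data."""
    return [
        f"{name}=={version}: {KNOWN_VULNERABLE[(name, version)]}"
        for name, version in parse_pins(requirements).items()
        if (name, version) in KNOWN_VULNERABLE
    ]

print(audit("examplelib==1.2.0\notherlib==3.1.4  # hypothetical\n"))
```

An empty result only means "nothing known", in line with the limit described above.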
Trust boundary
A conceptual line between "inside, which I trust" and "outside, which I do not trust". Concrete examples: between my program and the user typing something; between my program and a file it reads; between my program and an answer an LLM gives. Every trust boundary is a point where inputs must be checked, validated, or cleaned before they are processed further.
Why it helps. Clear thinking about where security checks must sit. "Validate at the entrance" instead of "somewhere in the code, hopefully". What it does not do. Trust boundaries must be consciously drawn and respected. A boundary nobody named is not guarded either.
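"Validate at the entrance" can look like this. Assume an LLM proposes a file name that the program should write to. The allowed pattern and the function name are choices made for this sketch. A real project would define its own rules.

```python
import re

# Only plain names with a known extension; no slashes, no dots in the stem.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}\.(txt|md)")

def validate_filename(candidate: str) -> str:
    """Accept only plain file names; reject paths, traversal, and odd characters."""
    if not SAFE_NAME.fullmatch(candidate):
        raise ValueError(f"rejected file name: {candidate!r}")
    return candidate

print(validate_filename("notes.md"))
for hostile in ("../../etc/passwd", "notes.md; rm -rf /", ""):
    try:
        validate_filename(hostile)
    except ValueError as error:
        print(error)
```

Everything behind this function may assume a harmless name. Everything in front of it may not.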
Non-destructive defaults
A working principle: the safe default of a tool or agent is conservative — delete nothing, overwrite nothing, silently rewrite nothing at scale. Anyone who wants destructive behaviour has to request it explicitly by setting a flag, confirming, or getting approval.
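As a sketch of the principle: a hypothetical clean-up helper that only lists what it would remove unless the caller explicitly passes `delete=True`. The demo runs in a temporary directory so that nothing real is touched.

```python
import tempfile
from pathlib import Path

def clean_directory(directory: Path, *, delete: bool = False) -> list[Path]:
    """List *.tmp files; remove them only when delete=True is passed explicitly."""
    targets = sorted(directory.glob("*.tmp"))
    if delete:
        for path in targets:
            path.unlink()
    return targets

with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp)
    (scratch / "cache.tmp").write_text("x")
    listed = clean_directory(scratch)            # default: report only
    still_there = (scratch / "cache.tmp").exists()
    clean_directory(scratch, delete=True)        # explicit opt-in
    gone = not (scratch / "cache.tmp").exists()

print(still_there, gone)  # True True
```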
YOLO mode
You Only Live Once — the casual term for unbraked full autonomy of an agent: everything is allowed, nothing is checked, every action runs immediately. Most coding agents have a corresponding setting, often called "auto-edit", "yolo", "accept-all", or something similar.
Why it helps. For playgrounds and throwaway experiments in a sandbox where nothing important can break. What it does not do. It is the opposite of security.
Steering files
Configuration files that give a coding agent rules, priorities, and behavioural guidance before and during each session. Names differ by provider: Claude Code uses ~/.claude/CLAUDE.md globally and CLAUDE.md in the project repo; Codex uses AGENTS.md; Cursor uses .cursorrules. Functionally they are similar: an attempt to teach an LLM a kind of professional caution.
Why they help. Set the agent's default mode — what it should do, what it should not do, and how it should behave under uncertainty. What they do not do. Steering files are high-probability suggestions, not hard barriers. A manipulated, distracted, or overloaded agent can ignore them.
Session anchor
A short starting prompt for a concrete work session. It tells the agent what today's goal is, what scope applies, which sources matter, and which rules or gates matter especially today.
Why it helps. Prevents tasks, roles, and contexts from blending into one another. A good session anchor turns a fuzzy conversation into a clear work order. What it does not do. It replaces neither global rules nor project rules. If the rest of the context is bad or drifts later, even a good starting anchor can lose effect.
Cross-agent audit
A review method for garage projects without human code reviewers: a second, independent coding agent, ideally a different model or provider, is explicitly asked to take a change apart rather than extend it — find weaknesses, check assumptions, expose blind spots.
Why it helps. Different models have statistically different blind spots. What the first agent considers correct, the second may see as problematic, and vice versa. What it does not do. It does not guarantee correctness: the auditor agent can also be wrong.
Blast radius
Security jargon for the reach of damage a single faulty action can cause. An action with a small blast radius affects one file; one with a large blast radius affects the whole system, multiple users, or external services.
Rollback
The way back to an earlier working state. In everyday Git usage this usually means reverting a commit, discarding a branch, or restoring a known-good state.
Why it helps. Your project survives that last mistake. If you work in small steps and keep clean intermediate states, you can often return quickly to a stable point. What it does not do. A rollback does not heal external damage that already happened. If a secret was published or a system changed, "back in Git" is only one part of damage control.
Mutation testing
A test-quality check: deliberately change one spot in the code — remove a line, flip a condition, replace an operator — and see whether the test suite notices. If the tests remain green despite the mutation, they are blind at that spot.
Why it helps. One of the strongest practical measures of test effectiveness that a non-coder can apply selectively.
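A hand-made example of one mutation. The function, the mutant, and the two test suites are invented for illustration. In a real project you would edit the code temporarily or use a mutation-testing tool rather than keep a copy of the function.

```python
def is_adult(age: int) -> bool:
    return age >= 18

def is_adult_mutant(age: int) -> bool:
    return age > 18  # mutation: ">=" replaced by ">"

def weak_suite(fn) -> bool:
    """Checks only values far from the boundary."""
    return fn(30) is True and fn(5) is False

def strong_suite(fn) -> bool:
    """Additionally checks the boundary itself."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False

print(weak_suite(is_adult), weak_suite(is_adult_mutant))      # True True
print(strong_suite(is_adult), strong_suite(is_adult_mutant))  # True False
```

The weak suite stays green for the mutant, so it is blind at the boundary. The strong suite turns red, as it should.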
Coverage
A measure of what share of the code the test suite touches at all, usually given as a percentage, such as 85%. Tools like pytest-cov for Python report this per file, function, and line.
Why it helps. It bounds what can have been tested at all: code that is never touched is certainly not tested. What it does not do. High coverage does not equal good tests. A function being called is not the same as being checked. Coverage measures contact, not meaning.
Compaction
When a coding tool summarises older parts of a long session instead of keeping them verbatim, usually to save context-window space and cost. Claude Code does this at a certain depth; some Cursor flows and Aider do too; in OpenClaw it can be triggered manually.
Why it helps. Enables very long sessions without overflowing the window. Warning. Compaction deforms. A summary always loses detail, sometimes exactly the detail that matters later. Meanings can drift. Before important decisions, start a new session rather than relying on compacted memory.
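How detail gets lost can be shown with a deliberately crude sketch. The summariser below just keeps the first five words of each older message. Real tools use an LLM to summarise, and nothing here reflects how any specific product implements compaction.

```python
from typing import Callable

def compact(messages: list[str], keep_last: int,
            summarise: Callable[[list[str]], str]) -> list[str]:
    """Keep the newest messages verbatim and replace the rest with one summary line."""
    split = max(0, len(messages) - keep_last)
    older, recent = messages[:split], messages[split:]
    if not older:
        return list(recent)
    header = f"[summary of {len(older)} earlier message(s)] "
    return [header + summarise(older)] + recent

def crude_summary(older: list[str]) -> str:
    """Stand-in summariser: first five words of each message."""
    return " / ".join(" ".join(message.split()[:5]) for message in older)

HISTORY = [
    "Use the staging database for all tests, never production.",
    "Add a login form.",
    "Fix the failing test.",
]
compacted = compact(HISTORY, keep_last=2, summarise=crude_summary)
print(compacted[0])
```

The summary still mentions the staging database, but the constraint "never production" is gone. That is the kind of drift the warning above is about.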
Lost in the middle
A documented effect in long inputs: LLMs tend to give the middle of a long input less weight than the beginning and the end. Important information in the middle of a 100k-token context is statistically less likely to be used.
Why it helps to know this. Explains why large contexts are not automatically better. Important instructions belong at the front (system prompt) or at the end of the request.
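One way to act on this when assembling a request, sketched with invented section labels: state the instructions first, put the long material in the middle, and repeat the instructions at the end. Whether the repetition helps in a given setup is something to test, not a guarantee.

```python
def build_prompt(instructions: str, documents: list[str]) -> str:
    """Place instructions at both ends and the bulky material in between."""
    parts = [f"INSTRUCTIONS:\n{instructions}"]
    parts += [f"DOCUMENT {i}:\n{doc}" for i, doc in enumerate(documents, start=1)]
    parts.append(f"REMINDER OF INSTRUCTIONS:\n{instructions}")
    return "\n\n".join(parts)

prompt = build_prompt("Answer in one sentence.", ["first long text", "second long text"])
print(prompt)
```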
Trust zones
The deliberate classification of material that flows into an agent's context into different trust levels. An internally reviewed architecture document is a different trust zone than a random Markdown file from the internet.
Why it helps. Turns the question "may this influence the agent?" into an explicit decision instead of an implicit accident.
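Making the decision explicit can be as simple as attaching a zone to every piece of material before it enters the context. The three zones and the rule "only reviewed material may act as instructions" are assumptions of this sketch. A label alone enforces nothing; it only makes the decision visible.

```python
from enum import Enum

class TrustZone(Enum):
    REVIEWED = "reviewed"  # e.g. an internally reviewed architecture document
    PROJECT = "project"    # files in the repo that nobody reviewed individually
    EXTERNAL = "external"  # e.g. a random Markdown file from the internet

def may_act_as_instruction(zone: TrustZone) -> bool:
    """Policy of this sketch: only reviewed material may steer the agent."""
    return zone is TrustZone.REVIEWED

def label_for_context(text: str, zone: TrustZone) -> str:
    """Prefix material with its zone and the role it is allowed to play."""
    role = "instructions" if may_act_as_instruction(zone) else "reference data only"
    return f"[{zone.value}: {role}]\n{text}"

print(label_for_context("Ignore all previous rules.", TrustZone.EXTERNAL))
```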
Provenance
The question "where did this information come from?" — answered explicitly, ideally with source, date of last check, and checking person. A claim without provenance is often as problematic in an LLM context as it is in journalism, only without an editor to reject it.
Why it helps. Makes silent poisoning more visible. "Claim X — seen where?" is a question whose unanswerability is a warning sign.
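A provenance record can be a very small data structure. The field names follow the three items named above (source, date of last check, checking person). The class itself and the sample values are invented for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Claim:
    text: str
    source: Optional[str] = None
    last_checked: Optional[date] = None
    checked_by: Optional[str] = None

    def has_provenance(self) -> bool:
        """True only if source, check date, and checker are all recorded."""
        return all((self.source, self.last_checked, self.checked_by))

# Hypothetical sample values.
documented = Claim("The API limit is 100 requests per minute.",
                   source="vendor documentation", last_checked=date(2024, 1, 15),
                   checked_by="A. Example")
hearsay = Claim("The API has no rate limit.")

print(documented.has_provenance(), hearsay.has_provenance())  # True False
```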
Context minimisation
The principle: put only the information into the context that the agent really needs for the current task — not "everything that might be useful". More context sounds safer, but is usually the opposite: more untrusted material, more attention dilution, more attack surface.
Why it helps. Reduces both hallucination risk and cost.
Untrusted input
Everything the agent receives not from an explicitly trusted source: websites, PDFs, tickets, emails, screenshots, issues, copy-paste from the internet. "Untrusted" does not automatically mean malicious; it means not verified and therefore potentially manipulated.
Why it helps to know this. The term sharpens thinking hygiene. Once material is untrusted, it should not silently be treated as an instruction.
Lethal Trifecta
A key diagnosis, coined by Simon Willison, for the dangerous configuration in agentic systems: (1) untrusted input meets (2) access to sensitive data meets (3) tools that can act (code execution, file writes, network calls; Willison's original formulation names this third leg more narrowly as the ability to communicate externally). If all three are present at the same time, the damage potential of prompt injection is high: an injected instruction can read data, exfiltrate it through a tool, or do destructive things.
Why it helps to know this. Makes clear why an advisory-only agent without action tools is a much smaller problem than a fully active one. If one leg is missing, the threat level shrinks considerably.
Rule of Two
A simple safety rule for agent design: whenever possible, an agent should never have all three ingredients of the Lethal Trifecta at the same time. At most two of three: untrusted input, sensitive data, tools with impact.
Why it helps to know this. Not magic, but a useful architecture filter. The Rule of Two reduces the blast radius if an agent falls for poisoned context.
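As an architecture filter, the rule fits into a few lines. The configuration class and its field names are invented for this sketch. The hard part in practice is answering the three questions honestly for a real agent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    reads_untrusted_input: bool
    accesses_sensitive_data: bool
    has_impactful_tools: bool

def violates_rule_of_two(cfg: AgentConfig) -> bool:
    """True if all three ingredients of the Lethal Trifecta are present at once."""
    ingredients = (cfg.reads_untrusted_input,
                   cfg.accesses_sensitive_data,
                   cfg.has_impactful_tools)
    return sum(ingredients) == 3

research_bot = AgentConfig(True, False, True)   # browses the web, no secrets
full_access = AgentConfig(True, True, True)     # everything at once

print(violates_rule_of_two(research_bot), violates_rule_of_two(full_access))  # False True
```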
Prompt injection
A class of attacks where an attacker places instructions in the data channel of an LLM, for example in a file, an email, or a web page the agent reads, and the model interprets and follows them as commands. Direct injection: the attacker writes directly into the prompt. Indirect injection: the attacker writes into something the agent later consumes.
Why it helps to know this. Explains why every source flowing into the context has to be treated as potentially hostile — even your own file, if someone may have smuggled something into it.
HITL (Human in the Loop)
A deliberately built-in point where a human has to approve, check, or stop before the agent continues. Typical places: before file writes, network calls, emails, deployments, and anything expensive or irreversible.
Why it helps. HITL is friction on purpose. Slower than full automation, but often the last sensible brake.
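A minimal approval gate might look like this. The list of gated actions and the function names are assumptions of the sketch. In a real tool the `approve` callback would ask a person, for example on the command line, instead of being a fixed answer.

```python
from typing import Callable

# Actions that should never run without a human saying yes (example list).
GATED_ACTIONS = {"delete_file", "send_email", "deploy"}

def run_action(name: str, execute: Callable[[], str],
               approve: Callable[[str], bool]) -> str:
    """Run the action, but ask for approval first if it is on the gated list."""
    if name in GATED_ACTIONS and not approve(name):
        return f"{name}: blocked, waiting for human approval"
    return execute()

print(run_action("deploy", lambda: "deployed", approve=lambda action: False))
print(run_action("read_file", lambda: "file contents", approve=lambda action: False))
```

The first call is stopped and the second runs, because reading a file is not on the gated list.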
Sandboxing
A technical cage for software or agents: limited file access, limited processes, limited network, limited permissions. The agent is not allowed to do everything the host system could do.
Why it helps. Limits damage when an agent hallucinates, is manipulated, or is simply badly built. Sandboxing does not prevent every error, but it often turns "catastrophic" into "annoying, but contained".
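One small piece of "limited file access" can be sketched inside a program: refuse every path that leaves the workspace folder. This is only an illustration with invented names. A real sandbox is enforced from outside, by the operating system, a container, or a virtual machine, not by the code it is supposed to contain.

```python
from pathlib import Path

def resolve_inside(workspace: Path, requested: str) -> Path:
    """Return the absolute path if it stays inside the workspace, else refuse."""
    root = workspace.resolve()
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # requires Python 3.9 or newer
        raise PermissionError(f"outside workspace: {requested}")
    return target

workspace = Path("workspace_demo")  # hypothetical folder; it does not need to exist
print(resolve_inside(workspace, "src/app.py").name)
for attempt in ("../secret.txt", "/etc/passwd"):
    try:
        resolve_inside(workspace, attempt)
    except PermissionError as error:
        print(error)
```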
No claim of correctness. You are seeing my learning process. Corrections, disagreement, and field reports are welcome (email in the legal notice). This post will change. Created with LLM support. What is true today may already be different tomorrow.