The shift that already happened
Prompt
Engineering
is Dead.
In June 2025, Tobi Lütke championed the term and Andrej Karpathy ended the debate. Gartner confirmed it by July. The era of clever wording is over.
What this deck covers
The timeline · The shift · Context engineering · Skills · Guardrails · Autonomous decisions · How to actually build it
The rise and fall
of prompt engineering.
"The hottest new programming language is English"
Karpathy · GPT-3 · Prompt engineering becomes a job title · $300K salaries on Glassdoor.
Peak hype. Chain-of-thought, role-playing, magic tokens.
Prompt agencies, prompt marketplaces. Clever wording as strategy. It worked — for demos.
Production arrives. The cracks show.
Real users arrive. Accuracy slips. Answers drift. The problem isn't the words — it's what the system remembers and forgets.
Karpathy backs "Context Engineering"
"Filling the context window with just the right information." He was quote-posting Tobi Lütke, who had championed the term days earlier.
Gartner: "Context engineering is in, prompt engineering is out."
Gartner puts it in a headline. No longer a blog post.
Anthropic releases Agent Skills open standard (SKILL.md)
32 tools in weeks: Claude Code, Codex, Cursor, VS Code, Copilot, Gemini CLI. Skills = the npm of AI agents.
Context engineering is the discipline. Agents are the runtime.
Gartner: 40% of enterprise apps include agents by end of 2026 (vs <5% in 2025). Job title: system architect.
Prompt engineering
vs. context engineering.
It's not just a rename. It's a different job.
✕ Prompt Engineer
Obsesses over wording · Role-playing tricks · Chain-of-thought hacks · Rewrites the same prompt 40 times · Works on demos, breaks in production
✓ Context Engineer
Architects the full information environment · Designs what the agent knows, when it knows it, and how much · Builds for production reliability, not demo magic
✦ Andrej Karpathy, June 2025
"In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step."
✦ Tobi Lütke, Shopify CEO, June 2025
"I really like the term 'context engineering' over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."
The 5 layers of context.
LLMs have a finite attention budget. Every token competes. What you put in — and when — determines what you get out.
Instructions
System prompt, role, objectives, constraints, output format. The constitution of the agent. Static, cached, loaded once.
Memory
Ephemeral (conversation turns) · Semantic (vector DB, past decisions) · Episodic (what this user did before). You design all three — the model has none by default.
Tools & Actions
Tool definitions in context cost tokens. Load only what's needed for this step. Design tool schemas like APIs: clear names, concise but unambiguous descriptions.
Retrieved knowledge (RAG)
What the agent retrieves from external sources. Freshness, relevance, and length are your design decisions — not the model's.
Conversation state
The dynamic layer: what happened in this session. Compress, summarize, prune. Never let context grow unbounded — precision drops as context grows.
✦ The rule
Right information · Right time · Right amount. Everything else is noise that degrades reasoning.
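A minimal Python sketch of the rule applied to the five layers. It assumes a word count as a stand-in for a real tokenizer, and every name in it (`assemble_context`, `Tool`, the per-tool `steps` tag) is hypothetical. Instructions always load; only the current step's tools load; memory and retrieved knowledge fill the budget in relevance order; the newest conversation turns take whatever room is left.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: one token per whitespace-separated word.
    # A real system would use the model's own tokenizer.
    return len(text.split())


@dataclass
class Tool:
    name: str
    schema: str       # serialized tool definition that goes into context
    steps: set[str]   # hypothetical tag: which task steps need this tool


def assemble_context(
    instructions: str,
    memory: list[str],
    tools: list[Tool],
    retrieved: list[str],
    history: list[str],
    step: str,
    budget: int,
) -> list[str]:
    """Build the context for one step, layer by layer, within a token budget."""
    parts = [instructions]                                  # instructions: always present
    parts += [t.schema for t in tools if step in t.steps]  # tools: only this step's
    used = sum(count_tokens(p) for p in parts)
    if used > budget:
        raise ValueError("instructions + tools alone exceed the budget")

    # Memory and retrieved knowledge, assumed pre-sorted by relevance.
    for item in memory + retrieved:
        cost = count_tokens(item)
        if used + cost > budget:
            break
        parts.append(item)
        used += cost

    # Conversation state: keep the newest turns that still fit, drop the rest.
    recent: list[str] = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        recent.insert(0, turn)
        used += cost
    return parts + recent
```

Dropping the oldest turns is the simplest pruning policy; summarizing them into one short turn would keep more history for the same budget.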
Skills: the npm
of AI agents.
Anthropic SKILL.md open standard · Dec 2025 · Adopted by 32 tools in weeks.
What is a skill?
A Markdown file with YAML frontmatter that packages procedural knowledge for an agent. Not a prompt. Not a function. An operating procedure — version-controlled, reusable, portable across tools.
How it loads
Agent reads only name + description at startup (~80 tokens). The full skill loads on demand when relevant. Beyond that metadata, an unused skill costs nothing.
Agent identity
A single agent assumes different "identities" on demand. It activates a skill's instructions, constraints and tone, then returns to its base behavior when done.
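A sketch of progressive loading in Python. It assumes the frontmatter holds only flat `key: value` pairs (a real loader should use a YAML parser) and keeps skills in an in-memory dict standing in for files on disk. `SkillRegistry` and the `release-notes` skill are hypothetical; this is one way a host could load skills, not any tool's actual loader.

```python
from dataclasses import dataclass

# Stand-in for a skill directory on disk: path -> file contents.
SKILL_FILES = {
    "skills/release-notes/SKILL.md": """---
name: release-notes
description: Drafts release notes from merged pull requests. Use when asked for a changelog.
---
# Release notes
1. Group merged changes into Features, Fixes and Breaking changes.
2. Write one line per change, in the past tense.
3. Put breaking changes first.
""",
}


@dataclass
class SkillStub:
    name: str
    description: str
    path: str  # where the full body lives; read only on demand


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, markdown body).

    Simplification: flat `key: value` lines only, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter")
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


class SkillRegistry:
    def __init__(self, files: dict[str, str]) -> None:
        self._files = files
        # Startup: keep only name + description; bodies stay out of context.
        self.stubs: list[SkillStub] = []
        for path, text in files.items():
            fields, _ = parse_frontmatter(text)
            self.stubs.append(SkillStub(fields["name"], fields["description"], path))

    def startup_context(self) -> str:
        """The only skill text the agent sees before any skill is needed."""
        return "\n".join(f"- {s.name}: {s.description}" for s in self.stubs)

    def load(self, name: str) -> str:
        """Read a skill's full instructions once the agent decides it is relevant."""
        for stub in self.stubs:
            if stub.name == name:
                return parse_frontmatter(self._files[stub.path])[1]
        raise KeyError(name)
```

At startup the agent sees one line per skill; the numbered procedure enters context only when `load` is called.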
✦ The insight
The gap between "generally intelligent" and "specifically useful" is where most AI-assisted work fails. Skills close that gap. You encode your context once. The agent applies it every time.
Guardrails are
not optional.
They are the system.
McKinsey: 80% of organizations have already encountered risky agent behaviors — unauthorized data exposure, improper system access — in production.
Input validation
Prompt injection detection. PII classifiers. Intent scoring before execution.
Output validation
Schema enforcement. Toxicity check. Fact-grounding before the response leaves.
Policy engines
OPA, Cedar — rules live outside the model. Can't be prompt-injected. Enforced at runtime.
Kill switches
Per-tool disable in seconds, no redeploy. Each agent has minimum privilege — only what it needs for the task.
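A toy stand-in for a policy engine, in Python. Real deployments would express these rules in OPA's Rego or in Cedar; the class, agent and tool names here are hypothetical. What it shows: the decision depends only on which agent calls which tool, never on text the model produced, so a prompt injection has nothing to argue with.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyGate:
    """Deterministic checks that run before any tool call executes."""

    # Minimum privilege: an explicit allowlist of tools per agent.
    allowed: dict[str, set[str]]
    # Kill switches: tools disabled at runtime, no redeploy needed.
    disabled: set[str] = field(default_factory=set)

    def check(self, agent: str, tool: str) -> tuple[bool, str]:
        if tool in self.disabled:
            return False, f"{tool} is disabled"
        # Unknown agents get an empty allowlist: deny by default.
        if tool not in self.allowed.get(agent, set()):
            return False, f"{agent} may not call {tool}"
        return True, "ok"

    def kill(self, tool: str) -> None:
        self.disabled.add(tool)

    def restore(self, tool: str) -> None:
        self.disabled.discard(tool)
```

In a real system the `disabled` set would presumably live in a config store the gate re-reads, so flipping a switch takes effect in seconds.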
⚠ The hard rule
Guardrails must live outside the model. If your only defense is a prompt instruction like "never reveal sensitive data", you don't have a guardrail. You have a polite request, and an adversarial input can override it.
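Output validation follows the same principle. A minimal Python sketch, assuming a hypothetical response contract (an `answer` string plus a `sources` list) and using an email regex as a deliberately narrow example of sensitive data; a real PII classifier covers far more. The check runs on the model's output after generation, so nothing in the prompt can switch it off.

```python
import json
import re

# Hypothetical response contract: required field -> expected type.
REQUIRED_FIELDS = {"answer": str, "sources": list}
# Narrow example of a sensitive-data pattern: email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def validate_output(raw: str) -> dict:
    """Reject model output that breaks the schema or leaks an email address."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, typ in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    if EMAIL.search(data["answer"]):
        raise ValueError("answer contains an email address")
    return data
```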
Bounded autonomy architecture
Clear operational limits · Escalation paths to humans for high-stakes decisions · Comprehensive audit trails · Governance agents that monitor other agents · Red-teaming before deployment
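A minimal sketch of the audit-trail piece in Python, assuming an in-memory list as a stand-in for durable, append-only storage; `AuditTrail` and the outcome labels are hypothetical. Every attempted action is recorded, including blocked and escalated ones, since those are the entries a reviewer needs most.

```python
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only record of every action an agent attempts, allowed or not."""

    def __init__(self) -> None:
        self._lines: list[str] = []  # stand-in for write-once storage

    def record(self, agent: str, action: str, outcome: str, reason: str) -> None:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "outcome": outcome,  # e.g. "executed", "blocked", "escalated"
            "reason": reason,
        }
        self._lines.append(json.dumps(entry, sort_keys=True))

    def entries(self) -> list[dict]:
        return [json.loads(line) for line in self._lines]
```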
Not automation.
Decisions.
The old framing was "automate this task." The new one is "define what the agent is allowed to decide, and when it must escalate."
The decision matrix
✓ Agent decides
Routing · Retrieval · Formatting · Tool selection · Low-stakes actions
⚡ Agent recommends
Medium-stakes actions · Ambiguous cases · First instance of a new pattern
🛑 Human approves
Irreversible actions · PII access · Financial transactions · External communications · Anything that can't be undone
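The matrix can become a lookup the runtime enforces. A Python sketch with a hypothetical action catalogue: unmapped actions fall to the strictest tier, and an action the agent would normally decide is downgraded to a recommendation the first time a new pattern appears (modeled here as a simple `seen_before` flag, an assumption for illustration).

```python
from enum import Enum


class Tier(Enum):
    AGENT_DECIDES = "agent decides"
    AGENT_RECOMMENDS = "agent recommends"
    HUMAN_APPROVES = "human approves"


# Hypothetical action catalogue; names are illustrative.
TIERS = {
    "route_ticket": Tier.AGENT_DECIDES,
    "format_reply": Tier.AGENT_DECIDES,
    "offer_discount": Tier.AGENT_RECOMMENDS,
    "issue_refund": Tier.HUMAN_APPROVES,
    "send_external_email": Tier.HUMAN_APPROVES,
    "delete_account": Tier.HUMAN_APPROVES,
}


def tier_for(action: str, seen_before: bool = True) -> Tier:
    # Unmapped actions default to the strictest tier.
    tier = TIERS.get(action, Tier.HUMAN_APPROVES)
    # First instance of a new pattern: recommend instead of acting.
    if tier is Tier.AGENT_DECIDES and not seen_before:
        return Tier.AGENT_RECOMMENDS
    return tier
```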
✦ The new job description
You don't write code for the agent. You define its objectives, constraints, decision thresholds, and escalation logic. The core skill is systems thinking, not syntax.
The best "prompt"
is a document.
The real leverage is not crafting instructions — it's writing the comprehensive document that tells the agent everything it needs to make good decisions. Not what to say. How to think.
What the document contains
Identity & role — who the agent is and what it cares about
Decision logic — what criteria determine each choice
Hard constraints — what it can never do, regardless of instructions
Escalation triggers — exactly when to stop and ask a human
Examples — good decisions, bad decisions, and why
The test of a great agent document
Could a new person join your team, read only this document, and make the same decisions as your best agent? If yes — it's a good document. If no — that's where your agent fails.
Use AI to build
your context.
The best prompt? Ask the AI to help you create it. The best agent document? Co-write it with the model.
Step 1 — Start with outcomes, not instructions
Don't write "you are a helpful assistant." Write "when a user asks about X, the goal is Y. Success looks like Z. Failure looks like W." Work backwards from the decision, not forward from the words.
Step 2 — Ask the AI to pressure-test it
Give the model your draft and ask it where the document is ambiguous, which decisions it could not make from the text alone, and which edge cases are missing.
Step 3 — Build skills for reusable procedures
Any procedure you run more than 3 times belongs in a SKILL.md, not in the system prompt. Skills load on demand: until a task needs one, it costs only its name and description.
Step 4 — Define every constraint explicitly
List every "never do X" as a concrete rule with examples of what X looks like. Vague constraints produce inconsistent behavior. Specific constraints produce predictable agents.
Step 5 — Eval before deploy
Create a golden dataset of 20–50 decisions your agent should make. Run it against every change before it ships. If a decision changes and you didn't intend it, you have a regression.
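A minimal regression harness for Step 5, in Python. The agent is any function from input to decision label; `stub_agent` is a hypothetical placeholder for a real model call. Model outputs can vary between runs, so a real harness might run each case several times; this sketch runs each once.

```python
from typing import Callable

# One golden case: (input, the decision a careful human expects).
GoldenCase = tuple[str, str]


def run_golden_set(agent: Callable[[str], str], cases: list[GoldenCase]) -> list[str]:
    """Return one message per case where the agent's decision differs."""
    failures = []
    for prompt, expected in cases:
        actual = agent(prompt)
        if actual != expected:
            failures.append(f"{prompt!r}: expected {expected!r}, got {actual!r}")
    return failures


# Hypothetical placeholder for a real model call.
def stub_agent(prompt: str) -> str:
    return "human approves" if "refund" in prompt.lower() else "agent decides"


CASES: list[GoldenCase] = [
    ("Route this ticket to billing", "agent decides"),
    ("Refund order 1042", "human approves"),
    ("Format the reply as a bulleted list", "agent decides"),
]

if __name__ == "__main__":
    failures = run_golden_set(stub_agent, CASES)
    print("PASS" if not failures else "\n".join(failures))
```

An empty failure list means every golden decision held; any message names the case that drifted.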
The new job isn't
prompt writer.
It's system architect.
What the role actually does
Designs the overarching system architecture and agent topology
Defines precise objectives, guardrails, and decision thresholds
Engineers context: what loads, when, at what cost
Owns accountability for what the agent decides — not the model provider
Dead skills
Prompt writer · AI whisperer · Jailbreak expert · Persona designer
Live skills
Context architect · Agent designer · LLMOps engineer · AI systems thinker
✦ CIO Magazine, 2026
"Their value will lie in designing the overarching system architecture, defining the precise objectives and guardrails, and rigorously validating the output. It's a move from hands-on keyboard creation to high-level system design. The core skill becomes systems thinking, not syntax."
Everything you need
to get started.
Context layers
Instructions · Memory · Tools · RAG · State
Skills format
SKILL.md · YAML frontmatter · On-demand load · ~80 tokens at rest
Guardrails
Input/output validation · Policy engines · Kill switches · Minimum privilege
Decision design
Agent decides / agent recommends / human approves. Map every action to a tier.
✦ Practical checklist
Write outcomes first, instructions second
Use AI to pressure-test your agent document
Every repeated procedure → SKILL.md
Guardrails outside the model — not in the prompt
Map every decision to a tier: agent decides / agent recommends / human approves
Cache static context. Never let context grow unbounded.
The model
is not your
product.
What you build around it is.
Dead
Prompt engineer · Magic words · Clever phrasing · Demo-only AI
Alive
Context engineer · Systems architect · Skills · Guardrails · Autonomous decision design