What Separates AI Hobbyists From AI Engineers?
(Warning: if you read this essay to the end, don't be surprised or discouraged to learn that your AI engineering courses from 2025 are already obsolete. This essay is an effort to focus you on the meta-engineering skills that matter regardless of which hyped AI technology comes along.)
I've been having conversations with former colleagues from Google DeepMind, who are now scattered across Anthropic, OpenAI, and various AI labs.
And they all keep saying the same thing about people trying to break into AI engineering:
"Most developers think the bottleneck is prompt engineering. It's not. It's context engineering."
I've only been working with AI since 2025, whereas most of them have been building and researching AI since 2016. Ten years.
But in just one year, I've had to rewrite tools twice. And what they're saying is EXACTLY what I'm seeing in the "real world" of AI engineering.
Here's the insider insight:
Every production AI system has two parts:
- the model (GPT-4, Claude, Gemini), and
- the harness (everything you build around it: context management, tool calling, error handling, state persistence).
The harness is where most of us outside the frontier labs spend our time. Heck, even inside labs like Anthropic, most of the team spends its time on the harness around Claude, their family of models.
The harness is the scaffold that helps us build around the model. The harness takes a lot of time, testing, evals, context management, prompt refinement, steering, etc. A lot of true engineering, which is (sorry to keep banging on about this) MUCH more than just writing code.
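To make "the harness" concrete, here is a minimal sketch. Everything in it is illustrative: `call_model` and `run_tool` are hypothetical stand-ins for your provider's SDK and your tool executor, not any lab's actual API. But it shows the four jobs the harness does: context management, tool calling, error handling, and state persistence.

```python
import json

def call_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in; swap in your provider's SDK call."""
    return {"content": f"(model reply to {len(messages)} messages)"}

def run_tool(name: str, args: dict) -> str:
    """Hypothetical tool dispatch with basic error handling."""
    try:
        return json.dumps({"ok": True, "result": f"ran {name} with {args}"})
    except Exception as exc:
        return json.dumps({"ok": False, "error": str(exc)})

def harness(user_input: str, max_turns: int = 5) -> str:
    """The loop around the model: context, tools, errors, state."""
    messages = [{"role": "user", "content": user_input}]  # state persistence
    for _ in range(max_turns):                            # guard against runaway loops
        reply = call_model(messages)                      # the model is one line...
        tool_call = reply.get("tool_call")
        if tool_call:                                     # ...tool calling is harness work
            result = run_tool(tool_call["name"], tool_call["args"])
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]
    return "Stopped: turn limit reached."                 # graceful failure
```

Notice how little of that sketch is "AI". Most of it is plumbing, and the plumbing is where the engineering time goes.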
And the fascinating thing about model evolution?
As models get smarter, you need less harness.
OpenAI's o1 reasoning model documentation literally says:
"Keep prompts simple and direct. Avoid chain-of-thought prompts. Since these models perform reasoning internally, prompting them to 'think step by step' is unnecessary."
That guidance applies specifically to chain-of-thought reasoning models.
Anthropic removed their "planning tool" from Claude 4's scaffold. It no longer needs it.
The better the model, the more your complex scaffolding actually hurts performance.
This is the opposite of how most developers think. They keep adding complexity (more prompt engineering, more elaborate frameworks, more "AI agent architectures") when the actual move is to strip it back.
Here's the thing that took me years to learn as a lawyer-turned-engineer: The code you write matters less than understanding the system that needs to be engineered to deliver the outcomes we value. There is a lot of design thinking and tradeoff analysis. And of course, there is the cost, especially for pay-per-use resources like tokens.
And right now, while everyone's obsessing about which framework to learn (LangChain? Google's ADK? AutoGen? CrewAI?), the actual engineers shipping AI products are solving a completely different problem.
They're mastering harness engineering:
knowing when to add scaffolding and when to get out of the model's way.
The Real Problem That Only Insiders Understand
You know that feeling when you've built a multi-agent system that works perfectly... for exactly 3 turns of conversation?
Then it either:
- Costs $2 per run and becomes economically unviable
- Starts hallucinating because the context window is polluted with irrelevant garbage
- Hits token limits and crashes
While you're Googling "best AI agent framework," production engineers at Google just published their playbook for agent development. Not marketing material. Their actual architecture document.
Note how it's not just about languages or "patterns" like RAG. It's about understanding the whole system, so you pick the right tool to avoid the "Hammer and the Nail" bias.
And it reveals something critical: The difference between a toy demo and a production system isn't the model.
It's how you manage what the model sees.
What Google's ADK Architecture Actually Reveals
April 2025. Google's team published their Agent Development Kit (ADK) architecture after scaling "complex single- or multi-agentic systems" in production.
And let me be clear: as of the date of writing, it is not my favorite framework for AI agentic engineering.
But its design gave me the best insights into agentic principles.
Here's what matters:
The naive pattern kills production systems: Appending everything into one giant prompt collapses under three pressures:
- Cost/latency spirals (every token costs money and time)
- Signal degradation (model gets distracted by irrelevant context)
- Working memory limits (even 128k tokens aren't enough for real tasks)
Their solution? Treat context as a compiled system, not string manipulation.
Three core AI engineering patterns:
- Separate storage from presentation - What you store ≠ what the model sees
- Explicit transformations - Build context through ordered processors, not ad-hoc concatenation
- Scope by default - Each agent sees the minimum required context, must explicitly reach for more
Think of it like this: You wouldn't dump your entire codebase into every function call. Why dump your entire conversation history into every AI call?
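Here's what the idea might look like in code. This is a toy sketch of the principle under my own names, not ADK's actual API: storage is separate from presentation, and context is built by an ordered pipeline of processors.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    """What you store: the full event history, never sent wholesale."""
    events: list[str] = field(default_factory=list)

def scope(events: list[str]) -> list[str]:
    """Scope by default: only the most recent slice is even considered."""
    return events[-10:]

def summarize(events: list[str]) -> list[str]:
    """Compress older events instead of concatenating them."""
    old, recent = events[:-3], events[-3:]
    if old:
        recent = [f"[summary of {len(old)} earlier events]"] + recent
    return recent

def render(events: list[str]) -> str:
    """What the model sees: built, not accumulated."""
    return "\n".join(events)

def compile_context(store: Store) -> str:
    """Explicit, ordered transformations instead of ad-hoc appending."""
    events = store.events
    for processor in (scope, summarize):
        events = processor(events)
    return render(events)

store = Store(events=[f"event {i}" for i in range(50)])
print(compile_context(store))  # the model sees 4 lines, not 50
```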
By the way, here is an interesting research paper on arXiv that summarizes findings on popular agentic patterns. Definitely worth a read… but don't be surprised if it's at least partially obsolete within 6 months.
While all these emerging patterns may have a rapid expiry date, here's the paradox I'm seeing in conversations with people at the labs:
The faster we move, the more we seem to fall back on established, first-principle type patterns that predate AI.
The Harness Paradox: Scaffolding vs. Stripping Back
December 2024. OpenAI releases o1 with extended reasoning capabilities.
The documentation includes something that made everyone's "advanced prompt engineering" courses obsolete overnight:
"Avoid chain-of-thought prompts. Since these models perform reasoning internally, prompting them to 'think step by step' or 'explain your reasoning' is unnecessary."
In fact, research showed that o1-mini underperformed GPT-4o in 24% of simple tasks specifically because of "unnecessary, expansive reasoning."
The model was overthinking because developers were still using scaffolding designed for dumber models.
Anthropic saw the same thing. Claude 4 removed the "planning tool" that Claude 3.7 Sonnet needed. The model internalized the capability.
One engineer at Anthropic said, "We kept simplifying the harness, and performance kept improving. People don't believe it until they test it."
Here's what's happening:
Early models (GPT-3 era): Needed extensive scaffolding (detailed instructions, multi-shot examples, explicit chain-of-thought prompting, complex orchestration)
Reasoning models (o1, Claude 4): Simpler prompts perform better. "Keep prompts simple and direct" is now official guidance.
The skill shift: From building elaborate harnesses to knowing when to strip them back
This creates a fascinating problem:
Developers who've spent months mastering prompt engineering techniques may now be employing techniques that are actually harmful.
What Anthropic Discovered
Exactly a year ago, Anthropic released their findings from "dozens of teams building LLM agents across industries."
The pattern? The most successful implementations use simple, composable patterns rather than complex frameworks.
Not what the framework evangelists want to hear. Especially since that was a full year ago!
(BTW, this is why I only learn from the courses released by AI labs themselves. I generally avoid AI engineering courses from course creators. It's way too early for patterns and best practices to solidify, and since course creators tend to re-package what's already in the product's official docs anyway, I reasoned that I might as well skip the middleman and learn straight from the source. Especially since the learnings will have a short shelf life. Rapid obsolescence of information means we have no time for info to be "packaged"... that's the problem universities have today: your learning is obsolete by the time you finish.)
So far, there seem to be six composable patterns that are meta-level enough to be effective no matter which model you choose to power them:
- Prompt chaining - Break tasks into sequential validated steps
- Routing - Direct queries to specialized sub-tasks
- Parallelization - Run independent steps simultaneously
- Orchestrator-workers - Manager coordinates specialist agents
- Evaluator-optimizer - Iterative refinement loops
- Autonomous loops - Self-directed until completion
But: these aren't "advanced techniques."
These are basic software engineering patterns applied to AI.
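To see how unglamorous these patterns are, here is a prompt-chaining sketch: each step is a plain function call, validated before the next one runs. The `llm` helper is a hypothetical stand-in for a single model call, and the prompts are illustrative.

```python
def llm(prompt: str) -> str:
    """Hypothetical single model call; swap in your provider's SDK."""
    raise NotImplementedError

def summarize_document(document: str) -> str:
    """Prompt chaining: sequential steps, each validated before the next."""
    outline = llm(f"Outline the key claims in:\n{document}")
    if not outline.strip():                          # validate between steps
        raise ValueError("Empty outline; fail fast instead of wasting tokens.")
    draft = llm(f"Write a summary that follows this outline:\n{outline}")
    critique = llm(f"List factual errors in this summary, or reply 'none':\n{draft}")
    if critique.strip().lower() == "none":           # evaluator gate
        return draft
    return llm(f"Fix these errors:\n{critique}\n\nSummary:\n{draft}")
```

Strip away the AI and this is just a function pipeline with input validation, which is exactly the point.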
Remember how I said it feels like we're converging on systems-related meta-principles?
If you've built any production system, you already know these patterns. The AI part simply automates things that humans or code already do, and removes the need to maintain extensive rules-based programs.
The Skills vs. MCP Convergence
Not long after Anthropic open-sourced their "Skills" specification, OpenAI quietly adopted the identical architecture in ChatGPT.
The industry may have just converged on a standard answer.
Two complementary pieces:
- MCP (Model Context Protocol) - Secure connectivity to external tools/data
- Skills - Procedural knowledge for using those tools effectively
Why this matters for you: The architecture you learn today will work across Claude, GPT, and Gemini tomorrow. "Skills" may be becoming infrastructure. Makes sense, given that they're just high-signal packaged prompts.
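If you strip away the branding, the core idea is simple enough to sketch. This is not Anthropic's actual spec, just the concept: a skill is procedural knowledge packaged in a file, attached to context only when the task calls for it. The `SKILL.md` filename and keyword matching here are my assumptions for illustration.

```python
from pathlib import Path

def load_skill(skill_dir: str) -> str:
    """Read the packaged instructions; SKILL.md is an assumed filename."""
    return Path(skill_dir, "SKILL.md").read_text()

def maybe_attach_skill(task: str, skill_dir: str,
                       keywords: tuple[str, ...]) -> str:
    """Attach the high-signal prompt only for matching tasks (scope by default)."""
    if any(word in task.lower() for word in keywords):
        return load_skill(skill_dir) + "\n\n" + task
    return task
```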
What This Means For Aspiring AI Engineers
I talk to mid-career professionals every week who've done courses but can't get interviews.
The conversation usually goes:
- "I know Python basics."
- "I've built some RAG demos."
- "But employers want production experience."
Here's the disconnect: you're learning yesterday's patterns in a world where all your competition has the same skills. Remember: job descriptions list skills, but hiring managers look for risk minimization. 80% of applicants have the skills. Yet only one candidate will convince the hiring manager they're the least risky of the thousands of applicants, for that specific role.
Maybe you're mastering elaborate prompt engineering when the industry just shifted to simpler prompts.
Maybe you're learning complex frameworks when production teams are writing 100 lines of Python and a handful of carefully written instructions in Markdown files.
You're building scaffolding when the actual skill is knowing when to remove it and maintain or improve performance on evals.
These things are hard to teach in courses. Just like driving. They must be learned by doing.
Harness engineering is the new bottleneck.
The skill isn't "advanced prompt engineering." It's understanding:
When to add complexity:
- Context compaction for long-running conversations
- KV-cache optimization for cost/latency
- Artifact handling for large data
- Multi-session persistence for state management
- Tool orchestration for external API calls
When to strip it back:
- Using reasoning models (o1, Claude 4) → simpler prompts
- Simple tasks → don't over-engineer the reasoning
- Clear instructions → model handles the rest
- Trust the model's internal capabilities
Because, unlike traditional software, where the platform stays stable, AI models are improving every 3-6 months. Your harness design that worked for GPT-4 breaks with o1. Your elaborate scaffolding for Claude 3.5 underperforms with Claude 4.
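A concrete way to see the shift: the same task, prompted two ways. This is a sketch; the templates are illustrative, and how you detect a reasoning model is left out.

```python
# Scaffolding designed for older, weaker models: role-play, examples,
# explicit chain-of-thought.
SCAFFOLDED = """You are an expert analyst. Think step by step.
First list your assumptions, then reason through them, then answer.
Example 1: ...
Example 2: ...
Task: {task}"""

# What reasoning-model docs now recommend: simple and direct.
DIRECT = "Task: {task}\nReturn only the final answer."

def build_prompt(task: str, is_reasoning_model: bool) -> str:
    """Strip the scaffolding back when the model reasons internally."""
    template = DIRECT if is_reasoning_model else SCAFFOLDED
    return template.format(task=task)
```

The skill isn't writing either template. It's knowing, per model and per task, which one to ship, and verifying the choice on evals.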
The Opportunity Hiding In Plain Sight
While everyone races to learn the latest framework, actual production teams are solving:
Context compaction - Summarizing old events to prevent token bloat (Google's ADK uses LLMs to compress conversation history at configurable thresholds)
KV-cache optimization - The single most important metric for production agents (affects both latency and cost)
Artifact handling - Don't dump 5MB CSVs into context; use references and load on-demand
Multi-session persistence - Anthropic just solved long-running agent memory with initializer + coding agent patterns
These aren't "advanced AI techniques." They're database patterns, caching strategies, and state management you'd learn in any backend engineering role.
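If you want to see how mundane these are, here is a compaction-plus-artifacts sketch. The thresholds and function names are mine, not ADK's or Anthropic's; `summarize` is any callable, typically a cheap LLM call.

```python
def compact(history: list[str], summarize, max_events: int = 20,
            keep_recent: int = 5) -> list[str]:
    """Past a configurable threshold, compress old events into one summary."""
    if len(history) <= max_events:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize("Summarize these events:\n" + "\n".join(old))
    return [f"[compacted] {summary}"] + recent

def as_artifact_reference(path: str, size_bytes: int) -> str:
    """Reference large files instead of inlining them; load rows on demand."""
    return f"[artifact: {path}, {size_bytes} bytes; fetch contents only when needed]"
```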
The truth is that the bar has been raised because AI makes coding easier, but engineering much harder.
And I've been warning for years on my podcast that people mistake coding for engineering, at a time when the market has matured from "coders" to "engineers": 20+ years into the startup boom, we now have lots of mature engineers with protégés in the workforce.
The Good News for Full Stack Devs
If you can build a REST API with proper error handling, you can build production AI agents. The primitives are the same:
- State management
- Request/response cycles
- External service calls
- Retry logic
- Logging and monitoring
Learning from hype will make you fail. If you're learning based on the YouTube algorithm and social-media buzz, you're in trouble.
Here's the uncomfortable truth: While you're learning elaborate prompt chains, someone who understands when to use a 10-line Python script versus when to trust Claude 4's internal reasoning is shipping production AI agents.
You need to dig in and read carefully what production engineers at the labs actually do. They publish blogs every week. Learn how they change and adapt, and just follow their approach.
It's too early for courses when the techniques change every month.
The competition isn't in knowing the frameworks. It's in understanding the architecture, and knowing when to simplify it.
And unlike ML theory or transformer math, this is learnable. Because at its core, it's just:
- Systems engineering (distributed systems patterns)
- Cost optimization (token management, caching)
- API design (tool integration, error handling)
- Performance engineering (latency, throughput)
Applied to models that keep getting smarter.
But while engineering trends and habits change, the goals of engineering don't change.
Web Stack Acronyms Timeline
Don't ignore history. Here is a rough timeline of the "trendy tech stacks" from the last 25+ years. Watch how quickly they change.
Late 1990s - Early 2000s:
- LAMP (1998) - Linux, Apache, MySQL, PHP/Perl/Python
  - The original and most famous stack acronym
  - Dominated early web development

Mid 2000s:
- WAMP (2000s) - Windows, Apache, MySQL, PHP
  - Windows variant of LAMP
- MAMP (2000s) - macOS, Apache, MySQL, PHP
  - Mac variant of LAMP
- XAMPP (2002) - Cross-platform, Apache, MySQL, PHP, Perl
  - Cross-platform development environment

Late 2000s - Early 2010s:
- LAPP (late 2000s) - Linux, Apache, PostgreSQL, PHP
  - PostgreSQL variant of LAMP

2010s - JavaScript Era:
- MEAN (2013) - MongoDB, Express.js, Angular, Node.js
  - First major all-JavaScript stack
  - Popularized by Valeri Karpov
- MERN (2015) - MongoDB, Express.js, React, Node.js
  - React variant of MEAN
- MEVN (2016) - MongoDB, Express.js, Vue.js, Node.js
  - Vue variant of MEAN
- PERN (mid-2010s) - PostgreSQL, Express.js, React, Node.js
  - PostgreSQL variant of MERN

Late 2010s - Jamstack Era:
- JAMstack (2016) - JavaScript, APIs, Markup
  - Coined by Netlify's Mathias Biilmann
  - Static site generation approach

2020s - Modern Variants:
- T3 Stack (2022) - TypeScript, tRPC, Tailwind, Next.js, Prisma
  - Modern type-safe stack
- PETAL (early 2020s) - Phoenix, Elixir, Tailwind, Alpine.js, LiveView
  - Elixir-based stack

Honorable Mentions:
- WINS - Windows, IIS, .NET, SQL Server (Microsoft stack)
- ELK - Elasticsearch, Logstash, Kibana (logging/analytics stack)
- SMACK - Spark, Mesos, Akka, Cassandra, Kafka (big data stack)

The evolution shows: LAMP → JavaScript stacks (MEAN/MERN) → Jamstack → Type-safe modern stacks
This is the problem with trend/hype cycles. You know what has not changed?
What good engineering means.
A good engineer in one stack was a good engineer in another stack. Tools change. Excellent engineering does not.
It's no different this time.
Four ways we can help you:
1. Wondering what learning to code actually means?
Becoming a coder is much more than just "learning to code" some languages. When I got hired at Google, for example, I didn't know 3 out of the 4 languages I had to write every day.
Check out
👉 My FreeCodeCamp Course on YouTube --> Before You Learn To Code (Video).
👉 Updated version (including Google and other big tech experiences)
2. Inner Circle (Free Preview Included)
Our personalized, intensive mentorship program is designed to help career changers go from zero to software developer, and actually get hired. It's not for everyone, but if you're ready to commit, we'll walk with you every step.
👉 Preview the Inner Circle Program -> free preview.
👉 Apply for Future Coders Inner Circle -> https://www.matchfitmastery.com/inner-circle
3. Career Change To Code Podcast
Driving? At the gym? Hiding in the bathroom? Perfect time to inject the best techniques for a career change to code directly into your brain via
👉 Drip tips directly into your brain with the Easier Said Than Done podcast: YouTube | Spotify
4. Weekly Tips In Your Inbox
👉 Subscribe to this newsletter (it's free). I try to keep it to 3 minutes or less so you can read in the elevator, waiting in lines, in the bathroom...