
What Separates AI Hobbyists From AI Engineers?

by Zubin Pratap
Dec 30, 2025

(Warning: if you read this essay to the end, don't be surprised or discouraged to learn that your AI engineering courses from 2025 are already obsolete. This essay is an effort to focus you on the meta-engineering skills that matter regardless of whatever hyped AI technology comes along next.)

I've been having conversations with former colleagues from Google DeepMind, who are now scattered across Anthropic, OpenAI, and various AI labs.

And they all keep saying the same thing about people trying to break into AI engineering:

"Most developers think the bottleneck is prompt engineering. It's not. It's context engineering."

I’ve only been working with AI since 2025, whereas most of them have been building and researching AI  since 2016.  Ten years.

But in just 1 year, I’ve had to rewrite tools twice.  And so what they’re saying is EXACTLY what I’m seeing in the “real world” of AI engineering.

Here’s the insider insight:

Every production AI system has two parts: 

  • the model (GPT-4, Claude, Gemini) and 

  • the harness (everything you build around it—context management, tool calling, error handling, state persistence).

The harness is where most of us outside the frontier labs spend our time. Heck, even inside labs like Anthropic, most of the team spends its time on the harness around Claude, their family of LLMs.

The harness is the scaffold that helps us build around the model. The harness takes a lot of time, testing, evals, context management, prompt refinement, steering, etc. A lot of true engineering, which is (sorry to keep banging on about this) MUCH more than just writing code.

And the fascinating thing about model evolution? 

As models get smarter, you need less harness.

OpenAI's o1 reasoning model documentation literally says: 

"Keep prompts simple and direct. Avoid chain-of-thought prompts. Since these models perform reasoning internally, prompting them to 'think step by step' is unnecessary."

That guidance is specifically for the chain-of-thought reasoning models.

Anthropic removed their "planning tool" from Claude 4's scaffold. It no longer needs it.

The better the model, the more your complex scaffolding actually hurts performance.

This is the opposite of how most developers think. They keep adding complexity—more prompt engineering, more elaborate frameworks, more "AI agent architectures"—when the actual move is to strip it back.

Here's the thing that took me years to learn as a lawyer-turned-engineer: The code you write matters less than understanding the system that needs to be engineered to deliver the outcomes we value.  There is a lot of design thinking and tradeoff analysis.  And of course, there is the cost, especially for pay-per-use resources like tokens.

And right now, while everyone's obsessing about which framework to learn (LangChain? Google ADK? AutoGen? CrewAI?), the actual engineers shipping AI products are solving a completely different problem.

They're mastering harness engineering: 

knowing when to add scaffolding and when to get out of the model's way.

The Real Problem That Only Insiders Understand

You know that feeling when you've built a multi-agent system that works perfectly... for exactly 3 turns of conversation?

Then it either:

  • Costs $2 per run and becomes economically unviable

  • Starts hallucinating because the context window is polluted with irrelevant garbage

  • Hits token limits and crashes

While you're Googling "best AI agent framework," production engineers at Google just published their playbook for agent development. Not marketing material. Their actual architecture document.

Note how it's not just about languages or "patterns" like RAG. It's about understanding the whole system, so you pick the right tool to avoid the "Hammer and the Nail" bias.

And it reveals something critical: The difference between a toy demo and a production system isn't the model. 

It's how you manage what the model sees.

What Google's ADK Architecture Actually Reveals

Late 2025. Google's team published their Agent Development Kit (ADK) architecture after scaling "complex single- or multi-agentic systems" in production.

And let me be clear – as of the date of writing, it is not my favorite framework for AI agentic engineering.

But its design gave me the best insights into agentic principles.

Here's what matters:

The naive pattern kills production systems: Appending everything into one giant prompt collapses under three pressures:

  1. Cost/latency spirals (every token costs money and time)

  2. Signal degradation (model gets distracted by irrelevant context)

  3. Working memory limits (even 128k tokens aren't enough for real tasks)

Their solution? Treat context as a compiled system, not string manipulation.

Three core AI engineering patterns:

  • Separate storage from presentation - What you store ≠ what the model sees

  • Explicit transformations - Build context through ordered processors, not ad-hoc concatenation

  • Scope by default - Each agent sees the minimum required context, must explicitly reach for more

Think of it like this: You wouldn't dump your entire codebase into every function call. Why dump your entire conversation history into every AI call?
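To make that concrete, here is a minimal sketch of the idea in Python. It's my own illustration, not code from Google's ADK, and every name in it (`Turn`, `SessionStore`, the processors) is made up for the example. The point is the shape: storage stays separate from presentation, and the context the model sees is compiled through an ordered list of processors rather than ad-hoc string concatenation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Turn:
    role: str       # "user", "assistant", or "tool"
    content: str

@dataclass
class SessionStore:
    """What we store: the full, unabridged history."""
    turns: list[Turn] = field(default_factory=list)

# Each processor transforms the working context; the order is explicit.
Processor = Callable[[list[Turn]], list[Turn]]

def keep_recent(n: int) -> Processor:
    def _run(turns: list[Turn]) -> list[Turn]:
        return turns[-n:]                       # scope by default: minimum required context
    return _run

def drop_tool_noise(turns: list[Turn]) -> list[Turn]:
    return [t for t in turns if t.role != "tool"]   # irrelevant to this particular agent

def compile_context(store: SessionStore, processors: list[Processor]) -> str:
    """What the model sees is *compiled* from what we store."""
    turns = list(store.turns)
    for proc in processors:
        turns = proc(turns)
    return "\n".join(f"{t.role}: {t.content}" for t in turns)

# Usage: the prompt is the output of an explicit pipeline, not one giant append.
store = SessionStore([
    Turn("user", "Summarise Q3"),
    Turn("tool", "raw 5MB CSV dump..."),
    Turn("assistant", "Done."),
    Turn("user", "Now compare it to Q2"),
])
prompt = compile_context(store, [drop_tool_noise, keep_recent(3)])
```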

By the way, here is an interesting research paper on arXiv that summarizes findings on popular agentic patterns. Definitely worth a read, but don't be surprised if it's at least partially obsolete within 6 months.

While all these emerging patterns may have  a rapid expiry date, here's the paradox I'm seeing in conversations with people at the labs:

The faster we move, the more we seem to fall back on established, first-principle type patterns that predate AI.

The Harness Paradox: Scaffolding vs. Stripping Back

December 2024. OpenAI releases o1 with extended reasoning capabilities.

The documentation includes something that made everyone's "advanced prompt engineering" courses obsolete overnight:

"Avoid chain-of-thought prompts. Since these models perform reasoning internally, prompting them to 'think step by step' or 'explain your reasoning' is unnecessary."

In fact, research showed that o1-mini underperformed GPT-4o in 24% of simple tasks specifically because of "unnecessary, expansive reasoning."

The model was overthinking because developers were still using scaffolding designed for dumber models.

Anthropic saw the same thing. Claude 4 removed the third "planning tool" that Claude 3.7 Sonnet needed. The model internalized the capability.

One engineer at Anthropic said, "We kept simplifying the harness, and performance kept improving. People don't believe it until they test it."

Here's what's happening:

Early models (GPT-3 era): Needed extensive scaffolding—detailed instructions, multi-shot examples, explicit chain-of-thought prompting, complex orchestration

Reasoning models (o1, Claude 4): Simpler prompts perform better. "Keep prompts simple and direct" is now official guidance.

The skill shift: From building elaborate harnesses to knowing when to strip them back
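As a rough, hedged illustration of that shift (the model name is illustrative, and the OpenAI Python client is just one way to make the call):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

question = "A train leaves at 3pm travelling at 80 km/h toward a station 200 km away. When does it arrive?"

# GPT-3-era habit: few-shot examples plus explicit chain-of-thought cues.
# (Shown only for contrast; not sent.)
legacy_prompt = (
    "You are an expert solver. Think step by step and explain your reasoning.\n"
    "Example 1: ...\nExample 2: ...\n\n"
    f"Question: {question}"
)

# Current guidance for reasoning models: simple and direct, no "think step by step".
response = client.chat.completions.create(
    model="o1",  # illustrative; any model that reasons internally
    messages=[{"role": "user", "content": question}],
)
print(response.choices[0].message.content)
```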

This creates a fascinating problem:  

Developers who've spent months mastering prompt engineering techniques may now be employing techniques that are actually harmful.

What Anthropic Discovered 

Exactly a year ago, Anthropic released their findings from "dozens of teams building LLM agents across industries."

The pattern? The most successful implementations use simple, composable patterns rather than complex frameworks.

Not what the framework evangelists want to hear. Especially since that was a full year ago!

(BTW, this is why I only learn from the courses released by the AI labs themselves. I generally avoid AI engineering courses from course creators. It's way too early for patterns and best practices to be solidified, and since course creators tend to re-package what's already in the product's official docs anyway, I reasoned that I might as well skip the middleman and learn straight from the source, especially since the learnings will have a short shelf life. Rapid obsolescence of information means there's no time for it to be "packaged"... that's the problem universities have today: what you learn is obsolete by the time you finish.)

So far, there seem to be six composable patterns that are meta-level enough to be effective no matter which model you choose to power them:

  1. Prompt chaining - Break tasks into sequential validated steps

  2. Routing - Direct queries to specialized sub-tasks

  3. Parallelization - Run independent steps simultaneously

  4. Orchestrator-workers - Manager coordinates specialist agents

  5. Evaluator-optimizer - Iterative refinement loops

  6. Autonomous loops - Self-directed until completion

But: these aren't "advanced techniques." 

These are basic software engineering patterns applied to AI. 

Remember how I said it feels like we’re converging on systems-related meta principles?

If you've built any production system, you already know these patterns. The AI part simply automates things that humans or code already do, and removes the need to maintain extensive rules-based programs.
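To show just how unglamorous these patterns are, here is a prompt-chaining sketch. `call_llm` is a stand-in for whichever client you use (nothing here comes from Anthropic's post), and the validation between steps is ordinary defensive programming.

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for your model client (OpenAI, Anthropic, Gemini, ...)."""
    raise NotImplementedError

def extract_claims(document: str) -> list[str]:
    raw = call_llm(f"List the factual claims in this document as a JSON array of strings:\n{document}")
    claims = json.loads(raw)                      # gate: validate structure before the next step
    if not isinstance(claims, list) or not claims:
        raise ValueError("Step 1 produced no usable claims")
    return claims

def verify_claims(claims: list[str]) -> str:
    return call_llm("For each claim below, answer SUPPORTED or UNSUPPORTED:\n" + "\n".join(claims))

def fact_check(document: str) -> str:
    # Prompt chaining: sequential steps, each validated before the next one runs.
    return verify_claims(extract_claims(document))
```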

The Skills vs. MCP Convergence

Not long after Anthropic open-sourced their "Skills" specification, OpenAI quietly adopted the identical architecture in ChatGPT. 

The industry may have just converged on a standard answer.

Two complementary pieces:

  • MCP (Model Context Protocol) - Secure connectivity to external tools/data

  • Skills - Procedural knowledge for using those tools effectively

Why this matters for you: The architecture you learn today will work across Claude, GPT, and Gemini tomorrow. “Skills” may be becoming infrastructure. Makes sense given that they’re just high-signal packaged prompts.
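A crude way to picture that split in code (this is my own mental model, not either spec, and the file layout is assumed): MCP answers "how does the model securely reach tools and data?", while a skill is just a high-signal package of instructions that gets pulled into context only when it's relevant.

```python
from pathlib import Path

def load_skill(skill_dir: Path) -> str:
    """A 'skill' here is just packaged procedural instructions stored on disk."""
    return (skill_dir / "SKILL.md").read_text()   # assumed layout, for illustration only

def build_prompt(task: str, skills: dict[str, Path]) -> str:
    # Scope by default: only load skills whose name appears relevant to the task.
    relevant = [load_skill(path) for name, path in skills.items() if name in task.lower()]
    return "\n\n".join(relevant + [f"Task: {task}"])

# Usage (hypothetical paths):
# prompt = build_prompt("create an invoice pdf", {"invoice": Path("skills/invoice")})
```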

What This Means For Aspiring AI Engineers

I talk to mid-career professionals every week who've done courses but can't get interviews.

The conversation usually goes:

  • "I know Python basics."

  • "I've built some RAG demo.s"

  • "But employers want production experience."

Here's the disconnect: you're learning yesterday's patterns in a world where all your competition has the same skills. Remember: job descriptions list skills, but hiring managers look for risk minimization. 80% of applicants have the skills, yet only one candidate will convince the hiring manager that they're the least risky of the thousands of applicants for that specific role.

Maybe you're mastering elaborate prompt engineering when the industry just shifted to simpler prompts.

Maybe you're learning complex frameworks when production teams are writing 100 lines of Python and a bunch of carefully communicated instructions in MD files.

You're building scaffolding when the actual skill is knowing when to remove it and maintain or improve performance on evals.

These things are hard to teach in courses. Just like driving.  They must be learned by doing.

Harness engineering is the new bottleneck.

The skill isn't "advanced prompt engineering." It's understanding:

When to add complexity:

  • Context compaction for long-running conversations

  • KV-cache optimization for cost/latency

  • Artifact handling for large data

  • Multi-session persistence for state management

  • Tool orchestration for external API calls

When to strip it back:

  • Using reasoning models (o1, Claude 4) → simpler prompts

  • Simple tasks → don't over-engineer the reasoning

  • Clear instructions → model handles the rest

  • Trust the model's internal capabilities

Because, unlike traditional software, where the platform stays stable, AI models are improving every 3-6 months. Your harness design that worked for GPT-4 breaks with o1. Your elaborate scaffolding for Claude 3.5 underperforms with Claude 4.

The Opportunity Hiding In Plain Sight

While everyone races to learn the latest framework, actual production teams are solving:

Context compaction - Summarizing old events to prevent token bloat (Google's ADK uses LLMs to compress conversation history at configurable thresholds)
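A minimal compaction sketch, assuming a stand-in `call_llm` and a deliberately crude token estimate (the threshold and the "keep the last few turns verbatim" choice are mine, not Google's):

```python
def call_llm(prompt: str) -> str: ...    # stand-in for your model client

def estimate_tokens(text: str) -> int:
    return len(text) // 4                # rough heuristic; real code would use a tokenizer

def compact(history: list[str], max_tokens: int = 8_000, keep_last: int = 6) -> list[str]:
    if sum(estimate_tokens(turn) for turn in history) <= max_tokens:
        return history                   # under threshold: leave history untouched
    older, recent = history[:-keep_last], history[-keep_last:]
    summary = call_llm(
        "Summarise this conversation so far, keeping decisions, facts, and open questions:\n"
        + "\n".join(older)
    )
    return [f"[summary of earlier turns] {summary}"] + recent
```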

KV-cache optimization - The single most important metric for production agents (affects both latency and cost)
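The cache point is easier to see in code. Provider-side prompt caching generally keys on an identical prefix, so the rule of thumb (hedged, and provider-specific in the details) is to keep the stable parts byte-identical and append-only, with the volatile parts last:

```python
# Keep everything above the per-request content byte-identical across calls so the
# provider's prefix/KV cache can hit: same system prompt, same tool specs, same order.
SYSTEM_PROMPT = "You are a support agent for Acme. Follow the policies below..."  # hypothetical
TOOL_SPECS = [...]  # defined once at startup; never reordered per request

def build_messages(history: list[dict], new_user_message: str) -> list[dict]:
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]      # stable prefix
        + history                                           # append-only; never rewritten in place
        + [{"role": "user", "content": new_user_message}]   # volatile content goes last
    )
```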

Artifact handling - Don't dump 5MB CSVs into context; use references and load on-demand
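Artifact handling is the same indirection databases have always used: store the blob somewhere addressable, put a short reference into context, and give the model a tool that pages data in only when asked. A sketch (the names and registry are mine, not from any framework):

```python
import csv
from pathlib import Path

ARTIFACTS: dict[str, Path] = {}   # artifact_id -> file on disk (or object storage in real systems)

def register_artifact(artifact_id: str, path: Path) -> str:
    ARTIFACTS[artifact_id] = path
    row_count = sum(1 for _ in path.open()) - 1     # minus the header row
    # This one-line reference is what goes into context, not the 5MB payload.
    return f"[artifact:{artifact_id} rows={row_count}]"

def read_artifact_rows(artifact_id: str, limit: int = 50) -> list[dict]:
    """Exposed to the model as a tool: loads rows on demand, never by default."""
    with ARTIFACTS[artifact_id].open() as f:
        return [row for _, row in zip(range(limit), csv.DictReader(f))]
```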

Multi-session persistence - Anthropic just solved long-running agent memory with initializer + coding agent patterns

These aren't "advanced AI techniques." They're database patterns, caching strategies, and state management you'd learn in any backend engineering role.

The truth is that the bar has been raised because AI makes coding easier, but engineering much harder. 

And I've been warning for years in my podcast that people are mistaking coding for engineering, at a time when the market has matured from "coders" to "engineers": 20+ years into the startup boom, we now have lots of mature engineers with protégés in the workforce.

The Good News for Full Stack Devs

If you can build a REST API with proper error handling, you can build production AI agents (there's a small sketch after this list). The primitives are the same:

  • State management

  • Request/response cycles

  • External service calls

  • Retry logic

  • Logging and monitoring
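Here's the promised sketch: the retry loop below is ordinary backend code, with the model call treated as just another flaky external service (`call_llm` is again a stand-in for your client):

```python
import logging
import time

logger = logging.getLogger("agent")

def call_llm(prompt: str) -> str:
    ...   # your model client; assume it raises on rate limits and transient failures

def call_with_retries(prompt: str, attempts: int = 3, base_delay: float = 1.0) -> str:
    for attempt in range(1, attempts + 1):
        try:
            return call_llm(prompt)
        except Exception as exc:   # in real code, catch your client's specific error types
            logger.warning("LLM call failed (attempt %d/%d): %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))   # exponential backoff
```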

Learning from hype will make you fail. If your learning is driven by the YouTube algorithm and social media buzz, you're in trouble.

Here's the uncomfortable truth: While you're learning elaborate prompt chains, someone who understands when to use a 10-line Python script versus when to trust Claude 4's internal reasoning is shipping production AI agents.

You need to dig in and read carefully what production engineers at the labs actually do. They publish blog posts every week. Learn how they change and adapt, and follow their approach.

It’s too early for courses when the techniques change every month.

The competition isn't in knowing the frameworks. It's in understanding the architecture—and when to simplify it.

And unlike ML theory or transformer math, this is learnable. Because at its core, it's just:

  • Systems engineering (distributed systems patterns)

  • Cost optimization (token management, caching)

  • API design (tool integration, error handling)

  • Performance engineering (latency, throughput)

Applied to models that keep getting smarter.

But while engineering trends and habits change, the goals of engineering don’t change.

Web Stack Acronyms Timeline

Don't ignore history. Here is a rough timeline of the "trendy tech stacks" from the last 25+ years. Watch how quickly they change.

Late 1990s - Early 2000s:

  • LAMP (1998) - Linux, Apache, MySQL, PHP/Perl/Python

    • The original and most famous stack acronym

    • Dominated early web development

Mid 2000s:

  • WAMP (2000s) - Windows, Apache, MySQL, PHP

    • Windows variant of LAMP

  • MAMP (2000s) - macOS, Apache, MySQL, PHP

    • Mac variant of LAMP

  • XAMPP (2002) - Cross-platform, Apache, MySQL, PHP, Perl

    • Cross-platform development environment

Late 2000s - Early 2010s:

  • LAPP (late 2000s) - Linux, Apache, PostgreSQL, PHP

    • PostgreSQL variant of LAMP

2010s - JavaScript Era:

  • MEAN (2013) - MongoDB, Express.js, Angular, Node.js

    • First major all-JavaScript stack

    • Popularized by Valeri Karpov

  • MERN (2015) - MongoDB, Express.js, React, Node.js

    • React variant of MEAN

  • MEVN (2016) - MongoDB, Express.js, Vue.js, Node.js

    • Vue variant of MEAN

  • PERN (mid-2010s) - PostgreSQL, Express.js, React, Node.js

    • PostgreSQL variant of MERN

Late 2010s - Jamstack Era:

  • JAMstack (2016) - JavaScript, APIs, Markup

    • Coined by Netlify's Mathias Biilmann

    • Static site generation approach

2020s - Modern Variants:

  • T3 Stack (2022) - TypeScript, tRPC, Tailwind, Next.js, Prisma

    • Modern type-safe stack

  • PETAL (early 2020s) - Phoenix, Elixir, Tailwind, Alpine.js, LiveView

    • Elixir-based stack

Honorable Mentions:

  • WINS - Windows, IIS, .NET, SQL Server (Microsoft stack)

  • ELK - Elasticsearch, Logstash, Kibana (logging/analytics stack)

  • SMACK - Spark, Mesos, Akka, Cassandra, Kafka (big data stack)

The evolution shows: LAMP → JavaScript stacks (MEAN/MERN) → Jamstack → Type-safe modern stacks

This is the problem with trend/hype cycles.  You know what has not changed?

What good engineering means. 

A good engineer in one stack was a good engineer in another stack.  Tools change.  Excellent engineering does not.

It’s no different this time.

Four ways we can help you:

1. Wondering what learning to code actually means? 

Becoming a coder is much more than just "learning to code" some languages.  When I got hired at Google, for example, I didn't know 3 out of the 4 languages I had to write every day. 

Check out

👉 My FreeCodeCamp Course on YouTube -->  Before You Learn To Code (Video).

👉 Updated version (including Google and other big tech experiences) 

--> check it out here.

2. Inner Circle (Free Preview Included)

Our personalized, intensive mentorship program is designed to help career changers go from zero to software developer—and actually get hired. It’s not for everyone, but if you’re ready to commit, we’ll walk with you every step.

👉 Preview the Inner Circle Program -> free preview.

👉 Apply for Future Coders Inner Circle → https://www.matchfitmastery.com/inner-circle 

3. Career Change To Code Podcast

Driving? At the gym? Hiding in the bathroom? Perfect time to drip the best techniques for a career change to code directly into your brain:

👉 The Easier Said Than Done podcast: YouTube | Spotify

4. Weekly Tips In Your Inbox

👉 Subscribe to this newsletter (it’s free).  I try to keep it to 3 minutes or less so you can read in the elevator, waiting in lines, in the bathroom...😝 

 



