
RAG Isn't a Technology. It's a Design Pattern. And You're Probably Wasting Time Building It.

by Zubin Pratap
Nov 23, 2025

After 4 years as a lawyer trying to become a developer, stuck learning "fundamentals" that got me nowhere, I learned something critical: the real skill isn't learning the hot new thing. It's learning the classical thing that underpins the hot new thing.

Like me, you've probably done a bunch of courses. 

But unlike me, you've probably had to learn them in the midst of massive AI hype.

So you’ve probably read more about AI than you have about programming fundamentals.

You understand embeddings, vector databases, and semantic search. You've probably even built a RAG pipeline from scratch. And yet... you're still not shipping anything that works reliably in production.

Here's why: You're treating RAG like it's a technology you need to master, when it's actually just a design pattern. And worse—it's a design pattern that's rapidly becoming commoditized infrastructure you shouldn't be building at all.

After almost a year of building RAG-related applications for work, I've found a 20-line alternative that was released just a week ago, one I knew was coming all along.

Google just made RAG a simple API call.

RAG Was Never a Technology

Let's be clear about what RAG actually is: Retrieval (find stuff) + Augmented (add it to the prompt) + Generation (LLM responds).

That's it. Three concepts duct-taped together with a catchy acronym.
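Here's the entire pattern as a minimal Python sketch. The retriever and the model call are toy stand-ins (no particular library is assumed), so you can see the shape end to end: retrieve, concatenate, generate.

```python
# The whole RAG pattern, with toy stand-ins for the retriever and the model call.

TINY_CORPUS = [
    "Chroma is an open-source vector database.",
    "RAG combines retrieval with LLM generation.",
    "BM25 is a classic lexical ranking function.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    # In a real system this is a vector, lexical, or hybrid search call.
    words = set(query.lower().split())
    ranked = sorted(TINY_CORPUS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def llm(prompt: str) -> str:
    # Stand-in for a real model call (Gemini, Claude, GPT, a local model...).
    return f"[model response to a {len(prompt)}-character prompt]"

def rag_answer(query: str) -> str:
    chunks = retrieve(query)              # Retrieval: find stuff
    context = "\n\n".join(chunks)         # Augmented: concatenate it into the prompt
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)                    # Generation: the LLM responds

print(rag_answer("What is RAG?"))
```

Swap in a real retriever and a real model client and that's the whole pattern.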

Jeff Huber from Chroma said it bluntly:

"We never use the term RAG. I hate the term RAG." 

Why?

Because the name obscures what you're actually doing, which is just retrieval followed by generation. The "augmented" part is literally just concatenating strings into a prompt.

Long Live Context Engineering - with Jeff Huber of Chroma

And Chroma is one of the best-known vector DBs out there.

Think about it: You wouldn't claim you've "mastered REST API technology" because you can make HTTP requests.

REST is a pattern. RAG is the same—it's an architectural approach, not a skill to perfect.

But here's where it gets worse for everyone grinding away on custom RAG implementations: the companies winning right now aren't the ones with the most sophisticated homegrown RAG systems.

They're the ones using managed solutions so they can focus on actual context engineering.

Managed RAG Is Eating the World

On Nov 6, 2025, Google launched File Search in the Gemini API. 

I’m a die-hard Xoogler, so I had to check it out.

With this new API, I had a simple RAG proof of concept in 50 lines of code...

... in 1 hour flat.

Here’s the script.

What does Gemini File Search do? 

Sad news:  it does everything I’ve spent MONTHS building manually:

  • Automatic chunking strategies

  • Embeddings generation

  • Vector storage and indexing

  • Citation tracking

  • Reranking and Hybrid Searching with BM25

Good news:  It's now a utility. 

No configuration. No vector database setup. Minimal chunking decisions. It just works.

It's priced like a utility, too.

Free storage. Free query-time embeddings. $0.15 per million tokens for initial indexing.
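For a sense of scale, here's roughly the shape of that proof of concept. This is a sketch pieced together from the File Search launch docs using the google-genai Python SDK; exact method and field names may have shifted since launch, so treat it as illustrative rather than copy-paste code.

```python
# Sketch of the Gemini File Search flow (assumes GEMINI_API_KEY is set).
# Names follow the launch docs and may change; check the current SDK docs.
import time

from google import genai
from google.genai import types

client = genai.Client()

# 1. Create a file search store (Google handles chunking, embeddings, indexing).
store = client.file_search_stores.create(config={"display_name": "my-docs"})

# 2. Upload a document into the store and wait for indexing to finish.
operation = client.file_search_stores.upload_to_file_search_store(
    file="handbook.pdf",                       # hypothetical local file
    file_search_store_name=store.name,
    config={"display_name": "handbook"},
)
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# 3. Ask questions with the FileSearch tool attached; retrieval, reranking,
#    and citation tracking happen behind the API call.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What does the handbook say about onboarding?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(file_search=types.FileSearch(
            file_search_store_names=[store.name]
        ))]
    ),
)
print(response.text)
```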

The pattern is obvious: RAG infrastructure is becoming a solved commodity.

The companies building it are billion-dollar ventures with entire teams of distributed systems engineers. You cannot out-engineer them. You should not try.

Now do you believe me? I've been saying for months that RAG is not a skill or a technology. It's just a pattern for retrieval and prompting.

The Real Skill: Context Engineering

If you've got another 15 minutes spare, read more about Context Engineering here.

Here's what nobody told you (because they were too busy selling you RAG courses):

Context engineering is orders of magnitude more important than RAG implementation.

Context engineering is the discipline of figuring out what information should be in the LLM's context window at any given moment and how to get better at selecting it over time.

Chroma's research revealed something critical: context rot. 

As you add more tokens to a context window, model performance degrades significantly—even when all the information is "relevant." Claude 3.5 Sonnet maintained the best performance across long contexts, but even it showed measurable degradation.

I've witnessed this firsthand and struggled with it for months because it turns out that managing the context window for an LLM, especially in agentic modes, is incredibly hard and incredibly important.

The a-ha moment: more information ≠ better results.

The skill isn't retrieval volume—it's retrieval precision, timing, context management and assembly strategy.

Real-world context engineering includes (a condensed sketch follows this list):

  • First-stage hybrid retrieval: Combining dense vectors, lexical search (BM25), and metadata filtering to get 200-300 candidates

  • Re-ranking: Using LLMs to cull 300 candidates down to 20-40 highest-value chunks

  • Context management: Deduplication, source diversification, hard token caps, injection, compaction, externalization (to file systems typically) and strategic ordering (instructions first)

  • Generative benchmarking: Creating eval datasets where LLMs generate query/answer pairs from your chunks

  • Evals against baselines

  • Continuous improvement: Error analysis → re-chunking → filter tuning → prompt refinement

  • No model fine-tuning (it matters less and less when you're using frontier models)
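Strung together, the retrieval and assembly pieces look something like the sketch below. The four helpers at the top are hypothetical placeholders for your actual stack (vector DB, BM25 index, reranking model, tokenizer); the part worth studying is the selection and assembly logic.

```python
# Condensed sketch of a context-engineering pipeline. The helpers below are
# stand-ins: wire them to your real retrieval stack, reranker, and tokenizer.

def dense_search(query: str, k: int) -> list[dict]: ...   # vector similarity search
def bm25_search(query: str, k: int) -> list[dict]: ...    # lexical search
def llm_rerank(query: str, docs: list[dict], keep: int) -> list[dict]: ...  # LLM reranker
def count_tokens(text: str) -> int: ...                   # tokenizer of your choice


def hybrid_retrieve(query: str, k: int = 300) -> list[dict]:
    """First stage: merge dense and lexical results with reciprocal rank fusion."""
    dense, lexical = dense_search(query, k), bm25_search(query, k)
    scores: dict[str, float] = {}
    docs: dict[str, dict] = {}
    for results in (dense, lexical):
        for rank, doc in enumerate(results):
            scores[doc["id"]] = scores.get(doc["id"], 0.0) + 1.0 / (60 + rank)
            docs[doc["id"]] = doc
    ranked = sorted(docs.values(), key=lambda d: -scores[d["id"]])
    return ranked[:k]


def build_context(query: str, token_budget: int = 8000) -> str:
    """Second stage: cull, deduplicate, cap, and order what goes into the prompt."""
    candidates = hybrid_retrieve(query, k=300)       # 200-300 first-stage candidates
    top = llm_rerank(query, candidates, keep=30)     # cull to the highest-value chunks

    seen: set[str] = set()
    selected: list[dict] = []
    used = 0
    for doc in top:
        if doc["text"] in seen:                      # deduplication
            continue
        cost = count_tokens(doc["text"])
        if used + cost > token_budget:               # hard token cap
            break
        seen.add(doc["text"])
        selected.append(doc)
        used += cost

    # Strategic ordering: instructions first, then the curated evidence.
    instructions = "Answer strictly from the sources below and cite them by name."
    sources = "\n\n".join(f"[{d['source']}] {d['text']}" for d in selected)
    return f"{instructions}\n\n{sources}\n\nQuestion: {query}"
```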


Anthropic's Contextual Retrieval technique showed up to a 67% reduction in retrieval failures (when combined with reranking) by prepending document context to each chunk before embedding it.

That's not infrastructure engineering—that's context engineering.
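The core move is small enough to show in a few lines. This is a rough sketch of the idea, not Anthropic's actual implementation; `llm` and `embed` are whatever model and embedding calls you already use.

```python
# Rough sketch of contextual retrieval: before embedding a chunk, ask a model to
# describe how the chunk fits into its parent document, then prepend that text.
from typing import Callable

def contextualize_chunk(document: str, chunk: str, llm: Callable[[str], str]) -> str:
    prompt = (
        f"<document>\n{document}\n</document>\n\n"
        f"Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n\n"
        "In one short sentence, situate this chunk within the overall document "
        "to improve retrieval. Answer with only that sentence."
    )
    return f"{llm(prompt)}\n\n{chunk}"   # situating context + the original chunk

def index_chunks(
    document: str,
    chunks: list[str],
    llm: Callable[[str], str],
    embed: Callable[[str], list[float]],
) -> list[tuple[str, list[float]]]:
    # Embed the contextualized text; keep it for your lexical (BM25) index as well.
    enriched = [contextualize_chunk(document, c, llm) for c in chunks]
    return [(text, embed(text)) for text in enriched]
```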

What This Means for You

If you're stuck after months of building RAG systems, here's the hard truth: You've been optimizing the wrong variable.

And I really hope that you're not among those who think that making a chatbot is the height of RAG.  That's honestly the equivalent of thinking that animating a front end with cute CSS is software engineering.

 Here’s some good news, though.

You don't need to become a vector database expert. You don't need to master chunking algorithms. You don't need to build distributed retrieval systems from scratch.

You need to:

  1. Use managed RAG infrastructure (Gemini File Search, Claude Projects, Chroma Cloud) and stop rebuilding commodity plumbing

  2. Invest your time in context engineering—learning how to select, assemble, and refine what goes into context windows, across multiple models

  3. Build evaluation loops—spend one evening creating a 50-100 example gold dataset and wire it into your workflow (a bare-bones loop is sketched after this list)

  4. Focus on the problems that only AI can solve—understanding your domain, your users, and your specific retrieval patterns is essential when assessing whether AI is a fit. And as Martin Fowler says, communicating domain knowledge may be one of the most important functions of an engineer.
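That third step is the one most people skip, and it's genuinely small. The sketch below assumes a hand-built gold.jsonl file (or one produced by generative benchmarking) and a `retrieve` function from your own pipeline; both are placeholders.

```python
# Bare-bones retrieval eval loop. The gold-file format and the `retrieve`
# function are assumptions: adapt both to your own pipeline.
import json
from typing import Callable

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    if not relevant_ids:
        return 0.0
    return len(relevant_ids & set(retrieved_ids)) / len(relevant_ids)

def run_eval(gold_path: str, retrieve: Callable[[str, int], list[str]], k: int = 20) -> float:
    # gold.jsonl: one {"query": ..., "relevant_ids": [...]} object per line.
    scores = []
    with open(gold_path) as f:
        for line in f:
            example = json.loads(line)
            retrieved = retrieve(example["query"], k)          # returns chunk ids
            scores.append(recall_at_k(retrieved, set(example["relevant_ids"])))
    return sum(scores) / len(scores)

# Run it before and after every chunking, filtering, or prompt change,
# and track the number over time against your baseline.
```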

And above all, you need to first learn to program really well before you learn these new patterns.  Learning these patterns before you've learned foundational programming will simply give you really bad habits and really good confidence.

The developers shipping successful AI products right now aren't the ones with the most sophisticated RAG implementations. They're the ones who realized RAG is just old retrieval infrastructure in a new pipeline. 

So they delegated it to managed services and invested their scarce time in the hard problems that actually differentiate their products.

The Pattern Always Repeats

In 2010, everyone "learned databases" by configuring MySQL replication. Today? Managed databases are the default, and the valuable skill is data modeling and query optimization.

In 2015, everyone "learned Kubernetes" by manually setting up clusters. Today? Managed Kubernetes is the default, and the valuable skill is understanding workload orchestration patterns.

RAG is following the same trajectory. Two years from now, manually building RAG pipelines will seem as absurd as manually managing database replication seems today.

The market rewards people who identify what's becoming commoditized infrastructure and focus their limited time on the problems that remain unsolved. Right now, that's context engineering—not RAG implementation.

You can continue to focus on custom RAG systems, convincing yourself that mastery of chunking algorithms and vector databases is the path forward.

Or you can accept that the infrastructure battle is already over, use managed solutions, and spend your time on context engineering—the skill that will actually differentiate your work and your career.

Your call. But the opportunity cost of that choice compounds every single day.

Four ways we can help you:

1. Wondering what learning to code actually means? 

Becoming a coder is much more than just "learning to code" some languages.  When I got hired at Google, for example, I didn't know 3 out of the 4 languages I had to write every day. 

Check out

👉 My FreeCodeCamp Course on YouTube -->  Before You Learn To Code (Video).

👉 Updated version (including Google and other big tech experiences) --> check it out here.

2. Inner Circle (Free Preview Included)

Our personalized, intensive mentorship program is designed to help career changers go from zero to software developer—and actually get hired. It’s not for everyone, but if you’re ready to commit, we’ll walk with you every step.

👉 Preview the Inner Circle Program -> free preview.

👉 Apply for Future Coders Inner Circle → https://www.matchfitmastery.com/inner-circle 

3. Career Change To Code Podcast

Driving? At the gym? Hiding in the bathroom? Perfect time to drip the best techniques for a career change to code directly into your brain.

👉 The Easier Said Than Done podcast: YouTube | Spotify

4. Weekly Tips In Your Inbox

👉 Subscribe to this newsletter (it's free). I try to keep it to 3 minutes or less so you can read it in the elevator, waiting in line, or in the bathroom... 😝

 
