Architecting AI Systems Part 4: When Does the System Actually Need AI?

This is the final post in a four-part series. In the earlier posts, we worked through each of this system's architecture decisions in detail.

A fair question to ask after three posts of “you didn’t need AI here” is whether we ever think you do. We do. Knowing where that line sits is the entire job.

The discipline in the first three posts wasn’t skepticism about AI. It was refusing to spend it in the wrong places. That only works if you know exactly where the right places are — and can build them properly when you get there.

So here is the other half of the same review: the parts of this system where we would reach for embeddings, retrieval, and generation without hesitation, what would have to be true for the line to move, and what we would actually build when it does.

When the question space stops being finite

In the assistant, the questions clustered. They clustered because the platform was domain-specific — a bounded set of concerns the business already organized around. Classification worked because the space was knowable.

That property is a condition, not a law. The moment the product expands — a broader catalog, a general-purpose store, user-generated content, questions that wander outside the domain — the clusters dissolve. You can’t enumerate intents you can’t anticipate.

That is the point where generation stops being avoidable. When a meaningful share of real questions genuinely don’t belong to any known category, the right architecture is retrieval-augmented generation: embed the corpus, retrieve what’s relevant to a novel question, and let a model compose an answer grounded in it. Not because it’s modern — because there is no finite list left to classify against.
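The retrieve-then-ground step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline. The corpus, its hand-written 3-dimensional vectors, and the names `retrieve` and `build_grounded_prompt` are all hypothetical; a real system would embed passages and questions with the same embedding model and send the prompt to whatever generation model it uses, which is not shown here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy corpus. In a real system each vector would come from the
# same embedding model used for queries; these 3-d vectors are hand-written.
CORPUS: Dict[str, Tuple[str, List[float]]] = {
    "returns-policy": ("Unopened items can be returned for a refund.", [0.9, 0.1, 0.0]),
    "shipping-times": ("Orders usually ship within two business days.", [0.1, 0.9, 0.1]),
    "allergen-info": ("Each recipe page lists common allergens.", [0.0, 0.2, 0.9]),
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: List[float], corpus, k: int = 2) -> List[Tuple[float, str, str]]:
    """Return the k passages most similar to the query as (score, doc_id, text)."""
    scored = [(cosine(query_vec, vec), doc_id, text) for doc_id, (text, vec) in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


def build_grounded_prompt(question: str, passages) -> str:
    """Assemble a prompt that asks the model to answer only from retrieved text."""
    sources = "\n".join(f"[{doc_id}] {text}" for _, doc_id, text in passages)
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Usage: the query vector would come from embedding the user's question.
passages = retrieve([0.8, 0.3, 0.0], CORPUS, k=2)
prompt = build_grounded_prompt("Can I send back an order that arrived late?", passages)
# `prompt` would then go to whichever generation model the system uses (not shown).
```

The grounding instruction ("answer only from the sources") is what separates this from free generation: the model composes the answer, but retrieval decides what it is allowed to draw on.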

The principle: classification scales with a bounded problem; generation is what you reach for when the boundary is gone. The architect’s job is to know which world the client is actually in — and to design the seam so one can grow into the other without a rebuild.
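One way to build that seam, sketched here as an assumption rather than the design used in this series, is a router that answers from known intents when the classifier is confident and hands everything else to an open-ended path. The `Router` class, the keyword "classifier", its scores, and the 0.7 threshold are hypothetical stand-ins; in practice the classifier would be a trained model and the open-ended handler would be a retrieval-augmented pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# A classifier returns (intent, confidence); intent is None when nothing fits.
Classifier = Callable[[str], Tuple[Optional[str], float]]
Handler = Callable[[str], str]


@dataclass
class Router:
    classify: Classifier
    handlers: Dict[str, Handler]   # known intents -> deterministic answers
    open_ended: Handler            # e.g. a retrieval-augmented pipeline
    threshold: float = 0.7         # hypothetical cutoff; tune on real traffic

    def answer(self, question: str) -> str:
        intent, confidence = self.classify(question)
        if intent in self.handlers and confidence >= self.threshold:
            return self.handlers[intent](question)
        # Questions outside the known clusters fall through to the open-ended path.
        return self.open_ended(question)


def keyword_classifier(question: str) -> Tuple[Optional[str], float]:
    """Stand-in for a trained classifier: hypothetical keywords and scores."""
    q = question.lower()
    if "return" in q:
        return "returns", 0.9
    if "ship" in q or "deliver" in q:
        return "shipping", 0.8
    return None, 0.0


router = Router(
    classify=keyword_classifier,
    handlers={
        "returns": lambda q: "See the returns policy.",
        "shipping": lambda q: "Orders ship within two business days.",
    },
    open_ended=lambda q: f"[open-ended path] {q}",
)
```

Because the open-ended path is just another handler behind the same interface, it can start as a stub and later become a full retrieval-and-generation pipeline without touching the classified paths.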

When the signal lives only in language

Even in the lean version of this system, embeddings earned a place. Recipe narratives and customer reviews carry meaning that no structured field holds: why people actually choose a recipe, whether it's comforting, quick, kid-friendly, or good for a hard week. That signal exists only in unstructured human language, and similarity over embeddings is the right tool to extract it.
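A minimal sketch of that kind of similarity, assuming review embeddings already exist: average each recipe's review vectors into one precomputed profile, then rank recipes by cosine similarity to the embedding of a concept such as "comforting". The recipe names, the toy 3-dimensional vectors, and the concept vector are hypothetical, and mean-pooling is only one simple way to build a profile. No vector database or generation step is involved.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical review embeddings per recipe (hand-written toy vectors; a real
# system would embed each review's text with a sentence-embedding model).
REVIEW_VECTORS: Dict[str, List[Vector]] = {
    "mac-and-cheese": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.3]],
    "sheet-pan-chicken": [[0.2, 0.9, 0.1], [0.3, 0.8, 0.2]],
    "lentil-soup": [[0.7, 0.3, 0.1]],
}


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def build_profiles(review_vectors: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Precompute one unit-length profile per recipe by averaging its reviews."""
    return {rid: normalize(mean_vector(vs)) for rid, vs in review_vectors.items()}


def rank_by_concept(profiles: Dict[str, Vector], concept: Vector) -> List[str]:
    """Rank recipes by cosine similarity to a concept (dot product of unit vectors)."""
    c = normalize(concept)
    scores = {rid: sum(a * b for a, b in zip(p, c)) for rid, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


profiles = build_profiles(REVIEW_VECTORS)
# Hypothetical vector for a phrase like "comforting", embedded with the same model.
comforting = [1.0, 0.1, 0.1]
ranking = rank_by_concept(profiles, comforting)
```

Normalizing the profiles once, up front, means each query is a plain dot product per recipe, which keeps this cheap for a small catalog.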

This is worth stating plainly because it cuts against a lazy reading of the series: we used embeddings. We were right to. The discipline wasn’t avoiding them — it was not dragging a vector database and a generation pipeline along for the ride when the task was “understand meaning in prose,” not “answer questions at scale.”

The principle: when the information you need exists only as language and never as a field you could query, semantic representation is the correct tool — and it stands on its own, independent of whether you also need retrieval or generation.

When scale changes the math

For a catalog of this size, exact, precomputed, in-memory approaches outperform heavy retrieval infrastructure on every axis that matters — cost, latency, simplicity. That is a statement about scale, and scale moves.

There is a real size at which the picture inverts: when the corpus grows large enough that exact comparison is too slow, that holding everything in memory stops being practical, that approximate search over a purpose-built index becomes the only way to keep responses fast. At that point a vector database isn’t trend-chasing. It’s the correct tool for a genuine scale problem — and standing one up, tuned properly, is exactly the work we’d do.
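The inversion is easy to see with back-of-the-envelope arithmetic for exact (brute-force) search: every query compares against every stored vector, and every vector sits in memory. The sketch below assumes float32 storage and a 768-dimensional embedding; both the dimension and the item counts are illustrative assumptions, not figures from the system in this series.

```python
def exact_search_cost(num_items: int, dims: int, bytes_per_value: int = 4):
    """Cost of brute-force (exact) nearest-neighbour search over an in-memory matrix.

    Returns (multiply-adds per query, bytes held in memory).
    bytes_per_value=4 assumes float32 storage.
    """
    multiply_adds = num_items * dims                  # one dot product per stored vector
    memory_bytes = num_items * dims * bytes_per_value  # the full matrix stays resident
    return multiply_adds, memory_bytes


# Hypothetical sizes: 768 dimensions and these item counts are illustrative.
for items in (10_000, 1_000_000, 100_000_000):
    macs, mem = exact_search_cost(items, dims=768)
    print(f"{items:>12,} items: {macs:>16,} multiply-adds/query, {mem / 1e9:8.2f} GB")
```

At 10,000 items that is 7.68 million multiply-adds and about 30.7 MB per query, which is trivial. At 100 million items it is 76.8 billion multiply-adds and about 307 GB of vectors. Both grow linearly with the corpus, and that linear growth is what approximate indexes (graph-based ones such as HNSW, for example) are designed to avoid, by trading a little recall for not scanning every vector.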

The principle: the right architecture is a function of the numbers, not the fashion. The same firm that argued against the infrastructure at one scale is the firm that builds it at another — because the recommendation was always about the problem, never about the technology.

What this actually says about the work

Across the series, the answer kept coming back as relational logic, curated rules, a small classifier — for this client, at this scale, today. None of those answers were anti-AI. Every one of them was the result of asking what the system needed to do and matching the tool to it.

The same method that says “not here” says “absolutely here” when the conditions are met — and means it with the same precision. An open-ended question space calls for retrieval and generation. Meaning trapped in language calls for embeddings. Scale beyond a threshold calls for a vector database. We will build every one of those the day the problem warrants it, and we’ll build it well, precisely because we didn’t reach for it the day it didn’t.

Choosing the simplest sufficient architecture and architecting a sophisticated AI system are not opposing skills. They are the same skill, pointed at different problems. Knowing which problem is in front of you is the part that matters.

Team Cennest