2025-06-21 · Naman Barkiya

When to use RAG, when to fine-tune, and when to just use a good prompt.

Start with prompting. Add RAG when the product needs answers grounded in private data. Reach for fine-tuning only after the first two options have been exhausted and evaluation shows they were not enough. Most founders reach for fine-tuning first, which is the most expensive mistake they can make.

A decision tree for founders choosing between the three AI implementation shapes, with rough costs for being wrong.

Every week a founder asks us whether their product should use RAG, fine-tune a model, or "just prompt." The honest answer is that the three solve different problems and picking wrong is how six months of budget disappears into an evaluation loop that never converges.

Here is the decision tree we use internally, and the rough cost of being wrong on each branch.

Start here: what does your product need the model to do?

The rule: start with prompting, add RAG when the product needs grounded facts, reach for fine-tuning only when the first two have been exhausted and you have measurable evidence they weren't enough.
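That rule is simple enough to sketch as a tiny decision function. The names and inputs here are illustrative, not a real API; the point is that the fine-tuning branch is only reachable with evaluation evidence in hand:

```python
def choose_approach(needs_private_data: bool, eval_still_failing: bool) -> str:
    """Illustrative sketch of the decision rule, not a real library call."""
    # Start with prompting.
    if not needs_private_data:
        return "prompting"
    # Add RAG when answers must be grounded in private data.
    if not eval_still_failing:
        return "prompting + RAG"
    # Fine-tune only with measurable evidence the first two fell short.
    return "fine-tuning"
```

Note the order of the checks: fine-tuning is the fallthrough case, never the starting point.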

The rough cost of each option

- Prompting: two to five days of engineering.
- RAG: two to four weeks for a real implementation.
- Fine-tuning: four to twelve weeks including data, training, and evaluation.

The most common mistake we see

Founders reach for fine-tuning because it feels like the "real AI" move. The majority of the time what they actually need is better prompting plus RAG over their support docs. We have talked four prospective clients out of fine-tuning engagements in the last year, and in each case the cheaper solution landed inside the budget and timeline the fine-tune would have blown.

The opposite mistake is rarer but expensive: founders with genuine domain-voice problems who try to prompt their way through them. Legal drafting, medical triage, highly structured report generation. Prompting produces 85% acceptable output and 15% dangerous output, which is exactly the distribution that eats trust.

Evaluation is the unskippable piece

Whichever path you pick, you need an evaluation harness before you need the feature. Otherwise you cannot tell if a change made the product better or worse, and progress becomes vibes.

For LaunchProd we ran RAG with a retrieval evaluation that scored relevance on every change. The harness saved us from two "obvious improvements" that would have quietly regressed quality.
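A minimal version of that kind of retrieval check is recall@k over a hand-labeled set of queries mapped to the document IDs the retriever should surface. Everything here is an illustrative sketch; the retriever, doc IDs, and gold set are made up:

```python
def recall_at_k(retrieve, gold: dict[str, set[str]], k: int = 5) -> float:
    """Fraction of labeled queries where at least one relevant doc
    appears in the retriever's top-k results."""
    hits = 0
    for query, relevant_ids in gold.items():
        top_k = retrieve(query)[:k]
        if relevant_ids & set(top_k):
            hits += 1
    return hits / len(gold)

# Illustrative fixture: a fake retriever and a tiny labeled gold set.
def fake_retrieve(query: str) -> list[str]:
    return {
        "refund policy": ["doc-12", "doc-7"],
        "api limits": ["doc-3"],
    }.get(query, [])

gold = {"refund policy": {"doc-7"}, "api limits": {"doc-9"}}
score = recall_at_k(fake_retrieve, gold)  # 0.5: one of two queries hit
```

Run a score like this on every change and treat a drop as a blocker; that is the mechanism that catches "obvious improvements" before they ship.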



Written 2025-06-21 by Naman Barkiya.

FAQ

Questions this usually surfaces.

What's the typical engineering cost of each option?
Prompting: two to five days. RAG: two to four weeks for a real implementation. Fine-tuning: four to twelve weeks including data, training, and evaluation.
Why is evaluation the unskippable piece?
Without an evaluation harness, you cannot tell whether a change made the system better or worse. Progress becomes vibes, which is how AI products quietly degrade after launch.