Question 1

Do you build with LangGraph, AutoGen, CrewAI, or custom orchestration?

Accepted Answer

We pick per workflow. LangGraph is our default for stateful, branching workflows because the state machine is auditable. We use lighter custom orchestration when the workflow is mostly linear with a small number of tools. We avoid over-frameworked stacks where the framework abstraction is heavier than the workflow it is modeling. The choice is documented in the architecture decision record.
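To illustrate why an explicit state machine is easy to audit, here is a minimal plain-Python sketch. It deliberately does not use LangGraph's API; the node names (`classify`, `refund`, `answer`), the state fields, and the keyword-based routing rule are all hypothetical. The point is only that each run yields an ordered trail of visited nodes that can be logged and reviewed.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
# A node takes the state and returns (updated state, next node name or None to stop).
Node = Callable[[State], Tuple[State, Optional[str]]]


def classify(state: State) -> Tuple[State, Optional[str]]:
    # Hypothetical branching rule: refund requests go to a dedicated node.
    is_refund = "refund" in str(state["query"]).lower()
    intent = "refund" if is_refund else "general"
    next_node = "refund" if is_refund else "answer"
    return {**state, "intent": intent}, next_node


def refund(state: State) -> Tuple[State, Optional[str]]:
    return {**state, "reply": "Refund request escalated."}, None


def answer(state: State) -> Tuple[State, Optional[str]]:
    return {**state, "reply": "General answer."}, None


NODES: Dict[str, Node] = {"classify": classify, "refund": refund, "answer": answer}


def run(state: State, start: str = "classify", max_steps: int = 10) -> Tuple[State, List[str]]:
    """Run the graph; return the final state and the ordered list of visited nodes."""
    trail: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(trail) >= max_steps:
            raise RuntimeError("step limit exceeded")
        trail.append(current)
        state, current = NODES[current](state)
    return state, trail
```

For a mostly linear workflow, the same `run` loop with a single chain of nodes is roughly what "lighter custom orchestration" amounts to in this sketch.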

Question 2

When do you choose RAG vs fine-tuning?

Accepted Answer

RAG handles the facts (changing knowledge); fine-tuning handles the voice or stable in-domain reasoning patterns. Most production systems use both: RAG for grounded answers with citations, light fine-tuning for tone or format compliance. See /blog/rag-vs-vector-search-vs-llm-fine-tuning/ for the decision framework.

Question 3

How do you prevent hallucination?

Accepted Answer

We use three layers. Retrieval grounding: the agent cites the source document for every factual claim. Deterministic guardrails: every tool call is checked against a policy ruleset before execution. Full observability: every prompt, retrieved context, tool call, and model response is logged, so any unsafe behavior can be reproduced after the fact.
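The deterministic guardrail layer can be sketched as a list of plain functions that each inspect a proposed tool call before it runs. This is an illustrative sketch only: the tool names, the `MAX_REFUND` limit, and the rule set are hypothetical and not the actual policy described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: Dict[str, Any]


# A rule returns a violation message, or None if the call passes.
Rule = Callable[[ToolCall], Optional[str]]

ALLOWED_TOOLS = {"search_docs", "issue_refund"}  # hypothetical allowlist
MAX_REFUND = 100.0  # hypothetical policy limit


def tool_is_allowed(call: ToolCall) -> Optional[str]:
    if call.name not in ALLOWED_TOOLS:
        return f"tool '{call.name}' is not allowed"
    return None


def refund_within_limit(call: ToolCall) -> Optional[str]:
    if call.name == "issue_refund" and float(call.args.get("amount", 0)) > MAX_REFUND:
        return f"refund exceeds limit of {MAX_REFUND}"
    return None


RULES: List[Rule] = [tool_is_allowed, refund_within_limit]


def check(call: ToolCall) -> List[str]:
    """Return every policy violation for this call (empty list means allowed)."""
    return [msg for rule in RULES if (msg := rule(call)) is not None]


def execute(call: ToolCall, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Run the tool only if no rule objects; otherwise refuse deterministically."""
    violations = check(call)
    if violations:
        raise PermissionError("; ".join(violations))
    return tools[call.name](**call.args)
```

Because the rules are ordinary code rather than model output, the same call always produces the same decision, which is what makes this layer deterministic.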

Question 4

What models do you use?

Accepted Answer

We use closed-source frontier models (OpenAI, Anthropic, Google) when accuracy is paramount, and open-source models (Llama, Qwen, Mixtral families) when data sovereignty or unit economics demand it. Embedding models are picked per workload and measured against the eval set. Every choice is documented and reviewed quarterly so the system keeps pace with improvements in model quality.
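One common way to measure an embedding model against an eval set is recall@k over labeled queries. The sketch below assumes a hypothetical eval format (query IDs mapped to sets of relevant document IDs, plus each model's ranked results); the answer above does not specify which metric is actually used.

```python
from typing import Dict, List, Set


def recall_at_k(
    retrieved: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean, over queries, of the fraction of relevant docs found in the top k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    scores: List[float] = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labeled relevant documents
        top_k = set(retrieved.get(query_id, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0
```

Running this once per candidate embedding model on the same eval set gives a like-for-like number to record alongside the model choice.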

Question 5

Can the system run fully on-premise?

Accepted Answer

Yes. We have shipped fully self-hosted RAG and agent systems on customer infrastructure with open-source generation models, on-premise vector stores, and on-premise observability. The trade-offs are unit economics and slower access to frontier-model improvements; both are stated explicitly in the architecture decision record.

Question 6

What about token cost in production?

Accepted Answer

Every project includes a quarterly Token Audit that re-evaluates routing, caching, and model selection against the eval set. See /blog/ai-agent-token-cost-audit/ for the methodology. Buyers who skip post-launch token audits can silently lose 30-50% of their margin to frontier-model usage that was never routed to a cheaper model.
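The arithmetic behind a routing review can be sketched as follows. The prices, token counts, and routed share are placeholders to be supplied by the reader, not real vendor pricing and not the methodology from the linked post; the sketch only shows how routing a share of requests to a cheaper model changes total spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    # Price in currency units per one million tokens (hypothetical values supplied by caller).
    input_per_million: float
    output_per_million: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at the given per-million-token prices."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


def blended_cost(
    frontier: ModelPrice,
    small: ModelPrice,
    input_tokens: int,
    output_tokens: int,
    requests: int,
    routed_share: float,
) -> float:
    """Total cost when `routed_share` of requests is sent to the smaller model."""
    if not 0.0 <= routed_share <= 1.0:
        raise ValueError("routed_share must be between 0 and 1")
    per_frontier = request_cost(frontier, input_tokens, output_tokens)
    per_small = request_cost(small, input_tokens, output_tokens)
    return requests * ((1.0 - routed_share) * per_frontier + routed_share * per_small)
```

Comparing `blended_cost(..., routed_share=0.0)` with the value at the routed share observed in production gives the gap an audit would look at, alongside whether the eval set still passes on the cheaper model.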

Technical Approach

Overview

Key things to know about the technical approach

Orchestration is chosen per workflow

RAG and fine-tuning solve different problems

Hallucination control is layered

Frequently Asked Questions
