Technical Approach
Architecture choices, model routing, RAG vs fine-tuning, eval framework, observability, and security posture for production AI in 2026.
Do you build with LangGraph, AutoGen, CrewAI, or custom orchestration?
We pick per workflow. LangGraph is our default for stateful, branching workflows because the state machine is auditable. We use lighter custom orchestration when the workflow is mostly linear with a small number of tools. We avoid over-frameworked stacks where the framework abstraction is heavier than the workflow it is modeling. The choice is documented in the architecture decision record.
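As a rough illustration of what "lighter custom orchestration" can look like for a mostly linear workflow, here is a minimal sketch in plain Python. Every name in it (Step, run_pipeline, classify, draft_reply) is hypothetical and not taken from any framework, and the two steps are stand-ins for real model or tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout: Step, run_pipeline, and the demo steps
# are illustrative and not part of any orchestration framework.
State = dict


@dataclass
class Step:
    name: str
    run: Callable[[State], State]


@dataclass
class RunResult:
    state: State
    trace: list = field(default_factory=list)


def run_pipeline(steps: list[Step], initial: State) -> RunResult:
    """Run steps in order, recording each step name for later audit."""
    result = RunResult(state=dict(initial))
    for step in steps:
        result.state = step.run(result.state)
        result.trace.append(step.name)
    return result


def classify(state: State) -> State:
    # Stand-in for a model call that labels the request.
    label = "refund" if "refund" in state["text"].lower() else "other"
    return {**state, "label": label}


def draft_reply(state: State) -> State:
    # Stand-in for a generation call.
    return {**state, "reply": f"Routing your {state['label']} request."}


result = run_pipeline(
    [Step("classify", classify), Step("draft_reply", draft_reply)],
    {"text": "I want a refund"},
)
```

The trace list gives a simple audit trail without a framework. Once a workflow needs branching or resumable state, a state-machine library is likely the better fit.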
When do you choose RAG vs fine-tuning?
RAG handles the facts (changing knowledge); fine-tuning handles the voice or stable in-domain reasoning patterns. Most production systems use both: RAG for grounded answers with citations, light fine-tuning for tone or format compliance. See /blog/rag-vs-vector-search-vs-llm-fine-tuning/ for the decision framework.
How do you prevent hallucination?
Three layers. Retrieval grounding: the agent cites the source document for every factual claim. Deterministic guardrails: every tool call is checked against a policy ruleset before execution. Full observability: every prompt, retrieved context, tool call, and model response is logged, so any unsafe behavior can be reproduced after the fact.
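The guardrail and logging layers can be sketched roughly as follows. This is a minimal example under stated assumptions: the POLICY ruleset, the tool names, and the refund limit are all hypothetical, and a production ruleset would be richer than an allow-list with one numeric cap.

```python
import json
from dataclasses import dataclass

# Hypothetical policy: an allow-list of tools plus per-tool argument limits.
POLICY = {
    "lookup_order": {},
    "issue_refund": {"max_amount": 100.0},
}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


def check_policy(call: ToolCall) -> tuple[bool, str]:
    """Deterministically decide whether a tool call may run."""
    rule = POLICY.get(call.tool)
    if rule is None:
        return False, f"tool '{call.tool}' is not on the allow-list"
    limit = rule.get("max_amount")
    if limit is not None and call.args.get("amount", 0) > limit:
        return False, f"amount exceeds limit of {limit}"
    return True, "ok"


audit_log: list[str] = []


def guarded_execute(call: ToolCall, tools: dict) -> dict:
    allowed, reason = check_policy(call)
    # Log every decision, allowed or blocked, so the run can be replayed.
    audit_log.append(json.dumps(
        {"tool": call.tool, "args": call.args,
         "allowed": allowed, "reason": reason},
        sort_keys=True,
    ))
    if not allowed:
        return {"status": "blocked", "reason": reason}
    return {"status": "ok", "result": tools[call.tool](**call.args)}


# Stand-in tool implementations for the demo.
tools = {
    "lookup_order": lambda order_id: {"order_id": order_id, "state": "shipped"},
    "issue_refund": lambda amount: {"refunded": amount},
}

ok = guarded_execute(ToolCall("issue_refund", {"amount": 25.0}), tools)
blocked = guarded_execute(ToolCall("issue_refund", {"amount": 500.0}), tools)
unknown = guarded_execute(ToolCall("delete_account", {}), tools)
```

The check is plain code with no model in the loop, so the same call always gets the same verdict, and the log captures blocked calls as well as executed ones.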
What models do you use?
Closed-source frontier models (OpenAI, Anthropic, Google) when accuracy is paramount. Open-source models (Llama, Qwen, Mixtral families) when data sovereignty or unit economics demand it. Embedding models are picked per workload and measured against the eval set. Every choice is documented and reviewed quarterly so the system keeps pace with improvements in model quality.
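One way to make this selection rule explicit is a small routing function. The sketch below is illustrative only: the tier names and the Request fields are hypothetical, and it assumes the self-hosted tier is the cheaper default, which will not hold for every deployment.

```python
from dataclasses import dataclass

# Hypothetical tier names; real model identifiers would come from the
# architecture decision record.
FRONTIER = "closed-frontier"
SELF_HOSTED = "open-self-hosted"


@dataclass(frozen=True)
class Request:
    needs_data_sovereignty: bool
    accuracy_critical: bool


def route(req: Request) -> str:
    """Pick a model tier from two request-level constraints."""
    # Sovereignty is treated as a hard constraint, so it is checked first.
    if req.needs_data_sovereignty:
        return SELF_HOSTED
    if req.accuracy_critical:
        return FRONTIER
    # Assumption: the self-hosted tier is the cheaper default.
    return SELF_HOSTED
```

Checking sovereignty before accuracy means a request that needs both stays on customer-controlled infrastructure. That ordering is a design choice worth recording alongside the model selection itself.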
Can the system run fully on-premise?
Yes. We have shipped fully self-hosted RAG and agent systems on customer infrastructure with open-source generation models, on-premise vector stores, and on-premise observability. The trade-offs are unit economics and slower uptake of frontier-model improvements, both stated explicitly in the architecture decision record.
What about token cost in production?
Every project includes a quarterly Token Audit that re-evaluates routing, caching, and model selection against the eval set. See /blog/ai-agent-token-cost-audit/ for the methodology. Buyers who skip post-launch token audits silently lose 30-50% of their margin to unrouted frontier model usage.
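The core arithmetic of such an audit can be sketched as a comparison between sending everything to a frontier model and routing only the tasks that need it. The prices, task names, and token counts below are made-up placeholders for illustration. They are not measurements, and they are not the methodology from the linked post.

```python
# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider.
PRICE_PER_1K = {"frontier": 0.010, "small": 0.001}

# Hypothetical usage log for one billing period.
usage = [
    {"task": "classify", "tokens": 400_000, "needs_frontier": False},
    {"task": "extract", "tokens": 100_000, "needs_frontier": False},
    {"task": "legal_review", "tokens": 500_000, "needs_frontier": True},
]


def cost(tokens: int, tier: str) -> float:
    """Cost in USD of processing the given tokens on one tier."""
    return tokens / 1000 * PRICE_PER_1K[tier]


def audit(log: list[dict]) -> dict:
    """Compare all-frontier spend with spend after per-task routing."""
    unrouted = sum(cost(r["tokens"], "frontier") for r in log)
    routed = sum(
        cost(r["tokens"], "frontier" if r["needs_frontier"] else "small")
        for r in log
    )
    return {
        "unrouted": unrouted,
        "routed": routed,
        "savings": unrouted - routed,
    }


report = audit(usage)
```

A real audit would also confirm against the eval set that the cheaper tier still meets the quality bar for each task before moving it off the frontier model.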