AI & Outsourcing Glossary
Practitioner definitions for the AI engineering, RAG, agent orchestration, outsourcing, and SaaS MVP terms used across the DevStudio AI knowledge base.
Each term is defined in 1-3 sentences with cross-links to the article that goes deeper. 62 terms in this edition; the glossary is reviewed quarterly and grows with the article corpus.
-
Agent
An AI system that takes goal-directed actions across one or more tools to complete a task. Distinguished from a chatbot by the presence of tool-use, multi-step planning, and the ability to execute (not just describe) workflows.
-
Multi-Agent System
An architecture where multiple specialized AI agents coordinate to solve a task, often via a planner / worker / critic decomposition. Useful when tasks span multiple domains or require parallelism.
-
Tool Use
An agent's ability to call external tools (APIs, databases, search, code execution) as part of its reasoning. Modern agents typically have a declared tool surface with explicit auth scopes per tool.
-
Function Calling
A model capability where the LLM emits a structured tool-call request (a tool name plus JSON arguments) rather than free-form text, enabling reliable downstream execution. Also called tool calling; closely related to structured output, which constrains a response to a schema.
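A minimal sketch of what consuming such a structured request can look like. The `get_weather` tool, the `TOOLS` registry, and the JSON shape (`name` plus `arguments`) are illustrative assumptions, not any specific vendor's format.

```python
import json

# Hypothetical tool; stands in for a real API call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Declared tool surface: only names listed here can be executed.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Parse a structured tool-call request and execute the named tool."""
    call = json.loads(tool_call_json)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)

# A tool call as a model might emit it (the shape is an assumption).
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(dispatch(model_output))
```

Because the request is structured, an unknown tool name fails loudly instead of being silently misread from free-form text.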
-
Agent Orchestration
The control-flow layer that decides which tool an agent calls, in what order, and how to recover from failure. Typically implemented as a state machine (LangGraph) or custom orchestrator.
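The state-machine idea can be sketched in plain Python, without any framework. The step names (`search`, `answer`, `fallback`), the retry count, and the `run` helper are hypothetical; this is not LangGraph's API.

```python
def run(steps, start, state, max_retries=2):
    """Run named steps until one returns "done".

    A failing step is retried; after max_retries it routes to "fallback".
    """
    current = start
    while current != "done":
        step = steps[current]
        for _ in range(max_retries + 1):
            try:
                next_step = step(state)
                break
            except Exception as exc:
                # Record the failure so the run stays auditable.
                state.setdefault("errors", []).append(f"{current}: {exc}")
        else:
            if current == "fallback":
                raise RuntimeError("fallback step failed")
            next_step = "fallback"
        current = next_step
    return state

def search(state):
    # Simulated flaky tool: fails on the first attempt only.
    state["attempts"] = state.get("attempts", 0) + 1
    if state["attempts"] < 2:
        raise TimeoutError("search timed out")
    state["docs"] = ["doc-1"]
    return "answer"

def answer(state):
    state["answer"] = f"Based on {state['docs'][0]}"
    return "done"

def fallback(state):
    state["answer"] = "Escalated to a human"
    return "done"

result = run({"search": search, "answer": answer, "fallback": fallback}, "search", {})
print(result["answer"])
```

Each step returns the name of the next step, so the order of calls and the recovery path are both explicit in the state rather than hidden in prompt text.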
-
LangGraph
A stateful, graph-based agent orchestration framework from the LangChain team. The 2026 default for stateful, branching agent workflows because the state machine is auditable.
-
AutoGen
Microsoft's multi-agent conversation framework. Best fit for collaborative agent patterns; used less often for production systems where a state machine is easier to audit.
-
CrewAI
A role-based multi-agent framework that emphasizes specialized agent personas. Lighter weight than LangGraph or AutoGen.
-
Reasoning Trace
The intermediate steps an agent takes between receiving a task and emitting an answer, including tool calls and intermediate model outputs. Critical for observability and debugging.
-
Self-Reflection
An agent pattern where the model reviews its own output before final emission, often with a critic prompt. Improves quality at the cost of latency and tokens.
-
RAG
Retrieval-Augmented Generation. A pipeline where a query is matched against a document corpus, top-K relevant chunks are retrieved, and a language model generates an answer grounded in those chunks with explicit citations.
-
Vector Search
Approximate nearest-neighbor search over learned semantic embeddings. A retrieval primitive used by RAG, not a complete RAG pipeline by itself.
-
Hybrid Retrieval
A retrieval strategy that combines lexical search (BM25) with dense vector search and merges results via reciprocal rank fusion or weighted scoring. The 2026 production default for enterprise RAG.
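Reciprocal rank fusion can be sketched in a few lines. The document IDs are made up, and `k=60` is a commonly used constant rather than a value taken from this glossary.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Illustrative results from a lexical and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists rise to the top, and no score normalization between BM25 and vector similarity is needed because only ranks are used.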
-
BM25
A bag-of-words ranking function used in lexical search engines (Elasticsearch, OpenSearch). Catches exact-match keyword cases that pure vector retrieval misses.
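As a rough illustration of the ranking function, here is a compact scorer. It assumes whitespace tokenization, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant that stays positive; production engines add analyzers and inverted indexes on top.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents containing the term.
            doc_freq = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
            # Length normalization: long documents are penalized via b.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "error code E1234 in billing",
    "how to reset a password",
    "billing overview",
]
print(bm25_scores("E1234", docs))
```

Only the first document scores above zero for the query `E1234`, which is the exact-match behavior a purely semantic retriever can miss.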
-
Reranking
A second-pass scoring step that re-orders top-K retrieval candidates using a more expensive cross-encoder model. Typically lifts retrieval precision 5-15 points.
-
Cross-Encoder
A model that takes a query and a candidate document together and outputs a relevance score. Used in reranking; more accurate but slower than embedding-based retrieval.
-
Chunking
The process of splitting source documents into retrieval-sized units before embedding. Strategy must vary by source type (legal by clause, code by function, support by message).
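A sketch of per-source-type chunking, under simplifying assumptions: legal clauses start a line with a number and a period, code is Python source, and support messages are separated by blank lines. All function names here are hypothetical.

```python
import ast
import re

def chunk_legal(text):
    # Split before lines that start a numbered clause, e.g. "2. Fees".
    parts = re.split(r"(?m)^(?=\d+\.\s)", text)
    return [part.strip() for part in parts if part.strip()]

def chunk_code(source):
    # One chunk per top-level function in Python source.
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

def chunk_support(thread):
    # One chunk per message, assuming blank-line separators.
    return [msg.strip() for msg in thread.split("\n\n") if msg.strip()]

CHUNKERS = {"legal": chunk_legal, "code": chunk_code, "support": chunk_support}

def chunk(text, source_type):
    """Dispatch to the chunker registered for this source type."""
    return CHUNKERS[source_type](text)

print(chunk("1. Term\nOne year.\n2. Fees\nDue monthly.", "legal"))
```

The dispatch table keeps the strategy choice explicit, so adding a new source type means adding one function rather than tuning a single global chunk size.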
-
Embedding
A vector representation of text (or other content) that captures semantic similarity. Generated by an embedding model and stored in a vector index.
-
Embedding Drift
The gradual loss of retrieval quality as a corpus expands or evolves while the embedding index remains static. Addressed by re-embedding or upgrading the embedding model.
-
Grounded Generation
An answer-generation pattern where the LLM is prompted to cite specific source chunks for every factual claim. The basis of citation-correct RAG outputs.
-
Citation Correctness
An eval metric measuring whether the source a RAG answer cites for a claim actually contains that claim. Production threshold typically >=95%.
-
Faithfulness
An eval metric measuring whether all factual claims in a generated answer are supported by the retrieved context. Production threshold typically >=95%.
-
Refusal Correctness
An eval metric measuring whether the system correctly refused to answer when no relevant context was retrieved. Catches hallucination and over-confidence.
-
Eval Set
A labeled reference set of 200+ test cases (query, expected answer, expected tool calls, expected refusal flag) used to measure agent or RAG quality. Built in week 1.
-
Eval Week 1
DevStudio's commitment to ship the eval set before any production code merges, gating CI on the eval pass rate from day one.
-
CI Gating
A continuous-integration practice where the eval suite runs on every PR; merge is blocked if any metric drops below threshold. Prevents silent quality regressions.
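The gate itself can be a very small check. This sketch assumes the eval suite reports metrics as a dictionary of rates and uses the 95% thresholds quoted in this glossary for faithfulness and citation correctness; the function names are hypothetical.

```python
# Thresholds taken from the glossary entries; metric names are assumptions.
THRESHOLDS = {"faithfulness": 0.95, "citation_correctness": 0.95}

def gate(metrics, thresholds=THRESHOLDS):
    """Return a message per metric below threshold; empty means pass."""
    return [
        f"{name}: {metrics.get(name, 0.0):.3f} < {minimum:.2f}"
        for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    ]

def exit_code(metrics):
    """0 lets the merge proceed, 1 blocks it."""
    return 1 if gate(metrics) else 0

# Illustrative numbers, not real eval results.
pr_metrics = {"faithfulness": 0.97, "citation_correctness": 0.93}
print(gate(pr_metrics))
print(exit_code(pr_metrics))
```

In a CI job the script would pass `exit_code(...)` to `sys.exit`, so a non-zero status fails the pipeline step. A missing metric counts as 0.0, which blocks the merge rather than passing silently.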
-
Production Drift
The gradual degradation of system quality in live traffic compared to the eval baseline. Detected by sampled live-quality checks and weekly eval re-runs.
-
LLM-as-Judge
An evaluation technique where a peer-tier or stronger LLM rates outputs against a rubric. Used for scalable scoring; calibrated against human ratings on 5-10% of samples.
-
Token Audit
A review, run every 90 days, of model routing, caching, and prompt budget to keep AI unit economics predictable. DevStudio commits to running it quarterly.
-
Model Routing
Routing different request types to different model tiers (frontier, strong, fast, open-source) based on cost-vs-quality tradeoffs measured against the eval set.
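One way to express the cost-vs-quality rule is "cheapest tier that clears the quality bar on the eval set". The `Tier` type, the prices, and the pass rates below are illustrative placeholders, not measured values.

```python
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    cost_per_1k_tokens: float  # placeholder price
    eval_pass_rate: dict = field(default_factory=dict)  # request type -> rate

def pick_tier(tiers, request_type, min_pass_rate):
    """Pick the cheapest tier whose eval pass rate meets the bar.

    If no tier qualifies, fall back to the best-scoring one.
    """
    def rate(tier):
        return tier.eval_pass_rate.get(request_type, 0.0)

    eligible = [tier for tier in tiers if rate(tier) >= min_pass_rate]
    if not eligible:
        return max(tiers, key=rate)
    return min(eligible, key=lambda tier: tier.cost_per_1k_tokens)

tiers = [
    Tier("fast", 0.1, {"faq": 0.97, "legal": 0.80}),
    Tier("frontier", 2.0, {"faq": 0.99, "legal": 0.96}),
]
print(pick_tier(tiers, "faq", 0.95).name)    # → fast
print(pick_tier(tiers, "legal", 0.95).name)  # → frontier
```

Re-running the eval set after a model or prompt change updates the pass rates, and the routing decision follows from the data rather than from a hard-coded mapping.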
-
Prompt Budget
An explicit cap on system-prompt and per-request token usage, monitored against actual production usage. Prevents prompt-rot and silent cost growth.
-
Semantic Caching
A caching layer that returns cached responses for queries semantically similar to past queries, reducing token cost for FAQ-shaped traffic.
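A toy version of the lookup, assuming a bag-of-words stand-in (`toy_embed`) for a real embedding model and a similarity threshold of 0.8 chosen purely for illustration. A production cache would use learned embeddings and a vector index.

```python
import math
from collections import Counter

def toy_embed(text):
    # Stand-in for a real embedding model: word counts as a sparse vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def put(self, query, response):
        self.entries.append((toy_embed(query), response))

    def get(self, query):
        """Return the cached response of the most similar past query, if close enough."""
        q = toy_embed(query)
        best = max(self.entries, key=lambda entry: cosine(q, entry[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("how do I reset my password", "Use the reset link on the login page.")
print(cache.get("how do I reset my password please"))
print(cache.get("what is your refund policy"))
```

The threshold is the main design choice: set it too low and users get answers to questions they did not ask, set it too high and the cache rarely hits.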
-
Unit Cost Ceiling
The maximum acceptable cost per resolved query or generated artifact. Set in scoping; instrumented in production from day one.
-
Observability for AI
Instrumentation that captures latency, cost, and quality drift per AI surface. Tools include LangSmith, Phoenix (Arize), and Datadog APM with OpenTelemetry.
-
Paid Scoping
DevStudio's 1-2 week, $700-$2,800 fixed-price feasibility engagement that produces a written go/no-go, 50-item readiness checklist, eval plan, and cost model. About one in four scopings recommends not building.
-
RFP
Request For Proposal. A structured document used to solicit bids from outsourcing vendors. Effective AI RFPs run 12 sections covering business outcome, eval requirements, and walk-away criteria.
-
Body Shop
An outsourcing model where vendors bill hourly for engineer time without committing to engineering discipline (no eval, no acceptance criteria, no code ownership). Avoid for AI work.
-
Senior Offshore
An outsourcing model where engineering leaders with FAANG/BAT backgrounds deliver production-grade work, at 3-4x the engineering depth per dollar of onshore senior teams for comparable quality.
-
Build-and-Train Hybrid
An engagement pattern where a vendor builds v1 while the in-house team learns alongside, then takes ownership at production. Combines vendor speed with in-house ownership.
-
Operate-with-You
A post-launch retainer model where the vendor maintains the production system on a monthly cost-and-scope basis while the customer's team operates day-to-day.
-
Source-Code Ownership
A contract clause confirming the customer owns all source code, infrastructure-as-code, eval set, and runbook delivered. The single most important outsourcing-contract item for AI projects.
-
Acceptance Criteria
Numeric and behavioral conditions that define when a delivery increment is 'done'. Per-increment criteria are part of every DevStudio engagement.
-
Architecture Decision Record
A short written document capturing one load-bearing technical choice with the trade-offs and the reason it was chosen. Onboarding artifact and ongoing reference.
-
Onboarding Checklist
A 30-item shared checklist run on day 1 of every engagement covering access provisioning, code-base orientation, decision contracts, eval expectations, and escalation paths.
-
Walk-Away Criteria
Pre-agreed conditions under which the customer (or vendor) ends the engagement mid-flight. Documented in the RFP and contract before kickoff.
-
6-Month QA Window
DevStudio's commitment to a six-month warranty period for production fixes after handover, included in every project rate.
-
MVP
Minimum Viable Product. A focused product release that ships the smallest functional surface able to validate the riskiest assumption in the business plan.
-
Pre-Verified Modules
Battle-tested third-party services (Auth0/Clerk for auth, Stripe for billing, Resend for email) used to compress MVP build time. The 80% of an MVP that is not your moat.
-
Multi-Tenancy
An architectural pattern where one product instance serves multiple customer organizations with workspace-level data isolation. Decision is hard to reverse; pick deliberately at MVP stage.
-
Vertical Slice
An end-to-end thin path through the product (sign-up to first valuable action) used as an early proof point. In an MVP, built before adding feature breadth.
-
PII
Personally Identifiable Information. Subject to data-residency, redaction, and audit requirements that affect AI architecture choices.
-
Data Residency
The legal requirement that data physically remain in a specific geographic jurisdiction. Affects cloud region selection and vendor model choice.
-
HIPAA
U.S. Health Insurance Portability and Accountability Act. Imposes strict data handling requirements on AI systems that process U.S. healthcare data.
-
SOC 2
A widely used compliance framework for SaaS vendors covering security, availability, processing integrity, confidentiality, and privacy. Common buyer requirement at mid-market.
-
Prompt Injection
An attack where adversarial inputs cause the LLM to deviate from its instructions. Mitigated by input scanning, output validation, and least-privilege tool scopes.
-
PII Redaction
An ingestion-pipeline step that removes personal identifiers from documents before embedding, preventing PII from entering the retrieval index.
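A deliberately simple sketch of such a step, covering only email addresses and phone-like digit runs with regular expressions. The patterns and placeholder tokens are assumptions; real pipelines usually combine patterns with named-entity detection for names, addresses, and IDs.

```python
import re

# Simplified patterns: they will miss some formats and may over-match others.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace emails and phone-like numbers before the text is embedded."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

sample = "Contact jane.doe@example.com or +1 (555) 010-9999."
print(redact(sample))
# → Contact [EMAIL] or [PHONE].
```

Running redaction before embedding matters because identifiers that reach the index can surface later in retrieved chunks, even if the generation prompt forbids repeating them.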
-
Onshore
An outsourcing model where engineers are in the same country as the buyer. Premium rates, no time-zone friction, and the simplest legal and IP setup.
-
Nearshore
An outsourcing model where engineers are in a country within 3-4 hours of the buyer's time zone. Typical pairings are Latin America for US buyers and Eastern Europe for EU buyers.
-
Offshore
An outsourcing model where engineers are offset 8-12+ hours from the buyer, typically in East Asia or South Asia. Async-first delivery is the operating model.
-
EEAT
Experience, Expertise, Authoritativeness, Trustworthiness (usually written E-E-A-T). Google's content quality rubric. Improved by named authors with Person schema markup, citation discipline, and verifiable expertise.
-
Content Cluster
A group of topically-related articles cross-linked into a topical authority surface, often anchored by a pillar page. Strong cluster: 5+ articles plus internal links plus a pillar.
-
Pillar Page
A long-form, authoritative page that aggregates a topical cluster into a single SEO surface, with explicit ItemList Schema and internal links to every cluster article.