Generative AI Core

Building Quality Generative AI Capabilities

The point: Prompting is 80% of the game; master it before fine-tuning.

At a glance: Everything on this page

GENERATION MATURITY LADDER

G0 → G5 with the diagnostic build path — each step escalates only on demonstrated failure of the previous one.

G2–G4 are branches, not rungs — compose by need; many systems need RAG and never need function calling.

START SIMPLE

CUSTOMIZE

G0: Basic Prompting

Plain prompt: Simple Q&A, classification, drafting

Start with a Prompt: Plain zero/few-shot prompt

Wrong format or inconsistent?

G1: Engineered Prompts

+ CoT, few-shot, templates: Reasoning, reliability, repeatability

Engineer the Prompt: CoT, templates, decomposition
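A G1 prompt is often a reusable template rather than a hand-written string. The sketch below shows one minimal way to combine few-shot examples with a chain-of-thought cue using only the Python standard library. The task, the wording, and the names `PROMPT` and `build_prompt` are illustrative assumptions, not part of the framework.

```python
from string import Template

# Hypothetical template: the task and wording are illustrative only.
PROMPT = Template(
    "Classify the sentiment of the review as positive or negative.\n"
    "Think step by step, then answer on the last line as 'Label: <label>'.\n\n"
    "$examples\n"
    "Review: $review\n"
    "Reasoning:"
)


def build_prompt(review: str, shots: list[tuple[str, str, str]]) -> str:
    """Render a few-shot prompt from (review, reasoning, label) triples."""
    blocks = [
        f"Review: {text}\nReasoning: {why}\nLabel: {label}\n"
        for text, why, label in shots
    ]
    return PROMPT.substitute(examples="\n".join(blocks), review=review)


shots = [
    ("Battery died in a day.", "The reviewer reports a defect.", "negative"),
    ("Setup took two minutes.", "The reviewer praises ease of use.", "positive"),
]
prompt = build_prompt("The screen cracked on arrival.", shots)
print(prompt)
```

Keeping the examples as data makes the prompt repeatable: the same template can be re-rendered with different shots when outputs are inconsistent.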

Need typed output?

G2: Structured Output

+ JSON, schemas, typed: Need validated data, not free text

Structured Output: JSON mode, schemas, validation
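Validation at G2 can be sketched as "parse, then check the shape, then reject". The example below assumes a hypothetical ticket-triage task; `Ticket`, `ALLOWED`, and `parse_ticket` are made-up names, and it uses only the standard library. Many real systems would use a schema library instead.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    category: str
    priority: int


ALLOWED = {"billing", "bug", "other"}  # hypothetical label set


def parse_ticket(raw: str) -> Ticket:
    """Parse model output as JSON and validate it against the expected shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    category, priority = data.get("category"), data.get("priority")
    if not isinstance(category, str) or category not in ALLOWED:
        raise ValueError(f"unknown category: {category!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError(f"priority must be an integer, got {priority!r}")
    if not 1 <= priority <= 3:
        raise ValueError(f"priority out of range: {priority}")
    return Ticket(category, priority)


ticket = parse_ticket('{"category": "bug", "priority": 2}')
print(ticket)
```

A caller can catch `ValueError` to retry the generation or fall back, so malformed output never reaches downstream code as free text.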

Need to trigger actions?

G3: Function Calling

+ Tool invocation (single): Model should trigger APIs

Function Calling: Model invokes tool / API

Wrong facts or missing data?

G4: Retrieval-Augmented

+ RAG / grounding: Need domain facts or current info

Add RAG: Retrieve & ground in domain data

Style / tone still off?

G5: Fine-Tuned / Custom

+ LoRA, RLHF/DPO: Style / behavior, not knowledge

Fine-Tune: LoRA, PEFT, RLHF/DPO

Boundary: a single model-triggered tool call is Generative (G3); a loop that observes results and decides next actions is Agentic (L1+) — see Agentic AI Core.
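The G3 side of this boundary can be sketched as a single dispatch: the model emits one tool request, the application runs it once, and nothing loops back. The JSON shape (`name`, `arguments`), the `get_weather` tool, and `dispatch_once` are hypothetical; no specific vendor's tool-call format is implied.

```python
import json
from typing import Callable


def get_weather(city: str) -> str:
    """Stand-in tool; a real one would call an external API."""
    return f"Sunny in {city}"


# Registry of tools the model is allowed to trigger.
TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def dispatch_once(model_output: str) -> str:
    """Execute exactly one model-requested tool call and return its result."""
    call = json.loads(model_output)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    return TOOLS[name](**args)


# The string below stands in for what a model might emit.
result = dispatch_once('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
print(result)
```

There is no loop here: the result is returned to the caller and is not fed back to the model to choose a next action. Adding that observe-and-decide loop is what would move the design into the agentic tier.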

Diagnose first: bad prompt, wrong context, or wrong model?

GROUNDING SPECTRUM

Factuality Axis — From Creative Generation to Verified Claims

MORE CREATIVE

MORE FACTUAL


Ungrounded

Free-form creative generation

Attributed

Cites sources inline

Grounded

Refuses when unsure

Verified

Every claim fact-checked

Citation-check vs. retrieval + judge-model verification + human spot-audit · adds latency & cost
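The citation-check step of the Verified tier can be approximated mechanically before any judge model or human audit runs. The sketch below assumes answers cite sources with bracketed IDs such as `[d1]` placed before the sentence-ending punctuation. That marker format and the function name `check_citations` are assumptions made for illustration.

```python
import re


def check_citations(answer: str, retrieved_ids: set[str]) -> dict[str, list[str]]:
    """Flag sentences with no citation and citations not in the retrieved set."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited: list[str] = []
    unknown: list[str] = []
    for sentence in sentences:
        cites = re.findall(r"\[(\w+)\]", sentence)
        if not cites:
            uncited.append(sentence)
        unknown.extend(c for c in cites if c not in retrieved_ids)
    return {"uncited_sentences": uncited, "unknown_citations": unknown}


answer = "The plan costs $10 [d1]. It includes support [d9]. Cancel anytime."
report = check_citations(answer, retrieved_ids={"d1", "d2"})
print(report)
```

This only checks that citations exist and point at retrieved documents. Whether the cited passage actually supports the claim is the part left to judge-model verification and human spot-audit.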

Favor Creative:

Creative / Marketing

Brainstorming

Ideation

Favor Factual:

High Stakes

Regulated / Legal

Audit Required

Must Be True

CAPABILITY REFERENCE

The 11 pillars that implement every G-tier

GenAI Use Cases

GenAI Use Case Patterns (Cross-cutting)

Knowledge & Context (G4 · RAG)

Prompt Engineering (G0 · G1)

Input/Output (G2 · G3)

Response Quality (Cross-cutting)

Evaluation & Testing (Cross-cutting)

Model Orchestration (Cross-cutting)

Human-AI Interaction (Cross-cutting)

Safety & Guardrails (Cross-cutting)

GenAI Operations (Cross-cutting)

Fine-Tuning

Quality Outputs

RAG DEEP DIVE — FROM NAIVE TO ADVANCED

Where most GenAI projects actually fail · Escalate only when the previous tier can't explain the failure

R0: Naive RAG

Chunk → embed → top-k → stuff

“Prototype, simple docs”

Fixed chunks

Single embed model

Top-k vector

Stuff into context
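The four R0 steps (chunk, embed, top-k, stuff) fit in a few lines. In the sketch below, a bag-of-words count vector stands in for a real embedding model so the example runs with no dependencies. That substitution, the sample text, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Fixed-size chunks of `size` words, no overlap."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def stuff(query: str, context: list[str]) -> str:
    """Place the retrieved chunks directly into the prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


DOC = (
    "Refunds are issued within 14 days of purchase. "
    "Shipping takes three to five business days. "
    "Support is available by email on weekdays only."
)
QUERY = "How many days for refunds?"
chunks = chunk(DOC)
prompt = stuff(QUERY, top_k(QUERY, chunks))
print(prompt)
```

Note that the fixed-size chunker splits the sample text mid-sentence, which is the kind of failure that motivates the chunking changes in the next tier.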

R1: Tuned Retrieval

Chunking · hybrid · filters

“Retrieving wrong stuff”

Semantic / parent-child chunks

Hybrid (dense + BM25)

Metadata filters
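Hybrid retrieval needs some way to merge the dense ranking with the BM25 ranking. The source does not say which method to use. One common choice is reciprocal rank fusion (RRF), sketched below under that assumption; the document IDs and the function name `rrf` are hypothetical.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum of 1 / (k + rank of d).

    k = 60 is a commonly used default for this constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]  # ranking from vector search (made-up IDs)
bm25 = ["c", "a", "d"]   # ranking from keyword search (made-up IDs)
fused = rrf([dense, bm25])
print(fused)  # ['a', 'c', 'b', 'd']
```

RRF uses only ranks, so the dense and BM25 scores never need to be put on a common scale. Documents found by both retrievers ("a" and "c" here) rise above those found by only one.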

R2: Reranked

Cross-encoder · MMR

“Top-k ≠ top-ranked”

Cross-encoder rerank

MMR diversity

LLM-as-reranker
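MMR (maximal marginal relevance) trades relevance against redundancy: each pick maximizes `lam * relevance - (1 - lam) * max similarity to already-selected items`. The sketch below uses made-up similarity scores in place of real embeddings; `mmr`, `pair_sim`, and the document IDs are illustrative names.

```python
from typing import Callable


def mmr(
    query_sim: dict[str, float],
    pair_sim: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> list[str]:
    """Greedily select k doc IDs balancing relevance and diversity."""
    selected: list[str] = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(doc: str) -> float:
            redundancy = max((pair_sim(doc, s) for s in selected), default=0.0)
            return lam * query_sim[doc] - (1 - lam) * redundancy
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up scores: "a_dup" is nearly identical to "a".
QUERY_SIM = {"a": 0.9, "a_dup": 0.85, "b": 0.6}
PAIRS = {
    frozenset(("a", "a_dup")): 0.95,
    frozenset(("a", "b")): 0.1,
    frozenset(("a_dup", "b")): 0.1,
}


def pair_sim(x: str, y: str) -> float:
    return PAIRS[frozenset((x, y))]


picked = mmr(QUERY_SIM, pair_sim, k=2)
print(picked)  # ['a', 'b']
```

Plain top-2 by relevance would return "a" and its near-duplicate. With `lam=0.5` the duplicate is penalized and "b" takes the second slot; `lam=1.0` recovers pure relevance ordering.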

R3: Query-Aware

Rewrite · HyDE · decompose

“Query-doc mismatch”

Query rewriting

HyDE

Multi-query expansion

Decomposition

R4: Tuned Retriever

Fine-tuned embeddings

“Generic embeds miss domain”

Domain-tuned embeddings

Contrastive learning

Custom reranker

R5: Advanced Patterns

Hierarchical · graph · agentic

“Single-shot unfit”

Hierarchical RAG

Graph RAG

Agentic RAG (iterative)

Long-context RAG

RAG Ops (applies to all tiers):

Eval: RAGAS · MRR · NDCG · Recall@k

Citation & Attribution

Refusal When Unsure

Index Freshness & Lineage
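Three of the retrieval metrics named above (Recall@k, MRR, NDCG) are small enough to compute by hand against a golden dataset. The sketch below assumes binary relevance labels and made-up document IDs. RAGAS is a separate library and is not covered here.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top k."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant doc, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mrr(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean reciprocal rank over (ranking, relevant set) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """NDCG@k with binary gains: DCG divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(k, len(relevant)) + 1)
    )
    return dcg / ideal if ideal else 0.0


ranked = ["d3", "d1", "d7", "d2"]  # retriever output for one query
relevant = {"d1", "d2"}            # golden labels for that query
print(recall_at_k(ranked, relevant, k=3))   # 0.5
print(reciprocal_rank(ranked, relevant))    # 0.5
print(round(ndcg_at_k(ranked, relevant, k=4), 3))
```

Recall@k ignores order within the top k, MRR rewards only the first hit, and NDCG discounts every hit by its position, so the three can disagree on which retriever is better.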

ANTI-PATTERNS TO AVOID

Common mistakes that make generative systems hallucinate, drift, or burn cash

Blind Fine-Tuning

Fine-tuning when a better prompt would have worked

→ Exhaust prompt engineering first

RAG Without Evals

Shipping retrieval with no golden dataset

→ Define eval data before retrieval code

Temperature Roulette

Temperature 1.0 on factual queries

→ Low temp for facts, high for creative
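The reason low temperature suits factual queries is visible in the arithmetic: sampling temperature divides the logits before the softmax, so a low value concentrates probability on the top token. The sketch below uses made-up logits for three candidate tokens; the function name and the values are illustrative.

```python
import math


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 1.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

With these logits the top token gets roughly 0.67 of the probability mass at temperature 1.0 and more than 0.99 at 0.2, so the low setting almost always returns the model's most likely answer.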

Bigger Model Fallacy

Reaching for a bigger model to fix a prompt issue

→ Diagnose: prompt, context, or model?

Launch Without a Judge

No LLM-as-judge or human eval pre-production

→ Build an eval harness before shipping

Prompt Spaghetti

Giant system prompt with conflicting instructions

→ Decompose: one prompt, one job

RAG Without Reranking

Top-k dumped straight into context, no rerank

→ Always add a reranker above naive RAG

Hallucination Denial

No refusal behavior — model confidently makes things up

→ Teach the system to say “I don’t know”

Index Staleness

RAG corpus never refreshed after launch

→ Treat index freshness as a first-class metric

Quality-First

Grounded

Transparent

Safe

Cost-Effective

Generative AI Core - AI Transformation Framework