LlamaIndex ‘legal-kb’: Agentic retrieval over Index v2 with retrieve, find, read, and grep tools

LlamaIndex has published legal-kb, a public reference application, on GitHub. It is described as a knowledge base for legal documents, powered by LlamaIndex Index v2 (LlamaParse Platform). The project demonstrates a pattern the team calls the Retrieval Harness for agentic retrieval.
The approach differs from single-shot retrieval. Instead of running one embedding search per query, the agent is given filesystem-style tools. It can then navigate a large, changing knowledge base to solve a task. The tools mirror operations developers already know: semantic and keyword search, regex grep, file search, and read.
What is legal-kb?
legal-kb is a TanStack Start web application, not a library. You sign in, create a project, upload files, and chat with an agent. Each project is backed by a managed LlamaCloud Index v2. Uploaded files are parsed and indexed automatically in the background. The chat agent then queries that index live on every turn.
Retrieval Harness, in plain terms
The harness provides a continuous data pipeline over your documents. It connects to the data source, indexes it, and keeps it updated. On top of that pipeline, it exposes a set of tools to the agent.
Those tools are intentionally close to file system operations. An agent can list files, read a file, grep within a file, or run a hybrid search. Because the tools are standard, you can connect the harness to your own agents.
The agent in src/lib/agent.ts is given four tools. Each one maps to an Index v2 retrieval API. The table below lists them as implemented.
| Tool | Backing API | Key parameters | What it does |
|---|---|---|---|
| retrieve | beta.retrieval.retrieve | query, top_k, score_threshold, rerank_top_n, file_name, file_version | Runs hybrid semantic search with optional reranking; returns chunks and citations |
| findFiles | beta.retrieval.find | file_name, file_name_contains | Finds files by exact name or substring; paginated by default |
| readFile | beta.retrieval.read | file_id, offset, max_length | Reads raw file content in offset and length windows |
| grepFile | beta.retrieval.grep | file_id, pattern, context_chars, limit | Matches a pattern in a single file; returns character positions |
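To make the grepFile row concrete, here is a minimal local sketch of single-file pattern matching with a context window and a match limit. It works on an in-memory string and is not the Index v2 API: `grepText`, `GrepMatch`, and the default values are hypothetical names chosen for illustration, and the real endpoint's response shape may differ.

```typescript
// Hypothetical local stand-in for a single-file grep; not the Index v2 API.
interface GrepMatch {
  start: number   // character offset where the match begins
  end: number     // character offset just past the match
  context: string // matched text plus up to contextChars on each side
}

function grepText(
  content: string,
  pattern: string,
  contextChars = 40,
  limit = 10,
): GrepMatch[] {
  const re = new RegExp(pattern, 'g')
  const matches: GrepMatch[] = []
  let m: RegExpExecArray | null
  while (matches.length < limit && (m = re.exec(content)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++ // step past empty matches so the loop always advances
      continue
    }
    const start = m.index
    const end = start + m[0].length
    matches.push({
      start,
      end,
      context: content.slice(Math.max(0, start - contextChars), end + contextChars),
    })
  }
  return matches
}
```

Returning character offsets rather than only the matched text lets a caller follow up with a windowed read around the same position.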
A system prompt enforces the order. The agent should call findFiles first to establish a list of documents. It then narrows with retrieve and confirms the exact wording with readFile or grepFile before answering.
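The project is described as enforcing this order through the prompt. As an illustration only, the same rule can be expressed as a check over a trace of tool names, which is useful in tests or logs. `checkToolOrder` and `ToolName` are hypothetical and do not come from the legal-kb code.

```typescript
type ToolName = 'findFiles' | 'retrieve' | 'readFile' | 'grepFile'

// Returns a list of violations for the tool calls made before an answer.
function checkToolOrder(calls: ToolName[]): string[] {
  const problems: string[] = []
  if (calls.length === 0 || calls[0] !== 'findFiles') {
    problems.push('findFiles must be the first call')
  }
  const firstRetrieve = calls.indexOf('retrieve')
  // A confirmation is any readFile or grepFile call made after the first retrieve.
  const confirmed = calls.some(
    (c, i) => (c === 'readFile' || c === 'grepFile') && i > firstRetrieve,
  )
  if (firstRetrieve !== -1 && !confirmed) {
    problems.push('retrieve results were not confirmed with readFile or grepFile')
  }
  return problems
}
```

An empty result means the trace followed the find, retrieve, confirm sequence.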
How it works under the hood
Uploads follow a clear pipeline in src/lib/files.ts. Bytes are pushed to the project’s LlamaCloud source directory. A File and a ProjectFile row are written to PostgreSQL with Prisma. Index synchronization is triggered but not awaited; the UI polls the sync state to stay up to date.
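The shape of that pipeline, store the bytes, record the rows, start the sync without waiting, then poll, can be sketched with injected dependencies. Everything named here (`UploadDeps`, `handleUpload`, `pollUntilReady`, the status strings, the one-second interval) is a hypothetical stand-in; the real project calls LlamaCloud and Prisma at these points.

```typescript
// Hypothetical dependencies; the real project uses LlamaCloud and Prisma here.
interface UploadDeps {
  pushBytes(name: string, bytes: Uint8Array): Promise<void>
  writeRows(name: string): Promise<{ fileId: string }>
  triggerSync(): Promise<void>
}

async function handleUpload(
  deps: UploadDeps,
  name: string,
  bytes: Uint8Array,
): Promise<string> {
  await deps.pushBytes(name, bytes)             // 1) store the bytes
  const { fileId } = await deps.writeRows(name) // 2) record the database rows
  // 3) start the sync but do not await it; errors are swallowed so the
  //    upload request can still return immediately
  void deps.triggerSync().catch(() => {})
  return fileId
}

// Polls a status getter until it reports 'ready' or the attempts run out.
async function pollUntilReady(
  getStatus: () => Promise<'syncing' | 'ready'>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 20,
): Promise<boolean> {
  for (let i = 0; i < maxAttempts; i++) {
    if ((await getStatus()) === 'ready') return true
    await sleep(1000)
  }
  return false
}
```

Not awaiting the sync keeps uploads fast, at the cost that a freshly uploaded file may not be retrievable until polling reports the index as ready.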
Versioning is keyed on the (project, filename) pair. Re-uploading nda.pdf in the same project produces v1, v2, v3 side by side. The retrieval layer filters on a version metadata field. This gives version control over the knowledge base itself.
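A small sketch of versioning keyed on that pair follows. It assumes versions are consecutive integers starting at 1, which the v1, v2, v3 example suggests but the source does not spell out; `StoredFile`, `nextVersion`, and `pickVersion` are hypothetical names.

```typescript
interface StoredFile {
  projectId: string
  fileName: string
  version: number
}

// Next version for a (project, filename) pair: one more than the highest so far.
function nextVersion(existing: StoredFile[], projectId: string, fileName: string): number {
  const versions = existing
    .filter((f) => f.projectId === projectId && f.fileName === fileName)
    .map((f) => f.version)
  return versions.length === 0 ? 1 : Math.max(...versions) + 1
}

// Picks a pinned version if one is given, otherwise the latest for that pair.
function pickVersion(
  existing: StoredFile[],
  projectId: string,
  fileName: string,
  version?: number,
): StoredFile | undefined {
  const candidates = existing
    .filter((f) => f.projectId === projectId && f.fileName === fileName)
    .sort((a, b) => b.version - a.version)
  return version === undefined ? candidates[0] : candidates.find((f) => f.version === version)
}
```

Because older rows are never overwritten, a query can be pinned to an earlier version while new uploads continue to arrive.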
The agent uses the ToolLoopAgent from Vercel AI SDK 6. You choose OpenAI or Anthropic for each chat and bring your own keys. Reasoning is enabled: Claude models use extended thinking; OpenAI reasoning models use medium reasoning effort.
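One way to express the per-provider reasoning settings is a small function that returns provider options. This is a sketch under assumptions: the option names follow the AI SDK provider documentation as commonly published (`thinking` with `budgetTokens` for Anthropic, `reasoningEffort` for OpenAI) and should be checked against the installed SDK version, the token budget is an illustrative value, and `reasoningOptions` is a hypothetical helper, not code from legal-kb.

```typescript
type Provider = 'anthropic' | 'openai'

// Assumed option shapes; the budgetTokens value is illustrative only.
function reasoningOptions(provider: Provider) {
  if (provider === 'anthropic') {
    // Claude models: extended thinking with a token budget
    return { anthropic: { thinking: { type: 'enabled' as const, budgetTokens: 8000 } } }
  }
  // OpenAI reasoning models: medium reasoning effort
  return { openai: { reasoningEffort: 'medium' as const } }
}
```

The returned object is meant to be passed wherever the SDK accepts `providerOptions`, so the rest of the agent setup stays identical across providers.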
Here is a condensed but faithful view of the retrieve tool and the agent.
```ts
import { LlamaCloud } from '@llamaindex/llama-cloud'
import { tool, ToolLoopAgent, type LanguageModel } from 'ai'
import { z } from 'zod'
import { makeCitationId } from './citations'

// One tool closure per index. Wraps Index v2 retrieval APIs.
function createLlamaParseTools(apiKey: string, projectId: string, indexId: string) {
  const client = new LlamaCloud({ apiKey })

  const retrieve = tool({
    description: 'Run a semantic retrieval query against an index.',
    inputSchema: z.object({
      query: z.string(),
      top_k: z.number().nullable(),
      score_threshold: z.number().nullable(),
      rerank_top_n: z.number().nullable(), // set to enable reranking
      file_name: z.string().nullable(), // metadata filter
      file_version: z.number().nullable(),
    }),
    execute: async ({ query, top_k, score_threshold, rerank_top_n, file_name }) => {
      // file_version is part of the schema, but this excerpt only builds
      // the file_name filter.
      const custom_filters = file_name
        ? { file_name: { operator: 'eq' as const, value: file_name } }
        : undefined

      const response = await client.beta.retrieval.retrieve({
        index_id: indexId,
        project_id: projectId,
        query,
        top_k,
        score_threshold,
        rerank: rerank_top_n != null ? { enabled: true, top_n: rerank_top_n } : undefined,
        custom_filters,
      })

      // Return a model-readable list plus citations that drive the UI chips.
      const citations = response.results.map((r) => ({
        id: makeCitationId(), // e.g. "c7f2qa"
        fileName: r.metadata?.file_name,
        score: r.rerank_score ?? r.score ?? null,
        preview: r.content.slice(0, 500),
      }))

      // Number each result and separate results with a horizontal rule.
      const formatted = response.results
        .map((r, i) => `### Result #${i + 1}\n\n${r.content.slice(0, 600)}`)
        .join('\n\n---\n\n')

      return { formatted, citations }
    },
  })

  // findFiles / readFile / grepFile follow the same shape, backed by
  // client.beta.retrieval.find / .read / .grep
  return { retrieve /* , findFiles, readFile, grepFile */ }
}

export function buildAgent(
  model: LanguageModel,
  apiKey: string,
  projectId: string,
  indexId: string,
) {
  return new ToolLoopAgent({
    model,
    tools: createLlamaParseTools(apiKey, projectId, indexId),
    instructions:
      'Always call findFiles first, ground every answer in the documents, ' +
      'and cite ids inline as `cite:<id>`.',
  })
}
```
Answers carry visual citations. Each retrieved chunk receives a short ID, such as cite:c7f2qa. The agent references the id inline, and the UI renders a clickable citation chip. Clicking it opens a screenshot of the source page with a bounding-box rectangle over the cited text.
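The inline citation mechanism can be sketched in two parts: generating a short id and splitting an answer into text and citation segments that a UI could render as chips. Both functions are hypothetical illustrations. The six-character lowercase alphanumeric format is inferred from the `c7f2qa` example, and the project's own `makeCitationId` may work differently.

```typescript
// Hypothetical id generator: six characters from a lowercase alphanumeric alphabet.
function makeShortId(random: () => number = Math.random): string {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'
  let id = ''
  for (let i = 0; i < 6; i++) {
    id += alphabet.charAt(Math.floor(random() * alphabet.length))
  }
  return id
}

type Segment = { kind: 'text'; value: string } | { kind: 'cite'; id: string }

// Splits an answer into plain text and cite:<id> markers, preserving order.
function splitCitations(answer: string): Segment[] {
  const segments: Segment[] = []
  const re = /cite:([a-z0-9]+)/g
  let last = 0
  let m: RegExpExecArray | null
  while ((m = re.exec(answer)) !== null) {
    if (m.index > last) {
      segments.push({ kind: 'text', value: answer.slice(last, m.index) })
    }
    segments.push({ kind: 'cite', id: m[1] })
    last = m.index + m[0].length
  }
  if (last < answer.length) {
    segments.push({ kind: 'text', value: answer.slice(last) })
  }
  return segments
}
```

A renderer can then map each `cite` segment to a chip and look up the stored citation (file name, score, preview) by id.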
Naive RAG versus the agentic Retrieval Harness
The harness is a different usage model from single-shot RAG. The comparison below focuses on behavior.
| Dimension | Naive / one-shot RAG | Agentic Retrieval Harness (Index v2) |
|---|---|---|
| Retrieval flow | One vector search per query | Multi-step loop: find → retrieve → read/grep |
| Search methods | Vector similarity only | Hybrid semantic search, keyword search, and regex grep |
| Context | Fixed top-k chunks | The agent reads full files or windows on demand |
| Index freshness | Static index | A continuous pipeline with versioned synchronization |
| Precision control | Mostly hidden | top_k, score_threshold, rerank_top_n exposed |
| Citations | Chunk IDs | Visual citations with page screenshots and bounding boxes |
| Best fit | Short question answering | Long-horizon document tasks |
Use cases, with examples
The design targets domains where agents navigate large sets of documents. Legal and fintech are the examples mentioned.
- Consider a contract question: ‘What notice is required to terminate an MSA?’ The agent lists the files, runs retrieve, then greps the specific clause. It responds with a citation to a specific page.
- Consider due diligence across a data room: an agent can findFiles by name, then readFile each candidate. It checks clauses without opening every PDF.
- Consider a versioned policy base: because retrieve accepts a file_version filter, the agent can query a specific version. This supports tracking changes over time.



