LlamaIndex ‘legal-kb’: Agentic retrieval over Index v2 with retrieve, find, read, and grep tools


LlamaIndex has published legal-kb, a public reference application, on GitHub. It is described as a knowledge base of legal documents, powered by LlamaIndex Index v2 (LlamaParse Platform). The project demonstrates a pattern the team calls the Retrieval Harness for agentic retrieval.

The approach differs from single-shot retrieval. Instead of running one embedding search per query, the agent is given filesystem-style tools. It can then navigate a large, changing knowledge base to solve a task. The tools mirror operations developers already know: semantic and keyword search, regex grep, file search, and read.

What is legal-kb?

legal-kb is a TanStack Start web application, not a library. You sign in, create a project, upload files, and chat with an agent. Each project is backed by a managed LlamaCloud Index v2. Uploaded files are parsed and indexed automatically in the background. The chat agent then queries that index live on every turn.

Retrieval Harness, in plain terms

The harness provides a continuous data pipeline over your documents. It connects to the data source, indexes it, and keeps it updated. On top of that pipeline, it exposes a set of tools to the agent.

Those tools are intentionally close to filesystem operations. An agent can list files, read a file, grep within a file, or run a hybrid search. Because the tools are standard, you can connect the harness to your own agents.

The agent in src/lib/agent.ts is given four tools. Each one maps to an Index v2 retrieval API. The table below lists them as implemented.

Tool	Backing API	Key parameters	What it does
`retrieve`	`beta.retrieval.retrieve`	`query`, `top_k`, `score_threshold`, `rerank_top_n`, `file_name`, `file_version`	Runs hybrid semantic search with optional reranking; returns chunks and citations
`findFiles`	`beta.retrieval.find`	`file_name`, `file_name_contains`	Finds files by exact name or substring; paginates by default
`readFile`	`beta.retrieval.read`	`file_id`, `offset`, `max_length`	Reads raw file content, windowed by offset and length
`grepFile`	`beta.retrieval.grep`	`file_id`, `pattern`, `context_chars`, `limit`	Matches a pattern within a single file; returns character positions
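
To make the `offset` / `max_length` and `pattern` / `context_chars` / `limit` parameters concrete, here is a minimal local sketch of windowed reading and single-file grep over an in-memory string. It does not call the Index v2 API; `readWindow`, `grepWithContext`, and the sample sentence are hypothetical stand-ins, and the real endpoints may treat defaults and edge cases differently.

```typescript
// Local stand-ins for the windowed read and single-file grep behaviour.
// Illustrative helpers only; they are not part of the LlamaCloud client.

interface GrepMatch {
  start: number   // character offset where the match begins
  end: number     // character offset just past the match
  context: string // the match plus up to contextChars on each side
}

// Return at most maxLength characters starting at offset.
function readWindow(content: string, offset = 0, maxLength = 2000): string {
  return content.slice(offset, offset + maxLength)
}

// Collect up to `limit` regex matches with surrounding context.
function grepWithContext(
  content: string,
  pattern: string,
  contextChars = 40,
  limit = 10,
): GrepMatch[] {
  const regex = new RegExp(pattern, 'g')
  const matches: GrepMatch[] = []
  let m: RegExpExecArray | null
  while (matches.length < limit && (m = regex.exec(content)) !== null) {
    if (m[0].length === 0) {
      regex.lastIndex++ // avoid looping forever on empty matches
      continue
    }
    const start = m.index
    const end = start + m[0].length
    matches.push({
      start,
      end,
      context: content.slice(Math.max(0, start - contextChars), end + contextChars),
    })
  }
  return matches
}

// Made-up sample text for illustration.
const sampleDoc =
  'Either party may terminate this Agreement on thirty (30) days written notice.'
const sampleHits = grepWithContext(sampleDoc, 'thirty \\(\\d+\\) days', 10, 5)
```

Returning character positions rather than line numbers fits parsed PDF text, which often has no meaningful line structure; the positions can be fed straight back into a windowed read.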

The system prompt enforces an order of operations. The agent should call findFiles first to establish the list of documents. It then narrows down with retrieve and confirms the exact wording with readFile or grepFile before quoting.
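
In legal-kb this ordering lives in the prompt. If you wanted to verify it after the fact, for example in an evaluation script, one option is a small check over the recorded tool-call names. The sketch below is a hypothetical helper (`checkToolOrder` is not part of the project) that encodes the two rules described above.

```typescript
type ToolName = 'findFiles' | 'retrieve' | 'readFile' | 'grepFile'

// Returns the ordering rules a call sequence violates; an empty list means compliant.
function checkToolOrder(calls: ToolName[]): string[] {
  const problems: string[] = []

  // Rule 1: the document list must be established first.
  if (calls.length === 0 || calls[0] !== 'findFiles') {
    problems.push('findFiles must be the first call')
  }

  // Rule 2: a retrieve call must be followed by a readFile or grepFile confirmation.
  const firstRetrieve = calls.indexOf('retrieve')
  const confirmedAfter = calls.some(
    (name, i) => i > firstRetrieve && (name === 'readFile' || name === 'grepFile'),
  )
  if (firstRetrieve !== -1 && !confirmedAfter) {
    problems.push('retrieve results were not confirmed with readFile or grepFile')
  }
  return problems
}
```

A sequence such as `['findFiles', 'retrieve', 'grepFile']` passes both rules, while `['retrieve']` alone fails both.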

How it works under the hood

Uploads follow a clear pipeline in src/lib/files.ts. Bytes are pushed to the project’s LlamaCloud source directory. A File and ProjectFile row is written to PostgreSQL with Prisma. An index sync is triggered but not awaited; the UI polls the sync status to stay up to date.
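
The "trigger but do not await" step is the part that keeps uploads responsive. A minimal sketch of that pattern, assuming an in-memory status map and a caller-supplied `runSync` function (both hypothetical, not taken from src/lib/files.ts), looks like this:

```typescript
type SyncStatus = 'idle' | 'syncing' | 'ready' | 'failed'

// In-memory status store; a real application would persist this elsewhere.
const syncStatus = new Map<string, SyncStatus>()

// Start a sync and return immediately. The promise is intentionally not awaited;
// its outcome is only recorded so that a later poll can observe it.
function triggerSync(indexId: string, runSync: () => Promise<void>): void {
  syncStatus.set(indexId, 'syncing')
  void runSync()
    .then(() => {
      syncStatus.set(indexId, 'ready')
    })
    .catch(() => {
      syncStatus.set(indexId, 'failed')
    })
}

// What a UI polling endpoint would read on each tick.
function pollStatus(indexId: string): SyncStatus {
  return syncStatus.get(indexId) ?? 'idle'
}
```

The upload handler can respond as soon as `triggerSync` returns, and the client learns about completion by calling `pollStatus` on an interval.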

Versioning is keyed on the (project, filename) pair. Re-uploading nda.pdf in the same project produces v1, v2, v3 side by side. The retrieval layer filters on a version metadata field. This gives version control over the knowledge base itself.
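
As an illustration of that keying scheme, the sketch below assigns version numbers per (project, filename) pair and filters chunks by version. `VersionRegistry`, `selectVersion`, and the `file_version` metadata key are assumptions made for the example; the project stores this state in PostgreSQL and the index, not in memory.

```typescript
// Tracks the latest version number for each (project, filename) pair.
class VersionRegistry {
  private latest = new Map<string, number>()

  private key(projectId: string, fileName: string): string {
    return JSON.stringify([projectId, fileName])
  }

  // Record an upload and return the version assigned to it (1, 2, 3, ...).
  recordUpload(projectId: string, fileName: string): number {
    const k = this.key(projectId, fileName)
    const next = (this.latest.get(k) ?? 0) + 1
    this.latest.set(k, next)
    return next
  }
}

interface ChunkMeta {
  file_name: string
  file_version: number // assumed metadata key, for illustration only
}

// Keep only chunks that belong to one specific version of one file.
function selectVersion<T extends ChunkMeta>(chunks: T[], fileName: string, version: number): T[] {
  return chunks.filter((c) => c.file_name === fileName && c.file_version === version)
}
```

Because older versions stay in place, a query pinned to v1 and the same query pinned to v3 can be compared directly.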

The agent uses ToolLoopAgent from Vercel AI SDK 6. You choose OpenAI or Anthropic per turn and bring your own keys. Reasoning is enabled: Claude models use extended thinking; OpenAI reasoning models use medium reasoning effort.

Here is a condensed but faithful view of the retrieve tool and the agent.
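
One way to express the per-provider reasoning settings is a small function that returns a provider options object. The sketch below follows the option names used by the AI SDK's Anthropic and OpenAI providers (`thinking` with `budgetTokens`, and `reasoningEffort`); the helper name `reasoningOptions` and the default token budget are assumptions, and how legal-kb actually wires these values is not shown in the source.

```typescript
type Provider = 'openai' | 'anthropic'

// Build provider-specific reasoning settings.
// The 4000-token thinking budget is an arbitrary illustrative value.
function reasoningOptions(provider: Provider, thinkingBudgetTokens = 4000) {
  if (provider === 'anthropic') {
    return {
      anthropic: {
        thinking: { type: 'enabled' as const, budgetTokens: thinkingBudgetTokens },
      },
    }
  }
  return { openai: { reasoningEffort: 'medium' as const } }
}
```

Keeping this in one function means the per-turn provider choice only changes a single argument.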

import { LlamaCloud } from '@llamaindex/llama-cloud'
import { tool, ToolLoopAgent } from 'ai'
import { z } from 'zod'
import { makeCitationId } from './citations'

// One tool closure per index. Wraps Index v2 retrieval APIs.
function createLlamaParseTools(apiKey: string, projectId: string, indexId: string) {
  const client = new LlamaCloud({ apiKey })

  const retrieve = tool({
    description: 'Run a semantic retrieval query against an index.',
    inputSchema: z.object({
      query: z.string(),
      top_k: z.number().nullable(),
      score_threshold: z.number().nullable(),
      rerank_top_n: z.number().nullable(),   // set to enable reranking
      file_name: z.string().nullable(),      // metadata filter
      file_version: z.number().nullable(),
    }),
    execute: async ({ query, top_k, score_threshold, rerank_top_n, file_name }) => {
      const custom_filters = file_name
        ? { file_name: { operator: 'eq' as const, value: file_name } }
        : undefined

      const response = await client.beta.retrieval.retrieve({
        index_id: indexId,
        project_id: projectId,
        query,
        top_k,
        score_threshold,
        rerank: rerank_top_n != null ? { enabled: true, top_n: rerank_top_n } : undefined,
        custom_filters,
      })

      // Return a model-readable list plus citations that drive the UI chips.
      const citations = response.results.map((r) => ({
        id: makeCitationId(),                    // e.g. "c7f2qa"
        fileName: r.metadata?.file_name,
        score: r.rerank_score ?? r.score ?? null,
        preview: r.content.slice(0, 500),
      }))
      const formatted = response.results
        .map((r, i) => `### Result #${i + 1}\n\n${r.content.slice(0, 600)}`)
        .join('\n\n---\n\n')
      return { formatted, citations }
    },
  })

  // findFiles / readFile / grepFile follow the same shape, backed by
  // client.beta.retrieval.find / .read / .grep
  return { retrieve /* , findFiles, readFile, grepFile */ }
}

export function buildAgent(model, apiKey: string, projectId: string, indexId: string) {
  return new ToolLoopAgent({
    model,
    tools: createLlamaParseTools(apiKey, projectId, indexId),
    instructions:
      'Always call findFiles first, ground every answer in the documents, ' +
      'and cite ids inline as `cite:<id>`.',
  })
}

Answers contain visual citations. Each retrieved chunk receives a short ID, such as cite:c7f2qa. The agent references the id inline, and the UI renders a clickable citation chip. Clicking it opens a screenshot of the source page with a bounding box over the highlighted text.

Naive RAG versus agentic Retrieval Harness

The harness is a different usage model from single-shot RAG. The comparison below focuses on behavior.
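
To show how inline ids can be turned back into citation records on the UI side, here is a minimal sketch that scans an answer for `cite:` markers and looks each one up. The `Citation` shape, the `resolveCitations` helper, and the assumption that ids are lowercase alphanumeric (as in cite:c7f2qa) are illustrative, not taken from the project.

```typescript
interface Citation {
  id: string
  fileName: string
  page: number // assumed field; the source only says chips link to a page screenshot
}

// Find every cite:<id> marker in an answer and map it to a known citation.
// Ids the tools never returned are reported separately so they can be flagged.
function resolveCitations(
  answer: string,
  known: Citation[],
): { resolved: Citation[]; unknownIds: string[] } {
  const byId = new Map<string, Citation>()
  for (const c of known) byId.set(c.id, c)

  const pattern = /cite:([a-z0-9]+)/g
  const seen = new Set<string>()
  const resolved: Citation[] = []
  const unknownIds: string[] = []

  let m: RegExpExecArray | null
  while ((m = pattern.exec(answer)) !== null) {
    const id = m[1]
    if (seen.has(id)) continue // render each citation once
    seen.add(id)
    const hit = byId.get(id)
    if (hit) resolved.push(hit)
    else unknownIds.push(id)
  }
  return { resolved, unknownIds }
}
```

Reporting unknown ids separately gives a cheap guard against an answer that cites something no tool call ever returned.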

Dimension	Naive / one-shot RAG	Agentic Retrieval Harness (Index v2)
Retrieval flow	One vector search per query	Multi-step loop: find → retrieve → read/grep
Search methods	Vector similarity only	Hybrid semantic, keyword, and regex grep search
Context	Fixed top-k chunks	The agent reads full files or windows on demand
Indexing	Static index	A continuous pipeline with versioned sync
Precision control	Mostly hidden	`top_k`, `score_threshold`, `rerank_top_n` exposed
Citations	Chunk ID	Visual citations with page screenshots and bounding boxes
Best fit	Short question answering	Long-horizon document tasks
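
The three precision controls compose naturally as a filter, a cut, and a reorder. The sketch below shows one plausible composition over already-scored results; the order of operations, the `Scored` shape, and the helper name are assumptions, since the source does not describe how the retrieval API applies them internally.

```typescript
interface Scored {
  id: string
  score: number        // first-stage retrieval score
  rerankScore?: number // present only if a reranker scored this result
}

interface PrecisionOptions {
  topK: number
  scoreThreshold?: number
  rerankTopN?: number
}

// Assumed order: drop low scores, keep the top K, then optionally rerank and keep the top N.
function applyPrecisionControls(results: Scored[], opts: PrecisionOptions): Scored[] {
  const kept = results
    .filter((r) => opts.scoreThreshold == null || r.score >= opts.scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.topK)

  if (opts.rerankTopN == null) return kept

  return kept
    .slice()
    .sort((a, b) => (b.rerankScore ?? b.score) - (a.rerankScore ?? a.score))
    .slice(0, opts.rerankTopN)
}
```

Exposing these knobs to the agent lets it widen the net for exploratory questions and tighten it when it needs a single authoritative clause.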

Use cases, with examples

The design targets domains where agents navigate large sets of documents. Legal and fintech are the examples mentioned.

Consider a contract question: ‘What notice is required to terminate an MSA?’ The agent lists the files, runs retrieve, then greps the specific clause. It responds with a citation to a specific page.
Consider due diligence across a data room: An agent can findFiles by name, then readFile each candidate. It checks passages without opening every PDF.
Consider a versioned policy base: Because retrieve accepts a file_version filter, the agent can query a specific version. This supports tracking changes over time.
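
The contract question above can be walked through end to end with a toy, in-memory version of the loop. Everything in this sketch is a stand-in: the two sample documents are made up, `retrieve` uses keyword overlap rather than hybrid semantic search, and the functions only share names with the real tools.

```typescript
interface Doc {
  fileName: string
  version: number
  text: string
}

// Made-up sample corpus for illustration.
const corpus: Doc[] = [
  {
    fileName: 'MSA_Acme_Vendor.pdf',
    version: 1,
    text: 'Term. Either party may terminate this Agreement upon thirty (30) days written notice.',
  },
  {
    fileName: 'Mutual_NDA.pdf',
    version: 2,
    text: 'Confidential Information must not be disclosed to third parties.',
  },
]

// Step 1: list files whose name contains a substring (empty string lists all).
function findFiles(nameContains: string): Doc[] {
  const needle = nameContains.toLowerCase()
  return corpus.filter((d) => d.fileName.toLowerCase().indexOf(needle) !== -1)
}

// Step 2: pick the best document by keyword overlap (crude stand-in for hybrid search).
function retrieve(query: string, docs: Doc[]): Doc | undefined {
  const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 3)
  let best: Doc | undefined
  let bestScore = 0
  for (const d of docs) {
    const text = d.text.toLowerCase()
    const score = terms.filter((t) => text.indexOf(t) !== -1).length
    if (score > bestScore) {
      best = d
      bestScore = score
    }
  }
  return best
}

// Step 3: confirm the exact wording with a regex before quoting it.
function grepFile(doc: Doc, pattern: RegExp): string | undefined {
  const m = pattern.exec(doc.text)
  return m ? m[0] : undefined
}

function answerNoticeQuestion(question: string): string {
  const candidates = findFiles('')
  const top = retrieve(question, candidates)
  if (!top) return 'No supporting document found.'
  const clause = grepFile(top, /\w+ \(\d+\) days written notice/)
  return clause ? `${clause} (${top.fileName}, v${top.version})` : 'Clause not confirmed.'
}
```

The point of the third step is that the quoted wording comes from the file itself, not from the retrieval preview, so the answer is only as strong as what the grep actually confirmed.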

