LlamaIndex
A data framework focused on retrieval over your own data: it ingests documents, indexes them, and exposes query engines that agents call as tools.
Why it matters
Where LangChain is agent-loop-first, LlamaIndex is RAG-first: it owns the ingestion-to-retrieval pipeline that most agents actually live or die on. It bundles 100+ data loaders (PDF, Notion, SQL, web) plus node parsing, indexing, and re-ranking, so building a “chat with my docs” RAG agent takes only a handful of lines. The agent layer then treats each index as a queryable tool, letting one agent route across many data sources.
How it works
The pipeline is load → parse into Nodes → index → query.
| Stage | What it does |
|---|---|
| Reader | file/API → Document objects |
| Node parser | splits docs into Node chunks |
| Index | stores Nodes for lookup: VectorStoreIndex, SummaryIndex, etc. |
| Retriever | Index → top-k Nodes for a query |
| Query engine | retriever + LLM synthesis = answer |
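To make the stage boundaries concrete, here is a toy, standard-library-only sketch of the same load → parse → index → retrieve → synthesize flow. It is not LlamaIndex code: the `Node` class, the word-overlap scoring, and the string-joining "synthesis" are assumptions standing in for real embeddings and an LLM call.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    doc_id: str
    text: str


def parse(documents: dict[str, str]) -> list[Node]:
    """Node parser stand-in: one Node per non-empty line of each document."""
    return [
        Node(doc_id, line.strip())
        for doc_id, body in documents.items()
        for line in body.splitlines()
        if line.strip()
    ]


def tokens(text: str) -> Counter:
    """Bag of lowercase words; a crude substitute for an embedding."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(index: list[Node], query: str, top_k: int = 2) -> list[Node]:
    """Retriever stand-in: rank Nodes by shared-word count with the query."""
    q = tokens(query)
    scored = [(sum((tokens(n.text) & q).values()), n) for n in index]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [node for score, node in ranked[:top_k] if score > 0]


def query_engine(index: list[Node], query: str) -> str:
    """Query engine stand-in: retrieve, then 'synthesize' by joining Nodes."""
    hits = retrieve(index, query)
    return " ".join(n.text for n in hits) or "No relevant context found."


documents = {
    "leave.md": "Employees accrue ten sick days per year.\nUnused sick days do not roll over.",
    "travel.md": "Book flights through the travel portal.",
}
index = parse(documents)  # the "index" here is just the list of Nodes
answer = query_engine(index, "How many sick days per year?")
print(answer)
```

Each function maps to one row of the table, which is the main point: the stages are separable, so a retriever or synthesizer can be swapped without touching ingestion.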
- A `VectorStoreIndex` embeds Nodes for semantic search; a `SummaryIndex` walks all Nodes for whole-corpus questions.
- Response synthesis modes matter: `compact` packs retrieved Nodes into as few prompts as will fit, while `refine` makes one LLM call per Node, revising the answer as it goes. When results overflow the context window, `compact` falls back to refining across its packed prompts.
- Wrap a query engine as a `QueryEngineTool` and hand it to an agent; with several tools the agent does the routing, picking the right index per question.
- A `PropertyGraphIndex` adds graph/entity retrieval beyond flat vectors.
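The cost difference between the two synthesis modes can be sketched without any library. The snippet below is an illustration under stated assumptions, not LlamaIndex's implementation: `pack_compact`, `refine`, and `fake_llm` are hypothetical names, and character counts stand in for tokens.

```python
def pack_compact(chunks: list[str], budget: int) -> list[str]:
    """Greedily concatenate chunks into as few prompts as fit the budget.

    A single chunk larger than the budget is passed through unsplit here;
    a real implementation would split it further.
    """
    prompts: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if len(candidate) <= budget or not current:
            current = candidate
        else:
            prompts.append(current)
            current = chunk
    if current:
        prompts.append(current)
    return prompts


def refine(prompts: list[str], question: str, llm) -> str:
    """Run one LLM call per prompt, carrying the draft answer forward."""
    answer = ""
    for context in prompts:
        answer = llm(question=question, context=context, draft=answer)
    return answer


calls: list[str] = []


def fake_llm(question: str, context: str, draft: str) -> str:
    """Hypothetical LLM stand-in: records the call and extends the draft."""
    calls.append(context)
    return f"{draft}[{len(context)} chars]"


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]
budget = 100  # characters, standing in for a token limit

refine_calls = len(chunks)                      # refine-style: one call per Node
compact_prompts = pack_compact(chunks, budget)  # compact-style: pack first
refine(compact_prompts, "example question", fake_llm)
print(refine_calls, len(compact_prompts))
```

With these inputs the packed variant needs half as many calls, and no chunk is dropped; the trade-off is that the calls are still sequential once the packed prompts number more than one.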
Example
Docs-to-agent in five steps:
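```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

docs = SimpleDirectoryReader("./policies").load_data()  # load files as Documents
index = VectorStoreIndex.from_documents(docs)  # parse + embed + store
qe = index.as_query_engine(response_mode="compact")
tool = QueryEngineTool.from_defaults(qe, name="policy_docs")
# hr_db_tool: a second tool over the HR database, defined elsewhere
agent = ReActAgent.from_tools([tool, hr_db_tool])  # routes across sources
agent.chat("How many sick days do I have left?")
```

The agent reads the question, routes to policy_docs or hr_db_tool, and the query engine returns a synthesized answer with its source Nodes attached. The imports assume the `llama_index.core` package layout; `ReActAgent.from_tools` and `agent.chat` belong to the classic agent API and may differ in newer releases.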
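The routing step is easier to reason about with a toy version. In the sketch below, `Tool`, `route`, and the tool descriptions are hypothetical, and keyword overlap stands in for the LLM's choice; a real agent picks a tool by reading each tool's name and description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def route(question: str, tools: list[Tool]) -> Tool:
    """Pick the tool whose description shares the most words with the question.

    A real agent asks the LLM to choose; keyword overlap is only a stand-in.
    """
    words = set(question.lower().strip("?").split())
    return max(tools, key=lambda t: len(words & set(t.description.lower().split())))


tools = [
    Tool("policy_docs", "company leave and sick policy rules", lambda q: "policy answer"),
    Tool("hr_db_tool", "employee leave balances days left and days taken", lambda q: "balance answer"),
]

for question in ("How many sick days do I have left?", "What is the sick leave policy?"):
    chosen = route(question, tools)
    print(question, "->", chosen.name, "->", chosen.run(question))
```

The selection depends entirely on what each description says, which is why vague or overlapping tool descriptions tend to produce misrouted questions.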
Pitfalls
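- Re-indexing every run. `from_documents` re-embeds the whole corpus; call `index.storage_context.persist()` and reload with `load_index_from_storage`, or you pay the embedding cost on every boot.
- Wrong synthesis mode. `compact` does not truncate: when Nodes exceed the context window it spreads them over several prompts and refines sequentially, which gets slow for large result sets. `simple_summarize` is the mode that truncates Nodes to fit a single prompt. Prefer `tree_summarize` for large result sets.
- Default chunking. The out-of-the-box splitter ignores document structure such as headings and sections; tune `chunk_size` and `chunk_overlap` to your content or recall drops.
- One giant index. Routing across several small, well-named indexes beats one undifferentiated blob the agent can’t reason about.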
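To see what the chunk size and overlap settings trade off, here is a minimal word-window splitter. It is an assumption-laden sketch, not LlamaIndex's splitter (which works on tokens and sentence boundaries); `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
small = chunk_words(text, chunk_size=4, overlap=1)
large = chunk_words(text, chunk_size=8, overlap=2)
print(small)
print(large)
```

Smaller windows give the retriever more precise targets but split related sentences apart; the overlap repeats boundary words so a fact straddling two chunks still appears whole in at least one of them.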