GraphRAG on VectorScaleDB

Why the Mapping Matters

Recent peer-reviewed research out of NYU Shanghai evaluated GraphRAG-style explicit-graph retrieval against vector-only RAG on multi-hop reasoning tasks. The explicit-graph approach wins clearly on questions that require chaining relationships across entities — the exact workload where flat nearest-neighbor search degrades.

External Validation

Explicit Graphs Win Multi-Hop

Independent academic work confirms what the VectorScaleDB architecture assumed from day one: retrieval that traverses an explicit relationship graph beats retrieval that only ranks by embedding similarity whenever the question spans more than one hop.

Native, Not Retrofit

The Graph Is the Substrate

GraphRAG pipelines typically extract a graph as a post-processing step on top of a vector store. VectorScaleDB computes the graph continuously as behavioral coupling between entities. There is no extract-then-retrieve pipeline — the graph is always current.

One Query Surface

Vectors + Graph, Same API

You do not choose between vector search and graph traversal. A single query plan uses the coupling structure for expansion and vector similarity for ranking. The query language does not split along the seam.
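The expand-then-rank idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the VectorScaleDB API: `coupling_neighbor_query`, the nested-dict coupling layout, the `min_weight` threshold, and the sample entity names are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def coupling_neighbor_query(seed, coupling, vectors, query_vec,
                            min_weight=0.5, top_k=3):
    """Hypothetical helper: expand along coupling edges, then rank by similarity."""
    # Expansion: keep only neighbors reachable over a sufficiently strong edge.
    candidates = [n for n, w in coupling.get(seed, {}).items() if w >= min_weight]
    # Ranking: order the expanded set by similarity to the query vector.
    scored = [(n, cosine(vectors[n], query_vec)) for n in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy data (assumed): coupling[src][dst] is an edge weight in [0, 1].
coupling = {"pump_7": {"valve_2": 0.9, "sensor_4": 0.7, "invoice_11": 0.2}}
vectors = {
    "valve_2": [1.0, 0.0],
    "sensor_4": [0.0, 1.0],
    "invoice_11": [1.0, 1.0],
}
results = coupling_neighbor_query("pump_7", coupling, vectors, [0.1, 1.0])
ranked_names = [name for name, _ in results]
```

Here `invoice_11` is dropped during expansion because its edge weight falls below the threshold, even though its vector is close to the query. The graph decides who is eligible; similarity only decides the order.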

Vocabulary Translation

If you are coming from a GraphRAG implementation — Microsoft Research's open-source reference, LangChain GraphRAG, LlamaIndex property graph, or a hand-rolled Neo4j + embeddings pipeline — the concepts map cleanly. Only the vocabulary changes.

GraphRAG Term	VectorScaleDB Equivalent	Notes
Entity extraction (LLM over chunks)	Entity-type classification at ingest	200+ first-class entity types across 20+ domains; classification happens in the ingest adapter, not as a separate LLM pass.
Relationship extraction	Coupling discovery	Relationships are inferred from co-behavior over time, not parsed from prose. No LLM in the extraction loop.
Knowledge graph	Coupling matrix	The coupling matrix is the explicit graph — weighted, directed, and versioned alongside the underlying segments.
Community detection (Leiden, Louvain)	Regime clustering	Behavioral regimes group entities that move together. Regimes are produced by the compression engine and visible in every query response.
Community summary	Regime summary / domain composition	Each regime carries a centroid, drift magnitude, member count, and per-domain composition breakdown. Available at `/v1/query/forecast` and related endpoints.
Subgraph retrieval / local search	Coupling-neighbor query	Given a seed entity, expand along coupling edges above a configurable weight, then rank with vector similarity. One call, not two.
Global search (community-level answers)	Regime-level query	Answer over regime summaries instead of individual segments. Same query shape, different resolution level.
Hierarchical summarization	Multi-resolution hierarchical summary	Summaries exist at segment, regime, and cross-domain cluster levels. The query planner chooses the resolution that matches the question.
Multi-hop reasoning	Cascade / coupling traversal	Follow coupling edges across entity-type boundaries to reach related entities that no single embedding would have returned. Cascade prediction is the same mechanism exposed as a forecasting endpoint.
Graph embeddings	Coupled vectors	Every stored vector is already positioned relative to its neighbors in the coupling structure. No separate embedding step is required to blend structure and content.
Anomaly / novelty detection	Behavioral anomaly detection	An entity that does not fit its regime or coupling neighborhood scores high on anomaly. Works across domains with the unified segment format.
Re-indexing after schema change	Not applicable	The coupling matrix evolves continuously as entities behave. There is no batch rebuild step because there is no batch extraction step.
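
For readers mapping multi-hop reasoning onto cascade / coupling traversal, the following sketch shows one plausible shape for a hop-limited traversal. It assumes edge weights in (0, 1] and scores a path as the product of its edge weights. The function name `cascade`, the scoring rule, and the sample entities are illustrative assumptions, not the engine's documented behavior.

```python
def cascade(seed, coupling, min_weight=0.3, max_hops=3):
    """Hypothetical sketch: return {entity: best path strength} within max_hops.

    Path strength is the product of edge weights along the path, so longer
    chains of weak couplings fade out naturally.
    """
    best = {seed: 1.0}
    frontier = {seed: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, strength in frontier.items():
            for neighbor, weight in coupling.get(node, {}).items():
                if weight < min_weight:
                    continue  # ignore edges below the configured weight
                candidate = strength * weight
                if candidate > best.get(neighbor, 0.0):
                    best[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        if not next_frontier:
            break
        frontier = next_frontier
    best.pop(seed)
    return best

# Toy directed coupling graph spanning three assumed entity types.
coupling = {
    "doc_a": {"sensor_1": 0.8, "ticker_x": 0.2},
    "sensor_1": {"pump_3": 0.5, "doc_a": 0.8},
    "pump_3": {"ticker_x": 0.9},
}
reached = cascade("doc_a", coupling, min_weight=0.3, max_hops=3)
two_hop = cascade("doc_a", coupling, min_weight=0.3, max_hops=2)
```

In this toy graph the direct edge from `doc_a` to `ticker_x` is too weak to follow, but `ticker_x` is still reached in three hops through `sensor_1` and `pump_3`. That is the kind of result a single nearest-neighbor lookup would not return.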

What This Means in Practice

Three concrete consequences follow from the coupling matrix being native rather than retrofitted.

Freshness

No Graph Rebuild Lag

GraphRAG pipelines typically rebuild the knowledge graph on a batch schedule — nightly, weekly, or on demand. Between runs, the graph drifts out of sync with the underlying data. VectorScaleDB updates couplings as segments are ingested, so the graph is never stale.
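
To make the contrast with batch rebuilds concrete, here is a minimal sketch of an online coupling update. It assumes coupling can be approximated by an exponentially weighted average of directional agreement between entities in each ingest window. The class `OnlineCoupling`, the `alpha` parameter, and the symmetric, undirected update rule are simplifying assumptions for illustration; the source describes the real coupling matrix as weighted and directed and does not specify its update rule.

```python
from collections import defaultdict
from itertools import combinations

class OnlineCoupling:
    """Hypothetical sketch: update pairwise coupling on every ingest window."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha                 # smoothing factor for new evidence
        self.weights = defaultdict(float)  # (a, b) -> coupling in [-1, 1]

    def ingest(self, deltas):
        """deltas maps entity -> signed change observed in this window."""
        for a, b in combinations(sorted(deltas), 2):
            if deltas[a] == 0 or deltas[b] == 0:
                continue  # no movement, no evidence either way
            # +1 when both moved in the same direction, -1 when opposed.
            agreement = 1.0 if deltas[a] * deltas[b] > 0 else -1.0
            key = (a, b)
            # Exponentially weighted update: no batch pass over history.
            self.weights[key] += self.alpha * (agreement - self.weights[key])

model = OnlineCoupling(alpha=0.1)
for step in range(20):
    direction = 1.0 if step % 2 == 0 else -1.0
    # cpu and latency move together; cache_hits moves against cpu.
    model.ingest({"cpu": direction, "latency": direction, "cache_hits": -direction})

together = model.weights[("cpu", "latency")]
opposed = model.weights[("cache_hits", "cpu")]
```

Each call to `ingest` touches only the pairs present in that window, so the estimate is current as of the last segment written and never needs a full recomputation.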

Cost

No LLM in the Write Path

Entity and relationship extraction in classic GraphRAG runs an LLM over every ingested chunk. VectorScaleDB does not put an LLM in the write path. Coupling is derived from behavior. LLMs are a consumer of the graph, never a dependency for building it.

Scope

Works Past Text

GraphRAG was designed for document corpora. The coupling matrix treats documents, telemetry, trajectories, financial ticks, and biological signals with the same math. A retrieval query can legitimately cross from a text passage to a sensor regime via shared coupling.

Looking for the human-reasoning angle?

This page is deliberately scoped to the GraphRAG isomorphism. If you are here because you care about how the coupling matrix supports dialectical, multi-perspective reasoning — the cyborg-readiness framing — see the cross-domain intelligence page and the architecture overview.

Cross-Domain Intelligence | Architecture