Unified Multi-Model Storage

Five Data Models, One Engine

Stop stitching together separate databases for vectors, documents, graphs, blobs, and text search. VectorScaleDB stores and queries all five natively, with cross-model joins at query time.

Vectors

Temporal-semantic vector storage

The core data model. High-dimensional vectors with temporal context, behavioral drift tracking, and regime-aware compression. Sub-millisecond similarity search across billions of vectors with time-bounded queries.

Documents

Structured metadata storage

JSON-compatible metadata attached to any entity. Indexed for efficient filtering and faceted queries. Metadata participates in vector queries as pre-filters, post-filters, or scoring signals — no separate document store required.
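
As an illustration of the three roles metadata can play in a vector query, here is a minimal Python sketch. The `Entity` type, the `search` function, and its `pre_filter`, `post_filter`, and `boost` parameters are hypothetical names chosen for this example; they are not VectorScaleDB's API, and the brute-force scan stands in for whatever index the engine actually uses.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(entities, query, k=2, pre_filter=None, post_filter=None, boost=None):
    # Pre-filter: drop entities before any similarity is computed.
    candidates = [e for e in entities if pre_filter is None or pre_filter(e.metadata)]
    scored = []
    for e in candidates:
        score = cosine(e.vector, query)
        # Scoring signal: metadata adjusts the similarity score.
        if boost is not None:
            score += boost(e.metadata)
        scored.append((score, e))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Post-filter: applied to the ranked list before truncating to k.
    ranked = [e for _, e in scored if post_filter is None or post_filter(e.metadata)]
    return ranked[:k]


fleet = [
    Entity("pump-1", [1.0, 0.0], {"site": "north", "critical": True}),
    Entity("pump-2", [0.9, 0.1], {"site": "south", "critical": False}),
    Entity("fan-1", [0.0, 1.0], {"site": "north", "critical": False}),
]
north_only = search(fleet, [1.0, 0.0], pre_filter=lambda m: m["site"] == "north")
print([e.entity_id for e in north_only])
```

A pre-filter shrinks the candidate set before scoring, a post-filter trims the ranked results, and a boost folds metadata into the score itself.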

Graphs

Entity relationship traversal

Cross-entity relationships stored as first-class edges. Traverse from a sensor to its device, from a device to its fleet, from a fleet to its operator. Graph traversal combines with vector similarity for relationship-aware nearest neighbor search.
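
The sensor, device, fleet, and operator chain above can be sketched as a small edge map, with graph traversal used to scope a nearest-neighbor search. Everything here (the `EDGES` and `VECTORS` data, the `ancestors` and `nearest_in_same` helpers) is a hypothetical illustration in plain Python, not the engine's actual query interface.

```python
from collections import deque

# Hypothetical "belongs to" edges: child -> parents.
EDGES = {
    "sensor-a": ["device-1"],
    "sensor-b": ["device-1"],
    "sensor-c": ["device-2"],
    "device-1": ["fleet-x"],
    "device-2": ["fleet-y"],
    "fleet-x": ["operator-1"],
    "fleet-y": ["operator-1"],
}

# Hypothetical behavior vectors for the sensors.
VECTORS = {
    "sensor-a": [1.0, 0.0],
    "sensor-b": [0.8, 0.2],
    "sensor-c": [0.99, 0.01],
}


def ancestors(node, edges):
    """Breadth-first walk up the edges; returns every reachable ancestor."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_in_same(source, scope_prefix, edges, vectors, k=1):
    """Nearest neighbors of `source` among entities sharing an ancestor
    whose id starts with `scope_prefix` (for example "device" or "operator")."""
    scope = {a for a in ancestors(source, edges) if a.startswith(scope_prefix)}
    peers = [
        other for other in vectors
        if other != source and scope & ancestors(other, edges)
    ]
    peers.sort(key=lambda other: squared_distance(vectors[source], vectors[other]))
    return peers[:k]


print(nearest_in_same("sensor-a", "device", EDGES, VECTORS))
print(nearest_in_same("sensor-a", "operator", EDGES, VECTORS))
```

Widening the scope from device to operator changes the answer: `sensor-c` is the closer vector overall, but it only becomes eligible once the traversal reaches the shared operator.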

Blobs

Content-addressable binary storage

Raw binary data — images, model weights, sensor firmware, configuration snapshots — stored with content-addressable deduplication. Identical blobs are stored once regardless of how many entities reference them. Automatic garbage collection reclaims unreferenced blobs.
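
A rough sketch of content-addressable storage with reference tracking and garbage collection follows. The `BlobStore` class and its methods are hypothetical, and SHA-256 is assumed as the content hash for illustration only; the source does not say which hash the blob layer uses.

```python
import hashlib


class BlobStore:
    """Toy in-memory blob store keyed by content hash (illustrative only)."""

    def __init__(self):
        self._blobs = {}  # content hash -> bytes
        self._refs = {}   # content hash -> set of referencing entity ids

    def put(self, entity_id, data):
        digest = hashlib.sha256(data).hexdigest()  # assumed hash function
        self._blobs.setdefault(digest, data)       # identical content stored once
        self._refs.setdefault(digest, set()).add(entity_id)
        return digest

    def get(self, digest):
        return self._blobs[digest]

    def release(self, entity_id, digest):
        """Drop one entity's reference; the blob stays until collected."""
        self._refs.get(digest, set()).discard(entity_id)

    def collect_garbage(self):
        """Delete blobs that no entity references; returns how many were removed."""
        dead = [d for d, owners in self._refs.items() if not owners]
        for d in dead:
            del self._blobs[d]
            del self._refs[d]
        return len(dead)

    def __len__(self):
        return len(self._blobs)


store = BlobStore()
firmware = b"firmware-image" * 1000
h1 = store.put("device-1", firmware)
h2 = store.put("device-2", firmware)
print(h1 == h2, len(store))      # same hash, one stored copy

store.release("device-1", h1)
print(store.collect_garbage())   # still referenced by device-2
store.release("device-2", h2)
print(store.collect_garbage())   # now unreferenced, so it is reclaimed
```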

Full-Text

Integrated text search

Full-text search over entity metadata, log messages, annotations, and any text field. Combined with vector similarity for hybrid search: find entities whose behavior matches a vector pattern AND whose metadata matches a text query, in a single request.
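
The hybrid pattern described above (text match AND vector match) can be sketched with a small inverted index. The `DOCS` data and the `build_index` and `hybrid_search` functions are hypothetical and written in plain Python; a real deployment would issue a single request to the engine rather than run this logic client-side.

```python
import re
from collections import defaultdict

# Hypothetical entities with a text field and a behavior vector.
DOCS = {
    "pump-1": {"text": "Bearing vibration warning after maintenance", "vector": [0.9, 0.1]},
    "pump-2": {"text": "Routine maintenance completed", "vector": [0.88, 0.12]},
    "fan-1": {"text": "Vibration warning on startup", "vector": [0.1, 0.9]},
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: token -> set of entity ids containing it."""
    index = defaultdict(set)
    for entity_id, doc in docs.items():
        for token in tokenize(doc["text"]):
            index[token].add(entity_id)
    return index


def hybrid_search(docs, index, text_query, vector_query, k=5):
    terms = tokenize(text_query)
    if not terms:
        return []
    # Text side: keep entities that contain every query term.
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    # Vector side: rank the text matches by dot-product similarity.
    def score(entity_id):
        return sum(x * y for x, y in zip(docs[entity_id]["vector"], vector_query))

    return sorted(matches, key=score, reverse=True)[:k]


INDEX = build_index(DOCS)
print(hybrid_search(DOCS, INDEX, "vibration warning", [1.0, 0.0]))
```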

Automatic Tiered Storage

Data moves automatically between storage tiers based on access patterns. Hot data stays in memory for sub-microsecond access. Cold data is compressed on disk. You set the policy; the engine manages placement.

Hot Tier

In-memory LRU cache

Frequently accessed segments and active entities reside in memory with LRU eviction. Sub-microsecond access latency. Cache size adapts automatically to available memory, respecting the resource governor's budget constraints.
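
For readers unfamiliar with LRU eviction, a minimal sketch using Python's `collections.OrderedDict` is shown here. The `LRUCache` class is hypothetical, and it budgets by entry count rather than bytes, which is a simplification; the `resize` method only gestures at how a cache might shrink when its memory budget changes.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache, sized by entry count (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        self._evict()

    def resize(self, capacity):
        """Apply a new budget, evicting the least recently used entries if needed."""
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._evict()

    def _evict(self):
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._items)


cache = LRUCache(capacity=2)
cache.put("segment-a", 1)
cache.put("segment-b", 2)
cache.get("segment-a")       # touch a, so b becomes the eviction candidate
cache.put("segment-c", 3)    # evicts segment-b
print(cache.get("segment-b"), cache.get("segment-a"), cache.get("segment-c"))
```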

Warm Tier

Optimized on-disk storage

Recently active data is stored on disk, with Bloom filters for fast negative lookups. Read latency in the low microseconds for SSD-backed deployments. Automatic promotion to hot tier on repeated access.
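
To show why a Bloom filter makes negative lookups cheap, here is a small sketch. The `BloomFilter` class, its default sizes, the use of SHA-256 to derive bit positions, and the `warm_tier_get` helper are all assumptions made for this example and do not describe the engine's on-disk format.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def warm_tier_get(key, bloom, read_from_disk):
    if not bloom.might_contain(key):
        return None              # definite miss: skip the disk read entirely
    return read_from_disk(key)   # may still miss on a false positive


disk = {"segment-1": b"payload-1", "segment-2": b"payload-2"}
bloom = BloomFilter()
for key in disk:
    bloom.add(key)

print(warm_tier_get("segment-1", bloom, disk.get))
print(warm_tier_get("segment-404", bloom, disk.get))
```

A "no" from the filter is always correct, so most lookups for absent keys never touch the disk; a "maybe" still requires the read.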

Cold Tier

Maximum compression archival

Historical data compressed with aggressive algorithms (up to 19x additional compression beyond behavioral compression). Slightly higher read latency in exchange for dramatically reduced storage costs. Transparent promotion on access — queries work identically across all tiers.

Content-Addressable Deduplication

Identical data is stored once, regardless of how many entities reference it. Content hashes serve as universal identifiers — enabling deduplication, integrity verification, and decentralized caching in a single mechanism.

Deduplication

Automatic cross-entity dedup

When multiple entities produce identical behavioral segments, model weights, or binary blobs, the storage layer detects the duplication via content hashing and stores a single copy. References are lightweight pointers to the canonical copy. In fleets with thousands of similar devices, deduplication can reduce storage by 40-80%.

Integrity

Built-in corruption detection

Every stored object carries its content hash. On read, the hash is recomputed and verified. Any corruption — from disk failure, bit rot, or tampering — is detected immediately. Combined with the SHA-256 audit chain, this provides end-to-end data integrity from ingestion to query.
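
Verify-on-read can be sketched in a few lines. The `store` and `read_verified` functions and the `CorruptionError` type are hypothetical, and SHA-256 is assumed as the content hash for the purposes of this example.

```python
import hashlib


class CorruptionError(Exception):
    """Raised when stored bytes no longer match their content hash."""


def store(objects, data):
    digest = hashlib.sha256(data).hexdigest()  # assumed content hash
    objects[digest] = data
    return digest


def read_verified(objects, digest):
    data = objects[digest]
    # Recompute the hash on every read and compare it to the object's identifier.
    if hashlib.sha256(data).hexdigest() != digest:
        raise CorruptionError(f"content hash mismatch for {digest[:12]}")
    return data


objects = {}
key = store(objects, b"segment payload")
print(read_verified(objects, key))

# Simulate bit rot by changing one byte of the stored object.
objects[key] = b"segment pAyload"
try:
    read_verified(objects, key)
except CorruptionError as err:
    print("detected:", err)
```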

Semantic Chunk Compression

Beyond behavioral regime compression, the unified storage layer applies semantic-aware compression that understands the structure of your data.

Structure-Aware

Per-column encoding

Each data type is encoded with a strategy chosen automatically for that type: the engine applies multi-stage compression and selects the best approach per field, with no manual tuning.

Cross-Entity

Fleet-level compression

Entities of the same type often share structural similarities. The compression engine identifies common patterns across entities in the same domain and compresses against shared baselines, achieving an additional 2-5x compression beyond per-entity behavioral compression.
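
One well-known way to compress against a shared baseline is a preset dictionary, which Python's `zlib` exposes through the `zdict` argument. The sketch below uses that technique as a stand-in; it is not a description of VectorScaleDB's actual fleet-level codec, and the device configuration fields are invented for the example.

```python
import json
import zlib


def compress_with_baseline(payload: bytes, baseline: bytes) -> bytes:
    """Deflate `payload`, letting it reference byte sequences in `baseline`."""
    compressor = zlib.compressobj(level=9, zdict=baseline)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_baseline(blob: bytes, baseline: bytes) -> bytes:
    """Inverse of compress_with_baseline; requires the same baseline bytes."""
    decompressor = zlib.decompressobj(zdict=baseline)
    return decompressor.decompress(blob) + decompressor.flush()


# Hypothetical shared baseline: a typical configuration for one device type.
template = {
    "type": "pump",
    "firmware": "2.4.1",
    "sample_rate_hz": 100,
    "units": "mm/s",
    "thresholds": {"warn": 4.5, "alarm": 7.1},
    "location": "unassigned",
}
baseline = json.dumps(template, sort_keys=True).encode()

# One device differs from the baseline in a single field.
device = json.dumps({**template, "location": "north-plant-3"}, sort_keys=True).encode()

plain = zlib.compress(device, 9)
shared = compress_with_baseline(device, baseline)
print(len(device), len(plain), len(shared))
```

Because the device record is almost entirely present in the baseline, the compressed form mostly encodes back-references plus the one changed field. The baseline must be available at read time, which is the cost of this approach.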

Queryable

Compression-transparent queries

All compression is transparent to the query layer. Queries operate on compressed data directly when possible — centroid comparisons, range checks, and bloom filter lookups happen without decompression. Only final result materialization requires full decompression.
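
The range-check case can be illustrated with per-chunk summaries: if each compressed chunk carries its minimum and maximum, a range query can skip non-overlapping chunks without decompressing them. The `Chunk` layout, `make_chunk`, and `query_range` below are hypothetical, with `zlib` and JSON standing in for the engine's real encoding.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class Chunk:
    min_value: float
    max_value: float
    payload: bytes  # zlib-compressed JSON list of values


def make_chunk(values):
    return Chunk(min(values), max(values), zlib.compress(json.dumps(values).encode()))


def query_range(chunks, low, high):
    """Return values in [low, high] and the number of chunks decompressed."""
    results, decompressed = [], 0
    for chunk in chunks:
        # Range check against the summary: non-overlapping chunks are skipped
        # without touching the compressed payload.
        if chunk.max_value < low or chunk.min_value > high:
            continue
        decompressed += 1
        values = json.loads(zlib.decompress(chunk.payload))
        results.extend(v for v in values if low <= v <= high)
    return results, decompressed


chunks = [
    make_chunk([1.0, 2.0, 3.0]),
    make_chunk([10.0, 11.0, 12.0]),
    make_chunk([20.0, 21.0]),
]
values, touched = query_range(chunks, 10.5, 12.0)
print(values, touched)
```

Only the middle chunk overlaps the requested range, so it is the only one materialized.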