Now Public - v0.1.0 on PyPI

TurboAgents

Supercharge Your Agents with TurboQuant

Keep your stack. Add compressed retrieval.

TurboQuant-style KV-cache and vector compression for agent and RAG systems. Works with your existing vector store, retrieval layer, and agent runtime. Framework-agnostic. Benchmark-first.

TurboQuant · Chroma · LanceDB · SurrealDB · MLX Native · Apache 2.0
```shell
$ cd your-rag-project
$ turboagents bench rag
TurboAgents v0.1.0a2 - compression infrastructure
Walsh-Hadamard rotation + PolarQuant encoding
Chroma adapter → recall@10 = 1.0 validated
FAISS adapter → recall@10 = 1.0 validated
Benchmark complete.
```

How It Works

Sits Between Your Embeddings and Your Results

TurboAgents applies compressed scoring and reranking between your vector database and your agent or RAG pipeline. Simple by design.

Your Vector DB

Chroma · FAISS · pgvector · LanceDB · SurrealDB

TurboAgents

Compressed scoring & reranking

Your Agent / RAG

Final optimized results

The design principle: keep your framework, keep your vector database, add TurboAgents where retrieval cost and memory pressure start to hurt. No giant migration plan required.

See the full architecture

Quantization pipeline, adapter integration, and benchmark methodology - all documented.

What TurboAgents Does

Compressed retrieval, vector reranking, KV-style optimization, and benchmark-first validation - all in one package.

KV-Cache Compression

Walsh-Hadamard rotation with PolarQuant-style angle/radius encoding. Compress KV-cache for extended context windows in local and server inference.
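Both building blocks are public techniques: a Walsh-Hadamard rotation spreads energy evenly across dimensions, and polar-style encoding stores each 2D slice of the rotated vector as a quantized radius and angle. A minimal NumPy sketch of the idea follows; this is not TurboAgents' actual kernels, and the 4-bit widths and pairing scheme are illustrative assumptions:

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    x /= np.sqrt(n)  # orthonormal scaling, so the rotation preserves norms
    return x

def polar_encode(v, angle_bits=4, radius_bits=4):
    """Pair up dimensions and quantize each pair's (radius, angle)."""
    pairs = v.reshape(-1, 2)
    r = np.hypot(pairs[:, 0], pairs[:, 1])
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])  # in [-pi, pi]
    r_max = r.max() or 1.0
    q_r = np.round(r / r_max * (2**radius_bits - 1)).astype(np.uint8)
    q_t = np.round((theta + np.pi) / (2 * np.pi) * (2**angle_bits - 1)).astype(np.uint8)
    return q_r, q_t, r_max

def polar_decode(q_r, q_t, r_max, angle_bits=4, radius_bits=4):
    r = q_r / (2**radius_bits - 1) * r_max
    theta = q_t / (2**angle_bits - 1) * 2 * np.pi - np.pi
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1).reshape(-1)

rng = np.random.default_rng(0)
v = fwht(rng.standard_normal(128))
q_r, q_t, r_max = polar_encode(v)       # 128 floats -> 64 bytes of codes + 1 scale
v_hat = polar_decode(q_r, q_t, r_max)
print("relative error:", np.linalg.norm(v - v_hat) / np.linalg.norm(v))
```

The rotation matters because quantizers waste range on outlier coordinates; after the Hadamard rotation, every coordinate looks roughly Gaussian, so a fixed grid covers all of them well.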

Vector Payload Compression

Reduce vector storage costs and scale retrieval systems without replacing your existing vector database backend.

Quality/Latency Benchmarking

Explicit, measurable recall metrics with CLI-driven benchmarks. Know exactly what you trade before you commit.
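Recall@k itself is a simple metric: the fraction of queries whose true nearest neighbor still appears in the compressed top-k. A self-contained sketch of that measurement, where the uniform quantizer is a crude stand-in for TurboQuant encoding, not the real pipeline:

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of queries whose true nearest neighbor shows up in the top-k."""
    hits = sum(1 for got, want in zip(retrieved, ground_truth) if want in got[:k])
    return hits / len(ground_truth)

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64)).astype(np.float32)
queries = rng.standard_normal((20, 64)).astype(np.float32)

def top_k(q, vecs, k=10):
    return np.argsort(-(vecs @ q))[:k]

scale = np.abs(db).max() / 7
quantized = np.round(db / scale) * scale  # crude 15-level stand-in for 4-bit codes

truth = [top_k(q, db)[0] for q in queries]       # exact nearest neighbor per query
approx = [top_k(q, quantized) for q in queries]  # search over compressed vectors
print(f"recall@10 = {recall_at_k(approx, truth):.2f}")
```

The point of reporting the number explicitly: a recall@10 of 1.0 means compression changed nothing your retriever would notice, while 0.7 means roughly 3 in 10 queries lost their best match.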

Multi-Backend Adapters

Validated adapters for Chroma, FAISS, LanceDB, pgvector, and SurrealDB. Preserves your existing infrastructure investment.

MLX & llama.cpp Support

First-class MLX integration with a validated 3.5-bit sweet spot on 3B models. Experimental vLLM support for server-side inference.

Validated Results

Every result is validated, documented, and reproducible. Real recall metrics across all supported backends.

Validated Results

Backend Benchmark Coverage

Every adapter is benchmarked with real recall metrics.

| Backend   | Recall@10 | Bits  | Status     |
|-----------|-----------|-------|------------|
| Chroma    | 1.0       | 4-bit | Validated  |
| FAISS     | 1.0       | 4-bit | Validated  |
| pgvector  | 0.897     | 4-bit | Validated  |
| LanceDB   | 0.70-0.75 | 4-bit | Validated  |
| SurrealDB | N/A       | 4-bit | Integrated |

Full benchmark methodology and results at superagenticai.github.io/turboagents/benchmarks

Run the benchmarks yourself

CLI-driven benchmarks with turboagents bench rag. Compare backends, measure tradeoffs, decide with data.

Native Adapters

Built-In Vector Store Adapters

Drop-in compressed retrieval for the vector databases you already use. Each adapter wraps the native client and adds TurboQuant-style scoring and reranking.

Chroma

TurboChroma

Persistent compressed retrieval with collection-level storage. Wraps the Chroma client and adds quantized scoring without changing your schema.

Recall@10: 1.0

LanceDB

TurboLanceDB

Local or remote LanceDB with table-level compressed retrieval. URI-based configuration for flexible deployment topologies.

Recall@10: 0.70-0.75

SurrealDB

TurboSurrealDB

WebSocket-connected SurrealDB with namespace and database isolation. Compressed vector storage for multi-tenant agent deployments.

Status: Integrated
FAISS

TurboFAISS

In-memory compressed retrieval. No persistence needed.

Recall@10: 1.0

pgvector

TurboPgvector

PostgreSQL-native compressed retrieval for production setups.

Recall@10: 0.897
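All five adapters share the same drop-in pattern: run a cheap scoring pass over quantized codes, then rerank a short-list at full precision. A generic sketch of that pattern under stated assumptions; the class and method names here are illustrative, not TurboAgents' documented API:

```python
import numpy as np

class CompressedStore:
    """Illustrative wrapper pattern: keep the backend's exact vectors,
    score against a small quantized copy, rerank the short-list exactly."""

    def __init__(self, vectors, bits=4):
        self.vectors = np.asarray(vectors, dtype=np.float32)
        levels = 2 ** (bits - 1) - 1               # symmetric signed grid
        self.scale = np.abs(self.vectors).max() / levels
        self.codes = np.round(self.vectors / self.scale).astype(np.int8)

    def query(self, q, k=10, shortlist=50):
        coarse = (self.codes * self.scale) @ q     # cheap pass over compressed codes
        cand = np.argsort(-coarse)[:shortlist]
        fine = self.vectors[cand] @ q              # exact rerank on the short-list only
        return cand[np.argsort(-fine)[:k]]

rng = np.random.default_rng(0)
store = CompressedStore(rng.standard_normal((1000, 64)))
q = rng.standard_normal(64).astype(np.float32)
ids = store.query(q)
print(ids[:3])
```

The short-list rerank is why recall stays high even at aggressive bit-widths: quantization only has to get the candidate set roughly right, not the final ordering.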

Beyond Retrieval

Inference Runtime Engines

TurboAgents compresses KV-cache payloads so local and server-side inference can hold more context. Wrapper-first integration with MLX, llama.cpp, and vLLM - without pretending native kernels exist where they do not.

MLX

Apple Silicon
Validated

3.5 bits identified as the best quality/performance tradeoff on Llama-3.2-3B-Instruct-4bit. First-class local serving via turboagents serve --backend mlx.

llama.cpp

Cross-Platform
Supported

Wrapper for building and inspecting runtime command paths. Integrates with existing GGUF model workflows for compressed inference on commodity hardware.

vLLM

Server-Side
Experimental

Plugin scaffold for vLLM-based server deployments. Dry-run mode builds the correct serve commands for production exploration.

```shell
# Local MLX inference
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run

# llama.cpp on GGUF models
turboagents serve --backend llamacpp --model model.gguf --dry-run

# vLLM server-side
turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run
```

Explore the runtime engine layer

Architecture docs cover the wrapper-first strategy, runtime detection, and how compression integrates with each inference backend.

Developer Experience

CLI-First Toolkit

Environment inspection, benchmarking, compression, and runtime serving - all from a single CLI entry point.

turboagents doctor

System diagnostics - platform, Python version, optional packages, adapter status for llama.cpp, MLX, and vLLM

turboagents bench kv --format json

Synthetic KV-style reconstruction metrics across bit-widths. Measure compression quality before committing.

turboagents bench rag --format markdown

Synthetic retrieval metrics across bit-widths for all supported vector backends.

turboagents compress --input vectors.npy --bits 3.5 --head-dim 128

Direct vector compression to serialized payloads. Specify bit-width, head dimension, and seed for reproducibility.
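The storage math behind fractional bit-widths is back-of-envelope friendly. Ignoring scales and metadata overhead (which the real payload format also carries), a 128-dim float32 vector drops from 512 bytes to 56 at 3.5 bits per dimension:

```python
def payload_bytes(dims, bits):
    """Raw storage per vector at a given bit-width (overhead ignored)."""
    return dims * bits / 8

dims = 128                        # matches --head-dim 128
fp32 = payload_bytes(dims, 32)    # 512.0 bytes
q35 = payload_bytes(dims, 3.5)    # 56.0 bytes
print(f"{fp32:.0f} B -> {q35:.0f} B  ({fp32 / q35:.1f}x smaller)")
```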

turboagents serve --backend mlx --model [model] --dry-run

Build and inspect runtime serve commands for MLX, llama.cpp, or vLLM backends. Dry-run by default.

Three Ways to Integrate

TurboAgents works as an agent runtime layer, RAG middleware, or standalone evaluation tooling - depending on where you need compression.

1. Agent Runtime Layer

turboagents.engines.*

Add beneath existing agent frameworks using turboagents.engines.mlx, llamacpp, or vllm. Compress KV-cache payloads so inference holds more context without modifying application-level agent code.

2. RAG Middleware

turboagents.rag.*

Integrate at the retrieval layer using TurboFAISS, TurboChroma, TurboLanceDB, TurboSurrealDB, or TurboPgvector. Native client or sidecar patterns depending on the backend.

3. Evaluation Tooling

turboagents bench / compress

Use the CLI for compression fit assessment before deeper integration. Run bench kv, bench rag, or bench paper to understand tradeoffs with your specific data.

See all integration patterns

The architecture docs cover native vs sidecar patterns, engine wrappers, and how to evaluate compression fit for your specific stack.

Get Started

Install in Seconds. Benchmark Immediately.

Three install paths depending on what you need. Then run your first benchmark.

1. Core install

```shell
uv add turboagents
```
2. Add retrieval adapters or MLX support

```shell
uv add "turboagents[rag]"   # Retrieval adapters
uv add "turboagents[mlx]"   # MLX support
uv add "turboagents[all]"   # Everything
```
3. Run your first benchmark

```shell
turboagents bench rag --backend chroma
```

Full setup guide and examples

Adapter configuration, benchmark options, framework integration patterns, and MLX optimization - all in the docs.

Reference Integration

TurboAgents + SuperOptiX

TurboAgents is standalone, but SuperOptiX is the first full reference integration - end-to-end compressed retrieval inside a real agent framework.

turboagents-chroma

Chroma-backed compressed retrieval

Validated
turboagents-faiss

FAISS-backed compressed retrieval

Validated
turboagents-lancedb

LanceDB-backed compressed retrieval

Validated
turboagents-surrealdb

SurrealDB-backed compressed retrieval

Validated
```shell
# Install SuperOptiX with TurboAgents
uv pip install "superoptix[turboagents]"
```

TurboAgents can operate as standalone infrastructure or power GEPA vector-store backends and shared RAG retrievers inside SuperOptiX. Four validated backends, full playbook integration via YAML RAG blocks.

See It in Action

TurboAgents Demo

Watch the walkthrough covering installation, CLI benchmarks, adapter integration, and compressed retrieval in practice.

Follow along with the turboagents-demo repo.

Ready to Compress Your Retrieval Stack?

Install TurboAgents, run it against your vector store, and see where it helps. Docs, benchmarks, adapters, and examples - everything is ready.

```shell
uv add turboagents && turboagents bench rag
```