
TurboAgents
Supercharge Your Agents with TurboQuant
Keep your stack. Add compressed retrieval.
TurboQuant-style KV-cache and vector compression for agent and RAG systems. Works with your existing vector store, retrieval layer, and agent runtime. Framework-agnostic. Benchmark-first.
How It Works
Sits Between Your Embeddings and Your Results
TurboAgents applies compressed scoring and reranking between your vector database and your agent or RAG pipeline. Simple by design.
Your Vector DB
Chroma · FAISS · pgvector · LanceDB · SurrealDB
TurboAgents
Compressed scoring & reranking
Your Agent / RAG
Final optimized results
The design principle: keep your framework, keep your vector database, add TurboAgents where retrieval cost and memory pressure start to hurt. No giant migration plan required.
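The compressed-scoring-and-reranking stage can be pictured with plain NumPy. The sketch below is illustrative of the pattern only, not the TurboAgents API: all documents are scored cheaply against an int8 payload, then a short candidate list is reranked with the full-precision vectors. Every function name here is hypothetical.

```python
import numpy as np

def compress_int8(vectors):
    # Per-row symmetric int8 quantization; one float scale per vector.
    scale = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    codes = np.round(vectors / scale).astype(np.int8)
    return codes, scale.astype(np.float32)

def search_then_rerank(query, vectors, codes, scales, k=5, shortlist=20):
    # Stage 1: cheap approximate scoring against the compressed payload.
    approx = (codes.astype(np.float32) * scales) @ query
    candidates = np.argsort(-approx)[:shortlist]
    # Stage 2: exact rerank of the shortlist with full-precision vectors.
    exact = vectors[candidates] @ query
    return candidates[np.argsort(-exact)[:k]]

rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)
codes, scales = compress_int8(docs)
top = search_then_rerank(q, docs, codes, scales)
```

A real deployment would score directly on the compressed codes rather than dequantizing, but the two-stage shape, coarse scan then exact rerank, is the point.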
See the full architecture
Quantization pipeline, adapter integration, and benchmark methodology - all documented.
What TurboAgents Does
Compressed retrieval, vector reranking, KV-style optimization, and benchmark-first validation - all in one package.
KV-Cache Compression
Walsh-Hadamard rotation with PolarQuant-style angle/radius encoding. Compress KV-cache for extended context windows in local and server inference.
Vector Payload Compression
Reduce vector storage costs and scale retrieval systems without replacing your existing vector database backend.
Quality/Latency Benchmarking
Explicit, measurable recall metrics with CLI-driven benchmarks. Know exactly what you trade before you commit.
Multi-Backend Adapters
Validated adapters for Chroma, FAISS, LanceDB, pgvector, and SurrealDB. Preserves your existing infrastructure investment.
MLX & llama.cpp Support
First-class MLX integration with a validated 3.5-bit sweet spot on 3B models. Experimental vLLM support for server-side inference.
Validated Results
Every result is validated, documented, and reproducible. Real recall metrics across all supported backends.
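To make the KV-cache card above concrete, here is a toy sketch of the rotate-then-encode idea: apply a normalized Walsh-Hadamard rotation, then quantize consecutive coordinate pairs as (radius, angle), loosely in the PolarQuant spirit. This is not TurboAgents internals; the function names and bit-widths are illustrative assumptions.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two. The normalized
    # matrix is symmetric and orthogonal, so it is its own inverse.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def polar_encode(x, radius_bits=4, angle_bits=4):
    # Rotate to spread outliers, then encode coordinate pairs in polar form.
    H = hadamard(x.shape[-1])
    pairs = (x @ H).reshape(-1, 2)
    r = np.linalg.norm(pairs, axis=1)
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])          # in [-pi, pi]
    r_max = float(r.max()) or 1.0
    q_r = np.round(r / r_max * (2**radius_bits - 1)).astype(np.uint8)
    q_t = np.round((theta + np.pi) / (2 * np.pi) * (2**angle_bits - 1)).astype(np.uint8)
    return q_r, q_t, r_max

def polar_decode(q_r, q_t, r_max, dim, radius_bits=4, angle_bits=4):
    r = q_r / (2**radius_bits - 1) * r_max
    theta = q_t / (2**angle_bits - 1) * 2 * np.pi - np.pi
    pairs = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    return pairs.reshape(-1, dim) @ hadamard(dim)         # undo the rotation
```

At 4+4 bits per pair this already reconstructs random vectors with high cosine fidelity; the production trade-off is what the benchmarks quantify.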
Validated Results
Backend Benchmark Coverage
Every adapter is benchmarked with real recall metrics.
Full benchmark methodology and results at superagenticai.github.io/turboagents/benchmarks
Run the benchmarks yourself
CLI-driven benchmarks with turboagents bench rag. Compare backends, measure tradeoffs, decide with data.
Native Adapters
Built-In Vector Store Adapters
Drop-in compressed retrieval for the vector databases you already use. Each adapter wraps the native client and adds TurboQuant-style scoring and reranking.
Chroma
TurboChroma
Persistent compressed retrieval with collection-level storage. Wraps the Chroma client and adds quantized scoring without changing your schema.
LanceDB
TurboLanceDB
Local or remote LanceDB with table-level compressed retrieval. URI-based configuration for flexible deployment topologies.
SurrealDB
TurboSurrealDB
WebSocket-connected SurrealDB with namespace and database isolation. Compressed vector storage for multi-tenant agent deployments.
FAISS
TurboFAISS
In-memory compressed retrieval. No persistence needed.
pgvector
TurboPgvector
PostgreSQL-native compressed retrieval for production setups.
Beyond Retrieval
Inference Runtime Engines
TurboAgents compresses KV-cache payloads so local and server-side inference can hold more context. Wrapper-first integration with MLX, llama.cpp, and vLLM - without pretending native kernels exist where they do not.
MLX
Apple Silicon
3.5 bits identified as the best quality/performance tradeoff on Llama-3.2-3B-Instruct-4bit. First-class local serving via turboagents serve --backend mlx.
llama.cpp
Cross-Platform
Wrapper for building and inspecting runtime command paths. Integrates with existing GGUF model workflows for compressed inference on commodity hardware.
vLLM
Server-Side
Plugin scaffold for vLLM-based server deployments. Dry-run mode builds the correct serve commands for production exploration.
❯ # Local MLX inference
❯ turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
❯ # llama.cpp on GGUF models
❯ turboagents serve --backend llamacpp --model model.gguf --dry-run
❯ # vLLM server-side
❯ turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run
Explore the runtime engine layer
Architecture docs cover the wrapper-first strategy, runtime detection, and how compression integrates with each inference backend.
Developer Experience
CLI-First Toolkit
Environment inspection, benchmarking, compression, and runtime serving - all from a single CLI entry point.
turboagents doctor
System diagnostics - platform, Python version, optional packages, and adapter status for llama.cpp, MLX, and vLLM.
turboagents bench kv --format json
Synthetic KV-style reconstruction metrics across bit-widths. Measure compression quality before committing.
turboagents bench rag --format markdown
Synthetic retrieval metrics across bit-widths for all supported vector backends.
turboagents compress --input vectors.npy --bits 3.5 --head-dim 128
Direct vector compression to serialized payloads. Specify bit-width, head dimension, and seed for reproducibility.
turboagents serve --backend mlx --model [model] --dry-run
Build and inspect runtime serve commands for MLX, llama.cpp, or vLLM backends. Dry-run by default.
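What a bench like bench kv measures can be approximated in a few lines: sweep bit-widths over the same tensor and compare reconstruction error. This is a hypothetical uniform quantizer for intuition only, not the tool's actual metric; fractional bit-widths such as 3.5 are modeled by rounding the level count.

```python
import numpy as np

def quantize_dequantize(x, bits):
    # Uniform quantization over the tensor's range; fractional bit-widths
    # (e.g. 3.5) are modeled by rounding the number of levels.
    levels = int(round(2 ** bits)) - 1
    lo, hi = x.min(), x.max()
    q = np.round((x - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 128))  # stand-in for one KV block
for bits in (2, 3, 3.5, 4, 8):
    mse = float(np.mean((kv - quantize_dequantize(kv, bits)) ** 2))
    print(f"{bits}-bit  mse={mse:.6f}")
```

Error falls monotonically with bit-width; the interesting question the real benchmarks answer is where added bits stop paying for themselves on your model and data.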
Three Ways to Integrate
TurboAgents works as an agent runtime layer, RAG middleware, or standalone evaluation tooling - depending on where you need compression.
Agent Runtime Layer
turboagents.engines.*
Add beneath existing agent frameworks using turboagents.engines.mlx, llamacpp, or vllm. Compress KV-cache payloads so inference holds more context without modifying application-level agent code.
RAG Middleware
turboagents.rag.*
Integrate at the retrieval layer using TurboFAISS, TurboChroma, TurboLanceDB, TurboSurrealDB, or TurboPgvector. Native client or sidecar patterns depending on the backend.
Evaluation Tooling
turboagents bench / compress
Use the CLI for compression fit assessment before deeper integration. Run bench kv, bench rag, or bench paper to understand tradeoffs with your specific data.
See all integration patterns
The architecture docs cover native vs sidecar patterns, engine wrappers, and how to evaluate compression fit for your specific stack.
Get Started
Install in Seconds. Benchmark Immediately.
Three install paths depending on what you need. Then run your first benchmark.
Core install
❯ uv add turboagents
Add retrieval adapters or MLX support
❯ uv add "turboagents[rag]"   # Retrieval adapters
❯ uv add "turboagents[mlx]"   # MLX support
❯ uv add "turboagents[all]"   # Everything
Run your first benchmark
❯ turboagents bench rag --backend chroma
Full setup guide and examples
Adapter configuration, benchmark options, framework integration patterns, and MLX optimization - all in the docs.
Reference Integration
TurboAgents + SuperOptiX
TurboAgents is standalone, but SuperOptiX is the first full reference integration - end-to-end compressed retrieval inside a real agent framework.
turboagents-chroma
Chroma-backed compressed retrieval
turboagents-faiss
FAISS-backed compressed retrieval
turboagents-lancedb
LanceDB-backed compressed retrieval
turboagents-surrealdb
SurrealDB-backed compressed retrieval
❯ # Install SuperOptiX with TurboAgents
❯ uv pip install "superoptix[turboagents]"
TurboAgents can operate as standalone infrastructure or power GEPA vector-store backends and shared RAG retrievers inside SuperOptiX. Four validated backends, full playbook integration via YAML RAG blocks.
Deep Dive
Read the full announcement for the design philosophy and the story behind TurboAgents.
See It in Action
TurboAgents Demo
Watch the walkthrough covering installation, CLI benchmarks, adapter integration, and compressed retrieval in practice.
Ready to Compress Your Retrieval Stack?
Install TurboAgents, run it against your vector store, and see where it helps. Docs, benchmarks, adapters, and examples - everything is ready.

