Semble: Intelligent Code Search That Slashes Token Usage by 98%

The Problem with Traditional Code Search for AI Agents

When AI coding assistants such as Claude Code work through large codebases, they often rely on grep to locate relevant code. But grep is a blunt instrument: it matches text literally, line by line, so the agent spends large numbers of tokens reading its output and still frequently misses the right matches. The result is wasted compute, slower responses, and incomplete context for the agent. Existing alternatives demand GPU-powered indexing, require API keys, or suffer from poor retrieval quality. Developers need a tool that is fast, accurate, and economical with tokens.

Source: hnrss.org

Introducing Semble: A Token-Efficient Alternative

Semble is an open-source code search engine built specifically for AI agents. Developed by Stephan and Thomas, it addresses the token waste problem head-on. By combining static Model2Vec embeddings (using their custom model, potion-code-16M) with BM25, fused via Reciprocal Rank Fusion (RRF) and reranked using code-aware signals, Semble achieves state-of-the-art retrieval without any transformers. This means everything runs on CPU, making it accessible and inexpensive.
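
To make the keyword half of that combination concrete, here is a minimal sketch of BM25 scoring in Python. It is a textbook formulation and not Semble's implementation: the tokenizer, the parameter values k1 = 1.2 and b = 0.75, and the toy snippets are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: split on non-alphanumerics so that
    # snake_case identifiers such as parse_config become two terms.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is length-normalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


snippets = [
    "def parse_config(path): return load(path)",
    "def render_page(template): return html",
    "class ConfigLoader: def load(self, path): pass",
]
scores = bm25_scores("parse config", snippets)
best = max(range(len(snippets)), key=lambda i: scores[i])
```

In this toy run only the first snippet contains both query terms, so it receives the only non-zero score. The third snippet, which is about configuration but spells it as ConfigLoader, scores zero: exactly the kind of miss that the embedding side of a hybrid search is meant to cover.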

How It Works

The key is the hybrid approach: static embeddings capture semantic meaning without the overhead of running a transformer model, while BM25 provides traditional keyword matching. RRF blends the two rankings, and a lightweight reranking step refines the results using code-specific heuristics. The entire pipeline is optimized for speed: indexing a repository typically takes about 250 milliseconds, and each query completes in about 1.5 milliseconds on CPU.
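
The fusion step can be illustrated with a minimal sketch of Reciprocal Rank Fusion in Python. This is not Semble's actual code: the function name, the file names, and the constant k = 60 (a commonly used default for RRF) are assumptions made for illustration, and the article does not state which value Semble uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical outputs of the two retrievers for the same query.
semantic_ranking = ["auth.py", "session.py", "utils.py"]
keyword_ranking = ["session.py", "tokens.py", "auth.py"]

fused_ranking = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused_ranking)  # ['session.py', 'auth.py', 'tokens.py', 'utils.py']
```

Because RRF uses only rank positions, the embedding similarities and BM25 scores never need to be put on a common scale. Here session.py comes out first because both retrievers rank it near the top, while auth.py is pulled down by its third place in the keyword list.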

Benchmark Performance: Almost Perfect Accuracy

On a benchmark of approximately 1,250 query/document pairs across 63 repositories and 19 programming languages, Semble delivers remarkable results:

These numbers show that Semble nearly matches the retrieval quality of much heavier transformer models while being dramatically faster and more token-efficient.

Key Features

Getting Started

Integrating Semble with Claude Code is a one-liner:

claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

For other environments (Cursor, Codex, OpenCode), check the README for detailed instructions.

Why This Matters for AI Agents

Agents work in loops: they ask a question, gather context, then act. Every token spent on grep output or on reading full files adds latency and cost. By cutting token usage by 98%, Semble allows agents to operate faster, handle larger codebases, and stay within budget. Because it runs on CPU with no external dependencies, it works out of the box, which makes it well suited to local, offline, or air-gapped environments.
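
A short worked example shows what a 98% reduction means over a whole agent loop. The baseline of 50,000 tokens per lookup and the 20 lookups per task are hypothetical numbers chosen only for the arithmetic; they are not measurements from the Semble benchmark.

```python
def tokens_after_reduction(baseline_tokens, reduction=0.98):
    """Tokens remaining after cutting a baseline by the given fraction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


# Hypothetical agent task: 20 context lookups of 50,000 tokens each.
lookups_per_task = 20
baseline_per_lookup = 50_000

baseline_total = lookups_per_task * baseline_per_lookup
reduced_total = lookups_per_task * tokens_after_reduction(baseline_per_lookup)

print(baseline_total)  # 1000000
print(reduced_total)   # 20000
```

Under these assumptions, a task that would have spent a million tokens on context gathering spends twenty thousand instead, and the saving grows with every additional iteration of the loop.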

Conclusion

Semble proves that you don’t need massive transformer models for high-quality code retrieval. Its hybrid approach offers a practical, efficient solution for AI coding tools. Whether you’re building a custom agent or using Claude Code, Semble can dramatically reduce token consumption while maintaining near-perfect retrieval accuracy. Try it today and see the difference.

For more details, including the full benchmark methodology and model weights, visit the Semble repository and the benchmarks page. The static model is available on Hugging Face.
