GitHub - Muvon/octocode: Semantic code searcher and codebase utility

https://github.com/Muvon/octocode • Jun 8, 2025 19:44

Excerpt

Semantic code searcher and codebase utility.

Summary

Main Summary

Octocode, developed by Muvon Un Limited, is a tool designed to change how developers work with complex codebases. It is an intelligent code indexer and semantic search engine that goes beyond traditional indexing by building detailed knowledge graphs of the entire codebase. Its value lies in combining advanced AI capabilities with a local-first design, which enables deep code understanding, relationship mapping between components, and intelligent assistance. It supports navigation and analysis through natural language queries, automatic generation of architectural descriptions, and dependency identification. It also improves development workflows with AI-powered commit message generation and code review, and it integrates with the AI tooling ecosystem through Model Context Protocol (MCP) support and performance optimizations.

Key Elements

Knowledge Graphs (GraphRAG) and Semantic Search: Octocode builds knowledge graphs that automatically discover relationships between files and modules, track dependencies, and generate AI-written architectural descriptions. This is complemented by semantic search, which lets developers run natural language queries across all of their code, documentation, and text, with results ranked by similarity scoring and broadened by symbol expansion.
Built-in AI Capabilities and Memory System: The software includes AI features aimed at productivity, such as commit message generation, code review based on best practices, and a persistent memory system. The memory system lets developers store and retrieve important insights, decisions, and context through semantic search, which supports long-term maintenance and understanding of the project.
Broad Multi-Language Support and Optimized Performance: Octocode supports a range of programming languages, including Rust, Python, JavaScript, TypeScript, Go, PHP, C++, and Ruby, among others, and uses Tree-sitter based parsing for accurate symbol extraction. The tool is designed for efficiency, with optimized indexing, smart batching, frequent data persistence, and the Lance columnar database for fast vector search, plus local and cloud embedding options.
MCP Server Integration and Deployment Flexibility…

Content

Octocode - Intelligent Code Indexer and Graph Builder

🚀 Overview

Octocode is a powerful code indexer and semantic search engine that builds intelligent knowledge graphs of your codebase. It combines advanced AI capabilities with local-first design to provide deep code understanding, relationship mapping, and intelligent assistance for developers.

✨ Key Features

🔍 Semantic Code Search

Natural language queries across your entire codebase
Multi-mode search (code, documentation, text, or all)
Intelligent ranking with similarity scoring
Symbol expansion for comprehensive results
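
To make "ranking with similarity scoring" concrete, here is a minimal sketch of ranking indexed chunks by cosine similarity against a query vector. It is an illustration only, not Octocode's implementation: the function names `cosine_similarity` and `rank_chunks` and the toy three-dimensional vectors are assumptions, and real embedding models produce much larger vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, top_k=3):
    """Return the top_k (score, name) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings" keyed by file name.
chunks = {
    "http_client.rs": [0.9, 0.1, 0.0],
    "auth.rs": [0.1, 0.9, 0.2],
    "README.md": [0.3, 0.3, 0.3],
}
results = rank_chunks([1.0, 0.0, 0.0], chunks, top_k=2)
```

In practice the query string would first be embedded with the same model used at index time, so that query and chunk vectors live in the same space.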

🕸️ Knowledge Graph (GraphRAG)

Automatic relationship discovery between files and modules
Import/export dependency tracking
AI-powered file descriptions and architectural insights
Path finding between code components
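
As a rough illustration of path finding between code components, the sketch below runs a breadth-first search over an import graph. The graph contents and the `find_path` helper are hypothetical; the source does not say which algorithm Octocode uses internally.

```python
from collections import deque


def find_path(graph, start, goal):
    """Shortest chain of imports from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], []):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical import graph: each file maps to the files it imports.
imports = {
    "src/main.rs": ["src/server.rs", "src/config.rs"],
    "src/server.rs": ["src/auth/jwt.rs"],
    "src/config.rs": [],
    "src/auth/jwt.rs": [],
}
path = find_path(imports, "src/main.rs", "src/auth/jwt.rs")
```

Breadth-first search returns a path with the fewest hops, which is usually what a "how does A reach B" question wants.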

🌐 Multi-Language Support

Rust, Python, JavaScript, TypeScript, Go, PHP
C++, Ruby, JSON, Bash, Markdown
Tree-sitter based parsing for accurate symbol extraction

🧠 AI-Powered Features

Smart commit message generation
Code review with best practices analysis
Memory system for storing insights, decisions, and context
Semantic memory search with vector similarity
Memory relationships and automatic context linking
Multiple LLM support via OpenRouter

🔌 MCP Server Integration

Built-in Model Context Protocol server
Seamless integration with AI assistants (Claude Desktop, etc.)
Real-time file watching and auto-reindexing
Rich tool ecosystem for code analysis

⚡ Performance & Flexibility

Optimized indexing: Batch metadata loading eliminates database query storms
Smart batching: 16 files per batch with token-aware API optimization
Frequent persistence: Data saved every 16 files (max 16 files at risk)
Fast file traversal: Single-pass progressive counting and processing
Local embedding models: FastEmbed and SentenceTransformer (macOS only)
Cloud embedding providers: Voyage AI (default), Jina AI, Google
Free tier available: Voyage AI provides 200M free tokens monthly
Lance columnar database for fast vector search
Incremental indexing and git-aware optimization
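
The batching and persistence behavior described above can be sketched as follows. This is a simplified model under stated assumptions: `embed` and `persist` are hypothetical callbacks standing in for the embedding provider and the database write, and the token-aware part of the batching is not modeled.

```python
def batches(items, size=16):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_files(paths, embed, persist, batch_size=16):
    """Embed files batch by batch and persist after every batch.

    Persisting per batch means an interruption loses at most one
    batch of work (16 files with the default size).
    """
    indexed = 0
    for batch in batches(paths, batch_size):
        vectors = embed(batch)
        persist(list(zip(batch, vectors)))
        indexed += len(batch)
    return indexed


saved = []  # stands in for the database
count = index_files(
    [f"src/file_{i}.rs" for i in range(40)],
    embed=lambda batch: [[float(len(p))] for p in batch],  # fake embeddings
    persist=saved.append,
)
```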

📦 Installation

Download Prebuilt Binary (Recommended)

# Universal install script (Linux, macOS, Windows) - requires curl
curl -fsSL https://raw.githubusercontent.com/Muvon/octocode/master/install.sh | sh

Or download manually from GitHub Releases.

Using Cargo (from Git)

cargo install --git https://github.com/Muvon/octocode

Build from Source

Prerequisites:

Rust 1.70+ (install from rustup.rs)
Git (for repository features)

git clone https://github.com/Muvon/octocode.git
cd octocode

# macOS: Full build with local embeddings
cargo build --release

# Windows/Linux: Cloud embeddings only (due to ONNX Runtime issues)
cargo build --release --no-default-features

Note: Prebuilt binaries use cloud embeddings only. Local embeddings require building from source on macOS.

🔑 Getting Started - API Keys

⚠️ Important: Octocode needs a Voyage AI API key for embeddings unless you use local embedding models, which are available only when building from source on macOS.

Required: Voyage AI (Embeddings)

export VOYAGE_API_KEY="your-voyage-api-key"

Free tier: 200M tokens per month
Get API key: voyageai.com
Used for: Code and text embeddings (semantic search)

Optional: OpenRouter (LLM Features)

export OPENROUTER_API_KEY="your-openrouter-api-key"

Get API key: openrouter.ai
Used for: Commit messages, code review, GraphRAG descriptions
Note: Basic search and indexing work without this
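
A small preflight check can catch a missing key before indexing starts. The sketch below assumes the default cloud setup described above, where `VOYAGE_API_KEY` is required and `OPENROUTER_API_KEY` is optional; the `check_keys` helper is hypothetical and not part of Octocode.

```python
import os

REQUIRED = ("VOYAGE_API_KEY",)
OPTIONAL = ("OPENROUTER_API_KEY",)


def check_keys(env=None):
    """Return (missing_required, missing_optional) variable names."""
    env = os.environ if env is None else env
    missing_required = [k for k in REQUIRED if not env.get(k)]
    missing_optional = [k for k in OPTIONAL if not env.get(k)]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_keys()
    for name in required:
        print(f"error: {name} is not set; embeddings will not work")
    for name in optional:
        print(f"note: {name} is not set; commit, review, and GraphRAG descriptions are unavailable")
```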

Platform Limitations

Windows/Linux: Must use cloud embeddings (Voyage AI default)
macOS: Can use local embeddings (build from source) or cloud embeddings

🚀 Quick Start

1. Set Up API Keys (Required)

# Set Voyage AI API key for embeddings (free 200M tokens/month)
export VOYAGE_API_KEY="your-voyage-api-key"

# Optional: Set OpenRouter API key for LLM features (commit, review, GraphRAG)
export OPENROUTER_API_KEY="your-openrouter-api-key"

Get your free API keys:

Voyage AI: Get free API key (200M tokens/month free)
OpenRouter: Get API key (optional, for LLM features)

2. Basic Usage

# Index your current directory
octocode index

# Search your codebase
octocode search "HTTP request handling"

# View code signatures
octocode view "src/**/*.rs"

3. AI-Powered Git Workflow (Requires OpenRouter API Key)

# Generate intelligent commit messages
git add .
octocode commit

# Review code for best practices
octocode review

4. MCP Server for AI Assistants

# Start MCP server
octocode mcp

# Use with Claude Desktop or other MCP-compatible tools
# Provides: search_code, search_graphrag, memorize, remember, forget
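
For Claude Desktop, MCP servers are typically registered under an `mcpServers` key in its `claude_desktop_config.json` file. The sketch below generates such an entry for `octocode mcp`. Treat it as an assumption-laden example: the server label `"octocode"` is arbitrary, passing the key through `env` is one possible approach, and the source does not document any additional arguments the server may need, such as a project location.

```python
import json


def mcp_server_entry(voyage_api_key):
    """Build a Claude Desktop style config fragment that launches `octocode mcp`."""
    return {
        "mcpServers": {
            "octocode": {  # arbitrary label chosen for this example
                "command": "octocode",
                "args": ["mcp"],
                "env": {"VOYAGE_API_KEY": voyage_api_key},
            }
        }
    }


entry = mcp_server_entry("your-voyage-api-key")
config_text = json.dumps(entry, indent=2)
```

Printing `config_text` gives a fragment to merge into an existing configuration file rather than overwrite it.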

5. Memory Management

# Store important insights and decisions
octocode memory memorize \
  --title "Authentication Bug Fix" \
  --content "Fixed JWT token validation in auth middleware" \
  --memory-type bug_fix \
  --tags security,jwt,auth

# Search your memory with semantic similarity
octocode memory remember "JWT authentication issues"

# Get memories by type, tags, or files
octocode memory by-type bug_fix
octocode memory by-tags security,auth
octocode memory for-files src/auth.rs

# Clear all memory data (useful for testing)
octocode memory clear-all --yes

6. Advanced Features

# Enable GraphRAG with AI descriptions (requires OpenRouter API key)
octocode config --graphrag-enabled true
octocode index

# Search the knowledge graph
octocode graphrag search --query "authentication modules"

# Watch for changes
octocode watch

📋 Command Reference

Command	Description	Example
`octocode index`	Index the codebase	`octocode index --reindex`
`octocode search <query>`	Semantic code search	`octocode search "error handling"`
`octocode graphrag <operation>`	Knowledge graph operations	`octocode graphrag search --query "auth"`
`octocode view [pattern]`	View code signatures	`octocode view "src/**/*.rs" --md`
`octocode commit`	AI-powered git commit	`octocode commit --all`
`octocode review`	Code review assistant	`octocode review --focus security`
`octocode memory <operation>`	Memory management	`octocode memory remember "auth bugs"`
`octocode mcp`	Start MCP server	`octocode mcp --debug`
`octocode watch`	Auto-reindex on changes	`octocode watch --quiet`
`octocode config`	Manage configuration	`octocode config --show`

🧠 Memory Management

Octocode includes a powerful memory system for storing and retrieving project insights, decisions, and context using semantic search and relationship mapping.

Memory Operations

Command	Description	Example
`memorize`	Store new information	`octocode memory memorize --title "Bug Fix" --content "Details..."`
`remember`	Search memories semantically	`octocode memory remember "authentication issues"`
`forget`	Delete specific memories	`octocode memory forget --memory-id abc123`
`update`	Update existing memory	`octocode memory update abc123 --add-tags security`
`get`	Retrieve memory by ID	`octocode memory get abc123`
`recent`	List recent memories	`octocode memory recent --limit 10`
`by-type`	Filter by memory type	`octocode memory by-type bug_fix`
`by-tags`	Filter by tags	`octocode memory by-tags security,auth`
`for-files`	Find memories for files	`octocode memory for-files src/auth.rs`
`stats`	Show memory statistics	`octocode memory stats`
`cleanup`	Remove old memories	`octocode memory cleanup`
`clear-all`	Delete all memories	`octocode memory clear-all --yes`
`relate`	Create relationships	`octocode memory relate source-id target-id`

Memory Types

code - Code-related insights and patterns
bug_fix - Bug reports and solutions
feature - Feature implementations and decisions
architecture - Architectural decisions and patterns
performance - Performance optimizations and metrics
security - Security considerations and fixes
testing - Test strategies and results
documentation - Documentation notes and updates

Examples

# Store a bug fix with context
octocode memory memorize \
  --title "JWT Token Validation Fix" \
  --content "Fixed race condition in token refresh logic by adding mutex lock" \
  --memory-type bug_fix \
  --importance 0.8 \
  --tags security,jwt,race-condition \
  --files src/auth/jwt.rs,src/middleware/auth.rs

# Search for authentication-related memories
octocode memory remember "JWT authentication problems" \
  --memory-types bug_fix,security \
  --min-relevance 0.7

# Get all security-related memories
octocode memory by-tags security --format json

# Clear all memory data (useful for testing/reset)
octocode memory clear-all --yes
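
When memories are stored from a script, it helps to build the argument list programmatically so that quoting is handled correctly. The sketch below assembles an `octocode memory memorize` invocation using only the flags shown in the examples above; the `memorize_command` helper and its validation against the listed memory types are assumptions of this example.

```python
import shlex

# Memory types as listed in this README.
MEMORY_TYPES = {
    "code", "bug_fix", "feature", "architecture",
    "performance", "security", "testing", "documentation",
}


def memorize_command(title, content, memory_type, tags=(), files=(), importance=None):
    """Return the argv list for `octocode memory memorize`."""
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {memory_type}")
    argv = [
        "octocode", "memory", "memorize",
        "--title", title,
        "--content", content,
        "--memory-type", memory_type,
    ]
    if importance is not None:
        argv += ["--importance", str(importance)]
    if tags:
        argv += ["--tags", ",".join(tags)]
    if files:
        argv += ["--files", ",".join(files)]
    return argv


argv = memorize_command(
    "JWT Token Validation Fix",
    "Fixed race condition in token refresh logic by adding mutex lock",
    "bug_fix",
    tags=["security", "jwt"],
    files=["src/auth/jwt.rs"],
    importance=0.8,
)
shell_line = shlex.join(argv)  # safely quoted, for logging or copy-paste
```

The list can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting issues entirely.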

🔧 Configuration

Octocode stores configuration in ~/.local/share/octocode/config.toml.

Required Setup

# Set Voyage AI API key (required for embeddings)
export VOYAGE_API_KEY="your-voyage-api-key"

# Optional: Set OpenRouter API key for LLM features
export OPENROUTER_API_KEY="your-openrouter-api-key"

Advanced Configuration

# View current configuration
octocode config --show

# Use local models (macOS only - requires building from source)
octocode config \
  --code-embedding-model "fastembed:jinaai/jina-embeddings-v2-base-code" \
  --text-embedding-model "fastembed:sentence-transformers/all-MiniLM-L6-v2-quantized"

# Use different cloud embedding provider
octocode config \
  --code-embedding-model "jina:jina-embeddings-v2-base-code" \
  --text-embedding-model "jina:jina-embeddings-v2-base-en"

# Enable GraphRAG with AI descriptions
octocode config --graphrag-enabled true

# Set custom OpenRouter model
octocode config --model "openai/gpt-4o-mini"

Default Models

Code embedding: voyage:voyage-code-2 (Voyage AI)
Text embedding: voyage:voyage-2 (Voyage AI)
LLM: openai/gpt-4o-mini (via OpenRouter)
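
The embedding model names in this section follow a `provider:model` pattern (for example `voyage:voyage-code-2` or `fastembed:jinaai/jina-embeddings-v2-base-code`). A minimal sketch of splitting such a string is shown below; `parse_embedding_model` is a hypothetical helper and may not match how Octocode parses the value. The LLM setting is different: `openai/gpt-4o-mini` is passed as a model identifier without a provider prefix.

```python
def parse_embedding_model(spec):
    """Split 'provider:model' on the first colon and validate both parts."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


code_provider, code_model = parse_embedding_model("voyage:voyage-code-2")
local_provider, local_model = parse_embedding_model(
    "fastembed:jinaai/jina-embeddings-v2-base-code"
)
```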

Platform Support

Windows/Linux: Cloud embeddings only (Voyage AI, Jina AI, Google)
macOS: Local embeddings available (FastEmbed, SentenceTransformer) + cloud options

📚 Documentation

Architecture - Core components and system design
Configuration - Setup and configuration options
Advanced Usage - Advanced features and workflows
Contributing - Development setup and contribution guidelines
Performance - Performance metrics and optimization tips

🔒 Privacy & Security

🏠 Local-first option: FastEmbed and SentenceTransformer run entirely offline (macOS only)
🔑 Secure storage: API keys stored locally, environment variables supported
📁 Respects .gitignore: Never indexes sensitive files or directories
🛡️ MCP security: Server runs locally with no external network access for search
🌐 Cloud embeddings: Voyage AI and other providers process only file metadata, not source code

🌐 Supported Languages

Language	Extensions	Features
Rust	`.rs`	Full AST parsing, pub/use detection, module structure
Python	`.py`	Import/class/function extraction, docstring parsing
JavaScript	`.js`, `.jsx`	ES6 imports/exports, function declarations
TypeScript	`.ts`, `.tsx`	Type definitions, interface extraction
Go	`.go`	Package/import analysis, struct/interface parsing
PHP	`.php`	Class/function extraction, namespace support
C++	`.cpp`, `.hpp`, `.h`	Include analysis, class/function extraction
Ruby	`.rb`	Class/module extraction, method definitions
JSON	`.json`	Structure analysis, key extraction
Bash	`.sh`, `.bash`	Function and variable extraction
Markdown	`.md`	Document section indexing, header extraction
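
The table above amounts to a lookup from file extension to language. A sketch of that lookup follows, with the mapping copied from the table; the `detect_language` helper is hypothetical, and the real tool may use additional signals beyond the extension.

```python
from pathlib import Path

# Extension-to-language mapping, taken from the table above.
EXTENSIONS = {
    ".rs": "Rust",
    ".py": "Python",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".go": "Go",
    ".php": "PHP",
    ".cpp": "C++", ".hpp": "C++", ".h": "C++",
    ".rb": "Ruby",
    ".json": "JSON",
    ".sh": "Bash", ".bash": "Bash",
    ".md": "Markdown",
}


def detect_language(path):
    """Return the language for a path, or None if the extension is not listed."""
    return EXTENSIONS.get(Path(path).suffix.lower())
```

Files with an unlisted extension return `None`, which a caller could use to skip them or to index them as plain text.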

🤝 Support & Community

🐛 Issues: GitHub Issues
📧 Email: opensource@muvon.io
🏢 Company: Muvon Un Limited (Hong Kong)

⚖️ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Built with ❤️ by the Muvon team in Hong Kong