GitHub - Muvon/octocode: Semantic code searcher and codebase utility
Excerpt
Semantic code searcher and codebase utility.
Summary
Main Summary
Octocode, developed by Muvon Un Limited, is a tool designed to change how developers work with complex codebases. It is an intelligent code indexer and semantic search engine that goes beyond traditional indexing by building detailed knowledge graphs of the entire codebase. Its value lies in combining AI capabilities with a local-first design, which enables deep code understanding, mapping of relationships between components, and intelligent assistance. It supports navigation and analysis through natural-language queries, automatic generation of architectural descriptions, and dependency identification. It also improves development workflows with AI-driven commit message generation and code review, and it integrates with the wider AI tooling ecosystem through Model Context Protocol (MCP) support and a set of performance optimizations.
Key Elements
- Knowledge Graphs (GraphRAG) and Semantic Search: Octocode builds knowledge graphs that automatically discover relationships between files and modules, track dependencies, and generate AI-written architectural descriptions. This is paired with semantic search that lets developers run natural-language queries across their code, documentation, and text, with results ranked by similarity scoring and broadened by symbol expansion.
- Built-in AI Capabilities and Memory System: The software includes AI features aimed at productivity, such as commit message generation, code review based on best practices, and a persistent memory system. The memory system lets developers store and retrieve insights, decisions, and important context through semantic search, which helps with long-term maintenance and understanding of a project.
- Broad Multi-Language Support and Optimized Performance: Octocode supports Rust, Python, JavaScript, TypeScript, Go, PHP, C++, and Ruby, among others, using Tree-sitter based parsing for accurate symbol extraction. It is designed for efficiency, with optimized indexing, smart batching, frequent data persistence, and the Lance columnar database for fast vector search, plus both local and cloud embedding options.
- MCP Server Integration and Deployment Flexibility…
Contents
Octocode - Intelligent Code Indexer and Graph Builder
© 2025 Muvon Un Limited (Hong Kong) | Website | Product Page
🚀 Overview
Octocode is a powerful code indexer and semantic search engine that builds intelligent knowledge graphs of your codebase. It combines advanced AI capabilities with local-first design to provide deep code understanding, relationship mapping, and intelligent assistance for developers.
✨ Key Features
🔍 Semantic Code Search
- Natural language queries across your entire codebase
- Multi-mode search (code, documentation, text, or all)
- Intelligent ranking with similarity scoring
- Symbol expansion for comprehensive results
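As a rough illustration of what "ranking with similarity scoring" usually means in embedding-based search, the sketch below ranks code chunks by cosine similarity to a query vector. This is not Octocode's implementation: the function names, the chunk names, and the three-dimensional toy vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, top_k=3):
    """Return up to top_k (score, name) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings"; real embedding models emit far more dimensions.
chunks = {
    "http_handler": [0.9, 0.1, 0.0],
    "db_pool": [0.0, 0.2, 0.9],
    "router": [0.7, 0.6, 0.1],
}
results = rank_chunks([1.0, 0.2, 0.0], chunks, top_k=2)
for score, name in results:
    print(f"{name}: {score:.3f}")
```

In a real system the query and the chunks would be embedded by the same model, and the nearest-neighbor lookup would be delegated to the vector database rather than computed in a Python loop.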
🕸️ Knowledge Graph (GraphRAG)
- Automatic relationship discovery between files and modules
- Import/export dependency tracking
- AI-powered file descriptions and architectural insights
- Path finding between code components
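To make "path finding between code components" concrete, here is a minimal sketch that assumes the knowledge graph can be viewed as a directed graph whose edges are import relationships. It uses plain breadth-first search; the README does not say which algorithm Octocode uses, and the file names and the `find_path` helper are hypothetical.

```python
from collections import deque


def find_path(graph, start, goal):
    """Shortest chain of edges from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], []):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical import graph: each file maps to the files it imports.
imports = {
    "src/main.rs": ["src/server.rs", "src/config.rs"],
    "src/server.rs": ["src/auth.rs"],
    "src/auth.rs": ["src/jwt.rs"],
    "src/config.rs": [],
}
path = find_path(imports, "src/main.rs", "src/jwt.rs")
print(" -> ".join(path))
```

Breadth-first search returns a path with the fewest hops, which is often what is wanted when asking how two modules are connected.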
🌐 Multi-Language Support
- Rust, Python, JavaScript, TypeScript, Go, PHP
- C++, Ruby, JSON, Bash, Markdown
- Tree-sitter based parsing for accurate symbol extraction
🧠 AI-Powered Features
- Smart commit message generation
- Code review with best practices analysis
- Memory system for storing insights, decisions, and context
- Semantic memory search with vector similarity
- Memory relationships and automatic context linking
- Multiple LLM support via OpenRouter
🔌 MCP Server Integration
- Built-in Model Context Protocol server
- Seamless integration with AI assistants (Claude Desktop, etc.)
- Real-time file watching and auto-reindexing
- Rich tool ecosystem for code analysis
⚡ Performance & Flexibility
- Optimized indexing: Batch metadata loading eliminates database query storms
- Smart batching: 16 files per batch with token-aware API optimization
- Frequent persistence: Data saved every 16 files (max 16 files at risk)
- Fast file traversal: Single-pass progressive counting and processing
- Local embedding models: FastEmbed and SentenceTransformer (macOS only)
- Cloud embedding providers: Voyage AI (default), Jina AI, Google
- Free tier available: Voyage AI provides 200M free tokens monthly
- Lance columnar database for fast vector search
- Incremental indexing and git-aware optimization
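The batching and persistence bullets above describe a common pattern: process files in fixed-size groups and save after each group, so an interruption loses at most one group of work. The sketch below shows that pattern only. The `embed` and `persist` callables are placeholders supplied by the caller, and the token-aware sizing mentioned above is not modeled.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_files(paths, embed, persist, batch_size=16):
    """Embed files batch by batch, persisting after every batch.

    Returns the number of records persisted.
    """
    saved = 0
    for batch in batched(paths, batch_size):
        records = [(path, embed(path)) for path in batch]
        persist(records)  # at most batch_size files are unsaved at any moment
        saved += len(records)
    return saved


# Stand-ins for a real embedding call and a real database write.
store = []
persist_calls = []


def fake_embed(path):
    return [float(len(path))]


def fake_persist(records):
    persist_calls.append(len(records))
    store.extend(records)


paths = [f"src/file_{i}.rs" for i in range(40)]
total = index_files(paths, fake_embed, fake_persist)
print(total, persist_calls)
```

With 40 files and a batch size of 16, the loop persists three times: two full batches and one partial batch.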
📦 Installation
Download Prebuilt Binary (Recommended)
    # Universal install script (Linux, macOS, Windows) - requires curl
    curl -fsSL https://raw.githubusercontent.com/Muvon/octocode/master/install.sh | sh
Or download manually from GitHub Releases.
Using Cargo (from Git)
cargo install --git https://github.com/Muvon/octocode
Build from Source
Prerequisites:
- Rust 1.70+ (install from rustup.rs)
- Git (for repository features)
    git clone https://github.com/Muvon/octocode.git
    cd octocode

    # macOS: Full build with local embeddings
    cargo build --release

    # Windows/Linux: Cloud embeddings only (due to ONNX Runtime issues)
    cargo build --release --no-default-features
Note: Prebuilt binaries use cloud embeddings only. Local embeddings require building from source on macOS.
🔑 Getting Started - API Keys
Required: Voyage AI (Embeddings)
export VOYAGE_API_KEY="your-voyage-api-key"
- Free tier: 200M tokens per month
- Get API key: voyageai.com
- Used for: Code and text embeddings (semantic search)
Optional: OpenRouter (LLM Features)
export OPENROUTER_API_KEY="your-openrouter-api-key"
- Get API key: openrouter.ai
- Used for: Commit messages, code review, GraphRAG descriptions
- Note: Basic search and indexing work without this
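A wrapper script or CI job could check which keys are present before invoking the tool. The sketch below is one way to do that, assuming the default Voyage AI embedding provider. Only the two environment variable names come from this README; the helper name and the returned dictionary keys are made up for the example.

```python
import os


def check_api_keys(env=None):
    """Report which feature groups have a non-empty API key configured."""
    env = os.environ if env is None else env
    return {
        # Needed for indexing and semantic search with the default provider.
        "embeddings": bool(env.get("VOYAGE_API_KEY")),
        # Needed only for commit messages, code review, and GraphRAG descriptions.
        "llm_features": bool(env.get("OPENROUTER_API_KEY")),
    }


status = check_api_keys({"VOYAGE_API_KEY": "your-voyage-api-key"})
print(status)
```

Passing a dictionary instead of reading `os.environ` directly keeps the check easy to test.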
Platform Limitations
- Windows/Linux: Must use cloud embeddings (Voyage AI default)
- macOS: Can use local embeddings (build from source) or cloud embeddings
🚀 Quick Start
1. Setup API Keys (Required)
    # Set Voyage AI API key for embeddings (free 200M tokens/month)
    export VOYAGE_API_KEY="your-voyage-api-key"

    # Optional: Set OpenRouter API key for LLM features (commit, review, GraphRAG)
    export OPENROUTER_API_KEY="your-openrouter-api-key"
Get your free API keys:
- Voyage AI: Get free API key (200M tokens/month free)
- OpenRouter: Get API key (optional, for LLM features)
2. Basic Usage
    # Index your current directory
    octocode index

    # Search your codebase
    octocode search "HTTP request handling"

    # View code signatures
    octocode view "src/**/*.rs"
3. AI-Powered Git Workflow (Requires OpenRouter API Key)
    # Generate intelligent commit messages
    git add .
    octocode commit

    # Review code for best practices
    octocode review
4. MCP Server for AI Assistants
    # Start MCP server
    octocode mcp

    # Use with Claude Desktop or other MCP-compatible tools
    # Provides: search_code, search_graphrag, memorize, remember, forget
5. Memory Management

    # Store important insights and decisions
    octocode memory memorize \
      --title "Authentication Bug Fix" \
      --content "Fixed JWT token validation in auth middleware" \
      --memory-type bug_fix \
      --tags security,jwt,auth

    # Search your memory with semantic similarity
    octocode memory remember "JWT authentication issues"

    # Get memories by type, tags, or files
    octocode memory by-type bug_fix
    octocode memory by-tags security,auth
    octocode memory for-files src/auth.rs

    # Clear all memory data (useful for testing)
    octocode memory clear-all --yes

6. Advanced Features

    # Enable GraphRAG with AI descriptions (requires OpenRouter API key)
    octocode config --graphrag-enabled true
    octocode index

    # Search the knowledge graph
    octocode graphrag search --query "authentication modules"

    # Watch for changes
    octocode watch
📋 Command Reference
| Command | Description | Example |
|---|---|---|
| `octocode index` | Index the codebase | `octocode index --reindex` |
| `octocode search <query>` | Semantic code search | `octocode search "error handling"` |
| `octocode graphrag <operation>` | Knowledge graph operations | `octocode graphrag search --query "auth"` |
| `octocode view [pattern]` | View code signatures | `octocode view "src/**/*.rs" --md` |
| `octocode commit` | AI-powered git commit | `octocode commit --all` |
| `octocode review` | Code review assistant | `octocode review --focus security` |
| `octocode memory <operation>` | Memory management | `octocode memory remember "auth bugs"` |
| `octocode mcp` | Start MCP server | `octocode mcp --debug` |
| `octocode watch` | Auto-reindex on changes | `octocode watch --quiet` |
| `octocode config` | Manage configuration | `octocode config --show` |
🧠 Memory Management
Octocode includes a powerful memory system for storing and retrieving project insights, decisions, and context using semantic search and relationship mapping.
Memory Operations
| Command | Description | Example |
|---|---|---|
| `memorize` | Store new information | `octocode memory memorize --title "Bug Fix" --content "Details..."` |
| `remember` | Search memories semantically | `octocode memory remember "authentication issues"` |
| `forget` | Delete specific memories | `octocode memory forget --memory-id abc123` |
| `update` | Update existing memory | `octocode memory update abc123 --add-tags security` |
| `get` | Retrieve memory by ID | `octocode memory get abc123` |
| `recent` | List recent memories | `octocode memory recent --limit 10` |
| `by-type` | Filter by memory type | `octocode memory by-type bug_fix` |
| `by-tags` | Filter by tags | `octocode memory by-tags security,auth` |
| `for-files` | Find memories for files | `octocode memory for-files src/auth.rs` |
| `stats` | Show memory statistics | `octocode memory stats` |
| `cleanup` | Remove old memories | `octocode memory cleanup` |
| `clear-all` | Delete all memories | `octocode memory clear-all --yes` |
| `relate` | Create relationships | `octocode memory relate source-id target-id` |
Memory Types
- `code` - Code-related insights and patterns
- `bug_fix` - Bug reports and solutions
- `feature` - Feature implementations and decisions
- `architecture` - Architectural decisions and patterns
- `performance` - Performance optimizations and metrics
- `security` - Security considerations and fixes
- `testing` - Test strategies and results
- `documentation` - Documentation notes and updates
Examples
    # Store a bug fix with context
    octocode memory memorize \
      --title "JWT Token Validation Fix" \
      --content "Fixed race condition in token refresh logic by adding mutex lock" \
      --memory-type bug_fix \
      --importance 0.8 \
      --tags security,jwt,race-condition \
      --files src/auth/jwt.rs,src/middleware/auth.rs

    # Search for authentication-related memories
    octocode memory remember "JWT authentication problems" \
      --memory-types bug_fix,security \
      --min-relevance 0.7

    # Get all security-related memories
    octocode memory by-tags security --format json

    # Clear all memory data (useful for testing/reset)
    octocode memory clear-all --yes
🔧 Configuration
Octocode stores configuration in ~/.local/share/octocode/config.toml.
Required Setup
    # Set Voyage AI API key (required for embeddings)
    export VOYAGE_API_KEY="your-voyage-api-key"

    # Optional: Set OpenRouter API key for LLM features
    export OPENROUTER_API_KEY="your-openrouter-api-key"
Advanced Configuration
    # View current configuration
    octocode config --show

    # Use local models (macOS only - requires building from source)
    octocode config \
      --code-embedding-model "fastembed:jinaai/jina-embeddings-v2-base-code" \
      --text-embedding-model "fastembed:sentence-transformers/all-MiniLM-L6-v2-quantized"

    # Use different cloud embedding provider
    octocode config \
      --code-embedding-model "jina:jina-embeddings-v2-base-code" \
      --text-embedding-model "jina:jina-embeddings-v2-base-en"

    # Enable GraphRAG with AI descriptions
    octocode config --graphrag-enabled true

    # Set custom OpenRouter model
    octocode config --model "openai/gpt-4o-mini"
Default Models
- Code embedding: `voyage:voyage-code-2` (Voyage AI)
- Text embedding: `voyage:voyage-2` (Voyage AI)
- LLM: `openai/gpt-4o-mini` (via OpenRouter)
Platform Support
- Windows/Linux: Cloud embeddings only (Voyage AI, Jina AI, Google)
- macOS: Local embeddings available (FastEmbed, SentenceTransformer) + cloud options
📚 Documentation
- Architecture - Core components and system design
- Configuration - Setup and configuration options
- Advanced Usage - Advanced features and workflows
- Contributing - Development setup and contribution guidelines
- Performance - Performance metrics and optimization tips
🔒 Privacy & Security
- 🏠 Local-first option: FastEmbed and SentenceTransformer run entirely offline (macOS only)
- 🔑 Secure storage: API keys stored locally, environment variables supported
- 📁 Respects .gitignore: Never indexes sensitive files or directories
- 🛡️ MCP security: Server runs locally with no external network access for search
- 🌐 Cloud embeddings: Voyage AI and other providers process only file metadata, not source code
🌐 Supported Languages
| Language | Extensions | Features |
|---|---|---|
| Rust | `.rs` | Full AST parsing, pub/use detection, module structure |
| Python | `.py` | Import/class/function extraction, docstring parsing |
| JavaScript | `.js`, `.jsx` | ES6 imports/exports, function declarations |
| TypeScript | `.ts`, `.tsx` | Type definitions, interface extraction |
| Go | `.go` | Package/import analysis, struct/interface parsing |
| PHP | `.php` | Class/function extraction, namespace support |
| C++ | `.cpp`, `.hpp`, `.h` | Include analysis, class/function extraction |
| Ruby | `.rb` | Class/module extraction, method definitions |
| JSON | `.json` | Structure analysis, key extraction |
| Bash | `.sh`, `.bash` | Function and variable extraction |
| Markdown | `.md` | Document section indexing, header extraction |
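The extension column above amounts to a lookup table. The sketch below encodes it directly, which can be handy for estimating which files in a repository would be picked up. It mirrors only the table; how Octocode actually selects a parser is not described here, and the `detect_language` helper is a name chosen for this example.

```python
from pathlib import PurePosixPath

# Extension-to-language mapping copied from the table above.
LANGUAGE_BY_EXTENSION = {
    ".rs": "Rust",
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
    ".php": "PHP",
    ".cpp": "C++",
    ".hpp": "C++",
    ".h": "C++",
    ".rb": "Ruby",
    ".json": "JSON",
    ".sh": "Bash",
    ".bash": "Bash",
    ".md": "Markdown",
}


def detect_language(path):
    """Return the language name for a path, or None if the extension is not listed."""
    suffix = PurePosixPath(path).suffix.lower()
    return LANGUAGE_BY_EXTENSION.get(suffix)


for sample in ["src/auth.rs", "web/App.tsx", "Makefile"]:
    print(sample, "->", detect_language(sample))
```

Lowercasing the suffix is a choice made for this sketch so that names like `README.MD` still match.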
🤝 Support & Community
- 🐛 Issues: GitHub Issues
- 📧 Email: opensource@muvon.io
- 🏢 Company: Muvon Un Limited (Hong Kong)
⚖️ License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with ❤️ by the Muvon team in Hong Kong
Source: GitHub