GitHub - Muvon/octocode: Semantic code searcher and codebase utility

Excerpt

Semantic code searcher and codebase utility.

Summary

Main Summary

Octocode, developed by Muvon Un Limited, is a code indexer and semantic search engine that goes beyond traditional indexing by building detailed knowledge graphs of an entire codebase. It combines AI capabilities with a local-first design to support deep code understanding, relationship mapping between components, and intelligent assistance. It supports navigation and analysis through natural-language queries, automatic generation of architectural descriptions, and dependency identification. It also adds workflow features such as commit message generation and AI-powered code review, and it integrates with AI tooling through Model Context Protocol (MCP) support, alongside a set of performance optimizations.

Key Points

  • Knowledge Graphs (GraphRAG) and Semantic Search: Octocode builds knowledge graphs that automatically discover relationships between files and modules, track dependencies, and generate AI-written architectural descriptions. This is complemented by semantic search that lets developers run natural-language queries across code, documentation, and text, with results ranked by similarity scoring and broadened by symbol expansion.
  • Built-in AI Capabilities and Memory System: The tool includes AI features such as commit message generation, code review based on best practices, and a persistent memory system. The memory system lets developers store and retrieve insights, decisions, and context through semantic search, which helps with long-term maintenance and understanding of a project.
  • Broad Multi-Language Support and Optimized Performance: Octocode supports Rust, Python, JavaScript, TypeScript, Go, PHP, C++, and Ruby, among others, using Tree-sitter based parsing for accurate symbol extraction. Performance features include optimized indexing, smart batching, frequent data persistence, and the Lance columnar database for fast vector search, with both local and cloud embedding options.
  • MCP Server Integration and Deployment Flexibility

Content

Octocode - Intelligent Code Indexer and Graph Builder

© 2025 Muvon Un Limited (Hong Kong)

🚀 Overview

Octocode is a powerful code indexer and semantic search engine that builds intelligent knowledge graphs of your codebase. It combines advanced AI capabilities with local-first design to provide deep code understanding, relationship mapping, and intelligent assistance for developers.

✨ Key Features

🔍 Semantic Code Search

  • Natural language queries across your entire codebase
  • Multi-mode search (code, documentation, text, or all)
  • Intelligent ranking with similarity scoring
  • Symbol expansion for comprehensive results

🕸️ Knowledge Graph (GraphRAG)

  • Automatic relationship discovery between files and modules
  • Import/export dependency tracking
  • AI-powered file descriptions and architectural insights
  • Path finding between code components

🌐 Multi-Language Support

  • Rust, Python, JavaScript, TypeScript, Go, PHP
  • C++, Ruby, JSON, Bash, Markdown
  • Tree-sitter based parsing for accurate symbol extraction

🧠 AI-Powered Features

  • Smart commit message generation
  • Code review with best practices analysis
  • Memory system for storing insights, decisions, and context
  • Semantic memory search with vector similarity
  • Memory relationships and automatic context linking
  • Multiple LLM support via OpenRouter

🔌 MCP Server Integration

  • Built-in Model Context Protocol server
  • Seamless integration with AI assistants (Claude Desktop, etc.)
  • Real-time file watching and auto-reindexing
  • Rich tool ecosystem for code analysis

Performance & Flexibility

  • Optimized indexing: Batch metadata loading eliminates database query storms
  • Smart batching: 16 files per batch with token-aware API optimization
  • Frequent persistence: Data saved every 16 files (max 16 files at risk)
  • Fast file traversal: Single-pass progressive counting and processing
  • Local embedding models: FastEmbed and SentenceTransformer (macOS only)
  • Cloud embedding providers: Voyage AI (default), Jina AI, Google
  • Free tier available: Voyage AI provides 200M free tokens monthly
  • Lance columnar database for fast vector search
  • Incremental indexing and git-aware optimization
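
The batching and persistence behavior listed above can be pictured with a small shell sketch. It is illustrative only and is not Octocode's implementation: batch_persist is a hypothetical helper that reads file paths from standard input and reports a simulated save point after every 16 paths, so that at most one batch of work is unsaved at any time.

```shell
# Hypothetical sketch (not Octocode's code): read paths from stdin and
# report a simulated "persist" step after every N paths (default 16).
batch_persist() {
  size="${1:-16}"
  total=0
  pending=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue      # skip empty lines
    total=$((total + 1))
    pending=$((pending + 1))
    if [ "$pending" -eq "$size" ]; then
      echo "persisted after $total files"
      pending=0
    fi
  done
  # Flush the final partial batch, if any.
  if [ "$pending" -gt 0 ]; then
    echo "persisted after $total files"
  fi
}
```

For instance, piping 40 paths into batch_persist 16 reports save points after 16, 32, and 40 files. A real file list could come from something like find src -name '*.rs' | batch_persist 16.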

📦 Installation

Download Prebuilt Binary (Recommended)

# Universal install script (Linux, macOS, Windows) - requires curl
curl -fsSL https://raw.githubusercontent.com/Muvon/octocode/master/install.sh | sh

Or download manually from GitHub Releases.

Using Cargo (from Git)

cargo install --git https://github.com/Muvon/octocode

Build from Source

Prerequisites: Git and a Rust toolchain (cargo).

git clone https://github.com/Muvon/octocode.git
cd octocode

# macOS: Full build with local embeddings
cargo build --release

# Windows/Linux: Cloud embeddings only (due to ONNX Runtime issues)
cargo build --release --no-default-features

Note: Prebuilt binaries use cloud embeddings only. Local embeddings require building from source on macOS.

🔑 Getting Started - API Keys

⚠️ Important: Octocode requires API keys to function. Local embedding models are only available on macOS builds.

Required: Voyage AI (Embeddings)

export VOYAGE_API_KEY="your-voyage-api-key"
  • Free tier: 200M tokens per month
  • Get API key: voyageai.com
  • Used for: Code and text embeddings (semantic search)

Optional: OpenRouter (LLM Features)

export OPENROUTER_API_KEY="your-openrouter-api-key"
  • Get API key: openrouter.ai
  • Used for: Commit messages, code review, GraphRAG descriptions
  • Note: Basic search and indexing work without this

Platform Limitations

  • Windows/Linux: Must use cloud embeddings (Voyage AI default)
  • macOS: Can use local embeddings (build from source) or cloud embeddings
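
Because indexing and search depend on VOYAGE_API_KEY while the LLM features depend on OPENROUTER_API_KEY, a small preflight check can catch a missing key before a command fails. The sketch below is a hypothetical helper (octocode_preflight is not part of Octocode) and assumes the default cloud embedding setup; a macOS build configured for local embeddings would not need the Voyage AI key.

```shell
# Hypothetical helper, not part of Octocode: verify the environment
# variables named in this README before running any octocode command.
# Assumes the default cloud embedding provider (Voyage AI).
octocode_preflight() {
  if [ -z "${VOYAGE_API_KEY:-}" ]; then
    echo "VOYAGE_API_KEY is not set: embeddings (index, search) will not work" >&2
    return 1
  fi
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    # Optional key: only commit, review, and GraphRAG descriptions need it.
    echo "OPENROUTER_API_KEY is not set: LLM features are unavailable" >&2
  fi
  return 0
}

# Example: only index when the required key is present.
# octocode_preflight && octocode index
```

The function returns a non-zero status only for the required key, so it can gate indexing without blocking users who skip the optional LLM features.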

🚀 Quick Start

1. Setup API Keys (Required)

# Set Voyage AI API key for embeddings (free 200M tokens/month)
export VOYAGE_API_KEY="your-voyage-api-key"

# Optional: Set OpenRouter API key for LLM features (commit, review, GraphRAG)
export OPENROUTER_API_KEY="your-openrouter-api-key"

Get your API keys: voyageai.com (Voyage AI, free tier) and openrouter.ai (OpenRouter).

2. Basic Usage

# Index your current directory
octocode index

# Search your codebase
octocode search "HTTP request handling"

# View code signatures
octocode view "src/**/*.rs"

3. AI-Powered Git Workflow (Requires OpenRouter API Key)

# Generate intelligent commit messages
git add .
octocode commit

# Review code for best practices
octocode review

4. MCP Server for AI Assistants

# Start MCP server
octocode mcp

# Use with Claude Desktop or other MCP-compatible tools
# Provides: search_code, search_graphrag, memorize, remember, forget

5. Memory Management

# Store important insights and decisions
octocode memory memorize \
  --title "Authentication Bug Fix" \
  --content "Fixed JWT token validation in auth middleware" \
  --memory-type bug_fix \
  --tags security,jwt,auth

# Search your memory with semantic similarity
octocode memory remember "JWT authentication issues"

# Get memories by type, tags, or files
octocode memory by-type bug_fix
octocode memory by-tags security,auth
octocode memory for-files src/auth.rs

# Clear all memory data (useful for testing)
octocode memory clear-all --yes

6. Advanced Features

# Enable GraphRAG with AI descriptions (requires OpenRouter API key)
octocode config --graphrag-enabled true
octocode index

# Search the knowledge graph
octocode graphrag search --query "authentication modules"

# Watch for changes
octocode watch

📋 Command Reference

| Command | Description | Example |
|---|---|---|
| octocode index | Index the codebase | octocode index --reindex |
| octocode search <query> | Semantic code search | octocode search "error handling" |
| octocode graphrag <operation> | Knowledge graph operations | octocode graphrag search --query "auth" |
| octocode view [pattern] | View code signatures | octocode view "src/**/*.rs" --md |
| octocode commit | AI-powered git commit | octocode commit --all |
| octocode review | Code review assistant | octocode review --focus security |
| octocode memory <operation> | Memory management | octocode memory remember "auth bugs" |
| octocode mcp | Start MCP server | octocode mcp --debug |
| octocode watch | Auto-reindex on changes | octocode watch --quiet |
| octocode config | Manage configuration | octocode config --show |

🧠 Memory Management

Octocode includes a powerful memory system for storing and retrieving project insights, decisions, and context using semantic search and relationship mapping.

Memory Operations

| Command | Description | Example |
|---|---|---|
| memorize | Store new information | octocode memory memorize --title "Bug Fix" --content "Details..." |
| remember | Search memories semantically | octocode memory remember "authentication issues" |
| forget | Delete specific memories | octocode memory forget --memory-id abc123 |
| update | Update existing memory | octocode memory update abc123 --add-tags security |
| get | Retrieve memory by ID | octocode memory get abc123 |
| recent | List recent memories | octocode memory recent --limit 10 |
| by-type | Filter by memory type | octocode memory by-type bug_fix |
| by-tags | Filter by tags | octocode memory by-tags security,auth |
| for-files | Find memories for files | octocode memory for-files src/auth.rs |
| stats | Show memory statistics | octocode memory stats |
| cleanup | Remove old memories | octocode memory cleanup |
| clear-all | Delete all memories | octocode memory clear-all --yes |
| relate | Create relationships | octocode memory relate source-id target-id |

Memory Types

  • code - Code-related insights and patterns
  • bug_fix - Bug reports and solutions
  • feature - Feature implementations and decisions
  • architecture - Architectural decisions and patterns
  • performance - Performance optimizations and metrics
  • security - Security considerations and fixes
  • testing - Test strategies and results
  • documentation - Documentation notes and updates
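
The --memory-type values used in the examples come from the list above. A wrapper script could check a type against that list before invoking the CLI. The sketch below is a hypothetical helper; whether Octocode itself accepts or rejects other values is not stated here.

```shell
# Hypothetical helper: accept only the memory types listed in this README.
is_valid_memory_type() {
  case "$1" in
    code|bug_fix|feature|architecture|performance|security|testing|documentation)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example: query by type only when the type is one of the listed values.
# memory_type="bug_fix"
# is_valid_memory_type "$memory_type" && octocode memory by-type "$memory_type"
```

Keeping the check in one function means a typo such as bugfix is caught in the script rather than silently producing an empty result.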

Examples

# Store a bug fix with context
octocode memory memorize \
  --title "JWT Token Validation Fix" \
  --content "Fixed race condition in token refresh logic by adding mutex lock" \
  --memory-type bug_fix \
  --importance 0.8 \
  --tags security,jwt,race-condition \
  --files src/auth/jwt.rs,src/middleware/auth.rs

# Search for authentication-related memories
octocode memory remember "JWT authentication problems" \
  --memory-types bug_fix,security \
  --min-relevance 0.7

# Get all security-related memories
octocode memory by-tags security --format json

# Clear all memory data (useful for testing/reset)
octocode memory clear-all --yes

🔧 Configuration

Octocode stores configuration in ~/.local/share/octocode/config.toml.

Required Setup

# Set Voyage AI API key (required for embeddings)
export VOYAGE_API_KEY="your-voyage-api-key"

# Optional: Set OpenRouter API key for LLM features
export OPENROUTER_API_KEY="your-openrouter-api-key"

Advanced Configuration

# View current configuration
octocode config --show

# Use local models (macOS only - requires building from source)
octocode config \
  --code-embedding-model "fastembed:jinaai/jina-embeddings-v2-base-code" \
  --text-embedding-model "fastembed:sentence-transformers/all-MiniLM-L6-v2-quantized"

# Use different cloud embedding provider
octocode config \
  --code-embedding-model "jina:jina-embeddings-v2-base-code" \
  --text-embedding-model "jina:jina-embeddings-v2-base-en"

# Enable GraphRAG with AI descriptions
octocode config --graphrag-enabled true

# Set custom OpenRouter model
octocode config --model "openai/gpt-4o-mini"

Default Models

  • Code embedding: voyage:voyage-code-2 (Voyage AI)
  • Text embedding: voyage:voyage-2 (Voyage AI)
  • LLM: openai/gpt-4o-mini (via OpenRouter)
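
The embedding model identifiers above follow a provider:model pattern (for example voyage:voyage-code-2). As a sketch of how such an identifier splits, the hypothetical helpers below use POSIX parameter expansion. They illustrate the naming format only and say nothing about how Octocode parses it internally.

```shell
# Hypothetical helpers: split an embedding model identifier of the form
# "provider:model" at the first colon.
embedding_provider() {
  # Strip the longest suffix starting at the first ":".
  printf '%s\n' "${1%%:*}"
}

embedding_model() {
  # Strip the shortest prefix ending at the first ":".
  printf '%s\n' "${1#*:}"
}

# Example:
# embedding_provider "fastembed:jinaai/jina-embeddings-v2-base-code"
# embedding_model    "fastembed:jinaai/jina-embeddings-v2-base-code"
```

Splitting at the first colon keeps any slash inside the model part intact, which matters for identifiers such as jinaai/jina-embeddings-v2-base-code.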

Platform Support

  • Windows/Linux: Cloud embeddings only (Voyage AI, Jina AI, Google)
  • macOS: Local embeddings available (FastEmbed, SentenceTransformer) + cloud options

🔒 Privacy & Security

  • 🏠 Local-first option: FastEmbed and SentenceTransformer run entirely offline (macOS only)
  • 🔑 Secure storage: API keys stored locally, environment variables supported
  • 📁 Respects .gitignore: Never indexes sensitive files or directories
  • 🛡️ MCP security: Server runs locally with no external network access for search
  • 🌐 Cloud embeddings: Voyage AI and other providers process only file metadata, not source code

🌐 Supported Languages

| Language | Extensions | Features |
|---|---|---|
| Rust | .rs | Full AST parsing, pub/use detection, module structure |
| Python | .py | Import/class/function extraction, docstring parsing |
| JavaScript | .js, .jsx | ES6 imports/exports, function declarations |
| TypeScript | .ts, .tsx | Type definitions, interface extraction |
| Go | .go | Package/import analysis, struct/interface parsing |
| PHP | .php | Class/function extraction, namespace support |
| C++ | .cpp, .hpp, .h | Include analysis, class/function extraction |
| Ruby | .rb | Class/module extraction, method definitions |
| JSON | .json | Structure analysis, key extraction |
| Bash | .sh, .bash | Function and variable extraction |
| Markdown | .md | Document section indexing, header extraction |
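
The extension-to-language mapping in the table above can be written as a simple lookup. The helper below is a hypothetical sketch based only on the listed extensions; it is not how Octocode detects languages, and octocode_lang_for is an illustrative name.

```shell
# Hypothetical helper: map a file path to the language named in the table
# above, based only on its extension. Prints "unsupported" otherwise.
octocode_lang_for() {
  case "$1" in
    *.rs)              echo "Rust" ;;
    *.py)              echo "Python" ;;
    *.js|*.jsx)        echo "JavaScript" ;;
    *.ts|*.tsx)        echo "TypeScript" ;;
    *.go)              echo "Go" ;;
    *.php)             echo "PHP" ;;
    *.cpp|*.hpp|*.h)   echo "C++" ;;
    *.rb)              echo "Ruby" ;;
    *.json)            echo "JSON" ;;
    *.sh|*.bash)       echo "Bash" ;;
    *.md)              echo "Markdown" ;;
    *)                 echo "unsupported" ;;
  esac
}
```

A lookup like this can help estimate, before indexing, which files in a repository fall under the supported languages.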

⚖️ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ by the Muvon team in Hong Kong

Source: GitHub