GitHub - antonbelev/llm-fuse
Extracto
Contribute to antonbelev/llm-fuse development by creating an account on GitHub.
Contenido
llm-fuse
llm-fuse is a command‑line tool designed to help you quickly generate an aggregated text file (or multiple files when chunking is enabled) from numerous files within a repository. This output can then be pasted into a large language model (LLM) prompt to provide context from multiple source files.
Features
- Local Directory Scanning: Recursively scan a local directory for files.
- Git‑tracked Files Option: Optionally limit scanning to Git‑tracked files.
- Remote Repository Cloning: Clone and process a Git repository from GitHub, GitLab, or any other Git‑based service.
- File Filtering: Include or exclude files using regular expressions.
- Token Counting: Roughly estimate token counts (approx. 1 token per 4 characters) for each file.
- Aggregated Output: Generates a primary output file with a summary header, file system diagram, and individual file sections.
- Content Chunking: Automatically splits file content into manageable chunks if it exceeds a specified maximum token threshold (via the
--max-tokensoption). The primary output file (group 1) includes the summary and file tree, while additional chunks are written to separate output files.
Examples
For examples of what the script can output have a look at EXAMPLE_OUTPUT - the file was generated by running:
llm-fuse --git --repo https://github.com/antonbelev/llm-fuse
Installation
To install llm-fuse globally, follow these steps:
- Clone the Repository: Open your terminal and run:
git clone https://github.com/antonbelev/llm-fuse.git
cd llm-fuse- Choose an Installation Method
You can install llm-fuse using one of the following methods:
Option 1: Global Installation with pip
You can install the package globally (or for your user) with pip. Note that if you use a Homebrew‑managed Python on macOS, you might encounter restrictions installing system‑wide packages. To install for your user, run:
This will install the command‑line tool llm-fuse into your user’s local bin directory (typically ~/.local/bin). Make sure that directory is added to your PATH.
Option 2: Installation with pipx (Recommended)
pipx allows you to install and run Python CLI applications in isolated environments without interfering with your system Python. This is especially useful if you want the command to be available globally without activating a virtual environment.
- Install pipx (if not already installed):
brew install pipx brew upgrade pipx
- Install
llm-fusewith pipx:
From the root of your repository, run:
pipx will create an isolated environment for the tool and install its CLI command. The command is typically named llm-fuse (as defined in the entry point).
- Ensure Your
PATHis Set Up (following instructions are For macOS zsh specific)
By default, pipx installs commands in ~/.local/bin. To ensure this directory is included in your system's PATH, run:
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc
This command does the following:
- Appends export
PATH="$HOME/.local/bin:$PATH"to your~/.zshrcfile. - Immediately applies the changes by running source
~/.zshrc.
Using pipx is recommended because it keeps the CLI application isolated from your system packages while still making the command available globally.
Usage
Too see all options run:
Processing a Local Directory
# Process the current directory and output to output.txt llm-fuse # Process a specific directory and only include Python files llm-fuse /path/to/repo --include "\.py$" # Exclude test files llm-fuse /path/to/repo --exclude "test" # Use only Git‑tracked files (if in a Git repository) llm-fuse /path/to/repo --git
Processing a Remote Repository
# Process a GitHub repository using the default branch llm-fuse --repo https://github.com/user/repo.git # Process a GitLab repository specifying a branch llm-fuse --repo https://gitlab.com/user/repo.git --branch develop
Enabling Content Chunking
If you have very large files, you can specify a maximum token threshold using the --max-tokens option. Files exceeding this threshold will be split into chunks, with additional output files created for subsequent chunks (only the primary output file includes the summary header and file system diagram).
# Process a repository and split large files into chunks of 4000 tokens
llm-fuse /path/to/repo --max-tokens 4000The output is written to output.txt by default. You can specify a different file name with the --output option.
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests.
Local Development
To run the unit tests:
python3 -m unittest discover -s tests
License
This project is licensed under the MIT License. See the LICENSE file for details.
Fuente: GitHub