CISV

Cisv is a CSV parser on steroids... literally. It is a high-performance CSV parser/writer that leverages SIMD instructions and zero-copy memory mapping, available as both a Node.js native addon and a standalone CLI tool with extensive configuration options.

I wrote about the basics in a blog post, which you can read here: https://sanixdk.xyz/blogs/how-i-accidentally-created-the-fastest-csv-parser-ever-made.

PERFORMANCE

  • 469,968 MB/s throughput on 2M row CSV files (AVX-512)
  • 10-100x faster than popular CSV parsers
  • Zero-copy memory-mapped I/O with kernel optimizations
  • SIMD accelerated with AVX-512/AVX2 auto-detection
  • Dynamic lookup tables for configurable parsing

CLI BENCHMARKS WITH DOCKER

$ docker build -t cisv-benchmark .

To run the benchmarks with fixed resource limits on the container:

$ docker run --rm      \
    --cpus="2.0"       \
    --memory="4g"      \
    --memory-swap="4g" \
    --cpu-shares=1024  \
    --security-opt     \
    seccomp=unconfined \
    cisv-benchmark

BENCHMARKS

Benchmark comparison with existing popular tools; see the CI pipeline run: https://github.com/Sanix-Darker/cisv/actions/runs/17194915214/job/48775516036

SYNCHRONOUS RESULTS

Library            Speed (MB/s)   Avg Time (ms)   Operations/sec
cisv (sync)        30.04          0.02            64936
csv-parse (sync)   13.35          0.03            28870
papaparse (sync)   25.16          0.02            54406

SYNCHRONOUS RESULTS (WITH DATA ACCESS)

Library            Speed (MB/s)   Avg Time (ms)   Operations/sec
cisv (sync)        31.24          0.01            67543
csv-parse (sync)   15.42          0.03            33335
papaparse (sync)   25.49          0.02            55107

ASYNCHRONOUS RESULTS

Library                    Speed (MB/s)   Avg Time (ms)   Operations/sec
cisv (async/stream)        61.31          0.01            132561
papaparse (async/stream)   19.24          0.02            41603
neat-csv (async/promise)   9.09           0.05            19655

ASYNCHRONOUS RESULTS (WITH DATA ACCESS)

Library                    Speed (MB/s)   Avg Time (ms)   Operations/sec
cisv (async/stream)        24.59          0.02            53160
papaparse (async/stream)   21.86          0.02            47260
neat-csv (async/promise)   9.38           0.05            20283

INSTALLATION

NODE.JS PACKAGE

npm install cisv

CLI TOOL (FROM SOURCE)

git clone https://github.com/sanix-darker/cisv
cd cisv
make cli
sudo make install-cli

BUILD FROM SOURCE (NODE.JS ADDON)

npm install -g node-gyp
make build

QUICK START

NODE.JS

const { cisvParser } = require('cisv');

// Basic usage
const parser = new cisvParser();
const rows = parser.parseSync('./data.csv');

// With configuration (optional)
const tsv_parser = new cisvParser({
    delimiter: '\t',
    quote: "'",
    trim: true
});
const tsv_rows = tsv_parser.parseSync('./data.tsv');

CLI

# Basic parsing
cisv data.csv

# Parse TSV file
cisv -d $'\t' data.tsv

# Parse with custom quote and trim
cisv -q "'" -t data.csv

# Skip comment lines
cisv -m '#' config.csv

CONFIGURATION OPTIONS

Parser Configuration

const parser = new cisvParser({
    // Field delimiter character (default: ',')
    delimiter: ',',

    // Quote character (default: '"')
    quote: '"',

    // Escape character (null for RFC4180 "" style, default: null)
    escape: null,

    // Comment character to skip lines (default: null)
    comment: '#',

    // Trim whitespace from fields (default: false)
    trim: true,

    // Skip empty lines (default: false)
    skipEmptyLines: true,

    // Use relaxed parsing rules (default: false)
    relaxed: false,

    // Skip lines with parse errors (default: false)
    skipLinesWithError: true,

    // Maximum row size in bytes (0 = unlimited, default: 0)
    maxRowSize: 1048576,

    // Start parsing from line N (1-based, default: 1)
    fromLine: 10,

    // Stop parsing at line N (0 = until end, default: 0)
    toLine: 1000
});
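The `escape` option selects between RFC 4180 quote doubling (the `null` default) and a custom escape character. A pure-JavaScript sketch of the two unescaping conventions (illustrative only; this is not the cisv internals):

```javascript
// RFC 4180 style (escape: null): a quote inside a quoted field is doubled,
// so 'say ""hi""' decodes to 'say "hi"'.
function unquoteRfc4180(field) {
    return field.replace(/""/g, '"');
}

// Custom escape character style (e.g. escape: '\\'): the escape character
// precedes the quote instead of doubling it.
function unquoteEscaped(field, esc) {
    return field.split(esc + '"').join('"');
}

console.log(unquoteRfc4180('say ""hi""'));          // say "hi"
console.log(unquoteEscaped('say \\"hi\\"', '\\'));  // say "hi"
```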

Dynamic Configuration

// Set configuration after creation
parser.setConfig({
    delimiter: ';',
    quote: "'",
    trim: true
});

// Get current configuration
const config = parser.getConfig();
console.log(config);

API REFERENCE

TYPESCRIPT DEFINITIONS

interface CisvConfig {
    delimiter?: string;
    quote?: string;
    escape?: string | null;
    comment?: string | null;
    trim?: boolean;
    skipEmptyLines?: boolean;
    relaxed?: boolean;
    skipLinesWithError?: boolean;
    maxRowSize?: number;
    fromLine?: number;
    toLine?: number;
}

interface ParsedRow extends Array<string> {}

interface ParseStats {
    rowCount: number;
    fieldCount: number;
    totalBytes: number;
    parseTime: number;
    currentLine: number;
}

interface TransformInfo {
    cTransformCount: number;
    jsTransformCount: number;
    fieldIndices: number[];
}

class cisvParser {
    constructor(config?: CisvConfig);
    parseSync(path: string): ParsedRow[];
    parse(path: string): Promise<ParsedRow[]>;
    parseString(csv: string): ParsedRow[];
    write(chunk: string | Buffer): void;
    end(): void;
    getRows(): ParsedRow[];
    clear(): void;
    setConfig(config: CisvConfig): void;
    getConfig(): CisvConfig;
    transform(fieldIndex: number, type: string | Function): this;
    removeTransform(fieldIndex: number): this;
    clearTransforms(): this;
    getStats(): ParseStats;
    getTransformInfo(): TransformInfo;
    destroy(): void;

    static countRows(path: string): number;
    static countRowsWithConfig(path: string, config?: CisvConfig): number;
}

BASIC PARSING

import { cisvParser } from "cisv";

// Default configuration (standard CSV)
const parser = new cisvParser();
const rows = parser.parseSync('data.csv');

// Custom configuration (TSV with single quotes)
const tsvParser = new cisvParser({
    delimiter: '\t',
    quote: "'"
});
const tsvRows = tsvParser.parseSync('data.tsv');

// Parse specific line range
const rangeParser = new cisvParser({
    fromLine: 100,
    toLine: 1000
});
const subset = rangeParser.parseSync('large.csv');

// Skip comments and empty lines
const cleanParser = new cisvParser({
    comment: '#',
    skipEmptyLines: true,
    trim: true
});
const cleanData = cleanParser.parseSync('config.csv');

STREAMING

import { cisvParser } from "cisv";
import fs from 'fs';

const streamParser = new cisvParser({
    delimiter: ',',
    trim: true
});

const stream = fs.createReadStream('huge-file.csv');

stream.on('data', chunk => streamParser.write(chunk));
stream.on('end', () => {
    streamParser.end();
    const results = streamParser.getRows();
    console.log(`Parsed ${results.length} rows`);
});
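The write()/end() pattern exists because a row can straddle chunk boundaries. A minimal pure-JavaScript accumulator showing the idea (a sketch, not the native implementation, and ignoring quoted newlines):

```javascript
// Sketch of chunked line accumulation: a row split across two chunks is
// buffered until its terminating newline arrives, which is what the
// write()/end() pattern handles internally.
class ChunkedLineParser {
    constructor() {
        this.partial = '';
        this.rows = [];
    }
    write(chunk) {
        const text = this.partial + chunk.toString();
        const lines = text.split('\n');
        this.partial = lines.pop(); // last piece may be an incomplete row
        for (const line of lines) {
            if (line.length > 0) this.rows.push(line.split(','));
        }
    }
    end() {
        if (this.partial.length > 0) {
            this.rows.push(this.partial.split(','));
            this.partial = '';
        }
    }
}

const p = new ChunkedLineParser();
p.write('a,b\nc,');  // second row arrives split mid-field...
p.write('d\n');      // ...and is completed by the next chunk
p.end();
console.log(p.rows); // [ [ 'a', 'b' ], [ 'c', 'd' ] ]
```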

DATA TRANSFORMATION

const parser = new cisvParser();

// Built-in C transforms (optimized)
parser
    .transform(0, 'uppercase')      // Column 0 to uppercase
    .transform(1, 'lowercase')       // Column 1 to lowercase
    .transform(2, 'trim')           // Column 2 trim whitespace
    .transform(3, 'to_int')         // Column 3 to integer
    .transform(4, 'to_float')       // Column 4 to float
    .transform(5, 'base64_encode')  // Column 5 to base64
    .transform(6, 'hash_sha256');   // Column 6 to SHA256

// Transform by field name:
parser
    .transform('name', 'uppercase');

// Custom row transform:
parser
    .transformRow((row, rowObj) => { console.log(row); });

// Custom JavaScript transforms
parser.transform(7, value => new Date(value).toISOString());

// Apply to all fields
parser.transform(-1, value => value.replace(/[^\w\s]/gi, ''));

const transformed = parser.parseSync('data.csv');

ROW COUNTING

import { cisvParser } from "cisv";

// Fast row counting without parsing
const count = cisvParser.countRows('large.csv');

// Count with specific configuration
const tsvCount = cisvParser.countRowsWithConfig('data.tsv', {
    delimiter: '\t',
    skipEmptyLines: true,
    fromLine: 10,
    toLine: 1000
});
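Counting rows can skip field materialization entirely. A conceptual pure-JavaScript version that counts record-terminating newlines while respecting quoted fields (a sketch of the idea, not the SIMD implementation):

```javascript
// Conceptual row counting without parsing: count newlines that terminate
// records, skipping any newline that falls inside a quoted field.
function countRows(csv, quote = '"') {
    let count = 0;
    let inQuotes = false;
    for (let i = 0; i < csv.length; i++) {
        const ch = csv[i];
        if (ch === quote) inQuotes = !inQuotes;
        else if (ch === '\n' && !inQuotes) count++;
    }
    // a final record without a trailing newline still counts
    if (csv.length > 0 && !csv.endsWith('\n')) count++;
    return count;
}

console.log(countRows('a,b\n"x\ny",c\nd,e\n')); // 3 (embedded newline ignored)
```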

CLI USAGE

PARSING OPTIONS

cisv [OPTIONS] [FILE]

General Options:
  -h, --help              Show help message
  -v, --version           Show version
  -o, --output FILE       Write to FILE instead of stdout
  -b, --benchmark         Run benchmark mode

Configuration Options:
  -d, --delimiter DELIM   Field delimiter (default: ,)
  -q, --quote CHAR        Quote character (default: ")
  -e, --escape CHAR       Escape character (default: RFC4180 style)
  -m, --comment CHAR      Comment character (default: none)
  -t, --trim              Trim whitespace from fields
  -r, --relaxed           Use relaxed parsing rules
  --skip-empty            Skip empty lines
  --skip-errors           Skip lines with parse errors
  --max-row SIZE          Maximum row size in bytes
  --from-line N           Start from line N (1-based)
  --to-line N             Stop at line N

Processing Options:
  -s, --select COLS       Select columns (comma-separated indices)
  -c, --count             Show only row count
  --head N                Show first N rows
  --tail N                Show last N rows

EXAMPLES

# Parse TSV file
cisv -d $'\t' data.tsv

# Parse CSV with semicolon delimiter and single quotes
cisv -d ';' -q "'" european.csv

# Skip comment lines starting with #
cisv -m '#' config.csv

# Trim whitespace and skip empty lines
cisv -t --skip-empty messy.csv

# Parse lines 100-1000 only
cisv --from-line 100 --to-line 1000 large.csv

# Select specific columns
cisv -s 0,2,5,7 data.csv

# Count rows with specific configuration
cisv -c -d $'\t' --skip-empty data.tsv

# Benchmark with custom delimiter
cisv -b -d ';' european.csv

WRITING

cisv write [OPTIONS]

Options:
  -g, --generate N       Generate N rows of test data
  -o, --output FILE      Output file
  -d, --delimiter DELIM  Field delimiter
  -Q, --quote-all        Quote all fields
  -r, --crlf             Use CRLF line endings
  -n, --null TEXT        Null representation
  -b, --benchmark        Benchmark mode

BENCHMARKS

PARSER PERFORMANCE (273 MB, 5M ROWS)

Parser     Speed (MB/s)   Time (ms)   Relative
cisv       7,184          38          1.0x (fastest)
rust-csv   391            698         18x slower
xsv        650            420         11x slower
csvkit     28             9,875       260x slower

NODE.JS LIBRARY BENCHMARKS

Library     Speed (MB/s)   Operations/sec   Configuration Support
cisv        61.24          136,343          Full
csv-parse   15.48          34,471           Partial
papaparse   25.67          57,147           Partial

(More benchmark details are available in the release pipelines.)

RUNNING BENCHMARKS

# CLI benchmarks
make clean && make cli && make benchmark-cli

# Node.js benchmarks
npm run benchmark

# Benchmark with custom configuration
cisv -b -d ';' -q "'" --trim european.csv

TECHNICAL ARCHITECTURE

  • SIMD Processing: AVX-512 (64-byte vectors) or AVX2 (32-byte vectors) for parallel processing
  • Dynamic Lookup Tables: Generated per-configuration for optimal state transitions
  • Memory Mapping: Direct kernel-to-userspace zero-copy with mmap()
  • Optimized Buffering: 1MB ring buffer sized for L3 cache efficiency
  • Compiler Optimizations: LTO and architecture-specific tuning with -march=native
  • Configurable Parsing: RFC 4180 compliant with extensive customization options
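The dynamic lookup tables map (state, byte) pairs to parse actions and are regenerated for each configuration. A scalar pure-JavaScript sketch of the idea (illustrative only; the real parser builds comparable tables in C and scans 32/64 bytes at a time with AVX2/AVX-512, and this sketch omits RFC 4180 quote doubling):

```javascript
// Scalar sketch of a table-driven CSV state machine.
const FIELD = 0, QUOTED = 1;

// Build a 2 x 256 action table for a given delimiter/quote configuration.
function buildTable(delimiter, quote) {
    const table = [new Array(256).fill('char'), new Array(256).fill('char')];
    table[FIELD][delimiter.charCodeAt(0)] = 'end_field';
    table[FIELD]['\n'.charCodeAt(0)] = 'end_row';
    table[FIELD][quote.charCodeAt(0)] = 'open_quote';
    table[QUOTED][quote.charCodeAt(0)] = 'close_quote';
    return table;
}

function parse(csv, delimiter = ',', quote = '"') {
    const table = buildTable(delimiter, quote);
    const rows = [];
    let row = [], field = '', state = FIELD;
    for (let i = 0; i < csv.length; i++) {
        switch (table[state][csv.charCodeAt(i)]) {
            case 'end_field':   row.push(field); field = ''; break;
            case 'end_row':     row.push(field); rows.push(row); row = []; field = ''; break;
            case 'open_quote':  state = QUOTED; break;
            case 'close_quote': state = FIELD; break;
            default:            field += csv[i];
        }
    }
    if (field.length > 0 || row.length > 0) { row.push(field); rows.push(row); }
    return rows;
}

console.log(parse('a,"b,c"\n1,2\n')); // [ [ 'a', 'b,c' ], [ '1', '2' ] ]
```

Because every byte is resolved by one table lookup with no branching on the configuration itself, the hot loop stays identical no matter which delimiter, quote, or comment character is configured.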

FEATURES (PROS)

  • RFC 4180 compliant with configurable extensions
  • Handles quoted fields with embedded delimiters
  • Support for multiple CSV dialects (TSV, PSV, etc.)
  • Comment line support
  • Field trimming and empty line handling
  • Line range parsing for large files
  • Streaming API for unlimited file sizes
  • Safe fallback for non-x86 architectures
  • High-performance CSV writer with SIMD optimization
  • Row counting without full parsing

LIMITATIONS

  • Linux/Unix support only (optimized for x86_64 CPU)
  • Windows support planned for future release

LICENSE

MIT © sanix-darker

ACKNOWLEDGMENTS

Inspired by:

  • simdjson - Parsing gigabytes of JSON per second
  • xsv - Fast CSV command line toolkit
  • rust-csv - CSV parser for Rust
