GitHub - Qovery/replibyte: Seed your development database with real data ⚡️

Seed Your Development Database With Real Data ⚡️

Replibyte is a powerful tool to seed your databases
with real data and other cool features 🔥

Features

  • Support data backup and restore for PostgreSQL, MySQL and MongoDB
  • Replace sensitive data with fake data
  • Works on large databases (> 10GB) (read Design)
  • Database Subsetting: Scale down a production database to a more reasonable size 🔥
  • Start a local database with the prod data in a single command 🔥
  • On-the-fly data (de)compression (Zlib)
  • On-the-fly data de/encryption (AES-256)
  • Fully stateless (no server, no daemon) and lightweight binary 🍃
  • Use custom transformers

Here are the features we plan to support

  • Auto-detect and version database schema change
  • Auto-detect sensitive fields
  • Auto-clean backed up data

Install

Install on macOS

⚠️ RepliByte homebrew auto release is in maintenance. Consider using Docker or building from source in the meantime ⚠️

brew tap Qovery/replibyte
brew install replibyte

Or manually.

Install on Linux

# download latest replibyte archive for Linux
curl -s https://api.github.com/repos/Qovery/replibyte/releases/latest | \
    jq -r '.assets[].browser_download_url' | \
    grep -i 'linux-musl.tar.gz$' | wget -qi -

# unarchive
tar zxf *.tar.gz

# make replibyte executable
chmod +x replibyte

# make it accessible from everywhere (may require sudo)
mv replibyte /usr/local/bin/

Install on Windows

Download the latest Windows release and install it.

Install from source

git clone https://github.com/Qovery/replibyte.git && cd replibyte

# Install cargo
# visit: https://doc.rust-lang.org/cargo/getting-started/installation.html

# Build with cargo
cargo build --release

# Run RepliByte
./target/release/replibyte -h

Run replibyte with Docker

git clone https://github.com/Qovery/replibyte.git && cd replibyte

# Build image with Docker
docker build -t replibyte -f Dockerfile .

# Run RepliByte
docker run -v "$(pwd)/examples:/examples/" replibyte -c /examples/replibyte.yaml transformer list

Feel free to edit ./examples/replibyte.yaml with your configuration.

Usage

Example with PostgreSQL as a Source and Destination database AND S3 as a Bridge (cf configuration file)

Create a dev database dataset from your production database

replibyte -c prod-conf.yaml backup run

The backup is compressed and stored on your S3 bucket (cf configuration).

Create a dev database dataset from a dump file

cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i

The backup is compressed and stored on your S3 bucket (cf configuration).

Seed my local database (Docker required)

List all your backups to choose one:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Restore the latest one into a Postgres container bound to port 5433 (default: 5432):

replibyte -c prod-conf.yaml restore local -v latest --image postgres --port 5433

To connect to your Postgres database, use the following connection string:
> postgres://postgres:password@localhost:5433/postgres
Waiting for Ctrl-C to stop the container

OR restore a specific one:

replibyte -c prod-conf.yaml restore local -v backup-1647706359405 --image postgres --port 5433

The seed comes from your S3 bucket (cf configuration)
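
When scripting a restore of a specific backup, the name column can be pulled out of the list output. The helper below is a minimal sketch, assuming the plain whitespace-separated layout shown above and backup names that start with backup-. backup_names is a hypothetical helper, not part of RepliByte.

```shell
# Print the backup names (second column) found in `backup list` output on stdin.
# Assumption: columns are whitespace-separated and names start with "backup-",
# as in the sample output above. Header and blank lines are skipped.
backup_names() {
  awk '$2 ~ /^backup-/ { print $2 }'
}

# Hypothetical usage (needs replibyte and a configured bridge):
#   replibyte -c prod-conf.yaml backup list | backup_names
```

Each printed name can then be passed to restore local -v or restore remote -v as in the commands above.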

Seed a remote database

Show your backups:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Restore the latest one:

replibyte -c prod-conf.yaml restore remote -v latest

OR restore a specific one:

replibyte -c prod-conf.yaml restore remote -v backup-1647706359405

The seed comes from your S3 bucket (cf configuration)

Configuration

Create your prod-conf.yaml configuration file to source your production database.

encryption_key: $MY_PRIVATE_ENC_KEY # optional - encrypt data on bridge
source:
  connection_uri: $DATABASE_URL
  database_subset: # optional - downscale database while keeping it consistent
    database: public
    table: orders
    strategy_name: random
    strategy_options:
      percent: 50
    passthrough_tables:
      - us_states
  transformers: # optional - hide sensitive data
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
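
The $NAME placeholders in this file appear to be filled from environment variables, so an unset variable is an easy mistake to make before a backup. The following is a minimal sketch of a pre-flight check under that assumption. require_env is a hypothetical helper name, not a RepliByte feature.

```shell
# Return non-zero and report each named variable that is unset or empty.
# Only pass trusted variable names: the lookup goes through eval.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup in POSIX sh: read the variable whose name is held in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical usage, with the variable names from the example file above:
#   require_env DATABASE_URL BUCKET_NAME S3_REGION ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
#     && replibyte -c prod-conf.yaml backup run
```

MY_PRIVATE_ENC_KEY is left out of the usage line because the encryption key is marked optional.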

Run the app for the source

replibyte -c prod-conf.yaml

Destination

Create your staging-conf.yaml configuration file to sync your production database with your staging database.

bridge:
  bucket: $BUCKET_NAME
  region: $S3_REGION
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
encryption_key: $MY_PRIVATE_ENC_KEY # optional - needed to decrypt data on bridge if there was an encryption_key defined when running the source backup

Run the app for the destination

replibyte -c staging-conf.yaml
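
The two configuration files can be chained into one refresh step. The sketch below assumes the backup run and restore remote -v latest subcommands from the Usage section, and assumes that restore remote accepts a configuration holding only bridge and destination. refresh_staging and the REPLIBYTE variable are hypothetical names introduced for illustration.

```shell
# Command used to invoke RepliByte; can be overridden for a dry run.
REPLIBYTE="${REPLIBYTE:-replibyte}"

# Back up the source, then restore the latest backup into the destination.
# Stops at the first failing step, so a failed backup never triggers a restore.
refresh_staging() {
  source_conf="$1"
  destination_conf="$2"
  "$REPLIBYTE" -c "$source_conf" backup run || return 1
  "$REPLIBYTE" -c "$destination_conf" restore remote -v latest
}

# Hypothetical usage:
#   refresh_staging prod-conf.yaml staging-conf.yaml
```

Setting REPLIBYTE=echo before calling the function prints the two commands instead of running them, which is a cheap way to check the arguments.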

How RepliByte works

Check out our Design page

Connectors

Supported Source connectors

  • PostgreSQL
  • MongoDB
  • Local dump file
  • MySQL

Supported Transformers

A transformer changes or hides the value of a column. RepliByte provides pre-made transformers.

Check out the list of our available Transformers

RepliByte Bridge

The S3 wire protocol, used by RepliByte bridge, is supported by most cloud providers. Here is a non-exhaustive list of S3 compatible services.

Cloud Service Provider    S3 service name    S3 compatible
Amazon Web Services       S3                 Yes (Original)
Google Cloud Platform     Cloud Storage      Yes
Microsoft Azure           Blob Storage       Yes
Digital Ocean             Spaces             Yes
Scaleway                  Object Storage     Yes
Minio                     Object Storage     Yes

Feel free to drop a PR to include another S3 compatible solution.

Supported Destination connectors

  • PostgreSQL
  • MongoDB
  • Local dump file
  • MySQL

Motivation

At Qovery (the company behind RepliByte), developers can clone their applications and databases with just one click. However, the cloning process can be tedious and time-consuming, and we end up copying the information multiple times. With RepliByte, the Qovery team wants to provide a comprehensive way to seed cloud databases from one place to another.

The long-term motivation behind RepliByte is to provide a way to clone any database in real-time. This project starts small, but has big ambition!

FAQ

Q: Is RepliByte an ETL?

RepliByte is not an ETL like Airbyte, Airflow or Talend, and it will never be. If you need to synchronize heterogeneous data sources, you are better off choosing a classic ETL. RepliByte is a tool that helps software engineers synchronize data between databases of the same type; it cannot replicate data across different database types. Its primary purpose is to duplicate data into different environments. You can see RepliByte as a specific use case of an ETL, where an ETL is more generic.

Q: Do you support backup from a dump file?

Absolutely. Pipe the dump through stdin:

cat dump.sql | replibyte -c prod-conf.yaml backup run -s postgres -i

or pass the file directly:

replibyte -c prod-conf.yaml backup run -s postgres -f dump.sql

Q: How can RepliByte list the backups? Is there an API?

There is no API. RepliByte is fully stateless and stores the backup list in the bridge (e.g. S3) in an index_file.

⬆️ Open an issue if you have any questions - I'll pick the most common ones and put them here with their answers

Contributing

Local development

For local development, you will need to install Docker and run docker compose -f ./docker-compose-dev.yml up to start the local databases. At the moment, the compose file includes 2 PostgreSQL instances, 2 MySQL instances, 2 MongoDB instances and a MinIO bridge: one source and one destination per database type, plus one bridge. In the future, we will provide more options.

The Minio console is accessible at http://localhost:9001.

Once your Docker instances are running, you can run the RepliByte tests, to check if everything is configured correctly:

AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin cargo test

How to contribute

RepliByte is in its early stage of development and needs some time to be usable in production. We need some help, and you are welcome to contribute. To coordinate with us, consider joining our #replibyte channel on our Discord. Otherwise, you can pick any open issue and contribute.

Where should I start?

Check the open issues and their priority.

How can I contact you?

3 options:

  1. Open an issue.
  2. Join our #replibyte channel on our Discord.
  3. Drop us an email to github+replibyte {at} qovery {dot} com.

Telemetry

RepliByte collects anonymized data from users in order to improve our product. Feel free to inspect the code here. This can be deactivated at any time, and any data that has already been collected can be deleted on request (hello+replibyte {at} qovery {dot} com).

Collected data

  • Command line parameters
  • Options used (subset, transformer, compression) in the configuration file.

Thanks

Thanks to everyone sharing their ideas to make RepliByte better. We do appreciate it. I would also like to thank Airbyte, a great product and a trustworthy source of inspiration for this project.

Additional resources

Source: GitHub