Absortio

Email → Summary → Bookmark → Email

GitHub - TobikoData/sqlmesh: Scalable and efficient data transformation framework - backwards compatible with dbt.

Extracto

Scalable and efficient data transformation framework - backwards compatible with dbt. - TobikoData/sqlmesh

Resumen

Resumen Principal

SQLMesh es un marco de transformación de datos de próxima generación diseñado para entregar datos de manera rápida, eficiente y sin errores. Se posiciona como una solución superior, no solo una alternativa a dbt, al ofrecer a los equipos de datos la capacidad de ejecutar y desplegar transformaciones escritas en SQL o Python con una visibilidad y control excepcionales, independientemente del tamaño del proyecto. Su propuesta de valor central reside en su énfasis en la eficiencia operativa y la fiabilidad de los datos. Permite a los desarrolladores obtener un impacto instantáneo y contexto de sus cambios SQL, tanto en la interfaz de línea de comandos como a través de su extensión para VSCode. La introducción de Entornos de Datos Virtuales (VDE) es un diferenciador clave, ya que permite el desarrollo aislado sin incurrir en costes de almacén de datos, implementando un flujo de trabajo "Plan / Apply" para prever el impacto de los cambios y un bot de CI/CD para despliegues blue-green con verdadera confianza.

Elementos Clave

  • Entornos de Datos Virtuales (VDE) para Desarrollo Aislado: SQLMesh permite la creación de entornos de desarrollo aislados que simulan el estado de los datos, eliminando la necesidad de replicar costosos almacenes de datos. Este enfoque se complementa con un flujo de trabajo "Plan / Apply", similar a Terraform, que permite a los equipos visualizar y comprender el impacto potencial de sus cambios antes de la implementación, y facilita despliegues blue-green robustos mediante un bot de CI/CD dedicado para transiciones sin interrupciones.
  • Eficiencia y Pruebas Automatizadas Integradas: La plataforma está diseñada para la máxima eficiencia, asegurando que nunca se construya una tabla más de una vez al rastrear las modificaciones de los datos y ejecutar solo las transformaciones incrementales necesarias. SQLMesh genera y ejecuta automáticamente pruebas unitarias para los modelos de datos, utilizando consultas en vivo para validar los resultados esperados. Adicionalmente, configura auditorías automatizadas para garantizar la calidad y la coherencia de los datos, proporcionando una base sólida para la confianza en los datos.
  • Transpilación Multi-Dialecto y Linaje de Columnas: SQLMesh ofrece una notable flexibilidad al permitir a los

Contenido

SQLMesh logo

SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can run and deploy data transformations written in SQL or Python with visibility and control at any size.

It is more than just a dbt alternative.

Architecture Diagram

Core Features

SQLMesh Plan Mode

Get instant SQL impact and context of your changes, both in the CLI and in the SQLMesh VSCode Extension

Virtual Data Environments
  • Create isolated development environments without data warehouse costs
  • Plan / Apply workflow like Terraform to understand potential impact of changes
  • Easy to use CI/CD bot for true blue-green deployments
Efficiency and Testing

Running this command will generate a unit test file in the tests/ folder: test_stg_payments.yaml

Runs a live query to generate the expected output of the model

sqlmesh create_test tcloud_demo.stg_payments --query tcloud_demo.seed_raw_payments "select * from tcloud_demo.seed_raw_payments limit 5"

# run the unit test
sqlmesh test
MODEL (
  name tcloud_demo.stg_payments,
  cron '@daily',
  grain payment_id,
  audits (UNIQUE_VALUES(columns = (
      payment_id
  )), NOT_NULL(columns = (
      payment_id
  )))
);

SELECT
    id AS payment_id,
    order_id,
    payment_method,
    amount / 100 AS amount, /* `amount` is currently stored in cents, so we convert it to dollars */
    'new_column' AS new_column, /* non-breaking change example  */
FROM tcloud_demo.seed_raw_payments
test_stg_payments:
model: tcloud_demo.stg_payments
inputs:
    tcloud_demo.seed_raw_payments:
      - id: 66
        order_id: 58
        payment_method: coupon
        amount: 1800
      - id: 27
        order_id: 24
        payment_method: coupon
        amount: 2600
      - id: 30
        order_id: 25
        payment_method: coupon
        amount: 1600
      - id: 109
        order_id: 95
        payment_method: coupon
        amount: 2400
      - id: 3
        order_id: 3
        payment_method: coupon
        amount: 100
outputs:
    query:
      - payment_id: 66
        order_id: 58
        payment_method: coupon
        amount: 18.0
        new_column: new_column
      - payment_id: 27
        order_id: 24
        payment_method: coupon
        amount: 26.0
        new_column: new_column
      - payment_id: 30
        order_id: 25
        payment_method: coupon
        amount: 16.0
        new_column: new_column
      - payment_id: 109
        order_id: 95
        payment_method: coupon
        amount: 24.0
        new_column: new_column
      - payment_id: 3
        order_id: 3
        payment_method: coupon
        amount: 1.0
        new_column: new_column
  • Never build a table more than once
  • Track what data’s been modified and run only the necessary transformations for incremental models
  • Run unit tests for free and configure automated audits
  • Run table diffs between prod and dev based on tables/views impacted by a change
Level Up Your SQL Write SQL in any dialect and SQLMesh will transpile it to your target SQL dialect on the fly before sending it to the warehouse. Transpile Example
  • Debug transformation errors before you run them in your warehouse in 10+ different SQL dialects
  • Definitions using simply SQL (no need for redundant and confusing Jinja + YAML)
  • See impact of changes before you run them in your warehouse with column-level lineage

For more information, check out the website and documentation.

Getting Started

Install SQLMesh through pypi by running:

mkdir sqlmesh-example
cd sqlmesh-example
python -m venv .venv
source .venv/bin/activate
pip install 'sqlmesh[lsp]' # install the sqlmesh package with extensions to work with VSCode
source .venv/bin/activate # reactivate the venv to ensure you're using the right installation
sqlmesh init # follow the prompts to get started (choose DuckDB)

Note: You may need to run python3 or pip3 instead of python or pip, depending on your python installation.

Windows Installation
mkdir sqlmesh-example
cd sqlmesh-example
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install 'sqlmesh[lsp]' # install the sqlmesh package with extensions to work with VSCode
.\.venv\Scripts\Activate.ps1 # reactivate the venv to ensure you're using the right installation
sqlmesh init # follow the prompts to get started (choose DuckDB)

Follow the quickstart guide to learn how to use SQLMesh. You already have a head start!

Follow the crash course to learn the core movesets and use the easy to reference cheat sheet.

Follow this example to learn how to use SQLMesh in a full walkthrough.

Join Our Community

Together, we want to build data transformation without the waste. Connect with us in the following ways:

Contribution

Contributions in the form of issues or pull requests (from fork) are greatly appreciated.

Read more on how to contribute to SQLMesh open source.

Watch this video walkthrough to see how our team contributes a feature to SQLMesh.

Fuente: GitHub