GitHub - TobikoData/sqlmesh: Scalable and efficient data transformation framework - backwards compatible with dbt.
Extracto
Scalable and efficient data transformation framework - backwards compatible with dbt. - TobikoData/sqlmesh
Resumen
Resumen Principal
SQLMesh es un marco de transformación de datos de próxima generación diseñado para entregar datos de manera rápida, eficiente y sin errores. Se posiciona como una solución superior, no solo una alternativa a dbt, al ofrecer a los equipos de datos la capacidad de ejecutar y desplegar transformaciones escritas en SQL o Python con una visibilidad y control excepcionales, independientemente del tamaño del proyecto. Su propuesta de valor central reside en su énfasis en la eficiencia operativa y la fiabilidad de los datos. Permite a los desarrolladores obtener un impacto instantáneo y contexto de sus cambios SQL, tanto en la interfaz de línea de comandos como a través de su extensión para VSCode. La introducción de Entornos de Datos Virtuales (VDE) es un diferenciador clave, ya que permite el desarrollo aislado sin incurrir en costes de almacén de datos, implementando un flujo de trabajo "Plan / Apply" para prever el impacto de los cambios y un bot de CI/CD para despliegues blue-green con verdadera confianza.
Elementos Clave
- Entornos de Datos Virtuales (VDE) para Desarrollo Aislado: SQLMesh permite la creación de entornos de desarrollo aislados que simulan el estado de los datos, eliminando la necesidad de replicar costosos almacenes de datos. Este enfoque se complementa con un flujo de trabajo "Plan / Apply", similar a Terraform, que permite a los equipos visualizar y comprender el impacto potencial de sus cambios antes de la implementación, y facilita despliegues blue-green robustos mediante un bot de CI/CD dedicado para transiciones sin interrupciones.
- Eficiencia y Pruebas Automatizadas Integradas: La plataforma está diseñada para la máxima eficiencia, asegurando que nunca se construya una tabla más de una vez al rastrear las modificaciones de los datos y ejecutar solo las transformaciones incrementales necesarias. SQLMesh genera y ejecuta automáticamente pruebas unitarias para los modelos de datos, utilizando consultas en vivo para validar los resultados esperados. Adicionalmente, configura auditorías automatizadas para garantizar la calidad y la coherencia de los datos, proporcionando una base sólida para la confianza en los datos.
- Transpilación Multi-Dialecto y Linaje de Columnas: SQLMesh ofrece una notable flexibilidad al permitir a los
Contenido
SQLMesh is a next-generation data transformation framework designed to ship data quickly, efficiently, and without error. Data teams can run and deploy data transformations written in SQL or Python with visibility and control at any size.
It is more than just a dbt alternative.
Core Features
Get instant SQL impact and context of your changes, both in the CLI and in the SQLMesh VSCode Extension
Virtual Data Environments
- See a full diagram of how Virtual Data Environments work
- Watch this video to learn more
- Create isolated development environments without data warehouse costs
- Plan / Apply workflow like Terraform to understand potential impact of changes
- Easy to use CI/CD bot for true blue-green deployments
Efficiency and Testing
Running this command will generate a unit test file in the tests/ folder: test_stg_payments.yaml
Runs a live query to generate the expected output of the model
sqlmesh create_test tcloud_demo.stg_payments --query tcloud_demo.seed_raw_payments "select * from tcloud_demo.seed_raw_payments limit 5" # run the unit test sqlmesh test
MODEL ( name tcloud_demo.stg_payments, cron '@daily', grain payment_id, audits (UNIQUE_VALUES(columns = ( payment_id )), NOT_NULL(columns = ( payment_id ))) ); SELECT id AS payment_id, order_id, payment_method, amount / 100 AS amount, /* `amount` is currently stored in cents, so we convert it to dollars */ 'new_column' AS new_column, /* non-breaking change example */ FROM tcloud_demo.seed_raw_payments
test_stg_payments: model: tcloud_demo.stg_payments inputs: tcloud_demo.seed_raw_payments: - id: 66 order_id: 58 payment_method: coupon amount: 1800 - id: 27 order_id: 24 payment_method: coupon amount: 2600 - id: 30 order_id: 25 payment_method: coupon amount: 1600 - id: 109 order_id: 95 payment_method: coupon amount: 2400 - id: 3 order_id: 3 payment_method: coupon amount: 100 outputs: query: - payment_id: 66 order_id: 58 payment_method: coupon amount: 18.0 new_column: new_column - payment_id: 27 order_id: 24 payment_method: coupon amount: 26.0 new_column: new_column - payment_id: 30 order_id: 25 payment_method: coupon amount: 16.0 new_column: new_column - payment_id: 109 order_id: 95 payment_method: coupon amount: 24.0 new_column: new_column - payment_id: 3 order_id: 3 payment_method: coupon amount: 1.0 new_column: new_column
- Never build a table more than once
- Track what data’s been modified and run only the necessary transformations for incremental models
- Run unit tests for free and configure automated audits
- Run table diffs between prod and dev based on tables/views impacted by a change
Level Up Your SQL
Write SQL in any dialect and SQLMesh will transpile it to your target SQL dialect on the fly before sending it to the warehouse.
- Debug transformation errors before you run them in your warehouse in 10+ different SQL dialects
- Definitions using simply SQL (no need for redundant and confusing
Jinja+YAML) - See impact of changes before you run them in your warehouse with column-level lineage
For more information, check out the website and documentation.
Getting Started
Install SQLMesh through pypi by running:
mkdir sqlmesh-example cd sqlmesh-example python -m venv .venv source .venv/bin/activate pip install 'sqlmesh[lsp]' # install the sqlmesh package with extensions to work with VSCode source .venv/bin/activate # reactivate the venv to ensure you're using the right installation sqlmesh init # follow the prompts to get started (choose DuckDB)
Note: You may need to run
python3orpip3instead ofpythonorpip, depending on your python installation.
Windows Installation
mkdir sqlmesh-example cd sqlmesh-example python -m venv .venv .\.venv\Scripts\Activate.ps1 pip install 'sqlmesh[lsp]' # install the sqlmesh package with extensions to work with VSCode .\.venv\Scripts\Activate.ps1 # reactivate the venv to ensure you're using the right installation sqlmesh init # follow the prompts to get started (choose DuckDB)
Follow the quickstart guide to learn how to use SQLMesh. You already have a head start!
Follow the crash course to learn the core movesets and use the easy to reference cheat sheet.
Follow this example to learn how to use SQLMesh in a full walkthrough.
Join Our Community
Together, we want to build data transformation without the waste. Connect with us in the following ways:
- Join the Tobiko Slack Community to ask questions, or just to say hi!
- File an issue on our GitHub
- Send us an email at hello@tobikodata.com with your questions or feedback
- Read our blog
Contribution
Contributions in the form of issues or pull requests (from fork) are greatly appreciated.
Read more on how to contribute to SQLMesh open source.
Watch this video walkthrough to see how our team contributes a feature to SQLMesh.
Fuente: GitHub


