ClickHouse vs TimescaleDB | Cupper

Publish date: Oct 27, 2021 Last updated: Nov 2, 2021

Summary

Main Summary

The article "ClickHouse vs TimescaleDB" presents a detailed comparative analysis of two database management systems designed to handle large volumes of time-series data. ClickHouse, developed by Yandex, is positioned as a columnar storage solution optimized for OLAP (Online Analytical Processing) workloads, standing out for its ability to process complex queries at high speed. TimescaleDB, on the other hand, is a PostgreSQL extension purpose-built for time series, combining the familiarity of SQL with advanced time-series features. The comparison reveals fundamental differences in architecture, performance, and optimal use cases. ClickHouse shows clear superiority in scenarios requiring massive aggregations and complex analytical queries, while TimescaleDB offers advantages in maintainability and ease of use for teams already familiar with the PostgreSQL ecosystem. Both solutions take distinct approaches to the challenge of processing time-series data at enterprise scale.

Key Points

  • Storage architecture: ClickHouse uses native columnar storage optimized for analytical reads, while TimescaleDB builds on PostgreSQL's traditional relational structure with automatic horizontal partitioning of time-series data, which eases administration for teams experienced with standard SQL.

  • Query performance: ClickHouse excels at complex analytical queries over large datasets, thanks to its vectorized engine and advanced compression, while TimescaleDB delivers consistent performance on time-series-specific operations, especially real-time ingestion and time-window queries.

  • Ease of adoption: TimescaleDB offers a gentler learning curve for developers familiar with PostgreSQL and stays compatible with existing tooling, while ClickHouse requires a more specialized approach to query optimization and data modeling to reach its full potential.

  • Specific use cases: ClickHouse is better suited to data warehouses and executive dashboards with complex ad-hoc queries, while TimescaleDB is ideal for IoT, infrastructure monitoring, and applications that need low-latency storage and analysis of continuous time-series metrics.

Analysis and Implications

The choice between these platforms has significant strategic implications for organizations handling large volumes of time-series data. The decision directly affects operating costs, team productivity, and the future scalability of analytics solutions. TimescaleDB represents a conservative evolution that preserves existing PostgreSQL investments, while ClickHouse offers a more radical shift toward architectures optimized for massive-scale analytics.

Additional Context

Both technologies reflect broader trends in the database landscape: specialization for particular workloads and an evolution toward hybrid solutions that combine the best of different paradigms. The comparison also highlights the growing importance of time series in modern business intelligence and automated monitoring applications.

Content

Credits

Special thanks to Ilya for extending this blog by adding the remaining ClickHouse equivalent queries (window functions, joins, and geo queries) from the TimescaleDB docs. Your feedback and help are much appreciated.

Introduction

Recently, TimescaleDB published a blog post comparing ClickHouse and TimescaleDB using timescale/tsbs, a time-series benchmarking framework. I have some experience with PostgreSQL and ClickHouse but never got the chance to play with TimescaleDB. Some of the claims made about TimescaleDB in their post are very bold, which made me even more curious. I thought it would be a great opportunity to try it out and see whether those claims hold up.

Benchmark Methodology

To make sure that I am doing an apples-to-apples comparison and the results are not biased by my lack of knowledge of one particular system, I decided to select a dataset that both TimescaleDB and ClickHouse have documented on their websites: the NYC Taxi Dataset.

The TimescaleDB page analyzes a subset of the NYC taxi dataset (~11M rows: 1 month), so I decided to download it from them and run the queries mentioned there. Let's see how ClickHouse and TimescaleDB perform.

To make sure that I am capturing representative results, I ran each query (all of them are included in this blog) multiple times and picked the fastest run.
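The "run it several times, keep the fastest" approach can be sketched as a small helper (hypothetical code, not from the original post; in the actual benchmark each run would shell out to psql or clickhouse-client instead of calling a Python function):

```python
import time

def best_of(n, run):
    """Run `run` n times and return the fastest wall-clock duration in milliseconds."""
    best = float("inf")
    for _ in range(n):
        start = time.perf_counter()
        run()
        elapsed_ms = (time.perf_counter() - start) * 1000
        best = min(best, elapsed_ms)
    return best

# Stand-in workload; replace with the real client invocation when benchmarking.
fastest = best_of(5, lambda: sum(range(100_000)))
print(f"fastest of 5 runs: {fastest:.3f} ms")
```

Taking the minimum rather than the mean filters out cold-cache and scheduling noise, which is why the numbers below reflect warm-cache performance.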

Hardware

The following is the spec of the physical machine used for the benchmarks.

$ rg processor /proc/cpuinfo | wc -l
32
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       2.9Gi        99Gi       5.9Gi        23Gi       114Gi
Swap:             0B          0B          0B

Provisioning

To make the setup easily reproducible, I decided to run both databases in Docker containers.

TimescaleDB

$ docker run -d -e POSTGRES_PASSWORD=timescaledb timescale/timescaledb:2.4.0-pg13

As mentioned in their blog, I used the timescaledb-tune utility to tune the configuration, and I made sure the updated config was picked up by triggering a config reload.

# timescaledb-tune
Using postgresql.conf at this path:
/var/lib/postgresql/data/postgresql.conf

Is this correct? [(y)es/(n)o]: y
Writing backup to:
/tmp/timescaledb_tune.backup202110290606

success: shared_preload_libraries is set correctly

Tune memory/parallelism/WAL and other settings? [(y)es/(n)o]: y
Recommendations based on 125.79 GB of available memory and 32 CPUs for PostgreSQL 13

Memory settings recommendations
success: memory settings are already tuned

Parallelism settings recommendations
success: parallelism settings are already tuned

WAL settings recommendations
success: WAL settings are already tuned

Miscellaneous settings recommendations
success: miscellaneous settings are already tuned
Saving changes to: /var/lib/postgresql/data/postgresql.conf

# psql -U postgres
psql (13.3)
Type "help" for help.

postgres=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
postgres=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.3 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20210424) 10.3.1 20210424, 64-bit
(1 row)

ClickHouse

$ docker run -d --ulimit nofile=262144:262144 yandex/clickhouse-server:21.8
SELECT version()
┌─version()──┐
│ 21.8.10.19 │
└────────────┘

Table Creation

TimescaleDB

To make sure that the timescaledb extension is installed:

postgres=# \dx
                                      List of installed extensions
    Name     | Version |   Schema   |                            Description
-------------+---------+------------+-------------------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb | 2.4.0   | public     | Enables scalable inserts and complex queries for time-series data
(2 rows)
Table creation
postgres=# CREATE TABLE "rides"(
     vendor_id TEXT,
     pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
     dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
     passenger_count NUMERIC,
     trip_distance NUMERIC,
     pickup_longitude  NUMERIC,
     pickup_latitude   NUMERIC,
     rate_code         INTEGER,
     dropoff_longitude NUMERIC,
     dropoff_latitude  NUMERIC,
     payment_type INTEGER,
     fare_amount NUMERIC,
     extra NUMERIC,
     mta_tax NUMERIC,
     tip_amount NUMERIC,
     tolls_amount NUMERIC,
     improvement_surcharge NUMERIC,
     total_amount NUMERIC
);
CREATE TABLE

postgres=# SELECT create_hypertable(
     'rides',
     'pickup_datetime',
     'payment_type',
     2,
     create_default_indexes=>FALSE
);
create_hypertable
--------------------
 (1,public,rides,t)
(1 row)

postgres=# CREATE TABLE IF NOT EXISTS "payment_types"(
    payment_type INTEGER,
    description TEXT
);
CREATE TABLE

postgres=# INSERT INTO payment_types(payment_type, description) VALUES
    (1, 'credit card'),
    (2, 'cash'),
    (3, 'no charge'),
    (4, 'dispute'),
    (5, 'unknown'),
    (6, 'voided trip');
INSERT 0 6

postgres=# CREATE TABLE IF NOT EXISTS "rates"(
    rate_code   INTEGER,
    description TEXT
);
CREATE TABLE

postgres=# INSERT INTO rates(rate_code, description) VALUES
    (1, 'standard rate'),
    (2, 'JFK'),
    (3, 'Newark'),
    (4, 'Nassau or Westchester'),
    (5, 'negotiated fare'),
    (6, 'group ride');
INSERT 0 6
Index creation
postgres=# CREATE INDEX ON rides (vendor_id, pickup_datetime desc);
CREATE INDEX
postgres=# CREATE INDEX ON rides (pickup_datetime desc, vendor_id);
CREATE INDEX
postgres=# CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX
postgres=# CREATE INDEX ON rides (passenger_count, pickup_datetime desc);
CREATE INDEX

ClickHouse

Table creation
CREATE TABLE rides
(
  `vendor_id` String,
  `pickup_datetime` DateTime,
  `dropoff_datetime` Nullable(DateTime),
  `passenger_count` Nullable(UInt8),
  `trip_distance` Nullable(Float64),
  `pickup_longitude` Nullable(Float64),
  `pickup_latitude` Nullable(Float64),
  `rate_code` Nullable(UInt8),
  `dropoff_longitude` Nullable(Float64),
  `dropoff_latitude` Nullable(Float64),
  `payment_type` Nullable(String),
  `fare_amount` Nullable(Float32),
  `extra` Nullable(Float32),
  `mta_tax` Nullable(Float32),
  `tip_amount` Nullable(Float32),
  `tolls_amount` Nullable(Float32),
  `improvement_surcharge` Nullable(Float32),
  `total_amount` Nullable(Float32)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(pickup_datetime)
ORDER BY pickup_datetime
SETTINGS index_granularity = 8192;

CREATE TABLE payment_types
(
  `payment_type` UInt8,
  `description` String
)
ENGINE = MergeTree
ORDER BY payment_type;

INSERT INTO payment_types(payment_type, description) VALUES
(1, 'credit card'),
(2, 'cash'),
(3, 'no charge'),
(4, 'dispute'),
(5, 'unknown'),
(6, 'voided trip');

CREATE TABLE rates
(
  `rate_code` UInt8,
  `description` String
)
ENGINE = MergeTree
ORDER BY rate_code;

INSERT INTO rates(rate_code, description) VALUES
(1, 'standard rate'),
(2, 'JFK'),
(3, 'Newark'),
(4, 'Nassau or Westchester'),
(5, 'negotiated fare'),
(6, 'group ride');

Data Loading

Let’s try loading the dataset. I turned timing on for TimescaleDB; ClickHouse shows the query duration by default.

postgres=# \timing on
Timing is on.

TimescaleDB

postgres=# \COPY rides FROM nyc_data_rides.csv CSV;
COPY 10906860
Time: 220695.563 ms (03:40.696)

Since the Postgres COPY command is single-threaded, this was expected. I decided to try timescaledb-parallel-copy, increasing the number of workers and the batch size from their defaults.

# timescaledb-parallel-copy --db-name postgres --table rides --file nyc_data_rides.csv --workers 10 --batch-size 10000 --reporting-period 10s
at 10s, row rate 187998.43/sec (period), row rate 187998.43/sec (overall), 1.880000E+06 total rows
at 20s, row rate 201998.60/sec (period), row rate 194998.51/sec (overall), 3.900000E+06 total rows
at 30s, row rate 192001.22/sec (period), row rate 193999.42/sec (overall), 5.820000E+06 total rows
at 40s, row rate 181001.56/sec (period), row rate 190749.99/sec (overall), 7.630000E+06 total rows
at 50s, row rate 194998.37/sec (period), row rate 191599.67/sec (overall), 9.580000E+06 total rows
COPY 10906860

ClickHouse

# cat nyc_data_rides.csv | clickhouse-client --time --query="INSERT INTO rides FORMAT CSV"
6.702
Database     Mechanism                  Duration (s)
TimescaleDB  COPY                       220
TimescaleDB  timescaledb-parallel-copy  50
ClickHouse   INSERT                     7

Data Storage

TimescaleDB

# du -sh /var/lib/postgresql/data
4.2G    /var/lib/postgresql/data
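For a finer breakdown on the TimescaleDB side, the hypertable size functions from the TimescaleDB 2.x API can split this figure between table data and indexes (a sketch, assuming the rides hypertable created above; not run in the original post):

```sql
SELECT pg_size_pretty(table_bytes) AS table_size,
       pg_size_pretty(index_bytes) AS index_size,
       pg_size_pretty(total_bytes) AS total_size
FROM hypertable_detailed_size('rides');
```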

ClickHouse

# du -sh /var/lib/clickhouse
453M    /var/lib/clickhouse
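Much of the roughly 9x difference comes from columnar compression. One way to see where the savings land, per column, is ClickHouse's system.columns table (a sketch, not from the original post):

```sql
SELECT name,
       formatReadableSize(data_compressed_bytes)   AS compressed,
       formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'rides'
ORDER BY data_compressed_bytes DESC;
```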

Data Query

Counting all the rides that happened in 1 month.

TimescaleDB
postgres=# select count(*) from rides;
  count
----------
 10906860
(1 row)

Time: 434.663 ms
ClickHouse
SELECT count(*)
FROM rides

Query id: f2ec17cc-bf45-42a9-8181-e77ba5097416

┌──count()─┐
│ 10906860 │
└──────────┘

1 rows in set. Elapsed: 0.003 sec.
Database     Query Duration (ms)
TimescaleDB  435
ClickHouse   3

Counting the number of rides on a daily basis.

TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
         day         | count
---------------------+--------
 2016-01-01 00:00:00 | 345037
 2016-01-02 00:00:00 | 312831
 2016-01-03 00:00:00 | 302878
 2016-01-04 00:00:00 | 316171
 2016-01-05 00:00:00 | 343251
 2016-01-06 00:00:00 | 348516
 2016-01-07 00:00:00 | 364894
 2016-01-08 00:00:00 | 392070
 2016-01-09 00:00:00 | 405825
 2016-01-10 00:00:00 | 351788
 2016-01-11 00:00:00 | 342651
 2016-01-12 00:00:00 | 367390
 2016-01-13 00:00:00 | 395090
 2016-01-14 00:00:00 | 396473
 2016-01-15 00:00:00 | 401289
 2016-01-16 00:00:00 | 411899
 2016-01-17 00:00:00 | 379156
 2016-01-18 00:00:00 | 341481
 2016-01-19 00:00:00 | 385187
 2016-01-20 00:00:00 | 382105
 2016-01-21 00:00:00 | 399654
 2016-01-22 00:00:00 | 420162
 2016-01-23 00:00:00 |  78133
 2016-01-24 00:00:00 | 159766
 2016-01-25 00:00:00 | 282087
 2016-01-26 00:00:00 | 327655
 2016-01-27 00:00:00 | 359180
 2016-01-28 00:00:00 | 383326
 2016-01-29 00:00:00 | 414039
 2016-01-30 00:00:00 | 435369
 2016-01-31 00:00:00 | 361505
 2017-11-17 00:00:00 |      2
(32 rows)

Time: 1054.827 ms (00:01.055)
ClickHouse
SELECT
    toDate(pickup_datetime) AS day,
    COUNT(*)
FROM rides
GROUP BY day
ORDER BY day ASC

Query id: 97033310-0219-47d9-850c-405cae2e66a2

┌────────day─┬─count()─┐
│ 2016-01-01 │  345037 │
│ 2016-01-02 │  312831 │
│ 2016-01-03 │  302878 │
│ 2016-01-04 │  316171 │
│ 2016-01-05 │  343251 │
│ 2016-01-06 │  348516 │
│ 2016-01-07 │  364894 │
│ 2016-01-08 │  392070 │
│ 2016-01-09 │  405825 │
│ 2016-01-10 │  351788 │
│ 2016-01-11 │  342651 │
│ 2016-01-12 │  367390 │
│ 2016-01-13 │  395090 │
│ 2016-01-14 │  396473 │
│ 2016-01-15 │  401289 │
│ 2016-01-16 │  411899 │
│ 2016-01-17 │  379156 │
│ 2016-01-18 │  341481 │
│ 2016-01-19 │  385187 │
│ 2016-01-20 │  382105 │
│ 2016-01-21 │  399654 │
│ 2016-01-22 │  420162 │
│ 2016-01-23 │   78133 │
│ 2016-01-24 │  159766 │
│ 2016-01-25 │  282087 │
│ 2016-01-26 │  327655 │
│ 2016-01-27 │  359180 │
│ 2016-01-28 │  383326 │
│ 2016-01-29 │  414039 │
│ 2016-01-30 │  435369 │
│ 2016-01-31 │  361505 │
│ 2017-11-17 │       2 │
└────────────┴─────────┘

32 rows in set. Elapsed: 0.021 sec. Processed 10.91 million rows, 43.63 MB (516.78 million rows/s., 2.07 GB/s.)
Database     Query Duration (ms)
TimescaleDB  1055
ClickHouse   21

Calculating the average fare amount for single-passenger rides.

TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, avg(fare_amount)
FROM rides
WHERE passenger_count = 1
GROUP BY day
ORDER BY day;
         day         |         avg
---------------------+---------------------
 2016-01-01 00:00:00 | 12.5464748850129787
 2016-01-02 00:00:00 | 12.1129878886746750
 2016-01-03 00:00:00 | 12.8262352076841150
 2016-01-04 00:00:00 | 11.9116533573721472
 2016-01-05 00:00:00 | 11.7534235580737452
 2016-01-06 00:00:00 | 11.7824805635293235
 2016-01-07 00:00:00 | 11.9498961299166930
 2016-01-08 00:00:00 | 11.8738271415577235
 2016-01-09 00:00:00 | 11.4878104510668158
 2016-01-10 00:00:00 | 12.0888246569772508
 2016-01-11 00:00:00 | 12.1343557409576630
 2016-01-12 00:00:00 | 11.8598489409162272
 2016-01-13 00:00:00 | 11.8031804213844504
 2016-01-14 00:00:00 | 12.3123409782144971
 2016-01-15 00:00:00 | 12.5613139967255876
 2016-01-16 00:00:00 | 11.8351508976680107
 2016-01-17 00:00:00 | 12.0295602006362590
 2016-01-18 00:00:00 | 11.8623529387202913
 2016-01-19 00:00:00 | 12.3472510415266765
 2016-01-20 00:00:00 | 12.4609413534048953
 2016-01-21 00:00:00 | 12.6083024358000252
 2016-01-22 00:00:00 | 12.0812572192038226
 2016-01-23 00:00:00 | 11.8564034256627468
 2016-01-24 00:00:00 | 12.1723178660326188
 2016-01-25 00:00:00 | 14.6872562077357373
 2016-01-26 00:00:00 | 14.2577452509214630
 2016-01-27 00:00:00 | 13.2432283566172145
 2016-01-28 00:00:00 | 12.9854551087781026
 2016-01-29 00:00:00 | 12.5911266525638913
 2016-01-30 00:00:00 | 12.2759556743269683
 2016-01-31 00:00:00 | 12.5867816634207905
 2017-11-17 00:00:00 |  9.7500000000000000
(32 rows)

Time: 1499.333 ms (00:01.499)
ClickHouse
SELECT
    toDate(pickup_datetime) AS day,
    avg(fare_amount)
FROM rides
WHERE passenger_count = 1
GROUP BY day
ORDER BY day ASC

Query id: 92abcc32-f516-42d9-99dd-3b0faa7103fb

┌────────day─┬───avg(fare_amount)─┐
│ 2016-01-01 │ 12.546474885017933 │
│ 2016-01-02 │ 12.112987888606051 │
│ 2016-01-03 │ 12.826235207666542 │
│ 2016-01-04 │ 11.911653357328007 │
│ 2016-01-05 │ 11.753423558134099 │
│ 2016-01-06 │  11.78248056342069 │
│ 2016-01-07 │ 11.949896129834796 │
│ 2016-01-08 │ 11.873827141488679 │
│ 2016-01-09 │ 11.487810451064721 │
│ 2016-01-10 │ 12.088824656929884 │
│ 2016-01-11 │ 12.134355740954636 │
│ 2016-01-12 │ 11.859848940902094 │
│ 2016-01-13 │ 11.803180421366301 │
│ 2016-01-14 │ 12.312340978159439 │
│ 2016-01-15 │ 12.561313996603548 │
│ 2016-01-16 │ 11.835150897615112 │
│ 2016-01-17 │  12.02956020059271 │
│ 2016-01-18 │ 11.862352938597022 │
│ 2016-01-19 │ 12.347251041579934 │
│ 2016-01-20 │ 12.460941353486442 │
│ 2016-01-21 │ 12.608302435857224 │
│ 2016-01-22 │ 12.081257219198683 │
│ 2016-01-23 │ 11.856403425394397 │
│ 2016-01-24 │  12.17231786602557 │
│ 2016-01-25 │ 14.687256207078526 │
│ 2016-01-26 │ 14.257745250869808 │
│ 2016-01-27 │ 13.243228356662698 │
│ 2016-01-28 │ 12.985455108728543 │
│ 2016-01-29 │ 12.591126652543593 │
│ 2016-01-30 │ 12.275955679633002 │
│ 2016-01-31 │ 12.586781663432406 │
│ 2017-11-17 │               9.75 │
└────────────┴────────────────────┘

32 rows in set. Elapsed: 0.054 sec. Processed 10.91 million rows, 119.98 MB (202.67 million rows/s., 2.23 GB/s.)
Database     Query Duration (ms)
TimescaleDB  1500
ClickHouse   54

Calculating the number of rides that took place for each rate type.

TimescaleDB
postgres=# SELECT rate_code, COUNT(vendor_id) AS num_trips
FROM rides
WHERE pickup_datetime < '2016-02-01'
GROUP BY rate_code
ORDER BY rate_code;
 rate_code | num_trips
-----------+-----------
         1 |  10626315
         2 |    225019
         3 |     16822
         4 |      4696
         5 |     33688
         6 |       102
        99 |       216
(7 rows)

Time: 1014.495 ms (00:01.014)
ClickHouse
SELECT
    rate_code,
    COUNT(vendor_id) AS num_trips
FROM rides
WHERE pickup_datetime < '2016-02-01'
GROUP BY rate_code
ORDER BY rate_code ASC

Query id: 288415a8-316b-42c8-8c65-ec85d476bb1e

┌─rate_code─┬─num_trips─┐
│         1 │  10626315 │
│         2 │    225019 │
│         3 │     16822 │
│         4 │      4696 │
│         5 │     33688 │
│         6 │       102 │
│        99 │       216 │
└───────────┴───────────┘

7 rows in set. Elapsed: 0.053 sec. Processed 10.91 million rows, 174.51 MB (205.99 million rows/s., 3.30 GB/s.)
Database     Query Duration (ms)
TimescaleDB  1014
ClickHouse   53

Analysis of rides to JFK and EWR.

TimescaleDB
postgres=# SELECT rates.description, COUNT(vendor_id) AS num_trips,
   AVG(dropoff_datetime - pickup_datetime) AS avg_trip_duration, AVG(total_amount) AS avg_total,
   AVG(tip_amount) AS avg_tip, MIN(trip_distance) AS min_distance, AVG (trip_distance) AS avg_distance, MAX(trip_distance) AS max_distance,
   AVG(passenger_count) AS avg_passengers
 FROM rides
 JOIN rates ON rides.rate_code = rates.rate_code
 WHERE rides.rate_code IN (2,3) AND pickup_datetime < '2016-02-01'
 GROUP BY rates.description
 ORDER BY rates.description;
 description | num_trips | avg_trip_duration |      avg_total      |      avg_tip       | min_distance |    avg_distance     | max_distance |   avg_passengers
-------------+-----------+-------------------+---------------------+--------------------+--------------+---------------------+--------------+--------------------
 JFK         |    225019 | 00:45:46.822517   | 64.3278115181384683 | 7.3334228220728027 |         0.00 | 17.2602816651038357 |       221.00 | 1.7333869584346211
 Newark      |     16822 | 00:35:16.157472   | 86.4633688027582927 | 9.5461657353465700 |         0.00 | 16.2706122934252764 |       177.23 | 1.7435501129473309
(2 rows)

Time: 5132.768 ms (00:05.133)
ClickHouse
SELECT
    rates.description,
    COUNT(vendor_id) AS num_trips,
    AVG(dropoff_datetime - pickup_datetime) AS avg_trip_duration,
    AVG(total_amount) AS avg_total,
    AVG(tip_amount) AS avg_tip,
    MIN(trip_distance) AS min_distance,
    AVG(trip_distance) AS avg_distance,
    MAX(trip_distance) AS max_distance,
    AVG(passenger_count) AS avg_passengers
FROM rides
INNER JOIN rates ON rides.rate_code = rates.rate_code
WHERE (rides.rate_code IN (2, 3)) AND (pickup_datetime < '2016-02-01')
GROUP BY rates.description
ORDER BY rates.description ASC

Query id: 303edbf1-73f8-4426-86b2-66fbecc57f4c

┌─description─┬─num_trips─┬──avg_trip_duration─┬─────────avg_total─┬───────────avg_tip─┬─min_distance─┬───────avg_distance─┬─max_distance─┬────avg_passengers─┐
│ JFK         │    225019 │ 2746.8225172096577 │ 64.32781102360629 │ 7.333422817827029 │            0 │  17.26028166510384 │          221 │ 1.733386958434621 │
│ Newark      │     16822 │  2116.157472357627 │ 86.46336993224668 │ 9.546165732798725 │            0 │ 16.270612293425287 │       177.23 │ 1.743550112947331 │
└─────────────┴───────────┴────────────────────┴───────────────────┴───────────────────┴──────────────┴────────────────────┴──────────────┴───────────────────┘

2 rows in set. Elapsed: 0.116 sec. Processed 10.91 million rows, 458.09 MB (94.31 million rows/s., 3.96 GB/s.)
Database     Query Duration (ms)
TimescaleDB  5132
ClickHouse   116
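Note that the two systems report avg_trip_duration differently: PostgreSQL returns an interval (00:45:46 for JFK), while ClickHouse returns raw seconds (2746.8, which is the same 45 minutes 46 seconds). If a readable interval is wanted on the ClickHouse side, something like formatReadableTimeDelta (available in 21.8) could be applied; a sketch, not from the original post:

```sql
SELECT
    rates.description,
    formatReadableTimeDelta(AVG(dropoff_datetime - pickup_datetime)) AS avg_trip_duration
FROM rides
INNER JOIN rates ON rides.rate_code = rates.rate_code
WHERE rides.rate_code IN (2, 3)
GROUP BY rates.description;
```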

Calculating the number of rides that happened every 5 minutes during the first day of 2016.

TimescaleDB
postgres=# SELECT
EXTRACT(hour from pickup_datetime) as hours,
trunc(EXTRACT(minute from pickup_datetime) / 5)*5 AS five_mins,
COUNT(*)
FROM rides
WHERE pickup_datetime < '2016-01-02 00:00'
GROUP BY hours, five_mins;
hours | five_mins | count
-------+-----------+-------
  0 |     0 |   703
  0 |     5 |  1482
  0 |    10 |  1959
  0 |    15 |  2200
  0 |    20 |  2285
  0 |    25 |  2291
  0 |    30 |  2349
  0 |    35 |  2328
  0 |    40 |  2440
  0 |    45 |  2372
  0 |    50 |  2388
  0 |    55 |  2473
  1 |     0 |  2395
  1 |     5 |  2510
  1 |    10 |  2412
  1 |    15 |  2482
  1 |    20 |  2428
  1 |    25 |  2433
  1 |    30 |  2337
  1 |    35 |  2366
  1 |    40 |  2325
  1 |    45 |  2257
  1 |    50 |  2316
  1 |    55 |  2250
  2 |     0 |  2303
  2 |     5 |  2259
  2 |    10 |  2253
  2 |    15 |  2176
  2 |    20 |  2138
  2 |    25 |  2131
  2 |    30 |  2135
  2 |    35 |  2047
  2 |    40 |  2086
  2 |    45 |  2049
  2 |    50 |  1938
  2 |    55 |  1975
  3 |     0 |  1966
  3 |     5 |  1927
  3 |    10 |  1896
  3 |    15 |  1840
  3 |    20 |  1917
  3 |    25 |  1816
  3 |    30 |  1784
  3 |    35 |  1607
  3 |    40 |  1652
  3 |    45 |  1619
  3 |    50 |  1655
  3 |    55 |  1598
  4 |     0 |  1684
  4 |     5 |  1608
  4 |    10 |  1533
  4 |    15 |  1468
  4 |    20 |  1430
  4 |    25 |  1303
  4 |    30 |  1230
  4 |    35 |  1171
  4 |    40 |  1092
  4 |    45 |  1023
  4 |    50 |   998
  4 |    55 |   879
  5 |     0 |   837
  5 |     5 |   858
  5 |    10 |   733
  5 |    15 |   776
  5 |    20 |   662
  5 |    25 |   703
  5 |    30 |   629
  5 |    35 |   624
  5 |    40 |   611
  5 |    45 |   560
  5 |    50 |   555
  5 |    55 |   554
  6 |     0 |   551
  6 |     5 |   519
  6 |    10 |   516
  6 |    15 |   501
  6 |    20 |   500
  6 |    25 |   510
  6 |    30 |   497
  6 |    35 |   524
  6 |    40 |   508
  6 |    45 |   463
  6 |    50 |   479
  6 |    55 |   483
  7 |     0 |   476
  7 |     5 |   456
  7 |    10 |   447
  7 |    15 |   459
  7 |    20 |   451
  7 |    25 |   450
  7 |    30 |   440
  7 |    35 |   418
  7 |    40 |   417
  7 |    45 |   426
  7 |    50 |   413
  7 |    55 |   401
  8 |     0 |   396
  8 |     5 |   405
  8 |    10 |   409
  8 |    15 |   447
  8 |    20 |   408
  8 |    25 |   404
  8 |    30 |   396
  8 |    35 |   421
  8 |    40 |   441
  8 |    45 |   374
  8 |    50 |   427
  8 |    55 |   412
  9 |     0 |   410
  9 |     5 |   454
  9 |    10 |   487
  9 |    15 |   510
  9 |    20 |   488
  9 |    25 |   456
  9 |    30 |   456
  9 |    35 |   539
  9 |    40 |   542
  9 |    45 |   564
  9 |    50 |   568
  9 |    55 |   581
  10 |     0 |   596
  10 |     5 |   592
  10 |    10 |   669
  10 |    15 |   656
  10 |    20 |   703
  10 |    25 |   667
  10 |    30 |   706
  10 |    35 |   662
  10 |    40 |   739
  10 |    45 |   788
  10 |    50 |   814
  10 |    55 |   843
  11 |     0 |   779
  11 |     5 |   861
  11 |    10 |   798
  11 |    15 |   881
  11 |    20 |   893
  11 |    25 |   977
  11 |    30 |  1006
  11 |    35 |  1025
  11 |    40 |  1114
  11 |    45 |  1120
  11 |    50 |  1165
  11 |    55 |  1133
  12 |     0 |  1132
  12 |     5 |  1156
  12 |    10 |  1157
  12 |    15 |  1166
  12 |    20 |  1203
  12 |    25 |  1122
  12 |    30 |  1119
  12 |    35 |  1181
  12 |    40 |  1194
  12 |    45 |  1228
  12 |    50 |  1174
  12 |    55 |  1195
  13 |     0 |  1234
  13 |     5 |  1204
  13 |    10 |  1276
  13 |    15 |  1285
  13 |    20 |  1272
  13 |    25 |  1280
  13 |    30 |  1282
  13 |    35 |  1252
  13 |    40 |  1420
  13 |    45 |  1370
  13 |    50 |  1430
  13 |    55 |  1389
  14 |     0 |  1423
  14 |     5 |  1346
  14 |    10 |  1293
  14 |    15 |  1328
  14 |    20 |  1355
  14 |    25 |  1333
  14 |    30 |  1367
  14 |    35 |  1475
  14 |    40 |  1397
  14 |    45 |  1439
  14 |    50 |  1444
  14 |    55 |  1352
  15 |     0 |  1315
  15 |     5 |  1327
  15 |    10 |  1330
  15 |    15 |  1380
  15 |    20 |  1351
  15 |    25 |  1322
  15 |    30 |  1411
  15 |    35 |  1346
  15 |    40 |  1355
  15 |    45 |  1353
  15 |    50 |  1386
  15 |    55 |  1263
  16 |     0 |  1308
  16 |     5 |  1278
  16 |    10 |  1185
  16 |    15 |  1256
  16 |    20 |  1223
  16 |    25 |  1223
  16 |    30 |  1204
  16 |    35 |  1269
  16 |    40 |  1184
  16 |    45 |  1208
  16 |    50 |  1182
  16 |    55 |  1266
  17 |     0 |  1228
  17 |     5 |  1323
  17 |    10 |  1285
  17 |    15 |  1242
  17 |    20 |  1303
  17 |    25 |  1272
  17 |    30 |  1335
  17 |    35 |  1297
  17 |    40 |  1328
  17 |    45 |  1444
  17 |    50 |  1423
  17 |    55 |  1367
  18 |     0 |  1365
  18 |     5 |  1377
  18 |    10 |  1327
  18 |    15 |  1352
  18 |    20 |  1370
  18 |    25 |  1316
  18 |    30 |  1448
  18 |    35 |  1341
  18 |    40 |  1475
  18 |    45 |  1508
  18 |    50 |  1543
  18 |    55 |  1358
  19 |     0 |  1308
  19 |     5 |  1368
  19 |    10 |  1426
  19 |    15 |  1372
  19 |    20 |  1365
  19 |    25 |  1315
  19 |    30 |  1321
  19 |    35 |  1260
  19 |    40 |  1280
  19 |    45 |  1246
  19 |    50 |  1216
  19 |    55 |  1192
  20 |     0 |  1154
  20 |     5 |  1091
  20 |    10 |  1100
  20 |    15 |  1118
  20 |    20 |  1105
  20 |    25 |  1060
  20 |    30 |  1135
  20 |    35 |  1126
  20 |    40 |  1098
  20 |    45 |  1136
  20 |    50 |  1039
  20 |    55 |  1087
  21 |     0 |  1042
  21 |     5 |  1025
  21 |    10 |  1113
  21 |    15 |  1127
  21 |    20 |  1102
  21 |    25 |  1068
  21 |    30 |  1104
  21 |    35 |  1083
  21 |    40 |  1101
  21 |    45 |  1127
  21 |    50 |  1086
  21 |    55 |  1095
  22 |     0 |  1103
  22 |     5 |  1199
  22 |    10 |  1126
  22 |    15 |  1130
  22 |    20 |  1110
  22 |    25 |  1098
  22 |    30 |  1197
  22 |    35 |  1189
  22 |    40 |  1167
  22 |    45 |  1247
  22 |    50 |  1146
  22 |    55 |  1030
  23 |     0 |  1139
  23 |     5 |  1082
  23 |    10 |  1065
  23 |    15 |   986
  23 |    20 |  1032
  23 |    25 |   988
  23 |    30 |  1032
  23 |    35 |  1100
  23 |    40 |  1142
  23 |    45 |  1145
  23 |    50 |  1149
  23 |    55 |  1063
(288 rows)

Time: 170.031 ms
ClickHouse
SELECT
    toHour(pickup_datetime) AS hours,
    trunc(toMinute(pickup_datetime) / 5) * 5 AS five_mins,
    COUNT(*)
FROM rides
WHERE pickup_datetime < '2016-01-02 00:00:00'
GROUP BY
    hours,
    five_mins
ORDER BY
    hours ASC,
    five_mins ASC

Query id: f9606d18-3852-41a3-9b21-7b58dc23c9da

┌─hours─┬─five_mins─┬─count()─┐
│     0 │         0 │     703 │
│     0 │         5 │    1482 │
│     0 │        10 │    1959 │
│     0 │        15 │    2200 │
│     0 │        20 │    2285 │
│     0 │        25 │    2291 │
│     0 │        30 │    2349 │
│     0 │        35 │    2328 │
│     0 │        40 │    2440 │
│     0 │        45 │    2372 │
│     0 │        50 │    2388 │
│     0 │        55 │    2473 │
│     1 │         0 │    2395 │
│     1 │         5 │    2510 │
│     1 │        10 │    2412 │
│     1 │        15 │    2482 │
│     1 │        20 │    2428 │
│     1 │        25 │    2433 │
│     1 │        30 │    2337 │
│     1 │        35 │    2366 │
│     1 │        40 │    2325 │
│     1 │        45 │    2257 │
│     1 │        50 │    2316 │
│     1 │        55 │    2250 │
│     2 │         0 │    2303 │
│     2 │         5 │    2259 │
│     2 │        10 │    2253 │
│     2 │        15 │    2176 │
│     2 │        20 │    2138 │
│     2 │        25 │    2131 │
│     2 │        30 │    2135 │
│     2 │        35 │    2047 │
│     2 │        40 │    2086 │
│     2 │        45 │    2049 │
│     2 │        50 │    1938 │
│     2 │        55 │    1975 │
│     3 │         0 │    1966 │
│     3 │         5 │    1927 │
│     3 │        10 │    1896 │
│     3 │        15 │    1840 │
│     3 │        20 │    1917 │
│     3 │        25 │    1816 │
│     3 │        30 │    1784 │
│     3 │        35 │    1607 │
│     3 │        40 │    1652 │
│     3 │        45 │    1619 │
│     3 │        50 │    1655 │
│     3 │        55 │    1598 │
│     4 │         0 │    1684 │
│     4 │         5 │    1608 │
│     4 │        10 │    1533 │
│     4 │        15 │    1468 │
│     4 │        20 │    1430 │
│     4 │        25 │    1303 │
│     4 │        30 │    1230 │
│     4 │        35 │    1171 │
│     4 │        40 │    1092 │
│     4 │        45 │    1023 │
│     4 │        50 │     998 │
│     4 │        55 │     879 │
│     5 │         0 │     837 │
│     5 │         5 │     858 │
│     5 │        10 │     733 │
│     5 │        15 │     776 │
│     5 │        20 │     662 │
│     5 │        25 │     703 │
│     5 │        30 │     629 │
│     5 │        35 │     624 │
│     5 │        40 │     611 │
│     5 │        45 │     560 │
│     5 │        50 │     555 │
│     5 │        55 │     554 │
│     6 │         0 │     551 │
│     6 │         5 │     519 │
│     6 │        10 │     516 │
│     6 │        15 │     501 │
│     6 │        20 │     500 │
│     6 │        25 │     510 │
│     6 │        30 │     497 │
│     6 │        35 │     524 │
│     6 │        40 │     508 │
│     6 │        45 │     463 │
│     6 │        50 │     479 │
│     6 │        55 │     483 │
│     7 │         0 │     476 │
│     7 │         5 │     456 │
│     7 │        10 │     447 │
│     7 │        15 │     459 │
│     7 │        20 │     451 │
│     7 │        25 │     450 │
│     7 │        30 │     440 │
│     7 │        35 │     418 │
│     7 │        40 │     417 │
│     7 │        45 │     426 │
│     7 │        50 │     413 │
│     7 │        55 │     401 │
│     8 │         0 │     396 │
│     8 │         5 │     405 │
│     8 │        10 │     409 │
│     8 │        15 │     447 │
│     8 │        20 │     408 │
│     8 │        25 │     404 │
│     8 │        30 │     396 │
│     8 │        35 │     421 │
│     8 │        40 │     441 │
│     8 │        45 │     374 │
│     8 │        50 │     427 │
│     8 │        55 │     412 │
│     9 │         0 │     410 │
│     9 │         5 │     454 │
│     9 │        10 │     487 │
│     9 │        15 │     510 │
│     9 │        20 │     488 │
│     9 │        25 │     456 │
│     9 │        30 │     456 │
│     9 │        35 │     539 │
│     9 │        40 │     542 │
│     9 │        45 │     564 │
│     9 │        50 │     568 │
│     9 │        55 │     581 │
│    10 │         0 │     596 │
│    10 │         5 │     592 │
│    10 │        10 │     669 │
│    10 │        15 │     656 │
│    10 │        20 │     703 │
│    10 │        25 │     667 │
│    10 │        30 │     706 │
│    10 │        35 │     662 │
│    10 │        40 │     739 │
│    10 │        45 │     788 │
│    10 │        50 │     814 │
│    10 │        55 │     843 │
│    11 │         0 │     779 │
│    11 │         5 │     861 │
│    11 │        10 │     798 │
│    11 │        15 │     881 │
│    11 │        20 │     893 │
│    11 │        25 │     977 │
│    11 │        30 │    1006 │
│    11 │        35 │    1025 │
│    11 │        40 │    1114 │
│    11 │        45 │    1120 │
│    11 │        50 │    1165 │
│    11 │        55 │    1133 │
│    12 │         0 │    1132 │
│    12 │         5 │    1156 │
│    12 │        10 │    1157 │
│    12 │        15 │    1166 │
│    12 │        20 │    1203 │
│    12 │        25 │    1122 │
│    12 │        30 │    1119 │
│    12 │        35 │    1181 │
│    12 │        40 │    1194 │
│    12 │        45 │    1228 │
│    12 │        50 │    1174 │
│    12 │        55 │    1195 │
│    13 │         0 │    1234 │
│    13 │         5 │    1204 │
│    13 │        10 │    1276 │
│    13 │        15 │    1285 │
│    13 │        20 │    1272 │
│    13 │        25 │    1280 │
│    13 │        30 │    1282 │
│    13 │        35 │    1252 │
│    13 │        40 │    1420 │
│    13 │        45 │    1370 │
│    13 │        50 │    1430 │
│    13 │        55 │    1389 │
│    14 │         0 │    1423 │
│    14 │         5 │    1346 │
│    14 │        10 │    1293 │
│    14 │        15 │    1328 │
│    14 │        20 │    1355 │
│    14 │        25 │    1333 │
│    14 │        30 │    1367 │
│    14 │        35 │    1475 │
│    14 │        40 │    1397 │
│    14 │        45 │    1439 │
│    14 │        50 │    1444 │
│    14 │        55 │    1352 │
│    15 │         0 │    1315 │
│    15 │         5 │    1327 │
│    15 │        10 │    1330 │
│    15 │        15 │    1380 │
│    15 │        20 │    1351 │
│    15 │        25 │    1322 │
│    15 │        30 │    1411 │
│    15 │        35 │    1346 │
│    15 │        40 │    1355 │
│    15 │        45 │    1353 │
│    15 │        50 │    1386 │
│    15 │        55 │    1263 │
│    16 │         0 │    1308 │
│    16 │         5 │    1278 │
│    16 │        10 │    1185 │
│    16 │        15 │    1256 │
│    16 │        20 │    1223 │
│    16 │        25 │    1223 │
│    16 │        30 │    1204 │
│    16 │        35 │    1269 │
│    16 │        40 │    1184 │
│    16 │        45 │    1208 │
│    16 │        50 │    1182 │
│    16 │        55 │    1266 │
│    17 │         0 │    1228 │
│    17 │         5 │    1323 │
│    17 │        10 │    1285 │
│    17 │        15 │    1242 │
│    17 │        20 │    1303 │
│    17 │        25 │    1272 │
│    17 │        30 │    1335 │
│    17 │        35 │    1297 │
│    17 │        40 │    1328 │
│    17 │        45 │    1444 │
│    17 │        50 │    1423 │
│    17 │        55 │    1367 │
│    18 │         0 │    1365 │
│    18 │         5 │    1377 │
│    18 │        10 │    1327 │
│    18 │        15 │    1352 │
│    18 │        20 │    1370 │
│    18 │        25 │    1316 │
│    18 │        30 │    1448 │
│    18 │        35 │    1341 │
│    18 │        40 │    1475 │
│    18 │        45 │    1508 │
│    18 │        50 │    1543 │
│    18 │        55 │    1358 │
│    19 │         0 │    1308 │
│    19 │         5 │    1368 │
│    19 │        10 │    1426 │
│    19 │        15 │    1372 │
│    19 │        20 │    1365 │
│    19 │        25 │    1315 │
│    19 │        30 │    1321 │
│    19 │        35 │    1260 │
│    19 │        40 │    1280 │
│    19 │        45 │    1246 │
│    19 │        50 │    1216 │
│    19 │        55 │    1192 │
│    20 │         0 │    1154 │
│    20 │         5 │    1091 │
│    20 │        10 │    1100 │
│    20 │        15 │    1118 │
│    20 │        20 │    1105 │
│    20 │        25 │    1060 │
│    20 │        30 │    1135 │
│    20 │        35 │    1126 │
│    20 │        40 │    1098 │
│    20 │        45 │    1136 │
│    20 │        50 │    1039 │
│    20 │        55 │    1087 │
│    21 │         0 │    1042 │
│    21 │         5 │    1025 │
│    21 │        10 │    1113 │
│    21 │        15 │    1127 │
│    21 │        20 │    1102 │
│    21 │        25 │    1068 │
│    21 │        30 │    1104 │
│    21 │        35 │    1083 │
│    21 │        40 │    1101 │
│    21 │        45 │    1127 │
│    21 │        50 │    1086 │
│    21 │        55 │    1095 │
│    22 │         0 │    1103 │
│    22 │         5 │    1199 │
│    22 │        10 │    1126 │
│    22 │        15 │    1130 │
│    22 │        20 │    1110 │
│    22 │        25 │    1098 │
│    22 │        30 │    1197 │
│    22 │        35 │    1189 │
│    22 │        40 │    1167 │
│    22 │        45 │    1247 │
│    22 │        50 │    1146 │
│    22 │        55 │    1030 │
│    23 │         0 │    1139 │
│    23 │         5 │    1082 │
│    23 │        10 │    1065 │
│    23 │        15 │     986 │
│    23 │        20 │    1032 │
│    23 │        25 │     988 │
│    23 │        30 │    1032 │
│    23 │        35 │    1100 │
│    23 │        40 │    1142 │
│    23 │        45 │    1145 │
│    23 │        50 │    1149 │
│    23 │        55 │    1063 │
└───────┴───────────┴─────────┘

288 rows in set. Elapsed: 0.015 sec. Processed 352.26 thousand rows, 1.41 MB (23.05 million rows/s., 92.19 MB/s.)
Database Query Duration (ms)
TimescaleDB 170
ClickHouse 15

Calculating the number of rides on New Year’s morning that originated within 400 m of Times Square, in 30-minute buckets.

TimescaleDB
postgres=# SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq
FROM rides
WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
AND pickup_datetime < '2016-01-01 14:00'
GROUP BY thirty_min ORDER BY thirty_min;
    thirty_min      | near_times_sq
---------------------+---------------
2016-01-01 00:00:00 |            74
2016-01-01 00:30:00 |           102
2016-01-01 01:00:00 |           120
2016-01-01 01:30:00 |            98
2016-01-01 02:00:00 |           112
2016-01-01 02:30:00 |           109
2016-01-01 03:00:00 |           163
2016-01-01 03:30:00 |           181
2016-01-01 04:00:00 |           214
2016-01-01 04:30:00 |           185
2016-01-01 05:00:00 |           158
2016-01-01 05:30:00 |           113
2016-01-01 06:00:00 |           102
2016-01-01 06:30:00 |            91
2016-01-01 07:00:00 |            88
2016-01-01 07:30:00 |            58
2016-01-01 08:00:00 |            72
2016-01-01 08:30:00 |            94
2016-01-01 09:00:00 |           115
2016-01-01 09:30:00 |           118
2016-01-01 10:00:00 |           135
2016-01-01 10:30:00 |           160
2016-01-01 11:00:00 |           212
2016-01-01 11:30:00 |           229
2016-01-01 12:00:00 |           244
2016-01-01 12:30:00 |           230
2016-01-01 13:00:00 |           235
2016-01-01 13:30:00 |           238
(28 rows)

Time: 167.335 ms
ClickHouse
SELECT
    toStartOfInterval(pickup_datetime, toIntervalMinute(30)) AS thirty_min,
    COUNT(*) AS near_times_sq
FROM rides
WHERE (greatCircleDistance(-73.9851, 40.7589, pickup_longitude, pickup_latitude) < 400) AND (pickup_datetime < '2016-01-01 14:00:00')
GROUP BY thirty_min
ORDER BY thirty_min ASC

Query id: 047dfcbf-8a49-47b1-a2b6-2a44600bed30

┌──────────thirty_min─┬─near_times_sq─┐
│ 2016-01-01 00:00:00 │            73 │
│ 2016-01-01 00:30:00 │           101 │
│ 2016-01-01 01:00:00 │           120 │
│ 2016-01-01 01:30:00 │            98 │
│ 2016-01-01 02:00:00 │           111 │
│ 2016-01-01 02:30:00 │           109 │
│ 2016-01-01 03:00:00 │           163 │
│ 2016-01-01 03:30:00 │           181 │
│ 2016-01-01 04:00:00 │           215 │
│ 2016-01-01 04:30:00 │           186 │
│ 2016-01-01 05:00:00 │           158 │
│ 2016-01-01 05:30:00 │           114 │
│ 2016-01-01 06:00:00 │           101 │
│ 2016-01-01 06:30:00 │            92 │
│ 2016-01-01 07:00:00 │            88 │
│ 2016-01-01 07:30:00 │            60 │
│ 2016-01-01 08:00:00 │            72 │
│ 2016-01-01 08:30:00 │            95 │
│ 2016-01-01 09:00:00 │           114 │
│ 2016-01-01 09:30:00 │           118 │
│ 2016-01-01 10:00:00 │           136 │
│ 2016-01-01 10:30:00 │           160 │
│ 2016-01-01 11:00:00 │           212 │
│ 2016-01-01 11:30:00 │           230 │
│ 2016-01-01 12:00:00 │           245 │
│ 2016-01-01 12:30:00 │           228 │
│ 2016-01-01 13:00:00 │           237 │
│ 2016-01-01 13:30:00 │           239 │
└─────────────────────┴───────────────┘

28 rows in set. Elapsed: 0.016 sec. Processed 196.61 thousand rows, 4.33 MB (12.11 million rows/s., 266.38 MB/s.)
Database Query Duration (ms)
TimescaleDB 167
ClickHouse 16
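Note that the two queries don’t use the same distance predicate: TimescaleDB measures planar distance between PostGIS geometries projected to EPSG:2163, while ClickHouse computes a spherical great-circle distance. That likely explains the small per-bucket differences in the counts (e.g. 74 vs. 73 in the first bucket). The two metrics can be compared directly for a single pickup point (the second coordinate pair below is hypothetical, chosen for illustration):

```sql
-- ClickHouse: great-circle distance in meters; arguments are (lon1, lat1, lon2, lat2)
SELECT greatCircleDistance(-73.9851, 40.7589, -73.9880, 40.7600) AS meters;

-- TimescaleDB/PostGIS: planar distance in meters after projecting to EPSG:2163
SELECT ST_Distance(
    ST_Transform(ST_SetSRID(ST_MakePoint(-73.9880, 40.7600), 4326), 2163),
    ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851, 40.7589), 4326), 2163)
) AS meters;
```

Both should return roughly 270 m for this point, but the two metrics can disagree by a few meters, which is enough to move rides sitting near the 400 m cutoff in or out of the count.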

Enabling TimescaleDB Compression

After running the above benchmarks, I felt I was missing something important about TimescaleDB and went through their docs again, until I realised that TimescaleDB doesn’t enable compression by default, which is quite strange. So I enabled compression and re-ran the benchmark queries.

postgres=# ALTER TABLE rides SET (
   timescaledb.compress,
   timescaledb.compress_segmentby = 'vendor_id'
);
ALTER TABLE

postgres=# SELECT add_compression_policy('rides', INTERVAL '0 days');
 add_compression_policy
------------------------
                   1000
(1 row)

After enabling compression, I verified from the logs that it had actually been triggered.

2021-10-31 02:16:48.675 UTC [252] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_1_chunk
2021-10-31 02:16:57.931 UTC [255] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_2_chunk
2021-10-31 02:17:04.964 UTC [258] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_5_chunk
2021-10-31 02:17:18.686 UTC [260] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_6_chunk
2021-10-31 02:17:25.894 UTC [264] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_7_chunk
2021-10-31 02:17:39.172 UTC [266] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_8_chunk
2021-10-31 02:17:49.474 UTC [270] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_9_chunk
2021-10-31 02:17:55.110 UTC [273] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_10_chunk
2021-10-31 02:18:03.536 UTC [275] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_3_chunk
2021-10-31 02:18:08.258 UTC [278] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_4_chunk
2021-10-31 02:18:08.403 UTC [280] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_11_chunk
2021-10-31 02:18:08.510 UTC [281] LOG:  completed compressing chunk _timescaledb_internal._hyper_1_12_chunk

This definitely reduced the disk usage by TimescaleDB.

# du -sh /var/lib/postgresql/data/
1.6G    /var/lib/postgresql/data/
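Beyond du, TimescaleDB can report the compression ratio itself; a minimal sketch, assuming TimescaleDB 2.x:

```sql
-- Before/after sizes as tracked by TimescaleDB for the 'rides' hypertable
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('rides');
```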

Let’s run the above queries again.

Counting all the rides that happened in 1 month.

TimescaleDB (with compression)
postgres=# select count(*) from rides;
  count
----------
 10906860
(1 row)

Time: 282.369 ms

Counting the number of rides on a daily basis.

TimescaleDB (with compression)
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
         day         | count
---------------------+--------
 2016-01-01 00:00:00 | 345037
 2016-01-02 00:00:00 | 312831
 2016-01-03 00:00:00 | 302878
 2016-01-04 00:00:00 | 316171
 2016-01-05 00:00:00 | 343251
 2016-01-06 00:00:00 | 348516
 2016-01-07 00:00:00 | 364894
 2016-01-08 00:00:00 | 392070
 2016-01-09 00:00:00 | 405825
 2016-01-10 00:00:00 | 351788
 2016-01-11 00:00:00 | 342651
 2016-01-12 00:00:00 | 367390
 2016-01-13 00:00:00 | 395090
 2016-01-14 00:00:00 | 396473
 2016-01-15 00:00:00 | 401289
 2016-01-16 00:00:00 | 411899
 2016-01-17 00:00:00 | 379156
 2016-01-18 00:00:00 | 341481
 2016-01-19 00:00:00 | 385187
 2016-01-20 00:00:00 | 382105
 2016-01-21 00:00:00 | 399654
 2016-01-22 00:00:00 | 420162
 2016-01-23 00:00:00 |  78133
 2016-01-24 00:00:00 | 159766
 2016-01-25 00:00:00 | 282087
 2016-01-26 00:00:00 | 327655
 2016-01-27 00:00:00 | 359180
 2016-01-28 00:00:00 | 383326
 2016-01-29 00:00:00 | 414039
 2016-01-30 00:00:00 | 435369
 2016-01-31 00:00:00 | 361505
 2017-11-17 00:00:00 |      2
(32 rows)

Time: 1033.075 ms (00:01.033)

Calculating the daily average fare amount for rides with a single passenger.

TimescaleDB (with compression)
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, avg(fare_amount)
FROM rides
WHERE passenger_count = 1
GROUP BY day
ORDER BY day;
         day         |         avg
---------------------+---------------------
 2016-01-01 00:00:00 | 12.5464748850129787
 2016-01-02 00:00:00 | 12.1129878886746750
 2016-01-03 00:00:00 | 12.8262352076841150
 2016-01-04 00:00:00 | 11.9116533573721472
 2016-01-05 00:00:00 | 11.7534235580737452
 2016-01-06 00:00:00 | 11.7824805635293235
 2016-01-07 00:00:00 | 11.9498961299166930
 2016-01-08 00:00:00 | 11.8738271415577235
 2016-01-09 00:00:00 | 11.4878104510668158
 2016-01-10 00:00:00 | 12.0888246569772508
 2016-01-11 00:00:00 | 12.1343557409576630
 2016-01-12 00:00:00 | 11.8598489409162272
 2016-01-13 00:00:00 | 11.8031804213844504
 2016-01-14 00:00:00 | 12.3123409782144971
 2016-01-15 00:00:00 | 12.5613139967255876
 2016-01-16 00:00:00 | 11.8351508976680107
 2016-01-17 00:00:00 | 12.0295602006362590
 2016-01-18 00:00:00 | 11.8623529387202913
 2016-01-19 00:00:00 | 12.3472510415266765
 2016-01-20 00:00:00 | 12.4609413534048953
 2016-01-21 00:00:00 | 12.6083024358000252
 2016-01-22 00:00:00 | 12.0812572192038226
 2016-01-23 00:00:00 | 11.8564034256627468
 2016-01-24 00:00:00 | 12.1723178660326188
 2016-01-25 00:00:00 | 14.6872562077357373
 2016-01-26 00:00:00 | 14.2577452509214630
 2016-01-27 00:00:00 | 13.2432283566172145
 2016-01-28 00:00:00 | 12.9854551087781026
 2016-01-29 00:00:00 | 12.5911266525638913
 2016-01-30 00:00:00 | 12.2759556743269683
 2016-01-31 00:00:00 | 12.5867816634207905
 2017-11-17 00:00:00 |  9.7500000000000000
(32 rows)

Time: 1599.145 ms (00:01.599)

Calculating the number of rides that took place for each rate type.

TimescaleDB (with compression)
postgres=# SELECT rate_code, COUNT(vendor_id) AS num_trips
FROM rides
WHERE pickup_datetime < '2016-02-01'
GROUP BY rate_code
ORDER BY rate_code;
 rate_code | num_trips
-----------+-----------
         1 |  10626315
         2 |    225019
         3 |     16822
         4 |      4696
         5 |     33688
         6 |       102
        99 |       216
(7 rows)

Time: 974.998 ms

Overall, although some TimescaleDB queries became faster with compression enabled, many others became a bit slower, probably due to decompression overhead. This may be why TimescaleDB disables compression by default.
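If the tradeoff isn’t worth it for a given workload, the compression experiment can be reverted; a sketch, assuming TimescaleDB 2.x:

```sql
-- Remove the policy so chunks are no longer compressed automatically
SELECT remove_compression_policy('rides');

-- Decompress every currently compressed chunk of the hypertable
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('rides') AS c;
```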

Digging Deeper Into TimescaleDB

I wanted to understand how queries work in TimescaleDB, so I tried out EXPLAIN (ANALYZE, FORMAT JSON, VERBOSE, BUFFERS) <query>.


TimescaleDB stores the data in chunks (12 in our case), runs the query over those chunks in parallel, and aggregates the results.

postgres=# SELECT show_chunks('rides');
               show_chunks
-----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
 _timescaledb_internal._hyper_1_2_chunk
 ...
 ...
 _timescaledb_internal._hyper_1_11_chunk
 _timescaledb_internal._hyper_1_12_chunk
(12 rows)
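The time range covered by each chunk can also be inspected through the timescaledb_information.chunks view (TimescaleDB 2.x):

```sql
-- One row per chunk, with the time interval it covers
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'rides'
ORDER BY range_start;
```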

I was curious whether changing this chunk size/count would impact query performance.

Reduce chunk count

Let’s first try to reduce the number of chunks.

postgres=# SELECT set_chunk_time_interval('rides', INTERVAL '1 year');
 set_chunk_time_interval
-------------------------

(1 row)

postgres=# SELECT show_chunks('rides');
                show_chunks
-------------------------------------------
 _timescaledb_internal._hyper_1_1565_chunk
 _timescaledb_internal._hyper_1_1566_chunk
 _timescaledb_internal._hyper_1_1567_chunk
 _timescaledb_internal._hyper_1_1568_chunk
(4 rows)

Let’s run the query to calculate the number of rides on a daily basis.

TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
         day         | count
---------------------+--------
 2016-01-01 00:00:00 | 345037
 2016-01-02 00:00:00 | 312831
 2016-01-03 00:00:00 | 302878
 2016-01-04 00:00:00 | 316171
 2016-01-05 00:00:00 | 343251
 2016-01-06 00:00:00 | 348516
 2016-01-07 00:00:00 | 364894
 2016-01-08 00:00:00 | 392070
 2016-01-09 00:00:00 | 405825
 2016-01-10 00:00:00 | 351788
 2016-01-11 00:00:00 | 342651
 2016-01-12 00:00:00 | 367390
 2016-01-13 00:00:00 | 395090
 2016-01-14 00:00:00 | 396473
 2016-01-15 00:00:00 | 401289
 2016-01-16 00:00:00 | 411899
 2016-01-17 00:00:00 | 379156
 2016-01-18 00:00:00 | 341481
 2016-01-19 00:00:00 | 385187
 2016-01-20 00:00:00 | 382105
 2016-01-21 00:00:00 | 399654
 2016-01-22 00:00:00 | 420162
 2016-01-23 00:00:00 |  78133
 2016-01-24 00:00:00 | 159766
 2016-01-25 00:00:00 | 282087
 2016-01-26 00:00:00 | 327655
 2016-01-27 00:00:00 | 359180
 2016-01-28 00:00:00 | 383326
 2016-01-29 00:00:00 | 414039
 2016-01-30 00:00:00 | 435369
 2016-01-31 00:00:00 | 361505
 2017-11-17 00:00:00 |      2
(32 rows)

Time: 1184.163 ms (00:01.184)

Query performance got worse with this change.

Increase chunk count

Let’s now try to increase the number of chunks.

postgres=# SELECT set_chunk_time_interval('rides', INTERVAL '1 day');
 set_chunk_time_interval
-------------------------

(1 row)

postgres=# SELECT show_chunks('rides');
                show_chunks
-------------------------------------------
 _timescaledb_internal._hyper_1_1501_chunk
 _timescaledb_internal._hyper_1_1502_chunk
 ...
 ...
 _timescaledb_internal._hyper_1_1563_chunk
 _timescaledb_internal._hyper_1_1564_chunk
(64 rows)

This created 64 chunks.

Let’s run the query to calculate the number of rides on a daily basis.

TimescaleDB
postgres=# SELECT date_trunc('day', pickup_datetime)
AS day, COUNT(*)
FROM rides
GROUP BY day
ORDER BY day;
         day         | count
---------------------+--------
 2016-01-01 00:00:00 | 345037
 2016-01-02 00:00:00 | 312831
 2016-01-03 00:00:00 | 302878
 2016-01-04 00:00:00 | 316171
 2016-01-05 00:00:00 | 343251
 2016-01-06 00:00:00 | 348516
 2016-01-07 00:00:00 | 364894
 2016-01-08 00:00:00 | 392070
 2016-01-09 00:00:00 | 405825
 2016-01-10 00:00:00 | 351788
 2016-01-11 00:00:00 | 342651
 2016-01-12 00:00:00 | 367390
 2016-01-13 00:00:00 | 395090
 2016-01-14 00:00:00 | 396473
 2016-01-15 00:00:00 | 401289
 2016-01-16 00:00:00 | 411899
 2016-01-17 00:00:00 | 379156
 2016-01-18 00:00:00 | 341481
 2016-01-19 00:00:00 | 385187
 2016-01-20 00:00:00 | 382105
 2016-01-21 00:00:00 | 399654
 2016-01-22 00:00:00 | 420162
 2016-01-23 00:00:00 |  78133
 2016-01-24 00:00:00 | 159766
 2016-01-25 00:00:00 | 282087
 2016-01-26 00:00:00 | 327655
 2016-01-27 00:00:00 | 359180
 2016-01-28 00:00:00 | 383326
 2016-01-29 00:00:00 | 414039
 2016-01-30 00:00:00 | 435369
 2016-01-31 00:00:00 | 361505
 2017-11-17 00:00:00 |      2
(32 rows)

Time: 691.523 ms

Query performance improved with this change.

Database Chunk Count Query Duration (ms)
TimescaleDB 4 1184
TimescaleDB 12 1055
TimescaleDB 64 692

Conclusion

  • We found that ClickHouse did better on storage, read, and write performance.
  • ClickHouse’s defaults are pretty good and don’t require any server-side tuning, though tuning the table schema definitely helps. TimescaleDB requires tuning to perform faster.

Last but not least, I may be completely wrong in my evaluation. Feel free to point it out. We are all doing our best to write good software.