Is Hadoop Dead?

Minimalist Guide to Lossless Compression

Faster File Distribution with HDFS and S3

A Minimalist Guide to Apache Flume

A Minimalist Guide to FoundationDB

A Book Review of "Architecting Modern Data Platforms"

1.1 Billion Taxi Rides: Spark 2.4.0 versus Presto 0.214

1.1 Billion Taxi Rides: 108-core ClickHouse Cluster

Convert CSVs to ORC Faster

Working with the Hadoop Distributed File System (HDFS)

Systems Monitoring: top vs Htop vs Glances

Working with Data Feeds

A Minimalist Guide to Microsoft SQL Server 2017 on Ubuntu Linux

1.1 Billion Taxi Rides with SQLite, Parquet & HDFS

Customising Airflow: Beyond Boilerplate Settings

Using SQL to query Kafka, MongoDB, MySQL, PostgreSQL and Redis with Presto

Python & Big Data: Airflow & Jupyter Notebook with Hadoop 3, Spark & Presto

1.1 Billion Taxi Rides Benchmark: EC2 versus EMR

Hadoop 3 Single-Node Install Guide

1.1B Taxi Rides with 20 Nvidia Tesla P100s and BrytlytDB

1.1 Billion Taxi Rides with BrytlytDB 2.0 & 2x p2.16xls

A Minimalist Guide to SQLite (with Python 3)

1.1 Billion Taxi Trips on 3 Raspberry Pis Running Spark 2.2

1.1B taxi rides benchmark on the GPU- and PostgreSQL-powered BrytlytDB

Compiling MapD's Source Code

1.1B taxi rides benchmarked on distributed GPU-powered MapD

Detecting Bots in Apache and Nginx Logs Using Python

Doom Bots in TensorFlow

Analysing Petabytes of Websites with PySpark

Summary of the 1.1 Billion Taxi Rides Benchmarks
