Modern Data Stack

What is the Modern Data Stack?

Reference Articles:

  1. The future of modern data stack

The Modern Data Stack commonly refers to a collection of technologies that make up a cloud-native data platform, generally used to reduce the complexity of running a traditional data platform.

The individual components are not fixed, but tools in the modern data stack typically share the following key capabilities:

  • Offered as a Managed Service: Requires little or no setup and configuration from users, and no engineering work. Users can get started today, and this is not an empty marketing promise.
  • Centered around a Cloud Data Warehouse (CDW): Everything “just works” off the shelf if a company uses a popular CDW. By being opinionated about where the data lives, these tools eliminate messy integrations and work well together.
  • Democratizes Data via a SQL-Centric Ecosystem: Tools are built for data/analytics engineers and business users. These users often know the most about a company’s data, so it makes sense to upskill them with tools that speak their language (SQL).
  • Elastic Workloads: Pay for what you use, and scale up instantly to handle large workloads. In the modern cloud, cost rather than capacity is the main limit on scale.
  • Focus on Operational Workflows (Automation): Point-and-click tools are useful for less technical users, but they mean little without a viable path to production. Modern data stack tools are often built with automation as a core competency.
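To make the CDW-centered, SQL-centric idea a little more concrete, here is a minimal sketch of the common pattern where raw data is loaded into the warehouse first and then reshaped with plain SQL. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse; the table names (`raw_orders`, `customer_revenue`) and sample rows are hypothetical, and a real CDW would use its own driver and SQL dialect.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the cloud data warehouse.
conn = sqlite3.connect(":memory:")

# Load: raw records land in the warehouse as-is (hypothetical sample data).
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 120.0), (2, "globex", 80.0), (3, "acme", 30.0)],
)

# Transform: a SQL model an analytics engineer could own. Because it is just
# SQL text, it can be version-controlled and rerun by a scheduler.
TRANSFORM_SQL = """
CREATE TABLE customer_revenue AS
SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM raw_orders
GROUP BY customer
"""
conn.execute(TRANSFORM_SQL)

rows = conn.execute(
    "SELECT customer, order_count, total_amount FROM customer_revenue ORDER BY customer"
).fetchall()
print(rows)  # [('acme', 2, 150.0), ('globex', 1, 80.0)]
```

Keeping the transformation as SQL inside the warehouse is what lets business-facing users read and change it without learning a separate programming stack.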

Reference Articles:

  1. The Modern Data Stack: An Overview
  2. The Modern Data Stack: Open Source Edition


ETL (Extract, Transform, Load) Tools
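As a rough illustration of the three ETL stages, the sketch below extracts rows from a CSV source, transforms them in application code (normalizing names, casting types, dropping malformed records) before they reach the target, and then loads the result. The CSV text, the `orders` table, and the function names are hypothetical, and SQLite stands in for the real destination.

```python
import csv
import io
import sqlite3

# Hypothetical source: CSV text standing in for an export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, Acme ,120.00
2,globex,80.5
3,acme,not_a_number
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, cast types, and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed records instead of loading them
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

# Assumption: an in-memory SQLite database stands in for the load target.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'acme', 120.0), (2, 'globex', 80.5)]
```

The difference from warehouse-centric ELT is where the transform runs: here it happens before loading, so only cleaned data ever reaches the target.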

Data Warehouses, Lakes & Lakehouses

Graph Databases & Analysis

Customer Data Platforms

Data Transformation Tools

Business Intelligence (BI) Tools

Data Catalog & Event Discovery, Documentation, & Governance Tools (Metadata Management)

Data Pipeline Tools

  • Argo (Open Source)
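Pipeline orchestrators such as Argo Workflows can describe a pipeline as a directed acyclic graph (DAG) of steps, running each step only after its dependencies finish and running independent steps in parallel. The sketch below shows that scheduling idea with Python's standard `graphlib.TopologicalSorter` (Python 3.9+); it is not Argo's workflow format, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_raw": {"extract_orders", "extract_customers"},
    "transform_models": {"load_raw"},
    "refresh_dashboards": {"transform_models"},
}

def run_in_batches(graph):
    """Group steps into batches; every step in a batch could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)  # pretend the steps succeeded, unlocking dependents
    return batches

batches = run_in_batches(PIPELINE)
print(batches)
# [['extract_customers', 'extract_orders'], ['load_raw'], ['transform_models'], ['refresh_dashboards']]
```

In a real orchestrator, `done` would be called only when a step actually succeeds, so a failed extract would stop everything downstream of it.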