Getting Started with Apache Iceberg Tables Using AWS Glue Custom Connector

Apache Iceberg is an open table format designed for huge, petabyte-scale tables. Iceberg brings transactions and record-level updates and deletes to data lakes. The project was originally developed at Netflix to solve long-standing issues with its use of very large tables. It was open-sourced in 2018 as an Apache Incubator project. Amazon Web Services (AWS) recently announced the public preview of Amazon Athena ACID transactions,...

Step-by-Step Data Analysis Process

As data engineers, one of our biggest tasks is analyzing data. It is an important step in understanding problems and exploring data in meaningful ways. Data analysis helps us understand the past by exploring the data, and it supports predictive modeling by providing input to data science teams. The...

Quick Introduction to Apache NiFi and Key Features

Apache NiFi is one of the most popular ETL platforms in the open-source community. It provides a web-based user interface for creating, monitoring, and controlling data flows. Apache NiFi Terms: While working with NiFi, you need to become familiar with several terms that describe its most important aspects. The most important building blocks...

Running Kafka Locally on Windows Using Docker

In this post, we will discuss how you can run Kafka on your Windows machine. If you want to create a local development environment that uses Kafka, the easiest way is to get the Confluent Platform Docker image and run it with Docker Compose. Compose is a tool for defining and running multi-container Docker applications,...
