Demo of Apache Kafka on Heroku

Data processing pipeline architecture using Apache Kafka on Heroku


Overview

This system consumes data from the Twitter Streaming API, manipulates the data using a series of Heroku apps, and generates a dynamic visualization of the manipulated data.

The architecture uses five Heroku apps, each serving a different role in the data pipeline.

  • Data Ingest: Reads from the Twitter Streaming API and produces messages to a high-volume ingest Kafka topic
  • Data Fanout: Consumes ingested messages and fans them out to discrete keyword Kafka topics
  • Aggregate Statistic Calculation: Consumes messages from the keyword Kafka topics, then calculates aggregate mention counts and produces them to a topic
  • Related Terms Generation: Consumes messages from the keyword Kafka topics and produces related words and their counts to a topic
  • Visualization: Consumes messages from the aggregate and related-words Kafka topics and generates the dynamic stream visualizations in a web application
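The fanout, aggregate, and related-terms roles listed above can be illustrated with a minimal in-memory sketch. This is not the demo's actual code and it does not use a Kafka client: plain Python lists stand in for topics, and the keyword list, the message shape (a dict with a `text` field), and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical keywords; the source does not say which ones the demo tracks.
KEYWORDS = ["kafka", "heroku"]


def topics_for(text, keywords=KEYWORDS):
    """Return the keyword topics that a message's text should be routed to."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def fan_out(ingest_messages, keywords=KEYWORDS):
    """Fanout role: copy each ingested message to every matching keyword topic."""
    topics = defaultdict(list)  # topic name -> list of messages (stand-in for a Kafka topic)
    for message in ingest_messages:
        for topic in topics_for(message["text"], keywords):
            topics[topic].append(message)
    return topics


def mention_counts(keyword_topics):
    """Aggregate role: count how many messages landed on each keyword topic."""
    return Counter({topic: len(msgs) for topic, msgs in keyword_topics.items()})


def related_words(messages, keyword):
    """Related-terms role: count the other words that appear alongside a keyword."""
    counts = Counter()
    for message in messages:
        for word in message["text"].lower().split():
            if word != keyword:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    ingest = [{"text": "Kafka on Heroku"}, {"text": "Heroku apps"}, {"text": "unrelated"}]
    topics = fan_out(ingest)
    print(mention_counts(topics))
    print(related_words(topics["heroku"], "heroku"))
```

In the system described here, each of these functions would correspond to a separate Heroku app consuming from and producing to Kafka topics, rather than to calls within one process.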

Architecture

This diagram shows both how the system is architected and how data moves through it.

The Kafka cluster is represented by the large light purple rectangle. Within that, each named rectangle represents a Kafka topic. The hexagons are Heroku apps that manipulate data. They produce data to and/or consume data from Kafka topics.
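The producer and consumer relationships described above can also be written down as a small data structure, which makes it easy to check that every consumed topic has a producer and to trace which apps feed a given app. The sketch below is an assumption-laden illustration: the app and topic names are hypothetical placeholders, and the single `keyword` entry stands for the whole set of discrete keyword topics.

```python
# Hypothetical app and topic names; the source does not give the real ones.
PIPELINE = {
    "ingest": {"consumes": [], "produces": ["ingest"]},
    "fanout": {"consumes": ["ingest"], "produces": ["keyword"]},
    "aggregate": {"consumes": ["keyword"], "produces": ["aggregate"]},
    "related": {"consumes": ["keyword"], "produces": ["related"]},
    "viz": {"consumes": ["aggregate", "related"], "produces": []},
}


def producers_of(topic, pipeline=PIPELINE):
    """Return the apps that produce to the given topic."""
    return sorted(app for app, io in pipeline.items() if topic in io["produces"])


def unfed_topics(pipeline=PIPELINE):
    """Return topics that some app consumes but no app produces to."""
    produced = {t for io in pipeline.values() for t in io["produces"]}
    consumed = {t for io in pipeline.values() for t in io["consumes"]}
    return sorted(consumed - produced)


def upstream_apps(app, pipeline=PIPELINE):
    """Return every app whose output eventually reaches the given app."""
    seen = set()
    stack = [app]
    while stack:
        current = stack.pop()
        for topic in pipeline[current]["consumes"]:
            for producer in producers_of(topic, pipeline):
                if producer not in seen:
                    seen.add(producer)
                    stack.append(producer)
    return sorted(seen)


if __name__ == "__main__":
    print(unfed_topics())        # topics with no producer
    print(upstream_apps("viz"))  # apps that feed the visualization
```

Keeping the topology in one place like this is only a documentation and sanity-check aid; the apps themselves remain decoupled and communicate solely through Kafka topics.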