Comparison: Kafka vs. Event Hubs connector for consuming streaming data in Databricks in an IoT scenario
Expected reader and outcome of this article
- Expected reader: Software engineers/data engineers who use Azure IoT and Spark technologies
- Outcome: Understand one of the important differences between the Kafka and Event Hubs connectors
Motivation
When I design and develop applications for manufacturing IoT scenarios, our team sometimes uses Databricks (Spark) to consume streaming data from sensors.
I use Azure IoT Hub for managing devices and receiving their data on the cloud side. IoT Hub exposes an Event Hubs-compatible endpoint, and Event Hubs is in turn compatible with Apache Kafka. Therefore we have two options for consuming streaming data from IoT Hub on Azure Databricks, and I want to choose a connector based on an understanding of the difference between them.
Conclusion
One important difference between these connectors is that the Event Hubs connector can consume both message properties and the message body, while the Kafka connector can consume only the message body. Therefore I prefer the Event Hubs connector over the Kafka connector.
When a developer builds a device application with the Azure IoT Hub Device SDK, the SDK lets them set two kinds of message properties:
- System properties: Set by IoT Hub automatically. They include the device ID assigned in Azure IoT Hub, which you may want to use for group-by aggregations.
- Application properties: Properties a developer can set freely. For example, the developer can set an Alert Boolean property, and the Spark application can then switch its processing based on it (see the sketch after this list).
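As a minimal sketch of setting an application property from the device side, assuming the Python flavor of the Azure IoT Hub Device SDK (azure-iot-device) and a placeholder connection string:

```python
# pip install azure-iot-device
from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder device connection string; replace with your device's own
conn_str = "HostName=<your-hub>.azure-devices.net;DeviceId=<device-id>;SharedAccessKey=<key>"

client = IoTHubDeviceClient.create_from_connection_string(conn_str)
client.connect()

msg = Message('{"temperature": 99.9}')
msg.content_type = "application/json"
msg.content_encoding = "utf-8"
# Application property the Spark job can branch on
msg.custom_properties["Alert"] = "true"
client.send_message(msg)

client.disconnect()
```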
Details and working code
Prerequisites
- Basic understanding of Spark, Kafka, and IoT
- An Azure IoT Hub with a device sending telemetry data
- Understanding of the Event Hubs-compatible built-in endpoint in IoT Hub
- Experience running notebooks in Azure Databricks
Consume streaming data with the Event Hubs connector
Please see lines 1 and 2 of the code for preparing the expected cluster and installing the required library, then run it in your Databricks notebook:
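A minimal sketch, assuming the com.microsoft.azure:azure-eventhubs-spark Maven library; the exact version coordinates, runtime, and connection string below are assumptions/placeholders:

```python
# Line 1: Cluster — a Databricks Runtime with Spark 2.4+ (assumption)
# Line 2: Library — install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22 from Maven (version is an assumption)
from pyspark.sql.functions import col

# Connection string of the IoT Hub's Event Hubs-compatible endpoint (placeholder)
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<name>;SharedAccessKey=<key>;EntityPath=<event-hub-compatible-name>"

# Recent connector versions expect the connection string to be encrypted
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string),
}

df = (spark.readStream
      .format("eventhubs")
      .options(**eh_conf)
      .load())

# body arrives as binary; cast it to a string for readability
display(df.withColumn("body", col("body").cast("string")))
```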
You can see the properties (application properties) and systemProperties columns in the streaming data. Of course, you can also consume the message body as the body column.
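For instance, here is a sketch of the two uses mentioned in the conclusion — grouping by the device ID carried in systemProperties and branching on the Alert application property (the key name iothub-connection-device-id is an assumption based on IoT Hub's documented system-property naming):

```python
from pyspark.sql.functions import col

# Group-by aggregation on the device ID from systemProperties
per_device_counts = (df
    .withColumn("deviceId", col("systemProperties")["iothub-connection-device-id"])
    .groupBy("deviceId")
    .count())

# Switch processing based on the Alert application property set by the device
alerts = df.filter(col("properties")["Alert"] == "true")
```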
Consume streaming data with the Kafka connector
Please see lines 1 and 2 of the code for the expected cluster and library setup, then run it in your Databricks notebook:
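A minimal sketch, assuming the Kafka source built into recent Databricks Runtimes and placeholder endpoint values; note that Databricks ships a shaded Kafka client, hence the kafkashaded prefix in the JAAS config:

```python
# Line 1: Cluster — a recent Databricks Runtime (the Kafka source is built in; assumption)
# Line 2: Library — none required; the Kafka connector ships with Databricks
from pyspark.sql.functions import col

# Event Hubs-compatible Kafka endpoint of the IoT Hub (placeholders)
bootstrap_servers = "<namespace>.servicebus.windows.net:9093"
topic = "<event-hub-compatible-name>"
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<name>;SharedAccessKey=<key>"

eh_sasl = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
    'username="$ConnectionString" '
    f'password="{connection_string}";'
)

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrap_servers)
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.sasl.jaas.config", eh_sasl)
      .option("subscribe", topic)
      .load())

# value arrives as binary; cast it to a string for readability
display(df.select(col("value").cast("string")))
```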
You cannot consume the message properties from the streaming data, but you can consume the message body as the value column.
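You can confirm this from the schema; by default, Spark's Kafka source exposes only the standard Kafka columns:

```python
# key, value, topic, partition, offset, timestamp, timestampType — no message properties
df.printSchema()
```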