Real-Time Data Stream Processing in Azure: Part 1

There are multiple ways to do real-time analytics on Azure, depending on the source type and the analytics you want to perform. This article introduces the major Azure services involved in a real-time data solution and, at the end, compares them to show which one to use when. It is also a useful starting point if you are simply interested in real-time data solutions.

Streaming using HDInsight (Apache Kafka)

Kafka is a real-time streaming data platform used to collect big data, to do real-time analysis, or both. There are two ways to run Kafka on Azure: install it on IaaS virtual machines and scale manually, or have the Kafka service installed in HDInsight. The main challenge with Kafka is its dependency on ZooKeeper, a separate coordination service that is also used to manage other Hadoop services. ZooKeeper stores Kafka metadata such as the location of partitions and topic configuration, and it provides administrative functions to Kafka such as fault tolerance and leader election. Because ZooKeeper sits outside Kafka, it makes the architecture more complex; the Kafka team is working to build these features into Kafka itself as self-managed metadata.

Azure Event Grid

Event Grid is a fully managed event service that enables you to easily manage events across many different Azure services and applications. Made for performance and scale, it simplifies building event-driven applications and serverless architectures.
        Event routing service – react to events
        Publisher-Subscriber model
        Not registered in DA subscription
Example:
        Filters to route specific events to different endpoints
        Multicast to multiple endpoints
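
The filtering and multicast behaviour listed above can be pictured with a small in-memory sketch. This is not the Event Grid SDK: the class ToyEventRouter, the event type names and the handlers are all hypothetical, and real Event Grid subscriptions are configured on the service rather than in application code.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class ToyEventRouter:
    """In-memory stand-in for a publisher-subscriber router (hypothetical, not the Event Grid SDK)."""

    def __init__(self) -> None:
        # Maps an event type to every handler subscribed to it.
        self._subscriptions: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscriptions[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event to all matching handlers and return the delivery count."""
        handlers = self._subscriptions.get(event["eventType"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


audit_log: List[str] = []
thumbnail_queue: List[str] = []

router = ToyEventRouter()
# Two subscribers for the same event type: the event is multicast to both.
router.subscribe("BlobCreated", lambda e: audit_log.append(e["subject"]))
router.subscribe("BlobCreated", lambda e: thumbnail_queue.append(e["subject"]))
# A filter on a different event type routes those events to the audit log only.
router.subscribe("BlobDeleted", lambda e: audit_log.append(e["subject"]))

created = router.publish({"eventType": "BlobCreated", "subject": "images/cat.png"})
deleted = router.publish({"eventType": "BlobDeleted", "subject": "images/old.png"})
```

Publishers never reference the handlers directly; they only emit events, and the subscriptions decide where each event goes.
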
COST: $0.60 per million operations
SLA: 99.99%
Event Grid Basic tier is priced as pay-per-use based on operations performed. Operations include ingress of events to Domains or Topics, advanced matches, delivery attempts, and management calls. Plan pricing includes a monthly free grant of 100,000 operations.
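
As a rough worked example of this pay-per-use model, the sketch below estimates a monthly bill from an operation count. It uses only the two figures quoted above ($0.60 per million operations, 100,000 free operations per month); the function name is hypothetical and current prices should be checked before relying on the result.

```python
from decimal import Decimal

PRICE_PER_MILLION_OPS = Decimal("0.60")  # price quoted in this article
FREE_OPS_PER_MONTH = 100_000             # monthly free grant quoted in this article


def event_grid_monthly_cost(operations: int) -> Decimal:
    """Estimate the monthly charge in USD for a number of operations."""
    # Only operations beyond the free grant are billed.
    billable = max(0, operations - FREE_OPS_PER_MONTH)
    return Decimal(billable) / Decimal(1_000_000) * PRICE_PER_MILLION_OPS


small_workload = event_grid_monthly_cost(80_000)      # inside the free grant
large_workload = event_grid_monthly_cost(5_100_000)   # 5 million billable operations
```
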


Azure IoT Hub

IoT Hub is a managed service, hosted in the cloud, that acts as a central message hub for bi-directional communication between your IoT application and the devices it manages.
        IoT-specific capabilities
        Establish bidirectional communication
        Authenticate every device for enhanced security
        Automate device provisioning to accelerate IoT deployment
COST: $0.10 per 1,000 operations
SLA: 99.9%


Azure Event Hub

A big data streaming platform and event ingestion service. It is scalable and reliable (no data loss), and it supports multiple protocols and SDKs.
Key Concepts:
               Event producers send events via AMQP, HTTP, Kafka
               - separated in 1* to 32 partitions
               (load balanced, ordered within partition only)
               Event Hub – unique stream of data
               Event Hub Namespace – collection of Event Hubs (scoping container with shared properties)
               Consumer Group – unique view on event hub data
               - offset: position of event
               - Checkpointing: save offset at client side
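
To make partitions, consumer groups, offsets and checkpointing concrete, here is a minimal in-memory sketch. It is a simplification under stated assumptions, not the Event Hubs SDK: the classes ToyEventHub and ToyConsumerGroup are hypothetical, offsets are plain list indices, and the CRC32 hash only stands in for whatever key-to-partition mapping the service uses.

```python
import zlib
from typing import Dict, List, Tuple


class ToyEventHub:
    """Hypothetical model: one append-only list per partition."""

    def __init__(self, partition_count: int = 4) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(partition_count)]

    def send(self, partition_key: str, body: str) -> Tuple[int, int]:
        """Append to the partition chosen by hashing the key; return (partition, offset)."""
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(body)
        return index, len(self.partitions[index]) - 1


class ToyConsumerGroup:
    """Each consumer group keeps its own checkpoints, so it has its own view of the stream."""

    def __init__(self, hub: ToyEventHub) -> None:
        self.hub = hub
        self.checkpoints: Dict[int, int] = {}  # partition -> next offset to read

    def receive(self, partition: int, max_events: int = 10) -> List[str]:
        start = self.checkpoints.get(partition, 0)
        batch = self.hub.partitions[partition][start:start + max_events]
        # Checkpoint on the client side: remember how far this group has read.
        self.checkpoints[partition] = start + len(batch)
        return batch


hub = ToyEventHub(partition_count=4)
# The same key always maps to the same partition, so order is preserved for it.
partition, _ = hub.send("device-1", "temp=20")
hub.send("device-1", "temp=21")

dashboard = ToyConsumerGroup(hub)
archiver = ToyConsumerGroup(hub)

first_read = dashboard.receive(partition)    # both events, in send order
hub.send("device-1", "temp=22")
second_read = dashboard.receive(partition)   # only the event after the checkpoint
archive_read = archiver.receive(partition)   # all three: an independent view
```

Because ordering is only guaranteed within a partition, events that must be read in order (here, everything from one device) should share a partition key.
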
COST
Throughput unit - $0.03/hour
(1 MB/s ingress, 2 MB/s egress per unit)
Ingress - $0.028 per million events
Capture - $0.10/hour
SLA: 99.95% (Standard)
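
The figures above can be combined into a rough monthly estimate. The sketch below sizes throughput by peak ingress only and adds the per-event ingress charge; it ignores egress limits and Capture, assumes a 730-hour month, and uses hypothetical function names, so treat the result as an approximation.

```python
import math

TU_PRICE_PER_HOUR = 0.03           # throughput price quoted in this article
TU_INGRESS_MB_PER_SEC = 1.0        # ingress capacity per unit quoted in this article
INGRESS_PRICE_PER_MILLION = 0.028  # ingress price quoted in this article
HOURS_PER_MONTH = 730              # assumption: average month length


def throughput_units_needed(peak_ingress_mb_per_sec: float) -> int:
    """Round peak ingress up to whole units, with a minimum of one."""
    return max(1, math.ceil(peak_ingress_mb_per_sec / TU_INGRESS_MB_PER_SEC))


def monthly_cost(peak_ingress_mb_per_sec: float, events_per_month: int) -> float:
    """Estimate the monthly charge in USD: throughput plus ingress events."""
    units = throughput_units_needed(peak_ingress_mb_per_sec)
    throughput = units * TU_PRICE_PER_HOUR * HOURS_PER_MONTH
    ingress = events_per_month / 1_000_000 * INGRESS_PRICE_PER_MILLION
    return round(throughput + ingress, 2)


# 2.5 MB/s peak needs 3 units; 100 million events add the ingress charge.
estimate = monthly_cost(peak_ingress_mb_per_sec=2.5, events_per_month=100_000_000)
```
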


Azure Stream Analytics

Azure Stream Analytics is a real-time analytics and complex event-processing engine designed to analyze and process high volumes of fast streaming data from multiple sources simultaneously. Its key feature is that you can run SQL-like queries on a stream of data, which is valuable because people with SQL skills are much easier to find than people with Java or C# skills.

        Real-time analytics and complex event-processing engine
        SQL over stream of data - Stream Analytics Query Language
        Can join multiple streams, including self-joins
        Out of the box Azure Integrations
        Built in ML based anomaly detection
        Built-in and custom functions
        Supports complex types
COST
        $0.11/hour per streaming unit + $1/device/month for IoT job
        SLA: 99.9%
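
A typical "SQL over a stream" query groups events into fixed time windows, for example counting events per device with GROUP BY DeviceId, TumblingWindow(second, 10). The Python sketch below imitates that aggregation in memory to show what such a query computes. It is not how Stream Analytics is implemented: the function name and sample data are hypothetical, and windows are keyed by their start time for readability.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Reading = Tuple[float, str]  # (event time in seconds, device id)


def tumbling_window_counts(
    readings: Iterable[Reading], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per device in fixed, non-overlapping time windows.

    Keys are (window start in seconds, device id).
    """
    counts: Counter = Counter()
    for event_time, device_id in readings:
        # Every event belongs to exactly one window.
        window_start = int(event_time // window_seconds) * window_seconds
        counts[(window_start, device_id)] += 1
    return dict(counts)


readings = [(1.0, "a"), (4.5, "a"), (9.9, "b"), (10.0, "a"), (12.3, "b")]
result = tumbling_window_counts(readings, window_seconds=10)
```

In the service, the same logic is declared once in the query and evaluated continuously as events arrive, rather than over a finished list.
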


Which One to Use?

Event Hub vs Kafka
        Apache Kafka (open source) has to be installed and run, either on virtual machines or in HDInsight
        Apache Kafka requires custom code in Java or C# and depends on ZooKeeper
        Azure Event Hubs is a fully managed service, i.e. there is no need to manage servers or networks or to worry about configuring brokers
        Kafka, however, is popular for its wide ecosystem and its presence both on-premises and in the cloud

Event Hub vs IoT Hub
        Azure Event Hub - Device-to-cloud messaging only
        IoT Hub - Device-to-cloud and Cloud-to-device messaging (bi-directional communication)

Azure Stream Analytics
        Processes and analyzes streaming data from Azure Event Hubs, IoT Hub, etc.
        Faster Learning Curve
