Waqas Ahmad — Software Architect & Technical Consultant - Available USA, Europe, Global


Data Engineering: Batch vs Streaming In-Depth

Batch vs streaming in data engineering: when to use each, and how watermarks handle late data.

Introduction

This guidance applies when you are choosing or revisiting a data-processing paradigm for a pipeline; it breaks down when your constraints differ, for example a mandated platform or no real latency requirement. I’ve applied it in real projects and refined the takeaways over time (as of 2026).

Choosing the wrong data-processing paradigm—batch vs streaming—leads to either unnecessary complexity and cost or missed latency and real-time needs. This article explains when to use batch and when to use streaming, common patterns, Azure services for each, and how to handle late data and exactly-once processing. For data engineers and architects, getting the choice right matters for latency, cost, and operational complexity.

For more on event-driven and data pipelines, see the Event-Driven Architecture resource.

Decision Context

  • System scale: Data volumes from GB to TB (or more); processing on a schedule (batch) or continuously (streaming). Applies when you’re designing or refactoring data pipelines and need to choose batch vs streaming.
  • Team size: Data engineers and platform owners; someone must own pipeline definition, watermarks, and late-data policy. Works when the team can reason about event time vs processing time and consistency trade-offs.
  • Time / budget pressure: Fits greenfield pipelines and “we need real-time” or “we need nightly reports”; breaks down when requirements are vague (e.g. “as real-time as possible” with no latency target) or when there’s no capacity to operate streaming.
  • Technical constraints: Azure (Data Factory, Synapse, Event Hubs, Stream Analytics) or similar; batch = scheduled jobs and storage; streaming = event ingestion and continuous processing. Assumes you can handle late data and define watermarks.
  • Non-goals: This article does not optimise for on-prem only or for a specific vendor’s API; it focuses on when to use batch vs streaming and common patterns.

Batch vs streaming: overview

| Aspect     | Batch                     | Streaming                     |
|------------|---------------------------|-------------------------------|
| Processing | Scheduled (hourly, daily) | Continuous                    |
| Latency    | Minutes to hours          | Seconds to minutes            |
| Complexity | Lower                     | Higher                        |
| Cost       | Pay when running          | Pay continuously              |
| Use case   | Reports, analytics, ETL   | Real-time dashboards, alerts  |

When to use batch

Batch processing runs on a schedule and processes data in windows (e.g., last 24 hours).

Use batch when:

  • Latency of hours is acceptable
  • Processing is compute-heavy (aggregations, ML training)
  • Data arrives in files (CSV, Parquet)
  • Cost matters more than latency

Examples:

  • Daily sales reports
  • Monthly billing runs
  • Data warehouse ETL
  • ML model training

Key concepts:

  • Incremental load: Only process new/changed data since last run
  • Watermark: Track what was already processed
  • Idempotency: Re-running produces same result
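A minimal Python sketch of these three concepts working together, using in-memory stand-ins; the row shape, `run_batch`, and the `state` dict are all hypothetical, and a real pipeline would read from a database and persist the watermark:

```python
# Incremental load + watermark + idempotent upsert, sketched in memory.
from datetime import datetime

def run_batch(source_rows, target, state):
    """Process only rows newer than the last watermark, upserting by key."""
    watermark = state.get("last_watermark", datetime.min)
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for row in new_rows:
        target[row["id"]] = row          # upsert: re-running is a no-op
    if new_rows:
        state["last_watermark"] = max(r["updated_at"] for r in new_rows)
    return len(new_rows)

rows = [
    {"id": 1, "updated_at": datetime(2026, 1, 1), "total": 10},
    {"id": 2, "updated_at": datetime(2026, 1, 2), "total": 20},
]
target, state = {}, {}
run_batch(rows, target, state)   # first run processes both rows
run_batch(rows, target, state)   # idempotent: nothing new, nothing changes
```

Because the write is an upsert and the watermark only advances, re-running the job after a failure is safe.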

When to use streaming

Streaming processing runs continuously and processes events as they arrive.

Use streaming when:

  • Latency of seconds or minutes is required
  • Events trigger immediate actions
  • Real-time dashboards or alerts needed
  • Data arrives as continuous stream

Examples:

  • Real-time fraud detection
  • Live dashboards
  • IoT sensor processing
  • Clickstream analytics

Key concepts:

  • Event time vs processing time: When event occurred vs when processed
  • Windowing: Group events by time (tumbling, sliding, session)
  • Watermarks: Handle late-arriving data
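To make event-time windowing concrete, here is a small Python sketch of a one-minute tumbling window keyed by event time rather than processing time; the event shape and function names are illustrative:

```python
# Group events into 1-minute tumbling windows by event time (when the
# event occurred). A late event still lands in its event-time window.
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)

def window_start(event_time):
    """Align an event time down to its tumbling-window boundary."""
    epoch = datetime(1970, 1, 1)
    return event_time - ((event_time - epoch) % WINDOW)

def count_per_window(events):
    counts = defaultdict(int)
    for e in events:
        counts[window_start(e["event_time"])] += 1
    return dict(counts)

events = [
    {"event_time": datetime(2026, 1, 1, 12, 0, 10)},
    {"event_time": datetime(2026, 1, 1, 12, 0, 50)},
    {"event_time": datetime(2026, 1, 1, 12, 1, 5)},
]
count_per_window(events)  # two events in the 12:00 window, one in 12:01
```

A sliding window would assign each event to several overlapping windows, and a session window would close after a gap of inactivity; the alignment idea is the same.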

Azure services for batch

| Service                 | Use case                      |
|-------------------------|-------------------------------|
| Azure Data Factory      | Orchestration, ETL pipelines  |
| Azure Synapse Pipelines | Data warehouse ETL            |
| Azure Databricks        | Spark batch jobs, ML          |
| Azure Batch             | Large-scale parallel compute  |
| Azure Functions (Timer) | Scheduled lightweight jobs    |

Example: Data Factory pipeline

  1. Copy data from Blob Storage
  2. Transform with Mapping Data Flow
  3. Load into Synapse Analytics
  4. Trigger on schedule (daily at 2 AM)

Azure services for streaming

| Service                             | Use case                        |
|-------------------------------------|---------------------------------|
| Azure Event Hubs                    | High-throughput event ingestion |
| Azure Stream Analytics              | SQL-based stream processing     |
| Azure Databricks Streaming          | Spark Structured Streaming      |
| Azure Functions (Event Hub trigger) | Event-driven compute            |
| Kafka on HDInsight/Confluent        | Kafka workloads                 |

Example: Stream Analytics query

-- Tumbling window: count events per minute
SELECT
    System.Timestamp AS WindowEnd,
    COUNT(*) AS EventCount
FROM
    EventHubInput TIMESTAMP BY EventTime
GROUP BY
    TumblingWindow(minute, 1)

Common patterns

Lambda architecture

Combine batch and streaming:

  • Batch layer: Accurate, complete data (daily)
  • Speed layer: Real-time approximations
  • Serving layer: Query both

Use when you need both historical accuracy and real-time speed.

Kappa architecture

Streaming only:

  • Process everything as streams
  • Replay from event store when needed

Simpler than Lambda; use when streaming can handle all needs.

Event sourcing + streaming

Store all events; process stream for projections and analytics.
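A minimal sketch of this pattern in Python, assuming a hypothetical order event stream; the event types and read-model shape are illustrative:

```python
# Event sourcing: every change is an immutable event; a read model
# ("projection") is rebuilt at any time by folding over the stream.
events = [
    {"type": "OrderPlaced",  "order_id": "A1", "amount": 100},
    {"type": "OrderShipped", "order_id": "A1"},
    {"type": "OrderPlaced",  "order_id": "B2", "amount": 50},
]

def project(events):
    """Fold the event stream into a per-order read model."""
    orders = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            orders[e["order_id"]] = {"amount": e["amount"], "shipped": False}
        elif e["type"] == "OrderShipped":
            orders[e["order_id"]]["shipped"] = True
    return orders

project(events)  # replay the stream to rebuild the view from scratch
```

Because the events are the source of truth, a new projection (analytics, search index, cache) is just another fold over the same stream.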

Incremental batch

Process only new data since last watermark:

-- Watermark pattern
SELECT * FROM Orders
WHERE UpdatedAt > @LastWatermark
ORDER BY UpdatedAt

Handling late data

Events can arrive late (network delays, offline devices).

Strategies:

  • Watermarks: Declare “all data before X has arrived”
  • Allowed lateness: Accept events up to N minutes late
  • Reprocessing: Re-run batch for late corrections

Stream Analytics example:

-- Allow events up to 5 minutes late
SELECT *
FROM EventHubInput TIMESTAMP BY EventTime
WHERE DATEDIFF(minute, EventTime, System.Timestamp) <= 5
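The watermark-plus-allowed-lateness idea behind that query can be sketched in Python; the five-minute lateness and the `Watermarker` class are illustrative assumptions, not a specific engine's API:

```python
# A watermark that trails the maximum event time seen by a fixed delay;
# events older than the watermark are treated as late (side output / drop).
from datetime import datetime, timedelta

ALLOWED_LATENESS = timedelta(minutes=5)

class Watermarker:
    def __init__(self):
        self.max_event_time = datetime.min

    def process(self, event_time):
        """Classify an event as on-time or late, advancing the watermark."""
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - ALLOWED_LATENESS
        return "late" if event_time < watermark else "on-time"

w = Watermarker()
w.process(datetime(2026, 1, 1, 12, 10))  # 'on-time'
w.process(datetime(2026, 1, 1, 12, 2))   # 8 min behind the max: 'late'
```

Whatever you do with late events (accept, side output, drop), the key is that the policy is explicit and documented.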

Enterprise best practices

1. Start with batch; add streaming when needed. Batch is simpler. Add streaming only when latency requirements demand it.

2. Use incremental loads. Do not reprocess everything. Track watermarks; process only new data.

3. Make processing idempotent. Re-running should produce the same result. Use upserts, not inserts.

4. Monitor lag. In streaming, track how far behind processing is. Alert if lag grows.

5. Plan for late data. Define allowed lateness; decide how to handle late corrections.

6. Use dead-letter queues. Capture failed events for investigation and reprocessing.

7. Test with realistic data. Simulate real volumes, including late and out-of-order events.

8. Document SLAs. Define expected latency, throughput, and completeness.
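Practice 4 (monitor lag) can be reduced to a simple check: compare the newest event time against the last processed event time. The threshold and timestamps below are assumptions for illustration:

```python
# Streaming lag check: lag is the gap between the newest event time and
# the last event time the pipeline has processed. Alert when it grows.
from datetime import datetime, timedelta

LAG_THRESHOLD = timedelta(minutes=2)

def check_lag(latest_event_time, last_processed_event_time):
    """Return the current lag and whether it breaches the alert threshold."""
    lag = latest_event_time - last_processed_event_time
    return lag, lag > LAG_THRESHOLD

lag, alert = check_lag(datetime(2026, 1, 1, 12, 5),
                       datetime(2026, 1, 1, 12, 0))
# 5 minutes of lag exceeds the 2-minute threshold, so alert is True
```

In production you would feed this from consumer-group checkpoints or engine metrics rather than computing it by hand, but the alert condition is the same.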

Common issues

| Issue        | Cause                      | Fix                                    |
|--------------|----------------------------|----------------------------------------|
| High latency | Processing too slow        | Scale out; optimize queries            |
| Late data    | Events arrive after window | Allow lateness; use watermarks         |
| Duplicates   | At-least-once delivery     | Idempotent writes; deduplication       |
| Data loss    | Processing errors          | Dead-letter queue; checkpointing       |
| Cost spike   | Unexpected volume          | Auto-scale limits; budget alerts       |
| Drift        | Batch and streaming differ | Reconciliation; single source of truth |
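Two of the fixes above, idempotent writes via deduplication and a dead-letter queue, can be sketched together; the in-memory set and list are stand-ins for a real dedup store and queue:

```python
# Deduplicate by event ID (for at-least-once delivery) and route events
# that fail processing to a dead-letter queue for later investigation.
seen_ids = set()
dead_letter = []

def handle(event, process):
    """Process an event at most once; dead-letter it on failure."""
    if event["id"] in seen_ids:
        return "duplicate"
    try:
        process(event)
        seen_ids.add(event["id"])   # mark done only after success
        return "processed"
    except Exception as exc:
        dead_letter.append({"event": event, "error": str(exc)})
        return "dead-lettered"

handle({"id": "e1", "amount": 10}, lambda e: None)  # 'processed'
handle({"id": "e1", "amount": 10}, lambda e: None)  # 'duplicate'
handle({"id": "e2"}, lambda e: e["missing"])        # 'dead-lettered'
```

Note that the ID is recorded only after a successful write, so a crash mid-processing leads to a retry rather than silent loss.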

Summary

Batch processes data on a schedule with higher latency but a simpler architecture; streaming processes continuously with lower latency but higher complexity. Choose by latency requirement and ops capacity. Defaulting to streaming when batch would suffice adds cost and complexity; skipping streaming when the product needs near-real-time leads to the wrong architecture. Next, define your latency target (e.g. < 1 min), then choose batch, streaming, or a hybrid, and design watermarks and a late-data policy if you stream.

Position & Rationale

I use batch when latency of hours (or at least many minutes) is acceptable and when the workload is easier to reason about and debug in scheduled runs—reports, ETL, aggregations that don’t need to be real-time. I use streaming when the business needs low-latency visibility (e.g. dashboards, alerts) or event-driven actions and when we can invest in watermarks, late data, and operational complexity. I avoid defaulting to streaming because it sounds modern; batch is often simpler and cheaper. I also avoid “batch when we could stream” when the product clearly needs near-real-time; in that case I design for streaming and accept the complexity. Hybrid (batch for correctness/reconciliation, streaming for real-time) is common and I use it when both completeness and latency matter.

Trade-Offs & Failure Modes

Batch sacrifices latency and real-time feedback; you gain simpler architecture, easier debugging, and lower operational cost. Streaming sacrifices simplicity and often exact consistency; you gain low latency and continuous processing. Hybrid adds two pipelines to maintain but can deliver both. Failure modes: choosing streaming without a clear latency requirement and then overpaying in complexity; ignoring late data and watermarks and getting wrong results; treating “real-time” as a single bucket instead of defining acceptable delay; running batch and streaming without a reconciliation path so they drift.

What Most Guides Miss

Most guides list “batch vs streaming” features but don’t stress that the real decision is latency requirement and operational capacity. If the business can’t articulate why sub-minute latency matters, batch is often enough. Another gap: late data in streaming is underplayed—events arrive out of order and after the window closes; you need a policy (allow lateness, side outputs, or drop) and to document it. Reconciliation between batch and streaming (e.g. “batch is source of truth, streaming is for real-time view”) is rarely discussed but matters when you run both.

Decision Framework

  • If latency of hours is acceptable and the workload is reports/ETL → Batch; schedule and run; keep it simple.
  • If latency of seconds/minutes is required (dashboards, alerts) → Streaming; design event flow, watermarks, and late-data policy.
  • If both completeness and real-time matter → Hybrid: batch for authoritative aggregates, streaming for real-time view; reconcile periodically.
  • For late data → Define watermark and lateness policy; use side outputs or allow lateness windows and document the semantics.
  • For production → Monitor backpressure, lag, and pipeline health; have runbooks for replay and failure.

Key Takeaways

  • Batch = scheduled, higher latency, simpler; streaming = continuous, lower latency, more complex. Choose by latency requirement and ops capacity.
  • Don’t default to streaming; batch is often sufficient and cheaper to run.
  • Late data in streaming needs a clear policy (watermarks, allow lateness, side outputs).
  • Hybrid (batch + streaming) is valid when you need both correctness and real-time; design reconciliation.
  • Define “real-time” with a number (e.g. < 1 min) so you can design and measure.

When I Would Use This Again — and When I Wouldn’t

I’d use batch again for reports, ETL, and any workload where hourly or daily latency is fine and I want to minimise operational complexity. I’d use streaming again when the product needs near-real-time (e.g. alerts, live dashboards) and the team can own watermarks and late data. I wouldn’t choose streaming without a clear latency target; “as fast as possible” leads to over-engineering. I also wouldn’t run streaming and batch without a defined reconciliation or source-of-truth story—otherwise the two pipelines drift and no one knows which to trust.


Frequently Asked Questions

What is batch processing?

Batch processing runs on a schedule (hourly, daily) and processes data in windows. Use when latency of hours is acceptable.

What is streaming processing?

Streaming processing runs continuously and processes events as they arrive. Use when latency of seconds or minutes is required.

When should I use batch vs streaming?

Use batch for reports, ETL, and analytics where latency of hours is fine. Use streaming for real-time dashboards, alerts, and event-driven actions.

What is a watermark?

A watermark tracks how far processing has progressed. In batch, it marks the last processed timestamp. In streaming, it declares “all events before X have arrived.”

What is event time vs processing time?

Event time is when the event occurred. Processing time is when it was processed. Use event time for accurate analytics; processing time can vary.

What is windowing?

Windowing groups events by time. Types: tumbling (fixed, non-overlapping), sliding (overlapping), session (activity-based).

How do I handle late data?

Use watermarks and allowed lateness. Accept events up to N minutes late. For corrections, re-run batch or use append-only with latest wins.

What is Lambda architecture?

Lambda combines batch (accurate) and streaming (fast). Serving layer queries both. Complex but handles both accuracy and real-time.

What is Kappa architecture?

Kappa uses streaming only. Replay from event store when needed. Simpler than Lambda; works when streaming can handle all needs.

What Azure services are for batch?

Data Factory, Synapse Pipelines, Databricks, Azure Batch. Use for ETL, data warehouse loads, and heavy compute.

What Azure services are for streaming?

Event Hubs (ingestion), Stream Analytics (SQL processing), Databricks Streaming (Spark), Functions (event-driven).

How do I make processing idempotent?

Use upserts instead of inserts. Include event ID for deduplication. Design so re-running produces same result.

What is exactly-once processing?

Guarantee each event is processed exactly once (no loss, no duplicates). Hard to achieve; often use at-least-once with idempotent writes.

How do I monitor streaming lag?

Track difference between event time and processing time. Alert if lag exceeds threshold.

Should I use both batch and streaming?

Often yes. Batch for historical accuracy and heavy compute; streaming for real-time needs. Lambda architecture formalizes this.
