Azure Cloud Architecture Patterns for Scalable Apps

Q: What is Managed Identity and why use it?

Managed Identity lets Azure resources (App Service, AKS, Functions) authenticate to other Azure services (Key Vault, SQL, Service Bus) without storing secrets. Use it for service-to-service auth so that no connection strings or keys are in code or config.

Q: How do I right-size Azure resources?

Measure usage (CPU, memory, throughput) with Azure Monitor; start with the minimum tier that meets SLA. Scale up when metrics justify it; use auto-scale for variable load. For dev/staging, use auto-shutdown and smaller SKUs.

Q: When should I use Azure Container Apps vs AKS?

Use Container Apps when you want serverless-style containers (scale to zero, event-driven or HTTP scaling) with less operational overhead than Kubernetes. Use AKS when you need full Kubernetes: multiple node pools, service mesh, advanced networking, or workloads that require K8s APIs.

Q: What is the difference between DTU and vCore for Azure SQL?

DTU is a blended measure of CPU, memory, and I/O; simpler for small to medium workloads. vCore gives explicit control over CPU and memory and supports higher limits and read replicas. Start with DTU; move to vCore when you need more control or scale.

Q: When should I use Logic Apps vs Azure Functions?

Use Logic Apps for low-code workflows with connectors and a visual designer; good for integration scenarios. Use Functions for code-first logic, complex branching, or .NET/Node/Python. Functions give full control; Logic Apps are faster for simple, connector-based flows.

Azure architecture: compute, storage, messaging, security, and deployment. Decision.

March 17, 2024 · Waqas Ahmad

Introduction

This guidance applies when you are choosing compute, storage, messaging, and security services for an application on Azure; it breaks down when your constraints or context differ, for example when the target is another cloud. I’ve applied it in real projects and refined the takeaways over time (as of 2026).

Choosing the wrong Azure building blocks leads to over-provisioning, cost overruns, or under-planning for scale. This article explains Azure cloud architecture patterns for compute (App Service, AKS, Container Apps, Functions), storage (Azure SQL, Cosmos DB, Blob), messaging (Service Bus, Event Grid, Event Hubs), security, deployment, and cost discipline. Matching each workload to the right service and right-sizing from day one matters for architects and tech leads who need scalable systems without wasted cost or complexity.

If you are new to Azure cloud architecture, start with Topics covered and Azure cloud architecture at a glance.

For a deeper overview of this topic, explore the full Cloud-Native Architecture guide.

Topics covered

Decision Context
What is Azure cloud architecture and why it matters
What is App Service vs containers vs serverless?
Azure cloud architecture at a glance
Compute: App Service vs AKS vs Container Apps vs Functions
Data stores: Azure SQL vs Cosmos DB vs Blob
Class structure: how the pieces fit together
Messaging and integration: Service Bus, Event Grid, Event Hubs
Security and identity: Azure AD, Managed Identity, Key Vault
Networking: VNet, private endpoints, Front Door
Deployment and DevOps: slots, Bicep, pipelines
Monitoring and observability
Decision framework: when to choose what
Right-sizing and cost discipline
High availability and disaster recovery
Common issues and challenges
Best practices and pitfalls
Summary
Position & Rationale
Trade-Offs & Failure Modes
What Most Guides Miss
Decision Framework
Key Takeaways
When I Would Use This Again — and When I Wouldn’t
Frequently Asked Questions

Decision Context

System scale: From single App Service + SQL to multi-region AKS + Cosmos + Event Hubs; the patterns apply when you’re choosing compute, storage, messaging, and security on Azure for scalable apps.
Team size: One to several teams; platform or architecture often owns the initial choices (App Service vs AKS, SQL vs Cosmos); delivery teams own the apps that run on them.
Time / budget pressure: Fits greenfield and migration; breaks down when the team says “we’ll just use the same as last time” without matching the service to the workload, because that is how you end up over- or under-provisioned.
Technical constraints: Azure (App Service, AKS, Container Apps, Functions, SQL, Cosmos, Blob, Service Bus, Event Grid, Event Hubs, Key Vault, etc.); .NET where relevant.
Non-goals: This article does not optimise for AWS or GCP, for the lowest possible spend regardless of consequences, or for “one size fits all”; it optimises for right-sized Azure architecture and cost discipline.

What is Azure cloud architecture and why it matters

Cloud architecture is how you combine compute, storage, networking, and identity so that your application is scalable, secure, and cost-effective. On Azure, that means choosing the right mix of App Service, containers (AKS, Container Apps), serverless (Functions, Logic Apps), databases (Azure SQL, Cosmos DB, Blob), and messaging (Service Bus, Event Grid, Event Hubs). There is no single “right” architecture—it depends on your workload, team size, and how much operational overhead you can absorb.

A small line-of-business app might sit entirely on App Service and Azure SQL; a global, event-driven platform might use AKS, Cosmos DB, and Event Hubs. The patterns in this article help you decide when to use what, so you do not over-provision (e.g. AKS for a single API) or under-plan (e.g. Azure SQL for a globally distributed document store). I have seen teams spend months migrating to Kubernetes when App Service would have done the job, and others hit scaling walls because they chose the wrong storage tier from day one.

Why it matters: The wrong choice costs time, money, and complexity. Right-sizing from the start—and knowing when to move to the next tier—keeps delivery fast and cost under control. We go through compute, storage, messaging, security, networking, deployment, and monitoring in turn, then tie it together with a decision framework and real-world lessons.

What is App Service vs containers vs serverless?

In short: App Service for a single web app or API, AKS or Container Apps when you need orchestration or multiple services, Functions for event-driven logic. Before diving into the full at-a-glance table, here is a short build-up so you know what each compute option is and when it fits.

Azure App Service is platform-as-a-service (PaaS) for web apps and APIs: you deploy your code (or a container); Microsoft runs the VMs, patching, and load balancing. You choose a plan (Basic, Standard, Premium) and scale out by adding instances. Deployment slots (available from the Standard tier up) let you stage a new version and swap it with production with minimal downtime. Use App Service when your app is a traditional web app or API (e.g. ASP.NET Core, Node, Python), you do not need to run multiple services in one place, and you are happy with Azure as the sole host. It suits internal tools, marketing sites, and many line-of-business APIs. Cost is predictable and operations are minimal.

Azure Kubernetes Service (AKS) and Azure Container Apps are for when you need orchestration: multiple services, rolling updates, service mesh, or portability to another cloud or on-prem. AKS is Microsoft’s managed Kubernetes: Microsoft runs the control plane, and you get a cluster to which you add node pools (the VMs that run your pods). You describe what you want (e.g. “run three replicas of my API”) in YAML or Helm charts, and the control plane keeps the cluster in that state. You get portability, because the same manifests can run on another cloud or on-prem, and a large ecosystem (Helm, Kustomize, GitOps). The trade-off is operational complexity: you own the nodes, networking, and upgrades unless you use a fully managed option. Container Apps sits between App Service and AKS: you run containers, but scaling and networking are simpler; you can scale to zero and trigger on HTTP or events (e.g. Service Bus messages). It is a good fit when you have a handful of microservices and do not need full Kubernetes APIs.

Azure Functions (and Logic Apps for low-code workflows) are for event-driven, short-lived work: reacting to blob uploads, queue messages, or HTTP webhooks. Use Functions for small, focused pieces of logic that scale independently; avoid long-running or stateful processes unless you use Durable Functions. Serverless reduces idle cost but adds cold-start and timeout constraints—suit it to the workload. Functions run in a consumption plan (pay per execution, scale to zero) or a premium plan (always-on, no cold start, VNet integration). For an API that must respond in under 100 ms, consumption-plan cold starts can be a problem; for a nightly batch job or a webhook that processes queue messages, they are usually fine.

Azure cloud architecture at a glance

Area	Service	What it is	When to use
Compute	App Service	PaaS for web/API; managed hosting, slots, scaling	Single web app or API; no multi-service orchestration
Compute	AKS	Managed Kubernetes; full K8s APIs	Many services, rolling updates, portability, service mesh
Compute	Container Apps	Serverless-style containers; scale to zero, event-driven	Handful of microservices; less ops than AKS
Compute	Functions	Serverless; event-driven, short-lived	Blob/queue/HTTP triggers; batch, webhooks
Storage	Azure SQL	Relational; ACID, JOINs	Transactional, relational workloads
Storage	Cosmos DB	Global NoSQL; tunable consistency	Global distribution, low latency at scale, document/key-value
Storage	Blob	Unstructured; files, backups, data lake	Large binaries; no query; analytics with Data Lake/Synapse
Messaging	Service Bus	Queues and topics; reliable, ordered	Work between your services; dead-letter, sessions
Messaging	Event Grid	Event routing; push, high throughput	Fan-out, Azure resource events
Messaging	Event Hubs	High-throughput ingestion	Telemetry, logs, stream processing
Identity	Azure AD	Identity and auth	Users, apps; tokens for APIs
Secrets	Key Vault	Secrets, keys, certs	Connection strings, API keys; reference from app
Edge	Front Door / APIM	Global load balancing, WAF, caching	Edge routing, security, CDN

Compute: App Service vs AKS vs Container Apps vs Functions

Azure App Service is the fastest path to production for many web applications. You get managed hosting, scaling (manual or autoscale), deployment slots, and integration with Azure AD and Key Vault. Treat it as the default for a single web app or API, and move away from it only when one of the needs below applies.

AKS or Container Apps fit when you need orchestration: multiple services, rolling updates, service mesh, or portability to other clouds or on-prem. Choose AKS when your team already knows Kubernetes and you need advanced networking or stateful workloads; choose Container Apps when you want less operational overhead and event-driven or HTTP scaling.

Azure Functions (and Logic Apps for low-code workflows) are ideal for event-driven, short-lived work: reacting to blob uploads, queue messages, or HTTP webhooks. Use Functions for small, focused pieces of logic that scale independently; avoid long-running or stateful processes unless you use Durable Functions.

Data stores: Azure SQL vs Cosmos DB vs Blob

Azure SQL Database is the default for relational workloads: transactional consistency, JOINs, and existing tooling (EF Core, Dapper). Use it when your data model is relational and you need ACID guarantees. Choose the right tier (e.g. DTU or vCore) based on throughput and storage; scale up when needed and consider read replicas for read-heavy scenarios.

Azure Cosmos DB is a globally distributed NoSQL service with tunable consistency (strong to eventual). Use it when you need global distribution, low latency at scale, or a document/key-value model that does not fit SQL. It is more expensive per GB than Azure SQL; use it when multi-region write or single-digit-millisecond latency is a requirement. Model your data for the API you choose (NoSQL, formerly called the SQL API; MongoDB; etc.) and design partition keys for even distribution.

Azure Blob Storage is for unstructured data: files, backups, static assets, and data lakes. Use it for large binary objects and when you do not need query capability; combine with Azure Data Lake Storage or Synapse if you need analytics on top.

Example: Azure SQL + EF Core

// Azure SQL via EF Core; the connection string is resolved from Key Vault at startup
// MyApp.Infrastructure/OrderRepository.cs
using Microsoft.EntityFrameworkCore;

public class OrderRepository : IOrderRepository
{
    private readonly AppDbContext _context;
    public OrderRepository(AppDbContext context) => _context = context;

    // Returns null when no order has the given id, so the return type is nullable
    // (IOrderRepository needs to declare the same signature).
    public async Task<Order?> GetByIdAsync(int id, CancellationToken ct)
        => await _context.Orders
            .Include(o => o.Lines) // eager-load the order lines in the same query
            .FirstOrDefaultAsync(o => o.Id == id, ct);
}

Example: Cosmos DB with partition key

// Cosmos DB via the .NET SDK; a point read needs both the item id and its partition key value
// MyApp.Infrastructure/CosmosOrderRepository.cs
using Microsoft.Azure.Cosmos;

public class CosmosOrderRepository
{
    private readonly Container _container;
    public CosmosOrderRepository(CosmosClient client, string db, string containerName)
        => _container = client.GetContainer(db, containerName);

    // Point read of a single item.
    // Throws CosmosException with StatusCode NotFound when the item does not exist.
    public async Task<OrderDocument> GetByIdAsync(string id, string partitionKey, CancellationToken ct)
    {
        var response = await _container.ReadItemAsync<OrderDocument>(id, new PartitionKey(partitionKey), cancellationToken: ct);
        return response.Resource;
    }
}

For Cosmos DB, the partition key determines how data is distributed; choose one that spreads load evenly (e.g. tenant ID or a high-cardinality field). Avoid a partition key that causes hot partitions (e.g. “status” when 90% of documents have status “active”).
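To make the hot-partition risk concrete, the sketch below measures what share of documents would land in the busiest logical partition for two candidate keys. The `OrderDoc` record, the sample data, and the `LargestPartitionShare` helper are hypothetical and only count documents; in a real system, request volume and RU consumption per partition matter as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample: 10 orders across 5 tenants, 9 of them "active".
var orders = Enumerable.Range(0, 10)
    .Select(i => new OrderDoc(
        Id: $"order-{i}",
        TenantId: $"tenant-{i % 5}",
        Status: i == 0 ? "closed" : "active"))
    .ToList();

double byStatus = PartitionSkew.LargestPartitionShare(orders, o => o.Status);
double byTenant = PartitionSkew.LargestPartitionShare(orders, o => o.TenantId);

Console.WriteLine($"status key:   busiest partition holds {byStatus:P0} of documents");
Console.WriteLine($"tenantId key: busiest partition holds {byTenant:P0} of documents");

public record OrderDoc(string Id, string TenantId, string Status);

public static class PartitionSkew
{
    // Fraction (0..1) of documents that fall into the most populated partition key value.
    public static double LargestPartitionShare<T>(
        IReadOnlyCollection<T> docs, Func<T, string> partitionKey)
    {
        if (docs.Count == 0) return 0;
        int largest = docs.GroupBy(partitionKey).Max(g => g.Count());
        return (double)largest / docs.Count;
    }
}
```

With this sample, the status key concentrates nine of ten documents in one partition, while the tenant key spreads them evenly across five.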

Class structure: how the pieces fit together

A typical Azure solution involves an edge (Front Door or API Management), compute (App Service or AKS), storage abstractions, and health/observability. Keeping interfaces for storage and messaging lets you swap implementations (e.g. Azure SQL vs Cosmos, Service Bus vs Event Grid) without changing business logic.

Front Door (or API Management) sits at the edge: routing, caching, and optional WAF. WebApp is your ASP.NET Core app running on App Service or in AKS; it depends on IOrderRepository and IBlobStore so that you can test with mocks and swap Azure SQL for Cosmos, or Blob for local storage in dev. HealthCheck verifies that dependencies (e.g. database, Blob) are reachable so the orchestrator can take unhealthy instances out of rotation.
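As a rough illustration of the storage abstraction described above, the sketch below shows business logic that depends only on an interface, with an in-memory implementation standing in for Azure SQL or Cosmos DB. The `Order` record, the simplified `IOrderRepository` shape, `InMemoryOrderRepository`, and `OrderService` are assumptions made for this example, not the article's actual types.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Business logic sees only the interface; dev and tests wire in the in-memory version.
IOrderRepository repo = new InMemoryOrderRepository(new Order(42, "Contoso", 99.50m));
var service = new OrderService(repo);

Console.WriteLine(await service.DescribeAsync(42, CancellationToken.None));
Console.WriteLine(await service.DescribeAsync(7, CancellationToken.None));

public record Order(int Id, string Customer, decimal Total);

// Assumed shape of the abstraction; a production version would wrap EF Core or the Cosmos SDK.
public interface IOrderRepository
{
    Task<Order?> GetByIdAsync(int id, CancellationToken ct);
}

public sealed class InMemoryOrderRepository : IOrderRepository
{
    private readonly Dictionary<int, Order> _orders = new();

    public InMemoryOrderRepository(params Order[] seed)
    {
        foreach (var order in seed) _orders[order.Id] = order;
    }

    public Task<Order?> GetByIdAsync(int id, CancellationToken ct)
        => Task.FromResult<Order?>(_orders.TryGetValue(id, out var order) ? order : null);
}

public sealed class OrderService
{
    private readonly IOrderRepository _repository;
    public OrderService(IOrderRepository repository) => _repository = repository;

    public async Task<string> DescribeAsync(int id, CancellationToken ct)
    {
        var order = await _repository.GetByIdAsync(id, ct);
        return order is null
            ? $"Order {id} not found"
            : $"Order {order.Id} for {order.Customer}";
    }
}
```

Swapping the backing store then means registering a different `IOrderRepository` implementation; `OrderService` does not change.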

Messaging and integration: Service Bus, Event Grid, Event Hubs

For queues and topics between your own services, Azure Service Bus is the workhorse: reliable, ordered (when you use sessions), and integrated with .NET. Use Event Grid for event routing at scale: high throughput, publish/subscribe with push delivery, and deep Azure integration (e.g. blob created, resource changed). Use Event Hubs for high-throughput ingestion (telemetry, logs) and stream processing. Do not use Event Hubs as a general-purpose queue; use Service Bus or Storage Queues for that.

Service	Use case	Delivery	Throughput
Service Bus	Reliable work between your services; dead-letter, sessions	At-least-once, ordered (with sessions)	High but not millions/sec
Event Grid	Fan-out, Azure resource events	At-least-once; push	Very high
Event Hubs	Ingestion, stream processing	At-least-once; consumer groups	Millions/sec
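All three services in the table deliver at-least-once, so a consumer can receive the same message more than once. One common way to cope is an idempotent handler keyed on the message ID. The sketch below is a minimal in-memory version of that idea; `IncomingMessage` and `IdempotentHandler` are hypothetical names, and a real service would need a durable store shared by all instances instead of a `HashSet`.

```csharp
using System;
using System.Collections.Generic;

var handler = new IdempotentHandler();
var message = new IncomingMessage("msg-1", "create order 42");

Console.WriteLine(handler.Handle(message)); // first delivery is processed
Console.WriteLine(handler.Handle(message)); // redelivery of the same ID is skipped

public record IncomingMessage(string MessageId, string Body);

public sealed class IdempotentHandler
{
    // In-memory for illustration only; it does not survive restarts or scale-out.
    private readonly HashSet<string> _processedIds = new();

    public int ProcessedCount { get; private set; }

    // Returns true if the message was processed, false if it was a duplicate.
    public bool Handle(IncomingMessage message)
    {
        if (!_processedIds.Add(message.MessageId)) return false;
        ProcessedCount++; // stand-in for the real side effect
        return true;
    }
}
```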

Security and identity: Azure AD, Managed Identity, Key Vault

Use Azure AD (now Microsoft Entra ID) for identity and Managed Identity for service-to-service auth so that no secrets are stored in code. Put connection strings and keys in Key Vault and reference them from App Service, AKS, or Functions. Enable networking controls: VNet integration, private endpoints, and firewall rules so that only authorised traffic reaches your resources.

In practice: every App Service or Function that talks to SQL or Service Bus should use Managed Identity to get a token; any connection strings or keys you still need live in Key Vault and are referenced by name, not copied into config. For human users, Azure AD (or B2C for consumer apps) issues tokens; your API validates them with the standard JWT middleware. Do not skip VNet integration just because an app only calls other Azure services: private endpoints keep that traffic off the public internet and satisfy many compliance requirements.
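A minimal sketch of what secret-free access can look like from .NET, assuming the `Azure.Identity`, `Azure.Messaging.ServiceBus`, and `Azure.Security.KeyVault.Secrets` packages. The vault URI, namespace, queue name, and secret name are placeholders, and the identity is assumed to already hold suitable roles (for example Azure Service Bus Data Sender and Key Vault Secrets User).

```csharp
using System;
using Azure.Identity;
using Azure.Messaging.ServiceBus;
using Azure.Security.KeyVault.Secrets;

// Placeholder names: replace with your own vault, namespace, queue, and secret.
var vaultUri = new Uri("https://my-vault.vault.azure.net/");
const string serviceBusNamespace = "my-namespace.servicebus.windows.net";

// Uses Managed Identity when running in Azure, and developer credentials (e.g. Azure CLI) locally.
var credential = new DefaultAzureCredential();

// Service Bus: token-based auth, no connection string anywhere.
await using var busClient = new ServiceBusClient(serviceBusNamespace, credential);
ServiceBusSender sender = busClient.CreateSender("orders");
await sender.SendMessageAsync(new ServiceBusMessage("order 42 created"));

// Key Vault: fetch a secret that Managed Identity cannot replace (e.g. a third-party API key).
var secretClient = new SecretClient(vaultUri, credential);
KeyVaultSecret apiKey = await secretClient.GetSecretAsync("payment-api-key");
Console.WriteLine($"Loaded secret '{apiKey.Name}' (value not printed)");
```

Nothing in this code or its configuration is a credential; access is governed entirely by the role assignments on the identity.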

Networking: VNet, private endpoints, Front Door

VNet integration lets your App Service or Function App reach resources in a virtual network (e.g. Azure SQL, Service Bus) without exposing those resources to the public internet. Private endpoints attach a private IP from your VNet to an Azure service (e.g. SQL, Storage, Key Vault) so that traffic stays on the Microsoft backbone. Use them when compliance or security requires no public endpoint.

Azure Front Door (or API Management) sits at the edge: global load balancing, WAF, caching, and routing to your backends. Use Front Door when you need geo-routing, DDoS protection, or unified entry for multiple backends. CDN (often combined with Front Door) reduces latency and egress by caching responses at the edge.

Deployment and DevOps: slots, Bicep, pipelines

For App Service, use deployment slots to stage a new version and swap it with production; that gives zero-downtime deployments and a quick rollback (swap back). For AKS, use rolling updates so that new pods are brought up before old ones are terminated; pair that with readiness probes so traffic only goes to pods that can serve. For Functions, deploy via ARM, Bicep, or Terraform so that infrastructure and code are in one place; use application settings and Key Vault references so that secrets are not in the deployment package.

Example: minimal Bicep for App Service + Azure SQL

// main.bicep – minimal App Service + SQL for illustration
param location string = resourceGroup().location
param appName string = 'myapp'
param sqlServerName string = 'myapp-sql'
param sqlDbName string = 'MyAppDb'

// Supplied at deployment time (e.g. via a Key Vault reference in the parameter file); never hard-coded.
@secure()
param sqlAdminPassword string

resource sqlServer 'Microsoft.Sql/servers@2023-05-01-preview' = {
  name: sqlServerName
  location: location
  properties: {
    administratorLogin: 'sqladmin'
    administratorLoginPassword: sqlAdminPassword
  }
}

resource sqlDb 'Microsoft.Sql/servers/databases@2023-05-01-preview' = {
  parent: sqlServer
  name: sqlDbName
  location: location
  sku: { name: 'Basic' }
}

resource appServicePlan 'Microsoft.Web/serverfarms@2022-09-01' = {
  name: '${appName}-plan'
  location: location
  sku: { name: 'B1', tier: 'Basic' }
}

resource webApp 'Microsoft.Web/sites@2022-09-01' = {
  name: appName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      netFrameworkVersion: 'v8.0'
    }
  }
}

What this Bicep does: It creates a SQL server and database, an App Service plan, and a Web App with system-assigned Managed Identity. In production, supply the SQL admin password from Key Vault (for example through a secure parameter); do not put it in the Bicep file. A single pipeline that builds, tests, and deploys to dev, then staging, then production (with approval gates for prod) gives every environment the same repeatable release path.

Monitoring and observability

You cannot right-size or fix what you do not measure. Azure Monitor and Application Insights give you logs, metrics, and (with the right SDK) distributed tracing. Enable Application Insights on every App Service, Function App, and AKS workload so that requests, dependencies, and exceptions are captured. Set alerts on error rate, latency (e.g. p95 above a threshold), and dependency failures. For AKS, use container insights; for Cosmos DB, watch RU consumption and throttle rate. A simple dashboard that shows request count, error rate, and dependency latency for your main API is a good starting point. Do not skip logging and metrics because “we will add them later”—adding instrumentation to a live system is harder than building it in from day one.
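To show what the suggested alerts evaluate, here is a small sketch that computes p95 latency and error rate over a window of requests and compares them with thresholds. The sample data, the 500 ms and 5% thresholds, and the nearest-rank percentile method are assumptions for illustration; Azure Monitor performs its own aggregation when you define an alert rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical one-minute window of requests: (duration in ms, succeeded?).
var window = new List<RequestSample>
{
    new(120, true), new(95, true), new(110, true), new(1300, false), new(105, true),
    new(98, true), new(102, true), new(115, true), new(90, true), new(108, true),
};

double p95 = AlertMath.Percentile(window.Select(r => r.DurationMs), 95);
double errorRate = AlertMath.ErrorRate(window);

// Assumed thresholds; pick values from your own SLA.
bool latencyAlert = p95 > 500;
bool errorAlert = errorRate > 0.05;
Console.WriteLine($"p95={p95} ms, errorRate={errorRate}, latencyAlert={latencyAlert}, errorAlert={errorAlert}");

public record RequestSample(double DurationMs, bool Succeeded);

public static class AlertMath
{
    // Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
    public static double Percentile(IEnumerable<double> values, double p)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        if (sorted.Length == 0) throw new ArgumentException("No samples.", nameof(values));
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Max(rank, 1) - 1];
    }

    // Share of failed requests in the window (0..1).
    public static double ErrorRate(IReadOnlyCollection<RequestSample> samples)
        => samples.Count == 0 ? 0 : (double)samples.Count(s => !s.Succeeded) / samples.Count;
}
```

A single slow, failed request is enough to trip both alerts in a ten-request window, which is one reason to evaluate alerts over a window large enough to be representative.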

Decision framework: when to choose what

Before adding a service, run through a short checklist.

Compute: Is it a single web app or API with no need for multi-service orchestration? → App Service. Do you need to run many services, rolling updates, or portability to another cloud? → AKS or Container Apps. Is it event-driven, short-lived logic (e.g. react to blob upload, process queue message)? → Functions.

Storage: Relational data, JOINs, and ACID? → Azure SQL. Global distribution, very high throughput, or document/key-value at scale? → Cosmos DB. Files, backups, or data lake? → Blob (and optionally Data Lake Storage).

Messaging: Reliable, ordered work between your own services? → Service Bus. Fan-out events or reaction to Azure resource events? → Event Grid. High-throughput ingestion or stream processing? → Event Hubs.

Document these choices in an ADR (Architecture Decision Record) so that when someone asks “why Cosmos and not SQL?” you have a clear answer.
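The checklist above can be sketched as a few small functions, which is sometimes a handy way to make the reasoning explicit in an ADR or a design review. The type names, the flags, and especially the order in which the questions are asked are assumptions for this example; real decisions weigh more factors than three booleans.

```csharp
using System;

// Hypothetical workloads run through the checklist.
Console.WriteLine(ServiceChooser.ChooseCompute(new ComputeNeeds()));
Console.WriteLine(ServiceChooser.ChooseCompute(new ComputeNeeds(ManyServices: true)));
Console.WriteLine(ServiceChooser.ChooseStorage(new StorageNeeds(GlobalOrDocumentAtScale: true)));
Console.WriteLine(ServiceChooser.ChooseMessaging(new MessagingNeeds(FanOutOrAzureEvents: true)));

public enum Compute { AppService, ContainerApps, Aks, Functions }
public enum Storage { AzureSql, CosmosDb, Blob }
public enum Messaging { ServiceBus, EventGrid, EventHubs }

public record ComputeNeeds(bool ManyServices = false, bool NeedsFullKubernetes = false, bool EventDrivenShortLived = false);
public record StorageNeeds(bool FilesOrBackups = false, bool GlobalOrDocumentAtScale = false);
public record MessagingNeeds(bool HighThroughputIngestion = false, bool FanOutOrAzureEvents = false);

public static class ServiceChooser
{
    // Assumed precedence: the most specific need wins; App Service is the default.
    public static Compute ChooseCompute(ComputeNeeds n)
    {
        if (n.EventDrivenShortLived) return Compute.Functions;
        if (n.NeedsFullKubernetes) return Compute.Aks;
        if (n.ManyServices) return Compute.ContainerApps;
        return Compute.AppService;
    }

    // Relational with JOINs and ACID is the default; the flags mark the exceptions.
    public static Storage ChooseStorage(StorageNeeds n)
    {
        if (n.FilesOrBackups) return Storage.Blob;
        if (n.GlobalOrDocumentAtScale) return Storage.CosmosDb;
        return Storage.AzureSql;
    }

    // Reliable, ordered work between your own services is the default.
    public static Messaging ChooseMessaging(MessagingNeeds n)
    {
        if (n.HighThroughputIngestion) return Messaging.EventHubs;
        if (n.FanOutOrAzureEvents) return Messaging.EventGrid;
        return Messaging.ServiceBus;
    }
}
```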

Right-sizing and cost discipline

Right-sizing starts with measurement. Use Azure Monitor to see CPU, memory, and throughput for your App Service plan, SQL DTUs/vCores, and Cosmos RUs. Start with the minimum tier that meets your SLA; scale up when you see sustained pressure (e.g. CPU consistently above 70%, or throttling in Cosmos). For development and staging, use auto-shutdown (App Service, VMs) and the smallest SKUs—no need for production-sized databases in dev. Reserved capacity (one or three years) can cut compute cost significantly for baseline workloads. Tag every resource with team, project, and environment so that Cost Management can show spend by area.

Practical rules: if your App Service plan is at 5% CPU most of the time, downsize the plan or reduce instance count. If Cosmos is throttling (429s), increase RUs or optimise queries and partition key design. Egress (data leaving Azure) is often forgotten until the bill arrives; keep data in-region where possible and use CDN or Front Door cache to reduce repeated fetches. Budget alerts in Cost Management are free and take five minutes to set up; set one at 80% of your expected spend so you get a warning before the month ends.
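These rules of thumb can be written down as a small check that runs against exported metrics. The sketch below is one possible encoding: the 70% ceiling and the 80% budget fraction come from the text, while the 10% floor, the type names, and the idea of folding throttling into the same function are assumptions.

```csharp
using System;

Console.WriteLine(RightSizing.Recommend(avgCpuPercent: 5, throttledRequests: 0));
Console.WriteLine(RightSizing.Recommend(avgCpuPercent: 45, throttledRequests: 0));
Console.WriteLine(RightSizing.Recommend(avgCpuPercent: 82, throttledRequests: 0));
Console.WriteLine(RightSizing.Recommend(avgCpuPercent: 30, throttledRequests: 12));

decimal alertAt = RightSizing.BudgetAlertThreshold(expectedMonthlySpend: 2000m);
Console.WriteLine($"Budget alert threshold: {alertAt}");

public enum SizingAction { ScaleDown, Keep, ScaleUp }

public static class RightSizing
{
    // avgCpuPercent: sustained average over the review period, not a momentary spike.
    // throttledRequests: e.g. count of HTTP 429 responses from Cosmos DB in the same period.
    public static SizingAction Recommend(double avgCpuPercent, int throttledRequests)
    {
        if (throttledRequests > 0 || avgCpuPercent > 70) return SizingAction.ScaleUp;
        if (avgCpuPercent < 10) return SizingAction.ScaleDown; // assumed floor
        return SizingAction.Keep;
    }

    // Spend level at which the budget alert should fire (80% of expected spend by default).
    public static decimal BudgetAlertThreshold(decimal expectedMonthlySpend, decimal fraction = 0.80m)
        => expectedMonthlySpend * fraction;
}
```

Note that throttling does not always mean "buy more": as the text says, optimising queries or the partition key may remove the 429s without raising RUs.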

High availability and disaster recovery

High availability (HA): For App Service, use multiple instances (scale out) and deployment slots for zero-downtime swaps. For AKS, run multiple replicas per deployment and spread pods across availability zones (if your region supports them). For Azure SQL, use zone-redundant configuration or failover groups for automatic failover. For Cosmos DB, enable multi-region write or multi-region read depending on your consistency and latency requirements.

Disaster recovery (DR): Define RTO (recovery time objective) and RPO (recovery point objective). For Azure SQL, geo-replication and failover groups provide DR to another region. For Cosmos DB, add a secondary region and configure failover priority. For Blob, RA-GRS (read-access geo-redundant storage) replicates to a secondary region. For App Service and AKS, replicate your deployment to a secondary region and use Traffic Manager or Front Door for failover. Test failover and rollback regularly so that when a real disaster happens, you are not debugging the runbook for the first time.
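One way to make failover tests actionable is to compare what a drill actually measured against the stated objectives. The sketch below assumes you record two numbers per drill, time to recover and the window of data lost; the record names and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var objectives = new RecoveryObjectives(Rto: TimeSpan.FromMinutes(60), Rpo: TimeSpan.FromMinutes(5));
var drill = new FailoverDrill(TimeToRecover: TimeSpan.FromMinutes(45), DataLossWindow: TimeSpan.FromMinutes(12));

foreach (string finding in DrCheck.Evaluate(objectives, drill))
    Console.WriteLine(finding);

public record RecoveryObjectives(TimeSpan Rto, TimeSpan Rpo);
public record FailoverDrill(TimeSpan TimeToRecover, TimeSpan DataLossWindow);

public static class DrCheck
{
    // Returns one finding per missed objective, or a single "met" line when both hold.
    public static IReadOnlyList<string> Evaluate(RecoveryObjectives target, FailoverDrill measured)
    {
        var findings = new List<string>();
        if (measured.TimeToRecover > target.Rto)
            findings.Add($"RTO missed: recovered in {measured.TimeToRecover.TotalMinutes} min, objective {target.Rto.TotalMinutes} min");
        if (measured.DataLossWindow > target.Rpo)
            findings.Add($"RPO missed: lost {measured.DataLossWindow.TotalMinutes} min of data, objective {target.Rpo.TotalMinutes} min");
        if (findings.Count == 0)
            findings.Add("Drill met both objectives.");
        return findings;
    }
}
```

In this sample the drill recovers within the RTO but misses the RPO, which would point at replication settings instead of the failover runbook.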

Common issues and challenges

Over-provisioning compute: Using AKS or containers when App Service would suffice increases cost and operational complexity. Match the service to the workload: start with App Service for web/API; move to AKS or Container Apps only when you need orchestration, multi-service deployment, or portability. I have seen a team run a single .NET API on a 10-node AKS cluster for a year before someone asked why; they moved it to App Service and cut the bill by two-thirds.

Wrong storage choice: Using Cosmos DB for simple relational workloads or Azure SQL for global, low-latency NoSQL leads to cost and complexity. Choose by data model and scale: Azure SQL for relational; Cosmos DB for global NoSQL; Blob for files. A classic mistake is “we might go global someday” and picking Cosmos for a purely relational app; you pay more and get no benefit until you actually need multi-region write.

Serverless cold starts: Azure Functions can have noticeable cold-start latency. For latency-sensitive APIs, use premium plan or always-on App Service; use Functions for event-driven, batch, or background work where a few hundred ms delay is acceptable.

Cost explosion: Leaving resources running, over-provisioning, or ignoring egress can spiral cost. Use Cost Management and tags; auto-pause and right-size; prefer reserved capacity for baseline workloads. Staging environments that run 24/7 “in case we need to test” are a common leak; auto-shutdown outside business hours or scale to zero where the platform supports it.

Security misconfiguration: Exposing resources without private endpoints, storing secrets in code, or relying on weak identity controls. Use Managed Identity, Key Vault, and networking controls so that only authorised traffic and identities reach your resources. Private endpoints and VNet integration, combined with disabling public network access, take the resource off the public internet entirely, which many compliance frameworks require.

Best practices and pitfalls

Do:

Match compute to the workload: App Service for simple web/API, AKS or Container Apps for microservices, Functions for event-driven logic.
Choose storage by data model and scale: Azure SQL for relational, Cosmos DB for global NoSQL, Blob for files.
Use Managed Identity and Key Vault for secrets; never store connection strings in code or config.
Enable Application Insights and alerts from day one; right-size using Azure Monitor data.
Tag every resource (team, project, environment) for cost attribution and governance.
Document architecture decisions in ADRs; use Bicep or Terraform for repeatable deployments.

Don’t:

Don’t use AKS for a single API when App Service would suffice.
Don’t use Cosmos DB for purely relational workloads “in case we go global.”
Don’t use Event Hubs as a general-purpose queue; use Service Bus or Storage Queues.
Don’t skip private endpoints or VNet integration when compliance or security requires it.
Don’t leave dev/staging resources running 24/7 without auto-shutdown or scale-to-zero.

Summary

Match each workload to the right Azure service and right-size from the start—misalignment drives cost and complexity. Getting compute, storage, and messaging choices right matters for scalability and operational sanity; wrong choices cost time, money, and rework. Next, map your current or planned workloads to the decision framework above and adjust one area (e.g. compute or storage) before changing everything.

Compute: App Service for simple web/API, AKS or Container Apps for microservices, Functions for event-driven logic. Match the service to the workload.
Storage: Azure SQL for relational, Cosmos DB for global NoSQL, Blob for files. Right-size and monitor cost.
Messaging: Service Bus for queues and topics, Event Grid for event routing, Event Hubs for ingestion. Don’t use Event Hubs as a general-purpose queue.
Security: Azure AD and Managed Identity; Key Vault for secrets; VNet integration and private endpoints where required.
Deployment: Slots for App Service, rolling updates for AKS, Bicep/ARM for infrastructure; approval gates for production.
Cost: Measure, tag, right-size, reserved capacity for baseline, budget alerts. Avoid over-provisioning and idle dev resources.

Position & Rationale

I use App Service first for most web APIs and SPAs; I add AKS or Container Apps when I need multi-container, orchestration, or portability. I use Azure SQL for relational, Cosmos DB for global distribution or document shape, Blob for files and cold data. I avoid Functions for long-running or stateful work; I use Service Bus for ordered queues, Event Grid for event fan-out, Event Hubs for high-throughput ingestion. I always plan Managed Identity and Key Vault so secrets stay out of code; I right-size from day one and review cost regularly.

Trade-Offs & Failure Modes

What this sacrifices: Some flexibility—you’re committed to Azure and to the chosen compute/storage mix; migrating later has cost. You also accept operational overhead (monitoring, patching, cost alerts).
Where it degrades: When teams choose AKS “because we might need it” and then don’t use orchestration; or when Cosmos is chosen for a single-region relational workload. It also degrades when cost is not reviewed and sprawl grows.
How it fails when misapplied: App Service for a 100-node batch job; or Azure SQL for a globally distributed document store. Another failure: no Private Endpoints or Key Vault so secrets and data are exposed.
Early warning signs: “We’re on AKS but we only run one app”; “our Azure bill doubled and we don’t know why”; “we’re not using Managed Identity.”

What Most Guides Miss

Most guides list services. The hard part is right-sizing: start with the smallest fit (App Service + SQL), then move to containers or Cosmos when you hit a real limit—not in advance. The other gap: cost discipline—tag resources, set budgets and alerts, and review quarterly; I’ve seen teams over-provision and only notice when finance asks. Finally: HA/DR—multi-region and failover are not automatic; you need to design for them (e.g. Cosmos multi-region, SQL geo-replication, Front Door) and test failover.

Decision Framework

If you need a web API or SPA → Start with App Service + Azure SQL (or Cosmos if document model); add slots for staging.
If you need containers or multi-service orchestration → Container Apps for simpler, AKS for full Kubernetes; don’t choose AKS for a single app.
If you need event-driven or messaging → Service Bus for queues, Event Grid for fan-out, Event Hubs for high-throughput ingestion; match the guarantee (at-least-once, ordering) to the workload.
If you’re storing secrets or keys → Key Vault and Managed Identity from day one; never in config or code.
If cost is a concern → Right-size; set budgets and alerts; review and downscale unused resources.

You can also explore more patterns in the Cloud-Native Architecture resource page.

Key Takeaways

Match compute to workload: App Service first, containers when you need orchestration or portability; avoid AKS for a single app.
Match storage to data model and scale: SQL for relational, Cosmos for global or document, Blob for files; plan for growth but don’t over-provision day one.
Use Managed Identity and Key Vault; Private Endpoints where sensitive; plan HA/DR if you need it and test failover.
Right-size and set cost alerts; review quarterly so sprawl doesn’t surprise you.
Service Bus (queues), Event Grid (events), Event Hubs (ingestion)—choose by guarantee and throughput.

When I Would Use This Again — and When I Wouldn’t

I would use these Azure patterns again when I’m designing or migrating scalable apps on Azure and need to choose compute, storage, messaging, and security. I wouldn’t use them when the target is AWS or GCP—then the mental model applies but services differ. I also wouldn’t over-provision (e.g. AKS, Cosmos) for a small LOB app; start with App Service + SQL and move when you hit a real limit. Alternative: for tiny side projects, a single App Service plan and SQL database may be enough; add complexity only when required.

Frequently Asked Questions

When should I use App Service vs AKS vs Container Apps?

Use App Service for simple web/API when you do not need multi-service orchestration or portability. Use AKS when you need full Kubernetes (service mesh, advanced networking, multi-cloud). Use Container Apps when you want serverless-style containers with scale-to-zero and less operational overhead than AKS.

When should I use Azure SQL vs Cosmos DB?

Use Azure SQL when your data is relational and you need ACID and JOINs. Use Cosmos DB when you need global distribution, tunable consistency, or a document/key-value model at scale. Cosmos is more expensive per GB; use it when multi-region or single-digit-ms latency is a requirement.

What are the main cost drivers in Azure?

Compute (App Service, AKS nodes, Functions), storage (SQL, Cosmos, Blob), messaging (Service Bus, Event Grid, Event Hubs), and egress. Use reserved capacity for baseline; auto-pause and right-size; tag resources so that you can attribute cost by team and environment.

How do I choose between Service Bus, Event Grid, and Event Hubs?

Service Bus: queues and topics for reliable, ordered messaging between your services. Event Grid: high-throughput event routing and Azure integration (e.g. blob created). Event Hubs: high-throughput ingestion and stream processing. Do not use Event Hubs as a general-purpose queue.

What is Managed Identity and why use it?

Managed Identity lets Azure resources (App Service, AKS, Functions) authenticate to other Azure services (Key Vault, SQL, Service Bus) without storing secrets. Use it for service-to-service auth so that no connection strings or keys are in code or config.
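
To show what "no stored secrets" means in practice, here is a rough sketch of how code running in App Service or Functions can ask the platform for a token. It assumes the `IDENTITY_ENDPOINT` and `IDENTITY_HEADER` environment variables and the `2019-08-01` API version that the App Service managed identity REST protocol uses as far as I know; verify these against the current documentation. In real projects you would normally let the Azure Identity client library handle this instead of calling the endpoint yourself.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Mapping


def build_token_request(
    resource: str, env: Mapping[str, str] = os.environ
) -> urllib.request.Request:
    """Build (but do not send) a token request to the local identity endpoint."""
    query = urllib.parse.urlencode(
        {"resource": resource, "api-version": "2019-08-01"}  # assumed version
    )
    return urllib.request.Request(
        f"{env['IDENTITY_ENDPOINT']}?{query}",
        # The header value is injected by the platform, not stored by you.
        headers={"X-IDENTITY-HEADER": env["IDENTITY_HEADER"]},
    )


def fetch_access_token(resource: str) -> str:
    """Send the request; this only works inside App Service or Functions."""
    with urllib.request.urlopen(build_token_request(resource), timeout=5) as resp:
        return json.load(resp)["access_token"]
```

The point of the sketch is what is absent: there is no client secret, key, or connection string anywhere in code or configuration.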

How do I right-size Azure resources?

Measure usage (CPU, memory, throughput) with Azure Monitor; start with the minimum tier that meets SLA. Scale up when metrics justify it; use auto-scale for variable load. For dev/staging, use auto-shutdown and smaller SKUs.
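
A right-sizing rule can be as simple as comparing a high percentile of utilisation against two thresholds. The sketch below assumes you have already exported CPU and memory samples (in percent) from Azure Monitor; the 30% and 75% thresholds and the function names are illustrative assumptions, not Azure defaults.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend(
    cpu: list[float], memory: list[float], low: float = 30.0, high: float = 75.0
) -> str:
    """Suggest a sizing action from the busier of CPU and memory at p95."""
    peak = max(percentile(cpu, 95), percentile(memory, 95))
    if peak > high:
        return "scale up"
    if peak < low:
        return "scale down"
    return "keep"
```

Using the 95th percentile rather than the maximum keeps one short spike from driving the tier choice; auto-scale is the better tool for spikes.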

When should I use Azure Container Apps vs AKS?

Use Container Apps when you want serverless-style containers (scale to zero, event-driven or HTTP scaling) with less operational overhead than Kubernetes. Use AKS when you need full Kubernetes: multiple node pools, service mesh, advanced networking, or workloads that require K8s APIs. Container Apps is often enough for 2–5 microservices; AKS fits when you have many services or need portability to another K8s cluster.

How do I reduce Azure egress cost?

Keep data and traffic in the same region where possible. Cache responses at the edge (e.g. Front Door, CDN) to reduce repeated fetches from the origin. For cross-region traffic, use Front Door or Traffic Manager for geo-routing instead of duplicating data everywhere. Private endpoints and VNet integration keep traffic between your app and Azure services (SQL, Storage, Service Bus) on the backbone, but treat them as a security control rather than a cost saving: Private Link has its own data-processing charge.

What is the difference between DTU and vCore for Azure SQL?

DTU (Database Transaction Unit) is a blended measure of CPU, memory, and I/O; it is simpler to reason about and suits small to medium workloads. vCore gives you explicit control over CPU and memory, supports higher limits, and unlocks options such as serverless compute and the Hyperscale tier. For most apps, start with DTU; move to vCore when you need more control or higher scale.

When should I use Logic Apps vs Azure Functions?

Use Logic Apps when you need a low-code workflow (e.g. “when email arrives, parse and write to SQL”) with connectors and a visual designer; good for integration scenarios and non-developers. Use Functions when you need code-first logic, complex branching, or .NET/Node/Python. Functions give you full control; Logic Apps are faster to build for simple, connector-based flows.

How do I secure App Service and AKS?

For App Service: enable Managed Identity, use Key Vault references for secrets, turn on HTTPS only, and use VNet integration or private endpoints so the app is not exposed unnecessarily. For AKS: use Azure AD integration for cluster access, RBAC for in-cluster permissions, network policies to restrict pod traffic, and a private cluster if you do not need public API server access. Never store plaintext secrets in config files, app settings, or source code; use Key Vault and Managed Identity.

What is Azure Cost Management and how do I use it?

Cost Management (in the Azure portal) shows spend by resource, resource group, tag, and service. Use tags (e.g. Team=Orders, Env=Prod) on every resource so you can slice cost by team or environment. Set budgets and alerts so that unexpected spikes (e.g. a runaway Function or a new Cosmos container) trigger a notification. Review recommendations (e.g. reserved capacity, right-sizing) regularly.
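
If you export cost data (for example as CSV from Cost Management), slicing by tag and checking budgets takes only a few lines. The record shape below (`cost` plus a `tags` dictionary) and the 80% alert threshold are assumptions made for illustration, not the export schema.

```python
from collections import defaultdict


def cost_by_tag(records: list[dict], tag: str) -> dict[str, float]:
    """Sum cost per tag value; resources missing the tag land in 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["tags"].get(tag, "untagged")] += record["cost"]
    return dict(totals)


def over_budget(
    totals: dict[str, float], budgets: dict[str, float], threshold: float = 0.8
) -> list[str]:
    """Return tag values whose spend has reached threshold * budget."""
    return sorted(
        name
        for name, spent in totals.items()
        if name in budgets and spent >= threshold * budgets[name]
    )
```

A non-trivial "untagged" bucket is itself a finding: it is spend nobody owns.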

When should I use read replicas for Azure SQL?

Use read replicas when you have read-heavy workloads and want to offload queries from the primary. The replica is eventually consistent; use it for reporting, dashboards, or read-only API paths. Do not use it for transactional reads that must see the latest write—use the primary for that. Replicas add cost; only add them when the primary is under sustained read pressure.
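
One common way to route reads is at the connection-string level: Azure SQL read scale-out sends a connection to a readable replica when it carries `ApplicationIntent=ReadOnly`. The sketch below only builds the strings; the server and database names are placeholders, and authentication settings are deliberately left out.

```python
def connection_string(server: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC-style connection string, optionally targeting a replica."""
    parts = [
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Reporting and dashboard paths: tolerate slightly stale data.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


# Hypothetical names, for illustration only.
PRIMARY = connection_string("example.database.windows.net", "orders")
REPORTING = connection_string("example.database.windows.net", "orders", read_only=True)
```

Keeping two named connection strings makes the choice visible in code review: any query that must see the latest write uses `PRIMARY`.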

How do I choose between Event Grid and Service Bus for events?

Use Event Grid when you need high-throughput, push-based delivery to many subscribers, or reaction to Azure resource events (blob created, resource changed). Use Service Bus when you need reliable, ordered processing with at-least-once delivery, dead-lettering, and sessions. Event Grid retries failed deliveries but does not guarantee ordering, so treat it as lightweight notification at scale; Service Bus is for work items that must not be lost. Neither gives you exactly-once processing out of the box: make handlers idempotent, and use Service Bus duplicate detection where it helps.
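
Because delivery is at least once, the same message can arrive twice, so the handler has to be safe to re-run. A minimal sketch of an idempotent wrapper follows; the class name is hypothetical, and the in-memory set stands in for a durable store (a SQL table or cache) that you would need in production.

```python
from typing import Callable


class IdempotentHandler:
    """Skip messages whose ID has already been processed successfully."""

    def __init__(self, process: Callable[[str], None]) -> None:
        self._process = process
        self._seen: set[str] = set()  # assumption: replace with a durable store

    def handle(self, message_id: str, body: str) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._process(body)
        # Record the ID only after success, so a failure leads to a retry
        # rather than a silently dropped message.
        self._seen.add(message_id)
        return True
```

Usage: wrap the real handler once and pass every delivery through `handle`; a redelivered message becomes a no-op instead of a double charge or a duplicate order.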

What is the minimum I need for production on Azure?

At least: compute (App Service or AKS) with Managed Identity; storage (Azure SQL or Cosmos) with Key Vault for connection strings; HTTPS and Azure AD (or B2C) for auth; Azure Monitor (or Application Insights) for logs and metrics; backup and disaster recovery per your RTO/RPO. Add private endpoints and VNet if compliance requires it. Do not skip monitoring and backup—they are not optional for production.

When should I use private endpoints?

Use private endpoints when you want traffic between your app and an Azure service (SQL, Storage, Key Vault, Service Bus) to stay on the Microsoft backbone and not cross the public internet. Required for many compliance frameworks (e.g. no public SQL endpoint). Combine with VNet integration on App Service or Functions so that outbound calls use the VNet and hit the private endpoint.
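
A quick sanity check after wiring up a private endpoint is to confirm, from inside the VNet, that the service hostname resolves to a private address. The sketch below uses only the standard library; the hostname you pass in is your own, and a passing check says nothing about firewall rules or whether public access is disabled.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """True for addresses in private ranges such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private


def resolves_privately(hostname: str, port: int = 443) -> bool:
    """Resolve the hostname and check that every returned address is private."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return all(is_private_address(info[4][0]) for info in infos)
```

If `resolves_privately` returns False when run from the app's network, the usual suspect is DNS: the private DNS zone is missing or not linked to the VNet.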

How do I deploy App Service with Bicep?

Use the Microsoft.Web/serverfarms and Microsoft.Web/sites resources in Bicep. Enable system-assigned Managed Identity on the site; reference Key Vault secrets for connection strings. Use deployment slots for staging. Store the Bicep in your repo and run it from a pipeline (Azure DevOps, GitHub Actions) with approval gates for production.
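
Bicep compiles down to an ARM JSON template, so the shape of that deployment can be sketched by building the equivalent template as data. The snippet below is an illustration of the structure, not a production template: the API version, the `B1` SKU, the `SqlConnection` setting name, and the secret URI are all assumptions to check against current documentation.

```python
import json

API_VERSION = "2022-09-01"  # assumed Microsoft.Web API version


def app_service_template(app_name: str, plan_name: str, secret_uri: str) -> dict:
    """Return an ARM template dict for a plan and a site with Managed Identity."""
    plan_id = f"[resourceId('Microsoft.Web/serverfarms', '{plan_name}')]"
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Web/serverfarms",
                "apiVersion": API_VERSION,
                "name": plan_name,
                "location": "[resourceGroup().location]",
                "sku": {"name": "B1"},
            },
            {
                "type": "Microsoft.Web/sites",
                "apiVersion": API_VERSION,
                "name": app_name,
                "location": "[resourceGroup().location]",
                "dependsOn": [plan_id],
                "identity": {"type": "SystemAssigned"},  # Managed Identity
                "properties": {
                    "serverFarmId": plan_id,
                    "httpsOnly": True,
                    "siteConfig": {
                        "appSettings": [
                            {
                                # Key Vault reference instead of a raw secret.
                                "name": "SqlConnection",
                                "value": f"@Microsoft.KeyVault(SecretUri={secret_uri})",
                            }
                        ]
                    },
                },
            },
        ],
    }


def render(template: dict) -> str:
    """Serialize the template for a pipeline step to deploy."""
    return json.dumps(template, indent=2)
```

The same three decisions appear in a Bicep file: system-assigned identity on the site, HTTPS only, and a Key Vault reference rather than a literal connection string.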

What is the difference between Front Door and API Management?

Front Door is a global load balancer and CDN with optional WAF; it routes traffic to your backends (App Service, AKS, etc.) and can cache responses. API Management (APIM) is an API gateway that sits in front of your APIs: rate limiting, authentication, transformation, and developer portal. Use Front Door for global routing and caching; use APIM when you need API-level policies, versioning, or a developer portal. You can use both: Front Door at the edge, APIM behind it for API-specific logic.

How do I design for high availability on Azure?

Use multiple instances (App Service scale out, AKS replicas), availability zones where supported, health checks and readiness probes, and deployment slots or rolling updates for zero-downtime deploys. For Azure SQL, use zone redundancy or failover groups; for Cosmos DB, enable multi-region. Define RTO and RPO, then test failover and rollback regularly.
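
Priority-based failover, the kind of routing a global load balancer performs, reduces to "send traffic to the highest-priority backend that passes its health check". The sketch below shows that logic in isolation; the backend names and the `is_healthy` callback are hypothetical stand-ins for real health probes.

```python
from typing import Callable, Optional


def pick_backend(
    backends: list[dict], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the healthy backend with the lowest priority number, if any."""
    for backend in sorted(backends, key=lambda b: b["priority"]):
        if is_healthy(backend["name"]):
            return backend["name"]
    return None  # nothing healthy: this is the case your runbook must cover


# Hypothetical two-region setup.
BACKENDS = [
    {"name": "westeurope", "priority": 1},
    {"name": "northeurope", "priority": 2},
]
```

A failover test is the same idea run against production: mark the primary unhealthy on purpose and confirm that traffic, data, and alerts behave as the RTO and RPO assume.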


Waqas Ahmad — Software Architect & Technical Consultant
