👋 Hi, I'm Waqas — a Software Architect and Technical Consultant specializing in .NET, Azure, microservices, and API-first system design.
I help companies build reliable, maintainable, and high-performance backend platforms that scale.
Azure microservices: AKS vs Service Fabric, event-driven design, health checks, and cost.
April 12, 2024 · Waqas Ahmad
Introduction
This guidance applies when you are building or operating multiple services on Azure and choosing orchestration, messaging, and operational patterns; it breaks down when your constraints differ (for example, a single-team monolith or a non-Azure stack). I’ve applied it in real projects and refined the takeaways over time (as of 2026).
Building microservices on Azure forces choices about orchestration (AKS vs Service Fabric), messaging (Service Bus, Event Grid), and cost—choices that affect scalability, operational complexity, and maintainability. This article explains when to choose AKS vs Service Fabric, how to design event-driven communication, how to implement health checks and readiness, and how to control cost without sacrificing reliability, with concrete code (Program.cs, Dockerfile, appsettings). Getting these decisions right matters for architects and tech leads who need consistent resilience, observability, and API design across services.
If you are new to Azure microservices, start with Topics covered and Azure microservices at a glance. We explain AKS, Service Fabric, event-driven patterns, health checks, and cost with tables, diagrams, and code.
System scale: Multiple services (typically 3+) on Azure; from a handful to dozens; applies when you’re building or operating microservices and need consistency in APIs, resilience, and observability.
Team size: One team per service or a small number of services; platform may own gateway, messaging, and observability; delivery teams own their services.
Time / budget pressure: Fits greenfield and incremental decomposition; breaks down when “we’ll add resilience later” and never do—then production bites.
Technical constraints: Azure (App Service, AKS, Service Bus, Event Grid, API Management, etc.); .NET typical; assumes you can add circuit breaker, retry, and tracing.
Non-goals: This article does not optimize for monoliths or for “microservices at any cost”; it optimizes for consistent, resilient service design when you’ve already chosen a multi-service architecture.
What are microservices and why they matter
Microservices are an architectural style where you build a system as a set of small, independently deployable services. Each service owns a bounded piece of business capability (e.g. orders, billing, notifications) and communicates with others via APIs or messages. Unlike a monolith (one big application and one database), you deploy and scale each service separately; teams can own and release their service without waiting for the whole system. That brings benefits—independent scaling, technology diversity, clearer ownership—but also complexity: distributed tracing, eventual consistency, and more moving parts.
On Azure, you run these services on containers (e.g. in AKS) or managed runtimes (e.g. Service Fabric, App Service), with messaging (Service Bus, Event Grid) and databases (Azure SQL, Cosmos DB) tying them together. Getting the orchestration, messaging, and operational choices right from the start avoids costly rework and keeps delivery fast. The rest of this article focuses on how to do it well on Azure.
What is Azure Kubernetes Service (AKS)?
Azure Kubernetes Service (AKS) is Microsoft’s managed Kubernetes offering. Kubernetes (K8s) is an open-source system for orchestrating containers: it schedules and runs your containers (e.g. Docker images of your .NET API) on a cluster of machines, restarts failed containers, scales them up or down, and handles load balancing and rolling updates. You describe what you want (e.g. “run 3 replicas of my API”) in manifests (YAML or Helm charts), and Kubernetes keeps the cluster in that state.
AKS runs the control plane for you; you get a cluster and add node pools (the VMs that run your workloads). You focus on your apps; Microsoft handles upgrades, security patches, and scaling of the control plane. AKS fits stateless services well: your API, workers, or frontend run in containers; state lives in databases or caches outside the pod. It is the default choice for most new microservices on Azure because of portability (same K8s elsewhere) and a large ecosystem (Helm, Kustomize, GitOps, Azure Monitor integration).
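The “run 3 replicas of my API” example above looks like this as a minimal Deployment manifest. A sketch only: the name, image, and resource values are illustrative, not from a real cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-api
spec:
  replicas: 3                  # Kubernetes keeps three pods running at all times
  selector:
    matchLabels:
      app: order-api
  template:
    metadata:
      labels:
        app: order-api
    spec:
      containers:
        - name: order-api
          image: myregistry.azurecr.io/order-api:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
          resources:
            requests:          # used by the scheduler; basis for right-sizing nodes
              cpu: 250m
              memory: 256Mi
```

Apply it with `kubectl apply -f deployment.yaml`; if a pod crashes or a node dies, the controller replaces it to restore three replicas.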
What is Azure Service Fabric?
Azure Service Fabric is a different platform: a distributed systems runtime from Microsoft that can run containers but also native .NET services (stateful or stateless). It is built for stateful scenarios: in-memory state, reliable collections (distributed key-value stores), actors (stateful objects with single-threaded access), and long-running workflows. Service Fabric handles replication, failover, and rolling upgrades for you.
If your domain has actors (e.g. user session, device state) or stateful workflows that benefit from colocating logic and state, Service Fabric can simplify your design. The trade-off is lock-in to Microsoft’s stack and a steeper learning curve for teams used to Kubernetes. It is often chosen when you already have Service Fabric workloads, need Windows containers, or want strong .NET integration without adopting Kubernetes.
Azure microservices at a glance
| Concept | What it is | When to use |
| --- | --- | --- |
| AKS | Managed Kubernetes; container orchestration | Stateless or state-externalised microservices; portability; default for greenfield |
| Service Fabric | Distributed runtime for containers and native .NET services; stateful primitives (reliable collections, actors) | Stateful .NET services; Windows containers; existing SF workloads |
| Service Bus | Queues and topics; reliable, ordered messaging | Work between services; dead-letter; sessions |
| Event Grid | Event routing; push, high throughput | Fan-out; Azure resource events |
| Health checks | Liveness (am I up?) and readiness (can I take traffic?) | AKS/K8s probes; load balancer removal when not ready |
| Managed Identity | Azure AD identity for the app; no secrets in code | Auth to Key Vault, SQL, Service Bus |
AKS vs Service Fabric in depth
AKS is the default choice for most new microservices. It is based on Kubernetes, uses open standards, and has a large ecosystem (Helm, Kustomize, GitOps). AKS fits well when your services are stateless or when state is externalised to databases and caches. You get portability: the same manifests can run on other Kubernetes offerings or on-prem. Teams that already know Kubernetes ramp up quickly, and Azure integration (Managed Identity, Key Vault, Monitor) is solid.
Service Fabric shines when you need stateful services (in-memory state, reliable collections), strong .NET integration, or Windows containers. It gives you a distributed runtime with built-in replication, failover, and rolling upgrades. If your domain has actors or long-running stateful workflows, Service Fabric can simplify your design. The trade-off is lock-in to Microsoft’s stack and a steeper learning curve for teams coming from Kubernetes.
Recommendation: Prefer AKS for greenfield, container-first microservices. Choose Service Fabric when you have existing Service Fabric workloads, need stateful .NET services, or require Windows containers for legacy components.
Event-driven communication: Service Bus and Event Grid
Prefer asynchronous messaging between microservices so that availability and latency of one service do not cascade. On Azure, the main options are Azure Service Bus (queues and topics) and Azure Event Grid (event routing).
Service Bus gives you queues (point-to-point) and topics (publish-subscribe with filters). Messages are reliable, ordered (with sessions), and support dead-letter for failed processing. Use Service Bus when work must be processed exactly once or with explicit retries, or when you need sessions (e.g. per-user ordering). Use Managed Identity so no connection strings are in code.
Event Grid is high-throughput, push-based event delivery. It fits fan-out (one event to many subscribers) and Azure resource events (e.g. blob created, resource updated). It is at-least-once and does not replace a queue for ordered, exactly-once work. Use Event Grid when you need event routing at scale or integration with Azure services.
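To make the Service Bus side concrete, here is a consumer sketch using the Azure.Messaging.ServiceBus package with Managed Identity. The namespace, topic, and subscription names, and the HandleOrderCreatedAsync helper, are illustrative; it assumes the identity has the Azure Service Bus Data Receiver role.

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus;

// Connect with Managed Identity — no connection string in code or config.
var client = new ServiceBusClient(
    "mynamespace.servicebus.windows.net",        // hypothetical namespace
    new DefaultAzureCredential());

var processor = client.CreateProcessor("order-events", "billing-subscription",
    new ServiceBusProcessorOptions { MaxConcurrentCalls = 4 });

processor.ProcessMessageAsync += async args =>
{
    // Delivery is at-least-once: this handler must be idempotent.
    await HandleOrderCreatedAsync(args.Message.Body);
    await args.CompleteMessageAsync(args.Message);
};

processor.ProcessErrorAsync += args =>
{
    // After max delivery attempts, the broker dead-letters the message.
    Console.Error.WriteLine(args.Exception);
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();
```

If the handler throws, the message is abandoned and redelivered; once the delivery count is exhausted it lands in the dead-letter queue for inspection, which is exactly the safety net Event Grid does not give you for work queues.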
Health checks and readiness
In AKS, liveness and readiness probes determine whether a pod is kept running and whether it receives traffic. A liveness probe that fails causes the pod to be restarted; a readiness probe that fails removes the pod from the Service’s endpoints so it no longer receives requests (e.g. during startup or when a dependency is down).
Implement a health endpoint in your .NET API that checks dependencies (database, message bus). Use ASP.NET Core Health Checks: separate liveness (minimal: “process is up”) from readiness (dependencies OK). Expose them on different paths (e.g. /health/live and /health/ready) and point Kubernetes probes at them. That way, the orchestrator does not kill the pod when the database is temporarily slow, but it does stop sending traffic until the service is ready.
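The probe wiring described above looks like this in the container spec of a Deployment. A sketch: the paths match the /health/live and /health/ready endpoints from the text, while the port and timing values are illustrative.

```yaml
containers:
  - name: order-api
    image: myregistry.azurecr.io/order-api:1.0.0   # hypothetical image
    livenessProbe:
      httpGet:
        path: /health/live     # minimal: process is up; failure restarts the pod
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /health/ready    # checks DB and Service Bus; failure removes traffic
        port: 8080
      periodSeconds: 10
      failureThreshold: 3      # removed from Service endpoints after 3 failures
```

Note that a slow database only trips the readiness probe, so the pod stops receiving traffic but is not restarted.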
Cost optimization
Main cost drivers for Azure microservices: compute (AKS node pools, Service Fabric VMs), messaging (Service Bus, Event Grid), data (Azure SQL, Cosmos DB), and egress. To keep cost under control:
Right-size node pools: Start with the smallest node SKU that meets your resource requests; use scale-in and scale-out (or cluster autoscaler) for variable load.
Use Managed Identity: Avoid storing connection strings and keys; use Key Vault references and Managed Identity so you do not pay for extra secret management and reduce risk.
Reserved capacity: For baseline load, reserved instances or Savings Plans reduce compute cost.
Tag everything: Tag resources by team, environment, and project so you can attribute cost and set budgets.
Review messaging: Service Bus pricing is per operation and per topic/queue; consolidate or archive old topics. Event Grid is per event; avoid fan-out explosion if cost is a concern.
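Step 1: Split liveness and readiness endpoints

A minimal Program.cs sketch for the split (the OrderService name is illustrative; it uses only ASP.NET Core Health Checks, no external packages yet):

```csharp
// OrderService/Program.cs — minimal liveness/readiness split
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();

var app = builder.Build();

// Liveness: run no checks — returns 200 as long as the process is up.
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

// Readiness: runs all registered checks (none yet; Step 2 adds them).
app.MapHealthChecks("/health/ready");

app.Run();
```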
What this does: /health/live returns 200 with no checks (liveness = “process is up”). /health/ready runs all registered checks (readiness). In AKS, point liveness at /health/live and readiness at /health/ready.
Step 2: Add dependency checks (readiness)
```csharp
// OrderService/Program.cs — add DbContext and Service Bus check
builder.Services.AddDbContext<OrderDbContext>(options =>
    options.UseSqlServer(builder.Configuration.GetConnectionString("Orders")));

builder.Services.AddSingleton<ServiceBusHealthCheck>();

builder.Services.AddHealthChecks()
    .AddDbContextCheck<OrderDbContext>("db")
    .AddCheck<ServiceBusHealthCheck>("servicebus");
```
What this does: Readiness now fails if the database or Service Bus is unreachable, so the pod is removed from the load balancer until dependencies are back.
How this fits together: The API uses IOrderMessagePublisher to publish events; the implementation uses Azure.Messaging.ServiceBus with a topic. Register ServiceBusClient with Managed Identity in production so no connection string is stored. Health check can ping the namespace or send a probe message to confirm connectivity.
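A sketch of that publisher (the IOrderMessagePublisher name comes from the text; the topic name, event shape, and registration details are illustrative):

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus;

public interface IOrderMessagePublisher
{
    Task PublishOrderCreatedAsync(Guid orderId, CancellationToken ct = default);
}

public sealed class ServiceBusOrderMessagePublisher : IOrderMessagePublisher
{
    private readonly ServiceBusSender _sender;

    public ServiceBusOrderMessagePublisher(ServiceBusClient client)
        => _sender = client.CreateSender("order-events");   // hypothetical topic

    public async Task PublishOrderCreatedAsync(Guid orderId, CancellationToken ct = default)
    {
        var message = new ServiceBusMessage(BinaryData.FromObjectAsJson(new { orderId }))
        {
            MessageId = orderId.ToString(),   // stable ID enables duplicate detection
            Subject = "OrderCreated"
        };
        await _sender.SendMessageAsync(message, ct);
    }
}

// Registration sketch (Program.cs): Managed Identity, no connection string.
// builder.Services.AddSingleton(_ => new ServiceBusClient(
//     builder.Configuration["ServiceBus:Namespace"], new DefaultAzureCredential()));
// builder.Services.AddSingleton<IOrderMessagePublisher, ServiceBusOrderMessagePublisher>();
```

Because business logic depends only on the interface, swapping the Service Bus implementation for an Event Grid one later does not touch the API layer.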
Dockerfile and appsettings
Dockerfile: Multi-stage build keeps the image small. Use a non-root user and expose the port your app listens on.
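A multi-stage Dockerfile along those lines. The tags and project name are illustrative; the non-root `app` user is built into recent .NET base images, so verify it exists in the tag you use.

```dockerfile
# Build stage: SDK image restores and publishes the app
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY OrderService.csproj .
RUN dotnet restore
COPY . .
RUN dotnet publish -c Release -o /app/publish

# Runtime stage: small ASP.NET image, non-root user
FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS final
WORKDIR /app
COPY --from=build /app/publish .
ENV ASPNETCORE_URLS=http://+:8080
EXPOSE 8080
USER app                                  # run as non-root
ENTRYPOINT ["dotnet", "OrderService.dll"]
```

The final image contains only the runtime and published output, not the SDK or source, which keeps it small and reduces attack surface.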
appsettings: Keep secrets out of config. Use environment variables or Key Vault references (e.g. App Service Key Vault references, or AKS secrets / external-secrets). Example structure:
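One possible shape for that structure (keys are illustrative; .NET’s JSON configuration provider permits comments). The Service Bus section holds only the namespace, never a connection string, because auth uses Managed Identity:

```json
{
  "ConnectionStrings": {
    "Orders": ""   // injected via environment variable or Key Vault reference
  },
  "ServiceBus": {
    "Namespace": "mynamespace.servicebus.windows.net",
    "OrderTopic": "order-events"
  },
  "Logging": {
    "LogLevel": { "Default": "Information" }
  }
}
```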
In production, replace connection strings with Managed Identity and Key Vault; Namespace can stay in config if you use Managed Identity for auth.
Class structure: how the pieces fit together
An Azure microservices solution typically involves: the API edge, backend services, messaging abstractions, and health/observability. Each service exposes HTTP or gRPC and depends on interfaces for messaging and persistence so you can swap implementations (e.g. Service Bus vs Event Grid) without changing business logic.
Common issues and challenges
Synchronous chains: Calling multiple services over HTTP in a chain increases latency and couples availability. Prefer async messaging for cross-service work.
No health/readiness split: Using a single health endpoint that checks dependencies can cause the orchestrator to restart the pod when the DB is slow. Split liveness (minimal) and readiness (dependencies).
Secrets in config: Connection strings in appsettings or env vars are a security and rotation burden. Use Managed Identity and Key Vault.
Over-partitioning: Too many microservices too early increases operational and network cost. Start with a small number of bounded contexts and split when ownership or scaling justifies it.
Ignoring cost: Unbounded node pools, unused topics, or large Cosmos/SQL tiers add up. Tag, budget, and right-size from day one.
Best practices and pitfalls
Do:
Prefer AKS for greenfield microservices unless you have a strong reason for Service Fabric.
Use Managed Identity for all Azure resource auth (Key Vault, SQL, Service Bus).
Implement liveness and readiness and wire them to K8s probes.
Use async messaging (Service Bus or Event Grid) for cross-service communication.
Tag resources and set budgets and alerts.
Keep ADRs (Architecture Decision Records) for why you chose AKS vs Service Fabric, or Service Bus vs Event Grid.
Do not:
Store connection strings or keys in code or config when Managed Identity is available.
Use a single health check that fails on dependency issues for liveness (or the orchestrator will restart the pod).
Call many services synchronously in a chain; use events or queues.
Split into dozens of services before you have clear ownership and release needs.
Skip monitoring and distributed tracing; use correlation IDs and Azure Monitor / Application Insights.
Summary
AKS is the default for new Azure microservices; Service Fabric fits stateful .NET, actors, or existing SF workloads. Use Service Bus for reliable work between services and Event Grid for fan-out, and implement liveness and readiness with dependency checks. Getting orchestration, messaging, and observability wrong leads to production incidents and rework; designing for failure and tracing from day one keeps systems reliable. Next, map your bounded contexts and deployment targets, then choose AKS or Service Fabric and add health checks and correlation IDs before scaling out.
AKS is the default for new Azure microservices; Service Fabric fits stateful .NET, actors, or existing SF workloads.
Use Service Bus for reliable, ordered work between services; Event Grid for fan-out and Azure events.
Implement liveness (minimal) and readiness (with dependencies) and map them to AKS probes.
Managed Identity and Key Vault keep secrets out of code; tag and budget to control cost.
Structure services with interfaces for messaging and persistence; use Dockerfile multi-stage builds and appsettings without secrets in repo.
Avoid synchronous chains, over-partitioning, and skipping health checks or observability.
Position & Rationale
I favour API-first and contracts (OpenAPI, versioning) so services don’t break each other; resilience (circuit breaker, retry with backoff) so one failing service doesn’t cascade. I use messaging (Service Bus, Event Grid) for async and distributed tracing (e.g. Application Insights, W3C) so we can follow a request across services. I avoid shared databases between services; each service owns its data and exposes an API. I also avoid “we’ll add observability later”—correlation IDs and health checks from day one.
Trade-Offs & Failure Modes
What this sacrifices: Operational complexity (many services, many deploys, many failure modes); you accept eventual consistency and network failures as normal.
Where it degrades: When services are too fine-grained (network hop for every operation) or when nobody owns cross-cutting concerns (auth, tracing, gateway).
How it fails when misapplied: No circuit breaker so one slow dependency takes down the service; or no idempotency so retries duplicate side effects.
Early warning signs: “We don’t know which service is slow”; “our gateway is a single point of failure”; “we have no distributed tracing.”
What Most Guides Miss
Most guides list “best practices” without who owns what. Gateway, messaging, and tracing are often platform concerns; service teams own their API and resilience. If that split is unclear, you get gaps. The other gap: contract testing—services that consume others should have contract tests (e.g. Pact) so breaking changes are caught before deploy. Finally: idempotency—when you retry or replay messages, handlers must be idempotent or you get duplicate orders, double charges, etc.; many guides mention retry but not idempotency.
Decision Framework
If you’re adding a new service → Define API (OpenAPI), version it, add health and readiness; use circuit breaker and retry for outbound calls.
If you’re integrating services → Prefer async (messaging) for fire-and-forget; sync (HTTP) when you need an immediate response; ensure idempotency for retries.
If you have no distributed tracing → Add correlation IDs (W3C trace context) and send them to Application Insights or similar; start with one service and expand.
If the gateway is a bottleneck or single point of failure → Scale it, add health checks, and consider multi-region if needed.
If services share a database → Plan to split; shared DB creates coupling and blocks independent deploy.
One service per bounded context; each owns its data and exposes an API; no shared database.
Resilience: circuit breaker and retry with backoff for outbound calls; design for failure.
Observability: correlation IDs and distributed tracing from day one; health checks for every service.
Contracts and versioning (OpenAPI, URL or header versioning) so consumers don’t break.
Idempotency for message handlers and retried operations so retries don’t duplicate side effects.
Need help designing resilient microservices? I support teams with domain boundaries, service decomposition, and distributed systems architecture.
When I Would Use This Again — and When I Wouldn’t
I would use these practices again when I’m building or operating microservices on Azure and need consistent resilience, observability, and API design. I wouldn’t use them for a monolith—then focus on modular monolith and in-process boundaries first. I also wouldn’t skip resilience or tracing “to ship faster”; production incidents cost more. Alternative: if you’re decomposing a monolith, introduce circuit breaker and tracing for the first extracted service and then apply the same pattern as you split further.
Frequently Asked Questions
When should I use AKS vs Service Fabric?
Use AKS for greenfield, container-first microservices when your services are stateless or state lives in databases/caches. Use Service Fabric when you have existing Service Fabric workloads, need stateful .NET services (reliable collections, actors), or require Windows containers. AKS gives portability and a large ecosystem; Service Fabric gives strong .NET and stateful primitives.
What is the difference between liveness and readiness?
Liveness answers “is the process alive?”—if it fails, the orchestrator restarts the pod. Readiness answers “can this instance take traffic?”—if it fails, the pod is removed from the load balancer. Use a minimal liveness check (or none) and put dependency checks (DB, message bus) in readiness so the pod is not restarted when a dependency is temporarily down.
Should I use synchronous REST or messaging between microservices?
Prefer asynchronous messaging (Service Bus, Event Grid) for cross-service communication so that availability and latency of one service do not cascade. Use synchronous REST only within a single service boundary or at the edge (e.g. API Gateway to one backend for a request-response flow).
How do I secure Service Bus and avoid connection strings?
Use Managed Identity for producers and consumers: enable system- or user-assigned identity on your App Service or AKS pod identity, and grant the identity Azure Service Bus Data Sender/Receiver (or similar) on the namespace. In code, use DefaultAzureCredential or ManagedIdentityCredential; do not put connection strings in config.
What are the main cost drivers for microservices on Azure?
Compute (AKS node pools or Service Fabric VMs), messaging (Service Bus, Event Grid), data (Azure SQL, Cosmos DB), and egress. Right-size node pools, use reserved capacity for baseline load, tag resources, and set budgets. Review messaging usage (per-operation cost) and storage tiers regularly.
How do I implement health checks in .NET for AKS?
Use ASP.NET Core Health Checks: register AddHealthChecks(), add AddDbContextCheck and custom checks for Service Bus or other dependencies. Map liveness to a path with Predicate = _ => false (no checks) and readiness to a path that runs all checks. In Kubernetes, set livenessProbe and readinessProbe to hit those URLs.
When should I use Event Grid vs Service Bus?
Use Event Grid for high-throughput, push-based event delivery and Azure resource events (e.g. blob created). Use Service Bus for reliable, ordered processing with dead-letter and sessions when work must be processed exactly once or with explicit retries. Event Grid is at-least-once and fan-out; Service Bus is for work queues and ordered processing.
How many microservices should I start with?
Start with a small number (e.g. 2–5) and split only when you have clear ownership, independent release needs, or scaling/resilience requirements that justify the cost. Avoid splitting by technical layer; split by bounded context.
What is Managed Identity and why use it for microservices?
Managed Identity lets Azure resources (App Service, AKS pods, Functions) authenticate to other Azure services (Key Vault, SQL, Service Bus) without storing secrets. Use it for all service-to-service auth so that no connection strings or keys are in code or config; rotation is handled by Azure.
How do I run microservices locally?
Run dependencies in Docker (e.g. Azurite for storage, local emulators where available) or use stub implementations. Integration tests can hit a real Service Bus namespace in a dev subscription. For full local stacks, consider Tye or Docker Compose to orchestrate multiple services.
What is the role of correlation ID in microservices?
A correlation ID (or trace ID) is passed in headers across all services involved in a request. It lets you search logs and traces for every log line related to that request, making debugging and observability possible across service boundaries. Use it in middleware and when publishing/consuming messages.
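A minimal middleware sketch for the HTTP side (the header name is illustrative; note that ASP.NET Core also propagates the W3C traceparent automatically via Activity, so treat this as a simplified illustration rather than a replacement for distributed tracing):

```csharp
// Program.cs fragment: reuse an incoming correlation ID or create one,
// attach it to the logging scope, and echo it back to the caller.
app.Use(async (context, next) =>
{
    const string Header = "X-Correlation-ID";   // hypothetical header name
    var correlationId = context.Request.Headers[Header].FirstOrDefault()
                        ?? Guid.NewGuid().ToString();

    context.Response.Headers[Header] = correlationId;

    var logger = context.RequestServices.GetRequiredService<ILogger<Program>>();
    using (logger.BeginScope(new Dictionary<string, object?>
           { ["CorrelationId"] = correlationId }))
    {
        await next();
    }
});
```

When publishing messages, copy the same ID into a message application property so consumers can continue the trace.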
When should I use Service Fabric actors?
Use Service Fabric actors when you have stateful, per-entity logic (e.g. user session, device state, workflow per order) that benefits from colocated state and single-threaded access. If your state is already in a database and you do not need in-memory state or actor semantics, AKS with stateless services is usually simpler.
How do I reduce AKS cost?
Right-size node pools (smallest SKU that meets resource requests), use cluster autoscaler to scale in when idle, reserved instances or Savings Plans for baseline load, and tag everything for attribution. Avoid over-provisioning “just in case”; scale up when metrics justify it.
What is the minimum I need for production microservices on Azure?
Compute (AKS or Service Fabric) with Managed Identity, health checks (liveness + readiness), messaging (Service Bus or Event Grid), storage (Azure SQL or Cosmos) with Key Vault, HTTPS and Azure AD where applicable, Azure Monitor (logs + metrics + alerts), and distributed tracing with correlation IDs. Do not skip monitoring and health.
How do I structure configuration for many services?
Use Azure App Configuration or Key Vault for shared config and secrets; use environment-specific labels or key prefixes. Per service, use environment variables or mounted config maps in AKS. Avoid embedding environment names in code; use feature flags and config for behaviour.
What are ADRs and why use them for microservices?
Architecture Decision Records are short documents that capture a decision, context, and consequences. Use them so that future teams understand why you chose AKS over Service Fabric, or Service Bus over Event Grid, and can revisit when requirements change.
How do I do blue-green or canary on AKS?
Use Kubernetes deployment strategies: multiple deployments with different versions and a Service that you switch for blue-green. For canary, use two deployments with a fraction of replicas on the new version and gradually shift traffic (e.g. with Istio or a custom ingress).
Should I use Windows or Linux containers on AKS?
Use Linux containers unless you have a legacy or vendor requirement for Windows. Linux node pools are the default, have broader image support, and are often cheaper. Use Windows node pools only when necessary (e.g. .NET Framework, Windows-specific APIs).
How do I test microservices locally?
Run dependencies in Docker (Azurite, local emulators) or stubs. Use Tye or Docker Compose to run multiple services. Integration tests can target a dev Service Bus or SQL instance. Keep contract tests (e.g. consumer-driven) so that API changes are caught before deployment.
How do I secure the Service Bus namespace?
Use Managed Identity for producers and consumers. Restrict the namespace to a VNet with private endpoints. Use RBAC (e.g. Azure Service Bus Data Sender/Receiver) so each service has least privilege. Enable TLS and consider customer-managed keys for encryption at rest.
Related Guides & Resources
Explore the matching guide, related services, and more articles.