👋 Hi, I'm Waqas — a Software Architect and Technical Consultant specializing in .NET, Azure, microservices, and API-first system design.
I help companies build reliable, maintainable, and high-performance backend platforms that scale.
Case Study: BAT In-House Application — Microservices at Scale
Enterprise microservices case study: stack, challenges, and outcomes. Real-world lessons.
June 15, 2024 · Waqas Ahmad
Introduction
This guidance applies when you face a similar enterprise-integration problem; it breaks down when your constraints or context differ. I’ve applied these patterns in real projects and refined the takeaways over time.
Unifying 8+ enterprise systems (SAP, Cherwell, Power Apps, SharePoint, analytics, data lake) with real-time sync and enterprise-grade security is a common challenge for large organisations. This case study walks through the BAT in-house microservices platform on Azure: goals, architecture (Service Fabric, API Gateway, multi-database), engineering challenges, and business outcomes (5M+ records/hour, 99.9% uptime, $5.2M+ savings). For architects and tech leads planning similar enterprise integration, the patterns and trade-offs below show what was built, what was solved, and what was measured.
System scale: Enterprise platform integrating 8+ systems (SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, analytics, data lake); 5M+ records/hour, 99.9% uptime target, <200ms response. The approach applies when you have multiple legacy and SaaS systems to unify and need real-time sync, security, and observability at scale.
Team size: Multi-team delivery (platform, integration, data, security); ownership split between Service Fabric microservices, API Gateway, and data pipelines. Works when at least one team owns the integration layer and another owns observability and security.
Time / budget pressure: Fixed timeline and governance; delivery required phased integration and release gates. I’ve applied this pattern under similar enterprise constraints—ARM templates, security certificates, and change boards.
Technical constraints: Azure (Service Fabric, API Gateway, Service Bus, Event Grid, SQL, Cosmos, Redis, Data Lake); .NET; OAuth 2.0, Key Vault, Application Insights. Legacy systems (SAP, Cherwell) had fixed APIs; we built adapters and real-time sync.
Non-goals: This case study does not optimize for minimal cost or for greenfield-only; it optimises for unifying fragmented enterprise systems with security, reliability, and measurable ROI.
What was the project and context?
The BAT in-house application is an enterprise platform that streamlines British American Tobacco’s internal operations. The system integrates 8+ different enterprise systems, providing unified access to HR, IT, analytics, and business intelligence across the organisation. The business needed a unified platform to integrate fragmented systems (SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, analytics platforms, data lake) while maintaining data consistency, real-time synchronization, and enterprise-scale security and reliability. This case study describes the architecture, stack, challenges, and outcomes so that readers can compare with their own context.
BAT in-house at a glance
Aspect
What it was
Scope
Enterprise microservices platform; 8+ enterprise systems integrated (SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, analytics, data lake).
Compute
Azure Service Fabric for microservices orchestration; Azure API Gateway at the edge (ARM templates, security certificates).
Integration
Azure Service Bus and Event Grid for messaging and event-driven processing; real-time sync across 8+ systems.
Data
Azure SQL for transactional data; Cosmos DB for document storage and scale; Redis for caching; Azure Data Lake for analytics.
Security
OAuth 2.0, JWT, Azure Active Directory; Key Vault for secrets; security certificates; RBAC and audit logging.
Observability
Application Insights, Azure Monitor, Log Analytics; correlation and health checks.
British American Tobacco needed a unified platform to integrate their fragmented enterprise systems including SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, and various analytics platforms. The organisation faced data silos, inconsistent user experiences, and the need for real-time business intelligence across multiple departments. They required a solution that could handle enterprise-scale operations while maintaining security, performance, and reliability—with zero data loss and 99.9% reliability across distributed microservices.
The Solution
The BAT Inhouse App leverages Azure Service Fabric microservices architecture to create a unified enterprise platform. The system integrates all existing enterprise systems through Azure API Gateway with ARM templates, security certificates, and advanced routing. Built with .NET Core and featuring multi-database architecture (Azure SQL + Cosmos DB), the platform provides real-time data synchronization and comprehensive analytics capabilities. Azure Notification Hub supports real-time push notifications, email, SMS, and Teams alerts. The CI/CD pipeline (Azure DevOps, ARM templates) ensures zero-downtime deployments and continuous integration of new features, with enterprise-grade authentication and authorization.
Key Features
Enterprise system integration: Seamless integration with SAP Planet 8/9, Cherwell HR/IT, Power Apps, and SharePoint (and related analytics/data lake solutions).
Azure API Gateway: State-of-the-art API Gateway with ARM templates, security certificates, and advanced routing for microservices orchestration.
Microservices architecture: Azure Service Fabric orchestration for independent scaling and reliability; domain-driven design, CQRS, event sourcing, and Saga pattern for distributed transactions.
Multi-database architecture: Azure SQL for transactional data; Cosmos DB for document storage and global distribution; each service owns its data where appropriate.
Real-time analytics: Comprehensive analytics platform with data lake integration for business intelligence; Power BI and machine learning for predictions and optimisation.
Enterprise security: OAuth 2.0, JWT tokens, role-based access control, and comprehensive audit logging; Azure Key Vault and Managed Identity where applicable.
Advanced monitoring: Real-time system monitoring with Application Insights, Azure Monitor, performance metrics, and health checks.
Notification systems: Azure Notification Hub for real-time push notifications, email, SMS, and Teams alerts.
Architecture overview
The architecture is organised in the following layers.
Enterprise Frontend Portal: Modern Angular-based enterprise portal providing BAT employees with unified access to all internal systems, HR services, IT support, and business intelligence. Features include unified dashboard, role-based access, real-time data synchronization, OAuth 2.0 authentication, multi-tenant support, mobile-first design, and PWA capabilities for offline access and push notifications.
Azure API Gateway & Management: Azure API Gateway with ARM templates, security certificates, and advanced routing for microservices orchestration. Includes rate limiting, request transformation, API versioning, intelligent load balancing with health checks and circuit breaker patterns, enterprise security (OAuth 2.0, JWT, Azure AD), and API analytics and monitoring.
Azure Service Fabric Microservices: Scalable microservices orchestrated by Azure Service Fabric handling HR management, IT services, analytics processing, and enterprise system integrations. Includes .NET Core microservices with domain-driven design, CQRS, event sourcing; Saga pattern (choreography and orchestration) for distributed transactions; enterprise system integration (SAP, Cherwell, Power Apps, SharePoint, Teams); business intelligence engine; and multi-channel notification (email, SMS, Teams, push).
Multi-Database Data Infrastructure: Azure SQL for transactional and operational data; Cosmos DB for document storage and global distribution; Redis cache layer; Entity Framework Core for data access; automated backup and recovery with geo-redundant storage and RTO/RPO targets.
Enterprise System Integrations: Real-time data synchronization with 8+ enterprise systems: SAP Planet 8/9 (ERP, employee and business process automation), Cherwell HR/IT (IT service management, HR, onboarding, tickets), Microsoft Power Platform (Power Apps, Power BI, Power Automate), Microsoft SharePoint (document management, collaboration). Custom connectors, data transformation, and error handling for enterprise operations.
Analytics & Data Lake Platform: Azure Data Lake storage for analytics and ML; real-time analytics engine; Power BI integration for dashboards and reporting; machine learning platform for business pattern prediction and operational optimisation.
Security & Compliance Framework: OAuth 2.0 and Azure AD authentication; advanced threat protection (OWASP Top 10, SQL injection/XSS prevention, network security groups); end-to-end encryption (AES-256 at rest, TLS in transit, Key Vault for sensitive data, key rotation); enterprise compliance (GDPR, data residency, consent management, 7-year audit log retention).
Engineering challenges
1. Complex enterprise system integration with Saga pattern
Problem: Integrating 8+ different enterprise systems (SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, Teams) while maintaining data consistency, real-time synchronization, and business process integrity across distributed microservices—with zero data loss and 99.9% reliability.
Solutions:
Saga pattern implementation: Choreography-based and orchestration-based sagas for distributed transaction management; Azure Service Bus for reliable delivery; compensation patterns for rollback.
Enterprise integration framework: Azure API Management with rate limiting, request transformation, circuit breaker patterns; custom connectors for SAP, Cherwell, and Microsoft Power Platform; data transformation pipelines and error handling.
Real-time data synchronization: Azure Event Grid and Service Bus for real-time sync; conflict resolution algorithms, data validation pipelines, and custom sync engine for 99.9% data quality across 8+ systems.
Integration health monitoring: Health checks every 30 seconds; alerting for API failures and data inconsistencies; retry policies with exponential backoff and dead letter queues.
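To make the compensation flow concrete, here is a minimal orchestration-based saga sketch in Python. The step names and in-process error handling are illustrative, not the BAT implementation, which coordinated steps over Azure Service Bus with durable delivery.

```python
class SagaStep:
    """One step of a saga: a forward action plus its compensating action."""
    def __init__(self, action, compensation):
        self.action = action
        self.compensation = compensation

class SagaOrchestrator:
    """Runs steps in order; on failure, compensates completed steps in reverse."""
    def __init__(self, steps):
        self.steps = steps

    def run(self, context):
        completed = []
        for step in self.steps:
            try:
                step.action(context)
                completed.append(step)
            except Exception:
                # Roll back everything that already succeeded, newest first.
                for done in reversed(completed):
                    done.compensation(context)
                return False
        return True
```

In a real deployment each `action` would be a call to another microservice, and the compensation would be an explicit undo operation (e.g. cancelling a provisioned resource), since distributed steps cannot be rolled back transactionally.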
2. Azure API Gateway implementation
Problem: Implementing enterprise-grade Azure API Gateway with ARM templates, security certificates, and advanced routing while ensuring zero-downtime deployments, security compliance, and seamless microservices orchestration for thousands of concurrent users.
Solutions:
Infrastructure as Code (ARM templates): Automated resource provisioning, security certificate management, environment-specific config; blue-green deployment for zero-downtime updates and rollback.
Advanced routing and load balancing: Health checks, circuit breaker patterns, automatic failover; rate limiting (e.g. 1000 requests/minute per system), request/response transformation, API versioning and backward compatibility.
API analytics and performance monitoring: Real-time metrics, performance tracking, usage analytics; request/response logging, error tracking, alerting; 99.5% SLA monitoring.
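The circuit breaker behaviour behind the gateway's automatic failover can be sketched as follows. Thresholds and timings are illustrative; in the actual platform this was expressed as Azure API Management policy rather than application code.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures,
    then half-opens after a cooldown to let one probe call through."""
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open")  # fail fast, no downstream call
            self.opened_at = None   # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # success closes the circuit fully
        return result
```

The point of the pattern is that a struggling backend gets breathing room: callers fail fast instead of piling up requests, and the half-open probe detects recovery without a thundering herd.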
3. Multi-database architecture with data consistency
Problem: Designing multi-database architecture (Azure SQL + Cosmos DB) while maintaining data consistency, ACID where needed, and handling 5M+ records per hour with sub-200ms response times across distributed systems.
Solutions:
Cosmos DB global distribution: Intelligent partitioning, automatic scaling, consistency models; multi-region writes, conflict resolution policies.
Data synchronization and consistency: Azure Data Factory and custom ETL pipelines; eventual consistency where appropriate; data validation and conflict resolution for 99.9% consistency.
Intelligent caching (Redis): 98% cache hit rate; L1/L2 layers, cache-aside and write-through for critical data; reduced database load during peak operations.
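The cache-aside and write-through combination looks roughly like this. The in-memory dict stands in for Redis, and the `store_read`/`store_write` callbacks are hypothetical placeholders for Azure SQL access.

```python
class CacheAside:
    """Cache-aside reads (fall back to the store on a miss) combined with
    write-through updates for critical data."""
    def __init__(self, store_read, store_write):
        self.cache = {}             # stand-in for Redis
        self.store_read = store_read
        self.store_write = store_write
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store_read(key)   # e.g. an Azure SQL query in the real system
        self.cache[key] = value        # populate on miss
        return value

    def put(self, key, value):
        # Write-through: update the store first, then the cache,
        # so a cache eviction never loses the latest value.
        self.store_write(key, value)
        self.cache[key] = value
```

A hit-rate counter like the one above is how a figure such as "98% cache hit rate" gets measured in the first place; exporting it as a metric makes cache regressions visible.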
4. Enterprise analytics and data lake integration
Problem: Integrating analytics platforms and Azure Data Lake for real-time business intelligence, ML workloads, and decision-making while processing 5M+ records per hour with sub-second analytics and predictive insights.
Solutions:
Azure Data Lake Storage: Hierarchical namespace, security integration, data governance; automated ingestion, partitioning, lifecycle management.
Machine learning platform: Azure Machine Learning with custom models for business pattern prediction and optimisation; MLOps pipeline, A/B testing, automated rollback.
Power BI integration: Real-time data connections, automated reports, interactive dashboards; custom data models and business intelligence.
5. Enterprise security and compliance
Problem: Implementing security for a mission-critical system handling sensitive employee data, financial transactions, and business operations while ensuring GDPR compliance, threat protection, and a strong security posture across distributed microservices.
Solutions:
OAuth 2.0 and MFA: Enterprise OAuth 2.0 for thousands of BAT employees; MFA (TOTP, SMS), conditional access (location, device trust), identity management.
End-to-end encryption: AES-256 at rest; TLS 1.3 in transit with certificate pinning; field-level encryption for PII via Key Vault; automatic key rotation (e.g. 90 days).
Enterprise compliance and audit: GDPR, data residency, consent management, automated compliance reporting; 7-year audit log retention.
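One detail worth making concrete is why rotating keys works without breaking live sessions: tokens carry a key identifier, so old and new keys can verify side by side until the old one is retired. The sketch below uses HMAC for brevity; the real platform used OAuth 2.0/JWT with Azure AD and Key Vault, and all names here are illustrative.

```python
import base64
import hashlib
import hmac
import json

def sign_token(payload: dict, key: bytes, key_id: str) -> str:
    """Sign a payload and embed the key id ('kid') used, so the verifier
    knows which key in the rotation set to check against."""
    body = base64.urlsafe_b64encode(json.dumps({"kid": key_id, **payload}).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify_token(token: str, keys: dict) -> dict:
    """Look up the signing key by the token's key id; tokens signed with
    a not-yet-retired old key keep verifying during rotation."""
    body, sig = token.rsplit(".", 1)
    payload = json.loads(base64.urlsafe_b64decode(body.encode()))
    key = keys[payload["kid"]]
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        raise ValueError("bad signature")
    return payload
```

With a 90-day rotation, the `keys` map simply holds the current and previous key until every token signed with the old one has expired.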
Business impact and ROI
The engineering solutions delivered measurable results for BAT:
Financial and operational impact:
Annual cost savings: $5.2M+
Operational efficiency gain: 70%
Manual processing reduction: 75%
ROI on development investment: 340%
Payback period: 4.2 months
Growth and adoption:
Employee productivity growth: +65%
System integration success: 8+ systems unified
Data processing speed: 5M+ records/hour
Business intelligence adoption: 87%
Support ticket reduction: -70%
Performance metrics: 99.9% system uptime; <200ms response time; 5M+ records/hour; 8+ enterprise integrations. The architecture with Azure Service Fabric, Saga pattern, and multi-database design provided enterprise-grade scalability, security, and reliability for BAT’s mission-critical operations.
Outcomes and lessons
Outcomes: 30% reduction in operational costs ($5M+ saved annually); 70% faster data processing (5M+ records/hour); unified 8+ enterprise system integrations with 99.9% data consistency; comprehensive analytics for strategic decision-making. Deployment and governance improved as the pipeline and ARM-based infrastructure became the gate for quality and compliance.
Lessons: Automate governance (tests, scans, ADRs, ARM) so that speed and compliance go together. Invest in observability early (correlation IDs, traces, runbooks). Idempotency and Saga are critical for distributed integration. Runbooks and post-mortems sustain reliability and institutional knowledge. API Gateway and ARM templates centralise routing and security and enable zero-downtime deployments.
Best practices and takeaways
API contract and versioning: Define and version APIs; use contract tests in CI so that 8+ system integrations remain compatible.
Saga and compensation: Use Saga pattern (choreography or orchestration) for distributed transactions across microservices and integrated systems; design compensation for rollback.
Secrets in Key Vault; auth via Managed Identity/OAuth: Never store credentials in code or config; use Key Vault and Azure AD so that rotation and audit are centralised.
Observability as a product: Treat logs, metrics, and traces as first-class; Application Insights and Azure Monitor with correlation IDs and runbooks so that on-call is effective.
Governance through automation: Security scans, tests, and quality gates in the pipeline; ARM templates for repeatable infrastructure; evidence-based approvals.
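The correlation-ID practice from the observability takeaway can be sketched as follows. `start_request` and the list-based sink are hypothetical stand-ins for HTTP middleware and Application Insights; the point is that every log line in a request's path carries the same id, so traces can be joined across services.

```python
import contextvars
import uuid

# A context variable carries the correlation id implicitly through the
# request's call path, without threading it through every function signature.
correlation_id = contextvars.ContextVar("correlation_id", default=None)

def start_request(incoming_id=None):
    """Adopt the caller's correlation id if one arrived (e.g. via an HTTP
    header), otherwise mint a fresh one at the edge."""
    cid = incoming_id or str(uuid.uuid4())
    correlation_id.set(cid)
    return cid

def log(message, sink):
    """Every structured log record is stamped with the current correlation id."""
    sink.append({"correlation_id": correlation_id.get(), "message": message})
```

Outbound calls to other services would forward the id in a header, so the same value appears in every service's logs for one logical request.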
Summary
The BAT in-house platform unifies 8+ enterprise systems with Azure Service Fabric, API Gateway, and real-time sync; the main lesson is that enterprise integration at scale needs clear bounded contexts, Saga-style consistency, and observability from day one. Getting integration boundaries or security wrong would have undermined trust and ROI; the patterns here are reusable for similar unification projects. Next, if you are unifying multiple legacy or SaaS systems, map your integration points and consistency requirements, then apply the same principles: API Gateway, multi-database strategy, and zero-downtime releases.
The BAT in-house application is an enterprise microservices platform on Azure: Azure Service Fabric, Azure API Gateway (ARM, security certificates), 8+ system integrations (SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, analytics, data lake), Azure SQL + Cosmos DB + Redis + Data Lake, OAuth 2.0/JWT/Azure AD, Application Insights/Azure Monitor.
Challenge: Unifying fragmented enterprise systems with data silos and inconsistent UX; need for real-time BI and enterprise-scale security and reliability.
Solution: Azure Service Fabric microservices, Azure API Gateway, multi-database architecture, real-time sync, Saga pattern, enterprise security, and CI/CD with ARM and Azure DevOps.
Engineering challenges addressed: Complex integration (Saga, custom connectors, real-time sync, health monitoring); API Gateway (ARM, security, routing, monitoring); multi-database consistency (SQL, Cosmos, caching, ETL); analytics and data lake (Data Lake, ML, Power BI, Stream Analytics); security and compliance (OAuth 2.0, MFA, encryption, GDPR, audit).
Business impact: $5.2M+ annual savings, 70% operational efficiency, 75% manual reduction, 340% ROI, 4.2 months payback; 65% productivity growth, 8+ systems unified, 5M+ records/hour, 87% BI adoption, -70% support tickets; 99.9% uptime, <200ms response.
Position & Rationale
I’d repeat the Azure Service Fabric + API Gateway + multi-database pattern when unifying 6+ enterprise systems with real-time sync and strict security. I’d keep ARM templates and security certificates at the gateway and centralised observability (Application Insights, health checks) from day one. I’d avoid big-bang integration: we phased systems (SAP, Cherwell, Power Apps, etc.) and used Saga and event-driven patterns so failures didn’t cascade. I wouldn’t repeat 8+ systems in one go without a clear integration order and ownership; we learned that the hard part is who owns each connector and how to handle partial failure. I also wouldn’t skip the release summary and risk register for change boards—enterprise governance expects evidence.
Trade-Offs & Failure Modes
What this sacrificed: Some agility—we had fixed release windows and change-board approval; we also accepted the cost of Service Fabric and multi-database (SQL + Cosmos + Redis + Data Lake) instead of a single store.
Where it degrades: When legacy systems change APIs without notice or when ownership of connectors is unclear; then sync breaks and blame bounces. It also degrades when observability is added late—we had centralised logging and health from the start, which saved us in production.
How it fails when misapplied: Using Service Fabric for a small app with 2–3 services; or integrating 8 systems without a clear order and failure handling (Saga, retry, circuit breaker). Another failure: skipping security and cert rotation at the gateway.
Early warning signs: “We’re integrating system X but nobody owns the adapter”; “our sync is eventually consistent but we never defined eventual”; “the change board asked for a rollback plan and we didn’t have one.”
What Most Guides Miss
Most case studies list the stack and outcomes. The hard part is integration order and failure handling: we didn’t plug in 8 systems at once—we had a sequence (e.g. SAP first, then Cherwell, then Power Apps) and each had adapters, retry, and health checks. The other gap: who owns what. Platform owned API Gateway and Service Fabric; product teams owned their microservices; a dedicated integration team owned connectors and sync. Without that split, we’d have had finger-pointing. Finally: real-time sync across 8 systems is not “eventually consistent” by default—we had to define what “real-time” meant (e.g. <5 min for non-critical, <30 s for critical) and design for partial failure (e.g. one system down shouldn’t take down the rest).
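Defining "real-time" per system can be as simple as a freshness check against per-system SLAs, which also makes partial failure visible: one stale system shows up in monitoring without affecting the others. The system names and thresholds below are illustrative, not the production values.

```python
# Per-system freshness SLAs in seconds: tight bounds for critical systems,
# looser for non-critical ones. Values are illustrative.
SYNC_SLA = {"sap": 30, "cherwell": 300, "powerapps": 300}

def stale_systems(last_synced: dict, now: float) -> list:
    """Return the systems whose last successful sync is older than their SLA.
    A system missing from last_synced (never synced) is always stale."""
    return sorted(
        name for name, sla in SYNC_SLA.items()
        if now - last_synced.get(name, 0.0) > sla
    )
```

Wiring a check like this into alerting turns the vague promise of "real-time sync" into a measurable, per-system contract.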
Decision Framework
If you’re unifying 6+ enterprise systems → Use a gateway (e.g. Azure API Gateway) for auth and routing; use Service Fabric or similar for microservices; use Saga/event-driven for cross-system flows; define integration order and ownership.
If legacy systems have fixed or brittle APIs → Build adapters with retry, circuit breaker, and health checks; don’t assume sync is instant—define SLA (e.g. <5 min) and document partial-failure behaviour.
If you have a change board → Prepare release summary, rollback plan, and pipeline evidence; align release cadence with board schedule.
If observability is an afterthought → Don’t; add Application Insights (or equivalent), health checks, and correlation IDs from day one.
If ownership of connectors is unclear → Assign before integration starts; otherwise failures and changes become nobody’s problem.
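The adapter-with-retry advice above can be sketched as a small helper with exponential backoff. Attempt counts and delays are illustrative; in the real platform this behaviour sat behind Service Bus retry policies, with exhausted messages routed to dead-letter queues rather than raised to the caller.

```python
import time

def call_with_retry(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky legacy-API call with exponential backoff
    (base_delay, 2x base_delay, 4x base_delay, ...). On final failure the
    exception propagates so the caller can dead-letter the payload."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # back off before the next try
```

Only retry operations that are idempotent, or make them so with request ids; otherwise a retry after a timed-out-but-applied call duplicates the side effect.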
Key Takeaways
Unifying 8+ enterprise systems requires a clear integration order, adapters with retry and health, and defined ownership per connector.
Azure Service Fabric + API Gateway + multi-database (SQL, Cosmos, Redis, Data Lake) fit this scale; keep security (OAuth, certs, Key Vault) and observability (Application Insights) at the centre.
Use Saga and event-driven patterns for cross-system flows so partial failure doesn’t cascade; define “real-time” (e.g. <5 min) and document partial-failure behaviour.
Don’t add observability late—correlation IDs, health checks, and centralised logging from day one save you in production.
When I Would Use This Again — and When I Wouldn’t
I would use this approach again when I’m leading or advising on an enterprise platform that must unify 6+ systems (ERP, HR, analytics, etc.) with real-time sync, security, and governance. I wouldn’t use it for a single-product app or a startup with 1–2 systems—then a simpler integration (e.g. one API gateway, one or two backends) is enough. I also wouldn’t use it when the organisation can’t commit to ownership of connectors and observability; without that, integration becomes a bottleneck. Alternative: if you’re only integrating 2–3 systems, consider a single BFF or gateway plus event-driven sync instead of full Service Fabric; scale the pattern as you add systems.
Frequently Asked Questions
What was the tech stack?
Azure Service Fabric for orchestration; .NET Core for services; Azure SQL and Cosmos DB for data; Redis and Azure Data Lake; Azure Service Bus and Event Grid; Azure API Gateway (ARM, security certificates) at the edge; OAuth 2.0, JWT, Azure AD; Key Vault; Application Insights, Azure Monitor, Log Analytics. CI/CD with Azure DevOps and ARM templates. Integrations: SAP Planet 8/9, Cherwell HR/IT, Power Apps, SharePoint, Teams.
How many systems were integrated?
8+ enterprise systems: SAP Planet 8/9, Cherwell HR/IT, Power Apps, Microsoft SharePoint, analytics platforms, and data lake solutions. Each integrated with custom connectors, data transformation, and health monitoring.
Why Azure Service Fabric and not AKS?
The project used Azure Service Fabric for microservices orchestration, with Azure API Gateway at the edge. Service Fabric was chosen for enterprise-scale reliability, stateful services where needed, and alignment with existing Azure investments. API Management/Gateway provided routing, throttling, and policy.
What was deployment frequency?
Pipeline and ARM templates enabled repeatable deployments; blue-green strategy for zero-downtime updates. Deployment frequency improved as governance (security scans, tests, ARM) was automated and trusted.
How were events and integration handled?
Azure Service Bus and Event Grid for messaging and event-driven processing. Real-time data synchronization across 8+ systems with conflict resolution, data validation, and custom sync engine. Saga pattern for distributed transactions; retry and dead letter queues for reliability.
What was the biggest challenge?
Complex enterprise system integration: maintaining data consistency, real-time sync, and business process integrity across 8+ heterogeneous systems and distributed microservices. Addressed with Saga pattern, Azure API Management, custom connectors, real-time sync engine, and integration health monitoring.
How did reliability and performance improve?
99.9% uptime; < 200ms response time; 5M+ records/hour. Achieved through Azure Service Fabric orchestration, multi-database optimisation (SQL, Cosmos, Redis), API Gateway load balancing and circuit breakers, observability (Application Insights, Azure Monitor), and runbooks and post-mortems.
What was the observability approach?
Application Insights for logs, metrics, and distributed tracing; Azure Monitor and Log Analytics; correlation IDs and health checks across services and integrations; alerting for API failures, data inconsistencies, and performance degradation.
How were secrets managed?
Azure Key Vault for secrets; OAuth 2.0 and Azure AD for identity; security certificates for API Gateway and TLS. No credentials in code or config in source control; Managed Identity where applicable.
What was the database strategy?
Azure SQL for transactional, strongly consistent workloads; Cosmos DB for document storage and global distribution; Redis for caching (98% hit rate); Azure Data Lake for analytics. Each service owned its data where appropriate; ETL and data validation for consistency across 8+ systems.
What was the role of API Gateway?
Azure API Gateway at the edge for routing, throttling, security (OAuth 2.0, JWT, certificates), and policy. ARM templates for Infrastructure as Code; blue-green deployments; rate limiting (e.g. 1000 req/min per system), request transformation, API versioning; 99.5% SLA monitoring.
How were microservices tested?
Unit tests per service; integration tests with mocked dependencies or test containers; contract tests for API compatibility across 8+ integrations. Security and dependency scans in CI; ARM templates validated in pipeline.
What was the CI/CD pipeline?
Azure DevOps: build, test, security scan, deploy using ARM templates and Azure Resource Manager. Blue-green or zero-downtime deployments; pipeline as the gate for quality and governance.
Was there cost optimisation?
$5.2M+ annual cost savings; 70% operational efficiency gain; 75% manual processing reduction. Right-sized resources, auto-scaling, reserved instances where appropriate; monitoring for waste across services and integrations.
How was the team structured?
Cross-functional teams owning services and integrations end-to-end. DevOps culture: build and run ownership, runbooks, on-call. ADRs and ARM for architectural and infrastructure decisions.
What lessons were learned?
Automate governance (tests, scans, ADRs, ARM) so that speed and compliance go together. Invest in observability early (correlation IDs, traces, runbooks). Saga and idempotency are critical for distributed integration. API Gateway and ARM centralise routing, security, and infrastructure and enable zero-downtime releases. Runbooks and post-mortems sustain reliability.