I built an APRA-compliant insurance platform in my spare time to prove a point. Then I asked an honest question: could I actually run it in production? The answer revealed something counterintuitive about regulatory burden.
In my previous post, I made the case that the enterprise modernisation playbook is broken. The evidence I offered was UnderwriteAI: a production-grade, APRA-compliant insurance platform I built entirely in my spare time, using GitHub Copilot powered by Claude Sonnet 4.6, across 41 working sessions. Eight microservices, a React portal, an API gateway, real-time Kafka event streaming, 156 automated BDD test scenarios, and a live demo in which an AI agent executes the complete insurance policy lifecycle in eleven natural language commands.
The platform works. The demos are compelling. The test coverage is real.
And then I asked a more uncomfortable question: could I actually run this in production?
The Docker Compose Fiction
The current deployment descriptor for UnderwriteAI is a single docker-compose.yml file. It starts 29 containers on a single machine, hardwires service discovery via a Docker bridge network, and manages persistence through named volumes on the local file system. It works perfectly on my MacBook. It has worked perfectly for 41 sessions of development and demonstration.
It is not a production deployment model.
Docker Compose is a development orchestration tool. It assumes a single host. It has no concept of the machine being unavailable. If the host restarts, you run docker compose up and everything comes back. If a container crashes, the restart: unless-stopped directive brings it back on the same host. If load increases and a service needs more instances, Docker Compose cannot scale to meet it. There is no concept of a rolling deployment. There is no concept of a disruption budget. There is no way to say "this service requires at least one replica to be available at all times."
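For concreteness, here is a minimal sketch of what the Compose model can express (service and image names are illustrative, not the actual UnderwriteAI manifest):

```yaml
# Illustrative docker-compose fragment. The only resilience primitive
# available is the restart policy: there is no replica count, no
# disruption budget, and no notion of a second host.
services:
  policy-service:
    image: underwriteai/policy-service:latest   # assumed image name
    restart: unless-stopped   # restarts on the SAME host, or not at all
    depends_on:
      - policy-db
  policy-db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - policy-data:/var/lib/postgresql/data    # named volume on local disk
volumes:
  policy-data:
```

Everything about availability is implicit in the single host. When that host is gone, so is the platform.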
None of this matters for development. All of it matters for production.
I'm not raising this as a gap in the AI-assisted development story. I'm raising it because the distinction between "working software" and "production software" is consistently underweighted in the industry conversation about what AI-accelerated development can actually deliver. Working software is a necessary condition. It is not a sufficient one.
Resilience Is Not a Feature. It Is a Deployment Architecture.
The regulatory context sharpens this considerably.
APRA's CPS 230, which came into effect on 1 July 2025, sets explicit requirements for operational resilience in Australian regulated entities. It requires demonstrated availability controls: documented tolerance for disruption, tested recovery procedures, and evidence that critical business services can withstand realistic failure scenarios.
An insurance platform running on a single Docker host does not satisfy CPS 230. It cannot, structurally. There is no redundancy. There is no automated failover. There is no mechanism for demonstrating controlled disruption.
The standard artefacts that satisfy CPS 230 requirements in a modern deployment model are Kubernetes-native constructs: Pod Disruption Budgets (defining how many replicas can be unavailable during voluntary disruption), HorizontalPodAutoscalers (scaling replicas in response to load, ensuring capacity under demand), rolling update strategies (allowing new versions to be deployed without service interruption), and liveness and readiness probes (enabling the cluster to remove unhealthy instances from the load pool automatically, without human intervention).
These are not nice-to-have engineering hygiene items. For a regulated insurer, they are the substance of the operational resilience capability that a prudential regulator asks you to demonstrate.
An enterprise programme that defers infrastructure architecture to a later phase is deferring the regulatory capability itself. It cannot be discovered in integration. It has to be designed in.
Twenty-Nine Containers, Three Categories, One Tractable Problem
As always, I want to be specific here, because the move from Docker Compose to Kubernetes is often described at a level of abstraction that makes it sound either trivial ("just deploy the containers differently") or impossibly complex ("you need a dedicated platform team"). Neither characterisation is accurate.
The 29 containers in my stack fall into three categories, and each requires a different approach.
Category 1: Application services (nine containers)
Eight Java microservices and the React frontend. For each of these, the Kubernetes work is mechanical. A Deployment manifest encoding replica count and the resource limits already documented in the project's architecture guide. A Service manifest for internal cluster DNS. A HorizontalPodAutoscaler targeting 70% CPU utilisation with a minimum of one replica and a maximum of three. A PodDisruptionBudget with minAvailable: 1. Liveness and readiness probes wired to the Spring Boot Actuator health endpoints that already exist in every service.
This is templatable. The services share enough structural similarity that their manifests can be generated from a single template with per-service variable substitution. That is what Helm charts are: parameterised Kubernetes manifest templates with environment-specific values files.
In practice, the structural approach is a single deployment.yaml template that iterates over a services: map using a Go template range loop. All eight microservices are declared as entries in values.yaml under a shared key. The template renders one Deployment, one Service, one ConfigMap, and one Secret per entry, and the only per-service inputs are port numbers, database credentials, and the small number of service-specific environment variables (Redis cache config for the premium service, document storage paths for the document service). The alternative of one template file per service produces eight times the maintenance surface area for changes that are structurally identical across all eight.
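A minimal sketch of that range-loop structure (the key names in values.yaml are assumptions for illustration, not the project's actual schema):

```yaml
# templates/deployment.yaml -- one template file; the range loop
# renders one Deployment per entry in the services map.
{{- range $name, $svc := .Values.services }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $name }}
spec:
  replicas: {{ $svc.replicas | default 1 }}
  selector:
    matchLabels:
      app: {{ $name }}
  template:
    metadata:
      labels:
        app: {{ $name }}
    spec:
      containers:
        - name: {{ $name }}
          image: "{{ $svc.image }}:{{ $svc.tag | default "latest" }}"
          ports:
            - containerPort: {{ $svc.port }}
{{- end }}

# values.yaml -- each microservice is one entry in the shared map
# (illustrative entries)
# services:
#   policy-service:
#     image: underwriteai/policy-service
#     port: 8081
#   premium-service:
#     image: underwriteai/premium-service
#     port: 8082
```

Adding a ninth microservice becomes a values.yaml entry, not a new template file.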
Non-sensitive configuration (datasource URLs, Kafka bootstrap addresses, Keycloak JWK endpoints) goes into ConfigMap. Passwords and signing keys go into Kubernetes Secret objects using stringData. The two are mounted into the container together via envFrom. A checksum/config annotation on the Deployment (a SHA-256 hash of the ConfigMap content) ensures that updating a config value triggers a rolling restart automatically, without requiring a manual image rebuild. That is a Helm convention, not a Kubernetes built-in. Kubernetes does not watch ConfigMap content directly. What it watches is the Deployment spec, and when Helm recalculates the hash on the next helm upgrade and writes an updated annotation value, the spec has changed, so Kubernetes sees a new revision and triggers a rolling update. The end result is automatic configuration change propagation; the mechanism is a chart-level pattern built on top of standard Kubernetes rollout behaviour.
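The checksum pattern, as it appears in the Helm documentation's standard form (names are illustrative):

```yaml
# Inside the Deployment's pod template. Re-rendering the ConfigMap
# changes the hash, the Deployment spec therefore changes, and
# Kubernetes performs a rolling update of the pods.
template:
  metadata:
    annotations:
      checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
  spec:
    containers:
      - name: policy-service
        envFrom:
          - configMapRef:
              name: policy-service-config   # non-sensitive values
          - secretRef:
              name: policy-service-secret   # passwords, signing keys
```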
The liveness probe wires to /actuator/health/liveness and the readiness probe to /actuator/health/readiness, the Spring Boot Actuator endpoints that already exist in every service. No additional instrumentation is required.
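The probe wiring itself is a few lines per container; the port and timings below are illustrative and would be tuned per service:

```yaml
# Probes against the Spring Boot Actuator endpoints that already
# exist in every service.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081                  # illustrative container port
  initialDelaySeconds: 60       # allow for Spring Boot startup time
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 30
  periodSeconds: 5
```

The distinction matters operationally: a failed liveness probe restarts the container; a failed readiness probe only removes it from the load pool until it recovers.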
Category 2: Infrastructure services (20 containers)
This is where the real work is. PostgreSQL (ten databases: eight for the application services, plus dedicated instances for Keycloak and Kong, each maintaining its own schema, kept as separate containers to preserve the database-per-service isolation pattern), Redis, Apache Kafka, Zookeeper, Confluent Schema Registry, Keycloak, Kong API Gateway, Mailpit, Prometheus, Grafana, and Swagger UI.
For none of these do you write manifests from scratch. The ecosystem provides well-maintained community Helm charts: Bitnami's postgresql chart, Bitnami's kafka chart, the official Kong chart, the kube-prometheus-stack umbrella chart. The work is configuration: translating the environment variables in the Docker Compose file into the values schema expected by each community chart, ensuring persistent storage is correctly provisioned via PersistentVolumeClaim objects, and preserving the service interconnections (the Kafka bootstrap address, the Schema Registry URL, the Keycloak JWK endpoint) that the application services depend on.
This is the category that consumes most of the effort in any real Kubernetes migration. Configuration surface area is large, the community chart schemas differ from what you'd design yourself, and the failure modes during initial bring-up are obscure. It takes iteration.
The specific friction point in this stack is service discovery. Docker Compose's bridge network uses the service name as a DNS hostname (policy-db, kafka, redis), and every microservice's configuration already hardwires those names, as the host in spring.datasource.url, as spring.kafka.bootstrap-servers, and so on. The default behaviour of the Bitnami community charts is to name Kubernetes services using the Helm release name as a prefix: a release named underwriteai with a PostgreSQL subchart aliased as policy-db would create a service called underwriteai-policy-db, not policy-db. That prefix would break every microservice's database connection configuration without a single changed line of application code.
The solution is fullnameOverride. Every infrastructure dependency in the umbrella chart's dependency declarations includes a fullnameOverride value matching the Docker Compose hostname exactly. The result is Kubernetes service DNS names that are identical to the docker-compose names, which means the application configuration files require zero changes. The umbrella chart for UnderwriteAI declares 14 dependencies: ten aliased bitnami/postgresql instances, bitnami/redis, bitnami/kafka, bitnami/keycloak, and kong/kong. Each has a fullnameOverride.
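The pattern, sketched across two of the 14 dependencies (repository URLs are standard; the version numbers here are placeholders, and real declarations would pin exact versions):

```yaml
# Chart.yaml -- dependency declarations with aliases
dependencies:
  - name: postgresql
    alias: policy-db            # one aliased instance per database
    repository: https://charts.bitnami.com/bitnami
    version: 16.x.x             # placeholder; pin an exact version in practice
  - name: kafka
    repository: https://charts.bitnami.com/bitnami
    version: 32.x.x             # placeholder

# values.yaml -- force DNS names to match the docker-compose hostnames
policy-db:
  fullnameOverride: policy-db   # service DNS is policy-db, not underwriteai-policy-db
kafka:
  fullnameOverride: kafka
```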
Two infrastructure charts offer meaningful topology differences between environments. The Bitnami Kafka chart supports KRaft mode (Kafka's internal Raft consensus mechanism, available from Kafka 3.3), which eliminates the Zookeeper dependency entirely. In the Kubernetes deployment, the chart runs single-node KRaft in development (one pod, no Zookeeper sidecar) and scales the controller pool to three replicas in production. This is a cleaner topology than the docker-compose configuration, which still runs a separate Zookeeper container because the docker-compose image predates the KRaft stabilisation. The Redis chart runs in standalone mode for development and switches to replication with Sentinel enabled in the production values file.
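The topology split lives entirely in the environment values files. A sketch, using the Bitnami charts' documented keys (the exact values in the project's files may differ):

```yaml
# values-dev.yaml -- single-node KRaft Kafka, standalone Redis
kafka:
  controller:
    replicaCount: 1     # one pod, no Zookeeper at all
redis:
  architecture: standalone

# values-prod.yaml -- three-node controller pool, replicated Redis
kafka:
  controller:
    replicaCount: 3     # KRaft quorum across three controllers
redis:
  architecture: replication
  sentinel:
    enabled: true       # automated primary failover
```

The application sees the same bootstrap address in both environments; only the resilience characteristics behind it change.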
A question that arises consistently at this point in the conversation: who operates a Kubernetes cluster? For most organisations deploying a single application of this scale, the answer is that you do not operate the control plane. EKS (AWS), AKS (Azure), and GKE (Google Cloud) provide Kubernetes as a managed service; the control plane is the cloud provider's operational responsibility. What you need is someone who can write and maintain Helm charts, understand the cluster's operational model, and own the deployment pipeline. For an eight-service application, that is one person with a platform engineering or SRE background, not an organisational function. The 'dedicated platform team' threshold is real for organisations running hundreds of services. It is not the right framing for a greenfield deployment of this scale, and treating it as such is how the infrastructure conversation gets indefinitely deferred.
helm dependency update resolves and downloads all 14 dependency charts into a local charts/ directory in a single command. The pull takes roughly 90 seconds on a reasonable connection. The output names the exact chart version pulled for each dependency (bitnami/postgresql:18.5.24, bitnami/kafka:32.4.3, bitnami/keycloak:25.2.0), which is the version-pinned audit trail the regulatory framework expects of dependency management.
Category 3: Secrets (a category of its own)
The Docker Compose file contains roughly 40 plaintext credentials: database passwords, Redis authentication strings, JWT signing keys, Kafka configuration. Every one of these needs to be removed from the manifest layer and replaced with a Kubernetes Secret reference before this stack goes anywhere near a production cluster.
This is not just a security requirement. It is a baseline expectation of any modern infrastructure audit. Credentials hardcoded into deployment files cannot be rotated cleanly, cannot be scoped by environment, and cannot be managed without modifying source-controlled configuration. Kubernetes Secret objects are the minimum viable solution. A full implementation would use a secrets management tool such as HashiCorp Vault with sidecar injection, but that is a subsequent step. The immediate requirement is to remove the plaintext.
The reason secrets warrant their own category is that the problem class is distinct from both application configuration and infrastructure topology. The challenges are operational: how do you rotate a database password without downtime? How do you promote credentials across environments without committing them to source control? How do you produce an audit trail showing which workloads accessed which credentials, and when? These questions have answers (External Secrets Operator pulling from AWS Secrets Manager, HashiCorp Vault with Kubernetes auth, sealed secrets for GitOps workflows), but each introduces operational surface area that needs to be staffed, monitored, and tested. In a regulated domain, the audit trail for secret access is as important as the audit trail for deployment configuration. They are separate records requiring separate toolchains, and conflating them is where the scope of this work expands unexpectedly.
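To make the External Secrets Operator option concrete, a sketch of the resource shape (the store name and Secrets Manager path are assumptions for illustration):

```yaml
# The manifest in source control references the credential by name only;
# the value lives in AWS Secrets Manager and never enters the repository.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: policy-db-credentials
spec:
  refreshInterval: 1h           # rotation is picked up without a redeploy
  secretStoreRef:
    name: aws-secrets-manager   # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: policy-db-secret      # Kubernetes Secret created by the operator
  data:
    - secretKey: POSTGRES_PASSWORD
      remoteRef:
        key: underwriteai/policy-db   # assumed path in Secrets Manager
        property: password
```

What gets committed is the reference; what gets audited is the access log on the external store. That separation is the point.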
This three-layer framing (application services, infrastructure services, secrets) applies to any containerised migration, regardless of stack. The proportions of effort will vary depending on how much of your infrastructure is already cloud-native; the categories will not.
A note on sequencing: the three categories do not need to be resolved in parallel. Start with Category 1 (application services). It is mechanical, and the process of templating eight structurally similar deployments builds the chart familiarity required for the harder infrastructure work. Move to Category 2 (infrastructure services) dependency by dependency rather than attempting a full migration in a single pass. Address secrets management early in the Category 2 phase, not as a final step. Establishing how credentials flow through the system while infrastructure charts are being wired is significantly less disruptive than retrofitting a secrets model after 14 dependency charts already have credentials embedded in their values files.
The Deployment Pipeline and the Compliance Audit Trail Are the Same Thing
There is a non-obvious benefit to this work that I want to name directly.
Helm charts are infrastructure-as-code. They are source-controlled, versioned, and diffable. Every change to the deployment configuration produces a commit. Every deployment can be rolled back with a single command. The entire history of how the platform has been deployed is preserved in the repository.
For a regulated insurer, this is not incidental. The ability to produce an immutable record of what was deployed, when, and with what configuration is a compliance requirement in its own right. Docker Compose running on a development laptop is the opposite of this. A Helm chart in a version-controlled repository with a CI/CD pipeline running helm upgrade is exactly the audit trail the regulatory framework expects.
The operational resilience capability and the auditability capability are not separate concerns. They are the same work, expressed at the infrastructure layer.
The Helm chart is not a deployment pipeline on its own. That distinction matters for the compliance story. A CI/CD pipeline running helm upgrade on a validated merge to main automates deployment execution. A GitOps controller such as ArgoCD or Flux takes this further: the desired cluster state is declared in version control, and the controller continuously reconciles the cluster against it. The compliance value is in the approval gate. A pull request review and merge approval on the values file is the change management record. The deployment cannot proceed without it, and the audit trail is the repository history rather than a separate ITSM ticket. For regulated organisations, this collapses the deployment toolchain and the change management toolchain into a single artefact. That is not a small thing.
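As a sketch of the GitOps shape described above, an ArgoCD Application pointing at the chart (the repository URL is a placeholder, and the values file is assumed to live in the chart directory for this illustration):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: underwriteai
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/underwriteai.git   # placeholder
    targetRevision: main
    path: infrastructure/helm/underwriteai
    helm:
      valueFiles:
        - values-prod.yaml      # assumed location for this sketch
  destination:
    server: https://kubernetes.default.svc
    namespace: underwriteai
  syncPolicy:
    automated:
      prune: true
      selfHeal: true            # continuously reconcile cluster against git
```

Nothing reaches the cluster except through a merge to main. The merge approval is the change record.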
The Production Readiness Question Always Gets Asked. The Variable is When.
I have sat in a great many enterprise architecture reviews over 20 years, and a recurring pattern stands out: the question "how will this run in production?" is asked late, often after significant investment in application design, and the answer frequently requires renegotiating assumptions that were baked in at the beginning.
Container orchestration is a specific example of this. Organisations that built their containerisation strategy on Docker Compose or Docker Swarm (a reasonable early-phase choice) found themselves rearchitecting the operational layer when production requirements became concrete. The application code was fine. The infrastructure model needed to change.
The pattern is repeatable because the organisational incentive is to demonstrate capability quickly. Docker Compose lets you demonstrate working software on a laptop in a review meeting. That is genuinely useful. But the demonstration creates an impression of production-readiness that can persist longer than it should.
I am not immune to this. I made the same choice. UnderwriteAI runs on Docker Compose because it let me ship working software quickly and demonstrate the full platform in a compelling way. That was the right choice for the phase I was in.
The right choice for the next phase is different. I encounter this pattern of working software that is not yet production software consistently in enterprise modernisation engagements, and it is rarely the result of poor engineering. It is the structural consequence of the same incentive: demonstrate capability quickly. The demonstration succeeds; the problem is that the production-readiness it implies tends to outlast the phase it was designed for.
The Counterintuitive Advantage of Regulated Domains
Here is the observation that tends to surprise people when I raise it: in my experience, organisations operating in regulated domains have this conversation earlier and with less organisational resistance than their unregulated counterparts.
The reason is structural. APRA does not ask whether you have thought about resilience. It asks you to demonstrate it, before you operate at scale. CPS 230 requires documented tolerance for disruption, tested recovery procedures, and evidence of availability controls. It is not a checkbox exercise. An auditor will ask to see the Pod Disruption Budgets, the rollback procedures, the incident response runbooks. The regulator has, in effect, mandated that the production infrastructure conversation happen before go-live.
That is an uncomfortable constraint when you first encounter it. It adds work to a phase of the programme that feels like it should be focused on features. But the constraint is doing something useful: it prevents the infrastructure debt from accumulating in the first place.
Compare this to the pattern I have observed consistently in non-regulated organisations. The production readiness conversation gets deferred. The team ships features. The deployment model that worked for the demo becomes the deployment model for production, because changing it would delay the launch. The launch happens. For some period, the single-host deployment holds. Then load increases, or a dependency fails, or a deployment goes wrong and there is no rollback path, and the production infrastructure conversation finally happens. Now it is happening under operational pressure, with real customers affected, in a remediation context rather than a design context. The cost is higher, the options are narrower, and the team is working against the clock.
This cost has been quantified. Google Cloud's DORA programme (Accelerate State of DevOps Report, dora.dev) has tracked software delivery performance across tens of thousands of practitioners for over a decade. A consistent finding: high-performing organisations excel at both speed and stability simultaneously. The assumption that trading production infrastructure maturity for early-phase delivery velocity is a rational choice does not hold up in the data. DORA's 2019 research found that elite performers were more than 23 times more likely to have fully adopted flexible cloud infrastructure than low performers. Their 2023 report found that organisations leveraging flexible infrastructure demonstrate 30% higher organisational performance than those that lift and shift without adopting cloud-native practices. The 2024 report was direct: 'simply migrating to the cloud without adopting its inherent flexibility can be more harmful than staying in a traditional data center' (Accelerate State of DevOps Report 2024, dora.dev). Deferred infrastructure work does not preserve optionality. It compounds a performance deficit.
It is worth naming the governance pattern underneath this. The team that makes the deferral decision is rarely the team that inherits the remediation cost. The engineering team that shipped the demo successfully moved on to the next programme. The operations team, or the team contracted to modernise the platform six months later, inherited the production stability debt.

There is a funding structure that reinforces this split. The delivery programme is capitalised: CAPEX, with a defined budget and a clear end date, typically overseen by an executive sponsor accountable for shipping on time. The team that inherits the platform operates under OPEX, a cost centre under sustained pressure to reduce expenditure year on year. The production stability debt crosses the boundary between those two funding models invisibly. It does not appear in the CAPEX programme's final cost. It appears as operational overhead in a budget that was already too small. This is a governance gap, not an engineering failure. The incentive to demonstrate working software quickly is rational for the team that faces it. The cost falls elsewhere, to someone who was not in the room when the deferral was decided.

I have lived both versions. The regulated path feels slower at the time. In retrospect it is faster, because you do not pay the production stability debt after launch.
The lesson for technology leaders in non-regulated domains is uncomfortable but clear: the regulator is not the reason to build production-grade infrastructure before go-live. The reason is that it is cheaper and less risky to build it before go-live than after. The regulator is simply the external forcing function that makes regulated organisations do what all organisations should be doing anyway.
If your programme does not have a regulator imposing that constraint, consider voluntarily imposing it yourself. Define your production readiness criteria at the start of the programme, and make them specific enough to be binding. 'We will use Kubernetes' is not a criterion. 'Helm charts passing helm lint before the first sprint' is. 'We will manage secrets properly' is not a criterion. 'No credentials in deployment files before the first integration environment' is.
Add observability to that list explicitly. An instrumentation layer is not the same as a monitoring capability, and the difference is only visible under production load. Defined service level objectives, alerting on SLO breach, and enough baseline telemetry to distinguish normal behaviour from abnormal are as much a production readiness criterion as liveness probes. Without them, the first indication of a degraded service is a customer complaint.
Treat container orchestration model, secrets management approach, liveness and disruption budget configuration, observability baselines, and tested rollback procedures as launch-blocking requirements with named exit criteria in the programme charter. The charter is the right place for these precisely because it is agreed before anyone has an incentive to defer them.
The conversation will happen eventually. The only question is whether it happens while you still have the full set of options available, and while the team that defined the architecture is still in the room.
From Blank Directory to 145 Resources: What the Work Actually Involved
I completed this work between writing the first and second drafts of this post, so I can give a precise account rather than an estimate.
The finished umbrella chart renders 145 Kubernetes resources in total: 37 from the custom templates (8 Deployments, 8 Services, 8 ConfigMaps, 8 Secrets, 1 PVC for document storage, 1 frontend Deployment, 1 frontend Service, 1 frontend ConfigMap, 1 Ingress) and 108 from the 14 dependency sub-charts. helm lint reports zero failures. helm template against the development values file completes without errors and produces valid YAML for every resource.
The directory structure, consistent with the project's architecture decision records on file organisation:
infrastructure/helm/
├── underwriteai/ # Umbrella chart
│ ├── Chart.yaml # 14 dependency declarations
│ ├── values.yaml # Default values (development credentials)
│ └── templates/ # 10 files
│ ├── deployment.yaml # Range loop over services map
│ ├── service.yaml
│ ├── configmap.yaml
│ ├── secret.yaml
│ ├── hpa.yaml # HorizontalPodAutoscaler
│ ├── pvc.yaml
│ ├── frontend-deployment.yaml
│ ├── frontend-service.yaml # Service + ConfigMap + Ingress
│ ├── _helpers.tpl
│ └── NOTES.txt
└── values/
├── values-dev.yaml # 1 replica, 2Gi PVCs, Always pull
└── values-prod.yaml # HA replicas, TLS, empty passwords
The entire chart (application templates and all 14 infrastructure dependencies) was produced in a single AI agent session. Total elapsed time from blank directory to passing helm lint: approximately 15 minutes.
Again, I want to be precise about what that means, because it is easy to read "15 minutes" and conclude that the work was simple. It was not. The application template work was mechanical: the range-loop design over a services map is a structural pattern, and the shared Spring Boot probe configuration required no per-service customisation. But the infrastructure configuration work (translating 20 Docker Compose container definitions into correctly wired Helm dependency values, discovering fullnameOverride, resolving the Bitnami chart schemas across 14 dependencies) is exactly the category of work that a senior platform engineer, doing it manually, would have allocated the better part of a day, if not several, to. The fullnameOverride problem alone (understanding why the Bitnami chart was not producing the expected DNS name and finding the correct values key to override it) is the class of problem that documentation does not surface until you encounter it. It appears in the Bitnami chart's values.yaml on line 63, unremarked, between unrelated configuration items.
The AI agent resolved it in minutes. This is the part of the AI-accelerated development story that the industry has not yet fully priced in: the compression is not happening in feature development alone. It is happening in the infrastructure layer that was previously the primary bottleneck to production readiness.
The programme-level implication is sharper than it might initially appear. If your organisation is treating Kubernetes migration as a multi-quarter platform programme requiring specialist hiring, and a principal architect in a competing organisation can produce a validated 145-resource chart in 15 minutes, the competitive gap is not only in feature velocity. It is in infrastructure maturity. Organisations that have internalised AI-assisted development at the infrastructure layer are arriving at production-ready deployment configurations in the time it previously took to write the design document for one. The distance between working software and production software has not disappeared. It has shortened to the point where deferring it is a choice, not a constraint imposed by capability.
The secrets layer remains an outstanding item. The development values files contain the same plaintext credentials used in Docker Compose, which is acceptable for a private development repository. A production deployment requires every credential reference replaced with either a --set flag at deploy time or an External Secrets Operator integration pulling from AWS Secrets Manager or HashiCorp Vault. The production values file is structured to make this transition explicit: every password field is set to an empty string with a # REQUIRED: override via --set comment. The shape of the secrets surface is defined; the management mechanism is deferred.
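The shape of that production values file, sketched with illustrative Bitnami-style keys (the project's actual key paths may differ):

```yaml
# values-prod.yaml -- every credential is an explicit, empty placeholder
policy-db:
  auth:
    password: ""        # REQUIRED: override via --set at deploy time
keycloak:
  auth:
    adminPassword: ""   # REQUIRED: override via --set at deploy time
```

At deploy time the credential is injected out of band, for example helm upgrade underwriteai . -f values/values-prod.yaml --set policy-db.auth.password=$DB_PASSWORD, until an External Secrets integration replaces the --set mechanism entirely.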
The prerequisite for validating this in a real cluster is a local Kubernetes environment. Docker Desktop includes one (Settings → Kubernetes → Enable Kubernetes). That is sufficient for development and cluster-level validation before deploying to a managed service such as EKS, AKS, or GKE.
If You Don’t Have a Regulator, Consider Becoming Your Own
The broader point I want to leave with technology leaders is this.
AI-assisted development has materially shortened the distance between intent and working software. That is real, and the implications for enterprise programme economics are significant, as I argued in the previous post.
But the distance between working software and production software has not shortened by the same factor. Infrastructure architecture, operational resilience design, secrets management, and the regulatory capability layer that sits on top of all of it are still substantial engineering work. AI tooling helps with the mechanical parts. The design judgements are still human.
The risk for organisations that have adopted AI-assisted development without yet internalising this distinction is that they are delivering working software faster than their infrastructure capability can absorb. Demos improve. Release pipelines, operational resilience frameworks, and audit-ready deployment configurations do not automatically improve alongside them.
Product-led modernisation, the position I argued for in the previous post, does not mean "ship features and work out production later." It means the path to production should be short and known from the beginning. Feature velocity and infrastructure maturity need to advance together, or the gap between what you can demonstrate and what you can actually operate at scale will quietly widen.
I’m closing that gap on my own platform now. It is, predictably, the hardest part of the project.
Link to my previous post 👉 "The Enterprise Modernisation Playbook Is Broken. I Know Because I Helped Write It ... "
I’m Tyrell Perera, an Enterprise Solutions Architect and Fractional CTO with 20+ years of experience leading digital transformation in Insurance, Telecommunications, Energy, Retail, and Media across Australia. The gap between working software and production software is the one I see most consistently underestimated in enterprise modernisation programmes, regardless of how well the application development has gone. If you’re leading a programme where the working software story is strong and the production readiness story is not yet written, that is the specific conversation I’m set up for. Find me at tyrell.co or on GitHub.