Resilience Proven, Not Promised: Why It’s Important To Have A Platform Built For Real-World Challenges

Financial services infrastructure gets tested in ways you can't always predict. October 2025 provided one of those tests when cloud service disruptions affected platforms across the industry.

During that outage, the real story wasn’t the disruption itself. It was the divide it exposed. Some platforms strained under the sudden load, scrambling to reroute traffic or restore services. Others held steady, absorbing the shock without breaking stride. Moments like this reveal which systems are engineered for convenience and which ones are engineered for continuity.

 

The Hidden Cost of Dependencies

The average cost of downtime for large enterprises in financial services can reach $9,000 per minute. Across the industry, downtime costs companies $152 million annually. But the bigger cost is often invisible until a disruption occurs.

During the October disruption, one significant challenge emerged: provisioning new infrastructure resources. The cloud provider's ability to create new virtual machines on demand was severely impaired. Platforms that constantly needed new servers just to handle normal operations suddenly couldn't scale. Systems that required frequent certificate renewals or mid-transaction authentication checks faced interruptions.

Some platforms didn't need any of that during the disruption. They were already running at the capacity they needed, with appropriate buffers built in. Their databases were operational. Their application servers were running. They didn't require the cloud provider to provision anything new because their architecture doesn't demand constant infrastructure changes to process transactions.

 

Early Decisions That Prove Their Value Later

The resilience that shows up during disruptions isn't built during the crisis. It's the result of architectural decisions made years before, sometimes before a single customer goes live.

At Episode Six, several early decisions shaped how our platform responds when conditions aren't perfect.

Infrastructure as code from day one. Even when Episode Six was very small, running demonstration environments and development portals, we chose the harder path. Instead of manually configuring servers, we invested in writing code to manage infrastructure. That code scaled with us. Today, it means we can manage complex, global deployments consistently, and we're not dependent on being able to manually provision resources during a crisis.
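
As a rough illustration of that idea, and not a depiction of our actual tooling, an infrastructure-as-code setup keeps the desired environment as version-controlled data and lets a reconciliation step work out what, if anything, needs to be provisioned. The resource roles, counts, and the current_state stand-in below are hypothetical.

```python
# Hypothetical sketch of infrastructure as code: the desired environment lives
# in version control as data, and a reconciler computes what must change.
DESIRED_STATE = {
    "app-server": 6,   # illustrative roles and counts, not real deployment figures
    "db-node": 3,
}

def current_state():
    """Stand-in for querying the cloud provider about what is running now."""
    return {"app-server": 6, "db-node": 3}

def reconcile(desired, current):
    """Return the provisioning actions needed to reach the desired state."""
    return {role: count - current.get(role, 0)
            for role, count in desired.items()
            if count > current.get(role, 0)}

if __name__ == "__main__":
    plan = reconcile(DESIRED_STATE, current_state())
    # When running capacity already matches the declared state, the plan is
    # empty and nothing at all is requested from the provider.
    print("provisioning plan:", plan or "nothing to do")
```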

Proprietary framework built for our needs. We wrote Elements, our own framework, instead of relying on an off-the-shelf solution. This created a steep initial climb but gave us complete control over our stack. We're not waiting to see what changes a third party makes that could complicate our operations. When dependencies need updating, we do it on our timeline, not someone else's.

Flexible infrastructure requirements. Our platform is engineered to run efficiently across a wide range of infrastructure configurations. Rather than rigidly demanding one specific type of virtual machine, we’re designed to perform well with multiple options. This architectural flexibility means we can maintain performance even when specific resources are constrained. It's resilience through design, not compromise.
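
A minimal sketch of what that flexibility can look like in practice, with invented instance-type names and a simulated capacity check rather than any real provider API:

```python
# Hypothetical sketch: accept a ranked list of instance types and fall back
# when the preferred one is constrained, instead of hard-requiring a single type.
ACCEPTABLE_TYPES = ["type-a-large", "type-b-large", "type-c-xlarge"]  # invented names

def has_capacity(instance_type: str) -> bool:
    """Stand-in for a provider capacity check; pretend the first type is scarce."""
    return instance_type != "type-a-large"

def choose_instance_type(candidates):
    for candidate in candidates:
        if has_capacity(candidate):
            return candidate
    raise RuntimeError("no acceptable instance type currently available")

if __name__ == "__main__":
    print("provisioning with:", choose_instance_type(ACCEPTABLE_TYPES))
```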

Running at appropriate capacity. We don't operate at the bare minimum and constantly scale up and down. We maintain capacity for our workload plus appropriate buffers. That means we're not constantly asking our cloud provider to provision new resources during normal operations but always retain the option to do so.
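
To make the buffer idea concrete, here is a back-of-the-envelope sizing sketch; every number in it is invented for illustration rather than taken from our platform.

```python
# Hypothetical capacity sizing: provision for peak load plus fixed headroom,
# so asking the provider for new resources is the exception, not the rule.
import math

PEAK_TPS_OBSERVED = 4_000   # peak transactions per second (invented)
BUFFER_RATIO = 0.40         # 40% headroom above peak (invented)
TPS_PER_SERVER = 500        # throughput of one application server (invented)

def servers_needed(peak_tps, buffer, per_server):
    return math.ceil(peak_tps * (1 + buffer) / per_server)

if __name__ == "__main__":
    n = servers_needed(PEAK_TPS_OBSERVED, BUFFER_RATIO, TPS_PER_SERVER)
    # 4,000 TPS * 1.4 = 5,600 TPS target -> 12 servers at 500 TPS each
    print(f"run {n} servers continuously; scale out only beyond that point")
```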

Active-active architecture. Our platform runs simultaneously across multiple cloud availability zones, with multiple environments actively processing real transactions at all times. This isn't a backup waiting to take over. If one zone experiences issues, the others continue handling transactions. No disaster recovery procedures needed. No manual intervention required. The platform keeps processing transactions as if nothing happened.
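
The routing side of an active-active design can be sketched roughly like this; the zone names, health flags, and round-robin logic are simplified placeholders, not our production routing.

```python
# Hypothetical active-active routing: every healthy zone takes live traffic,
# and an unhealthy zone is simply dropped from the rotation -- no failover runbook.
import itertools

ZONES = ["zone-a", "zone-b"]                   # placeholder zone names
HEALTHY = {"zone-a": True, "zone-b": True}     # normally driven by health checks

def route(transactions):
    """Round-robin each transaction across whichever zones are healthy right now."""
    rotation = itertools.cycle([z for z in ZONES if HEALTHY[z]])
    return [(txn, next(rotation)) for txn in transactions]

if __name__ == "__main__":
    print(route(["txn-1", "txn-2", "txn-3"]))
    HEALTHY["zone-b"] = False                  # simulate one zone having issues
    print(route(["txn-4", "txn-5"]))           # traffic continues in zone-a alone
```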

 

How Episode Six Stayed Online

During the outage referenced above, our approach was deliberate and simple. We immediately stopped making any changes to our system. Our databases were already running at the scale we needed. Our application servers were already deployed. We weren't calling out to our cloud provider for every request moving through our system.
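
In spirit, that first step is just a gate on change, something like the sketch below; the flag and the messages are illustrative, not our actual incident tooling.

```python
# Hypothetical change-freeze gate: while an incident is declared, deployments
# and infrastructure changes are refused so the running system stays untouched.
CHANGE_FREEZE_ACTIVE = True   # flipped on when an incident is declared (illustrative)

def apply_change(description: str) -> str:
    if CHANGE_FREEZE_ACTIVE:
        return f"refused (change freeze in effect): {description}"
    return f"applied: {description}"

if __name__ == "__main__":
    print(apply_change("routine application rollout"))
```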

When the cloud provider started having problems, it didn't affect us because we weren't trying to use those services. As soon as we noticed an issue, we closed the blast doors, so to speak. Everything we needed was already inside and operational.

The platforms that struggled were often those with architectures that assumed the cloud provider would always be available to provision resources or authenticate requests in real time. When those capabilities became degraded, those platforms degraded with them.

Our architecture is designed to leverage cloud capabilities dynamically when they're available, but it doesn't require them to process transactions. We can run the platform in a dynamic, scalable way when conditions allow. During the disruption, it ran in a stable, independent way instead.
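
One way to picture that dual mode, as a sketch rather than our real scaling code: try the provider's control plane when it answers, and carry on at existing capacity when it does not. The exception type and the simulated failure are invented for the example.

```python
# Hypothetical degradation path: use dynamic scaling when the provider's control
# plane responds, and keep processing on existing capacity when it doesn't.
class ProviderUnavailable(Exception):
    pass

def request_scale_out(extra_servers: int) -> None:
    """Stand-in for a provider API call; here we simulate the control plane being down."""
    raise ProviderUnavailable("control plane not responding")

def handle_load_spike(extra_servers: int) -> str:
    try:
        request_scale_out(extra_servers)
        return "scaled out dynamically"
    except ProviderUnavailable:
        # Transaction processing does not depend on this call succeeding;
        # the platform simply continues on the capacity already running.
        return "continuing on existing capacity"

if __name__ == "__main__":
    print(handle_load_spike(extra_servers=2))
```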

 

The Foundation That Made It Possible

What enabled this wasn't a disaster recovery plan executed during the crisis. It was architecture built into the platform from the beginning.

We use cloud capabilities extensively: horizontal scaling, API-first interactions, and dynamic resource management. But we don't couple tightly to provider-specific services that could become unavailable. We use a commodity database platform that can run on multiple cloud providers.
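
As an illustration of that loose coupling, and not a description of our actual data layer, core logic can talk to storage through a narrow interface with any commodity SQL engine behind it; the demo below uses SQLite purely because it ships with Python.

```python
# Hypothetical sketch of avoiding provider lock-in: the platform talks to storage
# through a narrow interface, and any commodity SQL engine can sit behind it.
import sqlite3
from abc import ABC, abstractmethod

class LedgerStore(ABC):
    @abstractmethod
    def record(self, txn_id: str, amount_cents: int) -> None: ...
    @abstractmethod
    def balance(self) -> int: ...

class SqlLedgerStore(LedgerStore):
    """Backed by SQLite here for the demo; the same interface could front any
    commodity SQL database running on any cloud provider."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS ledger (txn_id TEXT, amount_cents INTEGER)")

    def record(self, txn_id: str, amount_cents: int) -> None:
        self.conn.execute("INSERT INTO ledger VALUES (?, ?)", (txn_id, amount_cents))

    def balance(self) -> int:
        return self.conn.execute("SELECT COALESCE(SUM(amount_cents), 0) FROM ledger").fetchone()[0]

if __name__ == "__main__":
    store = SqlLedgerStore(sqlite3.connect(":memory:"))
    store.record("txn-1", 2_500)
    store.record("txn-2", -700)
    print("balance in cents:", store.balance())   # 1800
```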

When a transaction comes through our system, we're not making constant external requests to validate, authenticate, or authorize every step. The system is self-contained for its core operations. The platform can be updated, scaled, and modified constantly if needed. But it can also run for extended periods without any changes.
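
A toy version of that self-containment, with an invented signing key and limit rather than anything from our platform: everything needed to validate a transaction is already in local memory, so the hot path makes no network calls.

```python
# Hypothetical self-contained authorization: the material needed to validate a
# transaction (key, limits) is held locally, so the hot path never calls out.
import hashlib
import hmac

LOCAL_SIGNING_KEY = b"loaded-at-startup-not-fetched-per-transaction"  # invented
LOCAL_LIMIT_CENTS = 500_000                                           # invented

def sign(payload: bytes) -> str:
    return hmac.new(LOCAL_SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def authorize(payload: bytes, signature: str, amount_cents: int) -> bool:
    """Validate integrity and limits using only local state."""
    return hmac.compare_digest(sign(payload), signature) and amount_cents <= LOCAL_LIMIT_CENTS

if __name__ == "__main__":
    payload = b'{"txn_id": "txn-1", "amount_cents": 2500}'
    print(authorize(payload, sign(payload), 2500))   # True, with zero network calls
```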

 

Building for Reality

For financial services leaders planning modernization initiatives or evaluating infrastructure partners, these events create clarity. They reveal which platforms were built for real-world conditions and which ones work only when everything is perfect.

When you're processing billions of transactions and serving customers who depend on immediate access to their funds, you can't afford to discover architectural limitations during a disruption. You need infrastructure built from day one to handle the inevitable moments when services stumble.

That's not pessimism. That's preparation.
