Despite AWS failure: MQTT platform EMQX reports stable systems, explains architectural principles
When the Amazon cloud region us-east-1 failed on October 20, 2025, numerous online services and IoT applications worldwide were affected. EMQ Technologies (EMQ), operator of the IoT data platform EMQX Cloud, reported only limited impact. The company explained its resilience to the widespread cloud outage in its own blog post.
EMQ was founded in 2017 and has its roots in the open-source project EMQX. The provider develops messaging systems based on the MQTT protocol (Message Queuing Telemetry Transport), which is one of the most important communication standards in the Internet of Things worldwide.
The EMQX platform is designed as a scalable MQTT broker and, according to the vendor, processes millions of concurrent connections and billions of messages per day. In addition to an enterprise version, EMQ also offers a fully managed cloud variant that can run on several major cloud platforms.
According to EMQ: Stable operations despite the AWS outage
According to EMQ, the core messaging service remained largely stable during the disruption in the AWS us-east-1 region. Only a few customers reportedly experienced brief connection issues. The main impact was on integration features that connect to other AWS services such as DynamoDB, Kinesis, or S3.
After AWS systems were restored, the affected integrations recovered automatically, the company said. The core platform, meaning the MQTT communication service itself, remained stable throughout.
Rationale: Architectural principles for resilience
EMQ attributes the stability to several technical principles that, by its own account, have been consistently implemented since the platform’s launch:
1. Limited cloud dependencies:
EMQX Cloud reportedly relies only on foundational services such as EC2 (compute) and NLB (load balancer). This “reduced architecture” lowers dependencies on complex cloud services and thus the risk of cascading failures.
2. Multi-AZ architecture (distribution across multiple availability zones):
Even smaller EMQX clusters are spread across multiple AWS availability zones to absorb local disruptions. This approach increases operating costs, but is intended to mitigate failures of individual data centers.
3. Multi-cloud strategy:
According to the provider, EMQX is built to be cloud-agnostic. Customers can run the platform with an identical configuration on Microsoft Azure or Google Cloud as well. This reduces dependency on a single cloud provider and enables alternative disaster recovery strategies.
“Even for our smallest Dedicated Flex deployments, we insist on distributing EMQX broker nodes across multiple availability zones. Yes, this increases costs due to cross-AZ traffic. Yes, it adds complexity to our deployment automation. But we believe it’s non-negotiable for production IoT workloads.” – Benniu Ji, VP of Product at EMQ
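The multi-AZ principle can be illustrated with a minimal sketch. It assumes a simple round-robin placement of broker nodes across zones; the node names, zone names, and the functions `place_nodes` and `surviving_nodes` are hypothetical and are not taken from EMQ's deployment automation.

```python
from collections import Counter
from itertools import cycle


def place_nodes(node_names, zones):
    """Assign each broker node to an availability zone in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {node: next(zone_cycle) for node in node_names}


def surviving_nodes(placement, failed_zone):
    """Return the nodes that are still available if one zone goes down."""
    return [node for node, zone in placement.items() if zone != failed_zone]


# Hypothetical three-node cluster and zone names, for illustration only.
nodes = ["emqx-0", "emqx-1", "emqx-2"]
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

placement = place_nodes(nodes, zones)
per_zone = Counter(placement.values())  # how many nodes ended up in each zone
remaining = surviving_nodes(placement, "us-east-1a")
```

In this sketch, losing any single zone removes only one of the three nodes, so the rest of the cluster can keep serving connections. The price is the cross-AZ traffic between nodes that the quote mentions.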
Lessons for the IoT sector
From the incident, EMQ draws several conclusions:
- Resilience is not limited to the MQTT broker but covers the entire data pipeline, including databases, storage, and integration points.
- High availability alone is not enough; critical applications require full disaster recovery strategies across regions or clouds.
- Cluster Linking, an EMQX feature, enables active connections between regional clusters. This allows global IoT networks to fail over automatically without keeping idle standby resources.
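How failover between several active regional clusters might look from a client's perspective can be sketched as follows. This is not the Cluster Linking mechanism itself, whose internals the blog post summary does not describe, but a simplified client-side illustration. The hostnames, the function `pick_endpoint`, and the simulated health check are assumptions made for this example.

```python
from typing import Callable, Optional, Sequence


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint, in priority order, that passes the health check."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None  # no regional cluster is reachable


# Hypothetical regional broker endpoints in priority order.
ENDPOINTS = [
    "mqtt.us-east.example.com",
    "mqtt.eu-west.example.com",
    "mqtt.ap-south.example.com",
]

# Simulated outage of the first region; a real check might attempt a
# TCP or MQTT connection with a short timeout instead.
down = {"mqtt.us-east.example.com"}
active = pick_endpoint(ENDPOINTS, lambda host: host not in down)
```

Because every regional cluster is active and serves traffic in normal operation, the fallback target in such a setup is not an idle standby, which is the point EMQ makes about avoiding unused resources.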
Summary (tl;dr)
- The AWS outage (us-east-1) caused global disruptions, including in the IoT sector.
- EMQ reports largely stable EMQX systems during the incident.
- Explanation: deliberate limitation of cloud dependencies, multi-AZ and multi-cloud architecture.
- Context: The blog post also serves as company self-presentation.
- Significance: The incident illustrates the growing importance of robust architectures for IoT systems that depend on cloud services.











