High Availability and Enterprise Reliability

Redefine Storage for High Availability (HA)

Why High Availability is Important for Business

Today, network services rely on the Internet, and even a brief outage can cause heavy losses to the business. Outages can lead to lost revenue, disrupted business operations, increased security and fraud risks, and inaccessible data. In such a disaster, the company image may be damaged and customer trust may be hard to restore. Designing and running a highly available system is therefore the key to avoiding downtime.

What is High Availability?

Availability is the percentage of total time during which a computer system is accessible and working normally. The optimum would be 100%, but that is very hard to achieve. HA (High Availability) systems are those with availability in the 99.9% to 99.999% range. The usual HA target is 99.999% (“five nines”), which tolerates only about five minutes of downtime per year.

Availability %             Downtime per Year   Downtime per Month   Downtime per Week
90% (“one nine”)           36.5 days           72 hours             16.8 hours
99% (“two nines”)          3.65 days           7.20 hours           1.68 hours
99.9% (“three nines”)      8.76 hours          43.2 minutes         10.1 minutes
99.99% (“four nines”)      52.56 minutes       4.32 minutes         1.01 minutes
99.999% (“five nines”)     5.26 minutes        25.9 seconds         6.05 seconds
99.9999% (“six nines”)     31.5 seconds        2.59 seconds         0.605 seconds
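
The downtime figures follow directly from the availability percentage. The Python sketch below shows the arithmetic, assuming a 365-day year, a 30-day month, and a 7-day week; other month conventions shift the monthly figures slightly. The function names are illustrative and do not come from any product.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Period lengths in days. The 30-day month is an assumption of this sketch.
PERIOD_DAYS = {"year": 365, "month": 30, "week": 7}


def downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the downtime allowed in one period at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * PERIOD_DAYS[period] * SECONDS_PER_DAY


def format_duration(seconds: float) -> str:
    """Render a duration in the largest unit whose value is at least 1."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds:.3g} seconds"


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        cells = [format_duration(downtime_seconds(pct, p)) for p in PERIOD_DAYS]
        print(f"{pct}%", *cells, sep=" | ")
```

For example, `downtime_seconds(99.999, "year")` is about 315 seconds, which is the "approximately five minutes" quoted for five nines.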

HA can be improved through fault tolerance. In a fault-tolerant hardware and software architecture, the parts of the system work independently of each other, so the failure of any single component does not crash the entire system.

Understanding RPO and RTO

RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are the two most important parameters in a disaster recovery or data protection plan. These goals can guide companies in choosing the best data backup plan.

RTO is the length of time an application can be down without causing significant harm to the business. Some high-priority applications can be down for only a few seconds before customers become frustrated and business is lost. For mission-critical applications, the shorter the RTO, the better.

RPO is the maximum amount of data loss that is acceptable. It is usually expressed as the time that can elapse between the last data backup and the disaster without causing serious business loss. Mission-critical applications typically require an RPO of zero, meaning no data loss at all.
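
As a rough illustration of how the two objectives are checked, the sketch below compares a protection plan against its RTO and RPO. It assumes periodic backups, where the worst-case loss equals one full backup interval, and models synchronous protection as an interval of zero. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # longest tolerable outage
    rpo: timedelta  # largest tolerable window of lost data (zero = no loss)


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """A failure just before the next backup loses one full interval of data."""
    return backup_interval


def meets_objectives(
    objectives: RecoveryObjectives,
    backup_interval: timedelta,
    measured_recovery_time: timedelta,
) -> bool:
    """Return True when both the RPO and the RTO are satisfied."""
    loss_ok = worst_case_data_loss(backup_interval) <= objectives.rpo
    time_ok = measured_recovery_time <= objectives.rto
    return loss_ok and time_ok


if __name__ == "__main__":
    mission_critical = RecoveryObjectives(rto=timedelta(seconds=30), rpo=timedelta(0))
    # Hourly backups cannot satisfy RPO = 0, however fast the recovery is.
    print(meets_objectives(mission_critical, timedelta(hours=1), timedelta(seconds=10)))
    # An interval of zero stands in for synchronous protection.
    print(meets_objectives(mission_critical, timedelta(0), timedelta(seconds=10)))
```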

Requirements for HA Storage

The requirements for HA storage are listed below according to three parameters: availability percentage, RTO (Recovery Time Objective), and RPO (Recovery Point Objective).

HA Storage Type                      Near HA              Native HA                True HA
Availability % (Downtime per Year)   99.9% (8.76 hours)   99.999% (5.26 minutes)   99.9999% (31.5 seconds)
RTO (Recovery Time Objective)        < 5 minutes          < 30 seconds             < 30 seconds
RPO (Recovery Point Objective)       ≠ 0                  = 0                      = 0
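
The table can be read as a set of thresholds. The sketch below classifies a system into one of the three tiers, assuming that each availability figure is a minimum and that "RPO = 0" means no data loss is possible. The function name and the "Not HA" fallback are assumptions made for illustration.

```python
def classify_ha_tier(
    availability_percent: float,
    rto_seconds: float,
    data_loss_possible: bool,
) -> str:
    """Map measured figures onto the Near HA / Native HA / True HA tiers."""
    # Native HA and True HA both need RPO = 0 and an RTO under 30 seconds.
    if not data_loss_possible and rto_seconds < 30:
        if availability_percent >= 99.9999:
            return "True HA"
        if availability_percent >= 99.999:
            return "Native HA"
    # Near HA tolerates some data loss and an RTO under 5 minutes.
    if availability_percent >= 99.9 and rto_seconds < 300:
        return "Near HA"
    return "Not HA"


if __name__ == "__main__":
    print(classify_ha_tier(99.999, 20, data_loss_possible=False))
    print(classify_ha_tier(99.9, 120, data_loss_possible=True))
```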

 

HA storage is a storage system that runs continuously or provides at least 99.9% uptime. Redundancy is a key feature of HA storage because it eliminates single points of failure (SPOF). An HA storage array requires at least two controllers so that one can take over if the other fails or is lost. Other basic requirements are fault-tolerant, redundant modular components such as power supply units (PSUs), fan modules, and dual-port disk drive interfaces. Firmware updates with zero system downtime keep the storage active during maintenance.
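
The effect of redundancy on availability can be estimated with the standard series and parallel formulas. The sketch below assumes that components fail independently and that takeover is instantaneous, which real arrays only approximate. The availability figures in the example are made up for illustration.

```python
import math


def series_availability(availabilities: list[float]) -> float:
    """All components are required: any one of them is a single point of failure."""
    return math.prod(availabilities)


def redundant_availability(component_availability: float, copies: int) -> float:
    """Service survives while at least one of the redundant copies is working."""
    if not 0 <= component_availability <= 1:
        raise ValueError("availability must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1 - (1 - component_availability) ** copies


if __name__ == "__main__":
    single_controller = 0.999  # hypothetical figure for one controller
    dual_controller = redundant_availability(single_controller, 2)
    print(f"one controller:  {single_controller:.6f}")
    print(f"two controllers: {dual_controller:.6f}")
```

Under these assumptions, two controllers that are each available 99.9% of the time are jointly unavailable only when both are down at once, a fraction of 0.001 × 0.001.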

For disaster recovery, HA storage requires a redundant storage system that takes over the critical data and applications the business needs when the primary system goes offline. This takeover is called failover. With failover, tasks are automatically rerouted to the secondary system during planned or unplanned outages.
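
A minimal sketch of the failover idea follows, assuming a simple heartbeat check in which the secondary takes over after a fixed number of consecutive missed heartbeats. The class names and the threshold of three are hypothetical; real storage systems use their own detection and takeover mechanisms.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    name: str
    healthy: bool = True


class FailoverPair:
    """Tracks which of two nodes is currently serving I/O."""

    def __init__(self, primary: StorageNode, secondary: StorageNode,
                 max_missed_heartbeats: int = 3) -> None:
        self.active = primary
        self.standby = secondary
        self.max_missed = max_missed_heartbeats
        self.missed = 0

    def heartbeat(self) -> StorageNode:
        """Run one health check and return the node that should serve I/O."""
        if self.active.healthy:
            self.missed = 0
        else:
            self.missed += 1
            # Fail over only after repeated misses, and only to a healthy standby.
            if self.missed >= self.max_missed and self.standby.healthy:
                self.active, self.standby = self.standby, self.active
                self.missed = 0
        return self.active


if __name__ == "__main__":
    node_a, node_b = StorageNode("A"), StorageNode("B")
    pair = FailoverPair(node_a, node_b)
    node_a.healthy = False  # simulate an unplanned outage of the primary
    for _ in range(3):
        serving = pair.heartbeat()
    print(f"I/O is now served by node {serving.name}")
```

Requiring several missed heartbeats trades a slightly longer RTO for protection against failing over on a single transient glitch.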

Users can build HA services to match their applications. Higher availability percentages can be reached with more complete protection mechanisms, but they cost more because they demand more careful design.

Take regular data backup as an example. It may require 99.9% uptime, and an RTO of 5 minutes is acceptable. If data is lost, resending it is also acceptable.

Mission-critical services, such as enterprise email or large-scale surveillance, require 99.999% uptime and cannot tolerate data loss. If the downtime is too long, the host may exhaust its retries, fail, and begin dropping I/O packets. Important purchase order emails may then be lost, or images of critical moments may not be recorded.

For an online nonstop service, the conditions are stricter. An AFA (all-flash array) with RAID EE protection and a C2F (Cache-to-Flash) mechanism is suitable for demanding computing workloads and uninterrupted service.

HA Storage Comparison

Based on these three indicators of HA storage, let’s compare dual-controller storage with a 2-node storage cluster.

Dual Controller Storage vs. 2-Node Storage Cluster

HA Storage Type                      Dual Controller Storage           2-Node Storage Cluster
Availability % (Downtime per Year)   At least 99.999% (5.26 minutes)   99.9% (8.76 hours)
RTO (Recovery Time Objective)        < 30 seconds                      > 1 minute
RPO (Recovery Point Objective)       = 0                               ≠ 0

Dual-controller (active-active) storage features at least 99.999% availability, an RTO under 30 seconds, and an RPO of zero (no data loss). By contrast, a 2-node storage cluster with an active-passive architecture cannot reach RPO = 0 because it lacks C2F, and its RTO may exceed 1 minute. Its overall availability may therefore be only 99.9%.
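
One way to relate RTO to the availability figure is to treat each failover as an outage lasting up to the RTO and count how many such outages fit into the yearly downtime budget. The sketch below does that under a deliberately simplified model that ignores every other source of downtime; the function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_downtime_budget(availability_percent: float) -> float:
    """Seconds of downtime per year allowed at the given availability."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def failovers_within_budget(availability_percent: float, rto_seconds: float) -> int:
    """Number of worst-case failovers per year that still meet the target."""
    if rto_seconds <= 0:
        raise ValueError("RTO must be positive")
    return math.floor(yearly_downtime_budget(availability_percent) / rto_seconds)


if __name__ == "__main__":
    # A 30-second failover against a five-nines budget of about 315 seconds.
    print(failovers_within_budget(99.999, 30))
    # A 60-second failover against the same five-nines budget.
    print(failovers_within_budget(99.999, 60))
```

The longer each failover takes, the fewer incidents a system can absorb before it drops below its availability target.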

The active-active controller architecture provides real-time storage services from both controllers in parallel. It doubles the available host bandwidth and cache-hit rate, so no resources in the system sit idle. In addition, an all-in-one dual-controller system with dual-port SAS HDDs is cost-effective and easy to deploy compared with a two-node storage cluster.

Both architectures claim to be HA storage. Which would you choose?

Conclusion

Keeping critical applications online lets you keep doing business without losing revenue. A quality HA design builds customer trust by always being online and available. To judge whether a storage system is truly HA, check whether it meets the availability percentage, RTO, and RPO requirements.