Recovery point objective (RPO) and recovery time objective (RTO) are two key metrics you need to understand as part of an effective, comprehensive backup strategy.
This article explains what these metrics are, how they are used in business and what you need to do to set the appropriate targets for your backup SLAs.
To find out how RPO and RTO fit into the PennComp 6-step backup strategy, please contact us today.
RPO and RTO: Understanding the Difference
The two metrics may only differ in terms of one word but they measure different things.
In a nutshell, RPO looks backwards and sets a goal for how much data the business can accept losing. RTO looks forward and sets a goal for how long it may take to bring a system back online.
The longer definitions, as set out in the ISO 22301 standard, are:
RTO is the ‘predetermined time at which a product, service, or activity must be resumed, or resources must be recovered.’
RPO is the ‘maximum data loss, i.e. minimum amount of data used by an activity that needs to be restored.’
Why is the RPO also recorded as a time value? Because all backup processes run on a clock, whether that clock operates in milliseconds or hours, so the data at risk is whatever has changed since the last backup completed.
Of course, the ideal RTO and RPO values would be zero, indicating no data loss and no system downtime at all, but this level of service is impractical on the ground (and you would never find an outsourced service willing to offer you those terms in its SLA).
Examples of RPO and RTO in Business
Where critical data changes rapidly, it is vital that the RPO is set as short as possible. If you decide that you can only afford two hours of data loss, then backups will need to run at least every two hours. On the other hand, if data is fairly static, you may feel that you can get away with 24 hours between backups.
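The link between an RPO and a backup schedule is simple arithmetic. The Python sketch below is illustrative only: it assumes that the worst-case data loss equals the gap between two consecutive backups and ignores how long each backup takes to run. The function names are made up for this example.

```python
import math
from datetime import timedelta


def backups_per_day(rpo: timedelta) -> int:
    """Minimum number of evenly spaced backups per day that keeps the gap within the RPO."""
    if rpo <= timedelta(0):
        raise ValueError("RPO must be greater than zero for scheduled backups")
    # Dividing one timedelta by another yields a plain float ratio.
    return math.ceil(timedelta(days=1) / rpo)


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Assumption: worst-case data loss equals the interval between backups."""
    return backup_interval <= rpo


print(backups_per_day(timedelta(hours=2)))    # 12 backups a day for a 2-hour RPO
print(backups_per_day(timedelta(hours=24)))   # 1 backup a day for a 24-hour RPO
print(meets_rpo(timedelta(hours=4), timedelta(hours=2)))  # False
```

In practice you would probably schedule backups a little more often than the RPO strictly requires, to leave room for the time each backup takes to complete.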
Where systems, services or applications are critical for business, it is vital that the RTO is set as short as possible. This will help you to prioritize which areas to direct resources to in the event of a system failure.
For example, systems involved in processing financial transactions need a much shorter RPO than those managing employee records which change infrequently.
What about RTO? An e-commerce site’s catalog, held on a relational database, would need a short RTO as no customers can purchase anything until the database is restored. In contrast, some informational websites might get away with a long RTO if they are only used infrequently.
Factors in Setting Your RPO and RTO Targets
You should define your RPO and RTO targets in your Business Impact Analysis, part of your Disaster Recovery and Business Continuity (DRBC) plan.
When defining these values, there are a number of factors to consider, including:
- Resource cost. In general, the closer your RPO and RTO targets are to zero, the more resources you will need in place. Frequent backups consume more processing power than occasional ones, and if you need to set up hardware and software quickly to bring a system online at short notice, this will also push costs up.
- Number of versions kept. The more versions of a backup are stored, the lower the risk of complete data loss but the higher the cost of storage.
- Compliance requirements. Some industry standards (e.g. SOC 2) demand a high level of processing integrity. In addition, some data compliance rules may forbid data transfer outside of the United States, which could affect your choice of backup service.
- Discovery time. RTO is measured from the time a system, app or service goes down to the time it is fully restored. If a problem is discovered late by IT Support, you could breach your RTO even if you act very quickly. This is more likely in non-critical systems where real-time monitoring is not a priority.
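The effect of discovery time can be shown with a small worked example. The Python sketch below uses invented timestamps and a hypothetical 2-hour RTO; it splits total downtime into the delay before the problem was noticed and the time taken to repair it.

```python
from datetime import datetime, timedelta


def downtime_breakdown(went_down: datetime, detected: datetime, restored: datetime) -> dict:
    """Split total downtime into time-to-detect and time-to-repair."""
    if not went_down <= detected <= restored:
        raise ValueError("timestamps must be in chronological order")
    return {
        "detection_delay": detected - went_down,
        "repair_time": restored - detected,
        "total_downtime": restored - went_down,
    }


rto = timedelta(hours=2)  # hypothetical target, for illustration only

result = downtime_breakdown(
    went_down=datetime(2024, 1, 15, 10, 0),   # system fails (invented time)
    detected=datetime(2024, 1, 15, 11, 30),   # IT Support notices 90 minutes later
    restored=datetime(2024, 1, 15, 12, 15),   # service fully restored
)

print(result["repair_time"])            # 0:45:00
print(result["total_downtime"])         # 2:15:00
print(result["total_downtime"] <= rto)  # False: RTO breached despite a fast repair
```

Here the repair itself took only 45 minutes, but the 90-minute detection delay pushed total downtime past the target.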
When negotiating an SLA, it is standard practice to define different tiers based on the importance of the system in question. For example:
Tier 1 SLAs would cover mission-critical systems, applications and services and might specify an RTO and RPO of less than 15 minutes (possibly seconds or even milliseconds).
Tier 2 SLAs would apply to business-critical systems, applications and services. RTO might be set at 2 hours with RPO set at 4 hours.
Tier 3 SLAs would then apply to any other non-urgent systems, applications and services. For these, an RTO of 4 hours and an RPO of 24 hours might be perfectly acceptable.
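One way to keep tiered targets usable is to record them as data that monitoring or reporting scripts can look up. The Python sketch below encodes the example figures above; the system names and their tier assignments are hypothetical, and the Tier 1 values are entered as a 15-minute upper bound.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaTier:
    name: str
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable data loss, expressed as time


# Example values taken from the tiers described above; real SLAs will differ.
TIERS = {
    1: SlaTier("mission-critical", rto=timedelta(minutes=15), rpo=timedelta(minutes=15)),
    2: SlaTier("business-critical", rto=timedelta(hours=2), rpo=timedelta(hours=4)),
    3: SlaTier("non-urgent", rto=timedelta(hours=4), rpo=timedelta(hours=24)),
}

# Hypothetical assignment of systems to tiers, for illustration only.
SYSTEM_TIERS = {"payments": 1, "crm": 2, "intranet-wiki": 3}


def targets_for(system: str) -> SlaTier:
    """Look up the RTO and RPO targets that apply to a named system."""
    return TIERS[SYSTEM_TIERS[system]]


print(targets_for("crm").rto)            # 2:00:00
print(targets_for("intranet-wiki").rpo)  # 1 day, 0:00:00
```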
During the testing phase of your plan, you will need to run exercises to check whether your team or outsourced service can meet your RPO and RTO targets. This can help you to identify bottlenecks and adjust RPO/RTO to reflect reality on the ground.
Recovery time actual (RTA) and recovery point actual (RPA) are the corresponding measurements of your actual exposure, calculated during rehearsals. For example, if your last stable backup was at 6am and you lose all the data between then and when your site goes down at 10am, you have an RPA of 4 hours. If you get the system back online at 12 noon, your RTA is 2 hours.
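The same calculation can be written as a short Python sketch. The date is arbitrary, the function name is made up for this example, and the targets used for comparison are the Tier 2 example figures from earlier (RTO of 2 hours, RPO of 4 hours).

```python
from datetime import datetime, timedelta


def rehearsal_actuals(last_good_backup: datetime, outage_start: datetime,
                      service_restored: datetime) -> tuple:
    """Return (RPA, RTA) for one rehearsal or incident."""
    rpa = outage_start - last_good_backup   # data lost, expressed as time
    rta = service_restored - outage_start   # how long the system was down
    return rpa, rta


rpa, rta = rehearsal_actuals(
    last_good_backup=datetime(2024, 1, 15, 6, 0),    # 6am backup
    outage_start=datetime(2024, 1, 15, 10, 0),       # site goes down at 10am
    service_restored=datetime(2024, 1, 15, 12, 0),   # back online at 12 noon
)

print(rpa)  # 4:00:00
print(rta)  # 2:00:00

# Compare against the Tier 2 example targets.
rpo_target, rto_target = timedelta(hours=4), timedelta(hours=2)
print(rpa <= rpo_target, rta <= rto_target)  # True True (both only just met)
```

Results that sit exactly on the target, as here, leave no margin, which is one reason to rehearse more than once before settling on final figures.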
What is Real Time Recovery (RTR)?
So far, we have dealt with RPO and RTO targets that might apply to a backup system based on a standard batch process.
Real time recovery is an alternative paradigm designed to restore a system within seconds of it going down. In contrast to traditional batch processing devices, lightweight memory-based RTR appliances are driven by the applications themselves and RPOs can usually be reduced to a few milliseconds.
Since backups are required to run constantly, this solution is resource intensive. However, users of cloud-based backup-as-a-service (BaaS) benefit from the provider's economies of scale.
In addition to RTR, some next-gen backup solutions offer features such as self-healing properties for repairing corrupt files ‘on the fly’. This might involve ‘failing over’ to a replica VM to rescue a corrupt file and then ‘failing back’ to the source host once the issue has been resolved. In such a case, RPO and RTO are effectively reduced to zero as neither time nor data has been lost.
Another way of slashing RTO is to provide granular control over the type of data you restore. By reducing the need to recreate entire VMs or file tables, you can restore system function much faster.
How PennComp Backup Solutions can keep your business protected
As you will appreciate, your backup and recovery systems are at the core of your Disaster Recovery and Business Continuity Plan.
No one backup solution will fit every Houston business, which is why we recommend contacting PennComp to get your backup needs covered. Our proactive, 6-step methodology will ensure all bases are covered, including determining a realistic RPO and RTO.
We always ensure that our data backup solutions follow the 3-2-1 rule. That is, there are three copies of the data, stored on at least two different types of media, with at least one copy stored offsite. Call us today and take that first step towards complete data backup security.