Nowadays, loss of data can cost businesses billions. Data has become a critical asset and organizations must be able to deal with natural disasters, government compliance, database corruption, component failures, and human error. Businesses generate and maintain vast amounts of data, such as details of customers, partners, inventories, products and services. What will happen if all that data is lost? For example, a bank won’t be able to justify its statements or balance sheets. There will be total chaos.
Consider organizations affected by the 9/11 attacks. Many neglected to back up their data to remote sites. As a result, they never resumed operations. Data backup and disaster recovery (DR) solutions are critical to today's organizations. It’s vital that a disaster recovery plan be in place before an organization goes into production mode or starts operations. Here are some points to consider while creating a disaster recovery plan.
- The disaster recovery plan should categorize data according to business criticality in the event of disaster. That is, it should clearly identify the data which needs to be recovered immediately to resume operations with minimal disruption to the business.
- The most critical data, viz., financial data, ERP data, email, etc., should be replicated on a secondary site. This secondary site should ideally be in another seismic zone: the distance between the primary and secondary sites should be greater than 600 kilometers.
- The recovery point objective (RPO) and recovery time objective (RTO) of applications should be defined after analyzing their business impact, and then documented in the DR plan template. Please see the disaster recovery plan template that accompanies this tip for more details.
- RPO is the point of consistency to which data must be restored. It is measured as the time between the last consistent recovery point and the moment the incident occurred, so it expresses how much recent data the business can afford to lose. It can range from zero to minutes or hours. With synchronous data replication, RPO can be zero. For systems that don’t need immediate recovery or where data can be rebuilt from other sources, RPO can be 24 hours or more. RTO is the time permitted for recovering an application to a consistent recovery point. This time can include some, or all, of the following:
i) Time needed to bring up backup hardware
ii) Time needed to restore backed up data
iii) Time needed to perform forward recovery on databases
iv) Time needed to provide data access to users
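The relationship between these components and the two objectives can be sketched in a few lines of Python. This is only an illustration: the class name, field names and the hour values are hypothetical, and it assumes that the recovery steps run one after another and that backups are taken at a fixed interval.

```python
from dataclasses import dataclass


@dataclass
class RecoverySteps:
    """Hypothetical per-application estimates, in hours, for each recovery step."""
    bring_up_hardware: float
    restore_data: float
    forward_recovery: float
    provide_user_access: float

    def estimated_recovery_hours(self) -> float:
        # Assumption: the steps run sequentially, so their durations add up.
        return (self.bring_up_hardware + self.restore_data
                + self.forward_recovery + self.provide_user_access)


def meets_rto(steps: RecoverySteps, rto_hours: float) -> bool:
    """True if the estimated recovery time fits within the agreed RTO."""
    return steps.estimated_recovery_hours() <= rto_hours


def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """With periodic backups, the worst case loses one full interval of data,
    so the interval must not exceed the RPO."""
    return backup_interval_hours <= rpo_hours


if __name__ == "__main__":
    steps = RecoverySteps(bring_up_hardware=2.0, restore_data=4.0,
                          forward_recovery=1.5, provide_user_access=0.5)
    print("Estimated recovery time (h):", steps.estimated_recovery_hours())
    print("Meets an 8-hour RTO:", meets_rto(steps, rto_hours=8.0))
    print("Daily backups meet a 6-hour RPO:", meets_rpo(24.0, rpo_hours=6.0))
```

A check like this is useful mainly as a conversation aid: if the summed estimates exceed the RTO the business asks for, either the objective or the recovery design has to change.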
Determining the right RPO and RTO
You must work with the appropriate personnel in the enterprise to identify RPO and RTO requirements. You should ask questions such as, “How often does the enterprise expect to make recovery requests? How long can the organization afford to wait for a recovery request to be fulfilled?”
The DR plan should document users’ expectations regarding RPO and RTO to ensure mutual understanding. It is also very important to include the time for shipment of tapes in the RTO if they are stored offsite. This will help you to determine the backup frequencies and schedules for clients, the backup levels needed, as well as browse and retention policies. These details must be documented in the DR plan.
You should also decide on the backup device and its connectivity — SAN-based backups or LAN-based backups. The frequency of backup cycles for critical servers can be set higher than for other servers. You should take full backups of critical servers more frequently so that recovery is faster. In case of critical database backups, such as in an SAP production environment, you can schedule archive backups more than once a day so that point-in-time restores are available.
At one of our clients, we back up archive logs every six hours so that there is minimal data loss in the event of a disaster. Normally, you can schedule weekly full and daily incremental backups. You can also use the data growth trend to choose schedules. You can schedule frequent full backups for clients with frequently changing data, and incremental backups for other clients. It’s also possible to use the data protection functionality of storage arrays to back up the most critical data. For example, you can schedule clone backups of production SAP databases and Exchange databases on a storage array, rather than opt for traditional software-based backups. Keep a few clone copies of the production LUN on the storage array so that, in case of data loss on the production LUN, you can restore data instantly by promoting a clone to primary.
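To see why more frequent full backups shorten recovery, consider the weekly full and daily incremental schedule as a small Python sketch. The function names and the choice of Sunday for the full backup are assumptions made for illustration. It also assumes that each incremental captures changes since the previous backup of any level, so a restore needs the latest full backup plus every incremental taken after it.

```python
from datetime import date, timedelta
from typing import List


def backup_level(day: date, full_weekday: int = 6) -> str:
    """Return 'full' on the chosen weekday (0 = Monday ... 6 = Sunday),
    otherwise 'incremental'. Sunday is an assumed default."""
    return "full" if day.weekday() == full_weekday else "incremental"


def restore_chain(target: date, full_weekday: int = 6) -> List[date]:
    """Dates of the backups needed to restore to `target`: the most recent
    full backup on or before that day, plus each daily incremental since."""
    days_since_full = (target.weekday() - full_weekday) % 7
    last_full = target - timedelta(days=days_since_full)
    return [last_full + timedelta(days=i) for i in range(days_since_full + 1)]


if __name__ == "__main__":
    target = date(2011, 5, 4)  # a Wednesday
    for day in restore_chain(target):
        print(day.isoformat(), backup_level(day))
```

The longer the chain returned by `restore_chain`, the more backup sets must be read during a restore, which is why critical servers benefit from a shorter interval between full backups.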
Based on the approaches explained above, we can use the following tiered approach to create a disaster recovery plan:
- Business-critical data should be given the highest tier of data protection to guard against application data loss, disaster at the production site, and compliance requirements that mandate retention of data from previous months.
- Other business data can be backed up using backup-to-disk-to-tape (B2D2T) solutions, where the backup is retained on disk for a few weeks. Afterwards, the data can be staged to tape for longer retention and shipped offsite.
- To guard against backup application data loss, the backup catalog should be backed up daily.
- Indexes of the application servers should be backed up to a separate pool to ensure faster recovery in case of disaster.
- Full system backups of the application servers and system state should be configured. Regular test recovery drills should be conducted to ensure the consistency of backed up and replicated data.
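The tiers above can be written down as a simple lookup table, which is often a convenient way to record them in a DR plan. The following Python sketch is illustrative only: the type names are hypothetical, and the 21-day disk retention is an assumed stand-in for "a few weeks".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    BUSINESS_CRITICAL = "business_critical"
    OTHER_BUSINESS = "other_business"


@dataclass(frozen=True)
class ProtectionPolicy:
    primary_method: str
    disk_retention_days: Optional[int]  # None where disk staging does not apply
    stage_to_tape: bool
    ship_tapes_offsite: bool


# Assumed values for illustration; real figures come from the RPO/RTO analysis.
POLICIES = {
    Tier.BUSINESS_CRITICAL: ProtectionPolicy(
        primary_method="replication to secondary site",
        disk_retention_days=None,
        stage_to_tape=False,
        ship_tapes_offsite=False,
    ),
    Tier.OTHER_BUSINESS: ProtectionPolicy(
        primary_method="backup to disk, then tape (B2D2T)",
        disk_retention_days=21,  # assumption: "a few weeks" taken as three weeks
        stage_to_tape=True,
        ship_tapes_offsite=True,
    ),
}

# Tasks that apply regardless of tier, taken from the list above.
COMMON_TASKS = [
    "back up the backup catalog daily",
    "back up application server indexes to a separate pool",
    "configure full system and system state backups",
    "run regular test recovery drills",
]


def policy_for(tier: Tier) -> ProtectionPolicy:
    """Look up the protection policy recorded for a data tier."""
    return POLICIES[tier]


if __name__ == "__main__":
    for tier in Tier:
        print(tier.name, "->", policy_for(tier))
```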
DR plan optimization tips
- Enterprises that cannot afford to invest in costly replication solutions can opt for cloud-based backup solutions. These can provide the advantage of almost zero CAPEX.
- Your basic DR plan should contain contact details of all key personnel, as illustrated in our downloadable disaster recovery plan template. This includes key members of the SAN, backup, recovery and archive (BURA), application, and network teams, at both the primary and secondary sites.
- Follow documented major incident management resolution processes and procedures.
- Follow the documented information flow processes and procedures.
- Whom to contact, and whom to inform, in case of disaster should be well defined in the disaster recovery plan.
- Document equipment details of primary and secondary sites, along with their respective owners' contact details and support information.
- Incorporate a DR infrastructure diagram in the DR plan.
- In case of a major incident, create a major incident report and update it regularly with the current status until the incident is resolved.
Do not forget to access our downloadable DR template, which provides an overview of how to design a DR plan that takes the above points into account.
About the author: Anuj Sharma is an EMC Certified and NetApp accredited professional.
Sharma has experience in handling implementation projects related to SAN, NAS and BURA. He also has to his credit several research papers published globally on SAN and BURA technologies.
This was first published in May 2011