For CIO Roland Etcheverry, disaster recovery (DR) begins -- and ends quickly -- with virtualized storage. In fact, storage -- liberated from physical machines and living on a storage area network (SAN) that is completely virtualized -- is the alpha and omega of the data center at Houston-based Champion Technologies Inc., where Etcheverry took over IT operations five years ago.
"We built our data center around the concept that there would be no box with storage that was not picking up from the SAN," Etcheverry said.
"All of our servers have enough storage on them to boot the operating system and load whatever application they are going to load, but everything else sits on the SAN," he said. The SAN contains images of the local drives of the data center's 200 to 250 servers, updating the images when they change.
"When you centralize your storage, the DR plan becomes simple -- not trivial, but simple conceptually -- and that was our plan from the beginning," Etcheverry said.
The disaster recovery plan, put into place last year, leverages this storage-centric approach. Another SAN sits at the company's DR site in Scottsdale, Ariz., and the data is replicated over an IP network. "We have a big pipe [OC-3 line] in our data center, because it was the cheapest way to buy what
The DR site houses between 70 and 80 servers for Champion's business-critical applications. An agreement with disaster recovery vendor SunGard Data Systems Inc. ensures that other servers will be provisioned as needed to host other applications. The application servers are created from an image stored on the SAN. Once imaged, the servers are pointed at the SAN for data. The targeted time to recovery is two-plus hours.
That's a big change from the situation shortly after Etcheverry arrived, when the company was backing up critical SAP and Microsoft systems data to tape and recovery times could take a week or longer. The situation wasn't tenable for a fast-growing company that had 2,300 employees and 70 sites globally and was reliant on these systems to conduct business.
The path to SAN-based disaster recovery
Champion, with approximately $900 million in annual revenue, makes specialty chemicals for the oil and gas industry. Until the recession settled in, the privately held supplier was doubling in growth every three years. Business operations had outpaced the IT systems, giving Etcheverry an opportunity to rethink and rebuild the company's business systems for high availability.
"I started with a point of view that said a single global instance, that said standardize, consolidate, integrate where you can," Etcheverry said.
Virtualized storage is not new, but when Etcheverry was first investigating SAN-based DR, the choices from heavy hitters like EMC Corp. and Hewlett-Packard Co. were not good for a relatively small customer like Champion. "The big companies wanted to jerk us around and the offerings were not that interesting and entirely proprietary," he recalled. "They wanted us to use all their stuff, and we were not comfortable with that."
Instead, Champion went with agnostic and considerably cheaper technology from FalconStor Software Inc. for its storage device. "Their concept was it should not make any difference what you put in front of it. You could hang any server box, any operating system and it should be able to see this storage box as its acceptable drive," he said.
The virtualization appliance was equally accommodating. "They don't care what you hang behind it either; they manage iSCSI, Fibre Channel, Fibre Channel over Ethernet," he said.
Etcheverry said he couldn't virtualize key applications because the vendors didn't support virtualization.
"We are a Microsoft, SAP shop, and neither supported virtualization when we started, but neither questioned or raised an eyebrow when we said we were going to virtualize storage on a FalconStor SAN. They know all about SANs," he said.
"The good news is that virtually everything we have, with one exception -- a third-party app that links to SAP -- likes the SAN," he said.
A SAN-based disaster recovery plan may not be so simple for organizations with transactional data. Gartner Inc. analyst John Morency said one wrinkle with SAN-based replication is that "it totally ignores what you do with databases." So, even if an organization is using SAN replication, it will typically need some kind of recovery manager -- "something like Oracle Data Guard" for Oracle shops, he said -- to ensure the data in the replicated database is exactly the same as in the primary database.
12 TB of data at the ready
Even more than choosing the right vendor, CIOs considering this approach must make the conceptual leap that storage is not directly tied to a server, but "a block of stuff out there," Etcheverry said.
"If your storage is all in the same location in one big chunk, and you replicate it to another site in one big chunk, and if your server images are at the other site, then your DR is this simple: Put some servers at the other site; when one goes down, turn up the servers at the other site," Etcheverry said.
The take-home message from virtualized storage?
"You never have to worry if the data is there," he emphasized. "This point of view says, 'If you have data, it is on the SAN and you replicate everything on the SAN to the other end.'"
With 12 terabytes of systems data, that approach saves a lot of time on file transfers at the moment of the disaster. With this implementation, switching everything to the backup data center takes about four hours -- a far cry from the days it took otherwise. The company's system relays hourly snapshots to the backup data center, keeping it fresh. In addition, because of the frequent snapshots, these backups can become the master storage system for all of Champion's offices, Etcheverry said.
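A rough back-of-the-envelope calculation shows why having the data already replicated matters. The sketch below is illustrative only: it assumes decimal terabytes, the nominal OC-3 line rate of 155.52 Mbit/s, and no protocol overhead, compression or competing traffic. None of those figures come from Champion, and real throughput would be lower.

```python
# Assumptions (not from the article): decimal terabytes, the nominal OC-3
# line rate of 155.52 Mbit/s, and no protocol overhead or compression.
DATA_BYTES = 12 * 10**12
OC3_BITS_PER_SECOND = 155.52 * 10**6
SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600


def transfer_seconds(num_bytes, bits_per_second):
    """Time to push num_bytes through a link running at bits_per_second."""
    return num_bytes * 8 / bits_per_second


# Copying the full data set only after a disaster has struck.
full_copy_days = transfer_seconds(DATA_BYTES, OC3_BITS_PER_SECOND) / SECONDS_PER_DAY

# Upper bound on the changed data one hourly snapshot could ship on the same link.
hourly_budget_gb = OC3_BITS_PER_SECOND * SECONDS_PER_HOUR / 8 / 10**9

print(f"Full 12 TB copy: about {full_copy_days:.1f} days")
print(f"Ceiling per hourly snapshot: about {hourly_budget_gb:.0f} GB")
```

Under these assumptions a full copy would take roughly a week, while each hourly snapshot only has to move the data that changed, up to a ceiling of about 70 GB per hour.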
Lessons: What is business-critical?
Selling DR to a company that had no disaster recovery plan was not hard, Etcheverry said. Hashing out with the business which applications were most critical and agreeing on acceptable failover times was more difficult.
Failing over in real time can double or triple the costs and makes the failover more complex, he said, but the company's first inclination is to say everything is critical. For example, Champion closes its books using SAP Business Information Warehouse.
"We did not call that business critical and everybody asked, 'Why not?'" Etcheverry recalled. "I said, 'I can build that when we need it in less than a week's time from the equipment already down there. If this means that in a given month, your closing would be six days later, wouldn't it make sense to save the cost?'"
Another area Etcheverry said he aims to improve is identifying the dependencies between functions. "Everybody knows that to turn on an application's database, the server has to be running. But the actual dependencies -- as opposed to convenience -- is critical to driving the DR plan," he said. "If you get the order done right, it can take an hour out of the overall process."
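One common way to work out a startup order from true dependencies is a topological sort. The sketch below is a hypothetical illustration, not Champion's tooling: the service names and the dependency map are invented for the example. It uses Python's standard `graphlib` module (Python 3.9 or later) to group services into waves, where everything in a wave can be started in parallel once the previous wave is up.

```python
from graphlib import TopologicalSorter

# Hypothetical services (not from the article). Each key maps to the set of
# services that must be running before it can start.
dependencies = {
    "san_volumes": set(),
    "database_server": {"san_volumes"},
    "app_database": {"database_server"},
    "sap_app_server": {"app_database"},
    "file_server": {"san_volumes"},
}


def startup_waves(deps):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # services with no unmet dependency
        waves.append(ready)
        sorter.done(*ready)  # mark them started so their dependents unlock
    return waves


waves = startup_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this made-up example the file server lands in the same wave as the database server, because it depends only on the SAN volumes. Listing only the real dependencies, rather than the order that happens to be convenient, is what lets steps run side by side and shortens the overall recovery.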
Let us know what you think about the story; email Linda Tucci, Senior News Writer.