According to the Storage Networking Industry Association (SNIA), the hardest part of implementing a tiered storage system is classifying the data.
"The biggest challenge we hear from customers as they begin implementing ILM-based tiered storage is in getting agreement on the information and data classification requirements," reads a quote on the SNIA home page. "This is the crux of establishing successful ILM practices."
Information lifecycle management (ILM) is the goal-directed counterpart to tiered storage. While a tiered storage plan deals with how the data is to be divided and stored, ILM addresses the same issue from the user's perspective: why the data needs to be divided and stored in that manner.
There are only three basic tiers of storage -- archival, backup and immediate use -- but a typical business is likely to end up with more than a dozen categories in its tiered storage system. Because the categories have to be tailored to each business, there can be hundreds of rules, often differing in only tiny details.
One variable, for example, is how long a record is retained in each tier and whether it should then be moved to the next tier or destroyed.
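A retention variable of this kind can be written down as a small rule table. The following is a minimal sketch, not a prescribed implementation: the tier order (immediate use, then backup, then archival), the retention periods, and the names `TierRule`, `RULES` and `next_action` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class TierRule:
    tier: str
    retain_days: int          # how long a record stays in this tier
    next_tier: Optional[str]  # None means the record is destroyed


# Hypothetical rules; real values come from the business, not from this sketch.
RULES = {
    "immediate": TierRule("immediate", 90, "backup"),
    "backup": TierRule("backup", 365, "archival"),
    "archival": TierRule("archival", 2555, None),
}


def next_action(tier: str, entered_tier: date, today: date) -> str:
    """Return what should happen to a record that entered `tier` on `entered_tier`."""
    rule = RULES[tier]
    if today - entered_tier < timedelta(days=rule.retain_days):
        return "keep"
    if rule.next_tier is None:
        return "destroy"
    return f"move to {rule.next_tier}"
```

Two categories that differ only in `retain_days` are separate rules, which is how the rule count grows so quickly.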
However, establishing categories for a tiered storage system isn't purely a storage management problem. It's also a user problem, and each group of users in the business needs to help decide the classifications. Storage managers typically get the job of refining the categories and picking the technologies to support them.
User input on data is necessary
You'll need input from each group of users regarding the data they generate or handle. This will include how long to keep the data, how fast the data needs to be available when needed, how likely it is to be needed, when it can be moved between tiers, and when and if it can be destroyed.
Setting up a data classification committee will likely be necessary because data is typically used by more than one department. For example, data that's useless after three months to one department may be useful to another department for a few years.
Whenever possible, the classification itself should be done automatically. That is, the system should be able to determine where to pigeonhole each document without asking the user to classify it. This is usually done based on the type of document (spreadsheet or word processing), time of creation, who created it and what folder it's stored in. This means the classifications need to be easy enough for the system to handle.
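Automatic classification from document metadata might look something like the sketch below. It uses only the attributes mentioned above (document type, creation time, creator and folder). The category names, the 90-day threshold, the folder and group names, and the `Document` and `classify` identifiers are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Document:
    path: str          # where the file is stored
    owner_group: str   # department of the user who created it
    created: date      # creation date


def classify(doc: Document, today: date) -> str:
    """Assign a category from metadata alone; the first matching rule wins."""
    p = PurePosixPath(doc.path)
    ext = p.suffix.lower()
    age_days = (today - doc.created).days

    # Folder-based rule: anything under a "contracts" folder.
    if "contracts" in p.parts:
        return "legal-long-term"
    # Type plus creator: spreadsheets produced by the finance group.
    if ext in {".xls", ".xlsx"} and doc.owner_group == "finance":
        return "financial-records"
    # Age-based fallback for everything else.
    if age_days > 90:
        return "inactive-general"
    return "active-general"
```

Because the rules are evaluated in order, a short and unambiguous rule list is far easier to keep correct than a long one, which is one reason to keep the classifications simple.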
The next step is to rationalize these categories and combine them when possible. This involves questioning users about their classification characteristics. For example, a user may only need a document for three months, but if retaining it for six months eliminates a category and doesn't cause problems, it might be worth doing.
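One way to picture this consolidation is to round each requested retention period up to the nearest of a few standard periods, so that similar requests collapse into one category. This is only a sketch under stated assumptions: the standard periods in `STANDARD_MONTHS`, the category names and the `consolidate` function are invented for illustration, and rounding up is only appropriate where users confirm that keeping the data longer causes no problems.

```python
import bisect
from typing import Dict, List

# Hypothetical standard retention periods, in months, sorted ascending.
STANDARD_MONTHS = [6, 12, 36, 84]


def consolidate(requested: Dict[str, int]) -> Dict[int, List[str]]:
    """Group requested categories under the smallest standard period that covers them."""
    merged: Dict[int, List[str]] = {}
    for name, months in sorted(requested.items()):
        # Index of the first standard period >= the requested period.
        i = bisect.bisect_left(STANDARD_MONTHS, months)
        if i == len(STANDARD_MONTHS):
            raise ValueError(f"{name}: no standard period covers {months} months")
        merged.setdefault(STANDARD_MONTHS[i], []).append(name)
    return merged
```

For instance, a three-month request and a six-month request would both land in the six-month category, removing one category from the system.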
Now it's time to start thinking about technology. How much of the hardware and software that you have now can you use in your tiered storage system? What new technology will you need? Can you slipstream it in or will you have to install it in large chunks?
Capacity planning becomes more complicated with tiering
One result of tiering is that capacity planning becomes more complicated. Instead of just needing more hard drives, you need to decide which kinds of drives and RAID configurations you need (e.g., fast SCSI, medium-speed SCSI, SATA, RAID 10 or RAID 5). Don't assume that the storage device categories will grow in lockstep. Tiered storage typically has some categories that grow rapidly, some that hardly grow at all and others that will actually shrink.
For example, archival needs often shrink as data that was once stored permanently is reclassified into categories that are kept for only a limited time.
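To see why a single growth figure is not enough, it can help to project each category separately. The sketch below applies simple compound growth, capacity x (1 + rate)^years, per category. The category names, starting capacities and growth rates are made-up inputs, and `project` is a hypothetical helper rather than part of any storage management product.

```python
from typing import Dict


def project(capacity_tb: Dict[str, float],
            annual_growth: Dict[str, float],
            years: int) -> Dict[str, float]:
    """Project capacity per category, compounding each category's own annual rate."""
    return {
        tier: round(capacity_tb[tier] * (1 + annual_growth[tier]) ** years, 2)
        for tier in capacity_tb
    }


# Illustrative inputs only: one category growing fast, one flat, one shrinking.
current = {"fast": 10.0, "midrange": 40.0, "archival": 100.0}
growth = {"fast": 0.5, "midrange": 0.0, "archival": -0.1}

two_year_plan = project(current, growth, years=2)
```

With these inputs the fast category more than doubles in two years while the archival category shrinks, so buying the same proportion of each device type every year would overbuy one and starve the other.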
Once the data is classified, the business rules are set and the technology is in place, dividing the data can be a straightforward process with storage management software and, if needed, data archiving programs.
Some types of data in tiered storage are best handled by specialized data archiving programs, if the amount of data is large enough. This is particularly true with email because of the nature of the backups created by email programs (archival rather than easily searchable) and the large number of small data files.
Once a tiered storage system is established, its classifications should be rigidly adhered to. Let's say records are subpoenaed -- it's crucial that you can deliver all of the records asked for and that all the records you believe to be destroyed have actually been destroyed.
A user who has kept a forgotten copy of a supposedly destroyed email on a personal hard drive can be a serious impediment in a court case. You'll need a company policy establishing what kinds of information can be stored where, and you'll need to provide employee education.
This was first published in July 2009