The classic way to size a storage area network (SAN) is to start from a thorough understanding of your business requirements. For instance, a business requirement might begin with the selection of an enterprise resource planning (ERP) system and an estimate of the number of concurrent users. That need is then translated into a specific application. In other words, the business need maps to an application, and the application's requirements map to infrastructure. This mapping depends on performance, availability, reliability, interoperability, serviceability and scalability.
On the other hand, if your organization is moving from legacy storage to a SAN, map the sizing against your current applications, since the goal is to enhance the existing environment. Balance and appropriate sizing are essential. The challenge is that storage has already been deployed in a certain way, so best practices must be followed when augmenting it. If the legacy storage comes from different vendors, you can consider multiple storage approaches, deploying different solutions for different requirements.
When defining scalability, decide how many years the sizing must cover. For reliability and availability, also consider the type of backup: will it be a local backup performed daily, or a two- or three-site disaster recovery solution? Based on these criteria, application sizing is mapped to infrastructure sizing. For infrastructure sizing, you have to size the network and servers (for example, an ERP system has Web, application, database, testing and development tiers). Only after defining this IT landscape do you define storage requirements and connectivity.
The performance-capacity equation
Performance and capacity requirements are the main criteria when sizing a SAN. For instance, assume that an organization wants its SAN to serve 10,000 users. If the company needs 15,000 input/output operations per second (I/OPS), you must calculate the number of spindles required to meet that performance requirement. Define the capacity requirement only at a later stage.
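As a rough illustration of the spindle calculation, the sketch below divides the required I/OPS by an assumed per-spindle rate and rounds up. The per-drive figures in `ASSUMED_IOPS_PER_SPINDLE` are common rules of thumb, not vendor specifications, and the function name is hypothetical. The sketch also ignores RAID write overhead and controller cache, so treat its result as a lower bound.

```python
import math

# Assumed per-spindle random I/OPS (rules of thumb, not vendor specifications).
ASSUMED_IOPS_PER_SPINDLE = {
    "fc_15k": 180,
    "fc_10k": 140,
    "sata_7200": 80,
}


def spindles_for_iops(required_iops: int, iops_per_spindle: int) -> int:
    """Return the minimum number of spindles needed to deliver required_iops."""
    if required_iops <= 0 or iops_per_spindle <= 0:
        raise ValueError("I/OPS values must be positive")
    # Round up: a fractional spindle still means buying a whole drive.
    return math.ceil(required_iops / iops_per_spindle)


if __name__ == "__main__":
    # The 15,000 I/OPS requirement from the example above, per drive type.
    for drive, iops in ASSUMED_IOPS_PER_SPINDLE.items():
        print(drive, spindles_for_iops(15_000, iops))
```

Under these assumptions, 15,000 I/OPS needs 84 spindles of 15k FC, 108 of 10k FC or 188 of SATA.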
Assess the performance requirements before you determine capacity. If you size purely on the basis of capacity, you will run into problems: you are unlikely to end up with enough spindles to meet business needs.
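To see why capacity-only sizing falls short, the hypothetical sketch below computes the spindle count both ways and keeps the larger one. The 10,000 GB capacity target, 300 GB drive size and 180 I/OPS per spindle are assumed example values, not figures from any particular deployment.

```python
import math


def spindles_needed(
    required_iops: int,
    required_capacity_gb: int,
    iops_per_spindle: int,
    capacity_per_spindle_gb: int,
) -> tuple[int, int, int]:
    """Return (by_performance, by_capacity, required) spindle counts."""
    by_performance = math.ceil(required_iops / iops_per_spindle)
    by_capacity = math.ceil(required_capacity_gb / capacity_per_spindle_gb)
    # The array must satisfy both constraints, so the larger count wins.
    return by_performance, by_capacity, max(by_performance, by_capacity)


if __name__ == "__main__":
    # Hypothetical: 15,000 I/OPS and 10,000 GB on 300 GB, 180 I/OPS drives.
    print(spindles_needed(15_000, 10_000, 180, 300))
```

With these assumptions, capacity alone calls for 34 drives, while the 15,000 I/OPS target needs 84, so an array sized only for capacity would deliver well under half the required performance.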
Depending on their specific environments, organizations choose either capacity-based or performance-based SAN sizing. Today's major computing challenge is that storage performance still lags behind, even as CPU and memory performance have improved. Performance still depends on the number of deployed spindles, so capacity takes a back seat for such requirements.
Storage capacity requirements can double every year as organizations keep hoarding data. Rather than simply accumulating it, tier the stored data according to its criticality. The next step is archiving, in which less-used data is moved to other forms of storage. Alternatively, that data can stay on the same SAN on Serial ATA (SATA) drives.
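Because sizing has to cover a multi-year horizon, it helps to project how much capacity that growth implies. The minimal sketch below assumes simple compound growth; the 10,000 GB starting point and the growth rates are illustrative assumptions, and real data growth rarely follows a clean curve.

```python
def projected_capacity_gb(current_gb: float, annual_growth: float, years: int) -> float:
    """Project capacity with compound growth: current * (1 + growth) ** years.

    annual_growth=1.0 models data that doubles every year (an assumption).
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_gb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical 10,000 GB today, doubling annually over a three-year horizon.
    for year in range(4):
        print(year, projected_capacity_gb(10_000, 1.0, year))
```

At a doubling rate, a three-year plan needs eight times today's capacity, which is why tiering and archiving less-used data matter before that growth lands on the fastest drives.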
Protocols, speed and more
There is much confusion about choosing between Fibre Channel (FC) and iSCSI-based SANs. For test and development as well as quality assurance, organizations should generally opt for SANs using the iSCSI protocol. FC should be used for mission-critical applications in production environments, mainly because its latency is much lower than that of iSCSI; bandwidth is not the deciding factor.
Because iSCSI runs over TCP/IP, it carries extra protocol overhead, which makes it slower and less efficient. However, iSCSI helps save infrastructure costs. The ideal scenario is a mix of both approaches.
RAID configuration is the next aspect to address. RAID 1 is the fastest and most highly available configuration, so it should be used for mission-critical applications. RAID 5 suits environments with moderate performance needs. Different parts of an application often have different availability requirements. In an Oracle database, for example, the logs can be kept on RAID 1 and the data files on RAID 5. If the SAN workload is predominantly reads with only a little writing, RAID 5 and RAID 1 deliver roughly the same performance. In a write-heavy environment, RAID 1 performs better.
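The read/write effect can be approximated with the widely used RAID write-penalty rule of thumb: a read costs one back-end disk I/O, a write costs about two on RAID 1 (one per mirror) and about four on RAID 5 (read data, read parity, write data, write parity). The sketch below is a simplified model that ignores controller cache and full-stripe writes; the function name and workload mixes are hypothetical.

```python
# Rule-of-thumb back-end disk I/Os per host write (reads cost 1 on both).
RAID_WRITE_PENALTY = {"raid1": 2, "raid5": 4}


def backend_iops(host_iops: int, read_pct: int, raid_level: str) -> int:
    """Estimate the I/OPS the spindles must absorb for a given host workload."""
    if not 0 <= read_pct <= 100:
        raise ValueError("read_pct must be between 0 and 100")
    reads = host_iops * read_pct // 100
    writes = host_iops - reads
    return reads + writes * RAID_WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    for read_pct in (90, 30):  # read-heavy vs. write-heavy (hypothetical mixes)
        for level in ("raid1", "raid5"):
            print(read_pct, level, backend_iops(15_000, read_pct, level))
```

For 15,000 host I/OPS at 90 percent reads, this gives 16,500 back-end I/OPS on RAID 1 versus 19,500 on RAID 5; at 30 percent reads the figures become 25,500 versus 46,500. The gap widens as writes increase, and dividing these figures by the per-spindle rate gives the drive count.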
SAN speed is measured in I/OPS (for random workloads) or in megabytes per second (for sequential workloads). The acceptable SAN latency is typically set by the application. Size the SAN with all of these metrics in mind.
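Throughput and I/OPS are linked through the I/O size, and Little's Law relates average latency to the number of outstanding I/Os. The sketch below shows both conversions. The 8 KB and 256 KB I/O sizes and the queue depth of 30 are assumed example values, and "MB" here means 1,024 KB.

```python
def throughput_mb_per_s(iops: int, io_size_kb: int) -> float:
    """Throughput = I/OPS x I/O size (KB), converted to MB/s with 1 MB = 1,024 KB."""
    return iops * io_size_kb / 1024


def avg_latency_ms(outstanding_ios: int, iops: int) -> float:
    """Little's Law: average latency = outstanding I/Os / I/OPS, in milliseconds."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios * 1000 / iops


if __name__ == "__main__":
    print(throughput_mb_per_s(15_000, 8))   # hypothetical small random I/O
    print(throughput_mb_per_s(1_000, 256))  # hypothetical large sequential I/O
    print(avg_latency_ms(30, 15_000))       # 30 I/Os in flight
```

At 8 KB per I/O, 15,000 I/OPS amounts to only about 117 MB/s, whereas 1,000 sequential 256 KB I/Os reach 250 MB/s. With 30 I/Os in flight at 15,000 I/OPS, average latency works out to 2 ms.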
When it comes to drives, extremely demanding applications can run on 15k (15,000 RPM) FC drives. Moderate environments can run on 10k (10,000 RPM) FC drives, with SATA storage for less frequently accessed data. Serial-Attached SCSI (SAS) drives are found in entry-level environments, since they are used inside servers and in entry-level devices. However, FC remains the spindle of choice for mission-critical workloads.
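The performance gap between drive classes follows largely from mechanics. A common first-order estimate of random I/OPS per drive is one second divided by the sum of average seek time and average rotational latency (half a revolution). The average seek times below are assumed typical values, not specifications for any particular drive, and the helper names are hypothetical.

```python
def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def estimated_iops(rpm: int, avg_seek_ms: float) -> float:
    """First-order random I/OPS estimate: 1,000 ms / (seek + rotational latency)."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


# Assumed (typical, not vendor-specified) RPM and average seek time in ms.
DRIVES = {
    "15k FC": (15_000, 3.5),
    "10k FC": (10_000, 4.5),
    "7.2k SATA": (7_200, 8.5),
}

if __name__ == "__main__":
    for name, (rpm, seek) in DRIVES.items():
        print(f"{name}: ~{estimated_iops(rpm, seek):.0f} I/OPS")
```

Under these assumptions the estimates come out to about 182, 133 and 79 I/OPS, so meeting the same I/OPS target on SATA takes more than twice as many spindles as on 15k FC.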
About the author: Sanjay Lulla is the director of EMC's technology solutions and head of pre-sales operations of EMC Corp. for India and SAARC.
(As told to Jasmine Desai)
This was first published in October 2009