Big data is not a specific type of data. Almost any kind of unstructured data can be considered big data once it accumulates in large volumes. This includes data from social networking sites, online financial transactions, company records, weather monitoring, satellites and other surveillance sources, and research and development. What these sources have in common is that the data is huge in volume and unstructured.
IDC defines big data technologies as follows: "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis."
Noted below are a few tips for efficiently managing your big data storage needs:
Managing big data storage through segregation
If you have multiple storage boxes in your organization, it is a good idea to dedicate database, online transaction processing (OLTP), and Microsoft Exchange applications to specific storage systems, while dedicating other storage systems to big data applications such as web portals, online streaming applications, etc.
If you can't afford to segregate your storage systems, dedicate specific front-end storage ports to databases, OLTP and similar applications, and dedicate the other ports to big data applications. The rationale behind using dedicated ports is that big data traffic is measured as throughput, in kilobytes or megabytes per second, while OLTP application traffic is measured in input/output operations per second (IOPS), because block sizes are larger in big data applications and smaller in OLTP applications. OLTP applications are also CPU-intensive, while big data applications make heavier use of the front-end ports. As a result, more ports can be dedicated to big data applications.
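To make the difference between IOPS-bound and throughput-bound traffic concrete, here is a minimal Python sketch of the standard relationship throughput = IOPS x block size. The function name and the workload figures (20,000 IOPS at 8 KB, 800 IOPS at 1 MB) are hypothetical values chosen for illustration, not measurements from any particular storage system.

```python
def throughput_mb_per_s(iops: float, block_size_kb: float) -> float:
    """Return throughput in MB/s for a given IOPS rate and block size in KB."""
    # 1 MB = 1024 KB, so divide the KB/s figure by 1024.
    return iops * block_size_kb / 1024.0


# Hypothetical OLTP workload: many small I/Os.
oltp_mb_s = throughput_mb_per_s(iops=20000, block_size_kb=8)

# Hypothetical big data workload (e.g. streaming): few large I/Os.
big_data_mb_s = throughput_mb_per_s(iops=800, block_size_kb=1024)

print(f"OLTP:     {oltp_mb_s:.2f} MB/s at 20000 IOPS")
print(f"Big data: {big_data_mb_s:.2f} MB/s at 800 IOPS")
```

With these assumed figures, the big data workload pushes roughly five times as much data through the front-end ports while issuing a small fraction of the I/O operations, which is why port bandwidth matters more for it than IOPS does.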
Specialized big data storage management
Many companies today offer storage systems designed with big data management in mind. You should evaluate these vendors when looking for your big data storage management solution. Clustered storage systems such as EMC Isilon are often a better choice than traditional storage systems for big data storage management, since big data can grow to many petabytes within a single filesystem.
Big data analysis
Vendors such as EMC Greenplum are currently working on tools specifically for managing and analyzing big data. These applications work on clustered storage systems and ease the management of big data. Choose applications that can work across clustered storage systems and analyze data quickly and efficiently. For fast indexing, make sure the metadata always resides on solid-state drives (SSDs), if the storage system gives you the option to do so.
Another important big data management consideration is future data growth. Your big data storage management system should be scalable enough to allow for future storage needs.
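A rough way to reason about future growth is a compound-growth projection. The sketch below is illustrative only: the function names, the 100 TB starting point, the 800 TB usable capacity and the 100% annual growth rate are all assumptions, and real growth rarely follows a constant rate.

```python
import math


def projected_capacity_tb(current_tb: float, annual_growth_rate: float, years: float) -> float:
    """Project stored data after `years`, assuming constant compound growth.

    annual_growth_rate is a fraction, e.g. 0.5 for 50% growth per year.
    """
    return current_tb * (1 + annual_growth_rate) ** years


def years_until_full(current_tb: float, usable_tb: float, annual_growth_rate: float) -> float:
    """Estimate how many years until the data fills the usable capacity."""
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    if current_tb >= usable_tb:
        return 0.0
    # Solve current_tb * (1 + r)^t = usable_tb for t.
    return math.log(usable_tb / current_tb) / math.log(1 + annual_growth_rate)


# Hypothetical figures: 100 TB today, doubling every year.
print(projected_capacity_tb(100, 1.0, 3))
print(years_until_full(100, 800, 1.0))
```

An estimate like this can help decide how much headroom a scalable system needs at purchase time and when additional nodes or shelves are likely to be required.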
Big data storage management and cloud computing
Many companies are now looking to cloud computing services for the storage and management of big data. When choosing cloud services for big data storage management, make sure the ownership of data remains with you.
You should be able to move your data in and out of the cloud, and there should be no vendor lock-in. The other important factor to consider is the vendor's data security guidelines.
About the author: Anuj Sharma is an EMC Certified and NetApp accredited professional. Sharma has experience in handling implementation projects related to SAN, NAS and BURA. He also has to his credit several research papers published globally on SAN and BURA technologies.
This was first published in October 2011