To deliver high levels of IT infrastructure availability, organizations need tools that help them isolate these problems. Isolating a problem takes up, on average, 65% of an administrator's time. RCA helps reduce this.
It's common for a single device failure to generate multiple events and alarms, not just from the malfunctioning network device but also from the devices attached to it. For example, if a switch raises an alarm that a particular port and card have gone down, the servers attached to that switch also raise alarms. This is where RCA and event correlation come in: they identify which alarm points to the actual problem and which alarms are merely symptoms.
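The switch-and-servers case above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular RCA product works: the device names, the `UPSTREAM` map and the `root_causes` function are all hypothetical, and the sketch assumes every device has at most one upstream device and that the map contains no loops.

```python
from typing import Dict, Set

# Hypothetical dependency map: each device -> the device it is attached to.
# Assumption: one upstream device per element and no cycles.
UPSTREAM: Dict[str, str] = {
    "server-a": "switch-1",
    "server-b": "switch-1",
    "switch-1": "access-router-1",
}


def root_causes(alarming: Set[str], upstream: Dict[str, str]) -> Set[str]:
    """Return the alarming devices that have no alarming device upstream."""
    roots = set()
    for device in alarming:
        parent = upstream.get(device)
        # Walk up the chain until we hit another alarming device or run out.
        while parent is not None and parent not in alarming:
            parent = upstream.get(parent)
        if parent is None:
            # Nothing upstream is alarming, so this device is a likely root cause.
            roots.add(device)
    return roots


# The switch fails, so it and both attached servers raise alarms.
alarms = {"switch-1", "server-a", "server-b"}
print(root_causes(alarms, UPSTREAM))  # {'switch-1'}
```

The server alarms are treated as symptoms because an alarming device sits upstream of them; only the switch is reported.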
RCA is very useful for managing SLAs between the IT team and business users, since it helps ensure higher levels of IT infrastructure availability.
To carry out a successful RCA exercise, it is wise to follow certain steps. The first step is to perform detailed mapping. This should be supplemented by selection of the right management tool.
Mapping: To conduct effective RCA, you must identify what your organization wants to manage. You can achieve this by creating a managed object definition language (MODL) model.
The next step is to create a topology of the data center to be managed, which encompasses all the managed elements. These may include routers, switches, servers and storage equipment, or particular applications that run on this infrastructure. The result is a map of the data center and its network infrastructure, along with the relationships used for correlation.
To explain this mapping, let's take the example of a core router in the centralized data center. Typically, multiple access routers connect to this core router, so the connectivity between the network's core routers and access routers will be shown in the mapping model. These router-based networks will be connected to switching devices at each of the locations. Various servers will be connected to these switches, and various applications will be installed on those servers. The map that captures all of this information is known as the topology.
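One way to picture such a topology is as a simple graph, from core router down to applications. The Python sketch below is an assumption-laden illustration rather than the format any management tool actually uses: the element names, the `LINKS` list and the `build_topology` and `downstream` helpers are all made up for this example.

```python
from collections import defaultdict, deque

# Hypothetical links: (element, element that depends on it).
LINKS = [
    ("core-router", "access-router-1"),
    ("core-router", "access-router-2"),
    ("access-router-1", "switch-1"),
    ("access-router-2", "switch-2"),
    ("switch-1", "server-a"),
    ("switch-2", "server-b"),
    ("server-a", "billing-app"),
    ("server-b", "mail-app"),
]


def build_topology(links):
    """Group the links into a map of element -> directly attached elements."""
    topology = defaultdict(list)
    for parent, child in links:
        topology[parent].append(child)
    return topology


def downstream(topology, element):
    """Breadth-first walk listing everything that depends on `element`."""
    seen, queue, order = set(), deque([element]), []
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


topology = build_topology(LINKS)
print(downstream(topology, "access-router-1"))
# ['switch-1', 'server-a', 'billing-app']
```

With the relationships recorded this way, a question such as "what is affected if this access router fails?" becomes a walk over the map.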
Data centers are now in a constant state of flux, since infrastructure continually changes to meet business needs. The management tool should be able to discover all infrastructure elements effectively the first time. It should then keep the topology updated on a periodic basis (or as and when required), so that it is an exact replica of what has been deployed in the data center.
After the network management tool rollout: While RCA can achieve many wonderful things, it also introduces overhead into a network. This overhead is created during discovery of the network elements that have to be managed. To counter it, schedule the discovery runs so that discovery queries do not overload the network. You can also define a boundary, so that only a certain area needs to be discovered. That is one way of reducing the discovery time.
About the author: The director of sales for Ionix in India, Rajesh Awasthi is responsible for EMC's Ionix software business in India and the SAARC region. He has worked on sales, business development, consulting, partnership management, alliance management and product management for telecom software, data network, IT infrastructure, disaster recovery and business continuity solutions. Awasthi has worked on quality process building and implementation for increasing efficiency and has also won several awards.
(As told to Jasmine Desai.)
This was first published in October 2009