Rationale Of Disaster Recovery Planning Information Technology Essay
The need for a disaster recovery (DR) plan becomes clear once we ask what a disaster is and what it can do to a business. DR preparedness can be a make-or-break matter: the lack of a backup plan can lead to the complete loss of a company or corporation. A case in point is the recent disaster in the Gulf of Mexico. BP’s unpreparedness hurt its bottom line not only through the expenses it incurred for the cleanup but also through the additional money the government took from BP for the environmental damage done to the region. The effects did not end there: the disaster also affected the livelihood and business prospects of the people living along the coast, especially in the fishing industry. The example is extreme, but the point is that if a substantial DR plan had been in place, the oil leak could have been fixed before the damage became so critical. A DR plan is therefore a critical part of the business: it is not needed on a daily basis, but it definitely needs to be in place.
An organization’s most vital asset in any circumstance is its data, so to protect its ability to function after a disaster, the organization needs to implement a disaster recovery plan. In areas such as insurance, manufacturing, banking, government, education, retail, IT and most small and medium enterprises, data plays a vital role in the functioning of the organization; it needs to be secured, and there should be a plan in place to recover from any sort of disaster. A disaster recovery plan helps the organization maintain business continuity and avoid disruption to customers and to business processes in general. When implemented, a disaster recovery plan ensures the following:
Minimizing potential economic loss
In the event of a disaster, a disaster recovery plan minimizes the financial loss to the company. Consider banking: if an earthquake destroys a bank’s only server, then apart from the loss of valuable data, the bank’s share value could also decline. A disaster recovery plan is needed to reduce such potential economic loss.
Reducing disruptions to operations
A disaster usually causes damage that brings down the production environment and so interrupts operations. To overcome this, the disaster recovery plan implements a strategy that ensures at least one of the production servers remains safe in a disaster, so that operations are not interrupted.
Providing an orderly recovery
The disaster recovery plan also ensures that data is recovered in an orderly manner. This is crucial in industries such as banking, insurance and retail, whose systems record series of transactions that must be recovered in the order in which they occurred to preserve the integrity of the system. Apart from the reasons stated above, several other reasons lead me to support implementing a disaster recovery plan in an average business. They are as follows:
Protecting the assets of the organization
Minimizing legal liability
Minimizing insurance premium
Decrease in terms of potential exposure to losses or other disastrous outcomes
Reduction in the disruptions of day to day operations
Ensuring organizational stability
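The orderly recovery described above, in which transactions are restored in the order they occurred, can be illustrated with a minimal Python sketch. The `Transaction` fields, the `replay` function and the rule that a balance may not go negative are all assumptions made for illustration; they are not taken from any particular banking system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    sequence: int   # hypothetical field: position in which the transaction originally occurred
    account: str
    amount: int     # positive = deposit, negative = withdrawal


def replay(transactions, opening_balances):
    """Rebuild account balances by applying transactions in their original order."""
    balances = dict(opening_balances)
    # Sort by sequence number so recovery does not depend on the order
    # in which the backup happened to store the records.
    for tx in sorted(transactions, key=lambda t: t.sequence):
        new_balance = balances.get(tx.account, 0) + tx.amount
        if new_balance < 0:
            # Assumed integrity rule: an account may never be overdrawn.
            raise ValueError(f"transaction {tx.sequence} would overdraw {tx.account}")
        balances[tx.account] = new_balance
    return balances


# The backup lists the withdrawal before the deposit that funded it.
recovered_log = [Transaction(2, "A", -80), Transaction(1, "A", 100)]
print(replay(recovered_log, {"A": 0}))
```

Applying the same records in the stored order instead would attempt the withdrawal first and violate the integrity rule, which is why the order of recovery matters.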
Now consider that FDU is in a very tight financial situation and does not have extra funds. Our revenue is less than our expenses. Therefore, we are in a minus category. How can we justify spending additional funds to plan for a disaster which probably will never happen? (13 points)
Although FDU is a university, it can be considered a major business or service provider that offers educational services to its customers (students), so it definitely needs a well-planned disaster recovery plan. The deficit in its budget is a concern, but the need for the DR plan cannot be stressed enough. First of all, most university functions, such as classes, administration and security, rely heavily on IT and network infrastructure, and the university’s data and records are stored and maintained on network servers.
In an ideal world with unlimited budgets, the IT planning team at FDU would spend significantly to ensure that employees and customers always have access to business systems and important information, and many organizations do allocate significant portions of their IT spending each year to operational resilience. FDU, however, has probably never experienced a natural disaster, a security threat or a serious human error, apart from the power outage suffered in the winter of 2009-2010, so there is a major struggle to justify spending on a disaster recovery plan for the university.
But disaster recovery spending is insurance against the risks of user downtime, data loss and business interruption. Life insurance, health insurance and homeowners insurance are pretty much a given, yet it is always difficult to assess how much coverage is enough and how much to spend. In the same way, every organization knows it needs some level of protection, but determining how much to spend is always a challenge.
A further reason for FDU to put a disaster recovery plan in place is that it is an educational institution with operations not only at two locations in New Jersey but also in Vancouver, Canada. As the largest privately funded university in the state of New Jersey, FDU also has a database holding the details of thousands of students and alumni spread all over the world. Moreover, some of its major educational and administrative services run on the university’s systems and servers; if these were affected, the university’s ability to function would be critically impaired, and recovery from a major disaster or catastrophe would take a long time.
Keeping in view its mission-critical areas of operation and its foreign campus in Vancouver, Canada, FDU should ensure that student data is safe in case of an unexpected disaster. To help determine and justify how much disaster recovery spending is needed, the IT team at FDU should ideally perform a case-by-case risk analysis and then work through those scenarios step by step:
Assess the downtime costs for crucial business systems.
Calculate the potential disaster risks and the corresponding impacts.
Compare alternative plans, then determine the benefits of each proposed solution and how much spending is enough.
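The three steps above can be sketched as a small calculation. This is a minimal sketch, not FDU’s actual method: it uses the common risk-analysis convention that the expected annual loss equals the loss from a single event multiplied by the expected number of events per year. All names and figures below (`Risk`, `worth_buying`, the hourly cost, the plan price, the reduction factor) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    downtime_hours: float   # assumed outage length if the event occurs
    cost_per_hour: float    # assumed cost of one hour of downtime
    annual_rate: float      # assumed number of occurrences per year


def single_loss(risk):
    """Step 1: downtime cost of one occurrence."""
    return risk.downtime_hours * risk.cost_per_hour


def annual_loss(risk):
    """Step 2: expected loss per year for this risk."""
    return single_loss(risk) * risk.annual_rate


def worth_buying(risks, annual_plan_cost, reduction):
    """Step 3: a plan pays off if the loss it avoids exceeds what it costs.

    `reduction` is the assumed fraction of expected loss the plan removes.
    """
    baseline = sum(annual_loss(r) for r in risks)
    return baseline * reduction > annual_plan_cost


# Hypothetical figures: a two-day outage expected once every four years.
outage = Risk("power outage", downtime_hours=48, cost_per_hour=5000, annual_rate=0.25)
print(annual_loss(outage))                      # expected yearly loss
print(worth_buying([outage], 20000, 0.5))       # does a 20,000-per-year plan pay off?
```

Running the comparison once per candidate plan gives a like-for-like basis for deciding how much spending is enough, although the result is only as good as the estimates fed into it.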
After a thorough analysis of all the situations and possible outcomes, the team should be able to mitigate a significant amount of the risk while delivering a cost-effective solution. However, it is important to remember that disaster recovery solutions are not selected on ROI measures alone. It is extremely important to examine the financial and business impact of a potentially disastrous event. Understanding the nature of the risks is crucial, and it is a good first step towards determining the level of protection needed and demonstrating the business value of such an investment. While disaster recovery solutions can be costly, the risks of not having proper protection in place could be devastating for a company. FDU should therefore go ahead and buy a generalized DR plan. Spending huge amounts on developing and implementing an in-house disaster recovery plan would not be wise, especially when products and applications already available on the market can be customized and extended as funds become available.
Keeping in view the financial and human resources that would have to be allocated to developing a disaster recovery plan, I suggest opting for a pre-designed disaster recovery plan, with small modifications and additions as needed.
This question involves identifying functions from chapter 4.
Explain the concept of “cloud computing”. You can look this up on the Internet but your description must be written in your own words. Your answer must be substantial. (12 points)
Cloud computing involves delivering hosted services over the Internet. These services are broadly divided into three categories:
Infrastructure-as-a-service
Platform-as-a-service
Software-as-a-service
The name was inspired by the cloud symbol that is used to represent the Internet in flowcharts and diagrams. A cloud service has three characteristics that differentiate it from traditional computing. It is sold on demand, by the minute or the hour. It is elastic: a user can have as much or as little of the service as they want at any given time. And the service is fully managed by the provider. Innovations in virtualization and distributed computing, improved access to high-speed Internet and a poor economy have accelerated interest in cloud computing.
Cloud computing is a concept in which network resources or services are delivered virtually over the Internet. Under this concept, companies pay only for the time or the amount of service they actually use, which amounts to large savings for small and medium companies alike. The simplicity of cloud computing makes it easy to use without special expertise or extra expenditure on maintaining the services.
Cloud computing relieves customers of the need to own physical infrastructure such as servers and so reduces the money invested in infrastructure. Companies instead pay a third party that organizes and maintains the cloud. As the above figure illustrates, the cloud consists of infrastructure, and different nodes use the infrastructure present in the cloud. Clients are often charged for the services they use or by subscription, though how clients are charged depends on the third party. Because peak access times are often shared among clients, good response time is ensured by increasing bandwidth. Cloud computing has three major advantages. First, it is very cost-effective, since the cost to a company of using infrastructure as a service is far lower than that of designing and developing its own infrastructure. Second, it reduces staffing, since the company does not need data centers or the people who maintain them. Third, the cloud scales on demand, which makes it easy to expand and add resources.
Like every methodology, cloud computing has its pros and cons. Because users do not have their own physical storage, they must rely completely on the third-party provider, which becomes responsible for the integrity and security of the data. Another argument strongly made is that the concept does not give users the freedom to install applications, though a major part of the industry is still inclined towards cloud computing.
Infrastructure-as-a-Service provides virtual server instances with unique IP addresses and blocks of storage on demand. Customers use the provider’s application program interface to start, stop, access and configure their virtual servers and storage. In the enterprise, cloud computing allows a company to pay for only as much capacity as is needed and to bring more online as soon as required. Because this pay-for-use model resembles the way electricity, fuel and water are consumed, it is also referred to as utility computing.
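The pay-for-use idea can be made concrete with a small comparison. The sketch below is only an illustration under simplifying assumptions: the hourly usage figures and the rate are invented, and owned capacity is assumed to cost the same per server-hour as rented capacity, which real pricing will not match exactly.

```python
def pay_per_use_cost(hourly_usage, rate_per_server_hour):
    """Cloud-style billing: pay only for the servers actually running each hour."""
    return sum(hourly_usage) * rate_per_server_hour


def fixed_capacity_cost(hourly_usage, rate_per_server_hour):
    """Owned infrastructure: capacity must cover the peak hour, all of the time."""
    peak = max(hourly_usage)
    return peak * len(hourly_usage) * rate_per_server_hour


# Hypothetical load: two servers most of the time, ten during one peak hour.
usage = [2, 2, 10, 2]
print(pay_per_use_cost(usage, 0.5))     # billed for 16 server-hours
print(fixed_capacity_cost(usage, 0.5))  # billed for 40 server-hours
```

The gap between the two totals grows with the difference between peak and typical load, which is why elastic capacity appeals most to organizations with uneven demand.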
Platform-as-a-service in the cloud is defined as a set of software and product development tools hosted on the provider’s infrastructure. Developers create applications on the provider’s platform over the Internet. PaaS providers may use application program interfaces, website portals or gateway software installed on the customer’s computer. Force.com (an outgrowth of Salesforce.com) and GoogleApps are examples of PaaS. Some providers will not allow software created by their customers to be moved off the provider’s platform.
In the software-as-a-service model, the vendor supplies the hardware and the software product and interacts with the user through a front-end portal. SaaS is a very broad market. Services can be anything from Web-based email to inventory control and database processing. Because the service provider hosts both the application and the data, the end user is free to use the service from anywhere.
Would “cloud computing” help or hurt in developing a DRP? Explain your answer thoroughly. Include in your answer a company’s assets and employees. (13 points)
Cloud computing would certainly help make a DRP more efficient. Because cloud computing places the entire responsibility on the third party that runs the infrastructure, it is the provider’s responsibility to ensure the safety of the data. The third-party vendor therefore needs a highly efficient disaster recovery plan of its own, since it is responsible for the data of several clients. So although cloud computing makes it easy to manage data at a centralized location, it also makes that location much more critical.
Organizations follow several different systems to identify their assets, and each company has its own. Some of the critical assets that a company needs to identify are as follows.
Hardware is the first type of asset one would identify in an organization. In developing a DRP, hardware assets such as servers are given first preference in identification. Bar code reading is one of the popular methods for identifying hardware assets: whenever an operation is performed on the hardware, the bar code is read and an entry is made in the meta database.
Software is the second type of asset to be identified. Once assets such as the operating system and the enterprise database system are identified, important internal resources such as code segments are recognized. In a disaster the company often depends on several software components that help in the recovery, so maintaining an inventory of the software applications installed on each system helps in recovering critical applications.
The next important asset of an organization is its data: the entire functioning of the organization depends on it, and in a disaster it is the most vital asset to recover. A further asset is the organization’s human assets, its employees. Unlike hardware or software, human assets tend to change, so the organization must keep its record of current employees up to date so that the information to be recovered after a disaster is current.
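The four asset types discussed above, together with the practice of logging each bar code scan, could be recorded in a structure like the following. This is a minimal sketch under assumed names: `AssetType`, `Asset`, `Inventory` and the example identifiers are hypothetical and do not describe any real inventory product.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetType(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    DATA = "data"
    PERSONNEL = "personnel"


@dataclass
class Asset:
    asset_id: str                 # e.g. the value encoded in the bar code
    name: str
    kind: AssetType
    history: list = field(default_factory=list)  # one entry per recorded operation


class Inventory:
    """In-memory stand-in for the database that scan entries are written to."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.asset_id] = asset

    def record_scan(self, asset_id, operation):
        # Called whenever an operation is performed on a scanned asset.
        self._assets[asset_id].history.append(operation)

    def by_type(self, kind):
        return [a for a in self._assets.values() if a.kind is kind]


inventory = Inventory()
inventory.register(Asset("HW-001", "database server", AssetType.HARDWARE))
inventory.register(Asset("SW-001", "enterprise database system", AssetType.SOFTWARE))
inventory.record_scan("HW-001", "moved to rack 4")
print([a.name for a in inventory.by_type(AssetType.HARDWARE)])
```

Keeping personnel in the same inventory as hardware and software makes it easier to notice when the employee records have fallen out of date.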
This question involves establishing the disaster recovery team in chapter 2.
Discuss thoroughly the types of team members that the disaster recovery team should have. Explain the function of each team member. (12 points).
The disaster recovery planning team is the group responsible for developing a disaster recovery plan, and it also bears the responsibility of supporting and testing the plan before deployment. Depending on several factors, the team should be designed to include people who can fulfill the key roles listed below, which helps in developing an efficient disaster recovery plan.
The roles that must be filled in disaster recovery are those of Recovery Manager, Facilities Coordinator, Technical Coordinator, Administrative Coordinator, Network Coordinator, Applications Coordinator and Computer Operations Coordinator. In the case above, however, the bank is unable to allocate seven employees, so these seven roles would have to be filled by the three people available. In practice, though, it is nearly impossible for one person to perform two roles efficiently.
The above roles and the responsibilities that come with them are as follows:
Recovery Manager: A dynamic person who is good at both managerial and administrative tasks and who has broad knowledge of hardware as well as software functions. This knowledge should not be limited to the business functions of the company; it should also include specific knowledge of disaster recovery operations. The recovery manager needs to be a hands-on person with good problem-solving skills.
Facilities Coordinator: The skill level of the facilities coordinator is more or less similar to that of the recovery manager. The facilities coordinator carries less leadership and managerial responsibility but needs a more involved approach, closely monitoring the teams and their progress.
Technical Coordinator: This position requires very strong technical capabilities, with in-depth knowledge of different platforms and the ability to communicate easily with engineers and technical staff.
Administrative Coordinator: This position requires a person who is well aware of day-to-day business processes and business transactions and who has adequate knowledge of all business functions.
Network Coordinator: This position requires extensive expertise in designing and maintaining network systems and a good grip on diagnosing and correcting network errors and problems. Most important is the ability to understand the current network setup and, if need be, to replicate the existing network efficiently.
Applications Coordinator: This position requires extensive knowledge of the applications currently used by the business, and above all very good knowledge of mission-critical systems such as accounts receivable and payroll. This person should know how to deploy systems and should have experience of keeping them in proper working order.
Computer Operations Coordinator: This position requires proficiency in the day-to-day operation of the systems and system software. The person should be able to skillfully re-create production schedules and possibly implement new ones. The coordinator might also be held responsible for creating a temporary help desk in the case of a disaster.
Now, suppose we want to establish a disaster recovery team for FDU. We only have funds for three (3) individuals. How would you group the required team members into the three positions? (13 points)
The disaster recovery team at FDU would have to consist of dynamic people who can multitask under pressure and who have sufficient knowledge of more than one critical aspect of the business.
The facilities coordinator position needs skills similar to those of a recovery manager. The role demands that work be completed on schedule with a minimal amount of resources, and it carries responsibility for designing the requirements of a data center. The role of facilities coordinator could therefore also be assigned to the recovery manager in this scenario.
Furthermore, the broad skill set of a manager, which certainly includes awareness of the day-to-day operations of the business and the ability to deal with people, including skilled technical individuals, would also qualify him for the role of Administrative Coordinator.
The role of Computer Operations Coordinator, which calls for skill in the day-to-day operation of the systems and knowledge of help desk support, could be assigned to one or two people. Care could be taken to choose someone who also understands how networks function, so that the same person could be assigned the role of Network Coordinator if there is a lack of personnel for such a team.
Since the role of Technical Coordinator demands a strong skill set in establishing interfaces between applications developed on different platforms, this person would certainly know the applications used day to day in the company. The roles of Technical Coordinator and Applications Coordinator could therefore be assigned to the same person, and the planning process could be started.
Although the roles could be divided among three people, limiting the availability of the other two people to two or three hours a day would almost certainly lead to failure in developing a disaster recovery plan. Of the positions mentioned, only that of Recovery Manager can be fully justified; the other two can be justified only partially, because the work is part-time and depends on the employees’ skill sets and their own motivation to be part of such a critical team.
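The grouping proposed above can be written down and checked mechanically, so that no role is left unassigned or given to two people. The role names come from the list of seven roles; the labels "Person 1" to "Person 3" and the `check_assignment` helper are hypothetical and used only for illustration.

```python
REQUIRED_ROLES = {
    "Recovery Manager",
    "Facilities Coordinator",
    "Technical Coordinator",
    "Administrative Coordinator",
    "Network Coordinator",
    "Applications Coordinator",
    "Computer Operations Coordinator",
}

# The three-person grouping argued for in the text.
ASSIGNMENT = {
    "Person 1": {"Recovery Manager", "Facilities Coordinator", "Administrative Coordinator"},
    "Person 2": {"Computer Operations Coordinator", "Network Coordinator"},
    "Person 3": {"Technical Coordinator", "Applications Coordinator"},
}


def check_assignment(assignment, required):
    """Return (missing roles, roles assigned to more than one person)."""
    assigned = [role for roles in assignment.values() for role in roles]
    missing = required - set(assigned)
    duplicated = {role for role in assigned if assigned.count(role) > 1}
    return missing, duplicated


missing, duplicated = check_assignment(ASSIGNMENT, REQUIRED_ROLES)
print("missing:", missing, "duplicated:", duplicated)
```

A check like this confirms only that the roles are covered. It says nothing about whether each person has the hours or skills to carry them, which is the limitation the answer itself points out.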
This question involves identifying risks and categorizing them from chapter 3.
Explain the process and the need of identifying risks and determining how likely it is that the risk will affect the organization. Then list and explain about ten common risks. (12 points)
The objectives of risk identification are to identify and categorize the risks that could affect an organization’s processes and to document them. Proactive organizations identify risks beforehand and analyze them. Reactive organizations react after a problem has occurred and try to mitigate the issue before it gets worse. A proactive organization systematically charts all the risks the company could face if a disaster occurs, which helps it take immediate action when facing a seriously disruptive event. Proactive organizations have a dedicated disaster recovery team to identify and analyze risks, and they plan possible solutions in advance for the problems they may face. Reactive organizations, on the other hand, look for a solution to a problem that has already occurred: the damage is done, and they need to avoid the worst-case scenario. Reactive organizations also identify risks, but not in as much detail or as comprehensively as proactive organizations, because the work is done at the last minute. Risk identification thus differs between the two in that time is a huge constraint on a reactive organization’s analysis and decisions.
Organizations vary in the rate at which they respond to organizational problems, even when they have similar task environments. A proactive organization engages in information gathering and decision making whenever possible. A reactive organization waits until it is compelled to gather information and make decisions. Proactive and reactive organizations spend the same amount of time on a single search; the difference between them lies in their coordination. Proactive organizations react to organizational problems faster than reactive organizations.
The consequence is that proactive organizations outperform reactive organizations. Proactive organizations are more active, more prepared and more cooperative, and they have better performers. It is important that organizations make accurate and timely decisions at the time of a disaster. In many cases time pressure causes errors through loss of information. When time is short, proactive organizations have the advantage because they are prepared and ready to make a decision, whereas reactive organizations spend precious time on problem solving and decision making. But when timing is not a crucial factor, reactive organizations can solve their problems more economically, as they need less training and have lower information processing costs. Various factors help determine whether an organization should be proactive or reactive, such as task environment, stress and organizational design, and the effect of time pressure is crucial in both kinds of organization. Proactive organizations treat their data like a corporate asset. They think globally across the enterprise and act collectively as a unified group. Moving to the proactive stage is very difficult, and not because of technology: it is people, politics and cultural shifts that can make or break a proactive organization. Once there, the organization can concentrate more on process because its data is handled and monitored.
Being proactive or reactive depends on potential business strategy based on the situation. Proactive means taking steps to contain situations for the long term. It demands that one should analyze the situation thoroughly and then identify alternatives that are best suited for the organization.
The most common risks which would warrant a Disaster Recovery plan are:
Fire
Water logging or flooding
Theft
Intrusion
Malicious intent
Human error
Software failures
Hardware failure
Power outage
Terrorist attack
Network hacking
Now, identify at least ten risks that are most likely to affect FDU. For each risk come up with a rating system. Use H, M and L. H means highly likely, M means moderately likely and L means least likely. Explain and justify your rating of each risk. (13 points)
The risks faced by FDU are as follows:
Event: Rating:
Fire H
Fire is a major hazard that could occur at any time, and a fire that destroyed the entire infrastructure would qualify as a major disaster.
Water logging or flooding M
In case of water logging or flooding, equipment might be affected depending on its location and the kind of water hazard, so this is a moderate-to-low hazard. Two days ago, some towns in northern New Jersey were evacuated because of flooding, and water levels were higher than normal.
Theft L
Theft is very unlikely but remains a remote possibility, so it is rated least likely.
Intrusion L
Intrusion could be for any purpose and could be carried out by a disgruntled employee or by anyone looking to get back at the university; it is rated least likely.
Human error M
Human error is a possibility even though employees know their work well and are well trained in all systems and processes. A single error or mistake could still jeopardize the system, so this threat is rated moderately to less likely.
Software failures L
Software failures are a possibility: although most systems are thoroughly tested, there remains a remote chance of a software failure, hence the rating of least likely.
Hardware failure L
Hardware failures are not everyday occurrences, but they remain possible, which gives them a rating of least likely.
Power outage H
An electrical outage may seem unlikely in NJ, but it has happened before. I remember that during the winter storm of 2009 many cities were without electricity for a couple of days, and people even had to stay at hotels. This affected FDU because it depends on electricity for classes, computer labs, servers and buildings. It was an inconvenience for everyone. This point also ties in with my next point, the terrorist attack.
Terrorist attack L
When the country declares a terrorist attack, it is important to shut everything down for everyone’s safety. An attack is unlikely to happen to the university, but we should consider it because we are close to NY. I also remember an incident in 2005 when an electrical outage happened because of problems in the electric company’s computer system. It lasted for almost 40 or 50 minutes, but the scary thing was that it affected not just NJ but all the states from the East Coast to the West. People started to panic, and there was chaos, because they thought they were under attack.
Network hacking H
The university’s network is very important, so it is highly recommended to protect it from any interference from inside or outside the university. For example, students could hack the database and change their grades or otherwise tamper with the data, so we should take this matter into consideration.
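The H, M and L ratings above can be kept in a simple risk register and sorted so that the most likely risks are reviewed first. The ratings are copied from the list; the `RATING_ORDER` mapping, the `prioritized` helper and the alphabetical tie-break within a rating are assumptions added for illustration.

```python
# Lower number = higher priority.
RATING_ORDER = {"H": 0, "M": 1, "L": 2}

FDU_RISKS = {
    "Fire": "H",
    "Water logging or flooding": "M",
    "Theft": "L",
    "Intrusion": "L",
    "Human error": "M",
    "Software failures": "L",
    "Hardware failure": "L",
    "Power outage": "H",
    "Terrorist attack": "L",
    "Network hacking": "H",
}


def prioritized(risks):
    """Sort risks by rating (H, then M, then L), alphabetically within a rating."""
    return sorted(risks.items(), key=lambda item: (RATING_ORDER[item[1]], item[0]))


for name, rating in prioritized(FDU_RISKS):
    print(f"{rating}  {name}")
```

Sorting by likelihood alone is a simplification. A fuller register would also weigh the impact of each event, since a least-likely risk such as a terrorist attack could still be the most damaging.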