Management 101 says, “If you can’t measure it, you can’t manage it”; disaster recovery 101 puts it somewhat differently: “If you don’t test it, it’s not really a disaster recovery plan”.
To be frank, when a disaster happens and the power outage drags into its second day, that is no time to discover that some component of the recovery plan isn’t performing!
The only way to ensure that there are no nasty surprises is to document the recovery procedures and schedule regular testing of the entire business continuity plan, including the IT disaster recovery portion. Care has to be taken to ensure that the IT disaster recovery plan, usually the preserve of the IT department, is aligned with the overall business continuity plans that cover the business processes and people.
At a practical level, business continuity testing is often performed on a unit-by-unit basis, but it also makes sense to test the effectiveness of the IT disaster recovery plans as a whole; in fact, IT testing can be run separately from the business-unit tests. Adopting the previous blog’s suggestion of using virtualisation to mirror the entire production environment, such a test could be performed without any disruption to the normal running of the company. Indeed, switching between the production and disaster recovery environments would become a normal routine.
One additional point needs to be made. Disaster recovery needs always to cover loss of data, but there are instances in which recovering from loss of service might not make sense—for example if clients are impacted by the same power outage. Unless they have effective disaster recovery plans in place, they would not be able to trade with you, and so the best option might be to remain “down” until power is restored.
In conclusion, by focusing on the current power challenges facing South African businesses, our intention is not to be unduly alarmist. As already stated, we believe the threat of a national blackout is very small indeed. However, we do think that prudent risk mitigation has to take account of the possibility of power blackouts that are longer than the usual four hours, and that could affect entire regions. Such eventualities will need creative thinking, and we have tried to give some insight into the thought processes companies should be following.
As always, please feel free to contact ContinuitySA to discuss how to put a solid, practical business continuity plan in place.
Our last blog alluded to a key principle of disaster recovery: make sure you are in control. This doesn’t mean that you can’t outsource to a specialist provider of disaster recovery services (indeed, for many companies this makes good business sense) but it has to be done intentionally and with adequate planning. It can’t just be presumed to be a sort of byproduct of, for example, cloud computing.
Another point to make about the cloud is that clients are usually not allowed to run their own virtualised environment on top of the cloud provider’s virtual infrastructure; in most cases, doing so is considered a violation of the terms of service. Typically, a cloud provider offers a virtual operating system on which clients install their applications. This means that the client has basic control of the operating system and full control of what runs on it, while the provider controls all the underlying functions.
The reality is that using the cloud for disaster recovery means less control over the finer details like adequate data protection.
By contrast, virtualisation means the client controls the whole stack, from tin to app. As we indicated in the first blog in this series, we believe that new technologies hold the key to developing a highly practical and affordable IT disaster recovery solution. Companies wanting to manage their own disaster recovery arrangements should first of all look at the open source virtualisation solutions that currently exist. It’s possible to buy a licence for around R400 to R500 per CPU from a company that offers support, making this option highly affordable.
As part of this virtualisation strategy, one could rent a server in a local data centre and another in a data centre in another area—for example, one in Johannesburg and one in Cape Town—and then replicate apps between the two sites. It’s actually not a very complex procedure at all.
These data centres should have audited, transparent diesel reserves or other contingency plans to cope with extended power outages. In fact, given the availability of affordable and plentiful bandwidth, some companies might even consider a third server overseas, in a country where power supplies are stable. This might put one on the wrong side of some data-privacy legislation, but if the risks seem to justify such a step, it might be worth investigating this option.
Such a structure would be relatively simple to set up. It would have the great advantage of allowing the company to switch between the disaster recovery and production environments. Because the production environment is mirrored on the disaster recovery sites, losing power at the production site would entail hardly any downtime for the IT systems—and no loss of data, depending on how often changes are replicated.
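The caveat “depending on how often changes are replicated” is really the recovery point objective (RPO) in disguise: the worst-case data loss is the gap between the last successful replication run and the failure. A small illustration (the timestamps are assumed figures):

```python
from datetime import datetime, timedelta


def data_loss_window(last_replication: datetime, failure_time: datetime) -> timedelta:
    """Changes made after the last successful replication run are lost,
    so the loss window is simply the gap between the two events."""
    if failure_time < last_replication:
        raise ValueError("failure cannot precede the last replication")
    return failure_time - last_replication


# Hourly batch replication with a failure 45 minutes after the last run
# means up to 45 minutes of transactions are gone.
loss = data_loss_window(datetime(2015, 5, 18, 14, 0),
                        datetime(2015, 5, 18, 14, 45))
```

Continuous (synchronous) replication drives this window towards zero, at the cost of more bandwidth between the sites.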
Of course, the question of work-area recovery for the staff at the production site is something that would also have to be considered as they too would have been impacted by the extended power outage.
Next time, to conclude this series for Business Continuity Awareness Week, a few thoughts on testing and when disaster recovery makes no business sense.
It’s very easy to think that data and/or systems that are in the cloud are necessarily disaster-proof. After all, a professionally run data centre would have state-of-the-art disaster recovery in place, right?
Not necessarily, and its processes might not cover its clients’ data adequately. Scrutinise the fine print very carefully. In our experience, it’s usually the client’s obligation to perform backups. Clients also have to rely on the data centre’s virtualisation, which might not be mirrored to a backup site. If it is—it’s usually an add-on purchase—it’s wise to be very vigilant about how and when your system is mirrored.
Another point to make is that clients cannot specify where in the cloud their data is housed—it might be physically located in the same region and thus subject to the same power problems.
This might not have been much of an issue in the past, when power outages were typically just a few hours long. Now, however, one’s plans must take into account the possibility of outages running into days, and the typical cloud contract does not stipulate in which of the provider’s data centres your particular data will be housed. From a disaster recovery point of view, relinquishing control over where your application actually runs means compromising your survival capability in an era of power instability.
Imagine the frustration of not being able to request your provider to host your application in another province or even country to reduce the risk of power outages—simply because the contractual terms do not allow it.
In fact, we challenge cloud providers to make public the diesel reserves they hold at each of their data centres.
Another related challenge that affects all network-based ICT, including cloud, is the fact that many of the telcos’ relay stations do not themselves have adequate contingencies in place to deal with a prolonged power outage.
Next time, we’ll consider how to put in place a disaster recovery plan that includes a practical response to the current power environment, and that does not cost too much.
Last time, we argued that tape backups had been superseded by newer, more reliable technologies; and that companies should be investigating the range of vendors of virtualisation technologies to get the best deal. Virtualised environments can be easily cloned to the disaster recovery site, and proper testing becomes practical (more on testing in a later blog).
Despite all the reasons for making the move, many companies still cling to tape backups. One reason for the inertia could be that auditors still look for evidence that tape backups have been made, and clearly some education of the auditing industry needs to be undertaken. Companies that do make the change should plan to explain the benefits to their own auditors and make sure they understand the difference between cloning and replication.
Another reason could be simply humanity’s natural conservatism—it always seems better to stick with the familiar. Again, a proactive approach is required, and adequate training of IT staff on the new technology will pay dividends.
The next red flag as regards the typical IT disaster recovery plan relates directly to the power problems mentioned in the previous blog. Most IT departments already have backup generators in place, but are they adequate? Many companies have not purchased “industrial-strength” generators so they cannot run for longer than a few hours—as noted earlier, while we don’t think a national blackout is likely, a regional one is a distinct possibility. Does your company have a generator that could run for longer than a day at a time, and enough fuel to do so?
One point to make here is that if the disaster recovery environment is virtual, then it is more than possible to switch over to the disaster recovery site at short notice and before the diesel runs out or the generator’s tolerance levels are reached—presuming the alternative site does have some form of power as well. Playing ping-pong between sites in this way would theoretically permit a household-type generator to be used successfully through a longer outage. Obviously, a proper DNS and routing design would be needed to avoid having to reconfigure end-user desktops and system interconnections every time you do it.
Clearly, it’s better to purchase the right generator!
In any event, it seems wise to ensure a three-day supply of diesel to allow for delays in sourcing it when one is competing with other businesses in the affected area.
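Sizing that reserve is straightforward arithmetic; the consumption figure below is an assumed example, not a rating for any particular generator:

```python
def diesel_reserve_litres(burn_rate_l_per_hour: float,
                          days: float,
                          safety_margin: float = 1.2) -> float:
    """Litres of diesel needed to run a generator continuously for the
    given number of days, padded by a safety margin for sourcing delays."""
    return burn_rate_l_per_hour * 24 * days * safety_margin


# e.g. a generator burning 20 l/h, run for three days with a 20% margin,
# needs roughly 20 * 72 * 1.2 = 1,728 litres on hand.
```

Running the same sum against the generator’s actual nameplate consumption quickly shows whether the on-site tank is anywhere near a three-day supply.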
Many seem to hold the view that the cloud offers a trouble-free way to insure against disaster. Next time, some thoughts about why that might be an over-optimistic viewpoint, especially when power systems are unstable.
Welcome to Business Continuity Awareness Week—our annual focus on business continuity within the context of contemporary issues. For us here in South Africa, it is well-timed as we begin coming to terms with a highly constrained power grid for the foreseeable future. Putting risk-mitigation strategies in place couldn’t be more important than it is now, as the odds of a severely disruptive power outage have just shortened rather dramatically.
We all now accept that regular load-shedding is part of the business landscape and has to be factored into business continuity plans. In extraordinary cases, a whole region could be without power for much longer than the normal few hours, and such an eventuality should also be factored into plans. However, we do believe that the chances of a national blackout are extremely remote.
Accordingly, our blogs this week will have power outages as a recurring theme, a kind of pressure test that all business continuity planning has to be able to withstand. In particular, extended power outages beyond the traditional two to four hours will affect businesses and their employees in multiple ways.
We will start the conversation with ICT because, let’s face it, most businesses today rely on it, be it their own systems, the Internet, cloud-based systems or telecommunications more generally. ICT, wherever it is consumed, runs on electricity, so an unstable grid-based power supply is a critical factor no matter how you look at it.
Let’s look first at the question of a company’s own systems, and what its IT disaster recovery plans look like.
The first red flag to be raised is the whole question of tape backups. We all know, or should know, that tape backups have a number of disadvantages. The whole process is highly manual, for one thing. It is usually performed at midnight when the system is quietest, and the tapes get sent off site (another whole set of risks). Most critical of all, tapes are rarely (if ever) tested, and a full disaster recovery test is hardly ever done because it takes so long to plan using this technology. When a disaster happens, restoring large amounts of data off untested tapes is a good excuse for going on sabbatical.
Help is at hand. Virtualisation technology has now matured to such an extent that using tapes seems slightly perverse. The ability to clone virtualised production environments at the disaster recovery site while the production systems are online means that one doesn’t have to wait for a quiet period to do backups. In fact a company’s systems never have to go down at all.
Virtualisation can be thought of as the “tape killer”. It’s often said that this technology is too expensive, and that may be true when it comes to the big brands. However, open source virtualisation software provides an extremely cost-effective and enterprise-ready alternative, provided a reputable company exists to support it—which in most cases is actually true: open source really has come of age.
In other words, big budgets are really not needed to meet business requirements for resilience—particularly when it comes to power outages. While some larger enterprises may still prefer the “safety” of going with a big brand, two-thirds of the economy is made up of SMEs, for whom open source is a definite answer—in virtualisation and elsewhere.
Next time, some more points on tape backups and a look at the impact of a power outage on other parts of the business.
Here are some factors to consider when assessing which supplier to choose:
Does it offer an end-to-end service? Many companies believe that implementing business continuity management is a once-off action, whereas it’s actually a process with its own life cycle; a provider that understands the cycle and how to progress from one stage to the next is preferable. Even if a company believes it just needs one thing (disaster recovery, for example), it’s as well to have a partner that understands the full picture. “Make sure that the partner you choose can provide the full range of services, from advisory right through to implementation of both the ICT and physical infrastructures (including an alternative site to work from),” advises Michael Davies, CEO of ContinuitySA, Africa’s leading provider of BCM services.
Another reason for choosing an end-to-end BCM integrator is that such a company will have experience across all the components of BCM, enabling the client to scope the required solution better, in line with the stage of the life cycle it has reached and with its business strategy and risk appetite.
Does it have sufficient experience in BCM and the right level of specialist skills? BCM as a discipline has evolved significantly from its beginnings as a way to recover from IT collapse. Since then, it has expanded to cover the business’s ability to continue servicing customers—how to ensure its business processes can be protected or reinstated.
“Now we are starting to talk even more broadly about business resilience—moving beyond simply understanding risks and putting contingencies in place to fine-tuning the business in the light of its risks so that it is less likely to suffer a setback, or can recover from one much more quickly,” Davies observes. “A company like ContinuitySA has been involved with that change as a member of the relevant industry bodies, so we understand it fully.”
A related point, says Davies, is whether the supplier adheres to and is accredited by the various BCM standards authorities, such as the Business Continuity Institute (BCI) and, of course, the International Organization for Standardization (ISO).
Is it technology- and vendor-agnostic? Technology plays a big role in providing contingencies against disaster—it’s vital the provider is independent and thus in a position to choose the right business continuity solution for the client’s needs, without any predisposition for a specific technology or vendor.
Are its solutions flexible, scalable and tested? By its nature, a disaster invocation takes place at a time of high stress, when failure is not an option. It’s worth enquiring how many times a potential provider has actually provided “last-resort” services.
“One of our proudest achievements is that a financial services client had to relocate its treasury and trading operations to our recovery site—and was able to record its best trading day ever while it was using our infrastructure,” says Davies. “Having credible intellectual property around BCM plus a genuine track record is really what one should be looking out for.”
Our business is all about helping our clients to survive disasters, so to some extent it’s just business as usual for us—our data centres and work-area recovery facilities are designed to keep operating in all circumstances. Because of the nature of our business, we have very stringent specifications, but nonetheless the measures we have in place might help other businesses come to some conclusions about what they should be doing.
All our facilities have alternative power in the form of UPSs and generators—our business can keep operating and, as important, so can the work-area recovery facilities we provide for clients.
UPSs and generators run in parallel, with automatic failover.
Our Midrand data centres have several diesel tanks, and diesel can be pumped to the various buildings as needed.
Diesel supplies are sufficient for four to five days of continuous use.
We are not reliant on local suppliers for refuelling. If needed, we have a mobile tank that can be put on one of our own vehicles to fetch our own diesel.
Our generators are on a full maintenance plan.
As regards telecommunications, we have multiple providers, so communication links can be failed over between them.
In addition, our Midrand and Randburg sites are connected by a dedicated fibre link, so if only one site can connect to the communications network, the other site can link through it.
We have multiple sites in Gauteng, KwaZulu-Natal and the Western Cape, as well as internationally, all on different parts of the power grid. This enhances our ability to help clients recover their ICT environments.
All our sites have auxiliary water tanks.
All of these measures make our sites more resilient to the benefit of our clients—but also could serve as prompts for what companies should consider putting in place at their own facilities. Power shortages are going to be a fact of business life for the foreseeable future: What’s your strategy?
In our previous blog, we examined some of the wider impacts of load-shedding that businesses should integrate into their thinking. Others include:
Impact on security. Most access control and building management systems rely on power to continue functioning so alternative power plans must include security and access control to avoid the business becoming a soft target during load-shedding.
Impact on society. One might argue that coping with load-shedding could create a kind of nation-building based on the feeling that “We’re all in this together”. More likely, particularly as the crisis drags on and on, is that existing social tensions will be exacerbated. When traffic is repeatedly disrupted thanks to non-working traffic lights, for example, incidents of road rage are likely to escalate, perhaps along with other forms of aggressive behaviour. Among these could be wildcat strikes and incidents of opportunistic looting, particularly when businesses start to shed jobs as they must surely do.
Once these inter-connecting risks are well understood and integrated in the business continuity plan, companies can put the appropriate backup power plans in place. These are likely to revolve around generators and uninterruptible power supplies (UPSs), both of which have their own requirements.
Here are some practical guidelines:
This blog series concludes with a look at what ContinuitySA is doing to mitigate the risks it and its clients face from load-shedding.
As Africa’s premier integrator of business continuity services, ContinuitySA’s team spends a lot of time identifying risks and how to mitigate them.
The first order of business is to understand the true nature of the risk—only then can we truly mitigate it. Consequently, we are advising all companies to undertake a thorough review of their business continuity plans—this will provide a structure through which to understand the impact of load-shedding on a particular business, what the company’s risk appetite is and thus what plans should be put in place.
Because electricity is integral to a modern society, load-shedding creates a complex and interdependent set of risks over and above the primary risk of the company’s being unable to trade. These risks need to be understood within the context of each business’s strategic plan.
Some of the wider risks are:
Impact on employees. Regular and extended outages will disturb family life in all sorts of ways, from transport difficulties to care of children and elderly relatives. Employers need to understand the impact on absenteeism and be empathetic to employees’ personal challenges.
Impact on vital services. Power outages are likely to affect water supplies periodically, and also telecommunications. Businesses can solve the water issue relatively easily by installing their own gravity-fed tanks that act as an emergency store replenished by the municipal water supply. More seriously, extended power outages could exceed what the battery backups at some telecommunications sub-stations can cope with, leading to interruptions in communications. In particular, the impact of load-shedding on ICT disaster recovery should not be ignored.
Impact on the supply chain. Power outages in other areas will not only affect your employees’ ability to get to work, but also the operations of suppliers and clients. Today’s supply chains are both long and complex, and many companies use just-in-time inventory systems. It’s thus imperative that companies understand the impact that load-shedding has on suppliers’ ability to meet their commitments, and what impact any defaults will have on their own operations. ContinuitySA believes that companies need visibility of their suppliers’ business continuity plans, and also to understand the impact that load-shedding could have on their clients’ demand (and ability to pay) for their services. For more, read Make 2015 the year for becoming resilient and also Outsourcing: Know your partners’ business continuity plans.
Next time, more on the less-obvious impacts of load-shedding.
Skills shortages. Specialist human resources are a feature of the health care industry but they are in short supply. An organisation can find itself reliant on certain individuals who cannot easily be replaced. Exacerbating factors include high attrition rates in the sector owing to low morale, and lengthy recruitment processes.
Crime. While all South African businesses face high risk from criminal activity, health care companies face a higher risk because goods, samples and specimens are often stolen from research and/or storage facilities, as they form critical evidence in impending court cases.
The loss of years of research findings, or the theft of patient information during a robbery, also constitutes a risk: the findings are often irreplaceable, and patient information can be highly sensitive in nature.
Inadequate buffer stocks. Industry standards stipulate that health care facilities must carry at least 30 days’ worth of buffer stocks. Factors that complicate compliance with these regulations include the limited shelf life of some products, and a lack of funds to build and manage stockpiles.