A continuation of sorts…

Credit: N. Hanacek/NIST

As we further consider the elements of the NIST CyberSecurity Framework (CSF) from Michael’s multi-part series, it is helpful to perform a deeper dive into the ideas of Respond and Recover (the last two elements of NIST CSF). If you haven’t read that series, you may want to check that out first: https://choosetcs.com/2022/01/19/nist_csf_guide/.

Business Continuity spans both Respond and Recover while, as its name suggests, Disaster Recovery is the plan to be used in the “worst day ever” type scenarios and lives in the Recover CSF category. Before we go further, I want us to stop for a second and nail down some basic terminology. You may be thinking at this point, “What is the difference between Business Continuity and Disaster Recovery?” Glad you asked. We often hear these used almost interchangeably, but they are distinct concepts. Even so, they go together like peanut butter and jelly: BCDR, like PBJ, is something we tend to think of as one thing. FEMA defines Business Continuity as “The ability of an organization to provide service and support for its customers and to maintain its viability before, during, and after a business continuity event.” Further, it defines the Business Continuity Plan as “Process of developing and documenting arrangements and procedures that enable an organization to respond to an event that lasts for an unacceptable period of time and to return to performing its critical functions after an interruption.”


And since we will later address Disaster Recovery, let’s consider the following definitions. A Disaster is “A sudden, unplanned calamitous event causing great damage or loss. In the business environment: any event that creates an inability on an organization’s part to provide essential products and/or services for an indefinite period of time.” And a Disaster Recovery Plan is defined as “The management approved document that defines the resources, actions, tasks, and data required to manage the technology recovery effort.”

If you are a regulated business, you must have these plans in place. If you are non-regulated (does that even exist these days?), you would be well served to have these plans in place anyway. Increasingly, TCS is seeing these requirements called for in underwriting Cyber Insurance policies, so this and other security & compliance risk reduction measures are not viewed as optional in today’s cyber threat landscape. And unless you are just looking for a different career path altogether, we cannot emphasize enough the necessity to invest the time to get this right. The oft-quoted statistic of “40%-60% of small businesses never reopen after a disaster” applies here.

The first step in solving any problem is recognizing there is one.

Your organization is at risk and you may not even know it. Do you have an up-to-date and tested Business Continuity Plan? If not, you may be missing critical details to keep your business running through a disaster. This is not a technology problem and is not the responsibility of your IT department (on staff or outsourced). This is a strategic imperative which must be owned from the top down. In short, it’s a business problem and risk reduction initiative.


The good news is that if you are reading this, you are most likely not trying to restore order from chaos due to a disaster. But this doesn’t mean you should be comfortable with the status quo. The calm BEFORE the storm is the best time to prepare. We often don’t see the disasters coming miles ahead.


TCS is not only experienced in developing and testing these plans, but in managing its clients through the worst possible events that can easily cripple a business: pandemics (we’ve got the T-shirt), ransomware/crypto locker (check), server room floods, power outages, you name it. And believe it or not, you don’t have to reinvent the wheel to put your plan together. That said, your plan will not be cookie-cutter and must address your specific requirements. TCS recommends taking advantage of our Compliance as a Service (CaaS) program to provide fixed-fee consulting support for this and other regulatory compliance needs.


Whether you engage with TCS or do this yourself, be sure to allocate regularly scheduled time week over week. This is not something that will be assembled in a day and the effort will become part of your ongoing business process, not simply a dusty document in a binder on the shelf. It could take a few months to get through this the first time, but the important thing is to make steady progress and not think of business continuity planning as a box to check. It will be an iterative process and you will revisit, test, and update the plan at least annually. So put on a pot of coffee, roll up your sleeves, and let’s go.

I’m from the Government, and I’m here to help!

DHS has a government-produced Ready.gov site with a useful Business Continuity Planning Suite.  It can be downloaded here: https://www.ready.gov/business-continuity-planning-suite.  When I first found this tool my thoughts turned to the famous President Ronald Reagan quote, “The nine most terrifying words in the English language are, ‘I’m from the Government, and I’m here to help.’” In this case the government is quite helpful.  This is a simple and effective tool and my next few articles will walk you through the process of developing your own Business Continuity Plan.

Now you could stop reading here and simply follow the steps outlined in the software.  It’s actually a straightforward, but lengthy, process, so plan to do this in bite-sized chunks and not all in one week.  The more thought and consideration paid to your business functions/data, personnel, and technology, the better aligned your plan will be with your needs when things hit the fan.  This series of articles will highlight where to slow down and pay attention and where shortcuts can be made.

There Is No “I” In Team

The steps for building your own Business Continuity and Disaster Recovery plans will be covered in more detail in upcoming posts.  A good idea for now is to assemble a small team for developing your plan and then you can divide and conquer the various tasks which we will outline later.  Also, a smaller organization will end up with more overlap of roles and fewer teams defined within the plan, but to get things started, a small group with an Executive/Owner sponsor should lead the effort.  This is a top-down strategic (company-wide) initiative and not something to be led from your IT group.  They will be instrumental from an operations standpoint, and will need to be involved in developing and (ultimately) executing the plan, but they will not have a complete view of your organization’s priorities, critical functions, and workflows.

Next week we will move on to installing the tool and familiarizing ourselves with the application so we can start making progress on developing the plans.

In wrapping up our series on the NIST CyberSecurity Framework (CSF), we come to the final of the five functions – Recover.  Just like the rest of these functions, the Recover function involves creating policies and procedures around recovering from a cybersecurity event.  In other words, when a cybersecurity event happens within your organization, how will your team restore normal operations?  The goal is to have a recovery playbook before an event occurs, so you can recover quickly and efficiently without stepping on forensics.

Having performed the first four functions, you’ve done most of the legwork already.  As such, your Recover function’s success will directly correlate to how well you’ve identified mission-critical resources in your business (Identify), defined how you will protect them (Protect), defined what mechanisms you will put in place to detect threats (Detect), and how you will respond to various threats (Respond) when they occur.

There are four steps to fulfilling the Recover function of CSF:

1. Planning for CyberSecurity

In the planning phase, it’s helpful to think about your company or organization in terms of function.  For instance, most companies have communications, production, administration, operations, etc.  By breaking down the company into those categories and subcategories, it’s easier to plan for recovery for each category.  This will also help you identify which personnel need to oversee the various categories of Recovery.  Naturally, a smaller organization may only have one or a small handful of people.  Larger organizations will want to be specific regarding which personnel have access and control over appropriate Recovery controls.

Due to the sensitive nature of the Recover process, you must take care to keep those plans and procedures confidential only to those who are responsible for implementing recovery procedures in response to events.  You don’t want your recovery procedures to get into the wrong hands only to have them work against you in the middle of a crisis.

Once you document your plan for recovering each function, and who will own the recovery process, your users will have an easy-to-reference guide for how to start recovery procedures.  Did the cybersecurity event impact communications?  If so, then proceed here.  Did it impact production?  Then proceed there.  Management should catalog these categories of business by priority, so the recovery plans and procedures leave no question regarding order of importance.
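For readers who keep this kind of catalog in a script or spreadsheet export, here is a minimal Python sketch of the idea: each business function gets a management-assigned priority, an owner, and a pointer to its procedure, and the impacted functions are returned in recovery order.  The function names, owners, priorities, and playbook section references below are hypothetical placeholders, not a recommended ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryEntry:
    function: str   # business function, e.g. "communications"
    priority: int   # 1 = restore first; assigned by management
    owner: str      # person accountable for recovering this function
    procedure: str  # where the step-by-step procedure is documented


# Hypothetical catalog; replace every value with your own.
CATALOG = [
    RecoveryEntry("production", 2, "Plant Manager", "playbook section 4"),
    RecoveryEntry("communications", 1, "Office Manager", "playbook section 2"),
    RecoveryEntry("administration", 3, "Controller", "playbook section 6"),
]


def recovery_order(impacted, catalog=CATALOG):
    """Return the entries for the impacted functions, highest priority first."""
    impacted_set = {name.lower() for name in impacted}
    hits = [entry for entry in catalog if entry.function in impacted_set]
    return sorted(hits, key=lambda entry: entry.priority)


# Example: an event that hit both production and communications.
for entry in recovery_order(["Production", "Communications"]):
    print(f"{entry.priority}. {entry.function}: {entry.owner} -> {entry.procedure}")
```

The point of keeping priority as explicit data is that nobody has to debate the order of importance in the middle of a crisis; the order was already decided when the catalog was written.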

2. Commit to Continuous Improvement

While it’s important to be as thorough as possible on the first run, you must see the Recover function as something that gets refined over time.  Not only will your company or organization leverage new technologies to fulfill its mission, but you are likely to miss or overlook critical procedures on the first few attempts. The best way to overcome this is to schedule times when your organization will focus on reviewing your recovery plans.  Maybe you can incorporate reviewing those plans in your regular admin meetings.

At TCS, we often refer to a client’s operational or security maturity level.  That should be how we approach security.  There is a maturation process that occurs with time, as business owners and managers grow in their understanding of threats and how best to perform all the functions of CSF.  Performing security drills to simulate common threats is helpful for learning where gaps exist between where you are versus where you ought to be (or even thought you were).  Then, plans can be made to close those gaps over time.

Here is a helpful illustration from NIST 800-184 regarding how the Recover phase helps inform and refine the previous functions of CSF:

Figure 3-1: NIST SP 800-184 Guide for Cybersecurity Event Recovery Relationship with the NIST CSF

3. Define Your Recovery Metrics

Recovery metrics are helpful for framing business resilience discussions.  One of the primary metrics is Recovery Time Objective (RTO).  How much time can your company afford to be down without causing significant negative impact upon the business?  Testing those recovery time objectives will be key to determining if you have the right solutions in place.

Another metric with regard to data recovery is Recovery Point Objective (RPO).  How much recent data, measured as time since the last good backup, can your company afford to lose?  While the Protect function should have already defined this, your Recovery drills will identify if your desired RPO and RTO are achieved by the technologies you are currently leveraging to minimize them.
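To make the two metrics concrete, here is a small Python sketch of how the results of a recovery drill could be scored.  It assumes the usual reading of the terms: downtime is measured from the start of the outage to the restoration of service and compared to the RTO, and the data-loss window is measured from the last good backup to the start of the outage and compared to the RPO.  The timestamps and the 4-hour and 1-hour targets are invented for illustration.

```python
from datetime import datetime, timedelta


def evaluate_drill(outage_start, service_restored, last_good_backup, rto, rpo):
    """Compare one recovery drill against the RTO and RPO targets."""
    downtime = service_restored - outage_start    # how long the business was down
    data_loss = outage_start - last_good_backup   # work done since the last backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss <= rpo,
    }


# Hypothetical drill with targets of 4 hours (RTO) and 1 hour (RPO).
result = evaluate_drill(
    outage_start=datetime(2022, 3, 1, 9, 0),
    service_restored=datetime(2022, 3, 1, 12, 30),
    last_good_backup=datetime(2022, 3, 1, 7, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(result["rto_met"], result["rpo_met"])  # True False
```

In this made-up drill the systems came back within the RTO, but two hours of work would have been lost against a one-hour RPO, which suggests the backup frequency, not the restore process, is the gap to close.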

Another example of a recovery metric is one that we hope we never have to confront – the point of no return.  At what point will your system be so compromised that it is beyond recovery?  When should you forklift the system and start over?  Defining that point could be the difference between surviving an event and completely folding.

Finally, what Service Level Agreements (SLAs) and Insurance policies do you have in place to protect your organization?  Where would one go to get that information?  Is it organized and catalogued into a single, accessible location?  Who has access to that location?  Who is responsible for ensuring SLAs are being achieved?  Is that data being reported to Management on a regular basis?  These are all questions you need to consider regarding recovery metrics.

4. Build Your Recovery Playbook

The Recovery Playbook is the result of all the Recovery planning.  Here is a summary of the tactical recovery plans that should be included in your Recovery playbook:

  • Defined mission-critical company assets (both personnel and technology)
  • Categorizations of all assets and their interdependencies
  • Identification and documentation of key personnel responsible for overseeing the recovery process
  • Protective measures to ensure an effective recovery, including how to identify the root cause of the threat to ensure false assumptions are not overlooked
  • Defined conditions under which the recovery process is initiated and by whom
  • Tactical plans for restoring in a clandestine way that doesn’t give away information to the adversary or destroy critical evidence
  • Examination plans for how the threat will be addressed by the appropriate personnel
  • Vetted and tested recovery capabilities
  • Scheduled time post-incident (as immediate as possible) to document lessons learned and how to better protect against similar threats in the future

Any time gaps are identified, either in response to a threat or in response to a simulated threat, you should update your Plan of Actions and Milestones document (PoAM) to determine how best to close those gaps and create timetables by which those gaps will be closed.  Performing a hotwash immediately following a security event will ensure key details concerning the threat will be considered in the PoAM.
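If your PoAM lives in a spreadsheet or ticketing system, the same idea can be sketched in a few lines of Python: every gap gets an owner and a due date, and open items past their date are surfaced for the next review.  The field names and sample entries are hypothetical and are not taken from any NIST template.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PoamItem:
    gap: str             # what was found missing or weak
    owner: str           # who is responsible for closing it
    due: date            # agreed date by which the gap will be closed
    closed: bool = False


def overdue(items, today):
    """Return open items whose due date has passed, oldest first."""
    late = [item for item in items if not item.closed and item.due < today]
    return sorted(late, key=lambda item: item.due)


# Hypothetical entries recorded after a drill.
POAM = [
    PoamItem("Backup restore never tested", "IT lead", date(2022, 2, 1)),
    PoamItem("No after-hours contact list", "Office Manager", date(2022, 1, 15)),
    PoamItem("UPS battery replacement", "IT lead", date(2022, 1, 10), closed=True),
]

for item in overdue(POAM, today=date(2022, 3, 1)):
    print(f"{item.due}: {item.gap} ({item.owner})")
```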

Conclusion:

As you can see, there are a lot of details and moving parts to the Recover function.  The planning will not be wasted time, though, especially if your business were to face a significant security event.  The goal, though lofty and unobtainable in today’s world, is to go through these exercises to eliminate (best case) all security incidents or (most likely) mitigate them when they do occur.  The time invested in these exercises of planning and documenting could very well determine whether your organization survives when an event occurs.

If you should need assistance with any of these security efforts, TCS would be honored to assist you.  Please contact us today, if you would like to discuss how we can bolster your security or operational maturity levels.  We can also help provide a second opinion regarding your current efforts to help you identify gaps to schedule on your PoAM document.  Either way, TCS would be thrilled to assist in any way possible.

Note:  This article was based upon the NIST 800-184 special publication.

The next logical step in the NIST CyberSecurity Framework is Respond.  In other words, how are you planning to respond when a threat to your organization is detected or realized?  The Respond function essentially sets forth the processes and procedures enacted for incident response, who will own the issue and oversee its execution, who will be engaged to perform the forensics to determine how the threat gained a foothold in the environment, and what steps should be taken commensurate with the risk inherent to the threat.

There are four aspects to the Respond function of CSF:

1. Response Planning

The goal in response planning is to enhance your business or organizational resiliency.  Here are some scenarios to consider that we hope would never occur but are likely enough to consider for planning.  We’ll start with a very likely incident.  What happens if your company loses power?   How long can the company network sustain a power outage before it becomes a critical incident?  What would happen if your major Cloud provider (Office365, QuickBooks Online, Kronos, etc.) went offline for a month or longer?  How would your organization respond?  Do you have a Business Continuity plan to cover instances like that?

How would your company be affected by a fire, flood, or tornado?  Would your clients and branches be able to maintain communications and business basics?  Do you have a Disaster Recovery plan that can cover that?

Of course, some of these issues are tangential to cybersecurity: they impact cybersecurity but may or may not be directly related.  What happens if an employee is tricked into opening an attachment that introduces ransomware to the entire network?  Or, what happens if one of your security controls indicates a persistent attack from a particular source? What happens if a disgruntled employee attacks the network from within the company?  Who is notified, who is responsible for mitigation and remediation, who needs to be alerted and when?  What is your Security Incident Response plan?  These are all things you need to consider.

Smaller organizations have the benefit of being able to pivot quickly and adjust to unforeseen situations.  Larger organizations require more thorough planning to survive and adapt to such events.  However, we all know that planning ahead of time makes these situations less stressful and easier to overcome.  If that weren’t true, EMA and the Military wouldn’t invest so much time in training and preparing their personnel for disaster response.  Be sure your response planning includes Business Continuity, Disaster Recovery, and Security Incident Response plans.

2. Communications

This article has already hinted at communications, but it is the key to overcoming any crisis.  Technology can help us here, since we all have a smartphone in our pockets; but how will you leverage those technologies in response to an emergency?  What do your personnel need to know and expect when normal avenues of communication are not an option?  How will you respond in such a way to maintain business as usual while not destroying evidence necessary for the authorities to forensically investigate the incident?  Who is going to notify the authorities and what authorities should be notified?  How will your clients get in contact with you?  How often will you test these plans to ensure you aren’t overlooking a critical roadblock?  When do you need to contact your cyber-insurance provider?

There are a lot of questions to consider, which is why leadership must make it a priority to plan out these scenarios.  Attempting to make these decisions on the fly will generate incredible chaos and likely will miss better options that would save the company time and money.  There are a lot of moving parts to cybersecurity incidents, and the more you plan before you need them, the better your organization will weather the storm of an attack.  Defining who communicates with whom and by when will mitigate a lot of unnecessary stress and chaos.
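One way to pin down “who communicates with whom and by when” is to write the plan as data.  The Python sketch below turns a list of relative deadlines into clock times once an incident is detected.  The roles and time limits are purely illustrative assumptions; your actual deadlines will depend on your insurance policy, contracts, and any regulations that apply to you.

```python
from datetime import datetime, timedelta

# Hypothetical plan: (who notifies, whom, deadline after detection).
NOTIFICATION_PLAN = [
    ("Incident owner", "Executive sponsor", timedelta(minutes=30)),
    ("Executive sponsor", "Cyber-insurance provider", timedelta(hours=2)),
    ("Executive sponsor", "Law enforcement", timedelta(hours=4)),
    ("Account managers", "Affected clients", timedelta(hours=24)),
]


def notification_schedule(detected_at, plan=NOTIFICATION_PLAN):
    """Turn relative deadlines into clock times, earliest first."""
    rows = [(detected_at + delay, sender, recipient)
            for sender, recipient, delay in plan]
    return sorted(rows)


# Example: an incident detected at 2:00 PM on March 1.
for deadline, sender, recipient in notification_schedule(datetime(2022, 3, 1, 14, 0)):
    print(f"{deadline:%Y-%m-%d %H:%M}  {sender} -> {recipient}")
```

Even if you never run it, writing the plan in this form forces the decisions (names, recipients, time limits) to be made before the crisis rather than during it.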

3. Analysis

It’s difficult to talk about one aspect of Response without alluding to others.  We’ve mentioned forensics already, but forensics needs to be planned for in the communications stage of an incident response plan.  Once planned, the forensic work then has to actually be carried out.

If you have a cyber-insurance policy, today’s policies often cover forensics up to a certain amount.  Depending on your insurance provider, they may want you to notify them (communications again) before doing anything, because they want to ensure the proper authorities are involved before you make changes that will negatively impact their ability to forensically identify how the attack occurred, who was responsible for it, and what can be done to mitigate that threat in the future.

If you have an IT department, you need to have some means for them to perform their analysis from a read-only snapshot archive.  This enables analysis to be performed without tampering with or contaminating digital evidence.  This is where your Protect function comes into play.  Those enhanced logging and archiving measures developed and implemented will help both internal and external sources get to the bottom of the issue.

4. Mitigation

Finally, once you’ve identified various threats, it is important to have a plan for isolating those threats from doing any further damage to your organization.  For instance, TCS has the ability to immediately isolate a computer from the network as soon as ransomware is detected on it.  This effectively enables us to limit the threat exposure to our clients, but ransomware is only one of many threats to our clients.

Different kinds of threats pose different mitigation complications depending on the type of threat.  Planning ahead to determine how different threats can be isolated and contained as quickly as possible will help you recover faster with less negative impact to your organization.

Conclusion:

As you can see, the further we get into the functions of CSF, the easier they get.  All that front-loading work at the beginning (identifying the various types of threats, performing risk analyses, implementing protection measures, and developing policies and procedures for how personnel will perform critical tasks) makes it much easier to respond to emergent issues.

That being said, there are a lot of moving parts to the incident response plan. If you find that you are overwhelmed by the magnitude of incident response planning and need some consulting or even compliance assistance, please reach out to TCS today!  We’d be honored to help you work through these issues and have the best plan possible for your organization to weather just about any storm short of a zombie apocalypse.

Note: This article was based on the resources available at https://www.nist.gov/cyberframework/respond