Recover – The Final Function of CSF
In wrapping up our series on the NIST Cybersecurity Framework (CSF), we come to the last of the five functions – Recover. Like the other functions, Recover involves creating policies and procedures, in this case for recovering from a cybersecurity event. In other words, when a cybersecurity event happens within your organization, how will your team restore normal operations? The goal is to have a recovery playbook in place before an event occurs, so you can recover quickly and efficiently without destroying forensic evidence.
Having performed the first four functions, you’ve done most of the legwork already. As such, your Recover function’s success will directly correlate to how well you’ve identified mission-critical resources in your business (Identify), defined how you will protect them (Protect), defined what mechanisms you will put in place to detect threats (Detect), and defined how you will respond to various threats when they occur (Respond).
There are four steps to fulfilling the Recover function of CSF:
1. Planning for CyberSecurity
In the planning phase, it’s helpful to think about your company or organization in terms of function. For instance, most companies have communications, production, administration, operations, etc. By breaking down the company into those categories and subcategories, it’s easier to plan recovery for each one. This will also help you identify which personnel need to oversee recovery for each category. Naturally, a smaller organization may have only one person or a small handful of people to cover every category. Larger organizations will want to be specific about which personnel have access to, and control over, each set of recovery controls.
Due to the sensitive nature of the Recover process, you must take care to restrict those plans and procedures to the people responsible for implementing recovery procedures in response to events. You don’t want your recovery procedures to get into the wrong hands only to have them work against you in the middle of a crisis.
Once you document your plan for recovering each function, and who will own the recovery process, your users will have an easy-to-reference guide for how to start recovery procedures. Did the cybersecurity event impact communications? If so, then proceed here. Did it impact production? Then proceed there. Management should catalog these categories of business by priority, so the recovery plans and procedures leave no question regarding order of importance.
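As a rough illustration of that prioritized catalog, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `RecoveryPlan` record, the `recovery_order` helper, and the functions, owners, priorities, and playbook sections listed are placeholders, not recommendations from NIST or from this series.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RecoveryPlan:
    function: str   # business function, e.g. "communications"
    priority: int   # 1 = restore first; the ranking is set by management
    owner: str      # role that owns the recovery process (hypothetical)
    procedure: str  # where the documented procedure lives (hypothetical)


# Hypothetical catalog: every value here is a placeholder.
CATALOG = [
    RecoveryPlan("production", 1, "Plant IT lead", "Playbook section A"),
    RecoveryPlan("communications", 2, "Network administrator", "Playbook section B"),
    RecoveryPlan("administration", 3, "Office manager", "Playbook section C"),
]


def recovery_order(impacted: Iterable[str],
                   catalog: Iterable[RecoveryPlan] = CATALOG) -> List[RecoveryPlan]:
    """Return the plans for the impacted functions, highest priority first."""
    plans = {plan.function: plan for plan in catalog}
    impacted = set(impacted)
    missing = impacted - set(plans)
    if missing:
        # An impacted function with no documented plan is itself a gap to record.
        raise KeyError(f"no recovery plan documented for: {sorted(missing)}")
    return sorted((plans[name] for name in impacted), key=lambda p: p.priority)


if __name__ == "__main__":
    # Did the event impact communications and production? Then proceed in this order.
    for plan in recovery_order({"communications", "production"}):
        print(plan.priority, plan.function, plan.owner, plan.procedure)
```

Keeping the priority in the record itself, rather than in the order of the list, means the ranking survives edits and leaves no question about what gets restored first.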
2. Commit to Continuous Improvement
While it’s important to be as thorough as possible on the first run, you must see the Recover function as something that gets refined over time. Not only will your organization adopt new technologies to fulfill its mission, but you are also likely to miss or overlook critical procedures on the first few attempts. The best way to overcome this is to schedule times when your organization will focus on reviewing your recovery plans. Maybe you can incorporate reviewing those plans into your regular admin meetings.
At TCS, we often refer to a client’s operational or security maturity level. Security as a whole should be approached the same way. There is a maturation process that occurs with time, as business owners and managers grow in their understanding of threats and how best to perform all the functions of CSF. Performing security drills to simulate common threats is helpful for learning where gaps exist between where you are and where you ought to be (or even thought you were). Then, plans can be made to close those gaps over time.
Here is a helpful illustration from NIST 800-184 regarding how the Recover phase helps inform and refine the previous functions of CSF:
3. Define Your Recovery Metrics
Recovery metrics are helpful for framing business resilience discussions. One of the primary metrics is Recovery Time Objective (RTO). How much time can your company afford to be down without causing significant negative impact upon the business? Testing those recovery time objectives will be key to determining if you have the right solutions in place.
Another metric, specific to data recovery, is Recovery Point Objective (RPO): how much recent data, measured as time since the last recoverable copy, can you afford to lose? While the Protect function should have already defined this, your recovery drills will show whether the technologies you currently rely on actually achieve your desired RPO and RTO.
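To make the drill comparison concrete, here is a small Python sketch that checks one drill against both objectives. The `evaluate_drill` function, its parameter names, and the timestamps and four-hour targets in the usage example are all assumptions for illustration; your own objectives come from your planning, not from this code.

```python
from datetime import datetime, timedelta


def evaluate_drill(outage_start: datetime,
                   service_restored: datetime,
                   last_good_backup: datetime,
                   rto: timedelta,
                   rpo: timedelta) -> dict:
    """Compare one recovery drill against the RTO and RPO targets."""
    # Downtime is what RTO limits: how long the service was unavailable.
    downtime = service_restored - outage_start
    # The data-loss window is what RPO limits: work done since the last
    # recoverable copy is what would have been lost.
    data_loss_window = outage_start - last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


if __name__ == "__main__":
    # Hypothetical drill: all timestamps and targets are made up.
    result = evaluate_drill(
        outage_start=datetime(2024, 1, 10, 9, 0),
        service_restored=datetime(2024, 1, 10, 12, 30),
        last_good_backup=datetime(2024, 1, 10, 3, 0),
        rto=timedelta(hours=4),
        rpo=timedelta(hours=4),
    )
    print(result)
```

In this made-up drill the service is back within the four-hour RTO, but the last good backup is six hours old, so the RPO is missed. That is the kind of gap a drill is meant to surface.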
Another example of a recovery metric is one that we hope we never have to confront – the point of no return. At what point will your system be so compromised that it is beyond recovery? When should you forklift the system (replace it wholesale) and start over? Defining that point could be the difference between surviving an event and completely folding.
Finally, what Service Level Agreements (SLAs) and insurance policies do you have in place to protect your organization? Where would one go to get that information? Is it organized and cataloged in a single, accessible location? Who has access to that location? Who is responsible for ensuring SLAs are being met? Is that data being reported to management on a regular basis? These are all questions you need to consider regarding recovery metrics.
4. Build Your Recovery Playbook
The Recovery Playbook is the result of all the Recovery planning. Here is a summary of the tactical recovery plans that should be included in your Recovery playbook:
- Defined mission-critical company assets (both personnel and technology)
- Categorizations of all assets and their interdependencies
- Identification and documentation of key personnel responsible for overseeing the recovery process
- Protective measures to ensure an effective recovery, including how to identify the root cause of the threat so that recovery is not based on false assumptions
- Defined conditions under which the recovery process is initiated and by whom
- Tactical plans for restoring in a clandestine way that doesn’t give away information to the adversary or destroy critical evidence
- Examination plans for how the threat will be addressed by the appropriate personnel
- Vetted and tested recovery capabilities
- Scheduled time post-incident (as immediate as possible) to document lessons learned and how to better protect against similar threats in the future
Any time gaps are identified, either in response to a threat or in response to a simulated threat, you should update your Plan of Action and Milestones document (PoAM) to record how best to close those gaps and the timetables by which they will be closed. Performing a hotwash (an immediate after-action review) following a security event will ensure key details concerning the threat are captured in the PoAM.
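One lightweight way to keep those timetables visible is to treat each gap as a record with an owner and a due date. The Python sketch below assumes a hypothetical `PoamItem` record and `overdue_items` helper; the fields and sample entries are illustrative only and do not reflect any official PoAM template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class PoamItem:
    gap: str             # what the incident or drill revealed
    source: str          # "incident" or "drill" (hypothetical labels)
    owner: str           # who is responsible for closing the gap
    due: date            # milestone date for closing it
    closed: bool = False


def overdue_items(items: Iterable[PoamItem], today: date) -> List[PoamItem]:
    """Return open items whose milestone has passed, oldest due date first."""
    late = [item for item in items if not item.closed and item.due < today]
    return sorted(late, key=lambda item: item.due)


if __name__ == "__main__":
    # Hypothetical entries recorded after a hotwash.
    poam = [
        PoamItem("Backup restore exceeded target time", "drill",
                 "IT manager", date(2024, 3, 1)),
        PoamItem("No out-of-band contact list", "incident",
                 "Operations lead", date(2024, 2, 1)),
        PoamItem("Recovery plan not access-restricted", "drill",
                 "Security officer", date(2024, 1, 15), closed=True),
    ]
    for item in overdue_items(poam, today=date(2024, 2, 15)):
        print(item.due, item.owner, item.gap)
```

A list like this can feed the regular review meetings, so gaps that slip past their milestones get raised rather than forgotten.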
Conclusion:
As you can see, there are a lot of details and moving parts to the Recover function. The planning will not be wasted time, though, especially if your business faces a significant security event. The ideal outcome of these exercises, lofty and unobtainable in today’s world, is to eliminate all security incidents; the realistic one is to mitigate them when they do occur. The time invested in planning and documenting could very well determine whether your organization survives when an event occurs.
If you need assistance with any of these security efforts, TCS would be honored to assist you. Please contact us today if you would like to discuss how we can bolster your security or operational maturity levels. We can also provide a second opinion on your current efforts and help you identify gaps to schedule in your PoAM document. Either way, TCS would be thrilled to assist in any way possible.
Note: This article is based on NIST Special Publication 800-184, Guide for Cybersecurity Event Recovery.