Massimo is Operations Planning Manager at GCSEC (Global Cyber Security Center, Rome), where, as PMO, he coordinates the foundation’s research and education activities. Since January 2017, he has led the CERT and Cyber Security of Poste Italiane within the Information Protection Department. After studying economics, he obtained a PhD in “Geoeconomics, Geopolitics and Geohistory of border regions”, with a focus on the Critical Infrastructure Protection Programme, and a Master’s in “Intelligence and Security Studies”. Previously, he served as Associate Expert in Risk Resilience and Assurance at Booz & Company and Booz Allen Hamilton. He has also acted as a consultant for several think tanks, for industrial groups and for NATO.
Managing an incident is the most critical task that a Computer Emergency Response Team (CERT) or a Security Operations Center (SOC) can face. When dealing with an incident, there are many factors to consider and little time in which to consider them. Added to this is stress, which affects both performance and the choices to be made.
For this reason, it is worth repeating that preparing for the event is essential: incident management cannot be improvised. Tabletop scenarios, which smooth collaboration and communication, and exercises, which test the procedures of the departments involved, should be carried out annually to verify each aspect of incident management and to improve its effectiveness and efficiency. Procedures also serve to shorten the “psychological blockage” that affects those involved. Unfortunately, these activities are underestimated because they are considered to add little value. Many departments are involved, and managers’ agendas are often misaligned, leaving no shared time for discussion and testing. This has a major impact on the timing and quality of the response. Starting to handle an incident at the moment it is identified is too late. As in Civil Protection and Defence manuals, the key concept in cyber security is readiness, or “preparedness”, which is achieved only through preparation.
Some of you will wonder what this introduction has to do with the title of the article. The answer is: everything. Let us consider the macroprocess of incident management and see how fundamental both preparation and multidisciplinarity are.
The first step is to identify the incident. An incident can be reported by a customer or an employee, or identified during service monitoring. The first activity is a quick triage to understand what is happening and what the impact on services may be in terms of interruption, loss of confidentiality, integrity and availability of information (RID in the Italian acronym, commonly CIA in English) and, above all, economic impact. Speed is essential in determining impact, but impact cannot be worked out on the spot: it is determined on the basis of considerations made during the preparation phase. Usually an impact matrix is used, combining different evaluation parameters (e.g. personal injury, service interruption, RID compromise, economic loss, reputational damage, …) with different criticality levels (e.g. green, yellow, orange and red). The triage determines the severity of the event, which in turn determines the actions to be taken.
This is why multidisciplinarity comes into play as early as the assessment phase of the incident or, more precisely, in the preparation phase, when the impact matrix used for that assessment is drawn up. The actors who must sit at the table, in addition to the technical profiles, are certainly the “business owners” of the service, the data protection officer, communication / public relations, customer care, and legal and regulatory affairs. Together, these figures must identify the criticality levels (thresholds) for each parameter within their remit. For each criticality level a specific intervention procedure is established, involving other structures and levels of management as well as internal and external communication. The thresholds must be quantitative, not qualitative.
The potential impact defined in the matrix determines, as already mentioned, the actions to be carried out. A lower impact is usually managed within the SOC/CERT technical structure. The situation is different when the impact is higher in terms of service disruption, economic loss or reputational damage: in that case, an incident management procedure involving several structures is initiated.
Multidisciplinarity finds its fullest expression during a crisis, when several concurrent factors must be managed. Leaving aside the purely technical IT and security aspects of incident management, I present below some of the disciplines that should be involved in resolving the crisis. First of all, the business owner of the service.
The business owner is the one who can best describe the direct impact of a disruption or blockage of the service and translate it into economic losses (lost revenue, production stoppage, slowdowns, …). During the crisis, the business owner must already have estimates of the impact as a function of how long the service is slowed down or stopped. For this reason, I would stress that most of the work, including collecting all the information needed to understand the scope of the event, must be done in “peacetime”. The more detailed the view, the better. If the revenue flow from the service can be determined from the history of previous years, broken down by reference period (days, hours, etc.), the estimate will be more accurate. For new services, obviously, the estimate will have to rest on a methodology. Some might argue that a given period will not necessarily match the same period of the previous year. That is true, but in the absence of any other methodology, we have to start somewhere.
Another discipline to consider is certainly communication, which plays a key role in incident management. If you communicate badly or late, you risk amplifying the impact. In the event of an incident, it is advisable to communicate first rather than wait for rumours to leak out and be interpreted or distorted. The risk is a game of “Catch the mole”, i.e. you spend your time issuing denials instead of being the primary source of information. The main factors to consider are the messages to be conveyed and the channels to be used for each type of stakeholder (target) to be informed. Communication may be institutional, and therefore more formal, or customer-oriented, and therefore use simpler terminology.
Communication, however, is of little use if you do not know the audience you are addressing. This is where two other disciplines come into play: customer care and psychology/sociology/anthropology. Customer care is essential for getting to know your customers or investors. Factors such as age, geographical area, preferred information channels and type of product / service purchased help to decide which channel to use and how often the audience needs to be updated. Psychology/sociology/anthropology, which I have condensed into a single group, helps to decide which message to communicate. Different types of customers may need different messages, each understandable to its audience. Psychology is used to define an effective message that produces the desired effect (soothing, alerting, …). Sociology/anthropology is needed to understand the effects of the incident. A crisis can produce knock-on effects, such as indignation, consensus or frustration, which spread within social groups in different ways. A leak of personal data may be experienced differently by a group of teenagers and a middle-aged group, or by people of different cultures.
In addition to communication, the contractual aspects relating to customers are also relevant. If Service Level Agreements have been defined, the Business and Legal functions must determine what the direct consequences may be. Whether there are specific performance levels to be guaranteed, contractual termination clauses or particular indemnities to be taken into account will be decisive in both the impact definition and the resolution phases. This information should be available already at the impact analysis stage, because it belongs in the matrix, but often it is not.
The Regulatory Office, like the Data Protection Officer, should be called upon to determine whether the incident was caused by failures on the company’s part and whether those failures, in addition to requiring specific notifications to the relevant authorities or to customers themselves, may also entail sanctions.
Finally, the presence of the Purchasing Department should also be considered. Resolving an incident may require activating extra-budget contracts, and the method of acquisition must be defined and formalized by the Purchasing Department as a derogation from internal procurement policies.
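To make the idea of quantitative thresholds concrete, the impact matrix discussed earlier can be sketched in Python. This is only an illustration under stated assumptions: the parameter names, units and threshold values, the rule that overall severity is the worst level reached by any parameter, and the mapping of orange and red to the multi-department procedure are all hypothetical, and would in practice be agreed in peacetime by the functions described above. The revenue estimate uses a plain historical average, which is one possible methodology among many.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Criticality levels named in the article, ordered by severity."""
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3


@dataclass(frozen=True)
class Parameter:
    """One row of the matrix: a measurable parameter and its thresholds."""
    unit: str
    # Lower bounds for YELLOW, ORANGE and RED (hypothetical values).
    thresholds: tuple[float, float, float]

    def level(self, value: float) -> Level:
        # The level is the number of thresholds the measured value reaches.
        return Level(sum(value >= t for t in self.thresholds))


# Hypothetical matrix: names and numbers are placeholders, not real policy.
MATRIX = {
    "service_interruption": Parameter("minutes", (15, 60, 240)),
    "economic_loss": Parameter("EUR", (10_000, 100_000, 1_000_000)),
    "records_exposed": Parameter("records", (1, 1_000, 100_000)),
}


def estimate_revenue_loss(hourly_revenue_history: list[float],
                          outage_hours: float) -> float:
    """Estimate lost revenue as historical average hourly revenue x duration."""
    average = sum(hourly_revenue_history) / len(hourly_revenue_history)
    return average * outage_hours


def assess(measurements: dict[str, float]) -> tuple[Level, dict[str, Level]]:
    """Return the overall severity and the level of each measured parameter.

    Assumption: overall severity is the worst level across parameters.
    """
    per_parameter = {name: MATRIX[name].level(value)
                     for name, value in measurements.items()}
    return max(per_parameter.values()), per_parameter


def escalation_for(level: Level) -> str:
    """Map severity to a response path (the cut-off at ORANGE is assumed)."""
    if level >= Level.ORANGE:
        return "multi-department incident management procedure"
    return "handled within the SOC/CERT"


if __name__ == "__main__":
    # Made-up figures: revenue for the same hours in three previous years.
    loss = estimate_revenue_loss([4000.0, 5000.0, 6000.0], outage_hours=3)
    overall, detail = assess({
        "service_interruption": 180,
        "economic_loss": loss,
        "records_exposed": 0,
    })
    print(overall.name, "->", escalation_for(overall))
    for name, level in detail.items():
        print(f"  {name}: {level.name}")
```

With these made-up figures, a three-hour outage is estimated at 15,000 EUR of lost revenue, which on its own would be yellow, but the 180-minute interruption is orange, so the event as a whole is orange and triggers the wider procedure. Keeping the thresholds as data rather than as judgment calls is what allows triage to be fast during the incident itself.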
These are just some of the functions that, in my opinion, should be activated during a crisis or high-severity incident. In conclusion, I would like to reiterate some of the fundamental elements of incident management: defining a holistic incident management process; defining an impact matrix that is as detailed as possible; defining the rules of engagement on the basis of the criticality levels of the matrix; testing the communication flows between the various departments; carrying out periodic exercises; preparing different incident scenarios with related response procedures. All this must be prepared in peacetime and therefore requires the collaboration of all functions. In the future, Artificial Intelligence may change response times and the relevance of some functions relative to others, but we will deal with that on another occasion.
Continuity comes from resilience, resilience comes from readiness, readiness comes from preparation, preparation from commitment. Let’s remember that a mismanaged incident affects everyone’s business objectives.
Author: Massimo Cappelli