
Computer Incident Response and Product Security: Operating an Incident Response Team

Chapter Description

This chapter covers aspects of running an incident response team (IRT) such as team size, team member profiles, cooperating with other groups, preparing for incidents, and measuring success.

Be Prepared!

An IRT, by its nature, deals with emergencies and exceptions. As such, it is hard to be prepared for something that cannot be foreseen. Although nobody can be prepared for the exact incarnation of the next worm—because we do not know what it will look like—you can be prepared for a general threat of worms. The new worm is expected to have some general characteristics common with previously seen worms. It is known how previous worms affected the organization, so the IRT can prepare to handle future outbreaks similar to the previous ones. Following are some steps that can be taken to prepare to handle incidents:

  • Know current attacks and techniques.
  • Know the system the IRT is responsible for.
  • Identify critical resources.
  • Formulate response strategy.
  • Create a list of scenarios and practice handling them.

Know Current Attacks and Techniques

It is imperative for the IRT to possess an intimate knowledge of current attack techniques and attacks themselves. Without that knowledge, the IRT would not know how to distinguish an attack from some legitimate activity. Obviously, the knowledge must not be limited only to the attacking side. It must also cover the defense. How can you protect your organization from various attacks and what are the potential drawbacks of different methods? This also encompasses features and capabilities of installed equipment. And last, but not least, know the network’s topology and characteristics.

The next question is: how should you gather that knowledge? Unfortunately, there is no easy way to accomplish that. It must be done the hard way. Reading public lists such as Bugtraq, full-disclosure, and others is standard for every team. Attending conferences and learning about new issues is also important. Analyzing what is going on in the team’s constituency is obligatory. Monitoring the underground, as much as possible, is necessary. Setting up honeypots and honeynets and analyzing the activity is also an option. But, above all, talk to your peers and exchange experiences. That is something that cannot be substituted with anything else. All evidence points to the fact that miscreants do exchange information and that they do it rather efficiently. Good guys, on the other hand, tend to lag behind in sharing information. Chapter 6, “Getting to Know Your Peers: Teams and Organizations Around the World,” talks more about some of the main forums that IRTs can use to interact with peers.

It is not necessary for each IRT member to monitor all the sources. There are so many potential sources of information that it is almost impossible for a single person to track them all. One workaround is to contract out this task to an external company or, if it is done internally, to share the task among team members so that not all of them are monitoring the same sources.

When monitoring sources is contracted out, you need to make sure that the received information is relevant to the IRT. For example, if your constituency is predominately using the Solaris operating system, the information on vulnerabilities in Microsoft Windows is not that useful to you. The positive side of contracting out this task is that you are freeing your resources. The potential negative side is that you might need to renegotiate your contract if you want to change the scope of the information you are receiving.

If the information collection is done internally, you can include other groups or individuals to help you with that task, even if they are not part of the IRT. This help can be either formal or informal. If your organization has a group that monitors external information sources, you can make a formal arrangement with them to receive only the information that might interest the IRT. If you do not have such a group in your organization, you might find security-conscious individuals who are monitoring some of the sources that might also interest the IRT. If there are such individuals, you can ask them to forward all potentially interesting information to the IRT. This would be an informal arrangement that, in some cases, can be reliable and function quite well. If you have such an arrangement, do make sure to nurture that relationship. Commend these people for what they are doing and try to make them feel appreciated. You can give them some small awards or take them out to dinner. People like to see that their work is appreciated, so an occasional meal together will pay for itself many times over in the work these people will do.

If your IRT decides to operate a honeypot or honeynet, you must make sure that you will have sufficient resources to do so. A honeypot is a nonproduction service exposed to the Internet with the purpose of being (mis)used by an attacker. The IRT can then capture malware and gain firsthand knowledge about how it infects devices and propagates. The service can be emulated with special software or it can be a real service. A honeynet is a network of honeypots. One way to arrange a honeynet is to assign an unused (either by your organization or in general) portion of IP addresses to a group of computers and monitor all traffic going in and out of that network. Computers can be either real hardware or virtual. If they are virtual computers, you should know that some malware can detect whether it is executed on a virtual platform and, if it is, the malware will not behave maliciously.
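As a minimal sketch of the honeynet arrangement described above, the following Python fragment picks out flows that touch an unused address block. The block 192.0.2.0/24 is only a documentation placeholder, and the function names and the (source, destination) record layout are assumptions made for illustration. Because nothing legitimate should be talking to an unused block, every match deserves a look, including traffic that originates inside the honeynet.

```python
import ipaddress

# Hypothetical unused ("dark") address block assigned to the honeynet.
# 192.0.2.0/24 is a documentation range, used here only as a placeholder.
HONEYNET = ipaddress.ip_network("192.0.2.0/24")


def touches_honeynet(src: str, dst: str) -> bool:
    """Return True if either endpoint of a flow lies inside the honeynet block."""
    return (ipaddress.ip_address(src) in HONEYNET
            or ipaddress.ip_address(dst) in HONEYNET)


def honeynet_flows(flows):
    """Keep only the (src, dst) pairs that involve a honeynet address."""
    return [flow for flow in flows if touches_honeynet(*flow)]
```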

Although installing a honeypot or honeynet is relatively quick, monitoring and analyzing what is going on requires considerable effort. You also must make sure that your honeypot is not used to attack someone else. Overall, honeypots can be valuable sources of information, but they require significant effort to use properly.

Know the System the IRT Is Responsible For

The IRT must know what it is protecting, the location of the boundaries of the systems for which it is responsible, and the functions of different parts of the system. After defining boundaries, the next step is to identify the groups (or people) that can be contacted when the IRT must cross the boundaries. All this is only the start. These steps just define the area of the IRT’s responsibility. The next task is to determine what is “normal” within that area. This is important because an incident is something that is not expected. It is an activity that is not standard. Most malware initiates actions that are not usual for an average user (for example, starting mass mailing or connecting to an IRC channel). If the IRT knows what is normal for the given system, it will be easier to spot deviations and start investigating them. This is also known as determining the baseline. Depending on the organization, some of the tasks to determine the baseline can be done by IT or some other department and not the IRT. Irrespective of who is doing it, the IRT must be able to receive and use that information to spot anomalies.

The baseline means different things for different aspects of the overall system. On the highest level, it can consist of the following things:

  • Number of remote users
  • Number of internal users
  • Total consumed network bandwidth, inbound and outbound, at all links (for example, between branch offices, toward the Internet)
  • Traffic breakdown per protocol and application (TCP, UDP, mail, web, backup, and so on) and bandwidth utilization per protocol
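The per-protocol breakdown in the last item can be sketched in a few lines of Python. The (protocol, byte_count) record layout is an assumption made for illustration; it is not the format of any particular flow exporter.

```python
from collections import Counter


def protocol_shares(flows):
    """Return each protocol's share of total bytes as a fraction between 0 and 1.

    `flows` is an iterable of (protocol, byte_count) pairs; this record layout
    is hypothetical and would need adapting to the real data source.
    """
    totals = Counter()
    for protocol, byte_count in flows:
        totals[protocol] += byte_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {protocol: count / grand_total for protocol, count in totals.items()}
```

Running the same function separately on inbound and outbound records, and per link, would give the other top-level figures in the list.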

Each of the categories can then be further refined and a more detailed picture can be formed. For remote users, remote IP addresses can be recorded. A traffic model of a user can be formed by recording how much traffic (packets) is generated inbound and outbound and what protocols and applications have generated it. For some protocols, what types of packets are being generated can even be recorded. If we take TCP as an example, the ratio of SYN packets versus ACK packets can be recorded. How many fragmented packets are in the mix? That information can then be used to identify the presence of anomalous traffic because different types of packets are used by different attack programs. Another type of information that can be recorded is the direction of the traffic. That is important because the site can be the target or source of an attack.
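The TCP example can be made concrete with a small sketch. The `PacketSummary` record and its field names are hypothetical, and counting only SYN packets without the ACK flag (that is, connection attempts) against all packets carrying ACK is one possible reading of "SYN versus ACK", not the only one.

```python
from dataclasses import dataclass


@dataclass
class PacketSummary:
    """Hypothetical per-packet record; the field names are assumptions."""
    syn: bool
    ack: bool
    fragmented: bool


def tcp_profile(packets):
    """Summarize pure-SYN versus ACK counts and the fraction of fragmented packets."""
    packets = list(packets)
    syn_only = sum(1 for p in packets if p.syn and not p.ack)
    with_ack = sum(1 for p in packets if p.ack)
    fragments = sum(1 for p in packets if p.fragmented)
    return {
        # None signals that no ACK packets were seen, so no ratio exists.
        "syn_to_ack_ratio": syn_only / with_ack if with_ack else None,
        "fragment_fraction": fragments / len(packets) if packets else 0.0,
    }
```

A ratio that climbs well above its usual value would be consistent with many connection attempts that are never completed, which is the kind of deviation the baseline is meant to expose.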

Information used to build the baseline should come from multiple sources to build a better picture. Traffic snapshots (or full captures for small sites), NetFlow data, syslog logs, logs from intrusion prevention/detection systems, and application logs should all be used to build the baseline.

Collecting data to form the baseline can be illuminating. On occasion, it can give an interesting picture and reveal all sorts of things that are being done without the knowledge of the appropriate groups. The findings do not always have to be negative. It is common to find some servers still offering services and being used, even though they were officially decommissioned several years ago. Various cases of network or system misconfigurations can also be detected (for example, traffic being routed down a suboptimal path). Unofficial web servers and wireless access points are also likely to be discovered during the process.

Taking only a single snapshot might not be sufficient to establish a credible baseline. Traffic and usage patterns change over time. They differ depending on the hour of the day, the day of the week, and the month of the year. During lunch time, you can expect to see less traffic than in the middle of the morning. Around holidays, traffic will again be lower than during normal working days. Adding or removing a significant number of computers will affect the baseline, too. The message is that the baseline should be constantly updated with the latest measurements.
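One way to keep a baseline that respects these time-of-day and day-of-week patterns is to store measurements in separate buckets and discard old samples as new ones arrive. The sketch below is only an illustration: the class name, the (weekday, hour) bucketing, and the cap of 12 samples per bucket are all assumptions, and seasonal effects such as holidays would need additional handling.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


class HourlyBaseline:
    """Keeps measurements grouped by (weekday, hour) so like is compared with like."""

    def __init__(self, max_samples=12):
        # max_samples is an arbitrary illustrative cap on history per bucket.
        self.max_samples = max_samples
        self.samples = defaultdict(list)

    def record(self, when: datetime, value: float) -> None:
        bucket = self.samples[(when.weekday(), when.hour)]
        bucket.append(value)
        # Drop the oldest samples so the baseline follows recent behavior.
        del bucket[:-self.max_samples]

    def expected(self, when: datetime):
        """Return the mean for this time slot, or None if nothing was recorded."""
        bucket = self.samples.get((when.weekday(), when.hour))
        return mean(bucket) if bucket else None
```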

The baseline does not need to be precise to the byte and must not be used rigidly. If, for example, 40 percent of incoming traffic on the main Internet link is TCP, the times when that ratio increases to 45 percent do not need to be immediately considered as a sign of an attack. But if it suddenly jumps to 60 percent or more, it is probably suspicious. There will always be some variation in each of the baseline components, and the IRT must be aware of what the expected variation is. That can be determined only with prolonged and continuous inspection.
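The 40, 45, and 60 percent example can be expressed as a simple tolerance check. The thresholds below are taken from that example and are illustrative only; the intermediate "watch" band and the symmetric treatment of drops and rises are assumptions of this sketch. Values are given in percentage points.

```python
def classify_share(observed: float, baseline: float,
                   tolerance: float = 5.0, alarm: float = 20.0) -> str:
    """Classify an observed traffic share against its baseline.

    All values are in percentage points (40 means 40 percent). The defaults
    mirror the example in the text; real thresholds come from observing the
    expected variation over a prolonged period.
    """
    deviation = abs(observed - baseline)
    if deviation <= tolerance:
        return "normal"
    if deviation >= alarm:
        return "suspicious"
    # Between the two thresholds: not alarming yet, but worth a closer look.
    return "watch"
```

For instance, `classify_share(45, 40)` falls within the tolerance, whereas `classify_share(60, 40)` reaches the alarm threshold.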

Identify Critical Resources

The next step in the process is to identify critical resources. What resources are critical for the business and in what way? What will happen if a resource is unavailable? If the company website is used only to present what the organization is about, it being unavailable might not have severe consequences. If the website is also used for ordering, you need to keep the period of not being available as short as possible. The billing system might be more critical than email infrastructure, and so on.

This part of the process must be done with help from different groups and departments within the organization. Each of them should identify what resources are critical for their business. All that information then must be taken to a higher level of management and looked at from the global organization’s perspective. Although something might be critical for a given department, it might not play a significant role from the overall business perspective. The criticality of services should be reviewed periodically and after significant change in the business model is introduced.

Formulate Response Strategy

After completing the inventory of critical resources, an appropriate response strategy can be formulated. This strategy is supposed to answer questions such as: If a service, or server, is compromised, what can and should be done? Here are a few examples that illustrate this point:

  • If a company’s website is defaced or compromised, what needs to be done? If the website is used only for general information, it can be simply rebuilt, and no effort will be spent trying to identify how the compromise happened or who did it.
  • If a host used for collecting billing information is compromised and the attacker is siphoning credit card information from it, can you simply shut off the computer to prevent further damages? Although that can prevent data theft, it might also prevent collecting billing information, and the organization will lose some money as a consequence.
  • What level of compromise needs to happen before a decision to attempt to identify a culprit for possible prosecution will be made versus just shutting him out? This can possibly mean that the attacker will be left to (mis)use the compromised system for some time while the investigation is going on. What is the point when the business might seriously suffer as the consequence of the compromise and the investigation has to be stopped?

Answers to some of the questions can also lead you to rethink the way the system is organized or services are offered. In the case of a website, maybe it can be made static and burned on a DVD so that the possibility of defacement is reduced if not eliminated. Maybe some critical services can be split across multiple computers, so if one is compromised, it can be shut down without affecting the other services.

Why is this important? When an attack is ongoing, there might not be sufficient time to think about what the various actions of the attacker and defenders can cause to the organization. At that time, the IRT must react as quickly as possible to minimize the impact to the organization. Knowing how different computers and services depend on each other and how important they are to the organization enables the team to respond quickly and accurately while minimizing the impact and disruptions to the business.
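Knowledge of how services depend on each other can be recorded ahead of time so that the question "what else stops if this host is shut down?" has a ready answer. The following sketch assumes a hand-maintained dependency map; the service names are invented for illustration and do not describe any real environment.

```python
from collections import deque

# Hypothetical inventory: each service maps to the services it depends on.
DEPENDS_ON = {
    "web-orders": ["billing-db", "dns"],
    "billing-reports": ["billing-db"],
    "billing-db": [],
    "mail": ["dns"],
    "dns": [],
}


def affected_by_shutdown(target, depends_on=DEPENDS_ON):
    """Return every service that directly or indirectly depends on `target`."""
    # Invert the map: for each service, list the services that rely on it.
    dependents = {name: [] for name in depends_on}
    for service, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(service)

    # Breadth-first walk over the inverted map, starting from the target.
    affected, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected
```

With this sample map, shutting down `billing-db` would also take out `web-orders` and `billing-reports`, which is the kind of consequence the response strategy should weigh before an incident rather than during one.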

Create a List of Scenarios

Instead of waiting for incidents to happen and then learning how to respond, the IRT should have regular practice drills. The most common scenarios should be created, and the team must practice how to respond to them. This is especially important after new members join the team. Even if they are experienced in incident handling, each organization has some processes that are slightly different, and practice drills are the right time and place to learn them. The main purpose of these exercises is for people to gain practice and confidence in handling incidents. They also serve to test how effective the current response might be given changes in the network (newly added devices or software features) and to modify the response accordingly. These exercises do not need to be limited to the IRT but can involve other parts of the organization. In such joint exercises, all involved participants must know when the exercise is active. This is to prevent confusion so that people will not panic or take wrong actions thinking that a real compromise is happening.

What can these scenarios look like? For a start, they must cover the main aspects of all handled incidents. If these incidents happened once, there is the possibility that they will happen again. Here are some suggestions of what can be covered:

  • Virus or worm outbreaks
  • External and internal routing hijacked
  • DNS-related attacks (for example, the organization’s DNS entry gets changed and points to a bogus site)
  • Computer compromise
  • Network sniffer installed on several computers
  • Website defacement or compromise
  • Phishing attacks
  • DoS attacks
  • Emergency software upgrade

These may be the most common scenarios that one organization might encounter. Depending on the organization’s role and technical capabilities, some additional scenarios can be created. Also, some of the scenarios might not be applicable to the team because of job separation (for example, software upgrade is done by the IT department).

These practice drills can be only a paper exercise, or they can be conducted on an isolated network segment. Instead of using physical devices, it also might be possible to either simulate them or to use virtual devices (for example, virtual computers in VMware). What method and technology will be used depends on the goals and capabilities.

Devices that can be simulated include computers, routers, and networks of devices. In these simulations, devices can be either targets of simulated attacks or used to observe how malicious software behaves. Software for creating virtual computers includes VMware, Parallels, Xen, and QEMU. A more comprehensive list is posted on Wikipedia at http://en.wikipedia.org/wiki/Comparison_of_platform_virtual_machines. Some of this software can also connect virtual computers to create virtual networks. Dynamips, Dynagen, and Simics are some of the software packages that can be used for simulating routers and networks of routers.

A paper exercise is good for formulating the initial response to an attack that has not been encountered yet and for modifying an existing response after the system changed because the equipment changed or software was upgraded. Testing the response, on the other hand, is best done on the actual equipment. At that time, all the previously invested work to determine the baseline and the normal state of the network pays off. Having this information, the team can send (or simulate) the right amount and mix of traffic and then superimpose attacking traffic on top of it. In some instances, the background traffic might not be relevant, but in others, such as DoS attacks, it is. The baseline matters less for single-packet attacks, in which it is sufficient to send a single packet to compromise or reset a device or a process on the device. You need to use real devices for the verification to make sure that the simulator reflects the real device’s behavior. It can take some time for the simulator to be updated with the newest features present on the devices.

Use simulators and emulators to practice the response once you are sure that they actually reflect how the real device will behave and the response is known. After the response is established and practiced, new elements should be added to it.

Some unexpected or unusual elements should be introduced. They can be various things, such as the following:

  • The telephone network is down during the incident, so team members cannot use fixed or mobile phones to communicate.
  • It is impossible to physically reach the affected device (for example, a computer is locked in a room and the room key is lost).
  • A new device is introduced into the network without anyone’s knowledge (for example, a load-balancing device inserted in front of the web farm) or the network topology is changed.

Introducing these elements should prevent people from trying to fit the problem into the solution instead of the other way around. Each new case should be like the first one and should be handled with a mind open to any eventuality.

The last things to practice are, seemingly, impossible scenarios. You must accept that, occasionally, the research community does come up with a revolutionary new attack technique, and things that were considered impossible suddenly become routine. Here are a few examples:

  • A scenario that contains a logical paradox. That would be the trick case to verify that the handler can notice the paradox. An example might be to invent a device under attack that is not connected to the network or withhold information about an intermediate device.
  • A feature suddenly stops working (for example, packet filters do not block packets; rate limiters do not limit packet rate).
  • Significant improvement in attack techniques (for example, a complete compromise of the MD5 and SHA-1 hash functions, the AES cryptosystem being broken, or integer factoring becoming trivial).

For some of these scenarios, there may be no valid, or possible, responses, so their value lies in forcing people to think outside the box. Some of the scenarios might one day become reality (a collision in MD5, integer factoring using quantum computers), so thinking about them today might give the organization an edge.