Intrusion detection system for internet
The ability to detect the rapidly growing number of Internet attacks has become an important issue in network security. An intrusion detection system (IDS) acts as a necessary complement to a firewall: it monitors packets on the computer network, analyses them and responds to suspicious traffic.
This report presents the design, implementation and experimental evaluation of a Network Intrusion Detection System (NIDS) that aims to provide effective network-based and anomaly-based intrusion detection using the ANOVA (Analysis of Variance) statistic. A generic system modelling approach and architecture are designed for building the NIDS with useful functionality. One of the design objectives of this project is to address the shortcomings of the statistical methods currently used in anomaly-based network intrusion detection, since these shortcomings reflect improvements needed across the network-based IDS industry.
Throughout the development of the NIDS, several aspects of building an effective network-based IDS are emphasized, such as the implementation of the statistical method, packet analysis and detection capability. The report also works through anomaly detection with the ANOVA test step by step.
Chapter 1 Introduction
This chapter introduces the whole project: its motivation, its main objective and its advanced objectives. It also gives a brief account of the research methodology.
The rapid growth of computer networks has made life faster and easier, but it has also made it less secure. Internet banking and online buying and selling are now part of daily life; set against the growing number of cyber attacks, security has become a problem of great significance. Firewalls are no longer considered sufficient for reliable security, especially against zero-day attacks. Security-conscious companies are now moving towards an additional layer of protection in the form of an Intrusion Detection System.
D.Yang, A.Usynin & W.Hines (2006) explain intrusion and intrusion detection as:
“Any action that is not legally allowed for a user to take towards an information system is called intrusion and intrusion detection is a process of detecting and tracing inappropriate, and incorrect, or anomalous activity targeted at computing and networking resources”. The idea of intrusion detection was first introduced in 1980 (J.P. Anderson) and the first intrusion detection model was suggested in 1987 (D.E. Denning). An Intrusion Prevention System (IPS) is considered the first line of defence and Intrusion Detection Systems the second line of defence. IDS are useful once an intrusion has occurred, to contain the resulting damage. Snort, developed by Sourcefire, is a well-known example of a working intrusion detection and prevention system (IDS/IPS); it combines the benefits of signature, protocol and anomaly based inspection.
IDS can be classified into misuse detection and anomaly detection. Misuse detection, or signature-based IDS, detects intrusions based on known attack patterns, known system vulnerabilities or known intrusive scenarios, whereas anomaly detection, or not-use detection, is useful against zero-day and pseudo zero-day attacks. Anomaly-based IDS rest on the assumption that the behaviour of an intruder differs from that of a normal user. Anomaly detection systems can be divided into static and dynamic (S. Chebrolu, A. Abraham & J.P. Thomas, 2004). Static anomaly detectors assume that the portion of the system being monitored will not change, and they mostly address the software area of the system. Protocol anomaly detection is a good example of static anomaly detection. Dynamic anomaly detection systems operate on network traffic data or audit records, and they are the main area of interest in my research.
“Anomaly IDS has become a popular research area due to strength of tracing zero-day threats”, B.Schneier (2002). Such a system examines user profiles, audit records and similar data, targets the intruder by identifying deviations from normal user behaviour, and raises alerts about potential unseen attacks. Active attacks are more likely to be traced than passive attacks, but an ideal IDS tries to trace both. Anomaly-based intrusion detection systems are the next generation of IDS, and in system defence they are considered the second line of defence. In this research my main focus is on denial-of-service attacks, their types and how to trace them.
Although the Internet is the best-known technology of the day, concerns about its security and availability remain. The biggest threats to information security and availability are intrusion and denial-of-service attacks. The existing Internet was developed about 40 years ago, when priorities were different. Its unexpected growth has since exhausted the IPv4 address space and brought many security issues with it. According to CERT statistics, 44,074 vulnerabilities had been reported by 2008.
Intrusion is the main issue in computer networks. Many signature-based intrusion detection systems are in use within information systems, but they can only detect known intrusions. The other approach, anomaly-based intrusion detection, is now the dominant technology. Many organizations are working on anomaly-based intrusion detection systems, and some, such as the Massachusetts Institute of Technology (MIT), provide data sets for this purpose. A great deal of work has already been done using the MIT data sets.
Another aspect of an anomaly-based intrusion detection system is the statistical method it uses. Several good multivariate statistical techniques, e.g. Multivariate Cumulative Sum (MCUSUM) and Multivariate Exponentially Weighted Moving Average (MEWMA), are used for anomaly detection in manufacturing systems. In theory, these multivariate methods could be applied to intrusion detection to examine and detect anomalies in the behaviour of a subject in an information system. In practice this is not feasible, because their computationally intensive procedures cannot meet the requirements of intrusion detection systems, for two reasons. First, intrusion detection systems deal with huge amounts of high-dimensional process data, because of the large number of behaviours and the high frequency of events. Second, intrusion detection systems demand minimal delay in processing each event, to ensure early detection and signalling of intrusions. Therefore a method that studies variation, the ANOVA statistic, is used in this research.
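For reference, the one-way ANOVA F statistic can be written as below. This is the standard textbook definition rather than a formula taken from this report. The notation is assumed here for illustration: $k$ groups (for example, time intervals of traffic), $n_i$ observations in group $i$, $N$ observations in total, group mean $\bar{x}_i$ and grand mean $\bar{x}$.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}
         {\dfrac{1}{N-k}\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}
```

A large value of $F$ indicates that the group means differ by more than the variation within the groups would explain.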
No research is available, however, that has implemented ANOVA and the F statistic on data sets collected by the Cooperative Association for Internet Data Analysis (CAIDA). The CAIDA data sets are unusual in that they contain no session flows and no traffic between the attacker and the attack victim. They contain only the reflections from the attack victim that went back to other real or spoofed IP addresses. This makes the attack difficult to estimate, and I take that difficulty as a challenge.
In this section I set out the core objectives of the research and a road map for achieving them.
During this research I will study the backscatter-2008 data sets, collected by CAIDA for denial-of-service attacks. I will use the ANOVA statistical technique to detect anomalous activity in computer networks.
My research is guided by five questions.
- What is an intrusion and what is an intrusion detection system? How can intrusion detection systems be classified?
- What are the different methodologies proposed for intrusion detection systems?
- How can the CAIDA backscatter-2008 data sets be analysed and prepared for further study and analysis?
- How can the different types of DoS attack be identified?
- How can the ANOVA statistical technique be implemented to detect anomalies in network traffic?
Aims and Objectives
DoS attacks are too numerous to discuss in a single paper. In this paper I aim to detect anomalies in network traffic using the number of packets.
Main/Core objectives of the research
- Review the literature on recent intrusion detection approaches and techniques.
- Discuss the intrusion detection systems currently used in computer networks.
- Obtain a data set from CAIDA for analysis and further study.
- Pre-process the trace collected by CAIDA to make it ready for further analysis.
- Recognize normal and anomalous network traffic in the CAIDA backscatter-2008 data set.
- Analyse deviating network traffic using MATLAB for different variants of denial-of-service attack.
- Review existing statistical techniques for anomaly detection.
- Evaluate the proposed system model.
Advanced Objectives of the research
- Extend the system model to detect new security attacks.
- Investigate and analyse the ANOVA statistical technique against other statistics for anomaly detection in computer networks.
Nature and Methodology
The research concerns the detection of anomalous traffic in computer networks. Advances in processing and storage capability have made it possible to capture and store computer network traffic and then derive different kinds of data patterns from the captured traffic. These data patterns are analysed to build a profile of the network traffic. Deviations from this normal profile are treated as anomalies in the network traffic. This research presents a study of vulnerabilities in TCP/IP and of the attacks that they make possible. Its purpose is also to study TCP flags, find the distribution of the network traffic and then apply the ANOVA statistical technique to identify potentially anomalous traffic on the network.
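The profile-and-deviation idea described above can be sketched in a few lines. This is a minimal illustration, not the method used in this report (which relies on ANOVA): it assumes packets are represented only by their capture timestamps, and it uses a simple "more than three standard deviations from the mean" rule as a stand-in for the statistical test. All function names and the threshold are hypothetical.

```python
from statistics import mean, stdev


def packets_per_window(timestamps, window_seconds):
    """Count packets in consecutive fixed-length time windows."""
    if not timestamps:
        return []
    start = min(timestamps)
    n_windows = int((max(timestamps) - start) // window_seconds) + 1
    counts = [0] * n_windows
    for t in timestamps:
        counts[int((t - start) // window_seconds)] += 1
    return counts


def build_profile(normal_counts):
    """Summarise normal traffic as (mean, sample standard deviation).

    Needs at least two windows of normal traffic.
    """
    return mean(normal_counts), stdev(normal_counts)


def deviating_windows(counts, profile, threshold=3.0):
    """Indices of windows whose packet count lies more than `threshold`
    standard deviations from the profile mean (assumed rule, see lead-in)."""
    mu, sigma = profile
    return [i for i, c in enumerate(counts)
            if sigma > 0 and abs(c - mu) > threshold * sigma]
```

In use, `build_profile` would be fed per-window counts from traffic believed to be normal, and `deviating_windows` would then be applied to the counts of the traffic under study.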
Chapter 1: Introduction
This chapter gives a general overview of the project. The topic is introduced first, and then the motivation for the research is discussed. The core objectives and the general road map of the project are discussed under the heading of research questions. The aims and objectives are described so that readers can understand the core and advanced objectives and gain a general overview of the research. Nature and Methodology covers the nature of the research and the methods that will be used to answer the research questions and achieve the core and advanced objectives. Finally, all the chapters of the report are introduced.
Chapter 2: Research Background
This chapter explains what intrusion and intrusion detection are, why Intrusion Detection Systems are needed, the types of Intrusion Detection System and the techniques they use, and the challenges and problems they face.
Chapter 3: Security Vulnerabilities and Threats in Computer Networks
This part of the report is dedicated to network security in general and to the security issues of computer networks. It then describes the types of denial-of-service attack and gives a brief description of each.
Chapter 4: Data Source
The data sets collected by CAIDA and published on its web site are not in a format that can be processed straight away. This chapter describes in detail how to obtain those data sets, and then all the steps carried out to convert the trace into a format that MATLAB can read for the final analysis. It also covers the applications used during that phase and the problems faced during pre-processing, since little material on pre-processing these data sets is available on the Internet.
Chapter 5: System Model
Because the research is based on the TCP/IP protocol, it is vital to discuss TCP and the weak points that attackers can exploit for malicious purposes, as well as the measures that can be taken to recognize attacks before they happen and to stop them. In this chapter I discuss the intrusion detection model, the features of the proposed IDS and, finally, the steps of the proposed model.
Chapter 6: ANOVA Statistic and Test Results Implementation in Proposed Model
This is the core chapter of the project. It focuses on statistical tests in intrusion detection systems, and on the ANOVA statistic in particular. The chapter first analyses the existing statistical techniques for intrusion detection. It then explains the ANOVA calculation, its deployment in an intrusion detection system, the distribution of the backscatter-2008 data set and the distribution by category. Finally, it presents graphs of the data sets and of the ANOVA and F statistics.
Chapter 7: Discussion and conclusion
Finally, I sum up the project in this chapter. It includes the conclusion of the research, the personal development I gained during the project (experience that I later found helpful in other areas) and the goals achieved over the entire project.
This chapter has given the reader a general overview of the research. The research questions were identified first. The objectives of the research, both core and advanced, were then described, followed by the nature of the research and the methods it uses. The chapter also provided overall background information on the topic, an explanation of the report structure and a brief description of every chapter.
Chapter 2 Research Background
This chapter explains what intrusion and intrusion detection systems are and why an Intrusion Detection System is needed. It also discusses the types of Intrusion Detection System and the techniques they use, and explains their goals, challenges and problems.
Intrusion Detection System (IDS)
A computer intrusion is a set of events that breaches the security of a system. Such events must be detected proactively in order to guarantee the confidentiality, integrity and availability of the resources of a computer system. An intrusion into an information system is a malicious activity that compromises its security (e.g. integrity, confidentiality and availability) through a series of events in the information system. For example, an intrusion may compromise the integrity and confidentiality of an information system by gaining root-level access and then modifying and stealing information. Another type of intrusion is the denial-of-service intrusion, which compromises the availability of an information system by flooding a server with an overwhelming number of service requests over a short period of time, making its services unavailable to legitimate users. D. Yang, A. Usynin & W. Hines describe intrusion and intrusion detection as: “Any action that is not legally allowed for a user to take towards an information system is called intrusion and intrusion detection is a process of detecting and tracing inappropriate, and incorrect, or anomalous activity targeted at computing and networking resources”.
Why we need Intrusion Detection System
To guarantee the integrity, confidentiality and availability of computer system resources, we need a system that supervises events, processes and actions within an information system. The limitations of traditional methods, misconfigured access control policies and misconfigured firewall policies in computer and network security systems (whose basic purpose is to prevent security failures), together with the increasing number of exploitable bugs in network software, have made it clear that security-oriented monitoring systems are needed to supervise system events for security violations.
These traditional systems do not notify the system administrator about misuse or anomalous events in the system. We therefore need a system that makes proactive decisions about misuse or anomalous events, and for this reason the importance of intrusion detection systems has grown steadily over the last two decades. Nowadays the intrusion detection system plays a vital role in an organization's computer security infrastructure.
Types of Intrusion Detection System
An intrusion detection system supervises computers or networks for unauthorized logins, events, activity, or file deletion or modification. An intrusion detection system can also be designed to monitor network traffic, so that it can detect denial-of-service attacks such as SYN, RST and ICMP attacks. Intrusion detection systems are typically classified into two types.
- Host-Based Intrusion Detection System (HIDS)
- Network-Based Intrusion Detection System (NIDS)
Each of these two types of intrusion detection system has its own approach to supervising, monitoring and securing data, and each has distinct merits and demerits. In short, host-based intrusion detection systems analyse activity on individual computers, while network-based IDSs examine the traffic of the whole computer network.
Host-Based Intrusion Detection System
Host-based intrusion detection gathers and analyses audit records from a computer that provides services such as password services, DHCP services or web services. Host-based intrusion detection systems (HIDS) are mostly platform dependent, because each platform produces different audit records. A HIDS includes an agent on the host that detects intrusion by examining system audit records; these may be system calls, application logs, file-system modifications (changes to the access control list database or the password file) and other system or user events or actions. Intrusion detection systems were first developed and implemented as host-based systems. Once the audit records have been aggregated for a specific computer, they can be sent to a central machine for analysis or analysed on the local machine. This type of intrusion detection system is highly effective at detecting insider intrusions: unauthorized modification, access and retrieval of files can be detected effectively. One issue with host-based intrusion detection is that collecting audit records from thousands of computers may be inefficient or ineffective. Windows NT/2000 security event logs, RDBMS audit sources, UNIX syslog and enterprise management system audit data (such as Tivoli) are possible audit sources for a host-based intrusion detection system.
Network-Based Intrusion Detection System
A network-based intrusion detection system (NIDS) is a completely platform-independent intrusion detection system that detects intrusion by analysing network traffic such as frames, packets and TCP segments (network addresses, port numbers, protocols, TCP headers, TCP flags and so on), as well as network bandwidth. The NIDS examines the captured packets and compares them with previously analysed data to decide whether they are anomalous or malicious. Because a NIDS supervises the whole network, it should be more distributed than a HIDS. A NIDS does not examine information that originates on a computer; instead it uses special techniques such as “packet sniffing” to extract data from TCP/IP or other protocols travelling along the computer network. HIDS and NIDS can also be used in combination. My project focuses on network-based intrusion detection systems and analyses TCP flags to detect intrusions.
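Since the project analyses TCP flags, a short sketch of how the flags can be read from a captured TCP header may be useful. It relies on the standard TCP header layout, in which the control bits occupy byte 13 (counting from zero); the function names are hypothetical and the sketch ignores the later ECN-related bits.

```python
# Bit masks of the six classic TCP control flags (RFC 793 header layout).
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_tcp_flags(flag_byte):
    """Return the names of the flags set in the TCP flags byte."""
    return [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]


def tcp_flags_from_header(tcp_header):
    """Extract the flag names from raw TCP header bytes.

    The flags live in byte 13 of the header (zero-based index).
    """
    if len(tcp_header) < 14:
        raise ValueError("TCP header too short to contain the flags byte")
    return decode_tcp_flags(tcp_header[13])
```

A SYN-ACK segment, for instance, carries the flags byte `0x12`, which decodes to the SYN and ACK flags.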
Techniques Used in Existing IDS
The previous section discussed the general types of intrusion detection system. The next question is how these systems detect an intrusion. Two major techniques are used by both types of intrusion detection system to detect an intruder.
- Signature Detection or Misuse Detection
- Anomaly Detection
Signature Detection or Misuse Detection
This technique, commonly called signature detection, first derives a pattern for each known intrusive scenario and stores it in a database. These patterns are called signatures. A signature can be as simple as three failed logins, a pattern that matches a specific portion of network traffic, or a sequence of characters or bits. The technique then compares the current behaviour of the subject with the stored signature database and signals an intrusion when a pattern matches. Its main limitation is that it cannot detect new attacks whose signatures are unknown.
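The two kinds of signature mentioned above, a byte pattern in traffic and a count of failed logins, can be sketched as follows. This is only an illustration: the signature names, byte patterns and event format are hypothetical and are not taken from any real signature database.

```python
from collections import Counter

# Hypothetical signature database: name -> byte pattern to look for.
BYTE_SIGNATURES = {
    "example-nop-run": b"\x90\x90\x90\x90",
    "example-path-traversal": b"../../",
}

# The "three failed logins" signature from the text.
FAILED_LOGIN_LIMIT = 3


def match_payload(payload):
    """Return the names of all byte signatures found in a packet payload."""
    return [name for name, pattern in BYTE_SIGNATURES.items()
            if pattern in payload]


def users_with_repeated_failures(events, limit=FAILED_LOGIN_LIMIT):
    """Return users with at least `limit` failed logins.

    `events` is an iterable of (user, succeeded) pairs (assumed format).
    """
    failures = Counter(user for user, succeeded in events if not succeeded)
    return sorted(user for user, n in failures.items() if n >= limit)
```

The sketch also shows the limitation noted above: a payload that matches none of the stored patterns is never reported, however malicious it is.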
Anomaly Detection
In this technique the IDS develops a profile of the subject's normal behaviour (the norm profile), or a baseline of normal usage patterns. The subject of interest may be a host system, a user, a privileged program, a file, a computer network and so on. The technique then compares the observed behaviour of the subject with its norm profile and raises an intrusion alarm when the observed activity departs from that profile. For the comparison, anomaly detection methods use statistical techniques, e.g. ANOVA, K-means, standard deviation and linear regression. In my project I use the ANOVA statistic for anomaly detection. An anomaly detection technique can detect both known and new intrusions in the information system if and only if the observed profile departs from the norm profile. For example, in a denial-of-service attack carried out by flooding a server, the rate of events arriving at the server is much higher than the event rate under normal operating conditions.
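To make the ANOVA comparison concrete, the following sketch computes the one-way ANOVA F statistic for several groups of observations, for example packet counts taken from different time intervals. How the traffic is grouped is an assumption made here for illustration; this section of the report does not specify it, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers.

    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")

    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return ms_between / ms_within
```

The resulting value would be compared with a critical value of the F distribution with (k - 1, N - k) degrees of freedom; a value above it suggests that at least one group mean departs from the others, which is the kind of departure from the norm profile described above.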
Issues and Challenges in the IDS
An intrusion detection system should recognize a substantial percentage of intrusions while keeping the false alarm rate at an acceptable level. The major challenge for IDS is the base rate fallacy, which can be explained in terms of false positives and false negatives. A false positive occurs when there is no intrusion but the IDS reports one. A false negative occurs when there is an intrusion and the IDS does not detect it. Unfortunately, because of the probabilities involved and the overlap between observed and training data, it is very difficult to maintain a high detection rate together with a low false alarm rate. A study of current intrusion detection systems showed that they have not solved the problem of the base rate fallacy.
An intrusion into an information system compromises the security of that system. An intrusion detection system is used to detect such intrusions. The two major types of IDS are HIDS and NIDS. A host-based intrusion detection system mostly monitors events on the host computer, while a NIDS monitors the activity of the computer network. Two approaches are used for intrusion detection in IDS, anomaly and signature. Anomaly detection uses statistical methods to detect anomalies in the observed behaviour, while signature detection checks it for known patterns. The base rate fallacy is the major challenge for IDS.
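The base rate fallacy can be illustrated with Bayes' theorem: when intrusions are rare, even a small false alarm rate means that most alarms are false. The sketch below computes the probability that an alarm corresponds to a real intrusion. The numbers in the usage note are hypothetical and are not measurements from this project.

```python
def probability_intrusion_given_alarm(base_rate, detection_rate, false_alarm_rate):
    """P(intrusion | alarm) by Bayes' theorem.

    base_rate        -- fraction of events that are intrusions
    detection_rate   -- P(alarm | intrusion)
    false_alarm_rate -- P(alarm | no intrusion)
    """
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1.0 - base_rate)
    if true_alarms + false_alarms == 0:
        raise ValueError("the detector never raises an alarm")
    return true_alarms / (true_alarms + false_alarms)
```

With a hypothetical base rate of 1 intrusion per 1,000 events, a 99% detection rate and a 1% false alarm rate, `probability_intrusion_given_alarm(0.001, 0.99, 0.01)` is about 0.09, so roughly nine alarms in ten would be false.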
Chapter 3 Security Vulnerabilities and threats in Networks
This chapter discusses computer and network security, together with related terms such as vulnerability, exploit and threat. It then focuses on the denial-of-service attack, one of the most prevalent attacks in computing, and covers its main aspects.
Network attacks have been a difficult problem since the early days of the Internet. As the economy, businesses, banks, organizations and society become more dependent on the Internet, network attacks become a problem of huge significance. Computer security prevents attackers from achieving their objectives through unauthorized use of computers and networks. According to Robert C. Seacord, “Security has developmental and operational elements”. Developmental security means developing secure software with a secure design and a flawless implementation. Operational security means securing the implemented systems and networks from attack. The following terms are the most commonly used in computer security.
- Security Policy: a set of rules and practices, typically applied by the network or system administrator to a system or network, to protect it from attack.
- Security Flaw: a software fault that poses a potential security risk.
- Vulnerability: a set of conditions that allows a malicious user to violate a security policy, implicitly or explicitly.
- Exploit: a set of tools, software or techniques that takes advantage of a security vulnerability to breach an implicit or explicit security policy.
The terms information security and network security are often used interchangeably. This project, however, focuses on intrusion in computer networks, so the discussion here concerns network security. Network security refers to the techniques used to protect data travelling over computer networks from attackers.
Network security Issues
Many issues are involved in network security, but the following are the most common.
- There are already many known vulnerabilities, and new ones are discovered every day.
- In a denial-of-service attack, a malicious user attacks the resources of a remote server, and there is no general way to distinguish bad requests from good ones.
- Vulnerability in TCP/IP protocols.
Denial of service Attacks
A denial-of-service attack, or distributed denial-of-service attack, is an attempt to exhaust or disable computer resources, or to make them unavailable to their legitimate users. These resources may be network bandwidth, computing power, computer services or operating system data structures. When the attack is launched from a single machine or network node it is called a denial-of-service attack. Nowadays, however, the most serious threat is the distributed denial-of-service attack.
In a distributed denial-of-service attack, the attacker first gains access to a number of hosts throughout the Internet and then uses these compromised hosts as a launch pad, simultaneously or in a coordinated fashion, to attack the targets.
There are two basic classes of DoS attack: logic attacks and resource attacks. The “Ping-of-Death”, which exploits existing software flaws to degrade or crash the remote server, is an example of a logic attack. In a resource attack, on the other hand, the victim's CPU, memory or network resources are overwhelmed by a large number of bogus requests. Because the remote server cannot tell bad requests from good ones, defending against attacks on resources is very difficult. Denial-of-service attacks have some distinguishing characteristics, which Oleksii Ignatenko describes as shown in Figure 1.
Figure 1 – Denial of service attack characteristics
- Attack Type: a denial-of-service attack can be distributed (when it comes from many sources) or non-distributed (when it comes from only one source).
- Attack Direction: the attack may be directed at network resources or at system resources.
- Attack Scheme: the attack can come directly from the malicious user's own source, it can be reflected from other victims' systems, or it can be hidden.
- Attack Method: the method is the vulnerability that makes the attack possible. A targeted attack uses a vulnerability in protocols, software or services, while a consumption attack consumes all available resources. Exploitive attacks take advantage of defects in the operating system.
Methods for Implementing Denial of Service Attacks
A denial-of-service attack can be implemented in many ways; the following are the most common implementation techniques.
- Attempt to “flood” a network, thereby stopping legitimate network traffic
- Attempt to interrupt connections between two systems, thereby preventing access to a service
- Attempt to prevent a specific user from accessing a service
The “flood” method can be deployed in many ways, but the following are the best known in networked systems.
- TCP-SYN Flood
- ICMP Flood
- RST attack
TCP-SYN Flood: A TCP-SYN flood abuses the way a connection to the server is established. Normally a client establishes a connection to the server through a three-way handshake, in which:
- The client (sender) sends a TCP packet with the SYN flag set.
- The server (receiver) receives the packet and replies with a TCP packet in which both the SYN and ACK bits are set.
- The client receives the SYN-ACK packet and sends an ACK packet to the server.
The three-way handshake is illustrated in Figure 2:
Figure 2 Three way Handshake
This is the three-way handshake of TCP connection establishment. In a SYN flood, the attacker sends a SYN packet to the server and the server responds with a SYN-ACK packet, but the attacker never sends the ACK packet. If the server does not receive the ACK from the client, it resends the SYN-ACK after waiting 3 seconds. If the ACK still does not arrive, the server sends another SYN-ACK after 6 seconds. This doubling of the waiting time continues for a total of 4 or 6 attempts (the exact number depends on the TCP implementation on the server side). In a SYN flood the attacker installs zombies on Internet hosts and sends a huge number of SYN requests from spoofed IP addresses to the server, or to any host on the Internet, exhausting the memory and data structures that the server or host uses to track connections. The server is kept busy and cannot accept requests from, or respond to, legitimate users.
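The retransmission behaviour described above explains why a SYN flood is effective: each unanswered SYN keeps a half-open connection entry alive for many seconds. The sketch below reproduces the doubling schedule (3 s, 6 s, ...) and gives a rough estimate of how quickly a queue of half-open connections fills. The backlog size and SYN rate in the usage note are hypothetical, and real TCP stacks differ in their timers and limits.

```python
def synack_retransmit_delays(initial_timeout=3.0, retransmissions=4):
    """Waiting time before each SYN-ACK retransmission, doubling each time."""
    return [initial_timeout * 2 ** i for i in range(retransmissions)]


def time_until_last_retransmit(initial_timeout=3.0, retransmissions=4):
    """Seconds from the first SYN-ACK until the last retransmission is sent.

    This is a lower bound on how long a half-open entry stays in memory.
    """
    return sum(synack_retransmit_delays(initial_timeout, retransmissions))


def seconds_to_fill_backlog(backlog_size, spoofed_syns_per_second):
    """Rough time for spoofed SYNs to occupy every half-open slot,
    assuming no entry expires in the meantime."""
    if spoofed_syns_per_second <= 0:
        raise ValueError("SYN rate must be positive")
    return backlog_size / spoofed_syns_per_second
```

With four retransmissions starting at 3 seconds, an entry survives at least 3 + 6 + 12 + 24 = 45 seconds, whereas a hypothetical backlog of 1,024 entries is filled in 2 seconds by 512 spoofed SYNs per second, so the queue stays full for as long as the flood continues.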
ICMP Flood: Over the years, along with SYN flooding, ping flooding has arguably been one of the most popular DoS attacks among script kiddies on the Internet. It is similar to a SYN flood, but in this attack the attacker repeatedly sends large ICMP ping packets to the server or target, keeping it so busy that it has no time to respond to other requests. Sometimes the attacker sends the ICMP packets with a spoofed source IP address. In this way the attacker targets both the server and the spoofed source at the same time.
RST Attack: An RST attack works in a similar way to the SYN flooding attack. The difference is that a SYN attack operates at the start of the communication (the three-way handshake), whereas an RST attack occurs once the session has been established or in the middle of the session. According to RFC-793, “The goal for the attacker to cause one of the two end points to incorrectly tear down the connection state, effectively aborting the connection.” The RST flag in a TCP packet is used to reset the connection.
As an example, suppose hosts A and B are communicating with each other and host K decides to carry out an RST attack on them. Host K has to guess the source port, destination port, source IP and destination IP (together called the 4-tuple) and compute or guess a correct sequence number. Having done so, host K sends host B a TCP packet with the RST flag set, pretending to be A; when host B receives the packet it terminates the connection immediately.
After the attacker has terminated the communication by sending a spoofed packet with the RST flag set to B, the attacker then pretends to be host B and starts attacking host A. This results in a DoS until the connection between A and B is established again. The impact varies from application to application: some can re-establish the connection quickly, so the termination does not affect them much, whereas protocols that need lengthy, sustained connections are affected more. RST attacks mostly affect Cisco Border Gateway Protocol (BGP) sessions, because it takes a long time to set up a connection.
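The check that host B applies in the example above can be sketched in Python. This is a simplified model, not a TCP implementation: it assumes that B accepts a reset when the 4-tuple matches and the sequence number falls inside its receive window, as in classic TCP. All class names, field names and addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """State host B keeps for its connection with host A (simplified)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    rcv_next: int    # next sequence number B expects from A
    rcv_window: int  # size of B's receive window


@dataclass(frozen=True)
class Segment:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    seq: int
    rst: bool


def rst_tears_down(conn, seg):
    """True if B would drop the connection on receiving this segment."""
    same_tuple = (seg.src_ip, seg.src_port, seg.dst_ip, seg.dst_port) == (
        conn.src_ip, conn.src_port, conn.dst_ip, conn.dst_port)
    # Sequence numbers are 32-bit and wrap around, hence the modulo.
    offset = (seg.seq - conn.rcv_next) % 2**32
    in_window = offset < conn.rcv_window
    return seg.rst and same_tuple and in_window


if __name__ == "__main__":
    conn = Connection("192.0.2.1", 40000, "192.0.2.2", 179, 1000, 65535)
    forged = Segment("192.0.2.1", 40000, "192.0.2.2", 179, 1500, True)
    print(rst_tears_down(conn, forged))
```

The sketch shows why host K must get both the 4-tuple and a plausible sequence number right: a forged segment that fails either check is ignored in this model.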
Handling Denial of Service Attack
The protect-detect-react cycle is a very useful mitigation strategy when facing a denial-of-service attack.
Protect Your System and Prepare for Attack: Designing mitigation strategies against denial-of-service attacks is only possible if you know beforehand what you are trying to defend. First assess the known vulnerabilities in your system and the importance of the information it holds. There are different ways to do this. A good standardized method developed by the SEI is called OCTAVE (Operationally Critical Threat, Asset, and Vulnerability Evaluation).
Detecting Attacks on Your System: The ability to detect attacks directly affects your ability to react appropriately and to limit the damage. Among the approaches that can be taken are log analysis and automated intrusion detection systems.
Reacting against DoS: Reaction strategies include having a response plan, deploying specific steps based on the type of attack, communicating with the ISP, reserving backup links, moving content, and more. There are also technical steps, including traffic control, traffic blocking and traffic filtering.
Computer security is the study of preventing attackers from achieving unauthorized objectives on a system or network. There are many threats to computer systems, and denial of service is among the most problematic of them. A denial-of-service attack is an attempt by a malicious user to make a resource unavailable to legitimate users. Many methods are used to carry out denial-of-service attacks; the most common are ICMP flooding and SYN flooding. Intrusion detection systems are typically used to detect denial-of-service attacks.
Chapter 4 Data Source
This chapter is about the data set, or data source. The data source is an important first phase in the design of an intrusion detection system. To design an anomaly-based intrusion detection system, data is captured and analysed to build user profiles. This chapter explores what kind of data source is used in this study, how and from where it was obtained, and which organization provides it.
What is CAIDA?
CAIDA stands for Cooperative Association for Internet Data Analysis. CAIDA is based at the San Diego Supercomputer Center (SDSC), an extension of the University of California at San Diego (UCSD), where it was established in 1997 by Dr. Kc Claffy and Tracie Monk. It is a cooperative undertaking among organizations with a strong interest in keeping basic Internet capacity and usage efficiency in line with increasing demand. CAIDA’s members come from the government, commercial and research sectors. CAIDA’s participants use the organization as a central point for fostering greater collaboration in the engineering and monitoring of a robust, scalable global Internet infrastructure. According to the CAIDA website, “The Cooperative Association for Internet Data Analysis (CAIDA) is a collaborative undertaking among organizations in the commercial, government, and research sectors aimed at promoting greater cooperation in the engineering and maintenance of a robust, scalable global Internet infrastructure”. This project uses a denial-of-service data set captured by CAIDA. The main mission of CAIDA is to examine theoretical and practical aspects of the Internet in order to:
- Understand macroscopically the function of the Internet infrastructure, its usage, behaviour, and evolution
- Promote a collaborative environment in which data can be isolated, analysed, and shared appropriately
- Foster the integrity of the field of Internet science
- Provide information for science, technology, and communication public policies
Security research at CAIDA includes the analysis of network-based attacks, for example denial-of-service attacks, data hosting and supply, and the measurement and statistical analysis of the trends and effects that certain Internet worms and viruses have on the global network infrastructure. From the security point of view, CAIDA's main objective is stated as follows: “We hope to develop meaningful and up-to-date quantitative characterizations of attack activity and to produce fundamental insights into the nature of malicious behaviour on the Internet and consequently the best directions for mitigating that behaviour”.
The CAIDA Backscatter-2008 Data set
To explain the backscatter-2008 data set fully, we first need to introduce the backscatter attack. In a backscatter attack, the attacker spoofs the source IP address (normally using addresses that do not exist), but sometimes the attacker spoofs the IP addresses of live systems selected at random. In a SYN or ICMP ping attack, the attacker spoofs the source IP address and sends a ping or SYN request to the server. The server responds to the spoofed source. This unintended response traffic from the victim is called backscatter. The backscatter-2008 data set contains this kind of data.
This data set contains information useful for examining denial-of-service attacks. It consists of quarterly week-long collections of responses to spoofed traffic, sent by denial-of-service attack victims and received by the UCSD Network Telescope. The UCSD Network Telescope (also called a black hole or an Internet sink) consists of a globally routed /8 network that carries almost no legitimate traffic. Because the legitimate traffic can easily be separated from the incoming packets, the network telescope provides a good vantage point for analysing anomalous traffic destined for roughly 1/256th of all IPv4 destination addresses on the Internet.
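The figure of 1/256th follows directly from the size of a /8 network, which can be checked with Python's standard ipaddress module. The prefix 10.0.0.0/8 below is only a placeholder with the right length; the actual telescope prefix is not stated here. The function name is hypothetical.

```python
import ipaddress
from fractions import Fraction


def ipv4_share(cidr):
    """Fraction of the whole IPv4 address space covered by `cidr`."""
    network = ipaddress.ip_network(cidr)
    whole_space = ipaddress.ip_network("0.0.0.0/0")
    return Fraction(network.num_addresses, whole_space.num_addresses)


if __name__ == "__main__":
    # Placeholder /8 prefix, used only for its size.
    print(ipaddress.ip_network("10.0.0.0/8").num_addresses)  # 16777216
    print(ipv4_share("10.0.0.0/8"))                          # 1/256
```

If attackers pick spoofed source addresses uniformly at random, a telescope of this size would be expected to see about one in every 256 response packets sent by a victim.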
In this data set, the quarterly collections were captured in February, May, August and November. When a DoS victim receives an attack packet with a spoofed source IP address, the victim cannot distinguish the spoofed packet from legitimate traffic, and therefore responds normally to the spoofed source IP address. Backscatter-2008 is the newest data set captured for DoS attacks, so it is analysed by us for the first time in this study. There are some important cautions for the use of this data set:
- The Backscatter-2008 data set does not contain any communication between the attack victim and the attacker. The traffic in the data set consists only of the responses that the attack victims sent back to the spoofed IP addresses.
- Not every response in this data set belongs to a denial-of-service attack.
Intrusion Detection System and Data source
An intrusion detection system normally keeps a long-term profile for each user or network system, compares this profile with incoming data, and signals an anomaly when there is a large departure of the observed profile from the normal profile. It uses a data set to create the normal profile. This section explores the data set used for intrusion detection and its application, including the division of the data source into training data and testing data.
Data source: An information system normally consists of host machines and the communication links connecting them, forming a network of host systems. To capture events in an information system, intrusion detection systems normally use two sources of data: audit trail data (audit data) and network traffic data. Audit data contains events occurring on a host machine, while network traffic data consists of the data packets travelling over the communication links among host systems and captures events on those links. In this study, I design a network-based intrusion detection system model using the anomaly detection approach, so I analyse a denial-of-service data set, the CAIDA Backscatter-2008 data set. The nature of the CAIDA Backscatter-2008 data set is explored in section 4.4.
Training and Testing Data: Typically, for an anomaly-based intrusion detection system, we develop a long-term profile of normal events, and then events from the recent past are compared with the long-term normal profile to detect significant departures. Network traffic data of normal events is required for training the normal profile. In this study, I use a three-hour sample of network traffic captured in 2008 by CAIDA. First, I use the whole three-hour Backscatter-2008 data set to train a normal profile. I also need other data events for testing purposes. Because this study only designs an intrusion detection system and no live network traffic data is available, I use a three-minute sample taken from the three-hour data for testing.
Issues faced during pre-processing
This data set consists of three large files, each covering one hour, compressed with the open-source LZO utility, which runs on Linux. So first I installed the LZO utility and decompressed the files. Secondly, the data set is in PCAP format. This format can easily be analysed in Wireshark, but the files were so large that Wireshark could not load a whole file and reported an out-of-memory error. Therefore I split the files into 60 smaller files, each covering three minutes. Then, using TShark, I converted them from PCAP format to CSV (comma-separated values) format. Finally, I analysed the CSV files in MATLAB.
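The split into 60 files follows from the durations: three hours of capture divided into three-minute windows. The Python sketch below illustrates that arithmetic and how a packet timestamp maps to its window. It is not the tooling used in this study, and the function names and sample timestamps are hypothetical.

```python
WINDOW_SECONDS = 3 * 60        # three-minute samples
CAPTURE_SECONDS = 3 * 60 * 60  # three one-hour files


def count_windows(total_seconds=CAPTURE_SECONDS, window_seconds=WINDOW_SECONDS):
    """Number of whole windows that fit into the capture."""
    return total_seconds // window_seconds


def window_index(timestamp, capture_start, window_seconds=WINDOW_SECONDS):
    """Zero-based index of the window that a packet timestamp falls into."""
    return int((timestamp - capture_start) // window_seconds)


if __name__ == "__main__":
    print(count_windows())  # 60
    start = 1_000_000.0     # hypothetical epoch time of the first packet
    print(window_index(start + 179.9, start), window_index(start + 180.0, start))
```

A packet arriving 179.9 seconds after the start still belongs to the first window (index 0), while one arriving at exactly 180 seconds opens the second window (index 1).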
CAIDA stands for Cooperative Association for Internet Data Analysis. It is a cooperative undertaking among organizations with a strong interest in keeping basic Internet capacity and usage efficiency in line with increasing demand. Security research at CAIDA includes the analysis of network-based attacks, for example denial-of-service attacks, data hosting and supply, and the measurement and statistical analysis of the trends and effects that certain Internet worms and viruses have on the global network infrastructure. The backscatter-2008 data set contains information useful for examining denial-of-service attacks. The data set is used for developing the normal profile in the intrusion detection system.
Chapter 5 Proposed System Model
This chapter explores how the proposed system detects intrusions. It explains how the proposed system analyses the data and how the statistical method is implemented. The proposed system is presented through diagrams and also explained step by step.
In this section we discuss the proposed architecture for our intrusion detection system and present a detailed description of it. This section does not cover the evaluation and results of the proposed system.
My main aim is to propose a system that is suitable for detecting anomalies in computer network traffic. As mentioned, the intrusion detection system proposed in this research uses the backscatter-2008 data set captured by the CAIDA organization. The special property of this data set is that it contains neither the traffic between the attacker and the attack victims nor the session flow between the source and destination nodes or host systems. It contains only the reflections, or responses, from the victims to the spoofed IP addresses. This research therefore focuses only on how to analyse this different kind of data set and on how statistical methods such as the ANOVA statistic are used for anomaly detection in computer networks. The following figure shows the generalized architecture of the intrusion detection model that I have used.
Figure 1 Architecture for Intrusion Detection System
The Proposed Intrusion Detection System
The approach that I have used is composed of the following steps:
- In the design of an anomaly-based intrusion detection system, the first step is the design of the normal profile for the subject of interest. In this step, trace files containing the three-hour backscatter-2008 data set are given as input to the intrusion detection system. The system extracts the TCP flags from the data set and builds a distribution based on the number of packets per second. To build the distribution, we categorize the number of packets per second. The distribution of the whole three-hour data set is computed in tables 2 and 3 of chapter 6. In the proposed system, the distribution of the whole three-hour backscatter-2008 data set is first stored in the data store of the proposed system.
- The next step is testing. In this step, a three-minute sample of network traffic is captured. The backscatter-2008 data set has been divided into 60 samples of three minutes each, as shown in table 1 of chapter 6. The average number of packets per second is then calculated for the sample. This sample is also called the observed data. The goal is to observe and record its deviation from the expected data.
- After the three-minute sample has been categorized, the proposed system uses the ANOVA test statistic to calculate the deviation. In this step, an F statistic value is computed from the observed and expected data, and the computed F statistic is then compared with the tabulated F statistic value. An intrusion alarm is raised when the computed F statistic is greater than the tabulated value. All these test calculations are shown step by step in chapter 6.
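For reference, the standard one-way ANOVA F statistic can be written as follows. The notation is an assumption introduced here and may differ from that used in chapter 6: there are k groups, group i has n_i observations x_{ij} with mean \bar{x}_i, N is the total number of observations, and \bar{x} is the grand mean.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\displaystyle \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k-1)}
         {\displaystyle \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2 \Big/ (N-k)},
\qquad
\text{alarm if } F > F_{\alpha;\,k-1,\,N-k}
```

Here F_{\alpha; k-1, N-k} is the tabulated value of the F distribution at significance level \alpha with k-1 and N-k degrees of freedom.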
Features of the Proposed IDS:
The proposed model for intrusion detection has the following features.
- It detects anomalies in network traffic.
- It compares the data using bar graphs.
- As it is an anomaly-based system, any anomaly in the network traffic is considered an intrusion. This is shown in chapter 6.
The Proposed Model:
First of all, for this model to work, the backscatter-2008 data set is analysed. The backscatter-2008 data set consists of three hours of traffic in PCAP format, roughly 5 GB in total, so the tools used could not analyse the whole data set at once. Therefore we divide it into smaller PCAP files of three minutes each. Then, using the TShark utility, all irrelevant data is discarded and only the useful features (the TCP flags information) are extracted from each three-minute PCAP file into CSV format. The CSV files are then processed in MATLAB.
The proposed model for intrusion detection is depicted in the following flow diagram. The network has a traffic sensor. This sensor, called the Network Traffic Sensor, is responsible for capturing network traffic. The captured traffic is the input to the system.
The next phase in the model is pre-processing and data extraction. In this phase the packets are analysed. Only the TCP flags information in each packet is processed, and all other information in the packet is discarded. The same procedure is followed as was used earlier for the backscatter-2008 distribution.
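As an illustration of the flag extraction in this phase, the following Python sketch decodes the numeric TCP flags field of each packet into flag names and tallies the combinations. The bit values are the standard TCP header flag bits. The function names and the sample input are hypothetical, and the sketch stands in for, rather than reproduces, the MATLAB processing used in this study.

```python
from collections import Counter

# Standard TCP header flag bits (low six bits of the flags field).
TCP_FLAG_BITS = (
    ("FIN", 0x01),
    ("SYN", 0x02),
    ("RST", 0x04),
    ("PSH", 0x08),
    ("ACK", 0x10),
    ("URG", 0x20),
)


def decode_tcp_flags(value):
    """Return the names of the flags set in a numeric flags field."""
    return tuple(name for name, bit in TCP_FLAG_BITS if value & bit)


def flag_distribution(flag_values):
    """Count how often each flag combination occurs."""
    return Counter(
        "+".join(decode_tcp_flags(value)) or "NONE" for value in flag_values
    )


if __name__ == "__main__":
    sample = [0x12, 0x12, 0x14, 0x04]  # hypothetical flags column from a CSV file
    print(flag_distribution(sample))
```

For the hypothetical sample, the tally is two SYN+ACK packets, one RST+ACK packet and one RST packet.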
After pre-processing and feature extraction, the data is passed to the next stage, which is called average packets calculation and sample drawing. In this stage, the average number of packets per second is calculated for the whole three-hour data set. Samples are then selected for analysis and for the calculation of the ANOVA statistic. These calculations are carried out in chapter 6.
Figure 2 System Model
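The average packets calculation stage of the model can be sketched in Python as follows. The sketch assumes that each packet is represented by its capture timestamp in seconds and that seconds without any packet count as zero. The function names and sample timestamps are hypothetical.

```python
from collections import Counter


def packets_per_second(timestamps):
    """Count packets in each one-second bin, including empty seconds."""
    if not timestamps:
        return []
    counts = Counter(int(t) for t in timestamps)
    first, last = int(min(timestamps)), int(max(timestamps))
    return [counts.get(second, 0) for second in range(first, last + 1)]


def average_rate(per_second_counts):
    """Average number of packets per second over the counted bins."""
    return sum(per_second_counts) / len(per_second_counts)


if __name__ == "__main__":
    stamps = [100.1, 100.5, 100.9, 101.2, 103.0]  # hypothetical timestamps
    counts = packets_per_second(stamps)
    print(counts, average_rate(counts))
```

For the hypothetical timestamps, the per-second counts are 3, 1, 0 and 1, giving an average of 1.25 packets per second. The same functions could be applied to the whole capture for the profile and to a three-minute sample for the observed data.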
Once the average packets calculation and sample drawing are complete, the samples are passed to the ANOVA statistic for testing. In this test, the ANOVA calculation is performed on the samples and an F statistic value is calculated. This F statistic value is passed to the decision phase.
In the decision phase, the calculated F statistic is compared with the tabulated F statistic value, which is also called the critical value. If the calculated F statistic is greater than the critical value, an intrusion alarm is raised.
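The ANOVA testing and decision phases can be illustrated with a minimal Python sketch of the standard one-way ANOVA F statistic. How the observed and expected data are arranged into groups is an assumption here, since the actual calculation is given in chapter 6. The critical value is taken from an F table and passed in by the caller. The function names and numbers are hypothetical.

```python
def one_way_anova_f(groups):
    """Standard one-way ANOVA F statistic for a list of numeric groups.

    Raises ZeroDivisionError if there is no variation within the groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def raise_alarm(f_value, critical_value):
    """Decision phase: alarm when F exceeds the tabulated critical value."""
    return f_value > critical_value


if __name__ == "__main__":
    # Hypothetical packets-per-second readings for three groups.
    groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    f_value = one_way_anova_f(groups)
    # 5.14 is the tabulated F value for alpha = 0.05 with (2, 6) degrees of freedom.
    print(f_value, raise_alarm(f_value, 5.14))
```

In this hypothetical example the between-group mean square is 13 and the within-group mean square is 1, so F is 13, which exceeds the critical value and would raise an alarm.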
This chapter presented the proposed model for intrusion detection in computer networks. The system model is presented in the form of a diagram, and the proposed approach is explained step by step. The features of the model and the detection scheme are explained in detail. This chapter does not include any ANOVA testing or results, nor details of the tools used.