Antivirus Research And Development Techniques
Antivirus software is a fast-growing product category under constant development, as each vendor competes with the other products on the commercial market to offer the most up-to-date detection and defense. This thesis covers a few techniques used by antivirus products, general background on viruses and antivirus products, research on the overheads that an antivirus product introduces on a computer, and research on signature based detection, one of the most important and common techniques that antivirus products use to detect viruses. It also covers how antivirus software is updated and how new virus signatures are added to the virus database. Selected algorithms used by these techniques are also examined, with an explanation of how each one classifies a piece of code or a file as infected or uninfected. In the experimentation, three popular antivirus products are used to detect a virus, and the reports produced by the three products are compared to draw conclusions.
Chapter 1: Introduction
Life without computers is hard to imagine today: they play an important role in almost every field. Computers are, however, vulnerable to attacks that can be dangerous and hard to handle. Just like humans, computers are attacked by “viruses”.
A virus can take the form of a worm, a Trojan horse or any other malware that infects the computer. A common source of these viruses is the World Wide Web, where a malicious person can spread malware very easily. Researchers have developed many methods to stop virus attacks, which led to the techniques and software for removing viruses known as “Anti-Virus” software.
A computer virus reaches a computer through email, floppy disks, the internet and many other sources. It usually spreads from one computer to another, corrupting or deleting data on each machine it infects. Viruses mostly spread through the internet or through emails carrying hidden illicit software that the user unknowingly downloads onto the computer.
A virus can attack or damage the boot sector, system files, data files, software and the system BIOS. Many newer viruses also attack other parts of the computer. Viruses can spread when the computer is booted from infected media, when an infected file is executed or installed, or when infected data or an infected file is opened. The main hardware sources are floppy disks, compact disks, USB or external hard drives, and a connection to another computer over an unsafe medium.
This rapid growth of viruses challenges antivirus software in several areas: prevention, preparation, detection, recovery and control. There are now many antivirus tools that remove viruses from the PC and help protect it from future attacks. Antivirus software also raises privacy and security issues for the computers we work on, which is a major concern. Even with so many safety measures in place, viruses keep growing rapidly in number, danger and reach.
This thesis presents a history of viruses and the evolution of antivirus software, explaining how viruses came into existence, what types of viruses evolved and how antivirus software emerged. The thesis focuses on three selected techniques, concentrating mostly on one of them, and on the scanning methods of antivirus products. It also gives a basic scenario of how an antivirus product adopts a framework to update the virus database, and some information about how an ordinary computer is notified to update the product so that it is ready to defend against zero-day viruses.
Viruses are briefly compared by type: the definitions, related threats and working effects of each type are explained, together with how antivirus software works on the different types. The current antivirus techniques are then analyzed, showing both advantages and disadvantages.
Chapter 2 gives the general outline of the thesis: a general history of viruses, the evolution of antivirus software, a definition of the virus, the types of viruses, and the most common methods or techniques used.
Chapter 3, Literature Review, presents the research and review of selected papers on antivirus software that I found interesting. In this chapter, some antivirus products, techniques and algorithms are compared in light of recent developments.
Chapter 4, Experimentation, compares different commercial antivirus products on their efficiency in detecting a virus; the results are based on the false positives, false negatives and hit ratios shown by each antivirus product.
Chapter 5, Conclusion, concludes the thesis by summarizing the research and experimentation done on antivirus products.
The Appendix holds relevant information about the key words and frameworks used in this thesis that are not defined in the main text.
Chapter 2: Overview
This chapter gives general information about viruses and antivirus software, including some basic virus history and when antivirus software evolved. There are different types of viruses, classified according to how they attack. This chapter leads to a better understanding of the techniques used by antivirus products and also gives basic knowledge about different antivirus products.
2.1 History of Viruses
A computer virus is a program that copies itself to a computer without the user’s permission and infects the system (Vinod et al. 2009). In this thesis, “virus” is used broadly for an infection by any of several types of malware, including worms, Trojan horses, rootkits, spyware and adware.
The first theoretical work on self-reproducing computer programs was done by John von Neumann in 1949 (wiki 2010). In this work he suggested that a computer program can reproduce itself (the term “virus” had not yet been coined).
The first virus, Creeper, was discovered in the early 1970s. Creeper copied itself to other computers over a network and displayed the message “I’M THE CREEPER: CATCH ME IF YOU CAN” on the infected machine. It was harmless, but a program called “Reaper” was released to catch Creeper and stop it.
In 1974 came “Rabbit”, a program that spreads and multiplies quickly and crashes the infected system once it reaches a certain number of copies. In the 1980s a virus named “Elk Cloner” infected many computers. The Apple II, released in 1977, loaded its operating system from floppy disks; exploiting this, Elk Cloner installed itself in the boot sector of the floppy disk and was loaded before the operating system.
“©Brain” was the first stealth IBM-compatible virus. A stealth virus hides itself: when an attempt is made to read the infected boot sector, it displays the original, uninfected data. In 1987 the most dangerous virus in the news was the Vienna virus, the first to infect .COM files. Whenever an infected file was run, it infected other .COM files in the same directory. It was the first virus to be successfully neutralized, by Bernd Fix, which led to the idea of antivirus software. Many viruses followed, including Cascade, the first self-encrypting virus, and the Suriv family, memory-resident DOS file viruses. An extremely dangerous virus was “Datacrime”, which destroys FAT tables and causes loss of data. The 1990s brought the Chameleon, Concept and CIH viruses, and the 2000s brought ILOVEYOU, My Doom and Sasser (Loebenberger 2007).
Vinod et al. (2009) define a computer virus as “A program that infects other program by modifying them and their location such that a call to an infected program is a call to a possibly evolved, functional similar, copy of virus. To protect from the attacks, the antivirus software companies include many different methodologies for protecting against the virus attacks.”
2.2 Virus Detectors
A virus detector scans a file or program to check whether it is malicious or benign. Some technical terms and detection methods used in this research are defined below. The main measures when testing a file or program are false positives, false negatives and the hit ratio (Vinod et al. 2009).
False Positive: This occurs when the scanner mistakenly reports a non-infected file as a ‘virus’. False positives can waste time and resources.
False Negative: This occurs when the scanner fails to detect the ‘virus’ in an infected file.
Hit Ratio: This is the proportion of infected files in which the scanner correctly detects the virus.
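As a minimal sketch of how these three measures could be computed from a test run, assuming the hit ratio is read as the fraction of infected files that the scanner flags (the function name and the list-of-booleans layout are illustrative choices, not taken from Vinod et al.):

```python
def detection_metrics(truth, verdicts):
    """Compare scanner verdicts with ground truth.

    truth[i] is True when file i really is infected;
    verdicts[i] is True when the scanner flagged file i.
    """
    pairs = list(zip(truth, verdicts))
    false_positives = sum(1 for t, v in pairs if not t and v)
    false_negatives = sum(1 for t, v in pairs if t and not v)
    hits = sum(1 for t, v in pairs if t and v)
    infected = hits + false_negatives
    # Assumed definition: detected infected files / all infected files.
    hit_ratio = hits / infected if infected else 0.0
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "hit_ratio": hit_ratio,
    }
```

For example, with two infected and two clean files where the scanner flags one of each, the result is one false positive, one false negative and a hit ratio of 0.5.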
Detection is considered for three types of malware:
Basic
In the basic type, the malware attacks the program at its entry point, as shown in figure 2.2.1. Because the entry point itself is infected, control is transferred to the virus payload.
Figure 2.2.1 Attacking system by basic malware (Vinod et al. 2009). [Diagram labels: Entry, Infected Code, Main Code; the entry point is infected by the virus.]
Polymorphic
Polymorphic viruses mutate to hide their original code: the virus consists of encrypted malware code together with a decryption unit. A new mutant is created every time the virus is executed. Figure 2.2.2 shows the main (original) code encrypted by the infected file, with the virus code produced on decryption.
Figure 2.2.2 Attacking system by polymorphic viruses (Vinod et al. 2009). [Diagram labels: Entry, Virus Code, Decrypted Code, Main Code; encrypted by infected file.]
Metamorphic
Metamorphic viruses can reprogram themselves using obfuscation techniques so that new variants are not the same as the original. This ensures that the signatures of the variants differ from the signature of the original.
Figure 2.2.3 Attacking system by metamorphic viruses (Vinod et al. 2009). [Diagram labels: Virus A, Form A, Form B; signatures S1, S2, S3.]
Figure 2.2.3 shows that the original virus and its forms have different signatures, where S1, S2 and S3 are the different signatures.
2.3 Detection Methods
2.3.1 Signature based detection
Here the scanner searches for signatures, which are sequences of bytes within the virus code; finding one indicates that the scanned program is malicious. Signatures are easy to develop once the network behavior is identified. Signature based detection relies on pattern matching. Pattern matching techniques date back to the time when the operating system was DOS. The viruses of that time were parasitic in nature and attacked host files, most commonly executable files (Daniel, Sanok 2005).
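A minimal sketch of this idea, assuming a signature is simply a byte sequence looked up anywhere in the scanned data. The signature names and byte values below are made up for illustration and do not correspond to any real virus or product database:

```python
# Hypothetical signature table: name -> byte sequence (invented values).
SIGNATURES = {
    "Demo.A": bytes.fromhex("deadbeef0102"),
    "Demo.B": bytes.fromhex("cafebabe"),
}


def scan_bytes(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures that occur in data."""
    # The `in` operator on bytes performs a substring search.
    return sorted(name for name, sig in signatures.items() if sig in data)
```

A real scanner would replace the built-in substring search with a dedicated pattern matching algorithm and a far larger database; the sketch only shows the matching principle.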
2.3.2 Heuristic based detection
Heuristic detection scans for viruses by evaluating patterns of behavior. It estimates the likelihood that a file or program is a virus by testing its characteristics and behavior and matching them against the antivirus heuristic database, which contains a number of indicators. It helps discover viruses that have no signatures or that hide their signatures. It also helps detect metamorphic viruses (Daniel, Sanok 2005).
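One simple way to picture matching behavior against a database of indicators is a weighted score with a threshold. The sketch below is only an illustration under that assumption: the indicator names, weights and threshold are invented and are not taken from any antivirus product.

```python
# Hypothetical indicator database: observed behavior -> suspicion weight.
INDICATORS = {
    "writes_to_boot_sector": 5,
    "modifies_other_executables": 4,
    "decrypts_own_code": 3,
    "opens_network_socket": 1,
}


def heuristic_score(observed, indicators=INDICATORS):
    """Sum the weights of all known indicators present in `observed`."""
    return sum(weight for name, weight in indicators.items() if name in observed)


def is_suspicious(observed, threshold=6):
    """Flag a program when its combined score reaches the threshold."""
    return heuristic_score(observed) >= threshold
```

Because the verdict depends on a threshold rather than an exact match, a scheme like this can flag unseen viruses but can also misclassify harmless programs.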
2.3.3 Obfuscation Technique
Viruses use this technique to transform an original program into a virus program by applying transformation functions. The transformed program is hard to reverse, performs comparably to the original program and retains its functions. This technique is used mainly by metamorphic and polymorphic viruses (Daniel, Sanok 2005).
2.4 Antivirus Products
There are many antivirus products available in the commercial market. Some of the most commonly used antivirus products are:
McAfee
G Data
Symantec
Avast
Kaspersky
Trend Micro
AVG
Bit Defender
Norton
ESET Nod32
Chapter 3: Literature Review
3.1 Antivirus workload characterization
Research by (Derek, Mischa, David 2005) shows that an antivirus software package uses a wide range of techniques to check whether a file is infected. According to their observations, the best way to differentiate between antivirus software packages is to compare the overheads that each introduces during on-access execution.
Antivirus software runs in one of two main models:
on-demand
on-access
On-demand scanning scans the files the user specifies, whereas on-access scanning is a process that monitors system-level and user-level operations and scans when an event occurs.
The paper discusses the behavior of four different antivirus software packages running on an Intel Pentium IV with Windows XP Professional installed. Three test scenarios are considered:
A small executable file is copied from the CD-ROM to the hard disk.
calc.exe is executed.
wordpad.exe is executed.
All these executables run on the Windows XP Professional operating system. The antivirus packages used in this experiment were PC-Cillin, F-Prot, McAfee and Norton. The files were executed with each of these packages active. Figure 3.1.1 shows that using these packages introduces overheads that increase the execution time.
Fig 3.1.1 Performance degradation of antivirus packages (Derek, Mischa, David 2005)
A test was then made to measure the extra instructions executed when file system operations are performed and when a binary is loaded and executed. Both scenarios involve a very small binary. The execution was found to be dominated by a few hot basic blocks in each antivirus package. A basic block is considered “hot” if it is visited more than fifty thousand times.
To study the behavior of antivirus software packages, (Derek, Mischa, David 2005) used the platform most targeted by virus attacks, for which commercial antivirus software also exists. They used a simulator framework called Virtutech Simics, with the architectural structure shown in table 3.1.1. Virtutech Simics is a simulator that includes a cycle-accurate micro-architectural model and is used to obtain cycle-accurate performance numbers.
Table 3.1.1 Virtutech Simics architectural structures (Derek, Mischa, David 2005)
Processor Model: Intel Pentium 4 2.0A
Processor Operating Frequency: 2 GHz
L1 Trace Cache: 12K entry
L1 Data Cache: 8 KB
L2 Cache: 512 KB
Main Memory: 256 MB
The goal of the model is to capture the execution of antivirus software on a system. To obtain metrics, the executed instruction stream is passed to the simulator. Simics is configured to simulate the microprocessor. The host (simulator) executes the operating system, which is loaded from a simulated hard drive. On top of the operating system the researchers installed and ran the antivirus software and the test scenarios (see figure 3.1.2). The baseline configuration (without antivirus software installed) is then compared with systems on which the four different antivirus packages are installed.
Fig 3.1.2 Multi-level architectural and micro-architectural simulation environment (Derek, Mischa, and David 2005). [Diagram labels: Host; Simulated Architecture; Operating System (Windows XP); Copy/execute process; Antivirus Process; Inst Stream; Simulate micro-architecture; L1 Inst Cache; L1 Data Cache; L2 Cache.]
Table 3.1.2 summarizes the five configurations. For each experiment an image file was created and loaded as a CD-ROM in the machine. A utility containing special instructions was executed at the start and end of each collection to assist accurate profile collection.
Table 3.1.2: Five environments evaluated; Base has no antivirus software running (Derek, Mischa, David 2005)
Configuration | Anti-Virus edition | Version
Base | – | –
NAV | Norton Anti-Virus Professional 2004 | 10.0.0.109
PC-Cillin | Trend Micro Internet Security | 11.0.0.1253
McAfee | McAfee Virus scan professional | 8.0.20
F-Prot | F-Prot Antivirus for windows | 3.14b
Three different operations invoke antivirus scanning. First, a file was copied from the CD-ROM to the hard drive; then the operating system accessories Calculator and WordPad were run through a shortcut. The experiments showed less than one percent difference in the workload parameters across the profile runs.
The antivirus characterization shows an increase in cache activity, with the smallest overhead for F-Prot and the highest for Norton. The impact on memory while running the antivirus software shows that Norton and McAfee have larger footprints than the Base case, F-Prot and PC-Cillin.
3.2 Development techniques: a framework for malware detection using a combination of techniques
Antivirus techniques continue to develop. A new technique counts as a development when it can detect viruses that previous techniques could not. Antivirus software detects not only viruses but also worms, Trojan horses, spyware and other malicious code, which together constitute malware. Malware is code or a program that is intended to damage the computer.
Malware can be filtered with antivirus software that implements specific detection techniques and algorithms. Several commercial antivirus programs use a common technique called signature-based matching; it must be updated often so that new malware signatures are stored in the virus dictionary. As technology advances, malware writers employ better hiding techniques; rootkits in particular have become a security issue because of their strong hiding ability.
Many new methods have been developed to detect malware, including machine learning and data mining techniques. Zolkipli, M.F.; Jantan, A., 2010 proposed a new framework for detecting malware that combines two techniques: the signature-based technique and a machine learning technique. The framework has three main sections: signature-based detection, genetic algorithm based detection and a signature generator.
Zolkipli, M.F.; Jantan, A., 2010 define malware as “the software that performs actions intended by an attacker without consent of the owner when executed”. Every malware has its own characteristics, attack target and transmission method. According to Zolkipli, M.F.; Jantan, A., 2010, a virus is malware “which when executed tries to replicate itself into other executable code within a host”. As technology has advanced, malware creation has become far more sophisticated than in the early days.
Signature-based matching is the most common approach to detecting malware. It compares file content with signatures using an approach called string scan, which “search for pre-defined bit patterns”. Although it is popular and very reliable as a host-based security tool, it has limitations that remain to be solved. The main problem is that it fails to detect zero-day virus or malware attacks, also called newly launched malware. A number of computers must be infected before a new virus pattern can be captured and stored for future use.
Figure 3.2.1 shows an automatic malware removal and system repair framework developed by F.Hsu et al. 2006, which has three important parts: a monitor, a logger and a recovery agent.
The framework solves two problems:
Determining which un-trusted program breaks the system integrity.
Removing the un-trusted program.
Figure 3.2.1: Framework for monitoring, logging & recovery by F.Hsu et al. 2006. [Diagram labels: Untrusted Process, Trusted Process, Monitor, Logger, Recovery agent, Operating System.]
The framework monitors un-trusted programs and logs their activity. It can defend against both known and unknown malware, since it needs no prior information about the un-trusted programs. Users do not need to modify any existing programs or be aware that a program is running inside the framework, and the framework is invisible to both known and unknown malware. A prototype of the framework was tested in the Windows environment and showed that all changes made by malware can be detected, in contrast to commercial tools that use the signature based technique.
Machine learning algorithms have also been tested and applied to malware detection. To address the limitations of the signature-based technique, one such approach used adaptive data compression. According to Zolkipli, M.F.; Jantan, A., 2010, the two restrictions of the signature-based technique are:
Not all malicious programs have bit patterns that are evidence of their malicious nature, and some patterns are not recorded in virus dictionaries.
Obfuscated malware can take many forms of bit patterns, which defeats the signature-based technique.
Genetic Algorithm (GA) based detection targets these limitations and is used to detect zero-day malware, that is, malware on the day it is launched. The algorithm was used to develop a detection technique called IMAD that analyzes new malware. This technique was developed to counter the restrictions of the signature-based detection technique.
Data mining is another technique that has long been applied to malware detection. A standard data mining algorithm classifies each block of file content as normal or as potentially malicious. To overcome the limitations of signature-based antivirus programs, an Intelligent Malware Detection System known as IMDS was developed. This system uses Object Oriented Association mining, adapting the OOA_Fast_FPGrowth algorithm. A complete experiment was carried out on the Windows API sequences of files known as PE files. A large collection of PE files was taken from the King Soft Corporation antivirus laboratory and used to compare several malware detection approaches. The results show that IMDS performs better than Norton and McAfee. The framework proposed by Zolkipli, M.F.; Jantan, A., 2010 combines two techniques, the signature-based technique and the GA technique. It was designed to resolve two challenges of malware detection:
“How to detect newly launched malware” (Zolkipli, M.F.; Jantan, A., 2010)
“How to generate signature from infected file” (Zolkipli, M.F.; Jantan, A., 2010)
Figure 3.2.2: Framework for malware detection technique (Zolkipli, M.F.; Jantan, A., 2010). [Diagram labels: S-Based Detection, GA Detection, Signature Generator.]
The main components are s-based detection, GA detection and the s-based generator (see figure 3.2.2). S-based detection acts first in defending against malware; GA detection is a second defense layer that detects newly launched malware. Once a new signature has been created from zero-day malware, it is used by the signature based detection technique.
Signature based detection is a static analysis method used in every antivirus product. It decides whether code is malicious by using its malware characterization. This technique is sometimes also called scan strings. In general, every malware has one or more signature patterns with unique characteristics. When a program is executed, the antivirus software searches through the bytes of its data stream. The antivirus database holds thousands of signatures, and the scanner compares each signature with the code of the program being executed. A searching algorithm carries out this comparison between the program code and the signature database. Zolkipli, M.F.; Jantan, A., 2010 place this technique at the beginning of the framework because it detects well known viruses effectively. Using it first also improves the efficiency of computer operation.
GA detection is one of the most popular techniques for detecting newly launched malware. Genetic algorithms are used to learn approaches to algebraic or statistical research problems. GA is a machine learning technique that applies genetic programming to learn from an evolving population. Data is represented as chromosomes, which are bit strings; new chromosomes are formed by combining the bit strings of existing chromosomes. The solution depends on the nature of the problem. Crossover and mutation are the two basic operations in GA. The technique was introduced in this framework to address polymorphic viruses and new types of malware. It can also detect malware code that uses hiding techniques, because it learns and filters aspects of virus behavior (Zolkipli, M.F.; Jantan, A., 2010).
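The two basic GA operations on bit-string chromosomes can be sketched as follows. This is a generic illustration of single-point crossover and bit-flip mutation, not the specific operators of the Zolkipli and Jantan framework, whose details are not given here; the function names and parameters are illustrative.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`.

    `rng` is a random.Random instance, passed in so runs can be reproduced.
    """
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )
```

For instance, crossing "11110000" with "00001111" at point 4 yields "11111111" and "00000000". In a full GA these operators would be applied repeatedly to a population, with a fitness function deciding which chromosomes survive.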
The s-based generator produces the string patterns used as signatures to characterize and identify viruses. Traditionally, forensic experts create signatures once a new virus sample is found, based on the behavior of the virus. Each antivirus product creates its own signatures and access records, which are encrypted in case more than one antivirus product is installed on the computer. As soon as a signature is created, the signature database is updated with it. Every computer user needs to update the antivirus product with the latest database in order to defend against new viruses. A signature pattern is 16 bytes long, which is more than enough to detect a 16-bit virus (Zolkipli, M.F.; Jantan, A., 2010).
The generator takes the behavior of a virus identified by GA detection. It generates the signature pattern of the virus and adds it to the virus database as a new signature for signature based detection. The framework was proposed to replace the task of the forensic experts. It is useful for generating new virus signatures and for improving the efficiency and performance of the computer.
3.3 Improving speed of signature scanners using BMH algorithm
This paper discusses the problem of detecting viruses with the signature scanning method, which relies on a fast pattern matching algorithm. In this technique the pattern is a virus signature, which is searched for anywhere in the file. Pattern matching is an expensive task that frequently affects performance. Many users lose patience if the pattern matching is slow and consumes a lot of time. To avoid this, the scanner uses a faster pattern matching algorithm, the Boyer-Moore Horspool algorithm, which proved to be the fastest when compared with the Boyer-Moore and Turbo Boyer-Moore algorithms.
In technical terms, a virus has three parts: a trigger, an infection mechanism and a payload. The infection mechanism is the main part: it looks for victims and usually avoids infecting the same file more than once. Having found a victim, it may overwrite it or attach itself at the beginning or the end of the file. The trigger is an event that specifies when the payload is to be executed. The payload is the source of the malicious behavior, such as corrupting the boot sector or manipulating files.
Detecting a virus and disinfecting the infected file are the two most important tasks of the algorithms used by antivirus software. The defense system must therefore include a part that is able to detect any type of virus code.
There are four basic detection techniques:
Integrity Checking
Signature Scanning
Activity Monitoring
Heuristic Method
Integrity checking technique:
An integrity checker computes check codes, which can be checksums, CRCs or hashes of files, and uses them to check for viruses. The checksums are regularly re-computed and compared against the previous checksums. If the two checksums do not match, the file has been modified and is reported as infected. This technique detects the presence of a virus by detecting changes in files and is also capable of detecting new or unknown viruses. However, it has several drawbacks. First, the initial checksum calculation has to be performed on a clean, virus-free system, so the technique cannot detect viruses that were already on the system. Second, there are many false positives if files are legitimately modified during normal operation (Sunitha Kanaujiya, et., al 2010).
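A minimal sketch of the checksum comparison, assuming SHA-256 as the hash and an in-memory mapping of path to file contents in place of real disk access (both are illustrative choices; the source does not prescribe a particular hash or storage layout):

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def build_baseline(files):
    """Record a checksum per path; must be run on a known-clean system."""
    return {path: checksum(content) for path, content in files.items()}


def changed_files(baseline, files):
    """Return paths that are new or whose checksum differs from the baseline."""
    return sorted(
        path
        for path, content in files.items()
        if baseline.get(path) != checksum(content)
    )
```

Any path returned by `changed_files` has been modified since the baseline was taken. As noted above, this shows only that a change happened, not that the change was malicious, which is where the false positives come from.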
Signature scanning technique:
This technique is used on a large scale to detect viruses. It reads data from a system and applies a pattern matching algorithm against a list of known virus patterns; if the data matches an existing pattern, it is reported as a virus. This scanning technique is effective, but the pattern database needs frequent updating, which is very easy to do. The scanner has several advantages: its scanning speed can be increased, and it can also detect other types of malicious programs such as Trojan horses, worms and logic bombs. All that is needed for a new virus is its signature, which is added to the database. For this reason the technique is used against many viruses.
Activity monitoring technique:
This technique monitors the behavior of running programs using other programs, known as behavior monitors, that stay in main memory. A behavior monitor raises an alarm or takes action to stop a program when it attempts unusual activities such as modifying interrupt tables, partition tables or boot sectors. A database holds the behavior that each virus is expected to show. The main disadvantage is that when a new virus uses an infection method that is not in the database, the monitor cannot find it. Second, viruses can avoid the defense by activating earlier in the boot sequence, before the behavior monitors. Viruses can also modify the monitors if there is no hardware memory protection.
Heuristic Scanner:
This technique checks the characteristics of a file and can find unknown viruses. Its dynamic and statistical checks estimate the likelihood of infection. It can find many new viruses before they execute. The main drawback is that a harmless file is sometimes placed in the list of infected files.
The pattern matching algorithm is central to the signature scanning technique. A faster pattern matching algorithm is needed to improve system performance, so the detection tool uses the Boyer-Moore Horspool (BMH) algorithm, a fast pattern matching algorithm that is used by several popular software products and compares well with other sequential pattern searching algorithms.
The pattern matching problem is stated as follows: given a text “T” of length “n” and a pattern “P” of length “m”, locate “P” in “T”, or decide whether “P” occurs in “T” at all (Sunitha Kanaujiya, et., al 2010).
The Boyer-Moore Horspool implementation uses position indicators: “j” indexes the pattern and “k” indexes the target text. The first letter of pattern “P” is aligned under the first letter of the target text “T”. The alignment works like a window over the text that shows only “m” characters, the pattern length. Other positions come into view later, as the window shifts to the right. A further indicator “i”, initialized to “m-1”, records the rightmost text position visible through the window. Comparison proceeds letter by letter starting from the last pattern letter Pm-1, each comparison being between text character Tk and pattern character Pj. Both j and k are decreased after every successful comparison. This continues as long as the characters match and un-compared characters remain in the pattern P. If j = -1, all the pattern characters have been matched and an occurrence of the pattern in the text has been found. Whether or not a match is found, the window is shifted to the right by a certain distance d (i is increased by d), k is set to i and j is set to m-1. The process repeats until the end of the text is reached.
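The procedure can be sketched in Python as follows, reusing the indicator names i, j and k from the description. The text leaves the shift distance d unspecified; the sketch assumes the standard Horspool rule, where d depends on the text character under the rightmost window position. The function name is illustrative, and this is not the C implementation measured in the paper.

```python
def bmh_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Works on str or bytes, as long as both arguments have the same type.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table (standard Horspool rule, assumed here): for each symbol in
    # the pattern except the last position, the distance from its rightmost
    # occurrence to the end of the pattern. Other symbols shift by m.
    shift = {}
    for idx in range(m - 1):
        shift[pattern[idx]] = m - 1 - idx
    matches = []
    i = m - 1  # rightmost text position visible through the window
    while i < n:
        j, k = m - 1, i
        # Compare right to left while characters match.
        while j >= 0 and text[k] == pattern[j]:
            j -= 1
            k -= 1
        if j == -1:
            matches.append(k + 1)  # whole pattern matched
        # Shift the window by d, whether or not a match was found.
        i += shift.get(text[i], m)
    return matches
```

Searching for `b"abra"` in `b"abracadabra"` returns the start positions 0 and 7. A signature scanner would call such a routine once per signature, or use a multi-pattern variant.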
A signature scanner has two main parts: a signature database and a scanning engine that searches for the virus signatures held in the database. Neither part is useful on its own; they complement each other. Implementing a signature scanner involves two steps: the first is to bring the signature database up to date with the known viruses, and the second is to search the scanned data for the signatures stored in the database. According to Sunitha Kanaujiya et al. (2010), “Signature database is a database of uniquely identifiable signatures that a virus contains”. A signature is a sequence of machine code bytes taken from an executable virus. Each entry in the signature database contains the following fields:
Signature of the virus in HEX
Virus type (B for boot sector, F for file viruses and P for partition table)
Virus description
Whenever a new virus appears, the database can be updated through a data entry program. The user is asked to enter the virus signature in HEX (hexadecimal code) without blank spaces or commas, then the virus type, and lastly the virus description. For the verified data to be saved in the database, the description must include the name of the virus, its properties, comments about it, and so on.
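The data entry check described above might look like the following sketch. The record layout mirrors the three fields listed in the text; the names `SignatureRecord` and `parse_entry`, and the exact validation rules, are assumptions made for illustration and are not taken from the cited study.

```python
from dataclasses import dataclass

# Type codes as listed in the text.
VIRUS_TYPES = {"B": "boot sector", "F": "file", "P": "partition table"}


@dataclass(frozen=True)
class SignatureRecord:
    signature: bytes   # signature decoded from HEX
    virus_type: str    # "B", "F" or "P"
    description: str   # name, properties, comments


def parse_entry(hex_signature: str, virus_type: str, description: str) -> SignatureRecord:
    """Validate one user-entered signature and return it as a record."""
    # The text requires HEX without blank spaces or commas.
    if any(ch.isspace() or ch == "," for ch in hex_signature):
        raise ValueError("signature must not contain spaces or commas")
    if not hex_signature:
        raise ValueError("signature must not be empty")
    try:
        signature = bytes.fromhex(hex_signature)
    except ValueError as exc:
        raise ValueError("signature is not valid hexadecimal") from exc
    code = virus_type.strip().upper()
    if code not in VIRUS_TYPES:
        raise ValueError("virus type must be B, F or P")
    if not description.strip():
        raise ValueError("description must not be empty")
    return SignatureRecord(signature, code, description.strip())


# Hypothetical entry, for illustration only.
record = parse_entry("DEADBEEF", "f", "Demo.A: example file infector")
print(record.virus_type, record.signature.hex())
```

Decoding the HEX string into bytes at entry time means the scanning engine can compare signatures against file contents directly, without converting on every scan.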
The virus detection engine scans the boot sector, the partition table and all types of files. The scanner first reads the virus details from the database and then searches for the virus code; when an exact match is found, the code is identified as a virus. To keep scanning fast, the Boyer-Moore-Horspool algorithm is used, which is a very fast pattern matching algorithm. During a scan, each file is checked from its first byte to its last byte against the signature database. The scanner notifies the user whenever a virus pattern is detected.
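The scanning loop can be sketched as below. To keep the example short, Python's built-in `bytes.find` stands in for the Boyer-Moore-Horspool search; that substitution, the function names `scan_bytes` and `scan_file`, and the sample signatures are all assumptions made for illustration.

```python
from pathlib import Path


def scan_bytes(data: bytes, signatures: dict[str, bytes]) -> list[tuple[str, int]]:
    """Return (virus name, offset) for every signature found in data."""
    hits = []
    for name, signature in signatures.items():
        if not signature:
            continue  # an empty signature would match everywhere
        # bytes.find is a stand-in for the pattern matching algorithm.
        offset = data.find(signature)
        if offset != -1:
            hits.append((name, offset))
    return hits


def scan_file(path: Path, signatures: dict[str, bytes]) -> list[tuple[str, int]]:
    """Scan a file from its first byte to its last byte."""
    return scan_bytes(path.read_bytes(), signatures)


# Made-up signatures and data, for illustration only.
signatures = {"Demo.A": bytes.fromhex("deadbeef"), "Demo.B": b"\x90\x90\xcc"}
sample = b"\x00" * 10 + bytes.fromhex("deadbeef") + b"\x00" * 5
for name, offset in scan_bytes(sample, signatures):
    print(f"warning: {name} signature found at offset {offset}")
```

Reading the whole file into memory is only reasonable for small files; a scanner handling large files would more likely read in chunks that overlap by the length of the longest signature.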
A performance analysis was carried out for searches of boot sectors, partition tables and all types of file viruses. In the measurements by Sunitha Kanaujiya et al. (2010), the algorithms were implemented in C. The target text was divided into slices of 1024 characters each, except for the last slice, which may hold fewer characters. All measurements were made incrementally, with the searched text growing in steps of one slice from a single slice up to the full target size. All the algorithms were tested on many patterns.
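One way to read this incremental measurement scheme is sketched below in Python. It is an interpretation of the description rather than the harness used in the cited study; the names `incremental_prefixes` and `time_search` are hypothetical, and `bytes.find` is used only as a placeholder search function.

```python
import time
from typing import Callable, Iterator

SLICE_SIZE = 1024  # slice length stated in the text


def incremental_prefixes(text: bytes, slice_size: int = SLICE_SIZE) -> Iterator[bytes]:
    """Yield prefixes of text that grow by one slice up to the full text."""
    for end in range(slice_size, len(text) + slice_size, slice_size):
        yield text[:end]  # the last prefix may end in a shorter slice


def time_search(
    search: Callable[[bytes, bytes], int], text: bytes, pattern: bytes
) -> list[tuple[int, float]]:
    """Time one search per prefix; return (prefix length, seconds) pairs."""
    results = []
    for prefix in incremental_prefixes(text):
        start = time.perf_counter()
        search(prefix, pattern)
        results.append((len(prefix), time.perf_counter() - start))
    return results


# Placeholder data: 2500 zero bytes and a pattern that never occurs.
for length, seconds in time_search(bytes.find, bytes(2500), b"\x01\x02"):
    print(length, f"{seconds:.6f}")
```

With a 2500-byte text, the searched prefixes have lengths 1024, 2048 and 2500, so the last step adds a slice of only 452 bytes.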
The first test was run on the boot sector using boot sector virus signatures. The target text was only 512 bytes, so the performance differences between the Boyer-Moore algorithm and its variants were very slight; even so, the Boyer-Moore-Horspool algorithm was the fastest and was faster than the sequential algorithm.
The second test searched the partition table on the hard disk for partition table virus signatures. The target text here was also 512 bytes, so the results were the same. The third test covered all types of file viruses; here 1127 files occupying 1.5 GB were scanned.
Table 3.3.1 shows the performance of all the algorithms as a function of the size of the signature database. For each number of patterns, the time taken by each algorithm is given so that the algorithms can be compared.
Table 3.3.1: Performance of algorithms on the basis of signature database (Sunitha Kanaujiya et al., 2010).

| Database size (No. of patterns) | Sequential algorithm (sec) | TBM (sec) | BM (sec) | BMH (sec) |
|---|---|---|---|---|
| 20 | 8.6 | 6.8 | 6.3 | 5.3 |
| 40 | 10.7 | 9.4 | 8.2 | 7.2 |
| 60 | 11.4 | 10.2 | 9.2 | 8.5 |
| 80 | 15.6 | 14.8 | 11.5 | 9.6 |
| 100 | 18.5 | 18.1 | 13.9 | 11.8 |
Next is the algorithm performance according to pattern size. The sequential algorithm does not use a skip table to make it more efficient, so its shift is the same for every pattern. This is not the case for the Boyer-Moore-Horspool algorithm and its variants: on a mismatch, a larger pattern size means a longer skip shift, and so the algorithm is faster. Across all the tests, the Boyer-Moore-Horspool algorithm proved to be the fastest of the algorithms.
Table 3.3.2 below shows the speed of the Boyer-Moore-Horspool algorithm and its advantage on longer patterns. In the comparison between the sequential and Boyer-Moore-Horspool algorithms, the sequential algorithm performs better on short patterns than on long ones (its time rises from 4.34 to 6.26 seconds), whereas the Boyer-Moore-Horspool algorithm becomes faster as the pattern grows (from 3.29 down to 1.70 seconds).
Table 3.3.2: Performance of algorithms on the basis of pattern size (Sunitha Kanaujiya et al., 2010).

| Pattern size (No. of chars.) | Sequential algorithm (sec) | TBM (sec) | BM (sec) | BMH (sec) |
|---|---|---|---|---|
| 8 | 4.34 | 3.46 | 3.35 | 3.29 |
| 16 | 4.50 | 3.25 | 3.29 | 3.24 |
| 20 | 5.27 | 2.91 | 2.75 | 2.08 |
| 32 | 5.34 | 2.23 | 2.14 | 1.89 |
| 48 | 6.26 | 1.86 | 1.75 | 1.70 |
Sunitha Kanaujiya et al. (2010) conclude that signature scanners can be evaluated on two criteria: fast scanning and effective detection of viruses. They also report that the implementation gives good results in acceptable time and that, used carefully, the Boyer-Moore-Horspool algorithm can improve the performance of a virus detection system compared with the widely used Boyer-Moore pattern searching algorithm. For longer text patterns, the Boyer-Moore-Horspool algorithm is especially worth implementing. The scanner is not limited in the types of files it can scan. At present the signature database holds one hundred signatures, but for real-world use the signature of every existing virus would have to be maintained in the signature database.
3.4 Security of Emails
Every organization needs email, and every organization is put at risk by it. Email is also the channel of choice for many attackers worldwide. Organizations therefore focus much of their security effort on email, because it is a dangerous route by which attackers can damage an organization's important files.
Email security vulnerabilities are