When we hear the words "computer virus", we usually think of it as a computer program that does nasty damages on computer systems. However, according to many researchers in the computer virus field, a computer virus is 
A self-replicating program that can "infect" other programs by modifying them or their environment such that a call to an "infected" program implies a call to a possibly evolved, and in most cases, functionally similar copy of the "virus."
The keywords to the definition of computer virus are "self-replicating" and "parasitic". That is, a computer virus always copies itself and attaches the copies to other computer programs. The first computer virus, Brain, is a harmless virus that copies itself and puts a volume label on floppy diskettes. Brain is the result of Basit and Amjad's realization that the boot sector of a floppy diskette contains executable codes, and the codes get executed whenever a computer is booted with a floppy diskette inserted in drive A. They also realized that they could replace those executable codes with their own codes, that the codes can be memory resident, and that the memory resident codes can copy itself onto each floppy diskette that is accessed in any drive . The replication ability of computer viruses attracted many attentions and interests, and soon people started writing both harmful and harmless virus codes. The appearance of anti-virus programs intrigued virus writers to write viruses that can evade detection or make detection difficult. Two popular classes of viruses that use different ideas for evasion are stealth viruses and polymorphic viruses. A stealth virus is a virus that hides modifications it has made in files or boot sectors. A polymorphic virus, on the other hand, generates numerous mutated versions of itself, relying on the anti-virus tools' inability to detect all instances of the virus. The ways these virus hide themselves are very interesting. Most of the time, they use ideas that exploit the weaknesses in computer security and the difficulty in decrypting codes in advanced cryptography. Anti-virus researchers, seeing the new challenges raised by the virus writers, try to unconventional methods to detect the presence of these viruses. Some of these techniques are generic enough to detect most of these viruses, while some are focused to combat a particular virus. This paper tries to list major evasion techniques used by stealth and polymorphic viruses. It also tries to see if any of these evasion techniques used can be combined or improved in order to generate viruses that are even harder to detect. The other side of this two-way problem, virus detection, will also be presented. This paper also tries to suggest the possibility of combining some of the detection techniques for better virus defense.
The first stealth virus was Brian, which is a boot infector. It monitors physical disk I/O and redirects any attempt on reading a Brian-infected boot sector to where the original disk sector is stored. The idea of stealth virus is to hide any modifications made to files or boot records in order to avoid detection . Stealth viruses are memory resident viruses because they need to monitor any attempt to read the infected files or the infected boot sectors. Stealth viruses are separated into semi stealth viruses and full stealth viruses. Semi stealth viruses are those that subtract the size of the virus code (in bytes) from the infected host file any time a directory command (DIR) is executed. Full stealth viruses are those that remove themselves from the file and re-infect it when the inspection is over .
Since stealth viruses need to monitor any accesses to the infected files and to intercept those requests, they must "hook" onto BIOS Int 13h and Int 21h. Applications use software interrupts to communicate with the operating system. The operating system has a table of the addresses of software interrupt handlers. Whenever a software interrupt is called, the operating system determines the address of the interrupt handler, jumps to that address, and starts executing the interrupt handler. A virus hooks onto an interrupt by changing the address of the interrupt handler in the interrupt table to the location of the virus code. It can then decide what to do with the request . Figure 1 shows how an interrupt routine is affected by a stealth virus.
Figure 1. An interrupt routine before and after virus infection
Both semi stealth and full stealth viruses infect a file or a boot sector by first hooking onto Int 21h. The difference between them is the instructions in the virus handler. Semi stealth viruses subtract the virus code size from the size obtained by Int 21h and give the result back to Int 21h to display the value on screen. Full stealth viruses, on the other hand, remove itself from the file or the boot sector before the interrupt functions are applied to the file or the boot sector. Thus, the interrupt functions obtain the correct results. After the inspection, the full stealth viruses re-infect the file or the boot sector.
Since stealth viruses deliberately hide modifications made to files and boot sectors, an ordinary virus string scanner cannot detect their presence. Stealth viruses are generally detected and handled by three techniques:
Generic techniques are a set of techniques that can be applied to many different viruses. They include the DOS command Fdisk, generic virus capture, and generic integrity analyzers.
The simplest generic technique is the DOS "fdisk" command. This method is applicable when a disk's boot sector has been infected by a stealth virus. The stealth virus usually copies the whole Master Boot Record of the hard disk to some other sector to make room for its own code. Thus, the command Fdisk / MBR overwrites the virus code with a new Master Boot Record. The original partition table can then be manually restored.
The idea of generic virus capture is to sense any virus-like behaviors occurring in the computer system. The kind of testings virus capture performs include self-baiting, self integrity checking, verification of memory stealing, launching bait sequences, sensing piggybacking or file killing, integrity checking, sensing deception like the taking away of file handle . Stealth viruses, when active, will disclose their existences to one or more of these sensing methods. Generic virus capture, therefore, can detect the activities of both known and new viruses.
Generic integrity analyzers are the most powerful generic anti-virus method. When used properly, it can tell the size of the virus, whether the virus is semi or full stealth, and if the file is recoverable . Generic integrity analyzers function by comparing the first several bytes of a possible infected file with the first several bytes of the uninfected file. The first several bytes of a file reveals much information because a virus infects a file by either appending, prepending, or inserting virus code into the file and modifying the first several bytes of the file to jump to the virus code. This method requires a snapshot of the clean computer system.
SeeThru gets its name because anti-virus tools use low level tunneling techniques that "see through" virus hooking onto BIOS Int 13h. This method is only applicable to full stealth viruses of both the boot and file infector types. The idea behind SeeThru is very simple. Full stealth viruses always return the correct, uninfected data of the inspected boot sector or file when the file or boot sector is being read. To recover from a boot sector infection, simply copy the correct image returned by the stealth virus and rewrite it to the same place using tunneling techniques. To recover from a file infector, simply copy the file returned by the full stealth virus and overwrite the infected file.
Cooperative integrity checking exploits the same idea that full stealth virus returns the correct information about the infected boot sector or file. A new integrity database is obtained while the virus is active in memory. The computer system is then rebooted and programs are restored from the new integrity database.
The first polymorphic virus was written by Mark Washburn in 1990. The idea of this virus and its sequences (1260, V2P1, V2P2, and V2P6) is that the virus code would be variably encrypted, and a decryptor would be placed at the beginning of the virus. The decryptor could take a wide number of forms. These viruses have a longest possible search string of two bytes. Consequently, signature matching used in the anti-virus tools at that time was unable to detect polymorphic viruses . Even though 1260 was the first polymorphic virus, it was not widespread. Tequila was actually the first widespread polymorphic virus. Tequila was first erupted in April 1991. Most of the major scanners were not able to detect the virus reliably until September 1991 . Polymorphic viruses became prolific when the mutation engine from Dark Avenger became available for download in 1992. Since then, many new ideas and techniques were developed and used by virus writers for protecting their works from anti-virus tools. Anti-virus researchers, on the other hand, disassembled those viruses, learnt their characteristics, and tried to provide solutions for detecting their existence as well as recovering the damages done by these viruses. In the sections below, some of the major techniques used are discussed. Detection techniques used in anti-virus tools will also be described.
The idea behind polymorphic viruses is to generate mutated copies of the virus when the virus replicates. Since each copy of the virus is different, the difficulty in matching virus signature increases tremendously. The following is a list of major techniques used by polymorphic viruses to avoid detection:
This is actually one of the simplest techniques for generating mutated versions of a virus. The virus self-encrypts with a variable key every time it replicates. Before an execution, the decryptor decrypts the virus code in order to have the instructions executed properly. The decryptor must be in plaintext. The typical size of a decryptor is about ten or twenty bytes. The decryptor is identical in every infected executable . The format of an encrypted virus is shown in Figure 2, which is taken out of . "DE" shown is the decryptor.
Figure 2. Two programs infected with the same virus.
Examples of this subclass of viruses include Cascade, and December_3rd. Even though these viruses generates different encrypted virus codes every time the viruses replicate, they are not truly polymorphic since the decryptor is the same for all infected programs. They are often referred as oligomorphic viruses . Since the decryptor remains the same for all infected programs, it can be used as the signature. Therefore, these viruses can be detected by signature driven scanners.
Several simple techniques can be used for code alternations:
Figure 3. An example on noise instruction addition
Figure 4. An example on instruction interchange
Figure 5. An example on equivalent instruction
Figure 6. An example on changing registers
The resulting virus codes after code alternations are different from the original virus. The code alternation methods above can be used in any combinations in order to produce different effects. Code alternation increases the difficulty for signature scanners to assign a particular signature to the virus. It can be combined with encryption to provide even further varied copies of the virus. An interesting used of noise instructions mentioned in  is described below:
First, the space to be filled with the encryption mechanism is filled with dummy one-byte op-codes such as CLC, STC, etc. The parts of the encryption mechanism are randomly placed in parts of this buffer. The end result is that the gaps between the "real" instructions are filled in with random dummy op-codes. No generic scan string can be located for the encryption mechanism of the virus. The disadvantage of this method is of course the increase in code size.
The generic coding technique exploits the fact that scan string (or virus signature) represents actual code and can NEVER contain code that can occur in a "normal" program . Virus scanners detect viruses by searching for virus signature that could be located anywhere in the file. If a virus is written in such a way that most of its codes are common and can easily be found in any program files, virus scanners then have no way of distinguishing the virus program from any other programs.
It has been mentioned that virus encryption with only variable keys is not truly polymorphic. Virus scanners can easily use the decryptor, which remains constant for each copy of the virus, as the scan string. Thus, in order to minimize scannable code, the use of a variable encryptor / decryptor pair for encryption and decryption of viruses has been suggested. A virus using this technique has a database of encryption / decryption engines. It randomly uses one each time it replicates. The encryption engines can be various forms of XOR operations, mathematical operations such as ADD, SUB, or assembler instructions such as ROR, MOV, PUSH, etc. Examples of this type of viruses include Mark Washburn's V2Px series of viruses.
Yet another sophisticated technique for generating polymorphic viruses is the use of polymorphic generators. They are modules that can be linked to existing viruses. After linking the virus to the polymorphic generator, the virus can request the generator to create an encrypted copy of the virus code and the generator itself, plus a decryptor. The encryption techniques used by the polymorphic generators are usually relatively simple. However, the combinations of simple encryption techniques, variable key encryption, and decryption code alternations, make virus detection by scan strings almost impossible. Polymorphic generators infect a file by replacing the first byte of the file with a jump command to the end of the file. It then appends a decryptor, the encrypted version of the virus code and the generator itself, as well as the original first byte of the file to the end of the file. Figure 7 illustrates the file structure of a file infected by a polymorphic virus.
Figure 7. File structure of a file infected by a polymorphic virus
The first polymorphic generator was the Mutation Engine, or MtE, released by Dark Avenger in 1991 . The Mutation Engine is capable of billions of different permutations. Today, more than 33 viruses are known to be spawn from MtE. They include MtE.Destructor, which is a dangerous non-memory-resident parasitic virus; MtE.Groove, which is a dangerous memory resident virus that hooks INT 21h and infects .COM, .EXE, and .OVL files; and MtE.Pogue, which is a harmless memory resident virus that appends itself to .COM files when they are started or closed . Anther polymorphic generator, called Trident Polymorphic Engine, or TPE, appeared late in 1992. TPE generates fewer permutations than MtE, but the decryptors it generated are more generic, therefore harder to detect . Many other polymorphic generators are well known and are distributed via virus exchange BBS, or underground networks. Examples include Nuke Encryption Device (NED), Dark Angel's Multiple Encryptor (DAME), Guns'n'Roses Polymorphic Engine (GPE), etc.
With the exception of virus encryption with variable keys, all of the evasion techniques used by polymorphic viruses are difficult to detect by a virus scanner. As readers may suspect, more sophisticated detection techniques, or even combinations of detection techniques are required. Detection techniques for polymorphic viruses are divided into the following categories:
Checksumming can be used to detect polymorphic viruses because polymorphic viruses make changes to the files they infect. The idea of checksumming is to detect changes in a file. Since checksumming assumes that executables are static objects, changes in an executable imply a possible virus infection . The checksumming program is first executed on the clean system to provide a baseline for comparisons. In subsequent checksumming executions, the calculated checksum is compared with the baseline. A difference in checksum values indicates changes in the file. Two mathematical techniques are usually employed in checksumming:
CRC checksums are often used in communication networks for detecting errors in packets. They are efficient and well understood. However, if a virus can find the checksum of a file, the checksum can be easily broken. Cryptographic checksums are obtained by applying cryptographic techniques on the calculated checksums. Any public key or private key cryptosystems can be used, although private key cryptosystems are more efficient. Examples of cryptographic checksums include RSA MD4 Cryptographic Checksum Using DES (rsa-md4des) and RSA MD5 Checksum (rsa-md5). Details on these two cryptographic checksums can be found in [10, 11].
Since checksumming is based on the assumption that executables are stable, it will fail if the executables are self-modifying, or the executables have been recompiled. Furthermore, a checksumming program assumes the computer system is virus-free when it first executed on the system. Obviously, this assumption is not always true. Thus, having a clean system before applying checksumming is very important.
Instead of searching for a particular virus signature, heuristic binary analysis looks for more complex and fuzzy patterns, and virus-like behaviors. When such a pattern is found, the analyzer will indicate to the user a possible infection. Since heuristic binary analyzers flag when there is a possible infection, they rely heavily on the users' knowledge of the computer systems to determine if the file is actually infected. Details on heuristic binary analysis can be found in .
The generic decryption engine is a program that can decrypt any encrypted program. It examines the infected program, tries to determine the decryption algorithm from the code, and decrypts the virus. Once the virus instructions are decrypted, it can be easily identified by the constant sequence of bytes hidden by the encryption .
An integrity-checking program computes the checksum of executable code in a computer system and stores the checksum in a database. Thus, checksumming is an integrity-checking tool. Besides storing the checksum in an off-line database, the checksum can also be attached to the executable files. Another implementation of integrity checking is the integrity shells. Integrity shells are memory resident programs similar to resident virus scanners. They check the integrity of an executable at the moment before the file being executed.
Since the ideas behind stealth and polymorphic viruses are totally different, the evasion techniques used by stealth cannot be used by polymorphic and vise versa. However, these evasion techniques can be combined to produce viruses that are both stealth and polymorphic in nature. If a virus is both stealth and polymorphic, many of the detection techniques used for polymorphic viruses become ineffective. For example, both checksumming and integrity checking would fail as the virus would return an uninfected file when the infected file is called for inspection. Generic decryption engines may still work if a copy of the infected program can be obtained. Heuristic binary analysis should still be able to detect the existence of a virus since it is a generic method. It looks for virus-like behaviors. All of the detection techniques for stealth viruses should still work for stealth-polymorphic viruses because stealthy viruses are memory resident and always hook onto BIOS Int 13h or Int 21h. That is, the addition of the stealth virus characteristics to polymorphic virus makes the virus less subjective to polymorphic detection techniques, but more subjective to stealth detection techniques. However, stealth detection techniques may not be able to detect all instances of a stealth-polymorphic virus. Thus, stealth-polymorphic viruses are more difficult to detect.
Another dimension of difficulty can be added to polymorphic virus detection if some tricks are applied to the virus. For example, if the virus code is separated into two parts and each part uses a different encryption technique, generic decryption engines may have a harder time to determine the decryption algorithm from the code. Another way to minimize scannable code is to steal engines from common applications. This way, less code can be used as a signature.
The detection of viruses cannot be relied on one particular technique. For example, the use of checksumming or generic integrity analysis requires information of a clean computer system. However, how can the computer system be verified as a clean system? Therefore, generic virus capture and heuristic binary analysis may be applied first. After that, virus scanners and on-demand scanners may be applied to detect known viruses. Lastly, generic methods may be applied once a week. A computer system can only be less subjective to virus attacks if multiple, overlapping, and partially redundant anti-virus mechanisms are used.
16. A Primer to Generic Anti-Virus Methods, Error! Bookmark not defined.