|
|
The types of malicious logic discussed so far are not distinct. Computer viruses are a form of Trojan horses. Computer viruses may contain logic bombs, as might computer worms. Some worms and viruses are bacteria because they absorb all the resources of some type.
|
EXAMPLE: The Internet worm was a bacterium on many systems. During its infection, the worm opened a port on the network. When another worm tried to infect the system, it first checked the port. If the port was open, the infecting worm knew that another worm was resident on the computer. The author apparently feared that this check would lead to a defense of system administrators opening the port with a small program. So, once out of every six times, the check was ignored and the worm reinfected the infected system. Because the worm was so prolific, infected machines quickly had many different copies of the worm and were overwhelmed. The worms consumed the CPU. |
|
EXAMPLE: The Father Christmas worm created so much network traffic that the networks became unusable and had to be shut down until all instances of the worm were purged from the mail queues. Hence, it was a bacterium also. |
An obvious question is whether a universal detector can be written to detect malicious logic. We consider the narrower question of whether there is an algorithm that can determine if an arbitrary program contains replicating code (this covers both replicating Trojan horses and computer viruses).
Cohen [207] asked if a single algorithm could detect computer viruses precisely. He demonstrated that the virus detection problem, like the safety problem (see Theorem 3–2), is undecidable.
Definition 22–16. [207] Let T be a Turing machine and let V be a sequence of symbols on the machine tape. Let sv be a distinguished state of T. For every v
V, when T lies at the beginning of v in tape square k, suppose that after some number of instructions are executed, a sequence v'
V lies on the tape beginning at location k', where either k + |v|
k' or k'+ |v|
k. Then (T, V) is a viral set and the elements of V are computer viruses.
Figure 22-3 illustrates this definition. The virus v may copy another element of V either before or after itself but may not overwrite itself. Both possibilities are shown. If v' precedes v, then k' + |v|
k; otherwise, v precedes v', and k + |v|
k'. Definition 22–16 is a formal version of Definition 22–4. It focuses on the replication (copying) aspect of computer viruses but includes the execution phase as a component of v that need not be copied. In this case, v' would be the infection part of v, and the actions other than infection would be the remainder of v.

Cohen established the undecidability of detecting generic computer viruses by showing that, if such a decision procedure existed, it would solve the halting problem. Consider an arbitrary Turing machine T and an arbitrary sequence S of symbols on tape. Construct a second Turing machine T' and tape V such that, when T halts on S, V and T' create a copy of S on the tape. Then T' replicates S if and only if T halts on S. By Definition 22–16, a replicating program is a computer virus. So, there is a procedure that decides if (T', V) is a viral set if and only if there is a procedure that determines if T halts on S—that is, if there is a procedure that will solve the halting problem. Because the latter does not exist, neither can the former.
Theorem 22–1. [207] It is undecidable whether an arbitrary program contains a computer virus.
Proof Let T and V define a Turing machine and sequence of tape symbols, respectively. We construct a second Turing machine T' and sequence V' such that T' reproduces V if and only if running T on V halts.
Let A and B be tape symbols, so A, B
M. Let qi, i
1 be states of the Turing machine, so qi
K for i
1. Let a, b, i, and j be non-negative integers. We also redefine the function d as d: K x M
K x M x {L, R, –}, where "–" refers to no motion. This function is equivalent to the d function in Section 3.2 (see Exercise 13).
We will find it convenient to abbreviate arguments and values of d as follows. Let x, y, z, u, and si, i
1, represent values drawn from the set of tape symbols M. We can then write
d(qa, y) = (qa, y, L) when y
A
to represent all definitions of d where the first argument to d is qa and the second argument to d is an element of M other than A.
Three actions recur in our construction of T'. We define abbreviations to simplify the description of that Turing machine. For any symbol x
M, LS(qa, x, qb) represents the sequence
d(qa, x) = (qb, x, –)
d(qa, y) = (qa, y, L) when y
x
This sequence takes effect when the Turing machine is in state qa. It moves the head to the left, skipping over take squares, until the machine encounters a square with the symbol x. At that point, the Turing machine enters state qb, and the head remains over the square with the X symbol.
The abbreviation RS(qa, x, qb) is defined similarly, but for motion to the right:
d(qa, x) = (qb, x, –)
d(qa, y) = (qa, y, R) when y
x
This sequence moves the head to the right until a square containing x is found. The head stops at that square.
The third abbreviation, COPY(qa, x, y, z, qb), means that the Turing machine's head moves right to the next square containing the symbol x and copies the symbols on the tape until the next square with the symbol y is encountered. The copy is placed after the first symbol z following the symbol y. Once the copying is completed, the Turing machine enters state qb.
The following sequence captures this. The part of each line following the semicolon is a comment, for exposition purposes only. We assume that the symbols A and B do not occur on the tape. If necessary, we augment the set M with two symbols and use them for A and B.
RS(qa, x, qa+i) | ; move the head over the next x |
d(qa+i, x) = (qa+i+1, A, –) | ; replace x with symbol A |
RS(qa+i+1, y, qa+i+2) | ; skip to the end of the segment to copy |
RS(qa+i+2, z, qa+i+3) | ; skip to the location to copy it to (which |
d(qa+i+3, z) = (qa+i+4, z, R) | ; is the square after the one containing z) |
d(qa+i+4, u) = (qa+i+5, B,–) for any u | ; mark it with B |
LS(qa+i+5, A, qa+i+6) | ; move the head back to where x was |
d(qa+i+6, A) = (qa+i+7, x, –) | ; put x back |
d(qa+i+7, sj) = (qa+i+5j+10, A, R) for sj | ; overwrite the first uncopied symbol |
d(qa+i+7, y) = (qa+i+8, y, R) | ; for the terminal one, go to cleanup |
RS(qa+i+5j+10, B, qa+i+5j+11) | ; move to location to copy symbol to |
d(qa+i+5j+11, B) = (qa+i+5j+12, sj, R) | ; put it down |
d(qa+i+5j+12, u) = (qa+i+5j+13, B, –) | ; mark where the next symbol goes |
LS(qa+i+5j+13, A, qa+i+5j+14) | ; go back to where the original was |
d(qa+i+5j+14, A)= (qa+i+7, sj, R) | ; copy it back |
RS(qa+i+8, B, qa+i+9) | ; last symbol—move to where it goes |
d(qa+i+9, B) = (qb, y, –) | ; write it and enter terminal state |
We proceed to construct T' and V'. Define the set of symbols in T' to be
M' = { A, B, C, D }
M
where A, B, C, D
M, and the set of states to be
K' = { qa, qb, qc, qd, qe, qf, qg, qh, qH }
K
where qa, qb, qc, qd, qe, qf, qg, qh, qH
K. The initial state of T' is qa, and the halting state of T' is qH. The initial state of T is qf, and we simulate the halting state of T by the state qh. We abbreviate the execution of T on the tape with the head at the current position as SIMULATE(qf, T, qh), where qf is the state of T' corresponding to the initial state of T and qh is the state of T' corresponding to the final (terminal) state of T.
Let V' = (A, B, V, C, D). Then the transition function d for T' is:
d(qa, A) = (qb, A, –) | ; check beginning of tape |
d(qa, y) = (qH, y, –) for y | ; halting state |
COPY(qb, B, C, D, qc) | ; copy V after D |
LS(qc, A, qd) | ; head moves to D (T executes the copy |
RS(qd, D, qe) | ; of V, not the original one) |
d(qe, D) = (qe, D, R) | ; move over the D |
d(qe, B) = (qf, B, R) | ; enter the initial state of T with the |
; head at the beginning of V | |
SIMULATE(qf, T, qh) | ; simulate T running on V |
LS(qh, A, qg) | ; T terminated—go to beginning |
; of T''s tape | |
COPY(qg, A, D, D, qH) | ; copy initial contents over results of |
; running T on V (reproduction) |
The Turing machine T' first makes a copy of V. It then simulates T running on the copy of V. The original V is to the left of the copy (see Figure 22-4), so the simulation of T cannot alter it. If the simulation halts, T' enters state qh, in which the original copy of V is recopied. This satisfies Definition 22–16. On the other hand, if the simulation never halts, V is never recopied, and Definition 22–16 is never satisfied. This establishes the desired result.

Adelman used a completely different approach to obtain a generalization of this result, which we state without proof.
Theorem 22–2. [9] It is undecidable whether an arbitrary program contains malicious logic.
These results mean that there is no generic technique for detecting all malicious logic, or even all computer viruses. Hence, defenses must focus on particular aspects of malicious logic. Furthermore, multiple defenses are needed. We turn to these defenses now.
|
|
| Top |