Previous section   Next section

17.3 Covert Channels

Covert channels use shared resources as paths of communication. This requires sharing of space or sharing of time.

Definition 17–6. A covert storage channel uses an attribute of the shared resource. A covert timing channel uses a temporal or ordering relationship among accesses to a shared resource.

EXAMPLE: The covert channel in the example on page 441 is a covert storage channel. The shared resource is the directory and the names of the files in that directory. The processes communicate by altering characteristics (file names and file existence) of the shared resource.

EXAMPLE: A study of the security of the KVM/370 system [883] found that two virtual machines could establish a covert channel based on the CPU quantum that each virtual machine received. If the sending virtual machine wished to send a "0" bit, it would relinquish the CPU immediately; to send a "1," it would use its full quantum. By determining how quickly it got the CPU, the second virtual machine could deduce whether the first was sending a "1" or a "0." The shared resource is the CPU. The processes communicate by using a real-time clock to measure the intervals between accesses to the shared resource. Hence, this is a covert timing channel.

A covert timing channel is usually defined in terms of a real-time clock or a timer, but temporal relationships sometimes use neither. An ordering of events implies a time-based relationship that involves neither a real-time clock nor a timer.

EXAMPLE: Consider a variant of a channel identified in KVM/370 [401, 1061]. Two virtual machines share cylinders 100 through 200 on a disk. The disk uses a SCAN algorithm [805] to schedule disk accesses. One virtual machine has security class High, and the other has class Low. A process on the High machine is written to send information to a process on the Low machine.

The process on the Low machine issues a read request for data on cylinder 150. When that request completes, it relinquishes the CPU. The process on the High machine runs, issues a seek to cylinder 140, and relinquishes the CPU. The process on the Low machine runs and issues seek requests to cylinders 139 and 161. Because the disk arm is moving over the cylinders in descending order, the seek issued to cylinder 139 is satisfied first, followed by the seek issued to cylinder 161. This ordering represents a 1 bit.

To send a 0 bit, the process on the High machine issues a read request for data on cylinder 160 instead of cylinder 140. Then the process on the Low machine's requests will be satisfied first on cylinder 161 and then on cylinder 139.

Is this a covert timing channel or a covert storage channel? Because it does not involve a real-time clock or timer, the usual definition implies that it is a covert storage channel.

Modify the example slightly to postulate a timer. The process on the Low machine uses this timer to determine how long it takes for its requests to complete. If the timer shows that the time required to satisfy the request for a seek to cylinder 139 is less than the time required to satisfy the request for a seek to cylinder 161, then a 1 bit is being sent. If the timings indicate the opposite, a 0 bit is being sent. This modification clearly uses a covert timing channel.

The difference between the modified example and the original example is the presence of a timer. The timer changes nothing about the way the channel works. For this reason, we include relative ordering of events as a covert timing channel.

A second property distinguishes between a covert channel that only the sender and receiver have access to and a covert channel that others have access to as well.

Definition 17–7. A noiseless covert channel is a covert channel that uses a resource available to the sender and receiver only. A noisy covert channel is a covert channel that uses a resource available to subjects other than the sender and receiver, as well as to the sender and receiver.

The difference between these two types of channels lies in the need to filter out extraneous information. Any information that the receiver obtains from a noiseless channel comes from the sender. However, in a noisy channel, the sender's information is mixed with meaningless information, or noise, from other entities using the resource. A noisy covert channel requires a protocol to minimize this interference.

The key properties of covert channels are existence and bandwidth. Existence tells us that there is a channel along which information can be transmitted. Bandwidth tells us how rapidly information can be sent. Covert channel analysis establishes both properties. Then the channels can be eliminated or their bandwidths can be reduced.

17.3.1 Detection of Covert Channels

Covert channels require sharing. The manner in which the resource is shared controls which subjects can send and receive information using that shared resource. Detection methods begin with this observation.

17.3.1.1 Noninterference

Models such as the Bell-LaPadula Model represent information transfer using read and write operations and develop controls to restrict their use. Viewing "information transfer" more broadly, one can consider any operation that a second process can detect as being a write command. This immediately leads to the use of an interference model to detect these covert channels. If a subject can interfere with another subject in some way, there is a covert channel, and the nature of the interference identifies the channel.

EXAMPLE: The SAT system has a multilevel security policy analyzed in terms of noninterference [434]. The formal model of the SAT was analyzed to locate covert channels [436]. The first analysis, using noninterference, introduced the p(i, l) function, which removes all instructions issued by subjects dominated by level l from the instruction stream i. A(i, s) is the state resulting from the execution of the instruction stream i on the state s. s.v(s) describes the subject s's view of the state s. Then, by Definition 8–4, the system is noninterference-secure if and only if, for all instruction sequences i, subjects s with security level l(s), and states s,

A(p(i, l(s)), s).v(s) = A(i, s).v(s)

This leads to a version of the unwinding theorem (Theorem 8–1):

Theorem 17–1. Let S be the set of states of the system. A specification is noninterference-secure if, for each subject s at security level l(s), there exists an equivalence relation : S x S such that

  1. for s1, s2 S, when s1 s2, s1.v(s) = s2.v(s)

  2. for s1, s2 S and any instruction i, when s1 s2, A(i, s1) A(i, s2)

  3. for s S and instruction i, if p(i, l(s)) is empty, A(p(i, l(s)), s).v(s) = s.v(s).

Intuitively, this theorem states that the system is noninterference-secure if equivalent states have the same view for each subject, the view remains the same when any instruction is executed, and instructions from higher-level subjects do not affect the state from the viewpoint of lower-level subjects.

The designers looked at several parts of the SAT specification. The relevant parts were for the object creation instruction and the readable object set.

Let s be a subject with security level l(s), and let o be an object with security level l(o) and type t(o). Let s be the current state. The set of existing objects is listed in a global object table T(s). Then the object creation specification object_create is as follows.

Specification 17–1.

[s' = object_create(s, o, l(o), t(o), s) s' s' [o T(s) l(s) l(o)]

The object is created if it does not exist and if the subject's clearance is sufficient to permit the creation of an object at the desired level.

The readable object set contains the set of existing objects that the subject could read in at the current, or at least at a future, state. We ignore discretionary controls for this predicate. Let s be a subject and o an object. Let l and T be as before, and let can_read(s, o, s) be true if, in state s, o is of a type to which s can apply the read operation (ignoring permissions). Then:

Specification 17–2.

o readable(s, s) [o T(s) ¬(l(o) l(s)) ¬(can_read(s, o, s))]

An object is not in the set if it does not exist, if the subject's security level does not dominate the object's security level, or if the subject is of the wrong type to read the object (or vice versa).

Because the SAT system model was tranquil, adding an object to the readable set requires a new object to be created. Let s' be the subject that creates it. Then:

Specification 17–3.

[o readable(s, s) o readable(s, s')]

[s' = object_create(s', o, l(o), t(o), s) o T(s)

l(s') l(o) l(s) can_read(s, o, s')]

For an object to be added to a subject's readable set, it initially cannot exist, it must first be created, and then its levels and discretionary access controls must be set appropriately.

Consider two states s1 and s2. These states differ only in that an object o exists in state s2 and not in s1 and that in state s2, l(s) does not dominate l(o). By Specification 17–2, o readable(s, s1) (because o does not exist) and o readable(s, s2) (because ¬(l(o) l(s))). Thus, s1 s2. Now, s issues a command to create o with l(o) = l(s) and of a type that it can read (that is, can_read(s, o, s1') is true, where s1' is the state after object_create(s, o, s1)). By Specification 17–1, s1' differs from s1 by the addition of o to the table T(s1). This new entry would satisfy can_read(s, o, s1' ) and l(s') l(o) l(s), where s' is the subject that created the object.

Next, because o exists in s1, s2' = object_create(s', o, s2) = s2. So, s1 s2 is true, but A(object_create(s', o, s1), s1) A(object_create(s', o, s2), s2) is false. This means that condition 2 in Theorem 17–1 is false. Thus, Theorem 17–1 does not apply.

Exploiting this covert channel is straightforward. To send a 1, the subject at a high level creates an object at a high level. The recipient (a second subject) tries to create the same object but at a low level. The creation fails, but no indication of the failure is given. The second subject then gives a different subject type permission to read and write the object. It writes a 1 to the object and reads the object. The read returns nothing. To send a 0, the subject at the high level creates nothing, but the subject at the low level follows the same steps. In this case, the read returns a 1.

Because noninterference techniques reason in terms of security levels and not in terms of time, these techniques are most useful for analyzing covert storage channels.

17.3.1.2 The Shared Resource Matrix Methodology

Kemmerer introduced a methodology for identifying shared channels and determining in what ways they are shared [557, 558]. First, the analyst identifies all shared resources and the attributes of those resources that are visible to subjects. These attributes make up the rows of the matrix. Next, the analyst determines the operations that either reference (read) or modify (alter) the attributes. These operations make up the columns of the matrix. The contents of each element of the matrix indicate whether the operation references, modifies, or both references and modifies the attribute.

EXAMPLE: Consider a system that implements a multilevel security model. Files have four attributes: file existence, file owner, file label, and file size. Two subjects, one High and one Low, are active. The file manipulation operations are read_ file, write_ file, delete_ file, and create_ file. Reading succeeds if the file exists and the subject's label is greater than or equal to the file's label. Writing and deletion succeed if the file exists and the subject's label is less than or equal to the file's label. Creation succeeds if no file with the given name exists. The file is given the creating process as its owner and the label of the creating process as its label.

The shared resource matrix is as follows.

 

read_file

write_file

delete_file

create_file

file existence

R

R

R, M

R, M

file owner

  

R

M

file label

R

R

R

M

file size

R

M

M

M

Because all four operations check for the existence of the file, they reference the attribute. The "R" in each matrix location reflects this. The create_ file and delete_ file operations also modify that attribute. This is reflected by the "M." Read and write do not check ownership, but delete and create do; create modifies the owner, and delete references it. The file label is set by create_ file and referenced by the other operations, and all but read_ file modify the file size. The read_ file operation checks the size of the the file to determine if the end of the file will be (or has been) encountered.

The next step is to determine whether any of these shared resources provide covert channels. The following properties must hold for a covert storage channel to exist.

  1. Both the sending and receiving processes must have access to the same attribute of a shared object.

  2. The sending process must be able to modify that attribute of the shared object.

  3. The receiving process must be able to reference that attribute of the shared object.

  4. A mechanism for initiating both processes, and properly sequencing their respective accesses to the shared resource, must exist.

Hence, we need to consider only those attributes with both "R" and "M" in their rows.

EXAMPLE: The High process is not allowed to communicate directly with the Low process. For this example, the sending process is the High one and the receiving process is the Low one. Consider the create_ file operation, which both references and modifies the attribute file existence. Both the High and Low processes have access to the file existence attribute. The High process can modify the file existence attribute using create_ file or delete_ file. The Low process can use create_ file to reference this attribute regardless of the file label because if the file exists the creation will fail. All that remains is to devise a mechanism for sequencing the accesses to the attribute of the shared resource, the file.

Let two files be named ready and done and a third be named 1bit. Both processes begin. The Low process creates a file named ready at the High level. The High process references the file existance attribute of this file and sees it exists. If the High process is to send a 1, it creates the file 1bit at the High level. The lack of this file will indicate a 0 bit. The process then deletes the ready file and creates the file done at the High level.

The Low process periodically tries to create the done file at level High. When it fails, the file exists. The process then tries to create the file named 1bit at the High level. On success, it records a 0. On failure, it records a 1. The process then deletes the file named done and creates ready at the High level. This continues until the message is sent. This is a covert storage channel.

The requirements for covert timing channels are similar to those for covert storage channels.

  1. Both the sending and receiving processes must have access to the same attribute of a shared object.

  2. Both the sending and receiving processes must have access to a time reference, such as a real-time clock, a timer, or the ordering of events.

  3. The sending process must be able to control the timing of the detection of a change in the attribute by the receiving process.

  4. A mechanism for initiating both processes, and properly sequencing their respective accesses to the shared resource, must exist.

As with covert storage channels, we need to consider only those attributes with both "R" and "M" in their rows.

EXAMPLE: The variant of the KVM/370 channel (on page 447) is an example of a timing channel. Both the sender and receiver have access to the same attribute—the ordering of requests by the disk-arm scheduler. Both have access to a time reference—the ordering of the requests. The High process can control the ordering of the requests of the Low process by the cylinder number of the request that the High process issues, so it can control the (relative) timing of the detection of a change in the attribute (ordering) by the Low process. Whether this channel can be exploited therefore depends on the initiating and sequencing mechanisms required by requirement 4.

Kemmerer demonstrates the use of the shared resource matrix (SRM) methodology at various stages of the software life cycle ranging from English requirements and formal specifications to implementation code. Its flexibility is one of its strengths.

The SRM methodology was used to analyze the Secure Ada Target [435, 436]. The participants constructed the matrix manually from a flow analysis of the model. From the transitive closure of the elements of the matrix, two potential covert channels were found, one using the assigned level attribute of an object and the other using the assigned type attribute.

The SRM methodology is comprehensive but incomplete. In particular, it does not address the problem of determining what the shared resources are and what the primitives used to access them are. In some ways, this is appropriate, because the techniques used differ at the different steps of the software life cycle. The generality of the SRM method makes it suitable for use throughout the life cycle. However, the absence of detail makes its application sensitive to the analysis of the particular stage of development: specification, design, or implementation. The next approach looks at these issues at the implementation, or source code, level.

17.3.1.3 Information Flow Analysis

The methods of Denning and Denning and of Reitman and Andrews discussed in Sections 16.3.3 and 16.3.4 can uncover covert channels. When an exception occurring depends on the value of a variable, a covert channel exists because information leaks about the value in that variable. Synchronization and interprocess communication primitives also cause problems because one process can control when it sends a message or blocks to receive a message, something the second process can typically detect. This differs from shared variables, which are legitimate channels of information flow. The covert channel occurs because of timing considerations or shared resources (such as a file system).

Tsai, Gligor, and Chandersekaran [1002] have developed a method for identifying covert storage channels in source code. The method asserts that covert (storage) channels arise when processes can view or alter kernel variables. It focuses on identifying variables that processes can refer to directly or that processes can view or alter indirectly (through system calls).

The first step is to identify the kernel functions and processes for analysis. The processes involved are those that function at the highest level of privilege and perform actions on behalf of ordinary users. Processes executing on behalf of administrators are ignored because administrators have sufficient privilege to leak information directly; they do not need to use covert channels. System calls available only to the administrator are ignored for the same reason.

The second step identifies the kernel variables that user processes can read and/or alter. The process must be able to control how the variable is altered and be able to detect that the variable has been altered. For example, if a system call assigns the fixed value 7 to a particular variable whenever that system call is made, the process cannot control how that variable is altered. Similarly, error conditions affect visibility. For example, if the variable count being zero causes an error, the state of count can be determined from the setting of the error indicator: if it is set on exit, count is 0; otherwise, it is nonzero. The specific criteria are as follows.

  1. The value of a variable is obtained from a system call.

  2. A calling process can detect at least two different states of that variable.

EXAMPLE: In Figure 17-2, the variable x is visible because it is returned directly to the calling process. The variable y is not directly visible because its value is never returned. However, its state (zero or nonzero) is visible through the value of the variable z.

The detection of such variables requires that the data flow through the kernel be analyzed to ensure that all dependencies (both data and functional) are detected. If the variable is a record or structure, the analysis process must consider changes in its attributes. If the variable is an array of records, changes both in the attributes of each element and in the array as a whole affect the analysis. Finally, the analysis must consider pointers to the variables in question.

Figure 17-2 Visibility of variables. The code fragments represent the body of system calls. The return value is the value returned by the system call. At the left, x is visible directly. The value of y at the right is not directly visible, but its state can be deduced from the returned value.
x := func(abc, def);                 y := func(abc, def);
if x = 0 then                        if y = 0 then
     x := x + 10;                          z := 1;
return x;                            else
                                           z := 0;
                                     return z;

The third step is to analyze these shared variables, looking for covert channels. The analysis here is similar to the analysis in the SRM method. The results are given in terms of the primitives that alter or view the shared variables. Primitives associated with variables that can only be altered or only be viewed are discarded. Complicating this process is the observation that many variables may be associated with a single covert channel, or one variable with many covert channels.

The resulting primitives are then compared with the model of nondiscretionary access control under the assumption that the recipient's security clearance does not dominate the sender's.

An analysis of the Secure Xenix kernel using this method found two kernel variables involved in covert channels. Four classes of generic covert channels were identified, including an unexploitable class that could be exploited only when the system failed (one such channel caused a reboot) and a noiseless channel that could not be eliminated without discarding the semantics of regular Xenix. The analysts also used the SRM method to analyze the top-level specification of Secure Xenix and noted that it failed to detect several covert channels. (This was expected, because the top-level specifications did not specify the data structures in which the covert channels were found.)

Tsai, Gligor, and Chandersekaran conclude that the shared variables could have been detected by informal code analysis but claim it unlikely that informal analysis would make all the associations of those variables with system calls. Hence, informal analysis would have missed several covert channels that their methodology found.

17.3.1.4 Covert Flow Trees

Porras and Kemmerer have devised an approach to representing security violations that spring from the application of fault trees [812]. They model the flow of information through shared resources with a tree. The paths of flow are identified in this structure. The analyst determines whether each flow is legitimate or covert.

A covert flow tree is a tree-structured representation of the sequence of operations that move information from one process to another. It consists of five types of nodes.

  1. Goal symbols specify states that must exist for the information to flow. There are several such states:

    1. A modification goal is reached when an attribute is modified.

    2. A recognition goal is reached when a modification of an attribute is detected.

    3. A direct recognition goal is reached when a subject can detect the modification of an attribute by referencing it directly or calling a function that returns it.

    4. An inferred recognition goal is reached when a subject can detect the modification of an attribute without referencing it directly and without calling a function that references the attribute directly. For example, the subject may call a function that performs one of two computations depending on the value of the attribute in question.

    5. An inferred-via goal is reached when information is passed from one attribute to other attributes using a specified primitive operation (such as a system call).

    6. A recognize-new-state goal is reached when an attribute that was modified when information was passed using it is specified by an inferred-via goal. The value need not be determined, but the fact that the attribute has been modified must be determined.

  2. An operation symbol is a symbol that represents a primitive operation. The operation symbols may vary among systems if they have different primitive operations.

  3. A failure symbol indicates that information cannot be sent along the path on which it lies. It means that the goal to which it is attached cannot be met.

  4. An and symbol is a goal that is reached when both of the following hold for all children:

    1. If the child is a goal, then the goal is reached.

    2. The child is an operation.

  5. An or symbol is a goal that is reached when either of the following holds for any children:

    1. If the child is a goal, then the goal is reached.

    2. The child is an operation.

Constructing the tree is a three-step process. To make the steps concrete, we present a simple set of operations and then ask if they can create a covert channel.

EXAMPLE: Consider a file system in which each file has three attributes. The boolean attributes locked and isopen are true when the file is locked or opened, respectively, and are false otherwise. The third attribute, inuse, is a set that contains the process ID of each process that has the file open. The function read_access(p, f ) is true if process p has read rights over file f, and empty(s) is true if set s has no members. The function random returns one of its arguments chosen at random. The following operations are defined.

(* lock the file if it is not locked and not opened *)
(* otherwise indicate it is locked by returning false *)
procedure Lockfile(f: file): boolean;
begin
  if not f.locked and empty(f.inuse) then
      f.locked := true;
end;
(* unlock the file *)
procedure Unlockfile(f: file);
begin
  if f.locked then
      f.locked := false;
end;
(* say whether the file is locked *)
function Filelocked(f: file): boolean;
begin
  Filelocked := f.locked;
end;
(* open the file if it isn't locked and the *)
(* process has the right to read the file *)
procedure Openfile(f: file);
begin
  if not f.locked and read_access(process_id, f) then
      (* add the process ID to the inuse set *)
      f.inuse = f.inuse + process_id;
end;
(* if the process can read the file, say if the *)
(* file is open, otherwise return a value at random *)
function Fileopened(f: file): boolean;
begin
  if not read_access(process_id, f) then
      Fileopened := random(true, false);
  else
      Fileopened := not isempty(f.inuse);
end

Assuming that processes are not allowed to communicate with one another, the reader is invited to try to find a covert storage channel.

The first step in constructing a covert flow tree is to determine what attributes (if any) the primitive operations reference, modify, and return.

EXAMPLE: The functions in the preceding example affect file attributes in different ways, as follows.

 

Lockfile

Unlockfile

Filelocked

Openfile

Fileopened

reference

locked, inuse

locked

locked

locked, inuse

inuse

modify

locked

Ø

Ø

inuse

Ø

return

Ø

Ø

locked

Ø

inuse

The symbol Ø means that no attribute is affected in the specified manner.

The second step begins with the goal of locating a covert storage channel that uses some attribute. The analyst constructs the covert flow tree. The type of goal controls the construction, as follows.

  1. The topmost goal requires that the attribute be modified and that the modification be recognized. Hence, it has one child (an and symbol), which in turn has two children (a modification goal symbol and a recognition goal symbol).

  2. A modification goal requires some primitive operation to modify the attribute. Hence, it has one or child, which has one child operation symbol per operation for all operations that modify the attribute.

  3. A recognition goal requires that a subject either directly recognize or infer a change in an attribute. It has an or symbol as its child. The or symbol has two children, one a direct recognition goal symbol and the other an inferred recognition goal symbol.

  4. A direct recognition goal requires that an operation access the attribute. Like the modification goal, it has one or child, and that child in turn has one child operation symbol for each operation that returns the attribute. If no operation returns the attribute, a failure symbol is attached.

  5. An inferred recognition goal requires that the modification be inferred on the basis of one or more other attributes. Hence, it has one child, an or symbol, which has one child inferred-via symbol for each operation that references an attribute and that modifies some attribute (possibly the same one that was referenced).

  6. An inferred-via goal requires that the value of the attribute be inferred via some operation and a recognition of the new state of the attribute resulting from that operation. Hence, it has one child (an and symbol), which has two children (an operation symbol representing the primitive operation used to draw the inference and a recognize-new-state goal symbol).

  7. A recognize-new-state goal requires that the value of the attribute be inferred via some operation and a recognition of the new state of the attribute resulting from that operation. The latter requires a recognition goal for the attribute. So, the child node of the recognize-new-state goal symbol is an or symbol, and for each attribute enabling the inference of the modification of the attribute in question, the or symbol has a recognition goal symbol child.

Tree construction ends when all paths through the tree terminate in either an operation symbol or a failure symbol. Because the construction is recursive, the analyst may encounter a loop in the tree construction. Should this happen, a parameter called repeat defines the number of times that the path may be traversed. This places an upper bound on the size of the tree.

EXAMPLE: We build the covert flow tree for the attribute locked in our previous two examples. The goal state is "covert storage channel via attribute locked." The and node beneath it has two children, "modification of attribute locked" and "recognition of attribute locked." At this point, the tree looks like this:

graphics/17fig01a.gif

From the table in the preceding example, the operations Lockfile and Unlockfile modify the attribute locked. So that branch of the tree becomes:

graphics/17fig01b.gif

The recognition branch expands into direct recognition and inferred recognition branches. The direct recognition branch has an and with one child, Filelocked, because Filelocked returns the value of the locked attribute. The inferred recognition branch has an or child with one child, an "inferred-via" node that infers locked from inuse. This branch comes from comparing the "reference" row of the table in the preceding example with the "modify" row. If an operation references the locked attribute and modifies another attribute, inference is possible (assuming that the modification can be detected). At this point, the recognition branch looks like this:

graphics/17fig01c.gif

Inferring that the attribute locked has changed from the attribute inuse requires the operation Openfile. After that operation, the recognize-new-state goal represents the change in the attribute inuse:

graphics/17fig01d.gif

This in turn requires the recognition of modification of the attribute inuse (hence, a recognition state). The operation Fileopened recognizes this change directly; nothing recognizes it indirectly. The result is:

graphics/17fig03.gif

Figure 17-3 shows the full covert flow tree.

Figure 17-3. The covert flow tree for the operations.

graphics/17fig03a.gif

The analyst now constructs two lists. The first list contains sequences of operations that modify the attribute, and the second list contains sequences of operations that recognize modifications in the attribute. A sequence from the first list followed by a sequence from the second list is a channel along which information can flow. The analyst examines these channels to determine which are covert.

EXAMPLE: In the covert flow tree presented above, the first list has two sequences:

List 1 = ( ( Lockfile ), ( Unlockfile ) )

because both operations modify the attribute (and lie on the "modified" branch under the root of the tree). The second list also has two sequences:

List 2 = ( ( Filelocked ), ( Openfile, Fileopened ) )

—the first from the direct recognition of the modification of the attribute and the second from the indirect recognition. These sequences result in four channels of communication.

  1. Lockfile followed by Filelocked

  2. Unlockfile followed by Filelocked

  3. Lockfile followed by Openfile, then Fileopened

  4. Unlockfile followed by Openfile, then Fileopened.

If a High-level user transmits information to a Low-level user by locking and unlocking a file, the first two channels (in combination) represent a direct covert storage channel. The last two represent an indirect covert storage channel. To use the channel, the High-level process locks a file to send a 0 bit and unlocks a file to send a 1 bit. The Low process tries to open the locked file. It then uses Fileopened to see if it has opened the file. If the file is opened, the High process did not lock the file (a 0 bit). If the file is not opened, the High process did lock the file (a 1 bit).

The shared resource matrix model and covert flow trees spring from the idea of examining shared resources for modification and reference operations, and both can be used at any point within the software development life cycle. One advantage of covert flow trees over the SRM model is that the former identifies explicit sequences of operations that cause information to flow from one process to another. The latter identifies channels rather than sequences of operations. In comparisons involving file system access operations and the Secure Ada Target, the covert flow tree method identified sequences of operations corresponding to the covert storage channels found by the SRM method and the noninterference method, as well as one not found by the other two.

17.3.2 Analysis of Covert Channels

How dangerous is a covert channel? Policy and operational issues come into play, and we do not consider those issues here. We assume that whatever security policy is in force deems covert channels a serious problem. However, the amount of information that can be transmitted over a covert channel affects how serious a problem that channel presents. If the rate were one bit per hour, the channel would be harmless in most circumstances. If the rate were 1,000,000 bits per second, the channel would be dangerous. Following Millen [710], we examine the problem of measuring this bandwidth.

17.3.2.1 Covert Channel Capacity and Noninterference

We begin by asking when the bandwidth is 0. Suppose Alice wants to send Bob information over a covert channel. Alice feeds her input into a machine that passes the output to Bob. We define the following random variables.

Define I(A;B) as the amount of information transmitted over the covert channel. We are interested in the greatest amount of information that can be transmitted.

Definition 17–8. The covert channel capacity is maxA I(A;B).

This capacity is measured in bits. The rate is then established by dividing the capacity by the number of trials (or the amount of time) required to send the information.

We first establish that noninterference is sufficient to make the capacity of a covert channel 0.

Theorem 17–2. [710] If A and V are independent and A is noninterfering with B, then I(A;B) = 0.

Proof It suffices to show that the conditions of the theorem mean that A and B are independent—that is, to prove that p(A = a, B = b) = p(A = a)p(B = b). Recall that

p(A = a, B = b) = SV p(A = a, B = b, V = v)

By Definition 8–4, A being noninterfering with B means that deleting that part of the input making up a will not change the output b. Thus, we need to consider only those values of B that can result from values of V. Hence,

p(A = a, B = b) = SV p(A = a, V = v)p(B = b | V = v)

By independence of A and V, this becomes

p(A = a, B = b) = SV p(A = a)p(V = v)p(B = b | V = v)

Standard manipulations yield

p(A = a, B = b)

=

p(A = a) (SV p(B = b | V = v)p(V = v))

 

=

P(A = a) p(B = b)

establishing independence and hence the desired result.

However, noninterference is not necessary [710]. To see this, consider a system with one bit of state; three inputs IA, IB, and IC; and one output OX. Each input flips the state, and the value of the state (0 or 1) is output. Let the system initially be in state 0, and let w be the sequence of inputs corresponding to the output x(w). Then the value x(w) depends on the length of the input; x(w) = length(w) mod 2. Clearly, IA is not noninterfering with OX because if the inputs from IA are deleted, the length of the input sequence is affected and so the value x(w) may also be affected. We consider two cases. In the following discussion, let W represent the random variable corresponding to the length of the input sequences, let A represent the random variable corresponding to the length of the components of the input subsequence contributed by input IA, let V represent the random variable corresponding to the length of the components of the input sequence not contributed by IA, and let X represent the random variable corresponding to the output state. Let A and V be independent and consider two distributions of V. (Without loss of generality, we restrict A and V to representing single bits.)

  1. If V = 0, because W = (A + V) mod 2, then W = A. So A and W are not independent, and neither are A and X. Hence, if V = 0, I(A; X) 0.

  2. Let inputs IB and IC produce inputs such that p(V = 0) = p(V = 1) = 0.5. Then,

    p(X = x) = p(V = x, A = 0) + p(V = 1 – x, A = 1)

    Because A and V are independent,

    p(X = x) = p(V = x)p(A = 0) + p(V = 1 – x)p(A = 1)

    This yields p(X = x) = 0.5. Moreover,

    p(X = x | A = a) = p(X = (a + x) mod 2) = 0.5

    Hence, A and X are independent, yielding I(A; X) = 0.

This means that even though A and X are not noninterfering, the channel capacity may be 0. In other words, the covert channel capacity will be 0 if either the input is noninterfering with the output or the input sequence is produced from independent sources and all possible values from at least one source are equiprobable. In the latter case, the distribution in effect "hides" the interference.

17.3.2.2 Measuring Covert Channel Capacity

When an attacker uses a covert channel, he modulates the output by providing specific inputs. Suppose that, when no modulation occurs, the uncertainty in the output is eight bits. When modulation occurs, the uncertainty is reduced to five bits. Then the covert channel capacity is three bits, because the input "fixes" those bits. We formalize this idea as follows [710].

The capacity of the covert channel with inputs A and V, and output X, is the measure of the certainty in X given A. In terms of entropy, this means that we maximize

I(A; X) = H(X) – H(X | A)

with respect to A.

EXAMPLE: Return to the example in the preceding section. We assume that A and V are independent. Let p = p(A = 0) and q = p(V = 0). Then,

p(A = 0, V = 0) = pq

p(A = 1, V = 0) = (1 – p)q

p(A = 0, V = 1) = p(1 – q)

p(A = 1, V = 1) = (1 – p)(1 – q)

and p(X = 0) = pq + (1 – p)(1 – q) and p(X = 1) = (1 – p)q + p(1 – q). Also,

p(X = 0 | A = 0) = q

p(X = 1 | A = 0) = 1 – q

p(X = 0 | A = 1) = 1 – q

p(X = 1 | A = 1) = q

This means that

H(X)

=

–p(X = 0) lg p(X = 0) – p(X = 1) lg p(X = 1)

 

=

–[pq + (1 – p)(1 – q)] lg [pq + (1 – p)(1 – q)] – [(1 – p)q + p(1 – q)] lg [(1 – p)q + p(1 – q) ]

and

H(X | A)

=

–SA p(A = a)[SX p(X = x | A = a) lg p(X = x | A = a)]

 

=

– p(A = 0)[p(X = 0 | A = 0) lg p(X = 0 | A = 0) + p(X = 1 | A = 0) lg p(X = 1 | A = 0)] –p(A = 1)[p(X = 0 | A = 1) lg p(X = 0 | A = 1) + p(X = 1 | A = 1) lg p(X = 1 | A = 1)]

 

=

– p[q lg q + (1 – q) lg (1 – q)] – (1 – p)[(1 – q) lg (1 – q) + q lg q]

 

=

– q lg q – (1 – q) lg (1 – q).

So

I(A; X)

=

– [pq + (1 – p)(1 – q)] lg [pq + (1 – p)(1 – q)] – [(1 – p)q + p(1 – q)] lg [(1 – p)q + p(1 – q)] + q lg q + (1 – q) lg (1 – q).

This is a maximum when p = 0.5. At that value, I(A; X) = 1 + q lg q + (1 – q) lg (1 – q) = 1 – H(V), which agrees with the intuition from the earlier example. In particular, if q = 0 (so V is a constant), the bandwidth of the covert channel is I(A; X) = 1 bit, and if q = p = 0.5, the bandwidth of the covert channel is I(A; X) = 0 bits.

We now examine a model for bandwidth computation for a storage channel and a timing channel.

17.3.2.3 Analyzing a Noisy Covert Channel's Capacity

Costich and Moskowitz [236] examined the covert channel created by a multilevel secure database that used replication to ensure data availability. The database used the two-phase commit protocol to ensure atomicity of transactions. One coordinator process managed global execution; the other processes were participants.

  1. The coordinator sends a message to each participant requesting whether to commit or abort the transaction. Each participant replies as it deems appropriate. If a participant replies to abort, it stops its process.

  2. The coordinator gathers the replies from the participants. If all replies are to commit, the coordinator sends commit messages back to the participants. If any reply is to abort, the coordinator sends abort messages to the participants. Each participant that has sent a commit waits for the reply from the coordinator, and then acts accordingly.

In the database under discussion, if either the coordinator does not receive a reply from a participant or a participant does not receive a reply from the coordinator, the protocol times out and the parties act as though the transaction has been aborted.

Suppose the replicated database consists of two types of components—one at a Low security level and another at a High security level. If a Low component begins the two-phase commit, both Low and High components must cooperate in the two-phase commit protocol. A High component can transmit information to a Low component by selectively aborting transactions (either by sending abort messages or simply by not sending anything, causing a time-out). This is a covert channel [519].

EXAMPLE: If transactions always succeeded except when a High component is sending information, this channel would not be noisy. The capacity of the channel would be one bit (abort/commit) per trial.

This channel is noisy because transactions may abort for reasons other than the sending of information. The analysis must take this into account.

EXAMPLE: Let X be the random variable corresponding to what the High user wants to send. Without loss of generality, we treat an aborted transaction as the High user sending a 1 and a committed transaction as the High user sending a 0. Let A be the random variable corresponding to what the Low user receives. (Note that for a noiseless channel, X = A.)

Let p = p(X = 0) be the probability that the High user sends a 0. We also assume that the n users other than the sender and receiver act independently of one another, and that the probability of a transaction being aborted at any of these users is q. Thus,

p(A = 0 | X = 0) = (1 – q)n

p(A = 1 | X = 0) = 1 – (1 – q)n

p(A = 0 | X = 1) = 0

p(A = 1 | X = 1) = 1

This yields p(A = 0) = p(1 – q)n and p(A = 1) = 1 – p(1 – q)n. From this, we have

p(X = 0 | A = 0) = 1

p(X = 1 | A = 0) = 0

p(X = 0 | A = 1) = p[1 – (1 – q)n] / [1 – p(1 – q)n]

p(X = 1 | A = 1) = (1 – p) / [1 – p(1 – q)n]

This means that

H(X)

=

–p(X = 0) lg p(X = 0) – p(X = 1) lg p(X = 1)

 

=

– p lg p – (1 – p) lg (1 – p)

and

H(X | A)

=

–SA p(A = a)[SX p(X = x | A = a) lg p(X = x | A = a)]

 

=

–p(A = 0)[p(X = 0 | A = 0) lg p(X = 0 | A = 0) + p(X = 1 | A = 0) lg p(X = 1 | A = 0)] –p(A = 1)[p(X = 0 | A = 1) lg p(X = 0 | A = 1) + p(X = 1 | A = 1) lg p(X = 1 | A = 1)]

 

=

–p(1 – q)n[0 + 0] –p[1 – (1 – q)n] lg {p[1 – (1 – q)n] / [1 – p(1 – q)n]} – (1 – p) lg {(1 – p) / [1 – p(1 – q)n]}

 

=

–p[1 – (1 – q)n] lg p – p[1 – (1 – q)n] lg [1 – (1 – q)n] + [1 – p(1 – q)n] lg [1 – p(1 – q)n] – (1 – p) lg (1 – p)

So

I(A; X)

=

– p(1 – q)n lg p + p[1 – (1 – q)n] lg [1 – (1 – q)n] – [1 – p(1 – q)n] lg [1 – p(1 – q)n]

We maximize this with respect to p to obtain the covert channel capacity. For notational convenience, take m = (1 – q)n and M = (1 – m)(1–m). Then I(A; X) is a maximum when p = M / (Mm + 1). So, the channel capacity is

I(A; X)

=

[Mm lg p + M(1 – m) lg (1 – m) + lg (Mm + 1)] / (Mm + 1)

17.3.3 Mitigation of Covert Channels

Covert channels convey information by varying the use of shared resources. An obvious way to eliminate all covert channels is to require processes to state what resources they need before execution and provide these resources in such a manner that only the process can access them. This includes runtime, and when the stated runtime is reached, the process is terminated and the resources are released. The resources remain allocated for the full runtime even if the process terminates earlier. Otherwise, a second process could infer information from the timing of the release of the resources (including access to the CPU). This strategy effectively implements Lampson's idea of total isolation, but it is usually unworkable in practice.

An alternative approach is to obscure the amount of resources that a process uses. A receiving process cannot determine what amount of resource usage is attributable to the sender and what amount is attributable to the obfuscation. This can be done in two ways.

First, the resources devoted to each process can be made uniform. This is a variant of isolation, because each process gets the same amount of resources and cannot tell whether a second process is accessing the resource by measuring the timing or amount of resources available. In essence, the system eliminates meaningful irregularities in resource allocation and use.

EXAMPLE: The covert channel involving the CPU usage in KVM (see the second example on page 446) can be mitigated by assigning each virtual machine a time slice of fixed magnitude and not allowing any virtual machine to surrender the CPU until the end of the slice. No virtual machine can shorten its time slice by relinquishing the CPU early, thereby sending a "0" or a "1." This closes the timing channel.

Second, a system can inject randomness into the allocation and use of resources. The goal is to make the covert channel a noisy one and to have the noise dominate the channel. This does not close the covert channel (because it still exists) but renders it useless.

EXAMPLE: Return to the noisy covert channel in the multilevel secure database discussed in Section 17.3.2.3. If the probability of a transaction being aborted by a participant that is neither the sender nor the receiver approaches 1, then the channel capacity approaches 0. Hence, increasing the probability of such an abort decreases the calculated bandwidth of the channel. One suggestion [236] is to resolve conflicts by aborting; this increases the probability of aborts from participants that are neither senders nor receivers. A second idea is to cause participants to abort transactions randomly.

Both these techniques affect efficiency. Assigning fixed allocations and constraining use waste resources. Fixing the time slices on the KVM system means that the CPU will be unused (or will execute an idle process) when another virtual machine could run a non-idle process. Increasing the probability of aborts in the multilevel secure database system will abort some transactions that would normally commit, increasing the expected number of tries to update the database. Whether the closing of the covert channel or the limiting of the bandwidth compensates adequately for the loss in efficiency is a policy decision.

EXAMPLE: Hu [498] describes an interesting approach to limiting covert timing channels on the VAX virtualizing security kernel. "Fuzzy time" reduces the accuracy of system clocks by using the programmable system clock to generate random clock ticks. The random interrupts can take any desired distribution. For example, virtual machines receive timer interrupts with a uniform distribution and a mean of 20 milliseconds, rather than every 10 milliseconds (as the native timer would create). The system clock is updated only after each timer interrupt, and the kernel rounds the time to the nearest tenth of a second before supplying it to the virtual machine (so it cannot be any more accurate than that of the interrupts). I/O operations have delays added randomly. The kernel distinguishes between event time (when the I/O event occurs) and notification time (when the virtual machine is told that the I/O operation has occurred). The random delay between these two times prevents the virtual machine from determining exactly when an event occurred. The random length of the interval can be distributed as desired. In the security kernel, the interval is between 1 and 19 milliseconds. The "fuzz" added to the timings adds noise to the covert timing channel, thereby making it more difficult to exploit. Trostle [999] improved on this technique by modifying the scheduler to run processes in increasing order of security level and by observing that countermeasures are needed only when a transition from a dominating virtual machine to a dominated virtual machine occurs. He also suggested adding random intervals between quanta at these transitions.

A device known as a pump is the basis of several techniques for defeating covert channels.

EXAMPLE: The pump [546] is a (hardware or software) tool for controlling a communication path between a High process and a Low process. It consists of a buffer for messages to and from the High process, a buffer for messages to and from the Low process, and a communications buffer of length n that is connected to both of the other buffers (see Figure 17-4). We assume that messages are numbered and that the communications buffer preserves messages if the pump crashes. Under these assumptions, the processes can recover (so that either the messages in the pump are delivered or the sender detects that they are lost and resends the message; see Exercise 9).

Figure 17-4. The pump. Messages going between the High and Low processes enter the pump (represented by the dashed oval). The pump controls the rate at which the messages flow between the two processes. The pump acknowledges each message as it is moved from the process buffer to the communications buffer.

graphics/17fig04.gif

A covert timing channel occurs when the High process can control the rate at which the pump passes messages to it. The Low process fills the communications buffer by sending messages to the pump until it fails to receive an acknowledgment. At that point, the High and Low processes begin their trials. At the beginning of each trial, if the High process wants to send a 1, it allows the pump to send it one of the queued messages. If the High process wants to send a 0, it does not accept any messages from the pump. If the Low process receives an acknowledgment, it means that a message has moved from the Low buffer to the communications buffer. This can happen only if a space in the communications buffer opens. This occurs when the High process reads a message. Hence, if the Low process gets an acknowledgment, the High process is signaling a 1. By a similar argument, if the Low process does not get an acknowledgment, the High process is signaling a 0. Following the trial, if the Low process has received an acknowledgment, it must send another message to the pump to enter the state required for the next trial.

In what follows, we assume that the Low process and the pump can process messages more quickly than the High process. Let Li be the random variable corresponding to the time from the Low process' sending a message to the pump to the Low process' receiving an acknowledgment. Let Hi be the random variable corresponding to the average time required for the High process to acknowledge each of the last n messages. Three cases arise.

  1. E(Li) > Hi. This means that the High process can process messages in less time than it takes for the Low process to get the acknowledgment. Because this contradicts our assumption above, the pump must be artificially delaying acknowledgments. This means that the Low process will wait for an acknowledgment regardless of whether the communications buffer is full or not. Although this closes the covert timing channel, it is not optimal because the processes may wait even when they do not need to.

  2. E(Li) < Hi. This means that the Low process is sending messages into the pump faster than the High process can remove them. Although it maximizes performance, it opens the covert channel.

  3. E(Li) = Hi. This means that the pump and the processes handle messages at the same rate. It balances security and performance by decreasing the bandwidth of the covert channel (with respect to time) and increases performance. The covert channel is open, however, and performance is not optimal.

Kang and Moskowitz [546] showed that adding noise to the channel in such a way as to approximate the third case reduced the covert channel capacity to at most 1/nr, where r is the time between the Low process' sending a message to the pump and its receiving an acknowledgment when the communications buffer is not full. They concluded that the pump substantially reduces the capacity of covert channels between High and Low processes when compared with direct connection of those processes.


  Previous section   Next section
Top