About half the bugs in typical C programs can be attributed to memory management problems. Memory management bugs are notoriously difficult to detect through traditional techniques. Often, the symptom of the bug is far removed from its actual source. Memory management bugs often only appear sporadically and some bugs may only be apparent when compiler optimizations are turned on or the code is compiled on a different platform. Run-time tools offer some help, but are cumbersome to use and limited to detecting errors that occur when test cases are run. By detecting these errors statically, we can be confident that certain types of errors will never occur and provide verified documentation on the memory management behavior of a program.
Splint can detect many memory management errors at compile time including using storage that may have been deallocated (Section 5.2), memory leaks (Section 5.2), or returning a pointer to stack-allocated storage (Section 5.2.6).
Yea, from the
table of my memory I'll wipe away all trivial fond records, all
saws of books,
all forms, all pressures past, that youth and observation copied
there.
Hamlet prefers
garbage collection (Shakespeare, Hamlet. Act I, Scene
v)
Most of these checks depend on annotations added to programs to document assumptions related to memory management and pointer values. By documenting these assumptions for function interfaces, variables, type definitions and structure fields, memory management bugs can be detected at their source — where an assumption is violated. In addition, precise documentation about memory management decisions makes it easier to change code.
This section describes execution-time concepts for describing the state of storage more precisely than can be done using standard C terminology. Certain uses of storage are likely to indicate program bugs, and are reported as anomalies.[3]
Splint assumes a CLU-like object storage model.[4] An object is a typed region of storage. Some objects use a fixed amount of storage that is allocated and deallocated automatically by the compiler. Other objects use dynamic storage that must be managed by the program.
Storage is undefined if it has not been assigned a value, and defined after it has been assigned a value. An object is completely defined if all storage that may be reached from it is defined. What storage is reachable from an object depends on the type and value of the object. For example, if p is a pointer to a structure, p is completely defined if the value of p is NULL, or if every field of the structure p points to is completely defined.
When an expression is used as the left side of an assignment expression we say it is used as an lvalue. Its location in memory is used, but not its value. Undefined storage may be used as an lvalue since only its location is needed. When storage is used in any other way, such as on the right side of an assignment, as an operand to a primitive operator (including the indirection operator, *),[5] or as a function parameter, we say it is used as an rvalue. It is an anomaly to use undefined storage as an rvalue.
A pointer is a typed memory address. A pointer is either live or dead. A live pointer is either NULL or an address within allocated storage. A pointer that points to an object is an object pointer. A pointer that points inside an object (e.g., to the third element of an allocated block) is an offset pointer. A pointer that points to allocated storage that is not defined is an allocated pointer. The result of dereferencing an allocated pointer is undefined storage. Hence, it is an anomaly to use it as an rvalue. A dead (or “dangling”) pointer does not point to allocated storage. A pointer becomes dead if the storage it points to is deallocated (e.g., the pointer is passed to the free library function.) It is an anomaly to use a dead pointer as an rvalue.
There is a special object null corresponding to the NULLpointer in a C program. A pointer that may have the value NULL is a possibly-null pointer. It is an anomaly to use a possibly-null pointer where a non-null pointer is expected (e.g., certain function arguments or the indirection operator).
There are two kinds of deallocation errors with which we are concerned: deallocating storage when there are other live references to the same storage, or failing to deallocate storage before the last reference to it is lost. To handle these deallocation errors, we introduce a concept of an obligation to release storage. Every time storage is allocated, it creates an obligation to release the storage. This obligation is attached to the reference to which the storage is assigned.[6] Before the scope of the reference is exited or it is assigned to a new value, the storage to which it points must be released. Annotations can be used to indicate that this obligation is transferred through a return value, function parameter or assignment to an external reference.
‘Tis in my
memory lock’d, and you yourself shall keep the key of
it.
Ophelia prefers explicit
deallocation (Hamlet. Act I, Scene iii)
The only annotation is used to indicate a reference is the only pointer to the object it points to. We can view the reference as having an obligation to release this storage. This obligation is satisfied by transferring it to some other reference in one of three ways:
· pass it as an actual parameter corresponding to a formal parameter declared with an only annotation
· assign it to an external reference declared with an only annotation
· return it as a result declared with an only annotation
After the release obligation is transferred, the original reference is a dead pointer and the storage it points to may not be used.
All obligations to release storage stem from primitive allocation routines (e.g., malloc), and are ultimately satisfied by calls to free. The standard library declared the primitive allocation and deallocation routines.
The basic memory allocator, malloc, is declared:
/*@only@*/ /*@null@*/ void *malloc (size_t size);
It returns an object that is referenced only by the function return value.
The deallocator, free, is declared:[7]
void free (/*@only@*/ /*@out@*/ /*@null@*/ void *ptr);
Running Splint | |
1 extern /*@only@*/ int *glob;
/*@only@*/ int * f (/*@only@*/ int *x, int *y, int *z) /*@globals glob;@*/ { 8 int *m = (int *) 9 malloc (sizeof (int));
11 glob = y; Memory leak 12 free (x); 13 *m = *x; Use after free 14 return z; Memory leak detected } |
> splint only.c only.c:11: Only storage glob (type int *) not released before assignment: glob = y only.c:1: Storage glob becomes only only.c:11: Implicitly temp storage y assigned to only: glob = y only.c:13: Dereference of possibly null pointer m: *m only.c:8: Storage m may become null only.c:13: Variable x used after being released only.c:12: Storage x released only.c:14: Implicitly temp storage z returned as only: z only.c:14: Fresh storage m not released before return only.c:9: Fresh storage m allocated |
The parameter to free must reference an unshared object. Since the parameter is declared using only, the caller may not use the referenced object after the call, and may not pass in a reference to a shared object. There is nothing special about malloc and free — their behavior can be described entirely in terms of the provided annotations.
The temp annotation is used to declare a function parameter that is used temporarily by the function. An error is reported if the function releases the storage associated with a temp formal parameter or creates new aliases to it that are visible after the function returns. Any storage may be passed as a temp parameter, and it satisfies its original memory constraints after the function returns.
In real programs it is sometimes necessary to have storage that is shared between several possibly references. The owned and dependent annotations provide a more flexible way of managing storage, at the cost of less checking. The owned annotation denotes a reference with an obligation to release storage. Unlike only, however, other external references marked with dependent annotations may share this object. It is up to the programmer to ensure that the lifetime of a dependent reference is contained within the lifetime of the corresponding owned reference.
The keep annotation is similar to only, except the caller may use the reference after the call. The called function must assign the keep parameter to an only reference, or pass it as a keep parameter to another function. It is up to the programmer to make sure that the calling function does not use this reference after it is released. The keep annotation is useful for adding an object to a collection (e.g., a symbol table), where it is known that it will not be deallocated until the collection is.
If Splint is used to check a program designed to be used in a garbage-collected environment, there may be storage that is shared by one or more references and never explicitly released. The shared annotation declares storage that may be shared arbitrarily, but never released.
Local variables that are not allocated dynamically are stored on a call stack. When a function returns, its stack frame is deallocated, destroying the storage associated with the function’s local variables. A memory error occurs if a pointer into this storage is live after the function returns. Splint detects errors involving stack references exported from a function through return values or assignments to references reachable from global variables or actual parameters. No annotations are needed to detect stack reference errors, since it is clear from a declaration if storage is allocated on the function stack. Figure 7 gives and example of errors reported involving stack-allocated storage.
stack.c |
Running Splint |
int *glob;
/*@dependent@*/ int * f (int **x) { int sa[2] = { 0, 1 }; int loc = 3;
9 glob = &loc; 10 *x = &sa[0];
12 return &loc; } |
> splint stack.c stack.c:12: Stack-allocated storage &loc reachable from return value: &loc stack.c:12: Stack-allocated storage *x reachable from parameter x stack.c:10: Storage *x becomes stack stack.c:12: Stack-allocated storage glob reachable from global glob stack.c:9: Storage glob becomes stack
A dependent annotation is used on the return value. Without this, other warnings would be reported, since the result would have an implicit only annotation. |
Figure 7. Stack-Allocated Storage |
An annotation always applies to the outermost level of storage. For example,
/*@only@*/ int **x;
declares x as an unshared pointer to a pointer to an int. The only annotation applies to x, but not to *x. To apply annotations to inner storage a type definition may be used:
typedef /*@only@*/ int *oip;
/*@only@*/ oip *x;
Now, x is an only pointer to an oip, which is an only pointer to an int.
When annotations are used in type definitions, they may be overridden in instance declarations. For example,
/*@dependent@*/ oip x;
makes x a dependent pointer to an int. Another way to apply annotations to inner storage is to use a state clause (see Section 7.4).
Since it is important that Splint can check unannotated programs effectively, the meaning of declarations with no memory annotations is chosen to minimize the number of annotations needed to get useful checking on an unannotated program.
An implicit memory management annotation may be assumed for declarations with no explicit memory management annotation. Implicit annotations are checked identically to the corresponding explicit annotation, except error messages indicate that they result from an implicit annotation. Figure 8 illustrates some implicit annotations.
Unannotated function parameters are assumed to be temp. This means if memory checking is turned on for an unannotated program, all functions that release storage referenced by a parameter or assign a global variable to alias the storage will produce error messages. (Controlled by paramimptemp.)
implicit.c | |
typedef struct { only char *name; int val; } *rec;
extern only rec rec_last ;
extern only rec rec_create (temp char *name, int val) ; Annotations in italics are not present in the code, but may be implied depending on flag settings. |
Implicit only annotation on mutable structure field if structimponly is on.
Implicit only annotation on mutable global variables if globimponly is on.
Implicit only annotation on mutable function result if retimponly is set. Implicit temp annotation on mutable parameter if paramimptemp is set. |
Figure 8. Implicit Annotations |
Unannotated return values, structure fields and global variables are assumed to be only. With implicit annotations (on by default), turning on memory checking for an unannotated program will produce errors for any function that does not return unshared storage or assignment of shared storage to a global variable or structure field. If an exposure qualifier is used (see Section 6.2), the implied dependent annotation is used instead of the more generally implied only annotation. (Controlled by retimponly, structimponly and globimponly. The allimponly flag sets all of the implicit only flags.)
Another approach to memory management is to add a field to a type to explicitly keep track of the number of references to that storage. Every time a reference is added or lost the reference count is adjusted accordingly; if it would become zero, the storage is released. Reference counting it difficult to do without automatic checking since it is easy to forget to increment or decrement the reference count, and exceedingly difficult to track down these errors.
Splint supports reference counting by using annotations to constrain the use of reference counted storage in a manner similar to other memory management annotations.
A reference counted type is declared using the refcounted annotation. Only pointer to struct types may be declared as refcounted, since reference counted storage must have a field to count the references. One field in the structure (or integral type) is preceded by the refs annotation to indicate that the value of this field is the number of live references to the structure. For example (in rstring.h),
typedef /*@abstract@*/ /*@refcounted@*/ struct {
/*@refs@*/ int refs;
char *contents;
} *rstring;
declares rstring as an abstract, reference-counted type. The refs field counts the number of references and the contents field holds the contents of a string.
rstring.c |
Running Splint |
# include "rstring.h"
static rstring rstring_ref (rstring r) { r->refs++; 6 return r; }
rstring rstring_first (rstring r1, rstring r2) { if (strcmp (r1->contents, r2->contents) < 0) 12 return r1; else 14 return rstring_ref (r2); } |
> splint rstring.c rstring.c:12: Reference counted storage returned without modifying reference count: r1
No error is reported for line 6 since the reference count was incremented. No error is reported for line 14, since rstring_ref returns a new reference.
|
Figure 9. Reference Counting |
All functions that return refcounted storage must increase the reference count before returning. Splint cannot determine if the reference count was increased, so any function that directly returns a reference to refcounted storage will produce an error. This is avoided, by using a function to return a new reference (e.g., rstring_ref in Figure 9).
A reference counted type may be passed as a temp or dependent parameter. It may not be passed as an only parameter. Instead, the killref annotation is used to denote a parameter whose reference is eliminated by the function call. Like only parameters, an actual parameter corresponding to a killref formal parameter may not be used in the calling function after the call. Splint checks that the implementation of a function releases all killref parameters, either by passing them as killref parameters, or assigning or returning them without increasing the reference count.
Next: 6. Sharing
Return to Contents