LCLint User's Guide - Section 8: Macros

8. Macros

Macros are commonly used in C programs to implement constants or to mimic functions without the overhead of a function call. Macros that are used to implement functions are a persistent source of bugs in C programs, since they may not behave like the intended function when they are invoked with certain parameters or used in certain syntactic contexts.

LCLint eliminates most of the potential problems by detecting macros with dangerous implementations and dangerous macro invocations. Whether or not a macro definition is checked or expanded normally depends on flag settings and control comments (see Section 8.3). Stylized macros can also be used to define control structures for iterating through many values (see Section 8.4).

8.1 Constant Macros

Macros may be used to implement constants. To get type-checking for constant macros, use the constant syntactic comment:

/*@constant null char *mstring_undefined@*/

Declared constants are not expanded and are checked according to the declaration. A constant with a null annotation may be used as only storage.
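
As a minimal sketch of how a constant declaration pairs with its macro definition, the fragment below declares a hypothetical integer constant and then defines it. The name BUFFER_LIMIT, its value, and the helper fits_in_buffer are assumptions made for illustration; a C compiler treats the annotation as an ordinary comment.

```c
/* Hypothetical constant: declared for the checker, defined as a macro. */
/*@constant int BUFFER_LIMIT@*/
# define BUFFER_LIMIT 64

/* Returns 1 if n lies between 0 and the limit (inclusive), 0 otherwise. */
static int fits_in_buffer (int n)
{
   return (n >= 0 && n <= BUFFER_LIMIT) ? 1 : 0;
}
```

With the declaration in place, uses of BUFFER_LIMIT should be checked as uses of an int constant rather than as expanded text.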

8.2 Function-like Macros

Using macros to imitate functions is notoriously dangerous. Consider this broken macro for squaring a number:

 
   # define square(x) x * x

This works fine for a simple invocation like square(i). It behaves unexpectedly, though, if it is invoked with a parameter that has a side effect.

For example, square(i++) expands to i++ * i++. Not only does this give the incorrect result, it has undefined behavior since the order in which the operands are evaluated is not defined. (See Section 10.1 for more information on how expressions exhibiting undefined evaluation order behavior are detected by LCLint.) To correct the problem we either need to rewrite the macro so that its parameter is evaluated exactly once, or prevent clients from invoking the macro with a parameter that has a side-effect.

Another possible problem with macros is that they may produce unexpected results because of operator precedence rules. The invocation square(i+1) expands to i+1*i+1, which evaluates to i+i+1 instead of the square of i+1. To ensure the expected behavior, each use of the macro parameter in the macro body should be enclosed in parentheses.
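
The precedence problem can be seen by placing the two definitions side by side. This is a sketch only: the names square_broken and square_paren and the two wrapper functions are hypothetical, chosen so that both versions can coexist in one file.

```c
/* The broken definition from the text, renamed for comparison. */
# define square_broken(x) x * x

/* Each use of the parameter, and the body as a whole, is parenthesized. */
# define square_paren(x) ((x) * (x))

/* Expands to i + 1 * i + 1, which is i + i + 1. */
static int broken_result (int i)
{
   return square_broken (i + 1);
}

/* Expands to ((i + 1) * (i + 1)). */
static int paren_result (int i)
{
   return square_paren (i + 1);
}
```

For i equal to 3, broken_result computes 3 + 3 + 1 while paren_result computes 4 * 4. Note that square_paren still evaluates its argument twice, so it does not address the side-effect problem.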

Macros may also behave unexpectedly if they are not syntactically equivalent to an expression. Consider the macro definition,

  # define incCounts()  ntotal++; ncurrent++;

This works fine, unless it is used where a single statement is expected, such as the body of an if. For example,

if (x < 3) incCounts();

increments ntotal if x < 3 but always increments ncurrent.

One solution is to use the comma operator to define the macro:

  # define incCounts()  (ntotal++, ncurrent++)

More complicated macros can be written using a do … while construction:

  # define incCounts() \
     do { ntotal++; ncurrent++; } while (FALSE)
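
The difference between the two forms can be sketched as follows. The macro names incCounts_broken and incCounts_safe and the wrapper functions are hypothetical; the sketch writes while (0) rather than while (FALSE) only because FALSE is not defined by standard C.

```c
static int ntotal = 0;
static int ncurrent = 0;

/* Two statements: only the first is governed by an enclosing if. */
# define incCounts_broken()  ntotal++; ncurrent++;

/* A single statement once the invocation's semicolon is added. */
# define incCounts_safe() \
     do { ntotal++; ncurrent++; } while (0)

static void count_broken (int x)
{
   if (x < 3) incCounts_broken();   /* ncurrent++ always executes */
}

static void count_safe (int x)
{
   if (x < 3)
      incCounts_safe();             /* both increments are guarded */
}
```

Calling count_broken (5) leaves ntotal unchanged but increments ncurrent, while count_safe (5) changes neither counter.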

LCLint detects these pitfalls in macro definitions, and checks that a macro behaves as much like a function as possible. A client should only be able to tell that a function was implemented by a macro if it attempts to use the macro as a pointer to a function.

These checks are done by LCLint on a macro definition corresponding to a function:

Each parameter to a macro (except those declared to be side-effect free, see Section 8.2.1) must be used exactly once in all possible executions of the macro, so side-effecting arguments behave as expected.[21] (Controlled by macroparams.)
A parameter to a macro may not be used as the left hand side of an assignment expression or as the operand of an increment or decrement operator in the macro text, since this produces non-functional behavior. (Controlled by macroassign.)
Macro parameters must be enclosed in parentheses when they are used in potentially dangerous contexts. (Controlled by macroparens.)
A macro definition must be syntactically equivalent to a statement when it is invoked followed by a semicolon. (Controlled by macrostmt.)
The type of the macro body must match the return type of the corresponding function. If the macro is declared with type void, its body may have any type but the macro value may not be used.
All variables declared in the body of a macro definition must be in the macro variable namespace, so they do not conflict with variables in the scope where the macro is invoked (which may be used in the macro parameters). By default, the macro namespace is all names prefixed by m_. (See Section 9.2 for information on controlling namespaces.)

At the call site, a macro is checked like any other function call.
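
The following sketch shows a macro written with these checks in mind. It is an illustration under assumptions, not output verified against LCLint: the names counter, clampCounter and clamp_demo are hypothetical. The parameter is used exactly once and in parentheses, it is never assigned, the body is a single statement, and the local variable uses the default m_ prefix.

```c
static int counter = 0;

/* Prototype of the function that the macro implements. */
extern void clampCounter (int limit);

/* Lowers counter to limit if it currently exceeds limit. */
# define clampCounter(limit) \
   do { \
      int m_limit = (limit); \
      if (counter > m_limit) { counter = m_limit; } \
   } while (0)

/* Hypothetical client: sets counter, clamps it, and returns the result. */
static int clamp_demo (int start, int limit)
{
   counter = start;
   clampCounter (limit);
   return counter;
}
```

Because the macro is declared with type void, its value is never used by the client.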

8.2.1 Side-Effect Free Parameters

Suppose we really do want to implement square as a macro, but want to do so in a safe way. One way to do this is to require that it is never invoked with a parameter that has a side-effect. LCLint will check that this constraint holds, if the parameter is annotated to be side-effect free. That is, the expression corresponding to this parameter must not modify any state, so it does not matter how many times it is evaluated. The sef annotation is used to denote a parameter that may not have any side-effects:

   extern int square (/*@sef@*/ int x);
   # define square(x) ((x) *(x))

Now, LCLint will not report an error checking the definition of square even though x is used more than once.

A message will be reported, however, if square is invoked with a parameter that has a side-effect.

For the code fragment,

square (i++)

LCLint produces the message:

   Parameter 1 to square is declared sef, but the argument may modify i: i++
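
A client that wants the effect of square (i++) can perform the increment as a separate statement, so that the argument itself is side-effect free. The helper square_then_advance below is a hypothetical name used only for this sketch.

```c
extern int square (/*@sef@*/ int x);
# define square(x) ((x) * (x))

/* Returns the square of *counter, then advances *counter by one. */
static int square_then_advance (int *counter)
{
   int result = square (*counter);   /* argument modifies nothing */

   (*counter)++;                     /* side effect done separately */
   return result;
}
```

Here the argument is evaluated twice by the macro, but since reading *counter changes no state the result is the same as for a function call.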

It is also an error to pass a non-sef macro parameter as a sef macro parameter in the body of a macro definition. For example,

   extern int sumsquares (int x, int y);
   # define sumsquares(x,y) (square(x) + square(y))

Although x only appears once in the definition of sumsquares it will be evaluated twice since square is expanded. LCLint reports an error when a non-sef macro parameter is passed as a sef parameter.
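
One plausible way to resolve this, sketched here as an assumption rather than as documented LCLint behavior, is to declare the parameters of sumsquares as sef as well, so that only sef parameters are passed on to square. The function sum_of_squares is a hypothetical client.

```c
extern int square (/*@sef@*/ int x);
# define square(x) ((x) * (x))

/* Assumed fix: both parameters are declared side-effect free. */
extern int sumsquares (/*@sef@*/ int x, /*@sef@*/ int y);
# define sumsquares(x,y) (square(x) + square(y))

/* Hypothetical client passing arguments with no side effects. */
static int sum_of_squares (int a, int b)
{
   return sumsquares (a, b);
}
```

The constraint then moves to the callers of sumsquares, which may no longer pass arguments such as i++.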

A parameter may be passed as a sef parameter without an error being reported, if LCLint can determine that evaluating the parameter has no side-effects. For function calls, the modifies clause is used to determine if a side-effect is possible.[22] To prevent many spurious errors, if the called function has no modifies clause, LCLint will report an error only if sefuncon is on. Justifiably paranoid programmers will insist on setting sefuncon on, and will add modifies clauses to unconstrained functions that are used in sef macro arguments.
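
As a sketch of the function-call case, the fragment below passes a call as a sef argument. The function current_value, the variable stored_value, and the exact form of the modifies clause shown are assumptions made for illustration; the intent is that a clause stating the function modifies nothing lets the call be accepted as side-effect free.

```c
static int stored_value = 7;

/* Assumed annotation: calls to current_value modify no visible state. */
extern int current_value (void) /*@modifies nothing@*/ ;

int current_value (void)
{
   return stored_value;
}

extern int square (/*@sef@*/ int x);
# define square(x) ((x) * (x))

/* The call is evaluated twice by the macro, which is harmless here. */
static int squared_current (void)
{
   return square (current_value ());
}
```

If current_value had no modifies clause, a message would be expected here only when sefuncon is on.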

8.2.2 Polymorphism

One problem with our new definition of square is that while the original macro would work for parameters of any numeric type, LCLint will now report an error if the new version is used with a non-integer parameter.

We can use the /*@alt type,+@*/ syntax (one or more comma-separated types) to indicate that an alternate type may be used. For example,

  extern int /*@alt float@*/ square (/*@sef@*/ int /*@alt float@*/ x);
  # define square(x) ((x) *(x))

declares square for both ints and floats.
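
A short sketch of clients using both types follows. The functions int_area and float_area are hypothetical; to the C compiler the macro simply expands in place, so the alternate type matters only to the checker.

```c
extern int /*@alt float@*/ square (/*@sef@*/ int /*@alt float@*/ x);
# define square(x) ((x) * (x))

/* Uses square with an int argument and result. */
static int int_area (int side)
{
   return square (side);
}

/* Uses square with a float argument and result. */
static float float_area (float side)
{
   return square (side);
}
```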

Alternate types are also useful for declaring functions for which the return value may be safely ignored (see Section 10.3.2).

8.3 Controlling Macro Checking

By default, LCLint expands macros normally and checks the resulting code after macros have been expanded. Flags and control comments may be used to control which macros are expanded and which are checked as functions or constants.

If the fcnmacros flag is on, LCLint assumes all macros defined with parameter lists implement functions and checks them accordingly. Parameterized macros are not expanded and are checked as functions with unknown result and parameter types (or using the types in the prototype, if one is given). The analogous flag for macros that define constants is constmacros. If it is on, macros with no parameter lists are assumed to be constants and are checked accordingly. The allmacros flag sets both fcnmacros and constmacros. If the macrofcndecl flag is set, a message reports each parameterized macro that has no corresponding function prototype. If the macroconstdecl flag is set, a similar message reports each macro without parameters that has no corresponding constant declaration.

The macro checks described in the previous sections make sense only for macros that are intended to replace functions or constants. When fcnmacros or constmacros is on, more general macros need to be marked so they will not be checked as functions or constants, and will be expanded normally. Macros which are not meant to behave like functions should be preceded by the /*@notfunction@*/ comment. For example,

   /*@notfunction@*/
   # define forever for(;;)

Macros preceded by notfunction are expanded normally before regular checking is done. If a macro that is not syntactically equivalent to a statement without a semi-colon (e.g., a macro which enters a new scope) is not preceded by notfunction, parse errors may result when fcnmacros or constmacros is on.
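
For illustration, here is a sketch of client code using the forever macro. The function halvings is a hypothetical example; because forever is marked notfunction, it is expanded to for(;;) before the function body is checked.

```c
/*@notfunction@*/
# define forever for(;;)

/* Counts how many times n can be halved before it reaches zero. */
static int halvings (int n)
{
   int steps = 0;

   forever {
      if (n <= 0) { break; }
      n /= 2;
      steps++;
   }

   return steps;
}
```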

8.4 Iterators

It is often useful to be able to execute the same code for many different values. For example, we may want to sum all elements in an intSet that represents a set of integers. If intSet is an abstract type, there is no easy way of doing this in a client module without depending on the concrete representation of the type. Instead, we could provide such a mechanism as part of the type's implementation. We call a mechanism for looping through many values an iterator.

The C language provides no mechanism for creating user-defined iterators. LCLint supports a stylized form of iterators declared using syntactic comments and defined using macros.

Iterator declarations are similar to function declarations except instead of returning a value, they assign values to their yield parameters in each iteration. For example, we could add this iterator declaration to intSet.h:

/*@iter intSet_elements (intSet s, yield int el);@*/

The yield annotation means that the variable passed as the second actual argument is declared as a local variable of type int and assigned a value in each loop iteration.

Defining Iterators

An iterator is defined using a macro. Here's one (not particularly efficient) way of defining intSet_elements:

   typedef /*@abstract@*/ struct {
      int nelements;
      int *elements;
   } *intSet;   /* pointer type, so that (s)->nelements is valid */
   ...
   # define intSet_elements(s,m_el) \
     { int m_i; \
       for (m_i = (0); m_i < ((s)->nelements); m_i++) { \
           int m_el = (s)->elements[(m_i)];

   # define end_intSet_elements }}

Each time through the loop, the yield parameter m_el is assigned the next value. Once each value has been assigned to m_el for one iteration, the loop terminates. Variables declared by the iterator macro (including the yield parameter) are preceded by the macro variable namespace prefix m_ (see Section 8.2) to avoid conflicts with variables defined in the scope where the iterator is used.
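
To make the mechanism concrete, the sketch below writes out by hand roughly what a use of the iterator would expand to. It rests on assumptions: intSet is taken to be a pointer to the representation struct (the tag intSet_s is hypothetical), the loop stops before index nelements, and the client names its yield variable el and sums the elements.

```c
typedef struct intSet_s {
   int nelements;
   int *elements;
} *intSet;

/* Hand-expanded form of iterating over s and summing each element. */
static int sum_expanded (intSet s)
{
   int sum = 0;

   { int m_i;                              /* opened by the iterator macro */
     for (m_i = 0; m_i < s->nelements; m_i++) {
        int el = s->elements[m_i];         /* yield variable for this pass */
        sum += el;                         /* loop body written by client */
     }}                                    /* closed by the end macro */

   return sum;
}
```

The two closing braces at the end are exactly what the end macro contributes, which is why an iterator invocation must always be paired with it.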

Using Iterators

The general structure for using an iterator is,

iter (<params>) stmt; end_iter

For example, a client could use intSet_elements to sum the elements of an intSet:

   intSet s;
   int sum = 0;
   ...
   intSet_elements (s, el) { 
      sum += el; 
   } end_intSet_elements;

The actual parameter corresponding to a yield parameter, el, is not declared in the function scope. Instead, it is declared by the iterator and assigned an appropriate value for each iteration.
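
Putting the declaration, definition and use together gives the following sketch. It assumes that intSet is a pointer to its representation struct (the tag intSet_s and the function intSet_sum are hypothetical) and that the loop bound excludes index nelements.

```c
typedef /*@abstract@*/ struct intSet_s {
   int nelements;
   int *elements;
} *intSet;

/*@iter intSet_elements (intSet s, yield int el);@*/
# define intSet_elements(s,m_el) \
  { int m_i; \
    for (m_i = 0; m_i < ((s)->nelements); m_i++) { \
        int m_el = (s)->elements[m_i];

# define end_intSet_elements }}

/* Sums the elements of s using the iterator. */
static int intSet_sum (intSet s)
{
   int sum = 0;

   intSet_elements (s, el) {
      sum += el;
   } end_intSet_elements;

   return sum;
}
```

In a real module the macro definitions would live with the type's implementation, and clients would see only the iter declaration.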

LCLint will do the following checks for uses of stylized iterators:

An invocation of the iterator iter must be balanced by a corresponding end, named end_iter.
All actual parameters must be defined, except those corresponding to yield parameters.
Yield parameters must be new identifiers, not declared in the current scope or any enclosing scope.

Iterators are a bit awkward to implement, but they enable compact, easily understood client code. For abstract collection types, an iterator can be used to enable clients to operate on elements of the collection without breaking data abstraction.
