C++ Templates: The Complete Guide The Basics Expression Templates

Expression Templates

In this chapter we explore a template programming technique called expression templates. It was originally invented in support of numeric array classes, and that is also the context in which we introduce it here.
A numeric array class supports numeric operations on whole array objects. For example, it is possible to add two arrays, and the result contains elements that are the sums of the corresponding values in the argument arrays. Similarly, a whole array can be multiplied by a scalar, meaning that each element of the array is scaled. Naturally, it is desirable to keep the operator notation that is so familiar for built-in scalar types:

Array<double> x(1000), y(1000); 
 
x = 1.2*x + x*y;
 
For the serious number cruncher it is crucial that such expressions be evaluated as efficiently as can be expected from the platform on which the code is run. Achieving this with the compact operator notation of this example is no trivial task, but expression templates will come to our rescue.
Expression templates are reminiscent of template metaprogramming. In part this is due to the fact that expression templates rely on sometimes deeply nested template instantiations, which are not unlike the recursive instantiations encountered in template metaprograms. The fact that both techniques were originally developed to support high-performance (see our example using templates to unroll loops on page 314) array operations probably also contributes to a sense that they are related. Certainly the techniques are complementary. For example, metaprogramming is convenient for small, fixed-size array whereas expression templates are very effective for operations on medium-to-large arrays sized at run time.

Temporaries and Split Loops

To motivate expression templates, let's start with a straightforward (or maybe "naive") approach to implement templates that enable numeric array operations. A basic array template might look as follows (SArray stands for simple array):
// exprtmpl/sarray1.hpp 

#include <stddef.h> 
#include <cassert> 

template<typename T> 
class SArray { 
  public: 
    // create array with initial size 
    explicit SArray (size_t s) 
     : storage(new T[s]), storage_size(s) { 
        init(); 
    } 

    // copy constructor 
    SArray (SArray<T> const& orig) 
     : storage(new T[orig.size()]), storage_size(orig.size()) { 
        copy(orig); 
    } 

    // destructor: free memory 
    ~SArray() { 
        delete[] storage; 
    } 

    // assignment operator 
    SArray<T>& operator= (SArray<T> const& orig) { 
        if (&orig!=this) { 
            copy(orig); 
        } 
        return *this; 
    } 

    // return size 
    size_t size() const { 
        return storage_size; 
    } 

    // index operator for constants and variables 
    T operator[] (size_t idx) const { 
        return storage[idx]; 
    } 
    T& operator[] (size_t idx) { 
        return storage[idx]; 
    } 

  protected: 
    // init values with default constructor 
    void init() { 
        for (size_t idx = 0; idx<size(); ++idx) { 
            storage[idx] = T(); 
        } 
    } 
    // copy values of another array 
    void copy (SArray<T> const& orig) { 
        assert(size()==orig.size()); 
        for (size_t idx = 0; idx<size(); ++idx) { 
            storage[idx] = orig.storage[idx]; 
        } 
    } 

  private: 
    T*     storage;       // storage of the elements 
    size_t storage_size;  // number of elements 
}; 
The numeric operators can be coded as follows:
// exprtmpl/sarrayops1.hpp 

// addition of two SArrays 
template<typename T> 
SArray<T> operator+ (SArray<T> const& a, SArray<T> const& b) 
{ 
    SArray<T> result(a.size()); 
    for (size_t k = 0; k<a.size(); ++k) { 
        result[k] = a[k]+b[k]; 
    } 
    return result; 
} 

// multiplication of two SArrays 
template<typename T> 
SArray<T> operator* (SArray<T> const& a, SArray<T> const& b) 
{ 
    SArray<T> result(a.size()); 
    for (size_t k = 0; k<a.size(); ++k) { 
        result[k] = a[k]*b[k]; 
    } 
    return result; 
} 

// multiplication of scalar and SArray 
template<typename T> 
SArray<T> operator* (T const& s, SArray<T> const& a) 
{ 
    SArray<T> result(a.size()); 
    for (size_t k = 0; k<a.size(); ++k) { 
        result[k] = s*a[k]; 
    } 
    return result; 
} 

// multiplication of SArray and scalar 
// addition of scalar and SArray 
// addition of SArray and scalar 
 
Many other versions of these and other operators can be written, but these suffice to allow our example expression:
// exprtmpl/sarray1.cpp 

#include "sarray1.hpp" 
#include "sarrayops1.hpp" 

int main() 
{ 
    SArray<double> x(1000), y(1000); 
     
    x = 1.2*x + x*y; 
} 
This implementation turns out to be very inefficient for two reasons:
  1. Every application of an operator (except assignment) creates at least one temporary array (that is, at least three temporary arrays of size 1,000 each in our example, assuming a compiler performs all the allowable temporary copy eliminations).
  2. Every application of an operator requires additional traversals of the argument and result arrays (approximately 6,000 doubles are read, and approximately 4,000 doubles are written in our example, assuming only three temporary SArray objects are generated).

What happens concretely is a sequence of loops that operates with temporaries:
tmp1 = 1.2*x;      // loop of 1,000 operations 
                   // plus creation and destruction of tmp1 
tmp2 = x*y         // loop of 1,000 operations 
                   // plus creation and destruction of tmp2 
tmp3 = tmp1+tmp2;  // loop of 1,000 operations 
                   // plus creation and destruction of tmp3 
x = tmp3;          // 1,000 read operations and 1,000 write operations 
The creation of unneeded temporaries often dominates the time needed for operations on small arrays unless special fast allocators are used. For truly large arrays, temporaries are totally unacceptable because there is no storage to hold them. (Challenging numeric simulations often try to use all the available memory for more realistic results. If the memory is used to hold unneeded temporaries instead, the quality of the simulation will suffer.)
Early implementations of numeric array libraries faced this problem and encouraged users to use computed assignments (such as +=, *=, and so forth) instead. The advantage of these assignments is that both the argument and the destination are provided by the caller, and hence no temporaries are needed. For example, we could add SArray members as follows:
// exprtmpl/sarrayops2.hpp 

// additive assignment of SArray 
template<class T> 
SArray<T>& SArray<T>::operator+= (SArray<T> const& b) 
{ 
    for (size_t k = 0; k<size(); ++k) { 
        (*this)[k] += b[k]; 
    } 
    return *this; 
} 

// multiplicative assignment of SArray 
template<class T> 
SArray<T>& SArray<T>::operator*= (SArray<T> const& b) 
{ 
    for (size_t k = 0; k<size(); ++k) { 
        (*this)[k] *= b[k]; 
    } 
    return *this; 
} 

// multiplicative assignment of scalar 
template<class T> 
SArray<T>& SArray<T>::operator*= (T const& s) 
{ 
    for (size_t k = 0; k<size(); ++k) { 
        (*this)[k] *= s; 
    } 
    return *this; 
} 
With operators such as these, our example computation could be rewritten as
// exprtmpl/sarray2.cpp 
#include "sarray2.hpp" 
#include "sarrayops1.hpp" 
#include "sarrayops2.hpp" 
int main() 
{ 
    SArray<double> x(1000), y(1000); 
     
    // process x = 1.2*x + x*y 
    SArray<double> tmp(x); 
    tmp *= y; 
    x *= 1.2; 
    x += tmp; 
} 
Clearly, the technique using computed assignments still falls short:
  • The notation has become clumsy.
  • We are still left with an unneeded temporary tmp.
  • The loop is split over multiple operations, requiring a total of approximately 6,000 double elements to be read from memory and 4,000 doubles to be written to memory.
What we really want is one "ideal loop" that processes the whole expression for each index:
int main() 
{ 
    SArray<double> x(1000), y(1000); 
     
    for (int idx = 0; idx<x.size(); ++idx) { 
        x[idx] = 1.2*x[idx] + x[idx]*y[idx]; 
    } 
} 
Now we need no temporary array and we have only two memory reads (x[idx] and y[idx]) and one memory write (x[k]) per iteration. As a result, the manual loop requires only approximately 2,000 memory reads and 1,000 memory writes.
Given that on modern, high-performance computer architectures memory bandwidth is the limiting factor for the speed of these sorts of array operations, it is not surprising that in practice the performance of the simple operator overloading approaches shown here is one or two orders of magnitude slower than the manually coded loop. However, we would like to get this performance without the cumbersome and error-prone effort of writing these loops by hand or using a clumsy notation.

Encoding Expressions in Template Arguments

The key to resolving our problem is not to attempt to evaluate part of an expression until the whole expression has been seen (in our example, until the assignment operator is invoked). Thus, before the evaluation we must record which operations are being applied to which objects. The operations are determined at compile time and can therefore be encoded in template arguments.
For our example expression
1.2*x + x*y; 
this means that the result of 1.2*x is not a new array but an object that represents each value of x multiplied by 1.2. Similarly, x*y must yield each element of x multiplied by each corresponding element of y. Finally, when we need the values of the resulting array, we do the computation that we stored for later evaluation.
Let's look at a concrete implementation. With this implementation we transform the written expression
1.2*x + x*y; 
into an object with the following type:
A_Add< A_Mult<A_Scalar<double>,Array<double> >, 
       A_Mult<Array<double>,Array<double> > > 
We combine a new fundamental Array class template with class templates A_Scalar, A_Add, and A_Mult. You may recognize a prefix representation for the syntax tree corresponding to this expression . This nested template-id represents the operations involved and the types of the objects to which the operations should be applied. A_Scalar is presented later but is essentially just a placeholder for a scalar in an array expression.


Figure 18.1. Tree representation of expression 1.2*x+x*y
graphics/18fig01.gif

18.2.1 Operands of the Expression Templates

To complete the representation of the expression, we must store references to the arguments in each of the A_Add and A_Mult objects and record the value of the scalar in the A_Scalar object (or a reference thereto). Here are possible definitions for the corresponding operands:
// exprtmpl/exprops1.hpp 

#include <stddef.h> 

#include <cassert> 

// include helper class traits template to select wether to refer to an 
// ''expression template node'' either ''by value'' or ''by reference.'' 
#include "exprops1a.hpp" 

// class for objects that represent the addition of two operands 
template <typename T, typename OP1, typename OP2> 
class A_Add { 
  private: 
    typename A_Traits<OP1>::ExprRef op1;    // first operand 
    typename A_Traits<OP2>::ExprRef op2;    // second operand 

  public: 
    // constructor initializes references to operands 
    A_Add (OP1 const& a, OP2 const& b) 
     : op1(a), op2(b) { 
    } 

    // compute sum when value requested 
    T operator[] (size_t idx) const { 
        return op1[idx] + op2[idx]; 
    } 

    // size is maximum size 
    size_t size() const { 
        assert (op1.size()==0 || op2.size()==0 
                || op1.size()==op2.size()); 
        return op1.size()!=0 ? op1.size() : op2.size(); 
    } 
}; 

// class for objects that represent the multiplication of two operands 
template <typename T, typename OP1, typename OP2> 
class A_Mult { 
  private: 
    typename A_Traits<OP1>::ExprRef op1;    // first operand 
    typename A_Traits<OP2>::ExprRef op2;    // second operand 

  public: 
    // constructor initializes references to operands 
    A_Mult (OP1 const& a, OP2 const& b) 
     : op1(a), op2(b) { 
    } 

    // compute product when value requested 
    T operator[] (size_t idx) const { 
        return op1[idx] * op2[idx]; 
    } 

    // size is maximum size 
    size_t size() const { 
        assert (op1.size()==0 || op2.size()==0 
                || op1.size()==op2.size()); 
        return op1.size()!=0 ? op1.size() : op2.size(); 
    } 
}; 
As you can see, we added subscripting and size-querying operations that allow us to compute the size and the values of the elements for the array resulting from the operations represented by the subtree of "nodes" rooted at the given object.
For operations involving arrays only, the size of the result is the size of either operand. However, for operations involving both an array and a scalar, the size of the result is the size of the array operand. To distinguish array operands from scalar operands, we define a size of zero for scalars. The A_Scalar template is therefore defined as follows:
// exprtmpl/exprscalar.hpp 

// class for objects that represent scalars 
template <typename T> 
class A_Scalar { 
  private: 
    T const& s;  // value of the scalar 

  public: 
    // constructor initializes value 
    A_Scalar (T const& v) 
     : s(v) { 
    } 

    // for index operations the scalar is the value of each element 
    T operator[] (size_t) const { 
        return s; 
    } 

    // scalars have zero as size 
    size_t size() const { 
        return 0; 
    }; 
}; 
Note that scalars also provide an index operator. Inside the expression, they represent an array with the same scalar value for each index.
You probably saw that the operator classes used a helper class A_Traits to define the members for the operands:
typename A_Traits<OP1>::ExprRef op1;    // first operand 
typename A_Traits<OP2>::ExprRef op2;    // second operand 
This is necessary because of the following: In general, we can declare them to be references because most temporary nodes are bound in the top-level expression and therefore live until the end of the evaluation of that complete expression. The one exception are the A_Scalar nodes. They are bound within the operator functions and might not live until the end of the evaluation of the complete expression. Thus, to avoid that the members refer to scalars that don't exist anymore, for scalars the operands have to get copied "by value." In other words, we need members that are
  • constant references in general:
    OP1 const& op1;  // refer to first operand by reference 
    OP2 const& op2;  // refer to second operand by reference 
  • but ordinary values for scalars:
    OP1 op1;         // refer to first operand by value 
    OP2 op2;         // refer to second operand by value 
This is a perfect application of traits classes. The traits class defines a type to be a constant reference in general, but an ordinary value for scalars:
// exprtmpl/exprops1a.hpp 

/* helper traits class to select how to refer to an ''expression template node'' 
 * - in general: by reference 
 * - for scalars: by value 
 */ 

template <typename T> class A_Scalar; 

// primary template 
template <typename T> 
class A_Traits { 
  public: 
    typedef T const& ExprRef;     // type to refer to is constant reference 
}; 

// partial specialization for scalars 
template <typename T> 
class A_Traits<A_Scalar<T> > { 
  public: 
    typedef A_Scalar<T> ExprRef;  // type to refer to is ordinary value 
}; 
Note that since A_Scalar objects refer to scalars in the top-level expression, those scalars can use reference types.

18.2.2 The Array Type

With our ability to encode expressions using lightweight expression templates, we must now create an Array type that controls actual storage and that knows about the expression templates. However, it is also useful for engineering purposes to keep as similar as possible the interface for a real array with storage and one for a representation of an expression that results in an array. To this end, we declare the Array template as follows:
template <typename T, typename Rep = SArray<T> > 
class Array; 
The type Rep can be SArray if Array is a real array of storage,  or it can be the nested template-id such as A_Add or A_Mult that encodes an expression. Either way we are handling Array instantiations, which considerably simplify our later dealings. In fact, even the definition of the Array template needs no specializations to distinguish the two cases, although some of the members cannot be instantiated for types like A_Mult substituted for Rep.
[1] It is convenient to reuse the previously developed SArray here, but in an industrial-strength library, a special-purpose implementation may be preferable because we won't use all the features of SArray.
Here is the definition. The functionality is limited roughly to what was provided by our SArray template, although once the code is understood, it is not hard to add to that functionality:
// exprtmpl/exprarray.hpp 

#include <stddef.h> 
#include <cassert> 
#include "sarray1.hpp" 

template <typename T, typename Rep = SArray<T> > 
class Array { 
  private: 
    Rep expr_rep;   // (access to) the data of the array 

  public: 
    // create array with initial size 
    explicit Array (size_t s) 
     : expr_rep(s) { 
    } 

    // create array from possible representation 
    Array (Rep const& rb) 
     : expr_rep(rb) { 
    } 

    // assignment operator for same type 
    Array& operator= (Array const& b) { 
        assert(size()==b.size()); 
        for (size_t idx = 0; idx<b.size(); ++idx) { 
            expr_rep[idx] = b[idx]; 
        } 
        return *this; 
    } 

    // assignment operator for arrays of different type 
    template<typename T2, typename Rep2> 
    Array& operator= (Array<T2, Rep2> const& b) { 
        assert(size()==b.size()); 
        for (size_t idx = 0; idx<b.size(); ++idx) { 
            expr_rep[idx] = b[idx]; 
        } 
        return *this; 
    } 

    // size is size of represented data 
    size_t size() const { 
        return expr_rep.size(); 
    } 

    // index operator for constants and variables 
    T operator[] (size_t idx) const { 
        assert(idx<size()); 
        return expr_rep[idx]; 
    } 
    T& operator[] (size_t idx) { 
        assert(idx<size()); 
        return expr_rep[idx]; 
    } 

    // return what the array currently represents 
    Rep const& rep() const { 
        return expr_rep; 
    } 
    Rep& rep() { 
        return expr_rep; 
    } 
}; 
As you can see, many operations are simply forwarded to the underlying Rep object. However, when copying another array, we must take into account the possibility that the other array is really built on an expression template. Thus, we parameterize these copy operations in terms of the underlying Rep representation.

18.2.3 The Operators

We have most of the machinery in place to have efficient numeric operators for our numeric Array template, except the operators themselves. As implied earlier, these operators only assemble the expression template objects—they don't actually evaluate the resulting arrays.
For each ordinary binary operator we must implement three versions: array-array, array-scalar, and scalar-array. To be able to compute our initial value we need, for example, the following operators:
// exprtmpl/exprops2.hpp 

// addition of two Arrays 
template <typename T, typename R1, typename R2> 
Array<T,A_Add<T,R1,R2> > 
operator+ (Array<T,R1> const& a, Array<T,R2> const& b) { 
    return Array<T,A_Add<T,R1,R2> > 
           (A_Add<T,R1,R2>(a.rep(),b.rep())); 
} 

// multiplication of two Arrays 
template <typename T, typename R1, typename R2> 
Array<T, A_Mult<T,R1,R2> > 
operator* (Array<T,R1> const& a, Array<T,R2> const& b) { 
    return Array<T,A_Mult<T,R1,R2> > 
           (A_Mult<T,R1,R2>(a.rep(), b.rep())); 
} 

// multiplication of scalar and Array 
template <typename T, typename R2> 
Array<T, A_Mult<T,A_Scalar<T>,R2> > 
operator* (T const& s, Array<T,R2> const& b) { 
    return Array<T,A_Mult<T,A_Scalar<T>,R2> > 
           (A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())); 
} 

// multiplication of Array and scalar 
// addition of scalar and Array 
// addition of Array and scalar 
 
The declaration of these operators is somewhat cumbersome (as can be seen from these examples), but the functions really don't do much. For example, the plus operator for two arrays first creates an A_Add<> object that represents the operator and the operands
A_Add<T,R1,R2>(a.rep(),b.rep()) 
and wraps this object in an Array object so that we can use the result as any other object that represents data of an array:
return Array<T,A_Add<T,R1,R2> > ( ); 
For scalar multiplication, we use the A_Scalar template to create the A_Mult object
A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep()) 
and wrap again:
return Array<T,A_Mult<T,A_Scalar<T>,R2> > ( ); 
Other nonmember binary operators are so similar that macros can be used to cover most operators with relatively little source code. Another (smaller) macro could be used for nonmember unary operators.

18.2.4 Review

On first discovery of the expression template idea, the interaction of the various declarations and definitions can be daunting. Hence, a top-down review of what happens with our example code may help crystallize understanding. The code we will analyze is the following (you can find it as part of meta/exprmain.cpp):
int main() 
{ 
    Array<double> x(1000), y(1000); 
     
    x = 1.2*x + x*y; 
} 
Because the Rep argument is omitted in the definition of x and y, it is set to the default, which is SArray<double>. So, x and y are arrays with "real" storage and not just recordings of operations.
When parsing the expression
1.2*x + x*y 
the compiler first applies the leftmost * operation, which is a scalar-array operator. Overload resolution thus selects the scalar-array form of operator*:
template <typename T, typename R2> 
Array<T, A_Mult<T,A_Scalar<T>,R2> > 
operator* (T const& s, Array<T,R2> const& b) { 
    return Array<T,A_Mult<T,A_Scalar<T>,R2> > 
           (A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())); 
} 
The operand types are double and Array<double, SArray<double> >. Thus, the type of the result is
Array<double, A_Mult<double, A_Scalar<double>, SArray<double> > > 
The result value is constructed to reference an A_Scalar<double> object constructed from the double value 1.2 and the SArray<double> representation of the object x.
Next, the second multiplication is evaluated: It is an array-array operation x*y. This time we use the appropriate operator*:
template <typename T, typename R1, typename R2> 
Array<T, A_Mult<T,R1,R2> > 
operator* (Array<T,R1> const& a, Array<T,R2> const& b) { 
    return Array<T,A_Mult<T,R1,R2> > 
           (A_Mult<T,R1,R2>(a.rep(), b.rep())); 
} 
The operand types are both Array<double, SArray<double> >, so the result type is
Array<double, A_Mult<double, SArray<double>, SArray<double> > > 
This time the wrapped A_Mult object refers to two SArray<double> representations: the one of x and the one of y.
Finally, the + operation is evaluated. It is again an array-array operation, and the operand types are the result types that we just deduced. So, we invoke the array-array operator +:
template <typename T, typename R1, typename R2> 
Array<T,A_Add<T,R1,R2> > 
operator+ (Array<T,R1> const& a, Array<T,R2> const& b) { 
    return Array<T,A_Add<T,R1,R2> > 
           (A_Add<T,R1,R2>(a.rep(),b.rep())); 
} 
T is substituted with double whereas R1 is substituted with
A_Mult<double, A_Scalar<double>, SArray<double> > 
and R2 is substituted with
A_Mult<double, SArray<double>, SArray<double> > 
Hence, the type of the expression to the right of the assignment token is
Array<double, 
      A_Add<double, 
            A_Mult<double, A_Scalar<double>, SArray<double> >, 
            A_Mult<double, SArray<double>, SArray<double>>>> 
This type is matched to the assignment operator template of the Array template:
template <typename T, typename Rep = SArray<T> > 
class Array { 
  public: 
     
    // assignment operator for arrays of different type 
    template<typename T2, typename Rep2> 
    Array& operator= (Array<T2, Rep2> const& b) { 
        assert(size()==b.size()); 
        for (size_t idx = 0; idx<b.size(); ++idx) { 
            expr_rep[idx] = b[idx]; 
        } 
        return *this; 
    } 
     
}; 
The assignment operator computes each element of the destination x by applying the subscript operator to the representation of the right side, the type of which is
A_Add<double, 
      A_Mult<double, A_Scalar<double>, SArray<double> >, 
      A_Mult<double, SArray<double>, SArray<double> > > > 
Carefully tracing this subscript operator shows that for a given subscript idx, it computes
(1.2*x[idx]) + (x[idx]*y[idx]) 
which is exactly what we want.

18.2.5 Expression Templates Assignments

It is not possible to instantiate write operations for an array with a Rep argument that is built on our example A_Mult and A_Add expression templates. (Indeed, it makes no sense to write a+b = c.) However, it is entirely reasonable to write other expression templates for which assignment to the result is possible. For example, indexing with an array of integral values would intuitively correspond to subset selection. In other words, the expression
x[y] = 2*x[y]; 
should mean the same as
for (size_t idx = 0; idx<y.size(); ++idx) { 
    x[y[idx]] = 2*x[y[idx]]; 
} 
Enabling this implies that an array built on an expression template behaves like an lvalue (that is, is "writable"). The expression template component for this is not fundamentally different from, say, A_Mult, except that both const and non-const versions of the subscript operators are provided and they return lvalues (references):
// exprtmpl/exprops3.hpp 

template<typename T, typename A1, typename A2> 
class A_Subscript { 
  public: 
    // constructor initializes references to operands 
    A_Subscript (A1 const & a, A2 const & b) 
     : a1(a), a2(b) { 
    } 

    // process subscription when value requested 
    T operator[] (size_t idx) const { 
        return a1[a2[idx]]; 
    } 
    T& operator[] (size_t idx) { 
        return a1[a2[idx]]; 
    } 

    // size is size of inner array 
    size_t size() const { 
        return a2.size(); 
    } 

  private: 
    A1 const & a1;    // reference to first operand 
    A2 const & a2;    // reference to second operand 
}; 
The extended subscript operator with subset semantics that was suggested earlier would require that additional subscript operators be added to the Array template. One of these operators could be defined as follows (a corresponding const version would presumably also be needed):
// exprtmpl/exprops4.hpp 

template<typename T, typename R1, typename R2> 
Array<T,A_Subscript<T,R1,R2> > 
Array<T,R1>::operator[] (Array<T,R2> const & b) { 
    return Array<T,A_Subscript<T,R1,R2> > 
           (A_Subscript<T,R1,R2>(a.rep(),b.rep()));

Performance and Limitations of Expression Templates

To justify the complexity of the expression template idea, we have already invoked greatly enhanced performance on arraywise operations. As you trace what happens with the expression templates, you'll find that many small inline functions call each other and that many small expression template objects are allocated on the call stack. The optimizer must perform complete inlining and elimination of the small objects to produce code that performs as well as manually coded loops. The latter feat is still rare among C++ compilers at the time of this writing.
The expression templates technique does not resolve all the problematic situations involving numeric operations on arrays. For example, it does not work for matrix-vector multiplications of the form
x = A*x; 
where x is a column vector of size n and A is an n-by-n matrix. The problem here is that a temporary must be used because each element of the result can depend on each element of the original x. Unfortunately, the expression template loop updates the first element of x right away and then uses that newly computed element to compute the second element, which is wrong. The slightly different expression
x = A*y; 
on the other hand, does not need a temporary if x and y aren't aliases for each other, which implies that a solution would have to know the relationship of the operands at run time. This in turn suggests creating a run-time structure that represents the expression tree instead of encoding the tree in the type of the expression template. This approach was pioneered by the NewMat library of Robert Davies (see [new matt]). It was known long before expression templates were developed.
Expression templates aren't limited to numeric computations either. An intriguing application, for example, is Jaakko Järvi and Gary Powell's Lambda Library (see [lamddalid]). This library uses standard library function objects as expression objects. For example, it allows us to write the following:
void lambda_demo (std::vector<long*> & ones) { 
    std::sort(ones.begin(), ones.end(), *_1 > *_2); 
} 
This short code excerpt sorts an array in increasing order of the value of what its elements refer to. Without the Lambda library, we'd have to define a simple (but cumbersome) special-purpose functor type. Instead, we can now use simple inline syntax to express the operations we want to apply. In our example, _1 and _2 are placeholders provided by the Lambda library. They correspond to elementary expression objects that are also functors. They can then be used to construct more complex expressions using the techniques developed in this chapter.

Afternotes

Expression templates were developed independently by Todd Veldhuizen and David Vandevoorde (Todd coined the term) at a time when member templates were not yet part of the C++ programming language (and it seemed at the time that they would never be added to C++). This caused some problems in implementing the assignment operator: It could not be parameterized for the expression template. One technique to work around this consisted of introducing in the expression templates a conversion operator to a Copier class parameterized with the expression template but inheriting from a base class that was parameterized only in the element type. This base class then provided a (virtual) copy_to interface to which the assignment operator could refer. Here is a sketch of the mechanism (with the template names used in this chapter):
template<typename T> 
class CopierInterface { 
  public: 
    virtual void copy_to(Array<T, SArray<T> >&) const; 
}; 

template<typename T, typename X> 
class Copier : public CopierBase<T> { 
  public: 
    Copier(X const &x): expr(x) {} 
    virtual void copy_to(Array<T, SArray<T> >&) const { 
        // implementation of assignment loop 
         
    } 
  private: 
    X const &expr; 
}; 

template<typename T, typename Rep = SArray<T> > 
class Array { 
  public: 
    // delegated assignment operator 
    Array<T, Rep>& operator=(CopierBase<T> const &b) { 
        b.copy_to(rep); 
    }; 
     
}; 
template<typename T, typename A1, typename A2> 
class A_mult { 
  public: 
    operator Copier<T, A_Mult<T, A1, A2> >(); 
     
}; 
This adds another level of complexity and some additional run-time cost to expression templates, but even so the resulting performance benefits were impressive at the time.
The C++ standard library contains a class template valarray that was meant to be used for applications that would justify the techniques used for the Array template developed in this chapter. A precursor of valarray had been designed with the intention that compilers aiming at the market for scientific computation would recognize the array type and use highly optimized internal code for their operations. Such compilers would have "understood" the types in some sense. However, this never happened (in part because the market in question is relatively small and in part because the problem grew in complexity as valarray became a template). Some time after the expression template technique was discovered, one of us (Vandevoorde) submitted to the C++ committee a proposal that turned valarray essentially into the Array template we developed (with many bells and whistles inspired by the existing valarray functionality). The proposal was the first time that the concept of the Rep parameter was documented. Prior to this, the arrays with actual storage and the expression template pseudo-arrays were different templates. When client code introduced a function foo() accepting an array—for example,
double foo(Array<double> const&); 
calling foo(1.2*x) forced the conversion for the expression template to an array with actual storage, even when the operations applied to that argument did not require a temporary. With expresssion templates embedded in the Rep argument it is possible instead to declare
template<typename R> 
double foo(Array<double, R> const&); 
and no conversion happens unless one is actually needed.
The valarray proposal came late in the C++ standardization process and practically rewrote all the text regarding valarray in the standard. It was rejected as a result, and instead, a few tweaks were added to the existing text to allow implementations based on expression templates. However, the exploitation of this allowance remains much more cumbersome than what was discussed here. At the time of this writing, no such implementation is known, and standard valarrays are, generally speaking, quite inefficient at performing the operations for which they were designed.
Finally, it is worth observing here that many of the pioneering techniques presented in this chapter, as well as what later became known as the STL,  were all originally implemented on the same compiler: version 4 of the Borland C++ compiler. This was perhaps the first compiler that made template programming widely available to the C++ programming community.

Followers