Expression Templates
In this chapter we explore a template programming technique 
called expression templates. It was originally 
invented in support of numeric array classes, and that is also the context in 
which we introduce it here.
A numeric array class supports numeric operations on whole 
array objects. For example, it is possible to add two arrays, and the result 
contains elements that are the sums of the corresponding values in the argument 
arrays. Similarly, a whole array can be multiplied by a scalar, meaning that 
each element of the array is scaled. Naturally, it is desirable to keep the 
operator notation that is so familiar for built-in scalar types:
Array<double> x(1000), y(1000); 
… 
x = 1.2*x + x*y;
For the serious number cruncher it is crucial that such 
expressions be evaluated as efficiently as can be expected from the platform on 
which the code is run. Achieving this with the compact operator notation of this 
example is no trivial task, but expression templates will come to our 
rescue.
Expression templates are reminiscent of template 
metaprogramming. In part this is due to the fact that expression templates rely 
on sometimes deeply nested template instantiations, which are not unlike the 
recursive instantiations encountered in template metaprograms. The fact that 
both techniques were originally developed to support high-performance (see our 
example using templates to unroll loops on page 314) array operations probably 
also contributes to a sense that they are related. Certainly the techniques are 
complementary. For example, metaprogramming is convenient for small, fixed-size 
array whereas expression templates are very effective for operations on 
medium-to-large arrays sized at run time.
Temporaries and Split Loops
To motivate expression templates, let's start with a 
straightforward (or maybe "naive") approach to implement templates that enable 
numeric array operations. A basic array template might look as follows 
(SArray stands for simple array):
// exprtmpl/sarray1.hpp #include <stddef.h> #include <cassert> template<typename T> class SArray { public: // create array with initial size explicit SArray (size_t s) : storage(new T[s]), storage_size(s) { init(); } // copy constructor SArray (SArray<T> const& orig) : storage(new T[orig.size()]), storage_size(orig.size()) { copy(orig); } // destructor: free memory ~SArray() { delete[] storage; } // assignment operator SArray<T>& operator= (SArray<T> const& orig) { if (&orig!=this) { copy(orig); } return *this; } // return size size_t size() const { return storage_size; } // index operator for constants and variables T operator[] (size_t idx) const { return storage[idx]; } T& operator[] (size_t idx) { return storage[idx]; } protected: // init values with default constructor void init() { for (size_t idx = 0; idx<size(); ++idx) { storage[idx] = T(); } } // copy values of another array void copy (SArray<T> const& orig) { assert(size()==orig.size()); for (size_t idx = 0; idx<size(); ++idx) { storage[idx] = orig.storage[idx]; } } private: T* storage; // storage of the elements size_t storage_size; // number of elements };
The numeric operators can be coded as follows:
// exprtmpl/sarrayops1.hpp // addition of two SArrays template<typename T> SArray<T> operator+ (SArray<T> const& a, SArray<T> const& b) { SArray<T> result(a.size()); for (size_t k = 0; k<a.size(); ++k) { result[k] = a[k]+b[k]; } return result; } // multiplication of two SArrays template<typename T> SArray<T> operator* (SArray<T> const& a, SArray<T> const& b) { SArray<T> result(a.size()); for (size_t k = 0; k<a.size(); ++k) { result[k] = a[k]*b[k]; } return result; } // multiplication of scalar and SArray template<typename T> SArray<T> operator* (T const& s, SArray<T> const& a) { SArray<T> result(a.size()); for (size_t k = 0; k<a.size(); ++k) { result[k] = s*a[k]; } return result; } // multiplication of SArray and scalar // addition of scalar and SArray // addition of SArray and scalar …
Many other versions of these and other operators can be 
written, but these suffice to allow our example expression:
// exprtmpl/sarray1.cpp #include "sarray1.hpp" #include "sarrayops1.hpp" int main() { SArray<double> x(1000), y(1000); … x = 1.2*x + x*y; }
This implementation turns out to be very inefficient for two 
reasons:
- 
Every application of an operator (except assignment) creates at least one temporary array (that is, at least three temporary arrays of size 1,000 each in our example, assuming a compiler performs all the allowable temporary copy eliminations).
- 
Every application of an operator requires additional traversals of the argument and result arrays (approximately 6,000 doubles are read, and approximately 4,000 doubles are written in our example, assuming only three temporary SArray objects are generated).
What happens concretely is a sequence of loops that operates 
with temporaries:
tmp1 = 1.2*x; // loop of 1,000 operations // plus creation and destruction of tmp1 tmp2 = x*y // loop of 1,000 operations // plus creation and destruction of tmp2 tmp3 = tmp1+tmp2; // loop of 1,000 operations // plus creation and destruction of tmp3 x = tmp3; // 1,000 read operations and 1,000 write operations
The creation of unneeded temporaries often dominates the time 
needed for operations on small arrays unless special fast allocators are used. 
For truly large arrays, temporaries are totally unacceptable because there is no 
storage to hold them. (Challenging numeric simulations often try to use all the 
available memory for more realistic results. If the memory is used to hold 
unneeded temporaries instead, the quality of the simulation will suffer.)
Early implementations of numeric array libraries faced this 
problem and encouraged users to use computed assignments (such as +=, 
*=, and so forth) instead. The advantage of these assignments is that 
both the argument and the destination are provided by the caller, and hence no 
temporaries are needed. For example, we could add SArray members as 
follows:
// exprtmpl/sarrayops2.hpp // additive assignment of SArray template<class T> SArray<T>& SArray<T>::operator+= (SArray<T> const& b) { for (size_t k = 0; k<size(); ++k) { (*this)[k] += b[k]; } return *this; } // multiplicative assignment of SArray template<class T> SArray<T>& SArray<T>::operator*= (SArray<T> const& b) { for (size_t k = 0; k<size(); ++k) { (*this)[k] *= b[k]; } return *this; } // multiplicative assignment of scalar template<class T> SArray<T>& SArray<T>::operator*= (T const& s) { for (size_t k = 0; k<size(); ++k) { (*this)[k] *= s; } return *this; }
With operators such as these, our example computation could be 
rewritten as
// exprtmpl/sarray2.cpp #include "sarray2.hpp" #include "sarrayops1.hpp" #include "sarrayops2.hpp" int main() { SArray<double> x(1000), y(1000); … // process x = 1.2*x + x*y SArray<double> tmp(x); tmp *= y; x *= 1.2; x += tmp; }
Clearly, the technique using computed assignments still falls 
short:
- 
The notation has become clumsy.
- 
We are still left with an unneeded temporary tmp.
- 
The loop is split over multiple operations, requiring a total of approximately 6,000 double elements to be read from memory and 4,000 doubles to be written to memory.
What we really want is one 
"ideal loop" that processes the whole expression for each index:
int main() 
{ 
    SArray<double> x(1000), y(1000); 
    … 
    for (int idx = 0; idx<x.size(); ++idx) { 
        x[idx] = 1.2*x[idx] + x[idx]*y[idx]; 
    } 
} 
Now we need no temporary array and we have only two memory 
reads (x[idx] and y[idx]) and one memory write (x[k]) 
per iteration. As a result, the manual loop requires only approximately 2,000 
memory reads and 1,000 memory writes.
Given that on modern, high-performance computer architectures 
memory bandwidth is the limiting factor for the speed of these sorts of array 
operations, it is not surprising that in practice the performance of the simple 
operator overloading approaches shown here is one or two orders of magnitude 
slower than the manually coded loop. However, we would like to get this 
performance without the cumbersome and error-prone effort of writing these loops 
by hand or using a clumsy notation.
Encoding Expressions in Template Arguments
The key to resolving our problem is not to attempt to evaluate 
part of an expression until the whole expression has been seen (in our example, 
until the assignment operator is invoked). Thus, before the evaluation we must 
record which operations are being applied to which objects. The operations are 
determined at compile time and can therefore be encoded in template 
arguments.
For our example expression
1.2*x + x*y;
this means that the result of 1.2*x is not a new array 
but an object that represents each value of 
x multiplied by 1.2. Similarly, 
x*y must yield each element of 
x multiplied by each corresponding element 
of y. Finally, when we need the values of the resulting array, 
we do the computation that we stored for later evaluation.
Let's look at a concrete implementation. With this 
implementation we transform the written expression
1.2*x + x*y;
into an object with the following type:
A_Add< A_Mult<A_Scalar<double>,Array<double> >, 
       A_Mult<Array<double>,Array<double> > > 
We combine a new fundamental Array class template with 
class templates A_Scalar, A_Add, and A_Mult. You may 
recognize a prefix representation for the syntax tree corresponding to this 
expression . This 
nested template-id represents the operations involved and the types of the 
objects to which the operations should be applied. A_Scalar is 
presented later but is essentially just a placeholder for a scalar in an array 
expression.
Figure 18.1. Tree representation of expression 1.2*x+x*y

18.2.1 Operands of the Expression Templates
To complete the representation of the expression, we must store 
references to the arguments in each of the A_Add and A_Mult 
objects and record the value of the scalar in the A_Scalar object (or a 
reference thereto). Here are possible definitions for the corresponding 
operands:
// exprtmpl/exprops1.hpp #include <stddef.h> #include <cassert> // include helper class traits template to select wether to refer to an // ''expression template node'' either ''by value'' or ''by reference.'' #include "exprops1a.hpp" // class for objects that represent the addition of two operands template <typename T, typename OP1, typename OP2> class A_Add { private: typename A_Traits<OP1>::ExprRef op1; // first operand typename A_Traits<OP2>::ExprRef op2; // second operand public: // constructor initializes references to operands A_Add (OP1 const& a, OP2 const& b) : op1(a), op2(b) { } // compute sum when value requested T operator[] (size_t idx) const { return op1[idx] + op2[idx]; } // size is maximum size size_t size() const { assert (op1.size()==0 || op2.size()==0 || op1.size()==op2.size()); return op1.size()!=0 ? op1.size() : op2.size(); } }; // class for objects that represent the multiplication of two operands template <typename T, typename OP1, typename OP2> class A_Mult { private: typename A_Traits<OP1>::ExprRef op1; // first operand typename A_Traits<OP2>::ExprRef op2; // second operand public: // constructor initializes references to operands A_Mult (OP1 const& a, OP2 const& b) : op1(a), op2(b) { } // compute product when value requested T operator[] (size_t idx) const { return op1[idx] * op2[idx]; } // size is maximum size size_t size() const { assert (op1.size()==0 || op2.size()==0 || op1.size()==op2.size()); return op1.size()!=0 ? op1.size() : op2.size(); } };
As you can see, we added subscripting and size-querying 
operations that allow us to compute the size and the values of the elements for 
the array resulting from the operations represented by the subtree of "nodes" 
rooted at the given object.
For operations involving arrays only, the size of the result is 
the size of either operand. However, for operations involving both an array and 
a scalar, the size of the result is the size of the array operand. To 
distinguish array operands from scalar operands, we define a size of zero for 
scalars. The A_Scalar template is therefore defined as follows:
// exprtmpl/exprscalar.hpp // class for objects that represent scalars template <typename T> class A_Scalar { private: T const& s; // value of the scalar public: // constructor initializes value A_Scalar (T const& v) : s(v) { } // for index operations the scalar is the value of each element T operator[] (size_t) const { return s; } // scalars have zero as size size_t size() const { return 0; }; };
Note that scalars also provide an index operator. Inside the 
expression, they represent an array with the same scalar value for each 
index.
You probably saw that the operator classes used a helper class 
A_Traits to define the members for the operands:
typename A_Traits<OP1>::ExprRef op1; // first operand typename A_Traits<OP2>::ExprRef op2; // second operand
This is necessary because of the following: In general, we can 
declare them to be references because most temporary nodes are bound in the 
top-level expression and therefore live until the end of the evaluation of that 
complete expression. The one exception are the A_Scalar nodes. They are 
bound within the operator functions and might not live until the end of the 
evaluation of the complete expression. Thus, to avoid that the members refer to 
scalars that don't exist anymore, for scalars the operands have to get copied 
"by value." In other words, we need members that are
- 
constant references in general:OP1 const& op1; // refer to first operand by reference OP2 const& op2; // refer to second operand by reference 
- 
but ordinary values for scalars:OP1 op1; // refer to first operand by value OP2 op2; // refer to second operand by value 
This is a perfect application of traits classes. The traits 
class defines a type to be a constant reference in general, but an ordinary 
value for scalars:
// exprtmpl/exprops1a.hpp /* helper traits class to select how to refer to an ''expression template node'' * - in general: by reference * - for scalars: by value */ template <typename T> class A_Scalar; // primary template template <typename T> class A_Traits { public: typedef T const& ExprRef; // type to refer to is constant reference }; // partial specialization for scalars template <typename T> class A_Traits<A_Scalar<T> > { public: typedef A_Scalar<T> ExprRef; // type to refer to is ordinary value };
Note that since A_Scalar objects refer to scalars in 
the top-level expression, those scalars can use reference types.
18.2.2 The Array Type
With our ability to encode expressions using lightweight 
expression templates, we must now create an Array type that controls 
actual storage and that knows about the expression templates. However, it is 
also useful for engineering purposes to keep as similar as possible the 
interface for a real array with storage and one for a representation of an 
expression that results in an array. To this end, we declare the Array 
template as follows:
template <typename T, typename Rep = SArray<T> > class Array;
The type Rep can be SArray if Array 
is a real array of storage,  or it can be the nested template-id such as 
A_Add or A_Mult that encodes an expression. Either way we are 
handling Array instantiations, which considerably simplify our later 
dealings. In fact, even the definition of the Array template needs no 
specializations to distinguish the two cases, although some of the members 
cannot be instantiated for types like A_Mult substituted for 
Rep.
[1] It is convenient to reuse the previously developed SArray here, but in an industrial-strength library, a special-purpose implementation may be preferable because we won't use all the features of SArray.
Here is the definition. The functionality is limited roughly to 
what was provided by our SArray template, although once the code is 
understood, it is not hard to add to that functionality:
// exprtmpl/exprarray.hpp #include <stddef.h> #include <cassert> #include "sarray1.hpp" template <typename T, typename Rep = SArray<T> > class Array { private: Rep expr_rep; // (access to) the data of the array public: // create array with initial size explicit Array (size_t s) : expr_rep(s) { } // create array from possible representation Array (Rep const& rb) : expr_rep(rb) { } // assignment operator for same type Array& operator= (Array const& b) { assert(size()==b.size()); for (size_t idx = 0; idx<b.size(); ++idx) { expr_rep[idx] = b[idx]; } return *this; } // assignment operator for arrays of different type template<typename T2, typename Rep2> Array& operator= (Array<T2, Rep2> const& b) { assert(size()==b.size()); for (size_t idx = 0; idx<b.size(); ++idx) { expr_rep[idx] = b[idx]; } return *this; } // size is size of represented data size_t size() const { return expr_rep.size(); } // index operator for constants and variables T operator[] (size_t idx) const { assert(idx<size()); return expr_rep[idx]; } T& operator[] (size_t idx) { assert(idx<size()); return expr_rep[idx]; } // return what the array currently represents Rep const& rep() const { return expr_rep; } Rep& rep() { return expr_rep; } };
As you can see, many operations are simply forwarded to the 
underlying Rep object. However, when copying another array, we must 
take into account the possibility that the other array is really built on an 
expression template. Thus, we parameterize these copy operations in terms of the 
underlying Rep representation.
18.2.3 The Operators
We have most of the machinery in place to have efficient 
numeric operators for our numeric Array template, except the operators 
themselves. As implied earlier, these operators only assemble the expression 
template objects—they don't actually evaluate the resulting arrays.
For each ordinary binary operator we must implement three 
versions: array-array, array-scalar, and scalar-array. To be able to compute our 
initial value we need, for example, the following operators:
// exprtmpl/exprops2.hpp // addition of two Arrays template <typename T, typename R1, typename R2> Array<T,A_Add<T,R1,R2> > operator+ (Array<T,R1> const& a, Array<T,R2> const& b) { return Array<T,A_Add<T,R1,R2> > (A_Add<T,R1,R2>(a.rep(),b.rep())); } // multiplication of two Arrays template <typename T, typename R1, typename R2> Array<T, A_Mult<T,R1,R2> > operator* (Array<T,R1> const& a, Array<T,R2> const& b) { return Array<T,A_Mult<T,R1,R2> > (A_Mult<T,R1,R2>(a.rep(), b.rep())); } // multiplication of scalar and Array template <typename T, typename R2> Array<T, A_Mult<T,A_Scalar<T>,R2> > operator* (T const& s, Array<T,R2> const& b) { return Array<T,A_Mult<T,A_Scalar<T>,R2> > (A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())); } // multiplication of Array and scalar // addition of scalar and Array // addition of Array and scalar …
The declaration of these operators is somewhat cumbersome (as 
can be seen from these examples), but the functions really don't do much. For 
example, the plus operator for two arrays first creates an 
A_Add<> object that represents the operator and the operands
A_Add<T,R1,R2>(a.rep(),b.rep())
and wraps this object in an Array object so that we 
can use the result as any other object that represents data of an array:
return Array<T,A_Add<T,R1,R2> > (… ); 
For scalar multiplication, we use the A_Scalar 
template to create the A_Mult object
A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())
and wrap again:
return Array<T,A_Mult<T,A_Scalar<T>,R2> > (… ); 
Other nonmember binary operators are so similar that macros can 
be used to cover most operators with relatively little source code. Another 
(smaller) macro could be used for nonmember unary operators.
18.2.4 Review
On first discovery of the expression template idea, the 
interaction of the various declarations and definitions can be daunting. Hence, 
a top-down review of what happens with our example code may help crystallize 
understanding. The code we will analyze is the following (you can find it as 
part of meta/exprmain.cpp):
int main() 
{ 
    Array<double> x(1000), y(1000); 
    … 
    x = 1.2*x + x*y; 
} 
Because the Rep argument is omitted in the definition 
of x and y, it is set to the default, which is 
SArray<double>. So, x and y are arrays with 
"real" storage and not just recordings of operations.
When parsing the expression
1.2*x + x*y
the compiler first applies the leftmost * operation, 
which is a scalar-array operator. Overload resolution thus selects the 
scalar-array form of operator*:
template <typename T, typename R2> 
Array<T, A_Mult<T,A_Scalar<T>,R2> > 
operator* (T const& s, Array<T,R2> const& b) { 
    return Array<T,A_Mult<T,A_Scalar<T>,R2> > 
           (A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())); 
} 
The operand types are double and Array<double, 
SArray<double> >. Thus, the type of the result is
Array<double, A_Mult<double, A_Scalar<double>, SArray<double> > >
The result value is constructed to reference an 
A_Scalar<double> object constructed from the double 
value 1.2 and the SArray<double> representation of the 
object x.
Next, the second multiplication is evaluated: It is an 
array-array operation x*y. This time we use the appropriate 
operator*:
template <typename T, typename R1, typename R2> 
Array<T, A_Mult<T,R1,R2> > 
operator* (Array<T,R1> const& a, Array<T,R2> const& b) { 
    return Array<T,A_Mult<T,R1,R2> > 
           (A_Mult<T,R1,R2>(a.rep(), b.rep())); 
} 
The operand types are both Array<double, 
SArray<double> >, so the result type is
Array<double, A_Mult<double, SArray<double>, SArray<double> > >
This time the wrapped A_Mult object refers to two 
SArray<double> representations: the one of x and the one 
of y.
Finally, the + operation is evaluated. It is again an 
array-array operation, and the operand types are the result types that we just 
deduced. So, we invoke the array-array operator +:
template <typename T, typename R1, typename R2> 
Array<T,A_Add<T,R1,R2> > 
operator+ (Array<T,R1> const& a, Array<T,R2> const& b) { 
    return Array<T,A_Add<T,R1,R2> > 
           (A_Add<T,R1,R2>(a.rep(),b.rep())); 
} 
T is substituted with double whereas 
R1 is substituted with
A_Mult<double, A_Scalar<double>, SArray<double> >
and R2 is substituted with
A_Mult<double, SArray<double>, SArray<double> >
Hence, the type of the expression to the right of the 
assignment token is
Array<double, 
      A_Add<double, 
            A_Mult<double, A_Scalar<double>, SArray<double> >, 
            A_Mult<double, SArray<double>, SArray<double>>>> 
This type is matched to the assignment operator template of the 
Array template:
template <typename T, typename Rep = SArray<T> > 
class Array { 
  public: 
    … 
    // assignment operator for arrays of different type 
    template<typename T2, typename Rep2> 
    Array& operator= (Array<T2, Rep2> const& b) { 
        assert(size()==b.size()); 
        for (size_t idx = 0; idx<b.size(); ++idx) { 
            expr_rep[idx] = b[idx]; 
        } 
        return *this; 
    } 
    … 
}; 
The assignment operator computes each element of the 
destination x by applying the subscript operator to the representation 
of the right side, the type of which is
A_Add<double, 
      A_Mult<double, A_Scalar<double>, SArray<double> >, 
      A_Mult<double, SArray<double>, SArray<double> > > > 
Carefully tracing this subscript operator shows that for a 
given subscript idx, it computes
(1.2*x[idx]) + (x[idx]*y[idx])
which is exactly what we want.
18.2.5 Expression Templates Assignments
It is not possible to instantiate write operations for an array 
with a Rep argument that is built on our example A_Mult and 
A_Add expression templates. (Indeed, it makes no sense to write a+b 
= c.) However, it is entirely reasonable to write other expression 
templates for which assignment to the result is possible. For example, indexing 
with an array of integral values would intuitively correspond to subset 
selection. In other words, the expression
x[y] = 2*x[y];
should mean the same as
for (size_t idx = 0; idx<y.size(); ++idx) { 
    x[y[idx]] = 2*x[y[idx]]; 
} 
Enabling this implies that an array built on an expression 
template behaves like an lvalue (that is, is "writable"). The expression 
template component for this is not fundamentally different from, say, 
A_Mult, except that both const and non-const versions 
of the subscript operators are provided and they return lvalues 
(references):
// exprtmpl/exprops3.hpp template<typename T, typename A1, typename A2> class A_Subscript { public: // constructor initializes references to operands A_Subscript (A1 const & a, A2 const & b) : a1(a), a2(b) { } // process subscription when value requested T operator[] (size_t idx) const { return a1[a2[idx]]; } T& operator[] (size_t idx) { return a1[a2[idx]]; } // size is size of inner array size_t size() const { return a2.size(); } private: A1 const & a1; // reference to first operand A2 const & a2; // reference to second operand };
The extended subscript operator with subset semantics that was 
suggested earlier would require that additional subscript operators be added to 
the Array template. One of these operators could be defined as follows 
(a corresponding const version would presumably also be needed):
// exprtmpl/exprops4.hpp 
template<typename T, typename R1, typename R2> 
Array<T,A_Subscript<T,R1,R2> > 
Array<T,R1>::operator[] (Array<T,R2> const & b) { 
    return Array<T,A_Subscript<T,R1,R2> > 
           (A_Subscript<T,R1,R2>(a.rep(),b.rep()));
Performance and Limitations of Expression Templates
To justify the complexity of the expression template idea, we 
have already invoked greatly enhanced performance on arraywise operations. As 
you trace what happens with the expression templates, you'll find that many 
small inline functions call each other and that many small expression template 
objects are allocated on the call stack. The optimizer must perform complete 
inlining and elimination of the small objects to produce code that performs as 
well as manually coded loops. The latter feat is still rare among C++ compilers 
at the time of this writing.
The expression templates technique does not resolve all the 
problematic situations involving numeric operations on arrays. For example, it 
does not work for matrix-vector multiplications of the form
x = A*x;
where x is a column vector of size n and 
A is an n-by-n matrix. The problem here is that a 
temporary must be used because each element of the result can depend on each 
element of the original x. Unfortunately, the expression template loop 
updates the first element of x right away and then uses that newly 
computed element to compute the second element, which is wrong. The slightly 
different expression
x = A*y;
on the other hand, does not need a temporary if x and 
y aren't aliases for each other, which implies that a solution would 
have to know the relationship of the operands at run time. This in turn suggests 
creating a run-time structure that represents the expression tree instead of 
encoding the tree in the type of the expression template. This approach was 
pioneered by the NewMat library of Robert Davies (see [new matt]). It was known long before 
expression templates were developed.
Expression templates aren't limited to numeric computations 
either. An intriguing application, for example, is Jaakko Järvi and Gary 
Powell's Lambda Library (see [lamddalid]). This library uses standard 
library function objects as expression objects. For example, it allows us to 
write the following:
void lambda_demo (std::vector<long*> & ones) { 
    std::sort(ones.begin(), ones.end(), *_1 > *_2); 
} 
This short code excerpt sorts an array in increasing order of 
the value of what its elements refer to. Without 
the Lambda library, we'd have to define a simple (but cumbersome) 
special-purpose functor type. Instead, we can now use simple inline syntax to 
express the operations we want to apply. In our example, _1 and 
_2 are placeholders provided by the Lambda library. They correspond to 
elementary expression objects that are also functors. They can then be used to 
construct more complex expressions using the techniques developed in this 
chapter.
Afternotes
Expression templates were developed independently by Todd 
Veldhuizen and David Vandevoorde (Todd coined the term) at a time when member 
templates were not yet part of the C++ programming language (and it seemed at 
the time that they would never be added to C++). This caused some problems in 
implementing the assignment operator: It could not be parameterized for the 
expression template. One technique to work around this consisted of introducing 
in the expression templates a conversion operator to a Copier class 
parameterized with the expression template but inheriting from a base class that 
was parameterized only in the element type. This base class then provided a 
(virtual) copy_to interface to which the assignment operator could 
refer. Here is a sketch of the mechanism (with the template names used in this 
chapter):
template<typename T> 
class CopierInterface { 
  public: 
    virtual void copy_to(Array<T, SArray<T> >&) const; 
}; 
template<typename T, typename X> 
class Copier : public CopierBase<T> { 
  public: 
    Copier(X const &x): expr(x) {} 
    virtual void copy_to(Array<T, SArray<T> >&) const { 
        // implementation of assignment loop 
        … 
    } 
  private: 
    X const &expr; 
}; 
template<typename T, typename Rep = SArray<T> > 
class Array { 
  public: 
    // delegated assignment operator 
    Array<T, Rep>& operator=(CopierBase<T> const &b) { 
        b.copy_to(rep); 
    }; 
    … 
}; 
template<typename T, typename A1, typename A2> 
class A_mult { 
  public: 
    operator Copier<T, A_Mult<T, A1, A2> >(); 
    … 
}; 
This adds another level of complexity and some additional 
run-time cost to expression templates, but even so the resulting performance 
benefits were impressive at the time.
The C++ standard library contains a class template 
valarray that was meant to be used for applications that would justify 
the techniques used for the Array template developed in this chapter. A 
precursor of valarray had been designed with the intention that 
compilers aiming at the market for scientific computation would recognize the 
array type and use highly optimized internal code for their operations. Such 
compilers would have "understood" the types in some sense. However, this never 
happened (in part because the market in question is relatively small and in part 
because the problem grew in complexity as valarray became a template). 
Some time after the expression template technique was discovered, one of us 
(Vandevoorde) submitted to the C++ committee a proposal that turned 
valarray essentially into the Array template we developed 
(with many bells and whistles inspired by the existing valarray 
functionality). The proposal was the first time that the concept of the 
Rep parameter was documented. Prior to this, the arrays with actual 
storage and the expression template pseudo-arrays were different templates. When 
client code introduced a function foo() accepting an array—for 
example,
double foo(Array<double> const&);
calling foo(1.2*x) forced the conversion for the 
expression template to an array with actual storage, even when the operations 
applied to that argument did not require a temporary. With expresssion templates 
embedded in the Rep argument it is possible instead to declare
template<typename R> double foo(Array<double, R> const&);
and no conversion happens unless one is actually needed.
The valarray proposal came late in the C++ 
standardization process and practically rewrote all the text regarding 
valarray in the standard. It was rejected as a result, and instead, a 
few tweaks were added to the existing text to allow implementations based on 
expression templates. However, the exploitation of this allowance remains much 
more cumbersome than what was discussed here. At the time of this writing, no 
such implementation is known, and standard valarrays are, generally 
speaking, quite inefficient at performing the operations for which they were 
designed.
Finally, it is worth observing here that many of the pioneering 
techniques presented in this chapter, as well as what later became known as the 
STL,  were all originally implemented on the same 
compiler: version 4 of the Borland C++ compiler. This was perhaps the first 
compiler that made template programming widely available to the C++ programming 
community.





















