Expression Templates
In this chapter we explore a template programming technique
called expression templates. It was originally
invented in support of numeric array classes, and that is also the context in
which we introduce it here.
A numeric array class supports numeric operations on whole
array objects. For example, it is possible to add two arrays, and the result
contains elements that are the sums of the corresponding values in the argument
arrays. Similarly, a whole array can be multiplied by a scalar, meaning that
each element of the array is scaled. Naturally, it is desirable to keep the
operator notation that is so familiar for built-in scalar types:
Array<double> x(1000), y(1000);
…
x = 1.2*x + x*y;
For the serious number cruncher it is crucial that such
expressions be evaluated as efficiently as can be expected from the platform on
which the code is run. Achieving this with the compact operator notation of this
example is no trivial task, but expression templates will come to our
rescue.
Expression templates are reminiscent of template
metaprogramming. In part this is due to the fact that expression templates rely
on sometimes deeply nested template instantiations, which are not unlike the
recursive instantiations encountered in template metaprograms. The fact that
both techniques were originally developed to support high-performance (see our
example using templates to unroll loops on page 314) array operations probably
also contributes to a sense that they are related. Certainly the techniques are
complementary. For example, metaprogramming is convenient for small, fixed-size
array whereas expression templates are very effective for operations on
medium-to-large arrays sized at run time.
Temporaries and Split Loops
To motivate expression templates, let's start with a
straightforward (or maybe "naive") approach to implement templates that enable
numeric array operations. A basic array template might look as follows
(SArray stands for simple array):
// exprtmpl/sarray1.hpp #include <stddef.h> #include <cassert> template<typename T> class SArray { public: // create array with initial size explicit SArray (size_t s) : storage(new T[s]), storage_size(s) { init(); } // copy constructor SArray (SArray<T> const& orig) : storage(new T[orig.size()]), storage_size(orig.size()) { copy(orig); } // destructor: free memory ~SArray() { delete[] storage; } // assignment operator SArray<T>& operator= (SArray<T> const& orig) { if (&orig!=this) { copy(orig); } return *this; } // return size size_t size() const { return storage_size; } // index operator for constants and variables T operator[] (size_t idx) const { return storage[idx]; } T& operator[] (size_t idx) { return storage[idx]; } protected: // init values with default constructor void init() { for (size_t idx = 0; idx<size(); ++idx) { storage[idx] = T(); } } // copy values of another array void copy (SArray<T> const& orig) { assert(size()==orig.size()); for (size_t idx = 0; idx<size(); ++idx) { storage[idx] = orig.storage[idx]; } } private: T* storage; // storage of the elements size_t storage_size; // number of elements };
The numeric operators can be coded as follows:
// exprtmpl/sarrayops1.hpp // addition of two SArrays template<typename T> SArray<T> operator+ (SArray<T> const& a, SArray<T> const& b) { SArray<T> result(a.size()); for (size_t k = 0; k<a.size(); ++k) { result[k] = a[k]+b[k]; } return result; } // multiplication of two SArrays template<typename T> SArray<T> operator* (SArray<T> const& a, SArray<T> const& b) { SArray<T> result(a.size()); for (size_t k = 0; k<a.size(); ++k) { result[k] = a[k]*b[k]; } return result; } // multiplication of scalar and SArray template<typename T> SArray<T> operator* (T const& s, SArray<T> const& a) { SArray<T> result(a.size()); for (size_t k = 0; k<a.size(); ++k) { result[k] = s*a[k]; } return result; } // multiplication of SArray and scalar // addition of scalar and SArray // addition of SArray and scalar …
Many other versions of these and other operators can be
written, but these suffice to allow our example expression:
// exprtmpl/sarray1.cpp #include "sarray1.hpp" #include "sarrayops1.hpp" int main() { SArray<double> x(1000), y(1000); … x = 1.2*x + x*y; }
This implementation turns out to be very inefficient for two
reasons:
-
Every application of an operator (except assignment) creates at least one temporary array (that is, at least three temporary arrays of size 1,000 each in our example, assuming a compiler performs all the allowable temporary copy eliminations).
-
Every application of an operator requires additional traversals of the argument and result arrays (approximately 6,000 doubles are read, and approximately 4,000 doubles are written in our example, assuming only three temporary SArray objects are generated).
What happens concretely is a sequence of loops that operates
with temporaries:
tmp1 = 1.2*x; // loop of 1,000 operations // plus creation and destruction of tmp1 tmp2 = x*y // loop of 1,000 operations // plus creation and destruction of tmp2 tmp3 = tmp1+tmp2; // loop of 1,000 operations // plus creation and destruction of tmp3 x = tmp3; // 1,000 read operations and 1,000 write operations
The creation of unneeded temporaries often dominates the time
needed for operations on small arrays unless special fast allocators are used.
For truly large arrays, temporaries are totally unacceptable because there is no
storage to hold them. (Challenging numeric simulations often try to use all the
available memory for more realistic results. If the memory is used to hold
unneeded temporaries instead, the quality of the simulation will suffer.)
Early implementations of numeric array libraries faced this
problem and encouraged users to use computed assignments (such as +=,
*=, and so forth) instead. The advantage of these assignments is that
both the argument and the destination are provided by the caller, and hence no
temporaries are needed. For example, we could add SArray members as
follows:
// exprtmpl/sarrayops2.hpp // additive assignment of SArray template<class T> SArray<T>& SArray<T>::operator+= (SArray<T> const& b) { for (size_t k = 0; k<size(); ++k) { (*this)[k] += b[k]; } return *this; } // multiplicative assignment of SArray template<class T> SArray<T>& SArray<T>::operator*= (SArray<T> const& b) { for (size_t k = 0; k<size(); ++k) { (*this)[k] *= b[k]; } return *this; } // multiplicative assignment of scalar template<class T> SArray<T>& SArray<T>::operator*= (T const& s) { for (size_t k = 0; k<size(); ++k) { (*this)[k] *= s; } return *this; }
With operators such as these, our example computation could be
rewritten as
// exprtmpl/sarray2.cpp #include "sarray2.hpp" #include "sarrayops1.hpp" #include "sarrayops2.hpp" int main() { SArray<double> x(1000), y(1000); … // process x = 1.2*x + x*y SArray<double> tmp(x); tmp *= y; x *= 1.2; x += tmp; }
Clearly, the technique using computed assignments still falls
short:
-
The notation has become clumsy.
-
We are still left with an unneeded temporary tmp.
-
The loop is split over multiple operations, requiring a total of approximately 6,000 double elements to be read from memory and 4,000 doubles to be written to memory.
What we really want is one
"ideal loop" that processes the whole expression for each index:
int main()
{
SArray<double> x(1000), y(1000);
…
for (int idx = 0; idx<x.size(); ++idx) {
x[idx] = 1.2*x[idx] + x[idx]*y[idx];
}
}
Now we need no temporary array and we have only two memory
reads (x[idx] and y[idx]) and one memory write (x[k])
per iteration. As a result, the manual loop requires only approximately 2,000
memory reads and 1,000 memory writes.
Given that on modern, high-performance computer architectures
memory bandwidth is the limiting factor for the speed of these sorts of array
operations, it is not surprising that in practice the performance of the simple
operator overloading approaches shown here is one or two orders of magnitude
slower than the manually coded loop. However, we would like to get this
performance without the cumbersome and error-prone effort of writing these loops
by hand or using a clumsy notation.
Encoding Expressions in Template Arguments
The key to resolving our problem is not to attempt to evaluate
part of an expression until the whole expression has been seen (in our example,
until the assignment operator is invoked). Thus, before the evaluation we must
record which operations are being applied to which objects. The operations are
determined at compile time and can therefore be encoded in template
arguments.
For our example expression
1.2*x + x*y;
this means that the result of 1.2*x is not a new array
but an object that represents each value of
x multiplied by 1.2. Similarly,
x*y must yield each element of
x multiplied by each corresponding element
of y. Finally, when we need the values of the resulting array,
we do the computation that we stored for later evaluation.
Let's look at a concrete implementation. With this
implementation we transform the written expression
1.2*x + x*y;
into an object with the following type:
A_Add< A_Mult<A_Scalar<double>,Array<double> >, A_Mult<Array<double>,Array<double> > >
We combine a new fundamental Array class template with
class templates A_Scalar, A_Add, and A_Mult. You may
recognize a prefix representation for the syntax tree corresponding to this
expression . This
nested template-id represents the operations involved and the types of the
objects to which the operations should be applied. A_Scalar is
presented later but is essentially just a placeholder for a scalar in an array
expression.
Figure 18.1. Tree representation of expression 1.2*x+x*y
18.2.1 Operands of the Expression Templates
To complete the representation of the expression, we must store
references to the arguments in each of the A_Add and A_Mult
objects and record the value of the scalar in the A_Scalar object (or a
reference thereto). Here are possible definitions for the corresponding
operands:
// exprtmpl/exprops1.hpp #include <stddef.h> #include <cassert> // include helper class traits template to select wether to refer to an // ''expression template node'' either ''by value'' or ''by reference.'' #include "exprops1a.hpp" // class for objects that represent the addition of two operands template <typename T, typename OP1, typename OP2> class A_Add { private: typename A_Traits<OP1>::ExprRef op1; // first operand typename A_Traits<OP2>::ExprRef op2; // second operand public: // constructor initializes references to operands A_Add (OP1 const& a, OP2 const& b) : op1(a), op2(b) { } // compute sum when value requested T operator[] (size_t idx) const { return op1[idx] + op2[idx]; } // size is maximum size size_t size() const { assert (op1.size()==0 || op2.size()==0 || op1.size()==op2.size()); return op1.size()!=0 ? op1.size() : op2.size(); } }; // class for objects that represent the multiplication of two operands template <typename T, typename OP1, typename OP2> class A_Mult { private: typename A_Traits<OP1>::ExprRef op1; // first operand typename A_Traits<OP2>::ExprRef op2; // second operand public: // constructor initializes references to operands A_Mult (OP1 const& a, OP2 const& b) : op1(a), op2(b) { } // compute product when value requested T operator[] (size_t idx) const { return op1[idx] * op2[idx]; } // size is maximum size size_t size() const { assert (op1.size()==0 || op2.size()==0 || op1.size()==op2.size()); return op1.size()!=0 ? op1.size() : op2.size(); } };
As you can see, we added subscripting and size-querying
operations that allow us to compute the size and the values of the elements for
the array resulting from the operations represented by the subtree of "nodes"
rooted at the given object.
For operations involving arrays only, the size of the result is
the size of either operand. However, for operations involving both an array and
a scalar, the size of the result is the size of the array operand. To
distinguish array operands from scalar operands, we define a size of zero for
scalars. The A_Scalar template is therefore defined as follows:
// exprtmpl/exprscalar.hpp // class for objects that represent scalars template <typename T> class A_Scalar { private: T const& s; // value of the scalar public: // constructor initializes value A_Scalar (T const& v) : s(v) { } // for index operations the scalar is the value of each element T operator[] (size_t) const { return s; } // scalars have zero as size size_t size() const { return 0; }; };
Note that scalars also provide an index operator. Inside the
expression, they represent an array with the same scalar value for each
index.
You probably saw that the operator classes used a helper class
A_Traits to define the members for the operands:
typename A_Traits<OP1>::ExprRef op1; // first operand typename A_Traits<OP2>::ExprRef op2; // second operand
This is necessary because of the following: In general, we can
declare them to be references because most temporary nodes are bound in the
top-level expression and therefore live until the end of the evaluation of that
complete expression. The one exception are the A_Scalar nodes. They are
bound within the operator functions and might not live until the end of the
evaluation of the complete expression. Thus, to avoid that the members refer to
scalars that don't exist anymore, for scalars the operands have to get copied
"by value." In other words, we need members that are
-
constant references in general:
OP1 const& op1; // refer to first operand by reference OP2 const& op2; // refer to second operand by reference
-
but ordinary values for scalars:
OP1 op1; // refer to first operand by value OP2 op2; // refer to second operand by value
This is a perfect application of traits classes. The traits
class defines a type to be a constant reference in general, but an ordinary
value for scalars:
// exprtmpl/exprops1a.hpp /* helper traits class to select how to refer to an ''expression template node'' * - in general: by reference * - for scalars: by value */ template <typename T> class A_Scalar; // primary template template <typename T> class A_Traits { public: typedef T const& ExprRef; // type to refer to is constant reference }; // partial specialization for scalars template <typename T> class A_Traits<A_Scalar<T> > { public: typedef A_Scalar<T> ExprRef; // type to refer to is ordinary value };
Note that since A_Scalar objects refer to scalars in
the top-level expression, those scalars can use reference types.
18.2.2 The Array Type
With our ability to encode expressions using lightweight
expression templates, we must now create an Array type that controls
actual storage and that knows about the expression templates. However, it is
also useful for engineering purposes to keep as similar as possible the
interface for a real array with storage and one for a representation of an
expression that results in an array. To this end, we declare the Array
template as follows:
template <typename T, typename Rep = SArray<T> > class Array;
The type Rep can be SArray if Array
is a real array of storage, or it can be the nested template-id such as
A_Add or A_Mult that encodes an expression. Either way we are
handling Array instantiations, which considerably simplify our later
dealings. In fact, even the definition of the Array template needs no
specializations to distinguish the two cases, although some of the members
cannot be instantiated for types like A_Mult substituted for
Rep.
[1] It is convenient to reuse the previously developed SArray here, but in an industrial-strength library, a special-purpose implementation may be preferable because we won't use all the features of SArray.
Here is the definition. The functionality is limited roughly to
what was provided by our SArray template, although once the code is
understood, it is not hard to add to that functionality:
// exprtmpl/exprarray.hpp #include <stddef.h> #include <cassert> #include "sarray1.hpp" template <typename T, typename Rep = SArray<T> > class Array { private: Rep expr_rep; // (access to) the data of the array public: // create array with initial size explicit Array (size_t s) : expr_rep(s) { } // create array from possible representation Array (Rep const& rb) : expr_rep(rb) { } // assignment operator for same type Array& operator= (Array const& b) { assert(size()==b.size()); for (size_t idx = 0; idx<b.size(); ++idx) { expr_rep[idx] = b[idx]; } return *this; } // assignment operator for arrays of different type template<typename T2, typename Rep2> Array& operator= (Array<T2, Rep2> const& b) { assert(size()==b.size()); for (size_t idx = 0; idx<b.size(); ++idx) { expr_rep[idx] = b[idx]; } return *this; } // size is size of represented data size_t size() const { return expr_rep.size(); } // index operator for constants and variables T operator[] (size_t idx) const { assert(idx<size()); return expr_rep[idx]; } T& operator[] (size_t idx) { assert(idx<size()); return expr_rep[idx]; } // return what the array currently represents Rep const& rep() const { return expr_rep; } Rep& rep() { return expr_rep; } };
As you can see, many operations are simply forwarded to the
underlying Rep object. However, when copying another array, we must
take into account the possibility that the other array is really built on an
expression template. Thus, we parameterize these copy operations in terms of the
underlying Rep representation.
18.2.3 The Operators
We have most of the machinery in place to have efficient
numeric operators for our numeric Array template, except the operators
themselves. As implied earlier, these operators only assemble the expression
template objects—they don't actually evaluate the resulting arrays.
For each ordinary binary operator we must implement three
versions: array-array, array-scalar, and scalar-array. To be able to compute our
initial value we need, for example, the following operators:
// exprtmpl/exprops2.hpp // addition of two Arrays template <typename T, typename R1, typename R2> Array<T,A_Add<T,R1,R2> > operator+ (Array<T,R1> const& a, Array<T,R2> const& b) { return Array<T,A_Add<T,R1,R2> > (A_Add<T,R1,R2>(a.rep(),b.rep())); } // multiplication of two Arrays template <typename T, typename R1, typename R2> Array<T, A_Mult<T,R1,R2> > operator* (Array<T,R1> const& a, Array<T,R2> const& b) { return Array<T,A_Mult<T,R1,R2> > (A_Mult<T,R1,R2>(a.rep(), b.rep())); } // multiplication of scalar and Array template <typename T, typename R2> Array<T, A_Mult<T,A_Scalar<T>,R2> > operator* (T const& s, Array<T,R2> const& b) { return Array<T,A_Mult<T,A_Scalar<T>,R2> > (A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())); } // multiplication of Array and scalar // addition of scalar and Array // addition of Array and scalar …
The declaration of these operators is somewhat cumbersome (as
can be seen from these examples), but the functions really don't do much. For
example, the plus operator for two arrays first creates an
A_Add<> object that represents the operator and the operands
A_Add<T,R1,R2>(a.rep(),b.rep())
and wraps this object in an Array object so that we
can use the result as any other object that represents data of an array:
return Array<T,A_Add<T,R1,R2> > (… );
For scalar multiplication, we use the A_Scalar
template to create the A_Mult object
A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())
and wrap again:
return Array<T,A_Mult<T,A_Scalar<T>,R2> > (… );
Other nonmember binary operators are so similar that macros can
be used to cover most operators with relatively little source code. Another
(smaller) macro could be used for nonmember unary operators.
18.2.4 Review
On first discovery of the expression template idea, the
interaction of the various declarations and definitions can be daunting. Hence,
a top-down review of what happens with our example code may help crystallize
understanding. The code we will analyze is the following (you can find it as
part of meta/exprmain.cpp):
int main()
{
Array<double> x(1000), y(1000);
…
x = 1.2*x + x*y;
}
Because the Rep argument is omitted in the definition
of x and y, it is set to the default, which is
SArray<double>. So, x and y are arrays with
"real" storage and not just recordings of operations.
When parsing the expression
1.2*x + x*y
the compiler first applies the leftmost * operation,
which is a scalar-array operator. Overload resolution thus selects the
scalar-array form of operator*:
template <typename T, typename R2> Array<T, A_Mult<T,A_Scalar<T>,R2> > operator* (T const& s, Array<T,R2> const& b) { return Array<T,A_Mult<T,A_Scalar<T>,R2> > (A_Mult<T,A_Scalar<T>,R2>(A_Scalar<T>(s), b.rep())); }
The operand types are double and Array<double,
SArray<double> >. Thus, the type of the result is
Array<double, A_Mult<double, A_Scalar<double>, SArray<double> > >
The result value is constructed to reference an
A_Scalar<double> object constructed from the double
value 1.2 and the SArray<double> representation of the
object x.
Next, the second multiplication is evaluated: It is an
array-array operation x*y. This time we use the appropriate
operator*:
template <typename T, typename R1, typename R2> Array<T, A_Mult<T,R1,R2> > operator* (Array<T,R1> const& a, Array<T,R2> const& b) { return Array<T,A_Mult<T,R1,R2> > (A_Mult<T,R1,R2>(a.rep(), b.rep())); }
The operand types are both Array<double,
SArray<double> >, so the result type is
Array<double, A_Mult<double, SArray<double>, SArray<double> > >
This time the wrapped A_Mult object refers to two
SArray<double> representations: the one of x and the one
of y.
Finally, the + operation is evaluated. It is again an
array-array operation, and the operand types are the result types that we just
deduced. So, we invoke the array-array operator +:
template <typename T, typename R1, typename R2> Array<T,A_Add<T,R1,R2> > operator+ (Array<T,R1> const& a, Array<T,R2> const& b) { return Array<T,A_Add<T,R1,R2> > (A_Add<T,R1,R2>(a.rep(),b.rep())); }
T is substituted with double whereas
R1 is substituted with
A_Mult<double, A_Scalar<double>, SArray<double> >
and R2 is substituted with
A_Mult<double, SArray<double>, SArray<double> >
Hence, the type of the expression to the right of the
assignment token is
Array<double, A_Add<double, A_Mult<double, A_Scalar<double>, SArray<double> >, A_Mult<double, SArray<double>, SArray<double>>>>
This type is matched to the assignment operator template of the
Array template:
template <typename T, typename Rep = SArray<T> > class Array { public: … // assignment operator for arrays of different type template<typename T2, typename Rep2> Array& operator= (Array<T2, Rep2> const& b) { assert(size()==b.size()); for (size_t idx = 0; idx<b.size(); ++idx) { expr_rep[idx] = b[idx]; } return *this; } … };
The assignment operator computes each element of the
destination x by applying the subscript operator to the representation
of the right side, the type of which is
A_Add<double, A_Mult<double, A_Scalar<double>, SArray<double> >, A_Mult<double, SArray<double>, SArray<double> > > >
Carefully tracing this subscript operator shows that for a
given subscript idx, it computes
(1.2*x[idx]) + (x[idx]*y[idx])
which is exactly what we want.
18.2.5 Expression Templates Assignments
It is not possible to instantiate write operations for an array
with a Rep argument that is built on our example A_Mult and
A_Add expression templates. (Indeed, it makes no sense to write a+b
= c.) However, it is entirely reasonable to write other expression
templates for which assignment to the result is possible. For example, indexing
with an array of integral values would intuitively correspond to subset
selection. In other words, the expression
x[y] = 2*x[y];
should mean the same as
for (size_t idx = 0; idx<y.size(); ++idx) { x[y[idx]] = 2*x[y[idx]]; }
Enabling this implies that an array built on an expression
template behaves like an lvalue (that is, is "writable"). The expression
template component for this is not fundamentally different from, say,
A_Mult, except that both const and non-const versions
of the subscript operators are provided and they return lvalues
(references):
// exprtmpl/exprops3.hpp template<typename T, typename A1, typename A2> class A_Subscript { public: // constructor initializes references to operands A_Subscript (A1 const & a, A2 const & b) : a1(a), a2(b) { } // process subscription when value requested T operator[] (size_t idx) const { return a1[a2[idx]]; } T& operator[] (size_t idx) { return a1[a2[idx]]; } // size is size of inner array size_t size() const { return a2.size(); } private: A1 const & a1; // reference to first operand A2 const & a2; // reference to second operand };
The extended subscript operator with subset semantics that was
suggested earlier would require that additional subscript operators be added to
the Array template. One of these operators could be defined as follows
(a corresponding const version would presumably also be needed):
// exprtmpl/exprops4.hpp
template<typename T, typename R1, typename R2>
Array<T,A_Subscript<T,R1,R2> >
Array<T,R1>::operator[] (Array<T,R2> const & b) {
return Array<T,A_Subscript<T,R1,R2> >
(A_Subscript<T,R1,R2>(a.rep(),b.rep()));
Performance and Limitations of Expression Templates
To justify the complexity of the expression template idea, we
have already invoked greatly enhanced performance on arraywise operations. As
you trace what happens with the expression templates, you'll find that many
small inline functions call each other and that many small expression template
objects are allocated on the call stack. The optimizer must perform complete
inlining and elimination of the small objects to produce code that performs as
well as manually coded loops. The latter feat is still rare among C++ compilers
at the time of this writing.
The expression templates technique does not resolve all the
problematic situations involving numeric operations on arrays. For example, it
does not work for matrix-vector multiplications of the form
x = A*x;
where x is a column vector of size n and
A is an n-by-n matrix. The problem here is that a
temporary must be used because each element of the result can depend on each
element of the original x. Unfortunately, the expression template loop
updates the first element of x right away and then uses that newly
computed element to compute the second element, which is wrong. The slightly
different expression
x = A*y;
on the other hand, does not need a temporary if x and
y aren't aliases for each other, which implies that a solution would
have to know the relationship of the operands at run time. This in turn suggests
creating a run-time structure that represents the expression tree instead of
encoding the tree in the type of the expression template. This approach was
pioneered by the NewMat library of Robert Davies (see [new matt]). It was known long before
expression templates were developed.
Expression templates aren't limited to numeric computations
either. An intriguing application, for example, is Jaakko Järvi and Gary
Powell's Lambda Library (see [lamddalid]). This library uses standard
library function objects as expression objects. For example, it allows us to
write the following:
void lambda_demo (std::vector<long*> & ones) { std::sort(ones.begin(), ones.end(), *_1 > *_2); }
This short code excerpt sorts an array in increasing order of
the value of what its elements refer to. Without
the Lambda library, we'd have to define a simple (but cumbersome)
special-purpose functor type. Instead, we can now use simple inline syntax to
express the operations we want to apply. In our example, _1 and
_2 are placeholders provided by the Lambda library. They correspond to
elementary expression objects that are also functors. They can then be used to
construct more complex expressions using the techniques developed in this
chapter.
Afternotes
Expression templates were developed independently by Todd
Veldhuizen and David Vandevoorde (Todd coined the term) at a time when member
templates were not yet part of the C++ programming language (and it seemed at
the time that they would never be added to C++). This caused some problems in
implementing the assignment operator: It could not be parameterized for the
expression template. One technique to work around this consisted of introducing
in the expression templates a conversion operator to a Copier class
parameterized with the expression template but inheriting from a base class that
was parameterized only in the element type. This base class then provided a
(virtual) copy_to interface to which the assignment operator could
refer. Here is a sketch of the mechanism (with the template names used in this
chapter):
template<typename T> class CopierInterface { public: virtual void copy_to(Array<T, SArray<T> >&) const; }; template<typename T, typename X> class Copier : public CopierBase<T> { public: Copier(X const &x): expr(x) {} virtual void copy_to(Array<T, SArray<T> >&) const { // implementation of assignment loop … } private: X const &expr; }; template<typename T, typename Rep = SArray<T> > class Array { public: // delegated assignment operator Array<T, Rep>& operator=(CopierBase<T> const &b) { b.copy_to(rep); }; … }; template<typename T, typename A1, typename A2> class A_mult { public: operator Copier<T, A_Mult<T, A1, A2> >(); … };
This adds another level of complexity and some additional
run-time cost to expression templates, but even so the resulting performance
benefits were impressive at the time.
The C++ standard library contains a class template
valarray that was meant to be used for applications that would justify
the techniques used for the Array template developed in this chapter. A
precursor of valarray had been designed with the intention that
compilers aiming at the market for scientific computation would recognize the
array type and use highly optimized internal code for their operations. Such
compilers would have "understood" the types in some sense. However, this never
happened (in part because the market in question is relatively small and in part
because the problem grew in complexity as valarray became a template).
Some time after the expression template technique was discovered, one of us
(Vandevoorde) submitted to the C++ committee a proposal that turned
valarray essentially into the Array template we developed
(with many bells and whistles inspired by the existing valarray
functionality). The proposal was the first time that the concept of the
Rep parameter was documented. Prior to this, the arrays with actual
storage and the expression template pseudo-arrays were different templates. When
client code introduced a function foo() accepting an array—for
example,
double foo(Array<double> const&);
calling foo(1.2*x) forced the conversion for the
expression template to an array with actual storage, even when the operations
applied to that argument did not require a temporary. With expresssion templates
embedded in the Rep argument it is possible instead to declare
template<typename R> double foo(Array<double, R> const&);
and no conversion happens unless one is actually needed.
The valarray proposal came late in the C++
standardization process and practically rewrote all the text regarding
valarray in the standard. It was rejected as a result, and instead, a
few tweaks were added to the existing text to allow implementations based on
expression templates. However, the exploitation of this allowance remains much
more cumbersome than what was discussed here. At the time of this writing, no
such implementation is known, and standard valarrays are, generally
speaking, quite inefficient at performing the operations for which they were
designed.
Finally, it is worth observing here that many of the pioneering
techniques presented in this chapter, as well as what later became known as the
STL, were all originally implemented on the same
compiler: version 4 of the Borland C++ compiler. This was perhaps the first
compiler that made template programming widely available to the C++ programming
community.