Revision as of 08:28, 25 February 2011

This document gives a practical introduction to the BOOST Libraries oriented to common tasks in scientific computing. For those using C++ as a programming language in general, BOOST is a very useful set of libraries that help programming in C++ by providing tools and commonly used idioms that solve recurring problems that emerge when developing in this language, even at a fairly low level. It also encourages certain style of programming, where generality and efficiency are priority. These tools include macro definitions, template functions and classes, functions, generic data containers, and common algorithms. It can be viewed as a natural extension of the C++ Standard Template Library (STL).

General Remarks

BOOST, like C++, has a very steep learning curve and the syntax can become sometimes very ugly and complicated, but it faster to learn BOOST than to reinvent the wheel and come with buggy home-brew solutions that at the end of the day will look even more ugly and won't be that good anyway. (By good I mean generic and efficient.) Note that I am not comparing BOOST and C++ with other tools like Matlab, or Fortran, or Python. Those languages are likely to do better (and quick development) than C++ in many specific areas. Boost also closes the gap with those languages without introducing modifications to the core C++ language.

Libraries included in Boost range from very simple ones (like Boost.Conversion) to very complex ones (like Boost.MPI and Boost.BGL).

The aim of this document/tutorial is to incrementally document the utility and working practical examples. Extra documentation added by users is very welcome, specially in the area of non-trivial but concise examples, related to our common programming problems (i.e. scientific computing).

Topics covered in the first stages of this document will include: Boost building and installation, Boost.MultiArray and Boost.MPI. That are the libraries that I have to deal with at this point. Ideally I would like to add Boost.uBlas in the future and document examples of small libraries that are very usefull for numerical tasks. The focus will be to document the interaction of these libraries with numerical libraries such as FFTW and Lapack, and above all, the documentation of practical example code.

BOOST build and installation

It is a bless that most Linux distributions now include some versions of Boost, at least the compiled libraries. Many Boost libraries are header-only, which means that no linking is necessary (an #include<boost/ ... > line set us ready to use a particular library). For the rest of the libraries, linking is not hard either, it is just a matter of adding something like "-lboost_NAME" during compilation (with certain naming convention). The current tutorial has been tested in wcr.stanford.edu, with GCC 3 and in my personal Ubuntu with GCC 4. Some libraries seem to compile a little better (fewer warnings) with GCC 4. If you are going to install GCC 4, do it before going ahead with this tutorial.

The difficulty with linking is that most Boost distributions in the operating system are outdated; for example the current version of Boost is 1.37, while the most recent distributions come with version 1.34 at best. Once in a while we will need a library not included in older releases (for example Boost.MPI). For "header-only" libraries the situation is not too bad because the header files can be downloaded and they ready to use. For binary libraries we need to compile them. There are tools to compile specific libraries but it will be easier for the moment to compile the whole Boost set.

Below are the instructions for version 1.37/8 or 1.39. An alternative build instructions (including Windows system) is provided here.

Also if you need Boost.MPI install the mpi compiler before continuing.

Versions 1.37 and 1.38

Before going thorugh these instructions be aware that if you need these versions of boost in your computer you can check whether your distribution already ships with them. For example Ubuntu 9.10 ships with Boost 1.38.1 and it can be installed just by doing

 sudo apt-get install libboost-dev

which doesn't include the mpi libraries by default.

If the aforementioned instructions doesn't work or if you need Boost.MPI proceed as follows:

As usual, to be kind with our usage in administrated system, we will install Boost in our user space, for example in $HOME/usr.

 mkdir $HOME/usr

the directories ~/usr/include/boost and ~/usr/lib will be created and populated after the building of Boost. Be brave and install the last current version from the link bellow:

 mkdir $HOME/soft
 cd $HOME/soft
 wget http://internap.dl.sourceforge.net/sourceforge/boost/boost_1_37_0.tar.gz

If the download fails, try to download the last version from Sourceforge Boost download page.

Then decompress it:

 tar -zxvf boost_1_37_0.tar.gz
 cd boost_1_37_0
 ./configure --prefix=$HOME/usr

Append the following quoted line to user-config.jam to include Boost.MPI among the installed libraries.

 echo "using mpi : mpicxx ;" >> user-config.jam

where mpicxx can be replaced by your mpi compiler wrapper, for example "/usr/lib/mpich/bin/mpiCC", or first selected interactively with "mpi-selector-menu" if is available. Then compile and install the package

 make
 make install

Go lunch, that takes ~30 minutes. After successful compilation and installation, all the header files will be installed in ~/usr/include/boost-1-37/boost/*.hpp and the binary linkable files will be in ~/usr/lib/libboost_*.

Depending on how you want to compile your programs it might be necessary to add these directories to the include path of compilation for example, LD_LIBRARY_PATH. It can also be useful to make a link to the more standard usr/include/boost/:

 cd ~/usr/include
 ln -s boost-1-37/boost boost

Versions 1.39 and up

You can download an specific version of Boost (To see what is the last version of Boost go to http://sourceforge.net/projects/boost/files/boost)

BOOST_VER=1_44_0
mkdir -p $HOME/usr
mkdir -p $HOME/soft
cd $HOME/soft
wget http://downloads.sourceforge.net/boost/boost_$BOOST_VER.tar.gz
tar -zxvf boost_$BOOST_VER.tar.gz
cd boost_$BOOST_VER

or you can use the SVN to download (and update) the latest version

mkdir -p $HOME/usr
mkdir -p $HOME/soft/boost_svn
cd $HOME/soft/boost_svn
svn co http://svn.boost.org/svn/boost/trunk
cd trunk
grep BOOST_LIB_VERSION boost/version.hpp #will show the nominal version of the lib

Then execute

./bootstrap.sh --prefix=$HOME/usr --libdir=$HOME/usr/lib$CLUSTER
#if you need mpi, replace "mpicxx" by your mpi compiler wrapper (e.g. mpicxx.mpich, or /opt/mvapich-gnu-gen2-0.9.9/bin/mpicxx)
echo "using mpi : mpic++ ;" >> project-config.jam 
NUM_CORES=`cat /proc/cpuinfo | grep processor | wc -l`
time ./bjam -j $NUM_CORES cxxflags=-fPIC | tee bjam.out
./bjam install --prefix=$HOME/usr --libdir=$HOME/usr/lib$CLUSTER

It takes ~30 minutes (or ~15 minutes in two processors or ~3 minutes in 8 processors).

Then create the following link (not necessary in version >=1.40):

cd ~/usr/include
ln -s boost-1_39/boost boost

Boost 1.40 can be installed in Ubuntu 9.10 with the line

sudo aptitude install libboost1.40-all-dev

which includes Boost MPI (compiled with openMPI).

Add the following variable to your environment

export LD_LIBRARY_PATH=$HOME/usr/lib

Possible Issues

Boost compilation may require bzlib library installed (libz2-dev package in Ubuntu, if it is not available use the -sNO_COMPRESSION=1 as a bjam option), or complain about python headers (python-dev pack in Ubuntu)

Intel compiler for Boost.MPI

The following steps allows us to compile boost-mpi 1.42.0 library using intel compiler (on su-ahpcrc).

tar zxvf boost_1_42_0.tar.gz
cd boost_1_42_0
./bootstrap.sh --with-toolset=intel-linux --with-libraries=mpi --prefix=$HOME/usr-intel --libdir=$HOME/usr-intel/lib
echo "using mpi : mpicxx ;" >> project-config.jam
vi tools/build/v2/tools/mpi.jam
time ./bjam -j $NUM_CORES --with-mpi toolset=intel | tee bjam.out
./bjam install --prefix=$HOME/usr-intel

In the line with command vi, we need to modify the mpi.jam file (at line 208) to something like:

   if $(otherflags) {
      for unknown in $(unknown-features)
      {
        result += "$(unknown)$(otherflags:J= )" ;
      }
   }

The line in the original mpi.jam file reads:

       result += "$(unknown)$(otherflags)" ;

This bug fix is discussed at here.

Also see here.

bjam "-sINTEL_PATH=/opt/intel/cce/10.0.014/" -sTOOLS=intel-linux -sINTEL_CC="icc -xT -O3 -no-prec-div " -sINTEL_CXX="icc -xT -O3 -no-prec-div" "-sBUILD=release <debug-symbols>on" >& bjam-build-icc-xT-O3-no-prec-div.log &

Remove Boost

To remove an old or incomplete installation

rm -rf ~/usr/lib/libboost_* ~/usr/include/boost*

Boost.Array Library

Boost.Array is one of the simplest Boost libraries yet it solves a common problem in C++. That is that C++ fixed size arrays are not objects but simple pointers whose size (number of elements) cannot be passed to functions unless we add an argument to the function with the number of elements.

double a[] = {1,2,3}; double* b=0; // Typical C-arrays
double f(double* c, int c_size){ ... }  // Typical function taking C-array by reference

To alleviate the problem the Array library defines a class called boost::array<T,N> where T is the type of the elements and N is the (fixed) size of elements.

The syntax is very easy but it can be tricky sometimes

 boost::array<double, 3> a;
 a[0]=1;
 a[1]=2; 
 a[2]=3;

They can be used as other STL-containers:

 assert(a.size()==3);
 for(double* i=a.begin(); i!=a.end(); ++i) std::cout<<*i<<std::endl;

The corresponding (internal) mutable C-array can be accessed by the member function .c_array(). The member .data() is a constant C-array view. Arrays of the same size can be compared, swapped, copied, filled, etc (see detailed specification), things that can not be directly done with C-arrays.

Comparison with other containers

std::vector<T> in some sense plays a similar role and it has the same memory layout (stored elements are contiguous in memory). But in the case of std::vector the size is not set at compile time but it is dynamic. This extra flexibility comes at a cost. First, there is a (small) overhead in std::vector. Second, sometimes we want to express that some array has a constant size.

Programmatically, static information of this type can be accessed with ::static_size and ::value type (http://codepad.org/gOnoJebu).

Some technical differences with other containers: 1) initial elements may have an undetermined value even if they have default constructor 2) there is no efficient swap() (no constant complexity) 3) no allocator support.

Initialization

Canonical

There not much else to say about this library. The only tricky part is how to construct and define a boost::array 'in place'. To do so we still have to use the old literal C++-fixed arrays and put a double curly bracket:

 boost::array<double, 3> a={{1,2,3}};

Let's say we have a function 'f' that takes a boost::array<int,3> and we want to call the function but do not want to create an explicit temporary, in this case some intuitive syntaxes don't work

 f({{1,2,3}}); //doesn't work
 f(boost::array<int, 3>(1,2,3)) //doesn't work
 f(boost::array<int, 3>({1,2,3})) //doesn't work
 f(boost::array<int, 3>({{1,2,3}})) //doesn't work

what eventually works is the following (see http://codepad.org/IEHmUEmu)

 f((boost::array<int,3>){{1,2,3}}); //does work for some compilers, note the parenthesis

This trick is undocumented and it works in modern versions of the compilers, for example in g++ 4.3.3. For old compilers, a named variable has to be created:

 boost::array<int,3> tmp={{1,2,3}};
 f(tmp);

Another problem is that if we initialize the array with less elements than necessary the compiler will not complain.

 boost::array<int,3> a={{1,2}}; // a[2] may be initialized to zero without warning

Any C-array can be used to be copied to a boost::array but that requieres an additional line of code and it is not strictly an initialization.

boost::array<int, N> a;
std::copy(carr, carr+N, a.begin());

Besides the tricky syntax for initialization, these objects are useful to explicitly say that some arrays are not just pointers and that they have a size that is determined during compilation and does not depend on the state of the program. Also we can forget about the problems associated with built-in fixed arrays and pointer errors.

Using Boost.Assign

Alternatively a library called Boost.Assign can be used

#include<boost/assign/list_of.hpp>
namespace assign = boost::assign;
boost::array<double,3> a=assing::list_of(1.)(2.)(3.);

Boost.Assign works with boost::array, but it is also more general. The same method can be used to initialize other containers, such std::vector, std::list, or std::set

using boost::assing::list_of;
const std::list<int> primes = list_of(2)(3)(5)(7)(11);

It is one of the few ways to initialize constant (const) containers.

std::maps can by initialized with map_list_of(key1,value1)(key2,value2)...(keyN,valueN), containers of tuples can be initialized with tuple_list_of.

Although not strickly connected with the problem of initialization the library can also be used to assign values to containers very easily. Without the library we usually need something like

std::vector<int> a(10);
a[0]=1;
a[1]=2;
...
a[9]=10;

With the library this long code can be replaced by

#include <boost/assign/std/vector.hpp>
#include <boost/assert.hpp>

using namespace boost::assign; //brings operator+= for std::vector into scope
vector<int> v;
v += 1,2,3,repeat(10,4),5,6,7,8,9;
// v = [1,2,3,4,4,4,4,4,4,4,4,4,4,5,6,7,8,9]

(This doesn't work with boost::array, because arrays can not grow dynamically).

Replacing raw c-arrays

There are certain advantages in using boost::array instead of plain c-array. For example, boost::array are first class objects, they can be duplicated, or copied, and there is no complications with pointers. If you ever programmed in C you will remember how easily c-arrays decay into pointers, loosing track of its lengths, and how much code is then devoted to take care of the pointed elements. Boost.Array also provides range (out of bound) checking in debug mode. Also, boost::array can be used where other STL containers and algorithms can be used.

Then, a question arises, if have a lot of code that uses plain old c-arrays, for example, in function calls

void f(unsigned size, const double c[])

how do we pass a boost array a. The answer is that the old function can be called in three different ways:

f(a.size(), a.data()); //gives a pointer to const element
f(a.size(), a.c_array()); //gives a raw pointer whose pointee can be modified
f(a.size(), &a[0]); //but it is not clear whether a elements can be modified in the

The other way around, in which we write a new function that takes advantage of boost::array

g(boost::array<double, 3>& a);

but the code is handling c-arrays from the caller side. Then one possibility is to

double c[3];
...
boost::array<double, 3> const& c_as_barray = *(boost::array<double, 3>*)c; // = (boost::array<double, 3>&)c; may not work
g(c_as_barray);

This trick can be used even if the function modifies the argument, modifications are reflected in the original array.

The function can be generalised to different sizes by using templates

template<unsigned N>
g(boost::array<double, N>& a){ ... }

and calling g<3>(a) . If the size is not known at the time of coding (because the size is a variable) then boost::array is not the right container, you can use std::vector instead (part of STL).

g(std::vector<double>& a) ...

std::vector should be used when the size of the array is only known at runtime, otherwise use boost::array.

Multidimensional arrays via Boost.Array

Another feauture of boost::array is that array of arrays (of arrays...) can be defined, and intialized with its elements.

using boost::array;
array<array<double, 2>,3> a={{
 {{1,2}},
  {{3,4}},
  {{5,6}}
}};

then a[0..2][0..1] can be used. (Most brackets could be removed in GCC 4.). This should replace the c-array

double ca[3][2] = {1,2,3,4,5,6};

In both cases the elements will be continuous in memory.

(A similar effect also can be achieved with vector<vector<double> >, the important difference is that in this case each vector can have a different size and the elements are not contiguous.)

A shortcut can be defined with template typedefs:

struct array2<class T, unsigned D1, unsigned D2>{
  typedef array< array<T, D2>, D1> type;
  ... //optional metainfo
};
...
array2<double, 2, 3>::type a;

See example code here http://codepad.org/Gp2AOP92.

Using boost::array in general is slightly better than using c-arrays because boost::array can never be confused with a pointer. For better support of multidimensional arrays (or for dynamic sizing) read the following section.

Boost.MultiArray Library

The MultiArray is a library solves a very annoying problem with C/C++ for our area. C/C++ have a very limited support for arrays and the situation is even worst for multidimensional arrays (2, 3 or more indices). Passing arrays to functions is very error prone, and functions with endless parameters with array dimensions must be continuously passed around. Moreover, writing algorithms for arbitrary array dimensionality is very hard. For one-dimensional arrays we have Boost.Array, std::vector and std::valarray. For multidimensional arrays we have Boost.MultiArray.

The C++ (or C) language was designed to be very flexible, there are infinite ways to arrange multidimensional arrays in memory and infinite ways to resize them, so the C++ language did not enforce any particular standard. Yet for scientific computing (which is only a subset of all computing areas), we can agree that there are a few multidimensional arrangements that make sense in numerical problems: namely, contiguous blocks memory with either column-major ordering (Fortran-convention) or row-major ordering (C-convention).

Among this reasonable constrains of the memory layout, MultiArray provides a very flexible, very small and nice interface. The official MultiArray manual provides rather simple but useful examples to begin with. Below there are more examples more suitable for existing numerical programs.

From the simple examples it can be noticed MultiArray usage is very similar to the default C-arrays with few very convenient differences 1) no explicit allocation (ie. malloc/new) is needed 2) no explicit deallocation (free/delete[]) 3) indices ordering can be C-like or Fortran-like (useful for interfacing with Fortran routines) 4) Arrays keep track of their shape (no separate variables with size are needed).

The typing system of MultiArray is quite complicated and can be confusing, one way to alleviate this problem is to make extensive use of typedefs. (You may seldom benefit from using Template typedefs.)

Basic Usage (`multi_array`)

MultiArray is a header-only library -- no special linking is necessary. You just need to include the header file:

#include<boost/multi_array.hpp>

The usage of the multi_array class is pretty intuitive. First we declare and construct an variable of type multi_array, specifying its dimensionality (2) and its dimensions (100x100).

boost::multi_array<double, 2> A(boost::extents[10][10]);

The number of arguments of extents has to agree with the dimensionality. Then you can use the matrix as usual,

for(int i=0; i!=A.shape()[0]; ++i)
  for(int j=0; j!=A.shape()[1]; ++j)
    A[i][j]=i+j;

The elements can be accessed by a sequence of square brakets '[]' or passing a collection of integers with the parenthesis operator '()' as in this example example:

boost::array<int, 2> idxs={{1,2}};
A(idxs)+=1.2;

For an N-dimensional array we would need N similar loops. For any dimensionality, the elements can be iterated in its underlying memory representation:

for(double* it=A.origin(); it!=A.origin()+A.num_elements(); ++it)
  (*it)*=3;

In this example, the whole multiarray is multiplied by 3, regardless of its dimensionality. Note that num_elements()==shape()[0]*shape()[1]...shape()[N].

The dimensionality of a *particular array* can be known by using A.num_dimensions(). The dimensionality of the *type* can be known by using the static constant "::dimensionality".

 typedef boost::multi_array<double, 4> marray;
 marray A;
 assert( A.num_dimensions() == marray::dimensionality );

The official documentation and this paper have additional examples and some explanation of the library internals.

Using MultiArray on existing code together with C-arrays (`multi_array_ref`)

MultiArray can be adapted easily in programs that already use raw-arrays since the first element to the contiguous array in memory can be known for functions that expose the pointer to the array in memory; the shape of the array can be also accessed:

boost::multi_array<double, 2> A(boost::extents[10][10]); // allocates memory for 100 elements
...
double* A_ptr = A.origin();
int A_N0 = array.shape()[0]; // A_N0 <- 10
int A_N1 = array.shape()[1]; // A_N1 <- 10
...
// use the array in a typical c-function
somecfunction(A_N0, A_N1, A_ptr); // or as cfunction(A.shape()[0], A.shape()[1], A.origin() );
...
// .. no need to free memory, it is deallocated when A goes out of scope

Such function can even modify the elements of the array. In the example the array memory is managed by the multi_array A and therefore the memory is freed when A goes out of scope. Sometimes, for existing applications, this is not possible because the memory is managed in some other complicated way. In that case we can still take advantage of Boost.MultiArray by using the boost::multi_array_ref class, that is the same as a boost::array except that it does not manage the underlaying memory:

int A_N0 = 10; int A_N1=10;
double* A_arr = new double[N1*N2]; // or = malloc(sizeof(double)*A_N0*A_N1);
...
boost::multi_array_ref<double, 2> A_ref(A_arr, boost::extents[A_N0][A_N1]);
...
// access A_arr through A_ref or call a MultiArray aware function
somemultiarrayfunction(A_ref);
...
delete[] A_arr; //or free(A_arr); (do not use A_view after this point!)

In this way we can go back and forth form C-arrays (raw-arrays) to boost::multi_array. Generalizations to higher dimensions are similarly easy. (One of the strong points of MultiArray is that dimensionality can be arbitrary). Let me stress that in the first case the memory is automatically managed for us, while in the second case we have to take care of that by some external (even manual) process.

A simple FFTW example

In the context of numerical libraries the following is an example to call FFTW3 in combination with MultiArray. (Remember to Install FFTW3 first.)

boost::multi_array<std::complex<double>, 2> A(boost::extents[10][10]);
for(int i=0; i!=10; i++)
  for(int j=0; j!=10; j++)
    A[i][j]= std::complex<double>(i,j);
...
fftw_plan p=fftw_plan_dft_2d(A.shape()[0], A.shape()[1], A.origin(), A.origin(), FFTW_FORWARD, FFTW_ESTIMATE); // in place
fftw_execute(p); 
fftw_destroy_plan(p);

Advanced interface of FFTW take stride arguments as well, this is not a limitation for MultiArray since, for example, array_view's (see below) also keep track of internal strides (e.g. A_center.stride()[n] is the stride in direction n of the referenced memory).

Generic code, but not too generic (Boost.Utility enable_if and Boost.TypeTraits)

As soon as we start working with MultiArray we will recognize than there are several kinds of multiarrays: 1) multi_array, 2) multi_array_ref, 3) const_multi_array_ref and in general they are useful in different contexts. The first is the plain object that manages its own memory, the second is brings read write access to an existing C-array and the last one gives read-only access. On top of that when specifying partial indices (e.g. a[3] in a 2-dimensional array) the type is 4) subarray, and if we do this is a const_ version we get a 5) const_subarray. And all these types are formally different. This complicated type system has the drawback that there is no way to write a generic function that takes any of them unless we use templates.

So instead of repeating 5 versions of some function

return_type some_function(multi_array<double, 2>& ma){ ...
return_type some_function(const_multi_array<double,2>& ma){ ...
return_type some_function(multi_array_ref<double,2>& ma){ ...
return_type some_function(subarray<double,2>& ma){ ...

we write a single template function

template<class MultiArray>
return_type some_function(MultiArray& ma){ ...

return_type is just to indicate the return type of the function, it could be for example void. When calling this function it will work with any of the mentioned types. This solves the problem of the multiple types as long as the actual deduced type behaves like a multi_array the program will work. The problem is that now the function is too generic. In our first five versions of the function we were specific enough to only work with arrays of doubles but now we lost that constrain and it maybe that the function logically shouldn't work for not-real numbers.

For example the some_function could rely on comparing (ordering) different elements, if that is the case then when using the function with an array of complex<double> the compiler will complain. This not an ideal solution because the error will usually appear in mysterious places. Or it can be more subtle than that the function may rely and we may not even get an error from the compiler but silently have a bug.

Here is were enable_if (part of Boost.Utility) allows us to constraint function arguments is some situations. What enable_if does when within the function declaration/specification (see later) is to disable or enable a particular instantiation of the template function depending on some condition. For example this condition can be that the elements of the array is a double (an nothing else). There are several ways to use enable_if, but there are mainly two for functions definitions (the example described here). Continuing with the example, we can

#include <boost/utility/enable_if.hpp>
#include "boost/type_traits.hpp" // for is_same
using boost::enable_if_c;
using boost::is_same

1) Replace the return type (return_type) with an enable_if

template<typename MultiArray> 
typename enable_if_c<is_same<typename MultiArray::element, double>::value, 
 return_type>::type return_type some_function(MultiArray& ma){ ...

This is the preferable syntax since the condition is right before the function name. Another possibility is to

2) Replace and input parameter with an enable_if that on success gives the type of that original parameter

template<typename MultiArray>
return_type some_function(
  typename enable_if_c<is_same<typename MultiArray::element, double>::type,
  MultiArray >::type & ma){ ...

This can be also preferable because the constrain on the generic type MultiArray is next to the parameter itself.

Option 1) is not always applicable: some special functions don't have return types (i.e. class constructors), in this case we can

3) Add a dummy optional parameter

template<typename MultiArray>
return_type some_function(
  MultiArray& ma, typename enable_if_c<is_same<typename MultiArray::element, double>::value, void*>::type = 0
){ ...

The '=0' makes the last parameter optional which means that we can still use the function by passing the usual parameter. This optional parameter is not meant to be used at all during the function call.

In the three cases the effect is the same, function is invisible if we try to call it with a kind of multi_array whose elements are not doubles.

boost::multi_array<double, 2> adbl;
some_function(adbl); //ok, some_function will be called
boost::multi_array<std::complex<double>, 2> acmplx;
some_function(acmplx); // compiler error, no such function (i.e. not for std::complex arrays)

In the example we constrained the multiarray by its elements type and relied in the the function is_same from Boost.TypeTraits library. is_same<T1,T2>::value is (a constant) true if T1 is identical to T2 and (the constant) false otherwise. Of course the value is obtained at compilation time.

We could have constrained the function to take only arrays with certain dimensionality, is that case we could also have used enable_if_c<MultiArray::dimensionality==2, ...>::type as an improvement over more crude approaches like compile-time checking BOOST_STATIC_ASSERT(MultiArray::dimensionality==2).

One difference with respect to BOOST_STATIC_ASSERT is that when using enable_if the kind of error given by the compiler is the usual 'no matching function'. In short, use static assert to force the compiler to issue an error directly if the type is not correct but use enable_if if you want the compiler to try with a different (overloaded) function and if this fails then issue a function not found error.

The reason that makes enable_if work is too complicated to explain here, but I only will mention that it is because of a feature of C++ called SFINAE[1] (an acronym that doesn't seem to describe what it's supposed to describe). Luckly nowadays most compilers support this feature. I admit that the syntax becomes so complicated that I wouldn't call it a feature but an embarrassment of the language. Another rough corner of C++.

Fortran ordering and Fortran functions (`fortran_storage_order`)

Functions in libraries like Lapack expect arrays in column-major storage order format, this option of storage is supported by Boost.MultiArray. (For linear algebra matrices it is more recommendable to use Boost.uBlas which is more oriented to linear algebra operations and one or two dimensional arrays, i.e. vectors and matrices.)

The ordering of the elements in memory is controlled by an option that can be set when declaring the array. (By default, if not specified, boost::c_storage_order is used.)

boost::multi_array<double,2> A(boost::extents[10][10], boost::fortran_storage_order() );
... // assign A value with Fortran index ordering

In order to imitate Fortran convention, there is even an option to declare the multi_array to be 1-based (instead of 0-based) (see below).

We then can call a Fortran function as:

 int A_N1=A.shape()[0]; int A_N2=A.shape()[1];
 fortran_function(&A_N1, &A_N2, A.origin()); // Fortran arguments are passed by reference(pointer)

or write a wrapper to the Fortran function

 wrapper_function(boost::multi_array<double, 2>& in){
   int in_N1=in.shape()[0]; int in_N2=A.shape()[1];
   fortran_function(&in_N1, &in_N2, in.origin()); // Fortran arguments are passed by reference(pointer)
 }

and then call it simply by doing 'wrapper_function(A);'.

Calling a Fortran function that takes stride arguments can be done by means of subarray's and this case is not covered here.

The difference between c_storage_order and fortran_storage_order is illustrated in this paste.

Lapack examples

This section shows how to combine fortran_storage_ordering, ::shape() and ::origin() in order to call Fortran routines. There are better ways to use linear algebra routines from C++, like Boost.uBlas, this section is intended to show the usage of the multi_array.

Note that the Lapack routines are called twice, this is because the first call is used to estimate the working space, which is allocated manually. The real work is done by the second call. Lapack functions are declared using the 'extern "C" void' prefix.

Singular Value Decomposition

The design of the SVD class for decomposing complex matrices follows the one from Numerical Recipe 3rd edition (page 67) and the actual lapack calling is taken from this example. The following is the complete program, note the use of fortran_storage_order in order to maintain the fortran convention. The decomposition (hard work) is called from the constructor of the lapack::svd class, then the decomposition matrices are the member multi_arrays, u (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{U}} ), vh (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{V}^\dagger} ) (2-dimensional) and w ( $\mathbf {w}$ ) (1-dimensional vector) ( $\mathbf {A} =\mathbf {U} \cdot diag(\mathbf {w} )\cdot \mathbf {V} ^{\dagger }$ ). The contents of the original matrix are destroyed.

#if 0
	c++ -I$HOME/usr/include -D_TEST $0 -L$HOME/usr/lib -llapack -lptf77blas -latlas -lgfortran -lpthread -Wfatal-errors && ./a.out
	exit
#endif
#include<boost/multi_array.hpp>
#include<iostream>
/* ZGESVD prototype */
struct _dcomplex{double re, im;};
extern "C" void zgesvd_( char* jobu, char* jobvt, int* m, int* n, _dcomplex* a,
               int* lda, double* s, _dcomplex* u, int* ldu, _dcomplex* vt, int* ldvt,
               _dcomplex* work, int* lwork, double* rwork, int* info );

namespace lapack{
	using boost::multi_array;
	using boost::extents;
	class svd{
		public:
		multi_array<std::complex<double>, 2> u;
		multi_array<std::complex<double>, 2> vh;
		multi_array<double, 1> w;
		svd(multi_array<std::complex<double>, 2>& a) : 
		u (extents[a.shape()[0]][a.shape()[0]], boost::fortran_storage_order()), 
		vh(extents[a.shape()[1]][a.shape()[1]], boost::fortran_storage_order()),
		w (extents[std::min(a.shape()[0], a.shape()[1])]){
		int lwork = -1;
		std::complex<double> wkopt;
		std::vector<double> rwork(std::max(1u, 5*std::min(a.shape()[0], a.shape()[1])));
		int info=0;
		char All='A';
		zgesvd_(&All, &All, &(int&)a.shape()[0], &(int&)a.shape()[1], 
		  (_dcomplex*)a.origin(), &(int&)a.shape()[0], 
		  w.origin(), (_dcomplex*)u.origin(), &(int&)a.shape()[0], 
		  (_dcomplex*)vh.origin(), &(int&)a.shape()[1], 
		  &(_dcomplex&)wkopt, &lwork, &rwork[0], &info);
		  lwork = (int)wkopt.real();
		  std::vector<std::complex<double> > work(lwork);
		zgesvd_(&All, &All, &(int&)a.shape()[0], &(int&)a.shape()[1], 
		  (_dcomplex*)a.origin(), &(int&)a.shape()[0], 
		  w.origin(), (_dcomplex*)u.origin(), &(int&)a.shape()[0], 
		  (_dcomplex*)vh.origin(), &(int&)a.shape()[1], 
		  &(_dcomplex&)work[0], &lwork, &rwork[0], &info);
		assert(info<1);
	}
};
}

int main(){
	boost::multi_array<std::complex<double>,2 > a(boost::extents[2][2], boost::fortran_storage_order() );

	double data[2][2][2]={
           {{1., 0.}, {1., 0}},
           {{2., 0.}, {2., 0}}
          };
	a.assign((std::complex<double>*)data, (std::complex<double>*)data + a.num_elements());
	lapack::svd uvh(ma);
	std::clog
		<<"u   = {{"<<uvh.u[0][0]<<","<<uvh.u[0][1]<<"},\n"
	 	<<"       {"<<uvh.u[1][0]<<","<<uvh.u[1][1]<<"}}\n";
	std::clog
		<<"v^h = {{"<<uvh.vh[0][0]<<","<<uvh.vh[0][1]<<"\n"
	 	<<"       {"<<uvh.vh[1][0]<<","<<uvh.vh[1][1]<<"}}\n";
	std::clog
		<<"w   = {{"<<uvh.w[0]<<",0},\n"
		<<"       {0,"<<uvh.w[1]<<"}}\n";
	/*output
	u   = {{(-0.707107,0),(-0.707107,0)},
     	  {(-0.707107,0),(0.707107,0)}}
	v^h = {{(-0.447214,0),(-0.894427,0)
    	  {(0.894427,-0),(-0.447214,0)}}
	w   = {{3.16228,0},
    	  {0,1.98603e-16}}
	*/
	return 0;
}

QR Decomposition

Complex QR decomposition is performed in two stages, first the matrix R (upper triangular) and an intermediate representation of Q (unitary Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{Q}\mathbf{Q}^\dagger=\mathbf{1}} ) is obtained via Lapack ZGEQRF, second the proper Q is calculated with ZUNGQR. The design of the qrd class follows that of Numerical Recipes. After the decomposition the members q and r contains the respective decomposition (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{A} = \mathbf{Q}\mathbf{R}} , no conjugate, no transposition). During construction, it is checked that the multi_array has fortran ordering (the algorithm doesn't give QR decomposition for c-ordering). Output arrays also have fortran ordering.

#if 0
	cp $0 $0.cpp
	g++ -Wl,-rpath=$HOME/usr/lib -I$HOME/usr/include -I.. -D_TEST $0.cpp -o $0.cpp.x -L$HOME/usr/lib -llapack -lptf77blas -latlas -lgfortran -lpthread -lboost_system -Wfatal-errors && ./$0.cpp.x
	rm -f $0.cpp
	exit
#endif
#include<boost/multi_array.hpp>
#include<iostream>
#include<mathematica/archive.hpp>
/* ZGEQRF and ZUNGQR prototype */
struct _dcomplex{double re, im;};
extern "C" void zgeqrf_(int* m, int* n,         _dcomplex* a, int* lda, _dcomplex* tau, _dcomplex* work, int* lwork, int* info);
extern "C" void zungqr_(int* m, int* n, int* k, _dcomplex* a, int* lda, _dcomplex* tau, _dcomplex* work, int* lwork, int* info);
	namespace lapack{
		using boost::multi_array;
		using boost::extents;
		class qrd{
			public:
			multi_array<std::complex<double>, 2> q;
			multi_array<std::complex<double>, 2> r;
			multi_array<double, 1> w;
			qrd(multi_array<std::complex<double>, 2> const& a) : // q.r = a (a is destroyed after the fact, no transpose no conjugate)
				q (extents[a.shape()[0]][a.shape()[1]], boost::fortran_storage_order()), 
				r (extents[a.shape()[0]][a.shape()[1]], boost::fortran_storage_order()){
				assert(a.storage_order() == boost::fortran_storage_order() );
				std::vector<std::complex<double> > tau(std::min(a.shape()[0],a.shape()[1]));
				int m=a.shape()[0];
				int lda=std::max(1u,a.shape()[0]);
				_dcomplex* tau_=(_dcomplex*)&tau[0];
				_dcomplex work_size={0,0};			
				_dcomplex* a_=(_dcomplex*)a.origin();
				{
					int n=a.shape()[1];
					int lwork=-1;
					int info=0;
					//see http://www.intel.com/software/products/mkl/docs/webhelp/lse/functn_geqrf.html 
			 		zgeqrf_(&m, &n, a_, &lda, tau_, &work_size, &lwork, &info);
					assert(info==0);
					lwork=(int)work_size.re;
					std::vector<std::complex<double> > work(lwork);
					_dcomplex* work_=(_dcomplex*)&work[0];
					zgeqrf_(&m, &n,	a_, &lda, tau_, work_, &lwork, &info);
					assert(info==0);
					r=a;
					for(unsigned i=0; i!=r.shape()[0]; ++i)
						for(unsigned j=0; j!=std::min(i, r.shape()[1]); ++j)
							r[i][j]=0.;
				}
				{
					int n=std::min(a.shape()[1], a.shape()[0]);
					int k=n;
					int lwork = -1;
					int info=0;
					//see http://www.intel.com/software/products/mkl/docs/webhelp/lse/functn_ungqr.html
					zungqr_(&m, &n, &k, a_, &lda, tau_, &work_size, &lwork, &info);
					assert(info==0);
					lwork=(int)work_size.re;
					std::vector<std::complex<double> > work(lwork);
					_dcomplex* work_=(_dcomplex*)&work[0];
					zungqr_(&m, &n, &k, a_, &lda, tau_, work_, &lwork, &info);
					assert(info==0);
					q=a;
					q.resize(extents[a.shape()[0]][a.shape()[0]]); //exploits the fact that the extra elements are set to zero
				}
		   }
		};
	}
#ifdef _TEST
int main(){
	boost::multi_array<std::complex<double>,2 > a(boost::extents[3][3], boost::fortran_storage_order() );
	mathematica::oarchive oa(std::cout);
	double c_data[3][3][2]={
   	{{11., 0.}, {21., 0.},{32.,0}},
   	{{31., 0.}, {41., 1.},{21.,0}},
   	{{20., 1.}, {32., 2.},{45.,1.}}
   };
	boost::multi_array<std::complex<double>,2 > data(boost::extents[3][3], boost::c_storage_order() );
   data.assign((std::complex<double>*)c_data, (std::complex<double>*)c_data+data.num_elements());
   a=data;
   oa<<"a = "<<a;
	lapack::qrd qr(a);
   oa<<"q = "<<qr.q;
   oa<<"r = "<<qr.r;
	return 0;
}
#endif

Orthogonalization via Cholesky Decomposition

Given a set n linearly vectors v (stored as rows in V) a new set of n vectors w (stored in the rows of W) can be generated by the transformation

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{W} = {\mathbf{U}^\dagger}^{-1} \mathbf{V} }

where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{U}} is the upper triangular Cholesky decomposition of the square matrix Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{O}}

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{U}^\dagger \mathbf{U} = \mathbf{O} }

in its turn Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{O}} is the array of overlaps.

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{O} = \mathbf{V} \mathbf{V}^\dagger }

The result is that

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{W}\mathbf{W}^\dagger = \mathbf{U}^{-1}\mathbf{V}\mathbf{V}^\dagger{\mathbf{U}^{-1}}^\dagger = \mathbf{U}^{-1}\mathbf{O}{\mathbf{U}^{-1}}^\dagger = \mathbf{U}^{-1}\mathbf{U}\mathbf{U}^\dagger{\mathbf{U}^{-1}}^\dagger = \mathbf{1} } , i.e. Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{W}} is an orthonormal set, as long as Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{V}} rows are linearly independent.

The array Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{O}} is the Hermitian square of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{V}} and can be obtained with Lapack ZHERK, the Cholesky decomposition (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{U}} ) is obtained via Lapack ZPOTR, finally the orthonormal (actually unitary because the implementation is for complex numbers) set is obtained with the solution of

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{U}^\dagger \mathbf{W} = \mathbf{V} }

given by ZTRSM.

The previous examples (SV and QR decompositions) only work properly for fortran_storage_order (if the order is c_storage_order the functions don't run). In contrast this implementation of orthogonalize works for both fortran and c-ordering and is this feature what makes the code complicated. (A more complicated code can even achive column orthogonalization also). In both cases the result is a set of orthonormal rows. (The original matrix is destroyed). The function uses a temporary square array with order equal to the number of rows in the input. Other operations are in-place and the output replaces the input.

namespace lapack{
	void orthogonalize_rows(multi_array<std::complex<double>,2>& v){
		multi_array<std::complex<double>,2> c(extents[v.shape()[0]][v.shape()[0]], boost::fortran_storage_order() );
		{// hermitian_square
			char uplo=(v.storage_order()==boost::fortran_storage_order())?'U':'L';
			char trans=(v.storage_order()==boost::fortran_storage_order())?'N':'C';
			int n=((trans=='N') or (trans=='n'))?v.shape()[0]:v.shape()[1];
			int k=((trans=='N') or (trans=='n'))?v.shape()[1]:v.shape()[0];
			if(not(v.storage_order()==boost::fortran_storage_order())) std::swap(n,k);
			double alpha=1.;
			_dcomplex* a_=(_dcomplex*)v.origin();
			int lda=((trans=='N') or (trans=='n'))?std::max(1, n):std::max(1, k);
			double beta=0.;
			_dcomplex* c_=(_dcomplex*)c.origin();
			int ldc=std::max(1, n);
			zherk_(&uplo, &trans, &n, &k, &alpha, a_, &lda, &beta, c_, &ldc);
		}
		{//cholesky decomposition
			char uplo=(v.storage_order()==boost::fortran_storage_order())?'U':'L';
			int n=c.shape()[0];
			_dcomplex* a_ = (_dcomplex*)c.origin();
			int lda=std::max(1, n);
			int info=0;
			zpotrf_(&uplo, &n, a_, &lda, &info);
			if(info>0) throw std::domain_error("matrix is not positive definite");
		}
		{//inverse of cholesky applied to original array via solution of hermitian system
			char side=(v.storage_order()==boost::fortran_storage_order())?'L':'R';
			char uplo=(v.storage_order()==boost::fortran_storage_order())?'U':'L';
			char transa='C';
			char diag='N';
			int m=v.shape()[0];
			int n=v.shape()[1];
			if(not(v.storage_order()==boost::fortran_storage_order())) std::swap(n,m);
			double alpha=1.;
			_dcomplex* a_=(_dcomplex*)c.origin();
			int lda=(side=='L' or side=='l')?std::max(1,m):std::max(1,n);
			_dcomplex* b_=(_dcomplex*)v.origin();
			int ldb=std::max(1,m);
	 		ztrsm_(&side, &uplo, &transa, &diag, &m, &n, &alpha, a_, &lda, b_, &ldb);
		}
		//now v is orthonormal (input is overwritten with orthonormal set)
	}catch(std::domain_error& e){
		throw std::domain_error("can not orthogonalize because vectors are not linearly independent");
	}
}

The code can be used as follows,

multi_array<std::complex<double>,2> v_fortr(extents[2][3], boost::fortran_storage_order() );
multi_array<std::complex<double>,2> v_clang(extents[2][3], boost::c_storage_order());
... fill v with values ...

both arrays formally contain 2 vectors of 3 components.

lapack::orthogonalize_rows(a_fortr);
lapack::orthogonalize_rows(a_clang);

in both cases the answer is the same, the only difference is how the numbers are arranged internally in memory. The trick is achieved by passing different options to the Lapack routines depending on the known ordering instead of explicitly transposing elements --which requieres copying the full arrays-- to accomodate for one or the other ordering).

(The algorithm throws if the input is not a linearly independent set, this is detected by the Cholesky decomposition, sometimes the lack of linear dependence is not detected but the routine returns vectors with almost zero components --in particular not normalized --).

As a reference, if the input is linearly independent, the output is numerically similar to Mathematica's Orthogonalize[A], except that Mathematica accepts linearly dependent input (giving zero vectors in the output).

Arbitrary ordering (and direction)

For 2 dimensional arrays the ordering can be either fortran or C-type, for higher dimensionality there are other possibilities. Sometimes we may need to work with some arbitrary ordering, for example some 3D FFT routines do effectively transpose the first and second index in order to optimize for speed. Instead of taking care of this transposition (either by actually performing the transposition in memory or by adding some ad-hoc logic to the program), it might be convenient to just declare a different memory storage:

int ordering[] = {1,0,2};
bool ascending[] = {true,true,true}; // we can choose to completely invert the memory order by 'false, false, false'

boost::multi_array A(extents[3][4][2], boost::general_storage_order<3>(ordering,ascending));

This also works for multi_array_ref

boost::multi_array_ref Aref(ptr, extents[3][4][2], boost::general_storage_order<3>(ordering,ascending));

In this case we can treat A and Aref with the conventional ordering of the index even if some external routine is transposing the internal memory layout.

The storage and ascending order can be accessed a posteriori by storage_order(), for example we can ensure fortran-ordering by doing

assert( a.storage_order()==boost::fortran_storage_order());

We can check that all dimensions are in ascending order (independently of fortran or c-ordering) by

assert( a.storage_order().all_dims_ascending() == true);

or we can check that the second dimension (index) is ascending and that it is the slowest-index.

assert( a.storage_order().ascending(1)==true and a.storage_order().ordering(1)==0 );

finally we can check that two arrays follow the exact same convention.

assert( a.storage_order() == b.storage_order() );

Assign from c-arrays

The assign method can be used to assign element from other containers (including c-arrays) into multi_array.

multi_array<double,2> a(extents[3][3]);
double a_data[2][2] =  //this is what I call a c-array
  {{1,2},
   {3,4}};
a.assign(a_data, a_data+4);

The trick can be used to assign complex elements too

multi_array<std::complex<double>, 2> b(extents[2][2]);
double b_data[2][2][2] =
  { { {1,0}, {2,0} },
    { {3,0}, {4,0} } };
b.assign( (std::complex<double>*)b_data, (std::complex<double>*)b_data+4);

(as a pointer to std::complex<double> b_data has effectivelly 4 elements). The trick exploits the fact that a complex number can be viewed as a struct with 2 double elements.

The ordering (X_storage_ordering) and the direction (asc/descending) does not affect how assign works. Assign fills the internal memory of the array independently of the storage convention. This is confusing because in the previous example if 'a' is declared as fortran_storage_order the transpose array is obtained.

A workaround for fortran_storage_order is to declare an intermediate array with c_storage_order and then copy the contents of one array into the other. Elements are copied element by element in the right order.

multi_array<double> d(extents[2][2], fortran_storage_order());
multi_array<double> d_corder(extents[2][2], c_storage_order());
d_coder.assign(a_data, a_data+4);
d = d_coder;

Custom index ranges (`reindex`)

In previous examples we used some obscure entity called boost::extents, it is obviously used to pass the valid range of indices to the matrix declaration. Technically extents is an (empty) global variable of type boost::extent_gen; this is just a trick to allow a nice syntax with arbitrary number of parameters, like passing extents[10][10][10][10] for a 4D-array.

The real power of this syntax is that we can really pass other types than just integers declaring the size in each dimension, namely we can declare ranges instead. For example, this will create a 1-based array of size 10x10.

 using boost::extents;
 using boost::range;
 boost::multi_array<double, 2> A(extents[range(1,11)][range(1,11)]);

In this case the elements A[0][0], A[0][1], A[1][0] are out-of-bounds, just like A[11][11]. Furthermore, we can create an array that is behaves completely like a Fortran-array by also indicating the ordering.

 boost::multi_array<double, 2> B_Flike(extents[range(1,11)][range(1,11)], boost::fortran_storage_order() );

Let also mention that we can reindex an array on the fly by calling the .reindex method:

 A.reindex(0); // now all dimensions are 0-based, valid elements 0..9 x 0..9
 ... // take advantage of 0-based array
 A.reindex(1); // 0-based index restored, valid elements 1..10 x 1..10

After reindexing the range of each dimension i is given by A.index_bases()[i] ... A.index_bases()[i]+A.shape()[i]. (Reindixing is just a new logical rearrangement of element referencing, no copies of the matrix or the elements is made.)

The indices can start an any arbitrary point (not just 0 or 1), even negative values -- in this example (taken from here), the indexing is totally arbitrary in each dimension:

 typedef boost::multi_array<double, 3> array3D;
 array3D A(extents[8][range(1,4)][range(-1,3)][range(10,20)]);

(Note that, as an argument, [8] is the same as [range(0,8)].)

It is also good to note that the memory allocated by an array is always located between A.origin() and A.origin()+A.num_elements() (non-inclusive).

Let's clarify the difference between A.data() and A.origin(). As we saw before origin() returns a pointer to the first element of the array, what ever this is. On the other hand, A.data() returns a pointer to the eventual location of A[0][0]..[0], regardless of whether that element is the first or not; such element may not even be part of the array (for example in 1-based arrays). In general we have to deal origin() and rarely with data(), but it is very easy to confuse one with the other.

Range Checking (and Boost.Assert)

Once the ranges for a multi_array are defined, the elements can be accessed using the square bracket operators. By default the library does check whether the indexes correspond to a valid element (index is not out of bounds). If it is not a valid element, the whole program stops with a failed assertion that looks like this:

./a.out: ... /boost/multi_array/base.hpp:136: 
Reference boost::detail::multi_array::value_accessor_n< ... >::access( ... ) const [with Reference =  
boost::detail::multi_array::sub_array< ... >, 1u>, ... , unsigned int NumDims = 2u]: 
Assertion `size_type(idx - index_bases[0]) < extents[0]' failed.

Thanks to the Boost.Assert facilities, this behavior can be suppressed (in which case there is not checking and overrun memory will be accessed) or changed to an C++ exception mode.

To disable the checking completely just #define BOOST_DISABLE_ASSERT before #including multi_array.hpp.

#define BOOST_DISABLE_ASSERT
#include<boost/multi_array.hpp>

The library will run faster because there is no range checking each time an element is accessed.

To change the behavior such that an exception is thrown if index is out of bounds, define the tag BOOST_ENABLE_ASSERT_HANDLER and the function void boost::assertion_failed(char const * expr, char const * function, char const * file, long line) before including the MultiArray library (http://codepad.org/JgXciACG):

#define BOOST_ENABLE_ASSERT_HANDLER
#include<stdexcept>
namespace boost{
  void assertion_failed(char const * expr, char const * function, char const * file, long line){
    if(string(expr)=="size_type(idx - index_bases[0]) < extents[0]"){
      throw std::range_error("multi_array index range error")");
    }else{
      throw std::runtime_error("generic multi_array error");
    }
  }
}
#include<boost/multi_array.hpp>

(Unfortunately the information about the actual index value that caused the error is not part of the information carried by the exception.) This change in behavior may not work if the program is linked to files already compiled without this definition, or if multi_array.hpp is included before this block is defined.

Now we are able to catch range errors:

multi_array<double, 3> a(extents[10][10][10]);
try{
  a[11][-1][20]=99;
}catch(std::range_error&){ ... }

Most Boost libraries use Boost.Assert to indicate errors. With this mechanisms users can choose the way to handle these errors. Using the exception version of Boost.Assert doesn't make the program run slower and it allow us to recover the program from some types of errors.

Boost.Assert should not be confused with Boost.StaticAssert.

Accessing elements

Elements of multiarrays can be accessed in two ways, by using operator[] in sequence (as in the above examples) or by using operator() with an array of indices as described here. This last form has the advantage that if a given combination of indexes needs to be used repeatedly then it can be defined only once. For example the following code applies a certain operation that depends on a complicated combination of indices.

for(int i=0; i!=a.shape()[0]; ++i){
  for(int j=0; j!=a.shape()[1]; ++j){
    for(int k=0; k!=a.shape()[2]; ++k){
      array<int, 3> idx={{i,j,k}}, idx2 = {{i+1==a.shape()[0]?0:i+1, j,k}};
      A(idx)  = f(idx) * B(idx2);
      A(idx) += g(idx) + C(idx2);  //f and g are functions defined elsewhere
    }
  }
}

Regular subarrays

One of the useful properties of raw-arrays is that *regular* subarrays can be defined simply by defining offsets and strides. MultiArray provides this kind of creation of subarray in a quite elegant manner. For example, given the array defined in the example above, the most simple subarray is the type of the object A[5] which is automatically a one-dimensional array containing a reference to the sixth (indices start at 0) row of the array, so for example (A[5])[1] = 0 changes the value of element A[5][1].

There are more complicated subarray views that can be defined in terms the original array. The interesting thing is that these are only 'views' of the original array, no copy of elements are made.

The basic examples are given in the official MultiArray tutorial. The following are some digested examples, which at the same time introduces 'typedef' statements to reduce verbosity. The following examples will create arrays from the original 2D array A:

typedef std::complex<double> complex;
typedef boost::multi_array<complex,2> array2d;
using boost::extents;

array2d A(extents[10][10]); // 2D, 100 allocated elements

Subarrays (`sub_array`)

These are simply references to the original array when the first n indexes are specified. The dimension of the resulting subarray is the number of dimensions of the original array minus n.
 
typedef array2d::subarray<1>::type subarray1d;
subarray1d A_row5 = A[5]; // 1D, 10 elements

Views (`array_view`)

typedef array2d::array_view<1>::type view1d;
typedef boost::multi_array_types::index_range range;
boost::multi_array_types::index_gen indices;

view1d A_row3_5to10 = A[3][indices[range(5,10)]]; // or A[indices[3][range(5,10)]
view1d A_row3_5to10odd = A[indices[3][range(5,10,2)];

typedef array2d::array_view<2>::type view2d;

view2d A_center = A[indices[range(2,7)][range(2,7)]]; // 2D, 25 elements
view2d A_upperleft_quadrant_oddonly = A[indices[range(0,5,2)][range(0,5,2)]]; // 2D, 4 (2x2) elements
view2d A_lowerright_quadrant_evenonly = A(range(5,10,2),range(5,10,2)); // 2D, 4 (2x2) elements

Last two arrays refer to odd and even elements of the original array.

Also note that something like

view1d A_error = A[indices[range(2,7)][range(2,7)]];'

is not allowed because the RHS is 2-dimensional and the LHS is declared to be 1-dimensional.

None of the arrays defined here make any copy of elements, remember that sub_arrays and array_view are just references, just another way to interpret (view) the data of original array.

Views are more general than subarrays. For example A[1] is an object of type subarray, which can be implicitly converted into a view. However A[1][indices[range(0,5)]] is an array_view object which can not be converted into a subarray.

Array views can be recursive, one can create an array view of an existing array view. For example, the even element of the even element view, is the 'every 4 elements' view of the original array.

Wrap view (code example) (`array_wrap_view`)

Most discrete Fourier transform routines (e.g. FFTW3 and Numerical Recipes) really compute the so-called 'in-order' transform. This means that the 'positive' frequencies are stored in the first half of the output and the 'negative' frequencies are stored in the second half of the output.

If the input data represents a continuous peridic function then applying operations on the output requires special code to treat the first and second half of the output data (quadrants in 2 dimensions). Numerical Recipes 3 (page 614), for example, implements a very basic class WrapVecDoub (one dimensional) that takes care of this ordering without affecting the output memory. In the same spirit we will implement a class wrap_array_view that takes care of the indexing in the context of array views.

namespace boost{
	template<class MultiArray> class wrap_array_view;
	template<typename Element, typename Value>
	struct return_helper{
		typedef wrap_array_view<Value> type;
	};
	template<typename Element>
	struct return_helper<Element, Element>{
		typedef Element type;
	};
	template<class MultiArrayRef>
	class wrap_array_view{
		MultiArrayRef ma_;
		public:
		wrap_array_view(MultiArrayRef const&ma) : ma_(ma){};
		typedef typename return_helper<
			typename MultiArrayRef::element&, 
			typename MultiArrayRef::reference>::type return_helper_type;
		return_helper_type operator[](int i){
			if(i<ma_.index_bases()[0]) i+=ma_.shape()[0];
			return return_helper_type(ma_[i]);
		}
	};
}//namespace boost

When accessing each element (e.g. rows) the code takes care of shifting the index to wrap the second half of the array, the object returned is also wrapped (recursive), if the return type is an element of the original array the return type is a reference to the element instead. (Wrapped version of an element --e.g. double-- usually do not make sense, thats why the ofuscated code of return_helper). The wrap_array_view object behaves as a reference to the original array and modifying its elements modify the original array. A small example code follows:

multi_array<double, 2> mm(extents[10][10]);
wrap_array_view<multi_array_ref<double, 2> > wmm(mm);
wmm[-5][-5]=111;
assert(mm[5][5]==111);
wmm[-1][-1]=222;
assert(mm[9][9]==222);

A simple FFTW wrapper

Here I show that it is very easy to build a wrapper of known libraries that will make the syntax of the main program very easy to understand by packaging common tasks.

The packaging of common tasks is traditionally achieved by writing simple C-like functions. Sometimes need more flexibility, like the one given by C++ classes. Naturally, this will need a deep understanding of the language and of the wrapped libraries. But once written they will make easier to program.

For example, in the case of FFTW, although written in C, it is obvious that 'fftw_plan' structure behaves almost as a C++ object: 1) It has a certain state [pointers and size of arrays], 2) lifetime [until fftw_destroy_plan] 3) and can be referred recurrently during it lifetime [can be executed several times during it lifetime]. If we combine this with what we already know of Boost.MultiArray we can design a robust interface. This is a draft of the wrapper classes:

 namespace fftw3{
   using namespace boost;
   class plan : private counted<plan>{
     plan(fftw_plan const& p) : p_(p) {}
     public:	
       template<class ConstMultiArray, class MultiArray>
         plan(ConstMultiArray const& in, MultiArray const& out, int direction, unsigned flags=FFTW_ESTIMATE){
         assert(boost::shape(in)==boost::shape(out));
         if(in.num_dimensions()>1){
           for(int i=in.num_dimensions()-1, dense_in=in.strides()[i], dense_out=out.strides()[i]; i!=-1; dense_in*=in.shape()[i], dense_out[i]*=out.shape()[i], --i){
 	    assert(dense_in==in.strides()[i] and dense_out==out.strides()[i]); // check for dense representation of multidimensional
           }
         }
         p_ = fftw_plan_many_dft(in.num_dimensions(), (const int *)in.shape(), 1 /*int howmany*/,
        (double(*)[2])in.origin() , 0 /*const int *inembed, 0=in.shape()*/,
                                   in.strides()[in.num_dimensions()-1] /*int istride*/, 0 /*int idist ignored for howmany=1*/,
                                   (double(*)[2])out.origin(), 0 /*const int *onembed, 0=out.shape()*/,
                                   out.strides()[out.num_dimensions()-1] /*int ostride*/, 0 /*int odist ignored for howmany=1*/,
                                   direction, flags);
       }
       void operator()() const{return fftw_execute(p_);}
       virtual ~plan(){fftw_destroy_plan(p_);}
     private:
       fftw_plan p_;
     public:
       static void cleanup(){
         if(get_count()<=0) throw std::logic_error("trying to call fftw3::cleanup() with "+boost::lexical_cast<std::string>(get_count())+" active plans, should be zero");
         fftw_cleanup();
       }
     };
     void execute(plan const& p) {p.operator()();}
 }

Then we can use it in the program:

 boost::multi_array<std::complex<double>,2> A(boost::extents[100][100]); 
 boost::multi_array<std::complex<double>,2> B(boost::extents[100][100]); // must have the same shape!
 fttw3::plan p(A,B, FFTW_FORWARD, FFTW_ESTIMATE);
 ... // assign values to A
 execute(p); // repeated times
 ... // no need to explicitly destroy plan

The wrapper is elegant in some aspects, 1) it is uniform for all input/output dimensionalities and for in-place/out-of-place transform, 2) special cases where array_views are passed can be handled by the same syntax without having to call the advanced FFTW functions, 3) no explicit destruction needed (no risk to execute a destroyed plan).

The wrapper can be used with the condition that the input and output arrays are stored densely in memory. (This is not the case for some array_views.) This is a limitation of the library (which was designed to take advantage of contiguous memory) and this limitation must be reflected in the wrapper. Besides that, the wrapper can deal with arrays, sub_arrays and array_views (with strides==1 if multidimensional).

In-place transform can be achieved simply by using the same argument in the first and second placeholder. The wrapper will also check for size compatibility.

Using the original arrays, there are many possible ways to use the wrapper:

 fftw3::plan p(A, B, FFTW_FORWARD); // 2D FFT 100x100
 fftw3::plan p(A, A, FFTW_FORWARD); // 2D inplace-FFT
 fftw3::plan p(A[21], B[32], FFTW_FORWARD); // 1D FFT 100 elements,
 fftw3::plan p(A[10][indices[range(5,10)], B[20][indices[range(5,10)]], FFTW_FORWARD); // 1D FFT on 5 elements

If not specified, the option FFTW_ESTIMATE is used. If using this default option the point at which the plan is created is unimportant. Instead, when using the options FFTW_MEASURE or FFTW_PATIENT, the location of plan creation matters. This is so because in these cases the arrays can be overwritten when FFTW looks for the optimal algorithm to use. That means that the plans should be created before the matrix is assigned values and after the matrix has the desired size and shape.

 multi_array<comple,2> A(extents[100][100]);
 fftw3::plan p(A,A, FFTW_FORWARD, FFTW_MEASURE); //overwrites values
 ... // assign values;
 execute(p); // performs the transform

Once the plans are created over existing arrays these arrays must be not reallocated! That means not resize operations please. Assignment (global or partial) is OK because in this case MultiArray guarranties no internal reallocation is done. (Global assignment is only allowed for arrays with same sizes). The is reflects a limitation of the FFTW in which case the array pointer doesn't have to change after the plan is created.

'Many' Transforms

The wrapper can also be asked to perform multiple individual transform, by assuming that the first index of the array is used to index smaller arrays and using the following syntax:

 fftw3::plan p=fftw3::plan::many(A, B, FFTW_FORWARD); // 100 1D-FFT of 100 elements each

note that this is not equivalent to the first example.

Again, the hard work is done by the library, which uses certain optimizations to perform repeated transforms of the same size.

Special Memory Allocation

FFTW recommends to use its own version of malloc to allocate arrays. This is supposedly to improve speed taking advantage of some special allocation that is exploited by the numerical routines. For example they recommend to allocate the arrays to be used in the following way:

 typedef std::complex<double> complex;
 complex* in=(complex*)fftw_malloc(sizeof(complex)*N); //instead of malloc(sizeof(complex)*N);
 fftw_free(in); //instead of free(in)

Up to this point we tried not to deal with memory allocation at all.

So, the question is: How can we take advantage of allocation of memory in a context of C++? (where manual allocation can --and should-- be avoided in most cases). The answer is 'custom allocators'. For STL containers and for multi_arrays, the allocator is passed as an extra template parameter.

In this example, I pass an special allocator that uses fftw_malloc internally, for the construction of an STL vector and also for a Boost.MultiArray multi_array.

 std::vector<complex, fftw3::allocator<complex> > v(100);
 boost::multi_array<complex, 3, fftw3::allocator<complex> > A(extents[100][100][100]);

The memory is freed (automatically) by fftw_free. After this, the object can be used like any other standard object. If the allocator is not specified, std::allocator<T> is used.

Declaring the objects in this way is optional in the same way as the usage of fftw_malloc instead of malloc is optional.

Reshape (`reshape`)

Although MultiArray objects can be resized, the underlying memory layout is not specially good for resizing operations. In general, for resizing, the memory is reallocated and *all* the elements copied. This is even true for operations reducing the size of the matrix (since elements have to be contiguous in memory). So, resizing operations should be avoided as much as possible and they will not be covered here (otherwise see here).

Multidimensional arrays are different from one dimensional arrays because they can be reshaped. In the library, reshape operations are different from resize in the sense that they preserve the total number of elements. The contained elements are rearranged logically but no copies or reallocations are performed. The reshape operation is done as follows:

 boost::multi_array<complex, 3> a(extents[5][4][3]); //assert(a.shape()[0]==5 and a.shape()[1]==4 and a.shape()[2]==3);  
 {
   boost::array<int,3> new_shape={{5,6,2}};           //assert(5*6*2==5*4*3);
   a.reshape(new_shape);                              //assert(a.shape()[0]==5 and a.shape()[1]==6 and a.shape()[2]==2);	
 }

In this example, the reshape operation is allowed because the number of elements is preserved since 5*6*2==5*4*3. If the number of elements is not preserved the operation will fail during execution.

The following three syntaxes are not allowed due to very strange technical limitations of C++:

 a.reshape({{5,6,2}});                       // does not compile
 a.reshape({5,6,2});                         // neither
 a.reshape(boost::array<int, 3>({{5,6,2}})); // neither

What eventually works (and only in new compilers) is the following syntax:

 a.reshape((boost::array<int,3>){{5,6,2}}); // watch for parenthesis

or, depending on the context, it might be better to use:

 using boost::array;
 typedef array<int,3> array3;
 a.reshape((array3){{5,6,2}});

Note 1: boost::array is not the same as boost::multi_array. boost::arrays objects are part of a much simpler library called Boost.Array. In this other library all the arrays are one-dimensional and the number after the element type indicates the (constant) number of elements and not the number of dimensions.

So far, the only constrain for reshaping is that it preserves the number of elements. The other requirement that is not very obvious is that the reshape operation must preserve the number of dimensions (in this case 3). This is not a big limitation since we can choose one of the dimensions to be 1 in size

 a.reshape((array<int,3>){{5,1,12}}); // or {{5,12,1}} depending on your application

Note 2: Do not try to do something like a.reshape((array<int,2>){{5,12}}) for a 3D array to change dimensionality, it compiles but gives unexpected results. In fact, there is no way to have 'a' change its dimensionality during the program execution, since it is of type multi_array<double,3>. Another thing that is very confusing is that reshape does not work with extents; do not do A.reshape(extents[5][4][3]). Again it compiles and gives the wrong result.

If you really need an object with smaller dimensionality with the same elements you can do

multi_array<complex, 3> a3D(extents[5][4][3]);
multi_array_ref<complex, 2> a2D_ref(a3D.origin(), extents[5][12]);

and use 'a2D_ref' instead of 'a3D'. In this case nobody will check that the sizes are compatible unless you add something like assert(a3D.num_elements()==a2D_ref.num_elements()); . Alternativelly we can do

multi_array_ref<complex, 2> a2D_ref(a3D.origin(), extents[a3D.shape()[0]][a3D.num_elements()/a3D.shape()[0]]);

// num_elements is always divisible by shape()[0]

to ensure that all elements are referenced.

Note 3: Of course you can always copy element by element from one type of array to another with different shape or different dimensionality. But this section is exactly about avoiding copies.

Example of reshape with FFTW

The FFTW wrapper presented earlier honors the dimensionality of the arrays passed. I mean by this that the dimensionality of the transform follows the dimensionality of the passed array instead of for example interpreting the input as a one dimensional array which would of course give a completely different result. In this section I will illustrate how to force FFTW to interpret the data differently.

For simplicity let us assume that we have a two dimensional array, and that we want to do our transforms in-place on complex data:

 typedef std::complex<double> complex;
 using boost::multi_array;
 using boost::extents;
 multi_array<complex, 2> A(extents[100][100]);
 ... // fill A with data

The simplest case is that we want a 2D FFT on the 2D data structure, I present it here as a review:

 fftw3::plan pf_A2A_2D(A, A, FFTW_FORWARD); // implicitly 2D transform
 execute(pf_A2A_2D);

Now suppose that we need to do a different thing that is to transform each row of the the array independently, the approach in this case will be different, for example we could iterate over the rows and perform a 1D transform (this is allowed because each A row is contiguous data in memory):

 for(int i=0; i!=A.shape()[0]; i++){
   fftw3::plan pf_Arow2Arow_1D(A[i],A[i], FFTW_FORWARD); // implicitly 1D transform
   execute(pf_Arow2Arow_1D);
 }

Below there is a note against using FFTW plans inside loops. A different approach, *to do the same thing* would be to use the 'many' option of the library and the wrapper. This can be faster because the library can take advantage of how the overall data is distributed.

 fftw3::plan pf_Arow2Arow_many_1D=fftw3::plan::many(A, A, FFTW_FORWARD); // many 1D transforms
 execute(pf_Arow2Arow_many_1D);

The last option is faster, there is no explicit loop, and only one plan is created, this is more elegant because the plan can be created just after the array is constructed, this is handy if want to create plans with the FFTW_MEASURE option. (But keep in mind that this is possible only because all the 1D transforms are performed inside a single multi_array.)

Note: There is also an insidious issue with creating many FFTW plans that I discovered accidentally. The library keeps track of all the created plans (even the 'destroyed' ones). Each created/destroyed plan leaves a small memory footprint. FFTW authors claim that this is for performance optimization purposes. But when creating, using and destroying plans repeatedly, let's say few million times, the used memory will grow up and the application crash. Authors of the library say this is not a memory leak, and in part they are right, just because there a function called fftw_cleanup() [called fftw3::plan::cleanup() in the wrapper] that makes sure all that memory is reclaimed. The problem is that the function must be called at some point where it is known that there are no plans active (the wrapper will checks for this condition). It is your responsibility to call this function and since there are only few points in the program at which we know that there are no plans active, where to call cleanup(), this *effectively* behaves like a memory leak. The way to avoid this altogether is to keep the number of created plans controlled, and specially avoid creating plans inside loops. The 'many' plan can save lots of plans to be created.

Now suppose that we have our data in a 2D multi_array but that we want to interpret the data differently and perform a long 1D transform over a contiguous sequence of rows in the array. This can be motivated by special boundary conditions in the problem but also illustrates the utility of reshape.

There are two options here, one using reshape and the other using multi_array_ref (see above). By using reshape we *temporarely* rearrange the array data (logically) and perform the transform:

 using boost::array;
 A.reshape((array<int,2>){{1,10000}}); //remember that original was 100x100
 fftw3::plan pf_Aunwrappedrows_1D(A[0],A[0], FFTW_FORWARD); // implicitly 1D
 A.reshape((array<int,2>){{100,100}});
 execute(pf_Aunwrappedrows_1D);

First, we reshape the original matrix to be effectively a one dimensional array. Strictly speaking A is 2D, but it has only one row A[0] and we create a 1D FFT plan for that single row that contains all the data. After the plan is created we can reshape the array again, since FFTW will remember the shape of the array. Another variant is to simply declare a plan:

 fftw3::plan pf_Aunwrappedrows_2D(A, A, FFTW_FORWARD);

Although in the case the arrays and the plans are 2D, the result should be the same as a 1D transform since there is only one row.

The second option is to create a multi_array reference with a different 'view' and work on that matrix:

 using boost::multi_array_ref;
 multi_array_ref<complex, 1> Aunwrapped_ref(A.origin(), extents[10000]);
 assert(Aunwrapped_ref.num_elements()==A.num_elements());
 fftw3::plan pf_Aunwrapped_1D(Aunwrapped_ref, Aunwrapped_ref, FFTW_FORWARD); // inplicitly 1D
 execute(pf_Aunwrapped_1D);

The FFTW will modify the reference matrix which is only a different way of changing the original array but the element access is logically different.

Boost.MPI Library

I will assume that it is the case that you know the very basics of C-MPI and you want to improve the readability and design of your code. If you don't know C-MPI, you may try to learn from this examples (the syntax is very nice to learn what MPI is supposed to do --in contrast with the ugly C version--) but be aware that Boost.MPI is not widely used.

I will try to work out an example that is not trivial by using FFTW in parallel. Hopefully this approach will be more related to our application area, while the official Boost.MPI pages are plenty of (deceptively) trivial examples.

Simple MPI FFTW example

In my opinion, using a real MPI numerical library is the only way to learn MPI and make it do useful work.

The example in this section uses the MPI version of FFTW3 which must be installed prior to begin. See detailed instructions to Install FFTW3 first if necessary.

Before going to the first example let us review the command line to compile it

 mpicxx -I$HOME/usr/include fft_mpi_test.cpp \
        -L$HOME/usr/lib -lfftw3_mpi -lfftw3 \
        -lboost_mpi-gcc43-mt-1_37 -lboost_serialization-gcc43-mt \
        -o fft_mpi_test

besides the libfftw part, -L$HOME/usr/lib -lboost_mpi-gcc43-mt-1_37 -lboost_serialization-gcc43-mt do the job of linking against Boost.MPI and Boost.Serialization.

The reason we need Boost.Serialization (although we don't use it explicitly here) is that MPI is highly dependent on the serialization mechanism. Serialization is the capacity of converting any object or structure in memory into a linear stream of data. That is what we do over and over again when we save data to a file, in this case MPI communication is basically the same thing: one processes send streams of data to others. This integration allows to pass complex objects between processes without having to convert (deconstruct) the objects to numerical (or primitive) types first and then reconstructing the object on the other side as we would do it with C-MPI.

The following compilable example is based on the fftw3 simple mpi example (plain-C sources). It can be regarded as the parallel version of this serial example. In the following example we just add the Boost.MPI syntax (instead of the C-MPI):

 #include <complex>                    /* must include before fftw3*.h */
 #include <fftw3-mpi.h>
 #include <boost/mpi.hpp>
 #include <iostream>
 
 namespace mpi = boost::mpi;		/* if not defined have to use names as boost::mpi::... */
 using std::clog; 
 using std::endl;
 
 int main(int argc, char **argv){      /* initialize mpi world and fftw_mpi */
   mpi::environment env(argc, argv);   /* in C: MPI_Init(&argc, &argv); */
   mpi::communicator world;            /* this will work as MPI_COMM_WORLD */
   fftw_mpi_init();                    /* initialize fftw_mpi library */
 
   const ptrdiff_t N0 = 18, N1 = 18;	/* size of parallel array */
 	
   fftw_plan plan;			/* this is the usual fftw plan */
   std::complex<double> *data;	        /* in C: fftw_complex *data; this is the local (to process) data */
 	
   // given the matrix size (N0, N1) ask fftw_mpi how to split the matrix
   ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
   alloc_local = fftw_mpi_local_size_2d(N0, N1, world,
                                        &local_n0, &local_0_start); // alloc_local isn't always local_no*N1!!
 
   //report allocation size, example of output from each process
   if(world.rank()==0){                /* only process 0 prints this */
     clog<<"Global size of array : [0:"<<N0<<")x[0:"<<N1<<"), total size "<<N0*N1<<endl;
   }
   //all processes print the following but each prints a different report (there are better ways to print)
   clog<<"Process "<<world.rank()<<" has the array of size ["<<local_0_start<<","
       <<local_0_start+local_n0<<")x[0,"<<N1<<"), total local size "<<(N0*N1)<<endl;
 
   //now allocates data in the old fashioned way	
   data = (std::complex<double>*) fftw_malloc(sizeof(std::complex<double>)*alloc_local);
   // in C: data = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * alloc_local); 
 
   //create fftw plan, note "_mpi_" and note the "_2d"
   plan = fftw_mpi_plan_dft_2d(N0, N1, (double(*)[2])data, (double(*)[2])data, world,
                               FFTW_FORWARD, FFTW_ESTIMATE);
 
   double pdata=0;
   //initialize data and compute local power
   for (i = 0; i < local_n0; ++i){
     for (j = 0; j < N1; ++j){
       std::complex<double> val=std::complex<double>(local_0_start + i, 0);
       data[i*N1 + j] = val;
       pdata+=norm(val);
     }
   }
   clog<<"Process "<<world.rank()		/* each process prints its local power*/
       <<" has a power of "<<pdata<<" of the original data"<<endl;
 
   //compute transform according to the plan
   fftw_execute(plan);
 
   double ptransform=0;
   // normalize data and compute local power of the transform
   for (i = 0; i < local_n0; ++i){
     for (j = 0; j < N1; ++j){
       data[i*N1+j]/=sqrt((double)(N0*N1));	// this is the normalization of the transform
       std::complex<double> val=data[i*N1+j];
       ptransform+=norm(val);
     }
   }
 
   clog<<"Process "<<world.rank()    //each process print its local power
       <<" has a power of "<<ptransform<<" of the transformed data"<<endl;
 
   fftw_destroy_plan(plan);         /* destroy plan */
   fftw_free(data);
   // boost::mpi::environment doesn't need finalization it is automatic, in C: MPI_Finalize();
 }

The example is rather long, this is mainly because we are using the plain-C fftw interface, and manually allocated tables. Note that the only change relative to the original FFTW example is the usage of Boost.MPI and the use of C++ complex type. (Imagine how much cleaner would be the code if we could take advantage of boost.multiarray and of a C++ wrapper of FFTW. Eventually, this will be the topic of another section.)

Note that using C++ and Boost does not enforce us to leave the known C-interfaces, the idea is that we can reuse the old interfaces and improve them gradually as we need it. For example we used std::complex<double> instead of double[2], (this is possible because both types are bit-compatible); the former has a much support for treating complex numbers.

What the program does is to initialize parts of matrix in different processes, calculate local power (sum of squares) of the data, do a global Fourier transform of this matrix (in place) and the compute the local power of the transformed data. The global power (sum of locals powers) must be equal in the original data and in the transformed data. (See above for compilation command.)

Also note that although this is our first parallel MPI program the real MPI communication is performed by the FFTW routines and not by us. Rarely we will want to deal with passing very large amount of data ourselves between processes, libraries like FFTW and Lapack will do a faster/better job than us. The idea here is to set the environment for the usage of those well tunned numerical libraries.

Now let us download simple_boost_mpi_example.tar.gz and run the example:

 wget http://micro.stanford.edu/mediawiki-1.11.0/images/Simple_boost_mpi_example.tar.gz -O simple_boost_mpi_example.tar.gz
 tar -zxvf simple_boost_mpi_example.tar.gz
 cd simple_boost_mpi_example
 make simple_boost_mpi_example

(Checkout the Makefile to see the compilation line). Run the example to see an output like the following:

 $ mpirun -np 2 ./simple_boost_mpi_example
 Process 1 has the array of size [9,18)x[0,18), total local size = 162, local allocated size = 162
 Global size of array : [0:18)x[0:18), total size 324
 Process 0 has the array of size [0,9)x[0,18), total local size = 162, local allocated size = 162
 Process 0 has a power of 3672 of the original data
 Process 0 has a power of 27729 of the transformed data
 Process 1 has a power of 28458 of the original data
 Process 1 has a power of 4401 of the transformed data

Two copies of the program are run at the same time, FFTW takes care of the communication between them. Before looking at any number, we can see that the output messages are not well ordered. we learned the first lesson of MPI in general: not synchronized instructions can be executed in any order. Besides this the program works as expected. The original 18x18 matrix is split among the two processors in two matrices of 9x18, and a memory for 162 elements is allocated in each process.

Also we can check that the power of the global data is the same before and after the FFT, 3672+28458 == 27729+4401 == 32130.

We can check running the program in one process.

 $ mpirun -np 1 simple_boost_mpi_example
 Global size of array : [0:18)x[0:18), total size 324
 Process 0 has the array of size [0,18)x[0,18), total local size = 324, local allocated size = 324
 Process 0 has a power of 32130 of the original data
 Process 0 has a power of 32130 of the transformed data

Important notes about MPI-FFTW3 memory distribution (do not skip)

Although the distribution of the matrix and allocated memory seems to very reasonable in the previous example, this is only because of the simple dimensions and process number used in the example. There are some characteristics of the memory distribution for which the previous example can be misleading, and it is worth noting them:

Only FFTW (via fftw_mpi_local_size_2d) knows how to split the data and allocated memory size among processes, so don't try to predict the splitting yourself.
The splitting of the global matrix is only done over rows (first dimension, for N>2-dimensional arrays).
1D FFT's usage is totally different, because the splitting depends on the type of transform to be performed (e.g. in-place, out-of-place).
The splitting of the matrix can be uneven (for example, #columns is not divisible by #processes).
Some processes can be assigned zero columns (for example if there are more processes than columns). Even in this case, the allocated memory in that process can be different from zero! (in general, it is 1*sizeof(complex)).
The logical size of the submatrix (local matrix) can be different from the local allocated memory, i.e. local_n0*N1 <= alloc_local. The outputs of local_size are not redundant. FFTW uses the (small) extra memory as a scratch space.

The next step in complexity could be to intercommunicate the processes in order to compute the total (global) power of the data.

Collective Reduce operation

There are several kinds of MPI operations, point-to-point and collective. Among the collective operations, the reduce operation results in the number we need to compute: the global power of the data. Reduction means that we take *all* the objects/numbers from different processes and convert it into a *single* object/number by means of some function. More often than not, this function is simply a summation (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Sigma_n a_n} ).

In practice, a summation is a very simple function, because it is associative and commutative (sequence-product Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pi_n a_n} is as well), so many issues are simplified in this case because the result will not depend (mostly) on how the order in which the data is gathered. This kind of operations are defined only in terms of a binary function. That is, the summation is defined in terms of the binary addition alone.

I said *mostly* because for machine numbers the usual operations are not *exactly* associative, and the result can depend on how the data is split among processes. This is the second lesson of parallel programming, numeric results *can* depend on the number of processes. In fact, if the result completely changes with the number of processes then that may point to very deep conceptual errors in your numeric algorithm, since the result depend on how the data is summed (for example, when you sum small and large numbers together). This is very different from a 'bug' in the program (or in the MPI library).

`all_reduce`

Going back to the global power example we can make all the processes receive the summation of values by inserting the lines:

 double global_pdata=-1;
 mpi::all_reduce(world, pdata, global_pdata, std::plus<double>());
 double global_ptransform=-1;
 mpi::all_reduce(world, ptransform, global_ptransform, std::plus<double>());
 ...
 cout<<"Process "<<world.rank()<<" knows that global power of original data is "<< global_pdata << endl;
 cout<<"Process "<<world.rank()<<" knows that global power of transformed data is "<< global_ptransform << endl;

Now the program will compute the power of the data for us, which makes easier to verify that the Fourier transform is consistent. Note that the third argument is the variable which will be overwriten with the result and the second is the contribution from the current process to the reduction.

From this point on we can easily adapt the reduction to other needs. We can pass a different function, like std::multiply<double>() or, just as an example, selectively pass data if it fulfills certain condition:

 mpi::all_reduce(world, pdata>0.?pdata:0, std::plus<double>()); // only pass positive values.
 mpi::all_reduce(world, world.rank()<4?pdata:0, std::plus<double>()); //only pass data of the first 4 processes.

Note that in both cases the condition of passing the data is *inside* the function call (via the ternary function condition?casetrue:casefalse). If you put the condition outside the function call, there exists the possibility of not fulfilling such condition call the MPI communication will hang because all the process will be waiting for a process that refused to send the data. This is the wrong code

 if(pdata>5000){ // wrong way to try to selective pass data larger than 5000
   mpi::all_reduce(world, pdata, std::plus<double>());
 } // WRONG

What is worst: the program may or may not hang depending on the actual numeric value of the data. So, third lesson of parallel programming: it is too easy to create a deadlock. In particular all blocking-communication have to be executed in non-conditional branches of the program. (Another good reason to program without if's.)

Note on function objects (Boost.Lambda and Boost.Phoenix)

In the C++ standard library, the binary addition is represented by a function object called std::plus<T> which represents the addition over type T, we are interested in the case in which T is the double type. This functions object have become the standard way of passing arbitrary functions to other functions. Function-object seem misterious sometimes but they are simply a representation of the operation, for example this code works as expected:

 std::binary_function<double, double, double>* myop;
 myop=new std::plus<double>();
 double plus_result=(*myop)(4,5);  // or myop->operator()(4,5);
 assert(plus_result==9);
 delete myop;
 myop=new std::multiply<double>();
 double multiply_result=(*myop)(3,5);
 assert(multiply_result==15);
 delete myop;

In fact it is very easy to create your own function object, they look almost like functions:

 struct concat{ //concatenation
   std::string operator()(std::string const& s1, std::string const& s2){
     return s1 + s2;
   }
 };

Passing such function object to reduce will make all_reduce concatenate the strings produced by different processes. Note that this operation is not commutative and will depend on some standard convention used by MPI to collect the data (fortunately it is well defined).

The only difference between our function object an std::plus<std::string>, is that the later is already included in some standard library.

They differ from built-in (normal C) functions in two aspects, 1) you can parametrize them, 2) you can make copies of them, 3) you can compare them:

 struct concat_with_sep{ // concatenation with separation
   std::string separator_;
   concat_with_sep(std::string const& separator) : separator_(separator){}
   std::string operator()(std::string const& s1, std::string const& s2){
     return s1 + separator_ + s2;
   }
 };

The beauty of this is that, although the separator string is arbitrary, 'concat_with_sep', viewed as function, still takes only two parameters. Function object are more general than regular functions, because they can have internal parameters. Function objects are sometimes called Functors.

Free functions (the normal C-functions) can be converted to function objects in many ways. The standard way is to use Boost.Function library. Function objects can be combined with one another with the Boost.Bind or the (more advanced) Boost.Lambda.

For very complex functions we should define our own function objects which requires a separate definition of a class or function. There is however some cases where the function that we need to apply is simple but not trivial. It is in this case where Boost.Lambda plays a nice role by allowing us to define a function in-place. For example, suppose that we need to compute not the sum but the sum of square modulus, in such case the function object can be created by replacing ... std::plus<double>() ... by

 ... _1+bind(&std::norm<double>, _2) ...

where we should include and declare

 #include<boost/lambda/lambda.hpp>
 #include<boost/lambda/bind.hpp>
 using namespace boost::lambda;

at some earlier point. If we were interested in the sum of squares, we could use just use

 ... _1 + _2 *_2 ...

Even simpler is the simple addition, plus<double> is exactly identical to _1+_2 once we use Boost.Lambda.

Note that this nice syntax is achieved by the Lambda library and it is not built-in in the language. How the Lambda library works internally is quite complex to explain, however it is designed to be easy to use (note the natural syntax). On the other hand it is difficult to debug (compilation errors are difficult to interpret). A working example of this library can be downloaded from here. The example also illustrates the usage of Boost.Phoenix which is intended to supercede Boost.Lambda.

`reduce`

Another variation of the example can be introduced if we realize that sometimes the result is only needed in one process. In this case the reduction result is sent to one process only, which is specified as the last argument in mpi::reduce:

 double global_pdata=-1;
 mpi::reduce(world, pdata, global_pdata, std::plus<double>(),0); // process 0 receives the data
 cout<<"Process "<<world.rank()<<" *thinks* that global power of the original data is "<< global_pdata << endl;

In this example, only process 0 will have a meaningful value of the result; other processes will have the global_pdata variable unchanged. The process 0 will output the correct value, while the rest will output -1. Essentially, all process but one will have an uninitialized, invalid, or at best flagged to some magical value.

This selective data collection sounds reasonable idea but really it can cause trouble because we have to keep track of the process that has the right value by some conditional. There are two basic possibilities, either we "promise" to ignore the value in processes with the wrong value.

 double global_pdata=-1;
 mpi::reduce(world, pdata, global_pdata, std::plus<double>(),0); // process 0 receives (and send to itself) the data
 if(world.rank()==0){  //always check for process number *before* using the variable. 
   cout<<"Process 0 knows that the global power of the origin data is "<<global_pdata<<", others do not really know"<<endl;
 }

or we put the result variable inside the scope of an 'if' statement:

 if(world.rank()==0){
   double global_pdata=-1;
   mpi::reduce(world, pdata, global_pdata, std::plus<double>(),0); // process 0 sends and receives the data
   ... // do other operations with result
   cout<<"Process 0 knows that the global power of the origin data is "<<global_pdata<<", others do not know at all"<<endl;
 }else{
   mpi::reduce(world, pdata, std::plus<double>(),0); // others process only send the data
 }

Note that the number of arguments is different in the two different calls to reduce. The first branch will receive in the result of all the processes (including itself) while the second will only send values. In this way, only process 0 can ever access the value. Trying to use the the variable outside the first branch of the conditional will not even compile. In this case we do not have to promise about the usage of the variable but we have to promise that we will call similar versions of reduce in both branches of the conditional. If we forget the else branch (second branch), the program will hang in the same way as described above.

Next subsection will deal with this problem of 'dangling' variables. You can skip the following subsection in a first read, it will not improve your understanding of MPI. Just be aware the problem for the moment.

The issue of dangling variables (and Boost.Optional)

A variable with a value that makes sense in only one process seems to be a natural situation, after all, each process is supposed to do different part of the total work. The problem is that, within MPI, all processes must share the same code and therefore will be forced to carry variables that may not make sense to them. This situation creates tension.

Note that in the first place, this issue arises because by collecting data in only one process we are imposing some asymmetry; it is nevertheless necessary sometimes. The situation is annoying for those used to serial programming, but we have to understand that this kind of problems are at the core of parallel programming difficulties. The first step to try to avoid the problem is to think whether it may be really useful --or at least not harming-- for all the processes to know the result of reduce/all_reduce.)

It seems that the second solution above (if/else block) is more elegant because there are no dangling undefined variables around, but is not free of problems. That is because if we need to compute something non-trivial with the result, the conditional statement can become several lines long and it can be difficult to handle cases in which more than one (but not all) process receive the value. We can put all these lines in a separate function, but that function will be forced to be serial code (because it is executed only for one process). Note that I refer this as "dangling variable", in the same sense we refer to "dangling pointers", both are equally dangerous. The following proposed solution uses the library Boost.Optional and it can be also used as an introduction to this other Boost library.

Boost.Optional introduces a kind of objects which carry information on whether their in internal value is valid (e.g. initialized or not). For example the object

 boost::optional<double> a;

can hold any real value *or* be in a 'undefined state'. If it is undefined then (bool)a==false and true otherwise and its actual value is given by *a. If we try to extract a value *a for an undefined variable then we will encounter a runtime error.

There is another possibility to consider that I discovered, that is the usage of Boost.Optional. (This solution is not suggested anywhere else to my knowledge.) In this case, the valid state of the variable is handled by the variable itself.

  boost::optional<double> global_pdata(world.rank()==0, double()); //non empty at node 0 only.
  if(global_pdata) mpi::reduce(world, pdata, *global_pdata, std::plus<double>(),0);
  else             mpi::reduce(world, pdata,                std::plus<double>(),0);
  ...
  if(global_pdata){ // only "use" if global_pdata is it is initialized
    cout<<"Process "<<world.rank()
        <<" knows that the global power of the origin data is "<<*global_pdata<<", others will die if they try to 'know'"<<endl;
  }
  // cout<<*global_pdata<<endl;//unchecked usage will throw and exception and possibly abort the program in other processes

For those used to C, they will recognize that 'global_pdata' looks like a 'nullable' pointer (whose pointee value is valid when !=0 by convention). The difference is that boost::optional behaves better than a pointer for this application, for example, boost::optional can be assigned values, or uninitialized with no memory allocation necessary, no dangling pointers can be created in this way. If we respect this check-then-use syntax, the variable becomes transparent to all the process where it is undefined.

Note that global_pdata is not associated with any particular process number, we can forget in which process the variable was valid, the validity test is done in the variable itself. We have to remember to test whether the global_pdata is locally valid. What if we do not check and use *global_pdata without test? Unless we compile in production mode (NDEBUG=0) the calling process will terminate after an 'uninitialized' fail assertion is called from Boost.Optional. This is still better than other options, like having a dangling pointer (nullable pointer solution), using an uninitialized or invalid value (first solution), or a deadlock (second solution).

So far I mentioned three (or four) possible solutions, none of them can save us at compilation time, i.e. before running our program. The situation will not improve until someone smart writes a compiler (and a language) capable of *really* interpreting parallel code and figuring out these issues before actually running the program. (Notice that MPI is really like low level programming.)

If you think that my suggested solution (using Boost.Optional) is in the right direction, there are at least two ways to improve it further. First, we can encapsulate the two calls to 'reduce' in a single function (a template function will be even more general):

 boost::optional<double> optional_reduce(boost::mpi::communicator const& world, double const& in_value, int root){
   boost::optional<double> ret(world.rank()==0, double());
   if(ret) mpi::reduce(world, in_value, *ret, std::plus<double>(),0);
     else  mpi::reduce(world, in_value,       std::plus<double>(),0);
   return ret; 
 }

and then use it as

 boost::optional<double> global_pdata = optional_reduce(world, pdata, 0);
 ...
 if(global_pdata) cout<<"global_pdata = "<<*global_pdata<<endl;

The second is to use exceptions. An exceptions that is thrown when trying to use a variable in a process in which is not defined.

For output purposes we can make boost::optional support output by

#include<boost/optional/optional_io.hpp> ...
cout<<"global_pdata = "<<global_pdata<<endl;

will print '--' if the value is unitialized and the actual value otherwise, note the lack of '*'.

Transparent blocks via exceptions (boost/assert.hpp)

By default, access to an uninitialized optional variable results in a failed assert and end of execution. If you want to be able to recover from this failed assert, you can use boost/assert.hpp to change the default behavior.

Just add the following code (extracted from here) before and after #include<boost/optional.hpp>

 #define BOOST_ENABLE_ASSERT_HANDLER
 #include "boost/assert.hpp"
 #include "boost/otional.hpp"
 // #undef BOOST_ENABLE_ASSERT_HANDLER //to constrain the new default to optional.hpp only
 #include <stdexcept>
 
 namespace boost {
   void assertion_failed(char const* expr, char const* function,
                       char const* file, long line) {
     throw std::runtime_error(expr);
   }
 }

This will allow you to throw an exception (instead of failing after the assert) when accessing uninitialized variable, this gives the opportunity to recover from failed executions. For example:

 try{
   boost::optional<double> global_pdata = optional_reduce(world, pdata, 0);
 }catch(...){
   clog<<"sorry I am a process that don't know about global_pdata"<<endl;
 }

The idea is that the invalid part of the try-block becomes transparent to processes. (The valid part should be exception safe though.)

Primitive or point-to-point operations (`barrier`, `send` & `recv`)

The section describe primitive operations, i.e. operation that are very basic. In principle we should be able to do all the communications with this primitives, but then many things will have to be handled manually. The result of using this operations in excess can result in a very unelegant code. (That is why this section is at the end).

The simplest operation is the member function barrier it serves to the purpose of synchronizing all the processes in a certain communicator. Then there is the member send and recv, to send an receive messages.

In pure C-MPI, the last two operators are only implemented to pass small types, like single numbers or pointers. That means that to pass a simple string we have to allocate the memory manually and hope for successful communication and that we didn't introduce any mistake when dealing with the pointers. With Boost MPI the situation is much better thank to the automatic serialization and deserialization.

The following example shows the use of barrier, send and recv to print (from process 0) a list of the nodes involved in a MPI execution:

 #include<boost/mpi.hpp>
 int main(int argc, char* argv[]){
 	boost::mpi::environment env(argc, argv);
 	boost::mpi::communicator world;
 	std::clog<<"greetings from process #"<<world.rank()<<" in processor named "<<boost::mpi::environment::processor_name()<<std::endl;
 
 	world.barrier();
 	if(world.rank()==0) std::clog<<"all processes syncronized here"<<std::endl;
 
 	world.send(0,world.rank(), boost::mpi::environment::processor_name());	
 	for(int i=0; i!=world.size(); ++i){
 		//std::clog<<"process #"<<i<<" sends the name of its processor to process #0"<<std::endl;
 		if(world.rank()==0){
 			std::string buffer;
 			world.recv(i, i, buffer);
 			std::clog<<"process #0 receives string "<<buffer<<" from process #"<<i<<std::endl;
 		}
 	}
 	return 0;
 }

compile with

 mpicxx main.cpp -L$HOME/usr/lib -lboost_mpi-gcc44-mt -lboost_serialization-gcc44-mt -Wl,-rpath=$HOME/usr/lib

and run with

 ./a.out

The first set of messages ("greetings...") will appear in a random order, however thanks to the barrier call, all processes will wait for the others to reach that point in the execution. After that all processes will send the name of the processor in which they are executing to process number 0, which will receive the message and print the string.

Both send and recv send a tag (integer in the second argument), this number can be for example the origin process of the message itself, but the important thing is that send and recv must agree in the tag, otherwise recv will wait for a message that has the correct tag. If you make the tag disagree at send and receive the program will hang. Tag is useful to distiguish between 'different' message sent by the same process.

Although much cleaner than its counterpart pure C version, the code is still quite fragile and prone to errors. For these reason it is recommendable to use other ways to send messages as described in the other sections of this chapter.

Other collective operations (`gather` and `broadcast`)

To make clear that options other than send and recv should be considered I will reimplement the program using the 'gather' operation. Gather takes the elements send by other processes and stacks them in a vector type. Then the vector object can be transversed to inspect its contents.

#include<boost/lexical_cast.hpp>
#include<boost/mpi.hpp>
 
int main(int argc, char* argv[]){
 	boost::mpi::environment env(argc, argv);
 	boost::mpi::communicator world;	
 	std::clog<<"greetings from process #"<<world.rank()<<" in processor named "<<boost::mpi::environment::processor_name()<<std::endl;
 
 	std::string message="process #0 receives string "+boost::mpi::environment::processor_name()+" from process #"+boost::lexical_cast<std::string>(world.rank());
 	if(world.rank()==0){
 	std::vector<std::string> lines;
 	boost::mpi::gather(world, message, lines, 0);
 	for(int i=0; i!=lines.size(); ++i)
 		std::cout<<lines[i]<<std::endl;
 	}else{
 		boost::mpi::gather(world, message, 0);
 	}
 	return 0;
}

Broadcast simply sends a value to all the processes in a certain communicator, the values are stored in the same variable as in the broadcasting process.

string v = "empty";
if(world.rank()==0) v="special value";
boost::mpi::broadcast(world, v,0);
assert(v=="special value"); //all processes have this value for v now

In other words it synchronizes the value of a variable with respect to the value in a certain process.

Parallel logging example

One of the first problems that appears when writting our first parallel program is that when we try to print messages the usually don't appear as we intend. The reason is that logging and printing messages to the user is an inherently serial job and once we are in a parallel environment we have to be specific that such message printing is done in a serial way. We have to synchronize the processes make each process send its message (hopefully in a prescribed order) and then recover the parallel execution. What is worst is that sometimes we need all the processes to print the same message, in that case many copies of the same message will be printed.

It would be great if we can solve this problem once and for all and make the code read our mind and understand these issues and print what we intend.

The following code will make that possible, the code seems complicated because it uses advanced C++ stream techniques but what it does is very simple, for a newly defined stream (boost::mpi::ostream) all processes will accumulate strings send by each process, when a special end-of-line signal is sent (boost::mpi::endl) is send the root process collects the messages and make them unique. In this sense repeated messages are only shown once. Messages are displayed in alphabetical order. If we want the messages displayed in the order of the issuing process we can just add that information at the beginning of the message (e.g. "proc 5:...").

In this sense we can forget that we are running a program in parallel if all messages are the same. But if the messages are different (line-by-line) all the messages are printed recovering the parallel behavior. Also, in the parallel case, messages are shown in a prescribed order.

#include<boost/mpi.hpp>
#include<set>
namespace boost{namespace mpi{
     class ostream{
       communicator const& comm_;
       std::ostream& os_;
       std::ostringstream local_buffer_;
       public:
       ostream(communicator const& c=communicator(), std::ostream& o=std::clog) : comm_(c), os_(o){}
       template<typename T> ostream& operator<<(T const& t){
         local_buffer_<<t;
         return *this;
       }
       typedef ostream& (*ostream_manipulator)(ostream&);
       ostream& operator<<(ostream_manipulator manip){
         return manip(*this);
       }
       ~ostream(){endl(*this);}
       static ostream& endl(ostream& os){
         os.local_buffer_<<std::endl;
         os.comm_.send(0, os.comm_.rank(), os.local_buffer_.str());
         os.local_buffer_.str("");
         if(os.comm_.rank()==0){
           std::set<std::string> messages;
           for(int i=0; i!=os.comm_.size(); ++i){
             std::string remote_buffer_;
             os.comm_.recv(i, i, remote_buffer_);
             messages.insert(remote_buffer_); 
           }
           for(std::set<std::string>::const_iterator i=messages.begin(); i!=messages.end(); ++i){
             os.os_<<(*i);//<<std::endl;
           }
         }
         os.comm_.barrier();
         return os;
       }
       typedef std::ostream& (*std_manipulator)(std::ostream&);
       ostream& operator<<(std_manipulator manip){
         manip(local_buffer_);
         return *this;
       }
     };
     ostream& endl(ostream& os){return ostream::endl(os);}
}}

The stream is used as follows:

int main(int argc, char* argv[]){
   boost::mpi::environment env(argc, argv);
   boost::mpi::communicator world;	
   
   using boost::mpi::endl; //or using std::endl; //*
   boost::mpi::ostream clog(world); // or using std::clog; //*

   clog<<"world says: we are the world"<<endl;          //this messages is unique
   clog<<"rank "<<world.rank()<<" says: I am me"<<endl; //this message is different for each process.
 
   return 0;
}

The ouput is

$ mpirun -np 3 ./a.out 
 world says: we are the world
 rank 0 says: I am me
 rank 1 says: I am me
 rank 2 says: I am me

Note that the code looks almost like the serial code, by changing the two lines marked we recover the usual behavior (i.e. multiple messages in random order). Anoether way to recover the serial behavior is to use the communicator MPI_COMM_SELF in the constructor of the stream:

   boost::mpi::communicator self(MPI_COMM_SELF, boost::mpi::comm_attach);
   boost::mpi::ostream clog(self); // or using std::clog;

There are a couple of assumptions in the design. First, the messages are all different or all the same, i.e. we don't care about distinguishing messages from 'groups' of processes. In any case that 'group' information can be passed explicitly as part of the string. Second, the smallest indivisible messages is an entire line and not part of it, i.e. uniqueness is relative to a full line, i.e. until boost::mpi::endl is found.

The class can be defined and used inline:

   boost::mpi::ostream(world)<<world.rank()<<" says: I am me"<<endl;

By default it uses the std::clog output (stderr), if we want to use a different output stream, for example a file or std::cout (stdout) we can specify:

   boost::mpi::ostream(world, std::cout)<<world.rank()<<" says: I am me"<<endl;

Finally, we have to remember that all processes (in a certain communicator) will wait to synchronize to print the message. So the parallel performance can be affected by printing these messages, even if the message is unique. If this is a price we don't want to pay, then we will have to roll back to the usual conditional statement:

 if(world.rank()==0) std::clog<<"here"<<std::endl;

But then the message will not contain information about the current state of processes other than process number 0.

NB: The idea of the behavior is taken from the Parallel Print Function and the implementation of the stream is bases on this generic stream class.

Other Resources

http://daveabrahams.com/2010/09/03/whats-so-cool-about-boost-mpi/

Boost.Filesystem

Sooner or later every program has to deal with the task of managing files or even directories in the disk. The following tasks are quite tedious to perform in C or C++:

Create/remove directories
Iterate through files in a given directory
Create filenames from other names
Change/add filename extensions
Copy files
Delete files
Check if a file exists

Besides, there is no portable way (across operating systems) to do it in general with the core language and libraries. Here is where Boost.Filesystem proves useful, it not only facilitates programming such tasks but also does it in a portable way by hiding the OS-specific file management. Besides it is likely that this library will become part of the next C++ standard library.

This library is also described in http://www.ibm.com/developerworks/aix/library/au-boostfs/index.html and http://en.highscore.de/cpp/boost/filesystem.html

Path

The path object can represent a concrete file or a directory. It is basically a string that represents the path to an existing or non existing file or directory.

boost::filesystem::path p("/home/user/data.dat");

It has certain advantages to the usual way of representing a file by means of char* or std::string. The advantage over char* is that the object is not a pointer so it can be copied without worrying about the pointer semantics. The advantage over std::string is that path names can be combined very easily, even using the intuitive operator '/', for example:

 using boost::filename::path;
 path myhome("/home/user");
 path mydata("data.dat");
 
 path fullname = myhome / mydata;

Other facilities of boost::filename::path is that the extension of the file can be easily extracted and changed.

Traditionally filename are passed as char* in C and std::string in C++. filesystem::path can be converted back and forth from this types. Thanks to the constructors

 const char c[]="file.dat";
 path p(c);
 path p("file.dat"); // from literal;
 
 std::string s="file.dat";
 path p(s);

and back from member functions

 std::string s=p.string();
 const char c[99]=p.string().c_str();

so, passing filenames to old routines is as simple as passing p.string().c_str(). (note that c_str() is a member function of std::string class).

Common path name functions

The operations listed in this section manipulate paths, that is manipulates the string that *represent* files (not the files themselves).

p.extension() // was extension( path ) -> string //returns the extension of a file path (useful for guessing the file type), includes the 'dot'.
p.stem() // was basename( path ) -> string  //returns everything in the filename except for the extension
p.filename() // name of file without path

note that p.string() == p.stem() + p.extension().

p.replace_extension(".newext"); //(was change_extension(const path, string new_extension_w_dot) -> path) // changes the extension of a path name and returns the path

Note that these functions return strings, not paths. Strings can be manipulated as such, and eventually transformed to paths. With the path constructor:

path newp(string);

Also to interface with old routines that take C-strings we could use

p.string().c_str();

the first function produces an std::string with the path the second call productes a C-string (char*).

Common file actions

The operations listed in this section make actual changes on the file in the disk. Most frequently file actions are straight forward once we remember the particular names given, here it is a list of functions with obvious side effects:

create_directory( path );
remove( file );
create_symlink( realfile, linkname );

Some actions could raise exceptions upon errors, if not catch they can abort the program. For example create_symlink raises an exception if the link already exists (but doesn't raise an exception if the destination file does exist).

Directory Iterator

So far, the features described are simply path name manipulations and file actions. On top of that the library gives the opportunity to scan files in a certain directory as if the directory were a typical STL container. The following code shows how to read several files with a given filename in different directories:

using namespace boost::filesystem; // ::directory_iterator ::path, etc
for(directory_iterator it("../datadir"); it == directory_iterator() /*end iterator*/; ++it){
  if(not is_directory(*it)) continue; //ignore if the entry is not a directory
  for(directory_iterator it2(*it); it2 == directory_iterator(); ++it2){
    remove((*it2)/"data.txt");
  }
}

This example removes the file named "data.txt" in the directory "../datadir/*/". There is also a 'recursive_directory_iterator' which avoids the complication of nesting different loops to reach a directory of arbitrary nesting.

This kind of directory iterator should be confused with other type of iterator which exists for a completely different purpose, the 'path::iterator', is able to given a path e.g. ("/a/b/c/d/e/") iterate over the components "a", "b", "c", etc.

Boost.StaticAssert

We can check for certain consistency conditions before some complicated code is called by means of 'assert'. For example:

 #include<cassert>  // usually not needed
 double square_root(double x){
   assert(x>=0);
   ...
 }

Under certain circumstances what we need is something that is checked during compilation, and not at runtime like the previous example. In other words, some conditions on macro definitions, types information and numeric constants must be checked even before the program is executed. It is here were Boost.StaticAssert can be useful.

The macro BOOST_STATIC_ASSERT, defined in Boost.StaticAssert library checks for certain conditions to be met at compile time. Of course the predicate must have the possibility to be evaluated at compile time, so dynamic variables cannot be used. This library should not be confused with Boost.Assert (BOOST_ASSERT).

Although most applications of static assert are for template instantiation, here I will give a simpler example. Suppose that we have two compilation flags -D_HOT and -D_ICECREAM that although it makes sense to use them separately it doesn't make sense to use them together because of the internal logic of the code. HOT is not compatible with ICECREAM. The code needed to ensure that both flags are not defined simultaneously is the following:

 #ifdef _HOT
 #ifdef _ICECREAM
 BOOST_STATIC_ASSERT(0);
 #endif
 #endif

The compilation will fail (at that point) if both compilation flags are defined.

A more complicated example is the following suppose that we want to make sure that the code is never compiled in a machine were the type 'int' has less than 32 digits:

 BOOST_STATIC_ASSERT(std::numeric_limits<int>::digits >= 32);

Note that the condition can be evaluated at compile time (there are no variables involved). In order to use BOOST_STATIC_ASSERT you must #include<boost/static_assert.hpp>.

Due to technical reasons the authors of the library recommend, when possible (specially in header files), to put the condition in a special namespace. That also gives the opportunity to add some meaning to the condition:

#include <climits>
#include <cwchar>
#include <limits>
#include <boost/static_assert.hpp>
 
namespace my_type_conditions {
  BOOST_STATIC_ASSERT(std::numeric_limits<int>::digits >= 32);
  BOOST_STATIC_ASSERT(WCHAR_MIN >= 0);
}

A built-in version of static_assert is supposed to be part of the next C++0x language.

Boost static computations (Metacomputations)

Boost is above all a template library, that means that it concentrates in computations about types and compile time constants rather than in runtime algorithms. Pure C is a programming language that allows to write almost any runtime algorithm, but the language doesn't give facilities to write algorithms that are performed during compilation (i.e. by the compiler). On the other hand, C++ defined a whole set of template syntax that allows us to force the compiler to perform certain operations. Programming templates is called 'metaprogramming' in the jargon, because it is like programming the code generation of a program. In this sense many computations can be performed even before the program is run for the first time.

Boost.MPL is a great example of what it can be achieved by template metaprogramming. Here it is a basic example:

 #include <boost/mpl/int.hpp>
 int main(){
   typedef boost::mpl::int_<8> eight;
   std::clog<<eight::value<<std::endl;
   std::clog<<boost::mpl::next<eight>::type::value<<std::endl;
 }

The output of this program is

 8
 9

The template class mpl::int works like a metatype while 'eight' is a metavariable (and a type).

The operation 8 + 1 is never performed by our program. So, who did the operation? The answer is the compiler it self, in fact the program is exactly (in the binary sense) as having explicitly programmed manually:

 int main(){
   std::clog<<8<<std::endl;
   std::clog<<9<<std::endl;
 }

The operation 'next' is a metafunction that returns a new type which wraps the value that follows the value wrapped by the argument type. The following program, makes the compiler compute the number 17. When executed, the program just prints the constant 17 (but it doesn't need to compute it).

 #include <boost/mpl/int.hpp>
 #include <boost/mpl/plus.hpp>
 int main(){
   typedef boost::mpl::int_<8> eight;
   typedef boost::mpl::int_<9> nine;
   std::clog<<boost::mpl::plus<eight, nine>::type::value<<std::endl;
 }

Boost.MPL have many more facilities, like compiler time containers (e.g mpl::vector) and algorithms (e.g. mpl::max_element).

That it means that we can precompute anything at compile time so that at runtime we save many computations? Can we, for example, precompute the square root of 2 so we don't have to type the constant 1.4142... manually? no, floating point numbers are a limitation of the template metaprogramming and they are not allowed for technical reasons. But otherwise we can compute anything that involves integer arithmetic, including rational numbers and integer linear algebra.

A different kind of limitation of template metaprogramming is that the efficiency of the operations is usually much worst than runtime operations, therefore compilation can take longer as we use more complicated techniques. In fact we can make the template program so complicated that the compilation never ends (most likely fail due to recursion limits set by the compiler).

For example Boost.Units is a compile-time dimensional analysis library with absolutelly no-runtime cost. Dimensional analysis requires to solve linear system of equations with integer coefficients.

Parts of Boost.Math implement static Greatest Common Divisor and Lowest Common Multiple algorithms, i.e. the numbers are computed during program compilation.

Static Rationals

In this section I will describe a library that is auxiliar to the Boost.Units library which is not documented. The Boost.Units library requires an implementation detail of compile time (static) rationals and that implementation can be used if we include the boost/units/static_rational.hpp. Recently Boost accepted Boost.Ratio which implement static rationals as well but it is not yet included in the Boost distribution.

In this example, we define C++ *types* that represent compile time rationals, binary operations can be performed to define new rationals. Values can be inspected at runtime.

 #include<boost/units/static_rational.hpp>
 int main(){
   typedef boost::units::static_rational<6,8> six_eigths;
   std::clog<<six_eigths::Numerator<<"/"<<six_eigths::Denominator<<std::endl;
 
   typedef boost::units::static_rational<1,4> one_fourth;
   typedef boost::mpl::plus<six_eigths, one_fourth>::type result;
   std::clog<<result::Numerator<<"/"<<result::Denominator<<std::endl;
 }

First, the program outputs

3/4

(because 3/4==6/8). Note that it is the compiler through the static_rational.hpp metaalgorithm who computed the non-trivial simplification of the fraction.

Later it prints

1/1

which is the result of 6/8 + 1/4 (or 3/4 + 1/4) which is computed using the mpl::plus operation (and some implementation in static_rational.hpp). Again the non trivial sum and the simplication is done at the time of the compilation.

Also note that, like in the case of mpl::int_c, the class static_rational is essentially empty, all the useful information is contained in the class and not in instances of that class. In fact there may not be instances of the class at all in the program.

Now a simple application. We mentioned earlier that these metaprograms cannot operate on floating point numbers, however we can exploit it anyway to improve floating point performance with minimal user code. Suppose that we want an arbitrary exponentiation of a number, to a power that is known a priori but that can change later while we are writing the code. A typical line of code for this purpose looks like this:

 std::cin>>base;
 double const power = 5/6.;
 double result=std::pow(base, power);
 std::cout<<result;

so far so good, there are not many ways to improve the code, however we may want to change the value of p manually and yet obtain the maximum performance possible. If p=2 or p=1/2 calling std::pow is a waste, since it is not optimal. (Even if std::pow contains an 'if' conditional statement the code is not optimal since it will have to check for a condition at runtime.)

To obtain the maximal performance we will have to change our code manually depending on the value of 'p'.

 std::cin>>base;
 /* arbitrary case */
 //double const power = 5/6.;
 //double result=std::pow(base, power);
 /* square */
 double result=base*base; //more efficient than std::pow(base, 2);
 /* square root */
 //double result=std::sqrt(base); //more efficient than std::pow(base, 0.5);
 std::cout<<result;

Not very elegant, i it? Here is were our solution comes in. We can write a template function parametrized by the known-at-compile-time rational

 #include<cmath> //for std::pow
 template<class RationalPower>
 double smart_pow(double const& base){
   return std::pow(base, (double)RationalPower::type::Numerator/(double)RationalPower::type::Denominator);
 }
 using boost::units::static_rational;
 template<> double smart_pow<static_rational<1,2>::type>(double const& base){
 	return std::sqrt(base);
 }
 template<> double smart_pow<static_rational<1,1>::type>(double const& base){
 	return base;
 }
 template<> double smart_pow<static_rational<2,1>::type>(double const& base){
 	return base*base;
 }

then we can use the function with the confidence that the optimal function is always called, because these cases are handled efficiently. This can approach can even handle cases were the numerator and the denominator are calculated independently by some other computation.

 using boost::units::static_rational;
 double base;
 std::cin>>base;
 double result=smart_pow<static_rational<1,2>::type >(base); //automatically calls std::sqrt
 std::cout<<result;

we don't have to worry about defining the case 4/2 or 2/4 or 9/9 since the static_rational metatype takes care of the simplification and reduces the case to 2, 1/2 and 1 respectively.

Boost.Fusion

Boost.MPL and Boost.Tuples are library without which doing template programming would be difficult. MPL was designed to be the STL (Standard Template Library) of compile time, for example for having containers of types. On the other hand Tuples was the library for heterogeneous (static) containers.

I will skip MPL container and Tuples and go straight to Boost.Fusion which combines the best of both libraries and it is more general. Boost.Fusion is a library of compile-time/runtime containers, that is, compile-time objects that contain other compile-time objects. In its turn each static element (types) of the container is also a placeholder for a runtime object.

#include <boost/fusion/container.hpp>
#include <boost/fusion/include/container.hpp>
#include<string>
#include<iostream>
int main(){
	using namespace boost::fusion;
	vector<int, char, std::string> stuff(1, 'x', "howdy");
	int i         = at_c<0>(stuff);
	char ch       = at_c<1>(stuff);
	std::string s = at_c<2>(stuff);
	return 0;
}

In this example, the object stuff is a boost::fusion::vector (analogous but different from to std::vector), as a type it contains a sequence of types and as a runtime object it contains different types, the int 1, the char 'x' and the string "howdy". Notice how the elements of the fusion::vector are accessed by the fusion:at_c<N> funcion and how therefore N must be a compile time constant. (This is the reason stuff[N] notation wouldn't work). So far, the usage is strictly equivalent to Boost.Tuple's tuples<...> type.

In parallel to the std::vector, the container fusion::vector has a size function which returns the number of elements. Now here it is where the interesting happens. As mentioned before there is a runtime aspect and a compile-time aspect of the object. There are two size functions, one returns a runtime variable and operates on instances of the class.

typedef boost::fusion::vector<int, char, std::string> stuff_type; 
stuff_type stuff(1, 'x', "howdy");
std::cout << size(stuff) << std::endl; //return a size of 3

The other version statically operates on the type and returns a compile time mpl::int_, a compile time integer can be obtained from this mpl::int_ by the value member. MPL basic metatypes were described in a previous section.

std::cout << stuff::size::value << std::endl;

The difference between the two is obvious when we want to pass the size of the fusion::vector (which is known at compile time) as the template parameter.

boost::array<double, stuff_type::size::value> a;

Now going back to the typical C++ class, you can see that since fusion::vector can hold an arbitrary number of types and elements then any class with member fields can be replaced by a fusion::vector.

class model{
 double a;
 int n;
 ...
 double calculate() const{return pow(a, n);}
};

class model : fusion::vector<double, int>{
 double calculate() const{return pow( at_c<0>(*this), at_c<1>(*this) );}
};

quite a generalization, now another metafunction can inspect the parameter types of "model". The price to pay is that we lost the nice names of the parameters (a and n). If we want to give names to the parameters we should use fusion::map.

fusion::map

fusion::map is analogous to std::map. In std::map we have values acceded by a key, now the keys are types. Since the types can be arbitrary we can use this mechanism to assign "names" to the keys. In practice this key type can be auxiliary.

struct a;
struct n;

class model : fusion::map<

Boost Numerics

Boost is not intended specifically to be a numerical library, although there are some useful tools scattered within the library. The numeric capabilities random number generation, linear algebra, interval arithmetics, special functions. For numeric applications, it worths also looking at the well known C library GNU Scientific Library (GSL) which is documented in great detail.

Boost.Interval

Boost.Interval provides us with a support for manipulation of intervals and interval arithmetic. It can be used to represent simple numeric intervals (including the empty, infinite and semi infinite intervals) or to do error propagation for example.

To create an interval choose a number type and the lower and upper bounds

#include<boost/numeric/interval.hpp>
boost::numeric::interval<double> unit(0., 1.);
boost::numeric::interval<double> two(2.); //interval contains one point
boost::numeric::interval<double> w = boost::numeric::interval<double>::whole();

if the, yet to be, lower limit is larger than the upper limit an exception (runtime error) is raised. Different properties of the interval can be accessed by the functions lower, upper, median, width (or norm). Most interesting intervals can be manipulated by arithmetic operators (+,-,*,/) with numbers or with other intervals.

using boost::numeric::interval;
interval<double> i1 = unit*2.; //duplicates the unit interval [0.,2.]
interval<double> i2 = unit-1.; //shifts the unit interval to the left [-.5,.5]

A curious (and correct) result of the interval arithmetic is that whole()*whole()==[-inf,inf] but pow(whole(),2) == [0,inf] A semi-infinite interval can be generated (for types that have infinity),

interval<double> si(0.,std::numeric_limits<double>::infinity());

Some transcendental functions (exp, log, sin, etc) can be applied to intervals.

Intervals can be printed to the screen if we include an extra header file,

#include<boost/numeric/interval/io.hpp>
...
std::cout << unit << std::endl;

To check whether an element belongs to the interval use the in function

if ( in( 0.5, unit) ) cout<<"0.5 is in "<<unit;

Here it is an example http://codepad.org/nwo9Z2Be.

The library seems simple at first sight but the details (and the implementation) are quite tricky, for example regarding the proper rounding of the bounds or whether empty intervals are allowed or not. Some special cases are handled in the library by using special (template) options (policies). Notes on the design can be found here.

Recently, another interval library, called Interval Container Library (ICL) was accepted in Boost, with more abstraction, open/closed, multidimensional, and interval sets (containers specific for intervals).

Boost.Random

Boost.Random is a library that allows us to pick random numbers from specific statistical distributions. It has the flexibility to separate the abstraction of the distribution and the randomness source. For example, you can choose a deterministic random source (pseudo-random generator) and plug in it to a specific distribution from where you can pick up numbers one at a time. For some systems Boost can even provide non-deterministic randomness from external devices and then plug it to your desired abstract distribution (not covered here).

A typical usage is the following

#include<boost/random.hpp>
...
boost::mt19937 mt; //this is the randomness source
boost::uniform_real<> rm1p1(-1.,1.); //this is the abstract distribution 
boost::variate_generator<boost::mt19937&, boost::uniform_real<> > die_real(mt, rm1p1); //glue randomness with distribution
clog<< die_real() << endl; // pick up a random real from distribution

as you can see, the distribution (uniform in the interval [-1:1) ) is independent of the randomness source (Mersenne Twister #19937). The library allows to specify a certain set of predefined and user defined distribution.

This library (with minor changes) will become part of the standard library of C++0x.

(for complete code of examples: http://codepad.org/Uj56naPF).

Normal distribution

We can change the distribution to a normal one (gaussian) with a certain mean and deviation

using boost::mt19937; using boost::normal_distribution; using boost::variate_generator; 

normal_distribution<> n_10_4(10., 4.);
variate_generator<mt19937&, normal_distribution<> > die_normal(mt, n_10_4); //glue
clog<< die_normal() <<endl; //pick a random real from normal distribution

As a side note, it is well known that when we generate a normal distributed random number from a uniform one, we get a second normal random number for the same price. In other words, generating normal numbers one by one is less efficient than generating pairs. The Boost implementation of normal_distribution takes advantage of this but gives the illusion that the numbers are generated one at time, internally only pairs are generated. The trick is that the second number is cached and not calculated in the next round (i.e. second call of die_normal()). In the third call a new pair is generated, the first number of the pair is returned and the second one is stored for the following call.

The omitted template parameter of the distribution is the type of the returned value, by default it is double but it can be other floating point number (namely float). In fact other useful distributions can return multiple numbers. For example, the following code returns two numbers corresponding to a random point in a 2-sphere:

 using boost::rand48; using boost::uniform_on_sphere;
 rand48 rand; 
 uniform_on_sphere<> us(2);
 variate_generator<rand48&, uniform_on_sphere<> > die_place_in_the_world(rand, us);
 clog<< v2[0]<<" "<<v2[1]<<endl;

v2 is normalized (v2[0]*v2[0]+v2[1]*v2[1]==1). The dimension of the sphere can be arbitrary. The example also shows how to change the source of randomness (a.k.a. the underlying random number generator). In this case we change a particular Meressene Twister generator (mt19937) by a particular linear congruential (rand48).

Random Sources

The list of possible random generators is given here. For extra control on the number generator, the parameters can be passed as template parameters, for example mt19937 and ecuyer1988 are internally defined as

 typedef mersenne_twister<uint32_t,624,397,31,0x9908b0df,11,7,0x9d2c5680,15,0xefc60000,18, 3346425566U> mt19937;
 typedef linear_congruential<int32_t, 40692, 0, 2147483399, 0>, 0> ecuyer1988;

All the random generators mentioned are deterministic and return the same sequence after each run unless the seed changes. To change the seed the object can be initialized with an integer that represents the seed or it can be changed at any point with the .seed member function.

 boost::mt19937 mt(123); //this is the randomness source
 ...
 mt.seed(321);

Distributions

A list of possible distributions is given here. Since it is very common, it worths mentioning that integer uniform distributions can be also defined.

 uniform_int<> six(1,6);      // distribution that maps to 1..6
 variate_generator<boost::mt19937&, uniform_int<> > die(mt, six); 
 int x = die();                      // simulate rolling a die

An interesting example is the Bernoulli distribution, whose associated generator returns a bool (true or false) with a given probability.

 bernoulli_distribution<> mostly_true(0.75); //will return true 75% of the time
 variate_generator<boost::mt19937&, bernoulli_distribution<> > acceptance(mt, mostly_true);
 if(acceptance()==true) call_accept(); else call_reject();

Another distribution of integers is the binomial distribution (related to the Bernoulli distribution)

 binomial_distribution<> fortydrops(40,0.5); //a fair coin will be tossed 40 times.

which returns the (random) number of face outcomes of 40 coin tossings. (This distribution is undocumented.)

Note on syntax (`&`)

The ampersand (&) at the end of the random source type in the template parameter (boost::mt19937&) is important if the function where the line is, is reentrant (called many times). Because it will guaranty that the seed is not reset between calls. (It is possible that we always want the same random sequence when calling a function, but that is not very common.)

It is also important not to put it when the random generator is declared in one line.

using namespace boost;
variate_generator<mt19937, uniform_real<> > die_real(mt19937(), uniform_real<>(-0.2,0.2));

Boost Math Constants

(Not to confused with Boost Units Constants.) This is not a library per se, is an internal part of Boost.Math in general. It just defines a few mathematical constants numerically. The advantage compared to other libraries is that it provides the numerical constants with arbitrary type (based in a hard-coded 100 digit expansion).

For example if we want pi to 'float' accuracy we call

#include <boost/math/constants/constants.hpp>
float pi_float = boost::math::constants::pi<float>();

if we want the truncation at 'long double' precision we call

using boost::math::constant::pi;
long double pi_ldouble = pi<long double>();

the translation to the right precession is made during compilation.

Besides pi, there are several other constants predefined template as listed in the reference page:

Boost Math Special Floating Points

There some tricks to determine special floating point conditions (these include 'nan', 'inf' and zero). Unfortunately there is no standard way to do it since it depends on the system. In the systems in which such detection is possible Boost implements a series of functions to determine such special conditions:

#include<boost/math/special_functions/fpclassify.hpp>
using namespace boost::math
isinf  
isnan

It can also detect normal conditions

isfinite
isnormal

The difference between the last two is that zero (0) is not considered a normal floating point (but 0 is finite).

Due to compatibility with macros defined elsewhere it is recommended to enclose the function in parenthesis

#include<boost/math/special_functions/fpclassify.hpp>
(boost::math::isnan)(x)  // test that the value held by x represents a number, for example 5.2, 0., or inf.

Additionally, the function fpclassify<> returns an integer code (FP_ZERO, FP_NORMAL, FP_INFINITE, FP_NAN, FP_SUBNORMAL) that tells what kind of floating point is passed, as documented here.

(The distinction between -inf and +inf can still be done with the comparison to zero).

Note that the best way to filter 'inf' and 'nan' altogether is to test agaist 'isfinite' condition: 'inf' is not finite and neither is 'nan'. (technically, 'nan is not infinite either).

Boost.Accumulators

During a computer simulation it is often useful to 'collect' or 'accumulate' statistics. This accumulated statistics can be reported at the end of the simulation or used immediately on-the-fly to monitor or check for convergence or steady states. For example, the accumulated statistics can be the mean, variance and extremes of a certain variable.

Writing code that does this is indeed trivial but since it is not the main function of the code it can interfere with the normal flow of the code, we have to introduce many new variables that keep track of the statistics, such as counter, total sum, sum of squares, etc. Then these quantities have to be combined in certain ways when reporting the results.

The Boost.Accumulators library provides a consistent way of collecting this information. Accumulators can also keep track of extreme values (min/max), element count and other user defined features.

The first step is to create an accumulator object with certain 'features'. For example if we want to have an accumulator that reports the mean value, the second moment and min and max values we declare

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/mean.hpp> // or other stat features
...
using namespace boost::accumulators;
accumulator_set<double, stats<tag::mean, tag::variance, tag::min> > acc;

The first template parameter is the type of the data elements that will be feed to the accumulator, the second argument is the list of features or statistics that the will be accumulated. Note that the list of features is specified at compile time, therefore the library is able to know the number of parameters that this particular accumulator will store during book-keeping. For example, if the only feature is the 'mean' value the sum of squares is not necessary and never computed (only the count and the sum of elements is stored). On the other hand, if tag::min or tag::max are the only features it is not necessary to keep the count internaly, etc.

As a second step we need to actually accumulate the information, this is done simply by enclosing the data in the parenthesis operator of the accumulator, for example in a loop

for( ... ){
  double data_point = function(...);
  acc(data_point);
... 
}

As third and final step and after some (or all) data is collected we need to extract the final result that the object was comited to accumulate. To extract the information we must apply a function with the name of the statistics we wanted,

std::cout<<"mean: "<<mean(acc)<<std::endl;
std::cout<<"variance: "<<variance(acc)<<std::endl;

only statistics or features that were declared are allowed to be applied to the accumulator, otherwise a compile error happens.

std::cout<<"max: "<<max(acc)<<std::endl; // <-- compile error, max feature was not requested when acc was declared

The accumulators also support covariance, i.e. how two or variables change together.

Special numerical operations (boost::numeric::functional)

Boost.Accumulators and statistics in general assume a set of numerical operations on the accumulated objects that may be larger than the number of operators defined for that object, or may change the meaning of defined operations in the context of statistics. For example, if we accumulate integers, what is supposed to be the average of integers, is the average a real number? or another integer? In any case we should be explicit because otherwise the native C operations will perform integer-division (where 3/2==1).

For Boost.Accumulators, these special operations are defined in the namespace boost::numeric::functional.

Another example is an object that doesn't carry information on how to divide by an integer, for example std::complex<double>. In these cases we have to extended the operations available on those objects. An interesting case that doesn't define division by integer is boost::units::quantity<Unit>.

In this example, we specify how the division by an integer is carried on:

namespace boost {namespace numeric{namespace functional{
	using namespace boost::units;
	template<class Unit, typename Y> struct quantity_tag{};
	template<class Unit, typename Y> struct tag<quantity<Unit,Y> >{
		typedef quantity_tag<Unit,Y> type;
	};
	template<typename Left, typename Right>
	struct average<Left, Right, quantity_tag<typename Left::unit_type, typename Left::value_type>, void>{
	typedef Left result_type;
	result_type operator()(Left & left, Right & right) const{
		return left/(typename Left::value_type)right;
       }
   };
}}}

(This is a technique is known as 'static tag dispatching'.)

Fortunately this step is not necessary for std::complex<double> and std::vector<double> (taken as a vector in a linear space) if we include these adaptor headers before the accumulator headers:

#include <boost/accumulators/numeric/functional/complex.hpp>
#include <boost/accumulators/numeric/functional/vector.hpp>

These headers tell the library how to average complex numbers and vector of numbers.

Boost.Format

The C-way to print formatted variable numbers is printf("...format string...", a, b, c, ...). This function call tends to have error since we have to make sure that the format string agrees with the actual type of the variable. In other words, printf is not type-safe. The C++ (which still allows to call printf) changed the preferred syntax to the std::cout (and std::cerr) shift(<<) operator:

std::cout<<"text "<<float_number<<" text"<<std::flush<<" more text "<<integer_number<<std::endl;

The user doesn't have to worry about the type of the variable that is printed and the operator<< can be overloaded to print user defined classes. This was, in my opinion, a great improvement. Yet, sometimes it is necessary to specify the format in a way that is constructed at runtime or the format is specified in a separate part of the code. Enter, Boost.Format.

Boost.Format allows to specify printing format in a similar way as printf used to do, with similar and even greater flexibility, But in a way that is type-safe.

The format is specified by a string were the entries are represented by a number (starting at 1) enclosed in percent symbol, like %1%, %2%, %3 etc. After the format is specified the variables (which can be any type in principle) are passer via the operator % (not the string %). For example

#include<boost/format.hpp>
...
cout<< boost::format(" %1% | %2% | %3% ") % a % b % c  << "rest of the line" <<endl;

If the number of variables passed after 'operator%' is different from the maximum fields index present in the format string the program will give an error (i.e. raise an exception of type 'boost::io::too_few_args' or 'boost::io::too_many_args' respectively).

So far, in the example, the use are no different than the normal operator<< in C++. The first difference to note is that

1. The format string can be specified at runtime

std::string frmt;
... //assign frmt from user input, e.g. frmt = "%1% | %2%";
cout<<boost::format(frmt) % a % b ;

2. Format can repeat fields

cout<< boost::format("%1%, %2% | %1%") % a+b % c ; //a+b is printed twice, e.g. '4, 5, 4'

The advantage with respect to printf is that the boost::format is flexible but still type-safe.

3. For numeric types boost::format (like printf) can specify the format of the numeric output.

cout<< boost::format("%1$+05u") % 234; //prints '+0234'

For variables of any type the field number is enclosed in '%', for numeric fields with specific format the field number is enclosed by '%' on the left and '$' on the right followed by the numeric format string. This numeric format string follows the printf conventions like in these examples:

%[flags][width][.precision][length]specifier

For example '%1$+05u' prints an integer (unsigned or int) field using (at least) '5' characters (sign included) completing with left zeros '0'. Plus symbol '+' means that the sign is always specified even if the number is positive. If the field is not an integer (for example, it is double or string) the numeric format is ignored and no error is reported. If the integer needs more than three characters it will use them.

Invalid format strings give an error at run time. So it is a good idea to test the formatting in a small program, for example here http://codepad.org/BINppZ4i.

In this example, we generalize boost::lexical_cast to convert variables to strings in a certain format:

#include<boost/format.hpp>
#include<boost/lexical_cast.hpp>
namespace boost{
template<class Target, class Source>
Target format_cast(std::string const& strFrmt, Source const& s){
  Target t;
  std::stringstream ss;
  ss << boost::format("%1"+strFrmt) % s;
  ss >> t;
  if(not ss) throw boost::bad_lexical_cast();
  return t;
}
}

use as

unsigned u = 4;
std::string str = boost::format_cast<std::string>("02u", u); // str == 04;

Boost.Units

Most physical quantities have units, if we don't take proper care of units, bad things will happen. When programming we usually don't care about the units because we tend to work in consistent systems of units. Sometimes, we need to output some variables in a different system of units that the main program is working on. We do this by doing some adhoc conversion with some multiplication in the output. Other times the input is an different sets of units.

To give an example, when working in atomic problems it makes complete sense to program in atomic units (au) (Hartree, Bohr, etc.). Sometimes we would like to use input parameters in Angstroms instead or more often we need to output the stress tensor in Pascals (which is not au).

Physical units are some sort of typing system of physical model, under some physical models it doesn't make sense to sum length and time. On the other hand, if we are working in a relativistic model it does make sense to sum lenght and time (and of course since there is no disticition this is called space-time). So this "typing system" is not unique an depends on the model we are using.

Once we stablish a model by can give this dimensional information to the program. The ad hoc way is to define conversion factors all over and add comments or same specific names to the variable. But this doesn't guaranties that at some point we will sum and energy for example. This function calculates the area

 double area(double x, double y){return x*y;}

here we implicitly assume that the two doubles in the argument are lenghts (L), and the return type is an area (L*L) but the program doesn't know. So the subroutine is really a multiplication routine, not an area-calculating routine (it is just an interpretation). Somebody can without your knowledge can modify the routine and return x+y and you will interpret the result as an area.

So, what is the solution, in C++ we could create classes whose instance will keep track of the units. This runtime approach is very inefficient, suppose we have an array of 1000 length (e.g. measurements) we don't want each of these variables to have memory used with the redundant that all of them are lengths. We also want to a routine that is as efficient as the routine that works with built-in double.

The solution is to store the dimension of the variable in the type it self. Then all the dimensional correctness is enforced at compilation-time and runtime efficiency is preserved.

The idea is more or less to define type that will have dimensional information in such a way that:

 double_area area(double_length x, double_length y){return x*y;}

Manually writting all the possible resulting dimensions, handling products sums, etc, is not practical. Here is where Boost.Units comes to the rescue. Boost.Units is a very complicated library that enforces dimensional correctness. Dimensional analysis requires a linear algebra solver, this is implemented in Boost.Units by means of template technology. The official manual can also be of interest.

Definining Units

Systems of Units

To use Boost.Units we don't necessarelly need to define a system of units, most common systems SI and CGS ara already defined. But in order to give the complete example here it is the code to define the atomic units (au) system. This section gives an idea of how the library works but it can skipped.

#include <boost/units/base_unit.hpp>

#include <boost/units/physical_dimensions/mass.hpp>
#include <boost/units/physical_dimensions/length.hpp>
#include <boost/units/physical_dimensions/electric_charge.hpp>
#include <boost/units/physical_dimensions/angular_momentum.hpp>
#include <boost/units/physical_dimensions/energy.hpp>

#include <boost/units/make_system.hpp>

#include <boost/units/io.hpp> //do not remove: '...has initializer but incomplete type...' error

namespace boost{
namespace units{
namespace au{

	using std::string;

	// taken from http://en.wikipedia.org/wiki/Atomic_units#Fundamental_units
	struct electron_mass_base_unit : base_unit<electron_mass_base_unit, mass_dimension, 1>{
		static string name()	{return ("electron_mass");}
		static string symbol()	{return ("me");}
	};
	struct bohr_radius_base_unit : base_unit<bohr_radius_base_unit, length_dimension, 2>{
		static string name()	{return ("bohr_radius");}
		static string symbol()	{return ("a0");}
	};
	struct elementary_charge_base_unit : base_unit<elementary_charge_base_unit, electric_charge_dimension, 3>{
		static string name()	{return ("elementary_charge");}
		static string symbol()	{return ("e");}
	};
	struct reduced_planck_constant_base_unit : base_unit<reduced_planck_constant_base_unit, a ngular_momentum_dimension, 4>{
		static string name()	{return ("reduced_planck_constant");}
		static string symbol()	{return ("hbar");}
	};
	struct hartree_energy_base_unit : base_unit<hartree_energy_base_unit, energy_dimension, 5>{
		static string name()	{return ("hartree_energy");}
		static string symbol()	{return ("Eh");}
	};	

	typedef make_system<
		electron_mass_base_unit,
		bohr_radius_base_unit,
		elementary_charge_base_unit,
		reduced_planck_constant_base_unit,
		hartree_energy_base_unit
	>::type system;

	typedef unit<dimensionless_type,	system> dimensionless;

	typedef unit<mass_dimension,		system> mass;
	typedef unit<length_dimension,		system> length;
	typedef unit<electric_charge_dimension,	system> electric_charge;
	typedef unit<angular_momentum_dimension,system> angular_momentum;
	typedef unit<energy_dimension, 		system> energy;

	static const mass		electron_mass;
	static const length 		bohr_radius, 		bohr;
	static const electric_charge	elementary_charge; 
	static const angular_momentum 	reduced_planck_constant;
	static const energy 		hartree_energy, 	hartree;

}
}
}

The code seems complicated but it has its logic. There are four important blocks. The first is basically a table of names and symbols for the fundamental units in that system, with some abstract information on the physical dimension. The symbols will be used when printing the values to the screen. The numbers on the left are conventional numbers to identify the units internally, any positive number is ok as long as it doesn't clash with other units defined, negative numbers are reserved. (For example, the SI system --included in the library-- uses negative numbers.) The make_system line creates the system of units, and the following block gives short type names to the physical dimension of the fundamental units. Finally last block defines unit constants (magnitudes with value 1) in the system of units. Synonyms can be defined here very easily. (For example, hartree_energy and hartree).

Free standing units

Sometimes we need to define units that are not part of a larger system of units. In this way we can define units such as electron volts (eV). The quantities with such units will print nicely with the symbol 'eV' after it. At the same time it is useful that these quantities convert automatically to units with the same dimensions in a system:

namespace boost{namespace units{
namespace atomic{ //optional, this free unit can be part or not of the namespace of a system
struct electron_volt_base_unit : base_unit<electron_volt_base_unit, energy_dimension, 43289 /*unique positive number*/ > {
   static const char* name() { return "electronvolt"; }
   static const char* symbol() { return "eV"; }
};
typedef electron_volt_base_unit::unit_type electron_volt_unit;
static const electron_volt_unit electron_volt, eV;
}
}}
BOOST_UNITS_DEFINE_CONVERSION_FACTOR(
	boost::units::atomic::electron_volt_base_unit,
	boost::units::atomic::energy, 
	double, 0.03674932 // 1 eV = 0.03...E_h
);
BOOST_UNITS_DEFAULT_CONVERSION(
	boost::units::atomic::electron_volt_base_unit, 
	boost::units::atomic::energy
);

use as

quantity<atomic::electron_volt_unit> ene = 1.*atomic::eV;
cout<<(double)quantity<atomic::dimensionless>(ene/atomic::hartree)<<endl; //automatic conversion to dimensionless

Scaled units

All units (including free standing units) can defined scaled counterparts, moreover, by default a standard SI prefix (kilo/k, Mega/M, Giga/G) is added to the name of the variable if the scale is a standard power of ten (i.e. 10, 100, 1000, 1000000, but not 10000). Supported prefixes are zepto (-21 zeros), atto (-15), pico (-12), nano (-9) ... peta (15), exa (18), zetta (21), yotta (24). (Hella (27) not supported but can be added easily :) )

typedef make_scaled_unit<si::pressure, scale<10, static_rational<9> > >::type gpa_unit;
static const gpa_unit gigapascal, GPa; //use as quantity<gpa_unit> P = 10.*GPa;

(si::pressure is defined in boost/units/system/si/pressure.hpp).

automatically prints to '10 GPa' ('Pa' --for pascal-- string is defined in boost/units/system/si/io.hpp, the 'G',--for giga-- is added by the scaled unit character).

A similar code can be used to define MeV (megaelectronvolt) automaticaly based on the definition of eV in the previous section.

Defining physical quantities

Quantities with Fundamental Units

Using fundamental quantities in a given system is almost trival, first we will easy the notation by specifing that we will use Boost.Units intensively and the system of units.

#include<boost/units.hpp>
using namespace boost::units;  //for quantity

The we tell the compiler that a certain quantity has physical dimensions and how to store the actual value.

quantity<au::energy, double> a;

so defined 'a' will have the value 1 in the fundamental units of the system (Hartree). Note that at this point the program doesn't know what a Hartree is, it is just some abstract unit of a certain dimension called "energy". The we can assign a value

a = 30.*au::hartree;

We can print the values, by using the usual cout<< "a = "<< a <<endl;. The output will read

'a = 30 Eh'

where "Eh" is the symbol we defined in our table.

In the same way we can define a quantity with unit of length

quantity<au::length> b = 5.*au::bohr;

Note that we can omit 'double' since this is assumed to be the default. We declared 'a' to have units of energy, and this can not change in the future. Here it is the first implicance of using Boost.Units, if we try to do

a = b; // compile error here

which is wrong, we will get an error... from the compiler! We don't have to wait until the program is executed to realize what we just did. In fact the program will not compile until we fix this dimensional inconsistency. Here it the main difference with the typical procedure, we could have used the type double for a and b; but then the previous line would have been completelly ignored.

Although the previous line was wrong, different can be combined in correct ways for example

 std::cout<< "a * b = "<< a * b << std::endl;

will print 150. Eh a0. As expected, other combinations, like a + b will fail to compile.

Quantities with Derived Units

Obviously that to preserve the dimensional consistency de result of 'a / b' has to have units of force. So, if we want to store the result we need to declare a new variable with units of force, this is achieved that to a the Boost.Units metafunction

 quantity<unit<force_dimension, atomic> > c = a/b;

or we can shorten the syntax with using

 typedef unit<force_dimension, atomic> atomic_force_unit;
 quantity<atomic_force_unit> c = a/b;

There are many physical dimensions predefined in boost/units/physical_dimensions/. For example: acceleration, area, magnetic_flux, stress, surface_tension, velocity, volume and many others.

Still, there maybe some quantities that we need to define that don't have a name. For example if we need something with units of inverse energy times inverse volume we don't have a predefined dimension for that, although this is the units of the electronic density of states. In this case we need to define the quantity ourselves. Boost.Units uses the syntax of Boost.MPL to express such composition. For example

 using mpl::times; using mpl::divides;
 typedef boost::mpl::divides<dimensionless_type, times<lenght_dimension, volume_dimension>::type>::type dos_dimension;
 typedef unit<dos_dimension, atomic::system> atomic_dos_unit; //or one_over_hartree_Bohr3
 quantity<atomic_dos_unit> mydos = 1./en/vol;

A big shortcut could that avoid all the previous, be to use

 BOOST_AUTO(c, a/b); // or 'auto c=a/b;' in C++0x (compile with 'g++-4.4 -std=c++0x')

to let the compiler the compiler "detect" the correct type of 'c' (which can have a very complicated specification) but this can lead to problems as we loose track of the units of the quantities involved.

Mathematical Funtions on Physical Quantities [`boost/units/cmath.hpp`]

Many special functions for floating point numbers (not quantities) are defined in the header file cmath for example:

 #include<cmath>
 ...
 double a = sqrt(b);
 double b = pow(b, 3); //see below why this can not be used with units

In the same way Boost.Units implement its own cmath header (called boost/units/cmath.hpp) to take care of unit correctness:

Square Root (sqrt)

 #include<boost/units/cmath.hpp>
 ...
 quantity<unit<area_dimension, au::system> > square_area = 64*au::bohr*au::bohr;
 quantity<unit<lenght_dimension, au::system> square_side = boost::units::sqrt(square_area); // square_size = 8*au::bohr

or much shorter

 using boost::units::sqrt;
 auto square_side = sqrt(square_area);

In general the 'using' statement is not even necessary since the namespace of sqrt is deduced from the argument type. 'auto' is a keyword of C++0x, if it not used then we have to explicitly declare the quantity with the right units (and dimension).

Integer Power (pow)

In the case of power it is different because to track units at compile-time the value of the integer power must be a constant (i.e. a value known at compile-time), therefore it can not be passed as a second argument, instead it is passed as a template argument

 auto cube_volume = boost::units::pow<3>(cube_side);

Rational Power (root and pow)

It is not uncommon that sometimes rational power of units are used in intermediate calculations, the first example is sqrt above. More generally:

 auto cube_side = boost::units::root<3>(cube_volume);

The most general case is handled with

 auto cube_face = boost::unit::pow< boost::unit::static_rational<2,3> >(cube_volume);

or for short:

 auto cube_face = pow<2>(root<3>(cube_volume));

Trigonometric functions (sin, cos, atan, atan2, ...)

Trigonometric functions are implemented for arguments for which its units are of plane_angle_dimension. Using units for angles can add an extra layer of type checking but it can be counter-intuitive, specially if you are already used to work in radians. If the argument of these functions is dimensionless then the standard (e.g. std::sin is called).

Absolute value (fabs)

quantity<si::length> distance = fabs(x1 - x2);

Use existing numerical functions, subvert dimensional checking (`::from_value` and `.value`)

An arbitrary user defined function

double scale(double x, double factor){ ... ; return result;}

does not work by default with quantities, unless the quantity is dimensionless. Even if we managed to make it work by some kind of trick we would still like to keep track of the resulting dimension, which may o may not be of the same units as the input.

For simplicity let's assume that the output must have the same dimension as the input. The we can overload the function in this way and internally call the good old function

quantity<si::length> scale(quantity<si::length> x, double factor){
 return quantity<si::length>::from_value( scale(x.value(),factor) );
}

Note that 'from_value' and 'value' subvert the dimensional checking but can be nonetheless necessary for forcing the internal arithmetic.

The function can be generalized to arbitrary quantity dimension via templates. Even more challenging is to determinine (also from template programming) the correct type for the returned quantity. For this metafuctions such as power_typeof_helper<Unit, static_rational<P,Q> >::type and mpl::times can be useful.

Display internal unit types for debugging (`simplify_typename`)

Boost Units relies in creating an extremely complicated type system, which in principle should be transparent to us since as long as we know the dimensionality and the system units we are working with. In general we never refer to the complicated types that the library generates. In reality, sometimes we need to know what is the exact type of a certain variable, for example for debugging. There is a function called simplify_typename that prints the exact type of the variable (and gets rid of the ubiquitous 'boost::units' string).

#include<boost/units/io.hpp>
...
quantity<atomic::volume> v(4.*pow<3>(atomic::bohr));
cout<<boost::units::simplify_typename(v)<<endl;
cout<<v<<endl;

In this example we know that the exact type of the variable 'v' is

quantity<unit<list<dim<length_base_dimension, static_rational<3l, 1l> >, dimensionless_type>,   homogeneous_system<list<atomic::electron_mass_base_unit, list<atomic::elementary_charge_base_unit, list<atomic::reduced_planck_constant_base_unit, list<atomic::coulomb_force_constant_base_unit, list<atomic::boltzman_constant_base_unit, dimensionless_type> > > > > >, void>, double>

and its value is

4 (a_0^3)

Internally it is based on abi::__cxa_demangle from abicxx.h, which is a more general way to print type names at runtime. However 'simplify_typename' also simplifies the names by removing the 'boost::units' string.

Display quantities

Boost.Units supports pretty well the display of quantities with units:

cout<< "E = " << E << endl; // prints E = 3.55791e-12 m^2 kg s^-2

Depending on the system units can try to simplify the units before printing; this done if we #include<boost/units/systems/si/io.hpp>, in this case the printing is simplified to

 ...                        // prints E = 3.55791e-12 J

(J is for Joule unit).

Sometimes is necessary to separate the actual numerical value and the unit tag of the string representation of a quantity, for example when printing data files. We want columns of numbers to print without the units, eventually the unit string can go only once at the header of the column. So achieve this, suppose that we have a

quantity<si::energy> E = 4.*si::joules;

then we can print the numerical value and the unit separatelly in the following way:

cout<< E.value() << " * " << si::energy() ;
// or                     << quantity<si::energy>::unit_type();
// or                     << E_type::unit_type();  //after typedef typeof(E) E_type;
// or                     << symbol_name(si::energy())

Note the parenthesis () after the unit type.

(The library support also Boost.Serialization.)

Unit Conversions

Units is not just about keeping dimensional consistency; it is also about being able to convert between different unit systems.

You can notice that we use the unit system prefix when declaring a quantity, for example in atomic units:

quantity<au::energy> E1 = 10.*au::hartree;

or in SI system

quantity<si::energy> E2 = 20.*si::joules;

We may wonder why we need to specify the unit system in the right hand side. We would like to abstract ourselves from the unit system, after all, both are energies! That is correct in an ideal world, in practice we work with numbers of finite precision therefore storing an atomic quantity in SI units leads to a very small number that can even go below the precision limit or at least affect the effective precision. This is the practical reason for which we use different unit systems in the first place.

In some sense giving information on the unit system to the quantity is telling the system which order of magnitude are que quantities. Nevertheless sometimes we need to convert units.

Ad hoc transformation

There is simple way of transforming quantities without much additional code. Suppose that we have a data file with energies and we know those energies are in electron volts (eV). But our system of choise is Hartree, we can express this information in the code by just defining a quantity called eV in our unit system. In this case eV is not a unit but a physical constant.

 const quantity<au::energy> eV = 0.03674932540*au::hartree; //http://physics.nist.gov/cgi-bin/cuu/Value?evhr
 double numerical_input; 
 cin>>numerical_input;                                      //e.g. read from file
 quantity<au::energy> E = numerical_input*eV;              //this tells that original data was in eV

Now for the output we can use the opposite trick. If we print E, the output will be in Hartree.

 cout<<E;   // shows for example 4. Eh

we can force the output to we in eV by doing

 cout<< (double)(E/eV) << " eV"; //shows 108.84 eV

The cast operation '(double)' is needed to convert the dimensionless quantity to the built-in variable. Otherwise the printing will print 'dimensionless' after the numerical value. Cast operation to built-in type are useful in other context and only and it is only permitted if the quantity is dimensionless.

This solution is not optimal but it works pretty well for quantities output.

Automatic Type Declaration in C++0x (auto)

A word of caution about the 'auto' declaration type: In this example the type (and units) of x is determined automatically:

 auto x = 0*au::hartree; //valid C++0x
 x+=0.1*au::hartree;
 cout<<x<<endl;

but outputs '0 Eh'!! This is because this declaration is equivalent to

 quantity<au::energy, int> x = 0*au::hartree;

Note the 'int'. That is, the internal value is an integer. This is most likely not the desired behavior. Although the value initialized is zero the type the variable carries is not a floating point. Remember that '0' is an integer literal in C++. There are few alternatives.

Declare the type explicitly

 quantity<au::energy, double> x = 0*au::hartree;

Add a decimal point or a literal indicator to the literal number

 auto x = 0.*au::hartree; // or 0.0*au::hartree; // or 0f*au::hartree;

Physical Constants (SI codata)

For convenience the library includes the most common physical constants. For the moment physical data is given only for the SI systems, but with the implicit conversions it can be used in any unit system. The definitions are contained in boost/units/systems/si/codata_constants.hpp.

Boost.Operators and Boost.Concepts

This section is more 'mathematical'. When we program we tend to forget all that we know about mathematical theory we do so because too much abstraction at the coding level does not improve the code quality and speed, also C++ (or any other compiled language) does not provide by default enough facilities to allow this abstraction.

These two libraries, when combined, provide a nice framework to introduce some abstractions and at the same time a certain level of mathematical correctness to our code.

Abstraction will not provide faster code or readability for the most part, it only generalizes our code to the maximum extend that the mathematics behind the problem can provide.

Another constrain that I will impose is that the mathematical abstraction should not be introduced at the cost of program execution speed.

Let's start with a very simple example, let's try to construct a type that represents a vector space of fixed dimension (think of a 3D vector), a naive and perfectly valid implementation is the following

class v3d{
  public:
  boost::array<double, 3> coord_;
  v3d(boost::array<double, 3> const& init) : coord_(init){}
  friend v3d operator+(v3d const&   , v3d const&){ ... elementwise addition ... }
  friend v3d operator*(double const&, v3d const&){ ... elementwise scalar multiplication ... }
};

(I will use the 'friend' version for non-member operations).

In principle this a perfectly valid three-dimensional (real) linear space. This implementation allows us to perform any operation possible in this space. (There is no inner product defined, but this is not necessary to have a vector space, we are trying to be mathematical here, remember?).

The aim this section will be to create a class (or more exactly a template class) that is almost mathematically correct. There are obvious ways to make this class more general, first of all. We may want to have a complex space or different number of dimensions. Since we don't want to affect the performance if we need only a real space or a small number of dimensions, the way to implement this is with C++ templates:

template<typename T, unsigned NDim>
class vs{
  boost::array<T, NDim> coords_;
  ... 
};

This is a huge generalization. Yet we realize very soon that the use of the library is quite limited in the syntax. For example the user can not substract directly two vectors (v1-v2), it can not increment a given vector (v1+=v2), etc. The list of possible operation between elements is very large, in principle we need to implement +=,+,-=,-,*=,*,/=,/,==,!= and furthermore the implementations have to be consistent with each other. This is prone to errors. On the other hand we know that many of these operations have redundant implementations.

This is where Boost.Operators come to the rescue, it implements several operators in terms of others in a way that is consistent with the usual arithmetics. Let's start with += and +. We can have both operators at the price of one with inheriting our class from a template class provided by boost operatos.

template<typename T, unsigned D>
class vs : boost::addable< vs<T,NDim> >{
  ...
  friend vs& operator+=(vs&, vs const&){ ... }
};

The trick is that the operator+ (for v1+v2) will be available to the class thanks to boost::addable even if we don't implement such operation explicitly. Two for the price of one. We can go further and do the same with operator -= and have operator- defined automatically.

template<typename T, unsigned D>
class vs : boost::substractable< vs<T,D> >, boost::addable< vs<T,D> >{
  ...
  friend vs& operator+=(vs&, vs const&){ ... }
  friend vs& operator-=(vs&, vs const&){ ... }
};

for each set of operation we can add a new dependency, in order for the list not to become to long, the library offers shortcuts, for example a class that additive is substractable and addable. This shortcut is provided because this is a usual case (mathematically speaking)

template<typename T, unsigned D>
class vs : boost::additive< vs<T,D> >{
  ...
};

This way of defining mathematical classes does not only provides syntactic sugar but also consistency within the syntaxis.

To complete the operations we need a couple of additional operations: for scalar multiplication/division and for comparisons. It turns out that this operators also can be defined with a minimal effort.

template<typename T, unsigned D>
class vs : 
  boost::additive< vs<T,D> >,                //provides + (binary) and - (binary) given += and -=
  boost::multiplicative< vs<T,D> , T>,       //provides * and /  given *= and /=
  boost::equality_comparable< vs<T,D> , T>   //provides != given ==
{  
  boost::array<T, D> data_;
  friend vs& operator+=(vs&, vs const&){ ... }
  friend vs& operator*=(vs const&, T const&){ ... }
  friend vs& operator-=(vs& self, vs const& other){return self+=-other;}
  friend vs& operator/=(vs const&, T const& t){return self*=(T(1)/t);}
  // operator== is defined by default, and is correct if implementation is value based
}

Note that that we only need to implement two operators '+=' and '*='.

Concrete implementation vs. mathematical abstraction

An obvious implementation for operator += and *= looks like this

for(std::size_t i=0; i!=self.size(); ++i)
  self.data_[i] += other.data_[i]; //or *=t;

This is also completely acceptable but it is not the degree of abstraction we want. The produced class has the expected mathematical behavior but it has information extraneous to a mathematical description. For example, why are we forced to use boost::array, for example a user may want to implement the linear space with std::vector or some other container or not even a container. Second, even if we stick to a container implementation, why element by element addition has to be the concept of sum (there are other composition laws that allow for a correct concept of sum). Furthermore, why the linear space has to have finite dimensionality!! (finite dimensionality is not a mathematical requirement for a linear-space.)

To solve the first problem we have to parametrize the template class in the implementation, the class is changed to something like

template<class Element>
class vs{
  Element data_;
  friend vs& operator+=(vs&, vs const&){ ... ??? ... }
  friend vs& operator*=(vs&, T??? const&){ ... ??? ... }
};

Now, this is completely generic regarding the internal representation of the linear space; but it creates a big problem. There is no way to implement += in a generic way, even if we constrain Element to be a container we don't know what is the type of the 'scalar' (T???) in the scalar multiplication!! The only way out is to add this information explicitly to the class template parametrization. This is becoming really mathematical. Let's review for a moment the mathematical definition of a linear space. Using Nakahara's [2] definition

"A vector space (or linear space) Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle V} over a field Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle K} is a set in which two operations, addition and multiplication by an element of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle K} (called scalar), are defined. The elements (called vectors) of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle V} satisfy the following axioms

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{u} + \mathbf{v} == \mathbf{v} + \mathbf{u} } (commutative addition)
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (\mathbf{u} + \mathbf{v}) + \mathbf{w} == \mathbf{u} + (\mathbf{v} + \mathbf{w}) } (associative addition)
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \exists \mathbf{0}_V / \mathbf{v} + \mathbf{0}_V == \mathbf{v} } (additive identity/zero)
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \exists (-\mathbf{u}) / \mathbf{u} + (-\mathbf{u}) == \mathbf{0}_V } (additive inverse/negation)
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c( \mathbf{u} + \mathbf{v} ) = c \mathbf{u} + c \mathbf{v}} (linear scalar product)
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (cd) \mathbf{u} == c(d \mathbf{u})} (distributive scalar product)
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle 1_K \mathbf{u} == \mathbf{u}} (scalar product multiplicative)

(where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{u},\mathbf{v} \in V} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c,d \in K} )"

The first paragraph tell us that we have to define a set of elements (the equivalent of our implementation data type), then we need to define a scalar field and then implicitly it says that we should introduce a concept of addition and multiplication with certain properties. These four pieces of information are in general independent, so we need to parametrize the vector space with all of them.

template<
  class Element,
  typename Field_Scalar,
  class Commutative_Associative_Addition,
  class Associative_ScalarMultiplication>
...

At the same time we want a desirable mathematical property, that is that when we provide the linear space properties to a given set, the elements of the set preserve certain properties. For example, given the set of one variable functions, we can define a linear space by providing a concept of sum and multiplication (e.g. Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (a f + b g)(x) = a f(x) + b g(x)} ) in the process we still think of the elements of the set as functions. The way to preserve the properties of the original set is to use public inheritance from the set type instead of class member of that type.

...
class vs : public Element,
  boost::additive< vs<T,D> >,                
  boost::multiplicative< vs<T,D> , T>,       
  boost::equality_comparable< vs<T,D> , T>
{ ...

We follow by defining certain details for the space

  ...
  typedef Element                          element;
  typedef Field_Scalar                     scalar;
  typedef Commutative_Associative_Addition addition;
  typedef ScalarMultiplication             scalar_multiplication;
  ...

The addition and scalar multiplications are of course mean to be function objects that are shared by all the class elements and that are inmutable (constant) from the point of view of the vector space.

  ...
  static addition add_;
  static scalar_multiplication mult_;
  ...

We also need to be able to generate a vector in the linear space from an element of the set

  ...
  vs(element const& elem) : Element(elem){}
  ...

Then we follow with

  ...
  friend vs& operator+=(vs& self, vs const& other){return self =   add_(self, other )  ;}
  friend vs& operator*=(vs& self, scalar const& s){return self =  mult_(self, s     )  ;}
  friend vs& operator-=(vs& self, vs const& other){return self += (-scalar(1)  )*other ;}
  friend vs& operator/=(vs& self, scalar const& s){return self *= ( scalar(1)/s)       ;}
  ...

Thanks to Boost.Operators the other operators are defined consistently. Following the mathematical definition we have to define (or at least be able to generate) the zero element (it must exists regardless of anything else) and the negative element

  ...
  friend vs        zero();
  friend vs        operator-(vs const& v){vs ret(v); ret*=-scalar(1); return ret;} 
  friend vs const& operator+(vs const& v){return v;                              }
};

which finishes for the moment the definition of the class. The definition of the function 'zero()' is left undefined. This is because it is not always needed, second it depends on the details of the linear space we want to define. The complete code looks like this.

Example 1: boost::array as a finite dimensional linear space

By default boost::array doesn't have any arithmetical operator associated to it. So any arbitrary concept of addition in a linear space formed for such elements has to be defined manually. In the context of the class defined above there are three different way to achieve this depending on the syntax we want to achieve. The first two options produce the most compact syntax but requieres us to define classes in the std namespace or the boost namespace

a) Specialize std::plus

using boost::array
namespace std{
template<>
array<double,3> plus<array<double, 3> >::operator()(array<double, 3> const& a1, array<double, 3> const& a2) const{
  array<double, 3> ret;
  for(unsigned i=0; i!=3; ++i){
    ret[i]=a1[i]+a2[i];
  }
  return ret;
}
}
...

b) Overload + in namespace boost (the namespace of boost::array)

namespace boost{
array<double, 3> operator+(array<double, 3> const& a1, array<double, 3> const& a2){
  .. same function body ...
}

In both cases the linear space class can be used without explicitly specifying the operation

array<double,3> a_init={{1,2,3}}; array<double, 3> b_init={{2,3,4}};
linear_space<array<double, 3> > a=a_init;
linear_space<array<double, 3> > b=b_init;

c) Define an independent object-function

class elementwise_add{
  array<double,3> operator()(array<double> const& a1, array<double,3> const a2) const{
    ... same function body ...
  }
};

In this last case the addition operation has to be explicit

linear_space<array<double, 3>, elementwise_add> a=a_init;
linear_space<array<double, 3>, elementwise_add> b=b_init;

After either a), b) or c) is chosen we can apply the + operation in the sense prescribed for this linear space:

linear_space<array<double, 3>, elementwise_add> c=a+b;

Note that the example the linear space works even though the scalar multiplication was never specified. This is fine, since such operation was not used in the example. (You don't pay for what you don't use.)

To define a multiplication we just need to define a new class and pass it as argument in the linear space.

class elementwise_mult{
  array<double,3> operator()(double const& s, array<double, 3> const& a) const{
  array<double, 3> ret=a;
  for(unsigned i=0; i!=3; ++i){
    ret[i]*=s;
  }
  return ret;
};
...
linear_space<array<double, 3>, elementwise_addition, double, elementwise_multiplication> a=a_init;

At this point it may be beneficial to use typedef

typedef linear_space<array<double, 3>, elementwise_add, double, elementwise_mult> mylinear_space;
mylinear_space a, b;
mylinear_space c=a+b;

Example 2: Implement an infinite dimensional space with `std::vector`

Given the famous vector class, we can define an addition concept that accept vectors of any size, provided that we interpret out of bound elements as zeros. std::vector class allow us to return vectors of arbitrary size. NB: std::vector is not a vector space, it is called vector for other reasons, it doesn't carry an addition concept for example.

class size_variable_elementwise_add{
  std::vector operator()(std::vector const& v1, std::vector const& v2) const{
    std::vector ret(std::max(v1.size(), v2.size()));
    for(unsigned i=0; i!=ret.size(); ++i){
      if(i>=v2.size()) ret[i]=v1[i];
      else if(i>=v1.size()) ret[i]=v2[i];
      else ret[i]=v1[i]+v2[i];
    }
    return ret;
  }
};

This particular implementation of addition induces a linear space

 linear_space<std::vector, size_variable_elementwise_add>

which dimensionality is infinite in the mathematical sense that the number of independent vectors is not bounded by the program, (well ... it is bounded by the available memory and the runtime.)

Example 3: implement an efficient infinite dimensional linear with `std::map`

For practical causes when we think of a potentially infinite dimensional vector we probably think of a sparse vector. A sparse vector (many elements are zero) a good choice is to use something different from an array, for example std::map provides a way to defining a sparse option, the reason is that zero elements are omitted.

class elementwise_add_map{
  std::map<unsigned, double> operator()(std::map<unsigned, double> const& m1, std::map<unsigned, double> m2) const{
    std::map ret;
    for(std::map<unsigned, double>::const_iterator im1=m1.begin(); im1!=m1.end(); ++im1){
      ret[im1->first]=im1->second;
    }
    for(std::map<unsigned, double>::const_iterator im2=m2.begin(); im2!=m2.end(); ++im2){
      ret[im2->first]+=im2->second; if(ret[im1->first]==0) ret.erase(im2->first);
    }
    return ret;
  }
};
...
linear_space<std::map<unsigned, double> , elementwise_add_map>

Multiplication also can be defined for non-zero elements only.

This is an optimal implementation of a sparse vector, the point of this example is to show that the efficiency of the program is related to the actual implementation of the class and not to the linear_space template class itself.

We can temporarily view any object as an element as a linear space without constructing a new object by casting to a reference type. This allows to perform linear space operations for arbitrary objects.

array<double,3> a ,b, c;
typedef linear_space<array<double,3>, element_wise_add, element_wise_mult >& ls_elem;
(ls_elem&)c=(ls_elem&)a+(ls_elem&)b;

(section not finished)

Boost.Spirit

One of the more tedious and unfortunately unavoidable task in programming is parsing. Parsing means that given a "free form" text string we have to convert that into data that the program can manipulate, in order to do that we need to "interpret" the text. The parsing includes walking the input text and executing actions each time a certain pattern in found.

Doing this manually is prone to errors and lack of generality. Boost.Spirit is a complex library that allows parsing text quite easily.

The following parser a line that is expected to be comprised by two integers and a word. As this values are read and interpreted they are assigned to target variables. The beauty of it is that the syntax is straight forward and we will know if the parsing failed right away because of the bool returned, in that case the iterator first will contain the point at which parsing failed. This is the simplest use of Boost.Spirit, more complex parse patterns can be defined.

#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
namespace ascii = boost::spirit::ascii;

int main(){
       std::string source = "   1   13  Al";

	int idx; 
	int atomic_number;
	std::string name; 
 
       std::string::iterator first = source.begin(), last = source.end();
       qi::phrase_parse(first, last,
        	( qi::int_ >> qi::int_     >> qi::lexeme[+~qi::char_(' ')] ), ascii::space,
		  idx,        atomic_number,  name
	);
	std::cout << idx << " " << atomic_number << " " << name << std::endl;
	return 0;
}

Boost.Proto

V

Boost Toys

Boost has a few hidden small libraries, some hidden as implementation details in other libraries

Version

Once in a while we use features that are available in one version of Boost but not in previous versions. Different systems have different version of Boost, for example, by default:

Chaos 4.3/RedHat 5.5: Boost 1.33
Ubuntu 8.04: Boost 1.34
Ubuntu 10.04: Boost 1.40
Fedora 13: Boost 1.41
Ubuntu 10.10: Boost 1.42

Using new features generally in a system with an older version usually results in a compilation error. Unfortunately the error messages are cryptic and this can be very confusing for the user. This can be resolved by a static (compile time) assert that verifies that the system is using a certain version of Boost.

The version information of the Boost distribution is included in

#include<boost/version.hpp>

in a macro defined (for example) as

#define BOOST_VERSION 103301

So, for example to check that one is using Boost version 1.42 *at least* we can add the following code before using such feature.

BOOST_STATIC_ASSERT( BOOST_VERSION / 100000 >= 1 && BOOST_VERSION / 100 % 1000 >= 42);
.. code valid only in 1.42 ..

The expression is verified at compile time and it will give and error message that will be easy to spot. BOOST_VERSION can also be used in #if/#endif blocks.

Such error message will make clear that to use that code we need a newer version of Boost installed. (For example manually, see at the beginning.)

Progress Bar

One of them is Boost.Progress. It gives a convenient way of printing a progress bar to output. We only need to supply an estimated completion count and increment the counted as progress in the calculation is made:

#include<boost/progress.hpp>
...
boost::progress_display show_progress( max_counter, std::cout, "title of bar:\n" );
for (int count = 0; count < max_counter; ++count){
    ...
    ++show_progress;
}

Each time ++show_progress the internal counter is incremented proportionally and a nicely formatted percentage bar will be updated:

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
****************************

The last two arguments are optional.

If the process requires more counts than the estimate provided (max_counter) it will keep adding stars.

Printing other stuff from inside the loop can ruin the formatting of the bar.

Printing type names (`boost::units::detail::demangle`)

Obtaining type information at runtime is implemented in C++ via typeid. Unfortunatelly the typename results extraction can give cryptic results, probably associated with internal strings used to represent a type. To transform a name into an human redeable string we need a name demangler.

For example, simply calling typeid(variable).name() can give something like

N5boost5units8quantityINS0_4unitINS0_4listINS0_3dimINS0_19time_base_dimensionENS0_15static_rationalILln1ELl1EEEEENS0_18dimensionless_typeEEENS0_18homogeneous_systemINS3_INS0_6atomic23electron_mass_base_unitENS3_INSC_27elementary_charge_base_unitENS3_INSC_33reduced_planck_constant_base_unitENS3_INSC_32coulomb_force_constant_base_unitENS3_INSC_27boltzman_constant_base_unitES9_EEEEEEEEEEEEvEEdEE

which has straneous symbols contained in it. Instead

#include<boost/units/details/utility.hpp> //not necessary if boost.units is already included
using boost::units::detail::demangle;
cout << demange(typeid(var).name()) << endl;

gives

quantity<unit<list<dim<time_base_dimension, static_rational<-1l, 1l> >, dimensionless_type>, homogeneous_system<list<atomic::electron_mass_base_unit, list<atomic::elementary_charge_base_unit, list<atomic::reduced_planck_constant_base_unit, list<atomic::coulomb_force_constant_base_unit, list<atomic::boltzman_constant_base_unit, dimensionless_type> > > > > >, void>, double>

this particular function is specifically written to simplify the names of boost.Units quantities (by removing the string 'boost::units::') but can be used for any type.

Boost.ProgramOptions

Well, this is not a small library but its description is small and covers also a small but frequent feature of every pogram, the command line arguments. We start with a small program, and quickly we realize that we need to pass options at the command-line. In C (and C++) command line options are received by main program via the argc and argv parameters

int main(int argc, char* argv[])

the first argument is the number of parameters (words) and argv is a c-array of c-string passed (included the name of the executable itself). Typically we would implement some sort of parser that reads the strings one by one, reading names of variables, etc. This process is tedious and prone to errors. Even after much effort the solution will be short of features. For example suppose that we want to have an option file instead of command line options, we will need to rewrite the parser for the file, etc. An after we finish we will need to implement a help message describing all the options available and maintain it.

Boost.ProgramOptions solves this problem elegantly. It parses the options from a source (command line, config file or environment variables) and stores it in local variables from where the values can be interpreted more easily.

The simplest example is one in which the main program needs to know the value of several input variables, some of the with default values.

// program.cpp
int main(int argc, char* argv[]){
	double kT=-1;
	double EF=-1;
	{
		using namespace boost::program_options;
		options_description desc("options");
		desc.add_options()
			("temp,T"  , boost::program_options::value<double>(&kT), "set electronic temperature")
			("fermi,EF", boost::program_options::value<double>(&EF)->default_value(10.), "set fermi energy"          );
		variables_map vm;
		try{
			store(parse_command_line(argc, argv, desc),vm);
		}catch(...){clog<< desc <<endl; throw;}
		notify(vm); //don't forget this line
	}
	cout<<"EF is "<<EF<<endl;
	cout<<"kT is "<<kT<<endl;
...

The program can be called as

./program --temp=300. --fermi=10.  #or as ./program -T 300. -EF 10.

Note that this follows the [GNU Argument Syntax] for long and short options.

If we pass options that are not listed (or have a typo) or the parameter has an incorrect value type (string instead of number) the program will give an error:

what():  unknown option dada
what():  in option 'fermi': invalid option value

Also the "help" message is printed by the line

clog << desc << end;

In this case:

options:
  -T [ --temp ] arg        set electronic temperature
  -E [ --fermi ] arg (=10) set fermi energy

This message is constructed automatically from the description object. Option names, default values, assignment to local variables and options description (help message) are all in one place.

This example covers most of the simplest cases. For positional arguments (arguments with no names intepreted by position in command line), other ways to read the information on the variables_map, reading options from config files and non-convetional syntax, see the manual.

Note: the executable needs to be linked to the boost_program_options (c++ a.cpp -lboost_program_options).

Boost.RegEx (regular expressions)

Boost.RegEx is a another major complex library for string manipulation by regular expressions. In this context can be useful sometimes to process outputs from other programs. RegEx is also part of TR1 the next C++ standard library.

Regular expressions are ways to define complicated text patterns. There are several regular expression definitions (syntaxes), Perl regular expressions is one of them. Regular expression can be used to validate string input, extract substrings and do string replacement.

In the following example, the numbers contained in the input string are isolated and enclosed in curly brackets.

#include <iostream>
#include <string>
#include "boost/regex.hpp"
int main() {
	boost::regex reg("[+,-]*\\d{1,}", boost::regex::icase|boost::regex::perl);
	std::string s = "a_0^-1 b^2 c^-32 d^+44";
	s = boost::regex_replace(s,reg,"{$0}");
	std::cout << s << std::endl;
}

will print "a_0^{-1} b^{2} c^{-32} d^{+44}". (This is a simple way to convert unit outputs (see Boost.Units) to something that understandable for LaTeX code.)

There is not much else to say about Boost.RegEx except that it can save you lots of coding for text processing and that you actually have to learn about regular expressions before using Boost.RegEx. Just some notes of caution:

a regular expression involving the character \ has to by typed as \\ in the code.
not all strings are valid regular expressions, if the string is not a valid regular expression the program will abort the execution (it won't give an error at compile time). If you want compile-time (static) regular expressions you have to use Boost.Xpressive.
Make sure you specify the right regular expression syntax (in this example boost::regex::perl) of your regular expression.
You have to link the executable to libboost_regex (c++ -lboost_regex ...). Make sure (with command ldd) that you are linking to the binary library (libboost_regex) consistent with header file (regex.hpp). If it not correctly linked the library can return the wrong result without warning.
Some characters used in regular expressions (eg. \) are reserved characters in C, therefore they must be preceded by another '\'.

Other sources:

http://onlamp.com/pub/a/onlamp/2006/04/06/boostregex.html
http://www.boost.org/doc/libs/1_43_0/libs/regex/doc/html/index.html
http://regexpal.com/ (Regular expression tester, good for learning regular expressions)
http://gskinner.com/RegExr/ (Another tester with replace options)
http://en.highscore.de/cpp/boost/stringhandling.html#stringhandling_regex
http://www.johndcook.com/cpp_regex.html

(not)Boost.Process

This library is not included officially in Boost yet. It can be added to Boost very easily by

mkdir -p ~/usr/include
cd ~/usr/include  # assuming that the rest of Boost is in ~/usr/include/boost
svn co http://svn.boost.org/svn/boost/sandbox/process/boost

(No compilation of the library is required.)

The manual is hosted at http://www.highscore.de/cpp/process/ with a detailed tutorial in http://www.highscore.de/boost/process/. Its purpose is to give the ability of executing a process (an external program) and communicate back and forth with it by the standard output (i.e. program's output) and standard input (if program received input at all).

(not)Boost.Phoenix3

This library is not included officially in Boost yet. Boost included Phoenix2 as part of Boost.Spirit. It can be added to Boost by

cd ~/usr/include/boost  # assuming that the rest of Boost is in ~/usr/include/boost
svn co https://svn.boost.org/svn/boost/sandbox/SOC/2010/phoenix3/boost/phoenix

(No compilation of the library is required.)

The documentation can by viewed (for the moment) at https://svn.boost.org/svn/boost/sandbox/SOC/2010/phoenix3/libs/phoenix/doc/html/index.html

(not)Boost.Geometry

[Boost.Geometry http://geometrylibrary.geodan.nl/index.html] is a library that is accepted to Boost but it not yet shipped with the Boost distribution (but available from trunk). It is a library designed to manipulate geometric objects, vectors, polygons, etc. independently of it internal representation, coordinate system or dimension.

For the moment it can be installed together with Boost by doing:

mkdir -p ~/usr/include/boost
cd ~/usr/include/boost
svn co http://svn.boost.org/svn/boost/trunk/boost/geometry
svn co http://svn.boost.org/svn/boost/trunk/boost/range

(this is not necessary if you get Boost from the svn trunk).

P-Stade Oven

The library P-Stade is not part of Boost but it follows its design and it deserves honorable mention. Technically it is a "range" library that operates on ranges of elements in a container. It is a header only library and therefore the installation is very easy. Just download and copy the header files from http://sourceforge.net/projects/p-stade/ your local directory (e.g. $HOME/usr/include)

cd ~/usr/include
wget http://downloads.sourceforge.net/project/p-stade/pstade/1.04.3/pstade_1_04_3.zip
unzip pstade_1_04_3.zip
rm pstade_1_04_3.zip

all the files will be installed in ~/usr/include/pstade. This library needs Boost to be installed already.

The official manual is located here while a nice set of examples can be found in the entries of this blog. Another source of examples is here.

With pstade/oven/initial_values.hpp every sequence (except c-arrays) becomes directly intializable with a list of "initial values". This is particularly useful to initialize const containers (this is similar to Boost.Assign). Once we have a collection of elements (in some container) we can apply functions to this object. The trivial function is pstade/oven/identities.hpp, it does nothing to the elements. This function is useful anyway because internally it changes the representation of the collection to something that can be printed independently of the source. In the following example I use identities in order to print the original sequence. Other functions used are reversed, sorted, filtered and partitioned.

The function names are self-explanatory. The only one that need explanation is partitioned wich returns a std::pair with two subsets depending in some boolean function. Note the intensive use of the pipe operator |, the functions can be applied in any sequence.

#include <iostream>
#include <pstade/oven/initial_values.hpp>
#include <pstade/oven/identities.hpp>
#include <pstade/oven/reversed.hpp>
#include <pstade/oven/sorted.hpp>
#include <pstade/oven/filtered.hpp>
#include <pstade/oven/partitioned.hpp>
#include <pstade/oven/io.hpp>
#include <pstade/oven/equals.hpp>
using namespace pstade::oven;

//a free function 
bool is_even(int const& x) { return x % 2 == 0; }

int main(){
  using std::cout; using std::endl;

  //works with std::vector
  const std::vector<int> v = initial_values(3, 1, 4, 5, 2);
  cout << (v | identities) << endl; //prints {3,1,4,5,2}

  //works with boost::array
  const boost::array<double, 3> a = initial_values(1, 3, 5);
  cout<< (a | reversed  ) <<endl; //prints {5,3,1}
  cout<< (a | identities) <<endl; //prints the original {1,3,5}
  
  //works with a c-array
  const double arr[3] = {22,33,11};
  cout<< (arr | sorted)  <<std::endl; //prints {11,22,33}

  cout<< (v | filtered(&is_even)   )        <<endl; //prints {4,2}
  cout<< (v | partitioned(&is_even)).first  <<endl; //prints {4,2}
  cout<< (v | partitioned(&is_even)).second <<endl; //prints {3,1,5}
  (v | back) = 2; //assing a value to the last element
//(v | back_value) = 3; //doesn't work because back_value returns a copy (and not the original element)
  cout<< (equals(v, initial_values(1,2,3))  <<endl; 
  return 0;
}

Other Resources

Online book on selected Boost libraries http://en.highscore.de/cpp/boost/index.html
Reference on C++ and the Standard Template Library http://www.cppreference.com/wiki/start
The classic SGI Standard Template Library Programmer's Guide http://www.sgi.com/tech/stl/
Boost Users group list http://groups.google.com/group/boostusers?hl=en

BOOST Library: Difference between revisions

Revision as of 08:28, 25 February 2011

General Remarks

BOOST build and installation

Versions 1.37 and 1.38

Versions 1.39 and up

Possible Issues

Intel compiler for Boost.MPI

Remove Boost

Boost.Array Library

Comparison with other containers

Initialization

Canonical

Using Boost.Assign

Replacing raw c-arrays

Multidimensional arrays via Boost.Array

Boost.MultiArray Library

Basic Usage (multi_array)

Using MultiArray on existing code together with C-arrays (multi_array_ref)

A simple FFTW example

Generic code, but not too generic (Boost.Utility enable_if and Boost.TypeTraits)

Fortran ordering and Fortran functions (fortran_storage_order)

Lapack examples

Singular Value Decomposition

QR Decomposition

Orthogonalization via Cholesky Decomposition

Arbitrary ordering (and direction)

Assign from c-arrays

Custom index ranges (reindex)

Range Checking (and Boost.Assert)

Accessing elements

Regular subarrays

Subarrays (sub_array)

Views (array_view)

Wrap view (code example) (array_wrap_view)

A simple FFTW wrapper

'Many' Transforms

Special Memory Allocation

Reshape (reshape)

Example of reshape with FFTW

Boost.MPI Library

Simple MPI FFTW example

Important notes about MPI-FFTW3 memory distribution (do not skip)

Collective Reduce operation

all_reduce

Note on function objects (Boost.Lambda and Boost.Phoenix)

reduce

The issue of dangling variables (and Boost.Optional)

Transparent blocks via exceptions (boost/assert.hpp)

Primitive or point-to-point operations (barrier, send & recv)

Other collective operations (gather and broadcast)

Parallel logging example

Other Resources

Boost.Filesystem

Path

Common path name functions

Common file actions

Directory Iterator

Boost.StaticAssert

Boost static computations (Metacomputations)

Static Rationals

Boost.Fusion

fusion::map

Boost Numerics

Boost.Interval

Boost.Random

Normal distribution

Random Sources

Distributions

Note on syntax (&)

Boost Math Constants

Boost Math Special Floating Points

Boost.Accumulators

Special numerical operations (boost::numeric::functional)

Boost.Format

Boost.Units

Definining Units

Systems of Units

Free standing units

Scaled units

Basic Usage (`multi_array`)

Using MultiArray on existing code together with C-arrays (`multi_array_ref`)

Fortran ordering and Fortran functions (`fortran_storage_order`)

Custom index ranges (`reindex`)

Subarrays (`sub_array`)

Views (`array_view`)

Wrap view (code example) (`array_wrap_view`)

Reshape (`reshape`)

`all_reduce`

`reduce`

Primitive or point-to-point operations (`barrier`, `send` & `recv`)

Other collective operations (`gather` and `broadcast`)

Note on syntax (`&`)

Mathematical Funtions on Physical Quantities [`boost/units/cmath.hpp`]

Use existing numerical functions, subvert dimensional checking (`::from_value` and `.value`)

Display internal unit types for debugging (`simplify_typename`)

Example 2: Implement an infinite dimensional space with `std::vector`

Example 3: implement an efficient infinite dimensional linear with `std::map`

Printing type names (`boost::units::detail::demangle`)