BOOST Library

From Micro and Nano Mechanics Group
Revision as of 23:05, 20 January 2009

BOOST Libraries

For those using C++ as a programming language, BOOST is a very useful set of libraries: it provides tools and commonly used idioms that solve recurring problems that emerge when developing in this language, even at a fairly low level. It also encourages a certain style of programming in which genericity and efficiency are priorities. These tools include macro definitions, template functions and classes, functions, generic data containers, and common algorithms. It can be viewed as a natural extension of the C++ Standard Template Library (STL).

In my opinion BOOST has a very steep learning curve and the syntax can sometimes become very ugly and complicated, but it is faster to learn BOOST than to reinvent the wheel and come up with buggy home-brew solutions that, at the end of the day, will look even uglier and won't be as good anyway. (By good I mean generic and efficient.) Note that I am not comparing BOOST and C++ with other tools like Matlab, Fortran or Python. Those languages are likely to do better than C++ (and allow quicker development) in many specific areas. (Boost also helps close that gap without introducing modifications to the language.)

Libraries included in Boost range from very simple ones (like lexical_cast) to very complex ones (like boost.mpi and boost.gil).

The aim of this document/tutorial is to incrementally document the utility of these libraries with working, practical examples. Extra documentation added by users is very welcome, especially non-trivial but concise examples related to our common programming problems (i.e. scientific programming).

Topics covered in the first stages of this document will include: Boost building and installation, boost.multi_array and boost.mpi. Those are the libraries that I have to deal with at this point. Ideally I would like to add boost.ublas in the future and document examples of small libraries that are very useful for everyday tasks. The focus will be on documenting the interaction of these libraries with numerical libraries such as FFTW and Lapack.

BOOST build and installation

It is a blessing that most Linux distributions now include some version of Boost. Many Boost libraries are header-only, which means that no linking is necessary (an #include<boost/ ... > line is all we need to use a particular library). Linking is not hard either: it is just a matter of adding something like "-lboost_NAME" when compiling (the exact NAME follows a certain naming convention). The current tutorial has been tested on wcr.stanford.edu.

The sad part is that the versions shipped by most distributions are usually outdated: for example, the current version of Boost is 1.37, while the most recent distributions come with version 1.34 at best. For header-only libraries the situation is not that bad, because the header files can be downloaded and are ready to use. Binary libraries, however, need to be compiled. There are tools to compile specific libraries, but for the moment it is easier to compile the whole Boost set.

As usual, since we work on an administered system, we will install Boost in our own user space, for example in $HOME/usr:

 mkdir $HOME/usr

Directories under ~/usr/include and ~/usr/lib will be created and populated when Boost is built and installed. Be brave and install the latest version from the link below:

 mkdir $HOME/soft
 cd $HOME/soft
 wget http://internap.dl.sourceforge.net/sourceforge/boost/boost_1_37_0.tar.gz
 

If the download fails, try to download the latest version from the SourceForge Boost download page. Then decompress it and configure the installation prefix:

 tar -zxvf boost_1_37_0.tar.gz
 cd boost_1_37_0
 ./configure --prefix=$HOME/usr

Append the following quoted line to user-config.jam:

 echo "using mpi : mpicxx ;" >> user-config.jam

Here mpicxx can be replaced by your MPI compiler wrapper, for example "/usr/lib/mpich/bin/mpiCC", or the wrapper can first be selected interactively with "mpi-selector". This line is what enables the build of Boost.MPI. Then compile and install the package:

 make
 make install

Go to lunch; this takes about 25 minutes. After successful compilation and installation, all the header files will be installed in ~/usr/include/boost-1_37/boost/*.hpp and the binary linkable files will be in ~/usr/lib/libboost_*.

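Depending on how you compile your programs, it might be necessary to add these directories to the corresponding search paths: the header directory to the compiler include path (-I$HOME/usr/include/boost-1_37), the library directory to the linker path (-L$HOME/usr/lib) and, so that programs linked against the shared libraries find them at run time, the library directory to LD_LIBRARY_PATH (export LD_LIBRARY_PATH=$HOME/usr/lib:$LD_LIBRARY_PATH).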

Boost.MultiArray Library

The MultiArray library is the first to be described in this tutorial because it is the simplest of these libraries, and yet it solves a very annoying problem of C/C++ in our area. C/C++ has very limited support for built-in arrays, and the situation is even worse for multidimensional arrays (2, 3 or more indices). Passing arrays to functions is very error-prone, and the array dimensions must continuously be passed around as extra function parameters. Moreover, writing algorithms for arbitrary array dimensionality is very hard.

The reason for this is that the language was designed to be very flexible: there are infinite ways to arrange multidimensional arrays in memory and infinite ways to resize them, so the language did not enforce any particular standard. Yet for scientific computing (which is only a subset of all computing areas), we can agree that only a few kinds of multidimensional arrays matter, namely contiguous blocks of memory with either row-major ordering (C convention) or column-major ordering (Fortran convention).

Within this restriction on memory layout, MultiArray provides a very flexible, very small and nice interface.

EXAMPLES HERE

Boost.MPI

I will assume that you already know the very basics of C-MPI and want to improve the readability and design of your code. If you don't know C-MPI, you may try to learn from these examples (the syntax makes it easy to see what MPI is supposed to do, in contrast with the ugly C version), but be aware that Boost.MPI is not very widely used. This tutorial is meant to make your life easier in the former case, not in the latter.

I will try to work out an example that is not trivial by using FFTW in parallel. The official Boost.MPI pages are full of trivial examples.

Before going to the first example, let us see the command line necessary to compile it:

 mpicxx fft_mpi_test.cpp -I/usr/lib/mpich/include/ -I$HOME/usr/include -I$HOME/usr/include/boost-1_37 \
   -L$HOME/usr/lib -lfftw3_mpi $HOME/usr/lib/libfftw3.a -lboost_mpi-gcc43-mt-1_37 -lboost_serialization-gcc43-mt \
   -lm -o fft_mpi_test

Besides the FFTW part, the options -L$HOME/usr/lib -lboost_mpi-gcc43-mt-1_37 -lboost_serialization-gcc43-mt do the job of linking against Boost.MPI and Boost.Serialization (the suffix encodes the compiler, gcc43, multithreading support, mt, and the Boost version, 1_37; check the actual file names in ~/usr/lib). The example uses the experimental MPI version of FFTW3, which can be downloaded with the FFTW 3.3a sources and configured with the options "./configure --prefix=$HOME/usr --enable-mpi MPICC=/opt/mpich/gnu/bin/mpicc" (or the equivalent MPI compiler wrapper).

We need Boost.Serialization (although we don't use it explicitly here) because Boost.MPI depends heavily on the serialization mechanism. Serialization is the capability of converting any C++ object in memory into a linear stream of data. That is what we do over and over again when we save data to a file, and MPI communication is basically the same thing: one process sends streams of data to others. (Ideally, there will be a separate chapter on the powerful Boost.Serialization.) This integration allows passing serializable (in the Boost sense) complex objects between processes without first converting (deconstructing) them into numerical (or primitive) types and then reconstructing them on the other side, as we would do with C-MPI.

The compilable example follows; it is based on the simple MPI example from FFTW3:

 // compile with -I$HOME/usr/include (where FFTW was installed) so that fftw3-mpi.h is found
 #include <fftw3-mpi.h>
 #include <cmath>
 #include <iostream>
 #include <complex>
 #include <boost/mpi.hpp>
 namespace mpi = boost::mpi;          /* otherwise names must be written as boost::mpi::... */
 using std::clog;
 using std::endl;
 int main(int argc, char **argv){     /* initialize mpi world and fftw_mpi */
   mpi::environment env(argc, argv);  /* in C: MPI_Init(&argc, &argv); */
   mpi::communicator world;           /* this will work as MPI_COMM_WORLD */
   fftw_mpi_init();                   /* initialize fftw_mpi library */
   const ptrdiff_t N0 = 18, N1 = 18;  /* size of parallel array */
   fftw_plan plan;                    /* this is the usual fftw plan */
   std::complex<double> *data;        /* in C: fftw_complex *data; this is the local (to process) data */
   // given the matrix size (N0, N1) ask fftw_mpi how to split the matrix
   ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
   alloc_local = fftw_mpi_local_size_2d(N0, N1, world,
                                        &local_n0, &local_0_start);
   // report allocation size, example of output from each process
   if(world.rank()==0){               /* only process 0 prints this */
     clog<<"Global size of array : [0:"<<N0<<")x[0:"<<N1<<"), total size "<<N0*N1<<endl;
   }
   // all processes print the following but each prints a different report (there are better ways to print)
   clog<<"Process "<<world.rank()<<" has the array of size ["<<local_0_start<<","
       <<local_0_start+local_n0<<")x[0,"<<N1<<"), total local size "<<local_n0*N1<<endl;
   // now allocate data in the old-fashioned way
   data = (std::complex<double>*) fftw_malloc(sizeof(std::complex<double>)*alloc_local);
   // in C: data = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * alloc_local);
   // create fftw plan, note "_mpi_" and "_2d"; fftw_complex is double[2], hence the casts
   plan = fftw_mpi_plan_dft_2d(N0, N1, (fftw_complex*)data, (fftw_complex*)data, world,
                               FFTW_FORWARD, FFTW_ESTIMATE);

   double pdata=0;
   // initialize data and compute local power
   for (i = 0; i < local_n0; ++i){
     for (j = 0; j < N1; ++j){
       std::complex<double> val=std::complex<double>(local_0_start + i, 0);
       data[i*N1 + j] = val;
       pdata+=std::norm(val);
     }
   }
   clog<<"Process "<<world.rank()     /* each process prints its local power */
       <<" has a power of "<<pdata<<" of the original data"<<endl;
   // compute transform according to the plan
   fftw_execute(plan);
   double ptransform=0;
   // normalize data and compute local power of the transform
   for (i = 0; i < local_n0; ++i){
     for (j = 0; j < N1; ++j){
       data[i*N1+j]/=std::sqrt((double)(N0*N1));  // this is the normalization of the transform
       std::complex<double> val=data[i*N1+j];
       ptransform+=std::norm(val);
     }
   }

   clog<<"Process "<<world.rank()     // each process prints its local power
       <<" has a power of "<<ptransform<<" of the transformed data"<<endl;
   fftw_destroy_plan(plan);           /* destroy plan */
   fftw_free(data);                   /* release the buffer allocated with fftw_malloc */
   // boost::mpi::environment finalizes MPI automatically when env is destroyed; in C: MPI_Finalize();
 }

The example is rather long, mainly because we are using the plain-C FFTW interface and manually allocated arrays. Imagine how much cleaner the code would be if we could take advantage of Boost.MultiArray and of a C++ wrapper of FFTW. Eventually, this will be the topic of another section. Note that using C++ and Boost doesn't force us to leave the known C interfaces; the idea is that we can reuse the old interfaces and improve them gradually as needed. For example, we used std::complex<double> instead of double[2]; this is possible only because both types are binary compatible, although the former has a much better interface for handling complex numbers.

The program initializes a matrix distributed among different processes, calculates the local power of the data, performs a global Fourier transform of this matrix (in place) and then computes the local power of the transformed data. The global power (the sum of the local powers) is the same for the original data and for the transformed data; this is Parseval's theorem, which holds because the transform is normalized by 1/sqrt(N0*N1).

Also note that although this is our first parallel MPI program, the actual MPI communication is performed by FFTW and not by us. We will rarely want to pass data between processes ourselves; libraries like FFTW and Lapack will do a faster/better job than us. The idea here is to set up the environment for using those highly tuned numerical libraries.