imported a reworked version of BTL (Benchmark for Templated Libraries).

the modifications to initial code follow: * changed build system from plain makefiles to cmake * added eigen2 (4 versions: vec/novec and fixed/dynamic), GMM++, MTL4 interfaces * added "transposed matrix * vector" product action * updated blitz interface to use condensed products instead of hand coded loops * removed some deprecated interfaces * changed default storage order to column major for all libraries * new generic bench timer strategy which is supposed to be more accurate * various code clean-up
author: Gael Guennebaud <g.gael@free.fr> 2008-07-09 14:04:48 +0000
committer: Gael Guennebaud <g.gael@free.fr> 2008-07-09 14:04:48 +0000
commit: 28539e7597c643dbc2b8d4f49dd16bd86fb7251f (patch)
tree: 3bfb96ae898f0afc943a388edcc83c27d28d5b3a /bench/btl/README
parent: 5f55ab524cf0aad34dd6884bcae09eaa6c43c247 (diff)
1 files changed, 143 insertions, 0 deletions
diff --git a/bench/btl/README b/bench/btl/README
new file mode 100644
index 000000000..6d5712bb6
--- /dev/null
+++ b/bench/btl/README
@@ -0,0 +1,143 @@
+Bench Template Library
+
+****************************************
+Introduction :
+
+The aim of this project is to compare the performance
+of available numerical libraries. The code is designed
+as generic and modular as possible. Thus, adding new
+numerical libraries or new numerical tests should
+require minimal effort.
+
+
+*****************************************
+
+Installation :
+
+BTL uses cmake / ctest:
+
+1 - create a build directory:
+
+  $ mkdir build
+  $ cd build
+
+2 - configure:
+
+  $ ccmake ..
+
+3 - run the bench using ctest:
+
+  $ ctest -V
+
+You can also run a single bench, e.g.: ctest -V -R eigen
+
+4 : Analyze the result. different data files (.dat) are produced in each libs directories.
+ If gnuplot is available, choose a directory name in the data directory to store the results and type
+	cd data
+	mkdir my_directory
+        cp ../libs/*/*.dat my_directory
+ Build the data utilities in this (data) directory
+        make
+ Then you can look the raw data,
+        go_mean my_directory
+ or smooth the data first :
+	smooth_all.sh my_directory
+	go_mean my_directory_smooth
+
+
+*************************************************
+
+Files and directories :
+
+ generic_bench : all the bench sources common to all libraries
+
+ actions : sources for different action wrappers (axpy, matrix-matrix product) to be tested.
+
+ libs/* : bench sources specific to each tested libraries.
+
+ machine_dep : directory used to store machine specific Makefile.in
+
+ data : directory used to store gnuplot scripts and data analysis utilities
+
+**************************************************
+
+Principles : the code modularity is achieved by defining two concepts :
+
+ ****** Action concept : This is a class defining which kind
+  of test must be performed (e.g. a matrix_vector_product).
+	An Action should define the following methods :
+
+        *** Ctor using the size of the problem (matrix or vector size) as an argument
+	    Action action(size);
+        *** initialize : this method initialize the calculation (e.g. initialize the matrices and vectors arguments)
+	    action.initialize();
+	*** calculate : this method actually launch the calculation to be benchmarked
+	    action.calculate;
+	*** nb_op_base() : this method returns the complexity of the calculate method (allowing the mflops evaluation)
+        *** name() : this method returns the name of the action (std::string)
+
+ ****** Interface concept : This is a class or namespace defining how to use a given library and
+  its specific containers (matrix and vector). Up to now an interface should following types
+
+	*** real_type : kind of float to be used (float or double)
+	*** stl_vector : must correspond to std::vector<real_type>
+	*** stl_matrix : must correspond to std::vector<stl_vector>
+	*** gene_vector : the vector type for this interface        --> e.g. (real_type *) for the C_interface
+	*** gene_matrix : the matrix type for this interface        --> e.g. (gene_vector *) for the C_interface
+
+	+ the following common methods
+
+        *** free_matrix(gene_matrix & A, int N)  dealocation of a N sized gene_matrix A
+        *** free_vector(gene_vector & B)  dealocation of a N sized gene_vector B
+        *** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of an stl_matrix A_stl into a gene_matrix A.
+	     The allocation of A is done in this function.
+	*** vector_to_stl(gene_vector & B, stl_vector & B_stl)  copy the content of an stl_vector B_stl into a gene_vector B.
+	     The allocation of B is done in this function.
+        *** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of an gene_matrix A into an stl_matrix A_stl.
+             The size of A_STL must corresponds to the size of A.
+        *** vector_to_stl(gene_vector & A, stl_vector & A_stl) copy the content of an gene_vector A into an stl_vector A_stl.
+             The size of B_STL must corresponds to the size of B.
+	*** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source in cible. Both source
+		and cible must be sized NxN.
+	*** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source in cible. Both source
+ 		and cible must be sized N.
+
+	and the following method corresponding to the action one wants to be benchmarked :
+
+	***  matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N)
+	***  matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N)
+        ***  ata_product(const gene_matrix & A, gene_matrix & X, int N)
+	***  aat_product(const gene_matrix & A, gene_matrix & X, int N)
+        ***  axpy(real coef, const gene_vector & X, gene_vector & Y, int N)
+
+ The bench algorithm (generic_bench/bench.hh) is templated with an action itself templated with
+ an interface. A typical main.cpp source stored in a given library directory libs/A_LIB
+ looks like :
+
+ bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;
+
+ this function will produce XY data file containing measured  mflops as a function of the size for 50
+ sizes between 10 and 10000.
+
+ This algorithm can be adapted by providing a given Perf_Analyzer object which determines how the time
+ measurements must be done. For example, the X86_Perf_Analyzer use the asm rdtsc function and provides
+ a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer
+ so
+
+ bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;
+
+ is equivalent to
+
+ bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;
+
+ If your system supports it we suggest to use a mixed implementation (X86_perf_Analyzer+Portable_Perf_Analyzer).
+ replace
+     bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
+ with
+     bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
+ in generic/bench.hh
+
+.
+
+
+
author	Gael Guennebaud <g.gael@free.fr>	2008-07-09 14:04:48 +0000
committer	Gael Guennebaud <g.gael@free.fr>	2008-07-09 14:04:48 +0000
commit	28539e7597c643dbc2b8d4f49dd16bd86fb7251f (patch)
tree	3bfb96ae898f0afc943a388edcc83c27d28d5b3a /bench/btl/README
parent	5f55ab524cf0aad34dd6884bcae09eaa6c43c247 (diff)