M4RI(e)- Linear Algebra over $\mathbb{F}_2$ (and $\mathbb{F}_{2^e}$)

Peformance over $\mathbb{F}_2$

This page contains some benchmarketing results for M4RI(e), Magma, GAP and NTL. It will be expanded in the future to contain more data on more operations. If you think that any result presented here is wrong and/or misleading, please let us know so that we can fix it. All times are in seconds.

Matrix Multiplication

Multiplying two random square matrices over $\mathbb{F}_2$ of the given dimension. GAP implements the M4RM algorithm, Magma implements Strassen-Winograd (on top of M4RM) and M4RI implements Strassen-Winograd on top of M4RM.

64-bit Debian/GNU Linux, 2.33Ghz Intel Core2Duo (MacbookPro2,2)
Matrix
Dimension
Magma 2.14-17
(64-bit)
GAP 4.4.10
(64-bit)
M4RI-20080821
(64-bit)
10,000 x 10,000 1.892 6.130 1.504
16,384 x 16,3847 .72025.048 6.074
20,000 x 20,00013.209 -10.721
32,000 x 32,00053.668 -43.197

64-bit Debian/GNU Linux, 2.6Ghz Intel i7 (MacbookPro6,2)
Matrix
Dimension
Magma 2.15-10
(64-bit)
GAP 4.4.12
(64-bit)
M4RI-20121224
(64-bit)
10,000 x 10,000 1.200 6.524 0.942
16,384 x 16,384 4.73520.777 3.672
20,000 x 20,000 8.085 - 6.660
32,000 x 32,00033.395 -28.006

64-bit Debian/GNU Linux, 2.6Ghz AMD Opteron 858 (VMWare Virtualised)
Matrix
Dimension
Magma 2.14-13
(64-bit)
GAP 4.4.12
(64-bit)
M4RI-200100817
(64-bit)
10,000 x 10,000 2.85510.256 2.656
16,384 x 16,38410.865 -11.470
20,000 x 20,00019.505 -18.929
32,000 x 32,00070.110 -66.208

64-bit Ubuntu Linux, 2.6Ghz AMD Athlon X2
Matrix
Dimension
Magma 2.14-13
(64-bit)
GAP 4.4.10
(64-bit)
M4RI-200100817
(64-bit)
10,000 x 10,000 2.005 5.920 2.700
16,384 x 16,384 7.62524.18010.100
20,000 x 20,00013.870 -19.120
32,000 x 32,00051.155 -73.725

64-bit RHEL 5 Linux, 1.6Ghz Intel Itanium
Matrix
Dimension
Magma 2.14-16
(64-bit)
GAP 4.4.10
(64-bit)
M4RI-20080909
(64-bit)
10,000 x 10,000 7.941- 4.200
16,384 x 16,384 31.046- 16.430
20,000 x 20,000 55.654- 28.830
32,000 x 32,000209.483-109.414

Reduced Row Echelon Form

Computing the reduced row echelon form for random square matrices over $\mathbb{F}_2$. Since these algorithms and implementations are sensitive to the structure of the input matrices (e.g. rank sensitive), benchmarks for random square matrices hold very little meaning (see for example this blog post) except maybe to indicate peak performance. We provide some more pratical examples below.

Note that the M4RI algorithm has worse theoretical time complexity when compared to the assymptotically fast matrix elimination implemented in Magma. However, the M4RI algorithm needs less memory: the 64,000 x 64,000 example below takes more than 2GB of memory in Magma but ~ 1GB with M4RI.

64-bit Debian/GNU Linux, 2.33Ghz Core2Duo (MacbookPro 2,2)
Matrix
Dimension
Magma
2.14-17
NTL*
5.4.2
M4RI/M4RI
20090105
M4RI/PLUQ
20091101
10,000 x 10,000 2.474 14.14 1.532 0.864
16,384 x 16,384 7.957 67.52 6.597 3.602
20,000 x 20,000 13.151 123.70 12.031 6.138
32,000 x 32,000 39.653 462.32 40.768 25.290
64,000 x 64,000251.3462511.54241.017139.320

64-bit Debian/GNU Linux, 2.6Ghz Intel i7 (MacbookPro 6,2)
Matrix
Dimension
Magma
2.15-10
NTL*
5.4.2
M4RI/M4RI
20100817
M4RI/PLE
20121224
10,000 x 10,000 1.600 1.050 0.555
16,384 x 16,384 6.515 3.890 2.342
20,000 x 20,000 8.510 7.250 3.898
32,000 x 32,000 27.645 22.56018.309
64,000 x 64,000176.940124.48094.690

64-bit Linux, 2.33Ghz Xeon (E5345)
Matrix
Dimension
Magma
2.16-7
NTL*
5.4.2
M4RI/M4RI
20100324
M4RI/PLUQ
20100324
10,000 x 10,000 2.660 12.05 1.360 0.864
16,384 x 16,384 8.617 54.79 5.734 3.388
20,000 x 20,000 12.527 100.01 10.610 5.661
32,000 x 32,000 41.770 382.52 43.042 20.967
64,000 x 64,000250.193 --382.263 151.314

64-bit Debian/GNU Linux, 2.6Ghz Opteron (VMWare Virtualised)
Matrix
Dimension
Magma
2.14-13
NTL*
5.4.2
M4RI/M4RI
20090105
M4RI/PLUQ
20100817
10,000 x 10,000 3.351 18.45 2.430 1.430
16,384 x 16,384 11.289 72.89 10.822 6.520
20,000 x 20,000 16.734 130.46 19.978 10.180
32,000 x 32,000 57.567 479.07 83.575 40.250
64,000 x 64,000373.9062747.41537.900259.530

RHEL 5, 1.6Ghz Itanium
Matrix
Dimension
Magma
2.14-16
M4RI/M4RI
20080909
M4RI/PLUQ
20090617
10,000 x 10,000 8.069 3.418 2.410
16,384 x 16,384 25.828 19.987 12.720
20,000 x 20,000 43.256 30.829 15.800
32,000 x 32,000157.134118.121 65.550
64,000 x 64,000892.481874.609342.680

(*) For NTL we call gauss which also doesn't return the reduced row echelon form.

The following timings are for $\mathbb{F}_2$ matrices as they appear during a Gröbner basis computation for the HFE cryptosystem. These examples were provided by Michael Brickenstein. In input matrices are linked from the table below. To visualise the structure of the problem a scaled down version of the smallest input matrix follows:

scaled down version of hfe_25_5.png

64-bit Debian/GNU Linux, 2.6Ghz Opteron 858 (VMWare Virtualised)
HFE (Filesize) Matrix
Dimension
Magma
2.15-10
M4RI/M4RI
20100324
M4RI/PLUQ
20100324
25 (5.6 MB)12,307 x 13,508 4.57 3.28 3.34
30 (16 MB) 19,907 x 29,323 33.21 23.72 25.42
35 (40 MB) 29,969 x 55,800278.58126.08159.72

64-bit Fedora Linux, 2.33Ghz Xeon (E5345)
HFE (Filesize) Matrix
Dimension
Magma
2.16-7
M4RI/M4RI
20100324
M4RI/PLUQ
20100324
25 (5.6 MB)12,307 x 13,508 3.68 1.94 2.09
30 (16 MB) 19,907 x 29,32323.3911.4613.34
35 (40 MB) 29,969 x 55,800 --49.1968.85

The following timings are for matrices as they appear during a MutantXL computation. These examples were provided by Wael Said.

64-bit Debian/GNU Linux, 2.6Ghz Opteron (VMWare Virtualised)
Filesize Matrix
Dimension
Magma
2.15-10
M4RI/M4RI
20100324
M4RI/PLUQ
20100324
39 MB26,407 x 26,07576.81023.0319.040

64-bit Fedora Linux, 2.33Ghz Xeon (E5345)
Filesize Matrix
Dimension
Magma
2.16-7
M4RI/M4RI
20100324
M4RI/PLUQ
20100324
39 MB26,407 x 26,07555.1512.259.22

Peformance over $\mathbb{F}_{2^e}$

Matrix Multiplication

Multiplying two random square matrices over $\mathbb{F}_{2^e}$ of dimension 1,000 x 1,000. Magma implements Strassen-Winograd and M4RIE implements Strassen-Winograd either via Karatsuba or using Newton-John tables ($e > 8$).

64-bit Debian/GNU Linux, 2.66Ghz Intel i7 (MacbookPro6,2)
Finite Field
Degree
Magma 2.15-10
(64-bit)
GAP 4.4.12
(64-bit)
LinBox (svn)
(64-bit)
M4RIE
0e0147635ebe
2 0.013 0.216 0.468 0.057
3 0.036 0.592 0.480 0.109
4 0.074 0.588 0.760 0.166
5 1.276 1.568 1.932 0.243
6 1.286 1.356 3.532 0.321
7 1.316 1.276 3.884 0.425
8 1.842 1.328 -- 0.523
9 3.985 64.700 -- 1.419
10 4.160 59.131 -- 1.689

Reduced Row Echelon Form

Computing the reduced row echelon form for random 10,000 x 10,000 matrices over $\mathbb{F}_{2^e}$. Magma implements asymptotically fast Gaussian elimination. M4RIE implements Gaussian elimination with Newton-John tables and PLE decomposition ($e \leq 8$).

64-bit Debian/GNU Linux, 2.66Ghz Intel i7 (MacbookPro6,2)
Finite Field
Degree
Magma 2.15-10
(64-bit)
GAP 4.4.12
(64-bit)
M4RIE 6b24b839a46f
(64-bit)
2 6.040 162.658 3.310
3 14.470 442.522 5.332
4 60.370 502.672 6.330
5 659.030 N/A 10.511
6 685.460 N/A 13.078
7 671.880 N/A 17.285
8 840.220 N/A 20.247
9 1630.380 N/A 260.774
10 1631.350 N/A 291.298

Valid XHTML 1.1 Valid CSS!