Accelerating Erasure Coding with Bit Matrix Multiply


Introduction

In this article we will present a method for fast implementation of Erasure Coding primitives based on Galois Field arithmetic. The Jerasure library manual (pdf) is a good introduction to the erasure coding method considered in this article.

    For or purpose it suffices to say we have $r$ partial stripes. Each striple is constituted by $k$ $w$-bit data words and we want to compute the $m$ $w$-bit parity words to complete each striple.

Each parity word is computed as a dot-product of the data striple multiplied by a constant vectors whose values depends on the index of the parity word we want to compute (those index ranges from
$0$ to $m-1$). The vector elements and dot-product results are in $GF(2^w)$.

Multiplication by a constant through vector by matrix multiplication


We will consider the primitive operation on a $w$-bit data word $D$, which must be multiplied by a $w$-bit constant $C$ to get a $w$-bit parity word $P$: $D \times C = P$

The methode, due to Mastrovito (detailed in [5]), expands a multiplication by a constant and the modulo reduction into a multiplication by a bit matrix as follows:
$$
\begin{pmatrix}
data word
\end{pmatrix}
\times
\begin{pmatrix}
C . X^0 [g] \\
C . X^1 [g] \\
\vdots \\
C . X^{k-1} [g]
\end{pmatrix}
=
\begin{pmatrix}
parity word
\end{pmatrix}
$$

The multiplicand matrix is built by expanding $C . X^i [g] $ for each row $i$ of the matrix.
The matrix multiplication will accumulate the multiplication $d_i \times X^i [g] $ to build the final result:
\begin{align}
 P & = D \times C [g] \\
    & = (\oplus_{i=0}^{w-1} d_i \times X^i) \times C  [g] \\
    & = (\oplus_{i=0}^{w-1} d_i . C \times X^i)   [g] \\
    & = \oplus_{i=0}^{w-1} d_i . (C \times X^i [g] )
\end{align}

\begin{align*}
  \times &

\begin{pmatrix}
c_{0,0} & c_{0,1} & \dots & c_{0,w-1} \\
c_{1,0} & c_{1,1} & \dots & c_{1,w-1} \\
\vdots    &               &          &  \vdots \\
c_{k-1,0} & c_{k-1,1} & \dots & c_{w-1,w-1} \\
\end{pmatrix} \\
\begin{pmatrix} d_0 & d_1 & \dots & d_{w-1}  \end{pmatrix}   &
\begin{pmatrix}
p_0 & p_1 & \dots & p_{w-1}
\end{pmatrix}

\end{align*}

Multiplication parallelization

We have seen how to use vector by matrix multiplication to compute the multiplication of a data word to get part of a parity word (we still have to accumulate those parts to get the actual parity word). We can extend the left hand side vector to a full matrix and implements multiple multiplications by the same expanded constant C, realizing a Single Instruction Multiple Data (SIMD) operation.
With a single operation implementing a multiplication between $w \times w$ matrices we implement $w$ parallel multiplication by a constant C.

Erasure code operation


The macro operation for to compute erasure coding parity is a matrix multiplication where the right hand side matrix element are constant in $GF(2^w)$. This operation is decomposed into vector by constant vector multiplications, themselves decomposed into multiplication by constants and $xor$ accumulations.

References:

  • [1] Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions (pdf)
  • [2] Improving the coding speed of erasure codes with polynomial ring transforms (pdf)
  • [3] A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields, C. Paar (pdf)
  • [4] Jerasure library manual (pdf)
  • [5] Fast and Secure Finite Field Multipliers (pdf) by Pamula and Tisserand (contains description of Mastrovito's method)

No comments:

Post a Comment