--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/Modules/_decimal/libmpdec/literature/matrix-transform.txt Sat Mar 10 18:12:20 2012 +0100 @@ -0,0 +1,256 @@ + + +(* Copyright (c) 2011 Stefan Krah. All rights reserved. *) + + +The Matrix Fourier Transform: +============================= + +In libmpdec, the Matrix Fourier Transform [1] is called four-step transform +after a variant that appears in [2]. The algorithm requires that the input +array can be viewed as an R*C matrix. + +All operations are done modulo p. For readability, the proofs drop all +instances of (mod p). + + +Algorithm four-step (forward transform): +---------------------------------------- + + a := input array + d := len(a) = R * C + p := prime + w := primitive root of unity of the prime field + r := w**((p-1)/d) + A := output array + + 1) Apply a length R FNT to each column. + + 2) Multiply each matrix element (addressed by j*C+m) by r**(j*m). + + 3) Apply a length C FNT to each row. + + 4) Transpose the matrix. + + +Proof (forward transform): +-------------------------- + + The algorithm can be derived starting from the regular definition of + the finite-field transform of length d: + + d-1 + ,---- + \ + A[k] = | a[l] * r**(k * l) + / + `---- + l = 0 + + + The sum can be rearranged into the sum of the sums of columns: + + C-1 R-1 + ,---- ,---- + \ \ + = | | a[i * C + j] * r**(k * (i * C + j)) + / / + `---- `---- + j = 0 i = 0 + + + Extracting a constant from the inner sum: + + C-1 R-1 + ,---- ,---- + \ \ + = | r**k*j * | a[i * C + j] * r**(k * i * C) + / / + `---- `---- + j = 0 i = 0 + + + Without any loss of generality, let k = n * R + m, + where n < C and m < R: + + C-1 R-1 + ,---- ,---- + \ \ + A[n*R+m] = | r**(R*n*j) * r**(m*j) * | a[i*C+j] * r**(R*C*n*i) * r**(C*m*i) + / / + `---- `---- + j = 0 i = 0 + + + Since r = w ** ((p-1) / (R*C)): + + a) r**(R*C*n*i) = w**((p-1)*n*i) = 1 + + b) r**(C*m*i) = w**((p-1) / R) ** (m*i) = r_R ** (m*i) + + c) r**(R*n*j) = w**((p-1) / C) ** (n*j) = r_C ** (n*j) + + r_R := root of the subfield of length R. + r_C := root of the subfield of length C. + + + C-1 R-1 + ,---- ,---- + \ \ + A[n*R+m] = | r_C**(n*j) * [ r**(m*j) * | a[i*C+j] * r_R**(m*i) ] + / ^ / + `---- | `---- 1) transform the columns + j = 0 | i = 0 + ^ | + | `-- 2) multiply + | + `-- 3) transform the rows + + + Note that the entire RHS is a function of n and m and that the results + for each pair (n, m) are stored in Fortran order. + + Let the term in square brackets be f(m, j). Step 1) and 2) precalculate + the term for all (m, j). After that, the original matrix is now a lookup + table with the mth element in the jth column at location m * C + j. + + Let the complete RHS be g(m, n). Step 3) does an in-place transform of + length n on all rows. After that, the original matrix is now a lookup + table with the mth element in the nth column at location m * C + n. + + But each (m, n) pair should be written to location n * R + m. Therefore, + step 4) transposes the result of step 3). + + + +Algorithm four-step (inverse transform): +---------------------------------------- + + A := input array + d := len(A) = R * C + p := prime + d' := d**(p-2) # inverse of d + w := primitive root of unity of the prime field + r := w**((p-1)/d) # root of the subfield + r' := w**((p-1) - (p-1)/d) # inverse of r + a := output array + + 0) View the matrix as a C*R matrix. + + 1) Transpose the matrix, producing an R*C matrix. + + 2) Apply a length C FNT to each row. + + 3) Multiply each matrix element (addressed by i*C+n) by r**(i*n). + + 4) Apply a length R FNT to each column. + + +Proof (inverse transform): +-------------------------- + + The algorithm can be derived starting from the regular definition of + the finite-field inverse transform of length d: + + d-1 + ,---- + \ + a[k] = d' * | A[l] * r' ** (k * l) + / + `---- + l = 0 + + + The sum can be rearranged into the sum of the sums of columns. Note + that at this stage we still have a C*R matrix, so C denotes the number + of rows: + + R-1 C-1 + ,---- ,---- + \ \ + = d' * | | a[j * R + i] * r' ** (k * (j * R + i)) + / / + `---- `---- + i = 0 j = 0 + + + Extracting a constant from the inner sum: + + R-1 C-1 + ,---- ,---- + \ \ + = d' * | r' ** (k*i) * | a[j * R + i] * r' ** (k * j * R) + / / + `---- `---- + i = 0 j = 0 + + + Without any loss of generality, let k = m * C + n, + where m < R and n < C: + + R-1 C-1 + ,---- ,---- + \ \ + A[m*C+n] = d' * | r' ** (C*m*i) * r' ** (n*i) * | a[j*R+i] * r' ** (R*C*m*j) * r' ** (R*n*j) + / / + `---- `---- + i = 0 j = 0 + + + Since r' = w**((p-1) - (p-1)/d) and d = R*C: + + a) r' ** (R*C*m*j) = w**((p-1)*R*C*m*j - (p-1)*m*j) = 1 + + b) r' ** (C*m*i) = w**((p-1)*C - (p-1)/R) ** (m*i) = r_R' ** (m*i) + + c) r' ** (R*n*j) = r_C' ** (n*j) + + d) d' = d**(p-2) = (R*C) ** (p-2) = R**(p-2) * C**(p-2) = R' * C' + + r_R' := inverse of the root of the subfield of length R. + r_C' := inverse of the root of the subfield of length C. + R' := inverse of R + C' := inverse of C + + + R-1 C-1 + ,---- ,---- 2) transform the rows of a^T + \ \ + A[m*C+n] = R' * | r_R' ** (m*i) * [ r' ** (n*i) * C' * | a[j*R+i] * r_C' ** (n*j) ] + / ^ / ^ + `---- | `---- | + i = 0 | j = 0 | + ^ | `-- 1) Transpose input matrix + | `-- 3) multiply to address elements by + | i * C + j + `-- 3) transform the columns + + + + Note that the entire RHS is a function of m and n and that the results + for each pair (m, n) are stored in C order. + + Let the term in square brackets be f(n, i). Without step 1), the sum + would perform a length C transform on the columns of the input matrix. + This is a) inefficient and b) the results are needed in C order, so + step 1) exchanges rows and columns. + + Step 2) and 3) precalculate f(n, i) for all (n, i). After that, the + original matrix is now a lookup table with the ith element in the nth + column at location i * C + n. + + Let the complete RHS be g(m, n). Step 4) does an in-place transform of + length m on all columns. After that, the original matrix is now a lookup + table with the mth element in the nth column at location m * C + n, + which means that all A[k] = A[m * C + n] are in the correct order. + + +-- + + [1] Joerg Arndt: "Matters Computational" + http://www.jjj.de/fxt/ + [2] David H. Bailey: FFTs in External or Hierarchical Memory + http://crd.lbl.gov/~dhbailey/dhbpapers/ + + +