--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Modules/_decimal/libmpdec/literature/matrix-transform.txt Sat Mar 10 18:12:20 2012 +0100
@@ -0,0 +1,256 @@
+
+
+(* Copyright (c) 2011 Stefan Krah. All rights reserved. *)
+
+
+The Matrix Fourier Transform:
+=============================
+
+In libmpdec, the Matrix Fourier Transform [1] is called four-step transform
+after a variant that appears in [2]. The algorithm requires that the input
+array can be viewed as an R*C matrix.
+
+All operations are done modulo p. For readability, the proofs drop all
+instances of (mod p).
+
+
+Algorithm four-step (forward transform):
+----------------------------------------
+
+ a := input array
+ d := len(a) = R * C
+ p := prime
+ w := primitive root of unity of the prime field
+ r := w**((p-1)/d)
+ A := output array
+
+ 1) Apply a length R FNT to each column.
+
+ 2) Multiply each matrix element (addressed by j*C+m) by r**(j*m).
+
+ 3) Apply a length C FNT to each row.
+
+ 4) Transpose the matrix.
+
+
+Proof (forward transform):
+--------------------------
+
+ The algorithm can be derived starting from the regular definition of
+ the finite-field transform of length d:
+
+ d-1
+ ,----
+ \
+ A[k] = | a[l] * r**(k * l)
+ /
+ `----
+ l = 0
+
+
+ The sum can be rearranged into the sum of the sums of columns:
+
+ C-1 R-1
+ ,---- ,----
+ \ \
+ = | | a[i * C + j] * r**(k * (i * C + j))
+ / /
+ `---- `----
+ j = 0 i = 0
+
+
+ Extracting a constant from the inner sum:
+
+ C-1 R-1
+ ,---- ,----
+ \ \
+ = | r**k*j * | a[i * C + j] * r**(k * i * C)
+ / /
+ `---- `----
+ j = 0 i = 0
+
+
+ Without any loss of generality, let k = n * R + m,
+ where n < C and m < R:
+
+ C-1 R-1
+ ,---- ,----
+ \ \
+ A[n*R+m] = | r**(R*n*j) * r**(m*j) * | a[i*C+j] * r**(R*C*n*i) * r**(C*m*i)
+ / /
+ `---- `----
+ j = 0 i = 0
+
+
+ Since r = w ** ((p-1) / (R*C)):
+
+ a) r**(R*C*n*i) = w**((p-1)*n*i) = 1
+
+ b) r**(C*m*i) = w**((p-1) / R) ** (m*i) = r_R ** (m*i)
+
+ c) r**(R*n*j) = w**((p-1) / C) ** (n*j) = r_C ** (n*j)
+
+ r_R := root of the subfield of length R.
+ r_C := root of the subfield of length C.
+
+
+ C-1 R-1
+ ,---- ,----
+ \ \
+ A[n*R+m] = | r_C**(n*j) * [ r**(m*j) * | a[i*C+j] * r_R**(m*i) ]
+ / ^ /
+ `---- | `---- 1) transform the columns
+ j = 0 | i = 0
+ ^ |
+ | `-- 2) multiply
+ |
+ `-- 3) transform the rows
+
+
+ Note that the entire RHS is a function of n and m and that the results
+ for each pair (n, m) are stored in Fortran order.
+
+ Let the term in square brackets be f(m, j). Step 1) and 2) precalculate
+ the term for all (m, j). After that, the original matrix is now a lookup
+ table with the mth element in the jth column at location m * C + j.
+
+ Let the complete RHS be g(m, n). Step 3) does an in-place transform of
+ length n on all rows. After that, the original matrix is now a lookup
+ table with the mth element in the nth column at location m * C + n.
+
+ But each (m, n) pair should be written to location n * R + m. Therefore,
+ step 4) transposes the result of step 3).
+
+
+
+Algorithm four-step (inverse transform):
+----------------------------------------
+
+ A := input array
+ d := len(A) = R * C
+ p := prime
+ d' := d**(p-2) # inverse of d
+ w := primitive root of unity of the prime field
+ r := w**((p-1)/d) # root of the subfield
+ r' := w**((p-1) - (p-1)/d) # inverse of r
+ a := output array
+
+ 0) View the matrix as a C*R matrix.
+
+ 1) Transpose the matrix, producing an R*C matrix.
+
+ 2) Apply a length C FNT to each row.
+
+ 3) Multiply each matrix element (addressed by i*C+n) by r**(i*n).
+
+ 4) Apply a length R FNT to each column.
+
+
+Proof (inverse transform):
+--------------------------
+
+ The algorithm can be derived starting from the regular definition of
+ the finite-field inverse transform of length d:
+
+ d-1
+ ,----
+ \
+ a[k] = d' * | A[l] * r' ** (k * l)
+ /
+ `----
+ l = 0
+
+
+ The sum can be rearranged into the sum of the sums of columns. Note
+ that at this stage we still have a C*R matrix, so C denotes the number
+ of rows:
+
+ R-1 C-1
+ ,---- ,----
+ \ \
+ = d' * | | a[j * R + i] * r' ** (k * (j * R + i))
+ / /
+ `---- `----
+ i = 0 j = 0
+
+
+ Extracting a constant from the inner sum:
+
+ R-1 C-1
+ ,---- ,----
+ \ \
+ = d' * | r' ** (k*i) * | a[j * R + i] * r' ** (k * j * R)
+ / /
+ `---- `----
+ i = 0 j = 0
+
+
+ Without any loss of generality, let k = m * C + n,
+ where m < R and n < C:
+
+ R-1 C-1
+ ,---- ,----
+ \ \
+ A[m*C+n] = d' * | r' ** (C*m*i) * r' ** (n*i) * | a[j*R+i] * r' ** (R*C*m*j) * r' ** (R*n*j)
+ / /
+ `---- `----
+ i = 0 j = 0
+
+
+ Since r' = w**((p-1) - (p-1)/d) and d = R*C:
+
+ a) r' ** (R*C*m*j) = w**((p-1)*R*C*m*j - (p-1)*m*j) = 1
+
+ b) r' ** (C*m*i) = w**((p-1)*C - (p-1)/R) ** (m*i) = r_R' ** (m*i)
+
+ c) r' ** (R*n*j) = r_C' ** (n*j)
+
+ d) d' = d**(p-2) = (R*C) ** (p-2) = R**(p-2) * C**(p-2) = R' * C'
+
+ r_R' := inverse of the root of the subfield of length R.
+ r_C' := inverse of the root of the subfield of length C.
+ R' := inverse of R
+ C' := inverse of C
+
+
+ R-1 C-1
+ ,---- ,---- 2) transform the rows of a^T
+ \ \
+ A[m*C+n] = R' * | r_R' ** (m*i) * [ r' ** (n*i) * C' * | a[j*R+i] * r_C' ** (n*j) ]
+ / ^ / ^
+ `---- | `---- |
+ i = 0 | j = 0 |
+ ^ | `-- 1) Transpose input matrix
+ | `-- 3) multiply to address elements by
+ | i * C + j
+ `-- 3) transform the columns
+
+
+
+ Note that the entire RHS is a function of m and n and that the results
+ for each pair (m, n) are stored in C order.
+
+ Let the term in square brackets be f(n, i). Without step 1), the sum
+ would perform a length C transform on the columns of the input matrix.
+ This is a) inefficient and b) the results are needed in C order, so
+ step 1) exchanges rows and columns.
+
+ Step 2) and 3) precalculate f(n, i) for all (n, i). After that, the
+ original matrix is now a lookup table with the ith element in the nth
+ column at location i * C + n.
+
+ Let the complete RHS be g(m, n). Step 4) does an in-place transform of
+ length m on all columns. After that, the original matrix is now a lookup
+ table with the mth element in the nth column at location m * C + n,
+ which means that all A[k] = A[m * C + n] are in the correct order.
+
+
+--
+
+ [1] Joerg Arndt: "Matters Computational"
+ http://www.jjj.de/fxt/
+ [2] David H. Bailey: FFTs in External or Hierarchical Memory
+ http://crd.lbl.gov/~dhbailey/dhbpapers/
+
+
+