2. Neural Networks
Input $x_i$, output $y_j$, weight matrix $W_{ij}$
Supervised learning: adjust $W$ to minimize error on dataset
\[\begin{aligned}
y_{j} & = W_{ij} x_{i} & \mbox{(Matrix multiplication)}\\
y_{j} & = W_{ij} x_{i} + c_{j} & \mbox{(+ Bias)}\\
y_{bj} & = W_{ij} x_{bi} + c_{j} & \mbox{(+ Batch dimension)}\\
y_{bj} & = f(W_{ij} x_{bi} + c_{j}) & \mbox{(+ Activation function)}
\end{aligned} \]
$z_{bj} = W_{ij} x_{bi} + c_{j}$
$y_{bj} = f(z_{bj}) = \operatorname{relu}(z_{bj})$
def dense(x, w, c):
    z = tf.matmul(x, w) + c
    # or z = tf.einsum("ij,bi->bj", w, x) + c
    y = tf.nn.relu(z)
    return y
# x has shape [batch, in_units]
w = tf.Variable(tf.random.normal([x.shape[1], 10]))
c = tf.Variable(tf.zeros([10]))
with tf.GradientTape() as tape:
    y = dense(x, w, c)
dy_dw = tape.gradient(y, w)
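To actually adjust $W$ as stated above, the gradient of a loss (rather than of $y$ itself) is applied by an optimizer. A minimal sketch of one training step, assuming targets `y_true` and a mean-squared-error loss (not part of the slides):
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    y = dense(x, w, c)
    # Assumed loss: mean squared error against targets y_true
    loss = tf.reduce_mean(tf.square(y - y_true))

grads = tape.gradient(loss, [w, c])
optimizer.apply_gradients(zip(grads, [w, c]))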
Keras - Layers as objects
# Layer contains variables.
# Returns relu(Wx + c) when called.
dense_layer = tf.keras.layers.Dense(
    units=10, activation="relu"
)
# x has shape [batch, in_units]
y = dense_layer(x)
# Can reuse layer without creating new variables
v = dense_layer(u)
Point-cloud Classification
network = Sequential([
    Dense(units=64, activation="relu"),
    Dense(units=64, activation="relu"),
    Dense(units=40, activation="softmax")
])
# loss: term to be minimized
network.compile(loss="categorical_crossentropy",
                optimizer="Adam")
# inputs: [num_samples, 3*2048] (2048 3d points)
# outputs: [num_samples, 40] (40 classes)
network.fit(train_inputs, train_outputs,
            epochs=10, validation_split=0.1)
predictions = network.predict(test_inputs)
4. GA in TensorFlow
Geometric product: multiplication table
For $e_1^2 = e_2^2 = 1$
        1       e1      e2      e12
1       1       e1      e2      e12
e1      e1      1       e12     e2
e2      e2      -e12    1       -e1
e12     e12     -e2     e1      -1
Example: $table[e_1, e_{12}] = e_2$
or with blade-indices (Scalar: 0, $e_1$: 1, $e_2$: 2, $e_{12}$: 3): $table[1, 3] = 2$
Can be precomputed
Multivectors: basis blade coefficients as 1-Tensor
E.g.: a = $1 + 2 e_1 + 3 e_2 + 4 e_{12} \rightarrow [1, 2, 3, 4] = a_i$
Dense representation of the mult. table as a 3-tensor $C_{ijk}$
$a b = y \rightarrow a_i b_j C_{ijk} = y_k$
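For the 2D example above, the dense multiplication-table tensor can be filled in directly from the table. A minimal illustrative sketch (names are placeholders, not tfga code):
import numpy as np
import tensorflow as tf

# C[i, j, k] = coefficient of blade k in the product blade_i * blade_j,
# with blade order [1, e1, e2, e12] as in the table above.
c_np = np.zeros([4, 4, 4], dtype=np.float32)
entries = [
    (0, 0, 0, +1), (0, 1, 1, +1), (0, 2, 2, +1), (0, 3, 3, +1),  # 1   * ...
    (1, 0, 1, +1), (1, 1, 0, +1), (1, 2, 3, +1), (1, 3, 2, +1),  # e1  * ...
    (2, 0, 2, +1), (2, 1, 3, -1), (2, 2, 0, +1), (2, 3, 1, -1),  # e2  * ...
    (3, 0, 3, +1), (3, 1, 2, -1), (3, 2, 1, +1), (3, 3, 0, -1),  # e12 * ...
]
for i, j, k, sign in entries:
    c_np[i, j, k] = sign
c = tf.constant(c_np)  # shape [4, 4, 4]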
a = tf.constant([1., 2., 3., 4.])  # shape [4]
b = tf.constant([3., 9., 7., 5.])  # shape [4]
# c is precomputed and has shape [4, 4, 4]
y = tf.einsum("i,j,ijk->k", a, b, c)  # y shape [4]
Benchmark: Geometric product
Benchmark: Addition
*clifford-raw: using numba parallel JIT (overhead at low element counts is lower on Linux)
CPU: AMD 2700X (8C/16T), GPU: Nvidia GTX 1070
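A micro-benchmark along these lines could be set up roughly like the sketch below (illustrative only, not the script behind the plots; the element count, timing parameters, and the assumption that geom_prod broadcasts over a leading batch axis are mine):
import timeit
import tensorflow as tf
from tfga import GeometricAlgebra

ga = GeometricAlgebra([1, 1, 1])
num_elements = 100_000
# 2**3 = 8 basis blades for a 3D algebra
a = tf.random.normal([num_elements, 8])
b = tf.random.normal([num_elements, 8])

geom_prod = tf.function(ga.geom_prod)
geom_prod(a, b)  # warm-up / trace once before timing

print(timeit.timeit(lambda: geom_prod(a, b), number=100))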
Reminder: Automatic differentiation
$y = a b, \quad \frac{\partial y(a, b)}{\partial a}$
a = tf.constant([1., 2., 3., 4.])  # shape [4]
b = tf.constant([3., 9., 7., 5.])  # shape [4]
with tf.GradientTape(persistent=True) as tape:
    tape.watch(a)  # constants must be watched explicitly
    # c is precomputed and has shape [4, 4, 4]
    y = tf.einsum("i,j,ijk->k", a, b, c)
print(tape.gradient(y, a))  # = d/da_i sum_j(y_j), shape [4]
print(tape.jacobian(y, a))  # = d y_j / d a_i, shape [4, 4]
ga = tfga.GeometricAlgebra([1, 1, 1])
# Create geometric algebra tf.Tensor for vector blades
# (i.e. e_0 + e_1 + e_2).
# Result: tf.Tensor [0, 1, 1, 1, 0, 0, 0, 0]
vector = ga.from_tensor_with_kind(
    tf.ones(3), kind="vector"
)
# 5 + 5 e_01 + 5 e_02 + 5 e_12
# tf.Tensor [5, 0, 0, 0, 5, 5, 5, 0]
quaternion = ga.from_tensor_with_kind(
    tf.fill(dims=[4], value=5.0),
    kind="even"
)
ga.print(ga.geom_prod(vector, quaternion))
# Grade reversal ~(5 + 5 e_01 + 5 e_02 + 5 e_12)
# = 5 + 5 e_10 + 5 e_20 + 5 e_21
# = 5 - 5 e_01 - 5 e_02 - 5 e_12
ga.print(ga.reversion(quaternion))
# tf.Tensor of shape [1]: -5
# (i.e. reversed sign of e_01 component)
ga.print(ga.select_blades(quaternion, "10"))
# tf.Tensor of shape [8] with only e_01
# component equal to 5
ga.print(ga.keep_blades(quaternion, "10"))
# Exterior product e_01 ^ e_2 = e_012.
ga.print(ga.ext_prod(ga.e01, ga.e2))
ga = tfga.GeometricAlgebra([1, 1, 1])
a = ga.from_tensor(...)
b = ga.from_tensor(...)
# Wraps tf.Tensors in tfga's MultiVector class which
# provides operator overrides etc.
mv_a = ga(a)
mv_b = ga(b)
# Reversion ((~mv_a).tensor equivalent to ga.reversion(a))
print(~mv_a)
# Geometric / inner / exterior product
print(mv_a * mv_b, mv_a | mv_b, mv_a ^ mv_b)
# Get back tf.Tensor from the multivector
print(mv_a.tensor)
Tensor vs MultiVector API
Tensors: easy TensorFlow interop
MultiVector: less verbose, no overhead
5. GA Neural Nets
Standard Dense Layer: $y_{bj} = f(W_{ij} x_{bi} + c_j)$
$W_{ij} \in \mathbb{R}^{M \times N}$, $c_j \in \mathbb{R}^{N}$
dense_layer = Dense(
    units=64, activation="relu"
)
GA Dense Layer: make parameters multivector-valued
$W_{ij} \in \mathrm{Cl}_{(p,q,r)}^{M \times N}$, $c_j \in \mathrm{Cl}_{(p,q,r)}^{N}$
# Create algebra with 3 basis vectors squaring to 1
ga = tfga.GeometricAlgebra([1, 1, 1])
# 1, e01, e02, e12 indices: [0, 4, 5, 6]
even_indices = ga.get_kind_blade_indices("even")
# e0, e1, e2, e012 indices: [1, 2, 3, 7]
odd_indices = ga.get_kind_blade_indices("odd")
ga_dense_layer = tfga.layers.GeometricProductDense(
    algebra=ga,
    units=64, activation="relu",
    kernel_blade_indices=even_indices,
    bias_blade_indices=odd_indices
)
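A usage sketch (the input shape is an assumption, not from the slides): the GA layers act on blade coefficients, so inputs carry a trailing blade axis.
# Assumed input: 32 samples, 16 multivector-valued units,
# 8 blade coefficients each (2**3 blades for the 3D algebra)
x = tf.random.normal([32, 16, 8])
y = ga_dense_layer(x)  # expected output shape: [32, 64, 8]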
Choices
Activation function $f$: elementwise or acting on multivector (e.g. $e^{a e_{12}} = \cos(a) + e_{12} \sin(a)$)?
Geometric product $W_{ij} x_i$ or Sandwich product $W_{ij} x_i \widetilde{W}_{ij}$
Matrix mult $W_{ij} x_i$ or elementwise mult $W_{i} x_i$
Add bias $c_{j}$?
Algebra / signature
Subalgebra for $W_{ij}$ and $c_{j}$
No activation, sandwich product, elementwise multiplication, no bias, quaternion weights
$y_i^{(1)} = W_{i} x_i \widetilde{W}_{i}$
Two layers
$y_i^{(2)} = U_{i} y_i^{(1)} \widetilde{U}_{i} = U_{i} W_{i} x_i \widetilde{W}_{i} \widetilde{U}_{i}$
Composition of transforms
Multi-layer reduces to one layer
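This collapse can be checked numerically with tfga (an illustrative sketch; the example multivectors are assumptions): since $\widetilde{U W} = \widetilde{W} \widetilde{U}$, two elementwise sandwich layers equal a single one with combined weight $V = U W$.
import tensorflow as tf
from tfga import GeometricAlgebra

ga = GeometricAlgebra([1, 1, 1])

# Assumed example multivectors (shape [8] each)
x = ga.from_tensor_with_kind(tf.ones(3), kind="vector")
w = ga.from_tensor_with_kind(tf.constant([1., 2., 3., 4.]), kind="even")
u = ga.from_tensor_with_kind(tf.constant([4., 3., 2., 1.]), kind="even")

def sandwich(w, x):
    # W x ~W
    return ga.geom_prod(w, ga.geom_prod(x, ga.reversion(w)))

y2_two_layers = sandwich(u, sandwich(w, x))
v = ga.geom_prod(u, w)         # combined weight V = U W
y2_one_layer = sandwich(v, x)  # equals the two-layer result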
Log input, Exp on final output, ReLU activation, geometric product, matrix multiplication, bias, quaternion weights
$x_i^{(1)} = \log(x_i^{(0)})$
$x_j^{(n+1)} = \operatorname{relu}(W_{ij}^{(n)} x_i^{(n)} + c_j^{(n)})$
$y_j = e^{W_{ij}^{(N-1)} x_i^{(N-1)} + c_j^{(N-1)}}$
$\log$ goes from the Lie group to the Lie algebra
$\exp$ goes from the Lie algebra to the Lie group
6.1 LieNet
Deep Learning on Lie Groups for Skeleton-based Action Recognition
Implementation in TFGA by Hugo Hadfield
ga = GeometricAlgebra([1, 1, 1])
model = Sequential([
    TensorToGeometric(ga, blade_indices=[6, 5, 4, 0]),
    RotMap(ga, use_bias=False),
    LogMap(ga),
    Flatten(),
    ReLU(),
    Dense(units=20),
    Softmax(axis=-1)
])
Train model on dataset (Pose-sequence $\rightarrow$ Action)
model.compile(
    optimizer="Adam",
    loss="sparse_categorical_crossentropy",
    metrics=["sparse_categorical_accuracy"]
)
model.fit(
    x=inputs_train, y=labels_train,
    validation_data=(inputs_test, labels_test),
    epochs=100
)
Results
Relatively easy and quick to implement using TFGA
Accuracy verified
But: outperformed by simple Dense ReLU NN (~90% accuracy)
6.3 Lattice QFT
\[\begin{aligned}
\mathcal{L}_{QED}(X) = & \langle \hbar (\nabla \psi(X)) \gamma_2 \gamma_1 \gamma_3 \widetilde{\psi}(X) - \\
& e A(X) \psi(X) \gamma_0 \widetilde{\psi}(X) - \\
& m \psi(X) \widetilde{\psi}(X) \rangle_0
\end{aligned} \]
Action $S = \int_{\mathcal{X}} \mathcal{L}(X) dX$
Many cells, many parallel calculations, perfect for TensorFlow
TensorFlow Probability provides MCMC samplers (or variational inference) needed for lattice QFT
$m \psi(X) \widetilde{\psi}(X)$
sta = tfga.GeometricAlgebra([1, -1, -1, -1])

def get_mass_term(psi, electron_mass):
    return electron_mass * sta.geom_prod(psi, sta.reversion(psi))
$e A(X) \psi(X) \gamma_0 \widetilde{\psi}(X)$
def get_interaction_term(psi, a, electron_charge):
    return sta.geom_prod(
        electron_charge * a,
        sta.geom_prod(
            psi,
            sta.geom_prod(sta.e0, sta.reversion(psi))
        )
    )
$\hbar (\nabla \psi(X)) \gamma_2 \gamma_1 \gamma_3 \widetilde{\psi}(X)$
def get_momentum_term(psi, spacing, hbar):
    dt_psi = finite_differences(psi, axis=0, spacing=spacing)
    dx_psi = finite_differences(psi, axis=1, spacing=spacing)
    dy_psi = finite_differences(psi, axis=2, spacing=spacing)
    dz_psi = finite_differences(psi, axis=3, spacing=spacing)
    d_psi = dt_psi + dx_psi + dy_psi + dz_psi
    return sta.geom_prod(
        hbar * d_psi,
        sta.geom_prod(sta.e213, sta.reversion(psi))
    )
Sum cells' $\mathcal{L}$ to get Action $S$
def get_action(psi, a, electron_charge):
    # electron_mass, spacing and hbar are constants defined elsewhere
    mass_term = get_mass_term(psi=psi,
                              electron_mass=electron_mass)
    int_term = get_interaction_term(psi=psi,
                                    a=a, electron_charge=electron_charge)
    mom_term = get_momentum_term(psi=psi,
                                 spacing=spacing, hbar=hbar)
    # Combine terms and take the scalar part of each cell's Lagrangian
    lagrangians = (mom_term - mass_term - int_term)[..., 0]
    return tf.reduce_sum(lagrangians)
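With the action in hand, TensorFlow Probability's MCMC kernels (mentioned above) can sample field configurations. A rough sketch, assuming a Euclidean setup where configurations are weighted by $e^{-S}$, a fixed potential `a`, and `psi` stored as a tf.Tensor with a trailing blade axis of size 16; the grid shape and sampler settings are placeholders:
import tensorflow as tf
import tensorflow_probability as tfp

grid_shape = [8, 8, 8, 8, 16]  # T, X, Y, Z, 16 blades of the spacetime algebra

def target_log_prob_fn(psi):
    # log p(psi) ~ -S[psi] (up to a constant)
    return -get_action(psi, a, electron_charge)

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob_fn,
    step_size=0.01,
    num_leapfrog_steps=3,
)

samples = tfp.mcmc.sample_chain(
    num_results=100,
    num_burnin_steps=100,
    current_state=tf.random.normal(grid_shape),
    kernel=kernel,
    trace_fn=None,
)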