Module 4: Theory#

Pyodide note (Python directly in the browser)#

The code in this notebook is meant to be run locally on your own computer, not directly in the browser via Pyodide. The reason is that we use scikit-learn to download the full MNIST dataset. This is a data-heavy operation involving the download of a large file, which is not readily supported, or practical, in a browser-based environment such as Pyodide.

Digital images#

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml
# Fetch MNIST (784 = 28x28 pixels)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype('int64')

plt.figure(figsize=(12, 12))
plt.subplot(221)
plt.imshow(X[0].reshape(28, 28), cmap='gray')
plt.title(f"Digit: {y[0]}")
plt.subplot(222)
plt.imshow(X[1].reshape(28, 28), cmap='gray')
plt.title(f"Digit: {y[1]}")
plt.subplot(223)
plt.imshow(X[2].reshape(28, 28), cmap='gray')
plt.title(f"Digit: {y[2]}")
plt.subplot(224)
plt.imshow(X[3].reshape(28, 28), cmap='gray')
plt.title(f"Digit: {y[3]}")
# Show the figure
plt.show()

[Figure: the first four MNIST images, X[0] to X[3], each titled with its label.]

Vector functions#

Let \(d,k \in \mathbb{N}\). A vector function of several variables is a function of the form

\[\begin{equation*} \pmb{f} \colon \operatorname{dom}(\pmb{f}) \to \mathbb{R}^k, \quad \text{where } \operatorname{dom}(\pmb{f}) \subseteq \mathbb{R}^d. \end{equation*}\]

Thus a vector function \(\pmb{f} \colon \pmb{x} \mapsto \pmb{f}(\pmb{x})\) has:

Input (the domain): vectors \(\pmb{x}\) in \(\mathbb{R}^d\)
Output (the codomain): vectors \(\pmb{f}(\pmb{x})\) in \(\mathbb{R}^k\)
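As a small illustration of the definition, the sketch below implements a made-up vector function \(\pmb{f} \colon \mathbb{R}^2 \to \mathbb{R}^3\), so \(d=2\) and \(k=3\). The formula is hypothetical and chosen only for illustration; it does not come from the module.

```python
def f(x):
    """A made-up vector function f: R^2 -> R^3 (d = 2, k = 3)."""
    x1, x2 = x                                 # input: a vector with d = 2 coordinates
    return [x1 + x2, x1 * x2, x1 - 2.0 * x2]   # output: a vector with k = 3 coordinates


x = [1.0, 2.0]
y = f(x)
print(len(x), len(y))   # the dimensions d and k
print(y)
```

The input and the output are both plain lists of numbers; only their lengths differ.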

Input \(\pmb{x}_0\)#

# Show the third image from the training set
x0 = X[2].reshape(28, 28)
plt.figure(figsize=(7, 7))
plt.imshow(x0, cmap='gray')
plt.axis('off')
plt.show()

[Figure: the third MNIST image, X[2], shown as a 28x28 grayscale image.]

Output \(\pmb{f}(\pmb{x}_0)\)#

Output:

"This is the digit 4"

Ideal output:

\[\begin{equation*} \pmb{f}(\pmb{x}_0) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \end{equation*}\]

This means: "100% sure that this is the digit 4"

More realistic output:

\[\begin{equation*} \pmb{f}(\pmb{x}_0) = \begin{bmatrix} 0.01 \\ 0.03 \\ 0.01 \\ 0.01 \\ 0.87 \\ 0.01 \\ 0.01 \\ 0.03 \\ 0.01 \\ 0.01 \end{bmatrix} \end{equation*}\]

This means: "87% sure that this is the digit 4. Small probability (3% each) that it is 1 or 7."

In any case, the output is a vector of probabilities in \(\mathbb{R}^{10}\). Therefore \(\operatorname{co\text{-}dom}(\pmb{f})=\mathbb{R}^{10}\).
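The reading of such an output vector can be sketched in a few lines of plain Python. The vector below is the "more realistic" output from the text; the variable names are only illustrative.

```python
# The "more realistic" output vector from the text (one entry per digit 0-9).
p = [0.01, 0.03, 0.01, 0.01, 0.87, 0.01, 0.01, 0.03, 0.01, 0.01]

# A probability vector has non-negative entries that sum to 1.
assert all(v >= 0 for v in p)
assert abs(sum(p) - 1.0) < 1e-9

# The predicted digit is the index of the largest entry (argmax).
predicted_digit = max(range(len(p)), key=lambda i: p[i])
print(predicted_digit)

# The ideal output is the one-hot vector for that digit.
one_hot = [1 if i == predicted_digit else 0 for i in range(len(p))]
print(one_hot)
```

The one-hot vector printed at the end is the ideal output shown above, with a single 1 in position 4.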

What about the input?#

It is actually a matrix of size 28x28:

\[\begin{split} %\tiny \scriptsize \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 67 & 232 & 39 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 62 & 81 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 120 & 180 & 39 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 126 & 163 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 153 & 210 & 40 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 220 & 163 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 27 & 254 & 162 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 222 & 163 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 183 & 254 & 125 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 46 & 245 & 163 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 198 & 254 & 56 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 120 & 254 & 163 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 23 & 231 & 254 & 29 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 159 & 254 & 120 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 163 & 254 & 216 & 16 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 159 & 254 & 67 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 14 & 86 & 178 & 248 & 254 & 91 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 159 & 254 & 85 & 0 & 0 & 0 & 47 & 49 & 116 & 144 & 150 & 241 & 243 & 234 & 179 & 241 & 252 & 40 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 150 & 253 & 237 & 207 & 207 & 207 & 253 & 254 & 250 & 240 & 198 & 143 & 91 & 28 & 5 & 233 & 250 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 
\\ 0 & 0 & 0 & 0 & 119 & 177 & 177 & 177 & 177 & 177 & 98 & 56 & 0 & 0 & 0 & 0 & 0 & 102 & 254 & 220 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 254 & 137 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 254 & 57 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 254 & 57 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 255 & 94 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 254 & 96 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 254 & 153 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 169 & 255 & 153 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 96 & 254 & 153 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \end{split}\]

But that is not immediately a (column) vector!

We can turn it into a vector by writing the rows of the image one after another and stacking the entries in a single column. In Python this is called flattening the image.
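The idea can be tried on a tiny example first. The sketch below uses a hypothetical 2x3 "image" stored as nested lists instead of the 28x28 MNIST matrix; NumPy's `reshape` uses the same row-by-row order by default.

```python
# A tiny 2x3 "image" standing in for the 28x28 MNIST matrix.
image = [
    [1, 2, 3],
    [4, 5, 6],
]
n_rows, n_cols = len(image), len(image[0])

# Flatten: write the rows one after another (row-major order).
flat = [pixel for row in image for pixel in row]
print(flat)

# Pixel (i, j) of the image ends up at position i * n_cols + j.
i, j = 1, 2
print(flat[i * n_cols + j] == image[i][j])

# Going back: cut the flat vector into rows of length n_cols.
restored = [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
print(restored == image)
```

No information is lost by flattening: the matrix can be rebuilt from the vector as long as the number of columns is known.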

# Print as a column of 784 numbers
print("\nColumn representation of the image (784 numbers):")
x0.reshape(784,1)

Column representation of the image (784 numbers):

array([[  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 67],
       [232],
       [ 39],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 62],
       [ 81],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [120],
       [180],
       [ 39],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [126],
       [163],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  2],
       [153],
       [210],
       [ 40],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [220],
       [163],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 27],
       [254],
       [162],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [222],
       [163],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [183],
       [254],
       [125],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 46],
       [245],
       [163],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [198],
       [254],
       [ 56],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [120],
       [254],
       [163],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 23],
       [231],
       [254],
       [ 29],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [159],
       [254],
       [120],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [163],
       [254],
       [216],
       [ 16],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [159],
       [254],
       [ 67],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 14],
       [ 86],
       [178],
       [248],
       [254],
       [ 91],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [159],
       [254],
       [ 85],
       [  0],
       [  0],
       [  0],
       [ 47],
       [ 49],
       [116],
       [144],
       [150],
       [241],
       [243],
       [234],
       [179],
       [241],
       [252],
       [ 40],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [150],
       [253],
       [237],
       [207],
       [207],
       [207],
       [253],
       [254],
       [250],
       [240],
       [198],
       [143],
       [ 91],
       [ 28],
       [  5],
       [233],
       [250],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [119],
       [177],
       [177],
       [177],
       [177],
       [177],
       [ 98],
       [ 56],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [102],
       [254],
       [220],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [254],
       [137],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [254],
       [ 57],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [254],
       [ 57],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [255],
       [ 94],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [254],
       [ 96],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [254],
       [153],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [169],
       [255],
       [153],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [ 96],
       [254],
       [153],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0],
       [  0]])

So we can think of the inputs as vectors in \(\mathbb{R}^{784}\)! In other words:

An AI function#

The AI is a vector function with \(d=784\), \(k=10\).

In machine learning the function is called a model, and the specific AI function typically depends on thousands or millions of parameters (called weights). Each choice of parameters/weights gives a new AI function. Such functions (including deep neural networks) are not particularly complicated, but they are often long to write out explicitly. They are built from

Compositions of simple vector functions \(\pmb{g} \circ \pmb{h}\)
The simple functions are usually only:
1. Affine vector functions \(\pmb{x} \mapsto A \pmb{x} + \pmb{b}\) (the entries of the matrix \(A\) and the vector \(\pmb{b}\) are the parameters/weights)
2. A nonlinear activation function, e.g. ReLU.

The number of compositions \(\pmb{g}_1 \circ \pmb{g}_2 \circ \pmb{g}_3 \circ \cdots \circ \pmb{g}_N\) describes the depth of the network.
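A minimal sketch of this building principle, in plain Python without dependencies. The helper names `affine`, `relu` and `compose` and all the weights are hypothetical, chosen so the result can be checked by hand.

```python
def affine(A, b):
    """Return the affine map x -> A x + b (A is given as a list of rows)."""
    def g(x):
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(A, b)]
    return g


def relu(x):
    """ReLU applied coordinate-wise."""
    return [max(0.0, v) for v in x]


def compose(*functions):
    """compose(g1, g2, ..., gN) is g1 o g2 o ... o gN (gN is applied first)."""
    def composed(x):
        for g in reversed(functions):
            x = g(x)
        return x
    return composed


# Hypothetical weights, chosen only so the result is easy to check by hand.
g_in = affine([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])   # R^2 -> R^2
g_out = affine([[2.0, 1.0]], [0.5])                      # R^2 -> R^1

model = compose(g_out, relu, g_in)   # x -> g_out(relu(g_in(x)))
print(model([3.0, 1.0]))
print(model([1.0, 3.0]))
```

For the first input, `g_in` gives (2, 1), ReLU leaves it unchanged, and `g_out` gives 2·2 + 1·1 + 0.5 = 5.5. For the second input, ReLU cuts the negative coordinate to 0, which is where the nonlinearity enters.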

Neural networks#

What do these “AI functions” look like?

General notation for a feedforward ReLU network#

A feedforward network computes its output by passing the input sequentially through a series of layers. For each layer \(\ell\), a pre-activation (also called logits), \(z^{(\ell)}\), is computed first, followed by an activation (or hidden state), \(h^{(\ell)}\).

\[\begin{equation*} \begin{aligned} &h^{(0)} = x \in \mathbb{R}^{n_0} &&\text{(Input)}\\ &z^{(\ell)} = W_\ell h^{(\ell-1)} + b_\ell, && \ell = 1,2,\dots,L &&\text{(Logits)}\\ &h^{(\ell)} = \begin{cases} \sigma (z^{(\ell)}), & \ell < L, \\ z^{(\ell)}, & \ell = L, \end{cases} &&\text{(Activation)} \end{aligned} \end{equation*}\]

where

\(h^{(0)}\) is the input vector \(x\).
\(W_\ell \in \mathbb{R}^{n_\ell \times n_{\ell-1}}\) is the weight matrix of layer \(\ell\).
\(b_\ell \in \mathbb{R}^{n_\ell}\) is the bias vector.
\(\sigma:\mathbb{R}\to\mathbb{R}\) is a nonlinear activation function (applied coordinate-wise), typically ReLU:

\[\begin{equation*} \sigma(z) = \max(0,z) \quad\text{(ReLU)} \end{equation*}\]

\(n_0=d\) is the input dimension and \(n_L=k\) the output dimension.

For short, we say that the network is of the form \(n_0 \to n_1 \to \cdots \to n_L\).

The overall function of the network, \(\Phi: \mathbb{R}^d \to \mathbb{R}^k\), gives the final output:

\[\begin{equation*} \Phi(x) = z^{(L)} = W_L h^{(L-1)} + b_L, \end{equation*}\]

since the last layer here is linear, or more precisely affine (without ReLU activation).
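The recursion above translates almost line by line into code. The following is a sketch in plain Python, assuming the weights are given as lists of rows; the function names and the small network \(2 \to 3 \to 1\) are hypothetical.

```python
def matvec(W, h):
    """Matrix-vector product W h, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def forward(x, weights, biases):
    """Evaluate Phi(x) for a feedforward ReLU network.

    weights = [W_1, ..., W_L], biases = [b_1, ..., b_L].
    ReLU is applied after every layer except the last one.
    """
    h = list(x)                                             # h^(0) = x
    L = len(weights)
    for ell, (W, b) in enumerate(zip(weights, biases), start=1):
        z = [s + b_i for s, b_i in zip(matvec(W, h), b)]    # z^(ell)
        h = [max(0.0, v) for v in z] if ell < L else z      # h^(ell)
    return h                                                # h^(L) = z^(L) = Phi(x)


# Hypothetical network of the form 2 -> 3 -> 1 with hand-picked weights.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, -1.0]
W2 = [[1.0, 1.0, -2.0]]
b2 = [0.5]

print(forward([1.0, 2.0], [W1, W2], [b1, b2]))
print(forward([-1.0, 0.5], [W1, W2], [b1, b2]))
```

With NumPy arrays the inner computation would simply read `z = W @ h + b`; lists are used here only to keep the sketch free of dependencies.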

[Figure: a neural network]

Shallow network (one hidden layer, L=2)#

For a shallow network \(\Phi:\mathbb{R}^2\to\mathbb{R}\) with one hidden layer of size \(n\):

\[\begin{equation*} \Phi(x) = W_2 \, \sigma(W_1 x + b_1) + b_2, \end{equation*}\]

where

\(x \in \mathbb{R}^2\),
\(W_1 \in \mathbb{R}^{n\times 2},\ b_1 \in \mathbb{R}^{n},\)
\(W_2 \in \mathbb{R}^{1\times n},\ b_2 \in \mathbb{R}.\)

Illustration of the layers (shown here for a network with \(L=3\)):

\[\begin{equation*} h^{(0)} \xrightarrow{W_1,b_1} z^{(1)} \xrightarrow{\sigma} h^{(1)} \xrightarrow{W_2,b_2} z^{(2)} \xrightarrow{\sigma} h^{(2)} \xrightarrow{W_3,b_3} z^{(3)} = \Phi(x) \end{equation*}\]

Each ReLU layer splits the space into regions bounded by the sets where \( (W_\ell h^{(\ell-1)} + b_\ell)_i = 0 \). On each region \(\Phi\) is an affine function, so \(\Phi\) is a piecewise linear (more precisely, piecewise affine) function on \(\mathbb{R}^d\).
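The piecewise affine structure can be made concrete for a shallow network: once we know which hidden neurons are active at a point \(x\), the network reduces to a single affine map \(y \mapsto a \cdot y + c\) on the region around \(x\). The sketch below assumes a hypothetical network \(2 \to 3 \to 1\) with hand-picked weights; the helper names are not from the module.

```python
def shallow(x, W1, b1, W2, b2):
    """Phi(x) = W2 relu(W1 x + b1) + b2 for a network 2 -> n -> 1."""
    z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, v) for v in z]
    return sum(w * v for w, v in zip(W2, h)) + b2


def local_affine_map(x, W1, b1, W2, b2):
    """Return (a, c) with Phi(y) = a . y + c on the linear region containing x."""
    z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W1, b1)]
    active = [1.0 if v > 0 else 0.0 for v in z]      # which neurons are "on"
    # Only active neurons contribute to the slope a and the offset c.
    a = [sum(W2[i] * active[i] * W1[i][j] for i in range(len(W1)))
         for j in range(len(x))]
    c = sum(W2[i] * active[i] * b1[i] for i in range(len(W1))) + b2
    return a, c


# Hypothetical weights for a network 2 -> 3 -> 1.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, -1.0]
W2 = [1.0, 1.0, -2.0]          # the single row of the 1 x 3 matrix W_2
b2 = 0.5

x = [1.0, 2.0]
a, c = local_affine_map(x, W1, b1, W2, b2)
print(a, c)

# A nearby point with the same activation pattern follows the same affine formula.
y = [1.25, 2.5]
print(shallow(y, W1, b1, W2, b2), a[0] * y[0] + a[1] * y[1] + c)
```

At a point where a different set of neurons is active, `local_affine_map` returns a different pair `(a, c)`; the boundaries between the regions are exactly the sets where some coordinate of \(W_1 x + b_1\) is zero.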

General formula for a ReLU network with \(L=3\)#

We consider a function

\[\begin{equation*} \Phi:\mathbb{R}^{n_0}\to\mathbb{R}^{n_3} \end{equation*}\]

defined as a fully connected network with two hidden layers:

\[\begin{equation*} \begin{aligned} h^{(0)} &= x \in \mathbb{R}^{n_0}, \\[2pt] z^{(1)} &= W_1 h^{(0)} + b_1,\\ h^{(1)} &= \sigma \bigl(z^{(1)}\bigr), \\[4pt] z^{(2)} &= W_2 h^{(1)} + b_2,\\ h^{(2)} &= \sigma \bigl(z^{(2)}\bigr), \\[4pt] z^{(3)} &= W_3 h^{(2)} + b_3,\\ h^{(3)} &= z^{(3)}. \end{aligned} \end{equation*}\]

where \(n_0 = d\), \(n_3 = k\), and

\(W_1 \in \mathbb{R}^{n_1\times n_0},\ b_1\in\mathbb{R}^{n_1}\)
\(W_2 \in \mathbb{R}^{n_2\times n_1},\ b_2\in\mathbb{R}^{n_2}\)
\(W_3 \in \mathbb{R}^{n_3\times n_2},\ b_3\in\mathbb{R}^{n_3}\)

The full expression for the function becomes:

\[\begin{equation*} \Phi(x) = W_3 \, \sigma \bigl(W_2 \, \sigma(W_1 x + b_1) + b_2 \bigr) + b_3. \end{equation*}\]

Example: For a concrete network with two input variables and one output, of the form \(2 \to n_1 \to n_2 \to 1\):

\[\begin{equation*} \Phi(x_1,x_2) = W_3 \,\sigma \Big(W_2 \, \sigma \big(W_1 \begin{bmatrix}x_1 \\ x_2\end{bmatrix} + b_1\big) + b_2\Big) + b_3, \end{equation*}\]

where the dimensions are

\[\begin{equation*} W_1 \in \mathbb{R}^{n_1\times2},\quad W_2 \in \mathbb{R}^{n_2\times n_1},\quad W_3 \in \mathbb{R}^{1\times n_2}. \end{equation*}\]

Network and training directly in scikit-learn#

We want to find an “AI function”

\[\begin{equation*} \Phi : \mathbb{R}^{784} \to \mathbb{R}^{10} \end{equation*}\]

that classifies an image of a handwritten digit as well as possible.

In the code below it is built as a ReLU network with two hidden layers, which gives a total depth of \(L=3\). The network has the form \(784 \to 256 \to 128 \to 10\). The weight matrices and bias vectors therefore have the sizes

\[\begin{equation*} \begin{aligned} &W_1 \in \mathbb{R}^{256 \times 784}, \quad b_1 \in \mathbb{R}^{256}, \\ &W_2 \in \mathbb{R}^{128 \times 256}, \quad b_2 \in \mathbb{R}^{128}, \\ &W_3 \in \mathbb{R}^{10 \times 128}, \quad b_3 \in \mathbb{R}^{10}. \end{aligned} \end{equation*}\]

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Normalize to [0,1]
X = X / 255.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/7, random_state=42
)

# DNN with the same “size” as the PyTorch example
clf = MLPClassifier(
    hidden_layer_sizes=(256, 128),  # two hidden layers
    activation='relu',
    solver='adam',
    batch_size=64,
    learning_rate_init=1e-3,
    max_iter=6,
    verbose=True
)

Size of the network#

How many parameters are there?

Answer#

The total number of parameters in the network is:

Layer 1: (784 inputs × 256 neurons) + 256 biases = 200,704 + 256 = 200,960
Layer 2: (256 inputs × 128 neurons) + 128 biases = 32,768 + 128 = 32,896
Layer 3: (128 inputs × 10 neurons) + 10 biases = 1,280 + 10 = 1,290

Total: 200,960 + 32,896 + 1,290 = 235,146
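The same count can be reproduced with a short helper. This is a sketch; the function name `count_parameters` is hypothetical and not part of scikit-learn.

```python
def count_parameters(layer_sizes):
    """Number of weights and biases in a fully connected network n_0 -> ... -> n_L."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out    # entries of W (n_out x n_in) plus entries of b
    return total


print(count_parameters([784, 256, 128, 10]))
```

Changing the list of layer sizes shows how quickly the count grows with the width of the hidden layers, since the first layer alone accounts for most of the parameters here.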

Training with scikit-learn#

We find good (approximately optimal) values for all the parameters of the function \(\Phi\) by training the model with the fit method:

# Train
clf.fit(X_train, y_train)

Iteration 1, loss = 0.23307118

Iteration 2, loss = 0.09007207

Iteration 3, loss = 0.06147579

Iteration 4, loss = 0.04478270

/usr/local/lib/python3.11/site-packages/sklearn/neural_network/_multilayer_perceptron.py:792: UserWarning: Training interrupted by user.
  warnings.warn("Training interrupted by user.")

MLPClassifier(batch_size=64, hidden_layer_sizes=(256, 128), max_iter=6,
              verbose=True)


Prediction on a single image#

We can now apply this function to a single input image to obtain a prediction. Let us take one image from our test set and see what the model predicts.

image_index = 2 # Choose an image (input)
input_image = X_test[image_index:image_index+1] # Shape (1, 784)
true_label = y_test[image_index]

probability_vector = clf.predict_proba(input_image) # Get the probability vector
predicted_label = clf.predict(input_image) # Make the prediction

print(input_image)
print(true_label)
print(probability_vector)
print(predicted_label)

[[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.61960784 0.90588235 0.14901961 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.02745098 0.25490196 0.56862745
  0.56862745 0.34509804 0.         0.05490196 0.83137255 0.99215686
  0.28235294 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.09019608 0.14509804
  0.25882353 0.7254902  0.99215686 0.9372549  0.91372549 0.99215686
  0.59607843 0.45490196 0.99215686 0.80784314 0.10980392 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.06666667 0.87058824 0.99215686 0.99215686 0.9372549
  0.80392157 0.27058824 0.14509804 0.64705882 0.99607843 0.99215686
  0.89803922 0.10980392 0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.0745098
  0.90196078 0.99215686 0.99215686 0.48235294 0.         0.
  0.         0.58823529 0.99607843 0.81176471 0.24313725 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.14901961 0.60392157
  0.99215686 0.78823529 0.19215686 0.         0.46666667 0.98431373
  0.98431373 0.30588235 0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.03921569 0.76470588 0.99215686
  0.92941176 0.80784314 0.97647059 0.99215686 0.56470588 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.10196078 0.50196078 0.92156863 0.99215686
  0.99215686 0.99215686 0.90196078 0.38431373 0.14509804 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.38039216 0.99215686 0.99215686 0.99215686
  0.99607843 0.99215686 0.9372549  0.58431373 0.12156863 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.56078431 0.99215686 0.78823529 0.06666667 0.29411765 0.51764706
  0.81176471 0.99215686 0.84705882 0.11372549 0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.1372549  0.90196078 0.99607843
  0.54509804 0.         0.         0.         0.07058824 0.56470588
  0.98431373 0.8627451  0.12156863 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.80784314 0.99215686 0.09411765 0.
  0.         0.         0.         0.         0.66666667 0.99215686
  0.69019608 0.01568627 0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.50980392 0.99215686 0.38823529 0.         0.         0.
  0.         0.         0.09019608 0.78039216 0.99215686 0.14117647
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.27058824 0.96862745
  0.64705882 0.64313725 0.         0.         0.         0.
  0.         0.28627451 0.99215686 0.45882353 0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.85490196 0.99215686 0.99215686
  0.11764706 0.         0.         0.         0.         0.28627451
  0.99215686 0.61176471 0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.38431373 0.83137255 0.99215686 0.8745098  0.3254902
  0.         0.         0.         0.07058824 0.85098039 0.90588235
  0.07058824 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.28627451 0.98039216 0.99607843 0.88627451 0.30980392 0.
  0.         0.23529412 0.95686275 0.86666667 0.0627451  0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.13333333
  0.99607843 0.99215686 0.83921569 0.30980392 0.         0.28627451
  0.99215686 0.43529412 0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.36470588 0.99215686
  0.99215686 0.97647059 0.78823529 0.81568627 0.99215686 0.14117647
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.00392157 0.31764706 0.81176471 0.99215686
  0.99215686 0.85490196 0.36078431 0.00784314 0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]]
8
[[7.08580512e-05 2.61484819e-04 1.44964793e-05 1.17057312e-03
  1.67150883e-05 5.34671916e-02 4.29566890e-01 8.57747096e-06
  5.14443444e-01 9.79769433e-04]]
[8]
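The two outputs above belong together: the row of ten numbers holds one probability per class, and the predicted label is the class with the largest probability. A minimal sketch of that relationship follows, using a made-up probability row (not the model output printed above) and assuming the class labels are 0 to 9 in column order:

```python
import numpy as np

# Hypothetical probability row for one image (10 classes).
# These numbers are illustrative and are NOT the model output printed above.
proba = np.array([[0.01, 0.02, 0.01, 0.03, 0.01, 0.05, 0.40, 0.01, 0.45, 0.01]])

# Assumed class labels 0..9, one per column of the probability row.
classes = np.arange(10)

# Each row of class probabilities sums to 1.
row_sums = proba.sum(axis=1)

# The predicted label is the class whose column holds the largest probability.
pred = classes[np.argmax(proba, axis=1)]
print(pred)  # [8]
```

Here class 6 is a close runner-up (0.40 against 0.45), so the prediction is 8 but with little certainty.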

Overall prediction#

The overall fraction of test-set images that the model classified correctly (its accuracy) can be computed with:

# Overall evaluation
print("Test accuracy:", clf.score(X_test, y_test))

Test accuracy: 0.9778
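An accuracy of 0.9778 means that about 97.8% of the test images were classified correctly, or roughly 2.2% were misclassified. For a classifier, this score is the mean of the elementwise comparison between predicted and true labels. A small sketch with hypothetical label arrays (the names `y_true` and `y_pred` are illustrative):

```python
import numpy as np

# Hypothetical true and predicted labels for eight test images.
y_true = np.array([7, 2, 1, 0, 4, 1, 4, 9])
y_pred = np.array([7, 2, 1, 0, 4, 1, 9, 9])

# Accuracy = number of correct predictions / number of predictions.
# The comparison gives a boolean array; its mean is the fraction of True values.
accuracy = np.mean(y_pred == y_true)
print("Accuracy:", accuracy)  # Accuracy: 0.875
```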

Visualizing predictions#

To get a better feel for how the model behaves, we can visualize its predictions on individual images from the test set. The function below plots each image together with its true label and its predicted label, and beneath it a bar chart of the predicted probabilities for each class. This makes it easy to see when the model is confident and when it is in doubt.

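def show_images_with_mlp_probabilities(clf, X_test, y_test, X_test_orig,
                                       num_images=5, only_incorrect=False,
                                       image_shape=(8,8)):
    """Plot test images above bar charts of their predicted class probabilities.

    X_test is what the classifier sees; X_test_orig is what gets drawn, reshaped
    to image_shape (pass (28, 28) for MNIST). With only_incorrect=True, only
    misclassified images are shown.
    """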
    # Predict full test set
    pred_labels = clf.predict(X_test)
    probas = clf.predict_proba(X_test)

    # Select indices
    all_indices = np.arange(len(X_test))
    if only_incorrect:
        indices = all_indices[pred_labels != y_test][:num_images]
    else:
        indices = all_indices[:num_images]

    plt.figure(figsize=(12, 6))

    for i, idx in enumerate(indices):
        # --- Image plot ---
        plt.subplot(2, num_images, i + 1)
        img = X_test_orig[idx].reshape(image_shape)
        plt.imshow(img, cmap='gray')
        plt.title(f"Idx {idx}\nTrue {y_test[idx]}\nPred {pred_labels[idx]}")
        plt.axis('off')

        # --- Probability distribution ---
        plt.subplot(2, num_images, num_images + i + 1)

        p = probas[idx]

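        classes = np.arange(len(p))
        # Bar colors: red marks the true class, green marks a predicted class
        # that differs from the true one, and blue marks all other classes.
        colors = [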
            "red" if c == y_test[idx] else
            ("green" if c == pred_labels[idx] else "blue")
            for c in classes
        ]

        plt.bar(classes, p, color=colors)
        plt.xticks(classes)
        plt.ylim(0, 1)
        plt.xlabel("Class")
        plt.ylabel("Probability")

    plt.tight_layout()
    plt.show()

# Show the first five test images, whether or not they were classified correctly
show_images_with_mlp_probabilities(
    clf, 
    X_test, 
    y_test, 
    X_test_orig=X_test, 
    only_incorrect=False,
    num_images=5,
    image_shape=(28,28)
)    

../_images/ccce3d5195a8015fc07e0e38c82611993e3d80123876949d3b9b4c4ddb86189c.png
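One way to put a number on "confident" versus "in doubt" is the gap between the largest and second-largest class probability for each image. This margin is an illustrative measure and not something the function above computes. The sketch below uses two hypothetical probability rows:

```python
import numpy as np

# Hypothetical probability rows for two images (each row sums to 1).
probas = np.array([
    [0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.00, 0.98, 0.00, 0.01],  # confident
    [0.01, 0.02, 0.01, 0.03, 0.01, 0.05, 0.40, 0.01, 0.45, 0.01],  # in doubt
])

# Sort each row in ascending order; the last two columns are then the
# largest and second-largest probabilities.
sorted_p = np.sort(probas, axis=1)
top1 = sorted_p[:, -1]
top2 = sorted_p[:, -2]

# A large margin means one tall bar; a small margin means two competing bars.
margin = top1 - top2
print(np.round(margin, 2))  # [0.97 0.05]
```

In a bar chart, the first row would show a single dominant bar, while the second would show two bars of nearly equal height.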

# Show only the first five misclassified test images
show_images_with_mlp_probabilities(
    clf, 
    X_test, 
    y_test, 
    X_test_orig=X_test, 
    only_incorrect=True,
    num_images=5,
    image_shape=(28,28)
)    

../_images/dbc33ccc7294d608abd11b95bf450390e52c13b38ecd094453960a0bbef3e3c8.png
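With `only_incorrect=True`, the function keeps only the indices where the predicted label differs from the true label, using a boolean mask. The same selection is sketched below on hypothetical label arrays (`y_test_demo` and `pred_demo` are illustrative names):

```python
import numpy as np

# Hypothetical true and predicted labels for eight test images.
y_test_demo = np.array([7, 2, 1, 0, 4, 1, 4, 9])
pred_demo = np.array([7, 2, 1, 0, 4, 1, 9, 4])

# Boolean mask: True where the prediction is wrong.
mask = pred_demo != y_test_demo

# Indices of the misclassified images, in their original order.
wrong_idx = np.flatnonzero(mask)
print(wrong_idx)  # [6 7]

# Keep at most the first num_images of them, as the function does.
num_images = 1
selected = wrong_idx[:num_images]
```

If there are fewer misclassified images than `num_images`, the slice simply returns all of them.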

MLPClassifier parameters:

	Parameter	Default	Value
	hidden_layer_sizes	(100,)	(256, ...)
	activation	'relu'	'relu'
	solver	'adam'	'adam'
	alpha	0.0001	0.0001
	batch_size	'auto'	64
	learning_rate	'constant'	'constant'
	learning_rate_init	0.001	0.001
	power_t	0.5	0.5
	max_iter	200	6
	shuffle	True	True
	random_state	None	None
	tol	1e-4	0.0001
	verbose	False	True
	warm_start	False	False
	momentum	0.9	0.9
	nesterovs_momentum	True	True
	early_stopping	False	False
	validation_fraction	0.1	0.1
	beta_1	0.9	0.9
	beta_2	0.999	0.999
	epsilon	1e-8	1e-08
	n_iter_no_change	10	10
	max_fun	15000	15000
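The parameter list shows `batch_size` 64 and `max_iter` 6. For the stochastic solvers ('sgd' and 'adam'), `max_iter` counts epochs (full passes over the training data), not individual gradient steps. The sketch below estimates the number of gradient steps this implies. The training-set size of 60,000 is an assumption (the common MNIST split) and is not stated in this section. The sketch also assumes that the last, smaller minibatch of each epoch counts as one step.

```python
import math

n_samples = 60_000  # ASSUMPTION: training-set size, not given in this section
batch_size = 64     # value shown in the parameter list
max_iter = 6        # value shown in the parameter list (epochs for 'adam')

# One gradient step per minibatch; the final minibatch may be smaller.
steps_per_epoch = math.ceil(n_samples / batch_size)

# Upper bound on total steps: training may stop earlier if the tol criterion is met.
total_steps = max_iter * steps_per_epoch
print(steps_per_epoch, total_steps)  # 938 5628
```

Under these assumptions, six epochs correspond to at most a few thousand weight updates.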