Wetts's blog

Stay Hungry, Stay Foolish.


DL-向量化.md

Logistic Regression

The following uses logistic regression as an example (all vectors are column vectors by default):

$$
z^{(1)} = w^T x^{(1)} + b \\
a^{(1)} = \sigma (z^{(1)}) \\
z^{(2)} = w^T x^{(2)} + b \\
a^{(2)} = \sigma (z^{(2)}) \\
z^{(3)} = w^T x^{(3)} + b \\
a^{(3)} = \sigma (z^{(3)})
$$

The above is the general, one-example-at-a-time form.
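The per-example form corresponds directly to a Python loop. The sketch below is a minimal NumPy illustration; the shapes (`n_x` features, `m` examples) and variable names are assumptions, not from the source:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: n_x features, m training examples.
n_x, m = 3, 5
rng = np.random.default_rng(0)
w = rng.normal(size=(n_x, 1))   # weight column vector
b = 0.1
X = rng.normal(size=(n_x, m))   # column i is the example x^{(i)}

# Explicit loop: one z^{(i)}, a^{(i)} per example, as in the equations above.
a = np.zeros(m)
for i in range(m):
    x_i = X[:, i:i+1]                # x^{(i)} as an (n_x, 1) column
    z_i = (w.T @ x_i).item() + b     # z^{(i)} = w^T x^{(i)} + b
    a[i] = sigmoid(z_i)              # a^{(i)} = sigma(z^{(i)})
```
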

__Vectorizing over multiple examples__. Stacking the training inputs into a matrix $X = [x^{(1)}\ x^{(2)}\ \cdots\ x^{(m)}]$ (each input $x$ has $n_x$ features, i.e. has shape $(n_x, 1)$), we can write:
$$
Z = [z^{(1)}\ z^{(2)}\ \cdots\ z^{(m)}] = w^T X + [b\ b\ \cdots\ b] \\
A = [a^{(1)}\ a^{(2)}\ \cdots\ a^{(m)}] = \sigma(Z)
$$
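In NumPy the row of repeated $b$'s never needs to be built explicitly: broadcasting adds the scalar to every column. A minimal sketch, with the same assumed shapes as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: n_x features, m examples (the columns of X).
n_x, m = 3, 5
rng = np.random.default_rng(0)
w = rng.normal(size=(n_x, 1))
b = 0.1
X = rng.normal(size=(n_x, m))   # X = [x^{(1)} x^{(2)} ... x^{(m)}]

Z = w.T @ X + b                 # shape (1, m); b broadcasts over the m columns
A = sigmoid(Z)                  # A = [a^{(1)} a^{(2)} ... a^{(m)}]
```

The single matrix product replaces the loop over examples, which is the whole point of the vectorized form.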

Neural Network

Taking a single-layer neural network as an example:
$$
z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1, \quad a^{[1]}_1 = \sigma(z^{[1]}_1) \\
z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2, \quad a^{[1]}_2 = \sigma(z^{[1]}_2) \\
z^{[1]}_3 = w^{[1]T}_3 x + b^{[1]}_3, \quad a^{[1]}_3 = \sigma(z^{[1]}_3) \\
z^{[1]}_4 = w^{[1]T}_4 x + b^{[1]}_4, \quad a^{[1]}_4 = \sigma(z^{[1]}_4)
$$
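Computed literally, these four equations are a loop over the hidden units. A minimal sketch, assuming 3 inputs and 4 hidden units and hypothetical names `W1`, `b1` (row $n$ of `W1` plays the role of $w^{[1]T}_n$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden units, matching the equations above.
n_x, n_1 = 3, 4
rng = np.random.default_rng(1)
x  = rng.normal(size=(n_x, 1))
W1 = rng.normal(size=(n_1, n_x))   # row n is w^{[1]T}_n
b1 = rng.normal(size=(n_1, 1))

# One equation per hidden unit, exactly as written above.
a1 = np.zeros((n_1, 1))
for n in range(n_1):
    z_n = W1[n, :] @ x + b1[n]     # z^{[1]}_n = w^{[1]T}_n x + b^{[1]}_n
    a1[n] = sigmoid(z_n)           # a^{[1]}_n = sigma(z^{[1]}_n)
```
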

Vectorized:
$$
a^{[1]} =
\left[
\begin{array}{c}
a^{[1]}{1}\
a^{[1]}
{2}\
a^{[1]}{3}\
a^{[1]}
{4}
\end{array}
\right]
= \sigma(z^{[1]}) \

\left[
\begin{array}{c}
z^{[1]}{1}\
z^{[1]}
{2}\
z^{[1]}{3}\
z^{[1]}
{4}\
\end{array}
\right]
=
\overbrace{
\left[
\begin{array}{c}
…w^{[1]T}{1}…\
…w^{[1]T}
{2}…\
…w^{[1]T}{3}…\
…w^{[1]T}
{4}…
\end{array}
\right]
}^{W^{[1]}}
*
\overbrace{
\left[
\begin{array}{c}
x_1\
x_2\
x_3\
\end{array}
\right]
}^{input}
+
\overbrace{
\left[
\begin{array}{c}
b^{[1]}_1\
b^{[1]}_2\
b^{[1]}_3\
b^{[1]}_4\
\end{array}
\right]
}^{b^{[1]}}
$$
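Stacking the $w^{[1]T}_n$ as the rows of $W^{[1]}$ turns the four per-unit equations into one matrix product. A minimal NumPy sketch, with the same assumed sizes and hypothetical names as before:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden units, matching the matrices above.
n_x, n_1 = 3, 4
rng = np.random.default_rng(1)
x  = rng.normal(size=(n_x, 1))
W1 = rng.normal(size=(n_1, n_x))   # the four w^{[1]T}_n stacked as rows
b1 = rng.normal(size=(n_1, 1))

z1 = W1 @ x + b1        # z^{[1]}, shape (4, 1): all four units in one matmul
a1 = sigmoid(z1)        # a^{[1]} = sigma(z^{[1]})
```
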

Vectorizing over multiple examples:
$$
X =
\left[
\begin{array}{cccc}
\vdots & \vdots & & \vdots \\
x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\
\vdots & \vdots & & \vdots
\end{array}
\right] \\
Z^{[1]} =
\left[
\begin{array}{cccc}
\vdots & \vdots & & \vdots \\
z^{[1](1)} & z^{[1](2)} & \cdots & z^{[1](m)} \\
\vdots & \vdots & & \vdots
\end{array}
\right] \\
A^{[1]} =
\left[
\begin{array}{cccc}
\vdots & \vdots & & \vdots \\
a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \\
\vdots & \vdots & & \vdots
\end{array}
\right] \\
\left.
\begin{array}{r}
z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]} \\
a^{[1](i)} = \sigma(z^{[1](i)}) \\
z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]} \\
a^{[2](i)} = \sigma(z^{[2](i)})
\end{array}
\right\}
\implies
\begin{cases}
Z^{[1]} = W^{[1]} X + b^{[1]} \\
A^{[1]} = \sigma(Z^{[1]}) \\
Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]} \\
A^{[2]} = \sigma(Z^{[2]})
\end{cases}
$$
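The right-hand system is a full two-layer forward pass over all $m$ examples at once. A minimal sketch, where the layer sizes and names (`W1`, `b1`, `W2`, `b2`) are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: n_x inputs, n_1 hidden units, n_2 outputs, m examples.
n_x, n_1, n_2, m = 3, 4, 1, 5
rng = np.random.default_rng(2)
X  = rng.normal(size=(n_x, m))      # one example per column
W1 = rng.normal(size=(n_1, n_x))
b1 = rng.normal(size=(n_1, 1))
W2 = rng.normal(size=(n_2, n_1))
b2 = rng.normal(size=(n_2, 1))

# All m examples at once; b1 and b2 broadcast across the m columns.
Z1 = W1 @ X + b1        # (n_1, m)
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2       # (n_2, m)
A2 = sigmoid(Z2)
```

Column $i$ of `A2` is exactly what the per-example equations on the left would produce for $x^{(i)}$.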

Notes

  • $a^{[m](n)}$: the activations of layer $m$ for the $n$-th training example.
  • $W^{[m]}_n$: the weights of the $n$-th node of layer $m$.