Logistic Regression
The following uses logistic regression as an example (all vectors are column vectors by default):
$$
z^{(1)} = w^T x^{(1)} + b \\
a^{(1)} = \sigma(z^{(1)}) \\
z^{(2)} = w^T x^{(2)} + b \\
a^{(2)} = \sigma(z^{(2)}) \\
z^{(3)} = w^T x^{(3)} + b \\
a^{(3)} = \sigma(z^{(3)})
$$
This is the general, per-sample form.
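As a minimal sketch of this per-sample form (assuming NumPy and randomly generated toy data; all names here are illustrative), the computation is a loop over the training examples:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x = 3                                   # assumed number of features
rng = np.random.default_rng(0)
w = rng.standard_normal((n_x, 1))         # weight column vector
b = 0.1                                   # scalar bias
xs = [rng.standard_normal((n_x, 1)) for _ in range(3)]  # x^(1), x^(2), x^(3)

# Per-sample form: z^(i) = w^T x^(i) + b, a^(i) = sigma(z^(i))
for x_i in xs:
    z_i = w.T @ x_i + b                   # shape (1, 1)
    a_i = sigmoid(z_i)
```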
__Vectorizing across multiple samples__. When a matrix $X = [x^{(1)} \; x^{(2)} \; \cdots \; x^{(m)}]$ is used as the training input (each input $x$ has $n_x$ features, i.e., shape $(n_x, 1)$), this can be written as:
$$
Z = [z^{(1)} \; z^{(2)} \; \cdots \; z^{(m)}] = w^T X + [b \; b \; \cdots \; b] \\
A = [a^{(1)} \; a^{(2)} \; \cdots \; a^{(m)}] = \sigma(Z)
$$
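A hedged NumPy sketch of this vectorized form (shapes and data are illustrative assumptions); the scalar `b` is broadcast across all $m$ columns, playing the role of the $[b \; b \; \cdots \; b]$ row above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, m = 3, 5                             # assumed feature and sample counts
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))         # columns are x^(1) ... x^(m)
w = rng.standard_normal((n_x, 1))
b = 0.1

Z = w.T @ X + b                           # (1, m); broadcasting adds b to every column
A = sigmoid(Z)                            # (1, m); one activation per sample
```

This replaces the explicit loop with a single matrix product, which is both shorter and typically much faster than looping over samples in Python.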
Neural Networks
Take a single-hidden-layer neural network as an example:
$$
z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1, \quad a^{[1]}_1 = \sigma(z^{[1]}_1) \\
z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2, \quad a^{[1]}_2 = \sigma(z^{[1]}_2) \\
z^{[1]}_3 = w^{[1]T}_3 x + b^{[1]}_3, \quad a^{[1]}_3 = \sigma(z^{[1]}_3) \\
z^{[1]}_4 = w^{[1]T}_4 x + b^{[1]}_4, \quad a^{[1]}_4 = \sigma(z^{[1]}_4)
$$
Vectorized:
$$
a^{[1]} =
\left[
\begin{array}{c}
a^{[1]}_1 \\
a^{[1]}_2 \\
a^{[1]}_3 \\
a^{[1]}_4
\end{array}
\right]
= \sigma(z^{[1]}) \\
\left[
\begin{array}{c}
z^{[1]}_1 \\
z^{[1]}_2 \\
z^{[1]}_3 \\
z^{[1]}_4
\end{array}
\right]
=
\overbrace{
\left[
\begin{array}{c}
\cdots w^{[1]T}_1 \cdots \\
\cdots w^{[1]T}_2 \cdots \\
\cdots w^{[1]T}_3 \cdots \\
\cdots w^{[1]T}_4 \cdots
\end{array}
\right]
}^{W^{[1]}}
\overbrace{
\left[
\begin{array}{c}
x_1 \\
x_2 \\
x_3
\end{array}
\right]
}^{\text{input}}
+
\overbrace{
\left[
\begin{array}{c}
b^{[1]}_1 \\
b^{[1]}_2 \\
b^{[1]}_3 \\
b^{[1]}_4
\end{array}
\right]
}^{b^{[1]}}
$$
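Stacking the four row vectors $w^{[1]T}_n$ as the rows of $W^{[1]}$ turns the four per-node equations into one matrix-vector product. A sketch under the sizes assumed above (4 hidden units, 3 inputs; the data is made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))          # row n is w^[1]T_n
b1 = rng.standard_normal((4, 1))
x = rng.standard_normal((3, 1))           # one input column vector

z1 = W1 @ x + b1                          # (4, 1); entry n is w^[1]T_n x + b^[1]_n
a1 = sigmoid(z1)                          # (4, 1); sigma applied elementwise
```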
Vectorizing across multiple samples:
$$
X =
\left[
\begin{array}{cccc}
\vdots & \vdots & \vdots & \vdots \\
x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\
\vdots & \vdots & \vdots & \vdots
\end{array}
\right] \\
Z^{[1]} =
\left[
\begin{array}{cccc}
\vdots & \vdots & \vdots & \vdots \\
z^{[1](1)} & z^{[1](2)} & \cdots & z^{[1](m)} \\
\vdots & \vdots & \vdots & \vdots
\end{array}
\right] \\
A^{[1]} =
\left[
\begin{array}{cccc}
\vdots & \vdots & \vdots & \vdots \\
a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \\
\vdots & \vdots & \vdots & \vdots
\end{array}
\right] \\
\left.
\begin{array}{r}
z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]} \\
a^{[1](i)} = \sigma(z^{[1](i)}) \\
z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]} \\
a^{[2](i)} = \sigma(z^{[2](i)})
\end{array}
\right\}
\implies
\begin{cases}
Z^{[1]} = W^{[1]} X + b^{[1]} \\
A^{[1]} = \sigma(Z^{[1]}) \\
Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]} \\
A^{[2]} = \sigma(Z^{[2]})
\end{cases}
$$
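Putting it together, a sketch of the full two-layer forward pass over all $m$ samples at once (layer sizes are assumptions for illustration); each bias column is broadcast by NumPy across the $m$ columns:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_1, m = 3, 4, 5                     # assumed input size, hidden size, sample count
rng = np.random.default_rng(0)
X  = rng.standard_normal((n_x, m))        # columns are x^(1) ... x^(m)
W1 = rng.standard_normal((n_1, n_x))
b1 = rng.standard_normal((n_1, 1))
W2 = rng.standard_normal((1, n_1))
b2 = rng.standard_normal((1, 1))

Z1 = W1 @ X + b1                          # (n_1, m); column i is z^[1](i)
A1 = sigmoid(Z1)                          # (n_1, m)
Z2 = W2 @ A1 + b2                         # (1, m)
A2 = sigmoid(Z2)                          # (1, m); column i is the prediction for x^(i)
```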
Notes
- $a^{[m](n)}$: the layer-$m$ activation for the $n$-th training sample.
- $W^{[m]}_n$: the weights of the $n$-th node in layer $m$.