Neural Networks
Computation

Computing a neural network is very simple, and one example is enough to see how it works (in the figure, the unit in the last layer is mislabeled; it should be $a^{(3)}_1$).

Notation: $a^{(i)}_j$ denotes the $j$-th unit of layer $i$. $w^{(j)}$ denotes the weight matrix that controls the mapping from layer $j$ to layer $j+1$.

With this notation:

$$
\begin{aligned}
a^{(2)}_1 &= g\bigg( w^{(1)}_{10} x_0 + w^{(1)}_{11} x_1 + w^{(1)}_{12} x_2 + w^{(1)}_{13} x_3 \bigg)\\
a^{(2)}_2 &= g\bigg( w^{(1)}_{20} x_0 + w^{(1)}_{21} x_1 + w^{(1)}_{22} x_2 + w^{(1)}_{23} x_3 \bigg)\\
a^{(2)}_3 &= g\bigg( w^{(1)}_{30} x_0 + w^{(1)}_{31} x_1 + w^{(1)}_{32} x_2 + w^{(1)}_{33} x_3 \bigg)\\
h(x) = a^{(3)}_1 &= g\bigg( w^{(2)}_{10} a^{(2)}_0 + w^{(2)}_{11} a^{(2)}_1 + w^{(2)}_{12} a^{(2)}_2 + w^{(2)}_{13} a^{(2)}_3 \bigg)
\end{aligned}
$$

In vectorized form:

$$
x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}, \;\;\;\;
w^{(1)} = \begin{bmatrix}
w^{(1)}_{10} & w^{(1)}_{11} & w^{(1)}_{12} & w^{(1)}_{13} \\
w^{(1)}_{20} & w^{(1)}_{21} & w^{(1)}_{22} & w^{(1)}_{23} \\
w^{(1)}_{30} & w^{(1)}_{31} & w^{(1)}_{32} & w^{(1)}_{33}
\end{bmatrix}
$$

Then we have

$$
z^{(2)} = w^{(1)} x = \begin{bmatrix} z^{(2)}_1 \\ z^{(2)}_2 \\ z^{(2)}_3 \end{bmatrix}, \;\;\;\;
a^{(2)} = g(z^{(2)}) = \begin{bmatrix} a^{(2)}_1 \\ a^{(2)}_2 \\ a^{(2)}_3 \end{bmatrix}
$$

For the next layer, $a^{(2)}$ is extended with the extra unit $a^{(2)}_0$, and $w^{(2)}$ has a single row because layer 3 has a single unit:

$$
a^{(2)} = \begin{bmatrix} a^{(2)}_0 \\ a^{(2)}_1 \\ a^{(2)}_2 \\ a^{(2)}_3 \end{bmatrix}, \;\;\;\;
w^{(2)} = \begin{bmatrix} w^{(2)}_{10} & w^{(2)}_{11} & w^{(2)}_{12} & w^{(2)}_{13} \end{bmatrix}
$$

$$
z^{(3)} = w^{(2)} a^{(2)} = \begin{bmatrix} z^{(3)}_1 \end{bmatrix}, \;\;\;\;
a^{(3)} = g(z^{(3)}) = \begin{bmatrix} a^{(3)}_1 \end{bmatrix}
$$

That is all there is to computing a neural network. It is easy to understand and just as easy to implement.
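The forward computation described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions the text does not state: $g$ is taken to be the sigmoid function, and the extra units $x_0$ and $a^{(2)}_0$ are treated as bias units fixed to 1. The names `sigmoid`, `matvec`, and `forward` are made up for this sketch.

```python
import math


def sigmoid(z):
    # Assumed activation g; the text does not say which g is used.
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, v):
    # Matrix-vector product: one output entry per row of w.
    return [sum(w_jk * v_k for w_jk, v_k in zip(row, v)) for row in w]


def forward(x, w1, w2):
    """Compute h(x) = a^(3)_1 for the 3-3-1 network in the text.

    x  : inputs x_1..x_3 (the bias x_0 is added here)
    w1 : 3x4 matrix w^(1), rows indexed by the layer-2 unit
    w2 : 1x4 matrix w^(2)
    """
    a1 = [1.0] + list(x)                    # assumed bias unit x_0 = 1
    z2 = matvec(w1, a1)                     # z^(2) = w^(1) x
    a2 = [1.0] + [sigmoid(z) for z in z2]   # assumed bias unit a^(2)_0 = 1
    z3 = matvec(w2, a2)                     # z^(3) = w^(2) a^(2)
    a3 = [sigmoid(z) for z in z3]           # a^(3) = g(z^(3))
    return a3[0]


if __name__ == "__main__":
    w1 = [[0.1, 0.2, -0.3, 0.4],
          [0.0, -0.5, 0.6, 0.1],
          [0.3, 0.3, 0.3, -0.2]]
    w2 = [[0.2, -0.4, 0.5, 0.7]]
    print(forward([1.0, 2.0, 3.0], w1, w2))
```

Each layer is one matrix-vector product followed by an elementwise application of $g$, which is why the vectorized form is the convenient one to implement.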
Back Propagation

The remaining question is how to find all of the weight matrices $w^{(l)}$.

Notation: $L$ is the number of layers, $S_l$ is the number of units in layer $l$, and $k$ indexes the units of the output layer, so $k = 1, \dots, S_L$.

We again use a method in the style of gradient descent (GD), so we want $\min\limits_w J(w)$, where

$$
J(w) = \frac 1m \sum_{i=1}^m \sum_{k=1}^{S_L} \frac 12 \bigg[ (h(x_i))_k - y_{ik} \bigg]^2
$$

What we need is therefore $\frac{\partial J(w)}{\partial w^{(l)}_{jk}}$.

We handle the training examples one at a time. For a single example $(x_i, y_i)$ the cost is

$$
J_i = \sum_{k=1}^{S_L} \frac 12 \bigg[ (h(x_i))_k - y_{ik} \bigg]^2
$$

We define $\delta^{(l)}_j$, which can be read as the error attributed to unit $a^{(l)}_j$:

$$
\delta^{(l)}_j = \frac{\partial J_i}{\partial z^{(l)}_j}
$$

For the last layer, only the $k = j$ term of the sum depends on $a^{(L)}_j$, so

$$
\begin{aligned}
\delta^{(L)}_j = \frac{\partial J_i}{\partial z^{(L)}_j}
&= \frac{\partial J_i}{\partial a^{(L)}_j} \cdot \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j}
= \frac{\partial \sum\limits_{k=1}^{S_L}\frac 12 [(h(x_i))_k - y_{ik}]^2}{\partial a^{(L)}_j} \cdot \frac{\partial g(z^{(L)}_j)}{\partial z^{(L)}_j} \\
&= \frac{\partial \sum\limits_{k=1}^{S_L}\frac 12 [a^{(L)}_k - y_{ik}]^2}{\partial a^{(L)}_j} \cdot g'(z^{(L)}_j)
= (a^{(L)}_j - y_{ij}) \cdot g'(z^{(L)}_j)
\end{aligned}
$$

The quantity we actually want is

$$
\frac{\partial J_i}{\partial w^{(L-1)}_{jk}}
= \frac{\partial J_i}{\partial a^{(L)}_j} \cdot \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} \cdot \frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}
= \delta^{(L)}_j \cdot \frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}
$$

so it only remains to compute $\frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}$. We know that

$$
z^{(L)}_j = \sum_{p=1}^{S_{L-1}} w^{(L-1)}_{jp} a^{(L-1)}_p
$$

and therefore

$$
\frac{\partial z^{(L)}_j}{\partial w^{(L-1)}_{jk}}
= \frac{\partial \sum\limits_{p=1}^{S_{L-1}} w^{(L-1)}_{jp} a^{(L-1)}_p}{\partial w^{(L-1)}_{jk}}
= a^{(L-1)}_k
$$

which gives

$$
\frac{\partial J_i}{\partial w^{(L-1)}_{jk}} = \delta^{(L)}_j \cdot a^{(L-1)}_k
$$

Now that the last layer is done, we ask whether the same idea can be pushed back to earlier layers. A small example makes the computation more concrete (in the figure, $w$ is written as $\varphi$). In this example the output layer is layer 5, and layers 3, 4, and 5 each have two units.

Suppose we want the partial derivative of $J_i$ with respect to $w^{(3)}_{11}$. Writing $J_{i1}$ and $J_{i2}$ for the cost terms of the two output units,

$$
\frac{\partial J_i}{\partial w^{(3)}_{11}}
= \frac{\partial (J_{i1} + J_{i2})}{\partial w^{(3)}_{11}}
= \frac{\partial J_{i1}}{\partial w^{(3)}_{11}} + \frac{\partial J_{i2}}{\partial w^{(3)}_{11}}
$$

We compute $\frac{\partial J_{i1}}{\partial w^{(3)}_{11}}$ and $\frac{\partial J_{i2}}{\partial w^{(3)}_{11}}$ separately. For the first term, we differentiate step by step along the network:

$$
\begin{aligned}
\frac{\partial J_{i1}}{\partial w^{(3)}_{11}}
&= \frac{\partial J_{i1}}{\partial a^{(5)}_1} \cdot \frac{\partial a^{(5)}_1}{\partial z^{(5)}_1} \cdot \frac{\partial z^{(5)}_1}{\partial a^{(4)}_1} \cdot \frac{\partial a^{(4)}_1}{\partial z^{(4)}_1} \cdot \frac{\partial z^{(4)}_1}{\partial w^{(3)}_{11}} \\
&= \delta^{(5)}_1 \cdot \frac{\partial z^{(5)}_1}{\partial a^{(4)}_1} \cdot \frac{\partial a^{(4)}_1}{\partial z^{(4)}_1} \cdot \frac{\partial z^{(4)}_1}{\partial w^{(3)}_{11}}
\end{aligned}
$$

We also have

$$
\begin{aligned}
z^{(5)}_1 = w^{(4)}_{11} a^{(4)}_1 + w^{(4)}_{12} a^{(4)}_2 &\rightarrow \frac{\partial z^{(5)}_1}{\partial a^{(4)}_1} = w^{(4)}_{11} \\
a^{(4)}_1 = g(z^{(4)}_1) &\rightarrow \frac{\partial a^{(4)}_1}{\partial z^{(4)}_1} = g'(z^{(4)}_1) \\
z^{(4)}_1 = w^{(3)}_{11} a^{(3)}_1 + w^{(3)}_{12} a^{(3)}_2 &\rightarrow \frac{\partial z^{(4)}_1}{\partial w^{(3)}_{11}} = a^{(3)}_1
\end{aligned}
$$

so

$$
\frac{\partial J_{i1}}{\partial w^{(3)}_{11}} = \delta^{(5)}_1 \cdot w^{(4)}_{11} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1
$$

In the same way (the steps are almost identical, so they are omitted; definitely not because the formulas are tedious to type) we get

$$
\frac{\partial J_{i2}}{\partial w^{(3)}_{11}} = \delta^{(5)}_2 \cdot w^{(4)}_{21} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1
$$

Adding the two terms gives

$$
\begin{aligned}
\frac{\partial J_i}{\partial w^{(3)}_{11}}
&= \delta^{(5)}_1 \cdot w^{(4)}_{11} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1 + \delta^{(5)}_2 \cdot w^{(4)}_{21} \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1 \\
&= (\delta^{(5)}_1 \cdot w^{(4)}_{11} + \delta^{(5)}_2 \cdot w^{(4)}_{21}) \cdot g'(z^{(4)}_1) \cdot a^{(3)}_1
\end{aligned}
$$

Now let

$$
\delta^{(4)}_1 = (\delta^{(5)}_1 \cdot w^{(4)}_{11} + \delta^{(5)}_2 \cdot w^{(4)}_{21}) \cdot g'(z^{(4)}_1)
$$

so that

$$
\frac{\partial J_i}{\partial w^{(3)}_{11}} = \delta^{(4)}_1 \cdot a^{(3)}_1
$$

This has exactly the same structure as the earlier result $\frac{\partial J_i}{\partial w^{(L-1)}_{jk}} = \delta^{(L)}_j \cdot a^{(L-1)}_k$. The expression for $\delta^{(4)}_1$ is therefore a recurrence that computes the $\delta$ of one layer from the $\delta$ of the next. Likewise,

$$
\delta^{(4)}_2 = (\delta^{(5)}_1 \cdot w^{(4)}_{12} + \delta^{(5)}_2 \cdot w^{(4)}_{22}) \cdot g'(z^{(4)}_2)
$$

In vector form, with $\odot$ denoting the elementwise product:

$$
\begin{bmatrix} \delta^{(4)}_1 \\ \delta^{(4)}_2 \end{bmatrix}
= \left( \begin{bmatrix} w^{(4)}_{11} & w^{(4)}_{21} \\ w^{(4)}_{12} & w^{(4)}_{22} \end{bmatrix}
\begin{bmatrix} \delta^{(5)}_1 \\ \delta^{(5)}_2 \end{bmatrix} \right)
\odot \begin{bmatrix} g'(z^{(4)}_1) \\ g'(z^{(4)}_2) \end{bmatrix}
$$

The matrix here is the transpose of $w^{(4)}$, that is,

$$
\delta^{(4)} = \bigg[ (w^{(4)})^T \delta^{(5)} \bigg] \odot g'(z^{(4)})
$$

The same formula extends to every other layer:

$$
\delta^{(l)} = \bigg[ (w^{(l)})^T \delta^{(l+1)} \bigg] \odot g'(z^{(l)})
$$

This equation is the key to back propagation.

For each training example $i$ we run BP once to compute $\frac{\partial J_i}{\partial w^{(l)}_{jk}} = \delta^{(l+1)}_j \cdot a^{(l)}_k$ for every layer $l$, and let $\Delta^{(l)}_{jk}$ accumulate the sum of these values over the examples. After all $m$ training examples have been processed, we set $D^{(l)}_{jk} = \frac 1m \Delta^{(l)}_{jk}$, which gives

$$
\frac{\partial}{\partial w^{(l)}_{jk}} J(w) = D^{(l)}_{jk}
$$

and then we run GD with these gradients.
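The whole procedure (output-layer $\delta$, the backward recurrence, accumulation over the examples, and a GD step) can be sketched in plain Python. This is a hedged illustration, not the author's implementation: it assumes $g$ is the sigmoid, leaves out bias units as the derivation above does, and uses 0-based lists so that `ws[l]` maps layer `l` to layer `l + 1`. The names `backprop`, `numeric_grad`, and `gd_step`, and all weights and data, are made up for the example. The finite-difference check is an extra sanity test and is not part of the original text.

```python
import math


def g(z):
    return 1.0 / (1.0 + math.exp(-z))      # assumed activation (sigmoid)


def g_prime(z):
    s = g(z)
    return s * (1.0 - s)


def forward(ws, x):
    """Return (zs, acts); acts[0] is the input, acts[l + 1] = g(zs[l])."""
    acts, zs = [list(x)], []
    for w in ws:
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, acts[-1])) for row in w]
        zs.append(z)
        acts.append([g(v) for v in z])
    return zs, acts


def cost(ws, xs, ys):
    """J(w): mean over examples of the summed half squared errors."""
    total = 0.0
    for x, y in zip(xs, ys):
        out = forward(ws, x)[1][-1]
        total += sum(0.5 * (o - t) ** 2 for o, t in zip(out, y))
    return total / len(xs)


def backprop(ws, xs, ys):
    """Return D with D[l][j][k] = dJ/dw[l][j][k]."""
    big_delta = [[[0.0] * len(row) for row in w] for w in ws]
    for x, y in zip(xs, ys):
        zs, acts = forward(ws, x)
        # Output layer: delta_j = (a_j - y_j) * g'(z_j)
        delta = [(a - t) * g_prime(z) for a, t, z in zip(acts[-1], y, zs[-1])]
        for l in range(len(ws) - 1, -1, -1):
            # Accumulate dJ_i/dw[l][j][k] = delta_j * a_k
            for j, d in enumerate(delta):
                for k, a in enumerate(acts[l]):
                    big_delta[l][j][k] += d * a
            if l > 0:
                # Recurrence: delta_prev = (w^T delta) elementwise-times g'(z_prev)
                delta = [
                    sum(ws[l][j][k] * delta[j] for j in range(len(delta)))
                    * g_prime(zs[l - 1][k])
                    for k in range(len(acts[l]))
                ]
    m = len(xs)
    return [[[v / m for v in row] for row in layer] for layer in big_delta]


def numeric_grad(ws, xs, ys, l, j, k, eps=1e-6):
    """Central finite difference of J with respect to one weight."""
    old = ws[l][j][k]
    ws[l][j][k] = old + eps
    up = cost(ws, xs, ys)
    ws[l][j][k] = old - eps
    down = cost(ws, xs, ys)
    ws[l][j][k] = old
    return (up - down) / (2 * eps)


def gd_step(ws, grads, lr):
    """One gradient-descent update; returns new weights."""
    return [[[w - lr * d for w, d in zip(wr, dr)] for wr, dr in zip(wl, dl)]
            for wl, dl in zip(ws, grads)]


# Made-up 2-2-2-2 network and two training examples.
ws = [[[0.1, -0.2], [0.4, 0.3]],
      [[0.5, -0.6], [0.7, 0.2]],
      [[-0.3, 0.8], [0.9, -0.1]]]
xs = [[1.0, 0.5], [0.2, -0.4]]
ys = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    D = backprop(ws, xs, ys)
    print(D[0][0][0], numeric_grad(ws, xs, ys, 0, 0, 0))
    print(cost(ws, xs, ys), cost(gd_step(ws, D, 0.1), xs, ys))
```

Comparing `backprop` against `numeric_grad` for a few weights is a cheap way to catch index mistakes such as forgetting the transpose in the recurrence.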