![](http://139.9.1.231/wp-content/uploads/2022/08/image-151-1024x611.png)
Transformer characteristics:
- It is an encoder-decoder model
- It is not an RNN model
- It is built entirely from fully connected (dense) layers and attention
- Its performance far exceeds that of RNNs (on large datasets)
Recall the seq2seq model:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-152-1024x572.png)
How to compute the context vector c:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-153.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-154.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-155.png)
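To make the computation of c concrete, here is a minimal NumPy sketch (the shapes and the dot-product alignment score are illustrative assumptions, not code from the slides): the weights α_1 … α_m are a softmax over alignment scores between the decoder state s and each encoder state h_i, and c is the resulting weighted average of the encoder states.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

m, d = 5, 8                      # m encoder steps, hidden size d (illustrative)
rng = np.random.default_rng(0)
H = rng.normal(size=(m, d))      # encoder hidden states h_1 .. h_m
s = rng.normal(size=(d,))        # current decoder state

scores = H @ s                   # one alignment score per h_i (dot-product form)
alpha = softmax(scores)          # attention weights: alpha_i >= 0, sum to 1
c = alpha @ H                    # context vector: weighted average of the h_i
print(alpha.round(3), c.shape)
```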
From RNN to Transformer: the self-attention layer
![](http://139.9.1.231/wp-content/uploads/2022/08/image-156-1024x492.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-158.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-159.png)
In self-attention, each token has three different vectors: a Query vector (Q), a Key vector (K), and a Value vector (V), each of length 64. They are obtained by multiplying the embedding vector X by three different weight matrices, W_Q, W_K, and W_V, all of the same size, 512×64.
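A minimal NumPy sketch of this projection step (n, the number of tokens, is an illustrative choice; the matrix names and sizes follow the text above):

```python
import numpy as np

n, d_model, d_k = 10, 512, 64    # n tokens, embedding size 512, Q/K/V size 64
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d_model))            # one embedding row per token

# Three learned weight matrices, each 512 x 64 as stated above.
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V          # each of shape (n, 64)
```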
![](http://139.9.1.231/wp-content/uploads/2022/09/image-296.png)
![](http://139.9.1.231/wp-content/uploads/2022/09/image-297.png)
![](http://139.9.1.231/wp-content/uploads/2022/09/image-292.png)
![](http://139.9.1.231/wp-content/uploads/2022/09/image-295.png)
This is summarized in matrix form, as shown below:
![](http://139.9.1.231/wp-content/uploads/2022/09/image-294.png)
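In code, the matrix form corresponds to scaled dot-product attention, softmax(QK^T/√d_k)V. A sketch reusing the Q, K, V shapes from the previous snippet (the scaling by √d_k follows the original paper):

```python
import numpy as np

def self_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n, n) pairwise similarities
    # Row-wise softmax: each query position attends over all key positions.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                             # (n, d_k) output, one row per token
```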
Building the Transformer: the multi-head self-attention layer
The above describes a single self-attention layer. A multi-head layer runs N identical such layers in parallel, with no parameter sharing between heads. The outputs of the heads are concatenated (stacked along the feature dimension) to form the output of the multi-head attention layer.
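A minimal NumPy sketch of this multi-head layer (8 heads of size 64 is the original paper's configuration; the paper additionally applies an output projection W^O after concatenation, omitted here to match the description above):

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, W_Q, W_K, W_V):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    return softmax_rows(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

n, d_model, d_k, num_heads = 10, 512, 64, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d_model))

# One independent set of projections per head: no parameter sharing.
heads = []
for _ in range(num_heads):
    W_Q, W_K, W_V = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    heads.append(attention(X, W_Q, W_K, W_V))

# Concatenate the head outputs: (n, num_heads * d_k) = (n, 512).
multi_head_out = np.concatenate(heads, axis=-1)
```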
![](http://139.9.1.231/wp-content/uploads/2022/08/image-160.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-161-1024x384.png)
![](http://139.9.1.231/wp-content/uploads/2022/09/image-298.png)
![](http://139.9.1.231/wp-content/uploads/2022/09/image-299.png)
Stacked Self-Attention Layers
![](http://139.9.1.231/wp-content/uploads/2022/08/image-162-1024x598.png)
![](http://139.9.1.231/wp-content/uploads/2022/09/image-293.png)
One encoder block:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-163.png)
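A minimal PyTorch sketch of one encoder block (the class and parameter names are illustrative; the structure of multi-head self-attention followed by a position-wise dense network, each with a residual connection and layer normalization, follows the original paper):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)     # self-attention: Q = K = V = x
        x = self.norm1(x + attn_out)         # residual + layer norm
        x = self.norm2(x + self.ffn(x))      # dense sublayer, same wrapping
        return x
```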
Finally, six such blocks are stacked to form the Transformer encoder:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-164.png)
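Stacking six such blocks, as in the figure, yields the encoder (this continues the hypothetical EncoderBlock sketch above):

```python
import torch
import torch.nn as nn

# Assumes the EncoderBlock class from the previous sketch.
encoder = nn.Sequential(*[EncoderBlock() for _ in range(6)])
x = torch.randn(2, 10, 512)      # (batch, seq_len, d_model)
print(encoder(x).shape)          # torch.Size([2, 10, 512])
```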
The decoder part:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-165-1024x574.png)
Decoder block:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-166.png)
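A matching PyTorch sketch of one decoder block (again with illustrative names): masked self-attention over the tokens generated so far, cross-attention whose keys and values come from the encoder output, then a dense sublayer, each wrapped with a residual connection and layer normalization:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, enc_out):
        # Causal mask: position i may only attend to positions <= i.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        a, _ = self.self_attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + a)
        # Cross-attention: queries from the decoder, keys/values from the encoder.
        a, _ = self.cross_attn(x, enc_out, enc_out)
        x = self.norm2(x + a)
        return self.norm3(x + self.ffn(x))
```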
The overall network:
![](http://139.9.1.231/wp-content/uploads/2022/08/image-167-1024x596.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-168.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-169.png)
![](http://139.9.1.231/wp-content/uploads/2022/08/image-170-1024x543.png)
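For completeness, PyTorch ships this whole encoder-decoder stack as torch.nn.Transformer; a minimal sketch with the paper's default hyperparameters (a real decoding call would also pass a causal tgt_mask, omitted here for brevity):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)
src = torch.randn(2, 10, 512)    # encoder input  (batch, src_len, d_model)
tgt = torch.randn(2, 7, 512)     # decoder input  (batch, tgt_len, d_model)
out = model(src, tgt)            # (2, 7, 512)
print(out.shape)
```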