BERT Model Source Code Analysis (Part 7)


for _ in range(num_dims - 2):
  position_broadcast_shape.append(1)
position_broadcast_shape.extend([seq_length, width])  # extend to [..., seq_length, width]
position_embeddings = tf.reshape(position_embeddings,
                                 position_broadcast_shape)  # reshape for broadcasting
output += position_embeddings  # add the position embeddings into the output
output = layer_norm_and_dropout(output, dropout_prob)  # layer normalization and dropout
return output
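To make the broadcast above concrete, here is a minimal NumPy sketch (the sizes batch_size=2, seq_length=4, width=8 and the zero/random tensors are made up for illustration, not taken from modeling.py); the broadcasting rule that tf.reshape plus += relies on is the same one NumPy applies here.

import numpy as np

# Hypothetical sizes, for illustration only.
batch_size, seq_length, width = 2, 4, 8
num_dims = 3  # rank of `output`: [batch_size, seq_length, width]

output = np.zeros((batch_size, seq_length, width), dtype=np.float32)
# The position table is shared across the batch: shape [seq_length, width].
position_embeddings = np.random.randn(seq_length, width).astype(np.float32)

# Build the broadcast shape exactly as in the snippet above:
# one leading 1 per extra leading dimension, then [seq_length, width].
position_broadcast_shape = []
for _ in range(num_dims - 2):
    position_broadcast_shape.append(1)
position_broadcast_shape.extend([seq_length, width])
print(position_broadcast_shape)   # [1, 4, 8]

# Reshaping to [1, seq_length, width] lets the addition broadcast over the
# batch dimension, so every example in the batch gets the same position table.
output += position_embeddings.reshape(position_broadcast_shape)
print(output.shape)               # (2, 4, 8)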
Creating the mask
■ Create an attention mask from the input mask
def create_attention_mask_from_input_mask(from_tensor, to_mask):
  """Create 3D attention mask from a 2D tensor mask.

  Args:
    from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...].
    to_mask: int32 Tensor of shape [batch_size, to_seq_length].

  Returns:
    float Tensor of shape [batch_size, from_seq_length, to_seq_length].
  """
  # Get the shape of the input tensor.
  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  batch_size = from_shape[0]
  from_seq_length = from_shape[1]

  # Get the shape of the mask tensor.
  to_shape = get_shape_list(to_mask, expected_rank=2)
  to_seq_length = to_shape[1]

  # First reshape, then cast to float32.
  to_mask = tf.cast(
      tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32)

  # We don't assume that `from_tensor` is a mask (although it could be). We
  # don't actually care if we attend *from* padding tokens (only *to* padding
  # tokens), so we create a tensor of all ones.
  #
  # `broadcast_ones` = [batch_size, from_seq_length, 1]
  broadcast_ones = tf.ones(
      shape=[batch_size, from_seq_length, 1], dtype=tf.float32)

  # Here we broadcast along two dimensions to create the mask.
  mask = broadcast_ones * to_mask

  return mask
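As a quick usage sketch (not from the original post), the same two broadcasts can be replayed in NumPy to see what the function returns for a padded batch; the values below are made up:

import numpy as np

# Toy batch of two sequences padded to length 4 (1 = real token, 0 = padding).
to_mask = np.array([[1, 1, 1, 0],
                    [1, 1, 0, 0]], dtype=np.int32)
batch_size, to_seq_length = to_mask.shape
from_seq_length = to_seq_length   # self-attention case: from == to

# The same two broadcasts as the TF code above, written in NumPy.
to_mask_f = to_mask.reshape(batch_size, 1, to_seq_length).astype(np.float32)
broadcast_ones = np.ones((batch_size, from_seq_length, 1), dtype=np.float32)
mask = broadcast_ones * to_mask_f          # shape (2, 4, 4)

print(mask[0])
# [[1. 1. 1. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 0.]]

Every row of mask[b] is identical, which is the point of the all-ones broadcast: only the *to* (key) positions are restricted, never the *from* (query) positions.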
The attention layer
■ The attention layer (attention_layer)
def attention_layer(from_tensor,
                    to_tensor,
                    attention_mask=None,
                    num_attention_heads=1,
                    size_per_head=512,
                    query_act=None,
                    key_act=None,
                    value_act=None,
                    attention_probs_dropout_prob=0.0,
                    initializer_range=0.02,
                    do_return_2d_tensor=False,
                    batch_size=None,
                    from_seq_length=None,
                    to_seq_length=None):
"""Performs multi-headed attention from `from_tensor` to `to_tensor`.
多头的注意力
This is an implementation of multi-headed attention
based on "Attention is all you Need".
这是一个多头注意力的实现,注意的才是需要的
如果from_tensor和to_tensor是一样的 , name这个注意力就是自己注意自己,也叫自注意力 。
If `from_tensor` and `to_tensor` are the same, then
this is self-attention. Each timestep in `from_tensor` attends to the
corresponding sequence in `to_tensor`, and returns a fixed-with vector.
先将from_tensor投射成query张量,并且将to_tensor投射成key和value张量 。
这将产生一系列张量 , 张量个数=头数,
其中每个张量的形状都是[批处理量,序列长度,头的大小]
This function first projects `from_tensor` into a "query" tensor and
`to_tensor` into "key" and "value" tensors. These are (effectively) a list
of tensors of length `num_attention_heads`, where each tensor is of shape
[batch_size, seq_length, size_per_head].
query 张量和key张量都是 点积的 和成比例的??? 。
通过softmax运算从而获取注意力数据 。
value 张量通过这些注意力数据差值计算得出 , 然后把它们连接成一个张量 。
Then, the query and key tensors are dot-producted and scaled. These are
softmaxed to obtain attention probabilities. The value tensors are then
interpolated by these probabilities, then concatenated back to a single
tensor and returned.
实际操作中,多头注意力进行转置和变形运算 , 而不是独立的张量运算 。
In practice, the multi-headed attention are done with transposes and
reshapes rather than actual separate tensors.
Args: 入参,输入张量 , 输出张量
from_tensor: float Tensor of shape [batch_size, from_seq_length,
from_width].
to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].
注意力掩码
attention_mask: (optional) int32 Tensor of shape [batch_size,
from_seq_length, to_seq_length]. The values should be 1 or 0. The
attention scores will effectively be set to -infinity for any positions in
the mask that are 0, and will be unchanged for positions that are 1.
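Since only the docstring appears up to this point, here is a simplified NumPy sketch of the computation it describes. This is not the TF code in modeling.py: the projection weights are random placeholders, the sizes B, F, T, N, H are made up, and the -10000.0 adder is just one way to realize the "effectively -infinity" masking the docstring mentions.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes for illustration.
B, F, T = 2, 4, 4            # batch, from_seq_length, to_seq_length
N, H = 3, 8                  # num_attention_heads, size_per_head
W = N * H                    # hidden width

rng = np.random.default_rng(0)
from_tensor = rng.standard_normal((B, F, W)).astype(np.float32)
to_tensor = from_tensor                        # self-attention
attention_mask = np.ones((B, F, T), dtype=np.float32)
attention_mask[:, :, -1] = 0.0                 # pretend the last position is padding

# Random stand-ins for the learned query/key/value projection weights.
Wq, Wk, Wv = (rng.standard_normal((W, W)).astype(np.float32) for _ in range(3))

def split_heads(x):
    # [B, S, N*H] -> [B, N, S, H]: the transpose/reshape trick from the docstring.
    b, s, _ = x.shape
    return x.reshape(b, s, N, H).transpose(0, 2, 1, 3)

q = split_heads(from_tensor @ Wq)              # [B, N, F, H]
k = split_heads(to_tensor @ Wk)                # [B, N, T, H]
v = split_heads(to_tensor @ Wv)                # [B, N, T, H]

# Dot-product the queries and keys, scale by 1/sqrt(size_per_head).
scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(H)        # [B, N, F, T]

# Masking: add a large negative number where attention_mask is 0 so that
# softmax drives those probabilities to (almost) zero.
adder = (1.0 - attention_mask[:, None, :, :]) * -10000.0
probs = softmax(scores + adder)                           # [B, N, F, T]

# "Interpolate" the values by the probabilities, then merge the heads back.
context = probs @ v                                       # [B, N, F, H]
context = context.transpose(0, 2, 1, 3).reshape(B, F, W)

print(context.shape)     # (2, 4, 24)
print(probs[0, 0, 0])    # last entry ~0: the padded position is never attended to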
