Paper: https://arxiv.org/abs/1711.07971
Third-party PyTorch code: https://github.com/AlexHex7/Non-local_pytorch
1. The non-local operation
The paper defines a generic non-local operation:
${{\mathbf{y}}_{i}}=\frac{1}{C(\mathbf{x})}\sum\limits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})g({{\mathbf{x}}_{j}})}$
where i is the index of an output position whose response is to be computed, and j enumerates all positions. x is the input signal (an image, sequence, video, etc., usually their features), and y is the output signal, of the same size as x. The pairwise function f computes a scalar describing the relationship between position i and each position j, and the unary function g computes a representation of the input signal at position j. C(x) is a normalization factor for the result.
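As a minimal sketch (not the paper's implementation), the operation can be written as a literal double loop over positions; the f, g and C below are placeholders chosen purely for illustration:

    import torch

    def non_local(x, f, g, C):
        # x: (N, C) features at N positions; f: pairwise scalar function;
        # g: unary transform; C: normalization factor computed from x
        N = x.size(0)
        y = torch.stack([
            sum(f(x[i], x[j]) * g(x[j]) for j in range(N)) / C(x)
            for i in range(N)
        ])
        return y

    # Example: f as a plain dot product, g as identity, C(x) = N
    x = torch.randn(5, 8)
    y = non_local(x, f=lambda a, b: a @ b, g=lambda v: v, C=lambda t: t.size(0))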
2. Differences between non-local and other operations
1. Non-local considers all positions j. Convolution only considers a local neighborhood of the current position (e.g., a 1-D convolution with kernel size 3 only considers i-1 <= j <= i+1); recurrent operations usually only consider the current and the previous time step (j = i or j = i-1).
2. Non-local computes the response from the relationships between different positions, whereas a fully connected (fc) layer uses learned weights. In other words, in an fc layer the response is not a function of the relationship between ${{\mathbf{x}}_{i}}$ and ${{\mathbf{x}}_{j}}$, while in non-local it is.
3. Non-local supports inputs of different sizes and keeps the output the same size as the input; an fc layer requires fixed input and output sizes and loses the positional correspondence.
4. Non-local can be used in the early parts of a network, whereas fc layers are usually used at the end.
3. Forms of f and g
3.1 Form of g
For simplicity, only a linear form of g is considered: $g({{\mathbf{x}}_{j}})\text{=}{{W}_{g}}{{\mathbf{x}}_{j}}$, where ${{W}_{g}}$ is a weight matrix to be learned. It can be implemented as a 1*1 conv in the spatial domain, or as a 1*1*1 conv in the spatiotemporal domain (e.g., sequences of images).
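A minimal sketch of g as a 1*1 conv on a 2-D feature map (the shapes here are assumptions chosen for illustration):

    import torch
    from torch import nn

    B, C, H, W = 2, 64, 16, 16
    x = torch.randn(B, C, H, W)
    g = nn.Conv2d(C, C // 2, kernel_size=1)  # W_g as a 1*1 conv (here it also halves the channels, see 5.1)
    g_x = g(x)  # (B, C/2, H, W): the representation g(x_j) at every position j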
3.2 f as a Gaussian
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}{{e}^{\mathbf{x}_{i}^{T}{{\mathbf{x}}_{j}}}}$
where $\mathbf{x}_{i}^{T}{{\mathbf{x}}_{j}}$ is a dot product, because dot products are easy to implement in deep learning frameworks (Euclidean distance would also work). The normalization factor is $C(\mathbf{x})=\sum\nolimits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})}$.
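Note that normalizing ${{e}^{\mathbf{x}_{i}^{T}{{\mathbf{x}}_{j}}}}$ by this C(x) is exactly a softmax over j, so a sketch (assuming x flattened to shape (B, N, C)) is simply:

    import torch
    import torch.nn.functional as F

    B, N, C = 2, 100, 32
    x = torch.randn(B, N, C)
    f = torch.matmul(x, x.transpose(1, 2))  # (B, N, N), entries x_i^T x_j
    f_div_C = F.softmax(f, dim=-1)          # exp(f) normalized by C(x) = sum_j exp(f)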
3.3 f as an embedded Gaussian
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}{{e}^{\theta {{({{\mathbf{x}}_{i}})}^{T}}\phi ({{\mathbf{x}}_{j}})}}$
where $\theta ({{\mathbf{x}}_{i}})\text{=}{{W}_{\theta }}{{\mathbf{x}}_{i}}$ and $\phi ({{\mathbf{x}}_{j}})\text{=}{{W}_{\phi }}{{\mathbf{x}}_{j}}$; the normalization factor is again $C(\mathbf{x})=\sum\nolimits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})}$.
Relation between self-attention and non-local: self-attention can be seen as a special case of the embedded Gaussian version. For a given i, $\frac{1}{C(\mathbf{x})}f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})$ becomes a softmax computed along the j dimension. Then $\mathbf{y}=softmax({{\mathbf{x}}^{T}}W_{\theta }^{T}{{W}_{\phi }}\mathbf{x})g(\mathbf{x})$, which is exactly the self-attention form.
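A sketch of the embedded Gaussian version written in this self-attention form; it uses nn.Linear on flattened positions instead of 1*1 convs, which computes the same per-position linear maps:

    import torch
    from torch import nn
    import torch.nn.functional as F

    B, N, C, Ci = 2, 100, 32, 16          # Ci: embedding ("inter") channels
    x = torch.randn(B, N, C)
    theta = nn.Linear(C, Ci, bias=False)  # W_theta
    phi = nn.Linear(C, Ci, bias=False)    # W_phi
    g = nn.Linear(C, Ci, bias=False)      # W_g

    attn = F.softmax(theta(x) @ phi(x).transpose(1, 2), dim=-1)  # softmax(x^T W_theta^T W_phi x)
    y = attn @ g(x)                       # (B, N, Ci), the self-attention output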
3.4 Dot product
f can also be defined as a dot-product similarity (here using the embedded form):
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}\theta {{({{\mathbf{x}}_{i}})}^{T}}\phi ({{\mathbf{x}}_{j}})$
In this case the normalization factor is $C(\mathbf{x})=N$, where N is the number of positions in x; using N rather than the sum of f simplifies the gradient computation.
The only difference between the dot-product and embedded Gaussian versions is whether softmax is applied as the activation function.
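In code the two versions differ in a single line (a sketch; f stands for the B*(K)*(K) similarity matrix):

    import torch
    import torch.nn.functional as F

    f = torch.randn(2, 100, 100)                # pairwise similarities theta(x_i)^T phi(x_j)
    f_embedded_gaussian = F.softmax(f, dim=-1)  # embedded Gaussian: softmax over j
    f_dot_product = f / f.size(-1)              # dot product: divide by the number of positions N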
3.5 Concatenation
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}ReLU(w_{f}^{T}[\theta ({{\mathbf{x}}_{i}}),\phi ({{\mathbf{x}}_{j}})])$
where $[\cdot ,\cdot ]$ denotes concatenation and ${{w}_{f}}$ is a weight vector that projects the concatenated vector to a scalar. Here $C(\mathbf{x})=N$.
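A sketch of this pairwise concatenation using broadcasting (shapes are assumptions; the full module is given in 6.4):

    import torch
    from torch import nn

    B, Ci, N = 2, 16, 100
    theta_x = torch.randn(B, Ci, N, 1).expand(-1, -1, -1, N)  # theta(x_i), repeated along j
    phi_x = torch.randn(B, Ci, 1, N).expand(-1, -1, N, -1)    # phi(x_j), repeated along i
    pair = torch.cat([theta_x, phi_x], dim=1)                 # (B, 2*Ci, N, N): [theta(x_i), phi(x_j)]
    w_f = nn.Sequential(nn.Conv2d(2 * Ci, 1, 1, bias=False), nn.ReLU())  # w_f^T followed by ReLU
    f = w_f(pair).view(B, N, N)                               # one scalar per (i, j) pair
    f_div_C = f / N                                           # C(x) = N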
4. The non-local block
The non-local operation above can be wrapped into a non-local block that can be embedded into existing network architectures:
${{\mathbf{z}}_{i}}={{W}_{z}}{{\mathbf{y}}_{i}}+{{\mathbf{x}}_{i}}$
where ${{\mathbf{y}}_{i}}=\frac{1}{C(\mathbf{x})}\sum\limits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})g({{\mathbf{x}}_{j}})}$ as before, and $+{{\mathbf{x}}_{i}}$ denotes a residual connection. The residual connection makes it easy to insert a non-local block into a pretrained model without disturbing its initial behavior (e.g., by initializing ${{W}_{z}}$ to zero).
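A minimal 2-D sketch of the whole block (embedded Gaussian, no subsampling; the generic N-D implementation is in Section 6). W_z is zero-initialized so the block starts as an identity mapping:

    import torch
    from torch import nn

    class NonLocal2d(nn.Module):
        def __init__(self, c, ci):
            super().__init__()
            self.theta = nn.Conv2d(c, ci, 1)
            self.phi = nn.Conv2d(c, ci, 1)
            self.g = nn.Conv2d(c, ci, 1)
            self.W_z = nn.Conv2d(ci, c, 1)
            nn.init.zeros_(self.W_z.weight)  # zero init: z_i = x_i at the start of training
            nn.init.zeros_(self.W_z.bias)

        def forward(self, x):
            b, c, h, w = x.shape
            t = self.theta(x).flatten(2).transpose(1, 2)   # (B, N, Ci)
            p = self.phi(x).flatten(2)                     # (B, Ci, N)
            gx = self.g(x).flatten(2).transpose(1, 2)      # (B, N, Ci)
            y = torch.softmax(t @ p, dim=-1) @ gx          # (B, N, Ci)
            y = y.transpose(1, 2).reshape(b, -1, h, w)     # back to (B, Ci, H, W)
            return self.W_z(y) + x                         # residual connection

    x = torch.randn(2, 64, 14, 14)
    print(NonLocal2d(64, 32)(x).shape)  # torch.Size([2, 64, 14, 14])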
The non-local block is illustrated in Fig. 2 of the paper. The pairwise computations of 3.2, 3.3 and 3.4 correspond to the matrix multiplications in that figure. On feature maps late in the network, the pairwise computation is relatively cheap.
Notes:
1. For images, 1*1 convs are used and there is no T dimension in the figure; for video, 1*1*1 convs are used and the T dimension is present.
2. The softmax in the figure is computed over each row of the matrix.
5. Reducing computation
5.1 Reducing the number of channels
Setting the output channels of ${{W}_{g}}$, ${{W}_{\theta }}$ and ${{W}_{\phi }}$ to half the number of channels of x reduces the computation.
5.2 Subsampling x
Subsampling x reduces the computation further.
In this case the formula of Section 1 becomes ${{\mathbf{y}}_{i}}=\frac{1}{C(\mathbf{\hat{x}})}\sum\limits_{\forall j}{f({{\mathbf{x}}_{i}},{{{\mathbf{\hat{x}}}}_{j}})g({{{\mathbf{\hat{x}}}}_{j}})}$, where $\mathbf{\hat{x}}$ is a subsampled version of x (e.g., by pooling). This reduces the pairwise computation to 1/4 of the original; it does not change the non-local behavior, but it makes the computation sparser. It can be implemented by adding a max pooling layer after $\phi $ and $g$ in the block figure.
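A sketch of the subsampled pairwise computation (2-D case, 2*2 max pooling after phi and g; shapes are assumptions): the j dimension shrinks from N to N/4 while the output keeps the full resolution:

    import torch
    from torch import nn

    B, Ci, H, W = 2, 16, 16, 16
    pool = nn.MaxPool2d(kernel_size=2)
    theta_x = torch.randn(B, Ci, H, W).flatten(2).transpose(1, 2)    # (B, N, Ci), N = H*W
    phi_x = pool(torch.randn(B, Ci, H, W)).flatten(2)                # (B, Ci, N/4)
    g_x = pool(torch.randn(B, Ci, H, W)).flatten(2).transpose(1, 2)  # (B, N/4, Ci)
    f = torch.softmax(theta_x @ phi_x, dim=-1)                       # (B, N, N/4): 1/4 of the pairwise work
    y = f @ g_x                                                      # (B, N, Ci): full-resolution output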
6. Code
6.1 embedded_gaussian
    import torch
    from torch import nn
    from torch.nn import functional as F


    class _NonLocalBlockND(nn.Module):
        def __init__(self, in_channels, inter_channels=None, dimension=3, sub_sample=True, bn_layer=True):
            """
            :param in_channels: number of input channels
            :param inter_channels: number of embedding channels (defaults to in_channels // 2)
            :param dimension: 1, 2 or 3, for sequences, images or video
            :param sub_sample: if True, max-pool after phi and g (Section 5.2)
            :param bn_layer: if True, W includes a BatchNorm layer
            """
            super(_NonLocalBlockND, self).__init__()
            assert dimension in [1, 2, 3]
            self.dimension = dimension
            self.sub_sample = sub_sample
            self.in_channels = in_channels
            self.inter_channels = inter_channels
            if self.inter_channels is None:
                self.inter_channels = in_channels // 2
                if self.inter_channels == 0:
                    self.inter_channels = 1
            if dimension == 3:
                conv_nd = nn.Conv3d
                max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2))
                bn = nn.BatchNorm3d
            elif dimension == 2:
                conv_nd = nn.Conv2d
                max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2))
                bn = nn.BatchNorm2d
            else:
                conv_nd = nn.Conv1d
                max_pool_layer = nn.MaxPool1d(kernel_size=2)
                bn = nn.BatchNorm1d
            self.g = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                             kernel_size=1, stride=1, padding=0)  # g: 1*1 conv, reduces the channel dimension
            if bn_layer:
                self.W = nn.Sequential(  # 1*1 conv mapping back to the original channel dimension (W_z in Fig. 2)
                    conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                            kernel_size=1, stride=1, padding=0),
                    bn(self.in_channels)
                )
                nn.init.constant_(self.W[1].weight, 0)  # zero init, so the block initially acts as an identity
                nn.init.constant_(self.W[1].bias, 0)
            else:
                self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                                 kernel_size=1, stride=1, padding=0)  # 1*1 conv mapping back to the original channel dimension
                nn.init.constant_(self.W.weight, 0)
                nn.init.constant_(self.W.bias, 0)
            self.theta = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                                 kernel_size=1, stride=1, padding=0)  # theta: 1*1 conv, reduces the channel dimension
            self.phi = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                               kernel_size=1, stride=1, padding=0)  # phi: 1*1 conv, reduces the channel dimension
            if sub_sample:
                self.g = nn.Sequential(self.g, max_pool_layer)
                self.phi = nn.Sequential(self.phi, max_pool_layer)

        def forward(self, x, return_nl_map=False):
            """
            :param x: (b, c, t, h, w)
            :param return_nl_map: if True return z, nl_map, else only return z.
            :return:
            """
            # Let x have shape B*C*(K): 1D -> (K) = K1; 2D -> (K) = K1*K2; 3D -> (K) = K1*K2*K3
            batch_size = x.size(0)
            g_x = self.g(x).view(batch_size, self.inter_channels, -1)  # apply g and reshape: B*inter_channels*(K)
            g_x = g_x.permute(0, 2, 1)  # B*(K)*inter_channels, as in Fig. 2
            theta_x = self.theta(x).view(batch_size, self.inter_channels, -1)  # apply theta and reshape: B*inter_channels*(K)
            theta_x = theta_x.permute(0, 2, 1)  # B*(K)*inter_channels, as in Fig. 2
            phi_x = self.phi(x).view(batch_size, self.inter_channels, -1)  # apply phi and reshape: B*inter_channels*(K)
            f = torch.matmul(theta_x, phi_x)  # B*(K)*(K), as in Fig. 2
            f_div_C = F.softmax(f, dim=-1)  # softmax over the last dimension: normalized weights, B*(K)*(K)
            y = torch.matmul(f_div_C, g_x)  # B*(K)*inter_channels, as in Fig. 2
            y = y.permute(0, 2, 1).contiguous()  # B*inter_channels*(K), as in Fig. 2
            y = y.view(batch_size, self.inter_channels, *x.size()[2:])  # B*inter_channels*(K1 or K1*K2 or K1*K2*K3)
            W_y = self.W(y)  # B*C*(K), as in Fig. 2
            z = W_y + x  # residual connection: add the non-local map to the input feature map, B*C*(K)
            if return_nl_map:
                return z, f_div_C  # also return the normalized attention map
            return z


    class NONLocalBlock1D(_NonLocalBlockND):
        def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
            super(NONLocalBlock1D, self).__init__(in_channels,
                                                  inter_channels=inter_channels,
                                                  dimension=1, sub_sample=sub_sample,
                                                  bn_layer=bn_layer)


    class NONLocalBlock2D(_NonLocalBlockND):
        def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
            super(NONLocalBlock2D, self).__init__(in_channels,
                                                  inter_channels=inter_channels,
                                                  dimension=2, sub_sample=sub_sample,
                                                  bn_layer=bn_layer)


    class NONLocalBlock3D(_NonLocalBlockND):
        def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
            super(NONLocalBlock3D, self).__init__(in_channels,
                                                  inter_channels=inter_channels,
                                                  dimension=3, sub_sample=sub_sample,
                                                  bn_layer=bn_layer)


    if __name__ == '__main__':
        for (sub_sample_, bn_layer_) in [(True, True), (False, False), (True, False), (False, True)]:
            img = torch.zeros(2, 3, 20)
            net = NONLocalBlock1D(3, sub_sample=sub_sample_, bn_layer=bn_layer_)
            out = net(img)
            print(out.size())

            img = torch.zeros(2, 3, 20, 20)
            net = NONLocalBlock2D(3, sub_sample=sub_sample_, bn_layer=bn_layer_)
            out = net(img)
            print(out.size())

            img = torch.randn(2, 3, 8, 20, 20)
            net = NONLocalBlock3D(3, sub_sample=sub_sample_, bn_layer=bn_layer_)
            out = net(img)
            print(out.size())
6.2 Difference between embedded Gaussian and dot product
Dot-product code: the module is identical to the embedded Gaussian one in 6.1 except for the normalization in forward; instead of a softmax, f is divided by the number of positions N (C(x) = N, Section 3.4):

            f = torch.matmul(theta_x, phi_x)  # B*(K)*(K), as in Fig. 2
            N = f.size(-1)  # size of the last dimension, i.e. the number of positions
            f_div_C = f / N  # normalize by N instead of applying softmax

Everything else (__init__, the rest of forward, the 1D/2D/3D wrapper classes and the test code) is unchanged, so it is not repeated here.
6.3 Difference between embedded Gaussian and Gaussian
The original post showed side-by-side diffs of the two versions (left: embedded Gaussian; right: Gaussian) for the initialization and forward code. The difference follows from 3.2: since f is computed on x directly, the Gaussian version has no $\theta $ and $\phi $ 1*1 convolutions. In __init__ they are dropped (with sub_sample enabled, phi reduces to the max pooling layer alone), and in forward, theta_x and phi_x are reshaped views of x itself, with in_channels channels instead of inter_channels.
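A self-contained sketch of the changed forward computation under that reading (an illustration based on Section 3.2, not a verbatim excerpt from the repository):

    import torch
    import torch.nn.functional as F

    batch_size, in_channels = 2, 32
    x = torch.randn(batch_size, in_channels, 20, 20)

    # Gaussian: no theta/phi 1*1 convs; x itself is used on both sides of the dot product
    theta_x = x.view(batch_size, in_channels, -1).permute(0, 2, 1)  # B*(K)*C
    phi_x = x.view(batch_size, in_channels, -1)                     # B*C*(K)
    f = torch.matmul(theta_x, phi_x)                                # B*(K)*(K), the x_i^T x_j terms
    f_div_C = F.softmax(f, dim=-1)                                  # e^{x_i^T x_j} / C(x)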
6.4 Difference between embedded Gaussian and Concatenation
Concatenation code: compared with 6.1, __init__ additionally defines a small network that projects each concatenated pair feature to a scalar. The pairwise map is always 2-D over (i, j), so nn.Conv2d is used regardless of dimension:

            self.concat_project = nn.Sequential(  # projects the concatenated pair features down to one channel
                nn.Conv2d(self.inter_channels * 2, 1, 1, 1, 0, bias=False),
                nn.ReLU()
            )

In forward, the pairwise term f = ReLU(w_f^T [theta(x_i), phi(x_j)]) is computed by repeating theta(x_i) along j and phi(x_j) along i, concatenating along the channel dimension, projecting, and normalizing by N (Section 3.5):

            # Let x have shape B*C*(K): 1D -> (K) = K1; 2D -> (K) = K1*K2; 3D -> (K) = K1*K2*K3
            batch_size = x.size(0)
            g_x = self.g(x).view(batch_size, self.inter_channels, -1)  # apply g and reshape: B*inter_channels*(K)
            g_x = g_x.permute(0, 2, 1)  # B*(K)*inter_channels, as in Fig. 2

            theta_x = self.theta(x).view(batch_size, self.inter_channels, -1, 1)  # B*inter_channels*(K)*1
            phi_x = self.phi(x).view(batch_size, self.inter_channels, 1, -1)  # B*inter_channels*1*(K)

            h = theta_x.size(2)  # (K)
            w = phi_x.size(3)  # (K); smaller than h when sub_sample is enabled
            theta_x = theta_x.repeat(1, 1, 1, w)  # B*inter_channels*(K)*(K)
            phi_x = phi_x.repeat(1, 1, h, 1)  # B*inter_channels*(K)*(K)

            concat_feature = torch.cat([theta_x, phi_x], dim=1)  # B*(2*inter_channels)*(K)*(K)
            f = self.concat_project(concat_feature)  # B*1*(K)*(K)
            b, _, h, w = f.size()
            f = f.view(b, h, w)  # B*(K)*(K)

            N = f.size(-1)  # (K)
            f_div_C = f / N  # normalize by N, B*(K)*(K)

The remaining lines of forward (computing y, W_y and z), the 1D/2D/3D wrapper classes and the test code are the same as in 6.1.
Source: https://www.cnblogs.com/darkknightzh/p/12592351.html