Paper: https://arxiv.org/abs/1711.07971
Third-party PyTorch code: https://github.com/AlexHex7/Non-local_pytorch
1. The non-local operation
The paper defines a generic non-local operation:
${{\mathbf{y}}_{i}}=\frac{1}{C(\mathbf{x})}\sum\limits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})g({{\mathbf{x}}_{j}})}$
where i is the index of an output position whose response is to be computed, and j enumerates all positions. x is the input signal (an image, sequence, video, etc., usually their features), and y is the output signal, of the same size as x. The pairwise function f computes a scalar describing the relationship between position i and each position j, and the unary function g computes a representation of the input signal at position j. C(x) is a normalization factor for the result.
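As a minimal sketch (not the paper's implementation), the operation can be written as a literal double loop over positions; the f, g and C below are placeholders chosen purely for illustration:

    import torch

    def non_local(x, f, g, C):
        # x: (N, C) features at N positions; f: pairwise scalar function;
        # g: unary transform; C: normalization factor computed from x
        N = x.size(0)
        y = torch.stack([
            sum(f(x[i], x[j]) * g(x[j]) for j in range(N)) / C(x)
            for i in range(N)
        ])
        return y

    # Example: f as a plain dot product, g as identity, C(x) = N
    x = torch.randn(5, 8)
    y = non_local(x, f=lambda a, b: a @ b, g=lambda v: v, C=lambda t: t.size(0))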
2. Differences between non-local and other operations
1. Non-local considers all positions j. Convolution only considers a local neighborhood of the current position (e.g., a 1-D convolution with kernel size 3 only considers i-1 <= j <= i+1); recurrent operations usually only consider the current and the previous time step (j = i or j = i-1).
2. Non-local computes the response from the relationships between different positions, whereas a fully connected (fc) layer uses learned weights. In other words, in an fc layer the response is not a function of the relationship between ${{\mathbf{x}}_{i}}$ and ${{\mathbf{x}}_{j}}$, while in non-local it is.
3. Non-local supports inputs of different sizes and keeps the output the same size as the input; an fc layer requires fixed input and output sizes and loses the positional correspondence.
4. Non-local can be used in the early parts of a network, whereas fc layers are usually used at the end.
3. Forms of f and g
3.1 Form of g
For simplicity, only a linear form of g is considered: $g({{\mathbf{x}}_{j}})\text{=}{{W}_{g}}{{\mathbf{x}}_{j}}$, where ${{W}_{g}}$ is a weight matrix to be learned. It can be implemented as a 1*1 conv in the spatial domain, or as a 1*1*1 conv in the spatiotemporal domain (e.g., sequences of images).
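A minimal sketch of g as a 1*1 conv on a 2-D feature map (the shapes here are assumptions chosen for illustration):

    import torch
    from torch import nn

    B, C, H, W = 2, 64, 16, 16
    x = torch.randn(B, C, H, W)
    g = nn.Conv2d(C, C // 2, kernel_size=1)  # W_g as a 1*1 conv (here it also halves the channels, see 5.1)
    g_x = g(x)  # (B, C/2, H, W): the representation g(x_j) at every position j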
3.2 f as a Gaussian
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}{{e}^{\mathbf{x}_{i}^{T}{{\mathbf{x}}_{j}}}}$
where $\mathbf{x}_{i}^{T}{{\mathbf{x}}_{j}}$ is a dot product, because dot products are easy to implement in deep learning frameworks (Euclidean distance would also work). The normalization factor is $C(\mathbf{x})=\sum\nolimits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})}$.
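Note that normalizing ${{e}^{\mathbf{x}_{i}^{T}{{\mathbf{x}}_{j}}}}$ by this C(x) is exactly a softmax over j, so a sketch (assuming x flattened to shape (B, N, C)) is simply:

    import torch
    import torch.nn.functional as F

    B, N, C = 2, 100, 32
    x = torch.randn(B, N, C)
    f = torch.matmul(x, x.transpose(1, 2))  # (B, N, N), entries x_i^T x_j
    f_div_C = F.softmax(f, dim=-1)          # exp(f) normalized by C(x) = sum_j exp(f)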
3.3 f as an embedded Gaussian
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}{{e}^{\theta {{({{\mathbf{x}}_{i}})}^{T}}\phi ({{\mathbf{x}}_{j}})}}$
where $\theta ({{\mathbf{x}}_{i}})\text{=}{{W}_{\theta }}{{\mathbf{x}}_{i}}$ and $\phi ({{\mathbf{x}}_{j}})\text{=}{{W}_{\phi }}{{\mathbf{x}}_{j}}$; the normalization factor is again $C(\mathbf{x})=\sum\nolimits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})}$.
Relation between self-attention and non-local: self-attention can be seen as a special case of the embedded Gaussian version. For a given i, $\frac{1}{C(\mathbf{x})}f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})$ becomes a softmax computed along the j dimension. Then $\mathbf{y}=softmax({{\mathbf{x}}^{T}}W_{\theta }^{T}{{W}_{\phi }}\mathbf{x})g(\mathbf{x})$, which is exactly the self-attention form.
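A sketch of the embedded Gaussian version written in this self-attention form; it uses nn.Linear on flattened positions instead of 1*1 convs, which computes the same per-position linear maps:

    import torch
    from torch import nn
    import torch.nn.functional as F

    B, N, C, Ci = 2, 100, 32, 16          # Ci: embedding ("inter") channels
    x = torch.randn(B, N, C)
    theta = nn.Linear(C, Ci, bias=False)  # W_theta
    phi = nn.Linear(C, Ci, bias=False)    # W_phi
    g = nn.Linear(C, Ci, bias=False)      # W_g

    attn = F.softmax(theta(x) @ phi(x).transpose(1, 2), dim=-1)  # softmax(x^T W_theta^T W_phi x)
    y = attn @ g(x)                       # (B, N, Ci), the self-attention output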
3.4 Dot product
f can also be defined as a dot-product similarity (here using the embedded form):
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}\theta {{({{\mathbf{x}}_{i}})}^{T}}\phi ({{\mathbf{x}}_{j}})$
In this case the normalization factor is $C(\mathbf{x})=N$, where N is the number of positions in x; using N rather than the sum of f simplifies the gradient computation.
The only difference between the dot-product and embedded Gaussian versions is whether softmax is applied as the activation function.
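In code the two versions differ in a single line (a sketch; f stands for the B*(K)*(K) similarity matrix):

    import torch
    import torch.nn.functional as F

    f = torch.randn(2, 100, 100)                # pairwise similarities theta(x_i)^T phi(x_j)
    f_embedded_gaussian = F.softmax(f, dim=-1)  # embedded Gaussian: softmax over j
    f_dot_product = f / f.size(-1)              # dot product: divide by the number of positions N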
3.5 Concatenation
$f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})\text{=}ReLU(w_{f}^{T}[\theta ({{\mathbf{x}}_{i}}),\phi ({{\mathbf{x}}_{j}})])$
where $[\cdot ,\cdot ]$ denotes concatenation and ${{w}_{f}}$ is a weight vector that projects the concatenated vector to a scalar. Here $C(\mathbf{x})=N$.
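A sketch of this pairwise concatenation using broadcasting (shapes are assumptions; the full module is given in 6.4):

    import torch
    from torch import nn

    B, Ci, N = 2, 16, 100
    theta_x = torch.randn(B, Ci, N, 1).expand(-1, -1, -1, N)  # theta(x_i), repeated along j
    phi_x = torch.randn(B, Ci, 1, N).expand(-1, -1, N, -1)    # phi(x_j), repeated along i
    pair = torch.cat([theta_x, phi_x], dim=1)                 # (B, 2*Ci, N, N): [theta(x_i), phi(x_j)]
    w_f = nn.Sequential(nn.Conv2d(2 * Ci, 1, 1, bias=False), nn.ReLU())  # w_f^T followed by ReLU
    f = w_f(pair).view(B, N, N)                               # one scalar per (i, j) pair
    f_div_C = f / N                                           # C(x) = N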
4. The non-local block
The non-local operation above can be wrapped into a non-local block that can be embedded into existing network architectures:
${{\mathbf{z}}_{i}}={{W}_{z}}{{\mathbf{y}}_{i}}+{{\mathbf{x}}_{i}}$
where ${{\mathbf{y}}_{i}}=\frac{1}{C(\mathbf{x})}\sum\limits_{\forall j}{f({{\mathbf{x}}_{i}},{{\mathbf{x}}_{j}})g({{\mathbf{x}}_{j}})}$ as before, and $+{{\mathbf{x}}_{i}}$ denotes a residual connection. The residual connection makes it easy to insert a non-local block into a pretrained model without disturbing its initial behavior (e.g., by initializing ${{W}_{z}}$ to zero).
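A minimal 2-D sketch of the whole block (embedded Gaussian, no subsampling; the generic N-D implementation is in Section 6). W_z is zero-initialized so the block starts as an identity mapping:

    import torch
    from torch import nn

    class NonLocal2d(nn.Module):
        def __init__(self, c, ci):
            super().__init__()
            self.theta = nn.Conv2d(c, ci, 1)
            self.phi = nn.Conv2d(c, ci, 1)
            self.g = nn.Conv2d(c, ci, 1)
            self.W_z = nn.Conv2d(ci, c, 1)
            nn.init.zeros_(self.W_z.weight)  # zero init: z_i = x_i at the start of training
            nn.init.zeros_(self.W_z.bias)

        def forward(self, x):
            b, c, h, w = x.shape
            t = self.theta(x).flatten(2).transpose(1, 2)   # (B, N, Ci)
            p = self.phi(x).flatten(2)                     # (B, Ci, N)
            gx = self.g(x).flatten(2).transpose(1, 2)      # (B, N, Ci)
            y = torch.softmax(t @ p, dim=-1) @ gx          # (B, N, Ci)
            y = y.transpose(1, 2).reshape(b, -1, h, w)     # back to (B, Ci, H, W)
            return self.W_z(y) + x                         # residual connection

    x = torch.randn(2, 64, 14, 14)
    print(NonLocal2d(64, 32)(x).shape)  # torch.Size([2, 64, 14, 14])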
The non-local block is illustrated in Fig. 2 of the paper. The pairwise computations of 3.2, 3.3 and 3.4 correspond to the matrix multiplications in that figure. On feature maps late in the network, the pairwise computation is relatively cheap.
Notes:
1. For images, 1*1 convs are used and there is no T dimension in the figure; for video, 1*1*1 convs are used and the T dimension is present.
2. The softmax in the figure is computed over each row of the matrix.
5. Reducing computation
5.1 Reducing the number of channels
Setting the output channels of ${{W}_{g}}$, ${{W}_{\theta }}$ and ${{W}_{\phi }}$ to half the number of channels of x reduces the computation.
5.2 Subsampling x
Subsampling x reduces the computation further.
In this case the formula of Section 1 becomes ${{\mathbf{y}}_{i}}=\frac{1}{C(\mathbf{\hat{x}})}\sum\limits_{\forall j}{f({{\mathbf{x}}_{i}},{{{\mathbf{\hat{x}}}}_{j}})g({{{\mathbf{\hat{x}}}}_{j}})}$, where $\mathbf{\hat{x}}$ is a subsampled version of x (e.g., by pooling). This reduces the pairwise computation to 1/4 of the original; it does not change the non-local behavior, but it makes the computation sparser. It can be implemented by adding a max pooling layer after $\phi $ and $g$ in the block figure.
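A sketch of the subsampled pairwise computation (2-D case, 2*2 max pooling after phi and g; shapes are assumptions): the j dimension shrinks from N to N/4 while the output keeps the full resolution:

    import torch
    from torch import nn

    B, Ci, H, W = 2, 16, 16, 16
    pool = nn.MaxPool2d(kernel_size=2)
    theta_x = torch.randn(B, Ci, H, W).flatten(2).transpose(1, 2)    # (B, N, Ci), N = H*W
    phi_x = pool(torch.randn(B, Ci, H, W)).flatten(2)                # (B, Ci, N/4)
    g_x = pool(torch.randn(B, Ci, H, W)).flatten(2).transpose(1, 2)  # (B, N/4, Ci)
    f = torch.softmax(theta_x @ phi_x, dim=-1)                       # (B, N, N/4): 1/4 of the pairwise work
    y = f @ g_x                                                      # (B, N, Ci): full-resolution output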
6. Code
6.1 embedded_gaussian
    import torch
    from torch import nn
    from torch.nn import functional as F


    class _NonLocalBlockND(nn.Module):
        def __init__(self, in_channels, inter_channels=None, dimension=3, sub_sample=True, bn_layer=True):
            """
            :param in_channels: number of input channels
            :param inter_channels: number of embedding channels (defaults to in_channels // 2)
            :param dimension: 1, 2 or 3, for sequences, images or video
            :param sub_sample: if True, max-pool after phi and g (Section 5.2)
            :param bn_layer: if True, W includes a BatchNorm layer
            """
            super(_NonLocalBlockND, self).__init__()
            assert dimension in [1, 2, 3]
            self.dimension = dimension
            self.sub_sample = sub_sample
            self.in_channels = in_channels
            self.inter_channels = inter_channels
            if self.inter_channels is None:
                self.inter_channels = in_channels // 2
                if self.inter_channels == 0:
                    self.inter_channels = 1
            if dimension == 3:
                conv_nd = nn.Conv3d
                max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2))
                bn = nn.BatchNorm3d
            elif dimension == 2:
                conv_nd = nn.Conv2d
                max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2))
                bn = nn.BatchNorm2d
            else:
                conv_nd = nn.Conv1d
                max_pool_layer = nn.MaxPool1d(kernel_size=2)
                bn = nn.BatchNorm1d
            self.g = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                             kernel_size=1, stride=1, padding=0)  # g: 1*1 conv, reduces the channel dimension
            if bn_layer:
                self.W = nn.Sequential(  # 1*1 conv mapping back to the original channel dimension (W_z in Fig. 2)
                    conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                            kernel_size=1, stride=1, padding=0),
                    bn(self.in_channels)
                )
                nn.init.constant_(self.W[1].weight, 0)  # zero init, so the block initially acts as an identity
                nn.init.constant_(self.W[1].bias, 0)
            else:
                self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                                 kernel_size=1, stride=1, padding=0)  # 1*1 conv mapping back to the original channel dimension
                nn.init.constant_(self.W.weight, 0)
                nn.init.constant_(self.W.bias, 0)
            self.theta = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                                 kernel_size=1, stride=1, padding=0)  # theta: 1*1 conv, reduces the channel dimension
            self.phi = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                               kernel_size=1, stride=1, padding=0)  # phi: 1*1 conv, reduces the channel dimension
            if sub_sample:
                self.g = nn.Sequential(self.g, max_pool_layer)
                self.phi = nn.Sequential(self.phi, max_pool_layer)

        def forward(self, x, return_nl_map=False):
            """
            :param x: (b, c, t, h, w)
            :param return_nl_map: if True return z, nl_map, else only return z.
            :return:
            """
            # Let x have shape B*C*(K): 1D -> (K) = K1; 2D -> (K) = K1*K2; 3D -> (K) = K1*K2*K3
            batch_size = x.size(0)
            g_x = self.g(x).view(batch_size, self.inter_channels, -1)  # apply g and reshape: B*inter_channels*(K)
            g_x = g_x.permute(0, 2, 1)  # B*(K)*inter_channels, as in Fig. 2
            theta_x = self.theta(x).view(batch_size, self.inter_channels, -1)  # apply theta and reshape: B*inter_channels*(K)
            theta_x = theta_x.permute(0, 2, 1)  # B*(K)*inter_channels, as in Fig. 2
            phi_x = self.phi(x).view(batch_size, self.inter_channels, -1)  # apply phi and reshape: B*inter_channels*(K)
            f = torch.matmul(theta_x, phi_x)  # B*(K)*(K), as in Fig. 2
            f_div_C = F.softmax(f, dim=-1)  # softmax over the last dimension: normalized weights, B*(K)*(K)
            y = torch.matmul(f_div_C, g_x)  # B*(K)*inter_channels, as in Fig. 2
            y = y.permute(0, 2, 1).contiguous()  # B*inter_channels*(K), as in Fig. 2
            y = y.view(batch_size, self.inter_channels, *x.size()[2:])  # B*inter_channels*(K1 or K1*K2 or K1*K2*K3)
            W_y = self.W(y)  # B*C*(K), as in Fig. 2
            z = W_y + x  # residual connection: add the non-local map to the input feature map, B*C*(K)
            if return_nl_map:
                return z, f_div_C  # also return the normalized attention map
            return z


    class NONLocalBlock1D(_NonLocalBlockND):
        def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
            super(NONLocalBlock1D, self).__init__(in_channels,
                                                  inter_channels=inter_channels,
                                                  dimension=1, sub_sample=sub_sample,
                                                  bn_layer=bn_layer)


    class NONLocalBlock2D(_NonLocalBlockND):
        def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
            super(NONLocalBlock2D, self).__init__(in_channels,
                                                  inter_channels=inter_channels,
                                                  dimension=2, sub_sample=sub_sample,
                                                  bn_layer=bn_layer)


    class NONLocalBlock3D(_NonLocalBlockND):
        def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
            super(NONLocalBlock3D, self).__init__(in_channels,
                                                  inter_channels=inter_channels,
                                                  dimension=3, sub_sample=sub_sample,
                                                  bn_layer=bn_layer)


    if __name__ == '__main__':
        for (sub_sample_, bn_layer_) in [(True, True), (False, False), (True, False), (False, True)]:
            img = torch.zeros(2, 3, 20)
            net = NONLocalBlock1D(3, sub_sample=sub_sample_, bn_layer=bn_layer_)
            out = net(img)
            print(out.size())

            img = torch.zeros(2, 3, 20, 20)
            net = NONLocalBlock2D(3, sub_sample=sub_sample_, bn_layer=bn_layer_)
            out = net(img)
            print(out.size())

            img = torch.randn(2, 3, 8, 20, 20)
            net = NONLocalBlock3D(3, sub_sample=sub_sample_, bn_layer=bn_layer_)
            out = net(img)
            print(out.size())
6.2 Difference between embedded Gaussian and dot product
Dot-product code: the module is identical to the embedded Gaussian one in 6.1 except for the normalization in forward; instead of a softmax, f is divided by the number of positions N (C(x) = N, Section 3.4):

            f = torch.matmul(theta_x, phi_x)  # B*(K)*(K), as in Fig. 2
            N = f.size(-1)  # size of the last dimension, i.e. the number of positions
            f_div_C = f / N  # normalize by N instead of applying softmax

Everything else (__init__, the rest of forward, the 1D/2D/3D wrapper classes and the test code) is unchanged, so it is not repeated here.
6.3 Difference between embedded Gaussian and Gaussian
The original post showed side-by-side diffs of the two versions (left: embedded Gaussian; right: Gaussian) for the initialization and forward code. The difference follows from 3.2: since f is computed on x directly, the Gaussian version has no $\theta $ and $\phi $ 1*1 convolutions. In __init__ they are dropped (with sub_sample enabled, phi reduces to the max pooling layer alone), and in forward, theta_x and phi_x are reshaped views of x itself, with in_channels channels instead of inter_channels.
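A self-contained sketch of the changed forward computation under that reading (an illustration based on Section 3.2, not a verbatim excerpt from the repository):

    import torch
    import torch.nn.functional as F

    batch_size, in_channels = 2, 32
    x = torch.randn(batch_size, in_channels, 20, 20)

    # Gaussian: no theta/phi 1*1 convs; x itself is used on both sides of the dot product
    theta_x = x.view(batch_size, in_channels, -1).permute(0, 2, 1)  # B*(K)*C
    phi_x = x.view(batch_size, in_channels, -1)                     # B*C*(K)
    f = torch.matmul(theta_x, phi_x)                                # B*(K)*(K), the x_i^T x_j terms
    f_div_C = F.softmax(f, dim=-1)                                  # e^{x_i^T x_j} / C(x)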
6.4 Difference between embedded Gaussian and Concatenation
Concatenation code: compared with 6.1, __init__ additionally defines a small network that projects each concatenated pair feature to a scalar. The pairwise map is always 2-D over (i, j), so nn.Conv2d is used regardless of dimension:

            self.concat_project = nn.Sequential(  # projects the concatenated pair features down to one channel
                nn.Conv2d(self.inter_channels * 2, 1, 1, 1, 0, bias=False),
                nn.ReLU()
            )

In forward, the pairwise term f = ReLU(w_f^T [theta(x_i), phi(x_j)]) is computed by repeating theta(x_i) along j and phi(x_j) along i, concatenating along the channel dimension, projecting, and normalizing by N (Section 3.5):

            # Let x have shape B*C*(K): 1D -> (K) = K1; 2D -> (K) = K1*K2; 3D -> (K) = K1*K2*K3
            batch_size = x.size(0)
            g_x = self.g(x).view(batch_size, self.inter_channels, -1)  # apply g and reshape: B*inter_channels*(K)
            g_x = g_x.permute(0, 2, 1)  # B*(K)*inter_channels, as in Fig. 2

            theta_x = self.theta(x).view(batch_size, self.inter_channels, -1, 1)  # B*inter_channels*(K)*1
            phi_x = self.phi(x).view(batch_size, self.inter_channels, 1, -1)  # B*inter_channels*1*(K)

            h = theta_x.size(2)  # (K)
            w = phi_x.size(3)  # (K); smaller than h when sub_sample is enabled
            theta_x = theta_x.repeat(1, 1, 1, w)  # B*inter_channels*(K)*(K)
            phi_x = phi_x.repeat(1, 1, h, 1)  # B*inter_channels*(K)*(K)

            concat_feature = torch.cat([theta_x, phi_x], dim=1)  # B*(2*inter_channels)*(K)*(K)
            f = self.concat_project(concat_feature)  # B*1*(K)*(K)
            b, _, h, w = f.size()
            f = f.view(b, h, w)  # B*(K)*(K)

            N = f.size(-1)  # (K)
            f_div_C = f / N  # normalize by N, B*(K)*(K)

The remaining lines of forward (computing y, W_y and z), the 1D/2D/3D wrapper classes and the test code are the same as in 6.1.
Source: https://www.cnblogs.com/darkknightzh/p/12592351.html