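Layers

Custom fastai layers and basic functions to grab them.

Basic manipulations and resize

module

 module (*flds, **defaults)

Decorator to create an nn.Module using f as its forward method

Identity

 Identity ()

Do nothing at all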

test_eq(Identity()(1), 1)

Lambda

 Lambda (func)

An easy way to create a PyTorch layer for a simple func

def _add2(x): return x+2
tst = Lambda(_add2)
x = torch.randn(10,20)
test_eq(tst(x), x+2)
tst2 = pickle.loads(pickle.dumps(tst))
test_eq(tst2(x), x+2)
tst

Lambda(func=<function _add2>)

PartialLambda

 PartialLambda (func)

Layer that applies partial(func, **kwargs)

def test_func(a,b=2): return a+b
tst = PartialLambda(test_func, b=5)
test_eq(tst(x), x+5)

Flatten

 Flatten (full=False)

Flatten x to a single dimension, e.g. at the end of a model. Use full=True to flatten to a rank-1 tensor

tst = Flatten()
x = torch.randn(10,5,4)
test_eq(tst(x).shape, [10,20])
tst = Flatten(full=True)
test_eq(tst(x).shape, [200])

ToTensorBase

 ToTensorBase (tensor_cls=<class 'fastai.torch_core.TensorBase'>)

Convert x to the TensorBase class

ttb = ToTensorBase()
timg = TensorImage(torch.rand(1,3,32,32))
test_eq(type(ttb(timg)), TensorBase)

View

 View (*size)

Reshape x to size

tst = View(10,5,4)
test_eq(tst(x).shape, [10,5,4])

ResizeBatch

 ResizeBatch (*size)

Reshape x to size, keeping the batch dimension the same size

tst = ResizeBatch(5,4)
test_eq(tst(x).shape, [10,5,4])

Debugger

 Debugger ()

A module to debug inside a model.

sigmoid_range

 sigmoid_range (x, low, high)

Sigmoid function with range (low, high)

test = tensor([-10.,0.,10.])
assert torch.allclose(sigmoid_range(test, -1,  2), tensor([-1.,0.5, 2.]), atol=1e-4, rtol=1e-4)
assert torch.allclose(sigmoid_range(test, -5, -1), tensor([-5.,-3.,-1.]), atol=1e-4, rtol=1e-4)
assert torch.allclose(sigmoid_range(test,  2,  4), tensor([2.,  3., 4.]), atol=1e-4, rtol=1e-4)
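The scaling checked above can be sketched for a single scalar in plain Python. This is a minimal illustration, assuming the function simply rescales a standard sigmoid; the name sigmoid_range_scalar is hypothetical and is not part of fastai.

```python
import math

def sigmoid_range_scalar(x, low, high):
    "Rescale a standard sigmoid so that its output lies in (low, high)."
    return 1 / (1 + math.exp(-x)) * (high - low) + low

# At x = 0 the sigmoid is 0.5, so the result is the midpoint of the range
print(sigmoid_range_scalar(0.0, -1, 2))   # 0.5
```

Large negative inputs approach low and large positive inputs approach high, which is what the tolerances in the assertions above allow for.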

SigmoidRange

 SigmoidRange (low, high)

Sigmoid module with range (low, high)

tst = SigmoidRange(-1, 2)
assert torch.allclose(tst(test), tensor([-1.,0.5, 2.]), atol=1e-4, rtol=1e-4)

Pooling layers

AdaptiveConcatPool1d

 AdaptiveConcatPool1d (size=None)

Layer that concats AdaptiveAvgPool1d and AdaptiveMaxPool1d

AdaptiveConcatPool2d

 AdaptiveConcatPool2d (size=None)

Layer that concats AdaptiveAvgPool2d and AdaptiveMaxPool2d

If the input is bs x nf x h x h, the output will be bs x 2*nf x 1 x 1 if no size is passed, or bs x 2*nf x size x size otherwise.

tst = AdaptiveConcatPool2d()
x = torch.randn(10,5,4,4)
test_eq(tst(x).shape, [10,10,1,1])
max1 = torch.max(x,    dim=2, keepdim=True)[0]
maxp = torch.max(max1, dim=3, keepdim=True)[0]
test_eq(tst(x)[:,:5], maxp)
test_eq(tst(x)[:,5:], x.mean(dim=[2,3], keepdim=True))
tst = AdaptiveConcatPool2d(2)
test_eq(tst(x).shape, [10,10,2,2])
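For the default output size of 1 x 1, the concatenation checked above can be sketched on nested lists for one sample. This is only an illustration: concat_pool_1x1 is a hypothetical helper, and the max-then-mean ordering is taken from the assertions above.

```python
def concat_pool_1x1(feature_maps):
    "feature_maps: one 2D list per channel. Returns per-channel maxima, then per-channel means."
    maxes = [max(max(row) for row in fm) for fm in feature_maps]
    means = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]
    return maxes + means   # twice as many channels as the input

fmaps = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.0, -1.0], [5.0, 0.0]]]
pooled = concat_pool_1x1(fmaps)
print(pooled)   # [4.0, 5.0, 2.5, 1.0]
```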

PoolType

 PoolType ()

Initialize self. See help(type(self)) for accurate signature.

adaptive_pool

 adaptive_pool (pool_type)

PoolFlatten

 PoolFlatten (pool_type='Avg')

Combine nn.AdaptiveAvgPool2d and Flatten.

tst = PoolFlatten()
test_eq(tst(x).shape, [10,5])
test_eq(tst(x), x.mean(dim=[2,3]))

BatchNorm layers

BatchNorm

 BatchNorm (nf, ndim=2, norm_type=<NormType.Batch: 1>, eps:float=1e-05,
            momentum:Optional[float]=0.1, affine:bool=True,
            track_running_stats:bool=True, device=None, dtype=None)

BatchNorm layer with nf features and ndim, initialized depending on norm_type.

InstanceNorm

 InstanceNorm (nf, ndim=2, norm_type=<NormType.Instance: 5>, affine=True,
               eps:float=1e-05, momentum:float=0.1,
               track_running_stats:bool=False, device=None, dtype=None)

InstanceNorm layer with nf features and ndim, initialized depending on norm_type.

kwargs are passed to nn.BatchNorm and can be eps, momentum, affine and track_running_stats.

tst = BatchNorm(15)
assert isinstance(tst, nn.BatchNorm2d)
test_eq(tst.weight, torch.ones(15))
tst = BatchNorm(15, norm_type=NormType.BatchZero)
test_eq(tst.weight, torch.zeros(15))
tst = BatchNorm(15, ndim=1)
assert isinstance(tst, nn.BatchNorm1d)
tst = BatchNorm(15, ndim=3)
assert isinstance(tst, nn.BatchNorm3d)

tst = InstanceNorm(15)
assert isinstance(tst, nn.InstanceNorm2d)
test_eq(tst.weight, torch.ones(15))
tst = InstanceNorm(15, norm_type=NormType.InstanceZero)
test_eq(tst.weight, torch.zeros(15))
tst = InstanceNorm(15, ndim=1)
assert isinstance(tst, nn.InstanceNorm1d)
tst = InstanceNorm(15, ndim=3)
assert isinstance(tst, nn.InstanceNorm3d)

If affine is false, the weight should be None

test_eq(BatchNorm(15, affine=False).weight, None)
test_eq(InstanceNorm(15, affine=False).weight, None)

BatchNorm1dFlat

 BatchNorm1dFlat (num_features:int, eps:float=1e-05,
                  momentum:Optional[float]=0.1, affine:bool=True,
                  track_running_stats:bool=True, device=None, dtype=None)

nn.BatchNorm1d, but first flattens the leading dimensions

tst = BatchNorm1dFlat(15)
x = torch.randn(32, 64, 15)
y = tst(x)
mean = x.mean(dim=[0,1])
test_close(tst.running_mean, 0*0.9 + mean*0.1)
var = (x-mean).pow(2).mean(dim=[0,1])
test_close(tst.running_var, 1*0.9 + var*0.1, eps=1e-4)
test_close(y, (x-mean)/torch.sqrt(var+1e-5) * tst.weight + tst.bias, eps=1e-4)
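The running statistics compared above follow an exponential moving average with momentum 0.1. A minimal sketch of that update in plain Python follows; update_running_stat is a hypothetical name used only for illustration.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    "Blend the previous running value with the statistic of the current batch."
    return (1 - momentum) * running + momentum * batch_stat

# Running mean starts at 0 and running variance at 1, as in the checks above
new_mean = update_running_stat(0.0, 2.0)
new_var = update_running_stat(1.0, 3.0)
```

After one batch the running mean is a tenth of the batch mean, and the running variance moves a tenth of the way from 1 toward the batch variance.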

LinBnDrop

 LinBnDrop (n_in, n_out, bn=True, p=0.0, act=None, lin_first=False)

Module grouping BatchNorm1d, Dropout and Linear layers

The BatchNorm layer is skipped if bn=False, as is the dropout if p=0. Optionally, you can add an activation after the linear layer with act.

tst = LinBnDrop(10, 20)
mods = list(tst.children())
test_eq(len(mods), 2)
assert isinstance(mods[0], nn.BatchNorm1d)
assert isinstance(mods[1], nn.Linear)

tst = LinBnDrop(10, 20, p=0.1)
mods = list(tst.children())
test_eq(len(mods), 3)
assert isinstance(mods[0], nn.BatchNorm1d)
assert isinstance(mods[1], nn.Dropout)
assert isinstance(mods[2], nn.Linear)

tst = LinBnDrop(10, 20, act=nn.ReLU(), lin_first=True)
mods = list(tst.children())
test_eq(len(mods), 3)
assert isinstance(mods[0], nn.Linear)
assert isinstance(mods[1], nn.ReLU)
assert isinstance(mods[2], nn.BatchNorm1d)

tst = LinBnDrop(10, 20, bn=False)
mods = list(tst.children())
test_eq(len(mods), 1)
assert isinstance(mods[0], nn.Linear)
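The layer orderings asserted above can be summarised with a small plain-Python sketch that returns layer names instead of real modules. The helper lin_bn_drop_order is hypothetical and only mirrors the behaviour the checks above demonstrate.

```python
def lin_bn_drop_order(bn=True, p=0.0, act=None, lin_first=False):
    "Return the names of the layers in the order they would be applied."
    norm_drop = (["BatchNorm1d"] if bn else []) + (["Dropout"] if p != 0 else [])
    lin = ["Linear"] + ([act] if act is not None else [])
    # lin_first puts the linear layer (and its activation) before norm/dropout
    return lin + norm_drop if lin_first else norm_drop + lin

print(lin_bn_drop_order(act="ReLU", lin_first=True))   # ['Linear', 'ReLU', 'BatchNorm1d']
```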

Inits

sigmoid

 sigmoid (input, eps=1e-07)

Same as torch.sigmoid, plus clamping to `(eps,1-eps)`

sigmoid_

 sigmoid_ (input, eps=1e-07)

Same as torch.sigmoid_, plus clamping to `(eps,1-eps)`

vleaky_relu

 vleaky_relu (input, inplace=True)

F.leaky_relu with a slope of 0.3

init_default

 init_default (m, func=<function kaiming_normal_>)

Initialize the weights of m with func and set bias to 0.

init_linear

 init_linear (m, act_func=None, init='auto', bias_std=0.01)

Convolutions

ConvLayer

 ConvLayer (ni, nf, ks=3, stride=1, padding=None, bias=None, ndim=2,
            norm_type=<NormType.Batch: 1>, bn_1st=True, act_cls=<class
            'torch.nn.modules.activation.ReLU'>, transpose=False,
            init='auto', xtra=None, bias_std=0.01,
            dilation:Union[int,Tuple[int,int]]=1, groups:int=1,
            padding_mode:str='zeros', device=None, dtype=None)

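Create a sequence of convolutional (ni to nf), activation (ReLU by default, skipped if act_cls=None) and norm_type layers.

The convolution uses ks (kernel size), stride, padding and bias. padding defaults to the appropriate value ((ks-1)//2 if it's not a transposed conv). bias defaults to True if norm_type is None, Spectral or Weight, and to False if it is Batch or BatchZero. Note that if you don't want any normalization, you should pass norm_type=None.

This defines a conv layer with ndim (1, 2 or 3) that will be a ConvTranspose if transpose=True. act_cls is the class of the activation function to use (instantiated inside). Pass act_cls=None if you don't want an activation function. If you quickly want to change your default activation, you can change the value of defaults.activation.

init is used to initialize the weights (the bias is initialized to 0) and xtra is an optional layer to add at the end.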

tst = ConvLayer(16, 32)
mods = list(tst.children())
test_eq(len(mods), 3)
test_eq(mods[1].weight, torch.ones(32))
test_eq(mods[0].padding, (1,1))

x = torch.randn(64, 16, 8, 8)#.cuda()

#Padding is selected to make the shape the same if stride=1
test_eq(tst(x).shape, [64,32,8,8])

#Padding is selected to make the shape half if stride=2
tst = ConvLayer(16, 32, stride=2)
test_eq(tst(x).shape, [64,32,4,4])

#But you can always pass your own padding if you want
tst = ConvLayer(16, 32, padding=0)
test_eq(tst(x).shape, [64,32,6,6])

#No bias by default for Batch NormType
assert mods[0].bias is None
#But can be overridden with `bias=True`
tst = ConvLayer(16, 32, bias=True)
assert first(tst.children()).bias is not None
#For no norm, or spectral/weight, bias is True by default
for t in [None, NormType.Spectral, NormType.Weight]:
    tst = ConvLayer(16, 32, norm_type=t)
    assert first(tst.children()).bias is not None

#Various n_dim/transpose
tst = ConvLayer(16, 32, ndim=3)
assert isinstance(list(tst.children())[0], nn.Conv3d)
tst = ConvLayer(16, 32, ndim=1, transpose=True)
assert isinstance(list(tst.children())[0], nn.ConvTranspose1d)

#No activation/leaky
tst = ConvLayer(16, 32, ndim=3, act_cls=None)
mods = list(tst.children())
test_eq(len(mods), 2)
tst = ConvLayer(16, 32, ndim=3, act_cls=partial(nn.LeakyReLU, negative_slope=0.1))
mods = list(tst.children())
test_eq(len(mods), 3)
assert isinstance(mods[2], nn.LeakyReLU)
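The spatial sizes seen in the shape checks above follow from the usual convolution output-size formula. A minimal sketch, assuming a non-transposed convolution with dilation 1 and the default padding of (ks-1)//2; conv_out_size is a hypothetical helper, not a fastai function.

```python
def conv_out_size(n, ks=3, stride=1, padding=None):
    "Output length along one spatial axis for an input of length n."
    if padding is None:
        padding = (ks - 1) // 2   # default padding described above
    return (n + 2 * padding - ks) // stride + 1

print(conv_out_size(8))             # 8: same size with stride 1
print(conv_out_size(8, stride=2))   # 4: halved with stride 2
print(conv_out_size(8, padding=0))  # 6: no padding shrinks the map
```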

# #export
# def linear(in_features, out_features, bias=True, act_cls=None, init='auto'):
#     "Linear layer followed by optional activation, with optional auto-init"
#     res = nn.Linear(in_features, out_features, bias=bias)
#     if act_cls: act_cls = act_cls()
#     init_linear(res, act_cls, init=init)
#     if act_cls: res = nn.Sequential(res, act_cls)
#     return res

# #export
# @delegates(ConvLayer)
# def conv1d(ni, nf, ks, stride=1, ndim=1, norm_type=None, **kwargs):
#     "Convolutional layer followed by optional activation, with optional auto-init"
#     return ConvLayer(ni, nf, ks, stride=stride, ndim=ndim, norm_type=norm_type, **kwargs)

# #export
# @delegates(ConvLayer)
# def conv2d(ni, nf, ks, stride=1, ndim=2, norm_type=None, **kwargs):
#     "Convolutional layer followed by optional activation, with optional auto-init"
#     return ConvLayer(ni, nf, ks, stride=stride, ndim=ndim, norm_type=norm_type, **kwargs)

# #export
# @delegates(ConvLayer)
# def conv3d(ni, nf, ks, stride=1, ndim=3, norm_type=None, **kwargs):
#     "Convolutional layer followed by optional activation, with optional auto-init"
#     return ConvLayer(ni, nf, ks, stride=stride, ndim=ndim, norm_type=norm_type, **kwargs)

AdaptiveAvgPool

 AdaptiveAvgPool (sz=1, ndim=2)

nn.AdaptiveAvgPool layer for ndim

MaxPool

 MaxPool (ks=2, stride=None, padding=0, ndim=2, ceil_mode=False)

nn.MaxPool layer for ndim

AvgPool

 AvgPool (ks=2, stride=None, padding=0, ndim=2, ceil_mode=False)

nn.AvgPool layer for ndim

Embeddings

trunc_normal_

 trunc_normal_ (x, mean=0.0, std=1.0)

Truncated normal initialization (approximation)

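Embedding

 Embedding (ni, nf, std=0.01)

Embedding layer with truncated normal initialization

Truncated normal initialization bounds the distribution to avoid large values. For a given standard deviation std, the bounds are roughly -2*std and 2*std.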

std = 0.02
tst = Embedding(10, 30, std)
assert tst.weight.min() > -2*std
assert tst.weight.max() < 2*std
test_close(tst.weight.mean(), 0, 1e-2)
test_close(tst.weight.std(), std, 0.1)
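One way to obtain such bounded samples is to fold standard normal draws into (-2, 2) with a floating-point remainder before scaling. The sketch below assumes that approach purely for illustration; trunc_normal_sample is a hypothetical name, and the actual library code may differ.

```python
import math
import random

def trunc_normal_sample(mean=0.0, std=1.0, rng=random):
    "Draw one value whose distance from mean stays below 2*std (approximate truncation)."
    # fmod keeps the sign of the draw and bounds its magnitude below 2
    return math.fmod(rng.gauss(0.0, 1.0), 2.0) * std + mean

rng = random.Random(0)   # fixed seed so the example is repeatable
std = 0.02
weights = [trunc_normal_sample(std=std, rng=rng) for _ in range(300)]
```

Because the remainder folds the tails back rather than rejecting them, the result only approximates a true truncated normal.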

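Self attention

SelfAttention

 SelfAttention (n_channels)

Self attention layer for n_channels.

Self-attention layer as introduced in Self-Attention Generative Adversarial Networks.

Initially, no change is made to the input. This is controlled by a trainable parameter named gamma, as we return x + gamma * out.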

tst = SelfAttention(16)
x = torch.randn(32, 16, 8, 8)
test_eq(tst(x),x)

Then during training gamma will probably change, since it's a trainable parameter. Let's see what happens when it gets a nonzero value.

tst.gamma.data.fill_(1.)
y = tst(x)
test_eq(y.shape, [32,16,8,8])

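The attention mechanism requires three matrix multiplications (here represented by 1x1 convolutions). The multiplications are done on the channel level (the second dimension in our tensor) and we flatten the feature map (which is 8x8 here). As in the paper, we denote by f, g and h the results of those multiplications.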

q,k,v = tst.query[0].weight.data,tst.key[0].weight.data,tst.value[0].weight.data
test_eq([q.shape, k.shape, v.shape], [[2, 16, 1], [2, 16, 1], [16, 16, 1]])
f,g,h = map(lambda m: x.view(32, 16, 64).transpose(1,2) @ m.squeeze().t(), [q,k,v])
test_eq([f.shape, g.shape, h.shape], [[32,64,2], [32,64,2], [32,64,16]])

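The key part of the attention layer is to compute attention weights for each position in the feature map (here 8x8 = 64). Those are positive numbers that sum to 1 and tell the model which part of the picture to pay attention to. We take the product of f and the transpose of g (to get something of size bs x 64 x 64), then apply a softmax over dim=1, the first dimension after the batch (to get positive numbers that sum to 1). The result can then be multiplied with h transposed to get an output of size bs x channels x 64, which can then be viewed as an output of the same size as the original input.

The final result is then x + gamma * out, as we saw before.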

beta = F.softmax(torch.bmm(f, g.transpose(1,2)), dim=1)
test_eq(beta.shape, [32, 64, 64])
out = torch.bmm(h.transpose(1,2), beta)
test_eq(out.shape, [32, 16, 64])
test_close(y, x + out.view(32, 16, 8, 8), eps=1e-4)
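The same computation can be followed on a single toy sample with nested lists. This is a sketch under stated assumptions: the matmul, transpose and softmax_over_rows helpers are hypothetical, the values of f, g and h are made up, and the softmax runs along the first axis of the per-sample matrix, which corresponds to dim=1 in the batched code above.

```python
import math

def matmul(a, b):
    "Plain matrix product of two nested lists."
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax_over_rows(m):
    "Softmax along the first axis: every column ends up summing to 1."
    cols = []
    for col in zip(*m):
        mx = max(col)                       # subtract the max for numerical stability
        e = [math.exp(v - mx) for v in col]
        s = sum(e)
        cols.append([v / s for v in e])
    return transpose(cols)

# Toy sample: N=3 positions, 1 query/key channel, C=2 value channels
f = [[0.1], [0.2], [0.3]]                    # N x 1
g = [[1.0], [0.0], [-1.0]]                   # N x 1
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # N x C

beta = softmax_over_rows(matmul(f, transpose(g)))   # N x N attention weights
out = matmul(transpose(h), beta)                    # C x N, reshaped to C x H x W in the layer
```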

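PooledSelfAttention2d

 PooledSelfAttention2d (n_channels)

Pooled self attention layer for 2d.

Self-attention layer used in the Big GAN paper.

It uses the same attention as SelfAttention but adds a max pooling of stride 2 before computing the matrices g and h: the attention is computed over the 2x2 max-pooled windows, not the whole feature map. A final matrix product is also applied to the output before returning gamma * out + x.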

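SimpleSelfAttention

 SimpleSelfAttention (n_in:int, ks=1, sym=False)

Same as nn.Module, but no need for subclasses to call super().__init__

PixelShuffle

PixelShuffle was introduced in this article to avoid checkerboard artifacts when upsampling images. If we want an output with ch_out filters, we use a convolution with ch_out * (r**2) filters, where r is the upsampling factor. Then we reorganize those filters as in the picture below:

[Figure: Pixelshuffle]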

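icnr_init

 icnr_init (x, scale=2, init=<function kaiming_normal_>)

ICNR init of x, with scale and init function

ICNR init was introduced in this article. It suggests initializing the convolution that will be used in PixelShuffle so that each of the r**2 channels gets the same weight (so that in the picture above, the 9 colors in a 3x3 window are initially the same).

Note

This is done on the first dimension because PyTorch stores the weights of a convolutional layer in this format: ch_out x ch_in x ks x ks.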

tst = torch.randn(16*4, 32, 1, 1)
tst = icnr_init(tst)
for i in range(0,16*4,4):
    test_eq(tst[i],tst[i+1])
    test_eq(tst[i],tst[i+2])
    test_eq(tst[i],tst[i+3])
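The property checked above (each group of scale**2 consecutive output channels shares one kernel) can be sketched on nested lists for 1x1 kernels. The helper icnr_like_init is hypothetical and uses a plain Gaussian draw in place of the init function; it is not the fastai implementation.

```python
import random

def icnr_like_init(n_out, n_in, scale=2, rng=None):
    "Build n_out kernels of length n_in where every scale**2 consecutive kernels are identical."
    rng = rng or random.Random(0)
    reps = scale ** 2
    # Only n_out // reps distinct kernels are drawn
    base = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out // reps)]
    # Each distinct kernel is repeated reps times along the output-channel axis
    return [list(base[i // reps]) for i in range(n_out)]

w = icnr_like_init(16 * 4, 32)
```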

PixelShuffle_ICNR

 PixelShuffle_ICNR (ni, nf=None, scale=2, blur=False,
                    norm_type=<NormType.Weight: 3>, act_cls=<class
                    'torch.nn.modules.activation.ReLU'>)

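Upsample by scale from ni filters to nf (default ni), using nn.PixelShuffle.

The convolutional layer is initialized with icnr_init and passed act_cls and norm_type (in our experiments, the default of weight normalization seemed to work best for super-resolution problems).

The blur option comes from Super-Resolution using Convolutional Neural Networks without Any Checkerboard Artifacts, where the authors add a little bit of blur to completely get rid of checkerboard artifacts.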

psfl = PixelShuffle_ICNR(16)
x = torch.randn(64, 16, 8, 8)
y = psfl(x)
test_eq(y.shape, [64, 16, 16, 16])
#ICNR init makes every 2x2 window (stride 2) have the same elements
for i in range(0,16,2):
    for j in range(0,16,2):
        test_eq(y[:,:,i,j],y[:,:,i+1,j])
        test_eq(y[:,:,i,j],y[:,:,i  ,j+1])
        test_eq(y[:,:,i,j],y[:,:,i+1,j+1])

psfl = PixelShuffle_ICNR(16, norm_type=None)
x = torch.randn(64, 16, 8, 8)
y = psfl(x)
test_eq(y.shape, [64, 16, 16, 16])
#ICNR init makes every 2x2 window (stride 2) have the same elements
for i in range(0,16,2):
    for j in range(0,16,2):
        test_eq(y[:,:,i,j],y[:,:,i+1,j])
        test_eq(y[:,:,i,j],y[:,:,i  ,j+1])
        test_eq(y[:,:,i,j],y[:,:,i+1,j+1])

psfl = PixelShuffle_ICNR(16, norm_type=NormType.Spectral)
x = torch.randn(64, 16, 8, 8)
y = psfl(x)
test_eq(y.shape, [64, 16, 16, 16])
#ICNR init makes every 2x2 window (stride 2) have the same elements
for i in range(0,16,2):
    for j in range(0,16,2):
        test_eq(y[:,:,i,j],y[:,:,i+1,j])
        test_eq(y[:,:,i,j],y[:,:,i  ,j+1])
        test_eq(y[:,:,i,j],y[:,:,i+1,j+1])
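The identical 2x2 windows checked above come from how a pixel shuffle rearranges channels into space: each group of r**2 input channels fills one r x r block of the output. A plain-Python sketch on nested lists for a single sample follows; this pixel_shuffle helper is written for illustration and follows the channel ordering documented for nn.PixelShuffle.

```python
def pixel_shuffle(x, r):
    "x has shape (C*r*r, H, W) as nested lists; the result has shape (C, H*r, W*r)."
    c_out, h, w = len(x) // (r * r), len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]   # input channel feeding offset (i, j)
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out

# Four distinct 1x1 channels become one 2x2 output channel
print(pixel_shuffle([[[0.0]], [[1.0]], [[2.0]], [[3.0]]], 2))   # [[[0.0, 1.0], [2.0, 3.0]]]
```

If the four input channels hold the same value, as they do right after an ICNR-style init, the whole 2x2 block is constant.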

Sequential extensions

sequential

 sequential (*args)

Create an nn.Sequential, wrapping items with Lambda if needed

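SequentialEx

 SequentialEx (*layers)

Like nn.Sequential, but with ModuleList semantics, and can access the module input

This is useful for writing layers that need to remember the input (like a resnet block) in a sequential way.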

MergeLayer

 MergeLayer (dense:bool=False)

Merge a shortcut with the result of the module by adding them, or concatenating them if dense=True.

res_block = SequentialEx(ConvLayer(16, 16), ConvLayer(16,16))
res_block.append(MergeLayer()) # just to test append - normally it would be in init params
x = torch.randn(32, 16, 8, 8)
y = res_block(x)
test_eq(y.shape, [32, 16, 8, 8])
test_eq(y, x + res_block[1](res_block[0](x)))

x = TensorBase(torch.randn(32, 16, 8, 8))
y = res_block(x)
test_is(y.orig, None)
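The mechanism behind the checks above can be sketched without tensors: the container attaches the block input to the current value as orig, so that a later merge step can read it, then clears it afterwards. The Val class and sequential_ex function below are hypothetical stand-ins used only to illustrate that idea.

```python
class Val:
    "Toy stand-in for a tensor: holds a number and an optional orig reference."
    def __init__(self, v):
        self.v, self.orig = v, None

def sequential_ex(layers, x):
    res = x
    for layer in layers:
        res.orig = x                  # expose the block input to the layer
        nres = layer(res)
        res.orig = nres.orig = None   # clear the reference once the layer has run
        res = nres
    return res

double = lambda t: Val(t.v * 2)
add_one = lambda t: Val(t.v + 1)
merge = lambda t: Val(t.v + t.orig.v)   # additive shortcut, like a non-dense merge

y = sequential_ex([double, add_one, merge], Val(3.0))
print(y.v, y.orig)   # 10.0 None
```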

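Concat

Equivalent to keras.layers.Concatenate: it concatenates the outputs of a ModuleList over a given dimension (by default the filter dimension)

Cat

 Cat (layers, dim=1)

Concatenate layer outputs over a given dim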

layers = [ConvLayer(2,4), ConvLayer(2,4), ConvLayer(2,4)] 
x = torch.rand(1,2,8,8) 
cat = Cat(layers) 
test_eq(cat(x).shape, [1,12,8,8]) 
test_eq(cat(x), torch.cat([l(x) for l in layers], dim=1))

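Ready-to-go models

SimpleCNN

 SimpleCNN (filters, kernel_szs=None, strides=None, bn=True)

Create a simple CNN with filters.

The model is a succession of convolutional layers from (filters[0],filters[1]) to (filters[n-2],filters[n-1]) (if n is the length of the filters list) followed by a PoolFlatten. kernel_szs and strides default to a list of 3s and a list of 2s. If bn=True the convolutional layers are successions of conv-relu-batchnorm, otherwise conv-relu.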

tst = SimpleCNN([8,16,32])
mods = list(tst.children())
test_eq(len(mods), 3)
test_eq([[m[0].in_channels, m[0].out_channels] for m in mods[:2]], [[8,16], [16,32]])

Test kernel sizes

tst = SimpleCNN([8,16,32], kernel_szs=[1,3])
mods = list(tst.children())
test_eq([m[0].kernel_size for m in mods[:2]], [(1,1), (3,3)])

Test strides

tst = SimpleCNN([8,16,32], strides=[1,2])
mods = list(tst.children())
test_eq([m[0].stride for m in mods[:2]], [(1,1),(2,2)])
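How the filters list maps to layers can be sketched by pairing consecutive entries and filling in the defaults described above. The helper simple_cnn_specs is hypothetical and returns plain tuples rather than real layers.

```python
def simple_cnn_specs(filters, kernel_szs=None, strides=None):
    "Return (in_channels, out_channels, kernel_size, stride) for each conv layer."
    n = len(filters) - 1   # one conv layer per consecutive pair of filters
    kernel_szs = kernel_szs if kernel_szs is not None else [3] * n
    strides = strides if strides is not None else [2] * n
    return [(filters[i], filters[i + 1], kernel_szs[i], strides[i]) for i in range(n)]

print(simple_cnn_specs([8, 16, 32]))   # [(8, 16, 3, 2), (16, 32, 3, 2)]
```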

ProdLayer

 ProdLayer ()

Merge a shortcut with the result of the module by multiplying them.

SEModule

 SEModule (ch, reduction, act_cls=<class
           'torch.nn.modules.activation.ReLU'>)

ResBlock

 ResBlock (expansion, ni, nf, stride=1, groups=1, reduction=None,
           nh1=None, nh2=None, dw=False, g2=1, sa=False, sym=False,
           norm_type=<NormType.Batch: 1>, act_cls=<class
           'torch.nn.modules.activation.ReLU'>, ndim=2, ks=3,
           pool=<function AvgPool>, pool_first=True, padding=None,
           bias=None, bn_1st=True, transpose=False, init='auto',
           xtra=None, bias_std=0.01, dilation:Union[int,Tuple[int,int]]=1,
           padding_mode:str='zeros', device=None, dtype=None)

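Resnet block from ni to nh with stride

This is a resnet block (normal or bottleneck depending on expansion: 1 for the normal block and 4 for the traditional bottleneck) that implements the tweaks from Bag of Tricks for Image Classification with Convolutional Neural Networks. In particular, the last batchnorm layer (if that is the selected norm_type) is initialized with a weight (or gamma) of zero to facilitate the flow from the beginning to the end of the network. It also implements optional Squeeze and Excitation and grouped convs for ResNeXT and similar models (use dw=True for depthwise convs).

The kwargs are passed to ConvLayer along with norm_type.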

SEBlock

 SEBlock (expansion, ni, nf, groups=1, reduction=16, stride=1, **kwargs)

SEResNeXtBlock

 SEResNeXtBlock (expansion, ni, nf, groups=32, reduction=16, stride=1,
                 base_width=4, **kwargs)

SeparableBlock

 SeparableBlock (expansion, ni, nf, reduction=16, stride=1, base_width=4,
                 **kwargs)

Time Distributed Layer

Equivalent to the Keras TimeDistributed layer; it enables computing a PyTorch Module over an axis.

bs, seq_len = 2, 5
x, y = torch.rand(bs,seq_len,3,2,2), torch.rand(bs,seq_len,3,2,2)

tconv = TimeDistributed(nn.Conv2d(3,4,1))
test_eq(tconv(x).shape, (2,5,4,2,2))
tconv.low_mem=True
test_eq(tconv(x).shape, (2,5,4,2,2))

class Mod(Module):
    def __init__(self):
        self.conv = nn.Conv2d(3,4,1)
    def forward(self, x, y):
        return self.conv(x) + self.conv(y)
tmod = TimeDistributed(Mod())

out = tmod(x,y)
test_eq(out.shape, (2,5,4,2,2))
tmod.low_mem=True
out_low_mem = tmod(x,y)
test_eq(out_low_mem.shape, (2,5,4,2,2))
test_eq(out, out_low_mem)

class Mod2(Module):
    def __init__(self):
        self.conv = nn.Conv2d(3,4,1)
    def forward(self, x, y):
        return self.conv(x), self.conv(y)
tmod2 = TimeDistributed(Mod2())

out = tmod2(x,y)
test_eq(len(out), 2)
test_eq(out[0].shape, (2,5,4,2,2))
tmod2.low_mem=True
out_low_mem = tmod2(x,y)
test_eq(out_low_mem[0].shape, (2,5,4,2,2))
test_eq(out, out_low_mem)

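TimeDistributed

 TimeDistributed (module, low_mem=False, tdim=1)

Applies module over tdim identically for each step. Use low_mem to compute one step at a time and save memory.

This module is equivalent to the Keras TimeDistributed layer. This wrapper allows applying a layer to every temporal slice of an input. By default it is assumed that the time axis (tdim) is axis 1 (the one after the batch size). A typical usage would be to encode a sequence of images using an image encoder.

The forward function of TimeDistributed supports *args and **kwargs, but only args are split and passed to the underlying module independently for each timestep; kwargs are passed as they are. This is useful when you have a module that takes multiple arguments as inputs: you can pass all tensors that need to be split as args and the other arguments that don't need to be split as kwargs.

This module is heavy on memory, as it tries to pass multiple timesteps at the same time on the batch dimension. If you get out-of-memory errors, first try dividing your batch size by the number of time steps.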

from fastai.vision.all import *

encoder = create_body(resnet18())

ResNet18 will encode a feature map of 512 channels. Height and width will be divided by 32.

time_resnet = TimeDistributed(encoder)

A synthetic batch of 2 image sequences of length 5. (bs, seq_len, ch, w, h)

image_sequence = torch.rand(2, 5, 3, 64, 64)

time_resnet(image_sequence).shape

torch.Size([2, 5, 512, 2, 2])

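This way, one can encode a sequence of images in feature space. There is also a low_mem_forward function that passes images one at a time to reduce GPU memory consumption.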

time_resnet.low_mem_forward(image_sequence).shape

torch.Size([2, 5, 512, 2, 2])
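The two strategies can be contrasted with a plain-Python sketch on nested lists indexed as x[batch][step]. This is an illustration of the idea only: the time_distributed function and square_all stand-in module are hypothetical, and the real layer works on tensors rather than lists.

```python
def time_distributed(fn, x, low_mem=False):
    "Apply fn (which maps a list of items to a list of outputs) to every time step of x."
    bs, seq_len = len(x), len(x[0])
    if low_mem:
        # One call per time step, each on a batch of bs items
        per_step = [fn([x[b][t] for b in range(bs)]) for t in range(seq_len)]
        return [[per_step[t][b] for t in range(seq_len)] for b in range(bs)]
    # Fold the time axis into the batch axis, call once, then unfold
    flat = fn([x[b][t] for b in range(bs) for t in range(seq_len)])
    return [flat[b * seq_len:(b + 1) * seq_len] for b in range(bs)]

square_all = lambda items: [v * v for v in items]
x = [[1, 2, 3], [4, 5, 6]]   # bs=2, seq_len=3

fast = time_distributed(square_all, x)
slow = time_distributed(square_all, x, low_mem=True)
print(fast == slow)   # True
```

The folded version makes a single call on bs * seq_len items, which is why it needs more memory than the step-by-step version.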

Swish and Mish

swish

 swish (x, inplace=False)

SwishJit

 SwishJit ()

Same as nn.Module, but no need for subclasses to call super().__init__

MishJitAutoFn

 MishJitAutoFn (*args, **kwargs)

Base class to create custom autograd.Function.

To create a custom autograd.Function, subclass this class and implement the forward and backward static methods. Then, to use your custom op in the forward pass, call the class method apply. Do not call forward directly.

To ensure correctness and best performance, make sure you are calling the correct methods on ctx and validating your backward function using torch.autograd.gradcheck.

See extending-autograd in the PyTorch documentation for more details on how to use this class.

Examples:

>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_AUTOGRAD)
>>> class Exp(Function):
>>>     @staticmethod
>>>     def forward(ctx, i):
>>>         result = i.exp()
>>>         ctx.save_for_backward(result)
>>>         return result
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         result, = ctx.saved_tensors
>>>         return grad_output * result
>>>
>>> # Use it by calling the apply method:
>>> # xdoctest: +SKIP
>>> output = Exp.apply(input)

mish

 mish (x, inplace=False)
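For reference, the standard scalar definitions of the two activations can be written in plain Python: swish multiplies the input by its sigmoid, and mish multiplies it by the tanh of its softplus. The names swish_scalar and mish_scalar are hypothetical and the sketch ignores the inplace option.

```python
import math

def swish_scalar(x):
    "x * sigmoid(x)"
    return x / (1 + math.exp(-x))

def mish_scalar(x):
    "x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))"
    return x * math.tanh(math.log1p(math.exp(x)))

print(swish_scalar(0.0), mish_scalar(0.0))   # 0.0 0.0
```

Both functions pass through the origin and stay close to the identity for large positive inputs.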

MishJit

 MishJit ()

Same as nn.Module, but no need for subclasses to call super().__init__

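Helper functions for submodules

It's easy to get the list of all parameters of a given model. For when you want all submodules (like linear/conv layers) without forgetting lone parameters, the following class wraps those parameters in fake modules.

ParameterModule

 ParameterModule (p)

Register a lone parameter p in a module.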

children_and_parameters

 children_and_parameters (m)

Return the children of m and its direct parameters not registered in modules.

class TstModule(Module):
    def __init__(self): self.a,self.lin = nn.Parameter(torch.randn(1)),nn.Linear(5,10)

tst = TstModule()
children = children_and_parameters(tst)
test_eq(len(children), 2)
test_eq(children[0], tst.lin)
assert isinstance(children[1], ParameterModule)
test_eq(children[1].val, tst.a)

has_children

 has_children (m)

class A(Module): pass
assert not has_children(A())
assert has_children(TstModule())

flatten_model

 flatten_model (m)

Return the list of all submodules and parameters of m.

tst = nn.Sequential(TstModule(), TstModule())
children = flatten_model(tst)
test_eq(len(children), 4)
assert isinstance(children[1], ParameterModule)
assert isinstance(children[3], ParameterModule)

NoneReduce

 NoneReduce (loss_func)

A context manager to evaluate loss_func with no reduction (reduction='none').

x,y = torch.randn(5),torch.randn(5)
loss_fn = nn.MSELoss()
with NoneReduce(loss_fn) as loss_func:
    loss = loss_func(x,y)
test_eq(loss.shape, [5])
test_eq(loss_fn.reduction, 'mean')

loss_fn = F.mse_loss
with NoneReduce(loss_fn) as loss_func:
    loss = loss_func(x,y)
test_eq(loss.shape, [5])
test_eq(loss_fn, F.mse_loss)
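The two cases tested above (a loss object with a reduction attribute, and a plain function) can be sketched with a small context manager. This is a hedged illustration: none_reduce and ToyMSE are hypothetical names written for this example, not the fastai implementation.

```python
from contextlib import contextmanager
from functools import partial

@contextmanager
def none_reduce(loss_func):
    "Temporarily switch loss_func to reduction='none', restoring it on exit."
    if hasattr(loss_func, "reduction"):
        old = loss_func.reduction
        loss_func.reduction = "none"
        try:
            yield loss_func
        finally:
            loss_func.reduction = old   # restore the original setting
    else:
        # Plain functions are left untouched; a partial carries the argument instead
        yield partial(loss_func, reduction="none")

class ToyMSE:
    "Toy loss on Python lists, standing in for a loss module."
    reduction = "mean"
    def __call__(self, xs, ys):
        sq = [(a - b) ** 2 for a, b in zip(xs, ys)]
        return sq if self.reduction == "none" else sum(sq) / len(sq)

loss_fn = ToyMSE()
with none_reduce(loss_fn) as f:
    per_item = f([1.0, 2.0], [0.0, 0.0])
print(per_item, loss_fn.reduction)   # [1.0, 4.0] mean
```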

in_channels

 in_channels (m)

Return the number of input channels of the first weight layer in m.

test_eq(in_channels(nn.Sequential(nn.Conv2d(5,4,3), nn.Conv2d(4,3,3))), 5)
test_eq(in_channels(nn.Sequential(nn.AvgPool2d(4), nn.Conv2d(4,3,3))), 4)
test_eq(in_channels(nn.Sequential(BatchNorm(4), nn.Conv2d(4,3,3))), 4)
test_eq(in_channels(nn.Sequential(InstanceNorm(4), nn.Conv2d(4,3,3))), 4)
test_eq(in_channels(nn.Sequential(InstanceNorm(4, affine=False), nn.Conv2d(4,3,3))), 4)
test_fail(lambda : in_channels(nn.Sequential(nn.AvgPool2d(4))))