Model hooks

Callbacks and helper functions for adding hooks to models

from fastai.test_utils import *

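What are hooks?

Hooks are functions you can attach to a particular layer in your model. They are executed during the forward pass (for forward hooks) or the backward pass (for backward hooks). We begin here with an introduction to hooks, but if you want to implement one quickly, jump straight to HookCallback (and read the ActivationStats example further down).

A forward hook is a function that takes three arguments: the layer it is applied to, the input of that layer (passed as a tuple), and the output of that layer.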

tst_model = nn.Linear(5,3)
def example_forward_hook(m,i,o): print(m,i,o)
    
x = torch.randn(4,5)
hook = tst_model.register_forward_hook(example_forward_hook)
y = tst_model(x)
hook.remove()

Linear(in_features=5, out_features=3, bias=True) (tensor([[-0.9811,  0.1455,  0.3667,  0.7821,  1.0376],
        [ 0.4916, -0.8581,  0.1134,  0.1752, -0.0595],
        [ 0.4517, -0.9027,  1.3693, -0.8399,  1.4931],
        [-0.7818, -1.1915, -0.1014,  1.1878, -0.8517]]),) tensor([[-0.1019, -0.4006, -0.3282],
        [-0.0551,  0.5754,  0.0726],
        [-0.5382, -0.1731, -0.1683],
        [-0.3195,  0.7669,  0.3924]], grad_fn=<AddmmBackward0>)

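A backward hook is a function that takes three arguments: the layer it is applied to, the gradients of the loss with respect to the input, and the gradients of the loss with respect to the output.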

def example_backward_hook(m,gi,go): print(m,gi,go)
hook = tst_model.register_backward_hook(example_backward_hook)

x = torch.randn(4,5)
y = tst_model(x)
loss = y.pow(2).mean()
loss.backward()
hook.remove()

Linear(in_features=5, out_features=3, bias=True) (tensor([ 0.0913,  0.3834, -0.0015]), None, tensor([[ 0.1872,  0.1248, -0.2946],
        [ 0.1090, -0.3164, -0.2486],
        [-0.0468, -0.1728, -0.1686],
        [-0.0787,  0.3200,  0.0099],
        [-0.0308, -0.1119,  0.0056]])) (tensor([[ 0.0414,  0.1750,  0.0672],
        [-0.0252,  0.0636,  0.0592],
        [ 0.1243,  0.0364, -0.1118],
        [-0.0491,  0.1084, -0.0160]]),)

/home/benja/.conda/envs/fastaidev/lib/python3.12/site-packages/torch/nn/modules/module.py:1830: FutureWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
  self._maybe_warn_non_full_backward_hook(args, result, grad_fn)

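Hooks can change the input/output of a layer or its gradients, or print values or shapes. If you want to store something related to these inputs/outputs, it is best to attach your hook to a class so that it can keep the data in the state of an instance of that class.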
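
One way to picture a hook that keeps state is a callable object. The sketch below is pure Python: `TinyLayer` and `HookHandle` are hypothetical stand-ins (not PyTorch or fastai classes) that only mimic the `register_forward_hook` / `remove` pattern used above, and `OutputRecorder` is an illustrative name.

```python
class HookHandle:
    "Hypothetical handle that unregisters one hook when `remove` is called."
    def __init__(self, hooks, hook_id): self.hooks, self.hook_id = hooks, hook_id
    def remove(self): self.hooks.pop(self.hook_id, None)

class TinyLayer:
    "Hypothetical stand-in for a layer: doubles its input, then calls its forward hooks."
    def __init__(self): self._hooks, self._next_id = {}, 0
    def register_forward_hook(self, fn):
        hook_id, self._next_id = self._next_id, self._next_id + 1
        self._hooks[hook_id] = fn
        return HookHandle(self._hooks, hook_id)
    def __call__(self, x):
        out = [2 * v for v in x]
        # Same calling convention as a forward hook: (layer, inputs tuple, output)
        for fn in list(self._hooks.values()): fn(self, (x,), out)
        return out

class OutputRecorder:
    "Callable hook that stores every output it sees in `self.stored`."
    def __init__(self): self.stored = []
    def __call__(self, module, inputs, output): self.stored.append(output)

layer, recorder = TinyLayer(), OutputRecorder()
handle = layer.register_forward_hook(recorder)
layer([1, 2])
layer([3])
handle.remove()
layer([5])              # the hook is gone, so this call is not recorded
print(recorder.stored)  # [[2, 4], [6]]
```

Because the state lives on the `recorder` instance, it stays available after the hook has been removed.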

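source

Hook

 Hook (m, hook_func, is_forward=True, detach=True, cpu=False,
       gather=False)

Create a hook on `m` with `hook_func`.

The hook is called during the forward pass if `is_forward=True`, and during the backward pass otherwise. It optionally applies `detach` and `gather` to the inputs/outputs of the model (or their gradients) and moves them to the `cpu` before passing them to `hook_func`. The result of `hook_func` is stored in the `stored` attribute of the Hook.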

tst_model = nn.Linear(5,3)
hook = Hook(tst_model, lambda m,i,o: o)
y = tst_model(x)
test_eq(hook.stored, y)

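source

Hook.hook_fn

 Hook.hook_fn (module, input, output)

Applies `hook_func` to `module`, `input`, `output`.

source

Hook.remove

 Hook.remove ()

Remove the hook from the model.

Note

It is important to properly remove your hooks from your model when you are done with them. This prevents them from being called again the next time the model is applied to some inputs, and frees the memory that goes with their state.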

tst_model = nn.Linear(5,10)
x = torch.randn(4,5)
y = tst_model(x)
hook = Hook(tst_model, example_forward_hook)
test_stdout(lambda: tst_model(x), f"{tst_model} ({x},) {y.detach()}")
hook.remove()
test_stdout(lambda: tst_model(x), "")

Context Manager

Since it is very important to remove your Hook even if your code is interrupted by an error, Hook can be used as a context manager.
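
The guarantee a context manager provides can be sketched in pure Python. Here `callbacks` is a plain list standing in for a module's registered hooks, and `hooked` is a hypothetical helper, not part of fastai; the point is only that the cleanup in `finally` runs even when the body raises.

```python
from contextlib import contextmanager

@contextmanager
def hooked(callbacks, fn):
    "Register `fn` in `callbacks` and always unregister it on exit."
    callbacks.append(fn)
    try:
        yield fn
    finally:
        callbacks.remove(fn)  # runs on normal exit and on exceptions alike

callbacks = []
try:
    with hooked(callbacks, print):
        assert callbacks == [print]  # the hook is registered inside the block
        raise RuntimeError("simulated failure during the forward pass")
except RuntimeError:
    pass

removed_after_error = (callbacks == [])
print(removed_after_error)  # True
```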

source

Hook.__enter__

 Hook.__enter__ (*args)

Register the hook

source

Hook.__exit__

 Hook.__exit__ (*args)

Remove the hook

tst_model = nn.Linear(5,10)
x = torch.randn(4,5)
y = tst_model(x)
with Hook(tst_model, example_forward_hook) as h:
    test_stdout(lambda: tst_model(x), f"{tst_model} ({x},) {y.detach()}")
test_stdout(lambda: tst_model(x), "")

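source

hook_output

 hook_output (module, detach=True, cpu=False, grad=False)

Return a Hook that stores the activations of `module` in `self.stored`.

The stored activations are the gradients if `grad=True`, otherwise the output of `module`. If `detach=True`, they are detached from their history, and if `cpu=True`, they are moved to the CPU.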

tst_model = nn.Linear(5,10)
x = torch.randn(4,5)
with hook_output(tst_model) as h:
    y = tst_model(x)
    test_eq(y, h.stored)
    assert not h.stored.requires_grad
    
with hook_output(tst_model, grad=True) as h:
    y = tst_model(x)
    loss = y.pow(2).mean()
    loss.backward()
    test_close(2*y / y.numel(), h.stored[0])

with hook_output(tst_model, cpu=True) as h:
    y = tst_model.cuda()(x.cuda())
    test_eq(h.stored.device, torch.device('cpu'))

source

Hooks

 Hooks (ms, hook_func, is_forward=True, detach=True, cpu=False)

Create several hooks on the modules in `ms` with `hook_func`.

layers = [nn.Linear(5,10), nn.ReLU(), nn.Linear(10,3)]
tst_model = nn.Sequential(*layers)
hooks = Hooks(tst_model, lambda m,i,o: o)
y = tst_model(x)
test_eq(hooks.stored[0], layers[0](x))
test_eq(hooks.stored[1], F.relu(layers[0](x)))
test_eq(hooks.stored[2], y)
hooks.remove()

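source

Hooks.stored

 Hooks.stored ()

source

Hooks.remove

 Hooks.remove ()

Remove the hooks from the model.

Context Manager

Like Hook, Hooks can be used as a context manager.

source

Hooks.__enter__

 Hooks.__enter__ (*args)

Register the hooks

source

Hooks.__exit__

 Hooks.__exit__ (*args)

Remove the hooks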

layers = [nn.Linear(5,10), nn.ReLU(), nn.Linear(10,3)]
tst_model = nn.Sequential(*layers)
with Hooks(layers, lambda m,i,o: o) as h:
    y = tst_model(x)
    test_eq(h.stored[0], layers[0](x))
    test_eq(h.stored[1], F.relu(layers[0](x)))
    test_eq(h.stored[2], y)

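source

hook_outputs

 hook_outputs (modules, detach=True, cpu=False, grad=False)

Return Hooks that store the activations of all `modules` in `self.stored`.

The stored activations are the gradients if `grad=True`, otherwise the outputs of `modules`. If `detach=True`, they are detached from their history, and if `cpu=True`, they are moved to the CPU.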

layers = [nn.Linear(5,10), nn.ReLU(), nn.Linear(10,3)]
tst_model = nn.Sequential(*layers)
x = torch.randn(4,5)
with hook_outputs(layers) as h:
    y = tst_model(x)
    test_eq(h.stored[0], layers[0](x))
    test_eq(h.stored[1], F.relu(layers[0](x)))
    test_eq(h.stored[2], y)
    for s in h.stored: assert not s.requires_grad
    
with hook_outputs(layers, grad=True) as h:
    y = tst_model(x)
    loss = y.pow(2).mean()
    loss.backward()
    g = 2*y / y.numel()
    test_close(g, h.stored[2][0])
    g = g @ layers[2].weight.data
    test_close(g, h.stored[1][0])
    g = g * (layers[0](x) > 0).float()
    test_close(g, h.stored[0][0])

with hook_outputs(tst_model, cpu=True) as h:
    y = tst_model.cuda()(x.cuda())
    for s in h.stored: test_eq(s.device, torch.device('cpu'))

source

dummy_eval

 dummy_eval (m, size=(64, 64))

Evaluate `m` on a dummy input of a certain `size`.

source

model_sizes

 model_sizes (m, size=(64, 64))

Pass a dummy input through the model `m` to get the sizes of the various activations.

m = nn.Sequential(ConvLayer(3, 16), ConvLayer(16, 32, stride=2), ConvLayer(32, 32))
test_eq(model_sizes(m), [[1, 16, 64, 64], [1, 32, 32, 32], [1, 32, 32, 32]])
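
The spatial sizes in the test above follow from the usual convolution output-size formula. The sketch below assumes each `ConvLayer` uses a 3x3 kernel with padding 1 (an assumption about its defaults, not something stated in this section); `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(n, kernel=3, stride=1, padding=1):
    "Output length along one spatial dimension of a convolution."
    return (n + 2 * padding - kernel) // stride + 1

size, sizes = 64, []
for stride in (1, 2, 1):  # strides of the three layers in the example model
    size = conv_out_size(size, stride=stride)
    sizes.append(size)

print(sizes)  # [64, 32, 32]
```

Only the stride-2 layer halves the height and width, which matches the `64, 32, 32` pattern in the expected shapes.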

source

num_features_model

 num_features_model (m)

Return the number of output features for `m`.

m = nn.Sequential(nn.Conv2d(5,4,3), nn.Conv2d(4,3,3))
test_eq(num_features_model(m), 3)
m = nn.Sequential(ConvLayer(3, 16), ConvLayer(16, 32, stride=2), ConvLayer(32, 32))
test_eq(num_features_model(m), 32)

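To make hooks easy to use, we wrapped a version in a Callback where you only have to implement a `hook` function (plus any other element you might need).

source

has_params

 has_params (m)

Check if `m` has at least one parameter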

assert has_params(nn.Linear(3,4))
assert has_params(nn.LSTM(4,5,2))
assert not has_params(nn.ReLU())

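source

HookCallback

 HookCallback (modules=None, every=None, remove_end=True, is_forward=True,
               detach=True, cpu=True, include_paramless=False, hook=None)

Callback that can be used to register hooks on `modules`.

You can either subclass it and implement a `hook` function (along with any event you want), or pass a `hook` function when initializing. Such a function needs to take three arguments: a layer, its input, and its output. For a backward hook, the input is the gradient of the loss with respect to the input and the output is the gradient of the loss with respect to the output. The function can either modify these values or update its state based on them.

If not provided, `modules` defaults to the layers of `self.model` that have a `weight` attribute. (To include layers of `self.model` that do not have a `weight` attribute, e.g. `ReLU`, Flatten etc., set `include_paramless=True`.) Depending on `remove_end`, the hooks are properly removed at the end of training (or in case of an error). `is_forward`, `detach` and `cpu` are passed to Hooks.

The function called at each forward (or backward) pass is `self.hook`. It must be implemented when subclassing this callback, unless a `hook` function is passed at initialization.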

class TstCallback(HookCallback):
    def hook(self, m, i, o): return o
    def after_batch(self): test_eq(self.hooks.stored[0], self.pred)
        
learn = synth_learner(n_trn=5, cbs = TstCallback())
learn.fit(1)

[0, 6.587433815002441, 5.402360916137695, '00:00']

/home/benja/fastai/fastai/fastai/callback/core.py:71: UserWarning: You are shadowing an attribute (modules) that exists in the learner. Use `self.learn.modules` to avoid this
  warn(f"You are shadowing an attribute ({name}) that exists in the learner. Use `self.learn.{name}` to avoid this")

class TstCallback(HookCallback):
    def __init__(self, modules=None, remove_end=True, detach=True, cpu=False):
        super().__init__(modules, None, remove_end, False, detach, cpu)
    def hook(self, m, i, o): return o
    def after_batch(self):
        if self.training:
            test_eq(self.hooks.stored[0][0], 2*(self.pred-self.y)/self.pred.shape[0])
        
learn = synth_learner(n_trn=5, cbs = TstCallback())
learn.fit(1)

[0, 8.743090629577637, 10.072294235229492, '00:00']

source

HookCallback.before_fit

 HookCallback.before_fit ()

Register the Hooks on `self.modules`.

source

HookCallback.after_fit

 HookCallback.after_fit ()

Remove the Hooks.

Model summary

source

total_params

 total_params (m)

Give the number of parameters of a module and whether it is trainable.

test_eq(total_params(nn.Linear(10,32)), (32*10+32,True))
test_eq(total_params(nn.Linear(10,32, bias=False)), (32*10,True))
test_eq(total_params(nn.BatchNorm2d(20)), (20*2, True))
test_eq(total_params(nn.BatchNorm2d(20, affine=False)), (0,False))
test_eq(total_params(nn.Conv2d(16, 32, 3)), (16*32*3*3 + 32, True))
test_eq(total_params(nn.Conv2d(16, 32, 3, bias=False)), (16*32*3*3, True))
#First ih layer 20--10, all else 10--10. *4 for the four gates
test_eq(total_params(nn.LSTM(20, 10, 2)), (4 * (20*10 + 10) + 3 * 4 * (10*10 + 10), True))
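
The counts in these tests can be reproduced by hand. The sketch below is plain arithmetic with hypothetical helper names; it assumes the standard PyTorch parameter layout, in which each LSTM layer holds an input-hidden matrix, a hidden-hidden matrix, and two bias vectors, each covering the four gates.

```python
def linear_params(n_in, n_out, bias=True):
    "Weights (n_in x n_out) plus an optional bias per output."
    return n_in * n_out + (n_out if bias else 0)

def conv2d_params(c_in, c_out, k, bias=True):
    "One k x k kernel per (input channel, output channel) pair, plus optional biases."
    return c_in * c_out * k * k + (c_out if bias else 0)

def lstm_params(n_in, n_hidden, n_layers):
    "Sum over layers; only the first layer sees `n_in` inputs."
    total = 0
    for layer in range(n_layers):
        layer_in = n_in if layer == 0 else n_hidden
        weights = 4 * n_hidden * (layer_in + n_hidden)  # input-hidden and hidden-hidden, 4 gates
        biases = 2 * 4 * n_hidden                       # two bias vectors per layer
        total += weights + biases
    return total

print(linear_params(10, 32))     # 352
print(conv2d_params(16, 32, 3))  # 4640
print(lstm_params(20, 10, 2))    # 2160
```

The LSTM total agrees with the expression in the test, `4 * (20*10 + 10) + 3 * 4 * (10*10 + 10)`, which groups the same terms as one 20-to-10 block plus three 10-to-10 blocks.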

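source

layer_info

 layer_info (learn, *xb)

Return layer infos of `model` on `xb` (only batch-first inputs are supported).

The output of `_track` is expected to be a `tuple` of the module name, the number of parameters, the shape of the layer, whether it is trainable, which layer group it belongs to, and whether or not the size changed. There are three potential groups that can show:

A non-activation layer (Linear, Conv, etc.)
An activation layer
A pooling layer

Depending on the group, only part of the output is actually returned; the remaining fields are `''`. For non-activation layers, everything is returned. Activation layers only return a name, the shape, and `False` for `same`. Pooling layers return the name, the new shape, and `False` for `same`.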

def _m(): return nn.Sequential(nn.Linear(1,50), nn.ReLU(), nn.BatchNorm1d(50), nn.Linear(50, 1))
sample_input = torch.randn((16, 1))
test_eq(layer_info(synth_learner(model=_m()), sample_input), [
    ('Linear', 100, True, [1, 50], False),
    ('ReLU', '', '', [1,50], True),
    ('BatchNorm1d', 100, True, [1, 50], True),
    ('Linear', 51, True, [1, 1], False)
])

source

module_summary

 module_summary (learn, *xb)

Print a summary of `model` using `xb`.

source

Learner.summary

 Learner.summary ()

Print a summary of the model, optimizer and loss function.

learn = synth_learner(model=_m())
learn.summary()

Sequential (Input shape: 16 x 1)
============================================================================
Layer (type)         Output Shape         Param #    Trainable 
============================================================================
                     16 x 50             
Linear                                    100        True      
ReLU                                                           
BatchNorm1d                               100        True      
____________________________________________________________________________
                     16 x 1              
Linear                                    51         True      
____________________________________________________________________________

Total params: 251
Total trainable params: 251
Total non-trainable params: 0

Optimizer used: functools.partial(<function SGD at 0x78dacd98c7c0>, mom=0.9)
Loss function: FlattenedLoss of MSELoss()

Callbacks:
  - TrainEvalCallback
  - CastToTensor
  - Recorder

Activation graphs

source

ActivationStats

 ActivationStats (with_hist=False, modules=None, every=None,
                  remove_end=True, is_forward=True, detach=True, cpu=True,
                  include_paramless=False, hook=None)

Callback that records the mean and std of activations.

learn = synth_learner(n_trn=5, cbs = ActivationStats(every=4))
learn.fit(1)

[0, 7.943600177764893, 8.535039901733398, '00:00']

learn.activation_stats.stats

(#2) [[{'mean': 1.3028467893600464, 'std': 0.32002925872802734, 'near_zero': 0.0}],[{'mean': 1.3026641607284546, 'std': 0.29966112971305847, 'near_zero': 0.0}]]

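Each entry of `stats` corresponds to one recorded training batch and holds one dict per hooked module, giving the `mean`, `std`, and `near_zero` fraction of that module's output.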
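
To make the three recorded quantities concrete, here is a pure-Python sketch on a flat list of numbers. `activation_summary` is a hypothetical helper, and both the sample (n-1) standard deviation and the 0.05 threshold for `near_zero` are assumptions about what the callback computes, not something shown in this section.

```python
import statistics

def activation_summary(values, threshold=0.05):
    "Mean, sample standard deviation, and fraction of values at or below `threshold`."
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample (n-1) standard deviation
        "near_zero": sum(v <= threshold for v in values) / len(values),
    }

summary = activation_summary([0.0, 0.5, 1.0, 1.5])
print(summary["mean"], summary["near_zero"])  # 0.75 0.25
```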

def test_every(n_tr, every):
    "create a learner, fit, then check number of stats collected"
    learn = synth_learner(n_trn=n_tr, cbs=ActivationStats(every=every))
    learn.fit(1)
    expected_stats_len = math.ceil(n_tr / every)
    test_eq(expected_stats_len, len(learn.activation_stats.stats))
    
for n_tr in [11, 12, 13]:
    test_every(n_tr, 4)
    test_every(n_tr, 1)

[0, 7.132676601409912, 6.505333423614502, '00:00']
[0, 30.60495376586914, 29.395254135131836, '00:00']
[0, 14.507355690002441, 10.65038013458252, '00:00']
[0, 12.470440864562988, 7.216660499572754, '00:00']
[0, 30.247482299804688, 25.165172576904297, '00:00']
[0, 6.672229290008545, 5.598482131958008, '00:00']
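
The `math.ceil(n_tr / every)` count checked above is what you get if stats are recorded on every batch whose index is a multiple of `every`, starting from batch 0. That recording rule is an assumption made for this sketch, and `recorded_batches` is a hypothetical helper.

```python
import math

def recorded_batches(n_batches, every):
    "Indices of the batches on which stats would be recorded."
    return [i for i in range(n_batches) if i % every == 0]

for n_batches in (11, 12, 13):
    for every in (4, 1):
        # Same relation the test above checks on a real Learner
        assert len(recorded_batches(n_batches, every)) == math.ceil(n_batches / every)

print(recorded_batches(11, 4))  # [0, 4, 8]
```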