Core text modules

Contains the modules shared across architectures and the generic functions for building models

Language models

LinearDecoder

 LinearDecoder (n_out:int, n_hid:int, output_p:float=0.1,
                tie_encoder:nn.Module=None, bias:bool=True)

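Sits on top of an RNNCore module to create a language model.

	Type	Default	Details
n_out	int		Number of output channels
n_hid	int		Number of features in the encoder's last-layer output
output_p	float	0.1	Dropout probability applied to the decoder input
tie_encoder	Module	None	If a module is supplied, ties the decoder weight to `tie_encoder.weight`
bias	bool	True	If `False`, the layer will not learn an additive bias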

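import torch
from fastcore.test import test_eq, test_close, test_stdout
from fastai.text.all import *  # AWD_LSTM, LinearDecoder, tensor, Module and the rest of the text API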

enc = AWD_LSTM(100, 20, 10, 2)
x = torch.randint(0, 100, (10,5))
r = enc(x)

tst = LinearDecoder(100, 20, 0.1)
y = tst(r)
test_eq(y[1], r)
test_eq(y[2].shape, r.shape)
test_eq(y[0].shape, [10, 5, 100])

tst = LinearDecoder(100, 20, 0.1, tie_encoder=enc.encoder)
test_eq(tst.decoder.weight, enc.encoder.weight)
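
The `tie_encoder` option makes the decoder reuse the encoder's embedding matrix, so a single set of parameters both embeds tokens and scores them. The sketch below illustrates that idea with plain Python lists rather than tensors. The function `tied_logits` and the toy sizes are hypothetical names chosen for illustration and are not part of fastai.

```python
import random

def tied_logits(hidden, embedding, bias=None):
    """Score each vocabulary item as dot(hidden, embedding_row), plus an optional bias."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in embedding]
    if bias is not None:
        logits = [l + b for l, b in zip(logits, bias)]
    return logits

random.seed(0)
vocab_sz, emb_sz = 6, 4  # toy sizes; tying needs n_out == vocab_sz and n_hid == emb_sz
embedding = [[random.gauss(0, 1) for _ in range(emb_sz)] for _ in range(vocab_sz)]

hidden = list(embedding[3])  # pretend the encoder output equals token 3's embedding
logits = tied_logits(hidden, embedding)

# The matrix is shared, so changing an embedding row also changes the decoder's score.
before = logits[0]
embedding[0][0] += 1.0
after = tied_logits(hidden, embedding)[0]
```

The shape constraint in the comment is why the example above builds `LinearDecoder(100, 20, ...)` on top of an encoder with a vocabulary of 100 and an embedding size of 20.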

SequentialRNN

 SequentialRNN (*args)

A sequential module that passes the reset call to its children.

class _TstMod(Module):
    def reset(self): print('reset')

tst = SequentialRNN(_TstMod(), _TstMod())
test_stdout(tst.reset, 'reset\nreset')
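
The forwarding behaviour can be sketched without any deep learning framework. The container below is a hypothetical stand-in (`ResettableSequence`, `Recurrent`, and `Stateless` are illustrative names), and it assumes that children without a `reset` method are simply skipped.

```python
class ResettableSequence:
    """Hypothetical stand-in for a sequential container that forwards `reset`."""
    def __init__(self, *children):
        self.children = list(children)

    def reset(self):
        # Call `reset` on each child that defines it; skip the others.
        for child in self.children:
            reset_fn = getattr(child, "reset", None)
            if callable(reset_fn):
                reset_fn()

class Recurrent:
    def __init__(self):
        self.state = "warm"
    def reset(self):
        self.state = None  # drop the hidden state

class Stateless:
    pass  # no `reset` method, so the container leaves it alone

layers = [Recurrent(), Stateless(), Recurrent()]
model = ResettableSequence(*layers)
model.reset()
```

Resetting matters for recurrent encoders because their hidden state otherwise carries over from one batch to the next.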

get_language_model

 get_language_model (arch, vocab_sz:int, config:dict=None,
                     drop_mult:float=1.0)

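Create a language model from arch and its config.

	Type	Default	Details
arch			Function or class that can generate a language model architecture
vocab_sz	int		Size of the vocabulary
config	dict	None	Model configuration dictionary
drop_mult	float	1.0	Multiplicative factor to scale all dropout probabilities in `config`
Returns	SequentialRNN		Language model with `arch` encoder and linear decoder

The default config is found in _model_meta[arch]['config_lm']. drop_mult is applied to every dropout probability in that config.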
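
As a rough illustration of the scaling, the sketch below multiplies every entry whose key ends in `_p` by `drop_mult`. The helper `scale_dropouts` is hypothetical, the `_p` suffix rule is an assumption based on the key names used on this page (`output_p`, `hidden_p`, `input_p`, and so on), and the values are illustrative rather than the library defaults.

```python
def scale_dropouts(config, drop_mult):
    """Return a copy of `config` with every key ending in `_p` multiplied by `drop_mult`."""
    return {k: v * drop_mult if k.endswith("_p") else v for k, v in config.items()}

# Illustrative values only; the real defaults live in the library's config dict.
toy_config = {"emb_sz": 20, "n_hid": 10, "output_p": 0.1, "hidden_p": 0.2, "input_p": 0.3}
scaled = scale_dropouts(toy_config, 0.5)
```

Sizes such as `emb_sz` and `n_hid` are left untouched, and the original dictionary is not modified.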

config = awd_lstm_lm_config.copy()
config.update({'n_hid':10, 'emb_sz':20})

tst = get_language_model(AWD_LSTM, 100, config=config)
x = torch.randint(0, 100, (10,5))
y = tst(x)
test_eq(y[0].shape, [10, 5, 100])
test_eq(y[1].shape, [10, 5, 20])
test_eq(y[2].shape, [10, 5, 20])
test_eq(tst[1].decoder.weight, tst[0].encoder.weight)

#test drop_mult
tst = get_language_model(AWD_LSTM, 100, config=config, drop_mult=0.5)
test_eq(tst[1].output_dp.p, config['output_p']*0.5)
for rnn in tst[0].rnns: test_eq(rnn.weight_p, config['weight_p']*0.5)
for dp in tst[0].hidden_dps: test_eq(dp.p, config['hidden_p']*0.5)
test_eq(tst[0].encoder_dp.embed_p, config['embed_p']*0.5)
test_eq(tst[0].input_dp.p, config['input_p']*0.5)

Classification models

SentenceEncoder

 SentenceEncoder (bptt:int, module:nn.Module, pad_idx:int=1,
                  max_len:int=None)

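Create an encoder over module that can process a full sentence.

	Type	Default	Details
bptt	int		Backpropagation-through-time length (tokens per chunk)
module	Module		A module that can process up to [`bs`, `bptt`] tokens
pad_idx	int	1	Padding token id
max_len	int	None	Maximal output length

Warning

This module expects inputs in which most of the padding comes first, so that each sequence starts at a whole multiple of bptt, with any remaining padding at the end. Use pad_input_chunk to put your data in a suitable format.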
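
The layout described in the warning can be sketched in plain Python. The helper `pad_in_chunks` below is a hypothetical illustration of that layout for a single sequence and is not the library's `pad_input_chunk`.

```python
def pad_in_chunks(tokens, pad_len, bptt, pad_idx=1):
    """Pad `tokens` to `pad_len`: whole `bptt`-sized chunks of padding first, remainder last."""
    n_pad = pad_len - len(tokens)
    if n_pad < 0:
        raise ValueError("pad_len must be at least len(tokens)")
    front = [pad_idx] * ((n_pad // bptt) * bptt)  # full chunks of padding
    back = [pad_idx] * (n_pad % bptt)             # leftover padding, shorter than one chunk
    return front + list(tokens) + back

padded = pad_in_chunks([5, 6, 7], pad_len=10, bptt=3, pad_idx=0)
# padded == [0, 0, 0, 0, 0, 0, 5, 6, 7, 0]
```

Here the seven padding tokens split into two full chunks of three in front and one leftover token at the end, so the real tokens start at index 6, a multiple of `bptt`.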

mod = nn.Embedding(5, 10)
tst = SentenceEncoder(5, mod, pad_idx=0)
x = torch.randint(1, 5, (3, 15))
x[2,:5]=0
out,mask = tst(x)

test_eq(out[:1], mod(x)[:1])
test_eq(out[2,5:], mod(x)[2,5:])
test_eq(mask, x==0)
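
To make the roles of `bptt` and `max_len` concrete, the sketch below lists the chunks a long input would be split into. It assumes that every chunk is still passed through the module and that only outputs falling within the last `max_len` tokens are kept. `chunk_spans` is a hypothetical helper written for this illustration.

```python
def chunk_spans(seq_len, bptt, max_len=None):
    """List (start, end, kept) for each bptt-sized chunk of a length-`seq_len` input."""
    spans = []
    for start in range(0, seq_len, bptt):
        end = min(start + bptt, seq_len)
        # Assumption: a chunk's output is kept only if it lies within the last `max_len` tokens.
        kept = max_len is None or seq_len - start <= max_len
        spans.append((start, end, kept))
    return spans

all_kept = chunk_spans(15, 5)
tail_only = chunk_spans(15, 5, max_len=10)
# all_kept  == [(0, 5, True), (5, 10, True), (10, 15, True)]
# tail_only == [(0, 5, False), (5, 10, True), (10, 15, True)]
```

With `bptt=5` and a sequence of 15 tokens, as in the example above, the input is processed in three chunks.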

masked_concat_pool

 masked_concat_pool (output:torch.Tensor, mask:torch.Tensor, bptt:int)

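Pool MultiBatchEncoder outputs into one vector [last_hidden, max_pool, avg_pool]

	Type	Details
output	Tensor	Output of sentence encoder
mask	Tensor	Boolean mask as returned by sentence encoder
bptt	int	Backpropagation-through-time length (tokens per chunk)
Returns	Tensor	Concatenation of [last_hidden, max_pool, avg_pool]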

out = torch.randn(2,4,5)
mask = tensor([[True,True,False,False], [False,False,False,True]])
x = masked_concat_pool(out, mask, 2)

test_close(x[0,:5], out[0,-1])
test_close(x[1,:5], out[1,-2])
test_close(x[0,5:10], out[0,2:].max(dim=0)[0])
test_close(x[1,5:10], out[1,:3].max(dim=0)[0])
test_close(x[0,10:], out[0,2:].mean(dim=0))
test_close(x[1,10:], out[1,:3].mean(dim=0))

#Test the result is independent of padding by replacing the padded part by some random content
out1 = torch.randn(2,4,5)
out1[0,2:] = out[0,2:].clone()
out1[1,:3] = out[1,:3].clone()
x1 = masked_concat_pool(out1, mask, 2)
test_eq(x, x1)
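
The pooling checked by the tests above can be sketched for a single sequence with plain Python lists. `masked_concat_pool_1seq` is a hypothetical helper, not the library function. It assumes the padding layout described in the warning earlier, so that any trailing padding is shorter than `bptt`.

```python
def masked_concat_pool_1seq(output, mask, bptt):
    """Pool one sequence: `output` is a list of feature vectors, `mask[i]` is True for padding."""
    real = [vec for vec, is_pad in zip(output, mask) if not is_pad]
    trailing_pad = sum(mask[-bptt:])         # padding inside the final bptt window
    last_hidden = output[-trailing_pad - 1]  # features of the last real token
    max_pool = [max(col) for col in zip(*real)]
    avg_pool = [sum(col) / len(real) for col in zip(*real)]
    return list(last_hidden) + max_pool + avg_pool

output = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [9.0, 9.0]]
mask = [False, False, False, True]  # only the final position is padding
pooled = masked_concat_pool_1seq(output, mask, bptt=2)
# pooled == [5.0, 4.0, 5.0, 4.0, 3.0, 2.0]
```

The padded row `[9.0, 9.0]` influences none of the three pooled parts, which is the same independence from padding that the last test above verifies.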

PoolingLinearClassifier

 PoolingLinearClassifier (dims:list, ps:list, bptt:int,
                          y_range:tuple=None)

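Create a linear classifier with pooling

	Type	Default	Details
dims	list		List of hidden sizes for the MLP, as `int`s
ps	list		List of dropout probabilities, as `float`s
bptt	int		Backpropagation-through-time length (tokens per chunk)
y_range	tuple	None	Tuple of (low, high) output value bounds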
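
One way to read `dims` and `ps` together is that each consecutive pair of sizes in `dims` describes one linear layer, and each layer takes one dropout probability from `ps`. The sketch below makes that pairing explicit. `plan_layers` is a hypothetical helper, and the one-dropout-per-layer rule is an assumption.

```python
def plan_layers(dims, ps):
    """Pair consecutive sizes in `dims` with a dropout from `ps`, one entry per linear layer."""
    if len(ps) != len(dims) - 1:
        raise ValueError("need exactly one dropout probability per layer")
    return [(n_in, n_out, p) for n_in, n_out, p in zip(dims[:-1], dims[1:], ps)]

plan = plan_layers([60, 50, 3], [0.4, 0.1])
# plan == [(60, 50, 0.4), (50, 3, 0.1)]
```

Under this reading, `dims=[60, 50, 3]` describes a two-layer head that maps 60 pooled features to 50 hidden units and then to 3 classes.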

get_text_classifier

 get_text_classifier (arch:Callable, vocab_sz:int, n_class:int,
                      seq_len:int=72, config:dict=None,
                      drop_mult:float=1.0, lin_ftrs:list=None,
                      ps:list=None, pad_idx:int=1, max_len:int=1440,
                      y_range:tuple=None)

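Create a text classifier from arch and its config, maybe pretrained

	Type	Default	Details
arch	Callable		Function or class that can generate a language model architecture
vocab_sz	int		Size of the vocabulary
n_class	int		Number of classes
seq_len	int	72	Backpropagation-through-time length (tokens per chunk)
config	dict	None	Encoder configuration dictionary
drop_mult	float	1.0	Multiplicative factor to scale all dropout probabilities in `config`
lin_ftrs	list	None	List of hidden sizes for the classifier head, as `int`s
ps	list	None	List of dropout probabilities for the classifier head, as `float`s
pad_idx	int	1	Padding token id
max_len	int	1440	Maximal output length for `SentenceEncoder`
y_range	tuple	None	Tuple of (low, high) output value bounds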

config = awd_lstm_clas_config.copy()
config.update({'n_hid':10, 'emb_sz':20})

tst = get_text_classifier(AWD_LSTM, 100, 3, config=config)
x = torch.randint(2, 100, (10,5))
y = tst(x)
test_eq(y[0].shape, [10, 3])
test_eq(y[1].shape, [10, 5, 20])
test_eq(y[2].shape, [10, 5, 20])

#test padding gives same results
tst.eval()
y = tst(x)
x1 = torch.cat([x, tensor([2,1,1,1,1,1,1,1,1,1])[:,None]], dim=1)
y1 = tst(x1)
test_close(y[0][1:],y1[0][1:])

#test drop_mult
tst = get_text_classifier(AWD_LSTM, 100, 3, config=config, drop_mult=0.5)
test_eq(tst[1].layers[1][1].p, 0.1)
test_eq(tst[1].layers[0][1].p, config['output_p']*0.5)
for rnn in tst[0].module.rnns: test_eq(rnn.weight_p, config['weight_p']*0.5)
for dp in tst[0].module.hidden_dps: test_eq(dp.p, config['hidden_p']*0.5)
test_eq(tst[0].module.encoder_dp.embed_p, config['embed_p']*0.5)
test_eq(tst[0].module.input_dp.p, config['input_p']*0.5)
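
The drop_mult test above checks that the first head dropout equals config['output_p']*0.5 while the second stays at 0.1. The sketch below shows one head layout consistent with those checks. `classifier_head_spec` is a hypothetical helper, and the default hidden size of 50 and default head dropout of 0.1 are assumptions made for the illustration.

```python
def classifier_head_spec(emb_sz, n_class, output_p, drop_mult=1.0, lin_ftrs=None, ps=None):
    """Sketch of the head's layer sizes and dropouts implied by the tests above."""
    lin_ftrs = [50] if lin_ftrs is None else list(lin_ftrs)  # assumed default hidden size
    ps = [0.1] * len(lin_ftrs) if ps is None else list(ps)   # assumed default head dropout
    dims = [emb_sz * 3] + lin_ftrs + [n_class]               # 3x: last, max and avg pooling
    ps = [output_p * drop_mult] + ps                         # only the first dropout is scaled
    return dims, ps

# output_p=0.4 is an illustrative value, not necessarily the library default.
dims, ps = classifier_head_spec(emb_sz=20, n_class=3, output_p=0.4, drop_mult=0.5)
# dims == [60, 50, 3]
```

The factor of three on the input size comes from concatenating the last hidden state with the max-pooled and average-pooled features.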