from nbdev import show_doc
Tensorboard
First you need to install tensorboard with
pip install tensorboard
Then launch tensorboard with
tensorboard --logdir=runs
in your terminal. You can change the logdir as long as it matches the log_dir you pass to TensorBoardCallback (default is runs in the working directory).
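As a minimal sketch of that correspondence (the directory name my_runs below is just an illustrative choice):
cbs = [TensorBoardCallback(log_dir='my_runs')]  # writes event files to my_runs/
# then, in the terminal: tensorboard --logdir=my_runs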
Tensorboard Embedding Projector support
The Tensorboard Embedding Projector currently only works with image classification.
Export Image Features during Training
The Tensorboard Embedding Projector is supported in TensorBoardCallback (set parameter projector=True). The validation set embeddings will be written after each epoch.
cbs = [TensorBoardCallback(projector=True)]
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(3, cbs=cbs)
Export Image Features during Inference
To write the embeddings for a custom dataset (e.g. after loading a learner), use TensorBoardProjectorCallback. Add the callback manually to the learner:
learn = load_learner('path/to/export.pkl')
learn.add_cb(TensorBoardProjectorCallback())
dl = learn.dls.test_dl(files, with_labels=True)
_ = learn.get_preds(dl=dl)
If using a custom model (not a fastai resnet), pass the layer from which the embeddings should be extracted as a callback parameter:
layer = learn.model[1][1]
cbs = [TensorBoardProjectorCallback(layer=layer)]
preds = learn.get_preds(dl=dl, cbs=cbs)
Export Word Embeddings from Language Models
Word embeddings can be exported from language models (tested with AWD_LSTM (fast.ai) and GPT2 / BERT (transformers)); this works with any model that contains an embedding layer.
For a fast.ai TextLearner or LMLearner, just pass the learner - the embedding layer and vocab will be extracted automatically:
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
projector_word_embeddings(learn=learn, limit=2000, start=2000)
For other language models - such as those in the transformers library - you have to pass the layer and vocab yourself. Here's an example for a BERT model:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# get the word embedding layer
layer = model.embeddings.word_embeddings
# get and sort vocab
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]
# write the embeddings for tb projector
projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000)
TensorBoardBaseCallback
TensorBoardBaseCallback ()
Basic class handling tweaks of the training loop by changing a Learner in various events
TensorBoardCallback
TensorBoardCallback (log_dir=None, trace_model=True, log_preds=True, n_preds=9, projector=False, layer=None)
Saves model topology, losses & metrics for tensorboard and tensorboard projector during training
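As a hedged usage sketch of these parameters (the values below are illustrative, not required defaults):
cbs = [TensorBoardCallback(
    log_dir='runs/exp1',  # where event files are written (None uses the runs default)
    trace_model=True,     # also log the model graph
    log_preds=True,       # log sample predictions during training
    n_preds=9,            # number of predictions to log
    projector=False,      # set True to also write projector embeddings
    layer=None,           # layer to extract embeddings from, for the projector
)]
learn.fit_one_cycle(1, cbs=cbs)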
TensorBoardProjectorCallback
TensorBoardProjectorCallback (log_dir=None, layer=None)
Extracts and exports image features for the tensorboard projector during inference
projector_word_embeddings
projector_word_embeddings (learn=None, layer=None, vocab=None, limit=-1, start=0, log_dir=None)
Extracts and exports word embeddings from language models' embedding layers
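The examples below use limit and start without comment; a minimal sketch of the slicing they appear to perform (this reading is an assumption based on those examples, not documented behaviour): start is the first vocab index exported and limit the number of embeddings, with limit=-1 exporting everything from start on.
# Hypothetical call: export embeddings for vocab indices 2000-3999, assuming
# start/limit select a contiguous slice of the vocab.
projector_word_embeddings(learn=learn, start=2000, limit=2000)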
TensorBoardCallback
from fastai.vision.all import Resize, RandomSubsetSplitter, aug_transforms, vision_learner, resnet18
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               item_tfms=Resize(128),
               splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.01),
               batch_tfms=aug_transforms(size=64),
               get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')

learn = vision_learner(dls, resnet18, metrics=accuracy)

learn.unfreeze()
learn.fit_one_cycle(3, cbs=TensorBoardCallback(Path.home()/'tmp'/'runs'/'tb', trace_model=True))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.973294 | 5.009670 | 0.082192 | 00:03 |
1 | 4.382769 | 4.438282 | 0.095890 | 00:03 |
2 | 3.877172 | 3.665855 | 0.178082 | 00:04 |
Projector
Projector in TensorBoardCallback
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               item_tfms=Resize(128),
               splitter=RandomSubsetSplitter(train_sz=0.05, valid_sz=0.01),
               batch_tfms=aug_transforms(size=64),
               get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')

cbs = [TensorBoardCallback(log_dir=Path.home()/'tmp'/'runs'/'vision1', projector=True)]
learn = vision_learner(dls, resnet18, metrics=accuracy)

learn.unfreeze()
learn.fit_one_cycle(3, cbs=cbs)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 5.143322 | 6.736727 | 0.082192 | 00:03 |
1 | 4.508100 | 5.106580 | 0.109589 | 00:03 |
2 | 4.057889 | 4.194602 | 0.068493 | 00:03 |
TensorBoardProjectorCallback
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               item_tfms=Resize(128),
               splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.01),
               batch_tfms=aug_transforms(size=64),
               get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')

files = get_image_files(path/'images')
files = files[:256]

dl = learn.dls.test_dl(files, with_labels=True)

learn = vision_learner(dls, resnet18, metrics=accuracy)
layer = learn.model[1][0].ap
cbs = [TensorBoardProjectorCallback(layer=layer, log_dir=Path.home()/'tmp'/'runs'/'vision2')]

_ = learn.get_preds(dl=dl, cbs=cbs)
projector_word_embeddings
fastai text or language model learner
from fastai.text.all import TextDataLoaders, text_classifier_learner, AWD_LSTM

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
projector_word_embeddings(learn, limit=1000, log_dir=Path.home()/'tmp'/'runs'/'text')
transformers
GPT2
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
# get the word embedding layer
layer = model.transformer.wte
# get and sort vocab
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]

projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, log_dir=Path.home()/'tmp'/'runs'/'transformers')
BERT
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# get the word embedding layer
layer = model.embeddings.word_embeddings
# get and sort vocab
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]

projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000, log_dir=Path.home()/'tmp'/'runs'/'transformers')
warning: Embedding dir exists, did you set global_step for add_embedding()?
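The warning only means that embeddings were already written to that log_dir in an earlier run; a hedged workaround sketch (the directory name transformers_run2 is an arbitrary choice) is to give each export a fresh directory:
# A minimal sketch, assuming a fresh log_dir per export avoids the warning.
projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000,
                          log_dir=Path.home()/'tmp'/'runs'/'transformers_run2')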
Validate results in tensorboard
Run the following command in your terminal to check if the projector embeddings have been correctly written:
tensorboard --logdir=~/tmp/runs
Open http://localhost:6006 in your browser (the TensorBoard Projector doesn't work correctly in Safari!)