Tensorboard

Integration with tensorboard

First, you need to install tensorboard with:

pip install tensorboard

Then launch tensorboard with:

tensorboard --logdir=runs

in your terminal. You can change the logdir as long as it matches the log_dir you pass to TensorBoardCallback (the default is runs in the working directory).
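For example, to log to a custom directory, point tensorboard at the same path (a minimal sketch; the directory name my_logs is just an illustration):

cbs = [TensorBoardCallback(log_dir='my_logs')]
# then launch with: tensorboard --logdir=my_logs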

Tensorboard Embedding Projector support

The Tensorboard Embedding Projector is currently only supported for image classification.

Export Image Features during Training

The Tensorboard Embedding Projector is supported in TensorBoardCallback (set the parameter projector=True). The validation set embeddings will be written after each epoch.

cbs = [TensorBoardCallback(projector=True)]
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(3, cbs=cbs)

Export Image Features during Inference

To write embeddings for a custom dataset (e.g. after loading a learner), use TensorBoardProjectorCallback. Add the callback manually to the learner.

learn = load_learner('path/to/export.pkl')
learn.add_cb(TensorBoardProjectorCallback())
dl = learn.dls.test_dl(files, with_labels=True)
_ = learn.get_preds(dl=dl)

If you use a custom model (not a fastai resnet), pass the layer from which the embeddings should be extracted as a callback parameter.

layer = learn.model[1][1]
cbs = [TensorBoardProjectorCallback(layer=layer)]
preds = learn.get_preds(dl=dl, cbs=cbs)

Export Word Embeddings from Language Models

Word embeddings can be exported from language models (tested with AWD_LSTM (fast.ai) and GPT2 / BERT (transformers)); this works with any model that contains an embedding layer.

For a fast.ai TextLearner or LMLearner, just pass the learner; the embedding layer and vocab will be extracted automatically:

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
projector_word_embeddings(learn=learn, limit=2000, start=2000)

For other language models, such as those in the transformers library, you have to pass the layer and vocab. Here's an example for a BERT model.

from transformers import AutoTokenizer, AutoModel
from fastai.callback.tensorboard import projector_word_embeddings
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# get the word embedding layer
layer = model.embeddings.word_embeddings

# get and sort vocab
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]

# write the embeddings for tb projector
projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000)

source

TensorBoardBaseCallback

 TensorBoardBaseCallback ()

Basic class for handling tweaks of the training loop by changing a Learner in various events


source

TensorBoardCallback

 TensorBoardCallback (log_dir=None, trace_model=True, log_preds=True,
                      n_preds=9, projector=False, layer=None)

Saves model topology, losses & metrics for tensorboard and the tensorboard projector during training
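For instance, the sample-prediction logging can be tuned via the log_preds and n_preds parameters from the signature above (a minimal sketch; the values are illustrative and learn is an existing Learner):

cbs = [TensorBoardCallback(log_dir='runs/exp1', n_preds=12)]  # log 12 sample predictions
learn.fit_one_cycle(1, cbs=cbs)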


source

TensorBoardProjectorCallback

 TensorBoardProjectorCallback (log_dir=None, layer=None)

Extracts and exports image feature embeddings for the tensorboard projector during inference


source

projector_word_embeddings

 projector_word_embeddings (learn=None, layer=None, vocab=None, limit=-1,
                            start=0, log_dir=None)

Extracts and exports word embeddings from a language model's embedding layer

TensorBoardCallback

from fastai.vision.all import (untar_data, URLs, DataBlock, ImageBlock, CategoryBlock,
                               get_image_files, using_attr, RegexLabeller, Resize,
                               RandomSubsetSplitter, aug_transforms, vision_learner,
                               resnet18, accuracy)
from fastai.callback.tensorboard import TensorBoardCallback
from pathlib import Path
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
                  get_items=get_image_files, 
                  item_tfms=Resize(128),
                  splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.01),
                  batch_tfms=aug_transforms(size=64),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.unfreeze()
learn.fit_one_cycle(3, cbs=TensorBoardCallback(Path.home()/'tmp'/'runs'/'tb', trace_model=True))
epoch  train_loss  valid_loss  accuracy  time
0 4.973294 5.009670 0.082192 00:03
1 4.382769 4.438282 0.095890 00:03
2 3.877172 3.665855 0.178082 00:04

Projector

Projector in TensorBoardCallback

path = untar_data(URLs.PETS)
db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
                  get_items=get_image_files, 
                  item_tfms=Resize(128),
                  splitter=RandomSubsetSplitter(train_sz=0.05, valid_sz=0.01),
                  batch_tfms=aug_transforms(size=64),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')
cbs = [TensorBoardCallback(log_dir=Path.home()/'tmp'/'runs'/'vision1', projector=True)]
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.unfreeze()
learn.fit_one_cycle(3, cbs=cbs)
epoch  train_loss  valid_loss  accuracy  time
0 5.143322 6.736727 0.082192 00:03
1 4.508100 5.106580 0.109589 00:03
2 4.057889 4.194602 0.068493 00:03

TensorBoardProjectorCallback

path = untar_data(URLs.PETS)
db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
                  get_items=get_image_files, 
                  item_tfms=Resize(128),
                  splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.01),
                  batch_tfms=aug_transforms(size=64),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')
files = get_image_files(path/'images')
files = files[:256]
learn = vision_learner(dls, resnet18, metrics=accuracy)
dl = learn.dls.test_dl(files, with_labels=True)
layer = learn.model[1][0].ap
cbs = [TensorBoardProjectorCallback(layer=layer, log_dir=Path.home()/'tmp'/'runs'/'vision2')]
_ = learn.get_preds(dl=dl, cbs=cbs)

projector_word_embeddings

fastai text or language model learner

from fastai.text.all import TextDataLoaders, text_classifier_learner, AWD_LSTM, untar_data, URLs, accuracy
from fastai.callback.tensorboard import projector_word_embeddings
from pathlib import Path
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
projector_word_embeddings(learn, limit=1000, log_dir=Path.home()/'tmp'/'runs'/'text')

transformers

GPT2

from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from fastai.callback.tensorboard import projector_word_embeddings
from pathlib import Path
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
layer = model.transformer.wte
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]

projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, log_dir=Path.home()/'tmp'/'runs'/'transformers')

BERT

from transformers import AutoTokenizer, AutoModel
from fastai.callback.tensorboard import projector_word_embeddings
from pathlib import Path
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

layer = model.embeddings.word_embeddings

vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]

projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000, log_dir=Path.home()/'tmp'/'runs'/'transformers')
warning: Embedding dir exists, did you set global_step for add_embedding()?
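The warning appears because the GPT2 and BERT examples above both write to the same transformers log_dir, so the second export finds an existing projector config. A simple workaround (an assumption, not part of the original examples) is to give each export its own directory:

# hypothetical fix: use a separate subdirectory per export
projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000,
                          log_dir=Path.home()/'tmp'/'runs'/'transformers-bert')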

Validate results in tensorboard

Run the following command in the terminal to check whether the projector embeddings have been written correctly:

tensorboard --logdir=~/tmp/runs

Open http://localhost:6006 in a browser (the TensorBoard Projector doesn't work correctly in Safari!)