fast.ai ULMFiT helpers to easily use pretrained models

Get model and vocab files from path.

Tokenizer

Get a tokenizer from the model config. Tokenizer parameters in model.json are passed to the Tokenizer. Currently SentencePiece and spaCy are supported.

tokenizer_from_pretrained[source]

tokenizer_from_pretrained(url, pretrained=False, backwards=False, **kwargs)
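A minimal usage sketch, assuming a pretrained ULMFiT model URL and a DataFrame `df` with a `text` column (both are placeholders, not real assets): fetch the tokenizer that matches the pretrained model and pass it to the DataLoaders so tokenization is consistent with the language model.

```python
from fastai.text.all import *

# placeholder URL - substitute the URL of a real pretrained ULMFiT model
url = 'https://example.com/ulmfit-model'

# builds a Tokenizer (SentencePiece or spaCy) from the model's model.json
tok = tokenizer_from_pretrained(url, pretrained=True)

# use the same tokenizer when creating the language-model DataLoaders
# (df is assumed to be a pandas DataFrame with a 'text' column)
dls = TextDataLoaders.from_df(df, text_col='text', tok_tfm=tok, is_lm=True)
```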

Language Model Learner

Create a language_model_learner from a pretrained model URL. All parameters are passed to language_model_learner. The following parameters are set automatically: arch, pretrained and pretrained_fnames. By default, accuracy and perplexity are passed as metrics.

language_model_from_pretrained[source]

language_model_from_pretrained(dls, url=None, backwards=False, metrics=None, config=None, drop_mult=1.0, pretrained=True, pretrained_fnames=None, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95))
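A sketch of the typical fine-tuning flow, assuming `dls` are language-model DataLoaders built with the matching tokenizer and `url` is the pretrained model URL from above (both assumptions, not real assets):

```python
# downloads the pretrained weights and vocab from url and builds the learner;
# arch, pretrained and pretrained_fnames are set automatically
learn = language_model_from_pretrained(dls, url=url, drop_mult=0.3)

# standard ULMFiT fine-tuning: train the new head first, then unfreeze
learn.fit_one_cycle(1, 2e-2)
learn.unfreeze()
learn.fit_one_cycle(2, 2e-3)
```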

Saves the following model files to path:

  • Model (lm_model.pth)
  • Encoder (lm_encoder.pth)
  • Vocab from dataloaders (lm_vocab.pkl)
  • SentencePieceModel (spm/)

LMLearner.save_lm[source]

LMLearner.save_lm(x:LMLearner, path=None, with_encoder=True, backwards=False)
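Assuming `learn` is a fine-tuned LMLearner as above, a call like the following would write those files (the explicit directory name is hypothetical):

```python
# save model, encoder, vocab and SentencePiece files to the default
# location (learn.path/model_dir)
learn.save_lm()

# or save everything to an explicit (hypothetical) directory instead
learn.save_lm(path='models/ulmfit_finetuned')
```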

Text Classifier


Create a text_classifier_learner from a fine-tuned model path (saved with learn.save_lm()).

text_classifier_from_lm[source]

text_classifier_from_lm(dls, path=None, backwards=False, seq_len=72, config=None, pretrained=True, drop_mult=0.5, n_out=None, lin_ftrs=None, ps=None, max_len=1440, y_range=None, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95))
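A hedged sketch of building the classifier on top of the saved language model. The path, the DataFrame `df` with `text`/`label` columns, and the way the saved vocab is loaded from `lm_vocab.pkl` are assumptions for illustration:

```python
import pickle
from fastai.text.all import *

# hypothetical directory previously passed to learn.save_lm()
path = 'models/ulmfit_finetuned'

# reuse the vocab that save_lm() wrote next to the model files, so the
# classifier's embedding indices line up with the fine-tuned encoder
with open(f'{path}/lm_vocab.pkl', 'rb') as f:
    lm_vocab = pickle.load(f)

# classification DataLoaders; tok is assumed to be the same Tokenizer
# used for language-model fine-tuning
dls_clas = TextDataLoaders.from_df(df, text_col='text', label_col='label',
                                   tok_tfm=tok, text_vocab=lm_vocab)

# classifier initialized from the encoder saved by learn.save_lm()
learn_clas = text_classifier_from_lm(dls_clas, path=path, metrics=[accuracy])
learn_clas.fine_tune(3, 2e-3)
```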

Tests - Tokenizer, LM and Classifier