What is a (Fully Connected) Feed Forward Network?
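As a rough illustration of the fully connected feed-forward network named above, here is a minimal two-layer sketch in NumPy; the dimensions and the ReLU activation are assumptions chosen for the example, not taken from any particular model.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # Two fully connected layers with a ReLU non-linearity in between.
    hidden = np.maximum(0, x @ W1 + b1)   # (batch, d_hidden)
    return hidden @ W2 + b2               # (batch, d_out)

# Illustrative dimensions only.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))              # batch of 4 inputs, 16 features each
W1, b1 = rng.normal(size=(16, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 16)), np.zeros(16)
print(feed_forward(x, W1, b1, W2, b2).shape)  # (4, 16)
```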
The situation where a model is able to perform a task or respond to an input without ever having been trained on that task.
What is zero-shot transfer?
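One way to see zero-shot transfer in practice is the Hugging Face transformers zero-shot classification pipeline, sketched below; the example sentence and candidate labels are invented for illustration, and the default model is whatever the library selects.

```python
# A minimal illustration of zero-shot transfer: the model assigns the text to
# labels it was never explicitly trained to classify.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The new graphics card renders the game at 120 frames per second.",
    candidate_labels=["technology", "cooking", "politics"],
)
print(result["labels"][0])  # the label the model considers most likely
```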
The mechanism for calculating how important each word in a sentence is to the other words in a Transformer architecture.
What is attention?
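A minimal sketch of scaled dot-product attention in NumPy, assuming the queries, keys, and values are already computed; the token count and dimension below are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores say how much each query position should attend to each key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V, weights                          # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 tokens, query/key dimension 8
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (5, 8); each row of weights sums to 1
```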
The most popular encoder-decoder architecture, which relies heavily on attention.
What is a Transformer?
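A sketch of an encoder-decoder Transformer using PyTorch's built-in module; the layer counts, model dimension, and random inputs are illustrative assumptions, not the settings of any published model.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder Transformer with toy dimensions.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(1, 10, 64)   # source sequence: 10 token embeddings
tgt = torch.randn(1, 7, 64)    # target sequence: 7 token embeddings
out = model(src, tgt)
print(out.shape)               # torch.Size([1, 7, 64])
```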
The model has never been trained on the task but is able to complete it after seeing only a few examples.
What is few-shot transfer?
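A sketch of few-shot prompting: the task is demonstrated through a handful of worked examples placed in the input itself, with no further training. The reviews, labels, and prompt format below are invented for illustration.

```python
examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
    ("Best meal I have had in years.", "positive"),
]
query = "The service was painfully slow."

# Build a prompt that shows the pattern, then asks for one more completion.
prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)  # fed to a pretrained language model, which continues the pattern
```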
The process of pre-training a model by masking words in the sequence and letting the model predict the missing words.
What is a masked language model?
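A small sketch of the masking step of this objective; the whitespace tokenization, the 15% masking rate, and the toy sentence are assumptions loosely following common practice (e.g. BERT).

```python
import random

random.seed(0)
tokens = "the cat sat on the mat".split()
masked, targets = [], {}
for i, tok in enumerate(tokens):
    if random.random() < 0.15:
        masked.append("[MASK]")
        targets[i] = tok          # the model is trained to recover this token
    else:
        masked.append(tok)
print(masked, targets)
```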
This architecture looks at one token at a time and remembers information from previous time steps/tokens.
What is a Recurrent Neural Network?
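A minimal recurrent step in NumPy to show how the hidden state carries information from previous tokens forward; the tanh activation, dimensions, and random inputs are illustrative assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # New hidden state mixes the current token with the previous hidden state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 8, 16, 5
W_xh = rng.normal(size=(d_in, d_hidden))
W_hh = rng.normal(size=(d_hidden, d_hidden))
b_h = np.zeros(d_hidden)

h = np.zeros(d_hidden)
for x_t in rng.normal(size=(seq_len, d_in)):   # one token at a time
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (16,): the final hidden state summarizes the sequence
```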
The process of adapting a pretrained language model to a specific task by training it further on data for that task.
What is fine-tuning?
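A sketch of fine-tuning in PyTorch: start from pretrained weights, attach a small task head, and continue training on task-specific data. The tiny encoder, placeholder data, and hyperparameters below stand in for a real pretrained model and dataset.

```python
import torch
import torch.nn as nn

pretrained = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # stands in for a pretrained encoder
task_head = nn.Linear(64, 2)                                # e.g. binary classification head
model = nn.Sequential(pretrained, task_head)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # a small learning rate is typical
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)                 # placeholder task inputs
y = torch.randint(0, 2, (16,))          # placeholder task labels
for _ in range(3):                      # a few gradient steps on the task data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print(loss.item())
```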
One algorithm for building subword vocabularies used when training/pretraining neural networks.
What is Byte-Pair Encoding?
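A minimal sketch of the Byte-Pair Encoding merge loop: count adjacent symbol pairs over the corpus and merge the most frequent pair. The toy corpus and the fixed three merges are assumptions; real implementations iterate until a target vocabulary size is reached.

```python
from collections import Counter

corpus = [list("low"), list("lower"), list("lowest")]  # words as symbol lists

def most_frequent_pair(words):
    # Count every adjacent pair of symbols across all words.
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    return pairs.most_common(1)[0][0]

def merge(words, pair):
    # Replace every occurrence of the chosen pair with a single merged symbol.
    a, b = pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

for _ in range(3):                       # perform three merges
    corpus = merge(corpus, most_frequent_pair(corpus))
print(corpus)  # [['low'], ['lowe', 'r'], ['lowe', 's', 't']]
```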