Depending on the downstream task, it is feasible to change the original architecture of an LLM or to retrain a conditional language model from scratch.
While this approach is promising, it is limited by the increased consumption of computing resources and by the lack of sufficient labeled data.

CTRL, proposed by @keskarCTRLConditionalTransformer2019, was an early attempt in this direction. It trains an LLM conditioned on a variety of control codes. The model is a Transformer, and a control code is prepended to each sequence of the training corpus. This transforms the original language model into a conditional language model that generates text conditioned on the control code.
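
As a minimal sketch of this conditioning scheme (the notation below is assumed, with $c$ denoting the control code prepended to the sequence), the model learns the standard autoregressive factorization

$$
p(x \mid c) = \prod_{i=1}^{n} p(x_i \mid x_{<i}, c),
$$

so that every next-token prediction can attend to the control code.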

@zhengPretrainingBasedPersonalized2019 propose an LLM-based method to build a personalized dialogue agent. The model is based on the vanilla Transformer, and its parameters are inherited from an existing pre-trained LLM. The personalized information (such as the speaker's gender and location) is represented as attribute embeddings, which are added to the token embeddings and the positional embeddings on the encoder side of the model. To guarantee that the decoder also incorporates the target persona information during decoding, an attention routing mechanism is added by extending the multi-head attention layers.
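
As a rough illustration of how such attribute embeddings can be combined with the token and positional embeddings on the encoder side, consider the following PyTorch sketch; the class name, dimensions, and attribute vocabulary are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class PersonaEncoderEmbedding(nn.Module):
    """Sketch: persona attributes as an extra embedding summed with
    token and positional embeddings (names and sizes are illustrative)."""

    def __init__(self, vocab_size=32000, n_attributes=16, d_model=512, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.attr_emb = nn.Embedding(n_attributes, d_model)  # e.g. gender/location buckets

    def forward(self, token_ids, attr_ids):
        # token_ids: (batch, seq_len); attr_ids: (batch,) one attribute id per dialogue
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_emb(token_ids)                 # (batch, seq_len, d_model)
        x = x + self.pos_emb(positions)               # positional embedding, broadcast over batch
        x = x + self.attr_emb(attr_ids).unsqueeze(1)  # persona embedding, broadcast over sequence
        return x


# Usage example with toy shapes
emb = PersonaEncoderEmbedding()
tokens = torch.randint(0, 32000, (2, 10))  # two sequences of length 10
attrs = torch.tensor([3, 7])               # one persona attribute id per sequence
print(emb(tokens, attrs).shape)            # torch.Size([2, 10, 512])
```

Summing the three embeddings keeps the encoder architecture unchanged while letting every token representation carry the persona signal; the attention routing on the decoder side is a separate extension not shown here.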