Course Link: How Transformer LLMs Work

2025-08-14

  • chapter: understanding language models: language as a Bag-of-Words
    • non-transformer, encoder-only, decoder-only, encoder-decoder
      • decoder-only: such as GPT
    • tokenization -> tokens -> vocabulary -> vector embeddings (see the bag-of-words sketch after these notes)
  • chapter: understanding language models: (word) embeddings
    • word2vec: expresses the meaning of a word as an array of floats (a dense vector)
      • such as cat -> [.91, -.11, .19, …] (see the cosine-similarity sketch after these notes)
    • types of embeddings
  • chapter: understanding language models: encoding and decoding context with attention
    • recurrent neural networks (RNNs)
      • key applications: natural language processing (translation, text generation, sentiment analysis)
      • speech recognition
      • time series prediction (weather, stock price)
    • autoregressive
      • meaning: the model predicts the current (or future) value from past values, and the prediction itself is fed back as input for subsequent predictions (see the generation-loop sketch after these notes)
    • attention
      • allows a model to focus on parts of the input that are relevant to one another
  • chapter: understanding language models: transformers
    • Attention Is All You Need (paper)
    • transformer: a new architecture based solely on attention, with no recurrent layers
    • self-attention (see the self-attention sketch after these notes)
    • representation models, like embedding models
      • Bidirectional Encoder Representations from Transformers (BERT), used for classification
      • pre-training on a large dataset -> fine-tuning for downstream tasks: classification, named entity recognition, paraphrase identification (see the fine-tuning sketch after these notes)
    • generative models, like GPT
    • context length: a key model parameter, the maximum number of tokens the model can process at once
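
A minimal sketch of the tokenization -> tokens -> vocabulary -> vectors pipeline from the Bag-of-Words chapter. The whitespace tokenizer, the tiny corpus, and the count-based vectors are illustrative assumptions, not the course's exact code; real LLMs use subword tokenizers and learned embeddings instead of counts.

```python
# Bag-of-words sketch: tokenize, build a vocabulary, turn text into count vectors.
corpus = ["cats sleep all day", "dogs and cats play all day"]

def tokenize(text):
    # naive whitespace tokenizer; real models use subword tokenizers (BPE, WordPiece)
    return text.lower().split()

# vocabulary: every unique token gets an integer id
vocab = {tok: i for i, tok in enumerate(sorted({t for s in corpus for t in tokenize(s)}))}

def bag_of_words(text):
    # count how often each vocabulary token appears; word order is discarded
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        vec[vocab[tok]] += 1
    return vec

print(vocab)
print(bag_of_words("cats play all day"))
```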
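A sketch of how dense embeddings like word2vec's are compared: words with similar meanings get vectors with high cosine similarity. The vectors below are made up for illustration, not real word2vec output, and real embeddings typically have hundreds of dimensions.

```python
import numpy as np

# Made-up 3-dimensional embedding vectors for illustration.
embeddings = {
    "cat": np.array([0.91, -0.11, 0.19]),
    "kitten": np.array([0.88, -0.08, 0.22]),
    "car": np.array([-0.30, 0.75, 0.05]),
}

def cosine_similarity(a, b):
    # cosine of the angle between two vectors: close to 1.0 means similar direction/meaning
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # high
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # low
```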
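A sketch of the autoregressive loop described in the attention chapter: each prediction is appended to the input and fed back in for the next step. `predict_next_token` is a hypothetical stand-in for a trained model, which would really return a probability distribution over the vocabulary.

```python
def predict_next_token(tokens):
    # hypothetical stand-in for a trained language model
    canned = {"the": "cat", "cat": "sat", "sat": "down"}
    return canned.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)  # predict from everything generated so far
        if next_token == "<eos>":
            break
        tokens.append(next_token)                # feed the prediction back as input
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```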
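A minimal scaled dot-product self-attention sketch in NumPy, the mechanism the transformer architecture is built on: every token's output is a weighted mix of all tokens, with weights based on relevance. The random projection matrices and tiny dimensions are assumptions for illustration; real models use many attention heads and learned weights.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projection matrices
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # relevance of each token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                # weighted mix of all token values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (4, 8)
```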
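A sketch of the pre-train -> fine-tune pattern for representation models like BERT, using the Hugging Face `transformers` API. The model name and the two-label setup are assumptions; the classification head here is freshly initialized, so its outputs are meaningless until it is fine-tuned on labeled data.

```python
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained BERT encoder and attach a new classification head (2 labels assumed).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("This course explains transformers clearly.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # (1, 2) raw scores; roughly random before fine-tuning
print(logits.softmax(dim=-1))
```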