
Variational Autoencoders

Variational Autoencoders (VAEs) are generative models that learn a smooth, probabilistic latent space. This lets them not only compress and reconstruct data but also generate new samples that resemble the training data.
Learns a continuous latent representation
Enables controlled and meaningful data generation
Widely used in image synthesis, anomaly detection, and representation learning
Represents each input as a distribution in latent space (a mean and a variance) rather than a single point


Architecture of Variational Autoencoder


A VAE is a special kind of autoencoder that can generate new data instead of only compressing and reconstructing it.

It has three main parts:

1. Encoder (Understanding the Input)

2. Latent Space (Adding Some Randomness)

3. Decoder (Reconstructing or Creating New Data)


1. Encoder (Understanding the Input)

The encoder takes input data such as images or text and learns its key features. Instead of outputting one fixed value per latent dimension, it produces two vectors:

Mean (μ): The central value of each latent dimension.
Standard Deviation (σ): A measure of how much each latent value can vary around its mean.

Together, these two vectors define a range of possibilities (a probability distribution) instead of a single point.


2. Latent Space (Adding Some Randomness)

Instead of encoding the input as one fixed point, the model picks a random point within the range given by the mean and standard deviation.
This randomness lets the model create slightly different versions of the data, which is useful for generating new, realistic samples.

z ∼ N(μ, σ²)
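
The sampling step above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full VAE: the values of `mu` and `sigma` are made up, and the helper name `sample_latent` is hypothetical. It draws z as μ + σ·ε with ε ∼ N(0, 1), which is the form commonly called the reparameterization trick.

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as z = mu + sigma * eps, with eps ~ N(0, 1)."""
    eps = rng.standard_normal(mu.shape)  # noise that does not depend on mu or sigma
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])    # hypothetical encoder mean for a 2-D latent space
sigma = np.array([0.1, 0.3])  # hypothetical encoder standard deviation

# Many draws for the same input: each one is a slightly different latent point.
samples = np.stack([sample_latent(mu, sigma, rng) for _ in range(10_000)])
print(samples.mean(axis=0))  # close to mu
print(samples.std(axis=0))   # close to sigma
```

Writing the sample as μ + σ·ε keeps the randomness in ε, so μ and σ remain ordinary outputs of the encoder that gradients can flow through during training.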


3. Decoder (Reconstructing or Creating New Data)

The decoder takes the random sample from the latent space and tries to reconstruct the original input.
Since the encoder gives a range, the decoder can produce new data that is similar but not identical to what it has seen.
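
To make the three parts concrete, here is a rough sketch of one pass through encoder, latent space, and decoder. Everything in it is an assumption for illustration: the layer sizes, the single linear layer per part, and the random (untrained) weights that stand in for learned parameters. Predicting log σ and exponentiating it is one common way to keep σ positive, not something the slides prescribe.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, latent_dim = 8, 2  # hypothetical sizes

# Untrained, randomly initialised weights stand in for learned parameters.
W_mu = rng.standard_normal((input_dim, latent_dim)) * 0.1
W_log_sigma = rng.standard_normal((input_dim, latent_dim)) * 0.1
W_dec = rng.standard_normal((latent_dim, input_dim)) * 0.1

def encode(x):
    """Encoder: map the input to a mean and a standard deviation per latent dimension."""
    mu = x @ W_mu
    sigma = np.exp(x @ W_log_sigma)  # exponentiate so sigma is always positive
    return mu, sigma

def decode(z):
    """Decoder: map a latent point back to input space."""
    return z @ W_dec

x = rng.standard_normal(input_dim)                # stand-in for one input example
mu, sigma = encode(x)
z = mu + sigma * rng.standard_normal(latent_dim)  # random point in the encoder's range
x_reconstructed = decode(z)                       # attempt to rebuild the input

# Generation: skip the encoder and decode a point drawn from a standard normal.
x_new = decode(rng.standard_normal(latent_dim))
```

With trained weights, `x_reconstructed` would approximate `x`, and `x_new` would be a new sample resembling the training data; with the random weights used here, only the data flow is meaningful.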


Transformers

A Transformer is a deep learning architecture designed to handle sequential data such as text. It uses a mechanism called Self-Attention to understand relationships between words in a sentence.
Introduced in the 2017 paper "Attention Is All You Need", Transformers have become the backbone of state-of-the-art models like BERT, GPT, and Vision Transformers.
They excel at capturing long-range dependencies and process all positions of a sequence in parallel, which makes them faster to train and more scalable than RNNs or LSTMs.


Components of Transformer

Self-Attention Mechanism
Feed-Forward Neural Network
Multi-Head Attention
Positional Encoding


Architecture of Transformer

The Transformer model consists of two main components:

1. Encoder

Takes input sequence (sentence)
Converts words into vector representations (embeddings)
Uses self-attention and feed-forward layers
Captures context of each word

The encoder consists of multiple stacked layers, and each layer is composed of two main sub-layers:

Self-Attention Mechanism
Feed-Forward Neural Network
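
The self-attention sub-layer can be illustrated with a short NumPy sketch of single-head scaled dot-product attention. The sizes and the random weight matrices are assumptions chosen only to make the example run; a real model learns these weights and usually runs several heads in parallel.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention; X has shape (seq_len, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # similarity of every token with every token
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # hypothetical sizes
X = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape)      # (4, 8)
print(weights.shape)  # (4, 4)
```

Row i of `weights` says how strongly token i attends to each token in the sequence, which is how the encoder captures the context of each word.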


2. Decoder

Generates output sequence
Uses encoder output + its own self-attention
Used in tasks like translation and text generation

Each decoder layer consists of three main sub-layers:

1. Masked Self-Attention Mechanism

2. Encoder-Decoder Attention Mechanism

3. Feed-Forward Neural Network
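
What sets encoder-decoder attention apart from self-attention is where its inputs come from: the queries are computed from the decoder's states, while the keys and values are computed from the encoder's output. The sketch below shows this under assumed sizes and random weights; the function name `cross_attention` is illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(dec_states, enc_output, W_q, W_k, W_v):
    """Encoder-decoder attention: decoder positions query the encoder output."""
    Q = dec_states @ W_q  # queries come from the decoder
    K = enc_output @ W_k  # keys come from the encoder
    V = enc_output @ W_v  # values come from the encoder
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
src_len, tgt_len, d_model = 5, 3, 8  # hypothetical sizes
enc_output = rng.standard_normal((src_len, d_model))
dec_states = rng.standard_normal((tgt_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

context, weights = cross_attention(dec_states, enc_output, W_q, W_k, W_v)
print(weights.shape)  # (3, 5): one row per output position, one column per input token
```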


How Transformers Work

1. Input Representation

The first step in processing input data involves converting raw text into a format that the transformer model can understand. This involves tokenization and embedding.

Tokenization: The input text is split into smaller units called tokens, which can be words, subwords or characters. Tokenization ensures that the text is broken down into manageable pieces.
Embedding: Each token is then converted into a fixed-size vector using an embedding layer. This layer maps each token to a dense vector representation that captures its semantic meaning.
Positional Encoding: Positional encodings are added to these embeddings to provide information about each token's position within the sequence.
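
The three steps above can be sketched end to end. The example makes several simplifying assumptions: it tokenizes by whitespace (real models typically use subword tokenizers), it uses a random, untrained embedding table, and it uses the sinusoidal positional encoding from "Attention Is All You Need", which is one of several possible schemes.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings; d_model is assumed to be even."""
    positions = np.arange(seq_len)[:, None]   # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # even dimension indices 0, 2, 4, ...
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe

# Tokenization: split on whitespace and map each distinct word to an integer id.
text = "the cat sat on the mat"
tokens = text.split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = np.array([vocab[tok] for tok in tokens])
print(token_ids)  # [4 0 3 2 4 1]

# Embedding: look up one fixed-size vector per token id.
d_model = 8
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), d_model))  # untrained stand-in
embeddings = embedding_table[token_ids]                       # shape (6, 8)

# Positional encoding: add position information to the embeddings.
model_input = embeddings + positional_encoding(len(tokens), d_model)
```

Both occurrences of "the" share the same embedding vector, but after the positional encodings are added their rows in `model_input` differ, so the model can tell the two positions apart.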


2. Encoder Process in Transformers

Input Embedding: The input sequence is tokenized and converted into embeddings with positional encodings added.
Self-Attention Mechanism: Each token in the input sequence attends to every other token to capture dependencies and contextual information.
Feed-Forward Network: The output from the self-attention mechanism is passed through a position-wise feed-forward network.
Layer Normalization and Residual Connections: In the original architecture, each of the two sub-layers above is wrapped in a residual connection followed by layer normalization.
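
Putting the encoder steps together, a single encoder layer might look like the sketch below. It is deliberately simplified and rests on assumptions: one attention head instead of several, no learned scale or shift in the layer normalization, random untrained weights, and the normalize-after-residual ordering of the original paper. The parameter names in `params` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_layer(X, p):
    # Sub-layer 1: self-attention, then residual connection and layer normalization.
    Q, K, V = X @ p["W_q"], X @ p["W_k"], X @ p["W_v"]
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V
    X = layer_norm(X + attn)
    # Sub-layer 2: position-wise feed-forward network (same weights for every token).
    hidden = np.maximum(0.0, X @ p["W_1"] + p["b_1"])  # ReLU activation
    ffn = hidden @ p["W_2"] + p["b_2"]
    return layer_norm(X + ffn)

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 16  # hypothetical sizes
X = rng.standard_normal((seq_len, d_model))
params = {
    "W_q": rng.standard_normal((d_model, d_model)) * 0.1,
    "W_k": rng.standard_normal((d_model, d_model)) * 0.1,
    "W_v": rng.standard_normal((d_model, d_model)) * 0.1,
    "W_1": rng.standard_normal((d_model, d_ff)) * 0.1,
    "b_1": np.zeros(d_ff),
    "W_2": rng.standard_normal((d_ff, d_model)) * 0.1,
    "b_2": np.zeros(d_model),
}
out = encoder_layer(X, params)
print(out.shape)  # (4, 8)
```

Because the output has the same shape as the input, layers like this can be stacked, which is how a multi-layer encoder is built.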


3. Decoder Process

Input Embedding and Positional Encoding: The partially generated output sequence is tokenized and embedded with positional encodings added.
Masked Self-Attention Mechanism: The decoder uses masked self-attention to prevent each position from attending to future tokens, ensuring that the model generates the sequence step by step.
Encoder-Decoder Attention Mechanism: The decoder attends to the encoder's output, allowing it to focus on relevant parts of the input sequence.
Feed-Forward Network: As in the encoder, the output from the attention mechanisms is passed through a position-wise feed-forward network.
Layer Normalization and Residual Connections: As in the encoder, residual connections and layer normalization are applied around each sub-layer.
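
The masking in the decoder's self-attention can be shown with a small sketch. Scores for future positions are set to negative infinity before the softmax, so they receive exactly zero attention weight. Sizes and weights are illustrative assumptions, and only a single head is shown.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)  # exp(-inf) evaluates to 0, so masked positions drop out
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(X, W_q, W_k, W_v):
    """Self-attention in which position i may only attend to positions 0..i."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    seq_len = X.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)                      # hide future tokens
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # hypothetical sizes
X = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

out, weights = masked_self_attention(X, W_q, W_k, W_v)
print(np.round(weights, 2))  # lower-triangular: zeros above the diagonal
```

The first row of `weights` is always [1, 0, 0, 0], because the first position can only attend to itself.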


4. Training and Inference

Training

Uses Teacher Forcing (the correct previous tokens are given as decoder input)
Predicts the next token based on the actual previous tokens
Uses a loss function (e.g., cross-entropy) for learning
Faster training because all positions are processed in parallel
Based on the "Attention Is All You Need" architecture
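
Teacher forcing and the cross-entropy loss can be illustrated as follows. The token ids, the special `BOS`/`EOS` ids, and the vocabulary size are all made up, and random logits stand in for the output of a real decoder; the point is only how decoder inputs and targets line up.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-probability of the target token ids under softmax(logits)."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

BOS, EOS = 0, 1                    # hypothetical special-token ids
target = np.array([5, 7, 3, EOS])  # hypothetical ground-truth output tokens

# Teacher forcing: the decoder input is the ground truth shifted right by one,
# so position t sees the correct tokens before t and must predict target[t].
decoder_input = np.concatenate(([BOS], target[:-1]))
print(decoder_input)  # [0 5 7 3]

vocab_size = 10
rng = np.random.default_rng(0)
# Stand-in for the decoder: one row of logits per position, all computed at once.
logits = rng.standard_normal((len(decoder_input), vocab_size))
loss = cross_entropy(logits, target)
```

Since the whole shifted sequence is known in advance, the logits for every position can be computed in one pass, which is the source of the parallelism during training.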

Inference

No teacher forcing (the model uses its own predictions as input)
Generates output token by token (step-by-step)
Uses decoding techniques like:
Greedy Search
Beam Search
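
The difference between the two decoding strategies shows up on a toy example. The "model" below is a hypothetical lookup table in which the next-token probabilities depend only on the last token; a real Transformer conditions on the whole sequence. The beam search is a bare-bones version that ranks hypotheses by total log-probability, without the length normalization often used in practice.

```python
import numpy as np

EOS, A, B, START = 0, 1, 2, 3
# Hypothetical toy model: row i is the next-token distribution after token i.
NEXT_PROBS = np.array([
    [1.0, 0.0, 0.0, 0.0],    # after EOS (never used: decoding stops at EOS)
    [0.4, 0.3, 0.3, 0.0],    # after A
    [0.9, 0.05, 0.05, 0.0],  # after B
    [0.0, 0.6, 0.4, 0.0],    # after START
])

def next_log_probs(seq):
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        return np.log(NEXT_PROBS[seq[-1]])

def greedy_search(start, max_len=5):
    """Always append the single most probable next token."""
    seq = [start]
    while len(seq) < max_len and seq[-1] != EOS:
        seq.append(int(np.argmax(next_log_probs(seq))))
    return seq

def beam_search(start, beam_width=2, max_len=5):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_len - 1):
        candidates = []
        for seq, score in beams:
            if seq[-1] == EOS:  # finished hypotheses are carried over unchanged
                candidates.append((seq, score))
                continue
            for tok, lp in enumerate(next_log_probs(seq)):
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(greedy_search(START))               # [3, 1, 0], probability 0.6 * 0.4 = 0.24
print(beam_search(START, beam_width=2))   # [3, 2, 0], probability 0.4 * 0.9 = 0.36
```

Greedy search commits to A because it is the likelier first token, while beam search keeps B alive long enough to find the sequence with the higher overall probability. With `beam_width=1`, beam search reduces to greedy search.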


Applications

Machine Translation
Chatbots (e.g., ChatGPT)
Text Summarization
Question Answering
Speech Recognition
