DataMListic
Why Batch Normalization (batchnorm) Works (3:56)
Capsule Networks Explained (6:06)
Why Deep Neural Networks (DNNs) Underperform Tree-Based Models on Tabular Data (5:37)
AMSGrad - Why Adam FAILS to Converge (8:19)
Why Neural Networks Can Learn Any Function (4:29)
Why Residual Connections (ResNet) Work (4:58)
Deep by Design: Why Depth Matters in Neural Networks
Why ReLU Is Better Than Other Activation Functions | Tanh Saturating Gradients (9:01)
Why The Reset Gate is Necessary in GRUs (12:24)
Why Recurrent Neural Networks (RNN) Suffer from Vanishing Gradients - Part 2 (3:44)
Why We Need Activation Functions In Neural Networks (2:42)
Why Convolutional Neural Networks Are Not Permutation Invariant (4:02)
Why Recurrent Neural Networks Suffer from Vanishing Gradients - Part 1 (12:58)
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained (7:24)
Low-Rank Adaptation (LoRA) Explained (4:03)
Gated Recurrent Unit (GRU) Equations Explained (9:17)
Long Short-Term Memory (LSTM) Equations Explained (11:05)
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p (8:11)
Two Towers vs Siamese Networks vs Triplet Loss - Compute Comparable Embeddings (3:40)
LLM Tokenizers Explained: BPE Encoding, WordPiece and SentencePiece (5:14)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - Paper Explained (13:59)
Chain-of-Verification (COVE) Reduces Hallucination in Large Language Models - Paper Explained (27:43)
RLHF: Training Language Models to Follow Instructions with Human Feedback - Paper Explained (20:28)
BART Explained: Denoising Sequence-to-Sequence Pre-training (3:36)
Sliding Window Attention (Longformer) Explained (3:51)
BLEU Score Explained (5:48)
ROUGE Score Explained (3:27)
Vector Database Search - Hierarchical Navigable Small Worlds (HNSW) Explained (8:03)
Overfitting vs Underfitting - Explained (4:11)
Why L1 Regularization Produces Sparse Weights (Geometric Intuition) (2:37)
Dropout in Neural Networks - Explained (3:59)
Why Neural Networks Need Random Weight Initialization (4:13)
Cross-Entropy - Explained (4:27)
The Curse of Dimensionality (8:07)
An Introduction to Graph Neural Networks (5:44)
MMaDA: Multimodal Large Diffusion Language Models - Paper Walkthrough (5:05)
Google's AlphaEvolve - Paper Walkthrough (4:52)
SAM2: Segment Anything in Images and Videos - Paper Walkthrough (5:15)