20:23
Connectionist Temporal Classification (CTC) Explained
DataMListic
11:05
Long Short-Term Memory (LSTM) Equations Explained
9:29
Transformer Self-Attention Mechanism Visualized
8:59
Term Frequency Inverse Document Frequency (TF-IDF) Explained
9:17
Gated Recurrent Unit (GRU) Equations Explained
9:04
ReLU Activation Function Variants Explained
3:55
Model Calibration - Estimated Calibration Error (ECE) Explained
5:58
Mel Frequency Cepstral Coefficients (MFCC) Explained
7:08
Multivariate Normal (Gaussian) Distribution Explained
3:27
AdamW - L2 Regularization vs Weight Decay
4:23
Bagging vs Boosting - Ensemble Learning In Machine Learning Explained
2:58
Spectral Features - Deltas and Delta-Deltas Explained
4:02
Measuring Artificial Intelligence (AI) Fairness - Disparate Impact Explained
4:09
Gradient Boosting with Regression Trees Explained
4:18
Model Calibration - Brier Score Explained
7:18
ROC Curve - how to select the BEST threshold
5:40
Object Detection Part 1: R-CNN, Sliding Window and Selective Search
6:06
Capsule Networks Explained
10:15
Fourier Transform Formula Explained
4:49
Gaussian Mixture Models (GMM) Explained
2:55
XGBoost Explained in Under 3 Minutes
4:54
Discrete Fourier Transform (DFT and IDFT) Explained in Python
4:03
Low-Rank Adaptation (LoRA) Explained
4:32
Kabsch-Umeyama Algorithm - How to Align Point Patterns
7:24
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained
7:35
Eigendecomposition Explained
4:36
Covariance and Correlation Explained
3:21
Kullback-Leibler (KL) Divergence Mathematics Explained
8:04
P-Values Explained
8:11
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p
3:40
Two Towers vs Siamese Networks vs Triplet Loss - Compute Comparable Embeddings
3:15
Spearman Correlation Explained in 3 Minutes
2:44
Word Error Rate (WER) Explained - Measuring the performance of speech recognition systems
Hyperparameters Tuning: Grid Search vs Random Search
5:14
LLM Tokenizers Explained: BPE Encoding, WordPiece and SentencePiece
3:36
BART Explained: Denoising Sequence-to-Sequence Pre-training
3:51
Sliding Window Attention (Longformer) Explained
3:38
Cross-Validation Explained
5:48
BLEU Score Explained
ROUGE Score Explained
Singular Value Decomposition (SVD) Explained
8:03
Vector Database Search - Hierarchical Navigable Small Worlds (HNSW) Explained
Least Squares vs Maximum Likelihood
8:10
The Bitter Lesson in AI...
4:04
L1 vs L2 Regularization
4:11
Overfitting vs Underfitting - Explained
2:37
Why L1 Regularization Produces Sparse Weights (Geometric Intuition)
Content-Based Recommendations - Recommender Systems Part 1
Collaborative Filtering - Recommender Systems Part 2
3:41
Real-World Challenges in Recommender Systems - Recommender Systems Part 3
3:59
Dropout in Neural Networks - Explained
4:13
Why Neural Networks Need Random Weight Initialization
4:27
Cross-Entropy - Explained
8:07
The Curse of Dimensionality
9:03
The Kernel Trick
4:22
RBF Kernel Explained: Mapping Data to Infinite Dimensions
8:15
Bayesian Optimization
9:33
Gaussian Processes
5:44
An Introduction to Graph Neural Networks
5:53
Introduction to HMMs | Hidden Markov Models Part 1
4:52
Google's AlphaEvolve - Paper Walkthrough
5:15
SAM2: Segment Anything in Images and Videos - Paper Walkthrough
5:18
Perception Encoder - Paper Walkthrough
8:02
t-SNE - Explained
8:21
Student's t-Distribution - Explained
9:01
Magistral - Paper Walkthrough
5:35
Variational Inference - Explained
5:31
Gamma Function - Explained