DataMListic
Why Batch Normalization (batchnorm) Works (3:56)
Capsule Networks Explained (6:06)
Why Deep Neural Networks (DNNs) Underperform Tree-Based Models on Tabular Data (5:37)
AMSGrad - Why Adam FAILS to Converge (8:19)
Why Neural Networks Can Learn Any Function (4:29)
Why Residual Connections (ResNet) Work (4:58)
Deep by Design: Why Depth Matters in Neural Networks
Why ReLU Is Better Than Other Activation Functions | Tanh Saturating Gradients (9:01)
Why The Reset Gate is Necessary in GRUs (12:24)
Why Recurrent Neural Networks (RNN) Suffer from Vanishing Gradients - Part 2 (3:44)
Why We Need Activation Functions In Neural Networks (2:42)
Why Convolutional Neural Networks Are Not Permutation Invariant (4:02)
Why Recurrent Neural Networks Suffer from Vanishing Gradients - Part 1 (12:58)
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained (7:24)
Low-Rank Adaptation (LoRA) Explained (4:03)
Gated Recurrent Unit (GRU) Equations Explained (9:17)
Long Short-Term Memory (LSTM) Equations Explained (11:05)
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p (8:11)
Two Towers vs Siamese Networks vs Triplet Loss - Compute Comparable Embeddings (3:40)
LLM Tokenizers Explained: BPE Encoding, WordPiece and SentencePiece (5:14)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - Paper Explained (13:59)
Chain-of-Verification (COVE) Reduces Hallucination in Large Language Models - Paper Explained (27:43)
RLHF: Training Language Models to Follow Instructions with Human Feedback - Paper Explained (20:28)
BART Explained: Denoising Sequence-to-Sequence Pre-training (3:36)
Sliding Window Attention (Longformer) Explained (3:51)
BLEU Score Explained (5:48)
ROUGE Score Explained (3:27)
Vector Database Search - Hierarchical Navigable Small Worlds (HNSW) Explained (8:03)
Overfitting vs Underfitting - Explained (4:11)
Why L1 Regularization Produces Sparse Weights (Geometric Intuition) (2:37)
Dropout in Neural Networks - Explained (3:59)
Why Neural Networks Need Random Weight Initialization (4:13)
Cross-Entropy - Explained (4:27)
The Curse of Dimensionality (8:07)
An Introduction to Graph Neural Networks (5:44)
MMaDA: Multimodal Large Diffusion Language Models - Paper Walkthrough (5:05)
Google's AlphaEvolve - Paper Walkthrough (4:52)
SAM2: Segment Anything in Images and Videos - Paper Walkthrough (5:15)