DataMListic — video list

Why LLMs Hallucinate (5:55)
Low-Rank Adaptation (LoRA) Explained (4:03)
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained (7:24)
Transformer Self-Attention Mechanism Visualized (9:29)
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p (8:11)
Jailbroken: How Does LLM Safety Training Fail? - Paper Explained (26:16)
LLM Tokenizers Explained: BPE Encoding, WordPiece and SentencePiece (5:14)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - Paper Explained (13:59)
Chain-of-Verification (COVE) Reduces Hallucination in Large Language Models - Paper Explained (27:43)
RLHF: Training Language Models to Follow Instructions with Human Feedback - Paper Explained (20:28)
BART Explained: Denoising Sequence-to-Sequence Pre-training (3:36)
Sliding Window Attention (Longformer) Explained (3:51)
BLEU Score Explained (5:48)
ROUGE Score Explained (3:27)
Vector Database Search - Hierarchical Navigable Small Worlds (HNSW) Explained (8:03)
MMaDA: Multimodal Large Diffusion Language Models - Paper Walkthrough (5:05)
The Illusion of Thinking - Paper Walkthrough (8:04)
DeepSeek-R1 - Paper Walkthrough (8:36)
Why Larger Language Models Do In-context Learning Differently? - Paper Walkthrough (3:57)
Qwen3 - Paper Walkthrough (5:57)