I cover the latest AI tech/research papers for fun



bycloud

RTX4080 SUPER GIVEAWAY --- 4 DAYS LEFT (only 20 people joined so far??)
Attend any free NVIDIA GTC 2025 virtual session now and win a GPU!

For more giveaway instructions, or to join now: docs.google.com/forms/d/e/1FAIpQLSflyZe-qyx0X2Q5Y2…

Sign up for GTC sessions: nvda.ws/48s4tmc

1 month ago (edited) | [YT] | 78

bycloud

🚨This week’s top AI/ML research papers:
- GPT-4o System Card
- Are LLMs Better than Reported?
- Can Language Models Replace Programmers?
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking
- SelfCodeAlign
- Mixture of Parrots
- Unpacking SDXL Turbo
- A prescriptive theory for brain-like inference
- Modular Duality in Deep Learning
- Learning Video Representations without Natural Videos
- CORAL
- Task Vectors are Cross-Modal
- Mind Your Step (by Step)
- ShadowKV
- MarDini
- COAT
- Fast Best-of-N Decoding via Speculative Rejection
- Continuous Speech Synthesis using per-token Latent Diffusion
- Teach Multimodal LLMs to Comprehend Electrocardiographic Images
- FasterCache
- Read-ME
- VibeCheck
- HoPE
- In-Context LoRA for Diffusion Transformers
- Knowledge Graph Enhanced Language Agents for Recommendation
- $100K or 100 Days
- On Memorization of Large Language Models in Logical Reasoning
- Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
- Grounding by Trying
- Relaxed Recursive Transformers
- Combining Induction And Transduction For Abstract Reasoning


overview for each + authors' explanations
x.com/TheAITimeline/status/1853284642238968233

read it on my newsletter instead
mail.bycloud.ai/p/this-week-s-top-ai-ml-research-p…
Hope you like it, and have a great week!

join patreon to support me:
www.patreon.com/bycloud

5 months ago (edited) | [YT] | 308

bycloud

🚨This week’s top AI/ML research papers:
- Sparse Crosscoders
- Rethinking Softmax
- Mechanistic Unlearning
- Decomposing The Dark Matter of Sparse Autoencoders
- ZIP-FIT
- Automatically Interpreting Millions of Features in Large Language Models
- Breaking the Memory Barrier
- Can Knowledge Editing Really Correct Hallucinations?
- Framer: Interactive Frame Interpolation
- Beyond position
- A Hitchhiker's Guide to Scaling Law Estimation
- Scaling up Masked Diffusion Models on Text
- Why Does the Effective Context Length of LLMs Fall Short?
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models
- Improve Vision Language Model Chain-of-thought Reasoning
- PyramidDrop
- FrugalNeRF
- SAM2Long
- SeerAttention
- FiTv2

overview for each + authors' explanations
x.com/TheAITimeline/thread/1850237734381834447

read it on a website instead
mail.bycloud.ai/p/this-week-s-top-ai-ml-research-p…

Hope you like it, and have a great week!

join patreon to support me:
www.patreon.com/bycloud

6 months ago (edited) | [YT] | 389

bycloud

🚨This week’s top AI/ML research papers:
- REPA: Representation Alignment for Generation
- Sabotage evaluations for frontier models
- Janus
- What Matters in Transformers? Not All Attention is Needed
- The Curse of Multi-Modalities
- When Attention Sink Emerges in Language Models
- Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
- Sample what you can't compress
- Mix Data or Merge Models?
- Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
- SeedLM
- LOKI
- Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
- Baichuan-Omni Technical Report
- KV Prediction for Improved Time to First Token
- Thinking LLMs
- MoH
- WorldCuisines
- Fluid
- FlatQuant
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- Revealing the Barriers of Language Agents in Planning
- HumanEval-V
- EvolveDirector
- Self-Data Distillation for Recovering Quality in Pruned Large Language Models
- CoTracker3
- SANA
- LLM X MapReduce
- MLLM can see?
- Animate-X
- Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
- Model Swarms
- Fundamental Limitations on Subquadratic Alternatives to Transformers
- Inference Scaling for Long-Context Retrieval Augmented Generation
- Refined LLC
- TorchTitan
- You Know What I'm Saying: Jailbreak Attack via Implicit Reference
- Strong Model Collapse

overview for each + authors' explanations
x.com/TheAITimeline/thread/1848037103554441635

read it on a website instead
mail.bycloud.ai/p/this-week-s-top-ai-ml-research-p…

Hope you like it, and have a great week!

join patreon to support me:
www.patreon.com/bycloud

6 months ago (edited) | [YT] | 425

bycloud

🚨This week’s top AI/ML research papers:
- Differential Transformer
- GSM-Symbolic
- Pixtral 12B
- Intelligence at the Edge of Chaos
- Cheating Automatic LLM Benchmarks
- nGPT
- Upcycling Large Language Models into Mixture of Experts
- Personalized Visual Instruction Tuning
- Towards World Simulator
- Only-IF
- Addition is All You Need for Energy-efficient Language Models
- Selective Attention Improves Transformer
- MLLM as Retriever
- Rectified Diffusion
- Everything Everywhere All at Once
- Astute RAG
- LLMs Are In-Context Reinforcement Learners
- Scaling Laws For Diffusion Transformers
- EVOLvE
- Rewarding Progress
- Falcon Mamba
- Efficient Dictionary Learning with Switch Sparse Autoencoders
- Scaling Up Your Kernels
- RL, but don't do anything I wouldn't do
- Aria: An Open Multimodal Native Mixture-of-Experts Model
- Inheritune: Training Smaller Yet More Attentive Language Models

overview for each + authors' explanations
x.com/TheAITimeline/thread/1845511652550038003

read it on a website instead
mail.bycloud.ai/p/this-week-s-top-ai-ml-research-p…

Hope you like it, and have a great week!

join patreon to support me:
www.patreon.com/bycloud

6 months ago (edited) | [YT] | 375

bycloud

🚨This week’s top AI/ML research papers:
- MovieGen
- Were RNNs All We Needed?
- Contextual Document Embeddings
- RLEF
- ENTP
- VinePPO
- When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
- LLMs Know More Than They Show
- Video Instruction Tuning With Synthetic Data
- PHI-S
- Thermodynamic Bayesian Inference
- Emu3: Next-Token Prediction is All You Need
- Lattice-Valued Bottleneck Duality
- Loong
- Archon
- Direct Judgement Preference Optimization
- Depth Pro
- MIO: A Foundation Model on Multimodal Tokens
- MM1.5
- PhysGen
- Cottention
- UniAff
- Hyper-Connections
- Image Copy Detection for Diffusion Models
- RATIONALYST
- From Code to Correctness
- Not All LLM Reasoners Are Created Equal
- VPTQ: Extreme Low-bit Vector Post-Training Quantization for LLMs
- Leopard: A VLM For Text-Rich Multi-Image Tasks
- Selective Aggregation for LoRA in Federated Learning
- Quantifying Generalization Complexity for Large Language Models
- FactAlign: Long-form Factuality Alignment of LLMs
- Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation?
- Law of the Weakest Link: Cross Capabilities of Large Language Models
- TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
- One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
- Looped Transformers for Length Generalization
- Illustrious
- LLaVA-Critic
- Contrastive Localized Language-Image Pre-Training
- Large Language Models as Markov Chains
- CLIP-MoE
- SageAttention
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
- Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
- EVER
- The bunkbed conjecture is false

overview for each + authors' explanations
x.com/TheAITimeline/thread/1842759118509002777

read it on a website instead
mail.bycloud.ai/p/this-week-s-top-ai-ml-research-p…

Hope you like it, and have a great week!

join patreon to support me:
www.patreon.com/bycloud

6 months ago (edited) | [YT] | 593

bycloud

🚨This week’s top AI/ML research papers:
- Molmo and PixMo
- MaskLLM
- Are We Closer to an AI Doctor?
- Programming Every Example
- MIMO
- Pixel-Space Post-Training of Latent Diffusion Models
- Phantom of Latent for Large Language and Vision Models
- Making Text Embedders Few-Shot Learners
- Discovering the Gems in Early Layers
- Imagine yourself
- MonoFormer
- Instruction Following without Instruction Tuning
- HelloBench
- YesBut
- EMOVA
- LLaVA-3D
- Boosting Healthcare LLMs Through Retrieved Context
- RACER
- Present and Future Generalization of Synthetic Image Detectors
- Time-MoE
- Reflecting Reality
- Improvements to SDXL in NovelAI Diffusion V3
- MaskBit

overview for each + authors' explanations
x.com/TheAITimeline/thread/1840195222698885383

read it on a website instead
mail.bycloud.ai/p/this-week-s-top-ai-ml-research-p…


Please consider giving it some love and giving the post a like <3
join patreon to support me:
www.patreon.com/bycloud

6 months ago (edited) | [YT] | 456

bycloud

🚨This week’s top AI/ML research papers:
- Qwen2.5-Coder Technical Report
- GRIN
- Moshi
- Training Language Models to Self-Correct via RL
- To CoT or not to CoT?
- OmniGen
- NVLM
- Qwen2-VL
- Kolmogorov-Arnold Transformer
- InfiMM-WebMath-40B
- MMSearch
- LVCD
- Scaling Smart
- Language Models Learn to Mislead Humans via RLHF
- A Controlled Study on Long Context Extension and Generalization in LLMs
- LLMs + Persona-Plug = Personalized LLMs
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
- Promptriever
- Phidias
- On the limits of agency in agent-based models
- SplatFields
- Seed-Music
- RetrievalAttention
- jina-embeddings-v3
- On the Diagram of Thought
- Iteration of Thought
- Breaking reCAPTCHAv2
- DrawingSpinUp

overview for each + authors' explanations
x.com/TheAITimeline/status/1837986580881141898

read it on Notion instead
admitted-caution-d79.notion.site/This-week-s-top-A…

If you do have X, please consider giving it some love and giving the post a like <3

Stay tuned for a selected few with simplified explanations on my newsletter:
mail.bycloud.ai/

join patreon to support me:
www.patreon.com/bycloud

7 months ago (edited) | [YT] | 400

bycloud

🚨This week’s top AI/ML research papers:

- Can LLMs Generate Novel Research Ideas?
- LLaMA-Omni
- SaRA
- How Do Your Code LLMs Perform?
- GroUSE
- Open-MAGVIT2
- Qihoo-T2X
- Windows Agent Arena
- Configurable Foundation Models
- Paper Copilot
- MemoRAG
- MMEvol
- INTRA
- DSBench
- PingPong
- Agent Workflow Memory
- Gated Slot Attention for Efficient Linear-Time Sequence Modeling
- Hi3D
- The AdEMAMix Optimizer
- DepthCrafter
- MLR-Copilot
- LinFusion
- gsplat
- SongCreator
- Draw an Audio


overview for each + authors' explanations
x.com/TheAITimeline/thread/1835123984167219684

read it on Notion instead
admitted-caution-d79.notion.site/This-week-s-top-A…

If you do have X, please consider giving it some love and giving the post a like <3

Stay tuned for a selected few with simplified explanations on my newsletter:
mail.bycloud.ai/

join patreon to support me:
www.patreon.com/bycloud

7 months ago (edited) | [YT] | 548

bycloud

🚨This week’s top AI/ML research papers:

- OLMoE
- FLUX that Plays Music
- Loopy
- LongLLaVA
- In Defense of RAG in the Era of Long-Context Language Models
- FuzzCoder
- Guide-and-Rescale
- LongCite
- MMMU-Pro
- Kvasir-VQA
- LongRecipe
- VQ4DiT
- xLAM
- AlphaProteo
- Late Chunking

overview for each + authors' explanations
x.com/TheAITimeline/thread/1832891905581216107

read it on Notion instead
admitted-caution-d79.notion.site/This-week-s-top-A…

If you do have X, please consider giving it some love and giving the post a like <3

Stay tuned for a selected few with simplified explanations on my newsletter:
mail.bycloud.ai/

7 months ago (edited) | [YT] | 485