Data Science with Keerthi (தமிழில்)
New video in 𝐌𝐚𝐜𝐡𝐢𝐧𝐞 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 மெதுவாக Series !
🎯 New YouTube Video Drop — 𝐆𝐫𝐚𝐝𝐢𝐞𝐧𝐭 𝐁𝐨𝐨𝐬𝐭𝐢𝐧𝐠 (Ft. The AC Dataset)
📺 Watch full breakdown → https://youtu.be/UFMBXkB7BR0?si=bnyjy...
🔸 Gradient = derivative of the loss w.r.t. the prediction
🔸 Each tree fits the residuals left by the previous step
🔸 Regression → MSE loss
🔸 Classification → Log-loss & log-odds
🔸 Controlled updates using learning rate
🔸 Final prediction = base + sum of all tiny corrections
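The six bullets above fit in a few lines of plain Python. This is a toy illustration, not the video's code: the one-split stumps, the AC-style numbers, and the helper names (fit_stump, gradient_boost) are all invented here.

```python
# Toy gradient boosting for regression with MSE loss, built from
# one-split "stumps". Each stump fits the residuals (the negative
# gradient of MSE) left by the ensemble so far; the learning rate
# shrinks each correction.

def fit_stump(x, residuals):
    """Find the threshold split that best fits the residuals (min SSE)."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_trees=50, lr=0.1):
    base = sum(y) / len(y)                   # start from the mean
    stumps = []
    preds = [base] * len(x)
    for _ in range(n_trees):
        # residuals = -dLoss/dpred for MSE loss
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # controlled update via the learning rate
        preds = [pi + lr * stump(xi) for pi, xi in zip(preds, x)]
    # final prediction = base + sum of all tiny corrections
    return lambda xi: base + sum(lr * s(xi) for s in stumps)

# AC-style toy data: temperature (°C) -> power draw (kW)
x = [20, 22, 25, 28, 30, 33, 35, 38]
y = [0.8, 0.9, 1.1, 1.6, 1.9, 2.4, 2.6, 3.0]
model = gradient_boost(x, y)
```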
#GradientBoosting #DataScience #MachineLearning #DataScienceWithKeerthi #ArtificialIntelligence
1 week ago (edited) | [YT] | 10
Data Science with Keerthi (தமிழில்)
Introducing my new series: “𝐋 𝐟𝐨𝐫 𝐋𝐋𝐌”(Episode - 1):
𝑳𝒂𝒓𝒈𝒆 𝑳𝒂𝒏𝒈𝒖𝒂𝒈𝒆 𝑴𝒐𝒅𝒆𝒍
AI systems trained on massive amounts of text (books, articles, websites, conversations). LLMs are powerful pattern recognizers that feel intelligent because of scale.
Examples: ChatGPT, Claude, and more.
But how exactly are these modern LLMs trained?
1️⃣ 𝐏𝐫𝐞𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠: Model learns language patterns and knowledge from massive text datasets.
2️⃣ 𝐈𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 𝐅𝐢𝐧𝐞-𝐓𝐮𝐧𝐢𝐧𝐠 (𝐒𝐅𝐓): Humans teach the model to follow instructions with curated examples.
3️⃣ 𝐏𝐫𝐞𝐟𝐞𝐫𝐞𝐧𝐜𝐞 𝐂𝐨𝐥𝐥𝐞𝐜𝐭𝐢𝐨𝐧: Multiple model outputs are ranked to capture what humans actually prefer.
4️⃣ 𝐑𝐞𝐰𝐚𝐫𝐝 𝐌𝐨𝐝𝐞𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠: A model is trained to score outputs based on human preferences.
5️⃣ 𝐑𝐞𝐢𝐧𝐟𝐨𝐫𝐜𝐞𝐦𝐞𝐧𝐭 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 (𝐑𝐋𝐇𝐅/𝐃𝐏𝐎): The model is fine-tuned against this reward signal to produce helpful, safe, and aligned outputs.
Tada! ChatGPT is ready to be shipped! 🚀
As usual, I have some handcrafted visualizations to make this easy to digest. Go check them out now →
#AI #ArtificialIntelligence #MachineLearning #DeepLearning #LLM #ChatGPT #ClaudeAI #NaturalLanguageProcessing #NLP #AIResearch #TechInnovation #GenerativeAI #AIExplained #DataScience #AICommunity #RLHF #RLAIF #LLMTraining
2 weeks ago | [YT] | 12
Data Science with Keerthi (தமிழில்)
New video in 𝐌𝐚𝐜𝐡𝐢𝐧𝐞 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 மெதுவாக Series !
AdaBoost (Ft. Saravana Bhavan) - https://youtu.be/EliLxkKUDQ0
Here’s why it’s one of my favorite ML algorithms 👇
✅ 𝐃𝐨𝐞𝐬𝐧’𝐭 𝐝𝐢𝐬𝐜𝐚𝐫𝐝 𝐰𝐞𝐚𝐤 𝐦𝐨𝐝𝐞𝐥𝐬 – it makes them work harder.
✅ 𝐋𝐞𝐚𝐫𝐧𝐬 𝐟𝐫𝐨𝐦 𝐦𝐢𝐬𝐭𝐚𝐤𝐞𝐬 – misclassified points get more weight next round.
✅ 𝐓𝐞𝐚𝐦 𝐞𝐟𝐟𝐨𝐫𝐭 – many weak learners combine into one strong learner.
✅ 𝐒𝐢𝐦𝐩𝐥𝐞 𝐦𝐚𝐭𝐡, 𝐩𝐨𝐰𝐞𝐫𝐟𝐮𝐥 𝐢𝐦𝐩𝐚𝐜𝐭 – a great mix of theory + real-world use.
If you’ve ever wondered how machines get better by learning from errors, this is the algorithm to explore.
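The reweighting loop in miniature. A toy sketch, not the video's implementation: the 1-D dataset and helper names are invented here, and production AdaBoost (e.g. scikit-learn's AdaBoostClassifier) uses deeper trees and more careful numerics.

```python
import math

# Toy AdaBoost on a 1-D dataset with labels in {-1, +1}. Each round,
# a threshold "stump" is fit under the current sample weights, its say
# (alpha) comes from its weighted error, and the weights of the points
# it misclassified are increased for the next round.

def fit_weighted_stump(x, y, w):
    """Best threshold + direction under sample weights w."""
    best = None
    for t in sorted(set(x)):
        for sign in (1, -1):
            pred = [sign if xi <= t else -sign for xi in x]
            err = sum(wi for wi, pi, yi in zip(w, pred, y) if pi != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda xi: sign if xi <= t else -sign)

def adaboost(x, y, rounds=10):
    n = len(x)
    w = [1.0 / n] * n
    learners = []
    for _ in range(rounds):
        err, stump = fit_weighted_stump(x, y, w)
        err = max(err, 1e-10)                    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # the stump's "say"
        learners.append((alpha, stump))
        # misclassified points get more weight next round
        w = [wi * math.exp(-alpha * yi * stump(xi))
             for wi, xi, yi in zip(w, x, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    def predict(xi):                             # weighted vote of weak learners
        score = sum(alpha * stump(xi) for alpha, stump in learners)
        return 1 if score >= 0 else -1
    return predict

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, -1, -1, -1, 1, 1]   # not separable by any single threshold
model = adaboost(x, y)
```

No single stump can classify this pattern, but the weighted vote of several can, which is the whole point of the "team effort".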
#MachineLearning #AI #Boosting #DataScience #AdaBoost
3 weeks ago | [YT] | 13
Data Science with Keerthi (தமிழில்)
New video in 𝐌𝐚𝐜𝐡𝐢𝐧𝐞 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠 மெதுவாக Series ! https://youtu.be/-aI9fjOL9EQ
Welcome to my small 🍎 Apple Shop.
Here, every customer asks me 2 things:
1️⃣ Is this really an apple?
2️⃣ How sweet is it? (in grams of sugar)
My data mind wanted to frame it as a use case, and hence 𝐑𝐚𝐧𝐝𝐨𝐦 𝐅𝐨𝐫𝐞𝐬𝐭 𝐓𝐫𝐞𝐞𝐬 were born!
👉 Classification = “Is Apple or Not?”
👉 Regression = “Sweetness Score”
I also covered ensembles, bagging, and feature sampling — in a super simple way.
No heavy math, just everyday examples.
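The bagging idea from the video, as a toy sketch (the apple data and helper names are invented; a real Random Forest also samples features at each split and grows full trees rather than one-threshold rules):

```python
import random

# Bare-bones bagging, the core idea behind Random Forests: train each
# model on a bootstrap sample (drawn with replacement), then aggregate
# with a majority vote for classification (a mean would be used for
# regression). The "weak model" is a single-threshold rule.

def fit_threshold_rule(data):
    """data: list of (feature, label) -> best one-threshold classifier."""
    best = None
    labels = set(l for _, l in data)
    for t in sorted(set(f for f, _ in data)):
        for left in labels:
            for right in labels:
                errs = sum(1 for f, l in data
                           if (left if f <= t else right) != l)
                if best is None or errs < best[0]:
                    best = (errs, t, left, right)
    _, t, left, right = best
    return lambda f: left if f <= t else right

def bagged_classifier(data, n_models=25, seed=0):
    rng = random.Random(seed)
    # each model sees its own bootstrap sample of the data
    models = [fit_threshold_rule([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    def predict(f):
        votes = [m(f) for m in models]
        return max(set(votes), key=votes.count)   # majority vote
    return predict

# feature = redness score, label = "apple" / "not apple"
data = [(0.9, "apple"), (0.8, "apple"), (0.85, "apple"),
        (0.2, "not apple"), (0.3, "not apple"), (0.1, "not apple")]
model = bagged_classifier(data)
```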
1 month ago | [YT] | 14
Data Science with Keerthi (தமிழில்)
☕ Once upon a coffee (Decision Trees Ft. Filter காப்பி!)…
We were debating over our morning brews — who drinks when, how many cups, who gets stressed, who sleeps well.
That tiny argument turned into something big… 🌳
👉 We built our very own Coffee Chronicles Dataset.
👉 We asked a simple question: Can a Decision Tree predict caffeine addicts vs calm sippers?
👉 From there, the math started pouring in —
🔹 Entropy to measure the “chaos” in our data
🔹 Gini Impurity to split better
🔹 Variance Reduction to handle regression
All explained in simple Tamil, step by step. 🇮🇳
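For reference, the three criteria compute like this in plain Python (the coffee labels below are made up for illustration):

```python
import math

# The three split criteria from the video. Entropy and Gini impurity
# score how mixed a set of class labels is; variance plays the same
# role for regression targets.

def entropy(labels):
    """H = -sum(p * log2(p)) over the label proportions."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def gini(labels):
    """G = 1 - sum(p^2): 0 for a pure node, 0.5 max for two classes."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def variance(values):
    """Used for regression splits: lower variance after a split is better."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# 4 caffeine addicts vs 4 calm sippers -> a maximally mixed node
labels = ["addict"] * 4 + ["calm"] * 4
print(entropy(labels))   # 1.0 (worst case for 2 classes)
print(gini(labels))      # 0.5
```

A good split is one that takes a mixed node like this toward pure nodes, i.e. toward entropy and Gini of 0.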
And now, it’s a video — mixing math + machine learning + Tamil + coffee all in one cup. 🚀
🎥 Here’s the release: https://youtu.be/zkxMjd6Sw2c
#MachineLearning #TamilTech #DataScience #DecisionTrees #AI #DecisionTreesInTamil #Entropy #GiniImpurity
1 month ago | [YT] | 9
Data Science with Keerthi (தமிழில்)
𝐋𝐚𝐝𝐢𝐞𝐬 𝐚𝐧𝐝 𝐆𝐞𝐧𝐭𝐥𝐞𝐦𝐞𝐧 𝐒𝐞𝐫𝐢𝐞𝐬 - (Episode 3)
Sharing go-to snippets that teach you something new in every post.
𝑹𝑨𝑮:
❓ WHO AM I? → A technique to make LLMs accurate, up-to-date, and domain-specific by grounding their answers in external data.
⏳ SINCE WHEN? → Introduced by Facebook AI Research in 2020.
🌍 HOW POPULAR? → Powering today’s AI copilots, chat-with-your-PDF apps, and enterprise assistants!
But wait…… 🤔
How do I actually work?
1️⃣ 𝐃𝐚𝐭𝐚 𝐈𝐧𝐠𝐞𝐬𝐭𝐢𝐨𝐧 📥 — Load PDFs, CSVs, websites, or DBs.
2️⃣ 𝐒𝐩𝐥𝐢𝐭𝐭𝐢𝐧𝐠 ✂️ — Break long docs into chunks.
3️⃣ 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬 🧩 — Turn chunks into vectors.
4️⃣ 𝐕𝐞𝐜𝐭𝐨𝐫 𝐒𝐭𝐨𝐫𝐞 🗂️ — Store in FAISS, Pinecone, Chroma, etc.
5️⃣ 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥 + 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 🤝 — Fetch relevant chunks → pass to LLM → get grounded answers.
🚦Ways to Build Me:
𝐎𝐩𝐭𝐢𝐨𝐧 𝐀 – 𝐃𝐈𝐘🧑💻
Use Python libraries (PyPDF2, sentence-transformers, faiss, openai).
You’ll control everything, but you’ll also manage memory, prompts, and orchestration yourself.
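A toy sketch of the DIY flow, with a word-count vector and a plain list standing in for sentence-transformers and FAISS (the document and query are invented, just to show the shape of the five steps):

```python
import math
from collections import Counter

# The 5-step RAG pipeline compressed into a toy. A real DIY build
# would use PyPDF2 for ingestion, sentence-transformers for
# embeddings, and FAISS/Chroma for the vector store.

def split_into_chunks(text, chunk_size=8):
    """Step 2: break a long doc into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(chunk):
    """Step 3 (toy): a bag-of-words Counter instead of a neural embedding."""
    return Counter(chunk.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = ("The refund policy allows returns within 30 days of purchase. "
       "Shipping is free for orders above 50 dollars. "
       "Support is available by email on weekdays.")

chunks = split_into_chunks(doc)                  # Steps 1+2: ingest & split
store = [(embed(c), c) for c in chunks]          # Steps 3+4: embed & store

# Step 5: retrieve the most similar chunk, then ground the LLM prompt in it
query = "how many days do I have to return an item"
top_chunk = max(store, key=lambda e: cosine(embed(query), e[0]))[1]
prompt = f"Answer using only this context:\n{top_chunk}\n\nQ: {query}"
```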
𝐎𝐩𝐭𝐢𝐨𝐧 𝐁 – 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤𝐬 ⚡
LangChain → general-purpose & modular.
LlamaIndex → data-centric & simple.
Haystack → enterprise-grade pipelines.
AutoGen → for agent-based workflows.
Choose what fits your style & infra.
🚀 Now go check out some starter snippets using LangChain below!
#llms #langchain #GenAI #datascience #Agents #LadiesAndGentlemen
#datasciencewithkeerthi #RAG
1 month ago | [YT] | 10
Data Science with Keerthi (தமிழில்)
𝐋𝐚𝐝𝐢𝐞𝐬 𝐚𝐧𝐝 𝐆𝐞𝐧𝐭𝐥𝐞𝐦𝐞𝐧 𝐒𝐞𝐫𝐢𝐞𝐬 - (Episode 2)
Sharing go-to snippets that teach you something new in every post.
𝙇𝙖𝙣𝙜𝙘𝙝𝙖𝙞𝙣
❓ Who am I? → Framework to build applications on top of LLMs.
⏳ Since when? → October 2022 (still a baby...)
🌍 How popular? → 360,000 package downloads per day!
But wait...........🤔
Why do you need me?
1️⃣ 𝗣𝗿𝗼𝗺𝗽𝘁 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁 was messy 📝 — LangChain gives you neat templates.
2️⃣ Connecting your 𝗼𝘄𝗻 𝗱𝗮𝘁𝗮 was painful 📚 — now you can load PDFs, DBs, or APIs easily.
3️⃣ LLMs don’t 𝗿𝗲𝗺𝗲𝗺𝗯𝗲𝗿 past chats 🧠 — LangChain adds memory so they can!
4️⃣ Need external help like APIs? 🔧 — 𝗔𝗴𝗲𝗻𝘁𝘀 step in to pick and use tools.
5️⃣ 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴, 𝗹𝗼𝗴𝗴𝗶𝗻𝗴 & 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 ✅ — all in one place.
6️⃣ Complex workflows? 🔗 — 𝗖𝗵𝗮𝗶𝗻𝘀 let you connect multiple steps seamlessly.
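Point 1️⃣ in miniature: a prompt template keeps the instruction text in one place and fills in variables per request. LangChain's PromptTemplate adds validation and composition on top; plain Python formatting shows the underlying idea (the template text below is invented for illustration):

```python
# A reusable prompt template: the instruction lives in one place,
# and each request only supplies the variables.

TEMPLATE = (
    "You are a helpful assistant for {domain}.\n"
    "Answer in {language}, in at most {max_sentences} sentences.\n"
    "Question: {question}"
)

def build_prompt(**kwargs):
    return TEMPLATE.format(**kwargs)

prompt = build_prompt(domain="coffee brewing", language="Tamil",
                      max_sentences=3, question="What is filter coffee?")
```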
🚀 LLMs are raw electricity… but you need wiring to light up your world 🔌
That’s when I came into the picture. Now go check out the 📸.
#llms #langchain #GenAI #datascience #Agents #LadiesAndGentlemen #datasciencewithkeerthi
1 month ago | [YT] | 5
Data Science with Keerthi (தமிழில்)
🚀 Launching "𝐋𝐚𝐝𝐢𝐞𝐬 𝐚𝐧𝐝 𝐆𝐞𝐧𝐭𝐥𝐞𝐦𝐞𝐧 𝐒𝐞𝐫𝐢𝐞𝐬"
Sharing go-to snippets that teach you something new in every post.
𝑷𝒚𝒅𝒂𝒏𝒕𝒊𝒄
❓ Who am I? → A data validation library for Python
⏳ Since when? → October 2019 (first stable release)
🌍 How popular? → 8,000+ Python packages trust me
But wait...........🤔
Why do you need me?
Python is 𝐝𝐲𝐧𝐚𝐦𝐢𝐜𝐚𝐥𝐥𝐲 𝐭𝐲𝐩𝐞𝐝 - variables can change their datatype anytime:
---------------------------------
x = 10
x = "Keerthi"
print(x) # "Keerthi"
---------------------------------
This is great, as it is 𝒉𝒊𝒈𝒉𝒍𝒚 𝒇𝒍𝒆𝒙𝒊𝒃𝒍𝒆 and works on the fly 🦅!
But in large apps and APIs, you need strict control of data types for requests and responses. You can’t send a "string" to a variable that expects an "int" (so sad!).
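What that means in code: reject wrong types at the boundary instead of deep inside the app. This hand-rolled User class (fields invented for illustration) does manually what Pydantic automates, along with coercion, nested models, and friendly error messages:

```python
# A hand-rolled sketch of Pydantic-style validation: each field's type
# is checked at construction time, so bad data fails loudly and early.

class User:
    def __init__(self, name, age):
        if not isinstance(name, str):
            raise TypeError(f"name must be str, got {type(name).__name__}")
        if not isinstance(age, int):
            raise TypeError(f"age must be int, got {type(age).__name__}")
        self.name = name
        self.age = age

user = User(name="Keerthi", age=25)          # OK

try:
    User(name="Keerthi", age="twenty five")  # a "string" where int is expected
except TypeError as e:
    print(e)                                 # rejected at the boundary
```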
That’s when I came into the picture. Now go check out the 📸.
#python #pydantic #datascience #Agents #LadiesAndGentlemen #datasciencewithkeerthi
1 month ago | [YT] | 15
Data Science with Keerthi (தமிழில்)
ANNs to LLMs (Ep-2): Activation Functions - lnkd.in/gQxSdWmc
Confused between ReLU, GELU, Swish, Mish, or just using Sigmoid by default?
- Clean formulas for all activations
- No confusion + Simple intuition
- Pros & Cons – vanishing gradient, dying ReLU, etc.
- When to use what – hidden layers vs output layers
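The formulas from the video in runnable form (GELU uses the common tanh approximation; the exact version uses the Gaussian CDF):

```python
import math

# The five activations compared in the video, as plain functions.

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi)
                                    * (x + 0.044715 * x ** 3)))

def swish(x):
    """A.k.a. SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)

def mish(x):
    """x * tanh(softplus(x))."""
    return x * math.tanh(math.log(1 + math.exp(x)))

# The "dying ReLU" issue in one glance: ReLU outputs exactly 0 for all
# negative inputs (zero gradient), while GELU/Swish/Mish stay slightly
# negative there and keep a gradient flowing.
for f in (sigmoid, relu, gelu, swish, mish):
    print(f.__name__, round(f(-1.0), 4), round(f(1.0), 4))
```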
#DeepLearning #ActivationFunctions #AI #ReLU #GELU #NeuralNetworks #ML
3 months ago | [YT] | 3
Data Science with Keerthi (தமிழில்)
Word Embeddings with திருக்குறள் - OHE/BoW/Tf-IDF (https://youtu.be/rOkvRtuyw6Q)
This is Tamil meets AI – for anyone trying to learn NLP concepts with our own culture.
Perfect for students, beginners, and anyone building something in NLP ❤️
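The three representations from the video on a toy two-line corpus (English stand-in sentences, invented here; Thirukkural couplets would slot in the same way):

```python
import math

# OHE, BoW, and TF-IDF over a tiny 2-"document" corpus.

docs = ["learning gives lasting wealth",
        "wealth without learning fades"]

vocab = sorted(set(w for d in docs for w in d.split()))

def one_hot(word):
    """OHE: a vector with a single 1 at the word's vocab index."""
    return [1 if w == word else 0 for w in vocab]

def bag_of_words(doc):
    """BoW: raw count of each vocab word in the document."""
    words = doc.split()
    return [words.count(w) for w in vocab]

def tf_idf(doc):
    """TF-IDF: term frequency scaled down for words common across docs."""
    words = doc.split()
    n_docs = len(docs)
    vec = []
    for w in vocab:
        tf = words.count(w) / len(words)
        df = sum(1 for d in docs if w in d.split())
        idf = math.log(n_docs / df)
        vec.append(tf * idf)
    return vec
```

Note how "learning" and "wealth" appear in both documents, so their IDF is log(1) = 0 and TF-IDF zeroes them out: that is the downweighting of common words in action.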
#TamilNLP #ThirukkuralMeetsAI #MachineLearningInTamil #WordEmbeddings #TFIDF #OHE #BagOfWords #DataScienceinTamil #DataScienceWithKeerthi
3 months ago | [YT] | 7