Karina Data Scientist

Hi, I'm Karina! A decade ago, I transitioned from finance to the world of data analytics and data science. It all started with simple VBA code, and I knew my life would never be the same. After that came SQL, R, Python, Power BI, Tableau, and hours spent with Stack Overflow and YouTube tutorials.

⚡️ Want to learn Python or start coding, but it feels overwhelming? Start with my beginner-friendly "Data Analysis with Python" masterclass: karinadatascientist.com/ ⚡️

Through my channel, I want to demystify data analysis and share my knowledge — from statistics and Excel to Python and ChatGPT.

Want to learn something new? Subscribe and hit the bell to get notified when I upload new videos!


Karina Data Scientist

Python tip

Find the most frequent element in a list
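
A minimal sketch of one common way to do this, using collections.Counter from the standard library. The list contents and variable names are made up for illustration; the original post may have shown a different approach.

```python
from collections import Counter

# Hypothetical sample data for illustration
colors = ["red", "blue", "red", "green", "blue", "red"]

# most_common(1) returns a list holding one (element, count) pair
most_frequent, count = Counter(colors).most_common(1)[0]

print(most_frequent, count)
```

If several elements tie for first place, most_common(1) returns only one of them, so check the counts yourself when ties matter.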

8 hours ago | [YT] | 9

Karina Data Scientist

Python Tip

New way to merge dictionaries in Python 3.9+
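
This is presumably the dictionary union operator (|) added in Python 3.9. A small sketch with made-up dictionaries:

```python
# Hypothetical settings dictionaries for illustration
defaults = {"theme": "light", "page_size": 20}
overrides = {"page_size": 50, "language": "en"}

# | builds a new dict; on duplicate keys the right-hand value wins
settings = defaults | overrides

# |= updates an existing dict in place
cached = dict(defaults)
cached |= overrides

print(settings)
```

The inputs are left untouched by |, which makes it a tidy alternative to {**defaults, **overrides}.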

1 day ago | [YT] | 16

Karina Data Scientist

Data tip

Your merge can double your revenue.

The problem: Right table has duplicates. Each left row matches multiple right rows = row explosion.

The fix:

- Check duplicates before merging
- Add validate='many_to_one' for lookups
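
A small sketch of both checks. The orders and customers tables are invented for illustration, with customer 10 deliberately duplicated in the lookup table.

```python
import pandas as pd

# Hypothetical tables: customers has a duplicate row for customer_id 10
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "revenue": [100, 200, 300],
})
customers = pd.DataFrame({
    "customer_id": [10, 10, 11],
    "segment": ["A", "A", "B"],
})

# Check duplicates in the lookup key before merging
dupes = customers["customer_id"].duplicated().sum()

# Without validation the merge silently multiplies rows (and revenue)
exploded = orders.merge(customers, on="customer_id", how="left")

# With validate, pandas raises instead of exploding
raised = False
try:
    orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
except pd.errors.MergeError:
    raised = True

# Deduplicate the lookup table, then merge safely
clean = orders.merge(
    customers.drop_duplicates("customer_id"),
    on="customer_id",
    how="left",
    validate="many_to_one",
)

print(len(orders), len(exploded), len(clean))
```

Comparing row counts and totals before and after the merge is a cheap extra safeguard.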

2 days ago | [YT] | 12

Karina Data Scientist

You assign a value. No error. But the data doesn't change.

The problem: Chained indexing can return a copy, not a view. Your assignment lands on that temporary copy and disappears.

The fix: Use .copy() or .loc in one expression.

The rule:

Slicing for analysis? No .copy() needed
Slicing to modify? Add .copy()
Modifying original DataFrame? Use .loc[condition, column]
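
A short sketch of the two safe patterns. The DataFrame and column names are made up for illustration. The chained version is left out on purpose, because its behaviour (and the warning it triggers) depends on your pandas version.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "region": ["north", "south", "north"],
    "sales": [100, 200, 300],
})

# Modifying the original: row condition and column in ONE .loc call
df.loc[df["region"] == "north", "sales"] = 0

# Slicing to modify: take an explicit copy first
south = df[df["region"] == "south"].copy()
south["sales"] = south["sales"] * 2

print(df)
print(south)
```

Changes to south stay in south, and the .loc assignment is guaranteed to hit df itself.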

3 days ago | [YT] | 21

Karina Data Scientist

Python tip

You have a column of whole numbers. After a merge or fillna, suddenly they're floats with decimals.

Standard integers can't hold missing values. One NaN forces the entire column to float64.

The fix: Use a nullable integer dtype (Int64), or fill the missing values and convert back to int64.

The key difference:

int64 (lowercase) → Can't hold NaN, converts to float64
Int64 (uppercase) → Nullable integer, stays integer with <NA>

The rule:

Need to keep missing values? Use Int64
Missing = 0 makes sense? Use fillna(0).astype('int64')
Always check dtypes after merges
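
A minimal sketch of both options after a left merge. The tables and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical tables: id 3 has no match on the right side
left = pd.DataFrame({"id": [1, 2, 3]})
right = pd.DataFrame({"id": [1, 2], "qty": [5, 7]})

merged = left.merge(right, on="id", how="left")
# The unmatched row introduces NaN, so qty is now float64
print(merged.dtypes)

# Option 1: keep the missing value with the nullable Int64 dtype
nullable = merged["qty"].astype("Int64")

# Option 2: missing means zero, so fill and convert back
filled = merged["qty"].fillna(0).astype("int64")

print(nullable.dtype, filled.dtype)
```

Option 1 keeps the gap visible as <NA>; option 2 only makes sense when a zero really is the right default.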

4 days ago | [YT] | 18

Karina Data Scientist

GroupBy Multi-Index Columns: The Confusing Mess

You run a groupby with multiple aggregations. The column names become ('amount', 'sum') tuples.

Suddenly df['amount'] doesn't work anymore.

The problem: Using agg() with lists creates multi-index columns that break downstream code.

The fix: Use named aggregations to get clean, flat column names.

Always add .reset_index() to move the groupby key from index to column.

The rule:

Use named aggregations: new_name=(column, func)
Always .reset_index() to flatten
Your downstream code will thank you
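
A quick sketch contrasting the two styles. The data and the output column names (total_amount, avg_amount) are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "customer": ["a", "a", "b"],
    "amount": [10, 20, 5],
})

# A list of functions produces MultiIndex columns like ('amount', 'sum')
nested = df.groupby("customer").agg({"amount": ["sum", "mean"]})

# Named aggregations produce flat column names
flat = (
    df.groupby("customer")
    .agg(total_amount=("amount", "sum"), avg_amount=("amount", "mean"))
    .reset_index()  # move the groupby key from index to column
)

print(nested.columns.tolist())
print(flat.columns.tolist())
```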

5 days ago | [YT] | 22

Karina Data Scientist

Categorical Traps: Missing Categories and Broken Sorting

You group by customer tier. "Gold" customers don't appear in the results because there are zero this month.

Or your report shows Bronze → Gold → Platinum → Silver. Wait, what?

The problem: Categories inferred from data miss empty groups and sort alphabetically, not logically.

The fix: Define categorical dtype upfront with proper order.

The rule:

Define categories upfront with CategoricalDtype
Use observed=False in groupby to include empty categories
Use .reindex() to show zeros in pivot tables
Set ordered=True for logical sorting
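
A small sketch of the ordered categorical plus observed=False. The tier order and the sample data are assumptions for illustration, with no Gold customers on purpose.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed business order of tiers, lowest to highest
tier_dtype = CategoricalDtype(
    ["Bronze", "Silver", "Gold", "Platinum"], ordered=True
)

# Hypothetical data for illustration: nobody is Gold this month
df = pd.DataFrame({
    "tier": ["Silver", "Bronze", "Platinum", "Bronze"],
    "revenue": [50, 10, 200, 20],
})
df["tier"] = df["tier"].astype(tier_dtype)

# observed=False keeps the empty Gold group; rows follow the defined order
summary = df.groupby("tier", observed=False)["revenue"].sum()

print(summary)
```

Gold shows up with a sum of 0, and the rows come out in tier order rather than alphabetically.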

6 days ago | [YT] | 22

Karina Data Scientist

You merge two tables. Both have a status column. After the merge, df['status'] isn't what you think it is.

The problem: Duplicate column names after merge create status_x and status_y. You can't tell which is which.

The fix: Use explicit suffixes to show which table each column came from.

The rule:

Always use explicit suffixes= parameter
Give primary table empty suffix: suffixes=('', '_lookup')
Use semantic names: _user, _product, _meta, not _x, _y
Immediately drop duplicate columns you don't need
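
A minimal sketch of the suffix pattern. The orders and users tables are invented for illustration.

```python
import pandas as pd

# Hypothetical tables that both carry a status column
orders = pd.DataFrame({"user_id": [1, 2], "status": ["shipped", "pending"]})
users = pd.DataFrame({"user_id": [1, 2], "status": ["active", "churned"]})

# Primary table keeps its column name; the lookup gets a semantic suffix
merged = orders.merge(
    users, on="user_id", how="left", suffixes=("", "_user")
)

print(merged.columns.tolist())
```

Here merged["status"] is still the order status, and status_user clearly belongs to the users table.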

1 week ago | [YT] | 33

Karina Data Scientist

𝟏𝟎 𝐆𝐢𝐭𝐇𝐮𝐛 𝐑𝐞𝐩𝐨𝐬 𝐟𝐨𝐫 𝐒𝐐𝐋 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐞

Practice SQL with real business scenarios

𝐅𝐎𝐑 𝐂𝐎𝐌𝐏𝐋𝐄𝐓𝐄 𝐁𝐄𝐆𝐈𝐍𝐍𝐄𝐑𝐒 (𝐥𝐞𝐚𝐫𝐧 𝐛𝐲 𝐝𝐨𝐢𝐧𝐠)

𝐒𝐐𝐋 𝐌𝐮𝐫𝐝𝐞𝐫 𝐌𝐲𝐬𝐭𝐞𝐫𝐲 — solve a crime using only SQL queries, gamified learning
github.com/NUKnightLab/sql-mysteries

𝐒𝐐𝐋 𝐙𝐨𝐨 — interactive tutorials from basics to advanced, instant feedback on your queries
github.com/jisaw/sqlzoo-solutions

𝐅𝐎𝐑 𝐈𝐍𝐓𝐄𝐑𝐕𝐈𝐄𝐖 𝐏𝐑𝐄𝐏

𝐒𝐐𝐋 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐞 𝐏𝐫𝐨𝐛𝐥𝐞𝐦𝐬 — 57 progressively harder problems mirroring real interview scenarios
github.com/XD-DENG/SQL-exercise

𝐀𝐰𝐞𝐬𝐨𝐦𝐞 𝐒𝐐𝐋 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 — company-specific questions (Google, Amazon, Meta) with expected approaches
github.com/kansiris/SQL-interview-questions

𝐃𝐚𝐭𝐚 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 — SQL problems asked at top tech companies with detailed solutions
github.com/shawlu95/Beyond-LeetCode-SQL

𝐅𝐎𝐑 𝐑𝐄𝐀𝐋-𝐖𝐎𝐑𝐋𝐃 𝐒𝐂𝐄𝐍𝐀𝐑𝐈𝐎𝐒 (business problems)

𝟖 𝐖𝐞𝐞𝐤 𝐒𝐐𝐋 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞 — case studies from e-commerce, food delivery, and streaming platforms
github.com/katiehuangx/8-Week-SQL-Challenge

𝐒𝐐𝐋 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐞 — real-world business datasets with progressively challenging queries
github.com/WebDevSimplified/Learn-SQL

𝐅𝐎𝐑 𝐀𝐃𝐕𝐀𝐍𝐂𝐄𝐃 𝐏𝐑𝐀𝐂𝐓𝐈𝐂𝐄 (window functions, CTEs, optimisation)

𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐝 𝐒𝐐𝐋 𝐏𝐮𝐳𝐳𝐥𝐞𝐬 — brain teasers that force you to think differently about queries
github.com/smpetersgithub/AdvancedSQLPuzzles

𝐒𝐐𝐋 𝐖𝐢𝐧𝐝𝐨𝐰 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐬 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐞 — focused exercises on the functions that separate junior from senior analysts
github.com/lpinzari/sql-psql-udy

𝐋𝐞𝐞𝐭𝐂𝐨𝐝𝐞 𝐒𝐐𝐋 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐬 — if you must use LeetCode, here are optimized solutions with explanations
github.com/kamyu104/LeetCode-Solutions

𝐏.𝐒. I share my experience, data analytics & data science tips in my free newsletter. Join here ->
lnkd.in/d3M49ktj

1 week ago | [YT] | 12

Karina Data Scientist

Did you know you can embed Power BI inside Jupyter Notebook? I didn't.

Yep — interactive Power BI visuals, inside your Python environment.

Here’s what you can do:

- Instantly explore your data (drag, filter, cross-highlight)
- Build automated reports that refresh with new data
- Combine Python models + Power BI visuals in one notebook
- Publish directly to Power BI workspace

Great for notebooks used in reports, walkthroughs or live demos.

Documentation:
learn.microsoft.com/en-us/javascript/api/overview/…

powerbi.microsoft.com/fr-fr/blog/announcing-power-…

1 week ago | [YT] | 11