Karina Data Scientist

Hi, I'm Karina! A decade ago, I transitioned from finance to the world of data analytics and data science. It all started with a simple VBA script, and I knew my life would never be the same. After that came SQL, R, Python, Power BI, Tableau, and countless hours on Stack Overflow and YouTube tutorials.

⚡️Want to learn Python or start coding, but it feels overwhelming? Start with my beginner-friendly "Data Analysis with Python" masterclass: karinadatascientist.com/

Through my channel, I want to demystify data analysis and share my knowledge — from statistics and Excel to Python and ChatGPT.

Want to learn something new? Subscribe and hit the bell to get notified when I upload new videos!


Karina Data Scientist

Stop stacking OR conditions in Python

There's a cleaner way - use in

Why sets?

in with sets is O(1) lookup on average
in with lists is O(n) lookup
For 3 items? Doesn't matter
For 100 items? Use sets
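The post's point, sketched in runnable form (the role names are made up for illustration):

```python
# Hypothetical example: checking a user's role.
role = "editor"

# Stacked OR conditions: verbose, and each new value means another clause
if role == "admin" or role == "editor" or role == "owner":
    print("can edit")

# Cleaner: one membership test against a set (O(1) average lookup)
PRIVILEGED = {"admin", "editor", "owner"}
if role in PRIVILEGED:
    print("can edit")
```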

16 hours ago | [YT] | 16

Karina Data Scientist

Variables in SQL - a complete game changer

If you're copy-pasting the same query 4 times just to change one value, it works... but there's a better way.

Let me show you.

SELECT * FROM Orders WHERE Region = 'West';
SELECT * FROM Orders WHERE Region = 'East';
SELECT * FROM Orders WHERE Region = 'South';
SELECT * FROM Orders WHERE Region = 'Central';

Copy. Paste. Change value. Repeat.
It works. But it's painful.

The better way: Use a variable

Change the region? Change one line. Done.

Why this matters:
- No more find-and-replace across 50 lines
- Change values in one place
- Easier to test different scenarios
- Cleaner stored procedures
- Reusable scripts

SQL Server:
DECLARE @Region VARCHAR(50) = 'West';
SELECT * FROM Orders WHERE Region = @Region;

PostgreSQL:
DO $$
DECLARE v_region VARCHAR(50) := 'West';  -- name it differently from the column to avoid ambiguity
BEGIN
-- your query
END $$;

MySQL:
SET @Region = 'West';
SELECT * FROM Orders WHERE Region = @Region;
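If you run these queries from Python rather than a SQL client, query parameters give the same one-place-to-change benefit. A minimal sketch with the standard-library sqlite3 module (the Orders table here is made up to mirror the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Region TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, "West"), (2, "East"), (3, "West")])

# One query, one parameter -- change the value, not the SQL
region = "West"
rows = conn.execute(
    "SELECT * FROM Orders WHERE Region = ? ORDER BY OrderID", (region,)
).fetchall()
print(rows)  # [(1, 'West'), (3, 'West')]
```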

1 day ago | [YT] | 19

Karina Data Scientist

ROWS vs RANGE

ROWS counts rows. RANGE groups by value. They're different.

When to use which:

ROWS → When you want exact row counts
RANGE → When you want value-based grouping
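The difference only shows up when the ORDER BY column has ties. A minimal sketch using SQLite's window functions (needs SQLite 3.25+; the table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 30)])  # two rows tie on day=1

rows = conn.execute("""
    SELECT day, amount,
           SUM(amount) OVER (ORDER BY day
               ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
           SUM(amount) OVER (ORDER BY day
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM sales
    ORDER BY day
""").fetchall()

# rows_sum adds one row at a time, so the two day=1 rows get different totals.
# range_sum treats rows tied on day=1 as peers: both get 30, and day=2 gets 60.
```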

2 days ago | [YT] | 27

Karina Data Scientist

UNBOUNDED Window Performance Killer

An UNBOUNDED PRECEDING frame can force the engine to scan your entire partition

This can turn a 5-second query into a 2-hour nightmare.

3 days ago | [YT] | 23

Karina Data Scientist

PARTITION BY Missing Columns

Missing one column in PARTITION BY produces wrong numbers

This bug looks fine but inflates your totals.
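One way to see the bug, sketched with SQLite on a made-up sales table: partitioning by region alone mixes both years into every "yearly" total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("West", 2023, 100), ("West", 2024, 200)])

rows = conn.execute("""
    SELECT region, year,
           SUM(amount) OVER (PARTITION BY region)       AS wrong_yearly,
           SUM(amount) OVER (PARTITION BY region, year) AS right_yearly
    FROM sales
    ORDER BY year
""").fetchall()

# wrong_yearly is 300 on every row (both years lumped together);
# right_yearly is 100 for 2023 and 200 for 2024.
```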

4 days ago | [YT] | 24

Karina Data Scientist

NULL Handling in Aggregations

NULLs silently break your SQL averages

This produces wrong numbers and you won't notice.
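The core trap: AVG in SQL (and .mean() in pandas) silently drops NULLs from the denominator. A minimal sketch with a made-up score column:

```python
import pandas as pd

# Three responses, one missing -- should the average divide by 2 or by 3?
scores = pd.Series([80, None, 100])

print(scores.mean())               # 90.0: the NaN is silently dropped (divides by 2)
print(scores.sum() / len(scores))  # 60.0: effectively counts the missing value as zero (divides by 3)
```

Which answer is "right" depends on the question you're asking; the danger is not deciding.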

5 days ago | [YT] | 27

Karina Data Scientist

Count Unique Values Per Group

Find distinct items in each category

How many unique products did each store sell? How many different customers per region?


Difference:

.nunique() → Count only
.apply(set) → Unique values (no duplicates)
.apply(list) → All values (with duplicates)

Perfect for inventory analysis, customer diversity, or product mix metrics.
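The three options side by side, on a made-up store/product table:

```python
import pandas as pd

df = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B"],
    "product": ["pen", "pen", "ink", "pen", "cup"],
})

print(df.groupby("store")["product"].nunique())    # A: 2, B: 2 -- count only
print(df.groupby("store")["product"].apply(set))   # A: {'pen', 'ink'} -- unique values
print(df.groupby("store")["product"].apply(list))  # A: ['pen', 'pen', 'ink'] -- all values
```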

6 days ago | [YT] | 25

Karina Data Scientist

Rolling Averages Within Groups

Moving averages that respect group boundaries

Smooth noisy data without mixing categories.

The rolling window never crosses group boundaries.

Parameters:

window=3 → Size of moving window
min_periods=1 → Calculate even with fewer values

Perfect for smoothing time series by category, region, or product line.
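A minimal sketch with made-up region data; the reset_index call aligns the grouped result back to the original rows:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["W", "W", "W", "E", "E", "E"],
    "sales":  [10, 20, 30, 100, 200, 300],
})

# 3-period moving average that never crosses region boundaries
df["rolling_avg"] = (
    df.groupby("region")["sales"]
      .rolling(window=3, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)  # drop the group level to align with df
)
print(df["rolling_avg"].tolist())  # [10.0, 15.0, 20.0, 100.0, 150.0, 200.0]
```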

1 week ago | [YT] | 35

Karina Data Scientist

Cumulative Sums Within Groups

Calculate running totals that reset per group in pandas

Perfect for YTD sales, running balances, or any sequential metric.

The cumulative sum resets automatically for each group.

Use cases:

Year-to-date revenue by region
Running balance by customer account
Sequential position tracking by category
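A minimal sketch with a made-up region column:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["W", "W", "E", "E"],
    "sales":  [10, 20, 100, 200],
})

# Running total that restarts for each region
df["running_total"] = df.groupby("region")["sales"].cumsum()
print(df["running_total"].tolist())  # [10, 30, 100, 300]
```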

1 week ago | [YT] | 26

Karina Data Scientist

Python tip

APIs often dump nested JSON into CSV columns. You can't analyze it until you flatten it.

After you flatten it, you can analyze the data by OS, version, or device.

What's happening:

json.loads → Converts string to Python dict
.apply(pd.Series) → Expands dict keys into columns
pd.concat() → Combines with original data
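The happy path end to end, on a made-up metadata column (the field names are illustrative):

```python
import json
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2],
    "metadata": ['{"os": "iOS", "version": "17.2"}',
                 '{"os": "Android", "version": "14"}'],
})

meta = df["metadata"].apply(json.loads).apply(pd.Series)       # string -> dict -> columns
flat = pd.concat([df.drop(columns="metadata"), meta], axis=1)  # combine with original data
print(list(flat.columns))  # ['user_id', 'os', 'version']
```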

Pro tip - handle errors safely:


# Some rows might have missing or malformed JSON
def safe_loads(x):
    try:
        return json.loads(x) if pd.notna(x) else {}
    except (TypeError, json.JSONDecodeError):
        return {}

df_meta = df['metadata'].apply(safe_loads).apply(pd.Series)

1 week ago | [YT] | 33