Hi, I'm Karina! A decade ago, I transitioned from finance to the world of data analytics and data science. It all started with simple VBA code, and I knew my life would never be the same. After that came SQL, R, Python, Power BI, Tableau, and hours spent on Stack Overflow and YouTube tutorials.
⚡️ Want to learn Python or start coding, but it feels overwhelming? Start with my beginner-friendly "Data Analysis with Python" masterclass: karinadatascientist.com/
Through my channel, I want to demystify data analysis and share my knowledge — from statistics and Excel to Python and ChatGPT.
Want to learn something new? Subscribe and hit the bell to get notified when I upload new videos!
Karina Data Scientist
Stop stacking OR conditions in Python
There's a cleaner way - use in
Why sets?
in with a set is an O(1) lookup (on average)
in with a list is an O(n) lookup
For 3 items? Doesn't matter
For 100 items? Use sets
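A minimal sketch (the status values are made up):

status = "pending"

# Stacked ORs: repetitive and easy to typo
if status == "active" or status == "pending" or status == "trial":
    print("allowed")

# Same check as a set membership test: O(1) average-case lookup
if status in {"active", "pending", "trial"}:
    print("allowed")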
16 hours ago
Karina Data Scientist
Variables in SQL - complete game changer
If you're copy-pasting the same query 4 times just to change one value, it works... but there's a better way.
Let me show you.
SELECT * FROM Orders WHERE Region = 'West';
SELECT * FROM Orders WHERE Region = 'East';
SELECT * FROM Orders WHERE Region = 'South';
SELECT * FROM Orders WHERE Region = 'Central';
Copy. Paste. Change value. Repeat.
It works. But it's painful.
The better way: Use a variable
Change the region? Change one line. Done.
Why this matters:
- No more find-and-replace across 50 lines
- Change values in one place
- Easier to test different scenarios
- Cleaner stored procedures
- Reusable scripts
SQL Server:
DECLARE @Region VARCHAR(50) = 'West';
SELECT * FROM Orders WHERE Region = @Region;
PostgreSQL:
DO $$
DECLARE
    v_region VARCHAR(50) := 'West';   -- prefixed to avoid clashing with the Region column
    order_count INTEGER;
BEGIN
    -- a DO block can't return a result set, so capture a value instead
    SELECT COUNT(*) INTO order_count FROM Orders WHERE Region = v_region;
    RAISE NOTICE 'Orders in %: %', v_region, order_count;
END $$;
MySQL:
SET @Region = 'West';
SELECT * FROM Orders WHERE Region = @Region;
1 day ago
Karina Data Scientist
ROWS vs RANGE
ROWS frames by physical row position. RANGE frames by the ORDER BY value, so tied rows (peers) share a frame. They're different.
When to use which:
ROWS → When you want exact row counts
RANGE → When you want value-based grouping
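For example, assuming a hypothetical Sales(sale_date, amount) table with duplicate dates (everything here is illustrative):

SELECT sale_date, amount,
       SUM(amount) OVER (ORDER BY sale_date
                         ROWS  UNBOUNDED PRECEDING) AS rows_total,   -- grows row by row
       SUM(amount) OVER (ORDER BY sale_date
                         RANGE UNBOUNDED PRECEDING) AS range_total   -- ties share one value
FROM Sales;
-- On duplicate sale_dates, rows_total still climbs row by row,
-- while range_total is identical for every tied row.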
2 days ago
Karina Data Scientist
UNBOUNDED Window Performance Killer
UNBOUNDED PRECEDING makes every row's frame reach back to the start of its partition
On big partitions, this can turn 5-second queries into 2-hour nightmares.
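A sketch with a hypothetical Sales table (whether a bounded frame fits depends on the metric you actually need):

-- Full-partition frame: every row's window reaches back to the partition start
SELECT SUM(amount) OVER (PARTITION BY region ORDER BY sale_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM Sales;

-- Bounded frame: only the last 7 rows per window, far less work per row
SELECT SUM(amount) OVER (PARTITION BY region ORDER BY sale_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS recent_total
FROM Sales;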
3 days ago
Karina Data Scientist
PARTITION BY Missing Columns
Missing one column in PARTITION BY produces wrong numbers
The query looks fine but inflates your totals.
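For instance, say the goal is a running total per store AND product in a hypothetical Sales(store_id, product_id, sale_date, amount) table:

-- Bug: partitioning by store alone mixes every product into one running total
SELECT SUM(amount) OVER (PARTITION BY store_id
                         ORDER BY sale_date) AS running_total
FROM Sales;

-- Fix: include both keys so each store/product pair totals separately
SELECT SUM(amount) OVER (PARTITION BY store_id, product_id
                         ORDER BY sale_date) AS running_total
FROM Sales;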
4 days ago
Karina Data Scientist
NULL Handling in Aggregations
NULLs silently break your SQL averages
This produces wrong numbers and you won't notice.
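For example, with a hypothetical Tests(score) table holding 80, NULL, and 100:

SELECT AVG(score)                  AS avg_skips_nulls,    -- 90: the NULL row is excluded
       AVG(COALESCE(score, 0))     AS avg_nulls_as_zero,  -- 60: NULL counted as 0
       SUM(score) * 1.0 / COUNT(*) AS avg_over_all_rows   -- 60: divides by every row
FROM Tests;

Neither answer is wrong by itself. The bug is never deciding which one you meant.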
5 days ago
Karina Data Scientist
Count Unique Values Per Group
Find distinct items in each category
How many unique products did each store sell? How many different customers per region?
Difference:
.nunique() → Count only
.apply(set) → Unique values (no duplicates)
.apply(list) → All values (with duplicates)
Perfect for inventory analysis, customer diversity, or product mix metrics.
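A minimal sketch with made-up sales data:

import pandas as pd

df = pd.DataFrame({"store":   ["A", "A", "A", "B", "B"],
                   "product": ["x", "y", "x", "z", "z"]})

df.groupby("store")["product"].nunique()     # A -> 2, B -> 1
df.groupby("store")["product"].apply(set)    # A -> {'x', 'y'}, B -> {'z'}
df.groupby("store")["product"].apply(list)   # A -> ['x', 'y', 'x'], B -> ['z', 'z']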
6 days ago
Karina Data Scientist
Rolling Averages Within Groups
Moving averages that respect group boundaries
Smooth noisy data without mixing categories.
The rolling window never crosses group boundaries.
Parameters:
window=3 → Size of moving window
min_periods=1 → Calculate even with fewer values
Perfect for smoothing time series by category, region, or product line.
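A minimal sketch with made-up data:

import pandas as pd

df = pd.DataFrame({"region": ["W", "W", "W", "E", "E"],
                   "sales":  [10, 20, 30, 100, 200]})

# 3-point moving average that restarts inside each region
df["rolling_avg"] = (df.groupby("region")["sales"]
                       .rolling(window=3, min_periods=1)
                       .mean()
                       .reset_index(level=0, drop=True))  # re-align to the original index
# region W: 10.0, 15.0, 20.0   region E: 100.0, 150.0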
1 week ago
Karina Data Scientist
Cumulative Sums Within Groups
Calculate running totals that reset per group in pandas
Perfect for YTD sales, running balances, or any sequential metric.
The cumulative sum resets automatically for each group.
Use cases:
Year-to-date revenue by region
Running balance by customer account
Sequential position tracking by category
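A minimal sketch with made-up data:

import pandas as pd

df = pd.DataFrame({"region": ["W", "W", "E", "E"],
                   "sales":  [10, 20, 5, 7]})

# Running total that resets for each region
df["ytd_sales"] = df.groupby("region")["sales"].cumsum()
# region W: 10, 30   region E: 5, 12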
1 week ago
Karina Data Scientist
Python tip
APIs often dump nested JSON into CSV columns. You can't analyze it until you flatten it.
Once you flatten it, you can analyze the data by OS, version, or device.
What's happening:
json.loads → Converts string to Python dict
.apply(pd.Series) → Expands dict keys into columns
pd.concat() → Combines with original data
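Put together, a minimal sketch with made-up event data (column names are illustrative):

import json
import pandas as pd

df = pd.DataFrame({"user": [1, 2],
                   "metadata": ['{"os": "iOS", "version": "17.2"}',
                                '{"os": "Android", "version": "14"}']})

meta = df["metadata"].apply(json.loads).apply(pd.Series)    # string -> dict -> columns
flat = pd.concat([df.drop(columns="metadata"), meta], axis=1)
# flat columns: user, os, version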
Pro tip - handle errors safely:
# Some rows might be missing or hold invalid JSON
def safe_load(x):
    try:
        return json.loads(x)
    except (TypeError, ValueError):  # NaN raises TypeError, bad JSON raises ValueError
        return {}

df_meta = df['metadata'].apply(safe_load).apply(pd.Series)
1 week ago