Sumit Mittal

I created a video covering 5 scenario-based interview questions on deletion vectors in Databricks (a really hot topic for interviews).

Question 1) You are managing a large Delta table with billions of rows.
Deleting even a few thousand rows used to be very expensive because it required rewriting files.
How do deletion vectors help solve this problem?

Question 2) Imagine two cases:
Case A: Deleting 100 rows
Case B: Deleting 100 million rows
In which case do deletion vectors bring the most benefit, and when would you consider a file rewrite instead?

Question 3) Your company must comply with the GDPR (right to be forgotten). Since deletion vectors only mark rows as deleted, the data still exists in storage.
How would you design a process to ensure deleted records are physically removed for compliance?
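
To picture why a marked row is still in storage, here is a tiny toy model in Python. It is only a sketch, not how Delta Lake is implemented: the DataFile class and its soft_delete, read and purge methods are made-up names for illustration. On Databricks, the documented commands for physically removing soft-deleted rows are REORG TABLE ... APPLY (PURGE) followed by VACUUM; check the docs for how retention settings apply to your table.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy stand-in for one immutable data file (hypothetical, for illustration)."""
    rows: list
    # Toy "deletion vector": the positions of rows marked as deleted.
    deleted: set = field(default_factory=set)

    def soft_delete(self, predicate):
        # Only record the matching positions; the stored rows are not touched.
        for pos, row in enumerate(self.rows):
            if predicate(row):
                self.deleted.add(pos)

    def read(self):
        # Readers skip every position flagged in the deletion vector.
        return [row for pos, row in enumerate(self.rows) if pos not in self.deleted]

    def purge(self):
        # Rewrite step: build a new file holding only the live rows.
        return DataFile(rows=self.read())


old_file = DataFile(rows=[{"user_id": 1}, {"user_id": 2}, {"user_id": 3}])
old_file.soft_delete(lambda row: row["user_id"] == 2)

visible = old_file.read()    # what a query sees after the delete
new_file = old_file.purge()  # rewritten file without the deleted row
```

After soft_delete, queries no longer see user 2, but old_file.rows still holds that record. Only the rewritten file is free of it, and in this toy model the old file would still have to be dropped from storage afterwards.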

Question 4) Over time, analysts complain that queries on the Delta table are becoming slower after frequent deletes. How would you confirm whether deletion vectors are the cause, and what steps can you take to restore performance?

Question 5) After months of deletes and updates, you notice storage costs have gone up and queries are slower, even though deletion vectors were meant to save space and time. Why does this happen, and what Databricks operations would you apply to balance storage, performance, and compliance?

I am sure this video will help you a lot, since these questions come up often in interviews these days!

Do support by liking, commenting & sharing if you truly find it valuable :)
