Data Savvy

Shuffle is one of the most common reasons for performance issues in Spark... Be smart... Use a broadcast join... Here is a short and crisp tutorial on this.
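
To see why a broadcast join avoids the shuffle, here is a minimal plain-Python sketch of the idea. It is not Spark's API or its actual implementation: the tables, the `broadcast_join` helper, and the partition layout are all made up for illustration. The small table is copied to every partition, so each partition of the large table can be joined locally and no rows move between partitions.

```python
from typing import Dict, List, Tuple

# Hypothetical data: a large table already split into partitions (as it would
# be spread across executors) and a small lookup table.
Order = Tuple[int, str, float]  # (order_id, country_code, amount)

order_partitions: List[List[Order]] = [
    [(1, "IN", 20.0), (2, "US", 35.5)],
    [(3, "DE", 12.0), (4, "IN", 8.25)],
]
countries: Dict[str, str] = {"IN": "India", "US": "United States", "DE": "Germany"}


def broadcast_join(
    partitions: List[List[Order]], small_table: Dict[str, str]
) -> List[List[Tuple[int, str, float]]]:
    """Join each partition locally against a full copy of the small table."""
    joined = []
    for partition in partitions:
        # Each "executor" receives its own copy of the small table (the
        # broadcast), so rows of the large table stay where they are.
        local_copy = dict(small_table)
        joined.append(
            [
                (order_id, local_copy[code], amount)
                for order_id, code, amount in partition
                if code in local_copy  # inner join: drop unmatched keys
            ]
        )
    return joined


result = broadcast_join(order_partitions, countries)
print(result)
```

In Spark itself, the same intent is usually expressed with the `broadcast` hint from `pyspark.sql.functions`, for example `large_df.join(broadcast(small_df), "country_code")`, where `large_df` and `small_df` are hypothetical DataFrames. Spark can also pick a broadcast join on its own when one side is smaller than `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The trade-off is memory: the broadcast side has to fit comfortably on the driver and on every executor.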

6 years ago | [YT] | 3