The Data Engineering Channel
Week 1 Focus: Designing Data-Intensive Applications (DDIA) by Martin Kleppmann
This book is foundational for any data engineer.
• Why it's crucial: If you've ever been asked why a pipeline dropped messages, or whether to use "at-least-once" or "exactly-once" processing, and found yourself silent, this book is your answer. It's about understanding the "traffic" before you build the "highway." DDIA explains how data moves, fails, and recovers in complex systems, covering durability, ordering, replication, and the trade-offs that real systems force you to make. Mastering it will help you answer tough interview questions about message delivery guarantees and consistency models.
• Key Takeaway: Always state the consistency, availability, and latency trade-off for your design in one clear sentence.
• Field Exercise: Diagram one job you currently own. Mark where backpressure can occur, where you buffer data, and where you need retries. Then decide where you would deduplicate, and explain why an "exactly-once" guarantee might cost more than it saves for your specific use case.
• Pair with: To get hands-on practice while reading, we recommend pairing DDIA with the "Spark and Python for Big Data" Udemy course.
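To make the field exercise concrete, here is a minimal Python sketch of one common pattern: accept at-least-once delivery (retries can produce duplicates) and deduplicate at the sink, rather than paying for an end-to-end exactly-once guarantee. This is an illustration under stated assumptions, not code from DDIA. The names Message, IdempotentSink, and deliver_at_least_once are hypothetical, and a lost acknowledgement is simulated with a plain set of message IDs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical unique key assigned by the producer
    amount: int


@dataclass
class IdempotentSink:
    """Applies each message_id at most once, so redelivery is harmless."""
    seen_ids: set = field(default_factory=set)
    total: int = 0

    def apply(self, msg: Message) -> bool:
        if msg.message_id in self.seen_ids:
            return False  # duplicate caused by a retry; skip it
        self.seen_ids.add(msg.message_id)
        self.total += msg.amount
        return True


def deliver_at_least_once(messages, sink, lost_acks):
    """Simulate at-least-once delivery and return the number of attempts.

    lost_acks holds message_ids whose first acknowledgement is "lost"
    (a simulated failure), which forces the sender to resend them.
    """
    attempts = 0
    pending_lost = set(lost_acks)
    for msg in messages:
        while True:
            attempts += 1
            sink.apply(msg)  # the sink may see the same message twice
            if msg.message_id in pending_lost:
                pending_lost.discard(msg.message_id)
                continue  # ack lost: retry the same message
            break
    return attempts


messages = [Message("a", 10), Message("b", 20), Message("c", 30)]
sink = IdempotentSink()
attempts = deliver_at_least_once(messages, sink, lost_acks={"b"})
print(attempts, sink.total)  # 4 delivery attempts, but the total is still 60
```

Message "b" is delivered twice, yet it is counted once because the sink remembers which IDs it has already applied. In a real pipeline the set of seen IDs would have to live in durable storage and be bounded somehow, and that storage cost is part of the trade-off the exercise asks you to weigh.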
--------------------------------------------------------------------------------
Bonus Tip: Remember, reading plus practice beats passive reading every time! Aim to do two labs that map to the topics of each chapter you read. The recently updated Databricks Free Edition is an excellent resource, with preloaded datasets for you to practice on.
Stay tuned for next week's deep dive into "Fundamentals of Data Engineering"!
Affiliate links to the book and suggested Udemy course:
Designing Data-Intensive Applications — Martin Kleppmann
amzn.to/45QIzci
Spark and Python for Big Data with PySpark — Jose Portilla
click.linksynergy.com/deeplink?id=WuIlwt/6f6I&mid=…
3 days ago | [YT] | 18
The Data Engineering Channel
Kickstarting Your Data Engineering Mastery! (Week 1):
We're excited to launch a weekly series to help you master essential data engineering concepts! Each week, we'll dive into one of the "7 Books Every Data Engineer Should Master," providing you with a focused reading plan, practice drills, and key takeaways. Think of this as your quasi-training program to level up your skills and confidently tackle data engineering challenges.
If you're eager to get a head start or prefer to consume all the information at once, you can also watch the full video on "7 Books Every Data Engineer Should Master" right now!
3 days ago | [YT] | 23
The Data Engineering Channel
I appreciate this community so much! You are all amazing, brilliant people, and I am grateful for the time you spend here with me.
Many of you have reached out over the last year to tell me you are surprised more people haven't found me. Well, here is your chance to change that! When you see a video that you really love, you have the power to "Hype" it. This helps spread the video further across YouTube.
Data engineering can be isolating, but it doesn't have to be! Let's continue to build this amazing community!
6 days ago | [YT] | 19
The Data Engineering Channel
Is it OK for early-career data engineers to use AI to write SQL, Java, and Python scripts?
1 week ago | [YT] | 4
The Data Engineering Channel
Which of these options best describes an SCD Type 2?
1 week ago | [YT] | 7
The Data Engineering Channel
What does SCD stand for?
1 week ago | [YT] | 6
The Data Engineering Channel
I am humbled and honored that so many people take the time to watch the videos I've been sharing. We are a community of 5,245! Thank you all for watching, providing feedback, and participating!
Always feel free to reach out if you have questions about Data Engineering or would like to set up a time to talk about our coaching program! www.gambilldataengineering.com/mentoring-program
In case you didn't know, we also have a growing Discord community of data enthusiasts and professionals. discord.gg/kHdPQHeWwB
1 week ago | [YT] | 9
The Data Engineering Channel
What is the biggest telltale sign that a programmer is vibe coding with AI?
2 weeks ago | [YT] | 1
The Data Engineering Channel
What’s your target salary as a data engineer in 2025?
2 weeks ago | [YT] | 7
The Data Engineering Channel
Which of the following is the best way to log files from a mounted directory in a Databricks pipeline using dbutils?
3 weeks ago | [YT] | 0