Many people fall into data engineering by accident. Software engineers may find that they enjoy building platforms to drive their company’s data initiatives; data scientists may find they need to get “dirty” to deliver insights at scale. What they have in common is that there is always something new for them to learn about data engineering, and a robust set of resources at their fingertips to learn it from.
Our data team at Immuta is no different, and we wanted to share some of the great resources we have found to improve our practice, from books to blogs to podcasts. Enjoy!
- The Data Engineering Podcast: Tobias Macey has made a tremendous contribution to the data engineering community with his weekly highlight of a new tool or creator. Topics run the gamut from deploying web-scale streaming services to building organizational knowledge graphs to building a more efficient data organization, and I always walk away having learned something new. [Website]
- Designing Data-Intensive Applications: Since its release in 2017, Designing Data-Intensive Applications has become a cornerstone of any data engineer’s library. The book covers everything from the Parquet file format to synchronizing clocks on database clusters, and you’re sure to learn something new. Plus, it features some really cool “data landscape” maps. [Website]
- The Data Warehouse Toolkit: There’s a timeless quality to Kimball dimensional modelling. This reference guide is full of practical wisdom for setting up your first (or fifteenth!) data warehouse, from how many meetings you should have and with whom to how to best set up surrogate keys. [Amazon]
- Reddit’s r/dataengineering: Great articles and resources tend to find their way into this subreddit at some point, and you can find some good nuggets of wisdom from other commenters. Although a bit sedate, it’s a great spot to keep an eye on. [Reddit]
- DataCamp’s Data Engineering with Python Career Track: If you’re just getting started in data engineering, you may be interested in a more comprehensive approach to learning the fundamentals. DataCamp courses are taught by experts in their field, and the platform prioritizes interactive programming over long lectures. Their engineering track is a great way to bootstrap your skillset. [DataCamp]
- Data Skeptic Podcast: Data engineers also need to be aware of new approaches (and their limitations) to analyzing data or deploying data products. Data Skeptic offers a variety of content, from interviews with experts to bite-sized lessons on analytical techniques. Few podcasts cater to both new and advanced data practitioners the way Data Skeptic does. [Website]
- Data Council: The Data Council (originally DataEngineeringConf) hosts high-quality, engineering-focused conferences and talks throughout the year. They prioritize open source projects and talks from practitioners, and consequently are a rich source of actionable knowledge. [Website]
- The Ethical Algorithm: This book is a great introduction to new advances in computational privacy techniques. At Immuta, we believe that data engineers should be organizational leaders in protecting customer privacy, and anonymization techniques like differential privacy are increasingly important tools to leverage. Kearns and Roth make a complicated subject very accessible. [Oxford Press]
The great thing about the data community right now is that you’ll never run out of resources or people to learn from. Hopefully these resources and channels will help you on your journey to becoming a better data engineer.
Data engineers are central to achieving data-driven results. Find out more in Gartner’s report, Data Engineering is Critical to Driving Data and Analytics Success.