about this book

Data Science with Python and Dask takes you on a hands-on journey through a typical data science workflow, from data cleaning through deployment, using Dask. The book begins by presenting some foundational knowledge of scalable computing and explains how Dask takes advantage of those concepts to operate on datasets big and small. Building on that foundation, it then turns its focus to preparing, analyzing, visualizing, and modeling various real-world datasets to give you tangible examples of how to use Dask to perform common data science tasks. Finally, the book ends with a step-by-step walkthrough of deploying your very own Dask cluster on AWS to scale out your analysis code.

Who should read this book

Data Science with Python and Dask was primarily written with beginner to intermediate data scientists, data engineers, and analysts in mind, specifically those who have not yet mastered working with datasets that push the limits of a single machine. Prior experience with other distributed frameworks (such as PySpark) is not necessary, but readers who have it can still benefit from this book by comparing Dask's capabilities and ergonomics with those of the frameworks they already know. Various articles and documentation are available online, but none focus on using Dask for data science as comprehensively as this book does.

How this book is organized: A roadmap

This book has three sections that cover 11 chapters.
