Dask is a flexible, open source library for parallel computing in Python, aimed at analytic computing. It takes a Python job and distributes it across multiple cores or machines.
Its main virtue is that if you are familiar with Python's syntax, you're ready to use Dask.
Dask consists of two components:
- Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
- “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
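To illustrate the dynamic task scheduling component, here is a minimal sketch using `dask.delayed`, which wraps ordinary Python functions so that calling them records tasks in a graph instead of running them immediately. Nothing executes until `.compute()` is called. The functions `load`, `process`, and `combine` are hypothetical placeholders for real work.

```python
from dask import delayed


# Hypothetical stand-ins for real work: load some data, process it, merge results.
def load(i):
    return list(range(i))


def process(data):
    return sum(data)


def combine(results):
    return sum(results)


# Build a small task graph: three independent load -> process chains
# feeding one combine step. At this point nothing has run yet.
partials = [delayed(process)(delayed(load)(i)) for i in (10, 20, 30)]
total = delayed(combine)(partials)

# The scheduler walks the graph and runs independent tasks in parallel where it can.
result = total.compute()
print(result)  # 670
```

The three `load`/`process` chains do not depend on each other, so the scheduler is free to run them concurrently before `combine` gathers their outputs.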
It offers three main collection interfaces that mimic familiar APIs:
- Array, which works like NumPy arrays.
- Bag, which is akin to the RDD interface in Spark. A Dask Bag parallelizes computations across a large collection of generic Python objects.
- DataFrame, which works like Pandas DataFrame.
Features include:
- Provides parallelized NumPy array and Pandas DataFrame objects.
- Scales Pandas, scikit-learn, and NumPy workflows with minimal rewriting.
- Provides a task scheduling interface for more custom workloads and integration with other projects.
- Enables distributed computing in pure Python with access to the PyData stack.
- Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms.
- Runs resiliently on clusters with thousands of cores.
- Supports encryption and authentication using TLS/SSL certificates.
- Resilient – can handle the failure of worker nodes gracefully and is elastic.
- Scales down – easy to set up and run on a laptop in a single process. This is useful if you need to work with datasets on your own machine without a cluster.
- Responsive – designed with interactive computing in mind, it provides rapid feedback and diagnostics to help users.
- Diagnostic and investigative tools:
  - A real-time, responsive dashboard that shows current progress, communication costs, memory use, and more, updated every 100ms.
  - A statistical profiler installed on every worker that polls each thread every 10ms to determine which lines in your code are taking up the most time across your entire computation.
  - An embedded IPython kernel in every worker and the scheduler, allowing users to investigate the state of their computation directly with a pop-up terminal.
  - The ability to re-raise errors locally, so that users can apply the traditional debugging tools they are accustomed to, even when the error happens remotely.
- Several user APIs.
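Two of the features above, running in a single process and surfacing errors locally, can be sketched with Dask's local `"synchronous"` scheduler. This is an illustration under stated assumptions: `parse` and `total` are hypothetical functions, and the remote error re-raising described above refers to the distributed scheduler, whereas this sketch only shows the equivalent single-machine behaviour, where a failing task's exception is raised by `.compute()` in the calling code.

```python
import dask
from dask import delayed


def parse(value):
    # Hypothetical parsing step; int() raises ValueError on bad input.
    return int(value)


@delayed
def total(values):
    return sum(values)


# scheduler="synchronous" runs every task in the current process and thread.
ok = total([delayed(parse)(v) for v in ["1", "2", "3"]])
result = ok.compute(scheduler="synchronous")

# The same setting can be applied to a whole block via dask.config.set.
bad = total([delayed(parse)(v) for v in ["1", "oops"]])
caught = None
with dask.config.set(scheduler="synchronous"):
    try:
        bad.compute()
    except ValueError as exc:
        # The task's original exception surfaces here, in the caller.
        caught = exc

print(result, repr(caught))
```

Because the synchronous scheduler runs tasks in the calling thread, a debugger's post-mortem mode (for example `pdb.pm()`) can step straight into the failing function.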