Mastering large datasets with Python : parallelize and distribute your Python code / J.T. Wolohan.
Material type: Text
Publication details: Shelter Island, New York : Manning, 2019.
Description: xx, 289 pages : illustrations ; 24 cm
ISBN: 9781617296239
Other title: Large datasets with Python
Call number: QA76.73.P98 W65 2019
Item type | Current library | Collection | Call number | Vol info | Status | Date due | Barcode |
---|---|---|---|---|---|---|---|
Main Short | Martin Oduor-Otieno Library (located on the library ground floor) | Non-fiction | QA76.73.P98 W65 2019 | 31692/24 | Available | | MOOL24030011 |
Main Short | Martin Oduor-Otieno Library (located on the library ground floor) | Non-fiction | QA76.73.P98 W65 2019 | 31693/24 | Available | | MOOL24030012 |
Includes index.
Programming techniques that work well on laptop-sized data can slow to a crawl, or fail altogether, when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change. "Mastering large datasets with Python" teaches you to write code that can handle datasets of any size. You'll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You'll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you'll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3.