Mastering large datasets with Python : parallelize and distribute your Python code / J.T. Wolohan.

By:

Wolohan, J. T [author.]

Material type: Text

Publication details: Shelter Island, New York : Manning, 2019.

Description: xx, 289 pages : illustrations ; 24 cm

ISBN: 9781617296239

Other title:

Large datasets with Python

LOC classification:

QA76.73.P98 W65 2019

Summary: Programming techniques that work well on laptop-sized data can slow to a crawl, or fail altogether, when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change. "Mastering large datasets with Python" teaches you to write code that can handle datasets of any size. You'll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You'll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you'll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3.

Holdings
Item type	Current library	Collection	Call number	Vol info	Status	Date due	Barcode
Main Short	Martin Oduor-Otieno Library (library ground floor)	Non-fiction	QA76.73.P98 W65 2019	31692/24	Available		MOOL24030011
Main Short	Martin Oduor-Otieno Library (library ground floor)	Non-fiction	QA76.73.P98 W65 2019	31693/24	Available		MOOL24030012

Includes index.

The KCAU Library
