review: Python for Data Analysis
by Rob Friesel
Wes McKinney’s Python for Data Analysis (O’Reilly, 2012) is a tour of pandas and NumPy (mostly pandas) for folks looking to crunch “big-ish” data with Python. The target audience is not Pythonistas, but rather scientists, educators, statisticians, financial analysts, and the rest of the “non-programmer” cohort that is finding more and more these days that it needs to do a little bit-sifting to get the rest of its job done.
First, two warnings:
- This book is not an introduction to Python. While McKinney does not assume that you know any Python, he isn’t exactly going to hold your hand on the language here. There is an appendix (“Python Language Essentials”) that beginners will want to read before getting too far, but otherwise you’re on your own. (“Lucky for you Python is executable pseudocode”?)
- This book is not about theories of data analysis. What I mean by that is: if you’re looking for a book that is going to tell you the types of analyses to do, this is not that book. McKinney assumes that you already know, through your “actual” training, what kinds of analyses you need to perform on your data, and how to go about the computations necessary for those analyses.
That being said: McKinney is the principal author of pandas, a Python package for data transformation and statistical analysis. The book is largely about pandas (and NumPy), offering overviews of the utilities in these packages and concrete examples of how to employ them to great effect. In examining these libraries, McKinney also delves into general methodologies for munging data and performing analytical operations on it (e.g., normalizing messy data and turning it into graphs and tables). He also gets into some (semi-)esoteric information about how Python works at very low levels, and ways to optimize data structures so that you can get maximum performance from your programs. McKinney is clearly knowledgeable about these libraries, about Python, and about using those tools effectively in analytical software.
So where do I land on Python for Data Analysis? If you’re looking for a book that discusses data analysis in a broad sense, or one that pays special attention to the theory, this isn’t that book. If you’re looking for a generalist’s book on Python, this is also not that book. However, if you’ve already selected Python as your analytical tool (and it sounds like it’s more or less the de facto analytical tool in many circles), then this just might be the perfect book for you.
Disclosure: I received an electronic copy of this book from the publisher in exchange for writing this review.