I just finished reading Practical Data Analysis by Hector Cuesta (Packt Publishing, 2013). Overall, it was a pretty good overview, and it recommends some good tools. I would say that the book is a good place to get started if you have no real experience performing these kinds of analyses. Though Cuesta doesn’t go deep into the math behind it all, he isn’t afraid to use the technical names for different formulae, which should make it easy for you to do your own follow-up research.[1]
Jeff Leek’s Data Analysis course on Coursera provided the lens through which I read this book,[2] so I found myself doing a lot of comparing and contrasting between the two. For example, both use practical, reasonably small “real world” sample problems to highlight specific analytical techniques and/or features of their chosen toolkits. However, whereas Leek’s course focused exclusively on R, Cuesta assembles his own all-star team of tools: Python[3] and D3.js. Perhaps it goes without saying, but each approach (i.e., Leek’s “pure R” vs. Cuesta’s “Python plus D3.js”) has its pros and cons, and I found it most useful to consider them together.
Cuesta’s approach with this book is to present a sample scenario in each chapter that introduces a class of problem, a solution to that problem, and his recommended toolkit. For example, chapter six builds a stock price simulation: it introduces simple simulation problems (especially for apparently stochastic data), time series data, and Monte Carlo methods, and then shows how to simulate the data using Python and visualize it in D3.js. Although the book is not strictly a “cookbook”, the chapters very much feel like macro-level “recipes”. There’s quite a bit of code and some decent discussion of the concepts that govern each analytical model, and (true to the “practical” in the title) the emphasis is on the “how” and not the “why”.
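For readers who haven’t seen a Monte Carlo price simulation before, here is a minimal sketch of the general idea in plain Python. It is not Cuesta’s code. I’m assuming a geometric Brownian motion (GBM) model, 252 trading days per year, and made-up drift and volatility values, and `simulate_price_paths` is a hypothetical helper name. The D3.js visualization step is left out.

```python
import math
import random


def simulate_price_paths(s0, mu, sigma, days, n_paths, seed=None):
    """Simulate daily prices with geometric Brownian motion (GBM).

    Assumption: GBM is one common model for Monte Carlo stock simulation;
    it is not necessarily the model used in the book.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    dt = 1.0 / 252  # assumed: 252 trading days per year
    # The -0.5 * sigma**2 term keeps the expected price growing at rate mu.
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        price = s0
        path = [price]
        for _ in range(days):
            # Each step multiplies the price by a lognormal random factor.
            price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(price)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical parameters: $100 start, 5% drift, 20% annual volatility.
    paths = simulate_price_paths(s0=100.0, mu=0.05, sigma=0.2,
                                 days=252, n_paths=1000, seed=42)
    finals = sorted(p[-1] for p in paths)
    mean_final = sum(finals) / len(finals)
    p5 = finals[int(0.05 * len(finals))]  # rough 5th percentile
    print(f"mean final price: {mean_final:.2f}")
    print(f"5th percentile:   {p5:.2f}")
```

Running many paths and then summarizing the distribution of final prices (mean, percentiles) is the core Monte Carlo move; plotting the individual paths is where a tool like D3.js or matplotlib would come in.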
While I did not read the entire book cover-to-cover, I would definitely recommend it to anyone who wants an introduction to some basic data analysis techniques and tools. You’ll get more out of this book if you have something to compare it to (e.g., some experience with R, academic or otherwise), and you’ll get the most out of it if you also have a solid foundation in the mathematics and/or statistics that underlie these analytical approaches. Check it out on the Packt Publishing site: bit.ly/1co6hOZ
Disclosure: I received an electronic copy of this book from the publisher in exchange for writing this review.
[1] As an aside, this seems to be par for the course for the “technical” data analysis books, blog posts, and MOOCs that I’ve encountered. That is to say, “the math” is touched on, but if you don’t already have a background in linear algebra (or whatever), then you’re going to wind up taking it on faith that support vector machines do what you need them to do.
[2] I wrote about my experience in Jeff Leek’s class in April of 2013. (See: “reflecting on Data Analysis”.)
[3] Both the Python standard library and a collection of libraries like mlpy and matplotlib.