found drama

get oblique

review: Regular Expressions Cookbook

by Rob Friesel

Although I run the risk of fawning all over this book here, Jan Goyvaerts and Steven Levithan’s Regular Expressions Cookbook (Second Edition) (O’Reilly, 2012) is a technical text that I will gladly describe using words like “essential” and “indispensable” and “invaluable”. It should be on every working programmer’s bookshelf, if not on her desk. It is exhaustive and rigorous, covering the major regex flavors across eight widely used general-purpose languages.[1] If your work brings you in regular contact with regular expressions, then you need easy access to this book.

To begin with, Goyvaerts and Levithan present an in-depth discussion of each regex feature, starting with the very basics (e.g., making matches against literal expressions) and working up into some pretty sophisticated topics (e.g., writing parsers). True to the title, their approach is a “cookbook” style: a general problem is stated, a solution is presented (or multiple solutions, if that’s what it takes), and then they go into an almost painful (but neatly sectioned) level of detail about the solution, describing it token-for-token in some cases. Now, by “neatly sectioned” I mean that their discussion of each solution is broken down by language,[2] and within each they are careful to point out flavor- and/or language-specific nuances, quirks, bugs, and/or unique features. They are very careful about this part–if a particular feature does not work in a language (e.g., how JavaScript lacks named capturing groups) then they show you how to work around that deficiency; but perhaps more importantly, if a feature is unique to a language, they point it out as such and caution you against relying on it (i.e., to keep your regexes general and portable).[3]
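As a rough illustration of that named-capture workaround, here is a minimal sketch in Python. It is not a recipe from the book: the date pattern and the function names `parse_named` and `parse_numbered` are made up for the example. The first pattern uses named groups; the second uses plain numbered groups and assigns the names in ordinary code, which is roughly the fallback you reach for in a flavor without named capture.

```python
import re

# Named groups: readable, but the (?P<name>...) syntax is not available in every flavor.
NAMED = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Numbered groups: the same pattern using only plain capturing groups.
NUMBERED = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_named(text):
    """Return the date parts as a dict using named groups, or None if no match."""
    m = NAMED.search(text)
    return m.groupdict() if m else None


def parse_numbered(text):
    """Return the same dict, but name the numbered groups by hand."""
    m = NUMBERED.search(text)
    if m is None:
        return None
    year, month, day = m.groups()
    return {"year": year, "month": month, "day": day}
```

Both functions return the same dictionary for the same input; the numbered version just moves the naming out of the regex and into the surrounding code, at the cost of having to keep the group order in sync by hand.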

Later chapters (i.e., 4 through 9) look at more specific problems–e.g., performing validation on email addresses,[4] dealing with Roman numerals, combing for text in the Apache Common Log Format, or parsing URLs. The recipes are all cross-referenced with each other, so if a particular solution really only solves about 75% of your problem, they’re prepared to point you in the right direction. They get right to the point, and then tell you where to go for more. What else can be said about these chapters except that they’re like the magnificent arsenal you’ll be wishing for when the text zombies swarm at your gate?
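To give a flavor of what one of these problem-specific recipes looks like, here is a small Python sketch for the Roman numerals case. It is a commonly used strict pattern, not necessarily the one the book presents, and the names `ROMAN` and `is_roman` are hypothetical. It assumes uppercase numerals in standard subtractive notation up to 3999.

```python
import re

# Strict Roman numerals from I to MMMCMXCIX.
# The lookahead rejects the empty string, which the rest of the pattern would otherwise accept.
ROMAN = re.compile(
    r"(?=[MDCLXVI])"           # must start with at least one numeral
    r"M{0,3}"                  # thousands: 0 to 3000
    r"(?:CM|CD|D?C{0,3})"      # hundreds: 900, 400, or 0 to 800
    r"(?:XC|XL|L?X{0,3})"      # tens: 90, 40, or 0 to 80
    r"(?:IX|IV|V?I{0,3})"      # ones: 9, 4, or 0 to 8
)


def is_roman(text):
    """Return True if the whole string is a well-formed Roman numeral."""
    return ROMAN.fullmatch(text) is not None
```

Using `fullmatch` rather than `search` means the whole string has to be a numeral, so input like `IIII` or `IC` is rejected instead of partially matched.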

All of this makes the Regular Expressions Cookbook very skimmable. It is easy to pick it up, find the particular recipe that is going to help you out of a jam, and power through with that solution in hand. Do you “just” need a quick JavaScript solution? Done. Curious how it might compare to the solution in Java or Ruby? No problem. You can skim the surface, or you can go as deep as you need[5] on some very narrow and specific sub-sub-subject within the corpus of regular expressions knowledge. (That being said, take their advice and be sure to read the first three chapters so that you are properly equipped for those deep dives later on.)

As I said before, if your work regularly brings you in contact with regular expressions, you’ll want to arm yourself with this. Highly recommended.

Disclosure: I received an electronic copy of this book from the publisher in exchange for writing this review.

UPDATE: (9/5/2012) I felt it was worth pointing out (as I did on Twitter):

  1. Goyvaerts and Levithan define the regex flavors as: .NET, Java, JavaScript, PCRE, Perl, Python, and Ruby; the specific languages covered include: C#, Java, JavaScript (and Levithan’s XRegExp library), PHP, Perl, Python, Ruby, and VB.NET. They also have a list in chapter 3 of 11 other languages which–while not specifically covered–are applicable because they adhere to one of the flavors.
  2. I should add “where appropriate” here, and note that the per-language sections in each discussion are much more common in the early chapters (2 and 3, with a pretty sharp drop-off starting in 4). This is because they’re covering the fundamentals, and there’s a lot more in the way of quirks and nuances to tread lightly around at this point.
  3. In other words: they remind you not to get too clever. “Sure you could do that as a one-liner… but no one’s going to know what that means next week. Not even you.”
  4. Which, validating an email address is not as easy as it sounds.
  5. Or as deep as you want, if you’re into that sort of thing.

About Rob Friesel

Software engineer by day, science fiction writer by night. Author of The PhantomJS Cookbook and a short story in Please Do Not Remove.
