Combinator Parsing

Rusky · on May 5, 2014

I've still never seen anything that can compete with recursive descent (with pratt parsing for expressions [0]) on error handling, simplicity of implementation, and performance.

The article claims recursive descent is tedious and error prone, but I've not found that to be the case. In fact, combinators like match, many, and opt map easily to either if/while or higher order functions.

I also think people tend to focus too much on LL vs LR vs whatever. Clang parses the entire family of C languages with a hand-written recursive descent parser + pratt parsing and it's well-known for its error messages, speed, and memory usage. I would also say it's very readable and pleasant to work with (for a C++ parser of course ;)

[0] http://javascript.crockford.com/tdop/tdop.html

dons · on May 5, 2014

Amazing to think that parser combinator libraries are an almost 20 year old technique now.

And if you think these things are toys, consider e.g. http://hackage.haskell.org/package/FpMLv53 is used to process transactions worth billions.

hugofirth · on May 5, 2014

Even though they have performance issues - I still love Scala's parser combinator libraries. I hadn't really dealt much with parsing prior to my use of them, and they made the process of parsing seem a lot more obvious and less "mystical" .

platz · on May 5, 2014

I saw there was a port of attoparsec recently, but it still has to use trampolines.

Do you know of any benchmarks of different parsers, especially combinator libraries?

stephenjudkins · on May 5, 2014

parboiled2 [https://github.com/sirthias/parboiled2] offers pretty much everything the built-in parser combinators do, but with fewer bugs and vastly improved performance. Since both use PEG grammars, it should be straightforward to port over. parboiled2 is fast because it generates fast, stateful parsers using macros while only exposing a purely functional interface.

phpnode · on May 5, 2014

isn't there a good PEG available for python? Compare the source code in the post to the equivalent PEG.js grammar - http://pegjs.majda.cz/online

Ygg2 · on May 5, 2014

What is the relationship between PEG parsers and Parser Combinators?

MarkCC · on May 5, 2014

(Original blogpost author here.)

They're pretty much orthogonal.

PEG is a particular parsing technique. Combinators are a technique for implementing parsers.

You can write a parser combinator library which uses PEG or packrat algorithms underneath.

In this blog post, the library that I'm using is basically building LL parse generators, not PEG. But you could easily implement a PEG-based combinator system.

I didn't do that, because I didn't want to get bogged down with explaining too much about parsing algorithms. The LL/recursive descent approach is pretty easy to understand when you see the grammar code. For PEG, I would have needed to spend more time explaining the algorithms, and the post was already too long!

jerf · on May 5, 2014

That's not quite equivalent; the JS doesn't do division or subtraction.

The Python syntax is a bit klunky. The Haskell equivalent is on par with the PEG. I don't know how they compare as the grammars become non-trivial.