Some time ago I was impressed by this video about Parsley, which demonstrates a very convincing parsing tool.

Yesterday, I needed to parse some BibTeX. After testing several packages without being really convinced, I decided to give Parsley a try.

What I actually need is to parse only a restricted version of BibTeX: in particular, all field values must be enclosed in braces {...}, whereas full BibTeX also allows "..." or no delimiter at all. This takes only a few rules.

Altogether, this yields the following code:

import parsley

parser = parsley.makeGrammar(r"""
    text   = (anything:x ?(x not in '{}') -> x)+:data
        -> "".join(data)
    string = '{' (text|(string:s -> '{%s}' % s))+:data '}'
        -> "".join(data)
    value  = string:data ','
        -> data
    pair   = ws (letter+):key ws '=' ws value:val ws
        -> "".join(key), val
    item   = pair:first pair*:rest
        -> [first] + rest
    entry  = ws '@' (letter+):kind ws '{'
             (anything:x ?(x not in ' \t\n\r{,}') -> x)+:key ','
             item:content '}' ws
        -> [('type', "".join(kind)), ('key', "".join(key))] + content
    biblio = ws (entry:e ws -> e)*:items
        -> [dict(i) for i in items]
""", {})

And that’s it. Running parser(bibdata).biblio() turns my bib file into a list of dicts. Moreover, not only does Parsley make it easy to build a parser, it also gives really helpful messages on parsing errors, which most parser generators do not.


  1. My only disappointment is that Parsley could not handle a better rule for item: when I write item = pair:first (',' pair)*:rest ','? and drop the value rule in favour of using string directly, Parsley complains about incorrect syntax at parse time. Maybe I should really read the docs…