Franck Pommereau - Blog - Parsley rocks!

Some time ago, I was impressed by this video about Parsley, which demonstrates a very convincing parsing tool.

Yesterday, I needed to parse some BibTeX. After testing several packages without being really convinced, I decided to give Parsley a try.

I actually only need to parse a restricted version of BibTeX: in particular, all field values must be enclosed in braces {...}, whereas full BibTeX also allows "..." or no delimiter at all. We need a few rules:

text matches text with no braces inside. anything:x ?(x not in '{}') -> x matches any single character other than '{' or '}', and the + collects a list of such characters in data, which we concatenate with -> "".join(data)
string matches text with possibly nested braces. Notice how string:s -> '{%s}' % s restores the braces when a string is matched inside another string
value matches a string followed by a comma¹
pair matches a key/value like author = {...},
item matches the pairs inside a BibTeX reference
entry adds the type (e.g., @Article) and the reference key
finally biblio matches the whole content of a bib file
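
To make the nested-brace handling of text and string more concrete, here is a minimal hand-written sketch of the same idea in plain Python. It is not Parsley and not how Parsley works internally; parse_text and parse_string are hypothetical helper names introduced only for this illustration.

```python
def parse_text(src, pos):
    """Consume a run of non-brace characters (mirrors the `text` rule)."""
    start = pos
    while pos < len(src) and src[pos] not in "{}":
        pos += 1
    if pos == start:
        raise ValueError("expected text at position %d" % pos)
    return src[start:pos], pos


def parse_string(src, pos=0):
    """Consume '{' ... '}' with nested braces (mirrors the `string` rule).

    Returns the content without the outer braces and the position
    just after the closing brace.
    """
    if pos >= len(src) or src[pos] != "{":
        raise ValueError("expected '{' at position %d" % pos)
    pos += 1
    parts = []
    while pos < len(src) and src[pos] != "}":
        if src[pos] == "{":
            inner, pos = parse_string(src, pos)
            # Restore the braces of a string matched inside another string.
            parts.append("{%s}" % inner)
        else:
            chunk, pos = parse_text(src, pos)
            parts.append(chunk)
    if pos >= len(src):
        raise ValueError("unclosed '{'")
    if not parts:
        # The grammar uses '+', so at least one piece is required.
        raise ValueError("empty string")
    return "".join(parts), pos + 1


print(parse_string("{A {Nested} title},"))  # ('A {Nested} title', 18)
```

The outer braces are dropped while the inner ones are kept, which is the behaviour the string rule is after.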

All together, this yields the following code:

import parsley

parser = parsley.makeGrammar(r"""
    text   = (anything:x ?(x not in '{}') -> x)+:data
        -> "".join(data)
    string = '{' (text|(string:s -> '{%s}' % s))+:data '}'
        -> "".join(data)
    value  = string:data ','
        -> data
    pair   = ws (letter+):key ws '=' ws value:val ws
        -> "".join(key), val
    item   = pair:first pair*:rest
        -> [first] + rest
    entry  = ws '@' (letter+):kind ws '{'
             (anything:x ?(x not in ' \t\n\r{,}') -> x)+:key ','
             item:content '}' ws
        -> [('type', "".join(kind)), ('key', "".join(key))] + content
    biblio = ws (entry:e ws -> e)*:items
        -> [dict(i) for i in items]
""", {})

And that’s it. Running parser(bibdata).biblio() turns my bib file into a list of dicts. Moreover, not only does Parsley make it easy to build a parser, it also gives really helpful error messages on parsing errors, which most parser generators do not.
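
To illustrate the shape of the result, here is a small sketch of what happens to one reference after parsing. The entry below is made-up sample data, not actual Parser output; it only assumes, as in the grammar, that entry yields a flat list of (name, value) pairs that biblio passes to dict.

```python
# Hypothetical sample of what the `entry` rule yields for one reference:
# a flat list of (name, value) pairs, with 'type' and 'key' first.
entry = [
    ("type", "Article"),
    ("key", "doe2020"),
    ("author", "Jane Doe"),
    ("title", "A {Nested} title"),
]

# `biblio` turns each such list into a dict, as in `[dict(i) for i in items]`.
items = [entry]
records = [dict(i) for i in items]

# Index the records by reference key for direct lookup.
by_key = {rec["key"]: rec for rec in records}

print(by_key["doe2020"]["title"])  # A {Nested} title
```

Returning pairs rather than building the dict inside each rule keeps the grammar actions simple: lists concatenate with +, and the conversion happens once at the top level.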

¹ My only disappointment is that Parsley could not handle a better rule for item: when I write item = pair:first (',' pair)*:rest ','? and drop the value rule, using string instead, Parsley complains about incorrect syntax at parse time. Maybe I should really read the doc…