Some time ago, I was impressed by this video about Parsley, which demonstrates a very convincing parsing tool.
Yesterday, I needed to parse some BibTeX. After testing several packages without being really convinced, I decided to give Parsley a try.
Actually, I only need to parse a restricted version of BibTeX. In
particular, all values must be enclosed in braces {...}, whereas full
BibTeX also allows "..." or no delimiter at all. We need a few rules:
- `text` matches a text with no braces inside. `anything:x ?(x not in '{}') -> x` matches any character not in `'{}'`, and the rule matches a list of such `x` collected in `data`. So we concatenate them in `-> "".join(data)`
- `string` matches a text with possibly nested braces. Notice how `string:s -> '{%s}' % s` restores the braces when a `string` is matched inside another `string`
- `value` matches a `string` followed by a comma [1]
- `pair` matches a key/value like `author = {...},`
- `item` matches the pairs inside a BibTeX reference
- `entry` adds the type (e.g., `@Article`) and key reference
- finally `biblio` matches the whole content of a bib file
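To make the recursion in `text` and `string` more concrete, here is a minimal hand-written sketch of the same logic in plain Python, without Parsley. The function names `parse_text` and `parse_string` and the sample input are made up for illustration; this is not how Parsley works internally, only the same matching strategy spelled out by hand.

```python
def parse_text(src, pos):
    """Mimic the `text` rule: one or more characters that are not braces."""
    end = pos
    while end < len(src) and src[end] not in '{}':
        end += 1
    if end == pos:
        raise ValueError("expected text at position %d" % pos)
    return src[pos:end], end


def parse_string(src, pos):
    """Mimic the `string` rule: '{' (text | string)+ '}'."""
    if pos >= len(src) or src[pos] != '{':
        raise ValueError("expected '{' at position %d" % pos)
    pos += 1
    parts = []
    while pos < len(src) and src[pos] != '}':
        if src[pos] == '{':
            inner, pos = parse_string(src, pos)
            # Restore the braces of a nested string, like '{%s}' % s in the rule
            parts.append('{%s}' % inner)
        else:
            chunk, pos = parse_text(src, pos)
            parts.append(chunk)
    if pos >= len(src):
        raise ValueError("unclosed '{'")
    if not parts:
        # The `+` in the rule requires at least one item between the braces
        raise ValueError("empty braces at position %d" % pos)
    return "".join(parts), pos + 1  # skip the closing '}'


# Hypothetical sample: the outer braces are dropped, the inner ones are kept
value, end = parse_string("{The {LaTeX} Companion}, rest", 0)
print(value)  # The {LaTeX} Companion
```

The outer braces are consumed as delimiters, while the braces of any nested `string` are put back, so the value keeps its inner BibTeX grouping.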
Altogether, this yields the following code:
import parsley

# Grammar for a restricted BibTeX where every value is enclosed in braces
parser = parsley.makeGrammar(r"""
text   = (anything:x ?(x not in '{}') -> x)+:data
         -> "".join(data)
string = '{' (text|(string:s -> '{%s}' % s))+:data '}'
         -> "".join(data)
value  = string:data ','
         -> data
pair   = ws (letter+):key ws '=' ws value:val ws
         -> "".join(key), val
item   = pair:first pair*:rest
         -> [first] + rest
entry  = ws '@' (letter+):kind ws '{'
         (anything:x ?(x not in ' \t\n\r{,}') -> x)+:key ','
         item:content '}' ws
         -> [('type', "".join(kind)), ('key', "".join(key))] + content
biblio = ws (entry:e ws -> e)*:items
         -> [dict(i) for i in items]
""", {})
And that’s it. Running parser(bibdata).biblio() turns my bib file
into a list of dicts. Moreover, Parsley not only makes it easy to
build a parser, it also gives really helpful error messages on
parsing errors, which is not the case for most parser generators.
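As a rough illustration of the shape of that result, the sketch below starts from a made-up list of pairs such as the `entry` rule would build, applies the same `dict()` step as the `biblio` rule, and then indexes the references by key. The sample reference (`doe2020` and its fields) and the `by_key` helper are hypothetical, not taken from a real bib file.

```python
# Hypothetical output of the `entry` rule for one reference: a list of pairs,
# starting with the 'type' and 'key' pairs added by the rule itself
entry = [('type', 'Article'), ('key', 'doe2020'),
         ('author', 'Jane Doe'), ('title', 'A {Sample} Title')]

# The `biblio` rule applies dict() to each such list
items = [entry]
biblio = [dict(i) for i in items]

# A possible next step: index the references by their citation key
by_key = {ref['key']: ref for ref in biblio}
print(by_key['doe2020']['title'])  # A {Sample} Title
```

Since each reference is a plain dict, the usual Python tools (sorting, filtering, JSON export) apply directly to the parsed bibliography.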
[1] My only disappointment is that Parsley could not handle a better rule for
`item`: when I write `item = pair:first (',' pair)*:rest ','?` and drop rule `value`, using `string` instead, Parsley complains about incorrect syntax at parse time. Maybe I should really read the doc…