Robust Parsing: Bridging the Coverage Chasm

Dan Flickinger

Grammar implementations that are guided by linguistic theory will normally lack coverage of even some well-formed utterances, since no current theory exhaustively characterizes all of the phenomena in any language.  For many uses of a grammar, approximate or robust analyses of out-of-grammar utterances would be better than nothing, and a variety of approaches to such robust parsing have been developed.  In this paper I present an implemented method that adds two simple "bridging" rules to an existing broad-coverage grammar, the English Resource Grammar, allowing any two constituents to combine.  The method relies on a parser that can efficiently pack the full parse forest for an utterance and then selectively unpack the N most likely analyses, guided by a statistical model trained on a manually constructed treebank.  Initial experimental results on two types of annotated corpus data show both promise and some remaining challenges for this approach.