A Theory of Content

Mark Steedman

Linguists and computational linguists have come up with some quite useful theories of the semantics of function words and the corresponding logical elements, such as generalized quantifiers and negation (Woods 1968; Montague 1973; Steedman 2012). There has been much less progress in defining a usable semantics for content words. The effects of this deficiency are serious: linguists find themselves in the embarrassing position of saying that the meaning of "seek" is seek'. Computationalists find that their wide-coverage parsers, which are now fast and robust enough to parse billions of words of web text, have very low recall as question answerers because, while the answers to questions like "Who bought YouTube?" are out there on the web, they are not stated in the form suggested by the question, "X bought YouTube", but in some other form that paraphrases or entails the answer, such as "X's purchase of YouTube". Semantics as we know it is not provided in a form that supports practical inference over the variety of expression we see in real text.

I'll discuss recent work with Mike Lewis that seeks to define a novel form of semantics for content words using semi-supervised machine learning methods over unlabeled text. True paraphrases are represented by the same semantic constant. Common-sense entailment is represented directly in the lexicon, rather than delegated to meaning postulates and theorem-proving. The method can be applied cross-linguistically, in support of machine translation. I'll discuss the relation of this representation of content to the hidden prelinguistic language of mind that must underlie all natural language semantics, but which has so far proved resistant to discovery.
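To make the core idea concrete, here is a minimal Python sketch, not taken from the papers themselves: relation strings from a toy corpus of (subject, relation, object) triples are grouped by the argument pairs they occur with, paraphrases are mapped to a single shared semantic constant, and an entailment is stored directly on the lexical entry. The clustering criterion, the relation names ("rel0" and so on), and the lexicon encoding are all simplified placeholders for illustration.

from collections import defaultdict

# Toy corpus of (subject, relation, object) triples extracted from parsed text.
triples = [
    ("Google", "bought", "YouTube"),
    ("Google", "acquired", "YouTube"),
    ("Facebook", "bought", "Instagram"),
    ("Facebook", "acquired", "Instagram"),
    ("Google", "owns", "YouTube"),
]

# Group each relation string by the set of argument pairs it occurs with.
args = defaultdict(set)
for subj, rel, obj in triples:
    args[rel].add((subj, obj))

# Relation strings with identical argument distributions are treated here as
# paraphrases and mapped to the same semantic constant in the lexicon.
clusters = defaultdict(list)
for rel, pairs in args.items():
    clusters[frozenset(pairs)].append(rel)

lexicon = {}
for i, rels in enumerate(sorted(clusters.values(), key=sorted)):
    for rel in rels:
        lexicon[rel] = "rel%d" % i

print(lexicon)
# {'bought': 'rel0', 'acquired': 'rel0', 'owns': 'rel1'}  (paraphrases share a constant)

# A question stated with any of the paraphrases now matches the same constant,
# so "Who bought YouTube?" is answered even where the text says "acquired":
query = lexicon["bought"]
answers = [s for s, r, o in triples if lexicon[r] == query and o == "YouTube"]
print(answers)   # ['Google', 'Google']

# Common-sense entailments live directly on lexical entries rather than in
# separate meaning postulates (a purely hypothetical encoding):
entailments = {lexicon["bought"]: {lexicon["owns"]}}   # buying entails owning

In the actual work, the clusters are induced over large volumes of parsed text and the constants are plugged into a compositional CCG-style semantics, so that the logical elements (quantifiers, negation, tense) and the learned content constants work together in inference.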

Lewis and Steedman, 2013a: "Combined Distributional and Logical Semantics", Transactions of the Association for Computational Linguistics, 1, 179–192.

Lewis and Steedman, 2014b: "Combining Formal and Distributional Models of Temporal and Intensional Semantics", Proceedings of the ACL Workshop on Semantic Parsing, Baltimore, 28–32.