Formal models of natural language are neither recent nor rare: one can argue that Aristotle's syllogistic logic was the first such attempt; there is the work of Chomsky on generative grammars, the more algebraic work of Lambek (the syntactic calculus and its various forms), the combinatory work of Steedman (CCG), the Discourse Representation Theory of Kamp, and so on. But all of these formal systems are based on set-theoretic semantics. More recent approaches to the semantics of natural language argue that pragmatics also plays a crucial role, coming from the view that the representation of a word should be based on the contexts in which it is used; here, various statistical measures have been developed to retrieve such information from large corpora. A popular formal framework for this view is that of vector spaces. These provide a solid base for word meanings, but it is less clear how to extend them to phrases and sentences. In ongoing joint work with Clark and Coecke, we develop a solution using linear-algebraic operations inspired by category-theoretic models of quantum mechanics. Here, empirical data from corpora and experimental analysis are essential tools for verifying the theoretical predictions of the models. In this talk I will present our framework, draw connections to the other approaches listed above, and go through experimental results.
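To make the vector-space idea concrete, here is a minimal illustrative sketch of tensor-based composition in the spirit of the framework: nouns are vectors, and a word with a relational grammatical type (here an adjective) is a linear map applied to its argument. All vectors and matrix entries below are made up for illustration; in practice they would be derived from corpus statistics.

```python
import numpy as np

# Hypothetical noun vectors (toy co-occurrence features, not corpus-derived).
dog = np.array([0.9, 0.1, 0.3])
cat = np.array([0.8, 0.2, 0.4])

# An adjective is modelled as a linear map on the noun space, i.e. a matrix;
# the adjective-noun phrase meaning is matrix-vector application.
fluffy = np.array([[1.0, 0.2, 0.0],
                   [0.1, 0.9, 0.3],
                   [0.0, 0.4, 1.1]])

fluffy_dog = fluffy @ dog   # meaning of the phrase "fluffy dog"
fluffy_cat = fluffy @ cat   # meaning of the phrase "fluffy cat"

def cosine(u, v):
    # Cosine similarity: the standard measure of closeness in distributional models.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(fluffy_dog, fluffy_cat))
```

The same pattern scales up: a transitive verb becomes a higher-order tensor contracted with its subject and object vectors, so phrases and full sentences land in a vector space where they can be compared by similarity, just as words can.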
School of Informatics and Computing, Indiana University.