Given a set of strings (sentences) from a language, we wish to infer the complexity of the underlying grammar. To this end, we develop a methodology for choosing between two classes of formal grammars in the Chomsky hierarchy: simple regular grammars and more complex context-free grammars. We introduce a probabilistic context-free grammar model in the form of a Hierarchical Dirichlet Process over rules expressed in Greibach Normal Form. Compared with other representations, this has the advantage of nesting the regular class within the context-free class. In particular, it lets us ensure that the prior probability of drawing a regular grammar under the context-free grammar model is equal to 0, and it gives some understanding of the properties of typical sentences drawn from this model.
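To illustrate the nesting, here is a minimal Python sketch (not the speaker's code). It assumes the standard convention that a Greibach Normal Form rule rewrites a nonterminal as one terminal followed by zero or more nonterminals, so a grammar is right-linear (regular) when no rule has more than one trailing nonterminal. The names `Rule` and `is_regular` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    """A Greibach Normal Form rule: lhs -> terminal tail[0] tail[1] ..."""
    lhs: str
    terminal: str
    tail: Tuple[str, ...]  # trailing nonterminals, possibly empty


def is_regular(rules: List[Rule]) -> bool:
    """True if every rule has at most one trailing nonterminal.

    This checks the form of the grammar (right-linear), not whether the
    language it generates happens to be regular.
    """
    return all(len(rule.tail) <= 1 for rule in rules)


# Toy grammar for a^n b^n (n >= 1): S -> a S B | a B, B -> b
anbn = [
    Rule("S", "a", ("S", "B")),
    Rule("S", "a", ("B",)),
    Rule("B", "b", ()),
]

# Toy grammar for a* b: S -> a S | b
a_star_b = [
    Rule("S", "a", ("S",)),
    Rule("S", "b", ()),
]

print(is_regular(anbn))      # the rule S -> a S B has two trailing nonterminals
print(is_regular(a_star_b))  # every rule has at most one trailing nonterminal
```

Under this convention, the regular class is the subset of Greibach Normal Form grammars in which every rule has a tail of length at most one.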
We consider model comparison with Bayes factors. The model is fit using a Sequential Monte Carlo method, implemented in the Birch probabilistic programming language. We apply this methodology to data collected from primates, for which the complexity of the grammar is a key question.
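As a rough illustration of how a Bayes factor can be obtained from Sequential Monte Carlo output, the following Python sketch uses the generic textbook estimator, in which the log evidence is the sum over steps of the log mean unnormalized incremental weight. It assumes resampling at every step and is not the Birch implementation used in the talk. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_mean_exp(log_weights: Sequence[float]) -> float:
    """Numerically stable log of the average of exp(log_weights)."""
    m = max(log_weights)
    total = sum(math.exp(w - m) for w in log_weights)
    return m + math.log(total / len(log_weights))


def log_evidence(log_incremental_weights: List[Sequence[float]]) -> float:
    """Log marginal likelihood estimate from per-step incremental weights.

    Assumes particles are resampled at every step, so the estimate is the
    product over steps of the mean unnormalized incremental weight.
    """
    return sum(log_mean_exp(step) for step in log_incremental_weights)


def log_bayes_factor(log_z_a: float, log_z_b: float) -> float:
    """Log Bayes factor of model A against model B."""
    return log_z_a - log_z_b
```

For example, `log_bayes_factor(log_evidence(weights_cf), log_evidence(weights_reg))` would compare a context-free model against a regular one, where `weights_cf` and `weights_reg` are hypothetical per-step log weights from two separate runs. A positive value favours the first model.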
How to attend online:
Join Zoom Meeting: https://unibocconi-it.zoom.us/j/97165784062
Meeting ID: 971 6578 4062
In person: room 3-E4-SR03