PRIN PNRR Grant P2022H5WZ9 (2023-2025) “Measuring Biodiversity via Bayesian Nonparametrics: Estimation, Clustering and Uncertainty Quantification”

National Scientific Coordinator: Igor Pruenster

ABSTRACT

The first estimator of the probability of discovering a previously unseen species was derived by Alan Turing while working to break the German Enigma cipher during World War II. Generalizations of the Turing estimator address the “unseen species problem”, namely the estimation of the number of species that are present in an ecosystem but were not observed in the available data. The unseen species problem represents just one, albeit fundamental, aspect of biodiversity quantification, which further includes population size estimation and the estimation of species proportions and of rare species, to mention just a few.

Statistical problems in this broad class, which involve populations of individuals belonging to different species with unknown proportions, are known as species sampling problems. Interest in them has grown dramatically in recent years owing to the role they play in the study of genomic diversity, yet another facet of biodiversity, and in a variety of related issues across different scientific fields.
Among the various statistical paradigms, the Bayesian nonparametric approach stands out as ideally suited to tackling species sampling problems, leading to principled estimation, model-based clustering and uncertainty quantification. It can already claim several success stories in the exchangeable case, i.e., the case of a single homogeneous population. The homogeneity assumption is often restrictive in the presence of different yet interacting ecosystems, which motivates our goal of going beyond it to tackle the setting of multiple heterogeneous populations. First, we introduce and investigate a suitable class of probabilistic models, to be termed multivariate species sampling models; we then address several multivariate species sampling problems, which are to a large extent open and unexplored. Challenging new research directions include the multivariate unseen species problem, clustering and temporal evolution of species communities, and taxonomic classification.