PRIN PNRR Grant P2022H5WZ9 (2023-2025) “Measuring Biodiversity via Bayesian Nonparametrics: Estimation, Clustering and Uncertainty Quantification”

ABSTRACT

The first estimator for the probability of discovering a previously unknown species was derived by Alan Turing, while trying to
break the German Enigma cipher during World War II. Generalizations of the Turing estimator answer the “unseen species
problem”, which deals with the estimation of the number of species represented in an ecosystem that were not observed in the
available data. The unseen species problem represents just one, albeit fundamental, aspect of biodiversity quantification, which
further includes population size estimation, estimation of species proportions and of rare species, to mention just a few. Statistical
problems of this broad class, which involve populations of individuals belonging to different species with unknown proportions, are
known as species sampling problems. Interest in this class of problems has grown dramatically in recent years due to the role they play in
the study of genomic diversity, yet another facet of biodiversity, and a variety of related issues in different scientific fields.
Among the various statistical paradigms, the Bayesian nonparametric approach stands out as ideally suited for tackling species
sampling problems and leads to principled estimation, model-based clustering and uncertainty quantification. It already has
several success stories in the exchangeable case, i.e., the case of a single homogeneous population. Homogeneity is often restrictive
in the presence of different yet interacting ecosystems, motivating our goal to go beyond it and tackle the situation of multiple
heterogeneous populations. First, we introduce and investigate a suitable class of probabilistic models, to be termed multivariate
species sampling models, and then address several multivariate species sampling problems, which are to a large extent open and
unexplored. Challenging new research directions will include the multivariate unseen species problem, clustering and temporal
evolution of species communities, and taxonomic classification.