# PRIN PNRR Grant P20229PBZR (2023-2025) “When deep learning is not enough: creating hard benchmarks and physics-inspired algorithms for combinatorial optimization problems”

Bocconi Unit Coordinator: Carlo Lucibello

## Abstract

Combinatorial optimization problems (COPs), such as the traveling salesman problem and Boolean satisfiability, are of fundamental importance in theoretical computer science and in countless industrial applications.

Recently, machine learning (ML) tools such as transformers and graph neural networks have been applied to COPs in an effort to overcome the limitations of classical solvers. Despite the outstanding results obtained in computer vision and natural language processing, ML models are not yet competitive with classical algorithms on COPs.

Part of the success of modern ML is due to the creation of large training datasets and to a benchmark-driven culture in which each new model is compared to competing ones on a variety of datasets. We argue that current datasets and benchmarks for COPs could be substantially improved; moreover, claims of effectiveness against classical solvers are often overstated. In some scenarios, simple greedy algorithms return solutions of the same quality as ML models while being several orders of magnitude faster.
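To make the greedy-baseline point concrete, here is a minimal, illustrative sketch (not part of the proposal itself) of the classic nearest-neighbor heuristic for the Euclidean traveling salesman problem; all names and parameters are our own choices:

```python
import math
import random

def nearest_neighbor_tour(points):
    """Greedy TSP heuristic: repeatedly visit the closest unvisited city."""
    unvisited = set(range(1, len(points)))
    tour = [0]  # start from city 0
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def tour_length(points, tour):
    """Total length of the closed tour."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(100)]
tour = nearest_neighbor_tour(cities)
print(round(tour_length(cities, tour), 2))
```

A tour like this is produced almost instantly even for large instances, which is the kind of baseline against which any learned solver's wall-clock and solution-quality claims should be measured.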

The statistical physics community has analyzed COPs for several decades, developing powerful and generic solvers such as simulated annealing and parallel tempering, and identifying ensembles of synthetic problems whose hardness can be tuned by a single parameter in a controlled way.
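As an illustration of both ingredients, the following sketch generates random 3-SAT instances whose hardness is controlled by a single parameter, the clause density, and attacks them with simulated annealing. The ensemble choice, the single-spin-flip Metropolis dynamics, and the linear annealing schedule are illustrative assumptions, not the proposal's specific methods:

```python
import math
import random

def random_ksat(n, alpha, k=3, rng=random):
    """Random k-SAT with m = alpha * n clauses of k distinct literals each.
    The clause density alpha is the single control parameter: instances get
    harder as alpha approaches the satisfiability threshold (~4.267 for k=3)."""
    clauses = []
    for _ in range(int(alpha * n)):
        variables = rng.sample(range(n), k)
        clauses.append([(v, rng.random() < 0.5) for v in variables])  # (var, negated)
    return clauses

def unsat_count(clauses, assign):
    """Energy = number of violated clauses (a literal (v, neg) is false
    when assign[v] == neg)."""
    return sum(all(assign[v] == neg for v, neg in c) for c in clauses)

def simulated_annealing(n, clauses, steps=20000, beta0=0.5, beta1=5.0, rng=random):
    """Single-variable-flip Metropolis moves at a linearly increasing
    inverse temperature beta."""
    assign = [rng.random() < 0.5 for _ in range(n)]
    energy = unsat_count(clauses, assign)
    for t in range(steps):
        beta = beta0 + (beta1 - beta0) * t / steps
        v = rng.randrange(n)
        assign[v] = not assign[v]
        new_energy = unsat_count(clauses, assign)
        if new_energy <= energy or rng.random() < math.exp(-beta * (new_energy - energy)):
            energy = new_energy
        else:
            assign[v] = not assign[v]  # reject the flip
    return assign, energy

random.seed(1)
n, alpha = 50, 3.0  # density well below the 3-SAT threshold: typically satisfiable
clauses = random_ksat(n, alpha)
assignment, violated = simulated_annealing(n, clauses)
print(violated)
```

Sweeping `alpha` toward the threshold turns the same generator into a dial for producing benchmark instances of controlled difficulty.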

In this proposal we will look at COPs through the lens of statistical physics, with the goal of i) creating synthetic datasets and hard benchmarks for testing classical and learned algorithms, and ii) using physical knowledge to understand what hardness means for ML algorithms.

We will highlight the hardest problems for existing classical algorithms and outline current and future challenges for ML models. We will provide datasets of various degrees of hardness, both supervised and unsupervised, that can be used for training and testing models. We will open-source the datasets, along with libraries containing the best algorithms designed by different communities, in order to bootstrap future research and provide high-quality baselines.

In this proposal we will also consider the problem of unbiased sampling of solutions to constraint satisfaction problems. We will build on recent advances in deep learning, mathematics, and statistical physics to develop two efficient and scalable algorithms, a classical one and a learned one. Their common underpinning is that both will be expressed as stochastic diffusive processes, namely stochastic localization and denoising diffusion probabilistic models (the approach behind recent text-to-image models). We will test the algorithms on the paradigmatic benchmarks developed during the project. Moreover, we will characterize their strengths and limitations in terms of a geometrical description of the solution space, using tools from spin glass theory. The resulting theoretical control of ML systems will improve their safety and explainability.
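To give a sense of the classical route, here is a minimal sketch of stochastic localization on a toy constraint satisfaction problem whose solution set is small enough to enumerate. The tilted measure mu_y(x) ∝ exp(<y, x>) over the solution set drives the diffusion dy_t = m(y_t) dt + dB_t, whose sign at a large horizon is a sample; the toy solution set, discretization step, and horizon are our own illustrative choices:

```python
import math
import random

# Toy solution set: two satisfying assignments (as ±1 vectors) of a tiny CSP.
SOLUTIONS = [(+1, -1, +1), (-1, +1, -1)]

def posterior_mean(y):
    """E[x | y] under the tilted uniform measure mu_y(x) ∝ exp(<y, x>)
    on SOLUTIONS, computed with a log-sum-exp shift for stability."""
    logits = [sum(yi * xi for yi, xi in zip(y, x)) for x in SOLUTIONS]
    mx = max(logits)
    w = [math.exp(l - mx) for l in logits]
    z = sum(w)
    return [sum(wk * x[i] for wk, x in zip(w, SOLUTIONS)) / z
            for i in range(len(y))]

def stochastic_localization(rng, dt=0.01, T=30.0):
    """Euler discretization of dy_t = m(y_t) dt + dB_t. As t grows, the tilted
    measure localizes on a single solution and sign(y_t) is the sample."""
    y = [0.0] * len(SOLUTIONS[0])
    t = 0.0
    while t < T:
        m = posterior_mean(y)
        y = [yi + mi * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for yi, mi in zip(y, m)]
        t += dt
    return tuple(1 if yi > 0 else -1 for yi in y)

rng = random.Random(0)
samples = [stochastic_localization(rng) for _ in range(20)]
print(samples.count(SOLUTIONS[0]), samples.count(SOLUTIONS[1]))
```

In a realistic setting the posterior mean cannot be enumerated and must itself be approximated (e.g. by message passing or a learned denoiser), which is precisely where the geometry of the solution space determines the sampler's strengths and limitations.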