We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. Notably, we show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement. This leads to substantial performance improvements across multiple task arithmetic benchmarks and diverse models.
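As a minimal sketch of the task arithmetic operation itself (toy two-parameter "models" of my choosing, not the paper's actual setup): a task vector is the difference between fine-tuned and pre-trained weights, and model editing adds a weighted sum of task vectors to the pre-trained weights.

```python
import numpy as np

# Hypothetical per-layer weights; in practice these would be the state
# dicts of a pre-trained model and its fine-tuned copies on tasks A and B.
pretrained = {"w": np.array([1.0, 2.0])}
finetuned_a = {"w": np.array([1.5, 2.0])}
finetuned_b = {"w": np.array([1.0, 3.0])}

def task_vector(ft, pre):
    """Task vector: fine-tuned minus pre-trained weights, layer by layer."""
    return {k: ft[k] - pre[k] for k in pre}

def apply_task_vectors(pre, vectors, coeffs):
    """Edit the pre-trained model by adding a weighted sum of task vectors."""
    return {k: pre[k] + sum(c * v[k] for c, v in zip(coeffs, vectors))
            for k in pre}

tau_a = task_vector(finetuned_a, pretrained)
tau_b = task_vector(finetuned_b, pretrained)
multi = apply_task_vectors(pretrained, [tau_a, tau_b], [1.0, 1.0])
# multi["w"] → array([1.5, 3.0]): both edits applied at once
```

Weight disentanglement, in this picture, is what lets the two added vectors act on their respective tasks without interfering with each other.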

We present a large-scale benchmark of label noise methods that leverage privileged information and analyze why and when they work.

We propose PRIME, a general data augmentation scheme that consists of simple families of max-entropy image transformations. We show that PRIME outperforms the prior art for corruption robustness, while its simplicity and plug-and-play nature enable it to be combined with other methods to further boost their robustness.
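As an illustrative sketch of the kind of transformation family involved (a single hypothetical spectral family with made-up parameters; the actual PRIME families and their max-entropy parameterization differ): perturb the image spectrum with random gains, then mix convexly with the clean image.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spectral_augment(img, strength=0.5):
    """Illustrative random spectral transformation (hypothetical; not the
    exact PRIME parameterization): scale the image spectrum by random
    gains, then take a random convex mix with the clean image."""
    F = np.fft.fft2(img, axes=(0, 1))
    gains = 1.0 + strength * rng.standard_normal(F.shape)
    perturbed = np.real(np.fft.ifft2(F * gains, axes=(0, 1)))
    lam = rng.uniform(0.0, 1.0)  # random mixing weight
    return lam * img + (1.0 - lam) * perturbed

img = rng.uniform(0.0, 1.0, size=(32, 32, 3))
aug = random_spectral_augment(img)
```

Because each draw is random, repeated application yields a diverse stream of augmented views of the same image, which is the plug-and-play part: the sampler can simply be inserted into any training pipeline.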

We find that the interplay between the structure of the data and the dynamics of adversarial training (AT) plays a fundamental role in catastrophic overfitting (CO). Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods.

We show that most INR families are analogous to structured signal dictionaries whose atoms are integer harmonics of the initial mapping frequencies, and whose inductive bias is akin to that of a signal dictionary formed by the eigenfunctions of the NTK at initialization. We also reveal that meta-learning reshapes the NTK in a way analogous to dictionary learning, building dictionary atoms as combinations of the examples seen during meta-training.
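A minimal worked identity makes the harmonic claim concrete (generic amplitude $a$ and frequency $\omega$, not values from the paper): composing two sine layers only generates integer harmonics of the initial frequency, via the Jacobi–Anger expansion

```latex
\sin\bigl(a\sin(\omega x)\bigr) \;=\; 2\sum_{n\ \mathrm{odd}} J_n(a)\,\sin(n\omega x),
```

where $J_n$ is the Bessel function of the first kind. The resulting atoms $\sin(n\omega x)$ are exactly integer harmonics of the initial mapping frequency $\omega$, with coefficients set by the layer amplitude $a$.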

We provide strong empirical evidence to assess the practical validity of the linear approximation of neural networks for different learning tasks. Specifically, we discover that, in contrast to what was previously observed, neural networks do not always perform better than their kernel approximations, and we reveal that the performance gap heavily depends on the architecture, the number of samples, and the training task.

The underspecification of most machine learning pipelines means that we cannot rely solely on validation performance to assess the robustness of deep learning systems to naturally occurring distribution shifts. Instead, making sure that a neural network can generalize across a large number of different situations requires understanding the specific way in which it solves a task. In this work, we propose to study this problem from a geometric perspective, with the aim of understanding two key characteristics of neural network solutions in underspecified settings: how is the geometry of the learned function related to the data representation? And are deep networks always biased towards simpler solutions, as conjectured in recent literature? We show that the way neural networks handle the underspecification of these problems is highly dependent on the data representation, affecting both the geometry and the complexity of the learned predictors. Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.

We analyze the role of the network architecture in shaping the inductive bias of deep classifiers. Specifically, we introduce the neural anisotropy directions (NADs) of a network as the vectors that encapsulate the directional inductive bias of an architecture. These vectors encode the preference of a network to separate the input data based on certain features.

Important insights towards the explainability of neural networks reside in the characteristics of their decision boundaries. In this work, we borrow tools from the field of adversarial robustness, and propose a new perspective that relates dataset features to the distance of samples to the decision boundary. Specifically, we rigorously confirm that neural networks exhibit a high invariance to non-discriminative features, and show that very small perturbations of the training samples in certain directions can lead to sudden invariances in the orthogonal ones.

In this article, we provide an in-depth review of the field of adversarial robustness in deep learning and give a self-contained introduction to its main notions. However, in contrast to the mainstream pessimistic perspective on adversarial robustness, we focus on the main positive aspects that it entails.

In this work, we identify the subspace of features used by CNNs to classify large-scale vision benchmarks, and reveal some intriguing aspects of their robustness to distribution shifts. Specifically, we show that the existence of redundant features in a dataset can harm the networks' robustness to distribution shifts.

In this paper, we show empirically that in settings with fewer features and more training data, more complex graph networks significantly outperform simpler architectures, and we offer some insights towards the proper choice of graph neural network architecture.

In this paper, we devise a general forward-backward splitting algorithm based on Bregman distances for solving a wide range of optimization problems involving a differentiable function with Lipschitz-continuous gradient and a doubly stochastic constraint.
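Schematically, and with generic symbols of my choosing (smooth term $f$, nonsmooth term $g$, step size $\gamma_k$; not necessarily the paper's exact scheme), a Bregman forward–backward step for $\min_x f(x)+g(x)$ replaces the Euclidean proximal step with a Bregman one:

```latex
x^{k+1} \in \operatorname*{argmin}_{x}\; g(x) + \langle \nabla f(x^k),\, x \rangle
  + \frac{1}{\gamma_k}\, D_\phi(x, x^k),
\qquad
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle,
```

where $\phi$ is a Legendre function inducing the Bregman distance $D_\phi$. With $\phi$ chosen as the negative entropy and the doubly stochastic constraint folded into $g$, the backward step becomes an entropic projection onto the set of doubly stochastic matrices, computable by Sinkhorn-type matrix scaling.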

We consider the problem of designing sparse sampling strategies for multidomain signals, which can be represented using tensors that admit a known multilinear decomposition. We leverage the multidomain structure of tensor signals and propose to acquire samples using a Kronecker-structured sensing function, thereby circumventing the curse of dimensionality.
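A minimal NumPy sketch of the idea (toy sizes and selection matrices of my choosing, not the paper's sampling design): sampling each domain separately is equivalent to applying one large Kronecker-structured sensing matrix, without ever forming it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-domain signal X (e.g., space x time) and one small
# selection matrix per domain.
X = rng.standard_normal((6, 8))   # multidomain signal with 2 modes
C1 = np.eye(6)[[0, 2, 5]]         # samples 3 of 6 indices in mode 1
C2 = np.eye(8)[[1, 4, 6, 7]]      # samples 4 of 8 indices in mode 2

# Kronecker-structured sensing: the 12 x 48 matrix C1 ⊗ C2 is never
# needed, since (C1 ⊗ C2) vec(X) = vec(C1 @ X @ C2.T) for the row-major
# vectorization used by NumPy's reshape.
y_big = np.kron(C1, C2) @ X.reshape(-1)    # naive: form the full matrix
y_small = (C1 @ X @ C2.T).reshape(-1)      # per-mode: two tiny products
```

The per-mode form is what circumvents the curse of dimensionality: storage and computation scale with the individual domain sizes rather than with their product.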

In this paper, we consider the problem of subsampling and reconstruction of signals that reside on the vertices of a product graph, such as sensor network time series, genomic signals, or product ratings in a social network.

In this era of data deluge, we are overwhelmed with massive volumes of extremely complex datasets. Data generated today is complex because it lacks a clear geometric structure, comes in great volumes, and often contains information from multiple domains. In this thesis, we address these issues and propose two theoretical frameworks to handle such multidomain datasets.

We present a simulation framework for a 3-D high-resolution imaging radar at 300 GHz with mechanical scanning. This tool allows us to reproduce the imaging capabilities of the radar in different setups and with different targets. The simulations are based on a ray-tracing approximation combined with a bidirectional reflectance distribution function (BRDF) model for the scattering of rough surfaces. Moreover, we present a novel approach to estimate the scattering parameters of the BRDF model for different types of targets from the combination of the radar data and information obtained from an infrared structured-light sensor. This new framework will serve as a baseline for the design of future multistatic radar configurations and to generate synthetic data to train automatic target recognition algorithms.
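For intuition about what such a scattering model computes, here is a generic diffuse-plus-rough-specular BRDF sketch (hypothetical form and parameters; not the calibrated model estimated in the paper): a Lambertian floor plus a specular lobe whose width is set by a roughness parameter.

```python
import numpy as np

def toy_brdf(theta_i, theta_r, rho_d=0.3, rho_s=0.6, sigma=0.1):
    """Illustrative diffuse + rough-specular BRDF (hypothetical constants,
    not the paper's calibrated model).

    theta_i, theta_r: incidence / observation angles in radians.
    sigma: roughness parameter controlling the specular lobe width.
    """
    diffuse = rho_d / np.pi                                        # Lambertian term
    specular = rho_s * np.exp(-(theta_r - theta_i) ** 2 / (2 * sigma ** 2))
    return diffuse + specular

# The specular lobe peaks in the mirror direction theta_r == theta_i;
# rougher surfaces (larger sigma) spread energy over wider angles.
peak = toy_brdf(0.5, 0.5)
off = toy_brdf(0.5, 0.9)
```

In a ray-tracing simulator, a function of this kind would weight the power returned along each traced ray; the estimation step described above amounts to fitting its parameters per target type.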