Attributing Model Behavior at Scale

One of the biggest challenges, if not the biggest challenge, in AGI is to develop and launch deep learning-powered applications that are safe, robust, and reliable for everyone to use.

As documented by Oxford University researcher Owain Evans, if LLMs are trained only on “George Washington was the first US president”, can they automatically answer “Who was the first US president?” Surprisingly, the answer is no. Wouldn’t it be nice to know about such LLM limitations before deploying them? OpenAI’s rock star Andrej Karpathy proposed the following explanation:

LLM knowledge is a lot more “patchy” than you’d expect. I still don’t have great intuition for it. They learn any thing in the specific “direction” of the context window of that occurrence and may not generalize when asked in other directions. It’s a weird partial generalization. The “reversal curse” (cool name) is imo a special case of this.
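
To make the probe concrete, here is a minimal sketch in Python, using the Hugging Face transformers library, of asking a causal language model about the same fact in both directions. It uses an off-the-shelf GPT-2 checkpoint rather than the fine-tuned models from the Reversal Curse paper, so it only illustrates the kind of forward/reverse probe the paper describes, not its actual experimental setup.

# Minimal sketch: probe a causal LM about the same fact in both "directions".
# Off-the-shelf GPT-2 is a stand-in; the Reversal Curse paper fine-tunes
# models on one direction only and then tests the other.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

forward_prompt = "George Washington was the first"              # direction as seen in training text
reverse_prompt = "The first president of the United States was" # reversed direction

for prompt in (forward_prompt, reverse_prompt):
    out = generator(prompt, max_new_tokens=10, do_sample=False)
    print(repr(prompt), "->", out[0]["generated_text"][len(prompt):])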

Adversarial examples have shown that a slight change to an input, unnoticeable to the human eye, can completely change a model’s predictions. A group of MIT researchers from the MIT Center for Deployable Machine Learning, led by Professor Aleksander Madry, has demonstrated that adversarial examples can be directly attributed to the presence of non-robust features in the training data (ideally, the machine learning engineer should know which of a model’s features are robust and which are not).
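
To make the “slight change” concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a classic way to construct adversarial examples. FGSM is not the MIT group’s contribution, and the ResNet-18 model and random input tensor below are placeholders for a real classifier and a real preprocessed image.

# Minimal FGSM sketch: an imperceptibly small input perturbation can flip
# a classifier's prediction. The model and the input are placeholders.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()

def fgsm_attack(x, label, eps=0.01):
    """Return x perturbed by eps * sign(gradient of the loss w.r.t. x)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

x = torch.randn(1, 3, 224, 224)     # stand-in for a normalized ImageNet image
y = model(x).argmax(dim=1)          # use the model's own prediction as the label
x_adv = fgsm_attack(x, y)
print("clean:", model(x).argmax(dim=1).item(),
      "adversarial:", model(x_adv).argmax(dim=1).item())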

In order to make deep learning systems “safe, robust, and reliable”, we need to understand how the end-to-end ML pipeline works, from the training data and algorithm selection to model deployment.

Following are a few materials to dig deeper into these very important topics:

NeurIPS 2023 workshop on attributing model behavior at scale

NeurIPS 2023 had a workshop on attributing model behavior at scale:

How do we attribute model behavior to the training data, algorithm, data pipeline, or scale used in training?

There is much left to understand in how these different factors combine to give rise to observed behaviors. For example, we still do not fully understand how the composition of training datasets influences downstream model capabilities, how to attribute model capabilities to subcomponents inside the model, and which algorithmic choices really drive performance.

A common theme underlying all these challenges is model behavior attribution. That is, the need to tie model behavior back to factors in the machine learning pipeline—such as the choice of training dataset or particular training algorithm—that we can control or reason about.

A number of papers presented at the workshop are listed at the bottom of its Web page. Those papers are, surprise surprise, mostly about LLMs.

The MIT Center for Deployable Machine Learning

In addition to the previous paper, that MIT group has published two other papers that I found quite interesting:

Datamodels: Predicting Predictions from Training Data:

We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. For any fixed “target” example x, training set S, and learning algorithm, a datamodel is a parameterized function 2^S → ℝ that, for any subset S′ ⊂ S — using only information about which examples of S are contained in S′ — predicts the outcome of training a model on S′ and evaluating on x. Despite the potential complexity of the underlying process being approximated (e.g., end-to-end training and evaluation of deep neural networks), we show that even simple linear datamodels can successfully predict model outputs. We then demonstrate that datamodels give rise to a variety of applications, such as: accurately predicting the effect of dataset counterfactuals; identifying brittle predictions; finding semantically similar examples; quantifying train-test leakage; and embedding data into a well-behaved and feature-rich representation space. Data for this paper (including pre-computed datamodels as well as raw predictions from four million trained deep neural networks) is available at this https URL.
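
To make the datamodeling idea concrete, here is a minimal sketch assuming a toy setup: a linear (ridge) datamodel is fit from 0/1 subset-membership vectors to the model’s output on a fixed target example x. The train_and_eval function is a hypothetical stand-in for the expensive step in the paper, namely end-to-end training on a subset S′ and evaluating on x.

# Minimal linear-datamodel sketch for a single target example x.
import numpy as np
from sklearn.linear_model import Ridge

n_train, n_subsets = 1000, 200
rng = np.random.default_rng(0)

# Hypothetical "ground-truth" influence of each training example on x,
# used only so the placeholder below behaves like a deterministic pipeline.
true_influence = rng.standard_normal(n_train)

def train_and_eval(mask):
    # Placeholder for: train a model on the subset S' encoded by `mask`,
    # then evaluate (e.g., the margin) on the fixed target example x.
    return float(mask @ true_influence / n_train)

masks = rng.integers(0, 2, size=(n_subsets, n_train))    # each row encodes a subset S' of S
outputs = np.array([train_and_eval(m) for m in masks])   # model output on x for each subset

datamodel = Ridge(alpha=1.0).fit(masks, outputs)          # the linear datamodel for x
# datamodel.coef_[i] estimates how much including training example i moves the output on x.
top_influencers = np.argsort(-np.abs(datamodel.coef_))[:10]
print(top_influencers)

In the paper the subsets and re-trainings number in the hundreds of thousands and the regression is sparse; the sketch only shows the shape of the computation.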

TRAK: Attributing Model Behavior at Scale:

The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets.
In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at this https URL.
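
The core ingredient of TRAK is easy to sketch: compute per-example gradients at a trained checkpoint, randomly project them into a low dimension, and score train/test pairs by inner products of the projections. The sketch below shows only that ingredient; the actual method adds kernel correction terms and ensembling over several trained models, and the projection would be applied on the fly rather than by materializing a huge matrix (the MadryLab trak package provides the real implementation).

# Rough sketch of TRAK's core ingredient: randomly projected per-example
# gradients, scored by inner products. Not the full method.
import torch
import torch.nn.functional as F

def flat_grad(model, x, y):
    """Gradient of the loss at (x, y) w.r.t. all parameters, flattened."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])

def trak_like_scores(model, train_batch, test_batch, proj_dim=64, seed=0):
    xs_tr, ys_tr = train_batch
    xs_te, ys_te = test_batch
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    torch.manual_seed(seed)
    proj = torch.randn(n_params, proj_dim) / proj_dim ** 0.5   # random projection (toy-sized only)

    phi_tr = torch.stack([flat_grad(model, x[None], y[None]) @ proj for x, y in zip(xs_tr, ys_tr)])
    phi_te = torch.stack([flat_grad(model, x[None], y[None]) @ proj for x, y in zip(xs_te, ys_te)])
    return phi_te @ phi_tr.T   # [n_test, n_train] attribution scores

# Toy usage with a tiny classifier and random data.
model = torch.nn.Linear(20, 5)
xs, ys = torch.randn(8, 20), torch.randint(0, 5, (8,))
print(trak_like_scores(model, (xs[:6], ys[:6]), (xs[6:], ys[6:])).shape)  # torch.Size([2, 6])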

Following is a presentation from one of the MIT researchers, Andrew Ilyas:

Google Research – “Towards Reliability in Deep Learning Systems”:

Google Research has published a blog post, “Towards Reliability in Deep Learning Systems”, and an accompanying paper (blog post, paper) that focus on the model itself.

Plex: Towards Reliability using Pretrained Large Model Extensions:

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models’ abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex’s capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
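
Selective prediction, one of the reliability tasks named in the abstract, is easy to sketch: the model abstains whenever its confidence falls below a threshold, and we report accuracy only over the examples it chose to answer, together with coverage. This is a generic sketch rather than the Plex evaluation protocol; probs and labels are placeholders for any classifier’s softmax outputs and ground-truth labels.

# Minimal selective-prediction sketch: abstain below a confidence threshold,
# then report accuracy on the answered examples and the coverage.
import numpy as np

def selective_accuracy(probs, labels, threshold=0.8):
    """Accuracy over examples whose max softmax confidence >= threshold."""
    confidence = probs.max(axis=1)
    answered = confidence >= threshold
    if not answered.any():
        return float("nan"), 0.0
    accuracy = (probs[answered].argmax(axis=1) == labels[answered]).mean()
    return float(accuracy), float(answered.mean())   # (accuracy, coverage)

# Toy usage with random "predictions" and labels.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)   # stand-in for softmax outputs
labels = rng.integers(0, 10, size=1000)
print(selective_accuracy(probs, labels, threshold=0.5))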

Note: The picture above is Point Reyes.

Copyright © 2005-2024 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com.