Ensuring Deep Learning Model Predictions for the Unexpected and the Uncertain

On December 11, 2023, two Waymo robotaxis crashed into the same towed pickup truck in Phoenix. Waymo’s Chief Safety Officer Mauricio Peña wrote that “a Waymo vehicle made contact with a backwards-facing pickup truck being improperly towed ahead of the Waymo vehicle such that the pickup truck was persistently angled across a center turn lane and a traffic lane. Following contact, the tow truck and towed pickup truck did not pull over or stop traveling, and a few minutes later another Waymo vehicle made contact with the same pickup truck while it was being towed in the same manner.”

Those two crashes prompted the first recall of Waymo’s deep learning software.

Those two crashes raise the following questions:

  • How do you design deep learning systems so that, at inference time, they handle unexpected situations that were not represented in the training data?
  • How do you quantify uncertainty in order to guarantee the performance of a deep learning system's predictions at inference time?

A month and a half ago, I attended a lecture at Stanford University by Princeton researcher Jake Snell, from Princeton's Computational Cognitive Science Lab, that suggested some answers to those two critical questions:

“Large-scale deep neural networks have emerged as a powerful approach to predictive tasks in a broad range of domains. However, they are apt to fail in subtle and unexpected ways when applied to tasks beyond their training data. In this talk, I will show how probabilistic inference can be used to improve the design and validation of neural networks. First, I will discuss how meta-learning can be used to endow neural networks with favorable inductive biases from probabilistic models. Second, I will illustrate how to flexibly quantify the performance of black-box predictive algorithms by using tools from empirical process theory.”

1st question – Adapt to novel situations:

Approach: augment the training sets with new data simulated from a generative model (here, a nonparametric Bayesian prior), so that the network metalearns to handle classes it has never seen.

Abstract:

“Most applications of machine learning to classification assume a closed set of balanced classes. This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution and it is unlikely that all classes are seen in a single sample. Nonparametric Bayesian models naturally capture this phenomenon, but have significant practical barriers to widespread adoption, namely implementation complexity and computational inefficiency. To address this, we present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network. By simulating data with a nonparametric Bayesian prior, we can metalearn a sequence model that performs inference over an unlimited set of classes. After training, this “neural circuit” has distilled the corresponding inductive bias and can successfully perform sequential inference over an open set of classes. Our experimental results show that the metalearned neural circuit achieves comparable or better performance than particle filter-based methods for inference in these models while being faster and simpler to use than methods that explicitly incorporate Bayesian nonparametric inference.”

Paper: A Metalearned Neural Circuit for Nonparametric Bayesian Inference
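
To make the idea more concrete, here is a minimal sketch (not the authors' code; the function name and parameters are my own illustrative assumptions) of the data-simulation step: training sequences are drawn from a Chinese Restaurant Process prior, and a sequence model would then be metalearned on many such sequences to predict cluster assignments over an open, unbounded set of classes.

```python
import numpy as np

def sample_crp_sequence(n_points, alpha=1.0, obs_dim=2, rng=None):
    """Simulate one training sequence from a Chinese Restaurant Process prior:
    cluster assignments follow CRP(alpha), and each observation is drawn from
    a Gaussian attached to its cluster. Sequences like these are what a
    sequence model could be metalearned on to distill the prior's inductive
    bias for an open set of classes."""
    rng = np.random.default_rng() if rng is None else rng
    counts, cluster_means = [], []
    xs, assignments = [], []
    for _ in range(n_points):
        # CRP: join an existing cluster with probability proportional to its
        # size, or open a new cluster with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        z = rng.choice(len(probs), p=probs)
        if z == len(counts):  # new cluster
            counts.append(0)
            cluster_means.append(rng.normal(0.0, 3.0, size=obs_dim))
        counts[z] += 1
        assignments.append(z)
        xs.append(rng.normal(cluster_means[z], 1.0))
    return np.stack(xs), np.array(assignments)

# A sequence model (e.g., an RNN or transformer) would then be trained on many
# such simulated sequences to predict each point's cluster assignment given
# the points seen so far, which is the metalearning step described in the paper.
```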

2nd question – Quantify uncertainty:

Approach: a framework to produce a family of bounds on quantiles of the loss distribution incurred by a predictor

Abstract:

“Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantiles of the loss distribution incurred by a predictor. Our method takes advantage of the order statistics of the observed loss values rather than relying on the sample mean alone. We show that a quantile is an informative way of quantifying predictive performance, and that our framework applies to a variety of quantile-based metrics, each targeting important subsets of the data distribution. We analyze the theoretical properties of our proposed method and demonstrate its ability to rigorously control loss quantiles on several real-world datasets.”

Paper: Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions
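
As a rough illustration of what a quantile guarantee looks like in practice, the sketch below uses the classic order-statistic bound to certify, with confidence 1 - delta, an upper bound on the q-quantile of the loss from i.i.d. validation losses. This is a simplified stand-in for the paper's more general framework, not its actual implementation, and the function name is my assumption.

```python
import numpy as np
from scipy.stats import binom

def quantile_upper_bound(losses, q=0.9, delta=0.05):
    """Distribution-free upper confidence bound on the q-quantile of the loss.

    With probability at least 1 - delta over the i.i.d. validation losses,
    the returned order statistic upper-bounds the true q-quantile."""
    losses = np.sort(np.asarray(losses))
    n = len(losses)
    # Pick the smallest order-statistic index k such that
    # P(Binomial(n, q) <= k - 1) >= 1 - delta.
    for k in range(1, n + 1):
        if binom.cdf(k - 1, n, q) >= 1 - delta:
            return losses[k - 1]
    return float("inf")  # not enough samples to certify the bound
```

The point, as in the paper, is that ordering the observed losses carries much more information about the tail of the loss distribution than the sample mean alone.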

An (obvious) application of the previous paper to LLMs:

Abstract:

“The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we propose Prompt Risk Control, a lightweight framework for selecting a prompt based on rigorous upper bounds on families of informative risk measures. We offer methods for producing bounds on a diverse set of metrics, including quantities that measure worst-case responses and disparities in generation quality across the population of users. In addition, we extend the underlying statistical bounding techniques to accommodate the possibility of distribution shifts in deployment. Experiments on applications such as open-ended chat, medical question summarization, and code generation highlight how such a framework can foster responsible deployment by reducing the risk of the worst outcomes.”

Paper: Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
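
A hedged sketch of what prompt selection under such bounds might look like (illustrative only; the function and parameter names, and the Bonferroni-style correction, are my assumptions rather than the paper's exact procedure): score each candidate prompt by a certified bound on a loss quantile instead of by its average validation loss, and pick the prompt with the smallest bound.

```python
def select_prompt(prompts, val_inputs, generate, loss_fn, bound_fn,
                  q=0.9, delta=0.05):
    """generate(prompt, x) -> model response; loss_fn(response, x) -> loss;
    bound_fn(losses, q, delta) -> high-confidence upper bound on the q-quantile
    of the loss (e.g., the quantile_upper_bound sketch above)."""
    best_prompt, best_bound = None, float("inf")
    # Split the error budget across candidates (Bonferroni-style) so the bound
    # reported for the selected prompt remains valid after selection.
    per_prompt_delta = delta / len(prompts)
    for prompt in prompts:
        losses = [loss_fn(generate(prompt, x), x) for x in val_inputs]
        bound = bound_fn(losses, q, per_prompt_delta)
        if bound < best_bound:
            best_prompt, best_bound = prompt, bound
    return best_prompt, best_bound
```

Selecting on the bound rather than the average is what protects the worst-off users the abstract mentions: a prompt with a great mean score but a heavy loss tail will lose to one with a slightly worse mean and a much tighter tail.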

Note 1: I wish we knew what fix Waymo applied to its deep learning software.

Note 2: The picture above is from Waymo.

Copyright © 2005-2024 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com.