Download high-resolution image

Perturbations, Optimization, and Statistics

Part of Neural Information Processing series

Edited by Tamir Hazan, George Papandreou, Daniel Tarlow

Paperback

$70.00 US

| $92.00 CAN

The MIT Press

On sale Dec 05, 2023 | 412 Pages | 9780262549943

Add to cart Add to list Exam Copies

Nonfiction > Science & Technology

View on Edelweiss

About
Table of Contents

A description of perturbation-based methods developed in machine learning to augment novel optimization methods with strong statistical guarantees.

In nearly all machine learning, decisions must be made given current knowledge. Surprisingly, making what is believed to be the best decision is not always the best strategy, even when learning in a supervised learning setting. An emerging body of work on learning under different rules applies perturbations to decision and learning procedures. These methods provide simple and highly efficient learning rules with improved theoretical guarantees. This book describes perturbation-based methods developed in machine learning to augment novel optimization methods with strong statistical guarantees, offering readers a state-of-the-art overview.

Chapters address recent modeling ideas that have arisen within the perturbations framework, including Perturb & MAP, herding, and the use of neural networks to map generic noise to distribution over highly structured data. They describe new learning procedures for perturbation models, including an improved EM algorithm and a learning algorithm that aims to match moments of model samples to moments of data. They discuss understanding the relation of perturbation models to their traditional counterparts, with one chapter showing that the perturbations viewpoint can lead to new algorithms in the traditional setting. And they consider perturbation-based regularization in neural networks, offering a more complete understanding of dropout and studying perturbations in the context of deep neural networks.

Preface ix
1 Introduction 1
Tamir Hazan, George Papandreou, and Daniel Tarlow
1.1 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Perturb-and-MAP Random Fields 17
George Papandreou and Alan L. Yuille
2.1 Energy-Based Models: Deterministic vs. Probabilistic Approaches . . . . . . . . . . . . . . . . 19
2.2 Perturb-and-MAP for Gaussian and Sparse Continuous MRFs 23
2.3 Perturb-and-MAP for MRFs with Discrete Labels . . . . . . . 28
2.4 On the Representation Power of the Perturb-and-MAP Model 35
2.5 Related Work and Recent Developments . . . . . . . . . . . . 38
2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3 Factorizing Shortest Paths with Randomized Optimum Models 45
Daniel Tarlow, Alexander Gaunt, Ryan Adams, and Richard S. Zemel
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Building Structured Models: Design Considerations . . . . . . 47
3.3 Randomized Optimum Models (RandOMs) . . . . . . . . . . . 48
3.4 Learning RandOMs . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5 RandOMs for Image Registration . . . . . . . . . . . . . . . . 56
3.6 Shortest Path Factorization . . . . . . . . . . . . . . . . . . . 56
3.7 Shortest Path Factorization with RandOMs . . . . . . . . . . 58
3.8 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.9 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.10 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.11 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4 Herding as a Learning System with Edge-of-Chaos Dynamics 73
Yutian Chen and Max Welling
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Herding Model Parameters . . . . . . . . . . . . . . . . . . . . 77
4.3 Generalized Herding . . . . . . . . . . . . . . . . . . . . . . . 99
4.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5 Learning Maximum A-Posteriori Perturbation Models 127
Andreea Gane, Tamir Hazan, and Tommi Jaakkola
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2 Background and Notation . . . . . . . . . . . . . . . . . . . . 130
5.3 Expressive Power of Perturbation Models . . . . . . . . . . . . 131
5.4 Higher Order Dependencies . . . . . . . . . . . . . . . . . . . 132
5.5 Markov Properties and Perturbation Models . . . . . . . . . . 134
5.6 Conditional Distributions . . . . . . . . . . . . . . . . . . . . . 136
5.7 Learning Perturbation Models . . . . . . . . . . . . . . . . . . 141
5.8 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . 149
5.9 Perturbation Models and Stability . . . . . . . . . . . . . . . . 152
5.10 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.11 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6 On the Expected Value of Random Maximum A-Posteriori Perturbations 161
Tamir Hazan and Tommi Jaakkola
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
6.2 Inference and Random Perturbations . . . . . . . . . . . . . . 164
6.3 Low-Dimensional Perturbations . . . . . . . . . . . . . . . . . 169
6.4 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . 182
6.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7 A Poisson Process Model for Monte Carlo 193
Chris J. Maddison
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
7.2 Poisson Processes . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.3 Exponential Races . . . . . . . . . . . . . . . . . . . . . . . . 203
7.4 Gumbel Processes . . . . . . . . . . . . . . . . . . . . . . . . . 210
7.5 Monte Carlo Methods That Use Bounds . . . . . . . . . . . . 216
7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
7.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8 Perturbation Techniques in Online Learning and Optimization 233
Jacob Abernethy, Chansoo Lee, and Ambuj Tewari
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
8.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8.3 Gradient-Based Prediction Algorithm . . . . . . . . . . . . . . 237
8.4 Generic Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . 245
8.5 Experts Setting . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.6 Euclidean Balls Setting . . . . . . . . . . . . . . . . . . . . . . 252
8.7 The Multi-Armed Bandit Setting . . . . . . . . . . . . . . . . 254
8.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9 Probabilistic Inference by Hashing and Optimization 265
Stefano Ermon
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
9.2 Problem Statement and Assumptions . . . . . . . . . . . . . . 268
9.3 Approximate Model Counting via Randomized Hashing . . . . 270
9.4 Probabilistic Models and Approximate Inference: The WISH Algorithm . . . . . . . . . . . 274
9.5 Optimization Subject to Parity Constraints . . . . . . . . . . 279
9.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
9.7 Open Problems and Research Challenges . . . . . . . . . . . . 282
9.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
9.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
10 Perturbation Models and PAC-Bayesian Generalization Bounds 289
Joseph Keshet, Subhransu Maji, Tamir Hazan, and Tommi Jaakkola
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
10.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
10.3 PAC-Bayesian Generalization Bounds . . . . . . . . . . . . . 294
10.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
10.5 The Bayesian Perspective . . . . . . . . . . . . . . . . . . . . 298
10.6 Approximate Inference . . . . . . . . . . . . . . . . . . . . . 301
10.7 Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . 302
10.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
11 Adversarial Perturbations of Deep Neural Networks 311
David Warde-Farley and Ian Goodfellow
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
11.2 Adversarial Examples . . . . . . . . . . . . . . . . . . . . . . 312
11.3 Adversarial Training . . . . . . . . . . . . . . . . . . . . . . . 329
11.4 Generative Adversarial Networks . . . . . . . . . . . . . . . . 330
11.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
11.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
12 Data Augmentation via L´evy Processes 343
Stefan Wager, William Fithian, and Percy Liang
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
12.2 L´evy Thinning . . . . . . . . . . . . . . . . . . . . . . . . . . 349
12.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
12.4 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . 365
12.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
12.6 Appendix: Proof of Theorem 12.4 . . . . . . . . . . . . . . . 369
12.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
13 Bilu-Linial Stability 375
Konstantin Makarychev and Yury Makarychev
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
13.2 Stable Instances of Graph Partitioning Problems . . . . . . . 380
13.3 Stable Instances of Clustering Problems . . . . . . . . . . . . 391
13.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400