Contents
Preface v
1 Introduction 1
1.1 What Is Learning From Data? 1
1.2 Types of Learning 7
1.3 Relationship to Statistics, Data Science, and Artificial Intelligence 12
1.4 Social, Ethical, and Legal Aspects 13
1.5 Notes 14
1.6 Exercises 15
2 Random Experiments and Probabilities 17
2.1 Random Events 17
2.2 What Is A Probability? 18
2.3 Equally Likely Events 19
2.4 Principles of Counting 21
2.5 Some Events May Be More Equal Than Others 24
2.6 The Additive Rule 24
2.7 Random Variables and Probability Distributions 25
2.8 Joint Probability Distribution of Two Random Variables 34
2.9 Conditional Probabilities 35
2.10 Bayes’ Rule 41
2.11 Graphical Models 43
2.12 Notes 56
2.13 Exercises 57
3 Probability Distributions 63
3.1 Expected Value 63
3.2 Variance 67
3.3 Covariance, Correlation, and Independence 70
3.4 Russian Inequalities 73
3.5 Programming Probability Distributions 75
3.6 Discrete Probability Distributions 76
3.7 Continuous Probability Distributions 85
3.8 Mixtures of Distributions 101
3.9 Generalized Distributions 105
3.10 Distributions of Two Random Variables 107
3.11 Exercises 114
4 Sampling and Estimation 121
4.1 Population vs. Sample 121
4.2 Sample Statistics 123
4.3 Maximum Likelihood Estimation 125
4.4 Bias and Variance 128
4.5 Knowledge Extraction 132
4.6 Prediction 133
4.7 Sampling Distributions 137
4.8 Interval Estimation 142
4.9 Nonparametric Estimation 152
4.10 Monte Carlo Methods 156
4.11 Bootstrapping 159
4.12 Notes 163
4.13 Exercises 163
5 Hypothesis Testing 171
5.1 Basic Definitions 171
5.2 Tests on the Mean of a Population 172
5.3 Tests on the Proportion of a Bernoulli Population 186
5.4 Tests on the Variance of a Normal Population 189
5.5 Comparing the Parameters of Two Populations 191
5.6 Comparing Many Populations: Analysis of Variance 197
5.7 Design of Experiments 201
5.8 Goodness-of-Fit Tests 204
5.9 Nonparametric Tests 207
5.10 Notes 210
5.11 Exercises 210
6 Multivariate Models 215
6.1 Multivariate Data 215
6.2 Multivariate Modeling 220
6.3 Multivariate Normal Distribution 226
6.4 Multivariate Bernoulli Distribution 232
6.5 Principal Component Analysis 234
6.6 Dimensionality Reduction and Class Separability 245
6.7 Encoding/Decoding Data 245
6.8 Feature Embedding 249
6.9 Singular Value Decomposition 252
6.10 Notes 255
6.11 Exercises 256
7 Regression 263
7.1 The Idea 263
7.2 Simple Linear Regression 265
7.3 Probabilistic Interpretation 268
7.4 Analysis of Variance for Regression 271
7.5 Prediction 273
7.6 Vector-Matrix Notation 275
7.7 Generalizing the Linear Model 278
7.8 Regression Using Iterative Optimization 281
7.9 Online Learning 290
7.10 Model Selection and the Bias/Variance Tradeoff 296
7.11 Cross-Validation 299
7.12 Feature Selection 306
7.13 Regularization 311
7.14 k-Fold Resampling 312
7.15 Exercises 316
8 Classification 325
8.1 Introduction 325
8.2 Bayesian Decision Theory 326
8.3 Parametric Classification 327
8.4 Multivariate Case 330
8.5 Losses and Rejects 339
8.6 Information Retrieval 342
8.7 Logistic Regression 344
8.8 Notes 357
8.9 Exercises 357
9 Clustering 363
9.1 Introduction 363
9.2 k-Means Clustering 365
9.3 Normal Mixtures and Soft Clustering 374
9.4 Mixtures of Mixtures for Classification 379
9.5 Radial Basis Functions 382
9.6 Mixtures of Experts 385
9.7 Notes 390
9.8 Exercises 391
10 Nearest Neighbors 395
10.1 The Story So Far 395
10.2 Nonparametric Methods 396
10.3 Kernel Density Estimation 397
10.4 Nonparametric Classification 401
10.5 k-Nearest Neighbors 403
10.6 Smoothing Models 410
10.7 Distance Measures 414
10.8 Notes 415
10.9 Exercises 416
11 Artificial Neural Networks 419
11.1 Why We Care About the Brain 419
11.2 The Perceptron 425
11.3 Training A Perceptron 428
11.4 Learning Boolean Functions 428
11.5 The Multilayer Perceptron 434
11.6 The Autoencoder 444
11.7 Deep Learning 447
11.8 Improving Convergence 449
11.9 Structuring the Network 451
11.10 Recurrent Networks 456
11.11 Composite Architectures 457
11.12 Notes 460
11.13 Exercises 461
A Linear Algebra 465
A.1 Vectors and Matrices 465
A.2 Vector Projections 467
A.3 Similarity of Vectors 468
A.4 Square Matrices 468
A.5 Linear Dependence, Rank, and Inverse Matrices 468
A.6 Positive Definite Matrices 469
A.7 Trace and Determinant 469
A.8 Matrix-Vector Product 470
A.9 Eigenvalues and Eigenvectors 470
A.10 Matrix Decomposition 470
References 471
Index 476