Contents:
1 Introduction
1.1 What Is Machine Learning
1.2 When Do We Need Machine Learning
1.3 Types of Learning
1.4 Relations to Other Fields
1.5 How to Read This Book
1.5.1 Possible Course Plans Based on This Book
1.6 Notation
Part I Foundations
2 A Gentle Start
2.1 A Formal Model – The Statistical Learning Framework
2.2 Empirical Risk Minimization
2.2.1 Something May Go Wrong
2.3 Empirical Risk Minimization with Inductive Bias
2.3.1 Finite Hypothesis Classes
3 A Formal Learning Model
3.1 PAC Learning
3.2 A More General Learning Model
3.2.1 Releasing the Realizability Assumption
3.2.2 The Scope of Learning Problems Modeled
4 Learning via Uniform Convergence
4.1 Uniform Convergence Is Sufficient for Learnability
4.2 Finite Classes Are Agnostic PAC Learnable
5 The Bias-Complexity Tradeoff
5.1 The No-Free-Lunch Theorem
5.1.1 No-Free-Lunch and Prior Knowledge
5.2 Error Decomposition
6 The VC-Dimension
6.1 Infinite-Size Classes Can Be Learnable
6.2 The VC-Dimension
6.3 Threshold Functions
6.3.1 Intervals
6.3.2 Axis Aligned Rectangles
6.3.3 Finite Classes
6.3.4 VC-Dimension and the Number of Parameters
6.4 The Fundamental Theorem of PAC Learning
6.5 Proof of Theorem
6.5.1 Sauer's Lemma and the Growth Function
6.5.2 Uniform Convergence for Classes of Small Effective Size
7 Nonuniform Learnability
7.1 Nonuniform Learnability
7.1.1 Characterizing Nonuniform Learnability
7.2 Structural Risk Minimization
7.3 Minimum Description Length and Occam's Razor
7.3.1 Occam's Razor
7.4 Other Notions of Learnability – Consistency
7.5 Discussing the Different Notions of Learnability
7.5.1 The No-Free-Lunch Theorem Revisited
8 The Runtime of Learning
8.1 Formal Definition*
8.2 Implementing the ERM Rule
8.2.1 Finite Classes
8.2.2 Axis Aligned Rectangles
8.2.3 Boolean Conjunctions
8.2.4 Learning 3-Term DNF
8.3 Efficiently Learnable, but Not by a Proper ERM
8.4 Hardness of Learning*
Part II From Theory to Algorithms
9 Linear Predictors
9.1 Halfspaces
9.1.1 Linear Programming for the Class of Halfspaces
9.1.2 Perceptron for Halfspaces
9.1.3 The VC Dimension of Halfspaces
9.2 Linear Regression
9.2.1 Least Squares
9.2.2 Linear Regression for Polynomial Regression Tasks
9.3 Logistic Regression
10 Boosting
10.1 Weak Learnability
10.1.1 Efficient Implementation of ERM for Decision Stumps
10.2 AdaBoost
10.3 Linear Combinations of Base Hypotheses
10.3.1 The VC-Dimension of L(B, T)
10.4 AdaBoost for Face Recognition
11 Model Selection and Validation
11.1 Model Selection Using SRM
11.2 Validation
11.2.1 Hold Out Set
11.2.2 Validation for Model Selection
11.2.3 The Model-Selection Curve
11.2.4 k-Fold Cross Validation
11.2.5 Train-Validation-Test Split
11.3 What to Do If Learning Fails
12 Convex Learning Problems
12.1 Convexity, Lipschitzness, and Smoothness
12.1.1 Convexity
12.1.2 Lipschitzness
12.1.3 Smoothness
12.2 Convex Learning Problems
12.2.1 Learnability of Convex Learning Problems
12.2.2 Convex-Lipschitz/Smooth-Bounded Learning Problems
12.3 Surrogate Loss Functions
13 Regularization and Stability
13.1 Regularized Loss Minimization
13.1.1 Ridge Regression
13.2 Stable Rules Do Not Overfit
13.3 Tikhonov Regularization as a Stabilizer
13.3.1 Lipschitz Loss
13.3.2 Smooth and Nonnegative Loss
13.4 Controlling the Fitting-Stability Tradeoff
14 Stochastic Gradient Descent
14.1 Gradient Descent
14.1.1 Analysis of GD for Convex-Lipschitz Functions
14.2 Subgradients
14.2.1 Calculating Subgradients
14.2.2 Subgradients of Lipschitz Functions
14.2.3 Subgradient Descent
14.3 Stochastic Gradient Descent (SGD)
14.3.1 Analysis of SGD for Convex-Lipschitz-Bounded Functions
14.4 Variants
14.4.1 Adding a Projection Step
14.4.2 Variable Step Size
14.4.3 Other Averaging Techniques
14.4.4 Strongly Convex Functions*
14.5 Learning with SGD
14.5.1 SGD for Risk Minimization
14.5.2 Analyzing SGD for Convex-Smooth Learning Problems
14.5.3 SGD for Regularized Loss Minimization
15 Support Vector Machines
15.1 Margin and Hard-SVM
15.1.1 The Homogenous Case
15.1.2 The Sample Complexity of Hard-SVM
15.2 Soft-SVM and Norm Regularization
15.2.1 The Sample Complexity of Soft-SVM
15.2.2 Margin and Norm-Based Bounds versus Dimension
15.2.3 The Ramp Loss*
15.3 Optimality Conditions and "Support Vectors"*
15.4 Duality*
15.5 Implementing Soft-SVM Using SGD
16 Kernel Methods
16.1 Embeddings into Feature Spaces
16.2 The Kernel Trick
16.2.1 Kernels as a Way to Express Prior Knowledge
16.2.2 Characterizing Kernel Functions*
16.3 Implementing Soft-SVM with Kernels
17 Multiclass, Ranking, and Complex Prediction Problems
17.1 One-versus-All and All-Pairs
17.2 Linear Multiclass Predictors
17.2.1 How to Construct Ψ
17.2.2 Cost-Sensitive Classification
17.2.3 ERM
17.2.4 Generalized Hinge Loss
17.2.5 Multiclass SVM and SGD
17.3 Structured Output Prediction
17.4 Ranking
17.4.1 Linear Predictors for Ranking
17.5 Bipartite Ranking and Multivariate Performance Measures
17.5.1 Linear Predictors for Bipartite Ranking
18 Decision Trees
18.1 Sample Complexity
18.2 Decision Tree Algorithms
18.2.1 Implementations of the Gain Measure
18.2.2 Pruning
18.2.3 Threshold-Based Splitting Rules for Real-Valued Features
18.3 Random Forests
19 Nearest Neighbor
19.1 k Nearest Neighbors
19.2 Analysis
19.2.1 A Generalization Bound for the 1-NN Rule
19.2.2 The "Curse of Dimensionality"
19.3 Efficient Implementation*
20 Neural Networks
20.1 Feedforward Neural Networks
20.2 Learning Neural Networks
20.3 The Expressive Power of Neural Networks
20.3.1 Geometric Intuition
20.4 The Sample Complexity of Neural Networks
20.5 The Runtime of Learning Neural Networks
20.6 SGD and Backpropagation
Part III Additional Learning Models
21 Online Learning
21.1 Online Classification in the Realizable Case
21.1.1 Online Learnability
21.2 Online Classification in the Unrealizable Case
21.2.1 Weighted-Majority
21.3 Online Convex Optimization
21.4 The Online Perceptron Algorithm
22 Clustering
22.1 Linkage-Based Clustering Algorithms
22.2 k-Means and Other Cost Minimization Clusterings
22.2.1 The k-Means Algorithm
22.3 Spectral Clustering
22.3.1 Graph Cut
22.3.2 Graph Laplacian and Relaxed Graph Cuts
22.3.3 Unnormalized Spectral Clustering
22.4 Information Bottleneck*
22.5 A High Level View of Clustering
23 Dimensionality Reduction
23.1 Principal Component Analysis (PCA)
23.1.1 A More Efficient Solution for the Case d ≫ m
23.1.2 Implementation and Demonstration
23.2 Random Projections
23.3 Compressed Sensing
23.3.1 Proofs*
23.4 PCA or Compressed Sensing?
24 Generative Models
24.1 Maximum Likelihood Estimator
24.1.1 Maximum Likelihood Estimation for Continuous Random Variables
24.1.2 Maximum Likelihood and Empirical Risk Minimization
24.1.3 Generalization Analysis
24.2 Naive Bayes
24.3 Linear Discriminant Analysis
24.4 Latent Variables and the EM Algorithm
24.4.1 EM as an Alternate Maximization Algorithm
24.4.2 EM for Mixture of Gaussians (Soft k-Means)
24.5 Bayesian Reasoning
25 Feature Selection and Generation
25.1 Feature Selection
25.1.1 Filters
25.1.2 Greedy Selection Approaches
25.1.3 Sparsity-Inducing Norms
25.2 Feature Manipulation and Normalization
25.2.1 Examples of Feature Transformations
25.3 Feature Learning
25.3.1 Dictionary Learning Using Auto-Encoders
Part IV Advanced Theory
26 Rademacher Complexities
26.1 The Rademacher Complexity
26.1.1 Rademacher Calculus
26.2 Rademacher Complexity of Linear Classes
26.3 Generalization Bounds for SVM
26.4 Generalization Bounds for Predictors with Low ℓ1 Norm
27 Covering Numbers
27.1 Covering
27.1.1 Properties
27.2 From Covering to Rademacher Complexity via Chaining
28 Proof of the Fundamental Theorem of Learning Theory
28.1 The Upper Bound for the Agnostic Case
28.2 The Lower Bound for the Agnostic Case
28.2.1 Showing That m(ε, δ) ≥ 0.5 log(1/(4δ))/ε²
28.2.2 Showing That m(ε, 1/8) ≥ 8d/ε²
28.3 The Upper Bound for the Realizable Case
28.3.1 From ε-Nets to PAC Learnability
29 Multiclass Learnability
29.1 The Natarajan Dimension
29.2 The Multiclass Fundamental Theorem
29.2.1 On the Proof of Theorem
29.3 Calculating the Natarajan Dimension
29.3.1 One-versus-All Based Classes
29.3.2 General Multiclass-to-Binary Reductions
29.3.3 Linear Multiclass Predictors
29.4 On Good and Bad ERMs
30 Compression Bounds
30.1 Compression Bounds
30.2 Axis Aligned Rectangles
30.2.1 Halfspaces
30.2.2 Separating Polynomials
30.2.3 Separation with Margin
31 PAC-Bayes
31.1 PAC-Bayes Bounds
thank you for visiting :)