Statistical Learning Theory Papers

2	Name	First Author (or most notable)	Link
3	An Overview of Statistical Learning Theory	Vapnik	http://math.arizona.edu/~hzhang/math574m/vapnik.pdf
4	A Theory of the Learnable	Valiant	http://web.mit.edu/6.435/www/Valiant84.pdf
5	An Introduction to Kernel-Based Learning Algorithms	Klaus-Robert Müller	http://media.cs.tsinghua.edu.cn/~taopin/ML2005/intro2Kernel-basedLearn-TNN-2001.pdf
6	Ridge Regression: Biased Estimation for Nonorthogonal Problems	Arthur E. Hoerl	http://math.arizona.edu/~hzhang/math574m/Read/Ridge.pdf
7	Random Forests	Breiman	https://www.cise.ufl.edu/~anand/fa11/Breiman_Random_Forests.pdf
8	Generalized Additive Models	Trevor Hastie and Robert Tibshirani	http://gsp.humboldt.edu/olm_2015/Courses/GSP_570/Learning%20Modules/07%20GAMs/gam.pdf
9	The Mathematics of Learning: Dealing with Data	Tomaso Poggio	http://cbcl.mit.edu/projects/cbcl/publications/ps/notices-ams2003refs.pdf
10	Multivariate Adaptive Regression Splines	Friedman	ftp://gisportal.mt.gov/Maxell/Models/Predictive_Modeling_for_DSS_Lincoln_NE_121510/Modeling_Literature/Friedman_MARS.pdf
11	Stochastic Gradient Boosting	Friedman	http://astro.temple.edu/~msobel/courses_files/StochasticBoosting(gradient).pdf
12	Regularization and Variable Selection via the Elastic Net	Hastie	http://web.stanford.edu/~hastie/Papers/elasticnet.pdf
13	Regression Shrinkage and Selection via the Lasso	Tibshirani	http://lib.cufe.edu.cn/upload_files/file/20140521/3_20140521_Regression%20shrinkage%20and%20selection%20via%20the%20lasso.pdf
14	Bagging Predictors	Breiman	http://lia.disi.unibo.it/Courses/AI/applicationsAI2005-06/Tesine/Leo%20Breiman-Bagging%20Predictors.pdf
15	Estimating Latent-Variable Graphical Models using Moments and Likelihoods	Liang	http://arun.chagantys.org/files/research/ChaLiang2014.pdf
16	Analysis of Thompson Sampling for the Multi-armed Bandit Problem	Agrawal	http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf
17	A Method of Moments for Mixture Models and Hidden Markov Models	Anandkumar	http://www.jmlr.org/proceedings/papers/v23/anandkumar12/anandkumar12.pdf
18	Online convex optimization in the bandit setting: gradient descent without a gradient	Flaxman	http://research.microsoft.com/en-us/um/people/adum/publications/2005-Online_Convex_Optimization_in_the_Bandit_Setting.pdf
19	Finite-time Analysis of the Multiarmed Bandit Problem	Auer	http://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf
20	Unsupervised Learning of Noisy-Or Bayesian Networks	Halpern	http://cs.nyu.edu/~dsontag/papers/HalpernSontag_uai13.pdf
21	Learning mixtures of spherical Gaussians: moment methods and spectral decompositions	Hsu	http://arxiv.org/pdf/1206.5766.pdf
22	Asymptotically Efficient Adaptive Allocation Rules	Lai	http://www.rci.rutgers.edu/~mnk/papers/Lai_robbins85.pdf
23	A Training Algorithm for Optimal Margin Classifiers (SVM)	Vapnik	http://w.svms.org/training/BOGV92.pdf
24	Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods	Janzamin	http://arxiv.org/pdf/1506.08473.pdf
25	A Generalized Online Mirror Descent with Applications to Classification and Regression	Orabona	http://mercurio.srv.di.unimi.it/~cesabian/Pubblicazioni/genOmd.pdf
26	Online Learning and Online Convex Optimization	Shai Shalev-Shwartz	http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
27	Laplacian Eigenmaps for Dimensionality Reduction and Data Representation	Belkin	https://www.cise.ufl.edu/~anand/fa11/Laplacian_Eigenmaps_preprint.pdf
28	Spatial Interaction and the Statistical Analysis of Lattice Systems	Besag	https://www.cise.ufl.edu/~anand/fa11/Besag_Spatial_interaction.pdf
29	Online Learning with Predictable Sequences	Rakhlin	http://arxiv.org/pdf/1208.3728v2.pdf
30	Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm	Liang	http://cs.stanford.edu/~pliang/papers/eg-icml2014.pdf
31	Convolution Kernels on Discrete Structures	Haussler	https://cbse.soe.ucsc.edu/sites/default/files/convolutions.pdf
32	Markov Logic Networks	Domingos	http://homes.cs.washington.edu/~pedrod/papers/mlj05.pdf
33	A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting (AdaBoost)	Freund and Schapire	http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/materials/freund_schapire_adaboost_journal.pdf
34	A Short Introduction to Boosting	Yoav Freund and Robert E. Schapire	http://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf
35
36	Explaining the Gibbs Sampler	George Casella	http://www.stat.ufl.edu/archived/casella/OlderPapers/ExpGibbs.pdf
37	Equation of State Calculations by Fast Computing Machines (original MCMC paper)	Metropolis	http://bayes.wustl.edu/Manual/EquationOfState.pdf
38
39	Maximum likelihood from incomplete data via the EM algorithm	Dempster	http://web.mit.edu/6.435/www/Dempster77.pdf