During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics.
【About the Authors】
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
【Table of Contents】
1 Introduction

2 Overview of Supervised Learning
2.1 Introduction; 2.2 Variable Types and Terminology; 2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors; 2.3.1 Linear Models and Least Squares; 2.3.2 Nearest-Neighbor Methods; 2.3.3 From Least Squares to Nearest Neighbors; 2.4 Statistical Decision Theory; 2.5 Local Methods in High Dimensions; 2.6 Statistical Models, Supervised Learning and Function Approximation; 2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y); 2.6.2 Supervised Learning; 2.6.3 Function Approximation; 2.7 Structured Regression Models; 2.7.1 Difficulty of the Problem; 2.8 Classes of Restricted Estimators; 2.8.1 Roughness Penalty and Bayesian Methods; 2.8.2 Kernel Methods and Local Regression; 2.8.3 Basis Functions and Dictionary Methods; 2.9 Model Selection and the Bias–Variance Tradeoff; Bibliographic Notes; Exercises

3 Linear Methods for Regression
3.1 Introduction; 3.2 Linear Regression Models and Least Squares; 3.2.1 Example: Prostate Cancer; 3.2.2 The Gauss–Markov Theorem; 3.2.3 Multiple Regression from Simple Univariate Regression; 3.2.4 Multiple Outputs; 3.3 Subset Selection; 3.3.1 Best-Subset Selection; 3.3.2 Forward- and Backward-Stepwise Selection; 3.3.3 Forward-Stagewise Regression; 3.3.4 Prostate Cancer Data Example (Continued); 3.4 Shrinkage Methods; 3.4.1 Ridge Regression; 3.4.2 The Lasso; 3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso; 3.4.4 Least Angle Regression; 3.5 Methods Using Derived Input Directions; 3.5.1 Principal Components Regression; 3.5.2 Partial Least Squares; 3.6 Discussion: A Comparison of the Selection and Shrinkage Methods; 3.7 Multiple Outcome Shrinkage and Selection; 3.8 More on the Lasso and Related Path Algorithms; 3.8.1 Incremental Forward Stagewise Regression; 3.8.2 Piecewise-Linear Path Algorithms; 3.8.3 The Dantzig Selector; 3.8.4 The Grouped Lasso; 3.8.5 Further Properties of the Lasso; 3.8.6 Pathwise Coordinate Optimization; 3.9 Computational Considerations; Bibliographic Notes; Exercises

4 Linear Methods for Classification
4.1 Introduction; 4.2 Linear Regression of an Indicator Matrix; 4.3 Linear Discriminant Analysis; 4.3.1 Regularized Discriminant Analysis; 4.3.2 Computations for LDA; 4.3.3 Reduced-Rank Linear Discriminant Analysis; 4.4 Logistic Regression; 4.4.1 Fitting Logistic Regression Models; 4.4.2 Example: South African Heart Disease; 4.4.3 Quadratic Approximations and Inference; 4.4.4 L1 Regularized Logistic Regression; 4.4.5 Logistic Regression or LDA?; 4.5 Separating Hyperplanes; 4.5.1 Rosenblatt’s Perceptron Learning Algorithm; 4.5.2 Optimal Separating Hyperplanes; Bibliographic Notes; Exercises

5 Basis Expansions and Regularization
5.1 Introduction; 5.2 Piecewise Polynomials and Splines; 5.2.1 Natural Cubic Splines; 5.2.2 Example: South African Heart Disease (Continued); 5.2.3 Example: Phoneme Recognition; 5.3 Filtering and Feature Extraction; 5.4 Smoothing Splines; 5.4.1 Degrees of Freedom and Smoother Matrices; 5.5 Automatic Selection of the Smoothing Parameters; 5.5.1 Fixing the Degrees of Freedom; 5.5.2 The Bias–Variance Tradeoff; 5.6 Nonparametric Logistic Regression; 5.7 Multidimensional Splines; 5.8 Regularization and Reproducing Kernel Hilbert Spaces; 5.8.1 Spaces of Functions Generated by Kernels; 5.8.2 Examples of RKHS; 5.9 Wavelet Smoothing; 5.9.1 Wavelet Bases and the Wavelet Transform; 5.9.2 Adaptive Wavelet Filtering; Bibliographic Notes; Exercises; Appendix: Computational Considerations for Splines; Appendix: B-splines; Appendix: Computations for Smoothing Splines

6 Kernel Smoothing Methods
6.1 One-Dimensional Kernel Smoothers; 6.1.1 Local Linear Regression; 6.1.2 Local Polynomial Regression; 6.2 Selecting the Width of the Kernel; 6.3 Local Regression in IR^p; 6.4 Structured Local Regression Models in IR^p; 6.4.1 Structured Kernels; 6.4.2 Structured Regression Functions; 6.5 Local Likelihood and Other Models; 6.6 Kernel Density Estimation and Classification; 6.6.1 Kernel Density Estimation; 6.6.2 Kernel Density Classification; 6.6.3 The Naive Bayes Classifier; 6.7 Radial Basis Functions and Kernels; 6.8 Mixture Models for Density Estimation and Classification; 6.9 Computational Considerations; Bibliographic Notes; Exercises

7 Model Assessment and Selection
7.1 Introduction; 7.2 Bias, Variance and Model Complexity; 7.3 The Bias–Variance Decomposition; 7.3.1 Example: Bias–Variance Tradeoff; 7.4 Optimism of the Training Error Rate; 7.5 Estimates of In-Sample Prediction Error; 7.6 The Effective Number of Parameters; 7.7 The Bayesian Approach and BIC; 7.8 Minimum Description Length; 7.9 Vapnik–Chervonenkis Dimension; 7.9.1 Example (Continued); 7.10 Cross-Validation; 7.10.1 K-Fold Cross-Validation; 7.10.2 The Wrong and Right Way to Do Cross-validation; 7.10.3 Does Cross-Validation Really Work?; 7.11 Bootstrap Methods; 7.11.1 Example (Continued); 7.12 Conditional or Expected Test Error?; Bibliographic Notes; Exercises

8 Model Inference and Averaging
8.1 Introduction; 8.2 The Bootstrap and Maximum Likelihood Methods; 8.2.1 A Smoothing Example; 8.2.2 Maximum Likelihood Inference; 8.2.3 Bootstrap versus Maximum Likelihood; 8.3 Bayesian Methods; 8.4 Relationship Between the Bootstrap and Bayesian Inference; 8.5 The EM Algorithm; 8.5.1 Two-Component Mixture Model; 8.5.2 The EM Algorithm in General; 8.5.3 EM as a Maximization–Maximization Procedure; 8.6 MCMC for Sampling from the Posterior; 8.7 Bagging; 8.7.1 Example: Trees with Simulated Data; 8.8 Model Averaging and Stacking; 8.9 Stochastic Search: Bumping; Bibliographic Notes; Exercises

9 Additive Models, Trees, and Related Methods
9.1 Generalized Additive Models; 9.1.1 Fitting Additive Models; 9.1.2 Example: Additive Logistic Regression; 9.1.3 Summary; 9.2 Tree-Based Methods; 9.2.1 Background; 9.2.2 Regression Trees; 9.2.3 Classification Trees; 9.2.4 Other Issues; 9.2.5 Spam Example (Continued); 9.3 PRIM: Bump Hunting; 9.3.1 Spam Example (Continued); 9.4 MARS: Multivariate Adaptive Regression Splines; 9.4.1 Spam Example (Continued); 9.4.2 Example (Simulated Data); 9.4.3 Other Issues; 9.5 Hierarchical Mixtures of Experts; 9.6 Missing Data; 9.7 Computational Considerations; Bibliographic Notes; Exercises

10 Boosting and Additive Trees
10.1 Boosting Methods; 10.1.1 Outline of This Chapter; 10.2 Boosting Fits an Additive Model; 10.3 Forward Stagewise Additive Modeling; 10.4 Exponential Loss and AdaBoost; 10.5 Why Exponential Loss?; 10.6 Loss Functions and Robustness; 10.7 “Off-the-Shelf” Procedures for Data Mining; 10.8 Example: Spam Data; 10.9 Boosting Trees; 10.10 Numerical Optimization via Gradient Boosting; 10.10.1 Steepest Descent; 10.10.2 Gradient Boosting; 10.10.3 Implementations of Gradient Boosting; 10.11 Right-Sized Trees for Boosting; 10.12 Regularization; 10.12.1 Shrinkage; 10.12.2 Subsampling; 10.13 Interpretation; 10.13.1 Relative Importance of Predictor Variables; 10.13.2 Partial Dependence Plots; 10.14 Illustrations; 10.14.1 California Housing; 10.14.2 New Zealand Fish; 10.14.3 Demographics Data; Bibliographic Notes; Exercises

11 Neural Networks
11.1 Introduction; 11.2 Projection Pursuit Regression; 11.3 Neural Networks; 11.4 Fitting Neural Networks; 11.5 Some Issues in Training Neural Networks; 11.5.1 Starting Values; 11.5.2 Overfitting; 11.5.3 Scaling of the Inputs; 11.5.4 Number of Hidden Units and Layers; 11.5.5 Multiple Minima; 11.6 Example: Simulated Data; 11.7 Example: ZIP Code Data; 11.8 Discussion; 11.9 Bayesian Neural Nets and the NIPS 2003 Challenge; 11.9.1 Bayes, Boosting and Bagging; 11.9.2 Performance Comparisons; 11.10 Computational Considerations; Bibliographic Notes; Exercises

12 Support Vector Machines and Flexible Discriminants
12.1 Introduction; 12.2 The Support Vector Classifier; 12.2.1 Computing the Support Vector Classifier; 12.2.2 Mixture Example (Continued); 12.3 Support Vector Machines and Kernels; 12.3.1 Computing the SVM for Classification; 12.3.2 The SVM as a Penalization Method; 12.3.3 Function Estimation and Reproducing Kernels; 12.3.4 SVMs and the Curse of Dimensionality; 12.3.5 A Path Algorithm for the SVM Classifier; 12.3.6 Support Vector Machines for Regression; 12.3.7 Regression and Kernels; 12.3.8 Discussion; 12.4 Generalizing Linear Discriminant Analysis; 12.5 Flexible Discriminant Analysis; 12.5.1 Computing the FDA Estimates; 12.6 Penalized Discriminant Analysis; 12.7 Mixture Discriminant Analysis; 12.7.1 Example: Waveform Data; Bibliographic Notes; Exercises

13 Prototype Methods and Nearest-Neighbors
13.1 Introduction; 13.2 Prototype Methods; 13.2.1 K-means Clustering; 13.2.2 Learning Vector Quantization; 13.2.3 Gaussian Mixtures; 13.3 k-Nearest-Neighbor Classifiers; 13.3.1 Example: A Comparative Study; 13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification; 13.3.3 Invariant Metrics and Tangent Distance; 13.4 Adaptive Nearest-Neighbor Methods; 13.4.1 Example; 13.4.2 Global Dimension Reduction for Nearest-Neighbors; 13.5 Computational Considerations; Bibliographic Notes; Exercises

14 Unsupervised Learning
14.1 Introduction; 14.2 Association Rules; 14.2.1 Market Basket Analysis; 14.2.2 The Apriori Algorithm; 14.2.3 Example: Market Basket Analysis; 14.2.4 Unsupervised as Supervised Learning; 14.2.5 Generalized Association Rules; 14.2.6 Choice of Supervised Learning Method; 14.2.7 Example: Market Basket Analysis (Continued); 14.3 Cluster Analysis; 14.3.1 Proximity Matrices; 14.3.2 Dissimilarities Based on Attributes; 14.3.3 Object Dissimilarity; 14.3.4 Clustering Algorithms; 14.3.5 Combinatorial Algorithms; 14.3.6 K-means; 14.3.7 Gaussian Mixtures as Soft K-means Clustering; 14.3.8 Example: Human Tumor Microarray Data; 14.3.9 Vector Quantization; 14.3.10 K-medoids; 14.3.11 Practical Issues; 14.3.12 Hierarchical Clustering; 14.4 Self-Organizing Maps; 14.5 Principal Components, Curves and Surfaces; 14.5.1 Principal Components; 14.5.2 Principal Curves and Surfaces; 14.5.3 Spectral Clustering; 14.5.4 Kernel Principal Components; 14.5.5 Sparse Principal Components; 14.6 Non-negative Matrix Factorization; 14.6.1 Archetypal Analysis; 14.7 Independent Component Analysis and Exploratory Projection Pursuit; 14.7.1 Latent Variables and Factor Analysis; 14.7.2 Independent Component Analysis; 14.7.3 Exploratory Projection Pursuit; 14.7.4 A Direct Approach to ICA; 14.8 Multidimensional Scaling; 14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling; 14.10 The Google PageRank Algorithm; Bibliographic Notes; Exercises

15 Random Forests
15.1 Introduction; 15.2 Definition of Random Forests; 15.3 Details of Random Forests; 15.3.1 Out of Bag Samples; 15.3.2 Variable Importance; 15.3.3 Proximity Plots; 15.3.4 Random Forests and Overfitting; 15.4 Analysis of Random Forests; 15.4.1 Variance and the De-Correlation Effect; 15.4.2 Bias; 15.4.3 Adaptive Nearest Neighbors; Bibliographic Notes; Exercises

16 Ensemble Learning
16.1 Introduction; 16.2 Boosting and Regularization Paths; 16.2.1 Penalized Regression; 16.2.2 The “Bet on Sparsity” Principle; 16.2.3 Regularization Paths, Over-fitting and Margins; 16.3 Learning Ensembles; 16.3.1 Learning a Good Ensemble; 16.3.2 Rule Ensembles; Bibliographic Notes; Exercises

17 Undirected Graphical Models
17.1 Introduction; 17.2 Markov Graphs and Their Properties; 17.3 Undirected Graphical Models for Continuous Variables; 17.3.1 Estimation of the Parameters when the Graph Structure is Known; 17.3.2 Estimation of the Graph Structure; 17.4 Undirected Graphical Models for Discrete Variables; 17.4.1 Estimation of the Parameters when the Graph Structure is Known; 17.4.2 Hidden Nodes; 17.4.3 Estimation of the Graph Structure; 17.4.4 Restricted Boltzmann Machines; Exercises

18 High-Dimensional Problems: p ≫ N
18.1 When p is Much Bigger than N; 18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids; 18.3 Linear Classifiers with Quadratic Regularization; 18.3.1 Regularized Discriminant Analysis; 18.3.2 Logistic Regression with Quadratic Regularization; 18.3.3 The Support Vector Classifier; 18.3.4 Feature Selection; 18.3.5 Computational Shortcuts When p ≫ N; 18.4 Linear Classifiers with L1 Regularization; 18.4.1 Application of Lasso to Protein Mass Spectroscopy; 18.4.2 The Fused Lasso for Functional Data; 18.5 Classification When Features are Unavailable; 18.5.1 Example: String Kernels and Protein Classification; 18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances; 18.5.3 Example: Abstracts Classification; 18.6 High-Dimensional Regression: Supervised Principal Components; 18.6.1 Connection to Latent-Variable Modeling; 18.6.2 Relationship with Partial Least Squares; 18.6.3 Pre-Conditioning for Feature Selection; 18.7 Feature Assessment and the Multiple-Testing Problem; 18.7.1 The False Discovery Rate; 18.7.2 Asymmetric Cutpoints and the SAM Procedure; 18.7.3 A Bayesian Interpretation of the FDR; 18.8 Bibliographic Notes; Exercises