Foreword
We live in the midst of a data deluge. According to recent estimates, 2.5 quintillion (10^18) bytes of data are generated on a daily basis. This is so much data that over 90 percent of the information that we store nowadays was generated in the past decade alone. Unfortunately, most of this information cannot be used by humans. Either the data is beyond the means of standard analytical methods, or it is simply too vast for our limited minds to even comprehend.
Through Machine Learning, we enable computers to process, learn from, and draw actionable insights out of the otherwise impenetrable walls of big data. From the massive supercomputers that support Google's search engines to the smartphones that we carry in our pockets, we rely on Machine Learning to power most of the world around us, often without even knowing it.
As modern pioneers in the brave new world of big data, it then behooves us to learn more about Machine Learning. What is Machine Learning and how does it work? How can I use Machine Learning to take a glimpse into the unknown, power my business, or just find out what the Internet at large thinks about my favorite movie? All of this and more will be covered in the following chapters authored by my good friend and colleague, Sebastian Raschka. When away from taming my otherwise irascible pet dog, Sebastian has tirelessly devoted his free time to the open source Machine Learning community. Over the past several years, Sebastian has developed dozens of popular tutorials that cover topics in Machine Learning and data visualization in Python. He has also developed and contributed to several open source Python packages, several of which are now part of the core Python Machine Learning workflow.
Owing to his vast expertise in this field, I am confident that Sebastian's insights into the world of Machine Learning in Python will be invaluable to users of all experience levels. I wholeheartedly recommend this book to anyone looking to gain a broader and more practical understanding of Machine Learning.
About the Author
Sebastian Raschka is a PhD student at Michigan State University who develops new computational methods in the field of computational biology. He has been ranked as the number one most influential data scientist on GitHub by Analytics Vidhya. He has yearlong experience in Python programming and has conducted several seminars on the practical applications of data science and machine learning. Talking and writing about data science, machine learning, and Python motivated Sebastian to write this book in order to help people develop data-driven solutions without necessarily needing a machine learning background. He has also actively contributed to open source projects, and methods that he implemented are now successfully used in machine learning competitions such as Kaggle. In his free time, he works on models for sports predictions, and if he is not in front of the computer, he enjoys playing sports.
Table of Contents
Preface
Chapter 1: Giving Computers the Ability to Learn from Data
Building intelligent machines to transform data into knowledge
The three different types of machine learning
Making predictions about the future with supervised learning
Classification for predicting class labels
Regression for predicting continuous outcomes
Solving interactive problems with reinforcement learning
Discovering hidden structures with unsupervised learning
Finding subgroups with clustering
Dimensionality reduction for data compression
An introduction to the basic terminology and notations
A roadmap for building machine learning systems
Preprocessing - getting data into shape
Training and selecting a predictive model
Evaluating models and predicting unseen data instances
Using Python for machine learning
Installing Python packages
Summary
Chapter 2: Training Machine Learning Algorithms for Classification
Artificial neurons - a brief glimpse into the early history of machine learning
Implementing a perceptron learning algorithm in Python
Training a perceptron model on the Iris dataset
Adaptive linear neurons and the convergence of learning
Minimizing cost functions with gradient descent
Implementing an Adaptive Linear Neuron in Python
Large scale machine learning and stochastic gradient descent
Summary
Chapter 3: A Tour of Machine Learning Classifiers Using Scikit-learn
Choosing a classification algorithm
First steps with scikit-learn
Training a perceptron via scikit-learn
Modeling class probabilities via logistic regression
Logistic regression intuition and conditional probabilities
Learning the weights of the logistic cost function
Training a logistic regression model with scikit-learn
Tackling overfitting via regularization
Maximum margin classification with support vector machines
Maximum margin intuition
Dealing with the nonlinearly separable case using slack variables
Alternative implementations in scikit-learn
Solving nonlinear problems using a kernel SVM
Using the kernel trick to find separating hyperplanes in higher dimensional space
Decision tree learning
Maximizing information gain - getting the most bang for the buck
Building a decision tree
Combining weak to strong learners via random forests
K-nearest neighbors - a lazy learning algorithm
Summary
Chapter 4: Building Good Training Sets - Data Preprocessing
Dealing with missing data
Eliminating samples or features with missing values
Imputing missing values
Understanding the scikit-learn estimator API
Handling categorical data
Mapping ordinal features
Encoding class labels
Performing one-hot encoding on nominal features
Partitioning a dataset in training and test sets
Bringing features onto the same scale
Selecting meaningful features
Sparse solutions with L1 regularization
Sequential feature selection algorithms
Assessing feature importance with random forests
Summary
Chapter 5: Compressing Data via Dimensionality Reduction
Unsupervised dimensionality reduction via principal component analysis
Total and explained variance
Feature transformation
Principal component analysis in scikit-learn
Supervised data compression via linear discriminant analysis
Computing the scatter matrices
Selecting linear discriminants for the new feature subspace
Projecting samples onto the new feature space
LDA via scikit-learn
Using kernel principal component analysis for nonlinear mappings
Kernel functions and the kernel trick
Implementing a kernel principal component analysis in Python
Example 1 - separating half-moon shapes
Example 2 - separating concentric circles
Projecting new data points
Kernel principal component analysis in scikit-learn
Summary
Chapter 6: Learning Best Practices for Model Evaluation and Hyperparameter Tuning
Streamlining workflows with pipelines
Loading the Breast Cancer Wisconsin dataset
Combining transformers and estimators in a pipeline
Using k-fold cross-validation to assess model performance
The holdout method
K-fold cross-validation
Debugging algorithms with learning and validation curves
Diagnosing bias and variance problems with learning curves
Addressing overfitting and underfitting with validation curves
Fine-tuning machine learning models via grid search
Tuning hyperparameters via grid search
Algorithm selection with nested cross-validation
Looking at different performance evaluation metrics
Reading a confusion matrix
Optimizing the precision and recall of a classification model
Plotting a receiver operating characteristic
The scoring metrics for multiclass classification
Summary
Chapter 7: Combining Different Models for Ensemble Learning
Learning with ensembles
Implementing a simple majority vote classifier
Combining different algorithms for classification with majority vote
Evaluating and tuning the ensemble classifier
Bagging - building an ensemble of classifiers from bootstrap samples
Leveraging weak learners via adaptive boosting
Summary
Chapter 8: Applying Machine Learning to Sentiment Analysis
Obtaining the IMDb movie review dataset
Introducing the bag-of-words model
Transforming words into feature vectors
Assessing word relevancy via term frequency-inverse document frequency
Cleaning text data
Processing documents into tokens
Training a logistic regression model for document classification
Working with bigger data - online algorithms and out-of-core learning
Summary
Chapter 9: Embedding a Machine Learning Model into a Web Application
Serializing fitted scikit-learn estimators
Setting up a SQLite database for data storage
Developing a web application with Flask
Our first Flask web application
Form validation and rendering
Turning the movie classifier into a web application