About the Author
Simon Walkowiak is a cognitive neuroscientist and a managing director of Mind Project Ltd, a Big Data and Predictive Analytics consultancy based in London, United Kingdom. As a former data curator at the UK Data Service (UKDS, University of Essex), Europe's largest socio-economic data repository, Simon has extensive experience in processing and managing large-scale datasets such as censuses, sensor and smart meter data, telecommunication data, and well-known governmental and social surveys such as the British Social Attitudes survey, Labour Force surveys, Understanding Society, National Travel survey, and many other socio-economic datasets collected and deposited by Eurostat, World Bank, Office for National Statistics, Department of Transport, NatCen and International Energy Agency, to mention just a few. Simon has delivered numerous data science and R training courses at public institutions and international companies. He has also taught a course in Big Data Methods in R at major UK universities and at the prestigious Big Data and Analytics Summer School organized by the Institute of Analytics and Data Science (IADS).
Contents

Preface

Chapter 1: The Era of Big Data
  Big Data - The monster re-defined
  Big Data toolbox - dealing with the giant
  Hadoop - the elephant in the room
  Databases
  Hadoop Spark-ed up
  R - The unsung Big Data hero
  Summary

Chapter 2: Introduction to R Programming Language and Statistical Environment
  Learning R
  Revisiting R basics
  Getting R and RStudio ready
  Setting the URLs to R repositories
  R data structures
  Vectors
  Scalars
  Matrices
  Arrays
  Data frames
  Lists
  Exporting R data objects
  Applied data science with R
  Importing data from different formats
  Exploratory Data Analysis
  Data aggregations and contingency tables
  Hypothesis testing and statistical inference
  Tests of differences
  Independent t-test example (with power and effect size estimates)
  ANOVA example
  Tests of relationships
  An example of Pearson's r correlations
  Multiple regression example
  Data visualization packages
  Summary

Chapter 3: Unleashing the Power of R from Within
  Traditional limitations of R
  Out-of-memory data
  Processing speed
  To the memory limits and beyond
  Data transformations and aggregations with the ff and ffbase packages
  Generalized linear models with the ff and ffbase packages
  Logistic regression example with ffbase and biglm
  Expanding memory with the bigmemory package
  Parallel R
  From bigmemory to faster computations
  An apply() example with the big.matrix object
  A for() loop example with the ffdf object
  Using apply() and for() loop examples on a data.frame
  A parallel package example
  A foreach package example
  The future of parallel processing in R
  Utilizing Graphics Processing Units with R
  Multi-threading with Microsoft R Open distribution
  Parallel machine learning with H2O and R
  Boosting R performance with the data.table package and other tools
  Fast data import and manipulation with the data.table package
  Data import with data.table
  Lightning-fast subsets and aggregations on data.table
  Chaining, more complex aggregations, and pivot tables with data.table
  Writing better R code
  Summary

Chapter 4: Hadoop and MapReduce Framework for R
  Hadoop architecture
  Hadoop Distributed File System
  MapReduce framework
  A simple MapReduce word count example
  Other Hadoop native tools
  Learning Hadoop
  A single-node Hadoop in Cloud
  Deploying Hortonworks Sandbox on Azure
  A word count example in Hadoop using Java
  A word count example in Hadoop using the R language
  RStudio Server on a Linux RedHat/CentOS virtual machine
  Installing and configuring RHadoop packages
  HDFS management and MapReduce in R - a word count example
  HDInsight - a multi-node Hadoop cluster on Azure
  Creating your first HDInsight cluster
  Creating a new Resource Group
  Deploying a Virtual Network
  Creating a Network Security Group
  Setting up and configuring an HDInsight cluster
  Starting the cluster and exploring Ambari
  Connecting to the HDInsight cluster and installing RStudio Server
  Adding a new inbound security rule for port 8787
  Editing the Virtual Network's public IP address for the head node
  Smart energy meter readings analysis example - using R on HDInsight cluster
  Summary

Chapter 5: R with Relational Database Management Systems (RDBMSs)
  Relational Database Management Systems (RDBMSs)
  A short overview of used RDBMSs
  Structured Query Language (SQL)
  SQLite with R
  Preparing and importing data into a local SQLite database
  Connecting to SQLite from RStudio
  MariaDB with R on an Amazon EC2 instance
  Preparing the EC2 instance and RStudio Server for use
  Preparing MariaDB and data for use
  Working with MariaDB from RStudio
  PostgreSQL with R on Amazon RDS
  Launching an Amazon RDS database instance
  Preparing and uploading data to Amazon RDS
  Remotely querying PostgreSQL on Amazon RDS from RStudio
  Summary

Chapter 6: R with Non-Relational (NoSQL) Databases
  Introduction to NoSQL databases
  Review of leading non-relational databases
  MongoDB with R
  Introduction to MongoDB
  MongoDB data models
  Installing MongoDB with R on Amazon EC2
  Processing Big Data using MongoDB with R
  Importing data into MongoDB and basic MongoDB commands
  MongoDB with R using the rmongodb package
  MongoDB with R using the RMongo package
  MongoDB with R using the mongolite package
  HBase with R
  Azure HDInsight with HBase and RStudio Server
  Importing the data to HDFS and HBase
  Reading and querying HBase using the rhbase package
  Summary

Chapter 7: Faster than Hadoop - Spark with R
  Spark for Big Data analytics
  Spark with R on a multi-node HDInsight cluster
  Launching HDInsight with Spark and R/RStudio
  Reading the data into HDFS and Hive
  Getting the data into HDFS
  Importing data from HDFS to Hive
  Bay Area Bike Share analysis using SparkR
  Summary

Chapter 8: Machine Learning Methods for Big Data in R
  What is machine learning?
  Supervised and unsupervised machine learning methods
  Classification and clustering algorithms
  Machine learning methods with R
  Big Data machine learning tools
  GLM example with Spark and R on the HDInsight cluster
  Preparing the Spark cluster and reading the data from HDFS
  Logistic regression in Spark with R
  Naive Bayes with H2O on Hadoop with R
  Running an H2O instance on Hadoop with R
  Reading and exploring the data in H2O
  Naive Bayes on ...