Preface
1. Introduction to Python Why Python Getting Started with Python Which Python Version Setting Up Python on Your Machine Test Driving Python Install pip Install a Code Editor Optional: Install IPython Summary
2. Python Basics Basic Data Types Strings Integers and Floats Data Containers Variables Lists Dictionaries What Can the Various Data Types Do? String Methods: Things Strings Can Do Numerical Methods: Things Numbers Can Do List Methods: Things Lists Can Do Dictionary Methods: Things Dictionaries Can Do Helpful Tools: type, dir, and help type dir help Putting It All Together What Does It All Mean? Summary
3. Data Meant to Be Read by Machines CSV Data How to Import CSV Data Saving the Code to a File; Running from Command Line JSON Data How to Import JSON Data XML Data How to Import XML Data Summary
4. Working with Excel Files Installing Python Packages Parsing Excel Files Getting Started with Parsing Summary
5. PDFs and Problem Solving in Python Avoid Using PDFs! Programmatic Approaches to PDF Parsing Opening and Reading Using slate Converting PDF to Text Parsing PDFs Using pdfminer Learning How to Solve Problems Exercise: Use Table Extraction, Try a Different Library Exercise: Clean the Data Manually Exercise: Try Another Tool Uncommon File Types Summary
6. Acquiring and Storing Data Not All Data Is Created Equal Fact Checking Readability, Cleanliness, and Longevity Where to Find Data Using a Telephone US Government Data Government and Civic Open Data Worldwide Organization and Non-Government Organization (NGO) Data Education and University Data Medical and Scientific Data Crowdsourced Data and APIs Case Studies: Example Data Investigation Ebola Crisis Train Safety Football Salaries Child Labor Storing Your Data: When, Why, and How? Databases: A Brief Introduction Relational Databases: MySQL and PostgreSQL Non-Relational Databases: NoSQL Setting Up Your Local Database with Python When to Use a Simple File Cloud-Storage and Python Local Storage and Python Alternative Data Storage Summary
7. Data Cleanup: Investigation, Matching, and Formatting Why Clean Data? Data Cleanup Basics Identifying Values for Data Cleanup Formatting Data Finding Outliers and Bad Data Finding Duplicates Fuzzy Matching RegEx Matching What to Do with Duplicate Records Summary
8. Data Cleanup: Standardizing and Scripting Normalizing and Standardizing Your Data Saving Your Data Determining What Data Cleanup Is Right for Your Project Scripting Your Cleanup Testing with New Data Summary
9. Data Exploration and Analysis Exploring Your Data Importing Data Exploring Table Functions Joining Numerous Datasets Identifying Correlations Identifying Outliers Creating Groupings Further Exploration Analyzing Your Data Separating and Focusing Your Data What Is Your Data Saying? Drawing Conclusions Documenting Your Conclusions Summary
10. Presenting Your Data Avoiding Storytelling Pitfalls How Will You Tell the Story? Know Your Audience Visualizing Your Data Charts Time-Related Data Maps Interactives Words Images, Video, and Illustrations Presentation Tools Publishing Your Data Using Available Sites Open Source Platforms: Starting a New Site Jupyter (Formerly Known as IPython Notebooks) Summary
11. Web Scraping: Acquiring and Storing Data from the Web What to Scrape and How Analyzing a Web Page Inspection: Markup Structure Network/Timeline: How the Page Loads Console: Interacting with JavaScript In-Depth Analysis of a Page Getting Pages: How to Request on the Internet Reading a Web Page with Beautiful Soup Reading a Web Page with LXML A Case for XPath Summary
12. Advanced Web Scraping: Screen Scrapers and Spiders Browser-Based Parsing Screen Reading with Selenium Screen Reading with Ghost.Py Spidering the Web Building a Spider with Scrapy Crawling Whole Websites with Scrapy Networks: How the Internet Works and Why It's Breaking Your Script The Changing Web (or Why Your Script Broke) A (Few) Word(s) of Caution Summary
13. APIs API Features REST Versus Streaming APIs Rate Limits Tiered Data Volumes API Keys and Tokens A Simple Data Pull from Twitter's REST API Advanced Data Collection from Twitter's REST API Advanced Data Collection from Twitter's Streaming API Summary
14. Automation and Scaling Why Automate? Steps to Automate What Could Go Wrong? Where to Automate Special Tools for Automation Using Local Files, argv, and Config Files Using the Cloud for Data Processing Using Parallel Processing Using Distributed Processing Simple Automation CronJobs Web Interfaces Jupyter Notebooks Large-Scale Automation Celery: Queue-Based Automation Ansible: Operations Automation Monitoring Your Automation Python Logging Adding Automated Messaging Uploading and Other Reporting Logging and Monitoring as a Service No System Is Foolproof Summary
15. Conclusion Duties of a Data Wrangler Beyond Data Wrangling Become a Better Data Analyst Become a Better Developer Become a Better Visual Storyteller Become a Better Systems Architect Where Do You Go from Here?
A. Comparison of Languages Mentioned
B. Python Resources for Beginners
C. Learning the Command Line
D. Advanced Python Setup
E. Python Gotchas
F. IPython Hints
G. Using Amazon Web Services
Index