
Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning



Paperback by Gutman, Alex J.; Goldmeier, Jordan


WAS £33.00   SAVE £4.95

£28.05

ISBN: 9781119741749
Publication Date: 24 Jun 2021
Language: English
Publisher: John Wiley & Sons Inc
Pages: 272
Format: Paperback

Description

"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful."
Thomas H. Davenport, Research Fellow, author of Competing on Analytics, Big Data @ Work, and The AI Advantage

You've heard the hype around data. Now get the facts. In Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools you need to talk and think critically about it.

You'll learn how to:
- Think statistically and understand the role variation plays in your life and decision making
- Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace
- Understand what's really going on with machine learning, text analytics, deep learning, and artificial intelligence
- Avoid common pitfalls when working with and interpreting data

Becoming a Data Head is a complete guide to data science in the workplace, covering everything from the personalities you'll work with to the math behind the algorithms. The authors have spent years in the data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head: an active participant in data science, statistics, and machine learning. Whether you're a business professional, engineer, executive, or aspiring data scientist, this book is for you.

Contents

Acknowledgments
Foreword
Introduction

Part One: Thinking Like a Data Head

Chapter 1: What Is the Problem?
  Questions a Data Head Should Ask
  Why Is This Problem Important?
  Who Does This Problem Affect?
  What If We Don't Have the Right Data?
  When Is the Project Over?
  What If We Don't Like the Results?
  Understanding Why Data Projects Fail
  Customer Perception
  Discussion
  Working on Problems That Matter
  Chapter Summary

Chapter 2: What Is Data?
  Data vs. Information
  An Example Dataset
  Data Types
  How Data Is Collected and Structured
  Observational vs. Experimental Data
  Structured vs. Unstructured Data
  Basic Summary Statistics
  Chapter Summary

Chapter 3: Prepare to Think Statistically
  Ask Questions
  There Is Variation in All Things
  Scenario: Customer Perception (The Sequel)
  Case Study: Kidney-Cancer Rates
  Probabilities and Statistics
  Probability vs. Intuition
  Discovery with Statistics
  Chapter Summary

Part Two: Speaking Like a Data Head

Chapter 4: Argue with the Data
  What Would You Do?
  Missing Data Disaster
  Tell Me the Data Origin Story
  Who Collected the Data?
  How Was the Data Collected?
  Is the Data Representative?
  Is There Sampling Bias?
  What Did You Do with Outliers?
  What Data Am I Not Seeing?
  How Did You Deal with Missing Values?
  Can the Data Measure What You Want It to Measure?
  Argue with Data of All Sizes
  Chapter Summary

Chapter 5: Explore the Data
  Exploratory Data Analysis and You
  Embracing the Exploratory Mindset
  Questions to Guide You
  The Setup
  Can the Data Answer the Question?
  Set Expectations and Use Common Sense
  Do the Values Make Intuitive Sense?
  Watch Out: Outliers and Missing Values
  Did You Discover Any Relationships?
  Understanding Correlation
  Watch Out: Misinterpreting Correlation
  Watch Out: Correlation Does Not Imply Causation
  Did You Find New Opportunities in the Data?
  Chapter Summary

Chapter 6: Examine the Probabilities
  Take a Guess
  The Rules of the Game
  Notation
  Conditional Probability and Independent Events
  The Probability of Multiple Events
  Two Things That Happen Together
  One Thing or the Other
  Probability Thought Exercise
  Next Steps
  Be Careful Assuming Independence
  Don't Fall for the Gambler's Fallacy
  All Probabilities Are Conditional
  Don't Swap Dependencies
  Bayes' Theorem
  Ensure the Probabilities Have Meaning
  Calibration
  Rare Events Can, and Do, Happen
  Chapter Summary

Chapter 7: Challenge the Statistics
  Quick Lessons on Inference
  Give Yourself Some Wiggle Room
  More Data, More Evidence
  Challenge the Status Quo
  Evidence to the Contrary
  Balance Decision Errors
  The Process of Statistical Inference
  The Questions You Should Ask to Challenge the Statistics
  What Is the Context for These Statistics?
  What Is the Sample Size?
  What Are You Testing?
  What Is the Null Hypothesis?
  Assuming Equivalence
  What Is the Significance Level?
  How Many Tests Are You Doing?
  Can I See the Confidence Intervals?
  Is This Practically Significant?
  Are You Assuming Causality?
  Chapter Summary

Part Three: Understanding the Data Scientist's Toolbox

Chapter 8: Search for Hidden Groups
  Unsupervised Learning
  Dimensionality Reduction
  Creating Composite Features
  Principal Component Analysis
  Principal Components in Athletic Ability
  PCA Summary
  Potential Traps
  Clustering
  k-Means Clustering
  Clustering Retail Locations
  Potential Traps
  Chapter Summary

Chapter 9: Understand the Regression Model
  Supervised Learning
  Linear Regression: What It Does
  Least Squares Regression: Not Just a Clever Name
  Linear Regression: What It Gives You
  Extending to Many Features
  Linear Regression: What Confusion It Causes
  Omitted Variables
  Multicollinearity
  Data Leakage
  Extrapolation Failures
  Many Relationships Aren't Linear
  Are You Explaining or Predicting?
  Regression Performance
  Other Regression Models
  Chapter Summary

Chapter 10: Understand the Classification Model
  Introduction to Classification
  What You'll Learn
  Classification Problem Setup
  Logistic Regression
  Logistic Regression: So What?
  Decision Trees
  Ensemble Methods
  Random Forests
  Gradient Boosted Trees
  Interpretability of Ensemble Models
  Watch Out for Pitfalls
  Misapplication of the Problem
  Data Leakage
  Not Splitting Your Data
  Choosing the Right Decision Threshold
  Misunderstanding Accuracy
  Confusion Matrices
  Chapter Summary

Chapter 11: Understand Text Analytics
  Expectations of Text Analytics
  How Text Becomes Numbers
  A Big Bag of Words
  N-Grams
  Word Embeddings
  Topic Modeling
  Text Classification
  Naïve Bayes
  Sentiment Analysis
  Practical Considerations When Working with Text
  Big Tech Has the Upper Hand
  Chapter Summary

Chapter 12: Conceptualize Deep Learning
  Neural Networks
  How Are Neural Networks Like the Brain?
  A Simple Neural Network
  How a Neural Network Learns
  A Slightly More Complex Neural Network
  Applications of Deep Learning
  The Benefits of Deep Learning
  How Computers "See" Images
  Convolutional Neural Networks
  Deep Learning on Language and Sequences
  Deep Learning in Practice
  Do You Have Data?
  Is Your Data Structured?
  What Will the Network Look Like?
  Artificial Intelligence and You
  Big Tech Has the Upper Hand
  Ethics in Deep Learning
  Chapter Summary

Part Four: Ensuring Success

Chapter 13: Watch Out for Pitfalls
  Biases and Weird Phenomena in Data
  Survivorship Bias
  Regression to the Mean
  Simpson's Paradox
  Confirmation Bias
  Effort Bias (aka the "Sunk Cost Fallacy")
  Algorithmic Bias
  Uncategorized Bias
  The Big List of Pitfalls
  Statistical and Machine Learning Pitfalls
  Project Pitfalls
  Chapter Summary

Chapter 14: Know the People and Personalities
  Seven Scenes of Communication Breakdowns
  The Postmortem
  Storytime
  The Telephone Game
  Into the Weeds
  The Reality Check
  The Takeover
  The Blowhard
  Data Personalities
  Data Enthusiasts
  Data Cynics
  Data Heads
  Chapter Summary

Chapter 15: What's Next?

Index

