Preface  xiii

1  Introduction  1
    Data Analysis  1
    What’s in This Book  2
    What’s with the Workshops?  3
    What’s with the Math?  4
    What You’ll Need  5
    What’s Missing  6

Part I  Graphics: Looking at Data

2  A Single Variable: Shape and Distribution  11
    Dot and Jitter Plots  12
    Histograms and Kernel Density Estimates  14
    The Cumulative Distribution Function  23
    Rank-Order Plots and Lift Charts  30
    Only When Appropriate: Summary Statistics and Box Plots  33
    Workshop: NumPy  38
    Further Reading  45

3  Two Variables: Establishing Relationships  47
    Scatter Plots  47
    Conquering Noise: Smoothing  48
    Logarithmic Plots  57
    Banking  61
    Linear Regression and All That  62
    Showing What’s Important  66
    Graphical Analysis and Presentation Graphics  68
    Workshop: matplotlib  69
    Further Reading  78

4  Time As a Variable: Time-Series Analysis  79
    Examples  79
    The Task  83
    Smoothing  84
    Don’t Overlook the Obvious!  90
    The Correlation Function  91
    Optional: Filters and Convolutions  95
    Workshop: scipy.signal  96
    Further Reading  98

5  More Than Two Variables: Graphical Multivariate Analysis  99
    False-Color Plots  100
    A Lot at a Glance: Multiplots  105
    Composition Problems  110
    Novel Plot Types  116
    Interactive Explorations  120
    Workshop: Tools for Multivariate Graphics  123
    Further Reading  125

6  Intermezzo: A Data Analysis Session  127
    A Data Analysis Session  127
    Workshop: gnuplot  136
    Further Reading  138

Part II  Analytics: Modeling Data

7  Guesstimation and the Back of the Envelope  141
    Principles of Guesstimation  142
    How Good Are Those Numbers?  151
    Optional: A Closer Look at Perturbation Theory and Error Propagation  155
    Workshop: The GNU Scientific Library (GSL)  158
    Further Reading  161

8  Models from Scaling Arguments  163
    Models  163
    Arguments from Scale  165
    Mean-Field Approximations  175
    Common Time-Evolution Scenarios  178
    Case Study: How Many Servers Are Best?  182
    Why Modeling?  184
    Workshop: Sage  184
    Further Reading  188

9  Arguments from Probability Models  191
    The Binomial Distribution and Bernoulli Trials  191
    The Gaussian Distribution and the Central Limit Theorem  195
    Power-Law Distributions and Non-Normal Statistics  201
    Other Distributions  206
    Optional: Case Study—Unique Visitors over Time  211
    Workshop: Power-Law Distributions  215
    Further Reading  218

10  What You Really Need to Know About Classical Statistics  221
    Genesis  221
    Statistics Defined  223
    Statistics Explained  226
    Controlled Experiments Versus Observational Studies  230
    Optional: Bayesian Statistics—The Other Point of View  235
    Workshop: R  243
    Further Reading  249

11  Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That  253
    How to Average Averages  253
    The Standard Deviation  256
    Least Squares  260
    Further Reading  264

Part III  Computation: Mining Data

12  Simulations  267
    A Warm-Up Question  267
    Monte Carlo Simulations  270
    Resampling Methods  276
    Workshop: Discrete Event Simulations with SimPy  280
    Further Reading  291

13  Finding Clusters  293
    What Constitutes a Cluster?  293
    Distance and Similarity Measures  298
    Clustering Methods  304
    Pre- and Postprocessing  311
    Other Thoughts  314
    A Special Case: Market Basket Analysis  316
    A Word of Warning  319
    Workshop: Pycluster and the C Clustering Library  320
    Further Reading  324

14  Seeing the Forest for the Trees: Finding Important Attributes  327
    Principal Component Analysis  328
    Visual Techniques  337
    Kohonen Maps  339
    Workshop: PCA with R  342
    Further Reading  348

15  Intermezzo: When More Is Different  351
    A Horror Story  353
    Some Suggestions  354
    What About Map/Reduce?  356
    Workshop: Generating Permutations  357
    Further Reading  358

Part IV  Applications: Using Data

16  Reporting, Business Intelligence, and Dashboards  361
    Business Intelligence  362
    Corporate Metrics and Dashboards  369
    Data Quality Issues  373
    Workshop: Berkeley DB and SQLite  376
    Further Reading  381

17  Financial Calculations and Modeling  383
    The Time Value of Money  384
    Uncertainty in Planning and Opportunity Costs  391
    Cost Concepts and Depreciation  394
    Should You Care?  398
    Is This All That Matters?  399
    Workshop: The Newsvendor Problem  400
    Further Reading  403

18  Predictive Analytics  405
    Introduction  405
    Some Classification Terminology  407
    Algorithms for Classification  408
    The Process  419
    The Secret Sauce  423
    The Nature of Statistical Learning  424
    Workshop: Two Do-It-Yourself Classifiers  426
    Further Reading  431

19  Epilogue: Facts Are Not Reality  433

A  Programming Environments for Scientific Computation and Data Analysis  435
    Software Tools  435
    A Catalog of Scientific Software  437
    Writing Your Own  443
    Further Reading  444

B  Results from Calculus  447
    Common Functions  448
    Calculus  460
    Useful Tricks  468
    Notation and Basic Math  472
    Where to Go from Here  479
    Further Reading  481

C  Working with Data  485
    Sources for Data  485
    Cleaning and Conditioning  487
    Sampling  489
    Data File Formats  490
    The Care and Feeding of Your Data Zoo  492
    Skills  493
    Terminology  495
    Further Reading  497

Index  499