Big Data are not only Big but Complex, Messy, Badly Sampled, and Creepy

In researching last week's post, I came across this great lecture by Professor Thomas Lumley.

Prof Lumley discusses the role that statisticians must play in data science – in an entertaining and understandable way.

A great piece for any aspiring data scientist or interested layperson.

Data Science: Will Computer Science and Informatics Eat Our Lunch? 

Mainstream statistics ignored computing for many years, so that students were taught to handle infinite N, but not N of a million. Practical estimation of conditional probabilities and conditional distributions in large data sets was often left to computer science and informatics. Although statistics started behind, we are catching up: many individual statisticians and some statistics departments are taking computing seriously. More importantly, applied statistics has a long tradition of understanding how to formulate questions: large-scale empirical data can tell you a lot of things, but not what your question is. Big Data are not only Big but Complex, Messy, Badly Sampled, and Creepy. These are problems that statistics has thought about for some time, so we have the opportunity to take all the shiny computing technology that other people have developed and use it to re-establish statistics in data science.

