Monday , May 25 2026

Home / Lars P. Syll / Big Data — Poor Science

Big Data — Poor Science

See on Internet Archive

Lars Pålsson Syll September 6, 2017 Lars P. Syll

by
Lars Pålsson Syll
My articles My site About me My books My videos
Follow on:Twitter

Summary:
Big Data — Poor Science Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world. What’s wrong with this? … Firstly what is this “data” we are talking about? In it’s broadest sense it is some representation usually in a symbolic form that is machine readable and processable. And

Topics:
Lars Pålsson Syll considers the following as important: Statistics & Econometrics

This could be interesting, too:

Lars Pålsson Syll writes Keynes’ critique of econometrics is still valid

Lars Pålsson Syll writes The history of random walks

Lars Pålsson Syll writes The history of econometrics

Lars Pålsson Syll writes What statistics teachers get wrong!

Related Articles

Big Data — Poor Science

Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world.

What’s wrong with this? … Firstly what is this “data” we are talking about? In it’s broadest sense it is some representation usually in a symbolic form that is machine readable and processable. And how will this data be processed? Using some form of machine learning or statistical analysis. But what will we find? Regularities or patterns … What do such patterns mean? Well that will depend on who is interpreting them …

Looking for “patterns or regularities” presupposes a definition of what a pattern is and that presupposes a hypothesis or model, i.e. a theory. Hence big data does not “get us away from theory” but rather requires theory before any project can commence.

What is the problem here? The problem is that a certain kind of approach is being propagated within the “big data” movement that claims to not be a priori committed to any theory or view of the world. The idea is that data is real and theory is not real. That theory should be induced from the data in a “scientific” way.

I think this is wrong and dangerous. Why? Because it is not clear or honest while appearing to be so. Any statistical test or machine learning algorithm expresses a view of what a pattern or regularity is and any data has been collected for a reason based on what is considered appropriate to measure. One algorithm will find one kind of pattern and another will find something else. One data set will evidence some patterns and not others. Selecting an appropriate test depends on what you are looking for. So the question posed by the thought experiment remains “what are you looking for, what is your question, what is your hypothesis?”

Ideas matter. Theory matters. Big data is not a theory-neutral way of circumventing the hard questions. In fact it brings these questions into sharp focus and it’s time we discuss them openly.

David Hales

The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many — falsely — think that they can get away with analysing real world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.

Theory matters.

Advertisements

Full story here

Are you the author?

0 0

Tags Statistics & Econometrics

About Lars Pålsson Syll

Professor at Malmö University. Primary research interest - the philosophy, history and methodology of economics.

My articles My site About me My books
Follow on:Twitter

Leave a Reply Cancel reply