Big Data — Poor Science Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world. What’s wrong with this? … Firstly what is this “data” we are talking about? In it’s broadest sense it is some representation usually in a symbolic form that is machine readable and processable. And
Topics:
Lars Pålsson Syll considers the following as important: Statistics & Econometrics
This could be interesting, too:
Lars Pålsson Syll writes What statistics teachers get wrong!
Lars Pålsson Syll writes Statistical uncertainty
Lars Pålsson Syll writes The dangers of using pernicious fictions in statistics
Lars Pålsson Syll writes Interpreting confidence intervals
Big Data — Poor Science
Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world.
What’s wrong with this? … Firstly what is this “data” we are talking about? In it’s broadest sense it is some representation usually in a symbolic form that is machine readable and processable. And how will this data be processed? Using some form of machine learning or statistical analysis. But what will we find? Regularities or patterns … What do such patterns mean? Well that will depend on who is interpreting them …
Looking for “patterns or regularities” presupposes a definition of what a pattern is and that presupposes a hypothesis or model, i.e. a theory. Hence big data does not “get us away from theory” but rather requires theory before any project can commence.
What is the problem here? The problem is that a certain kind of approach is being propagated within the “big data” movement that claims to not be a priori committed to any theory or view of the world. The idea is that data is real and theory is not real. That theory should be induced from the data in a “scientific” way.
I think this is wrong and dangerous. Why? Because it is not clear or honest while appearing to be so. Any statistical test or machine learning algorithm expresses a view of what a pattern or regularity is and any data has been collected for a reason based on what is considered appropriate to measure. One algorithm will find one kind of pattern and another will find something else. One data set will evidence some patterns and not others. Selecting an appropriate test depends on what you are looking for. So the question posed by the thought experiment remains “what are you looking for, what is your question, what is your hypothesis?”
Ideas matter. Theory matters. Big data is not a theory-neutral way of circumventing the hard questions. In fact it brings these questions into sharp focus and it’s time we discuss them openly.
The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many — falsely — think that they can get away with analysing real world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.
Theory matters.