Machine learning — getting results that are completely wrong Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong. Dr Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a “crisis in science” … The data sets are very large and expensive. But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world … Machine learning systems and the use of big data sets has accelerated the crisis, according to Dr Allen. That is because machine learning algorithms have been developed
Topics:
Lars Pålsson Syll considers the following as important: Statistics & Econometrics
This could be interesting, too:
Lars Pålsson Syll writes The history of econometrics
Lars Pålsson Syll writes What statistics teachers get wrong!
Lars Pålsson Syll writes Statistical uncertainty
Lars Pålsson Syll writes The dangers of using pernicious fictions in statistics
Machine learning — getting results that are completely wrong
Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong.
Dr Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a “crisis in science” …
The data sets are very large and expensive. But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world …
Machine learning systems and the use of big data sets has accelerated the crisis, according to Dr Allen. That is because machine learning algorithms have been developed specifically to find interesting things in datasets and so when they search through huge amounts of data they will inevitably find a pattern.
“The challenge is can we really trust those findings?” she told BBC News.
“Are those really true discoveries that really represent science? Are they reproducible? If we had an additional dataset would we see the same scientific discovery or principle on the same dataset? And unfortunately the answer is often probably not.”
The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many think that they can get away with analysing real-world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.
Machine learning algorithms always express a view of what constitutes a pattern or regularity. They are never theory-neutral.
Clever data-mining tricks are not enough to answer important scientific questions. Theory matters.