from Lars Syll
Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong.
Dr Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a “crisis in science” …
The data sets are very large and expensive. But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world …
Machine learning systems and the use of big data sets has accelerated the crisis, according to Dr Allen. That is because machine learning algorithms have been developed specifically to find interesting things in datasets and so when they search through huge amounts of data they will inevitably find a pattern.
“The challenge is can we really trust those findings?” she told BBC News.
“Are those really true discoveries that really represent science? Are they reproducible? If we had an additional dataset would we see the same scientific discovery or principle on the same dataset? And unfortunately the answer is often probably not.”
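The mechanism Dr Allen describes can be illustrated with a small simulation: a search through enough variables will always turn up some "pattern", and that pattern then fails to reappear in an additional dataset. The sketch below is only an illustration built on assumptions of my own, not anything taken from her work or the BBC piece. It assumes the outcome and all 2,000 candidate variables are independent Gaussian noise, so there is no real pattern to find. It also assumes each sample has 50 observations and uses a plain correlation scan as a stand-in for a machine learning search. The function names `pearson`, `noise_sample` and `search_then_replicate` are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def noise_sample(n_obs, n_features, rng):
    """Assumed data-generating process: the outcome and every feature are
    independent standard-normal noise, so no real relationship exists."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)]
                for _ in range(n_features)]
    return features, y


def search_then_replicate(n_obs=50, n_features=2000, seed=0):
    """Hypothetical illustration: scan many features for the strongest
    correlation with the outcome, then re-measure that one feature in a
    fresh sample drawn from the same patternless process.

    Returns (index of the 'discovered' feature, its correlation in the
    search sample, its correlation in the new sample)."""
    rng = random.Random(seed)

    # Search sample: correlate every feature with the outcome, keep the largest |r|.
    features, y = noise_sample(n_obs, n_features, rng)
    corrs = [pearson(f, y) for f in features]
    best = max(range(n_features), key=lambda j: abs(corrs[j]))

    # "Additional dataset": draw the chosen feature and the outcome afresh.
    # Because everything is independent noise, this is all a replication needs.
    new_feature, new_y = noise_sample(n_obs, 1, rng)
    return best, corrs[best], pearson(new_feature[0], new_y)


if __name__ == "__main__":
    idx, r_search, r_replication = search_then_replicate(seed=1)
    print(f"feature {idx}: r = {r_search:.2f} in the search sample, "
          f"r = {r_replication:.2f} in a fresh sample")
```

In runs of this sketch, the "discovered" correlation typically lies far from zero in the search sample and is much weaker when the same variable is measured again. The winner was selected precisely because it happened to fit that sample's noise. The exact numbers depend on the seed.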
The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many think that they can get away with analysing real-world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.
Machine learning algorithms always express a view of what constitutes a pattern or regularity. They are never theory-neutral.
Clever data-mining tricks are not enough to answer important scientific questions. Theory matters.