If you haven’t heard, a team of researchers at Sakana AI in Japan, working with colleagues at the University of Oxford and the University of British Columbia, has released “The AI Scientist,” an LLM-based system that automates scientific research from idea to the publication of an “acceptance-level paper”:
The AI system uses LLMs to mimic the scientific research process and has already been tested by prompting it to carry out tasks related to AI research, which means it is already conducting research with the goal of finding ways to improve its own abilities. The researchers claim their system is currently conducting real science and that, as part of those efforts, it is producing acceptance-level papers.
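In outline, the claim is that a chain of LLM calls covers the whole research loop: ideation, experiment design, a write-up, and an automated review. Here is a minimal sketch of that loop, assuming a generic llm() helper; the function names, prompts, and stub are hypothetical illustrations, not Sakana’s actual code:

```python
# Minimal sketch of an idea-to-review loop as described in the announcement.
# Everything here is a hypothetical stand-in, not the AI Scientist's code.

def llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; stubbed so the sketch runs."""
    return f"[model output for: {prompt[:40]}...]"

def ai_scientist_iteration(topic: str) -> dict:
    idea = llm(f"Propose a novel research idea about {topic}.")
    plan = llm(f"Design an experiment to test this idea: {idea}")
    results = llm(f"Summarize the results of running: {plan}")
    paper = llm(f"Write a conference-style paper from: {idea} {results}")
    review = llm(f"Act as a peer reviewer and score this paper 1-10: {paper}")
    return {"idea": idea, "paper": paper, "review": review}

if __name__ == "__main__":
    print(ai_scientist_iteration("transformer training efficiency")["review"])
```

Note that nothing outside the loop ever checks the work; the review step is just one more model call, which matters for what follows.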
Here’s a laundry list of reasons I’m skeptical.
Artificial Credibility: The term "acceptance-level paper" masks mediocrity. The authors boast that the auto-generated papers get peer reviewed and accepted, but that doesn’t mean they’re valuable. The AI Scientist might be filling journals with work that meets the bare minimum [pieces of flair] but lacks depth and can’t drive real progress. Argh. Science is already in decline, as the (once) prestigious journal Nature noted. “Disruptive” science has declined; science has been losing its mojo for a half century now. And now we have “The AI Scientist” producing acceptance-level papers without a hint of irony.
The Illusion of Progress: We’re drowning in a glut of publications already, against the very backdrop of diminishing returns the Nature piece describes. Might the two be connected? I think so. The surge in papers, many of them AI-generated, creates an illusion of rapid advancement while the true impact on science is minimal; many papers go ignored and uncited. We have a flood of acceptance-level publications already. Cranking out yet more of them and calling it the future of science is pretty tone-deaf.
Flooding the Field: A related problem is that even if there are interesting gems among the acceptance-level papers the system churns out, we’re not likely to find them. When much published research is perceived as unremarkable, and possibly downright mediocre, the expected payoff of searching for a needle in the haystack no longer justifies the cost. The problem, in other words, is that there’s already too much noise, and it’s sapping scientific fecundity. Our man on the ground, the venerable AI Scientist, is part of the problem, not the solution.
Innovation Stagnation: A closely related problem involves innovation. AI-generated research, built on existing data, is unlikely to challenge paradigms or lead to the kind of breakthroughs that define real scientific advancement. As I’ve argued before, fundamental discovery is typically a re-conceptualizing of existing research, not an extension of it (or at least not any obvious, quotidian extension of it). Engineering, by its nature, is a downstream epistemic endeavor where the rules of the game are given.
This is one reason games like chess or Go are such magnets for demonstrating the prowess of AI. The game and the rules are known and don’t change. But discovery needs to do more, and confusion over this reality is likely behind our present stagnation. We can’t keep searching for the keys under the lamppost just because the light is better there. It’s out in the dark, in initially non-obvious connections (a synthesis of ideas from disciplines previously considered disparate), that science advances. One way to read the growing body of literature showing that science isn’t disruptive anymore, and that breakthroughs are slowing or just plain disappearing, is that we’re already underinvested in human excellence. The AI Scientist isn’t gonna help.
Further Entrenchment of Quantity Over Quality
An obviously related and 100% troubling observation in our Age of Mediocrity is that we’re already leaning hard into quantity, often at the expense of quality. I’m no Luddite, but I suspect that AI and computation have had unintended cultural consequences, among them the problem The AI Scientist is highlighting in the arena of science. Thousands of scientists publish a paper every five days. Arriving at the idea that the scientific community could benefit from The AI Scientist is a bit like suggesting Six Minute Abs trumps Seven Minute Abs. We’re already trudging headlong down the wrong path.
The Emperor's New Papers
The AI Scientist is the Emperor’s New Papers dressed up in the shiny veneer of AI. Sure, LLMs can crank out “acceptance-level” papers faster than you can say “publish or perish,” but cranking out lots of shitty papers is already our problem, as Nature and other scientific journals and publications of note have begun making clear. On this centrally relevant issue, the tone-deafness of the AI community is apparent once again. It’s not even that we’re unlikely to hit paydirt with “auto-science”; it’s that trumpeting “auto-science” perpetuates the very error we need to start fixing. Now we have another distraction.
I’ll be following the trajectory of this technology and will report back with anything that’s interesting. And, hey, if it does come up with “actionable intel,” I’ll pass that along as well. But don’t count on it.
Erik J. Larson
Classic example of Goodhart's law. The idea once upon a time was that publishing papers meant you were making some kind of genuine contribution. Then the mere fact of publishing papers became a criterion for professional survival, with arbitrary and increasingly corrupted standards for what would be published. So people then focused on publishing papers, without regard to whether they added anything of value to science in any larger sense. The publication of the paper was an end in itself, and may even be said to have subsumed the meaning of science itself, because that’s what was being measured. Objectively, in colloquial terms, the entire system has become “fucking retarded” — or so it appears to at least one concerned non-scientist observer.
And at least one option that should be on the table for discussion is to tear it down and rebuild from scratch. The thought experiment would be something like this: If a committee of smart, well-intentioned, and inhumanly objective and disinterested scientists were given custody of the entire budget for “science” worldwide, and everything that currently exists were shut down for a few years while they conducted a review, and they had to come up with a system that would actually generate good “science” — however that was defined — would that bag of money be spent on anything like what we have now? Would they simply replace what we have now? Or would they — should they — do something quite different? My guess is that they would do more than merely tweak the current system. Having imagined an aspirational, ideal, superior system, thought could be given to what incremental steps could be taken toward it from what we have now.
How did they evaluate the output? Well: "To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores."
The LLM-based setup produces papers that an LLM says are publishable. Colour me skeptical.
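The circularity being flagged here is easy to see in miniature: one model (or one model family) both writes the paper and assigns the acceptance score. A toy sketch, with purely hypothetical names, assuming the reviewer returns a bare numeric score:

```python
# Toy illustration of the loop being criticized: one stand-in model both
# generates and grades the paper. These names are hypothetical, not the
# actual AI Scientist code.

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM API call; a real setup would query a model."""
    return "6" if "score" in prompt.lower() else "[generated paper text]"

def automated_review(model, paper: str) -> float:
    # The authors validate their LLM reviewer against human review data,
    # but "acceptance" is still being defined by a model's score.
    raw = model(f"Review this paper and output only a score from 1 to 10: {paper}")
    return float(raw)

paper = stub_model("Write a paper about regularization.")
print(automated_review(stub_model, paper))  # the generator's kin grades the output
```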
I still have to read the entire paper, but even if it works, it is a data point for Brian Merchant's 'cheap', as well as damage to the most important product that comes out of university research: people with skills.