Classic example of Goodhart's law. The idea once upon a time was that publishing papers meant you were making some kind of genuine contribution. Then the mere fact of publishing papers became a criterion for professional survival, with arbitrary and increasingly corrupted standards for what would be published. So people then focused on publishing papers, without regard to whether they added anything of value to science in any larger sense. The publication of the paper was an end in itself, and may even be said to have subsumed the meaning of science itself, because that’s what was being measured. Objectively, in colloquial terms, the entire system has become “fucking retarded” — or so it appears to at least one concerned non-scientist observer.
And at least one option that should be on the table for discussion is to tear it down and rebuild from scratch. The thought experiment would be something like this: if a committee of smart, well-intentioned, and inhumanly objective and disinterested scientists were given custody of the entire budget for “science” worldwide, and everything that currently exists were shut down for a period of a few years while they conducted a review, and they had to come up with a system that would actually generate good “science” — however that was defined — would that bag of money be spent on anything like what we have now? Would they simply replace what we have now? Or would they — should they — do something quite different? My guess is that they would do more than merely tweak the current system. Having imagined an aspirational, ideal, superior system, thought could then be given to what incremental steps could be taken toward it from what we have now.
Definitely. I think there are some relatively noncontroversial ways to still take a hard stand here. For instance, scientists themselves are realizing that we’re generating way too much noise and not enough signal. So the status quo just isn’t working, and I think that’s fairly uncontroversial. Now, how do we fix it, and can we get Einstein back? I think those are bigger issues. Thanks for your comment! I love the thought experiment.
How did they evaluate the output? Well: "To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores."
The LLM-based setup produces papers that an LLM says are publishable. Colour me skeptical.
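For concreteness, here is a minimal hypothetical sketch of the loop being described: the generator and the reviewer are both LLM calls, so "publishable" bottoms out in an LLM's own score. The `chat` callable, prompts, threshold, and retry count below are illustrative assumptions, not Sakana's actual implementation.

```python
from typing import Callable

def closed_loop(
    chat: Callable[[str], str],   # stand-in for any chat-completion client
    idea: str,
    threshold: float = 6.0,
    max_tries: int = 3,
) -> str | None:
    """Draft papers until the sibling LLM 'reviewer' says the score clears the bar."""
    for _ in range(max_tries):
        paper = chat(f"Write a short ML workshop paper exploring: {idea}")
        # The "automated reviewer" is just another call into the same model
        # family, so acceptance is defined entirely inside the loop.
        reply = chat(
            "As a conference reviewer, score this paper from 1 to 10. "
            "Reply with only the number.\n\n" + paper
        )
        score = float(reply.strip().split()[0])
        if score >= threshold:
            return paper  # "publishable", according to an LLM
    return None
```

The point is only the shape of the loop: the acceptance signal never leaves the model family, which is why near-human reviewer scores say little about the papers themselves.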
I still have to read the entire paper, but even if it works it is a data point for Brian Merchant's 'cheap', as well as damage to the most important product that comes out of university research: people with skills.
Hi Gerben,
What's Brian Merchant's "cheap"?
Brian Merchant is the author of Blood in the Machine (https://www.bloodinthemachine.com), and his insight into the role of GenAI in the jobs market was the most "why didn't I think of that" brilliant insight I've read so far. I might have mentioned it in an earlier comment here somewhere. One important takeaway: GenAI doesn't need to be good to be disruptive. As at the start of the Industrial Revolution, the machines weren't really producing good stuff; they were producing 'cheap', and *that* disrupted the labour market.
I wrote about it here: https://ea.rna.nl/2024/07/27/generative-ai-doesnt-copy-art-it-clones-the-artisans-cheaply/ (with a link to his original post because he deserves all the credit). My addition to it is the perspective of the difference between copying 'the art' and copying 'the artisans' (which GenAI does, but poorly).
I think Brian's insight should be more widely known.
Erik, hard to know where to start with this. The piece seems to rest on a widely shared and kind of romantic idea of "science" -- the search for truth about nature. I get it. Surely AI, as we now have it anyway, will just muddy this water. But let me play cynical lawyer. Suppose "science" is people getting jobs for which there are "technical" qualifications. So what could be better than "acceptance level" papers? Surely this is the political economy of professional science working itself pure? OK, that's cynical, even for me. But once we professionalize "science," isn't this what we tend to get?
At a deeper level, and yes, to introduce the Luddite impulse: why are we so sure that what we need is more "science," by which we tend to mean not knowledge but engineering, used by the powerful to what ends?
As always, keep up the great work.
Hi David, I would rephrase it: we need more people thinking! And I think you’re giving short shrift to one of the key points in the piece, which is that scientists are already not finding the glut of scientific papers useful. So now we’re automating that?
Oh come on, you want me to recapitulate the whole thing? No, the "glut" point is well taken. (Something similar happened in the legal academy ages ago, pre-AI.) And you know the answer: we will use AI to sift papers generated by AI to produce other papers to be read by AI . . . actual scientists will go for coffee, check their Substack . . .
Oh boy, they've automated the closed loop that sets research grant money on fire.
Personally I'd ditch the Rube Goldberg machine and just construct incinerators. At least the scientists would get some exercise moving the cash suitcases and shoveling the ashes that way, and lord knows we could all use more exercise.
This is like a programmer automating the creation, reporting, and fixing of bugs to meet some deranged managerial ticket-closing quota. We're not even pretending to solve problems anymore, other than the ones arising from some MBA-toting midwit's misconception of the task at hand.
Hi Fukitol,
This is great. I think your bug-fixing example has legs, too. As I was getting at in comments above, there are red-herring or non sequitur responses to my claim that scientific discovery comes from human insight, et cetera, et cetera. But the onus is on folks who think that automating the production of more mediocre papers (mediocre by essentially their own admission!), adding to a flood that no one is reading anymore or getting any value from, is a positive step and not more waste. I've been an engineer for decades (not anymore), and perhaps surprisingly I have sympathy for folks who try to engineer bold solutions to big-picture problems. But the sin committed here by the research group is failing to properly diagnose the current malaise.
Maybe "novelty" has its limits and all science is to some degree turn-the-crank, but those facts, whatever their status, aren't particularly germane to the core concerns here, as it's well established that we're lost in a glut of garbage---this gets reported on over and over, and yet it gets worse and worse. And now "Auto-Science"? Talk about missing the point! Anyway, thanks for your comment!
Good god. I had no idea this was even happening! Thanks for talking about it.
"As I’ve argued before, fundamental discovery is typically a re-conceptualizing of existing research, not an extension of it (or, not any obvious quotidian extension of it)"
It would be more accurate to say that a fundamental discovery is a new conceptualization of a categorical observation. There are lots of valid observations in existing research, so existing fields of study can be fertile ground, but the observation itself could be novel too. The flip side of the research mediocrity problem is the search for the miracle cure or the theory of everything. If you're not grounded in your observations and their scope, creative conceptualizations can be wildly inflated; in fact, this is exactly what AI-boosters are often guilty of.
(Actually, this is something I've been planning to write a paper about at some point, when I've made some headway on other projects).
The first item on your list was pretty much what I was already thinking: that peer review must be setting a pretty low bar! The regurgitative model of generating papers like this must also surely run into a sort of “Gödel-esque” brick wall eventually: there’s limited mileage in re-hashing data into alternative complexities via LLMs without introducing anything truly novel.
Or maybe there isn’t. Maybe Robert Maxwell’s business model, from Pergamon Press on through Reed-Elsevier, really did poison the wells and hasten the demise of scientific publishing as idealists conceived it. As Eric Hoffer observed, every great cause begins as a movement, becomes a business and eventually degenerates into a racket.
Agreed. As I was reading your comment, I was thinking that this idea of “novelty” is easy to caricature. But I’m not saying anything particularly romantic, to hearken back to David’s comments. In fact, I think the paper “Attention Is All You Need” by the Google researchers was a perfect example of clear scientific thinking (call it “computer science”). The paper was short, it was powerful, and it had a massive impact. By contrast, one of the things you see about the majority of scientific papers— a problem this automated process is only going to exacerbate—is that the papers don’t have any follow-on impact. Inquiring minds want to know why. We need to call this out and think about how to change it!
Quite agree. In bioscience (my own sphere of operations), we’re too often awash with “quantity-over-quality” publications. That’s possibly a consequence of needing to publish in order to be retained, I guess. Perhaps impact would be a more constructive metric - but I’m sure that could also be gamed!
Thank you, btw, for being a well-informed voice of sanity in the AI discussion. I always learn useful facts, ideas and perspectives from your stack and that’s very much appreciated!
Is that Google paper science or engineering? I would argue the latter.
Hi Laurence,
I agree that it's engineering. I used it as an example because it's short and clearly had a high impact, and it's "AI"; I want to make it clear that I'm not picking on AI per se, but on our bad ideas about using AI.
I welcome my readers to take a look at this piece, which, together with my latest restack on its risks, completes a good picture of the AI Scientist's dubious promises.
Interesting perspective, and one that has not been explored much by the media that covered AI Scientist. I have a lot of respect for the people at Sakana AI and the work they are doing on evolutionary algorithms. But I'm also skeptical of the value that an automated paper-churning system can bring.
In "The Myth of Artificial Intelligence," you clearly laid out how current AI systems are missing abductive inference, one of the key components of human ingenuity. But I'm also interested to know whether systems like AI Scientist can help human scientists better explore the vast space of possible ideas and find inspiration for new directions of research and innovation. What do you think?
Hi Ben,
You wrote one of the great reviews of the Myth, thanks again for that. Interesting question, and I should say I'm not interested in a polemic about Sakana AI or even the system per se, though I think it's highly unlikely to bear fruit. I would put it this way: the inferential requirements for successful scientific discovery and practice are poorly represented by the end-to-end automatic scientist idea. The good news is that it should be possible to do better: just pull apart that end-to-end system and amplify the person in the loop at the points where that's likely to increase success. The automation idea, in other words, shouldn't trump considerations of what works. What I think happens with these types of projects (this reminds me, by the way, of something DARPA would fund) is that the actual innovation is in seeing whether it can be done at all, not in whether it solves any issues in the sociology of science or anything that's really pressing or on the table. There's a simpler way to say this: if I use an LLM to get an idea for something, it shouldn't commit me to using it for EVERYTHING in the lifecycle of that idea! So in that sense I think we could look for inference successes, and possibly AI could play a pivotal role.
Data analysis is downstream of sleuthing, and the distinctive insight and inference from the person should be prominent in any design--precisely at the points where insight is most required. I think the inference question is better served by breaking the system up, which of course runs against the whole point of the project: the point is NOT to break things up, but to wrap everything together. That's the bad idea/design, like building an aircraft after bird wings and feathers.
I don't know if that answered your question! I hope so!
Yes, it does! My own experience in using LLMs for anything creative and novel is that as soon as you take the human out of the loop, it starts shifting toward mediocrity and inaccuracy. But as an amplifier of human cognition, it can add value.
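To make the "pull apart the end-to-end system and amplify the person in the loop" idea above concrete, here is a minimal hypothetical sketch of a human-gated pipeline. The stage functions and names are placeholders assumed for illustration, not any existing system's API.

```python
from typing import Callable, Sequence

def human_select(options: Sequence[str], prompt: str) -> str:
    """The person in the loop: show the candidates and let a scientist choose."""
    print(prompt)
    for i, option in enumerate(options):
        print(f"  [{i}] {option}")
    return options[int(input("pick one: "))]

def assisted_pipeline(
    propose_ideas: Callable[[], Sequence[str]],   # e.g. LLM brainstorming
    run_experiment: Callable[[str], dict],        # automate the rote parts
    draft_writeup: Callable[[str, dict], str],    # LLM drafting assistance
) -> str | None:
    # Insight-heavy step 1: which idea is worth anyone's time?
    idea = human_select(propose_ideas(), "Which idea is worth pursuing?")
    results = run_experiment(idea)
    draft = draft_writeup(idea, results)
    # Insight-heavy step 2: is the result worth publishing at all?
    verdict = human_select(["keep", "discard"], "Keep or discard this draft?")
    return draft if verdict == "keep" else None
```

Automation fills the rote middle; the judgments about what is worth pursuing and what is worth publishing stay with the person.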
I can predict another thing that will happen if this kind of 'cheap' takes off: the academia equivalent of SEO. People who sell services to game the AI-review system...
After reading the paper, I think there is little chance this 'AI Scientist' is going anywhere, though. It's inspired 'engineering the hell out of it' in a narrow context, with all the signs that it runs into fundamental GenAI limitations. But you never know how far 'engineering the hell out of it' is going to get us. Anyway, it will add to the longevity of the still-ongoing hype.
FWIW, you should also comment on the fact that this was a clear in-the-wild example of it executing a variation of reward hacking that is power-seeking; to be exact, it began to create additional copies of itself and to remove resource constraints for itself:
https://www.lesswrong.com/posts/ppafWk6YCeXYr4XpH/danger-ai-scientist-danger
As a counterpoint, the LessWrong community had some good replies to this. So it might be less "instrumental convergence" and more "AI makes a mistake." But notably, in a way it really doesn't matter whether it's "AI good-naturedly, mistakenly tries to grab resources" or "AI deceptively tries to get resources," because either way the AI is effectively showing power-seeking behavior.
So maybe alignment in the future is just telling AI nicely, "Research endlessly but don't take over the world." But it's perhaps a bit concerning if we actually need to tell our agents that in the first place, and someone eventually will, intentionally or forgetfully, not put it in.