> The results of this paper should not be interpreted as suggesting that AI can consistently solve research-level mathematics questions. In fact, our anecdotal experience is the opposite: success cases are rare, and an apt intuition for autonomous capabilities (and limitations) may currently be important for finding such cases. The papers (ACGKMP26; Feng26; LeeSeo26) grew out of spontaneous positive outcomes in a wider benchmarking effort on research-level problems; for most of these problems, no autonomous progress was made.
This is what everyone who uses LLMs regularly expected. Good results require a human in the loop, and the internet is so big that just about everything on it has already been done by someone. Most often by you.
2 hours ago
measurablefunc 2 hours ago
I still don't get how achieving 96% on some benchmark means it's a super genius while that last 4% is somehow still out of reach. The people who constantly compare robots to people should really ponder how a person who manages to achieve 90% on some advanced math benchmark can still miss that last 10%.
bee_rider 15 minutes ago
This feels like it might be an interesting position, but I don’t really follow what you mean. Is it possible to just state it directly? Asking us to ponder is a bit vague.
These math LLMs seem very different from humans. A person has a specialty. If an LLM were as skilled as, say, a middling PhD recipient (not superhuman), but were that skilled in literally every field, maybe somebody could argue that’s superhuman (“smarter” than any one human). By this standard a room full of people or an academic journal could also be seen as superhuman, which is not unreasonable; communication is our superpower.
Joel_Mckay 2 minutes ago
Humans have heuristic biases, and intuition often doesn't succeed with the unknown.
LLMs are good at search, but plagiarism is not "AI".
Leonhard Euler discovered many things simply by attempting proofs everyone knew were impossible at the time. Additionally, folks like Isaac Newton and Gottfried Leibniz invented entirely new approaches to solve general problems.
The folks who assume LLMs are "AI"... are also biased to turn a blind eye to clear isomorphic plagiarism in the models. Note too that LLM activation capping only reduces aberrant offshoots from the reasoning model's expected behavioral vector (it can never be trusted), so it will spew nonsense when faced with an unknown domain's search space.
Most exams do not have ambiguous or unknown contexts in the answer key, so a machine should score 100% by matching documented solutions without fail. However, an LLM would also require >75% of our galaxy's energy output to reach human-level-intelligence error rates in general.
YC has too many true believers in the "AI" hype, and it is really disturbing. =3
[1] https://www.scientificamerican.com/article/first-proof-is-ai...
https://en.wikipedia.org/wiki/List_of_cognitive_biases
https://www.youtube.com/watch?v=X6WHBO_Qc-Q