The Challenge of AI in Achieving Mathematical Creativity
Chapter 1: The AI vs. Human Mathematics Dilemma
For over six decades, machines have excelled at arithmetic. A landmark came more than 40 years ago, when a computer-assisted proof settled the four-color theorem. Today, artificial intelligence contributes to many areas of mathematics, including pattern recognition, conjecture formulation, theorem verification, and even theorem proving.
So the question arises: why can't AI win a mathematics competition aimed at high-school students?
The International Mathematical Olympiad (IMO)
The IMO is a prestigious competition in which each participating country fields a team of six of its strongest young mathematicians. These students are far from average high schoolers; many go on to become leaders in mathematics research. Over two days, participants have 4.5 hours each day to tackle three challenging problems. The IMO celebrates the elegance of mathematics and the distinctly human knack for solving hard problems.
IMO problems are deliberately crafted to be difficult. They must push the limits of the brightest young minds in the world while remaining solvable without advanced university-level mathematics. This is precisely what makes them hard for AI: the intended solutions are often remarkably simple, but only once the right perspective is found.
The IMO Grand Challenge
The objective of the IMO Grand Challenge is simple to state: build an AI capable of earning an IMO gold medal. Gold medals go to roughly the top 8% of participants, with the cutoff score varying yearly with problem difficulty. To reach it, an AI would likely need to solve four or five of the six problems completely.
While the challenge has several stipulations (see the link at the bottom of the article), a key one is that it is a "formal to formal" challenge: the AI receives each problem already translated by humans into the language of the Lean theorem prover, so it does not need to comprehend natural language.
The AI must still clear a far higher hurdle: returning a machine-checkable sequence of logical steps, since every IMO problem demands a full proof.
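To make "formal to formal" concrete, here is a toy illustration, far easier than any IMO problem, of what a formal statement and proof look like in Lean 4 with Mathlib. The challenge hands the AI the statement; the AI must produce everything after the `:=`.

```lean
import Mathlib

-- A toy "formal" problem: a sum of two squares is nonnegative.
-- The statement is the input; the proof term below is the required output.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 :=
  add_nonneg (sq_nonneg a) (sq_nonneg b)
```

Lean's kernel mechanically checks that the proof term really establishes the statement, which is what makes an AI's output verifiable without human judgment.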
Daniel Selsam from OpenAI, a key figure in the challenge, outlines three major challenges:
- Current technology renders this task nearly impossible, leading to skepticism from various subfields regarding its feasibility.
- Finding a "hack" for it is likely unattainable without developing robust general-purpose methodologies. A successful system could significantly advance mathematics and software engineering.
- There remains considerable uncertainty over which paradigm could succeed.
The last point is crucial; the IMO Grand Challenge represents a contest of AI paradigms and innovative concepts.
Prominent Figures in the Challenge
The IMO Challenge has garnered serious attention from top tech organizations, as evidenced by the committee's esteemed members:
- Daniel Selsam (OpenAI)
- Leonardo de Moura (Microsoft Research)
- Kevin Buzzard (Imperial College London)
- Reid Barton (University of Pittsburgh)
- Percy Liang (Stanford University)
- Sarah Loos (Google AI)
- Freek Wiedijk (University of Nijmegen)
Launched in 2019, the challenge remains open as of 2022. Predictions from over 500 individuals suggest a median expectation for an AI to achieve an IMO gold medal by 2027, though some believe it may take over a century.
AI Progress and Historical Context
In 1988, Raj Reddy presented grand challenges for AI during an AAAI presidential address. Many have since been accomplished:
- IBM's Deep Blue defeated Garry Kasparov in chess in 1997.
- AI systems, including Google Translate, have achieved real-time translation.
- Self-driving cars have been developed by companies like Google and Uber, though widespread usage is still lacking.
On the surface, the Chess Grand Challenge resembles the IMO Grand Challenge; both define success as beating human competitors. But there is a critical difference: a chess position offers only a finite handful of legal moves, while each step of a mathematical proof can be drawn from an effectively unbounded space of possibilities.
Recent AI milestones include OpenAI's GPT-3, which has enabled machines to write blogs, generate code, and even create art. However, these capabilities do not parallel the demands of the IMO Grand Challenge. GPT-3 synthesizes information from training data, but it is prone to errors. IMO problems must be solved flawlessly and are crafted to avoid simple "copy-paste" solutions, necessitating genuine creativity.
In December 2020, Fields Medallist Peter Scholze, himself a triple IMO gold medallist, challenged the formal proof community to verify a significant theorem he co-authored. A team led by Johan Commelin completed the verification in July 2022. Verifying an existing proof is remarkable in itself, but generating an original proof demands an even higher level of creativity and intelligence.
Understanding Human Problem Solving
How do humans tackle problems? It's not merely a matter of systematically exploring pre-sorted possible steps. While there is indeed search and trial-and-error involved, many experience moments of sudden insight that guide their problem-solving journey. The origins of such inspiration remain largely unknown.
Yitang Zhang, who made headlines in 2013 with his proof on prime gaps, expressed the elusive nature of navigating through mathematical challenges: “When you try to prove a theorem, you can almost be totally lost to knowing exactly where you want to go. Often, when you find your way, it happens in a moment, then you live to do it again.”
The complexity of understanding human creativity adds another layer of difficulty to designing an AI capable of innovative problem-solving.
I am optimistic that the IMO Grand Challenge will be met within the next decade. Although IMO questions are tough, specialized training programs exist that can help humans master them. Each medallist has undergone extensive training, primarily focused on solving previous problems, which offers hope for machine learning systems.
Notable advancements have occurred in 2022. In February, OpenAI announced a model capable of solving challenging high-school competition problems, including problems from the AMC 12 and AIME, as well as two adapted from the IMO.
Additionally, in November, Meta AI revealed that its neural theorem prover had solved ten IMO problems. The approach was inspired by AlphaZero: the system feeds data from its own previous proof searches back into training.
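That AlphaZero-style feedback loop can be sketched in miniature. The toy below is purely illustrative and not Meta AI's actual system: the state space, moves, and `prior` table are invented for this sketch. A best-first search solves easy goals, and each successful search trace reinforces the heuristic that guides later, harder searches.

```python
import heapq
from collections import defaultdict

def search(start, goal, moves, prior, budget=1000):
    """Best-first search over integer states. Moves with a higher learned
    prior get a lower cost and are expanded earlier. Returns a move list
    reaching the goal, or None if the budget runs out."""
    frontier = [(0.0, start, [])]
    seen = {start}
    while frontier and budget > 0:
        budget -= 1
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for name, fn in moves.items():
            nxt = fn(state)
            if nxt in seen or nxt > 10 * goal:
                continue
            seen.add(nxt)
            # the learned prior discounts moves that worked before
            cost = len(path) + 1 - prior[name]
            heapq.heappush(frontier, (cost, nxt, path + [name]))
    return None

moves = {"double": lambda x: 2 * x, "inc": lambda x: x + 1}
prior = defaultdict(float)

# the feedback loop: solve goals, then "train" on each successful trace
for goal in (8, 16, 32):
    path = search(1, goal, moves, prior)
    if path:
        for name in path:
            prior[name] += 0.1  # reinforce moves used in found solutions

print(dict(prior))  # doubling ends up strongly preferred on these goals
```

The real systems replace the lookup table with a neural network and the toy states with proof states in a formal language, but the loop is the same shape: search produces training data, and training sharpens the next search.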
Despite these advancements, significant challenges remain. Luck will play a critical role in the IMO Grand Challenge. Certain algebraic and geometric problems may be more amenable to AI solutions than others. The performance of AI systems will likely fluctuate based on the problems selected, and the IMO problem selection committee might even consider the challenge when crafting problems.
Exciting times lie ahead… stay tuned!
The first video explores whether AI is replacing mathematicians, particularly focusing on Google's AlphaGeometry and its implications for the future of mathematics.
The second video discusses the potential for AI in science and mathematics, featuring insights from renowned mathematician Terence Tao.