What if artificial intelligence were to one day surpass human intelligence? This possibility, known as “superintelligence,” is something OpenAI expects could happen within this decade. To prepare for this, the company has formed a new team focused on ensuring that superintelligent AI aligns with humanity’s best interests.
In a blog post announcing the team, OpenAI posed the question: “How do we ensure AI systems much smarter than humans follow human intent?” The new team, named Superalignment, will be co-led by Ilya Sutskever and Jan Leike, both authors of the post.
Rather than focusing on artificial general intelligence (AGI), OpenAI’s main focus is on addressing the alignment of superintelligent AI systems. The company stresses that it is aiming to manage a system with much higher capabilities than what current alignment methods, like reinforcement learning from human feedback, can handle.
OpenAI plans to tackle this challenge by assembling a team of top machine learning researchers and engineers dedicated to the problem. The company emphasized that solving this issue is critical to its mission and expects multiple teams to contribute, from developing new methods to scaling them for deployment.
Sutskever, a co-founder and the chief scientist at OpenAI, and Leike, who leads the alignment team, have been at the forefront of OpenAI’s alignment efforts. Previously, their work focused on three main pillars: training AI systems using human feedback, training AI to assist with human evaluation, and training AI to conduct alignment research. Leike shared in a tweet that most of the previous alignment team has now joined the new Superalignment team.
In support of this initiative, OpenAI is dedicating 20% of its compute resources over the next four years to solving the superintelligence alignment problem. Leike pointed out that this is a significant commitment, calling it “the largest investment in alignment ever made,” and possibly more than what has been spent on alignment research globally up to this point.
The Superalignment team has an ambitious goal: to solve the technical challenges of aligning superintelligent AI within four years. The team’s work will focus on improving the safety of current AI models like ChatGPT, understanding and mitigating AI risks such as misuse, economic disruption, disinformation, bias, discrimination, and overreliance.
Sociotechnical issues—concerns related to how humans and machines interact—will also be a key focus. OpenAI is working closely with experts from various fields to ensure that its technical solutions take into account broader societal concerns.
The team has set its first goal: to develop a human-level automated alignment researcher. This would allow them to scale up their efforts using large amounts of compute and gradually align superintelligent AI. To achieve this, they will need to create a scalable training method, validate the resulting model, and thoroughly stress-test the alignment process. Stress testing will involve providing training signals on tasks that are difficult for humans to evaluate and using AI systems to evaluate other AI systems. It will also include automating the identification and interpretation of problematic behaviors.
Finally, the team plans to test the entire process by deliberately training misaligned models and ensuring that their techniques can detect serious misalignments through adversarial testing.
In response to concerns about how they will know if progress is being made, Leike explained that they would closely monitor empirical data as it comes in, measuring progress on specific parts of the research roadmap, such as scalable oversight. They plan to track how well the alignment of models like GPT-5 goes and to closely observe how quickly the technology develops.