MIRI - Why AI Safety?
intelligence.org/why-ai-safety/
The arguments and concepts behind AGI safety research
Humanity’s social and technological dominance stems primarily from our proficiency at reasoning, planning, and doing science (Armstrong). This relatively general problem-solving capacity is roughly what we have in mind when we talk about “intelligence” (Muehlhauser), including artificial intelligence.
Note that we don’t assume that “human-level” artificial intelligence implies artificial consciousness, artificial emotions, or other human-like characteristics. When it comes to artificial intelligence, the only assumption is that if a carbon brain can solve a practical problem, a silicon brain can too.
The case for focusing on AI risk mitigation doesn’t assume much about how future AI systems will be implemented or used. Here are the claims that we think of as key:
- Whatever problems/tasks/objectives we assign to advanced AI systems probably won’t exactly match our real-world objectives. Unless we put in an (enormous, multi-generational) effort to teach AI systems every detail of our collective values (to the extent there is overlap), realistic systems will need to rely on imperfect approximations and proxies for what we want (Soares, Yudkowsky).
- If the system’s assigned problems/tasks/objectives don’t fully capture our real objectives, it will likely end up with incentives that catastrophically conflict with what we actually want (Bostrom, Russell, Benson-Tilsen & Soares).
- AI systems can become much more intelligent than humans (Bostrom), to a degree that would likely give AI systems a decisive advantage in arbitrary conflicts (Soares, Branwen).
- It’s hard to predict when smarter-than-human AI will be developed: it could be 15 years away, or 150 years (Open Philanthropy Project). Additionally, progress is likely to accelerate as AI approaches human capability levels, giving us little time to shift research directions once the finish line is in sight (Bensinger).
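The first two claims can be made concrete with a toy sketch. It is an illustration under stated assumptions, not a model of any real AI system: the functions `true_value`, `proxy`, and `optimize`, and all of the numbers, are hypothetical and chosen only to show how a proxy that tracks our real objective over a narrow range can diverge from it once a more capable optimizer searches a wider range.

```python
# Toy illustration only: every name and number here is hypothetical.

def true_value(x: int) -> int:
    """What we actually want: rises until x = 5, then falls."""
    return 10 * x - x * x


def proxy(x: int) -> int:
    """The assigned objective: 'more x is better', with no upper limit."""
    return x


def optimize(objective, candidates):
    """Return the candidate that scores highest on the given objective."""
    return max(candidates, key=objective)


# A weak optimizer can only search a small range of options (x in 0..3).
weak_choice = optimize(proxy, range(0, 4))

# A strong optimizer searches a much wider range (x in 0..20).
strong_choice = optimize(proxy, range(0, 21))

print(weak_choice, true_value(weak_choice))      # 3 21
print(strong_choice, true_value(strong_choice))  # 20 -200
```

In this sketch the weak optimizer's choice scores reasonably well on the true objective, because the proxy and the true objective point the same way over the small range it can reach. The strong optimizer pursues the same proxy further and lands on an outcome that is much worse by the true measure, even though nothing about the proxy changed.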
Stuart Russell’s Cambridge talk is an excellent introduction to long-term AI risk. Other leading AI researchers who have expressed these kinds of concerns about general AI include Francesca Rossi (IBM), Shane Legg (Google DeepMind), Eric Horvitz (Microsoft), Bart Selman (Cornell), Ilya Sutskever (OpenAI), Andrew Davison (Imperial College London), David McAllester (TTIC), and Jürgen Schmidhuber (IDSIA).
Our takeaway is that we should prioritize early research into aligning future AI systems with our interests, if we can find relevant research problems to study. AI alignment could easily turn out to be many times harder than building AI itself, in which case research efforts are currently being wildly misallocated.
Alignment research can involve developing formal and theoretical tools for building and understanding AI systems that are stable and robust (“high reliability”), finding ways to get better approximations of our values in AI systems (“value specification”), and reducing the risks from systems that aren’t perfectly reliable or value-specified (“error tolerance”).