Powerful AI needs to be reliably aligned with human values. Does this mean that AI will eventually have to police those values? Cambridge philosophers Huw Price and Karina Vold consider the trade-off between safety and autonomy in the era of superintelligence.

This has been the decade of AI, with one astonishing feat after another. A chess-playing AI that can defeat not only all human chess players, but also all previous human-programmed chess machines, after learning the game in just four hours? That's yesterday's news. What's next?

True, these prodigious accomplishments are all in so-called narrow AI, where machines perform highly specialised tasks. But many experts believe this restriction is only temporary. By mid-century, we may have artificial general intelligence (AGI) – machines that are capable of human-level performance on the full range of tasks that we ourselves can tackle.

If so, then there's little reason to think that it will stop there. Machines will be free of many of the physical constraints on human intelligence. Our brains run at slow biochemical processing speeds on the power of a light
bulb, and need to fit through a human birth canal. It is remarkable what they accomplish, given these handicaps. But they may be as far from the physical limits of thought as our eyes are from the Webb Space Telescope.

Once machines are better than us at designing even smarter machines, progress towards these limits could accelerate. What would this mean for us? Could we ensure a safe and worthwhile coexistence with such machines?

On the plus side, AI is already useful and profitable for many things, and super AI might be expected to be super useful, and super profitable. But the more powerful AI becomes, and the more we ask it to do for us, the more important it will be to specify its goals with great care. Folklore is full of tales of people who ask for the wrong thing, with disastrous consequences – King Midas, for example, who didn't really want his breakfast to turn to gold as he put it to his lips.
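The Midas problem can be made concrete with a minimal sketch, written here in Python; the tiny world model, the reward function and the 'Midas policy' are purely illustrative assumptions, not anyone's real system. A literal-minded optimiser given the goal 'more gold' converts breakfast along with everything else, because the stated objective never says not to:

```python
# A toy, purely illustrative world: items mapped to their materials.
WORLD = {"crown": "metal", "statue": "stone", "breakfast": "food"}

def reward(world):
    # Misspecified objective: count gold items. Nothing says "except food".
    return sum(1 for material in world.values() if material == "gold")

def midas_policy(world):
    # A literal-minded optimiser: take any action that raises reward.
    for item in world:
        if world[item] != "gold":
            candidate = dict(world, **{item: "gold"})
            if reward(candidate) > reward(world):
                world[item] = "gold"  # breakfast included: literal, not intended
    return world

print(midas_policy(dict(WORLD)))
# -> {'crown': 'gold', 'statue': 'gold', 'breakfast': 'gold'}
```

The optimiser does exactly what it was told and nothing it wasn't; the gap between the objective we wrote down and the one we meant is the heart of the alignment problem.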

So we need to make sure that powerful AI machines are 'human-friendly' – that they have goals reliably aligned with our own values. One thing that makes this task difficult is that by the standards we want the machines to aim for, we ourselves do rather poorly. Humans are far from reliably human-friendly. We do many terrible things to each other and to many other sentient creatures with whom we share the planet. If superintelligent machines don't do a lot better than us, we'll be in deep trouble. We'll have powerful new intelligence amplifying the dark sides of our own fallible natures.
