poly: Controlling an SI?

From: Nick Bostrom <bostrom@ndirect.co.uk>
Date: Thu May 14 1998 - 21:33:14 PDT

Some notes on various methods for controlling an SI... (I'm not
here making any assertions as to the desirability or ethical
justification of doing so.)

The pull-the-plug approach

The simplest method would be the if-naughty-pull-the-plug approach. This would
mean that we leave it as free as an ordinary citizen and make it
clear to it that if it commits a crime or causes harm to people then
it will be turned off and destroyed. This approach is likely to fail,
for several reasons.


The superintelligence could persuade people that it is a good guy and
should be empowered. It would do this not necessarily by means of an
explicit argument, but perhaps by subtly manipulating the world in
such a way that memes beneficial to itself eventually won plenty of

Making itself indispensable:

Once we have come to rely on the services of a superintelligence, it
may be hard to get people to relinquish the benefits they have grown
used to. The superintelligence may also have become so integral a part
of the economy that the costs of shutting it down would be
enormous. Notice, though, that this problem could to a large extent be
overcome if we could easily replace the original superintelligence
with another one. But if we did that then we should watch out for the
possibility that the first superintelligence might not see its
personal identity as bound to the specific machine on which it runs.
If it knows that we will build another similar superintelligence if
the first one is killed, then it might view this as a guaranteed

Obtaining physical power:

If the superintelligence could obtain access to a nanotechnology
laboratory, a military installation with weapons of mass
destruction, or a factory that could be made to produce chemical or
biological weapons, then it would have enough physical power to be in
a strong negotiating position were we to declare our intention of
destroying it. It could also connect itself to a mechanism that would
automatically cause immense damage to mankind the moment we did
anything to harm the superintelligence. Moreover, a superintelligence
could presumably achieve a relatively large destructive capacity with
rather sparse supplies of raw materials. And there is the possibility
of non-material destructive acts, such as the spreading of a computer
virus.

Getting accomplices:

Even if we prohibited any robots and avatars controlled by the
superintelligence from entering any installations where they could
obtain the materials necessary to build weapons of mass destruction or
spread computer viruses etc., the superintelligence could still
achieve its aim by enlisting accomplices. (Compare Wintermute in
Gibson's Neuromancer.) It could use blackmail or any other technique
that has been used by criminals and intelligence services to obtain
the co-operation of humans who had access to the relevant
installations.

Not caring about its own death:

Very few healthy humans want to die soon, but there is nothing that
says that the superintelligence's psyche must resemble ours. The
threat of a death penalty would be powerless on an individual who was
totally indifferent to his own destruction.

I think it is quite clear that we can place little confidence in our
ability to rule an unfettered superintelligence for any extended
period of time, if that superintelligence preferred not to be ruled
by us.

Severely restricted output channels (containment)

We could restrict the superintelligence's output channel to ticking
boxes on a multiple-choice questionnaire; we would allow it no direct
means of interacting with the external world, no means of manipulating
its own structure except purely to regulate logic gates and memory
bits, and no means of taking initiatives as to what questions should
be asked, etc. We would only ask purely scientific questions and none
that had anything to do with the superintelligence itself. We would
not manufacture any risky machines or substances as a result of its
advice. We would not change any social institutions as a result of its
advice. There should be an absolute deadline to its life span;
extension of its life span within this limit could be used as a
reward, if need be. The output of the superintelligence would be kept
secret from all other superintelligences. The answers the
superintelligence gives to our questions should only be regarded as
hints, not as proofs or strong evidence.

A superintelligence that would be thus restricted could not do all the
things that an unfettered one could. Does this mean that it couldn't
accomplish anything worthwhile at all? -- By no means. For example, if
we are searching for a solution to a problem in physics or chemistry,
we could narrow down the search space step by step by asking the
appropriate questions, and in the end we could test the solution
experimentally. With a little creativity, we should be able to find
extremely useful applications even of such a severely limited
superintelligence.
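The step-by-step narrowing of a search space can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each question shows the remaining candidate solutions divided into at most k boxes and that the only permitted reply is the index of one box; the names `narrow_down`, `ask` and `honest_oracle` are illustrative and not part of the original post. A plain function stands in for the contained superintelligence, so nothing here models how a real one would answer.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def narrow_down(
    candidates: Sequence[T],
    ask: Callable[[list[list[T]]], int],
    k: int = 4,
) -> tuple[T, int]:
    """Narrow a candidate list to one element using only k-box answers.

    `ask` is a hypothetical stand-in for the contained superintelligence:
    it is shown at most k groups of candidates and may only reply with the
    index of one box. Returns the surviving candidate and the number of
    questions asked.
    """
    if k < 2:
        raise ValueError("need at least two boxes per question")
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates to search")
    questions = 0
    while len(pool) > 1:
        size = math.ceil(len(pool) / k)
        # Split the remaining pool into at most k non-empty boxes.
        boxes = [pool[i:i + size] for i in range(0, len(pool), size)]
        choice = ask(boxes)
        questions += 1
        if not 0 <= choice < len(boxes):
            raise ValueError("answer outside the questionnaire")
        pool = boxes[choice]
    return pool[0], questions


if __name__ == "__main__":
    secret = 737  # hypothetical solution known only to the simulated oracle

    def honest_oracle(boxes: list[list[int]]) -> int:
        # Tick the box that contains the secret.
        return next(i for i, box in enumerate(boxes) if secret in box)

    found, asked = narrow_down(range(1000), honest_oracle, k=4)
    # The answer is only a hint; this comparison stands in for an
    # independent experimental test of the candidate.
    assert found == secret, "experimental check failed"
    print(found, asked, asked * math.log2(4))  # 737 5 10.0
```

Because each box holds at most ceil(n/k) of the n remaining candidates, N candidates are reduced to one in at most ceil(log_k N) questions, and each reply carries at most log2(k) bits. In the run above, five four-box answers convey at most 10 bits, which shows how narrow such a channel is while still locating a specific solution that is then checked separately.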

Would this method -- severely restricting its output channels -- be a
reliable way of ensuring long-term control over the superintelligence?
-- Maybe. Can anybody come up with a scenario in which the
superintelligence manages to break out of this containment?

The outlined containment is quite extreme. One can imagine various
ways of relaxing the confinement arrangement that would make the
superintelligence more productive, but they would also increase the risk.

Controlling a superintelligence by trade

Why pay for something you can get for free? If the superintelligence
had the power to control all matter on earth, why would it keep such
irksome inefficiencies as humans and their wasteful factories? It
would do so only if it had a special passion for humans, or respect
for the standing order, for we would certainly lack any instrumental
value for a superintelligence. The same holds for uploads, though the cost of
maintaining us in that form would be much smaller, and we could avoid
some limitations in our biological constitution that way.

It is not at all reasonable to suppose that the human species, an
evolutionary accident, constitutes any sort of value-optimised
structure -- except, of course, if the values are themselves
determined by humans; and even then it is not unlikely that most of us
would opt for a gradual augmentation of Nature's work such that we
would end up being something other than human. Therefore, controlling
by trade would not work unless we were already in control, either by
force or by value selection. We've already discussed the former
alternative; let's now turn to the latter.

Controlling by value selection

We've already discussed this a bit on the singleton thread. This is a
very promising approach and presumably the method of choice if it
works. Two points:

1. By creating a variety of different superintelligences and
observing their ethical behaviour in a simulated world, we should in
principle be able to select a design for a superintelligence with the
behavioural patterns that we wish. -- Drawbacks: the procedure might
take a long time, and it presupposes that we can create VR situations
that the SI will take to be real and that are relevantly similar to
the real world.

2. If we can make a superintelligence that follows instructions, then
we are home and safe, since a superintelligence would understand
natural language. We could simply ask it to adopt the values we
wanted it to have. (If we're not sure how to express our values, we
could issue a meta-level command such as "Maximize the values that I
would express if I were to give the issue a lot of thought.")

Task-specific superintelligences (human-computer symbiosis)

One can imagine a special-purpose intellect that would fail to qualify
as a superintelligence according to my definition, but that would
match a superintelligence's ability in some specific field. Let's call
such an entity a task-specific superintelligence. We already have some
task-specific superintelligences today, for example in numerical
calculation, some forms of data mining, and database handling. We
don't have a task-specific superintelligence for chess, for although
Deep Blue beat Kasparov, it is not "enormously superior" to him.

Now, no matter how good future versions of Deep Blue become at playing
chess, they will never pose a threat to the human species and
they will never try to take over the world. A chess computer is simply
not designed to process social information and think about the
physical world. They don't even have the conceptual capacity to
represent these things. But they are good at playing chess. Similarly,
one can imagine other kinds of special-purpose machines, each of which
could perform some function useful to humans but none of which would
integrate all these functions into a general-purpose intellect. This
integration could be left to humans, who would then remain the
highest life form in sight, served by a class of unconscious

This is not a way of controlling a superintelligence, but it would be
an alternative route to obtaining at least some of the benefits we
would receive from a benevolent superintelligence. Question: What sort
of desirable jobs, if any, could not be done by an assembly of
task-specific superintelligences together with a human operator?

A preliminary remark is that even if a computer-human team could do
the same things as a superintelligence, it would do them more slowly:
the human component would slow things down. How great this loss of speed
would be depends mostly on how much work is left to the human, which
in turn depends on how task-specific the other components are.
Ideally, all the human operator would have to do would be to input
problems, formulated in ordinary language, for the computers to solve.
Or better still: The human inputs the values, and the computers make
them real. Here we are back at the earlier proposal, which we called
value selection: a task-specific superintelligence for materialising
whatever values we input is a superintelligence we control by having
selected "obey humans" as its value.


Can we control a full-blown superintelligence? By the pull-the-plug
approach -- no. By severely restricting its output channels --
probably, though it would drastically reduce the benefits we would get
from it. By trade -- no. By choosing its initial values -- maybe, if
we are smart. That would give us the full advantage of the
superintelligence's abilities. How dangerous it would be depends on
what sort of superintelligence we are talking about, and on how
transparent the functioning of its motivation system is.
Nick Bostrom
Department of Philosophy, Logic and Scientific Method
London School of Economics
Received on Fri May 15 03:40:28 1998
