poly: Rational ethics and human-posthuman relations

From: Anders Sandberg <asa@nada.kth.se>
Date: Sat May 23 1998 - 12:01:11 PDT

[ This was inspired by the interesting discussion about posthuman
ethics on the transhuman list; unfortunately I did not have the time
to participate. This is my somewhat late response, together with ideas
relating to the discussions about the "spike/swell" scenarios,
rational ethics and other things. It is a rather rough text right now,
a lot of it was finished in a creative fury tonight and I cannot vouch
for my sanity (my biological clock is right now following Icelandic
time, of all things!). Hopefully the basic ideas are sound, even if my
arguments might be fuzzy in this version. ]

Posthuman rational ethics

There has been plenty of discussions on the various transhumanist fora
about possible ethical systems for posthumans and how they might
interact with humans. This is my attempt to sketch some possible
answers to these questions, based on game theory and economics.

What is a basis for a rational ethics?

A rational ethics has to promote the survival of the entity believing
in it. A rational being will strive to achieve its goals (whatever
they may be), but if it doesn't survive it will be unable to achieve
any of the goals (except possibly posthumous goals). In addition,
individuals that do not strive for their own survival will not survive
in the long run, and the population will become dominated by those
that do.

It should be noted that many memes promote the overall survival of
the population of hosts infected with the meme rather than the
survival of the individual host; this is rational from the meme's
point of view but not for the individual host. Still, a meme encoding
an ethic that is detrimental to the survival of its hosts (or of the
meme itself) will be weeded out by selection over time.

Rational beings will hence strive to approximate a rational ethics (it
might not be possible to come up with an *optimal* rational ethics for
a given situation or environment given lack of information about the
world, behavior of other beings and the future, but it seems
reasonable to think that a "good enough" ethics can be
developed). They will behave in ways that at least do not decrease
their chance of survival and that make it possible to achieve their
goals.

Uniform population

First, let us assume a population of mostly rational entities that are
roughly equivalent when it comes to ability and power; the differences
between them are small.

Experiments in evolutionary game theory suggest that it is rational
for individuals in prisoners' dilemma-type situations to cooperate
with each other rather than defect. In the long run cooperation is
more profitable than defection, and even selfish entities will begin
to cooperate. This will occur if future interactions are expected
(otherwise it is rational to defect) and entities are distinguishable
(otherwise it is hard to tell if this is a past partner or
defector). In this case there are evolutionarily stable strategies
(i.e. they can withstand the presence of other strategies) that are:
        "nice" (do not initiate defection)
        "forgiving" (do not always defect indefinitely after a defection)
        "provocable" (responds to defections)

(Axelrod suggested the need to be clear and predictable, but this has
been challenged,
e.g. ftp://ftp.lifl.fr/pub/users/mathieu/articles/ep98.ps.gz)

Even in a population of mostly defectors a small subpopulation of nice
strategies can gain dominance since they will be gaining so much more
from their cooperation than the defectors gain from the very rare case
they manage to squeeze some utility out of a player.
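This dynamic can be sketched in a few lines of code. The strategy implementations and the payoff values below (3 for mutual cooperation, 5 for temptation, 1 for mutual defection, 0 for the sucker) are conventional assumptions from standard prisoner's dilemma treatments, not from the text:

```python
# Toy iterated prisoner's dilemma: tit-for-tat vs. always-defect.
# Payoff values (3, 5, 1, 0) are conventional assumed values.

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_moves):
    """Nice (starts with C), provocable and forgiving (mirrors the
    opponent's previous move)."""
    return opponent_moves[-1] if opponent_moves else 'C'

def always_defect(opponent_moves):
    return 'D'

def play(strat_a, strat_b, rounds=100):
    """Iterate the game; each strategy sees the opponent's past moves."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        ma, mb = strat_a(moves_b), strat_b(moves_a)
        moves_a.append(ma)
        moves_b.append(mb)
        pa, pb = PAYOFF[(ma, mb)]
        score_a += pa
        score_b += pb
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (300, 300): cooperation pays
print(play(always_defect, always_defect))  # (100, 100)
print(play(tit_for_tat, always_defect))    # (99, 104): exploited once, then mutual defection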

It is interesting to note that the single-game Prisoners' Dilemma has
been used to argue for an outside authority to support cooperation and
penalize defection, since rational players in the single-game version
will defect. But the iterated version doesn't need the authority,
since rational players will mostly cooperate.

It is also worth noting that a pure "cooperate always" strategy is not
stable if mutations can occur; a single defector can invade it. If the
cooperative strategy is reactive, on the other hand, it will be
stable against this kind of invasion. "Turn the other cheek" isn't a
good heuristic, really.
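The invasion argument can be made concrete with a little arithmetic (the payoff values 3/5/1/0 are the same conventional assumptions as in standard prisoner's dilemma treatments, and the match length is arbitrary):

```python
# A lone always-defect mutant in a population of cooperators: does its
# per-match score beat what the residents earn among themselves?
# Payoffs: 3 mutual cooperation, 5 temptation, 1 mutual defection
# (conventional assumed values).

ROUNDS = 100
resident_score = 3 * ROUNDS          # cooperators playing each other: 300

# vs. "cooperate always": the mutant exploits every round.
mutant_vs_allc = 5 * ROUNDS          # 500 > 300: invasion succeeds

# vs. tit-for-tat: exploited once, then met with defection.
mutant_vs_tft = 5 + 1 * (ROUNDS - 1) # 104 < 300: invasion fails

print(mutant_vs_allc > resident_score)  # True
print(mutant_vs_tft < resident_score)   # True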

It thus seems likely that altruism is rational as long as the entities
form a group where future interactions will occur, where individuals
can be identified, and where the benefit of cooperating is larger than
the short-term gain from defecting.

It is often cognitively expensive to calculate if the situation is such
that cooperation is rational or if defection can be profitable
(especially if one is a non-intelligent animal), so evolution has
favored built-in heuristics such as "cooperate with kin who asks for
it" (kin are likely to share the same heuristics, and can thus be
trusted) or "become aggressive if somebody fools you" (decreases
the likelihood of interacting again with the defector). Much of
human behavior and "natural ethics" seems to be based on such
heuristics. They are often crude and sometimes applied when they
are rationally not applicable, but they do not require the complex
rational calculations necessary for estimating a profitable strategy
and can thus save a lot of mental energy and time.


What about violence among rational entities? This can in some
sense be seen as an application of the prisoners' dilemma, where
mutual cooperation has the positive pay-off of continued survival,
mutual defection has the negative pay-off of mutual harm or
destruction, and one-sided defection provides the attacker with a
possible positive utility of increased resources, less competition etc.
and the victim suffers from a surprise attack.

Among rational entities the strategy simulations hence suggest that
it is rational not to attack each other. In order to make things
stable a measure of provocability is necessary, i.e. force must be met
with some force (or other form of utility decrease, such as ostracism
or strong fines).

The power of violence also plays a role. One unit of violence can be
defined as the amount of violence required to kill another entity (a
more general and realistic measure would be the amount necessary to
kill with a certain probability). If each entity has a potential for
violence of V units, then if the population size N is larger than V a
group of entities can in principle always bring down any individual
that begins to defect. Retaliation becomes a way of enforcing
cooperation, and rational entities will refrain from killing each
other. The larger the population, the more force can be brought to
bear on the defectors.
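A minimal sketch of this deterrence condition (the threshold of one violence unit per kill follows the definition above; the worst-case framing is an assumption for illustration):

```python
# Deterrence sketch: a defector with violence potential V can kill at
# most V of the N - 1 others; if survivors remain, they can always
# bring it down, since each survivor needs only 1 unit of violence.

def defector_can_be_stopped(n: int, v: float) -> bool:
    """True if at least one cooperator survives a worst-case rampage
    and can therefore retaliate."""
    survivors = (n - 1) - v
    return survivors >= 1

print(defector_can_be_stopped(n=1000, v=10))  # True: N far exceeds V
print(defector_can_be_stopped(n=5, v=10))     # False: V exceeds N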

Irrational entities may however defect even if they do not profit from
it. Since a rational entity could "mutate" into an irrational entity
under a variety of circumstances, there will likely be a small incidence
of irrational entities at any time. This means that there is always a
risk of violence, even if the majority is completely rational and
rationality is an ESS.

In order to preserve the stability of the situation the rational
cooperators can decide to band together against anybody defecting
against them. This is an ESS.

However, by participating in retaliation an individual increases the
risk of being attacked slightly; nobody wants to be first in
line. This problem can be solved by a "market" approach: the group can
hire some members to do the retaliation for them, compensating them
for the increased risk by some form of payment. This is essentially
the underlying rationale for Hobbes's Leviathan, current police forces
and the proposed PPL (Privately Produced Law) firms.

Problems also occur if attacks are untraceable (this is why terrorism
is such a problem) or retaliation is hard.

If the potential for violence V grows, each individual can kill more
victims if it chooses to defect. V increases with increasing technology;
once a club could only be used to kill one person at a time; now
automatic guns allow a defector to kill tens to hundreds of people,
and weapons of mass destruction can kill in the range of thousands
to millions.

The situation seems to become unstable as V approaches N. This
is the problem of destruction: if almost everyone has access to
weaponry that can kill everybody else, then the probability for a
deadly breakdown becomes noticeable. If the damage done when
attacking is non-directable (i.e. one cannot be sure that one can
avoid it) or if even a successful attack can lead to retaliation (like
second strike capabilities among nuclear powers) rational entities
will be discouraged from attacking.

If the weaponry is highly directable, it becomes in principle
possible, when V is on the order of N, to wipe out everyone else. This
could be rational if an entity does not have any need for the other
entities and would gain from their destruction, but given even a very
low probability of retaliation the strategy becomes irrational.

But even if the rational entities do not use their power, there is
always a risk that an irrational entity would do it. Mutually Assured
Destruction is only stable if all participants are rational and never
make errors.

This seems to be a real dilemma for powerful but vulnerable
entities. If the amount of violence necessary to kill an entity is not
1 but a higher number (say D), then the relative amount of violent
potential will decrease to V/D. This of course overlooks the
complications of different attack and defense strategies etc. Another
way of increasing security is making it possible to escape or avoid
attacks; no amount of V (unless it is undirected and destroys the
surroundings in a very wide area) can kill somebody who isn't there.

This is in some sense the answer to the "Guy Fawkes scenario" (anybody
can put a bomb under your house): make sure nobody knows where you
live.

Differentiated population

Now let us turn to look at a population consisting of entities of two
or more different kinds, of differing capabilities. This could be
humans and posthumans, or humans with high technology and humans with
low technology.

Is it rational for the more powerful party to cooperate with, ignore
or attack the less powerful party? (Defection in trade would only lead
to an end of transactions as the other party would refuse to
cooperate, so this case will not be studied here.)

The law of comparative advantage seems to suggest that it is rational
for the more powerful party to cooperate with the lesser party even
when the capability differential is large.

In a nutshell, David Ricardo's Law of Comparative Advantage says that
trade is mutually profitable even when one party is more productive
than the other in every commodity that is being exchanged. Assume
posthumans can efficiently make both space habitats and spaceships,
while humans are inefficient at making habitats and truly awful at
making spaceships. It would appear reasonable for the posthumans to
make their own habitats and spaceships and not trade at all. But
assume a habitat and a spaceship are worth the same. If the
posthumans build a habitat, they are building something they could
get more easily from the humans (who would be eager to trade habitats
for spaceships); the effort saved could create more wealth for the
posthumans by allowing them to make more spaceships. The end result is
that both sides are better off than if they didn't trade. So according
to the law of comparative advantage they would specialize, the
posthumans relying on the humans for habitats and themselves building
spaceships.
As long as the law of comparative advantage holds it is rational to
trade. It of course breaks down if trade is impossible, for instance
when one party cannot produce anything desirable for the other. But as
long as there is anything to trade, rational entities will cooperate.
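The spaceship/habitat example can be run with illustrative numbers. All costs and effort budgets below are assumptions, chosen so that the posthumans are better at both goods (as in the text) while the humans are least bad at habitats:

```python
# Ricardo's law with assumed effort costs per item.
COST = {
    'posthuman': {'habitat': 2,  'spaceship': 1},
    'human':     {'habitat': 10, 'spaceship': 40},
}
BUDGET = {'posthuman': 100, 'human': 1000}  # effort available to each side

def autarky_output(side):
    """No trade: split effort to make equal numbers of both goods."""
    per_pair = COST[side]['habitat'] + COST[side]['spaceship']
    n = BUDGET[side] // per_pair
    return {'habitat': n, 'spaceship': n}

# With specialization: humans make only habitats, posthumans only
# spaceships (the text assumes a habitat and a spaceship trade 1:1).
specialized = {
    'habitat': BUDGET['human'] // COST['human']['habitat'],
    'spaceship': BUDGET['posthuman'] // COST['posthuman']['spaceship'],
}

no_trade_total = {g: autarky_output('human')[g] + autarky_output('posthuman')[g]
                  for g in ('habitat', 'spaceship')}
print(no_trade_total)   # {'habitat': 53, 'spaceship': 53}
print(specialized)      # {'habitat': 100, 'spaceship': 100}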

This argument suggests that it is irrational for the more powerful
party to attack the weaker party. Not only would it end the trade; as
long as the production is not of easily taken raw materials, an attack
would hurt the production of tradeable goods (there is little point in
trying to loot an economy based on just-in-time manufacturing and
information) and would not produce any net gain.

Attacks on a non-trading group

However, even if we assume trade is irrelevant in this case, an attack
would still be costly, both due to the expenditure of energy in the
attack (which might be comparatively smaller for the powerful party,
of course) and because the weaker party would do its best to defend
itself, imposing further cost and risk on the stronger party. There
has to be some positive utility U for attacking, be it access to raw
materials, religious fanaticism or safety.

Let's assume the weaker party has violence potential V1, defense D1 and
population N1, and the stronger party violence potential V2, defense D2
and population N2. A single strong individual could hurt V2/D1 weak
individuals. But if that occurs, it is rational for the other weak
individuals to retaliate, and they can produce N1*V1 units of
violence. If N1*V1 > D2, it is very stupid of a single strong
individual to attack the weak individuals; sufficiently many ants can
kill an elephant. In order for the strong to prevail, they either need
an extremely good defense (like being practically invulnerable to
low-tech attacks) or the number of weak cooperators to be small
(divide and conquer).

In the case of a concerted attack of all the strong individuals, the
question is how large L1=N2*V2/D1 (the losses among the weak) and
L2=N1*V1/D2 (the losses among the strong) will be. Unless the loss of
L2 entities can be balanced by the utility U it will be rational for
the powerful entities to cooperate, i.e. an attack only occurs if U is
greater than the negative utility of having chance L2/N2 of being
killed (note that centralized powers where the decision-making
entities can remain in security are more likely to attack, since the
chance of the initiators of the attack being hurt is much smaller).
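The loss bookkeeping above can be written out directly. Linear utility in the risk of death is the text's own assumption; the numbers in the example are made up:

```python
# Expected-loss sketch for a concerted strong-vs-weak conflict.
def losses(N1, V1, D1, N2, V2, D2):
    """L1: losses among the weak, L2: losses among the strong."""
    L1 = N2 * V2 / D1
    L2 = N1 * V1 / D2
    return L1, L2

def strong_side_attacks(U, N1, V1, D1, N2, V2, D2):
    """Attack only if utility U outweighs a chance L2/N2 of being
    killed (assuming a linear utility of survival, as in the text)."""
    _, L2 = losses(N1, V1, D1, N2, V2, D2)
    return U > L2 / N2

# Hypothetical numbers: a large weak population makes attack irrational.
print(strong_side_attacks(U=0.1, N1=10**6, V1=1, D1=1, N2=100, V2=10**4, D2=100))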

Another factor is whether a "surprise attack" has any advantage.
There are two cases:

Surprises give no advantage, i.e. when second-strike capabilities
exist. In this case the game will be "cooperate - cooperate"
(no losses on any side) or "defect - defect" (both will suffer L1 and
L2, respectively). It is not rational for the strong side to attack
unless U makes the attack worth the damage, and if L2/N2 is much
smaller than L1/N1, then the weak side will not rationally attack
unless provoked since it knows a counterattack would hurt it a lot or
even extinguish it. This makes the game more stable, since the
stronger player can now depend on the weaker's cooperation.

Surprises give an advantage. In this case we have a game where each
party at each moment must decide whether to attack or not. Peace
causes no net loss or gain. An attack gives a pay-off of -L1/N1 and
-L2/N2 to the respective victims (assuming linear subjective utility
functions). The payoff matrix is (the strong side's choices are along
the columns, with its payoff first in each cell):

                      Strong: don't attack    Strong: attack
     Weak: don't           0 / 0                 U / -L1
     Weak: attack         -L2 / 0                U-L2 / -L1

The dominant strategy for the strong side in this case is to attack,
while the dominant strategy for the weak side is not to attack; the
dominant strategy equilibrium is a strong against weak attack. This is
an unstable situation.
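The dominance claim can be checked mechanically with placeholder magnitudes (U, L1, L2 below are arbitrary positive values; only the orderings matter). Note that in this matrix "don't attack" is only weakly dominant for the weak side, since attacking costs it nothing directly:

```python
# The surprise-attack payoff matrix above. Keys are
# (weak move, strong move); values are (strong payoff, weak payoff).
U, L1, L2 = 5.0, 3.0, 2.0   # any positive values give the same orderings

payoffs = {
    ('dont',   'dont'):   (0.0,    0.0),
    ('dont',   'attack'): (U,      -L1),
    ('attack', 'dont'):   (-L2,    0.0),
    ('attack', 'attack'): (U - L2, -L1),
}

# Strong side: 'attack' strictly dominates 'dont' ...
assert payoffs[('dont', 'attack')][0] > payoffs[('dont', 'dont')][0]      # U > 0
assert payoffs[('attack', 'attack')][0] > payoffs[('attack', 'dont')][0]  # U-L2 > -L2
# ... while the weak side never gains by attacking (weak dominance):
assert payoffs[('dont', 'dont')][1] >= payoffs[('attack', 'dont')][1]
assert payoffs[('dont', 'attack')][1] >= payoffs[('attack', 'attack')][1]
print("equilibrium: strong attacks, weak does not")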

Hence, if surprise attacks are possible or the utility U is greater
than the L2/N2 risk of being killed the strong side will rationally
attack, but otherwise neither side will initiate an attack. So under
quite general circumstances it seems that the strong party will not try
to wipe out the weaker party even when there is no trade. In fact, the
only likely reason there would be no trade is that the weaker party
has nothing of interest to trade with, which suggests that the utility
of attacking will likely also be low; there will be little to gain.

What about irrational powerful entities? In the generic situation of
cooperating strong entities that also trade with the weak entities,
they will rationally want to stop the irrational attacker (not only
because of the losses in trade, but also due to the fact that an
irrational entity could also attack them). So in this case the other
powerful entities plus the weak would retaliate against the
irrational entity, with a total power of (N1*V1+(N2-1)*V2)/D2, most
likely enough to wipe it out.

We get the same instability here as in the single group situation if
the violence potential becomes large compared to the population sizes;
at that point the irrational entities become a real danger. The same
strategies as in the earlier situation become relevant, with the
addition of the "doomsday device" potential: if the "weak" side could
wipe out the strong side when attacked, attacking becomes completely
irrational, but the device is also dangerous due to the presence of
irrational entities and mistakes.

The need for a continuous population

If more levels of entities are added with differing power-levels,
little changes. It is still rational to cooperate with one's peers,
to avoid angering one's superiors, and not to attack one's inferiors.
The situation will be stable if surprise attacks are not useful, the
available weaponry cannot wipe out a sizeable part of any population
and no population would gain more from attacking another than the risk
of being retaliated against.

The entities of population X can in principle count on support from
all their trading partners P(X) when attacked. Note that if P(X) gets
involved, P(P(X)) will likely also become involved, and so on. One can
define the "alliance" A(X) of a population X as the union of
X,P(X),P(P(X)),... If A(X) is the entire set of populations, things
will likely remain stable assuming that the loss of a number of
"links" in the chain of trading deals between a potential attacker Y
and X (creating two disjoint alliances A(X) and A(Y)) plus the losses
due to the attack (now involving just A(X)) is smaller than the
utility of the attack. This will not likely occur if the graph of
trading populations is well connected (which is likely due to the law
of comparative advantage). If there are disjoint alliances, then one
can treat the interaction between them as a higher level game with the
same rules as before.
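The alliance construction is just a reachability closure over the trade graph, which can be sketched directly. The populations and trade links below are hypothetical:

```python
# A(X): everything reachable from X through trade links.
from collections import deque

def alliance(trade, x):
    """A(X) = union of X, P(X), P(P(X)), ... (breadth-first closure)."""
    seen, queue = {x}, deque([x])
    while queue:
        for partner in trade.get(queue.popleft(), ()):
            if partner not in seen:
                seen.add(partner)
                queue.append(partner)
    return seen

# Hypothetical trade links between populations of differing power.
trade = {
    'humans':     ['posthumans', 'uploads'],
    'posthumans': ['humans', 'SIs'],
    'uploads':    ['humans'],
    'SIs':        ['posthumans'],
    'isolates':   [],               # a population with no trade ties
}

print(sorted(alliance(trade, 'humans')))    # everyone except the isolates
print(sorted(alliance(trade, 'isolates')))  # a dangerous gap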

This seems to suggest that the strongest stabilizing factor in a
system with entities of several different levels is the amount of
connections between them. Well-connected systems will be more stable,
and even if the most powerful entities have nothing to do with the
least powerful ones, there are many intermediate entities trading both
upwards and downwards. Breaks in the chain are risky, since the
different alliances have less to lose by attacking each other. A
continuous population with entities on all scales is thus safer than a
stratified one.


Summary

A rational ethics promotes the survival of the entity believing in
it. I have attempted to show that it is rational to cooperate under
many circumstances, both with peers and with entities of different
power levels. Since rational entities will gain more than irrational
entities through their cooperation, they will dominate the population
in the long run.

The presence of irrational entities throws some gravel into the
machinery; strategies have to be stable to occasional irrational
attacks, but rational entities will set up retaliation strategies to
deal with them.

The situation will be destabilized if there exists weaponry (or
something similar) that can wipe out entire populations and cannot be
defended against or retaliated against. In this case dispersion and
hiding seem to be the only way out, due to the risk of irrational
attacks.

A second destabilizing factor is if surprise attacks with no chance of
retaliation are possible. This both makes the irrational entities more
dangerous and makes it rational for more powerful entities to destroy
less powerful entities if they can gain anything from it.

A stabilizing force is having the entities form a single
interconnected group, with no gaps that divide the entities into two
groups with very weak connections. In this case it will be in the best
interest of everyone to get along with the others (regardless of
differences in power or basic values) and to retaliate against anybody
upsetting the peace.

These factors suggest that a rational ethics encompassing cooperation,
tolerance and niceness tempered by provocability is evolutionarily
stable and rational for entities on a variety of scales. The value
systems and individual epistemologies of the entities may be very
different, and this may add other parts to their ethical systems not
based in rational ethics (for example valuations of different actions
that have no clear implications for survival).


How does this apply to humans and the relation humans-posthumans?

At first glance it seems that humans do not behave rationally, given
the amount of violence reported in the news. But normally practically
all humans interact in a cooperative manner: we trade instead of
steal, we do not kill each other to gain advantages and we support
each other. This occurs even when there are no formal laws or very
little risk of getting caught. There is some irrational behavior such
as murder or theft, but it is very uncommon given the number of
interactions humans have each day.

Are super-powerful attacks or attacks with no retaliation possible?

On the personal level, there are relatively few accessible weapons
that can destroy a sizeable fraction of a population unless it is a
very small population. Individual irrational attacks are at present a
fairly small problem seen from a population point of view, but it is
aggravated by the difficulty of finding the perpetrators and the
possibility of surprise attacks. The problem is also growing as more
technology appears.

At present the only groups having significant weapons of mass
destruction are nation states, and while they can kill significant
numbers of people they cannot effectively kill significant numbers of
states (i.e. entities on their own level). Any nation attacking
another nation with this kind of weapons would have to deal with an
alliance of all or most of the other rational nations (at present
rationality seems to be slightly limited on the nation level, but it
exists) making it irrational to attack.

Surprise attacks with weapons of mass destruction are possible, but
most national defense infrastructures involve distribution and
hardening of vital installations, making second-strike capabilities
possible. Retaliation can thus occur, stabilizing the system.

Is the population a single interconnected alliance, or do gaps exist?

Obviously there exist large differences between social classes and
human nations in economic wealth and interaction. Economically
speaking, very few individuals in the west stand outside the
conventional market (or any of the black markets), so nearly everyone
is tied together within a society by market ties. In addition there exist
other ties and trades in form of social relations, cultural norms etc,
which suggests that in most western societies there doesn't exist any
true gaps, even if the social differences are great (note that a gap
in the sense I am discussing means two groups that have nothing in
common and no economic ties at all). In other societies such gaps may
exist, but due to their instability these societies appear to be
rather rare (occupied nations with strict segregation between
occupiers and the occupied come to mind).

Among nations, it seems that most nations are similarly tied together
by a world economy and a more tenuous world society which ties into
the internal networks of relations inside the nations. While there are
huge differences, there don't seem to be any nations that the others
would do better without (while almost everyone might dislike (say)
Iraq, nobody seriously proposes to nuke the nation out of existence;
instead efforts are directed towards forcing it to change in a
suitable direction).

Implications for human-superintelligence relations

These arguments suggest that even if superintelligent entities (SIs),
developed from posthumans or artificial intelligences, achieve
significant power, it is rational for them to cooperate with normal
humans.

It has been argued that humans have nothing SIs might want, and hence
cooperation is not rational. However, it is not clear that this is
true.

First, as the SIs are developing they will start out at or below the
human level in capability. This makes it rational for them to
cooperate at least initially. However, this will produce trading ties,
and as they develop it seems likely that the law of comparative
advantage will favor specialization rather than the ability to do
everything humans can do.

For example, an emerging SI would be better off selling new
technological designs necessary for its growth to the humans, who
would implement them and sell the finished devices back, than trying
to both develop technology and build it (which would also necessitate
acquiring resources; either through trade, which would require
something to trade with, or through force/scheming, which could lead
to retaliation). This would create strong economic ties between it and
the rest of humanity, decreasing the likelihood of defection by either
party. Specialization would be favored, creating mutual dependence.

In fact, this is also a good way for the emerging SI to protect itself
from possible human aggression (for example fear of the SI). It
creates a mutually beneficial trade, which gives the SI human allies
and the humans a benefit from the SI technology. Even if some humans
were averse to it, the humans benefiting from the cooperation would
protect it from the anti-SI group.

During the bootstrapping phase the SI and humans will specialize to
maximize their respective utilities. As long as there is anything
either party can trade, they will be motivated to do so. It has been
argued that SIs are so different that any exchange would be pointless,
but even an SI will be a physical entity and have physical needs,
meaning that there is at least some overlap between the human world
and it. For example, matter and energy will certainly be necessary for
SI function. Information in different forms might also be of value.
To make an analogy, humans could today create honey artificially
through chemical synthesis, but it is much more profitable to raise
bees (a trade between bees and bee-keepers), which also has the added
value for the buyer of being "real honey". It is not unreasonable to
think that "human made" could become valuable too (especially since
humans, by their limitations, will be scarce in an SI-dominated world).

Is it likely that gaps will appear?

There is some debate about the speed with which SI can emerge. Some
claim the development of SI is a very fast bootstrapping process that
will result in a singularity: by applying its intelligence to
improving itself the SI system can bootstrap itself to higher levels
extremely quickly, leaving everybody not involved in the dust (a
"spike" in Dan Clemmensen's terminology). This can be compared to the
classic singularity idea of Vernor Vinge, where the bootstrapping
process encompasses all of society (a "swell") instead of a very small
group.
Under what circumstances would a spike occur instead of a swell? It
seems clear spikes require that the bootstrapping process does not
need a wide variety of goods or skills that have to be acquired, and
that the process itself is a simple feedback loop, for example a
self-augmenting software design system. Both of these assumptions are
questionable.
The second assumption requires that the intelligence that is increasing
can be applied both to the problem of increasing itself and to other
applications (otherwise we would end up with a system able to improve
its ability to improve itself indefinitely, but absolutely no other
abilities). There is significant debate in psychology today about
whether general intelligence is a meaningful concept. But intelligence
isn't the only ability needed to fulfill a task; skills, knowledge and
resources are also necessary. The bootstrapping process must acquire
these in order to become an SI, which contradicts the first
assumption unless one postulates that the requirements can be met
fairly easily during the early stages and that at the late stages the
SI is so advanced that it can gain these resources without anybody or
anything stopping it.

Another open question is if the problem of self-augmentation has a
simple solution. Does there exist an algorithm that when applied to
itself in a certain environment produces another algorithm with
greater ability for algorithm improvement? And if there does, is it
given that the series of improved versions will not converge instead
of diverge, or avoid running into various constraints such as storage
space, speed or Gödel-like limitations? It appears very unlikely that
such an algorithm could find a path towards SI that efficiently avoids
all possible obstacles, especially given that it starts in a
low-intelligence, low-information, low-resource state. Evolution amply
demonstrates that emergent problems appear on all scales as a species
evolves, and have to be solved in a variety of ways which in the long
run might be a hindrance to further development (cf. the tracheae of
insects, which limit their size).
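One way to picture the convergence question is to iterate an improvement map whose gains shrink as resource or constraint ceilings approach. The logistic form and all constants below are pure assumptions for illustration, not a model of any real system:

```python
# Self-improvement with diminishing returns near a constraint ceiling.
def improve(ability, ceiling=100.0, gain=0.5):
    """Each generation redesigns itself, but improvements shrink as
    storage/speed/resource constraints (the ceiling) are approached."""
    return ability + gain * ability * (1.0 - ability / ceiling)

a = 1.0
for _ in range(100):
    a = improve(a)
print(round(a, 2))  # converges to the ceiling (100.0), not to infinity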

To sum up, the spike scenario requires a very high level of ability to
develop from a very small initial resource base with few
obstacles. This appears unlikely, and there is no evidence that such an
event can occur.

On the other hand, if the assumptions are loosened a bit the
bootstrapping process becomes significantly more likely. It might be
too optimistic to assume that the emerging SI system could (long
before its truly superintelligent stage) develop the necessary
knowledge for its further development in various aspects, but that
knowledge could be traded with other entities (human experts) having
such knowledge. In fact, the transcension scenario suggested by Dan
Clemmensen involves the collaboration between human experts and
software in developing a more powerful collaboration. What is not
obvious is the necessary size of the collaboration; as it becomes more
advanced more fields become involved (hardware design, human
psychology, financing etc) requiring more collaboration. It might turn
out that the process requires the involvement of a significant portion
of society. In this case no real gap can emerge.

Another common assumption is that a single SI will appear, which will
essentially create a monopoly on superintelligence. This is rather
unlikely, since the same technology needed to create it will also be
available to create another. In fact, it seems to be rational for the
existing SI to promote the development of more SIs, since that gives
it more partners to cooperate with and hence a larger utility increase
(assuming the SI does not suffer from strict resource limitations).

It hence seems likely that there will be no real gap between humans
and SIs; the development of SI requires human assistance in the
initial stages, and this naturally leads to a trade situation which
develops as both parties adapt. There is no obvious reason why there
would exist only humans and SIs; intermediary forms appear just as
likely and would have a further stabilizing influence.

Further work

This is just a rough sketch of rational ethics applied to posthuman
developments, but I hope it will prove a useful jumping-off point for
other investigations. Some questions that deserve study are:

        What are the limitations of the law of comparative advantage
        in a situation with vastly different entities?

        How can gaps be characterized and predicted?

        In the iterated prisoner's dilemma rational strategies do not
        always cooperate, and under some circumstances cooperating
        groups are not stable due to noise, invasion of defectors
        after having evolved to a non-retaliatory strategy or
        spontaneous fluctuations. How can rational cooperation be
        promoted, both in the game and in real interactions?

        How can rational cooperation be promoted in systems with many
        classes of entities? In this case the situation is even more
        complex.

        Are there dominating technologies that will destabilize any
        physically realizable society? Can they be rationally avoided?

Anders Sandberg                                      Towards Ascension!
asa@nada.kth.se                            http://www.nada.kth.se/~asa/
GCS/M/S/O d++ -p+ c++++ !l u+ e++ m++ s+/+ n--- h+/* f+ g+ w++ t+ r+ !y
Received on Sat May 23 19:05:39 1998
