Philosophy Dictionary of Arguments



Nick Bostrom on Values - Dictionary of Arguments

I 226
Values/superintelligence/software agents/Bostrom: While the agent is unintelligent, it might lack the capability to understand or even represent any humanly meaningful value.
Problem: It is impossible to enumerate all possible situations a superintelligence might find itself in and to specify for each what action it should take. Similarly, it is impossible to create a list of all possible worlds and assign each of them a value.
Motivation: A motivation system, therefore, cannot be specified as a comprehensive lookup table. It must instead be expressed more abstractly, as a formula or rule that allows the agent to decide what to do in any given situation. ((s) Cf. the philosophical discussion of principles against content: >Principles, >Utilitarianism, >Deontology.)
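The contrast between a lookup table and an abstractly expressed rule can be sketched in toy Python (all situations, actions, and function names here are hypothetical illustrations, not anything from Bostrom's text):

```python
# A lookup table must enumerate every situation in advance -- infeasible
# for the open-ended range of situations a superintelligence would face.
lookup_table = {
    "situation_a": "action_1",
    "situation_b": "action_2",
    # ...cannot be completed for all possible situations
}

# A rule -- here, "choose the action your value function scores highest" --
# covers situations the programmer never explicitly anticipated.
def choose_action(situation, available_actions, value_of):
    """Pick the action whose predicted outcome scores highest."""
    return max(available_actions, key=lambda a: value_of(situation, a))
```

The rule form decides in novel situations because the hard content has moved into the value function, which is exactly where Bostrom locates the difficulty.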
I 227
Utility: Creating a machine that can compute a good approximation of the expected utility of the actions available to it is an AI-complete problem (…) a problem that remains even if the problem of making machines intelligent is solved. We can use this framework of a utility-maximizing agent to consider the predicament of a future seed-AI programmer who intends to solve the control problem by endowing the AI with a final goal that corresponds to some plausible human notion of a worthwhile outcome.
E.g., The programmer has some particular human value in mind that he would like the AI to promote. (…) let us say that it is happiness. But how could he express such a utility function in computer code? Computer languages do not contain terms such as “happiness” as primitives.
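The asymmetry Bostrom describes can be made vivid in a toy sketch: the expected-utility machinery is trivial to write down, while the utility function itself cannot be written at all. (All names below are hypothetical illustrations.)

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(options):
    """options: dict mapping an action to its (probability, utility) pairs."""
    return max(options, key=lambda a: expected_utility(options[a]))

def happiness(world_state):
    """The programmer's intended utility term. No programming language
    offers 'happiness' as a primitive; it would have to be built from
    lower-level terms -- which is exactly the unsolved part."""
    raise NotImplementedError("'happiness' is not a language primitive")
```

Given numeric probabilities and utilities the maximizer runs fine; the point is that nothing supplies those numbers for a value like happiness.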
I 228
If we cannot transfer human values into an AI by typing out full-blown representations in computer code, what else might we try?
I 230
[Possible methods for acquiring values]:
- Reinforcement learning: Often, the learning algorithm involves the gradual construction of some kind of evaluation function, which assigns values to states, state–action pairs, or policies.
Problem: The evaluation function, which is continuously updated in light of experience, could be regarded as incorporating a form of learning about value. However, what is being learned is not new final values but increasingly accurate estimates of the instrumental values of reaching particular states (or of taking particular actions in particular states, or of following particular policies). Insofar as a reinforcement-learning agent can be described as having a final goal, that goal remains constant: to maximize future reward. And reward consists of specially designated percepts received from the environment. Therefore, the wireheading syndrome remains a likely outcome in any reinforcement-learning agent that develops a world model sophisticated enough to suggest this alternative way of maximizing reward.
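The distinction between a changing evaluation function and a fixed final goal can be seen in a minimal tabular TD(0) sketch (a standard reinforcement-learning update, used here as a toy illustration; the experience stream is invented):

```python
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference update of the state-value estimates V.
    V changes with experience, but it only estimates instrumental value:
    how much *reward* each state leads to. The terminal criterion --
    maximize received reward -- is hard-coded and never updated."""
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])
    return V

V = defaultdict(float)
# Toy experience: state "b" yields reward, state "a" does not.
for s, r, s2 in [("a", 0.0, "b"), ("b", 1.0, "a"), ("a", 0.0, "b")]:
    td0_update(V, s, r, s2)
# An agent with a rich enough world model could notice that seizing the
# reward channel itself (wireheading) maximizes this same fixed objective.
```

After a few updates, the reward-yielding state scores higher; nothing in the loop ever revises what counts as reward.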
I 233
- Motivational scaffolding: It involves giving the seed AI an interim goal system, with relatively simple final goals that we can represent by means of explicit coding or some other feasible method. Once the AI has developed more sophisticated representational faculties, we replace this interim scaffold goal system with one that has different final goals.
Problem: Because the scaffold goals are not just instrumental but final goals for the AI, the AI might be expected to resist having them replaced (goal-content integrity being a convergent instrumental value). This creates a hazard. If the AI succeeds in thwarting the replacement of its scaffold goals, the method fails.
I 234
Further problems: (1) The motivational scaffolding (…) carries the risk that the AI could become too powerful while it is still running on its interim goal system.
(2) Installing the ultimately intended goals in a human-level AI is not necessarily that much easier than doing so in a more primitive AI.
I 235
- Value learning: [in order to use the] AI’s intelligence to learn the values (…) we must provide a criterion for the AI that at least implicitly picks out some suitable set of values. (…) the value learning approach retains an unchanging final goal throughout the AI’s developmental and operational phases. Learning does not change the goal. It changes only the AI’s beliefs about the goal.
Criteria: The AI thus must be endowed with a criterion that it can use to determine which percepts constitute evidence in favor of some hypothesis about what the ultimate goal is, and which percepts constitute evidence against.
Problem: creating artificial general intelligence in the first place, which requires a powerful learning mechanism that can discover the structure of the environment from limited sensory inputs.
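The value-learning structure — a fixed final goal plus changing beliefs about what that goal picks out — can be sketched as a Bayesian update over goal hypotheses (the hypotheses and likelihoods below are invented toy values, not from Bostrom):

```python
def update_beliefs(prior, likelihoods):
    """Bayesian update: both arguments map a goal hypothesis to a
    probability. Percepts shift belief about which hypothesis is the
    ultimate goal; the goal itself ('pursue whatever the criterion
    favors') stays fixed."""
    posterior = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

# Two toy hypotheses about what the ultimate goal is.
prior = {"U1_happiness": 0.5, "U2_preference_satisfaction": 0.5}
# A percept counts as evidence for or against each hypothesis
# via the criterion's likelihoods.
posterior = update_beliefs(prior, {"U1_happiness": 0.2,
                                   "U2_preference_satisfaction": 0.8})
```

This mirrors the text's criterion requirement: the agent needs a rule saying which percepts count as evidence for which hypothesis; the update then changes only beliefs, never the goal.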
I 240
Understanding/motivation: (…) the difficulty here is not so much how to ensure that the AI can understand human intentions. A superintelligence should easily develop such understanding. Rather, the difficulty is ensuring that the AI will be motivated to pursue the described values in the way we intended.
This is not guaranteed by the AI’s ability to understand our intentions: an AI could know exactly what we meant and yet be indifferent to that interpretation of our words (being motivated instead by some other interpretation of the words or being indifferent to our words altogether).
Solution: the correct motivation should ideally be installed in the seed AI before it becomes capable of fully representing human concepts or understanding human intentions.
I 253
[Further] value-loading techniques:
- Evolutionary selection: Powerful search may find a design that satisfies the formal search criteria but not our intentions.
- Value accretion: (…) the human value-accretion dispositions might be complex and difficult to replicate in a seed AI.
Problem: A bad approximation may yield an AI that generalizes differently than humans do and therefore acquires unintended final goals.
I 254
- Motivational scaffolding: encourage a system to develop internal high-level representations that are transparent to humans (while keeping the system’s capabilities below the dangerous level) and then to use those representations to design a new goal system.
- Emulation modulation: If machine intelligence is achieved via the emulation pathway, it would likely be possible to tweak motivations through the digital equivalent of drugs or by other means.
- Institution design: Various strong methods of social control could be applied in an institution composed of emulations. In principle, social control methods could also be applied in an institution composed of artificial intelligences. >Ethics/superintelligence/Bostrom, >Ethics/superintelligence/Yudkowsky, >Norms/Bostrom.

Explanation of symbols: Roman numerals indicate the source, arabic numerals indicate the page number. The corresponding books are indicated on the right hand side. ((s)…): Comment by the sender of the contribution. Translations: Dictionary of Arguments
The note [Author1]Vs[Author2] or [Author]Vs[term] is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.

Bostrom I
Nick Bostrom
Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press 2017


Ed. Martin Schulz, access date 2021-06-18