AI Research on Goals - Dictionary of Arguments
Bostrom I 126
Goals/superintelligence/AI Research/Bostrom: Is it possible to say anything about what a superintelligence with a decisive strategic advantage would want?
Motivation/intelligence/superintelligent will/orthogonality/Bostrom: Intelligent search for instrumentally optimal plans and policies can be performed in the service of any goal. Intelligence and motivation are in a sense orthogonal: we can think of them as two axes spanning a graph in which each point represents a logically possible artificial agent. Some qualifications could be added to this picture. For instance, it might be impossible for a very unintelligent system to have very complex motivations.
Def Orthogonality thesis/Bostrom: Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.
According to the orthogonality thesis, artificial agents can have utterly non-anthropomorphic goals.
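The two-axis picture can be illustrated with a toy sketch. It is not taken from Bostrom: the class `Agent`, the field `search_depth` (a stand-in for intelligence) and the field `utility` (a stand-in for the final goal) are hypothetical names chosen for illustration. The point is only that the same search procedure serves whichever goal is plugged in.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Tuple

# Toy world (an assumption): the state is an integer, actions add -1, 0 or +1.
ACTIONS = (-1, 0, 1)


@dataclass(frozen=True)
class Agent:
    search_depth: int                # hypothetical stand-in for "intelligence"
    utility: Callable[[int], float]  # hypothetical stand-in for the "final goal"

    def plan(self, state: int) -> Tuple[int, ...]:
        """Search all action sequences of length search_depth and return
        the one whose end state scores highest under this agent's utility."""
        return max(
            product(ACTIONS, repeat=self.search_depth),
            key=lambda seq: self.utility(state + sum(seq)),
        )


# Two unrelated final goals.
def maximize(s: int) -> float:
    return float(s)          # wants the number as large as possible


def reach_minus_two(s: int) -> float:
    return -abs(s + 2)       # wants the number to equal -2


# Any depth can be paired with any goal: four points on the two-axis graph.
agents = [Agent(d, u) for d in (1, 3) for u in (maximize, reach_minus_two)]
```

Changing `search_depth` changes how well an agent can serve its goal; changing `utility` changes what it serves. Neither setting constrains the other, which is all the sketch is meant to show.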
-Predictability through design: (…) even before an agent has been created we might be able to predict something about its behavior, if we know something about who will build it and what goals they will want it to have.
-Predictability through inheritance. If a digital intelligence is created directly from a human template (as would be the case in a high-fidelity whole brain emulation), then the digital intelligence might inherit the motivations of the human template.
-Predictability through convergent instrumental reasons: (…) we may be able to infer something about its more immediate objectives by considering the instrumental reasons that would arise for any of a wide range of possible final goals in a wide range of situations.
Def Instrumental convergence thesis/Bostrom: Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents. >Goals/Omohundro.
Where there are convergent instrumental values, we may be able to predict some aspects of a superintelligence’s behavior:
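-Self-preservation: Most humans seem to place some final value on their own survival. This is not a necessary feature of artificial agents: some may be designed to place no final value whatever on their own survival. Nevertheless, an agent that does not value its survival for its own sake may still value it instrumentally, since it has to be around in the future in order to pursue its final goals.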
-Goal-content integrity: If an agent retains its present goals into the future, then its present goals will be more likely to be achieved by its future self. This gives the agent a present instrumental reason to prevent alterations of its final goals. For software agents, which can easily switch bodies or create exact duplicates of themselves, preservation of self as a particular implementation or a particular physical object need not be an important instrumental value. Advanced software agents might also be able to swap memories, download skills, and radically modify their cognitive architecture and personalities.
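The argument for goal-content integrity can be made concrete with a small simulation. This is a minimal sketch under strong assumptions that are not in the source: a fixed set of ten outcomes, a future self that can freely pick any outcome, and no other costs or benefits of changing goals. All function names are hypothetical.

```python
import random

OUTCOMES = range(10)  # toy set of possible world outcomes (an assumption)


def random_goal(rng):
    """A random final goal: an arbitrary score for every outcome."""
    scores = [rng.random() for _ in OUTCOMES]
    return lambda outcome: scores[outcome]


def future_choice(goal):
    """The future self picks whichever outcome its own goal ranks highest."""
    return max(OUTCOMES, key=goal)


def prefers_goal_preservation(present_goal, altered_goal):
    """Judge both futures by the agent's *present* goal."""
    kept = present_goal(future_choice(present_goal))
    altered = present_goal(future_choice(altered_goal))
    return kept >= altered


rng = random.Random(0)
trials = [
    prefers_goal_preservation(random_goal(rng), random_goal(rng))
    for _ in range(1000)
]
# Fraction of goal pairs for which keeping the present goal is at least as good.
share = sum(trials) / len(trials)
```

Whatever the present goal happens to be, the future in which it is retained never scores worse by its own standard than the future in which it has been replaced. In this toy setting that holds for every sampled goal, which is the sense in which the value is convergent.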
Orthogonality thesis/Bostrom: (see above) the orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans (…).
Goals/ethics/morality/superintelligence/Bostrom: Consider, for example, the following “reasons-based” goal:
Do whatever we would have had most reason to ask the AI to do.
((s)VsBostrom: Here it is assumed that the AI has no reason to falsify our intentions.)
Bostrom: Components of the design choices that determine the AI's behavior:
-Goal content: What objective should the AI pursue? How should a description of this objective be interpreted?
-Decision theory: Should the AI use causal decision theory, evidential decision theory, updateless decision theory, or something else?
-Epistemology: What should the AI’s prior probability function be (…). What theory of anthropics should it use?
-Ratification: Should the AI’s plans be subjected to human review before being put into effect? If so, what is the protocol for that review process?
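The four components can be pictured as one configuration record plus a review gate. The sketch below is illustrative only: `DecisionTheory`, `DesignChoices`, `carry_out` and the `reviewer` callback are hypothetical names, and the string test that stands in for a human reviewer is an arbitrary placeholder, not a proposed protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class DecisionTheory(Enum):
    CAUSAL = "causal"
    EVIDENTIAL = "evidential"
    UPDATELESS = "updateless"
    OTHER = "other"


@dataclass
class DesignChoices:
    goal_content: str                 # description of the objective
    decision_theory: DecisionTheory
    prior: str                        # label for the prior probability function
    anthropics: str                   # label for the theory of anthropics
    # None means that no ratification step is configured.
    reviewer: Optional[Callable[[str], bool]] = None


def carry_out(plan: str, choices: DesignChoices) -> bool:
    """Return True if the plan may be put into effect.
    If a reviewer is configured, the plan is subjected to review first."""
    if choices.reviewer is None:
        return True
    return choices.reviewer(plan)


choices = DesignChoices(
    goal_content="Do whatever we would have had most reason to ask the AI to do.",
    decision_theory=DecisionTheory.UPDATELESS,
    prior="unspecified",
    anthropics="unspecified",
    # Placeholder for a human reviewer (an assumption for illustration).
    reviewer=lambda plan: "irreversible" not in plan,
)
```

Each field corresponds to one question in the list; the open questions themselves (which decision theory, which prior, which review protocol) are left as parameters rather than answered.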
>Ethics/superintelligence/Bostrom, >Ethics/superintelligence/Yudkowsky, >Norms/Bostrom.
_____________
Explanation of symbols: Roman numerals indicate the source, Arabic numerals indicate the page number. The corresponding books are listed below. ((s)…): Comment by the sender of the contribution. Translations: Dictionary of Arguments. The note [Concept/Author], [Author1]Vs[Author2] or [Author]Vs[term] resp. "problem:"/"solution:", "old:"/"new:" and "thesis:" is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.
Bostrom I: Nick Bostrom, Superintelligence. Paths, Dangers, Strategies. Oxford: Oxford University Press 2017