Peter Norvig on Prior Knowledge - Dictionary of Arguments

Author	Concept	Summary/Quotes	Sources
Philosophy Dictionary of Arguments Home


Prior knowledge: Prior knowledge is the information and skills that a learner already has before learning about a new topic. See also Knowledge, Learning, Understanding, Artificial Intelligence. _____________ Annotation: The above characterizations of concepts are neither definitions nor exhausting presentations of problems related to them. Instead, they are intended to give a short introduction to the contributions below. – Lexicon of Arguments.

> Norvig, Peter	> Prior Knowledge	Peter Norvig on Prior Knowledge - Dictionary of Arguments Norvig I 777 Prior knowledge/AI Research/Norvig/Russell: To understand the role of prior knowledge, we need to talk about the logical relationships among hypotheses, example descriptions, and classifications. Let Descriptions denote the conjunction of all the example descriptions in the training set, and let Classifications denote the conjunction of all the example classifications. Then a Hypothesis that “explains the observations” must satisfy the following property (recall that \|= means “logically entails”): Hypothesis ∧ Descriptions \|= Classifications. Entailment constraint: We call this kind of relationship an entailment constraint, in which Hypothesis is the “un-known.” Pure inductive learning means solving this constraint, where Hypothesis is drawn from some predefined hypothesis space. >Hypotheses/AI Research. Software agents/knowledge/learning/Norvig: The modern approach is to design agents that already know something and are trying to learn some more. An autonomous learning agent that uses background knowledge must somehow obtain the background knowledge in the first place (…). This method must itself be a learning process. The agent’s life history will therefore be characterized by cumulative, or incremental, development. Norvig I 778 Learning with background knowledge: allows much faster learning than one might expect from a pure induction program. Explanation based learning/EBL: the entailment constraints satisfied by EBL are the following: Hypothesis ∧ Descriptions \|= Classifications Background \|= Hypothesis. Norvig I 779 (…) it was initially thought to be a way to learn from examples. But because it requires that the background knowledge be sufficient to explain the hypothesis, which in turn explains the observations, the agent does not actually learn anything factually new from the example. The agent could have derived the example from what it already knew, although that might have required an unreasonable amount of computation. EBL is now viewed as a method for converting first-principles theories into useful, special purpose knowledge. Relevance/observations/RBL: the prior knowledge background concerns the relevance of a set of features to the goal predicate. This knowledge, together with the observations, allows the agent to infer a new, general rule that explains the observations: Hypothesis ∧ Descriptions \|= Classifications , Background ∧ Descriptions ∧ Classifications \|= Hypothesis. We call this kind of generalization relevance-based learning, or RBL. (…) whereas RBL does make use of the content of the observations, it does not produce hypotheses that go beyond the logical content of the background knowledge and the observations. It is a deductive form of learning and cannot by itself account for the creation of new knowledge starting from scratch. Entailment constraint: Background ∧ Hypothesis ∧ Descriptions \|= Classifications. That is, the background knowledge and the new hypothesis combine to explain the examples. Knowledge-based inductive learning/KBIL algorithms: Algorithms that satisfy [the entailment] constraint are called knowledge-based inductive learning, or KBIL, algorithms. KBIL algorithms, (…) have been studied mainly in the field of inductive logic programming, or ILP. Norvig I 780 Explanation-based learning: The basic idea of memo functions is to accumulate a database of input–output pairs; when the function is called, it first checks the database to see whether it can avoid solving the problem from scratch. Explanation-based learning takes this a good deal further, by creating general rules that cover an entire class of cases. Norvig I 781 General rules: The basic idea behind EBL is first to construct an explanation of the observation using prior knowledge, and then to establish a definition of the class of cases for which the same explanation structure can be used. This definition provides the basis for a rule covering all of the cases in the class. Explanation: The “explanation” can be a logical proof, but more generally it can be any reasoning or problem-solving process whose steps are well defined. The key is to be able to identify the necessary conditions for those same steps to apply to another case. Norvig I 782 EBL: 1. Given an example, construct a proof that the goal predicate applies to the example using the available background knowledge. Norvig I 783 2. In parallel, construct a generalized proof tree for the variabilized goal using the same inference steps as in the original proof. 3. Construct a new rule whose left-hand side consists of the leaves of the proof tree and whose right-hand side is the variabilized goal (after applying the necessary bindings from the generalized proof). 4. Drop any conditions from the left-hand side that are true regardless of the values of the variables in the goal. Norvig I 794 Inverse resolution: Inverse resolution is based on the observation that if the example Classifications follow from Background ∧ Hypothesis ∧ Descriptions, then one must be able to prove this fact by resolution (because resolution is complete). If we can “run the proof backward,” then we can find a Hypothesis such that the proof goes through. Norvig I 795 Inverse entailment: The idea is to change the entailment constraint Background ∧ Hypothesis ∧ Descriptions \|= Classifications to the logically equivalent form Background ∧ Descriptions ∧ ¬Classifications \|= ¬Hypothesis. Norvig I 796 An inverse resolution procedure that inverts a complete resolution strategy is, in principle, a complete algorithm for learning first-order theories. That is, if some unknown Hypothesis generates a set of examples, then an inverse resolution procedure can generate Hypothesis from the examples. This observation suggests an interesting possibility: Suppose that the available examples include a variety of trajectories of falling bodies. Would an inverse resolution program be theoretically capable of inferring the law of gravity? The answer is clearly yes, because the law of gravity allows one to explain the examples, given suitable background mathematics. Norvig I 798 Literature: The current-best-hypothesis approach is an old idea in philosophy (Mill, 1843)(1). Early work in cognitive psychology also suggested that it is a natural form of concept learning in humans (Bruner et al., 1957)(2). In AI, the approach is most closely associated with the work of Patrick Winston, whose Ph.D. thesis (Winston, 1970)(3) addressed the problem of learning descriptions of complex objects. Version space: The version space method (Mitchell, 1977(4), 1982(5)) takes a different approach, maintaining the set of all consistent hypotheses and eliminating thosefound to be inconsistent with new examples. The approach was used in the Meta-DENDRAL Norvig I 799 expert system for chemistry (Buchanan and Mitchell, 1978)(6), and later in Mitchell’s (1983)(7) LEX system, which learns to solve calculus problems. A third influential thread was formed by the work of Michalski and colleagues on the AQ series of algorithms, which learned sets of logical rules (Michalski, 1969(8); Michalski et al., 1986(9)). EBL: EBL had its roots in the techniques used by the STRIPS planner (Fikes et al., 1972)(10). When a plan was constructed, a generalized version of it was saved in a plan library and used in later planning as a macro-operator. Similar ideas appeared in Anderson’s ACT* architecture, under the heading of knowledge compilation (Anderson, 1983)(11), and in the SOAR architecture, as chunking (Laird et al., 1986)(12). Schema acquisition (DeJong, 1981)(13), analytical generalization (Mitchell, 1982)(5), and constraint-based generalization (Minton, 1984)(14) were immediate precursors of the rapid growth of interest in EBL stimulated by the papers of Mitchell et al. (1986)(15) and DeJong and Mooney (1986)(16). Hirsh (1987) introduced the EBL algorithm described in the text, showing how it could be incorporated directly into a logic programming system. Van Harmelen and Bundy (1988)(18) explain EBL as a variant of the partial evaluation method used in program analysis systems (Jones et al., 1993)(19). VsEBL: Initial enthusiasm for EBL was tempered by Minton’s finding (1988)(20) that, without extensive extra work, EBL could easily slow down a program significantly. Formal probabilistic analysis of the expected payoff of EBL can be found in Greiner (1989)(21) and Subramanian and Feldman (1990)(22). An excellent survey of early work on EBL appears in Dietterich (1990)(23). Relevance: Relevance information in the form of functional dependencies was first developed in the database community, where it is used to structure large sets of attributes into manageable subsets. Functional dependencies were used for analogical reasoning by Carbonell and Collins (1973)(24) and rediscovered and given a full logical analysis by Davies and Russell (Davies, 1985(25); Davies and Russell, 1987(26)). Prior knowledge: Their role as prior knowledge in inductive learning was explored by Russell and Grosof (1987)(27). The equivalence of determinations to a restricted-vocabulary hypothesis space was proved in Russell (1988)(28). Learning: Learning algorithms for determinations and the improved performance obtained by RBDTL were first shown in the FOCUS algorithm, due to Almuallim and Dietterich (1991)(29). Tadepalli (1993)(30) describes a very ingenious algorithm for learning with determinations that shows large improvements in earning speed. Inverse deduction: The idea that inductive learning can be performed by inverse deduction can be traced to W. S. Jevons (1874)(31) (…). Computational investigations began with the remarkable Ph.D. thesis by Norvig I 800 Gordon Plotkin (1971)(32) at Edinburgh. Although Plotkin developed many of the theorems and methods that are in current use in ILP, he was discouraged by some undecidability results for certain subproblems in induction. MIS (Shapiro, 1981)(33) reintroduced the problem of learning logic programs, but was seen mainly as a contribution to the theory of automated debugging. Induction/rules: Work on rule induction, such as the ID3 (Quinlan, 1986)(34) and CN2 (Clark and Niblett, 1989)(35) systems, led to FOIL (Quinlan, 1990)(36), which for the first time allowed practical induction of relational rules. Relational Learning: The field of relational learning was reinvigorated by Muggleton and Buntine (1988)(37), whose CIGOL program incorporated a slightly incomplete version of inverse resolution and was capable of generating new predicates. The inverse resolution method also appears in (Russell, 1986)(38), with a simple algorithm given in a footnote. The next major system was GOLEM (Muggleton and Feng, 1990)(39), which uses a covering algorithm based on Plotkin’s concept of relative least general generalization. ITOU (Rouveirol and Puget, 1989)(40) and CLINT (De Raedt, 1992)(41) were other systems of that era. Natural language: More recently, PROGOL (Muggleton, 1995)(42) has taken a hybrid (top-down and bottom-up) approach to inverse entailment and has been applied to a number of practical problems, particularly in biology and natural language processing. Uncertainty: Muggleton (2000)(43) describes an extension of PROGOL to handle uncertainty in the form of stochastic logic programs. Inductive logic programming /ILP: A formal analysis of ILP methods appears in Muggleton (1991)(44), a large collection of papers in Muggleton (1992)(45), and a collection of techniques and applications in the book by Lavrauc and Duzeroski (1994)(46). Page and Srinivasan (2002)(47) give a more recent overview of the field’s history and challenges for the future. Early complexity results by Haussler (1989) suggested that learning first-order sentences was intractible. However, with better understanding of the importance of syntactic restrictions on clauses, positive results have been obtained even for clauses with recursion (Duzeroski et al., 1992)(48). Learnability results for ILP are surveyed by Kietz and Duzeroski (1994)(49) and Cohen and Page (1995)(50). Discovery systems/VsILP: Although ILP now seems to be the dominant approach to constructive induction, it has not been the only approach taken. So-called discovery systems aim to model the process of scientific discovery of new concepts, usually by a direct search in the space of concept definitions. Doug Lenat’s Automated Mathematician, or AM (Davis and Lenat, 1982)(51), used discovery heuristics expressed as expert system rules to guide its search for concepts and conjectures in elementary number theory. Unlike most systems designed for mathematical reasoning, AM lacked a concept of proof and could only make conjectures. It rediscovered Goldbach’s conjecture and the Unique Prime Factorization theorem. AM’s architecture was generalized in the EURISKO system (Lenat, 1983)(52) by adding a mechanism capable of rewriting the system’s own discovery heuristics. EURISKO was applied in a number of areas other than mathematical discovery, although with less success than AM. The methodology of AM and EURISKO has been controversial (Ritchie and Hanna, 1984; Lenat and Brown, 1984). 1. Mill, J. S. (1843). A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence, and Methods of Scientific Investigation. J.W. Parker, London. 2. Bruner, J. S., Goodnow, J. J., and Austin, G. A. (1957). A Study of Thinking. Wiley. 3. Winston, P. H. (1970). Learning structural descriptions from examples. Technical report MAC-TR-76, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. 4. Mitchell, T.M. (1977). Version spaces: A candidate elimination approach to rule learning. In IJCAI-77, pp. 305–310. 5. Mitchell, T. M. (1982). Generalization as search. AIJ, 18(2), 203–226. 6. Buchanan, B. G.,Mitchell, T.M., Smith, R. G., and Johnson, C. R. (1978). Models of learning systems. In Encyclopedia of Computer Science and Technology, Vol. 11. Dekker. 7. Mitchell, T. M., Utgoff, P. E., and Banerji, R. (1983). Learning by experimentation: Acquiring and refining problem-solving heuristics. In Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (Eds.), Machine Learning: An Artificial Intelligence Approach, pp. 163–190. Morgan Kaufmann. 8. Michalski, R. S. (1969). On the quasi-minimal solution of the general covering problem. In Proc. First International Symposium on Information Processing, pp. 125–128. 9. Michalski, R. S.,Mozetic, I., Hong, J., and Lavrauc, N. (1986). The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In AAAI-86, pp. 1041–1045. 10. Fikes, R. E., Hart, P. E., and Nilsson, N. J. (1972). Learning and executing generalized robot plans. AIJ, 3(4), 251–288. 11. Anderson, J. R. (1983). The Architecture of Cognition. Harvard University Press. 12. Laird, J., Rosenbloom, P. S., and Newell, A. (1986). Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1, 11–46. 13. DeJong, G. (1981). Generalizations based on explanations. In IJCAI-81, pp. 67–69. 14. Minton, S. (1984). Constraint-based generalization: Learning game-playing plans from single examples. In AAAI-84, pp. 251–254. 15. Mitchell, T. M., Keller, R., and Kedar-Cabelli, S. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1, 47–80. 16. DeJong, G. and Mooney, R. (1986). Explanation-based learning: An alternative view. Machine Learning, 1, 145–176. 17. Hirsh, H. (1987). Explanation-based generalization in a logic programming environment. In IJCAI-87. 18. van Harmelen, F. and Bundy, A. (1988). Explanation-based generalisation = partial evaluation. AIJ, 36(3), 401–412. 19. Jones, N. D., Gomard, C. K., and Sestoft, P. (1993). Partial Evaluation and Automatic Program Generation. Prentice-Hall. 20. Minton, S. (1988). Quantitative results concerning the utility of explanation-based learning. In AAAI-88, pp. 564–569. 21. Greiner, R. (1989). Towards a formal analysis of EBL. In ICML-89, pp. 450–453. 22. Subramanian, D. and Feldman, R. (1990). The utility of EBL in recursive domain theories. In AAAI-90, Vol. 2, pp. 942–949. 23. Dietterich, T. (1990). Machine learning. Annual Review of Computer Science, 4, 255–306. 24. Carbonell, J. R. and Collins, A. M. (1973). Natural semantics in artificial intelligence. In IJCAI-73, pp. 344–351. 25. Davies, T. R. (1985). Analogy. Informal note INCSLI- 85-4, Center for the Study of Language and Information (CSLI). 26. Davies, T. R. and Russell, S. J. (1987). A logical approach to reasoning by analogy. In IJCAI-87, Vol. 1, pp. 264–270. 27. Russell, S. J. and Grosof, B. (1987). A declarative approach to bias in concept learning. In AAAI-87. 28. Russell, S. J. (1988). Tree-structured bias. In AAAI-88, Vol. 2, pp. 641–645. 29. Almuallim, H. and Dietterich, T. (1991). Learning with many irrelevant features. In AAAI-91, Vol. 2, pp. 547–552. 30. Tadepalli, P. (1993). Learning from queries and examples with tree-structured bias. In ICML-93, pp. 322–329. 31. Jevons, W. S. (1874). The Principles of Science. Routledge/Thoemmes Press, London. 32. Plotkin, G. (1971). Automatic Methods of Inductive Inference. Ph.D. thesis, Edinburgh University. 33. Shapiro, E. (1981). An algorithm that infers theories from facts. In IJCAI-81, p. 1064. 34. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106. 35. Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3, 261–283. 36. Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5(3), 239–266. 37. Muggleton, S. H. and Buntine, W. (1988). Machine invention of first-order predicates by inverting resolution. In ICML-88, pp. 339–352. 38. Russell, S. J. (1986). A quantitative analysis of analogy by similarity. In AAAI-86, pp. 284–288. 39. Muggleton, S. H. and Feng, C. (1990). Efficient induction of logic programs. In Proc. Workshop on Algorithmic Learning Theory, pp. 368–381. 40. Rouveirol, C. and Puget, J.-F. (1989). A simple and general solution for inverting resolution. In Proc. European Working Session on Learning, pp. 201–210. 41. De Raedt, L. (1992). Interactive Theory Revision: An Inductive Logic Programming Approach. Academic Press. 42. Muggleton, S. H. (1995). Inverse entailment and Progol. New Generation Computing, 13(3-4), 245- 286. 43. Muggleton, S. H. (2000). Learning stochastic logic programs. Proc. AAAI 2000 Workshop on Learning Statistical Models from Relational Data. 44. Muggleton, S. H. (1991). Inductive logic programming. New Generation Computing, 8, 295–318. 45. Muggleton, S. H. (1992). Inductive Logic Programming. Academic Press. 46. Lavrauc, N. and Duzeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. Ellis Horwood 47. Page, C. D. and Srinivasan, A. (2002). ILP: A short look back and a longer look forward. Submitted to Journal of Machine Learning Research. 48. Duzeroski, S., Muggleton, S. H., and Russell, S. J. (1992). PAC-learnability of determinate logic programs. In COLT-92, pp. 128–135. 49. Kietz, J.-U. and Duzeroski, S. (1994). Inductive logic programming and learnability. SIGART Bulletin, 5(1), 22–32. 50. Cohen, W. W. and Page, C. D. (1995). Learnability in inductive logic programming: Methods and results. New Generation Computing, 13(3–4), 369-409. 51. Davis, R. and Lenat, D. B. (1982). Knowledge-Based Systems in Artificial Intelligence. McGraw- Hill. 52. Lenat, D. B. (1983). EURISKO: A program that learns new heuristics and domain concepts: The nature of heuristics, III: Program design and results. AIJ, 21(1–2), 61–98. 53. Ritchie, G. D. and Hanna, F. K. (1984). AM: A case study in AI methodology. AIJ, 23(3), 249–268. 54. Lenat, D. B. and Brown, J. S. (1984). Why AM and EURISKO appear to work. AIJ, 23(3), 269–294. _____________ Explanation of symbols: Roman numerals indicate the source, arabic numerals indicate the page number. The corresponding books are indicated on the right hand side. ((s)…): Comment by the sender of the contribution. Translations: Dictionary of Arguments The note [Concept/Author], [Author1]Vs[Author2] or [Author]Vs[term] resp. "problem:"/"solution:", "old:"/"new:" and "thesis:" is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.	Norvig I Peter Norvig Stuart J. Russell Artificial Intelligence: A Modern Approach Upper Saddle River, NJ 2010

Send Link

> Counter arguments against Norvig

> Counter arguments in relation to Prior Knowledge

Authors A B C D E F G H I J K L M N O P Q R S T U V W Y Z

Concepts A B C D E F G H I J K L M N O P Q R S T U V W Z

Ed. Martin Schulz, access date 2024-04-26

Legal Notice Contact Data protection declaration