Human-like systematic generalization through a meta-learning neural network
The power of human language and thought arises from systematic compositionality: the algebraic ability to understand and produce novel combinations from known components. Fodor and Pylyshyn famously argued that artificial neural networks lack this capacity and are therefore not viable models of the mind. Neural networks have advanced considerably in the years since, yet the systematicity challenge persists. Here we successfully address Fodor and Pylyshyn's challenge by providing evidence that neural networks can achieve human-like systematicity when optimized for their compositional skills. To do so, we introduce the meta-learning for compositionality (MLC) approach for guiding training through a dynamic stream of compositional tasks. To compare humans and machines, we conducted human behavioural experiments using an instruction-learning paradigm. After considering seven different models, we found that, in contrast to perfectly systematic but rigid probabilistic symbolic models, and perfectly flexible but unsystematic neural networks, only MLC achieves both the systematicity and flexibility needed for human-like generalization. MLC also advances the compositional skills of machine learning systems in several systematic generalization benchmarks. Our results show how a standard neural network architecture, optimized for its compositional skills, can mimic human systematic generalization in a head-to-head comparison.
People are adept at learning new concepts and systematically combining them with existing concepts. For example, once a child learns how to 'skip', they can understand how to 'skip backwards' or 'skip around a cone twice' due to their compositional skills. Fodor and Pylyshyn argued that neural networks lack this type of systematicity and are therefore not plausible cognitive models, leading to a vigorous debate that spans thirty-five years. Counterarguments to Fodor and Pylyshyn have focused on two main points. The first is that human compositional skills, although important, may not be as systematic and rule-like as Fodor and Pylyshyn indicated. The second is that neural networks, although limited in their most basic forms, can be more systematic when using sophisticated architectures. In recent years, neural networks have advanced considerably and led to a number of breakthroughs, including in natural language processing. In light of these advances, we and other researchers have reformulated classic tests of systematicity and reevaluated Fodor and Pylyshyn's arguments. Notably, modern neural networks still struggle on tests of systematicity, tests that even a minimally algebraic mind should pass. As the technology marches on, the systematicity debate continues.
In this Article, we provide evidence that neural networks can achieve human-like systematic generalization through MLC, an optimization procedure that we introduce for encouraging systematicity through a series of few-shot compositional tasks. Our implementation of MLC uses only common neural networks without added symbolic machinery, and without hand-designed internal representations or inductive biases. Instead, MLC provides a means of specifying the desired behaviour through high-level guidance and/or direct human examples; a neural network is then asked to develop the right learning skills through meta-learning.
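As a rough illustration of what "a series of few-shot compositional tasks" could look like in code, the Python sketch below samples one episode at a time. It is a hypothetical simplification, not the authors' implementation: the pseudowords, the output symbols, the function word `twice`, the delimiter tokens `|` and `->`, and the helper names `sample_episode` and `to_source` are all assumptions made for illustration. Each episode re-draws which word means which symbol, so meanings cannot be memorized across episodes and must be inferred from the study examples. The query and study examples are then flattened into one source sequence, the kind of input a standard sequence-to-sequence network could be trained on.

```python
import random
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]  # (input words, output symbols)

# Hypothetical vocabulary: pseudowords and abstract output symbols (assumptions).
WORDS = ["dax", "wif", "lug", "zup"]
SYMBOLS = ["RED", "GREEN", "BLUE", "YELLOW", "PURPLE", "PINK"]


def sample_episode(rng: random.Random) -> Tuple[List[Pair], Pair]:
    """Sample one few-shot episode with a freshly randomized word-to-symbol mapping."""
    # Re-assign meanings every episode so they must be inferred from the study set.
    meaning = dict(zip(WORDS, rng.sample(SYMBOLS, len(WORDS))))

    # Study set: each word in isolation, plus one demo of the hypothetical
    # function word "twice" applied to a single word.
    study: List[Pair] = [([w], [meaning[w]]) for w in WORDS]
    demo = rng.choice(WORDS)
    study.append(([demo, "twice"], [meaning[demo]] * 2))

    # Query: apply "twice" to a word that was only seen in isolation.
    word = rng.choice([w for w in WORDS if w != demo])
    query: Pair = ([word, "twice"], [meaning[word]] * 2)
    return study, query


def to_source(study: List[Pair], query_words: List[str]) -> List[str]:
    """Flatten the query and all study examples into one source token sequence."""
    tokens = list(query_words)
    for words, symbols in study:
        # "|" separates examples and "->" separates input from output (hypothetical tokens).
        tokens += ["|", *words, "->", *symbols]
    return tokens


rng = random.Random(0)
study, (query_words, target) = sample_episode(rng)
print(" ".join(to_source(study, query_words)))  # source sequence for the network
print(" ".join(target))                         # target sequence for the query
```

Training a network on a long stream of such freshly sampled episodes is where the meta-learning happens; this sketch covers only episode generation and leaves the network itself out.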
To demonstrate the abilities of MLC, we evaluated humans and machines side by side on the same tests of systematic generalization. Specifically, we used instruction-learning tasks in a pseudolanguage to examine human and machine learning of structured algebraic systems. We also examined behaviour in response to highly ambiguous linguistic probes, designed to characterize human inductive biases and how these biases could either facilitate or hamper systematic generalization. Across these evaluations, MLC achieves (or even exceeds) human-level systematic generalization. MLC also produces human-like patterns of errors when human behaviour departs from purely algebraic reasoning, showing how neural networks are not only a capable but also a superior modelling tool for nuanced human compositional behaviour. In a final set of simulations, we show how MLC improves accuracy on popular benchmarks for few-shot systematic generalization.
Behavioural results
First, we measured human systematic generalization, going beyond classic work that relied primarily on thought experiments to characterize human abilities. Our experimental paradigm asks participants to process instructions in a pseudolanguage in order to generate abstract outputs (meanings), differing from artificial grammar learning, statistical learning, and program learning in that explicit or implicit judgments of grammaticality are not needed. Instead, the participants generate sequences of symbols in response to sequences of words, enabling computational systems to directly model the resulting data by building on the powerful sequence-to-sequence toolkit from machine learning. All experiments were run on Amazon Mechanical Turk, and detailed procedures are described in the 'Behavioural methods: few-shot learning task' and 'Behavioural methods: open-ended task' sections of the Methods. The complete set of human and machine responses is viewable online.
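Because each response is a sequence of symbols produced from a sequence of words, both the rule-based answer and the score of a response are straightforward to compute. The Python sketch below is illustrative only: the pseudowords, the function words `fep`, `blicket` and `kiki`, their rewrite rules and precedence, and every helper name are hypothetical, not the materials used in the experiments. `interpret` maps an instruction to outputs by applying rewrite rules recursively, `exact_match` counts a response as correct only if the whole output sequence matches, and `chance_exact_match` gives the probability (1/k)^n that a uniformly random response of known length n over k symbols is exactly right, which shrinks exponentially as n grows.

```python
from fractions import Fraction
from typing import Dict, List, Sequence

# Hypothetical primitive meanings (pseudoword -> output symbol).
PRIMITIVES: Dict[str, str] = {"dax": "RED", "wif": "GREEN", "lug": "BLUE", "zup": "YELLOW"}


def interpret(words: Sequence[str]) -> List[str]:
    """Apply toy rewrite rules; 'kiki' binds loosest and postfix 'fep' binds tightest."""
    words = list(words)
    if not words:
        raise ValueError("empty instruction")
    if "kiki" in words:  # x kiki y -> [[y]] [[x]]
        i = words.index("kiki")
        return interpret(words[i + 1:]) + interpret(words[:i])
    if "blicket" in words:  # x blicket y -> [[x]] [[y]] [[x]]
        i = words.index("blicket")
        left = interpret(words[:i])
        return left + interpret(words[i + 1:]) + left
    if words[-1] == "fep":  # x fep -> [[x]] [[x]] [[x]]
        return interpret(words[:-1]) * 3
    if len(words) == 1 and words[0] in PRIMITIVES:
        return [PRIMITIVES[words[0]]]
    raise ValueError(f"no rule matches {words!r}")


def exact_match(response: Sequence[str], instruction: Sequence[str]) -> bool:
    """A response counts as correct only if the whole output sequence matches."""
    return list(response) == interpret(instruction)


def chance_exact_match(num_symbols: int, length: int) -> Fraction:
    """Probability that a uniformly random response of known length is exactly right."""
    return Fraction(1, num_symbols) ** length


print(interpret("lug blicket wif".split()))   # ['BLUE', 'GREEN', 'BLUE']
print(interpret("wif kiki dax fep".split()))  # ['RED', 'RED', 'RED', 'GREEN']
print(exact_match(["BLUE", "GREEN"], "lug blicket wif".split()))  # False
print(chance_exact_match(6, 2))               # prints 1/36
```

With an assumed pool of six candidate symbols, a length-2 response has chance 1/36 ≈ 2.8%, and each extra output position divides that by six again.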
Figure caption (fragment): … of a word ('skip') that is presented only in isolation in the study examples, and no intended output is provided. The network produces a query output that is compared (hollow arrows) with a behavioural target. b, Episode b introduces the next word ('tiptoe') and the network is asked to use it compositionally ('tiptoe backwards around a cone'), and so on for many more training episodes. The colours highlight compositional reuse of words. Stick figures were adapted from art created by D. Chappard.

Figure caption (fragment): … percentage of samples for MLC). The superscript notes indicate the algebraic answer (asterisks), a one-to-one error (one-to-one) or an iconic concatenation error (IC). The words and colours were randomized for each participant and a canonical assignment is therefore shown here. A black circle indicates a colour that was unused in the study set.

Systematic generalization was evaluated through a few-shot learning paradigm. As illustrated in Fig. 2, the participants were provided with a curriculum of 14 study instructions (input/output pairs) and asked to produce outputs for 10 query instructions. The study instructions were consistent with an underlying interpretation grammar, which derives outputs from inputs through a set of compositional rewrite rules. To perform well, the participants had to learn the meaning of words from just a few examples and generalize to more complex instructions. The participants produced output sequences that exactly matched the algebraic standard in 80.7% of cases (indicated by an asterisk in Fig. 2b(i)). Chance performance is 2.8% for output sequences of length 2 if the length is known, and exponentially lower for longer sequences. Notably, participants also generalized correctly in 72.5% of cases to output sequences longer than those seen during training, a type of generalization that neural networks often struggle with. When deviating from this algebraic standard,