nature human behaviour
Dynamic computational phenotyping of human cognition
Computational phenotyping has emerged as a powerful tool for characterizing individual variability across a variety of cognitive domains. An individual's computational phenotype is defined as a set of mechanistically interpretable parameters obtained from fitting computational models to behavioural data. However, the interpretation of these parameters hinges critically on their psychometric properties, which are rarely studied. To identify the sources governing the temporal variability of the computational phenotype, we carried out a twelve-week longitudinal study using a battery of seven tasks that measure aspects of human learning, memory, perception and decision making. To examine the influence of state effects, each week, participants provided reports tracking their mood, habits and daily activities. We developed a dynamic computational phenotyping framework, which allowed us to tease apart the time-varying effects of practice and internal states such as affective valence and arousal. Our results show that many phenotype dimensions covary with practice and affective factors, indicating that what appears to be unreliability may reflect previously unmeasured structure. These results support a fundamentally dynamic understanding of cognitive variability within an individual.
Untangling sources of individual variability remains a central challenge in cognitive science. This endeavour has been revolutionized by the use of computational models, which provide precise algorithmic accounts of cognitive processes in terms of parsimonious sets of parameters, collectively termed the 'computational phenotype'. Importantly, these computational parameters can be intuitively interpreted as cognitively meaningful entities, such as learning rate or risk attitude. The interpretability of the computational phenotype has made it an appealing tool for studying complex phenomena as far-reaching as brain function, psychiatric illness, developmental processes and cross-species variation. For example, research in the field of computational psychiatry demonstrates that computational modelling can be particularly insightful for teasing apart different behavioural aspects of mental illness. While the link between anxiety and disrupted decision-making is well established, characterizing the specific behavioural disruption was accomplished in a study that estimated the computational phenotype in patients diagnosed with pathological anxiety and healthy controls. The study showed that anxiety is specifically associated with enhanced risk aversion (indicating less risk-taking) but not loss aversion. Another example of the merits of computational phenotyping comes from developmental science. Previous behavioural studies have shown that children tend to explore more than adults. While this could be explained by generally noisier behaviour in children, a computational phenotyping study helped to elucidate this phenomenon, indicating that in fact children rely more on directed, but not random, exploration, thereby reducing uncertainty about the environment by choosing high-uncertainty options.
Despite the widespread use of computational phenotypes, their interpretation hinges critically on their psychometric properties, which remain poorly understood. This issue is even more prominent in longitudinal studies that address changes within individuals over time. However, test-retest reliabilities of the computational phenotypes remain largely unknown since computational models are rarely fit within the same subjects over more than one timepoint. Only a few studies explicitly address the reliability of computational phenotypes and rarely in more than two sessions. Such studies have found mixed results, with most phenotype parameters showing poor test-retest reliability and a few showing moderate to high test-retest reliability. Furthermore, a large-scale study focusing on the domain of self-control showed significantly lower reliability for task-based measurements, including the computational phenotype, compared with classic self-reported measurements.
Low test-retest reliability of the computational phenotypes could reflect measurement noise, non-stationarity of the underlying construct, or both. If the underlying construct is non-stationary, its temporal trajectory could be relatively unpredictable (for example, a random walk) or relatively predictable (for example, directional drift induced by practice). Deciphering these sources of variability necessitates a robust, high-powered longitudinal dataset, a task that we undertake in this study. Our investigation seeks to better discern the 'noise' and the 'signal' in computational phenotypes by modelling multiple potential sources of temporal variability.
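The three scenarios distinguished above (a stationary construct observed with measurement noise, an unpredictable random walk and a predictable practice-induced drift) can be illustrated with a small simulation. This is a minimal sketch, not the model used in the study: the function name `simulate_phenotype`, the Gaussian forms and all numerical values (step size, drift slope, noise level) are illustrative assumptions.

```python
import numpy as np


def simulate_phenotype(kind, n_participants=90, n_weeks=12, seed=0):
    """Simulate weekly estimates of one parameter (all settings are hypothetical)."""
    rng = np.random.default_rng(seed)
    # Stable between-participant differences: one trait value per participant.
    trait = rng.normal(0.0, 1.0, size=(n_participants, 1))
    weeks = np.arange(n_weeks)

    if kind == "stationary":
        # The construct does not change; only measurement noise varies by week.
        latent = np.repeat(trait, n_weeks, axis=1)
    elif kind == "random_walk":
        # Unpredictable non-stationarity: independent Gaussian steps accumulate.
        steps = rng.normal(0.0, 0.3, size=(n_participants, n_weeks))
        steps[:, 0] = 0.0  # the first week starts at the trait value
        latent = trait + np.cumsum(steps, axis=1)
    elif kind == "practice_drift":
        # Predictable non-stationarity: a participant-specific linear trend.
        slope = rng.normal(0.2, 0.05, size=(n_participants, 1))
        latent = trait + slope * weeks
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    # Measurement noise is added on top of the latent trajectory in every case.
    noise = rng.normal(0.0, 0.5, size=latent.shape)
    return latent + noise


for kind in ("stationary", "random_walk", "practice_drift"):
    x = simulate_phenotype(kind)
    r = np.corrcoef(x[:, 0], x[:, -1])[0, 1]
    print(f"{kind:15s} first-week vs last-week correlation: {r:.2f}")
```

A simple first-versus-last-week correlation cannot separate these scenarios on its own, which is why the temporal structure has to be modelled explicitly.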
Over a continuous three-month period, we engaged ninety human participants in a weekly battery of seven online computer-based tasks: Go/No-go, Change detection, Random dot motion, Lottery ticket, Intertemporal choice, Two-armed bandit and Numerosity comparison. These tasks were chosen since they cover various aspects of cognition such as learning, memory, perception and decision making. Using these tasks, we estimated the computational phenotype of each participant on a weekly basis. In addition, the inclusion of a survey tracking individuals' mood and daily activities enabled us to estimate day-specific state effects on the computational phenotype. This unique dataset, which we make publicly available, allows us to illuminate the processes governing the temporal variability of cognition.
Our results provide evidence for a fundamentally dynamic view of the computational phenotype within an individual, and indicate that both practice and affective effects contribute to its temporal variability.
Results
Longitudinal data for dynamical computational phenotypes
We collected data from ninety participants who performed seven online cognitive tasks on a weekly basis for twelve consecutive weeks. The tasks we used were Go/No-go, Change detection, Intertemporal choice, Lottery ticket, Numerosity comparison, Two-armed bandit and Random dot motion. These tasks were selected for two main reasons. First, they are commonly used in cognitive and neurocognitive research, as well as in phenotyping individual, clinical and age-related variation. Second, these tasks have well-established and validated computational models.
First, we calculated the mean (± standard deviation) reaction time and accuracy (where applicable) for each task across weeks and participants. As shown in Supplementary Table 1, participants' average performance was adequate and in line with previous studies using similar tasks. For a more detailed analysis of the behavioural data, see Supplementary Figure 6.
Next, for each participant and for each task, we fit the free parameters of previously validated computational models using a hierarchical Bayesian framework, which formalized various assumptions about within- and between-participant variability. By accounting for the structure of the data, this hierarchical framework has been shown to improve parameter stability and provide a more accurate estimate of parameter values at the participant level. In particular, we fit two statistical models to the data: an 'independent' model and a 'dynamic' model (as well as a 'reduced' independent model; see Supplementary Information). The independent model allowed us to quantify parameter stability without building it into our modelling assumptions: the parameters for each participant were assumed to be drawn independently each week from a participant-specific distribution. The dynamic model, which we describe in detail below, formalizes a more structured set of assumptions about how the computational phenotype evolves over time, thus allowing us to make insightful inferences about sources of its temporal variability. Model fitting yielded week-specific estimates for the nineteen parameters comprising the computational phenotype for each participant. We validated the fits in three steps. First, we examined widely used diagnostic measures, such as R-hat and the number of divergent transitions, that serve to assess the convergence of the Markov chain Monte Carlo sampling procedure for parameter estimation (see Supplementary Information). Second, we verified that the parameters were identifiable. Third, we verified that all computational models yielded excellent posterior predictive checks.
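As an illustration of the R-hat convergence diagnostic mentioned above, the following minimal sketch computes the classic Gelman-Rubin statistic from a set of chains. It is a simplification for exposition: modern samplers typically report a rank-normalized split variant, and the function name `gelman_rubin_rhat` and the simulated chains are assumptions rather than part of the study's pipeline.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    # W: average of the within-chain sample variances.
    within = chains.var(axis=1, ddof=1).mean()
    # B: between-chain variance of the chain means, scaled by the chain length.
    between = n_draws * chain_means.var(ddof=1)
    # Pooled estimate of the posterior variance.
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(0)
# Four chains sampling the same distribution: R-hat should be close to 1.
mixed = rng.normal(0.0, 1.0, size=(4, 1000))
# Four chains stuck around different values: R-hat should be clearly above 1.
stuck = mixed + np.arange(4)[:, None]
print(f"well-mixed chains: {gelman_rubin_rhat(mixed):.3f}")
print(f"poorly mixed chains: {gelman_rubin_rhat(stuck):.3f}")
```

Values close to 1 indicate that the chains agree with one another; values well above 1 suggest that they have not converged to a common distribution.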
We then asked whether the parameter estimates were stable over time within an individual. This was quantified using intraclass correlations, a widely used measure of test-retest reliability. Figure 2 shows the intraclass correlation values for the computational phenotype estimated using the independent model and the reduced model. Intraclass correlation values covered a wide range of 0.49 to 0.99, with half of the parameters showing poor-to-moderate stability and half moderate-to-excellent stability. In agreement with previous work, models with fewer parameters tended to be more stable and parameters derived from the same task tended to have similar values. Go/No-go is a notable counterexample, including both the most stable and least stable parameters across tasks.
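To make the stability measure concrete, the sketch below computes an intraclass correlation from a participants-by-sessions matrix of parameter estimates. The text here does not state which form of the intraclass correlation was used, so the one-way random-effects form, ICC(1,1), is an assumption chosen for simplicity; the function name `icc_oneway` and the simulated data are likewise illustrative.

```python
import numpy as np


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for shape (n_participants, n_sessions)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    participant_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    # Mean square between participants.
    ms_between = k * np.sum((participant_means - grand_mean) ** 2) / (n - 1)
    # Mean square within participants (across sessions).
    ms_within = np.sum((scores - participant_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


rng = np.random.default_rng(0)
# Hypothetical data: 90 participants, 12 weekly estimates of one parameter.
trait = rng.normal(0.0, 1.0, size=(90, 1))
weekly = trait + rng.normal(0.0, 1.0, size=(90, 12))
print(f"ICC with equal trait and session variance: {icc_oneway(weekly):.2f}")
```

With equal between-participant and within-participant variance, the expected value is about 0.5; it approaches 1 as session-to-session variability shrinks relative to differences between participants.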
While these intraclass correlation values are imperfect, indicating variability of the measured computational phenotype over time, they are relatively high compared with those often reported in the literature. We suspect that these relatively high values can be attributed to fitting our data using hierarchical Bayesian modelling, which adequately captures the hierarchical structure of the data. Indeed, previous work showed that the fitting procedure has notable effects on parameter stability, whereby hierarchical models that pool information across participants promote parameter stability. To test this hypothesis, we repeated the intraclass correlation analysis, this time fitting the behavioural data using a reduced hierarchical model. In this reduced model, sessions were not nested within participants; instead, all sessions across all participants were considered independent, such that model parameters were drawn from a single population-level distribution. As expected, this procedure resulted in lower intraclass correlation values across all phenotype parameters (Figure 2, red dots; for further details on this analysis, see Supplementary Information).
We used simulated data to calculate an intraclass correlation upper bound for each parameter on the basis of a ground-truth phenotype that was fixed across time. This analysis yielded near-perfect stability for all parameters across tasks (Figure 2, red vertical lines). While such near-perfect stability may seem surprising, it is the result of using the independent hierarchical model in the process of parameter estimation. This result indicates that the lower stability values observed in the real data are not the result of low interparticipant variability or of inadequate task design, but rather that there is true longitudinal variability in the computational phenotype within participants.
Finally, for each task, we also calculated intraclass correlation values for the behavioural measures of accuracy and mean reaction time. These values were mostly in the moderate range, 0.5 to 0.75. Reaction times were consistently more stable than accuracy values.