Monitoring morphometric drift in lifelong learning segmentation of the spinal cord
ABSTRACT
Morphometric measures derived from spinal cord segmentations can serve as diagnostic and prognostic biomarkers in neurological diseases and injuries affecting the spinal cord. For instance, the spinal cord cross-sectional area can be used to monitor cord atrophy in multiple sclerosis and to characterize compression in degenerative cervical myelopathy. Although automatic segmentation methods robust to a wide variety of contrasts and pathologies have been developed over the past few years, whether their predictions remain stable as the model is updated with new datasets has not been assessed. This stability is particularly important for deriving normative values from healthy participants. In this study, we present a spinal cord segmentation model trained on a multisite dataset that includes nine different MRI contrasts and several spinal cord pathologies. We also introduce a lifelong learning framework that automatically monitors morphometric drift as the model is updated with additional datasets. The framework is triggered by a GitHub Actions workflow every time a new model is created and records the morphometric values derived from the model's predictions over time. As a real-world application of the proposed framework, we used the spinal cord segmentation model to update a recently introduced normative database of healthy participants containing commonly used measures of spinal cord morphometry.
Results showed that (i) our model performs well compared with its previous versions and with existing pathology-specific models on the lumbar spinal cord, on images with severe compression, and in the presence of intramedullary lesions and/or atrophy, achieving an average Dice score of 0.95 ± 0.03; (ii) the automatic workflow for monitoring morphometric drift provides a quick feedback loop for developing future segmentation models; and (iii) the scaling factor required to update the database of morphometric measures is nearly constant among slices across the given vertebral levels, showing minimal drift between the current and previous versions of the model monitored by the framework. The code and model are open source and accessible via Spinal Cord Toolbox v7.0.
One. INTRODUCTION
Spinal cord segmentation is relevant for quantifying morphometric changes, such as cord atrophy in multiple sclerosis, compression severity in degenerative cervical myelopathy, and spared tissue in spinal cord injury. Developing a robust and accurate spinal cord segmentation tool requires a large sample size, which often involves the collaboration of multiple sites and the inclusion of a wide spectrum of MRI scans spanning various spinal cord pathologies, image resolutions, orientations, contrasts, and potential image artifacts. Consequently, obtaining stable morphometric measurements is challenging: MRI contrasts with different resolutions (and degrees of anisotropy) are affected to different extents by partial volume effects, leading to subtle shifts in the boundary between the cord and the cerebrospinal fluid. Furthermore, morphometric measurements inherently depend on the version of the segmentation tool and may drift as newer versions are released. This drift poses a challenge in studies that monitor morphometric measures (e.g., cross-sectional area) over time.
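As an illustration of how the cross-sectional area (CSA) is tied to the segmentation boundary, the following minimal Python sketch computes per-slice CSA from a binary mask by counting cord voxels in each axial slice and multiplying by the in-plane voxel area. It assumes an (x, y, z) array with z as the superior-inferior axis and voxel sizes in millimeters; the function name `csa_per_slice` is hypothetical, and the sketch omits any correction for cord angulation relative to the slice plane, which practical tools typically apply. Because every boundary voxel is counted as entirely cord or entirely cerebrospinal fluid, the result is quantized by the in-plane resolution, which is one reason CSA can differ between contrasts acquired at different resolutions.

```python
import numpy as np


def csa_per_slice(mask: np.ndarray, pixdim: tuple) -> np.ndarray:
    """Cross-sectional area (mm^2) of a binary cord mask for each axial slice.

    Hypothetical helper. Assumes axes ordered (x, y, z), z superior-inferior,
    and pixdim = voxel sizes in mm. No angulation correction is applied.
    """
    pixel_area = pixdim[0] * pixdim[1]  # in-plane area of one voxel (mm^2)
    # Count foreground voxels in every axial (x, y) plane, then scale to mm^2
    return mask.astype(bool).sum(axis=(0, 1)) * pixel_area


# Toy mask: 4 cord voxels on slice 0, 2 cord voxels on slice 1
mask = np.zeros((4, 4, 2), dtype=np.uint8)
mask[1:3, 1:3, 0] = 1
mask[1, 1:3, 1] = 1

csa = csa_per_slice(mask, (0.5, 0.5, 1.0))
print(csa)  # [1.  0.5]
```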
Previous work in automatic spinal cord segmentation has been limited by a lack of standardization, with models often developed in isolation using different procedures for creating ground-truth masks, different model architectures, and varying training strategies. Gros et al. proposed a collection of contrast-specific models trained on healthy controls and multiple sclerosis patients. These models use a convolutional neural network (CNN) with two-dimensional kernels, which fails to capture the full spatial context in three dimensions, resulting in poor performance on degenerative cervical myelopathy and spinal cord injury patients with lesions. Masse-Gignac et al. developed a cascade of two CNNs, trained separately on axial and sagittal T2-weighted scans, for segmenting injured spinal cords, adapting ground-truth masks from sct_deepseg_sc 2D. Nozawa et al. focused on the segmentation of compressed spinal cords with two-dimensional U-Nets using transfer learning from DeepLabv3 models. Bédard et al. introduced contrast_agnostic, a three-dimensional model trained on a dataset of healthy participants, which generalizes across contrasts but struggles to segment pathological cases. The existence of numerous specialized models highlights the lack of standardization in the development of automatic segmentation pipelines. Moreover, no continuous learning pipeline exists to monitor or mitigate drift in the segmentation performance of these models over time.
Morphometric measures derived from spinal cord segmentations are highly dependent on the method used and may drift as the methods evolve. This can lead to inconsistencies in normative values across methods. Moreover, morphometric measures exhibit substantial inter-participant variability driven by factors such as age and sex, which limits sensitivity to subtle changes. One approach to mitigate this variability is to normalize a participant's measures against morphometrics obtained from healthy controls. These normalization techniques assume that the morphometrics of new participants are computed with the same method as the normative reference, an assumption that no longer holds as segmentation methods are iteratively improved. This highlights the need for population databases to evolve alongside segmentation techniques.
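To illustrate why the normative reference and the new participant must share a method, the sketch below standardizes a participant's CSA as a z-score against a healthy reference sample. The z-score is only one common normalization choice and is an assumption here; all values and the roughly 3% systematic offset between model versions are hypothetical. When the participant is segmented with a newer model version while the reference still comes from the older one, the z-score shifts even though nothing biological has changed.

```python
from statistics import mean, stdev


def z_score(value: float, reference: list) -> float:
    """Standardize a participant's measure against a healthy reference sample."""
    return (value - mean(reference)) / stdev(reference)  # sample standard deviation


# Hypothetical healthy reference CSA (mm^2) at one vertebral level, from model version A
healthy_ref_v_a = [70.0, 74.0, 78.0, 82.0, 86.0]

# Hypothetical patient CSA from version A, and from a newer version B with ~3% larger masks
patient_v_a = 70.0
patient_v_b = patient_v_a * 1.03

print(round(z_score(patient_v_a, healthy_ref_v_a), 2))  # -1.26 (same method as reference)
print(round(z_score(patient_v_b, healthy_ref_v_a), 2))  # -0.93 (method mismatch)
```

In this toy case the participant appears noticeably less atrophied purely because of the version mismatch, which is the inconsistency the paragraph above describes.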
Given that the aforementioned tools only target a limited set of pathologies, often with few MRI contrasts, there is great value in unifying their specialized analyses into a single model trained on a substantially larger, cumulative training set. Segmentation frameworks such as nnUNetV2, which has been widely adopted by the medical imaging community due to its robustness and its generalization across imaging modalities and neural network architectures, now make this objective achievable. In addition, a standardized training strategy to continuously update models over time, monitor performance drift between model updates, and manage model retraining would substantially streamline these approaches. Such a lifelong learning framework ensures that the model remains robust to shifts in the data distribution and continually refines its segmentation performance across the diverse set of contrasts and pathologies.
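Monitoring drift between model updates requires a concrete pass/fail criterion. The following is a minimal sketch, assuming per-subject CSA values (mm²) have already been computed with the previous and the new model version on the same test subjects: it reports the per-subject relative change and flags the update when the mean absolute change exceeds a tolerance. The function names, subject IDs, values, and the 1% tolerance are hypothetical and are not taken from the framework itself; in a continuous-integration job such as a GitHub Actions workflow, the boolean result could be mapped to the script's exit status.

```python
from statistics import mean


def relative_drift(prev: dict, new: dict) -> dict:
    """Per-subject relative CSA change (%) from the previous to the new model version."""
    common = sorted(prev.keys() & new.keys())  # subjects segmented by both versions
    if not common:
        raise ValueError("no subjects in common between the two model versions")
    return {s: 100.0 * (new[s] - prev[s]) / prev[s] for s in common}


def check_drift(prev: dict, new: dict, tol_percent: float = 1.0) -> tuple:
    """Pass if the mean absolute relative drift is within a (hypothetical) tolerance."""
    drift = relative_drift(prev, new)
    mean_abs_drift = mean(abs(d) for d in drift.values())
    return mean_abs_drift <= tol_percent, mean_abs_drift


# Hypothetical per-subject CSA (mm^2) from two model versions
csa_prev = {"sub-01": 80.0, "sub-02": 75.0, "sub-03": 70.0}
csa_new = {"sub-01": 80.4, "sub-02": 75.0, "sub-03": 70.7}

passed, drift_pct = check_drift(csa_prev, csa_new)
print(passed, round(drift_pct, 2))  # True 0.5
```

Averaging absolute rather than signed changes keeps opposite-signed errors in different subjects from cancelling out; a signed mean could be reported alongside it to detect systematic over- or under-segmentation.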
To address these challenges, our study contributes the following:
One. An automatic spinal cord segmentation model trained on a multisite dataset gathered from seventy-five sites worldwide. The dataset consists of nine different MRI contrasts spanning a wide range of image resolutions and includes pathologies such as multiple sclerosis (MS, with different phenotypes), traumatic spinal cord injury (SCI; acute and chronic), and non-traumatic SCI (degenerative cervical myelopathy [DCM] and ischemic SCI).
Two. A lifelong learning framework for developing models that segment new contrasts and pathologies over time. The framework also includes an automatic workflow, built on GitHub Actions, that monitors drift in spinal cord morphometrics across model versions.
Three. Validation of the lifelong learning framework by updating a normative database of spinal cord morphometric measures.
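Updating a normative database after a model change can be illustrated with a per-slice scaling factor. The sketch below rests on stated assumptions: CSA from the previous and current model versions is available for the same healthy participants at matching slices, the per-slice factor is defined as the ratio of mean CSA across participants (new over old), and the coefficient of variation of that factor across slices indicates whether a single factor suffices. The names, data, and ratio-of-means definition are hypothetical and not necessarily the procedure used in the study.

```python
import numpy as np


def slicewise_scaling(csa_old: np.ndarray, csa_new: np.ndarray) -> np.ndarray:
    """Per-slice scaling factor: mean CSA across participants, new over old.

    Both arrays are shaped (n_participants, n_slices), CSA in mm^2 (hypothetical layout).
    """
    return csa_new.mean(axis=0) / csa_old.mean(axis=0)


# Hypothetical CSA from the previous model version: 3 participants x 4 slices
csa_old = np.array([[80.0, 78.0, 76.0, 74.0],
                    [70.0, 68.0, 66.0, 64.0],
                    [90.0, 88.0, 86.0, 84.0]])
# Hypothetical current version: about 2% larger, with a small slice-to-slice variation
csa_new = csa_old * np.array([1.02, 1.021, 1.019, 1.02])

factors = slicewise_scaling(csa_old, csa_new)
cv = factors.std() / factors.mean()  # relative spread of the factor across slices

# If the factor is nearly constant, one option is to rescale the old reference values
updated_reference = csa_old.mean(axis=0) * factors.mean()
print(np.round(factors, 3), f"CV = {cv:.2%}")
```

In this toy example the coefficient of variation is below 0.1%, so rescaling the previous per-slice means by the single mean factor closely approximates the values the current model would produce.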
The proposed spinal cord segmentation model and normative database are open source and integrated into the Spinal Cord Toolbox, available as of v7.0.