Identification of unannotated microproteins involved in endothelial cell homeostasis, dysfunction, and vascular disease
One. Introduction
The translation of thousands of unannotated human open reading frames has been extensively documented. The majority of these open reading frames consist of one hundred or fewer codons (small open reading frames, or smORFs) and exhibit limited evolutionary conservation. Seminal studies identified open reading frames based on in silico three-frame translation of transcriptomes or on evolutionary sequence conservation, whereas most recent studies have applied ribosome profiling to demonstrate smORF translation. By detecting the movement of translating ribosomes along a transcript in steps of three nucleotides, that is, periodicity, ribosome profiling can be used to infer the frame of active translation. At first sight, this would seem to be the method of choice for identifying smORFs, but its application to smORF annotation has important shortcomings. These include wide variation in data quality, depth, and sparseness, as well as low reproducibility between different smORF detection algorithms. In addition, a large number of multi-mapping reads from ribosome-protected footprints cannot be unambiguously assigned to a specific genomic location and are therefore discarded. Although ribosome profiling has been instrumental in identifying translation events in what were thought to be 'untranslated' regions, that is, upstream open reading frames, downstream open reading frames, and long non-coding open reading frames, it has limited capacity to resolve the multiple periodicities of overlapping reading frames in protein-coding regions, which ultimately results in highly divergent smORF annotations. Indeed, a recent study documented that four different scoring algorithms applied to the same ribosome profiling dataset produced considerably divergent numbers, biotypes, and lengths of identified smORFs. It also reported relatively low reproducibility across replicates.
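The idea of inferring the translated frame from three-nucleotide periodicity can be illustrated with a minimal sketch. It assumes that each footprint has already been reduced to a single P-site position in transcript coordinates (0-based); the function names and the toy positions are hypothetical and are not taken from any published smORF-calling algorithm, which would additionally need to handle read-length-specific offsets, multi-mapping reads, and statistical testing.

```python
from collections import Counter


def frame_fractions(p_site_positions):
    """Fraction of footprints whose P-site falls in frame 0, 1, or 2.

    Assumption: positions are 0-based transcript coordinates, so the
    frame of a footprint is simply its position modulo 3.
    """
    counts = Counter(pos % 3 for pos in p_site_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no footprints supplied")
    return [counts.get(frame, 0) / total for frame in range(3)]


def dominant_frame(p_site_positions):
    """Return (frame, fraction) for the frame holding the most footprints."""
    fractions = frame_fractions(p_site_positions)
    frame = max(range(3), key=lambda f: fractions[f])
    return frame, fractions[frame]


# Hypothetical toy data: most P-sites advance in steps of three nucleotides.
toy_positions = [12, 15, 15, 18, 21, 22, 24, 27, 30, 32]
frame, share = dominant_frame(toy_positions)
print(f"dominant frame: {frame}, share of footprints: {share:.2f}")
```

When two reading frames overlap, the footprints of both contribute to the same counts, so the fraction in the dominant frame drops and the call becomes ambiguous; this is the limitation in protein-coding regions described above.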
Importantly, in a separate study, many of the microproteins that were translated from smORFs and confirmed by mass spectrometry were not picked up by ribosome profiling. Indeed, an integrated proteogenomic approach was essential to identify microproteins encoded by smORFs that were either not called or had uncertain mapping by ribosome profiling. Another limitation that distinguishes ribosome profiling from other RNA sequencing techniques is its inability to distinguish between cell types when ribosomes are isolated from tissues in vivo. Equally difficult is demonstrating that identified smORFs are translated into bona fide microproteins. This is because the likelihood of detecting a protein in standard proteomic pipelines is proportional to its size, and smaller proteins generate fewer tryptic peptides. Although proteomic datasets deposited in repositories have in the past been successfully interrogated to identify microproteins, the lack of targeted enrichment strategies for small proteins means that the actual number of microproteins is likely to be markedly underestimated. Finally, the databases used to match spectra to peptides do not include microprotein sequences, and as a consequence, these tend to be 'filtered out'.
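The relationship between protein size and the number of tryptic peptides can be made concrete with a small in silico digest. The sketch below is illustrative only: it assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), it ignores missed cleavages, and the 7 to 30 residue window for detectable peptides as well as both toy sequences are hypothetical choices rather than parameters from this study.

```python
import re


def tryptic_peptides(protein, min_len=7, max_len=30):
    """In silico tryptic digest with no missed cleavages.

    Assumptions: cleavage after K or R unless followed by P, and only
    peptides of min_len..max_len residues are considered detectable.
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if min_len <= len(p) <= max_len]


# Hypothetical toy sequences: an 18-residue microprotein and a longer
# protein that starts with the same 18 residues.
micro = "MSGLKAVLEELRPQDTWK"
larger = micro + "GATVLSEEDLRNNPQQLVAAGHK" + "TIDEVFSSGPLLR"

for name, seq in [("micro", micro), ("larger", larger)]:
    peptides = tryptic_peptides(seq)
    print(name, len(seq), "residues ->", len(peptides), "candidate peptides")
```

In this toy case the short sequence yields a single candidate peptide, so one missed identification would be enough for the microprotein to go undetected, whereas the longer sequence offers several independent chances.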
Despite the challenges associated with their detection, it is clear that microproteins are actively involved in biological processes ranging from cell growth to the regulation of metabolism, and are likely to play major roles in pathophysiological conditions. Indeed, microproteins can act as signalling molecules, interact with other proteins to alter their conformation and/or activation state, and modulate the assembly of macromolecular complexes with other proteins or nucleic acids. In the heart, hundreds of novel microproteins have been identified, and in cardiomyocytes, microproteins are involved in the regulation of calcium handling, energy production by mitochondria, and contractility, as well as in heart disease. In contrast, the extent to which endothelial cell microproteins are involved in vascular homeostasis or the initiation of vascular disease has not been explored. This is despite the previous association of vascular inflammation with the translation of mitochondrial smORFs.
In this study, we combined whole transcriptome RNA- and RiboTag RNA-sequencing to identify putative endothelial cell-specific smORFs in intact murine tissues in vivo. We focussed on highly vascularized tissues, as the endothelial cells that line blood vessels respond to signals derived from the blood and the stroma and are rapidly activated in response to acute insults. We also studied alterations in the translation of smORFs in the carotid artery endothelium in a disease-relevant model of vascular inflammation. In parallel, whole transcriptome and RiboTag RNA-sequencing studies were conducted using cultured human endothelial cells under homeostatic and inflammatory conditions. The putative smORF datasets served as reference databases for subsequent optimized mass spectrometry analyses to identify bona fide microproteins. All datasets presented herein contain detailed information on the location and sequence of each smORF and the corresponding identified microprotein, are searchable, and represent a valuable resource for future functional studies.
Two. Methods
Detailed methods are presented in Supplementary Methods online.