genes
Article
Diverse Processes Drive the Origination and Maturation of an Array of Enhancers and Silencers During a Vast Evolutionary Timescale of a Bicistronic Gene
Abstract
Background/Objectives: A central question in molecular genetics concerns how transcriptional regulatory sequences and de novo genes originate and reach evolutionary fixation. In this study, we utilize the human bicistronic gene SMIM45 as a model to analyze the evolutionary trajectories of gene development. This locus comprises several functional units: three enhancers (one featuring an embedded silencer), an exonic silencer that partially overlaps an ORF, a highly conserved ancestral sequence encoding a sixty-eight amino acid microprotein, and a human-specific de novo gene encoding a one hundred seven amino acid protein expressed spatiotemporally in embryonic brain tissues. Methods: The alignment of gene sequences from different species was used to determine the evolutionary development of enhancers and silencers, and the development of the exonic silencer was determined through application of the cultivator model and assessment of nearest-neighbor bases. Results: We identify significant disparities in formation mechanisms; for example, the LOC twelve seven eight nine six four three zero NANOG hESC enhancer originated simply via two Alu insertions that constitute the enhancer. In contrast, the exonic silencer (a segment of the LOC thirteen zero zero six seven five nine ATAC-STARR-seq lymphoblastoid silent region one three eight one five), a distinct, novel type of silencer, originated from a combination of diverse mechanisms, including a "cultivator gene" process of base pair fixation, consistent with the cultivator model proposed by Li Zhao and coworkers. Conclusions: SMIM45 exemplifies novel development mechanisms occurring over hundreds of millions of years, culminating in the birth of a human-specific, de novo one hundred seven amino acid cistron. The associated complex of enhancers and silencers suggests intricate regulation of the one hundred seven amino acid protein in fetal brain tissues.
One. Introduction
Although bicistronic genes are uncommon in eukaryotes, improved detection techniques are revealing them more frequently. These genes form a heterogeneous group with diverse mechanisms for RNA transcription and protein expression. For instance, some utilize alternative RNA transcript isoforms, such as those expressed spatiotemporally in hippocampal neurons, while others employ leaky scanning from an internal ribosome entry site or an upstream open reading frame that inhibits cap-dependent translation. SMIM45 is also a bicistronic gene, encoding both an ancient sixty-eight amino acid microprotein and a human-specific, de novo one hundred seven amino acid protein. Notably, while the sixty-eight amino acid microprotein is expressed in somatic tissues, the one hundred seven amino acid protein is expressed spatiotemporally in embryonic brain tissues. A significant feature of SMIM45 is that it contains an array of enhancers and silencers that likely regulate the transcription of the one hundred seven amino acid cistron from its promoter, potentially representing a distinct transcriptional/translational mechanism for a bicistronic gene. This would distinguish SMIM45 from the mechanisms described above, which concern protein expression from a bicistronic transcript. Thousands of bicistronic genes are yet to be characterized; notably, Raj et al. identified over two thousand of them in human gene analyses. Given the large number of uncharacterized genes, SMIM45 may represent a new class of bicistronic genes.
Enhancers and silencers are short regulatory sequences found abundantly throughout the human genome; they bind transcription factors and can regulate the cell-specific transcription of genes in embryonic tissues. Super-enhancers are multiple enhancers present in a genomic locus that ensure gene expression in specific tissues, while super-silencers are two or more silencers in a gene locus that act together to provide strong signals for repression of gene expression, whereby repression is dependent on the locus's high CpG content. As functional data are not yet available, we do not refer to the array of enhancers and silencers as super-enhancers/super-silencers; however, their presence lends credence to these terms.
Due to the regulatory elements present in SMIM45, the gene is well suited to an analysis of the evolutionary formation of enhancers and silencers. In this paper, we analyze the mechanisms of origination and the timeline of the appearance and completion of the enhancers and silencers during evolution. The study shows that regulatory elements formed through diverse mechanisms, highlighting the development of SMIM45 via the continuous birth of functional elements over approximately four hundred million years. In particular, our analysis reveals that the short, exonic silencer (a segment of the silencer LOC thirteen zero zero six seven five nine ATAC-STARR-seq lymphoblastoid silent region one three eight one five), which overlaps the C-terminal sequence of the sixty-eight amino acid protein gene, originated and matured through a combination of distinct molecular processes. This silencer is unusual as it spans both a gene promoter and an ORF, and appears to be unique among known silencing elements. Comparisons are also made with known properties of silencers and enhancers, providing insight into how the expression of the one hundred seven amino acid cistron may be regulated.
Emera et al. investigated the evolution of enhancers, introducing a model of proto-enhancers as small, early developmental sequences that serve as nucleation sites for further development. We address the role of proto-enhancers/proto-silencers in SMIM45 development, but note that Emera et al.'s model aligns with the previously described evolution of the one hundred seven amino acid protein cistron. Here, development proceeds via the initial formation of a proto-gene, a short amino acid sequence called an early developmental sequence. In this framework, the proto-gene originates in ancient species and matures through the contiguous fixation of nearest-neighbor bases of the original as well as secondary proto-genes.
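The idea of maturation through contiguous fixation of nearest-neighbor bases can be illustrated with a minimal sketch. It is not the analysis procedure used in this study: the function names, lineage labels, and toy sequences below are hypothetical, and the sequences are assumed to be pre-aligned and ordered from the most ancient lineage to the reference. For each alignment column, the sketch finds the earliest lineage from which the reference base is present and retained in every later lineage, and then groups adjacent fixed columns into blocks.

```python
from typing import List, Tuple


def fixation_points(aligned: List[Tuple[str, str]]) -> List[int]:
    """For each column, return the index of the earliest lineage from which
    the reference base is present and retained in all later lineages.

    `aligned` holds (label, sequence) pairs ordered from the most ancient
    lineage to the reference (last entry). Sequences must already be aligned
    to equal length; '-' marks a gap and is never counted as fixed.
    """
    seqs = [seq for _, seq in aligned]
    ref = seqs[-1]
    if any(len(s) != len(ref) for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    points = []
    for col, ref_base in enumerate(ref):
        earliest = len(seqs) - 1
        # Walk backward from the reference toward more ancient lineages and
        # stop at the first lineage that lacks the reference base.
        for i in range(len(seqs) - 2, -1, -1):
            if ref_base != "-" and seqs[i][col] == ref_base:
                earliest = i
            else:
                break
        points.append(earliest)
    return points


def fixed_blocks(points: List[int], upto: int) -> List[Tuple[int, int]]:
    """Return contiguous (start, end) column runs fixed at or before lineage `upto`."""
    blocks = []
    start = None
    for col, p in enumerate(points):
        if p <= upto:
            if start is None:
                start = col
        elif start is not None:
            blocks.append((start, col - 1))
            start = None
    if start is not None:
        blocks.append((start, len(points) - 1))
    return blocks


# Hypothetical toy alignment (not real SMIM45 data), most ancient lineage first.
toy = [
    ("lineage_A", "ACGTTTGA"),
    ("lineage_B", "ACGATTGA"),
    ("lineage_C", "ACGACTGA"),
    ("reference", "ACGACTGC"),
]

points = fixation_points(toy)
for idx, (label, _) in enumerate(toy):
    print(label, fixed_blocks(points, idx))
```

In this toy case, the two blocks already fixed in the most ancient lineage extend into neighboring columns and merge in later lineages, which is the pattern the proto-gene framework describes. Real analyses would need a proper multiple sequence alignment and a treatment of gaps and lineage-specific reversions, which this sketch ignores.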