Essays in Biochemistry

Pseudogenes as regulators of biological function

Ryan C. Pink, David R.F. Carter


A pseudogene arises when a gene loses the ability to produce a protein, which can be due to mutation or inaccurate duplication. Previous dogma has dictated that because the pseudogene no longer produces a protein it becomes functionless and evolutionarily inert, being neither conserved nor removed. However, recent evidence has forced a re-evaluation of this view. Some pseudogenes, although not translated into protein, are at least transcribed into RNA. In some cases, these pseudogene transcripts are capable of influencing the activity of other genes that code for proteins, thereby altering expression and in turn affecting the phenotype of the organism. In the present chapter, we will define pseudogenes, describe the evidence that they are transcribed into non-coding RNAs and outline the mechanisms by which they are able to influence the machinery of the eukaryotic cell.

  • non-coding RNA
  • pseudogene
  • RNA
  • transcription


A pseudogene is generally defined as a copy of a gene that has lost the capacity to produce a functional protein. They were first discovered in the 1970s when a copy of the 5S rRNA gene was found in Xenopus laevis with homology to the active gene, but with a clear truncation that rendered it non-functional [1]. Sporadic discovery and characterization of pseudogenes over the following 20 years has revealed a number of mechanisms for pseudogene formation [2]. Unitary pseudogenes are formed when spontaneous mutations occur in a coding gene that abolish either transcription or translation (Figure 1A) [3]. A second class of pseudogene, the duplicated pseudogene (Figure 1B), is formed when replication of the chromosome is performed incorrectly [2]. Such duplication events often lead to the formation of functional gene families, such as those found in the Hox gene clusters, but if part of the gene is not faithfully copied then these can lead to frameshift mutations or the loss of a promoter or enhancer, thus resulting in a non-functional duplicated pseudogene. The final class, known as the processed pseudogene (Figure 1C), is formed when an mRNA molecule is reverse-transcribed and integrated into a new location in the parental genome [4]. Because processed pseudogenes are produced from mRNA, they usually lack introns and a promoter, and are therefore only transcribed if they become integrated close to a pre-existing promoter [5].

Figure 1. Different classes of pseudogene

(A) A unitary pseudogene is formed when a spontaneous mutation occurs in a coding gene. Such mutations may ablate transcription from the promoter or cause premature stop codons or frameshifts to occur. (B) A duplicated pseudogene is formed when a gene is duplicated, but in such a way that mutations in the copy prevent formation of a protein. (C) Processed pseudogenes arise when DNA is transcribed into RNA, which is then reverse-transcribed into copy DNA (cDNA) and integrated into the genome. Such pseudogenes often lack promoter activity and may have deletions or truncations that prevent protein formation. Closed boxes depict exons; open boxes depict introns; ‘X’ shows a mutation that prevents the DNA from being able to make a protein.
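The three classes described above can be separated, to a first approximation, by two structural observations: whether a functional parent copy still exists elsewhere in the genome, and whether the locus retains introns. The following is a minimal sketch of that decision rule, not a published annotation method. The `PseudogeneCandidate` type and its field names are hypothetical, and the rule assumes that the parent gene itself contains introns.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneCandidate:
    # Hypothetical annotation fields, chosen for illustration only.
    has_functional_parent: bool  # a working copy of the gene exists elsewhere
    retains_introns: bool        # the locus keeps the parent's intron-exon structure


def classify(candidate: PseudogeneCandidate) -> str:
    """Assign one of the three classes described in the text."""
    if not candidate.has_functional_parent:
        # The original gene itself was disabled by mutation.
        return "unitary"
    if candidate.retains_introns:
        # A genomic copy, so introns are still present.
        return "duplicated"
    # Copied via an mRNA intermediate, so introns were spliced out.
    return "processed"
```

A real annotation pipeline would need further evidence (for example, the intronless parent genes that this rule would misclassify), so the sketch should be read as a summary of the definitions rather than as a classifier.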

The sequencing of a range of genomes, including the human genome, has revealed the extent of pseudogene abundance [6–8]. Estimates for the number of human pseudogenes range from 10000 to 20000, making them almost as prevalent as coding genes [9]. The majority of these are processed pseudogenes [6–8] and fewer than 100 are unitary pseudogenes [3]. Interestingly, the processed pseudogenes found in the human genome have been formed from just 10% of the coding genes [6,8], suggesting that either not all genes are capable of producing processed pseudogenes, or that only the processed pseudogenes produced by certain types of gene are selected for by evolution. The genes that give rise to processed pseudogenes are predominantly highly expressed housekeeping genes or genes with short transcripts, such as those encoding ribosomal proteins [10]. It is of note that although mammalian genomes are particularly rich in pseudogenes [9], mammals are by no means the only organisms that harbour them. Pseudogenes have been found in a wide range of organisms [11], including bacteria, plants, insects and nematode worms, examples of which can be found in various databases [12].

Pseudogenes have often been labelled as ‘junk DNA’ because they lack protein-coding capacity. However, some genes that appear to be pseudogenized may in fact code for proteins [11]. Others are genuinely non-coding, but are by no means ‘junk’, as they may play functional roles [11,13,14]. In particular, it is apparent that some pseudogenes are capable of producing lncRNAs [long ncRNAs (non-coding RNAs)] [11]. Although RNA was seen previously as taking a purely intermediary role in the expression of proteins from DNA, it is now widely acknowledged that ncRNAs can play significant roles in the regulation of gene expression [15]. The present chapter explores the evidence that some pseudogenes can regulate gene expression via the generation of ncRNAs.


Examples of functional ncRNAs are being discovered at an ever-increasing rate, and their mechanism of function is diverse. Although the DNA encoding pseudogenes can play a role in normal biology, for example in the generation of antibody diversity [13], it is likely that the majority of pseudogene function is mediated through RNA molecules. Analysing the prevalence of pseudogene transcription should therefore give us insight into their potential function.

The way in which pseudogenes are generated often leads to their transcriptional silencing. Duplicated pseudogenes may be formed with mutations or deletions in their promoters, and processed pseudogenes may become integrated into transcriptionally silent regions of the genome. Assessing levels of pseudogene RNA can be difficult due to close homology with the original gene from which it was copied. Nevertheless, it has been demonstrated that many pseudogenes are transcribed into RNA, including the pseudogene versions of the housekeeping gene GAPDH (glyceraldehyde-3-phosphate dehydrogenase) [16], the transcription factor Oct4 [17] and the tumour suppressor PTEN (phosphatase and tensin homologue deleted on chromosome 10) [18]. Microarray technology and next-generation DNA sequencing experiments now give us tools to analyse pseudogene transcription in a genome-wide manner; results using such techniques suggest that as many as one-fifth of pseudogenes may be transcribed into RNA [5]. Recent RNA-seq experiments have shown that pseudogene RNA represents a significant proportion of the transcriptome in cancer cells [19]. A recent genome-wide study of pseudogene sequences revealed that some are relatively well conserved, and these are more likely to be transcribed [20]. Furthermore, approximately half of transcribed pseudogenes identified in humans are well conserved across primates [21].

Analysing the expression of coding genes in different tissues, dynamically during development or specific biological responses, or during disease, can give insight into their function. The same principle also holds true for pseudogenes and other genes encoding ncRNAs [11]. It is worth noting that the high contribution of pseudogene RNAs to the transcriptional landscape of the cell makes designing primers for PCR (for analysing the activity of genes or pseudogenes) particularly challenging. Several pseudogenes exhibit tissue-specific patterns of transcription, with a particular prevalence of testis-specific expression [5,22]. Pseudogenes can also exhibit patterns of expression that are distinct from those of the parent coding genes from which they were copied [23,24]. Pseudogene RNA levels can also change during differentiation [25] and in diseases such as cancer [26] and diabetes [27]. The findings that pseudogene transcription can be conserved across millions of years, and can occur in a dynamic and tissue-specific manner, suggest that the transcripts generated from pseudogenes may have an important role.
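The primer-design difficulty mentioned above comes down to finding the few positions at which a gene and its pseudogene differ. The sketch below illustrates one way this could be approached, assuming the two sequences have already been aligned to equal length and that a primer whose 3′-terminal base falls on a mismatch is the desired design (a common rule of thumb that is an assumption here, not a recommendation from this chapter). The function names and the short sequences in the usage note are hypothetical.

```python
def mismatch_positions(gene: str, pseudogene: str) -> list:
    """Return 0-based positions at which two pre-aligned sequences differ."""
    if len(gene) != len(pseudogene):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (g, p) in enumerate(zip(gene, pseudogene)) if g != p]


def gene_specific_primer_starts(gene: str, pseudogene: str, primer_len: int = 20) -> list:
    """Start positions of forward-primer windows on the gene whose last
    (3'-terminal) base sits on a gene/pseudogene mismatch."""
    starts = []
    for end in mismatch_positions(gene, pseudogene):
        start = end - primer_len + 1
        if start >= 0:  # the window must fit inside the sequence
            starts.append(start)
    return starts
```

For example, with the toy sequences `"ACGTACGTAC"` (gene) and `"ACGTACCTAC"` (pseudogene) and `primer_len=5`, the only candidate window starts at position 2 and ends on the single differing base. Real primer design would also need to consider melting temperature, the reverse primer and off-target matches elsewhere in the transcriptome.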

Evidence for function

To clearly demonstrate a genuine biological role for any gene it is not sufficient to show a correlation. In the last 15 years a number of functional experiments have been carried out that support a biological role for pseudogene RNA molecules in the regulation of their protein-coding counterparts [11,13]. Several genes associated with cancer progression have pseudogenes that may contribute to the pathophysiology of tumours. The ABC (ATP-binding cassette) transporters are a family of transmembrane channels involved in the transport of various solutes across membranes and whose deregulation is associated with drug resistance in tumours. ABCC6 is one member of this family and, thanks to a well-conserved promoter, shares a similar pattern of tissue-specific expression with a pseudogene (ABCC6P1). The mRNA levels of the ABCC6 gene are reduced when transcript levels of the ABCC6 pseudogene are specifically reduced [28]. Oct4 is a pluripotency-associated transcription factor that is involved in stem cell identity and is often deregulated during cancer progression. Overexpression of an Oct4 pseudogene (Oct4P1) increased stem cell proliferation and inhibited differentiation of the mesenchymal lineage [29]. Overexpressing the pseudogene of the proto-oncogene BRAF led to increased MAPK (mitogen-activated protein kinase) signalling and a transformed phenotype in the mouse cell line NIH 3T3, and also caused tumour formation in mice [30]. In these cases the mechanisms by which the pseudogenes regulate the coding genes are unclear, but other experiments are revealing a range of mechanisms that pseudogenes and their transcripts use to regulate gene expression and, in turn, cellular processes.

Antisense pairing and siRNA (small interfering RNA) production

If a processed pseudogene is integrated close to a promoter then this can result in transcription of the pseudogene. If the processed pseudogene is integrated in reverse orientation relative to the promoter then this would lead to an antisense transcript of the pseudogene, which, if it retains significant homology with the parent gene, could hybridize with the sense mRNA from the original coding gene [31]. Such an interaction occurs between the mRNA for the nNOS (neuronal nitric oxide synthase) gene and the RNA produced from a related pseudogene, which is transcribed in the antisense direction [32]. When both are transcribed in the same neurons of the snail Lymnaea stagnalis, the two form a duplex, which leads to reduced translation of the coding gene (Figure 2A) [32]. Similarly, when a transcript that is produced in the antisense direction to an Oct4 pseudogene is blocked, this leads to reduced levels of Oct4 expression [33].

Figure 2. Mechanisms of pseudogene functionality

(A) Pseudogene RNA transcribed in the reverse (antisense) direction can combine with forward (sense) transcripts from the coding gene to produce dsRNA. This can inhibit translation of the coding RNA, or produce siRNAs that go into the RNAi pathway and cause the coding RNA to be degraded. siRNAs that destroy the coding transcript can also be generated by (B) pairing between sense and antisense transcribed pseudogenes and (C) double-stranded regions formed by secondary structure within a single pseudogene transcript. (D) Pseudogene transcripts may share binding sites for miRNAs or trans-acting proteins that regulate the stability of the mRNA. Increased levels of pseudogene transcripts can compete for these factors and therefore shield the coding transcripts from their effects.

The pairing of sense and antisense transcripts leads to the formation of dsRNA (double-stranded RNA), which can trigger activation of the RNAi (RNA interference) pathway. In mouse oocytes it has been shown that pairing of antisense pseudogene RNA and sense coding-gene mRNA leads to the formation of such duplexes [34,35]. Dicer, a protein component of the RNAi pathway, slices the dsRNA into smaller fragments known as siRNAs. These siRNAs are incorporated into the RISC (RNA-induced silencing complex) and lead to the degradation of mRNA from the parental coding gene. For example, siRNAs were produced when mRNA from the Ppp4r1 (encoding a protein phosphatase) gene and an antisense RNA from a pseudogene with high homology were combined; these siRNAs then appear to repress Ppp4r1 expression (Figure 2A) [35]. Interestingly, the siRNAs generated did not always come from the pairing of a pseudogene RNA with a coding gene mRNA. Sometimes they were generated from the pairing of two pseudogene transcripts (one transcribed in the sense direction and the other in the antisense), with the resulting siRNA then repressing the coding parent gene, as in the case of HDAC1 (encoding a histone deacetylase enzyme) (Figure 2B) [34]. In other instances the siRNAs were generated from the internal pairing of different regions within the same pseudogene transcript (i.e. from double-stranded regions formed by secondary structure folding). An example of the latter is the formation of hairpin loop structures in the Au76 pseudogene RNA, which are processed into siRNAs that repress expression of the homologous coding gene Rangap1 (encoding a GTPase-activating protein for the small GTPase Ran) (Figure 2C) [35]. Other organisms, including rice [36] and trypanosomes [37], have been shown to generate siRNAs from pseudogenes, which have the potential to repress expression of the parent coding gene, suggesting that this mechanism of pseudogene function may be relatively widespread in nature.
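The duplex-then-dice logic described above can be illustrated with a toy calculation. The sketch below finds the longest stretch over which an antisense transcript is perfectly complementary to a sense mRNA and then cuts that stretch into fixed-length fragments. It is a deliberate simplification: real duplexes tolerate mismatches, Dicer products have overhangs, and the 21 nt fragment length is an assumed typical value rather than a figure taken from the studies cited here. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def longest_duplex(sense: str, antisense: str) -> str:
    """Longest region of `sense` that pairs perfectly with `antisense`.

    Pairing with the antisense strand is equivalent to sharing a substring
    with its reverse complement, so this is a longest-common-substring search.
    """
    target = reverse_complement(antisense)
    best_len, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(sense) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if sense[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return sense[best_end - best_len:best_end]


def dice(duplex: str, size: int = 21) -> list:
    """Cut the paired region into consecutive fragments of `size` nt,
    discarding any shorter remainder (assumed fragment length)."""
    return [duplex[i:i + size] for i in range(0, len(duplex) - size + 1, size)]
```

Calling `dice(longest_duplex(mrna, antisense_rna))` on two transcripts returns the sense-strand sequences of the candidate siRNA-sized fragments. The same functions apply to the pairing of two pseudogene transcripts, since only sequence complementarity is considered.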

Regulation of mRNA stability

The regulation of mRNA stability is one way in which gene expression can be controlled. The stability of an mRNA can be influenced by protein factors that bind at different locations in the RNA [38]. If a pseudogene has a high homology with the parent coding gene, including the presence of the same cis-elements, then the RNAs from both could compete for the same pool of trans-acting molecules. Increasing the transcription of the pseudogene could produce a ‘sink’ for these trans-acting molecules, effectively lowering the concentration of the free proteins and thus changing the stability of mRNA from the coding genes (Figure 2D). This mechanism has been suggested for regulation of the imprinted Makorin-1 gene (encoding an enzyme that adds a ubiquitin moiety to other proteins) by the related pseudogene Makorin1-p1 [39]. Deregulation of the chromatin-related protein HMGA1 (high-mobility group A1) is involved in the development of Type 2 diabetes mellitus. In two diabetes patients, a low level of HMGA1 correlated with a high level of the HMGA1 pseudogene RNA [27]. Blocking the pseudogene RNA partially restored the level of HMGA1 protein, suggesting that both transcripts compete for a stabilizing protein factor; when the level of pseudogene RNA increases it sequesters the protein factor, thus lowering the concentration of the free protein and causing a destabilization of HMGA1 mRNA [27]. The MYLKP1 (myosin light chain kinase pseudogene) gene is transcribed at higher levels in cancer cells. Overexpression of the pseudogene leads to destabilization of the parental gene mRNA and an increase in proliferation [40].
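The ‘sink’ argument can be made concrete with a simple equilibrium-binding calculation. The sketch below is a toy model under stated assumptions, not a description of any of the experiments cited: every binding site on the coding mRNA and on the pseudogene RNA is treated as identical and independent, each binds one factor with the same dissociation constant `kd`, and all quantities are in the same arbitrary concentration units. Under those assumptions the free factor concentration is the positive root of a quadratic obtained from mass balance.

```python
import math


def free_factor(total_factor: float, total_sites: float, kd: float) -> float:
    """Free concentration of a trans-acting factor at equilibrium.

    Mass balance: total = free + sites * free / (kd + free), which rearranges
    to free**2 + (kd + sites - total) * free - kd * total = 0.
    """
    b = kd + total_sites - total_factor
    return (-b + math.sqrt(b * b + 4.0 * kd * total_factor)) / 2.0


def coding_site_occupancy(total_factor: float, coding_sites: float,
                          pseudogene_sites: float, kd: float) -> float:
    """Fraction of coding-mRNA sites bound when pseudogene RNA competes."""
    free = free_factor(total_factor, coding_sites + pseudogene_sites, kd)
    return free / (kd + free)
```

With illustrative values such as `coding_site_occupancy(100, 50, 0, 10)` versus `coding_site_occupancy(100, 50, 200, 10)`, adding pseudogene sites lowers the occupancy of the coding mRNA. For a stabilizing factor this corresponds to reduced stability of the coding transcript, which is the direction of effect proposed for HMGA1; the same arithmetic would apply to any shared trans-acting factor.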

Another class of molecules that affects mRNA stability is the miRNA (microRNA). miRNAs are small (21–22 nt) single-stranded ncRNA molecules that are incorporated into the RISC and repress the expression of specific genes. This repression is achieved by base pairing between regions of the miRNA and the mRNA, leading to degradation of the mRNA [41]. A single miRNA can target hundreds of different genes and any given mRNA can be targeted by more than one miRNA. Just as pseudogene RNAs can act as a ‘sink’ to sequester proteins that regulate coding mRNA stability, so they can also act as decoys to draw miRNAs away from coding genes (Figure 2D). A striking example of this was demonstrated for the PTEN gene and a related pseudogene PTENP1 [26]. PTEN is a tumour suppressor gene whose expression level must be carefully regulated; even minor reductions in PTEN protein abundance can influence cancer initiation and severity [42]. The level of homology between PTEN and PTENP1 is highest at the 3′-UTR (untranslated region), which is of significance because most miRNA–mRNA interactions are thought to occur at the 3′-UTR of the mRNA. Reducing the levels of PTENP1 RNA leads to lower levels of PTEN mRNA and protein, and an inhibition of cell growth [26]. Overexpression of the 3′-UTR of PTENP1 led to the reverse effect, with increased levels of PTEN expression and a stimulation of cell division [26]. These results suggest that specific miRNAs bind to both the PTEN and PTENP1 3′-UTR regions, and that increasing the amount of ‘decoy’ pseudogene transcript causes the miRNAs to be sequestered, lowering the effective concentration of the free miRNA in the cell and thus lifting the repression on the coding gene mRNA. This is consistent with the finding that PTEN and PTENP1 expression levels are usually correlated in prostate cancer samples and that PTENP1 is often deleted in sporadic colon cancer cases [26]. A similar mechanism of action has been suggested for other gene–pseudogene pairs, including the oncogene KRAS and the homologous pseudogene KRASP1 [26]. Pseudogenes (or any ncRNAs) that act in this way to sequester miRNAs have been described as ‘ceRNAs’ (competing endogenous RNAs) [43]. The implications of these findings extend beyond the activity of individual pseudogenes, suggesting that the regulation of any given gene is partially dependent on the complex interactions of many RNA molecules (coding and non-coding) throughout the genome.
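Whether a pseudogene transcript could act as a decoy for a given miRNA depends, at a minimum, on both 3′-UTRs carrying a site for that miRNA. The sketch below shows one way such shared sites might be screened for. It assumes the widely used convention that targeting is driven by a perfect match to miRNA nucleotides 2–8 (the ‘seed’), which is an assumption introduced here rather than a criterion stated in this chapter; the function names are hypothetical and any sequences passed in would need to be real annotated UTRs for the result to mean anything.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """mRNA sequence (5'->3') that pairs with miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice covering nucleotides 2..8
    return seed.translate(COMPLEMENT)[::-1]


def find_sites(utr: str, mirna: str) -> list:
    """0-based start positions of perfect seed matches in a 3'-UTR."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def shared_mirnas(gene_utr: str, pseudogene_utr: str, mirnas: dict) -> list:
    """Names of miRNAs with at least one seed match in both UTRs."""
    return [name for name, seq in mirnas.items()
            if find_sites(gene_utr, seq) and find_sites(pseudogene_utr, seq)]
```

A miRNA returned by `shared_mirnas` is only a candidate for competition: a seed match does not prove binding, and the relative abundance of the two transcripts and of the miRNA determines whether any decoy effect would be measurable.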


Genome sequencing has revealed an apparent paradox: the genomes of higher organisms such as humans do not have significantly more genes than lower organisms such as the nematode worm. To reconcile this it has been suggested that in higher organisms the greater abundance of regulatory ncRNAs allows the cell to fine-tune the expression of genes more precisely, thus orchestrating a more complex phenotype from the same number of building blocks [44]. Some unicellular organisms do not tolerate the formation of pseudogenes and instead actively remove them [45]. Mammals, and primates in particular, seem to retain and to some extent conserve pseudogenized genes [11,21]. This, coupled with the finding that many pseudogenes are transcribed, is consistent with pseudogenes playing a role as ncRNAs in regulating the activity of coding genes. It is unlikely that all pseudogenes play functional roles, but it appears that higher organisms have mechanisms in place that are ready to harness and conserve pseudogenes when one spontaneously arises that confers a useful regulatory role. Functional experiments have revealed that some pseudogenes do indeed play biological roles in cells, using a variety of mechanisms to influence genes and therefore affecting the phenotype of various organisms. With further experiments using next-generation sequencing technologies, the true extent of pseudogene influence and the mechanisms they use should be revealed.


  • Pseudogenes are copies of genes that have lost the ability to produce a functional protein.

  • Because they do not produce a protein, pseudogenes are often thought of as evolutionary relics, but evidence is emerging that some can play functional roles.

  • Many pseudogenes are transcribed into RNA, and it is already known that some non-coding RNAs play a role in regulating gene expression.

  • Many pseudogene RNAs appear capable of repressing or activating protein-coding genes with which they share sequence homology.

  • The mechanisms of pseudogene function are varied, but often involve regulating the stability of the coding gene mRNA.

  • This can be achieved by the pseudogene-mediated generation of small interfering RNAs, which knock down the coding gene via the RNA interference pathway, or by pseudogene transcript-mediated depletion of protein factors or microRNAs that affect coding mRNA stability.


We thank members of the laboratory for the critical reading of the chapter before submission and apologise to those whose excellent work has not been described due to space constraints. This work was supported by grants from Sparks and the Cancer and Polio Research Fund.

