Circadian rhythms (~24 h) in biochemistry, physiology and behaviour are found in almost all eukaryotes and some bacteria. The elucidation of the molecular components of the 24 h circadian clock in a number of model organisms in recent years has provided an opportunity to assess the adaptive value of variation in clock genes. Laboratory experiments using artificially generated mutants reveal that the circadian period is adaptive in a 24 h world. Natural genetic variation can also be studied, and there are a number of ways in which the signature of natural selection can be detected. These include the study of geographical patterns of genetic variation, which provide a first indication that selection may be at work, and the use of sophisticated statistical neutrality tests, which examine whether the pattern of variation observed is consistent with a selective rather than a neutral (or drift) scenario. Finally, examining the probable selective agents and their differential effects on the circadian phenotype of the natural variants provides the final compelling evidence for selection. We present some examples of how these types of analyses have not only enlightened the evolutionary study of clocks, but have also contributed to a more pragmatic molecular understanding of the function of clock proteins.
In the last 30–40 years, chronobiology has changed from a specialized field studying ‘eccentric' phenomena into a cutting-edge area of science with significant implications for biomedical research. This ‘cultural revolution' has coincided with the explosion of recombinant DNA technology, which has made it possible to assign a molecular identity to those genes that constitute the core components of the clock. This has established their function and, in general, provided information on clock design. Consequently, among complex behaviours, the rhythmic rest–activity cycle, which is controlled by the circadian clock, is arguably one of the best described and understood at the molecular level.
The symbiotic relationship between the study of the circadian clock and behaviour is not obligatory, in that there are many other phenotypes, physiological and biochemical, that could be studied in terms of temporal regulation. However, the study of animal clocks has historically overlapped with behavioural phenotypes because they are technically simpler and cheaper to quantify, and they provide a whole-organism outlook rather than that of a physiological variable measured from a series of individuals sacrificed at different time points. Consequently, chronobiologists use behaviour as an entrée into the clock, and they use gene mutations in model organisms as a scalpel to dissect the neural basis of the circadian system.
Is the clock really important in the natural world?
As chronobiologists, we like to think that the ability to respond to environmental change is of fundamental importance to all organisms, in particular the capacity to anticipate predictable phenomena. By adjusting metabolism, physiology and behaviour in advance of these changes in light and dark, and hot and cold, the organism is ready for the challenges of the coming day or night. One would reasonably suppose that these relentlessly monotonous changes, every day and every night over the last 3–4 billion years, will have resulted in selective pressure to express appropriate biological activities in phase with the natural rhythms of day and night; hence the evolution of a circadian clock.
It is now accepted that the clock is an important regulatory system affecting human health and well-being and not just the cause of inconveniences experienced by shift workers and jet-setters alike. In this respect, the discovery in mammals that critical cell-cycle components are also regulated in a circadian manner suggested that the mitotic and the circadian cycles might share components, leading to the speculation that disruption of the 24 h rhythms might predispose to cell-cycle dysfunction, possibly leading to tumorigenesis [3–5]. Indeed, a knockout of the mammalian Per2 (Period2) gene has revealed that this clock gene can act as a tumour suppressor. Moreover, there is an ever-increasing literature showing correlations between misregulation of the clock and cardiovascular disease, cognitive dysfunction, obesity and depression. Of course, it could be that polymorphisms in clock genes predispose to such conditions, not because of problems with the clock per se, but because of gene pleiotropy. For example, in Drosophila, a specific form of learning is disrupted in arrhythmic per mutants, but not in equally arrhythmic timeless, Clock and cycle mutants. Thus it is not the arrhythmicity per se that contributes to this poor learning phenotype, but a pleiotropic effect of the per gene. In other words, it is possible that the circadian clock might contribute very little, if anything, to the reproductive (Darwinian) success of an organism. The presence of a functional clock could simply be neutral, and selection would be driven by these pleiotropic effects of clock genes, which contribute, as a by-product, to preserving the integrity of the clock.
Another example, again in the fly, is that male clock mutants produce fewer gametes [8,9]. This may have nothing directly to do with circadian rhythms, but may instead reflect pleiotropic effects of clock genes on the cell cycle. Thus Darwinian fitness would be enhanced by having these clock genes working harmoniously to generate the gametes, and a by-product would be normal circadian clock function. We are playing devil's advocate here, because although this scenario may contribute to maintaining optimal clock function, we feel it is unlikely that the circadian clock itself has not been the focus of evolutionary pressure, simply because of its ubiquity. Surely, at some point in evolution, the connections between the clock genes and pleiotropy would have been decoupled, so that they maintained their ‘extra' pleiotropic functions in the absence of a working circadian clock. Thus there should be species existing with clock genes, but no circadian behaviour. Indeed there are, but these are specialized organisms living in arrhythmic environments, sometimes for just a part of their life cycle (e.g. reindeer in the Arctic winter and summer months). However, for the sake of argument, might it be that the detrimental effects we see on human health, in chronic shift workers for example, are not caused by the clock per se, but rather are derived from pleiotropic effects of disrupting clock gene regulation by ‘unnatural' circadian schedules?
It is therefore important to demonstrate experimentally that clocks do affect fitness, and that natural selection is able to change allele frequencies of clock genes in response to different environments. It is equally important to show that different alleles can produce visible phenotypic changes that can become the substrate for natural selection. Several laboratories have examined the costs, in terms of survival or fecundity, of running the clock out of synchrony with the environmental cycle. Investigations have been carried out in Cyanobacteria, in insects such as Drosophila and mosquitoes, and in mammals. These studies have used manipulation of the environmental LD (light–dark) cycle and/or circadian mutants, or removal of clock function surgically or genetically. A prominent example comes from the photosynthetic and highly rhythmic Cyanobacteria. In a 24 h world, the wild-type bacteria outcompete short- or long-period mutants, whereas in a short- or long-cycle world, it is the corresponding mutants that show enhanced fitness compared with the wild-type. It is difficult to conceive of a situation where resonating the environmental cycle in tune with the mutant period would restore that fitness, unless fitness was directly coupled to the clock. The general conclusion from this and other similar experiments is that desynchronization between the environment and the endogenous clock of an organism is usually detrimental to fitness, and this includes the case of a clock running in a constant environment. However, identifying the action of natural selection via changes in gene frequencies and tying these into specific phenotypes that contribute to changes in fitness is not easy.
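The logic of such competition experiments can be sketched with the standard two-strain selection recursion from population genetics. This is only an illustration: the function names and the relative fitness values below are hypothetical and are not taken from the cyanobacterial studies.

```python
def next_frequency(p, w_a, w_b):
    """Frequency of strain A after one generation of selection.

    p   : current frequency of strain A (strain B has frequency 1 - p)
    w_a : relative fitness of strain A
    w_b : relative fitness of strain B
    """
    mean_fitness = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_fitness


def compete(p0, w_a, w_b, generations):
    """Return the frequency of strain A at every generation, starting at p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs


# Hypothetical fitness values, chosen only to illustrate the argument:
# strain A has a period matching the environmental cycle, strain B does not.
matched_environment = compete(0.5, w_a=1.0, w_b=0.8, generations=30)

# The same two strains in a cycle that resonates with strain B instead.
mismatched_environment = compete(0.5, w_a=0.8, w_b=1.0, generations=30)
```

In this toy model the strain whose period resonates with the environment approaches fixation in either setting, which is the qualitative pattern described above; real experiments estimate the fitness difference from the observed change in strain frequencies rather than assuming it.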
A molecular model for the clock
Before progressing any further we must briefly describe the molecular components of the clock, which constitute the substrate on which natural selection may act. As the fruit fly Drosophila melanogaster provided the first molecular model for the clock, we will use this as our eukaryotic circadian template, recognising that the major components are largely conserved in mammals. In addition, the general scheme of a negative-feedback loop, but with different components, is also conserved in the major lower eukaryotic model Neurospora crassa. In the fly, PER (Period) and TIM (Timeless) act as negative autoregulators of their own transcription, whereas CLK (Clock) and CYC (Cycle) represent the positive elements. The loop is centred around the rhythmic production of per and tim mRNAs, which are then translated into proteins in the cytoplasm at the beginning of the night (Figure 1). PER and TIM physically interact and move into the nucleus in the second half of the night, where they inhibit the CLK–CYC dimer, preventing further transcription. Although the peak of per and tim transcripts is at about ZT16 [ZT (Zeitgeber time) is the time imposed on the system by the rhythmic environmental variable; the Zeitgeber (time giver) in this case is light. Conventionally, ZT0=‘lights on' and ZT12=‘lights off' in an LD 12:12 cycle], the respective proteins reach their maximum levels approx. 6 h later, coinciding with their nuclear accumulation. This is due to a complex interplay of phosphorylation [e.g. by the kinases DBT (Doubletime) and SGG (Shaggy)], dephosphorylation and degradation that regulates the half-life of the proteins and their subcellular localization. Indeed, it has recently emerged that rhythmic post-translational regulation (phosphorylation/dephosphorylation/degradation of effector proteins) may be even more important than rhythmic transcription in maintaining circadian phenotypes.
In addition, CLK is also in a loop of its own driven by rhythmic transcription of PDP1ε (Par Domain Protein 1ε) and VRI (Vrille), which intersects with the per–tim loop, providing further stability to the system.
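The Zeitgeber-time convention used above is simple modular arithmetic on a 24 h cycle. The following minimal sketch (function names are our own, for illustration) places the transcript and protein peaks quoted in the text within an LD 12:12 cycle.

```python
def lights_on(zt, light_hours=12.0):
    """True if the lights are on at Zeitgeber time zt (ZT0 = lights on)."""
    return (zt % 24.0) < light_hours


def shift_zt(zt, hours):
    """Zeitgeber time reached 'hours' after zt, wrapped onto the 24 h cycle."""
    return (zt + hours) % 24.0


# Values taken from the text: transcripts peak at about ZT16 and the
# proteins reach their maximum approximately 6 h later.
transcript_peak = 16.0
protein_peak = shift_zt(transcript_peak, 6.0)  # 22.0, late in the night

# Both peaks fall in the dark phase of an LD 12:12 cycle.
both_in_dark = not lights_on(transcript_peak) and not lights_on(protein_peak)
```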
To any organism it makes a big difference whether you live in the north or in the south of your respective hemisphere. In Leicester, U.K., for instance, days are longer and cooler during summer and shorter and colder in winter than in Padova, Italy. In Khartoum, Sudan, the temperature is more or less constant throughout the year. Thus it is reasonable to assume that adaptations might have evolved to help British and Italian organisms to cope with the more stressful demands imposed by their respective environments. This is not simply limited to poikilotherms, because photoperiods throughout the season show dramatic changes and this will also affect homeotherms. In Khartoum, these daylength changes are negligible, but in Leicester on midsummer's day, there are 19.5 h of light (although strictly speaking it does not really get completely dark, even during the night). In Padova, there are nearly 17 h of light on midsummer's day, but with pitch-black nights. It is expected that genes and proteins that are responsive to circadian changes in light and temperature may adapt the organism to these dramatically different environmental challenges by altering the frequencies of ‘wild-type' alleles that are more adaptive in one situation than another. This provides a rationale for collecting samples along a latitudinal transect and testing for polymorphisms that might be more or less frequent in one geographical location compared with another. As environmental variables change progressively along the geographical spectrum, frequencies of genes that are responsive to environmental variables may show clinal variation that correlates with latitude.
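How strongly photoperiod depends on latitude can be sketched with the standard sunrise equation, which gives the geometric day length from latitude and solar declination. The sketch rests on several assumptions: it ignores twilight and atmospheric refraction, so it returns shorter values than figures that count twilight as light, and the latitudes used for the three cities are approximate values that are not given in the text.

```python
import math

SOLSTICE_DECLINATION_DEG = 23.44  # solar declination at the June solstice


def day_length_hours(latitude_deg, declination_deg=SOLSTICE_DECLINATION_DEG):
    """Geometric day length (sunrise to sunset) in hours.

    Uses cos(h0) = -tan(latitude) * tan(declination), where h0 is the
    hour angle of sunrise/sunset. Twilight and refraction are ignored.
    """
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg)
    x = -math.tan(phi) * math.tan(delta)
    x = max(-1.0, min(1.0, x))  # clamp handles polar day and polar night
    return 24.0 / math.pi * math.acos(x)


def annual_range_hours(latitude_deg):
    """Difference between midsummer and midwinter day length (hours)."""
    summer = day_length_hours(latitude_deg, SOLSTICE_DECLINATION_DEG)
    winter = day_length_hours(latitude_deg, -SOLSTICE_DECLINATION_DEG)
    return summer - winter


# Approximate northern latitudes (assumed, not from the text).
LATITUDES = {"Leicester": 52.6, "Padova": 45.4, "Khartoum": 15.6}

for city, lat in LATITUDES.items():
    print(f"{city}: midsummer {day_length_hours(lat):.1f} h, "
          f"annual range {annual_range_hours(lat):.1f} h")
```

The ordering is what matters for the argument: the seasonal swing in day length is largest in Leicester, intermediate in Padova and small in Khartoum.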
The period Thr-Gly region
In D. melanogaster, the Thr-Gly region encoded within PER (Figure 2A) provides an interesting example of a cline, as it shows latitudinal variation in the distribution of two main natural variants in Europe (Figure 2B). The Thr-Gly region is a bit of an oddity, as laboratory strains reveal a length polymorphism for this region of 17, 20 and 23 uninterrupted Thr-Gly repeats (Figure 2C). Removal of the whole Thr-Gly repeat did not dramatically affect the circadian phenotype, measured at 25°C; however, it dramatically shortened a rhythmic component found in the male's courtship song, which normally has a rhythm of ~60 s (Figures 3A and 3B). This trait was reduced to about 40 s in the Thr-Gly deletion, a value similar to that of the sibling species Drosophila simulans. Indeed the classical short (19 h) pers, long (29 h) perL and arrhythmic per0 mutations also produce parallel changes in the song cycles, which are altered to 40 s, 80 s and arrhythmic songs respectively. When the Thr-Gly region of D. simulans plus some flanking regions is substituted into the PER of D. melanogaster, the D. melanogaster host transformant flies sing like their simulans cousins with a 40 s song cycle. Moreover, D. melanogaster carrying the natural 17, 20 and 23 Thr-Gly variants show a linear relationship between repeat length and song period (50–60 s). The 40 s song cycle obtained by reducing Thr-Gly length artificially to zero in D. melanogaster (Figure 3C) falls nicely into line with that relationship within this species. D. simulans flies are also polymorphic in Thr-Gly length, with the main variants showing 23, 24 and 25 Thr-Gly repeats. However, they sing with a short cycle of 35–40 s, so between species it is not the length of the Thr-Gly repeat that is critical, but some other species-specific amino acid changes in the flanking region that are exchanged in the Thr-Gly-region interspecific transformation experiments.
We can understand the species specificity of song rhythms by using a ‘tuner' model (Figure 3D). The FM (frequency modulation)/AM (amplitude modulation) switch is represented by these other species-specific amino acid changes. FM may be D. simulans, and AM is D. melanogaster. Once in AM, the wavelength goes from 0–24 (representing the various natural and artificial repeats that have been studied), and the further up the scale, the longer the song period, within the D. melanogaster range of 50–60 s. It remains to be seen whether, in FM (D. simulans) mode, moving the wavelength between 23, 24 and 25 Thr-Gly repeats has similar effects in the D. simulans song rhythm range (30–40 s). As these songs are important for mate selection, it would not be a surprise if the Thr-Gly region was under some form of natural selection. In fact, neutrality tests, which are statistical examinations of the variation observed in a genomic region, for both D. simulans and D. melanogaster Thr-Gly regions, are consistent with a pattern of variation that is maintained by balancing selection, whereby more than a single allele is favoured [21,23]. These observations suggest that polymorphic Thr-Gly alleles are favoured by natural selection, but how song cycles of different Thr-Gly-length carriers relate to fitness is unclear. Although small differences in the song cycles of the different Thr-Gly-variant males can be detected, females appear to be equally receptive to song cycles in the 50–60 s range. However, might selection in the Thr-Gly region be acting on circadian behaviour?
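To give a flavour of what a neutrality test computes, the sketch below implements Tajima's D, one widely used statistic; it is offered as an illustration and is not necessarily the test applied in the studies cited above. It assumes equal-length, pre-aligned nucleotide sequences with no gaps, so it would apply to aligned sequence flanking the repeat rather than to repeat-length variation itself. Positive values indicate an excess of intermediate-frequency variants, which is the direction expected under balancing selection, whereas negative values indicate an excess of rare variants.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (n >= 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length)
            if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments (made up for illustration, not real per sequences).
intermediate_frequency = ["AAGT", "AAGT", "ATGT", "ATGT"]  # variant at 50%
rare_variant = ["AAGT", "AAGT", "AAGT", "ATGT"]            # singleton

d_balanced = tajimas_d(intermediate_frequency)  # positive
d_rare = tajimas_d(rare_variant)                # negative
```

In practice the significance of D is judged against its expected distribution under neutrality, a step that is omitted here.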
The Thr-Gly region and circadian adaptation
Sequencing of natural populations revealed that there are six main length variants in Europe and that the (Thr-Gly)17 and (Thr-Gly)20 alleles are by far the most common, equally contributing to 90% of the total length variation. The codons for Thr and Gly are fully degenerate, thus at the DNA level the Thr-Gly region looks like a string of repeating ACNGGN nucleotides. However, not all possible codons are used or equally represented. By depicting a codon pair as a ‘cassette' it is possible to see a clear pattern (Figure 2C) that suggests that the evolution of the Thr-Gly region is mediated by insertion/deletion events, a process that has a mutation rate several orders of magnitude higher than the base substitution rate [23,25].
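Because every Thr codon is ACN and every Gly codon is GGN, an uninterrupted Thr-Gly run can be recognised directly in DNA as repeated ACNGGN hexamers. The following minimal sketch counts the pairs in the longest such run; the function name and the test sequence are hypothetical, and the scan ignores the reading frame of the surrounding gene, which a real analysis would need to respect.

```python
import re

# Lookahead form so that candidate runs starting at every position are
# examined, including runs that overlap a shorter out-of-frame match.
THR_GLY_RUN = re.compile(r"(?=((?:AC[ACGT]GG[ACGT])+))")


def longest_thr_gly_run(dna):
    """Number of Thr-Gly pairs in the longest uninterrupted ACNGGN run."""
    runs = [m.group(1) for m in THR_GLY_RUN.finditer(dna.upper())]
    if not runs:
        return 0
    return max(len(run) for run in runs) // 6  # 6 nucleotides per pair


# Made-up example: three cassettes (ACA GGT, ACT GGC, ACG GGA) with flanks.
example = "TTTACAGGTACTGGCACGGGATTT"
pairs = longest_thr_gly_run(example)  # 3
```

Recording which hexamer occupies each position, rather than only the count, would give the ‘cassette' representation described above.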
As mentioned above, the mutational mechanisms giving rise to new Thr-Gly alleles are such that the generation of new variants should be a rather common phenomenon (in relative terms). Therefore we would expect to see many alleles, many more than the six that we observe in Europe. In fact in sub-Saharan Africa, where D. melanogaster originated, an enormous number of different Thr-Gly alleles are represented, most of which are never found in Europe. Some alleles correspond to lengths also represented in Europe, such as (Thr-Gly)18, (Thr-Gly)20 and (Thr-Gly)23, but with new combinations of ‘cassettes'; others, such as those carrying 22 and 24 Thr-Gly repeats, are alleles perhaps specific to tropical regions as some of them are also found in Australia, especially the tropical north, but not in Europe. Curiously, variants shorter than 18 Thr-Gly repeats are not found (or are extremely rare) in Africa, including the (Thr-Gly)17 alleles that are so common in Europe and also in Australia. The comparison of the structure (i.e. the distribution of the cassettes) of the different Thr-Gly alleles suggests that the (Thr-Gly)23b, present in all continents tested, is the ancestral sequence (Figure 2C) [27,28]. If for a moment we put aside Australia, where D. melanogaster is a very recent invader, simply by looking at the distribution of length variants in Europe and in Africa, it seems that the evolution of the Thr-Gly region has followed very different paths in the two continents, perhaps due to different selective processes. The conditions in Europe seem to have favoured the evolution of shorter variants, especially those that differ by a multiple of three Thr-Gly pairs from the ancestral sequence, namely (Thr-Gly)14, (Thr-Gly)17, (Thr-Gly)20 and (Thr-Gly)23.
In Africa, longer variants thrive, the ‘multiple of 3' rule does not apply (in fact variants with 22 and 24 Thr-Gly pairs are very common) and strong selection seems to be in place against alleles shorter than 18 and longer than 24 Thr-Gly pairs.
The Thr-Gly cline
The distribution of the two main European variants follows a cline, with the (Thr-Gly)20 allele more frequent in the north and the (Thr-Gly)17 more abundant in the south. The other variants (the yellow sector of the pies in Figure 2B) are quite rare and do not show any spatial pattern in their distribution. In eastern Australia, a sample collected in the early 1990s also showed a similar cline with high levels of the (Thr-Gly)20 variant in the south at the higher latitudes. Samples collected more than a decade later showed a much weaker geographical relationship in Australia, although an overall weak pattern similar to Europe could still be detected.
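A first-pass test for a cline of this kind is a regression of allele frequency on latitude. The sketch below computes the least-squares slope and the Pearson correlation; the function name is our own and the population values are invented placeholders, not data from the studies described here.

```python
import math


def cline_fit(latitudes, frequencies):
    """Least-squares slope and Pearson r of allele frequency on latitude."""
    n = len(latitudes)
    if n != len(frequencies) or n < 3:
        raise ValueError("need matching lists with at least 3 populations")
    mean_x = sum(latitudes) / n
    mean_y = sum(frequencies) / n
    sxx = sum((x - mean_x) ** 2 for x in latitudes)
    syy = sum((y - mean_y) ** 2 for y in frequencies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(latitudes, frequencies))
    slope = sxy / sxx               # change in frequency per degree latitude
    r = sxy / math.sqrt(sxx * syy)  # strength and direction of the trend
    return slope, r


# Invented placeholder values for illustration only: degrees north and the
# frequency of one allele in each sampled population.
example_latitudes = [38.0, 42.0, 46.0, 50.0, 54.0]
example_frequencies = [0.25, 0.30, 0.42, 0.48, 0.55]

slope, r = cline_fit(example_latitudes, example_frequencies)
```

A positive slope with a correlation close to 1 would correspond to an allele that becomes more common towards the north. A geographical pattern alone is only a first indication of selection, because population history can also generate clines.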
Although both the clinal distributions of gene frequencies in two hemispheres [16,27] and the implementation of statistical analyses of genetic variation seem to suggest that balancing selection is shaping the distribution of the Thr-Gly variants in Europe, the acid test is to demonstrate phenotypic differences among the alleles. The two main environmental variables that change with latitude are photoperiod and temperature. In Europe, temperature values are generally higher at the lower latitudes, and also more stable during the summer growing season, so that in the Mediterranean the summer is predictably hot, whereas in the north hot and cool days are interspersed. Therefore one of the challenges that the clock has to face, especially in the north, is to be able to compensate for sudden changes in temperature without being tricked into running faster or slower. Circadian clocks possess this property, called temperature compensation, which is likely to be under more selective scrutiny in the north than in the south of Europe. Experiments in the laboratory using both natural variants and transgenic flies carrying (Thr-Gly)17 and (Thr-Gly)20 alleles in the same genetic background have shown that flies carrying the (Thr-Gly)20 allele have excellent temperature compensation, rather better than the (Thr-Gly)17 variant (Figure 2D). However, (Thr-Gly)17 confers a better match to the 24 h environmental cycle during hot summer days, whereas flies carrying the (Thr-Gly)20 allele have a clock running with a slightly shorter period. In support of this finding, the re-examination of the behaviour of flies carrying the artificially generated (Thr-Gly)0 allele at different temperatures has revealed that these flies have lost temperature compensation. In addition, the only outlier for the latitudinal cline was a population collected at high altitude, where the frequencies of the two variants were consistent with the colder temperature.
Finally, a study in ‘Evolution Canyon' in Israel, where the south-facing ‘African' slope is hotter than the north-facing ‘European' slope, also revealed the expected differences in Thr-Gly allele frequencies.
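Temperature compensation can be quantified in more than one way. The sketch below shows two simple measures: the difference in free-running period between a warm and a cool temperature, and the temperature coefficient Q10 of the clock's rate, which is a common chronobiological summary but one we assume here for illustration rather than take from the studies above. The periods and temperatures in the example are hypothetical.

```python
def period_difference(period_cool, period_warm):
    """Change in free-running period (h) from the cool to the warm condition."""
    return period_warm - period_cool


def q10_from_periods(period_cool, period_warm, temp_cool, temp_warm):
    """Q10 of the clock's rate (1/period) between two temperatures (deg C).

    A perfectly compensated clock has Q10 = 1; most uncompensated
    biochemical reactions have Q10 of roughly 2 to 3.
    """
    if temp_warm == temp_cool:
        raise ValueError("temperatures must differ")
    rate_ratio = period_cool / period_warm  # = (1/period_warm)/(1/period_cool)
    return rate_ratio ** (10.0 / (temp_warm - temp_cool))


# Hypothetical periods (h) at 18 and 29 deg C, for illustration only.
compensated = q10_from_periods(24.0, 24.0, 18.0, 29.0)    # exactly 1.0
uncompensated = q10_from_periods(24.0, 12.0, 15.0, 25.0)  # 2.0
```

On either measure, a well-compensated allele is one whose value stays close to the no-change baseline (a period difference near zero, or a Q10 near 1) across the temperature range the flies encounter.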
The fact that both natural variants and artificially generated transformants (Thr-Gly)17 and (Thr-Gly)20 showed the same phenotype should not be lost on the reader. Linkage disequilibrium, in which another nearby sequence variant is associated with one Thr-Gly allele and not the other, could be ruled out in this case as the source of the phenotypic difference between (Thr-Gly)17 and (Thr-Gly)20. This was because the transgenes (Thr-Gly)17 and (Thr-Gly)20 had been constructed by using the template of one allele to generate the other, so all linked genetic variation was identical. Thus the differences in temperature compensation between the two alleles are used as a proxy for Darwinian fitness. This view is supported by experiments in Cyanobacteria that revealed fitness increments when endogenous periods resonated with the environmental period. Measurements of the lifetime reproductive fitness of different Thr-Gly variants under realistic environmental fluctuations have yet to be performed.
One of the interesting features of European flies is that ~99% have Thr-Gly alleles that differ by a (Thr-Gly)3 unit, from (Thr-Gly)14 to (Thr-Gly)23, and if temperature compensation of these alleles is plotted as the difference in period between hot and cold temperatures the relationship is remarkably linear (Figure 2D). The very rare alleles that fall out of synchrony with the (Thr-Gly)3 interval, for example flies with 18, 21 or 24 repeats, show dramatic deviations from this linear thermal relationship. Why is this? It turns out that structural studies of Thr-Gly repeats reveal that a (Thr-Gly)3 forms a β-turn, so that in protein terms the structural unit is (Thr-Gly)3 [33,34]. Thus we might imagine that the 14-17-20-23 series of alleles differs by one β-turn and that (Thr-Gly)18, for example, differs from (Thr-Gly)17 by a bit of a β-turn. Maybe that is why these rare variants that fall out of register in terms of the β-turn structure have such poor temperature compensation, which is why they are so rare in the challenging European environment, but very common in the non-challenging thermal environment of sub-Saharan Africa. So why is it that 14-17-20-23 appear to be the favoured alleles rather than 15-18-21-24, which more obviously are perfect multiples of (Thr-Gly)3? The answer probably lies in the sequences just before and just after the uninterrupted Thr-Gly repeat, which are similar to Thr-Gly, and may also generate additional β-turns (at least bioinformatically).
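The ‘register' argument is simple modular arithmetic: an allele is in register if it differs from the ancestral (Thr-Gly)23 length by a whole number of (Thr-Gly)3 units. A minimal sketch, with names of our own choosing:

```python
ANCESTRAL_LENGTH = 23  # (Thr-Gly)23, the proposed ancestral allele
TURN_UNIT = 3          # (Thr-Gly)3, the proposed beta-turn structural unit


def in_register(n_pairs, reference=ANCESTRAL_LENGTH):
    """True if n_pairs differs from the reference by whole (Thr-Gly)3 units."""
    return (reference - n_pairs) % TURN_UNIT == 0


# Lengths between 14 and 24 pairs that keep the beta-turn register.
favoured = [n for n in range(14, 25) if in_register(n)]  # [14, 17, 20, 23]

# Lengths named in the text as out of register.
out_of_register = [n for n in (18, 21, 22, 24) if not in_register(n)]
```

The check reproduces the 14-17-20-23 series and flags 18, 21, 22 and 24 as out of register. It says nothing about why the register is anchored at 23 rather than 24, which is the point addressed by the flanking sequences discussed above.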
Biochemistry of the Thr-Gly region
The PER domains shown in Figure 2(A) reveal that the Thr-Gly repeat is immediately flanked by two regions called CRs [CR5 (for CR-5′) and CR3 (for CR-3′)]. These are ‘coevolving regions', which are variable between species and play a role in compensating for the different Thr-Gly lengths among species. When the repeat of a long Thr-Gly species (Drosophila pseudoobscura, with >200 residues in the Thr-Gly region) is placed adjacent to the CR5 that immediately flanks the repeat from a shorter Thr-Gly repeat species (D. melanogaster, ~60 residues), the PER protein is non-functional, in that it cannot rescue rhythmicity of D. melanogaster per0 mutants (Figure 2E). However, if the main coevolving region (60 residues upstream of the repeat) is made conspecific with the repeat (both from D. pseudoobscura) in an otherwise D. melanogaster PER protein, the chimaeric PER protein functions normally in per0 D. melanogaster transformants. If the adjacent 30 residues upstream of CR5 are made conspecific to the repeat, then the chimaeric PER functions extremely well, in that all host D. melanogaster per0 transformant flies are rhythmic, but the period of the rhythm is dramatically temperature-sensitive (Figure 2E). Thus the Thr-Gly repeats and CRs are dynamic regions from an evolutionary perspective, which both inter- and intra-specifically have functions related to circadian temperature compensation. However, the region immediately adjacent to the downstream CR3, indeed the very next residue, is the start of a 55-residue domain to which the DBT kinase binds and phosphorylates PER. Could the Thr-Gly repeat and CRs be acting to modulate the binding and subsequent phosphorylation of PER, with subtle implications for temperature compensation?
Recent work in Neurospora has revealed that circadian temperature compensation is mediated by casein kinase 2 (related to DBT, which is a casein kinase 1ε), which phosphorylates FRQ (FREQUENCY), the major negative regulator of the fungal circadian transcriptional activators. Might natural adaptive circadian temperature compensation in Drosophila be similarly regulated by PER phosphorylation? This remains to be tested.
The length polymorphism for the Thr-Gly region of PER represents a widely cited example of how differences in behaviour among populations can be explained in terms of genetic variation shaped by adaptation to the local selective pressures. More specifically, the Thr-Gly region of the PER protein has provided a model system for studying how coding repeats evolve by duplications and deletions and not by point mutations, as was still believed in the 1980s and early 1990s. These mutational changes occur many orders of magnitude faster than point mutations and so repeats, particularly coding repeats, can provide a dynamic substrate for evolutionary change. Indeed the Thr-Gly and associated regions have been implicated in species-specific phenotypes, such as courtship songs and intra-specific variation in circadian temperature stability. Thus this region has served as an example for an interdisciplinary study of a genomic region that includes behaviour, genetics, molecular biology, population genetics, ecology and structural biology. The main omissions at this point are the biochemical and neuroanatomical explanations for how the Thr-Gly regions can modulate circadian temperature compensation or song cycles. However, these additional levels are within reach, and will conceivably maintain the per gene as one of the flagship examples for the single-gene dissection of complex behavioural phenotypes.
• Circadian rhythms (~24 h) in biochemistry, physiology and behaviour are found in almost all eukaryotes and some bacteria.
• The identification of clock genes provides an opportunity to assess whether natural variation in these genes is under natural selection.
• In the fruitfly D. melanogaster, latitudinal patterns of variation in the coding regions of the clock gene period have been shown to have important phenotypic consequences.
• The phenotypes of the per gene variants adapt the flies to their particular thermal environments.
• Mathematical analyses of sequence variation in these clock genes are consistent with the role of natural selection maintaining this genetic variation.
We thank the Biotechnology and Biological Sciences Research Council (BBSRC), the Natural Environment Research Council (NERC), the European Community and the Royal Society, who have funded our work over the years.
- © The Authors Journal compilation © 2011 Biochemical Society