Nucleotide composition peculiarities of protein-coding exons of the human genome
Institute of Cell Biophysics of RAS, 142290 Pushchino, Institutskaya 3, e-mail: firstname.lastname@example.org
Exploring the specific distribution of microsatellites and other repeated nucleotide sequences in eukaryotic genes remains an important task in elucidating the mechanism of the “structure-function” organization of the DNA molecule. Earlier we noted that the observed dominance of AT Watson-Crick base pairs in mononucleotide and mixed sequences is a common property of genomic DNA. Most likely it is a consequence of the inherently lower variability in the form of AT complementary pairing compared with GC pairs. This feature of AT pairs provides naturally greater reliability of genetic processes in the cell, primarily replication.
For comparison, our spectral analysis of the distribution of A/T and G/C nucleotide tracts in the genome of a relict eukaryote, the horseshoe crab (Limulus polyphemus), showed that such a non-trivial, specific structure of the DNA molecule exists even in this organism, whose lineage dates back about 500 million years.
It is of interest to study further how the "deficit" of nucleotide sequences composed of GC pairs affects the distribution of A/T and G/C tracts, now considered separately in the exon, intron and intergenic regions of genomic DNA. In this paper, such an analysis was performed for the human genome using the methods of comparative genomics. As a result, a local inversion in the occurrence of A/T and G/C nucleotide tracts was found in the exons of protein-coding regions of all chromosomes except the Y chromosome and mitochondrial DNA. The dominant ones were microsatellite homonucleotide (dG)n and (dC)n tracts and mixed (G/C)n tracts up to 10 base pairs long. In addition, a clear differentiation of the chromosomes was revealed, both in the increased GC content of their exons and in the degree to which G/C tracts occur more frequently in them.
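The kind of tract counting described above can be illustrated with a minimal sketch. It is not the procedure used in this work: the function names `count_tracts` and `gc_content`, the definition of a tract as a maximal run of letters from a given set, and the toy sequence are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_tracts(seq, alphabet, min_len=2, max_len=10):
    """Count maximal runs built only from letters in `alphabet`.

    Returns a Counter mapping run length -> number of runs.
    Assumption: a "tract" is a maximal uninterrupted run, e.g.
    alphabet="GC" gives mixed (G/C)n tracts, alphabet="G" gives (dG)n.
    """
    pattern = re.compile("[" + alphabet + "]+")
    counts = Counter()
    for match in pattern.finditer(seq.upper()):
        length = len(match.group())
        if min_len <= length <= max_len:
            counts[length] += 1
    return counts


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T letters of `seq`."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


# Hypothetical toy fragment, not real exon data.
exon = "AATTGGGCCA"
at_tracts = count_tracts(exon, "AT")
gc_tracts = count_tracts(exon, "GC")
print("A/T tracts by length:", dict(at_tracts))
print("G/C tracts by length:", dict(gc_tracts))
print("GC content:", gc_content(exon))
```

Applied separately to exon, intron and intergenic sequences, the ratio of G/C to A/T tract counts at each length would show whether the usual A/T dominance is inverted in a given region.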
A detailed frequency analysis of the distribution of very short microsatellites of various compositions (A, T, G, C) showed an unusually rare occurrence of CG (CpG) dinucleotides that is common to all chromosomes. This statistical feature of these pairs, the so-called "hot spots", persists in exon regions. Computational chemistry methods were used to assess how dinucleotides fold spatially within complementary DNA duplexes, which made it possible to establish two main types of mutual orientation of the bases in stacks. For frequently encountered dinucleotides such as AA, TT and AT, the usual, rather compact packing of complementary pairs was obtained, with parameters close to those of the classical B-form of DNA. For CpG and some other relatively uncommon dinucleotides, a different, less ordered conformation was realized. It was characterized by a noticeable deviation of the base arrangement from the stacked form in the structure of the helical duplex.
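One common way to quantify how rare a dinucleotide is, shown here only as a hedged sketch and not as the method of this work, is the observed/expected ratio: the dinucleotide frequency divided by the product of the frequencies of its two bases. The function name `dinucleotide_obs_exp` and the toy sequence are hypothetical, and the sketch assumes the input contains only the letters A, C, G and T.

```python
from collections import Counter


def dinucleotide_obs_exp(seq):
    """Observed/expected ratio for each of the 16 dinucleotides.

    observed = count(XY) / (n - 1), using overlapping windows
    expected = freq(X) * freq(Y), from single-base frequencies
    Assumption: `seq` contains only A, C, G, T.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for first in "ACGT":
        for second in "ACGT":
            expected = (mono[first] / n) * (mono[second] / n)
            if expected > 0:
                observed = di[first + second] / (n - 1)
                ratios[first + second] = observed / expected
    return ratios


# Hypothetical toy sequence, not real genomic data.
ratios = dinucleotide_obs_exp("ATATTTAACGAATTGGCCAT")
for pair in sorted(ratios, key=ratios.get):
    print(pair, round(ratios[pair], 2))
```

A ratio well below 1 for CG across all chromosomes would correspond to the CpG depletion described above, while values near or above 1 for AA, TT and AT would reflect the frequently encountered pairs.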