An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs

Kozak M. (1987). Nucleic Acids Res. DOI: 10.1093/nar/15.20.8125

Key findings

Analysis of the 5'-noncoding sequences of 699 vertebrate messenger RNAs established (GCC)GCC(A/G)CCATGG as the consensus sequence for translation initiation in vertebrates. Position -3 (three nucleotides upstream of the ATG start codon, with the A of ATG numbered +1) was the most highly conserved: 97% of the mRNAs had a purine there (61% adenine, 36% guanine) and only 3% had a pyrimidine. Position +4, immediately after the ATG, was preferentially guanine. Site-directed mutagenesis experiments confirmed that mutations at positions -3 and +4 exert the strongest influence on translational efficiency.
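The two key positions can be checked programmatically. The sketch below is illustrative only: the function name kozak_context and the strong/adequate/weak labels are assumptions (a commonly used shorthand, not terminology taken from this summary). It counts how many of the two features, a purine at -3 and a G at +4, are present around a given ATG.

```python
PURINES = {"A", "G"}

def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG that starts at atg_index (0-based).

    Hypothetical helper: the three labels are an assumed convention.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # The A of ATG is +1, so position -3 lies three nucleotides upstream.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    # Position +4 is the nucleotide immediately after the ATG.
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    hits = (minus3 in PURINES) + (plus4 == "G")
    return ("weak", "adequate", "strong")[hits]

# The full consensus has A at -3 and G at +4.
print(kozak_context("GCCGCCACCATGG", 9))
```

A missing flanking nucleotide (an ATG too close to either end of the sequence) is treated as not matching, which is a design choice of this sketch rather than anything stated in the paper.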

Upstream ATG codons occurred in fewer than 10% of the vertebrate mRNAs analyzed, and most of those mRNAs contained only a single upstream ATG. Proto-oncogene transcripts were a notable exception: nearly two-thirds contained upstream ATG codons, typically several, preceding the major open reading frame. Nonfunctional upstream ATG codons were frequently preceded by a pyrimidine at position -3, whereas functional initiator codons rarely had a pyrimidine in that position.
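A minimal sketch of how upstream ATG codons might be located in a leader sequence, reporting the nucleotide at position -3 for each. The function name upstream_atgs and the example sequence are made up for illustration; the index of the main start codon is assumed to be known in advance.

```python
def upstream_atgs(mrna: str, main_start: int) -> list:
    """Return (index, nucleotide at -3) for every ATG upstream of main_start.

    Hypothetical helper; main_start is the 0-based index of the A of the
    main initiator ATG. The -3 entry is None if the ATG is too close to
    the 5' end for that position to exist.
    """
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA spelling
    found = []
    pos = seq.find("ATG")
    while pos != -1 and pos < main_start:
        minus3 = seq[pos - 3] if pos >= 3 else None
        found.append((pos, minus3))
        pos = seq.find("ATG", pos + 1)  # step by 1 so overlapping hits are kept
    return found

# Invented example: one upstream ATG with a pyrimidine (C) at -3,
# followed by the main ATG in a consensus-like context at index 14.
example = "CCTATGCCGCCACCATGGCT"
print(upstream_atgs(example, 14))
```

The scan covers all three reading frames, since an upstream ATG can precede the major open reading frame in any frame.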

Leader length was analyzed for 346 vertebrate mRNAs with mapped transcriptional start sites; most leaders fell within 20-100 nucleotides. The distribution included 29 mRNAs with 30-39 nt leaders, 36 with 40-49 nt, 38 with 50-59 nt, 37 with 60-69 nt, and 40 with 70-79 nt. Only 4 mRNAs had leaders shorter than 10 nucleotides, while 68 had leaders of 100-199 nt and 19 had leaders of 200-299 nt. This clustering suggests functional constraints on leader length.
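To put the counts in proportion, the short sketch below tabulates the bins quoted above as fractions of the 346 mRNAs. Only the bins listed in this summary are included; the remaining mRNAs fall in bins the summary does not report, so they are shown as a single unlisted remainder rather than guessed. Variable names are illustrative.

```python
TOTAL = 346  # mRNAs with mapped transcriptional start sites

# Leader-length bins (nt) and counts exactly as quoted in the summary.
reported = {
    "<10": 4,
    "30-39": 29,
    "40-49": 36,
    "50-59": 38,
    "60-69": 37,
    "70-79": 40,
    "100-199": 68,
    "200-299": 19,
}

for label, n in reported.items():
    print(f"{label:>8} nt: {n:3d} mRNAs ({n / TOTAL:.1%})")

# mRNAs in bins that the summary does not list.
unlisted = TOTAL - sum(reported.values())
print(f"unlisted bins: {unlisted} mRNAs ({unlisted / TOTAL:.1%})")

# Combined share of the five adjacent bins from 30 to 79 nt.
core = sum(reported[k] for k in ("30-39", "40-49", "50-59", "60-69", "70-79"))
print(f"30-79 nt: {core} mRNAs ({core / TOTAL:.1%})")
```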

Parts used