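Microsatellites are a class of highly repetitive sequences in eukaryotic genomes. They are generally located in introns and intergenic regions, but gene coding regions also contain a certain number of them. To investigate whether genes containing microsatellites tend to be expressed at lower frequency, 421 725 poplar EST sequences from the NCBI public database were analyzed. Of these, 53 524 ESTs contained microsatellites, a proportion of 12.69 %. Among the 45 555 genes annotated in the poplar genome, 6 953 contained microsatellites, or 15.26 % of all genes. A significance test of the difference between the two sample frequencies showed that microsatellites occur significantly less often in expressed sequences than in annotated genes (p<0.01), which indicates that genes containing microsatellites are, overall, expressed at lower levels. Analysis of the characteristics of microsatellites in the expressed sequences showed that trinucleotide repeats were the most abundant. Here, the authors propose the hypothesis that genes containing microsatellites may have lower overall expression levels, and test this hypothesis using the large volume of DNA sequences available in public poplar databases.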
Abstract
Microsatellites are highly repetitive sequences in eukaryotic genomes. They are commonly found in intronic and intergenic regions, but genic regions also contain a number of them. Microsatellites are among the most variable sequences in the genomes of different organisms, and mutations in microsatellite sequences within genes can cause those genes to produce truncated or completely different proteins. Genes that contain microsatellites are therefore expected to be strongly affected by selection. A low expression level is proposed as one of the mechanisms that relax selection against such genes and help them persist. In this paper, we analyzed 421 725 poplar ESTs from the publicly available NCBI database and found that 53 524 of them contained microsatellites, accounting for 12.69 % of the investigated ESTs. In contrast, of the 45 555 gene models annotated from the poplar genome sequence, 6 953 contained microsatellites, accounting for 15.26 % of all genes. A test of the difference in frequency between the EST dataset and the gene dataset showed that microsatellites occur at a significantly lower frequency in ESTs than in annotated genes (p<0.01). These results indicate that the frequency of microsatellites in expressed genes is lower than the level expected across all genes. The characteristics of microsatellites in ESTs were also explored, and trinucleotide repeats were found to be the most frequent microsatellites in ESTs. This paper proposes, for the first time, the hypothesis that genes containing microsatellites might have low expression levels, and analyzes a large number of ESTs to test it. The study provides important evidence for understanding how microsatellites persist in genes.
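The frequency comparison reported above can be reproduced from the four counts in the abstract. The sketch below is illustrative only: the abstract does not say which test was used, so the pooled two-proportion z-test here is an assumption, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (an assumed choice of test).

    Returns (p1, p2, z, p_lower), where p_lower is the one-sided
    p-value for the alternative that proportion 1 < proportion 2.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis p1 == p2.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF at z, written with the complementary error function.
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return p1, p2, z, p_lower


# Counts taken from the abstract: ESTs with microsatellites / all ESTs,
# and annotated genes with microsatellites / all annotated genes.
est_freq, gene_freq, z, p_value = two_proportion_z_test(53524, 421725, 6953, 45555)
print(f"ESTs: {est_freq:.2%}, genes: {gene_freq:.2%}, z = {z:.2f}, p = {p_value:.3g}")
```

The two proportions round to 12.69 % and 15.26 %, matching the reported values, and under this assumed test the z statistic is strongly negative, which is consistent with the reported p<0.01.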
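Funding
Received: 2010-08-22. Revised: 2010-12-04. Funding: Key Project of the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (10KJA180018); National Natural Science Foundation of China (31070543, 30971609). About the authors: Liu Jingjing (刘菁菁, born 1985), PhD candidate. *Yin Tongming (尹佟明, corresponding author), professor, Changjiang Scholar. Email: tmyin@njfu.com.cn.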