A) Expressed Sequence Tag (EST) Analysis
The genome contains a large number of genes, yet only a subset of them is expressed in a given cell to produce mRNAs, which encode different proteins. These mRNAs are collectively called the transcriptome. mRNA can be reverse transcribed into cDNA, which provides evidence for the corresponding mRNA transcripts. Hence, mRNA and cDNA are central to gene expression profiling and transcriptome studies.
Expressed sequence tags (ESTs) are short, unverified nucleotide fragments, usually 200-800 bases long. They are generated by single-pass sequencing of either the 5’- or 3’-end of randomly selected clones from cDNA libraries, which are constructed from the mRNA of a specific cell line or tissue. EST data sets have been called the ‘poor man’s genome’ because they are widely used as a substitute for genome sequencing.
EST generation involves several steps. First, mRNAs isolated from a specific cell line are reverse transcribed into double-stranded cDNAs using reverse transcriptase. The cDNAs are then ligated into a plasmid vector and cloned to obtain multiple copies for library construction. Next, ESTs are generated by single-pass sequencing of randomly picked cDNA clones from the 5’- and 3’-ends, without reading the full length of the insert. Redundancy in the EST set can be reduced by normalization. EST data can be retrieved from several online resources, such as UniGene and dbEST at NCBI, TIGR, the Cancer Genome Anatomy Project, and ESTree.
The main stages of EST sequence analysis are described below:
1st: EST pre-processing
• Reduce overall noise in the EST data and improve the efficiency of downstream analysis.
• Identify and remove vector-sequence contaminants by screening against a non-redundant vector database.
• Mask repetitive elements (LINEs, SINEs, LTRs, SSRs) and low-complexity sequences.
• Generate high-quality EST sequences.
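The pre-processing bullets above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool works: the vector fragment `VECTOR_PREFIX`, the function names, and the thresholds `min_run` and `min_length` are all assumptions made for the example. Real pipelines screen against a curated vector database and use dedicated repeat maskers.

```python
import re

# Hypothetical vector/adaptor fragment, used only for illustration.
VECTOR_PREFIX = "GAATTCGGCACGAG"

def trim_vector(seq, vector=VECTOR_PREFIX):
    """Remove a leading vector fragment if the read starts with it."""
    seq = seq.upper()
    return seq[len(vector):] if seq.startswith(vector) else seq

def mask_homopolymers(seq, min_run=10):
    """Replace single-base runs of at least min_run bases (e.g. poly-A tails) with N."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: "N" * len(m.group()), seq)

def preprocess(seq, min_length=20):
    """Trim vector, mask low-complexity runs, and drop reads left with too few informative bases."""
    cleaned = mask_homopolymers(trim_vector(seq))
    informative = len(cleaned) - cleaned.count("N")
    return cleaned if informative >= min_length else None

# Example read: vector fragment + insert + poly-A tail.
read = VECTOR_PREFIX + "ATGGCTAGCTAGGCTTACGATCGGAT" + "A" * 12
cleaned_read = preprocess(read)
```

Masking with N, rather than deleting, keeps the coordinates of the remaining bases unchanged, which is convenient for the later clustering step.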
2nd: EST clustering and assembly
• Reduce redundancy by assigning overlapping ESTs from the same transcript to a single cluster that represents a specific gene.
• Two approaches: stringent and loose.
• Stringent clustering groups ESTs more accurately but produces shorter consensus sequences that cover less of the expressed gene.
• Loose clustering produces longer consensus sequences but is less accurate.
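The trade-off between stringent and loose clustering can be illustrated with a toy example. The sketch below assumes exact suffix-prefix overlaps and single-linkage grouping, with a hypothetical `min_overlap` parameter standing in for stringency. Actual clustering tools use alignment-based similarity that tolerates sequencing errors, so this shows the idea only.

```python
def overlap_length(a, b):
    """Length of the longest suffix of a that is also a prefix of b (exact match)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def cluster_ests(ests, min_overlap):
    """Single-linkage clustering: ESTs overlapping by >= min_overlap bases share a cluster."""
    parent = list(range(len(ests)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(ests)):
        for j in range(i + 1, len(ests)):
            best = max(overlap_length(ests[i], ests[j]),
                       overlap_length(ests[j], ests[i]))
            if best >= min_overlap:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(ests)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Made-up reads: 0 and 1 share 7 bases, 1 and 2 share only 4.
ests = ["ATGGCCATTGTAATGG", "GTAATGGGCCGCTGAA", "TGAACCTTAGGACTTC"]
stringent = cluster_ests(ests, min_overlap=6)  # [[0, 1], [2]]
loose = cluster_ests(ests, min_overlap=4)      # [[0, 1, 2]]
```

With the higher threshold, the weakly supported join between reads 1 and 2 is rejected, giving two shorter clusters. With the lower threshold all three reads merge into one longer cluster, at the risk that a 4-base overlap is coincidental.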
3rd: Conceptual translation of ESTs
• Conceptually translate EST sequences into protein by identifying ORFs or protein-coding regions.
• Tools include OrfPredictor, ESTScan, DECODER, etc.
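As a rough sketch of what conceptual translation involves, the code below translates an EST in all six reading frames with the standard genetic code and keeps the longest stretch that starts at a methionine and runs to the next stop codon (or to the end of the read, since ESTs are often partial). The function names are illustrative, and this is not the algorithm used by OrfPredictor, ESTScan, or DECODER, which also handle frameshifts and sequencing errors.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def longest_orf(est):
    """Return (strand, frame, peptide) for the longest M-initiated stretch in six frames."""
    est = est.upper()
    best = ("+", 0, "")
    for strand, seq in (("+", est), ("-", reverse_complement(est))):
        for frame in range(3):
            # Stop codons translate to '*', so splitting yields stop-free segments.
            for segment in translate(seq[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best[2]):
                    best = (strand, frame, segment[start:])
    return best

orf = longest_orf("CCATGGCTAAAGGTTGACC")  # ("+", 2, "MAKG")
```

Checking both strands matters because a 3’-end read may come from the strand complementary to the mRNA.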
4th: Functional annotation
• Predict protein functions by searching non-redundant protein, motif, and family databases.
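Motif-based annotation can be pictured as pattern matching against the translated peptide. The sketch below uses a two-entry toy motif table written as regular expressions. The table, its names, and the `annotate` function are assumptions for illustration, and the patterns should be checked against a curated motif database before any real use. Short patterns like these match often by chance, so hits are hints rather than confirmed functions.

```python
import re

# Toy motif table (illustrative only; real annotation queries curated
# protein, motif, and family databases).
MOTIFS = {
    "N-glycosylation site": r"N[^P][ST][^P]",
    "PKC phosphorylation site": r"[ST].[RK]",
}

def annotate(peptide):
    """Return (motif name, 0-based start, matched text) tuples sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        # The lookahead wrapper allows overlapping matches to be reported.
        for m in re.finditer(r"(?=(%s))" % pattern, peptide):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda h: (h[1], h[0]))

hits = annotate("MANGSAKSLR")
```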
Applications of EST Analysis
EST analysis has been used for genome mapping, mapping of gene-based site markers, and large-scale studies of co-expressed genes. EST techniques are also used in gene discovery, transcript and single nucleotide polymorphism (SNP) analysis, and functional annotation of putative gene products. Other applications including...