The RPKM values of rice genes ranged from 0 to over 10^4. Genes involved in photosynthesis in the shoot or in regulation of physiological metals in the root were highly expressed, whereas about 30% of genes had RPKM < 1. The saturation of sequencing in rice was almost the same as in a previous mammalian analysis. According to that analysis, one transcript in a cell corresponds to 1 to 3 RPKM, so genes having RPKM < 1 might rarely be expressed. However, data on the RNA content of each rice cell are required to calculate the number of RNA molecules present. As rice tissue contains cells of various sizes and types, the relationship between the number of RNA molecules present and their RPKM has not yet been accurately determined.
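For readers unfamiliar with the measure, RPKM stands for reads per kilobase of exon model per million mapped reads. The following is a minimal sketch of that calculation, not the pipeline used in this study; the function names `rpkm` and `fraction_below` and all numbers in the example are hypothetical and chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count         -- reads mapped to the gene's exon model
    gene_length_bp     -- summed exon length of the gene, in bp
    total_mapped_reads -- all mapped reads in the library
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fraction_below(rpkm_values, threshold=1.0):
    """Fraction of genes whose RPKM is strictly below `threshold`."""
    values = list(rpkm_values)
    if not values:
        raise ValueError("no RPKM values given")
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical gene: 50 reads on a 2,000 bp exon model
# in a library of 25 million mapped reads.
example = rpkm(50, 2000, 25_000_000)  # 1.0
```

Under the mammalian estimate cited above (one transcript per cell corresponds to 1 to 3 RPKM), a gene at the level of this hypothetical example would sit at roughly one transcript per cell or less.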
When we used four technical replicates, about 20% of genes expressed at relatively low levels did not reach their final RPKM, suggesting that these model settings were insufficient for calculating the real RPKM of genes expressed at low levels. Summing of the four technical replicates covered 70.1% of all annotated regions, corresponding to 15.8% of the 389 Mb rice genome. This result suggests that these regions were transcriptionally active under the experimental conditions. Even though the cumulative coverage was close to a plateau, the coverage continued to rise gradually; the accumulation of about 95 million reads covered 77.0% of annotated regions, suggesting that some of the transcripts expressed at low levels were not sequenced.
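The cumulative coverage described above can be sketched as the fraction of annotated bases overlapped by the pooled reads as each replicate is added. This is an illustrative sketch only, assuming reads and annotated regions on one sequence are given as half-open `(start, end)` intervals; the function names and the toy coordinates are hypothetical and do not come from the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(reads, regions):
    """Count bases of the annotated regions overlapped by at least one read."""
    reads = merge_intervals(reads)
    regions = merge_intervals(regions)
    total = 0
    i = j = 0
    # Two-pointer sweep over the two sorted, non-overlapping interval lists.
    while i < len(reads) and j < len(regions):
        lo = max(reads[i][0], regions[j][0])
        hi = min(reads[i][1], regions[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if reads[i][1] < regions[j][1]:
            i += 1
        else:
            j += 1
    return total


def cumulative_coverage(replicates, regions):
    """Covered fraction of annotated bases after pooling 1, 2, ... replicates."""
    region_len = sum(e - s for s, e in merge_intervals(regions))
    if region_len == 0:
        raise ValueError("no annotated bases given")
    pooled, fractions = [], []
    for reads in replicates:
        pooled.extend(reads)
        fractions.append(covered_bases(pooled, regions) / region_len)
    return fractions


# Toy data: two annotated regions (200 bp in total) and four replicates
# of 36 bp reads; the coordinates are invented for illustration.
regions = [(0, 100), (200, 300)]
replicates = [
    [(0, 36), (50, 86)],
    [(30, 66), (210, 246)],
    [(250, 286)],
    [(0, 36)],
]
fractions = cumulative_coverage(replicates, regions)
```

In this toy case the fourth replicate adds no new bases, which is the kind of plateau the saturation analysis looks for; a curve that keeps rising slowly, as reported here, indicates that further reads still reach previously uncovered bases.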
However, the gradual increase in coverage might have been due to the presence of contaminating genomic DNA or a very small amount of partly processed nuclear RNAs, because intron retention is the most prevalent form of alternative splicing in rice, as it is in Arabidopsis thaliana. Thus, we consider that the summing of four technical replicates of 36 bp reads, corresponding to a total of 1 Gbp of filtered sequences, covered almost all the transcripts in the rice cell under the experimental conditions, although more reads are required to obtain the final RPKM of genes expressed at relatively low levels.
Identification of unannotated transcripts by mRNA sequencing
mRNA-Seq provides information on all transcribed genes without the need to rely on annotation, whereas array technology is limited to providing data only on previously annotated genes and on previously identified ESTs with no known homologies that have corresponding probes on the array.
On the basis of the piling up of mapped reads, we predicted 2,795 and 3,082 currently unannotated transcripts in RAP-DB. Of the RAP2 unannotated transcripts, 54.6% in shoot and 53.8% in root had not been annotated by Michigan State University, suggesting that these transcripts were novel. Unannotated transcripts included extended parts of previously annotated genes. Extension of 5' exons might create a different start codon or shift the reading frame of previously annotated genes.