Introduction to the Bamboo Genome Project (BGP)

Bamboo is one of the most important non-timber forest products in the world. It represents the only major lineage of grasses that is native to forests and includes some of the fastest-growing plants on Earth. As a versatile source of raw materials, bamboo is also of notable economic and environmental significance: about 2.5 billion people worldwide depend on it in their daily lives, and international trade in bamboo amounts to 2.5 billion US dollars per year. Among bamboo species, moso bamboo (Phyllostachys edulis) is the most common, accounting for about 70% of the total bamboo growth area in Asia. Because the species has economic, ecological and social value, a broad understanding of its genomic features is of great importance.

The BGP covers two main aspects: 1) the draft genome of moso bamboo; 2) genome-wide comparative analyses of bamboo against other grass species. The moso bamboo genome contains 24 pairs of chromosomes (2n = 48) and has the characteristics of a diploid. We conducted a flow cytometry analysis and estimated a genome size of 2.075 Gb (2C = 4.24 pg), which is very close to the estimate in a previous report. Because it is difficult to generate an inbred line of moso bamboo, owing to its infrequent sexual reproduction and the long intervals between flowering events, we selected five plants growing from a single rhizome of the moso bamboo ecotype (Ph. edulis) and performed whole-genome shotgun sequencing. We generated 295 Gb of raw sequence data (approximately 147-fold coverage), including Illumina short reads and 10,327 pairs of BAC end sequences. The final assembly of 2.05 Gb was generated with the de novo Phusion-meta assembly pipeline developed in this study. The N50 length of the assembled scaffolds was over 328 kb, and about 80% of the assembly was contained in 5,499 scaffolds longer than 62 kb (Table 1).
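The relation between the flow cytometry reading and the genome size in base pairs can be sketched as follows. This is a minimal illustration, not the project's own calculation: it assumes the commonly used conversion of roughly 978 Mb per picogram of DNA, and the function names are hypothetical. Sequencing depth is simply raw bases divided by genome size, so the reported fold coverage depends on which genome size is used as the denominator, which the text does not state.

```python
# Assumed conversion factor (not given in the text): 1 pg of DNA ~ 978 Mb.
MB_PER_PG = 978.0


def genome_size_gb(two_c_pg: float) -> float:
    """Haploid (1C) genome size in Gb from a 2C DNA content in pg."""
    one_c_pg = two_c_pg / 2.0          # 2C is the content of two genome copies
    return one_c_pg * MB_PER_PG / 1000.0


def fold_coverage(raw_gb: float, genome_gb: float) -> float:
    """Average sequencing depth: raw sequence divided by genome size."""
    return raw_gb / genome_gb


size = genome_size_gb(4.24)            # 2C = 4.24 pg, as reported above
print(f"estimated 1C genome size: {size:.3f} Gb")
print(f"depth against that size:  {fold_coverage(295.0, size):.0f}x")
print(f"depth against 2.0 Gb:     {fold_coverage(295.0, 2.0):.0f}x")
```

Under the assumed factor, the result lands within a few megabases of the 2.075 Gb quoted above; the small difference comes from the conversion constant chosen.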

Table 1    Statistics of assembly and annotation for the moso bamboo genome
Statistic                                   Value
Total length*                               2,051,719,643 bp
N50 length (contigs)                        11,882 bp
N50 length (scaffolds)                      328,698 bp
N80 length (scaffolds)                      62,052 bp
Number of scaffolds (>N80 length)           5,499
Largest scaffold                            4,869,017 bp
GC content                                  43.9%
Number of protein-coding genes              31,987
Average length of protein-coding genes      3,350 bp
Total size of transposable elements         1,210,862,930 bp
Content of transposable elements            59.0%
*Final scaffolds shorter than 500 bp were excluded.
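The N50 and N80 values in Table 1 are conventionally read as follows: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 80%) of the assembly. The sketch below assumes that standard definition; the function name and the toy lengths are hypothetical and are not taken from the moso bamboo assembly.

```python
def nx_length(lengths, fraction=0.5):
    """Return the Nx length: the length of the sequence at which the
    cumulative sum, taken from longest to shortest, first reaches
    `fraction` of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reached if fraction > 1


# Hypothetical scaffold lengths, total 250.
toy = [90, 70, 50, 30, 10]
print(nx_length(toy, 0.5))  # 90 + 70 = 160 >= 125, so N50 is 70
print(nx_length(toy, 0.8))  # 90 + 70 + 50 = 210 >= 200, so N80 is 50
```

Read this way, the N80 row of the table says that the 5,499 longest scaffolds, each at least 62,052 bp, together hold about 80% of the assembled sequence.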

We predicted 31,987 protein-coding genes in the moso bamboo genome, supported by RNA sequencing (RNA-seq) data (127 Gb) obtained from 7 bamboo tissues and by 8,253 bamboo full-length cDNA sequences. Most basic metabolic pathways among the grass species were compared by aligning the annotated protein sequences to the KEGG data set, which showed high similarity between bamboo and rice. We also annotated 1,167 tRNA, 279 rRNA, 321 small nucleolar RNA and 173 small nuclear RNA genes. De novo repeat annotation showed that approximately 59% of the moso bamboo genome consists of transposable elements, a proportion much higher than the previous estimate (23.3%) from the analysis of survey sequences (ref. 9). The most abundant repeats were long terminal repeat (LTR) elements, including 24.6% Gypsy-type LTRs and 12.3% Copia-type LTRs. Using the sequences of the eight moso bamboo BACs, we observed that 52% of the genomic sequence consisted of transposable elements.
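The repeat percentages quoted here can be cross-checked against Table 1 with simple arithmetic. The snippet below is only an illustrative check using the published figures; the variable names are hypothetical.

```python
# Figures taken from Table 1 and the paragraph above.
assembly_bp = 2_051_719_643      # total assembly length
te_bp = 1_210_862_930            # total size of transposable elements

te_percent = 100.0 * te_bp / assembly_bp
print(f"transposable element content: {te_percent:.1f}%")

# Share of the genome attributed to the two LTR superfamilies.
gypsy_percent = 24.6
copia_percent = 12.3
ltr_percent = gypsy_percent + copia_percent
print(f"Gypsy + Copia LTRs: {ltr_percent:.1f}% of the genome")
print(f"share of all TE content: {100.0 * ltr_percent / te_percent:.0f}%")
```

The first figure reproduces the 59.0% in Table 1. The two LTR superfamilies together make up somewhat more than three fifths of the annotated transposable element content.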
