Bamboo is one of the most important non-timber forest products in the world. It is the only major lineage of grasses native to forests and includes some of the fastest-growing plants on Earth. As a versatile source of raw materials, bamboo is also of notable economic and environmental significance. About 2.5 billion people depend on bamboo in their daily lives, and international trade in bamboo amounts to 2.5 billion US dollars per year. Among bamboo species, moso bamboo (Phyllostachys edulis) is the most common, accounting for ~70% of the total bamboo growth area in Asia. Because moso bamboo has economic, ecological and social value, a broad understanding of its genomic features is of great importance.
The content of BGP covers two main aspects: 1) the draft genome of moso bamboo; 2) comparative genome-wide analyses of bamboo and other grass species. The moso bamboo genome contains 24 pairs of chromosomes (2n = 48) and has the characteristics of a diploid. We conducted a flow cytometry analysis and estimated a genome size of 2.075 Gb (2C = 4.24 pg), which is very close to the estimate in a previous report. Because it is difficult to generate an inbred line of moso bamboo, owing to its infrequent sexual reproduction and the long intervals between flowering events, we selected five plants from a single individual rhizome of the moso bamboo ecotype (Ph. edulis) and performed whole-genome shotgun sequencing. We generated 295 Gb of raw sequence data (approximately 147-fold coverage), including Illumina short reads and 10,327 pairs of BAC end sequences. The final assembly of 2.05 Gb was generated using the de novo Phusion-meta assembly pipeline that was developed in this study. The N50 length of the assembled scaffolds was over 328 kb, and about 80% of the assembly was contained in 5,499 scaffolds longer than 62 kb (Table 1).
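The link between the flow cytometry reading and the genome size in base pairs can be illustrated with a short calculation. This is only a sketch: the conversion factor of 0.978 × 10^9 bp per picogram is the commonly used value and is an assumption here, since the text does not state which factor was applied, and the function name is hypothetical.

```python
# Assumed conversion factor (not stated in the text): 1 pg of DNA ~ 0.978e9 bp.
BP_PER_PG = 0.978e9


def genome_size_bp(two_c_pg: float) -> float:
    """Estimate the haploid (1C) genome size in bp from a 2C DNA content in pg."""
    one_c_pg = two_c_pg / 2.0  # 2C is the DNA content of a diploid nucleus
    return one_c_pg * BP_PER_PG


# 2C value reported for moso bamboo
size = genome_size_bp(4.24)
print(f"Estimated genome size: {size / 1e9:.2f} Gb")
```

With the assumed factor, the result is about 2.07 Gb, in line with the 2.075 Gb reported from flow cytometry; the small difference may come from rounding of the 2C value or a slightly different conversion factor.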
Table 1 Statistics of assembly and annotation for the moso bamboo genome

|Statistic|Value|
|---|---|
|Total length*|2,051,719,643 bp|
|N50 length (contigs)|11,882 bp|
|N50 length (scaffolds)|328,698 bp|
|N80 length (scaffolds)|62,052 bp|
|Number of scaffolds (>N80 length)|5,499|
|Largest scaffold|4,869,017 bp|
|Number of protein-coding genes|31,987|
|Average length of protein-coding genes|3,350 bp|
|Total size of transposable elements|1,210,862,930 bp|
|Content of transposable elements|59.0%|

*Final scaffolds shorter than 500 bp were excluded.
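For readers unfamiliar with the N50 and N80 statistics in Table 1, the following is a minimal sketch of how such values are conventionally computed from a list of scaffold lengths. The function names and the toy lengths are illustrative assumptions, not the project's actual pipeline or data.

```python
def nx_length(lengths, fraction=0.5):
    """Return the Nx length: walking from the longest sequence down, the length
    at which the running total first reaches `fraction` of the assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


def count_at_least(lengths, cutoff):
    """Count the sequences whose length is at least `cutoff`."""
    return sum(1 for length in lengths if length >= cutoff)


# Toy scaffold lengths (hypothetical), total = 300
toy = [100, 80, 50, 40, 30]
n50 = nx_length(toy, 0.5)  # 100 + 80 = 180 >= 150, so N50 = 80
n80 = nx_length(toy, 0.8)  # 100 + 80 + 50 + 40 = 270 >= 240, so N80 = 40
print(n50, n80, count_at_least(toy, n80))
```

Read this way, an N80 of 62,052 bp together with 5,499 scaffolds at or above that length means that those 5,499 scaffolds account for about 80% of the assembled sequence.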
We predicted 31,987 protein-coding genes in the moso bamboo genome, with the support of RNA sequencing (RNA-seq) data (127 Gb) obtained from 7 bamboo tissues and 8,253 bamboo full-length cDNA sequences. Most basic metabolic pathways among the grass species were compared by aligning the annotated protein sequences to the KEGG data set, which showed high similarity between bamboo and rice. We also annotated 1,167 tRNA, 279 rRNA, 321 small nucleolar RNA, and 173 small nuclear RNA genes. De novo repeat annotation showed that approximately 59% of the moso bamboo genome consists of transposable elements, a proportion much higher than the previous estimate (23.3%) from the analysis of survey sequences [9]. The most abundant repeats were long-terminal repeat elements (LTRs), including Gypsy-type LTRs (24.6%) and Copia-type LTRs (12.3%). When we used the sequences of the eight moso bamboo BACs, we observed that 52% of the sequence consisted of transposable elements.
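As a quick consistency check, the reported transposable-element content can be recomputed from the two base-pair totals given in Table 1. This is a small sketch using only figures from the text; the variable names are illustrative.

```python
# Figures taken from Table 1
TOTAL_ASSEMBLY_BP = 2_051_719_643
TE_BP = 1_210_862_930

# Share of the assembly annotated as transposable elements
te_percent = 100 * TE_BP / TOTAL_ASSEMBLY_BP
print(f"TE content: {te_percent:.1f}%")

# Combined share of the two LTR classes named in the text
gypsy_percent = 24.6
copia_percent = 12.3
ltr_percent = gypsy_percent + copia_percent
print(f"Gypsy + Copia LTRs: {ltr_percent:.1f}% of the genome")
```

The first value reproduces the 59.0% listed in Table 1, and the second shows that the two LTR classes together make up 36.9% of the genome, which is more than half of all annotated transposable elements.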