Recently, the cotton genetic improvement team of HZAU published a paper entitled “Cotton pan-genome retrieves the lost sequences and genes during domestication and selection”, presenting the cotton genetic variation data set with the widest range of variant types reported so far. The paper also analyzes the genomic basis of cotton domestication and improvement at multiple scales, providing new loci for biological research on the formation of important traits in cotton and offering new ideas for the precise improvement of those traits from a pan-genomics perspective.
Cotton is an important cash crop planted widely around the world and the main source of natural textile fiber. Breeders have long aimed to develop an ideal cotton plant type that combines high yield and high fiber quality with strong disease and insect resistance and tolerance to high temperature. In recent years, cotton genomics research has generated a large amount of genomic data, which has been used to analyze the genetic contribution of artificial domestication to trait improvement and to identify a number of loci related to agronomic traits.
In a previous study, the research team assembled high-quality reference genomes of G. hirsutum “TM-1” and G. barbadense “3-79” (Nature Genetics, 2019), which provided good reference sequences for large-scale analysis of population genome variation and for the identification of elite alleles. However, an analysis that relies on a single reference genome misses many genetic variations, so it is necessary to survey the genetic diversity among different types of G. hirsutum and G. barbadense from the perspective of the population genome (pan-genome).
The variome provides data resources for functional genomics research
Multi-scale analysis of QTLs related to agronomic traits and genomic divergence in cotton domestication and improvement
In the study, the team constructed a genetic variome from 1,913 cotton samples, comprising 63 million single nucleotide polymorphisms (SNPs), 4.9 million small insertions/deletions (InDels) and 290,000 structural variations (SVs). They investigated cotton population traits comprehensively at multiple scales. After analyzing genomic divergence during domestication and improvement, the team identified 162 QTLs related to 16 traits, including fiber quality, yield and flowering time.
Analysis of the genetic basis of fiber domestication and improvement by pan-genome mapping
PAV selection signals in domestication and improvement
Using a reference genome alignment strategy, the research team constructed the pan-genome of G. hirsutum (3,388 Mb), which includes 63,489 (61.8%) core genes and 39,278 (38.2%) variable genes. They also constructed the pan-genome of G. barbadense (2,575 Mb), which includes 68,789 (85.8%) core genes and 11,359 (14.2%) variable genes. Gene frequency analysis in wild and cultivated species showed that 6,231 genes were selectively retained or filtered out during domestication and improvement.
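The core/variable split reported above can be illustrated with a small sketch. It assumes a hypothetical presence/absence table (gene name mapped to the set of accessions that carry it) and treats a gene as core only if it is present in every accession. The thresholds and pipeline actually used in the paper are not described in this article, so the function names and toy data below are illustrative assumptions only.

```python
def classify_genes(pav, accessions):
    """Split genes into core (present in all accessions) and variable.

    `pav` is a hypothetical mapping: gene name -> accessions carrying it.
    """
    all_accessions = set(accessions)
    core, variable = [], []
    for gene, present_in in pav.items():
        if all_accessions <= set(present_in):
            core.append(gene)
        else:
            variable.append(gene)
    return core, variable


def share(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Toy data, invented for illustration (not from the paper).
accessions = ["acc1", "acc2", "acc3"]
pav = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc3"},
    "geneC": {"acc1", "acc2", "acc3"},
    "geneD": {"acc2"},
}
core, variable = classify_genes(pav, accessions)
print("core:", core)
print("variable:", variable)

# The percentages quoted in the article follow from the reported gene counts.
print("G. hirsutum core share:", share(63489, 63489 + 39278))
print("G. barbadense core share:", share(68789, 68789 + 11359))
```

Applied to the reported counts, `share` reproduces the 61.8% and 85.8% core-gene figures quoted in the text.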
Finally, using the pan-genome data, the researchers examined how the frequencies of several genes associated with traits such as fiber quality changed during domestication and improvement.
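As a rough illustration of what a gene frequency change between groups means, the sketch below compares the fraction of wild and cultivated accessions that carry a gene. The accession names, the carrier set and the helper functions are hypothetical; the article does not state how the team computed or tested these frequency changes.

```python
def presence_frequency(carriers, group):
    """Fraction of accessions in `group` that carry the gene."""
    group = set(group)
    if not group:
        raise ValueError("group must not be empty")
    return len(group & set(carriers)) / len(group)


def frequency_shift(carriers, wild, cultivated):
    """Cultivated frequency minus wild frequency (positive = gene gained)."""
    return presence_frequency(carriers, cultivated) - presence_frequency(carriers, wild)


# Toy data, invented for illustration (not from the paper).
wild = ["w1", "w2", "w3", "w4"]
cultivated = ["c1", "c2", "c3", "c4"]
gene_carriers = {"w1", "c1", "c2", "c3"}

wild_freq = presence_frequency(gene_carriers, wild)
cultivated_freq = presence_frequency(gene_carriers, cultivated)
shift = frequency_shift(gene_carriers, wild, cultivated)
print(wild_freq, cultivated_freq, shift)
```

A gene whose frequency rises from wild to cultivated accessions is a candidate for favorable selection, and one whose frequency falls is a candidate for loss. A real analysis would also need a significance test and far larger groups.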
The paper was published in Genome Biology, an international academic journal in genomics. Li Jianying, a postdoctoral fellow at HZAU, is the first author, and Prof. Wang Maojun and Prof. Jin Shuangxia of the cotton genetic improvement team are co-corresponding authors. Prof. Zhang Xianlong participated in the research design and the revision of the paper. The research was supported by the National Natural Science Foundation of China and the China Postdoctoral Science Foundation.
Source: http://news.hzau.edu.cn/2021/0424/60034.shtml
Translated by: Pan Yamei
Supervised by: Jin Bei