Recently, Professor Zhao Shuhong’s team from the College of Veterinary Science and Medicine of HZAU developed rMVP, a software package for Genome-Wide Association Studies (GWAS) that is memory-efficient, visualization-enhanced and parallel-accelerated. It improves the computational efficiency of GWAS through block matrix computing, parallel acceleration, computational flow optimization and other strategies, uses memory mapping to reduce its dependence on memory resources, and provides flexible modules for processing and visualizing large genomic datasets, offering a new tool for GWAS research.
GWAS is an important method for identifying candidate genes for target traits by statistically testing the association between phenotypes and high-density markers covering the entire genome. Over the past decade, GWAS has been widely used in human, animal, aquatic, plant and microbial research and has identified many key genes affecting human diseases and economically important agricultural traits, becoming a powerful tool for dissecting the genetic mechanisms of complex traits. As sequencing costs fall and phenomics technology develops, the population sizes, numbers of phenotypes and marker densities used in GWAS are increasing rapidly. The mixed linear model is especially affected: it relies on a kinship matrix among individuals, whose memory consumption grows with n² (n is the number of individuals). Data on this scale pose many new challenges for GWAS.
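To make the quadratic growth concrete, here is a back-of-the-envelope sketch in Python. It assumes the kinship matrix is stored densely with 8-byte double-precision values, which is an illustrative assumption and not a statement about how any particular GWAS tool stores it. The function name `kinship_matrix_bytes` is hypothetical.

```python
def kinship_matrix_bytes(n_individuals: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense n x n kinship matrix (assumption: 8-byte doubles)."""
    return n_individuals * n_individuals * bytes_per_value


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        gib = kinship_matrix_bytes(n) / 1024**3
        print(f"n = {n:>7,} individuals -> {gib:10.3f} GiB")
```

Under this assumption, a tenfold increase in individuals costs a hundredfold increase in memory: roughly 0.75 GiB for 10,000 individuals and roughly 74.5 GiB for 100,000.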
rMVP is memory-efficient, visualization-enhanced and parallel-accelerated. Using memory mapping, rMVP reads data directly from disk, which effectively reduces memory consumption during GWAS analysis and avoids copying data when multiple processes compute in parallel. It accelerates computation with a dual parallel mode, "thread-level" and "data-level", based on OpenMP and Intel MKL, and avoids recomputing large matrices through block matrix splitting and global optimization of the GWAS workflow. As a result, rMVP runs about 5 to 20 times faster than the corresponding models in PLINK, GEMMA and FarmCPU_pkg.
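The general idea of combining memory mapping with block-wise processing can be sketched in Python using only the standard library. This is not rMVP's implementation (rMVP is an R package). The file layout (one byte per genotype, coded 0/1/2, stored marker by marker) and the names `write_genotypes`, `allele_frequencies` and `demo` are assumptions made for illustration.

```python
import mmap
import os
import tempfile


def write_genotypes(path, genotypes):
    """Write a markers x individuals matrix of 0/1/2 codes, one byte each."""
    with open(path, "wb") as fh:
        for marker_row in genotypes:
            fh.write(bytes(marker_row))


def allele_frequencies(path, n_individuals, block_size=2):
    """Per-marker allele frequency, reading block_size markers at a time."""
    freqs = []
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_markers = len(mm) // n_individuals
        for start in range(0, n_markers, block_size):
            stop = min(start + block_size, n_markers)
            # Only this block is copied into memory; the rest of the file
            # stays on disk until it is accessed.
            block = mm[start * n_individuals:stop * n_individuals]
            for i in range(stop - start):
                row = block[i * n_individuals:(i + 1) * n_individuals]
                # Each individual carries two alleles, hence 2 * n.
                freqs.append(sum(row) / (2 * n_individuals))
    return freqs


def demo():
    """Toy data: 3 markers x 4 individuals (hypothetical values)."""
    genotypes = [[0, 1, 2, 1], [2, 2, 2, 2], [0, 0, 0, 0]]
    fd, path = tempfile.mkstemp(suffix=".geno")
    os.close(fd)
    try:
        write_genotypes(path, genotypes)
        return allele_frequencies(path, n_individuals=4)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because the operating system pages mapped data in on demand, peak memory in this sketch depends on the block size and not on the total file size. Read-only mappings of the same file can also be shared between processes without duplicating the data.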
In addition, rMVP provides a set of high-quality visualization tools compatible with the results of PLINK, GEMMA, GCTA, TASSEL and other software. It can produce a variety of high-resolution figures, including phenotype distributions, marker density distributions, population structure, Manhattan plots and Q-Q plots. It uses a circular Manhattan plot to display multi-trait and multi-model results in an innovative way, and it is the first to introduce marker density information into the Manhattan plot.
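For readers unfamiliar with the Q-Q plot, the sketch below shows how its coordinates are commonly derived from a list of GWAS p-values. It is a generic illustration in Python, not rMVP's plotting code. The function name `qq_points` is hypothetical, the (rank - 0.5) / m plotting position is one common convention chosen here as an assumption, and all p-values are assumed to be greater than 0.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted ascending."""
    m = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected values under the null hypothesis of uniformly distributed
    # p-values, using the (rank - 0.5) / m plotting position (assumption).
    expected = sorted(-math.log10((rank - 0.5) / m) for rank in range(1, m + 1))
    return list(zip(expected, observed))


if __name__ == "__main__":
    for exp, obs in qq_points([0.5, 0.01, 0.2, 0.9]):
        print(f"expected {exp:.3f}  observed {obs:.3f}")
```

Points close to the diagonal indicate p-values that behave as expected under the null hypothesis. Points rising above it at the upper end suggest markers associated with the trait.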
The co-first authors of this study are Yin Lilin, a postdoctoral fellow from our university, and Zhang Haohao, a doctoral student from Wuhan University of Technology, with Prof. Liu Xiaolei and Prof. Li Xinyun as co-corresponding authors.
rMVP software: https://github.com/xiaolei-lab/rMVP
Paper link: https://www.sciencedirect.com/science/article/pii/S1672022921000504
Source: http://news.hzau.edu.cn/2021/0304/59488.shtml
Translated by: Li Juan
Supervised by: Xie Lujie