In a major advance for agricultural science, researchers have developed a new computational tool designed to swiftly and efficiently expose genetic diversity within DNA databases of various plant species.[1]
Credit: © 2024 KAUST; Heno Hwang.
The open-source platform is poised to accelerate the discovery of genetic variations that are key to developing crops with improved resilience, yield and nutritional value.
Harnessing advanced algorithms and the capabilities of high-performance computing (HPC), the KAUST team, led by plant genomicist Rod Wing, demonstrated the tool’s ability to detect small DNA differences, so-called single nucleotide polymorphisms (SNPs), across various strains of rice, maize, soybean and sorghum.
In the case of the rice investigation, for instance, the team employed the tool on a complex genetic dataset of DNA sequences from thousands of distinct accessions — a comprehensive “pan-genome” that the researchers had previously helped to assemble for Asian rice (Oryza sativa). Using this dataset along with the group’s novel analytical method, the KAUST researchers uncovered more than 2 million genetic variants previously overlooked by conventional interrogations of a single reference rice genome.
This marks an initial step towards unlocking new avenues in crop enhancement and sustainable agriculture, notes plant geneticist and study co-author Yong Zhou. “These hidden SNPs could now be utilized for breeding programs immediately and also to identify novel functional genes for agricultural traits,” he says.
The discovery of SNPs in this manner can also help to reveal genetic and evolutionary connections among different rice lineages. Recently, Wing and Zhou spearheaded the creation of a high-quality reference genome for Hassawi red rice, a crop indigenous to Saudi Arabia known for its resilience to local drought and high-salinity conditions. Using the tool, the researchers were able to establish a genetic link between Hassawi rice and a subgroup of rice that includes varieties originating from Australia, India and parts of Southeast Asia.[2]
Key to the performance of the tool, named the high-performance computing genome variant calling workflow (HPC-GVCW), is its ability to split the genome into discrete chunks and then use parallel processing to work through those chunks simultaneously, which keeps the analysis of large-scale, multidimensional genomics data tractable.
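The chunk-and-parallelize idea can be illustrated with a minimal Python sketch. It is not the HPC-GVCW code: the function names, the chunk size and the toy sequences are all hypothetical, and a simple base-by-base comparison stands in for a real variant caller such as GATK. A thread pool is used only to keep the example self-contained; a production workflow would spread the chunks across many HPC nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(chrom_lengths, chunk_size):
    """Yield (chrom, start, end) half-open intervals that tile each chromosome."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            yield chrom, start, min(start + chunk_size, length)


def call_snps_in_chunk(reference, sample, chrom, start, end):
    """Toy stand-in for a variant caller (assumption, not GATK):
    report positions in [start, end) where the sample base differs
    from the reference base. Assumes equal-length, pre-aligned sequences."""
    ref_seq, alt_seq = reference[chrom], sample[chrom]
    return [
        (chrom, pos, ref_seq[pos], alt_seq[pos])
        for pos in range(start, end)
        if ref_seq[pos] != alt_seq[pos]
    ]


def call_snps_parallel(reference, sample, chunk_size, workers=4):
    """Split every chromosome into chunks, process the chunks concurrently,
    then merge the per-chunk results into one sorted list."""
    chrom_lengths = {chrom: len(seq) for chrom, seq in reference.items()}
    chunks = list(split_into_chunks(chrom_lengths, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(
            lambda chunk: call_snps_in_chunk(reference, sample, *chunk), chunks
        )
        return sorted(snp for result in per_chunk for snp in result)


# Hypothetical toy data: two short "chromosomes" and one sample.
reference = {"chr1": "ACGTACGTAC", "chr2": "TTGACC"}
sample = {"chr1": "ACGAACGTAT", "chr2": "TAGACC"}

for chrom, pos, ref_base, alt_base in call_snps_parallel(reference, sample, chunk_size=4):
    print(f"{chrom}:{pos} {ref_base}>{alt_base}")
```

Because each chunk is processed independently, the chunks can be handed to as many workers as are available, and the merge step only has to concatenate and sort the results.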
“This reduces the execution time massively,” says study co-author Nagarajan Kathiresan, a computational scientist, “making it able to process 3,000 genomes within 24 hours.”
With more genomes now getting sequenced than ever before, Zhou adds, the new tool should prove invaluable for streamlining their analysis to empower next-generation crop breeding.
REFERENCES
- Zhou, Y., Kathiresan, N., Yu, Z., Rivera, L.F., Yang, Y., Thimma, M., Manickam, K., Chebotarov, D., Mauleon, R., Chougule, K., Wei, S., Gao, T., Green, C.D., Zuccolo, A., Xie, W., Ware, D., Zhang, J., McNally, K.L. & Wing, R.A. A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset. BMC Biology 22, 13 (2024).
- Sedeek, K., Mohammed, N., Zhou, Y., Zuccolo, A., Sanikommu, K., Kantharajappa, S., Al-Bader, N., Tashkandi, M., Wing, R.A. & Mahfouz, M.M. Multitrait engineering of Hassawi red rice for sustainable cultivation. Plant Science 341, 112018 (2024).
Journal: BMC Biology
DOI: 10.1186/s12915-024-01820-5
Article Title: A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset
Article Publication Date: 25-Jan-2024