VRPG: an interactive web viewer for reference-projected pangenome graphs
Introduction

As high-quality reference assemblies continue to accumulate at both the population and species levels, researchers have begun to use pangenome graphs to represent population- and species-wide genomic variation in a sequence-resolved manner. Compared with a conventional single linear reference genome, a pangenome graph offers more sensitive and accurate read mapping and variant discovery, especially in the presence of sequence polymorphisms and structural variants. A representative population- or species-wide pangenome graph is therefore expected to yield new insights into genotype-to-phenotype associations and the search for missing heritability. An intuitive visualization of a pangenome graph can greatly help researchers explore and understand global and local genomic variation in its graph representation.

Here we present VRPG, a web-based interactive viewer for reference-projected pangenome graphs. VRPG provides dynamic and efficient support for exploring and annotating pangenome graphs built from hundreds of whole-genome assemblies along a reference-based coordinate system. Users can navigate the whole pangenome graph in real time through a web browser and look up the coordinates and copy number of any sequence segment, each of which is shown as a node in the displayed graph. Users can also simplify the graph to better visualize structural variation, which is useful for graphs that integrate large numbers of small variants, such as those created by Minigraph-Cactus and PGGB. VRPG also provides functions for interactively highlighting nodes, edges, and directed paths in the pangenome graph, which can be important for presenting and understanding pangenome graphs. Given the rapidly growing adoption of pangenome graphs in genomics studies, we believe VRPG will be a useful tool for helping researchers explore pangenome graphs and the biology behind them.

As demonstrations, we have pre-loaded four reference-projected pangenome graphs (one for yeast and three for human) on this web server. The yeast (Saccharomyces cerevisiae) pangenome graph was built from 163 genome assemblies from 142 strains. The three human (Homo sapiens) pangenome graphs were built from 90 genome assemblies from 46 samples.

The source code of VRPG is available at https://github.com/codeatcg/vrpg under the MIT open-source license, so users can visualize their own reference pangenome graphs locally.

The yeast reference pangenome graph

The yeast reference pangenome graph was constructed by the Evomics Lab from 163 yeast genome assemblies from 142 strains. Briefly, we took the S. cerevisiae reference genome (denoted 'SGDref') retrieved from the Saccharomyces Genome Database (SGD), together with 162 assemblies from our recently released S. cerevisiae Reference Assembly Panel (ScRAP; https://www.biorxiv.org/content/10.1101/2022.10.04.510633v2), and constructed the reference pangenome graph using minigraph with the command ‘minigraph -cxggs -l 5000’. With SGDref as the reference genome, we incrementally added the 162 assemblies of other S. cerevisiae strains to the graph according to their phylogenetic distances to SGDref. We then mapped each input assembly to the graph using minigraph with the command ‘minigraph -cxasm’ to calculate the corresponding mapping depth.
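The incremental construction described above can be sketched in Python as follows. This is only an illustration of how the input order might be prepared, not the script used to build the yeast graph. The file names, the distance values, and the choice to add the closest assemblies first are all assumptions; only the options ‘-cxggs -l 5000’ come from the text. The sketch builds the command line without running minigraph.

```python
# Options quoted in the text for graph construction.
GRAPH_OPTIONS = ["-cxggs", "-l", "5000"]


def build_graph_command(reference, distance_to_ref):
    """Return the minigraph argument list for incremental graph construction.

    distance_to_ref maps an assembly FASTA path to its phylogenetic distance
    from the reference (hypothetical values supplied by the caller).
    """
    # Assumption: assemblies closest to the reference are added first.
    # Ties are broken by file name so the order is deterministic.
    ordered = sorted(distance_to_ref, key=lambda path: (distance_to_ref[path], path))
    # minigraph takes the reference first, then the assemblies in the order
    # in which they should be added to the graph.
    return ["minigraph", *GRAPH_OPTIONS, reference, *ordered]


# Hypothetical file names and distances, for illustration only.
distances = {
    "strainC.fa": 0.031,
    "strainA.fa": 0.004,
    "strainB.fa": 0.012,
}
command = build_graph_command("SGDref.fa", distances)

# minigraph writes the graph to standard output, so redirect it to a file.
print(" ".join(command) + " > yeast.gfa")
```

Keeping the ordering step separate from the command itself makes it easy to check the input order before starting a long-running build.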

The human reference pangenome graphs

The Human Pangenome Reference Consortium (HPRC) constructed three human pangenome graphs using Minigraph, Minigraph-Cactus, and PGGB, respectively, based on 90 genome assemblies from 46 samples. We retrieved the human Minigraph-Cactus and PGGB pangenome graphs from the HPRC’s GitHub page (https://github.com/human-pangenomics/hpp_pangenome_resources) and downloaded a slightly updated version of the Minigraph pangenome graph provided by Heng Li (https://zenodo.org/record/6983934#.Y767A3ZByUk). For the Minigraph-Cactus and PGGB graphs, the segment mapping depth was calculated by the VRPG sub-module gfa2view. For the Minigraph graph, the whole-genome assemblies used to build the graph were downloaded from NCBI and mapped to the graph by Minigraph with the command ‘minigraph -cxasm’ to calculate the corresponding mapping depth.
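To make the idea of segment mapping depth concrete, here is a minimal Python sketch that counts how often each segment is traversed by assembly-to-graph alignments. It is not the VRPG or gfa2view implementation. It assumes that the alignments are in GAF format, that column 6 lists oriented segment IDs such as ‘>s1>s2’, and that depth is simply the number of traversals. The example records and the function name segment_depth are invented for illustration.

```python
import re
from collections import Counter

# One oriented step in a GAF path column, e.g. ">s12" or "<s7".
STEP = re.compile(r"([<>])([^<>]+)")


def segment_depth(gaf_lines):
    """Count how many times each segment is traversed across GAF records."""
    depth = Counter()
    for line in gaf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        path = fields[5]  # column 6 of GAF holds the path
        # Paths without '<' or '>' (e.g. a plain sequence name) yield no
        # steps here and are therefore ignored by this sketch.
        for _orientation, segment in STEP.findall(path):
            depth[segment] += 1
    return depth


# Invented GAF records (12 mandatory columns), for illustration only.
example = [
    "ctgA\t3000\t0\t3000\t+\t>s1>s2>s4\t3000\t0\t3000\t3000\t3000\t60",
    "ctgB\t2500\t0\t2500\t+\t>s1>s3>s4\t2500\t0\t2500\t2500\t2500\t60",
    "ctgC\t1200\t0\t1200\t+\t<s4<s2\t1200\t0\t1200\t1200\t1200\t60",
]
depth = segment_depth(example)
for segment in sorted(depth):
    print(segment, depth[segment])
```

A segment traversed in either orientation counts once per traversal, so a segment that an assembly visits several times contributes several counts.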

Users' own reference pangenome graph

To prepare your own reference pangenome graph and visualize it with VRPG, please refer to the instructions at https://github.com/codeatcg/VRPG.

Cite the work

Zepu Miao, Jia-Xing Yue. (2023) VRPG: an interactive web viewer for reference pangenome graph. bioRxiv (submitted; doi: https://doi.org/10.1101/2023.01.20.524991; demonstration available at https://www.evomicslab.org/app/vrpg; software available at https://github.com/codeatcg/VRPG)