VRPG: an interactive web viewer for reference-projected pangenome graph
Naming scheme of input assemblies

The naming scheme of input assemblies in VRPG generally follows the PanSN prefix naming convention, with some relaxation. Each assembly name consists of three parts: a sample tag, a delimiter, and a haplotype tag. In our demonstration cases, we used "#" as the delimiter. For haplotype tags in the yeast reference pangenome graph, we used "HP0" to denote haplotypes of haploid or homozygous diploid strains, and "collapsed", "HP1", and "HP2" to denote the collapsed haplotype and the two phased haplotypes, respectively, of heterozygous diploid strains. For the Minigraph graph, "0" was used to denote haplotypes of haploid or collapsed samples, while "mat" and "pat" were used to denote the maternal and paternal haplotypes of phased diploid samples. For the Minigraph-Cactus and PGGB graphs, "1" and "2" were used to denote the two phased haplotypes.
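To make the three-part naming pattern concrete, the sketch below splits an assembly name into its sample tag and haplotype tag. It is a minimal illustration, not VRPG's own parser: the function name `parse_assembly_name`, the `AssemblyName` type, and the example name "sampleA#HP1" are hypothetical, and the sketch assumes the sample tag itself never contains the delimiter.

```python
from typing import NamedTuple


class AssemblyName(NamedTuple):
    """Sample tag and haplotype tag parsed from an assembly name (hypothetical type)."""
    sample: str
    haplotype: str


def parse_assembly_name(name: str, delimiter: str = "#") -> AssemblyName:
    """Split a name of the form <sample><delimiter><haplotype>.

    Hypothetical helper: raises ValueError unless the name contains exactly
    one delimiter with non-empty text on both sides.
    """
    parts = name.split(delimiter)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <sample>{delimiter}<haplotype>, got {name!r}")
    return AssemblyName(sample=parts[0], haplotype=parts[1])


# Haplotype tags used in the demonstration graphs, restated from the text above.
# This mapping is only an illustration; it is not part of VRPG.
DEMO_HAPLOTYPE_TAGS = {
    "yeast reference pangenome": {"HP0", "collapsed", "HP1", "HP2"},
    "Minigraph": {"0", "mat", "pat"},
    "Minigraph-Cactus / PGGB": {"1", "2"},
}

if __name__ == "__main__":
    print(parse_assembly_name("sampleA#HP1"))
    # AssemblyName(sample='sampleA', haplotype='HP1')
```

Requiring exactly one delimiter is a deliberately strict choice; a relaxed parser could instead split on the last delimiter if sample tags were allowed to contain it.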

Nodes and edges

Genomic segments are represented as graph nodes and drawn as colored blocks in the view window, with segments from different assemblies shown in different colors. For segments from the reference genome (defined when the reference pangenome graph was built), the size of each block is proportional to the actual length of the sequence segment. For segments from non-reference assemblies, blocks may be resized or even curved to improve readability, so their displayed size is not necessarily proportional to the actual segment length. Connections between genomic segments are represented as graph edges and drawn as thin lines.
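The node/edge model and the proportional sizing of reference blocks can be sketched with a small data structure. This is a simplified illustration under stated assumptions, not VRPG's internal representation: the names `Segment`, `Graph`, `reference_block_width`, and the `px_per_kb` scale are hypothetical, and edges are stored as unordered node-ID pairs, ignoring the strand orientation that real pangenome graphs carry.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A graph node: one genomic segment from one assembly (hypothetical type)."""
    node_id: str
    assembly: str
    length_bp: int
    is_reference: bool


@dataclass
class Graph:
    """Minimal node/edge container; edges are unordered node-ID pairs in this sketch."""
    nodes: dict[str, Segment] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_segment(self, seg: Segment) -> None:
        self.nodes[seg.node_id] = seg

    def connect(self, a: str, b: str) -> None:
        # Store each edge once regardless of argument order.
        self.edges.add((a, b) if a <= b else (b, a))


def reference_block_width(seg: Segment, px_per_kb: float) -> float:
    """Width of a reference block, proportional to its sequence length.

    Non-reference blocks are resized or curved by the viewer, so this
    sketch does not compute a proportional width for them.
    """
    if not seg.is_reference:
        raise ValueError(f"{seg.node_id} is not a reference segment")
    return seg.length_bp / 1000 * px_per_kb
```

For example, a 2,500 bp reference segment drawn at a hypothetical 40 px per kb would be 100 px wide, and doubling its length would double its width; no such guarantee holds for non-reference blocks.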

Basic view and highlighting

Clicking a graph node makes its block thicker and reports information about the corresponding genomic segment in the right panel. Clicking a graph edge colors it red; a second click reverts it to black. Selecting a genome assembly in the "Highlight" field highlights the assembly-to-graph path of that assembly in red.
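The highlighting behavior can be sketched as two small operations: collecting the nodes and edges traversed by an assembly's path, and toggling an edge between red and black on repeated clicks. This is an illustrative sketch, not VRPG's implementation; the function names are hypothetical, and the path is assumed to be an ordered list of node IDs with orientation ignored.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def _edge_key(a: str, b: str) -> Edge:
    """Order-independent key for an edge between two node IDs."""
    return (a, b) if a <= b else (b, a)


def path_to_highlight(path: List[str]) -> Tuple[Set[str], Set[Edge]]:
    """Nodes and edges traversed by an assembly-to-graph path (hypothetical helper)."""
    nodes = set(path)
    # Each consecutive pair of nodes on the path is one traversed edge.
    edges = {_edge_key(a, b) for a, b in zip(path, path[1:])}
    return nodes, edges


def toggle_edge(selected: Set[Edge], edge: Edge) -> str:
    """First click colors an edge red; a second click reverts it to black."""
    key = _edge_key(*edge)
    if key in selected:
        selected.remove(key)
        return "black"
    selected.add(key)
    return "red"
```

Keeping the clicked edges in a set makes the second click a simple membership test, which matches the described two-state (red/black) behavior.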

Node information

The Start and End positions are 1-based coordinates. The node depth for each assembly is calculated by aligning that assembly to the pangenome graph. A depth of 0 means the corresponding genomic segment is absent from the query assembly; a depth of 1 means the query assembly has a unique match to the segment represented by the node; and a depth of 2 or more means the query assembly contains multiple copies of that segment.
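The sketch below shows how the reported coordinates and depth values might be interpreted downstream. It assumes, beyond what the text states, that End is inclusive (as is common for 1-based coordinates); the helper names are hypothetical and not part of VRPG.

```python
def segment_length(start: int, end: int) -> int:
    """Length from 1-based Start/End, assuming End is inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def to_zero_based_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert 1-based inclusive coordinates to 0-based half-open (BED-style)."""
    return start - 1, end


def describe_depth(depth: int) -> str:
    """Interpret a node depth value for one query assembly."""
    if depth < 0:
        raise ValueError("depth cannot be negative")
    if depth == 0:
        return "absent"      # segment missing from the query assembly
    if depth == 1:
        return "unique"      # single matching copy
    return "multi-copy"      # two or more copies


if __name__ == "__main__":
    s, e = to_zero_based_half_open(2, 4)
    print("ACGTACGT"[s:e])  # CGT
```

The 0-based half-open form is convenient because it maps directly onto Python slicing and BED-style intervals.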

Contact

The VRPG software and the associated web server are developed by Evomics Lab at Sun Yat-sen University Cancer Center (SYSUCC). If you have any questions, please feel free to contact us via yuejiaxing[at]gmail[dot]com.