Glycine Pan-Genome Page

In this form you can BLAST your sequences to individual cultivar transcript or gene sequences in the SoyBase Soybean Cultivar Genomes Collection or perform a Pan-genome analysis by comparing your input sequence to all the protein or nucleic acid sequences of all of those contained in the SoyBase Soybean Cultivar Genomes Collection. You can also download individual cultivar's genomic or gene model sequences

From this form you can download soybean cultivar genomic, gene model and protein sequences from the SoyBase soybean cultivar genomes collection. The results will be made into a file and transfered to your computer to save.

In contrast to the pan-genome, pan gene sequences represent the gene compliment of each of the genomes considered in a pan-genome set. Currently the pan gene complement of Williams 82, Zhonghuang 13, W05, PI483463 and Lee have been collected and homologous genes assembled into pan-gene sets. Pan-gene analysis of multiple genomes often identify gene models that are unique to one or more of the genomes compared. In these cases a query gene may not be a member of a pan-cluster if it is unique to that genome. Also it is possible that every genome in the comparison may not contain a homolog of the query gene model. Identification of unique genes or genes that have been duplicated in some genomes may shed light on phenotypes shared by those cultivars.

In the tool below, you can enter a name of a gene model from Williams 82 (Glyma.Wm82), Zhonghuang 13 (SoyZH13), W05 (Glysoja), Lee (GlymaLee), PI483463 (GlysoPI483463) or the GenBank RefSeq (LOC) genome assemblies. The tool will return the pangene set that the query gene model is a member and all of the other members of the set. The assembly and annotation version of the gene models will also be returned.

Advanced API Features

Pangene service (API)

We have provided an API to programmatically return sets of corresponding genes, given a query gene.

There are two variants of the service.

Provide a gene ID, with or without the associated prefix (genus-species.accession.assembly.annotation).

Example:
https://soybase.org/api/v1/panclusters/main/GlysoPI483463.05G006100
Provide a gene ID from one accession and annotation, and a second accession and annotation version from which you want the corresponding gene.

Example:
https://soybase.org/api/v1/panclusters/main/glyma.Wm82.gnm4.ann1.Glyma.05G006700/glyso.PI483463.gnm1.ann1

Please note, for bulk access, we recommend downloading the entire pangene set, from the six genomes described above.

To help keep the assemblies and annotations distinct, we use the following structure for gene names:

glyma.Wm82.gnm2.ann1.Glyma.01G123600.1

The first field, glyma or glyso, indicates the genus and species (Glycine max or Glycine soja).
The second field, Wm82, indicates the genotype - e.g. Wm82, Lee, Zh13, etc.
The third field indicates the assembly number, e.g. gnm1, gnm2, gnm4 for assembly numbers 1, 2, or 4.
The fourth field indicates the annotation, e.g. ann1 or ann2.
The fifth field indicates the gene name within the assembly and annotation, e.g. Glyma13g29860, GlymaLee.13G190100, or LOC100809005 (which all correspond via pangene cluster SoyPan000430).

The pan genome of soybean can viewed using the Genome Context Viewer. This tool allows you to see the gene context of a selection of soybean cultivars. Each genome is displayed centered on a query gene model. The tool allows you to visualize the genome context of the six genomes described above and one from Liu et al. 2020 that examined 26 soybean accessions. Detailed descriptions of how to use this tool are available at the Legume Federation website and at the project's GitHub repository

This pangenome data set, for download, includes genome-wide variant (VCF) data for 1007 soybean accessions, as well as predicted variants (and effects) within genes, indicating which accessions have the variants from Torkamaneh et al., 2020

This pangenome data set, for download, includes genomic sequence and gene variants present in 204 diverse accessions of Glycine max but not present in the reference assembly G. max Williams 82 v4. From Torkamaneh, Lemay, and Belzile, (2021): "The Pan-genome of the Cultivated Soybean (PanSoy) Reveals an Extraordinarily Conserved Gene Content."

Bayer et al. sequenced 1000 accessions from the USDA Soybean Germplasm Collection and assembled the genomes using the cultivar Lee as reference. The collection included wild and cultivated strains to assess genome-wide gene changes due to domestication.

SoyBase Genome Viewer representation of Lee gene presence/absence in 1000 accessions

Files associated with this project

Funded by the USDA-ARS. Developed by the USDA-ARS SoyBase and Legume Clade Database group at the Iowa State University, Ames, IA

U.S. Public Domain

Cultivar Lee	Wm82.a2.v1
Zhonghuang 13	All Listed G.max Genomes