SoyBase
Integrating Genetics and Genomics to Advance Soybean Research

Pan-Genome Sequence Search and Data Download Page

In this form you can BLAST your sequences against the transcript or gene sequences of an individual cultivar in the SoyBase Soybean Cultivar Genomes Collection, or perform a pan-genome analysis by comparing your input sequence to the protein or nucleic acid sequences of all the genomes in the collection. You can also download an individual cultivar's genomic or gene model sequences.

Select a genome to query. Pick "All Listed G.max Genomes" or "All Listed G.soja Genomes" to perform a search of all the listed genomes.

Genome Selector
Glycine max Genomes
Cultivar Lee
Wm82.a2.v1
Zhonghuang 13
All Listed G.max Genomes

Glycine soja Genomes
Cultivar PI 483463
Cultivar W05
All Listed G.soja Genomes

Select the type of sequence to search. The options are the gene model coding sequences (nucleic acid) or the protein sequence (amino acid).

Sequence Type
Gene model transcripts
Gene model protein Sequences

Pick the type of BLAST program to run.

Select the BLAST Program to run

Copy and paste a gene sequence you want to compare to a cultivar or the pan-genome. You can also choose a file containing multiple FASTA records with which to search.


Or load an Example Sequence.
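Before uploading a file with multiple FASTA records, it can help to confirm that the file splits into the records you expect. The following is a minimal Python sketch of such a check, not part of SoyBase itself; the function name `parse_fasta` and the example records are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no identifier")
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id}")
            records[current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data before the first header")
            # Sequence lines of one record are concatenated.
            records[current_id] += line
    return records


# Hypothetical two-record input, for illustration only.
example = """>query1 hypothetical example record
ATGGCTAGCTAG
GATTACA
>query2
ATGCCC
"""
records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Each record is reported with its ID and total sequence length, so a missing header or an accidentally merged record is easy to spot.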

Click Here For The Full BLAST Interface

From this form you can download soybean cultivar genomic, gene model and protein sequences from the SoyBase Soybean Cultivar Genomes Collection. The results will be written to a file and transferred to your computer to save.

Choose a genome from which to download sequences.

Glycine max Genomes
Cultivar Lee
Cultivar Zhonghuang 13
Cultivar Wm82 Assembly 2

Glycine soja Genomes
Cultivar PI 483463
Cultivar W05

Select the type of sequence to download. The options are the entire genomic sequence (nucleic acid), the gene model transcript sequences (nucleic acid), coding sequences (nucleic acid) or the inferred protein sequence (amino acid) of each transcript.

Sequence Type
Genomic Sequence
Gene model Transcripts
Gene model coding sequences
Gene model Inferred Protein Sequence

In contrast to the pan-genome, pan-gene sequences represent the gene complement of each of the genomes considered in a pan-genome set. Currently the gene complements of Williams 82, Zhonghuang 13, W05, PI483463 and Lee have been collected and homologous genes assembled into pan-gene sets. Pan-gene analysis of multiple genomes often identifies gene models that are unique to one or more of the genomes compared. In these cases a query gene may not be a member of a pan-cluster if it is unique to that genome. It is also possible that not every genome in the comparison contains a homolog of the query gene model. Identification of unique genes or genes that have been duplicated in some genomes may shed light on phenotypes shared by those cultivars.
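To make the idea of presence, absence and duplication within a pan-gene set concrete, here is a small Python sketch. The set names, gene names and data layout are hypothetical and chosen only for illustration; they do not reflect how SoyBase stores or names its pan-gene sets.

```python
from typing import Dict, List

# The five genomes named in the text above.
GENOMES = ["Williams 82", "Zhonghuang 13", "W05", "PI483463", "Lee"]

# Hypothetical pan-gene sets: set name -> {genome: [gene model names]}.
pan_sets: Dict[str, Dict[str, List[str]]] = {
    "pan00001": {
        "Williams 82": ["geneA"],
        "Zhonghuang 13": ["geneB"],
        "W05": ["geneC"],
        "PI483463": ["geneD"],
        "Lee": ["geneE"],
    },
    "pan00002": {
        "Williams 82": ["geneF", "geneG"],  # duplicated in this genome
        "Lee": ["geneH"],
    },
}


def summarize(members: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report which genomes have, lack, or duplicate a member of one set."""
    present = [g for g in GENOMES if members.get(g)]
    missing = [g for g in GENOMES if g not in present]
    duplicated = [g for g in present if len(members[g]) > 1]
    return {"present": present, "missing": missing, "duplicated": duplicated}


for set_name, members in pan_sets.items():
    print(set_name, summarize(members))
```

In this toy data, `pan00001` has one member in every genome, while `pan00002` is absent from three genomes and duplicated in Williams 82.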

In the tool below, you can enter the name of a gene model from the Williams 82 (Glyma.Wm82), Zhonghuang 13 (SoyZH13), W05 (Glysoja), Lee (GlymaLee), PI483463 (GlysoPI483463) or GenBank RefSeq (LOC) genome assemblies. The tool will return the pan-gene set of which the query gene model is a member, along with all of the other members of the set. The assembly and annotation version of each gene model will also be returned.

Pan Gene Search
Example: Glyma.01G000100
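The name prefixes listed above can be used to guess which genome a gene model name comes from before submitting it. The sketch below is illustrative only: the function `guess_genome` is hypothetical, the plain `Glyma.` prefix is an assumption based on the example query, and real identifiers may carry additional assembly or annotation tags that this does not handle.

```python
from typing import List, Optional, Tuple

# Prefix -> genome, taken from the list in the text above.
# "Glyma." is an assumption inferred from the example "Glyma.01G000100".
PREFIXES: List[Tuple[str, str]] = [
    ("GlymaLee", "Lee"),
    ("GlysoPI483463", "PI483463"),
    ("Glysoja", "W05"),
    ("SoyZH13", "Zhonghuang 13"),
    ("LOC", "GenBank RefSeq"),
    ("Glyma.", "Williams 82"),
]


def guess_genome(gene_model: str) -> Optional[str]:
    """Return the genome suggested by the name prefix, or None if unknown."""
    name = gene_model.strip()
    for prefix, genome in PREFIXES:
        if name.startswith(prefix):
            return genome
    return None


print(guess_genome("Glyma.01G000100"))
print(guess_genome("not_a_gene_model"))
```

The first call reports Williams 82 and the second reports `None`, which a caller could use to flag a name that matches none of the listed assemblies.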

Advanced API Features

Genome Context Viewer (GCV) screenshot

The pan-genome of soybean can be viewed using the Genome Context Viewer. This tool allows you to see the gene context of a selection of soybean cultivars. Each genome is displayed centered on a query gene model. The tool allows you to visualize the genome context of the six genomes described above and one from Liu et al. 2020, which examined 26 soybean accessions. Detailed descriptions of how to use this tool are available at the Legume Federation website and at the project's GitHub repository.

This pangenome data set, available for download, includes genome-wide variant (VCF) data for 1007 soybean accessions, as well as predicted variants (and their effects) within genes, indicating which accessions carry each variant. From Torkamaneh et al., 2020.
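As a rough illustration of how a VCF record indicates which accessions carry a variant, the Python sketch below reads the `GT` (genotype) field of each sample column and lists the samples with at least one non-reference allele. The function `carriers`, the chromosome name and the accession names are hypothetical, and the example lines are not taken from the data set itself.

```python
from typing import List


def carriers(header_line: str, record_line: str) -> List[str]:
    """Return sample names whose genotype contains a non-reference allele."""
    header = header_line.rstrip("\n").split("\t")
    fields = record_line.rstrip("\n").split("\t")
    samples = header[9:]  # sample columns follow the FORMAT column
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    found = []
    for sample, call in zip(samples, fields[9:]):
        gt = call.split(":")[gt_index]
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        # '0' is the reference allele and '.' is a missing call.
        if any(allele not in ("0", ".") for allele in alleles):
            found.append(sample)
    return found


# Hypothetical header and record lines, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tacc1\tacc2\tacc3"
record = "chr01\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:12\t0/1:9\t./.:0"
print(carriers(header, record))
```

Here only `acc2` is reported, since `acc1` is homozygous reference and `acc3` has no genotype call. For files of realistic size, an established VCF library or command-line tool is likely a better choice than line-by-line parsing.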

This pangenome data set, available for download, includes genomic sequence and gene variants present in 204 diverse accessions of Glycine max but not present in the reference assembly G. max Williams 82 v4. From Torkamaneh, Lemay, and Belzile (2021): "The Pan-genome of the Cultivated Soybean (PanSoy) Reveals an Extraordinarily Conserved Gene Content."

Bayer et al. sequenced 1000 accessions from the USDA Soybean Germplasm Collection and assembled the genomes using the cultivar Lee as reference. The collection included wild and cultivated strains to assess genome-wide gene changes due to domestication.

SoyBase Genome Viewer representation of Lee gene presence/absence in 1000 accessions

Files associated with this project

Funded by the USDA-ARS. Developed by the USDA-ARS SoyBase and Legume Clade Database group at Iowa State University, Ames, IA.