In this form you can BLAST your sequences to individual cultivar transcript or gene sequences in the SoyBase Soybean Cultivar Genomes Collection or perform a Pan-genome analysis by comparing your input sequence to all the protein or nucleic acid sequences of all of those contained in the SoyBase Soybean Cultivar Genomes Collection. You can also download individual cultivar's genomic or gene model sequences
From this form you can download soybean cultivar genomic, gene model and protein sequences from the SoyBase soybean cultivar genomes collection. The results will be made into a file and transfered to your computer to save.
In contrast to the pan-genome, pan gene sequences represent the gene compliment of each of the genomes considered in a pan-genome set. Currently the pan gene complement of Williams 82, Zhonghuang 13, W05, PI483463 and Lee have been collected and homologous genes assembled into pan-gene sets. Pan-gene analysis of multiple genomes often identify gene models that are unique to one or more of the genomes compared. In these cases a query gene may not be a member of a pan-cluster if it is unique to that genome. Also it is possible that every genome in the comparison may not contain a homolog of the query gene model. Identification of unique genes or genes that have been duplicated in some genomes may shed light on phenotypes shared by those cultivars.
In the tool below, you can enter a name of a gene model from Williams 82 (Glyma.Wm82), Zhonghuang 13 (SoyZH13), W05 (Glysoja), Lee (GlymaLee), PI483463 (GlysoPI483463) or the GenBank RefSeq (LOC) genome assemblies. The tool will return the pangene set that the query gene model is a member and all of the other members of the set. The assembly and annotation version of the gene models will also be returned.
We have provided an API to programmatically return sets of corresponding genes, given a query gene.
There are two variants of the service.
Provide a gene ID, with or without the associated prefix (genus-species.accession.assembly.annotation).
Provide a gene ID from one accession and annotation, and a second accession and annotation version from which you want the corresponding gene.
Please note, for bulk access, we recommend downloading the entire pangene set, from the six genomes described above.
To help keep the assemblies and annotations distinct, we use the following structure for gene names:
The pan genome of soybean can viewed using the Genome Context Viewer. This tool allows you to see the gene context of a selection of soybean cultivars. Each genome is displayed centered on a query gene model. The tool allows you to visualize the genome context of the six genomes described above and one from Liu et al. 2020 that examined 26 soybean accessions. Detailed descriptions of how to use this tool are available at the Legume Federation website and at the project's GitHub repository
This pangenome data set, for download, includes genome-wide variant (VCF) data for 1007 soybean accessions, as well as predicted variants (and effects) within genes, indicating which accessions have the variants from Torkamaneh et al., 2020
This pangenome data set, for download, includes genomic sequence and gene variants present in 204 diverse accessions of Glycine max but not present in the reference assembly G. max Williams 82 v4. From Torkamaneh, Lemay, and Belzile, (2021): "The Pan-genome of the Cultivated Soybean (PanSoy) Reveals an Extraordinarily Conserved Gene Content."
Bayer et al. sequenced 1000 accessions from the USDA Soybean Germplasm Collection and assembled the genomes using the cultivar Lee as reference. The collection included wild and cultivated strains to assess genome-wide gene changes due to domestication.
SoyBase Genome Viewer representation of Lee gene presence/absence in 1000 accessions
Files associated with this project