SoyBase
Integrating Genetics and Genomics to Advance Soybean Research

Gene Names Derived by Sequence Similarity

When assigning gene function, and therefore gene names, to soybean genes based on sequence similarity to orthologs in other species, keep in mind that a single gene in another species may be present in two or more copies in soybean, because the genomic history of soybean includes a whole-genome duplication.

Name assignments based on sequence similarity should be made by reciprocal best-BLAST assignment. The last step in the name assignment should be a multiple sequence alignment of the soybean sequence with other known sequences, in which the presence of required motifs and/or catalytic residues in the soybean protein sequence is confirmed.
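The reciprocal best-BLAST step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SoyBase pipeline: it assumes the BLAST results have already been parsed into (query, subject, bit score) tuples, that bit score is the ranking criterion, and that the gene identifiers (GmA, GmB, GmC, AtADH1, AtX) are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit, assumed already parsed: (query_id, subject_id, bit_score)
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (soybean_id, other_id) pairs that are each other's best hit.

    forward: soybean queries searched against the other species.
    reverse: other-species queries searched against soybean.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical identifiers and scores, for illustration only.
forward = [
    ("GmA", "AtADH1", 700.0),
    ("GmA", "AtX", 200.0),
    ("GmB", "AtADH1", 650.0),
    ("GmC", "AtX", 300.0),
]
reverse = [
    ("AtADH1", "GmA", 690.0),
    ("AtADH1", "GmB", 640.0),
    ("AtX", "GmC", 310.0),
]
pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

In this toy data both GmA and GmB have AtADH1 as their best hit, but a strict reciprocal test pairs only GmA with it. How additional soybean copies are admitted is not specified here, so any relaxed rule would be a further assumption.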

In the event the soybean genome contains more than one copy of a sequence (paralogs), a dash-number will be arbitrarily assigned to each paralog to uniquely identify the sequence. For example, if two soybean sequences are identified by sequence similarity as matching the Arabidopsis gene alcohol dehydrogenase 1 (Arabidopsis gene symbol ADH1), one would be named ADH1-1 and the other ADH1-2.
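The dash-number convention can be illustrated with a short Python sketch. The function name, the soybean identifiers, and the symbol XYZ1 are hypothetical. The guideline says the numbers are arbitrary; numbering in sorted identifier order is an assumption made here only so that the output is reproducible, as is leaving single-copy genes without a suffix.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def assign_paralog_names(assignments: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each soybean gene ID to a name built from its assigned symbol.

    assignments: (soybean_gene_id, symbol) pairs, e.g. ("GmA", "ADH1").
    Genes sharing a symbol receive -1, -2, ... suffixes.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for gene_id, symbol in assignments:
        groups[symbol].append(gene_id)

    names: Dict[str, str] = {}
    for symbol, gene_ids in groups.items():
        if len(gene_ids) == 1:
            # Assumption: a single-copy gene keeps the bare symbol.
            names[gene_ids[0]] = symbol
        else:
            # Assumption: sorted order stands in for the arbitrary numbering.
            for number, gene_id in enumerate(sorted(gene_ids), start=1):
                names[gene_id] = f"{symbol}-{number}"
    return names


# Hypothetical identifiers, for illustration only.
names = assign_paralog_names([("GmB", "ADH1"), ("GmA", "ADH1"), ("GmC", "XYZ1")])
print(names)
```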

Funded by the USDA-ARS. Developed by the USDA-ARS SoyBase and Legume Clade Database group at Iowa State University, Ames, IA.