Marker Nomenclature on the Soybean Physical Map
As the SoyMap project progresses and more loci and types of loci are identified and mapped, it is important that we agree on a set of naming conventions. After some discussion and revisions by the group, the guidelines below have been agreed to.
Note that these rules are only for the interim CMap (and possibly GBrowse) representations of the physical map. The rules for loci on the genetic map are somewhat different and in particular do not include the linkage group or detection technology as part of the name. The extra bits of data in the physical map are included to facilitate the error detection and resolution steps we will be going through as we refine the physical map. Ideally future mapping and/or the complete soybean sequence will clear up most ambiguities and these interim names will be replaced with the canonical locus names.
Marker names on the PHYSICAL MAP have four parts, joined by underscores: linkagegroup_detectionmethod_locusname_paralogcount
linkagegroup is one of the 20 linkage groups currently in SoyBase, or Unmapped
detectionmethod is indicated by a single letter (I realize that "detectionmethod" is perhaps not perfect terminology, but I think it conveys the essence of what this part of the name covers)
SNP, regardless of actual SNP detection method = _s_
RFLP = _r_
overgo = _o_
microsatellite (i.e. SSR) = _m_
PCR (i.e. STS) = _p_
(note that _m_ and _p_ have previously been used for a different purpose - the previous meanings are now deprecated)
Clearly there can sometimes be uncertainty about the appropriate choice. The aim is to allow loci detected by multiple technologies to be clearly identified, so it is only important that the namer be consistent.
locusname is the one used in SoyBase, when possible. For new loci derived from GenBank sequences, the GenBank name, e.g. BE659864, should be used. Markers derived from a composite sequence (such as an EST contig) should use the contig's name as defined by the originator.
paralogcount starts with _1 and is required, i.e. even loci that have no known paralog will have a _1 (again, possibly not ideal terminology, as a truly single-copy locus obviously has no paralog but still has a _1 as part of the name)
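As an illustration only, the four-part rule above could be applied in a script along the following lines. This is a minimal sketch, not an agreed SoyMap tool: the function name make_marker_name, the dictionary DETECTION_CODES, and the technology labels used as its keys are all hypothetical. Only the single-letter codes and the underscore-joined layout come from the guidelines.

```python
# Hypothetical helper for composing interim physical-map marker names.
# The single-letter codes are taken from the guidelines; the dictionary
# keys are illustrative labels, not official vocabulary.
DETECTION_CODES = {
    "SNP": "s",
    "RFLP": "r",
    "overgo": "o",
    "microsatellite": "m",  # i.e. SSR
    "PCR": "p",             # i.e. STS
}


def make_marker_name(linkage_group, technology, locus_name, paralog_count=1):
    """Join the four parts as linkagegroup_detectionmethod_locusname_paralogcount."""
    if technology not in DETECTION_CODES:
        raise ValueError(f"unknown detection technology: {technology!r}")
    if paralog_count < 1:
        # The paralog count is required and starts at 1.
        raise ValueError("paralog count must be 1 or greater")
    code = DETECTION_CODES[technology]
    return f"{linkage_group}_{code}_{locus_name}_{paralog_count}"


print(make_marker_name("A1", "PCR", "BE123456"))  # A1_p_BE123456_1
```

The paralog count defaults to 1 so that a locus with no known paralog still receives the required _1 suffix.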
Examples
A1_p_BE123456_1 = STS derived from the sequence in GenBank record BE123456
M_s_BE123456_1 = SNP identified between 2 cultivars in the sequence BE123456 and detected as a SNP
Unmapped_m_Satt123_2 = 2nd locus, currently unmapped, identified by SSR Satt123; once the locus is mapped, the correct linkage group would replace "Unmapped" in the name
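Since the extra name parts exist to help with error detection, it may be useful to check names mechanically. The sketch below splits a name like the examples above back into its four parts and rejects obviously malformed ones. It is an assumption-laden illustration: MarkerName, parse_marker_name and DETECTION_CODES are hypothetical names, the linkage group is not checked against SoyBase, and the locus name is allowed to contain underscores (the guidelines do not say either way), which is why the first two parts are taken from the left and the count from the right.

```python
from typing import NamedTuple

# Single-letter codes from the guidelines, mapped to a readable label.
DETECTION_CODES = {
    "s": "SNP",
    "r": "RFLP",
    "o": "overgo",
    "m": "microsatellite (SSR)",
    "p": "PCR (STS)",
}


class MarkerName(NamedTuple):
    linkage_group: str
    detection_code: str
    locus_name: str
    paralog_count: int


def parse_marker_name(name):
    """Split an interim physical-map marker name into its four parts."""
    if any(ch.isspace() for ch in name):
        raise ValueError(f"marker name contains whitespace: {name!r}")
    # Linkage group and detection code are the first two fields.
    parts = name.split("_", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"expected four underscore-separated parts: {name!r}")
    linkage_group, code, rest = parts
    # The paralog count is the last field; everything between is the locus name.
    locus_name, sep, count_text = rest.rpartition("_")
    if not sep or not locus_name:
        raise ValueError(f"missing locus name or paralog count: {name!r}")
    if code not in DETECTION_CODES:
        raise ValueError(f"unknown detection code {code!r} in {name!r}")
    if not (count_text.isascii() and count_text.isdigit()) or int(count_text) < 1:
        raise ValueError(f"paralog count must be an integer >= 1: {name!r}")
    return MarkerName(linkage_group, code, locus_name, int(count_text))


marker = parse_marker_name("Unmapped_m_Satt123_2")
print(marker.linkage_group, DETECTION_CODES[marker.detection_code],
      marker.locus_name, marker.paralog_count)
```

A name that lacks the required paralog count, such as A1_p_BE123456, or one with a stray space before the final _1, raises ValueError in this sketch.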