Issue 5561: group BioSequences that share the same type, PolymerType, and Species infor (gene-expression-ftf)

Source: Affymetrix, Inc. (Mr. Stephen A. Chervitz, steve_chervitz(at)affymetrix.com)
Nature: Uncategorized Issue
Severity:
Summary: It would be useful to have a way to group BioSequences that share the same Type, PolymerType, and Species information. Request from Steve Chervitz, Affymetrix, that the model be modified in some way to make it possible to group together sequences that shared these common characteristics.
Resolution:
Revised Text:
Actions taken:
July 29, 2002: received issue
December 11, 2002: closed issue

Discussion: Request from Steve Chervitz, Affymetrix, that the model be modified in some way to make it possible to group together sequences that shared these common characteristics. That would produce much smaller files that contained BioSequence information.

End of Annotations:=====

X-Server-Uuid: F7D3E4A3-3C15-41D2-AC5D-A7D3F094E28F
From: "Miller, Michael (Rosetta)"
To: "'Juergen Boldt'"
cc: gene-expression-ftf@omg.org
Subject: Gene Expression FTF issues
Date: Mon, 29 Jul 2002 07:30:43 -0700
X-Mailer: Internet Mail Service (5.5.2653.19)
X-WSS-ID: 115B8FEF18115-01-01

Hi Juergen,

Here's the last batch. Officially, I guess I'm entering them but these are collated from the MGED efforts for which there is no clear consensus yet.

thanks, as always,
Michael

Michael Miller
Senior Application Developer
Rosetta Biosoftware
michael_miller@rosettabio.com
www.rosettabio.com

It would be useful to have a way to group BioSequences that share the same Type, PolymerType, and Species information.

Request from Steve Chervitz, Affymetrix, that the model be modified in some way to make it possible to group together sequences that shared these common characteristics.
User-Agent: Microsoft-Entourage/9.0.1.3108
Date: Fri, 23 Aug 2002 17:38:25 -0700
Subject: Re: Final Issues: Preview for comments
From: Karl Konnerth
To:

on 8/21/02 4:08 PM, Miller, Michael (Rosetta) at Michael_Miller@Rosettabio.com wrote:

> 2) In call_to_vote_08_31_02.txt, where I suggest alternatives, if I don't
> receive any feedback, I plan to go with the first alternative. If I receive
> feedback but it falls equally on both sides, I'll submit both to the FTF and
> let that vote be final.

Hi,

I don't have strong opinions on these but here are my recommendations on a few of them:

> Issue #5550: Allowing a person to belong to only one organization is
> too restrictive.
> ====================================================================
>
> Request to allow a person to belong to more than one organization.
>
> Alternative One:
> Recommend take no action.
>
> Not clear if any benefit in the scope of the specification is gained
> by this. Version two may reconsider this.

I agree. I would think that in context of gene expression, a person could identify a single organization that is relevant. I don't see why multiple organizations are that important.

> Issue #5559: Attribute to be added to describe its type
> =======================================================
>
> Parameters for Protocols can not only be input parameters but can be
> output parameters or both input and output parameters. This would allow
> Protocol/ProtocolApplication to allow return values. An example of an
> output parameter from FeatureExtraction could be "Mean Average Background
> Subtraction"
>
> Alternative One:
> Recommend no action.
>
> Could be considered for version two.
>
> Alternative Two:
> Recommend an attribute be added to Parameter to describe its type:
> - name: type
> - required: true
> - enumeration: {IN, INOUT, OUT}
> - default: IN

I favor no action. This makes the protocol parameter attributes more complex, and also appears to complicate the interfaces.
> Issue #5561: Group BioSequences that share the same type, PolymerType,
> and Species information
> ======================================================================
>
> It would be useful to have a way to group BioSequences that share the same
> Type, PolymerType, and Species information. This would allow the XML
> files with BioSequences to be a great deal smaller.
>
> Recommend no action.
>
> Could be considered for version two. Any change would probably greatly affect
> existing implementations.
>
> (Steve Chervitz may have a suggested alternative that could make version one)

I favor no action (unless Steve has a good solution). Yes, this might make the files smaller, but at a cost of making the XML files themselves and file processing algorithms more complicated. Do we have any idea as to how often this grouping would be used in practice?

Best regards,
---Karl

P.S. Do you get the feeling that I favor simplicity? Yes, especially when it comes to standards. A challenge is to resist adding more and more features until the standard collapses under its own weight. On the other hand, if a new feature would make it usable for someone who otherwise could not use the standard, I would consider it.
From: "Chervitz, Steve"
To: "'Karl Konnerth'", gene-expression-ftf@omg.org
Cc: "'mged mage'"
Subject: RE: Final Issues: Preview for comments
Date: Tue, 27 Aug 2002 18:27:48 -0700
X-Mailer: Internet Mail Service (5.5.2653.19)

> -----Original Message-----
> From: Karl Konnerth [mailto:konnerth@incyte.com]
> Sent: Friday, August 23, 2002 5:38 PM
> To: gene-expression-ftf@omg.org
> Subject: Re: Final Issues: Preview for comments
>
> on 8/21/02 4:08 PM, Miller, Michael (Rosetta) at
> Michael_Miller@Rosettabio.com wrote:
>
> > Issue #5561: Group BioSequences that share the same type, PolymerType,
> > and Species information
> > ======================================================================
> >
> > It would be useful to have a way to group BioSequences that share the
> > same Type, PolymerType, and Species information. This would allow the
> > XML files with BioSequences to be a great deal smaller.
> >
> > Recommend no action.
> >
> > Could be considered for version two. Any change would probably
> > greatly affect existing implementations.
> >
> > (Steve Chervitz may have a suggested alternative that could make
> > version one)
>
> I favor no action (unless Steve has a good solution). Yes,
> this might make the files smaller, but at a cost of making
> the XML files themselves and file processing algorithms more
> complicated. Do we have any idea as to how often this
> grouping would be used in practice?

This issue is more of a "nice to have" and may not be worth the added complexity it creates. Michael and I discussed it a bit and the long and short of it is that there's no simple way to do it without a fairly major change in the MAGE DTD. So it's most likely of consideration for version 1.

For the record, I'm including the discussion I had with Michael about it below.
Steve

---------------------------
Steve Chervitz wrote on 8/14/02:

The main approach I've been considering is to create a new element called something like BioSequenceGroup and then allow the BioSequence_package to contain zero or more BioSequenceGroup elements as well as an optional BioSequences_assnlist. That way, it wouldn't invalidate existing documents, and people could have groups and free-floating sequences in the same file if they wanted.

The BioSequenceGroup would be describable and have optional PolymerType, Type, Species_assn, BioSequence_assnlist, and BioSequence_assnreflist sub-elements. The OntologyEntries, SequenceDatabases, and SeqFeatures apply to individual BioSequences, not groups (perhaps OntologyEntries could also go into the BioSequenceGroup -- not sure.) I don't think we need to break out a BioSequence_attrs entity.

Michael Miller replied:

Steve,

This idea is certainly worthwhile but I don't see how it will work unless we seriously break existing XML. Feel free to discover the holes in my arguments below, I've been wrong before and likely will be wrong in the future. And definitely put it forth to the group.

It's the consequences of what the generating code will do with this new class in the model and the restrictions on how it ends up being automagically generated into the DTD,

> The main approach I've been considering is to create a new element called
> something like BioSequenceGroup and then allow the BioSequence_package to
> contain zero or more BioSequenceGroup elements as well as an optional
> BioSequences_assnlist.

The generating code treats packages differently than classes; a list container is generated for each of the class names in the parameter xml that is passed in (params.xml), so it's just a matter of updating that file with the class name "BioSequenceGroup". These lists are always optional.

> The BioSequenceGroup would be describable ...

To be in the package list it needs to be Identifiable.

> The BioSequenceGroup ... have optional
> ... BioSequence_assnlist, and BioSequence_assnreflist

The association in the model must be one or the other, and the implication of BioSequence_assnlist is that the BioSequenceGroup "owns" its BioSequences and, worse for backwards compatibility, that a BioSequence can't live on its own. There's no real problem with BioSequence_assnreflist--by making the association between BioSequenceGroup and BioSequence two ways and optional from BioSequence to BioSequenceGroup, it accomplishes what you want except that now the BioSequence element, instead of having many shared attributes, if they are associated with a BioSequenceGroup, will look like this:

which doesn't help the bloat problem.

I think the real solution was to have had the BioSequenceGroup in the first place and let it own its BioSequences. I'm certain if we had thought of that, we would have done it. BioSequences would still be Identifiable so they could be referenced but they would not be Independent. This would also eliminate the duplication of defining the same attributes between these two classes.

We can simply munge the DTD towards what you say above, or it's possible to cheat and even though it specifies in the model that they would be owned by the BioSequenceGroup we could work around that I'm pretty certain.
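
Editorial illustration (not part of the original thread): the grouping idea discussed above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions. The BioSequence record and its field names (identifier, seq_type, polymer_type, species) and both helper functions are hypothetical and do not correspond to MAGE-ML element names or to the code generator mentioned in the thread. It shows how sequences sharing the same Type, PolymerType, and Species could be collected under one group, and it counts how many attribute entries would be written with and without grouping, which bears on the question of how often grouping would pay off in practice.

```python
from collections import defaultdict
from typing import NamedTuple


class BioSequence(NamedTuple):
    # Hypothetical flat record; field names are illustrative only,
    # not MAGE-ML element or attribute names.
    identifier: str
    seq_type: str
    polymer_type: str
    species: str


def group_by_shared_attributes(sequences):
    """Map each (seq_type, polymer_type, species) triple to the identifiers sharing it."""
    groups = defaultdict(list)
    for seq in sequences:
        key = (seq.seq_type, seq.polymer_type, seq.species)
        groups[key].append(seq.identifier)
    return dict(groups)


def attribute_entry_counts(sequences):
    """Return (flat, grouped) counts of type/polymer/species entries to be written."""
    # Flat layout: every sequence repeats all three attributes.
    flat = 3 * len(sequences)
    # Grouped layout: the three attributes are written once per group.
    grouped = 3 * len(group_by_shared_attributes(sequences))
    return flat, grouped


# Made-up example data, for illustration only.
sequences = [
    BioSequence("seq:1", "mRNA", "RNA", "Homo sapiens"),
    BioSequence("seq:2", "mRNA", "RNA", "Homo sapiens"),
    BioSequence("seq:3", "mRNA", "RNA", "Homo sapiens"),
    BioSequence("seq:4", "genomic", "DNA", "Mus musculus"),
]

groups = group_by_shared_attributes(sequences)
flat_count, grouped_count = attribute_entry_counts(sequences)
print(groups)
print(flat_count, grouped_count)
```

The saving depends entirely on how many sequences fall into each group: a file whose sequences all differ in Type, PolymerType, or Species gains nothing, and the sketch ignores the cost of the extra group elements and references themselves.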