Issue 5561: group BioSequences that share the same type, PolymerType, and Species infor (gene-expression-ftf)
Source: Affymetrix, Inc. (Mr. Stephen A. Chervitz, steve_chervitz(at)affymetrix.com)
Nature: Uncategorized Issue
Severity: 
Summary: It would be useful to have a way to group BioSequences that share the same
Type, PolymerType, and Species information.


Request from Steve Chervitz, Affymetrix, that the model be modified in
some way to make it possible to group together sequences that shared these
common characteristics.
Resolution: 
Revised Text: 
Actions taken:
July 29, 2002: received issue
December 11, 2002: closed issue
Discussion: 
Request from Steve Chervitz, Affymetrix, that the model be modified in some way to make it possible to group together sequences that shared these common characteristics. That would produce much smaller files that contained BioSequence information.
End of Annotations:=====
X-Server-Uuid: F7D3E4A3-3C15-41D2-AC5D-A7D3F094E28F 
From: "Miller, Michael (Rosetta)" <Michael_Miller@Rosettabio.com> 
To: "'Juergen Boldt'" <juergen@omg.org> 
cc: gene-expression-ftf@omg.org 
Subject: Gene Expression FTF issues 
Date: Mon, 29 Jul 2002 07:30:43 -0700 
X-Mailer: Internet Mail Service (5.5.2653.19) 
X-WSS-ID: 115B8FEF18115-01-01 


Hi Juergen,


Here's the last batch.  Officially, I guess I'm entering them but these are
collated from the MGED efforts for which there is no clear consensus yet.


thanks, as always,
Michael


Michael Miller
Senior Application Developer
Rosetta Biosoftware
michael_miller@rosettabio.com
www.rosettabio.com


It would be useful to have a way to group BioSequences that share the same
Type, PolymerType, and Species information.


Request from Steve Chervitz, Affymetrix, that the model be modified in
some way to make it possible to group together sequences that shared these
common characteristics.


User-Agent: Microsoft-Entourage/9.0.1.3108 
Date: Fri, 23 Aug 2002 17:38:25 -0700 
Subject: Re: Final Issues: Preview for comments 
From: Karl Konnerth <konnerth@incyte.com> 
To: <gene-expression-ftf@omg.org> 


on 8/21/02 4:08 PM, Miller, Michael (Rosetta) at
Michael_Miller@Rosettabio.com wrote:


> 2) In call_to_vote_08_31_02.txt, where I suggest alternatives, if I don't
> receive any feedback, I plan to go with the first alternative.  If I receive
> feedback but it falls equally on both sides, I'll submit both to the FTF and
> let that vote be final.


Hi,


I don't have strong opinions on these but here are my recommendations on a
few of them:


> Issue #5550: Allowing a person to belong to only one organization is
> too restrictive.
> ====================================================================
> 
> Request to allow a person to belong to more than one organization.
> 
> Alternative One:
> Recommend take no action.
> 
> Not clear if any benefit in the scope of the specification is gained
> by this.  Version two may reconsider this.


I agree.  I would think that in the context of gene expression, a person could
identify a single organization that is relevant.  I don't see why multiple
organizations are that important.


> Issue #5559: Attribute to be added to describe its type
> =======================================================
> 
> Parameters for Protocols can not only be input parameters but can be
> output parameters or both input and output parameters.  This would allow
> Protocol/ProtocolApplication to allow return values.  An example of an
> output parameter from FeatureExtraction could be "Mean Average Background
> Subtraction".
> 
> Alternative One:
> Recommend no action.
> 
> Could be considered for version two.
> 
> Alternative Two:
> Recommend an attribute be added to Parameter to describe its type:
>  - name: type
>  - required: true
>  - enumeration: {IN, INOUT, OUT}
>  - default: IN


I favor no action.  This makes the protocol parameter attributes more
complex, and also appears to complicate the interfaces.
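

As a rough illustration of what Alternative Two would mean for client code, here is a minimal Python sketch. The class names `ParameterType` and `Parameter`, the helper `output_parameters`, and the parameter name "ScanResolution" are assumptions made for this example only; they are not taken from the MAGE model or from any generated API.

```python
from dataclasses import dataclass
from enum import Enum


class ParameterType(Enum):
    """The three values listed in Alternative Two."""
    IN = "IN"
    INOUT = "INOUT"
    OUT = "OUT"


@dataclass
class Parameter:
    # Hypothetical stand-in for the MAGE Parameter class; only a name
    # and the proposed attribute are modelled.
    name: str
    type: ParameterType = ParameterType.IN  # proposed default: IN


def output_parameters(params):
    """Return the parameters that could carry a value back (OUT or INOUT)."""
    wanted = (ParameterType.OUT, ParameterType.INOUT)
    return [p for p in params if p.type in wanted]


params = [
    Parameter("ScanResolution"),  # hypothetical input parameter
    Parameter("Mean Average Background Subtraction", ParameterType.OUT),
]
```

Every consumer of Parameter would then have to branch on the new attribute, which is the added interface complexity raised in the reply above.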


> Issue #5561: Group BioSequences that share the same type, PolymerType,
> and Species information
> ======================================================================
> 
> It would be useful to have a way to group BioSequences that share the same
> Type, PolymerType, and Species information.  This would allow the XML
> files with BioSequences to be a great deal smaller.
> 
> Recommend no action.
> 
> Could be considered for version two.  Any change would probably greatly
> affect existing implementations.
> 
> (Steve Chervitz may have a suggested alternative that could make version one)


I favor no action (unless Steve has a good solution).  Yes, this might make
the files smaller, but at a cost of making the XML files themselves and file
processing algorithms more complicated.  Do we have any idea as to how often
this grouping would be used in practice?


Best regards,


---Karl


P.S. Do you get the feeling that I favor simplicity?  Yes, especially when
it comes to standards.  A challenge is to resist adding more and more
features until the standard collapses under its own weight.  On the other
hand, if a new feature would make it usable for someone who otherwise could
not use the standard, I would consider it.


From: "Chervitz, Steve" <Steve_Chervitz@affymetrix.com> 
To: "'Karl Konnerth'" <konnerth@incyte.com>,
gene-expression-ftf@omg.org 
Cc: "'mged mage'" <mged-mage@lists.sourceforge.net> 
Subject: RE: Final Issues: Preview for comments 
Date: Tue, 27 Aug 2002 18:27:48 -0700 
X-Mailer: Internet Mail Service (5.5.2653.19) 


> -----Original Message-----
> From: Karl Konnerth [mailto:konnerth@incyte.com] 
> Sent: Friday, August 23, 2002 5:38 PM
> To: gene-expression-ftf@omg.org
> Subject: Re: Final Issues: Preview for comments
> 
> 
> on 8/21/02 4:08 PM, Miller, Michael (Rosetta) at 
> Michael_Miller@Rosettabio.com wrote:
> >
> > Issue #5561: Group BioSequences that share the same type, PolymerType,
> > and Species information
> >
> > ======================================================================
> >
> > It would be useful to have a way to group BioSequences that share the
> > same Type, PolymerType, and Species information.  This would allow the
> > XML files with BioSequences to be a great deal smaller.
> >
> > Recommend no action.
> >
> > Could be considered for version two.  Any change would probably
> > greatly affect existing implementations.
> >
> > (Steve Chervitz may have a suggested alternative that could make
> > version one)
> 
> I favor no action (unless Steve has a good solution).  Yes, 
> this might make the files smaller, but at a cost of making 
> the XML files themselves and file processing algorithms more 
> complicated.  Do we have any idea as to how often this 
> grouping would be used in practice?


This issue is more of a "nice to have" and may not be worth the added
complexity it creates. Michael and I discussed it a bit and the long and
short of it is that there's no simple way to do it without a fairly major
change in the MAGE DTD. So it's most likely of consideration for version 1.


For the record, I'm including the discussion I had with Michael about it
below.


Steve


---------------------------
Steve Chervitz wrote on 8/14/02:


The main approach I've been considering is to create a new element called
something like BioSequenceGroup and then allow the BioSequence_package to
contain zero or more BioSequenceGroup elements as well as an optional
BioSequences_assnlist.


That way, it wouldn't invalidate existing documents, and people could have
groups and free-floating sequences in the same file if they wanted.


The BioSequenceGroup would be describable and have optional PolymerType,
Type, Species_assn, BioSequence_assnlist, and BioSequence_assnreflist
sub-elements. The OntologyEntries, SequenceDatabases, and SeqFeatures apply
to individual BioSequences, not groups (perhaps OntologyEntries could also
go into the BioSequenceGroup -- not sure.) I don't think we need to break
out a BioSequence_attrs entity.
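

To make the size argument concrete, the following Python sketch builds the same set of sequences twice, once with the shared information repeated on every BioSequence and once hoisted into a single BioSequenceGroup, and compares the serialized sizes. The element layout is deliberately simplified and is an assumption: real MAGE-ML nests these values differently, BioSequenceGroup is only the proposed element, and the values in `SHARED` and the `BS:`/`BSG:` identifiers are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative shared values; the tag names follow the proposal above,
# the "value" attribute is a simplification for this sketch.
SHARED = [("PolymerType", "DNA"), ("Type", "cDNA_clone"),
          ("Species_assn", "Homo sapiens")]


def add_shared(parent):
    """Attach the shared Type/PolymerType/Species information to parent."""
    for tag, value in SHARED:
        ET.SubElement(parent, tag, value=value)


def ungrouped(n):
    """Current style: every BioSequence repeats the shared information."""
    package = ET.Element("BioSequence_package")
    seqs = ET.SubElement(package, "BioSequence_assnlist")
    for i in range(n):
        seq = ET.SubElement(seqs, "BioSequence", identifier=f"BS:{i}")
        add_shared(seq)
    return package


def grouped(n):
    """Proposed style: one BioSequenceGroup states the shared information once."""
    package = ET.Element("BioSequence_package")
    group = ET.SubElement(package, "BioSequenceGroup", identifier="BSG:1")
    add_shared(group)
    seqs = ET.SubElement(group, "BioSequence_assnlist")
    for i in range(n):
        ET.SubElement(seqs, "BioSequence", identifier=f"BS:{i}")
    return package


def size(elem):
    """Serialized size in bytes."""
    return len(ET.tostring(elem))
```

Calling `size(ungrouped(1000))` and `size(grouped(1000))` shows the saving under these assumptions; the actual ratio in a real file depends on how much per-sequence content (OntologyEntries, SequenceDatabases, SeqFeatures) remains on each BioSequence.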


Michael Miller replied:
Steve,


This idea is certainly worthwhile but I don't see how it will work unless we
seriously break existing XML.  Feel free to discover the holes in my
arguments below, I've been wrong before and likely will be wrong in the
future.  And definitely put it forth to the group.  It's the consequences of
what the generating code will do with this new class in the model and the
restrictions on how it ends up being automagically generated into the DTD:


> The main approach I've been considering is to create a new element called
> something like BioSequenceGroup and then allow the BioSequence_package to
> contain zero or more BioSequenceGroup elements as well as an optional
> BioSequences_assnlist.
The generating code treats packages differently than classes: a list
container is generated for each of the class names in the parameter xml that
is passed in (params.xml), so it's just a matter of updating that file with
the class name "BioSequenceGroup".  These lists are always optional.


> The BioSequenceGroup would be describable ...
To be in the package list it needs to be Identifiable.


> The BioSequenceGroup ... have optional
> ... BioSequence_assnlist, and BioSequence_assnreflist
The association in the model must be one or the other, and the implication
of BioSequence_assnlist is that the BioSequenceGroup "owns" its BioSequences
and, worse for backwards compatibility, that a BioSequence can't live on its
own.  There's no real problem with BioSequence_assnreflist: by making the
association between BioSequenceGroup and BioSequence two ways and optional
from BioSequence to BioSequenceGroup, it accomplishes what you want, except
that now the BioSequence elements, instead of having many shared attributes,
if they are associated with a BioSequenceGroup, will look like this:
<BioSequence ...>
    <BioSequenceGroup_assnref>
        <BioSequenceGroup_ref identifier="..."/>
    </BioSequenceGroup_assnref>
</BioSequence>


which doesn't help the bloat problem.
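

A small Python sketch of that point: it measures how many bytes the back-reference shown above adds to each BioSequence. Only the element names come from the snippet; the helper names and the `BS:`/`BSG:` identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET


def bare_sequence(identifier):
    """A BioSequence with no reference to a group."""
    return ET.Element("BioSequence", identifier=identifier)


def referencing_sequence(identifier, group_id):
    """A BioSequence carrying the BioSequenceGroup_assnref shown above."""
    seq = bare_sequence(identifier)
    assnref = ET.SubElement(seq, "BioSequenceGroup_assnref")
    ET.SubElement(assnref, "BioSequenceGroup_ref", identifier=group_id)
    return seq


def overhead_bytes(group_id):
    """Extra serialized bytes that the reference adds to one BioSequence."""
    plain = len(ET.tostring(bare_sequence("BS:1")))
    linked = len(ET.tostring(referencing_sequence("BS:1", group_id)))
    return linked - plain
```

Because this overhead is paid once per BioSequence, the total file size still grows with the number of sequences, and the overhead itself grows with the length of the group identifier.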


I think the real solution was to have had the BioSequenceGroup in the first
place and let it own its BioSequences.  I'm certain if we had thought of
that, we would have done it.  BioSequences would still be Identifiable so
they could be referenced but they would not be Independent.  This would also
eliminate the duplication of defining the same attributes between these two
classes.
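

For illustration only, here is a Python sketch of how a reader might handle that owning-group layout: each BioSequence stays addressable by its identifier, while the shared information is looked up from the group that contains it. The document in `DOC`, the attribute names `polymerType` and `species`, and the function `index_sequences` are assumptions for this example, not part of the MAGE DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which a BioSequenceGroup owns its BioSequences.
DOC = """
<BioSequence_package>
  <BioSequenceGroup identifier="BSG:1" polymerType="DNA" species="Homo sapiens">
    <BioSequence_assnlist>
      <BioSequence identifier="BS:1"/>
      <BioSequence identifier="BS:2"/>
    </BioSequence_assnlist>
  </BioSequenceGroup>
</BioSequence_package>
"""


def index_sequences(xml_text):
    """Map each BioSequence identifier to the attributes of its owning group."""
    index = {}
    root = ET.fromstring(xml_text)
    for group in root.iter("BioSequenceGroup"):
        # Everything on the group except its own identifier is shared.
        shared = {k: v for k, v in group.attrib.items() if k != "identifier"}
        for seq in group.iter("BioSequence"):
            index[seq.get("identifier")] = shared
    return index
```

A reference to "BS:2" elsewhere in a file can then be resolved through the index, which is what keeping BioSequence Identifiable but not Independent would allow.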


We can simply munge the DTD towards what you say above, or it's possible to
cheat: even though the model would specify that they are owned by the
BioSequenceGroup, we could work around that, I'm pretty certain.