Issue 3872: Separate the Alignment interface into more manageable pieces (biomolecular-ftf)

Source: (Mr. Philip Lijnzaad, p.lijnzaad@med.uu.nl)
Nature: Uncategorized Issue
Severity:
Summary: Separate the Alignment interface into more manageable pieces as follows:

    interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
      // ...
      // here, everything BUT get_seq_region()
      // ...
    };

    interface Alignment : SimpleAlignment {
      SeqRegion get_seq_region(
          in AlignmentElement element,
          in Interval the_interval)
        raises(ElementNotInAlignment, IntervalOutOfBounds);
    };

Resolution: rejected, see above
Revised Text:
Actions taken:
    September 19, 2000: received issue
    May 24, 2001: closed issue

Discussion: Rejected. Adding the get_gaps() operation described in issue 3871 was deemed sufficient. This split had been proposed as an alternate solution.

End of Annotations:=====

I forgot a third sub-issue:

3687c: Separate the Alignment interface into more manageable pieces as follows:

    interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
      // ...
      // here, everything BUT get_seq_region()
      // ...
    };

    interface Alignment : SimpleAlignment {
      SeqRegion get_seq_region(
          in AlignmentElement element,
          in Interval the_interval)
        raises(ElementNotInAlignment, IntervalOutOfBounds);
    };

------------------------------------------------------------------------

with apologies,

Philip
--
When C++ is your hammer, everything looks like a thumb.
(Steven Haflich)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

Date: Tue, 19 Sep 2000 12:54:49 +0100 (BST)
Message-Id: <200009191154.e8JBsn943066@o2-3.ebi.ac.uk>
X-Authentication-Warning: o2-3.ebi.ac.uk: lijnzaad set sender to lijnzaad@ebi.ac.uk using -f
From: Philip Lijnzaad
To: biomolecular-ftf@omg.org
cc: senger@ebi.ac.uk, muilu@ebi.ac.uk
Subject: issue 3687c: Alignment and SimpleAlignment
Reply-to: lijnzaad@ebi.ac.uk
Content-Type: text
X-UIDL: Y&2!!2DPd9DU;e9;[+!!

Dear all,

the EBI proposes to resolve the issue 3687c (or whatever number) by _not_ separating SimpleAlignment out of Alignment, on the grounds that SimpleAlignment would not be useful in its own right.

Philip
--
When C++ is your hammer, everything looks like a thumb.
(Steven Haflich)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

X-Authentication-Warning: o2-3.ebi.ac.uk: lijnzaad set sender to lijnzaad@ebi.ac.uk using -f
From: Philip Lijnzaad
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Message-ID: <14811.2117.399224.623241@o2-3.ebi.ac.uk>
Date: Wed, 4 Oct 2000 11:36:53 +0100
To: Scott Markel
Cc: BSA FTF
Subject: Re: [OMG-BSA] proposed resolutions #2
In-Reply-To: "[OMG-BSA] proposed resolutions #2" dated Oct 3
References: <39DABDE8.12D99D36@netgenics.com>
X-Mailer: VM 6.76 under Emacs 20.5.1
Reply-To: lijnzaad@ebi.ac.uk
Content-Type: text/plain; charset=us-ascii
X-UIDL: Kn^!!nHGe9EZa!!iK5e9

> Here's a second batch of proposed resolutions. Based on email traffic
> there seems to be a consensus on the proposals.

As Martin noted, there was one point that we overlooked ... sorry. BTW, Issue 3871: adding get_gaps(): was there any consensus on that yet? I can't remember if Netgenics would be in favour or not, but it is related to this point:

> ------------------------------------------------------------------------
> Issue 3872: Separate the Alignment interface into more manageable pieces
> "The EBI proposes to resolve the issue 3872 by _not_
> separating SimpleAlignment out of Alignment,
> on the grounds that SimpleAlignment would not be useful in its own
> right."
> ------------------------------------------------------------------------

The proposal was:

    interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
      typedef string AlignType;
      typedef sequence<AlignType> AlignTypeList;
      const AlignType PROTEIN = "PROTEIN";
      const AlignType NON_PROTEIN = "NON_PROTEIN";
      const AlignType SEQUENCE_ERROR = "SEQUENCE_ERROR";
      const AlignType UNKNOWN = "UNKNOWN";

      AlignmentElementList get_alignment_elements(
          in unsigned long start,
          in unsigned long how_many,
          out AlignmentElementIterator the_rest)
        raises(IndexOutOfBounds);

      unsigned long num_rows();
      unsigned long num_columns();
      AlignType get_align_type_by_column(in unsigned long col);
    };

    interface Alignment : SimpleAlignment {
      SeqRegion get_seq_region(
          in AlignmentElement element,
          in Interval the_interval)
        raises(ElementNotInAlignment, IntervalOutOfBounds);
    };

but it was worded differently: it said:

    interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
      // ...
      // here, everything but get_seq_region()
      // ...
    };

The point is that when we made the proposal, we assumed that 'everything but get_seq_region()' _would_ include the newly added get_gaps(). However, when we proposed to resolve it by not splitting it, I must have thought they were independent issues, but they aren't.

So the first thing to do is decide on get_gaps(). If get_gaps() is _not_ added, then the split results in SimpleAlignment being useless, and in that case we withdraw this issue. However, if get_gaps() _is_ added (as we would like), then we _are_ in favour of splitting (i.e., reinstate issue #3872), on the grounds that get_seq_region() is a method that is only needed for sophisticated alignments.

So that was a mistake; my apologies for confusing the matter.

Cheers,

Philip
--
Time passed, which, basically, is its job.
-- Terry Pratchett (in: Equal Rites)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

Date: Mon, 16 Oct 2000 22:36:51 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.75 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: BSA FTF
Subject: Re: issues 3870 - 3872 Biomolecular FTF issues
References: <4.2.0.58.20000928141053.00c6f410@emerald.omg.org>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: Y~\d9Vb(!!,pMe9-"Ee9

Juergen Boldt wrote:
>
> This is issue # 3870
>
> typedef sequence<Interval> IntervalList; is missing from the spec

This behavior is already contained in CompositeSeqRegion, which extends Interval. An Interval can already represent an array of Intervals.

> --------------------------------
>
> This is issue 3871
>
> IntervalList get_gaps(in AlignmentElement element, in Interval the_interval
>
> add an operation
>
>   IntervalList get_gaps(in AlignmentElement element, in Interval the_interval);
>
> to the interface Alignment. Its job is to simply return all the gaps of a
> particular sequence in a particular alignment. For symmetry with
> get_seq_region(), the_interval is also given, thus limiting the gaps to
> those that you're interested in.

Since this issue was discussed many times during design and we know that the functionality is provided by repeated calls to get_seq_region(), I'd prefer not to make this change. Obviously nothing prevents a vendor from providing this additional method in an extension.
> ------------------------------
>
> This is issue # 3872
>
> Separate the Alignment interface into more manageable pieces
>
> Separate the Alignment interface into more manageable pieces as
> follows:
>
> interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
>   // ...
>   // here, everything BUT get_seq_region()
>   // ...
> };
>
> interface Alignment : SimpleAlignment {
>   SeqRegion get_seq_region(
>       in AlignmentElement element,
>       in Interval the_interval)
>     raises(ElementNotInAlignment, IntervalOutOfBounds);
> };

I'm against this. I don't see what's gained by splitting Alignment into two interfaces. In either case, one interface or two, an implementation of get_seq_region() is required.

Scott

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121

Date: Tue, 17 Oct 2000 16:16:08 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.75 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: BSA FTF
Subject: [OMG-BSA] proposed resolutions #3
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: ZGd!!Y64e9e!gd9$Z6!!

Here's a third batch of proposed resolutions. Based on email traffic there seems to be a consensus on the proposals. In a few cases the consensus is 2 of 3. In most cases it's 2 of 2.

I'm not reproducing the entire text of the individual issues. You can find that at http://cgi.omg.org/issues/biomolecular-ftf.html.

This message is *not* a vote. I plan to send out a message kicking off a vote probably on Friday. The timing depends on the responses to this message.

Scott

========================================================================
Issue 3870: typedef sequence<Interval> IntervalList; is missing from the spec

Proposed resolution:

No change. CompositeSeqRegion already has this behavior.

------------------------------------------------------------------------
Issue 3871: IntervalList get_gaps(in AlignmentElement element, in Interval the_interval

Proposed resolution:

Add

    CompositeSeqRegion get_gaps(in string key, in Interval the_interval)
      raises(ElementNotInAlignment, IntervalOutOfBounds);

to the Alignment interface (section 2.1.15, IDL change on p. 2-43 and method block on p. 2-46). Update the IDL in appendix C.1.

The coordinates of a gap would be those of the original sequence; gaps of length 0 are not allowed. A start == 0 would be before the first nucleotide/amino acid; a start == N is a gap between nucleotides/amino acids N and N+1 (so start == sequence.length would be after the last).

------------------------------------------------------------------------
Issue 3872: Separate the Alignment interface into more manageable pieces

Proposed resolution:

No change.

------------------------------------------------------------------------
Issue 3874: add an Identifier to SeqRegion

Proposed resolution:

Add "sequence Identifier" to the SeqRegion paragraph on p. 2-6 (section 2.1.5). Add "public Identifier id;" to the IDL on the bottom of the same page. Add a description block describing the ID as a sequence ID on p. 2-7. Update the IDL in appendix C.1.
------------------------------------------------------------------------
Issue 3875: inheritance in annotation iterators

Proposed resolution:

Add the following to the single sentence description in section 2.1.7 (p. 2-15)

    SeqAnnotationIterator is not used directly in this specification, but is provided as a convenience for vendor-specific IDL extensions and future OMG specifications where a collection of Annotations contains only SeqAnnotations. SeqAnnotationIterator is an optional interface.

In addition, add SeqAnnotationIterator to the list of optional interfaces in section 1.1 (p. 1-1).

------------------------------------------------------------------------
Issue 3924: octet iterator

Proposed resolution:

Out of scope.

------------------------------------------------------------------------
Issue 3962: clarification of strand_type and CompositeSeqRegions

Proposed resolution:

Add the following to the first paragraph under CompositeSeqRegion (section 2.1.5, p. 2-7) and to region_operator's description (section 2.1.5, p. 2-8).

    All CompositeSeqRegions are expected to be translated in a depth-first traversal, along each node of the tree represented by the CompositeSeqRegions. This includes those nodes that have region_operator equal to JOIN or ORDER.

Add the following to the descriptions of BioSequence's seq_interval() (section 2.1.9, p. 2-24) and NucleotideSequence's translate_seq_region() (section 2.1.10, p. 2-30).

    If the StrandType is minus, the string returned should be taken as reverse-complemented.

Add the following to the description of NucleotideSequence's reverse_complement_interval() (section 2.1.10, p. 2-28).

    If the StrandType is minus, the string returned should be taken as reverse-complemented. This will result in a no-op, i.e., the strand_type leads to reverse-complementing, which is then reverse-complemented again due to the semantics of the method, resulting in the same string that would be returned from seq_interval().
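
The no-op described for reverse_complement_interval() on a minus-strand interval can be illustrated with a small Python sketch. The function names mirror the IDL operations, but the plain-string signatures, the 1-based start, the minus_strand flag, and the ACGT-only complement table are assumptions made for illustration; none of this is the BSA API itself.

```python
# Hypothetical stand-ins for the IDL operations; plain strings replace
# BioSequence and Interval objects.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def seq_interval(seq: str, start: int, length: int,
                 minus_strand: bool = False) -> str:
    """Return the residues in [start, start+length-1] (1-based).

    On the minus strand the result is reverse-complemented.
    """
    sub = seq[start - 1:start - 1 + length]
    return reverse_complement(sub) if minus_strand else sub


def reverse_complement_interval(seq: str, start: int, length: int,
                                minus_strand: bool = False) -> str:
    """Reverse-complement whatever seq_interval() returns for the interval."""
    return reverse_complement(seq_interval(seq, start, length, minus_strand))


if __name__ == "__main__":
    dna = "ATTCGACG"
    forward = seq_interval(dna, 2, 4)                        # plus strand
    twice = reverse_complement_interval(dna, 2, 4, minus_strand=True)
    print(forward, twice, forward == twice)
```

Reverse-complementing twice restores the input, so under these assumptions the minus-strand call to reverse_complement_interval() yields the same string as the plus-strand seq_interval().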
------------------------------------------------------------------------
Issue 3963: clarification of SeqRegionOperator.ORDER

Proposed resolution:

Add the following to the description of the enum SeqRegionOperator's ORDER value (section 2.1.5, p. 2-8).

    Typically, it is used to represent a discontinuous region to which a descriptive annotation pertains.

------------------------------------------------------------------------
Issue 3964: OutOfBounds exceptions for circular sequences if start = 0

Proposed resolution:

Add text similar to the following (BioSequence and Interval replaced by derived types, as appropriate) to the descriptions of

* IntervalOutOfBounds (section 2.1.9, p. 2-21)
* SeqRegionOutOfBounds (section 2.1.9, p. 2-21)
* SeqAnnotationOutOfBounds (section 2.1.23, p. 2-63)

and to the exceptions of

* BioSequence's seq_interval() (section 2.1.9, p. 2-24)
* NucleotideSequence's reverse_complement_interval() (section 2.1.10, p. 2-28)
* NucleotideSequence's translate_seq_region() (section 2.1.10, p. 2-30)

    Raises IntervalOutOfBounds if the Interval's start is less than 1 or if its start+length-1 is greater than the length of the BioSequence. If a BioSequence represents circular DNA, then this exception should be raised if the Interval's start is less than 1 or greater than the length of the BioSequence, or if its length is greater than the length of the BioSequence.

------------------------------------------------------------------------
Issue 3965: add BioSequence.get_annotations_by_name()?

Proposed resolution:

Out of scope.

------------------------------------------------------------------------

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121

Date: Tue, 17 Oct 2000 15:27:12 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.75 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: lijnzaad@ebi.ac.uk
CC: BSA FTF
Subject: Re: get_gaps() [ was: Re: issues 3870 - 3872 Biomolecular FTF issues ]
References: <4.2.0.58.20000928141053.00c6f410@emerald.omg.org> <39EBE573.D721F2BD@netgenics.com> <14828.35323.945505.510495@o2-3.ebi.ac.uk>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: HGld90ZV!!Pl3e9'~9!!

Philip,

Philip Lijnzaad wrote:
>
> Dear all,
>
> this is just to clarify our experience and position, as we do feel strongly
> about adding get_gaps() to the spec.

I've noticed. :)

> Consider the following alignment:
>
>          1         2         3         4         5         6         7         8
> 12345678901234567890123456789012345678901234567890123456789012345678901234567890
> ATTC-----------------------------------------------------------------GACGGCCCATG
> ATTC---------------------------------------------------------------------------G
> A--------------------------------------------------------------------GACGGCCCATG
> TTTCTTGTGTCTCAAGGACAGAAGAGACTTCAGGTTCCCCCAGGAGATGGTAAAAGGGAGCCAGTTGCAGAAGGCCCATG

Small aside: Please note that this particular example can be represented by 5 columns.

> It will come as no surprise that EBI will vote in favour of adding a
> get_gaps() method :-)

I'll go along with adding get_gaps(), but not IntervalList or the proposed SimpleAlignment/Alignment split.

Scott

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121
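
The gap coordinates proposed for issue 3871 (start counted in residues of the original sequence, with 0 meaning before the first residue and the sequence length meaning after the last) can be sketched in Python against rows like those in the alignment quoted above. The function name gaps_in_row, the (start, length) tuple representation, the "-" gap character, and the reading of a gap's length as a count of gap columns are all assumptions made for illustration. The proposed operation returns a CompositeSeqRegion and is also restricted by an Interval, which this sketch leaves out. The rows are rebuilt on the assumption that each is 80 columns wide, matching the ruler.

```python
from typing import List, Tuple


def gaps_in_row(row: str, gap_char: str = "-") -> List[Tuple[int, int]]:
    """Return (start, length) for every run of gap characters in one row.

    start is the number of residues of the ungapped sequence that precede
    the gap, so 0 is "before the first residue" and len(sequence) is
    "after the last". Runs of length 0 are never reported.
    """
    gaps: List[Tuple[int, int]] = []
    residues_seen = 0   # position in original-sequence coordinates
    run = 0             # length of the gap run currently being scanned
    for ch in row:
        if ch == gap_char:
            run += 1
            continue
        if run:
            gaps.append((residues_seen, run))
            run = 0
        residues_seen += 1
    if run:  # the row ends inside a gap
        gaps.append((residues_seen, run))
    return gaps


if __name__ == "__main__":
    # First three rows of the quoted alignment, assumed 80 columns wide.
    rows = [
        "ATTC" + "-" * 65 + "GACGGCCCATG",
        "ATTC" + "-" * 75 + "G",
        "A" + "-" * 68 + "GACGGCCCATG",
    ]
    for row in rows:
        print(gaps_in_row(row))
```

Under these assumptions each of the three gapped rows yields a single gap, which is the kind of answer a client would otherwise have to piece together from repeated get_seq_region() calls.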