Issue 3688: do we have to be more strict about the contents of attributes (biomolecular-ftf)

Source: (Mr. Philip Lijnzaad, p.lijnzaad@med.uu.nl)
Nature: Uncategorized Issue
Severity:
Summary: do we have to be more strict about the contents of attributes? E.g.,

    interface BioSequence {
      readonly attribute string name;
      readonly attribute Identifier id;
      readonly attribute string description;
      readonly attribute string seq;
      readonly attribute unsigned long length;
      readonly attribute Basis the_basis;

has quite a few strings, all of which could be empty. We may want to
explicitly allow or forbid this on some of them.

Resolution: accepted
Revised Text: Add the following sentence to

    section 2.1.6 (Annotation, p. 2-9: return value of name)
    section 2.1.9 (BioSequence, p. 2-23: return value of id)
    section 2.1.18 (SearchHit, p. 2-51: return value of id).

    X shall not be empty.

Actions taken:
    June 9, 2000: received issue
    May 24, 2001: closed issue

Discussion:

End of Annotations:=====

Date: Fri, 9 Jun 2000 15:39:29 +0100 (BST)
Message-Id: <200006091439.e59EdTA17797@o2-3.ebi.ac.uk>
X-Authentication-Warning: o2-3.ebi.ac.uk: lijnzaad set sender to lijnzaad@ebi.ac.uk using -f
From: Philip Lijnzaad
To: biomolecular-ftf@emerald.omg.org
cc: muilu@ebi.ac.uk
Subject: some issues ...
Reply-to: lijnzaad@ebi.ac.uk
Content-Type: text
X-UIDL: [C#!!

X-Authentication-Warning: o2-3.ebi.ac.uk: lijnzaad set sender to lijnzaad@ebi.ac.uk using -f
From: Philip Lijnzaad
To: biomolecular-ftf@omg.org
cc: senger@ebi.ac.uk, muilu@ebi.ac.uk
Subject: Issue 3688: 'empty' attributes
Reply-to: lijnzaad@ebi.ac.uk
Content-Type: text
X-UIDL: 1a_!!D~Le94Kp!!N):e9

The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
the following attributes to be empty strings:

  BioSequence::id
  BioSequence::seq to
  Annotation::name
  Any occurrence of Identifier
    (for this, add, as a first bullet to section 2.1.8.1, the sentence
    "An Identifier may not be the empty string")

                                                       Philip
--
When C++ is your hammer, everything looks like a thumb. (Steven Haflich)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

Date: Tue, 19 Sep 2000 14:23:31 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: lijnzaad@ebi.ac.uk
CC: biomolecular-ftf@omg.org, senger@ebi.ac.uk, muilu@ebi.ac.uk
Subject: Re: Issue 3688: 'empty' attributes
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: _pY!!JWXd9+R9e9ISCe9

Philip,

Philip Lijnzaad wrote:
>
> The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
> the following attributes to be empty strings:
>
>   BioSequence::id
>   BioSequence::seq to

Typo here?

>   Annotation::name
>   Any occurrence of Identifier
>     (for this, add, as a first bullet to section 2.1.8.1, the sentence
>     "An Identifier may not be the empty string")

I'm fine with this. Note, however, that an NCBI-like virtual sequence
couldn't be represented by our BioSequence with this restriction.

Scott

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121

Sender: muilu@ebi.ac.uk
Message-ID: <39C86E87.EDCD0FF4@ebi.ac.uk>
Date: Wed, 20 Sep 2000 08:00:08 +0000
From: Juha Muilu
Reply-To: muilu@ebi.ac.uk
Organization: EMBL-EBI
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Scott Markel
CC: lijnzaad@ebi.ac.uk, biomolecular-ftf@omg.org, senger@ebi.ac.uk
Subject: Re: Issue 3688: 'empty' attributes
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
    <39C7D953.84CCEC3A@netgenics.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: Z1Ud9\1+e90;!"![^$!!

Hi,
I wonder are there cases where sequence string can be empty but length > 0 ?

Scott Markel wrote:
>
> Philip,
>
> Philip Lijnzaad wrote:
> >
> > The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
> > the following attributes to be empty strings:
> >
> >   BioSequence::id
> >   BioSequence::seq to
>
> Typo here?
>
> >   Annotation::name
> >   Any occurrence of Identifier
> >     (for this, add, as a first bullet to section 2.1.8.1, the sentence
> >     "An Identifier may not be the empty string")
>
> I'm fine with this. Note, however, that an NCBI-like virtual sequence
> couldn't be represented by our BioSequence with this restriction.
>
> Scott
>
> --
> Scott Markel, Ph.D.       NetGenics, Inc.
> smarkel@netgenics.com     4350 Executive Drive
> Tel: 858 455 5223         Suite 260
> FAX: 858 455 1388         San Diego, CA 92121

--
+--------------------------------------------------------------------+
|Juha Muilu, Ph.D., EMBL Outstation| Email: muilu@ebi.ac.uk          |
|European Bioinformatics Institute | Phone: +44 (0)1223 494 624      |
|Wellcome Trust Genome Campus      | Fax: +44 (0)1223 494 468        |
|Hinxton, Cambridge CB10 1SD, UK   | http://industry.ebi.ac.uk/~muilu|
+--------------------------------------------------------------------+

Date: Wed, 20 Sep 2000 09:01:07 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: muilu@ebi.ac.uk
CC: lijnzaad@ebi.ac.uk, biomolecular-ftf@omg.org, senger@ebi.ac.uk
Subject: Re: Issue 3688: 'empty' attributes
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
    <39C7D953.84CCEC3A@netgenics.com>
    <39C86E87.EDCD0FF4@ebi.ac.uk>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: mW%e9;mC!!~_@!!26=!!

Yes. Quoting from NCBI documentation

  "A 'virtual' Bioseq is one in which we know the type of molecule, and
  possibly its length, topology, and/or strandedness, but for which we
  do not have sequence data."

Is this useful in the context of genomic maps?

Please keep in mind that I'm basically in favor of Philip's proposal.
See below. I just want to make sure that we don't bar a particular
usage that we might want to leverage later.

Scott

Juha Muilu wrote:
>
> Hi,
> I wonder are there cases where sequence string can be empty but
> length > 0 ?
>
> Scott Markel wrote:
> >
> > Philip,
> >
> > Philip Lijnzaad wrote:
> > >
> > > The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
> > > the following attributes to be empty strings:
> > >
> > >   BioSequence::id
> > >   BioSequence::seq to
> >
> > Typo here?
> >
> > >   Annotation::name
> > >   Any occurrence of Identifier
> > >     (for this, add, as a first bullet to section 2.1.8.1, the sentence
> > >     "An Identifier may not be the empty string")
> >
> > I'm fine with this. Note, however, that an NCBI-like virtual sequence
> > couldn't be represented by our BioSequence with this restriction.
> >
> > Scott

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121

From: "Dickson, Mike"
To: "Markel, Scott", muilu@ebi.ac.uk
Cc: lijnzaad@ebi.ac.uk, biomolecular-ftf@omg.org, senger@ebi.ac.uk
Subject: RE: Issue 3688: 'empty' attributes
Date: Wed, 20 Sep 2000 12:28:54 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
X-UIDL: n8R!!#O1e93_~e9W9od9

Why isn't this enough of a use case on its own? I.e. isn't the fact that
you can describe a use case for one enough? Even in this case you do have
a stable ID, right? It's just the sequence part that can be empty.

Mike

> -----Original Message-----
> From: Scott Markel [mailto:smarkel@netgenics.com]
> Sent: Wednesday, September 20, 2000 12:01 PM
> To: muilu@ebi.ac.uk
> Cc: lijnzaad@ebi.ac.uk; biomolecular-ftf@omg.org; senger@ebi.ac.uk
> Subject: Re: Issue 3688: 'empty' attributes
>
>
> Yes. Quoting from NCBI documentation
>
>   "A 'virtual' Bioseq is one in which we know the type of molecule, and
>   possibly its length, topology, and/or strandedness, but for which we
>   do not have sequence data."
>
> Is this useful in the context of genomic maps?
>
> Please keep in mind that I'm basically in favor of Philip's proposal.
> See below. I just want to make sure that we don't bar a particular
> usage that we might want to leverage later.
>
> Scott
>
> Juha Muilu wrote:
> >
> > Hi,
> > I wonder are there cases where sequence string can be empty but
> > length > 0 ?
> >
> > Scott Markel wrote:
> > >
> > > Philip,
> > >
> > > Philip Lijnzaad wrote:
> > > >
> > > > The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
> > > > the following attributes to be empty strings:
> > > >
> > > >   BioSequence::id
> > > >   BioSequence::seq to
> > >
> > > Typo here?
> > >
> > > >   Annotation::name
> > > >   Any occurrence of Identifier
> > > >     (for this, add, as a first bullet to section 2.1.8.1, the sentence
> > > >     "An Identifier may not be the empty string")
> > >
> > > I'm fine with this. Note, however, that an NCBI-like virtual sequence
> > > couldn't be represented by our BioSequence with this restriction.
> > >
> > > Scott
>
> --
> Scott Markel, Ph.D.       NetGenics, Inc.
> smarkel@netgenics.com     4350 Executive Drive
> Tel: 858 455 5223         Suite 260
> FAX: 858 455 1388         San Diego, CA 92121
>

Date: Mon, 25 Sep 2000 13:39:12 +0100 (BST)
Message-Id: <200009251239.e8PCdCQ42726@o2-3.ebi.ac.uk>
X-Authentication-Warning: o2-3.ebi.ac.uk: lijnzaad set sender to lijnzaad@ebi.ac.uk using -f
From: Philip Lijnzaad
To: smarkel@netgenics.com
CC: biomolecular-ftf@omg.org, senger@ebi.ac.uk, muilu@ebi.ac.uk
In-reply-to: <39C7D953.84CCEC3A@netgenics.com> (message from Scott Markel on
    Tue, 19 Sep 2000 14:23:31 -0700)
Subject: Re: Issue 3688: 'empty' attributes
Reply-to: lijnzaad@ebi.ac.uk
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
    <39C7D953.84CCEC3A@netgenics.com>
Content-Type: text
X-UIDL: ,h7!!20Oe9V&a!!a='!!

Scott> Philip,
Scott> Philip Lijnzaad wrote:
>>
>> The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
>> the following attributes to be empty strings:
>>
>> BioSequence::id
>> BioSequence::seq to

Scott> Typo here?

the "to" is a typo, but I think we should state that BioSequence::seq should
not be empty, simply because it would be a bit useless.

>> Annotation::name
>> Any occurrence of Identifier
>> (for this, add, as a first bullet to section 2.1.8.1, the sentence
>> "An Identifier may not be the empty string")

Scott> I'm fine with this. Note, however, that an NCBI-like virtual sequence
Scott> couldn't be represented by our BioSequence with this restriction.

? Don't know them, but would argue that you always want the
amino acids/nucleotides, whatsoever. And since this is CORBA, this can be done
cheaply; you don't have to explicitly store all those nucleotides if they
actually are constructed on the fly from lots of different other sequences.

BTW, do we state somewhere that the length attribute must be identical to the
length of the seq string ? I think we should. Cheers,

                                                       Philip
--
When C++ is your hammer, everything looks like a thumb. (Steven Haflich)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

Date: Mon, 25 Sep 2000 14:09:31 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: lijnzaad@ebi.ac.uk
CC: biomolecular-ftf@omg.org, senger@ebi.ac.uk, muilu@ebi.ac.uk
Subject: Re: Issue 3688: 'empty' attributes
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
    <39C7D953.84CCEC3A@netgenics.com>
    <200009251239.e8PCdCQ42726@o2-3.ebi.ac.uk>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: 9SI!!kRC!!m8j!!%>b!!

Philip,

Philip Lijnzaad wrote:
>
> Scott> Philip Lijnzaad wrote:
> >>
> >> The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
> >> the following attributes to be empty strings:
> >>
> >> BioSequence::id
> >> BioSequence::seq to
>
> Scott> Typo here?
>
> the "to" is a typo, but I think we should state that BioSequence::seq should
> not be empty, simply because it would be a bit useless.

I disagree on the uselessness. Quoting from NCBI documentation

  "A 'virtual' Bioseq is one in which we know the type of molecule, and
  possibly its length, topology, and/or strandedness, but for which we
  do not have sequence data."

I don't think we should preclude this usage.

> >> Annotation::name
> >> Any occurrence of Identifier
> >> (for this, add, as a first bullet to section 2.1.8.1, the sentence
> >> "An Identifier may not be the empty string")
>
> Scott> I'm fine with this. Note, however, that an NCBI-like virtual sequence
> Scott> couldn't be represented by our BioSequence with this restriction.
>
> ? Don't know them, but would argue that you always want the
> amino acids/nucleotides, whatsoever. And since this is CORBA, this can be done
> cheaply; you don't have to explicitly store all those nucleotides if they
> actually are constructed on the fly from lots of different other sequences.

It's not a question of expense. There may be cases where the bases just
aren't known, but there is other biosequence information that is known.

> BTW, do we state somewhere that the length attribute must be identical to the
> length of the seq string ? I think we should. Cheers,

We don't and I don't think we should. The latter obviously follows from
my comments above.

Scott

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121

Date: Tue, 26 Sep 2000 09:55:45 +0100 (BST)
Message-Id: <200009260855.e8Q8tjv46648@o2-3.ebi.ac.uk>
X-Authentication-Warning: o2-3.ebi.ac.uk: lijnzaad set sender to lijnzaad@ebi.ac.uk using -f
From: Philip Lijnzaad
To: smarkel@netgenics.com
CC: biomolecular-ftf@omg.org, senger@ebi.ac.uk, muilu@ebi.ac.uk
In-reply-to: <39CFBF0B.9F06EAED@netgenics.com> (message from Scott Markel on
    Mon, 25 Sep 2000 14:09:31 -0700)
Subject: Re: Issue 3688: 'empty' attributes
Reply-to: lijnzaad@ebi.ac.uk
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
    <39C7D953.84CCEC3A@netgenics.com>
    <200009251239.e8PCdCQ42726@o2-3.ebi.ac.uk>
    <39CFBF0B.9F06EAED@netgenics.com>
Content-Type: text
X-UIDL: i+e!!<6Z!!0~Ne9~/Me9

>> >> The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
>> >> the following attributes to be empty strings:
>> >>
>> >> BioSequence::id
>> >> BioSequence::seq to
>> >>
>> the "to" is a typo, but I think we should state that BioSequence::seq should
>> not be empty, simply because it would be a bit useless.

Scott> I disagree on the uselessness. Quoting from NCBI documentation

Scott> "A 'virtual' Bioseq is one in which we know the type of molecule, and
Scott> possibly its length, topology, and/or strandedness, but for which we
Scott> do not have sequence data."

OK, thanks, I thought they were less ephemeral. It seems I'm the only one
wanting to have these stronger semantics, so I won't insist (but see
below). So that would leave Annotation::name and all Identifiers as not being
allowed to be empty.

>> BTW, do we state somewhere that the length attribute must be identical to the
>> length of the seq string ? I think we should. Cheers,

Scott> We don't and I don't think we should. The latter obviously follows from
Scott> my comments above.

So ... is this BioSequence::length <-> BioSequence::seq.length() mismatch
only allowed if BioSequence::seq.equals("") ? If not, why not (use case), and
what are the semantics in this case?

Just to give my position: apart from the virtual sequence bit, I think the
two should be equal

 - to satisfy common expectation
 - to be able to rely on the length attribute when having to manage memory
   on the client side
 - to be able to decide whether to get the nucleotides 'all in one go' or
   in chunks using seq_interval().

Cheers,

                                                       Philip
--
When C++ is your hammer, everything looks like a thumb. (Steven Haflich)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

Date: Tue, 26 Sep 2000 10:15:37 -0700
From: Scott Markel
Organization: NetGenics, Inc.
X-Mailer: Mozilla 4.73 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: lijnzaad@ebi.ac.uk
CC: biomolecular-ftf@omg.org, senger@ebi.ac.uk, muilu@ebi.ac.uk
Subject: Re: Issue 3688: 'empty' attributes
References: <200009191134.e8JBYjc43145@o2-3.ebi.ac.uk>
    <39C7D953.84CCEC3A@netgenics.com>
    <200009251239.e8PCdCQ42726@o2-3.ebi.ac.uk>
    <39CFBF0B.9F06EAED@netgenics.com>
    <200009260855.e8Q8tjv46648@o2-3.ebi.ac.uk>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: mCad9<:L!!]Mn!!^S_d9

Philip,

Philip Lijnzaad wrote:
>
> >> >> The EBI proposes to resolve issue 3688 by adding text that explicitly forbids
> >> >> the following attributes to be empty strings:
> >> >>
> >> >> BioSequence::id
> >> >> BioSequence::seq to
> >>
> >> the "to" is a typo, but I think we should state that BioSequence::seq should
> >> not be empty, simply because it would be a bit useless.
>
> Scott> I disagree on the uselessness.
> Scott> Quoting from NCBI documentation
>
> Scott> "A 'virtual' Bioseq is one in which we know the type of molecule, and
> Scott> possibly its length, topology, and/or strandedness, but for which we
> Scott> do not have sequence data."
>
> OK, thanks, I thought they were less ephemeral. It seems I'm the only one
> wanting to have these stronger semantics, so I won't insist (but see
> below). So that would leave Annotation::name and all Identifiers as not being
> allowed to be empty.

Okay with me.

> >> BTW, do we state somewhere that the length attribute must be identical to the
> >> length of the seq string ? I think we should. Cheers,
>
> Scott> We don't and I don't think we should. The latter obviously follows from
> Scott> my comments above.
>
> So ... is this BioSequence::length <-> BioSequence::seq.length() mismatch
> only allowed if BioSequence::seq.equals("") ?

That's probably a reasonable restriction.

> If not, why not (use case), and
> what are the semantics in this case?

The only other real-world case is that you know/think/conjecture that a
sequence is 10K bases long, but you only know 300 bases at the 3' end
and 250 bases at the 5' end. While this is a real example, the
current BSA model wasn't designed to support this. I think we'd all
agree that what we had in mind was a single string with known bases
at the core of a BioSequence. An empty string is a special case of
this. The sequence with known ends and unknown middle should probably
be made from two sequences and, hence, is a different beast.

> Just to give my position: apart from the virtual sequence bit, I think the
> two should be equal
>
>  - to satisfy common expectation
>  - to be able to rely on the length attribute when having to manage memory
>    on the client side
>  - to be able to decide whether to get the nucleotides 'all in one go' or
>    in chunks using seq_interval().

Agreed.

Scott

--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121
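
Editorial sketch (not part of the archived thread): the rules the
discussion converges on can be written down as a small client-side
check. This is only an illustration under stated assumptions. The
Revised Text itself covers just the non-empty return values
(Annotation name, BioSequence id, SearchHit id); the rule that length
must match the seq string unless seq is empty was called "probably a
reasonable restriction" in the thread and is treated here as an
assumption. The class BioSequenceSnapshot and the function names are
hypothetical and do not come from the BSA IDL.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BioSequenceSnapshot:
    """Hypothetical client-side copy of three BioSequence attributes."""
    id: str       # Identifier; the Revised Text says it shall not be empty
    seq: str      # may be empty for an NCBI-like 'virtual' sequence
    length: int   # may be known even when seq is empty


def check_annotation_name(name: str) -> List[str]:
    """Return a list of problems for an Annotation name (empty list = OK)."""
    return ["Annotation name shall not be empty"] if name == "" else []


def check_bio_sequence(s: BioSequenceSnapshot) -> List[str]:
    """Return a list of problems for a BioSequence snapshot (empty list = OK)."""
    problems = []
    if s.id == "":
        problems.append("id shall not be empty")
    # Assumed rule from the thread: a length/seq mismatch is tolerated
    # only when seq is the empty string (the 'virtual sequence' case).
    if s.seq != "" and s.length != len(s.seq):
        problems.append("length differs from len(seq) for a non-empty seq")
    return problems


if __name__ == "__main__":
    virtual = BioSequenceSnapshot(id="X1", seq="", length=10000)
    print(check_bio_sequence(virtual))
```

Under these assumptions a virtual sequence (empty seq, known length)
passes, while the "known ends, unknown middle" case from the last
message would be reported as a mismatch, in line with the remark that
the current model was not designed for it.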