Issue 3699: proposed DsLSRBioObjects changes (04) (biomolecular-ftf)

Source: SciTegic Inc. (Scott Markel, Ph.D., smarkel@scitegic.com smarkel@san.rr.com)
Nature: Uncategorized Issue
Severity:
Summary:
Add SequenceAlphabet. We decided early on to avoid the sequence alphabet
issue. Well, it's back. It's pretty difficult to actually build a
GeneticCode or check for invalid residues in the sequence factories
without some standard functionality. The following is based on the IDL
extensions we are using in our implementation. The design was patterned
after what the submitters did for GeneticCode. By the way, this fits in
nicely with Philip's comment about adding DNASequence and RNASequence.

  typedef sequence<Residue> ResidueList;
  typedef string SequenceAlphabetName;
  typedef sequence<SequenceAlphabetName> SequenceAlphabetNameList;

  interface SequenceAlphabet
  {
    readonly attribute SequenceAlphabetName name;

    // valid is the union of unambiguous and ambiguous
    readonly attribute ResidueList valid_residues;
    readonly attribute ResidueList unambiguous_residues;
    readonly attribute ResidueList ambiguous_residues;

    boolean is_valid(in Residue the_residue);
    boolean is_ambiguous(in Residue the_residue);

    // returns the list of all residues included (represented) by
    // the input residue, e.g., return A for A and ACGT for N
    ResidueList included_residues(in Residue the_residue)
      raises(InvalidResidue);
  };

  interface NucleotideSequenceAlphabet : SequenceAlphabet
  {
    readonly attribute ResidueList complementary_valid_residues;
    readonly attribute ResidueList complementary_unambiguous_residues;
    readonly attribute ResidueList complementary_ambiguous_residues;

    Residue complement(in Residue the_residue)
      raises(InvalidResidue);
  };

  interface AminoAcidSequenceAlphabet : SequenceAlphabet
  {
  };

  exception InvalidSequenceAlphabetName
  {
    string invalid_name;
  };

  interface SequenceAlphabetFactory
  {
    const SequenceAlphabetName IUPAC_DNA = "IUPAC DNA";
    const SequenceAlphabetName IUPAC_RNA = "IUPAC RNA";
    const SequenceAlphabetName IUPAC_AA = "IUPAC AA";

    readonly attribute SequenceAlphabetNameList sequence_alphabet_names;

    SequenceAlphabet create_sequence_alphabet(in SequenceAlphabetName name)
      raises(InvalidSequenceAlphabetName);
  };

  interface GeneticCode
  {
    // in addition to its current functionality and the initiators
    // and terminators proposed above
    readonly attribute NucleotideSequenceAlphabet nucleotide_sequence_alphabet;
    readonly attribute AminoAcidSequenceAlphabet amino_acid_sequence_alphabet;
  };

  interface BioSequence
  {
    // in addition to its current functionality
    readonly attribute SequenceAlphabet sequence_alphabet;
  };

Resolution: Rejected as out of scope for the FTF. Withdrawn by proposer.
Revised Text:
Actions taken:
June 12, 2000: received issue
May 24, 2001: closed issue

Discussion:

End of Annotations:=====

Date: Mon, 12 Jun 2000 22:19:13 -0700
From: Scott Markel
Organization: NetGenics, Inc.
To: BSA FTF
Subject: [BSA-FTF] proposed DsLSRBioObjects changes

Several issues have come up during our implementation of the BSA
specification at NetGenics. In light of what we've learned, I'd like
to propose the following.

* Add SequenceAlphabet. We decided early on to avoid the sequence
  alphabet issue. Well, it's back. It's pretty difficult to actually
  build a GeneticCode or check for invalid residues in the sequence
  factories without some standard functionality. The following is
  based on the IDL extensions we are using in our implementation. The
  design was patterned after what the submitters did for GeneticCode.
  By the way, this fits in nicely with Philip's comment about adding
  DNASequence and RNASequence.

  typedef sequence<Residue> ResidueList;
  typedef string SequenceAlphabetName;
  typedef sequence<SequenceAlphabetName> SequenceAlphabetNameList;

  interface SequenceAlphabet
  {
    readonly attribute SequenceAlphabetName name;

    // valid is the union of unambiguous and ambiguous
    readonly attribute ResidueList valid_residues;
    readonly attribute ResidueList unambiguous_residues;
    readonly attribute ResidueList ambiguous_residues;

    boolean is_valid(in Residue the_residue);
    boolean is_ambiguous(in Residue the_residue);

    // returns the list of all residues included (represented) by
    // the input residue, e.g., return A for A and ACGT for N
    ResidueList included_residues(in Residue the_residue)
      raises(InvalidResidue);
  };

  interface NucleotideSequenceAlphabet : SequenceAlphabet
  {
    readonly attribute ResidueList complementary_valid_residues;
    readonly attribute ResidueList complementary_unambiguous_residues;
    readonly attribute ResidueList complementary_ambiguous_residues;

    Residue complement(in Residue the_residue)
      raises(InvalidResidue);
  };

  interface AminoAcidSequenceAlphabet : SequenceAlphabet
  {
  };

  exception InvalidSequenceAlphabetName
  {
    string invalid_name;
  };

  interface SequenceAlphabetFactory
  {
    const SequenceAlphabetName IUPAC_DNA = "IUPAC DNA";
    const SequenceAlphabetName IUPAC_RNA = "IUPAC RNA";
    const SequenceAlphabetName IUPAC_AA = "IUPAC AA";

    readonly attribute SequenceAlphabetNameList sequence_alphabet_names;

    SequenceAlphabet create_sequence_alphabet(in SequenceAlphabetName name)
      raises(InvalidSequenceAlphabetName);
  };

  interface GeneticCode
  {
    // in addition to its current functionality and the initiators
    // and terminators proposed above
    readonly attribute NucleotideSequenceAlphabet nucleotide_sequence_alphabet;
    readonly attribute AminoAcidSequenceAlphabet amino_acid_sequence_alphabet;
  };

  interface BioSequence
  {
    // in addition to its current functionality
    readonly attribute SequenceAlphabet sequence_alphabet;
  };

Date: Tue, 13 Jun 2000 09:53:19 +0100 (BST)
Message-Id: <200006130853.e5D8rJJ20118@o2-3.ebi.ac.uk>
From: Philip Lijnzaad
To: smarkel@netgenics.com
CC: biomolecular-ftf@omg.org
In-reply-to: <3945C451.6CDBB21E@netgenics.com> (message from Scott Markel on Mon, 12 Jun 2000 22:19:13 -0700)
Subject: Re: [BSA-FTF] proposed DsLSRBioObjects changes
Reply-to: lijnzaad@ebi.ac.uk
References: <3945C451.6CDBB21E@netgenics.com>

> Several issues have come up during our implementation of the BSA
> specification at NetGenics. In light of what we've learned, I'd like
> to propose the following.

> * Have the BSA factories, e.g., AnnotationFactory, inherit from
>   CosLifeCycle::Factory. CosLifeCycle::LifeCycleObject's copy() and
>   move() methods take a CosLifeCycle::FactoryFinder as their
>   first argument. CosLifeCycle::FactoryFinder naturally returns
>   CosLifeCycle::Factories. Note that CosLifeCycle::Factory is just a
>   typedef'd Object, so no additional implementation is required. So,
>   for no additional work, we can make the use of CosLifeCycle easier.

That's fine for this case. At the same time I'd like to state that I'm
not terribly fond of CosLifeCycle as a whole. From what I see, it's been
in effect more or less deprecated, among other reasons because it's
really difficult to pin down and implement the semantics of move() and
copy() in the presence of object graphs.

> * Remove CosLifeCycle::LifeCycleObject behavior from GeneticCode.
>   It seems to make sense to have GeneticCodes be singletons, so copy(),
>   move(), and remove() really aren't needed.

yes, agreed.

> * Add initiators and terminators to GeneticCode. Try using GeneticCode
>   to build an ORF finder and you'll appreciate the following
>   functionality.

>   typedef sequence<Codon> CodonList;
>   interface GeneticCode
>   {
>     // in addition to existing functionality
>     readonly attribute CodonList initiators;
>     readonly attribute CodonList terminators;
>     boolean is_initiator(in Codon the_codon)
>       raises(InvalidResidue);
>     boolean is_terminator(in Codon the_codon)
>       raises(InvalidResidue);
>   };

Fine with me. (BTW, in InvalidResidue, what is offset? Is that the phase
of the nucleotide or so (i.e. 1 or 2 or 3)? Why is it long?)

> * Add SequenceAlphabet. We decided early on to avoid the sequence
>   alphabet issue. Well, it's back. It's pretty difficult to actually
>   build a GeneticCode or check for invalid residues in the sequence
>   factories without some standard functionality. The following is
>   based on the IDL extensions we are using in our implementation. The
>   design was patterned after what the submitters did for GeneticCode.
>   By the way, this fits in nicely with Philip's comment about adding
>   DNASequence and RNASequence.

>   typedef sequence<Residue> ResidueList;
>   typedef string SequenceAlphabetName;
>   typedef sequence<SequenceAlphabetName> SequenceAlphabetNameList;
>   interface SequenceAlphabet
>   {
>     readonly attribute SequenceAlphabetName name;
>     // valid is the union of unambiguous and ambiguous
>     readonly attribute ResidueList valid_residues;
>     readonly attribute ResidueList unambiguous_residues;
>     readonly attribute ResidueList ambiguous_residues;
>     boolean is_valid(in Residue the_residue);
>     boolean is_ambiguous(in Residue the_residue);
>     // returns the list of all residues included (represented) by
>     // the input residue, e.g., return A for A and ACGT for N
>     ResidueList included_residues(in Residue the_residue)
>       raises(InvalidResidue);
>   };

sounds OK.

>   interface NucleotideSequenceAlphabet : SequenceAlphabet
>   {
>     readonly attribute ResidueList complementary_valid_residues;
>     readonly attribute ResidueList complementary_unambiguous_residues;
>     readonly attribute ResidueList complementary_ambiguous_residues;
>     Residue complement(in Residue the_residue)
>       raises(InvalidResidue);
>   };

why do we need the readonly attributes; they would be in the superclass?
The complement() function seems OK. But does all this belong in
DsLSRBioObjects? I think some of this is really more of a utility, so
we should think about putting this in a separate module.

>   interface AminoAcidSequenceAlphabet : SequenceAlphabet
>   {
>   };
>   exception InvalidSequenceAlphabetName
>   {
>     string invalid_name;
>   };
>   interface SequenceAlphabetFactory
>   {
>     const SequenceAlphabetName IUPAC_DNA = "IUPAC DNA";
>     const SequenceAlphabetName IUPAC_RNA = "IUPAC RNA";
>     const SequenceAlphabetName IUPAC_AA = "IUPAC AA";
>     readonly attribute SequenceAlphabetNameList sequence_alphabet_names;
>     SequenceAlphabet create_sequence_alphabet(in SequenceAlphabetName name)
>       raises(InvalidSequenceAlphabetName);
>   };
>   interface GeneticCode
>   {
>     // in addition to its current functionality and the initiators
>     // and terminators proposed above
>     readonly attribute NucleotideSequenceAlphabet nucleotide_sequence_alphabet;
>     readonly attribute AminoAcidSequenceAlphabet amino_acid_sequence_alphabet;
>   };
>   interface BioSequence
>   {
>     // in addition to its current functionality
>     readonly attribute SequenceAlphabet sequence_alphabet;
>   };

looks OK.

> * Consider changing AlignmentElement's element from an Object to a
>   BioSequence. At the moment element is required to have a name and
>   provide character-based data, but there's no way to ensure that
>   behavior. I really dislike implementations that have to hard code
>   checks on the type of a data member.
>   If the upcoming BSA follow-on RFPs add new bio-objects, we'll end up
>   checking for BioSequence, HMM, profile, etc., and then have to treat
>   each one individually.

yes, I agree, but if they are BioSequence and you want to also use HMMs
and profiles and patterns, then all of them have to have

  readonly attribute string seq;

and

  string seq_interval(in Interval the_interval);

This string is not meaningful for HMMs, profiles and patterns; this was
the reason for keeping CORBA::Object there. However, these use cases are
probably very infrequent, so it's OK with us to make it BioObjects, as
long as we document clearly that it is not the intention of the spec to
invent new alphabets that are intended to encode HMM states or patterns;
that's just the wrong way to do it, if only because HMMs and patterns
need not even be linear.

Cheers,
Philip
--
Ban GM foods! Long live the Mesolithicum and pesticides!
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute, rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53

Date: Tue, 19 Sep 2000 20:21:40 -0700
From: Scott Markel
Organization: NetGenics, Inc.
To: BSA FTF
Subject: [OMG-BSA] proposed resolutions for some issues

In accordance with the consensus at last week's FTF meeting, here are
the proposed resolutions for the issues that were considered
straightforward. Note that as some multi-issue issues are split apart,
some of those may also fall into this category.

I'm not reproducing the entire text of the individual issues. You can
find that at http://cgi.omg.org/issues/biomolecular-ftf.html. Issues
can be reported online at http://www.omg.org/technology/issuesform.htm.

This message is *not* a vote. I just want to make sure these particular
issues are as straightforward as some of us thought last week. Please
let me know if you disagree. Actually it would be nice to know if you
agree, too. :) If there's been no response by early next week (Philip
is out until then), I'll send out a message kicking off a vote. If
there are responses, hopefully we'll get the resolutions resolved and
still vote on these issues next week.

Scott

Issue 3699: proposed DsLSRBioObjects changes (04)
Resolution: Rejected. Withdrawn by proposer.
----
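
Illustration of the proposed SequenceAlphabet semantics. The following
is a minimal Python sketch, not part of the issue text or of any BSA
implementation, of how is_valid, is_ambiguous, included_residues, and
complement from the proposed IDL might behave for the IUPAC DNA
alphabet. Only the operation and attribute names come from the IDL in
this issue. The class layout, the IUPAC_DNA_CODES and BASE_COMPLEMENT
tables, and the choice to derive complements from the included-residue
sets are assumptions made for illustration.

```python
# Hypothetical sketch: operation names mirror the proposed IDL, but the
# data tables and class design are assumptions, not the BSA implementation.

# IUPAC nucleotide codes mapped to the unambiguous bases each one represents.
IUPAC_DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Watson-Crick pairing for the four unambiguous DNA bases.
BASE_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class InvalidResidue(Exception):
    """Stand-in for the IDL InvalidResidue exception."""


class NucleotideSequenceAlphabet:
    def __init__(self, name, codes, base_complement):
        self.name = name
        # residue code -> set of unambiguous residues it represents
        self._codes = {code: frozenset(bases) for code, bases in codes.items()}
        # reverse lookup: set of unambiguous residues -> residue code
        self._by_set = {bases: code for code, bases in self._codes.items()}
        self._base_complement = base_complement

    @property
    def valid_residues(self):
        # valid is the union of unambiguous and ambiguous
        return sorted(self._codes)

    @property
    def unambiguous_residues(self):
        return sorted(c for c, bases in self._codes.items() if len(bases) == 1)

    @property
    def ambiguous_residues(self):
        return sorted(c for c, bases in self._codes.items() if len(bases) > 1)

    def is_valid(self, residue):
        return residue in self._codes

    def is_ambiguous(self, residue):
        # The IDL declares no exception here, so unknown residues give False.
        return len(self._codes.get(residue, ())) > 1

    def included_residues(self, residue):
        if residue not in self._codes:
            raise InvalidResidue(residue)
        return sorted(self._codes[residue])

    def complement(self, residue):
        # Complement each included base, then find the code for that set.
        included = self.included_residues(residue)
        paired = frozenset(self._base_complement[base] for base in included)
        return self._by_set[paired]


iupac_dna = NucleotideSequenceAlphabet("IUPAC DNA", IUPAC_DNA_CODES, BASE_COMPLEMENT)
```

With this sketch, iupac_dna.included_residues("N") yields A, C, G, T,
matching the comment in the proposed IDL, and iupac_dna.complement("R")
yields Y, because R represents A or G and their complements T and C
are together represented by Y.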