Issue 3335: how to extend BSA DTF?
Issue 3336: truncatable valuetypes
Issue 3437: Biomolecular FTF issue: orb.idl
Issue 3451: MultipleExceptions
Issue 3546: Ambiguous wording in section 2.2.10
Issue 3547: Editorial issue in section 2.2.10
Issue 3548: 2.2.10 has a bullet "an AnalysisType" which is no more valid
Issue 3549: 2.2.10 input Properties
Issue 3550: 2.2.10 the second paragraph is a bit confusing
Issue 3551: method 'result()' of an AnalysisInstance object
Issue 3552: time stamps in section 2.2.9
Issue 3553: Exception NotRunning() is raised by terminate() only
Issue 3554: 2.2.10 about last_event
Issue 3555: Precise definition needed
Issue 3556: Life cycle of an AnalysisInstance
Issue 3557: why does SimilaritySearchHit contain a list of Alignments
Issue 3595: using Time Service
Issue 3687: DsLSRBioObjects issue
Issue 3688: do we have to be more strict about the contents of attributes
Issue 3689: BASIS_NOT_APPLICABLE
Issue 3690: Metadata for properties necessary?
Issue 3691: The spec mentions IUPAC-IUB single letter codes
Issue 3692: DNASequence and RNASequence
Issue 3696: proposed DsLSRBioObjects changes (01)
Issue 3697: proposed DsLSRBioObjects changes (02)
Issue 3698: proposed DsLSRBioObjects changes (03)
Issue 3699: proposed DsLSRBioObjects changes (04)
Issue 3700: proposed DsLSRBioObjects changes (05)
Issue 3763: why no CosLifeCycle::LifeCycleObject for BioSequence ?
Issue 3805: section 2.1.5 (p. 2-7) - CompositeSeqRegion
Issue 3806: section 2.1.9 (p. 2-21) - IntervalOutOfBounds and SeqRegionOutOfBounds
Issue 3807: section 2.1.9 (p. 2-22) SeqRegionInvalid
Issue 3808: section 2.1.9 (p. 2-24) BioSequence
Issue 3809: section 2.1.9 (p. 2-24) BioSequence
Issue 3810: section 2.1.9 (p. 2-24) BioSequence
Issue 3811: section 2.1.9 (p. 2-25) BioSequence
Issue 3812: section 2.1.10 (p. 2-28) NucleotideSequence
Issue 3813: section 2.1.10 (p. 2-30) NucleotideSequence
Issue 3814: section 2.1.15 (p. 2-44) Alignment
Issue 3815: section 2.1.22 (p. 2-63) SeqAnnotationOutOfBounds
Issue 3816: section 2.1.25 (p. 2-71) GeneticCodeFactory
Issue 3870: typedef sequence<Interval> IntervalList; is missing from the spec
Issue 3871: IntervalList get_gaps(in AlignmentElement element, in Interval the_interval
Issue 3872: Separate the Alignment interface into more managable pieces
Issue 3874: add an Identifier to SeqRegion
Issue 3875: inheritance in annotation iterators
Issue 3924: module DsLSRAnalysis issue
Issue 3933: How to stringified CORBA TypeCodes
Issue 3934: Second issue - How to specify "repetitive" inputs
Issue 3961: SeqRegionInvalid not just for wrong StrandTypes
Issue 3962: clarification of strand_type and CompositeSeqRegions
Issue 3963: clarification of SeqRegionOperator.ORDER
Issue 3964: OutOfBounds exceptions for circular sequences if start = 0
Issue 3965: add BioSequence.get_annotations_by_name()?
Issue 3968: use key in Alignment's get_seq_region()
Issue 3335: how to extend BSA DTF? (biomolecular-ftf)
Source: Japan Biological Informatics Consortium (Mr. Martin Senger, martin.senger(at)gmail.com)
Nature: Uncategorized Issue
Severity:
Summary:
The question is how to extend BSA metadata to be able to include
additional vendor-specific tags.
Originally, I suggested that the tag <extension> of type ANY be used for
these specific tags. However, at that time my understanding of ANY was
wrong. I thought that an element of type ANY could hold any data, including
new tags. That was a wrong assumption: one can use only tags (any of
them) defined in the DTD.
So now my understanding is that if I need vendor-specific tags, I
need to extend the DTD. The XML books say that an internal DTD has priority
over an external DTD and that document authors can override things using
their own internal DTD. Is this the way to extend BSA metadata?
I have tried. Here is an example of an extended XML file with an
internal DTD defining new tags:
<?xml version = "1.0"?>
<!DOCTYPE DsLSRAnalysis SYSTEM "http://localhost/openBSA/docs/DsLSRAnalysis.dtd" [
<!ELEMENT extension (part1?,part2?,part3*)>
<!ELEMENT part1 ANY>
<!ELEMENT part2 ANY>
<!ELEMENT part3 ANY>
]>
<DsLSRAnalysis>
<analysis type = "search.list">
<extension>
<part1> this is part 1 </part1>
</extension>
</analysis>
</DsLSRAnalysis>
Unfortunately, this does not pass validation tests. I am getting an
error saying "Error at (file http://....DsLSRAnalysis.dtd, line 13, char
25): Duplicate element name, "extension"".
Any ideas what I am doing wrong, or what the correct way to
extend BSA metadata is? Is the element <extension> in the BSA DTD usable at all?
If yes, how can it be used? If not, should we reconsider it in the FTF?
Issue 3336: truncatable valuetypes
Summary:
It seems to me that we have an error (or errors) in our spec. If I am right, I would like to bring this issue to the attention of the FTF during the Denver meeting. Please help me find out whether I am mistaken.
My understanding is that to be able to extend the BSA spec by introducing new valuetypes as subclasses of the existing BSA valuetypes, we need to use "truncatable" in our spec. With it, "extended" servers can still provide sub-classed valuetypes to "dumb" clients (those that understand only the original valuetype). Without this keyword, extended servers can work only with extended clients, which is not what we wanted (remember the discussion with Oxford Molec. about it).
Also, the BSA document says in 1.4.1 that we are going to use "truncatable". However, we did not.
Issue 3437: Biomolecular FTF issue: orb.idl
Summary:
The CORBA spec says (in chapter 3.14 "CORBA Module") that "the file orb.idl must be included in IDL files that use names defined in the CORBA module". Our spec does use CORBA::TypeCode (in module DsLSRAnalysis), so I guess that we should also have #include <orb.idl> there. Some ORBs may not compile it properly without it.
Issue 3451: MultipleExceptions
Summary:
While implementing AnalysisService::create_analysis() I found that the MultipleExceptions exception gives me very little room for sending back an explanation of why the parameter checking failed. I can say which property caused the failure and I can roughly indicate the type of the failure, but unfortunately there is no 'reason' string to fill with an explanatory message.
Issue 3546: Ambiguous wording in section 2.2.10
Summary:
2.2.10 says "An AnalysisInstance object must offer...the EventChannel to which it publishes its analysis events and the last event that occurred..." The wording seems ambiguous to me. I guess that it is the AnalysisInstance that offers the last event, not the EventChannel, so the last event should be under a separate bullet. Suggestion: split the bullet into two, and also reiterate that the EventChannel can be null (as stated in 2.2.7).
Issue 3547: Editorial issue in section 2.2.10
Summary:
2.2.10 (and other places) says "the JobControl that clients can use to control the execution...". I think that the word "can" is misleading here: a client could not start an analysis without a JobControl. Suggestion: drop "can".
Issue 3548: 2.2.10 has a bullet "an AnalysisType" which is no more valid
Summary:
2.2.10 has a bullet "an AnalysisType" which is no longer valid: an AnalysisInstance does not have any such attribute. Suggestion: remove the bullet.
Issue 3549: 2.2.10 input Properties
Summary:
2.2.10 says "the input Properties that were used in execution". This is not exact: the input properties are already available after creation of an AnalysisInstance, even before any execution has started. Suggestion: change the wording to something like "used in creation of this AnalysisInstance".
Issue 3550: 2.2.10 the second paragraph is a bit confusing
Summary:
2.2.10 the second paragraph is a bit confusing. If you read it slowly it may give you the impression that run() is used for asynchronous invocation and wait() for synchronous. Suggestion: replace the last sentence with: "If the client wants to be blocked waiting for the underlying BSA analysis tool to run to completion, it will invoke the run() method, followed immediately by the wait() method which will block the client until service execution completes."
Issue 3551: method 'result()' of an AnalysisInstance object
Summary:
Regarding method 'result()' of an AnalysisInstance object: if a particular result does not exist (not yet, or not at all), should this method just ignore it, or send back an "empty" Any (with TypeCode kind tk_void)? We should also specify what happens if the methods result() and get_result() are called when an analysis is in the CREATED and/or RUNNING states (as we do for the TERMINATED* states). Suggestion: we need to discuss whether a document clarification would be enough, or whether we want an additional exception.
Issue 3552: time stamps in section 2.2.9
Summary:
About the time stamp attributes (Execution performance information) in section 2.2.9: it is not clear whether the current set of attributes is rich enough to provide the real "execution" information. Calling the run() method is not necessarily the same as starting an execution (consider a queueing system). Suggestion: discuss, and maybe add an attribute keeping the "real" execution start; in any case the spec probably needs some clarification.
Issue 3553: Exception NotRunning() is raised by terminate() only
Summary:
Exception NotRunning() is raised by terminate() only (as I remember) if the analysis is "not yet" running, but it should not be raised if it is already terminated. Suggestion: discuss whether we can agree on the sentence above, and include it in the document.
Issue 3554: 2.2.10 about last_event
Summary:
2.2.10 about last_event says "last event that occurred during execution". So what does this attribute return when the analysis instance is still in the CREATED state? Suggestion: clarify in the document that in such a case a plain AnalysisEvent is returned.
Issue 3555: Precise definition needed
Summary:
The document only indicates (by examples) how to code CORBA::TypeCodes
into the stringified form used in metadata. We need a much more precise definition
of how to do it and of what to expect in the 'type' tag of metadata.
Suggestion:
a) investigate whether such "rules" already exist in the OMG world (such
as XMI), then clarify it in our document.
b) we may find that for sequences of IDL types defined in the
DsLSRBioObjects module we will also need to add several sequence<...>
constructs to our spec (fortunately we already have quite a lot there);
this would be needed to be able to specify a "repetitive" argument,
when we do not know how many occurrences it can have.
Issue 3556: Life cycle of an AnalysisInstance
Summary:
AnalysisInstance objects have a back pointer to their AnalysisServices. However, I can imagine a situation where clients have AnalysisInstance objects stored somewhere and use them to retrieve the results of analyses that were started and finished previously (even long ago). This normal situation becomes interesting if the server no longer provides that analysis. I feel that it would still be reasonable to provide access to the old stored results.
Suggestions: to allow this, our spec should mention this possibility and allow the back pointer to be empty. The client will then have access to its data but with limited support from the server (for example, the client cannot get metadata because the corresponding analysis is simply dead). Or we could say that the back pointer cannot be empty but can raise NO_OBJ... or whatever system exception (in situations as described above). Or we would have to change the readonly attribute to a method so that we can add the exception.
Issue 3557: why does SimilaritySearchHit contain a list of Alignments
Summary:
During our implementation work we came across the following point that
is puzzling:
why does SimilaritySearchHit contain a list of Alignments? A SearchHit is a
match of one target sequence with one or more matched sequences. Practically
by definition, each such hit is _one_ alignment (even if the alignment is
more than pairwise). So we should either change the spec to be
valuetype SimilaritySearchHit : SearchHit
{
    public Alignment the_alignment;
};
or leave the spec and document that the only legal length for
alignment_list is 1. Thoughts?
Issue 3595: using Time Service
Summary:
I would like to discuss the usage of data types from the TimeBase module. It is probable that I just need some patient explanation, but it may also turn out that some clarification in our BSA spec could be made. I mean a clarification of how to use the TimeBase data types properly.
The talk is about the "absolute" attributes: created, started, ended (whatever moment they mean - it is a different topic). These attributes are expressed as TimeBase::UtcT, which means that each of them consists of 4 values: a TimeT, two InaccuracyT's and a TdfT.
The simplest is probably TdfT, which defines in which time zone the analysis was executed. Is this interpretation correct?
The members for inaccuracy may be used to say that the main time (in TimeT, see below) is actually only precise to milliseconds, because I guess most implementations and languages easily give us only milliseconds and not 100 nanoseconds. Again, would this interpretation of these attributes be correct?
The main attribute is TimeT. Here is the definition from the TimeBase spec: "TimeT represents a single time value, which is 64 bits in size, and holds the number of 100 nanoseconds that have passed since the base time. For absolute time the base is 15 October 1582 00:00 of the Gregorian Calendar. All absolute time shall be computed using dates from the Gregorian Calendar."
My question is: does the CORBA Time Service specify that this attribute must be an _absolute_ time? Would it not be easier (and more appropriate for our purposes) to say in our spec that this time is _relative_ to the beginning of the epoch (1/1/1970)? In other words, there are actually two questions:
- Do we need to use an absolute time here (to remain compliant with the Time Service)?
- And if the answer is "no", then: do we want to have here an absolute Gregorian time?
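For anyone weighing the two time bases discussed above, here is a minimal Python sketch of the conversion between epoch-relative milliseconds and a TimeT value as quoted from the TimeBase spec (100-nanosecond units since 15 October 1582). The function names are hypothetical and are not part of the BSA or Time Service IDL; the sketch only shows that either representation can be derived from the other.

```python
from datetime import date

# TimeT counts 100-nanosecond units since 15 October 1582 (Gregorian calendar).
UNITS_PER_SECOND = 10_000_000
UNITS_PER_MILLISECOND = 10_000

# Whole days between the TimeBase base date and the Unix epoch (1/1/1970).
DAYS_BETWEEN_BASES = (date(1970, 1, 1) - date(1582, 10, 15)).days

# The same offset expressed in 100 ns units.
EPOCH_OFFSET = DAYS_BETWEEN_BASES * 86_400 * UNITS_PER_SECOND


def unix_millis_to_timet(millis: int) -> int:
    """Hypothetical helper: epoch-relative milliseconds -> absolute TimeT."""
    return EPOCH_OFFSET + millis * UNITS_PER_MILLISECOND


def timet_to_unix_millis(timet: int) -> int:
    """Hypothetical helper: absolute TimeT -> epoch-relative milliseconds."""
    return (timet - EPOCH_OFFSET) // UNITS_PER_MILLISECOND
```

A server that only has millisecond clocks could fill TimeT this way and use the inaccuracy members to signal the reduced precision, if that interpretation of the inaccuracy members is confirmed.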
Issue 3687: DsLSRBioObjects issue
Summary:
We have been busy implementing the BSA spec, and there are a few things in the
DsLSRBioObjects part that we would like your opinion on.
The biggest one concerns the Alignment interface. As anticipated by all of us
except Ewan, Alignment is tricky to implement efficiently because of
gaps. The only way to find them is to repeatedly invoke get_seq_region and
see if you get back a null. Of course, an AlignmentEncoder can do this on
the server side, but this is still clumsy (and optional).
Our main proposal is to add an operation
IntervalList get_gaps(in AlignmentElement element, in Interval the_interval);
to the interface Alignment. Its job is simply to return all the gaps of a
particular sequence in a particular alignment. For symmetry with
get_seq_region(), the_interval is also given, thus limiting the gaps to those
that you're interested in.
[ Incidentally,
typedef sequence<Interval> IntervalList;
is missing from the spec; can this be added? ]
One gap would be represented as {start,length}, hence the use of Interval. We
could add
typedef Interval Gap;
to make the semantics clearer (elsewhere, an Interval is an existing segment;
here it denotes a missing segment). The coordinates of a gap would be those
of the original sequence; gaps of length 0 are not allowed. A gap.start == 0
would be before the first nucleotide/amino acid; a gap.start == N is a gap
between nucleotides/amino acids N and N+1 (so gap.start == sequence.length
would be after the last).
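To illustrate the proposed gap coordinates, here is a minimal Python sketch that derives {start,length} pairs from one aligned row. It is not the proposed IDL operation: the row-as-string representation, the '-' gap character and the names Gap and gaps_in_row are assumptions made for the example only, and the the_interval filter is left out.

```python
from typing import List, NamedTuple


class Gap(NamedTuple):
    start: int   # residues of the original sequence that precede the gap
    length: int  # number of gap positions; never 0


def gaps_in_row(aligned_row: str, gap_char: str = "-") -> List[Gap]:
    """Return the gaps of one aligned row in original-sequence coordinates."""
    gaps: List[Gap] = []
    residues_seen = 0  # how many real residues have been passed so far
    run = 0            # length of the gap run currently being scanned
    for ch in aligned_row:
        if ch == gap_char:
            run += 1
            continue
        if run:
            gaps.append(Gap(residues_seen, run))
            run = 0
        residues_seen += 1
    if run:
        # A trailing gap starts at sequence.length, i.e. after the last residue.
        gaps.append(Gap(residues_seen, run))
    return gaps
```

For the row "--AC-GT-" (original sequence ACGT) this yields a gap at start 0 of length 2, a gap at start 2 of length 1, and a gap at start 4 of length 1.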
Another proposal is to separate Alignment into more manageable pieces as
follows:
interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
    // ...
    // here, everything BUT get_seq_region()
    // ...
};
interface Alignment : SimpleAlignment {
    SeqRegion get_seq_region(
        in AlignmentElement element,
        in Interval the_interval)
        raises(ElementNotInAlignment, IntervalOutOfBounds);
};
Issue 3688: do we have to be more strict about the contents of attributes
Summary:
Do we have to be more strict about the contents of attributes? E.g.,
interface BioSequence
{
    readonly attribute string name;
    readonly attribute Identifier id;
    readonly attribute string description;
    readonly attribute string seq;
    readonly attribute unsigned long length;
    readonly attribute Basis the_basis;
    // ...
};
has quite a few strings, all of which could be empty. We may want to
explicitly allow or forbid this on some of them.
Issue 3689: BASIS_NOT_APPLICABLE
Summary:
For enum Basis, we only have BASIS_NOT_KNOWN; for consistency with enum StrandType, shouldn't we also have BASIS_NOT_APPLICABLE? (Actually, I think this Basis thing may turn out to become an epistemological nightmare, but I digress.)
Issue 3690: Metadata for properties necessary?
Summary:
We have many properties lying around the spec; do we need a mechanism for getting metadata on them? I think we must have discussed this in some detail, but the situation is quite pressing for SearchHit in particular: there we have hit_info, where probably the most important piece of information is the goodness of the hit. What's the name of that: "score"? "p-value"? "quality"? What's its type: float? double? long? I realize that this is difficult to answer (otherwise we would have included it), but we are open to suggestions to make the standard a bit more precise in this respect (e.g., we're still willing to host a public repository kind-a-thing).
Issue 3691: The spec mentions IUPAC-IUB single letter codes
Summary:
The spec mentions IUPAC-IUB single letter codes.
I think this has been renamed to IUPAC-IUBMB Joint Commission on
Biochemical Nomenclature (JCBN); the best web sites for this are
http://www.chem.qmw.ac.uk/iupac/AminoAcid/ (amino acids)
http://www.chem.qmw.ac.uk/iubmb/misc/naseq.html (nucleic acids)
http://www.chem.qmw.ac.uk/iubmb/nomenclature/ (all biostandards)
These might be useful to include in the references.
IUB allows termination characters in amino acid sequences; I think we should not
allow them. Also, it might be good to explicitly state that gap characters
('-' or so) are not supposed to be in sequences (this is fairly implicit in
IUPAC, so I think it doesn't hurt to be clear about this).
Issue 3692: DNASequence and RNASequence
Summary:
Can anybody remember why we don't have DNASequence and RNASequence, both inheriting from NucleotideSequence?
Resolution:
Rejected. This had been debated by the submitters during the revised submission process. A future RFP could address this.
Issue 3696: proposed DsLSRBioObjects changes (01)
Summary:
Several issues have come up during our implementation of the BSA
specification at NetGenics. In light of what we've learned, I'd like
to propose the following.
* Have the BSA factories, e.g., AnnotationFactory, inherit from
CosLifeCycle::Factory. CosLifeCycle::LifeCycleObject's copy()
and move() methods take a CosLifeCycle::FactoryFinder as their
first argument. CosLifeCycle::FactoryFinder naturally returns
CosLifeCycle::Factories. Note that CosLifeCycle::Factory is just a
typedef'd Object, so no additional implementation is required. So,
for no additional work, we can make the use of CosLifeCycle easier.
Issue 3697: proposed DsLSRBioObjects changes (02)
Summary:
* Remove CosLifeCycle::LifeCycleObject behavior from GeneticCode.
It seems to make sense to have GeneticCodes be singletons, so copy(),
move(), and remove() really aren't needed.
Issue 3698: proposed DsLSRBioObjects changes (03)
Summary:
* Add initiators and terminators to GeneticCode. Try using GeneticCode
to build an ORF finder and you'll appreciate the following
functionality.
typedef sequence<Codon> CodonList;
interface GeneticCode
{
    // in addition to existing functionality
    readonly attribute CodonList initiators;
    readonly attribute CodonList terminators;
    boolean is_initiator(in Codon the_codon)
        raises(InvalidResidue);
    boolean is_terminator(in Codon the_codon)
        raises(InvalidResidue);
};
Issue 3699: proposed DsLSRBioObjects changes (04)
Summary:
* Add SequenceAlphabet. We decided early on to avoid the sequence
alphabet issue. Well, it's back. It's pretty difficult to actually
build a GeneticCode or check for invalid residues in the sequence
factories without some standard functionality. The following is
based on the IDL extensions we are using in our implementation. The
design was patterned after what the submitters did for GeneticCode.
By the way, this fits in nicely with Philip's comment about adding
DNASequence and RNASequence.
typedef sequence<Residue> ResidueList;
typedef string SequenceAlphabetName;
typedef sequence<SequenceAlphabetName> SequenceAlphabetNameList;
interface SequenceAlphabet
{
readonly attribute SequenceAlphabetName name;
// valid is the union of unambiguous and ambiguous
readonly attribute ResidueList valid_residues;
readonly attribute ResidueList unambiguous_residues;
readonly attribute ResidueList ambiguous_residues;
boolean is_valid(in Residue the_residue);
boolean is_ambiguous(in Residue the_residue);
// returns the list of all residues included (represented) by
// the input residue, e.g., return A for A and ACGT for N
ResidueList included_residues(in Residue the_residue)
raises(InvalidResidue);
};
interface NucleotideSequenceAlphabet : SequenceAlphabet
{
readonly attribute ResidueList complementary_valid_residues;
readonly attribute ResidueList complementary_unambiguous_residues;
readonly attribute ResidueList complementary_ambiguous_residues;
Residue complement(in Residue the_residue)
raises(InvalidResidue);
};
interface AminoAcidSequenceAlphabet : SequenceAlphabet
{
};
exception InvalidSequenceAlphabetName
{
string invalid_name;
};
interface SequenceAlphabetFactory
{
const SequenceAlphabetName IUPAC_DNA = "IUPAC DNA";
const SequenceAlphabetName IUPAC_RNA = "IUPAC RNA";
const SequenceAlphabetName IUPAC_AA = "IUPAC AA";
readonly attribute SequenceAlphabetNameList sequence_alphabet_names;
SequenceAlphabet create_sequence_alphabet(in SequenceAlphabetName name)
raises(InvalidSequenceAlphabetName);
};
interface GeneticCode
{
// in addition to its current functionality and the initiators
// and terminators proposed above
readonly attribute NucleotideSequenceAlphabet nucleotide_sequence_alphabet;
readonly attribute AminoAcidSequenceAlphabet amino_acid_sequence_alphabet;
};
interface BioSequence
{
// in addition to its current functionality
readonly attribute SequenceAlphabet sequence_alphabet;
};
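As a rough illustration of what included_residues(), is_ambiguous() and complement() would have to return for an "IUPAC DNA" alphabet, here is a minimal Python sketch. It is not the NetGenics implementation; the table and function names are hypothetical, residues are modelled as single uppercase characters, and only the standard IUPAC nucleotide ambiguity codes are covered.

```python
class InvalidResidue(Exception):
    """Stand-in for the InvalidResidue exception named in the IDL above."""


# Each IUPAC DNA code mapped to the unambiguous residues it represents.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every code, including the ambiguity codes.
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def is_valid(residue: str) -> bool:
    return residue in IUPAC_DNA


def is_ambiguous(residue: str) -> bool:
    # A residue is ambiguous when it stands for more than one base.
    return len(IUPAC_DNA.get(residue, "")) > 1


def included_residues(residue: str) -> str:
    """Return A for A and ACGT for N, as in the IDL comment."""
    if residue not in IUPAC_DNA:
        raise InvalidResidue(residue)
    return IUPAC_DNA[residue]


def complement(residue: str) -> str:
    if residue not in COMPLEMENT:
        raise InvalidResidue(residue)
    return COMPLEMENT[residue]
```

Keeping the ambiguity table as data rather than code is one way a SequenceAlphabetFactory could serve several named alphabets from the same implementation.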
Issue 3700: proposed DsLSRBioObjects changes (05)
Summary:
* Consider changing AlignmentElement's element from an Object to a
BioSequence. At the moment element is required to have a name and
provide character-based data, but there's no way to ensure that
behavior. I really dislike implementations that have to hard code
checks on the type of a data member. If the upcoming BSA follow-on
RFPs add new bio-objects, we'll end up checking for BioSequence,
HMM, profile, etc., and then have to treat each one individually.
The real way to solve this problem is to have a BioObject, but that
is clearly outside the scope of the current FTF.
Issue 3763: why no CosLifeCycle::LifeCycleObject for BioSequence ?
Summary:
[...] CosLifeCycle::LifeCycleObject, but BioSequence itself does not? Was this deemed to impose too many constraints on BioSequence? The problem that Fabien Campagne <campagne@inka.mssm.edu> notes with this is that a client of BioSequences can only call remove() on one of the sub-classes, not on an un-extended BioSequence itself, nor on a different sub-class that does not have a remove() operation (or maybe has it under a different name or whatever). He thinks this is bad design because without remove(), the client loses the option of trying to be cooperative in the resource management of the server. I have to agree on this point. Or is inheriting from LifeCycleObject not there mainly for remove()? In that case, why the asymmetry between BioSequence and its sub-classes?
Issue 3805: section 2.1.5 (p. 2-7) - CompositeSeqRegion
Summary:
The second paragraph correctly states that a CompositeSeqRegion's start and length are not defined. It should also state that strand_type and start_relative_to_seq_end are not defined.
Issue 3806: section 2.1.9 (p. 2-21) - IntervalOutOfBounds and SeqRegionOutOfBounds
Summary:
Change
If a BioSequence represents circular DNA, then this exception should not be raised.
to
If a BioSequence represents circular DNA, then this exception should not be raised, unless the SeqRegion's length is greater than that of the BioSequence.
Issue 3807: section 2.1.9 (p. 2-22) SeqRegionInvalid
Summary:
The description is too narrow. Add another example, e.g., StrandType is inappropriate for a given type of BioSequence (e.g., StrandType other than NOT_APPLICABLE in an AminoAcidSequence).
Issue 3808: section 2.1.9 (p. 2-24) BioSequence
Summary:
Change the description of the description attribute from
The description attribute is a concise description of the sequence typically would include functional information (e.g., the contents of the description line from a Fasta file).
to
The description attribute, a concise description of the BioSequence, typically includes some functional information, e.g., the contents of the description line from a FASTA file.
Issue 3809: section 2.1.9 (p. 2-24) BioSequence
Summary:
Under Exceptions for seq_interval(), the text is not entirely accurate. If the length field of the Interval input parameter is greater than that of the BioSequence, IntervalOutOfBounds should be raised irrespective of circularity.
Change
If a BioSequence represents circular DNA, then this exception should not be raised.
to
If a BioSequence represents circular DNA, then this exception should be raised only if the length field of the Interval input parameter is greater than that of the BioSequence.
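The rule proposed above for seq_interval() can be written down as a small Python predicate. This is only a sketch of the wording in this issue: the function name is hypothetical, coordinates are assumed to be 1-based (an assumption, not something stated here), and the separate question of start = 0 on circular sequences is not modelled.

```python
def interval_out_of_bounds(start: int, length: int,
                           seq_length: int, circular: bool = False) -> bool:
    """Return True when IntervalOutOfBounds would be raised (sketch)."""
    # An interval longer than the sequence is out of bounds,
    # irrespective of circularity.
    if length > seq_length:
        return True
    # On a circular sequence any shorter interval may wrap around the origin.
    if circular:
        return False
    # Linear sequence, assuming 1-based start: the interval must fit entirely.
    return start < 1 or start + length - 1 > seq_length
```

With a sequence of length 6, an interval {start 3, length 5} is out of bounds on a linear sequence but acceptable on a circular one, while {start 1, length 7} is rejected in both cases.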
Issue 3810: section 2.1.9 (p. 2-24) BioSequence
Summary:
seq_interval()'s lone exception IntervalOutOfBounds is insufficient to handle cases where the Interval is actually a SeqRegion that has an invalid StrandType or SeqRegionOperator. I suggest adding SeqRegionInvalid to cover this case. I don't think we need to add SeqRegionOutOfBounds; IntervalOutOfBounds seems sufficient for coordinate problems.
Issue 3811: section 2.1.9 (p. 2-25) BioSequence
Summary:
The descriptions for both get_annotations() and num_annotations() need
to be made more precise.
Change
Only the SeqAnnotations that overlap seq_region will be returned.
to
Only the SeqAnnotations that overlap seq_region and have compatible
StrandTypes will be returned.
Add the following tables.
BioSequence Type       Valid StrandTypes
BioSequence            STRAND_NOT_KNOWN
NucleotideSequence     STRAND_NOT_KNOWN, STRAND_PLUS, STRAND_MINUS, STRAND_BOTH
AminoAcidSequence      STRAND_NOT_APPLICABLE

StrandType             Matching StrandTypes
STRAND_NOT_KNOWN       STRAND_NOT_KNOWN, STRAND_PLUS, STRAND_MINUS, STRAND_BOTH
STRAND_NOT_APPLICABLE  STRAND_NOT_APPLICABLE
STRAND_PLUS            STRAND_NOT_KNOWN, STRAND_PLUS, STRAND_BOTH
STRAND_MINUS           STRAND_NOT_KNOWN, STRAND_MINUS, STRAND_BOTH
STRAND_BOTH            STRAND_NOT_KNOWN, STRAND_PLUS, STRAND_MINUS, STRAND_BOTH
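The two tables above translate directly into lookup data. The following Python sketch shows one way an implementation might apply them when filtering annotations; the enum, dictionary and function names are hypothetical, and the overlap test itself is left out.

```python
from enum import Enum


class StrandType(Enum):
    STRAND_NOT_KNOWN = 0
    STRAND_NOT_APPLICABLE = 1
    STRAND_PLUS = 2
    STRAND_MINUS = 3
    STRAND_BOTH = 4


S = StrandType

# Second table: which StrandTypes are compatible with a given StrandType.
MATCHING = {
    S.STRAND_NOT_KNOWN: {S.STRAND_NOT_KNOWN, S.STRAND_PLUS, S.STRAND_MINUS, S.STRAND_BOTH},
    S.STRAND_NOT_APPLICABLE: {S.STRAND_NOT_APPLICABLE},
    S.STRAND_PLUS: {S.STRAND_NOT_KNOWN, S.STRAND_PLUS, S.STRAND_BOTH},
    S.STRAND_MINUS: {S.STRAND_NOT_KNOWN, S.STRAND_MINUS, S.STRAND_BOTH},
    S.STRAND_BOTH: {S.STRAND_NOT_KNOWN, S.STRAND_PLUS, S.STRAND_MINUS, S.STRAND_BOTH},
}

# First table: which StrandTypes are valid for each kind of sequence.
VALID_FOR = {
    "BioSequence": {S.STRAND_NOT_KNOWN},
    "NucleotideSequence": {S.STRAND_NOT_KNOWN, S.STRAND_PLUS, S.STRAND_MINUS, S.STRAND_BOTH},
    "AminoAcidSequence": {S.STRAND_NOT_APPLICABLE},
}


def strands_compatible(query: StrandType, annotation: StrandType) -> bool:
    """True when an annotation's strand matches the strand of the query region."""
    return annotation in MATCHING[query]
```

Note that the matching relation in the table is symmetric, so it does not matter which of the two StrandTypes is treated as the query.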
Issue 3812: section 2.1.10 (p. 2-28) NucleotideSequence
Summary:
Under Exceptions for reverse_complement_interval() change
If a NucleotideSequence represents circular DNA, then this exception should not be raised.
to
If a NucleotideSequence represents circular DNA, then this exception should be raised only if the length field of the SeqRegion input parameter is greater than that of the NucleotideSequence.
Issue 3813: section 2.1.10 (p. 2-30) NucleotideSequence
Summary:
Under Exceptions for translate_seq_region() change
If a NucleotideSequence represents circular DNA, then this exception should not be raised.
to
If a NucleotideSequence represents circular DNA, then this exception should be raised only if the length field of the SeqRegion input parameter is greater than that of the NucleotideSequence.
Issue 3814: section 2.1.15 (p. 2-44) Alignment
Summary:
In the description of the AlignType constants change
However, more complex regions (e.g., a transmembrane spanning protein sequence segment, are entirely possible).
to
However, more complex regions, e.g., a transmembrane protein sequence segment, are entirely possible.
Issue 3815: section 2.1.22 (p. 2-63) SeqAnnotationOutOfBounds
Summary:
Change the last line of the description from
If a BioSequence represents circular DNA, then this exception should not be raised.
to
If a BioSequence represents circular DNA, then this exception should be raised only if the length field of the Interval input parameter is greater than that of the BioSequence.
Issue 3816: section 2.1.25 (p. 2-71) GeneticCodeFactory
Summary:
The description of the return value for the genetic_code_names attribute is incorrect. Change
Returns a GeneticCodeName.
to
Returns a GeneticCodeNameList.
Issue 3870: typedef sequence<Interval> IntervalList; is missing from the spec
Summary:
typedef sequence<Interval> IntervalList; is missing from the spec. This is the first sub-issue of issue #3687.
Issue 3871: IntervalList get_gaps(in AlignmentElement element, in Interval the_interval
Summary:
Add an operation
IntervalList get_gaps(in AlignmentElement element, in Interval the_interval);
to the interface Alignment. Its job is simply to return all the gaps of a particular sequence in a particular alignment. For symmetry with get_seq_region(), the_interval is also given, thus limiting the gaps to those that you're interested in.
Issue 3872: Separate the Alignment interface into more managable pieces
Summary:
Separate the Alignment interface into more manageable pieces as
follows:
interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
    // ...
    // here, everything BUT get_seq_region()
    // ...
};
interface Alignment : SimpleAlignment {
    SeqRegion get_seq_region(
        in AlignmentElement element,
        in Interval the_interval)
        raises(ElementNotInAlignment, IntervalOutOfBounds);
};
Resolution:
Rejected. Adding the get_gaps() operation described in issue 3871 was deemed sufficient. This split had been proposed as an alternate solution.
Issue 3874: add an Identifier to SeqRegion
Summary:
Please add an Identifier to SeqRegion so that a CompositeSeqRegion can span multiple BioSequences.
Issue 3875: inheritance in annotation iterators
Summary:
During BSA implementation we have found that SeqAnnotationIterator does _not_ inherit from AnnotationIterator, and that SeqAnnotationIterator is not used explicitly anywhere in the spec. Can you please remind me why it is like that, or confirm my feeling that something is missing there? I see several possible solutions to this issue - can we discuss them briefly before going to _the_ resolution?
1) Nothing is wrong in the current spec, I have just missed something.
2) The SeqAnnotationIterator will inherit from AnnotationIterator and its next() and next_n() methods will be renamed to something like next_seq_anno() and next_n_seq_anno(). Obviously, there would have to be an explanation that the methods next() and next_seq_anno() should behave the same. Not too nice, and not too OO.
3) The SeqAnnotationIterator will inherit from AnnotationIterator but will remain empty. Does this bring "more type safety"? That is the claimed reason why we have SeqAnnotationIterator in the first place.
4) The SeqAnnotationIterator will be removed completely.
5) The BioSequence interface will get an additional method get_seq_annotations returning a SeqAnnotationIterator in an out parameter (no inheritance between iterators needed).
To be honest, I cannot make up my mind which solution I would prefer at the moment. Probably 1).
Issue 3924: module DsLSRAnalysis issue
Summary:
During implementation of BSA we found that some analyses can return
huge strings (not a big surprise :-)). Of course, the preferable way
to deal with them is to convert the string(s) to a BioObject and use an
iterator defined for such an object. However, because module DsLSRAnalysis is
designed for general usage, it should also be prepared for dealing with
long strings without any conversion (and let the client do whatever is
needed and wanted).
Therefore we are proposing to add an Iterator that allows splitting long
strings into manageable pieces (and it would work on sequences of octets
as well). The remarks for such an iterator:
- It does not raise any exception (such as InvalidIterator) because the
"iterated object" is considered immutable.
- It uses a sequence of octets so it can be used for binary data as well
(imagine transferring picture data - they may quite easily be results of
an analysis).
- It does not have a method "next()" because it does not seem to be
necessary to ask for just one single octet.
- The parameter 'how_many' contains the maximal length of the sequence
of octets returned in the out parameter.
Proposed resolution:
--------------------
We propose to add the following to the DsLSRAnalysis module (we will
suggest what descriptive text is to be added to the document only after we
see a consensus on this issue, but basically the text will follow the comments
raised in the summary above):
interface OctetIterator
{
boolean next_n(in unsigned long how_many,
out CORBA::OctetSeq octets);
void reset();
void destroy();
};
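To show how a client loop over the proposed iterator might look, here is a minimal in-memory Python sketch. The IDL out parameter is modelled as the second element of a returned tuple, and the meaning of the boolean result (True while at least one octet was returned) is an assumption, since the proposal does not define it yet.

```python
from typing import Tuple


class OctetIterator:
    """In-memory stand-in for the proposed OctetIterator (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self._data = bytes(data)  # the iterated object is treated as immutable
        self._pos = 0

    def next_n(self, how_many: int) -> Tuple[bool, bytes]:
        # Return at most how_many octets; the flag is False once nothing is left.
        chunk = self._data[self._pos:self._pos + how_many]
        self._pos += len(chunk)
        return (len(chunk) > 0, chunk)

    def reset(self) -> None:
        # Start again from the first octet.
        self._pos = 0

    def destroy(self) -> None:
        # Release the data; a real servant would also deactivate itself.
        self._data = b""
        self._pos = 0


def read_all(iterator: OctetIterator, chunk_size: int = 4096) -> bytes:
    """Client-side helper: pull the whole result in manageable pieces."""
    pieces = []
    more, chunk = iterator.next_n(chunk_size)
    while more:
        pieces.append(chunk)
        more, chunk = iterator.next_n(chunk_size)
    return b"".join(pieces)
```

Because the iterated data never changes, reset() followed by another full read returns the same octets, which is why no InvalidIterator-style exception is needed.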
Issue 3933: How to stringified CORBA TypeCodes
Summary:
We need a precise definition of how to express a CORBA TypeCode as a string, which is needed for using it in the metadata attribute "type" in the input/output property spec. I am going to make a proposal on that when this issue gets a new number.
Issue 3934: Second issue - How to specify "repetitive" inputs
Summary:
We need to be able to send a variable number of inputs of the same name and type to an analysis (variable because we may not know in advance how many "repetitions" will be sent by a client). I am going to make a proposal (based on mail discussion with Mike Dickson and others) when this issue gets a new number.
Resolution:
Accepted. Decided to make no change to the specification. The specification does not preclude having multiple inputs of the same name.
Issue 3961: SeqRegionInvalid not just for wrong StrandTypes
Summary:
Many methods have an Interval as an input parameter (directly, or indirectly
as part of a SeqAnnotation).
Since an Interval can be a [Composite]SeqRegion, it would seem logical that all
these methods can raise SeqRegionInvalid as well.
Currently, SeqRegionInvalid can only be raised if the StrandType isn't
appropriate; we think that it's natural to also raise it for a wrong
CompositeSeqRegion (e.g., when the CompositeSeqRegion has internal overlaps
in cases where this does not make sense).
So the issue is 1) allow raising SeqRegionInvalid for reasons other than
a wrong StrandType and 2) add this exception to the relevant methods, which
are:
BioSequence
- add_annotation
- seq_interval
NucleotideSequence
- reverse_complement_interval
Alignment
- get_seq_region
SingleCharacterAlignmentEncoder
- get_row_column_interval
- get_row_interval
AnnotationFactory
- create_seq_annotation
(this one also needs SeqRegionOutOfBounds)
Issue 3962: clarification of strand_type and CompositeSeqRegions
Summary:
The methods BioSequence.seq_interval(in Interval the_interval), NucleotideSequence.translate_seq_region(in SeqRegion seq_region, ...) and NucleotideSequence.reverse_complement_interval(in Interval the_interval) may all be called using a SeqRegion. In this case, the StrandType may be minus, and if that is the case, the string returned (for seq_interval) or the string 'transformed' (the other cases) should be taken as reverse-complemented. In the last case, this will result in a no-op, i.e. the strand_type leads to reverse-complementing, which is then reverse-complemented again due to the semantics of the method, ending up in the same string that would be had from seq_interval(). We suggest adding verbiage to explain these semantics (if, of course, everyone agrees that this is the expected behaviour!).
Similar to this issue is that all CompositeSeqRegions are expected to be translated in a depth-first traversal, along each node of the tree represented by the CompositeSeqRegions. This includes those nodes that have region_operator == JOIN or ORDER.
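The strand semantics suggested above can be checked with a small Python sketch. The free functions below only mimic the IDL operations of the same name on a plain string; 1-based coordinates, strings for the strand values, and an unambiguous DNA alphabet are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


def seq_interval(seq: str, start: int, length: int,
                 strand: str = "STRAND_PLUS") -> str:
    """Sub-sequence for a region; a minus strand yields its reverse complement."""
    piece = seq[start - 1:start - 1 + length]  # assumed 1-based start
    return reverse_complement(piece) if strand == "STRAND_MINUS" else piece


def reverse_complement_interval(seq: str, start: int, length: int,
                                strand: str = "STRAND_PLUS") -> str:
    """Reverse-complement whatever seq_interval() returns for the region."""
    return reverse_complement(seq_interval(seq, start, length, strand))
```

With the sequence AAGCTTGC and the region {start 1, length 4}, the minus-strand seq_interval() gives GCTT, and reverse_complement_interval() on the same minus-strand region gives AAGC again, i.e. the plus-strand seq_interval() result: the two reverse-complement steps cancel.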
Issue 3963: clarification of SeqRegionOperator.ORDER
Summary:
The usage of region_operator == ORDER can be clarified. I did not know this, but it turns out that ORDER is typically used for describing regions that do not, as such, get translated or transcribed as a contiguous stretch of biopolymer the way exons are. Instead, they are often used for the location of descriptive annotations, such as 'histone binding region', 'regulatory elements', or 'homology with mouse p53'. Therefore, I propose to replace the text under ORDER in the enum SeqRegionOperator box with:
ORDER should be used when the sub-regions are to be taken as an (ordered) set of sub-regions. Typically, it is used to represent a discontinuous region to which a descriptive annotation pertains.
Issue 3964: OutOfBounds exceptions for circular sequences if start = 0
Summary:
We propose to throw an OutOfBounds exception on circular sequences also if start = 0.
Issue 3965: add BioSequence.get_annotations_by_name()?
Summary:
Given the wealth of annotations in many databases, we see the need for the addition of a method BioSequence.get_annotations_by_name(in string name); so that annotations can be retrieved more selectively.
Resolution:
Rejected. While the proposal is a reasonable one, it was thought to be the start of a larger issue of filtering annotation queries, and this was deemed out of scope for the FTF. The vote was 1 YES, 3 NO, and 1 ABSTAIN.
Issue 3968: use key in Alignment's get_seq_region()
Summary:
Problem: In section 2.1.15, Alignment's get_seq_region() method has an AlignmentElement as an argument rather than the AlignmentElement's key. Proposed resolution: change the IDL on p. 2-43, the method description on p. 2-46, and the IDL in appendix C.1.