Issue 5560: For interoperability, need a rule for formatting numbers (gene-expression-ftf)

Source: Rosetta Biosoftware Business Unit (Mr. Michael D Miller, nobody)
Nature: Uncategorized Issue
Severity:

Summary:

Request from Michael Miller, Rosetta Biosoftware, for clarification on how numbers should be formatted:

* integer values are just digits
* floating point numbers use a '.' (period), with an optional leading zero and an optional leading +/- sign:
  * -0.912
* scientific notation: optional leading zero, optional leading +/- sign on the exponent and mantissa:
  * -2.34E-12

Resolution: Reject the change, but then see below.

Revised Text:

After the first solution was suggested, Jason Stewart, Open Informatics, suggested that referencing the W3C XML Schema specification would be better.

Resolution

Changes to the Specification:

Add a section to Section 1.1, General Remarks, between Section 1.1.4 and Section 1.1.5, and title it "Number Format".

Text for added section:

"In order to foster interoperability, it is necessary to agree on a common lexical representation of numbers in valid MAGE-ML XML documents. Since the W3C Recommendation "XML Schema Part 2: Datatypes" contains definitions for such representations, they are the recommended form. Below are the definitions from that specification:

"3.2.2 boolean
...
3.2.2.1 Lexical representation
An instance of a datatype that is defined as ·boolean· can have the following legal literals {true, false, 1, 0}."

"3.2.3 decimal
...
NOTE: All ·minimally conforming· processors ·must· support decimal numbers with a minimum of 18 decimal digits (i.e., with a ·totalDigits· of 18). However, ·minimally conforming· processors ·may· set an application-defined limit on the maximum number of decimal digits they are prepared to support, in which case that application-defined maximum number ·must· be clearly documented.
3.2.3.1 Lexical representation
decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. If ·totalDigits· is specified, the number of digits must be less than or equal to ·totalDigits·. If ·fractionDigits· is specified, the number of digits following the decimal point must be less than or equal to the ·fractionDigits·. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, +100000.00, 210."

(·totalDigits· would be documented for an attribute by the exporting organization of the XML.)

"3.3.13 integer
...
3.3.13.1 Lexical representation
integer has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) with an optional leading sign. If the sign is omitted, "+" is assumed. For example: -1, 0, 12678967543233, +100000."

"3.2.4 float
...
3.2.4.1 Lexical representation
float values have a lexical representation consisting of a mantissa followed, optionally, by the character "E" or "e", followed by an exponent. The exponent ·must· be an integer. The mantissa must be a decimal number. The representations for exponent and mantissa must follow the lexical rules for integer and decimal. If the "E" or "e" and the following exponent are omitted, an exponent value of 0 is assumed. The special values positive and negative zero, positive and negative infinity and not-a-number have lexical representations 0, -0, INF, -INF and NaN, respectively. For example, -1E4, 1267.43233E12, 12.78e-2, 12 and INF are all legal literals for float."

"3.2.5 double
...
3.2.5.1 Lexical representation
double values have a lexical representation consisting of a mantissa followed, optionally, by the character "E" or "e", followed by an exponent. The exponent ·must· be an integer. The mantissa must be a decimal number. The representations for exponent and mantissa must follow the lexical rules for integer and decimal. If the "E" or "e" and the following exponent are omitted, an exponent value of 0 is assumed. The special values positive and negative zero, positive and negative infinity and not-a-number have lexical representations 0, -0, INF, -INF and NaN, respectively. For example, -1E4, 1267.43233E12, 12.78e-2, 12 and INF are all legal literals for double."

Add to Appendix A, References, in alphabetical order:

"Paul V. Biron, Ashok Malhotra, eds. 2001. XML Schema Part 2: Datatypes, W3C Recommendation. 2 May 2001."

Actions taken:
July 29, 2002: received issue
December 11, 2002: closed issue

Discussion:

Request from Michael Miller, Rosetta Biosoftware, for clarification on how numbers should be formatted. Jason Stewart, Open Informatics, suggested the following:

* integer values are just digits
* floating point numbers use a '.' (period), with an optional leading zero and an optional leading +/- sign: -0.912
* scientific notation: optional leading zero, optional leading +/- sign on the exponent and mantissa: -2.34E-12

End of Annotations:=====

From: "Miller, Michael (Rosetta)"
To: "'Juergen Boldt'"
cc: gene-expression-ftf@omg.org
Subject: Gene Expression FTF issues
Date: Mon, 29 Jul 2002 07:30:43 -0700

Hi Juergen,

Here's the last batch. Officially, I guess I'm entering them, but these are collated from the MGED efforts, for which there is no clear consensus yet.

thanks, as always,
Michael

Michael Miller
Senior Application Developer
Rosetta Biosoftware
michael_miller@rosettabio.com
www.rosettabio.com

For interoperability, need a rule for formatting numbers.
Request from Michael Miller, Rosetta Biosoftware, for clarification on how numbers should be formatted:

* integer values are just digits
* floating point numbers use a '.' (period), with an optional leading zero and an optional leading +/- sign:
  * -0.912
* scientific notation: optional leading zero, optional leading +/- sign on the exponent and mantissa:
  * -2.34E-12

From: "Miller, Michael (Rosetta)"
To: "'jason@openinformatics.com'"
cc: "'mged mage'", "'GE FTF'"
Subject: RE: [Mged-mage] Call to vote: Issues #5552, #5553, #5555, #5556, #5558, #5560
Date: Mon, 26 Aug 2002 11:43:47 -0700

Hi Jason,

You raise some good points, but there is the minor consideration that these have already been sent to the FTF for a vote, with a deadline of this Friday. I'm forwarding this to the FTF for their consideration when voting on the bottom two issues, #5558 and #5560. The FTF can decide whether or not to accept the recommendations I put out. If they aren't accepted, then I can send out a further recommendation along the lines you outline.

> If we add a special rule like 'you can have either This or That but
> not both' and the rule is not encoded directly in the XMI file, then
> we will have to add a third type of file that is used by MAGEstk, as
> well as adding some special logic code to the generators.

It looks like this soon won't be necessary, because the rules in the comments can be specified in a formal UML rules language, the Object Constraint Language (OCL), which is pretty stable but complex. A revision of OCL is currently going through the OMG (OMG Document ad/02-05-09).

> This will break MAGEstk for some time to come. I'll probably get the
> Perl code working, after I figure out how best to do this, but who
> will make the Java changes?
It turns out that the above rule is already encoded in the constraints of the Model: if two association ends have the same rank, then they form a choice group in the DTD. This is already handled in the Java code, which adds the rank number to the association end object. The generating code then orders the associations by rank and checks for two associations that have the same rank. Note that the Java language can't enforce this directly; one either needs to add fairly complex code to allow one association or the other but not both, or merely add a state check to the generated FactorValue class. The current generated Java code simply lets the DTD enforce this. Note in the PNG the rank constraints for the associations from FactorValue and the change in the updated DTD:

The Java code finds it in the XMI via Constraint-G.711 rank: 1 ...

> After meeting with the SMD folks and discussing this issue with them,
> I'd like to propose the following compromise. Rename SecurityGroup to
> Role, and replace the two existing associations from Security
> (WriteGroups and ReadGroups) with a single association, roles.
>
> Also it keeps it generic enough, that it is still within the scope of
> the model, and is useful to other groups for describing what SMD wants
> to do.

But I would argue we aren't encoding specific Security information in MAGE: by specifying what SecurityGroup a person belongs to, the database already knows what database roles that person has. The only reason that should be in the MAGE document is if it can be used, and I don't think MAGE should be used to change people's DB roles. It also adds no more information to the Gene Expression data. The SecurityGroup can be used not by altering the security group but by setting up a reference from the Describable object to the existing security group.

To exchange that level of DB information, I think there should be a different document with a DB-specific XML Schema or DTD. I haven't investigated, but does anyone know of one that exists?
> =====================================================================
>
> > Recommend that the specification include a section on how numbers are
> > to be formatted.
>
> How about using the W3C XML Schema recommendation for the integer and
> floating point numbers? This is a much more rigorous definition.
>
> It also removes any commas from the numbers, so 1,000 is invalid and
> 1000 is valid.
>
> It also provides for INF, -INF, and NaN for floating point numbers.

Good point, but see above: the change is already being voted on. Your earlier suggested format does specify no commas, though. I'm inclined to reconsider this; I will probably vote no on the recommendation, then suggest this for the next round of voting.

thanks,
Michael

(p.s. the PNG for Experiment that I sent out had a mistake in the cardinality of the Measurement association of FactorValue; it's correct in this version)

> -----Original Message-----
> From: jason@openinformatics.com [mailto:jason@openinformatics.com]
> Sent: Sunday, August 25, 2002 5:12 PM
> To: Miller, Michael (Rosetta)
> Cc: gene-expression-ftf@omg.org; 'mged mage'
> Subject: Re: [Mged-mage] Call to vote: Issues #5552, #5553, #5555,
> #5556, #5558, #5560
>
> "Miller, Michael (Rosetta)" writes:
>
> > Issue #5556: Specifying a FactorValue needs to be more flexible
> > ===============================================================
> >
> > * add a rule that FactorValue has *either* a Measurement *or* an
> > OntologyEntry.
>
> One word of caution is that MAGEstk currently utilizes two bits of
> information for doing its job:
> * MAGE.xmi - the model
> * package-list.xml - specifies the order of the 'packages' in MAGE-ML
>
> If we add a special rule like 'you can have either This or That but
> not both' and the rule is not encoded directly in the XMI file, then
> we will have to add a third type of file that is used by MAGEstk, as
> well as adding some special logic code to the generators.
>
> I'm not opposing the change, I just want people who vote to realize
> the impact of the decision.
>
> This will break MAGEstk for some time to come. I'll probably get the
> Perl code working, after I figure out how best to do this, but who
> will make the Java changes?
>
> > Issue #5558: Database role to be associated with person/organization
> > ====================================================================
> >
> > A database role should be associated with person/organization. These
> > roles will describe the types of privileges a person has with the data,
> > not the type of occupation the person has.
> >
> > Recommend reject change. Change would require modeling information beyond
> > the scope needed to interchange Gene Expression data.
>
> After meeting with the SMD folks and discussing this issue with them,
> I'd like to propose the following compromise. Rename SecurityGroup to
> Role, and replace the two existing associations from Security
> (WriteGroups and ReadGroups) with a single association, roles.
>
> This both simplifies the model, and provides some needed
> flexibility. I originally proposed the Security SecurityGroup model
> based on GeneX's security model. Since then, I've discovered it is too
> limited, and have expanded it as well.
>
> Also it keeps it generic enough, that it is still within the scope of
> the model, and is useful to other groups for describing what SMD wants
> to do.
>
> > ===========================
> >
> > Issue #5560: For interoperability, need a rule for formatting numbers
> > =====================================================================
> >
> > Recommend that the specification include a section on how numbers are
> > to be formatted.
>
> How about using the W3C XML Schema recommendation for the integer and
> floating point numbers? This is a much more rigorous definition.
>
> It also removes any commas from the numbers, so 1,000 is invalid and
> 1000 is valid.
>
> It also provides for INF, -INF, and NaN for floating point numbers.
>
> jas.

Attachment: Experiment1.png
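The XML Schema Part 2 lexical rules adopted in the resolution above are precise enough to check mechanically, which is the interoperability point Jason raised (for example, rejecting "1,000"). Below is a minimal Python sketch, assuming the quoted prose maps onto the regular expressions shown. `PATTERNS` and `is_valid_literal` are hypothetical names, not part of MAGE-ML, MAGEstk, or any XML Schema library, and the sketch ignores whitespace handling and facets such as totalDigits and fractionDigits.

```python
import re

# Hypothetical helper: regular expressions transcribed (as an assumption) from
# the XML Schema Part 2 lexical rules quoted in the resolution. It does not
# collapse whitespace or check facets such as totalDigits/fractionDigits.
_INTEGER = r"[+-]?[0-9]+"
# Decimal: digits with an optional '.' part. Accepting ".5" and "5." is one
# interpretation of "leading and trailing zeroes are optional".
_DECIMAL = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
# float/double: decimal mantissa, optional E/e plus an integer exponent,
# or one of the special values listed in the quoted text (INF, -INF, NaN).
_FLOATING = rf"(?:{_DECIMAL}(?:[Ee]{_INTEGER})?|-?INF|NaN)"

PATTERNS = {
    "boolean": re.compile(r"true|false|1|0"),
    "integer": re.compile(_INTEGER),
    "decimal": re.compile(_DECIMAL),
    # float and double share one lexical form here; they differ only in
    # value range and precision, which this sketch does not check.
    "float": re.compile(_FLOATING),
    "double": re.compile(_FLOATING),
}


def is_valid_literal(datatype: str, literal: str) -> bool:
    """Return True if `literal` is a legal lexical form for `datatype`.

    fullmatch() anchors the pattern to the whole string, so inputs such as
    "1,000" or "12abc" are rejected rather than partially matched.
    """
    return PATTERNS[datatype].fullmatch(literal) is not None


if __name__ == "__main__":
    for dtype, lit in [("decimal", "1,000"), ("integer", "1000"),
                       ("double", "-2.34E-12"), ("double", "+INF")]:
        print(dtype, repr(lit), is_valid_literal(dtype, lit))
```

With this sketch, "1,000" fails as a decimal while "1000" passes as an integer, and the summary's example "-2.34E-12" is a legal double. "+INF" is rejected because the quoted text lists only INF and -INF as the infinity literals; an exporter wanting stricter checks would also need to enforce any documented totalDigits limit.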