Issue 18289: DTV spec had named element whose qualified name matched name generated by procedure described in section B6 (canonical-xmi-ftf)
Source: NIST (Mr. Peter Denno, peter.denno(at)nist.gov)
Nature: Uncategorized Issue
Severity: 
Summary: The DTV spec had a named element whose qualified name matched a name generated by a procedure described in the spec section B6:

In other cases the xmi:id is the xmi:id of the parent XML element (or “_” for top level elements),
followed by the separator ‘-‘, followed by the name of the property (XML element. If there is
more than one value for the property this is further followed by ‘-‘ followed by the sequence
number (from 1) within the parent element and the property. Note that named elements (which
satisfy the first rule) are still included in this count.

The named element was not a sibling, so the part "Note that named elements...are still included in this count" did not apply. One quasi-solution is to use numbering whenever there is not a qualified name. Simply strike the phrase "If there is more than one value of the property this is further" in the above. The problem with this is that there could still be an element with a qualified name that matches the generated xmi:id (it could end with a number)! Perhaps we need to add "If the resulting name is a duplicate of a name generated using the procedure for qualified names described above, the first sequence number where duplication does not occur is used."

I realize that these are pretty complicated rules. 
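
The proposed addition can be sketched roughly as follows. This is an illustration only, not text from the spec: the function name `assign_ids`, its parameters, and the element names used with it are hypothetical. For brevity it assumes every value of the property is unnamed, so the "named elements are still included in this count" clause is not modeled.

```python
def assign_ids(parent_id, prop, count, qualified_ids):
    """Assign xmi:ids to `count` unnamed values of one property.

    parent_id      xmi:id of the parent XML element ("_" for top level)
    prop           name of the property (XML element)
    qualified_ids  ids already produced from qualified names (first rule)
    """
    taken = set(qualified_ids)
    base = f"{parent_id}-{prop}"
    ids = []
    next_seq = 1
    for _ in range(count):
        # B6 as quoted: a single value gets no sequence number.
        candidate = base if count == 1 else None
        # Proposed addition: on a clash, use the first sequence number
        # whose id does not duplicate an id that is already in use.
        while candidate is None or candidate in taken:
            candidate = f"{base}-{next_seq}"
            next_seq += 1
        taken.add(candidate)
        ids.append(candidate)
    return ids
```

With a named element elsewhere in the model whose qualified-name id is "Pkg-ownedRule", the single unnamed value of a property ownedRule under Pkg would receive "Pkg-ownedRule-1" instead of the clashing "Pkg-ownedRule".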

Resolution: This is addressed by rule 3 of the resolution to 17495.


Disposition:	See issue 17495 for disposition

Revised Text: 
Actions taken:
December 6, 2012: received issue
December 23, 2013: closed issue
Discussion: 

End of Annotations:=====
Date: Thu, 6 Dec 2012 12:15:59 -0500
From: Peter Denno <peter.denno@nist.gov>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120421 Thunderbird/10.0.4
To: "canonical-xmi-ftf@omg.org" <canonical-xmi-ftf@omg.org>,
        Ed Barkmeyer
        <edbark@el.nist.gov>
Subject: Canonical XMI: Problems with B6 Identification rules:


(2) The DTV spec had a named element whose qualified name matched a name generated by a procedure described in the spec section B6:

In other cases the xmi:id is the xmi:id of the parent XML element (or "_" for top level elements),
followed by the separator '-', followed by the name of the property (XML element. If there is
more than one value for the property this is further followed by '-' followed by the sequence
number (from 1) within the parent element and the property. Note that named elements (which
satisfy the first rule) are still included in this count.

The named element was not a sibling, so the part "Note that named elements...are still included in this count" did not apply. One quasi-solution is to use numbering whenever there is not a qualified name. Simply strike the phrase "If there is more than one value of the property this is further" in the above. The problem with this is that there could still be an element with a qualified name that matches the generated xmi:id (it could end with a number)! Perhaps we need to add "If the resulting name is a duplicate of a name generated using the procedure for qualified names described above, the first sequence number where duplication does not occur is used."

I realize that these are pretty complicated rules. 

Best regards, 
   Peter


-- 

Best regards,
  Peter

Peter Denno 
National Institute of Standards and Technology, 
Systems Integration Division, 
Engineering Laboratory,
100 Bureau Drive, Mail Stop 8265          Tel: +1 301-975-3595 
Gaithersburg, MD, USA 20899-8265          FAX: +1 301-975-4694 


From: "Rouquette, Nicolas F (313K)" <nicolas.f.rouquette@jpl.nasa.gov>
To: Pete Rivett <pete.rivett@adaptive.com>, Jishnu Mukerji <jishnu@hp.com>,
        "Barkmeyer, Edward J" <edward.barkmeyer@nist.gov>
CC: "Denno, Peter O." <peter.denno@nist.gov>,
        "date-time@omg.org"
        <date-time@omg.org>,
        "canonical-xmi-ftf@omg.org" <canonical-xmi-ftf@omg.org>,
        Ed Barkmeyer <edbark@el.nist.gov>
Subject: Re: Canonical XMI: Problems with B6 Identification rules:
Thread-Topic: Canonical XMI: Problems with B6 Identification rules:
Thread-Index: AQHN09YHReHOV2tAnEGXKjb+yldM+pgMjH2AgAAQkQCAAChLgIAAB2CA//+DXYA=
Date: Thu, 6 Dec 2012 21:51:45 +0000


I don't like the "xmiName" tag; I prefer a simpler alternative:


Since we use xmi:id as a fragment for a URI, we can simply require that an
xmi:id generation algorithm produce a legal URI fragment.
This is what I've done at JPL for several years now; for example,
"foo<bar" becomes "foo_u00253Cbar".


To avoid the problem of named elements whose names would accidentally
collide with a previously-generated xmi:ID, I use a "prefixing" technique,
that is, the xmi:id generation algorithm adds a prefix to the URI encoded
name.


This way, we get a robust canonical xmi id generation technique.
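
One way such a scheme could look is sketched below. It is not the JPL algorithm, which this message does not spell out, and it does not reproduce the "foo_u00253Cbar" encoding shown above. The `_uXXXX_` escape format, the `N_` and `G_` prefixes, and all function names are assumptions made for illustration.

```python
import re

# Assumption: only ASCII letters, digits and '.' pass through unchanged.
# '_' introduces an escape and '-' separates name parts, so both are escaped.
_UNSAFE = re.compile(r"[^A-Za-z0-9.]")
_ESCAPE = re.compile(r"_u([0-9A-F]+)_")


def escape_name(name: str) -> str:
    """Replace each unsafe character with a reversible _uXXXX_ escape."""
    return _UNSAFE.sub(lambda m: f"_u{ord(m.group()):04X}_", name)


def unescape_name(escaped: str) -> str:
    """Invert escape_name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


def named_id(parts):
    """xmi:id for a named element, built from its qualified-name parts."""
    return "N_" + "-".join(escape_name(p) for p in parts)


def generated_id(parent_id, prop, seq=None):
    """xmi:id for an unnamed element; the 'G_' prefix keeps it apart."""
    base = f"G_{parent_id}-{prop}"
    return base if seq is None else f"{base}-{seq}"
```

Because every id starts with either `N_` or `G_`, an id derived from a qualified name cannot equal a generated one, and because the escape is reversible, two distinct names cannot map to the same id.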


At JPL, we've had to strengthen the canonical XMI spec to ensure
repeatable behavior.
We need repeatable behavior when we want to preserve cross-references
across models.
We've strengthened the canonical xmi spec in three areas:


A) non-reproducible xmi:id for unordered composite collections
B) non-reproducible xmi:id for namespace-distinguishable features of the
same name
C) non-reproducible xmi:id for overloaded behaviors and behavior features


I've analyzed (A) and separated 7 variants of this problem.
4 of 7 are easy to fix (i.e., no changes to UML 2.4.1).


3 of 7 are harder to fix:


- ordering clauses of a ConditionalNode
- ordering related elements of a LinkEndData
- ordering comments


These are harder to fix because, ideally, we'd change these collections to
be ordered in the UML metamodel.
If that's not a realistic thing to do, then a reasonable fallback strategy
is to stereotype these things such that the applied stereotype can be used
as an ordering key.
This would require making the canonical xmi:id generation algorithm aware
of the applied stereotype ordering key and that could be impractical in
some cases.
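
The ordering-key fallback might be sketched like this. The function name `ids_for_unordered`, the `orderKey` entry, and the dictionary representation of elements are all hypothetical; how a real tool would read the key from an applied stereotype is not shown.

```python
def ids_for_unordered(parent_id, prop, elements, order_key):
    """Number the values of an unordered property in a stable order.

    Sorting by an explicit key makes the sequence numbers, and hence the
    generated xmi:ids, independent of the order in which a tool happens
    to hold the collection. Keys are assumed to be unique.
    """
    ordered = sorted(elements, key=order_key)
    return [(f"{parent_id}-{prop}-{n}", element)
            for n, element in enumerate(ordered, start=1)]
```

Two tools that store the same comments in different orders would then produce the same ids, provided both see the same ordering keys.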


- Nicolas.


On 12/6/12 1:17 PM, "Pete Rivett" <pete.rivett@adaptive.com> wrote:


>Where the metamodel element name (and the default conversion rule to _)
>is not suitable, XMI has a tag to override it. The following is in fact
>the first tag defined in section 7.11.1 of the XMI spec:
>
>       xmiName         string  nil     Provides an alternate name from
>the MOF name for writing to XMI. Useful in cases where the MOF name has
>characters that conflict with XML. This value is used rather than the
>MOF name.
>
>Pete
>
>-----Original Message-----
>From: Jishnu Mukerji [mailto:jishnu@hp.com]
>Sent: Thursday, December 06, 2012 12:51 PM
>To: Barkmeyer, Edward J
>Cc: Denno, Peter O.; date-time@omg.org; canonical-xmi-ftf@omg.org; Ed
>Barkmeyer
>Subject: Re: Canonical XMI: Problems with B6 Identification rules:
>
>Ed,
>
>I have complete sympathy with you guys. It really is a bug in the
>Canonical XMI spec in that it specifies a scheme for handling
>unacceptable characters in a way that is not information preserving, and
>that should be fixed irrespective of what you guys do this time around.
>
>A PIM should always be able to be a PIM without knowing how it is going
>to be externalized, including in the choice of names. So either put
>restrictions on names allowable in UML to reflect XML constraints or
>provide a mapping scheme that is info lossless. Incidentally one is
>already provided by e.g. html. Wonder why that is not used in XMI.
>
>Again just my $0.02
>
>Jishnu.
>
>On 12/6/2012 1:27 PM, Barkmeyer, Edward J wrote:
>> For the record, DTV supplies more than one 'designation', i.e. name,
>for the cited relationships.  It just says that the ones with the
>mathematical characters are the "primary terms" (in SBVR parlance).  We
>could (and probably should) have chosen to use one of the alternative
>terms in creating the UML names for the associations.  I suppose we
>could still make that change, but it will cause replacement of several
>diagrams in the text (and probably some explanatory text in section 5,
>which describes the relationships among the renditions of the DTV).  I
>suggest that we make an issue for the DTV RTF, so as to head off
>problems with tools that use the UML XMI files to do other things.
>>
>> (This problem showed up in the 11th hour effort to generate a final
>> correct canonical UML file, now that that is nominally possible,
>> rather than repeating the process every few days while we corrected
>> inconsistencies between the text and the maintained UML model.  I had
>> a similar problem with the xmi:id values in the CMOF file.)
>>
>> --
>> Edward J. Barkmeyer                       Email: edbark@nist.gov
>> National Institute of Standards & Technology
>> Engineering Laboratory -- Systems Integration Division
>> 100 Bureau Drive, Stop 8263               Office: +1 301-975-3528
>> Gaithersburg, MD 20899-8263               Mobile: +1 240-672-5800
>> ________________________________________
>> From: Jishnu Mukerji [jishnu@hp.com]
>> Sent: Thursday, December 06, 2012 12:27 PM
>> To: Denno, Peter O.
>> Cc: canonical-xmi-ftf@omg.org; Ed Barkmeyer
>> Subject: Re: Canonical XMI: Problems with B6 Identification rules:
>>
>> Where special characters have special meanings, we tend to use a pair of
>> alpha characters to denote special meaning. E.g. < is denoted by "lt" or
>> ".lt." depending on the context.
>>
>> Perhaps such an out should be allowed in canonical xmi generation.
>> Then it will be up to the domain that the XMI is specific to to specify
>> in their standards what those ids specifically mean in the domain. For
>> the rest they should be uninterpreted strings anyway.
>>
>> Of course DTV could make everyone's life simpler by simply using
>> element names like
>> "DateTime-Time_Insfratructure-duration_lt_duration2" etc.
>> notwithstanding what Canonical XMI says too :)
>>
>> Just my $0.02....
>>
>> Jishnu.
>>
>>
>>
>>
>> On 12/6/2012 12:15 PM, Peter Denno wrote:
>> Hi,
>>
>> While generating Canonical XMI for the Date Time Vocabulary spec, I
>think I discovered problems with the xmi:id generation rules.
>>
>> (1) The spec says "Where the above rules result in characters not
>> permitted for identifiers in XML documents (for example space, '/' or
>':' these must be replaced by '_'."
>>
>> DTV had elements named like this:
>>
>> "DateTime-Time_Infrastructure-duration1_<_duration2"
>> "DateTime-Time_Infrastructure-duration1_=_duration2"
>>
>> Obviously, if I change < and = to _, two elements will have the same
>name.
>>
>> We need a better strategy for handling special characters.
>>
>> (2) The DTV spec had a named element whose qualified name matched a
>name generated by a procedure described in the spec section B6:
>>
>> In other cases the xmi:id is the xmi:id of the parent XML element (or
>> "_" for top level elements), followed by the separator '-', followed
>> by the name of the property (XML element. If there is more than one
>> value for the property this is further followed by '-' followed by the
>
>> sequence number (from 1) within the parent element and the property.
>Note that named elements (which satisfy the first rule) are still
>included in this count.
>>
>> The named element was not a sibling, so the part "Note that named
>> elements...are still included in this count" did not apply. One
>> quasi-solution is to use numbering whenever there is not a qualified
>> name. Simply strike the phrase "If there is more than one value of the
>> property this is further" in the above. The problem with this is that
>> there could still be an element with a qualified name that matches the
>> generated xmi:id (it could end with a number)! Perhaps we need to add
>> "If the resulting name is a duplicate of a name generated using the
>> procedure for qualified names described above, the first sequence number
>> where duplication does not occur is used."
>>
>> I realize that these are pretty complicated rules.
>>
>> Best regards,
>>     Peter
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Best regards,
>>    Peter
>>
>> Peter Denno
>> National Institute of Standards and Technology, Systems Integration
>> Division, Engineering Laboratory,
>> 100 Bureau Drive, Mail Stop 8265          Tel: +1 301-975-3595
>> Gaithersburg, MD, USA 20899-8265          FAX: +1 301-975-4694
>>
>>
Subject: RE: Canonical XMI: Problems with B6 Identification rules:
Date: Fri, 7 Dec 2012 08:19:03 -0800
Thread-Topic: Canonical XMI: Problems with B6 Identification rules:
Thread-Index: AQHN09YHReHOV2tAnEGXKjb+yldM+pgMjH2AgAAQkQCAAChLgIAAB2CA//+DXYCAATOokA==
From: "Pete Rivett" <pete.rivett@adaptive.com>
To: "Rouquette, Nicolas F (313K)" <nicolas.f.rouquette@jpl.nasa.gov>,
        "Jishnu Mukerji" <jishnu@hp.com>,
        "Barkmeyer, Edward J" <edward.barkmeyer@nist.gov>
Cc: "Denno, Peter O." <peter.denno@nist.gov>, <date-time@omg.org>,
        <canonical-xmi-ftf@omg.org>, "Ed Barkmeyer" <edbark@el.nist.gov>


You have not put up much of an argument against xmiName: it seems to me
simple (one sentence) and the name to be used is documented as part of
the metamodel - so implementations do not have to replicate a
potentially complex string replacement algorithm or "prefixing
technique". Neither of which I see you have even tried to document.


Nonetheless, and regardless of its merits, xmiName is a mechanism that
is part of the XMI spec today and should be used by DTV in the short
term pending further discussion and resolution of the above in a future
version of XMI.


Pete


Date: Fri, 7 Dec 2012 10:53:40 -0500
From: Peter Denno <peter.denno@nist.gov>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120421 Thunderbird/10.0.4
To: Jishnu Mukerji <jishnu@hp.com>
CC: "Barkmeyer, Edward J" <edward.barkmeyer@nist.gov>,
        "date-time@omg.org"
        <date-time@omg.org>,
        "canonical-xmi-ftf@omg.org" <canonical-xmi-ftf@omg.org>,
        Ed Barkmeyer <edbark@el.nist.gov>
Subject: Re: Canonical XMI: Problems with B6 Identification rules:


Hi Jishnu,

On 12/06/2012 03:51 PM, Jishnu Mukerji wrote: 
> Ed,
>
> I have complete sympathy with you guys. It really is a bug in the
> Canonical XMI spec in that it specifies a scheme for handling
> unacceptable characters in a way that is not information preserving, and
> that should be fixed irrespective of what you guys do this time around.

Canonical XMI is information preserving. I do not think there is a bug in this respect. Where tools need to preserve identity information, they should use uuid, not xmi:id. Changing all special characters to an underscore is a bug for sure, but there should be no presumption that the algorithm for generating the xmi:id need preserve the value of the original xmi:id. (The xmi:id is not user-controlled information, so the concept of "information preserving" is not relevant.) In fact, changing the xmi:id to something on which we can impose an ordering on the objects is an essential idea of Canonical XMI.

> A PIM should always be able to be a PIM without knowing how it is going
> to be externalized, including in the choice of names.

Agreed. No problem here with Canonical XMI.

> So either put
> restrictions on names allowable in UML to reflect XML constraints

I hope we never do this. It should not be necessary.

> or
> provide a mapping scheme that is info lossless.

As argued above, Canonical XMI is lossless in a useful sense of the term.

> Incidentally one is
> already provided by e.g. html. Wonder why that is not used in XMI.
>
> Again just my $0.02
>
> Jishnu.

On 12/6/2012 1:27 PM, Barkmeyer, Edward J wrote:

For the record, DTV supplies more than one 'designation', i.e. name, for the cited relationships.  It just says that the ones with the mathematical characters are the "primary terms" (in SBVR parlance).  We could (and probably should) have chosen to use one of the alternative terms in creating the UML names for the associations.  I suppose we could still make that change, but it will cause replacement of several diagrams in the text (and probably some explanatory text in section 5, which describes the relationships among the renditions of the DTV).  I suggest that we make an issue for the DTV RTF, so as to head off problems with tools that use the UML XMI files to do other things.

(This problem showed up in the 11th hour effort to generate a final correct canonical UML file, now that that is nominally possible, rather than repeating the process every few days while we corrected inconsistencies between the text and the maintained UML model.  I had a similar problem with the xmi:id values in the CMOF file.)

--
Edward J. Barkmeyer                       Email: edbark@nist.gov
National Institute of Standards & Technology
Engineering Laboratory -- Systems Integration Division
100 Bureau Drive, Stop 8263               Office: +1 301-975-3528
Gaithersburg, MD 20899-8263               Mobile: +1 240-672-5800
________________________________________
From: Jishnu Mukerji [jishnu@hp.com]
Sent: Thursday, December 06, 2012 12:27 PM
To: Denno, Peter O.
Cc: canonical-xmi-ftf@omg.org; Ed Barkmeyer
Subject: Re: Canonical XMI: Problems with B6 Identification rules:

Where special characters have special meanings, we tend to use a pair of alpha characters to denote special meaning. E.g. < is denoted by "lt" or ".lt." depending on the context.

Perhaps such an out should be allowed in canonical xmi generation. Then it will be up to the domain that the XMI is specific to to specify in their standards what those ids specifically mean in the domain. For the rest they should be uninterpreted strings anyway.

Of course DTV could make everyone's life simpler by simply using element names like "DateTime-Time_Insfratructure-duration_lt_duration2" etc. notwithstanding what Canonical XMI says too :)

Just my $0.02....

Jishnu.


On 12/6/2012 12:15 PM, Peter Denno wrote:
Hi,

While generating Canonical XMI for the Date Time Vocabulary spec, I think I discovered problems with the xmi:id generation rules.

(1) The spec says "Where the above rules result in characters not permitted for identifiers in XML documents (for
example space, '/' or ':' these must be replaced by '_'."

DTV had elements named like this:

"DateTime-Time_Infrastructure-duration1_<_duration2"
"DateTime-Time_Infrastructure-duration1_=_duration2"

Obviously, if I change < and = to _, two elements will have the same name.

We need a better strategy for handling special characters.
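
The clash is easy to reproduce. The sketch below applies the replace-with-underscore rule to the two names above; the function name `lossy_id` is hypothetical, and the character class is a conservative ASCII stand-in for the full set of characters XML permits in identifiers.

```python
import re


def lossy_id(name: str) -> str:
    # Replace anything outside a small ASCII identifier alphabet with '_'.
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", name)


lt = lossy_id("DateTime-Time_Infrastructure-duration1_<_duration2")
eq = lossy_id("DateTime-Time_Infrastructure-duration1_=_duration2")

# Both names collapse to the same string, so the xmi:ids are not unique.
assert lt == eq == "DateTime-Time_Infrastructure-duration1___duration2"
```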

(2) The DTV spec had a named element whose qualified name matched a name generated by a procedure described in the spec section B6:

In other cases the xmi:id is the xmi:id of the parent XML element (or "_" for top level elements),
followed by the separator '-', followed by the name of the property (XML element. If there is
more than one value for the property this is further followed by '-' followed by the sequence
number (from 1) within the parent element and the property. Note that named elements (which
satisfy the first rule) are still included in this count.

The named element was not a sibling, so the part "Note that named elements...are still included in this count" did not apply. One quasi-solution is to use numbering whenever there is not a qualified name. Simply strike the phrase "If there is more than one value of the property this is further" in the above. The problem with this is that there could still be an element with a qualified name that matches the generated xmi:id (it could end with a number)! Perhaps we need to add "If the resulting name is a duplicate of a name generated using the procedure for qualified names described above, the first sequence number where duplication does not occur is used."

I realize that these are pretty complicated rules.

Best regards,
    Peter


--

Best regards,
   Peter

Peter Denno
National Institute of Standards and Technology,
Systems Integration Division,
Engineering Laboratory,
100 Bureau Drive, Mail Stop 8265          Tel: +1 301-975-3595
Gaithersburg, MD, USA 20899-8265          FAX: +1 301-975-4694


Best regards,
  Peter

Peter Denno 
National Institute of Standards and Technology, 
Systems Integration Division, 
Engineering Laboratory,
100 Bureau Drive, Mail Stop 8265          Tel: +1 301-975-3595 
Gaithersburg, MD, USA 20899-8265          FAX: +1 301-975-4694 

Date: Fri, 7 Dec 2012 11:08:28 -0500
From: Peter Denno <peter.denno@nist.gov>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120421 Thunderbird/10.0.4
To: "Rouquette, Nicolas F (313K)" <nicolas.f.rouquette@jpl.nasa.gov>
CC: Pete Rivett <pete.rivett@adaptive.com>, Jishnu Mukerji <jishnu@hp.com>,
        "Barkmeyer, Edward J" <edward.barkmeyer@nist.gov>,
        "date-time@omg.org"
        <date-time@omg.org>,
        "canonical-xmi-ftf@omg.org" <canonical-xmi-ftf@omg.org>,
        Ed Barkmeyer <edbark@el.nist.gov>
Subject: Re: Canonical XMI: Problems with B6 Identification rules:


On 12/06/2012 04:51 PM, Rouquette, Nicolas F (313K) wrote: 
I don't like the "xmiName" tag; I prefer a simpler alternative:

Since we use xmi:id as a fragment for a URI, we can simply require that an
xmi:id generation algorithm produce a legal URI fragment.
This is what I've done at JPL for several years now; for example,
"foo<bar" becomes "foo_u00253Cbar".

I think we should adopt this.


To avoid the problem of named elements whose names would accidentally
collide with a previously-generated xmi:ID, I use a "prefixing" technique,
that is, the xmi:id generation algorithm adds a prefix to the URI encoded
name.
So this prefix is long and unusual enough that it is not likely to be used? 


This way, we get a robust canonical xmi id generation technique.

At JPL, we've had to strengthen the canonical XMI spec to ensure
repeatable behavior.
We need repeatable behavior when we want to preserve cross-references
across models.

I'd have to go back and study notes from the Reston meeting, but I think what you are referring to below was addressed in the spec. I believe we relied on uuid for some of this.

We've strengthened the canonical xmi spec in three areas:

A) non-reproducible xmi:id for unordered composite collections
B) non-reproducible xmi:id for namespace-distinguishable features of the
same name
C) non-reproducible xmi:id for overloaded behaviors and behavior features

I've analyzed (A) and separated 7 variants of this problem.
4 of 7 are easy to fix (i.e., no changes to UML 2.4.1).

3 of 7 are harder to fix:

- ordering clauses of a ConditionalNode
- ordering related elements of a LinkEndData
- ordering comments


These are harder to fix because, ideally, we'd change these collections to
be ordered in the UML metamodel.
If that's not a realistic thing to do, then a reasonable fallback strategy
is to stereotype these things such that the applied stereotype can be used
as an ordering key.
This would require making the canonical xmi:id generation algorithm aware
of the applied stereotype ordering key and that could be impractical in
some cases.

- Nicolas.

On 12/6/12 1:17 PM, "Pete Rivett" <pete.rivett@adaptive.com> wrote:


Where the metamodel element name (and the default conversion rule to _)
is not suitable, XMI has a tag to override it. The following is in fact
the first tag defined in section 7.11.1 of the XMI spec:

	xmiName 	string 	nil 	Provides an alternate name from
the MOF name for writing to XMI. Useful in cases where the MOF name has
characters that conflict with XML. This value is used rather than the
MOF name.

Pete

-----Original Message-----
From: Jishnu Mukerji [mailto:jishnu@hp.com]
Sent: Thursday, December 06, 2012 12:51 PM
To: Barkmeyer, Edward J
Cc: Denno, Peter O.; date-time@omg.org; canonical-xmi-ftf@omg.org; Ed
Barkmeyer
Subject: Re: Canonical XMI: Problems with B6 Identification rules:

Ed,

I have complete sympathy with you guys. It really is a bug in the
Canonical XMI spec in that it specifies a scheme for handling
unacceptable characters in a way that is not information preserving, and
that should be fixed irrespective of what you guys do this time around.

A PIM should always be able to be a PIM without knowing how it is going
to be externalized, including in the choice of names. So either put
restrictions on names allowable in UML to reflect XML constraints or
provide a mapping scheme that is info lossless. Incidentally, one is
already provided by e.g. HTML. Wonder why that is not used in XMI.

Again just my $0.02

Jishnu.

On 12/6/2012 1:27 PM, Barkmeyer, Edward J wrote:

For the record, DTV supplies more than one 'designation', i.e. name,
for the cited relationships.  It just says that the ones with the
mathematical characters are the "primary terms" (in SBVR parlance).  We
could (and probably should) have chosen to use one of the alternative
terms in creating the UML names for the associations.  I suppose we
could still make that change, but it will cause replacement of several
diagrams in the text (and probably some explanatory text in section 5,
which describes the relationships among the renditions of the DTV).  I
suggest that we make an issue for the DTV RTF, so as to head off
problems with tools that use the UML XMI files to do other things.

(This problem showed up in the 11th hour effort to generate a final
correct canonical UML file, now that that is nominally possible,
rather than repeating the process every few days while we corrected
inconsistencies between the text and the maintained UML model.  I had
a similar problem with the xmi:id values in the CMOF file.)

--
Edward J. Barkmeyer                       Email: edbark@nist.gov
National Institute of Standards & Technology
Engineering Laboratory -- Systems Integration Division
100 Bureau Drive, Stop 8263               Office: +1 301-975-3528
Gaithersburg, MD 20899-8263               Mobile: +1 240-672-5800
________________________________________
From: Jishnu Mukerji [jishnu@hp.com]
Sent: Thursday, December 06, 2012 12:27 PM
To: Denno, Peter O.
Cc: canonical-xmi-ftf@omg.org; Ed Barkmeyer
Subject: Re: Canonical XMI: Problems with B6 Identification rules:

Where special characters have special meanings, we tend to use a pair of
alpha characters to denote the special meaning. E.g. < is denoted by "lt" or
".lt." depending on the context.

Perhaps such an out should be allowed in canonical xmi generation.

Then it will be up to the domain that the XMI is specific to to specify
in their standards what those ids specifically mean in the domain. For
the rest they should be uninterpreted strings anyway.

Of course DTV could make everyone's life simpler by simply using
element names like
"DateTime-Time_Infrastructure-duration_lt_duration2" etc.
notwithstanding what Canonical XMI says too :)

Just my $0.02....

Jishnu.


On 12/6/2012 12:15 PM, Peter Denno wrote:
Hi,

While generating Canonical XMI for the Date Time Vocabulary spec, I
think I discovered problems with the xmi:id generation rules.

(1) The spec says "Where the above rules result in characters not
permitted for identifiers in XML documents (for example space, '/' or
':' these must be replaced by '_'."

DTV had elements named like this:

"DateTime-Time_Infrastructure-duration1_<_duration2"
"DateTime-Time_Infrastructure-duration1_=_duration2"

Obviously, if I change < and = to _, two elements will have the same
name.

We need a better strategy for handling special characters.
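
The collision can be reproduced with a short sketch. The character class below is a simplification of "characters not permitted for identifiers in XML documents" (it keeps only ASCII letters, digits, '_', '.' and '-'), and the function name b6_replace is hypothetical.

```python
import re


def b6_replace(name: str) -> str:
    """Apply the replacement rule quoted in (1): every character that is
    not allowed in the identifier becomes '_'.

    Simplifying assumption: only ASCII letters, digits, '_', '.' and '-'
    are treated as allowed; the real XML name rules admit more characters.
    """
    return re.sub(r"[^A-Za-z0-9_.\-]", "_", name)


if __name__ == "__main__":
    a = "DateTime-Time_Infrastructure-duration1_<_duration2"
    b = "DateTime-Time_Infrastructure-duration1_=_duration2"
    print(b6_replace(a))
    print(b6_replace(b))
    print(b6_replace(a) == b6_replace(b))
```

Both names map to the same string, so the rule as written loses the information needed to tell the two elements apart.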

(2) The DTV spec had a named element whose qualified name matched a
name generated by a procedure described in the spec section B6:

In other cases the xmi:id is the xmi:id of the parent XML element (or
"_" for top level elements), followed by the separator '-', followed
by the name of the property (XML element. If there is more than one
value for the property this is further followed by '-' followed by the
sequence number (from 1) within the parent element and the property.
Note that named elements (which satisfy the first rule) are still
included in this count.

The named element was not a sibling, so the part "Note that named
elements...are still included in this count" did not apply. One
quasi-solution is to use numbering whenever there is not a qualified
name. Simply strike the phrase "If there is more than one value of the
property this is further" in the above. The problem with this is that
there could still be an element with a qualified name that matches the
generated xmi:id (it could end with a number)! Perhaps we need to add
"If the resulting name is a duplicate of a name generated using the
procedure for qualified names described above, the first sequence number
where duplication does not occur is used."

I realize that these are pretty complicated rules.
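
A rough sketch of the amended rule proposed in (2), under stated assumptions: every unnamed value is numbered (the "more than one value" condition is struck), and a sequence number is skipped whenever the resulting xmi:id is already taken by an id derived from a qualified name. The function name assign_ids and its parameters are hypothetical, and for brevity the sketch does not model named siblings taking part in the count.

```python
def assign_ids(parent_id, prop, n_values, named_ids):
    """Generate xmi:ids for the unnamed values of one property.

    parent_id : xmi:id of the parent XML element ("_" for top level)
    prop      : name of the property (XML element)
    n_values  : number of unnamed values needing an id
    named_ids : set of xmi:ids already produced from qualified names
    """
    ids = []
    seq = 1
    for _ in range(n_values):
        candidate = f"{parent_id}-{prop}-{seq}"
        # Proposed addition: use the first sequence number where no
        # duplication with a qualified-name id occurs.
        while candidate in named_ids:
            seq += 1
            candidate = f"{parent_id}-{prop}-{seq}"
        ids.append(candidate)
        seq += 1
    return ids


if __name__ == "__main__":
    taken = {"A-ownedEnd-1"}  # hypothetical id derived from a qualified name
    print(assign_ids("A", "ownedEnd", 2, taken))
```

Note that the ids produced this way depend on the full set of qualified names in the document, which is part of why the rules become complicated.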

Best regards,
    Peter


--

Best regards,
   Peter

Peter Denno
National Institute of Standards and Technology, Systems Integration
Division, Engineering Laboratory,
100 Bureau Drive, Mail Stop 8265          Tel: +1 301-975-3595
Gaithersburg, MD, USA 20899-8265          FAX: +1 301-975-4694


Best regards,
  Peter

Peter Denno 
National Institute of Standards and Technology, 
Systems Integration Division, 
Engineering Laboratory,
100 Bureau Drive, Mail Stop 8265          Tel: +1 301-975-3595 
Gaithersburg, MD, USA 20899-8265          FAX: +1 301-975-4694