Issue 19808: Identification rules are too weak and too unpredictable (canonical-xmi-ftf)
Source: NASA (Dr. Nicolas F. Rouquette, nicolas.f.rouquette(at)jpl.nasa.gov)
Nature: Enhancement
Severity: Critical
Summary: The Canonical XMI rules for generating xmi:ids (section B.6) are too weak to ensure reproducible results.


Consider the following procedure:


- Start a tool that supports Canonical XMI xmi:id generation
- Load an input model
- Compute the xmi:ids for all elements in the model


Repeated executions of this procedure with the same tool and the same input model should always result in the same xmi:ids.
In practice, reproducibility depends on the model. For example, the xmi:ids of the comments owned by an element are distinguished by a unique "_n" suffix assigned according to the order of their xmi:uuids. However, if modeling tools do not support xmi:uuids or do not preserve them, the results can vary.
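
A minimal sketch of this failure mode, under stated assumptions: the id format, the numbering from 1, the helper comment_ids and the uuid values are all hypothetical, and the only behavior taken from the text is that the "_n" suffix follows xmi:uuid order. It models a tool that does not preserve xmi:uuids, so the same model gets different uuids on each load.

```python
def comment_ids(owner_id, comments):
    """comments: dict of comment body -> xmi:uuid for one load of the model.

    Numbers the comments in sorted uuid order (assumed suffix rule).
    """
    ordered = sorted(comments, key=lambda body: comments[body])
    return {body: f"{owner_id}-ownedComment_{n}"
            for n, body in enumerate(ordered, start=1)}

# Same model loaded twice by a tool that regenerates uuids on load
# (hypothetical uuid values).
run1 = comment_ids("Pkg", {"rationale": "0a", "todo": "7f"})
run2 = comment_ids("Pkg", {"rationale": "9c", "todo": "1d"})

print(run1["rationale"], run2["rationale"])
```

The same comment receives a different suffix in the two runs, although the model itself did not change.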


The Canonical XMI rules for generating xmi:ids (section B.6) are too unpredictable because of the dependency on XMI serialization.


There are several cases where the xmi:id of an element depends on its serialization:


- multiple named elements that have the same name, the same owner and the same containing property. 
- multiple named elements that have the same owner and the same containing property but whose names differ only in characters that indistinguishably map to "_"
- multiple non-named elements that have the same owner and the same containing property
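
The three cases can be illustrated with a small sketch. It assumes a simplified base-name rule rather than the actual B.6 text: the base name is the owner's id, the containing property and the element name joined by "-", with every character outside [A-Za-z0-9] replaced by "_". The function base_name and the sample elements are hypothetical.

```python
import re
from collections import defaultdict

def base_name(owner_id, prop, name):
    """Hypothetical simplification: mangle the name, then join it with owner and property."""
    mangled = re.sub(r"[^A-Za-z0-9]", "_", name or "")
    return f"{owner_id}-{prop}-{mangled}"

# (owner id, containing property, name); None marks a non-named element
siblings = [
    ("Pkg", "packagedElement", "Pump"),
    ("Pkg", "packagedElement", "Pump"),       # case 1: same name
    ("Pkg", "packagedElement", "flow rate"),
    ("Pkg", "packagedElement", "flow-rate"),  # case 2: names differ only in mangled characters
    ("Pkg", "ownedComment", None),
    ("Pkg", "ownedComment", None),            # case 3: non-named elements
]

groups = defaultdict(list)
for owner, prop, name in siblings:
    groups[base_name(owner, prop, name)].append(name)

# Base names shared by more than one sibling need a disambiguating suffix.
collisions = {key: names for key, names in groups.items() if len(names) > 1}
for key, names in collisions.items():
    print(key, names)
```

Each of the three groups shares a single base name, so the elements in it can only be told apart by a suffix.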


In all such cases, rule (4) appends a unique "_n" suffix according to the "export order", which ultimately reduces to the ordering of the xmi:uuid of an element amongst its siblings that have the same "base name". This means that changes somewhere in a model can cause unmodified elements elsewhere to receive different xmi:ids than they had before the changes were made.
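
The effect can be sketched as follows. The helper assign_ids, the base name and the uuid values are hypothetical, and numbering from 1 is an assumption; the sketch only relies on the suffix following xmi:uuid order among siblings with the same base name.

```python
def assign_ids(base, uuids):
    """Map each uuid to base + "_n", numbering siblings in sorted uuid order (assumed)."""
    return {u: f"{base}_{n}" for n, u in enumerate(sorted(uuids), start=1)}

base = "Pkg-ownedComment"
before = assign_ids(base, ["b7", "c2", "f9"])
# One new sibling whose uuid happens to sort first.
after = assign_ids(base, ["a1", "b7", "c2", "f9"])

# Unmodified elements whose xmi:id is no longer the same.
changed = [u for u in before if before[u] != after[u]]
print(changed)
```

All three pre-existing siblings are renumbered even though none of them was edited.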


If a tool implements Canonical XMI when saving/exporting models, then the behavior of the xmi:id rules effectively injects changes into a user's model on save/export. There could be pathological cases where adding or removing a single element in a model with a very large number N of elements changes most of the model, because a change in an xmi:id propagates into changes to every xmi:idref that refers to it.
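
A rough sketch of such a pathological case, under the same assumptions as above (suffixes numbered from 1 in xmi:uuid order). The serialize function is a toy stand-in for XMI export, and the uuids, the referrer names and the model size are hypothetical.

```python
def assign_ids(base, uuids):
    """Number siblings in sorted uuid order (assumed suffix rule)."""
    return {u: f"{base}_{n}" for n, u in enumerate(sorted(uuids), start=1)}

def serialize(uuids, refs):
    """Return one line per element and one line per reference (toy stand-in for XMI)."""
    ids = assign_ids("Pkg-ownedComment", uuids)
    lines = {f"elem:{u}": f'xmi:id="{ids[u]}"' for u in uuids}
    lines.update({f"ref:{src}": f'xmi:idref="{ids[dst]}"' for src, dst in refs.items()})
    return lines

n = 1000
uuids = [f"u{i:04d}" for i in range(1, n + 1)]
refs = {f"r{i:04d}": u for i, u in enumerate(uuids)}  # one referrer per element

old = serialize(uuids, refs)
new = serialize(["u0000"] + uuids, refs)  # a single added element, sorting first

# Count pre-existing lines whose content differs after the addition.
changed = sum(1 for key in old if old[key] != new[key])
print(f"{changed} of {len(old)} existing lines changed")
```

In this constructed case every existing xmi:id and every xmi:idref changes after one element is added.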

In practice, weaknesses and unpredictability severely undermine the utility of Canonical XMI identification rules.
Resolution: 
Revised Text: 
Actions taken:
June 18, 2015: received issue
Discussion: 


End of Annotations:=====

From: webmaster@omg.org
Date: 18 Jun 2015 12:53:03 -0400
To: <issues@omg.org>
Subject: Issue/Bug Report


*******************************************************************************
Name:            Nicolas Rouquette
Employer:        JPL
mailFrom:        nicolas.f.rouquette@jpl.nasa.gov
Terms_Agreement: I agree
Specification:   Canonical XMI
Section:         B.6
FormalNumber:    ptc/13-08-28
Version:         Beta 2
Doc_Year:        2013
Doc_Month:       August
Doc_Day:         28
Page:            9-10
Title:           Identification rules are too weak and too unpredictable
Nature:          Enhancement
Severity:        Critical
CODE:            3TMw8
B1:              Report Issue
Remote Name:     wildcard.jpl.nasa.gov
Remote User:     
HTTP User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.6.3 (KHTML, like Gecko) Version/8.0.6 Safari/600.6.3
Time:            12:53 PM

