Issue 4710: Mapping for Character and String Types (java-rtf)

Source: International Business Machines (Ms. Anne E. Collins, nobody)
Nature: Uncategorized Issue
Severity:

Summary:
Sections 1.4.3 and 1.4.5 of the IDL to Java spec state that character and string range and bounds checking should be performed at marshal time. Since it is possible to receive IDL characters, e.g. UTF-16 surrogate pairs, that cannot, at present, be mapped to Java characters, could we clarify that this checking should be performed at both marshaling and demarshaling time, per section 4.11.3.25 DATA_CONVERSION of the CORBA 2.5 spec, which states:

    This exception is raised if an ORB cannot convert the representation of data as marshaled into its native representation or vice-versa.

Also, could anyone please comment on the meaning of the term "character set" in the sentence beginning "If the char falls outside the range defined by the character set,..." in section 1.4.3. Does this refer to the TCS, NCS or both?

Resolution: Incorporate changes and close issue

Revised Text:
Replace old 1.4.3:

IDL characters are 8-bit quantities representing elements of a character set while Java characters are 16-bit unsigned quantities representing Unicode characters. In order to enforce type-safety, the Java CORBA runtime asserts range validity of all Java chars mapped from IDL chars when parameters are marshaled during method invocation. If the char falls outside the range defined by the character set, a CORBA::DATA_CONVERSION exception shall be thrown. The IDL wchar maps to the Java primitive type char. If the wchar falls outside the range defined by the character set, a CORBA::DATA_CONVERSION exception shall be thrown.

with new 1.4.3:

IDL characters are 8-bit quantities representing elements of a character set while Java characters are 16-bit unsigned quantities representing Unicode characters.
In order to enforce type safety, the Java CORBA runtime asserts range validity of all Java chars mapped to or from IDL chars when parameters are marshaled or unmarshaled during method invocation. Assume that a Java ORB has a particular native character set (NCS) and some transmission character set (TCS) has been negotiated for a particular object reference. The following rules apply for reporting errors in character encoding during data marshalling and unmarshalling in a method invocation on that object reference:

- If an attempt is made to marshal a char represented in the sending ORB's NCS which cannot be represented in the sending ORB's TCS, a CORBA::DATA_CONVERSION exception with a minor code of 1 is thrown.
- If an attempt is made to unmarshal a char represented in the receiving ORB's TCS that cannot be represented in the receiving ORB's NCS, a CORBA::DATA_CONVERSION exception with a minor code of TBD is thrown.

Similarly, the following rules apply for reporting errors in wchar encoding during data marshalling and unmarshalling in a method invocation on that object reference:

- If an attempt is made to marshal a wchar represented in the sending ORB's NCS which cannot be represented in the sending ORB's TCS, a CORBA::DATA_CONVERSION exception with a minor code of 1 is thrown.
- If an attempt is made to unmarshal a wchar represented in the receiving ORB's TCS that cannot be represented in the receiving ORB's NCS, a CORBA::DATA_CONVERSION exception with a minor code of TBD is thrown.

Replace section 1.4.5:

The IDL string, both bounded and unbounded variants, are mapped to java.lang.String. Range checking for characters in the string as well as bounds checking of the string is done at marshal time. Character range violations cause a CORBA::DATA_CONVERSION exception to be raised. Bounds violations cause a CORBA::BAD_PARAM exception to be raised. The IDL wstring, both bounded and unbounded variants, are mapped to java.lang.String.
Bounds checking of the string is done at marshal time. Character range violations cause a CORBA::DATA_CONVERSION exception to be raised. Bounds violations cause a CORBA::BAD_PARAM exception to be raised.

with the new section 1.4.5:

The IDL string type, in both the bounded and unbounded variants, is mapped to Java type java.lang.String. Range checking for characters in the string as well as bounds checking of the string is done at marshal time. Character range violations cause a CORBA::DATA_CONVERSION exception to be raised as described in section 1.4.3, "Character Types". Bounds violations cause a CORBA::BAD_PARAM exception to be raised.

The IDL wstring type, in both the bounded and unbounded variants, is mapped to Java type java.lang.String. Bounds checking of the string is done at marshal time. Character range violations cause a CORBA::DATA_CONVERSION exception to be raised as described in section 1.4.3, "Character Types". Bounds violations cause a CORBA::BAD_PARAM exception to be raised.

Actions taken:
November 21, 2001: received issue
April 28, 2003: closed issue

Discussion:

End of Annotations:=====

Subject: Mapping for Character and String Types
To: java-rtf@omg.org
From: "Ann Dalton1"
Date: Wed, 21 Nov 2001 20:34:54 +0000

From: "Ann Dalton1"
Date: Wed, 20 Feb 2002 00:00:16 +0000

Sections 1.4.3 and 1.4.5 of the IDL to Java spec state that character and string range and bounds checking should be performed at marshal time.
Since it is possible to receive IDL characters, e.g. UTF-16 surrogate pairs, that cannot, at present, be mapped to Java characters, could we clarify that this checking should be performed at both marshaling and demarshaling time, per section 4.11.3.25 DATA_CONVERSION of the CORBA 2.5 spec, which states:

    This exception is raised if an ORB cannot convert the representation of data as marshaled into its native representation or vice-versa.

Also, could anyone please comment on the meaning of the term "character set" in the sentence beginning "If the char falls outside the range defined by the character set,..." in section 1.4.3. Does this refer to the TCS, NCS or both?

Thanks,
Ann

A E Dalton
ann_dalton@uk.ibm.com
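
To make the checks discussed in this issue concrete, the following is a minimal Java sketch of how range checking at both marshal and unmarshal time, plus string bounds checking, might look. It is an illustration under stated assumptions, not the ORB's actual implementation: the class CharMappingSketch, its method names, and the exception classes DataConversion and BadParam are hypothetical stand-ins (for CORBA::DATA_CONVERSION and CORBA::BAD_PARAM), the TCS is modelled as a java.nio.charset.Charset, the NCS is assumed to be a single 16-bit Java char, and the unmarshal minor code is a placeholder because the revised text leaves it as TBD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharMappingSketch {
    /** Hypothetical stand-in for CORBA::DATA_CONVERSION (not the real ORB class). */
    static class DataConversion extends RuntimeException {
        final int minor;
        DataConversion(String message, int minor) {
            super(message);
            this.minor = minor;
        }
    }

    /** Hypothetical stand-in for CORBA::BAD_PARAM (not the real ORB class). */
    static class BadParam extends RuntimeException {
        BadParam(String message) { super(message); }
    }

    // Minor code 1 for marshal failures comes from the revised text.
    static final int MINOR_MARSHAL = 1;
    // The revised text says "TBD" for unmarshal failures; -1 is only a placeholder.
    static final int MINOR_UNMARSHAL_PLACEHOLDER = -1;

    /** Marshal an IDL char: the Java char must fit in one byte of the TCS. */
    static byte marshalChar(char c, Charset tcs) {
        if (!tcs.newEncoder().canEncode(c)) {
            throw new DataConversion("char not representable in TCS", MINOR_MARSHAL);
        }
        byte[] encoded = String.valueOf(c).getBytes(tcs);
        if (encoded.length != 1) { // an IDL char is an 8-bit quantity
            throw new DataConversion("char needs more than 8 bits in TCS", MINOR_MARSHAL);
        }
        return encoded[0];
    }

    /** Unmarshal an IDL char: the received byte must decode to a Java char. */
    static char unmarshalChar(byte b, Charset tcs) {
        try {
            // newDecoder() reports malformed or unmappable input by default.
            return tcs.newDecoder().decode(ByteBuffer.wrap(new byte[] {b})).get(0);
        } catch (CharacterCodingException e) {
            throw new DataConversion("byte not representable in NCS",
                    MINOR_UNMARSHAL_PLACEHOLDER);
        }
    }

    /** Unmarshal an IDL wchar given as a Unicode code point (assumed input form). */
    static char unmarshalWchar(int codePoint) {
        // Code points outside the BMP need a surrogate pair, and a lone
        // surrogate is not a character; neither maps to one Java char.
        if (!Character.isBmpCodePoint(codePoint)
                || Character.isSurrogate((char) codePoint)) {
            throw new DataConversion("wchar not representable as a Java char",
                    MINOR_UNMARSHAL_PLACEHOLDER);
        }
        return (char) codePoint;
    }

    /** Bounds check for a bounded string; bound 0 is assumed to mean unbounded. */
    static void checkBound(String value, int bound) {
        if (bound > 0 && value.length() > bound) {
            throw new BadParam("string length " + value.length()
                    + " exceeds bound " + bound);
        }
    }

    public static void main(String[] args) {
        System.out.println(marshalChar('A', StandardCharsets.ISO_8859_1));
        try {
            unmarshalWchar(0x1F600); // outside the BMP, needs a surrogate pair
        } catch (DataConversion e) {
            System.out.println("DATA_CONVERSION: " + e.getMessage());
        }
    }
}
```

The sketch keeps the marshal and unmarshal checks in separate methods so that each can report its own minor code, which mirrors the sending-side and receiving-side rules in the revised 1.4.3 text.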