Issue 7731: Codec Interface Deficiencies (corba-rtf)

Source: Zuehlke Engineering (Mr. Frank Pilhofer, fpilhofer2008(at)gmail.com)
Nature: Uncategorized Issue
Severity:
Summary:

CORBA 3, chapter 13.8, defines the Codec interface to encode arbitrary data values into CORBA::OctetSeq "blobs" and vice versa. This interface can be used, e.g., to supply and retrieve ServiceContext data using the PortableInterceptor interfaces. In practice, the Codec interface is also being used for data serialization, i.e., to store and retrieve arbitrary values in files or other databases.

However, the interface is deficient in that it does not consider all possible variables that are needed for interoperability. It supports setting the CDR version that is to be used, but neglects byteorder and codeset settings. Consequently, the encoded values are platform-specific. If a value was encoded on a little-endian system, it will not decode, or worse, decode erroneously, on a big-endian system. The same caveats apply to codesets, e.g., when an ISO-8859-1 encoded blob is decoded using UTF-8 or Windows-1252.

To support interoperability, the Codec interface needs to be extended. My recommendation is to extend the CodecFactory interface, so that it supports creating CDR version-, byteorder-, and codeset-specific Codec instances, either supplying user-provided values for each, or informing the user about chosen defaults.

Example:

    module IOP {
      const EncodingFormat ENCODING_DEFAULT = -1;

      typedef short ByteorderFormat;
      const ByteorderFormat BYTEORDER_DEFAULT = -1;
      const ByteorderFormat BYTEORDER_BIGENDIAN = 0;
      const ByteorderFormat BYTEORDER_LITTLEENDIAN = 1;

      struct EncodingExt {
        EncodingFormat format;
        octet major_version;               // set to 0 for default
        octet minor_version;
        ByteorderFormat byteorder;
        CONV_FRAME::CodeSetId char_data;   // set to 0 for default
        CONV_FRAME::CodeSetId wchar_data;  // set to 0 for default
      };

      local interface CodecFactory {
        // create_codec remains as before
        Codec create_codec_ext (inout EncodingExt enc)
          raises (UnknownEncoding);
      };
    };

The create_codec_ext operation would create an appropriate Codec instance, if available; it will then set all "default" members of the EncodingExt structure to their actual values, so that the application can store this information along with any encoded values.

One potential criticism of the above is that the encoding format's parameters depend on the encoding format. For example, there may be encoding formats that are byteorder-independent, or that consistently use UTF-32 for strings, thus not needing codeset parameters. Also, they may use wildly different versioning. So a "better" solution might involve passing the EncodingFormat, and an Any with a format-specific data type.
That could look like:

    module GIOP {
      typedef short ByteorderFormat;
      const ByteorderFormat BYTEORDER_DEFAULT = -1;
      const ByteorderFormat BYTEORDER_BIGENDIAN = 0;
      const ByteorderFormat BYTEORDER_LITTLEENDIAN = 1;

      struct CDREncodingParameters {
        octet major_version;               // set to 0 for default
        octet minor_version;
        ByteorderFormat byteorder;
        CONV_FRAME::CodeSetId char_data;   // set to 0 for default
        CONV_FRAME::CodeSetId wchar_data;  // set to 0 for default
      };
    };

    module IOP {
      const EncodingFormat ENCODING_DEFAULT = -1;

      local interface CodecFactory {
        // create_codec remains as before
        Codec create_codec_ext (inout EncodingFormat format,
                                inout any parameters)
          raises (UnknownEncoding);
      };
    };

Once we have consensus on the approach, I will gladly volunteer to come up with a full set of editing instructions.

Resolution: duplicate
Revised Text:
Actions taken:
September 9, 2004: received issue
September 24, 2004: closed issue, duplicate

Discussion:

End of Annotations:=====

Subject: Codec Interface Deficiencies
Date: Thu, 9 Sep 2004 18:46:08 -0400
Thread-Topic: Codec Interface Deficiencies
Thread-Index: AcSWvst1alYTcRP9RD2KOmjzuh67SA==
From: "Pilhofer, Frank"
To:
Cc:
X-MIME-Autoconverted: from quoted-printable to 8bit by amethyst.omg.org id i89N1f1U024359

This is a new issue for the Core RTF.

Discussion?

X-Sender: andyp@ussfex01.bea.com
X-Mailer: QUALCOMM Windows Eudora Version 6.1.2.0
Date: Wed, 15 Sep 2004 09:19:37 -0700
To: "Pilhofer, Frank" ,
From: Andy Piper
Subject: Re: Codec Interface Deficiencies
Cc:

This is the original proposal I sent out
I can't find the issue number

andy

codeset.txt

Date: Mon, 20 Sep 2004 09:12:18 -0400
From: "Robert A. Kukura"
User-Agent: Mozilla Thunderbird 0.7.3 (Windows/20040803)
X-Accept-Language: en-us, en
To: "Pilhofer, Frank"
CC: corba-rtf@omg.org
Subject: Re: Codec Interface Deficiencies
X-SPAM: 0.00

The Codec encodings currently defined in CORBA are CDR encapsulations, which include an initial octet specifying the byte order of the encapsulated data. Therefore, I cannot agree with the statement that "If a value was encoded on a little-endian system, it will not decode, or worse, decode erroneously, on a big-endian system." See CORBA 3.0.3 section 13.8, which states "The Codec provides a mechanism to transfer these components between their IDL data types and their CDR encapsulation representations" and section 15.3.3, which states "When encapsulating OMG IDL data types, the first octet in the stream (index 0) contains a boolean value indicating the byte ordering of the encapsulated data."
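The point about the leading byte-order octet can be illustrated with a small sketch. The Python below is not part of the thread and uses no ORB. All function names are hypothetical. It hand-builds a CDR-style encapsulation holding a single unsigned long, assuming the usual CDR convention that a flag of 0 means big-endian and 1 means little-endian, and that the value is aligned to a 4-octet boundary counted from the start of the encapsulation. A reader that honours the flag recovers the value whatever byte order the writer chose, while a bare value without the flag is misread when the reader guesses wrong.

```python
import struct


def encode_ulong_encapsulation(value: int, little_endian: bool) -> bytes:
    """Build a toy encapsulation: flag octet, 3 padding octets, 4-octet ulong."""
    flag = b"\x01" if little_endian else b"\x00"  # assumed: 0 = big, 1 = little
    padding = b"\x00" * 3                          # align the ulong to offset 4
    fmt = "<I" if little_endian else ">I"
    return flag + padding + struct.pack(fmt, value)


def decode_ulong_encapsulation(blob: bytes) -> int:
    """Read the flag at index 0, then decode the ulong in that byte order."""
    little_endian = blob[0] != 0
    fmt = "<I" if little_endian else ">I"
    (value,) = struct.unpack_from(fmt, blob, 4)
    return value


def decode_raw_ulong(raw: bytes, assume_little_endian: bool) -> int:
    """Decode 4 bare octets; with no flag, the reader has to guess the order."""
    return int.from_bytes(raw, "little" if assume_little_endian else "big")


if __name__ == "__main__":
    original = 0x01020304
    for little in (False, True):
        blob = encode_ulong_encapsulation(original, little)
        print(blob.hex(), hex(decode_ulong_encapsulation(blob)))

    # A bare little-endian value read by a reader that assumes big-endian.
    bare = struct.pack("<I", original)
    print(hex(decode_raw_ulong(bare, assume_little_endian=False)))
```

In this toy model the flag only helps the decoder. It gives the encoding side no way to request a particular byte order, and that control is what the reply goes on to discuss.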
I would not disagree that a general purpose encoding/decoding interface should allow control of the byte order encoded in encapsulations, and should support decoding of encodings (other than encapsulations) for which the byte order is not fixed or encoded. I also would agree that control of codesets is needed, since 15.3.3 does say that codesets must be "explicitly defined" for encapsulations that contain chars or wchars. I have not found it in CORBA 3.0.3, but I would think the narrow and wide charsets encoded and decoded by the current Codecs should be specified.

I prefer the second (any-based) approach that Frank suggested, but we'd need to specify that the CDREncodingParameters::byteorder field is ignored when decoding encapsulations. I would prefer an explicit operation on CodecFactory to get the default parameters for a specified EncodingFormat. Also, I'm not sure the notion of a default encoding is all that useful, and would rather avoid inout params.

-Bob
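The codeset caveat raised in the issue text, and accepted in the reply above, can be shown without any CORBA machinery. The following Python is a rough sketch only: the helper names are hypothetical, and a Python codec name stands in for a CONV_FRAME::CodeSetId. It shows an ISO-8859-1 blob that fails outright when read as UTF-8, a single octet that decodes silently but differently under ISO-8859-1 and Windows-1252, and the effect of storing the codeset next to the blob, which is the kind of information the proposed parameters would let an application record.

```python
from typing import Optional, Tuple


def decodes_as(payload: bytes, codeset: str) -> Optional[str]:
    """Try to decode payload with the given codeset; return None on failure."""
    try:
        return payload.decode(codeset)
    except UnicodeDecodeError:
        return None


def encode_tagged(text: str, codeset: str) -> Tuple[str, bytes]:
    """Keep the codeset name alongside the encoded octets (illustrative only)."""
    return codeset, text.encode(codeset)


def decode_tagged(tagged: Tuple[str, bytes]) -> str:
    """Decode using the codeset that was stored with the octets."""
    codeset, payload = tagged
    return payload.decode(codeset)


if __name__ == "__main__":
    latin1_blob = "Z\u00fchlke".encode("iso-8859-1")  # u-umlaut becomes octet 0xFC

    # Hard failure: 0xFC cannot start a UTF-8 sequence.
    print(decodes_as(latin1_blob, "utf-8"))

    # Silent disagreement: octet 0x80 is a control character in ISO-8859-1
    # and the euro sign in Windows-1252.
    print(repr(decodes_as(b"\x80", "iso-8859-1")))
    print(repr(decodes_as(b"\x80", "cp1252")))

    # With the codeset stored next to the blob, the round trip is unambiguous.
    print(decode_tagged(encode_tagged("Z\u00fchlke", "utf-8")))
```

The second case is the more troublesome one in practice, since nothing signals that the decoded text differs from what was written.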