Issue 7731: Codec Interface Deficiencies (corba-rtf)
Source: Zuehlke Engineering (Mr. Frank Pilhofer, fpilhofer2008(at)gmail.com)
Nature: Uncategorized Issue
Severity: 
Summary: CORBA 3, chapter 13.8, defines the Codec interface to encode
arbitrary data values into CORBA::OctetSeq "blobs" and vice
versa. This interface can be used, e.g., to supply and retrieve
ServiceContext data using the PortableInterceptor interfaces.


In practice, the Codec interface is also being used for data
serialization, i.e., to store and retrieve arbitrary values in
files or other databases.


However, the interface is deficient in that it does not consider
all possible variables that are needed for interoperability.
It supports setting the CDR version that is to be used, but
neglects byteorder and codeset settings.


Consequently, the encoded values are platform-specific. If a
value was encoded on a little-endian system, it will not decode,
or worse, decode erroneously, on a big-endian system. The same
caveats apply to codesets, e.g., when an ISO-8859-1 encoded
blob is decoded using UTF-8 or Windows-1252.
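
The general hazard can be shown with a few lines of Python. This is only a sketch of what happens to untagged binary data when the reader assumes a different byte order or codeset than the writer; it does not model any particular Codec implementation, and the sample values are chosen purely for illustration.

```python
import struct

# Encode the 32-bit unsigned value 1 the way a little-endian writer would.
raw = struct.pack("<I", 1)

# Reading with the matching byte order recovers the value.
same_order = struct.unpack("<I", raw)[0]

# Reading the same four octets as big-endian silently yields another number.
wrong_order = struct.unpack(">I", raw)[0]

# U+00E9 (e with acute accent) in ISO-8859-1 is the single octet 0xE9.
text_raw = "\u00e9".encode("iso-8859-1")

# Windows-1252 happens to agree with ISO-8859-1 for this octet ...
as_cp1252 = text_raw.decode("cp1252")

# ... but a lone 0xE9 is not valid UTF-8, so a strict decode fails.
try:
    text_raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(same_order, wrong_order, as_cp1252 == "\u00e9", utf8_ok)
```

The byte-order case is the "decode erroneously" outcome (no error, wrong value); the UTF-8 case is the "will not decode" outcome.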


To support interoperability, the Codec interface needs to be
extended.


My recommendation is to extend the CodecFactory interface,
so that it supports creating CDR version-, byteorder-, and
codeset-specific Codec instances, either supplying user-
provided values for each, or informing the user about chosen
defaults.


Example:


module IOP {
  const EncodingFormat ENCODING_DEFAULT = -1;


  typedef short ByteorderFormat;
  const ByteorderFormat BYTEORDER_DEFAULT = -1;
  const ByteorderFormat BYTEORDER_BIGENDIAN = 0;
  const ByteorderFormat BYTEORDER_LITTLEENDIAN = 1;


  struct EncodingExt {
    EncodingFormat format;
    octet major_version;   // set to 0 for default
    octet minor_version;
    ByteorderFormat byteorder;
    CONV_FRAME::CodeSetId char_data; // set to 0 for default
    CONV_FRAME::CodeSetId wchar_data; // set to 0 for default
  };


  local interface CodecFactory {
    // create_codec remains as before
    Codec create_codec_ext (inout EncodingExt enc)
      raises (UnknownEncoding);
  };
};


The create_codec_ext operation would create an appropriate
Codec instance, if available; it will then set all "default"
members of the EncodingExt structure to their actual values,
so that the application can store this information along
with any encoded values.
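
A minimal sketch of that default-filling step, in Python rather than IDL. The concrete defaults used here (CDR 1.2, native byte order, ISO 8859-1 and UTF-16 codeset ids) are assumptions made for the example, not values the proposal prescribes, and resolve_defaults is a hypothetical helper name. Since Python has no inout parameters, the resolved structure is returned instead.

```python
import sys
from dataclasses import dataclass, replace

ENCODING_DEFAULT = -1
ENCODING_CDR_ENCAPS = 0          # existing IOP constant
BYTEORDER_DEFAULT = -1
BYTEORDER_BIGENDIAN = 0
BYTEORDER_LITTLEENDIAN = 1

# Assumed defaults, for illustration only.
ASSUMED_VERSION = (1, 2)
ASSUMED_CHAR_CODESET = 0x00010001   # ISO 8859-1 in the OSF registry
ASSUMED_WCHAR_CODESET = 0x00010109  # UTF-16 in the OSF registry


@dataclass(frozen=True)
class EncodingExt:
    format: int = ENCODING_DEFAULT
    major_version: int = 0          # 0 means "use the default"
    minor_version: int = 0
    byteorder: int = BYTEORDER_DEFAULT
    char_data: int = 0              # 0 means "use the default"
    wchar_data: int = 0             # 0 means "use the default"


def resolve_defaults(enc: EncodingExt) -> EncodingExt:
    """Return a copy of enc with every "default" member made concrete."""
    native = (BYTEORDER_LITTLEENDIAN if sys.byteorder == "little"
              else BYTEORDER_BIGENDIAN)
    default_version = enc.major_version == 0
    return replace(
        enc,
        format=ENCODING_CDR_ENCAPS if enc.format == ENCODING_DEFAULT else enc.format,
        major_version=ASSUMED_VERSION[0] if default_version else enc.major_version,
        minor_version=ASSUMED_VERSION[1] if default_version else enc.minor_version,
        byteorder=native if enc.byteorder == BYTEORDER_DEFAULT else enc.byteorder,
        char_data=enc.char_data or ASSUMED_CHAR_CODESET,
        wchar_data=enc.wchar_data or ASSUMED_WCHAR_CODESET,
    )


# The caller fixes only the byte order and leaves everything else to the factory.
requested = EncodingExt(byteorder=BYTEORDER_BIGENDIAN)
resolved = resolve_defaults(requested)
print(resolved)
```

The application would store the resolved structure next to the encoded octets, so that a later reader can ask for a Codec with exactly the same settings.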


One potential criticism of the above is that the encoding
format's parameters depend on the encoding format. For example,
there may be encoding formats that are byteorder-independent,
or that consistently use UTF-32 for strings, thus not needing
codeset parameters. Also, they may use wildly different
versioning. So a "better" solution might involve passing
the EncodingFormat, and an Any with a format-specific data
type.


That could look like:


module GIOP {
  typedef short ByteorderFormat;
  const ByteorderFormat BYTEORDER_DEFAULT = -1;
  const ByteorderFormat BYTEORDER_BIGENDIAN = 0;
  const ByteorderFormat BYTEORDER_LITTLEENDIAN = 1;


  struct CDREncodingParameters {
    octet major_version;   // set to 0 for default
    octet minor_version;
    ByteorderFormat byteorder;
    CONV_FRAME::CodeSetId char_data; // set to 0 for default
    CONV_FRAME::CodeSetId wchar_data; // set to 0 for default
  };
};


module IOP {
  const EncodingFormat ENCODING_DEFAULT = -1;


  local interface CodecFactory {
    // create_codec remains as before
    Codec create_codec_ext (inout EncodingFormat format,
                            inout any parameters)
      raises (UnknownEncoding);
  };
};
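
How a factory might dispatch on the format and check the format-specific parameter type can be sketched as follows. This is an illustration in Python, not IDL: the second format (HYPOTHETICAL_TEXT_FORMAT with TextParameters) is invented here to stand for a byteorder-independent encoding, the Codec class is a placeholder, and raising UnknownEncoding for a mismatched parameter type is an assumption of this sketch. Default resolution is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Any, Dict

ENCODING_CDR_ENCAPS = 0          # existing IOP constant
HYPOTHETICAL_TEXT_FORMAT = 100   # invented here, not part of any specification


class UnknownEncoding(Exception):
    """Stand-in for the UnknownEncoding exception raised by CodecFactory."""


@dataclass(frozen=True)
class CDREncodingParameters:
    major_version: int
    minor_version: int
    byteorder: int
    char_data: int
    wchar_data: int


@dataclass(frozen=True)
class TextParameters:
    """Parameters of the invented text format: no byte order, no codesets."""
    line_width: int


@dataclass(frozen=True)
class Codec:
    """Placeholder codec that only remembers how it was configured."""
    format: int
    parameters: Any


# Each format declares the single parameter type it understands.
PARAMETER_TYPES: Dict[int, type] = {
    ENCODING_CDR_ENCAPS: CDREncodingParameters,
    HYPOTHETICAL_TEXT_FORMAT: TextParameters,
}


def create_codec_ext(fmt: int, parameters: Any) -> Codec:
    """Create a codec if the format is known and the parameters match it."""
    expected = PARAMETER_TYPES.get(fmt)
    if expected is None:
        raise UnknownEncoding(f"no codec for format {fmt}")
    if not isinstance(parameters, expected):
        raise UnknownEncoding(f"format {fmt} expects {expected.__name__}")
    return Codec(fmt, parameters)


cdr = create_codec_ext(
    ENCODING_CDR_ENCAPS,
    CDREncodingParameters(1, 2, 0, 0x00010001, 0x00010109),
)
text = create_codec_ext(HYPOTHETICAL_TEXT_FORMAT, TextParameters(line_width=72))
print(cdr)
print(text)
```

The point of the design is that the factory signature stays the same however many formats are added; only the table of parameter types grows.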


Once we have consensus on the approach, I will gladly volunteer
to come up with a full set of editing instructions.
Resolution: duplicate
Revised Text: 
Actions taken:
September 9, 2004: received issue
September 24, 2004: closed issue, duplicate
Discussion: 

End of Annotations:=====
Subject: Codec Interface Deficiencies 
Date: Thu, 9 Sep 2004 18:46:08 -0400 
Thread-Topic: Codec Interface Deficiencies 
Thread-Index: AcSWvst1alYTcRP9RD2KOmjzuh67SA== 
From: "Pilhofer, Frank" <fpilhofe@mc.com> 
To: <issues@omg.org> 
Cc: <corba-rtf@omg.org> 
X-MIME-Autoconverted: from quoted-printable to 8bit by amethyst.omg.org id i89N1f1U024359 


This is a new issue for the Core RTF.



Discussion?

X-Sender: andyp@ussfex01.bea.com 
X-Mailer: QUALCOMM Windows Eudora Version 6.1.2.0 
Date: Wed, 15 Sep 2004 09:19:37 -0700 
To: "Pilhofer, Frank" <fpilhofe@mc.com>, <issues@omg.org> 
From: Andy Piper <andyp@bea.com> 
Subject: Re: Codec Interface Deficiencies 
Cc: <corba-rtf@omg.org> 


This is the original proposal I sent out. I can't find the issue number.


andy




[Attachment: codeset.txt]


Date: Mon, 20 Sep 2004 09:12:18 -0400 
From: "Robert A. Kukura" <Robert_A_Kukura@raytheon.com> 
User-Agent: Mozilla Thunderbird 0.7.3 (Windows/20040803) 
X-Accept-Language: en-us, en 
To: "Pilhofer, Frank" <fpilhofe@mc.com> 
CC: corba-rtf@omg.org 
Subject: Re: Codec Interface Deficiencies 
X-SPAM: 0.00 


The Codec encodings currently defined in CORBA are CDR encapsulations, which include an initial octet specifying the byte order of the encapsulated data. Therefore, I cannot agree with the statement that "If a value was encoded on a little-endian system, it will not decode, or worse, decode erroneously, on a big-endian system."  See CORBA 3.0.3 section 13.8, which states "The Codec provides a mechanism to transfer these components between their IDL data types and their CDR encapsulation representations" and section 15.3.3, which states "When encapsulating OMG IDL data types, the first octet in the stream (index 0) contains a boolean value indicating the byte ordering of the encapsulated data."
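
The encapsulation layout described above can be sketched roughly as follows. This is a simplified Python illustration for a single unsigned long value only (it omits the TypeCode that Codec::encode would also write), and the function names are hypothetical. It assumes the usual CDR conventions: flag octet 0 for big-endian and 1 for little-endian, and alignment measured from the start of the encapsulation.

```python
import struct


def encapsulate_ulong(value: int, little_endian: bool) -> bytes:
    """Encode one unsigned long as a CDR-style encapsulation."""
    # Octet 0 is the byte-order flag: 0 = big-endian, 1 = little-endian.
    flag = b"\x01" if little_endian else b"\x00"
    # An unsigned long is aligned on a 4-octet boundary, counted from the
    # start of the encapsulation, so three padding octets follow the flag.
    padding = b"\x00" * 3
    fmt = "<I" if little_endian else ">I"
    return flag + padding + struct.pack(fmt, value)


def decapsulate_ulong(data: bytes) -> int:
    """Decode the value using the byte order recorded in octet 0."""
    fmt = "<I" if data[0] == 1 else ">I"
    return struct.unpack_from(fmt, data, 4)[0]


little = encapsulate_ulong(0x01020304, little_endian=True)
big = encapsulate_ulong(0x01020304, little_endian=False)

# The octets differ, but each blob decodes to the same value on any host.
print(little.hex(), decapsulate_ulong(little))
print(big.hex(), decapsulate_ulong(big))
```

Because the reader takes the byte order from the blob rather than from the host, the decoder needs no byte-order setting of its own.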


I would not disagree that a general purpose encoding/decoding interface should allow control of the byte order encoded in encapsulations, and should support decoding of encodings (other than encapsulations) for which the byte order is not fixed or encoded.


I also would agree that control of codesets is needed, since 15.3.3 does say that codesets must be "explicitly defined" for encapsulations that contain chars or wchars. I have not found it in CORBA 3.0.3, but I would think the narrow and wide charsets encoded and decoded by the current Codecs should be specified.


I prefer the second (any-based) approach suggested below, but we'd need to specify that the CDREncodingParameters::byteorder field is ignored when decoding encapsulations. I would prefer an explicit operation on CodecFactory to get the default parameters for a specified EncodingFormat. Also, I'm not sure the notion of a default encoding is all that useful, and would rather avoid inout params.
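
One possible shape for such an interface, sketched in Python. The operation names get_default_parameters and create_codec_with_parameters are hypothetical, as are the default values returned; nothing here is taken from the CORBA specification. The caller asks for the defaults, overrides what it cares about, and passes the result in as a plain input parameter.

```python
import sys
from dataclasses import dataclass, replace

ENCODING_CDR_ENCAPS = 0      # existing IOP constant
BYTEORDER_BIGENDIAN = 0
BYTEORDER_LITTLEENDIAN = 1


class UnknownEncoding(Exception):
    """Stand-in for the UnknownEncoding exception raised by CodecFactory."""


@dataclass(frozen=True)
class CDREncodingParameters:
    major_version: int
    minor_version: int
    byteorder: int
    char_data: int
    wchar_data: int


@dataclass(frozen=True)
class Codec:
    """Placeholder codec that only remembers its parameters."""
    parameters: CDREncodingParameters


class CodecFactory:
    def get_default_parameters(self, fmt: int) -> CDREncodingParameters:
        """Report the defaults for a format without creating a codec."""
        if fmt != ENCODING_CDR_ENCAPS:
            raise UnknownEncoding(f"no codec for format {fmt}")
        native = (BYTEORDER_LITTLEENDIAN if sys.byteorder == "little"
                  else BYTEORDER_BIGENDIAN)
        # Assumed defaults: CDR 1.2, native byte order, ISO 8859-1, UTF-16.
        return CDREncodingParameters(1, 2, native, 0x00010001, 0x00010109)

    def create_codec_with_parameters(
        self, fmt: int, parameters: CDREncodingParameters
    ) -> Codec:
        """Create a codec from fully specified, input-only parameters."""
        if fmt != ENCODING_CDR_ENCAPS:
            raise UnknownEncoding(f"no codec for format {fmt}")
        return Codec(parameters)


factory = CodecFactory()
defaults = factory.get_default_parameters(ENCODING_CDR_ENCAPS)
wanted = replace(defaults, byteorder=BYTEORDER_BIGENDIAN)
codec = factory.create_codec_with_parameters(ENCODING_CDR_ENCAPS, wanted)
print(codec.parameters)
```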


-Bob

