Issue 4539: UTF-8 and IDL character types in C++ (cxx_revision)

Source: AdNovum Informatik (Mr. Stefan Wengi, nobody)
Nature: Uncategorized Issue
Severity:
Summary:
Implementing support for wchar/wstring, I ran into some potential problems with the UTF-8 encoding for the IDL 'char' type.

Let's suppose we have a C++ server with a native single-byte code set like ISO 8859-1.
The Code Set Conversion specification states that UTF-8 is the fallback code set for 'char'.
-> a client could decide to send characters in UTF-8 encoding.

What happens on the server side with UTF-8 encoded characters that use more than 1 byte and thus don't fit into the single-byte character as specified by the C++ mapping for IDL type 'char'?

Resolution:
Revised Text:
Actions taken:
August 29, 2001: received issue

Discussion:
deferred in June 2011 to the next RTF

End of Annotations:=====

Sender: Stefan.Wengi@AdNovum.CH
Message-ID: <3B8CB405.EB60040B@adnovum.com>
Date: Wed, 29 Aug 2001 11:21:09 +0200
From: Stefan Wengi
Organization: AdNovum Software Inc.
To: issues@omg.org
Subject: UTF-8 and IDL character types in C++

Implementing support for wchar/wstring, I ran into some potential problems with the UTF-8 encoding for the IDL 'char' type.

Let's suppose we have a C++ server with a native single-byte code set like ISO 8859-1.
The Code Set Conversion specification states that UTF-8 is the fallback code set for 'char'.
-> a client could decide to send characters in UTF-8 encoding.

What happens on the server side with UTF-8 encoded characters that use more than 1 byte and thus don't fit into the single-byte character as specified by the C++ mapping for IDL type 'char'?

cheers
Stefan

--
+---------------------------------------------------------------------+
Stefan Wengi            mailto:Stefan.Wengi@adnovum.com
CTO                     dipl. Informatik Ing. ETH
AdNovum Software Inc.
San Mateo, CA 94404                        phone: +1 (650) 525 9322
1400 Fashion Island Boulevard, Suite 309   fax:   +1 (650) 525 9324
+---------------------------------------------------------------------+
AdNovum Informatik AG   http://www.adnovum.ch   phone: +41 (1) 272 6111
Roentgenstrasse 22, CH-8005 Zuerich             fax:   +41 (1) 272 6312
+---------------------------------------------------------------------+

Sender: jon@floorboard.com
Message-ID: <3B8D2141.D6381E8C@floorboard.com>
Date: Wed, 29 Aug 2001 10:07:13 -0700
From: Jonathan Biggar
To: Juergen Boldt
CC: cxx_revision@emerald.omg.org
Subject: Re: issue4539 -- C++ RTF issue
References: <4.3.2.7.2.20010829122405.04bc4c00@emerald.omg.org>

Juergen Boldt wrote:
>
> This is issue # 4539 Stefan Wengi
>
> UTF-8 and IDL character types in C++
>
> Implementing support for wchar/wstring, I ran into some potential
> problems with the UTF-8 encoding for the IDL 'char' type.
>
> Let's suppose we have a C++ server with a native single-byte code set
> like ISO 8859-1.
> The Code Set Conversion specification states that UTF-8 is the fallback
> code set for 'char'.
> -> a client could decide to send characters in UTF-8 encoding.
>
> What happens on the server side with UTF-8 encoded characters that use
> more than 1 byte and thus don't fit into the single byte character as
> specified by the C++ mapping for IDL type 'char'?

I don't think this is really a C++ issue; rather, it is an interop issue, because it applies to more languages than just C++.

Chapter 13.10.2.6 states that if a character can't be properly converted into the fallback codeset then a DATA_CONVERSION exception is raised with minor code 1. That doesn't cover the issue raised here exactly, since it is conversion back from the fallback codeset to the native codeset that is the problem here.
I think that the text in 13.10.2.6 should be modified to cover this case as well, either with the same minor code or with a new one if enough people think a new minor code is called for.

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org
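
Illustration (not part of the thread): the case discussed above can be sketched in plain C++. This is a minimal sketch under stated assumptions, not ORB code and not text from the CORBA specification. The names DataConversionError, decode_utf8 and utf8_to_latin1 are hypothetical; a real ORB would raise its own DATA_CONVERSION system exception, and the thread leaves open which minor code would apply. The sketch relies only on the fact that ISO 8859-1 coincides with code points U+0000 to U+00FF, so any decoded code point above 0xFF cannot be stored in a single-byte 'char'.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the ORB's DATA_CONVERSION system exception.
struct DataConversionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Simplified: overlong forms and surrogate code points are not rejected.
char32_t decode_utf8(const std::string& s, std::size_t& i) {
    auto byte_at = [&](std::size_t k) -> std::uint32_t {
        if (k >= s.size()) throw DataConversionError("truncated UTF-8 sequence");
        return static_cast<unsigned char>(s[k]);
    };
    const std::uint32_t b0 = byte_at(i);
    std::size_t extra = 0;   // number of continuation bytes expected
    std::uint32_t cp = 0;    // code point being assembled
    if (b0 < 0x80)                { extra = 0; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else throw DataConversionError("invalid UTF-8 lead byte");
    for (std::size_t k = 1; k <= extra; ++k) {
        const std::uint32_t b = byte_at(i + k);
        if ((b & 0xC0) != 0x80)
            throw DataConversionError("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (b & 0x3F);
    }
    i += extra + 1;
    return static_cast<char32_t>(cp);
}

// Convert UTF-8 (the fallback code set) to ISO 8859-1 (the native code set
// assumed in the issue). Code points above 0xFF have no single-byte form.
std::string utf8_to_latin1(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const char32_t cp = decode_utf8(utf8, i);
        if (cp > 0xFF)
            throw DataConversionError("character not representable in ISO 8859-1");
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return out;
}
```

With this sketch, the two-byte sequence C3 A9 (U+00E9) converts to the single byte E9, while the three-byte sequence E2 82 AC (U+20AC, the euro sign) throws, which is the situation the issue asks the specification to address.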