Issue 1612: Description of Wide String type (orb_revision)

Source: (, )
Nature: Revision
Severity:
Summary: Null termination is a marshalling and language mapping issue and should not appear as part of the semantic description of the IDL type. It should be completely valid for a language with a native wide string type to handle the varying length nature of the wide string with or without use of null termination and certainly without exposing it to the user. Therefore, the description of the wide string type in 3.8.3 should not mention null termination. Also the syntax should be included.
Resolution:
Revised Text:
Actions taken:
June 29, 1998: received issue
February 23, 1999: closed issue

Discussion:

End of Annotations:=====

Return-Path:
X-Sender: giddiv@gamma
Date: Mon, 29 Jun 1998 16:20:03 -0400
To: issues@omg.org, orb_revision@omg.org
From: Victor Giddings
Subject: Description of Wide String type

Issue:

Null termination is a marshalling and language mapping issue and should not appear as part of the semantic description of the IDL type. It should be completely valid for a language with a native wide string type to handle the varying length nature of the wide string with or without use of null termination and certainly without exposing it to the user. Therefore, the description of the wide string type in 3.8.3 should not mention null termination.

Also the syntax should be included.

Present text:

"The wstring data type represents a null-terminated (note: a wide character null) sequence of wchar. Type wstring is analogous to string, except that its element type is wchar instead of char."

Proposed change:

The wstring data type represents a sequence of wchar, except the wide character null. Type wstring is analogous to string, except that its element type is wchar instead of char. The actual length of a wstring is set at run-time and, if the bounded form is used, must be less than or equal to the bound.
The syntax for defining a wstring is:

<wide_string_type> ::= "wstring" "<" <positive_int_const> ">" | "wstring"

Return-Path:
Sender: jis@fpk.hp.com
Date: Mon, 29 Jun 1998 17:25:25 -0400
From: Jishnu Mukerji
Organization: Hewlett-Packard New Jersey Laboratories
To: Victor Giddings
Cc: orb_revision@omg.org
Subject: Re: Description of Wide String type
References: <199806292020.QAA22521@gamma.ois.com>

Victor Giddings wrote:

> Issue:
>
> Null termination is a marshalling and language mapping issue and should not appear as part of the semantic description of the IDL type. It should be completely valid for a language with a native wide string type to handle the varying length nature of the wide string with or without use of null termination and certainly without exposing it to the user. Therefore, the description of the wide string type in 3.8.3 should not mention null termination.
>
> Also the syntax should be included.
>
> Present text:
> "The wstring data type represents a null-terminated (note: a wide character null) sequence of wchar. Type wstring is analogous to string, except that its element type is wchar instead of char."
>
> Proposed change:
>
> The wstring data type represents a sequence of wchar, except the wide character null. Type wstring is analogous to string, except that its element type is wchar instead of char. The actual length of a wstring is set at run-time and, if the bounded form is used, must be less than or equal to the bound.
>
> The syntax for defining a wstring is:
> <wide_string_type> ::= "wstring" "<" <positive_int_const> ">" | "wstring"

This is a very reasonable proposal. It brings the description of wstring in line with the existing description of string in that section. So unless I hear major objections to this I am going to include this with the proposed resolution above, in the list of issues to vote on that will go out middle of this week. So, if anyone has any major objections to this please let me know, and start a discussion of the objections.

Thanks,

Jishnu.
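
The proposed wording above can be read as two checks on a wstring value: no wide character null in the content, and, for the bounded form, a length no greater than the bound. The following is only a sketch of that reading in Python, not text from any CORBA specification. The names parse_wstring_type and check_wstring are hypothetical, the bound is assumed to be a plain decimal literal rather than a general constant expression, and one Python character is assumed to stand for one wchar.

```python
import re

# Hypothetical helpers illustrating the proposed wstring rules.
# Assumption: the bound is written as a decimal literal, not a constant expression.
_WSTRING_RE = re.compile(r"^\s*wstring\s*(?:<\s*([0-9]+)\s*>)?\s*$")


def parse_wstring_type(decl):
    """Return the bound as an int, or None for the unbounded form."""
    match = _WSTRING_RE.match(decl)
    if match is None:
        raise ValueError("not a wstring type: %r" % decl)
    if match.group(1) is None:
        return None  # plain "wstring": no bound
    bound = int(match.group(1))
    if bound <= 0:
        raise ValueError("bound must be a positive integer")
    return bound


def check_wstring(value, bound=None):
    """Return True if value satisfies the proposed wstring description."""
    if "\x00" in value:
        return False  # the wide character null may not appear in the content
    if bound is not None and len(value) > bound:
        return False  # bounded form: length must be <= bound
    return True
```

For example, check_wstring("Hello", parse_wstring_type("wstring<8>")) accepts the value, while the same value is rejected against "wstring<4>". Neither check says anything about how a language mapping terminates the string, which is the point of the proposal.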

Return-Path:
X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs
Date: Tue, 30 Jun 1998 08:01:44 +1000 (EST)
From: Michi Henning
To: Jishnu Mukerji
cc: Victor Giddings , orb_revision@omg.org
Subject: Re: Description of Wide String type

On Mon, 29 Jun 1998, Jishnu Mukerji wrote:

> > The syntax for defining a wstring is:
> > <wide_string_type> ::= "wstring" "<" <positive_int_const> ">" | "wstring"
>
> This is a very reasonable proposal. It brings the description of wstring in line with the existing description of string in that section. So unless I hear major objections to this I am going to include this with the proposed resolution above, in the list of issues to vote on that will go out middle of this week.

Sounds good, but we need to fix something else:

Wide character and wide string literals are specified exactly like character and string literals. This is rather naive and makes life hard for compiler writers.

Reason: When the tokenizer sees either the opening single quote for a character literal or the opening quote for a string literal, it has no idea whether it should interpret the contents as a narrow or a wide literal. This makes the tokenizer context sensitive.

Suggestions: Change the syntax to require an "L" prefix as in C++, e.g.:

    const wchar WCHAR_CONST = L'X';
    const wstring WSTRING_CONST = L"Hello";

This gives wide character and wide string literals a proper implicit type. C++ introduced the "L" prefix for exactly the same reasons, and I don't see why we can't follow that.

Cheers,

Michi.
--
Michi Henning               +61 7 33654310
DSTC Pty Ltd                +61 7 33654311 (fax)
University of Qld 4072      michi@dstc.edu.au
AUSTRALIA                   http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path:
Date: Mon, 29 Jun 1998 17:58:21 PDT
Sender: Bill Janssen
From: Bill Janssen
To: orb_revision@omg.org, Victor Giddings
Subject: Re: Description of Wide String type
References: <199806292020.QAA22521@gamma.ois.com>

Excerpts from local.omg: 29-Jun-98 Description of Wide String ..
Victor Giddings@ois.com (1144*)

> Null termination is a marshalling and language mapping issue and should not appear as part of the semantic description of the IDL type.

Actually, if you really want to get into it, the whole distinction between "wide" and ? (not-wide?) strings is a language mapping issue. Why don't we just have a single string type in IDL?

Bill

Return-Path:
X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs
Date: Tue, 30 Jun 1998 11:17:14 +1000 (EST)
From: Michi Henning
To: Bill Janssen
cc: orb_revision@omg.org, Victor Giddings
Subject: Re: Description of Wide String type

On Mon, 29 Jun 1998, Bill Janssen wrote:

> Excerpts from local.omg: 29-Jun-98 Description of Wide String .. Victor Giddings@ois.com (1144*)
>
> > Null termination is a marshalling and language mapping issue and should not appear as part of the semantic description of the IDL type.
>
> Actually, if you really want to get into it, the whole distinction between "wide" and ? (not-wide?) strings is a language mapping issue. Why don't we just have a single string type in IDL?

I think because then, all strings would have to be wide strings (just like Java). Unfortunately, making such a change would make a complete mess of existing IDL specifications and existing source code.

I agree with the sentiments about null termination though - IDL is not the place to talk about this.

Cheers,

Michi.
--
Michi Henning               +61 7 33654310
DSTC Pty Ltd                +61 7 33654311 (fax)
University of Qld 4072      michi@dstc.edu.au
AUSTRALIA                   http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path:
Date: Mon, 29 Jun 1998 18:43:39 PDT
Sender: Bill Janssen
From: Bill Janssen
To: Bill Janssen , Michi Henning
Subject: Re: Description of Wide String type
CC: orb_revision@omg.org, Victor Giddings
References:

Excerpts from direct: 29-Jun-98 Re: Description of Wide Str.. Michi Henning@dstc.edu.a (1046*)

> I think because then, all strings would have to be wide strings (just like Java).

I think the right way to think about it is that an IDL string type should have a `language' associated with it. By `language', I mean roughly the concept in Internet RFC 2277, but extended somewhat to cover non-human languages, and anything which specifies regular patterns of strings. For example, you should be able to register the `language' "Java-source", since Java is a well-defined restriction on string patterns. The mapping to any particular language/locale would be whatever is appropriate to support the character set(s) of the language. The mapping to the wire would be to whatever codesets can be negotiated to transmit the character set of the language.

We'd change the declaration of a string to take two parameters, instead of one:

    typedef string<,en-us> US_English_String;

Wide string types would be created automatically by specifying a language which needs wide characters:

    typedef string <,Java-source> Java_String;

Bill

Return-Path:
X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs
Date: Tue, 30 Jun 1998 11:54:52 +1000 (EST)
From: Michi Henning
To: Bill Janssen
cc: orb_revision@omg.org, Victor Giddings
Subject: Re: Description of Wide String type

On Mon, 29 Jun 1998, Bill Janssen wrote:

> I think the right way to think about it is that an IDL string type should have a `language' associated with it. By `language', I mean roughly the concept in Internet RFC 2277, but extended somewhat to cover non-human languages, and anything which specifies regular patterns of strings. For example, you should be able to register the `language' "Java-source", since Java is a well-defined restriction on string patterns. The mapping to any particular language/locale would be whatever is appropriate to support the character set(s) of the language. The mapping to the wire would be to whatever codesets can be negotiated to transmit the character set of the language.
>
> We'd change the declaration of a string to take two parameters, instead of one:
>
>     typedef string<,en-us> US_English_String;
>
> Wide string types would be created automatically by specifying a language which needs wide characters:
>
>     typedef string <,Java-source> Java_String;

Hmmm... Wouldn't that create a combinatorial explosion for language mappings? The C++ mapping then would have to define a mapping for English strings, a mapping for Java strings, a mapping for...

This seems to be getting away from the language independence of IDL, as far as I can see.

Cheers,

Michi.
--
Michi Henning               +61 7 33654310
DSTC Pty Ltd                +61 7 33654311 (fax)
University of Qld 4072      michi@dstc.edu.au
AUSTRALIA                   http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path:
Date: Mon, 29 Jun 1998 19:26:28 PDT
Sender: Bill Janssen
From: Bill Janssen
To: Bill Janssen , Michi Henning
Subject: Re: Description of Wide String type
CC: orb_revision@omg.org, Victor Giddings
References:

Excerpts from direct: 29-Jun-98 Re: Description of Wide Str.. Michi Henning@dstc.edu.a (1565*)

> Hmmm... Wouldn't that create a combinatorial explosion for language mappings? The C++ mapping then would have to define a mapping for English strings, a mapping for Java strings, a mapping for...

There are some interesting problems here, mainly because most programming languages are poorly designed w.r.t. proper support of strings. I'd expect that the C and C++ mapping would simply map each string type to either "char *" or "wchar *", depending on the compiler/runtime/locale settings.

Bill

Return-Path:
From: Mike_Spreitzer.PARC@xerox.com
X-NS-Transport-ID: 0800201FCE5D3932FBA2
Date: Tue, 30 Jun 1998 08:26:07 PDT
Subject: Re: Issue 1612 -- Core revision issue
To: juergen@omg.org
cc: issues@omg.org, orb_revision@omg.org

> description of the wide string type in 3.8.3 should not mention null termination.

But if it is desired to keep null termination an option for some language mappings and/or marshallings, you have to prohibit the character with code 0 from appearing in the "content" of the string.

Return-Path:
Sender: jis@fpk.hp.com
Date: Tue, 30 Jun 1998 13:13:52 -0400
From: Jishnu Mukerji
Organization: Hewlett-Packard New Jersey Laboratories
To: Mike_Spreitzer.PARC@xerox.com
Cc: orb_revision@omg.org
Subject: Re: Issue 1612 -- Core revision issue
References: <98Jun30.082634pdt."56775(4)"@alpha.xerox.com>

Mike_Spreitzer.PARC@xerox.com wrote:

> > description of the wide string type in 3.8.3 should not mention null termination.
>
> But if it is desired to keep null termination an option for some language mappings and/or marshallings, you have to prohibit the character with code 0 from appearing in the "content" of the string.

That is accounted for in the proposed change excerpted below:

> Proposed change:
>
> The wstring data type represents a sequence of wchar, except the wide character null. Type wstring is analogous to string, except that its element type is wchar instead of char. The actual length of a wstring is set at run-time and, if the bounded form is used, must be less than or equal to the bound.
>
> The syntax for defining a wstring is:
> <wide_string_type> ::= "wstring" "<" <positive_int_const> ">" | "wstring"

Jishnu.

Return-Path:
X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs
Date: Wed, 1 Jul 1998 08:03:44 +1000 (EST)
From: Michi Henning
To: Mike_Spreitzer.PARC@xerox.com
cc: juergen@omg.org, issues@omg.org, orb_revision@omg.org
Subject: Re: Issue 1612 -- Core revision issue

On Tue, 30 Jun 1998 Mike_Spreitzer.PARC@xerox.com wrote:

> > description of the wide string type in 3.8.3 should not mention null termination.
>
> But if it is desired to keep null termination an option for some language mappings and/or marshallings, you have to prohibit the character with code 0 from appearing in the "content" of the string.

Agreed.
We already do this for ordinary strings.

Cheers,

Michi.
--
Michi Henning               +61 7 33654310
DSTC Pty Ltd                +61 7 33654311 (fax)
University of Qld 4072      michi@dstc.edu.au
AUSTRALIA                   http://www.dstc.edu.au/BDU/staff/michi-henning.html
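
The point agreed in the last two messages, that code 0 must be excluded from the content if any mapping or marshalling is to remain free to use null termination, can be illustrated with a toy round trip. This is a sketch under stated assumptions only: the format below is a made-up null-terminated layout of 16-bit little-endian units, not the CDR encoding, the function names are hypothetical, and every character is assumed to fit in 16 bits.

```python
import struct


def marshal_null_terminated(text):
    """Encode each character as a 16-bit unit and append a wide null.

    Toy format for illustration; characters above 0xFFFF are not supported.
    """
    units = [ord(ch) for ch in text] + [0]
    return struct.pack("<%dH" % len(units), *units)


def unmarshal_null_terminated(data):
    """Read 16-bit units up to, but not including, the first wide null."""
    chars = []
    for (unit,) in struct.iter_unpack("<H", data):
        if unit == 0:
            break  # the terminator and an embedded null look the same here
        chars.append(chr(unit))
    return "".join(chars)
```

A value without an embedded null survives the round trip unchanged. A value such as "ab\x00cd" comes back as "ab", because the reader cannot tell the embedded null from the terminator. Excluding the wide character null from the content, as the proposed text does, keeps both terminated and non-terminated representations lossless.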