Issue 1067: Wide character and wide string literals (orb_revision)
Source: (, )
Nature: Uncategorized Issue
Severity:
Summary:
Section 3.2.5 on page 3-9, 2nd para says:

  Wide character and wide string literals are specified exactly like character and string literals. All character and string literals, both wide and non-wide, may only be specified (portably) using the characters found in the ISO 8859-1 character set, that is interface names, operation names, type names, etc., will continue to be limited to the ISO 8859-1 character set.

- The first part says that wide character literals and wide string literals are to be specified exactly like character and string literals. This seems to be impossible: if they were exactly the same, there would be no point in having them... At any rate, the sentence seems to imply that I must restrict myself to ISO Latin-1 characters in wide literals.

- The second part then says that wide literals are restricted to 8859-1, but that interface names (etc.) will continue to be limited to 8859-1. Now what is that supposed to mean? Interface names have always (and incorrectly) been limited to 8859-1. Nothing has changed. Am I to infer then that the sentence was meant to suggest that wide literals can actually contain wide characters other than 8859-1?

This paragraph simply doesn't make sense as it stands.

Resolution:
Revised Text:
Actions taken:
March 18, 1998: received issue
February 23, 1999: closed issue
Discussion:
End of Annotations:=====

Date: Wed, 18 Mar 1998 15:09:48 +1000 (EST)
From: Michi Henning
To: issues@omg.org
Subject: Wide character and wide string literals

Hi,

Section 3.2.5 on page 3-9, 2nd para says:

  Wide character and wide string literals are specified exactly like character and string literals.
  All character and string literals, both wide and non-wide, may only be specified (portably) using the characters found in the ISO 8859-1 character set, that is interface names, operation names, type names, etc., will continue to be limited to the ISO 8859-1 character set.

- The first part says that wide character literals and wide string literals are to be specified exactly like character and string literals. This seems to be impossible: if they were exactly the same, there would be no point in having them... At any rate, the sentence seems to imply that I must restrict myself to ISO Latin-1 characters in wide literals.

- The second part then says that wide literals are restricted to 8859-1, but that interface names (etc.) will continue to be limited to 8859-1. Now what is that supposed to mean? Interface names have always (and incorrectly) been limited to 8859-1. Nothing has changed. Am I to infer then that the sentence was meant to suggest that wide literals can actually contain wide characters other than 8859-1?

This paragraph simply doesn't make sense as it stands.

Some more questions:

If wide string literals are exactly the same as non-wide string literals, I need a context-sensitive tokenizer. Why not use the C++ convention of prefixing a wide string literal with L? That way, the tokenizer can tell without context information whether it is supposed to return a wide or non-wide string to the parser. Similarly for wide char literals. The L prefix would make life easier.

There is no mention of universal character names that could be used to portably translate a source file into the IDL source character set. With what is specified right now, I think portability goes down the drain. If an IDL file contains wide chars or strings and is compiled with different compiler implementations, the chances of actually getting the same constants seem quite slim.

Cheers,

Michi.
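To illustrate the tokenizer argument, here is a minimal Python sketch of a simplified IDL lexer. The token names, the regular expressions, and the `tokenize` function are hypothetical and not taken from any IDL compiler; keywords such as `const` are reported as plain identifiers, and escape sequences are limited to a backslash plus one character and are matched but not decoded. With an `L` prefix, the lexer classifies each literal as wide or non-wide from the characters alone, so it never needs to know whether the parser is expecting a `wchar` or a `char` constant.

```python
import re
from typing import List, Tuple

# Hypothetical token kinds for this sketch. Python's `re` tries alternatives
# left to right, so the L-prefixed (wide) forms must come before IDENT.
TOKEN_SPEC = [
    ("WSTRING", r'L"(?:[^"\\\n]|\\.)*"'),    # wide string literal:    L"..."
    ("WCHAR",   r"L'(?:[^'\\\n]|\\.)'"),     # wide character literal: L'x'
    ("STRING",  r'"(?:[^"\\\n]|\\.)*"'),     # string literal:         "..."
    ("CHAR",    r"'(?:[^'\\\n]|\\.)'"),      # character literal:      'x'
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers and keywords, incl. a bare L
    ("NUMBER",  r"[0-9]+"),
    ("PUNCT",   r"[=;:,(){}<>]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split IDL-like source into (kind, text) pairs without any parser feedback."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "SKIP":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for line in ("const wchar C1 = L'X';", 'const wstring S1 = L"Hello";'):
        print(tokenize(line))
```

In this sketch an `L` that is not immediately followed by a quote (as in `L 'X'`) still lexes as an ordinary identifier followed by a non-wide literal, so the prefix only counts when it is directly attached to the quote, as in C++.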
--
Michi Henning                  +61 7 33654310
DSTC Pty Ltd                   +61 7 33654311 (fax)
University of Qld 4072         michi@dstc.edu.au
AUSTRALIA                      http://www.dstc.edu.au/BDU/staff/michi-henning.html

Date: Wed, 1 Jul 1998 12:47:26 +1000 (EST)
From: Michi Henning
To: orb_revision@omg.org
Subject: Wide characters and wide string literals

Hi,

I didn't get any comments on this one, so here it is once more...

In Section 3.2.5, at the end of "Character Literals", the spec says:

  Wide character and wide string literals are specified exactly like character and string literals. All character and string literals, both wide and non-wide, may only be specified (portably) using the characters found in the ISO 8859-1 character set, that is interface names, operation names, type names, etc., will continue to be limited to the ISO 8859-1 character set.

Several problems with this:

1) The section about *character* literals talks about wide *string* literals, but the section about *string* literals says nothing about wide string literals. Character and wide character literals should be discussed in the section on character literals, and string and wide string literals should be discussed in the section on string literals.

2) The last sentence doesn't make sense. It says that literals are limited to ISO 8859-1, but that identifiers are limited to ISO 8859-1. Also, with the identifier change to ASCII letters, digits, and underscores, this no longer applies.

3) If wide character and string literals are exactly like normal literals, the tokenizer becomes context-sensitive. This is a Bad Thing (TM).

Here is the proposal:

Change the last para of "Character Literals" to read:

  Wide character literals have an L prefix, for example:

      const wchar C1 = L'X';

  Attempts to assign a wide character literal to a non-wide character constant or to assign a non-wide character literal to a wide character constant result in a compile-time diagnostic.
  Both wide and non-wide character literals must be specified using characters from the ISO 8859-1 character set.

Add the following text to the end of the "String Literals" section:

  Wide string literals have an L prefix, for example:

      const wstring S1 = L"Hello";

  Attempts to assign a wide string literal to a non-wide string constant or to assign a non-wide string literal to a wide string constant result in a compile-time diagnostic. Both wide and non-wide string literals must be specified using characters from the ISO 8859-1 character set. A wide string literal shall not contain the wide character with value zero.

Cheers,

Michi.
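The semantic rules in the proposal can be sketched as a check that runs after tokenizing. This minimal Python sketch assumes a hypothetical `Literal` record that carries both the literal as written (`spelling`, the text between the quotes) and its value after escape processing (`value`); escape decoding itself is out of scope, and `check_const`, `describe`, and the diagnostic messages are illustrative names, not part of the specification or of any compiler. It also treats ISO 8859-1 as the code points U+0000 to U+00FF, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Literal:
    """Hypothetical literal record produced by an earlier tokenizing stage."""
    wide: bool        # True for L'x' or L"...", False for 'x' or "..."
    is_string: bool   # True for string literals, False for character literals
    spelling: str     # characters between the quotes, exactly as written
    value: str        # characters the literal denotes after escape processing


# Literal kind (wide, is_string) that each constant type accepts in this sketch.
EXPECTED: Dict[str, Tuple[bool, bool]] = {
    "char": (False, False),
    "wchar": (True, False),
    "string": (False, True),
    "wstring": (True, True),
}


def describe(lit: Literal) -> str:
    base = "string" if lit.is_string else "character"
    return "wide " + base if lit.wide else base


def check_const(type_name: str, lit: Literal) -> List[str]:
    """Return diagnostics for `const <type_name> NAME = <lit>;` (empty if valid)."""
    diagnostics: List[str] = []

    # Rule 1: a wide literal may only initialize a wide constant, and vice versa.
    if (lit.wide, lit.is_string) != EXPECTED[type_name]:
        diagnostics.append(
            f"cannot assign a {describe(lit)} literal to a {type_name} constant")

    # Rule 2: the literal must be written using ISO 8859-1 characters only,
    # approximated here as every character having a code point of at most 0xFF.
    for ch in lit.spelling:
        if ord(ch) > 0xFF:
            diagnostics.append(f"character U+{ord(ch):04X} is outside ISO 8859-1")
            break

    # Rule 3: a wide string literal shall not contain the wide character zero.
    if lit.wide and lit.is_string and "\0" in lit.value:
        diagnostics.append("wide string literal contains a wide character with value zero")

    return diagnostics


if __name__ == "__main__":
    # const wstring S1 = L"Hello";  (no diagnostics expected)
    print(check_const("wstring", Literal(True, True, "Hello", "Hello")))
    # const string S2 = L"Hello";   (wide literal assigned to a non-wide constant)
    print(check_const("string", Literal(True, True, "Hello", "Hello")))
```

Keeping `spelling` and `value` separate reflects the two different rules: the ISO 8859-1 restriction concerns how the literal is written, while the zero-character rule concerns what the literal denotes.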