Issue 4113: Null termination of strings (interop) Source: (Mr. Simon C. Nash, ) Nature: Uncategorized Issue Severity: Summary: Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding of strings, includes the following sentence in the first paragraph: "Both the string length and contents include a terminating null." It is not clear from this whether exactly one terminating null is required, or whether more than one null can be included, with the string being terminated by the first null. Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string type string consisting of all possible 8-bit quantities except null"), any additional nulls following the first terminating null cannot be part of the string, and it therefore seems reasonable to ignore them. Proposed Resolution: Change the above sentence in section 15.3.2.7 to: "Both the string length and contents include at least one terminating null." Also make the same change to the corresponding sentence in the third paragraph of section 15.3.2.7 describing GIOP 1.1 wide strings. Resolution: To close with clarification revision Revised Text: Change the first para of 15.3.2.7 from: A string is encoded as an unsigned long indicating the length of the string in octets, followed by the string value in single- or multi-byte form represented as a sequence of octets. Both the string length and contents include a terminating null. to read: A string is encoded as an unsigned long indicating the length of the string in octets, followed by the string value in single- or multi-byte form represented as a sequence of octets. The string contents include a single terminating null character. The string length includes the null character, so an empty string has a length of 1. 
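The revised first paragraph can be illustrated with a small sketch. It is not taken from any ORB; the function names are hypothetical, the offset is assumed to be already aligned on a 4-byte boundary, and rejecting an embedded NUL on receipt is only one possible receiver policy, since the revised text constrains what is sent rather than how a receiver reacts.

```python
import struct


def encode_cdr_string(value: bytes, little_endian: bool = False) -> bytes:
    """Encode an IDL string: unsigned long length, octets, one NUL.

    The length counts the terminating NUL, so b"" encodes with length 1.
    Assumes the caller has already aligned the output on a 4-byte boundary.
    """
    if b"\x00" in value:
        raise ValueError("IDL strings cannot contain NUL")
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, len(value) + 1) + value + b"\x00"


def decode_cdr_string(buf: bytes, offset: int = 0, little_endian: bool = False):
    """Decode a string at offset; return (value, offset after the string).

    Hypothetical strict policy: exactly one NUL, and it must be the last octet.
    """
    fmt = "<I" if little_endian else ">I"
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + 4
    end = start + length
    if length < 1 or end > len(buf):
        raise ValueError("bad string length")
    body = buf[start:end]
    if body[-1] != 0:
        raise ValueError("missing terminating NUL")
    if 0 in body[:-1]:
        raise ValueError("more than one NUL inside the counted length")
    return body[:-1], end
```

Under these assumptions an empty string occupies five octets on the wire (a length of 1 followed by the NUL), and a string padded with extra NULs inside its counted length is rejected rather than silently trimmed.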
Change the third para of 15.3.2.7 from: For GIOP version 1.1, a wide string is encoded as an unsigned long indicating the length of the string in octets or unsigned integers (determined by the transfer syntax for wchar) followed by the individual wide characters. Both the string length and contents include a terminating null. The terminating null character for a wstring is also a wide character. to read: For GIOP version 1.1, a wide string is encoded as an unsigned long indicating the length of the string in octets or unsigned integers (determined by the transfer syntax for wchar) followed by the individual wide characters. The string contents include a single terminating null character. The string length includes the null character. The terminating null character for a wstring is also a wide character. Change the fourth para of 15.3.2.7 from: For GIOP version 1.2, when encoding a wstring, always encode the length as the total number of octets used by the encoded value, regardless of whether the encoding is byte-oriented or not. For GIOP version 1.2 a wstring is not terminated by a NUL character. In particular, in GIOP version 1.2 a length of 0 is legal for wstring. to read: For GIOP version 1.2, when encoding a wstring, always encode the length as the total number of octets used by the encoded value, regardless of whether the encoding is byte-oriented or not. For GIOP version 1.2 a wstring is not terminated by a null character. In particular, in GIOP version 1.2 a length of 0 is legal for wstring. Actions taken: December 8, 2000: received issue October 3, 2001: closed issue Discussion: In the words of an RTF member, extracted from the mail archive on this issue: I don't think any such change is needed. The string length tells me how many bytes are in the string, including the terminating NUL. I would expect that length to give me *exactly* that count. What follows the terminating NUL is either padding, or the next value in the byte stream. 
I see no point in allowing a string to have several terminating NUL characters. What would this improve? However, several RTF vote 1 comments indicated that clarification would help. End of Annotations:===== Date: Fri, 08 Dec 2000 15:50:49 +0000 From: Simon Nash Organization: IBM X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: issues@omg.org CC: interop@omg.org Subject: Null termination of strings Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: k)W!!K[8e9E~-e9)d\d9 Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding of strings, includes the following sentence in the first paragraph: "Both the string length and contents include a terminating null." It is not clear from this whether exactly one terminating null is required, or whether more than one null can be included, with the string being terminated by the first null. Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string type string consisting of all possible 8-bit quantities except null"), any additional nulls following the first terminating null cannot be part of the string, and it therefore seems reasonable to ignore them. Proposed Resolution: Change the above sentence in section 15.3.2.7 to: "Both the string length and contents include at least one terminating null." Also make the same change to the corresponding sentence in the third paragraph of section 15.3.2.7 describing GIOP 1.1 wide strings. Simon -- Simon C Nash, Technology Architect, IBM Java Technology Centre Tel. 
+44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb Date: Thu, 21 Dec 2000 06:01:21 +1000 (EST) From: Michi Henning To: Simon Nash cc: issues@omg.org, interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A310359.13F43A5E@hursley.ibm.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: 4T6e9g]h!!(,%"!OTPe9 On Fri, 8 Dec 2000, Simon Nash wrote: > Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding > of strings, includes the following sentence in the first paragraph: > > "Both the string length and contents include a terminating null." > > It is not clear from this whether exactly one terminating null is required, > or whether more than one null can be included, with the string being terminated > by the first null. > > Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string > type string consisting of all possible 8-bit quantities except null"), any > additional nulls following the first terminating null cannot be part of the > string, and it therefore seems reasonable to ignore them. > > Proposed Resolution: > > Change the above sentence in section 15.3.2.7 to: > > "Both the string length and contents include at least one terminating null." > > Also make the same change to the corresponding sentence in the third paragraph > of section 15.3.2.7 describing GIOP 1.1 wide strings. I don't think any such change is needed. The string length tells me how many bytes are in the string, including the terminating NUL. I would expect that length to give me *exactly* that count. What follows the terminating NUL is either padding, or the next value in the byte stream. I see no point in allowing a string to have several terminating NUL characters. What would this improve? Cheers, Michi. 
-- Michi Henning +61 7 3324 9633 Object Oriented Concepts +61 4 1118 2700 (mobile) Suite 4, 8 Martha St +61 7 3324 9799 (fax) Camp Hill 4152 michi@ooc.com.au Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Wed, 20 Dec 2000 22:46:08 +0000 From: Simon Nash Organization: IBM X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: Michi Henning CC: interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: 94#!!LX@!!*Ak!!p$pd9 Michi, Extra nulls could be used to ensure a specific alignment of data that follows the string. Simon Michi Henning wrote: > > On Fri, 8 Dec 2000, Simon Nash wrote: > > > Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding > > of strings, includes the following sentence in the first paragraph: > > > > "Both the string length and contents include a terminating null." > > > > It is not clear from this whether exactly one terminating null is required, > > or whether more than one null can be included, with the string being terminated > > by the first null. > > > > Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string > > type string consisting of all possible 8-bit quantities except null"), any > > additional nulls following the first terminating null cannot be part of the > > string, and it therefore seems reasonable to ignore them. > > > > Proposed Resolution: > > > > Change the above sentence in section 15.3.2.7 to: > > > > "Both the string length and contents include at least one terminating null." > > > > Also make the same change to the corresponding sentence in the third paragraph > > of section 15.3.2.7 describing GIOP 1.1 wide strings. > > I don't think any such change is needed. The string length tells me how > many bytes are in the string, including the terminating NUL. I would expect > that length to give me *exactly* that count. 
What follows the terminating > NUL is either padding, or the next value in the byte stream. > > I see no point in allowing a string to have several terminating NUL characters. > What would this improve? > > Cheers, > > Michi. > -- > Michi Henning +61 7 3324 9633 > Object Oriented Concepts +61 4 1118 2700 (mobile) > Suite 4, 8 Martha St +61 7 3324 9799 (fax) > Camp Hill 4152 michi@ooc.com.au > Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html -- Simon C Nash, Technology Architect, IBM Java Technology Centre Tel. +44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb Date: Thu, 21 Dec 2000 09:00:12 +1000 (EST) From: Michi Henning To: Simon Nash cc: interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A4136B0.F59E6743@hursley.ibm.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: ,4(e9e>jd9$6fd9IQ5!! On Wed, 20 Dec 2000, Simon Nash wrote: > Michi, > Extra nulls could be used to ensure a specific alignment of data > that > follows the string. Sure. But those bytes are *not* part of the string. Instead, they are padding, and the contents of those padding bytes need not be defined. I see your point: for example, if a string ends at some byte boundary and then I need three bytes of padding to the start of the next value, I could use: string length | string value | padding ----------------------------------------------- 2 | "c\0" | "bbb" 3 | "c\0\0" | "bb" 4 | "c\0\0\0" | "b" 5 | "c\0\0\0\0" | none [ "c" means an arbitrary character in a string, "b" means a padding byte with undefined value. ] All of these get me to the next value boundary. However, I would argue that the last three options are non-sensical. 
For example, during unmarshaling, I read the length value and allocate that many bytes to hold the string, and then I skip the appropriate number of bytes in the input stream to get to the next value boundary. This works fine for the first case above, but is wasteful for the other three cases because I end up allocating more than the necessary number of bytes for the string. Personally, I am in favour of making this sort of trickery illegal and to update the spec (if it doesn't say that already) that the string length must be exactly the number of bytes in the string plus one extra byte for the single terminating NUL. Cheers, Michi. -- Michi Henning +61 7 3324 9633 Object Oriented Concepts +61 4 1118 2700 (mobile) Suite 4, 8 Martha St +61 7 3324 9799 (fax) Camp Hill 4152 michi@ooc.com.au Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Wed, 20 Dec 2000 18:45:24 -0500 From: Paul Kyzivat X-Mailer: Mozilla 4.73 [en]C-CCK-MCD (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 To: Michi Henning CC: Simon Nash , interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: 1KKe9NK*e9\:j!!)iNd9 I agree with Michi. Not all languages use a null terminated internal representation. Permitting this sort of trickery would mean that in all of those languages (including Java) an unmarshaller would have to check for extra trailing nulls and remove them. Ugh! Paul (the lurker) Michi Henning wrote: > > On Wed, 20 Dec 2000, Simon Nash wrote: > > > Michi, > > Extra nulls could be used to ensure a specific alignment of data that > > follows the string. > > Sure. But those bytes are *not* part of the string. Instead, they are padding, > and the contents of those padding bytes need not be defined. 
> > I see your point: for example, if a string ends at some byte boundary > and then I need three bytes of padding to the start of the next value, I > could use: > > string length | string value | padding > ----------------------------------------------- > 2 | "c\0" | "bbb" > 3 | "c\0\0" | "bb" > 4 | "c\0\0\0" | "b" > 5 | "c\0\0\0\0" | none > > [ "c" means an arbitrary character in a string, "b" means a padding > byte with undefined value. ] > > All of these get me to the next value boundary. However, I would argue > that the last three options are non-sensical. For example, during unmarshaling, > I read the length value and allocate that many bytes to hold the string, > and then I skip the appropriate number of bytes in the input stream to > get to the next value boundary. This works fine for the first case above, > but is wasteful for the other three cases because I end up allocating > more than the necessary number of bytes for the string. > > Personally, I am in favour of making this sort of trickery illegal and > to update the spec (if it doesn't say that already) that the string > length must be exactly the number of bytes in the string plus one extra > byte for the single terminating NUL. > > Cheers, > > Michi. > -- > Michi Henning +61 7 3324 9633 > Object Oriented Concepts +61 4 1118 2700 (mobile) > Suite 4, 8 Martha St +61 7 3324 9799 (fax) > Camp Hill 4152 michi@ooc.com.au > Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Thu, 21 Dec 2000 19:00:35 +0000 From: Simon Nash Organization: IBM X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: Michi Henning CC: interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: %%gd9Q!M!!kmQ!!V$jd9 Michi, I know that CDR leaves padding if necessitated by the alignment of the next item. 
But this could be used to force the next item to a stronger alignment boundary than usual. For example, if the next item were a 4-byte aligned item, adding extra nulls on the end of the string could force it to an 8-byte boundary. In the abstract this does not seem very useful. However, I am aware of an ORB product that does this when building GIOP requests in order to ensure that the data always starts on an 8-byte boundary. It does this by padding the message name string within the request header. I would like a ruling from the RTF on whether this technique is legal. Simon Michi Henning wrote: > > On Wed, 20 Dec 2000, Simon Nash wrote: > > > Michi, > > Extra nulls could be used to ensure a specific alignment of data that > > follows the string. > > Sure. But those bytes are *not* part of the string. Instead, they are padding, > and the contents of those padding bytes need not be defined. > > I see your point: for example, if a string ends at some byte boundary > and then I need three bytes of padding to the start of the next value, I > could use: > > string length | string value | padding > ----------------------------------------------- > 2 | "c\0" | "bbb" > 3 | "c\0\0" | "bb" > 4 | "c\0\0\0" | "b" > 5 | "c\0\0\0\0" | none > > [ "c" means an arbitrary character in a string, "b" means a padding > byte with undefined value. ] > > All of these get me to the next value boundary. However, I would argue > that the last three options are non-sensical. For example, during unmarshaling, > I read the length value and allocate that many bytes to hold the string, > and then I skip the appropriate number of bytes in the input stream to > get to the next value boundary. This works fine for the first case above, > but is wasteful for the other three cases because I end up allocating > more than the necessary number of bytes for the string. 
> > Personally, I am in favour of making this sort of trickery illegal and > to update the spec (if it doesn't say that already) that the string > length must be exactly the number of bytes in the string plus one extra > byte for the single terminating NUL. > > Cheers, > > Michi. > -- > Michi Henning +61 7 3324 9633 > Object Oriented Concepts +61 4 1118 2700 (mobile) > Suite 4, 8 Martha St +61 7 3324 9799 (fax) > Camp Hill 4152 michi@ooc.com.au > Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html -- Simon C Nash, Technology Architect, IBM Java Technology Centre Tel. +44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb Date: Fri, 22 Dec 2000 05:40:44 +1000 (EST) From: Michi Henning To: Simon Nash cc: interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A425353.1D0C09FB@hursley.ibm.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: G8 Michi, > I know that CDR leaves padding if necessitated by the alignment of > the > next item. But this could be used to force the next item to a > stronger > alignment boundary than usual. For example, if the next item were a > 4-byte aligned item, adding extra nulls on the end of the string > could > force it to an 8-byte boundary. > > In the abstract this does not seem very useful. However, I am aware > of > an ORB product that does this when building GIOP requests in order > to > ensure that the data always starts on an 8-byte boundary. It does > this > by padding the message name string within the request header. I > would > like a ruling from the RTF on whether this technique is legal. Hmmm... The relevant words in the spec are: A string is encoded as an unsigned long indicating the length of the string in octets, followed by the string value in single- or multi-byte form represented as a sequence of octets. 
Both the string length and contents include a terminating null. Note that this requires "a" terminating null, which I would interpret to mean a *single* terminating null. [ As an aside, the use of the term "null" here is wrong. It should be "NUL", which is the official name of the ASCII character whose value is zero. We should probably clean this up. ] Overall, I would be inclined to rule the alignment technique you describe as non-compliant. Cheers, Michi. -- Michi Henning +61 7 3324 9633 Object Oriented Concepts +61 4 1118 2700 (mobile) Suite 4, 8 Martha St +61 7 3324 9799 (fax) Camp Hill 4152 michi@ooc.com.au Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Fri, 22 Dec 2000 10:16:28 +0000 From: Simon Nash Organization: IBM X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: Michi Henning CC: interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: ZIQe9E%H!!,Qhd9Om]d9 Michi, The alternative interpretation of these spec words is that the first null (NUL) in the string acts as the terminator. Other data following the terminating null (NUL) is not part of the string contents. This would be consistent with how terminating nulls (NULs) work in C and C++. Simon Michi Henning wrote: > > On Thu, 21 Dec 2000, Simon Nash wrote: > > > Michi, > > I know that CDR leaves padding if necessitated by the alignment of the > > next item. But this could be used to force the next item to a stronger > > alignment boundary than usual. For example, if the next item were a > > 4-byte aligned item, adding extra nulls on the end of the string could > > force it to an 8-byte boundary. > > > > In the abstract this does not seem very useful. However, I am aware of > > an ORB product that does this when building GIOP requests in order to > > ensure that the data always starts on an 8-byte boundary. 
It does this > > by padding the message name string within the request header. I would > > like a ruling from the RTF on whether this technique is legal. > > Hmmm... The relevant words in the spec are: > > A string is encoded as an unsigned long indicating the length of > the string in octets, followed by the string value in single- or > multi-byte form represented as a sequence of octets. Both the > string length and contents include a terminating null. > > Note that this requires "a" terminating null, which I would interpret to > mean a *single* terminating null. > > [ As an aside, the use of the term "null" here is wrong. It should be "NUL", > which is the official name of the ASCII character whose value is zero. > We should probably clean this up. ] > > Overall, I would be inclined to rule the alignment technique you describe > as non-compliant. > > Cheers, > > Michi. > -- > Michi Henning +61 7 3324 9633 > Object Oriented Concepts +61 4 1118 2700 (mobile) > Suite 4, 8 Martha St +61 7 3324 9799 (fax) > Camp Hill 4152 michi@ooc.com.au > Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html -- Simon C Nash, Technology Architect, IBM Java Technology Centre Tel. +44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb X-Sent: 22 Dec 2000 10:25:39 GMT From: "Nick Sharman" To: "Simon Nash" , "Michi Henning" Cc: Subject: RE: Null termination of strings Date: Fri, 22 Dec 2000 10:29:45 -0000 Message-ID: MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) In-Reply-To: <3A425353.1D0C09FB@hursley.ibm.com> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Content-Type: text/plain; charset="us-ascii" X-UIDL: fBRd9$MS!!6A9e9P[nd9 Simon, > Michi, > I know that CDR leaves padding if necessitated by the alignment of > the > next item. 
But this could be used to force the next item to a > stronger > alignment boundary than usual. For example, if the next item were a > 4-byte aligned item, adding extra nulls on the end of the string > could > force it to an 8-byte boundary. > > In the abstract this does not seem very useful. However, I am aware > of > an ORB product that does this when building GIOP requests in order > to > ensure that the data always starts on an 8-byte boundary. It does > this > by padding the message name string within the request header. I > would > like a ruling from the RTF on whether this technique is legal. > > Simon There's no problem with GIOP 1.2 requests (or replies), as data is required to be 8-byte aligned. You don't need to alter the length of the operation name, just increment the output buffer pointer to the next multiple of 8. For 1.0 & 1.1 requests, the last thing before the data is not the operation name; it's the principal. This is an otherwise-unused octet sequence. Its length is an unsigned long, which takes you to a 4-byte boundary. Just output 0 or 4 bytes of arbitrary data as the content, as necessary to take you to the next 8-byte boundary. The data in 1.0 & 1.1 replies is always 4-byte aligned, since the last header field is an enum value. If you want 8-byte alignment, allocate a vendor service context tag to be used only for padding, to be marshalled at the end of the SC list, and choose an appropriate length, 0 or 4, to finish on an 8-byte boundary (the rest of the header is 8 bytes, so you still end up 8-byte aligned). 
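The arithmetic behind the 1.0 & 1.1 request suggestion can be sketched as follows. This is an illustration only: the helper names are hypothetical, and offsets are assumed to be counted from the start of the GIOP message, which is the origin CDR alignment is measured from.

```python
def padding_to(offset: int, boundary: int) -> int:
    """Number of padding octets needed to move offset up to a multiple of boundary."""
    return -offset % boundary


def principal_filler_length(offset_after_operation: int) -> int:
    """Return 0 or 4: the principal content size that leaves the body 8-byte aligned.

    offset_after_operation is the offset just past the operation string's
    single terminating NUL. The principal's unsigned long length is aligned
    on 4 bytes, so the content that follows starts on a multiple of 4.
    """
    length_field = offset_after_operation + padding_to(offset_after_operation, 4)
    content_start = length_field + 4
    return padding_to(content_start, 8)
```

Because the content always starts on a multiple of 4, the result can only be 0 or 4, which matches the "0 or 4 bytes of arbitrary data" above. The operation string keeps its exact length and the filler lives in a field whose contents are otherwise unused.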
Regards Nick Date: Sat, 23 Dec 2000 05:00:17 +1000 (EST) From: Michi Henning To: Simon Nash cc: interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A4329FC.5C825496@hursley.ibm.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: hAod9a#5e9>R-e9?X Michi, > The alternative interpretation of these spec words is that the first > null > (NUL) in the string acts as the terminator. Other data following > the > terminating null (NUL) is not part of the string contents. This > would > be consistent with how terminating nulls (NULs) work in C and C++. Hmmm... I really don't like this, for the reasons Paul and I outlined. By claiming that the string is longer than it actually is in its length field and then adding additional NUL bytes, I make unmarshaling more wasteful. In addition, for languages that do not use the concept of NUL termination and instead represent a string as a byte array and count, the unmarshaler would have to scan every received string from its tail to strip off redundant NUL bytes and then adjust the length count accordingly. I honestly see no gain by allowing the additional NUL bytes, but I see disadvantages in complexity. I quite strongly feel that the implementation you mention should be ruled non-compliant. Cheers, Michi. -- Michi Henning +61 7 3324 9633 Object Oriented Concepts +61 4 1118 2700 (mobile) Suite 4, 8 Martha St +61 7 3324 9799 (fax) Camp Hill 4152 michi@ooc.com.au Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Fri, 22 Dec 2000 13:12:53 -0800 From: Everett Anderson X-Mailer: Mozilla 4.73 [en] (Windows NT 5.0; U) X-Accept-Language: en,pdf,ja MIME-Version: 1.0 To: Michi Henning CC: interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: #li!!ECG!!K?Be9@~J!! 
> I honestly see no gain by allowing the additional NUL bytes, but I see > disadvantages in complexity. I quite strongly feel that the implementation > you mention should be ruled non-compliant. I tend to agree, though I guess my objection is based mainly on the ugly Java implementation. With respect to padding and fragmentation in GIOP 1.1, it seems like there were mistakes that were corrected in GIOP 1.2, and string shouldn't have to pay the price forever. From: "Rutt, T E (Tom)" To: Simon Nash , Michi Henning , "'Nick Sharman'" Cc: interop@omg.org Subject: RE: Null termination of strings Date: Fri, 5 Jan 2001 14:16:26 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain X-UIDL: /U]d9V/5e9^$jd9lK Michi, > I know that CDR leaves padding if necessitated by the > alignment of the > next item. But this could be used to force the next item to > a stronger > alignment boundary than usual. For example, if the next > item were a > 4-byte aligned item, adding extra nulls on the end of the > string could > force it to an 8-byte boundary. > > In the abstract this does not seem very useful. However, I > am aware of > an ORB product that does this when building GIOP requests in > order to > ensure that the data always starts on an 8-byte boundary. > It does this > by padding the message name string within the request > header. I would > like a ruling from the RTF on whether this technique is > legal. > > Simon There's no problem with GIOP 1.2 requests (or replies), as data is required to be 8-byte aligned. You don't need to alter the length of the operation name, just increment the output buffer pointer to the next multiple of 8. For 1.0 & 1.1 requests, the last thing before the data is not the operation name; it's the principal. This is an otherwise-unused octet sequence. Its length is an unsigned long, which takes you to a 4-byte boundary. 
Just output 0 or 4 bytes of arbitrary data as the content, as necessary to take you to the next 8-byte boundary. The data in 1.0 & 1.1 replies is always 4-byte aligned, since the last header field is an enum value. If you want 8-byte alignment, allocate a vendor service context tag to be used only for padding, to be marshalled at the end of the SC list, and choose an appropriate length, 0 or 4, to finish on an 8-byte boundary (the rest of the header is 8 bytes, so you still end up 8-byte aligned). Regards Nick From: Jeffrey Mischkinsky Message-Id: <200101051947.LAA03471@wheel.dcn.davis.ca.us> Subject: Re: Null termination of strings To: terutt@lucent.com ("Rutt, T E (Tom)") Date: Fri, 5 Jan 2001 11:47:33 -0800 (PST) Cc: nash@hursley.ibm.com (Simon Nash), michi@ooc.com.au (Michi Henning), nick.sharman@cp.net ('Nick Sharman'), interop@omg.org In-Reply-To: <4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com> from "Rutt, T E (Tom)" at Jan 05, 2001 02:16:26 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: ^?/e9DQ&!!$^&e9Kccd9 '"Rutt, T E (Tom)"' writes: > > I think that Null terminated is interpreted as One null character. I agree > > I do not like the idea of extra nulls in the string itself. OTOH I don't think that we specify the bit pattern of padding bytes. Feel free to use a 0, 377, 255, etc. But padding bytes are not part of the item that precedes them. jeff > > > ter > > ---------- > From: Nick Sharman [SMTP:nick.sharman@cp.net] > Sent: Friday, December 22, 2000 5:30 AM > To: Simon Nash; Michi Henning > Cc: interop@omg.org > Subject: RE: Null termination of strings > > Simon, > > > > Michi, > > I know that CDR leaves padding if necessitated by the alignment of > the > > next item. But this could be used to force the next item to a > stronger > > alignment boundary than usual. 
For example, if the next item were > a > > 4-byte aligned item, adding extra nulls on the end of the string > could > > force it to an 8-byte boundary. > > > > In the abstract this does not seem very useful. However, I am > aware of > > an ORB product that does this when building GIOP requests in order > to > > ensure that the data always starts on an 8-byte boundary. It does > this > > by padding the message name string within the request header. I > would > > like a ruling from the RTF on whether this technique is legal. > > > > Simon > > There's no problem with GIOP 1.2 requests (or replies), as data is > required > to be 8-byte aligned. You don't need to alter the length of the > operation > name, just increment the output buffer pointer to the next multiple > of 8. > > For 1.0 & 1.1 requests, the last thing before the data is not the > operation > name; it's the principal. This is an otherwise-unused octet > sequence. Its > length is an unsigned long, which takes you to a 4-byte boundary. > Just > output 0 or 4 bytes of arbitrary data as the content, as necessary > to take > uou to the next 8-byte boundary. > > The data in 1.0 & 1.1 replies is always 4-byte aligned, since the > last > header field is an enum value. If you want 8-byte alignment, > allocate a > vendor service context tag to be used only for padding, to be > marshalled at > the end of the SC list, and choose an appropriate length, 0 or 4, to > finish > on an 8-byte boundary (the rest of the header is 8 bytes, so you > still end > up 8-byte aligned). 
> > Regards > Nick > > > > > > -- Jeff Mischkinsky jmischki@dcn.davis.ca.us +1 530-758-9850 jeff@persistence.com +1 650-372-3604 Date: Sun, 07 Jan 2001 14:44:03 +0100 From: Marcus Wittig X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Rutt, T E (Tom)" CC: Simon Nash , Michi Henning , "'Nick Sharman'" , interop@omg.org Subject: Re: Null termination of strings References: <4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=iso-8859-1 X-UIDL: 2Ag!!Ld+e9aH(e9ocNe9 "'Nick Sharman'" wrote: > > The data in 1.0 & 1.1 replies is always 4-byte aligned, >since the > last > header field is an enum value. If you want 8-byte >alignment, > allocate a > vendor service context tag to be used only for padding, to >be > marshalled at > the end of the SC list, and choose an appropriate length, 0 >or 4, to > finish > on an 8-byte boundary (the rest of the header is 8 bytes, so >you > still end > up 8-byte aligned). Unfortunately, this won't always work as expected due to a weakness of the Core standard up to version CORBA 2.3.1. The problem is that the core spec defines the following rule about what an ORB should do with a service context which is not in the OMG-defined range: "The receiving ORB may choose to ignore it, process it if it understands it, or raise a system exception, however it must be passed on through a bridge. If a system exception is raised, it shall be BAD_PARAM with an OMG standard" (see chapter 13.6, CORBA 2.3.1) So, the target ORB may throw a system exception if it comes across an "unknown" service context. Too bad! This problem has been recognized and the new CORBA 2.4 spec has changed this rule by removing the second sentence. Fine, but in practice it is of little value as vendors have to deal with legacy ORB products for a long time. There is at least one ORB product I know about which throws a system exception if it receives an "unknown" service context. 

Kind Regards
Marcus Wittig

Date: Mon, 08 Jan 2001 13:58:55 +0000
From: Simon Nash
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: "Rutt, T E (Tom)"
CC: Michi Henning, "'Nick Sharman'", interop@omg.org
Subject: Re: Null termination of strings
References: <4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: eWkd9njD!!o!De9bFM!!

Tom,
OK, there is pretty strong consensus that extra NUL terminators should
not be allowed as part of the string. So what should a compliant ORB
do when it receives a string or wstring that contains these extra NULs?
It could:
 a. strip off the last NUL (except for GIOP 1.2 wstring, where there is
    no terminating NUL), and treat the others as part of the string or
    wstring. This might be natural for a Java ORB.
 b. terminate the string or wstring at the first NUL (except for GIOP 1.2
    wstring, where there is no terminating NUL). This might be natural
    for a C++ ORB.
 c. raise a MARSHAL exception.
 d. fail in some other way.
Are all these legal, or should GIOP specify which of these is correct?

A related issue is whether or not NUL characters can be embedded in
GIOP 1.2 wstrings. These do not use NUL as a terminator, so from reading
chapter 15 this would appear to be OK. However, section 3.10.4.3 says
that wstrings cannot include the wide character null. Is this a hangover
from pre-GIOP 1.2 days, when wstrings were NUL-terminated, or is there
still a good reason for this limitation?

This came up recently in the context of the Java to IDL mapping. A Java
string (which is mapped to an IDL wstring) contained an embedded NUL,
and one of our products did not handle this correctly. If we fix our
product to put a NUL wide character on the wire in this case, will this
be compliant with GIOP 1.2?

 Simon

"Rutt, T E (Tom)" wrote:
>
> I think that Null terminated is interpreted as One null character.
> > I do not like the idea of extra nulls in the string itself. > > ter > > ---------- > From: Nick Sharman [SMTP:nick.sharman@cp.net] > Sent: Friday, December 22, 2000 5:30 AM > To: Simon Nash; Michi Henning > Cc: interop@omg.org > Subject: RE: Null termination of strings > > Simon, > > > Michi, > > I know that CDR leaves padding if necessitated by the alignment of > the > > next item. But this could be used to force the next item to a > stronger > > alignment boundary than usual. For example, if the next item were > a > > 4-byte aligned item, adding extra nulls on the end of the string > could > > force it to an 8-byte boundary. > > > > In the abstract this does not seem very useful. However, I am > aware of > > an ORB product that does this when building GIOP requests in order > to > > ensure that the data always starts on an 8-byte boundary. It does > this > > by padding the message name string within the request header. I > would > > like a ruling from the RTF on whether this technique is legal. > > > > Simon > > There's no problem with GIOP 1.2 requests (or replies), as data is > required > to be 8-byte aligned. You don't need to alter the length of the > operation > name, just increment the output buffer pointer to the next multiple > of 8. > > For 1.0 & 1.1 requests, the last thing before the data is not the > operation > name; it's the principal. This is an otherwise-unused octet > sequence. Its > length is an unsigned long, which takes you to a 4-byte boundary. > Just > output 0 or 4 bytes of arbitrary data as the content, as necessary > to take > uou to the next 8-byte boundary. > > The data in 1.0 & 1.1 replies is always 4-byte aligned, since the > last > header field is an enum value. 
If you want 8-byte alignment, > allocate a > vendor service context tag to be used only for padding, to be > marshalled at > the end of the SC list, and choose an appropriate length, 0 or 4, to > finish > on an 8-byte boundary (the rest of the header is 8 bytes, so you > still end > up 8-byte aligned). > > Regards > Nick > > -- Simon C Nash, Technology Architect, IBM Java Technology Centre Tel. +44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb Date: Mon, 08 Jan 2001 09:57:28 -0500 From: Paul Kyzivat X-Mailer: Mozilla 4.73 [en]C-CCK-MCD (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 To: interop@omg.org Subject: Re: Null termination of strings References: <4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com> <3A59C79F.162CC8D1@hursley.ibm.com> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: !jXd9DIFe9c>N!!jffd9 Simon Nash wrote: > > Tom, > OK, there is pretty strong consensus that extra NUL terminators should > not be allowed as part of the string. So what should a compliant ORB > do when it receives a string or wstring that contains these extra NULs? > It could: > a. strip off the last NUL (except for GIOP 1.2 wstring, where there is > no teminating NUL), and treat the others as part of the string or > wstring. This might be natural for a Java ORB. > b. terminate the string or wstring at the first NUL (except for GIOP 1.2 > wstring, where there is no teminating NUL). This might be natural > for a C++ ORB. > c. raise a MARSHAL exception. > d. fail in some other way. > Are all these legal, or should GIOP specify which of these is correct? I think at least (a) and (c) should be legal implementations. > > A related issue is whether or not NUL characters can be embedded in > GIOP 1.2 wstrings. These do not use NUL as a terminator, so from > reading > chapter 15 this would appear to be OK. 
> However, section 3.10.4.3 says that wstrings cannot include the wide
> character null. Is this a hangover from pre-GIOP 1.2 days, when wstrings
> were NUL-terminated, or is there still a good reason for this limitation?

Well, there are two obvious reasons why the restriction should still
remain:

1) It is hard to ensure that your string will only be conveyed via
   GIOP 1.2. You may send it that way, but it may be passed on by the
   recipient to somebody else using an earlier version of GIOP.

2) Embedded nulls in strings don't work for either the C or C++
   language mappings.

The first of these is a more valid reason in my mind. This behavior by
C & C++ has always been a mistake in my mind. If this restriction is
ever lifted in Interop, then the C & C++ language mappings will need
to be amended to specify what happens if an embedded null is received.

> This came up recently in the context of the Java to IDL mapping. A Java
> string (which is mapped to an IDL wstring) contained an embedded NUL,
> and one of our products did not handle this correctly. If we fix our
> product to put a NUL wide character on the wire in this case, will this
> be compliant with GIOP 1.2?

I don't think so.
Paul Date: Mon, 8 Jan 2001 18:17:20 +0100 (MET) Message-Id: <200101081717.SAA07392@pandora.informatik.hu-berlin.de> X-Authentication-Warning: pandora.informatik.hu-berlin.de: loewis set sender to loewis@informatik.hu-berlin.de using -f From: Martin von Loewis To: nash@hursley.ibm.com CC: terutt@lucent.com, michi@ooc.com.au, nick.sharman@cp.net, interop@omg.org In-reply-to: <3A59C79F.162CC8D1@hursley.ibm.com> (message from Simon Nash on Mon, 08 Jan 2001 13:58:55 +0000) Subject: Re: Null termination of strings References: <4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com> <3A59C79F.162CC8D1@hursley.ibm.com> User-Agent: SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) Emacs/20.6 (sparc-sun-solaris2.6) MULE/4.0 (HANANOEN) MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII X-UIDL: W'9!!O4ad9&P`d9\41!! > OK, there is pretty strong consensus that extra NUL terminators should > not be allowed as part of the string. So what should a compliant ORB > do when it receives a string or wstring that contains these extra NULs? > It could: > a. strip off the last NUL (except for GIOP 1.2 wstring, where there is > no teminating NUL), and treat the others as part of the string or > wstring. This might be natural for a Java ORB. > b. terminate the string or wstring at the first NUL (except for GIOP 1.2 > wstring, where there is no teminating NUL). This might be natural > for a C++ ORB. > c. raise a MARSHAL exception. > d. fail in some other way. > Are all these legal, or should GIOP specify which of these is > correct? The message being received is ill-formed, so an ORB doing c) is clearly behaving properly. I think an ORB should not be required to detect all possible message errors. If it doesn't detect this error, further behaviour is unspecified. The ORB may do any of these, and many more, including e. Turn your coffee machine off > A related issue is whether or not NUL characters can be embedded in > GIOP 1.2 wstrings. 
Not sure what you mean by NUL character here. A wide string is encoded as a sequence of octets, some of which may have an all-bits-zero octet. That, in general, is different from a NUL wide character, whose wire representation depends on the coded character set. > These do not use NUL as a terminator, so from reading chapter 15 > this would appear to be OK. However, section 3.10.4.3 says that > wstrings cannot include the wide character null. Is this a hangover > from pre-GIOP 1.2 days, when wstrings were NUL-terminated, or is > there still a good reason for this limitation? I think Paul is right that the C and C++ mappings currently cannot represent wide character strings containing (wint_t)0, and that this is a good reason to disallow such strings in CORBA. > This came up recently in the context of the Java to IDL mapping. A Java > string (which is mapped to an IDL wstring) contained an embedded NUL, > and one of our products did not handle this correctly. If we fix our > product to put a NUL wide character on the wire in this case, will this > be compliant with GIOP 1.2? No, that string would still contain a NUL wide character, which is not supported. Of course, if the product really failed for a null octet in the wstring (which frequently happens for UCS-2), then the product would be broken; adding a NUL wide character would also render the other product non-compliant. Regards, Martin Date: Thu, 18 Jan 2001 10:08:12 +1000 (EST) From: Michi Henning To: Paul Kyzivat cc: interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A59D557.9F326EC0@cisco.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: lKR!!R2bd9[&I!!^;cd9 On Mon, 8 Jan 2001, Paul Kyzivat wrote: > Simon Nash wrote: > > > > Tom, > > OK, there is pretty strong consensus that extra NUL terminators should > > not be allowed as part of the string. 
So what should a compliant ORB > > do when it receives a string or wstring that contains these extra NULs? > > It could: > > a. strip off the last NUL (except for GIOP 1.2 wstring, where there is > > no teminating NUL), and treat the others as part of the string or > > wstring. This might be natural for a Java ORB. > > b. terminate the string or wstring at the first NUL (except for GIOP 1.2 > > wstring, where there is no teminating NUL). This might be natural > > for a C++ ORB. > > c. raise a MARSHAL exception. > > d. fail in some other way. > > Are all these legal, or should GIOP specify which of these is correct? > > I think at least (a) and (c) should be legal implementations. I can't say I like that much. That's because, if (a) and (c) are both legal, we will end up with the situation where one ORB consistently rejects a request, and a different ORB consistently accepts the same request, and both can claim to be compliant. That's not a good idea... I'd prefer to require a MARSHAL exception. Cheers, Michi. -- Michi Henning +61 7 3324 9633 Object Oriented Concepts +61 4 1118 2700 (mobile) Suite 4, 8 Martha St +61 7 3324 9799 (fax) Camp Hill 4152 michi@ooc.com.au Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Wed, 17 Jan 2001 19:27:55 -0500 From: Paul Kyzivat X-Mailer: Mozilla 4.73 [en]C-CCK-MCD (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 CC: interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: F)S!!; I'd prefer to require a MARSHAL exception. My comment was predicated on the assumption that sending multiple NULs is illegal. So this is simply discussing what an orb is permitted to do when confronted with an invalid encoding on the wire. I don't think a conforming implementation should be required to catch this. It is a needless expense. 
This then becomes a quality of implementation issue, and permits orbs to decide whether to emphasize speed, or the ability to diagnose a defective orb at the other end. Making it illegal to send the NUL incurs no extra cost for C++ orbs because the null is already the end delimiter on strings. I suppose it is an extra cost in Java because it is easy to get a NUL into a string. But it seems better to sometimes pay the cost on one end than it is to always pay the cost on both ends. Paul Michi Henning wrote: > > On Mon, 8 Jan 2001, Paul Kyzivat wrote: > > > Simon Nash wrote: > > > > > > Tom, > > > OK, there is pretty strong consensus that extra NUL terminators should > > > not be allowed as part of the string. So what should a compliant ORB > > > do when it receives a string or wstring that contains these extra NULs? > > > It could: > > > a. strip off the last NUL (except for GIOP 1.2 wstring, where there is > > > no teminating NUL), and treat the others as part of the string or > > > wstring. This might be natural for a Java ORB. > > > b. terminate the string or wstring at the first NUL (except for GIOP 1.2 > > > wstring, where there is no teminating NUL). This might be natural > > > for a C++ ORB. > > > c. raise a MARSHAL exception. > > > d. fail in some other way. > > > Are all these legal, or should GIOP specify which of these is correct? > > > > I think at least (a) and (c) should be legal implementations. > > I can't say I like that much. That's because, if (a) and (c) are both legal, > we will end up with the situation where one ORB consistently rejects a > request, and a different ORB consistently accepts the same request, and > both can claim to be compliant. That's not a good idea... > > I'd prefer to require a MARSHAL exception. > > Cheers, > > Michi. 
> -- > Michi Henning +61 7 3324 9633 > Object Oriented Concepts +61 4 1118 2700 (mobile) > Suite 4, 8 Martha St +61 7 3324 9799 (fax) > Camp Hill 4152 michi@ooc.com.au > Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html Date: Thu, 18 Jan 2001 11:07:23 +1000 (EST) From: Michi Henning To: Paul Kyzivat cc: interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A66388B.8CA40485@cisco.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: ^WBe9`cZ!!P"2e9)SW!! On Wed, 17 Jan 2001, Paul Kyzivat wrote: > > I'd prefer to require a MARSHAL exception. > > My comment was predicated on the assumption that sending > multiple NULs is illegal. So this is simply discussing > what an orb is permitted to do when confronted with an > invalid encoding on the wire. > > I don't think a conforming implementation should be > required to catch this. It is a needless expense. > This then becomes a quality of implementation issue, > and permits orbs to decide whether to emphasize speed, > or the ability to diagnose a defective orb at the > other end. OK, I agree with that. > Making it illegal to send the NUL incurs no extra cost > for C++ orbs because the null is already the end delimiter > on strings. I suppose it is an extra cost in Java because > it is easy to get a NUL into a string. But it seems better > to sometimes pay the cost on one end than it is to always > pay the cost on both ends. Yes. I'd make it illegal then for an ORB to send a string that has more than one NUL at the end. This seems to make the most sense anyway, seeing that IDL prohibits embedded NULs in strings. Whether to eat the string or to throw an exception at the receiving end then becomes a quality-of-implementation issue. But at least, by making it illegal to send such a string, there is a definite culprit to point the finger at. Cheers, Michi. 
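
[Editorial illustration] The receiver-side choice discussed in this exchange (reject an extra NUL with MARSHAL, or accept the octets as a quality-of-implementation decision) can be sketched as follows. This is a hedged sketch, not any ORB's actual code: `MarshalError`, `decode_cdr_string` and the `strict` flag are hypothetical names, the input offset is assumed to be 4-byte aligned already, and the result is left as raw octets rather than converted through a code set.

```python
import struct


class MarshalError(Exception):
    """Stand-in for the CORBA MARSHAL system exception (hypothetical)."""


def decode_cdr_string(buf: bytes, offset: int = 0,
                      little_endian: bool = False, strict: bool = True):
    """Decode one CDR string; return (octets without terminator, next offset).

    strict=True  : raise on any NUL before the final one (option c).
    strict=False : drop only the final NUL and keep the rest (option a).
    """
    fmt = "<I" if little_endian else ">I"
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + 4
    end = start + length
    # The length counts the terminating NUL, so it can never be 0.
    if length < 1 or end > len(buf):
        raise MarshalError("bad string length")
    data = buf[start:end]
    if data[-1] != 0:
        raise MarshalError("missing terminating NUL")
    body = data[:-1]
    if strict and b"\x00" in body:
        raise MarshalError("extra or embedded NUL in string")
    return body, end
```

With `strict=False` the scan over the body is skipped entirely, which is the cost trade-off Paul describes; with `strict=True` a defective sender is diagnosed at the price of one pass over every string received.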
Date: Thu, 18 Jan 2001 19:16:48 +0000 From: Simon Nash Organization: IBM X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: Michi Henning CC: Paul Kyzivat , interop@omg.org Subject: Re: Null termination of strings References: Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: +8id9:<(!!`'7!!6ml!! Michi, Michi Henning wrote: > > On Wed, 17 Jan 2001, Paul Kyzivat wrote: > > > > I'd prefer to require a MARSHAL exception. > > > > My comment was predicated on the assumption that sending > > multiple NULs is illegal. So this is simply discussing > > what an orb is permitted to do when confronted with an > > invalid encoding on the wire. > > > > I don't think a conforming implementation should be > > required to catch this. It is a needless expense. > > This then becomes a quality of implementation issue, > > and permits orbs to decide whether to emphasize speed, > > or the ability to diagnose a defective orb at the > > other end. > > OK, I agree with that. > > > Making it illegal to send the NUL incurs no extra cost > > for C++ orbs because the null is already the end delimiter > > on strings. I suppose it is an extra cost in Java because > > it is easy to get a NUL into a string. But it seems better > > to sometimes pay the cost on one end than it is to always > > pay the cost on both ends. > > Yes. I'd make it illegal then for an ORB to send a string that has more > than one NUL at the end. This seems to make the most sense anyway, seeing > that IDL prohibits embedded NULs in strings. Whether to eat the string > or to throw an exception at the receiving end then becomes a > quality-of-implementation issue. But at least, by making it illegal to > send such a string, there is a definite culprit to point the finger at. > Sorry, but I disagree that Java ORBs should have to scan every string that the application sends to make sure that it does not contain an embedded NUL. 
If the receiver does not have to diagnose this error, then neither should the sender. Simon -- Simon C Nash, Technology Architect, IBM Java Technology Centre Tel. +44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb Date: Thu, 21 Dec 2000 06:01:21 +1000 (EST) From: Michi Henning To: Simon Nash cc: issues@omg.org, interop@omg.org Subject: Re: Null termination of strings In-Reply-To: <3A310359.13F43A5E@hursley.ibm.com> Message-ID: Organization: Object Oriented Concepts MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-UIDL: 4T6e9g]h!!(,%"!OTPe9 On Fri, 8 Dec 2000, Simon Nash wrote: > Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding > of strings, includes the following sentence in the first paragraph: > > "Both the string length and contents include a terminating null." > > It is not clear from this whether exactly one terminating null is required, > or whether more than one null can be included, with the string being terminated > by the first null. > > Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string > type string consisting of all possible 8-bit quantities except null"), any > additional nulls following the first terminating null cannot be part of the > string, and it therefore seems reasonable to ignore them. > > Proposed Resolution: > > Change the above sentence in section 15.3.2.7 to: > > "Both the string length and contents include at least one terminating null." > > Also make the same change to the corresponding sentence in the third paragraph > of section 15.3.2.7 describing GIOP 1.1 wide strings. I don't think any such change is needed. The string length tells me how many bytes are in the string, including the terminating NUL. I would expect that length to give me *exactly* that count. What follows the terminating NUL is either padding, or the next value in the byte stream. 
I see no point in allowing a string to have several terminating NUL
characters. What would this improve?

 Cheers,

 Michi.
--
Michi Henning +61 7 3324 9633
Object Oriented Concepts +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St +61 7 3324 9799 (fax)
Camp Hill 4152 michi@ooc.com.au
Brisbane, AUSTRALIA http://www.ooc.com.au/staff/michi-henning.html

Date: Sat, 2 Jun 2001 10:53:27 +1000 (EST)
From: Michi Henning
To: Interoperability RTF
cc: Interoperability RTF
Subject: On 4113
In-Reply-To: <4.3.2.7.2.20010601174603.01c8c430@emerald.omg.org>
Message-ID:
Organization: IONA Technologies
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: fP=e9VWKe91~ed9_J

To: Interoperability RTF
Subject: Re: On 4113
In-Reply-To:
Message-ID:
Organization: IONA Technologies
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: FDGe9>-Wd91,]!!dg`d9

On Sat, 2 Jun 2001, Michi Henning wrote:

> Third, it's been wrong since the day dot, and no-one seems to have noticed:
>
>     Both the string length and contents include a terminating null.
>                     ^^^^^^
>
> What the hell do we need a terminating null (or NUL) for the string *length*
> for?! It's an unsigned long, after all.

Ah, just got the insight. It means that the count includes the NUL.
The wording is awful though. I would suggest:

    The string contents include a terminating null character. The string
    length includes the null character, so an empty string has a length
    of 1.

 Cheers,

 Michi.
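
[Editorial illustration] The suggested wording (contents end with one null, the length counts it, so an empty string has length 1) corresponds to an encoder along these lines. A minimal sketch only: `encode_cdr_string` is a hypothetical name, the caller is assumed to have aligned the output to 4 octets, and the embedded-NUL check is optional sender-side policy rather than something the thread settled.

```python
import struct


def encode_cdr_string(value: bytes, little_endian: bool = False) -> bytes:
    """Encode already code-set-converted octets as a CDR string:
    unsigned long length (counting the terminator), octets, one NUL."""
    if b"\x00" in value:
        # IDL strings cannot contain NUL; rejecting here is an assumption
        # about sender policy, not a requirement stated in the thread.
        raise ValueError("IDL string must not contain NUL")
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, len(value) + 1) + value + b"\x00"


# An empty string still carries its terminator, so the length field is 1.
print(encode_cdr_string(b"").hex())   # 0000000100
```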
-- Michi Henning +61 7 3324 9633 Chief CORBA Scientist +61 4 1118 2700 (mobile) IONA Technologies +61 7 3324 9799 (fax) Total Business Integration http://www.ooc.com.au/staff/michi From: "Everett Anderson" To: "Interoperability RTF" Subject: RE: On 4113 Date: Wed, 6 Jun 2001 10:57:42 -0700 Message-ID: MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Content-Type: text/plain; charset="us-ascii" X-UIDL: 6]Od9:@7e9(Bm!!LjK!! Hi, I'd like to change Sun's vote for 4113 from YES to NO. I really would like it resolved in this vote, but agree that the wording has always been quite awkward. I'd support Michi's suggestion below. Thanks, Everett > -----Original Message----- > From: Michi Henning [mailto:michi.henning@iona.com] > Sent: Monday, June 04, 2001 3:07 PM > To: Interoperability RTF > Subject: Re: On 4113 > > > On Sat, 2 Jun 2001, Michi Henning wrote: > > > Third, it's been wrong since the day dot, and no-one seems to > have noticed: > > > > Both the string length and contents include a terminating null. > > ^^^^^^ > > > > What the hell do we need a terminating null (or NUL) for the > string *length* > > for?! It's an unsigned long, afer all. > > Ah, just got the insight. It means that the count includes the NUL. > The wording is awful though. I would suggest: > > The string contents include a terminating null character. The string > length includes the null character, so an empty string has a length > of 1. > > Cheers, > > Michi. 
> -- > Michi Henning +61 7 3324 9633 > Chief CORBA Scientist +61 4 1118 2700 (mobile) > IONA Technologies +61 7 3324 9799 (fax) > Total Business Integration > http://www.ooc.com.au/staff/michi > > Date: Sat, 09 Jun 2001 00:38:57 +0100 From: Simon Nash Organization: IBM X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: terutt@lucent.com CC: interop@omg.org Subject: Re: Interop Final Wordsmith before Vote 3 References: <3B21422F.55EBC160@lucent.com> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii X-UIDL: M4hd9)OR!!f*0e9[_Ud9 Status: RO Tom, My suggested changes for 4113 are marked with ^^^^ below. Simon > > Revised Text: > > Change the first para of 15.3.2.7 from: > > A string is encoded as an unsigned long indicating the length > of the string in octets, > followed by the string value in single- or multi-byte form > represented as a sequence of > octets. Both the string length and contents include a > terminating null. > > to read: > A string is encoded as an unsigned long indicating the > length of > the string in octets, followed by the string value in > single- or > multi-byte form represented as a sequence of octets. The > string contents include a terminating null character. The > string > length includes the null character, so an empty string has a > length of 1. > A string is encoded as an unsigned long indicating the length of the string in octets, followed by the string value in single- or multi-byte form represented as a sequence of octets. The string contents include a single terminating null character. The > string ^^^^^^ length includes the null character, so an empty string has a length of > 1. > Change the third para of 15.3.2.7 from: > > For GIOP version 1.1, a wide string is encoded as an unsigned > long > indicating the length of the string in octets or unsigned > integers > (determined by the transfer syntax for wchar) followed by the > individual > wide characters. 
Both the string length and contents include a > terminating > null. The terminating null character for a wstring is also a > wide character. > > to read: > > For GIOP version 1.1, a wide string is encoded as an > unsigned > long indicating the length of the string in octets or > unsigned > integers (determined by the transfer syntax for wchar) > followed > by the individual wide characters. The string contents > include > a terminating null character. The terminating null character > for a wstring is also a wide character. > For GIOP version 1.1, a wide string is encoded as an unsigned long indicating the length of the string in octets or unsigned integers (determined by the transfer syntax for wchar) followed by the individual wide characters. The string contents include a single terminating null character. The string length includes ^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^ the null character. The terminating null character for a wstring ^^^^^^^^^^^^^^^^^^^ is also a wide character. > Change the fourth para of 15.3.2.7 from: > > For GIOP version 1.2, when encoding a wstring, always encode > the > length as the total number of octets used by the encoded value, > regardless > of whether the encoding is byte-oriented or not. For GIOP > version 1.2 > a wstring is not terminated by a NUL character. In particular, > in GIOP > version 1.2 a length of 0 is legal for wstring. > > to read: > > For GIOP version 1.2, when encoding a wstring, always encode > the > length as the total number of octets used by the encoded > value, > regardless of whether the encoding is byte-oriented or > not. For > GIOP version 1.2 a wstring is not terminated by a null > character. > In particular, in GIOP version 1.2 a length of 0 is legal > for wstring. > -- Simon C Nash, Chief Technical Officer, IBM Java Technology Tel. +44-1962-815156 Fax +44-1962-818999 Hursley, England Internet: nash@hursley.ibm.com Lotus Notes: Simon Nash@ibmgb
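
[Editorial illustration] The fourth paragraph of the revised text (GIOP 1.2 wstring: length is the octet count of the encoded value, no terminating null, length 0 legal) can be sketched as below. Assumptions are labeled: `encode_wstring_giop12` is a hypothetical name, and the transfer code set is taken to be big-endian UTF-16 with no byte order mark purely for the example; a real ORB uses whatever wchar code set was negotiated.

```python
import struct


def encode_wstring_giop12(text: str, little_endian: bool = False) -> bytes:
    """Encode a wstring in the GIOP 1.2 style described above:
    unsigned long octet count followed by the encoded characters,
    with no terminating null.

    Assumption: payload is UTF-16 big-endian without a BOM; only the
    length field follows the message byte order in this sketch.
    """
    payload = text.encode("utf-16-be")
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, len(payload)) + payload


# Unlike a narrow string (minimum length 1), an empty GIOP 1.2 wstring
# is just a zero length field.
print(encode_wstring_giop12("").hex())    # 00000000
print(encode_wstring_giop12("hi").hex())  # 0000000400680069
```

Note that the payload for "hi" contains all-bits-zero octets without containing a NUL wide character, which is the distinction Martin draws earlier in the thread.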