Issue 4113: Null termination of strings (interop)
Source:  (Mr. Simon C. Nash, )
Nature: Uncategorized Issue
Severity: 
Summary: Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding
of strings, includes the following sentence in the first paragraph:

  "Both the string length and contents include a terminating null."

It is not clear from this whether exactly one terminating null is required,
or whether more than one null can be included, with the string being terminated
by the first null.

Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string
type string consisting of all possible 8-bit quantities except null"), any
additional nulls following the first terminating null cannot be part of the
string, and it therefore seems reasonable to ignore them.

Proposed Resolution:

Change the above sentence in section 15.3.2.7 to:

  "Both the string length and contents include at least one terminating null."

Also make the same change to the corresponding sentence in the third paragraph
of section 15.3.2.7 describing GIOP 1.1 wide strings.

Resolution: To close with clarification revision
Revised Text: Change the first para of 15.3.2.7 from: 

A string is encoded as an unsigned long indicating the length of the string in octets, 
followed by the string value in single- or multi-byte form represented as a sequence of 
octets. Both the string length and contents include a terminating null.
  to read: 
        A string is encoded as an unsigned long indicating the length of 
        the string in octets, followed by the string value in single- or 
        multi-byte form represented as a sequence of octets. The 
        string contents include a single terminating null character.  The string 
        length includes the null character, so an empty string has a length of 1. 
Change the third para of 15.3.2.7 from: 

For GIOP version 1.1, a wide string is encoded as an unsigned long 
indicating the length of the string in octets or unsigned integers 
(determined by the transfer syntax for wchar) followed by the individual 
wide characters. Both the string length and contents include a terminating 
null. The terminating null character for a wstring is also a wide character.
  to read: 
        For GIOP version 1.1, a wide string is encoded as an unsigned 
        long indicating the length of the string in octets or unsigned 
        integers (determined by the transfer syntax for wchar) followed 
        by the individual wide characters. The string contents include 
        a single terminating null character. The string length includes 
        the null character.  The terminating null character for a wstring is 
        also a wide character. 

Change the fourth para of 15.3.2.7 from: 

For GIOP version 1.2, when encoding a wstring, always encode the 
length as the total number of octets used by the encoded value, regardless 
of whether the encoding is byte-oriented or not. For GIOP version 1.2 
a wstring is not terminated by a NUL character. In particular, in GIOP 
version 1.2 a length of 0 is legal for wstring.
  to read: 
        For GIOP version 1.2, when encoding a wstring, always encode the 
        length as the total number of octets used by the encoded value, 
        regardless of whether the encoding is byte-oriented or not. For 
        GIOP version 1.2 a wstring is not terminated by a null character. 
        In particular, in GIOP version 1.2 a length of 0 is legal for wstring. 
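
As an illustration of the revised first paragraph, the following Python sketch encodes and decodes a narrow string with exactly one terminating null that is counted in the length. The names encode_cdr_string and decode_cdr_string are hypothetical, and the sketch assumes that the caller has already aligned the buffer offset for the unsigned long and chooses the byte order; it is not taken from any ORB implementation.

```python
import struct


def encode_cdr_string(value: bytes, little_endian: bool = True) -> bytes:
    """Encode a narrow string: unsigned long length, octets, one null."""
    if b"\x00" in value:
        # IDL strings cannot contain null, so it can only be the terminator.
        raise ValueError("string contents must not contain a null")
    fmt = "<I" if little_endian else ">I"
    # The length counts the terminating null, so an empty string has length 1.
    return struct.pack(fmt, len(value) + 1) + value + b"\x00"


def decode_cdr_string(buf: bytes, offset: int = 0, little_endian: bool = True):
    """Return (value, next_offset); accept only a single terminating null."""
    fmt = "<I" if little_endian else ">I"
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + 4
    data = buf[start:start + length]
    if length < 1 or len(data) != length:
        raise ValueError("length must be at least 1 and fit in the buffer")
    if data[-1] != 0 or 0 in data[:-1]:
        raise ValueError("expected exactly one null, at the end")
    return data[:-1], start + length
```

Under this strict reading, an encoding such as length 3 followed by "c\0\0" is rejected rather than silently trimmed.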

Actions taken:
December 8, 2000: received issue
October 3, 2001: closed issue
Discussion: 
In the words of an RTF member, extracted from the mail archive on this issue: 
I don't think any such change is needed. The string length tells me how 
many bytes are in the string, including the terminating NUL. I would expect 
that length to give me *exactly* that count. What follows the terminating 
NUL is either padding, or the next value in the byte stream. 
I see no point in allowing a string to have several terminating NUL characters. 
What would this improve? 
 

However, several RTF vote 1 comments indicated that clarification would help. 

End of Annotations:=====
Date: Fri, 08 Dec 2000 15:50:49 +0000
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: issues@omg.org
CC: interop@omg.org
Subject: Null termination of strings 
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: k)W!!K[8e9E~-e9)d\d9

Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding
of strings, includes the following sentence in the first paragraph:

  "Both the string length and contents include a terminating null."

It is not clear from this whether exactly one terminating null is required,
or whether more than one null can be included, with the string being terminated
by the first null.

Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string
type string consisting of all possible 8-bit quantities except null"), any
additional nulls following the first terminating null cannot be part of the
string, and it therefore seems reasonable to ignore them.

Proposed Resolution:

Change the above sentence in section 15.3.2.7 to:

  "Both the string length and contents include at least one terminating null."

Also make the same change to the corresponding sentence in the third paragraph
of section 15.3.2.7 describing GIOP 1.1 wide strings.

   Simon
-- 
Simon C Nash, Technology Architect, IBM Java Technology Centre
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb

Date: Thu, 21 Dec 2000 06:01:21 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Simon Nash <nash@hursley.ibm.com>
cc: issues@omg.org, interop@omg.org
Subject: Re: Null termination of strings 
In-Reply-To: <3A310359.13F43A5E@hursley.ibm.com>
Message-ID:
<Pine.HPX.4.05.10012210559240.10509-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: 4T6e9g]h!!(,%"!OTPe9

On Fri, 8 Dec 2000, Simon Nash wrote:

> Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR encoding
> of strings, includes the following sentence in the first paragraph:
> 
>   "Both the string length and contents include a terminating null."
> 
> It is not clear from this whether exactly one terminating null is required,
> or whether more than one null can be included, with the string being terminated
> by the first null.
> 
> Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL defines the string
> type string consisting of all possible 8-bit quantities except null"), any
> additional nulls following the first terminating null cannot be part of the
> string, and it therefore seems reasonable to ignore them.
> 
> Proposed Resolution:
> 
> Change the above sentence in section 15.3.2.7 to:
> 
>   "Both the string length and contents include at least one terminating null."
> 
> Also make the same change to the corresponding sentence in the third paragraph
> of section 15.3.2.7 describing GIOP 1.1 wide strings.

I don't think any such change is needed. The string length tells me how
many bytes are in the string, including the terminating NUL. I would expect
that length to give me *exactly* that count. What follows the terminating
NUL is either padding, or the next value in the byte stream.

I see no point in allowing a string to have several terminating NUL characters.
What would this improve?

						Cheers,

													Michi.
--
Michi Henning               +61 7 3324 9633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St        +61 7 3324 9799 (fax)
Camp Hill 4152              michi@ooc.com.au
Brisbane, AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html

Date: Wed, 20 Dec 2000 22:46:08 +0000
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Michi Henning <michi@ooc.com.au>
CC: interop@omg.org
Subject: Re: Null termination of strings
References:
<Pine.HPX.4.05.10012210559240.10509-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: 94#!!LX@!!*Ak!!p$pd9

Michi,
Extra nulls could be used to ensure a specific alignment of data that
follows the string.

   Simon

-- 
Simon C Nash, Technology Architect, IBM Java Technology Centre
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb

Date: Thu, 21 Dec 2000 09:00:12 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Simon Nash <nash@hursley.ibm.com>
cc: interop@omg.org
Subject: Re: Null termination of strings
In-Reply-To: <3A4136B0.F59E6743@hursley.ibm.com>
Message-ID:
<Pine.HPX.4.05.10012210851010.10677-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: ,4(e9e>jd9$6fd9IQ5!!

On Wed, 20 Dec 2000, Simon Nash wrote:

> Michi,
> Extra nulls could be used to ensure a specific alignment of data that
> follows the string.

Sure. But those bytes are *not* part of the string. Instead, they are padding,
and the contents of those padding bytes need not be defined.

I see your point: for example, if a string ends at some byte boundary
and then I need three bytes of padding to the start of the next value, I
could use:

        string length   | string value | padding
        -----------------------------------------------
         2              | "c\0"        | "bbb"
         3              | "c\0\0"      | "bb"
         4              | "c\0\0\0"    | "b"
         5              | "c\0\0\0\0"  | none

       [ "c" means an arbitrary character in a string, "b" means a padding
         byte with undefined value. ]

All of these get me to the next value boundary. However, I would argue
that the last three options are non-sensical. For example, during unmarshaling,
I read the length value and allocate that many bytes to hold the string,
and then I skip the appropriate number of bytes in the input stream to
get to the next value boundary. This works fine for the first case above,
but is wasteful for the other three cases because I end up allocating
more than the necessary number of bytes for the string.

Personally, I am in favour of making this sort of trickery illegal and
to update the spec (if it doesn't say that already) that the string
length must be exactly the number of bytes in the string plus one extra
byte for the single terminating NUL.

						Cheers,

													Michi.
--
Michi Henning               +61 7 3324 9633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St        +61 7 3324 9799 (fax)
Camp Hill 4152              michi@ooc.com.au
Brisbane, AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html
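
The unmarshaling steps described in the message above (read the length, allocate that many bytes, then skip to the next value boundary) can be sketched in Python as follows. The helper names align_up and skip_string are hypothetical, and the offsets are illustrative assumptions (string data starting at stream offset 4, next value 4-byte aligned) rather than the exact case shown in the table.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def skip_string(data_start: int, length: int, next_alignment: int):
    """Model a reader that trusts the length field.

    Returns (bytes_allocated, padding_skipped, next_value_offset).
    """
    end = data_start + length             # one byte allocated per counted octet
    next_value = align_up(end, next_alignment)
    return length, next_value - end, next_value


# One character plus 1, 2 or 3 nulls counted in the length field.
for length in (2, 3, 4):
    print(skip_string(4, length, 4))
```

With these assumed offsets every variant reaches the same next-value offset; only the number of bytes allocated for the string grows, which is the waste the message objects to.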

Date: Wed, 20 Dec 2000 18:45:24 -0500
From: Paul Kyzivat <pkyzivat@cisco.com>
X-Mailer: Mozilla 4.73 [en]C-CCK-MCD   (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Michi Henning <michi@ooc.com.au>
CC: Simon Nash <nash@hursley.ibm.com>, interop@omg.org
Subject: Re: Null termination of strings
References:
<Pine.HPX.4.05.10012210851010.10677-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: 1KKe9NK*e9\:j!!)iNd9

I agree with Michi.

Not all languages use a null terminated internal representation.
Permitting this sort of trickery would mean that in all of those
languages (including Java) an unmarshaller would have to check for extra
trailing nulls and remove them. Ugh!

	 Paul (the lurker)

Date: Thu, 21 Dec 2000 19:00:35 +0000
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Michi Henning <michi@ooc.com.au>
CC: interop@omg.org
Subject: Re: Null termination of strings
References:
<Pine.HPX.4.05.10012210851010.10677-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: %%gd9Q!M!!kmQ!!V$jd9

Michi,
I know that CDR leaves padding if necessitated by the alignment of the
next item.  But this could be used to force the next item to a stronger
alignment boundary than usual.  For example, if the next item were a
4-byte aligned item, adding extra nulls on the end of the string could
force it to an 8-byte boundary.

In the abstract this does not seem very useful.  However, I am aware of
an ORB product that does this when building GIOP requests in order to
ensure that the data always starts on an 8-byte boundary.  It does this
by padding the message name string within the request header.  I would
like a ruling from the RTF on whether this technique is legal.

   Simon

-- 
Simon C Nash, Technology Architect, IBM Java Technology Centre
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb

Date: Fri, 22 Dec 2000 05:40:44 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Simon Nash <nash@hursley.ibm.com>
cc: interop@omg.org
Subject: Re: Null termination of strings
In-Reply-To: <3A425353.1D0C09FB@hursley.ibm.com>
Message-ID:
<Pine.HPX.4.05.10012220534220.10952-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: G8<e9f2dd9IB1e9:cZd9

On Thu, 21 Dec 2000, Simon Nash wrote:

> Michi,
> I know that CDR leaves padding if necessitated by the alignment of the
> next item.  But this could be used to force the next item to a stronger
> alignment boundary than usual.  For example, if the next item were a
> 4-byte aligned item, adding extra nulls on the end of the string could
> force it to an 8-byte boundary.
> 
> In the abstract this does not seem very useful.  However, I am aware of
> an ORB product that does this when building GIOP requests in order to
> ensure that the data always starts on an 8-byte boundary.  It does this
> by padding the message name string within the request header.  I would
> like a ruling from the RTF on whether this technique is legal.

Hmmm... The relevant words in the spec are:

	A string is encoded as an unsigned long indicating the length of
	the string in octets, followed by the string value in single- or
	multi-byte form represented as a sequence of octets. Both the
	string length and contents include a terminating null.

Note that this requires "a" terminating null, which I would interpret to
mean a *single* terminating null.

[ As an aside, the use of the term "null" here is wrong. It should be "NUL",
  which is the official name of the ASCII character whose value is zero.
  We should probably clean this up. ]

Overall, I would be inclined to rule the alignment technique you describe
as non-compliant.

						Cheers,

													Michi.
--
Michi Henning               +61 7 3324 9633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St        +61 7 3324 9799 (fax)
Camp Hill 4152              michi@ooc.com.au
Brisbane, AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html

Date: Fri, 22 Dec 2000 10:16:28 +0000
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Michi Henning <michi@ooc.com.au>
CC: interop@omg.org
Subject: Re: Null termination of strings
References:
<Pine.HPX.4.05.10012220534220.10952-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: ZIQe9E%H!!,Qhd9Om]d9

Michi,
The alternative interpretation of these spec words is that the first null
(NUL) in the string acts as the terminator.  Other data following the
terminating null (NUL) is not part of the string contents.  This would
be consistent with how terminating nulls (NULs) work in C and C++.

   Simon

-- 
Simon C Nash, Technology Architect, IBM Java Technology Centre
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb

X-Sent: 22 Dec 2000 10:25:39 GMT
From: "Nick Sharman" <nick.sharman@cp.net>
To: "Simon Nash" <nash@hursley.ibm.com>, "Michi Henning"
<michi@ooc.com.au>
Cc: <interop@omg.org>
Subject: RE: Null termination of strings
Date: Fri, 22 Dec 2000 10:29:45 -0000
Message-ID: <NDBBLFOJOFLOHHNHLOPNAEDFCLAA.nick.sharman@cp.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
In-Reply-To: <3A425353.1D0C09FB@hursley.ibm.com>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Content-Type: text/plain;
	      charset="us-ascii"
X-UIDL: fBRd9$MS!!6A9e9P[nd9

Simon,


> Michi,
> I know that CDR leaves padding if necessitated by the alignment of the
> next item.  But this could be used to force the next item to a stronger
> alignment boundary than usual.  For example, if the next item were a
> 4-byte aligned item, adding extra nulls on the end of the string could
> force it to an 8-byte boundary.
>
> In the abstract this does not seem very useful.  However, I am aware of
> an ORB product that does this when building GIOP requests in order to
> ensure that the data always starts on an 8-byte boundary.  It does this
> by padding the message name string within the request header.  I would
> like a ruling from the RTF on whether this technique is legal.
>
>    Simon

There's no problem with GIOP 1.2 requests (or replies), as data is required
to be 8-byte aligned.  You don't need to alter the length of the operation
name, just increment the output buffer pointer to the next multiple of 8.

For 1.0 & 1.1 requests, the last thing before the data is not the operation
name; it's the principal.  This is an otherwise-unused octet sequence.  Its
length is an unsigned long, which takes you to a 4-byte boundary.  Just
output 0 or 4 bytes of arbitrary data as the content, as necessary to take
you to the next 8-byte boundary.

The data in 1.0 & 1.1 replies is always 4-byte aligned, since the last
header field is an enum value.  If you want 8-byte alignment, allocate a
vendor service context tag to be used only for padding, to be marshalled at
the end of the SC list, and choose an appropriate length, 0 or 4, to finish
on an 8-byte boundary (the rest of the header is 8 bytes, so you still end
up 8-byte aligned).

Regards
Nick
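
The 0-or-4-byte padding choice suggested above for the principal in GIOP 1.0 and 1.1 requests can be sketched in Python like this. The function name encode_padding_principal is hypothetical, the offset is assumed to be measured from the point that CDR alignment is computed against, and big-endian byte order is an arbitrary choice for the example.

```python
import struct


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def encode_padding_principal(offset: int) -> bytes:
    """Encode an octet sequence whose end lands on an 8-byte boundary.

    offset is the stream position where the sequence's unsigned long
    length field starts; it must already be 4-byte aligned.
    """
    if offset % 4 != 0:
        raise ValueError("length field must start on a 4-byte boundary")
    content_start = offset + 4
    # Distance to the next 8-byte boundary is either 0 or 4 here.
    content_len = align_up(content_start, 8) - content_start
    # The content is arbitrary; zero bytes are used for the sketch.
    return struct.pack(">I", content_len) + bytes(content_len)
```

The operation name string is left untouched in this approach; the only value whose length varies is the otherwise-unused octet sequence.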


Date: Sat, 23 Dec 2000 05:00:17 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Simon Nash <nash@hursley.ibm.com>
cc: interop@omg.org
Subject: Re: Null termination of strings
In-Reply-To: <3A4329FC.5C825496@hursley.ibm.com>
Message-ID:
<Pine.HPX.4.05.10012230456580.11227-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: hAod9a#5e9>R-e9?X<!!

On Fri, 22 Dec 2000, Simon Nash wrote:

> Michi,
> The alternative interpretation of these spec words is that the first null
> (NUL) in the string acts as the terminator.  Other data following the
> terminating null (NUL) is not part of the string contents.  This would
> be consistent with how terminating nulls (NULs) work in C and C++.

Hmmm... I really don't like this, for the reasons Paul and I outlined.
By claiming that the string is longer than it actually is in its length
field and then adding additional NUL bytes, I make unmarshaling more wasteful.
In addition, for languages that do not use the concept of NUL termination
and instead represent a string as byte array and count, the unmarshaler
would have to scan every received string from its tail to strip off
redundant NUL bytes and then adjust the length count accordingly.

I honestly see no gain by allowing the additional NUL bytes, but I see
disadvantages in complexity. I quite strongly feel that the implementation
you mention should be ruled non-compliant.

					Cheers,

												Michi.
--
Michi Henning               +61 7 3324 9633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St        +61 7 3324 9799 (fax)
Camp Hill 4152              michi@ooc.com.au
Brisbane, AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html
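
To make the complexity argument above concrete, here is a small Python sketch contrasting the two readings for a language that stores strings as a byte array plus a count. Both function names are hypothetical; the lenient version shows the tail scan that would be needed if extra NUL bytes were permitted, and the strict version shows the single check needed otherwise.

```python
def lenient_string_body(data: bytes) -> bytes:
    """Strip every trailing NUL by scanning backwards from the tail."""
    end = len(data)
    while end > 0 and data[end - 1] == 0:
        end -= 1
    return data[:end]


def strict_string_body(data: bytes) -> bytes:
    """Drop exactly one terminating NUL, with no scan of the contents."""
    if not data or data[-1] != 0:
        raise ValueError("missing terminating NUL")
    return data[:-1]
```

The strict form does a constant amount of work per string, while the lenient form does work proportional to the number of trailing NUL bytes and changes the length the receiver ends up with.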

Date: Fri, 22 Dec 2000 13:12:53 -0800
From: Everett Anderson <Everett.Anderson@sun.com>
X-Mailer: Mozilla 4.73 [en] (Windows NT 5.0; U)
X-Accept-Language: en,pdf,ja
MIME-Version: 1.0
To: Michi Henning <michi@ooc.com.au>
CC: interop@omg.org
Subject: Re: Null termination of strings
References:
<Pine.HPX.4.05.10012230456580.11227-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: #li!!ECG!!K?Be9@~J!!

> I honestly see no gain by allowing the additional NUL bytes, but I see
> disadvantages in complexity. I quite strongly feel that the implementation
> you mention should be ruled non-compliant.

I tend to agree, though I guess my objection is based mainly on the ugly
Java implementation.  With respect to padding and fragmentation in GIOP
1.1, it seems like there were mistakes that were corrected in GIOP 1.2,
and string shouldn't have to pay the price forever.

From: "Rutt, T E (Tom)" <terutt@lucent.com>
To: Simon Nash <nash@hursley.ibm.com>, Michi Henning
<michi@ooc.com.au>,
   "'Nick Sharman'" <nick.sharman@cp.net>
Cc: interop@omg.org
Subject: RE: Null termination of strings
Date: Fri, 5 Jan 2001 14:16:26 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain
X-UIDL: /U]d9V/5e9^$jd9lK<!!

I think that Null terminated is interpreted as One null character.

I do not like the idea of extra nulls in the string itself.


ter


From: Jeffrey Mischkinsky <jmischki@wheel.dcn.davis.ca.us>
Message-Id: <200101051947.LAA03471@wheel.dcn.davis.ca.us>
Subject: Re: Null termination of strings
To: terutt@lucent.com ("Rutt, T E (Tom)")
Date: Fri, 5 Jan 2001 11:47:33 -0800 (PST)
Cc: nash@hursley.ibm.com (Simon Nash), michi@ooc.com.au (Michi
Henning),
   nick.sharman@cp.net ('Nick Sharman'), interop@omg.org
In-Reply-To:
<4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com>
from "Rutt, T E (Tom)" at Jan 05, 2001 02:16:26 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: ^?/e9DQ&!!$^&e9Kccd9

'"Rutt, T E (Tom)"' writes:
> 
> I think that Null terminated is interpreted as One null character.
I agree
> 
> I do not like the idea of extra nulls in the string itself.
OTOH I don't think that we specify the bit pattern of padding bytes.
Feel free to use a 0, 377, 255, etc. But padding bytes are not part of
the item that precedes them.

jeff
> 
> 
> ter


-- 
Jeff Mischkinsky                
jmischki@dcn.davis.ca.us        +1 530-758-9850
jeff@persistence.com		+1 650-372-3604

Date: Sun, 07 Jan 2001 14:44:03 +0100
From: Marcus Wittig <Marcus.Wittig@xtradyne.de>
X-Mailer: Mozilla 4.75 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: "Rutt, T E (Tom)" <terutt@lucent.com>
CC: Simon Nash <nash@hursley.ibm.com>, Michi Henning
<michi@ooc.com.au>,
   "'Nick Sharman'" <nick.sharman@cp.net>, interop@omg.org
Subject: Re: Null termination of strings
References:
<4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=iso-8859-1
X-UIDL: 2Ag!!Ld+e9aH(e9ocNe9


"'Nick Sharman'" <nick.sharman@cp.net> wrote:

>
>         The data in 1.0 & 1.1 replies is always 4-byte aligned,
>since the
> last
>         header field is an enum value.  If you want 8-byte
>alignment,
> allocate a
>         vendor service context tag to be used only for padding, to
>be
> marshalled at
>         the end of the SC list, and choose an appropriate length, 0
>or 4, to
> finish
>         on an 8-byte boundary (the rest of the header is 8 bytes, so
>you
> still end
>         up 8-byte aligned).

Unfortunately, this won't always work as expected due to a weakness of
the Core standard up to version CORBA 2.3.1. The problem is that the
core spec defines the following rule about what an ORB should do with
a service context which is not in the OMG-defined range:
    "The receiving ORB may choose to ignore it, process it if it
    'understands' it, or raise a system exception, however it must be
    passed on through a bridge. If a system exception is raised, it
    shall be BAD_PARAM with an OMG standard"
    (see chapter 13.6, CORBA 2.3.1)
So, the target ORB may throw a system exception if it comes across an
"unknown" service context. Too bad! This problem has been recognized
and the new CORBA 2.4 spec has changed this rule by removing the
second sentence. Fine, but in practice it is of little value as
vendors have to deal with legacy ORB products for a long time. There
is at least one ORB product I know about which throws a system
exception if it receives an "unknown" service context.

Kind Regards
Marcus Wittig

Date: Mon, 08 Jan 2001 13:58:55 +0000
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: "Rutt, T E (Tom)" <terutt@lucent.com>
CC: Michi Henning <michi@ooc.com.au>, "'Nick Sharman'"
<nick.sharman@cp.net>,
   interop@omg.org
Subject: Re: Null termination of strings
References:
<4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: eWkd9njD!!o!De9bFM!!

Tom,
OK, there is pretty strong consensus that extra NUL terminators should
not be allowed as part of the string.  So what should a compliant ORB
do when it receives a string or wstring that contains these extra
NULs?  It could:
 a. strip off the last NUL (except for GIOP 1.2 wstring, where there
    is no terminating NUL), and treat the others as part of the string
    or wstring.  This might be natural for a Java ORB.
 b. terminate the string or wstring at the first NUL (except for GIOP
    1.2 wstring, where there is no terminating NUL).  This might be
    natural for a C++ ORB.
 c. raise a MARSHAL exception.
 d. fail in some other way.
Are all these legal, or should GIOP specify which of these is correct?
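
To make options (a), (b) and (c) concrete, here is a minimal sketch of
a receiver for a narrow string.  It is an illustration only: the
function read_string, its policy argument, and the MarshalError class
(standing in for the CORBA MARSHAL system exception) are hypothetical
names, and alignment of the length field is assumed to have been
handled by the caller.

```python
import struct


class MarshalError(Exception):
    """Stand-in for the CORBA MARSHAL system exception."""


def read_string(buf: bytes, offset: int = 0, byte_order: str = "<",
                policy: str = "strict") -> bytes:
    """Decode one CDR string and apply one of the candidate policies.

    policy "keep"     -> option a: drop only the last NUL
    policy "truncate" -> option b: stop at the first NUL
    policy "strict"   -> option c: reject any extra NUL
    """
    (length,) = struct.unpack_from(byte_order + "I", buf, offset)
    raw = buf[offset + 4:offset + 4 + length]
    if length == 0 or len(raw) != length or raw[-1] != 0:
        raise MarshalError("string is not NUL-terminated")
    body = raw[:-1]
    if b"\x00" in body:
        if policy == "strict":
            raise MarshalError("extra NUL inside string contents")
        if policy == "truncate":
            body = body.split(b"\x00", 1)[0]
    return body
```

For a wire value of length 5 holding "ab" followed by three NULs, the
"keep" policy yields "ab" plus two NULs, "truncate" yields "ab", and
"strict" raises.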

A related issue is whether or not NUL characters can be embedded in
GIOP 1.2 wstrings.  These do not use NUL as a terminator, so from
reading chapter 15 this would appear to be OK.  However, section
3.10.4.3 says that wstrings cannot include the wide character null.
Is this a hangover from pre-GIOP 1.2 days, when wstrings were
NUL-terminated, or is there still a good reason for this limitation?

This came up recently in the context of the Java to IDL mapping.  A
Java string (which is mapped to an IDL wstring) contained an embedded
NUL, and one of our products did not handle this correctly.  If we fix
our product to put a NUL wide character on the wire in this case, will
this be compliant with GIOP 1.2?

   Simon

"Rutt, T E (Tom)" wrote:
> 
> I think that Null terminated is interpreted as One null character.
> 
> I do not like the idea of extra nulls in the string itself.
> 
> ter

-- 
Simon C Nash, Technology Architect, IBM Java Technology Centre
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb

Date: Mon, 08 Jan 2001 09:57:28 -0500
From: Paul Kyzivat <pkyzivat@cisco.com>
X-Mailer: Mozilla 4.73 [en]C-CCK-MCD   (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: interop@omg.org
Subject: Re: Null termination of strings
References:
<4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com>
<3A59C79F.162CC8D1@hursley.ibm.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: !jXd9DIFe9c>N!!jffd9


Simon Nash wrote:
> 
> Tom,
> OK, there is pretty strong consensus that extra NUL terminators
> should not be allowed as part of the string.  So what should a
> compliant ORB do when it receives a string or wstring that contains
> these extra NULs?  It could:
>  a. strip off the last NUL (except for GIOP 1.2 wstring, where there
>     is no terminating NUL), and treat the others as part of the
>     string or wstring.  This might be natural for a Java ORB.
>  b. terminate the string or wstring at the first NUL (except for
>     GIOP 1.2 wstring, where there is no terminating NUL).  This might
>     be natural for a C++ ORB.
>  c. raise a MARSHAL exception.
>  d. fail in some other way.
> Are all these legal, or should GIOP specify which of these is
> correct?

I think at least (a) and (c) should be legal implementations.

> 
> A related issue is whether or not NUL characters can be embedded in
> GIOP 1.2 wstrings.  These do not use NUL as a terminator, so from
> reading
> chapter 15 this would appear to be OK.  However, section 3.10.4.3
> says
> that wstrings cannot include the wide character null.  Is this a
> hangover
> from pre-GIOP 1.2 days, when wstrings were NUL-terminated, or is
> there
> still a good reason for this limitation?

Well, there are two obvious reasons why the restriction should still
remain:

1) It is hard to ensure that your string will only be conveyed via
   GIOP 1.2.  You may send it that way, but it may be passed on by the
   recipient to somebody else using an earlier version of GIOP.

2) Embedded nulls in strings don't work for either the C or C++
   language mappings.

The first of these is a more valid reason in my mind.
This behavior by C & C++ has always been a mistake in my mind.

If this restriction is ever lifted in Interop, then the C & C++
language mappings will need to be amended to specify what happens
if an embedded null is received.

> 
> This came up recently in the context of the Java to IDL mapping.  A
> Java
> string (which is mapped to an IDL wstring) contained an embedded
> NUL,
> and one of our products did not handle this correctly.  If we fix
> our
> product to put a NUL wide character on the wire in this case, will
> this
> be compliant with GIOP 1.2?

I don't think so.

  Paul

Date: Mon, 8 Jan 2001 18:17:20 +0100 (MET)
Message-Id: <200101081717.SAA07392@pandora.informatik.hu-berlin.de>
X-Authentication-Warning: pandora.informatik.hu-berlin.de: loewis set
sender to loewis@informatik.hu-berlin.de using -f
From: Martin von Loewis <loewis@informatik.hu-berlin.de>
To: nash@hursley.ibm.com
CC: terutt@lucent.com, michi@ooc.com.au, nick.sharman@cp.net,
interop@omg.org
In-reply-to: <3A59C79F.162CC8D1@hursley.ibm.com> (message from Simon
Nash on
     Mon, 08 Jan 2001 13:58:55 +0000)
Subject: Re: Null termination of strings
References:
<4490F7068AC0D111A7120008C72878EC085E6948@nj7460exch003u.ho.lucent.com>
<3A59C79F.162CC8D1@hursley.ibm.com>
User-Agent: SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) Emacs/20.6
(sparc-sun-solaris2.6) MULE/4.0 (HANANOEN)
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
X-UIDL: W'9!!O4ad9&P`d9\41!!

> OK, there is pretty strong consensus that extra NUL terminators
> should not be allowed as part of the string.  So what should a
> compliant ORB do when it receives a string or wstring that contains
> these extra NULs?  It could:
>  a. strip off the last NUL (except for GIOP 1.2 wstring, where there
>     is no terminating NUL), and treat the others as part of the
>     string or wstring.  This might be natural for a Java ORB.
>  b. terminate the string or wstring at the first NUL (except for
>     GIOP 1.2 wstring, where there is no terminating NUL).  This might
>     be natural for a C++ ORB.
>  c. raise a MARSHAL exception.
>  d. fail in some other way.
> Are all these legal, or should GIOP specify which of these is
> correct?

The message being received is ill-formed, so an ORB doing c) is
clearly behaving properly. I think an ORB should not be required
to detect all possible message errors. If it doesn't detect this
error, further behaviour is unspecified. The ORB may do any of these,
and many more, including

e. Turn your coffee machine off

> A related issue is whether or not NUL characters can be embedded in 
> GIOP 1.2 wstrings.  

Not sure what you mean by NUL character here. A wide string is encoded
as a sequence of octets, some of which may be all-bits-zero octets.
That, in general, is different from a NUL wide character, whose wire
representation depends on the coded character set.
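
The distinction can be shown in a few lines.  This is a sketch only:
big-endian UTF-16 is used as a stand-in for whatever wchar code set
the two ORBs have negotiated, and the helper names are hypothetical.

```python
def has_zero_octet(text: str, codec: str = "utf-16-be") -> bool:
    """True if the encoded form contains an all-bits-zero octet."""
    return 0 in text.encode(codec)


def has_nul_wide_char(text: str) -> bool:
    """True if the string itself contains the NUL character."""
    return "\x00" in text


# "A" encodes as the octets 00 41: a zero octet, but no NUL character.
plain = "A"
# A Java-style string with an embedded NUL contains both.
embedded = "A\x00B"
```

A receiver that scans wstring octets for a zero byte would therefore
misjudge ordinary text, while only the second string above actually
carries the prohibited wide character.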

> These do not use NUL as a terminator, so from reading chapter 15
> this would appear to be OK.  However, section 3.10.4.3 says that
> wstrings cannot include the wide character null.  Is this a hangover
> from pre-GIOP 1.2 days, when wstrings were NUL-terminated, or is
> there still a good reason for this limitation?

I think Paul is right that the C and C++ mappings currently cannot
represent wide character strings containing (wint_t)0, and that this
is a good reason to disallow such strings in CORBA.

> This came up recently in the context of the Java to IDL mapping.  A
> Java string (which is mapped to an IDL wstring) contained an
> embedded NUL, and one of our products did not handle this correctly.
> If we fix our product to put a NUL wide character on the wire in
> this case, will this be compliant with GIOP 1.2?

No, that string would still contain a NUL wide character, which is not
supported. Of course, if the product really failed for a null octet in
the wstring (which frequently happens for UCS-2), then the product
would be broken; adding a NUL wide character would also render the
other product non-compliant.

Regards,
Martin

Date: Thu, 18 Jan 2001 10:08:12 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Paul Kyzivat <pkyzivat@cisco.com>
cc: interop@omg.org
Subject: Re: Null termination of strings
In-Reply-To: <3A59D557.9F326EC0@cisco.com>
Message-ID: <Pine.HPX.4.05.10101181006460.4486-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: lKR!!R2bd9[&I!!^;cd9

On Mon, 8 Jan 2001, Paul Kyzivat wrote:

> Simon Nash wrote:
> > 
> > Tom,
> > OK, there is pretty strong consensus that extra NUL terminators
> > should not be allowed as part of the string.  So what should a
> > compliant ORB do when it receives a string or wstring that
> > contains these extra NULs?  It could:
> >  a. strip off the last NUL (except for GIOP 1.2 wstring, where
> >     there is no terminating NUL), and treat the others as part of
> >     the string or wstring.  This might be natural for a Java ORB.
> >  b. terminate the string or wstring at the first NUL (except for
> >     GIOP 1.2 wstring, where there is no terminating NUL).  This
> >     might be natural for a C++ ORB.
> >  c. raise a MARSHAL exception.
> >  d. fail in some other way.
> > Are all these legal, or should GIOP specify which of these is
> > correct?
> 
> I think at least (a) and (c) should be legal implementations.

I can't say I like that much. That's because, if (a) and (c) are both
legal, we will end up with the situation where one ORB consistently
rejects a request, and a different ORB consistently accepts the same
request, and both can claim to be compliant. That's not a good idea...

I'd prefer to require a MARSHAL exception.

					Cheers,

												Michi.
--
Michi Henning               +61 7 3324 9633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St        +61 7 3324 9799 (fax)
Camp Hill 4152              michi@ooc.com.au
Brisbane, AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html

Date: Wed, 17 Jan 2001 19:27:55 -0500
From: Paul Kyzivat <pkyzivat@cisco.com>
X-Mailer: Mozilla 4.73 [en]C-CCK-MCD   (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
CC: interop@omg.org
Subject: Re: Null termination of strings
References: <Pine.HPX.4.05.10101181006460.4486-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: F)S!!;<N!!$7Oe9Kj0!!

> I'd prefer to require a MARSHAL exception.

My comment was predicated on the assumption that sending
multiple NULs is illegal. So this is simply discussing
what an orb is permitted to do when confronted with an
invalid encoding on the wire.

I don't think a conforming implementation should be
required to catch this. It is a needless expense.
This then becomes a quality of implementation issue,
and permits orbs to decide whether to emphasize speed,
or the ability to diagnose a defective orb at the
other end.

Making it illegal to send the NUL incurs no extra cost
for C++ orbs because the null is already the end delimiter
on strings. I suppose it is an extra cost in Java because
it is easy to get a NUL into a string. But it seems better
to sometimes pay the cost on one end than it is to always
pay the cost on both ends.

    Paul


Date: Thu, 18 Jan 2001 11:07:23 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Paul Kyzivat <pkyzivat@cisco.com>
cc: interop@omg.org
Subject: Re: Null termination of strings
In-Reply-To: <3A66388B.8CA40485@cisco.com>
Message-ID: <Pine.HPX.4.05.10101181105510.4486-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: ^WBe9`cZ!!P"2e9)SW!!

On Wed, 17 Jan 2001, Paul Kyzivat wrote:

> > I'd prefer to require a MARSHAL exception.
> 
> My comment was predicated on the assumption that sending
> multiple NULs is illegal. So this is simply discussing
> what an orb is permitted to do when confronted with an
> invalid encoding on the wire.
> 
> I don't think a conforming implementation should be
> required to catch this. It is a needless expense.
> This then becomes a quality of implementation issue,
> and permits orbs to decide whether to emphasize speed,
> or the ability to diagnose a defective orb at the
> other end.

OK, I agree with that.

> Making it illegal to send the NUL incurs no extra cost
> for C++ orbs because the null is already the end delimiter
> on strings. I suppose it is an extra cost in Java because
> it is easy to get a NUL into a string. But it seems better
> to sometimes pay the cost on one end than it is to always
> pay the cost on both ends.

Yes. I'd make it illegal then for an ORB to send a string that has
more than one NUL at the end. This seems to make the most sense
anyway, seeing that IDL prohibits embedded NULs in strings. Whether to
eat the string or to throw an exception at the receiving end then
becomes a quality-of-implementation issue. But at least, by making it
illegal to send such a string, there is a definite culprit to point
the finger at.

							Cheers,

														Michi.

Date: Thu, 18 Jan 2001 19:16:48 +0000
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Michi Henning <michi@ooc.com.au>
CC: Paul Kyzivat <pkyzivat@cisco.com>, interop@omg.org
Subject: Re: Null termination of strings
References: <Pine.HPX.4.05.10101181105510.4486-100000@bobo.ooc.com.au>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: +8id9:<(!!`'7!!6ml!!

Michi,

Michi Henning wrote:
> 
> On Wed, 17 Jan 2001, Paul Kyzivat wrote:
> 
> > > I'd prefer to require a MARSHAL exception.
> >
> > My comment was predicated on the assumption that sending
> > multiple NULs is illegal. So this is simply discussing
> > what an orb is permitted to do when confronted with an
> > invalid encoding on the wire.
> >
> > I don't think a conforming implementation should be
> > required to catch this. It is a needless expense.
> > This then becomes a quality of implementation issue,
> > and permits orbs to decide whether to emphasize speed,
> > or the ability to diagnose a defective orb at the
> > other end.
> 
> OK, I agree with that.
> 
> > Making it illegal to send the NUL incurs no extra cost
> > for C++ orbs because the null is already the end delimiter
> > on strings. I suppose it is an extra cost in Java because
> > it is easy to get a NUL into a string. But it seems better
> > to sometimes pay the cost on one end than it is to always
> > pay the cost on both ends.
> 
> Yes. I'd make it illegal then for an ORB to send a string that has
> more than one NUL at the end. This seems to make the most sense
> anyway, seeing that IDL prohibits embedded NULs in strings. Whether
> to eat the string or to throw an exception at the receiving end then
> becomes a quality-of-implementation issue. But at least, by making
> it illegal to send such a string, there is a definite culprit to
> point the finger at.
> 
Sorry, but I disagree that Java ORBs should have to scan every string
that the application sends to make sure that it does not contain an
embedded NUL.  If the receiver does not have to diagnose this error,
then neither should the sender.

   Simon
-- 
Simon C Nash, Technology Architect, IBM Java Technology Centre
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb

Date: Thu, 21 Dec 2000 06:01:21 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
To: Simon Nash <nash@hursley.ibm.com>
cc: issues@omg.org, interop@omg.org
Subject: Re: Null termination of strings 
In-Reply-To: <3A310359.13F43A5E@hursley.ibm.com>
Message-ID:
<Pine.HPX.4.05.10012210559240.10509-100000@bobo.ooc.com.au>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: 4T6e9g]h!!(,%"!OTPe9

On Fri, 8 Dec 2000, Simon Nash wrote:

> Section 15.3.2.7 of the CORBA 2.3 spec, which describes the CDR
> encoding of strings, includes the following sentence in the first
> paragraph:
> 
>   "Both the string length and contents include a terminating null."
> 
> It is not clear from this whether exactly one terminating null is
> required, or whether more than one null can be included, with the
> string being terminated by the first null.
> 
> Since IDL strings cannot include nulls (see 3.10.3.2: "OMG IDL
> defines the string type string consisting of all possible 8-bit
> quantities except null"), any additional nulls following the first
> terminating null cannot be part of the string, and it therefore
> seems reasonable to ignore them.
> 
> Proposed Resolution:
> 
> Change the above sentence in section 15.3.2.7 to:
> 
>   "Both the string length and contents include at least one
>   terminating null."
> 
> Also make the same change to the corresponding sentence in the third
> paragraph of section 15.3.2.7 describing GIOP 1.1 wide strings.

I don't think any such change is needed. The string length tells me
how many bytes are in the string, including the terminating NUL. I
would expect that length to give me *exactly* that count. What follows
the terminating NUL is either padding, or the next value in the byte
stream.
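
As an illustration of that reading, a decoder can trust the length
field for the string and treat anything between the terminating NUL
and the next aligned item as padding.  The sketch below assumes a
string immediately followed by an unsigned long, with offsets counted
from the start of the buffer; read_string_then_ulong is a hypothetical
helper name.

```python
import struct


def read_string_then_ulong(buf: bytes, byte_order: str = "<"):
    """Decode a CDR string followed by an unsigned long."""
    (length,) = struct.unpack_from(byte_order + "I", buf, 0)
    text = buf[4:4 + length - 1]        # contents without the one NUL
    offset = 4 + length                 # first octet after the string
    offset += -offset % 4               # skip padding; its value is ignored
    (value,) = struct.unpack_from(byte_order + "I", buf, offset)
    return text, value
```

Here the padding octet is never inspected, so its bit pattern does not
matter and it is never counted in the string length.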

I see no point in allowing a string to have several terminating NUL
characters.  What would this improve?

						Cheers,

													Michi.
--
Michi Henning               +61 7 3324 9633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
Suite 4, 8 Martha St        +61 7 3324 9799 (fax)
Camp Hill 4152              michi@ooc.com.au
Brisbane, AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html


Date: Sat, 2 Jun 2001 10:53:27 +1000 (EST)
From: Michi Henning <michi.henning@iona.com>
To: Interoperability RTF <interop@omg.org>
cc: Interoperability RTF <interop@omg.org>
Subject: On 4113
In-Reply-To: <4.3.2.7.2.20010601174603.01c8c430@emerald.omg.org>
Message-ID: <Pine.HPX.4.05.10106021037330.4283-100000@bobo.ooc.com.au>
Organization: IONA Technologies
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: fP=e9VWKe91~ed9_J<!!
Status: RO

Hi,

for 4113, the suggestion for the vote is to close no change.

The contentious sentence currently reads:

    Both the string length and contents include a terminating null.

For one, the fact that we had to have a long discussion about this
issue indicates that the intent isn't clear enough, so I would like to
see "exactly one" instead of "one" and propagate the consensus into
the spec instead of leaving the ambiguity as is.

Second, I would like to see the spec corrected to talk about NUL and
null correctly. The name of the byte with value zero in ASCII is "NUL"
but, for wide character sets, the name of the character containing all
zeros is "null character". Sad, but true. Let's use the terminology
correctly.

Third, it's been wrong since the day dot, and no-one seems to have
noticed:

	Both the string length and contents include a terminating null.
	                ^^^^^^

What the hell do we need a terminating null (or NUL) for the string
*length* for?! It's an unsigned long, after all.

Proposal:

Change the first para of 15.3.2.7 to read:

       A string is encoded as an unsigned long indicating the length
       of the string in octets, followed by the string value in
       single- or multi-byte form represented as a sequence of octets.
       The string contents include a terminating null character.
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Change the third para of 15.3.2.7 to read:

       For GIOP version 1.1, a wide string is encoded as an unsigned
       long indicating the length of the string in octets or unsigned
       integers (determined by the transfer syntax for wchar) followed
       by the individual wide characters. The string contents include
       a terminating null character. The terminating null character
                          ^^^^^^^^^
       for a wstring is also a wide character.

Change the fourth para of 15.3.2.7 to read:

       For GIOP version 1.2, when encoding a wstring, always encode
       the length as the total number of octets used by the encoded
       value, regardless of whether the encoding is byte-oriented or
       not. For GIOP version 1.2 a wstring is not terminated by a
       null character. In particular, in GIOP version 1.2 a length of
       ^^^^^^^^^^^^^^
       0 is legal for wstring.
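
A small sketch of the GIOP 1.2 wstring rule as worded above: the
length is the octet count of the encoded value and no terminator is
appended.  The function name and the default codec (big-endian UTF-16
as a stand-in for the negotiated wchar code set) are assumptions for
illustration, and the caller is assumed to have aligned the length
field.

```python
import struct


def encode_wstring_giop12(text: str, byte_order: str = "<",
                          codec: str = "utf-16-be") -> bytes:
    """Encode a wstring: ulong octet count, then the encoded octets."""
    data = text.encode(codec)
    return struct.pack(byte_order + "I", len(data)) + data
```

An empty wstring therefore encodes as a length of 0 with no contents,
and a two-character string in this codec has a length of 4.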

				    Cheers,

											Michi.
--
Michi Henning                             +61 7 3324 9633
Chief CORBA Scientist                     +61 4 1118 2700 (mobile)
IONA Technologies                         +61 7 3324 9799 (fax)
Total Business Integration
http://www.ooc.com.au/staff/michi


Date: Tue, 5 Jun 2001 08:06:36 +1000 (EST)
From: Michi Henning <michi.henning@iona.com>
To: Interoperability RTF <interop@omg.org>
Subject: Re: On 4113
In-Reply-To:
<Pine.HPX.4.05.10106021037330.4283-100000@bobo.ooc.com.au>
Message-ID: <Pine.HPX.4.05.10106050804520.4975-100000@bobo.ooc.com.au>
Organization: IONA Technologies
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: FDGe9>-Wd91,]!!dg`d9

On Sat, 2 Jun 2001, Michi Henning wrote:

> Third, it's been wrong since the day dot, and no-one seems to have
> noticed:
> 
>	Both the string length and contents include a terminating null.
>	                ^^^^^^
> 
> What the hell do we need a terminating null (or NUL) for the string
> *length* for?! It's an unsigned long, after all.

Ah, just got the insight. It means that the count includes the NUL.
The wording is awful though. I would suggest:

    The string contents include a terminating null character. The
    string length includes the null character, so an empty string has
    a length of 1.
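
For illustration, the suggested wording corresponds to an encoder like
the following.  This is a sketch, not taken from any ORB: the name
encode_string is hypothetical, the input is assumed to be already in
the transmission code set, and alignment of the length field is left
to the caller.

```python
import struct


def encode_string(contents: bytes, byte_order: str = "<") -> bytes:
    """Encode a narrow string: ulong length, contents, exactly one NUL."""
    if b"\x00" in contents:
        raise ValueError("IDL strings cannot contain NUL")
    # The length counts the contents plus the single terminating NUL.
    return struct.pack(byte_order + "I", len(contents) + 1) + contents + b"\x00"
```

With this reading the empty string goes on the wire as a length of 1
followed by a single NUL octet.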

					Cheers,

												Michi.
--
Michi Henning                             +61 7 3324 9633
Chief CORBA Scientist                     +61 4 1118 2700 (mobile)
IONA Technologies                         +61 7 3324 9799 (fax)
Total Business Integration
http://www.ooc.com.au/staff/michi


From: "Everett Anderson" <everett.anderson@eng.sun.com>
To: "Interoperability RTF" <interop@emerald.omg.org>
Subject: RE: On 4113
Date: Wed, 6 Jun 2001 10:57:42 -0700
Message-ID:
<NGENKBMDDELACOOFLINGOEKPCAAA.everett.anderson@eng.sun.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
Importance: Normal
In-Reply-To:
<Pine.HPX.4.05.10106050804520.4975-100000@bobo.ooc.com.au>
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Content-Type: text/plain;
	      charset="us-ascii"
X-UIDL: 6]Od9:@7e9(Bm!!LjK!!

Hi,

I'd like to change Sun's vote for 4113 from YES to NO.  I really would like
it resolved in this vote, but agree that the wording has always been quite
awkward.  I'd support Michi's suggestion below.

Thanks,
Everett


> -----Original Message-----
> From: Michi Henning [mailto:michi.henning@iona.com]
> Sent: Monday, June 04, 2001 3:07 PM
> To: Interoperability RTF
> Subject: Re: On 4113
>
>
> On Sat, 2 Jun 2001, Michi Henning wrote:
>
> > Third, it's been wrong since the day dot, and no-one seems to have noticed:
> >
> >	Both the string length and contents include a terminating null.
> >			^^^^^^
> >
> > What the hell do we need a terminating null (or NUL) for the string *length*
> > for?! It's an unsigned long, after all.
>
> Ah, just got the insight. It means that the count includes the NUL.
> The wording is awful though. I would suggest:
>
>	The string contents include a terminating null character. The string
>	length includes the null character, so an empty string has a length
>	of 1.
>
>							Cheers,
>
>								Michi.


Date: Sat, 09 Jun 2001 00:38:57 +0100
From: Simon Nash <nash@hursley.ibm.com>
Organization: IBM
X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I)
X-Accept-Language: en
MIME-Version: 1.0
To: terutt@lucent.com
CC: interop@omg.org
Subject: Re: Interop Final Wordsmith before Vote 3
References: <3B21422F.55EBC160@lucent.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: M4hd9)OR!!f*0e9[_Ud9
Status: RO

Tom,
My suggested changes for 4113 are marked with ^^^^ below.

   Simon

> 
> Revised Text:
> 
> Change the first para of 15.3.2.7 from:
> 
>      A string is encoded as an unsigned long indicating the length of the
>      string in octets, followed by the string value in single- or multi-byte
>      form represented as a sequence of octets. Both the string length and
>      contents include a terminating null.
> 
>   to read:
>         A string is encoded as an unsigned long indicating the length of
>         the string in octets, followed by the string value in single- or
>         multi-byte form represented as a sequence of octets. The
>         string contents include a terminating null character.  The string
>         length includes the null character, so an empty string has a
>         length of 1.
>
A string is encoded as an unsigned long indicating the length of
the string in octets, followed by the string value in single- or
multi-byte form represented as a sequence of octets. The
string contents include a single terminating null character. The string
                          ^^^^^^
length includes the null character, so an empty string has a length of 1.
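
As an illustration of the revised first paragraph (not part of the proposed
spec text), here is a minimal Python sketch of the narrow-string encoding.
The function names are hypothetical, and the sketch assumes the string starts
at an already aligned offset, so it ignores CDR alignment padding.

```python
import struct


def encode_cdr_string(value: bytes, little_endian: bool = False) -> bytes:
    """Encode a narrow string as unsigned long length + octets + one NUL.

    Hypothetical helper for illustration; alignment padding is ignored.
    """
    if b"\x00" in value:
        raise ValueError("IDL strings cannot contain null octets")
    fmt = "<I" if little_endian else ">I"
    # The length counts the terminating null, so b"" is sent with length 1.
    return struct.pack(fmt, len(value) + 1) + value + b"\x00"


def decode_cdr_string(data: bytes, little_endian: bool = False) -> bytes:
    """Decode a string, insisting on exactly one terminating null."""
    fmt = "<I" if little_endian else ">I"
    (length,) = struct.unpack_from(fmt, data, 0)
    if length < 1:
        raise ValueError("length must include the terminating null")
    body = data[4:4 + length]
    if len(body) != length or body[-1] != 0:
        raise ValueError("string is truncated or not null-terminated")
    if 0 in body[:-1]:
        raise ValueError("more than one null inside the counted octets")
    return body[:-1]
```

With this sketch an empty string is encoded as five octets: a length of 1
followed by the single null.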
 
> Change the third para of 15.3.2.7 from:
> 
>      For GIOP version 1.1, a wide string is encoded as an unsigned long
>      indicating the length of the string in octets or unsigned integers
>      (determined by the transfer syntax for wchar) followed by the individual
>      wide characters. Both the string length and contents include a
>      terminating null. The terminating null character for a wstring is also
>      a wide character.
> 
>   to read:
> 
>         For GIOP version 1.1, a wide string is encoded as an unsigned
>         long indicating the length of the string in octets or unsigned
>         integers (determined by the transfer syntax for wchar) followed
>         by the individual wide characters. The string contents include
>         a terminating null character. The terminating null character
>         for a wstring is also a wide character.
>
For GIOP version 1.1, a wide string is encoded as an unsigned
long indicating the length of the string in octets or unsigned
integers (determined by the transfer syntax for wchar) followed
by the individual wide characters. The string contents include
a single terminating null character. The string length includes
  ^^^^^^                             ^^^^^^^^^^^^^^^^^^^^^^^^^^ 
the null character. The terminating null character for a wstring
^^^^^^^^^^^^^^^^^^^
is also a wide character.
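
To make the GIOP 1.1 wording concrete, a hedged Python sketch follows. It
assumes a fixed-width transfer syntax with 2-octet wide characters, so the
length is counted in 2-octet unsigned integers. The function name is
hypothetical and alignment padding is ignored.

```python
import struct


def encode_wstring_giop11(text: str, little_endian: bool = False) -> bytes:
    """Encode a GIOP 1.1 wstring under an assumed 2-octet wchar syntax.

    Hypothetical helper: the length is the number of wide characters,
    including the single terminating wide null.
    """
    units = [ord(ch) for ch in text]
    if any(u == 0 or u > 0xFFFF for u in units):
        raise ValueError("character is null or does not fit in 2 octets")
    units.append(0)  # the terminating null is itself a wide character
    order = "<" if little_endian else ">"
    return struct.pack(f"{order}I{len(units)}H", len(units), *units)
```

Under these assumptions an empty wstring has a length of 1 and occupies six
octets: the 4-octet length plus one 2-octet null.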
 
> Change the fourth para of 15.3.2.7 from:
> 
>      For GIOP version 1.2, when encoding a wstring, always encode the
>      length as the total number of octets used by the encoded value,
>      regardless of whether the encoding is byte-oriented or not. For GIOP
>      version 1.2 a wstring is not terminated by a NUL character. In
>      particular, in GIOP version 1.2 a length of 0 is legal for wstring.
> 
>   to read:
> 
>         For GIOP version 1.2, when encoding a wstring, always encode the
>         length as the total number of octets used by the encoded value,
>         regardless of whether the encoding is byte-oriented or not. For
>         GIOP version 1.2 a wstring is not terminated by a null character.
>         In particular, in GIOP version 1.2 a length of 0 is legal for
>         wstring.
> 
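
For contrast, the GIOP 1.2 rule quoted above can be sketched as follows. The
sketch assumes UTF-16 octets in big-endian order without a byte order mark as
the wchar transfer syntax. The function name is hypothetical and alignment
padding is ignored.

```python
import struct


def encode_wstring_giop12(text: str, little_endian: bool = False) -> bytes:
    """Encode a GIOP 1.2 wstring: octet count, then octets, no terminator.

    Hypothetical helper; the UTF-16 big-endian payload is an assumption.
    Only the length field follows the message byte order.
    """
    payload = text.encode("utf-16-be")
    fmt = "<I" if little_endian else ">I"
    # No null is appended, so an empty wstring legally has length 0.
    return struct.pack(fmt, len(payload)) + payload
```

Here an empty wstring is just the four octets of a zero length, whereas the
GIOP 1.1 and narrow-string forms always carry at least the terminating null.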

-- 
Simon C Nash, Chief Technical Officer, IBM Java Technology
Tel. +44-1962-815156   Fax +44-1962-818999    Hursley, England
Internet: nash@hursley.ibm.com   Lotus Notes: Simon Nash@ibmgb