Issue 1531: Typecode encoding is too long (interop)
Source: (, )
Nature: Uncategorized Issue
Severity:
Summary: Proposal: allow for alternate, compact encoding of typecodes. It is clear that typecodes are quite long when encoded into CDR and transmitted over the wire. This is particularly painful when the CORBA::Any type is used in interfaces. Besides, the use of any is becoming increasingly important in multiple CORBA specifications, and a global solution to this problem should be provided. My proposal is to allow for an alternate encoding of typecodes, for those specific cases where type identification can be achieved either by other application-specific mechanisms (for example, name-value pairs) or where the Repository Id of the type is enough to reconstruct the full typecode information locally on the receiving side.
Resolution:
Revised Text:
Actions taken:
June 18, 1998: received issue
February 17, 1999: closed issue
Discussion:
End of Annotations:=====

Return-Path:
Date: Thu, 18 Jun 1998 09:48:57 -0600
From: Javier Lopez-Martin
To: interop@omg.org, issues@omg.org
Subject: Issue: typecodes too long
Content-Md5: rkD7uShVsVA5yVNCxb2Ppg==

Issue: typecode encoding is too long, verbose

Proposal: allow for alternate, compact encoding of typecodes

Specifics:

It is clear that typecodes are quite long when encoded into CDR and transmitted over the wire. This is particularly painful when the CORBA::Any type is used in interfaces. Besides, the use of any is becoming increasingly important in multiple CORBA specifications, and a global solution to this problem should be provided.

My proposal is to allow for an alternate encoding of typecodes, for those specific cases where type identification can be achieved either by other application-specific mechanisms (for example, name-value pairs) or where the Repository Id of the type is enough to reconstruct the full typecode information locally on the receiving side.
Specifically, I would propose to use a separate TCKind value (say, tk_implicit or tk_compact, or tk_opaque ... whatever feels better), encoded as follows:

TCKind       Integer Value   Type      Parameters
----------------------------------------------------------------------------
tk_implicit  23              complex   string (repository ID)
----------------------------------------------------------------------------

The Repository Id is mandatory, and corresponds EXACTLY to the rep id of the type being transmitted. This type could be one of tk_struct, tk_union, tk_enum (I'm not sure about tk_alias, tk_except and tk_objref).

Therefore, this only applies to the above-mentioned types (struct, union and enum). The reason is that other types are not required to have repository ids, and that their typecode encoding is not so verbose (therefore, the gains are much smaller, if any).

How to enable the use of these compact types is another issue, which I have not completely thought about yet. Any suggestions in this direction are really welcome. The options I have been considering up to now are somewhat related to the proposals made in conjunction with the typecode equality/equivalence proposal, and also potentially some modification to GIOP to allow for negotiation of this type of information. Anyhow, as I said, suggestions are very welcome!

Javier Lopez-Martin
Hewlett-Packard Co
javier@cnd.hp.com

Return-Path:
Date: Thu, 18 Jun 1998 10:54:51 PDT
Sender: Bill Janssen
From: Bill Janssen
To: interop@omg.org, Javier Lopez-Martin
Subject: Re: Issue: typecodes too long
References: <199806181548.AA091694937@ovdm40.cnd.hp.com>

Great idea! What RFP are you going to add it to?

One possibility is to add another repository ID form, which would be a URI of some form (IOR:?). Use of tk_implicit would be restricted to types with these kinds of repository IDs. If you need the full description of the typecode, you retrieve the URI. Perhaps some form of the IOR of an object which is a description of the type in an IR.
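[The size argument behind the original proposal can be made concrete with a sketch. This is a hypothetical illustration, not from any spec or ORB: a simplified CDR-style encoder comparing a full tk_struct typecode against the proposed tk_implicit form carrying only the repository ID. The alignment handling is simplified, every member is assumed to be tk_long, and the repository ID, struct name, and member names are invented for illustration.]

```python
import struct

def cdr_string(s):
    # CDR-style string: 4-byte length (including NUL), bytes, NUL, padded to 4.
    data = s.encode("ascii") + b"\0"
    out = struct.pack("<I", len(data)) + data
    return out + b"\0" * ((-len(out)) % 4)

def full_struct_typecode(repo_id, name, members):
    # Simplified tk_struct (TCKind 15): kind, then an encapsulation holding a
    # byte-order flag, repo id, name, member count, and per-member
    # name + member typecode (here: tk_long, TCKind 3, for every member).
    body = b"\0"  # byte-order flag of the encapsulation
    body += cdr_string(repo_id) + cdr_string(name)
    body += struct.pack("<I", len(members))
    for m in members:
        body += cdr_string(m) + struct.pack("<I", 3)  # member name + tk_long
    return struct.pack("<I", 15) + struct.pack("<I", len(body)) + body

def implicit_typecode(repo_id):
    # Proposed tk_implicit (TCKind 23): nothing but the repository ID.
    body = b"\0" + cdr_string(repo_id)
    return struct.pack("<I", 23) + struct.pack("<I", len(body)) + body

repo_id = "IDL:omg.org/Example/NamedValue:1.0"
full = full_struct_typecode(repo_id, "NamedValue", ["name", "value", "flags"])
compact = implicit_typecode(repo_id)
print(len(full), len(compact))  # the compact form is a fraction of the size
```

[Even for this tiny three-member struct the compact form is less than half the size; for nested structs and unions, whose member typecodes recurse, the gap widens quickly.]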
Bill

Return-Path:
Date: Thu, 18 Jun 1998 12:36:43 -0600
From: Javier Lopez-Martin
To: interop@omg.org, javier@cnd.hp.com, janssen@parc.xerox.com
Subject: Re: Issue: typecodes too long
Content-Md5: 6T1jDbEUK5HOjQqpqAKUZA==

Hi Bill,

> Great idea! What RFP are you going to add it to?

Thanks, my intention was that this could go into the CORE RTF, maybe the interop RTF. But no RFP: too long, for a relatively minor change (in my opinion).

> One possibility is to add another repository ID form, which would be a URI of some form (IOR:?). Use of tk_implicit would be restricted to types with these kinds of repository IDs. If you need the full description of the typecode, you retrieve the URI. Perhaps some form of the IOR of an object which is a description of the type in an IR.

One way to do this would be to embed an IOR as an optional parameter to the typecode, and the IOR would be one for an IR object that could reconstruct the full typecode. But this would defeat most of the advantages I was trying to achieve: smaller typecodes ...

Javier

Return-Path:
Date: Thu, 18 Jun 1998 12:56:20 PDT
Sender: Bill Janssen
From: Bill Janssen
To: interop@omg.org, javier@cnd.hp.com, janssen@parc.xerox.com, Javier Lopez-Martin
Subject: Re: Issue: typecodes too long
References: <199806181836.AA121365003@ovdm40.cnd.hp.com>

Unfortunately, it's too big a (good) change for an RTF.

Bill

Return-Path:
Sender: jon@floorboard.com
Date: Thu, 18 Jun 1998 13:52:16 -0700
From: Jonathan Biggar
To: Javier Lopez-Martin
CC: interop@omg.org, issues@omg.org
Subject: Re: Issue: typecodes too long
References: <199806181548.AA091694937@ovdm40.cnd.hp.com>

Javier Lopez-Martin wrote:
> Specifically, I would propose to use a separate TCKind value (say, tk_implicit or tk_compact, or tk_opaque ...
> whatever feels better), encoded as follows:
>
> TCKind       Integer Value   Type      Parameters
> ----------------------------------------------------------------------------
> tk_implicit  23              complex   string (repository ID)
> ----------------------------------------------------------------------------
>
> The Repository Id is mandatory, and corresponds EXACTLY to the rep id of the type being transmitted. This type could be one of tk_struct, tk_union, tk_enum (I'm not sure about tk_alias, tk_except and tk_objref).
>
> Therefore, this only applies to the above mentioned types (struct, union and enum). The reason is that other types are not required to have repository ids, and that their typecode encoding is not so verbose (therefore, the gains are much smaller, if any).
>
> How to enable the use of these compact types is another issue, which I have not completely thought about yet. Any suggestions in this direction are really welcome. The options I have been considering up to now are somewhat related to the proposals made in conjunction with the typecode equality/equivalence proposal, and also potentially some modification to GIOP to allow for negotiation of this type of information. Anyhow, as I said, suggestions are very welcome!

It's a nice idea, but unless we can figure out how to negotiate the usage of this new typecode value, it won't fly. In particular, this pretty much requires the availability of the Interface Repository on the receiving side, so unless that side can inform the sender that it doesn't have the IR, we would have a major interoperability issue.
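[Jon's interoperability concern can be sketched as follows. This is a hypothetical illustration, with invented names and an invented registry, not part of any ORB's API: a receiver handed only a repository ID must resolve it against some local source of type knowledge, and has nothing to fall back on when resolution fails.]

```python
# Hypothetical receiver-side handling of a tk_implicit typecode.
# LOCAL_TYPES stands in for whatever local source of type knowledge the
# receiver has (an IFR cache, application-level tables, etc.).

class UnknownTypeError(Exception):
    """Raised when a compact typecode cannot be resolved locally."""

LOCAL_TYPES = {
    # repository ID -> locally known full type description (illustrative)
    "IDL:omg.org/Example/NamedValue:1.0": ("struct", ["name", "value", "flags"]),
}

def resolve_implicit(repo_id):
    try:
        return LOCAL_TYPES[repo_id]
    except KeyError:
        # Jon's point: without an IR (or an equivalent local source), the
        # receiver cannot reconstruct the type and extraction must fail.
        raise UnknownTypeError(repo_id)

print(resolve_implicit("IDL:omg.org/Example/NamedValue:1.0"))
```

[The failure path is the crux of the thread: whether the sender can ever learn, before marshalling, that the receiver would end up in the `UnknownTypeError` branch.]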
--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

Return-Path:
Date: Thu, 18 Jun 1998 15:40:45 -0600
From: Javier Lopez-Martin
To: javier@cnd.hp.com, jon@floorboard.com
Subject: Re: Issue: typecodes too long
Cc: interop@omg.org, issues@omg.org
Content-Md5: RpBLZax2i++ogJpbSp22Kg==

Hi Jon,

> Javier Lopez-Martin wrote:
> > Specifically, I would propose to use a separate TCKind value (say, tk_implicit or tk_compact, or tk_opaque ... whatever feels better), encoded as follows:
> >
> > TCKind       Integer Value   Type      Parameters
> > ----------------------------------------------------------------------------
> > tk_implicit  23              complex   string (repository ID)
> > ----------------------------------------------------------------------------
> >
> > The Repository Id is mandatory, and corresponds EXACTLY to the rep id of the type being transmitted. This type could be one of tk_struct, tk_union, tk_enum (I'm not sure about tk_alias, tk_except and tk_objref).
> >
> > Therefore, this only applies to the above mentioned types (struct, union and enum). The reason is that other types are not required to have repository ids, and that their typecode encoding is not so verbose (therefore, the gains are much smaller, if any).
> >
> > How to enable the use of these compact types is another issue, which I have not completely thought about yet. Any suggestions in this direction are really welcome. The options I have been considering up to now are somewhat related to the proposals made in conjunction with the typecode equality/equivalence proposal, and also potentially some modification to GIOP to allow for negotiation of this type of information. Anyhow, as I said, suggestions are very welcome!
>
> It's a nice idea, but unless we can figure out how to negotiate the usage of this new typecode value, it won't fly.
> In particular, this pretty much requires the availability of the Interface Repository on the receiving side, so unless that side can inform the sender that it doesn't have the IR, we would have a major interoperability issue.

No, the original idea is that the type information should be available by some other mechanism (maybe application specific, maybe the IR was available ...).

What I would like is to be able to receive the value with this typecode, and then extract it to a typed variable, of the right type. The extraction could be safe, because we would have defined the TypeCode::equivalent() operation to take this into account (the comparison would take into account the repository id only in case tk_implicit is involved). On the sending side, this could be explicit: the any would be sent saying something like:

    ppc->push(any_var->implicit());

If the receiving side is unable to process it, then some kind of error would be produced (in particular, the extraction would fail).

Another way to do this would be via the IORs: we could add a profile stating whether an object is willing to accept anys in this format or not; by default, it does not (to keep backwards compatibility).

More thoughts?

Javier Lopez-Martin
Hewlett-Packard Co
javier@cnd.hp.com

Return-Path:
Date: Thu, 18 Jun 1998 16:28:30 -0700
From: Lewis Stiller
To: javier@cnd.hp.com
CC: jon@floorboard.com, interop@omg.org
Subject: Re: Issue: typecodes too long

We made a similar suggestion some time ago, although Javier's proposal is more detailed and well thought out. We support Javier's proposal.

> From: Javier Lopez-Martin
> Content-Md5: RpBLZax2i++ogJpbSp22Kg==
>
> Hi Jon,
>
> > Javier Lopez-Martin wrote:
> > > Specifically, I would propose to use a separate TCKind value (say, tk_implicit or tk_compact, or tk_opaque ...
> > > whatever feels better), encoded as follows:
> > >
> > > TCKind       Integer Value   Type      Parameters
> > > ----------------------------------------------------------------------------
> > > tk_implicit  23              complex   string (repository ID)
> > > ----------------------------------------------------------------------------
> > >
> > > The Repository Id is mandatory, and corresponds EXACTLY to the rep id of the type being transmitted. This type could be one of tk_struct, tk_union, tk_enum (I'm not sure about tk_alias, tk_except and tk_objref).
> > >
> > > Therefore, this only applies to the above mentioned types (struct, union and enum). The reason is that other types are not required to have repository ids, and that their typecode encoding is not so verbose (therefore, the gains are much smaller, if any).
> > >
> > > How to enable the use of these compact types is another issue, which I have not completely thought about yet. Any suggestions in this direction are really welcome. The options I have been considering up to now are somewhat related to the proposals made in conjunction with the typecode equality/equivalence proposal, and also potentially some modification to GIOP to allow for negotiation of this type of information. Anyhow, as I said, suggestions are very welcome!
> >
> > It's a nice idea, but unless we can figure out how to negotiate the usage of this new typecode value, it won't fly. In particular, this pretty much requires the availability of the Interface Repository on the receiving side, so unless that side can inform the sender that it doesn't have the IR, we would have a major interoperability issue.
>
> No, the original idea is that the type information should be available by some other mechanism (maybe application specific, maybe the IR was available ...).
> What I would like is to be able to receive the value with this typecode, and then extract it to a typed variable, of the right type. The extraction could be safe, because we would have defined the TypeCode::equivalent() operation to take this into account (the comparison would take into account the repository id only in case tk_implicit is involved). On the sending side, this could be explicit: the any would be sent saying something like:
>
>     ppc->push(any_var->implicit());
>
> If the receiving side is unable to process it, then some kind of error would be produced (in particular, the extraction would fail).
>
> Another way to do this would be via the IORs: we could add a profile stating whether an object is willing to accept anys in this format or not; by default, it does not (to keep backwards compatibility).
>
> More thoughts?
>
> Javier Lopez-Martin
> Hewlett-Packard Co
> javier@cnd.hp.com

Dr. Lewis Stiller
Franz Inc.
1995 University Ave., Berkeley, CA 94704
stiller@franz.com

Return-Path:
To: interop@omg.org, issues@omg.org
Subject: Re: Issue: typecodes too long
Date: Mon, 22 Jun 1998 13:56:09 +1000
From: Keith Duddy

Hi all,

I'm glad to see that typecodes are back on the agenda...

Stephen Crawley and I have a paper accepted to Middleware'98 about the IR and typecodes (most of the credit goes to Stephen). We thought that it might add something to the debate. After it is officially published (Sept) we will adapt it for an OMG-specific audience and publish a green paper. For now you can have an advance look via:

http://www.dstc.edu.au/AU/staff/crawley/papers/Middleware98.ps

We address issues with TypeCode::is_equivalent, and propose a new RepositoryId format, as well as a mechanism for determining equivalences between these and existing RepositoryIds.
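[For context on what a "new RepositoryId format" would have to coexist with: CORBA repository IDs are prefixed strings, the most common being the IDL format `IDL:<scoped name>:<major>.<minor>`, alongside RMI:, DCE: and LOCAL: forms. The sketch below is an illustrative parser, not anything from the paper or the spec; only the IDL format is decomposed, and other prefixes are treated as opaque.]

```python
# Illustrative decomposition of standard CORBA RepositoryId strings.

def parse_repository_id(rep_id):
    prefix, _, rest = rep_id.partition(":")
    if prefix == "IDL":
        # IDL:<scoped name>:<major>.<minor>
        scoped_name, _, version = rest.rpartition(":")
        major, _, minor = version.partition(".")
        return {"format": "IDL",
                "scoped_name": scoped_name,
                "version": (int(major), int(minor))}
    # RMI:, DCE:, LOCAL:, and any proposed future formats are opaque here.
    return {"format": prefix, "body": rest}

print(parse_repository_id("IDL:omg.org/CORBA/TypeCode:1.0"))
```

[The thread's equivalence question is precisely that two distinct strings of this shape may denote the same type, and one string may denote different types in different IFRs; no amount of local parsing resolves that.]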
cheers,
K

--
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Keith Duddy :: dud@dstc.edu.au :: http://www.dstc.edu.au/AU/staff/dud
CRC for Distributed Systems Technology (DSTC)
Gehrmann Labs, University of Queensland, 4072, Australia
ph: +61 7 336 5 4310 :: fx: +61 7 336 5 4311
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
2nd edition of my book ``Java Programming with CORBA'' now in bookshops
>>> http://www.wiley.com/compbooks/vogel <<<
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Return-Path:
Date: Mon, 22 Jun 1998 10:15:46 -0600
From: Javier Lopez-Martin
To: interop@omg.org, issues@omg.org, dud@dstc.edu.au
Subject: Re: Issue: typecodes too long
Content-Md5: LtbpJRoG52N93z9DKoUo9Q==

Hi Keith,

> I'm glad to see that typecodes are back on the agenda...
>
> Stephen Crawley and I have a paper accepted to Middleware'98 about the IR and typecodes (most of the credit goes to Stephen). We thought that it might add something to the debate. After it is officially published (Sept) we will adapt it for an OMG-specific audience and publish a green paper. For now you can have an advance look via:
>
> http://www.dstc.edu.au/AU/staff/crawley/papers/Middleware98.ps
>
> We address issues with TypeCode::is_equivalent, and propose a new RepositoryId format, as well as a mechanism for determining equivalences between these and existing RepositoryIds.

I read the paper with a lot of interest, and I found it very useful, and in general, accurate (although some things have changed since the last resolutions from the core/interop RTF on making the RepID mandatory always ...). However, I think that the discussion in the paper, being very useful and interesting, does not address the fundamental issue that I raised: the size of on-the-wire typecodes is way too big.
In fact, with some of your proposals, the problem would not be any better, but worse (I believe; I haven't done a complete evaluation of your proposals, but so it seems).

What I am proposing is a low-impact way to send the repository id (and only the repository id) as the typecode within an any. The application will be responsible for getting the rest of the typecode information (if needed), either by application-defined mechanisms or from an IFR. Your proposal for different RepID formats is orthogonal to this one ...

Javier Lopez-Martin
Hewlett-Packard Co
javier@cnd.hp.com

Return-Path:
To: Javier Lopez-Martin
cc: interop@omg.org, issues@omg.org, dud@dstc.edu.au, crawley@dstc.edu.au
Subject: Re: Issue: typecodes too long
Date: Tue, 23 Jun 1998 10:50:19 +1000
From: Stephen Crawley

> I read the paper with a lot of interest, and I found it very useful, and in general, accurate (although some things have changed since the last resolutions from the core/interop RTF on making the RepID mandatory always ...).
>
> However, I think that the discussion in the paper, being very useful and interesting, does not address the fundamental issue that I raised: the size of on-the-wire typecodes is way too big. In fact, with some of your proposals, the problem would not be any better, but worse (I believe; I haven't done a complete evaluation of your proposals, but so it seems).
>
> What I am proposing is a low-impact way to send the repository id (and only the repository id) as the typecode within an any. The application will be responsible for getting the rest of the typecode information (if needed), either by application-defined mechanisms or from an IFR.
>
> Your proposal for different RepID formats is orthogonal to this one ...

Javier,

As you will be aware having read our paper, the theme is that CORBA's runtime type system has holes, one of which is that a given RepositoryId string can mean different things.
Currently, this only seriously impacts on the use of object references. The type safety of data values within Anys is currently unaffected.

A key problem with your proposal is that, since it uses Repository Ids for type checking of data types, it actually increases the scope for damage when the Repository Ids are mismanaged. In particular, if the ORBs that encoded and decoded an Any do not "agree" on the IFR definition of the Any's type, the decoding is liable to fail. In some circumstances the decoding ORB won't even be able to tell, and will deliver some random garbage to the client program. However, if our RepositoryId proposal was deployed, it should eliminate the problem of an id string meaning different types to different IFRs, and hence different ORBs. Clearly, our proposals are NOT orthogonal if we consider overall type safety!

Potentially, a second problem is that your proposal relies on the decoding ORB being able to retrieve the type definition from the IFR. If the decoding ORB can't find the definition, it is potentially stuffed! This could happen because:

a) the decoding ORB's IFR is down
b) the decoding ORB's IFR hasn't been populated with the type's interface
c) the decoding ORB's IFR contains a different version of the interface

[These problems are all fixed, or easier to address, in the context of Keith and my paper's proposed IFR extensions.]

Your idea of negotiating with the sender of the Any about transmission of a full TypeCode won't always work because the sender may not have that information. For example:

a) the sender may not have done the encoding; i.e. the Any may have come from somewhere else
b) the sender may no longer have the type definition; i.e. the IFR definitions have been deleted / lost since the Any was encoded

With the current CORBA specification of IFRs and Repository Ids, your proposal has (in my opinion) significant potential impact on reliability in a number of situations.
Admittedly, in most of these, "mismanagement" of the ORB is also a contributing factor. IMO, it would be a bad idea to introduce a performance optimisation into the CORBA core at the expense of type safety and increased vulnerability to network and server failure.

That having been said, I think your proposal is worth pursuing. But not just yet.

-- Steve

Return-Path:
Date: Mon, 22 Jun 1998 19:22:58 -0600
From: Javier Lopez-Martin
To: javier@cnd.hp.com, crawley@dstc.edu.au
Subject: Re: OMG:Re: Issue: typecodes too long
Cc: interop@omg.org, issues@omg.org, dud@dstc.edu.au, crawley@dstc.edu.au
Content-Md5: KLpsPAd4TLcD1H03PtreJA==

Stephen,

> As you will be aware having read our paper, the theme is that CORBA's runtime type system has holes, one of which is that a given RepositoryId string can mean different things. Currently, this only seriously impacts on the use of object references. The type safety of data values within Anys is currently unaffected.

My proposal would only affect data values within anys.

> A key problem with your proposal is that, since it uses Repository Ids for type checking of data types, it actually increases the scope for damage when the Repository Ids are mismanaged. In particular, if the ORBs that encoded and decoded an Any do not "agree" on the IFR definition of the Any's type, the decoding is liable to fail. In some circumstances the decoding ORB won't even be able to tell, and will deliver some random garbage to the client program.

Well, I think that the key here is "mismanaged": if the repository ids are not managed according to the spec (that is, the same repository id identifies a type, and not having the same repository id means the types are different, although maybe equivalent), then my proposal has a problem. But so does CORBA in general: a bunch of things would stop working, both within mono-ORB environments and multi-ORB environments (this case being much worse).
So, in my opinion, I'm not introducing any new requirement in this respect; only using the repository ids the way they were designed.

> However, if our RepositoryId proposal was deployed, it should eliminate the problem of an id string meaning different types to different IFRs, and hence different ORBs. Clearly, our proposals are NOT orthogonal if we consider overall type safety!

Yes, it is orthogonal, because currently the problem appears when there is mismanagement of repository ids. Your proposal would defend against these misuse cases, and mine relies on the fact that there is no misuse. I would say they are complementary, but orthogonal: your proposal does not need mine, and mine doesn't need yours.

> Potentially, a second problem is that your proposal relies on the decoding ORB being able to retrieve the type definition from the IFR. If the decoding ORB can't find the definition, it is potentially stuffed! This could happen because:
>
> a) the decoding ORB's IFR is down
> b) the decoding ORB's IFR hasn't been populated with the type's interface
> c) the decoding ORB's IFR contains a different version of the interface

There is no requirement in my proposal that the IFR be available on the receiving side. What I said is that the type information may be carried by application-specified means, and therefore available to the application regardless of the availability of the IFR. Another possible scenario, where both ORBs have agreed on the IFR contents (or on a fixed set of types being carried in the anys), might be affected in your scenarios; however, this would be violating the premises under which this optimization could be used.

> [These problems are all fixed, or easier to address in the context of Keith and my paper's proposed IFR extensions.]

If I understood the paper correctly, it relies on the availability of an IFR, namely the original one that encoded the repository id. So, the same problems might happen if the remote IFR is not available.
Besides, carrying an IOR in the typecode would be expensive (in terms of bandwidth), and accessing a remote IFR could mean a performance penalty. Yes, it could improve the reliability, but at a very high price in performance and bandwidth. I might not have understood your proposal correctly, so please correct me if I am wrong.

> Your idea of negotiating with the sender of the Any about transmission of a full TypeCode won't always work because the sender may not have that information. For example:
>
> a) the sender may not have done the encoding; i.e. the Any may have come from somewhere else
> b) the sender may no longer have the type definition; i.e. the IFR definitions have been deleted / lost since the Any was encoded

The sender must always have the full information, or else it should not have accepted the compacted any in the first place (when receiving it from the place where it was originally encoded).

> With the current CORBA specification of IFRs and Repository Ids, your proposal has (in my opinion) significant potential impact on reliability in a number of situations. Admittedly, in most of these, "mismanagement" of the ORB is also a contributing factor. IMO, it would be a bad idea to introduce a performance optimisation into the CORBA core at the expense of type safety and increased vulnerability to network and server failure.

What I think is that we are looking at very rare cases, most if not all of them misuses. Granted, in those misuse cases, the application won't be able to do anything but lose the information/raise an exception. But the vast majority of uses of any are somewhat predictable, because there is a limited number of types being carried in the any, or because the application has access to type information by other means (for example, in name-value pairs, the name almost always determines the type, and if it does not, then there is a limited number of types that might go there).
Improving performance in the broader case is, in my opinion, very important if we want to make CORBA (more) successful in the real business world. Looking for perfection is a good goal, but the good of the majority should be considered as well ...

> That having been said, I think your proposal is worth pursuing. But not just yet.

I don't think the proposal is too dramatic. Granted, it could have significant impact on end-user applications and performance, but in terms of what it requires from ORBs (and specifically, from ORB vendors) it is very moderate, with a definite gain for users ... I still think it is worth trying to get this into the CORBA 2.3 spec.

Javier Lopez-Martin
Hewlett-Packard Co
javier@cnd.hp.com

Return-Path:
To: Javier Lopez-Martin
cc: crawley@dstc.edu.au, interop@omg.org, issues@omg.org, dud@dstc.edu.au
Subject: Re: OMG:Re: Issue: typecodes too long
Date: Tue, 23 Jun 1998 12:18:31 +1000
From: Stephen Crawley

Javier,

> > As you will be aware having read our paper, the theme is that CORBA's runtime type system has holes, one of which is that a given RepositoryId string can mean different things. Currently, this only seriously impacts on the use of object references. The type safety of data values within Anys is currently unaffected.
>
> My proposal would only affect data values within anys.

I'm quite aware of this.

> > A key problem with your proposal is that, since it uses Repository Ids for type checking of data types, it actually increases the scope for damage when the Repository Ids are mismanaged. In particular, if the ORBs that encoded and decoded an Any do not "agree" on the IFR definition of the Any's type, the decoding is liable to fail. In some circumstances the decoding ORB won't even be able to tell, and will deliver some random garbage to the client program.
> Well, I think that the key here is "mismanaged": if the repository ids are not managed according to the spec (that is, the same repository id identifies a type, and not having the same repository id means the types are different, although maybe equivalent), then my proposal has a problem. But so does CORBA in general: a bunch of things would stop working, both within mono-ORB environments and multi-ORB environments (this case being much worse). So, in my opinion, I'm not introducing any new requirement in this respect; only using the repository ids the way they were designed.

As we argued in our paper, the mismanagement issue is going to be increasingly significant. As CORBA starts to be used between different domains of administration, within and between organisations, both the likelihood and the impact of mistakes involving Repository Ids are likely to increase significantly. This, coupled with the facts that:

a) CORBA has no support for federation of IFRs, and
b) CORBA has inadequate support for interface versioning

makes me think that this is going to be a significant problem that could give CORBA a bad name ... if it is not fixed soonish.

> > However, if our RepositoryId proposal was deployed, it should eliminate the problem of an id string meaning different types to different IFRs, and hence different ORBs. Clearly, our proposals are NOT orthogonal if we consider overall type safety!
>
> Yes, it is orthogonal, because currently the problem appears when there is mismanagement of repository ids. Your proposal would defend against these misuse cases, and mine relies on the fact that there is no misuse. I would say they are complementary, but orthogonal: your proposal does not need mine, and mine doesn't need yours.

No. Not orthogonal. Without our proposal in place first, your proposal makes CORBA less type-safe in some respects.
> > Potentially, a second problem is that your proposal relies on the decoding ORB being able to retrieve the type definition from the IFR. If the decoding ORB can't find the definition, it is potentially stuffed! This could happen because:
> >
> > a) the decoding ORB's IFR is down
> > b) the decoding ORB's IFR hasn't been populated with the type's interface
> > c) the decoding ORB's IFR contains a different version of the interface
>
> There is no requirement in my proposal that the IFR be available on the receiving side. What I said is that the type information may be carried by application-specified means, and therefore available to the application regardless of the availability of the IFR. Another possible scenario, where both ORBs have agreed on the IFR contents (or on a fixed set of types being carried in the anys), might be affected in your scenarios; however, this would be violating the premises under which this optimization could be used.

OK. I may have misunderstood this.

> > [These problems are all fixed, or easier to address in the context of Keith and my paper's proposed IFR extensions.]
>
> If I understood the paper correctly, it relies on the availability of an IFR, namely the original one that encoded the repository id. So, the same problems might happen if the remote IFR is not available.

In the worst case, that is true. However, we would expect a client ORB to use a local IFR rather than the originating one. We would expect a decent IFR implementation to maintain a persistent cache of repository ids and definitions, and also to provide tools for synchronising contents with other IFRs with which they are formally federated.

> Besides, carrying an IOR in the typecode would be expensive (in terms of bandwidth), and accessing a remote IFR could mean a performance penalty. Yes, it could improve the reliability, but at a very high price in performance and bandwidth.
I might not have understood your > proposal correctly, so please, correct me if I am wrong. First, we have proposed a number of possible representations for the new style RepositoryIds, and there may be other alternatives. Only some of the representations we proposed contained an IOR. Others rely on things like embedded IP addresses, DNS names or special purpose IFR location mechanisms. Second, we didn't spell it out, but we anticipate that a half decent ORB would treat IORs for their interface repository product as a special case. In particular, they should take steps to ensure that they weren't bloated with monstrous object keys, profiles and the like. So, while bandwidth is a problem in theory, it shouldn't be if our proposal is developed in the right direction. > > Your idea of negotiating with the sender of the Any about transmission > > of a full TypeCode won't always work because the sender may not have > > that information. For example, > > > > a) the sender may not have done the encoding; i.e. the Any may have > > come from somewhere else > > b) the sender may no longer have the type definition; i.e. the > > IFR definitions have been deleted / lost since the Any was > > encoded. > > The sender must have the full information always, or else should not have > accepted the compacted any in the first place (when receiving it from the > place where it was originally encoded). This doesn't address point b). > > With the current CORBA specification of IFRs and Repository Ids, > > your proposal has (in my opinion) significant potential impact > > on reliability in a number of situations. Admittedly, in most of > > these, "mismanagement" of the ORB is also a contributing > factor. IMO, > > it would be a bad idea to introduce a performance optimisation > into > > the CORBA core at the expense of type safety and increased > vulnerability > > to network and server failure. > > What I think is that we are looking at very rare cases, most if not > all of > them misuses. 
I doubt that they are really that rare. Just not talked about much. Also, I predict that they will become more common as people build larger and larger CORBA systems, and as they try to link them to systems run by other people. > Granted, in those misuse cases, the application won't be > able to do anything but lose the information/raise an exception. Or in the worst case, continue running obliviously with garbage information. > But the vast majority of uses of any are somewhat predictable, because there > is a limited number of types being carried in the any, or because the > application has access to type information by other means (for example, > in name-value pairs, the name almost always determines the type, and if it > does not, then there is a limited number of types that might go there). All I can say is, take a look at the MOF's "Reflective" interfaces :-) > Improving performance in the broader case is, in my opinion, very important > if we want to make CORBA (more) successful in the real business world. I'm sorry, but performance at the expense of reliability is not what users really need if they are trying to build big systems. > I still think it is worth trying to get this into the CORBA 2.3 spec. That's too soon IMO. But if the OMG is going down this route anyway, the Task Force membership ought to be aware that this is happening. There are a lot of users who wouldn't like this little "surprise". -- Steve Return-Path: Date: Tue, 23 Jun 1998 11:45:29 +0200 From: "Martin v. Loewis" To: crawley@dstc.edu.au CC: interop@omg.org Subject: Re: OMG:Re: Issue: typecodes too long References: <199806230218.MAA24112@piglet.dstc.edu.au> > As we argued in our paper, the mis-management issue is going to be > increasingly significant. As CORBA starts to be used between > different > domains of administration, within and between organisations, both > the > likelihood and the impact of mistakes involving Repository Ids is > likely > to increase significantly. 
How much of a problem is this, really? If you write your types in IDL (as you always should), they all have a well-defined repository ID, even if the repository is not accessible. Problems really start only if pragmas are used in inconsistent ways. I don't see why CORBA should be able to detect these errors at run time, if the problem really is in the configuration management. > This coupled with the facts that: > > a) CORBA has no support for federation of IFRs, and Are you saying it is impossible to implement a distributed IFR? I would not think so. Furthermore, why does that matter? Repository IDs have to be unique with and without an IFR. > b) CORBA has inadequate support for interface versioning It seems possible to implement any kind of versioning scheme I can think of on top of the existing mechanism. Why is that inadequate? Martin Return-Path: To: "Martin v. Loewis" cc: crawley@dstc.edu.au, interop@omg.org Subject: Re: OMG:Re: Issue: typecodes too long Date: Wed, 24 Jun 1998 10:27:43 +1000 From: Stephen Crawley > > As we argued in our paper, the mis-management issue is going to be > > increasingly significant. As CORBA starts to be used between > different > > domains of administration, within and between organisations, both > the > > likelihood and the impact of mistakes involving Repository Ids is > likely > > to increase significantly. > > How much of a problem is this, really? I've been bitten on more than one occasion by one variant of the problem. I've seen it (or at least the symptoms) reported in comp.object.corba a few times. This variant arises when you make a change to your IDL without changing the pragmas, then try to use an old client against a new server or vice versa. Sometimes you get a marshalling failure, sometimes worse, depending on ORB quality and luck. > If you write your types in IDL > (as you always should), they all have a well-defined repository ID, > even if the repository is not accessible. 
Problems really start only > if pragmas are used in inconsistent ways. I don't see why CORBA > should > be able to detect these errors at run time, if the problem really is > in the configuration management. Well, the approach that Keith Duddy and I proposed in our paper solves the problem. It does this by getting rid of the need for those pesky pragmas in the first place. So I would dispute your assertion that the problem is unsolvable except by "good" configuration management. It is the failure of manual configuration management that causes the problem. And I find it very hard to believe that this is not going to get a lot worse as users build larger CORBA systems. > > This coupled with the facts that: > > > > a) CORBA has no support for federation of IFRs, and > > Are you saying it is impossible to implement a distributed IFR? I > would not think so. Federation is not impossible. Indeed the IFR spec talks about it. However standardised support for IFR federation is neither defined nor mandated. And in practice, ORB vendors don't support it. > Furthermore, why does that matter? Repository IDs > have to be unique with and without IFR. Yes they do. [Pedantically: this is not true for local ids. Also you need to be precise about what "unique" means. The invariant currently required is 'equal repository ids <=> equivalent types']. However, without federation of IFR's there is not much prospect of detecting non-unique Repository Ids until it is too late. > > b) CORBA has inadequate support for interface versioning > > It seems possible to implement any kind of versioning scheme I can > think of on top of the existing mechanism. Why is that inadequate? Once again, it is inadequate because the semantics of versioning is unspecified and support for versioning is not mandated. In practice, ORB vendors don't provide anything in this area, and users must solve the interface versioning problem themselves ... or ignore it. 
-- Steve Return-Path: Sender: raz@discreet.com Date: Thu, 25 Jun 1998 21:54:01 -0300 From: Roland Turner Organization: - To: interop@omg.org, issues@omg.org Subject: Re: Issue: typecodes too long References: <199806230050.KAA19088@piglet.dstc.edu.au> A meta-thought on over-long typecodes. This is one of a number of related problems. Michi Henning put up a proposal some time ago for more compact IORs. Bill Janssen has repeatedly made disparaging remarks about the sheer number of zeros that IIOP shuffles about. It is clear that, in some cases, the size of IIOP (or rather GIOP) messages is a concern. GIOP (or rather CDR) errs on the side of facilitating rapid [un]marshalling by ensuring correct alignment, at the expense of lots of bandwidth "waste". This is important: when IIOP is being used over switched > 100Mbps fabric with sub-millisecond latency, it should go fast and, as far as possible, shouldn't be bogged down with bandwidth-optimal marshalling. Conversely, many applications require communication over low-bandwidth networks. CORBA needs to cover both. It would seem to me that any proposal to increase the complexity of CORBA to cut down the size of its communication should go the whole hog and define a CDR which is miserly with bandwidth use, not just clean up one or two areas. ORBs should then be free to negotiate - those with lots of bandwidth between them but worried about lack of CPU can opt for the current approach, those wanting to use a slow network can opt for the more CPU intensive but bandwidth miserly approach. If we continue down the one size fits all approach, we'll end up with a protocol that is still far from optimal on slow networks, but that saddles ORBs on very fast LANs with lots of CPU overhead. Co-incidentally, such an approach provides for transparent migration to the faster protocol - ORBs which don't offer it will continue to work correctly, and it provides another opportunity for QoS trade-offs. 
I have neither the time nor the energy to get deeply into this, but it's an observation that no-one else appears to have made, so I thought it worth raising. I won't say any more on it (please don't follow up with rational argument, I'll ignore it), but if others think that the idea is good, perhaps a different approach to the "CDR is fat" problems can be taken. Incidentally, I also recognise that the current discussion is a little more focussed and in some respects, in a slightly different scope. My point was simply to propose a broader viewpoint. - Raz Return-Path: Date: Tue, 30 Jun 1998 20:49:18 -0600 From: Javier Lopez-Martin To: orb_revision@omg.org, interop@omg.org Subject: New compact TypeCode proposal Content-Md5: ycXaMUDSJ/hb8uxv1AM6+Q== Hi all, This is a resend of issue 1531 (Typecodes too long), currently assigned to Interop but also relevant here in ORB revision. This builds on the proposal that Jon Biggar has just made on typecode comparison, and relies on some of his proposals to solve some issues. Here it goes: A new typecode kind is introduced, called "tk_implicit" (subject to name change if appropriate). This typecode would be encoded as a complex typecode, with a single parameter, the Repository ID (as follows):

TCKind        Integer Value   Type      Parameters
----------------------------------------------------------------------------
tk_implicit   ??              complex   string (repository ID)
----------------------------------------------------------------------------

This typecode could be used as a replacement for a tk_struct, tk_union, tk_enum or tk_except typecode, where the typecode may be reconstructed by application defined mechanisms or by accessing a properly configured interface repository. The semantics of this typecode are that the Repository ID uniquely identifies the corresponding type, and therefore it is enough to reconstruct a full typecode if needed, or to verify type equivalence. 
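For concreteness, the complex-typecode layout above can be sketched as follows. This is a rough Python model under CDR 1.0 rules (big-endian, strings carried with a NUL terminator, parameters wrapped in an encapsulation whose first octet is the endian flag); the TCKind value 23 comes from Javier's first message and is only a placeholder, since no number had actually been assigned.

```python
import struct

TK_IMPLICIT = 23  # placeholder value from the original proposal; not an assigned TCKind

def cdr_string(s: str) -> bytes:
    """CDR 1.0 string: ulong byte count (including the NUL), bytes, NUL terminator."""
    data = s.encode("ascii") + b"\x00"
    return struct.pack(">I", len(data)) + data

def encode_tk_implicit(repo_id: str) -> bytes:
    """Complex typecode: TCKind ulong, then an encapsulation holding the one
    parameter (the repository ID string)."""
    # Encapsulation body: endian flag octet (0x00 = big-endian), then padding
    # so the string's length ulong is 4-byte aligned, then the string itself.
    params = b"\x00" + b"\x00\x00\x00" + cdr_string(repo_id)
    return struct.pack(">I", TK_IMPLICIT) + struct.pack(">I", len(params)) + params

compact = encode_tk_implicit("IDL:Foo/Bar:1.0")
```

Even this simplified model makes the size argument visible: the whole compact typecode is the repository ID plus about a dozen bytes of framing, versus potentially hundreds of bytes for a full tk_struct typecode with member names.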
This proposal does not try to address the situations that might arise in cases where both ends of a communication do not agree on the Repository IDs for a type, or more importantly, think that two different types have the same Repository ID; these situations are considered a mis-configuration problem, to be solved by other means. The Repository Id is mandatory, and is EXACTLY the same as the rep id of the original (non-implicit) type. This type could be one of tk_struct, tk_union, tk_enum or tk_except. Specifically, usage of this mechanism is disallowed for tk_objref (no gains, and might create problems in other areas) and for tk_alias (not significant in comparisons); if the aliases are important for an application, they might be applied to this typecode the same way they would to any other typecode. Another operation, besides those defined in the proposal on typecode equivalence made by Jon Biggar, would be added to the TypeCode pseudo interface:

> 10. Define two new operations on TypeCodes:
>
>     pseudo interface TypeCode {
>       ...
>       TypeCode get_compact_typecode();
>       TypeCode get_canonical_typecode();
        TypeCode get_implicit_typecode();
>       ...
>     };
>
> TypeCode::get_compact_typecode() strips out all optional name & member name fields. TypeCode::get_canonical_typecode() looks up the TypeCode in the Interface Repository and returns the fully fleshed-out TypeCode. If the top level TypeCode does not contain a RepositoryId (such as array and sequence TypeCodes, or TypeCodes from older ORBs), then a new TypeCode is constructed by calling TypeCode::get_canonical_typecode() on each member TypeCode of the original TypeCode. If the Interface Repository is unavailable, or does not contain the information needed, the call raises the INTF_REPOS system exception.

TypeCode::get_implicit_typecode would return the tk_implicit equivalent of the current typecode, if it is of the appropriate kind (tk_struct, tk_union, tk_enum or tk_except), ignoring aliases. 
If the base typecode is not one of the appropriate kinds, then a copy of the same typecode is returned. Whatever tk_alias typecodes may be present in the original typecode are preserved in the (new) implicit typecode. An application willing to use this typecode would have to request it explicitly (by means that would be language mapping dependent) whenever sending a typecode across an interface (either directly, or within a CORBA::Any). The receiving side might reject the value with a MARSHAL exception if this new typecode kind is unknown, or if the repository id/type is not recognized by the ORB (and required); if this is the case, the sender should be ready to resend the operation, this time with a full typecode. With this current proposal:

- There is NO need to make any changes to GIOP
- It is fully upwards compatible with the current version of CORBA
- The changes are VERY minor: one more typecode kind, with its corresponding on-the-wire encoding, and one more operation on the typecode pseudo interface
- If you don't want to use it, you don't have to pay for it (the extra coding that might be required), but if you use it, you might get significant improvements

In the future, if GIOP changes are attempted, some kind of ORB negotiation could be used to do automatic (ORB level) encoding changes, and those would be fully upwards compatible with the current proposal. But these changes are not needed now (in my opinion, anyway). I hope this is acceptable, and does not have any holes :-). 
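The retry obligation this places on the sender might look roughly like this in application code. `stub.invoke` and `MarshalError` are hypothetical stand-ins for the ORB invocation and the CORBA MARSHAL system exception; the real mechanism would go through whatever the language mapping provides.

```python
class MarshalError(Exception):
    """Stand-in for the CORBA MARSHAL system exception."""

def send_any(stub, value, implicit_tc, full_tc):
    """Try the compact tk_implicit typecode first; if the receiver rejects it,
    resend the operation with the full typecode, as the proposal requires."""
    try:
        return stub.invoke(value, implicit_tc)
    except MarshalError:
        # Receiver doesn't know tk_implicit, or can't resolve the repository
        # id locally: fall back to the fully fleshed-out typecode.
        return stub.invoke(value, full_tc)
```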
Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: Date: Wed, 01 Jul 1998 09:11:39 -0400 From: Paul H Kyzivat Organization: NobleNet To: Javier Lopez-Martin CC: orb_revision@omg.org, interop@omg.org Subject: Re: New compact TypeCode proposal References: <199807010249.AA153641358@ovdm40.cnd.hp.com> I have embedded comments below Javier Lopez-Martin wrote: > > Hi all, > > This is a resend of issue 1531 (Typecodes too long), currently > assigned > to Interop but also relevant here in ORB revision. > > This builds on the proposal that Jon Biggar has just made on > typecode > comparison, and relies on some of his proposals to solve some > issues. > > Here it goes: > > A new typecode kind is introduced, called "tk_implicit" (subject to > name > change if appropriate). This typecode would be encoded as a complex > typecode, with a single parameter, the Repository ID (as follows): > > TCKind Integer Value Type Parameters > > > ---------------------------------------------------------------------------- > tk_implicit ?? complex string (repository ID) > > > ---------------------------------------------------------------------------- > > This typecode could be used as a replacement for a tk_struct, > tk_union, > tk_enum or tk_except typecode, where the typecode may be > reconstructed > by application defined mechanisms or by accessing a properly > configured > interface repository. Do you mean "application defined mechanisms", or do you mean "ORB specific mechanisms"? It is usually/often the ORB that needs this information (e.g. for marshalling) so application defined mechanisms aren't especially helpful. > > The semantics of this typecode are that the Repository ID uniquely > identifies > the corresponding type, and therefore it is enough to reconstruct a > full > typecode if needed, or to verify type equivalence. 
> This proposal > does > not try to address the situations that might arise in cases where > both > ends of a communication do not agree on the Repository IDs for a > type, > or more importantly, think that two different types have the same > Repository ID; these situations are considered a mis-configuration > problem, to be solved by other means. > > The Repository Id is mandatory, and is EXACTLY the same as the rep > id > of > the original (non-implicit) type. This type could be one of > tk_struct, > tk_union, tk_enum or tk_except. Specifically, usage of this > mechanism > is disallowed for tk_objref (no gains, and might create problems in > other > areas) and for tk_alias (not significant in comparisons); if the > aliases > are important for an application, they might be applied to this > typecode > the same way they would to any other typecode. > > Another operation, besides those defined in the proposal on typecode > equivalence made by Jon Biggar, would be added to the TypeCode > pseudo > interface: > > > 10. Define two new operations on TypeCodes: > > > > pseudo interface TypeCode { > > ... > > TypeCode get_compact_typecode(); > > TypeCode get_canonical_typecode(); > > TypeCode get_implicit_typecode(); > > > ... > > }; > > > > TypeCode::get_compact_typecode() strips out all optional name & > member > > name fields. TypeCode::get_canonical_typecode() looks up the > TypeCode > > in the Interface Repository and returns the fully fleshed-out > TypeCode. > > If the top level TypeCode does not contain a RepositoryId, (such > as > > array and sequence TypeCodes, or TypeCodes from older ORBs), then > a > new > > TypeCode is constructed by calling > TypeCode::get_canonical_typecode() on > > each member TypeCode of the original TypeCode. If the Interface > > Repository is unavailable, or does not contain the information > needed, > > the call raises the INTF_REPOS system exception. 
> > TypeCode::get_implicit_typecode would return the tk_implicit > equivalent > of the current typecode, if it is the appropriate kind (tk_struct, > tk_union, tk_enum or tk_except), ignoring aliases. If the base > typecode > is not one of the appropriate kinds, then a copy of the same > typecode > is > returned. Whatever tk_alias typecodes may be present in the original > typecode are preserved in the (new) implicit typecode. > > An application willing to use this typecode would have to explicitly > use this typecode (by means that would be language mapping > dependent) > whenever sending a typecode across an interface (either directly, > or > within a CORBA::Any). The receiving side might reject the value > with > a MARSHAL exception if this new typecode kind is unknown, or if the > repository id/type is not recognized by the ORB (and required); if > this is the case, the sender should be ready to resend the > operation, > this time with a full typecode. Do you mean that the receiving ORB should only be prepared to unmarshal Any values containing this kind of typecode if it happens to have the repository id handy? And that the sending ORB need not do anything special about the MARSHAL exception - that it is up to the sending application to retry the request if such an exception is received? If so, then I think this results in an undesirable blurring of the division of responsibility between the ORB and the application program. In any case, whether it is the ORB or the application that does the retry, I don't think this works. What happens if a typecode is sent as part of an OUT argument (or worse yet, as part of an exception return)? Then there is no way to learn that the recipient couldn't process it and retry. So, I don't think this can be an optional feature, unless it is negotiated between ORBs. Also, while I would very much like to have compressed typecodes, I don't think this is the right way to get them. 
It raises the possibility that a receiving ORB, in the midst of doing marshalling, may need to consult an IR in order to get the information needed to finish marshalling. I think this is a very undesirable state of affairs. I might change my position on this if the encoding of Any was changed so that it was possible to take an Any off the wire without understanding the typecode. > > With this current proposal: > - There is NO need to make any changes to GIOP > - It is fully upwards compatible with the current version of CORBA > - The changes are VERY minor: one more typecode kind, with its > corresponding on-the-wire encoding, and one more operation on the > typecode pseudo interface > - If you don't want to use it, you don't have to pay for it (the > extra > coding that might be required), but if you use it, you might get > significant improvements > > In the future, if GIOP changes are attempted, some kind of ORB > negotiation > could be used to do automatic (ORB level) encoding changes, and > those > would > be fully upwards compatible with the current proposal. But these > changes > are not needed now (in my opinion, anyway). > > I hope this is acceptable, and does not have any holes :-). > > Javier Lopez-Martin > Hewlett-Packard Co > javier@cnd.hp.com Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 2 Jul 1998 12:02:03 +1000 (EST) From: Michi Henning To: orb_revision@omg.org, interop@omg.org Subject: Issues 817, 1138, 1531, and 1581 Hi, we've seen a spate of issues lately that relate to marshaling, particularly marshaling for strings, anys, and type codes. The following is a proposal for how to solve a number of these issues. Tom Rutt, Keith Duddy, and myself put this together. Please review and tear it to shreds as you see fit... ;-) I've sent this to both orb_revision and interop because the proposal affects both marshaling and the API for the TypeCode interface (because it adds new TCKind values). 
Recently, we *almost* adopted a change to allow an empty string to be marshaled as a zero length value followed by nothing. Fortunately, Bob Kukura pointed out that this isn't going to work because in an encapsulation, it would be impossible to recognize when the alternate encoding is used. Similar arguments apply to any marshaling change -- encapsulations get in the way. So, part of the solution is to allow for a different encapsulation. Proposal: For argument's sake, call the current CDR encoding rules CDR 1.0. We propose to create CDR 1.1. CDR 1.1 changes how strings, anys, and type codes are marshaled. One problem is that for encapsulations, the kind of the encoding used for the encapsulation is not part of the encapsulation. So, step 1 is to permit encapsulations to carry different encodings. Currently, an encapsulation is marshaled as a four-byte count, followed by a single boolean octet to indicate endian-ness, followed by the data. To indicate an encapsulation carrying CDR 1.1, we can use the second-least significant bit of the octet to indicate CDR 1.1. So, we get

Octet Value     CDR version
----------------------------------
0x0             1.0, big-endian
0x1             1.0, little-endian
0x2             1.1, big-endian
0x3             1.1, little-endian

The remaining six bits are reserved and must be zero for both CDR 1.0 and 1.1. (There is precedent for this -- it is exactly the way fragmentation was added to GIOP, by changing the boolean octet to a flags octet). To make sure client and server agree on who can encode and decode what, GIOP 1.0 and GIOP 1.1 only use CDR 1.0. GIOP 1.2 only uses CDR 1.1. CDR 1.1 encapsulations cannot occur inside a CDR 1.0 encapsulation. CDR 1.1 encapsulations cannot be used with GIOP 1.0 or GIOP 1.1. What is different in CDR 1.1? 
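A small sketch pins down the proposed flag-octet decoding (bit 0 selects endianness as in GIOP today, bit 1 selects the CDR minor version, and the six reserved bits must be zero):

```python
def decode_encap_flags(octet: int):
    """Decode the encapsulation flag octet per the proposed table."""
    if octet & ~0x03:
        raise ValueError("reserved flag bits must be zero")
    endian = "little-endian" if octet & 0x01 else "big-endian"
    version = "1.1" if octet & 0x02 else "1.0"
    return version, endian
```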
CDR 1.1 changes the marshaling rules for strings and wide strings, and adds two new encodings for anys and type codes:

- The null terminator for strings and wide strings is dropped, and the count of bytes/characters preceding the string is just the number of characters, not counting any terminator (because there is none). This gets rid of the issue that empty strings effectively consume 8 bytes on the wire, which has nasty implications for marshaling type codes for structured types (in some cases, the null termination of CDR 1.0 strings leads to type codes that are around 35% larger on the wire).

- A second, alternative encoding for type any is added to CDR 1.1. The alternative encoding is indicated by a new TCKind, called tk_any_encaps. The difference from the normal encoding of type any is that instead of having an any contain a type code followed by the data, in CDR 1.1, I can encode the any as a type code followed by the *encapsulated* data. This gets rid of a range of problems relating to marshaling of type any. For example, an event channel cannot receive an any and send it to a consumer without completely unmarshaling and remarshaling every any value. The CDR 1.1 encoding for any values permits an event channel to both unmarshal and marshal an any as a blob, without having to decode and re-encode all of the data in the any.

A CDR 1.1 any is represented as (in this order):

- tk_any_encaps (4 bytes)
- type code (normal CDR 1.0 encoding or new tk_implicit encoding, see below)
- *encapsulation* of the any value

This means we have two encodings of type any side by side, the old CDR 1.0 encoding (which is still valid), and the new encoding. How does the sender decide how to marshal an any? One of several possible strategies:

- Use the CDR 1.0 encoding for all anys with simple type codes that are fixed length, and for all anys carrying a simple fixed-length value or a single string. For such anys, the CDR 1.1 encoding just adds bulk (an extra 8 bytes). 
- Use the CDR 1.1 encoding for all anys with complex type codes that have several parameters.

- Use the CDR 1.0 encoding if the sender wants to use fragmentation (encapsulation of the value can get in the way of fragmentation if the value is larger than the largest supported fragment size).

- Javier recently proposed to add a new TCKind tk_implicit to get more compact marshaling of type codes. CDR 1.1 adds this new TCKind tk_implicit. tk_implicit has a single parameter of type long (not of type string as Javier initially suggested).

The idea of tk_implicit is to avoid repeatedly marshaling the same type code over and over again. Avoiding this is particularly important for any values sent to event channels. Typically, the same type code is sent over and over with every event, and for structures, the type code can easily be 10 times the size of the actual value (if the type code contains member names). Even without member names, the type code can still be more than 5 times the size of the value. So, when I send an any value to a channel, I can send tk_implicit instead of the full type code. tk_implicit just sends a number that indicates which type is meant, instead of sending the full type code. Just sending a number (or index) relies on caching of type codes by the receiver. The idea is that if I send a stream of anys to an event channel, I only send the full type code once. Once I have sent the full type code, future anys with the same type code just contain that type code's index. To make caching work, we need a session concept. A session is simply an IIOP connection. Once a connection goes down, all previously cached type codes become invalid. So, how are indexes exchanged? We need some negotiation between sender and receiver. The service context allows us to do that. We propose two new service contexts:

module IOP {
    // ...
    const ServiceId TypeCodeCacheEnabled = N;    // (value to be assigned)
    const ServiceId ReposIdCache = N + 1;        // (ditto)
    // ...
};

For TypeCodeCacheEnabled, the corresponding context_data octet sequence of the service context is empty. TypeCodeCacheEnabled is used by the sender to indicate to the receiver that it understands caching. For example, if I am a client and send my first request to a server via a connection, I can add a TypeCodeCacheEnabled entry to the service context of the request message. This indicates to the server that I understand caching, and that I am able to cache type codes that I receive from the server. Correspondingly, the server can on its first reply via a particular connection indicate to the client that it understands caching by adding a TypeCodeCacheEnabled entry to the service context of its reply message. In other words, TypeCodeCacheEnabled allows client and server to inform each other about their capabilities. Client and server need to send this context ID only once (but are allowed to send it with every message, and need not send it with the first message). The reason for having a TypeCodeCacheEnabled service context is to permit either side to decide whether caching of type codes is worthwhile (without this message, a receiver may uselessly cache type codes even though the sender will never take advantage of the cache). Once client and server have agreed to use type code caching, tk_implicit works as follows: When a client sends a type code for the very first time, it must send it using normal CDR 1.0 encoding rules (as a full type code). If the server understands caching and has cached this type code, it includes a ReposIdCache entry in its reply service context. That ReposIdCache entry contains a sequence of pairs. Each pair is the repository ID of a type code sent by the client previously, together with an index value. 
The encoding in the context_data member of a ServiceContext for ReposIdCache is:

struct CacheEntry {
    string repositoryID;
    long index;
};
typedef sequence<CacheEntry> CacheEntrySeq;

So, if the client invokes an operation with five in parameters, all of type any, the server can return a sequence of up to five cache entries to the client. Each sequence member contains a repository ID and the index value that was assigned to that repository ID by the server. If the client now sends another any value with a previously sent type code, it encodes the type code as tk_implicit. The single parameter value of the type code is the index that was previously assigned by the server. The net effect is that if a server understands caching, the client needs to send the full typecode only once per connection, and thereafter just sends the cache index for that type code. Of course, the client can use the tk_implicit encoding only for those type codes it has previously received an index for. For typecodes returned from the server to the client (as inout or out parameters or return values), we again use the service context. Suppose the server returns an any value to the client for the first time. If the server previously (or with this reply) has indicated to the client that it understands caching, the client can cache the type code it has just received and assign an index to that type code. The next time the client makes a request to the server, it sends the repository ID for the type code and the index to the server. This indicates to the server that in future (on the same connection), it can just send the index to the client for that type code (instead of the full type code). If a connection ever goes down, all cached type codes are thrown away at both ends. If either side runs out of room for cached type codes, it can close the connection to clear both caches. The client then re-opens the connection and both sides start afresh. 
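To illustrate the per-connection bookkeeping this implies, here is a minimal sketch of the two cache roles. The method names and the tuple representation of typecodes are invented for illustration; the behaviour modelled is just the rule above: a sender may use tk_implicit(index) only after that index has come back in a ReposIdCache service context, and everything is discarded when the connection drops.

```python
class TypeCodeCache:
    """Per-connection typecode cache, one instance at each end of the connection."""

    def __init__(self):
        self._assigned = {}   # receiver role: repository ID -> index we assigned
        self._learned = {}    # sender role: repository ID -> index the peer assigned

    def note_received(self, repo_id):
        """Receiver role: cache a full typecode's repository ID, assigning an
        index on first sight; the (id, index) pair goes back in ReposIdCache."""
        if repo_id not in self._assigned:
            self._assigned[repo_id] = len(self._assigned)
        return repo_id, self._assigned[repo_id]

    def learn(self, repo_id, index):
        """Sender role: record a CacheEntry received from the peer."""
        self._learned[repo_id] = index

    def outgoing(self, repo_id):
        """Sender role: compact form if the peer advertised an index, else full."""
        if repo_id in self._learned:
            return ("tk_implicit", self._learned[repo_id])
        return ("full_typecode", repo_id)
```

On connection close, both instances are simply discarded, which models the rule that all cached type codes are thrown away at both ends.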
(This happens as part of the normal rebinding after a CloseConnection message from a server).

Alternatively, if a cache fills up, the party with the full cache can simply stop sending further ReposIdCache service contexts and sit on whatever type codes it has currently cached.

The tk_implicit scheme deliberately does not include a way to explicitly invalidate a cache (other than by dropping a connection), or to selectively invalidate particular cache entries. The reason for this decision is to keep complexity down. We were also worried about race conditions in multi-threaded clients and servers.

Neither side is obliged to send tk_implicit type codes, even if the other end previously has sent a cache entry for that type code. It is *always* legal to send the full type code instead.

If either side sends an index value inside a tk_implicit that hasn't been seen as part of a CacheEntry previously, a MARSHAL exception is raised to the client application code.

If either side sends a tk_implicit type code when the receiver hasn't previously indicated willingness to cache type codes with TypeCodeCacheEnabled, BAD_CONTEXT is raised.

If either side sends a repository ID inside a CacheEntry that was cached previously, BAD_CONTEXT is raised.

If either side sends an index value inside a CacheEntry when that index value is already in use, BAD_CONTEXT is raised.

For all exception conditions, we can define appropriate minor codes to indicate what is wrong with a context.

All of the above relies on CDR 1.1, which can only occur as part of GIOP 1.2. None of the tk_implicit caching is mandatory, so it is fully backward compatible with previous versions, and ORB vendors are not obliged to implement it. The BAD_CONTEXT exceptions are optional (neither side is obliged to raise these, but can). This allows an ORB that doesn't want to cache type codes to completely ignore the service context.

As far as we can see, the above solves a whole pile of niggling problems.
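As an aside, the send/receive rules just listed can be sketched in a few lines of hypothetical Python (not ORB code; MarshalError stands in for CORBA's MARSHAL system exception, and plain dictionaries stand in for the per-connection caches; all names and repository IDs are invented for illustration):

```python
# Hypothetical model of the tk_implicit rules described above.

class MarshalError(Exception):
    """Stands in for the CORBA MARSHAL system exception."""

def encode_typecode(repo_id, peer_assigned):
    """Sender side: use tk_implicit only if the peer has advertised an
    index for this repository ID. Sending the full type code is always
    legal, and is the only choice before an index has been assigned."""
    if repo_id in peer_assigned:
        return ("tk_implicit", peer_assigned[repo_id])
    return ("full", repo_id)

def decode_typecode(encoded, local_cache):
    """Receiver side: an index that was never advertised in a
    CacheEntry is a marshaling error."""
    kind, value = encoded
    if kind == "full":
        return value
    if value not in local_cache:
        raise MarshalError("unknown tk_implicit index %d" % value)
    return local_cache[value]


peer_assigned = {"IDL:Foo:1.0": 0}   # indices received in CacheEntries
local_cache = {0: "IDL:Foo:1.0"}     # the peer's mirror of that mapping

compact = encode_typecode("IDL:Foo:1.0", peer_assigned)  # index only
full = encode_typecode("IDL:Bar:1.0", peer_assigned)     # no index yet
```

Note that the model preserves the asymmetry described above: each direction of a connection has its own independent cache and index space.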
In particular, it gets rid of the excessive cost of marshaling type codes without breaking backward compatibility. The scheme is optional, so ORB vendors don't have to implement it if they don't want to.

In effect, the service context is used to negotiate "quality of protocol" parameters transparently. The negotiation is piggy-backed onto normal messages, so no additional messages need to be exchanged, and no new GIOP messages are required.

The overhead is minimal - TypeCodeCacheEnabled needs to be sent only once per connection. The ReposIdCache service context is sent only once for each repository ID. After that, type codes are sent as a simple number, instead of sending the full type code or the repository ID (the repository ID can still be large, say, 20 - 50 bytes, which is why we modified Javier's tk_implicit proposal).

So, that's about it, flame away ;-) Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path: From: Mike_Spreitzer.PARC@xerox.com X-NS-Transport-ID: 0800201FCE5D3932FBE9 Date: Wed, 1 Jul 1998 22:21:42 PDT Subject: Re: Issues 817, 1138, 1531, and 1581 To: michi@dstc.edu.AU cc: orb_revision@omg.org, interop@omg.org

(1) ``encapsulation of the value can get in the way of fragmentation''. So why not introduce a chunked encapsulation format?

(2) I can put into an any a struct that has a member whose type is any, right? And in that inner any, I can put another struct with an any-typed member, right? And I can continue nesting ad nauseam, right? And the receiver can understand and process that nested structure to whatever depth it chooses, right? So "if the client invokes an operation with five in parameters, all of type any", the server could return a CacheEntrySeq with more than five entries, right?
That is, the limit is the number of different type codes that actually are sent, not the number of anys that appear in the static type(s) of the parameters. Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 2 Jul 1998 16:24:18 +1000 (EST) From: Michi Henning To: Mike_Spreitzer.PARC@xerox.com cc: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 On Wed, 1 Jul 1998 Mike_Spreitzer.PARC@xerox.com wrote: > (1) ``encapsulation of the value can get in the way of fragmentation''. So why > not introduce a chunked encapsulation format? We wanted to keep things simple. I'm not fundamentally opposed to chunked encapsulation myself -- if someone wants to work out the details, that sounds good to me! > (2) I can put into an any a struct that has a member whose type is any, right? > And in that inner any, I can put another struct with an any-typed member, > right? And I can continue nesting ad nauseum, right? And the receiver can > understand and process that nested structure to whatever depth it chooses, > right? So "if the client invokes an operation with five in parameters, all of > type any", the server could return a CacheEntrySeq with more than five entries, > right? That is, the limit is the number of different type codes that actually > are sent, not the number of anys that appear in the static type(s) of the > parameters. OK, you caught us there. I was making up an example as I went. In the scenario you describe, there could indeed be more than five elements on the CacheEntrySeq. The point really is that a receiver can only return a CacheEntry for a type code it has previously received from the sender, not just any arbitrary type code. Cheers, Michi. 
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Thu, 02 Jul 1998 13:38:31 -0400 From: Jonathan Biggar Organization: Floorboard Software To: Michi Henning CC: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 References: Here are my comments. I think I left your proposal in large pieces, not small ones. :-) Michi Henning wrote: > Hi, > > we've seen a spate of issues lately that relate to marshaling, particularly > marshaling for strings, anys, and type codes. The following is a proposal > for how to solve a number of these issues. Tom Rutt, Keith Duddy, and myself > put this together. Please review and tear it to shreds as you see fit... ;-) > > I've sent this to both orb_revision and interop because the proposal affects > both marshaling and the API for the TypeCode interface (because it adds > new TCKind values). Your proposal should not have to affect the TypeCode interface by assigning a new TCKind value, since this is transparent to the programmer. All of your mechanism is handled inside the ORB or GIOP/IIOP stack. In fact, your new TCKind value could just be handled like the "indirect" TypeCode marshalling now, just use 0xFFFFFFFE. > Recently, we *almost* adopted a change to allow an empty string to be marshaled > as a zero length value followed by nothing. Fortunately, Bob Kukura pointed > out that this isn't going to work because in an encapsulation, it would > be impossible to recognize when the alternate encoding is used. > Similar arguments apply to any marshaling change -- encapsulations get in > the way. > > So, part of the solution is to allow for a different encapsulation. > > Proposal: > > For argument's sake, call the current CDR encoding rules CDR 1.0. > We propose to create CDR 1.1. CDR 1.1 changes how strings, anys, and > type codes are marshaled. 
One problem is that for encapsulations, the > kind of the encoding used for the encapsulation is not part of the > encapsulation. So, step 1 is to permit encapsulations to carry different > encodings. > > Currently, an encapsulation is marshaled as a four-byte count, followed > by a single boolean octet to indicate endian-ness, followed by the data. > To indicate an encapsulation carrying CDR 1.1, we can use the second- > least significant bit of the octet to indicate CDR 1.1. So, we get > > Octet Value CDR version > ---------------------------------- > 0x0 1.0, big-endian > 0x1 1.0, little-endian > 0x2 1.1, big-endian > 0x3 1.1, little-endian > > The remaining six bits are reserved and must be zero for both CDR 1.0 and 1.1. > (There is precedence for this -- it is exactly the way fragmentation was > added to GIOP, by changing the boolean octet to a flags octet). > > To make sure client and server agree on who can encode and decode what, > GIOP 1.0 and GIOP 1.1 only use CDR 1.0. GIOP 1.2 only uses CDR 1.1. > > CDR 1.1 encapsulations cannot occur inside a CDR 1.0 encapsulation. > > CDR 1.1 encapsulations cannot be used with GIOP 1.0 or GIOP 1.1.

There is an issue to consider here, because some ORB implementations may store TypeCodes or Anys as encapsulated CDR. Thus if a server receives an encapsulated value from a CDR 1.1 source and sends it to a CDR 1.0 destination, it will need to convert the encapsulation format, which could be pretty messy.

> What is different in CDR 1.1? > CDR 1.1 changes the marshaling rules for strings and wide strings, and > adds two new encodings for anys and type codes: > > - The null terminator for strings and wide strings is dropped, > and the count of bytes/characters preceding the string is > just the number of characters, not counting any terminator > (because there is none).
This gets rid of the issue that > empty strings effectively consume 8 bytes on the wire, which > has nasty implications for marshaling type codes for structured > types (in some cases, the null termination of CDR 1.0 strings leads > to type codes that are around 35% larger on the wire). > > - A second, alternative encoding for type any is added to CDR 1.1. > The alternative encoding is indicated by a new TCKind, called > tk_any_encaps. The difference to the normal encoding of type > any is that instead of having an any contain a type code followed > by the data, in CDR 1.1, I can encode the any as a type code followed > by the *encapsulated* data. Assign tk_any_encaps the value 0xFFFFFFFE like the TypeCode indirect mechanism and you don't need to modify TCKind. > This gets rid of a range of problems relating to marshaling of type > any. For example, an event channel cannot receive an any and > send it to a consumer without completely unmarshaling and > remarshaling every any value. The CDR 1.1 encoding for any values > permits an event channel to both unmarshal and marshal an any > as a blob, without having to decode and re-encode all of the data > in the any. > > CDR 1.1 any is represented as (in this order): > > - tk_any_encaps (4 bytes) > - type code (normal CDR 1.0 encoding or new tk_implicit encoding, > see below) > - *encapsulation* of the any value > > This means we have two encodings of type any side by side, the old > CDR 1.0 encoding (which is still valid), and the new encoding. > How does the sender decide how to marshal an any? > > One of several possible strategies: > > - Use the CDR 1.0 encoding for all anys with simple type codes > that are fixed length, and for all any's carrying a simple > fixed-length value or a single string. For such anys, > the CDR 1.1 encoding just adds bulk (an extra 8 bytes). > > - Use the CDR 1.1 encoding for all anys with complex type > codes that have several parameters. 
> > - Use the CDR 1.0 encoding if the sender wants to use > fragmentation (encapsulation of the value can get in > the way of fragmentation if the value is larger than > the largest supported fragment size). > > - Javier recently proposed to add a new TCKind tk_implicit > to get more compact marshaling of type codes. CDR 1.1 adds > this new TCKind tk_implicit. tk_implicit has a single parameter > of type long (not of type string as Javier initially suggested). Again, for tk_implicit, just use 0xFFFFFFFD. Or even better, rather than using an additional long parameter, just use the range 0xF0000000 to 0xFFFFFFF0 (adjust the top end as appropriate) to encode both the fact that the typecode is implicit and the cache tag value. This still leaves almost 2^31 entries in the cache, which is by far more than enough, and saves you 4 more bytes. > The idea of tk_implicit is to avoid repeatedly marshaling the > same type code over and over again. Avoiding this is > particularly important for any values sent to event channels. > Typically, the same type code is sent over and over with > every event, and for structures, the type code can easily be > 10 times the size of the actual value (if the type code > contains member names). Even without member names, the type code > can still be more than 5 times the size of the value. > > So, when I send an any value to a channel, I can send tk_implicit > instead of the full type code. tk_implicit just sends a number > that indicates which type is meant, instead of sending the full > type code. > > Just sending a number (or index) relies on caching of type codes > by the receiver. The idea is that if I send a stream of anys > to an event channel, I only send the full type code once. Once > I have sent the full type code, future anys with the same type > code just contain that type code's index. > > To make caching work, we need a session concept. A session is > simply an IIOP connection. 
Once a connection goes down, > all previously cached type codes become invalid. > > So, how are indexes exchanged? > > We need some negotiation between sender and receiver. The service > context allows us to do that. We propose two new service contexts: > > module IOP { > // ... > const ServiceId TypeCodeCacheEnabled = N; (to be assigned) > const ServiceId ReposIdCache = N+1; (ditto) > // ... > }; > > For TypeCodeCacheEnabled, the corresponding context_data > octet sequence of the service context is empty. > > TypeCodeCacheEnabled is used by the sender to indicate to the > receiver that it understands caching. > > For example, if I am a client and send my first request to a > server via a connection, I can add a TypeCodeCacheEnabled entry > to the service context of the request message. This indicates > to the server that I understand caching, and that I am able > to cache type codes that I receive from the server. > > Correspondingly, the server can on its first reply via a particular > connection indicate to the client that it understands caching > by adding a TypeCodeCachedEnabled entry to the service context > of its reply message. > > In other words, TypeCodeCacheEnabled allows client and server > to inform each other about their capabilities. Client and server > need to send this context ID only once (but are allowed to send > it with every message, and need not send it with the first message). > > The reason for having a TypeCodeCacheEnabled service context > is to permit either side to decide whether caching of type codes > is worthwhile (without this message, a receiver may uselessly > cache type codes even though the send will never take advantage > of the cache). > > Once client and server have agreed to use type code caching, > tk_implicit works as follows: > > When a client sends a type code for the very first time, it must > send it using normal CDR 1.0 encoding rules (as a full type code). 
> If the server understands caching and has cached this type code, > it includes a ReposIdCache entry with in its reply service context. > That ReposIdCache entry contains a sequence of pairs. Each > pair is the repository ID of a type code sent by the client > previously, together with an index value. The encoding in the > context_data member of a ServiceContext for ReposIdCache is: > > struct CacheEntry { > string repositoryID; > long index; > }; > typedef sequence CacheEntrySeq; > > So, if the client invokes an operation with five in parameters, > all of type any, the server can return a sequence of up to > five cache entries to the client. Each sequence member contains > a repository ID and the index value that was assigned to that > repository ID by the server. > > If the client now sends another any value with a previously sent > type code, it encodes the type code as tk_implicit. The single > parameter value of the type code is the index that was previously > assigned by the server. > > The net effect is that if a server understands caching, the client > needs to send the full typecode only once per connection, and > thereafter just sends the cache index for that type code. > > Of course, the client can use the tk_implict encoding only for > those type codes it has previously received an index for. > > For typecodes returned from the server to the client (as inout > or out parameters or return values), we again use the service > context. Suppose the server returns an any value to the client > for the first time. If the server previously (or with this reply) > has indicated to the client that it understands caching, the client > can cache the type code it has just received and assign an index > to that type code. The next time the client makes a request to > the server, it sends the repository ID for the type code and > the index to the server. 
This indicates to the server that in > future (on the same connection), it can just send the index > to the client for that type code (instead of the full type code). You need to make it perfectly clear here that there are two potential TypeCode caches here, one for the client & one for the server, and each cache uses its own index range and operates independently of the other. > If a connection ever goes down, all cached type codes are thrown > away at both ends. > > If either side runs out of room for cached type codes, it can > close the connection to clear both caches. The client then re-opens > the connection and both sides start afresh. (This happens as > part of the normal rebinding after a CloseConnection message > from a server). > > Alternatively, if a cache fills up, the party with the full cache > can simply stop sending further ReposIdCache service contexts > and sit on whatever type codes it has currently cached. > > The tk_implicit scheme deliberately does not include to explicitly > outdate a cache (other than by dropping a connection), or to > selective outdate particular cache entries. The reason for this > decision is to keep complixity down. We were also worried about > race conditions in multi-threaded clients and servers. > > Neither side is obliged to send tk_implit type codes, even if > the other end previously has sent a cache entry for that type > code. It is *always* legal to send the full type code instead. > > If either side sends an index value inside a tk_implicit > that hasn't been seen as part of a CacheEntry previously, > a MARSHAL exception is raised to the client application code. > > If either side sends a tk_implicit type code when the receiver > hasn't previously indicated willingness to cache type codes with > TypeCodeCacheEnabled, BAD_CONTEXT is raised. > > If either side sends a repository ID inside a CacheEntry that > was cached previously, BAD_CONTEXT is raised. 
> > If either side sends an index value inside a CacheEntry > when that index value is already in use, BAD_CONTEXT is raised. > > For all exception conditions, we can define appropriate minor codes > to indicate what is wrong with a context. > > All of the above relies an CDR 1.1, which can only occur as part of GIOP 1.2. > None of the tk_implicit caching is mandatory, so it is fully backward > compatible with previous versions, and ORB vendors are not obliged to > implement it. > > The BAD_CONTEXT exceptions are optional (neither side is obliged to raise > these, but can). This allows an ORB that doesn't want to cache type codes > to completely ignore the service context.

You should restate this to say that if an ORB uses the cache mechanism, it must raise BAD_CONTEXT for the appropriate error conditions, but if it ignores the cache mechanism, it does not need to detect the error conditions and raise BAD_CONTEXT. [As an aside here, I thought BAD_CONTEXT had to do with the IDL context mechanism, not service contexts. Aren't ambiguous System Exceptions fun? :-)]

> As far as we can see, the above solves a whole pile of niggling problems. > In particular, it gets rid of the excessive cost of marshaling type codes > without breaking backward compatibility. The scheme is optional, so ORB > vendors don't have to implement it they don't want to. > > In effect, the service context is used to negotiate "quality of protocol" > parameters transparently. The negotiation is piggy-backed onto normal message, > so no additional messages need to be exchanged, and no new GIOP messages > are required. > > The overhead is minimal - TypeCodeCacheEnabled needs to be send only once > per connection. The ReposIdCache service context is sent only once for > each repository ID.
> After that, type codes are sent as simple number, > instead of sending the full type code or the repository ID (the repository > ID can still be large, say, 20 - 50 bytes, which is why we modified > Javier's tk_implicit proposal).

This looks pretty good. Since it is entirely optional, we don't really have to do a comprehensive cost/benefit or resource use/performance comparison. A possible ORB implementation optimization might be to select these mechanisms based on the bandwidth of the communication medium. If the bandwidth is high and memory cost is high, it may be better to forget about TypeCode caching. If the bandwidth is low, using the TypeCode caching could reduce the overhead of anys by 10x or more.

-- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org

Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Mon, 6 Jul 1998 11:41:19 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581

On Thu, 2 Jul 1998, Jonathan Biggar wrote:

> > Currently, an encapsulation is marshaled as a four-byte count, followed > > by a single boolean octet to indicate endian-ness, followed by the data. > > To indicate an encapsulation carrying CDR 1.1, we can use the second- > > least significant bit of the octet to indicate CDR 1.1. So, we get > > > > Octet Value CDR version > > ---------------------------------- > > 0x0 1.0, big-endian > > 0x1 1.0, little-endian > > 0x2 1.1, big-endian > > 0x3 1.1, little-endian > > > > The remaining six bits are reserved and must be zero for both CDR 1.0 and 1.1. > > (There is precedence for this -- it is exactly the way fragmentation was > > added to GIOP, by changing the boolean octet to a flags octet). > > > > To make sure client and server agree on who can encode and decode what, > > GIOP 1.0 and GIOP 1.1 only use CDR 1.0. GIOP 1.2 only uses CDR 1.1.
> > CDR 1.1 encapsulations cannot occur inside a CDR 1.0 encapsulation. > > CDR 1.1 encapsulations cannot be used with GIOP 1.0 or GIOP 1.1. > There is an issue to consider here, because some ORB implementations may store > TypeCodes or Anys as encapsulated CDR. Thus if a server receives an encapsulated > value from a CDR 1.1 source and send it to a CDR 1.0 destination, it will need to > convert the encasulation format, which could be pretty messy.

Yes, and this is true for the reverse as well -- if a CDR 1.0 encapsulation is sent to a CDR 1.1 destination, the same applies (because we have specified that CDR 1.1 only applies to GIOP 1.2). We could relax this by requiring that GIOP 1.2 implementations must understand both CDR encodings. On the other hand, that would require some extra work at run-time too. How do other people feel about this? If CDR 1.0 can be sent via GIOP 1.2, conversion in one direction can be avoided. The conversion in the other direction is simply the price of the optimization, as far as I can see.

> Assign tk_any_encaps the value 0xFFFFFFFE like the TypeCode indirect mechanism and > you don't need to modify TCKind.

I agree, that would make sense. That way, the TypeCode API doesn't have to change.

> > - Javier recently proposed to add a new TCKind tk_implicit > > to get more compact marshaling of type codes. CDR 1.1 adds > > this new TCKind tk_implicit. tk_implicit has a single parameter > > of type long (not of type string as Javier initially suggested). > > Again, for tk_implicit, just use 0xFFFFFFFD. Or even better, rather than using an > additional long parameter, just use the range 0xF0000000 to 0xFFFFFFF0 (adjust the > top end as appropriate) to encode both the fact that the typecode is implicit and > the cache tag value. This still leaves almost 2^31 entries in the cache, which is > by far more than enough, and saves you 4 more bytes.

Again, I agree.
If we use something like 0xFFFFFFFD, we keep changes away from the TypeCode API (which is a good thing). However, I'm not sure whether using a range of these "escape hatch" TCKind values to encode the index is a good idea. I think I would prefer using a separate long value still. It is a bit cleaner (IMO) and saving four more bytes doesn't seem worth the added complexity. (I'm also reluctant to eat up large swaths of the TCKind namespace.) > > For typecodes returned from the server to the client (as inout > > or out parameters or return values), we again use the service > > context. Suppose the server returns an any value to the client > > for the first time. If the server previously (or with this reply) > > has indicated to the client that it understands caching, the client > > can cache the type code it has just received and assign an index > > to that type code. The next time the client makes a request to > > the server, it sends the repository ID for the type code and > > the index to the server. This indicates to the server that in > > future (on the same connection), it can just send the index > > to the client for that type code (instead of the full type code). > > You need to make it perfectly clear here that there are two potential TypeCode > caches here, one for the client & one for the server, and each cache uses its own > index range and operates independently of the other. Agreed. What we wrote so far isn't meant to go into the spec as it stands, but to get some discussion going. The formal words to go into the spec need to be tightened up a bit. > You should restate this to say that if an ORB uses the cache mechanism, it must > raise BAD_CONTEXT for the appropriate error conditions, but if it ignores the cache > mechanism, it does not need to detect the error conditions and raise > BAD_CONTEXT. Sounds good to me. > [As an aside here, I thought BAD_CONTEXT had to do with the IDL context > mechanism, not service contexts. 
Aren't ambiguous System Exceptions fun? :-)]

Hmmm... The only semantics for BAD_CONTEXT are in an IDL comment in chapter 3:

    exception BAD_CONTEXT ex_body; // error processing context object

From this, I agree with you -- BAD_CONTEXT apparently relates to context objects, not to service contexts. We could use MARSHAL where we suggested BAD_CONTEXT. BAD_TYPECODE might be another option.

> This looks pretty good. Since it is entirely optional, we don't really have to do > a comprehensive cost/benefit or resource use/performance comparison.

Given the inefficiency of the current marshaling rules, I think the benefit would be quite noticeable as soon as a type code is sent more than once. In particular, unmarshaling a type code is very expensive compared to looking it up in a cache, so I would expect to see performance improvements in both bandwidth and CPU cycles. Because the caching information is piggy-backed onto normal messages, maintaining the caching information is (almost) free.

Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path: Date: Mon, 06 Jul 1998 12:28:10 -0400 From: Paul H Kyzivat Organization: NobleNet To: Michi Henning CC: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 References:

Michi Henning wrote:

> Hi, > > we've seen a spate of issues lately that relate to marshaling, particularly > marshaling for strings, anys, and type codes. The following is a proposal > for how to solve a number of these issues. Tom Rutt, Keith Duddy, and myself > put this together. Please review and tear it to shreds as you see fit... ;-) > > I've sent this to both orb_revision and interop because the proposal affects > both marshaling and the API for the TypeCode interface (because it adds > new TCKind values).
> > Recently, we *almost* adopted a change to allow an empty string to > be > marshaled > as a zero length value followed by nothing. Fortunately, Bob Kukura > pointed > out that this isn't going to work because in an encapsulation, it > would > be impossible to recognize when the alternate encoding is used. > Similar arguments apply to any marshaling change -- encapsulations > get > in > the way. > > So, part of the solution is to allow for a different encapsulation. > > Proposal: > > For argument's sake, call the current CDR encoding rules CDR 1.0. > We propose to create CDR 1.1. CDR 1.1 changes how strings, anys, and > type codes are marshaled. One problem is that for encapsulations, > the > kind of the encoding used for the encapsulation is not part of the > encapsulation. So, step 1 is to permit encapsulations to carry > different > encodings. > > Currently, an encapsulation is marshaled as a four-byte count, > followed > by a single boolean octet to indicate endian-ness, followed by the > data. > To indicate an encapsulation carrying CDR 1.1, we can use the > second- > least significant bit of the octet to indicate CDR 1.1. So, we get > > Octet Value CDR version > ---------------------------------- > 0x0 1.0, big-endian > 0x1 1.0, little-endian > 0x2 1.1, big-endian > 0x3 1.1, little-endian > > The remaining six bits are reserved and must be zero for both CDR > 1.0 > and 1.1. > (There is precedence for this -- it is exactly the way fragmentation > was > added to GIOP, by changing the boolean octet to a flags octet). > > To make sure client and server agree on who can encode and decode > what, > GIOP 1.0 and GIOP 1.1 only use CDR 1.0. GIOP 1.2 only uses CDR 1.1. > > CDR 1.1 encapsulations cannot occur inside a CDR 1.0 encapsulation. > > CDR 1.1 encapsulations cannot be used with GIOP 1.0 or GIOP 1.1. > > What is different in CDR 1.1? 
> > CDR 1.1 changes the marshaling rules for strings and wide strings, > and > adds two new encodings for anys and type codes: > > - The null terminator for strings and wide strings is > dropped, > and the count of bytes/characters preceding the string is > just the number of characters, not counting any terminator > (because there is none). This gets rid of the issue that > empty strings effectively consume 8 bytes on the wire, > which > has nasty implications for marshaling type codes for > structured > types (in some cases, the null termination of CDR 1.0 > strings leads > to type codes that are around 35% larger on the wire). > > - A second, alternative encoding for type any is added to > CDR > 1.1. > The alternative encoding is indicated by a new TCKind, > called > tk_any_encaps. The difference to the normal encoding of > type > any is that instead of having an any contain a type code > followed > by the data, in CDR 1.1, I can encode the any as a type > code > followed > by the *encapsulated* data. > > This gets rid of a range of problems relating to > marshaling > of type > any. For example, an event channel cannot receive an any > and > send it to a consumer without completely unmarshaling and > remarshaling every any value. The CDR 1.1 encoding for any > values > permits an event channel to both unmarshal and marshal an > any > as a blob, without having to decode and re-encode all of > the > data > in the any. > > CDR 1.1 any is represented as (in this order): > > - tk_any_encaps (4 bytes) > - type code (normal CDR 1.0 encoding or new tk_implicit > encoding, > see below) > - *encapsulation* of the any value > > This means we have two encodings of type any side by side, > the old > CDR 1.0 encoding (which is still valid), and the new > encoding. > How does the sender decide how to marshal an any? 
> > One of several possible strategies: > > - Use the CDR 1.0 encoding for all anys with simple > type codes > that are fixed length, and for all any's carrying > a > simple > fixed-length value or a single string. For such > anys, > the CDR 1.1 encoding just adds bulk (an extra 8 > bytes). > > - Use the CDR 1.1 encoding for all anys with complex > type > codes that have several parameters. > > - Use the CDR 1.0 encoding if the sender wants to > use > fragmentation (encapsulation of the value can get > in > the way of fragmentation if the value is larger > than > the largest supported fragment size). > > - Javier recently proposed to add a new TCKind tk_implicit > to get more compact marshaling of type codes. CDR 1.1 adds > this new TCKind tk_implicit. tk_implicit has a single > parameter > of type long (not of type string as Javier initially > suggested). > > The idea of tk_implicit is to avoid repeatedly marshaling > the > same type code over and over again. Avoiding this is > particularly important for any values sent to event > channels. > Typically, the same type code is sent over and over with > every event, and for structures, the type code can easily > be > 10 times the size of the actual value (if the type code > contains member names). Even without member names, the > type > code > can still be more than 5 times the size of the value. > > So, when I send an any value to a channel, I can send > tk_implicit > instead of the full type code. tk_implicit just sends a > number > that indicates which type is meant, instead of sending the > full > type code. > > Just sending a number (or index) relies on caching of type > codes > by the receiver. The idea is that if I send a stream of > anys > to an event channel, I only send the full type code once. > Once > I have sent the full type code, future anys with the same > type > code just contain that type code's index. > > To make caching work, we need a session concept. A session > is > simply an IIOP connection. 
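The compactness of tk_implicit is easy to see on the wire (a sketch; the kind value used here is a placeholder, since the proposal left the actual number open, and a later message in this thread suggests an escape value such as 0xFFFFFFFD):

```python
import struct

# Placeholder TCKind value for tk_implicit (an assumption, not spec text)
TK_IMPLICIT = 0xFFFFFFFD

def marshal_tk_implicit(cache_index: int) -> bytes:
    """tk_implicit: the kind marker followed by a single long parameter,
    the cache index previously assigned by the receiver."""
    return struct.pack(">II", TK_IMPLICIT, cache_index)
```

Eight bytes total, versus potentially hundreds of bytes for the full encoding of a struct or union type code with member names.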
Once a connection goes down, > all previously cached type codes become invalid. > > So, how are indexes exchanged? > > We need some negotiation between sender and receiver. The > service > context allows us to do that. We propose two new service > contexts: > > module IOP { > // ... > const ServiceId TypeCodeCacheEnabled = N; (to be > assigned) > const ServiceId ReposIdCache = N+1; (ditto) > // ... > }; > > For TypeCodeCacheEnabled, the corresponding context_data > octet sequence of the service context is empty. > > TypeCodeCacheEnabled is used by the sender to indicate to > the > receiver that it understands caching. > > For example, if I am a client and send my first request to > a > server via a connection, I can add a TypeCodeCacheEnabled > entry > to the service context of the request message. This > indicates > to the server that I understand caching, and that I am > able > to cache type codes that I receive from the server. > > Correspondingly, the server can on its first reply via a > particular > connection indicate to the client that it understands > caching > by adding a TypeCodeCacheEnabled entry to the service > context > of its reply message. > > In other words, TypeCodeCacheEnabled allows client and > server > to inform each other about their capabilities. Client and > server > need to send this context ID only once (but are allowed to > send > it with every message, and need not send it with the first > message). > > The reason for having a TypeCodeCacheEnabled service > context > is to permit either side to decide whether caching of type > codes > is worthwhile (without this message, a receiver may > uselessly > cache type codes even though the sender will never take > advantage > of the cache). > > Once client and server have agreed to use type code > caching, > tk_implicit works as follows: > > When a client sends a type code for the very first time, > it > must > send it using normal CDR 1.0 encoding rules (as a full > type > code). 
> If the server understands caching and has cached this type > code, > it includes a ReposIdCache entry in its reply service > context. > That ReposIdCache entry contains a sequence of pairs. Each > pair is the repository ID of a type code sent by the > client > previously, together with an index value. The encoding in > the > context_data member of a ServiceContext for ReposIdCache > is: > > struct CacheEntry { > string repositoryID; > long index; > }; > typedef sequence<CacheEntry> CacheEntrySeq; > > So, if the client invokes an operation with five in > parameters, > all of type any, the server can return a sequence of up to > five cache entries to the client. Each sequence member > contains > a repository ID and the index value that was assigned to > that > repository ID by the server. > > If the client now sends another any value with a > previously > sent > type code, it encodes the type code as tk_implicit. The > single > parameter value of the type code is the index that was > previously > assigned by the server. > > The net effect is that if a server understands caching, > the > client > needs to send the full typecode only once per connection, > and > thereafter just sends the cache index for that type code. > > Of course, the client can use the tk_implicit encoding only > for > those type codes it has previously received an index for. > > For typecodes returned from the server to the client (as > inout > or out parameters or return values), we again use the > service > context. Suppose the server returns an any value to the > client > for the first time. If the server previously (or with this > reply) > has indicated to the client that it understands caching, > the > client > can cache the type code it has just received and assign an > index > to that type code. The next time the client makes a > request > to > the server, it sends the repository ID for the type code > and > the index to the server. 
This indicates to the server that > in > future (on the same connection), it can just send the > index > to the client for that type code (instead of the full type > code). > > If a connection ever goes down, all cached type codes are > thrown > away at both ends. > > If either side runs out of room for cached type codes, it > can > close the connection to clear both caches. The client then > re-opens > the connection and both sides start afresh. (This happens > as > part of the normal rebinding after a CloseConnection > message > from a server). > > Alternatively, if a cache fills up, the party with the > full > cache > can simply stop sending further ReposIdCache service > contexts > and sit on whatever type codes it has currently cached. > > The tk_implicit scheme deliberately does not include a way to > explicitly > invalidate a cache (other than by dropping a connection), or > to > selectively invalidate particular cache entries. The reason for > this > decision is to keep complexity down. We were also worried > about > race conditions in multi-threaded clients and servers. > > Neither side is obliged to send tk_implicit type codes, even > if > the other end previously has sent a cache entry for that > type > code. It is *always* legal to send the full type code > instead. > > If either side sends an index value inside a tk_implicit > that hasn't been seen as part of a CacheEntry previously, > a MARSHAL exception is raised to the client application > code. > > If either side sends a tk_implicit type code when the > receiver > hasn't previously indicated willingness to cache type > codes > with > TypeCodeCacheEnabled, BAD_CONTEXT is raised. > > If either side sends a repository ID inside a CacheEntry > that > was cached previously, BAD_CONTEXT is raised. > > If either side sends an index value inside a CacheEntry > when that index value is already in use, BAD_CONTEXT is > raised. 
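The per-connection cache and its error rules can be sketched as follows (a hypothetical class, one instance per connection and direction; the exception strings merely name the CORBA system exceptions the proposal calls for):

```python
class TypeCodeCache:
    """Receiver-side type code cache sketch: assigns indices to
    repository IDs and mirrors the proposal's error conditions."""

    def __init__(self):
        self._by_id = {}     # repository ID -> index
        self._by_index = {}  # index -> repository ID

    def assign(self, repository_id: str) -> int:
        # Re-caching an already-cached repository ID is an error
        # (BAD_CONTEXT in the proposal)
        if repository_id in self._by_id:
            raise RuntimeError("BAD_CONTEXT: repository ID already cached")
        index = len(self._by_index)
        self._by_id[repository_id] = index
        self._by_index[index] = repository_id
        return index

    def resolve(self, index: int) -> str:
        # An index never announced in a CacheEntry is an error (MARSHAL)
        if index not in self._by_index:
            raise RuntimeError("MARSHAL: unknown tk_implicit index")
        return self._by_index[index]
```

When the connection closes, the whole object is simply discarded at both ends, which is the proposal's "drop the connection to clear both caches" rule.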
> > For all exception conditions, we can define appropriate minor codes > to indicate what is wrong with a context. > > All of the above relies on CDR 1.1, which can only occur as part of > GIOP 1.2. > None of the tk_implicit caching is mandatory, so it is fully > backward > compatible with previous versions, and ORB vendors are not obliged > to > implement it. > > The BAD_CONTEXT exceptions are optional (neither side is obliged to > raise > these, but can). This allows an ORB that doesn't want to cache type > codes > to completely ignore the service context. > > As far as we can see, the above solves a whole pile of niggling > problems. > In particular, it gets rid of the excessive cost of marshaling type > codes > without breaking backward compatibility. The scheme is optional, so > ORB > vendors don't have to implement it if they don't want to. > > In effect, the service context is used to negotiate "quality of > protocol" > parameters transparently. The negotiation is piggy-backed onto > normal > messages, > so no additional messages need to be exchanged, and no new GIOP > messages > are required. > > The overhead is minimal - TypeCodeCacheEnabled needs to be sent only > once > per connection. The ReposIdCache service context is sent only once > for > each repository ID. After that, type codes are sent as a simple > number, > instead of sending the full type code or the repository ID (the > repository > ID can still be large, say, 20 - 50 bytes, which is why we modified > Javier's tk_implicit proposal). > > So, that's about it, flame away ;-) I think that something is needed here, and caching seems attractive in many ways. Yet this proposal seems to be offhandedly introducing new concepts that have potentially broad impact. Most significant is introducing the notion of Session. > To make caching work, we need a session concept. A session is > simply an IIOP connection. Once a connection goes down, > all previously cached type codes become invalid. 
Marshalling is not unique to IIOP, so it isn't sufficient to refer to IIOP to define what a session is. There may well be CORBA implementations that use other transports having other notions of connection, or none at all. > If either side runs out of room for cached type codes, it can > close the connection to clear both caches. The client then > re-opens the connection and both sides start afresh. This cannot be guaranteed to work. Once the connection is dropped, there is no guarantee that it can be re-established, or that references previously obtained will still be valid. For instance, once the connection is dropped the server may decide it is no longer needed and shut down. Then a subsequent connection attempt may start a new server. Transient object references from the prior server would then be invalid. If the notion of Session is to be used, I would like to see it incorporated into the OMA. And the protocol needed to maintain the caches probably needs to be stronger. (Handling this protocol via the service context does seem like a nice approach.) Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Mon, 6 Jul 1998 11:41:19 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 On Thu, 2 Jul 1998, Jonathan Biggar wrote: > > Currently, an encapsulation is marshaled as a four-byte count, followed > > by a single boolean octet to indicate endian-ness, followed by the data. > > To indicate an encapsulation carrying CDR 1.1, we can use the second- > > least significant bit of the octet to indicate CDR 1.1. So, we get > > > > Octet Value CDR version > > ---------------------------------- > > 0x0 1.0, big-endian > > 0x1 1.0, little-endian > > 0x2 1.1, big-endian > > 0x3 1.1, little-endian > > > > The remaining six bits are reserved and must be zero for both CDR 1.0 and 1.1. 
> > (There is precedence for this -- it is exactly the way fragmentation was > > added to GIOP, by changing the boolean octet to a flags octet). > > > > To make sure client and server agree on who can encode and decode what, > > GIOP 1.0 and GIOP 1.1 only use CDR 1.0. GIOP 1.2 only uses CDR 1.1. > > > > CDR 1.1 encapsulations cannot occur inside a CDR 1.0 encapsulation. > > > > CDR 1.1 encapsulations cannot be used with GIOP 1.0 or GIOP 1.1. > > There is an issue to consider here, because some ORB implementations may store > TypeCodes or Anys as encapsulated CDR. Thus if a server receives an encapsulated > value from a CDR 1.1 source and sends it to a CDR 1.0 destination, it will need to > convert the encapsulation format, which could be pretty messy. Yes, and this is true for the reverse as well -- if a CDR 1.0 encapsulation is sent to a CDR 1.1 destination, the same applies (because we have specified that CDR 1.1 only applies to GIOP 1.2). We could relax this by requiring that GIOP 1.2 implementations must understand both CDR encodings. On the other hand, that would require some extra work at run-time too. How do other people feel about this? If CDR 1.0 can be sent via GIOP 1.2, conversion in one direction can be avoided. The conversion in the other direction is simply the price of the optimization, as far as I can see. > Assign tk_any_encaps the value 0xFFFFFFFE like the TypeCode indirect mechanism and > you don't need to modify TCKind. I agree, that would make sense. That way, the TypeCode API doesn't have to change. > > - Javier recently proposed to add a new TCKind tk_implicit > > to get more compact marshaling of type codes. CDR 1.1 > adds > > this new TCKind tk_implicit. tk_implicit has a single > parameter > > of type long (not of type string as Javier initially > suggested). > > Again, for tk_implicit, just use 0xFFFFFFFD. 
Or even better, rather > than using an > additional long parameter, just use the range 0xF0000000 to > 0xFFFFFFF0 (adjust the > top end as appropriate) to encode both the fact that the typecode is > implicit and > the cache tag value. This still leaves almost 2^31 entries in the > cache, which is > by far more than enough, and saves you 4 more bytes. Again, I agree. If we use something like 0xFFFFFFFD, we keep changes away from the TypeCode API (which is a good thing). However, I'm not sure whether using a range of these "escape hatch" TCKind values to encode the index is a good idea. I think I would prefer using a separate long value still. It is a bit cleaner (IMO) and saving four more bytes doesn't seem worth the added complexity. (I'm also reluctant to eat up large swaths of the TCKind namespace.) > > For typecodes returned from the server to the client (as inout > > or out parameters or return values), we again use the service > > context. Suppose the server returns an any value to the client > > for the first time. If the server previously (or with this reply) > > has indicated to the client that it understands caching, the client > > can cache the type code it has just received and assign an index > > to that type code. The next time the client makes a request to > > the server, it sends the repository ID for the type code and > > the index to the server. This indicates to the server that in > > future (on the same connection), it can just send the index > > to the client for that type code (instead of the full type code). > > You need to make it perfectly clear here that there are two potential TypeCode > caches here, one for the client & one for the server, and each cache uses its own > index range and operates independently of the other. Agreed. What we wrote so far isn't meant to go into the spec as it stands, but to get some discussion going. The formal words to go into the spec need to be tightened up a bit. 
> You should restate this to say that if an ORB uses the cache mechanism, it must > raise BAD_CONTEXT for the appropriate error conditions, but if it ignores the cache > mechanism, it does not need to detect the error conditions and raise > BAD_CONTEXT. Sounds good to me. > [As an aside here, I thought BAD_CONTEXT had to do with the IDL context > mechanism, not service contexts. Aren't ambiguous System Exceptions fun? :-)] Hmmm... The only semantics for BAD_CONTEXT are in an IDL comment in chapter 3: exception BAD_CONTEXT ex_body; // error processing context object From this, I agree with you -- BAD_CONTEXT apparently relates to context objects, not to service contexts. We could use MARSHAL where we suggested BAD_CONTEXT. BAD_TYPECODE might be another option. > This looks pretty good. Since it is entirely optional, we don't really have to do > a comprehensive cost/benefit or resource use/performance comparison. Given the inefficiency of the current marshaling rules, I think the benefit would be quite noticeable as soon as a type code is sent more than once. In particular, unmarshaling a type code is very expensive compared to looking it up in a cache, so I would expect to see performance improvements in both bandwidth and CPU cycles. Because the caching information is piggy-backed onto normal messages, maintaining the caching information is (almost) free. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Tue, 7 Jul 1998 08:38:17 +1000 (EST) From: Michi Henning To: Paul H Kyzivat cc: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 Content-ID: On Mon, 6 Jul 1998, Paul H Kyzivat wrote: > I think that something is needed here, and caching seems attractive in > many ways. 
Yet this proposal seems to be offhandedly introducing new > concepts that have potentially broad impact. Most significant is introducing > the notion of Session. The proposal only covers GIOP, which is connection-oriented. Using a connection as a session in this context makes sense (IMO). If we add other, non-connection-oriented protocols, the same caching concept may not even apply, or such protocols can choose to add a caching concept that is appropriate for the transport. > > To make caching work, we need a session concept. A session is > > simply an IIOP connection. Once a connection goes down, > > all previously cached type codes become invalid. > > Marshalling is not unique to IIOP, so it isn't sufficient to refer > to > IIOP to define what a session is. There may well be CORBA > implementations that use other transports having other notions of > connection, or none at all. There are no other connection-less transports specified at this point in time. The proposal is not meant to present a universal caching strategy that will work for all protocols current and future. It simply defines a caching strategy for GIOP. > > If either side runs out of room for cached type codes, it can > > close the connection to clear both caches. The client then > > re-opens the connection and both sides start afresh. > > This cannot be guaranteed to work. Once the connection is dropped, > there > is no guarantee that it can be re-established, or that references > previously obtained will still be valid. This seems to be in conflict with the current words in the specification, which explicitly state that a client can close a connection at any time, and that a server can close a connection after sending a CloseConnection message when there are no outstanding requests on the connection. The spec explicitly requires rebinding from the client if the server closes a connection. > For instance, once the > connection is dropped the server may decide it is no longer needed > and > shut down. 
Then a subsequent connection attempt may start a new > server. > Transient object references from the prior server would then be > invalid. This is not my understanding of the words in the spec. If references are transient, yes, they will not work for another process instance. However, the server is in error if it simply goes away and invalidates references just because a client closed a connection (because GIOP explicitly permits the client to do just that). > If the notion of Session is to be used, I would like to see it > incorporated into the OMA. I don't think the OMA would be the right place. The OMA is at a much higher level of abstraction than protocol-level caching or protocol-level sessions. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Mon, 06 Jul 1998 19:13:29 -0400 From: Paul H Kyzivat Organization: NobleNet To: Michi Henning CC: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: ... > The proposal only covers GIOP, which is connection-oriented. Using a > connection as a session in this context makes sense (IMO). If we > add other, non-connection-oriented protocols, the same caching concept > may not even apply, or such protocols can choose to add a caching > concept that is appropriate for the transport. ... > There are no other connection-less transports specified at this point > in time. The proposal is not meant to present a universal caching > strategy > that will work for all protocols current and future. It simply defines > a caching strategy for GIOP. OK. I give in. > > > > If either side runs out of room for cached type codes, it > can > > > close the connection to clear both caches. The client the > > > re-opens the connection and both sides start afresh. > > > > This cannot be guaranteed to work. 
Once the connection is dropped, > there > > is no guarantee that it can be re-established, or that references > > previously obtained will still be valid. > > This seems to be in conflict with the current words in > specification, > which > explicitly state that a client can close a connection at any time, > and > that > a server can close a connection after sending a CloseConnection > message > when there are no outstanding requests on the connection. The spec > explicitly > requires rebinding from the client if the server closes a > connection. > > > For instance, once the > > connection is dropped the server may decide it is no longer needed > and > > shutdown. Then a subsequent connection attempt may start a new > server. > > Transient object references from the prior server would then be > invalid. > > This is not my understanding of the words in the spec. If references > are > transient, yes, they will not work for another process instance. > However, > the server is in error if it simply goes away and invalidates > references > just because a client closed a connection (because GIOP explicitly > permits > the client to do just that). This is a bit off the subject - it is probably another portability issue. But it won't hurt to discuss it briefly here and see where it goes. If a server can't shut down when a client (presumably the last client) goes away then when can it? Surely it can't be prevented from shutting down until all transient objects have been destroyed via overt actions. For instance, a COSNaming server has to create BindingIterators under some circumstances. These are about as good a candidate for a transient object as I can imagine. A buggy client is quite likely to forget to destroy one of these. It would be folly for the server to be forced to keep these forever, or to be prevented from shutting down because they are present. A robust server is likely to use the close of the connection as an excuse to get rid of these things. 
So, a client can drop a connection and re-establish it at will, but there are bound to be consequences to doing so. As a result, I don't think we should be inventing new problems that can only be cured by doing this. Your other solution (just not saving any more) is more appealing, though it may be sub-optimal for a very long lived connection. > > > If the notion of Session is to be used, I would like to see it > > incorporated into the OMA. > > I don't think the OMA would be the right place. The OMA is at a much > higher > level of abstraction than protocol-level caching or protocol-level > sessions. Because your proposal is merely an optimization it doesn't need higher level visibility. But my digression into the implications of dropping connections indicates that there may be more to the session notion. It is more a consequence of the TRANSIENT policy than this subject. Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Tue, 7 Jul 1998 09:22:25 +1000 (EST) From: Michi Henning To: Paul H Kyzivat cc: orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 Content-ID: On Mon, 6 Jul 1998, Paul H Kyzivat wrote: > This is a bit off the subject - it is probably another portability > issue. But it won't hurt to discuss it briefly here and see where it > goes. > > If a server can't shut down when a client (presumably the last > client) > goes away then when can it? Surely it can't be prevented from > shutting > down until all transient objects have been destroyed via overt > actions. > > For instance, a COSNaming server has to create BindingIterators > under > some circumstances. These are about as good a candidate for a > transient > object as I can imagine. A buggy client is quite likely to forget to > destroy one of these. It would be folly for the server to be forced > to > keep these forever, or to be prevented from shutting down because > they > are present. 
A robust server is likely to use the close of the > connection as an excuse to get rid of these things. A compliant and robust server is entitled to axe an iterator at any time without warning, even if the connection never goes down. This is necessary so the server can protect itself from state pile-up. If a server couldn't do this, a malicious client could create iterators in an infinite loop without ever destroying them. Connection management is completely unrelated to how and when a server decides to destroy a transient object. Besides, we have this situation now already. Many ORBs will automatically shut down a server after some period of idle time. If a client creates an iterator and then doesn't use it for a while, the server will still shut down (regardless of whether the connection stays open or not). When the client next uses the iterator, it won't work if the iterator is transient. So, our proposal does not change the existing situation at all. In addition, I don't think there is a bug here. If a server creates transient iterators and then shuts down, it simply shouldn't shut down if it considers breaking these iterators as unacceptable... > So, a client can drop a connection and re-establish it at will, but > there are bound to be consequences to doing so. As a result, I don't > think we should be inventing new problems that can only be cured by > doing this. As I said above, we aren't inventing new problems, because the same thing can happen already, without caching. In addition, I don't think there is a problem in the first place. If the server wants to make sure that its transient objects remain available, it shouldn't shut down. If an ORB reaps transient objects as a result of connection closure, that ORB is simply broken, because connection closure doesn't mean anything. > Your other solution (just not saving any more) is more appealing, though > it may be sub-optimal for a very long lived connection. Yes. 
We considered alternatives to not accepting more cache entries or closing the connection. We discarded these alternatives because it would have complicated the caching protocol a lot. The way things are, there is no problem with cache coherency. As soon as you want to allow one side or the other to selectively invalidate cache entries (say, on a least-recently-used basis), all sorts of consistency issues and race conditions pop up. Basically, the proposal we drafted gets us a better than 90% solution very simply. To get the last few percent and achieve perfection would probably complicate things more than the optimization is worth. > Because your proposal is merely an optimization it doesn't need higher > level visibility. But my digression into the implications of dropping > connections indicates that there may be more to the session notion. It > is more a consequence of the TRANSIENT policy than this subject. Well, as I said, I don't think there is a problem here. Server shut-down is always under control of the server. If the server shuts down and then complains that its transient references stop working, it shouldn't shut down... Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Sender: db@Eng.Sun.COM Date: Mon, 06 Jul 1998 17:11:27 -0700 From: David Brownell Organization: JavaSoft, Inc. To: Michi Henning CC: Paul H Kyzivat , orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > > > The proposal only covers GIOP, which is connection-oriented. Using a > connection as a session in this context makes sense (IMO). But that's not a session, it's a connection. Sessions persist over multiple connections, per standard networking usage. 
I'd resist equating the two, particularly now that we're starting to see many more session oriented protocols finally cropping up. (SSL sessions can be resumed, etc; HTTP/HTTPS often uses cookies or URL rewriting, and so on.) > > > If either side runs out of room for cached type codes, it can > > > close the connection to clear both caches. The client the > > > re-opens the connection and both sides start afresh. > > > > This cannot be guaranteed to work. Once the connection is dropped, there > > is no guarantee that it can be re-established, or that references > > previously obtained will still be valid. Right ... > This seems to be in conflict with the current words in specification, which > explicitly state that a client can close a connection at any time, and that > a server can close a connection after sending a CloseConnection message > when there are no outstanding requests on the connection. The spec explicitly > requires rebinding from the client if the server closes a connection. But it doesn't require that the connection CAN be re-established, or that those references would still work. Intentionally so!! Better would be session management tools, such as letting the session purge typecodes when both sides agree ... forced rebinding is costly, and the space reduction shouldn't care which typecodes are freed. Best to do it to the ones the client doesn't care about. Similarly with iterators and other stuff associated with the session between client and server (i.e. any two communicating tiers) ... the same logic applies, except even more so: the client may be completely *unable* to reestablish the session state that would otherwise be discarded. Or the act of reestablishing it could be prohibitively expensive, forcing N-level distributed "undo" (for example) ... 
Certainly, the advantage of having a "Session" object in the OMA is that it could provide the appropriate hook for ill-defined clients which aren't participating in those session state management protocols. "No Session, No Service". Whether to accept a given client in a session is another issue... one could want to authenticate the client code (not user!) to help get rid of the inevitable misbehaved clients out there. - Dave Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Tue, 7 Jul 1998 10:19:29 +1000 (EST) From: Michi Henning To: David Brownell cc: Paul H Kyzivat , orb_revision@omg.org, interop@omg.org Subject: Re: Issues 817, 1138, 1531, and 1581 Content-ID: On Mon, 6 Jul 1998, David Brownell wrote: > Michi Henning wrote: > > > > > > The proposal only covers GIOP, which is connection-oriented. Using > a > > connection as a session in this context makes sense (IMO). > > But that's not a session, it's a connection. Sessions persist over > multiple connections, per standard networking usage. I disagree. In our proposal, a session has the same duration as a connection *by definition*. If you are worried about the term "session", fine. We can rename it to "type code caching context" or some such. > I'd resist > equating the two, particularly now that we're starting to see many > more session oriented protocols finally cropping up. (SSL sessions > can be resumed, etc; HTTP/HTTPS often uses cookies or URL rewriting, > and so on.) I think renaming to some term other than "session" would help to avoid confusion. Cheers, Michi. 
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 14:50:45 -0700 From: Jonathan Biggar To: Michi Henning CC: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: My comments inline below: Michi Henning wrote: > New marshaling rules: > > Strings: > > The null terminator for strings and wide strings is dropped, > and the count of bytes/characters preceding the string is > just the number of characters, not counting any terminator > (because there is none). This gets rid of the issue that > empty > strings effectively consume 8 bytes on the wire, which has > nasty > implications for marshaling type codes for structured types > (in some cases, the null termination of CDR 1.0 strings > leads > to type codes that are around 35% larger on the wire). One more twist on the issue. This method of encoding strings requires that C and C++ implementations copy the string before presenting it to application code (for example as an in parameter to an operation invocation) whenever the string does not happen to be followed by a zero octet. Of course it will happen most of the time that the string is followed by a pad octet, but the CDR 1.0 spec explicitly does not require the pad octets to be zero. Can we change the proposal for CDR 1.1 to require pad octets to be zero? Also, have we given consideration to the cost/benefit of requiring C and C++ implementations to copy strings in this situation? > A sending ORB must ensure that cached typecodes do not appear inside > encapsulations. This is because the cache index for cached type > code makes sense only for one particular client-server pair. 
If > cached type codes were allowed inside encapsulations, this would > force the receiver of an encapsulation to decode it just to be able > to safely forward the encapsulation to a new destination. It would be better to word this more strongly to state that the cached typecode format can only appear at the top level in GIOP Request & Reply messages and not in any encapsulations embedded in those messages. This, of course, means that anys embedded in anys cannot take advantage of typecode caching. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 09:06:33 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Jonathan Biggar wrote: > > > > Strings: > > > > The null terminator for strings and wide strings is > dropped, > > and the count of bytes/characters preceding the string is > > just the number of characters, not counting any terminator > > (because there is none). This gets rid of the issue that > empty > > strings effectively consume 8 bytes on the wire, which has > nasty > > implications for marshaling type codes for structured > types > > (in some cases, the null termination of CDR 1.0 strings > leads > > to type codes that are around 35% larger on the wire). > > One more twist on the issue. This method of encoding strings > requires > that C and C++ implementations copy the string before presenting it > to > application code (for example as an in parameter to an operation > invocation) whenever the string does not happen to be followed by a > zero > octet. Of course it will happen most of the time that the string is > followed by a pad octet, but the CDR 1.0 spec explicitly does not > require the pad octets to be zero. Can we change the proposal for > CDR > 1.1 to require pad octets to be zero? 
I have no problem with requiring pad octets to be zero. However, I don't think that will solve the problem in all cases. For example, if the string occurs as part of a struct and is followed by a char, there won't be any padding following the string. > Also, have we given consideration to the cost/benefit of requiring C and > C++ implementations to copy strings in this situation? I can see only one case in which a copy would be required, namely when a string is passed as an in parameter to a server. I can't think of any other case where the missing NUL terminator would be an issue because in all other cases, the string is allocated by the client or server and is passed via a pointer that must be deallocated by the receiver. In those cases, you can't point directly into the marshaling buffer anyway. So, back to the in string parameter. Let's see what the performance hit would be, at least roughly. For collocated calls, it's not an issue because no marshaling buffer is involved. So, the additional string copy is necessary only for remote calls (during unmarshaling of the string, the server-side ORB has to copy the string into a separate NUL-terminated hunk of memory and pass a pointer to that hunk to the skeleton). What's the cost of that? A call to the memory allocator followed by a memcpy(). The memcpy() cost is absolutely negligible. The call to the allocator is expensive, however. How much more expensive compared to all the other things that already go on during call dispatch? I don't know, but maybe some ORB implementors could chip in here? But given all the other things that already happen during call dispatch, my gut feeling is that the additional string copy won't be noticeable. > > A sending ORB must ensure that cached typecodes do not appear inside > > encapsulations. This is because the cache index for cached type > > code makes sense only for one particular client-server pair. 
If > > cached type codes were allowed inside encapsulations, this would > > force the receiver of an encapsulation to decode it just to be able > > to safely forward the encapsulation to a new destination. > > It would be better to word this more strongly to state that the cached > typecode format can only appear at the top level in GIOP Request & Reply > messages and not in any encapsulations embedded in those messages. > This, of course, means that anys embedded in anys cannot take advantage > of typecode caching. You are right, and I would not object to a stronger wording. However, how often do Anys appear inside other Anys? I suspect not very often (but I'm sure someone will scream and tell me otherwise if I'm wrong ;-) The more important question (to me) is: Is it affordable for implementations to have this requirement imposed on them? Basically, it is very easy to say that cached type codes cannot appear inside encapsulations. However, how hard or easy is it to implement that? The way I see it, the marshaling code that marshals type codes would have to know whether or not that type code is being marshaled at a point where it will eventually be encapsulated. If the type code is going to go into an encapsulation, the marshaling code would have to avoid using cached type codes. Alternatively, when the marshaling code decides that something needs to be encapsulated, it could scan the data to be encapsulated in order to replace cached type codes with full ones. Of course, there are all sorts of optimizations to make either approach cheaper, but is it reasonable to expect marshaling engines to deal with this? If not, I'm afraid the cached type codes will encounter serious obstacles... Cheers, Michi. 
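[Editor's note] The size arithmetic behind the string-encoding change is easy to check mechanically. The Python sketch below is an illustration added to this archive, not code from the thread; it assumes little-endian CDR and a next item that needs 4-byte alignment, and shows why an empty string costs 8 bytes under CDR 1.0 but only 4 under the proposed CDR 1.1 rule.

```python
import struct

def cdr10_string(s: str) -> bytes:
    """CDR 1.0: ulong length (terminator included), then NUL-terminated data."""
    data = s.encode("latin-1") + b"\x00"
    return struct.pack("<I", len(data)) + data

def cdr11_string(s: str) -> bytes:
    """Proposed CDR 1.1: ulong character count, then the characters only."""
    data = s.encode("latin-1")
    return struct.pack("<I", len(data)) + data

def aligned(nbytes: int, align: int = 4) -> int:
    """Bytes consumed once the stream re-aligns for the next 4-byte item."""
    return -(-nbytes // align) * align

# Empty string: 5 raw bytes -> 8 after re-alignment under CDR 1.0,
# but only 4 raw bytes (no padding needed) under CDR 1.1.
cost10 = aligned(len(cdr10_string("")))
cost11 = aligned(len(cdr11_string("")))
```

The same arithmetic also shows Jon's copy problem: the CDR 1.1 bytes for "ab" end at the final character, so nothing guarantees a zero octet follows them in the receive buffer.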
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Wed, 15 Jul 1998 18:19:07 -0600 From: Javier Lopez-Martin To: interop@omg.org, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Content-Md5: iDvXHF6DqoSMCN6U9rH9RQ== Michi, Let me start by saying that I mostly agree with what you have proposed (together with Keith and Tom). I would like to suggest some modifications though to try and complete the proposal. > New marshaling rules: > > Strings: > > The null terminator for strings and wide strings is dropped, > and the count of bytes/characters preceding the string is > just the number of characters, not counting any terminator > (because there is none). This gets rid of the issue that empty > strings effectively consume 8 bytes on the wire, which has > nasty > implications for marshaling type codes for structured types > (in some cases, the null termination of CDR 1.0 strings leads > to type codes that are around 35% larger on the wire). Good. > Type any: > > A second, alternative encoding for type any is added to CDR > 1.1. > The alternative encoding is indicated using a mechanism > similar > to the one for indirection for recursive type codes. > A value of 0xfffffffe for TCKind indicates the alternative > any encoding. The difference to the normal encoding of type > any > is that instead of having an any contain a type code followed > by the data, in CDR 1.1, the any contains a type code followed > by the *encapsulated* data. > > This gets rid of a range of problems relating to marshaling > of type any. For example, an event channel cannot receive an > any and send it to a consumer without completely unmarshaling > and remarshaling every any value. 
The CDR 1.1 encoding for any > values permits an event channel to both unmarshal and marshal > an any as a blob, without having to decode and re-encode all > of > the data in the any. > > The reason for using 0xfffffffe instead of an ordinary > tk_whatever > constant is that this avoids making changes to the type code > API. The new marshaling is completely transparent to the > application > code. > > CDR 1.1 any is represented as (in this order): The old tk_any + typecode + value is still valid in CDR 1.1. What you present here is the alternative encoding, that might or might not be used at the ORB's discretion (just clarifying that the old encoding is also valid in CDR 1.1 ...). > > - 0xfffffffe (4 bytes) > > - type code (normal CDR 1.0 encoding or new cached encoding, > see below) > > - *encapsulation* of the any value > > This means we have two encodings of type any side by side, the old > CDR 1.0 encoding (which is still valid), and the new encoding. > How does the sender decide how to marshal an any? > > One of several possible strategies: > > - Use CDR 1.0 encoding for all anys with simple type codes > that are fixed length, and for all anys carrying a simple > fixed-length value or a single string. For such anys, > the CDR 1.1 encoding just adds bulk (an extra 8 bytes). > > - Use the CDR 1.1 encoding for all anys with complex type > codes that have several parameters. > > - Use the CDR 1.0 encoding if the sender wants to use > fragmentation (encapsulation of the value can get in > the way of fragmentation if the value is larger than > the largest supported fragment size). What follows is, in my opinion, a separate proposal, that builds on the above (ie, it requires it to be doable). However, the above rules could be approved independently of the caching mechanism. Right? > Type codes: > > - CDR 1.1 adds caching for type codes to save bandwidth. 
The > new encoding is indicated by a type code with the value > 0xfffffffd in the first four bytes (again, to avoid changing > the type code API). > > The idea of cached type codes is to avoid repeatedly > marshaling the > same type code over and over again. Avoiding this is > particularly important for any values sent to event > channels. > Typically, the same type code is sent over and over with > every event, and for structures, the type code can easily be > 10 times the size of the actual value (if the type code > contains member names). Even without member names, the type > code > can still be more than 5 times the size of the value. > > Basic strategy: If a client and a server talk to each other > and both have indicated that they are able to do type code > caching, the sender can send a number instead of a full type > code if the receiver has previously indicated that it has > cached the type code indicated by the number. > > To make caching work, we need the notion of a type code > caching > context that indicates when caches are valid. A type code > caching > context is simply an IIOP connection. Once a connection goes > down, > all previously cached type codes become invalid. > > Exchange of caching information requires some negotiation > between > sender and receiver. GIOP 1.2 uses the service context to > do this. We propose two new service contexts: > > module IOP { > // ... > const ServiceId TypeCodeCacheEnabled = N; (to be > assigned) > const ServiceId ReposIdCache = N+1; (ditto) > // ... > }; > > For TypeCodeCacheEnabled, the corresponding context_data > octet sequence of the service context is empty. > > TypeCodeCacheEnabled is used by the sender to indicate to > the > receiver that it understands caching. > > For example, when a client sends the first request to a > server via a connection, the client can add a > TypeCodeCacheEnabled > entry to the service context of the request message. 
This > indicates > to the server that the client understands caching. > > Correspondingly, the server can on its first reply via a > particular > connection indicate to the client that it understands > caching > by adding a TypeCodeCacheEnabled entry to the service > context > of its reply message. > > TypeCodeCacheEnabled allows client and server to inform each > other about their capabilities. Client and server need to > send > this context ID only once per connection (but are allowed to > send it with every message, and need not send it with the > first message). > > TypeCodeCacheEnabled messages permit either side to decide > whether caching of type codes is worthwhile (without this > message, a receiver may uselessly cache type codes even > though the sender will never take advantage of the cache). With this mechanism that you have proposed, both ends MUST support the caching, and be willing to cache. I would prefer a mechanism that would allow each end to decide whether to support caching or not. Suggestion: modify the above negotiation to be directional; if both ends want to do caching, you get the same behaviour as above, and if only one end wants caching, at least you might get the benefit in that direction. The way to modify the negotiation may be by using values attached to the TypeCodeCacheEnabled service context, or by using two service contexts. For the sake of the explanation, make it two service contexts, called "WantYouToCache" and "OkI'llCache". The negotiation would go as follows: the client sends "WantYouToCache" with some request; if the server accepts, it sends the "OkI'llCache" in the response; and if the server also wants the client to cache, then it will send the "WantYouToCache" service context as well (in the same message, or separately), and if the client is able (and willing) to cache, it'll reply with another "OkI'llCache" (in some future message). 
Only after a sending end (client or server) has seen an "OkI'llCache" service context will it be able to use the optimization. This effectively makes both communication directions independent, does not add complexity, and increases flexibility. Maybe this was already possible with your proposal, and I was missing something. If it is so, let me know. > Once client and server have agreed to use type code caching, > cached type codes work as follows: > > When a client sends a type code for the very first time, it must > send it using normal CDR 1.0 encoding rules (as a full type code). > If the server understands caching and has cached this type code, > it includes a ReposIdCache entry within its reply service context. The receiving side might cache at this time (might not "have cached" the typecode previously), right? > A ReposIdCache entry contains a sequence of pairs. Each > pair is the repository ID of a type code sent by the client > previously, together with an index value. The encoding in the > context_data member of a ServiceContext for ReposIdCache is: > > struct CacheEntry { > string repositoryID; > long index; > }; > typedef sequence<CacheEntry> CacheEntrySeq; > > So, if a client invokes an operation where the in or inout > parameters contain values of type any (possibly nested), > the server can return a sequence of those type codes that > it has chosen to cache. Each sequence member contains > a repository ID and the index value that was assigned to that > repository ID by the server. The receiver of a type code must > return CacheEntrySeq only for type codes it received in the > corresponding request. You are mentioning "possibly nested" typecodes here. With your modification below, this is no longer possible, right? This can only be used with outermost typecodes. Also, why the limitation of only being able to send typecodes received in a request in the corresponding response? I think this is an arbitrary limitation that could be dropped. 
Effectively, it is not there in the server-to-client negotiation ... > If the client sends another any value with a previously sent > type code which the server has cached, it encodes the type code > as 0xfffffffd, followed by a single parameter value of type > unsigned long. That parameter is the index of the type code > previously assigned by the server. > > The net effect is that if a server understands caching, the client > needs to send the full typecode only once per connection, and > thereafter just sends the cache index for that type code. The client needs to keep a "pseudo" cache (or something like that) with repository ids and server cache indexes. > A client can use the tk_implicit encoding only for those type > codes it has previously received an index for. Oops, you meant the 0xfffffffd encoding ... > For typecodes returned from the server to the client (as inout > or out parameters or return values), we again use the service > context. Suppose the server returns an any value to the client > for the first time. If the server previously (or with this reply) > has indicated to the client that it understands caching, the client > can cache the type code it has just received and assign an index > to that type code. The next time the client makes a request to > the server, it sends the repository ID for the type code and > the index to the server. This indicates to the server that in > future (on the same connection), it can just send the index > to the client for that type code (instead of the full type code). This is just the other direction of the same mechanism. I would prefer them to be separately configurable, as I mentioned above. > If a connection ever goes down, all cached type codes are thrown > away at both ends. Ok. > The client and server caches are completely independent from > each other and each uses its own index range. 
Yes, but you need an extra "thing" on the sending end to keep the agreed mapping between repository id and cache index in the receiving side. > If either side runs out of room for cached type codes, it can > close the connection to clear both caches. The client then re-opens > the connection and both sides start afresh. (This happens as > part of the normal rebinding after a CloseConnection message > from a server). > > Alternatively, if a cache fills up, the party with the full cache > can simply stop sending further ReposIdCache service contexts > and sit on whatever type codes it has currently cached. > > The type caching scheme deliberately does not include a way to explicitly > outdate a cache (other than by dropping a connection), or to > selectively outdate particular cache entries. The reason for this > decision is to keep complexity down and to avoid race conditions > in multi-threaded clients and servers. This is what I think is weakest in the proposal. Forcing the close of the connection to clean up the full cache seems too drastic to me. Besides, you might want to get rid of 50% of your cache, but not the rest; if you invalidate the whole cache, you have to repopulate it with the "still valid" 50% that you just threw away. Also, not being able to cache beyond a certain point seems problematic for long-running connections. You probably have a lot of useless "old" cached typecodes, but you cannot get rid of them in any way (unless you close the connection). I would like to propose a mechanism to selectively invalidate the caches at either end (the receiving cache) that, as far as I can tell, has no race-condition problems. First, add a new service context, something like InvalidateCacheEntries, that would carry a sequence of cache ids to be invalidated. 
This could be sent at any time by the "sending" entity, and would immediately invalidate entries in the "receiving" side (as soon as received); as soon as the message containing this is sent, no other message from the sending side may use the removed cache ids. As the cache is specific to a connection, there is no risk of other threads stepping over the operation. Second, add another service context, call it WantToInvalidate, that would carry a sequence of cache ids. This would be sent by the "receiving" entity at any time, to indicate to the "sending" side the need to clean up some entries; sending this does NOT invalidate the entries, it only notifies the sending entity of the need to clean up. The "sending" side should, after receiving this, send an InvalidateCacheEntries service context, with a list of cache ids to invalidate at the "receiving" end. Note that the WantToInvalidate provides only a "hint" of what entries to invalidate; the one that actually eliminates entries is the list sent with InvalidateCacheEntries. As this message is only a hint, there should be no concern with race conditions or deadlocks. This would still be piggy-backed in the normal messages, and would allow for a relatively easy recovery mechanism to clean up unwanted entries in the caches. This may be used by either end of the connection, to clean up the corresponding caches. > Neither side is obliged to send cached type codes, even if > the other end previously has sent a cache entry for that type > code. It is *always* legal to send the full type code instead. > > If either side sends an index value inside a cached type code > that hasn't been seen as part of a CacheEntry previously, > a MARSHAL exception is raised to the client application code. Ok. > If either side sends a cached type code when the receiver > hasn't previously indicated willingness to cache type codes with > TypeCodeCacheEnabled, BAD_SERVICE_CONTEXT is raised. 
Actually, there are two distinct problems: if the sending side sends a ReposIdCache service context without prior TypeCodeCacheEnabled, then BAD_SERVICE_CONTEXT should be raised. If a cached type code (a typecode with kind == 0xfffffffd) is sent in a message, and no previous negotiation has happened, then MARSHAL should be used. > If either side sends a repository ID inside a CacheEntry that > was cached previously, BAD_SERVICE_CONTEXT is raised. > > If either side sends an index value inside a CacheEntry > when that index value is already in use, BAD_SERVICE_CONTEXT > is raised. I would prefer these two to allow for repetitions to happen. That is, if the repository id and cache id are the same as already in the cache, no exception needs to be raised. One global comment about the BAD_SERVICE_CONTEXT exception: as this mechanism is piggy-backed in "normal" messages, where should this exception be sent? The user has no need to see it, and the ORB probably is unable to do anything about it. So we might as well treat this as a hard error in the protocol, and an indication that whoever was going to raise the exception should close the connection at the earliest possible time ... I'm not sure about this one though. What do others think? > Note that we have added a new system exception to indicate a bad service > context. This is because, according to the comment in chapter 3 of the spec, > BAD_CONTEXT is to be used for bad context *objects*, not for bad > service contexts (which didn't exist when that exception was defined). > > For all exception conditions, we can define appropriate minor codes > to indicate what is wrong with a context. > > All of the above relies on CDR 1.1, which can only occur as part of GIOP 1.2. > None of the type code caching is mandatory, so it is fully backward > compatible with previous versions, and ORB vendors are not obliged to > implement it. 
> > The BAD_SERVICE_CONTEXT exceptions are optional (neither side is obliged > to raise these, but can). This allows an ORB that doesn't want to cache > type codes to completely ignore the service context. Ok, fine with me. > A sending ORB must ensure that cached typecodes do not appear inside > encapsulations. This is because the cache index for cached type > code makes sense only for one particular client-server pair. If > cached type codes were allowed inside encapsulations, this would > force the receiver of an encapsulation to decode it just to be able > to safely forward the encapsulation to a new destination. This is pretty strong. I see the reason behind it, but I know of some particularly common cases where this restriction seems too strong. I will try to think whether there is a different way to solve the problem instead of plainly forbidding it. I would like to make another proposal, complementary to this one. Michi, I know you hate it, but I still think that it is worth making the proposal and letting others scream and yell ... or support it. This complementary proposal depends on the first part of this one (CDR 1.1) but does not depend on the caching mechanism here proposed (although it would perfectly coexist with it, without problems). This other proposal is the introduction of an "application level" mechanism for doing parts of what is being suggested here. The base of this proposal is to introduce a new TCKind (call it tk_implicit), that could be used instead of any typecode that has a repository id. An any with an "implicit" typecode would be marshaled as the tk_implicit typecode (that is, an int followed by the repository id, and nothing else), and then followed by the encapsulated value. Note the value is encoded as an encapsulation. On the receiving end, the repository id would be enough to get the full typecode (if needed). The sending application should explicitly replace the complete typecode by the equivalent "implicit" typecode. 
At the receiving side, the application would be responsible for the identification of the appropriate type being transmitted, either by looking at the repository id in the typecode, or by other application-defined mechanisms (i.e., NVPairs normally carry enough information in the name to know what the right type for the value is), and extracting the value into the right variable. To support this, we need a mechanism to create "implicit" typecodes from full typecodes, and to replace typecodes in the language mappings. Also, the equivalent() proposal by Jon Biggar should be modified to make implicit typecodes equivalent to full typecodes with the same repository id. It would be the application's responsibility to negotiate the use of this mechanism, either one-way or two-way. If the receiving end does not understand what was received, it may raise the MARSHAL system exception (or another that is more appropriate, maybe even a new one); if it is the client receiving a response who receives the implicit typecode that is not understood, then it would be ignored. In both cases, this should be considered as an application error, either programming or configuration. This requires almost nothing from the ORB vendors, and only the applications that want to use it will pay for it. Besides, it does not require any extra cache mechanism (the application is likely to have the knowledge already, so no extra expense there). Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 17:21:16 -0700 From: Jonathan Biggar To: Michi Henning CC: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > The more important question (to me) is: Is it affordable for implementations > to have this requirement imposed on them? Basically, it is very easy > to say that cached type codes cannot appear inside encapsulations. However, > how hard or easy is it to implement that? 
I don't think this is a problem... > The way I see it, the marshaling code that marshals type codes would > have to know whether or not that type code is being marshaled at a point > where it will eventually be encapsulated. If the type code is going to > go into an encapsulation, the marshaling code would have to avoid using > cached type codes. This seems the more likely implementation approach. The marshaling engine is probably handed a "pointer" to the TypeCode cache that it can use during marshaling to emit cached TypeCode references. It would be simple to arrange for that "pointer" to point to an empty cache during marshaling of encapsulations. > Alternatively, when the marshaling code decides that something needs to > be encapsulated, it could scan the data to be encapsulated in order to > replace cached type codes with full ones. Ick! > Of course, there are all sorts of optimizations to make either approach > cheaper, but is it reasonable to expect marshaling engines to deal with this? > If not, I'm afraid the cached type codes will encounter serious obstacles... -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Date: Wed, 15 Jul 1998 18:24:33 -0600 From: Javier Lopez-Martin To: jon@floorboard.com, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Cc: interop@omg.org Content-Md5: 2AMq1v8e1ETOuEV8xFCUqQ== Michi, > > > A sending ORB must ensure that cached typecodes do not appear inside > > > encapsulations. This is because the cache index for cached type > > > code makes sense only for one particular client-server pair. If > > > cached type codes were allowed inside encapsulations, this would > > > force the receiver of an encapsulation to decode it just to be able > > > to safely forward the encapsulation to a new destination. 
> > > > It would be better to word this more strongly to state that the cached > > typecode format can only appear at the top level in GIOP Request & Reply > > messages and not in any encapsulations embedded in those messages. > > This, of course, means that anys embedded in anys cannot take advantage > > of typecode caching. > > You are right, and I would not object to a stronger wording. However, > how often do Anys appear inside other Anys? I suspect not very often > (but I'm sure someone will scream and tell me otherwise if I'm wrong ;-) You asked for it ;-), so ... Nested anys are VERY common in the case of CMIP events converted to CORBA and forwarded via the event/notification services. In fact ALL CMIP events converted to CORBA are the same basic type, a struct with several fields, one of which is the "data" field, which, guess what, is an any. So, when you send this to an (untyped) event/notif channel, you have an any within an any, always ... You see, the situation is much more common than what you think, especially for certain application domains. Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com > The more important question (to me) is: Is it affordable for implementations > to have this requirement imposed on them? Basically, it is very easy > to say that cached type codes cannot appear inside encapsulations. However, > how hard or easy is it to implement that? > > The way I see it, the marshaling code that marshals type codes would > have to know whether or not that type code is being marshaled at a point > where it will eventually be encapsulated. If the type code is going to > go into an encapsulation, the marshaling code would have to avoid using > cached type codes. > > Alternatively, when the marshaling code decides that something needs to > be encapsulated, it could scan the data to be encapsulated in order to > replace cached type codes with full ones. 
> > Of course, there are all sorts of optimizations to make either approach > cheaper, but is it reasonable to expect marshaling engines to deal with this? > If not, I'm afraid the cached type codes will encounter serious obstacles... > > Cheers, > > Michi. > -- > Michi Henning +61 7 33654310 > DSTC Pty Ltd +61 7 33654311 (fax) > University of Qld 4072 michi@dstc.edu.au > AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 10:48:36 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: jon@floorboard.com, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Javier Lopez-Martin wrote: > > You are right, and I would not object to a stronger wording. However, > > how often to Anys appear inside other Anys? I suspect not very often > > (but I'm sure someone will scream and tell me otherwise if I'm wrong ;-) > > You asked for it ;-), so ... Javier, there was exactly one specific person I had in mind when I wrote the above sentence. Guess who it was :-) > Nested anys are VERY common in the case of CMIP events converted to CORBA > and forwarded via the event/notification services. In fact ALL CMIP events > converted to CORBA are the same basic type, a struct with several fields, > one of which is the "data" field, that guess what, is an any. So, when > you send this to an (untyped) event/notif channel, you have an any within > an any, always ... > > You see, the situation is much more common that what you think, specially > for certain application domains. Yes. If we want to accommodate cached type codes inside encapsulations, we get a problem though, namely that the receiver of the encapsulation can no longer blindly forward it to another destination (which was the motivation for encapsulating the value part of an Any in the first place). 
This seems like a catch-22: either we have encapsulated Any values, or we have cached type codes, but I can't see how to have both with the current proposal. Javier, I understand your reasoning -- for the JIDM mapping, Anys nested inside other Anys are important. On the other hand, how much of CORBA should be tuned to the requirements of one particular object model bridge? I'm not saying that what you are asking for is unreasonable, it's just that I'm not sure how many concessions CORBA should make to accommodate bridges that arguably are not the mainstream use of CORBA. (This is a little bit off-topic and more of a philosophical and political question. I didn't mean it to be demeaning.) One option I am carefully considering (but I'm not sure whether it makes sense yet) is that CDR 1.1 could add a flag to encapsulations that indicates whether an encapsulation includes a cached type code or not. If not, it is safe to forward the encapsulation without decoding it. This would (sort of) get around the problem, but I'm worried about the ever-increasing complexity. The basic caching proposal we put on the table has the advantage of simplicity. If we try to layer all-singing and all-dancing caching stuff onto service contexts, I would probably prefer to drop the proposal. The intent was not to invent a whole new protocol, but to get an optimization for the existing one without disturbing too much. What we are running into here really is that GIOP is difficult to extend for new functionality. GIOP mangles different abstraction levels, such as address resolution (LocationForward), marshaling (CDR rules), fragmentation, and connection management into a single big blob. If this proposal turns into a monolith of complexity and hidden stateful interactions between client and server, I definitely won't support it. I don't want to go down in history as yet another person who has made CORBA worse instead of better... Cheers, Michi. 
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Wed, 15 Jul 1998 19:11:47 -0600 From: Javier Lopez-Martin To: javier@cnd.hp.com, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Cc: jon@floorboard.com, interop@omg.org Content-Md5: kJFu1ZKwDg60+3i8bUa9+A== > On Wed, 15 Jul 1998, Javier Lopez-Martin wrote: > > > > You are right, and I would not object to a stronger > wording. However, > > > how often to Anys appear inside other Anys? I suspect not very > often > > > (but I'm sure someone will scream and tell me otherwise if I'm > wrong ;-) > > > > You asked for it ;-), so ... > > Javier, > > there was exactly one specific person I had in mind when I wrote the > above sentence. Guess who it was :-) I sure did bite in ... > > Nested anys are VERY common in the case of CMIP events converted to CORBA > > and forwarded via the event/notification services. In fact ALL CMIP events > > converted to CORBA are the same basic type, a struct with several fields, > > one of which is the "data" field, that guess what, is an any. So, when > > you send this to an (untyped) event/notif channel, you have an any within > > an any, always ... > > > > You see, the situation is much more common that what you think, specially > > for certain application domains. > > Yes. If we want to accommodate cached type codes inside encapsulations, > we get a problem though, namely that the receiver of the encapsulation > can no longer blindly forward it to another destination (which was the > motivation for encapsulating the value part of an Any in the first place). > > This seems like a catch-22: either we have encapsulated Any values, or > we have cached type codes, but I can't see how to have both with the > current proposal. I'm beginning to feel the same way. 
One way could be to make the case of nested anys one for explicit non-encapsulation encoding. In a non-encapsulated value, you could still use caching (although you couldn't forward without remarshalling). It is definitely a catch-22 ... > Javier, I understand your reasoning -- for the JIDM mapping, Anys nested > inside other Anys are important. I was just mentioning a case that I happen to know quite well. However, I think the case is much more prevalent than what you are suggesting. > On the other hand, how much of CORBA > should be tuned to the requirements of one particular object model > bridge? I'm not saying that what you are asking for is unreasonable, > it's > just that I'm not sure how many concessions CORBA should make to > accommodate bridges that arguably are not the mainstream use of > CORBA. > (This is a little bit off-topic and more of a philosophical and > political > question. I didn't mean it to be demeaning.) No, I wouldn't ask for anything in CORBA to "accommodate" bridges or other object models, unless these things would be useful for a number of other general-purpose situations. Sure, small typecodes are VERY useful if you want to get any performance in an application that is based on the 'any' type. And sure, it will benefit the CORBA/TMN interworking situation. But a lot of other things in CORBA would benefit as well because of that. But enough of this off-topic. Summary: I believe that nested anys are more common than what you think. Maybe this could be a configuration option left to the application (or to the ORB/app administrator): choose between encapsulated values and non-encapsulated values in the presence of nested anys. > One option I am carefully considering (but I'm not sure whether it makes > sense yet) is that CDR 1.1 could add a flag to encapsulations that indicates > whether an encapsulation includes a cached type code or not. If not, > it is safe to forward the encapsulation without decoding it. 
> > This would (sort of) get around the problem, but I'm worried about the > ever-increasing complexity. The basic caching proposal we put on the > table has the advantage of simplicity. I don't particularly like this. Yes, it will get around the problem, but at the expense of complexity, as you already mention. I'd rather not do this, and force encapsulations to not have cached typecodes (what a pity :-(). > If we try to layer all-singing > and all-dancing caching stuff onto service contexts, I would > probably > prefer to drop the proposal. The intent was not to invent a whole > new > protocol, but to get an optimization for the existing one without > disturbing too much. > > What we are running into here really is that GIOP is difficult to > extend > for new functionality. GIOP mangles different abstraction levels, > such > as address resolution (LocationForward), marshaling (CDR rules), > fragmentation, and connection management into a single big blob. > > If this proposal turns into a monolith of complexity and hidden > stateful > interactions between client and server, I definitely won't support > it. > > I don't want to go down in history as yet another person who has > made > CORBA worse instead of better... I think that is a shared feeling around here ... Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 18:17:41 -0700 From: Jonathan Biggar To: Javier Lopez-Martin CC: interop@omg.org, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: <199807160019.AA086378347@ovdm40.cnd.hp.com> Javier Lopez-Martin wrote: > The old tk_any + typecode + value is still valid in CDR 1.1. What you > present here is the alternative encoding, that might or might not be > used at the ORB's discretion (just clarifying that the old encoding > is also valid in CDR 1.1 ...). Right. 
> What follows is, in my opinion, a separate proposal, that builds on > the above (ie, it requires it to be doable). However, the above > rules > could be approved independently of the caching mechanism. Right? Right. > With this mechanism that you have proposed, both ends MUST support the > caching, and be willing to cache. I would prefer a mechanims that > would allow each end to decide whether to support caching or not. No. Both sides need to indicate that they support the caching protocol, but it is no problem to use the defined protocol to say that you support caching, but then never actually add anything to the cache on your end. If you never add anything to the cache, then the other end can never send you a cached typecode. > For the sake of the explanation, make it two service contexts, called > "WantYouToCache" and "OkI'llCache". The negotiation would go as > follows: client sends "WantYouToCache" with some request; if server > accepts, sends the "OkI'llCache" in the response; and if the server > also wants the client to cache, then it will send the "WantYouToCache" > service context as well (in the same message, or separately), and if > the client is able (and willing) to cache, it'll reply with another > "OkI'llCache" (in some future message). The two proposed service contexts already add as the "want" and "ok". > You are mentioning "possibly nested" typecodes here. With your modification > below, this is no longer possible, right? This can only be used with > outermost typecodes. Good point. Nested TypeCodes are inside encapsulations! Could this be allowed as an exception to the no-cached-typecodes in encapsulations rules? Otherwise if I get a tk_array or tk_sequence, there is no way to cache the component. > Also, why the limitation of only being able to send typecodes received > in a request in the corresponding response? I think this is an arbitrary > limitation, that could be dropped. 
Effectively, it is not there in the > server to client negotiation ... I agree. The caching side should be free to send the ReposIdCache context on any message (containing service contexts!), not necessarily the direct response. > > If the client sends another any value with a previously sent > > type code which the server has cached, it encodes the type code > > as 0xfffffffd, followed by a single parameter value of type > > unsigned long. That parameter is the index of the type code > > previously assigned by the server. > > > > The net effect is that if a server understands caching, the client > > needs to send the full typecode only once per connection, and > > thereafter just sends the cache index for that type code. > > The client needs to keep a "pseudo" cache (or something like that) with > repository ids and server cache indexes. Right, there is both a sender and a receiver side cache data structure defined for each direction of the conversation. > This is what I think is weakest in the proposal. Forcing the close of > the connection to clean up the full cache seems to me like too drastical. > Besides, you might want to get rid of 50% of your cache, but not the > rest; if you invalidate the whole cache, you have to repopulate it with > the "still valid" 50% that you just threw away. > > Also, not being able to cache beyond a certain point seems problematic > for long-running connections. You probably have a lot of useless "old" > cached typecodes, but you cannot get rid of them in any way (unless you > close the connection). > > I would like to propose a mechanism to selectively invalidate the caches > at either end (the receiving cache) that, as far as I can tell, has > no problems of race conditions. > > First, add a new service context, something like InvalidateCacheEntries, > that would carry a sequence of cache ids to be invalidated. 
This could > be sent at any time by the "sending" entity, and would inmediately > invalidate entries in the "receiving" side (as soon as received); > as soon as the message containing this is sent, no other message from > the sending side might use the removed cache ids. As the cache is > specific to a connection, there is no risk of other threads stepping > over the operation. > > Second, add another service context, call it WantToInvalidate, that would > carry a sequence of cache ids. This would be sent by the "receiving" > entity at any time, to indicate to the "sending" side of the need to > clean up some entries; sending this does NOT invalidate the entries, > only notify the sending entity of the need to clean up. The "sending" > side should, after receiving this, send an InvalidateCacheEntries > service context, with a list of cache ids to invalidate at the "receiving" > end. Note that the WantToInvalidate provides only a "hint" of what > entries to invalidate, the one that actually eliminates entries is > the list sent with InvalidateCacheEntries. As this message is only > a hint, there should be no concern with race conditions or deadlocks. This would work ok. The big question is how to detect that you want to invalidate entries in the cache and which ones? Do some LRU algorithm. Of course, it is completely implementation defined--no need to standardize. > Actually, there are two distinct problems: if the sending side sends a > ReposIdCache service context without prior TypeCodeCacheEnabled, then > BAD_SERVICE_CONTEXT should be raised. If a cached type code (a typecode > with kind == 0xfffffffd) is sent in a message, and no previous negotiation > has happened, then MARSHAL should be used. Right. > > If either side sends a repository ID inside a CacheEntry that > > was cached previously, BAD_SERVICE_CONTEXT is raised. 
> > > > If either side sends an index value inside a CacheEntry > > when that index value is already in use, BAD_SERVICE_CONTEXT > > is raised. > > I would prefer these two to allow for repetitions to happen. That is, > if the repository id and cache id are the same as already in the cache, > no exception needs to be raised. I agree. No need to detect duplicates in the cache. The receiver can just keep the most recently received index. > One global comment about the BAD_SERVICE_CONTEXT exception: as this > mechanism is piggy backed in "normal" messages, where should this > exception be sent? The user has no need to see it, and the ORB > probably > is unable to do anything about it. So we might as well point this > as > a hard error in the protocol, and an indication that whoever was > going > to raise the exception should close the connection at the earliest > possible time ... Use the GIOP Error message? I've never been clear what use it really is for, since there is no mandated response required when receiving one. > I would like to make another proposal, complementary to this one. > Michi, I know you hate it, but I still think that it is worth making > the proposal and letting others scream and yell ... or support it. > This complementary proposal depends on the first part of this one > (CDR 1.1) but does not depend on the caching mechanism here proposed > (although it would perfectly coexist with it, without problems). > > This other proposal is the introduction of an "application level" > mechanism for doing parts of what is being suggesting here. The > base of > this proposal is to introduce a new TC_Kind (call it tk_implicit), > that > could be used instead of any typecode that has a repository id. An > any > with an "implicit" typecode would be marshalled as the tk_implicit > typecode (that is an int followed by the repository id, and nothing > else), > and then followed by the encapsulated value. Note the value is > encoded > as an encapsulation. 
In the receiving end, the repository id would > be > enough to get the full typecode (if needed). I don't see how you can make this work, since you have not defined any negotiation mechanism for an application to discover that its peer understands tk_implicit! It seems like this has some awful interoperability problems. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 18:19:52 -0700 From: Jonathan Biggar To: Javier Lopez-Martin CC: michi@dstc.edu.au, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: <199807160111.AA096211507@ovdm40.cnd.hp.com> Javier Lopez-Martin wrote: > > One option I am carefully considering (but I'm not sure whether it > makes > > sense yet) is that CDR 1.1 could add a flag to encapsulations that > indicates > > whether an encapsulation includes a cached type code or not. If > not, > > it is safe to forward the encapsulation without decoding it. > > > > This would (sort of) get around the problem, but I'm worried about > the > > ever-increasing complexity. The basic caching proposal we put on > the > > table has the advantage of simplicity. > > I don't particularly like this. Yes, it will get around the > problem, > but at the expense of complexity, as you already mention. I'd > rather > not do this, and force encapsulations to not have cached typecodes > (what a pity :-(). Again, Ick! Any proposal that requires ORBs to decode and reencode encapsulations just makes me sick to my stomach. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 11:51:31 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Javier Lopez-Martin wrote: > The old tk_any + typecode + value is still valid in CDR 1.1. 
What you > present here is the alternative encoding, that might or might not be > used at the ORB's discretion (just clarifying that the old encoding > is also valid in CDR 1.1 ...). Yes, that's correct. > What follows is, in my opinion, a separate proposal, that builds on > the above (ie, it requires it to be doable). However, the above > rules > could be approved independently of the caching mechanism. Right? Yes. The caching mechanism is orthogonal to the change for strings and encapsulation of Any values. > With this mechanism that you have proposed, both ends MUST support the > caching, and be willing to cache. I would prefer a mechanims that > would allow each end to decide whether to support caching or not. I don't think this is needed -- see below. > Suggestion: modify the above negotiation to be directional; if both > ends want to do caching, you get the same behaviour as above, and if > only one end wants caching, at least you might get the benefit in > that direction. The way to modify the negotiation may be by using > values attached to the TypeCodeCacheEnabled service context, or by > using two service contexts. > > For the sake of the explanation, make it two service contexts, > called > "WantYouToCache" and "OkI'llCache". The negotiation would go as > follows: client sends "WantYouToCache" with some request; if server > accepts, sends the "OkI'llCache" in the response; and if the server > also wants the client to cache, then it will send the > "WantYouToCache" > service context as well (in the same message, or separately), and if > the client is able (and willing) to cache, it'll reply with another > "OkI'llCache" (in some future message). > > Only after a sending end (client or server) has seen a "OkI'llCache" > service context, it will be able to use the optimization. This > effectively makes both communication directions independent, does > not add complexity, and increases flexibility. 
> > Maybe this was already possible with your proposal, and I was > missing > something. If it is so, let me know. The scheme you suggest is stateful because one side sends a request to which the other side can respond. (Whenever I see stateful interactions, I have alarm bells going off...) But I think we don't need this anyway. For one, caching can *only* work if *both* sides understand it. This is because the receiver of a full type code must send its repository-ID/index pair to the sender, and the sender then must use that repository-ID/index pair later. I think what you are alluding to is that you want it to work where only one side wants to maintain a cache. The original proposal allows that already. Consider a client that understands caching, and a server that understands it too, but doesn't want to devote resources to the cache. When the client and server interact, they initially inform each other that they understand caching. Thereafter, the client sends CacheEntry contexts to the server, and the server may decide to use these CacheEntries later. If the server wants to use the CacheEntries, it need only put the repository-ID/index pairs into its cache, but not the full type code. Of course, the server can also choose to ignore the CacheEntries (but then, I agree, the client would uselessly maintain a cache that will never be hit by the server). The server in turn simply need never send a CacheEntry to the client, meaning that the client must always send full type codes to the server. However, I see your point about having a cache only on one side *and* using it effectively. Instead of the request/reply scheme you proposed above, why not split the caching agreement that happens initially into two parts? Each side informs the other of two pieces of information: 1) Will (or won't) *send* CacheEntries 2) Can (or can't) *accept* CacheEntries With that information, each side can decide in advance what is worth caching and what is not. 
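The two-part, stateless agreement sketched just above could be modeled like this. The names (`CachingAnnouncement` and its fields) are illustrative only; nothing here is adopted protocol:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CachingAnnouncement:
    # Sent once, unilaterally, by each side; no reply is required.
    will_send_cache_entries: bool   # "I will advertise indexes for type codes I cache"
    accepts_cache_entries: bool     # "I will honour indexes you advertise"

def may_send_compact_typecodes(mine, peer):
    # I may emit the compact (indexed) form only for type codes the peer
    # has advertised, so the peer must be willing to send CacheEntries
    # and I must be willing to track the indexes it hands out.
    return peer.will_send_cache_entries and mine.accepts_cache_entries

def worth_caching_received_typecodes(mine, peer):
    # Maintaining a receive-side cache and advertising it pays off only
    # if the peer will actually use the indexes I assign.
    return mine.will_send_cache_entries and peer.accepts_cache_entries
```

Because each announcement is a one-shot declaration rather than a request awaiting a reply, no state machine is needed and the two directions of the connection are configured independently.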
I like this idea better than your proposal because it is stateless -- no reply is required to an offer to do caching. > > When a client sends a type code for the very first time, it must > > send it using normal CDR 1.0 encoding rules (as a full type code). > > If the server understands caching and has cached this type code, > > it includes a ReposIdCache entry within its reply service context. > > The receiving side might cache at this time (might not "have cached" the > typecode previously), right? Yes. > > So, if a client invokes an operation where the in or inout > > parameters contain values of type any (possibly nested), > > the server can return a sequence of those type codes that > > it has chosen to cache. Each sequence member contains > > a repository ID and the index value that was assigned to > that > > repository ID by the server. The receiver of a type code > must > > return CacheEntrySeq only for type codes it received in the > > corresponding request. > > You are mentioning "possibly nested" typecodes here. With your > modification > below, this is no longer possible, right? This can only be used > with > outermost typecodes. Sorry, I didn't express this clearly. Mike Spreitzer previously pointed out that not only the top-level parameters may be of type any, but that they may be things like structures containing Anys. So, by allowing "nested" type codes to be cached, what I really meant was something like: struct X { any a; CORBA::TypeCode tc; }; typedef sequence<X> Seq; If a parameter is of type Seq, the receiver is entitled to cache every typecode that appears anywhere in the parameters, nested or not. The receiver can even cache a type code that belongs to an Any that is nested inside another Any. The modification I proposed for nested Anys just means that an ORB cannot *send* a cached type code inside an encapsulation. 
However, if an ORB *receives* a type code inside an encapsulation, it can cache it (after all, each type code has a unique repository ID, so I can always cache any typecode I like, provided I have the full type code to put into the cache). > Also, why the limitation of only being able to send typecodes received > in a request in the corresponding response? I think this is an arbitrary > limitation, that could be dropped. Effectively, it is not there in the > server to client negotiation ... The limitation is there to keep things as stateless as possible. Suppose that the client currently is in the middle of an invocation that includes a type code. When the reply for *that* invocation comes back, the client can get a CacheEntry for the type code. If the server were to send the CacheEntry half an hour later, the client may find it difficult to deal with that. It's no big deal, I admit. However, why *would* the server delay sending the CacheEntry? There is only one logical point at which the server would ever add a type code to its cache, and that is when it has just received that type code. For similar reasons, we have made it illegal for the sender to just willy-nilly send CacheEntries for type codes it happens to understand. Instead, the sender can only send CacheEntries for those type codes that it previously received from the other party. > > The net effect is that if a server understands caching, the client > > needs to send the full typecode only once per connection, and > > thereafter just sends the cache index for that type code. > > The client needs to keep a "pseudo" cache (or something like that) with > repository ids and server cache indexes. Well, it's not a pseudo cache. It simply is a table of repository ID and index pairs. Whenever the client is about to marshal a type code, it can look in the table to see whether the server has cached that type code. If so, it can just send the index instead of the full type code. 
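The "just send the index" step can be made concrete. A sketch of the wire form under discussion — the sentinel TCKind value 0xfffffffd followed by a single unsigned long cache index. This is an illustration of the proposal only; byte order is assumed here, whereas real GIOP carries byte order in the message flags:

```python
import struct

# Proposed sentinel value, not a real TCKind enumerator in CORBA 2.x.
CACHED_TC_KIND = 0xFFFFFFFD

def encode_cached_typecode(index, little_endian=True):
    # TCKind sentinel followed by one unsigned long: the cache index
    # previously assigned by the receiver for this repository ID.
    fmt = "<II" if little_endian else ">II"
    return struct.pack(fmt, CACHED_TC_KIND, index)

def decode_cached_typecode(octets, little_endian=True):
    fmt = "<II" if little_endian else ">II"
    kind, index = struct.unpack(fmt, octets)
    if kind != CACHED_TC_KIND:
        raise ValueError("not a cached-typecode reference")
    return index
```

Eight octets on the wire, versus hundreds for the full encoding of a complex struct or union typecode -- which is the whole point of the exercise.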
You are right though - there is a second cache that contains *full type code* and index pairs. That's the cache containing those entries for which the *client* has sent CacheEntries to the server. Typically, the client will only put type codes into that cache if it has no static knowledge of those type codes (and if they are complex, because there is no gain in caching simple type codes). > > A client can use the tk_implict encoding only for those type > > codes it has previously received an index for. > > Oops, you ment the 0xfffffffd encoding ... Yes, sorry. > This is just the other direction of the same mechanism. I would prefer > them to be separately configurable, as I mentioned above. I think that's fine, provided we get rid of the request/reply model. > Yes, but you need an extra "thing" on the sending end to keep the agreed > mapping between repository id and cache index in the receiving side. See the above explanations. I hope they make this clear. > > The type caching scheme deliberately does not include to explicitly > > outdate a cache (other than by dropping a connection), or to > > selective outdate particular cache entries. The reason for this > > decision is to keep complixity down and to avoid race conditions > > in multi-threaded clients and servers. > > This is what I think is weakest in the proposal. Forcing the close of > the connection to clean up the full cache seems to me like too drastical. > Besides, you might want to get rid of 50% of your cache, but not the > rest; if you invalidate the whole cache, you have to repopulate it with > the "still valid" 50% that you just threw away. That's a strong argument. > Also, not being able to cache beyond a certain point seems problematic > for long-running connections. You probably have a lot of useless "old" > cached typecodes, but you cannot get rid of them in any way (unless you > close the connection). Another strong point. 
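The two data structures distinguished above — the plain repository-ID/index table consulted when sending, and the full-type-code cache built while receiving — might look like this (a hypothetical sketch; class and method names are invented):

```python
class SendSideTable:
    """Repository-ID -> index pairs the peer has advertised via CacheEntry.
    Consulted at marshal time; no full type codes are stored here."""
    def __init__(self):
        self._peer_index = {}

    def record_cache_entry(self, repo_id, index):
        self._peer_index[repo_id] = index

    def index_for(self, repo_id):
        # None means: the peer has not cached this, send the full typecode.
        return self._peer_index.get(repo_id)

class ReceiveSideCache:
    """Full type codes this side has cached, keyed by the index it assigned.
    The (repo_id, index) pairs are what get advertised back as CacheEntries."""
    def __init__(self):
        self._next_index = 0
        self._by_index = {}

    def cache(self, repo_id, full_typecode):
        index = self._next_index
        self._next_index += 1
        self._by_index[index] = full_typecode
        return (repo_id, index)   # payload for the CacheEntry context

    def resolve(self, index):
        return self._by_index[index]
```

As the discussion notes, the send-side table is cheap (strings and integers only); it is the receive-side cache that holds full type codes, and only for complex ones worth the space.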
> I would like to propose a mechanism to selectively invalidate the caches > at either end (the receiving cache) that, as far as I can tell, has > no problems of race conditions. > > First, add a new service context, something like InvalidateCacheEntries, > that would carry a sequence of cache ids to be invalidated. This could > be sent at any time by the "sending" entity, and would inmediately > invalidate entries in the "receiving" side (as soon as received); > as soon as the message containing this is sent, no other message from > the sending side might use the removed cache ids. As the cache is > specific to a connection, there is no risk of other threads stepping > over the operation. Only if you assume a single thread per connection (I think). Scenario: An ORB uses a thread-per-request model, so each operation executes in its own thread, but all invocations come in via the same connection. Some operations are long running, whereas others may complete quickly. In addition, the ORB may use fragmentation at the protocol level. I think this scenario makes it possible for things to interleave as follows: - Request 1 arrives at the server and is dispatched. The operation takes a while to run and produces a large result. Because the result is large, the server ORB decides to fragment the reply. - First fragment for reply 1 is returned to the client and contains a cached type code. - Request 2 arrives in the server and invalidates some cache entries. - The second fragment is returned to the client, possibly with a cached type code that is supposed to be invalid now. I think it is possible to get around the race conditions with appropriate inter-thread communication, but it may be hard to add this to existing implementations. > Second, add another service context, call it WantToInvalidate, that would > carry a sequence of cache ids. 
This would be sent by the "receiving" > entity at any time, to indicate to the "sending" side of the need to > clean up some entries; sending this does NOT invalidate the entries, > only notify the sending entity of the need to clean up. The "sending" > side should, after receiving this, send an InvalidateCacheEntries > service context, with a list of cache ids to invalidate at the "receiving" > end. Note that the WantToInvalidate provides only a "hint" of what > entries to invalidate; the one that actually eliminates entries is > the list sent with InvalidateCacheEntries. As this message is only > a hint, there should be no concern with race conditions or deadlocks. Let me see if I understand this correctly. Suppose I am the client. I have previously told the server that I have cached certain type codes. Now I want to let the server know that I want to clean out my cache. I send "WantToInvalidate index 10" to the server. After I've done that, the entry is still valid and I cannot throw it away until the server says "Invalidate index 10", because the server makes a promise to not use that index for a cached type code again (because it has removed the corresponding entry from its own cache). > This would still be piggy-backed in the normal messages, and would > allow for a relatively easy recovery mechanism to clean up > unwanted entries in the caches. This may be used by either end of > the connection, to clean up the corresponding caches. I think that could work, and I don't think it's too complex. In particular, no-one is forced to obey that protocol, in the sense that the client can choose to never send a WantToInvalidate, and the server can choose to never return an Invalidate. 
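The hint/acknowledge exchange restated above can be simulated in a few lines. This is an illustrative sketch of the proposed WantToInvalidate / InvalidateCacheEntries interaction, with invented names, not part of any adopted protocol:

```python
class CachingSender:
    """Holds the repository-ID/index table for entries the peer has cached."""
    def __init__(self, peer_index):
        self.peer_index = dict(peer_index)   # repository ID -> index

    def on_want_to_invalidate(self, hinted_indexes):
        # The hint arrives from the receiver.  Drop the hinted entries from
        # our own table *before* replying, so no later message can use
        # those indexes, then return the authoritative
        # InvalidateCacheEntries list.
        hinted = set(hinted_indexes)
        for rid in [r for r, ix in self.peer_index.items() if ix in hinted]:
            del self.peer_index[rid]
        return sorted(hinted)

class CachingReceiver:
    """Holds the full type codes it cached, keyed by index."""
    def __init__(self, cache):
        self.cache = dict(cache)             # index -> full type code

    def on_invalidate_cache_entries(self, indexes):
        # Entries may be discarded only now: the sender has promised
        # never to use these indexes again.
        for ix in indexes:
            self.cache.pop(ix, None)
```

The ordering is the crux: the sender's table shrinks before the invalidation is announced, so the receiver can never see a reference to an index it has already discarded.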
> > Actually, there are two distinct problems: if the sending side sends > a > ReposIdCache service context without prior TypeCodeCacheEnabled, > then > BAD_SERVICE_CONTEXT should be raised. If a cached type code (a > typecode > with kind == 0xfffffffd) is sent in a message, and no previous > negotiation > has happened, then MARSHAL should be used. I disagree. I would prefer to make both these conditions BAD_SERVICE_CONTEXT, and to use minor codes to distinguish them. Rationale: MARSHAL (to me) indicates that the contents of a packet are corrupted somehow. For example, the offset is out of bounds for a recursive type code, or the byte count of a packet does not agree with the actual length of the packet. The case you mention is different, in that there is nothing wrong with what's in the packet as such, just that someone sent the context at the wrong time. I really would prefer to keep BAD_SERVICE_CONTEXT for this instead of MARSHAL. > > If either side sends a repository ID inside a CacheEntry that > > was cached previously, BAD_SERVICE_CONTEXT is raised. > > > > If either side sends an index value inside a CacheEntry > > when that index value is already in use, BAD_SERVICE_CONTEXT > > is raised. > > I would prefer these two to allow for repetitions to happen. That is, > if the repository id and cache id are the same as already in the cache, > no exception needs to be raised. I don't feel too strongly about this. On the other hand, I see no reason to allow these repeated entries. Again, there is only one logical point at which someone would add an entry to the cache, and that is just after receiving a new type code. The next message exchanged between the two parties then carries the CacheEntry for the new type code. Why would the sending side ever tell the receiving side something that was stated earlier? To paraphrase, what do we gain by allowing repeated CacheEntries if they don't conflict with the information already in the cache? 
I don't think we gain anything, so I can't see a reason to allow it. > One global comment about the BAD_SERVICE_CONTEXT exception: as this > mechanism is piggy backed in "normal" messages, where should this > exception be sent? The user has no need to see it, and the ORB > probably > is unable to do anything about it. So we might as well point this > as > a hard error in the protocol, and an indication that whoever was > going > to raise the exception should close the connection at the earliest > possible time ... > > I'm not sure about this one though. What do others think? Well, it is similar to a MARSHAL exception in that it indicates a bug in one of the two ORBs involved. However, we don't have any such requirements on MARSHAL, so I don't see a need to add any for BAD_SERVICE_CONTEXT. The exception simply gets propagated to the client, who does what it likes with it. I don't think we should say anything about connection closure or some such. In particular, connection closure may not be possible, because other operations may still be executing in the server, and a server cannot close a connection while a request is outstanding. > > A sending ORB must ensure that cached typecodes do not appear inside > > encapsulations. This is because the cache index for a cached type > > code makes sense only for one particular client-server pair. If > > cached type codes were allowed inside encapsulations, this would > > force the receiver of an encapsulation to decode it just to be able > > to safely forward the encapsulation to a new destination. > > This is pretty strong. I see the reason behind it, but I know of some > particularly common cases where this restriction seems too strong. > I will try to think whether there is a different way to solve the problem > instead of plainly forbidding it. Well, I'd be happy to see a solution to this. However, as I stated in my other message, I will strongly oppose anything that looks too complex.
In my opinion, the entire caching idea only stacks up if we can keep it fairly simple. If that isn't possible, I'm in favour of dropping it instead of making it more complex. > I would like to make another proposal, complementary to this one. > Michi, I know you hate it, but I still think that it is worth making > the proposal and letting others scream and yell ... or support it. Well, as you know, I've screamed already in private ;-) Javier, I agree that this can be made to work, but I see a number of serious problems with this suggestion:
- Applications all of a sudden have to be aware that caching exists and need to explicitly negotiate this with the other party.
- Applications would explicitly replace the full type code with its implicit form. This means that there would all of a sudden be two kinds of Anys floating around: ones containing full type codes and ones without full type codes. The really serious problem with this is that it destroys the guaranteed introspection capability that Any values have. If the sender ever makes a mistake for some reason, the receiver can find itself holding an Any that doesn't contain a type code, and the receiver may have no idea what the repository ID inside the implicit type code actually means.
- It would expose applications to low-level optimizations like type code caching. I see this firmly as a platform responsibility. The whole point of CORBA is that I don't have to worry about how all this magic happens behind the scenes. If we do what you suggest, we significantly dilute the protocol and implementation transparency of CORBA.
- The side that incorrectly sends a tk_implicit may not even be responsible, because the tk_implicit could have come from a different sender inside an encapsulation. In other words, tk_implicit seems to suffer the same problem as the cached type codes with respect to encapsulation?
- tk_implicit introduces implicit shared state between client and server which is not visible at the IDL interface level. In other words, if I just look at an interface like interface foo { void op(in any a); }; I have no way of telling when, if at all, it is OK to send an any with tk_implicit. (This is different from the cached type code proposal, because the caching proposal keeps the application out of things and is completely transparent.) In effect, your suggestion creates two kinds of IDL type any: one that can be sent as tk_implicit, and one that can't. This is bad, because we now have a single IDL type that has two sets of semantics. Worse, I cannot tell without understanding all of the surrounding context which set of semantics to apply. So, (I know this doesn't come as a surprise to you ;-), I strongly oppose the idea of letting applications anywhere near caching of type codes. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Wed, 15 Jul 1998 19:54:06 -0600 From: Javier Lopez-Martin To: javier@cnd.hp.com, jon@floorboard.com Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Cc: interop@omg.org, michi@dstc.edu.au Content-Md5: ANIv0p65d4L//TnqyoEb7Q== Hi Jon, > > With this mechanism that you have proposed, both ends MUST support the > > caching, and be willing to cache. I would prefer a mechanism that > > would allow each end to decide whether to support caching or not. > > No. Both sides need to indicate that they support the caching protocol, > but it is no problem to use the defined protocol to say that you support > caching, but then never actually add anything to the cache on your end. > If you never add anything to the cache, then the other end can never > send you a cached typecode. Ok, I see what you are saying.
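The distinction Jon draws between supporting the caching protocol and actually caching can be sketched with two per-connection tables. The following is a minimal illustration with made-up names (the real mechanism would live inside the ORB and the proposal does not prescribe any data structure): the sending side maps repository IDs to indexes its peer has announced, and the receiving side maps indexes back to repository IDs. An end that supports the protocol but never caches simply never issues a CacheEntry, so its peer's sender table stays empty and only full type codes flow in that direction.

```python
# Illustrative sketch only; class and method names are invented, not
# part of any proposed specification.

class SenderCache:
    """Sender side: repository ID -> index announced by the peer."""
    def __init__(self):
        self.index_by_repoid = {}

    def lookup(self, repoid):
        # Returns a cache index if the peer has cached this type code,
        # else None (meaning: marshal the full type code).
        return self.index_by_repoid.get(repoid)

    def learn(self, repoid, index):
        # Called when a CacheEntry service context arrives from the peer.
        self.index_by_repoid[repoid] = index


class ReceiverCache:
    """Receiver side: index -> repository ID it chose to cache."""
    def __init__(self):
        self.repoid_by_index = {}
        self.next_index = 0

    def cache(self, repoid):
        # Decide to cache a full type code just received; the returned
        # index would be sent back in a CacheEntry service context.
        index = self.next_index
        self.next_index += 1
        self.repoid_by_index[index] = repoid
        return index

    def resolve(self, index):
        # Map a received cached type code back to its repository ID.
        return self.repoid_by_index[index]
```

A receiver that never calls `cache` never emits a CacheEntry, so its peer's `lookup` always returns None and every type code goes out in full, which is exactly the "support but don't cache" behaviour described above.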
> > For the sake of the explanation, make it two service contexts, called > > "WantYouToCache" and "OkI'llCache". The negotiation would go as > > follows: client sends "WantYouToCache" with some request; if server > > accepts, sends the "OkI'llCache" in the response; and if the server > > also wants the client to cache, then it will send the "WantYouToCache" > > service context as well (in the same message, or separately), and if > > the client is able (and willing) to cache, it'll reply with another > > "OkI'llCache" (in some future message). > > The two proposed service contexts already act as the "want" and "ok". I see how you could make it work with the currently defined messages. It is a bit awkward, but can be done. So, we don't need this extra mechanism (better!). > > You are mentioning "possibly nested" typecodes here. With your modification > > below, this is no longer possible, right? This can only be used with > > outermost typecodes. > > Good point. Nested TypeCodes are inside encapsulations! Could this be > allowed as an exception to the no-cached-typecodes in encapsulations > rules? Otherwise if I get a tk_array or tk_sequence, there is no way to > cache the component. The problem only happens when the typecode is within an encapsulation, which would normally happen within the (new) encoding of an any. I think that only that situation is being questioned, so the typecode for the component of an array or sequence could be cached without problem (I guess ...). > > This is what I think is weakest in the proposal. Forcing the close of > > the connection to clean up the full cache seems to me too drastic. > > Besides, you might want to get rid of 50% of your cache, but not the > > rest; if you invalidate the whole cache, you have to repopulate it with > > the "still valid" 50% that you just threw away. > > > > Also, not being able to cache beyond a certain point seems problematic > > for long-running connections.
You probably have a lot of useless "old" > > cached typecodes, but you cannot get rid of them in any way (unless you > > close the connection). > > > > I would like to propose a mechanism to selectively invalidate the caches > > at either end (the receiving cache) that, as far as I can tell, has > > no problems of race conditions. > > > > First, add a new service context, something like InvalidateCacheEntries, > > that would carry a sequence of cache ids to be invalidated. This could > > be sent at any time by the "sending" entity, and would immediately > > invalidate entries in the "receiving" side (as soon as received); > > as soon as the message containing this is sent, no other message from > > the sending side might use the removed cache ids. As the cache is > > specific to a connection, there is no risk of other threads stepping > > over the operation. > > > > Second, add another service context, call it WantToInvalidate, that would > > carry a sequence of cache ids. This would be sent by the "receiving" > > entity at any time, to indicate to the "sending" side of the need to > > clean up some entries; sending this does NOT invalidate the entries, > > only notify the sending entity of the need to clean up. The "sending" > > side should, after receiving this, send an InvalidateCacheEntries > > service context, with a list of cache ids to invalidate at the "receiving" > > end. Note that the WantToInvalidate provides only a "hint" of what > > entries to invalidate, the one that actually eliminates entries is > > the list sent with InvalidateCacheEntries. As this message is only > > a hint, there should be no concern with race conditions or deadlocks. > > This would work ok. The big question is how to detect that you want to > invalidate entries in the cache and which ones? Do some LRU algorithm. > Of course, it is completely implementation defined--no need to > standardize. Exactly: this is an implementation issue, and should not be standardized.
As long as the proposal is reasonably implementable (which I think it is), not too complex (which I don't think it is), and does not suffer from inherent technical problems (which I haven't seen, but I might be wrong), then it is ok. And I think that the ability to partially invalidate the cache far outweighs the (small) extra complexity. > > > If either side sends a repository ID inside a CacheEntry that > > > was cached previously, BAD_SERVICE_CONTEXT is raised. > > > > > > If either side sends an index value inside a CacheEntry > > > when that index value is already in use, BAD_SERVICE_CONTEXT > > > is raised. > > > > I would prefer these two to allow for repetitions to happen. That is, > > if the repository id and cache id are the same as already in the cache, > > no exception needs to be raised. > > I agree. No need to detect duplicates in the cache. The receiver can > just keep the most recently received index. Well, you need to detect that you don't have two different repository ids with the same cache id ... What I was saying is that if the repository id and cache index received are the same as already in the cache, no exception should be raised. I have no problem in "replacing" an entry with the same repository id by a newer one with a different cache index. But I don't think that allowing two repository ids to map to the same cache index would be right. > > One global comment about the BAD_SERVICE_CONTEXT exception: as this > > mechanism is piggy backed in "normal" messages, where should this > > exception be sent? The user has no need to see it, and the ORB probably > > is unable to do anything about it. So we might as well point this as > > a hard error in the protocol, and an indication that whoever was going > > to raise the exception should close the connection at the earliest > > possible time ... > > Use the GIOP Error message? I've never been clear what use it really is > for, since there is no mandated response required when receiving one. Exactly.
We might as well do nothing here. > > I would like to make another proposal, complementary to this one. > > Michi, I know you hate it, but I still think that it is worth > making > > the proposal and letting others scream and yell ... or support it. > > This complementary proposal depends on the first part of this one > > (CDR 1.1) but does not depend on the caching mechanism here > proposed > > (although it would perfectly coexist with it, without problems). > > > > This other proposal is the introduction of an "application level" > > mechanism for doing parts of what is being suggested here. The > base of > > this proposal is to introduce a new TC_Kind (call it tk_implicit), > that > > could be used instead of any typecode that has a repository id. > An any > > with an "implicit" typecode would be marshalled as the tk_implicit > > typecode (that is an int followed by the repository id, and > nothing else), > > and then followed by the encapsulated value. Note the value is > encoded > > as an encapsulation. On the receiving end, the repository id would > be > > enough to get the full typecode (if needed). > > I don't see how you can make this work, since you have not defined > any > negotiation mechanism for an application to discover that its peer > understands tk_implicit! It seems like this has some awful > interoperability problems. I explicitly mentioned (later in my proposal, text not quoted) that the negotiation of this facility would be strictly an application issue. So, applications will have to agree on a mechanism to indicate to the other end the willingness to support implicit typecodes. Actually, I don't see that many interoperability problems: the application has to negotiate (or mandate, whatever is chosen) the use of this facility, via unspecified mechanisms (ie, not specified by CORBA, but specified by the application). Of course, this application won't interoperate with another application ...
As for interoperability with services, it is up to service implementors to provide for this or not; the most likely case is that only customer demand would drive service vendors to provide this (if at all). But within applications, it might make a lot of difference, and requires almost nothing from the ORB to support it. Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 12:07:27 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Jonathan Biggar wrote: > > You are mentioning "possibly nested" typecodes here. With your modification > > below, this is no longer possible, right? This can only be used with > > outermost typecodes. > > Good point. Nested TypeCodes are inside encapsulations! Could this be > allowed as an exception to the no-cached-typecodes in encapsulations > rules? Otherwise if I get a tk_array or tk_sequence, there is no way to > cache the component. Be careful -- that's not what I meant. If I receive a complex data value for the first time from someone, then all type codes that have not been transmitted and cached earlier will be full type codes. If I'm capable of caching, then I can add every full type code I receive to my cache. Whether that full type code is inside an encapsulation or not is irrelevant. The encapsulation issue only arises when I'm *sending* an encapsulation. In that case, I cannot put the cached (short form) type code into the encapsulation. > > Also, why the limitation of only being able to send typecodes received > > in a request in the corresponding response? I think this is an arbitrary > > limitation, that could be dropped. Effectively, it is not there in the > > server to client negotiation ... > > I agree.
The caching side should be free to send the ReposIdCache > context on any message (containing service contexts!), not necessarily > the direct response. Well, as I said in my previous message, the motivation for this was to keep things stateless. It seems pointless to delay the ReposIdCache context until later, because there is only one logical point at which someone can add a new type code to their cache. I'm not hung up about this one, but thought that the restriction would make implementations a little simpler. > > Second, add another service context, call it WantToInvalidate, that would > > carry a sequence of cache ids. This would be sent by the "receiving" > > entity at any time, to indicate to the "sending" side of the need to > > clean up some entries; sending this does NOT invalidate the entries, > > only notify the sending entity of the need to clean up. The "sending" > > side should, after receiving this, send an InvalidateCacheEntries > > service context, with a list of cache ids to invalidate at the "receiving" > > end. Note that the WantToInvalidate provides only a "hint" of what > > entries to invalidate, the one that actually eliminates entries is > > the list sent with InvalidateCacheEntries. As this message is only > > a hint, there should be no concern with race conditions or deadlocks. > > This would work ok. The big question is how to detect that you want to > invalidate entries in the cache and which ones? Do some LRU algorithm. > Of course, it is completely implementation defined--no need to > standardize. Agree. In fact, the most efficient implementation would be to never outdate anything at all. This would be legal with Javier's proposal. > > Actually, there are two distinct problems: if the sending side sends a > > ReposIdCache service context without prior TypeCodeCacheEnabled, then > > BAD_SERVICE_CONTEXT should be raised. 
If a cached type code (a typecode > > with kind == 0xfffffffd) is sent in a message, and no previous negotiation > > has happened, then MARSHAL should be used. > > Right. I don't like MARSHAL, for the reasons I explained previously. I think marshal is more for bit-level errors than errors that arise from a message type (the service context) being sent under the wrong circumstances. I'm willing to be convinced otherwise though if people feel strongly about it. > > > If either side sends a repository ID inside a CacheEntry that > > > was cached previously, BAD_SERVICE_CONTEXT is raised. > > > > > > If either side sends an index value inside a CacheEntry > > > when that index value is already in use, BAD_SERVICE_CONTEXT > > > is raised. > > > > I would prefer these two to allow for repetitions to happen. That is, > > if the repository id and cache id are the same as already in the cache, > > no exception needs to be raised. > > I agree. No need to detect duplicates in the cache. The receiver can > just keep the most recently received index. Aren't there race conditions if multiple requests are outstanding on the same connection? > I don't see how you can make this work, since you have not defined any > negotiation mechanism for an application to discover that its peer > understands tk_implicit! It seems like this has some awful > interoperability problems. My point exactly. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 12:08:16 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Jonathan Biggar wrote: > Again, Ick! 
Any proposal that requires ORBs to decode and reencode > encapsulations just makes me sick to my stomach. I think we are beginning to reach consensus that this approach is not an option :-) Michi. Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 19:22:39 -0700 From: Jonathan Biggar To: Michi Henning CC: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > > On Wed, 15 Jul 1998, Jonathan Biggar wrote: > > > > You are mentioning "possibly nested" typecodes here. With your > modification > > > below, this is no longer possible, right? This can only be used > with > > > outermost typecodes. > > > > Good point. Nested TypeCodes are inside encapsulations! Could > this be > > allowed as an exception to the no-cached-typecodes in > encapsulations > > rules? Otherwise if I get a tk_array or tk_sequence, there is no > way to > > cache the component. > > Be careful -- that's not what I meant. If I receive a complex data > value > for the first time from someone, then all type codes that have not > been > transmitted and cached earlier will be full type codes. If I'm > capable > of caching, then I can add every full type code I receive to my > cache. > Whether that full type code is inside an encapsulation or not is > irrelevant. > > The encapsulation issue only arises when I'm *sending* an > encapsulation. > In that case, I cannot put the cached (short form) type code into > the > encapsulation. I'm just explicitly stating that TypeCode components, although they are in encapsulations themselves inside their parent TypeCode, should be an explicit exception to the rule. You should be able to send a TypeCode (at the top level) that is a tk_sequence of a cached typecode. > > > Also, why the limitation of only being able to send typecodes received > > > in a request in the corresponding response? I think this is an arbitrary > > > limitation, that could be dropped.
Effectively, it is not there in the > > > server to client negotiation ... > > > > I agree. The caching side should be free to send the ReposIdCache > > context on any message (containing service contexts!), not necessarily > > the direct response. > > Well, as I said in my previous message, the motivation for this was > to keep things stateless. It seems pointless to delay the ReposIdCache > context until later, because there is only one logical point at which > someone can add a new type code to their cache. I'm not hung up about > this one, but thought that the restriction would make implementations > a little simpler. The scenario in question is a multithreaded server that receives two long running requests. It adds the TypeCodes for each request to its cache, but it should be free to respond with cache indexes for both requests on the first reply to complete. Otherwise you have to keep track of which request to send each cache index on. > > > Second, add another service context, call it WantToInvalidate, that would > > > carry a sequence of cache ids. This would be sent by the "receiving" > > > entity at any time, to indicate to the "sending" side of the need to > > > clean up some entries; sending this does NOT invalidate the entries, > > > only notify the sending entity of the need to clean up. The "sending" > > > side should, after receiving this, send an InvalidateCacheEntries > > > service context, with a list of cache ids to invalidate at the "receiving" > > > end. Note that the WantToInvalidate provides only a "hint" of what > > > entries to invalidate, the one that actually eliminates entries is > > > the list sent with InvalidateCacheEntries. As this message is only > > > a hint, there should be no concern with race conditions or deadlocks. > > > > This would work ok. The big question is how to detect that you want to > > invalidate entries in the cache and which ones? Do some LRU algorithm. 
> > Of course, it is completely implementation defined--no need to > > standardize. > > Agree. In fact, the most efficient implementation would be to never outdate > anything at all. This would be legal with Javier's proposal. So do we add his proposal to your proposal? > > I agree. No need to detect duplicates in the cache. The receiver can > > just keep the most recently received index. > > Aren't there race conditions if multiple requests are outstanding on the > same connection? No, because the sender's cache maps repository id to index, and the receiver's cache maps index to repository id. So the receiver can have both indexes in its cache with no problem. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 19:40:48 -0700 From: Jonathan Biggar To: Michi Henning CC: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > > Also, why the limitation of only being able to send typecodes > received > > in a request in the corresponding response? I think this is an > arbitrary > > limitation, that could be dropped. Effectively, it is not there > in the > > server to client negotiation ... > > The limitation is there to keep things as stateless as > possible. Suppose > that the client currently is in the middle of an invocation that > includes > a type code. When the reply for *that* invocation comes back, the > client > can get a CacheEntry for the type code. If the server were to send > the > CacheEntry half an hour later, the client may find it difficult to > deal > with that. It's no big deal, I admit. However, why *would* the > server > delay sending the CacheEntry. There is only one logical point at > which > the server would ever add a type code to its cache, and that is when > it > has just received that type code. 
Michi, this is overkill, and places a burden on the TypeCode cache to match cache entries with the request that created them. I don't see any reason to limit when the receiver sends back the CacheEntry for a TypeCode. Generally, as soon as possible seems to be the right approach, but there isn't any real reason to mandate a particular time. > For the similar reasons, we have made it illegal for the sender to just > willy-nilly send CacheEntries for type codes it happens to understand. > Instead, the sender can only send CacheEntries for those type codes > that it previously received from the other party. Why not. If an ORB vendor wants to make it possible for an application to preload the cache with TypeCodes in the applications domain, what does it hurt? > Well, it's not a pseudo cache. It simply is a table of repository ID and > index pairs. Whenever the client is about to marshal a type code, it can > look in the table to see whether the server has cached that type code. > If so, it can just send the index instead of the full type code. > > You are right though - there is a second cache that contains *full type code* > and index pairs. That's the cache containing those entries for which > the *client* has sent CacheEntries to the server. Typically, the client > will only put type codes into that cache if it has no static knowledge > of those type codes (and if they are complex, because there is no > gain in caching simple type codes). It would be a good idea to clarify that both sides have a sender cache and a receiver cache, and that the sender cache maps repository ids to indexes, and the receiver cache maps indexes to repository ids. > > Second, add another service context, call it WantToInvalidate, that would > > carry a sequence of cache ids. 
This would be sent by the "receiving" > > entity at any time, to indicate to the "sending" side of the need to > > clean up some entries; sending this does NOT invalidate the entries, > > only notify the sending entity of the need to clean up. The "sending" > > side should, after receiving this, send an InvalidateCacheEntries > > service context, with a list of cache ids to invalidate at the "receiving" > > end. Note that the WantToInvalidate provides only a "hint" of what > > entries to invalidate, the one that actually eliminates entries is > > the list sent with InvalidateCacheEntries. As this message is only > > a hint, there should be no concern with race conditions or deadlocks. > > Let me see if I understand this correctly. Suppose I am the client. I have > previously told the server that I have cached certain type codes. Now > I want to let the server know that I want to clean out my cache. > I send "WantToInvalidate index 10" to the server. After I've done that, > the entry is still valid and I cannot throw it away until the server says > "Invalidate index 10" because the server makes a promise to not use > that index for a cached type code again (because it has removed the > corresponding entry from its own cache). > > > This would still be piggy-backed in the normal messages, and would > > allow for a relatively easy recovery mechanism to clean up > > unwanted entries in the caches. This may be used by either end of > > the connection, to clean up the corresponding caches. > > I think that could work, and I don't think it's too complex. In particular, > no-one is forced to obey that protocol, in the sense that the client > can choose to never send a WantToInvalidate, and the server can choose > to never return an Invalidate. I think if we don't allow sending an Invalidate except in response to a WantToInvalidate, then the race conditions go away.
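The race-free property Jon points at here, namely that an InvalidateCacheEntries is only ever sent in answer to a WantToInvalidate, can be sketched as a two-phase exchange. The sketch below uses invented names and in-memory tables rather than any proposed wire format: the hint leaves the receiver's entry valid, and only the confirming invalidate, issued after the sender has already dropped its own mapping (so it can never emit those indexes again), lets the receiver discard the entry.

```python
# Illustrative sketch of the two-phase invalidation handshake; names
# and data structures are invented, not a proposed specification.

class Sender:
    """Holds repository ID -> index mappings announced by the peer."""
    def __init__(self):
        self.index_by_repoid = {}

    def on_want_to_invalidate(self, indexes):
        # Drop our mappings first, then confirm. After this returns,
        # we can no longer emit these indexes, so the receiver may
        # safely forget them once it sees the confirmation.
        doomed = [i for i in indexes if i in self.index_by_repoid.values()]
        self.index_by_repoid = {
            r: i for r, i in self.index_by_repoid.items() if i not in doomed
        }
        return doomed  # payload of the InvalidateCacheEntries context


class Receiver:
    """Holds index -> repository ID entries it has cached."""
    def __init__(self):
        self.repoid_by_index = {}
        self.pending_hint = set()

    def want_to_invalidate(self, indexes):
        # Phase 1, hint only: entries stay valid until the
        # InvalidateCacheEntries confirmation comes back.
        self.pending_hint.update(indexes)
        return list(indexes)

    def on_invalidate(self, indexes):
        # Phase 2: the sender has promised not to reuse these indexes.
        for i in indexes:
            self.repoid_by_index.pop(i, None)
            self.pending_hint.discard(i)
```

Because the entry is not removed until phase 2, a cached type code still in flight when the hint is sent remains resolvable, which is the property that makes the handshake race-free.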
-- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Date: Wed, 15 Jul 1998 20:45:36 -0600 From: Javier Lopez-Martin To: javier@cnd.hp.com, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Cc: interop@omg.org Content-Md5: GsWBl/lWv8yz86JbSMN6NA== > > With this mechanism that you have proposed, both ends MUST support the > > caching, and be willing to cache. I would prefer a mechanism that > > would allow each end to decide whether to support caching or not. > > I don't think this is needed -- see below. Jon has already convinced me that this is not needed. I think what I wanted can be done with what is there now. But I prefer your new suggestion below: it achieves what I wanted in a cleaner manner. > > Suggestion: modify the above negotiation to be directional; if both > > ends want to do caching, you get the same behaviour as above, and if > > only one end wants caching, at least you might get the benefit in > > that direction. The way to modify the negotiation may be by using > > values attached to the TypeCodeCacheEnabled service context, or by > > using two service contexts. > > > > For the sake of the explanation, make it two service contexts, called > > "WantYouToCache" and "OkI'llCache". The negotiation would go as > > follows: client sends "WantYouToCache" with some request; if server > > accepts, sends the "OkI'llCache" in the response; and if the server > > also wants the client to cache, then it will send the "WantYouToCache" > > service context as well (in the same message, or separately), and if > > the client is able (and willing) to cache, it'll reply with another > > "OkI'llCache" (in some future message). > > > > Only after a sending end (client or server) has seen an "OkI'llCache" > > service context, it will be able to use the optimization. This > > effectively makes both communication directions independent, does > > not add complexity, and increases flexibility.
> > > > Maybe this was already possible with your proposal, and I was missing > > something. If it is so, let me know. > > The scheme you suggest is stateful because one side sends a request to > which the other side can respond. (Whenever I see stateful interactions, > I have alarm bells going off...) > > But I think we don't need this anyway. For one, caching can *only* work > if *both* sides understand it. This is because the receiver of a full > type code must send its repository-ID/index pair to the sender, and > the sender then must use that repository-ID/index pair later. Yes, both ends need to understand the protocol, but that is different from accepting to actually cache entries. An end might be willing to send "cached" entries to the other end, but not to locally cache. This is important information in negotiation, although I am now convinced that the protocol, as specified, might work without this knowledge. > I think what you are alluding to is that you want it to work where > only one side wants to maintain a cache. The original proposal > allows > that already. Consider a client that understands caching, and a > server > that understands it too, but doesn't want to devote resources to the > cache. > > When the client and server interact, they initially inform each > other that > they understand caching. Thereafter, the client sends CacheEntry > contexts > to the server, and the server may decide to use these CacheEntries > later. > If the server wants to use the CacheEntries, it need only put the > repository-ID/index pairs into its cache, but not the full type > code. > Of course, the server can also choose to ignore the CacheEntries > (but then, > I agree, the client would uselessly maintain a cache that will never > be > hit by the server). The server in turn simply need never send a > CacheEntry to the client, meaning that the client must always send > full > type codes to the server. Yes, as I said, it might be made to work. 
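The repository-ID/index exchange being discussed can be sketched as a toy model. This is a hypothetical illustration only: the class names, message shapes, and the `CacheEntry` tuple below are invented for the sketch and do not come from any CORBA specification.

```python
# Toy model of the type code caching exchange: the receiver caches a full
# typecode under an index and announces (repository ID, index) back to the
# sender; thereafter the sender transmits just the index.

class TypeCodeSender:
    """Sending side: remembers repository-ID -> index pairs the peer announced."""
    def __init__(self):
        self.peer_index = {}  # repo_id -> index assigned by the receiver

    def marshal(self, repo_id, full_typecode):
        # Send just the index if the receiver has cached this typecode,
        # otherwise send the full (verbose) typecode.
        if repo_id in self.peer_index:
            return ("index", self.peer_index[repo_id])
        return ("full", repo_id, full_typecode)

    def on_cache_entry(self, repo_id, index):
        # The receiver announced it cached this typecode under `index`.
        self.peer_index[repo_id] = index

class TypeCodeReceiver:
    """Receiving side: caches full typecodes and assigns indexes."""
    def __init__(self):
        self.cache = {}       # index -> full typecode
        self.by_repo_id = {}  # repo_id -> index
        self.next_index = 0

    def unmarshal(self, message):
        if message[0] == "index":
            return self.cache[message[1]]
        _, repo_id, full_typecode = message
        if repo_id not in self.by_repo_id:
            # Cache the new typecode and piggy-back a CacheEntry on the reply.
            idx = self.next_index
            self.next_index += 1
            self.cache[idx] = full_typecode
            self.by_repo_id[repo_id] = idx
            return full_typecode, ("CacheEntry", repo_id, idx)
        return full_typecode

sender, receiver = TypeCodeSender(), TypeCodeReceiver()
msg = sender.marshal("IDL:Foo:1.0", "<long typecode for Foo>")
tc, (_, rid, idx) = receiver.unmarshal(msg)  # first send carries the full typecode
sender.on_cache_entry(rid, idx)              # CacheEntry comes back with the reply
msg2 = sender.marshal("IDL:Foo:1.0", "<long typecode for Foo>")
# msg2 is now just ("index", 0)
```

Note how the scheme is one-directional, matching the point above: if the other side never sends CacheEntries back, this side simply keeps sending full typecodes.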
> However, I see your point about having a cache only on one side *and* > using it effectively. Instead of the request/reply scheme you proposed > above, why not split the caching agreement that happens initially into > two parts? Each side informs the other of two pieces of information: > > 1) Will (or won't) *send* CacheEntries > > 2) Can (or can't) *accept* CacheEntries > > With that information, each side can decide in advance what is worth > caching and what is not. > > I like this idea better than your proposal because it is stateless -- > no reply is required to an offer to do caching. I also like this better. It provides the same information as what I was proposing, and is simpler. > > > So, if a client invokes an operation where the in or inout > > > parameters contain values of type any (possibly nested), > > > the server can return a sequence of those type codes that > > > it has chosen to cache. Each sequence member contains > > > a repository ID and the index value that was assigned to > that > > > repository ID by the server. The receiver of a type code > must > > > return CacheEntrySeq only for type codes it received in the > > > corresponding request. > > > > You are mentioning "possibly nested" typecodes here. With your > modification > > below, this is no longer possible, right? This can only be used > with > > outermost typecodes. > > Sorry, I didn't express this clearly. Mike Spreitzer previously > pointed > out that not only the top-level parameters may be of type any, but > that > they may be things like structures containing Anys. So, by allowing > to cache "nested" type codes, what I really meant was something > like: > > struct X { > any a; > CORBA::TypeCode tc; > }; > typedef sequence<X> Seq; > > If a parameter is of type Seq, the receiver is entitled to cache > every > typecode that appears anywhere in the parameters, nested or not. The > receiver can even cache a type code that belongs to an Any that is > nested > inside another Any. 
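The "nested type codes" point can be illustrated with a small sketch: the receiver may cache every complex typecode it finds anywhere in the parameters, including the typecode of an Any nested inside another Any. The `Any` stand-in and the repository-ID strings here are invented for illustration, not real ORB types.

```python
# Walk a parameter value and gather the repository ID of every typecode
# reachable from it, however deeply nested.

class Any:
    def __init__(self, typecode, value):
        self.typecode, self.value = typecode, value

def collect_typecodes(value, found):
    if isinstance(value, Any):
        found.add(value.typecode)              # the Any's own typecode...
        collect_typecodes(value.value, found)  # ...and anything inside it
    elif isinstance(value, (list, tuple)):     # sequence: visit each element
        for item in value:
            collect_typecodes(item, found)
    elif isinstance(value, dict):              # struct: field name -> field value
        for item in value.values():
            collect_typecodes(item, found)

# Mirrors the struct X { any a; CORBA::TypeCode tc; } example above:
# a sequence of structs whose Any member wraps another Any.
param = [{"a": Any("IDL:Outer:1.0", Any("IDL:Inner:1.0", 42)), "tc": "IDL:X:1.0"}]
cacheable = set()
for element in param:
    collect_typecodes(element, cacheable)
# cacheable now holds both the outer and the inner Any's typecodes
```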
The modification I proposed for nested Anys just > means that an ORB cannot *send* a cached type code inside an > encapsulation. > However, if an ORB *receives* a type code inside an encapsulation, > it > can cache it (after all, each type code has a unique repository ID, > so > I can always cache any typecode I like, provided I have the full type > code > to put into the cache). I understand it now. Agreed. > > Also, why the limitation of only being able to send typecodes received > > in a request in the corresponding response? I think this is an arbitrary > > limitation, that could be dropped. Effectively, it is not there in the > > server to client negotiation ... > > The limitation is there to keep things as stateless as possible. Suppose > that the client currently is in the middle of an invocation that includes > a type code. When the reply for *that* invocation comes back, the client > can get a CacheEntry for the type code. If the server were to send the > CacheEntry half an hour later, the client may find it difficult to deal > with that. It's no big deal, I admit. However, why *would* the server > delay sending the CacheEntry? There is only one logical point at which > the server would ever add a type code to its cache, and that is when it > has just received that type code. I see no problem in sending the CacheEntry at any time the caching side sees appropriate. On the other side, the only relevant information is the repository id and cache index, both in the received cache entry; there is no need to keep the full typecode on this side. The interaction is totally stateless. I really see no reason to limit this. > For similar reasons, we have made it illegal for the sender to just > willy-nilly send CacheEntries for type codes it happens to understand. > Instead, the sender can only send CacheEntries for those type codes > that it previously received from the other party. This is different. Basically, what would be the criteria to send the typecodes? 
There is no reason. The only way for this to be effective would be to have the application do it, but we don't want that. > > > The net effect is that if a server understands caching, the client > > > needs to send the full typecode only once per connection, and > > > thereafter just sends the cache index for that type code. > > > > The client needs to keep a "pseudo" cache (or something like that) with > > repository ids and server cache indexes. > > Well, it's not a pseudo cache. It simply is a table of repository ID and > index pairs. Whenever the client is about to marshal a type code, it can > look in the table to see whether the server has cached that type code. > If so, it can just send the index instead of the full type code. > > You are right though - there is a second cache that contains *full type code* > and index pairs. That's the cache containing those entries for which > the *client* has sent CacheEntries to the server. Typically, the client > will only put type codes into that cache if it has no static knowledge > of those type codes (and if they are complex, because there is no > gain in caching simple type codes). Yes, this is what I meant: there are two different data structures at each end: the full cache, and this table. One supports the receiving path, and the other the sending path. > > > The type caching scheme deliberately does not include a way to explicitly > > > outdate a cache (other than by dropping a connection), or to > > > selectively outdate particular cache entries. The reason for this > > > decision is to keep complexity down and to avoid race conditions > > > in multi-threaded clients and servers. > > > > This is what I think is weakest in the proposal. Forcing the close of > > the connection to clean up the full cache seems to me like too drastic. 
> > Besides, you might want to get rid of 50% of your cache, but not the > > rest; if you invalidate the whole cache, you have to repopulate it with > > the "still valid" 50% that you just threw away. > > That's a strong argument. > > > Also, not being able to cache beyond a certain point seems problematic > > for long-running connections. You probably have a lot of useless "old" > > cached typecodes, but you cannot get rid of them in any way (unless you > > close the connection). > > Another strong point. > > > I would like to propose a mechanism to selectively invalidate the caches > > at either end (the receiving cache) that, as far as I can tell, has > > no problems of race conditions. > > > > First, add a new service context, something like InvalidateCacheEntries, > > that would carry a sequence of cache ids to be invalidated. This could > > be sent at any time by the "sending" entity, and would immediately > > invalidate entries in the "receiving" side (as soon as received); > > as soon as the message containing this is sent, no other message from > > the sending side might use the removed cache ids. As the cache is > > specific to a connection, there is no risk of other threads stepping > > over the operation. > > Only if you assume a single thread per connection (I think). Scenario: > An ORB uses a thread-per-request model, so each operation executes in > its own thread, but all invocations come in via the same connection. > Some operations are long running, whereas others may complete quickly. > In addition, the ORB may use fragmentation at the protocol level. > > I think this scenario makes it possible for things to interleave as follows: > > - Request 1 arrives at the server and is dispatched. The operation > takes a while to run and produces a large result. Because the result > is large, the server ORB decides to fragment the reply. > > - First fragment for reply 1 is returned to the client and contains > a cached type code. 
> > - Request 2 arrives in the server and invalidates some cache entries. > > - The second fragment is returned to the client, possibly with a > cached type code that is supposed to be invalid now. > > I think it is possible to get around the race conditions with appropriate > inter-thread communication, but it may be hard to add this to existing > implementations. Let me see if I understand your scenario: Reply 1 Fragment 1 arrives in client Reply 2 arrives in client, invalidating some cache entries Reply 1 Fragment 2 arrives in client, with an invalidated cache entry This, in my opinion, is a wrong implementation of cache invalidation: reply 2 should have never invalidated something currently in use. Anyhow, even if this happened, as you mentioned earlier, this could be made to work by inter-thread communication. This would only be needed in the presence of fragmentation though. > > Second, add another service context, call it WantToInvalidate, that would > > carry a sequence of cache ids. This would be sent by the "receiving" > > entity at any time, to indicate to the "sending" side of the need to > > clean up some entries; sending this does NOT invalidate the entries, > > only notify the sending entity of the need to clean up. The "sending" > > side should, after receiving this, send an InvalidateCacheEntries > > service context, with a list of cache ids to invalidate at the "receiving" > > end. Note that the WantToInvalidate provides only a "hint" of what > > entries to invalidate; the one that actually eliminates entries is > > the list sent with InvalidateCacheEntries. As this message is only > > a hint, there should be no concern with race conditions or deadlocks. > > Let me see if I understand this correctly. Suppose I am the client. I have > previously told the server that I have cached certain type codes. Now > I want to let the server know that I want to clean out my cache. > I send "WantToInvalidate index 10" to the server. 
After I've done that, > the entry is still valid and I cannot throw it away until the server says > "Invalidate index 10" because the server makes a promise to not use > that index for a cached type code again (because it has removed the > corresponding entry from its own cache). Yes, this is the idea. > > This would still be piggy-backed in the normal messages, and would > > allow for a relatively easy recovery mechanism to clean up > > unwanted entries in the caches. This may be used by either end of > > the connection, to clean up the corresponding caches. > > I think that could work, and I don't think it's too complex. In > particular, > no-one is forced to obey that protocol again, in the sense that the > client > can choose to never send a WantToInvalidate, and the server can > choose > to never return an Invalidate. And the Invalidate might invalidate something different than that suggested by WantToInvalidate. Also, this would not invalidate your other original ideas: closing a connection fully cleans up the cache, and when exhausted, you keep discarding new typecodes (ie, not caching them). > > > If either side sends a cached type code when the receiver > > > hasn't previously indicated willingness to cache type codes > with > > > TypeCodeCacheEnabled, BAD_SERVICE_CONTEXT is raised. > > > > Actually, there are two distinct problems: if the sending side > sends a > > ReposIdCache service context without prior TypeCodeCacheEnabled, > then > > BAD_SERVICE_CONTEXT should be raised. If a cached type code (a > typecode > > with kind == 0xfffffffd) is sent in a message, and no previous > negotiation > > has happened, then MARSHAL should be used. > > I disagree. I would prefer to make both these conditions > BAD_SERVICE_CONTEXT, > and to use minor codes to distinguish them. Rationale: MARSHAL (to > me) > indicates that the contents of a packet are corrupted somehow. 
For > example, > the offset is out of bounds for a recursive type code, or the byte > count > of a packet does not agree with the actual length of the packet. > > The case you mention is different, in that there is nothing wrong > with > what's in the packet as such, just that someone sent the context at > the wrong time. I really would prefer to keep BAD_SERVICE_CONTEXT > for this > instead of MARSHAL. In the case I argue MARSHAL should be used, there is no service context being sent, so how can you send a BAD_SERVICE_CONTEXT exception, where there is no such thing in the message to which you are responding? For me, having a value of 0xfffffffd in a kind without having agreed to do caching is just another case of corrupted message buffer, and therefore the exception should be MARSHAL. > > > If either side sends a repository ID inside a CacheEntry that > > > was cached previously, BAD_SERVICE_CONTEXT is raised. > > > > > > If either side sends an index value inside a CacheEntry > > > when that index value is already in use, BAD_SERVICE_CONTEXT > > > is raised. > > > > I would prefer these two to allow for repetitions to happen. That is, > > if the repository id and cache id are the same as already in the cache, > > no exception needs to be raised. > > I don't feel too strongly about this. On the other hand, I see no reason > to allow these repeated entries. Again, there is only one logical point > at which someone would add an entry to the cache, and that is just after > receiving a new type code. The next message exchanged between the two > parties then carries the CacheEntry for the new type code. Why would > the sending side ever tell the receiving side something that was stated > earlier? > > To paraphrase, what do we gain by allowing repeated CacheEntries if they > don't conflict with the information already in the cache? I don't think > we gain anything, so I can't see a reason to allow it. I see your point here. 
I don't have too strong an opinion here, although I have a slight inclination for the permissive solution. > > One global comment about the BAD_SERVICE_CONTEXT exception: as this > > mechanism is piggy backed in "normal" messages, where should this > > exception be sent? The user has no need to see it, and the ORB probably > > is unable to do anything about it. So we might as well point this as > > a hard error in the protocol, and an indication that whoever was going > > to raise the exception should close the connection at the earliest > > possible time ... > > > > I'm not sure about this one though. What do others think? > > Well, it is similar to a MARSHAL exception in that it indicates a bug > in one of the two ORBs involved. However, we don't have any such requirements > on MARSHAL, so I don't see a need to add any for BAD_SERVICE_CONTEXT. > The exception simply gets propagated to the client, who does what it likes > with it. I don't think we should say anything about connection closure > or some such. In particular, connection closure may not be possible, > because other operations may still be executing in the server, and a server > cannot close a connection while a request is outstanding. My point is that, even if BAD_SERVICE_CONTEXT would have to be raised, this does not indicate any problem in the real application exchange. It might very well be the case that the application would still function properly, without noticing this at all (it just would not be using caching, but otherwise functioning properly). I think I would prefer these exceptions to not be raised, but rather be logged or plainly discarded. > > > A sending ORB must ensure that cached typecodes do not appear inside > > > encapsulations. This is because the cache index for cached type > > > code makes sense only for one particular client-server pair. 
If > > > cached type codes were allowed inside encapsulations, this would > > > force the receiver of an encapsulation to decode it just to be able > > > to safely forward the encapsulation to a new destination. > > > > This is pretty strong. I see the reason behind it, but I know of some > > particularly common cases where this restriction seems too strong. > > I will try to think whether there is a different way to solve the problem > > instead of plainly forbidding it. > > Well, I'd be happy to see a solution to this. However, as I stated in my > other message, I will strongly oppose anything that looks too complex. > In my opinion, the entire caching idea only stacks up if we can keep > it fairly simple. If that isn't possible, I'm in favour of dropping it > instead of making it more complex. I agree: whatever we come up with must be relatively simple for the task. > > I would like to make another proposal, complementary to this one. > > Michi, I know you hate it, but I still think that it is worth > making > > the proposal and letting others scream and yell ... or support it. > > Well, as you know, I've screamed already in private ;-) I know, that is why I said "letting 'others' scream and yell" ;-). > Javier, I agree that this can be made to work, but I see a number of > serious problems with this suggestion: > > - Applications all of a sudden have to be aware that caching > exists and need to explicitly negotiate this with the other > party. This has NOTHING to do with caching. I mention "application-defined" mechanisms, but not a cache. Specifically, in Name-Value pairs, the name usually contains enough information to know the type of the value upfront, without looking at the typecode in the any. There are a number of other cases where applications have other means to know the type that goes in an any, without having to inspect the any. > - Applications would explicitly replace the full type code > with its implicit form. 
This means that there would all of > a sudden be two kinds of Anys floating around. Ones containing > full type codes and ones without full type codes. Yes, this is true. > The really serious problem with this is that it destroys the > guaranteed introspection capability that Any values have. > If the sender ever makes a mistake for some reason, the > receiver can find itself holding an Any that doesn't contain > a type code, and the receiver may have no idea what the repository > ID inside the implicit type code actually means. This would be an application error, that would normally be detected when trying to extract the value from the any. > - It would expose applications to low-level optimizations like > type code caching. I see this firmly as a platform responsibility. > The whole point of CORBA is that I don't have to worry about > how all this magic happens behind the scenes. If we do what > we suggest, we significantly dilute the protocol and > implementation transparency of CORBA. Again, I am not proposing to do typecode caching at the application level. What I am proposing is an application level optimization. And I don't think the protocol/implementation is being exposed so much with this. > - The side that incorrectly sends a tk_implicit may not even be > responsible, because the tk_implicit could have come from > a different sender inside an encapsulation. In other words, > tk_implicit seems to suffer the same problem as the cached type > codes with respect to encapsulation? Again, this would be an application problem. > - tk_implicit introduces implicit shared state between client > and server which is not visible at the IDL interface level. > In other words, if I just look at an interface like > > interface foo { > void op(in any a); > }; > > I have no way of telling when, if at all, it is OK to send > an any with tk_implicit. 
(This is different from the cached > type code proposal, because the caching proposal keeps the > application out of things and is completely transparent). > > In effect, your suggestion creates two kinds of IDL type any. > One that can be sent as tk_implicit, and one that can't. > This is bad, because we now have a single IDL type that has two > sets of semantics. Worse, I cannot tell without understanding > all of the surrounding context which set of semantics to apply. My original idea was to propose a new IDL type (or a qualifier for any), so that this would be apparent at the interface level. However, I thought that it would be too much to introduce this new type. > So, (I know this doesn't come as a surprise to you ;-), I strongly oppose > the idea of letting applications anywhere near caching of type codes. Again, I'm not proposing application level caching of typecodes. What I am proposing is application-agreed use of anys with short typecodes. Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: Date: Thu, 16 Jul 1998 12:50:36 +1000 (EST) From: Michi Henning Reply-To: Michi Henning To: Jonathan Biggar cc: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Jonathan Biggar wrote: > I'm just explicitly stating that TypeCode components, although they are > in encapsulations themselves inside their parent TypeCode, are an explicit > exception to the rule. You should be able to send a TypeCode (at the > top level) that is a tk_sequence of a cached typecode. Yep. > The scenario in question is a multithreaded server that receives two > long running requests. It adds the TypeCodes for each request to > its > cache, but it should be free to respond with cache indexes for both > requests on the first reply to complete. Otherwise you have to keep > track of which request to send each cache index on. 
OK, you've convinced me on that one, that's a strong argument. > > Agree. In fact, the most efficient implementation would be to never outdate > > anything at all. This would be legal with Javier's proposal. > > So do we add his proposal to your proposal? I'm in favour of that because I think it works. However, can you give us your opinion on the "Can(Can't) Accept" and "Will(Won't) Send" negotiation for caching? Does that hang together? Is it necessary? I think it can be useful to prevent the situation where one side needlessly caches things because the other side will never use the cached entries. > > > I agree. No need to detect duplicates in the cache. The receiver can > > > just keep the most recently received index. > > > > Aren't there race conditions if multiple requests are outstanding on the > > same connection? > > No, because the sender's cache maps repository id to index, and the > receiver's cache maps index to repository id. So the receiver can have > both indexes in its cache with no problem. OK, I'm happy with that. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 12:54:23 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Jonathan Biggar wrote: > Michi, this is overkill, and places a burden on the TypeCode cache to > match cache entries with the request that created them. I don't see any > reason to limit when the receiver sends back the CacheEntry for a > TypeCode. Generally, as soon as possible seems to be the right > approach, but there isn't any real reason to mandate a particular time. Yes, I agree with you. 
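The duplicate-index point at the end of this exchange can be made concrete with a small sketch. This is a hypothetical model (the dictionaries and repository-ID strings are invented): when two outstanding requests carry the same typecode, the caching side may assign it two indexes; the other side just keeps the most recently announced index per repository ID, and no race arises.

```python
# One side caches full typecodes and announces CacheEntries; the other side
# only records which index to use when sending that typecode compactly.

repo_to_index = {}  # side that will send compact typecodes: repo_id -> index
index_to_repo = {}  # side that caches full typecodes: index -> repo_id

def cache_locally(repo_id, index):
    # Both indexes for the same repository ID can coexist in this cache.
    index_to_repo[index] = repo_id

def on_cache_entry(repo_id, index):
    # A later CacheEntry for the same repository ID simply wins.
    repo_to_index[repo_id] = index

# Two racing replies announce the same typecode under two different indexes:
for repo_id, index in [("IDL:Foo:1.0", 7), ("IDL:Foo:1.0", 8)]:
    cache_locally(repo_id, index)
    on_cache_entry(repo_id, index)

# The sender will use index 8 from now on; indexes 7 and 8 both still resolve.
```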
> > For similar reasons, we have made it illegal for the sender to just > > willy-nilly send CacheEntries for type codes it happens to understand. > > Instead, the sender can only send CacheEntries for those type codes > > that it previously received from the other party. > > Why not? If an ORB vendor wants to make it possible for an application > to preload the cache with TypeCodes in the application's domain, what > does it hurt? OK. I can accept that one too. I was trying to steer the caching towards caching type codes that are *actually* used, instead of type codes that *may* be used. However, I agree with you that there is no overriding reason to forbid this, so we should allow it. > It would be a good idea to clarify that both sides have a sender cache > and a receiver cache, and that the sender cache maps repository ids to > indexes, and the receiver cache maps indexes to repository ids. Completely agree. > I think if we don't allow sending an Invalidate except in response to a > WantToInvalidate, then the race conditions go away. Yes, I think that's right. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Sender: jon@floorboard.com Date: Wed, 15 Jul 1998 19:56:39 -0700 From: Jonathan Biggar To: Michi Henning CC: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > > > Agree. In fact, the most efficient implementation would be to > never outdate > > > anything at all. This would be legal with Javier's proposal. > > > > So do we add his proposal to your proposal? > > I'm in favour of that because I think it works. However, can you > give > us your opinion on the "Can(Can't) Accept" and "Will(Won't) Send" > negotiation for caching? Does that hang together? Is it necessary? 
> I think it can be useful to prevent the situation where one side > needlessly > caches things because the other side will never use the cached > entries. Just modify your TypeCodeCacheEnabled service context to contain: struct SupportedCaching { boolean willing_to_receive; boolean willing_to_send; }; Modify the protocol to only send cache entries if the local end is willing_to_send and the remote end is willing_to_receive. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 13:08:12 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Javier Lopez-Martin wrote: > I see no problem in sending the CacheEntry at any time the caching side > sees appropriate. In the other side, the only relevant information are > repository id and cache index, both in the received cache entry; there > is no need to keep the full typecode on this side. The interaction > is totally stateless. I really see no reason to limit this. I agree with you and Jon, there is no need to limit this. > > For the similar reasons, we have made it illegal for the sender to just > > willy-nilly send CacheEntries for type codes it happens to understand. > > Instead, the sender can only send CacheEntries for those type codes > > that it previously received from the other party. > > This is different. Basically, what would be the criteria to send the > typecodes? There is no reason. The only way for this to be effective > would be to have the application do it, but we don't want that. As Jon explained, it allows pre-loading of the cache. This may be far-fetched, but there is no need to disallow it because it does no harm. 
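Jon's SupportedCaching negotiation can be sketched directly. The rule is his, quoted above; the Python modeling of it (dataclass, function name) is illustrative only. Each direction of the connection is decided independently: cache entries flow from A to B only if A is willing to send and B is willing to receive.

```python
# Model of the SupportedCaching negotiation: each end advertises two
# booleans in its service context, and each caching direction is enabled
# independently from the other.

from dataclasses import dataclass

@dataclass
class SupportedCaching:
    willing_to_receive: bool
    willing_to_send: bool

def may_send_cache_entries(local: SupportedCaching, remote: SupportedCaching) -> bool:
    # Direction: local -> remote. No reply or further handshake is needed,
    # which keeps the negotiation stateless.
    return local.willing_to_send and remote.willing_to_receive

client = SupportedCaching(willing_to_receive=True, willing_to_send=True)
server = SupportedCaching(willing_to_receive=False, willing_to_send=True)

# Client -> server caching is off (the server won't receive), yet
# server -> client caching still works: the directions are independent.
assert may_send_cache_entries(client, server) is False
assert may_send_cache_entries(server, client) is True
```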
> Let me see if I understand your scenario: > > Reply 1 Fragment 1 arrives in client > Reply 2 arrives in client, invalidating some cache entries > Reply 1 Fragment 2 arrives in client, with an invalidated cache > entry > > This, in my opinion, is a wrong implementation of cache > invalidation: > reply 2 should have never invalidated something currently in use. > Anyhow, even if this happened, as you mentioned earlier, this could > be > made to work by inter-thread communication. This would only be > needed > in the presence of fragmentation though. Yes. As Jon also pointed out, if an Invalidate can only be sent as a response to a previous WantToInvalidate, the race conditions go away. > > > This would still be piggy-backed in the normal messages, and would > > > allow for a relatively easy recovery mechanism to clean up > > > unwanted entries in the caches. This may be used by either end of > > > the connection, to clean up the corresponding caches. > > > > I think that could work, and I don't think it's too complex. In particular, > > no-one is forced to obey that protocol again, in the sense that the client > > can choose to never send a WantToInvalidate, and the server can choose > > to never return a Invalidate. > > And the Invalidate might invalidate something different than that suggested > by WantToInvalidate. I don't think we can allow this, as Jon pointed out. An Invalidate that does not indicate an entry that was previously offered with WantToInvalidate is a protocol error. > Also, this would not invalidate your other original ideas: closing a > connection fully cleans up the cache, and when exhausted, you keep > discarding new typecodes (ie, not caching them). Yes. > > The case you mention is different, in that there is nothing wrong with > > what's in the packet as such, just that someone sent the context at > > the wrong time. I really would prefer to keep BAD_SERVICE_CONTEXT for this > > instead of MARSHAL. 
> > In the case I argue MARSHAL should be used, there is no service context > being sent, so how can you send a BAD_SERVICE_CONTEXT exception, where > there is no such thing in the message to which you are responding? > > For me, having a value of 0xfffffffd in a kind without having agreed to > do caching is just another case of corrupted message buffer, and therefore > the exception should be MARSHAL. Bloody good point ;-) You are right, MARSHAL it is. > My point is that, even if BAD_SERVICE_CONTEXT would have to be raised, > this does not indicate any problem in the real application exchange. > It might very well be the case that the application would still > function properly, without noticing this at all (it just would not > be using caching, but otherwise functioning properly). > > I think I would prefer these exceptions to not be raised, but rather > be logged or plainly discarded. OK, I see your point now. The issue really is that we have a protocol error, but the error is recoverable because the receiver can simply ignore the offending service context. I agree -- it is a bit harsh to raise an exception in the application if the error is harmless from the receiver's point of view. How do other people feel about this? > My original idea was to propose a new IDL type (or a qualifier for any), > so that this would be apparent at the interface level. However, I thought > that it would be too much to introduce this new type. Absolutely. But giving a single type two sets of semantics doesn't solve the problem, IMO. > Again, I'm not proposing application level caching of typecodes. > What I am proposing is application agreed use of anys with short > typecodes. But that *is* caching of a sort, even though there may be no physical cache involved. The way I see it, it relies on some unspecified understanding between client and server to recognize when and when not to send Anys with tk_implicit. 
That understanding is not visible in the interface contract between client and server. That's what I really dislike about the entire idea. Also, let's step back for a moment... This entire idea was motivated by the inefficient marshaling of Any values. We are now discussing whether or not to introduce an application-level mechanism to deal with that inefficiency. This seems to be attacking the problem at the wrong level. Let's fix the protocol to get the marshaling more efficient and there will be no need to even be tempted to introduce application-level hooks. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 13:15:25 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Jonathan Biggar wrote: > Just modify your TypeCodeCacheEnabled service context to contain: > > struct SupportedCaching { > boolean willing_to_receive; > boolean willing_to_send; > }; > > Modify the protocol to only send cache entries if the local end is > willing_to_send and the remote end is willing_to_receive. Sounds good. Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Wed, 15 Jul 1998 21:19:12 -0600 From: Javier Lopez-Martin To: javier@cnd.hp.com, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Cc: interop@omg.org Content-Md5: 1bFQUwgL4gHQ+yqS1O4vqA== > > > For the similar reasons, we have made it illegal for the sender to just > > > willy-nilly send CacheEntries for type codes it happens to understand. 
> > > Instead, the sender can only send CacheEntries for those type codes that it previously received from the other party. > > This is different. Basically, what would be the criteria to send the typecodes? There is no reason. The only way for this to be effective would be to have the application do it, but we don't want that. > As Jon explained, it allows pre-loading of the cache. This may be far-fetched, but there is no need to disallow it because it does no harm. That was also what I was referring to with "the application". The ORB has no clue about how to populate the cache, but the application might. If the agreement is to allow this, shouldn't a standard mechanism be specified for this as well? I'm also ok with leaving it to ORB implementors ... > > Let me see if I understand your scenario: > > > > Reply 1 Fragment 1 arrives in client > > Reply 2 arrives in client, invalidating some cache entries > > Reply 1 Fragment 2 arrives in client, with an invalidated cache entry > > > > This, in my opinion, is a wrong implementation of cache invalidation: reply 2 should have never invalidated something currently in use. Anyhow, even if this happened, as you mentioned earlier, this could be made to work by inter-thread communication. This would only be needed in the presence of fragmentation though. > Yes. As Jon also pointed out, if an Invalidate can only be sent as a response to a previous WantToInvalidate, the race conditions go away. I don't think I see why. Besides, I would prefer the sending side to be able to clean up the cache independently of what the receiving side has to say, so I would prefer to keep the possibility of un-provoked Invalidates, if at all possible. > > > > This would still be piggy-backed in the normal messages, and would allow for a relatively easy recovery mechanism to clean up unwanted entries in the caches. 
This may be used by either end of the connection, to clean up the corresponding caches. > > > I think that could work, and I don't think it's too complex. In particular, no-one is forced to obey that protocol again, in the sense that the client can choose to never send a WantToInvalidate, and the server can choose to never return an Invalidate. > > And the Invalidate might invalidate something different than that suggested by WantToInvalidate. > I don't think we can allow this, as Jon pointed out. An Invalidate that does not indicate an entry that was previously offered with WantToInvalidate is a protocol error. Why? It is not a protocol error in my proposal ... I would like to understand the reason for this limitation. > > My original idea was to propose a new IDL type (or a qualifier for any), so that this would be apparent at the interface level. However, I thought that it would be too much to introduce this new type. > Absolutely. But giving a single type two sets of semantics doesn't solve the problem, IMO. > > Again, I'm not proposing application level caching of typecodes. What I am proposing is application agreed use of anys with short typecodes. > But that *is* caching of a sort, even though there may be no physical cache involved. The way I see it, it relies on some unspecified understanding between client and server to recognize when and when not to send Anys with tk_implicit. I wouldn't call this caching ... > That understanding is not visible in the interface contract between client and server. That's what I really dislike about the entire idea. I agree it would be nicer to have it visible at the interface level. > Also, let's step back for a moment... > > This entire idea was motivated by the inefficient marshaling of Any values. > We are now discussing whether or not to introduce an application-level mechanism to deal with that inefficiency. 
This seems to be attacking the problem at the wrong level. Let's fix the protocol to get the marshaling more efficient and there will be no need to even be tempted to introduce application-level hooks. You see, I'm a little more pessimistic than you with regards to ORB vendors implementing this caching schema. That's why I'm proposing this application level, simple hook. Granted, when all ORBs implement the caching mechanism, it would be useless. But, in the meantime ... Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 14:11:20 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Javier Lopez-Martin wrote: > > Yes. As Jon also pointed out, if an Invalidate can only be sent as a response to a previous WantToInvalidate, the race conditions go away. > I don't think I see why. Besides, I would prefer the sending side to be able to clean up the cache independently of what the receiving side has to say, so I would prefer to keep the possibility of un-provoked Invalidates, if at all possible. If unprovoked Invalidates are possible, then the sender of the Invalidate can never be sure when it is safe to remove the entry it wants to invalidate from the cache, due to long-running operations whose results may still be outstanding. With a WantToInvalidate first, the side that wants to invalidate can be sure that it is safe to remove the entry only after it has seen the confirmation from the other end in the form of an Invalidate reply that the other end will not send another CacheEntry for that type code. In other words, the WantToInvalidate/Invalidate protocol ensures that no-one removes an entry from the cache prematurely. > > I don't think we can allow this, as Jon pointed out. 
An Invalidate that does not indicate an entry that was previously offered with WantToInvalidate is a protocol error. > Why? It is not a protocol error in my proposal ... I would like to understand the reason for this limitation. See above. > You see, I'm a little more pessimistic than you with regards to ORB vendors implementing this caching schema. That's why I'm proposing this application level, simple hook. Granted, when all ORBs implement the caching mechanism, it would be useless. But, in the meantime ... No, no "in the meantime..." please. If this caching proposal doesn't stand up, let's axe it and shed no tears. Instead of adding things at the application level, I'd much rather wait until GIOP 2.0 (or HTTP-NG, or whatever) and do it properly then. CORBA has survived for quite a while on GIOP 1.0 and 1.1 -- it's not going to keel over and die if we can't get more efficient type code marshaling into the spec this time around. Let's not add stuff to the application layer that should be handled at the protocol level as a stop-gap measure. We'll regret it later. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Thu, 16 Jul 1998 00:20:31 -0400 (EDT) From: Bill Beckwith X-Sender: beckwb@gamma To: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Wed, 15 Jul 1998, Javier Lopez-Martin wrote: > > Also, let's step back for a moment... > > > > This entire idea was motivated by the inefficient marshaling of Any values. > > We are now discussing whether or not to introduce an application-level > > mechanism to deal with that inefficiency. This seems to be attacking the > > problem at the wrong level. Let's fix the protocol to get the marshaling > > more efficient and there will be no need to even be tempted to introduce > > application-level hooks. 
> > You see, I'm a little more pessimistic than you with regards to ORB vendors > > implementing this caching schema. That's why I'm proposing this > > application level, simple hook. Granted, when all ORBs implement the > > caching mechanism, it would be useless. But, in the meantime ... I have been catching up on this entire discussion of typecode caching. Geez, it's making me crazy. I'll try and summarize my feelings: Be careful about designing some new, complicated typecode caching mechanism that:
a. may _not_ improve or may hurt performance (please prototype before legislating!)
b. will increase the size and bug count of ORBs (at a critical layer)
c. may reduce ORB interoperability (because of bugs and misinterpretations)
d. introduces connection state (I realize that the char set negotiation is also state based)
e. further muddies what little layering and separation we have between CDR, GIOP, and IIOP
The goal of optimizing type Any's is valiant, but the current path looks suspect to me. It is not clear what the optimization goal is. Reduce data sent over the wire? Reduce memory in client and/or server? Reduce client and/or server CPU? I'm not sure my points are clear but I wanted to be brief. :-) I'll expound if requested. -- Bill Return-Path: From: Javier Lopez-Martin Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 To: michi@dstc.edu.au Date: Thu, 16 Jul 1998 0:25:45 MDT Cc: javier@cnd.hp.com, interop@omg.org Michi, > > Yes. As Jon also pointed out, if an Invalidate can only be sent as a response > > to a previous WantToInvalidate, the race conditions go away. > I don't think I see why. Besides, I would prefer the sending side to be > able to clean up the cache independently of what the receiving side has > to say, so I would prefer to keep the possibility of un-provoked Invalidates, > if at all possible. 
> If unprovoked Invalidates are possible, then the sender of the Invalidate > can never be sure when it is safe to remove the entry it wants to invalidate > from the cache, due to long-running operations whose results may still > be outstanding. The sender of the Invalidate can invalidate as soon as it sends the message. Anything that is done after that must not (and cannot) use the invalidated cache entries. Invalidate is sent from the "sending" end to the "receiving" side, so the cached typecode has either been sent already, or won't be sent at all. This may be done at any time, regardless of the WantToInvalidate having happened. The only potential situation is, as you already pointed out, on the receiving side, if a message is sent with fragmentation, and it is interspersed with the one sending the Invalidate, and it has a reference to an invalidated typecode. However, this situation may be solved (in several different ways I can think of) at either end of the connection. > With a WantToInvalidate first, the side that wants to invalidate can be > sure that it is safe to remove the entry only after it has seen the > confirmation from the other end in the form of an Invalidate reply that the > other end will not send another CacheEntry for that type code. In other words, > the WantToInvalidate/Invalidate protocol ensures that no-one removes an > entry from the cache prematurely. I don't see how requiring a prior WantToInvalidate helps in the above scenario: a fragment received, then a WantToInvalidate sent, then the Invalidate received, then the next fragment received ... Same problem as above (and same solutions). > > I don't think we can allow this, as Jon pointed out. An Invalidate that > > does not indicate an entry that was previously offered with WantToInvalidate > > is a protocol error. > > Why? It is not a protocol error in my proposal ... I would like to > understand the reason for this limitation. > See above. I still don't see it. See above. 
> > You see, I'm a little more pessimistic than you with regards to ORB vendors > > implementing this caching schema. That's why I'm proposing this > > application level, simple hook. Granted, when all ORBs implement the > > caching mechanism, it would be useless. But, in the meantime ... > > No, no "in the meantime..." please. If this caching proposal doesn't stand > up, let's axe it and shed no tears. Instead of adding things at the > application level, I'd much rather wait until GIOP 2.0 (or HTTP-NG, or > whatever) and do it properly then. > > CORBA has survived for quite a while on GIOP 1.0 and 1.1 -- it's not > going to keel over and die if we can't get more efficient type code > marshaling into the spec this time around. Let's not add stuff to the > application layer that should be handled at the protocol level as a stop-gap > measure. We'll regret it later. It seems to me that it is just you, Jon, and me who have opinions on this. Let's get others. But if there are no more opinions, I'm ready to give up: it's not worth adding anything without enough interest/support. Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Thu, 16 Jul 1998 16:42:33 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: interop@omg.org Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Thu, 16 Jul 1998, Javier Lopez-Martin wrote: > The sender of the Invalidate can invalidate as soon as it sends the > message. Anything that is done after that must not (and cannot) use > the invalidated cache entries. Invalidate is sent from the "sending" > end to the "receiving" side, so the cached typecode has either been > sent already, or won't be sent at all. This may be done at any time, > regardless of the WantToInvalidate having happened. I don't think this works. Consider. You are the server, I am the client. 
Some time in the past, you sent me an any with a full type code for type FOO. I cached that type code and told you to use index 20 for it. From now on, you always send index 20. Two hours later, my cache is full, and I decide I want to ditch FOO at index 20. Now I send you an "Invalidate 20" service context in one of my requests and throw away the entry in slot 20. Unfortunately for me, I am multi-threaded and have all sorts of invocations going on in parallel. So, one of your replies trundles in containing a full type code for BAR. I go and cache BAR at index 20. More or less in parallel, while I've done this, you finish completing another request of mine (before my "Invalidate 20" entry has reached you or you had a chance to process it). Because you still don't know that slot 20 now contains BAR on my side, you give me a cached type code with slot 20, but as far as you are concerned, it still denotes FOO. As a result, I go and unpack your Any containing a BAR using the type information for FOO. Of course, that won't work. > The only potential situation is, as you already pointed out, on the > receiving side, if a message is sent with fragmentation, and it is > interspersed with the one sending the Invalidate, and it has a > reference to an invalidated typecode. However, this situation may > be solved (in several different ways I can think of) at either end > of the connection. I think the above example shows that it may fail even without fragmentation. Coincidentally, the race is introduced *not* by the fact that we are using numeric indexes. The above scenario cannot happen if we use repository IDs instead of indexes, but it can *still* happen that I throw my type code out of the cache while you are still using it for cached type codes. In that case, I will hit an empty slot when your reply finally arrives instead of using the wrong type code. The end result is just as bad of course. 
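Michi's FOO/BAR scenario can be condensed into a short simulation (a purely hypothetical sketch: the dictionary, slot number, and type-code strings stand in for a real per-connection typecode cache and are not part of any proposal text):

```python
# Hypothetical simulation of the slot-reuse race described above.  The
# slot number (20) and names (FOO, BAR) follow Michi's example; the dict
# stands in for the client's per-connection typecode cache.

client_cache = {20: "TypeCode(FOO)"}

# The client sends an unprovoked "Invalidate 20" and frees the slot at once.
del client_cache[20]

# A parallel reply carries a full typecode for BAR; the client caches it
# in the now-free slot 20.
client_cache[20] = "TypeCode(BAR)"

# Meanwhile the server, which has not yet processed the Invalidate,
# marshals another Any as "cached typecode, slot 20" -- still meaning FOO.
slot_in_reply = 20  # server's intent: TypeCode(FOO)

# The client resolves the slot against its *current* cache, so a FOO
# payload gets unpacked with BAR's type information.
resolved = client_cache[slot_in_reply]
assert resolved == "TypeCode(BAR)"  # mismatch: the payload was marshaled as FOO
```

A repository-ID key instead of a numeric slot would turn the wrong-type failure into a cache miss, which, as Michi notes, is just as fatal for the request.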
> I don't see how requiring a prior WantToInvalidate helps in the above > scenario: a fragment received, then a WantToInvalidate sent, then > the Invalidate received, then the next fragment received ... > Same problem as above (and same solutions). The above scenario is solved by this. I first send you a "WantToInvalidate 20". Now you know that I want to stop using slot 20, but that I guarantee to keep it reserved for you until you tell me it's OK to throw that entry away. For example, you may count the outstanding requests per connection - as soon as the outstanding request count drops to zero, you now know that no operation is still in progress that may produce a result containing a FOO, and therefore, next time you send me a reply, you tell me "Invalidate 20" (assuming that reply does not contain a FOO itself). Now I know I can safely reuse slot 20 because you have promised me that you won't send me more replies using slot 20. > I still don't see it. See above. I hope the above makes it clearer? > It seems to me that it is just you, Jon, and me who have opinions on this. > Let's get others. But if there are no more opinions, I'm ready to give up: > it's not worth adding anything without enough interest/support. Well, I was hoping that Bob "Old Eagle Eye" Kukura would cast an eye on this. Bob has saved me from making a fool of myself in the past, so I was hoping he would do it this time 'round too! ;-) I agree though. If no-one else says Yay or Nay, we might as well drop it. Cheers, Michi. -- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Date: Thu, 16 Jul 1998 12:08:23 +0200 From: "Martin v. Loewis" To: michi@dstc.edu.au CC: jon@floorboard.com, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: > So, back to the in string parameter. Let's see what the performance hit > would be, at least roughly. 
For collocated calls, it's not an issue because no marshaling buffer is involved. So, the additional string copy is necessary only for remote calls (during unmarshaling of the string, the server-side ORB has to copy the string into a separate NUL-terminated hunk of memory and pass a pointer to that hunk to the skeleton). I believe a clever implementation could avoid that. You could just overwrite the byte following the string. It always contains a primitive type (maybe a sequence length), which you would have to fetch in advance, or padding, or it is the end of the buffer, in which case the marshalling buffer should provide extra space in advance. So, I don't think that such a change in the marshalling would require implementations to perform additional copies. Regards, Martin Return-Path: Date: Thu, 16 Jul 1998 10:31:03 -0400 From: Paul H Kyzivat Organization: NobleNet To: Michi Henning CC: Javier Lopez-Martin , jon@floorboard.com, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > > Yes. If we want to accommodate cached type codes inside encapsulations, > we get a problem though, namely that the receiver of the encapsulation > can no longer blindly forward it to another destination (which was the > motivation for encapsulating the value part of an Any in the first place). > > This seems like a catch-22: either we have encapsulated Any values, or > we have cached type codes, but I can't see how to have both with the > current proposal. > > Javier, I understand your reasoning -- for the JIDM mapping, Anys nested > inside other Anys are important. On the other hand, how much of CORBA > should be tuned to the requirements of one particular object model > bridge? I'm not saying that what you are asking for is unreasonable, it's > just that I'm not sure how many concessions CORBA should make to > accommodate bridges that arguably are not the mainstream use of CORBA. 
> (This is a little bit off-topic and more of a philosophical and political > question. I didn't mean it to be demeaning.) > > One option I am carefully considering (but I'm not sure whether it makes > sense yet) is that CDR 1.1 could add a flag to encapsulations that indicates > whether an encapsulation includes a cached type code or not. If not, > it is safe to forward the encapsulation without decoding it. > > This would (sort of) get around the problem, but I'm worried about the > ever-increasing complexity. The basic caching proposal we put on the > table has the advantage of simplicity. If we try to layer all-singing > and all-dancing caching stuff onto service contexts, I would probably > prefer to drop the proposal. The intent was not to invent a whole new > protocol, but to get an optimization for the existing one without > disturbing too much. > > What we are running into here really is that GIOP is difficult to extend > for new functionality. GIOP mangles different abstraction levels, such > as address resolution (LocationForward), marshaling (CDR rules), > fragmentation, and connection management into a single big blob. > > If this proposal turns into a monolith of complexity and hidden stateful > interactions between client and server, I definitely won't support it. > I don't want to go down in history as yet another person who has made > CORBA worse instead of better... Thank you for saying this. I have been thinking it for a while now. This caching scheme works poorly in conjunction with encapsulations, and they are important. And needing to maintain per-connection caches troubles me. Yet the goal of reducing the overhead of sending typecodes is important, and worth the work. I wish a clean answer were apparent to me, but it isn't. 
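For reference, the WantToInvalidate/Invalidate handshake that the thread converges on can be sketched as follows (a hypothetical illustration: the class names, method names, and message tuples are invented for this sketch and are not spec text):

```python
# Hypothetical sketch of the WantToInvalidate/Invalidate handshake.  The
# receiving side owns the cache but may only *hint*; the sending side
# confirms once no outstanding reply can still reference the slot.

class Receiver:
    """Caches typecodes; may only request invalidation, never force it."""
    def __init__(self):
        self.cache = {20: "TypeCode(FOO)"}
        self.pending = set()  # slots offered for invalidation, still reserved

    def want_to_invalidate(self, slot):
        self.pending.add(slot)            # hint only; the slot stays usable
        return ("WantToInvalidate", slot)

    def on_invalidate(self, slot):
        # An Invalidate for a slot never offered would be a protocol error.
        assert slot in self.pending
        self.pending.discard(slot)
        del self.cache[slot]              # now safe to reuse the slot


class Sender:
    """Authoritative side; confirms invalidation when replies have drained."""
    def __init__(self):
        self.outstanding_replies = 0      # replies that might still use a slot
        self.requested = set()

    def on_want_to_invalidate(self, slot):
        self.requested.add(slot)

    def maybe_invalidate(self):
        # Confirm only when no in-flight reply can reference the slot.
        if self.outstanding_replies == 0 and self.requested:
            return ("Invalidate", self.requested.pop())
        return None


rx, tx = Receiver(), Sender()
kind, slot = rx.want_to_invalidate(20)    # receiver hints
tx.on_want_to_invalidate(slot)
reply = tx.maybe_invalidate()             # sender confirms when safe
if reply:
    rx.on_invalidate(reply[1])
assert 20 not in rx.cache                 # slot freed only after confirmation
```

The point Michi makes earlier in the thread falls out directly: because the receiver keeps the slot reserved between the hint and the confirmation, no reply still in flight can ever resolve against a reused slot.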
Return-Path: Date: Thu, 16 Jul 1998 10:34:19 -0600 From: Javier Lopez-Martin To: javier@cnd.hp.com, michi@dstc.edu.au Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Cc: interop@omg.org Content-Md5: yy91RVfGLhdnF1nLmfGYwA== > On Thu, 16 Jul 1998, Javier Lopez-Martin wrote: > > > The sender of the Invalidate can invalidate as soon as it sends the > > message. Anything that is done after that must not (and cannot) use > > the invalidated cache entries. Invalidate is sent from the "sending" > > end to the "receiving" side, so the cached typecode has either been > > sent already, or won't be sent at all. This may be done at any time, > > regardless of the WantToInvalidate having happened. > > I don't think this works. Consider. You are the server, I am the client. > Some time in the past, you sent me an any with a full type code for > type FOO. I cached that type code and told you to use index 20 for it. > From now on, you always send index 20. Two hours later, my cache is full, > and I decide I want to ditch FOO at index 20. Now I send you an > "Invalidate 20" service context in one of my requests and throw away > the entry in slot 20. This is *incorrect* according to my proposal. The side that has the actual cache (you) can only send a hint by sending "WantToInvalidate 20", and can only clean up when it has *received* an "Invalidate 20" (that can only be sent by the sending side -me-). > Unfortunately for me, I am multi-threaded and have all sorts of > invocations going on in parallel. So, one of your replies trundles in > containing a full type code for BAR. I go and cache BAR at index 20. More > or less in parallel, while I've done this, you finish completing another > request of mine (before my "Invalidate 20" entry has reached you or you > had a chance to process it). 
Because you still don't know that slot 20 now contains BAR on my side, you give me a cached type code with slot 20, but as far as you are concerned, it still denotes FOO. As a result, I go and unpack your Any containing a BAR using the type information for FOO. Of course, that won't work. As I said, the scenario you are presenting is ill-behaved with respect to my original proposal for invalidation. "Invalidate" travels from sender to receiver, while "WantToInvalidate" goes in the opposite direction. > > The only potential situation is, as you already pointed out, on the > > receiving side, if a message is sent with fragmentation, and it is > > interspersed with the one sending the Invalidate, and it has a > > reference to an invalidated typecode. However, this situation may > > be solved (in several different ways I can think of) at either end > > of the connection. > I think the above example shows that it may fail even without fragmentation. > Coincidentally, the race is introduced *not* by the fact that we are using > numeric indexes. The above scenario cannot happen if we use repository IDs > instead of indexes, but it can *still* happen that I throw my type code > out of the cache while you are still using it for cached type codes. In that > case, I will hit an empty slot when your reply finally arrives instead of > using the wrong type code. The end result is just as bad of course. Does clearing up this misinterpretation of the protocol make any difference to you? It looks to me as if the problem goes away when you assign the right directionality to each message ... > > I don't see how requiring a prior WantToInvalidate helps in the above > > scenario: a fragment received, then a WantToInvalidate sent, then > > the Invalidate received, then the next fragment received ... > > Same problem as above (and same solutions). > The above scenario is solved by this. I first send you a "WantToInvalidate 20". 
> Now you know that I want to stop using slot 20, but that I guarantee to > keep it reserved for you until you tell me it's OK to throw that entry away. > For example, you may count the outstanding requests per connection - as soon > as the outstanding request count drops to zero, you now know that no operation > is still in progress that may produce a result containing a FOO, and therefore, > next time you send me a reply, you tell me "Invalidate 20" (assuming that > reply does not contain a FOO itself). Now I know I can safely reuse slot 20 > because you have promised me that you won't send me more replies using slot 20. Yes, this is the "legal" way to do it (according to my original proposal): the "receiving" side (you) can only send a hint "WantToInvalidate", and the "sending" side (me) is the ultimate authority (the only one allowed to send "Invalidate"). However, it is unimportant, in my opinion, whether I have received a "WantToInvalidate" or not prior to sending the "Invalidate": either way, the situation is the same, control is on the sending side, which is the one with the right information to make that decision. > > I still don't see it. See above. > > I hope the above makes it clearer? I hope this clarification of the protocol's intent solves the issue. > > It seems to me that it is just you, Jon, and me who have opinions on this. > > Let's get others. But if there are no more opinions, I'm ready to give up: > > it's not worth adding anything without enough interest/support. > > Well, I was hoping that Bob "Old Eagle Eye" Kukura would cast an eye on this. > Bob has saved me from making a fool of myself in the past, so I was hoping > he would do it this time 'round too! ;-) > > I agree though. If no-one else says Yay or Nay, we might as well drop it. Exactly. 
Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com Return-Path: Date: Thu, 16 Jul 1998 19:07:22 -0700 From: "Jon Goldberg" To: Javier Lopez-Martin CC: michi@dstc.edu.au, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Hi Folks- I've finally caught up on this thread, and want to say that I think the goals of the proposal and most of the spec work you guys have done is very laudable. Unfortunately, I'm *really* hesitant about us pursuing the inclusion of these changes in CORBA 2.3. One of the major points of these short-deadline RTFs was to take what we already had adopted since CORBA 2.2, integrate the various RTFs and specs and clean up any holes or rough spots that had been exposed. I feel rather strongly that this is the wrong time to throw a major new interop spec into the works. I have no problem with continuing to refine it and getting implementation experience with the new caching protocol, but see its inclusion in CORBA 2.3 as too destabilizing. We are so close to the deadline, I just don't think the spec has enough time to get "tested" right now. So I guess I propose that we save this CDR revision and typecode caching protocol to be part of the next RTF and probably fit under the heading of "GIOP Optimizations". take care, Jon Return-Path: Date: Thu, 16 Jul 1998 23:33:25 -0400 From: Bob Kukura Organization: IONA Technologies To: Michi Henning CC: Javier Lopez-Martin , interop@omg.org, vinoski@iona.com, mmihic@iona.com Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: [...] > > > It seems to me that it is just you, Jon and me that have opinions on this. > > Let's get others. But if there are no more opinions, I'm ready to give up: > > it's not worth adding anything without enough interest/support. > > Well, I was hoping that Bob "Old Eagle Eye" Kukura would cast an eye on this. 
> Bob has saved me from making a fool of myself in the past, so I was hoping > he would do it this time 'round too! ;-) I think my "eagle eye" is going to need glasses after this. I've spent most of the day sorting out and digesting this discussion, and still haven't come to a conclusion one way or the other. Basically, I see value in making Anys more efficient, agree that this proposal achieves this goal admirably in certain kinds of applications, but share all the concerns that Bill Beckwith expressed. Here are some of my thoughts:
* I'm very concerned about the extra effort required by ORB vendors to build and test a GIOP 1.2 that included CDR 1.1 over one that did not. I think it's important that vendors quickly converge on interoperable implementations of GIOP 1.2, because OBV, messaging, firewalls, etc. all depend on it. Nothing we add in 1.2 can make the ORBs any simpler, since 1.0 and 1.1 must continue to be supported. Making TypeCode caching an optional conformance point helps, but might really mean that no one ends up implementing it.
* GIOP 1.2 is a minor revision of GIOP 1.1, and therefore will be expected to be more stable than 1.1, with a better chance of any two implementations successfully interoperating. The additional complexity of TypeCode caching and multiple CDR versions seriously jeopardizes this, in my opinion. OBV does too, but at least it shouldn't break things when values aren't being used.
* One question I kept asking myself is: if we were given a clean slate to design GIOP 2.0 and CDR 2.0, is this the way we would do it? I'd hope we wouldn't have to specify completely different mechanisms for TypeCode indirections, OBV indirections, and TypeCode caching. Also, I would probably want to look at optimization techniques that would apply to object references as well as TypeCodes, and that would handle Anys in Anys.
* If we were going to expand the use of encapsulations (i.e. 
for Anys), I'd really like to see something like Bill Janssen's chunked encapsulation proposal included so that we don't lose the ability to stream large Anys as a series of fragments. * The approach of having the receiver indicate to the sender which TypeCodes it has decided to cache avoids a lot of possible problems related to interleaved fragments from different messages being transmitted in a different order than that in which they are generated (at least until cache invalidation is added). But I'm having trouble imagining a basis on which a receiver would decide whether to cache a particular TypeCode. The ORB would really have no idea whether or not a particular TypeCode was likely to be retransmitted on a connection, and would end up caching everything (or nothing). But this caching not only costs memory and CPU cycles in both the sender and receiver, but also costs bandwidth for the various ServiceContexts. This attempt to optimize bandwidth will actually increase use of bandwidth for applications that don't resend the same TypeCodes. * I've got some concerns about what happens when a request is canceled by a client. The server may or may not have already processed prior fragments containing ServiceContext information, so the client won't know whether the server has seen the cache related stuff in the ServiceContextList. * I would really like to see Anys made efficient enough that they could be used wherever self describing data is needed, such as in OBV. This proposal might be sufficient. * Finally, I wonder if there are simpler enhancements we could make that would improve the situation somewhat: One idea, that might help OBV if its RTF were seriously considering Anys, would be to allow the existing TypeCode indirection mechanism to refer outside the top level TypeCode to previous TypeCode instances in the same message. 
We couldn't do this on a connection scope because the ORB doesn't necessarily know the order fragments are going to be sent in as it is marshaling into them, so this would only help when the same TypeCode appeared several times in the same message (i.e. OBV). Another idea would be to further optimize the existing CDR TypeCode representation. The names are optional in CDR 1.0, and this proposal attempted to optimize the empty names, but why not use an encoding in Anys that doesn't even allow for the names to be transmitted, and never represents aliases? This would save additional bytes. All that would be needed would be the RepositoryIds and structural information. Encapsulations, or at least nested encapsulations, could also be avoided. The regular CDR TypeCode would still be used when TypeCodes are explicitly sent (i.e. not as part of Anys (or values)), so names and aliases would be preserved. I think this approach could reduce the Any transmission overhead significantly, without the complexity of caching. If anyone is interested, I could work up a more specific proposal. -Bob > > I agree though. If no-one else says Yay or Nay, we might as well > drop it. > > Cheers, > > > Michi. > -- > Michi Henning +61 7 33654310 > DSTC Pty Ltd +61 7 33654311 (fax) > University of Qld 4072 michi@dstc.edu.au > AUSTRALIA > http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Sender: jon@floorboard.com Date: Thu, 16 Jul 1998 20:41:07 -0700 From: Jonathan Biggar To: Jon Goldberg CC: Javier Lopez-Martin , michi@dstc.edu.au, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: <35AEB1DA.7D336FD@inprise.com> Jon Goldberg wrote: > > Hi Folks- > > I've finally caught up on this thread, and want to > say that I think the goals of the proposal and most > of the spec work you guys have done is very laudable. > Unfortunately, I'm *really* hesitant about us > pursuing the inclusion of these changes in CORBA 2.3. 
> > One of the major points of these short-deadline RTFs > was to take what we already had adopted since CORBA 2.2, > integrate the various RTFs and specs and clean up any > holes or rough spots that had been exposed. I feel > rather strongly that this is the wrong time to throw > a major new interop spec into the works. I have no > problem with continuing to refine it and getting > implementation experience with the new caching > protocol, but see its inclusion in CORBA 2.3 as too > destabilizing. We are so close to the deadline, I just > don't think the spec has enough time to get "tested" > right now. > > So I guess I propose that we save this CDR revision and > typecode caching protocol to be part of the next RTF and > probably fit under the heading of "GIOP Optimizations". You are right that this is a pretty large change, and given the amount of "design" discussion that has gone on in the last couple of days, I can certainly see why there should be concern about rushing this into the spec. One point in the proposals merit is that it is an entirely transparent and optional component of the protocol, so no one has to implement it in order to be conformant. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Sender: jon@floorboard.com Date: Thu, 16 Jul 1998 20:42:18 -0700 From: Jonathan Biggar To: Jon Goldberg CC: Javier Lopez-Martin , michi@dstc.edu.au, interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: <35AEB1DA.7D336FD@inprise.com> Jon Goldberg wrote: > I've finally caught up on this thread, and want to > say that I think the goals of the proposal and most > of the spec work you guys have done is very laudable. > Unfortunately, I'm *really* hesitant about us > pursuing the inclusion of these changes in CORBA 2.3. 
> > One of the major points of these short-deadline RTFs > was to take what we already had adopted since CORBA 2.2, > integrate the various RTFs and specs and clean up any > holes or rough spots that had been exposed. I feel > rather strongly that this is the wrong time to throw > a major new interop spec into the works. I have no > problem with continuing to refine it and getting > implementation experience with the new caching > protocol, but see its inclusion in CORBA 2.3 as too > destabilizing. We are so close to the deadline, I just > don't think the spec has enough time to get "tested" > right now. > > So I guess I propose that we save this CDR revision and > typecode caching protocol to be part of the next RTF and > probably fit under the heading of "GIOP Optimizations". One more comment and a question. The proposal is really two parts: the new CDR 1.1 marshaling rules, and the new TypeCode caching scheme. Does your concern apply to both or just the latter? -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Date: Thu, 16 Jul 1998 21:05:59 PDT Sender: Bill Janssen From: Bill Janssen To: Michi Henning , Bob Kukura Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 CC: Javier Lopez-Martin , interop@omg.org, vinoski@iona.com, mmihic@iona.com References: <35AEC605.DC01462F@iona.com> Excerpts from local.omg: 16-Jul-98 Re: OMG:Re: Vote 2, Issues .. Bob Kukura@iona.com (5720*) > * If we were going to expand the use of encapsulations (i.e. for Anys), > I'd really like to see something like Bill Janssen's chunked > encapsulation proposal included so that we don't lose the ability to > stream large Anys as a series of fragments. Don't worry, I'll put it in the HTTP-ng wire protocol :-). 
Bill Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Fri, 17 Jul 1998 14:44:23 +1000 (EST) From: Michi Henning To: Bob Kukura cc: Javier Lopez-Martin , interop@omg.org, vinoski@iona.com, mmihic@iona.com Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Thu, 16 Jul 1998, Bob Kukura wrote: > I think my "eagle eye" is going to need glasses after this. Bob, ruining your eye-sight to check the sanity of this proposal... talk about dedication... :-) > * I'm very concerned about the extra effort required by ORB vendors to > build and test a GIOP 1.2 that included CDR 1.1 over one that did not. > I think its important that vendors quickly converge on interoperable > implementations of GIOP 1.2, because OBV, messaging, firewalls, etc. all > depend on it. Nothing we add in 1.2 can make the ORBs any simpler, > since 1.0 and 1.1 must continue to be supported. Yes. Considering that a number of ORBs still do not have completely correct and reliable IIOP implementations, that is a very real concern. Personally, I prefer something that works slowly to something that doesn't work fast... ;-) > Making TypeCode > caching an optional conformance point helps, but might really mean > that > noone ends up implementing it. Yes. Javier expressed the same concern. My hope was that the performance edge it can provide would provide the motivation, but that may well be a naive assumption. > * GIOP 1.2 is a minor revision of GIOP 1.1, and therefore will be > expected to be more stable than 1.1, with a better chance of any two > implementations successfully interoperating. The additional > complexity > of TypeCode caching and multiple CDR versions seriously jeopardizes > this, in my opinion. OBV does too, but at least it shouldn't break > things when values aren't being used. I agree. The caching proposal isn't totally trivial to implement, even though we tried to keep it simple. 
> * One question I kept asking myself is: if we were given a clean slate > to > design GIOP 2.0 and CDR 2.0, is this the way we would do it? My answer is a most emphatic "no". I've been feeling vaguely guilty about abusing service contexts to effectively layer another protocol "underneath" GIOP. The problem really is that all of GIOP is one big mangled-together ball, without clear delineation of concerns. A more layered protocol architecture would keep things like address resolution, marshaling, caching, connection management, and fragmentation separate from each other. I don't think we need seven layers to achieve this ;-) but two or three definitely wouldn't do any harm... > * The approach of having the receiver indicate to the sender which > TypeCodes it has decided to cache avoids a lot of possible problems > related to interleaved fragments from different messages being > transmitted in a different order than that in which they are > generated > (at least until cache invalidation is added). But I'm having > trouble > imagining a basis on which a receiver would decide whether to cache > a > particular TypeCode. The ORB would really have no idea whether or > not a > particular TypeCode was likely to be retransmitted on a connection, > and > would end up caching everything (or nothing). My expectation was that the usual LRU or LFU approach would take care of this. The actual caching step is cheap, considering that in order to cache something, the receiving ORB has decoded it already, and so only needs to keep what it already has around in the cache. > But this caching not only > costs memory and CPU cycles in both the sender and receiver, but > also > costs bandwidth for the various ServiceContexts. This attempt to > optimize bandwidth will actually increase use of bandwidth for > applications that don't resend the same TypeCodes. Yes. If an application continuously sends anys with different type codes, bandwidth goes up. However, is that ever going to happen?
I think not. For one, type codes are the result of IDL definitions written by humans, so by their very nature, there are fewer type codes than there are value instances using those type codes (by many orders of magnitude). Second, the vast majority of applications will deal over and over with the same type codes. Of course, something like an event channel has to be able to deal with all possible type codes, but at any one period of time, it will deal with a small number of them, simply because the suppliers and consumers are working in their respective application domains, which again will only use a small number of type codes (relative to the number of values sent that use those type codes). > * I've got some concerns about what happens when a request is canceled > by a client. The server may or may not have already processed prior > fragments containing ServiceContext information, so the client won't > know whether the server has seen the cache related stuff in the > ServiceContextList. The client, when it invokes a request, can inform the server that the client has cached one of the server's type codes. If that information is lost, that does no harm because the server simply won't return cached type codes for the repository IDs that were lost. Similarly, if the client has previously received a WantToInvalidate from the server, the Invalidate confirmation from the client may get lost. Again, that won't do any harm because the server will simply hang on to what it still has in the cache. Alternatively, the request may contain a WantToInvalidate from the client to the server, but again, I see no problem with cancellation here. > * I would really like to see Anys made efficient enough that they could > be used wherever self describing data is needed, such as in OBV. This > proposal might be sufficient. I agree. Javier pointed out that for anys nested inside other anys, we don't have a solution.
Unless someone comes up with something really creative, I'm putting this into the "too hard" basket (because I do not want to lose encapsulation of the value part of anys). > * Finally, I wonder if there are simpler enhancements we could make that > would improve the situation somewhat: > > One idea, that might help OBV if its RTF were seriously considering > Anys, would be to allow the existing TypeCode indirection mechanism to > refer outside the top level TypeCode to previous TypeCode instances in > the same message. We couldn't do this on a connection scope because the > ORB doesn't necessarily know the order fragments are going to be sent in > as it is marshaling into them, so this would only help when the same > TypeCode appeared several times in the same message (i.e. OBV). This is something we could do. However, I find it hard to believe that this optimization would provide significant gain. How often does this actually happen? My guess is very rarely. I see the current proposal as solving the major problem, namely that of repeated transmission of type codes to things like event channels, where the typecodes can easily account for 90% of the total bandwidth. Of course, that doesn't mean we can't implement the optimization you suggest, separately or in addition to type code caching. > Another idea would be to further optimize the existing CDR TypeCode > representation. The names are optional in CDR 1.0, and this > proposal > attempted to optimize the empty names, but why not use an encoding > in > Anys that doesn't even allow for the names to be transmitted, and > never > represents aliases? This would save additional bytes. All that > would > be needed would be the RepositoryIds and structural information. > Encapsulations, or at least nested encapsulations, could also be > avoided. The regular CDR TypeCode would still be used when > TypeCodes > are explicitly sent (i.e. not as part of Anys (or values)), so names > and > aliases would be preserved. 
I think this approach could reduce the > Any > transmission overhead significantly, without the complexity of > caching. > If anyone is interested, I could work up a more specific proposal. I like this idea. Much of the overhead in type codes actually stems from the repeated marshaling of the name strings. Even if the strings are empty, they still end up consuming 8 bytes each (at least most of the time) due to alignment restrictions. Our proposal attempted to remedy that by cutting the cost down to 4 bytes per empty string. I looked at this last year and found that the empty strings can account for around 35% of the total size of the type code for a typical structure. Not sending any strings at all sounds good to me! A "minimal" type code would be a really nice thing to have. Throw away as many of the parameters as possible (all the string-valued ones except for the repository ID), and the size of the type codes should come down quite a bit. In addition, if the outermost repository ID is present for a complex type, there really is little point in having the repository IDs for all the nested types inside it. So, if we could get down to a single repository ID plus purely structural information, we might have a winner. Add to that a re-use indirection scheme, to avoid repeated transmission of the same type code, and things would get smaller still. For example:

struct x {
    long l;
    foo bar;
    string s;
    // blah...
};

struct y {
    x x1;
    x x2;
    x x3;
};

In this case, the type code for x really needs to be sent only once if we are marshaling the type code for y. Make the first occurrence of the x type code a minimal one and then for members x2 and x3, point at the type code that preceded them on the wire. Cheers, Michi.
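[Editor's sketch] Michi's re-use idea above can be illustrated with a toy stream writer: the first time a type code is marshaled it goes out in full, and later occurrences emit the existing 0xFFFFFFFF indirection marker plus a backward offset. This is a rough sketch, not the GIOP byte layout; the TCWriter class, the placeholder bytes, and the exact offset convention are invented for illustration.

```python
import struct

INDIRECTION = 0xFFFFFFFF  # existing CDR marker for a TypeCode indirection

class TCWriter:
    """Toy TypeCode stream writer with re-use indirection (illustrative only)."""
    def __init__(self):
        self.buf = bytearray()
        self.seen = {}  # repository id -> offset of the first full copy

    def write_tc(self, repo_id, encoded_tc):
        if repo_id in self.seen:
            # Later occurrences: marker plus a negative offset pointing
            # back at the earlier copy (offset convention simplified here).
            here = len(self.buf) + 4
            offset = self.seen[repo_id] - here
            self.buf += struct.pack(">Ii", INDIRECTION, offset)
        else:
            # First occurrence: marshal the TypeCode in full and remember it.
            self.seen[repo_id] = len(self.buf)
            self.buf += encoded_tc

# Marshaling the TypeCode for struct y, whose members x1, x2, x3 all use x:
tc_x = b"...full (minimal) TypeCode for struct x..."  # placeholder bytes
w = TCWriter()
for _member in ("x1", "x2", "x3"):
    w.write_tc("IDL:x:1.0", tc_x)
print(len(w.buf))  # one full copy plus two 8-byte indirections
```

The payoff is exactly what the example describes: three members of type x cost one full type code plus two fixed-size pointers, however large x's type code is.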
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Fri, 17 Jul 1998 14:46:53 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: Jon Goldberg , Javier Lopez-Martin , interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Thu, 16 Jul 1998, Jonathan Biggar wrote: > Jon Goldberg wrote: > > > So I guess I propose that we save this CDR revision and > > typecode caching protocol to be part of the next RTF and > > probably fit under the heading of "GIOP Optimizations". > > You are right that this is a pretty large change, and given the amount > of "design" discussion that has gone on in the last couple of days, I > can certainly see why there should be concern about rushing this into > the spec. I share these sentiments. I think 2.3 is too soon. > One point in the proposals merit is that it is an entirely transparent > and optional component of the protocol, so no one has to implement it in > order to be conformant. Yes. But just because it's optional, that doesn't mean it can also be wrong ;-) Whatever we do with this, I want it to work, not to be found out later to contain hidden problems... Also, Bob Kukura's comment really struck a note with me. Would we do it the same way if we could design all this from scratch? Most probably not... Cheers, Michi. 
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: Sender: jon@floorboard.com Date: Thu, 16 Jul 1998 22:51:37 -0700 From: Jonathan Biggar To: Michi Henning CC: Bob Kukura , Javier Lopez-Martin , interop@omg.org, vinoski@iona.com, mmihic@iona.com Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: Michi Henning wrote: > > On Thu, 16 Jul 1998, Bob Kukura wrote: > > Another idea would be to further optimize the existing CDR > TypeCode > > representation. The names are optional in CDR 1.0, and this > proposal > > attempted to optimize the empty names, but why not use an encoding > in > > Anys that doesn't even allow for the names to be transmitted, and > never > > represents aliases? This would save additional bytes. All that > would > > be needed would be the RepositoryIds and structural information. > > Encapsulations, or at least nested encapsulations, could also be > > avoided. The regular CDR TypeCode would still be used when > TypeCodes > > are explicitly sent (i.e. not as part of Anys (or values)), so > names and > > aliases would be preserved. I think this approach could reduce > the Any > > transmission overhead significantly, without the complexity of > caching. > > If anyone is interested, I could work up a more specific proposal. > > I like this idea. Much of the overhead in type codes actually stems > from > the repeated marshaling of the name strings. Even if the strings are > empty, > they still end up consuming 8 bytes each (at least most of the time) > due > to alignment restrictions. Our proposal attempted to remedy that by > cutting the cost down to 4 bytes per empty string. I looked at this > last year and found that the empty strings can account for around > 35% > of the total size of the type code for a typical structure. 
I agree, this idea has a lot of merit, and should be simple enough to get into CORBA 2.3! > Not sending any strings at all sounds good to me! A "minimal" type code > would be a really nice thing to have. Throw away as many of the parameters as > possible (all the string-valued ones except for the repository ID), and > the size of the type codes should come down quite a bit. In addition, > if the outermost repository ID is present for a complex type, there really > is little point in having the repository IDs for all the nested types > inside it. So, if we could get down to a single repository ID plus purely > structural information, we might have a winner. So, to write the new TypeCode CDR marshaling rules in short form:

1. Use 0xFFFFFFFE to indicate a compressed TypeCode.

2. Compressed TypeCodes are marshaled just like normal TypeCodes except that the name and member_name parameters are skipped entirely.

3. The outermost TypeCode containing an id parameter in a compressed TypeCode has the id marshalled normally. Nested TypeCodes marshal the id parameter as a long with the value 0. (This avoids the extra overhead of the null termination octet and padding for these strings.)

4. TypeCode indirection can be used inside compressed TypeCodes as usual.

Does this do it? > Add to that a re-use indirection scheme, to avoid repeated transmission of > the same type code, and things would get smaller still. For example: > > struct x { > long l; > foo bar; > string s; > // blah... > }; > > struct y { > x x1; > x x2; > x x3; > }; > > In this case, the type code for x really needs to be sent only once if > we are marshaling the type code for y. Make the first occurrence of the > x type code a minimal one and then for members x2 and x3, point at the > type code that preceded them on the wire. Doesn't the existing TypeCode indirection mechanism handle struct y already? It isn't only for recursive sequences.
-- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: Sender: jon@floorboard.com Date: Thu, 16 Jul 1998 22:57:02 -0700 From: Jonathan Biggar To: Michi Henning , Bob Kukura , Javier Lopez-Martin , interop@omg.org, vinoski@iona.com, mmihic@iona.com Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 References: <35AEE669.8C3CEABE@floorboard.com> Jonathan Biggar wrote: > So, to write the new TypeCode CDR marshaling rules in short form: > > 1. Use 0xFFFFFFFE to indicate a compressed TypeCode. > > 2. Compressed TypeCodes are marshaled just like normal TypeCodes > except > that the the name and member_name parameters are skipped entirely. > > 3. The outermost TypeCode containing an id parameter in a > compressed > TypeCode has the id marshalled normally. Nested TypeCodes marshal > the > id parameter as a long with the value 0. (This avoids the extra > overhead of the null termination octet and padding for these > strings.) > > 4. TypeCode indirection can be used inside compressed TypeCodes as > usual. > > Does this do it? Also add: 5. Compressed TypeCodes can also skip any top level or nested alias TypeCodes in order to avoid the overhead of the extra level of encapsulation. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Fri, 17 Jul 1998 16:15:47 +1000 (EST) From: Michi Henning To: Jonathan Biggar cc: Bob Kukura , Javier Lopez-Martin , interop@omg.org, vinoski@iona.com, mmihic@iona.com Subject: Re: OMG:Re: Vote 2, Issues 817, 1138, 1531, and 1581 On Thu, 16 Jul 1998, Jonathan Biggar wrote: > So, to write the new TypeCode CDR marshaling rules in short form: > > 1. Use 0xFFFFFFFE to indicate a compressed TypeCode. > > 2. Compressed TypeCodes are marshaled just like normal TypeCodes > except > that the the name and member_name parameters are skipped entirely. > > 3. 
The outermost TypeCode containing an id parameter in a > compressed > TypeCode has the id marshalled normally. Nested TypeCodes marshal > the > id parameter as a long with the value 0. (This avoids the extra > overhead of the null termination octet and padding for these > strings.) > > 4. TypeCode indirection can be used inside compressed TypeCodes as > usual. > > Does this do it? One concern. (I'm not sure whether this is really a problem or not.) The above rules mean that in an abbreviated type code, I *must* use a long with value 0 for the nested repository IDs. What if I have a marshaling engine that builds up type codes recursively? In that case, the routine that marshals the type code for the current type may not understand the surrounding context. The above rules would require the calling routine to pass extra info around as to whether the called routine needs to include a repository ID or not. I may be shooting at phantoms here, but I thought I'll raise it anyway. If this isn't a real concern, then the rules you show will work I think. If it *is* a concern, leave the nested repository IDs as strings that are allowed to be empty. That way, the nested repository IDs can be there, but need not be. And indeed, if we accept CDR 1.1 and drop the NUL terminator from strings, the problem goes away entirely, because then a long with value zero and an empty string will be marshaled identically :-) > Doesn't the existing TypeCode indirection mechanism handle struct y > already? > It isn't only for recursive sequences. Oops, time to go home I think. Jon "Spot the mistake" Biggar strikes again :-) Cheers, Michi. 
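[Editor's sketch] The size win behind the compressed-TypeCode rules discussed above can be made concrete. The sketch below compares a classic struct TypeCode (repository id, type name, and a name for every member) with a compressed form per rules 1 and 2 (0xFFFFFFFE marker, all name/member_name strings dropped). It is a simplification: member types are reduced to bare TCKind tags rather than full nested TypeCodes, the byte order is fixed to big-endian, and rules 3-4 (nested ids as 0, indirection) are not shown.

```python
import struct

# CORBA TCKind values for the kinds used below.
TK_LONG, TK_STRUCT, TK_STRING = 3, 15, 18
COMPRESSED = 0xFFFFFFFE  # rule 1: marks a compressed TypeCode

def ulong(n):
    return struct.pack(">I", n)  # fixed big-endian for readability

def cdr_string(s):
    data = s.encode() + b"\x00"  # CDR string: length (incl. NUL), then bytes
    return ulong(len(data)) + data

def full_struct_tc(repo_id, name, members):
    """Classic struct TypeCode: id, name, and a name for every member."""
    body = cdr_string(repo_id) + cdr_string(name) + ulong(len(members))
    for member_name, member_kind in members:
        body += cdr_string(member_name) + ulong(member_kind)
    return ulong(TK_STRUCT) + ulong(len(body)) + body

def compressed_struct_tc(repo_id, members):
    """Rule 2: name and member_name parameters are skipped entirely."""
    body = cdr_string(repo_id) + ulong(len(members))
    for _member_name, member_kind in members:
        body += ulong(member_kind)  # member names never hit the wire
    return ulong(COMPRESSED) + ulong(TK_STRUCT) + ulong(len(body)) + body

members = [("l", TK_LONG), ("s", TK_STRING)]
full = full_struct_tc("IDL:x:1.0", "x", members)
small = compressed_struct_tc("IDL:x:1.0", members)
print(len(full), len(small))
```

Even for this tiny two-member struct with one-character names, dropping the name strings saves a noticeable fraction of the encoding; for realistic structs with long human-readable names the savings grow accordingly, which is the 35% figure Michi measured.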
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html Return-Path: From: Javier Lopez-Martin Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 To: michi@dstc.edu.au Date: Fri, 17 Jul 1998 0:16:09 MDT Cc: kukura@iona.com, javier@cnd.hp.com, interop@omg.org, vinoski@iona.com, mmihic@iona.com Hi Michi, Bob, > > * I'm very concerned about the extra effort required by ORB vendors to > > build and test a GIOP 1.2 that included CDR 1.1 over one that did not. > > I think it's important that vendors quickly converge on interoperable > > implementations of GIOP 1.2, because OBV, messaging, firewalls, etc. all > > depend on it. Nothing we add in 1.2 can make the ORBs any simpler, > > since 1.0 and 1.1 must continue to be supported. > > Yes. Considering that a number of ORBs still do not have completely correct > and reliable IIOP implementations, that is a very real concern. Personally, > I prefer something that works slowly to something that doesn't work fast... ;-) I share the concerns expressed here, and by Bill on quality. > > Making TypeCode > > caching an optional conformance point helps, but might really mean > that > > no one ends up implementing it. > > Yes. Javier expressed the same concern. My hope was that the > performance > edge it can provide would provide the motivation, but that may well > be > a naive assumption. Because of the quality concerns (extra testing, potential interoperability misinterpretations, and the like), it is likely that no one implements this :-(. That is why I wanted to propose the "implicit" option: nothing to do from the ORB perspective. > > * GIOP 1.2 is a minor revision of GIOP 1.1, and therefore will be > > expected to be more stable than 1.1, with a better chance of any > two > > implementations successfully interoperating.
The additional > complexity > > of TypeCode caching and multiple CDR versions seriously > jeopardizes > > this, in my opinion. OBV does too, but at least it shouldn't > break > > things when values aren't being used. > > I agree. The caching proposal isn't totally trivial to implement, > even > though we tried to keep it simple. I agree as well. > > * One question I kept asking myself is: if we were given a clean slate > > to > > design GIOP 2.0 and CDR 2.0, is this the way we would do it? > > My answer is a most emphatic "no". I've been feeling vaguely guilty about > abusing service contexts to effectively layer anothe protocol "underneath" > GIOP. The problem really is that all of GIOP is one big mangled-together > ball, without clear delineation of concerns. A more layered protocol > architecture would keep things like address resolution, marshaling, caching, > connection management, and fragmentation separate from each other. I don't > think we need seven layers to achieve this ;-) but two or three definitely > wouldn't do any harm... > > > * The approach of having the receiver indicate to the sender which > > TypeCodes it has decided to cache avoids a lot of possible problems > > related to interleaved fragments from different messages being > > transmitted in a different order than that in which they are generated > > (at least until cache invalidation is added). But I'm having trouble > > imagining a basis on which a receiver would decide whether to cache a > > particular TypeCode. The ORB would really have no idea whether or not a > > particular TypeCode was likely to be retransmitted on a connection, and > > would end up caching everything (or nothing). > > My expectation was that the usual LRU or LFU approach would take care of > this. The actual caching step is cheap, considering that in order to cache > something, the receiving ORB has decoded it already, and so only needs > to keep what it already has around in the cache. 
This could be solved by using selective invalidation of cache entries, but this slightly complicates the exchanges. > > But this caching not only > > costs memory and CPU cycles in both the sender and receiver, but > also > > costs bandwidth for the various ServiceContexts. This attempt to > > optimize bandwidth will actually increase use of bandwidth for > > applications that don't resend the same TypeCodes. > > Yes. If an application continuously sends anys with different type > codes, > bandwidth goes up. However, is that ever going to happen? I think > not. > For one, type codes are the result of IDL definitions written by > humans, > so by their very nature, there are fewer type codes than there are > value instances using those type codes (by many orders of > magnitude). > Second, the vast majority of applications will deal over and over > with the > same type codes. Of course, something like an event channel has to > be able > to deal with all possible type codes, but at any one period of time, > it will > deal with a small number of them, simply because the suppliers and > consumers > are working in their respective application domains, which again > will > only use a small number of type codes (relative to the number of > values sent > that use those type codes). Even in the more pessimistic cases, it is quite unlikely that a typecode would be repeated fewer than two or three times per connection. Even with that average, caching would use less bandwidth, so I wouldn't consider this an issue. The cost in CPU and memory is real, though. > > * I've got some concerns about what happens when a request is canceled > > by a client. The server may or may not have already processed prior > > fragments containing ServiceContext information, so the client won't > > know whether the server has seen the cache related stuff in the > > ServiceContextList.
> > The client when it invokes a request, can inform that server that the client > has cached one of the server's type codes. If that information is lost, > that does no harm because the server simply won't return cached type codes > for the repository IDs that were lost. > > Similarly if the client has perviously received a WantToInvalidate from the > server, the Invalidate confirmation from the client may get lost. Again, > that won't do any harm because the server will simply hang on to what it > still has in the cache. > > Alternatively, the request may contain a WantToInvalidate from the client > to the server, but again, I see no problem with cancellation here. As Michi, I don't see too much of a problem here. > > * I would really like to see Anys made efficient enough that they could > > be used wherever self describing data is needed, such as in OBV. This > > proposal might be sufficient. > > I agree. Javier pointed out that for anys nested inside other anys, we > don't have a solution. Unless someone comes up with something really > creative, I'm putting this into the "too hard" basket (because I do > not want to lose encapsulation of the value part of anys). > > > * Finally, I wonder if there are simpler enhancements we could make that > > would improve the situation somewhat: > > > > One idea, that might help OBV if its RTF were seriously considering > > Anys, would be to allow the existing TypeCode indirection mechanism to > > refer outside the top level TypeCode to previous TypeCode instances in > > the same message. We couldn't do this on a connection scope because the > > ORB doesn't necessarily know the order fragments are going to be sent in > > as it is marshaling into them, so this would only help when the same > > TypeCode appeared several times in the same message (i.e. OBV). > > This is something we could do. However, I find it hard to believe that > this optimization would provide significant gain. How often does this > actually happen? 
My guess is very rarely. I see the current proposal > as solving the major problem, namely that of repeated transmission of > type codes to things like event channels, where the typecodes can easily > account for 90% of the total bandwidth. Of course, that doesn't mean > we can't implement the optimization you suggest, separately or in addition > to type code caching. This proposal might help OBV, but I don't see it helping in the general case (at least not significantly). As Michi points out, the more common case is repeated typecodes in different messages. > > Another idea would be to further optimize the existing CDR TypeCode > > representation. The names are optional in CDR 1.0, and this proposal > > attempted to optimize the empty names, but why not use an encoding in > > Anys that doesn't even allow for the names to be transmitted, and never > > represents aliases? This would save additional bytes. All that would > > be needed would be the RepositoryIds and structural information. > > Encapsulations, or at least nested encapsulations, could also be > > avoided. The regular CDR TypeCode would still be used when TypeCodes > > are explicitly sent (i.e. not as part of Anys (or values)), so names and > > aliases would be preserved. I think this approach could reduce the Any > > transmission overhead significantly, without the complexity of caching. > > If anyone is interested, I could work up a more specific proposal. > > I like this idea. Much of the overhead in type codes actually stems from > the repeated marshaling of the name strings. Even if the strings are empty, > they still end up consuming 8 bytes each (at least most of the time) due > to alignment restrictions. Our proposal attempted to remedy that by > cutting the cost down to 4 bytes per empty string. I looked at this > last year and found that the empty strings can account for around 35% > of the total size of the type code for a typical structure. 
> > Not sending any strings at all sounds good to me! A "minimal" type code would be a really nice thing to have. Throw away as many of the parameters as possible (all the string-valued ones except for the repository ID), and the size of the type codes should come down quite a bit. In addition, if the outermost repository ID is present for a complex type, there really is little point in having the repository IDs for all the nested types inside it. So, if we could get down to a single repository ID plus purely structural information, we might have a winner.

Well, this is the intention of what I was proposing with the "implicit" variant, but the difference is that the structural information is kept.

The main issue I see here is the following: if it is the ORB that decides to send a typecode with this encoding (no names, only top-level rep id, structural info, no aliases), then applications are not able to recover any of it, or to choose what is appropriate for their own use. For example, names are important for a notification service implementation: if names are never transmitted, a notification service implementation has two options: always replace the typecode received with a full one obtained from an IFR, or use positional filtering only; neither option is too nice. Besides, there would be no way to recover the possible aliases ...

However, in other situations, this optimization would be very positive, and would not be an issue (like in Name-Value pairs, or other cases where the application can infer the 'complete' type of an any by other means).

Don't get me wrong, I like this idea ... a lot! But I think it must be controlled somehow at the application level, or else it could be used in the wrong places. And using it blindly everywhere might create other problems.
[For example, if I recall correctly, the agreement on making the repository id mandatory was that *all* repository ids, at all levels, were mandatory; if the encoding removes them, then the receiving end must regain them so that the typecode behaves properly to the application in the general case ... ]

> Add to that a re-use indirection scheme, to avoid repeated transmission of the same type code, and things would get smaller still. For example:
>
> struct x {
>     long l;
>     foo bar;
>     string s;
>     // blah...
> };
>
> struct y {
>     x x1;
>     x x2;
>     x x3;
> };
>
> In this case, the type code for x needs to really be sent only once if we are marshaling the type code for y. Make the first occurrence of the x type code a minimal one and then for members x2 and x3, point at the type code that preceded them on the wire.

This is the standard indirection typecode encoding, right?

Javier
--
Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com

Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Fri, 17 Jul 1998 16:29:42 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: kukura@iona.com, interop@omg.org, vinoski@iona.com, mmihic@iona.com Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581

On Fri, 17 Jul 1998, Javier Lopez-Martin wrote:
> The main issue I see here is the following: if it's the ORB who decides to send a typecode with this encoding (no names, only top-level rep id, structural info, no aliases), then applications are not able to recover any of it, or have the option to choose what is appropriate for its own use. For example, names are important for a notification service implementation: if all names are never transmitted, a notification service implementation has two options: always replace the typecode received by a full one obtained from an IFR, or use positional filtering only; neither option is too nice. Besides, there would be no way to recover the possible aliases ...

Yes.
On the other hand, I think it wasn't exactly a hot idea to make the notification service query language depend on an optional feature of type codes. Positional notation was added only after people realized that the names are optional... So, I would really say that this is the notification service's problem, not the ORB's problem. For example, there are options other than the ones you present, Javier. I could make a tool that accepts IDL and an ordinary constraint expression and converts the constraint expression into its positional equivalent. That way, the service doesn't have to use an IFR at run time (which would be expensive), and users are still shielded from positional notation (in the sense that they don't have to write these rather ugly expressions themselves).

> Don't get me wrong, I like this idea ... a lot! But I think it must be controlled somehow at the application level, or else it could be used in the wrong places. And using it blindly everywhere might create other problems.
> [For example, if I recall correctly, the agreement on making the repository id mandatory was that *all* repository ids, at all levels, were mandatory; if the encoding removes them, then the receiving end must regain them so that the typecode behaves properly to the application in the general case ... ]

Good point.

However, the repository ID is mandatory only for object references and exceptions. In all other cases, it is optional. So, to modify Jon's set of rules:

    The repository ID is encoded as a string and is mandatory for the outermost enclosing type code. For nested type codes, the repository ID is optional, except for type codes of type tk_objref and tk_except, which always carry a repository ID.

Sound OK?

Cheers, Michi.
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path: From: Javier Lopez-Martin Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 To: michi@dstc.edu.au Date: Fri, 17 Jul 1998 0:46:46 MDT Cc: javier@cnd.hp.com, kukura@iona.com, interop@omg.org, vinoski@iona.com, mmihic@iona.com, jon@floorboad.com Expires: Fri, 10 Aug 1998

Michi,

> > The main issue I see here is the following: if it's the ORB who decides to send a typecode with this encoding (no names, only top-level rep id, structural info, no aliases), then applications are not able to recover any of it, or have the option to choose what is appropriate for its own use. For example, names are important for a notification service implementation: if all names are never transmitted, a notification service implementation has two options: always replace the typecode received by a full one obtained from an IFR, or use positional filtering only; neither option is too nice. Besides, there would be no way to recover the possible aliases ...
>
> Yes. On the other hand, I think it wasn't exactly a hot idea to make the notification service query language depend on an optional feature of type codes. Positional notation was added only after people realized that the names are optional...

Agreed.

> So, I would really say that this is the notification service's problem, not the ORB's problem. For example, there are options other than the ones you present, Javier. I could make a tool that accepts IDL and an ordinary constraint expression and converts the constraint expression into its positional equivalent. That way, the service doesn't have to use an IFR at run time (which would be expensive), and users are still shielded from positional notation (in the sense that they don't have to write these rather ugly expressions themselves).

Right again.
However, I was just pointing to an example of something that might allow problems to go unnoticed.

> > Don't get me wrong, I like this idea ... a lot! But I think it must be controlled somehow at the application level, or else it could be used in the wrong places. And using it blindly everywhere might create other problems.
> > [For example, if I recall correctly, the agreement on making the repository id mandatory was that *all* repository ids, at all levels, were mandatory; if the encoding removes them, then the receiving end must regain them so that the typecode behaves properly to the application in the general case ... ]
>
> Good point.
>
> However, the repository ID is mandatory only for object references and exceptions. In all other cases, it is optional. So, to modify Jon's set of rules:
>
>     The repository ID is encoded as a string and is mandatory for the outermost enclosing type code. For nested type codes, the repository ID is optional, except for type codes of type tk_objref and tk_except, which always carry a repository ID.
>
> Sound OK?

Yes, but it is changing an agreement taken in the last RTF report, where *all* repository ids (at all levels, and for all types) were made mandatory. I asked for them to be mandatory only at the top level, but was convinced (I think by Jon Biggar) that it would be better that all were mandatory, because if not, the operations to get inner typecodes from other typecodes would return a "different" kind of typecode. The same argument you use for different kinds of any encodings ...
Javier Lopez-Martin Hewlett-Packard Co javier@cnd.hp.com

Return-Path: Date: Fri, 17 Jul 1998 00:03:23 PDT Sender: Bill Janssen From: Bill Janssen To: michi@dstc.edu.au, Javier Lopez-Martin Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 CC: javier@cnd.hp.com, kukura@iona.com, interop@omg.org, vinoski@iona.com, mmihic@iona.com, jon@floorboad.com References: <199807170646.AA230378006@ovdm40.cnd.hp.com>

Excerpts from local.omg: 16-Jul-98 Re: Vote 2, Issues 817, 113.. Javier Lopez-Martin@cnd. (3238)

> Yes, but it is changing an agreement taken in the last RTF report, where *all* repository ids (at all levels, and for all types) were made mandatory.

An agreement which should be retained, IMHO.

Bill

Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Fri, 17 Jul 1998 21:00:51 +1000 (EST) From: Michi Henning To: Javier Lopez-Martin cc: kukura@iona.com, interop@omg.org, vinoski@iona.com, mmihic@iona.com, jon@floorboad.com Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581

On Fri, 17 Jul 1998, Javier Lopez-Martin wrote:
> > However, the repository ID is mandatory only for object references and exceptions. In all other cases, it is optional. So, to modify Jon's set of rules:
> >
> >     The repository ID is encoded as a string and is mandatory for the outermost enclosing type code. For nested type codes, the repository ID is optional, except for type codes of type tk_objref and tk_except, which always carry a repository ID.
> >
> > Sound OK?
>
> Yes, but it is changing an agreement taken in the last RTF report, where *all* repository ids (at all levels, and for all types) were made mandatory. I asked for them to be mandatory only at the top level, but was convinced (I think by Jon Biggar) that it would be better that all were mandatory, because if not, the operations to get inner typecodes from other typecodes would return a "different" kind of typecode.
> The same argument you use for different kind of any encodings ...

Javier, you have this absolutely annoying habit of being right... ;-) I plain forgot about that decision and just looked at the parameter table for type codes in the spec. Looks like that's the end of the idea of omitting repository IDs in nested type codes...

I still like Bob's idea though of having a minimal type code that leaves out as much as possible. And Jon's suggestion of cleverly marshaling names as either strings or longs is nice too. It squeezes another four bytes :-)

Cheers, Michi.
-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path: Date: Mon, 20 Jul 1998 14:28:15 -0400 From: Paul H Kyzivat Organization: NobleNet To: Michi Henning CC: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 References:

Michi Henning wrote: [snip]

> Type any:
>
> A second, alternative encoding for type any is added to CDR 1.1. The alternative encoding is indicated using a mechanism similar to the one for indirection for recursive type codes. A value of 0xfffffffe for TCKind indicates the alternative any encoding. The difference from the normal encoding of type any is that instead of having an any contain a type code followed by the data, in CDR 1.1, the any contains a type code followed by the *encapsulated* data.
>
> This gets rid of a range of problems relating to marshaling of type any. For example, an event channel cannot receive an any and send it to a consumer without completely unmarshaling and remarshaling every any value. The CDR 1.1 encoding for any values permits an event channel to both unmarshal and marshal an any as a blob, without having to decode and re-encode all of the data in the any.
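[The pass-through property described above can be sketched in a few lines. This is an illustrative sketch only, not spec text: it assumes the proposed layout (a TypeCode followed by the value as an encapsulation) and, for simplicity, a simple 4-byte TCKind; real TypeCodes would need full, though still value-independent, TypeCode parsing. The `relay_any` helper and the toy framing are invented for illustration.]

```python
import struct

def relay_any(stream, offset):
    # Copy one "encapsulated" any out of a CDR stream without
    # interpreting its value, the way an event channel would.
    # Assumes a simple 4-byte TCKind precedes the encapsulation.
    start = offset
    offset += 4                                  # skip the TCKind
    (length,) = struct.unpack_from("<I", stream, offset)
    offset += 4 + length                         # skip the encapsulated value
    return stream[start:offset], offset

# A toy any: tk_ulong (TCKind 5) followed by an encapsulation of
# length 8 (byte-order octet + 3 padding bytes + a 4-byte value).
payload = struct.pack("<IIBxxxI", 5, 8, 1, 42)
blob, end = relay_any(payload, 0)
assert blob == payload and end == len(payload)
```

Because the value travels behind a length, the channel forwards `blob` verbatim; only the TypeCode itself ever needs to be walked.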
[snip] > This means we have two encodings of type any side by side, the > old > CDR 1.0 encoding (which is still valid), and the new encoding. > How does the sender decide how to marshal an any? > > One of several possible strategies: > > - Use CDR 1.0 encoding for all anys with simple type codes > that are fixed length, and for all anys carrying a simple > fixed-length value or a single string. For such anys, > the CDR 1.1 encoding just adds bulk (an extra 8 bytes). > > - Use the CDR 1.1 encoding for all anys with complex type > codes that have several parameters. > > - Use the CDR 1.0 encoding if the sender wants to use > fragmentation (encapsulation of the value can get in > the way of fragmentation if the value is larger than > the largest supported fragment size). Allowing the sender to decide based on its convenience negates the values you state for introducing this change. I agree that using the old rules for some simple types makes sense. But setting up a competition between sender convenience and server convenience seems bad. Probably all senders will opt for the advantages of fragmentation, and then everybody will just have to implement something that nobody uses. I think it is better to decide on a new set of rules and then deprecate the old ones. Doing this may require finding a solution to fragmentation of encapsulated anys. Return-Path: Date: Mon, 20 Jul 1998 17:29:45 -0400 From: Bob Kukura Organization: IONA Technologies To: interop@omg.org CC: vinoski@iona.com, mchapman@iona.com, cryan@iona.com Subject: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes Vote 2 in the interop RTF consists of a proposal to define CDR 1.1 with changes to string, any, and TypeCode encodings. Consensus seems to have formed that the TypeCode caching mechanism included in this proposal is too complex for inclusion in CORBA 2.3. 
I'm not sure if others agree, but I feel that without the TypeCode caching mechanism, there is not enough value in this proposal to warrant requiring ORBs to support multiple CDR versions, and I therefore vote NO on the entire CDR 1.1 proposal as it stands.

I could live with the changes to string. I at least agree that an empty string should be representable by a length of zero and no additional NUL byte. It would be possible to adopt this change without dropping NUL-termination of non-empty strings, which would avoid the pointer-into-buffer issues that have been raised.

In my opinion, the alternative encapsulated Any representation is of value only to those ORBs that represent Anys internally in CDR format. Not all ORBs do this, and, for those that do, there will be complications with multiple CDR versions. Also, without "chunked" encapsulations, the benefits of fragmentation would be lost. Furthermore, I don't see how the sending context that marshals the Any could know which representation to use. If the receiving context is implemented to represent Anys in CDR and is just passing on the Any, then the encapsulated representation is a win. Otherwise it is a loss. The sending context has no way to know how the receiving context ORB is implemented, or what the application will do, so it has no basis to choose which representation to use. Finally, this alternative representation will not eliminate a single line of code from any ORB implementation, since CDR 1.0 Anys will still need to be supported.

Lastly, I have attached a simple, concrete alternative proposal to optimize the representation of TypeCodes (particularly when used in Anys) on the wire. It simply adds seven new TCKind encodings (as "simple" param lists) to the existing table, which leave out optional names and avoid using encapsulations. This proposal would replace the TypeCode caching mechanism in the vote 2 proposal.
I'd prefer to also leave out the Any encapsulation portion of vote 2, but that's a separable issue. One other TypeCode optimization that could be added would be to increase the indirection scope in CDR 1.1 to the CDR stream rather than the top level TypeCode.

-Bob

------------------------------------------------------------------------
Change 13.3.4, third bullet under "Basic TypeCode Encoding Framework" to:

Typecodes with simple parameter lists are encoded as the TCKind enum value followed by the parameter value(s), encoded as indicated in Table 13-2.

Add to table 13-2, with a notation that these encodings are only available in CDR 1.1 and up, and under conditions described below:

tk_objref    0xffff000e  simple  string(RepositoryId)
tk_struct    0xffff000f  simple  string(RepositoryId) ulong(count)
                                 {TypeCode(member type)}
tk_union     0xffff0010  simple  string(RepositoryId)
                                 TypeCode(discriminant type)
                                 long(default used) ulong(count)
                                 {discriminant type(label value)
                                  TypeCode(member type)}
tk_enum      0xffff0011  simple  string(RepositoryId) ulong(count)
tk_sequence  0xffff0013  simple  TypeCode(element type) ulong(max length)
tk_array     0xffff0014  simple  TypeCode(element type) ulong(length)
tk_except    0xffff0016  simple  string(RepositoryId) ulong(count)
                                 {TypeCode(member type)}

Add the following section after the section on Indirection and before the section on "Any".

Optimized TypeCode Encodings

CDR 1.1 provides alternative encodings for tk_objref, tk_struct, tk_union, tk_enum, tk_sequence, tk_array, and tk_except TypeCodes that optimize encoded size by leaving out optional name parameters and by avoiding the use of encapsulations. These alternative encodings can be used when marshaling a TypeCode in CDR 1.1 whenever either of the following conditions is true:

* The TypeCode instance being marshaled does not contain the optional parameters.
* The TypeCode is being marshaled to indicate the type of an Any, as described below.
The alternative encodings are identified in the CDR stream by the value 0xffff0000 being ORed with the TCKind value identifying the kind of the TypeCode. The content type of tk_alias TypeCodes may also be marshaled directly, without any representation of the tk_alias TypeCode, under the same conditions where the alternative TypeCode encodings are used. Return-Path: Sender: jon@floorboard.com Date: Mon, 20 Jul 1998 15:48:00 -0700 From: Jonathan Biggar To: Bob Kukura CC: interop@omg.org, vinoski@iona.com, mchapman@iona.com, cryan@iona.com Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes References: <35B3B6C9.4459EAAC@iona.com> Bob Kukura wrote: > > ------------------------------------------------------------------------ > Change 13.3.4, third bullet under "Basic TypeCode Encoding Framework" > to: > > Typecodes with simple parameter lists are encoded as the TCKind enum > value followed by the parameter value(s), encoded as indicated in > Table 13-2. > > Add to table 13-2, with a notation that these encodings are only > available in CDR 1.1 and up, and under conditions described below: > > tk_objref 0xffff000e simple string(RepositoryId) > tk_struct 0xffff000f simple string(RepositoryId) ulong(count) > {TypeCode(member type)} > tk_union 0xffff0010 simple string(RepositoryId) > TypeCode(discriminant type) > long(default used) ulong(count) > {discriminant type(label value) > TypeCode(member type)} > tk_enum 0xffff0011 simple string(RepositoryId) ulong(count) > tk_sequence 0xffff0013 simple TypeCode(element type) ulong(max length) > tk_array 0xffff0014 simple TypeCode(element type) ulong(length) > tk_except 0xffff0016 simple string(RepositoryId) ulong(count) > {TypeCode(member type)} > I am opposed to this proposal because it significantly complicates the process of unmarshalling TypeCodes. 
The existing CDR representation wraps the contents of complicated TypeCodes in an encapsulation so that the end of the TypeCode can easily be found without having to interpret the entire TypeCode. A second advantage to the encapsulation of the TypeCode contents is that it allows the marshalling engine to embed a TypeCode received from a source with a different byte order without having to remarshal the entire TypeCode. Although this proposal does save space by avoiding the encapsulations and the optional name and member_name fields, I don't believe that its benefits outweigh its costs.

-- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org

Return-Path: Date: Mon, 20 Jul 1998 22:44:37 -0400 From: Bob Kukura Organization: IONA Technologies To: Jonathan Biggar CC: interop@omg.org, vinoski@iona.com, mchapman@iona.com, cryan@iona.com Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes References: <35B3B6C9.4459EAAC@iona.com> <35B3C920.F5AA3179@floorboard.com>

Jonathan Biggar wrote:
> Bob Kukura wrote:
> > ------------------------------------------------------------------------
> > Change 13.3.4, third bullet under "Basic TypeCode Encoding Framework" to:
> >
> > Typecodes with simple parameter lists are encoded as the TCKind enum value followed by the parameter value(s), encoded as indicated in Table 13-2.
> > Add to table 13-2, with a notation that these encodings are only available in CDR 1.1 and up, and under conditions described below:
> >
> > tk_objref    0xffff000e  simple  string(RepositoryId)
> > tk_struct    0xffff000f  simple  string(RepositoryId) ulong(count)
> >                                  {TypeCode(member type)}
> > tk_union     0xffff0010  simple  string(RepositoryId)
> >                                  TypeCode(discriminant type)
> >                                  long(default used) ulong(count)
> >                                  {discriminant type(label value)
> >                                   TypeCode(member type)}
> > tk_enum      0xffff0011  simple  string(RepositoryId) ulong(count)
> > tk_sequence  0xffff0013  simple  TypeCode(element type) ulong(max length)
> > tk_array     0xffff0014  simple  TypeCode(element type) ulong(length)
> > tk_except    0xffff0016  simple  string(RepositoryId) ulong(count)
> >                                  {TypeCode(member type)}
>
> I am opposed to this proposal because it significantly complicates the process of unmarshalling TypeCodes. The existing CDR representation wraps the contents of complicated TypeCodes in an encapsulation so that the end of the TypeCode can easily be found without having to interpret the entire TypeCode. A second advantage to the encapsulation of the TypeCode contents is that it allows the marshalling engine to embed a TypeCode received from a source with a different byte order without having to remarshal the entire TypeCode.

Your arguments only apply to those ORBs that store TypeCodes in marshaled form. Many ORBs don't do this, and each use of encapsulations makes these ORBs more complicated. But message size is impacted for all ORBs wherever encapsulations are required. If compact messages are what we're after, we've got to give up the encapsulations. Each encapsulation typically wastes 8 bytes with absolutely no benefit to many ORBs.
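[The 8-byte figure is easy to reproduce. A hedged sketch; the `encapsulate` helper is invented for illustration, and it assumes the first item inside the encapsulation is 4-byte aligned, which is what forces the 3 padding bytes after the byte-order octet.]

```python
import struct

def encapsulate(payload, little_endian=True):
    # A CDR encapsulation: a ulong length, then the body. The body
    # starts with a byte-order octet; a 4-byte-aligned first item
    # then forces 3 bytes of padding. Overhead per encapsulation:
    # length (4) + byte-order octet (1) + padding (3) = 8 bytes.
    body = bytes([1 if little_endian else 0]) + b"\x00" * 3 + payload
    return struct.pack("<I", len(body)) + body

inner = struct.pack("<I", 5)   # e.g. a bare tk_ulong TCKind
wrapped = encapsulate(inner)
assert len(wrapped) - len(inner) == 8
```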
In my opinion, CDR encapsulations serve a useful purpose when ORBs need to be able to skip over marshaled data of an unknown format, such as in IOR profile and component bodies and service contexts, but are needless overhead otherwise.

> > Although this proposal does save space by avoiding the encapsulations and the optional name and member_name fields, I don't believe that its benefits outweigh its costs.

Do you have other costs in mind, or just the requirement that TypeCodes be interpreted while they are being demarshaled? I think the space saved would be significant in real applications. For the following IDL:

struct AStruct {
    unsigned long member1;
    sequence<string> member2;
};

the current representation, with names omitted, would be something like:

0x0000000f         4  tk_struct
0x00000050         4  length of tk_struct encapsulation
0x00               1  tk_struct encapsulation byte order
0x000000           3  padding
0x00000010         4  length of tk_struct RepositoryId
"IDL:AStruct:1.1" 16  RepositoryID string (including nul)
0x00000001         4  length of tk_struct name
""                 1  name (nul)
0x000000           3  padding
0x00000002         4  member count
0x00000001         4  length of member1 name
""                 1  name (nul)
0x000000           3  padding
0x00000005         4  tk_ulong
0x00000001         4  length of member2 name
""                 1  name (nul)
0x000000           3  padding
0x00000013         4  tk_sequence
0x00000010         4  length of tk_sequence encapsulation
0x00               1  tk_sequence encapsulation byte order
0x000000           3  padding
0x00000012         4  tk_string
0x00000000         4  max length of tk_string
0x00000000         4  max length of tk_sequence

for a total of 88 bytes. The proposed CDR 1.1 string encoding would reduce this to 76 bytes. My proposed optimized encoding would be:

0xffff000f         4  tk_struct (optimized)
0x00000010         4  length of tk_struct RepositoryId
"IDL:AStruct:1.1" 16  RepositoryID string (including nul)
0x00000002         4  member count
0x00000005         4  tk_ulong
0xffff0013         4  tk_sequence (optimized)
0x00000012         4  tk_string
0x00000000         4  max length of tk_string
0x00000000         4  max length of tk_sequence

totalling only 48 bytes, with no wasted space.
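[The byte counts above can be checked mechanically. A minimal sketch: the field lists mirror the two listings in this mail; the names `cdr_1_0` and `optimized` are ours, and the 76-byte vote 2 figure is derived by subtracting the 4 bytes the new string encoding saves per empty name.]

```python
# Size accounting for the TypeCode of:
#   struct AStruct { unsigned long member1; sequence<string> member2; };

cdr_1_0 = [
    4,        # tk_struct TCKind
    4,        # tk_struct encapsulation length
    1, 3,     # byte-order octet + padding
    4, 16,    # RepositoryId length + "IDL:AStruct:1.1" (incl. NUL)
    4, 1, 3,  # empty struct name: length, NUL, padding
    4,        # member count
    4, 1, 3,  # empty member1 name
    4,        # tk_ulong TCKind
    4, 1, 3,  # empty member2 name
    4,        # tk_sequence TCKind
    4,        # tk_sequence encapsulation length
    1, 3,     # byte-order octet + padding
    4,        # tk_string TCKind
    4,        # max length of tk_string
    4,        # max length of tk_sequence
]
assert sum(cdr_1_0) == 88

# Vote 2 string change: an empty string becomes a bare 4-byte zero
# length (no NUL, no padding), saving 4 bytes for each of 3 names.
assert sum(cdr_1_0) - 3 * 4 == 76

optimized = [
    4,      # tk_struct (optimized TCKind 0xffff000f)
    4, 16,  # RepositoryId length + string
    4,      # member count
    4,      # tk_ulong TCKind
    4,      # tk_sequence (optimized TCKind 0xffff0013)
    4,      # tk_string TCKind
    4,      # max length of tk_string
    4,      # max length of tk_sequence
]
assert sum(optimized) == 48
```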
So, to summarize:

useful information:       48 bytes
encapsulation overhead:   16 bytes in CDR 1.0 and vote 2 proposal
optional name overhead:   24 bytes in CDR 1.0, 12 bytes in vote 2 proposal

Use of encapsulations costs 16 bytes in this simple example, with absolutely no benefit to those ORBs that do not represent TypeCodes internally in CDR format. Layers and layers of nested CDR encapsulations, particularly for aliases, are not uncommon in real world TypeCodes, where the savings would be even more significant.

> -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org

-Bob

Return-Path: Sender: jon@floorboard.com Date: Mon, 20 Jul 1998 20:25:07 -0700 From: Jonathan Biggar To: Bob Kukura CC: interop@omg.org, vinoski@iona.com, mchapman@iona.com, cryan@iona.com Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes References: <35B3B6C9.4459EAAC@iona.com> <35B3C920.F5AA3179@floorboard.com> <35B40095.D908A512@iona.com>

Bob Kukura wrote:
> In my opinion, CDR encapsulations serve a useful purpose when ORBs need to be able to skip over marshaled data of an unknown format, such as in IOR profile and component bodies and service contexts, but are needless overhead otherwise.

Like TypeCodes?

> > Although this proposal does save space by avoiding the encapsulations and the optional name and member_name fields, I don't believe that its benefits outweigh its costs.
>
> Do you have other costs in mind, or just the requirement that TypeCodes be interpreted while they are being demarshaled?

Thinking about the proposal some more, I wouldn't object to it if it were wrapped in a single encapsulation at the top level. This would still allow a marshaling engine to find the end of the top level TypeCode quickly, but only add a minimal additional amount of overhead.
-- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org

Return-Path: Date: Tue, 21 Jul 1998 00:34:24 -0400 (EDT) From: Bill Beckwith X-Sender: beckwb@gamma To: Bob Kukura cc: Jonathan Biggar, interop@omg.org, vinoski@iona.com, mchapman@iona.com, cryan@iona.com Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes

Hi Bob,

I am pleased that at least you have implied an optimization objective: fewer bytes on the wire. Is that in fact your objective? I'd like to come to some form of consensus about what we are trying to achieve before engaging in an optimization debate. I see the following costs associated with type Anys:

                 relevant
capability       metrics
----------       --------
insertion into   CPU time, temp memory used
extraction from  CPU time, temp memory used
(static)         memory used (in the Any variable)
marshal          CPU time, stream data needed, temp memory used
demarshal        CPU time, stream data needed, temp memory used

If anyone sees any other potential side effects of a type Any please speak up. If we are targeting efficient transfer over 33.6Kb modems then by all means let's reduce the number of bytes on the wire (but remember that the modem will compress data). If we are targeting 1 Gb ethernet then CPU time is the obvious target. The current architecture of GIOP requires that the typecode be fully interpreted just to determine the length of the data. This is CPU expensive merely to save four bytes of size and an endian flag.

My original suggestion is that we either encapsulate the entire Any (typecode and data) or we encapsulate the Any's data. This doesn't complicate an ORB, regardless of whether the ORB stores type Any as a CDR or otherwise. It just offers an endian flag and a length in a place where a length cannot be easily determined. It is designed to simplify the ORB and reduce CPU overhead. It also greatly simplifies the effort of writing firewall proxies, etc.

WRT fragmentation.
Encapsulation does not interfere with fragmentation. It does interfere with fixed sized buffering mechanisms. -- Bill Return-Path: From: Mike_Spreitzer.PARC@xerox.com X-NS-Transport-ID: 0000AA008AD0DAD938AE Date: Mon, 20 Jul 1998 21:52:41 PDT Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes To: bill.beckwith@ois.COM cc: kukura@iona.COM, jon@floorboard.COM, interop@omg.org, vinoski@iona.COM, mchapman@iona.COM, cryan@iona.COM > Encapsulation does not interfere with fragmentation. It > does interfere with fixed sized buffering mechanisms. It doesn't even do that if you use a chunked encapsulation. Which you should, if you're going to do encapsulation at all. Unless I'm misunderstanding something here, fragmentation was introduced to facilitate fixed-size buffering mechanisms. If you introduce a non-chunked encapsulation, you prevent the benefit of fragmentation from being realized. Return-Path: X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs Date: Tue, 21 Jul 1998 09:40:36 +1000 (EST) From: Michi Henning To: Paul H Kyzivat cc: interop@omg.org Subject: Re: Vote 2, Issues 817, 1138, 1531, and 1581 Content-ID: On Mon, 20 Jul 1998, Paul H Kyzivat wrote: > Allowing the sender to decide based on its convenience negates the > values you state for introducing this change. I agree that using the > old > rules for some simple types makes sense. But setting up a > competition > between sender convenience and server convenience seems > bad. Probably > all senders will opt for the advantages of fragmentation, and then > everybody will just have to implement something that nobody uses. I agree, I think. The problem is that in absence of quality-of-service negotiation or some such, the sender can only use heuristics to make such decisions. We badly need some sort of session or communication context concept to get these things right. 
On the other hand, that is difficult because the interactions between objects are essentially stateless (each invocation is a stand-alone thing that should not require any additional context).

> I think it is better to decide on a new set of rules and then deprecate
> the old ones. Doing this may require finding a solution to fragmentation
> of encapsulated anys.

I suspect that we should address this with GIOP 2.0 -- the problem is that fragmentation is mangled into the high-level protocol right now, whereas I think it really should be a lower-level layer that is transparent to the "logical" GIOP message exchange protocol.

Cheers,

Michi.

-- Michi Henning +61 7 33654310 DSTC Pty Ltd +61 7 33654311 (fax) University of Qld 4072 michi@dstc.edu.au AUSTRALIA http://www.dstc.edu.au/BDU/staff/michi-henning.html

Return-Path:
Date: Tue, 21 Jul 1998 14:28:18 -0400
From: Bob Kukura
Organization: IONA Technologies
To: Jonathan Biggar
CC: interop@omg.org, vinoski@iona.com, mchapman@iona.com, cryan@iona.com
Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes
References: <35B3B6C9.4459EAAC@iona.com> <35B3C920.F5AA3179@floorboard.com> <35B40095.D908A512@iona.com> <35B40A13.608B5E05@floorboard.com>

Jonathan Biggar wrote:
> Bob Kukura wrote:
> > In my opinion, CDR encapsulations serve a useful purpose when ORBs need
> > to be able to skip over marshaled data of an unknown format, such as in
> > IOR profile and component bodies and service contexts, but are needless
> > overhead otherwise.
>
> Like TypeCodes?

TypeCodes are of a known format. Encapsulations (as currently used) don't help an ORB that doesn't understand some of the TCKinds in a marshaled TypeCode.

> > > Although this proposal does save space by avoiding the encapsulations
> > > and the optional name and member_name fields, I don't believe that its
> > > benefits outweigh its costs.
> > Do you have other costs in mind, or just the requirement that TypeCodes
> > be interpreted while they are being demarshaled?
>
> Thinking about the proposal some more, I wouldn't object to it if it
> were wrapped in a single encapsulation at the top level. This would
> still allow a marshaling engine to find the end of the top-level
> TypeCode quickly, but only add a minimal additional amount of
> overhead.

This top-level encapsulation is still only of any use to ORBs that want to delay interpreting the TypeCode. Should everyone continue to pay the price for this design choice? If consensus is that this top-level encapsulation is worth the lost bandwidth, I think it could be worked into my proposal, but it would complicate things a bit more.

-Bob

> --
> Jon Biggar
> Floorboard Software
> jon@floorboard.com
> jon@biggar.org

Return-Path:
Date: Tue, 21 Jul 1998 15:41:01 -0400
From: Bob Kukura
Organization: IONA Technologies
To: Mike_Spreitzer.PARC@xerox.com
CC: bill.beckwith@ois.COM, jon@floorboard.COM, interop@omg.org, vinoski@iona.com, mchapman@iona.com
Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes
References: <98Jul20.215344pdt."52740(4)"@alpha.xerox.com>

Mike_Spreitzer.PARC@xerox.com wrote:
> > Encapsulation does not interfere with fragmentation. It
> > does interfere with fixed-size buffering mechanisms.
>
> It doesn't even do that if you use a chunked encapsulation. Which you
> should, if you're going to do encapsulation at all. Unless I'm
> misunderstanding something here, fragmentation was introduced to
> facilitate fixed-size buffering mechanisms. If you introduce a
> non-chunked encapsulation, you prevent the benefit of fragmentation
> from being realized.

I agree with Mike here. I don't think we could support any proposal to encapsulate the value portion of Anys that did not provide chunking.
If we do (eventually) do this, my view is that chunking should be added to CDR as a "chunked CDR encapsulation" construct used by Any (not that I support this) rather than as a generic representation for all kinds of sequences. The reason is that IDL sequences have a known length at the time marshaling begins, so there is no reason to chunk them. CDR encapsulations that are being marshaled "on the fly" are streams whose length is not known until they have been marshaled. Furthermore, most ORBs represent a sequence as a single buffer, and chunking normal sequences would require resizing of that buffer as chunks were received.

-Bob

Return-Path:
Date: Tue, 21 Jul 1998 17:31:15 PDT
Sender: Bill Janssen
From: Bill Janssen
To: Mike_Spreitzer.PARC@xerox.com, Bob Kukura
Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes
CC: bill.beckwith@ois.com, jon@floorboard.com, interop@omg.org, vinoski@iona.com, mchapman@iona.com
References: <98Jul20.215344pdt."52740(4)"@alpha.xerox.com> <35B4EECD.63708D73@iona.com>

Excerpts from local.omg: 21-Jul-98 Re: new TypeCode optimizati.. Bob Kukura@iona.com (1309*)

> The reason is that IDL sequences have a known length at the
> time marshaling begins, so there is no reason to chunk them.

Bob, there's no a priori reason why sequence lengths are known at run-time. In some of our language mappings, we currently map IDL sequences to data types which require this, but that's not a characteristic of the abstract data type, nor is it necessarily a good thing. A classic example is using fixed-size buffering to transfer, as a sequence of octets, the results of a subprocess that's generating image data on the fly. You don't really know the length until it's all been generated, and you may not want to buffer megabytes of stuff in your server. This kind of thing occurs more often than we'd like to admit.

In general, I think it would be a good idea to use chunking for all string, sequence, valuetype, and encapsulation marshalling in GIOP. We can then let language mappings catch up to it.
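[Editor's note: the mechanism being debated in this thread -- Beckwith's suggestion of encapsulating the Any's value behind a length and an endian flag, so a receiver or firewall proxy can skip the data without interpreting its TypeCode -- can be sketched roughly as follows. This is an illustrative Python sketch, not the CDR wire format: real CDR inserts alignment padding, and the helper names here are invented.]

```python
import struct

def encapsulate(payload: bytes, little_endian: bool = True) -> bytes:
    """Wrap opaque data the way a CDR encapsulation does: a 4-byte
    length, then an endian-flag octet, then the payload itself.
    (Simplified: real CDR also inserts alignment padding, and the
    length prefix here is fixed little-endian for brevity.)"""
    body = bytes([1 if little_endian else 0]) + payload
    return struct.pack("<I", len(body)) + body

def skip_encapsulation(stream: bytes, offset: int) -> int:
    """Step past an encapsulated value without interpreting its
    TypeCode: the length prefix alone locates the end -- the CPU
    saving argued above for forwarding ORBs and firewall proxies."""
    (length,) = struct.unpack_from("<I", stream, offset)
    return offset + 4 + length
```

With the length available up front, a proxy forwarding a request never has to walk the TypeCode to find where the Any's data ends; the cost is the four bytes of size and the endian octet that Kukura objects to.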
Bill

Return-Path:
Date: Tue, 21 Jul 1998 22:27:32 -0400
From: Bob Kukura
Organization: IONA Technologies
To: Bill Janssen
CC: Mike_Spreitzer.PARC@xerox.com, bill.beckwith@ois.com, jon@floorboard.com, interop@omg.org, vinoski@iona.com, mchapman@iona.com
Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes
References: <98Jul20.215344pdt."52740(4)"@alpha.xerox.com> <35B4EECD.63708D73@iona.com>

Bill Janssen wrote:
> Excerpts from local.omg: 21-Jul-98 Re: new TypeCode optimizati.. Bob
> Kukura@iona.com (1309*)
>
> > The reason is that IDL sequences have a known length at the
> > time marshaling begins, so there is no reason to chunk them.
>
> Bob, there's no a priori reason why sequence lengths are known at
> run-time. In some of our language mappings, we currently map IDL
> sequences to data types which require this, but that's not a
> characteristic of the abstract data type, nor is it necessarily a good
> thing. A classic example is using fixed-size buffering to transfer, as
> a sequence of octets, the results of a subprocess that's generating
> image data on the fly. You don't really know the length until it's all
> been generated, and you may not want to buffer megabytes of stuff in
> your server. This kind of thing occurs more often than we'd like to
> admit.

CORBA 2.2, section 1.2.4, page 1-5, defines sequence in the CORBA object model as:

    A sequence type, which consists of a variable-length array of a
    single type; the length of the sequence is available at run-time.

I take this to mean that the length is a property of the abstract type that is available at run-time, before the ORB begins marshaling the sequence. But I agree 100% that the kind of thing you are talking about occurs often enough to be worth supporting efficiently in CORBA - CDR encapsulations are an example. But why would this not be best handled as a distinct IDL template type, similar to DCE pipes?
> In general, I think it would be a good idea to use chunking for all
> string, sequence, valuetype, and encapsulation marshalling in GIOP. We
> can then let language mappings catch up to it.

I still disagree. The receiving context can be implemented most efficiently for current language mappings if the lengths of strings and sequences are known before any of their elements are demarshaled. A distinct stream or pipe type would allow language mappings to handle each case optimally. A CDR encapsulation should be defined as a stream or pipe rather than as a sequence.

> Bill

Of course, this is completely academic for CORBA 2.3 at this point.

-Bob

Return-Path:
Date: Tue, 21 Jul 1998 20:30:29 PDT
Sender: Bill Janssen
From: Bill Janssen
To: Bill Janssen, Bob Kukura
Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes
CC: Mike_Spreitzer.PARC@xerox.com, bill.beckwith@ois.com, jon@floorboard.com, interop@omg.org, vinoski@iona.com, mchapman@iona.com
References: <98Jul20.215344pdt."52740(4)"@alpha.xerox.com> <35B4EECD.63708D73@iona.com> <35B54E14.85CDAB81@iona.com>

Excerpts from direct: 21-Jul-98 Re: new TypeCode optimizati.. Bob Kukura@iona.com (2227*)

> The receiving context can be implemented most
> efficiently for current language mappings if the lengths of strings and
> sequences are known before any of their elements are demarshaled.

Yes, that's true. The current scheme can provide certain advantages for the receivers; the other scheme provides certain advantages for the senders. I'm not sure why we should decide that the receivers deserve the advantage, though. Perhaps a new pipe type would be a good idea.
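[Editor's note: the pipe type suggested above would presumably marshal in self-delimiting chunks, so the sender never needs the total length in advance -- which addresses Janssen's streaming-subprocess example while leaving fixed-length sequences untouched. A rough sketch of such chunked framing, in Python with invented helper names (this is not a specified wire format):]

```python
import struct
from typing import Iterable, Iterator

def write_chunks(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Chunked marshaling: each piece is prefixed by its 4-byte length,
    and a zero-length chunk terminates the stream. The sender can emit
    data as it is generated, never knowing the total length up front."""
    for piece in pieces:
        if piece:  # empty pieces are skipped; zero length is the terminator
            yield struct.pack("<I", len(piece)) + piece
    yield struct.pack("<I", 0)

def read_chunks(stream: bytes) -> bytes:
    """Reassemble the payload on the receiving side. The total length is
    only known once the terminator arrives -- which is exactly Kukura's
    objection for ORBs that store a sequence in a single buffer."""
    out, offset = bytearray(), 0
    while True:
        (length,) = struct.unpack_from("<I", stream, offset)
        offset += 4
        if length == 0:
            return bytes(out)
        out += stream[offset:offset + length]
        offset += length
```

The trade-off debated above is visible in the sketch: the sender gains streaming, while the receiver loses the up-front length it would use to size its buffer once.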
Bill

Return-Path:
X-Sender: genillou@dimail.epfl.ch
References: <35B3B6C9.4459EAAC@iona.com> <35B3C920.F5AA3179@floorboard.com> <35B40095.D908A512@iona.com> <35B40A13.608B5E05@floorboard.com>
Date: Wed, 22 Jul 1998 12:50:32 +0200
To: Bob Kukura, Jonathan Biggar
From: Guy Genilloud
Subject: Re: new TypeCode optimization proposal, plus comments on other CDR 1.1 changes
Cc: interop@omg.org, vinoski@iona.com, mchapman@iona.com, cryan@iona.com

At 2:28 PM -0400 7/21/98, Bob Kukura wrote:
> This top level encapsulation is still only of any use to ORBs that want
> to delay interpreting the TypeCode. Should everyone continue to pay the
> price for this design choice? If consensus is that this top level
> encapsulation is worth the lost bandwidth, I think it could be worked
> into my proposal, but it would complicate things a bit more.

FYI... There is a practice in Telecom systems management, sometimes referred to as Best Efforts Management, that allows the recipient of an Any to discard this Any if it does not understand it. (In ASN.1, an OID for the ANY is passed separately from the ANY -- I don't remember what the TMN-CORBA submission actually does in IDL...). It is also clear that some Anys are forwarded without having been interpreted. The Notification service is an example. In both cases, we can speak of avoiding interpreting the TypeCode rather than of delaying this interpretation...

Best regards

Guy Genilloud

--------------------------------------------------------------------
Dr. Guy Genilloud
Institute for computer Communications and Applications (ICA)
Swiss Federal Institute of Technology (EPFL)    tel: +41 21 693 46 57
CH-1015 Lausanne, SWITZERLAND                   fax: +41 21 693 47 01
--------------------------------------------------------------------
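[Editor's note: Genilloud's "best efforts" pattern -- discard or forward an Any whose type identifier is not recognized, without ever touching its contents -- combines naturally with the compact, repository-ID-only typecode proposed in this issue. A hypothetical sketch; the repository IDs and helper names below are invented for illustration:]

```python
# Hypothetical registry of repository IDs this receiver understands.
KNOWN_TYPES = {"IDL:Acme/Temperature:1.0", "IDL:Acme/Alarm:1.0"}

def handle_any(repository_id: str, encapsulated_value: bytes):
    """Best-efforts handling: interpret the value only when the
    repository ID (sent in place of a full typecode) is recognized;
    otherwise discard -- or forward -- the still-encapsulated bytes
    without ever demarshaling them."""
    if repository_id in KNOWN_TYPES:
        return ("interpret", encapsulated_value)
    return ("discard", None)
```

In this scheme the TypeCode is never interpreted at all for unknown types, rather than its interpretation merely being delayed, which is the distinction Genilloud draws above.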