Issue 1642:  Java to IDL identifier mapping (java2idl-rtf)
Source:  (, )
Nature:  Revision
Severity: 
Summary: orbos/98-02-01 says:
 
 	We have provided name manglings to work around [the limitations],
 	but we recommend that OMG consider extending the definitions of
 	IDL identifiers so that a wider range of unicode characters can
 	be supported and so that case is significant in distinguishing
 	identifiers.
 
 I would suggest to remove this recommendation because it is pragmatically
 unimplementable for implementation languages that do not support
 extended character sets. If the OMG were to take this step, it would
 effectively alienate CORBA from all such implementation languages, which
 I believe would be detrimental to the success of CORBA.
 
 	Because overloaded methods are a popular feature in object-oriented
 	programming languages we recommend that OMG considers extending
 	IDL to allow overloaded methods.
 
 Again, I suggest to remove the recommendation because it is pragmatically
 unimplementable. Steve Vinoski recently posted an interesting article
 on this topic in comp.object.corba -- I have attached a copy below.
 

Resolution: 
Revised Text: 
Actions taken:
July 8, 1998: received issue
February 23, 1999: closed issue
Discussion: 

End of Annotations:=====
Return-Path: <michi@dstc.edu.au>
X-Authentication-Warning: tigger.dstc.edu.au: michi owned process doing -bs
Date: Tue, 7 Jul 1998 22:32:42 +1000 (EST)
From: Michi Henning <michi@dstc.edu.au>
To: java2idl-rtf@omg.org, issues@omg.org
Subject: Java to IDL identifier mapping

orbos/98-02-01 says:

	We have provided name manglings to work around [the limitations],
	but we recommend that OMG consider extending the definitions of
	IDL identifiers so that a wider range of unicode characters can
	be supported and so that case is significant in distinguishing
	identifiers.

I would suggest to remove this recommendation because it is pragmatically
unimplementable for implementation languages that do not support
extended character sets. If the OMG were to take this step, it would
effectively alienate CORBA from all such implementation languages, which
I believe would be detrimental to the success of CORBA.

	Because overloaded methods are a popular feature in object-oriented
	programming languages we recommend that OMG considers extending
	IDL to allow overloaded methods.

Again, I suggest to remove the recommendation because it is pragmatically
unimplementable. Steve Vinoski recently posted an interesting article
on this topic in comp.object.corba -- I have attached a copy below.

On the topic of name mangling:

	It is possible that the name mangling rules we define in Section 5.2
	may result in name collisions. [...] We believe that in practice
	the name mangling rules we have chosen will create names that would
	not have been used by normal Java programmers, and that the risk
	of namespace collisions by the mangled names is acceptably low.

I take it that if such a collision occurs due to the presence of an
abnormal Java programmer, the Java is not translatable to legal IDL?
If so, this should be explicitly stated.

The name mangling algorithm in general results in virtually
impossible-to-use names in the IDL. For example, the spec quotes a
three-character identifier that results in a seven-character mangled
identifier (a$b -> aU0024b). For a three-character identifier consisting
of characters that are not legal in IDL, this results in an 18-character
IDL identifier that looks something like

	U0024BU0024CU0024D

This is not to mention things like a 6-character Kanji or Hangul identifier,
which would result in a 36-character IDL identifier that is completely
incomprehensible.
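
The escaping rule behind these examples can be sketched in a few lines of
Java. This is a minimal illustration, not the algorithm text from
orbos/98-02-01: it assumes that the only legal IDL identifier characters
are ASCII letters, digits, and underscore, and that every other character
becomes "U" followed by four upper-case hex digits, which is what the a$b
example suggests. The class and method names are made up for the example.

```java
final class UnicodeMangler {
    // Assumption: IDL identifiers may contain only ASCII letters,
    // digits, and underscore.
    static boolean isLegalIdlChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // Copies legal characters and replaces every other character with
    // 'U' plus its four-digit upper-case hex code.
    static String mangle(String javaName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < javaName.length(); i++) {
            char c = javaName.charAt(i);
            if (isLegalIdlChar(c)) {
                out.append(c);
            } else {
                out.append('U').append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("a$b"));     // prints aU0024b
        System.out.println(mangle("$B$C$D"));  // prints U0024BU0024CU0024D
    }
}
```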

Similarly, the mangling for overloaded operations results in identifiers
such as

	hello__long__a_b_c__long
	hello__CORBA_sequences_sequence_of_long

These identifier names are the result of overloaded Java operations that
are perfectly reasonable and innocuous.
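
One way to arrive at names of this shape is sketched below in Java. It is
an illustration under stated assumptions, not the specification's
algorithm: it takes the IDL type names of the parameters as given,
replaces scope separators with underscores, and joins the pieces with
double underscores. OverloadMangler and its method are hypothetical names.

```java
import java.util.List;

final class OverloadMangler {
    // Builds a mangled operation name from the operation name and the
    // IDL type name of each parameter (already mapped from Java, for
    // example Java int -> IDL long). With no parameters this sketch
    // returns the bare name; the specification may define otherwise.
    static String mangle(String operation, List<String> idlParamTypes) {
        StringBuilder out = new StringBuilder(operation);
        for (String type : idlParamTypes) {
            // Scope separators and spaces cannot appear in an identifier.
            String flat = type.replace("::", "_")
                              .replace('.', '_')
                              .replace(' ', '_');
            out.append("__").append(flat);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // void hello(int x, a.b.c y, int z)
        System.out.println(mangle("hello", List.of("long", "a.b.c", "long")));
        // prints hello__long__a_b_c__long
    }
}
```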

For Java identifiers differing only in case, the name mangling creates
identifiers such as jack_, Jack_0, and jAcK_1_3.
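
The three names above are consistent with a rule that appends an
underscore followed by the zero-based positions of the upper-case
characters, separated by underscores. The Java sketch below implements
that reading; the rule is inferred from the examples rather than quoted
from the spec, and CaseMangler is a hypothetical name.

```java
final class CaseMangler {
    // Appends '_' and then the index of every upper-case character,
    // with the indices separated by '_'.
    static String mangle(String javaName) {
        StringBuilder out = new StringBuilder(javaName).append('_');
        boolean first = true;
        for (int i = 0; i < javaName.length(); i++) {
            if (Character.isUpperCase(javaName.charAt(i))) {
                if (!first) {
                    out.append('_');
                }
                out.append(i);
                first = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("jack"));  // prints jack_
        System.out.println(mangle("Jack"));  // prints Jack_0
        System.out.println(mangle("jAcK"));  // prints jAcK_1_3
    }
}
```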

I am doubtful that such a name mangling is practical. I think it is likely
that anyone will turn away in disgust from a Java interface that has been
translated this way if it contains even a handful of such identifiers.
What is the point of having a mapping if it creates identifiers that no-one
can use?

Why not take a different approach?

	1) Compile the Java. If the Java input contains any identifiers
	   that don't map cleanly or the Java contains overloaded operations,
	   the compiler produces a list of the problematic identifiers,
	   such as

		a$b
		jack
		Jack
		jAcK
		void hello();
		void hello(int x, a.b.c y, int z);
		void hello(int z[]);

	2) I now augment the list given to me by the compiler by adding
	   the identifiers I want to use to map away from the problems.
	   In other words, I edit the list to read:

		a$b					a_dollar_b
		jack					jack
		Jack					big_jack
		jAcK					mixed_up_jack
		void hello();				hello
		void hello(int x, a.b.c y, int z);	hello3
		void hello(int z[]);			hello_array

	3) I compile the Java spec a second time and feed the file I have
	   just created into the compiler to augment the compilation.
	   The compiler uses the identifiers I have specified to
	   map away from the problems.
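
The second pass of the steps above could read the edited list roughly as
in the following Java sketch. Everything in it is an assumption made for
illustration: the file format (the Java item, whitespace, then the chosen
IDL name at the end of the line), the RenameTable class, and the decision
to reject replacement names that differ only in case, since IDL would not
tell them apart.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class RenameTable {
    // Parses lines of the form "<java item><whitespace><idl name>".
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> table = new LinkedHashMap<>();
        Set<String> used = new HashSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+");
            String idlName = parts[parts.length - 1];
            // A line the user has not edited yet has no trailing identifier.
            if (parts.length < 2 || !idlName.matches("[A-Za-z][A-Za-z0-9_]*")) {
                throw new IllegalArgumentException("no usable IDL name for: " + line);
            }
            String javaItem =
                line.substring(0, line.length() - idlName.length()).trim();
            // Names that differ only in case would collide in IDL.
            if (!used.add(idlName.toLowerCase(Locale.ROOT))) {
                throw new IllegalArgumentException("IDL name used twice: " + idlName);
            }
            table.put(javaItem, idlName);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = parse(List.of(
            "a$b\t\t\t\t\ta_dollar_b",
            "Jack\t\t\t\t\tbig_jack",
            "void hello(int x, a.b.c y, int z);\thello3"));
        System.out.println(table.get("Jack"));  // prints big_jack
    }
}
```

A compiler using such a table would look up each problematic Java item
and emit the user's name instead of a mangled one, and could report any
item that is still missing from the table.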

With this approach, I get IDL that is readable, makes sense to a human,
is usable, and actually has some chance of not being rejected by everyone.
The implementation effort is no larger than mangling names. Nice, easy,
understandable, and effective. Why not use it?

							Cheers,

								Michi.
--
Michi Henning              +61 7 33654310
DSTC Pty Ltd               +61 7 33654311 (fax)
University of Qld 4072     michi@dstc.edu.au
AUSTRALIA
http://www.dstc.edu.au/BDU/staff/michi-henning.html

Steve's article:

Subject:      Re: Why no overloaded methods in IDL?
From:         Steve Vinoski <vinoski@iona.com>
Date:         1998/06/20
Message-ID:   <358B6423.F10D57DE@iona.com>
Newsgroups:   comp.object.corba 

I chopped out a bunch of stuff in a failed attempt to keep this short.
The topic is whether IDL should support the overloading of operation
names.

Greg Pasquariello wrote:

> The Java-to-IDL spec already forces this, in the opposite direction.  In
> addition, this is only an issue for mappings to languages that do not
> support IDL.

On a side note, my own personal opinion is that they can do whatever they
want with the Java-to-IDL mapping, including making it as ugly as they
like, as long as they don't try to change IDL just to make that mapping
easier. The recent damage to the OMG IDL grammar and the CORBA object
model caused by the Java-hype-influenced Objects By Value Specification
is bad enough. But don't get me started.

> This is not directed personally at anyone, but so far all I've seen are
> people telling me why it can't be done!  I think something that allows
> for a richer interface mapping, particularly one that maps naturally to
> many modern languages, should be seriously considered.

I agree with Greg that it would have been much nicer to allow for
overloading when mapping into languages like C++ and Java that support
it, since there are many, many more CORBA users writing in C++ and Java
than there are in C or other languages that do not support overloading.
However, I think there are some pretty tough issues to resolve before it
would even start appearing to be possible.

One issue is that we would have to figure out exactly what type equality
means in IDL. Believe it or not, this is not at all clear, even after all
these years and despite the presence of the TypeCode::equal operation.
For example, in the C++ mapping, the following two sequences yield
different types that would allow for overloading of C++ functions:

typedef sequence<long> A;
typedef sequence<long> B;

However, in C, they map to the same underlying sequence type of which A
and B are just aliases. So, are these different IDL types or not?
TypeCode::equal says they are, but that's not the only legal or useful
interpretation. What if C++ said the two types were just aliases of the
same underlying type, as in the C mapping? If they're different in IDL,
overloading two IDL operations where one takes an A and one takes a B
would be fine, and yet such overloading would fail to map into
overloading in this hypothetical C++ mapping. Similarly, string and
string<3> might be different in IDL, but they're not in C or C++. Jon
Biggar has recently been on a crusade to fix the type equality problem,
but I don't remember the details so I don't know if his proposals would
actually tighten things up enough to allow overloading in IDL.
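
The two readings of type equality can be put side by side in a small Java
sketch. It is only a toy model: types are plain strings, the typedef
table holds the two declarations from the example, and AliasEquality with
its methods is a hypothetical name. It does not model TypeCode::equal or
any real language mapping.

```java
import java.util.Map;

final class AliasEquality {
    // typedef sequence<long> A;  typedef sequence<long> B;
    static final Map<String, String> TYPEDEFS =
        Map.of("A", "sequence<long>", "B", "sequence<long>");

    // Follows typedef names until a non-alias type is reached.
    static String resolve(String type) {
        while (TYPEDEFS.containsKey(type)) {
            type = TYPEDEFS.get(type);
        }
        return type;
    }

    // Reading 1: every typedef introduces a distinct type.
    static boolean sameByName(String x, String y) {
        return x.equals(y);
    }

    // Reading 2: typedefs are aliases of the underlying type.
    static boolean sameAfterResolving(String x, String y) {
        return resolve(x).equals(resolve(y));
    }

    public static void main(String[] args) {
        // op(A) and op(B) can only be told apart if A and B differ.
        System.out.println(sameByName("A", "B"));          // prints false
        System.out.println(sameAfterResolving("A", "B"));  // prints true
    }
}
```

Under the first reading an operation overloaded on A and B is
unambiguous; under the second the two signatures collapse into one.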

IDL overloading might also have to take into account the lowest common
denominator programming language overloading rules. For example, C++ does
not allow overloading on return type only, so IDL might have to avoid
that too.

As someone else has already pointed out, there is also the issue of
overloading operations in base interfaces with operations in derived
interfaces. One question when faced with such a situation is whether the
derived interface writer actually intended to overload the operation in
the base interface four bases down, or was he just unaware that the same
name was already used? Should the IDL compiler warn about such a
situation, or should special overloading syntax always be required so as
to make overloading explicit?

Someone also (sorry for being too lazy to look up who it was) brought up
the issue of how to name overloaded operations on the wire so as to make
them unique. I believe Greg has suggested some form of mangling performed
by the IDL compiler, but that would be less than ideal at the very least
for the poor DII programmer who has to use the mangled names to create
requests, and for the Interface Repository user who has to look up
operations by their mangled names. The mangling would have to be done
such that adding operations did not renumber or rename existing
operations, as has already been pointed out, and the mangled names would
need to be mappable directly into the various languages for which
mappings already exist, so the mangling would have to avoid putting
leading numbers, underscores, etc. onto the mangled names.

A better approach might be to make the IDL writer responsible for
mangling, maybe like this:

interface A {
    void op(in string s) mangle(op_s);
    void op(in char c) mangle(op_c);
};

A language mapping like C would ignore the name "op" and always use the
"op_s" and "op_c" names, where C++ could just use "op" for both. This
way, the mangled names would be documented in the interface, could be
made less ugly than an automated mangling approach, could be checked by
the IDL compiler to avoid collisions in base interfaces, and would serve
as the "on the wire" operation names. Of course, at this point you'd have
to ask if it's really worth it -- if the IDL developer has to think up
the mangled name anyway, one could argue that they might as well not
overload them in the first place and just go with the mangled name.
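
A toy Java model of this idea is sketched below. The mangle(...) syntax
is only the suggestion made in the article, and the Op class, nameFor,
and wireNamesUnique are hypothetical names invented for the sketch. It
shows how a mapping might pick the overloaded or the mangled name, and
how a compiler could check the mangled names for collisions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ExplicitMangling {
    // One operation as written in the interface: its overloaded name
    // and the mangled name supplied by the IDL writer.
    static final class Op {
        final String name;
        final String mangledName;

        Op(String name, String mangledName) {
            this.name = name;
            this.mangledName = mangledName;
        }
    }

    // A target with overloading keeps "op"; any other target, and the
    // wire, uses the mangled name.
    static String nameFor(Op op, boolean targetSupportsOverloading) {
        return targetSupportsOverloading ? op.name : op.mangledName;
    }

    // Mangled names must be unique, ignoring case as IDL does.
    static boolean wireNamesUnique(List<Op> ops) {
        Set<String> seen = new HashSet<>();
        for (Op op : ops) {
            if (!seen.add(op.mangledName.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(new Op("op", "op_s"), new Op("op", "op_c"));
        System.out.println(nameFor(ops.get(0), true));   // prints op
        System.out.println(nameFor(ops.get(0), false));  // prints op_s
        System.out.println(wireNamesUnique(ops));        // prints true
    }
}
```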

Finally, there is much inertia in the OMG when it comes to big changes in
IDL, and rightfully so. IDL is the basis for all CORBA interfaces, both
those in the CORBA specs and those developed by application developers,
and changes in IDL tend to ripple into all corners of the CORBA spec and
your applications. Changes of this magnitude, even if they are fairly
straightforward, are greeted with skepticism simply because folks are
afraid of unforeseen consequences, and because of the amount of work they
take to address all the places in all the specs that are affected. The
recent major screw-up of the IDL grammar by the Objects By Value
submission should be evidence enough that tampering with IDL should be
left to the experts.

The bottom line in my opinion is that while overloading would have been
nice if designed into IDL from the start, it's just too late and too
far-reaching to do it now.

--steve

--
Steve Vinoski                         vinoski at iona.com
Senior Architect                           1-800-ORBIX-4U
IONA Technologies, Inc.           Cambridge, MA USA 02138
60 Aberdeen Ave.      http://www.iona.com/hyplan/vinoski/
Copyright 1998 Stephen B. Vinoski. All Rights Reserved.

Return-Path: <dan.frantz@beasys.com>
From: "Daniel R. Frantz" <dan.frantz@beasys.com>
To: "'Michi Henning'" <michi@dstc.edu.au>, <java2idl-rtf@omg.org>,
        <issues@omg.org>
Subject: RE: Java to IDL identifier mapping
Date: Tue, 7 Jul 1998 10:03:09 -0400
X-MSMail-Priority: Normal
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.2106.4

>-----Original Message-----
>From: Michi Henning [mailto:michi@dstc.edu.au]
>Sent: Tuesday, July 07, 1998 8:33 AM
>To: java2idl-rtf@omg.org; issues@omg.org
>Subject: Java to IDL identifier mapping
>
>
>orbos/98-02-01 says:
>
>	We have provided name manglings to work around [the
>limitations],
...
>
>I would suggest to remove this recommendation because it is
...

That recommendation was in the Java-to-IDL document's "rationale"
section. It isn't normative. Like recommendations in other submissions'
rationale sections, it has no force. Either the Task Force picks it up
and issues an RFP or it doesn't. If somebody else makes a normative
proposal in another submission, it gets discussed in terms of that
submission.

I guess what I'm saying, Michi, is that it's not something to worry
about.

FWIW, I agree with both your comments (name mangling, overloading). They
recommend and we say forget it. Easy enough.

Dan

Return-Path: <crawley@dstc.edu.au>
To: Michi Henning <michi@dstc.edu.au>
cc: java2idl-rtf@omg.org, issues@omg.org, crawley@dstc.edu.au
Subject: Re: Java to IDL identifier mapping 
Date: Wed, 08 Jul 1998 12:09:25 +1000
From: Stephen Crawley <crawley@dstc.edu.au>


[WARNING:  Humour Alert!]

> orbos/98-02-01 says:
>	It is possible that the name mangling rules we define in Section 5.2
>	may result in name collisions. [...] We believe that in practice
>	the name mangling rules we have chosen will create names that would
>	not have been used by normal Java programmers, and that the risk
>	of namespace collisions by the mangled names is acceptably low.
> 
> I take it that if such a collision occurs due to the presence of an
> abnormal Java programmer, the Java is not translatable to legal IDL?
> If so, this should be explicitly stated.

Perhaps ... the spec needs to say how a java2idl compiler can diagnose
abnormal Java programmers, and what it should do if it does.

  Should it dispatch them?  

  Should it name-mangle them?

  Should it narrow them into a more derived class?

  Should it garbage collect them?

  Should it update their behaviour via their meta-object interfaces?

  Should it force them to use the C++ binding?

:-)

-- Steve