Issue 5920: use of IDL type string for data that's logged is not cross-language compati (lwlog-ftf)

Source: Objective Interface Systems (Mr. Victor Giddings, victor.giddings(at)mail.ois.com)
Nature: Uncategorized Issue
Severity:
Summary:
Discussion: The data that can be logged with the Lightweight Log service is an encoding of arbitrary binary data. The type used in the ProducerLogRecord for this data is "string" in both the PIM and the PSM. This is inappropriate, since strings are not considered to be arbitrary encodings. They are associated with particular (and different) character encodings in different technologies. This can be seen even within the CORBA PSM, where the Java language mapping transforms a string to sixteen-bit characters internally.
Proposed Resolution: Change the type of the logData field in the ProducerLogRecord to OctetSeq
Resolution:
Revised Text:
Actions taken:
April 30, 2003: received issue

Discussion: use of the IDL type string for data that is logged is not cross-language compatible

End of Annotations:=====

X-Sender: giddiv@postel
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Wed, 30 Apr 2003 12:18:21 -0400
To: issues@omg.org
From: Victor Giddings
Subject: Use of string for lwlog log data encoding
Cc: lwlog-ftf@omg.org

This issue is directed to the Lightweight Log FTF:

Summary: The use of the IDL type string for data that is logged is not cross-language compatible.

Discussion: The data that can be logged with the Lightweight Log service is an encoding of arbitrary binary data. The type used in the ProducerLogRecord for this data is "string" in both the PIM and the PSM. This is inappropriate, since strings are not considered to be arbitrary encodings. They are associated with particular (and different) character encodings in different technologies. This can be seen even within the CORBA PSM, where the Java language mapping transforms a string to sixteen-bit characters internally.
Proposed Resolution: Change the type of the logData field in the ProducerLogRecord to OctetSeq.

Victor Giddings mailto:victor.giddings@ois.com
Senior Product Engineer +1 703 295 6500
Objective Interface Systems Fax: +1 703 295 6501

From:
To: lwlog-ftf@omg.org
Subject: RE: issue 5920 -- LW Log FTF issue
X-Mailer: Lotus Notes Release 5.0.10 March 22, 2002
Date: Wed, 30 Apr 2003 16:22:54 -0500
X-MIMETrack: Serialize by Router on CollinsCRSMTP02/CedarRapids/RockwellCollins(Release 5.0.10 |March 22, 2002) at 04/30/2003 04:22:55 PM, Serialize complete at 04/30/2003 04:22:55 PM

One of the main considerations of the light weight logger was that all log data be in human readable form (strings). I don't know how this change will play, as it completely changes that part of the original logger intent. Please correct me if I am missing something here. I consulted with some of my associates today, and they concur that this would be a very unfavorable change.

David Fitkin
Rockwell Collins

"Pilhofer, Frank" 04/30/2003 04:05 PM
To: , cc:
Subject: RE: issue 5920 -- LW Log FTF issue

> The data that can be logged with the Lightweight Log service
> is an encoding of arbitrary binary data.

Just wondering: why? What do you want to log?

> Proposed Resolution:
> Change the type of the logData field in the ProducerLogRecord
> to OctetSeq

I'm ambivalent about making any change. But if there is consensus that we need the ability to log arbitrary data, then the type should be changed to 'any', not OctetSeq.

Frank

Date: Wed, 30 Apr 2003 19:25:32 -0400
From: "Manfred R. Koethe"
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
X-Accept-Language: en-us, en
To: dwfitkin@rockwellcollins.com
CC: lwlog-ftf@omg.org
Subject: Re: issue 5920 -- LW Log FTF issue
X-Loop-Detect: 1

I agree with Dave and am strongly opposed to a change to OctetSeq. The reasons are:

1. The Octet IDL type is transported 'as is' and not marshalled.
This implies that OctetSeq is also unmarshalled, which impairs interoperability.
2. The mapping from IDL type String to whatever equivalent representation in a particular programming language is handled by the language mapping and should not be the business of the Log Service.

And further, as the original origin [ :-) ] of the Lightweight Log Service (the SCA) calls for "human readable log records", I put it up as String when writing the RFC submission.

Kind regards,
Manfred

--
Manfred R. Koethe
88solutions Corp. E-Mail: koethe@88solutions.com
37 Mague Avenue Tel: +1 (617) 916 5886
Newton, MA 02465-1553 FAX: +1 (617) 916 5887
U.S.A.
"We make your business flow"

X-Sender: giddiv@postel
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Thu, 01 May 2003 10:45:36 -0400
To: lwlog-ftf@omg.org
From: Victor Giddings
Subject: Re: issue 5920 -- LW Log FTF issue

I'll try to address all three responses here.
I believe the utility of the Lightweight Logging service will be greatly increased by the proposed change. Handling only "human readable log records" is, IMHO, a very restricted use case, especially for use in embedded systems. Indeed, I consult with a number of programs, and they have all asked the question: "How do I log arbitrary data defined by IDL?" None of them have restricted themselves to human readable log records. This change allows me to provide an OMG standard answer: "streaming into a byte sequence is provided by standard and custom marshalling and logging is provided by the LwLog Service." Without this change, we have to (and will) initiate another effort to define a different standard.

The increased utility allowed by this change will come at little cost:

User cost: In-memory representations for strings and octet sequences are essentially the same in both C++ and Ada, so there is little change in usage patterns in these languages, whether the data logged is binary or text. The biggest change in representation occurs in the Java language, where characters are 16-bit Unicode. In Java, standard streaming operations are defined as part of the portable stubs and skeletons, so that the use of human readable log records is easily, although indirectly, accommodated. However, keeping the specification as is would disallow these streaming operations from being used for any binary encoding, since the result undergoes 16-bit to 8-bit packing when marshalled.

Implementation cost: it is hard to imagine any implementation cost; the logging service never interprets the data logged.

Now the question of "original intent". Remember that this is the result of an RFC effort, so it did not necessarily receive the benefit of points of view beyond the RFC submitters that usually occur in an RFP process. I have participated in a number of processes, and although they can be overly long and painful, they have all resulted in stronger specifications than any single initial submission.
So I don't treat original intent as sacrosanct. Also, I have to question the intent. How does one explain the following paragraph in the context of a restriction to human readable log records:

"Due to the constraints of an embedded environment, the Lightweight Logging Service uses a dedicated structure to hold logging records, instead of type any in the Telecom Log Service. Combined with a list of "well known" typecodes, this structure provides the restrictive control on type variety necessary in an embedded system. Further, an "any-less" structure simplifies the use of embedded ORBs, which frequently impose restrictions on type any."

What is the use of this 'list of "well known" typecodes' if the only type is human-readable text, and not encoded binary data?

Victor Giddings mailto:victor.giddings@ois.com
Senior Product Engineer +1 703 295 6500
Objective Interface Systems Fax: +1 703 295 6501

Date: Thu, 01 May 2003 11:42:00 -0400
From: "Manfred R. Koethe"
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
X-Accept-Language: en-us, en
To: Victor Giddings
CC: lwlog-ftf@omg.org
Subject: Re: issue 5920 -- LW Log FTF issue
X-Loop-Detect: 1

Victor,

I agree that logging of arbitrary data might be beneficial in some cases. However, the only way to achieve this in an interoperable way is by using an Any instead of a String. This would preserve correct marshalling, while OctetSeq would bypass marshalling and therefore defeat interoperability.

Concerns of changing the log record struct to an Any are:
- A change to Any would break existing implementations
- Any processing is not in Minimum CORBA and therefore not available in all embedded ORBs.

So much for binary data.

I strongly disagree with you regarding CORBA String versus various programming language representations. The conversions are handled transparently by the language bindings.
This is a cornerstone of CORBA language independence and interoperability. We should not even think about tampering with this. Please remember the long-winded discussion around character set negotiation. Be glad this is solved; we should not open that can again.

Kind regards,
Manfred

--
Manfred R. Koethe
88solutions Corp. E-Mail: koethe@88solutions.com
37 Mague Avenue Tel: +1 (617) 916 5886
Newton, MA 02465-1553 FAX: +1 (617) 916 5887
U.S.A.
"We make your business flow"

X-Sender: giddiv@postel
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Thu, 01 May 2003 13:05:07 -0400
To: "Manfred R. Koethe"
From: Victor Giddings
Subject: Re: issue 5920 -- LW Log FTF issue
Cc: lwlog-ftf@omg.org

At 11:42 AM 5/1/2003 -0400, Manfred R.
Koethe wrote:

> Victor, I agree that logging of arbitrary data might be beneficial in some cases.
> However, the only way to achieve this in an interoperable way is by using an Any
> instead of a String. This would preserve correct marshalling, while OctetSeq would
> bypass marshalling and therefore defeat interoperability.

I don't understand this statement. What interoperability are you referring to? Between what components or products? An octet sequence is marshalled. The encoding of the data in the octet sequence is the responsibility of the developer, just as the encoding of information into a text string is.

> Concerns of changing the log record struct to an Any are:
> - A change to Any would break existing implementations
> - Any processing is not in Minimum CORBA and therefore not available in all embedded ORBs.

This is not true. Type Any is required in Minimum CORBA. There are products, including ours, that allow subsetting below the minimum, with significant gain in footprint. However, I am not supporting using type Any for logging.

> So much for binary data. I strongly disagree with you regarding CORBA String versus
> various programming language representations. The conversions are handled
> transparently by the language bindings. This is a cornerstone of CORBA language
> independence and interoperability. We should not even think about tampering with
> this. Please remember the long-winding discussion around character set negotiation.
> Be glad this is solved, we should not open that can again.
> Kind regards, Manfred

Victor Giddings mailto:victor.giddings@ois.com
Senior Product Engineer +1 703 295 6500
Objective Interface Systems Fax: +1 703 295 6501

Date: Fri, 09 May 2003 16:05:06 -0400
From: Kevin Richardson
Organization: The MITRE Corporation
X-Mailer: Mozilla 4.79 [en]C-20020130M (Windows NT 5.0; U)
X-Accept-Language: en
To: lwlog-ftf@omg.org
Subject: Issue Resolution Poll 3

Okay, I'm trying to wrap this up by Monday 12 May.
I've enclosed information regarding issues 5767 and 5920 (change logData field from a string to an octetSequence). Below is a summary of the recommendations and what I (think!!) I know up to this point.

5767 (Same wording as present in the 5767 next try mail, but the text has been duplicated for the producerId and producerName operations)
  Thales (yes)
  Rockwell (yes - would like to eliminate either the retrieveByProducerId or retrieveByName)

5920 (Recommend to not make this change)
  88 Solutions (yes)
  Rockwell (yes)
  Mercury (yes)
  OIS (no)

Please let me know if I've misrepresented your vote, or fill in the gaps. If we can get this done by Monday afternoon, then you'll have the extra benefit of not having to hear from me regarding the Log. For those who are interested I've enclosed my working copy of the final report.

Issue Resolution Poll 3.doc
Final-Report-V1 Log.zip

X-Sender: giddiv@postel
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Mon, 12 May 2003 13:51:13 -0400
To: Kevin Richardson
From: Victor Giddings
Subject: Re: Issue Resolution Poll 3
Cc: lwlog-ftf@omg.org
Objective Interface votes:

5767 - No. The use of unbounded sequences in the queries is not acceptable. I would rather see this better integrated with the present query. I am concerned about finalizing this specification without this issue being adequately addressed. I will send a follow-on email outlining my concerns and suggestions.

5920 - No. (Maybe I'll fix this in the Lightweight Services submission ;-)

Victor Giddings mailto:victor.giddings@ois.com
Senior Product Engineer +1 703 295 6500
Objective Interface Systems Fax: +1 703 295 6501
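
The disagreement in this thread over string versus OctetSeq comes down to whether a text channel may re-encode what it carries. Below is a minimal sketch of that point in plain Python rather than CORBA. The two helper functions, and the choice of latin-1 and ascii as the sender and receiver character sets, are assumptions made purely for illustration; nothing here is taken from the Lightweight Log specification or from any ORB.

```python
# Illustration only: plain Python, not CORBA and not an ORB API.
# The charset names are assumed stand-ins for "different character
# encodings in different technologies".

def via_text_channel(payload: bytes, sender_charset: str, receiver_charset: str) -> bytes:
    """Carry a payload as text: decode it on the sending side, then
    re-encode it in the receiver's character set. Characters that the
    target charset cannot represent are replaced with '?'."""
    text = payload.decode(sender_charset, errors="replace")
    return text.encode(receiver_charset, errors="replace")


def via_octet_channel(payload: bytes) -> bytes:
    """Carry a payload as an opaque byte sequence: no character
    conversion is applied, so the bytes arrive unchanged."""
    return bytes(payload)


if __name__ == "__main__":
    readable = b"pump OK"              # human-readable log text
    binary = b"\x00\x10\x80\xff"       # arbitrary encoded data

    # Readable ASCII text survives the charset conversion.
    print(via_text_channel(readable, "latin-1", "ascii"))
    # Binary data does not: bytes above 0x7F are replaced.
    print(via_text_channel(binary, "latin-1", "ascii"))
    # The octet channel leaves both untouched.
    print(via_octet_channel(binary))
```

In this sketch the readable record is unaffected by the conversion, which matches the view that string is adequate for human-readable logs, while the binary record is altered, which is the concern raised in the original issue. Whether a given ORB and code set combination actually behaves this way depends on the negotiated character sets and is not settled by this example.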