Issue 1614: OBV spec inefficient for sending large number of small objects (obv-rtf)

Source: (, )
Nature: Revision
Severity:
Summary: One of the common patterns used in IDL specifications is to pass a sequence of a data type in order to cut down on network round trips. The current OBV spec (orbos/98-01-01) even suggests sending a graph of objects and optimizing for the case where the same object occurs multiple times in the graph (which I assume will normally be a small number of the total objects). The spec seems to be inefficient for sending a large number of small objects though. I have looked at the errata before and don't recall any relevant changes but know the RTF are considering some now.
Resolution:
Revised Text:
Actions taken:
June 30, 1998: received issue
July 30, 1998: closed issue

Discussion:

End of Annotations:=====

Return-Path:
Sender: tim@protocol.com
Date: Mon, 29 Jun 1998 17:45:56 -0700
From: Tim Brinson
Organization: Protocol Systems, Inc.
To: issues@omg.org, obv-rtf@omg.org
Subject: OBV Efficiency Issue (and a couple others)

OBV RTF Members,

A month or two ago I looked at the OBV spec as we are considering using values in responses to other OMG RFPs. I thought there might be an efficiency problem due to the number of overhead bytes used to send a value. The current discussions about the encoding proposals caused me to look at the marshaling format again.

As I recall, one of the goals stated by an initial submitter to OBV (I don't remember who) is that values could be used for data objects (structs with inheritance) with no more overhead than a plain struct. I do understand that this could not be achieved, given the need to pass inherited types (and probably other issues). Of course it is desirable (from a user's point of view) to pass these values efficiently.

Issue 1:
-------
One of the common patterns used in IDL specifications is to pass a sequence of a data type in order to cut down on network round trips.
The current OBV spec (orbos/98-01-01) even suggests sending a graph of objects and optimizing for the case where the same object occurs multiple times in the graph (which I assume will normally be a small number of the total objects). The spec seems to be inefficient for sending a large number of small objects though. I have looked at the errata before and don't recall any relevant changes but know the RTF are considering some now.

Overhead per value instance in the stream:

  value_tag      - 4 bytes
  repository_id  - 12 bytes for indirection (3 for padding)
  chunk_size_tag - 4 bytes (assuming one chunk)
  end_tag        - 4 bytes (don't know how many of these?)

With the latest proposed changes add:

  flag_tag       - 4 bytes (3 for padding)
  rep_ids        - 0 bytes (the long cancels removing the flag_tag)

If it were a derived value type then there could be 8 more bytes for each level. I'm not clear if these are optional with the latest proposal or always present.

So there is a minimum of 28 bytes per value instance, which will be small for many values but may be significant for others. For example:

  // File: FooBar.idl
  // I'm trying to use the OMG style guide - cool, huh?
  #ifndef _FOO_BAR_IDL_
  #define _FOO_BAR_IDL_

  #include

  module FooBar
  {
      value Foo
      {
          float state1;
          long state2;
          octet state3;
          boolean state4;
      };

      typedef sequence<Foo> FooSeq;

      interface Bar
      {
          void get_foos( out FooSeq foo_seq );
      };
  };

  #endif // _FOO_BAR_IDL_

The payload for each Foo is 8+4+1+1 = 14 bytes (actually 16 including padding between values in the seq). Even if you double the number of state fields, the payload only becomes the same size as the minimum overhead. If the foo_seq is sent with a large length, the overhead starts impacting the throughput. To use a value where I expect large sequences I would want the overhead to be down around 20% (depending on the usage scenario).
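The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the byte counts listed in the message; the function names are hypothetical, exactly one end_tag is assumed, and "20%" is read here as overhead being 20% of the total bytes on the wire, which is one possible interpretation. Note also that CDR encodes an IDL float in 4 octets, so the real payload of a Foo may be smaller than the figure quoted above, which would only raise the overhead share.

```python
# Per-value overhead figures, in bytes, as listed in the message above.
OVERHEAD_FIELDS = {
    "value_tag": 4,
    "repository_id": 12,   # indirection, including 3 bytes of padding
    "chunk_size_tag": 4,   # assuming a single chunk
    "end_tag": 4,          # assumption: exactly one end tag per value
    "flag_tag": 4,         # added by the latest proposed changes
}

# The message's own figure for one Foo, including padding in the sequence.
PAYLOAD_BYTES = 16


def overhead_per_value(fields=OVERHEAD_FIELDS, derivation_levels=0):
    """Framing bytes for one value; 8 extra bytes per inheritance level."""
    return sum(fields.values()) + 8 * derivation_levels


def overhead_fraction(payload, overhead):
    """Share of the bytes on the wire that is framing rather than state."""
    return overhead / (payload + overhead)


def payload_for_target(overhead, target_fraction):
    """Payload size (bytes) at which overhead drops to target_fraction."""
    return overhead * (1 - target_fraction) / target_fraction


if __name__ == "__main__":
    overhead = overhead_per_value()
    share = overhead_fraction(PAYLOAD_BYTES, overhead)
    print(f"overhead per value: {overhead} bytes")
    print(f"overhead share for Foo: {share:.0%}")
    print(f"payload needed for a 20% share: "
          f"{payload_for_target(overhead, 0.20):.0f} bytes")
```

Under these assumptions the framing is well over half of each Foo on the wire, and a value would need roughly 112 bytes of state before the overhead share falls to 20%.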
Of course this is a made-up example, and I don't have any statistics about how many sequences of small structs have been used in the past (in OMG and proprietary IDL specs) or any runtime statistics on the typical length of sequences used. Use your own judgement whether you think people should use OBV for things like this. Without OBV we lose the ability to extend the data types via inheritance.

All I ask is that, as you are considering rearranging the marshaling format, you also see if there is a way to minimize the amount of overhead. With some experience using OBV we will start to learn the pros and cons of using it for particular situations.