Issue 11035: Must send GAP when filtering on writer-side
Issue 12499: Clarify usage of KeyHash
Issue 12500: Add implementation note on not serializing redundant info in builtin topic
Issue 12501: Include key in serialized ParticipantMessageData
Issue 12502: Clarify when a Payload is present in a submessage
Issue 12503: Some submessage kinds are logically redundant and not extensible
Issue 12504: Change KeyHash from SubmessageElement to Parameter
Issue 12505: Change StatusInfo from SubmessageElement to Parameter
Issue 12506: Update protocol version to 2.1
Issue 12507: Properly assign vendor IDs
Issue 12508: Computation of KeyHash is unspecified
Issue 12509: The specification does not state how to generate unique GUIDs
Issue 11035: Must send GAP when filtering on writer-side (dds-interop-rtf)
Source: Real-Time Innovations (Mr. Kenneth Brophy, ken@rti.com ken.brophy@rti.com)
Nature: Uncategorized Issue
Severity:
Summary:
Title: Must send GAP when filtering on writer-side regardless of reliability QoS setting
Source:
Real-Time Innovations, Inc. (Ken Brophy, ken@rti.com)
Summary:
This issue is not currently in a state to be resolved. What follows are various thoughts on the issue and possible solutions to be discussed. This issue will not be resolved in time for the finalization task force and is included here to be documented for the revision task force.
Discussion topic: what are DDS semantics for combining filtering with the deadline QoS?
Should the deadline be triggered when all samples during the deadline period were filtered out?
That is, does deadline require at least one sample to arrive every deadline period seconds that passed the filter?
Or is the deadline satisfied when any sample arrives within the period, whether filtered out or not?
If the deadline only applies to samples that pass the filter, RTPS needs no changes: simply use the GAP submessage to avoid incorrect onDataLost callbacks.
If not, we run into a problem when using keyed DataWriters and finite deadlines. As the deadline applies on a per-instance basis, the Reader expects at least one update for every instance, even when none of the updates for a particular instance pass the filter. A GAP message does not indicate for which instance an update was filtered out, so it cannot be used by the Reader to verify the deadline constraint. Instead, we should consider using an empty DATA message, possibly with a flag stating that the update did not pass any filter. This would also be useful if a new instance state, NOT_ALIVE_FILTERED or similar, were later added to the DDS spec.
Another possibility would be to add a list of KeyHashes to the GAP message, so that a GAP caused by a content-filtered topic (CFT) actually encodes the instances that are being gapped. This would not cause incorrect firings of the DEADLINE and as a result would maintain ownership of instances even if they are filtered out…
There are two ways to do this.
Either we separate GAPs that correspond to filtered samples from those that correspond to irrelevant samples, so that in effect we have two kinds of GAP messages.
Or we list explicitly the sequence number of each filtered message along with its KeyHash.
It is not clear which would be easier implementation-wise. The samples that have been filtered are still on the writer, so it appears that either implementation would work.
Option (1) would save putting the sequence number with each KeyHash. This can be 4 or 8 bytes per instance, depending on whether we put the sequence number as is or encode it as an increment.
Option (2) would cause additional GAP submessages to be sent, each of which adds an overhead of 28 bytes. It is not clear which is less costly…
Also, if we use Option (1), then the messages that represent real GAPs can be sent via multicast; but this is only likely to occur when late joiners appear, as normally there would be no "irrelevant" gaps if data is published immediately. Moreover, we can in practice still do this and separate the GAP messages that represent real GAPs from the ones that don't. Option (1) does not force us to combine them; it just provides the means to do so…
Option (2) has the problem that in certain edge cases the overhead is significant. For example, if we have an irrelevant-sample GAP followed by a filtered sample, followed by an irrelevant-sample GAP, etc., then we end up sending one GAP message per filtered sample, which is 28 bytes of overhead per filtered sample, versus a single GAP with 4 bytes of overhead per filtered sample. Processing a single GAP is also much more efficient, as each GAP message is dispatched separately up the receiver's stack.
For this reason it appears that Option 1 is more flexible, and the overhead is more stable. Opt
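The worst case described above can be checked with a little arithmetic. The following is only a sketch: it assumes the figures quoted in the discussion (28 bytes for each additional GAP submessage, 4 bytes per filtered sample when the sequence number is encoded as an increment), and the function names are illustrative rather than part of any specification.

```python
# Figures quoted in the discussion above; treated here as assumptions.
GAP_SUBMESSAGE_OVERHEAD = 28  # bytes for each additional GAP submessage
PER_SAMPLE_OFFSET = 4         # bytes per filtered sample listed inside one GAP


def overhead_separate_gaps(filtered_samples: int) -> int:
    """Worst case: every filtered sample ends up in its own GAP submessage."""
    return filtered_samples * GAP_SUBMESSAGE_OVERHEAD


def overhead_single_gap(filtered_samples: int) -> int:
    """One GAP that lists each filtered sample with a 4-byte offset."""
    return filtered_samples * PER_SAMPLE_OFFSET


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, overhead_separate_gaps(n), overhead_single_gap(n))
```

Under these assumptions the alternating pattern costs seven times as many bytes with separate GAP submessages as with a single GAP carrying per-sample offsets.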
Proposal(s):
Always send GAPs for filtered-out messages (in both the BestEffort and the RELIABLE cases).
If the type is keyed, then the GAP also includes at the end a sequence of:
struct FilteredSampleDesc {
long gapStartOffset;
KeyHashPrefix keyHashPrefix;
KeyHashSuffix keyHashSuffix;
};
The GAP message gets two additional flags:
KeyHashFlag
Indicates the presence of the KeyHashPrefix
FilteredSamplesFlag
Indicates the presence of the sequence<FilteredSampleDesc>
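As a rough illustration of the proposed layout, the sketch below packs one FilteredSampleDesc entry. Several details are assumptions that the proposal does not fix: the flag bit values, the 8-byte sizes of KeyHashPrefix and KeyHashSuffix (a 16-byte KeyHash split in half), and the function name itself.

```python
import struct

# Hypothetical bit positions: the proposal names the flags but assigns no bits.
KEY_HASH_FLAG = 0x01
FILTERED_SAMPLES_FLAG = 0x02

PREFIX_LEN = 8  # assumed size of KeyHashPrefix
SUFFIX_LEN = 8  # assumed size of KeyHashSuffix


def pack_filtered_sample_desc(gap_start_offset: int, key_hash: bytes,
                              with_prefix: bool,
                              little_endian: bool = True) -> bytes:
    """Serialize one FilteredSampleDesc: a 32-bit offset, then the key hash parts."""
    if len(key_hash) != PREFIX_LEN + SUFFIX_LEN:
        raise ValueError("key_hash must be 16 bytes")
    fmt = ("<" if little_endian else ">") + "i"  # IDL long is a 32-bit signed int
    body = struct.pack(fmt, gap_start_offset)
    if with_prefix:  # corresponds to KeyHashFlag being set
        body += key_hash[:PREFIX_LEN]
    return body + key_hash[PREFIX_LEN:]
```

A writer would set FILTERED_SAMPLES_FLAG whenever the sequence is appended and KEY_HASH_FLAG only when the prefix is included, which shrinks each entry from 20 to 12 bytes under the sizes assumed here.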
An issue needs to be filed against the DDS spec to clarify:
(a) Whether the deadline as specified by a DataReader should apply to the samples that pass the DataReader filter or to the samples sent by any writer?
(b) Whether a new instance_state ALIVE_FILTERED should be added such that the DataReader can determine that a sample was filtered and potentially take action on that.
(c) Whether an API or a QoS should be added to the DataReader to allow the DataReader to remove the instance information for instances with state ALIVE_FILTERED after all samples are taken. This allows resources to be conserved in the case where once filtered we expect the instance to remain filtered and also allows a reader to be notified if the instance becomes unfiltered.
(d) Whether to add a filtered_generation_count that counts how many times the instance has become ALIVE after being in the ALIVE_FILTERED state.
Resolution:
T4 should include code and description that states that when the sample is not relevant, send a GAP…same for the stateful best effort writer.
Revised Text:
Issue 12499: Clarify usage of KeyHash
Summary:
Usage of keyHash is currently underspecified. It is not clear whether its presence in the message is mandatory and how it should be processed.
Resolution:
Expanded descriptions of the intended usage of the keyHash parameter are added.
Revised Text:
Insert new Section 8.7.8:
8.7.8 "Key Hash"
The Key Hash provides a hint for the key that uniquely identifies the data-object that is being changed within the set of objects that have been registered by the RTPS Writer. Nominally the key is part of the serialized data of a data submessage. Using the key hash benefits implementations by providing a faster alternative to deserializing the full key from the received data-object. When the key hash is not received by a DataReader, it should be computed from the data itself. If there is no data in the submessage, then a default zero-valued key hash should be used by the DataReader. The KeyHash, if present, shall be computed as described in Section 9.6.3.3.
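The fallback order in the revised text can be sketched as follows. This is a minimal illustration, not normative behavior: the function and parameter names are hypothetical, the 16-byte hash size is assumed, and the actual hash computation is passed in as a callable because it is defined elsewhere (Section 9.6.3.3).

```python
from typing import Callable, Optional

ZERO_KEY_HASH = bytes(16)  # default zero-valued key hash; 16-byte size assumed


def resolve_key_hash(received_key_hash: Optional[bytes],
                     serialized_data: Optional[bytes],
                     compute_key_hash: Callable[[bytes], bytes]) -> bytes:
    """Pick the key hash a DataReader should use for a received submessage."""
    if received_key_hash is not None:
        # The writer supplied the hint, so no deserialization is needed.
        return received_key_hash
    if serialized_data is not None:
        # No hint was received: compute the hash from the data itself.
        return compute_key_hash(serialized_data)
    # Neither a hint nor data is present: fall back to the zero-valued hash.
    return ZERO_KEY_HASH
```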
Issue 12500: Add implementation note on not serializing redundant info in builtin topic
Summary:
Simple discovery data has parameters containing redundant information. To avoid wasting bandwidth, implementations should be able to refrain from sending redundant information. As such, the spec should include an implementation note on this matter.
Resolution:
Add text in the PSM, Section 9.6.2.2, describing the general allowable case in which implementations may avoid serializing parameters with redundant information.
Revised Text:
Add to Section 9.6.2.2, right before Section 9.6.2.2.1, the following paragraph:
For optimization, implementations of the protocol may refrain from including in the Data submessage a parameter if it contains information that is redundant with parameters present in that same Data submessage. As a result of this optimization an implementation can omit the serialization of the parameters listed in Table 9.1 - ParameterId subspaces.
Table 9.1 - ParameterId subspaces
BuiltinEndpoint | Parameter which may be omitted | Parameter where the information on the omitted parameter can be found
SPDPdiscoveredParticipantData | ParticipantProxy::guidPrefix | ParticipantBuiltinTopicData::key
DiscoveredReaderData | ReaderProxy::remoteReaderGuid | SubscriptionBuiltinTopicData::key
DiscoveredWriterData | ReaderProxy::remoteWriterGuid | PublicationBuiltinTopicData::key
For example, an implementation of the protocol sending a DATA message containing the SPDPdiscoveredParticipantData may omit the parameter that contains the guidPrefix. If the guidPrefix is not present in the DATA message, the implementation of the protocol on the receiver side must derive this value from the "key" parameter, which is always present in the DATA message.
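On the receiving side, recovering an omitted guidPrefix might look like the sketch below. It rests on assumptions not stated in the text above: that received parameters are available as a mapping from name to raw bytes (a hypothetical representation), and that the "key" parameter carries a 16-byte GUID whose first 12 bytes are the guidPrefix.

```python
GUID_PREFIX_LEN = 12  # assumed: the GUID is a 12-byte prefix plus a 4-byte entity id


def participant_guid_prefix(params: dict) -> bytes:
    """Return the guidPrefix, deriving it from "key" when it was omitted.

    `params` is a hypothetical mapping from parameter name to raw bytes.
    """
    if "guidPrefix" in params:
        return params["guidPrefix"]
    # The sender skipped the redundant parameter; "key" is always present.
    return params["key"][:GUID_PREFIX_LEN]
```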
Issue 12501: Include key in serialized ParticipantMessageData
Summary:
The specification states in Section 9.6.2.1 that the key portion of ParticipantMessageData is not serialized as part of the DATA submessage. The stated argument is that the key is already part of the DATA submessage. This argument is incorrect: the DATA message may optionally include a KeyHash, which is not necessarily the same as the key. Moreover, this makes ParticipantMessageData special in the way it is serialized. The savings in bandwidth do not justify all this "special code". It would be far better if ParticipantMessageData were treated as any regular data message and the key were serialized along with it.
Resolution:
Change the description in Section 9.6.2.1 to not refrain from serializing the key of ParticipantMessageData.
Issue 12502: Clarify when a Payload is present in a submessage
Summary:
Some submessages can optionally include a payload. For example, the DATA submessage includes a payload in most cases, except in some special cases where the submessage indicates unregistration and/or disposal of the data. Currently the presence of the payload is tied to the value of the unregister and dispose flags. This is inflexible and also does not support some use cases where it is desirable to send data with dispose messages. It would be much better to make the presence of the Data Payload separately configurable in the submessage.
Resolution:
Introduce a new submessage element that represents a general submessage payload whose type is specified by submessage meta-information.
Issue 12503: Some submessage kinds are logically redundant and not extensible
Summary:
The submessages required to send keyed and unkeyed data are mapped in the PSM as two different submessages (DATA and NOKEY_DATA). These submessages have almost identical content, so there is no reason to use two submessages for this. Furthermore, having two different submessages at the PSM level and exposing the KeyHash as part of the submessage introduces problems with regard to the implementation and the extensibility of the protocol. For example, future versions of the protocol may want to support key hashes that are either bigger or smaller than the 16 bytes used today, and do that in an interoperable manner without breaking backwards compatibility. They may also want to omit sending a key hash when the size of the key itself is small. As the key is always included as part of the data, sending the KeyHash for small keys is inefficient at best. This could be resolved by combining the two PSM submessages and moving the KeyHash into the InlineQos.
Resolution:
Combine the DATA and NOKEY_DATA submessages. Also combine the DATA_FRAG and NOKEY_DATA_FRAG submessages. Include additional fields in these submessages to enable extensibility for future submessage-specific options.
Issue 12504: Change KeyHash from SubmessageElement to Parameter
Summary:
The inclusion of keyHashPrefix and keyHashSuffix fields in DATA and DATA_FRAG submessages should be optional. The keyHash is a DDS-level optimization whose interpretation by RTPS is unnecessary. Hence, it is logically better to reassign it as a parameter that may be included inline.
Resolution:
Replace the mentioned instances of keyHashPrefix/Suffix with the keyHash inline parameter.
Issue 12505: Change StatusInfo from SubmessageElement to Parameter
Summary:
The statusInfo field of the DATA submessage holds flags relating to DDS-level concepts that are interpreted at higher layers than RTPS. Hence, it is not necessary to have statusInfo as an explicit field; rather, it can be included in the inlineQos SubmessageElement.
Resolution:
Replace all instances of the statusInfo field of DATA with the new statusInfo inline parameter.
Issue 12506: Update protocol version to 2.1
Summary:
The set of proposed changes necessitates updating the protocol version.
Resolution:
Change protocol version from 2.0 to 2.1.
Issue 12507: Properly assign vendor IDs
Summary:
Section 9.4.5.1.1 leaves the list of vendor IDs unspecified.
Resolution:
Modify the section to include a reference to an appendix or table where all the vendor IDs are listed.
Issue 12508: Computation of KeyHash is unspecified
Summary:
The specification does not describe how the KeyHash is computed from the Key. This implies that key hashes in messages coming from different RTPS implementations can never be interpreted, because one implementation may use a different key hash algorithm than the other. I would prefer that the hash algorithm become part of the specification. RTPS implementations can choose to implement the prescribed algorithm or simply use a zero-valued key.
Resolution:
The UDP PIM should mandate that the KeyHash is computed either as the serialized key or else as an MD5 digest, depending on whether the serialized key for the type can exceed the 128 bits that are used for the KeyHash.
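The rule in this resolution can be sketched as below. This is a hedged illustration only: the function name and parameters are hypothetical, the caller is assumed to know the maximum serialized key size for the type, and zero-padding a short serialized key up to 16 bytes is an assumption that the resolution text does not spell out.

```python
import hashlib

KEY_HASH_SIZE = 16  # 128 bits


def compute_key_hash(serialized_key: bytes, max_serialized_key_size: int) -> bytes:
    """Return a 16-byte key hash for an already-serialized key.

    The choice depends on the type (its maximum serialized key size),
    not on the length of this particular key.
    """
    if max_serialized_key_size > KEY_HASH_SIZE:
        # The key may not fit in 128 bits: use the MD5 digest (always 16 bytes).
        return hashlib.md5(serialized_key).digest()
    # The key always fits: use it directly, zero-padded (padding is assumed).
    return serialized_key.ljust(KEY_HASH_SIZE, b"\x00")
```

Deciding on the maximum size rather than the actual size keeps the choice stable for a given type, so two implementations hashing the same instance pick the same branch.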
Issue 12509: The specification does not state how to generate unique GUIDs
Summary:
In RTPS each entity has a so-called GUID_t. It consists of a 12-byte GuidPrefix_t and a 4-byte EntityId_t and must be globally unique. In heterogeneous systems (with multiple RTPS implementations), the specification provides no mechanism or guidance to ensure global uniqueness. A solution would be to make the vendorId part of the GuidPrefix_t.
Resolution:
The first two bytes of the GUID_t prefix should be the vendor id.
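A minimal sketch of this layout follows. Only the sizes and the position of the vendor id come from the text above; the function names are hypothetical, and filling the remaining 10 prefix bytes with random data is an assumption, since each vendor is free to choose its own scheme for them.

```python
import os
from typing import Optional

VENDOR_ID_LEN = 2
GUID_PREFIX_LEN = 12
ENTITY_ID_LEN = 4


def make_guid_prefix(vendor_id: bytes, rest: Optional[bytes] = None) -> bytes:
    """Build a 12-byte prefix whose first two bytes are the vendor id."""
    if len(vendor_id) != VENDOR_ID_LEN:
        raise ValueError("vendor_id must be 2 bytes")
    if rest is None:
        # Assumption: random bytes stand in for a vendor-specific scheme.
        rest = os.urandom(GUID_PREFIX_LEN - VENDOR_ID_LEN)
    if len(rest) != GUID_PREFIX_LEN - VENDOR_ID_LEN:
        raise ValueError("rest must be 10 bytes")
    return vendor_id + rest


def make_guid(prefix: bytes, entity_id: bytes) -> bytes:
    """Concatenate a 12-byte prefix and a 4-byte entity id into a 16-byte GUID."""
    if len(prefix) != GUID_PREFIX_LEN or len(entity_id) != ENTITY_ID_LEN:
        raise ValueError("expected a 12-byte prefix and a 4-byte entity id")
    return prefix + entity_id
```

With the vendor id leading the prefix, two implementations from different vendors cannot collide, and each vendor only has to guarantee uniqueness within its own remaining 10 bytes.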