Issues for Fault Tolerance FTF Mailing List

To comment on any of these issues, send email to ft-ftf@omg.org. (Please include the issue number in the Subject: header, thusly: [Issue ###].) To submit a new issue, send email to issues@omg.org.

List of issues (green=resolved, yellow=pending Board vote, red=unresolved)

List options: All ; Open Issues only; or Closed Issues only

Issue 3920: FT-FTF Issue: Request more powerful property management
Issue 3921: FT-FTF Issue: Intelligent factory selection

Issue 3920: FT-FTF Issue: Request more powerful property management (ft-ftf)

Click here for this issue's archive.
Source: AT&T (Mr. Robert E. Gruber, )
Nature: Uncategorized Issue
Severity:
Summary:
There are 3 problems w.r.t. property management which I will list
together since a solution could/should address all of them.  I will
send a proposed solution in another message.

---------------------------------------------------------------
** Problem A **

For sequence-valued properties, there should be a way to add or remove
one or more elements from a sequence value without having to resort to
using more than one method call.

Notes:

Currently, one has to 'get' the current sequence value, modify it, and
then 'set' the sequence value.  This results in a nasty race
condition: it is not safe to have independent threads of control doing
get-set combinations.

A simple solution is to have new methods for element update.  A
complex solution is to allow any number of updates, including property
set, element addition, element removal, for default and type and
dynamic, all to be grouped and sent to the manager in one
property_update request.  For example, one might want to group the
removal of a property P from type T and the addition of property P as
a default property.

---------------------------------------------------------------
** Problem B **

The property management interface has insufficient power.

Notes:

One can query or update over a single type or a single object group,
but not over a set of types or a set of object groups.  Further, one
cannot:

  * get a list of types that have
        + at least 1 property defined;
        + specific propert[y|ies] defined;
        + at least 1 factory at specific location[s];
  [ or modify properties for the specified types ]

  * get a list of existing object groups that have
        + specific propert[y|ies] defined;
        + specific type[s];
        + an active replica at specific location[s];
  [ or modify properties for the specified object groups ]

  * get a list of active replicas that have
        + specific type[s];
        + specific location[s];
  [ or modify properties for the specified replicas ]

  * get a list of locations that have
        + at least 1 property defined;
        + specific propert[y|ies] defined;
        + at least 1 active replica;
        + an active replica for specific object group[s];
        + an active replica of specific type[s];
  [ or modify properties for the specified locations]

  * other query/update cases that should be supported?

---------------------------------------------------------------
** Problem C **

The property management interface does not sufficiently distinguish
between high-level FT QoS properties used to manage entire object
groups and low-level object construction properties used to
select factories and create individual replicas.

Notes:

High-level QoS properties change infrequently, and never differ across
replicas.  Low-level construction properties change more frequently as
factories are created/destroyed/lost, and they do differ across
locations/replicas (different factories, different criteria).

In each case, one must distinguish between properties for an existing
object group and properties to be used for future object groups.  Even
for replica construction properties, one should be able to assign a
different set of locations/factories to be used for new replica
creation for existing object group[s] and for future object groups.

Currently, low-level properties are buried in a single value that is
stored with a single property (FactoryInfos), either for a specific
type or for a specific object group.  This makes it very hard to do
lookup or modification of these properties by location or by the pair
type x location or object group x location.  To replace an Info for a
single location one must replace the entire Infos sequence.  Even with
the ability to add/remove a member of a sequence, to replace either
the factory or the criteria within a given Info one would have to
remove the current Info and replace it with a new Info, where the Info
would need to contain a copy of the part(s) that are not to be
modified together with the modified part.

BTW I am leaning towards splitting the PropertyManager into a
GroupQoSManager and a FactoryManager, but other approaches are
possible.  One argument for the split is that it seems to make sense
for a FactoryManager to monitor the liveness of registered factories
and to provide logic for selecting an appropriate factory and
associated criteria for construction of a new replica for a given
group or type.  In contrast, it does not make sense for a generic
property manager to do monitoring (or to know anything about the
values stored in properties).

Resolution:
Revised Text:
Actions taken:
September 28, 2000: received issue

Discussion:
This issue was determined to be out-of-scope of the Fault Tolerant CORBA Finalization Task Force.


Issue 3921: FT-FTF Issue: Intelligent factory selection (ft-ftf)

Click
here for this issue's archive.
Source: AT&T (Mr. Robert E. Gruber, )
Nature: Uncategorized Issue
Severity:
Summary:
An FT-FTF (Fault Tolerance Finalization Task Force) Issue:

GOAL: Introduce intelligent factory selection.

On a single machine (perhaps an N-way multiprocessor, but even for a
uniprocessor) one might want to have N factories, corresponding to N
processes that will have replicas created in them.  Ideally, only one
replica for a given group should be started for the entire machine.
Similarly, if one has several subnets, one might have factories on all
machines, but ideally only one replica should be started per subnet,
if appropriate factories are available.  If the only factories
available for a given type happen to be on the same subnet or same
machine, then it should be possible to specify either that it is OK to
go ahead with replicas on the same subnet or same machine or it is not
OK.  Alternatively, I might want all replicas to be on the same
subnet, if possible, to reduce coordination costs, while still
wanting a different hosts requirement.

How to extend the specification to enable this feature?

One proposal is to take advantage of the fact that location names are
structured.  While any structuring is allowed, we could declare that
if you want to use an intelligent factory selection policy you must
use names that capture the hierarchical nature of fault domains.
E.g., for my scenario I could use names that capture
subnet/host/processor distinctions:

sA.h1.p1, sA.h1.p2, sA.h2.p1, sA.h2.p2, ... sA.hJ.p1, sA.hJ.p2
sB.h1.p1, sB.h1.p2, sB.h2.p1, sB.h2.p2, ... sB.hK.p1, sB.hK.p2

I believe there should be a LocationHints property for types or groups
that is distinct from the issue of how many actual locations have
available factories, where hints are like location names but can have
wildcards.  Thus, I could specify sA.*.* and sB.*.* as LocationHints
for type T to indicate that I prefer replicas for type T to be started
on machines on subnets sA and sB.  Note that this is very different
from giving a list of specific locations.  (I certainly do not want to
specify which processor number to use!)  While the set of available
factories might change frequently, the hints should be relatively
stable.

Assume that as factories are created at specific locations (such as a
new factory F1 at location sA.h3.p1) they could be registered with a
FactoryManager.  This manager knows all the location names that have
factories registered for a given group or object type.  One algorithm
to select a location, given a set of existing replica locations and
possibly some location hints, is to choose a location name that
matches one of the hints and has the greatest difference from the
existing names, where a difference in the i'th part of a name
dominates a difference in the j'th part of the name.

Alternative algorithms are possible, e.g., one might prefer to keep
replica groups in the same subnet but on different machines, which
corresponds to a rule that says equality of the first part of the
name is the primary determinant, while for positions 2 and on, use the
greatest difference rule above.

We could have a QoS property called FactorySelectionPolicy which is a
string and have some predefined algorithms (+ algorithm names).
Vendors could define additional algorithms.

An alternative to having a fixed number of predefined algorithms is to
introduce a means of describing a whole class of algorithms.  Here is
one approach.

For a given part, one of 5 requirements holds:
   . NC : no constraint
   . EB : equality best, inequality allowed
   . ER : equality required
   . DB : difference best, equality allowed
   . DR : difference required

A policy string is a sequence of <requirement> specs separated by dots
(".").  Each requirement applies to the part at the given location,
while the final <requirement> applies to the part at its location and
all subsequent locations.  E.g., the spec ER.DB.DR requires equality
for part 1, prefers difference for part 2 (but not required), and
requires difference for all remaining parts (3, 4, ... ).  

DR/ER constraints have higher priority than DB/EB constraints (all
DR/ER constraints must be met).

When there are optional constraints, a solution that satisfies an
earlier optional constraint has priority over a solution that
satisfies a later optional constraint.  This is true regardless of how
many optional constraints can be satisfied, e.g., satisfying the first
optional constraint but not the second or third has priority over
satisfying both the second and third optional constraint but not the
first.  The reverse ordering (favoring later optional constraints over
earlier ones) can be selected by adding a less-than ("<") sign at the
end of the policy string.

For solutions that satisfy the same earliest (or latest in the case of
"<") optional constraint, solutions that satisfy more optional
constraints have priority over solutions that satisfy fewer optional
constraints.  This rule can be overridden by adding "MIN:" as a prefix
to the policy string (indicating that the minimal number of optional
constraints should be met --- i.e., at least one optional constraint
should be met, if possible, but beyond this, solutions that satisfy
the fewest additional optional constraints are favored).

The resulting location selection policy implicitly includes a final
global constraint: the locations chosen for a given group must be
unique.

N.B. When locations have a different number of parts, EB and DB
requirement are ignored for missing part locations, while if
one location has a part but another does not, this 
satisfies the DR requirement and fails the ER requirement.

Some example selection policies:

[1] NC

  No part is constrained.  Due to the implicit global
  constraint, NC selects unique locations,
  but selection is otherwise random.

[2] DR

  *Every* part must differ.  This policy is not
  often used; it is more common to follow one or more
  DR constraints with some optional constraints
  or with NC, as in the next example.

[3] DR.NC

  The first part must differ, while there are no
  constraints on the other parts.

[4] DB

  A difference is best for each part, but not required
  for any given part.  The result is a selection algorithm
  that attempts to find a difference in the earliest
  possible part.  When several locations differ
  starting at the same earliest part, the algorithm favors
  selecting locations that differ in as many subsequent
  parts as possible.

[5] MIN:DB

  Like DB, except when several locations differ
  starting at the same earliest part, the algorithm favors
  selecting locations that differ in as few subsequent
  parts as possible.

[6] DB<

  Like DB, except the algorithm favors
  locations that differ in the latest possible part.

[7] EB

  Equality is best for every part, but not required
  for any part.  The result is a selection algorithm
  that attempts to find equality in the earliest
  possible part.  When several locations are
  equal starting at the same earliest part, the algorithm favors
  selecting locations that are equal in as many subsequent
  parts as possible.

[8] ER.DB

  Equality of the first part required, while differences
  in other parts are preferred but not required, with
  earlier optional differences dominating later ones.

[9] EB.DB

  Equality of the first part is preferred, while differences
  in other parts are preferred but not required, with
  earlier optional differences dominating later ones
  (EB dominates DB and earlier DB differences dominate
  later ones).

Consider the subnet.host.processor location naming scheme.

+ DR.NC would choose a different subnet for each replica
  and otherwise choose an arbitrary factory in each subnet.

+ EB.DB would choose the same subnet for all replicas,
  if possible, but if necessary would use different
  subnets.  For locations in the same subnet,
  it would attempt to use different hosts and different
  processors, with higher priority given to using
  different hosts.

+ EB.EB.DB< would attempt to find locations that differ
  in the processor part but have the same host and
  subnet, where the processor difference has highest
  priority, host equality has next highest priority, and
  subnet equality has least priority.  This would tend to
  cluster replicas as close together as possible, optimizing
  coordination cost while sacrificing some reliability.

+ MIN:DB< has the same effect as EB.EB.DB< :
  it specifies minimal DB matches (beyond 1 match)
  with priority given to later parts over earlier ones.
  MIN:DB< has the advantage that it works with locations
  of any length, while EB.EB.DB< is only useful for
  locations of length 3.

Resolution:
Revised Text:
Actions taken:
September 28, 2000: received issue

Discussion:
This issue was determined to be out-of-scope of the Fault Tolerant CORBA Finalization Task Force.