Issue 2772: Potential deadlock with POA::deactivate_object() (corba-rtf)
Source:  (, )
Nature: Uncategorized Issue
Severity: 
Summary: The draft CORBA 2.3 spec (ptc/99-03-07) does not deal with a potential deadlock situation. If an object is explicitly deactivated with POA::deactivate_object(), the object remains in the active object map until all operations pending on the object have completed. Any attempts to reactivate the object (implicitly via a ServantActivator, or explicitly via activate_object_with_id()) must block until the pending invocations have completed. However, if a servant's implementation of an object deactivates the object and then (directly or indirectly through a call to another collocated object) reactivates the object, the invocation will deadlock. 
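
The scenario in the summary can be sketched in standard C++. This is only a toy model of the active object map, not a real ORB: ToyPoa, Entry, try_reactivate and the caller_is_pending flag are hypothetical names introduced for illustration, and where a real POA would block, the model reports the outcome instead.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of the active object map (AOM). Hypothetical names, not ORB API.
enum class EntryState { Active, Deactivating };

struct Entry {
    EntryState state = EntryState::Active;
    int pending = 0;  // invocations currently executing on the object
};

enum class Reactivate { Done, AlreadyActive, MustWait, WouldDeadlock };

class ToyPoa {
public:
    void activate(const std::string& oid) { aom_[oid] = Entry{}; }

    void begin_invocation(const std::string& oid) { ++aom_.at(oid).pending; }

    void end_invocation(const std::string& oid) {
        Entry& e = aom_.at(oid);
        // The entry leaves the map only once the last pending call returns.
        if (--e.pending == 0 && e.state == EntryState::Deactivating)
            aom_.erase(oid);
    }

    // Models deactivate_object(): the entry stays while calls are pending.
    void deactivate(const std::string& oid) {
        auto it = aom_.find(oid);
        if (it == aom_.end()) return;
        if (it->second.pending == 0) aom_.erase(it);
        else it->second.state = EntryState::Deactivating;
    }

    // Models a reactivation attempt. A real POA would block in the MustWait
    // and WouldDeadlock cases; here the outcome is reported instead.
    Reactivate try_reactivate(const std::string& oid, bool caller_is_pending) {
        auto it = aom_.find(oid);
        if (it == aom_.end()) { aom_[oid] = Entry{}; return Reactivate::Done; }
        if (it->second.state == EntryState::Active)
            return Reactivate::AlreadyActive;
        // The caller waits for pending calls to drain. If the caller is one
        // of those pending calls, the count can never reach zero.
        return caller_is_pending ? Reactivate::WouldDeadlock
                                 : Reactivate::MustWait;
    }

private:
    std::map<std::string, Entry> aom_;
};

// An operation deactivates its own object and then reactivates it.
Reactivate self_reactivation_scenario() {
    ToyPoa poa;
    poa.activate("obj");
    poa.begin_invocation("obj");             // request enters the servant
    poa.deactivate("obj");                   // servant deactivates itself
    return poa.try_reactivate("obj", true);  // ...then tries to reactivate
}
```

In this model self_reactivation_scenario() reports WouldDeadlock, while the same reactivation attempted after end_invocation() has run would simply succeed.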

Resolution: Deferred to next RTF
Revised Text: 
Actions taken:
June 28, 1999: received issue
April 11, 2012: Deferred
Discussion: 

End of Annotations:=====
Sender: jon@floorboard.com
Message-ID: <381BDE49.7E91D47C@floorboard.com>
Date: Sat, 30 Oct 1999 23:14:33 -0700
From: Jonathan Biggar <jon@floorboard.com>
X-Mailer: Mozilla 4.6 [en] (X11; U; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: orb_revision@omg.org
Subject: Issue 2772 discussion
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: X7Pe97OA!!^7'!!9Z=!!

I've given some thought to this issue, and the only solution I can come
up with is to allow the POA to cancel the deactivation of an object if
it needs to be activated again before the deactivation is complete.
This changes the deadlock into a potential livelock situation instead.

This change would be user-visible however, so I thought that I'd bring
it up and see what everyone thought.

Here is the information about the issue:


Issue 2772: Potential deadlock with POA::deactivate_object()
(orb_revision)

Summary:

The draft CORBA 2.3 spec (ptc/99-03-07) does not deal with a potential
deadlock situation. If an object is explicitly deactivated with
POA::deactivate_object(), the object remains in the active object map
until all operations pending on the object have completed. Any attempts
to reactivate the object (implicitly via a ServantActivator, or
explicitly via activate_object_with_id()) must block until the pending
invocations have completed. However, if a servant's implementation of
an object deactivates the object and then (directly or indirectly
through a call to another collocated object) reactivates the object,
the invocation will deadlock.

-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org
From: Paul Kyzivat <paulk@roguewave.com>
To: "'jon@floorboard.com'" <jon@floorboard.com>,
        "'orb_revision@omg.org'"
	 <orb_revision@omg.org>
Subject: RE: Issue 2772 discussion
Date: Mon, 1 Nov 1999 09:07:05 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="iso-8859-1"
X-UIDL: %/5e9O-"e9S#?!!gi'!!


> From: jon@floorboard.com [mailto:jon@floorboard.com]

> I've given some thought to this issue, and the only solution I can
> come up with is to allow the POA to cancel the deactivation of an
> object if it needs to be activated again before the deactivation is
> complete.  This changes the deadlock into a potential livelock
> situation instead.
> 
> This change would be user-visible however, so I thought that I'd
> bring
> it up and see what everyone thought.

Can you explain a bit more about your potential solution?
(I am concerned the cure may be worse than the disease.)

Would this only work if an activation was attempted with the same
servant and ID?

Does this mean that a ServantActivator would need to be permitted to
execute to completion, and then if it happened to be for an ObjectId in
the process of being activated, and if the servant happened to be the
one being deactivated for that ID, then the deactivation would be
cancelled?

(I would like the option to stall calls to servant activators in this
case rather than let them complete and stall acting on their result.)

This seems to present many complex new implementation challenges, and
possibly new race conditions. (E.g., What if we run a ServantActivator
under the above situation but it returns a different servant than the
one being deactivated?)
Sender: jon@floorboard.com
Message-ID: <381DD4A9.6618B4E6@floorboard.com>
Date: Mon, 01 Nov 1999 09:58:02 -0800
From: Jonathan Biggar <jon@floorboard.com>
X-Mailer: Mozilla 4.6 [en] (X11; U; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: Paul Kyzivat <paulk@roguewave.com>
CC: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA9140187@bos1.noblenet.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: m+o!!j$j!!gMC!!^?Qd9

Paul Kyzivat wrote:
> > I've given some thought to this issue, and the only solution I can
> > come up with is to allow the POA to cancel the deactivation of an
> > object if it needs to be activated again before the deactivation is
> > complete.  This changes the deadlock into a potential livelock
> > situation instead.
> >
> > This change would be user-visible however, so I thought that I'd
> > bring it up and see what everyone thought.
> 
> Can you explain a bit more about your potential solution?
> (I am concerned the cure may be worse than the disease.)
> 
> Would this only work if an activation was attempted with the same
> servant
> and ID?
> 
> Does this mean that a ServantActivator would need to be permitted to
> execute
> to completion, and then if it happened to be for an ObjectId in the
> process
> of being activated, and if the servant happened to be the one being
> deactivated for that ID, then the deactivation would be cancelled?
> 
> (I would like the option to stall calls to servant activators in
> this case
> rather than let them complete and stall acting on their result.)
> 
> This seems to present many complex new implementation challenges,
> and
> possibly new race conditions. (E.g., What if we run a
> ServantActivator under
> the above situation but it returns a different servant than the one
> being
> deactivated?)

I suppose I wasn't clear. :-)

In my idea, cancelling the deactivation of the object would only be
allowed to occur before etherealize() was called.  At that point, we
already know that no outstanding invocations are pending, so the
original race condition is no longer possible.

So this leaves the situation where deactivate_object() has been called,
and we have not yet called etherealize() (or we have
USE_ACTIVE_OBJECT_MAP_ONLY).  In this case, I'd like to consider the
ramifications of allowing the POA to cancel the deactivation and
reactivate the servant to handle new incoming requests.
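
As a rough illustration of the idea, the per-object state could be modelled as a small state machine in standard C++. This is a sketch under assumptions, not any ORB's implementation: AomEntry, State and the function names below are hypothetical, and the only transition taken from the proposal is that a request arriving before etherealize() has started flips the entry back to active.

```cpp
#include <cassert>

// Hypothetical per-object state for one active object map entry.
enum class State { Active, Deactivating, Etherealizing, Inactive };

struct AomEntry {
    State state = State::Active;
    int pending = 0;  // invocations currently executing
};

// Models deactivate_object(): etherealization has to wait for pending calls.
void deactivate_object(AomEntry& e) {
    if (e.state != State::Active) return;
    e.state = (e.pending == 0) ? State::Etherealizing : State::Deactivating;
}

// Models the arrival of a new request. Returns true if the request can be
// dispatched to the existing servant right away.
bool dispatch_request(AomEntry& e) {
    // Proposed change: a request that arrives before etherealize() has been
    // called cancels the deactivation.
    if (e.state == State::Deactivating) e.state = State::Active;
    // Once etherealize() is underway no invocation is pending, so waiting
    // here cannot deadlock on a pending call.
    if (e.state != State::Active) return false;
    ++e.pending;
    return true;
}

// Models a request returning from the servant.
void request_done(AomEntry& e) {
    --e.pending;
    if (e.state == State::Deactivating && e.pending == 0)
        e.state = State::Etherealizing;
}

// Models etherealize() returning.
void etherealize_done(AomEntry& e) {
    if (e.state == State::Etherealizing) e.state = State::Inactive;
}
```

With one call in progress, deactivate_object() moves the entry to Deactivating; a second dispatch_request() then returns it to Active, so the etherealization never starts unless the servant asks for deactivation again.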

The deactivate_object() call is made to handle two conditions:  either
the object is being destroyed or it is just being deactivated pending
future activation.  In the former case, a thread-aware object must
already have an internal flag that marks the object as "destroyed" so
that it can raise OBJECT_NOT_EXIST on any pending requests during the
destruction process.  So rather than structuring operation
implementations like this:

void _fooImpl::foo_op()
{
    Guard	lock(mutex);
    
    if (destroyed)
        throw CORBA::OBJECT_NOT_EXIST();

    ...
}

the code looks like this instead:

void _fooImpl::foo_op()
{
    Guard	lock(mutex);

    if (destroyed) {
	try {
	    deactivate_object(my_oid);	// need to tell POA to deactivate
					// the object again!
	}
	catch (...) {
	    // do nothing!
	}
	throw CORBA::OBJECT_NOT_EXIST();
    }
}

If, however, deactivate_object() is called just to "passivate" the
object, then cancelling the deactivation might have no effect on the
servant implementation code.
Since a thread-aware object must always code defensively, assuming that
there are other pending requests when deactivate_object() is called, it
already needs to delay any clean-up code to the etherealize() or servant
destructor (when using reference counting on the servant).

-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat <paulk@roguewave.com>
To: "'jon@floorboard.com'" <jon@floorboard.com>
Cc: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: RE: Issue 2772 discussion
Date: Mon, 1 Nov 1999 15:18:08 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="iso-8859-1"
X-UIDL: JcW!!e?'e9k[I!!LI9e9

> From: jon@floorboard.com [mailto:jon@floorboard.com]
> 
> I suppose I wasn't clear. :-)
> 
> In my idea, cancelling the deactivation of the object would only be
> allowed to occur before etherealize() was called.  At that point, we
> already know that no outstanding invocations are pending, so the
> original race condition is no longer possible.

So, if a request for the object comes in between deactivation and
etherealization, you would just cancel the etherealization, reinstate
the activation, and let the operation go through, without calling
incarnate again?

> 
> So this leaves the situation where deactivate_object() has 
> been called,
> and we have not yet called etherealize() (or we have
> USE_ACTIVE_OBJECT_MAP_ONLY).  In this case, I'd like to consider the
> ramifications of allowing the POA to cancel the deactivation and
> reactivating the servant to handle new incoming requests.
> 
> The deactivate_object() call is made to handle two conditions:
> either
> the object is being destroyed or it is just being deactivated
> pending
> future activation.  In the former case, a thread-aware object must
> already have an internal flag that marks the object as "destroyed"
> so
> that it can raise OBJECT_NOT_EXIST on any pending requests during
> the
> destruction process.  

I think you have overlooked another implementation approach:

When I want to destroy the object I deactivate it.
With any further attempt to use the object, the ServantActivator will
simply refuse to incarnate it, causing the client to get
OBJECT_NOT_EXIST.

The servant is simpler, because the object exists as long as the servant
exists (or can be incarnated). It is the servant activator, which is
involved with persistence, etc. that is the actual gatekeeper of
existence.

To me this is a natural implementation approach. But it means that once
I have decided I want to deactivate the servant, because I want to
destroy the object, I don't want my decision overturned by the POA. (I
may not have any other natural hook to cause the deactivation at a later
time.)

I think this cure is indeed worse than the disease. I have several
servers where a request to destroy an object is accomplished by
deactivating it, with the majority of the logic happening in
etherealize. This change would break them - anybody sending a request in
the window just following the destroy call would in effect cancel the
destroy.

> So rather than structuring operation
> implementations like this:
> 
> void _fooImpl::foo_op()
> {
>     Guard	lock(mutex);
>     
>     if (destroyed)
>         throw CORBA::OBJECT_NOT_EXIST();
> 
>     ...
> }
> 
> the code looks like this instead:
> 
> void _fooImpl::foo_op()
> {
>     Guard	lock(mutex);
> 
>     if (destroyed) {
>	try {
>	    deactivate_object(my_oid);	// need to tell POA to deactivate
>					// the object again!
>	}
>	catch (...) {
>	    // do nothing!
>	}
>	throw CORBA::OBJECT_NOT_EXIST();
>     }
> }
> 
> If, however, deactivate_object() is called just to "passivate" the
> object, then cancelling the deactivation might have no effect on the
> servant implementation code.
> Since a thread-aware object must always code defensively, 
> assuming that
> there are other pending requests when deactivate_object() is 
> called, it
> already needs to delay any clean-up code to the etherealize() 
> or servant
> destructor (when using reference counting on the servant).
Sender: jon@floorboard.com
Message-ID: <381E02A6.D4EBD5A1@floorboard.com>
Date: Mon, 01 Nov 1999 13:14:14 -0800
From: Jonathan Biggar <jon@floorboard.com>
X-Mailer: Mozilla 4.6 [en] (X11; U; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: Paul Kyzivat <paulk@roguewave.com>
CC: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA914018A@bos1.noblenet.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: nDS!!IPZd9WSpd9]83e9

Paul Kyzivat wrote:
> 
> > From: jon@floorboard.com [mailto:jon@floorboard.com]
> >
> > I suppose I wasn't clear. :-)
> >
> > In my idea, cancelling the deactivation of the object would only
> be
> > allowed to occur before etherealize() was called.  At that point,
> we
> > already know that no outstanding invocations are pending, so the
> > original race condition is no longer possible.
> 
> So, if a request for the object comes in between deactivation and
> etherealization, you would just cancel the etherealization,
> reinstate the
> activation, and let the operation go through, without calling
> incarnate
> again?

That's the concept.

> >
> > So this leaves the situation where deactivate_object() has
> > been called,
> > and we have not yet called etherealize() (or we have
> > USE_ACTIVE_OBJECT_MAP_ONLY).  In this case, I'd like to consider
> the
> > ramifications of allowing the POA to cancel the deactivation and
> > reactivating the servant to handle new incoming requests.
> >
> > The deactivate_object() call is made to handle two conditions:
> either
> > the object is being destroyed or it is just being deactivated
> pending
> > future activation.  In the former case, a thread-aware object must
> > already have an internal flag that marks the object as "destroyed"
> so
> > that it can raise OBJECT_NOT_EXIST on any pending requests during
> the
> > destruction process.
> 
> I think you have overlooked another implementation approach:
> 
> When I want to destroy the object I deactivate it.
> With any further attempt to use the object, the ServantActivator
> will simply
> refuse to do so, causing the client to get OBJECT_NOT_EXIST.

Umm, how does this work?  The ServantActivator can't get involved until
etherealize() is called, at which point deactivation is safe anyway.

> The servant is simpler, because the object exists as long as the
> servant exists (or can be incarnated). It is the servant activator,
> which is involved with persistence, etc. that is the actual gatekeeper
> of existence.
>
> To me this is a natural implementation approach. But it means that
> once I have decided I want to deactivate the servant, because I want
> to destroy the object, I don't want my decision overturned by the POA.
> (I may not have any other natural hook to cause the deactivation at a
> later time.)
>
> I think this cure is indeed worse than the disease. I have several
> servers where a request to destroy an object is accomplished by
> deactivating it, with the majority of the logic happening in
> etherealize. This change would break them - anybody sending a request
> in the window just following the destroy call would in effect cancel
> the destroy.

If your servant is thread-aware, just calling deactivate_object() isn't
sufficient, since there may be other pending requests outstanding when
the "destroy" occurs.  How do you prevent those pending requests from
seeing stale information?  The only safe way is to have a destroy state
flag in the servant anyway.

-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org
From: Paul Kyzivat <paulk@roguewave.com>
To: "'Jonathan Biggar'" <jon@floorboard.com>,
        Paul Kyzivat
	 <paulk@roguewave.com>
Cc: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: RE: Issue 2772 discussion
Date: Mon, 1 Nov 1999 18:09:45 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="iso-8859-1"
X-UIDL: ?~L!!Mg!!!@Oed985\d9

> If your servant is thread-aware, just calling deactivate_object()
> isn't sufficient, since there may be other pending requests
> outstanding when the "destroy" occurs.  How do you prevent those
> pending requests from seeing stale information?  The only safe way is
> to have a destroy state flag in the servant anyway.

The idea is that all the request to destroy does is call deactivate.
At that point the destruction is "in the pipeline". Any other operations
already in progress go on to completion. (After all, that is what
letting operations complete before etherealization is all about.)

They don't see any stale information because nothing is stale, yet.

When the etherealize finally happens, it does the actual work of
destruction. (This is really easy for objects that aren't persistent,
and for which incarnate always fails.)
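
A minimal sketch of this gatekeeper style in standard C++, assuming a toy activator rather than the real PortableServer::ServantActivator interface: GatekeeperActivator, Servant, create() and the local ObjectNotExist exception are hypothetical stand-ins, and the POA's own bookkeeping is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for CORBA::OBJECT_NOT_EXIST (hypothetical, stdlib only).
struct ObjectNotExist : std::runtime_error {
    ObjectNotExist() : std::runtime_error("OBJECT_NOT_EXIST") {}
};

// The servant carries no "destroyed" flag; it just holds its state.
struct Servant {
    std::string state;
};

// Toy activator that decides whether an object exists.
class GatekeeperActivator {
public:
    // What a factory operation would do when it creates an object.
    void create(const std::string& oid, std::string state) {
        store_[oid] = std::move(state);
    }

    // Models incarnate(): refuse any object whose state is gone.
    std::unique_ptr<Servant> incarnate(const std::string& oid) {
        auto it = store_.find(oid);
        if (it == store_.end()) throw ObjectNotExist();
        return std::make_unique<Servant>(Servant{it->second});
    }

    // Models etherealize(): the actual destruction happens here, after the
    // pending operations have drained.
    void etherealize(const std::string& oid, std::unique_ptr<Servant> s) {
        store_.erase(oid);
        s.reset();
    }

private:
    std::map<std::string, std::string> store_;
};
```

Once etherealize() has run for an id, every later incarnate() for that id fails, which is what the client would observe as OBJECT_NOT_EXIST. The sketch also shows why this style depends on etherealize() eventually being called.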
Sender: jon@floorboard.com
Message-ID: <381E1F73.4FCD369F@floorboard.com>
Date: Mon, 01 Nov 1999 15:17:07 -0800
From: Jonathan Biggar <jon@floorboard.com>
X-Mailer: Mozilla 4.6 [en] (X11; U; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: Paul Kyzivat <paulk@roguewave.com>
CC: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA9140191@bos1.noblenet.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: 5+?!!gfZd9pI&e9b~L!!

Paul Kyzivat wrote:
> 
> > If your servant is thread-aware, just calling deactivate_object()
> > isn't sufficient, since there may be other pending requests
> > outstanding when the "destroy" occurs.  How do you prevent those
> > pending requests from seeing stale information?  The only safe way
> > is to have a destroy state flag in the servant anyway.
> 
> The idea is that all the request to destroy does is call deactivate.
> At that point the destruction is "in the pipeline". Any other
> operations
> already in progress go on to completion. (After all, that is what
> letting
> operations complete before etherealization is all about.)
> 
> They don't see any stale information because nothing is stale, yet.
> 
> When the etherealize finally happens, it does the actual work of
> destruction. (This is really easy for objects that aren't
> persistent, and
> for which incarnate always fails.)

Unless the other pending operations won't change in behavior due to the
fact that the object is "destroyed", you have a race condition.

-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat <paulk@roguewave.com>
To: "'Jonathan Biggar'" <jon@floorboard.com>,
        Paul Kyzivat
	 <paulk@roguewave.com>
Cc: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: RE: Issue 2772 discussion
Date: Tue, 2 Nov 1999 08:19:43 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="iso-8859-1"
X-UIDL: #"$!!(em!!F*=!!!R2e9


> From: Jonathan Biggar [mailto:jon@floorboard.com]
...
> Unless the other pending operations won't change in behavior 
> due to the
> fact that the object is "destroyed", you have a race condition.

Yes, but it fits a number of reasonable situations.

Let's put it another way. How would you go about implementing the
following use case:

You have a factory object. Clients contact it, and perform some
operation that creates a transient object for them to use. They use the
transient object for some period of time, and then, when done with it,
destroy it with an explicit call. The transient object has state while
it exists, and that state should go away when it is destroyed.

Sender: jon@floorboard.com
Message-ID: <381F1089.CD0F6F1A@floorboard.com>
Date: Tue, 02 Nov 1999 08:25:45 -0800
From: Jonathan Biggar <jon@floorboard.com>
X-Mailer: Mozilla 4.6 [en] (X11; U; SunOS 5.5.1 sun4m)
X-Accept-Language: en
MIME-Version: 1.0
To: Paul Kyzivat <paulk@roguewave.com>
CC: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA9140192@bos1.noblenet.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
X-UIDL: #JKd9c)$!!d"5!!L~c!!

Paul Kyzivat wrote:
> 
> > From: Jonathan Biggar [mailto:jon@floorboard.com]
> ...
> > Unless the other pending operations won't change in behavior
> > due to the
> > fact that the object is "destroyed", you have a race condition.
> 
> Yes, but it fits a number of reasonable situations.
> 
> Let's put it another way. How would you go about implementing the
> following
> use case:
> 
> You have a factory object. Clients contact it, and perform some
> operation
> that creates a transient object for them to use. They use the
> transient
> object for some period of time, and then, when done with it, destroy
> it with
> an explicit call. The transient object has state while it exists,
> and that
> state should go away when it is destroyed.

The way I implement most thread-aware objects:  with an extra state flag
that indicates whether the object is destroyed or not.  All methods
check the flag first and raise OBJECT_NOT_EXIST if it is set.  Then I
allow the usual cleanup methods (etherealize() or _remove_ref()) to
clean up the object once all of the operations have drained.

If you have a race condition, you have one.  I don't see any
"reasonable" situations if the servant is required to maintain state
that doesn't include the race condition.

-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org
From: Paul Kyzivat <paulk@roguewave.com>
To: "'Jonathan Biggar'" <jon@floorboard.com>,
        Paul Kyzivat
	 <paulk@roguewave.com>
Cc: "'orb_revision@omg.org'" <orb_revision@omg.org>
Subject: RE: Issue 2772 discussion
Date: Tue, 2 Nov 1999 14:43:33 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="iso-8859-1"
X-UIDL: *an!!^E"!!+;""!U1He9

> From: Jonathan Biggar [mailto:jon@floorboard.com]
...
> The way I implement most thread-aware objects:  with an extra 
> state flag
> that indicates whether the object is destroyed or not.  All methods
> check the flag first and raise OBJECT_NOT_EXIST.  Then I 
> allow the usual
> cleanup methods (etherealize() or _remove_ref()) to clean up 
> the object
> once all of the operations have drained.
> 
> If you have a race condition, you have one.  I don't see any
> "reasonable" situations if the servant is required to maintain state
> that doesn't include the race condition.

While there are situations where it is unavoidable, I usually search for
ways to avoid requiring that sort of checking at the beginning of every
method - it is both a pain and error prone.

Being thread-safe & thread-aware may require a mutex, but that is a
whole separate issue and is easily dealt with while generally ignoring
destruction on a method-by-method basis.

There are plenty of cases where this is not a problem. Consider for
example, a BindingIterator. It can be destroyed at any time by removing
its state and refusing to reincarnate it. But until that happens,
anybody who happens to access it will get some meaningful result.
Date: Tue, 9 Nov 1999 15:22:47 +1000 (EST)
From: Michi Henning <michi@ooc.com.au>
X-Sender: michi@bobo.triodia.com
To: Jonathan Biggar <jon@floorboard.com>
cc: Jishnu Mukerji <jis@fpk.hp.com>, orb_revision@omg.org
Subject: Issue 2772
In-Reply-To: <382794F0.96ECCA8A@floorboard.com>
Message-ID:
<Pine.HPX.4.05.9911091522000.18757-100000@bobo.triodia.com>
Organization: Object Oriented Concepts
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-UIDL: 3*U!!%E@!!#39!!k`Y!!

On Mon, 8 Nov 1999, Jonathan Biggar wrote:
> > 2772 - Jon
> 
> I sent out a "discussion point" proposal on 10/30 on this one, but
> only
> Paul responded.  Without more discussion, I don't feel comfortable
> making the proposal.

I am happy to go with Jon's suggestion. It seems to be the only
feasible
way to deal with the problem.

							Cheers,

								Michi.
--
Michi Henning               +61 7 3236 1633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
PO Box 372                  +61 7 3211 0047 (fax)
Annerley 4103               michi@ooc.com.au
AUSTRALIA
http://www.ooc.com.au/staff/michi-henning.html

From: Paul Kyzivat <paulk@roguewave.com>
To: "'Michi Henning'" <michi@ooc.com.au>,
        "'Jonathan Biggar'"
	 <jon@floorboard.com>
Cc: "'Jishnu Mukerji'" <jis@fpk.hp.com>,
        "'orb_revision@omg.org'"
	 <orb_revision@omg.org>
Subject: RE: Issue 2772
Date: Tue, 9 Nov 1999 11:33:25 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain;
	charset="iso-8859-1"
X-UIDL: m\g!!jJ\d9((Qe9m_^d9


> From: Michi Henning [mailto:michi@ooc.com.au]
...
> On Mon, 8 Nov 1999, Jonathan Biggar wrote:
...
> > I sent out a "discussion point" proposal on 10/30 on this one, but
> > only Paul responded.  Without more discussion, I don't feel
> > comfortable making the proposal.
>
> I am happy to go with Jon's suggestion. It seems to be the only
> feasible way to deal with the problem.

I was surprised that nobody else joined the discussion.
I feel strongly that this solution is a bad one.
If I have a servant activator, and I call deactivate_object, I have a
reason to expect that in due course my etherealize routine will be
called. Otherwise I am being forced into an implementation approach that
sets a "deactivate_requested" state into my servant and checks it on
every call.

Jon may like to code things that way. He may even think there is no
other way to do things. But I have implementations that expect the
etherealize, and I am convinced they work correctly. They don't have
problems with deadlocks because they are designed not to.

It isn't possible to remove all possibility of deadlock without severely
restricting the programming model. I think it is better to leave some of
the responsibility for deadlock prevention with the implementor.


Date: Tue, 29 Oct 2002 17:11:48 -0500 
From: Jishnu Mukerji <jishnu@hp.com> 
Organization: Software Global Business Unit, Hewlett-Packard 
X-Mailer: Mozilla 4.79 [en] (Win98; U) 
X-Accept-Language: en 
To: corba-rtf@omg.org 
Subject: Issue 2772 discussion 


http://cgi.omg.org/issues/issue2772.txt


Jon,


What would you like to see happen with this issue? Unless we can come up
with a workable resolution, I think this is a prime candidate for
closure with a warning that there is the possibility of this deadlock.
What do others think? Does Jon's proposed resolution run afoul of any of
their implementations?


Thanks,


Jishnu.

Sender: jon@floorboard.com 
Date: Wed, 30 Oct 2002 20:52:34 -0800 
From: Jonathan Biggar <jon@floorboard.com> 
X-Mailer: Mozilla 4.8 [en] (X11; U; SunOS 5.7 sun4u) 
X-Accept-Language: en 
To: Jishnu Mukerji <jishnu@hp.com> 
CC: corba-rtf@omg.org 
Subject: Re: Issue 2772 discussion 


Jishnu Mukerji wrote:
> 
> http://cgi.omg.org/issues/issue2772.txt
> 
> Jon,
> 
> What would you like to see happen with this issue? Unless we can come
> up with a workable resolution, I think this is a prime candidate for
> closure with a warning that there is the possibility of this deadlock.
> What do others think? Does Jon's proposed resolution run afoul of any
> of their implementations?


Ok, I've re-reviewed it and here are my thoughts.


I agree with Paul that there is code that would most likely break if the
proposal I made was adopted.  However, I think the likelihood of the
deadlock is just too great to ignore the situation.  It's just far too
easy to implement an object in a way that causes it to invoke (directly
or indirectly) another object in the same POA.  Add a ServantActivator
and a call to deactivate_object() and the deadlock *will* happen
eventually.  And since the invocation can be indirect through a call to
another server and back, it is impossible for the object implementation
to either completely avoid the problem or detect it when it happens.


So, what should we do?


1.  Punt, and leave it like it is.


I dislike this option quite a bit, since it is a timebomb that is
bound
to happen far too often, and restructuring code to avoid it is
difficult
or impossible.


2.  Make the change I proposed.


The advantage is that there is no IDL change, only behavior, but the
likelihood of code breaking makes this choice unattractive too.


3.  Go with a new brainstorm that I just had. :)


We can protect existing code from breaking and also allow new code to
avoid the deadlock using the method that I proposed originally by adding
a new policy object that activates the new semantics.  Something like:


module POA {


    const CORBA::PolicyType DEACTIVATION_POLICY_ID = XXX;


    enum DeactivationPolicyValue {
        CAN_NOT_CANCEL_DEACTIVATE,      // HELP!  Pithier names wanted!
        CAN_CANCEL_DEACTIVATE
    };


    local interface DeactivationPolicy {
        readonly attribute DeactivationPolicyValue value;
    };


};


with CAN_NOT_CANCEL_DEACTIVATE being the default, which preserves
current semantics.
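
How a POA implementation might consult such a policy can be sketched in standard C++. Only the two enum values come from the IDL above; PoaConfig, RequestAction and decide() are hypothetical names, and the real decision would sit inside the ORB's dispatch path.

```cpp
#include <cassert>

// Mirrors the enum from the proposed IDL.
enum class DeactivationPolicyValue {
    CAN_NOT_CANCEL_DEACTIVATE,
    CAN_CANCEL_DEACTIVATE
};

// Hypothetical outcomes for a request that targets an object in the map.
enum class RequestAction {
    Dispatch,                       // object is active, nothing special
    CancelDeactivationAndDispatch,  // new semantics
    WaitForDeactivation             // current semantics
};

// Hypothetical per-POA configuration; the default keeps today's behavior.
struct PoaConfig {
    DeactivationPolicyValue deactivation =
        DeactivationPolicyValue::CAN_NOT_CANCEL_DEACTIVATE;
};

// deactivation_pending: deactivate_object() has been called for the target
// object but etherealize() has not started yet.
RequestAction decide(const PoaConfig& cfg, bool deactivation_pending) {
    if (!deactivation_pending) return RequestAction::Dispatch;
    return cfg.deactivation == DeactivationPolicyValue::CAN_CANCEL_DEACTIVATE
               ? RequestAction::CancelDeactivationAndDispatch
               : RequestAction::WaitForDeactivation;
}
```

A default-constructed PoaConfig yields WaitForDeactivation for a pending deactivation, so existing servers would see no change unless they opt in.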


So, what do people think about this idea?


-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

Date: Wed, 30 Oct 2002 22:20:11 -0800 (PST) 
From: Ken Cavanaugh <Ken.Cavanaugh@Sun.COM> 
Reply-To: Ken Cavanaugh <Ken.Cavanaugh@Sun.COM> 
Subject: Re: Issue 2772 discussion 
To: jishnu@hp.com, jon@floorboard.com 
Cc: corba-rtf@omg.org 
X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.3.5 SunOS 5.7 sun4u sparc 


>From: Jonathan Biggar <jon@floorboard.com>
>X-Accept-Language: en
>MIME-Version: 1.0
>To: Jishnu Mukerji <jishnu@hp.com>
>CC: corba-rtf@omg.org
>Subject: Re: Issue 2772 discussion
>Content-Transfer-Encoding: 7bit
>
>Jishnu Mukerji wrote:
>> 
>> http://cgi.omg.org/issues/issue2772.txt
>> 
>> Jon,
>> 
>> What would you like to see happen with this issue. Unless we can
>come up
>> with a workable resolution, I think this is a prime candidate for
>> closure with a warning that there is the possibility of this
>deadlock.
>> What do others think? Does Jon's proposed resolution run afoul of
>any of
>> their implementations?
>
>Ok, I've re-reviewed it and here are my thoughts.
>
>I agree with Paul that there is code that would most likely break if
>the proposal I made was adopted.  However, I think the likelihood of
>the deadlock is just too great to ignore the situation.  It's just far
>too easy to implement an object in a way that causes it to invoke
>(directly or indirectly) another object in the same POA.  Add a
>ServantActivator and a call to deactivate_object() and the deadlock
>*will* happen eventually.  And since the invocation can be indirect
>through a call to another server and back, it is impossible for the
>object implementation to either completely avoid the problem or detect
>it when it happens.
>
>So, what should we do?
>
>1.  Punt, and leave it like it is.
>
>I dislike this option quite a bit, since it is a timebomb that is
>bound
>to happen far too often, and restructuring code to avoid it is
>difficult
>or impossible.
>
>2.  Make the change I proposed.
>
>The advantage is that there is no IDL change, only behavior, but the
>likelihood of code breaking makes this choice unattractive too.
>
>3.  Go with a new brainstorm that I just had. :)
>
>We can protect existing code from breaking and also allow new code to
>avoid the deadlock using the method that I proposed originally by
>adding
>a new policy object that activates the new semantics.  Something
>like:
>
>module POA {
>
>    const CORBA::PolicyType DEACTIVATION_POLICY_ID = XXX;
>
>    enum DeactivationPolicyValue {
>       CAN_NOT_CANCEL_DEACTIVATE,      // HELP!  Pithier names wanted!
>       CAN_CANCEL_DEACTIVATE
>    };
>
>    local interface DeactivationPolicy {
>       readonly attribute DeactivationPolicyValue value;
>    };
>
>};
>
>with CAN_NOT_CANCEL_DEACTIVATE being the default, which preserves
>current semantics.
>


Pro:
        This fix is really easy to implement, in that all I need to 
        change is one state transition in the AOM entry state machine
        in the Sun POA implementation.


Con:
        This does indeed trade the deadlock for a livelock problem,
        which is probably no real solution either.  Without the cancel
        semantics, a busy, complex system will deadlock in some cases
        when attempting to deactivate a servant; with it, a busy,
        complex server will probably never be able to deactivate a
        servant, due to frequent invocations that continually cancel
        any pending etherealization.  At least with the policy we can
        choose which bad behavior we want :-).
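
        To make the trade-off concrete, here is a rough sketch of the
        single state transition in question.  It is a toy Python model
        of one AOM entry, not code from any ORB: the class and method
        names (AomEntry, begin_request, WouldBlock, and so on) are
        invented for illustration, and the can_cancel flag merely
        stands in for the proposed CAN_CANCEL_DEACTIVATE policy value.
        Where a real POA would block the calling thread, the model
        raises WouldBlock instead.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    DEACTIVATING = auto()   # deactivate_object() called, requests still pending
    INACTIVE = auto()       # etherealized and removed from the map


class WouldBlock(Exception):
    """Raised where a real POA would block the calling thread."""


class AomEntry:
    """Toy model of one active object map entry (hypothetical, not an ORB API)."""

    def __init__(self, can_cancel):
        self.can_cancel = can_cancel   # stands in for CAN_CANCEL_DEACTIVATE
        self.state = State.ACTIVE
        self.pending = 0               # requests currently executing

    def begin_request(self):
        if self.state is State.DEACTIVATING:
            if not self.can_cancel:
                # Current semantics: wait for pending requests to drain.
                # If the caller is one of those requests, the wait never ends.
                raise WouldBlock("waiting for pending requests")
            # Proposed semantics: cancel the deactivation instead.
            self.state = State.ACTIVE
        elif self.state is State.INACTIVE:
            # Models reactivation, e.g. incarnate() via a ServantActivator.
            self.state = State.ACTIVE
        self.pending += 1

    def end_request(self):
        self.pending -= 1
        if self.state is State.DEACTIVATING and self.pending == 0:
            self.state = State.INACTIVE    # etherealize now

    def deactivate(self):
        if self.state is State.ACTIVE:
            self.state = State.INACTIVE if self.pending == 0 else State.DEACTIVATING


def nested_call_after_deactivate(can_cancel):
    """An operation deactivates its own object, then calls back into it."""
    entry = AomEntry(can_cancel)
    entry.begin_request()        # outer invocation
    entry.deactivate()           # servant deactivates itself
    try:
        entry.begin_request()    # collocated call back into the same object
    except WouldBlock:
        return "deadlock"
    entry.end_request()
    entry.end_request()
    return entry.state.name
```

        Under these assumptions the non-cancelling entry reports the
        deadlock, while the cancelling entry ends up ACTIVE again: the
        deactivation was silently dropped, which is exactly the
        livelock risk when requests keep arriving.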
        
I guess if this was offered as a resolution, I would probably vote for
it, but a better solution seems desirable (and no, I haven't thought of
a better solution either).  We could also consider that this issue has
been in the archives for 3 years, and has probably been an issue since
the POA spec was finalized.  If this has never been observed as a real
problem in systems after 5 years, perhaps we should not make the fix.
However, if we decide to make no change, it might be worth adding a
note in the spec describing this deadlock scenario.


Ken.

Date: Mon, 04 Nov 2002 15:32:58 -0500 
From: Jishnu Mukerji <jishnu@hp.com> 
Organization: Software Global Business Unit, Hewlett-Packard 
X-Mailer: Mozilla 4.79 [en] (Win98; U) 
X-Accept-Language: en 
To: Ken Cavanaugh <Ken.Cavanaugh@sun.com> 
Cc: jon@floorboard.com, corba-rtf@omg.org 
Subject: Re: Issue 2772 discussion 


I would prefer to make no change, but add a crisp note to the
specification describing the possibility of deadlock, and leave it at
that.


If that seems agreeable, could Jon or Ken please provide a crisp small
paragraph describing the deadlock in a form that can be included as a
note at an appropriate place in the spec. Identification of the
appropriate place in the spec would be most appreciated too. Please
use formal/02-06-01 as the base document.


Thanks,


Jishnu.


Ken Cavanaugh wrote:
> 
> >From: Jonathan Biggar <jon@floorboard.com>
> >X-Accept-Language: en
> >MIME-Version: 1.0
> >To: Jishnu Mukerji <jishnu@hp.com>
> >CC: corba-rtf@omg.org
> >Subject: Re: Issue 2772 discussion
> >Content-Transfer-Encoding: 7bit
> >
> >Jishnu Mukerji wrote:
> >>
> >> http://cgi.omg.org/issues/issue2772.txt
> >>
> >> Jon,
> >>
> >> What would you like to see happen with this issue? Unless we can
> >> come up with a workable resolution, I think this is a prime
> >> candidate for closure with a warning that there is the possibility
> >> of this deadlock. What do others think? Does Jon's proposed
> >> resolution run afoul of any of their implementations?
> >
> >Ok, I've re-reviewed it and here are my thoughts.
> >
> >I agree with Paul that there is code that would most likely break
> >if the proposal I made was adopted.  However, I think the likelihood
> >of the deadlock is just too great to ignore the situation.  It's
> >just far too easy to implement an object in a way that causes it to
> >invoke (directly or indirectly) another object in the same POA.  Add
> >a ServantActivator and a call to deactivate_object() and the
> >deadlock *will* happen eventually.  And since the invocation can be
> >indirect through a call to another server and back, it is impossible
> >for the object implementation to either completely avoid the problem
> >or detect it when it happens.
> >
> >So, what should we do?
> >
> >1.  Punt, and leave it like it is.
> >
> >I dislike this option quite a bit, since it is a timebomb that is
> >bound to happen far too often, and restructuring code to avoid it is
> >difficult or impossible.
> >
> >2.  Make the change I proposed.
> >
> >The advantage is that there is no IDL change, only behavior, but the
> >likelihood of code breaking makes this choice unattractive too.
> >
> >3.  Go with a new brainstorm that I just had. :)
> >
> >We can protect existing code from breaking and also allow new code
> >to avoid the deadlock using the method that I proposed originally by
> >adding a new policy object that activates the new semantics.
> >Something like:
> >
> >module POA {
> >
> >    const CORBA::PolicyType DEACTIVATION_POLICY_ID = XXX;
> >
> >    enum DeactivationPolicyValue {
> >       CAN_NOT_CANCEL_DEACTIVATE,      // HELP!  Pithier names wanted!
> >       CAN_CANCEL_DEACTIVATE
> >    };
> >
> >    local interface DeactivationPolicy {
> >       readonly attribute DeactivationPolicyValue value;
> >    };
> >
> >};
> >
> >with CAN_NOT_CANCEL_DEACTIVATE being the default, which preserves
> >current semantics.
> >
> 
> Pro:
>         This fix is really easy to implement, in that all I need to
>         change is one state transition in the AOM entry state
>         machine in the Sun POA implementation.
> 
> Con:
>         This does indeed trade the deadlock for a livelock problem,
>         which is probably no real solution either.  Without the
>         cancel semantics, a busy, complex system will deadlock in
>         some cases when attempting to deactivate a servant; with it,
>         a busy, complex server will probably never be able to
>         deactivate a servant, due to frequent invocations that
>         continually cancel any pending etherealization.  At least
>         with the policy we can choose which bad behavior we want :-).
> 
> I guess if this was offered as a resolution, I would probably vote
> for it, but a better solution seems desirable (and no, I haven't
> thought of a better solution either).  We could also consider that
> this issue has been in the archives for 3 years, and has probably
> been an issue since the POA spec was finalized.  If this has never
> been observed as a real problem in systems after 5 years, perhaps we
> should not make the fix.  However, if we decide to make no change, it
> might be worth adding a note in the spec describing this deadlock
> scenario.
> 
> Ken.


-- 
Jishnu Mukerji
Senior Systems Architect        1001 Frontier Road, Suite 300
Technology Office               Bridgewater NJ 08807, USA
Software Global Business Unit   Tel: +1 908 243 8924
Hewlett-Packard Company         Fax: +1 908 243 8850
                                mailto: jishnu@hp.com


Sender: jbiggar@Resonate.com 
Date: Tue, 05 Nov 2002 10:56:29 -0800 
From: Jonathan Biggar <jon@floorboard.com> 
X-Mailer: Mozilla 4.8 [en] (X11; U; SunOS 5.8 sun4u) 
X-Accept-Language: en 
To: Jishnu Mukerji <jishnu@hp.com> 
CC: Ken Cavanaugh <Ken.Cavanaugh@sun.com>, corba-rtf@omg.org 
Subject: Re: Issue 2772 discussion 


Jishnu Mukerji wrote:
> 
> I would prefer to make no change, but add a crisp note to the
> specification describing the possibility of deadlock, and leave it at
> that.
> 
> If that seems agreeable, could Jon or Ken please provide a crisp
> small paragraph describing the deadlock in a form that can be
> included as a note at an appropriate place in the spec.
> Identification of the appropriate place in the spec would be most
> appreciated too. Please use formal/02-06-01 as the base document.


I still see this deadlock as a ticking timebomb, which is not hard to
trigger.  All you need is a ServantActivator, a call to
deactivate_object(), and two objects in separate servers that can
possibly invoke methods on each other, and eventually the deadlock
*will* occur.  An obvious use case that triggers the deadlock would be
a name server that implements an eviction pattern using a
ServantActivator.  Federate that with another name server with name
bindings pointing both directions and, depending on the serialization
requirements of the naming context implementation... BOOM!


I could live with adding a clear warning of the deadlock for the short
term, but in the long term, this needs a solution.


-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org


Date: Tue, 05 Nov 2002 14:07:11 -0500 
From: Jishnu Mukerji <jishnu@hp.com> 
Organization: Software Global Business Unit, Hewlett-Packard 
X-Mailer: Mozilla 4.79 [en] (Win98; U) 
X-Accept-Language: en 
To: Jonathan Biggar <jon@floorboard.com> 
Cc: Ken Cavanaugh <Ken.Cavanaugh@sun.com>, corba-rtf@omg.org 
Subject: Re: Issue 2772 discussion 


Jonathan Biggar wrote:
> 
> Jishnu Mukerji wrote:
> >
> > I would prefer to make no change, but add a crisp note to the
> > specification describing the possibility of deadlock, and leave it
> > at that.
> >
> > If that seems agreeable, could Jon or Ken please provide a crisp
> > small paragraph describing the deadlock in a form that can be
> > included as a note at an appropriate place in the spec.
> > Identification of the appropriate place in the spec would be most
> > appreciated too. Please use formal/02-06-01 as the base document.
> 
> I still see this deadlock as a ticking timebomb, which is not hard
> to trigger.  All you need is a ServantActivator, a call to
> deactivate_object(), and two objects in separate servers that can
> possibly invoke methods on each other, and eventually the deadlock
> *will* occur.  An obvious use case that triggers the deadlock would
> be a name server that implements an eviction pattern using a
> ServantActivator.  Federate that with another name server with name
> bindings pointing both directions and, depending on the
> serialization requirements of the naming context implementation...
> BOOM!
> 
> I could live with adding a clear warning of the deadlock for the
> short term, but in the long term, this needs a solution.


But then we have to include a livelock warning instead of a deadlock
warning anyway, no? At least that is what I understood from Ken's
message. We could apply Jon's fix and then come up with a crisp
description of the livelock problem and include that in the
resolution.


Comments? Jon? Ken?


Thanks,


Jishnu.


Reply-To: "Michi Henning" <michi@triodia.com> 
From: "Michi Henning" <michi@triodia.com> 
To: "Jishnu Mukerji" <jishnu@hp.com>, "Jonathan Biggar"
<jon@floorboard.com> 
Cc: "Ken Cavanaugh" <Ken.Cavanaugh@sun.com>, <corba-rtf@omg.org> 
Subject: Re: Issue 2772 discussion 
Date: Wed, 6 Nov 2002 06:20:04 +1000 
Organization: Triodia Technologies 
X-Mailer: Microsoft Outlook Express 6.00.2800.1106 


> Jonathan Biggar wrote:
> >
> > Jishnu Mukerji wrote:
> > >
> > > I would prefer to make no change, but add a crisp note to the
> > > specification describing the possibility of deadlock, and leave
> > > it at that.
> > >
> > > If that seems agreeable, could Jon or Ken please provide a crisp
> > > small paragraph describing the deadlock in a form that can be
> > > included as a note at an appropriate place in the spec.
> > > Identification of the appropriate place in the spec would be most
> > > appreciated too. Please use formal/02-06-01 as the base document.
> >
> > I still see this deadlock as a ticking timebomb, which is not hard
> > to trigger.  All you need is a ServantActivator, a call to
> > deactivate_object(), and two objects in separate servers that can
> > possibly invoke methods on each other, and eventually the deadlock
> > *will* occur.  An obvious use case that triggers the deadlock
> > would be a name server that implements an eviction pattern using a
> > ServantActivator.  Federate that with another name server with
> > name bindings pointing both directions and, depending on the
> > serialization requirements of the naming context implementation...
> > BOOM!
> >
> > I could live with adding a clear warning of the deadlock for the
> > short term, but in the long term, this needs a solution.
>
> But then we have to include a livelock warning instead of a deadlock
> warning anyway, no? At least that is what I understood from Ken's
> message. We could apply Jon's fix and then come up with a crisp
> description of the livelock problem and include that in the
> resolution.


The important point here is that the application programmer is
defenseless against the deadlock problem, but can easily avoid the
livelock problem:


1. Add a _removed member to the servant and initialize it to false.
2. In destroy(), set the _removed member to true.
3. If a servant locator is in use, check the _removed member
   in preinvoke() and, if true, throw OBJECT_NOT_EXIST; if a
   servant locator is not in use, check the _removed member
   in each operation and, if true, throw OBJECT_NOT_EXIST.


As it turns out, most programs use this strategy anyway because,
for a persistent object, the only feasible place to destroy the
persistent data is in the servant destructor; the persistent data
is destroyed only if the _removed member is set. (It is not safe
to destroy the persistent data at any other time because other
operations that are in the servant in parallel with destroy()
in a threaded implementation may still need the persistent data.)


The net effect of all this is that, by using a _removed member,
I can easily avoid the livelock, don't have to worry about deadlock,
and, for threaded servers, I typically end up using such a member
anyway.
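

The three steps above can be sketched roughly as follows. This is a
toy Python model rather than real servant code: the Servant class,
its finalize() method (standing in for the servant destructor), the
dict used as a persistent store, and the ObjectNotExist exception
(standing in for OBJECT_NOT_EXIST) are all names made up for the
illustration, and only the no-locator variant of step 3 is shown.

```python
class ObjectNotExist(Exception):
    """Stands in for the OBJECT_NOT_EXIST system exception."""


class Servant:
    """Toy servant guarding its operations with a _removed flag."""

    def __init__(self, store, key):
        self._store = store       # hypothetical persistent store (a dict here)
        self._key = key
        self._removed = False     # step 1: starts out false

    def _check_exists(self):
        # Step 3 (no servant locator): every operation checks the flag first.
        if self._removed:
            raise ObjectNotExist(self._key)

    def get(self):
        self._check_exists()
        return self._store[self._key]

    def destroy(self):
        self._check_exists()
        self._removed = True      # step 2: mark the object as destroyed
        # A real servant would also call deactivate_object() here.

    def finalize(self):
        # Stands in for the servant destructor: the persistent data is
        # removed only if destroy() was called, never on plain eviction.
        if self._removed:
            self._store.pop(self._key, None)
```

Once destroy() has run, any later request is rejected up front instead
of reactivating the object, so it cannot keep cancelling the pending
deactivation.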


Cheers,


Michi.
--
Michi Henning              Ph: +61 4 1118-2700
Triodia Technologies       http://www.triodia.com/staff/michi


Date: Wed, 06 Nov 2002 11:50:23 -0500 
From: Jishnu Mukerji <jishnu@hp.com> 
Organization: Software Global Business Unit, Hewlett-Packard 
X-Mailer: Mozilla 4.79 [en] (Win98; U) 
X-Accept-Language: en 
To: Michi Henning <michi@triodia.com> 
Cc: Jonathan Biggar <jon@floorboard.com>, 
   Ken Cavanaugh <Ken.Cavanaugh@sun.com>, corba-rtf@omg.org 
Subject: Re: Issue 2772 discussion 


Michi Henning wrote:
> 


> The important point here is that the application programmer is
> defenseless against the deadlock problem, but can easily avoid the
> livelock problem:
> 
> 1. Add a _removed member to the servant and initialize it to false.
> 2. In destroy, set the _removed member to true.
> 3. If a servant locator is in use, check the _removed member
>    in preinvoke() and throw OBJECT_NOT_EXIST; if a
>    servant locator is not in use, check the _removed member
>    and, if true, throw OBJECT_NOT_EXIST.
> 
> As it turns out, most programs use this strategy anyway because,
> for a persistent object, the only feasible place to destroy the
> persistent data is in the servant destructor; the persistent data
> is destroyed only if the _removed member is set. (It is not safe
> to destroy the persistent data at any other time because other
> operations that are in the servant in parallel with destroy()
> in a threaded implementation may still need the persistent data.)
> 
> The net effect of all this is that, by using a _removed member,
> I can easily avoid the livelock, don't have to worry about deadlock,
> and, for threaded servers, I typically end up using such a member
> anyway.


That is a pretty solid and satisfying argument for going ahead with
the deadlock fix and documenting the livelock avoidance in a
non-normative note perhaps.


Jon, I could not find any specific suggestion for text to fix this.
Would you or Michi be able to produce it soon and send it to the list
for discussion?


Thanks,


Jishnu.


Sender: jbiggar@Resonate.com 
Date: Wed, 06 Nov 2002 11:16:59 -0800 
From: Jonathan Biggar <jon@floorboard.com> 
X-Mailer: Mozilla 4.8 [en] (X11; U; SunOS 5.8 sun4u) 
X-Accept-Language: en 
To: Jishnu Mukerji <jishnu@hp.com> 
CC: Michi Henning <michi@triodia.com>, Ken Cavanaugh
<Ken.Cavanaugh@sun.com>, 
   corba-rtf@omg.org 
Subject: Re: Issue 2772 discussion 


Jishnu Mukerji wrote:
> > The important point here is that the application programmer is
> > defenseless against the deadlock problem, but can easily avoid the
> > livelock problem:
> >
> > 1. Add a _removed member to the servant and initialize it to false.
> > 2. In destroy, set the _removed member to true.
> > 3. If a servant locator is in use, check the _removed member
> >    in preinvoke() and throw OBJECT_NOT_EXIST; if a
> >    servant locator is not in use, check the _removed member
> >    and, if true, throw OBJECT_NOT_EXIST.
> >
> > As it turns out, most programs use this strategy anyway because,
> > for a persistent object, the only feasible place to destroy the
> > persistent data is in the servant destructor; the persistent data
> > is destroyed only if the _removed member is set. (It is not safe
> > to destroy the persistent data at any other time because other
> > operations that are in the servant in parallel with destroy()
> > in a threaded implementation may still need the persistent data.)
> >
> > The net effect of all this is that, by using a _removed member,
> > I can easily avoid the livelock, don't have to worry about
> > deadlock, and, for threaded servers, I typically end up using such
> > a member anyway.
> 
> That is a pretty solid and satisfying argument for going ahead with
> the deadlock fix and documenting the livelock avoidance in a
> non-normative note perhaps.
> 
> Jon, I could not find any specific suggestion for text to fix this.
> Would you or Michi be able to produce it soon and send it to the list
> for discussion?


Actually, Michi's solution only works when the call to
deactivate_object() is intended to permanently destroy the object.  It
still leaves the possibility of a livelock if you are just trying to
remove the object's servant from memory, but allow it to be
reactivated later.  However, in that case, you have to have an object
that consistently has 1 or more invocations active *all* the time to
see the livelock, which should be *very* rare.  Even that, though, can
be solved if you move the object to its own POA and POAManager, and
then use hold_requests() followed by activate() on the POAManager to
get enough breathing room to deactivate the object.
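

As a rough illustration of that last idea, here is a toy Python model
of the hold-then-activate sequence.  It is only a sketch under
simplifying assumptions: ToyPoaManager is an invented class that
borrows the hold_requests() and activate() names, tracks a single
object, and replaces threads with explicit dispatch() and complete()
calls; try_evict() is a made-up stand-in for deactivating the object
once nothing is executing.

```python
from collections import deque


class ToyPoaManager:
    """Toy stand-in for a POAManager that guards a single object."""

    def __init__(self):
        self.holding = False
        self.queue = deque()            # requests held back while holding
        self.active = 0                 # requests currently executing
        self.servant_in_memory = True

    def hold_requests(self):
        self.holding = True

    def activate(self):
        # Leave the holding state and release everything that was queued.
        self.holding = False
        while self.queue:
            self.dispatch(self.queue.popleft())

    def dispatch(self, name):
        if self.holding:
            self.queue.append(name)
            return "queued"
        self.servant_in_memory = True   # models reactivation on demand
        self.active += 1
        return "running"

    def complete(self):
        self.active -= 1

    def try_evict(self):
        # Eviction can only finish once no request is executing.
        if self.active == 0:
            self.servant_in_memory = False
        return not self.servant_in_memory
```

In this model, a request that arrives while the manager is holding is
queued rather than started, so the running requests can drain and
try_evict() succeeds; activate() then lets the queued request
reactivate the object.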


I'd like to call for a "sense of the RTF" poll on this one though.
The question is whether to just fix the deadlock by changing the spec
to allow the POA to cancel a deactivation of an object, or to add the
additional Policy object that I mentioned previously in the thread to
allow the programmer to choose between the old and new behaviors.
Since there is still the possibility that existing code could be
broken by the new behavior, I think the latter is the better approach.


-- 
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org