Issue 2772: Potential deadlock with POA::deactivate_object() (corba-rtf)

Source: (, )
Nature: Uncategorized Issue
Severity:
Summary: The draft CORBA 2.3 spec (ptc/99-03-07) does not deal with a potential deadlock situation. If an object is explicitly deactivated with POA::deactivate_object(), the object remains in the active object map until all operations pending on the object have completed. Any attempts to reactivate the object (implicitly via a ServantActivator, or explicitly via activate_object_with_id()) must block until the pending invocations have completed. However, if a servant's implementation of an object deactivates the object and then (directly or indirectly through a call to another collocated object) reactivates the object, the invocation will deadlock.
Resolution: Deferred to next RTF
Revised Text:
Actions taken:
June 28, 1999: received issue
April 11, 2012: Deferred

Discussion:

End of Annotations:=====

Sender: jon@floorboard.com
Message-ID: <381BDE49.7E91D47C@floorboard.com>
Date: Sat, 30 Oct 1999 23:14:33 -0700
From: Jonathan Biggar
To: orb_revision@omg.org
Subject: Issue 2772 discussion

I've given some thought to this issue, and the only solution I can come up with is to allow the POA to cancel the deactivation of an object if it needs to be activated again before the activation is complete. This changes the deadlock into a potential livelock situation instead.

This change would be user-visible however, so I thought that I'd bring it up and see what everyone thought.

Here is the information about the issue:

Issue 2772: Potential deadlock with POA::deactivate_object() (orb_revision)

Summary: The draft CORBA 2.3 spec (ptc/99-03-07) does not deal with a potential deadlock situation. If an object is explicitly deactivated with POA::deactivate_object(), the object remains in the active object map until all operations pending on the object have completed. Any attempts to reactivate the object (implicitly via a ServantActivator, or explicitly via activate_object_with_id()) must block until the pending invocations have completed. However, if a servant's implementation of an object deactivates the object and then (directly or indirectly through a call to another collocated object) reactivates the object, the invocation will deadlock.

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat
To: "'jon@floorboard.com'", "'orb_revision@omg.org'"
Subject: RE: Issue 2772 discussion
Date: Mon, 1 Nov 1999 09:07:05 -0500

> From: jon@floorboard.com [mailto:jon@floorboard.com]
> I've given some thought to this issue, and the only solution I can come up with is to allow the POA to cancel the deactivation of an object if it needs to be activated again before the activation is complete. This changes the deadlock into a potential livelock situation instead.
>
> This change would be user-visible however, so I thought that I'd bring it up and see what everyone thought.

Can you explain a bit more about your potential solution? (I am concerned the cure may be worse than the disease.)

Would this only work if an activation was attempted with the same servant and ID?

Does this mean that a ServantActivator would need to be permitted to execute to completion, and then if it happened to be for an ObjectId in the process of being activated, and if the servant happened to be the one being deactivated for that ID, then the deactivation would be cancelled?

(I would like the option to stall calls to servant activators in this case rather than let them complete and stall acting on their result.)

This seems to present many complex new implementation challenges, and possibly new race conditions. (E.g., What if we run a ServantActivator under the above situation but it returns a different servant than the one being deactivated?)

Sender: jon@floorboard.com
Message-ID: <381DD4A9.6618B4E6@floorboard.com>
Date: Mon, 01 Nov 1999 09:58:02 -0800
From: Jonathan Biggar
To: Paul Kyzivat
CC: "'orb_revision@omg.org'"
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA9140187@bos1.noblenet.com>

Paul Kyzivat wrote:
>
> > I've given some thought to this issue, and the only solution I can come up with is to allow the POA to cancel the deactivation of an object if it needs to be activated again before the activation is complete. This changes the deadlock into a potential livelock situation instead.
> >
> > This change would be user-visible however, so I thought that I'd bring it up and see what everyone thought.
>
> Can you explain a bit more about your potential solution?
> (I am concerned the cure may be worse than the disease.)
>
> Would this only work if an activation was attempted with the same servant and ID?
>
> Does this mean that a ServantActivator would need to be permitted to execute to completion, and then if it happened to be for an ObjectId in the process of being activated, and if the servant happened to be the one being deactivated for that ID, then the deactivation would be cancelled?
>
> (I would like the option to stall calls to servant activators in this case rather than let them complete and stall acting on their result.)
>
> This seems to present many complex new implementation challenges, and possibly new race conditions. (E.g., What if we run a ServantActivator under the above situation but it returns a different servant than the one being deactivated?)

I suppose I wasn't clear. :-)

In my idea, cancelling the deactivation of the object would only be allowed to occur before etherealize() was called. At that point, we already know that no outstanding invocations are pending, so the original race condition is no longer possible.

So this leaves the situation where deactivate_object() has been called, and we have not yet called etherealize() (or we have USE_ACTIVE_OBJECT_MAP_ONLY). In this case, I'd like to consider the ramifications of allowing the POA to cancel the deactivation and reactivate the servant to handle new incoming requests.

The deactivate_object() call is made to handle two conditions: either the object is being destroyed or it is just being deactivated pending future activation. In the former case, a thread-aware object must already have an internal flag that marks the object as "destroyed" so that it can raise OBJECT_NOT_EXIST on any pending requests during the destruction process. So rather than structuring operation implementations like this:

    void _fooImpl::foo_op()
    {
        Guard lock(mutex);

        if (destroyed)
            throw CORBA::OBJECT_NOT_EXIST();

        ...
    }

the code looks like this instead:

    void _fooImpl::foo_op()
    {
        Guard lock(mutex);

        if (destroyed) {
            try {
                deactivate_object(my_oid);  // need to tell POA to deactivate the object again!
            }
            catch (...) {
                // do nothing!
            }
            throw CORBA::OBJECT_NOT_EXIST();
        }
    }

If, however, deactivate_object() is called just to "passivate" the object, then cancelling the deactivation might have no effect on the servant implementation code. Since a thread-aware object must always code defensively, assuming that there are other pending requests when deactivate_object() is called, it already needs to delay any clean-up code to the etherealize() or servant destructor (when using reference counting on the servant).

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat
To: "'jon@floorboard.com'"
Cc: "'orb_revision@omg.org'"
Subject: RE: Issue 2772 discussion
Date: Mon, 1 Nov 1999 15:18:08 -0500

> From: jon@floorboard.com [mailto:jon@floorboard.com]
>
> I suppose I wasn't clear. :-)
>
> In my idea, cancelling the deactivation of the object would only be allowed to occur before etherealize() was called. At that point, we already know that no outstanding invocations are pending, so the original race condition is no longer possible.

So, if a request for the object comes in between deactivation and etherealization, you would just cancel the etherealization, reinstate the activation, and let the operation go through, without calling incarnate again?

> So this leaves the situation where deactivate_object() has been called, and we have not yet called etherealize() (or we have USE_ACTIVE_OBJECT_MAP_ONLY). In this case, I'd like to consider the ramifications of allowing the POA to cancel the deactivation and reactivate the servant to handle new incoming requests.
>
> The deactivate_object() call is made to handle two conditions: either the object is being destroyed or it is just being deactivated pending future activation. In the former case, a thread-aware object must already have an internal flag that marks the object as "destroyed" so that it can raise OBJECT_NOT_EXIST on any pending requests during the destruction process.

I think you have overlooked another implementation approach:

When I want to destroy the object I deactivate it. With any further attempt to use the object, the ServantActivator will simply refuse to do so, causing the client to get OBJECT_NOT_EXIST.

The servant is simpler, because the object exists as long as the servant exists (or can be incarnated). It is the servant activator, which is involved with persistence, etc. that is the actual gatekeeper of existence.

To me this is a natural implementation approach. But it means that once I have decided I want to deactivate the servant, because I want to destroy the object, I don't want my decision overturned by the POA. (I may not have any other natural hook to cause the deactivation at a later time.)

I think this cure is indeed worse than the disease. I have several servers where a request to destroy an object is accomplished by deactivating it, with the majority of the logic happening in etherealize. This change would break them: anybody sending a request in the window just following the destroy call would in effect cancel the destroy.

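The approach Paul describes above, where destroying an object is just a deactivation and the ServantActivator acts as the gatekeeper of existence, can be sketched without an ORB roughly as follows. This is a toy model under stated assumptions, not POA code: ToyActivator, ToyServant, ObjectNotExist, the in-memory store_ and the destroy flag passed to etherealize() are hypothetical names introduced for illustration, and the real ServantActivator::etherealize() takes different parameters.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for CORBA::OBJECT_NOT_EXIST.
struct ObjectNotExist : std::runtime_error {
    ObjectNotExist() : std::runtime_error("OBJECT_NOT_EXIST") {}
};

// Toy servant: it carries no "destroyed" flag of its own.
struct ToyServant {
    std::string state;
};

// Toy gatekeeper playing the role of a ServantActivator.
class ToyActivator {
public:
    // Record that an object exists (for example, called by a factory).
    void create(const std::string& oid, const std::string& state) {
        store_[oid] = state;
    }

    // Called when a request arrives for an object that is not active.
    std::unique_ptr<ToyServant> incarnate(const std::string& oid) {
        auto it = store_.find(oid);
        if (it == store_.end())
            throw ObjectNotExist();  // refuse: the object no longer exists
        return std::unique_ptr<ToyServant>(new ToyServant{it->second});
    }

    // Called once all pending requests have drained after a deactivation.
    // destroy == true models "deactivated in order to destroy" (assumption).
    void etherealize(const std::string& oid,
                     std::unique_ptr<ToyServant> servant,
                     bool destroy) {
        assert(servant != nullptr);
        if (destroy)
            store_.erase(oid);             // the actual destruction happens here
        else
            store_[oid] = servant->state;  // passivate: keep state for later
    }

private:
    std::map<std::string, std::string> store_;
};
```

In this model a request that arrives after etherealize() has run with destroy set gets ObjectNotExist from incarnate(), so the servant never checks a flag. That only works if a deactivation, once requested, is allowed to run to completion, which is the behavior Paul is asking to keep.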
Sender: jon@floorboard.com
Message-ID: <381E02A6.D4EBD5A1@floorboard.com>
Date: Mon, 01 Nov 1999 13:14:14 -0800
From: Jonathan Biggar
To: Paul Kyzivat
CC: "'orb_revision@omg.org'"
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA914018A@bos1.noblenet.com>

Paul Kyzivat wrote:
>
> > From: jon@floorboard.com [mailto:jon@floorboard.com]
> >
> > I suppose I wasn't clear. :-)
> >
> > In my idea, cancelling the deactivation of the object would only be allowed to occur before etherealize() was called. At that point, we already know that no outstanding invocations are pending, so the original race condition is no longer possible.
>
> So, if a request for the object comes in between deactivation and etherealization, you would just cancel the etherealization, reinstate the activation, and let the operation go through, without calling incarnate again?

That's the concept.

> > So this leaves the situation where deactivate_object() has been called, and we have not yet called etherealize() (or we have USE_ACTIVE_OBJECT_MAP_ONLY). In this case, I'd like to consider the ramifications of allowing the POA to cancel the deactivation and reactivate the servant to handle new incoming requests.
> >
> > The deactivate_object() call is made to handle two conditions: either the object is being destroyed or it is just being deactivated pending future activation. In the former case, a thread-aware object must already have an internal flag that marks the object as "destroyed" so that it can raise OBJECT_NOT_EXIST on any pending requests during the destruction process.
>
> I think you have overlooked another implementation approach:
>
> When I want to destroy the object I deactivate it.
> With any further attempt to use the object, the ServantActivator will simply refuse to do so, causing the client to get OBJECT_NOT_EXIST.

Umm, how does this work? The ServantActivator can't get involved until etherealize() is called, at which point deactivation is safe anyway.

> The servant is simpler, because the object exists as long as the servant exists (or can be incarnated). It is the servant activator, which is involved with persistence, etc. that is the actual gatekeeper of existence.
>
> To me this is a natural implementation approach. But it means that once I have decided I want to deactivate the servant, because I want to destroy the object, I don't want my decision overturned by the POA. (I may not have any other natural hook to cause the deactivation at a later time.)
>
> I think this cure is indeed worse than the disease. I have several servers where a request to destroy an object is accomplished by deactivating it, with the majority of the logic happening in etherealize. This change would break them - anybody sending a request in the window just following the destroy call would in effect cancel the destroy.

If your servant is thread-aware, just calling deactivate_object() isn't sufficient, since there may be other pending requests outstanding when the "destroy" occurs. How do you prevent those pending requests from seeing stale information? The only safe way is to have a destroy state flag in the servant anyway.

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat
To: "'Jonathan Biggar'", Paul Kyzivat
Cc: "'orb_revision@omg.org'"
Subject: RE: Issue 2772 discussion
Date: Mon, 1 Nov 1999 18:09:45 -0500

> If your servant is thread-aware, just calling deactivate_object() isn't sufficient, since there may be other pending requests outstanding when the "destroy" occurs. How do you prevent those pending requests from seeing stale information? The only safe way is to have a destroy state flag in the servant anyway.

The idea is that all the request to destroy does is call deactivate. At that point the destruction is "in the pipeline". Any other operations already in progress go on to completion. (After all, that is what letting operations complete before etherealization is all about.)

They don't see any stale information because nothing is stale, yet.

When the etherealize finally happens, it does the actual work of destruction. (This is really easy for objects that aren't persistent, and for which incarnate always fails.)

Sender: jon@floorboard.com
Message-ID: <381E1F73.4FCD369F@floorboard.com>
Date: Mon, 01 Nov 1999 15:17:07 -0800
From: Jonathan Biggar
To: Paul Kyzivat
CC: "'orb_revision@omg.org'"
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA9140191@bos1.noblenet.com>

Paul Kyzivat wrote:
>
> > If your servant is thread-aware, just calling deactivate_object() isn't sufficient, since there may be other pending requests outstanding when the "destroy" occurs. How do you prevent those pending requests from seeing stale information? The only safe way is to have a destroy state flag in the servant anyway.
>
> The idea is that all the request to destroy does is call deactivate. At that point the destruction is "in the pipeline". Any other operations already in progress go on to completion. (After all, that is what letting operations complete before etherealization is all about.)
>
> They don't see any stale information because nothing is stale, yet.
>
> When the etherealize finally happens, it does the actual work of destruction. (This is really easy for objects that aren't persistent, and for which incarnate always fails.)

Unless the other pending operations won't change in behavior due to the fact that the object is "destroyed", you have a race condition.

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat
To: "'Jonathan Biggar'", Paul Kyzivat
Cc: "'orb_revision@omg.org'"
Subject: RE: Issue 2772 discussion
Date: Tue, 2 Nov 1999 08:19:43 -0500

> From: Jonathan Biggar [mailto:jon@floorboard.com]
...
> Unless the other pending operations won't change in behavior due to the fact that the object is "destroyed", you have a race condition.

Yes, but it fits a number of reasonable situations.

Let's put it another way. How would you go about implementing the following use case:

You have a factory object. Clients contact it, and perform some operation that creates a transient object for them to use. They use the transient object for some period of time, and then, when done with it, destroy it with an explicit call. The transient object has state while it exists, and that state should go away when it is destroyed.

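One possible shape for the transient object in this use case, following the destroyed-flag pattern from the foo_op() example earlier in the thread, is sketched below. It is a standalone illustration rather than ORB code: TransientCounter, ObjectNotExist and check_alive() are hypothetical names, std::mutex stands in for the Guard/mutex pair, and the call to POA::deactivate_object() is only indicated by a comment.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for CORBA::OBJECT_NOT_EXIST.
struct ObjectNotExist : std::runtime_error {
    ObjectNotExist() : std::runtime_error("OBJECT_NOT_EXIST") {}
};

// Toy transient object: it has state while it exists and rejects every
// call once destroy() has been invoked.
class TransientCounter {
public:
    // An ordinary operation: check the flag first, then do the work.
    int increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        check_alive();
        ++count_;
        assert(count_ > 0);  // invariant: the counter only grows
        return count_;
    }

    // The explicit "destroy" operation called by the client.
    void destroy() {
        std::lock_guard<std::mutex> lock(mutex_);
        check_alive();
        destroyed_ = true;
        // A real servant would call POA::deactivate_object() here and free
        // its state later, in etherealize() or in the servant destructor,
        // once all pending requests have drained.
    }

private:
    // Must be called with mutex_ held.
    void check_alive() const {
        if (destroyed_)
            throw ObjectNotExist();
    }

    std::mutex mutex_;
    bool destroyed_ = false;
    int count_ = 0;
};
```

Requests that were already queued when destroy() ran see the flag and fail with ObjectNotExist instead of touching state that is about to go away. The cost is the per-method check, which is the point of contention in the replies that follow.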
Sender: jon@floorboard.com
Message-ID: <381F1089.CD0F6F1A@floorboard.com>
Date: Tue, 02 Nov 1999 08:25:45 -0800
From: Jonathan Biggar
To: Paul Kyzivat
CC: "'orb_revision@omg.org'"
Subject: Re: Issue 2772 discussion
References: <9B164B713EE9D211B6DC0090273CEEA9140192@bos1.noblenet.com>

Paul Kyzivat wrote:
>
> > From: Jonathan Biggar [mailto:jon@floorboard.com]
> ...
> > Unless the other pending operations won't change in behavior due to the fact that the object is "destroyed", you have a race condition.
>
> Yes, but it fits a number of reasonable situations.
>
> Let's put it another way. How would you go about implementing the following use case:
>
> You have a factory object. Clients contact it, and perform some operation that creates a transient object for them to use. They use the transient object for some period of time, and then, when done with it, destroy it with an explicit call. The transient object has state while it exists, and that state should go away when it is destroyed.

The way I implement most thread-aware objects: with an extra state flag that indicates whether the object is destroyed or not. All methods check the flag first and raise OBJECT_NOT_EXIST if it is set. Then I allow the usual cleanup methods (etherealize() or _remove_ref()) to clean up the object once all of the operations have drained.

If you have a race condition, you have one. I don't see any "reasonable" situations if the servant is required to maintain state that doesn't include the race condition.

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

From: Paul Kyzivat
To: "'Jonathan Biggar'", Paul Kyzivat
Cc: "'orb_revision@omg.org'"
Subject: RE: Issue 2772 discussion
Date: Tue, 2 Nov 1999 14:43:33 -0500

> From: Jonathan Biggar [mailto:jon@floorboard.com]
...
> The way I implement most thread-aware objects: with an extra state flag that indicates whether the object is destroyed or not. All methods check the flag first and raise OBJECT_NOT_EXIST. Then I allow the usual cleanup methods (etherealize() or _remove_ref()) to clean up the object once all of the operations have drained.
>
> If you have a race condition, you have one. I don't see any "reasonable" situations if the servant is required to maintain state that doesn't include the race condition.

While there are situations where it is unavoidable, I usually search for ways to avoid requiring that sort of checking at the beginning of every method - it is both a pain and error prone. Being thread-safe & thread-aware may require a mutex, but that is a whole separate issue and is easily dealt with while generally ignoring destruction on a method-by-method basis.

There are plenty of cases where this is not a problem. Consider for example, a BindingIterator. It can be destroyed at any time by removing its state and refusing to reincarnate it. But until that happens, anybody who happens to access it will get some meaningful result.

Date: Tue, 9 Nov 1999 15:22:47 +1000 (EST)
From: Michi Henning
To: Jonathan Biggar
cc: Jishnu Mukerji, orb_revision@omg.org
Subject: Issue 2772
In-Reply-To: <382794F0.96ECCA8A@floorboard.com>
Organization: Object Oriented Concepts

On Mon, 8 Nov 1999, Jonathan Biggar wrote:

> > 2772 - Jon
>
> I sent out a "discussion point" proposal on 10/30 on this one, but only Paul responded. Without more discussion, I don't feel comfortable making the proposal.

I am happy to go with Jon's suggestion. It seems to be the only feasible way to deal with the problem.

Cheers,

Michi.
--
Michi Henning               +61 7 3236 1633
Object Oriented Concepts    +61 4 1118 2700 (mobile)
PO Box 372                  +61 7 3211 0047 (fax)
Annerley 4103               michi@ooc.com.au
AUSTRALIA                   http://www.ooc.com.au/staff/michi-henning.html

From: Paul Kyzivat
To: "'Michi Henning'", "'Jonathan Biggar'"
Cc: "'Jishnu Mukerji'", "'orb_revision@omg.org'"
Subject: RE: Issue 2772
Date: Tue, 9 Nov 1999 11:33:25 -0500

> From: Michi Henning [mailto:michi@ooc.com.au]
...
> On Mon, 8 Nov 1999, Jonathan Biggar wrote:
...
> > I sent out a "discussion point" proposal on 10/30 on this one, but only Paul responded. Without more discussion, I don't feel comfortable making the proposal.
>
> I am happy to go with Jon's suggestion. It seems to be the only feasible way to deal with the problem.

I was surprised that nobody else joined the discussion.

I feel strongly that this solution is a bad one. If I have a servant activator, and I call deactivate_object, I have a reason to expect that in due course my etherealize routine will be called. Otherwise I am being forced into an implementation approach that sets a "deactivate_requested" state into my servant and checks it on every call.

Jon may like to code things that way. He may even think there is no other way to do things. But I have implementations that expect the etherealize, and I am convinced they work correctly. They don't have problems with deadlocks because they are designed not to.

It isn't possible to remove all possibility of deadlock without severely restricting the programming model. I think it is better to leave some of the responsibility for deadlock prevention with the implementor.

Date: Tue, 29 Oct 2002 17:11:48 -0500
From: Jishnu Mukerji
Organization: Software Global Business Unit, Hewlett-Packard
To: corba-rtf@omg.org
Subject: Issue 2772 discussion

http://cgi.omg.org/issues/issue2772.txt

Jon,

What would you like to see happen with this issue? Unless we can come up with a workable resolution, I think this is a prime candidate for closure with a warning that there is the possibility of this deadlock. What do others think? Does Jon's proposed resolution run afoul of any of their implementations?

Thanks,

Jishnu.

Sender: jon@floorboard.com
Date: Wed, 30 Oct 2002 20:52:34 -0800
From: Jonathan Biggar
To: Jishnu Mukerji
CC: corba-rtf@omg.org
Subject: Re: Issue 2772 discussion

Jishnu Mukerji wrote:
>
> http://cgi.omg.org/issues/issue2772.txt
>
> Jon,
>
> What would you like to see happen with this issue? Unless we can come up with a workable resolution, I think this is a prime candidate for closure with a warning that there is the possibility of this deadlock. What do others think? Does Jon's proposed resolution run afoul of any of their implementations?

Ok, I've re-reviewed it and here are my thoughts.

I agree with Paul that there is code that would most likely break if the proposal I made was adopted. However, I think the likelihood of the deadlock is just too great to ignore the situation. It's just far too easy to implement an object in a way that causes it to invoke (directly or indirectly) another object in the same POA. Add a ServantActivator and a call to deactivate_object() and the deadlock *will* happen eventually. And since the invocation can be indirect through a call to another server and back, it is impossible for the object implementation to either completely avoid the problem or detect it when it happens.

So, what should we do?

1. Punt, and leave it like it is.

I dislike this option quite a bit, since it is a timebomb that is bound to happen far too often, and restructuring code to avoid it is difficult or impossible.

2. Make the change I proposed.

The advantage is that there is no IDL change, only behavior, but the likelihood of code breaking makes this choice unattractive too.

3. Go with a new brainstorm that I just had. :)

We can protect existing code from breaking and also allow new code to avoid the deadlock using the method that I proposed originally by adding a new policy object that activates the new semantics. Something like:

    module POA {

        const CORBA::PolicyType DEACTIVATION_POLICY_ID = XXX;

        enum DeactivationPolicyValue {
            CAN_NOT_CANCEL_DEACTIVATE,  // HELP! Pithier names wanted!
            CAN_CANCEL_DEACTIVATE
        };

        local interface DeactivationPolicy {
            readonly attribute DeactivationPolicyValue value;
        };

    };

with CAN_NOT_CANCEL_DEACTIVATE being the default, which preserves current semantics.

So, what do people think about this idea?

--
Jon Biggar
Floorboard Software
jon@floorboard.com
jon@biggar.org

Date: Wed, 30 Oct 2002 22:20:11 -0800 (PST)
From: Ken Cavanaugh
Reply-To: Ken Cavanaugh
Subject: Re: Issue 2772 discussion
To: jishnu@hp.com, jon@floorboard.com
Cc: corba-rtf@omg.org

>From: Jonathan Biggar
>To: Jishnu Mukerji
>CC: corba-rtf@omg.org
>Subject: Re: Issue 2772 discussion
>
>Jishnu Mukerji wrote:
>>
>> http://cgi.omg.org/issues/issue2772.txt
>>
>> Jon,
>>
>> What would you like to see happen with this issue? Unless we can come up with a workable resolution, I think this is a prime candidate for closure with a warning that there is the possibility of this deadlock. What do others think? Does Jon's proposed resolution run afoul of any of their implementations?
>
>Ok, I've re-reviewed it and here are my thoughts.
>
>I agree with Paul that there is code that would most likely break if the proposal I made was adopted. However, I think the likelihood of the deadlock is just too great to ignore the situation. It's just far too easy to implement an object in a way that causes it to invoke (directly or indirectly) another object in the same POA. Add a ServantActivator and a call to deactivate_object() and the deadlock *will* happen eventually. And since the invocation can be indirect through a call to another server and back, it is impossible for the object implementation to either completely avoid the problem or detect it when it happens.
>
>So, what should we do?
>
>1. Punt, and leave it like it is.
>
>I dislike this option quite a bit, since it is a timebomb that is bound to happen far too often, and restructuring code to avoid it is difficult or impossible.
>
>2. Make the change I proposed.
>
>The advantage is that there is no IDL change, only behavior, but the likelihood of code breaking makes this choice unattractive too.
>
>3. Go with a new brainstorm that I just had. :)
>
>We can protect existing code from breaking and also allow new code to avoid the deadlock using the method that I proposed originally by adding a new policy object that activates the new semantics. Something like:
>
>module POA {
>
>    const CORBA::PolicyType DEACTIVATION_POLICY_ID = XXX;
>
>    enum DeactivationPolicyValue {
>        CAN_NOT_CANCEL_DEACTIVATE,  // HELP! Pithier names wanted!
>        CAN_CANCEL_DEACTIVATE
>    };
>
>    local interface DeactivationPolicy {
>        readonly attribute DeactivationPolicyValue value;
>    };
>
>};
>
>with CAN_NOT_CANCEL_DEACTIVATE being the default, which preserves current semantics.
>

Pro:
This fix is really easy to implement, in that all I need to change is one state transition in the AOM entry state machine in the Sun POA implementation.

Con:
This does indeed trade the deadlock for a livelock problem, which is probably no real solution either. Without the cancel semantics, a busy, complex system will deadlock in some cases when attempting to deactivate a servant; with it, a busy, complex server will probably never be able to deactivate a servant, due to frequent invocations that continually cancel any pending etherealization. At least with the policy we can choose which bad behavior we want :-).

I guess if this was offered as a resolution, I would probably vote for it, but a better solution seems desirable (and no, I haven't thought of a better solution either). We could also consider that this issue has been in the archives for 3 years, and has probably been an issue since the POA spec was finalized. If this has never been observed as a real problem in systems after 5 years, perhaps we should not make the fix.

However, if we decide to make no change, it might be worth adding a note in the spec describing this deadlock scenario.

Ken.

Date: Mon, 04 Nov 2002 15:32:58 -0500
From: Jishnu Mukerji
Organization: Software Global Business Unit, Hewlett-Packard
To: Ken Cavanaugh
Cc: jon@floorboard.com, corba-rtf@omg.org
Subject: Re: Issue 2772 discussion

I would prefer to make no change, but add a crisp note describing the possibility of deadlock in the specification, and leave it at that.

If that seems agreeable, could Jon or Ken please provide a crisp small paragraph describing the deadlock in a form that can be included as a note at an appropriate palce in the spec. Identification of the appropriate place in the spec would be most appreciated too. Please use formal/02-06-01 as the abse document. Thanks, Jishnu. Ken Cavanaugh wrote: > > >From: Jonathan Biggar > >X-Accept-Language: en > >MIME-Version: 1.0 > >To: Jishnu Mukerji > >CC: corba-rtf@omg.org > >Subject: Re: Issue 2772 discussion > >Content-Transfer-Encoding: 7bit > > > >Jishnu Mukerji wrote: > >> > >> http://cgi.omg.org/issues/issue2772.txt > >> > >> Jon, > >> > >> What would you like to see happen with this issue. Unless we can come up > >> with a workable resolution, I think this is a prime candidate for > >> closure with a warning that there is the possibility of this deadlock. > >> What do others think? Does Jon's proposed resolution run afoul of any of > >> their implementations? > > > >Ok, I've re-reviewed it and here are my thoughts. > > > >I agree with Paul that there is code that would most likely break if the > >propsal I made was adopted. However, I think the likelyhood of the > >deadlock is just too great to ignore the situation. It's just far too > >easy to implement an object in a way that causes it to invoke (directly > >or indirectly) another object in the same POA. Add a ServantActivator > >and a call to deactivate_object() and the deadlock *will* happen > >eventually. And since the invocation can be indirect through a call to > >another server and back, it is impossible for the object implementation > >to either completely avoid the problem or detect it when it happens. > > > >So, what should we do? > > > >1. Punt, and leave it like it is. > > > >I dislike this option quite a bit, since it is a timebomb that is bound > >to happen far too often, and restructuring code to avoid it is difficult > >or impossible. > > > >2. Make the change I proposed. 
> > > >The advantage is that there is no IDL change, only behavior, but the > >likelihood of code breaking makes this choice unattractive too. > > > >3. Go with a new brainstorm that I just had. :) > > > >We can protect existing code from breaking and also allow new code to > >avoid the deadlock using the method that I proposed originally by adding > >a new policy object that activates the new semantics. Something like: > > > >module POA { > > > > const CORBA::PolicyType DEACTIVATION_POLICY_ID = XXX; > > > > enum DeactivationPolicyValue { > > CAN_NOT_CANCEL_DEACTIVATE, // HELP! Pithier names wanted! > > CAN_CANCEL_DEACTIVATE > > }; > > > > local interface DeactivationPolicy { > > readonly attribute DeactivationPolicyValue value; > > }; > > > >}; > > > >with CAN_NOT_CANCEL_DEACTIVATE being the default, which preserves > >current semantics. > > > > Pro: > This fix is really easy to implement, in that all I need to > change is one state transition in the AOM entry state machine > in the Sun POA implementation. > > Con: > This does indeed trade the deadlock for a livelock problem, > which is probably no real solution either. Without the cancel > semantics, a busy, complex system will deadlock in some cases > when attempting to deactivate a servant; with it, a busy, complex > server will probably never be able to deactivate a servant, > due to frequent invocations that continually cancel any > pending etherealization. At least with the policy we can > choose which bad behavior we want :-). > > I guess if this was offered as a resolution, I would probably vote for it, > but a better solution seems desirable (and no, I haven't thought of a better > solution either). We could also consider that this issue has been in the > archives for 3 years, and has probably been an issue since the POA > spec was finalized. If this has never been observed as a real problem > in systems after 5 years, perhaps we should not make the fix. 
> However, if we decide to make no change, it might be worth adding a note > in the spec describing this deadlock scenario. > > Ken. -- Jishnu Mukerji Senior Systems Architect 1001 Frontier Road, Suite 300 Technology Office Bridgewater NJ 08807, USA Software Global Business Unit Tel: +1 908 243 8924 Hewlett-Packard Company Fax: +1 908 243 8850 mailto: jishnu@hp.com Sender: jbiggar@Resonate.com Date: Tue, 05 Nov 2002 10:56:29 -0800 From: Jonathan Biggar X-Mailer: Mozilla 4.8 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en To: Jishnu Mukerji CC: Ken Cavanaugh , corba-rtf@omg.org Subject: Re: Issue 2772 discussion Jishnu Mukerji wrote: > > I would prefer to go with the make no change but add a crisp note > describing the possibility of deadlock in the specification and leave it > at that. > > If that seems agreeable, could Jon or Ken please provide a crisp small > paragraph describing the deadlock in a form that can be included as a > note at an appropriate place in the spec? Identification of the > appropriate place in the spec would be most appreciated too. Please use > formal/02-06-01 as the base document. I still see this deadlock as a ticking timebomb, which is not hard to trigger. All you need is a ServantActivator, a call to deactivate_object(), and two objects in separate servers that can possibly invoke methods on each other, and eventually the deadlock *will* occur. An obvious use case that triggers the deadlock would be a name server that implements an eviction pattern using a ServantActivator. Federate that with another name server with name bindings pointing both directions and, depending on the serialization requirements of the naming context implementation... BOOM! I could live with adding a clear warning of the deadlock for the short term, but in the long term, this needs a solution. 
-- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org Date: Tue, 05 Nov 2002 14:07:11 -0500 From: Jishnu Mukerji Organization: Software Global Business Unit, Hewlett-Packard X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en To: Jonathan Biggar Cc: Ken Cavanaugh , corba-rtf@omg.org Subject: Re: Issue 2772 discussion Jonathan Biggar wrote: > > Jishnu Mukerji wrote: > > > > I would prefer to go with the make no change but add a crisp note > > describing the possibility of deadlock in the specification and leave it > > at that. > > > > If that seems agreeable, could Jon or Ken please provide a crisp small > > paragraph describing the deadlock in a form that can be included as a > > note at an appropriate place in the spec? Identification of the > > appropriate place in the spec would be most appreciated too. Please use > > formal/02-06-01 as the base document. > > I still see this deadlock as a ticking timebomb, which is not hard to > trigger. All you need is a ServantActivator, a call to > deactivate_object(), and two objects in separate servers that can > possibly invoke methods on each other, and eventually the deadlock > *will* occur. An obvious use case that triggers the deadlock would be a > name server that implements an eviction pattern using a > ServantActivator. Federate that with another name server with name > bindings pointing both directions and, depending on the serialization > requirements of the naming context implementation... BOOM! > > I could live with adding a clear warning of the deadlock for the short > term, but in the long term, this needs a solution. But then we have to include a livelock warning instead of a deadlock warning anyway, no? At least that is what I understood from Ken's message. We could apply Jon's fix and then come up with a crisp description of the livelock problem and include that in the resolution. Comments? Jon? Ken? Thanks, Jishnu. 
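The trade-off debated in the messages above can be illustrated with a toy model of a single active object map (AOM) entry. This is only a sketch under stated assumptions, not the Sun POA state machine or any real POA API: AomEntry, EntryState, WouldBlock and reactivate_from_within_request are hypothetical names, and only the two policy values are taken from the IDL proposed in the thread. Python is used for brevity.

```python
from enum import Enum, auto


class DeactivationPolicyValue(Enum):
    # Names taken from the IDL proposed in the thread.
    CAN_NOT_CANCEL_DEACTIVATE = auto()
    CAN_CANCEL_DEACTIVATE = auto()


class EntryState(Enum):
    INACTIVE = auto()
    ACTIVE = auto()
    DEACTIVATING = auto()  # deactivate requested, requests still pending


class WouldBlock(Exception):
    """Hypothetical marker for the point where a real POA would block."""


class AomEntry:
    """Toy model of one active object map entry (an assumption)."""

    def __init__(self, policy):
        self.policy = policy
        self.state = EntryState.ACTIVE
        self.pending = 0  # requests currently executing on the object

    def begin_request(self):
        self.pending += 1

    def end_request(self):
        self.pending -= 1
        if self.state is EntryState.DEACTIVATING and self.pending == 0:
            self.state = EntryState.INACTIVE  # etherealization could run now

    def deactivate(self):
        # The entry stays in the map until pending requests have drained.
        if self.pending:
            self.state = EntryState.DEACTIVATING
        else:
            self.state = EntryState.INACTIVE

    def activate(self):
        # Simplification: activating an already active entry is a no-op here.
        if self.state is EntryState.DEACTIVATING:
            if self.policy is DeactivationPolicyValue.CAN_CANCEL_DEACTIVATE:
                self.state = EntryState.ACTIVE  # the one changed transition
                return
            # Current semantics: the caller waits for pending requests.
            raise WouldBlock("caller must wait for pending requests")
        self.state = EntryState.ACTIVE


def reactivate_from_within_request(policy):
    """Deactivate, then reactivate, an object from inside its own request."""
    entry = AomEntry(policy)
    entry.begin_request()  # the request now executing in the servant
    entry.deactivate()
    try:
        entry.activate()   # directly, or through a collocated call
    except WouldBlock:
        # The waiter is itself one of the pending requests, so the wait
        # can never end.
        return "deadlock"
    entry.end_request()
    return entry.state.name
```

Calling reactivate_from_within_request with CAN_NOT_CANCEL_DEACTIVATE returns "deadlock", and with CAN_CANCEL_DEACTIVATE it returns "ACTIVE". The same cancel transition is what lets steady traffic keep postponing etherealization, which is the livelock Ken describes.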
Reply-To: "Michi Henning" From: "Michi Henning" To: "Jishnu Mukerji" , "Jonathan Biggar" Cc: "Ken Cavanaugh" , Subject: Re: Issue 2772 discussion Date: Wed, 6 Nov 2002 06:20:04 +1000 Organization: Triodia Technologies X-Mailer: Microsoft Outlook Express 6.00.2800.1106 > Jonathan Biggar wrote: > > > > Jishnu Mukerji wrote: > > > > > > I would prefer to go with the make no change but add a crisp note > > > describing the possibility of deadlock in the specification and leave it > > > at that. > > > > > > If that seems agreeable, could Jon or Ken please provide a crisp small > > > paragraph describing the deadlock in a form that can be included as a > > > note at an appropriate place in the spec? Identification of the > > > appropriate place in the spec would be most appreciated too. Please use > > > formal/02-06-01 as the base document. > > > > I still see this deadlock as a ticking timebomb, which is not hard to > > trigger. All you need is a ServantActivator, a call to > > deactivate_object(), and two objects in separate servers that can > > possibly invoke methods on each other, and eventually the deadlock > > *will* occur. An obvious use case that triggers the deadlock would be a > > name server that implements an eviction pattern using a > > ServantActivator. Federate that with another name server with name > > bindings pointing both directions and, depending on the serialization > > requirements of the naming context implementation... BOOM! > > > > I could live with adding a clear warning of the deadlock for the short > > term, but in the long term, this needs a solution. > > But then we have to include a livelock warning instead of a deadlock > warning anyway, no? At least that is what I understood from Ken's > message. We could apply Jon's fix and then come up with a crisp > description of the livelock problem and include that in the resolution. 
The important point here is that the application programmer is defenseless against the deadlock problem, but can easily avoid the livelock problem: 1. Add a _removed member to the servant and initialize it to false. 2. In destroy, set the _removed member to true. 3. If a servant locator is in use, check the _removed member in preinvoke() and throw OBJECT_NOT_EXIST; if a servant locator is not in use, check the _removed member and, if true, throw OBJECT_NOT_EXIST. As it turns out, most programs use this strategy anyway because, for a persistent object, the only feasible place to destroy the persistent data is in the servant destructor; the persistent data is destroyed only if the _removed member is set. (It is not safe to destroy the persistent data at any other time because other operations that are in the servant in parallel with destroy() in a threaded implementation may still need the persistent data.) The net effect of all this is that, by using a _removed member, I can easily avoid the livelock, don't have to worry about deadlock, and, for threaded servers, I typically end up using such a member anyway. Cheers, Michi. -- Michi Henning Ph: +61 4 1118-2700 Triodia Technologies http://www.triodia.com/staff/michi Date: Wed, 06 Nov 2002 11:50:23 -0500 From: Jishnu Mukerji Organization: Software Global Business Unit, Hewlett-Packard X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en To: Michi Henning Cc: Jonathan Biggar , Ken Cavanaugh , corba-rtf@omg.org Subject: Re: Issue 2772 discussion Michi Henning wrote: > > The important point here is that the application programmer is defenseless > against the deadlock problem, but can easily avoid the livelock problem: > > 1. Add a _removed member to the servant and initialize it to false. > 2. In destroy, set the _removed member to true. > 3. 
If a servant locator is in use, check the _removed member > in preinvoke() and throw OBJECT_NOT_EXIST; if a > servant locator is not in use, check the _removed member > and, if true, throw OBJECT_NOT_EXIST. > > As it turns out, most programs use this strategy anyway because, > for a persistent object, the only feasible place to destroy the > persistent data is in the servant destructor; the persistent data > is destroyed only if the _removed member is set. (It is not safe > to destroy the persistent data at any other time because other > operations that are in the servant in parallel with destroy() > in a threaded implementation may still need the persistent data.) > > The net effect of all this is that, by using a _removed member, > I can easily avoid the livelock, don't have to worry about deadlock, > and, for threaded servers, I typically end up using such a member > anyway. That is a pretty solid and satisfying argument for going ahead with the deadlock fix and documenting the livelock avoidance in a non-normative note perhaps. Jon, I could not find any specific suggestion for text to fix this. Would you or Michi be able to produce it soon and send it to the list for discussion? Thanks, Jishnu. Sender: jbiggar@Resonate.com Date: Wed, 06 Nov 2002 11:16:59 -0800 From: Jonathan Biggar X-Mailer: Mozilla 4.8 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en To: Jishnu Mukerji CC: Michi Henning , Ken Cavanaugh , corba-rtf@omg.org Subject: Re: Issue 2772 discussion Jishnu Mukerji wrote: > > The important point here is that the application programmer is defenseless > > against the deadlock problem, but can easily avoid the livelock problem: > > > > 1. Add a _removed member to the servant and initialize it to false. > > 2. In destroy, set the _removed member to true. > > 3. 
If a servant locator is in use, check the _removed member > > in preinvoke() and throw OBJECT_NOT_EXIST; if a > > servant locator is not in use, check the _removed member > > and, if true, throw OBJECT_NOT_EXIST. > > > > As it turns out, most programs use this strategy anyway because, > > for a persistent object, the only feasible place to destroy the > > persistent data is in the servant destructor; the persistent data > > is destroyed only if the _removed member is set. (It is not safe > > to destroy the persistent data at any other time because other > > operations that are in the servant in parallel with destroy() > > in a threaded implementation may still need the persistent data.) > > > > The net effect of all this is that, by using a _removed member, > > I can easily avoid the livelock, don't have to worry about deadlock, > > and, for threaded servers, I typically end up using such a member > > anyway. > > That is a pretty solid and satisfying argument for going ahead with the > deadlock fix and documenting the livelock avoidance in a non-normative > note perhaps. > > Jon, I could not find any specific suggestion for text to fix this. > Would you or Michi be able to produce it soon and send it to the list > for discussion? Actually, Michi's solution only works when the call to deactivate_object() is intended to permanently destroy the object. It still leaves the possibility of a livelock if you are just trying to remove the object's servant from memory, but allow it to be reactivated later. However, in that case, you have to have an object that has consistently 1 or more invocations active *all* the time to see the livelock, which should be *very* rare. Even that, though, can be solved if you move the object to its own POA and POAManager, and then use hold_requests() followed by activate() on the POAManager to get enough breathing room to deactivate the object. I'd like to call for a "sense of the RTF" poll on this one though. 
The question is whether to just fix the deadlock by changing the spec to allow the POA to cancel a deactivation of an object, or to add the additional Policy object that I mentioned previously in the thread to allow the programmer to choose between the old and new behaviors. Since there is still the possibility that existing code could be broken by the new behavior, I think the latter is the better approach. -- Jon Biggar Floorboard Software jon@floorboard.com jon@biggar.org
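Michi's three steps, quoted earlier in this message, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from any ORB: Servant, ObjectNotExist, get_value and the dictionary used as a persistent store are all hypothetical stand-ins, etherealize() plays the role of the servant destructor, and the locking a threaded server would need around the flag is omitted.

```python
class ObjectNotExist(Exception):
    """Stand-in for the CORBA OBJECT_NOT_EXIST system exception."""


class Servant:
    """Hypothetical servant; a real one would derive from an IDL skeleton."""

    def __init__(self, store, key):
        self._store = store      # stand-in for persistent storage
        self._key = key
        self._removed = False    # step 1: the flag starts out false

    def _check_exists(self):
        # Step 3: every operation (or preinvoke(), when a servant locator
        # is in use) rejects requests once the object has been destroyed.
        if self._removed:
            raise ObjectNotExist(self._key)

    def get_value(self):
        self._check_exists()
        return self._store[self._key]

    def destroy(self):
        self._check_exists()
        self._removed = True     # step 2: mark the object as destroyed
        # A real servant would also call deactivate_object() here.

    def etherealize(self):
        # Plays the role of the servant destructor: the persistent data is
        # deleted only if destroy() ran, not when the servant is merely
        # evicted from memory.
        if self._removed:
            del self._store[self._key]
```

Because requests arriving after destroy() are rejected instead of being dispatched, they give the POA no reason to reactivate the object. As Jon notes above, this covers only permanent destruction; evicting a servant that should remain reactivatable is outside what the flag can help with.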