[dpdk-dev,v2] Change alarm cancel function to thread-safe:

Message ID 20140926114630.GA3930@hmsreliant.think-freely.org (mailing list archive)
State Not Applicable, archived
Headers

Commit Message

Neil Horman Sept. 26, 2014, 11:46 a.m. UTC
  On Thu, Sep 25, 2014 at 11:24:30PM +0000, Ananyev, Konstantin wrote:
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Thursday, September 25, 2014 6:24 PM
> > To: Ananyev, Konstantin
> > Cc: Jastrzebski, MichalX K; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Thu, Sep 25, 2014 at 04:03:48PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > > > Sent: Thursday, September 25, 2014 4:08 PM
> > > > To: Jastrzebski, MichalX K
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > > >
> > > > On Thu, Sep 25, 2014 at 01:56:08PM +0100, Michal Jastrzebski wrote:
> > > > >     Change alarm cancel function to thread-safe.
> > > > >     It eliminates a race between threads using rte_alarm_cancel and
> > > > >     rte_alarm_set.
> > > > >
> > > > > Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
> > > > > Reviewed-by: Michal Jastrzebski <michalx.k.jastrzebski@intel.com>
> > > > >
> > > > > ---
> > > > >  lib/librte_eal/common/include/rte_alarm.h |    3 +-
> > > > >  lib/librte_eal/linuxapp/eal/eal_alarm.c   |   68 ++++++++++++++++++-----------
> > > > >  2 files changed, 45 insertions(+), 26 deletions(-)
> > > > >
> > > >
> > > > > diff --git a/lib/librte_eal/common/include/rte_alarm.h b/lib/librte_eal/common/include/rte_alarm.h
> > > > > index d451522..e7cbaef 100644
> > > > > --- a/lib/librte_eal/common/include/rte_alarm.h
> > > > > +++ b/lib/librte_eal/common/include/rte_alarm.h
> > > > > @@ -76,7 +76,8 @@ typedef void (*rte_eal_alarm_callback)(void *arg);
> > > > >  int rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb, void *cb_arg);
> > > > >
> > > > >  /**
> > > > > - * Function to cancel an alarm callback which has been registered before.
> > > > > + * Function to cancel an alarm callback which has been registered before. If
> > > > > + * used outside alarm callback it wait for all callbacks to finish its execution.
> > > > >   *
> > > > >   * @param cb_fn
> > > > >   *  alarm callback
> > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > > index 480f0cb..ea8dfb4 100644
> > > > > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > > @@ -69,7 +69,8 @@ struct alarm_entry {
> > > > >  	struct timeval time;
> > > > >  	rte_eal_alarm_callback cb_fn;
> > > > >  	void *cb_arg;
> > > > > -	volatile int executing;
> > > > > +	volatile uint8_t executing;
> > > > > +	volatile pthread_t executing_id;
> > > > >  };
> > > > >
> > > > >  static LIST_HEAD(alarm_list, alarm_entry) alarm_list = LIST_HEAD_INITIALIZER();
> > > > > @@ -108,11 +109,13 @@ eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
> > > > >  			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
> > > > >  						ap->time.tv_usec <= now.tv_usec))){
> > > > >  		ap->executing = 1;
> > > > > +		ap->executing_id = pthread_self();
> > > > How exactly does this work?  From my read all alarm callbacks are handled by the
> > > > thread created in rte_eal_intr_init (which runs forever in
> > > > eal_intr_thread_main()).
> > >
> > > In current implementation - yes.
> > >
> > >  So every assignment to the above executing_id value
> > > > will be from that thread.  As such, anytime rte_eal_alarm_cancel is called from
> > > > within a callback we are guaranteed that:
> > > > a) the ap->executing flag is set to 1
> > > > b) the ap->executing_id value should equal whatever is returned from
> > > > pthread_self()
> > >
> > > Yes
> > >
> > > >
> > > > That will cause the executing counter local to the cancel function to get
> > > > incremented, meaning we will deadlock withing that do { ... } while (executing
> > > > != 0) loop, no?
> > >
> > > No, as for the case when cancel is called from callback:
> > > pthread_equal(ap->executing_id, pthread_self())
> > > would return non-zero value (which means threads ids are equal), so executing will not be incremented.
> > >
> > Ah, pthread_equal is one of the backwards functions that returns zero for
> > inequality.  Maybe then rewrite that as:
> > if (!pthread_equal(...)
> > 
> > So its clear that we're looking for inequality there to increment?
> > 
> > > >
> > > > >  		rte_spinlock_unlock(&alarm_list_lk);
> > > > >
> > > > >  		ap->cb_fn(ap->cb_arg);
> > > > >
> > > > >  		rte_spinlock_lock(&alarm_list_lk);
> > > > > +
> > > > >  		LIST_REMOVE(ap, next);
> > > > >  		rte_free(ap);
> > > > >  	}
> > > > > @@ -145,7 +148,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > > > >  	if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL)
> > > > >  		return -EINVAL;
> > > > >
> > > > > -	new_alarm = rte_malloc(NULL, sizeof(*new_alarm), 0);
> > > > > +	new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0);
> > > > >  	if (new_alarm == NULL)
> > > > >  		return -ENOMEM;
> > > > >
> > > > > @@ -156,7 +159,6 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
> > > > >  	new_alarm->cb_arg = cb_arg;
> > > > >  	new_alarm->time.tv_usec = (now.tv_usec + us) % US_PER_S;
> > > > >  	new_alarm->time.tv_sec = now.tv_sec + ((now.tv_usec + us) / US_PER_S);
> > > > > -	new_alarm->executing = 0;
> > > > >
> > > > This removes the only place where ->executing is cleared again.  If there is
> > > > only one change to this bits state (which is the case after this patch), it
> > > > seems that you can just use the executing bit as the test in the alarm_cancel
> > > > function, and remove all the pthread_self mess.
> > >
> > > I believe we do need executing_id here.
> > > It allows us to distinguish are we executing cancel from a callback or not.
> > >
> > Given what you said above, I agree, at least in the current implementation.  It
> > still seems like theres a simpler solution that doesn't require all the
> > comparative gymnastics.
> > 
> > What if, instead of testing if you're the callback thread, we turn the executing
> > field of alarm_entry into a bitfield, where bit 0 represents the former
> > "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> > rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
> > in the cancelled bit to the executing bit field.  When the callback thread runs,
> > it skips executing any alarm that is marked as cancelled, but frees all alarm
> > entries that are executed or cancelled.  That gives us a single point at which
> > frees of alarm entires happen?  Something like the patch below (completely
> > untested)?
> 
> So basically cancel() just set ALARM_CANCELLED and leaves actual alarm deletion to the callback()?
That was the thought, yes.

> I think it is doable - but I don't see any real advantage with that approach.
> Yes, code will become a bit simpler, as  we'll have one point when we remove alarm from the list.
Yes, that would be the advantage, that the code would be much simpler.

> But from other side, imagine such simple test-case:
> 
> for (i = 0; i < 0x100000; i++) {
>    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
>    rte_eal_alarm_cancel(cb_func, (void *)i);
> } 
> 
> We'll endup with 1M of cancelled, but still not removed entries in the alarm_list.
> With current implementation that means - few MBs of wasted memory,
Thats correct, and the tradeoff to choose between.  Do you want simpler code
that is easier to maintain, or do you want a high speed cancel and set
operation.  I'm not aware of all the use cases, but I have a hard time seeing
a use case in which the in-flight alarm list grows unboundedly large, which in
my mind mitigates the risk of deferred removal, but I'm perfectly willing to
believe that there are use cases which I'm not aware of.

> plus very slow set() and cancel(), as they'll  have to traverse all entries in the list.  
> And all that - for empty from user perspective alarm_list 
> So I still prefer Michal's way.
> After all, it doesn't look that complicated to me. 
Except that the need for Michals fix arose from the fact that we have two free
locations that might both get called depending on the situation.  Thats what I'm
trying to address here, the complexity itself, rather than the fix (which I
agree is perfectly valid).

> BTW, any particular reason you are so negative about pthread_self()?
> 
Nothing specifically against it (save for its inverted return code sense, which
made it difficult for me to parse when reviewing).  Its more the complexity
itself in the alarm cancel and callback routine that I was looking at.  Given
that the origional bug happened because an cancel in a callback might produce a
double free, I wanted to fix it by simpifying the code, not adding conditions
which make it more complex.

You know, looking at it, something else just occured to me.  I think this could
all be fixed without the cancel flag or the pthread_self assignments.  What if
we simply removed the alarm from the list before we called the callback in
rte_eal_alarm_callback()?  That way any cancel operation called from within the
callback would fail, as it wouldn't appear on the list, and the callback
operation would be the only freeing entity?  That would let you still have a
fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch below


> > 
> > It also seems like the alarm api as a whole could use some improvement.  The
> > way its written right now, theres no way to refer to a specific alarm (i.e.
> > cancelation relies on the specification of a function and data pointer, which
> > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > handle to a unique timer instance that can be store by a caller and used to
> > specfically cancel that timer?  Thats how both the bsd and linux timer
> > subsystems model timers.
> 
> Yeh,  alarm API looks a bit unusual. 
> Though, I suppose that's subject for another patch/discussion :)
> 
Yes, agreed :)
  

Comments

Wodkowski, PawelX Sept. 26, 2014, 12:37 p.m. UTC | #1
> So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> deletion to the callback()?
> That was the thought, yes.
> 
> > I think it is doable - but I don't see any real advantage with that approach.
> > Yes, code will become a bit simpler, as  we'll have one point when we remove
> alarm from the list.
> Yes, that would be the advantage, that the code would be much simpler.
> 
> > But from other side, imagine such simple test-case:
> >
> > for (i = 0; i < 0x100000; i++) {
> >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > }
> >
> > We'll endup with 1M of cancelled, but still not removed entries in the
> alarm_list.
> > With current implementation that means - few MBs of wasted memory,
> Thats correct, and the tradeoff to choose between.  Do you want simpler code
> that is easier to maintain, or do you want a high speed cancel and set
> operation.  I'm not aware of all the use cases, but I have a hard time seeing
> a use case in which the in-flight alarm list grows unboundedly large, which in
> my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> believe that there are use cases which I'm not aware of.
> 
> > plus very slow set() and cancel(), as they'll  have to traverse all entries in the
> list.
> > And all that - for empty from user perspective alarm_list
> > So I still prefer Michal's way.
> > After all, it doesn't look that complicated to me.
> Except that the need for Michals fix arose from the fact that we have two free
> locations that might both get called depending on the situation.  Thats what I'm
> trying to address here, the complexity itself, rather than the fix (which I
> agree is perfectly valid).
> 
> > BTW, any particular reason you are so negative about pthread_self()?
> >
> Nothing specifically against it (save for its inverted return code sense, which
> made it difficult for me to parse when reviewing).  Its more the complexity
> itself in the alarm cancel and callback routine that I was looking at.  Given
> that the origional bug happened because an cancel in a callback might produce a
> double free, I wanted to fix it by simpifying the code, not adding conditions
> which make it more complex.
> 
> You know, looking at it, something else just occured to me.  I think this could
> all be fixed without the cancel flag or the pthread_self assignments.  What if
> we simply removed the alarm from the list before we called the callback in
> rte_eal_alarm_callback()?  That way any cancel operation called from within the
> callback would fail, as it wouldn't appear on the list, and the callback
> operation would be the only freeing entity?  That would let you still have a
> fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> below
> 
> 
> > >
> > > It also seems like the alarm api as a whole could use some improvement.
> The
> > > way its written right now, theres no way to refer to a specific alarm (i.e.
> > > cancelation relies on the specification of a function and data pointer, which
> > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > > handle to a unique timer instance that can be store by a caller and used to
> > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > subsystems model timers.
> >
> > Yeh,  alarm API looks a bit unusual.
> > Though, I suppose that's subject for another patch/discussion :)
> >
> Yes, agreed :)
> 

Please read quoted message bellow:

> >
> >
> > His solution *does* eliminate race condition too.
> >
> I applied his patch. And here is the problem
> 1	rte_spinlock_lock(&alarm_list_lk);
> 2	while ((ap = LIST_FIRST(&alarm_list)) !=NULL &&
> 3			gettimeofday(&now, NULL) == 0 &&
> 4			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> now.tv_sec &&
> 5						ap->time.tv_usec <=
> now.tv_usec))){
> 6		ap->executing |= ALARM_EXECUTING;
> 7		if (likely(!(ap->executing & ALARM_CANCELLED))) {
> 8				rte_spinlock_unlock(&alarm_list_lk);
> 9                           //another thread: rte_alarm_cancel called, mark this timer
> canceled and exit ( THE RACE)
> 10				ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> (THE RACE)
> 11
> 12				rte_spinlock_lock(&alarm_list_lk);
> 13		}
> 14
> 15		rte_spinlock_lock(&alarm_list_lk);
> 16		LIST_REMOVE(ap, next);
> 17		rte_free(ap);
> 18	}
> 
> Imagine
> 
> Thread 1:                                         Thread2
> Execute eal_alarm_callback
> Lock list at 1                                   rte_alarm_cancel -> block on spinlock
> ....
> Realease lock at line 8                  rte_alarm_cancel -> resumes execution -> it
> mark this timer canceled
> ap->cb_fn is called at line 10       rte_alarm_cancel exits reporting all alarms are
> canceled and not executing (which is not true!)
>     rte_alarm_set is called
>     to rearm itself (THE RACE)
> 
> He only mark timer to * do not execute* but does not assure that it is not
> executing while canceling.
> Race condition is made by unlocking at line 8 and our solution workarounds this
> by looping until all alarms finish execution then cancel what left after callback
> finish (*including rearmed alarms*).
> Real elimination of the race would be by using recursive locking when *other
> thread can't* access the list *while callback* is running but callback *itself can
> by using recursive locking*.
> 
> Maybe I don't see something obvious? :)
> 
> Pawel
  
Neil Horman Sept. 26, 2014, 1:40 p.m. UTC | #2
On Fri, Sep 26, 2014 at 12:37:54PM +0000, Wodkowski, PawelX wrote:
> > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > deletion to the callback()?
> > That was the thought, yes.
> > 
> > > I think it is doable - but I don't see any real advantage with that approach.
> > > Yes, code will become a bit simpler, as  we'll have one point when we remove
> > alarm from the list.
> > Yes, that would be the advantage, that the code would be much simpler.
> > 
> > > But from other side, imagine such simple test-case:
> > >
> > > for (i = 0; i < 0x100000; i++) {
> > >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > > }
> > >
> > > We'll endup with 1M of cancelled, but still not removed entries in the
> > alarm_list.
> > > With current implementation that means - few MBs of wasted memory,
> > Thats correct, and the tradeoff to choose between.  Do you want simpler code
> > that is easier to maintain, or do you want a high speed cancel and set
> > operation.  I'm not aware of all the use cases, but I have a hard time seeing
> > a use case in which the in-flight alarm list grows unboundedly large, which in
> > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > believe that there are use cases which I'm not aware of.
> > 
> > > plus very slow set() and cancel(), as they'll  have to traverse all entries in the
> > list.
> > > And all that - for empty from user perspective alarm_list
> > > So I still prefer Michal's way.
> > > After all, it doesn't look that complicated to me.
> > Except that the need for Michals fix arose from the fact that we have two free
> > locations that might both get called depending on the situation.  Thats what I'm
> > trying to address here, the complexity itself, rather than the fix (which I
> > agree is perfectly valid).
> > 
> > > BTW, any particular reason you are so negative about pthread_self()?
> > >
> > Nothing specifically against it (save for its inverted return code sense, which
> > made it difficult for me to parse when reviewing).  Its more the complexity
> > itself in the alarm cancel and callback routine that I was looking at.  Given
> > that the origional bug happened because an cancel in a callback might produce a
> > double free, I wanted to fix it by simpifying the code, not adding conditions
> > which make it more complex.
> > 
> > You know, looking at it, something else just occured to me.  I think this could
> > all be fixed without the cancel flag or the pthread_self assignments.  What if
> > we simply removed the alarm from the list before we called the callback in
> > rte_eal_alarm_callback()?  That way any cancel operation called from within the
> > callback would fail, as it wouldn't appear on the list, and the callback
> > operation would be the only freeing entity?  That would let you still have a
> > fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> > below
> > 
> > 
> > > >
> > > > It also seems like the alarm api as a whole could use some improvement.
> > The
> > > > way its written right now, theres no way to refer to a specific alarm (i.e.
> > > > cancelation relies on the specification of a function and data pointer, which
> > > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > > > handle to a unique timer instance that can be store by a caller and used to
> > > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > > subsystems model timers.
> > >
> > > Yeh,  alarm API looks a bit unusual.
> > > Though, I suppose that's subject for another patch/discussion :)
> > >
> > Yes, agreed :)
> > 
> 
> Please read quoted message bellow:
> 
> > >
> > >
> > > His solution *does* eliminate race condition too.
> > >
> > I applied his patch. And here is the problem
> > 1	rte_spinlock_lock(&alarm_list_lk);
> > 2	while ((ap = LIST_FIRST(&alarm_list)) !=NULL &&
> > 3			gettimeofday(&now, NULL) == 0 &&
> > 4			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> > 5						ap->time.tv_usec <=
> > now.tv_usec))){
> > 6		ap->executing |= ALARM_EXECUTING;
> > 7		if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > 8				rte_spinlock_unlock(&alarm_list_lk);
> > 9                           //another thread: rte_alarm_cancel called, mark this timer
> > canceled and exit ( THE RACE)
> > 10				ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> > (THE RACE)
> > 11
> > 12				rte_spinlock_lock(&alarm_list_lk);
> > 13		}
> > 14
> > 15		rte_spinlock_lock(&alarm_list_lk);
> > 16		LIST_REMOVE(ap, next);
> > 17		rte_free(ap);
> > 18	}
> > 
> > Imagine
> > 
> > Thread 1:                                         Thread2
> > Execute eal_alarm_callback
> > Lock list at 1                                   rte_alarm_cancel -> block on spinlock
> > ....
> > Realease lock at line 8                  rte_alarm_cancel -> resumes execution -> it
> > mark this timer canceled
> > ap->cb_fn is called at line 10       rte_alarm_cancel exits reporting all alarms are
> > canceled and not executing (which is not true!)
> >     rte_alarm_set is called
> >     to rearm itself (THE RACE)
> > 
> > He only mark timer to * do not execute* but does not assure that it is not
> > executing while canceling.
> > Race condition is made by unlocking at line 8 and our solution workarounds this
> > by looping until all alarms finish execution then cancel what left after callback
> > finish (*including rearmed alarms*).
> > Real elimination of the race would be by using recursive locking when *other
> > thread can't* access the list *while callback* is running but callback *itself can
> > by using recursive locking*.
> > 
> > Maybe I don't see something obvious? :)

I think you're missing the fact that your patch doesn't do what you assert above
either :)

First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
alarm implementation, because theres no ability to refer to a specific alarm
from the callers perspective.  When you call rte_eal_alarm_set you get a new
alarm every time.  So I don't really see a race there.  It might not be exactly
the behavior you want, but its not a race, becuase you're not modifying an alarm
in the middle of execution, you're just creating a new alarm, which is safe.

There is a race in what you describe above, insofar as its possible that you
might call rte_eal_alarm_cancel and return without having canceled all the
matching alarms.  I don't see any clear documentation on what the behavior is
supposed to be, but if you want to ensure that all matching alarms are cancelled
or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
API parlance, thats usually denoted as a cancel_sync operation).

For that race condition, you're correct, my patch doesn't address it, I see that
now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
within a callback function, then, by definition, you can't wait on the
completion of the active alarm, because thats a deadlock.  Its a necessecary
evil, I grant you, but it means that you can't be guaranteed the cancelled and
complete (cancel_sync) behavior that you want, at least not with the current
api.  If you want that behavior, you need to do one of two things:

1) Modify the api to allow callers to individually reference timer instances, so
that when cancelling, we can return an appropriate return code to indicate to
the caller that this alarm is in-progress.  That way you can guarantee the
caller that the specific alarm that you cancelled is either complete and cancelled
or currently executing.  Add an api to expicitly wait on a referenced alarm as
well.  This allows developers to know that, when executing an alarm callback, an
-ECURRENTLYEXECUTING return code is ok, because they are in the currently
executing context.


2) Understand that the guarantee can't be met, and change nothing, because you
consider it ok for an in-progress alarm to be ignored when getting cancelled.


Neil
  
Wodkowski, PawelX Sept. 26, 2014, 2:01 p.m. UTC | #3
> > > Maybe I don't see something obvious? :)
> 
> I think you're missing the fact that your patch doesn't do what you assert above
> either :)

Issue is not in setting alarms but canceling it. If you look closer to my patch you
see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )* 
part).

> 
> First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> alarm implementation, because theres no ability to refer to a specific alarm
> from the callers perspective.  When you call rte_eal_alarm_set you get a new
> alarm every time.  So I don't really see a race there.  It might not be exactly
> the behavior you want, but its not a race, becuase you're not modifying an
> alarm
> in the middle of execution, you're just creating a new alarm, which is safe.

OK, it is safe, but this is not the case.

> 
> There is a race in what you describe above, insofar as its possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms.  I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are cancelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> API parlance, thats usually denoted as a cancel_sync operation).

Again, look at the patch. I changed documentation to inform about this behavior.

> 
> For that race condition, you're correct, my patch doesn't address it, I see that
> now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock.  Its a necessecary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> api.  If you want that behavior, you need to do one of two things:

This patch does not break any API. It only removes undefined behavior.

> 
> 1) Modify the api to allow callers to individually reference timer instances, so
> that when cancelling, we can return an appropriate return code to indicate to
> the caller that this alarm is in-progress.  That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and cancelled
> or currently executing.  Add an api to expicitly wait on a referenced alarm as
> well.  This allows developers to know that, when executing an alarm callback, an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.

This would brake API for sure.
  
Ananyev, Konstantin Sept. 26, 2014, 2:13 p.m. UTC | #4
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Friday, September 26, 2014 2:40 PM
> To: Wodkowski, PawelX
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 12:37:54PM +0000, Wodkowski, PawelX wrote:
> > > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > > deletion to the callback()?
> > > That was the thought, yes.
> > >
> > > > I think it is doable - but I don't see any real advantage with that approach.
> > > > Yes, code will become a bit simpler, as  we'll have one point when we remove
> > > alarm from the list.
> > > Yes, that would be the advantage, that the code would be much simpler.
> > >
> > > > But from other side, imagine such simple test-case:
> > > >
> > > > for (i = 0; i < 0x100000; i++) {
> > > >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > > >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > > > }
> > > >
> > > > We'll endup with 1M of cancelled, but still not removed entries in the
> > > alarm_list.
> > > > With current implementation that means - few MBs of wasted memory,
> > > Thats correct, and the tradeoff to choose between.  Do you want simpler code
> > > that is easier to maintain, or do you want a high speed cancel and set
> > > operation.  I'm not aware of all the use cases, but I have a hard time seeing
> > > a use case in which the in-flight alarm list grows unboundedly large, which in
> > > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > > believe that there are use cases which I'm not aware of.

After executing example above - from user perspective there is no active alarms in the system at all.
Though in fact alarm_list contains 1M entries. 

> > >
> > > > plus very slow set() and cancel(), as they'll  have to traverse all entries in the
> > > list.
> > > > And all that - for empty from user perspective alarm_list
> > > > So I still prefer Michal's way.
> > > > After all, it doesn't look that complicated to me.
> > > Except that the need for Michals fix arose from the fact that we have two free
> > > locations that might both get called depending on the situation.  Thats what I'm
> > > trying to address here, the complexity itself, rather than the fix (which I
> > > agree is perfectly valid).

Well, I believe his fix addresses all the open issues, no?

Another thing, as Pawel pointed in another mail, your fix doesn't address properly the situation when
we have a racing alarm_cancel(cb_func) and alarm cb_func rearming itself. 
While the original patch does.

> > >
> > > > BTW, any particular reason you are so negative about pthread_self()?
> > > >
> > > Nothing specifically against it (save for its inverted return code sense, which
> > > made it difficult for me to parse when reviewing).  Its more the complexity
> > > itself in the alarm cancel and callback routine that I was looking at.  Given
> > > that the origional bug happened because an cancel in a callback might produce a
> > > double free, I wanted to fix it by simpifying the code, not adding conditions
> > > which make it more complex.
> > >
> > > You know, looking at it, something else just occured to me.  I think this could
> > > all be fixed without the cancel flag or the pthread_self assignments.  What if
> > > we simply removed the alarm from the list before we called the callback in
> > > rte_eal_alarm_callback()?  That way any cancel operation called from within the
> > > callback would fail, as it wouldn't appear on the list, and the callback
> > > operation would be the only freeing entity?  That would let you still have a
> > > fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> > > below


Hmm, and how it would address any of the problems I mentioned above:

1. The alarm_list still would grow by the repetition of: {alarm_set(x); alarm_cansel(x);} 
2. Race between alarm_cancel(cb_func) and alarm cb_func rearming itself.

?

> > >
> > >
> > > > >
> > > > > It also seems like the alarm api as a whole could use some improvement.
> > > The
> > > > > way its written right now, theres no way to refer to a specific alarm (i.e.
> > > > > cancelation relies on the specification of a function and data pointer, which
> > > > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > > > > handle to a unique timer instance that can be store by a caller and used to
> > > > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > > > subsystems model timers.
> > > >
> > > > Yeh,  alarm API looks a bit unusual.
> > > > Though, I suppose that's subject for another patch/discussion :)
> > > >
> > > Yes, agreed :)
> > >
> >
> > Please read quoted message bellow:
> >
> > > >
> > > >
> > > > His solution *does* eliminate race condition too.
> > > >
> > > I applied his patch. And here is the problem
> > > 1	rte_spinlock_lock(&alarm_list_lk);
> > > 2	while ((ap = LIST_FIRST(&alarm_list)) !=NULL &&
> > > 3			gettimeofday(&now, NULL) == 0 &&
> > > 4			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > > now.tv_sec &&
> > > 5						ap->time.tv_usec <=
> > > now.tv_usec))){
> > > 6		ap->executing |= ALARM_EXECUTING;
> > > 7		if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > > 8				rte_spinlock_unlock(&alarm_list_lk);
> > > 9                           //another thread: rte_alarm_cancel called, mark this timer
> > > canceled and exit ( THE RACE)
> > > 10				ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> > > (THE RACE)
> > > 11
> > > 12				rte_spinlock_lock(&alarm_list_lk);
> > > 13		}
> > > 14
> > > 15		rte_spinlock_lock(&alarm_list_lk);
> > > 16		LIST_REMOVE(ap, next);
> > > 17		rte_free(ap);
> > > 18	}
> > >
> > > Imagine
> > >
> > > Thread 1:                                         Thread2
> > > Execute eal_alarm_callback
> > > Lock list at 1                                   rte_alarm_cancel -> block on spinlock
> > > ....
> > > Realease lock at line 8                  rte_alarm_cancel -> resumes execution -> it
> > > mark this timer canceled
> > > ap->cb_fn is called at line 10       rte_alarm_cancel exits reporting all alarms are
> > > canceled and not executing (which is not true!)
> > >     rte_alarm_set is called
> > >     to rearm itself (THE RACE)
> > >
> > > He only mark timer to * do not execute* but does not assure that it is not
> > > executing while canceling.
> > > Race condition is made by unlocking at line 8 and our solution workarounds this
> > > by looping until all alarms finish execution then cancel what left after callback
> > > finish (*including rearmed alarms*).
> > > Real elimination of the race would be by using recursive locking when *other
> > > thread can't* access the list *while callback* is running but callback *itself can
> > > by using recursive locking*.
> > >
> > > Maybe I don't see something obvious? :)
> 
> I think you're missing the fact that your patch doesn't do what you assert above
> either :)
> 
> First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> alarm implementation, because theres no ability to refer to a specific alarm
> from the callers perspective.  When you call rte_eal_alarm_set you get a new
> alarm every time.  So I don't really see a race there.  It might not be exactly
> the behavior you want, but its not a race, becuase you're not modifying an alarm
> in the middle of execution, you're just creating a new alarm, which is safe.
> 
> There is a race in what you describe above, insofar as its possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms.  I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are cancelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> API parlance, thats usually denoted as a cancel_sync operation).
> 
> For that race condition, you're correct, my patch doesn't address it, I see that
> now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock.  Its a necessecary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> api.  If you want that behavior, you need to do one of two things:
> 
> 1) Modify the api to allow callers to individually reference timer instances, so
> that when cancelling, we can return an appropriate return code to indicate to
> the caller that this alarm is in-progress.  That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and cancelled
> or currently executing.  Add an api to expicitly wait on a referenced alarm as
> well.  This allows developers to know that, when executing an alarm callback, an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.
> 
> 
> 2) Understand that the guarantee can't be met, and change nothing, because you
> consider it ok for an in-progress alarm to be ignored when getting cancelled.
> 
> 
> Neil
  
Neil Horman Sept. 26, 2014, 3:01 p.m. UTC | #5
On Fri, Sep 26, 2014 at 02:01:05PM +0000, Wodkowski, PawelX wrote:
> > > > Maybe I don't see something obvious? :)
> > 
> > I think you're missing the fact that your patch doesn't do what you assert above
> > either :)
> 
> Issue is not in setting alarms but canceling it. If you look closer to my patch you
> see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )* 
> part).
> 
I get where the issue is, and I'm looking at your patch.  I see that you did
some locking there.  The issue I'm pointing out is that, if you call
rte_eal_alarm_cancel on an alarm callback, you will exit the alarm_cancel
function with, by definition, one alarm executing (the one you are currently
running).  You're patch works perfectly for the case where another thread calls
cancel, in that it waits until the executing alarm is complete, but it doesn't
work in the case where you are calling it from within the alarm callback. If
you're goal is to guarantee that all the matching alarms are cancelled and
complete, you haven't done that, because the recursive state is still unhandled.

> > 
> > First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> > alarm implementation, because theres no ability to refer to a specific alarm
> > from the callers perspective.  When you call rte_eal_alarm_set you get a new
> > alarm every time.  So I don't really see a race there.  It might not be exactly
> > the behavior you want, but its not a race, becuase you're not modifying an
> > alarm
> > in the middle of execution, you're just creating a new alarm, which is safe.
> 
> OK, it is safe, but this is not the case.
> 
I don't know what you mean by this.  We agree its safe, great.  But it is the
case as I've described it, you can see it from the implementation, every call to
rte_eal_alarm_set starts with a malloc of a new alarm structure. 

> > 
> > There is a race in what you describe above, insofar as its possible that you
> > might call rte_eal_alarm_cancel and return without having canceled all the
> > matching alarms.  I don't see any clear documentation on what the behavior is
> > supposed to be, but if you want to ensure that all matching alarms are cancelled
> > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> > API parlance, thats usually denoted as a cancel_sync operation).
> 
> Again, look at the patch. I changed documentation to inform about this behavior.
> 

This is the documentation included in the patch:
Change alarm cancel function to thread-safe.
        It eliminates a race between threads using rte_alarm_cancel and
        rte_alarm_set.

neither have you compeltely described the race condition (though you now have
previously in this thread), nor have you completely addressed it (calling
rte_eal_alarm_cancel and rte_eal_alarm_set still behaves exactly as it did
previously with a 2nd thread).

> > 
> > For that race condition, you're correct, my patch doesn't address it, I see that
> > now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> > within a callback function, then, by definition, you can't wait on the
> > completion of the active alarm, because thats a deadlock.  Its a necessecary
> > evil, I grant you, but it means that you can't be guaranteed the cancelled and
> > complete (cancel_sync) behavior that you want, at least not with the current
> > api.  If you want that behavior, you need to do one of two things:
> 
> This patch does not break any API. It only removes undefined behavior.
> 
I never said it did break ABI.  I said that to completely fix it you would have
to break ABI.  And it doesn't really remove undefined behavior, because you
still have the old behavior in the recursive case (which you may be ok with, I
don't know, but if you really want to address the behavior, you should address
this aspect of it).

> > 
> > 1) Modify the api to allow callers to individually reference timer instances, so
> > that when cancelling, we can return an appropriate return code to indicate to
> > the caller that this alarm is in-progress.  That way you can guarantee the
> > caller that the specific alarm that you cancelled is either complete and cancelled
> > or currently executing.  Add an api to expicitly wait on a referenced alarm as
> > well.  This allows developers to know that, when executing an alarm callback, an
> > -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> > executing context.
> 
> This would brake API for sure.
Yes, it would.  Bruce Richardson just made a major ABI break with his mbuf
cleanup set.  If there was a time to change ABI here, now would be the time I
think.

Neil

> 
>
  
Ananyev, Konstantin Sept. 26, 2014, 3:41 p.m. UTC | #6
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Friday, September 26, 2014 4:02 PM
> To: Wodkowski, PawelX
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 02:01:05PM +0000, Wodkowski, PawelX wrote:
> > > > > Maybe I don't see something obvious? :)
> > >
> > > I think you're missing the fact that your patch doesn't do what you assert above
> > > either :)
> >
> > Issue is not in setting alarms but canceling it. If you look closer to my patch you
> > see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )*
> > part).
> >
> I get where the issue is, and I'm looking at your patch.  I see that you did
> some locking there.  The issue I'm pointing out is that, if you call
> rte_eal_alarm_cancel on an alarm callback, you will exit the alarm_cancel
> function with, by definition, one alarm executing (the one you are currently
> running).  You're patch works perfectly for the case where another thread calls
> cancel, in that it waits until the executing alarm is complete, but it doesn't
> work in the case where you are calling it from within the alarm callback.

Hm, and why do we need it from alarm callback?
After cb_func() is finished given alarm entry will be removed anyway.

> If  you're goal is to guarantee that all the matching alarms are cancelled and
> complete, you haven't done that, because the recursive state is still unhandled.
> 
> > >
> > > First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> > > alarm implementation, because theres no ability to refer to a specific alarm
> > > from the callers perspective.  When you call rte_eal_alarm_set you get a new
> > > alarm every time.  So I don't really see a race there.  It might not be exactly
> > > the behavior you want, but its not a race, becuase you're not modifying an
> > > alarm
> > > in the middle of execution, you're just creating a new alarm, which is safe.
> >
> > OK, it is safe, but this is not the case.
> >
> I don't know what you mean by this.  We agree its safe, great.  But it is the
> case as I've described it, you can see it from the implementation, every call to
> rte_eal_alarm_set starts with a malloc of a new alarm structure.
> 
> > >
> > > There is a race in what you describe above, insofar as its possible that you
> > > might call rte_eal_alarm_cancel and return without having canceled all the
> > > matching alarms.  I don't see any clear documentation on what the behavior is
> > > supposed to be, but if you want to ensure that all matching alarms are cancelled
> > > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> > > API parlance, thats usually denoted as a cancel_sync operation).
> >
> > Again, look at the patch. I changed documentation to inform about this behavior.
> >
> 
> This is the documentation included in the patch:
> Change alarm cancel function to thread-safe.
>         It eliminates a race between threads using rte_alarm_cancel and
>         rte_alarm_set.
> 
> neither have you compeltely described the race condition (though you now have
> previously in this thread), nor have you completely addressed it (calling
> rte_eal_alarm_cancel and rte_eal_alarm_set still behaves exactly as it did
> previously with a 2nd thread).
> 
> > >
> > > For that race condition, you're correct, my patch doesn't address it, I see that
> > > now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> > > within a callback function, then, by definition, you can't wait on the
> > > completion of the active alarm, because thats a deadlock.  Its a necessecary
> > > evil, I grant you, but it means that you can't be guaranteed the cancelled and
> > > complete (cancel_sync) behavior that you want, at least not with the current
> > > api.  If you want that behavior, you need to do one of two things:
> >
> > This patch does not break any API. It only removes undefined behavior.
> >
> I never said it did break ABI.  I said that to completely fix it you would have
> to break ABI.  And it doesn't really remove undefined behavior, because you
> still have the old behavior in the recursive case (which you may be ok with, I
> don't know, but if you really want to address the behavior, you should address
> this aspect of it).
> 
> > >
> > > 1) Modify the api to allow callers to individually reference timer instances, so
> > > that when cancelling, we can return an appropriate return code to indicate to
> > > the caller that this alarm is in-progress.  That way you can guarantee the
> > > caller that the specific alarm that you cancelled is either complete and cancelled
> > > or currently executing.  Add an api to expicitly wait on a referenced alarm as
> > > well.  This allows developers to know that, when executing an alarm callback, an
> > > -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> > > executing context.
> >
> > This would brake API for sure.
> Yes, it would.  Bruce Richardson just made a major ABI break with his mbuf
> cleanup set.  If there was a time to change ABI here, now would be the time I
> think.

Ok, too many words for me, to be honest :)
Can I summarise:
As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
If you think, that is not the case, could you please provide a list of remaining issues?
Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?      

If you have any concerns about mbuf reorg/expansion - probably better to contact Bruce and express them.
Not to use it as an argument for breaking any existing API without really good reason behind.  

Konstantin
  
Neil Horman Sept. 26, 2014, 4:21 p.m. UTC | #7
On Fri, Sep 26, 2014 at 03:41:58PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > Sent: Friday, September 26, 2014 4:02 PM
> > To: Wodkowski, PawelX
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Fri, Sep 26, 2014 at 02:01:05PM +0000, Wodkowski, PawelX wrote:
> > > > > > Maybe I don't see something obvious? :)
> > > >
> > > > I think you're missing the fact that your patch doesn't do what you assert above
> > > > either :)
> > >
> > > Issue is not in setting alarms but canceling it. If you look closer to my patch you
> > > see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )*
> > > part).
> > >
> > I get where the issue is, and I'm looking at your patch.  I see that you did
> > some locking there.  The issue I'm pointing out is that, if you call
> > rte_eal_alarm_cancel on an alarm callback, you will exit the alarm_cancel
> > function with, by definition, one alarm executing (the one you are currently
> > running).  You're patch works perfectly for the case where another thread calls
> > cancel, in that it waits until the executing alarm is complete, but it doesn't
> > work in the case where you are calling it from within the alarm callback.
> 
> Hm, and why do we need it from alarm callback?
Because you might not know if you're in an alarm callback or not. Pawel
explained that the point of the patch was to ensure that alarms are canceled and
complete when you call rte_eal_alarm_cancel, and thats not always going to be
the case, even whith this patch.

> After cb_func() is finished given alarm entry will be removed anyway.
> 
Yes, but thats true with or without this patch.

> > If  you're goal is to guarantee that all the matching alarms are cancelled and
> > complete, you haven't done that, because the recursive state is still unhandled.
> > 
> > > >
> > > > First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> > > > alarm implementation, because theres no ability to refer to a specific alarm
> > > > from the callers perspective.  When you call rte_eal_alarm_set you get a new
> > > > alarm every time.  So I don't really see a race there.  It might not be exactly
> > > > the behavior you want, but its not a race, becuase you're not modifying an
> > > > alarm
> > > > in the middle of execution, you're just creating a new alarm, which is safe.
> > >
> > > OK, it is safe, but this is not the case.
> > >
> > I don't know what you mean by this.  We agree its safe, great.  But it is the
> > case as I've described it, you can see it from the implementation, every call to
> > rte_eal_alarm_set starts with a malloc of a new alarm structure.
> > 
> > > >
> > > > There is a race in what you describe above, insofar as its possible that you
> > > > might call rte_eal_alarm_cancel and return without having canceled all the
> > > > matching alarms.  I don't see any clear documentation on what the behavior is
> > > > supposed to be, but if you want to ensure that all matching alarms are cancelled
> > > > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> > > > API parlance, thats usually denoted as a cancel_sync operation).
> > >
> > > Again, look at the patch. I changed documentation to inform about this behavior.
> > >
> > 
> > This is the documentation included in the patch:
> > Change alarm cancel function to thread-safe.
> >         It eliminates a race between threads using rte_alarm_cancel and
> >         rte_alarm_set.
> > 
> > neither have you compeltely described the race condition (though you now have
> > previously in this thread), nor have you completely addressed it (calling
> > rte_eal_alarm_cancel and rte_eal_alarm_set still behaves exactly as it did
> > previously with a 2nd thread).
> > 
> > > >
> > > > For that race condition, you're correct, my patch doesn't address it, I see that
> > > > now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> > > > within a callback function, then, by definition, you can't wait on the
> > > > completion of the active alarm, because thats a deadlock.  Its a necessecary
> > > > evil, I grant you, but it means that you can't be guaranteed the cancelled and
> > > > complete (cancel_sync) behavior that you want, at least not with the current
> > > > api.  If you want that behavior, you need to do one of two things:
> > >
> > > This patch does not break any API. It only removes undefined behavior.
> > >
> > I never said it did break ABI.  I said that to completely fix it you would have
> > to break ABI.  And it doesn't really remove undefined behavior, because you
> > still have the old behavior in the recursive case (which you may be ok with, I
> > don't know, but if you really want to address the behavior, you should address
> > this aspect of it).
> > 
> > > >
> > > > 1) Modify the api to allow callers to individually reference timer instances, so
> > > > that when cancelling, we can return an appropriate return code to indicate to
> > > > the caller that this alarm is in-progress.  That way you can guarantee the
> > > > caller that the specific alarm that you cancelled is either complete and cancelled
> > > > or currently executing.  Add an api to expicitly wait on a referenced alarm as
> > > > well.  This allows developers to know that, when executing an alarm callback, an
> > > > -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> > > > executing context.
> > >
> > > This would brake API for sure.
> > Yes, it would.  Bruce Richardson just made a major ABI break with his mbuf
> > cleanup set.  If there was a time to change ABI here, now would be the time I
> > think.
> 
> Ok, too many words for me, to be honest :)
Yeah, its getting a bit verbose :)

> Can I summarise:
> As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> If you think, that is not the case, could you please provide a list of remaining issues?
> Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?      
Gladly.  As Pawel explained the race, its possible that, after calling
rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
running.  The problem with that ostensibly is that data which is being accessed
by the callback might be then accessed in parallel with another process leading
to data corruption or some other problem. The issue I have with his patch is
that it doesn't completely close the race.  While it does close the race for the
condition in whcih thread B is running the alarm callback while thread A is
executing the cancel operation, it does not close the case for when a single
thread B is running the cancel operation, as the in-flight execution itself is
still active.  If such a cancellation occurs via an intermediary function (i.e.
one which is not aware that it is explicitly running an alarm callback, which
signals another thread to execute via some other method (ipc communication,
etc), the same data corruption may occur, because the canceled and complete
guarantee has been violated.


> 
> If you have any concerns about mbuf reorg/expansion - probably better to contact Bruce and express them.
> Not to use it as an argument for breaking any existing API without really good reason behind.  
> 
No, no concerns at all, only pointing out that we've already broken ABI in this
release, which requires application writers to rebuild and adjust their
applications, so if we were going to adjust this api, now would be the time,
rather than in a futre release, requiring multiple application rebuilds.

Neil
  
Ananyev, Konstantin Sept. 26, 2014, 6:07 p.m. UTC | #8
> > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> > I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> > If you think, that is not the case, could you please provide a list of remaining issues?
> > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?


> Gladly.  As Pawel explained the race, its possible that, after calling
> rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> running.  The problem with that ostensibly is that data which is being accessed
> by the callback might be then accessed in parallel with another process leading
> to data corruption or some other problem. The issue I have with his patch is
> that it doesn't completely close the race.  While it does close the race for the
> condition in whcih thread B is running the alarm callback while thread A is
> executing the cancel operation, it does not close the case for when a single
> thread B is running the cancel operation, as the in-flight execution itself is
> still active.

A bit puzzled here:
Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem?
I still don't see how.

>  If such a cancellation occurs via an intermediary function (i.e.
> one which is not aware that it is explicitly running an alarm callback, which
> signals another thread to execute via some other method (ipc communication,
> etc), the same data corruption may occur, because the canceled and complete
> guarantee has been violated.
>
  
Neil Horman Sept. 26, 2014, 7:39 p.m. UTC | #9
On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> > > If you think, that is not the case, could you please provide a list of remaining issues?
> > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?
> 
> 
> > Gladly.  As Pawel explained the race, its possible that, after calling
> > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> > running.  The problem with that ostensibly is that data which is being accessed
> > by the callback might be then accessed in parallel with another process leading
> > to data corruption or some other problem. The issue I have with his patch is
> > that it doesn't completely close the race.  While it does close the race for the
> > condition in whcih thread B is running the alarm callback while thread A is
> > executing the cancel operation, it does not close the case for when a single
> > thread B is running the cancel operation, as the in-flight execution itself is
> > still active.
> 
> A bit puzzled here:
> Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem?
> I still don't see how.
> 
Potentially yes, by the same race condition that exists when using a secondary
thread to do the cancel call.  As I understand it the race that Pawel described
is as follows:

Thread A					Thread B
alarm_cancel()					eal_alarm_callback
    block on alarm spinlock			drop spinlock
        run cancel operation			execute callback function
    return from cancel
    rte_eal_alarm_set				

As Pawel described the problem, there is a desire to not set the new alarm while
the old alarm is still executing.  And his patch accomplishes that for the two
thread case above just fine

The problem with Pawels patch is that its non functional in the case where the
cancel happens within Thread B.  Lets change the scenario just a little bit:

Thread B					Thread C
eal_alarm_callback
     callback_function
          some_other_common_func
               rte_eal_alarm_cancel(this)
          pthread_signal(Thread C)		wake up
          operate on alarm data			rte_eal_alarm_set


In this scenario the problem is not fixed because when called from within the
alarm thread, the executing alarm is skipped (as it must be), but that fact is
invisible to the caller, and because of that its still possible for the same
origional problem to occur.

Neil
  
Ananyev, Konstantin Sept. 28, 2014, 4:12 p.m. UTC | #10
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Friday, September 26, 2014 8:39 PM
> To: Ananyev, Konstantin
> Cc: Wodkowski, PawelX; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> > > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> > > > If you think, that is not the case, could you please provide a list of remaining issues?
> > > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?
> >
> >
> > > Gladly.  As Pawel explained the race, its possible that, after calling
> > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> > > running.  The problem with that ostensibly is that data which is being accessed
> > > by the callback might be then accessed in parallel with another process leading
> > > to data corruption or some other problem. The issue I have with his patch is
> > > that it doesn't completely close the race.  While it does close the race for the
> > > condition in whcih thread B is running the alarm callback while thread A is
> > > executing the cancel operation, it does not close the case for when a single
> > > thread B is running the cancel operation, as the in-flight execution itself is
> > > still active.
> >
> > A bit puzzled here:
> > Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem?
> > I still don't see how.
> >
> Potentially yes, by the same race condition that exists when using a secondary
> thread to do the cancel call.  As I understand it the race that Pawel described
> is as follows:
> 
> Thread A					Thread B
> alarm_cancel()					eal_alarm_callback
>     block on alarm spinlock			drop spinlock
>         run cancel operation			execute callback function
>     return from cancel
>     rte_eal_alarm_set
> 
> As Pawel described the problem, there is a desire to not set the new alarm while
> the old alarm is still executing.  And his patch accomplishes that for the two
> thread case above just fine
> 
> The problem with Pawels patch is that its non functional in the case where the
> cancel happens within Thread B.  Lets change the scenario just a little bit:
> 
> Thread B					Thread C
> eal_alarm_callback
>      callback_function
>           some_other_common_func
>                rte_eal_alarm_cancel(this)
>           pthread_signal(Thread C)		wake up
>           operate on alarm data			rte_eal_alarm_set
>

As I can see, there is an incorrect behaviour in your callback_function example.
It should first finish with " eal_alarm_callback" and only then send a signal to other thread.
Otherwise we can't help it in any way.
But I think, I understand your concern:
after rte_eal_aralm_cancel() finishes, the caller can't clearly distinguish what exactly happen:
1) alarm was cancelled succesfully.
2) alarm was not found (already cancelled or executed).
3) alarm is executing by the same thread and can't be cancelled.

Basically right now the caller can distinguish that either #1 or #2,3 happened, but can't distinguish between 2 & 3.
Correct?

 If that's so, then I suppose we can do: make alarm_cancel() to return a negative value for the case #3 (-EINPROGRESS or something).
 Something like: 
...
if (ap->executing == 0) {
   LIST_REMOVE(ap,next);
    rte_free(ap);
    count++;
    ap = ap_prev; 
} else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
    executing++;
} else { 
   ret = -EINPROGRESS;
}
...
return ((ret != 0) ? ret : count);

So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
As I remember, you already suggested something similar in one of the previous mails.
Konstantin



 	
> 
> In this scenario the problem is not fixed because when called from within the
> alarm thread, the executing alarm is skipped (as it must be), but that fact is
> invisible to the caller, and because of that its still possible for the same
> origional problem to occur.
> 
> Neil
>
  
Neil Horman Sept. 28, 2014, 8:47 p.m. UTC | #11
On Sun, Sep 28, 2014 at 04:12:04PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Friday, September 26, 2014 8:39 PM
> > To: Ananyev, Konstantin
> > Cc: Wodkowski, PawelX; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > > As I remember the purpose of the patch was to fix the race condition inside rte_alarm library.
> > > > > I believe that the patch provided by Michal & Pawel fixes the issues you discovered.
> > > > > If you think, that is not the case, could you please provide a list of remaining issues?
> > > > > Excluding ones that you just don't like it, and you are not happy with rte_alarm API in total?
> > >
> > >
> > > > Gladly.  As Pawel explained the race, its possible that, after calling
> > > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> > > > running.  The problem with that ostensibly is that data which is being accessed
> > > > by the callback might be then accessed in parallel with another process leading
> > > > to data corruption or some other problem. The issue I have with his patch is
> > > > that it doesn't completely close the race.  While it does close the race for the
> > > > condition in whcih thread B is running the alarm callback while thread A is
> > > > executing the cancel operation, it does not close the case for when a single
> > > > thread B is running the cancel operation, as the in-flight execution itself is
> > > > still active.
> > >
> > > A bit puzzled here:
> > > Are you saying that calling alarm_cancel() for itself inside eal_alarm_callback() might cause a problem?
> > > I still don't see how.
> > >
> > Potentially yes, by the same race condition that exists when using a secondary
> > thread to do the cancel call.  As I understand it the race that Pawel described
> > is as follows:
> > 
> > Thread A					Thread B
> > alarm_cancel()					eal_alarm_callback
> >     block on alarm spinlock			drop spinlock
> >         run cancel operation			execute callback function
> >     return from cancel
> >     rte_eal_alarm_set
> > 
> > As Pawel described the problem, there is a desire to not set the new alarm while
> > the old alarm is still executing.  And his patch accomplishes that for the two
> > thread case above just fine
> > 
> > The problem with Pawels patch is that its non functional in the case where the
> > cancel happens within Thread B.  Lets change the scenario just a little bit:
> > 
> > Thread B					Thread C
> > eal_alarm_callback
> >      callback_function
> >           some_other_common_func
> >                rte_eal_alarm_cancel(this)
> >           pthread_signal(Thread C)		wake up
> >           operate on alarm data			rte_eal_alarm_set
> >
> 
> As I can see, there is an incorrect behaviour in your callback_function example.
> It should first finish with " eal_alarm_callback" and only then send a signal to other thread.
> Otherwise we can't help it in any way.
> But I think, I understand your concern:
> after rte_eal_aralm_cancel() finishes, the caller can't clearly distinguish what exactly happen:
> 1) alarm was cancelled succesfully.
> 2) alarm was not found (already cancelled or executed).
> 3) alarm is executing by the same thread and can't be cancelled.
> 
> Basically right now the caller can distinguish that either #1 or #2,3 happened, but can't distinguish between 2 & 3.
> Correct?
> 
Yes, this is my concern exactly.

>  If that's so, then I suppose we can do: make alarm_cancel() to return a negative value for the case #3 (-EINPROGRESS or something).
>  Something like: 
> ...
> if (ap->executing == 0) {
>    LIST_REMOVE(ap,next);
>     rte_free(ap);
>     count++;
>     ap = ap_prev; 
> } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
>     executing++;
> } else { 
>    ret = -EINPROGRESS;
> }
> ...
> return ((ret != 0) ? ret : count);
> 
> So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
> As I remember, you already suggested something similar in one of the previous mails.
Yes, I rolled the API changes I suggested in with this model, because I wanted
to be able to do precise specification of a timer instance to cancel, but if
we're not ready to make that change, I think what you propose above would be
suffficient.  Theres some question as to weather we would cancel timers that are
still pending on a return of -EINPROGRESS, but I think if we document it
accordingly, then it can be worked out just fine.

Best
Neil

> Konstantin
> 
> 
> 
>  	
> > 
> > In this scenario the problem is not fixed because when called from within the
> > alarm thread, the executing alarm is skipped (as it must be), but that fact is
> > invisible to the caller, and because of that its still possible for the same
> > origional problem to occur.
> > 
> > Neil
> > 
>
  
Wodkowski, PawelX Sept. 29, 2014, 6:40 a.m. UTC | #12
> Yes, this is my concern exactly.
> 
> >  If that's so, then I suppose we can do: make alarm_cancel() to return a
> negative value for the case #3 (-EINPROGRESS or something).
> >  Something like:
> > ...
> > if (ap->executing == 0) {
> >    LIST_REMOVE(ap,next);
> >     rte_free(ap);
> >     count++;
> >     ap = ap_prev;
> > } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
> >     executing++;
> > } else {
> >    ret = -EINPROGRESS;
> > }
> > ...
> > return ((ret != 0) ? ret : count);
> >
> > So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
> > As I remember, you already suggested something similar in one of the previous
> mails.
> Yes, I rolled the API changes I suggested in with this model, because I wanted
> to be able to do precise specification of a timer instance to cancel, but if
> we're not ready to make that change, I think what you propose above would be
> suffficient.  Theres some question as to weather we would cancel timers that
> are
> still pending on a return of -EINPROGRESS, but I think if we document it
> accordingly, then it can be worked out just fine.
> 
> Best
> Neil
> 

Image how you will be damned by someone that not even notice you change
and he Is managing some kind of resource based on returned number of 
set/canceled timers. If you suddenly start returning negative values how those
application will behave? Silently changing returned value domain is evil in its 
pure form.

From my point of view, problem is virtual because this is user application task to 
know what it can and what it not. If you really want to inform user application
about timer state you can introduce API call which will interrogate timers list
and return appropriate value, but for god sake, do not introduce untraceable bugs.

Pawel
  
Ananyev, Konstantin Sept. 29, 2014, 9:50 a.m. UTC | #13
> -----Original Message-----
> From: Wodkowski, PawelX
> Sent: Monday, September 29, 2014 7:41 AM
> To: Neil Horman; Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> > Yes, this is my concern exactly.
> >
> > >  If that's so, then I suppose we can do: make alarm_cancel() to return a
> > negative value for the case #3 (-EINPROGRESS or something).
> > >  Something like:
> > > ...
> > > if (ap->executing == 0) {
> > >    LIST_REMOVE(ap,next);
> > >     rte_free(ap);
> > >     count++;
> > >     ap = ap_prev;
> > > } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
> > >     executing++;
> > > } else {
> > >    ret = -EINPROGRESS;
> > > }
> > > ...
> > > return ((ret != 0) ? ret : count);
> > >
> > > So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
> > > As I remember, you already suggested something similar in one of the previous
> > mails.
> > Yes, I rolled the API changes I suggested in with this model, because I wanted
> > to be able to do precise specification of a timer instance to cancel, but if
> > we're not ready to make that change, I think what you propose above would be
> > suffficient.  Theres some question as to weather we would cancel timers that
> > are
> > still pending on a return of -EINPROGRESS, but I think if we document it
> > accordingly, then it can be worked out just fine.
> >
> > Best
> > Neil
> >
> 
> Image how you will be damned by someone that not even notice you change
> and he Is managing some kind of resource based on returned number of
> set/canceled timers. If you suddenly start returning negative values how those
> application will behave? Silently changing returned value domain is evil in its
> pure form.

As I can see the impact is very limited.
Only code that does check for (rte_alarm_cancel(...) == 0/ != 0) inside alarm callback function might be affected. 
From other side, indeed, there could exist situations, when the caller needs to know
was the alarm successfully cancelled or not. 
And if not by what reason. 

> 
> From my point of view, problem is virtual because this is user application task to
> know what it can and what it not. If you really want to inform user application
> about timer state you can introduce API call which will interrogate timers list
> and return appropriate value, but for god sake, do not introduce untraceable bugs.
> 
> Pawel
  
Wodkowski, PawelX Sept. 29, 2014, 10:11 a.m. UTC | #14
> >
> > Image how you will be damned by someone that not even notice you change
> > and he Is managing some kind of resource based on returned number of
> > set/canceled timers. If you suddenly start returning negative values how those
> > application will behave? Silently changing returned value domain is evil in its
> > pure form.
> 
> As I can see the impact is very limited.

It is small impact to DPDK but can be huge to user application:
Ex:
If someone use this kind of expression in callback (skipping user app serialization part):
callback () {
...
some_simple_semaphore += rte_alarm_cancel(...));
...
}

Anywhere in the code:
...
If (some_simple_semapore) {
	some_simple_semapore --;
	if (rte_eal_alarm_set(...) != 0)
		some_simple_semapore ++;
}
...

1. Do you notice the change in cancel function?
2. How many hours you spend to find this issue in case of big app/system?

> Only code that does check for (rte_alarm_cancel(...) == 0/ != 0) inside alarm
> callback function might be affected.
> From other side, indeed, there could exist situations, when the caller needs to
> know
> was the alarm successfully cancelled or not.
> And if not by what reason.
> 

I can extend API of rte alarms to add alarm state checking in next patch,  but for 
now, since this is not urgent I think original patch  v2 should be enough.

Pawel
  
Bruce Richardson Sept. 29, 2014, 10:33 a.m. UTC | #15
On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > >
> > > Image how you will be damned by someone that not even notice you change
> > > and he Is managing some kind of resource based on returned number of
> > > set/canceled timers. If you suddenly start returning negative values how those
> > > application will behave? Silently changing returned value domain is evil in its
> > > pure form.
> > 
> > As I can see the impact is very limited.
> 
> It is small impact to DPDK but can be huge to user application:

This is why we traditionally have in the release-notes for each release a 
section dedicated to calling out changes from one release to another. [See 
http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since 
from release-to-release there are generally only a couple of changes - 
though our next release may be a little different - the actual changes are 
clear enough to read about without wading through pages of documentation. I 
thinking calling out the change in both the release notes and the API docs 
is sufficient even for a change like this.  

Basically, I wouldn't let API stability factor in too much in trying to get 
a proper fix for this issue.

/Bruce


> Ex:
> If someone use this kind of expression in callback (skipping user app serialization part):
> callback () {
> ...
> some_simple_semaphore += rte_alarm_cancel(...));
> ...
> }
> 
> Anywhere in the code:
> ...
> If (some_simple_semapore) {
> 	some_simple_semapore --;
> 	if (rte_eal_alarm_set(...) != 0)
> 		some_simple_semapore ++;
> }
> ...
> 
> 1. Do you notice the change in cancel function?
> 2. How many hours you spend to find this issue in case of big app/system?
> 
> > Only code that does check for (rte_alarm_cancel(...) == 0/ != 0) inside alarm
> > callback function might be affected.
> > From other side, indeed, there could exist situations, when the caller needs to
> > know
> > was the alarm successfully cancelled or not.
> > And if not by what reason.
> > 
> 
> I can extend API of rte alarms to add alarm state checking in next patch,  but for 
> now, since this is not urgent I think original patch  v2 should be enough.
> 
> Pawel
  
Bruce Richardson Sept. 29, 2014, 10:37 a.m. UTC | #16
On Fri, Sep 26, 2014 at 02:13:55PM +0000, Ananyev, Konstantin wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > Sent: Friday, September 26, 2014 2:40 PM
> > To: Wodkowski, PawelX
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Fri, Sep 26, 2014 at 12:37:54PM +0000, Wodkowski, PawelX wrote:
> > > > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > > > deletion to the callback()?
> > > > That was the thought, yes.
> > > >
> > > > > I think it is doable - but I don't see any real advantage with that approach.
> > > > > Yes, code will become a bit simpler, as  we'll have one point when we remove
> > > > alarm from the list.
> > > > Yes, that would be the advantage, that the code would be much simpler.
> > > >
> > > > > But from other side, imagine such simple test-case:
> > > > >
> > > > > for (i = 0; i < 0x100000; i++) {
> > > > >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > > > >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > > > > }
> > > > >
> > > > > We'll endup with 1M of cancelled, but still not removed entries in the
> > > > alarm_list.
> > > > > With current implementation that means - few MBs of wasted memory,
> > > > Thats correct, and the tradeoff to choose between.  Do you want simpler code
> > > > that is easier to maintain, or do you want a high speed cancel and set
> > > > operation.  I'm not aware of all the use cases, but I have a hard time seeing
> > > > a use case in which the in-flight alarm list grows unboundedly large, which in
> > > > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > > > believe that there are use cases which I'm not aware of.
> 
> After executing example above - from user perspective there is no active alarms in the system at all.
> Though in fact alarm_list contains 1M entries. 

This would concern me. It's likely that in applications, e.g. those with a 
network stack for instance, timers could be used for timeouts e.g. on 
connections, which would mean that the common case by far would be for 
timers to be cancelled or rescheduled without ever timing out.

/Bruce
  
Neil Horman Sept. 29, 2014, 11:35 a.m. UTC | #17
On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > >
> > > Image how you will be damned by someone that not even notice you change
> > > and he Is managing some kind of resource based on returned number of
> > > set/canceled timers. If you suddenly start returning negative values how those
> > > application will behave? Silently changing returned value domain is evil in its
> > > pure form.
> > 
> > As I can see the impact is very limited.
> 
> It is small impact to DPDK but can be huge to user application:
> Ex:
> If someone use this kind of expression in callback (skipping user app serialization part):
> callback () {
> ...
> some_simple_semaphore += rte_alarm_cancel(...));

This code would be broken to begin with, as rte_eal_alarm_cancel is already
written to return negative return codes.  Its not documented as such, but its
still the case.  Note that if you run an application built against a shared
library on BSD, the definition of rte_eal_alarm_cancel returns -ENOTSUP.  The
above code would be broken because it doesn't account for that.  You can argue
that the documentation should be updated, but the dpdk in the wild already
conforms to the model Konstantin and I are proposing.

> ...
> }
> 
> Anywhere in the code:
> ...
> If (some_simple_semapore) {
> 	some_simple_semapore --;
> 	if (rte_eal_alarm_set(...) != 0)
> 		some_simple_semapore ++;
> }
> ...
> 
> 1. Do you notice the change in cancel function?
The application crashes, or otherwise misbehaves.

> 2. How many hours you spend to find this issue in case of big app/system?
You don't.  Such a problem as you describe would very likely result in a
semaphore deadlock, as the count would be incorrectly lowered, so you put
watches on the variable, note that sometimes the count goes down on a cancel,
which is completely counter-intuitive, read the updated documentation that
indicates error codes are possible (which you should have been prepared for
anyway), and move on with your day.

> 
> > Only code that does check for (rte_alarm_cancel(...) == 0/ != 0) inside alarm
> > callback function might be affected.
> > From other side, indeed, there could exist situations, when the caller needs to
> > know
> > was the alarm successfully cancelled or not.
> > And if not by what reason.
> > 
> 
> I can extend API of rte alarms to add alarm state checking in next patch,  but for 
> now, since this is not urgent I think original patch  v2 should be enough.
I re-assert my origional argument here, without the above change, you haven't
really fixed the race.  If you can find another way to do it, thats fine with
me, but keep in mind once again, that some implementations of rte_eal_alarm_set
already do whats being proposed.

Neil

> 
> Pawel
>
  
Wodkowski, PawelX Sept. 30, 2014, 11:13 a.m. UTC | #18
Paweł

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, September 29, 2014 12:33
> To: Wodkowski, PawelX
> Cc: Ananyev, Konstantin; Neil Horman; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-
> safe:
> 
> On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > > >
> > > > Image how you will be damned by someone that not even notice you
> change
> > > > and he Is managing some kind of resource based on returned number of
> > > > set/canceled timers. If you suddenly start returning negative values how
> those
> > > > application will behave? Silently changing returned value domain is evil in
> its
> > > > pure form.
> > >
> > > As I can see the impact is very limited.
> >
> > It is small impact to DPDK but can be huge to user application:
> 
> This is why we traditionally have in the release-notes for each release a
> section dedicated to calling out changes from one release to another. [See
> http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since
> from release-to-release there are generally only a couple of changes -
> though our next release may be a little different - the actual changes are
> clear enough to read about without wading through pages of documentation. I
> thinking calling out the change in both the release notes and the API docs
> is sufficient even for a change like this.
> 
> Basically, I wouldn't let API stability factor in too much in trying to get
> a proper fix for this issue.
> 
> /Bruce
> 

Summarizing all proposed solutions and to be able to produce final patch - what
Is desired behavior after fix?

1. do we leave as is in Patch v2:
1.1 if canceling from other thread - if one of the alarms is executing, wait to 
  finish its execution and then cancel any rearmed alarms.
1.2 if canceling from alarm handler and one of the alarms to cancel is this 
  executing callback do no wait for it to finish and cancel anything else.
 
 in both cases return number of canceled callbacks.

2. Do exactly like in 1. but return -EINPROGRESS instead of canceled alarms
  if one of the alarms to cancel is currently executing callback from alarm thread
  (information about number of canceled alarms will be lost).

3. refuse to cancel anything if canceling currently executing alarm from alarm 
  callback and return -EINPROGRESS otherwise do like in 1.1.

4. Implement behaviour 1/2/3 (which?) and add API call to interrogate list of
  alarms and retrun state of given alarms(s).

5. other solutions?

Pawel
  
Wodkowski, PawelX Sept. 30, 2014, 12:05 p.m. UTC | #19
> -----Original Message-----
> Paweł
> 
> > On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > > > >
> > > > > Image how you will be damned by someone that not even notice you
> > change
> > > > > and he Is managing some kind of resource based on returned number of
> > > > > set/canceled timers. If you suddenly start returning negative values how
> > those
> > > > > application will behave? Silently changing returned value domain is evil in
> > its
> > > > > pure form.
> > > >
> > > > As I can see the impact is very limited.
> > >
> > > It is small impact to DPDK but can be huge to user application:
> >
> > This is why we traditionally have in the release-notes for each release a
> > section dedicated to calling out changes from one release to another. [See
> > http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since
> > from release-to-release there are generally only a couple of changes -
> > though our next release may be a little different - the actual changes are
> > clear enough to read about without wading through pages of documentation.
> I
> > thinking calling out the change in both the release notes and the API docs
> > is sufficient even for a change like this.
> >
> > Basically, I wouldn't let API stability factor in too much in trying to get
> > a proper fix for this issue.
> >
> > /Bruce
> >
> 
> Summarizing all proposed solutions and to be able to produce final patch - what
> Is desired behavior after fix?
> 
> 1. do we leave as is in Patch v2:
> 1.1 if canceling from other thread - if one of the alarms is executing, wait to
>   finish its execution and then cancel any rearmed alarms.
> 1.2 if canceling from alarm handler and one of the alarms to cancel is this
>   executing callback do no wait for it to finish and cancel anything else.
> 
>  in both cases return number of canceled callbacks.
> 
> 2. Do exactly like in 1. but return -EINPROGRESS instead of canceled alarms
>   if one of the alarms to cancel is currently executing callback from alarm thread
>   (information about number of canceled alarms will be lost).

Or instead of returning -EINPROGRESS set errno to EINPROGRESS (replace
returning error value by setting errno and function can always return number
of canceled callbacks - in error condition 0)?

> 
> 3. refuse to cancel anything if canceling currently executing alarm from alarm
>   callback and return -EINPROGRESS otherwise do like in 1.1.
> 
> 4. Implement behaviour 1/2/3 (which?) and add API call to interrogate list of
>   alarms and retrun state of given alarms(s).
> 
> 5. other solutions?
> 
> Pawel
  
Ananyev, Konstantin Sept. 30, 2014, 12:30 p.m. UTC | #20
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wodkowski, PawelX
> Sent: Tuesday, September 30, 2014 1:05 PM
> To: Wodkowski, PawelX; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> > -----Original Message-----
> > Paweł
> >
> > > On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > > > > >
> > > > > > Image how you will be damned by someone that not even notice you
> > > change
> > > > > > and he Is managing some kind of resource based on returned number of
> > > > > > set/canceled timers. If you suddenly start returning negative values how
> > > those
> > > > > > application will behave? Silently changing returned value domain is evil in
> > > its
> > > > > > pure form.
> > > > >
> > > > > As I can see the impact is very limited.
> > > >
> > > > It is small impact to DPDK but can be huge to user application:
> > >
> > > This is why we traditionally have in the release-notes for each release a
> > > section dedicated to calling out changes from one release to another. [See
> > > http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since
> > > from release-to-release there are generally only a couple of changes -
> > > though our next release may be a little different - the actual changes are
> > > clear enough to read about without wading through pages of documentation.
> > I
> > > thinking calling out the change in both the release notes and the API docs
> > > is sufficient even for a change like this.
> > >
> > > Basically, I wouldn't let API stability factor in too much in trying to get
> > > a proper fix for this issue.
> > >
> > > /Bruce
> > >
> >
> > Summarizing all proposed solutions and to be able to produce final patch - what
> > Is desired behavior after fix?
> >
> > 1. do we leave as is in Patch v2:
> > 1.1 if canceling from other thread - if one of the alarms is executing, wait to
> >   finish its execution and then cancel any rearmed alarms.
> > 1.2 if canceling from alarm handler and one of the alarms to cancel is this
> >   executing callback do no wait for it to finish and cancel anything else.
> >
> >  in both cases return number of canceled callbacks.
> >
> > 2. Do exactly like in 1. but return -EINPROGRESS instead of canceled alarms
> >   if one of the alarms to cancel is currently executing callback from alarm thread
> >   (information about number of canceled alarms will be lost).
> 
> Or instead of returning -EINPROGRESS set errno to EINPROGRESS (replace
> returning error value by setting errno and function can always return number
> of canceled callbacks - in error condition 0)?

Yes that's looks like a better option. 
As I remember, inside DPDK we have our own rte_errno, that we can probably use here.
My vote would be for that approach.

Konstantin

> 
> >
> > 3. refuse to cancel anything if canceling currently executing alarm from alarm
> >   callback and return -EINPROGRESS otherwise do like in 1.1.
> >
> > 4. Implement behaviour 1/2/3 (which?) and add API call to interrogate list of
> >   alarms and retrun state of given alarms(s).
> >
> > 5. other solutions?
> >
> > Pawel
  
Neil Horman Sept. 30, 2014, 12:54 p.m. UTC | #21
On Tue, Sep 30, 2014 at 12:30:08PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wodkowski, PawelX
> > Sent: Tuesday, September 30, 2014 1:05 PM
> > To: Wodkowski, PawelX; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > > -----Original Message-----
> > > Paweł
> > >
> > > > On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > > > > > >
> > > > > > > Image how you will be damned by someone that not even notice you
> > > > change
> > > > > > > and he Is managing some kind of resource based on returned number of
> > > > > > > set/canceled timers. If you suddenly start returning negative values how
> > > > those
> > > > > > > application will behave? Silently changing returned value domain is evil in
> > > > its
> > > > > > > pure form.
> > > > > >
> > > > > > As I can see the impact is very limited.
> > > > >
> > > > > It is small impact to DPDK but can be huge to user application:
> > > >
> > > > This is why we traditionally have in the release-notes for each release a
> > > > section dedicated to calling out changes from one release to another. [See
> > > > http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since
> > > > from release-to-release there are generally only a couple of changes -
> > > > though our next release may be a little different - the actual changes are
> > > > clear enough to read about without wading through pages of documentation.
> > > I
> > > > thinking calling out the change in both the release notes and the API docs
> > > > is sufficient even for a change like this.
> > > >
> > > > Basically, I wouldn't let API stability factor in too much in trying to get
> > > > a proper fix for this issue.
> > > >
> > > > /Bruce
> > > >
> > >
> > > Summarizing all proposed solutions and to be able to produce final patch - what
> > > Is desired behavior after fix?
> > >
> > > 1. do we leave as is in Patch v2:
> > > 1.1 if canceling from other thread - if one of the alarms is executing, wait to
> > >   finish its execution and then cancel any rearmed alarms.
> > > 1.2 if canceling from alarm handler and one of the alarms to cancel is this
> > >   executing callback do no wait for it to finish and cancel anything else.
> > >
> > >  in both cases return number of canceled callbacks.
> > >
> > > 2. Do exactly like in 1. but return -EINPROGRESS instead of canceled alarms
> > >   if one of the alarms to cancel is currently executing callback from alarm thread
> > >   (information about number of canceled alarms will be lost).
> > 
> > Or instead of returning -EINPROGRESS set errno to EINPROGRESS (replace
> > returning error value by setting errno and function can always return number
> > of canceled callbacks - in error condition 0)?
> 
> Yes that's looks like a better option. 
> As I remember, inside DPDK we have our own rte_errno, that we can probably use here.
> My vote would be for that approach.
> 
You'll want to document that interface to make sure callers understand that a
non-zero return code doesn't automatically mean complete success, but yes, in
this case that seems like a pretty reasonable solution.
Neil

> Konstantin
> 
> > 
> > >
> > > 3. refuse to cancel anything if canceling currently executing alarm from alarm
> > >   callback and return -EINPROGRESS otherwise do like in 1.1.
> > >
> > > 4. Implement behaviour 1/2/3 (which?) and add API call to interrogate list of
> > >   alarms and retrun state of given alarms(s).
> > >
> > > 5. other solutions?
> > >
> > > Pawel
>
  

Patch

diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
index 480f0cb..d586ff4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
@@ -107,13 +107,13 @@  eal_alarm_callback(struct rte_intr_handle *hdl __rte_unused,
 			gettimeofday(&now, NULL) == 0 &&
 			(ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
 						ap->time.tv_usec <= now.tv_usec))){
+		LIST_REMOVE(ap, next);
 		ap->executing = 1;
 		rte_spinlock_unlock(&alarm_list_lk);
 
 		ap->cb_fn(ap->cb_arg);
 
 		rte_spinlock_lock(&alarm_list_lk);
-		LIST_REMOVE(ap, next);
 		rte_free(ap);
 	}