event/dsw: avoid reusing previously recorded events

Message ID 20200305104651.25456-1-mattias.ronnblom@ericsson.com (mailing list archive)
State Changes Requested, archived
Delegated to: Jerin Jacob
Headers
Series event/dsw: avoid reusing previously recorded events |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS

Commit Message

Mattias Rönnblom March 5, 2020, 10:46 a.m. UTC
  Avoid reusing recorded events when performing a migration, since this
may make the migration selection logic pick an already-moved flow.

Fixes: f6257b22e767 ("event/dsw: add load balancing")
Cc: stable@dpdk.org

Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 drivers/event/dsw/dsw_event.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Venky Venkatesh March 6, 2020, 6:05 a.m. UTC | #1
Hi Mattias,
Have a question on this fix. I understand you wanting a certain number of
events before making a decision to migrate (in the above fix).
However, suppose there are fewer events over a few flows (even if not many)
and yet your core is heavily loaded -- indicating may be they are one or
more very CPU intensive flows. Often in DPI situations depending on the
complexity of the policy you can get delayed longer. It might still be
worthwhile to migrate if the other cores are really lightly loaded. I think
that case will be missed out in this approach.

Fundamentally, the number of packets being a proxy-metric for the load of
that flow on the cpu is simplistic at times. Very CPU intensive
medium/lower bandwidth flows can be picked up in this heuristic. If there
is a way that at the time of DSW init we can have a way of tuning it
depending on the application scenario it might be more flexible.

Thanks
-Venky


On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom <
mattias.ronnblom@ericsson.com> wrote:

> Avoid reusing recorded events when performing a migration, since this
> may make the migration selection logic pick an already-moved flow.
>
> Fixes: f6257b22e767 ("event/dsw: add load balancing")
> Cc: stable@dpdk.org
>
> Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  drivers/event/dsw/dsw_event.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
> index d68b71b98..296adea18 100644
> --- a/drivers/event/dsw/dsw_event.c
> +++ b/drivers/event/dsw/dsw_event.c
> @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw,
>         if (dsw->num_ports == 1)
>                 return;
>
> +       if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
> +               return;
> +
>         DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering
> migration.\n");
>
>         /* Randomize interval to avoid having all threads considering
> --
> 2.17.1
>
>
  
Mattias Rönnblom March 6, 2020, 7:48 a.m. UTC | #2
On 2020-03-06 07:05, Venky Venkatesh wrote:
> Hi Mattias,
> Have a question on this fix. I understand you wanting a certain number 
> of events before making a decision to migrate (in the above fix).
> However, suppose there are fewer events over a few flows (even if not 
> many) and yet your core is heavily loaded -- indicating may be they 
> are one or more very CPU intensive flows. Often in DPI situations 
> depending on the complexity of the policy you can get delayed longer. 
> It might still be worthwhile to migrate if the other cores are really 
> lightly loaded. I think that case will be missed out in this approach.
>

The reason for requiring at least 128 events recorded was primarily 
since it simplified the code a little, in combination with the fact that 
most packet processing happens at much higher speeds than 128 events/1 
ms. If you have a lower rate, the only thing that's going to happen is 
that the migration will happen a little later, when you've hit 128.


If you run at 2,6 GHz with high utilization, and fail to process more 
than 128 events in 1 ms, then the average cost of processing an event is 
in the range of 20.000 clock cycles, which seemed a lot to me, maybe 
even for DPI?

> Fundamentally, the number of packets being a proxy-metric for the load 
> of that flow on the cpu is simplistic at times. Very CPU intensive 
> medium/lower bandwidth flows can be picked up in this heuristic. If 
> there is a way that at the time of DSW init we can have a way of 
> tuning it depending on the application scenario it might be more flexible.
>
Yes, indeed. If you have large asymmetries in the average per-event 
processing latency for different flow types, this can become a problem, 
with what's effectively a very heavy flow, "masking" as a small one, and 
causing trouble when it's being migrated.

To individually keep track of which flow generates what load is just too 
expensive, if DSW should remain a good choice for high packet-rate 
applications. Just the rdtsc required to figure how much a single event 
costed to process that likely would be required will substantially 
increase the scheduler overhead. Add to this the need to load, process 
and store this information.

The parameter would need to control an approach taken, not just tweaking 
a single algorithm, I think. Maybe if you'd reorganized the code a 
little, you could do this in a clean way. First though you would like to 
know it's a real problem, in a real application - not just a theory.

An alternative approach that might be possible is to break the 
processing of these very heavy flows into more processing stages. 
Instead of, say, one 20k cycles stage, you have four stages of roughly 
1/4 the size. That would also significant improve single-flow 
performance, if that's relevant.

> Thanks
> -Venky
>
>
> On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom 
> <mattias.ronnblom@ericsson.com <mailto:mattias.ronnblom@ericsson.com>> 
> wrote:
>
>     Avoid reusing recorded events when performing a migration, since this
>     may make the migration selection logic pick an already-moved flow.
>
>     Fixes: f6257b22e767 ("event/dsw: add load balancing")
>     Cc: stable@dpdk.org <mailto:stable@dpdk.org>
>
>     Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com
>     <mailto:vvenkatesh@paloaltonetworks.com>>
>     Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com
>     <mailto:mattias.ronnblom@ericsson.com>>
>     ---
>      drivers/event/dsw/dsw_event.c | 3 +++
>      1 file changed, 3 insertions(+)
>
>     diff --git a/drivers/event/dsw/dsw_event.c
>     b/drivers/event/dsw/dsw_event.c
>     index d68b71b98..296adea18 100644
>     --- a/drivers/event/dsw/dsw_event.c
>     +++ b/drivers/event/dsw/dsw_event.c
>     @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw,
>             if (dsw->num_ports == 1)
>                     return;
>
>     +       if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
>     +               return;
>     +
>             DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering
>     migration.\n");
>
>             /* Randomize interval to avoid having all threads considering
>     -- 
>     2.17.1
>
  
Jerin Jacob April 4, 2020, 12:35 p.m. UTC | #3
On Fri, Mar 6, 2020 at 11:35 AM Venky Venkatesh
<vvenkatesh@paloaltonetworks.com> wrote:
>
> Hi Mattias,
> Have a question on this fix. I understand you wanting a certain number of
> events before making a decision to migrate (in the above fix).
> However, suppose there are fewer events over a few flows (even if not many)
> and yet your core is heavily loaded -- indicating may be they are one or
> more very CPU intensive flows. Often in DPI situations depending on the
> complexity of the policy you can get delayed longer. It might still be
> worthwhile to migrate if the other cores are really lightly loaded. I think
> that case will be missed out in this approach.
>
> Fundamentally, the number of packets being a proxy-metric for the load of
> that flow on the cpu is simplistic at times. Very CPU intensive
> medium/lower bandwidth flows can be picked up in this heuristic. If there
> is a way that at the time of DSW init we can have a way of tuning it
> depending on the application scenario it might be more flexible.

Hi Venky and Mattias,

Is this patch to good to merge?

>
> Thanks
> -Venky
>
>
> On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom <
> mattias.ronnblom@ericsson.com> wrote:
>
> > Avoid reusing recorded events when performing a migration, since this
> > may make the migration selection logic pick an already-moved flow.
> >
> > Fixes: f6257b22e767 ("event/dsw: add load balancing")
> > Cc: stable@dpdk.org
> >
> > Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com>
> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > ---
> >  drivers/event/dsw/dsw_event.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
> > index d68b71b98..296adea18 100644
> > --- a/drivers/event/dsw/dsw_event.c
> > +++ b/drivers/event/dsw/dsw_event.c
> > @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw,
> >         if (dsw->num_ports == 1)
> >                 return;
> >
> > +       if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
> > +               return;
> > +
> >         DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering
> > migration.\n");
> >
> >         /* Randomize interval to avoid having all threads considering
> > --
> > 2.17.1
> >
> >
  
Jerin Jacob April 14, 2020, 12:45 p.m. UTC | #4
On Sat, Apr 4, 2020 at 6:05 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Fri, Mar 6, 2020 at 11:35 AM Venky Venkatesh
> <vvenkatesh@paloaltonetworks.com> wrote:
> >
> > Hi Mattias,
> > Have a question on this fix. I understand you wanting a certain number of
> > events before making a decision to migrate (in the above fix).
> > However, suppose there are fewer events over a few flows (even if not many)
> > and yet your core is heavily loaded -- indicating may be they are one or
> > more very CPU intensive flows. Often in DPI situations depending on the
> > complexity of the policy you can get delayed longer. It might still be
> > worthwhile to migrate if the other cores are really lightly loaded. I think
> > that case will be missed out in this approach.
> >
> > Fundamentally, the number of packets being a proxy-metric for the load of
> > that flow on the cpu is simplistic at times. Very CPU intensive
> > medium/lower bandwidth flows can be picked up in this heuristic. If there
> > is a way that at the time of DSW init we can have a way of tuning it
> > depending on the application scenario it might be more flexible.
>
> Hi Venky and Mattias,
>
> Is this patch to good to merge?

Please mark the patch as "Not applicable" if the patch is not required.


>
> >
> > Thanks
> > -Venky
> >
> >
> > On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom <
> > mattias.ronnblom@ericsson.com> wrote:
> >
> > > Avoid reusing recorded events when performing a migration, since this
> > > may make the migration selection logic pick an already-moved flow.
> > >
> > > Fixes: f6257b22e767 ("event/dsw: add load balancing")
> > > Cc: stable@dpdk.org
> > >
> > > Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com>
> > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > > ---
> > >  drivers/event/dsw/dsw_event.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
> > > index d68b71b98..296adea18 100644
> > > --- a/drivers/event/dsw/dsw_event.c
> > > +++ b/drivers/event/dsw/dsw_event.c
> > > @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw,
> > >         if (dsw->num_ports == 1)
> > >                 return;
> > >
> > > +       if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
> > > +               return;
> > > +
> > >         DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering
> > > migration.\n");
> > >
> > >         /* Randomize interval to avoid having all threads considering
> > > --
> > > 2.17.1
> > >
> > >
  
Mattias Rönnblom April 14, 2020, 1:11 p.m. UTC | #5
On 2020-04-14 14:45, Jerin Jacob wrote:
> On Sat, Apr 4, 2020 at 6:05 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>> On Fri, Mar 6, 2020 at 11:35 AM Venky Venkatesh
>> <vvenkatesh@paloaltonetworks.com> wrote:
>>> Hi Mattias,
>>> Have a question on this fix. I understand you wanting a certain number of
>>> events before making a decision to migrate (in the above fix).
>>> However, suppose there are fewer events over a few flows (even if not many)
>>> and yet your core is heavily loaded -- indicating may be they are one or
>>> more very CPU intensive flows. Often in DPI situations depending on the
>>> complexity of the policy you can get delayed longer. It might still be
>>> worthwhile to migrate if the other cores are really lightly loaded. I think
>>> that case will be missed out in this approach.
>>>
>>> Fundamentally, the number of packets being a proxy-metric for the load of
>>> that flow on the cpu is simplistic at times. Very CPU intensive
>>> medium/lower bandwidth flows can be picked up in this heuristic. If there
>>> is a way that at the time of DSW init we can have a way of tuning it
>>> depending on the application scenario it might be more flexible.
>> Hi Venky and Mattias,
>>
>> Is this patch to good to merge?
> Please mark the patch as "Not applicable" if the patch is not required.
>

This patch should be merged. Thanks.


>>> Thanks
>>> -Venky
>>>
>>>
>>> On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom <
>>> mattias.ronnblom@ericsson.com> wrote:
>>>
>>>> Avoid reusing recorded events when performing a migration, since this
>>>> may make the migration selection logic pick an already-moved flow.
>>>>
>>>> Fixes: f6257b22e767 ("event/dsw: add load balancing")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> ---
>>>>   drivers/event/dsw/dsw_event.c | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
>>>> index d68b71b98..296adea18 100644
>>>> --- a/drivers/event/dsw/dsw_event.c
>>>> +++ b/drivers/event/dsw/dsw_event.c
>>>> @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw,
>>>>          if (dsw->num_ports == 1)
>>>>                  return;
>>>>
>>>> +       if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
>>>> +               return;
>>>> +
>>>>          DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering
>>>> migration.\n");
>>>>
>>>>          /* Randomize interval to avoid having all threads considering
>>>> --
>>>> 2.17.1
>>>>
>>>>
  
Jerin Jacob April 14, 2020, 1:32 p.m. UTC | #6
On Tue, Apr 14, 2020 at 6:41 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> On 2020-04-14 14:45, Jerin Jacob wrote:
> > On Sat, Apr 4, 2020 at 6:05 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
> >> On Fri, Mar 6, 2020 at 11:35 AM Venky Venkatesh
> >> <vvenkatesh@paloaltonetworks.com> wrote:
> >>> Hi Mattias,
> >>> Have a question on this fix. I understand you wanting a certain number of
> >>> events before making a decision to migrate (in the above fix).
> >>> However, suppose there are fewer events over a few flows (even if not many)
> >>> and yet your core is heavily loaded -- indicating may be they are one or
> >>> more very CPU intensive flows. Often in DPI situations depending on the
> >>> complexity of the policy you can get delayed longer. It might still be
> >>> worthwhile to migrate if the other cores are really lightly loaded. I think
> >>> that case will be missed out in this approach.
> >>>
> >>> Fundamentally, the number of packets being a proxy-metric for the load of
> >>> that flow on the cpu is simplistic at times. Very CPU intensive
> >>> medium/lower bandwidth flows can be picked up in this heuristic. If there
> >>> is a way that at the time of DSW init we can have a way of tuning it
> >>> depending on the application scenario it might be more flexible.
> >> Hi Venky and Mattias,
> >>
> >> Is this patch to good to merge?
> > Please mark the patch as "Not applicable" if the patch is not required.
> >
>
> This patch should be merged. Thanks.

Sure. I will wait for couple of days on any comments on this patch. If
none, I will merge this.



>
>
> >>> Thanks
> >>> -Venky
> >>>
> >>>
> >>> On Thu, Mar 5, 2020 at 2:47 AM Mattias Rönnblom <
> >>> mattias.ronnblom@ericsson.com> wrote:
> >>>
> >>>> Avoid reusing recorded events when performing a migration, since this
> >>>> may make the migration selection logic pick an already-moved flow.
> >>>>
> >>>> Fixes: f6257b22e767 ("event/dsw: add load balancing")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Reported-by: Venky Venkatesh <vvenkatesh@paloaltonetworks.com>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> ---
> >>>>   drivers/event/dsw/dsw_event.c | 3 +++
> >>>>   1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
> >>>> index d68b71b98..296adea18 100644
> >>>> --- a/drivers/event/dsw/dsw_event.c
> >>>> +++ b/drivers/event/dsw/dsw_event.c
> >>>> @@ -646,6 +646,9 @@ dsw_port_consider_migration(struct dsw_evdev *dsw,
> >>>>          if (dsw->num_ports == 1)
> >>>>                  return;
> >>>>
> >>>> +       if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
> >>>> +               return;
> >>>> +
> >>>>          DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering
> >>>> migration.\n");
> >>>>
> >>>>          /* Randomize interval to avoid having all threads considering
> >>>> --
> >>>> 2.17.1
> >>>>
> >>>>
>
  
Jerin Jacob May 1, 2020, 8:01 a.m. UTC | #7
On Tue, Apr 14, 2020 at 7:02 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Tue, Apr 14, 2020 at 6:41 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
> >
> > On 2020-04-14 14:45, Jerin Jacob wrote:
> > > On Sat, Apr 4, 2020 at 6:05 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > >> On Fri, Mar 6, 2020 at 11:35 AM Venky Venkatesh
> > >> <vvenkatesh@paloaltonetworks.com> wrote:
> > >>> Hi Mattias,
> > >>> Have a question on this fix. I understand you wanting a certain number of
> > >>> events before making a decision to migrate (in the above fix).
> > >>> However, suppose there are fewer events over a few flows (even if not many)
> > >>> and yet your core is heavily loaded -- indicating may be they are one or
> > >>> more very CPU intensive flows. Often in DPI situations depending on the
> > >>> complexity of the policy you can get delayed longer. It might still be
> > >>> worthwhile to migrate if the other cores are really lightly loaded. I think
> > >>> that case will be missed out in this approach.
> > >>>
> > >>> Fundamentally, the number of packets being a proxy-metric for the load of
> > >>> that flow on the cpu is simplistic at times. Very CPU intensive
> > >>> medium/lower bandwidth flows can be picked up in this heuristic. If there
> > >>> is a way that at the time of DSW init we can have a way of tuning it
> > >>> depending on the application scenario it might be more flexible.
> > >> Hi Venky and Mattias,
> > >>
> > >> Is this patch to good to merge?
> > > Please mark the patch as "Not applicable" if the patch is not required.
> > >
> >
> > This patch should be merged. Thanks.
>
> Sure. I will wait for couple of days on any comments on this patch. If
> none, I will merge this.

Could you rebase this patch to next-eventdev master and send the next
version so that I can merge it?

[master][dpdk-next-eventdev] $ git am -3
/tmp/to_merge/event-dsw-avoid-reusing-previously-recorded-events
Applying: event/dsw: avoid reusing previously recorded events
Using index info to reconstruct a base tree...
M       drivers/event/dsw/dsw_event.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/event/dsw/dsw_event.c
CONFLICT (content): Merge conflict in drivers/event/dsw/dsw_event.c
Recorded preimage for 'drivers/event/dsw/dsw_event.c'
error: Failed to merge in the changes.
Patch failed at 0001 event/dsw: avoid reusing previously recorded events
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
[master][dpdk-next-eventdev] $
  

Patch

diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c
index d68b71b98..296adea18 100644
--- a/drivers/event/dsw/dsw_event.c
+++ b/drivers/event/dsw/dsw_event.c
@@ -646,6 +646,9 @@  dsw_port_consider_migration(struct dsw_evdev *dsw,
 	if (dsw->num_ports == 1)
 		return;
 
+	if (seen_events_len < DSW_MAX_EVENTS_RECORDED)
+		return;
+
 	DSW_LOG_DP_PORT(DEBUG, source_port->id, "Considering migration.\n");
 
 	/* Randomize interval to avoid having all threads considering