
[RFC,v1,0/6] graph enhancement for multi-core dispatch

Message ID 20220908020959.1675953-1-zhirun.yan@intel.com (mailing list archive)

Message

Yan, Zhirun Sept. 8, 2022, 2:09 a.m. UTC
  Currently, the rte_graph_walk() and rte_node_enqueue* fast path API
functions in the graph library are designed to work on a single core.

This RFC proposes a cross-core dispatching mechanism to enhance the
graph scaling strategy. We introduce a Scheduler Workqueue so that
streams can be dispatched directly to another worker core that has
affinity with a specific node.

This RFC:
  1. Introduce core affinity API and graph clone API.
  2. Introduce key functions to enqueue/dequeue for dispatching streams.
  3. Enhance rte_graph_walk by cross-core dispatch.
  4. Add l2fwd-graph example and stats for cross-core dispatching.

With this patch set, applications can easily plan and orchestrate
streams on multi-core systems.

Future work:
  1. Support affinitizing an lcore set to one node.
  2. Use l3fwd-graph instead of l2fwd-graph as the example in patch 06.
  3. Add a new parameter, e.g. --node(nodeid, lcoreid), to configure a
  node's core affinity.

Comments and suggestions are welcome. Thanks!



Haiyue Wang (1):
  examples: add l2fwd-graph

Zhirun Yan (5):
  graph: introduce core affinity API into graph
  graph: introduce graph clone API for other worker core
  graph: enable stream moving cross cores
  graph: enhance graph walk by cross-core dispatch
  graph: add stats for cross-core dispatching

 examples/l2fwd-graph/main.c      | 455 +++++++++++++++++++++++++++++++
 examples/l2fwd-graph/meson.build |  25 ++
 examples/l2fwd-graph/node.c      | 263 ++++++++++++++++++
 examples/l2fwd-graph/node.h      |  64 +++++
 examples/meson.build             |   1 +
 lib/graph/graph.c                | 121 ++++++++
 lib/graph/graph_debug.c          |   4 +
 lib/graph/graph_populate.c       |   1 +
 lib/graph/graph_private.h        |  43 +++
 lib/graph/graph_sched.c          | 194 +++++++++++++
 lib/graph/graph_stats.c          |  19 +-
 lib/graph/meson.build            |   2 +
 lib/graph/node.c                 |  25 ++
 lib/graph/rte_graph.h            |  50 ++++
 lib/graph/rte_graph_worker.h     |  59 ++++
 lib/graph/version.map            |   5 +
 16 files changed, 1327 insertions(+), 4 deletions(-)
 create mode 100644 examples/l2fwd-graph/main.c
 create mode 100644 examples/l2fwd-graph/meson.build
 create mode 100644 examples/l2fwd-graph/node.c
 create mode 100644 examples/l2fwd-graph/node.h
 create mode 100644 lib/graph/graph_sched.c
  

Comments

Jerin Jacob Sept. 20, 2022, 9:33 a.m. UTC | #1
On Thu, Sep 8, 2022 at 7:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Currently, the rte_graph_walk() and rte_node_enqueue* fast path API
> functions in the graph library are designed to work on a single core.
>
> This RFC proposes a cross-core dispatching mechanism to enhance the
> graph scaling strategy. We introduce a Scheduler Workqueue so that
> streams can be dispatched directly to another worker core that has
> affinity with a specific node.
>
> This RFC:
>   1. Introduce core affinity API and graph clone API.
>   2. Introduce key functions to enqueue/dequeue for dispatching streams.
>   3. Enhance rte_graph_walk by cross-core dispatch.
>   4. Add l2fwd-graph example and stats for cross-core dispatching.
>
> With this patch set, applications can easily plan and orchestrate
> streams on multi-core systems.
>
> Future work:
>   1. Support affinitizing an lcore set to one node.
>   2. Use l3fwd-graph instead of l2fwd-graph as the example in patch 06.
>   3. Add a new parameter, e.g. --node(nodeid, lcoreid), to configure a
>   node's core affinity.
>
> Comments and suggestions are welcome. Thanks!

Some top level comments.

1) Yes, it makes sense not to create l2fwd-graph. Please enhance the
l3fwd-graph example and compare the performance in multi-core scenarios.

2) It is good to have multiple graph walk schemes like the one you
have introduced now.
Though I am not sure about the performance aspects, specifically when
it is used with multiple producers and multiple consumers per node.

If you have a use case for the new worker scheme, then we can add it.
I think it would call for:

a) We need to have a separate rte_graph_worker.h for each implementation
to avoid a performance impact on each other.
That may boil down to:
i) Create lib/graph/rte_graph_worker_common.h
ii) Treat the existing rte_graph_worker.h as the default scheme and
include rte_graph_worker_common.h
iii) Add a new rte_graph_worker_xxx.h for the new scheme (diff against
the default worker), leveraging rte_graph_worker_common.h

Application can select the worker by

#define RTE_GRAPH_WORKER_MODEL_XXX
//#define RTE_GRAPH_WORKER_MODEL_YYY
#include <rte_graph_worker.h>

b) Introduce a new enum rte_graph_model or so to express this new
model and other models in the future

c) Each core has its own node instance, so we don't need explicit
critical section management when dealing with node instances.
In this new scheme, can we leverage the existing node implementation?
If not, we need to have a separate node
implementation for different graph models, which will be a maintenance
issue. But if we really need to take this path,
probably each node needs to declare the
models it supports (using enum rte_graph_model).
This can be used for sanity checking when we clone the graph and
check compatibility when creating the graph.
I think this is the biggest issue with adding a new model: nodes
would need to be written based on the model. I think this could
be the reason why VPP has not added other models.

d) For all new slow path APIs like rte_node_set_lcore_affinity and
rte_graph_clone, we need to fix the namespace as
rte_graph_model_<model_name>_<operation> or so, to make sure
application writers understand these APIs
are only for this model. (We can also use "enum rte_graph_model" for
sanity checks etc.)



>
  
Yan, Zhirun Sept. 30, 2022, 6:41 a.m. UTC | #2
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, September 20, 2022 5:33 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com; Liang,
> Cunming <cunming.liang@intel.com>; Wang, Haiyue
> <haiyue.wang@intel.com>
> Subject: Re: [RFC, v1 0/6] graph enhancement for multi-core dispatch
> 
> On Thu, Sep 8, 2022 at 7:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Currently, the rte_graph_walk() and rte_node_enqueue* fast path API
> > functions in the graph library are designed to work on a single core.
> >
> > This RFC proposes a cross-core dispatching mechanism to enhance the
> > graph scaling strategy. We introduce a Scheduler Workqueue so that
> > streams can be dispatched directly to another worker core that has
> > affinity with a specific node.
> >
> > This RFC:
> >   1. Introduce core affinity API and graph clone API.
> >   2. Introduce key functions to enqueue/dequeue for dispatching streams.
> >   3. Enhance rte_graph_walk by cross-core dispatch.
> >   4. Add l2fwd-graph example and stats for cross-core dispatching.
> >
> > With this patch set, applications can easily plan and orchestrate
> > streams on multi-core systems.
> >
> > Future work:
> >   1. Support affinitizing an lcore set to one node.
> >   2. Use l3fwd-graph instead of l2fwd-graph as the example in patch 06.
> >   3. Add a new parameter, e.g. --node(nodeid, lcoreid), to configure a
> >   node's core affinity.
> >
> > Comments and suggestions are welcome. Thanks!
> 
> Some top level comments.
> 
> 1) Yes, it makes sense not to create l2fwd-graph. Please enhance the
> l3fwd-graph example and compare the performance in multi-core scenarios.
> 
Thanks for your comments.
Yes, I will use l3fwd-graph and compare the performance in next version.


> 2) It is good to have multiple graph walk schemes like the one you have
> introduced now.
> Though I am not sure about the performance aspects, specifically when
> it is used with multiple producers and multiple consumers per node.
> 
> If you have a use case for the new worker scheme, then we can add it.
> I think it would call for:
>
> a) We need to have a separate rte_graph_worker.h for each implementation
> to avoid a performance impact on each other.
> That may boil down to:
> i) Create lib/graph/rte_graph_worker_common.h
> ii) Treat the existing rte_graph_worker.h as the default scheme and
> include rte_graph_worker_common.h
> iii) Add a new rte_graph_worker_xxx.h for the new scheme (diff against
> the default worker), leveraging rte_graph_worker_common.h
> 
> Application can select the worker by
> 
> #define RTE_GRAPH_WORKER_MODEL_XXX
> //#define RTE_GRAPH_WORKER_MODEL_YYY
> #include <rte_graph_worker.h>
> 
Yes, I will break it down in next version.

> b) Introduce a new enum rte_graph_model or so to express this new model
> and other models in the future
> 
> c) Each core has its own node instance, so we don't need explicit
> critical section management when dealing with node instances.
> In this new scheme, can we leverage the existing node implementation?
> If not, we need to have a separate node
> implementation for different graph models, which will be a maintenance
> issue. But if we really need to take this path, probably each node
> needs to declare the models it supports (using enum rte_graph_model).
> This can be used for sanity checking when we clone the graph and check
> compatibility when creating the graph.
> I think this is the biggest issue with adding a new model: nodes would
> need to be written based on the model. I think this could be the reason
> why VPP has not added other models.
> 
I agree with you. We should leverage the existing node implementation.
Also, I think the node should be agnostic of the graph model.

There are different kinds of data (tables, state, etc.) referred to in the node.
Some of it is private data of the node, which has no constraint under any graph model.
Other data shared within the graph does have a thread-safety prerequisite for access.
Instead of declaring the graph model, it actually makes more sense for the node to declare the visibility of its data.
This can still be transparent to the node. When running on the 'single-core' model, it can simply fall back to zero-cost data access.


> d) For all new slow path APIs like rte_node_set_lcore_affinity and
> rte_graph_clone, we need to fix the namespace as
> rte_graph_model_<model_name>_<operation> or so, to make sure
> application writers understand these APIs are only for this model. (We
> can also use "enum rte_graph_model" for sanity checks etc.)
> 
Yes, if the operation is bound to one model, we need to add the namespace.

But rte_node_set_lcore_affinity() could be treated as a common API for
all models.

For the current single-core model, one graph is bound to one lcore, and it
actually makes an implicit call to set all nodes to the current lcore. So I
think rte_node_set_lcore_affinity() could be a common API.
> 
> 
> >