[v3] mempool: sort the rte_mempool_ops by name
diff mbox series

Message ID 1583501776-9958-1-git-send-email-xiangxia.m.yue@gmail.com
State Superseded, archived
Delegated to: David Marchand
Headers show
Series
  • [v3] mempool: sort the rte_mempool_ops by name
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/travis-robot success Travis build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/checkpatch success coding style OK

Commit Message

Tonghao Zhang March 6, 2020, 1:36 p.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The order of mempool initiation affects mempool index in the
rte_mempool_ops_table. For example, when building APPs with:

$ gcc -lrte_mempool_bucket -lrte_mempool_ring ...

The "bucket" mempool will be registered firstly, and its index
in table is 0 while the index of "ring" mempool is 1. DPDK
uses the mk/rte.app.mk to build APPs, and others, for example,
Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
The mempool lib linked in dpdk and Open vSwitch is different.

The mempool can be used between primary and secondary process,
such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
There will be a crash because dpdk-pdump creates the "ring_mp_mc"
ring which index in table is 0, but the index of "bucket" ring
is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
mempool ops and malloc memory from mempool. The crash will occur:

    bucket_dequeue (access null and crash)
    rte_mempool_get_ops (should get "ring_mp_mc",
                         but get "bucket" mempool)
    rte_mempool_ops_dequeue_bulk
    ...
    rte_pktmbuf_alloc
    rte_pktmbuf_copy
    pdump_copy
    pdump_rx
    rte_eth_rx_burst

To avoid the crash, there are some solution:
* constructor priority: Different mempool uses different
  priority in RTE_INIT, but it's not easy to maintain.

* change mk/rte.app.mk: Change the order in mk/rte.app.mk to
  be same as libdpdk.a/libdpdk.so, but when adding a new mempool
  driver in future, we must make sure the order.

* register mempool orderly: Sort the mempool when registering,
  so the lib linked will not affect the index in mempool table.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
v3:
* fix checkpatches.sh WARNING
* change "initiation -> initialization"

v2:
* use the qsort to sort the mempool_ops.
* tested: https://travis-ci.com/ovn-open-virtual-networks/dpdk-next-net/builds/151894026
---
 lib/librte_mempool/rte_mempool_ops.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Jerin Jacob March 6, 2020, 1:37 p.m. UTC | #1
On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> The order of mempool initiation affects mempool index in the
> rte_mempool_ops_table. For example, when building APPs with:
>
> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
>
> The "bucket" mempool will be registered firstly, and its index
> in table is 0 while the index of "ring" mempool is 1. DPDK
> uses the mk/rte.app.mk to build APPs, and others, for example,
> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> The mempool lib linked in dpdk and Open vSwitch is different.
>
> The mempool can be used between primary and secondary process,
> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> ring which index in table is 0, but the index of "bucket" ring
> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> mempool ops and malloc memory from mempool. The crash will occur:
>
>     bucket_dequeue (access null and crash)
>     rte_mempool_get_ops (should get "ring_mp_mc",
>                          but get "bucket" mempool)
>     rte_mempool_ops_dequeue_bulk
>     ...
>     rte_pktmbuf_alloc
>     rte_pktmbuf_copy
>     pdump_copy
>     pdump_rx
>     rte_eth_rx_burst
>
> To avoid the crash, there are some solution:
> * constructor priority: Different mempool uses different
>   priority in RTE_INIT, but it's not easy to maintain.
>
> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
>   be same as libdpdk.a/libdpdk.so, but when adding a new mempool
>   driver in future, we must make sure the order.
>
> * register mempool orderly: Sort the mempool when registering,
>   so the lib linked will not affect the index in mempool table.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>


> ---
> v3:
> * fix checkpatches.sh WARNING
> * change "initiation -> initialization"
>
> v2:
> * use the qsort to sort the mempool_ops.
> * tested: https://travis-ci.com/ovn-open-virtual-networks/dpdk-next-net/builds/151894026
> ---
>  lib/librte_mempool/rte_mempool_ops.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
> index 22c5251..b0da096 100644
> --- a/lib/librte_mempool/rte_mempool_ops.c
> +++ b/lib/librte_mempool/rte_mempool_ops.c
> @@ -17,6 +17,15 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
>         .num_ops = 0
>  };
>
> +static int
> +compare_mempool_ops(const void *a, const void *b)
> +{
> +       const struct rte_mempool_ops *m_a = a;
> +       const struct rte_mempool_ops *m_b = b;
> +
> +       return strcmp(m_a->name, m_b->name);
> +}
> +
>  /* add a new ops struct in rte_mempool_ops_table, return its index. */
>  int
>  rte_mempool_register_ops(const struct rte_mempool_ops *h)
> @@ -63,6 +72,13 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
>         ops->get_info = h->get_info;
>         ops->dequeue_contig_blocks = h->dequeue_contig_blocks;
>
> +       /*
> +        * sort the rte_mempool_ops by name. the order of the mempool
> +        * lib initialization will not affect rte_mempool_ops index.
> +        */
> +       qsort(rte_mempool_ops_table.ops, rte_mempool_ops_table.num_ops,
> +             sizeof(rte_mempool_ops_table.ops[0]), compare_mempool_ops);
> +
>         rte_spinlock_unlock(&rte_mempool_ops_table.sl);
>
>         return ops_index;
> --
> 1.8.3.1
>
Andrew Rybchenko March 7, 2020, 12:51 p.m. UTC | #2
On 3/6/20 4:37 PM, Jerin Jacob wrote:
> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>
>> The order of mempool initiation affects mempool index in the
>> rte_mempool_ops_table. For example, when building APPs with:
>>
>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
>>
>> The "bucket" mempool will be registered firstly, and its index
>> in table is 0 while the index of "ring" mempool is 1. DPDK
>> uses the mk/rte.app.mk to build APPs, and others, for example,
>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
>> The mempool lib linked in dpdk and Open vSwitch is different.
>>
>> The mempool can be used between primary and secondary process,
>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
>> ring which index in table is 0, but the index of "bucket" ring
>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
>> mempool ops and malloc memory from mempool. The crash will occur:
>>
>>      bucket_dequeue (access null and crash)
>>      rte_mempool_get_ops (should get "ring_mp_mc",
>>                           but get "bucket" mempool)
>>      rte_mempool_ops_dequeue_bulk
>>      ...
>>      rte_pktmbuf_alloc
>>      rte_pktmbuf_copy
>>      pdump_copy
>>      pdump_rx
>>      rte_eth_rx_burst
>>
>> To avoid the crash, there are some solution:
>> * constructor priority: Different mempool uses different
>>    priority in RTE_INIT, but it's not easy to maintain.
>>
>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
>>    driver in future, we must make sure the order.
>>
>> * register mempool orderly: Sort the mempool when registering,
>>    so the lib linked will not affect the index in mempool table.
>>
>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>

The patch is OK, but the fact that ops index changes during
mempool driver lifetime is frightening. In fact it breaks
rte_mempool_register_ops() return value semantics (read
as API break). The return value is not used in DPDK, but it
is a public function. If I'm not mistaken it should be taken
into account.

Also I remember patches which warn about above behaviour
in documentation. If behaviour changes, corresponding
documentation must be updated.
Andrew Rybchenko March 7, 2020, 12:54 p.m. UTC | #3
On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
> On 3/6/20 4:37 PM, Jerin Jacob wrote:
>> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>
>>> The order of mempool initiation affects mempool index in the
>>> rte_mempool_ops_table. For example, when building APPs with:
>>>
>>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
>>>
>>> The "bucket" mempool will be registered firstly, and its index
>>> in table is 0 while the index of "ring" mempool is 1. DPDK
>>> uses the mk/rte.app.mk to build APPs, and others, for example,
>>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
>>> The mempool lib linked in dpdk and Open vSwitch is different.
>>>
>>> The mempool can be used between primary and secondary process,
>>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
>>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
>>> ring which index in table is 0, but the index of "bucket" ring
>>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
>>> mempool ops and malloc memory from mempool. The crash will occur:
>>>
>>>      bucket_dequeue (access null and crash)
>>>      rte_mempool_get_ops (should get "ring_mp_mc",
>>>                           but get "bucket" mempool)
>>>      rte_mempool_ops_dequeue_bulk
>>>      ...
>>>      rte_pktmbuf_alloc
>>>      rte_pktmbuf_copy
>>>      pdump_copy
>>>      pdump_rx
>>>      rte_eth_rx_burst
>>>
>>> To avoid the crash, there are some solution:
>>> * constructor priority: Different mempool uses different
>>>    priority in RTE_INIT, but it's not easy to maintain.
>>>
>>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
>>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
>>>    driver in future, we must make sure the order.
>>>
>>> * register mempool orderly: Sort the mempool when registering,
>>>    so the lib linked will not affect the index in mempool table.
>>>
>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
>> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> The patch is OK, but the fact that ops index changes during
> mempool driver lifetime is frightening. In fact it breaks
> rte_mempool_register_ops() return value semantics (read
> as API break). The return value is not used in DPDK, but it
> is a public function. If I'm not mistaken it should be taken
> into account.
> 
> Also I remember patches which warn about above behaviour
> in documentation. If behaviour changes, corresponding
> documentation must be updated.

One more point. If the patch is finally accepted it definitely
deserves few lines in release notes.
Tonghao Zhang March 9, 2020, 3:01 a.m. UTC | #4
On Sat, Mar 7, 2020 at 8:54 PM Andrew Rybchenko
<arybchenko@solarflare.com> wrote:
>
> On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
> > On 3/6/20 4:37 PM, Jerin Jacob wrote:
> >> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
> >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>
> >>> The order of mempool initiation affects mempool index in the
> >>> rte_mempool_ops_table. For example, when building APPs with:
> >>>
> >>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> >>>
> >>> The "bucket" mempool will be registered firstly, and its index
> >>> in table is 0 while the index of "ring" mempool is 1. DPDK
> >>> uses the mk/rte.app.mk to build APPs, and others, for example,
> >>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> >>> The mempool lib linked in dpdk and Open vSwitch is different.
> >>>
> >>> The mempool can be used between primary and secondary process,
> >>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> >>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> >>> ring which index in table is 0, but the index of "bucket" ring
> >>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> >>> mempool ops and malloc memory from mempool. The crash will occur:
> >>>
> >>>      bucket_dequeue (access null and crash)
> >>>      rte_mempool_get_ops (should get "ring_mp_mc",
> >>>                           but get "bucket" mempool)
> >>>      rte_mempool_ops_dequeue_bulk
> >>>      ...
> >>>      rte_pktmbuf_alloc
> >>>      rte_pktmbuf_copy
> >>>      pdump_copy
> >>>      pdump_rx
> >>>      rte_eth_rx_burst
> >>>
> >>> To avoid the crash, there are some solution:
> >>> * constructor priority: Different mempool uses different
> >>>    priority in RTE_INIT, but it's not easy to maintain.
> >>>
> >>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> >>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> >>>    driver in future, we must make sure the order.
> >>>
> >>> * register mempool orderly: Sort the mempool when registering,
> >>>    so the lib linked will not affect the index in mempool table.
> >>>
> >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> >> Acked-by: Jerin Jacob <jerinj@marvell.com>
> >
> > The patch is OK, but the fact that ops index changes during
> > mempool driver lifetime is frightening. In fact it breaks
> > rte_mempool_register_ops() return value semantics (read
> > as API break). The return value is not used in DPDK, but it
> > is a public function. If I'm not mistaken it should be taken
> > into account.
Yes, should update the doc: how about this:

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index c90cf31..5a9c8a7 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -904,7 +904,9 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
  * @param ops
  *   Pointer to an ops structure to register.
  * @return
- *   - >=0: Success; return the index of the ops struct in the table.
+ *   - >=0: Success; return the index of the last ops struct in the table.
+ *          The number of the ops struct registered is equal to index
+ *          returned + 1.
  *   - -EINVAL - some missing callbacks while registering ops struct.
  *   - -ENOSPC - the maximum number of ops structs has been reached.
  */
diff --git a/lib/librte_mempool/rte_mempool_ops.c
b/lib/librte_mempool/rte_mempool_ops.c
index b0da096..053f340 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -26,7 +26,11 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
        return strcmp(m_a->name, m_b->name);
 }

-/* add a new ops struct in rte_mempool_ops_table, return its index. */
+/*
+ * add a new ops struct in rte_mempool_ops_table.
+ * on success, return the index of the last ops
+ * struct in the table.
+ */
 int
 rte_mempool_register_ops(const struct rte_mempool_ops *h)
 {
> > Also I remember patches which warn about above behaviour
> > in documentation. If behaviour changes, corresponding
> > documentation must be updated.
>
> One more point. If the patch is finally accepted it definitely
> deserves few lines in release notes.
OK, a separate patch should be sent before DPDK 20.05 release ?
>
Olivier Matz March 9, 2020, 8:27 a.m. UTC | #5
Hi,

On Mon, Mar 09, 2020 at 11:01:25AM +0800, Tonghao Zhang wrote:
> On Sat, Mar 7, 2020 at 8:54 PM Andrew Rybchenko
> <arybchenko@solarflare.com> wrote:
> >
> > On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
> > > On 3/6/20 4:37 PM, Jerin Jacob wrote:
> > >> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
> > >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >>>
> > >>> The order of mempool initiation affects mempool index in the
> > >>> rte_mempool_ops_table. For example, when building APPs with:
> > >>>
> > >>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> > >>>
> > >>> The "bucket" mempool will be registered firstly, and its index
> > >>> in table is 0 while the index of "ring" mempool is 1. DPDK
> > >>> uses the mk/rte.app.mk to build APPs, and others, for example,
> > >>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> > >>> The mempool lib linked in dpdk and Open vSwitch is different.
> > >>>
> > >>> The mempool can be used between primary and secondary process,
> > >>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> > >>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> > >>> ring which index in table is 0, but the index of "bucket" ring
> > >>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> > >>> mempool ops and malloc memory from mempool. The crash will occur:
> > >>>
> > >>>      bucket_dequeue (access null and crash)
> > >>>      rte_mempool_get_ops (should get "ring_mp_mc",
> > >>>                           but get "bucket" mempool)
> > >>>      rte_mempool_ops_dequeue_bulk
> > >>>      ...
> > >>>      rte_pktmbuf_alloc
> > >>>      rte_pktmbuf_copy
> > >>>      pdump_copy
> > >>>      pdump_rx
> > >>>      rte_eth_rx_burst
> > >>>
> > >>> To avoid the crash, there are some solution:
> > >>> * constructor priority: Different mempool uses different
> > >>>    priority in RTE_INIT, but it's not easy to maintain.
> > >>>
> > >>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> > >>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> > >>>    driver in future, we must make sure the order.
> > >>>
> > >>> * register mempool orderly: Sort the mempool when registering,
> > >>>    so the lib linked will not affect the index in mempool table.
> > >>>
> > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> > >> Acked-by: Jerin Jacob <jerinj@marvell.com>
> > >
> > > The patch is OK, but the fact that ops index changes during
> > > mempool driver lifetime is frightening. In fact it breaks
> > > rte_mempool_register_ops() return value semantics (read
> > > as API break). The return value is not used in DPDK, but it
> > > is a public function. If I'm not mistaken it should be taken
> > > into account.

Good points.

The fact that the ops index changes during mempool driver lifetime is
indeed frightening, especially knowning that this is a dynamic
registration that could happen at any moment in the life of the
application. Also, breaking the ABI is not desirable.

Let me try to propose something else to solve your issue:

1/ At init, the primary process allocates a struct in shared memory
   (named memzone):

   struct rte_mempool_shared_ops {
     size_t num_mempool_ops;
     struct {
       char name[RTE_MEMPOOL_OPS_NAMESIZE];
     } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
     char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
     rte_spinlock_t mempool;
   }

2/ When we register a mempool ops, we first get a name and id from the
   shared struct: with the lock held, lookup for the registered name and
   return its index, else get the last id and copy the name in the struct.

3/ Then do as before (in the per-process global table), except that we
   reuse the registered id.

We can remove the num_ops field from rte_mempool_ops_table.

Thoughts?


> Yes, should update the doc: how about this:
> 
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index c90cf31..5a9c8a7 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -904,7 +904,9 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
>   * @param ops
>   *   Pointer to an ops structure to register.
>   * @return
> - *   - >=0: Success; return the index of the ops struct in the table.
> + *   - >=0: Success; return the index of the last ops struct in the table.
> + *          The number of the ops struct registered is equal to index
> + *          returned + 1.
>   *   - -EINVAL - some missing callbacks while registering ops struct.
>   *   - -ENOSPC - the maximum number of ops structs has been reached.
>   */
> diff --git a/lib/librte_mempool/rte_mempool_ops.c
> b/lib/librte_mempool/rte_mempool_ops.c
> index b0da096..053f340 100644
> --- a/lib/librte_mempool/rte_mempool_ops.c
> +++ b/lib/librte_mempool/rte_mempool_ops.c
> @@ -26,7 +26,11 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
>         return strcmp(m_a->name, m_b->name);
>  }
> 
> -/* add a new ops struct in rte_mempool_ops_table, return its index. */
> +/*
> + * add a new ops struct in rte_mempool_ops_table.
> + * on success, return the index of the last ops
> + * struct in the table.
> + */
>  int
>  rte_mempool_register_ops(const struct rte_mempool_ops *h)
>  {
> > > Also I remember patches which warn about above behaviour
> > > in documentation. If behaviour changes, corresponding
> > > documentation must be updated.
> >
> > One more point. If the patch is finally accepted it definitely
> > deserves few lines in release notes.
> OK, a separate patch should be sent before DPDK 20.05 release ?
> >
> 
> 
> -- 
> Thanks,
> Tonghao
Tonghao Zhang March 9, 2020, 8:55 a.m. UTC | #6
On Mon, Mar 9, 2020 at 4:27 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> Hi,
>
> On Mon, Mar 09, 2020 at 11:01:25AM +0800, Tonghao Zhang wrote:
> > On Sat, Mar 7, 2020 at 8:54 PM Andrew Rybchenko
> > <arybchenko@solarflare.com> wrote:
> > >
> > > On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
> > > > On 3/6/20 4:37 PM, Jerin Jacob wrote:
> > > >> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
> > > >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >>>
> > > >>> The order of mempool initiation affects mempool index in the
> > > >>> rte_mempool_ops_table. For example, when building APPs with:
> > > >>>
> > > >>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> > > >>>
> > > >>> The "bucket" mempool will be registered firstly, and its index
> > > >>> in table is 0 while the index of "ring" mempool is 1. DPDK
> > > >>> uses the mk/rte.app.mk to build APPs, and others, for example,
> > > >>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> > > >>> The mempool lib linked in dpdk and Open vSwitch is different.
> > > >>>
> > > >>> The mempool can be used between primary and secondary process,
> > > >>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> > > >>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> > > >>> ring which index in table is 0, but the index of "bucket" ring
> > > >>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> > > >>> mempool ops and malloc memory from mempool. The crash will occur:
> > > >>>
> > > >>>      bucket_dequeue (access null and crash)
> > > >>>      rte_mempool_get_ops (should get "ring_mp_mc",
> > > >>>                           but get "bucket" mempool)
> > > >>>      rte_mempool_ops_dequeue_bulk
> > > >>>      ...
> > > >>>      rte_pktmbuf_alloc
> > > >>>      rte_pktmbuf_copy
> > > >>>      pdump_copy
> > > >>>      pdump_rx
> > > >>>      rte_eth_rx_burst
> > > >>>
> > > >>> To avoid the crash, there are some solution:
> > > >>> * constructor priority: Different mempool uses different
> > > >>>    priority in RTE_INIT, but it's not easy to maintain.
> > > >>>
> > > >>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> > > >>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> > > >>>    driver in future, we must make sure the order.
> > > >>>
> > > >>> * register mempool orderly: Sort the mempool when registering,
> > > >>>    so the lib linked will not affect the index in mempool table.
> > > >>>
> > > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> > > >> Acked-by: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > > The patch is OK, but the fact that ops index changes during
> > > > mempool driver lifetime is frightening. In fact it breaks
> > > > rte_mempool_register_ops() return value semantics (read
> > > > as API break). The return value is not used in DPDK, but it
> > > > is a public function. If I'm not mistaken it should be taken
> > > > into account.
>
> Good points.
>
> The fact that the ops index changes during mempool driver lifetime is
> indeed frightening, especially knowning that this is a dynamic
> registration that could happen at any moment in the life of the
> application. Also, breaking the ABI is not desirable.
That solution is better.

> Let me try to propose something else to solve your issue:
>
> 1/ At init, the primary process allocates a struct in shared memory
>    (named memzone):
>
>    struct rte_mempool_shared_ops {
>      size_t num_mempool_ops;
>      struct {
>        char name[RTE_MEMPOOL_OPS_NAMESIZE];
>      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
>      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
>      rte_spinlock_t mempool;
>    }
>
> 2/ When we register a mempool ops, we first get a name and id from the
>    shared struct: with the lock held, lookup for the registered name and
>    return its index, else get the last id and copy the name in the struct.
>
> 3/ Then do as before (in the per-process global table), except that we
>    reuse the registered id.
>
> We can remove the num_ops field from rte_mempool_ops_table.
>
> Thoughts?
>
>
> > Yes, should update the doc: how about this:
> >
> > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > index c90cf31..5a9c8a7 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -904,7 +904,9 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
> >   * @param ops
> >   *   Pointer to an ops structure to register.
> >   * @return
> > - *   - >=0: Success; return the index of the ops struct in the table.
> > + *   - >=0: Success; return the index of the last ops struct in the table.
> > + *          The number of the ops struct registered is equal to index
> > + *          returned + 1.
> >   *   - -EINVAL - some missing callbacks while registering ops struct.
> >   *   - -ENOSPC - the maximum number of ops structs has been reached.
> >   */
> > diff --git a/lib/librte_mempool/rte_mempool_ops.c
> > b/lib/librte_mempool/rte_mempool_ops.c
> > index b0da096..053f340 100644
> > --- a/lib/librte_mempool/rte_mempool_ops.c
> > +++ b/lib/librte_mempool/rte_mempool_ops.c
> > @@ -26,7 +26,11 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
> >         return strcmp(m_a->name, m_b->name);
> >  }
> >
> > -/* add a new ops struct in rte_mempool_ops_table, return its index. */
> > +/*
> > + * add a new ops struct in rte_mempool_ops_table.
> > + * on success, return the index of the last ops
> > + * struct in the table.
> > + */
> >  int
> >  rte_mempool_register_ops(const struct rte_mempool_ops *h)
> >  {
> > > > Also I remember patches which warn about above behaviour
> > > > in documentation. If behaviour changes, corresponding
> > > > documentation must be updated.
> > >
> > > One more point. If the patch is finally accepted it definitely
> > > deserves few lines in release notes.
> > OK, a separate patch should be sent before DPDK 20.05 release ?
> > >
> >
> >
> > --
> > Thanks,
> > Tonghao



--
Thanks,
Tonghao
Olivier Matz March 9, 2020, 9:05 a.m. UTC | #7
On Mon, Mar 09, 2020 at 04:55:28PM +0800, Tonghao Zhang wrote:
> On Mon, Mar 9, 2020 at 4:27 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > Hi,
> >
> > On Mon, Mar 09, 2020 at 11:01:25AM +0800, Tonghao Zhang wrote:
> > > On Sat, Mar 7, 2020 at 8:54 PM Andrew Rybchenko
> > > <arybchenko@solarflare.com> wrote:
> > > >
> > > > On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
> > > > > On 3/6/20 4:37 PM, Jerin Jacob wrote:
> > > > >> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
> > > > >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > > >>>
> > > > >>> The order of mempool initiation affects mempool index in the
> > > > >>> rte_mempool_ops_table. For example, when building APPs with:
> > > > >>>
> > > > >>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> > > > >>>
> > > > >>> The "bucket" mempool will be registered firstly, and its index
> > > > >>> in table is 0 while the index of "ring" mempool is 1. DPDK
> > > > >>> uses the mk/rte.app.mk to build APPs, and others, for example,
> > > > >>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> > > > >>> The mempool lib linked in dpdk and Open vSwitch is different.
> > > > >>>
> > > > >>> The mempool can be used between primary and secondary process,
> > > > >>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> > > > >>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> > > > >>> ring which index in table is 0, but the index of "bucket" ring
> > > > >>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> > > > >>> mempool ops and malloc memory from mempool. The crash will occur:
> > > > >>>
> > > > >>>      bucket_dequeue (access null and crash)
> > > > >>>      rte_mempool_get_ops (should get "ring_mp_mc",
> > > > >>>                           but get "bucket" mempool)
> > > > >>>      rte_mempool_ops_dequeue_bulk
> > > > >>>      ...
> > > > >>>      rte_pktmbuf_alloc
> > > > >>>      rte_pktmbuf_copy
> > > > >>>      pdump_copy
> > > > >>>      pdump_rx
> > > > >>>      rte_eth_rx_burst
> > > > >>>
> > > > >>> To avoid the crash, there are some solution:
> > > > >>> * constructor priority: Different mempool uses different
> > > > >>>    priority in RTE_INIT, but it's not easy to maintain.
> > > > >>>
> > > > >>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> > > > >>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> > > > >>>    driver in future, we must make sure the order.
> > > > >>>
> > > > >>> * register mempool orderly: Sort the mempool when registering,
> > > > >>>    so the lib linked will not affect the index in mempool table.
> > > > >>>
> > > > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > > >>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> > > > >> Acked-by: Jerin Jacob <jerinj@marvell.com>
> > > > >
> > > > > The patch is OK, but the fact that ops index changes during
> > > > > mempool driver lifetime is frightening. In fact it breaks
> > > > > rte_mempool_register_ops() return value semantics (read
> > > > > as API break). The return value is not used in DPDK, but it
> > > > > is a public function. If I'm not mistaken it should be taken
> > > > > into account.
> >
> > Good points.
> >
> > The fact that the ops index changes during mempool driver lifetime is
> > indeed frightening, especially knowning that this is a dynamic
> > registration that could happen at any moment in the life of the
> > application. Also, breaking the ABI is not desirable.
> That solution is better.
> 
> > Let me try to propose something else to solve your issue:
> >
> > 1/ At init, the primary process allocates a struct in shared memory
> >    (named memzone):
> >
> >    struct rte_mempool_shared_ops {
> >      size_t num_mempool_ops;
> >      struct {
> >        char name[RTE_MEMPOOL_OPS_NAMESIZE];
> >      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];


> >      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];

oops I forgot to remove this line (replaced by mini-struct just above).


> >      rte_spinlock_t mempool;
> >    }
> >
> > 2/ When we register a mempool ops, we first get a name and id from the
> >    shared struct: with the lock held, lookup for the registered name and
> >    return its index, else get the last id and copy the name in the struct.
> >
> > 3/ Then do as before (in the per-process global table), except that we
> >    reuse the registered id.
> >
> > We can remove the num_ops field from rte_mempool_ops_table.
> >
> > Thoughts?
> >
> >
> > > Yes, should update the doc: how about this:
> > >
> > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > > index c90cf31..5a9c8a7 100644
> > > --- a/lib/librte_mempool/rte_mempool.h
> > > +++ b/lib/librte_mempool/rte_mempool.h
> > > @@ -904,7 +904,9 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
> > >   * @param ops
> > >   *   Pointer to an ops structure to register.
> > >   * @return
> > > - *   - >=0: Success; return the index of the ops struct in the table.
> > > + *   - >=0: Success; return the index of the last ops struct in the table.
> > > + *          The number of the ops struct registered is equal to index
> > > + *          returned + 1.
> > >   *   - -EINVAL - some missing callbacks while registering ops struct.
> > >   *   - -ENOSPC - the maximum number of ops structs has been reached.
> > >   */
> > > diff --git a/lib/librte_mempool/rte_mempool_ops.c
> > > b/lib/librte_mempool/rte_mempool_ops.c
> > > index b0da096..053f340 100644
> > > --- a/lib/librte_mempool/rte_mempool_ops.c
> > > +++ b/lib/librte_mempool/rte_mempool_ops.c
> > > @@ -26,7 +26,11 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
> > >         return strcmp(m_a->name, m_b->name);
> > >  }
> > >
> > > -/* add a new ops struct in rte_mempool_ops_table, return its index. */
> > > +/*
> > > + * add a new ops struct in rte_mempool_ops_table.
> > > + * on success, return the index of the last ops
> > > + * struct in the table.
> > > + */
> > >  int
> > >  rte_mempool_register_ops(const struct rte_mempool_ops *h)
> > >  {
> > > > > Also I remember patches which warn about above behaviour
> > > > > in documentation. If behaviour changes, corresponding
> > > > > documentation must be updated.
> > > >
> > > > One more point. If the patch is finally accepted it definitely
> > > > deserves few lines in release notes.
> > > OK, a separate patch should be sent before DPDK 20.05 release ?
> > > >
> > >
> > >
> > > --
> > > Thanks,
> > > Tonghao
> 
> 
> 
> --
> Thanks,
> Tonghao
David Marchand March 9, 2020, 1:15 p.m. UTC | #8
On Mon, Mar 9, 2020 at 9:56 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> On Mon, Mar 9, 2020 at 4:27 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > The fact that the ops index changes during mempool driver lifetime is
> > indeed frightening, especially knowning that this is a dynamic
> > registration that could happen at any moment in the life of the
> > application. Also, breaking the ABI is not desirable.
> That solution is better.
>
> > Let me try to propose something else to solve your issue:
> >
> > 1/ At init, the primary process allocates a struct in shared memory
> >    (named memzone):
> >
> >    struct rte_mempool_shared_ops {
> >      size_t num_mempool_ops;
> >      struct {
> >        char name[RTE_MEMPOOL_OPS_NAMESIZE];
> >      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> >      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
> >      rte_spinlock_t mempool;
> >    }
> >
> > 2/ When we register a mempool ops, we first get a name and id from the
> >    shared struct: with the lock held, lookup for the registered name and
> >    return its index, else get the last id and copy the name in the struct.
> >
> > 3/ Then do as before (in the per-process global table), except that we
> >    reuse the registered id.
> >
> > We can remove the num_ops field from rte_mempool_ops_table.
> >
> > Thoughts?

It seems better, just adding Anatoly and Bruce who know more about multiprocess.

Tonghao, could you add a unit test to exhibit the issue as part of this work?

Thanks.
Tonghao Zhang March 16, 2020, 7:43 a.m. UTC | #9
On Mon, Mar 9, 2020 at 9:15 PM David Marchand <david.marchand@redhat.com> wrote:
>
> On Mon, Mar 9, 2020 at 9:56 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > On Mon, Mar 9, 2020 at 4:27 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > > The fact that the ops index changes during mempool driver lifetime is
> > > indeed frightening, especially knowning that this is a dynamic
> > > registration that could happen at any moment in the life of the
> > > application. Also, breaking the ABI is not desirable.
> > That solution is better.
> >
> > > Let me try to propose something else to solve your issue:
> > >
> > > 1/ At init, the primary process allocates a struct in shared memory
> > >    (named memzone):
> > >
> > >    struct rte_mempool_shared_ops {
> > >      size_t num_mempool_ops;
> > >      struct {
> > >        char name[RTE_MEMPOOL_OPS_NAMESIZE];
> > >      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> > >      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
> > >      rte_spinlock_t mempool;
> > >    }
> > >
> > > 2/ When we register a mempool ops, we first get a name and id from the
> > >    shared struct: with the lock held, lookup for the registered name and
> > >    return its index, else get the last id and copy the name in the struct.
> > >
> > > 3/ Then do as before (in the per-process global table), except that we
> > >    reuse the registered id.
> > >
> > > We can remove the num_ops field from rte_mempool_ops_table.
> > >
> > > Thoughts?
>
> It seems better, just adding Anatoly and Bruce who know more about multiprocess.
>
> Tonghao, could you add a unit test to exhibit the issue as part of this work?
>
> Thanks.
Hi Olivier
any progress?will apply this patch or wait your patch?

>
> --
> David Marchand
>
Olivier Matz March 16, 2020, 7:55 a.m. UTC | #10
Hi Tonghao,

On Mon, Mar 16, 2020 at 03:43:40PM +0800, Tonghao Zhang wrote:
> On Mon, Mar 9, 2020 at 9:15 PM David Marchand <david.marchand@redhat.com> wrote:
> >
> > On Mon, Mar 9, 2020 at 9:56 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > > On Mon, Mar 9, 2020 at 4:27 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > > > The fact that the ops index changes during mempool driver lifetime is
> > > > indeed frightening, especially knowning that this is a dynamic
> > > > registration that could happen at any moment in the life of the
> > > > application. Also, breaking the ABI is not desirable.
> > > That solution is better.
> > >
> > > > Let me try to propose something else to solve your issue:
> > > >
> > > > 1/ At init, the primary process allocates a struct in shared memory
> > > >    (named memzone):
> > > >
> > > >    struct rte_mempool_shared_ops {
> > > >      size_t num_mempool_ops;
> > > >      struct {
> > > >        char name[RTE_MEMPOOL_OPS_NAMESIZE];
> > > >      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> > > >      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
> > > >      rte_spinlock_t mempool;
> > > >    }
> > > >
> > > > 2/ When we register a mempool ops, we first get a name and id from the
> > > >    shared struct: with the lock held, lookup for the registered name and
> > > >    return its index, else get the last id and copy the name in the struct.
> > > >
> > > > 3/ Then do as before (in the per-process global table), except that we
> > > >    reuse the registered id.
> > > >
> > > > We can remove the num_ops field from rte_mempool_ops_table.
> > > >
> > > > Thoughts?
> >
> > It seems better, just adding Anatoly and Bruce who know more about multiprocess.
> >
> > Tonghao, could you add a unit test to exhibit the issue as part of this work?
> >
> > Thanks.
> Hi Olivier
> any progress?will apply this patch or wait your patch?

Sorry if it wasn't clear, but my proposition was just an idea, I have no
time yet to work on it. Feel free to implement it by yourself, and I'll
help to review the patches.

Thanks,
Olivier
Andrew Rybchenko March 24, 2020, 9:35 a.m. UTC | #11
On 3/9/20 11:27 AM, Olivier Matz wrote:
> Hi,
> 
> On Mon, Mar 09, 2020 at 11:01:25AM +0800, Tonghao Zhang wrote:
>> On Sat, Mar 7, 2020 at 8:54 PM Andrew Rybchenko
>> <arybchenko@solarflare.com> wrote:
>>>
>>> On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
>>>> On 3/6/20 4:37 PM, Jerin Jacob wrote:
>>>>> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
>>>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>>>>
>>>>>> The order of mempool initiation affects mempool index in the
>>>>>> rte_mempool_ops_table. For example, when building APPs with:
>>>>>>
>>>>>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
>>>>>>
>>>>>> The "bucket" mempool will be registered firstly, and its index
>>>>>> in table is 0 while the index of "ring" mempool is 1. DPDK
>>>>>> uses the mk/rte.app.mk to build APPs, and others, for example,
>>>>>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
>>>>>> The mempool lib linked in dpdk and Open vSwitch is different.
>>>>>>
>>>>>> The mempool can be used between primary and secondary process,
>>>>>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
>>>>>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
>>>>>> ring which index in table is 0, but the index of "bucket" ring
>>>>>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
>>>>>> mempool ops and malloc memory from mempool. The crash will occur:
>>>>>>
>>>>>>      bucket_dequeue (access null and crash)
>>>>>>      rte_mempool_get_ops (should get "ring_mp_mc",
>>>>>>                           but get "bucket" mempool)
>>>>>>      rte_mempool_ops_dequeue_bulk
>>>>>>      ...
>>>>>>      rte_pktmbuf_alloc
>>>>>>      rte_pktmbuf_copy
>>>>>>      pdump_copy
>>>>>>      pdump_rx
>>>>>>      rte_eth_rx_burst
>>>>>>
>>>>>> To avoid the crash, there are some solution:
>>>>>> * constructor priority: Different mempool uses different
>>>>>>    priority in RTE_INIT, but it's not easy to maintain.
>>>>>>
>>>>>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
>>>>>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
>>>>>>    driver in future, we must make sure the order.
>>>>>>
>>>>>> * register mempool orderly: Sort the mempool when registering,
>>>>>>    so the lib linked will not affect the index in mempool table.
>>>>>>
>>>>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>>>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
>>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
>>>>
>>>> The patch is OK, but the fact that ops index changes during
>>>> mempool driver lifetime is frightening. In fact it breaks
>>>> rte_mempool_register_ops() return value semantics (read
>>>> as API break). The return value is not used in DPDK, but it
>>>> is a public function. If I'm not mistaken it should be taken
>>>> into account.
> 
> Good points.
> 
> The fact that the ops index changes during mempool driver lifetime is
> indeed frightening, especially knowning that this is a dynamic
> registration that could happen at any moment in the life of the
> application. Also, breaking the ABI is not desirable.
> 
> Let me try to propose something else to solve your issue:
> 
> 1/ At init, the primary process allocates a struct in shared memory
>    (named memzone):
> 
>    struct rte_mempool_shared_ops {
>      size_t num_mempool_ops;
>      struct {
>        char name[RTE_MEMPOOL_OPS_NAMESIZE];
>      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
>      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
>      rte_spinlock_t mempool;
>    }
> 
> 2/ When we register a mempool ops, we first get a name and id from the
>    shared struct: with the lock held, lookup for the registered name and
>    return its index, else get the last id and copy the name in the struct.
> 
> 3/ Then do as before (in the per-process global table), except that we
>    reuse the registered id.
> 
> We can remove the num_ops field from rte_mempool_ops_table.
> 
> Thoughts?

I like the solution.
Tonghao Zhang March 24, 2020, 12:41 p.m. UTC | #12
On Tue, Mar 24, 2020 at 5:36 PM Andrew Rybchenko
<arybchenko@solarflare.com> wrote:
>
> On 3/9/20 11:27 AM, Olivier Matz wrote:
> > Hi,
> >
> > On Mon, Mar 09, 2020 at 11:01:25AM +0800, Tonghao Zhang wrote:
> >> On Sat, Mar 7, 2020 at 8:54 PM Andrew Rybchenko
> >> <arybchenko@solarflare.com> wrote:
> >>>
> >>> On 3/7/20 3:51 PM, Andrew Rybchenko wrote:
> >>>> On 3/6/20 4:37 PM, Jerin Jacob wrote:
> >>>>> On Fri, Mar 6, 2020 at 7:06 PM <xiangxia.m.yue@gmail.com> wrote:
> >>>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>>
> >>>>>> The order of mempool initiation affects mempool index in the
> >>>>>> rte_mempool_ops_table. For example, when building APPs with:
> >>>>>>
> >>>>>> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> >>>>>>
> >>>>>> The "bucket" mempool will be registered firstly, and its index
> >>>>>> in table is 0 while the index of "ring" mempool is 1. DPDK
> >>>>>> uses the mk/rte.app.mk to build APPs, and others, for example,
> >>>>>> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> >>>>>> The mempool lib linked in dpdk and Open vSwitch is different.
> >>>>>>
> >>>>>> The mempool can be used between primary and secondary process,
> >>>>>> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> >>>>>> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> >>>>>> ring which index in table is 0, but the index of "bucket" ring
> >>>>>> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> >>>>>> mempool ops and malloc memory from mempool. The crash will occur:
> >>>>>>
> >>>>>>      bucket_dequeue (access null and crash)
> >>>>>>      rte_mempool_get_ops (should get "ring_mp_mc",
> >>>>>>                           but get "bucket" mempool)
> >>>>>>      rte_mempool_ops_dequeue_bulk
> >>>>>>      ...
> >>>>>>      rte_pktmbuf_alloc
> >>>>>>      rte_pktmbuf_copy
> >>>>>>      pdump_copy
> >>>>>>      pdump_rx
> >>>>>>      rte_eth_rx_burst
> >>>>>>
> >>>>>> To avoid the crash, there are some solution:
> >>>>>> * constructor priority: Different mempool uses different
> >>>>>>    priority in RTE_INIT, but it's not easy to maintain.
> >>>>>>
> >>>>>> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> >>>>>>    be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> >>>>>>    driver in future, we must make sure the order.
> >>>>>>
> >>>>>> * register mempool orderly: Sort the mempool when registering,
> >>>>>>    so the lib linked will not affect the index in mempool table.
> >>>>>>
> >>>>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> >>>>> Acked-by: Jerin Jacob <jerinj@marvell.com>
> >>>>
> >>>> The patch is OK, but the fact that ops index changes during
> >>>> mempool driver lifetime is frightening. In fact it breaks
> >>>> rte_mempool_register_ops() return value semantics (read
> >>>> as API break). The return value is not used in DPDK, but it
> >>>> is a public function. If I'm not mistaken it should be taken
> >>>> into account.
> >
> > Good points.
> >
> > The fact that the ops index changes during mempool driver lifetime is
> > indeed frightening, especially knowning that this is a dynamic
> > registration that could happen at any moment in the life of the
> > application. Also, breaking the ABI is not desirable.
> >
> > Let me try to propose something else to solve your issue:
> >
> > 1/ At init, the primary process allocates a struct in shared memory
> >    (named memzone):
> >
> >    struct rte_mempool_shared_ops {
> >      size_t num_mempool_ops;
> >      struct {
> >        char name[RTE_MEMPOOL_OPS_NAMESIZE];
> >      } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> >      char *mempool_ops_name[RTE_MEMPOOL_MAX_OPS_IDX];
> >      rte_spinlock_t mempool;
> >    }
> >
> > 2/ When we register a mempool ops, we first get a name and id from the
> >    shared struct: with the lock held, lookup for the registered name and
> >    return its index, else get the last id and copy the name in the struct.
> >
> > 3/ Then do as before (in the per-process global table), except that we
> >    reuse the registered id.
> >
> > We can remove the num_ops field from rte_mempool_ops_table.
> >
> > Thoughts?
>
> I like the solution.
The patch will be sent, thanks.

Patch
diff mbox series

diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 22c5251..b0da096 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -17,6 +17,15 @@  struct rte_mempool_ops_table rte_mempool_ops_table = {
 	.num_ops = 0
 };
 
+static int
+compare_mempool_ops(const void *a, const void *b)
+{
+	const struct rte_mempool_ops *m_a = a;
+	const struct rte_mempool_ops *m_b = b;
+
+	return strcmp(m_a->name, m_b->name);
+}
+
 /* add a new ops struct in rte_mempool_ops_table, return its index. */
 int
 rte_mempool_register_ops(const struct rte_mempool_ops *h)
@@ -63,6 +72,13 @@  struct rte_mempool_ops_table rte_mempool_ops_table = {
 	ops->get_info = h->get_info;
 	ops->dequeue_contig_blocks = h->dequeue_contig_blocks;
 
+	/*
+	 * sort the rte_mempool_ops by name. the order of the mempool
+	 * lib initialization will not affect rte_mempool_ops index.
+	 */
+	qsort(rte_mempool_ops_table.ops, rte_mempool_ops_table.num_ops,
+	      sizeof(rte_mempool_ops_table.ops[0]), compare_mempool_ops);
+
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
 	return ops_index;