mbox series

[v3,0/8] fix distributor synchronization issues

Message ID 20200923132541.21417-1-l.wojciechow@partner.samsung.com (mailing list archive)
Headers show
Series fix distributor synchronization issues | expand

Message

Lukasz Wojciechowski Sept. 23, 2020, 1:25 p.m. UTC
During review and verification of the patch created by Sarosh Arif:
"test_distributor: prevent memory leakages from the pool" I found out
that running distributor unit tests multiple times in a row causes fails.
So I investigated all the issues I found.

There are few synchronization issues that might cause deadlocks
or corrupted data. They are fixed with this set of patches for both tests
and librte_distributor library.

---
v3:
* add missing acked and tested by statements from v1

v2:
* assign NULL to freed mbufs in distributor test
* fix handshake check on legacy single distributor
     rte_distributor_return_pkt_single()
* add patch 7 passing NULL to legacy API calls if no bufs are returned
* add patch 8 fixing API documentation

Lukasz Wojciechowski (8):
  app/test: fix deadlock in distributor test
  app/test: synchronize statistics between lcores
  app/test: fix freeing mbufs in distributor tests
  app/test: collect return mbufs in distributor test
  distributor: fix missing handshake synchronization
  distributor: fix handshake deadlock
  distributor: do not use oldpkt when not needed
  distributor: align API documentation with code

 app/test/test_distributor.c                   | 113 +++++++++++-------
 lib/librte_distributor/rte_distributor.c      |  27 ++++-
 lib/librte_distributor/rte_distributor.h      |  23 ++--
 .../rte_distributor_single.c                  |   4 +
 4 files changed, 110 insertions(+), 57 deletions(-)

Comments

David Marchand Sept. 25, 2020, 12:31 p.m. UTC | #1
Hello Lukasz,

On Wed, Sep 23, 2020 at 3:25 PM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> During review and verification of the patch created by Sarosh Arif:
> "test_distributor: prevent memory leakages from the pool" I found out
> that running distributor unit tests multiple times in a row causes fails.
> So I investigated all the issues I found.
>
> There are few synchronization issues that might cause deadlocks
> or corrupted data. They are fixed with this set of patches for both tests
> and librte_distributor library.
>
> ---
> v3:
> * add missing acked and tested by statements from v1
>
> v2:
> * assign NULL to freed mbufs in distributor test
> * fix handshake check on legacy single distributor
>      rte_distributor_return_pkt_single()
> * add patch 7 passing NULL to legacy API calls if no bufs are returned
> * add patch 8 fixing API documentation
>
> Lukasz Wojciechowski (8):
>   app/test: fix deadlock in distributor test
>   app/test: synchronize statistics between lcores
>   app/test: fix freeing mbufs in distributor tests
>   app/test: collect return mbufs in distributor test

For these patches, we can use the "test/distributor: " prefix, and we
then avoid repeating "in distributor test"


>   distributor: fix missing handshake synchronization
>   distributor: fix handshake deadlock
>   distributor: do not use oldpkt when not needed
>   distributor: align API documentation with code

Thanks for working on those fixes !

Here is a suggestion:

- we can move this new patch 7 before patch 3 in the series, and
update the unit test:
 * by passing NULL to the first call to rte_distributor_get_pkt(),
there is no need for buf[] array init in handle_work(),
handle_work_with_free_mbufs() and handle_work_for_shutdown_test(),
 * at all points of those functions the buf[] array then contains only
[0, num[ valid entries,
 * bonus point, this makes the UT check passing NULL oldpkt,

- the former patch 3 is then easier to do since there is no need for
buf[] array clearing,

This gives the following diff, wdyt?

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index f31b54edf3..b7ab93ecbe 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -65,13 +65,10 @@ handle_work(void *arg)
        struct rte_mbuf *buf[8] __rte_cache_aligned;
        struct worker_params *wp = arg;
        struct rte_distributor *db = wp->dist;
-       unsigned int count = 0, num = 0;
+       unsigned int count = 0, num;
        unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
-       int i;

-       for (i = 0; i < 8; i++)
-               buf[i] = NULL;
-       num = rte_distributor_get_pkt(db, id, buf, buf, num);
+       num = rte_distributor_get_pkt(db, id, buf, NULL, 0);
        while (!quit) {
                __atomic_fetch_add(&worker_stats[id].handled_packets, num,
                                __ATOMIC_ACQ_REL);
@@ -277,22 +274,17 @@ handle_work_with_free_mbufs(void *arg)
        struct rte_mbuf *buf[8] __rte_cache_aligned;
        struct worker_params *wp = arg;
        struct rte_distributor *d = wp->dist;
+       unsigned int num;
        unsigned int i;
-       unsigned int num = 0;
        unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);

-       for (i = 0; i < 8; i++)
-               buf[i] = NULL;
-       num = rte_distributor_get_pkt(d, id, buf, buf, 0);
+       num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
        while (!quit) {
                __atomic_fetch_add(&worker_stats[id].handled_packets, num,
                                __ATOMIC_ACQ_REL);
-               for (i = 0; i < num; i++) {
+               for (i = 0; i < num; i++)
                        rte_pktmbuf_free(buf[i]);
-                       buf[i] = NULL;
-               }
-               num = rte_distributor_get_pkt(d,
-                               id, buf, buf, 0);
+               num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
        }
        __atomic_fetch_add(&worker_stats[id].handled_packets, num,
                        __ATOMIC_ACQ_REL);
@@ -347,15 +339,13 @@ handle_work_for_shutdown_test(void *arg)
        struct rte_mbuf *buf[8] __rte_cache_aligned;
        struct worker_params *wp = arg;
        struct rte_distributor *d = wp->dist;
-       unsigned int num = 0;
+       unsigned int num;
        unsigned int i;
        unsigned int zero_id = 0;
        const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
                        __ATOMIC_RELAXED);
-       for (i = 0; i < 8; i++)
-               buf[i] = NULL;

-       num = rte_distributor_get_pkt(d, id, buf, buf, 0);
+       num = rte_distributor_get_pkt(d, id, buf, NULL, 0);

        zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
        if (id == zero_id && num > 0) {
@@ -369,12 +359,9 @@ handle_work_for_shutdown_test(void *arg)
        while (!quit && !(id == zero_id && zero_quit)) {
                __atomic_fetch_add(&worker_stats[id].handled_packets, num,
                                __ATOMIC_ACQ_REL);
-               for (i = 0; i < num; i++) {
+               for (i = 0; i < num; i++)
                        rte_pktmbuf_free(buf[i]);
-                       buf[i] = NULL;
-               }
-               num = rte_distributor_get_pkt(d,
-                               id, buf, buf, 0);
+               num = rte_distributor_get_pkt(d, id, buf, NULL, 0);

                zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
                if (id == zero_id && num > 0) {
@@ -392,21 +379,16 @@ handle_work_for_shutdown_test(void *arg)
                while (zero_quit)
                        usleep(100);

-               for (i = 0; i < num; i++) {
+               for (i = 0; i < num; i++)
                        rte_pktmbuf_free(buf[i]);
-                       buf[i] = NULL;
-               }
-               num = rte_distributor_get_pkt(d,
-                               id, buf, buf, 0);
+               num = rte_distributor_get_pkt(d, id, buf, NULL, 0);

                while (!quit) {
                        __atomic_fetch_add(&worker_stats[id].handled_packets,
                                        num, __ATOMIC_ACQ_REL);
-                       for (i = 0; i < num; i++) {
+                       for (i = 0; i < num; i++)
                                rte_pktmbuf_free(buf[i]);
-                               buf[i] = NULL;
-                       }
-                       num = rte_distributor_get_pkt(d, id, buf, buf, 0);
+                       num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
                }
        }
        rte_distributor_return_pkt(d, id, buf, num);
Lukasz Wojciechowski Sept. 25, 2020, 10:42 p.m. UTC | #2
Hello David,

Thank you for your review.

W dniu 25.09.2020 o 14:31, David Marchand pisze:
> Hello Lukasz,
>
> On Wed, Sep 23, 2020 at 3:25 PM Lukasz Wojciechowski
> <l.wojciechow@partner.samsung.com> wrote:
>> During review and verification of the patch created by Sarosh Arif:
>> "test_distributor: prevent memory leakages from the pool" I found out
>> that running distributor unit tests multiple times in a row causes fails.
>> So I investigated all the issues I found.
>>
>> There are few synchronization issues that might cause deadlocks
>> or corrupted data. They are fixed with this set of patches for both tests
>> and librte_distributor library.
>>
>> ---
>> v3:
>> * add missing acked and tested by statements from v1
>>
>> v2:
>> * assign NULL to freed mbufs in distributor test
>> * fix handshake check on legacy single distributor
>>       rte_distributor_return_pkt_single()
>> * add patch 7 passing NULL to legacy API calls if no bufs are returned
>> * add patch 8 fixing API documentation
>>
>> Lukasz Wojciechowski (8):
>>    app/test: fix deadlock in distributor test
>>    app/test: synchronize statistics between lcores
>>    app/test: fix freeing mbufs in distributor tests
>>    app/test: collect return mbufs in distributor test
> For these patches, we can use the "test/distributor: " prefix, and we
> then avoid repeating "in distributor test"
Changed
>
>>    distributor: fix missing handshake synchronization
>>    distributor: fix handshake deadlock
>>    distributor: do not use oldpkt when not needed
>>    distributor: align API documentation with code
> Thanks for working on those fixes !
>
> Here is a suggestion:
>
> - we can move this new patch 7 before patch 3 in the series, and
> update the unit test:
>   * by passing NULL to the first call to rte_distributor_get_pkt(),
> there is no need for buf[] array init in handle_work(),
> handle_work_with_free_mbufs() and handle_work_for_shutdown_test(),
>   * at all points of those functions the buf[] array then contains only
> [0, num[ valid entries,
>   * bonus point, this makes the UT check passing NULL oldpkt,
>
> - the former patch 3 is then easier to do since there is no need for
> buf[] array clearing,
>
> This gives the following diff, wdyt?
I reorder patches as you suggested.
I added unit test changes in the same patch that changes distributor 
lib. The changes follow your diff.
This also simplified "fix freeing mbufs" patch.
It's applied in v4.
>
> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> index f31b54edf3..b7ab93ecbe 100644
> --- a/app/test/test_distributor.c
> +++ b/app/test/test_distributor.c
> @@ -65,13 +65,10 @@ handle_work(void *arg)
>          struct rte_mbuf *buf[8] __rte_cache_aligned;
>          struct worker_params *wp = arg;
>          struct rte_distributor *db = wp->dist;
> -       unsigned int count = 0, num = 0;
> +       unsigned int count = 0, num;
>          unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
> -       int i;
>
> -       for (i = 0; i < 8; i++)
> -               buf[i] = NULL;
> -       num = rte_distributor_get_pkt(db, id, buf, buf, num);
> +       num = rte_distributor_get_pkt(db, id, buf, NULL, 0);
>          while (!quit) {
>                  __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>                                  __ATOMIC_ACQ_REL);
> @@ -277,22 +274,17 @@ handle_work_with_free_mbufs(void *arg)
>          struct rte_mbuf *buf[8] __rte_cache_aligned;
>          struct worker_params *wp = arg;
>          struct rte_distributor *d = wp->dist;
> +       unsigned int num;
>          unsigned int i;
> -       unsigned int num = 0;
>          unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
>
> -       for (i = 0; i < 8; i++)
> -               buf[i] = NULL;
> -       num = rte_distributor_get_pkt(d, id, buf, buf, 0);
> +       num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
>          while (!quit) {
>                  __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>                                  __ATOMIC_ACQ_REL);
> -               for (i = 0; i < num; i++) {
> +               for (i = 0; i < num; i++)
>                          rte_pktmbuf_free(buf[i]);
> -                       buf[i] = NULL;
> -               }
> -               num = rte_distributor_get_pkt(d,
> -                               id, buf, buf, 0);
> +               num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
>          }
>          __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>                          __ATOMIC_ACQ_REL);
> @@ -347,15 +339,13 @@ handle_work_for_shutdown_test(void *arg)
>          struct rte_mbuf *buf[8] __rte_cache_aligned;
>          struct worker_params *wp = arg;
>          struct rte_distributor *d = wp->dist;
> -       unsigned int num = 0;
> +       unsigned int num;
>          unsigned int i;
>          unsigned int zero_id = 0;
>          const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
>                          __ATOMIC_RELAXED);
> -       for (i = 0; i < 8; i++)
> -               buf[i] = NULL;
>
> -       num = rte_distributor_get_pkt(d, id, buf, buf, 0);
> +       num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
>
>          zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
>          if (id == zero_id && num > 0) {
> @@ -369,12 +359,9 @@ handle_work_for_shutdown_test(void *arg)
>          while (!quit && !(id == zero_id && zero_quit)) {
>                  __atomic_fetch_add(&worker_stats[id].handled_packets, num,
>                                  __ATOMIC_ACQ_REL);
> -               for (i = 0; i < num; i++) {
> +               for (i = 0; i < num; i++)
>                          rte_pktmbuf_free(buf[i]);
> -                       buf[i] = NULL;
> -               }
> -               num = rte_distributor_get_pkt(d,
> -                               id, buf, buf, 0);
> +               num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
>
>                  zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
>                  if (id == zero_id && num > 0) {
> @@ -392,21 +379,16 @@ handle_work_for_shutdown_test(void *arg)
>                  while (zero_quit)
>                          usleep(100);
>
> -               for (i = 0; i < num; i++) {
> +               for (i = 0; i < num; i++)
>                          rte_pktmbuf_free(buf[i]);
> -                       buf[i] = NULL;
> -               }
> -               num = rte_distributor_get_pkt(d,
> -                               id, buf, buf, 0);
> +               num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
>
>                  while (!quit) {
>                          __atomic_fetch_add(&worker_stats[id].handled_packets,
>                                          num, __ATOMIC_ACQ_REL);
> -                       for (i = 0; i < num; i++) {
> +                       for (i = 0; i < num; i++)
>                                  rte_pktmbuf_free(buf[i]);
> -                               buf[i] = NULL;
> -                       }
> -                       num = rte_distributor_get_pkt(d, id, buf, buf, 0);
> +                       num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
>                  }
>          }
>          rte_distributor_return_pkt(d, id, buf, num);
>
>