app: fix silent enqueue fail in test_mbuf test_refcnt_iter

Message ID 20230822063453.97904-1-jhascoet@kalray.eu (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series app: fix silent enqueue fail in test_mbuf test_refcnt_iter |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS

Commit Message

jhascoet Aug. 22, 2023, 6:34 a.m. UTC
  From: Julien Hascoet <ju.hascoet@gmail.com>

In case of ring full state, we retry the enqueue
operation in order to avoid mbuf loss.

Fixes: af75078fece ("first public release")

Signed-off-by: Julien Hascoet <ju.hascoet@gmail.com>
---
 app/test/test_mbuf.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
  

Comments

Olivier Matz Aug. 22, 2023, 8:34 a.m. UTC | #1
Hello Julien,

On Tue, Aug 22, 2023 at 08:34:53AM +0200, jhascoet wrote:
> From: Julien Hascoet <ju.hascoet@gmail.com>
> 
> In case of ring full state, we retry the enqueue
> operation in order to avoid mbuf loss.
> 
> Fixes: af75078fece ("first public release")
> 
> Signed-off-by: Julien Hascoet <ju.hascoet@gmail.com>
> ---
>  app/test/test_mbuf.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> index efac01806b..ad18bf6378 100644
> --- a/app/test/test_mbuf.c
> +++ b/app/test/test_mbuf.c
> @@ -1033,12 +1033,21 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
>  		tref += ref;
>  		if ((ref & 1) != 0) {
>  			rte_pktmbuf_refcnt_update(m, ref);
> -			while (ref-- != 0)
> -				rte_ring_enqueue(refcnt_mbuf_ring, m);
> +			while (ref-- != 0) {
> +				/* retry in case of failure */
> +				while (rte_ring_enqueue(refcnt_mbuf_ring, m) != 0) {
> +					/* let others consume */
> +					rte_pause();
> +				}
> +			}
>  		} else {
>  			while (ref-- != 0) {
>  				rte_pktmbuf_refcnt_update(m, 1);
> -				rte_ring_enqueue(refcnt_mbuf_ring, m);
> +				/* retry in case of failure */
> +				while (rte_ring_enqueue(refcnt_mbuf_ring, m) != 0) {
> +					/* let others consume */
> +					rte_pause();
> +				}
>  			}
>  		}
>  		rte_pktmbuf_free(m);
> -- 
> 2.34.1
> 

Can you give some more details about how to reproduce the issue?

From what I see, the code does the following:

main core:
  create a ring with at least (REFCNT_MBUF_NUM * REFCNT_MAX_REF) entries
  create an mbuf pool with REFCNT_MBUF_NUM entries
  start worker cores
  do REFCNT_MAX_ITER times:
    for each mbuf of the pool (REFCNT_MBUF_NUM entries):
      let r be a random number between 1 and REFCNT_MAX_REF
      increase mbuf references by r, and enqueue r times in the ring
    wait that the ring is empty (since worker cores are dequeuing mbufs)
  stop worker cores

worker cores:
  dequeue packets from the ring and free them until asked to stop


I may be mistaking but I don't see how the number of mbufs in ring could
exceed REFCNT_MBUF_NUM * REFCNT_MAX_REF.

Regards,
Olivier


Note: removing CC maintainers@dpdk.org
  
jhascoet Aug. 24, 2023, 6:51 a.m. UTC | #2
Hello Oliver,

thanks, your response helped a lot, I managed to find the root cause of the
instability which is on our side.
It was due to other internal developments.
I'll still add an error check on the enqueue ops to catch eventual problems
earlier, if that suits you.

Best regards,

Julien

On Tue, Aug 22, 2023 at 10:34 AM Olivier Matz <olivier.matz@6wind.com>
wrote:

> Hello Julien,
>
> On Tue, Aug 22, 2023 at 08:34:53AM +0200, jhascoet wrote:
> > From: Julien Hascoet <ju.hascoet@gmail.com>
> >
> > In case of ring full state, we retry the enqueue
> > operation in order to avoid mbuf loss.
> >
> > Fixes: af75078fece ("first public release")
> >
> > Signed-off-by: Julien Hascoet <ju.hascoet@gmail.com>
> > ---
> >  app/test/test_mbuf.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > index efac01806b..ad18bf6378 100644
> > --- a/app/test/test_mbuf.c
> > +++ b/app/test/test_mbuf.c
> > @@ -1033,12 +1033,21 @@ test_refcnt_iter(unsigned int lcore, unsigned
> int iter,
> >               tref += ref;
> >               if ((ref & 1) != 0) {
> >                       rte_pktmbuf_refcnt_update(m, ref);
> > -                     while (ref-- != 0)
> > -                             rte_ring_enqueue(refcnt_mbuf_ring, m);
> > +                     while (ref-- != 0) {
> > +                             /* retry in case of failure */
> > +                             while (rte_ring_enqueue(refcnt_mbuf_ring,
> m) != 0) {
> > +                                     /* let others consume */
> > +                                     rte_pause();
> > +                             }
> > +                     }
> >               } else {
> >                       while (ref-- != 0) {
> >                               rte_pktmbuf_refcnt_update(m, 1);
> > -                             rte_ring_enqueue(refcnt_mbuf_ring, m);
> > +                             /* retry in case of failure */
> > +                             while (rte_ring_enqueue(refcnt_mbuf_ring,
> m) != 0) {
> > +                                     /* let others consume */
> > +                                     rte_pause();
> > +                             }
> >                       }
> >               }
> >               rte_pktmbuf_free(m);
> > --
> > 2.34.1
> >
>
> Can you give some more details about how to reproduce the issue?
>
> From what I see, the code does the following:
>
> main core:
>   create a ring with at least (REFCNT_MBUF_NUM * REFCNT_MAX_REF) entries
>   create an mbuf pool with REFCNT_MBUF_NUM entries
>   start worker cores
>   do REFCNT_MAX_ITER times:
>     for each mbuf of the pool (REFCNT_MBUF_NUM entries):
>       let r be a random number between 1 and REFCNT_MAX_REF
>       increase mbuf references by r, and enqueue r times in the ring
>     wait that the ring is empty (since worker cores are dequeuing mbufs)
>   stop worker cores
>
> worker cores:
>   dequeue packets from the ring and free them until asked to stop
>
>
> I may be mistaking but I don't see how the number of mbufs in ring could
> exceed REFCNT_MBUF_NUM * REFCNT_MAX_REF.
>
> Regards,
> Olivier
>
>
> Note: removing CC maintainers@dpdk.org
>
  
Thomas Monjalon Aug. 30, 2023, 4:16 p.m. UTC | #3
Hello,

22/08/2023 08:34, jhascoet:
> From: Julien Hascoet <ju.hascoet@gmail.com>
> 
> In case of ring full state, we retry the enqueue
> operation in order to avoid mbuf loss.
> 
> Fixes: af75078fece ("first public release")
> 
> Signed-off-by: Julien Hascoet <ju.hascoet@gmail.com>

Please could you give more information about when this is happening?
I'm surprised such thing was not a problem so far.
  

Patch

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index efac01806b..ad18bf6378 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -1033,12 +1033,21 @@  test_refcnt_iter(unsigned int lcore, unsigned int iter,
 		tref += ref;
 		if ((ref & 1) != 0) {
 			rte_pktmbuf_refcnt_update(m, ref);
-			while (ref-- != 0)
-				rte_ring_enqueue(refcnt_mbuf_ring, m);
+			while (ref-- != 0) {
+				/* retry in case of failure */
+				while (rte_ring_enqueue(refcnt_mbuf_ring, m) != 0) {
+					/* let others consume */
+					rte_pause();
+				}
+			}
 		} else {
 			while (ref-- != 0) {
 				rte_pktmbuf_refcnt_update(m, 1);
-				rte_ring_enqueue(refcnt_mbuf_ring, m);
+				/* retry in case of failure */
+				while (rte_ring_enqueue(refcnt_mbuf_ring, m) != 0) {
+					/* let others consume */
+					rte_pause();
+				}
 			}
 		}
 		rte_pktmbuf_free(m);