test/interrupt: account for race with callback

Message ID 20190808173835.5949-1-aconole@redhat.com
State New
Delegated to: Thomas Monjalon
Headers show
Series
  • test/interrupt: account for race with callback
Related show

Checks

Context Check Description
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/intel-Performance-Testing fail Performance Testing issues
ci/iol-Compile-Testing success Compile Testing PASS
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Aaron Conole Aug. 8, 2019, 5:38 p.m.
Because the eal interrupt framework can race when invoking the callback
and a separate unregister call, the test needs to accommodate the chance
that the two collide.  Do this by checking the return value of unregister
against the race-condition flag (EAGAIN).

Fixes: f1a6c22424ce ("app/test: update interrupts test")
Signed-off-by: Aaron Conole <aconole@redhat.com>
---
NOTE: it's difficult to reproduce this race.  I tried a bit, but have
      only seen it sporadically.  In Travis environment, the CPU
      resource can be very limited and this test is quite racy.

 app/test/test_interrupts.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

David Marchand Aug. 9, 2019, 9:18 a.m. | #1
On Thu, Aug 8, 2019 at 7:38 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Because the eal interrupt framework can race when invoking the callback
> and a separate unregister call, the test needs to accommodate the chance
> that the two collide.  Do this by checking the return value of unregister
> against the race-condition flag (EAGAIN).
>
> Fixes: f1a6c22424ce ("app/test: update interrupts test")

Not too sure about this tag, but anyway, this is old enough to apply
to every stable releases we have.
Cc: stable@dpdk.org ?

> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
> NOTE: it's difficult to reproduce this race.  I tried a bit, but have
>       only seen it sporadically.  In Travis environment, the CPU
>       resource can be very limited and this test is quite racy.

Managed to reproduce it:

# time (log=/tmp/$$.log; while true; do echo interrupt_autotest
|taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test -l 0-1 >$log
2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:00:1f.6 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 8086:15d7 net_e1000_em
APP: HPET is not enabled, using TSC as default timer
RTE>>interrupt_autotest
Check unknown valid interrupt full path
Check valid UIO interrupt full path
Check valid device event interrupt full path
count=-11 Resource temporarily unavailable
failure occurred during checking valid device event interrupt full path
Clearing for interrupt tests
Test Failed
RTE>>
real    0m38.081s
user    0m35.836s
sys    0m2.171s

>
>  app/test/test_interrupts.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/app/test/test_interrupts.c b/app/test/test_interrupts.c
> index d8c2d8124..233b14a70 100644
> --- a/app/test/test_interrupts.c
> +++ b/app/test/test_interrupts.c
> @@ -370,9 +370,13 @@ test_interrupt_full_path_check(enum test_interrupt_handle_type intr_type)
>                 rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL);
>
>         rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL);
> -       if (rte_intr_callback_unregister(&test_intr_handle,
> -                       test_interrupt_callback, &test_intr_handle) < 0)
> -               return -1;
> +       while ((count =
> +               rte_intr_callback_unregister(&test_intr_handle,
> +                                            test_interrupt_callback,
> +                                            &test_intr_handle)) < 0) {
> +               if (count != -EAGAIN)
> +                       return -1;
> +       }
>
>         if (flag == 0) {
>                 printf("callback has not been called\n");
> --
> 2.21.0
>

With this patch, my loop has been running for more than 10 minutes now.
Reviewed-by: David Marchand <david.marchand@redhat.com>


--
David Marchand

Patch

diff --git a/app/test/test_interrupts.c b/app/test/test_interrupts.c
index d8c2d8124..233b14a70 100644
--- a/app/test/test_interrupts.c
+++ b/app/test/test_interrupts.c
@@ -370,9 +370,13 @@  test_interrupt_full_path_check(enum test_interrupt_handle_type intr_type)
 		rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL);
 
 	rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL);
-	if (rte_intr_callback_unregister(&test_intr_handle,
-			test_interrupt_callback, &test_intr_handle) < 0)
-		return -1;
+	while ((count =
+		rte_intr_callback_unregister(&test_intr_handle,
+					     test_interrupt_callback,
+					     &test_intr_handle)) < 0) {
+		if (count != -EAGAIN)
+			return -1;
+	}
 
 	if (flag == 0) {
 		printf("callback has not been called\n");