Message ID | 20190808173835.5949-1-aconole@redhat.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Thomas Monjalon |
Headers | show |
Series | test/interrupt: account for race with callback | expand |
Context | Check | Description |
---|---|---|
ci/checkpatch | success | coding style OK |
ci/Intel-compilation | success | Compilation OK |
ci/iol-Compile-Testing | success | Compile Testing PASS |
ci/intel-Performance-Testing | fail | Performance Testing issues |
ci/intel-Performance-Testing | fail | Performance Testing issues |
ci/mellanox-Performance-Testing | success | Performance Testing PASS |
ci/mellanox-Performance-Testing | success | Performance Testing PASS |
On Thu, Aug 8, 2019 at 7:38 PM Aaron Conole <aconole@redhat.com> wrote: > > Because the eal interrupt framework can race when invoking the callback > and a separate unregister call, the test needs to accommodate the chance > that the two collide. Do this by checking the return value of unregister > against the race-condition flag (EAGAIN). > > Fixes: f1a6c22424ce ("app/test: update interrupts test") Not too sure about this tag, but anyway, this is old enough to apply to every stable releases we have. Cc: stable@dpdk.org ? > Signed-off-by: Aaron Conole <aconole@redhat.com> > --- > NOTE: it's difficult to reproduce this race. I tried a bit, but have > only seen it sporadically. In Travis environment, the CPU > resource can be very limited and this test is quite racy. Managed to reproduce it: # time (log=/tmp/$$.log; while true; do echo interrupt_autotest |taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log) EAL: Detected 8 lcore(s) EAL: Detected 1 NUMA nodes EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: No available hugepages reported in hugepages-1048576kB EAL: Probing VFIO support... EAL: PCI device 0000:00:1f.6 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:15d7 net_e1000_em APP: HPET is not enabled, using TSC as default timer RTE>>interrupt_autotest Check unknown valid interrupt full path Check valid UIO interrupt full path Check valid device event interrupt full path count=-11 Resource temporarily unavailable failure occurred during checking valid device event interrupt full path Clearing for interrupt tests Test Failed RTE>> real 0m38.081s user 0m35.836s sys 0m2.171s > > app/test/test_interrupts.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/app/test/test_interrupts.c b/app/test/test_interrupts.c > index d8c2d8124..233b14a70 100644 > --- a/app/test/test_interrupts.c > +++ b/app/test/test_interrupts.c > @@ -370,9 +370,13 @@ test_interrupt_full_path_check(enum test_interrupt_handle_type intr_type) > rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL); > > rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL); > - if (rte_intr_callback_unregister(&test_intr_handle, > - test_interrupt_callback, &test_intr_handle) < 0) > - return -1; > + while ((count = > + rte_intr_callback_unregister(&test_intr_handle, > + test_interrupt_callback, > + &test_intr_handle)) < 0) { > + if (count != -EAGAIN) > + return -1; > + } > > if (flag == 0) { > printf("callback has not been called\n"); > -- > 2.21.0 > With this patch, my loop has been running for more than 10 minutes now. Reviewed-by: David Marchand <david.marchand@redhat.com> -- David Marchand
Aaron Conole <aconole@redhat.com> writes: > Because the eal interrupt framework can race when invoking the callback > and a separate unregister call, the test needs to accommodate the chance > that the two collide. Do this by checking the return value of unregister > against the race-condition flag (EAGAIN). > > Fixes: f1a6c22424ce ("app/test: update interrupts test") > Signed-off-by: Aaron Conole <aconole@redhat.com> > --- Ping. Still see these failures.
09/08/2019 11:18, David Marchand: > On Thu, Aug 8, 2019 at 7:38 PM Aaron Conole <aconole@redhat.com> wrote: > > > > Because the eal interrupt framework can race when invoking the callback > > and a separate unregister call, the test needs to accommodate the chance > > that the two collide. Do this by checking the return value of unregister > > against the race-condition flag (EAGAIN). > > > > Fixes: f1a6c22424ce ("app/test: update interrupts test") > > Not too sure about this tag, but anyway, this is old enough to apply > to every stable releases we have. > Cc: stable@dpdk.org ? > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > --- > > NOTE: it's difficult to reproduce this race. I tried a bit, but have > > only seen it sporadically. In Travis environment, the CPU > > resource can be very limited and this test is quite racy. > > Managed to reproduce it: > > # time (log=/tmp/$$.log; while true; do echo interrupt_autotest > |taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test -l 0-1 >$log > 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log) > EAL: Detected 8 lcore(s) > EAL: Detected 1 NUMA nodes > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket > EAL: Selected IOVA mode 'PA' > EAL: No available hugepages reported in hugepages-1048576kB > EAL: Probing VFIO support... > EAL: PCI device 0000:00:1f.6 on NUMA socket -1 > EAL: Invalid NUMA socket, default to 0 > EAL: probe driver: 8086:15d7 net_e1000_em > APP: HPET is not enabled, using TSC as default timer > RTE>>interrupt_autotest > Check unknown valid interrupt full path > Check valid UIO interrupt full path > Check valid device event interrupt full path > count=-11 Resource temporarily unavailable > failure occurred during checking valid device event interrupt full path > Clearing for interrupt tests > Test Failed > RTE>> > real 0m38.081s > user 0m35.836s > sys 0m2.171s > > With this patch, my loop has been running for more than 10 minutes now. > Reviewed-by: David Marchand <david.marchand@redhat.com> Applied, thanks
diff --git a/app/test/test_interrupts.c b/app/test/test_interrupts.c index d8c2d8124..233b14a70 100644 --- a/app/test/test_interrupts.c +++ b/app/test/test_interrupts.c @@ -370,9 +370,13 @@ test_interrupt_full_path_check(enum test_interrupt_handle_type intr_type) rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL); rte_delay_ms(TEST_INTERRUPT_CHECK_INTERVAL); - if (rte_intr_callback_unregister(&test_intr_handle, - test_interrupt_callback, &test_intr_handle) < 0) - return -1; + while ((count = + rte_intr_callback_unregister(&test_intr_handle, + test_interrupt_callback, + &test_intr_handle)) < 0) { + if (count != -EAGAIN) + return -1; + } if (flag == 0) { printf("callback has not been called\n");
Because the eal interrupt framework can race when invoking the callback and a separate unregister call, the test needs to accommodate the chance that the two collide. Do this by checking the return value of unregister against the race-condition flag (EAGAIN). Fixes: f1a6c22424ce ("app/test: update interrupts test") Signed-off-by: Aaron Conole <aconole@redhat.com> --- NOTE: it's difficult to reproduce this race. I tried a bit, but have only seen it sporadically. In Travis environment, the CPU resource can be very limited and this test is quite racy. app/test/test_interrupts.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)