[v2] test/service: fix spurious failures by extending timeout
Commit Message
This commit extends the timeout for service_may_be_active()
from 100ms to 1000ms. Local testing on an idle and a loaded system
(compiling DPDK with all cores) always completes after 1 ms.
The wait time for a service-lcore to finish is also extended
from 100ms to 1000ms.
The same timeout waiting code was duplicated in two tests, and
is now refactored to a standalone function avoiding duplication.
Reported-by: David Marchand <david.marchand@redhat.com>
Suggested-by: Mattias Ronnblom <mattias.ronnblom@ericsson.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
Apologies for the quick respin noise; only the first diff-section
is added, no changes to the rest of the patch.
v2:
- v1 addressed only the test case 15 issue; v2 also addresses test
case 5, which has a service-lcore wait code path.
---
app/test/test_service_cores.c | 47 ++++++++++++++++-------------------
1 file changed, 22 insertions(+), 25 deletions(-)
Comments
On Thu, Oct 6, 2022 at 10:28 AM Harry van Haaren
<harry.van.haaren@intel.com> wrote:
>
> This commit extends the timeout for service_may_be_active()
> from 100ms to 1000ms. Local testing on an idle and a loaded system
> (compiling DPDK with all cores) always completes after 1 ms.
>
> The wait time for a service-lcore to finish is also extended
> from 100ms to 1000ms.
>
> The same timeout waiting code was duplicated in two tests, and
> is now refactored to a standalone function avoiding duplication.
>
> Reported-by: David Marchand <david.marchand@redhat.com>
> Suggested-by: Mattias Ronnblom <mattias.ronnblom@ericsson.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Just to be sure, do we want such a timeout in the test logic itself?
Is it that you want to make sure that the synchronisation happens in a
"reasonable" (subject to discussion ;-)) amount of time?
Otherwise, the unit tests run in the CI are themselves subject to a
10s x multiplier timeout (-t meson test option).
And then I would rely on this overall timeout.
On 2022-10-06 10:39, David Marchand wrote:
> On Thu, Oct 6, 2022 at 10:28 AM Harry van Haaren
> <harry.van.haaren@intel.com> wrote:
>>
>> This commit extends the timeout for service_may_be_active()
>> from 100ms to 1000ms. Local testing on an idle and a loaded system
>> (compiling DPDK with all cores) always completes after 1 ms.
>>
>> The wait time for a service-lcore to finish is also extended
>> from 100ms to 1000ms.
>>
>> The same timeout waiting code was duplicated in two tests, and
>> is now refactored to a standalone function avoiding duplication.
>>
>> Reported-by: David Marchand <david.marchand@redhat.com>
>> Suggested-by: Mattias Ronnblom <mattias.ronnblom@ericsson.com>
>> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
> Just to be sure, do we want such a timeout in the test logic itself?
I think it depends on how quickly you want to produce a failure, and
also if there are some follow-up tests in the same autotest that you
want to proceed with, regardless of the outcome.
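
The trade-off above can be illustrated with a small standalone sketch. It is not DPDK code: the result codes, the simulated service_stuck flag and the run_suite() helper are all hypothetical names introduced here. It only shows that a bounded wait lets a stuck test report failure while the tests after it still run, whereas an unbounded wait would leave that to the harness-level timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the test framework's result codes (assumption). */
#define SKETCH_SUCCESS 0
#define SKETCH_FAILED -1

/* Simulated state: nonzero means the service never reports inactive. */
static int service_stuck = 1;

/* Bounded wait: gives up after max_polls and reports a failure. */
static int
test_bounded_wait(void)
{
	const int max_polls = 1000;
	int i;

	for (i = 0; i < max_polls && service_stuck; i++)
		; /* a real test would sleep here between polls */

	return service_stuck ? SKETCH_FAILED : SKETCH_SUCCESS;
}

/* A later test in the same suite that does not depend on the first. */
static int
test_follow_up(void)
{
	return SKETCH_SUCCESS;
}

/* Run every test; record failures instead of stopping at the first one. */
static int
run_suite(int *failures)
{
	int (*const tests[])(void) = { test_bounded_wait, test_follow_up };
	const size_t n = sizeof(tests) / sizeof(tests[0]);
	size_t i;

	assert(failures != NULL);
	*failures = 0;
	for (i = 0; i < n; i++)
		if (tests[i]() != SKETCH_SUCCESS)
			(*failures)++;

	return (int)n; /* number of tests executed */
}
```

With service_stuck left at 1, run_suite() still executes both tests and counts one failure; without the bound, the first test would never return.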
> Is it that you want to make sure that the synchronisation happens in a
> "reasonable" (subject to discussion ;-)) amount of time?
>
> Otherwise, the unit tests run in the CI are themselves subject to a
> 10s x multiplier timeout (-t meson test option).
> And then I would rely on this overall timeout.
@@ -123,14 +123,14 @@ unregister_all(void)
return TEST_SUCCESS;
}
-/* Wait until service lcore not active, or for 100x SERVICE_DELAY */
+/* Wait until service lcore not active, or for N times SERVICE_DELAY */
static void
wait_slcore_inactive(uint32_t slcore_id)
{
int i;
for (i = 0; rte_service_lcore_may_be_active(slcore_id) == 1 &&
- i < 100; i++)
+ i < 1000; i++)
rte_delay_ms(SERVICE_DELAY);
}
@@ -921,12 +921,26 @@ service_lcore_start_stop(void)
return unregister_all();
}
+static int
+service_ensure_stopped_with_timeout(uint32_t sid)
+{
+ /* give the service time to stop running */
+ int32_t timeout_ms = 1000;
+ int i;
+ for (i = 0; i < timeout_ms; i++) {
+ if (!rte_service_may_be_active(sid))
+ break;
+ rte_delay_ms(SERVICE_DELAY);
+ }
+
+ return rte_service_may_be_active(sid);
+}
+
/* stop a service and wait for it to become inactive */
static int
service_may_be_active(void)
{
const uint32_t sid = 0;
- int i;
/* expected failure cases */
TEST_ASSERT_EQUAL(-EINVAL, rte_service_may_be_active(10000),
@@ -946,19 +960,11 @@ service_may_be_active(void)
TEST_ASSERT_EQUAL(1, service_lcore_running_check(),
"Service core expected to poll service but it didn't");
- /* stop the service */
+ /* stop the service, and wait for not-active with timeout */
TEST_ASSERT_EQUAL(0, rte_service_runstate_set(sid, 0),
"Error: Service stop returned non-zero");
-
- /* give the service 100ms to stop running */
- for (i = 0; i < 100; i++) {
- if (!rte_service_may_be_active(sid))
- break;
- rte_delay_ms(SERVICE_DELAY);
- }
-
- TEST_ASSERT_EQUAL(0, rte_service_may_be_active(sid),
- "Error: Service not stopped after 100ms");
+ TEST_ASSERT_EQUAL(0, service_ensure_stopped_with_timeout(sid),
+ "Error: Service not stopped after timeout period.");
return unregister_all();
}
@@ -972,7 +978,6 @@ service_active_two_cores(void)
return TEST_SKIPPED;
const uint32_t sid = 0;
- int i;
uint32_t lcore = rte_get_next_lcore(/* start core */ -1,
/* skip main */ 1,
@@ -1002,16 +1007,8 @@ service_active_two_cores(void)
/* stop the service */
TEST_ASSERT_EQUAL(0, rte_service_runstate_set(sid, 0),
"Error: Service stop returned non-zero");
-
- /* give the service 100ms to stop running */
- for (i = 0; i < 100; i++) {
- if (!rte_service_may_be_active(sid))
- break;
- rte_delay_ms(SERVICE_DELAY);
- }
-
- TEST_ASSERT_EQUAL(0, rte_service_may_be_active(sid),
- "Error: Service not stopped after 100ms");
+ TEST_ASSERT_EQUAL(0, service_ensure_stopped_with_timeout(sid),
+ "Error: Service not stopped after timeout period.");
return unregister_all();
}
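
For trying the polling pattern outside a DPDK build, the refactored helper can be approximated as below. This is a minimal sketch under stated assumptions: mock_service_may_be_active() and mock_delay_ms() are hypothetical stand-ins for rte_service_may_be_active() and rte_delay_ms(), SERVICE_DELAY is assumed to be 1 ms, and the poll limit is passed as a parameter rather than fixed at 1000 as in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SERVICE_DELAY 1 /* assumed poll interval in ms */

/* Simulated state: the service reports active for this many more polls. */
static int polls_until_inactive;
/* Simulated clock: total milliseconds "slept" so far. */
static unsigned int simulated_ms;

/* Hypothetical stand-in for rte_service_may_be_active(). */
static int32_t
mock_service_may_be_active(uint32_t sid)
{
	(void)sid;
	return polls_until_inactive > 0;
}

/* Hypothetical stand-in for rte_delay_ms(); advances the simulated clock. */
static void
mock_delay_ms(unsigned int ms)
{
	simulated_ms += ms;
	if (polls_until_inactive > 0)
		polls_until_inactive--;
}

/*
 * Same shape as the helper in the patch: poll until the service is
 * inactive or max_polls is reached, then return the final state
 * (0 = stopped, nonzero = still active after the timeout).
 */
static int
ensure_stopped_with_timeout(uint32_t sid, int max_polls)
{
	int i;

	assert(max_polls >= 0);
	for (i = 0; i < max_polls; i++) {
		if (!mock_service_may_be_active(sid))
			break;
		mock_delay_ms(SERVICE_DELAY);
	}

	return mock_service_may_be_active(sid);
}
```

The worst-case wait is max_polls times SERVICE_DELAY, so the limit only reads as milliseconds while SERVICE_DELAY stays at 1. A service that goes idle early exits the loop on the next poll, which is why a larger limit does not slow down the passing case.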