Message ID | 20201221111359.22013-2-feifei.wang2@arm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | David Marchand |
Series | refactoring ring library |
Context | Check | Description |
---|---|---|
ci/checkpatch | warning | coding style issues |
> When testing ring performance in the case that multiple lcores are mapped to
> the same physical core, e.g. --lcores '(0-3)@10', it takes a very long time
> to wait for the "enqueue_dequeue_bulk_helper" to finish. This is because
> there are too many iterations and enqueue/dequeue efficiency is extremely
> low with this kind of core mapping. The following test results show the
> phenomenon:
>
> x86-Intel(R) Xeon(R) Gold 6240:
> $sudo ./app/test/dpdk-test --lcores '(0-1)@25'
> Testing using two hyperthreads(bulk (size: 8):)
> iter_shift:              3      5      7      9     11     13    *15     17     19     21     23
> run time:               7s     7s     7s     8s     9s    16s    47s   170s   660s  >0.5h    >1h
> legacy APIs: SP/SC:     37     11      6  40525  40525  40209  40367  40407  40541 NoData NoData
> legacy APIs: MP/MC:     56     14     11  50657  40526  40526  40526  40625  40585 NoData NoData
>
> aarch64-n1sdp:
> $sudo ./app/test/dpdk-test --lcore '(0-1)@1'
> Testing using two hyperthreads(bulk (size: 8):)
> iter_shift:              3      5      7      9     11     13    *15     17     19     21     23
> run time:               8s     8s     8s     9s     9s    14s    34s   111s   418s  25min    >1h
> legacy APIs: SP/SC:    0.4    0.2    0.1    488    488    488    488    488    489    489 NoData
> legacy APIs: MP/MC:    0.4    0.3    0.2    488    488    488    488    490    489    489 NoData
>
> As the number of iterations increases, so does the time required to run
> the program. Currently (iter_shift = 23), it takes more than 1 hour for
> the test to finish. To fix this, "iter_shift" should be decreased while
> still ensuring enough iterations to keep the test data stable. To find a
> suitable value, we also tested with the "-l" EAL argument:
>
> x86-Intel(R) Xeon(R) Gold 6240:
> $sudo ./app/test/dpdk-test -l 25-26
> Testing using two NUMA nodes(bulk (size: 8):)
> iter_shift:              3      5      7      9     11     13    *15     17     19     21     23
> run time:               6s     6s     6s     6s     6s     6s     6s     7s     8s    11s    27s
> legacy APIs: SP/SC:     47     20     13     22     54     83     91     73     81     75     95
> legacy APIs: MP/MC:     44     18     18    240    245    270    250    249    252    250    253
>
> aarch64-n1sdp:
> $sudo ./app/test/dpdk-test -l 1-2
> Testing using two physical cores(bulk (size: 8):)
> iter_shift:              3      5      7      9     11     13    *15     17     19     21     23
> run time:               8s     8s     8s     8s     8s     8s     8s     9s     9s    11s    23s
> legacy APIs: SP/SC:    0.7    0.4    1.2    1.8    2.0    2.0    2.0    2.0    2.0    2.0    2.0
> legacy APIs: MP/MC:    0.3    0.4    1.3    1.9    2.9    2.9    2.9    2.9    2.9    2.9    2.9
>
> According to the above test data, when "iter_shift" is set to "15", the
> test run time is reduced to less than 1 minute and the test results remain
> stable on both x86 and aarch64 servers.
>
> Fixes: 1fa5d0099efc ("test/ring: add custom element size performance tests")
> Cc: honnappa.nagarahalli@arm.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_ring_perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index e63e25a86..fd82e2041 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -178,7 +178,7 @@ enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
>  	struct thread_params *p)
>  {
>  	int ret;
> -	const unsigned int iter_shift = 23;
> +	const unsigned int iter_shift = 15;
>  	const unsigned int iterations = 1 << iter_shift;
>  	struct rte_ring *r = p->r;
>  	unsigned int bsize = p->size;
> --

I think it would be better to rework the test(s) to terminate after some
timeout (30s or so), and report the number of ops per timeout.
Anyway, as a short term fix, I am ok with it.
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.17.1
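
The timeout-based rework suggested in the review could look roughly like the sketch below. It is a minimal standalone illustration, not code from the DPDK tree: `run_for_budget`, `bench_op_t`, `now_ns` and `count_op` are hypothetical names, the stand-in workload only counts calls instead of touching a ring, and POSIX `clock_gettime` is used so that the example builds without DPDK (a real test would more likely use the EAL timer helpers such as `rte_get_timer_cycles()` and `rte_get_timer_hz()`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: performs one enqueue/dequeue round. */
typedef void (*bench_op_t)(void *arg);

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Call op() in batches until at least budget_ns has elapsed.
 * The clock is read once per batch so that timing overhead stays small
 * compared to the measured work. Returns the number of completed calls
 * and stores the measured wall time in *elapsed_ns.
 */
static uint64_t run_for_budget(bench_op_t op, void *arg, uint64_t budget_ns,
		unsigned int batch, uint64_t *elapsed_ns)
{
	const uint64_t start = now_ns();
	uint64_t ops = 0;
	uint64_t now;

	assert(batch > 0);
	do {
		for (unsigned int i = 0; i < batch; i++)
			op(arg);
		ops += batch;
		now = now_ns();
	} while (now - start < budget_ns);

	*elapsed_ns = now - start;
	return ops;
}

/* Stand-in workload (assumption): counts calls instead of using a ring. */
static void count_op(void *arg)
{
	(*(uint64_t *)arg)++;
}
```

With this shape the run time is bounded by the budget regardless of core mapping, and throughput can be reported as `ops * 1000000000 / elapsed_ns` operations per second instead of cycles per fixed iteration count.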
Hi, Konstantin

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: January 22, 2021 21:16
> To: Feifei Wang <Feifei.Wang2@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Olivier Matz <olivier.matz@6wind.com>;
> Gavin Hu <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org
> Subject: RE: [PATCH v1 1/3] test/ring: reduce iteration numbers to make test
> duration shorter
>
> > [...]
>
> I think it would be better to rework the test(s) to terminate after some
> timeout (30s or so), and report number of ops per timeout.
> Anyway, as a short term fix, I am ok with it.
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Ok, thanks very much.

Best Regards
Feifei
diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index e63e25a86..fd82e2041 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -178,7 +178,7 @@ enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
 	struct thread_params *p)
 {
 	int ret;
-	const unsigned int iter_shift = 23;
+	const unsigned int iter_shift = 15;
 	const unsigned int iterations = 1 << iter_shift;
 	struct rte_ring *r = p->r;
 	unsigned int bsize = p->size;
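
To make the effect of the one-line change concrete, the arithmetic behind `iterations = 1 << iter_shift` can be checked with the small sketch below. The helper names `iterations_for_shift` and `iteration_reduction` are hypothetical and exist only for this illustration; they are not part of the patch or of `test_ring_perf.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Loop count for a given shift, mirroring "1 << iter_shift" in the helper. */
static uint64_t iterations_for_shift(unsigned int iter_shift)
{
	assert(iter_shift < 64);
	return UINT64_C(1) << iter_shift;
}

/* Factor by which the loop count shrinks when moving to a smaller shift. */
static uint64_t iteration_reduction(unsigned int old_shift,
		unsigned int new_shift)
{
	assert(old_shift >= new_shift);
	return iterations_for_shift(old_shift) / iterations_for_shift(new_shift);
}
```

Here `iterations_for_shift(23)` is 8388608 and `iterations_for_shift(15)` is 32768, so each helper call performs 256 times fewer enqueue/dequeue rounds after the patch. The measured wall time in the tables above does not shrink by the same factor, which suggests that part of the run time is independent of the iteration count.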