Message ID | 20190529130420.6428-1-maxime.coquelin@redhat.com (mailing list archive) |
---|---|
Headers |
From: Maxime Coquelin <maxime.coquelin@redhat.com> To: dev@dpdk.org, tiwei.bie@intel.com, david.marchand@redhat.com, jfreimann@redhat.com, bruce.richardson@intel.com, zhihong.wang@intel.com, konstantin.ananyev@intel.com, mattias.ronnblom@ericsson.com Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Date: Wed, 29 May 2019 15:04:15 +0200 Message-Id: <20190529130420.6428-1-maxime.coquelin@redhat.com> Subject: [dpdk-dev] [PATCH v3 0/5] vhost: I-cache pressure optimizations List-Id: DPDK patches and discussions <dev.dpdk.org> List-Archive: <http://mails.dpdk.org/archives/dev/> |
Series |
vhost: I-cache pressure optimizations
Message
Maxime Coquelin
May 29, 2019, 1:04 p.m. UTC
Some OVS-DPDK PVP benchmarks show a performance drop
when switching from DPDK v17.11 to v18.11.

With the addition of packed ring layout support,
rte_vhost_enqueue_burst and rte_vhost_dequeue_burst
became very large, and only part of their instructions
is executed (either the packed or the split ring path is used).

This series aims at reducing I-cache pressure,
first by un-inlining the split and packed ring functions,
but also by moving parts considered cold into dedicated
functions (dirty page logging, fragmented descriptors
buffer management added for CVE-2018-1059).

With the series applied, the size of the enqueue and
dequeue split paths is reduced significantly:

+---------+--------------------+---------------------+
| Version | Enqueue split path |  Dequeue split path |
+---------+--------------------+---------------------+
| v19.05  | 16461B             | 25521B              |
| +series | 7286B              | 11285B              |
+---------+--------------------+---------------------+

Using the perf tool to monitor the iTLB-load-misses event
while running a PVP benchmark with testpmd as the vswitch,
we can see the number of iTLB misses being reduced:

- v19.05:
    # perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10

     Performance counter stats for 'CPU(s) 2,3' (10 runs):

         2,438      iTLB-load-miss    ( +- 13.43% )

       10.00058928 +- 0.00000336 seconds time elapsed  ( +-  0.00% )

- +series:
    # perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10

     Performance counter stats for 'CPU(s) 2,3' (10 runs):

            55      iTLB-load-miss    ( +- 10.08% )

       10.00059466 +- 0.00000283 seconds time elapsed  ( +-  0.00% )

The series also forces the inlining of some rte_memcpy
helpers: after packed ring support was added, some of them
were no longer inlined but emitted as functions in
the virtio_net object file, which was not expected.

Finally, the series simplifies the descriptor buffers
prefetching, by doing it in the recently introduced
descriptor buffer mapping function.
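To put the figures quoted above in proportion, here is a minimal C sketch that computes the relative reductions. The helper names (reduction_pct, print_reported_reductions) are hypothetical and only serve this illustration; the input numbers are the ones reported in the table and in the perf output.

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction, in percent, when going from `before` to `after`. */
static double reduction_pct(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Hypothetical helper: prints the reductions for the reported figures. */
static void print_reported_reductions(void)
{
    /* Code size of the split paths, in bytes (v19.05 vs. +series). */
    printf("enqueue split path: %.1f%% smaller\n",
           reduction_pct(16461, 7286));   /* prints 55.7% */
    printf("dequeue split path: %.1f%% smaller\n",
           reduction_pct(25521, 11285));  /* prints 55.8% */

    /* iTLB-load-miss counts over the 10 s perf window. */
    printf("iTLB load misses:   %.1f%% fewer\n",
           reduction_pct(2438, 55));      /* prints 97.7% */
}
```

Calling print_reported_reductions() from a main() shows that both split paths shrink by a bit more than half, while the iTLB miss count drops by almost 98%.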
v3:
===
- Prefix alloc_copy_ind_table with vhost_ (Mattias)
- Remove double new line (Tiwei)
- Fix grammar error in patch 3's commit message (Jens)
- Force noinline for header copy functions (Mattias)
- Fix dst assignment in copy_hdr_from_desc (Tiwei)

v2:
===
- Fix checkpatch issue
- Reset author for patch 5 (David)
- Force non-inlining in patch 2 (David)
- Fix typo in patch 3 commit message (David)

Maxime Coquelin (5):
  vhost: un-inline dirty pages logging functions
  vhost: do not inline packed and split functions
  vhost: do not inline unlikely fragmented buffers code
  vhost: simplify descriptor's buffer prefetching
  eal/x86: force inlining of all memcpy and mov helpers

 .../common/include/arch/x86/rte_memcpy.h      |  18 +-
 lib/librte_vhost/vdpa.c                       |   2 +-
 lib/librte_vhost/vhost.c                      | 164 +++++++++++++++++
 lib/librte_vhost/vhost.h                      | 165 ++----------------
 lib/librte_vhost/virtio_net.c                 | 140 +++++++--------
 5 files changed, 251 insertions(+), 238 deletions(-)
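The general technique behind the patch titles above (keep rarely taken code out of line, force small copy helpers inline) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the series: every sketch_ name, the descriptor struct and the copy logic are hypothetical, and the macros are plain GCC/Clang attributes standing in for DPDK's own wrappers (__rte_noinline, __rte_always_inline, unlikely()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain GCC/Clang attributes; assumed stand-ins for DPDK's macros. */
#define SKETCH_NOINLINE      __attribute__((noinline))
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#define sketch_unlikely(x)   __builtin_expect(!!(x), 0)

/* Hypothetical descriptor: one contiguous chunk of a buffer. */
struct sketch_desc {
    const uint8_t *addr;
    size_t len;
};

/* Small helper: forced inline so no call is emitted on the hot path. */
static SKETCH_ALWAYS_INLINE void
sketch_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Cold path: buffer split across several descriptors. Kept out of
 * line so its instructions do not bloat the hot caller. */
static SKETCH_NOINLINE size_t
sketch_copy_fragmented(uint8_t *dst, size_t dst_len,
                       const struct sketch_desc *descs, size_t nr_descs)
{
    size_t copied = 0;

    for (size_t i = 0; i < nr_descs && copied < dst_len; i++) {
        size_t n = descs[i].len;

        if (n > dst_len - copied)
            n = dst_len - copied;   /* truncate to the room left */
        sketch_copy(dst + copied, descs[i].addr, n);
        copied += n;
    }
    return copied;
}

/* Hot path: the common single-descriptor case stays small; the
 * fragmented case is a call into the out-of-line function. */
static size_t
sketch_copy_buf(uint8_t *dst, size_t dst_len,
                const struct sketch_desc *descs, size_t nr_descs)
{
    if (nr_descs == 0)
        return 0;
    assert(dst != NULL && descs != NULL);

    if (sketch_unlikely(nr_descs > 1))
        return sketch_copy_fragmented(dst, dst_len, descs, nr_descs);

    size_t n = descs[0].len < dst_len ? descs[0].len : dst_len;

    sketch_copy(dst, descs[0].addr, n);
    return n;
}
```

The design choice illustrated here is that the hot function contains only a branch and a call for the rare case, so the instructions fetched in the common case occupy fewer cache lines and pages.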
Comments
On 5/29/19 3:04 PM, Maxime Coquelin wrote:
> Some OVS-DPDK PVP benchmarks show a performance drop
> when switching from DPDK v17.11 to v18.11.
>
> [...]
>
>   5 files changed, 251 insertions(+), 238 deletions(-)
>

Applied patches 1 to 4 to dpdk-next-virtio/master.

Bruce, I'm assigning patch 5 to you in Patchwork, as this is not
vhost/virtio specific.

Thanks,
Maxime
On Wed, Jun 05, 2019 at 02:32:27PM +0200, Maxime Coquelin wrote:
>
>
> On 5/29/19 3:04 PM, Maxime Coquelin wrote:
> > Some OVS-DPDK PVP benchmarks show a performance drop
> > when switching from DPDK v17.11 to v18.11.
> >
> > [...]
> >
> >   5 files changed, 251 insertions(+), 238 deletions(-)
> >
>
>
> Applied patches 1 to 4 to dpdk-next-virtio/master.
>
> Bruce, I'm assigning patch 5 to you in Patchwork, as this is not
> vhost/virtio specific.
>
Patch looks ok to me, but I'm not the one to apply it.

/Bruce
On 6/5/19 2:52 PM, Bruce Richardson wrote:
> On Wed, Jun 05, 2019 at 02:32:27PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 5/29/19 3:04 PM, Maxime Coquelin wrote:
>>> Some OVS-DPDK PVP benchmarks show a performance drop
>>> when switching from DPDK v17.11 to v18.11.
>>>
>>> [...]
>>>
>>>   5 files changed, 251 insertions(+), 238 deletions(-)
>>>
>>
>>
>> Applied patches 1 to 4 to dpdk-next-virtio/master.
>>
>> Bruce, I'm assigning patch 5 to you in Patchwork, as this is not
>> vhost/virtio specific.
>>
> Patch looks ok to me, but I'm not the one to apply it.

Ok, my bad. I'll switch to the right maintainer.

Thanks for the ack,
Maxime

> /Bruce
>