From: Maxime Coquelin
To: dev@dpdk.org, tiwei.bie@intel.com, jfreimann@redhat.com,
    zhihong.wang@intel.com, bruce.richardson@intel.com,
    konstantin.ananyev@intel.com
Cc: Maxime Coquelin
Date: Fri, 17 May 2019 14:22:15 +0200
Message-Id: <20190517122220.31283-1-maxime.coquelin@redhat.com>
Subject: [dpdk-dev] [PATCH 0/5] vhost: I-cache pressure optimizations
X-Patchwork-Id: 53518

Some OVS-DPDK PVP benchmarks show a performance drop when switching
from DPDK v17.11 to v18.11.

With the addition of packed ring layout support,
rte_vhost_enqueue_burst and rte_vhost_dequeue_burst became very large,
and only part of their instructions is executed for a given virtqueue,
since either the packed or the split ring is used.

This series aims at reducing I-cache pressure, first by un-inlining
the split and packed ring functions, but also by moving parts
considered cold into dedicated functions (dirty page logging, and the
fragmented descriptor buffer management added for CVE-2018-1059).

With the series applied, the size of the enqueue and dequeue split
paths is reduced significantly:

+---------+--------------------+--------------------+
| Version | Enqueue split path | Dequeue split path |
+---------+--------------------+--------------------+
| v19.05  | 16461B             | 25521B             |
| +series | 7286B              | 11285B             |
+---------+--------------------+--------------------+

Using the perf tool to monitor the iTLB-load-misses event while doing
a PVP benchmark with testpmd as vswitch, we can see the number of iTLB
misses being reduced:

- v19.05:
# perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10

 Performance counter stats for 'CPU(s) 2,3' (10 runs):

             2,438      iTLB-load-miss          ( +- 13.43% )

      10.00058928 +- 0.00000336 seconds time elapsed  ( +-  0.00% )

- +series:
# perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10

 Performance counter stats for 'CPU(s) 2,3' (10 runs):

                55      iTLB-load-miss          ( +- 10.08% )

      10.00059466 +- 0.00000283 seconds time elapsed  ( +-  0.00% )

The series also forces the inlining of some rte_memcpy helpers. After
packed ring support was added, some of them were no longer inlined but
emitted as standalone functions in the virtio_net object file, which
was not expected.

Finally, the series simplifies descriptor buffer prefetching by doing
it in the recently introduced descriptor buffer mapping function.

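For readers unfamiliar with the approach, the sketch below illustrates
the three techniques described above with GCC/Clang function
attributes: one out-of-line function per ring layout, a cold path
moved into its own function, and a small copy helper whose inlining is
forced. It is only an illustration and not code from the series: the
demo_* names and struct demo_vq are hypothetical, and the bodies are
placeholders for the real descriptor handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, simplified virtqueue; not the real vhost structure. */
struct demo_vq {
	int packed;            /* non-zero: packed ring, zero: split ring */
	int log_enabled;       /* dirty page logging, e.g. during migration */
	uint64_t logged_bytes; /* stand-in for the dirty page bitmap */
	uint8_t buf[64];       /* stand-in for a guest buffer */
};

/* Cold path: kept out of line so its code stays off the hot path. */
static __attribute__((noinline, cold)) void
demo_log_write(struct demo_vq *vq, size_t len)
{
	vq->logged_bytes += len;
}

/* Small copy helper that should never end up as a standalone function. */
static inline __attribute__((always_inline)) void
demo_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Split ring path, compiled as its own function. */
static __attribute__((noinline)) uint16_t
demo_enqueue_split(struct demo_vq *vq, const uint8_t *pkt, size_t len)
{
	if (len > sizeof(vq->buf))
		return 0;
	demo_copy(vq->buf, pkt, len);
	if (unlikely(vq->log_enabled))
		demo_log_write(vq, len);
	return 1;
}

/* Packed ring path, also out of line; same placeholder body. */
static __attribute__((noinline)) uint16_t
demo_enqueue_packed(struct demo_vq *vq, const uint8_t *pkt, size_t len)
{
	if (len > sizeof(vq->buf))
		return 0;
	demo_copy(vq->buf, pkt, len);
	if (unlikely(vq->log_enabled))
		demo_log_write(vq, len);
	return 1;
}

/* Entry point: only the selected layout's code is fetched at run time. */
uint16_t
demo_enqueue(struct demo_vq *vq, const uint8_t *pkt, size_t len)
{
	if (vq->packed)
		return demo_enqueue_packed(vq, pkt, len);
	return demo_enqueue_split(vq, pkt, len);
}
```

With this layout the entry point stays small, and a given virtqueue
only pulls the instructions of its own ring layout into the I-cache,
while the logging code is fetched only when logging is enabled.
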
Maxime Coquelin (4):
  vhost: un-inline dirty pages logging functions
  vhost: do not inline packed and split functions
  vhost: do not inline unlikely fragmented buffers code
  vhost: simplify descriptor's buffer prefetching

root (1):
  eal/x86: force inlining of all memcpy and mov helpers

 .../common/include/arch/x86/rte_memcpy.h |  18 +-
 lib/librte_vhost/vhost.c                 | 165 ++++++++++++++++++
 lib/librte_vhost/vhost.h                 | 164 ++---------------
 lib/librte_vhost/virtio_net.c            | 142 +++++++--------
 4 files changed, 250 insertions(+), 239 deletions(-)