List patch comments

GET /api/patches/397/comments/?format=api&order=id
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Link: <https://patches.dpdk.org/api/patches/397/comments/?format=api&order=id&page=1>; rel="first",
      <https://patches.dpdk.org/api/patches/397/comments/?format=api&order=id&page=1>; rel="last"
Vary: Accept
[ { "id": 836, "web_url": "https://patches.dpdk.org/comment/836/", "msgid": "<20140917152103.GE4213@localhost.localdomain>", "list_archive_url": "https://inbox.dpdk.org/dev/20140917152103.GE4213@localhost.localdomain", "date": "2014-09-17T15:21:03", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "submitter": { "id": 32, "url": "https://patches.dpdk.org/api/people/32/?format=api", "name": "Neil Horman", "email": "nhorman@tuxdriver.com" }, "content": "On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> Make a small improvement to slow path TX performance by adding in a\n> prefetch for the second mbuf cache line.\n> Also move assignment of l2/l3 length values only when needed.\n> \n> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> ---\n> lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> 1 file changed, 7 insertions(+), 5 deletions(-)\n> \n> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> index 6f702b3..c0bb49f 100644\n> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,\n> \t\tixgbe_xmit_cleanup(txq);\n> \t}\n> \n> +\trte_prefetch0(&txe->mbuf->pool);\n> +\n\nCan you explain what all of these prefetches are doing? It looks to me like\nthey're just fetching the first caheline of the mempool structure, which it\nappears amounts to the pools name. I don't see that having any use here.\n\n> \t/* TX loop */\n> \tfor (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {\n> \t\tnew_ctx = 0;\n> \t\ttx_pkt = *tx_pkts++;\n> \t\tpkt_len = tx_pkt->pkt_len;\n> \n> -\t\tRTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);\n> -\n> \t\t/*\n> \t\t * Determine how many (if any) context descriptors\n> \t\t * are needed for offload functionality.\n> \t\t */\n> \t\tol_flags = tx_pkt->ol_flags;\n> -\t\tvlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;\n> -\t\tvlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;\n> \n> \t\t/* If hardware offload required */\n> \t\ttx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;\n> \t\tif (tx_ol_req) {\n> +\t\t\tvlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;\n> +\t\t\tvlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;\n> +\n> \t\t\t/* If new context need be built or reuse the exist ctx. 
*/\n> \t\t\tctx = what_advctx_update(txq, tx_ol_req,\n> \t\t\t\tvlan_macip_lens.data);\n> @@ -720,7 +721,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,\n> \t\t\t\t &txr[tx_id];\n> \n> \t\t\t\ttxn = &sw_ring[txe->next_id];\n> -\t\t\t\tRTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);\n> +\t\t\t\trte_prefetch0(&txn->mbuf->pool);\n> \n> \t\t\t\tif (txe->mbuf != NULL) {\n> \t\t\t\t\trte_pktmbuf_free_seg(txe->mbuf);\n> @@ -749,6 +750,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,\n> \t\tdo {\n> \t\t\ttxd = &txr[tx_id];\n> \t\t\ttxn = &sw_ring[txe->next_id];\n> +\t\t\trte_prefetch0(&txn->mbuf->pool);\n> \n> \t\t\tif (txe->mbuf != NULL)\n> \t\t\t\trte_pktmbuf_free_seg(txe->mbuf);\n> -- \n> 1.9.3\n> \n>", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id A7A90B39E;\n\tWed, 17 Sep 2014 17:15:36 +0200 (CEST)", "from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58])\n\tby dpdk.org (Postfix) with ESMTP id CF48F18F\n\tfor <dev@dpdk.org>; Wed, 17 Sep 2014 17:15:34 +0200 (CEST)", "from nat-pool-rdu-u.redhat.com ([66.187.233.203] helo=localhost)\n\tby smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63)\n\t(envelope-from <nhorman@tuxdriver.com>)\n\tid 1XUH2b-0001Hb-I5; Wed, 17 Sep 2014 11:21:15 -0400" ], "Date": "Wed, 17 Sep 2014 11:21:03 -0400", "From": "Neil Horman <nhorman@tuxdriver.com>", "To": "Bruce Richardson <bruce.richardson@intel.com>", "Message-ID": "<20140917152103.GE4213@localhost.localdomain>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=us-ascii", "Content-Disposition": "inline", "In-Reply-To": "<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>", "User-Agent": "Mutt/1.5.23 (2014-03-12)", "X-Spam-Score": "-2.9 (--)", "X-Spam-Status": "No", "Cc": "dev@dpdk.org", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 839, "web_url": "https://patches.dpdk.org/comment/839/", "msgid": "<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com", "date": "2014-09-17T15:35:19", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "submitter": { "id": 20, "url": "https://patches.dpdk.org/api/people/20/?format=api", "name": "Bruce Richardson", "email": "bruce.richardson@intel.com" }, "content": "> -----Original Message-----\n> From: Neil Horman [mailto:nhorman@tuxdriver.com]\n> Sent: Wednesday, September 17, 2014 4:21 PM\n> To: Richardson, Bruce\n> Cc: dev@dpdk.org\n> Subject: Re: 
[dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx\n> perf\n> \n> On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> > Make a small improvement to slow path TX performance by adding in a\n> > prefetch for the second mbuf cache line.\n> > Also move assignment of l2/l3 length values only when needed.\n> >\n> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> > ---\n> > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> > 1 file changed, 7 insertions(+), 5 deletions(-)\n> >\n> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > index 6f702b3..c0bb49f 100644\n> > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf\n> **tx_pkts,\n> > \t\tixgbe_xmit_cleanup(txq);\n> > \t}\n> >\n> > +\trte_prefetch0(&txe->mbuf->pool);\n> > +\n> \n> Can you explain what all of these prefetches are doing? It looks to me like\n> they're just fetching the first cacheline of the mempool structure, which it\n> appears amounts to the pool's name. I don't see that having any use here.\n> \nThis does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd). \n\nWhat I've done with the prefetches is two-fold:\n1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.\n2) changed the code to prefetch earlier - in effect to prefetch one mbuf ahead. The original code prefetched the mbuf to be freed as soon as it started processing the mbuf to replace it. Instead now, every time we calculate what the next mbuf position is going to be we prefetch the mbuf in that position (i.e. the mbuf pool pointer we are going to free the mbuf to), even while we are still updating the previous mbuf slot on the ring. 
This gives the prefetch much more time to resolve and get the data we need in the cache before we need it.\n\nHope this clarifies things.\n\n/Bruce", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id 57FBDB3A1;\n\tWed, 17 Sep 2014 17:33:36 +0200 (CEST)", "from mga01.intel.com (mga01.intel.com [192.55.52.88])\n\tby dpdk.org (Postfix) with ESMTP id 08DA968C3\n\tfor <dev@dpdk.org>; Wed, 17 Sep 2014 17:33:34 +0200 (CEST)", "from fmsmga003.fm.intel.com ([10.253.24.29])\n\tby fmsmga101.fm.intel.com with ESMTP; 17 Sep 2014 08:36:03 -0700", "from irsmsx103.ger.corp.intel.com ([163.33.3.157])\n\tby FMSMGA003.fm.intel.com with ESMTP; 17 Sep 2014 08:30:32 -0700", "from irsmsx105.ger.corp.intel.com (163.33.3.28) by\n\tIRSMSX103.ger.corp.intel.com (163.33.3.157) with Microsoft SMTP\n\tServer (TLS) id 14.3.195.1; Wed, 17 Sep 2014 16:35:20 +0100", "from irsmsx103.ger.corp.intel.com ([169.254.3.112]) by\n\tIRSMSX105.ger.corp.intel.com ([169.254.7.158]) with mapi id\n\t14.03.0195.001; Wed, 17 Sep 2014 16:35:20 +0100" ], "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"4.97,862,1389772800\"; d=\"scan'208\";a=\"387455441\"", "From": "\"Richardson, Bruce\" <bruce.richardson@intel.com>", "To": "Neil Horman <nhorman@tuxdriver.com>", "Thread-Topic": "[dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve\n\tslow-path tx perf", "Thread-Index": "AQHP0l6UcfMND56eKE+7WeGe29qoUZwFYHuAgAAS+mA=", "Date": "Wed, 17 Sep 2014 15:35:19 +0000", "Message-ID": "<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>\n\t<20140917152103.GE4213@localhost.localdomain>", "In-Reply-To": "<20140917152103.GE4213@localhost.localdomain>", "Accept-Language": "en-GB, en-US", "Content-Language": "en-US", "X-MS-Has-Attach": "", "X-MS-TNEF-Correlator": "", "x-originating-ip": "[163.33.239.180]", "Content-Type": "text/plain; charset=\"us-ascii\"", "Content-Transfer-Encoding": "quoted-printable", "MIME-Version": "1.0", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 842, "web_url": "https://patches.dpdk.org/comment/842/", "msgid": "<20140917175936.GA13492@hmsreliant.think-freely.org>", "list_archive_url": "https://inbox.dpdk.org/dev/20140917175936.GA13492@hmsreliant.think-freely.org", "date": "2014-09-17T17:59:36", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "submitter": { "id": 32, "url": "https://patches.dpdk.org/api/people/32/?format=api", "name": "Neil Horman", "email": "nhorman@tuxdriver.com" }, "content": "On Wed, Sep 17, 2014 at 03:35:19PM +0000, 
Richardson, Bruce wrote:\n> \n> > -----Original Message-----\n> > From: Neil Horman [mailto:nhorman@tuxdriver.com]\n> > Sent: Wednesday, September 17, 2014 4:21 PM\n> > To: Richardson, Bruce\n> > Cc: dev@dpdk.org\n> > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx\n> > perf\n> > \n> > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> > > Make a small improvement to slow path TX performance by adding in a\n> > > prefetch for the second mbuf cache line.\n> > > Also move assignment of l2/l3 length values only when needed.\n> > >\n> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> > > ---\n> > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> > > 1 file changed, 7 insertions(+), 5 deletions(-)\n> > >\n> > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > index 6f702b3..c0bb49f 100644\n> > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf\n> > **tx_pkts,\n> > > \t\tixgbe_xmit_cleanup(txq);\n> > > \t}\n> > >\n> > > +\trte_prefetch0(&txe->mbuf->pool);\n> > > +\n> > \n> > Can you explain what all of these prefetches are doing? It looks to me like\n> > they're just fetching the first cacheline of the mempool structure, which it\n> > appears amounts to the pool's name. I don't see that having any use here.\n> > \n> This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd). \n> \n> What I've done with the prefetches is two-fold:\n> 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.\nok, this makes some sense, but you're not guaranteed to either have that\nprefetch be needed, nor are you certain it will still be in cache by the time\nyou get to the free call. Seems like it might be preferable to prefetch the\ndata pointed to by tx_pkt, as you're sure to use that every loop iteration.\n\n> 2) changed the code to prefetch earlier - in effect to prefetch one mbuf ahead. The original code prefetched the mbuf to be freed as soon as it started processing the mbuf to replace it. Instead now, every time we calculate what the next mbuf position is going to be we prefetch the mbuf in that position (i.e. the mbuf pool pointer we are going to free the mbuf to), even while we are still updating the previous mbuf slot on the ring. This gives the prefetch much more time to resolve and get the data we need in the cache before we need it.\n> \nAgain, early isn't necessarily better, as it just means more time for the data\nin cache to get victimized. 
It seems like it would be better to prefetch the\ntx_pkts data a few cache lines ahead.\n\nNeil\n\n> Hope this clarifies things.\n> \n> /Bruce\n>", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id B4330B3AC;\n\tWed, 17 Sep 2014 19:54:04 +0200 (CEST)", "from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58])\n\tby dpdk.org (Postfix) with ESMTP id D57DFB3AB\n\tfor <dev@dpdk.org>; Wed, 17 Sep 2014 19:54:02 +0200 (CEST)", "from hmsreliant.think-freely.org\n\t([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost)\n\tby smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63)\n\t(envelope-from <nhorman@tuxdriver.com>)\n\tid 1XUJVx-0002WF-HG; Wed, 17 Sep 2014 13:59:43 -0400" ], "Date": "Wed, 17 Sep 2014 13:59:36 -0400", "From": "Neil Horman <nhorman@tuxdriver.com>", "To": "\"Richardson, Bruce\" <bruce.richardson@intel.com>", "Message-ID": "<20140917175936.GA13492@hmsreliant.think-freely.org>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>\n\t<20140917152103.GE4213@localhost.localdomain>\n\t<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=us-ascii", "Content-Disposition": "inline", "In-Reply-To": "<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>", "User-Agent": "Mutt/1.5.23 (2014-03-12)", "X-Spam-Score": "-2.9 (--)", "X-Spam-Status": "No", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 858, "web_url": "https://patches.dpdk.org/comment/858/", "msgid": "<20140918133613.GA7208@BRICHA3-MOBL>", "list_archive_url": "https://inbox.dpdk.org/dev/20140918133613.GA7208@BRICHA3-MOBL", "date": "2014-09-18T13:36:13", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "submitter": { "id": 20, "url": "https://patches.dpdk.org/api/people/20/?format=api", "name": "Bruce Richardson", "email": "bruce.richardson@intel.com" }, "content": "On Wed, Sep 17, 2014 at 01:59:36PM -0400, Neil Horman wrote:\n> On Wed, Sep 17, 2014 at 03:35:19PM +0000, Richardson, Bruce wrote:\n> > \n> > > -----Original Message-----\n> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]\n> > > Sent: Wednesday, September 17, 2014 4:21 PM\n> > > To: Richardson, Bruce\n> > > Cc: dev@dpdk.org\n> > > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx\n> > > perf\n> > > \n> > > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> > > > Make a small improvement to slow path TX performance by adding in a\n> > > > prefetch for the second mbuf cache line.\n> > > > Also 
move assignment of l2/l3 length values only when needed.\n> > > >\n> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> > > > ---\n> > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> > > > 1 file changed, 7 insertions(+), 5 deletions(-)\n> > > >\n> > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > index 6f702b3..c0bb49f 100644\n> > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf\n> > > **tx_pkts,\n> > > > \t\tixgbe_xmit_cleanup(txq);\n> > > > \t}\n> > > >\n> > > > +\trte_prefetch0(&txe->mbuf->pool);\n> > > > +\n> > > \n> > > Can you explain what all of these prefetches are doing? It looks to me like\n> > > they're just fetching the first cacheline of the mempool structure, which it\n> > > appears amounts to the pool's name. I don't see that having any use here.\n> > > \n> > This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd). \n> > \n> > What I've done with the prefetches is two-fold:\n> > 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.\n> ok, this makes some sense, but you're not guaranteed to either have that\n> prefetch be needed, nor are you certain it will still be in cache by the time\n> you get to the free call. Seems like it might be preferable to prefetch the\n> data pointed to by tx_pkt, as you're sure to use that every loop iteration.\n\nThe vast majority of the times the prefetch is necessary, and it does help \nperformance doing things this way. If the prefetch is not necessary, it's \njust one extra instruction, while, if it is needed, having the prefetch \noccur 20 cycles before access (picking an arbitrary value) means that we \nhave cut down the time it takes to pull the data from cache when it is \nneeded by 20 cycles. As for the value pointed to by tx_pkt, since this is a \npacket the app has just been working on, it's almost certainly already in \nl1/l2 cache. \n\n> \n> > 2) changed the code to prefetch earlier - in effect to prefetch one mbuf ahead. The original code prefetched the mbuf to be freed as soon as it started processing the mbuf to replace it. Instead now, every time we calculate what the next mbuf position is going to be we prefetch the mbuf in that position (i.e. the mbuf pool pointer we are going to free the mbuf to), even while we are still updating the previous mbuf slot on the ring. This gives the prefetch much more time to resolve and get the data we need in the cache before we need it.\n> > \n> Again, early isn't necessarily better, as it just means more time for the data\n> in cache to get victimized. It seems like it would be better to prefetch the\n> tx_pkts data a few cache lines ahead.\n> \n> Neil\n\nBasically it all comes down to measured performance - working with \nprefetches is not an exact science, sadly. I've just re-run a quick sanity \ntest on this patch in the sequence. Running with testpmd on a single core, \n40G of small packet input, I see considerable performance increases. 
What \nI've run is:\n* testpmd with a single forwarding core, defaults - which means slow path RX \n+ slow path TX (i.e. this code): Performance with this patch increases by \nalmost 8%\n* testpmd with a single forwarding core, defaults + rxfreet=32 - which means \nvector RX path + slow path TX (again, this code path): Performance \nincreases by over 18%.\n\nGiven these numbers, the prefetching seems better this way. Perhaps you \ncould run some tests yourself and see if you see a similar performance delta \n(or perhaps there are other scenarios I'm missing here)?\n\nRegards,\n/Bruce", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id 57487B3A8;\n\tThu, 18 Sep 2014 15:30:33 +0200 (CEST)", "from mga11.intel.com (mga11.intel.com [192.55.52.93])\n\tby dpdk.org (Postfix) with ESMTP id D999168C2\n\tfor <dev@dpdk.org>; Thu, 18 Sep 2014 15:30:31 +0200 (CEST)", "from fmsmga002.fm.intel.com ([10.253.24.26])\n\tby fmsmga102.fm.intel.com with ESMTP; 18 Sep 2014 06:36:16 -0700", "from bricha3-mobl.ger.corp.intel.com (HELO\n\tbricha3-mobl.ir.intel.com) ([10.243.20.22])\n\tby fmsmga002.fm.intel.com with SMTP; 18 Sep 2014 06:36:14 -0700", "by bricha3-mobl.ir.intel.com (sSMTP sendmail emulation);\n\tThu, 18 Sep 2014 14:36:13 +0001" ], "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"5.04,547,1406617200\"; d=\"scan'208\";a=\"601609492\"", "Date": "Thu, 18 Sep 2014 14:36:13 +0100", "From": "Bruce Richardson <bruce.richardson@intel.com>", "To": "Neil Horman <nhorman@tuxdriver.com>", "Message-ID": "<20140918133613.GA7208@BRICHA3-MOBL>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>\n\t<20140917152103.GE4213@localhost.localdomain>\n\t<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>\n\t<20140917175936.GA13492@hmsreliant.think-freely.org>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=us-ascii", "Content-Disposition": "inline", "In-Reply-To": "<20140917175936.GA13492@hmsreliant.think-freely.org>", "Organization": "Intel Shannon Ltd.", "User-Agent": "Mutt/1.5.22 (2013-10-16)", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 861, "web_url": "https://patches.dpdk.org/comment/861/", "msgid": "<20140918152930.GG20389@hmsreliant.think-freely.org>", "list_archive_url": "https://inbox.dpdk.org/dev/20140918152930.GG20389@hmsreliant.think-freely.org", "date": "2014-09-18T15:29:30", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "submitter": { "id": 32, "url": "https://patches.dpdk.org/api/people/32/?format=api", "name": "Neil Horman", "email": "nhorman@tuxdriver.com" 
}, "content": "On Thu, Sep 18, 2014 at 02:36:13PM +0100, Bruce Richardson wrote:\n> On Wed, Sep 17, 2014 at 01:59:36PM -0400, Neil Horman wrote:\n> > On Wed, Sep 17, 2014 at 03:35:19PM +0000, Richardson, Bruce wrote:\n> > > \n> > > > -----Original Message-----\n> > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]\n> > > > Sent: Wednesday, September 17, 2014 4:21 PM\n> > > > To: Richardson, Bruce\n> > > > Cc: dev@dpdk.org\n> > > > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx\n> > > > perf\n> > > > \n> > > > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> > > > > Make a small improvement to slow path TX performance by adding in a\n> > > > > prefetch for the second mbuf cache line.\n> > > > > Also move assignment of l2/l3 length values only when needed.\n> > > > >\n> > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> > > > > ---\n> > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> > > > > 1 file changed, 7 insertions(+), 5 deletions(-)\n> > > > >\n> > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > index 6f702b3..c0bb49f 100644\n> > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf\n> > > > **tx_pkts,\n> > > > > \t\tixgbe_xmit_cleanup(txq);\n> > > > > \t}\n> > > > >\n> > > > > +\trte_prefetch0(&txe->mbuf->pool);\n> > > > > +\n> > > > \n> > > > Can you explain what all of these prefetches are doing? It looks to me like\n> > > > they're just fetching the first caheline of the mempool structure, which it\n> > > > appears amounts to the pools name. I don't see that having any use here.\n> > > > \n> > > This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd). \n> > > \n> > > What I've done with the prefetches is two-fold:\n> > > 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.\n> > ok, this makes some sense, but you're not guaranteed to either have that\n> > prefetch be needed, nor are you certain it will still be in cache by the time\n> > you get to the free call. Seems like it might be preferable to prefecth the\n> > data pointed to by tx_pkt, as you're sure to use that every loop iteration.\n> \n> The vast majority of the times the prefetch is necessary, and it does help \n> performance doing things this way. If the prefetch is not necessary, it's \n> just one extra instruction, while, if it is needed, having the prefetch \n> occur 20 cycles before access (picking an arbitrary value) means that we \n> have cut down the time it takes to pull the data from cache when it is \n> needed by 20 cycles.\nI understand how prefetch works. What I'm concerned about is its overuse, and\nits tendency to frequently need re-calibration (though I admit I missed the &\noperator in the patch, and thought you were prefetching the contents of the\nstruct, not the pointer value itself). 
As you say, if the pool pointer is\nalmost certain to be used, then it may well make sense to prefetch the data, but\nin doing so, you potentially evict something that you were about to use, so\nyou're not doing yourself any favors. I understand that you've validated this\nexperimentally, and so it works, right now. I just like to be very careful\nabout how prefetch happens, as it can easily (and sliently) start hurting far\nmore than it helps.\n\n> As for the value pointed to by tx_pkt, since this is a \n> packet the app has just been working on, it's almost certainly already in \n> l1/l2 cache. \n> \nNot sure I follow you here. tx_pkts is an array of mbufs passed to the pmd from\nrte_eth_tx_burts, which in turn is called by the application. I don't see any\nreasonable guarantee that any of those packets have been touch in sufficiently\nrecent history that they are likely to be in cache. It seems like, if you do\nwant to do prefetching, interrotagting nb_tx and doing a prefetch of an\napproriate stride to fill multiple cachelines with successive mbuf headers might\nprovide superior performance.\nNeil", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id E5B61B3AE;\n\tThu, 18 Sep 2014 17:23:56 +0200 (CEST)", "from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58])\n\tby dpdk.org (Postfix) with ESMTP id E1977B3AD\n\tfor <dev@dpdk.org>; Thu, 18 Sep 2014 17:23:53 +0200 (CEST)", "from hmsreliant.think-freely.org\n\t([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost)\n\tby smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63)\n\t(envelope-from <nhorman@tuxdriver.com>)\n\tid 1XUdeG-0002Vh-7e; Thu, 18 Sep 2014 11:29:38 -0400" ], "Date": "Thu, 18 Sep 2014 11:29:30 -0400", "From": "Neil Horman <nhorman@tuxdriver.com>", "To": "Bruce Richardson <bruce.richardson@intel.com>", "Message-ID": "<20140918152930.GG20389@hmsreliant.think-freely.org>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>\n\t<20140917152103.GE4213@localhost.localdomain>\n\t<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>\n\t<20140917175936.GA13492@hmsreliant.think-freely.org>\n\t<20140918133613.GA7208@BRICHA3-MOBL>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=us-ascii", "Content-Disposition": "inline", "In-Reply-To": "<20140918133613.GA7208@BRICHA3-MOBL>", "User-Agent": "Mutt/1.5.23 (2014-03-12)", "X-Spam-Score": "-2.9 (--)", "X-Spam-Status": "No", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 863, "web_url": "https://patches.dpdk.org/comment/863/", "msgid": "<20140918154235.GB12120@BRICHA3-MOBL>", 
"list_archive_url": "https://inbox.dpdk.org/dev/20140918154235.GB12120@BRICHA3-MOBL", "date": "2014-09-18T15:42:36", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "submitter": { "id": 20, "url": "https://patches.dpdk.org/api/people/20/?format=api", "name": "Bruce Richardson", "email": "bruce.richardson@intel.com" }, "content": "On Thu, Sep 18, 2014 at 11:29:30AM -0400, Neil Horman wrote:\n> On Thu, Sep 18, 2014 at 02:36:13PM +0100, Bruce Richardson wrote:\n> > On Wed, Sep 17, 2014 at 01:59:36PM -0400, Neil Horman wrote:\n> > > On Wed, Sep 17, 2014 at 03:35:19PM +0000, Richardson, Bruce wrote:\n> > > > \n> > > > > -----Original Message-----\n> > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]\n> > > > > Sent: Wednesday, September 17, 2014 4:21 PM\n> > > > > To: Richardson, Bruce\n> > > > > Cc: dev@dpdk.org\n> > > > > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx\n> > > > > perf\n> > > > > \n> > > > > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> > > > > > Make a small improvement to slow path TX performance by adding in a\n> > > > > > prefetch for the second mbuf cache line.\n> > > > > > Also move assignment of l2/l3 length values only when needed.\n> > > > > >\n> > > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> > > > > > ---\n> > > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> > > > > > 1 file changed, 7 insertions(+), 5 deletions(-)\n> > > > > >\n> > > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > index 6f702b3..c0bb49f 100644\n> > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf\n> > > > > **tx_pkts,\n> > > > > > \t\tixgbe_xmit_cleanup(txq);\n> > > > > > \t}\n> > > > > >\n> > > > > > +\trte_prefetch0(&txe->mbuf->pool);\n> > > > > > +\n> > > > > \n> > > > > Can you explain what all of these prefetches are doing? It looks to me like\n> > > > > they're just fetching the first caheline of the mempool structure, which it\n> > > > > appears amounts to the pools name. I don't see that having any use here.\n> > > > > \n> > > > This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd). \n> > > > \n> > > > What I've done with the prefetches is two-fold:\n> > > > 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.\n> > > ok, this makes some sense, but you're not guaranteed to either have that\n> > > prefetch be needed, nor are you certain it will still be in cache by the time\n> > > you get to the free call. Seems like it might be preferable to prefecth the\n> > > data pointed to by tx_pkt, as you're sure to use that every loop iteration.\n> > \n> > The vast majority of the times the prefetch is necessary, and it does help \n> > performance doing things this way. 
If the prefetch is not necessary, it's \n> > just one extra instruction, while, if it is needed, having the prefetch \n> > occur 20 cycles before access (picking an arbitrary value) means that we \n> > have cut down the time it takes to pull the data from cache when it is \n> > needed by 20 cycles.\n> I understand how prefetch works. What I'm concerned about is its overuse, and\n> its tendency to frequently need re-calibration (though I admit I missed the &\n> operator in the patch, and thought you were prefetching the contents of the\n> struct, not the pointer value itself). As you say, if the pool pointer is\n> almost certain to be used, then it may well make sense to prefetch the data, but\n> in doing so, you potentially evict something that you were about to use, so\n> you're not doing yourself any favors. I understand that you've validated this\n> experimentally, and so it works, right now. I just like to be very careful\n> about how prefetch happens, as it can easily (and sliently) start hurting far\n> more than it helps.\n> \n> > As for the value pointed to by tx_pkt, since this is a \n> > packet the app has just been working on, it's almost certainly already in \n> > l1/l2 cache. \n> > \n> Not sure I follow you here. tx_pkts is an array of mbufs passed to the pmd from\n> rte_eth_tx_burts, which in turn is called by the application. I don't see any\n> reasonable guarantee that any of those packets have been touch in sufficiently\n> recent history that they are likely to be in cache. It seems like, if you do\n> want to do prefetching, interrotagting nb_tx and doing a prefetch of an\n> approriate stride to fill multiple cachelines with successive mbuf headers might\n> provide superior performance.\n> Neil\n>\nPrefetching the mbuf is probably best left to the application. For all our \nsample applications used for benchmarking, and almost certainly the vast \nmajority of all our example applications, the packet being transmitted is \nalready in cache on the core itself. Adding a prefetch to the tx function I \nwould expect to see a performance decrease in both testpmd and l3fwd apps. 
\nI would be useful for apps where the packets are passed from one core to \nanother core which does no processing of them before transmitting them - but \nin that case, it's better to have the TX thread of the app do the prefetch \nrather than forcing it in the driver and reduce the performance of those \napps that have the packets already in cache.\n\nThe prefetch added by the patch under discussion doesn't suffer from this \nissue as the data being prefetched is for the mbuf that was previously \ntransmitted some time previously, and the tx function has fully looped back \naround the TX ring to get to it again.\n\n/Bruce", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id B5A70B3B2;\n\tThu, 18 Sep 2014 17:37:02 +0200 (CEST)", "from mga11.intel.com (mga11.intel.com [192.55.52.93])\n\tby dpdk.org (Postfix) with ESMTP id 09858B3B0\n\tfor <dev@dpdk.org>; Thu, 18 Sep 2014 17:37:00 +0200 (CEST)", "from azsmga001.ch.intel.com ([10.2.17.19])\n\tby fmsmga102.fm.intel.com with ESMTP; 18 Sep 2014 08:42:39 -0700", "from bricha3-mobl.ger.corp.intel.com (HELO\n\tbricha3-mobl.ir.intel.com) ([10.237.220.58])\n\tby azsmga001.ch.intel.com with SMTP; 18 Sep 2014 08:42:37 -0700", "by bricha3-mobl.ir.intel.com (sSMTP sendmail emulation);\n\tThu, 18 Sep 2014 16:42:36 +0001" ], "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"5.04,548,1406617200\"; d=\"scan'208\";a=\"478460976\"", "Date": "Thu, 18 Sep 2014 16:42:36 +0100", "From": "Bruce Richardson <bruce.richardson@intel.com>", "To": "Neil Horman <nhorman@tuxdriver.com>", "Message-ID": "<20140918154235.GB12120@BRICHA3-MOBL>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>\n\t<20140917152103.GE4213@localhost.localdomain>\n\t<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>\n\t<20140917175936.GA13492@hmsreliant.think-freely.org>\n\t<20140918133613.GA7208@BRICHA3-MOBL>\n\t<20140918152930.GG20389@hmsreliant.think-freely.org>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=us-ascii", "Content-Disposition": "inline", "In-Reply-To": "<20140918152930.GG20389@hmsreliant.think-freely.org>", "Organization": "Intel Shannon Ltd.", "User-Agent": "Mutt/1.5.22 (2013-10-16)", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 870, "web_url": "https://patches.dpdk.org/comment/870/", "msgid": "<20140918175641.GL20389@hmsreliant.think-freely.org>", "list_archive_url": "https://inbox.dpdk.org/dev/20140918175641.GL20389@hmsreliant.think-freely.org", "date": "2014-09-18T17:56:41", "subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", 
"submitter": { "id": 32, "url": "https://patches.dpdk.org/api/people/32/?format=api", "name": "Neil Horman", "email": "nhorman@tuxdriver.com" }, "content": "On Thu, Sep 18, 2014 at 04:42:36PM +0100, Bruce Richardson wrote:\n> On Thu, Sep 18, 2014 at 11:29:30AM -0400, Neil Horman wrote:\n> > On Thu, Sep 18, 2014 at 02:36:13PM +0100, Bruce Richardson wrote:\n> > > On Wed, Sep 17, 2014 at 01:59:36PM -0400, Neil Horman wrote:\n> > > > On Wed, Sep 17, 2014 at 03:35:19PM +0000, Richardson, Bruce wrote:\n> > > > > \n> > > > > > -----Original Message-----\n> > > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]\n> > > > > > Sent: Wednesday, September 17, 2014 4:21 PM\n> > > > > > To: Richardson, Bruce\n> > > > > > Cc: dev@dpdk.org\n> > > > > > Subject: Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path tx\n> > > > > > perf\n> > > > > > \n> > > > > > On Wed, Sep 17, 2014 at 11:01:39AM +0100, Bruce Richardson wrote:\n> > > > > > > Make a small improvement to slow path TX performance by adding in a\n> > > > > > > prefetch for the second mbuf cache line.\n> > > > > > > Also move assignment of l2/l3 length values only when needed.\n> > > > > > >\n> > > > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>\n> > > > > > > ---\n> > > > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 +++++++-----\n> > > > > > > 1 file changed, 7 insertions(+), 5 deletions(-)\n> > > > > > >\n> > > > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > > index 6f702b3..c0bb49f 100644\n> > > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c\n> > > > > > > @@ -565,25 +565,26 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf\n> > > > > > **tx_pkts,\n> > > > > > > \t\tixgbe_xmit_cleanup(txq);\n> > > > > > > \t}\n> > > > > > >\n> > > > > > > +\trte_prefetch0(&txe->mbuf->pool);\n> > > > > > > +\n> > > > > > \n> > > > > > Can you explain what all of these prefetches are doing? It looks to me like\n> > > > > > they're just fetching the first caheline of the mempool structure, which it\n> > > > > > appears amounts to the pools name. I don't see that having any use here.\n> > > > > > \n> > > > > This does make a decent enough performance difference in my tests (the amount varies depending on the RX path being used by testpmd). \n> > > > > \n> > > > > What I've done with the prefetches is two-fold:\n> > > > > 1) changed it from prefetching the mbuf (first cache line) to prefetching the mbuf pool pointer (second cache line) so that when we go to access the pool pointer to free transmitted mbufs we don't get a cache miss. When clearing the ring and freeing mbufs, the pool pointer is the only mbuf field used, so we don't need that first cache line.\n> > > > ok, this makes some sense, but you're not guaranteed to either have that\n> > > > prefetch be needed, nor are you certain it will still be in cache by the time\n> > > > you get to the free call. Seems like it might be preferable to prefecth the\n> > > > data pointed to by tx_pkt, as you're sure to use that every loop iteration.\n> > > \n> > > The vast majority of the times the prefetch is necessary, and it does help \n> > > performance doing things this way. 
If the prefetch is not necessary, it's \n> > > just one extra instruction, while, if it is needed, having the prefetch \n> > > occur 20 cycles before access (picking an arbitrary value) means that we \n> > > have cut down the time it takes to pull the data from cache when it is \n> > > needed by 20 cycles.\n> > I understand how prefetch works. What I'm concerned about is its overuse, and\n> > its tendency to frequently need re-calibration (though I admit I missed the &\n> > operator in the patch, and thought you were prefetching the contents of the\n> > struct, not the pointer value itself). As you say, if the pool pointer is\n> > almost certain to be used, then it may well make sense to prefetch the data, but\n> > in doing so, you potentially evict something that you were about to use, so\n> > you're not doing yourself any favors. I understand that you've validated this\n> > experimentally, and so it works, right now. I just like to be very careful\n> > about how prefetch happens, as it can easily (and sliently) start hurting far\n> > more than it helps.\n> > \n> > > As for the value pointed to by tx_pkt, since this is a \n> > > packet the app has just been working on, it's almost certainly already in \n> > > l1/l2 cache. \n> > > \n> > Not sure I follow you here. tx_pkts is an array of mbufs passed to the pmd from\n> > rte_eth_tx_burts, which in turn is called by the application. I don't see any\n> > reasonable guarantee that any of those packets have been touch in sufficiently\n> > recent history that they are likely to be in cache. It seems like, if you do\n> > want to do prefetching, interrotagting nb_tx and doing a prefetch of an\n> > approriate stride to fill multiple cachelines with successive mbuf headers might\n> > provide superior performance.\n> > Neil\n> >\n> Prefetching the mbuf is probably best left to the application. For all our \n> sample applications used for benchmarking, and almost certainly the vast \n> majority of all our example applications, the packet being transmitted is \n> already in cache on the core itself. Adding a prefetch to the tx function I \n> would expect to see a performance decrease in both testpmd and l3fwd apps. \n> I would be useful for apps where the packets are passed from one core to \n> another core which does no processing of them before transmitting them - but \n> in that case, it's better to have the TX thread of the app do the prefetch \n> rather than forcing it in the driver and reduce the performance of those \n> apps that have the packets already in cache.\n> \n\nRegarding the performance decrease, I think you're trying to have it both ways\nhere. Above you indicate that if the prefetch of the pool pointer isn't needed\nits just an extra instruction, which I think is true. But now you are saying\nthat if the tx buffers are in cache, the extra instructions will have an impact.\nGranted its potentially nb_tx prefetches, not one, but none of them stall the\ncpu pipeline as far as Im aware, so I can't imagine 1 prefetch vs several will\nhave a significant impact on performance.\n\nRegarding where to do prefecth. Leaving prefetch in the hands of the application is a\nbad idea, because the application has no visibility into the code path once you\nenter the DPDK. It doesn't know if the buffers are going to be accessed in 20\ncycles or 20,000 cycles, which will be all the difference between a useful and\nharmful prefetch. 
Sure you can calibrate your application to correspond to a\ngiven version of the dpdk and optimize such a prefetch, but that will be\ncompletely obsoleted the first time the dpdk transmit path changes.\n\nAs for the use of prefetching tx buffers at all, I think theres several cases\nwhere you might find that those buffers are vicimized in cache. consider the\nsituation where a receive interrupt triggers on a cpu right before rte_eth_trans\nis called. For a heavily loaded system, the receive buffers may frequently push\nthe soon-to-be-transmitted buffers out of cache.\n\n> The prefetch added by the patch under discussion doesn't suffer from this \n> issue as the data being prefetched is for the mbuf that was previously \n> transmitted some time previously, and the tx function has fully looped back \n> around the TX ring to get to it again.\n> \n\nI get what you're saying here, that after the first prefetch the data stays hot\nin cache because it is continually re-accessed. Thats fine. But that would\nhappen after the first fetch anyway, without the prefetch.\n\nYou know what would put this argument to rest? If you could run whatever\nbenchmark you were running under the perf utility so we could see the L1 cache\nmisses from the baseline dpdk, the variant where you prefetch the pool pointer,\nand a variant in which you prefetch the next tx buf at the top of the loop.\n\nNeil\n\n\n\n> /Bruce\n> \n>", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id 3B805B3C4;\n\tThu, 18 Sep 2014 19:51:06 +0200 (CEST)", "from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58])\n\tby dpdk.org (Postfix) with ESMTP id 3945AB3BC\n\tfor <dev@dpdk.org>; Thu, 18 Sep 2014 19:51:03 +0200 (CEST)", "from hmsreliant.think-freely.org\n\t([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost)\n\tby smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63)\n\t(envelope-from <nhorman@tuxdriver.com>)\n\tid 1XUfwg-0003Rd-93; Thu, 18 Sep 2014 13:56:48 -0400" ], "Date": "Thu, 18 Sep 2014 13:56:41 -0400", "From": "Neil Horman <nhorman@tuxdriver.com>", "To": "Bruce Richardson <bruce.richardson@intel.com>", "Message-ID": "<20140918175641.GL20389@hmsreliant.think-freely.org>", "References": "<1410948102-12740-1-git-send-email-bruce.richardson@intel.com>\n\t<1410948102-12740-3-git-send-email-bruce.richardson@intel.com>\n\t<20140917152103.GE4213@localhost.localdomain>\n\t<59AF69C657FD0841A61C55336867B5B0343F2EEA@IRSMSX103.ger.corp.intel.com>\n\t<20140917175936.GA13492@hmsreliant.think-freely.org>\n\t<20140918133613.GA7208@BRICHA3-MOBL>\n\t<20140918152930.GG20389@hmsreliant.think-freely.org>\n\t<20140918154235.GB12120@BRICHA3-MOBL>", "MIME-Version": "1.0", "Content-Type": "text/plain; charset=us-ascii", "Content-Disposition": "inline", "In-Reply-To": "<20140918154235.GB12120@BRICHA3-MOBL>", "User-Agent": "Mutt/1.5.23 (2014-03-12)", "X-Spam-Score": "-2.9 (--)", "X-Spam-Status": "No", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>", "Subject": "Re: [dpdk-dev] [PATCH 2/5] ixgbe: add prefetch to improve slow-path\n\ttx perf", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "patches and discussions about DPDK <dev.dpdk.org>", "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://dpdk.org/ml/archives/dev/>", "List-Post": 
"<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null } ]