app/testpmd: add profiling for Rx/Tx burst routines
Commit Message
There is a testpmd configuration option called
RTE_TEST_PMD_RECORD_CORE_CYCLES. When it is turned on,
the testpmd application measures the CPU cycles spent
within the forwarding loop. This time is the sum of the execution
times of rte_eth_rx_burst(), rte_eth_tx_burst(), rte_delay_us(),
rte_pktmbuf_free() and so on, depending on the fwd mode set.
While debugging and performance optimization of datapath
burst routines it would be useful to see the pure execution
times of these routines. It is proposed to add separate profiling
options:
CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
enables gathering profiling data for transmit datapath,
ticks spent within rte_eth_tx_burst() routine
CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
enables gathering profiling data for receive datapath,
ticks spent within rte_eth_rx_burst() routine
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
RFC: http://patches.dpdk.org/patch/53704/
---
app/test-pmd/csumonly.c | 25 ++++++++++-----------
app/test-pmd/flowgen.c | 25 +++++++++++----------
app/test-pmd/icmpecho.c | 26 ++++++++++-----------
app/test-pmd/iofwd.c | 24 ++++++++++----------
app/test-pmd/macfwd.c | 24 +++++++++++---------
app/test-pmd/macswap.c | 26 +++++++++++----------
app/test-pmd/rxonly.c | 17 +++++---------
app/test-pmd/softnicfwd.c | 24 ++++++++++----------
app/test-pmd/testpmd.c | 32 ++++++++++++++++++++++++++
app/test-pmd/testpmd.h | 40 +++++++++++++++++++++++++++++++++
app/test-pmd/txonly.c | 23 +++++++++----------
config/common_base | 2 ++
doc/guides/testpmd_app_ug/build_app.rst | 17 ++++++++++++++
13 files changed, 197 insertions(+), 108 deletions(-)
Comments
On Wed, Jun 26, 2019 at 12:48:37PM +0000, Viacheslav Ovsiienko wrote:
> There is the testpmd configuration option called
> RTE_TEST_PMD_RECORD_CORE_CYCLES, if this one is turned on
> the testpmd application measures the CPU clocks spent
> within forwarding loop. This time is the sum of execution
> times of rte_eth_rx_burst(), rte_eth_tx_burst(), rte_delay_us(),
> rte_pktmbuf_free() and so on, depending on fwd mode set.
>
> While debugging and performance optimization of datapath
> burst routines it would be useful to see the pure execution
> times of these ones. It is proposed to add separated profiling
> options:
>
> CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> enables gathering profiling data for transmit datapath,
> ticks spent within rte_eth_tx_burst() routine
>
> CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> enables gathering profiling data for receive datapath,
> ticks spent within rte_eth_rx_burst() routine
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
> RFC: http://patches.dpdk.org/patch/53704/
> ---
Out of interest, did you try making these runtime rather than build-time
options, and see if it makes any perf difference? Given the fact that we
would have just two predictable branches per burst of packets, I'd expect
the impact to be pretty minimal, if measurable at all.
/Bruce
Hi, Bruce
Do you mean using "if (core_rx_cycle_enabled) {...}" instead of #ifdef ?
No, I did not try runtime control settings.
Instead I compared performance between builds with all RECORD_CORE_XX_CYCLES options enabled and disabled
and have seen a ~1-2% performance difference on my setups (mainly fwd txonly with retry).
So, measuring ticks is not free.
With best regards,
Slava
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Wednesday, June 26, 2019 15:58
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; bernard.iremonger@intel.com; ferruh.yigit@intel.com
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx burst
> routines
>
> On Wed, Jun 26, 2019 at 12:48:37PM +0000, Viacheslav Ovsiienko wrote:
> > There is the testpmd configuration option called
> > RTE_TEST_PMD_RECORD_CORE_CYCLES, if this one is turned on the
> testpmd
> > application measures the CPU clocks spent within forwarding loop. This
> > time is the sum of execution times of rte_eth_rx_burst(),
> > rte_eth_tx_burst(), rte_delay_us(),
> > rte_pktmbuf_free() and so on, depending on fwd mode set.
> >
> > While debugging and performance optimization of datapath burst
> > routines it would be useful to see the pure execution times of these
> > ones. It is proposed to add separated profiling
> > options:
> >
> > CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
> > enables gathering profiling data for transmit datapath,
> > ticks spent within rte_eth_tx_burst() routine
> >
> > CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
> > enables gathering profiling data for receive datapath,
> > ticks spent within rte_eth_rx_burst() routine
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> > RFC: http://patches.dpdk.org/patch/53704/
> > ---
>
> Out of interest, did you try making these runtime rather than build-time
> options, and see if it makes any perf difference? Given the fact that we
> would have just two predictable branches per burst of packets, I'd expect the
> impact to be pretty minimal, if measurable at all.
>
> /Bruce
On Wed, Jun 26, 2019 at 01:19:24PM +0000, Slava Ovsiienko wrote:
> Hi, Bruce
>
> Do you mean using "if (core_rx_cycle_enabled) {...}" instead of #ifdef ?
>
> No, I did not try runtime control settings.
> Instead I compared performance with all RECORD_CORE_XX_CYCLES options enabled/disabled builds
> and have seen the ~1-2% performance difference on my setups (mainly fwd txonly with retry).
> So, ticks measuring is not free.
>
> With best regards,
> Slava
>
Yes, I realise that measuring ticks is going to have a performance impact.
However, what I was referring to was exactly the former - using an "if"
rather than an "ifdef". I would hope that with ticks disabled this option
shows no perf impact, and we can reduce the use of build-time configs.
/Bruce
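Bruce's purely runtime alternative can be sketched as follows. This is a minimal, hedged illustration, not the patch itself: the flag name `record_core_cycles` is hypothetical, and `rte_rdtsc()` is replaced by a deterministic stub so the sketch is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

static bool record_core_cycles;   /* hypothetical runtime flag, toggled from the testpmd CLI */
static uint64_t core_cycles_total;

/* stand-in for rte_rdtsc() so the sketch is self-contained */
static uint64_t fake_tsc;
static uint64_t rdtsc_stub(void) { return fake_tsc += 5; }

static void forward_burst(void)
{
	uint64_t start = 0;

	if (record_core_cycles)   /* one well-predicted branch per burst */
		start = rdtsc_stub();
	/* ... rte_eth_rx_burst(), packet processing, rte_eth_tx_burst() ... */
	if (record_core_cycles)
		core_cycles_total += rdtsc_stub() - start;
}
```

When the flag is off, the two branches are taken the same way every iteration, which is why Bruce expects the cost to be near zero even though the check is compiled in unconditionally.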
OK, what do you think about this:
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
if (record_cycle & RECORD_TX_CORE_CYCLES) {
.. do measurement stuff ..
}
#endif
+ add a new command to configure at runtime: "set record_cycle 3"
We keep the existing RTE_TEST_PMD_RECORD_CORE_CYCLES, do not introduce
new build-time configs, and get some new runtime configuration.
WBR,
Slava
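The hybrid Slava sketches above could look roughly like this in full. It is an illustrative sketch only: the bitmask names and the `record_cycle` variable follow the snippet in the mail but are not taken from a final patch, and `rte_rdtsc()` is stubbed so the example runs standalone.

```c
#include <stdint.h>

#define RTE_TEST_PMD_RECORD_CORE_CYCLES   /* normally set in config/common_base */

#define RECORD_RX_CORE_CYCLES (1u << 0)
#define RECORD_TX_CORE_CYCLES (1u << 1)

static unsigned int record_cycle;   /* runtime: "set record_cycle 3" enables both bits */
static uint64_t tx_cycles_total;

/* stand-in for rte_rdtsc() so the sketch is self-contained */
static uint64_t fake_tsc;
static uint64_t rdtsc_stub(void) { return fake_tsc += 10; }

static void tx_burst_profiled(void)
{
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
	uint64_t start = 0;

	if (record_cycle & RECORD_TX_CORE_CYCLES)
		start = rdtsc_stub();
#endif
	/* ... rte_eth_tx_burst() would run here ... */
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
	if (record_cycle & RECORD_TX_CORE_CYCLES)
		tx_cycles_total += rdtsc_stub() - start;
#endif
}
```

The #ifdef keeps the measurement code out of builds that never profile, while the bitmask lets a single build switch Rx/Tx gathering on and off at runtime.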
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Wednesday, June 26, 2019 16:21
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; bernard.iremonger@intel.com; ferruh.yigit@intel.com
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx burst
> routines
>
> On Wed, Jun 26, 2019 at 01:19:24PM +0000, Slava Ovsiienko wrote:
> > Hi, Bruce
> >
> > Do you mean using "if (core_rx_cycle_enabled) {...}" instead of #ifdef ?
> >
> > No, I did not try runtime control settings.
> > Instead I compared performance with all RECORD_CORE_XX_CYCLES
> options
> > enabled/disabled builds and have seen the ~1-2% performance difference
> on my setups (mainly fwd txonly with retry).
> > So, ticks measuring is not free.
> >
> > With best regards,
> > Slava
> >
> Yes, I realise that measuring ticks is going to have a performance impact.
> However, what I was referring to was exactly the former - using an "if"
> rather than an "ifdef". I would hope that with ticks disabled this option
> shows no perf impact, and we can reduce the use of build-time configs.
>
> /Bruce
Hi Bruce, Slava,
> -----Original Message-----
> From: Slava Ovsiienko [mailto:viacheslavo@mellanox.com]
> Sent: Thursday, June 27, 2019 5:49 AM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; Iremonger, Bernard <bernard.iremonger@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx burst
> routines
>
> OK, what do you think about this:
>
> #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> if (record_cycle & RECORD_TX_CORE_CYCLES) {
> .. do measurement stuff ..
> }
> #endif
>
> + add some new command to config in runtime: "set record_cycle 3"
>
> We keep existing RTE_TEST_PMD_RECORD_CORE_CYCLES, do not introduce
> new build-time configs and get some new runtime configuring.
>
> WBR,
> Slava
>
> > -----Original Message-----
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Wednesday, June 26, 2019 16:21
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > Cc: dev@dpdk.org; bernard.iremonger@intel.com; ferruh.yigit@intel.com
> > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx
> > burst routines
> >
> > On Wed, Jun 26, 2019 at 01:19:24PM +0000, Slava Ovsiienko wrote:
> > > Hi, Bruce
> > >
> > > Do you mean using "if (core_rx_cycle_enabled) {...}" instead of #ifdef ?
> > >
> > > No, I did not try runtime control settings.
> > > Instead I compared performance with all RECORD_CORE_XX_CYCLES
> > options
> > > enabled/disabled builds and have seen the ~1-2% performance
> > > difference
> > on my setups (mainly fwd txonly with retry).
> > > So, ticks measuring is not free.
> > >
> > > With best regards,
> > > Slava
> > >
> > Yes, I realise that measuring ticks is going to have a performance impact.
> > However, what I was referring to was exactly the former - using an "if"
> > rather than an "ifdef". I would hope that with ticks disabled this
> > option shows no perf impact, and we can reduce the use of build-time
> configs.
> >
> > /Bruce
Given that RTE_TEST_PMD_RECORD_CORE_CYCLES is already in the config file,
I think it is better to be consistent and add the new RECORD macros there.
Would it be reasonable to have runtime settings available as well?
Regards,
Bernard
On Fri, Jun 28, 2019 at 02:45:13PM +0100, Iremonger, Bernard wrote:
> Hi Bruce, Slava,
>
> > -----Original Message-----
> > From: Slava Ovsiienko [mailto:viacheslavo@mellanox.com]
> > Sent: Thursday, June 27, 2019 5:49 AM
> > To: Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: dev@dpdk.org; Iremonger, Bernard <bernard.iremonger@intel.com>;
> > Yigit, Ferruh <ferruh.yigit@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx burst
> > routines
> >
> > OK, what do you think about this:
> >
> > #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> > if (record_cycle & RECORD_TX_CORE_CYCLES) {
> > .. do measurement stuff ..
> > }
> > #endif
> >
> > + add some new command to config in runtime: "set record_cycle 3"
> >
> > We keep existing RTE_TEST_PMD_RECORD_CORE_CYCLES, do not introduce
> > new build-time configs and get some new runtime configuring.
> >
> > WBR,
> > Slava
> >
> > > -----Original Message-----
> > > From: Bruce Richardson <bruce.richardson@intel.com>
> > > Sent: Wednesday, June 26, 2019 16:21
> > > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > > Cc: dev@dpdk.org; bernard.iremonger@intel.com; ferruh.yigit@intel.com
> > > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx
> > > burst routines
> > >
> > > On Wed, Jun 26, 2019 at 01:19:24PM +0000, Slava Ovsiienko wrote:
> > > > Hi, Bruce
> > > >
> > > > Do you mean using "if (core_rx_cycle_enabled) {...}" instead of #ifdef ?
> > > >
> > > > No, I did not try runtime control settings.
> > > > Instead I compared performance with all RECORD_CORE_XX_CYCLES
> > > options
> > > > enabled/disabled builds and have seen the ~1-2% performance
> > > > difference
> > > on my setups (mainly fwd txonly with retry).
> > > > So, ticks measuring is not free.
> > > >
> > > > With best regards,
> > > > Slava
> > > >
> > > Yes, I realise that measuring ticks is going to have a performance impact.
> > > However, what I was referring to was exactly the former - using an "if"
> > > rather than an "ifdef". I would hope that with ticks disabled this
> > > option shows no perf impact, and we can reduce the use of build-time
> > configs.
> > >
> > > /Bruce
>
> Given that RTE_TEST_PMD_RECORD_CORE_CYCLES is already in the config file,
> I think it is better to be consistent and add the new RECORD macros there.
>
> Would it be reasonable to have runtime settings available as well?
>
That configuration option is only present right now for the make builds, so
I'd like to see it replaced with a runtime option rather than see about
adding more config options to the meson build. The first step should be to
avoid adding more config options and just look to use dynamic ones. Ideally
the existing build option should be replaced at the same time.
/Bruce
I think we should compromise: keep the existing RTE_TEST_PMD_RECORD_CORE_CYCLES
and extend it with a runtime switch under this build-time option:
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
if (record_tx)
.. gather tx related stats...
if (record_rx)
.. gather rx related stats...
#endif
This is a very specific feature; it is needed while debugging and testing datapath
routines, and it seems this feature, with its associated overhead, should not always be enabled.
The existing build-time configuration options look OK to me.
Bruce, if the proposed runtime extension is acceptable, I will update the patch.
WBR,
Slava
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Friday, June 28, 2019 17:20
> To: Iremonger, Bernard <bernard.iremonger@intel.com>
> Cc: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org; Yigit,
> Ferruh <ferruh.yigit@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx burst
> routines
>
> On Fri, Jun 28, 2019 at 02:45:13PM +0100, Iremonger, Bernard wrote:
> > Hi Bruce, Slava,
> >
> > > -----Original Message-----
> > > From: Slava Ovsiienko [mailto:viacheslavo@mellanox.com]
> > > Sent: Thursday, June 27, 2019 5:49 AM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>
> > > Cc: dev@dpdk.org; Iremonger, Bernard <bernard.iremonger@intel.com>;
> > > Yigit, Ferruh <ferruh.yigit@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: add profiling for Rx/Tx
> > > burst routines
> > >
> > > OK, what do you think about this:
> > >
> > > #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES if (record_cycle &
> > > RECORD_TX_CORE_CYCLES) {
> > > .. do measurement stuff ..
> > > }
> > > #endif
> > >
> > > + add some new command to config in runtime: "set record_cycle 3"
> > >
> > > We keep existing RTE_TEST_PMD_RECORD_CORE_CYCLES, do not
> introduce
> > > new build-time configs and get some new runtime configuring.
> > >
> > > WBR,
> > > Slava
> > >
> > > > -----Original Message-----
> > > > From: Bruce Richardson <bruce.richardson@intel.com>
> > > > Sent: Wednesday, June 26, 2019 16:21
> > > > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > > > Cc: dev@dpdk.org; bernard.iremonger@intel.com;
> > > > ferruh.yigit@intel.com
> > > > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add profiling for
> > > > Rx/Tx burst routines
> > > >
> > > > On Wed, Jun 26, 2019 at 01:19:24PM +0000, Slava Ovsiienko wrote:
> > > > > Hi, Bruce
> > > > >
> > > > > Do you mean using "if (core_rx_cycle_enabled) {...}" instead of #ifdef
> ?
> > > > >
> > > > > No, I did not try runtime control settings.
> > > > > Instead I compared performance with all RECORD_CORE_XX_CYCLES
> > > > options
> > > > > enabled/disabled builds and have seen the ~1-2% performance
> > > > > difference
> > > > on my setups (mainly fwd txonly with retry).
> > > > > So, ticks measuring is not free.
> > > > >
> > > > > With best regards,
> > > > > Slava
> > > > >
> > > > Yes, I realise that measuring ticks is going to have a performance
> impact.
> > > > However, what I was referring to was exactly the former - using an "if"
> > > > rather than an "ifdef". I would hope that with ticks disabled this
> > > > option shows no perf impact, and we can reduce the use of
> > > > build-time
> > > configs.
> > > >
> > > > /Bruce
> >
> > Given that RTE_TEST_PMD_RECORD_CORE_CYCLES is already in the
> config file.
> > I think it is better to be consistent and add the new RECORD macros there.
> >
> > Would it be reasonable to have runtime settings available as well?
> >
> That configuration option is only present right now for the make builds, so I'd
> like to see it replaced with a runtime option rather than see about adding
> more config options to the meson build. The first step should be to avoid
> adding more config options and just look to use dynamic ones. Ideally the
> existing build option should be replaced at the same time.
>
> /Bruce
On Mon, Jul 01, 2019 at 04:57:30AM +0000, Slava Ovsiienko wrote:
> I think we should compromise: keep existing RTE_TEST_PMD_RECORD_CORE_CYCLES
> and extend with runtime switch under this build-time option:
>
> #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> if (record_tx)
> .. gather tx related stats...
> if (record_rx)
> .. gather rx related stats...
> #endif
>
> This is a very specific feature; it is needed while debugging and testing datapath
> routines, and it seems this feature, with its associated overhead, should not always be enabled.
> The existing build-time configuration options look OK to me.
>
> Bruce, if proposed runtime extension is acceptable - I will update the patch.
>
Ok for me.
Thanks,
/Bruce
On 7/1/2019 9:15 AM, Bruce Richardson wrote:
> On Mon, Jul 01, 2019 at 04:57:30AM +0000, Slava Ovsiienko wrote:
>> I think we should compromise: keep existing RTE_TEST_PMD_RECORD_CORE_CYCLES
>> and extend with runtime switch under this build-time option:
>>
>> #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>> if (record_tx)
>> .. gather tx related stats...
>> if (record_rx)
>> .. gather rx related stats...
>> #endif
>>
>> This is a very specific feature; it is needed while debugging and testing datapath
>> routines, and it seems this feature, with its associated overhead, should not always be enabled.
>> The existing build-time configuration options look OK to me.
+1, if we enable this I am for having compile-time config options.
One concern about the implementation: 'RTE_TEST_PMD_RECORD_CORE_CYCLES' and
'RTE_TEST_PMD_RECORD_CORE_RX_CYCLES' both use the same variable,
'start_rx_tsc'; is there an assumption that both won't be enabled at the same time?
I think it is better to be able to enable CORE_CYCLES, RX_CYCLES and TX_CYCLES
separately / independently.
Another thing to consider, for the long term - not for this patch - is to move the
RX/TX ifdefs into the ethdev Rx/Tx functions so that all applications can use them,
not just testpmd.
>>
>> Bruce, if proposed runtime extension is acceptable - I will update the patch.
>>
> Ok for me.
>
> Thanks,
> /Bruce
>
@@ -717,19 +717,19 @@ struct simple_gre_hdr {
uint16_t nb_segments = 0;
int ret;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
#endif
-
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/* receive a burst of packet */
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
if (unlikely(nb_rx == 0))
return;
#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
@@ -989,8 +989,10 @@ struct simple_gre_hdr {
printf("Preparing packet burst to transmit failed: %s\n",
rte_strerror(rte_errno));
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, tx_pkts_burst,
nb_prep);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/*
* Retry if necessary
@@ -999,8 +1001,10 @@ struct simple_gre_hdr {
retry = 0;
while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&tx_pkts_burst[nb_tx], nb_rx - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -1017,12 +1021,7 @@ struct simple_gre_hdr {
rte_pktmbuf_free(tx_pkts_burst[nb_tx]);
} while (++nb_tx < nb_rx);
}
-
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine csum_fwd_engine = {
@@ -130,20 +130,21 @@
uint16_t i;
uint32_t retry;
uint64_t tx_offloads;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
-#endif
static int next_flow = 0;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
+#endif
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/* Receive a burst of packets and discard them. */
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
fs->rx_packets += nb_rx;
for (i = 0; i < nb_rx; i++)
@@ -212,7 +213,9 @@
next_flow = (next_flow + 1) % cfg_n_flows;
}
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/*
* Retry if necessary
*/
@@ -220,8 +223,10 @@
retry = 0;
while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&pkts_burst[nb_tx], nb_rx - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -239,11 +244,7 @@
rte_pktmbuf_free(pkts_burst[nb_tx]);
} while (++nb_tx < nb_pkt);
}
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine flow_gen_engine = {
@@ -293,21 +293,22 @@
uint32_t cksum;
uint8_t i;
int l2_len;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
-#endif
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
+#endif
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/*
* First, receive a burst of packets.
*/
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
if (unlikely(nb_rx == 0))
return;
@@ -492,8 +493,10 @@
/* Send back ICMP echo replies, if any. */
if (nb_replies > 0) {
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
nb_replies);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/*
* Retry if necessary
*/
@@ -502,10 +505,12 @@
while (nb_tx < nb_replies &&
retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port,
fs->tx_queue,
&pkts_burst[nb_tx],
nb_replies - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -519,12 +524,7 @@
} while (++nb_tx < nb_replies);
}
}
-
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine icmp_echo_engine = {
@@ -51,21 +51,21 @@
uint16_t nb_tx;
uint32_t retry;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
#endif
-
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/*
* Receive a burst of packets and forward them.
*/
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
pkts_burst, nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
if (unlikely(nb_rx == 0))
return;
fs->rx_packets += nb_rx;
@@ -73,8 +73,10 @@
#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
#endif
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
pkts_burst, nb_rx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/*
* Retry if necessary
*/
@@ -82,8 +84,10 @@
retry = 0;
while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&pkts_burst[nb_tx], nb_rx - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -96,11 +100,7 @@
rte_pktmbuf_free(pkts_burst[nb_tx]);
} while (++nb_tx < nb_rx);
}
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine io_fwd_engine = {
@@ -56,21 +56,23 @@
uint16_t i;
uint64_t ol_flags = 0;
uint64_t tx_offloads;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
+
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
#endif
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
#endif
/*
* Receive a burst of packets and forward them.
*/
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
if (unlikely(nb_rx == 0))
return;
@@ -103,7 +105,9 @@
mb->vlan_tci = txp->tx_vlan_id;
mb->vlan_tci_outer = txp->tx_vlan_id_outer;
}
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/*
* Retry if necessary
*/
@@ -111,8 +115,10 @@
retry = 0;
while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&pkts_burst[nb_tx], nb_rx - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
@@ -126,11 +132,7 @@
rte_pktmbuf_free(pkts_burst[nb_tx]);
} while (++nb_tx < nb_rx);
}
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine mac_fwd_engine = {
@@ -86,21 +86,22 @@
uint16_t nb_rx;
uint16_t nb_tx;
uint32_t retry;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
-#endif
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
+#endif
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/*
* Receive a burst of packets and forward them.
*/
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
if (unlikely(nb_rx == 0))
return;
@@ -112,7 +113,10 @@
do_macswap(pkts_burst, nb_rx, txp);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
+
/*
* Retry if necessary
*/
@@ -120,8 +124,10 @@
retry = 0;
while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&pkts_burst[nb_tx], nb_rx - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -134,11 +140,7 @@
rte_pktmbuf_free(pkts_burst[nb_tx]);
} while (++nb_tx < nb_rx);
}
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine mac_swap_engine = {
@@ -50,19 +50,18 @@
uint16_t nb_rx;
uint16_t i;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
-
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/*
* Receive a burst of packets.
*/
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
if (unlikely(nb_rx == 0))
return;
@@ -73,11 +72,7 @@
for (i = 0; i < nb_rx; i++)
rte_pktmbuf_free(pkts_burst[i]);
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
struct fwd_engine rx_only_engine = {
@@ -87,35 +87,39 @@ struct tm_hierarchy {
uint16_t nb_tx;
uint32_t retry;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
#endif
-
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
/* Packets Receive */
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
pkts_burst, nb_pkt_per_burst);
+ TEST_PMD_CORE_CYC_RX_ADD(fs, start_rx_tsc);
fs->rx_packets += nb_rx;
#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
#endif
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
pkts_burst, nb_rx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/* Retry if necessary */
if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) {
retry = 0;
while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&pkts_burst[nb_tx], nb_rx - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -130,11 +134,7 @@ struct tm_hierarchy {
rte_pktmbuf_free(pkts_burst[nb_tx]);
} while (++nb_tx < nb_rx);
}
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
static void
@@ -1506,6 +1506,12 @@ struct extmem_param {
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
uint64_t fwd_cycles = 0;
#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
+ uint64_t rx_cycles = 0;
+#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
+ uint64_t tx_cycles = 0;
+#endif
uint64_t total_recv = 0;
uint64_t total_xmit = 0;
struct rte_port *port;
@@ -1536,6 +1542,12 @@ struct extmem_param {
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
fwd_cycles += fs->core_cycles;
#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
+ rx_cycles += fs->core_rx_cycles;
+#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
+ tx_cycles += fs->core_tx_cycles;
+#endif
}
for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) {
uint8_t j;
@@ -1671,6 +1683,20 @@ struct extmem_param {
(unsigned int)(fwd_cycles / total_recv),
fwd_cycles, total_recv);
#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
+ if (total_recv > 0)
+ printf("\n rx CPU cycles/packet=%u (total cycles="
+ "%"PRIu64" / total RX packets=%"PRIu64")\n",
+ (unsigned int)(rx_cycles / total_recv),
+ rx_cycles, total_recv);
+#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
+ if (total_xmit > 0)
+ printf("\n tx CPU cycles/packet=%u (total cycles="
+ "%"PRIu64" / total TX packets=%"PRIu64")\n",
+ (unsigned int)(tx_cycles / total_xmit),
+ tx_cycles, total_xmit);
+#endif
}
void
@@ -1701,6 +1727,12 @@ struct extmem_param {
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
fs->core_cycles = 0;
#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
+ fs->core_rx_cycles = 0;
+#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
+ fs->core_tx_cycles = 0;
+#endif
}
}
@@ -130,12 +130,52 @@ struct fwd_stream {
#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
uint64_t core_cycles; /**< used for RX and TX processing */
#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
+ uint64_t core_tx_cycles; /**< used for tx_burst processing */
+#endif
+#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
+ uint64_t core_rx_cycles; /**< used for rx_burst processing */
+#endif
#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
struct pkt_burst_stats rx_burst_stats;
struct pkt_burst_stats tx_burst_stats;
#endif
};
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+#define TEST_PMD_CORE_CYC_TX_START(a) \
+ do { (a) = rte_rdtsc(); } while (0)
+#else
+#define TEST_PMD_CORE_CYC_TX_START(a) do { } while (0)
+#endif
+
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+#define TEST_PMD_CORE_CYC_RX_START(a) \
+ do { (a) = rte_rdtsc(); } while (0)
+#else
+#define TEST_PMD_CORE_CYC_RX_START(a) do { } while (0)
+#endif
+
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+#define TEST_PMD_CORE_CYC_FWD_ADD(fs, s) \
+ do { (fs)->core_cycles += rte_rdtsc() - (s); } while (0)
+#else
+#define TEST_PMD_CORE_CYC_FWD_ADD(fs, s) do { } while (0)
+#endif
+
+#ifdef RTE_TEST_PMD_RECORD_CORE_TX_CYCLES
+#define TEST_PMD_CORE_CYC_TX_ADD(fs, s) \
+ do { (fs)->core_tx_cycles += rte_rdtsc() - (s); } while (0)
+#else
+#define TEST_PMD_CORE_CYC_TX_ADD(fs, s) do { } while (0)
+#endif
+
+#ifdef RTE_TEST_PMD_RECORD_CORE_RX_CYCLES
+#define TEST_PMD_CORE_CYC_RX_ADD(fs, s) \
+ do { (fs)->core_rx_cycles += rte_rdtsc() - (s); } while (0)
+#else
+#define TEST_PMD_CORE_CYC_RX_ADD(fs, s) do { } while (0)
+#endif
+
/** Descriptor for a single flow. */
struct port_flow {
struct port_flow *next; /**< Next flow in list. */
@@ -241,16 +241,16 @@
uint32_t retry;
uint64_t ol_flags = 0;
uint64_t tx_offloads;
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- uint64_t start_tsc;
- uint64_t end_tsc;
- uint64_t core_cycles;
+#if defined(RTE_TEST_PMD_RECORD_CORE_TX_CYCLES)
+ uint64_t start_tx_tsc;
+#endif
+#if defined(RTE_TEST_PMD_RECORD_CORE_CYCLES) || \
+ defined(RTE_TEST_PMD_RECORD_CORE_RX_CYCLES)
+ uint64_t start_rx_tsc;
#endif
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- start_tsc = rte_rdtsc();
-#endif
+ TEST_PMD_CORE_CYC_RX_START(start_rx_tsc);
-
mbp = current_fwd_lcore()->mbp;
txp = &ports[fs->tx_port];
tx_offloads = txp->dev_conf.txmode.offloads;
@@ -302,7 +302,9 @@
if (nb_pkt == 0)
return;
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
/*
* Retry if necessary
*/
@@ -310,8 +312,10 @@
retry = 0;
while (nb_tx < nb_pkt && retry++ < burst_tx_retry_num) {
rte_delay_us(burst_tx_delay_time);
+ TEST_PMD_CORE_CYC_TX_START(start_tx_tsc);
nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
&pkts_burst[nb_tx], nb_pkt - nb_tx);
+ TEST_PMD_CORE_CYC_TX_ADD(fs, start_tx_tsc);
}
}
fs->tx_packets += nb_tx;
@@ -334,12 +338,7 @@
rte_pktmbuf_free(pkts_burst[nb_tx]);
} while (++nb_tx < nb_pkt);
}
-
-#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
- end_tsc = rte_rdtsc();
- core_cycles = (end_tsc - start_tsc);
- fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
-#endif
+ TEST_PMD_CORE_CYC_FWD_ADD(fs, start_rx_tsc);
}
static void
@@ -1003,6 +1003,8 @@ CONFIG_RTE_PROC_INFO=n
#
CONFIG_RTE_TEST_PMD=y
CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
+CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES=n
+CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES=n
CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
#
@@ -21,6 +21,23 @@ The basic compilation steps are:
export RTE_TARGET=x86_64-native-linux-gcc
+#. Optionally, edit the desired configuration options in ``$RTE_SDK/config/common_base``:
+
+ * ``CONFIG_RTE_TEST_PMD_RECORD_CORE_TX_CYCLES``
+
+ Enables gathering profiling data for the transmit datapath,
+ counting the ticks spent within the ``rte_eth_tx_burst()`` routine.
+
+ * ``CONFIG_RTE_TEST_PMD_RECORD_CORE_RX_CYCLES``
+
+ Enables gathering profiling data for the receive datapath,
+ counting the ticks spent within the ``rte_eth_rx_burst()`` routine.
+
+ * ``CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES``
+
+ Enables gathering profiling data for the forwarding
+ routine as a whole.
+
#. Build the application:
.. code-block:: console