[dpdk-dev,v2,03/20] i40e: call i40e_txd_enable_checksum only for offloaded packets

Message ID 1423041925-26956-4-git-send-email-olivier.matz@6wind.com (mailing list archive)
State Superseded, archived

Commit Message

Olivier Matz Feb. 4, 2015, 9:25 a.m. UTC
  From i40e datasheet:

  The IP header type and its offload. In case of tunneling, the IIPT
  relates to the inner IP header. See also EIPT field for the outer
  (External) IP header offload.

  00 - non IP packet or packet type is not defined by software
  01 - IPv6 packet
  10 - IPv4 packet with no IP checksum offload
  11 - IPv4 packet with IP checksum offload

Therefore the IIPT field does not need to be filled if no offload is
requested (we can keep the value at 00). For instance, the Linux driver
code does not set it when (skb->ip_summed != CHECKSUM_PARTIAL). We can
do the same in the DPDK driver.

The function i40e_txd_enable_checksum() that fills the offload registers
is now called only for packets requiring an offload.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_pmd_i40e/i40e_rxtx.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
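
The IIPT encoding and the guard described in the commit message can be
sketched in isolation. This is only an illustration, not the driver code:
every EX_* name and bit position below is hypothetical (standing in for the
PKT_TX_* mbuf flags), and the real function writes descriptor fields rather
than returning a value. Only the four 2-bit IIPT values come from the
datasheet excerpt quoted above.

```c
#include <stdint.h>

/* Hypothetical flag bits; the positions are arbitrary and do not match
 * the real PKT_TX_* definitions. */
#define EX_TX_IP_CKSUM       (1ULL << 0)
#define EX_TX_L4_MASK        (3ULL << 1)
#define EX_TX_OUTER_IP_CKSUM (1ULL << 3)
#define EX_TX_IPV4           (1ULL << 4)
#define EX_TX_IPV6           (1ULL << 5)

/* Same shape as the mask added by the patch: any bit set means that
 * some checksum offload was requested for the packet. */
#define EX_TX_CKSUM_OFFLOAD_MASK ( \
		EX_TX_IP_CKSUM |   \
		EX_TX_L4_MASK |    \
		EX_TX_OUTER_IP_CKSUM)

/* IIPT values as listed in the datasheet excerpt. */
enum ex_iipt {
	EX_IIPT_NONE         = 0x0, /* 00: non IP or type not defined */
	EX_IIPT_IPV6         = 0x1, /* 01: IPv6 */
	EX_IIPT_IPV4_NO_CSUM = 0x2, /* 10: IPv4, no IP checksum offload */
	EX_IIPT_IPV4_CSUM    = 0x3  /* 11: IPv4, IP checksum offload */
};

/* Pick the IIPT value for a packet. Without any offload request the
 * field stays at 00, whatever the IP version of the packet is. */
static inline enum ex_iipt
ex_select_iipt(uint64_t ol_flags)
{
	if (!(ol_flags & EX_TX_CKSUM_OFFLOAD_MASK))
		return EX_IIPT_NONE;
	if (ol_flags & EX_TX_IP_CKSUM)
		return EX_IIPT_IPV4_CSUM;
	if (ol_flags & EX_TX_IPV6)
		return EX_IIPT_IPV6;
	if (ol_flags & EX_TX_IPV4)
		return EX_IIPT_IPV4_NO_CSUM;
	return EX_IIPT_NONE;
}
```

In this sketch an IPv4 packet with no offload bit set yields 00 rather
than 10, which is the behaviour the commit message argues is acceptable.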
  

Comments

Zhang, Helin Feb. 10, 2015, 6:03 a.m. UTC | #1
> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, February 4, 2015 5:25 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin; Liu, Jijiang; Zhang, Helin; olivier.matz@6wind.com
> Subject: [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for
> offloaded packets
> 
> From i40e datasheet:
> 
>   The IP header type and its offload. In case of tunneling, the IIPT
>   relates to the inner IP header. See also EIPT field for the outer
>   (External) IP header offload.
> 
>   00 - non IP packet or packet type is not defined by software
>   01 - IPv6 packet
>   10 - IPv4 packet with no IP checksum offload
>   11 - IPv4 packet with IP checksum offload
> 
> Therefore it is not needed to fill the IIPT field if no offload is requested (we can
> keep the value to 00). For instance, the linux driver code does not set it when
> (skb->ip_summed != CHECKSUM_PARTIAL). We can do the same in the dpdk
> driver.
> 
> The function i40e_txd_enable_checksum() that fills the offload registers can
> only be called for packets requiring an offload.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  lib/librte_pmd_i40e/i40e_rxtx.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
> index 8e9df96..9acdeee 100644
> --- a/lib/librte_pmd_i40e/i40e_rxtx.c
> +++ b/lib/librte_pmd_i40e/i40e_rxtx.c
> @@ -74,6 +74,11 @@
> 
>  #define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
> 
> +#define I40E_TX_CKSUM_OFFLOAD_MASK (		 \
> +		PKT_TX_IP_CKSUM |		 \
> +		PKT_TX_L4_MASK |		 \
> +		PKT_TX_OUTER_IP_CKSUM)
> +
>  #define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
>  	(uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
> 
> @@ -1272,10 +1277,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  		/* Enable checksum offloading */
>  		cd_tunneling_params = 0;
> -		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
> -						l2_len, l3_len, outer_l2_len,
> -						outer_l3_len,
> -						&cd_tunneling_params);
> +		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
likely() should be added here.

> +			i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
> +				l2_len, l3_len, outer_l2_len,
> +				outer_l3_len,
> +				&cd_tunneling_params);
> +		}
As these code changes are in the fast path, a performance regression test is needed.
I would like to see the performance difference with and without this patch set.
Hopefully there is none. If you need any help, just let me know.

Regards,
Helin

> 
>  		if (unlikely(nb_ctx)) {
>  			/* Setup TX context descriptor if required */
> --
> 2.1.4
  
Olivier Matz Feb. 10, 2015, 5:06 p.m. UTC | #2
Hi Helin,

On 02/10/2015 07:03 AM, Zhang, Helin wrote:
>>   		/* Enable checksum offloading */
>>   		cd_tunneling_params = 0;
>> -		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
>> -						l2_len, l3_len, outer_l2_len,
>> -						outer_l3_len,
>> -						&cd_tunneling_params);
>> +		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
> likely should be added.

I would say unlikely() instead. I think the non-offload case should be
the default one. What do you think?
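
For reference, the two hints under discussion can be sketched with the
GCC/Clang builtin __builtin_expect(). The ex_ names and the mask value are
hypothetical, chosen only to avoid clashing with any real macros. Note that
the hint affects how the compiler lays out the branch, not the result of
the test.

```c
#include <stdint.h>

/* Branch hints in the usual GCC/Clang style. The !! collapses any
 * non-zero value to 1, so a 64-bit flag test is not truncated when it
 * is passed to the builtin. */
#define ex_likely(x)   __builtin_expect(!!(x), 1)
#define ex_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical offload mask, for illustration only. */
#define EX_OFFLOAD_MASK 0x0fULL

/* Count the packets of a burst that would take the offload branch.
 * Swapping ex_unlikely() for ex_likely() changes only which side of
 * the branch is treated as the expected one, not the returned count. */
static unsigned int
ex_count_offloaded(const uint64_t *ol_flags, unsigned int n)
{
	unsigned int i, hits = 0;

	for (i = 0; i < n; i++) {
		if (ex_unlikely(ol_flags[i] & EX_OFFLOAD_MASK))
			hits++;
	}
	return hits;
}
```

Which hint is right depends on the expected traffic: unlikely() favours
applications that mostly send packets without offload, likely() favours
those that request offload on most packets.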

>> +			i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
>> +				l2_len, l3_len, outer_l2_len,
>> +				outer_l3_len,
>> +				&cd_tunneling_params);
>> +		}
> As this code changes are in fast path, performance regression test is needed. I would
> like to see the performance difference with or without this patch set. Hopefully nothing
> different. If you need any helps, just let me know.

I'm sorry, I won't have the resources needed to bench this, as I
would have to set up a performance platform with i40e devices.

But I'm pretty sure that the code in the non-offload case would be faster
with this patch, as it avoids many operations in
i40e_txd_enable_checksum().

For the offload case, as we also removed the if (l2_len == 0)
and if (l3_len == 0) checks, I think there are also fewer tests than before
my patch series.

So in my opinion, adding this test does not really justify a
performance check.

Regards,
Olivier
  
Zhang, Helin Feb. 11, 2015, 5:32 a.m. UTC | #3
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, February 11, 2015 1:07 AM
> To: Zhang, Helin; dev@dpdk.org
> Cc: Ananyev, Konstantin; Liu, Jijiang
> Subject: Re: [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for
> offloaded packets
> 
> Hi Helin,
> 
> On 02/10/2015 07:03 AM, Zhang, Helin wrote:
> >>   		/* Enable checksum offloading */
> >>   		cd_tunneling_params = 0;
> >> -		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
> >> -						l2_len, l3_len, outer_l2_len,
> >> -						outer_l3_len,
> >> -						&cd_tunneling_params);
> >> +		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
> > likely should be added.
> 
> I would say unlikely() instead. I think the non-offload case should be the default
> one. What do you think?
> 
> >> +			i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
> >> +				l2_len, l3_len, outer_l2_len,
> >> +				outer_l3_len,
> >> +				&cd_tunneling_params);
> >> +		}
> > As this code changes are in fast path, performance regression test is
> > needed. I would like to see the performance difference with or without
> > this patch set. Hopefully nothing different. If you need any helps, just let me
> > know.
> 
> I'm sorry, I won't have the needed resources to bench this as I would have to
> setup a performance platform with i40e devices.
> 
> But I'm pretty sure that the code in non-offload case would be faster with this
> patch as it will avoid many operations in i40e_txd_enable_checksum().
> 
> For the offload case, as we also removed the if (l2_len == 0) and if (l3_len == 0),
> I think there are also less tests than before my patch series.
> 
> So in my opinion, adding this test does not really justify to check the
> performance.
As 40G is quite sensitive to CPU cycles, we had better avoid any performance drop
when modifying fast-path code. Performance is something we care about a great
deal. In my experience, even minor code changes may have a big
performance impact.
It seems that we may need to help you with the performance measurement.

Regards,
Helin

> 
> Regards,
> Olivier
  
Olivier Matz Feb. 11, 2015, 5:13 p.m. UTC | #4
Hi Helin,

On 02/11/2015 06:32 AM, Zhang, Helin wrote:
>> On 02/10/2015 07:03 AM, Zhang, Helin wrote:
>>>>    		/* Enable checksum offloading */
>>>>    		cd_tunneling_params = 0;
>>>> -		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
>>>> -						l2_len, l3_len, outer_l2_len,
>>>> -						outer_l3_len,
>>>> -						&cd_tunneling_params);
>>>> +		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
>>> likely should be added.
>>
>> I would say unlikely() instead. I think the non-offload case should be the default
>> one. What do you think?

Maybe you missed this comment. Any thoughts?


>>>> +			i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
>>>> +				l2_len, l3_len, outer_l2_len,
>>>> +				outer_l3_len,
>>>> +				&cd_tunneling_params);
>>>> +		}
>>> As this code changes are in fast path, performance regression test is
>>> needed. I would like to see the performance difference with or without
>>> this patch set. Hopefully nothing different. If you need any helps, just let me
>>> know.
>>
>> I'm sorry, I won't have the needed resources to bench this as I would have to
>> setup a performance platform with i40e devices.
>>
>> But I'm pretty sure that the code in non-offload case would be faster with this
>> patch as it will avoid many operations in i40e_txd_enable_checksum().
>>
>> For the offload case, as we also removed the if (l2_len == 0) and if (l3_len == 0),
>> I think there are also less tests than before my patch series.
>>
>> So in my opinion, adding this test does not really justify to check the
>> performance.
> As 40G is quite sensitive on cpu cycles, we'd better to avoid any performance drop
> during our modifying the code for fast path. Performance is what we care about too
> much. Based on my experiences, even minor code changes may result in big
> performance impact.
> It seems that we may need to help you on performance measurement.

Thanks, it would indeed be helpful if you could check for performance regressions.

Regards,
Olivier
  
Zhang, Helin Feb. 13, 2015, 2:25 a.m. UTC | #5
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, February 12, 2015 1:14 AM
> To: Zhang, Helin; dev@dpdk.org
> Cc: Ananyev, Konstantin; Liu, Jijiang
> Subject: Re: [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for
> offloaded packets
> 
> Hi Helin,
> 
> On 02/11/2015 06:32 AM, Zhang, Helin wrote:
> >> On 02/10/2015 07:03 AM, Zhang, Helin wrote:
> >>>>    		/* Enable checksum offloading */
> >>>>    		cd_tunneling_params = 0;
> >>>> -		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
> >>>> -						l2_len, l3_len, outer_l2_len,
> >>>> -						outer_l3_len,
> >>>> -						&cd_tunneling_params);
> >>>> +		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
> >>> likely should be added.
> >>
> >> I would say unlikely() instead. I think the non-offload case should
> >> be the default one. What do you think?
> 
> Maybe you missed this comment. Any thoughts?
Oh, sorry for missing that!
I'd prefer likely(), as hardware offload is preferred when it is available. If you
don't agree, how about keeping it as it is, with neither likely() nor unlikely()?

> 
> 
> >>>> +			i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
> >>>> +				l2_len, l3_len, outer_l2_len,
> >>>> +				outer_l3_len,
> >>>> +				&cd_tunneling_params);
> >>>> +		}
> >>> As this code changes are in fast path, performance regression test
> >>> is needed. I would like to see the performance difference with or
> >>> without this patch set. Hopefully nothing different. If you need any
> >>> helps, just let me know.
> >>
> >> I'm sorry, I won't have the needed resources to bench this as I would
> >> have to setup a performance platform with i40e devices.
> >>
> >> But I'm pretty sure that the code in non-offload case would be faster
> >> with this patch as it will avoid many operations in
> >> i40e_txd_enable_checksum().
> >>
> >> For the offload case, as we also removed the if (l2_len == 0) and if
> >> (l3_len == 0), I think there are also less tests than before my patch series.
> >>
> >> So in my opinion, adding this test does not really justify to check
> >> the performance.
> > As 40G is quite sensitive on cpu cycles, we'd better to avoid any
> > performance drop during our modifying the code for fast path.
> > Performance is what we care about too much. Based on my experiences,
> > even minor code changes may result in big performance impact.
> > It seems that we may need to help you on performance measurement.
> 
> Thanks, indeed it's helpful if you can check performance non-regression.
I have asked our validation team here to help you with that, but it might not be a
high priority. In addition, we will all be on vacation for the coming Chinese New Year.

Regards,
Helin

> 
> Regards,
> Olivier
  
Olivier Matz Feb. 13, 2015, 8:41 a.m. UTC | #6
Hi,

On 02/13/2015 03:25 AM, Zhang, Helin wrote:
>> On 02/11/2015 06:32 AM, Zhang, Helin wrote:
>>>> On 02/10/2015 07:03 AM, Zhang, Helin wrote:
>>>>>>    		/* Enable checksum offloading */
>>>>>>    		cd_tunneling_params = 0;
>>>>>> -		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
>>>>>> -						l2_len, l3_len, outer_l2_len,
>>>>>> -						outer_l3_len,
>>>>>> -						&cd_tunneling_params);
>>>>>> +		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
>>>>> likely should be added.
>>>>
>>>> I would say unlikely() instead. I think the non-offload case should
>>>> be the default one. What do you think?
>>
>> Maybe you missed this comment. Any thoughts?
> Ohh, sorry for the missing!
> I'd prefer to have likely, as hardware offload is preferred if there is. If you
> don't think so, how about to keep nothing (no likely/unlikely) as it is.

OK, I'll use likely() as you initially suggested.

>>> As 40G is quite sensitive on cpu cycles, we'd better to avoid any
>>> performance drop during our modifying the code for fast path.
>>> Performance is what we care about too much. Based on my experiences,
>>> even minor code changes may result in big performance impact.
>>> It seems that we may need to help you on performance measurement.
>>
>> Thanks, indeed it's helpful if you can check performance non-regression.
> I have asked our validation guys here to help you on that, but might not in
> high priority. In addition, we all will take vacation for the coming Chinese new year.

OK, it's noted


Thanks,
Olivier
  

Patch

diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 8e9df96..9acdeee 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -74,6 +74,11 @@ 
 
 #define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
 
+#define I40E_TX_CKSUM_OFFLOAD_MASK (		 \
+		PKT_TX_IP_CKSUM |		 \
+		PKT_TX_L4_MASK |		 \
+		PKT_TX_OUTER_IP_CKSUM)
+
 #define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \
 	(uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM)
 
@@ -1272,10 +1277,12 @@  i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		cd_tunneling_params = 0;
-		i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
-						l2_len, l3_len, outer_l2_len,
-						outer_l3_len,
-						&cd_tunneling_params);
+		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) {
+			i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset,
+				l2_len, l3_len, outer_l2_len,
+				outer_l3_len,
+				&cd_tunneling_params);
+		}
 
 		if (unlikely(nb_ctx)) {
 			/* Setup TX context descriptor if required */