app/test-pmd: fix L4 checksum with padding data

Message ID 20230804082849.533059-1-kaiwenx.deng@intel.com (mailing list archive)
State New
Delegated to: Ferruh Yigit
Series app/test-pmd: fix L4 checksum with padding data

Checks

Context                          Check    Description
ci/checkpatch                    warning  coding style issues
ci/loongarch-compilation         success  Compilation OK
ci/loongarch-unit-testing        success  Unit Testing PASS
ci/Intel-compilation             success  Compilation OK
ci/intel-Testing                 success  Testing PASS
ci/intel-Functional              success  Functional PASS
ci/github-robot: build           success  github build: passed
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-broadcom-Functional       success  Functional Testing PASS
ci/iol-broadcom-Performance      success  Performance Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/iol-aarch-unit-testing        success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-unit-testing              success  Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/iol-testing                   success  Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-mellanox-Performance      success  Performance Testing PASS

Commit Message

Kaiwen Deng Aug. 4, 2023, 8:28 a.m. UTC
IEEE 802 packets may have a minimum size limit, so the data field
is padded when necessary. In some cases, the padding data is not
zero. Testpmd does not trim these IP packets to the true length of
the frame, so the padding is included when the TCP or UDP checksum
is calculated and the result is wrong.

This commit fixes the issue by trimming IP packets to the true
length of the frame in testpmd.

Fixes: 03d17e4d0179 ("app/testpmd: do not change IP addrs in checksum engine")
Cc: stable@dpdk.org

Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
---
 app/test-pmd/csumonly.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
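
To illustrate the problem described in the commit message, here is a minimal standalone sketch (not testpmd code) of a 16-bit one's complement checksum in the style of RFC 1071. The helper name `sketch_inet_cksum` is an assumption made for this example, and the real UDP/TCP checksum also covers a pseudo-header, which is left out here for brevity. The point is only that trailing bytes change the result when they are not zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 16-bit one's complement checksum over len bytes,
 * reading the buffer as big-endian 16-bit words (RFC 1071 style).
 */
static uint16_t
sketch_inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* Sum all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];

	/* An odd trailing byte is treated as the high byte of a last word. */
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Taking the four bytes 12 34 56 78 as the real data, the checksum over those four bytes is 0x9753. Appending four zero bytes and summing all eight leaves the value unchanged, while appending de ad be ef and summing all eight gives 0xf9b5, which is the kind of mismatch the commit message refers to.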
  

Comments

Ferruh Yigit Nov. 2, 2023, 7:20 p.m. UTC | #1
On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> IEEE 802 packets may have a minimum size limit. The data fields
> should be padded when necessary. In some cases, the padding data
> is not zero. Testpmd does not trim these IP packets to the true
> length of the frame, so errors will occur when calculating TCP
> or UDP checksum.
> 

Hi Kaiwen,

I am trying to understand the problem. Which test case has the
checksum error?

Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix
the mbuf during forwarding, can we fix it where the packet is generated?

> This commit fixes this issue by trimming IP packets to the true
> length of the frame in testpmd.
> 
> Fixes: 03d17e4d0179 ("app/testpmd: do not change IP addrs in checksum engine")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
> ---
>  app/test-pmd/csumonly.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 7af635e3f7..58b72b714a 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -853,12 +853,14 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	uint16_t nb_rx;
>  	uint16_t nb_prep;
>  	uint16_t i;
> +	uint16_t pad_len;
>  	uint64_t rx_ol_flags, tx_ol_flags;
>  	uint64_t tx_offloads;
>  	uint32_t rx_bad_ip_csum;
>  	uint32_t rx_bad_l4_csum;
>  	uint32_t rx_bad_outer_l4_csum;
>  	uint32_t rx_bad_outer_ip_csum;
> +	uint32_t l3_off;
>  	struct testpmd_offload_info info;
>  
>  	/* receive a burst of packet */
> @@ -980,6 +982,36 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			l3_hdr = (char *)l3_hdr + info.outer_l3_len + info.l2_len;
>  		}
>  
> +		if (info.is_tunnel) {
> +			l3_off = info.outer_l2_len +
> +					info.outer_l3_len +
> +					info.l2_len;
> +		} else {
> +			l3_off = info.l2_len;
> +		}
> +		switch (info.ethertype) {
> +		case _htons(RTE_ETHER_TYPE_IPV4):
> +			pad_len = rte_pktmbuf_data_len(m) -
> +					(l3_off +
> +					rte_be_to_cpu_16(
> +					((struct rte_ipv4_hdr *)l3_hdr)->total_length));
> +			break;
> +		case _htons(RTE_ETHER_TYPE_IPV6):
> +			pad_len = rte_pktmbuf_data_len(m) -
> +					(l3_off +
> +					rte_be_to_cpu_16(
> +					((struct rte_ipv6_hdr *)l3_hdr)->payload_len));
> +			break;
> +		default:
> +			pad_len = 0;
> +			break;
> +		}
> +
> +		if (pad_len) {
> +			rte_pktmbuf_data_len(m) = rte_pktmbuf_data_len(m) - pad_len;
> +			rte_pktmbuf_pkt_len(m) = rte_pktmbuf_data_len(m);
> +		}
> +
>  		/* step 2: depending on user command line configuration,
>  		 * recompute checksum either in software or flag the
>  		 * mbuf to offload the calculation to the NIC. If TSO
  
Kaiwen Deng Nov. 3, 2023, 2:49 a.m. UTC | #2
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, November 3, 2023 3:20 AM
> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> 
> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> > IEEE 802 packets may have a minimum size limit. The data fields should
> > be padded when necessary. In some cases, the padding data is not zero.
> > Testpmd does not trim these IP packets to the true length of the
> > frame, so errors will occur when calculating TCP or UDP checksum.
> >
> 
> Hi Kaiwen,
> 
> I am trying to understand the problem, what is the testcase that has checksum
> error?
> 
> Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix the
> mbuf during forwarding, can we fix where packet generated?
> 
Hi Ferruh,

In effect, the packet is padded by the switch.
IEEE 802 packets may have a minimum size limit. The data field should
be padded by the switch when necessary. In some switches, the padding data is not zero.

Csumonly doesn't trim these packets to the true length of the frame.
In csumonly, the received mbuf data_len is the true length of the packet plus the length of the padding data.
Therefore, the padding data is included in the checksum calculation.
When the padding data is not zero, the checksum is wrong.
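
The relationship described above (received length = true packet length + padding) can be sketched as a small standalone helper. This is an illustration under stated assumptions, not the code from the patch: the `sketch_` names are hypothetical, the length field is assumed to be already converted to host byte order, and the frame is assumed to sit in a single buffer. Two details are handled explicitly: the IPv6 payload length field does not count the 40-byte fixed IPv6 header, and a header that claims more bytes than were received must not make the unsigned subtraction wrap.

```c
#include <stdint.h>

#define SKETCH_ETHERTYPE_IPV4 0x0800
#define SKETCH_ETHERTYPE_IPV6 0x86DD
#define SKETCH_IPV6_HDR_LEN   40 /* fixed IPv6 header, not part of payload_len */

/*
 * Hypothetical helper: number of trailing pad bytes in a received frame.
 *
 * frame_len    - bytes received (what data_len would report)
 * l3_off       - offset of the IP header from the start of the frame
 * ethertype    - host-order EtherType of the L3 protocol
 * ip_len_field - host-order IPv4 total_length or IPv6 payload_len
 */
static uint32_t
sketch_l3_pad_len(uint32_t frame_len, uint32_t l3_off,
		  uint16_t ethertype, uint16_t ip_len_field)
{
	uint32_t l3_total;

	switch (ethertype) {
	case SKETCH_ETHERTYPE_IPV4:
		/* total_length covers the IPv4 header and everything after it. */
		l3_total = ip_len_field;
		break;
	case SKETCH_ETHERTYPE_IPV6:
		/* payload_len covers only what follows the fixed header. */
		l3_total = (uint32_t)ip_len_field + SKETCH_IPV6_HDR_LEN;
		break;
	default:
		return 0; /* unknown L3 protocol: leave the frame alone */
	}

	/* No padding, or the header claims more than was received. */
	if (l3_off + l3_total >= frame_len)
		return 0;

	return frame_len - (l3_off + l3_total);
}
```

For example, a 60-byte untagged frame (l3_off = 14) carrying an IPv4 packet with total_length 28 yields 18 pad bytes. In DPDK itself, removing that many bytes from the tail of a single-segment mbuf could presumably be done with rte_pktmbuf_trim() rather than by assigning to the length fields directly.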

  
Ferruh Yigit Nov. 3, 2023, 4:03 a.m. UTC | #3
On 11/3/2023 2:49 AM, Deng, KaiwenX wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, November 3, 2023 3:20 AM
>> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
>> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
>> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
>> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
>> <pablo.de.lara.guarch@intel.com>
>> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
>>
>> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
>>> IEEE 802 packets may have a minimum size limit. The data fields should
>>> be padded when necessary. In some cases, the padding data is not zero.
>>> Testpmd does not trim these IP packets to the true length of the
>>> frame, so errors will occur when calculating TCP or UDP checksum.
>>>
>>
>> Hi Kaiwen,
>>
>> I am trying to understand the problem, what is the testcase that has checksum
>> error?
>>
>> Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix the
>> mbuf during forwarding, can we fix where packet generated?
>>
> Hi Ferruh,
> 
> In effect, the packet is padded by the switch. 
> IEEE 802 packets may have a minimum size limit. The data fields should 
> be padded by switch when necessary. In some switches, the padding data is not zero. 
> 
> Csumonly doesn't trim these packets to the true length of the frame. 
> In csumonly, the received mbuf data_len is the true length of the packet plus the padding data len.
> Therefore, padding data is included in the checksum calculation.
> When the padding data is not zero, the checksum is wrong.
> 

Thanks for the clarification.

Even if some non-zero padding is added, the csum will still be
calculated successfully, but I assume that in this case the csum differs
from the expected csum and the test fails?

In that case, why not fix the generated packets and make them comply
with the minimum size requirement? What is generating the packets?


  
Kaiwen Deng Nov. 14, 2023, 2:19 a.m. UTC | #4
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, November 3, 2023 12:03 PM
> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> 
> On 11/3/2023 2:49 AM, Deng, KaiwenX wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, November 3, 2023 3:20 AM
> >> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> >> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou,
> >> YidingX <yidingx.zhou@intel.com>; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> >> Matz, Olivier <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> >> <pablo.de.lara.guarch@intel.com>
> >> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> >>
> >> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> >>> IEEE 802 packets may have a minimum size limit. The data fields
> >>> should be padded when necessary. In some cases, the padding data is not
> zero.
> >>> Testpmd does not trim these IP packets to the true length of the
> >>> frame, so errors will occur when calculating TCP or UDP checksum.
> >>>
> >>
> >> Hi Kaiwen,
> >>
> >> I am trying to understand the problem, what is the testcase that has
> >> checksum error?
> >>
> >> Are the received mbuf data_len & pkt_len wrong? Instead of trying to
> >> fix the mbuf during forwarding, can we fix where packet generated?
> >>
> > Hi Ferruh,
> >
> > In effect, the packet is padded by the switch.
> > IEEE 802 packets may have a minimum size limit. The data fields should
> > be padded by switch when necessary. In some switches, the padding data is
> not zero.
> >
> > Csumonly doesn't trim these packets to the true length of the frame.
> > In csumonly, the received mbuf data_len is the true length of the packet plus
> the padding data len.
> > Therefore, padding data is included in the checksum calculation.
> > When the padding data is not zero, the checksum is wrong.
> >
Hi,
Sorry for the late reply.
The minimum frame length specified by IEEE 802.3 is 64 bytes. In practice,
many packets shorter than 64 bytes are padded as they pass through the switch.

We found this issue because some customers reported that the checksum was not
calculated correctly for their packets. They send some packets shorter than
64 bytes, but our app doesn't strip the padding data from such packets.
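
As a worked example of the 64-byte minimum mentioned above: the 64 bytes include the 14-byte Ethernet header and the 4-byte frame check sequence, which leaves a minimum of 46 bytes for the data field of an untagged frame. The following sketch computes how many pad bytes that implies for a given IP packet length. The `SKETCH_` constants and the function name are assumptions made for this illustration, and VLAN tags are not considered.

```c
#include <stdint.h>

#define SKETCH_ETHER_HDR_LEN 14 /* dst MAC + src MAC + EtherType, no VLAN tag */
#define SKETCH_ETHER_CRC_LEN 4  /* frame check sequence */
#define SKETCH_ETHER_MIN_LEN 64 /* minimum frame length, FCS included */

/*
 * Hypothetical helper: pad bytes needed after an L3 packet of l3_len bytes
 * so that an untagged Ethernet frame reaches the minimum frame length.
 */
static uint32_t
sketch_min_frame_pad(uint32_t l3_len)
{
	/* 64 - 14 - 4 = 46 bytes is the smallest allowed data field. */
	const uint32_t min_payload = SKETCH_ETHER_MIN_LEN -
				     SKETCH_ETHER_HDR_LEN -
				     SKETCH_ETHER_CRC_LEN;

	return l3_len < min_payload ? min_payload - l3_len : 0;
}
```

A 28-byte IPv4/UDP packet (20-byte IP header plus 8-byte UDP header, no payload) would therefore be followed by 18 pad bytes. If the NIC strips the FCS, the application would presumably see 60 bytes rather than 64.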
> 
> Thanks for clarification.
> 
> Even some non-zero padding added, it will calculate the csum successfully, but
> I assume in this case csum becomes different than expected csum and test
> fails?
> 
> In this case why not fix the generated packets, and make them compatible to
> minimum size requirement? What is generating packets?
> 
> 
  
Ferruh Yigit Nov. 14, 2023, 7:09 p.m. UTC | #5
On 11/14/2023 2:19 AM, Deng, KaiwenX wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, November 3, 2023 12:03 PM
>> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
>> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
>> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
>> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
>> <pablo.de.lara.guarch@intel.com>
>> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
>>
>> On 11/3/2023 2:49 AM, Deng, KaiwenX wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Friday, November 3, 2023 3:20 AM
>>>> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
>>>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou,
>>>> YidingX <yidingx.zhou@intel.com>; Singh, Aman Deep
>>>> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
>>>> Matz, Olivier <olivier.matz@6wind.com>; De Lara Guarch, Pablo
>>>> <pablo.de.lara.guarch@intel.com>
>>>> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
>>>>
>>>> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
>>>>> IEEE 802 packets may have a minimum size limit. The data fields
>>>>> should be padded when necessary. In some cases, the padding data is not
>> zero.
>>>>> Testpmd does not trim these IP packets to the true length of the
>>>>> frame, so errors will occur when calculating TCP or UDP checksum.
>>>>>
>>>>
>>>> Hi Kaiwen,
>>>>
>>>> I am trying to understand the problem, what is the testcase that has
>>>> checksum error?
>>>>
>>>> Are the received mbuf data_len & pkt_len wrong? Instead of trying to
>>>> fix the mbuf during forwarding, can we fix where packet generated?
>>>>
>>> Hi Ferruh,
>>>
>>> In effect, the packet is padded by the switch.
>>> IEEE 802 packets may have a minimum size limit. The data fields should
>>> be padded by switch when necessary. In some switches, the padding data is
>> not zero.
>>>
>>> Csumonly doesn't trim these packets to the true length of the frame.
>>> In csumonly, the received mbuf data_len is the true length of the packet plus
>> the padding data len.
>>> Therefore, padding data is included in the checksum calculation.
>>> When the padding data is not zero, the checksum is wrong.
>>>
> Hi,
> Sorry for late reply.
> The minimum frame length specified by IEEE 802.3 is 64 bytes. In practice, 
> there are many packets less than 64 bytes that are padding through the switch. 
> 
> We found this issue because some customers found that their packets could not 
> calculate checksum correctly, they would send some packets less than 64 bytes, 
> but our app didn't strip the padding data for such packets.
>

OK, so the switch in between is padding the packets to make them
compliant with the standard.

From the DPDK application's perspective the received packet is 64 bytes, right?
The problem happens because whatever verifies the checksum gets a
different checksum than expected, but this is because the packet is
modified in between by the networking setup.
I am not sure about trying to fix this in testpmd.

Why not send packets that are >= 64 bytes from the sender side, or configure
the switch to not add padding, or maybe use a different switch?


>>
>> Thanks for clarification.
>>
>> Even some non-zero padding added, it will calculate the csum successfully, but
>> I assume in this case csum becomes different than expected csum and test
>> fails?
>>
>> In this case why not fix the generated packets, and make them compatible to
>> minimum size requirement? What is generating packets?
>>
>>
  
Kaiwen Deng Nov. 16, 2023, 7:02 a.m. UTC | #6
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, November 15, 2023 3:10 AM
> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> 
> On 11/14/2023 2:19 AM, Deng, KaiwenX wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, November 3, 2023 12:03 PM
> >> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> >> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou,
> >> YidingX <yidingx.zhou@intel.com>; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> >> Matz, Olivier <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> >> <pablo.de.lara.guarch@intel.com>
> >> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> >>
> >> On 11/3/2023 2:49 AM, Deng, KaiwenX wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >>>> Sent: Friday, November 3, 2023 3:20 AM
> >>>> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> >>>> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou,
> >>>> YidingX <yidingx.zhou@intel.com>; Singh, Aman Deep
> >>>> <aman.deep.singh@intel.com>; Zhang, Yuying
> >>>> <yuying.zhang@intel.com>; Matz, Olivier <olivier.matz@6wind.com>;
> >>>> De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >>>> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding
> >>>> data
> >>>>
> >>>> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> >>>>> IEEE 802 packets may have a minimum size limit. The data fields
> >>>>> should be padded when necessary. In some cases, the padding data
> >>>>> is not
> >> zero.
> >>>>> Testpmd does not trim these IP packets to the true length of the
> >>>>> frame, so errors will occur when calculating TCP or UDP checksum.
> >>>>>
> >>>>
> >>>> Hi Kaiwen,
> >>>>
> >>>> I am trying to understand the problem, what is the testcase that
> >>>> has checksum error?
> >>>>
> >>>> Are the received mbuf data_len & pkt_len wrong? Instead of trying
> >>>> to fix the mbuf during forwarding, can we fix where packet generated?
> >>>>
> >>> Hi Ferruh,
> >>>
> >>> In effect, the packet is padded by the switch.
> >>> IEEE 802 packets may have a minimum size limit. The data fields
> >>> should be padded by switch when necessary. In some switches, the
> >>> padding data is
> >> not zero.
> >>>
> >>> Csumonly doesn't trim these packets to the true length of the frame.
> >>> In csumonly, the received mbuf data_len is the true length of the
> >>> packet plus
> >> the padding data len.
> >>> Therefore, padding data is included in the checksum calculation.
> >>> When the padding data is not zero, the checksum is wrong.
> >>>
> > Hi,
> > Sorry for late reply.
> > The minimum frame length specified by IEEE 802.3 is 64 bytes. In
> > practice, there are many packets less than 64 bytes that are padding through
> the switch.
> >
> > We found this issue because some customers found that their packets
> > could not calculate checksum correctly, they would send some packets
> > less than 64 bytes, but our app didn't strip the padding data for such
> packets.
> >
> 
> OK, so switch in between is padding packets to make them compatible with
> standard.
> 
> From DPDK application perspective received packet is 64 bytes, right?
> Problem happens because where verifies the checksum gets different
> checksum that expected, but this is because packet is modified in between by
> the networking setup.
> I am not sure about trying to fix this in the testpmd.
> 
> Why not send packets that are >= 64 bytes from sender side, or configure
> switch to not add padding or maybe use different switch?
> 
If we send a 40-byte UDP packet, it will be padded to 64 bytes as it passes through
the switch, whereas the Linux kernel stack strips the padding data when it receives
the packet.
I think DPDK applications should perhaps be aligned with the Linux kernel.
Otherwise the csumonly application only supports packets of at least 64 bytes.
> 
> >>
> >> Thanks for clarification.
> >>
> >> Even some non-zero padding added, it will calculate the csum
> >> successfully, but I assume in this case csum becomes different than
> >> expected csum and test fails?
> >>
> >> In this case why not fix the generated packets, and make them
> >> compatible to minimum size requirement? What is generating packets?
> >>
> >>
  
Stephen Hemminger Nov. 16, 2023, 10:58 p.m. UTC | #7
On Thu, 2 Nov 2023 19:20:07 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> > IEEE 802 packets may have a minimum size limit. The data fields
> > should be padded when necessary. In some cases, the padding data
> > is not zero. Testpmd does not trim these IP packets to the true
> > length of the frame, so errors will occur when calculating TCP
> > or UDP checksum.
> >   
> 
> Hi Kaiwen,
> 
> I am trying to understand the problem, what is the testcase that has
> checksum error?
> 
> Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix
> the mbuf during forwarding, can we fix where packet generated?

The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len,
which may be larger than the actual data. The real issue is there and
in the rte_ip.h checksum code. The correct fix would be to use l3_len instead.

It also looks like test-pmd is not validating the IP header.
Both parse_ipv4() and parse_ipv6() should check whether the packet was
truncated. The same applies to both the UDP and TCP lengths.
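
The kind of truncation check meant here could look roughly like the
following. This is only a sketch under assumptions: it works on a raw byte
buffer rather than on an mbuf, and ipv4_len_valid() and read_be16() are
hypothetical helper names, not existing testpmd or DPDK functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian field without relying on host byte order. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: check that the IPv4 header starting at 'ip' is
 * consistent with 'avail' bytes of data available from that point on.
 */
static bool ipv4_len_valid(const uint8_t *ip, size_t avail)
{
	size_t hdr_len, total_len;

	assert(ip != NULL);
	if (avail < 20)                        /* minimum IPv4 header size */
		return false;
	if ((ip[0] >> 4) != 4)                 /* version field */
		return false;
	hdr_len = (size_t)(ip[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
	total_len = read_be16(ip + 2);         /* total_length field */
	if (hdr_len < 20 || hdr_len > avail)
		return false;
	/* total_length must cover the header and fit in the received data */
	return total_len >= hdr_len && total_len <= avail;
}
```

A padded frame passes this check (total_length is smaller than the data
received), while a truncated one (total_length larger than the data
received) is rejected.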
  
Ferruh Yigit Nov. 17, 2023, 12:50 a.m. UTC | #8
On 11/16/2023 10:58 PM, Stephen Hemminger wrote:
> On Thu, 2 Nov 2023 19:20:07 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
>>> IEEE 802 packets may have a minimum size limit. The data fields
>>> should be padded when necessary. In some cases, the padding data
>>> is not zero. Testpmd does not trim these IP packets to the true
>>> length of the frame, so errors will occur when calculating TCP
>>> or UDP checksum.
>>>   
>>
>> Hi Kaiwen,
>>
>> I am trying to understand the problem, what is the testcase that has
>> checksum error?
>>
>> Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix
>> the mbuf during forwarding, can we fix where packet generated?
> 
> The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len
> which maybe larger than the actual data. The real issue is there and
> in rte_ip.h checksum code. The correct fix would be to use l3_len instead.
> 

I see, you are right.

In 'rte_ipv4_udptcp_cksum_mbuf()', "mbuf->pkt_len - l4_off" is used as
the payload length. This includes the padding, and if the padding is not
zero it ends up producing a wrong checksum.
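
To make the effect concrete, here is a minimal sketch of a one's-complement
checksum in the style of RFC 1071. It is not the rte_ip.h implementation:
inet_cksum() is a hypothetical name, and the pseudo-header that a real
TCP/UDP checksum also covers is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over 'len' bytes, returned complemented.
 * The data is read as big-endian 16-bit words; an odd trailing byte is
 * treated as the high byte of a final word. Intended for packet-sized
 * buffers only.
 */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)(buf[len - 1] << 8);
	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

With a 4-byte payload 12 34 56 78 followed by 2 padding bytes de ad,
summing 6 bytes instead of 4 gives a different result. If the padding
bytes are zero the two results are the same, which is why the problem only
shows up with non-zero padding.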


I agree that using 'l3_len' instead is the correct fix.

But this requires an ABI/API change. Also, do we have any reason to keep
the padding? Discarding it, as this patch does, is a simpler alternative.


Another alternative is to zero the padding bytes. I guess the standard
doesn't require them to be zero, but we can do this to remove their impact
on the checksum calculation.


@Kaiwen, were you able to test this with HW offload? What is the behavior
of the HW, does it remove the padding bytes?


> It also looks like test-pmd is not validating the IP header.
> Both parse_ipv4() and parse_ipv6() should check if packet was truncated.
> Same for both UDP and TCP lengths.
>
  
Ferruh Yigit Nov. 17, 2023, 1:13 a.m. UTC | #9
On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> IEEE 802 packets may have a minimum size limit. The data fields
> should be padded when necessary. In some cases, the padding data
> is not zero. Testpmd does not trim these IP packets to the true
> length of the frame, so errors will occur when calculating TCP
> or UDP checksum.
> 
> This commit fixes this issue by triming IP packets to the true
> length of the frame in testpmd.
> 
> Fixes: 03d17e4d0179 ("app/testpmd: do not change IP addrs in checksum engine")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
> ---
>  app/test-pmd/csumonly.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 7af635e3f7..58b72b714a 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -853,12 +853,14 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  	uint16_t nb_rx;
>  	uint16_t nb_prep;
>  	uint16_t i;
> +	uint16_t pad_len;
>  	uint64_t rx_ol_flags, tx_ol_flags;
>  	uint64_t tx_offloads;
>  	uint32_t rx_bad_ip_csum;
>  	uint32_t rx_bad_l4_csum;
>  	uint32_t rx_bad_outer_l4_csum;
>  	uint32_t rx_bad_outer_ip_csum;
> +	uint32_t l3_off;
>  	struct testpmd_offload_info info;
>  
>  	/* receive a burst of packet */
> @@ -980,6 +982,36 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  			l3_hdr = (char *)l3_hdr + info.outer_l3_len + info.l2_len;
>  		}
>  
> +		if (info.is_tunnel) {
> +			l3_off = info.outer_l2_len +
> +					info.outer_l3_len +
> +					info.l2_len;
>

I don't know much about the tunnel code, but is the above calculation
correct for all tunnel protocols? For example, when the inner packet is
carried over UDP, should the outer l4_len also be added?


> +		} else {
> +			l3_off = info.l2_len;
> +		}
> +		switch (info.ethertype) {
> +		case _htons(RTE_ETHER_TYPE_IPV4):
> +			pad_len = rte_pktmbuf_data_len(m) -
> +					(l3_off +
> +					rte_be_to_cpu_16(
> +					((struct rte_ipv4_hdr *)l3_hdr)->total_length));
> +			break;
> +		case _htons(RTE_ETHER_TYPE_IPV6):
> +			pad_len = rte_pktmbuf_data_len(m) -
> +					(l3_off +
> +					rte_be_to_cpu_16(
> +					((struct rte_ipv6_hdr *)l3_hdr)->payload_len));
>

As far as I remember, the IPv6 payload_len doesn't include the header
length, so the pad_len calculation should differ from the IPv4 one,
something like "l4_off + l3_hdr->payload_len". Did you verify this code
with IPv6?
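
The difference between the two cases can be sketched as below. The helper
names ipv4_pad_len() and ipv6_pad_len() are hypothetical, and the length
fields are assumed to be already converted to host byte order. The IPv6
variant adds the fixed 40-byte header rather than using l4_off, because
payload_len also counts any extension headers. Both helpers return 0
instead of underflowing when the header claims more data than was received.

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_FIXED_HDR_LEN 40u  /* size of the fixed IPv6 header in bytes */

/*
 * Trailing padding for an IPv4 packet. total_length already includes
 * the IPv4 header, so the packet ends at l3_off + total_length.
 */
static uint32_t ipv4_pad_len(uint32_t data_len, uint32_t l3_off,
			     uint16_t total_length)
{
	uint32_t end = l3_off + total_length;

	return data_len > end ? data_len - end : 0;
}

/*
 * Trailing padding for an IPv6 packet. payload_len excludes the fixed
 * header, so that header has to be added explicitly.
 */
static uint32_t ipv6_pad_len(uint32_t data_len, uint32_t l3_off,
			     uint16_t payload_len)
{
	uint32_t end = l3_off + IPV6_FIXED_HDR_LEN + payload_len;

	return data_len > end ? data_len - end : 0;
}
```

For example, a 60-byte frame with a 14-byte Ethernet header and an IPv4
total_length of 28 has 18 bytes of padding, while a 70-byte frame carrying
IPv6 with payload_len 8 has 8.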


> +			break;
> +		default:
> +			pad_len = 0;
> +			break;
> +		}
> +
> +		if (pad_len) {
> +			rte_pktmbuf_data_len(m) = rte_pktmbuf_data_len(m) - pad_len;
> +			rte_pktmbuf_pkt_len(m) = rte_pktmbuf_data_len(m);
>

Can't the received mbuf be a multi-segment mbuf? As far as I can see, the
checksum calculation API takes this possibility into account. If so, we
need to check for that possibility here before updating 'pkt_len'.
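
To illustrate the concern: with several segments, the cut can fall in any
segment, so setting pkt_len equal to the first segment's data_len is not
enough. The sketch below uses a hypothetical 'struct seg' chain as a
stand-in for a chained mbuf; it is not struct rte_mbuf and does not call
any DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chained packet buffer (not struct rte_mbuf). */
struct seg {
	uint16_t data_len;  /* bytes held by this segment */
	struct seg *next;   /* next segment, or NULL for the last one */
};

/* Total number of bytes across the whole chain. */
static uint32_t chain_len(const struct seg *s)
{
	uint32_t total = 0;

	for (; s != NULL; s = s->next)
		total += s->data_len;
	return total;
}

/*
 * Shrink the chain so that it holds exactly 'new_len' bytes. The segment
 * where the cut falls is shortened and all later segments are emptied.
 * Emptied segments stay linked here; real code would have to free them.
 * Returns 0 on success, -1 if new_len exceeds the current length.
 */
static int chain_truncate(struct seg *head, uint32_t new_len)
{
	struct seg *s;
	uint32_t remaining = new_len;

	assert(head != NULL);
	if (new_len > chain_len(head))
		return -1;
	for (s = head; s != NULL; s = s->next) {
		if (s->data_len > remaining)
			s->data_len = (uint16_t)remaining;
		remaining -= s->data_len;
	}
	return 0;
}
```

With two segments of 40 and 20 bytes, truncating to 46 bytes leaves the
first segment untouched and shortens the second to 6 bytes.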
  
Stephen Hemminger Nov. 17, 2023, 3:28 a.m. UTC | #10
On Fri, 17 Nov 2023 00:50:16 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> >> Hi Kaiwen,
> >>
> >> I am trying to understand the problem, what is the testcase that has
> >> checksum error?
> >>
> >> Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix
> >> the mbuf during forwarding, can we fix where packet generated?  
> > 
> > The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len
> > which maybe larger than the actual data. The real issue is there and
> > in rte_ip.h checksum code. The correct fix would be to use l3_len instead.
> >   
> 
> I see, you are right.
> 
> In 'rte_ipv4_udptcp_cksum_mbuf()',
> as payload length "mbuf->pkt_len - l4_off" is used, which includes
> padding and if padding is not zero it will end up producing wrong checksum.
> 
> 
> I agree using 'l3_len' instead is correct fix.
> 
> But this requires ABI/API change,
> plus do we have any reason to keep the padding, discarding it as this
> patch does is also simpler alternative.


Possibly versioning the API to change the args would work as a fix.
  
Ferruh Yigit Nov. 17, 2023, 9:29 a.m. UTC | #11
On 11/17/2023 3:28 AM, Stephen Hemminger wrote:
> On Fri, 17 Nov 2023 00:50:16 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>>>> Hi Kaiwen,
>>>>
>>>> I am trying to understand the problem, what is the testcase that has
>>>> checksum error?
>>>>
>>>> Are the received mbuf data_len & pkt_len wrong? Instead of trying to fix
>>>> the mbuf during forwarding, can we fix where packet generated?  
>>>
>>> The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len
>>> which maybe larger than the actual data. The real issue is there and
>>> in rte_ip.h checksum code. The correct fix would be to use l3_len instead.
>>>   
>>
>> I see, you are right.
>>
>> In 'rte_ipv4_udptcp_cksum_mbuf()',
>> as payload length "mbuf->pkt_len - l4_off" is used, which includes
>> padding and if padding is not zero it will end up producing wrong checksum.
>>
>>
>> I agree using 'l3_len' instead is correct fix.
>>
>> But this requires ABI/API change,
>> plus do we have any reason to keep the padding, discarding it as this
>> patch does is also simpler alternative.
> 
> 
> Possibly an API version to change the args would work to fix.
>

rte_ipv4_udptcp_cksum_mbuf() and rte_ipv6_udptcp_cksum_mbuf() are inline
functions, so unfortunately we can't version them.

But those functions already get the IP header as a parameter. Can't we use
the IP header to get the payload size? If so, this can be fixed without
changing the API.
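
Roughly, the length to checksum would come from the header fields instead
of from pkt_len. The following is a sketch only, operating on raw header
bytes: ipv4_l4_len(), ipv6_l4_len_noext() and load_be16() are hypothetical
names, and the IPv6 variant assumes there are no extension headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian field without relying on host byte order. */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * L4 length (L4 header plus payload) taken from an IPv4 header:
 * total_length minus the header length given by IHL.
 * Returns -1 if the two fields are inconsistent.
 */
static int32_t ipv4_l4_len(const uint8_t *ip)
{
	uint32_t hdr_len, total_len;

	assert(ip != NULL);
	hdr_len = (uint32_t)(ip[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
	total_len = load_be16(ip + 2);           /* total_length field */
	if (hdr_len < 20 || total_len < hdr_len)
		return -1;
	return (int32_t)(total_len - hdr_len);
}

/*
 * L4 length taken from an IPv6 header, assuming no extension headers:
 * payload_len (bytes 4 and 5) already excludes the fixed header.
 */
static int32_t ipv6_l4_len_noext(const uint8_t *ip6)
{
	assert(ip6 != NULL);
	return load_be16(ip6 + 4);
}
```

Neither value depends on how many bytes the mbuf holds, so trailing
Ethernet padding does not enter the calculation.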
  
Morten Brørup Nov. 17, 2023, 12:11 p.m. UTC | #12
> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> Sent: Friday, 17 November 2023 10.30
> 
> On 11/17/2023 3:28 AM, Stephen Hemminger wrote:
> > On Fri, 17 Nov 2023 00:50:16 +0000
> > Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> >>>> Hi Kaiwen,
> >>>>
> >>>> I am trying to understand the problem, what is the testcase that
> has
> >>>> checksum error?
> >>>>
> >>>> Are the received mbuf data_len & pkt_len wrong? Instead of trying
> to fix
> >>>> the mbuf during forwarding, can we fix where packet generated?
> >>>
> >>> The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len
> >>> which maybe larger than the actual data. The real issue is there
> and
> >>> in rte_ip.h checksum code. The correct fix would be to use l3_len
> instead.
> >>>
> >>
> >> I see, you are right.
> >>
> >> In 'rte_ipv4_udptcp_cksum_mbuf()',
> >> as payload length "mbuf->pkt_len - l4_off" is used, which includes
> >> padding and if padding is not zero it will end up producing wrong
> checksum.
> >>
> >>
> >> I agree using 'l3_len' instead is correct fix.
> >>
> >> But this requires ABI/API change,
> >> plus do we have any reason to keep the padding, discarding it as
> this
> >> patch does is also simpler alternative.
> >
> >
> > Possibly an API version to change the args would work to fix.
> >
> 
> rte_ipv4_udptcp_cksum_mbuf() and rte_ipv6_udptcp_cksum_mbuf() are
> inline
> functions, unfortunately we can't version them.
> 
> But those functions already gets IP header as parameter, can't we use
> IP
> header to get the payload size? If so this can be fixed without
> updating
> API.

If rte_ipv4_udptcp_cksum_mbuf() - or any other function in the DPDK Network Headers library - includes Ethernet padding (which may be non-zero) when calculating the TCP/UDP checksum of an IPv4 packet, it is a bug, and must be fixed there.

Our test cases should use random padding to catch bugs like this.

And I just realized that Ethernet padding may be added to any IP packet, so don't assume that this bug only applies to small packets.
  
Stephen Hemminger Nov. 17, 2023, 4:22 p.m. UTC | #13
On Fri, 17 Nov 2023 09:29:41 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> >> I agree using 'l3_len' instead is correct fix.
> >>
> >> But this requires ABI/API change,
> >> plus do we have any reason to keep the padding, discarding it as this
> >> patch does is also simpler alternative.  
> > 
> > 
> > Possibly an API version to change the args would work to fix.
> >  
> 
> rte_ipv4_udptcp_cksum_mbuf() and rte_ipv6_udptcp_cksum_mbuf() are inline
> functions, unfortunately we can't version them.
> 
> But those functions already gets IP header as parameter, can't we use IP
> header to get the payload size? If so this can be fixed without updating
> API.

Inlines are easier. Just make a fixed new function and make sure the old
one is not used.  They shouldn't have been inline in the first place.
  
Stephen Hemminger Nov. 17, 2023, 4:23 p.m. UTC | #14
On Fri, 17 Nov 2023 13:11:50 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:

> > rte_ipv4_udptcp_cksum_mbuf() and rte_ipv6_udptcp_cksum_mbuf() are
> > inline
> > functions, unfortunately we can't version them.
> > 
> > But those functions already gets IP header as parameter, can't we use
> > IP
> > header to get the payload size? If so this can be fixed without
> > updating
> > API.  
> 
> If rte_ipv4_udptcp_cksum_mbuf() - or any other function in the DPDK Network Headers library - includes Ethernet padding (which may be non-zero) when calculating the TCP/UDP checksum of an IPv4 packet, it is a bug, and must be fixed there.
> 
> Our test cases should use random padding to catch bugs like this.
> 
> And I just realized that Ethernet padding may be added to any IP packet, so don't assume that this bug only applies to small packets.

Agree. And the test code needs a lot more header checks; it is way too trusting that the mbuf is valid.
  
Kaiwen Deng Nov. 20, 2023, 9:21 a.m. UTC | #15
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, November 17, 2023 8:50 AM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org;
> stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> 
> On 11/16/2023 10:58 PM, Stephen Hemminger wrote:
> > On Thu, 2 Nov 2023 19:20:07 +0000
> > Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> >> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> >>> IEEE 802 packets may have a minimum size limit. The data fields
> >>> should be padded when necessary. In some cases, the padding data is
> >>> not zero. Testpmd does not trim these IP packets to the true length
> >>> of the frame, so errors will occur when calculating TCP or UDP
> >>> checksum.
> >>>
> >>
> >> Hi Kaiwen,
> >>
> >> I am trying to understand the problem, what is the testcase that has
> >> checksum error?
> >>
> >> Are the received mbuf data_len & pkt_len wrong? Instead of trying to
> >> fix the mbuf during forwarding, can we fix where packet generated?
> >
> > The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len which
> > maybe larger than the actual data. The real issue is there and in
> > rte_ip.h checksum code. The correct fix would be to use l3_len instead.
> >
> 
> I see, you are right.
> 
> In 'rte_ipv4_udptcp_cksum_mbuf()',
> as payload length "mbuf->pkt_len - l4_off" is used, which includes padding
> and if padding is not zero it will end up producing wrong checksum.
> 
> 
> I agree using 'l3_len' instead is correct fix.
> 
> But this requires ABI/API change,
> plus do we have any reason to keep the padding, discarding it as this patch
> does is also simpler alternative.
> 
> 
> Other alternative can be to zero the padding bytes. I guess standard doesn't
> enforce them to be zero, but we can do this to remove its impact on checksum
> calculation.
I'm not sure if this is ok, it feels like it would reduce performance. 
I can try this alternative if needed.
> 
> 
> @Kaiwen, did you able to test this with HW offload, what is the behavior of
> the HW, does is remove padding bytes?
> 
I tested the HW offload case and the same TCP/UDP checksum error occurs when the padding is not 0.
But if pkt_len is changed to the true length of the frame, the checksum is correct.
> 
> > It also looks like test-pmd is not validating the IP header.
> > Both parse_ipv4() and parse_ipv6() should check if packet was truncated.
> > Same for both UDP and TCP lengths.
> >
>
  
Kaiwen Deng Nov. 20, 2023, 9:52 a.m. UTC | #16
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, November 17, 2023 9:14 AM
> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> 
> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> > IEEE 802 packets may have a minimum size limit. The data fields should
> > be padded when necessary. In some cases, the padding data is not zero.
> > Testpmd does not trim these IP packets to the true length of the
> > frame, so errors will occur when calculating TCP or UDP checksum.
> >
> > This commit fixes this issue by triming IP packets to the true length
> > of the frame in testpmd.
> >
> > Fixes: 03d17e4d0179 ("app/testpmd: do not change IP addrs in checksum
> > engine")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
> > ---
> >  app/test-pmd/csumonly.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c index
> > 7af635e3f7..58b72b714a 100644
> > --- a/app/test-pmd/csumonly.c
> > +++ b/app/test-pmd/csumonly.c
> > @@ -853,12 +853,14 @@ pkt_burst_checksum_forward(struct
> fwd_stream *fs)
> >  	uint16_t nb_rx;
> >  	uint16_t nb_prep;
> >  	uint16_t i;
> > +	uint16_t pad_len;
> >  	uint64_t rx_ol_flags, tx_ol_flags;
> >  	uint64_t tx_offloads;
> >  	uint32_t rx_bad_ip_csum;
> >  	uint32_t rx_bad_l4_csum;
> >  	uint32_t rx_bad_outer_l4_csum;
> >  	uint32_t rx_bad_outer_ip_csum;
> > +	uint32_t l3_off;
> >  	struct testpmd_offload_info info;
> >
> >  	/* receive a burst of packet */
> > @@ -980,6 +982,36 @@ pkt_burst_checksum_forward(struct fwd_stream
> *fs)
> >  			l3_hdr = (char *)l3_hdr + info.outer_l3_len +
> info.l2_len;
> >  		}
> >
> > +		if (info.is_tunnel) {
> > +			l3_off = info.outer_l2_len +
> > +					info.outer_l3_len +
> > +					info.l2_len;
> >
> 
> I don't know much about tunnel code but is above calculation correct for all
> tunnel protocols, like for the case inner packet over UDP, should outer l4_len
> also added etc...
According to the comments, these tunnel packets are supported.

* (1) Supported packets are:
 *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
 *   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
 *           UDP|TCP|SCTP
 *   Ether / (vlan) / outer IP|IP6 / outer UDP / VXLAN-GPE / Ether / IP|IP6 /
 *           UDP|TCP|SCTP
 *   Ether / (vlan) / outer IP|IP6 / outer UDP / VXLAN-GPE / IP|IP6 /
 *           UDP|TCP|SCTP
 *   Ether / (vlan) / outer IP / outer UDP / GTP / IP|IP6 / UDP|TCP|SCTP
 *   Ether / (vlan) / outer IP|IP6 / GRE / Ether / IP|IP6 / UDP|TCP|SCTP
 *   Ether / (vlan) / outer IP|IP6 / GRE / IP|IP6 / UDP|TCP|SCTP
 *   Ether / (vlan) / outer IP|IP6 / IP|IP6 / UDP|TCP|SCTP
> 
> 
> > +		} else {
> > +			l3_off = info.l2_len;
> > +		}
> > +		switch (info.ethertype) {
> > +		case _htons(RTE_ETHER_TYPE_IPV4):
> > +			pad_len = rte_pktmbuf_data_len(m) -
> > +					(l3_off +
> > +					rte_be_to_cpu_16(
> > +					((struct rte_ipv4_hdr *)l3_hdr)->total_length));
> > +			break;
> > +		case _htons(RTE_ETHER_TYPE_IPV6):
> > +			pad_len = rte_pktmbuf_data_len(m) -
> > +					(l3_off +
> > +					rte_be_to_cpu_16(
> > +					((struct rte_ipv6_hdr *)l3_hdr)->payload_len));
> >
> 
> As far as I remember ipv6 payload_len doesn't contain the header length, so
> pad_len calculation should be different than ipv4 one, like
> "l4_off + l3_hdr->payload_len", did you verify this code with ipv6?
You're right, I didn't notice that and didn't test it adequately. I'll fix that.
> 
> 
> > +			break;
> > +		default:
> > +			pad_len = 0;
> > +			break;
> > +		}
> > +
> > +		if (pad_len) {
> > +			rte_pktmbuf_data_len(m) =
> rte_pktmbuf_data_len(m) - pad_len;
> > +			rte_pktmbuf_pkt_len(m) = rte_pktmbuf_data_len(m);
> >
> 
> Can't received mbuf be multi-segment mbuf, as far as I can see checksum
> calculation API takes this possibility into account. If so need to check that
> possibility here before updating 'pkt_len'
You are right.

Thanks
>
  
Ferruh Yigit Nov. 20, 2023, 10:46 a.m. UTC | #17
On 11/20/2023 9:21 AM, Deng, KaiwenX wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, November 17, 2023 8:50 AM
>> To: Stephen Hemminger <stephen@networkplumber.org>
>> Cc: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org;
>> stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
>> <yidingx.zhou@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
>> Zhang, Yuying <yuying.zhang@intel.com>; Matz, Olivier
>> <olivier.matz@6wind.com>; De Lara Guarch, Pablo
>> <pablo.de.lara.guarch@intel.com>
>> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
>>
>> On 11/16/2023 10:58 PM, Stephen Hemminger wrote:
>>> On Thu, 2 Nov 2023 19:20:07 +0000
>>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>>
>>>> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
>>>>> IEEE 802 packets may have a minimum size limit. The data fields
>>>>> should be padded when necessary. In some cases, the padding data is
>>>>> not zero. Testpmd does not trim these IP packets to the true length
>>>>> of the frame, so errors will occur when calculating TCP or UDP
>>>>> checksum.
>>>>>
>>>>
>>>> Hi Kaiwen,
>>>>
>>>> I am trying to understand the problem, what is the testcase that has
>>>> checksum error?
>>>>
>>>> Are the received mbuf data_len & pkt_len wrong? Instead of trying to
>>>> fix the mbuf during forwarding, can we fix where packet generated?
>>>
>>> The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len which
>>> maybe larger than the actual data. The real issue is there and in
>>> rte_ip.h checksum code. The correct fix would be to use l3_len instead.
>>>
>>
>> I see, you are right.
>>
>> In 'rte_ipv4_udptcp_cksum_mbuf()',
>> as payload length "mbuf->pkt_len - l4_off" is used, which includes padding
>> and if padding is not zero it will end up producing wrong checksum.
>>
>>
>> I agree using 'l3_len' instead is correct fix.
>>
>> But this requires ABI/API change,
>> plus do we have any reason to keep the padding, discarding it as this patch
>> does is also simpler alternative.
>>
>>
>> Other alternative can be to zero the padding bytes. I guess standard doesn't
>> enforce them to be zero, but we can do this to remove its impact on checksum
>> calculation.
> I'm not sure if this is ok, it feels like it would reduce performance. 
> I can try this alternative if needed.
>

Yes, it impacts performance, so it is not a good alternative; please
scratch it.

As discussed with Stephen and Morten, the consensus is to fix the SW
functions that calculate the checksum:
'rte_ipv4_udptcp_cksum_mbuf()' & 'rte_ipv6_udptcp_cksum_mbuf()'.

Instead of using pkt_len, those functions can use the length from the IP
header, which will make the checksum correct.


Can you please send a patch to fix those functions? I think this can be
done without changing the function signature, so without causing any
ABI/API break.


>>
>>
>> @Kaiwen, did you able to test this with HW offload, what is the behavior of
>> the HW, does is remove padding bytes?
>>
> I tested the HW offload case and the same tcp/udp checksum error occurs when padding is not 0, 
> But if change pkt_len to the true length of the frame, the checksum is correct.
>

I was expecting the HW not to be impacted. Since padding is part of the
spec, my assumption would be that the HW only takes the actual payload
size into account, instead of the buffer size.

Can you please double check? Which HW are you testing with? Can you
please add the maintainer of that HW to this discussion?

If the HW requires the padding to be removed, we may go with your solution.


>>
>>> It also looks like test-pmd is not validating the IP header.
>>> Both parse_ipv4() and parse_ipv6() should check if packet was truncated.
>>> Same for both UDP and TCP lengths.
>>>
>>
>
  
Ferruh Yigit Nov. 20, 2023, 10:47 a.m. UTC | #18
On 11/17/2023 4:22 PM, Stephen Hemminger wrote:
> On Fri, 17 Nov 2023 09:29:41 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>>>> I agree using 'l3_len' instead is correct fix.
>>>>
>>>> But this requires ABI/API change,
>>>> plus do we have any reason to keep the padding, discarding it as this
>>>> patch does is also simpler alternative.  
>>>
>>>
>>> Possibly an API version to change the args would work to fix.
>>>  
>>
>> rte_ipv4_udptcp_cksum_mbuf() and rte_ipv6_udptcp_cksum_mbuf() are inline
>> functions, unfortunately we can't version them.
>>
>> But those functions already gets IP header as parameter, can't we use IP
>> header to get the payload size? If so this can be fixed without updating
>> API.
> 
> Inlines are easier. Just make a fixed new function and make sure the old
> one is not used.  They shouldn't have been inline in the first place.
>

I guess they are inline because of performance concerns, since they are in the datapath.
  
Kaiwen Deng Nov. 22, 2023, 3:04 a.m. UTC | #19
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, November 20, 2023 6:46 PM
> To: Deng, KaiwenX <kaiwenx.deng@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Morten Brørup
> <mb@smartsharesystems.com>
> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>;
> Zhou, YidingX <yidingx.zhou@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Matz, Olivier <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> 
> On 11/20/2023 9:21 AM, Deng, KaiwenX wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Friday, November 17, 2023 8:50 AM
> >> To: Stephen Hemminger <stephen@networkplumber.org>
> >> Cc: Deng, KaiwenX <kaiwenx.deng@intel.com>; dev@dpdk.org;
> >> stable@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> >> <yidingx.zhou@intel.com>; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> >> Matz, Olivier <olivier.matz@6wind.com>; De Lara Guarch, Pablo
> >> <pablo.de.lara.guarch@intel.com>
> >> Subject: Re: [PATCH] app/test-pmd: fix L4 checksum with padding data
> >>
> >> On 11/16/2023 10:58 PM, Stephen Hemminger wrote:
> >>> On Thu, 2 Nov 2023 19:20:07 +0000
> >>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>>
> >>>> On 8/4/2023 9:28 AM, Kaiwen Deng wrote:
> >>>>> IEEE 802 packets may have a minimum size limit. The data fields
> >>>>> should be padded when necessary. In some cases, the padding data
> >>>>> is not zero. Testpmd does not trim these IP packets to the true
> >>>>> length of the frame, so errors will occur when calculating TCP or
> >>>>> UDP checksum.
> >>>>>
> >>>>
> >>>> Hi Kaiwen,
> >>>>
> >>>> I am trying to understand the problem, what is the testcase that
> >>>> has checksum error?
> >>>>
> >>>> Are the received mbuf data_len & pkt_len wrong? Instead of trying
> >>>> to fix the mbuf during forwarding, can we fix where packet generated?
> >>>
> >>> The root cause is that get_udptcp_cksum_mbuf is using m->pkt_len
> >>> which maybe larger than the actual data. The real issue is there and
> >>> in rte_ip.h checksum code. The correct fix would be to use l3_len instead.
> >>>
> >>
> >> I see, you are right.
> >>
> >> In 'rte_ipv4_udptcp_cksum_mbuf()',
> >> as payload length "mbuf->pkt_len - l4_off" is used, which includes
> >> padding and if padding is not zero it will end up producing wrong
> checksum.
> >>
> >>
> >> I agree using 'l3_len' instead is correct fix.
> >>
> >> But this requires ABI/API change,
> >> plus do we have any reason to keep the padding, discarding it as this
> >> patch does is also simpler alternative.
> >>
> >>
> >> Other alternative can be to zero the padding bytes. I guess standard
> >> doesn't enforce them to be zero, but we can do this to remove its
> >> impact on checksum calculation.
> > I'm not sure if this is ok, it feels like it would reduce performance.
> > I can try this alternative if needed.
> >
> 
> Yes impacts performance, so not a good alternative, please scratch it.
> 
> As discussion with Stephen and Morten, consensus is to fix SW functions that
> calculates checksum.
> 'rte_ipv4_udptcp_cksum_mbuf()' & 'rte_ipv6_udptcp_cksum_mbuf()'.
> 
> Instead of using packet_len, those functions can use packet length, which will
> make the checksum correct.
> 
> 
> Can you please send a patch to fix those functions? I think this can be done
> without changing function fingerprint, so without causing any ABI/API break.
> 
> 
> >>
> >>
> >> @Kaiwen, did you able to test this with HW offload, what is the
> >> behavior of the HW, does is remove padding bytes?
> >>
> > I tested the HW offload case and the same tcp/udp checksum error
> > occurs when padding is not 0, But if change pkt_len to the true length of the
> frame, the checksum is correct.
> >
> 
> I was expecting HW not impacted, since padding is part of the spec, my
> assumption would be HW only take the actual payload size into account,
> instead of buffer size.
> 
> Can you please double check? Which HW you are testing with, can you please
> add maintainer of that HW to this discussion?

I've tested HW offloads for UDP and TCP with Intel E810. The hardware takes the size
of the buffer into account when calculating the UDP/TCP checksum, not the size of the
actual payload.

Hi @Zhang, Qi Z,

Can you help confirm if this behavior is normal in hw?

Thanks!
> 
> If HW requires padding to be removed, we may go with your solution.
> 
> 
> >>
> >>> It also looks like test-pmd is not validating the IP header.
> >>> Both parse_ipv4() and parse_ipv6() should check if packet was truncated.
> >>> Same for both UDP and TCP lengths.
> >>>
> >>
> >
  

Patch

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 7af635e3f7..58b72b714a 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -853,12 +853,14 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_prep;
 	uint16_t i;
+	uint16_t pad_len;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint64_t tx_offloads;
 	uint32_t rx_bad_ip_csum;
 	uint32_t rx_bad_l4_csum;
 	uint32_t rx_bad_outer_l4_csum;
 	uint32_t rx_bad_outer_ip_csum;
+	uint32_t l3_off;
 	struct testpmd_offload_info info;
 
 	/* receive a burst of packet */
@@ -980,6 +982,36 @@  pkt_burst_checksum_forward(struct fwd_stream *fs)
 			l3_hdr = (char *)l3_hdr + info.outer_l3_len + info.l2_len;
 		}
 
+		if (info.is_tunnel) {
+			l3_off = info.outer_l2_len +
+					info.outer_l3_len +
+					info.l2_len;
+		} else {
+			l3_off = info.l2_len;
+		}
+		switch (info.ethertype) {
+		case _htons(RTE_ETHER_TYPE_IPV4):
+			pad_len = rte_pktmbuf_data_len(m) -
+					(l3_off +
+					rte_be_to_cpu_16(
+					((struct rte_ipv4_hdr *)l3_hdr)->total_length));
+			break;
+		case _htons(RTE_ETHER_TYPE_IPV6):
+			pad_len = rte_pktmbuf_data_len(m) -
+					(l3_off +
+					rte_be_to_cpu_16(
+					((struct rte_ipv6_hdr *)l3_hdr)->payload_len));
+			break;
+		default:
+			pad_len = 0;
+			break;
+		}
+
+		if (pad_len) {
+			rte_pktmbuf_data_len(m) = rte_pktmbuf_data_len(m) - pad_len;
+			rte_pktmbuf_pkt_len(m) = rte_pktmbuf_data_len(m);
+		}
+
 		/* step 2: depending on user command line configuration,
 		 * recompute checksum either in software or flag the
 		 * mbuf to offload the calculation to the NIC. If TSO