[dpdk-dev,1/2] net/tap: add tun support

Message ID 1522705068-18198-1-git-send-email-vipin.varghese@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Varghese, Vipin April 2, 2018, 9:37 p.m. UTC
  The change adds functional TUN PMD logic to the existing TAP PMD.
TUN PMD can be initialized with 'net_tunX' where 'X' represents unique id.
PMD supports argument interface, while MAC address and remote are not
supported.

Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
---

Changes in V3:
 - fix the TUN kernel packet error - Vipin
 - seperate out function from logging - Ferruh
 - add TUN PMD_REGISTER_PARAM_STRING - Ferruh

Changes in V2:
 - updated the documentation word error - Pascal
---
 drivers/net/tap/rte_eth_tap.c | 117 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 105 insertions(+), 12 deletions(-)
  

Comments

Ferruh Yigit April 6, 2018, 5:11 p.m. UTC | #1
On 4/2/2018 10:37 PM, Vipin Varghese wrote:
> The change adds functional TUN PMD logic to the existing TAP PMD.
> TUN PMD can be initialized with 'net_tunX' where 'X' represents unique id.
> PMD supports argument interface, while MAC address and remote are not
> supported.
> 
> Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
> ---
> 
> Changes in V3:
>  - fix the TUN kernel packet error - Vipin
>  - seperate out function from logging - Ferruh
>  - add TUN PMD_REGISTER_PARAM_STRING - Ferruh
> 
> Changes in V2:
>  - updated the documentation word error - Pascal

For series,
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
  
Ferruh Yigit April 6, 2018, 5:19 p.m. UTC | #2
On 4/6/2018 6:11 PM, Ferruh Yigit wrote:
> On 4/2/2018 10:37 PM, Vipin Varghese wrote:
>> The change adds functional TUN PMD logic to the existing TAP PMD.
>> TUN PMD can be initialized with 'net_tunX' where 'X' represents unique id.
>> PMD supports argument interface, while MAC address and remote are not
>> supported.
>>
>> Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
>> ---
>>
>> Changes in V3:
>>  - fix the TUN kernel packet error - Vipin
>>  - seperate out function from logging - Ferruh
>>  - add TUN PMD_REGISTER_PARAM_STRING - Ferruh
>>
>> Changes in V2:
>>  - updated the documentation word error - Pascal

> For series,
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Pascal Mazon <pascal.mazon@6wind.com>

Series applied to dpdk-next-net/master, thanks.
  
Ophir Munk April 12, 2018, 11:49 a.m. UTC | #3
Hi Vipin,
This patch (adding TUN to TAP) has been Acked and accepted in next-net branch.
I have some questions regarding the implementation (please find below).

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vipin Varghese
> Sent: Tuesday, April 03, 2018 12:38 AM
> To: dev@dpdk.org; pascal.mazon@6wind.com; ferruh.yigit@intel.com
> Cc: Vipin Varghese <vipin.varghese@intel.com>
> Subject: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> The change adds functional TUN PMD logic to the existing TAP PMD.
> TUN PMD can be initialized with 'net_tunX' where 'X' represents unique id.
> PMD supports argument interface, while MAC address and remote are not
> supported.
> 

[...]

> 
> +		/*
> +		 * TUN and TAP are created with IFF_NO_PI disabled.
> +		 * For TUN PMD this mandatory as fields are used by
> +		 * Kernel tun.c to determine whether its IP or non IP
> +		 * packets.
> +		 *
> +		 * The logic fetches the first byte of data from mbuf.
> +		 * compares whether its v4 or v6. If none matches default
> +		 * value 0x00 is taken for protocol field.
> +		 */
> +		char *buff_data = rte_pktmbuf_mtod(seg, void *);
> +		j = (*buff_data & 0xf0);
> +		if (j & (0x40 | 0x60))
> +			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> +

1. Accessing the first byte here assumes it is the first IP header byte (layer 3) which is correct for TUN.
For TAP however the first byte belongs to Ethernet destination address (layer 2). 
Please explain how this logic will work for TAP.

2. If the first TUN byte contains 0x2X (which is neither IPv4 nor IPv6) it will end up by setting ip.proto as 0xdd86. 
Please explain how this logic will work for non-IP packets in TUN
  
Varghese, Vipin April 13, 2018, 3:18 a.m. UTC | #4
Hi Ophir,

Please find my answers inline to the queries.

> -----Original Message-----
> From: Ophir Munk [mailto:ophirmu@mellanox.com]
> Sent: Thursday, April 12, 2018 5:19 PM
> To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> Hi Vipin,
> This patch (adding TUN to TAP) has been Acked and accepted in next-net
> branch.
> I have some questions regarding the implementation (please find below).
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vipin Varghese
> > Sent: Tuesday, April 03, 2018 12:38 AM
> > To: dev@dpdk.org; pascal.mazon@6wind.com; ferruh.yigit@intel.com
> > Cc: Vipin Varghese <vipin.varghese@intel.com>
> > Subject: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> >
> > The change adds functional TUN PMD logic to the existing TAP PMD.
> > TUN PMD can be initialized with 'net_tunX' where 'X' represents unique id.
> > PMD supports argument interface, while MAC address and remote are not
> > supported.
> >
> 
> [...]
> 
> >
> > +		/*
> > +		 * TUN and TAP are created with IFF_NO_PI disabled.
> > +		 * For TUN PMD this mandatory as fields are used by
> > +		 * Kernel tun.c to determine whether its IP or non IP
> > +		 * packets.
> > +		 *
> > +		 * The logic fetches the first byte of data from mbuf.
> > +		 * compares whether its v4 or v6. If none matches default
> > +		 * value 0x00 is taken for protocol field.
> > +		 */
> > +		char *buff_data = rte_pktmbuf_mtod(seg, void *);
> > +		j = (*buff_data & 0xf0);
> > +		if (j & (0x40 | 0x60))
> > +			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > +
> 
> 1. Accessing the first byte here assumes it is the first IP header byte (layer 3)
> which is correct for TUN.
> For TAP however the first byte belongs to Ethernet destination address
> (layer 2).
> Please explain how this logic will work for TAP.

Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c' from 3.13. to  4.16, 

Please find my observation below
1. File: tun.c, function: tun_get_user, check for 'tun->flags & TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped' is updated.
2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI in rx data path. Counter 'rx_dropped' is updated in 'tap_handle_frame'. 

Please find my reasoning below
1. First approach was to have separate function for tap and tun TX and RX. But this will introduce code duplication, hence reworked the code as above.
2. During my internal testing assigning dummy value for protocol field in TAP packets, did not show a difference in behaviour. May be there are some specific cases this failing. 

If there difference in behaviour, can please share the same?

> 
> 2. If the first TUN byte contains 0x2X (which is neither IPv4 nor IPv6) it will
> end up by setting ip.proto as 0xdd86.
> Please explain how this logic will work for non-IP packets in TUN

I see your point. You are correct about this. Thanks for pointing out, may I send correction for this as

"""
-		if (j & (0x40 | 0x60))
-			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
+		pi.proto = (j == 0x40) ? 0x0008 : 
+					(j == 0x60) ? 0xdd86 :
+					0x00;
"""
  
Ophir Munk April 20, 2018, 9:49 p.m. UTC | #5
Hi Vipin,

Please find comments inline.

> -----Original Message-----
> From: Varghese, Vipin [mailto:vipin.varghese@intel.com]
> Sent: Friday, April 13, 2018 6:18 AM
> To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> Hi Ophir,
> 
> Please find my answers inline to the queries.
> 
> > -----Original Message-----
> > From: Ophir Munk [mailto:ophirmu@mellanox.com]
> > Sent: Thursday, April 12, 2018 5:19 PM
> > To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
> > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> > Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> > Shahaf Shuler <shahafs@mellanox.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> >
> > Hi Vipin,
> > This patch (adding TUN to TAP) has been Acked and accepted in next-net
> > branch.
> > I have some questions regarding the implementation (please find below).
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vipin Varghese
> > > Sent: Tuesday, April 03, 2018 12:38 AM
> > > To: dev@dpdk.org; pascal.mazon@6wind.com; ferruh.yigit@intel.com
> > > Cc: Vipin Varghese <vipin.varghese@intel.com>
> > > Subject: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> > >
> > > The change adds functional TUN PMD logic to the existing TAP PMD.
> > > TUN PMD can be initialized with 'net_tunX' where 'X' represents unique
> id.
> > > PMD supports argument interface, while MAC address and remote are
> > > not supported.
> > >
> >
> > [...]
> >
> > >
> > > +		/*
> > > +		 * TUN and TAP are created with IFF_NO_PI disabled.
> > > +		 * For TUN PMD this mandatory as fields are used by
> > > +		 * Kernel tun.c to determine whether its IP or non IP
> > > +		 * packets.
> > > +		 *
> > > +		 * The logic fetches the first byte of data from mbuf.
> > > +		 * compares whether its v4 or v6. If none matches default
> > > +		 * value 0x00 is taken for protocol field.
> > > +		 */
> > > +		char *buff_data = rte_pktmbuf_mtod(seg, void *);
> > > +		j = (*buff_data & 0xf0);
> > > +		if (j & (0x40 | 0x60))
> > > +			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > > +
> >
> > 1. Accessing the first byte here assumes it is the first IP header
> > byte (layer 3) which is correct for TUN.
> > For TAP however the first byte belongs to Ethernet destination address
> > (layer 2).
> > Please explain how this logic will work for TAP.
> 
> Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c' from
> 3.13. to  4.16,
> 
> Please find my observation below
> 1. File: tun.c, function: tun_get_user, check for 'tun->flags &
> TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped' is
> updated.
> 2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI in rx data
> path. Counter 'rx_dropped' is updated in 'tap_handle_frame'.
> 

I understand that in kernel implementation there is no check for tap->flags in file tap.c, however I think there is a bug in dpdk rte_eth_tap.c file.
Please find below an example which demonstrates this claim.

> Please find my reasoning below
> 1. First approach was to have separate function for tap and tun TX and RX.
> But this will introduce code duplication, hence reworked the code as above.

I agree. Avoiding code duplication is a good approach. 

> 2. During my internal testing assigning dummy value for protocol field in TAP
> packets, did not show a difference in behaviour. May be there are some
> specific cases this failing.
> 
> If there difference in behaviour, can please share the same?
> 

Please consider the following example:
I am running testpmd with a TAP device, --forward-mode=csum. 
I am injecting a TCP packet, which is forwarded back (mac addresses swapped) to the sender. 
Using gdb I set a breakpoint at pmd_tx_burst() in file rte_eth_tap.c

Looking at the following code inside pmd_tx_burst():

527                 char *buff_data = rte_pktmbuf_mtod(seg, void *);
528                 j = (*buff_data & 0xf0);
529                 pi.proto = (j == 0x40) ? 0x0008 :
530                                 (j == 0x60) ? 0xdd86 : 0x00;

I am printing the first 20 bytes of buff_data in line 527:

(gdb) p/x *(unsigned char *)buff_data@20
$3 = {0x0, 0x25, 0x88, 0x10, 0x66, 0x2, 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81, 0x8, 0x0, 0x45, 0x0, 0x4, 0xdf, 0x0, 0x1}

The gdb printout refers to:
6 bytes of destination MAC address: 0x0, 0x25, 0x88, 0x10, 0x66, 0x2
6 bytes of source MAC address: 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81
2 bytes of Ethernet type: 0x8, 0x0 - (IPv4)
IP header starting with 0x45, ... which is the byte (0x45) that "j" should have looked at

In the case of TAP - buff_data starts with the destination MAC address of the sender (0x0, ...). 
The code in line 528 expects that buff_data would start with an IP header protocol (e.g. 0x45), but it is not the case for TAP.
In my case j=0x0 (line 528) which is harmless (as it ends up with setting pi.proto=0x00, which is correct for TAP).
However, if the sender had an Intel NIC - the destination MAC address could have started with:
$3 = {0x40, 0x25, 0xC2, ...
Or-
$3 = {0x64, 0xD4, 0xDA, ...

as 4025C2 and 64D4DA are reserved prefixes for Intel Ethernet MAC addresses, see: http://www.coffer.com/mac_find/?string=intel

In this case pi.proto could end up with 0x0008 or 0xdd86 instead of 0x0 as expected for TAP.

I hope that this example clarifies the bug I am referring to. 

> >
> > 2. If the first TUN byte contains 0x2X (which is neither IPv4 nor
> > IPv6) it will end up by setting ip.proto as 0xdd86.
> > Please explain how this logic will work for non-IP packets in TUN
> 
> I see your point. You are correct about this. Thanks for pointing out, may I
> send correction for this as
> 
> """
> -		if (j & (0x40 | 0x60))
> -			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> +		pi.proto = (j == 0x40) ? 0x0008 :
> +					(j == 0x60) ? 0xdd86 :
> +					0x00;
> """
  
Ophir Munk April 20, 2018, 9:58 p.m. UTC | #6
Hi Vipin,
I missed your point:
You claim that TAP should work regardless of any pi.proto values.
Can you confirm that for ALL kernels versions (past and future)?

> -----Original Message-----
> From: Ophir Munk
> Sent: Saturday, April 21, 2018 12:49 AM
> To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> Hi Vipin,
> 
> Please find comments inline.
> 
> > -----Original Message-----
> > From: Varghese, Vipin [mailto:vipin.varghese@intel.com]
> > Sent: Friday, April 13, 2018 6:18 AM
> > To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
> > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> > Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> > Shahaf Shuler <shahafs@mellanox.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> >
> > Hi Ophir,
> >
> > Please find my answers inline to the queries.
> >
> > > -----Original Message-----
> > > From: Ophir Munk [mailto:ophirmu@mellanox.com]
> > > Sent: Thursday, April 12, 2018 5:19 PM
> > > To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
> > > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > > Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > > <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> > >
> > > Hi Vipin,
> > > This patch (adding TUN to TAP) has been Acked and accepted in
> > > next-net branch.
> > > I have some questions regarding the implementation (please find below).
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vipin
> > > > Varghese
> > > > Sent: Tuesday, April 03, 2018 12:38 AM
> > > > To: dev@dpdk.org; pascal.mazon@6wind.com; ferruh.yigit@intel.com
> > > > Cc: Vipin Varghese <vipin.varghese@intel.com>
> > > > Subject: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> > > >
> > > > The change adds functional TUN PMD logic to the existing TAP PMD.
> > > > TUN PMD can be initialized with 'net_tunX' where 'X' represents
> > > > unique
> > id.
> > > > PMD supports argument interface, while MAC address and remote are
> > > > not supported.
> > > >
> > >
> > > [...]
> > >
> > > >
> > > > +		/*
> > > > +		 * TUN and TAP are created with IFF_NO_PI disabled.
> > > > +		 * For TUN PMD this mandatory as fields are used by
> > > > +		 * Kernel tun.c to determine whether its IP or non IP
> > > > +		 * packets.
> > > > +		 *
> > > > +		 * The logic fetches the first byte of data from mbuf.
> > > > +		 * compares whether its v4 or v6. If none matches default
> > > > +		 * value 0x00 is taken for protocol field.
> > > > +		 */
> > > > +		char *buff_data = rte_pktmbuf_mtod(seg, void *);
> > > > +		j = (*buff_data & 0xf0);
> > > > +		if (j & (0x40 | 0x60))
> > > > +			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > > > +
> > >
> > > 1. Accessing the first byte here assumes it is the first IP header
> > > byte (layer 3) which is correct for TUN.
> > > For TAP however the first byte belongs to Ethernet destination
> > > address (layer 2).
> > > Please explain how this logic will work for TAP.
> >
> > Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c'
> > from 3.13. to  4.16,
> >
> > Please find my observation below
> > 1. File: tun.c, function: tun_get_user, check for 'tun->flags &
> > TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped' is
> > updated.
> > 2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI in
> > rx data path. Counter 'rx_dropped' is updated in 'tap_handle_frame'.
> >
> 
> I understand that in kernel implementation there is no check for tap->flags in
> file tap.c, however I think there is a bug in dpdk rte_eth_tap.c file.
> Please find below an example which demonstrates this claim.
> 
> > Please find my reasoning below
> > 1. First approach was to have separate function for tap and tun TX and RX.
> > But this will introduce code duplication, hence reworked the code as
> above.
> 
> I agree. Avoiding code duplication is a good approach.
> 
> > 2. During my internal testing assigning dummy value for protocol field
> > in TAP packets, did not show a difference in behaviour. May be there
> > are some specific cases this failing.
> >
> > If there difference in behaviour, can please share the same?
> >
> 
> Please consider the following example:
> I am running testpmd with a TAP device, --forward-mode=csum.
> I am injecting a TCP packet, which is forwarded back (mac addresses
> swapped) to the sender.
> Using gdb I set a breakpoint at pmd_tx_burst() in file rte_eth_tap.c
> 
> Looking at the following code inside pmd_tx_burst():
> 
> 527                 char *buff_data = rte_pktmbuf_mtod(seg, void *);
> 528                 j = (*buff_data & 0xf0);
> 529                 pi.proto = (j == 0x40) ? 0x0008 :
> 530                                 (j == 0x60) ? 0xdd86 : 0x00;
> 
> I am printing the first 20 bytes of buff_data in line 527:
> 
> (gdb) p/x *(unsigned char *)buff_data@20
> $3 = {0x0, 0x25, 0x88, 0x10, 0x66, 0x2, 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81,
> 0x8, 0x0, 0x45, 0x0, 0x4, 0xdf, 0x0, 0x1}
> 
> The gdb printout refers to:
> 6 bytes of destination MAC address: 0x0, 0x25, 0x88, 0x10, 0x66, 0x2
> 6 bytes of source MAC address: 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81
> 2 bytes of Ethernet type: 0x8, 0x0 - (IPv4) IP header starting with 0x45, ...
> which is the byte (0x45) that "j" should have looked at
> 
> In the case of TAP - buff_data starts with the destination MAC address of the
> sender (0x0, ...).
> The code in line 528 expects that buff_data would start with an IP header
> protocol (e.g. 0x45), but it is not the case for TAP.
> In my case j=0x0 (line 528) which is harmless (as it ends up with setting
> pi.proto=0x00, which is correct for TAP).
> However, if the sender had an Intel NIC - the destination MAC address could
> have started with:
> $3 = {0x40, 0x25, 0xC2, ...
> Or-
> $3 = {0x64, 0xD4, 0xDA, ...
> 
> as 4025C2 and 64D4DA are reserved prefixes for Intel Ethernet MAC
> addresses, see: http://www.coffer.com/mac_find/?string=intel
> 
> In this case pi.proto could end up with 0x0008 or 0xdd86 instead of 0x0 as
> expected for TAP.
> 
> I hope that this example clarifies the bug I am referring to.
> 
> > >
> > > 2. If the first TUN byte contains 0x2X (which is neither IPv4 nor
> > > IPv6) it will end up by setting ip.proto as 0xdd86.
> > > Please explain how this logic will work for non-IP packets in TUN
> >
> > I see your point. You are correct about this. Thanks for pointing out,
> > may I send correction for this as
> >
> > """
> > -		if (j & (0x40 | 0x60))
> > -			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > +		pi.proto = (j == 0x40) ? 0x0008 :
> > +					(j == 0x60) ? 0xdd86 :
> > +					0x00;
> > """
  
Varghese, Vipin April 21, 2018, 3:09 p.m. UTC | #7
Hi Ophir,

<Snip>

> Hi Vipin,
> I missed your point:
> You claim that TAP should work regardless of any pi.proto values.
> Can you confirm that for ALL kernels versions (past and future)?

I have tested with 3.13.0 , 4.4.0 with patch fix.

> 
> > -----Original Message-----
> > From: Ophir Munk
> > Sent: Saturday, April 21, 2018 12:49 AM
> > To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
> > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> > Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> > Shahaf Shuler <shahafs@mellanox.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> >
> > Hi Vipin,
> >
> > Please find comments inline.
> >
> > > -----Original Message-----
> > > From: Varghese, Vipin [mailto:vipin.varghese@intel.com]
> > > Sent: Friday, April 13, 2018 6:18 AM
> > > To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
> > > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > > Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > > <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> > >

<Snip>

> > > > 1. Accessing the first byte here assumes it is the first IP header
> > > > byte (layer 3) which is correct for TUN.
> > > > For TAP however the first byte belongs to Ethernet destination
> > > > address (layer 2).
> > > > Please explain how this logic will work for TAP.
> > >
> > > Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c'
> > > from 3.13. to  4.16,
> > >
> > > Please find my observation below
> > > 1. File: tun.c, function: tun_get_user, check for 'tun->flags &
> > > TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped'
> > > is updated.
> > > 2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI
> > > in rx data path. Counter 'rx_dropped' is updated in 'tap_handle_frame'.
> > >
> >
> > I understand that in kernel implementation there is no check for
> > tap->flags in file tap.c, however I think there is a bug in dpdk rte_eth_tap.c
> file.
> > Please find below an example which demonstrates this claim.
> >
> > > Please find my reasoning below
> > > 1. First approach was to have separate function for tap and tun TX and RX.
> > > But this will introduce code duplication, hence reworked the code as
> > above.
> >
> > I agree. Avoiding code duplication is a good approach.
> >
> > > 2. During my internal testing assigning dummy value for protocol
> > > field in TAP packets, did not show a difference in behaviour. May be
> > > there are some specific cases this failing.
> > >
> > > If there difference in behaviour, can please share the same?
> > >
> >
> > Please consider the following example:
> > I am running testpmd with a TAP device, --forward-mode=csum.
> > I am injecting a TCP packet, which is forwarded back (mac addresses
> > swapped) to the sender.
> > Using gdb I set a breakpoint at pmd_tx_burst() in file rte_eth_tap.c
> >
> > Looking at the following code inside pmd_tx_burst():
> >
> > 527                 char *buff_data = rte_pktmbuf_mtod(seg, void *);
> > 528                 j = (*buff_data & 0xf0);
> > 529                 pi.proto = (j == 0x40) ? 0x0008 :
> > 530                                 (j == 0x60) ? 0xdd86 : 0x00;
> >
> > I am printing the first 20 bytes of buff_data in line 527:
> >
> > (gdb) p/x *(unsigned char *)buff_data@20
> > $3 = {0x0, 0x25, 0x88, 0x10, 0x66, 0x2, 0xf4, 0x52, 0x14, 0x7a, 0x59,
> > 0x81, 0x8, 0x0, 0x45, 0x0, 0x4, 0xdf, 0x0, 0x1}
> >
> > The gdb printout refers to:
> > 6 bytes of destination MAC address: 0x0, 0x25, 0x88, 0x10, 0x66, 0x2
> > 6 bytes of source MAC address: 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81
> > 2 bytes of Ethernet type: 0x8, 0x0 - (IPv4) IP header starting with 0x45, ...
> > which is the byte (0x45) that "j" should have looked at
> >
> > In the case of TAP - buff_data starts with the destination MAC address
> > of the sender (0x0, ...).
> > The code in line 528 expects that buff_data would start with an IP
> > header protocol (e.g. 0x45), but it is not the case for TAP.
> > In my case j=0x0 (line 528) which is harmless (as it ends up with
> > setting pi.proto=0x00, which is correct for TAP).
> > However, if the sender had an Intel NIC - the destination MAC address
> > could have started with:
> > $3 = {0x40, 0x25, 0xC2, ...
> > Or-
> > $3 = {0x64, 0xD4, 0xDA, ...
> >
> > as 4025C2 and 64D4DA are reserved prefixes for Intel Ethernet MAC
> > addresses, see: http://www.coffer.com/mac_find/?string=intel
> >
> > In this case pi.proto could end up with 0x0008 or 0xdd86 instead of
> > 0x0 as expected for TAP.
> >
> > I hope that this example clarifies the bug I am referring to.
> >

Thanks for sharing detailed example overview. But as you mentioned this will break ' 4025C2' and ' 64D4DA', 
This will not solve for the correction patch  ' https://dpdk.org/dev/patchwork/patch/37986/'. 

Only choice left is separate tx_burst for TAP and TUN PMD, as we do not want to check PMD type on each call. 

Questions:
1) Is this ok to split tx_burst and have redundant code?
2) Does applications transparently send packets coming from Physical NIC to TAP interface? Does not the application
Modifies the DEST MAC addr to TAP interface?

> > > >
> > > > 2. If the first TUN byte contains 0x2X (which is neither IPv4 nor
> > > > IPv6) it will end up by setting ip.proto as 0xdd86.
> > > > Please explain how this logic will work for non-IP packets in TUN
> > >
> > > I see your point. You are correct about this. Thanks for pointing
> > > out, may I send correction for this as
> > >
> > > """
> > > -		if (j & (0x40 | 0x60))
> > > -			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > > +		pi.proto = (j == 0x40) ? 0x0008 :
> > > +					(j == 0x60) ? 0xdd86 :
> > > +					0x00;
> > > """
  
Varghese, Vipin April 23, 2018, 12:58 p.m. UTC | #8
Hi Ophir,

Can you help me with the investigation with the following information?
1) The kernel or distro in which the TAP proto flag set breaks the logic?
2) Is the above still valid even after applying the patch ' https://dpdk.org/dev/patchwork/patch/37986/'?

Note: I am testing with 3.13.0, 4.4.0 and 4.13.0.

Thanks
Vipin Varghese

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Varghese, Vipin
> Sent: Saturday, April 21, 2018 8:40 PM
> To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> Hi Ophir,
> 
> <Snip>
> 
> > Hi Vipin,
> > I missed your point:
> > You claim that TAP should work regardless of any pi.proto values.
> > Can you confirm that for ALL kernels versions (past and future)?
> 
> I have tested with 3.13.0 , 4.4.0 with patch fix.
> 
> >
> > > -----Original Message-----
> > > From: Ophir Munk
> > > Sent: Saturday, April 21, 2018 12:49 AM
> > > To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
> > > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > > Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > > <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> > >
> > > Hi Vipin,
> > >
> > > Please find comments inline.
> > >
> > > > -----Original Message-----
> > > > From: Varghese, Vipin [mailto:vipin.varghese@intel.com]
> > > > Sent: Friday, April 13, 2018 6:18 AM
> > > > To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
> > > > pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > > > Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > > > <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> > > > Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> > > >
> 
> <Snip>
> 
> > > > > 1. Accessing the first byte here assumes it is the first IP
> > > > > header byte (layer 3) which is correct for TUN.
> > > > > For TAP however the first byte belongs to Ethernet destination
> > > > > address (layer 2).
> > > > > Please explain how this logic will work for TAP.
> > > >
> > > > Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c'
> > > > from 3.13. to  4.16,
> > > >
> > > > Please find my observation below
> > > > 1. File: tun.c, function: tun_get_user, check for 'tun->flags &
> > > > TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped'
> > > > is updated.
> > > > 2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI
> > > > in rx data path. Counter 'rx_dropped' is updated in 'tap_handle_frame'.
> > > >
> > >
> > > I understand that in kernel implementation there is no check for
> > > tap->flags in file tap.c, however I think there is a bug in dpdk
> > > tap->rte_eth_tap.c
> > file.
> > > Please find below an example which demonstrates this claim.
> > >
> > > > Please find my reasoning below
> > > > 1. First approach was to have separate function for tap and tun TX and
> RX.
> > > > But this will introduce code duplication, hence reworked the code
> > > > as
> > > above.
> > >
> > > I agree. Avoiding code duplication is a good approach.
> > >
> > > > 2. During my internal testing assigning dummy value for protocol
> > > > field in TAP packets, did not show a difference in behaviour. May
> > > > be there are some specific cases this failing.
> > > >
> > > > If there difference in behaviour, can please share the same?
> > > >
> > >
> > > Please consider the following example:
> > > I am running testpmd with a TAP device, --forward-mode=csum.
> > > I am injecting a TCP packet, which is forwarded back (mac addresses
> > > swapped) to the sender.
> > > Using gdb I set a breakpoint at pmd_tx_burst() in file rte_eth_tap.c
> > >
> > > Looking at the following code inside pmd_tx_burst():
> > >
> > > 527                 char *buff_data = rte_pktmbuf_mtod(seg, void *);
> > > 528                 j = (*buff_data & 0xf0);
> > > 529                 pi.proto = (j == 0x40) ? 0x0008 :
> > > 530                                 (j == 0x60) ? 0xdd86 : 0x00;
> > >
> > > I am printing the first 20 bytes of buff_data in line 527:
> > >
> > > (gdb) p/x *(unsigned char *)buff_data@20
> > > $3 = {0x0, 0x25, 0x88, 0x10, 0x66, 0x2, 0xf4, 0x52, 0x14, 0x7a,
> > > 0x59, 0x81, 0x8, 0x0, 0x45, 0x0, 0x4, 0xdf, 0x0, 0x1}
> > >
> > > The gdb printout refers to:
> > > 6 bytes of destination MAC address: 0x0, 0x25, 0x88, 0x10, 0x66, 0x2
> > > 6 bytes of source MAC address: 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81
> > > 2 bytes of Ethernet type: 0x8, 0x0 - (IPv4) IP header starting with 0x45, ...
> > > which is the byte (0x45) that "j" should have looked at
> > >
> > > In the case of TAP - buff_data starts with the destination MAC
> > > address of the sender (0x0, ...).
> > > The code in line 528 expects that buff_data would start with an IP
> > > header protocol (e.g. 0x45), but it is not the case for TAP.
> > > In my case j=0x0 (line 528) which is harmless (as it ends up with
> > > setting pi.proto=0x00, which is correct for TAP).
> > > However, if the sender had an Intel NIC - the destination MAC
> > > address could have started with:
> > > $3 = {0x40, 0x25, 0xC2, ...
> > > Or-
> > > $3 = {0x64, 0xD4, 0xDA, ...
> > >
> > > as 4025C2 and 64D4DA are reserved prefixes for Intel Ethernet MAC
> > > addresses, see: http://www.coffer.com/mac_find/?string=intel
> > >
> > > In this case pi.proto could end up with 0x0008 or 0xdd86 instead of
> > > 0x0 as expected for TAP.
> > >
> > > I hope that this example clarifies the bug I am referring to.
> > >
> 
> Thanks for sharing detailed example overview. But as you mentioned this will
> break ' 4025C2' and ' 64D4DA', This will not solve for the correction patch  '
> https://dpdk.org/dev/patchwork/patch/37986/'.
> 
> Only choice left is separate tx_burst for TAP and TUN PMD, as we do not
> want to check PMD type on each call.
> 
> Questions:
> 1) Is this ok to split tx_burst and have redundant code?
> 2) Does applications transparently send packets coming from Physical NIC to
> TAP interface? Does not the application Modifies the DEST MAC addr to TAP
> interface?
> 
> > > > >
> > > > > 2. If the first TUN byte contains 0x2X (which is neither IPv4
> > > > > nor
> > > > > IPv6) it will end up by setting ip.proto as 0xdd86.
> > > > > Please explain how this logic will work for non-IP packets in
> > > > > TUN
> > > >
> > > > I see your point. You are correct about this. Thanks for pointing
> > > > out, may I send correction for this as
> > > >
> > > > """
> > > > -		if (j & (0x40 | 0x60))
> > > > -			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > > > +		pi.proto = (j == 0x40) ? 0x0008 :
> > > > +					(j == 0x60) ? 0xdd86 :
> > > > +					0x00;
> > > > """
  
Ferruh Yigit April 23, 2018, 3:42 p.m. UTC | #9
On 4/23/2018 1:58 PM, Varghese, Vipin wrote:
> Hi Ophir,
> 
> Can you help me with the investigation with the following information?
> 1) The kernel or distro in which the TAP proto flag set breaks the logic?

Hi Vipin,

I guess Ophir's point is not this is broken with some kernels but a valid field
set wrong for tap, perhaps someone can be using a custom kernel module to use
those fields, we can't know it.

Instead of duplicating [rt]x_burst() functions, I suggest creating a variable to
set if this is tun or tap and set pi.proto only for tun, this will lead less
comparison for tap and correct proto value.

> 2) Is the above still valid even after applying the patch ' https://dpdk.org/dev/patchwork/patch/37986/'?

I guess his concern is for tap, that some MAC addresses cause wrong pi.proto,
not for tun which your patch fixes.

> 
> Note: I am testing with 3.13.0, 4.4.0 and 4.13.0.
> 
> Thanks
> Vipin Varghese
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Varghese, Vipin
>> Sent: Saturday, April 21, 2018 8:40 PM
>> To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
>> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; Thomas
>> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
>> Shahaf Shuler <shahafs@mellanox.com>
>> Subject: Re: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
>>
>> Hi Ophir,
>>
>> <Snip>
>>
>>> Hi Vipin,
>>> I missed your point:
>>> You claim that TAP should work regardless of any pi.proto values.
>>> Can you confirm that for ALL kernels versions (past and future)?
>>
>> I have tested with 3.13.0 , 4.4.0 with patch fix.
>>
>>>
>>>> -----Original Message-----
>>>> From: Ophir Munk
>>>> Sent: Saturday, April 21, 2018 12:49 AM
>>>> To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;
>>>> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
>>>> Thomas Monjalon <thomas@monjalon.net>; Olga Shern
>>>> <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
>>>> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
>>>>
>>>> Hi Vipin,
>>>>
>>>> Please find comments inline.
>>>>
>>>>> -----Original Message-----
>>>>> From: Varghese, Vipin [mailto:vipin.varghese@intel.com]
>>>>> Sent: Friday, April 13, 2018 6:18 AM
>>>>> To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;
>>>>> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
>>>>> Thomas Monjalon <thomas@monjalon.net>; Olga Shern
>>>>> <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
>>>>> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
>>>>>
>>
>> <Snip>
>>
>>>>>> 1. Accessing the first byte here assumes it is the first IP
>>>>>> header byte (layer 3) which is correct for TUN.
>>>>>> For TAP however the first byte belongs to Ethernet destination
>>>>>> address (layer 2).
>>>>>> Please explain how this logic will work for TAP.
>>>>>
>>>>> Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c'
>>>>> from 3.13. to  4.16,
>>>>>
>>>>> Please find my observation below
>>>>> 1. File: tun.c, function: tun_get_user, check for 'tun->flags &
>>>>> TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped'
>>>>> is updated.
>>>>> 2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI
>>>>> in rx data path. Counter 'rx_dropped' is updated in 'tap_handle_frame'.
>>>>>
>>>>
>>>> I understand that in kernel implementation there is no check for
>>>> tap->flags in file tap.c, however I think there is a bug in dpdk
>>>> tap->rte_eth_tap.c
>>> file.
>>>> Please find below an example which demonstrates this claim.
>>>>
>>>>> Please find my reasoning below
>>>>> 1. First approach was to have separate function for tap and tun TX and
>> RX.
>>>>> But this will introduce code duplication, hence reworked the code
>>>>> as
>>>> above.
>>>>
>>>> I agree. Avoiding code duplication is a good approach.
>>>>
>>>>> 2. During my internal testing assigning dummy value for protocol
>>>>> field in TAP packets, did not show a difference in behaviour. May
>>>>> be there are some specific cases this failing.
>>>>>
>>>>> If there difference in behaviour, can please share the same?
>>>>>
>>>>
>>>> Please consider the following example:
>>>> I am running testpmd with a TAP device, --forward-mode=csum.
>>>> I am injecting a TCP packet, which is forwarded back (mac addresses
>>>> swapped) to the sender.
>>>> Using gdb I set a breakpoint at pmd_tx_burst() in file rte_eth_tap.c
>>>>
>>>> Looking at the following code inside pmd_tx_burst():
>>>>
>>>> 527                 char *buff_data = rte_pktmbuf_mtod(seg, void *);
>>>> 528                 j = (*buff_data & 0xf0);
>>>> 529                 pi.proto = (j == 0x40) ? 0x0008 :
>>>> 530                                 (j == 0x60) ? 0xdd86 : 0x00;
>>>>
>>>> I am printing the first 20 bytes of buff_data in line 527:
>>>>
>>>> (gdb) p/x *(unsigned char *)buff_data@20
>>>> $3 = {0x0, 0x25, 0x88, 0x10, 0x66, 0x2, 0xf4, 0x52, 0x14, 0x7a,
>>>> 0x59, 0x81, 0x8, 0x0, 0x45, 0x0, 0x4, 0xdf, 0x0, 0x1}
>>>>
>>>> The gdb printout refers to:
>>>> 6 bytes of destination MAC address: 0x0, 0x25, 0x88, 0x10, 0x66, 0x2
>>>> 6 bytes of source MAC address: 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81
>>>> 2 bytes of Ethernet type: 0x8, 0x0 - (IPv4) IP header starting with 0x45, ...
>>>> which is the byte (0x45) that "j" should have looked at
>>>>
>>>> In the case of TAP - buff_data starts with the destination MAC
>>>> address of the sender (0x0, ...).
>>>> The code in line 528 expects that buff_data would start with an IP
>>>> header protocol (e.g. 0x45), but it is not the case for TAP.
>>>> In my case j=0x0 (line 528) which is harmless (as it ends up with
>>>> setting pi.proto=0x00, which is correct for TAP).
>>>> However, if the sender had an Intel NIC - the destination MAC
>>>> address could have started with:
>>>> $3 = {0x40, 0x25, 0xC2, ...
>>>> Or-
>>>> $3 = {0x64, 0xD4, 0xDA, ...
>>>>
>>>> as 4025C2 and 64D4DA are reserved prefixes for Intel Ethernet MAC
>>>> addresses, see: http://www.coffer.com/mac_find/?string=intel
>>>>
>>>> In this case pi.proto could end up with 0x0008 or 0xdd86 instead of
>>>> 0x0 as expected for TAP.
>>>>
>>>> I hope that this example clarifies the bug I am referring to.
>>>>
>>
>> Thanks for sharing detailed example overview. But as you mentioned this will
>> break ' 4025C2' and ' 64D4DA', This will not solve for the correction patch  '
>> https://dpdk.org/dev/patchwork/patch/37986/'.
>>
>> Only choice left is separate tx_burst for TAP and TUN PMD, as we do not
>> want to check PMD type on each call.
>>
>> Questions:
>> 1) Is this ok to split tx_burst and have redundant code?
>> 2) Does applications transparently send packets coming from Physical NIC to
>> TAP interface? Does not the application Modifies the DEST MAC addr to TAP
>> interface?
>>
>>>>>>
>>>>>> 2. If the first TUN byte contains 0x2X (which is neither IPv4
>>>>>> nor
>>>>>> IPv6) it will end up by setting ip.proto as 0xdd86.
>>>>>> Please explain how this logic will work for non-IP packets in
>>>>>> TUN
>>>>>
>>>>> I see your point. You are correct about this. Thanks for pointing
>>>>> out, may I send correction for this as
>>>>>
>>>>> """
>>>>> -		if (j & (0x40 | 0x60))
>>>>> -			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
>>>>> +		pi.proto = (j == 0x40) ? 0x0008 :
>>>>> +					(j == 0x60) ? 0xdd86 :
>>>>> +					0x00;
>>>>> """
  
Varghese, Vipin May 3, 2018, 5:59 a.m. UTC | #10
Hi Ophir,

Shall I investigate and implement the suggestion from Ferruh to use variable logic?

Thanks
Vipin Varghese

> -----Original Message-----

> From: Yigit, Ferruh

> Sent: Monday, April 23, 2018 9:12 PM

> To: Varghese, Vipin <vipin.varghese@intel.com>; Ophir Munk

> <ophirmu@mellanox.com>; dev@dpdk.org; pascal.mazon@6wind.com;

> Thomas Monjalon <thomas@monjalon.net>; Olga Shern

> <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>

> Subject: Re: [dpdk-dev] [PATCH 1/2] net/tap: add tun support

> 

> On 4/23/2018 1:58 PM, Varghese, Vipin wrote:

> > Hi Ophir,

> >

> > Can you help me with the investigation with the following information?

> > 1) The kernel or distro in which the TAP proto flag set breaks the logic?

> 

> Hi Vipin,

> 

> I guess Ophir's point is not this is broken with some kernels but a valid field

> set wrong for tap, perhaps someone can be using a custom kernel module to

> use those fields, we can't know it.

> 

> Instead of duplicating [rt]x_burst() functions, I suggest creating a variable to

> set if this is tun or tap and set pi.proto only for tun, this will lead less

> comparison for tap and correct proto value.

> 

> > 2) Is the above still valid even after applying the patch '

> https://dpdk.org/dev/patchwork/patch/37986/'?

> 

> I guess his concern is for tap, that some MAC addresses cause wrong

> pi.proto, not for tun which your patch fixes.

> 

> >

> > Note: I am testing with 3.13.0, 4.4.0 and 4.13.0.

> >

> > Thanks

> > Vipin Varghese

> >

> >> -----Original Message-----

> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Varghese, Vipin

> >> Sent: Saturday, April 21, 2018 8:40 PM

> >> To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;

> >> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;

> >> Thomas Monjalon <thomas@monjalon.net>; Olga Shern

> >> <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>

> >> Subject: Re: [dpdk-dev] [PATCH 1/2] net/tap: add tun support

> >>

> >> Hi Ophir,

> >>

> >> <Snip>

> >>

> >>> Hi Vipin,

> >>> I missed your point:

> >>> You claim that TAP should work regardless of any pi.proto values.

> >>> Can you confirm that for ALL kernels versions (past and future)?

> >>

> >> I have tested with 3.13.0 , 4.4.0 with patch fix.

> >>

> >>>

> >>>> -----Original Message-----

> >>>> From: Ophir Munk

> >>>> Sent: Saturday, April 21, 2018 12:49 AM

> >>>> To: Varghese, Vipin <vipin.varghese@intel.com>; dev@dpdk.org;

> >>>> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;

> >>>> Thomas Monjalon <thomas@monjalon.net>; Olga Shern

> >>>> <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>

> >>>> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support

> >>>>

> >>>> Hi Vipin,

> >>>>

> >>>> Please find comments inline.

> >>>>

> >>>>> -----Original Message-----

> >>>>> From: Varghese, Vipin [mailto:vipin.varghese@intel.com]

> >>>>> Sent: Friday, April 13, 2018 6:18 AM

> >>>>> To: Ophir Munk <ophirmu@mellanox.com>; dev@dpdk.org;

> >>>>> pascal.mazon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;

> >>>>> Thomas Monjalon <thomas@monjalon.net>; Olga Shern

> >>>>> <olgas@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>

> >>>>> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support

> >>>>>

> >>

> >> <Snip>

> >>

> >>>>>> 1. Accessing the first byte here assumes it is the first IP

> >>>>>> header byte (layer 3) which is correct for TUN.

> >>>>>> For TAP however the first byte belongs to Ethernet destination

> >>>>>> address (layer 2).

> >>>>>> Please explain how this logic will work for TAP.

> >>>>>

> >>>>> Based on linux code base '/driver/net/tap.c' and '/driver/net/tun.c'

> >>>>> from 3.13. to  4.16,

> >>>>>

> >>>>> Please find my observation below

> >>>>> 1. File: tun.c, function: tun_get_user, check for 'tun->flags &

> >>>>> TUN_TYPE_MASK' is done and if non ip is taken counter 'rx_dropped'

> >>>>> is updated.

> >>>>> 2. File: tap.c, there are no checks for 'tap->flags' for IFF_NO_PI

> >>>>> in rx data path. Counter 'rx_dropped' is updated in

> 'tap_handle_frame'.

> >>>>>

> >>>>

> >>>> I understand that in kernel implementation there is no check for

> >>>> tap->flags in file tap.c, however I think there is a bug in dpdk

> >>>> tap->rte_eth_tap.c

> >>> file.

> >>>> Please find below an example which demonstrates this claim.

> >>>>

> >>>>> Please find my reasoning below

> >>>>> 1. First approach was to have separate function for tap and tun TX

> >>>>> and

> >> RX.

> >>>>> But this will introduce code duplication, hence reworked the code

> >>>>> as

> >>>> above.

> >>>>

> >>>> I agree. Avoiding code duplication is a good approach.

> >>>>

> >>>>> 2. During my internal testing assigning dummy value for protocol

> >>>>> field in TAP packets, did not show a difference in behaviour. May

> >>>>> be there are some specific cases this failing.

> >>>>>

> >>>>> If there difference in behaviour, can please share the same?

> >>>>>

> >>>>

> >>>> Please consider the following example:

> >>>> I am running testpmd with a TAP device, --forward-mode=csum.

> >>>> I am injecting a TCP packet, which is forwarded back (mac addresses

> >>>> swapped) to the sender.

> >>>> Using gdb I set a breakpoint at pmd_tx_burst() in file

> >>>> rte_eth_tap.c

> >>>>

> >>>> Looking at the following code inside pmd_tx_burst():

> >>>>

> >>>> 527                 char *buff_data = rte_pktmbuf_mtod(seg, void *);

> >>>> 528                 j = (*buff_data & 0xf0);

> >>>> 529                 pi.proto = (j == 0x40) ? 0x0008 :

> >>>> 530                                 (j == 0x60) ? 0xdd86 : 0x00;

> >>>>

> >>>> I am printing the first 20 bytes of buff_data in line 527:

> >>>>

> >>>> (gdb) p/x *(unsigned char *)buff_data@20

> >>>> $3 = {0x0, 0x25, 0x88, 0x10, 0x66, 0x2, 0xf4, 0x52, 0x14, 0x7a,

> >>>> 0x59, 0x81, 0x8, 0x0, 0x45, 0x0, 0x4, 0xdf, 0x0, 0x1}

> >>>>

> >>>> The gdb printout refers to:

> >>>> 6 bytes of destination MAC address: 0x0, 0x25, 0x88, 0x10, 0x66,

> >>>> 0x2

> >>>> 6 bytes of source MAC address: 0xf4, 0x52, 0x14, 0x7a, 0x59, 0x81

> >>>> 2 bytes of Ethernet type: 0x8, 0x0 - (IPv4) IP header starting with 0x45,

> ...

> >>>> which is the byte (0x45) that "j" should have looked at

> >>>>

> >>>> In the case of TAP - buff_data starts with the destination MAC

> >>>> address of the sender (0x0, ...).

> >>>> The code in line 528 expects that buff_data would start with an IP

> >>>> header protocol (e.g. 0x45), but it is not the case for TAP.

> >>>> In my case j=0x0 (line 528) which is harmless (as it ends up with

> >>>> setting pi.proto=0x00, which is correct for TAP).

> >>>> However, if the sender had an Intel NIC - the destination MAC

> >>>> address could have started with:

> >>>> $3 = {0x40, 0x25, 0xC2, ...

> >>>> Or-

> >>>> $3 = {0x64, 0xD4, 0xDA, ...

> >>>>

> >>>> as 4025C2 and 64D4DA are reserved prefixes for Intel Ethernet MAC

> >>>> addresses, see: http://www.coffer.com/mac_find/?string=intel

> >>>>

> >>>> In this case pi.proto could end up with 0x0008 or 0xdd86 instead of

> >>>> 0x0 as expected for TAP.

> >>>>

> >>>> I hope that this example clarifies the bug I am referring to.

> >>>>

> >>

> >> Thanks for sharing detailed example overview. But as you mentioned

> >> this will break ' 4025C2' and ' 64D4DA', This will not solve for the correction

> patch  '

> >> https://dpdk.org/dev/patchwork/patch/37986/'.

> >>

> >> Only choice left is separate tx_burst for TAP and TUN PMD, as we do

> >> not want to check PMD type on each call.

> >>

> >> Questions:

> >> 1) Is this ok to split tx_burst and have redundant code?

> >> 2) Does applications transparently send packets coming from Physical

> >> NIC to TAP interface? Does not the application Modifies the DEST MAC

> >> addr to TAP interface?

> >>

> >>>>>>

> >>>>>> 2. If the first TUN byte contains 0x2X (which is neither IPv4 nor

> >>>>>> IPv6) it will end up by setting ip.proto as 0xdd86.

> >>>>>> Please explain how this logic will work for non-IP packets in TUN

> >>>>>

> >>>>> I see your point. You are correct about this. Thanks for pointing

> >>>>> out, may I send correction for this as

> >>>>>

> >>>>> """

> >>>>> -		if (j & (0x40 | 0x60))

> >>>>> -			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;

> >>>>> +		pi.proto = (j == 0x40) ? 0x0008 :

> >>>>> +					(j == 0x60) ? 0xdd86 :

> >>>>> +					0x00;

> >>>>> """
  

Patch

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 3e4f7a8..295db3c 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -42,6 +42,7 @@ 
 /* Linux based path to the TUN device */
 #define TUN_TAP_DEV_PATH        "/dev/net/tun"
 #define DEFAULT_TAP_NAME        "dtap"
+#define DEFAULT_TUN_NAME        "dtun"
 
 #define ETH_TAP_IFACE_ARG       "iface"
 #define ETH_TAP_REMOTE_ARG      "remote"
@@ -53,6 +54,7 @@ 
 #define ETH_TAP_MAC_ARG_FMT     ETH_TAP_MAC_FIXED "|" ETH_TAP_USR_MAC_FMT
 
 static struct rte_vdev_driver pmd_tap_drv;
+static struct rte_vdev_driver pmd_tun_drv;
 
 static const char *valid_arguments[] = {
 	ETH_TAP_IFACE_ARG,
@@ -62,6 +64,10 @@ 
 };
 
 static int tap_unit;
+static int tun_unit;
+
+static int tap_type;
+static char tuntap_name[8];
 
 static volatile uint32_t tap_trigger;	/* Rx trigger */
 
@@ -108,7 +114,7 @@  enum ioctl_mode {
 	 * Do not set IFF_NO_PI as packet information header will be needed
 	 * to check if a received packet has been truncated.
 	 */
-	ifr.ifr_flags = IFF_TAP;
+	ifr.ifr_flags = (tap_type) ? IFF_TAP : IFF_TUN | IFF_POINTOPOINT;
 	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", pmd->name);
 
 	RTE_LOG(DEBUG, PMD, "ifr_name '%s'\n", ifr.ifr_name);
@@ -495,7 +501,7 @@  enum ioctl_mode {
 	for (i = 0; i < nb_pkts; i++) {
 		struct rte_mbuf *mbuf = bufs[num_tx];
 		struct iovec iovecs[mbuf->nb_segs + 1];
-		struct tun_pi pi = { .flags = 0 };
+		struct tun_pi pi = { .flags = 0, .proto = 0x00 };
 		struct rte_mbuf *seg = mbuf;
 		char m_copy[mbuf->data_len];
 		int n;
@@ -505,6 +511,21 @@  enum ioctl_mode {
 		if (rte_pktmbuf_pkt_len(mbuf) > max_size)
 			break;
 
+		/*
+		 * TUN and TAP are created with IFF_NO_PI disabled.
+		 * For TUN PMD this mandatory as fields are used by
+		 * Kernel tun.c to determine whether its IP or non IP
+		 * packets.
+		 *
+		 * The logic fetches the first byte of data from mbuf.
+		 * compares whether its v4 or v6. If none matches default
+		 * value 0x00 is taken for protocol field.
+		 */
+		char *buff_data = rte_pktmbuf_mtod(seg, void *);
+		j = (*buff_data & 0xf0);
+		if (j & (0x40 | 0x60))
+			pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
+
 		iovecs[0].iov_base = &pi;
 		iovecs[0].iov_len = sizeof(pi);
 		for (j = 1; j <= mbuf->nb_segs; j++) {
@@ -1403,10 +1424,12 @@  enum ioctl_mode {
 		pmd->txq[i].fd = -1;
 	}
 
-	if (is_zero_ether_addr(mac_addr))
-		eth_random_addr((uint8_t *)&pmd->eth_addr);
-	else
-		rte_memcpy(&pmd->eth_addr, mac_addr, sizeof(mac_addr));
+	if (tap_type) {
+		if (is_zero_ether_addr(mac_addr))
+			eth_random_addr((uint8_t *)&pmd->eth_addr);
+		else
+			rte_memcpy(&pmd->eth_addr, mac_addr, sizeof(mac_addr));
+	}
 
 	/* Immediately create the netdevice (this will create the 1st queue). */
 	/* rx queue */
@@ -1420,11 +1443,14 @@  enum ioctl_mode {
 	if (tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE) < 0)
 		goto error_exit;
 
-	memset(&ifr, 0, sizeof(struct ifreq));
-	ifr.ifr_hwaddr.sa_family = AF_LOCAL;
-	rte_memcpy(ifr.ifr_hwaddr.sa_data, &pmd->eth_addr, ETHER_ADDR_LEN);
-	if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0)
-		goto error_exit;
+	if (tap_type) {
+		memset(&ifr, 0, sizeof(struct ifreq));
+		ifr.ifr_hwaddr.sa_family = AF_LOCAL;
+		rte_memcpy(ifr.ifr_hwaddr.sa_data, &pmd->eth_addr,
+				ETHER_ADDR_LEN);
+		if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0)
+			goto error_exit;
+	}
 
 	/*
 	 * Set up everything related to rte_flow:
@@ -1621,6 +1647,62 @@  static int parse_user_mac(struct ether_addr *user_mac,
 	return -1;
 }
 
+/*
+ * Open a TUN interface device. TUN PMD
+ * 1) sets tap_type as false
+ * 2) intakes iface as argument.
+ * 3) as interface is virtual set speed to 10G
+ */
+static int
+rte_pmd_tun_probe(struct rte_vdev_device *dev)
+{
+	const char *name, *params;
+	int ret;
+	struct rte_kvargs *kvlist = NULL;
+	char tun_name[RTE_ETH_NAME_MAX_LEN];
+	char remote_iface[RTE_ETH_NAME_MAX_LEN];
+
+	tap_type = 0;
+	strcpy(tuntap_name, "TUN");
+
+	name = rte_vdev_device_name(dev);
+	params = rte_vdev_device_args(dev);
+	memset(remote_iface, 0, RTE_ETH_NAME_MAX_LEN);
+
+	if (params && (params[0] != '\0')) {
+		RTE_LOG(DEBUG, PMD, "parameters (%s)\n", params);
+
+		kvlist = rte_kvargs_parse(params, valid_arguments);
+		if (kvlist) {
+			if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
+				ret = rte_kvargs_process(kvlist,
+					ETH_TAP_IFACE_ARG,
+					&set_interface_name,
+					tun_name);
+
+				if (ret == -1)
+					goto leave;
+			}
+		}
+	}
+	pmd_link.link_speed = ETH_SPEED_NUM_10G;
+
+	RTE_LOG(NOTICE, PMD, "Initializing pmd_tun for %s as %s\n",
+		name, tun_name);
+
+	ret = eth_dev_tap_create(dev, tun_name, remote_iface, 0);
+
+leave:
+	if (ret == -1) {
+		RTE_LOG(ERR, PMD, "Failed to create pmd for %s as %s\n",
+			name, tun_name);
+		tun_unit--; /* Restore the unit number */
+	}
+	rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 /* Open a TAP interface device.
  */
 static int
@@ -1634,6 +1716,9 @@  static int parse_user_mac(struct ether_addr *user_mac,
 	char remote_iface[RTE_ETH_NAME_MAX_LEN];
 	struct ether_addr user_mac = { .addr_bytes = {0} };
 
+	tap_type = 1;
+	strcpy(tuntap_name, "TAP");
+
 	name = rte_vdev_device_name(dev);
 	params = rte_vdev_device_args(dev);
 
@@ -1693,7 +1778,7 @@  static int parse_user_mac(struct ether_addr *user_mac,
 	return ret;
 }
 
-/* detach a TAP device.
+/* detach a TUNTAP device.
  */
 static int
 rte_pmd_tap_remove(struct rte_vdev_device *dev)
@@ -1736,12 +1821,20 @@  static int parse_user_mac(struct ether_addr *user_mac,
 	return 0;
 }
 
+static struct rte_vdev_driver pmd_tun_drv = {
+	.probe = rte_pmd_tun_probe,
+	.remove = rte_pmd_tap_remove,
+};
+
 static struct rte_vdev_driver pmd_tap_drv = {
 	.probe = rte_pmd_tap_probe,
 	.remove = rte_pmd_tap_remove,
 };
 RTE_PMD_REGISTER_VDEV(net_tap, pmd_tap_drv);
+RTE_PMD_REGISTER_VDEV(net_tun, pmd_tun_drv);
 RTE_PMD_REGISTER_ALIAS(net_tap, eth_tap);
+RTE_PMD_REGISTER_PARAM_STRING(net_tun,
+			      ETH_TAP_IFACE_ARG "=<string> ");
 RTE_PMD_REGISTER_PARAM_STRING(net_tap,
 			      ETH_TAP_IFACE_ARG "=<string> "
 			      ETH_TAP_MAC_ARG "=" ETH_TAP_MAC_ARG_FMT " "