[v2,0/3] ethdev: configure SR-IOV VF from host
mbox series

Message ID 20191029185051.32203-1-thomas@monjalon.net
Headers show
Series
  • ethdev: configure SR-IOV VF from host
Related show

Message

Thomas Monjalon Oct. 29, 2019, 6:50 p.m. UTC
In a virtual environment, the network controller may have to configure
some SR-IOV VF parameters for security reasons.

When the PF (host port) is driven by DPDK (OVS-DPDK case),
we face two different cases:
    - driver is bifurcated (Mellanox case),
      so the VF can be configured via the kernel.
    - driver is on top of UIO or VFIO, so DPDK API is required,
      and PMD-specific APIs were used.
This new generic API will avoid vendors fragmentation.

Some PMD-specific API could migrate to this generic model.
As an example, the default MAC address configuration is demonstrated
for a VF mapped to mlx5 representor port.

As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.

I am sorry I had not send a patch since proposing a RFC in August.
(I gave priority to the summit and the -rc1 release)


Thomas Monjalon (3):
  ethdev: identify SR-IOV VF from host
  ethdev: set VF MAC address from host
  net/mlx5: set VF MAC address from host

 drivers/net/mlx5/mlx5.c                  |  6 +++
 drivers/net/mlx5/mlx5.h                  |  1 +
 drivers/net/mlx5/mlx5_mac.c              | 19 ++++++++
 lib/librte_ethdev/rte_ethdev.c           | 55 +++++++++++++++++++++---
 lib/librte_ethdev/rte_ethdev.h           | 38 ++++++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      |  1 +
 lib/librte_ethdev/rte_ethdev_version.map |  1 +
 7 files changed, 114 insertions(+), 7 deletions(-)

Comments

Jerin Jacob Oct. 30, 2019, 4:08 a.m. UTC | #1
On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> In a virtual environment, the network controller may have to configure
> some SR-IOV VF parameters for security reasons.

Just to understand, Could you explain more details/examples for
security reasons?

>
> When the PF (host port) is driven by DPDK (OVS-DPDK case),
> we face two different cases:
>     - driver is bifurcated (Mellanox case),
>       so the VF can be configured via the kernel.
>     - driver is on top of UIO or VFIO, so DPDK API is required,

Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
the PF device.
It is only allowed through igb-uio out of tree driver without iommu support.


>       and PMD-specific APIs were used.
> This new generic API will avoid vendors fragmentation.

The API is good. But I have concerns about the vendor implementation
of this API.
It can support only vendors with bifurcated driver(Mellanox case).
or using igb_uio(non iommu case) but not the devices with VFIO(Which
is the first-class citizen).

All the control plane control stuff to replace Linux with "port
representor" logic
will be of the mercy  of an "out of tree" driver either with igb_uio
or http://patches.dpdk.org/patch/58810/

I am _not against_ on DPDK supports port representor or controlling
netdev VF traffic,
but if we have taken that path then DPDK should have the
infrastructure to support for
all driver models like VFIO(Addressed in [1])

I would have this question when DPDK starts supporting port
representor(but I was not
aware that kernel security issue on netdev ports controlled by DPDK in
non-bifurcated driver case
and concise effort block such scheme by kernel [2])


 [1]
http://patches.dpdk.org/patch/58810/
[2]
https://patchwork.kernel.org/patch/10522381/



>
> Some PMD-specific API could migrate to this generic model.
> As an example, the default MAC address configuration is demonstrated
> for a VF mapped to mlx5 representor port.
>
> As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.
>
> I am sorry I had not send a patch since proposing a RFC in August.
> (I gave priority to the summit and the -rc1 release)
>
>
> Thomas Monjalon (3):
>   ethdev: identify SR-IOV VF from host
>   ethdev: set VF MAC address from host
>   net/mlx5: set VF MAC address from host
>
>  drivers/net/mlx5/mlx5.c                  |  6 +++
>  drivers/net/mlx5/mlx5.h                  |  1 +
>  drivers/net/mlx5/mlx5_mac.c              | 19 ++++++++
>  lib/librte_ethdev/rte_ethdev.c           | 55 +++++++++++++++++++++---
>  lib/librte_ethdev/rte_ethdev.h           | 38 ++++++++++++++++
>  lib/librte_ethdev/rte_ethdev_core.h      |  1 +
>  lib/librte_ethdev/rte_ethdev_version.map |  1 +
>  7 files changed, 114 insertions(+), 7 deletions(-)
>
> --
> 2.23.0
>
Shahaf Shuler Oct. 30, 2019, 7:22 a.m. UTC | #2
Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
> 
> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> >
> > In a virtual environment, the network controller may have to configure
> > some SR-IOV VF parameters for security reasons.
> 
> Just to understand, Could you explain more details/examples for security
> reasons?
> 
> >
> > When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
> > different cases:
> >     - driver is bifurcated (Mellanox case),
> >       so the VF can be configured via the kernel.
> >     - driver is on top of UIO or VFIO, so DPDK API is required,
> 
> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
> PF device.
> It is only allowed through igb-uio out of tree driver without iommu support.

Per my understanding Thomas proposal is not to create the VFs from the PF device. it is to configure their network attributes from the PF after they have been created.

> 
> 
> >       and PMD-specific APIs were used.
> > This new generic API will avoid vendors fragmentation.
> 
> The API is good. But I have concerns about the vendor implementation of
> this API.
> It can support only vendors with bifurcated driver(Mellanox case).
> or using igb_uio(non iommu case) but not the devices with VFIO(Which is the
> first-class citizen).
> 
> All the control plane control stuff to replace Linux with "port representor"
> logic will be of the mercy  of an "out of tree" driver either with igb_uio or
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
> es.dpdk.org%2Fpatch%2F58810%2F&amp;data=02%7C01%7Cshahafs%40mel
> lanox.com%7C8da6ffacecde48af24f608d75ceee28c%7Ca652971c7d2e4d9ba6
> a4d149256f461b%7C0%7C0%7C637080053397844419&amp;sdata=sAIRqTnAN
> G8lIb2eYhvcylU%2F6%2F81eXPDeGbnrUdMnis%3D&amp;reserved=0

I am not sure I follow. 
Device that supports representor should enable the HV to configure their macs. It is the best if it can allow it using the in-tree drivers (VFIO, Mellanox bifurcated..) by using, for example, so device registers on the device bar. 
Otherwise such vendor will need to recommend its customers to use other, out of tree, driver to get the needed functionality to enable switchdev and representors. 

> 
> I am _not against_ on DPDK supports port representor or controlling netdev
> VF traffic, but if we have taken that path then DPDK should have the
> infrastructure to support for all driver models like VFIO(Addressed in [1])
> 
> I would have this question when DPDK starts supporting port
> representor(but I was not aware that kernel security issue on netdev ports
> controlled by DPDK in non-bifurcated driver case and concise effort block
> such scheme by kernel [2])
> 
> 
>  [1]
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatch
> es.dpdk.org%2Fpatch%2F58810%2F&amp;data=02%7C01%7Cshahafs%40mel
> lanox.com%7C8da6ffacecde48af24f608d75ceee28c%7Ca652971c7d2e4d9ba6
> a4d149256f461b%7C0%7C0%7C637080053397844419&amp;sdata=sAIRqTnAN
> G8lIb2eYhvcylU%2F6%2F81eXPDeGbnrUdMnis%3D&amp;reserved=0
> [2]
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatch
> work.kernel.org%2Fpatch%2F10522381%2F&amp;data=02%7C01%7Cshahafs
> %40mellanox.com%7C8da6ffacecde48af24f608d75ceee28c%7Ca652971c7d2e
> 4d9ba6a4d149256f461b%7C0%7C0%7C637080053397844419&amp;sdata=fyEo
> fHJQM51L8ssvLNyaLwrsCK8bBJiuPT%2FgMje3QxE%3D&amp;reserved=0
> 
> 
> 
> >
> > Some PMD-specific API could migrate to this generic model.
> > As an example, the default MAC address configuration is demonstrated
> > for a VF mapped to mlx5 representor port.
> >
> > As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.
> >
> > I am sorry I had not send a patch since proposing a RFC in August.
> > (I gave priority to the summit and the -rc1 release)
> >
> >
> > Thomas Monjalon (3):
> >   ethdev: identify SR-IOV VF from host
> >   ethdev: set VF MAC address from host
> >   net/mlx5: set VF MAC address from host
> >
> >  drivers/net/mlx5/mlx5.c                  |  6 +++
> >  drivers/net/mlx5/mlx5.h                  |  1 +
> >  drivers/net/mlx5/mlx5_mac.c              | 19 ++++++++
> >  lib/librte_ethdev/rte_ethdev.c           | 55 +++++++++++++++++++++---
> >  lib/librte_ethdev/rte_ethdev.h           | 38 ++++++++++++++++
> >  lib/librte_ethdev/rte_ethdev_core.h      |  1 +
> >  lib/librte_ethdev/rte_ethdev_version.map |  1 +
> >  7 files changed, 114 insertions(+), 7 deletions(-)
> >
> > --
> > 2.23.0
> >
Thomas Monjalon Oct. 30, 2019, 8:56 a.m. UTC | #3
30/10/2019 05:08, Jerin Jacob:
> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > In a virtual environment, the network controller may have to configure
> > some SR-IOV VF parameters for security reasons.
> 
> Just to understand, Could you explain more details/examples for
> security reasons?

Examples are setting the MAC address or the promiscuous mode.
These settings should be decided by the hypervisor,
and not freely set by the VM.

> > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > we face two different cases:
> >     - driver is bifurcated (Mellanox case),
> >       so the VF can be configured via the kernel.
> >     - driver is on top of UIO or VFIO, so DPDK API is required,
> 
> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> the PF device.
> It is only allowed through igb-uio out of tree driver without iommu support.

Not sure I said the contrary.
igb_uio and vfio_pf are 2 implementations of UIO and VFIO.

> >       and PMD-specific APIs were used.
> > This new generic API will avoid vendors fragmentation.
> 
> The API is good. But I have concerns about the vendor implementation
> of this API.
> It can support only vendors with bifurcated driver(Mellanox case).
> or using igb_uio(non iommu case) but not the devices with VFIO(Which
> is the first-class citizen).

Why not? (see above)
The API is agnostic to the kernel driver in use.

> All the control plane control stuff to replace Linux with "port
> representor" logic
> will be of the mercy  of an "out of tree" driver either with igb_uio
> or http://patches.dpdk.org/patch/58810/
> 
> I am _not against_ on DPDK supports port representor or controlling
> netdev VF traffic,
> but if we have taken that path then DPDK should have the
> infrastructure to support for
> all driver models like VFIO(Addressed in [1])
> 
> I would have this question when DPDK starts supporting port
> representor(but I was not
> aware that kernel security issue on netdev ports controlled by DPDK in
> non-bifurcated driver case
> and concise effort block such scheme by kernel [2])
> 
> 
>  [1]
> http://patches.dpdk.org/patch/58810/
> [2]
> https://patchwork.kernel.org/patch/10522381/

I feel you are using this thread to promote your vfio_pf implementation.
But this API and the kernel module are orthogonals.
Please can we focus on the DPDK API in this thread,
and talk about VFIO implementation in the other thread?
Jerin Jacob Oct. 30, 2019, 9:15 a.m. UTC | #4
On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/10/2019 05:08, Jerin Jacob:
> > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > In a virtual environment, the network controller may have to configure
> > > some SR-IOV VF parameters for security reasons.
> >
> > Just to understand, Could you explain more details/examples for
> > security reasons?
>
> Examples are setting the MAC address or the promiscuous mode.
> These settings should be decided by the hypervisor,
> and not freely set by the VM.

What is hypervisor here, rte_flow based DPDK application over using
port representor?


>
> > > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > > we face two different cases:
> > >     - driver is bifurcated (Mellanox case),
> > >       so the VF can be configured via the kernel.
> > >     - driver is on top of UIO or VFIO, so DPDK API is required,
> >
> > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> > the PF device.
> > It is only allowed through igb-uio out of tree driver without iommu support.
>
> Not sure I said the contrary.
> igb_uio and vfio_pf are 2 implementations of UIO and VFIO.

Yes. I am saying without out of tree module it is not possible.


>
> > >       and PMD-specific APIs were used.
> > > This new generic API will avoid vendors fragmentation.
> >
> > The API is good. But I have concerns about the vendor implementation
> > of this API.
> > It can support only vendors with bifurcated driver(Mellanox case).
> > or using igb_uio(non iommu case) but not the devices with VFIO(Which
> > is the first-class citizen).
>
> Why not? (see above)
> The API is agnostic to the kernel driver in use.

My question is how do you enable in DPDK with VFIO if DPDK is giving
this feature?


>
> > All the control plane control stuff to replace Linux with "port
> > representor" logic
> > will be of the mercy  of an "out of tree" driver either with igb_uio
> > or http://patches.dpdk.org/patch/58810/
> >
> > I am _not against_ on DPDK supports port representor or controlling
> > netdev VF traffic,
> > but if we have taken that path then DPDK should have the
> > infrastructure to support for
> > all driver models like VFIO(Addressed in [1])
> >
> > I would have this question when DPDK starts supporting port
> > representor(but I was not
> > aware that kernel security issue on netdev ports controlled by DPDK in
> > non-bifurcated driver case
> > and concise effort block such scheme by kernel [2])
> >
> >
> >  [1]
> > http://patches.dpdk.org/patch/58810/
> > [2]
> > https://patchwork.kernel.org/patch/10522381/
>
> I feel you are using this thread to promote your vfio_pf implementation.

Yes. I am. Because, I need to think, How I can enable this API with VFIO.
Otherwise, I can not implement this API.

> But this API and the kernel module are orthogonals.
> Please can we focus on the DPDK API in this thread,
> and talk about VFIO implementation in the other thread?

Yes. My concerns are we keep adding APIs without thinking about how it
can be implemented
in the overall scheme of things. Just API is not enough.


>
>
Jerin Jacob Oct. 30, 2019, 9:24 a.m. UTC | #5
On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
> > Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> > host
> >
> > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > >
> > > In a virtual environment, the network controller may have to configure
> > > some SR-IOV VF parameters for security reasons.
> >
> > Just to understand, Could you explain more details/examples for security
> > reasons?
> >
> > >
> > > When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
> > > different cases:
> > >     - driver is bifurcated (Mellanox case),
> > >       so the VF can be configured via the kernel.
> > >     - driver is on top of UIO or VFIO, so DPDK API is required,
> >
> > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
> > PF device.
> > It is only allowed through igb-uio out of tree driver without iommu support.
>
> Per my understanding Thomas proposal is not to create the VFs from the PF device. it is to configure their network attributes from the PF after they have been created.

Yes. My question is without creating the VF, How do you control them?
Ilya Maximets Oct. 30, 2019, 3:07 p.m. UTC | #6
On 29.10.2019 19:50, Thomas Monjalon wrote:
> In a virtual environment, the network controller may have to configure
> some SR-IOV VF parameters for security reasons.
> 
> When the PF (host port) is driven by DPDK (OVS-DPDK case),
> we face two different cases:
>      - driver is bifurcated (Mellanox case),
>        so the VF can be configured via the kernel.
>      - driver is on top of UIO or VFIO, so DPDK API is required,
>        and PMD-specific APIs were used.

So, what is wrong with setting VF mac via the representor port as
it done now?  From our previous discussion, I thought that we concluded
that is enough to have current API, i.e. just call set_default_mac()
for a representor port and this will lead to setting mac for VF.
This is how it's implemented in Linux kernel and this is how it's
implemented in current DPDK drivers that supports setting mac for
the representor.

The only use case for this new API is to be able to control mac of
the representor itself, which doesn't make much sense. At least there
are only hypothetical use cases. And once again, Linux kernel doesn't
support this behavior.

> This new generic API will avoid vendors fragmentation.

I don't see the fragmentation. Right now you can set VF mac from DPDK
by calling set_default_mac() for representor.  This API exists for all
vendors. Not implemented for Intel, but new API is not implemented too.

The this is that this new API will produce conceptual fragmentation
between DPDK and the Linux kernel, because to do the same thing you'll
have to use different ways. I mean, to change mac of VF in kernel you
need to set mac to the representor, but in DPDK changing setting mac to
representor will lead to changing the mac of the representor itself,
not the VF. This will be really confusing for users.

> 
> Some PMD-specific API could migrate to this generic model.
> As an example, the default MAC address configuration is demonstrated
> for a VF mapped to mlx5 representor port.
> 
> As it breaks the ABI, I propose to merge this API in DPDK 19.11-rc2.
> 
> I am sorry I had not send a patch since proposing a RFC in August.
> (I gave priority to the summit and the -rc1 release)
> 
> 
> Thomas Monjalon (3):
>    ethdev: identify SR-IOV VF from host
>    ethdev: set VF MAC address from host
>    net/mlx5: set VF MAC address from host
> 
>   drivers/net/mlx5/mlx5.c                  |  6 +++
>   drivers/net/mlx5/mlx5.h                  |  1 +
>   drivers/net/mlx5/mlx5_mac.c              | 19 ++++++++
>   lib/librte_ethdev/rte_ethdev.c           | 55 +++++++++++++++++++++---
>   lib/librte_ethdev/rte_ethdev.h           | 38 ++++++++++++++++
>   lib/librte_ethdev/rte_ethdev_core.h      |  1 +
>   lib/librte_ethdev/rte_ethdev_version.map |  1 +
>   7 files changed, 114 insertions(+), 7 deletions(-)
>
Thomas Monjalon Oct. 30, 2019, 3:49 p.m. UTC | #7
30/10/2019 16:07, Ilya Maximets:
> On 29.10.2019 19:50, Thomas Monjalon wrote:
> > In a virtual environment, the network controller may have to configure
> > some SR-IOV VF parameters for security reasons.
> > 
> > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > we face two different cases:
> >      - driver is bifurcated (Mellanox case),
> >        so the VF can be configured via the kernel.
> >      - driver is on top of UIO or VFIO, so DPDK API is required,
> >        and PMD-specific APIs were used.
> 
> So, what is wrong with setting VF mac via the representor port as
> it done now?  From our previous discussion, I thought that we concluded
> that is enough to have current API, i.e. just call set_default_mac()
> for a representor port and this will lead to setting mac for VF.
> This is how it's implemented in Linux kernel and this is how it's
> implemented in current DPDK drivers that supports setting mac for
> the representor.

I don't know what is done in the Linux kernel.
In DPDK, setting the MAC of the representor is really setting
the MAC of the representor. Is it crazy?

> The only use case for this new API is to be able to control mac of
> the representor itself, which doesn't make much sense. At least there
> are only hypothetical use cases. And once again, Linux kernel doesn't
> support this behavior.

I think it is sane to be able to set different MAC addresses
for the representor and the VF.

> > This new generic API will avoid vendors fragmentation.
> 
> I don't see the fragmentation. Right now you can set VF mac from DPDK
> by calling set_default_mac() for representor.  This API exists for all
> vendors. Not implemented for Intel, but new API is not implemented too.

No, the current situation is the following:
	- for mlx5, VF MAC can be configured with iproute2 or netlink
	- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
	- for i40e, rte_pmd_i40e_set_vf_mac_addr()
	- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()

> The this is that this new API will produce conceptual fragmentation
> between DPDK and the Linux kernel, because to do the same thing you'll
> have to use different ways. I mean, to change mac of VF in kernel you
> need to set mac to the representor, but in DPDK changing setting mac to
> representor will lead to changing the mac of the representor itself,
> not the VF. This will be really confusing for users.

I am not responsible of the choices in Linux.
But I agree it would be interesting to check the reason of such decision.
Rony, please could you explain?
Ilya Maximets Oct. 30, 2019, 4:09 p.m. UTC | #8
On 30.10.2019 16:49, Thomas Monjalon wrote:
> 30/10/2019 16:07, Ilya Maximets:
>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>> In a virtual environment, the network controller may have to configure
>>> some SR-IOV VF parameters for security reasons.
>>>
>>> When the PF (host port) is driven by DPDK (OVS-DPDK case),
>>> we face two different cases:
>>>       - driver is bifurcated (Mellanox case),
>>>         so the VF can be configured via the kernel.
>>>       - driver is on top of UIO or VFIO, so DPDK API is required,
>>>         and PMD-specific APIs were used.
>>
>> So, what is wrong with setting VF mac via the representor port as
>> it done now?  From our previous discussion, I thought that we concluded
>> that is enough to have current API, i.e. just call set_default_mac()
>> for a representor port and this will lead to setting mac for VF.
>> This is how it's implemented in Linux kernel and this is how it's
>> implemented in current DPDK drivers that supports setting mac for
>> the representor.
> 
> I don't know what is done in the Linux kernel.
> In DPDK, setting the MAC of the representor is really setting
> the MAC of the representor. Is it crazy?

Kind of. And no, it doesn't work this way in DPDK.
Just look at the i40e driver:

325 static int
326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
327         struct ether_addr *mac_addr)
328 {
329     struct i40e_vf_representor *representor = ethdev->data->dev_private;
330
331     return rte_pmd_i40e_set_vf_mac_addr(
332         representor->adapter->eth_dev->data->port_id,
333         representor->vf_id, mac_addr);
334 }
....
423 static const struct eth_dev_ops i40e_representor_dev_ops = {
     <...>
445     .mac_addr_set         = i40e_vf_representor_mac_addr_set,


It really calls the function to set VF mac address.

And for MLX drivers it's the same.
MLX driver call netlink to set representor MAC, but netlink is in
kernel and this call inside the kernel translates to the setting
of mac address of the VF.

Am I missing something?

> 
>> The only use case for this new API is to be able to control mac of
>> the representor itself, which doesn't make much sense. At least there
>> are only hypothetical use cases. And once again, Linux kernel doesn't
>> support this behavior.
> 
> I think it is sane to be able to set different MAC addresses
> for the representor and the VF.
> 
>>> This new generic API will avoid vendors fragmentation.
>>
>> I don't see the fragmentation. Right now you can set VF mac from DPDK
>> by calling set_default_mac() for representor.  This API exists for all
>> vendors. Not implemented for Intel, but new API is not implemented too.
> 
> No, the current situation is the following:
> 	- for mlx5, VF MAC can be configured with iproute2 or netlink
> 	- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
> 	- for i40e, rte_pmd_i40e_set_vf_mac_addr()
> 	- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
> 
>> The this is that this new API will produce conceptual fragmentation
>> between DPDK and the Linux kernel, because to do the same thing you'll
>> have to use different ways. I mean, to change mac of VF in kernel you
>> need to set mac to the representor, but in DPDK changing setting mac to
>> representor will lead to changing the mac of the representor itself,
>> not the VF. This will be really confusing for users.
> 
> I am not responsible of the choices in Linux.
> But I agree it would be interesting to check the reason of such decision.
> Rony, please could you explain?
> 
>
Thomas Monjalon Oct. 30, 2019, 9:42 p.m. UTC | #9
30/10/2019 17:09, Ilya Maximets:
> On 30.10.2019 16:49, Thomas Monjalon wrote:
> > 30/10/2019 16:07, Ilya Maximets:
> >> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>> In a virtual environment, the network controller may have to configure
> >>> some SR-IOV VF parameters for security reasons.
> >>>
> >>> When the PF (host port) is driven by DPDK (OVS-DPDK case),
> >>> we face two different cases:
> >>>       - driver is bifurcated (Mellanox case),
> >>>         so the VF can be configured via the kernel.
> >>>       - driver is on top of UIO or VFIO, so DPDK API is required,
> >>>         and PMD-specific APIs were used.
> >>
> >> So, what is wrong with setting VF mac via the representor port as
> >> it done now?  From our previous discussion, I thought that we concluded
> >> that is enough to have current API, i.e. just call set_default_mac()
> >> for a representor port and this will lead to setting mac for VF.
> >> This is how it's implemented in Linux kernel and this is how it's
> >> implemented in current DPDK drivers that supports setting mac for
> >> the representor.
> > 
> > I don't know what is done in the Linux kernel.
> > In DPDK, setting the MAC of the representor is really setting
> > the MAC of the representor. Is it crazy?
> 
> Kind of. And no, it doesn't work this way in DPDK.
> Just look at the i40e driver:
> 
> 325 static int
> 326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
> 327         struct ether_addr *mac_addr)
> 328 {
> 329     struct i40e_vf_representor *representor = ethdev->data->dev_private;
> 330
> 331     return rte_pmd_i40e_set_vf_mac_addr(
> 332         representor->adapter->eth_dev->data->port_id,
> 333         representor->vf_id, mac_addr);
> 334 }
> ....
> 423 static const struct eth_dev_ops i40e_representor_dev_ops = {
>      <...>
> 445     .mac_addr_set         = i40e_vf_representor_mac_addr_set,
> 
> 
> It really calls the function to set VF mac address.

Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling
rte_pmd_i40e_set_vf_mac_addr().

> And for MLX drivers it's the same.
> MLX driver call netlink to set representor MAC, but netlink is in
> kernel and this call inside the kernel translates to the setting
> of mac address of the VF.
> 
> Am I missing something?

I am not sure about this kernel translation. At least not in mlx5.
Setting MAC address of a VF representor seems not supported on mlx5.
But there is a specific netlink request to set a VF MAC address:
	ip link set <PF> vf <VF> mac <MAC>


> >> The only use case for this new API is to be able to control mac of
> >> the representor itself, which doesn't make much sense. At least there
> >> are only hypothetical use cases. And once again, Linux kernel doesn't
> >> support this behavior.
> > 
> > I think it is sane to be able to set different MAC addresses
> > for the representor and the VF.

Let me explain better my thoughts.
In the switchdev design, VF and uplink ports are all connected
together via a switch.
The representors are mirrors of those switch ports.
So a VF representor port is supposed to mirror the switch port
where a VF is connected to. It is different of the VF itself.

This is a drawing of my understanding of switchdev design:

VF1 ------ VF1 rep /--------\
                   | switch | uplink rep ------ uplink ------ wire
VF2 ------ VF2 rep \--------/    (PF)

Of course, there can be more VF and uplink ports.

With some switch-aware protocols, it might be interesting to have
different MAC addresses on a VF and its representor.
And more generally, I imagine that the config of a VF representor
could be different of the config of a VF.


> >>> This new generic API will avoid vendors fragmentation.
> >>
> >> I don't see the fragmentation. Right now you can set VF mac from DPDK
> >> by calling set_default_mac() for representor.  This API exists for all
> >> vendors. Not implemented for Intel, but new API is not implemented too.
> > 
> > No, the current situation is the following:
> > 	- for mlx5, VF MAC can be configured with iproute2 or netlink
> > 	- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
> > 	- for i40e, rte_pmd_i40e_set_vf_mac_addr()
> > 	- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()

Thanks to Ilya's comment, this is an update of the DPDK situation:
	- for mlx5, VF MAC can be configured with iproute2 or netlink
	  and rte_eth_dev_default_mac_addr_set(rep) is not supported
	- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
	  and rte_eth_dev_default_mac_addr_set(rep) does the same
	- for i40e, rte_pmd_i40e_set_vf_mac_addr()
	  and rte_eth_dev_default_mac_addr_set(rep) does the same
	- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
	  and no representor

If we consider what Intel did, i.e. configure VF in place of representor
for some operations, there are two drawbacks:
- confusing that some ops apply to representor, others apply to VF
- some ops are not possible on representor (because targetted to VF)

I still feel that the addition of one single bit in the port ID
is an elegant solution to target either the VF or its representor.


> >> The this is that this new API will produce conceptual fragmentation
> >> between DPDK and the Linux kernel, because to do the same thing you'll
> >> have to use different ways. I mean, to change mac of VF in kernel you
> >> need to set mac to the representor, but in DPDK changing setting mac to
> >> representor will lead to changing the mac of the representor itself,
> >> not the VF. This will be really confusing for users.
> > 
> > I am not responsible of the choices in Linux.
> > But I agree it would be interesting to check the reason of such decision.
> > Rony, please could you explain?

I looked at few Linux drivers:

	bnxt_vf_rep_netdev_ops has no op to set MAC
	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF

	lio_vf_rep_ndev_ops has no op to set MAC
	lionetdevops.ndo_set_vf_mac = set VF MAC from PF

	mlx5e_netdev_ops_rep has no op to set MAC
	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF

	nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC
	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor
	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF

There is a big chance that the behaviour is not standardized in Linux
(as usual). So it is already confusing for users of Linux.
Thomas Monjalon Nov. 1, 2019, 12:24 a.m. UTC | #10
30/10/2019 10:24, Jerin Jacob:
> On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
> > Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
> > > Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> > > host
> > >
> > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> > > <thomas@monjalon.net> wrote:
> > > >
> > > > In a virtual environment, the network controller may have to configure
> > > > some SR-IOV VF parameters for security reasons.
> > >
> > > Just to understand, Could you explain more details/examples for security
> > > reasons?
> > >
> > > >
> > > > When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
> > > > different cases:
> > > >     - driver is bifurcated (Mellanox case),
> > > >       so the VF can be configured via the kernel.
> > > >     - driver is on top of UIO or VFIO, so DPDK API is required,
> > >
> > > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
> > > PF device.
> > > It is only allowed through igb-uio out of tree driver without iommu support.
> >
> > Per my understanding Thomas proposal is not to create the VFs
> > from the PF device. it is to configure their network attributes
> > from the PF after they have been created.
> 
> Yes. My question is without creating the VF, How do you control them?

We can create the VF via the kernel PF driver, before binding the PF to VFIO.
Thomas Monjalon Nov. 1, 2019, 12:33 a.m. UTC | #11
30/10/2019 10:15, Jerin Jacob:
> On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 30/10/2019 05:08, Jerin Jacob:
> > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >
> > > > In a virtual environment, the network controller may have to configure
> > > > some SR-IOV VF parameters for security reasons.
> > >
> > > Just to understand, Could you explain more details/examples for
> > > security reasons?
> >
> > Examples are setting the MAC address or the promiscuous mode.
> > These settings should be decided by the hypervisor,
> > and not freely set by the VM.
> 
> What is hypervisor here, rte_flow based DPDK application over using
> port representor?

Yes, something like that. An example is OpenStack/OVS.

> > > > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > > > we face two different cases:
> > > >     - driver is bifurcated (Mellanox case),
> > > >       so the VF can be configured via the kernel.
> > > >     - driver is on top of UIO or VFIO, so DPDK API is required,
> > >
> > > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> > > the PF device.
> > > It is only allowed through igb-uio out of tree driver without iommu support.
> >
> > Not sure I said the contrary.
> > igb_uio and vfio_pf are 2 implementations of UIO and VFIO.
> 
> Yes. I am saying without out of tree module it is not possible.
> 
> > > >       and PMD-specific APIs were used.
> > > > This new generic API will avoid vendors fragmentation.
> > >
> > > The API is good. But I have concerns about the vendor implementation
> > > of this API.
> > > It can support only vendors with bifurcated driver(Mellanox case).
> > > or using igb_uio(non iommu case) but not the devices with VFIO(Which
> > > is the first-class citizen).
> >
> > Why not? (see above)
> > The API is agnostic to the kernel driver in use.
> 
> My question is how do you enable in DPDK with VFIO if DPDK is giving
> this feature?

I didn't get your question.
If it is the same question as before, I think you can create the VF
before binding the PF to VFIO.

> > > All the control plane control stuff to replace Linux with "port
> > > representor" logic
> > > will be of the mercy  of an "out of tree" driver either with igb_uio
> > > or http://patches.dpdk.org/patch/58810/
> > >
> > > I am _not against_ on DPDK supports port representor or controlling
> > > netdev VF traffic,
> > > but if we have taken that path then DPDK should have the
> > > infrastructure to support for
> > > all driver models like VFIO(Addressed in [1])
> > >
> > > I would have this question when DPDK starts supporting port
> > > representor(but I was not
> > > aware that kernel security issue on netdev ports controlled by DPDK in
> > > non-bifurcated driver case
> > > and concise effort block such scheme by kernel [2])
> > >
> > >
> > >  [1]
> > > http://patches.dpdk.org/patch/58810/
> > > [2]
> > > https://patchwork.kernel.org/patch/10522381/
> >
> > I feel you are using this thread to promote your vfio_pf implementation.
> 
> Yes. I am. Because, I need to think, How I can enable this API with VFIO.
> Otherwise, I can not implement this API.
> 
> > But this API and the kernel module are orthogonals.
> > Please can we focus on the DPDK API in this thread,
> > and talk about VFIO implementation in the other thread?
> 
> Yes. My concerns are we keep adding APIs without thinking about how it
> can be implemented
> in the overall scheme of things. Just API is not enough.

We keep thinking about all scenarios (maybe in other threads).
And adding an API is a step in the right direction in my opinion.
Ilya Maximets Nov. 1, 2019, 9:06 a.m. UTC | #12
On 01.11.2019 1:24, Thomas Monjalon wrote:
> 30/10/2019 10:24, Jerin Jacob:
>> On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>>> Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>>>> host
>>>>
>>>> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
>>>> <thomas@monjalon.net> wrote:
>>>>>
>>>>> In a virtual environment, the network controller may have to configure
>>>>> some SR-IOV VF parameters for security reasons.
>>>>
>>>> Just to understand, Could you explain more details/examples for security
>>>> reasons?
>>>>
>>>>>
>>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
>>>>> different cases:
>>>>>      - driver is bifurcated (Mellanox case),
>>>>>        so the VF can be configured via the kernel.
>>>>>      - driver is on top of UIO or VFIO, so DPDK API is required,
>>>>
>>>> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
>>>> PF device.
>>>> It is only allowed through igb-uio out of tree driver without iommu support.
>>>
>>> Per my understanding Thomas proposal is not to create the VFs
>>> from the PF device. it is to configure their network attributes
>>> from the PF after they have been created.
>>
>> Yes. My question is without creating the VF, How do you control them?
> 
> We can create the VF via the kernel PF driver, before binding the PF to VFIO.

AFAIK, this is not possible. VFs are gone as soon as you're unbinding kernel
PF driver.  And after binding of vfio-pci you can no longer create VFs.

I tried to check some representor functionality about 2 months ago and didn't
find a way to enable VFs on Intel NICs if PF is under control of vfio-pci.

Best regards, Ilya Maximets.
Ilya Maximets Nov. 1, 2019, 9:32 a.m. UTC | #13
On 30.10.2019 22:42, Thomas Monjalon wrote:
> 30/10/2019 17:09, Ilya Maximets:
>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>> 30/10/2019 16:07, Ilya Maximets:
>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>> In a virtual environment, the network controller may have to configure
>>>>> some SR-IOV VF parameters for security reasons.
>>>>>
>>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case),
>>>>> we face two different cases:
>>>>>        - driver is bifurcated (Mellanox case),
>>>>>          so the VF can be configured via the kernel.
>>>>>        - driver is on top of UIO or VFIO, so DPDK API is required,
>>>>>          and PMD-specific APIs were used.
>>>>
>>>> So, what is wrong with setting VF mac via the representor port as
>>>> it done now?  From our previous discussion, I thought that we concluded
>>>> that is enough to have current API, i.e. just call set_default_mac()
>>>> for a representor port and this will lead to setting mac for VF.
>>>> This is how it's implemented in Linux kernel and this is how it's
>>>> implemented in current DPDK drivers that supports setting mac for
>>>> the representor.
>>>
>>> I don't know what is done in the Linux kernel.
>>> In DPDK, setting the MAC of the representor is really setting
>>> the MAC of the representor. Is it crazy?
>>
>> Kind of. And no, it doesn't work this way in DPDK.
>> Just look at the i40e driver:
>>
>> 325 static int
>> 326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
>> 327         struct ether_addr *mac_addr)
>> 328 {
>> 329     struct i40e_vf_representor *representor = ethdev->data->dev_private;
>> 330
>> 331     return rte_pmd_i40e_set_vf_mac_addr(
>> 332         representor->adapter->eth_dev->data->port_id,
>> 333         representor->vf_id, mac_addr);
>> 334 }
>> ....
>> 423 static const struct eth_dev_ops i40e_representor_dev_ops = {
>>       <...>
>> 445     .mac_addr_set         = i40e_vf_representor_mac_addr_set,
>>
>>
>> It really calls the function to set VF mac address.
> 
> Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling
> rte_pmd_i40e_set_vf_mac_addr().
> 
>> And for MLX drivers it's the same.
>> MLX driver call netlink to set representor MAC, but netlink is in
>> kernel and this call inside the kernel translates to the setting
>> of mac address of the VF.
>>
>> Am I missing something?
> 
> I am not sure about this kernel translation. At least not in mlx5.
> Setting MAC address of a VF representor seems not supported on mlx5.
> But there is a specific netlink request to set a VF MAC address:
> 	ip link set <PF> vf <VF> mac <MAC>
> 
> 
>>>> The only use case for this new API is to be able to control mac of
>>>> the representor itself, which doesn't make much sense. At least there
>>>> are only hypothetical use cases. And once again, Linux kernel doesn't
>>>> support this behavior.
>>>
>>> I think it is sane to be able to set different MAC addresses
>>> for the representor and the VF.
> 
> Let me explain better my thoughts.
> In the switchdev design, VF and uplink ports are all connected
> together via a switch.
> The representors are mirrors of those switch ports.
> So a VF representor port is supposed to mirror the switch port
> where a VF is connected to. It is different of the VF itself.
> 
> This is a drawing of my understanding of switchdev design:
> 
> VF1 ------ VF1 rep /--------\
>                     | switch | uplink rep ------ uplink ------ wire
> VF2 ------ VF2 rep \--------/    (PF)
> 
> Of course, there can be more VF and uplink ports.
> 
> With some switch-aware protocols, it might be interesting to have
> different MAC addresses on a VF and its representor.
> And more generally, I imagine that the config of a VF representor
> could be different of the config of a VF.
> 
> 
>>>>> This new generic API will avoid vendors fragmentation.
>>>>
>>>> I don't see the fragmentation. Right now you can set VF mac from DPDK
>>>> by calling set_default_mac() for representor.  This API exists for all
>>>> vendors. Not implemented for Intel, but new API is not implemented too.
>>>
>>> No, the current situation is the following:
>>> 	- for mlx5, VF MAC can be configured with iproute2 or netlink
>>> 	- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
>>> 	- for i40e, rte_pmd_i40e_set_vf_mac_addr()
>>> 	- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
> 
> Thanks to Ilya's comment, this is an update of the DPDK situation:
> 	- for mlx5, VF MAC can be configured with iproute2 or netlink
> 	  and rte_eth_dev_default_mac_addr_set(rep) is not supported
> 	- for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
> 	  and rte_eth_dev_default_mac_addr_set(rep) does the same
> 	- for i40e, rte_pmd_i40e_set_vf_mac_addr()
> 	  and rte_eth_dev_default_mac_addr_set(rep) does the same
> 	- for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
> 	  and no representor
> 
> If we consider what Intel did, i.e. configure VF in place of representor
> for some operations, there are two drawbacks:
> - confusing that some ops apply to representor, others apply to VF
> - some ops are not possible on representor (because targetted to VF)
> 
> I still feel that the addition of one single bit in the port ID
> is an elegant solution to target either the VF or its representor.

Since we already have a confusion about what is configured when
operations are performed on a representor port we have 2 options:
1. Have this proposed API to configure representor itself while
    setting config to representor and configuring VF if special
    bit enabled.
2. Reverse the logic of current proposal, i.e. always apply
    configuration to VF while working with representor and apply
    configuration to representor itself if special bit is set.

I'd probably prefer option #2, because:
- From the OVS and OpenStack point of view, I think, we don't
   really need to configure representor itself in most cases.
   And OVS really should not know if it works with representor
   or some real port.
- It seems that most of the existing code in DPDK already works
   like this, i.e. applying configs to VF itself.  Intel drivers
   works like this and  Mellanox drivers, as Thomas said, doesn't
   have this functionality at all.

> 
> 
>>>> The this is that this new API will produce conceptual fragmentation
>>>> between DPDK and the Linux kernel, because to do the same thing you'll
>>>> have to use different ways. I mean, to change mac of VF in kernel you
>>>> need to set mac to the representor, but in DPDK changing setting mac to
>>>> representor will lead to changing the mac of the representor itself,
>>>> not the VF. This will be really confusing for users.
>>>
>>> I am not responsible of the choices in Linux.
>>> But I agree it would be interesting to check the reason of such decision.
>>> Rony, please could you explain?
> 
> I looked at few Linux drivers:
> 
> 	bnxt_vf_rep_netdev_ops has no op to set MAC
> 	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> 
> 	lio_vf_rep_ndev_ops has no op to set MAC
> 	lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> 
> 	mlx5e_netdev_ops_rep has no op to set MAC
> 	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> 	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF
> 
> 	nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC
> 	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor
> 	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> 
> There is a big chance that the behaviour is not standardized in Linux
> (as usual). So it is already confusing for users of Linux.
> 
>
Ilya Maximets Nov. 1, 2019, 9:56 a.m. UTC | #14
On 01.11.2019 10:06, Ilya Maximets wrote:
> On 01.11.2019 1:24, Thomas Monjalon wrote:
>> 30/10/2019 10:24, Jerin Jacob:
>>> On Wed, Oct 30, 2019 at 12:52 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>>>> Wednesday, October 30, 2019 6:09 AM, Jerin Jacob:
>>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>>>>> host
>>>>>
>>>>> On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
>>>>> <thomas@monjalon.net> wrote:
>>>>>>
>>>>>> In a virtual environment, the network controller may have to configure
>>>>>> some SR-IOV VF parameters for security reasons.
>>>>>
>>>>> Just to understand, Could you explain more details/examples for security
>>>>> reasons?
>>>>>
>>>>>>
>>>>>> When the PF (host port) is driven by DPDK (OVS-DPDK case), we face two
>>>>>> different cases:
>>>>>>      - driver is bifurcated (Mellanox case),
>>>>>>        so the VF can be configured via the kernel.
>>>>>>      - driver is on top of UIO or VFIO, so DPDK API is required,
>>>>>
>>>>> Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from the
>>>>> PF device.
>>>>> It is only allowed through igb-uio out of tree driver without iommu support.
>>>>
>>>> Per my understanding Thomas proposal is not to create the VFs
>>>> from the PF device. it is to configure their network attributes
>>>> from the PF after they have been created.
>>>
>>> Yes. My question is without creating the VF, How do you control them?
>>
>> We can create the VF via the kernel PF driver, before binding the PF to VFIO.
> 
> AFAIK, this is not possible. VFs are gone as soon as you're unbinding kernel
> PF driver.  And after binding of vfio-pci you can no longer create VFs.
> 
> I tried to check some representor functionality about 2 months ago and didn't
> find a way to enable VFs on Intel NICs if PF is under control of vfio-pci.

Likely it was i40e driver from the kernel side.  I see from the lkml thread that
some drivers might not clear sriov on exit (ixgbe), but that wasn't my case and
it's actually a controversial feature in general.
Jerin Jacob Nov. 1, 2019, 11:01 a.m. UTC | #15
On Fri, Nov 1, 2019 at 6:03 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/10/2019 10:15, Jerin Jacob:
> > On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 30/10/2019 05:08, Jerin Jacob:
> > > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > In a virtual environment, the network controller may have to configure
> > > > > some SR-IOV VF parameters for security reasons.
> > > >
> > > > Just to understand, Could you explain more details/examples for
> > > > security reasons?
> > >
> > > Examples are setting the MAC address or the promiscuous mode.
> > > These settings should be decided by the hypervisor,
> > > and not freely set by the VM.
> >
> > What is hypervisor here, rte_flow based DPDK application over using
> > port representor?
>
> Yes, something like that. An example is OpenStack/OVS.
 > > > > When the PF (host port) is driven by DPDK (OVS-DPDK case),
> > > > > we face two different cases:
> > > > >     - driver is bifurcated (Mellanox case),
> > > > >       so the VF can be configured via the kernel.
> > > > >     - driver is on top of UIO or VFIO, so DPDK API is required,
> > > >
> > > > Not true. Both UIO and VFIO are NOT allowed to create SRIOV VF from
> > > > the PF device.
> > > > It is only allowed through igb-uio out of tree driver without iommu support.
> > >
> > > Not sure I said the contrary.
> > > igb_uio and vfio_pf are 2 implementations of UIO and VFIO.
> >
> > Yes. I am saying without out of tree module it is not possible.
> >
> > > > >       and PMD-specific APIs were used.
> > > > > This new generic API will avoid vendors fragmentation.
> > > >
> > > > The API is good. But I have concerns about the vendor implementation
> > > > of this API.
> > > > It can support only vendors with bifurcated driver(Mellanox case).
> > > > or using igb_uio(non iommu case) but not the devices with VFIO(Which
> > > > is the first-class citizen).
> > >
> > > Why not? (see above)
> > > The API is agnostic to the kernel driver in use.
> >
> > My question is how do you enable in DPDK with VFIO if DPDK is giving
> > this feature?
>
> I didn't get your question.
> If it is the same question as before, I think you can create the VF
> before binding the PF to VFIO.

No more. Following patch made to stop that hack.

https://patchwork.kernel.org/patch/10522381/

>
> > > > All the control plane control stuff to replace Linux with "port
> > > > representor" logic
> > > > will be of the mercy  of an "out of tree" driver either with igb_uio
> > > > or http://patches.dpdk.org/patch/58810/
> > > >
> > > > I am _not against_ on DPDK supports port representor or controlling
> > > > netdev VF traffic,
> > > > but if we have taken that path then DPDK should have the
> > > > infrastructure to support for
> > > > all driver models like VFIO(Addressed in [1])
> > > >
> > > > I would have this question when DPDK starts supporting port
> > > > representor(but I was not
> > > > aware that kernel security issue on netdev ports controlled by DPDK in
> > > > non-bifurcated driver case
> > > > and concise effort block such scheme by kernel [2])
> > > >
> > > >
> > > >  [1]
> > > > http://patches.dpdk.org/patch/58810/
> > > > [2]
> > > > https://patchwork.kernel.org/patch/10522381/
> > >
> > > I feel you are using this thread to promote your vfio_pf implementation.
> >
> > Yes. I am. Because, I need to think, How I can enable this API with VFIO.
> > Otherwise, I can not implement this API.
> >
> > > But this API and the kernel module are orthogonals.
> > > Please can we focus on the DPDK API in this thread,
> > > and talk about VFIO implementation in the other thread?
> >
> > Yes. My concerns are we keep adding APIs without thinking about how it
> > can be implemented
> > in the overall scheme of things. Just API is not enough.
>
> We keep thinking about all scenarios (maybe in other threads).
> And adding an API is a step in the right direction in my opinion.

I am not against DPDK adding the API.

My concerns are Linux kernel folks don't like the parrel
infrastructure we are building
with port representor bypassing Linux kernel tools and infrastructure tools.
ie, DPDK controlling the Netdev VFS due to security issues.

And I agree. The Mellanox case is not an issue as bifurcated driver
architecture.
But all other vendors will have the issue. (using UIO or VFIO)



>
>
Jerin Jacob Nov. 1, 2019, 1:25 p.m. UTC | #16
On Fri, Nov 1, 2019 at 6:03 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/10/2019 10:15, Jerin Jacob:
> > On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 30/10/2019 05:08, Jerin Jacob:
> > > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > In a virtual environment, the network controller may have to configure
> > > > > some SR-IOV VF parameters for security reasons.
> > > >
> > > > Just to understand, Could you explain more details/examples for
> > > > security reasons?
> > >
> > > Examples are setting the MAC address or the promiscuous mode.
> > > These settings should be decided by the hypervisor,
> > > and not freely set by the VM.
> >
> > What is hypervisor here, rte_flow based DPDK application over using
> > port representor?
>
> Yes, something like that. An example is OpenStack/OVS.

So it is more of an orchestration primitive. Not the security
primitive as mentioned above,
Because in case the kernel will approve the data set from the guest. Right?
Shahaf Shuler Nov. 3, 2019, 6:31 a.m. UTC | #17
Friday, November 1, 2019 3:25 PM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
> 
> On Fri, Nov 1, 2019 at 6:03 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> >
> > 30/10/2019 10:15, Jerin Jacob:
> > > On Wed, Oct 30, 2019 at 2:26 PM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > >
> > > > 30/10/2019 05:08, Jerin Jacob:
> > > > > On Wed, Oct 30, 2019 at 12:21 AM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > >
> > > > > > In a virtual environment, the network controller may have to
> > > > > > configure some SR-IOV VF parameters for security reasons.
> > > > >
> > > > > Just to understand, Could you explain more details/examples for
> > > > > security reasons?
> > > >
> > > > Examples are setting the MAC address or the promiscuous mode.
> > > > These settings should be decided by the hypervisor, and not freely
> > > > set by the VM.
> > >
> > > What is hypervisor here, rte_flow based DPDK application over using
> > > port representor?
> >
> > Yes, something like that. An example is OpenStack/OVS.
> 
> So it is more of an orchestration primitive. Not the security primitive as
> mentioned above, 

Yes this feature is to support deployment w/ OpenStack and DPDK vSwitch using switchdev (representors).
One must configure the guest VF permanent MAC. w/ OVS kernel this is not an issue as all drivers bind to their kernel module.
w/ DPDK standard API is needed. 

w/ Switchdev, even though the VF are promisc, all traffic is validated by the vSwitch running on the HV. Only after flow validation vSwitch will insert rule for device to directly fwd the packet to its target location. 

Because in case the kernel will approve the data set from
> the guest. Right?

The exact behavior is device specific. But the kernel/user space driver on the HV is required to listen to events of recv mode change from VF and act upon. 
All recv mode modification by guest should be validated by kernel, however this is a completely diff issue than what discussed here.
Shahaf Shuler Nov. 3, 2019, 6:48 a.m. UTC | #18
Friday, November 1, 2019 11:33 AM, Ilya Maximets:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
> 
> On 30.10.2019 22:42, Thomas Monjalon wrote:
> > 30/10/2019 17:09, Ilya Maximets:
> >> On 30.10.2019 16:49, Thomas Monjalon wrote:
> >>> 30/10/2019 16:07, Ilya Maximets:
> >>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>>>> In a virtual environment, the network controller may have to
> >>>>> configure some SR-IOV VF parameters for security reasons.
> >>>>>
[...]

> > If we consider what Intel did, i.e. configure VF in place of
> > representor for some operations, there are two drawbacks:
> > - confusing that some ops apply to representor, others apply to VF
> > - some ops are not possible on representor (because targetted to VF)
> >
> > I still feel that the addition of one single bit in the port ID is an
> > elegant solution to target either the VF or its representor.
> 
> Since we already have a confusion about what is configured when operations
> are performed on a representor port we have 2 options:

I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied on VF (please share if I missed it). 
The fact that there are some drivers that implemented it doesn't mean it is correct. 

> 1. Have this proposed API to configure representor itself while
>     setting config to representor and configuring VF if special
>     bit enabled.
> 2. Reverse the logic of current proposal, i.e. always apply
>     configuration to VF while working with representor and apply
>     configuration to representor itself if special bit is set.
> 
> I'd probably prefer option #2, because:
> - From the OVS and OpenStack point of view, I think, we don't
>    really need to configure representor itself in most cases.
>    And OVS really should not know if it works with representor
>    or some real port.

I don't thinks OVS can be really agnostic to the fact it runs on top of representors:
1. probing of representor has different command line -w <bdf>,representor=XXX
2. the whole acceleration framework based on insertion of flow rules for direct forward from the VF to target entity. Rules are applied on the representor and would not work if port is not such. 
3. some multi-port devices cannot do direct fwd between its different port. This is why rep has switch_id and application should query it and act upon. 
4. representor carry the VF port id. This is how application know to which VF (or vport) they associated with on their other side. 

> - It seems that most of the existing code in DPDK already works
>    like this, i.e. applying configs to VF itself.  Intel drivers
>    works like this and  Mellanox drivers, as Thomas said, doesn't
>    have this functionality at all.

As I said above, I don't think we need to refer to specific driver behavior, rather the API guidelines. 
To me, it is a bit strange and not natural that ethdev configuration is applied to different port w/o any explicit request from the application. 
This is why I would prefer #1 above. 

> 
> >
> >
> >>>> The this is that this new API will produce conceptual fragmentation
> >>>> between DPDK and the Linux kernel, because to do the same thing
> >>>> you'll have to use different ways. I mean, to change mac of VF in
> >>>> kernel you need to set mac to the representor, but in DPDK changing
> >>>> setting mac to representor will lead to changing the mac of the
> >>>> representor itself, not the VF. This will be really confusing for users.
> >>>
> >>> I am not responsible of the choices in Linux.
> >>> But I agree it would be interesting to check the reason of such decision.
> >>> Rony, please could you explain?
> >
> > I looked at few Linux drivers:
> >
> > 	bnxt_vf_rep_netdev_ops has no op to set MAC
> > 	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >
> > 	lio_vf_rep_ndev_ops has no op to set MAC
> > 	lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> >
> > 	mlx5e_netdev_ops_rep has no op to set MAC
> > 	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > 	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
> PF
> >
> > 	nfp_repr_netdev_ops.ndo_set_mac_address = set representor
> MAC
> > 	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
> representor
> > 	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >
> > There is a big chance that the behaviour is not standardized in Linux
> > (as usual). So it is already confusing for users of Linux.
> >
> >
Ananyev, Konstantin Nov. 3, 2019, 3:27 p.m. UTC | #19
> > > If we consider what Intel did, i.e. configure VF in place of
> > > representor for some operations, there are two drawbacks:
> > > - confusing that some ops apply to representor, others apply to VF
> > > - some ops are not possible on representor (because targetted to VF)
> > >
> > > I still feel that the addition of one single bit in the port ID is an
> > > elegant solution to target either the VF or its representor.
> >
> > Since we already have a confusion about what is configured when operations
> > are performed on a representor port we have 2 options:
> 
> I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied
> on VF (please share if I missed it).
> The fact that there are some drivers that implemented it doesn't mean it is correct.

Well, it means that at least authors and reviewers of these patches,
plus probably next-net maintainers believe that it is correct.
At least they did - when patch was applied.
If that is not clearly stated in the doc - it might be just the gap in the documentation,
that needs to be fixed, not a mandate to break existing behavior.   

> 
> > 1. Have this proposed API to configure representor itself while
> >     setting config to representor and configuring VF if special
> >     bit enabled.
> > 2. Reverse the logic of current proposal, i.e. always apply
> >     configuration to VF while working with representor and apply
> >     configuration to representor itself if special bit is set.
> >
> > I'd probably prefer option #2, because:
> > - From the OVS and OpenStack point of view, I think, we don't
> >    really need to configure representor itself in most cases.
> >    And OVS really should not know if it works with representor
> >    or some real port.

+1 to keep existing behavior if possible.

 
> I don't thinks OVS can be really agnostic to the fact it runs on top of representors:
> 1. probing of representor has different command line -w <bdf>,representor=XXX
> 2. the whole acceleration framework based on insertion of flow rules for direct forward from the VF to target entity. Rules are applied on
> the representor and would not work if port is not such.
> 3. some multi-port devices cannot do direct fwd between its different port. This is why rep has switch_id and application should query it and
> act upon.
> 4. representor carry the VF port id. This is how application know to which VF (or vport) they associated with on their other side.
> 
> > - It seems that most of the existing code in DPDK already works
> >    like this, i.e. applying configs to VF itself.  Intel drivers
> >    works like this and  Mellanox drivers, as Thomas said, doesn't
> >    have this functionality at all.
> 
> As I said above, I don't think we need to refer to specific driver behavior, rather the API guidelines.
> To me, it is a bit strange and not natural that ethdev configuration is applied to different port w/o any explicit request from the application.
> This is why I would prefer #1 above.
> 
> >
> > >
> > >
> > >>>> The this is that this new API will produce conceptual fragmentation
> > >>>> between DPDK and the Linux kernel, because to do the same thing
> > >>>> you'll have to use different ways. I mean, to change mac of VF in
> > >>>> kernel you need to set mac to the representor, but in DPDK changing
> > >>>> setting mac to representor will lead to changing the mac of the
> > >>>> representor itself, not the VF. This will be really confusing for users.
> > >>>
> > >>> I am not responsible of the choices in Linux.
> > >>> But I agree it would be interesting to check the reason of such decision.
> > >>> Rony, please could you explain?
> > >
> > > I looked at few Linux drivers:
> > >
> > > 	bnxt_vf_rep_netdev_ops has no op to set MAC
> > > 	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > >
> > > 	lio_vf_rep_ndev_ops has no op to set MAC
> > > 	lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> > >
> > > 	mlx5e_netdev_ops_rep has no op to set MAC
> > > 	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > > 	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
> > PF
> > >
> > > 	nfp_repr_netdev_ops.ndo_set_mac_address = set representor
> > MAC
> > > 	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
> > representor
> > > 	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> > >
> > > There is a big chance that the behaviour is not standardized in Linux
> > > (as usual). So it is already confusing for users of Linux.
> > >
> > >
Thomas Monjalon Nov. 3, 2019, 10:09 p.m. UTC | #20
03/11/2019 16:27, Ananyev, Konstantin:
> 
> > > > If we consider what Intel did, i.e. configure VF in place of
> > > > representor for some operations, there are two drawbacks:
> > > > - confusing that some ops apply to representor, others apply to VF
> > > > - some ops are not possible on representor (because targetted to VF)
> > > >
> > > > I still feel that the addition of one single bit in the port ID is an
> > > > elegant solution to target either the VF or its representor.
> > >
> > > Since we already have a confusion about what is configured when operations
> > > are performed on a representor port we have 2 options:
> > 
> > I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied
> > on VF (please share if I missed it).
> > The fact that there are some drivers that implemented it doesn't mean it is correct.
> 
> Well, it means that at least authors and reviewers of these patches,
> plus probably next-net maintainers believe that it is correct.
> At least they did - when patch was applied.
> If that is not clearly stated in the doc - it might be just the gap in the documentation,

Gap in the documentation? We should state that the config should be applied
to the port specified with port_id and no other one? Funny

> that needs to be fixed, not a mandate to break existing behavior.

So because you managed to have a wrong patch applied in your PMD,
you want to make it the generic API in ethdev?
What a process!

Hey guys, if you want to change an API behaviour in a way others don't,
you just have to implement what you want in your PMD silently,
then you will be able to change the API to comply with your behaviour.
Wonderful.
If we allow such practice, DPDK is dead.
Ilya Maximets Nov. 4, 2019, 10:28 a.m. UTC | #21
On 03.11.2019 7:48, Shahaf Shuler wrote:
> Friday, November 1, 2019 11:33 AM, Ilya Maximets:
>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>> host
>>
>> On 30.10.2019 22:42, Thomas Monjalon wrote:
>>> 30/10/2019 17:09, Ilya Maximets:
>>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>>>> 30/10/2019 16:07, Ilya Maximets:
>>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>>>> In a virtual environment, the network controller may have to
>>>>>>> configure some SR-IOV VF parameters for security reasons.
>>>>>>>
> [...]
> 
>>> If we consider what Intel did, i.e. configure VF in place of
>>> representor for some operations, there are two drawbacks:
>>> - confusing that some ops apply to representor, others apply to VF
>>> - some ops are not possible on representor (because targetted to VF)
>>>
>>> I still feel that the addition of one single bit in the port ID is an
>>> elegant solution to target either the VF or its representor.
>>
>> Since we already have a confusion about what is configured when operations
>> are performed on a representor port we have 2 options:
> 
> I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied on VF (please share if I missed it).
> The fact that there are some drivers that implemented it doesn't mean it is correct.
> 
>> 1. Have this proposed API to configure representor itself while
>>      setting config to representor and configuring VF if special
>>      bit enabled.
>> 2. Reverse the logic of current proposal, i.e. always apply
>>      configuration to VF while working with representor and apply
>>      configuration to representor itself if special bit is set.
>>
>> I'd probably prefer option #2, because:
>> - From the OVS and OpenStack point of view, I think, we don't
>>     really need to configure representor itself in most cases.
>>     And OVS really should not know if it works with representor
>>     or some real port.
> 
> I don't thinks OVS can be really agnostic to the fact it runs on top of representors:
> 1. probing of representor has different command line -w <bdf>,representor=XXX

OVS doesn't care about content of devargs. It just passes them to hotplug
engine without any parsing (except a single case that must be eliminated
with a proper device iterators, not an OVS issue).

> 2. the whole acceleration framework based on insertion of flow rules for direct forward from the VF to target entity. Rules are applied on the representor and would not work if port is not such.

OVS tries to offload rules to the netdev from which packet was received.
That's it.  If it succeeds - OK.  If not, OVS doesn't care.

> 3. some multi-port devices cannot do direct fwd between its different port. This is why rep has switch_id and application should query it and act upon.

This is part of offloading engine that doesn't affect the generic code.
If needed, OVS could request switch_id for netdev it tries to offload rules on.
OVS should not know if it representor port or not. If this operation will not
succeed for non-representors, OVS should not care because we can't offload
anything for non-representors anyway.

> 4. representor carry the VF port id. This is how application know to which VF (or vport) they associated with on their other side.

This is just part of devargs, i.e. part of device unique identifier.
Once again, OVS doesn't parse devargs and should not do that.

> 
>> - It seems that most of the existing code in DPDK already works
>>     like this, i.e. applying configs to VF itself.  Intel drivers
>>     works like this and  Mellanox drivers, as Thomas said, doesn't
>>     have this functionality at all.
> 
> As I said above, I don't think we need to refer to specific driver behavior, rather the API guidelines.
> To me, it is a bit strange and not natural that ethdev configuration is applied to different port w/o any explicit request from the application.
> This is why I would prefer #1 above.

IMHO, the whole concept of representors is that representor is a
way of attaching same port both to VM and vSwitch/hypervisor.

If you're looking at representors as a separate ports on a switch, well..
In this case, for me VF configuration looks like something that
vSwitch should not do at all, because it should not configure ports
that doesn't attached to it.  It's like configuring the other
side of veth pair, which is nonsense.


BTW, I don't know a way to find out if port is a representor of something
or not in Linux kernel.

> 
>>
>>>
>>>
>>>>>> The this is that this new API will produce conceptual fragmentation
>>>>>> between DPDK and the Linux kernel, because to do the same thing
>>>>>> you'll have to use different ways. I mean, to change mac of VF in
>>>>>> kernel you need to set mac to the representor, but in DPDK changing
>>>>>> setting mac to representor will lead to changing the mac of the
>>>>>> representor itself, not the VF. This will be really confusing for users.
>>>>>
>>>>> I am not responsible of the choices in Linux.
>>>>> But I agree it would be interesting to check the reason of such decision.
>>>>> Rony, please could you explain?
>>>
>>> I looked at few Linux drivers:
>>>
>>> 	bnxt_vf_rep_netdev_ops has no op to set MAC
>>> 	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>
>>> 	lio_vf_rep_ndev_ops has no op to set MAC
>>> 	lionetdevops.ndo_set_vf_mac = set VF MAC from PF
>>>
>>> 	mlx5e_netdev_ops_rep has no op to set MAC
>>> 	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>> 	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
>> PF
>>>
>>> 	nfp_repr_netdev_ops.ndo_set_mac_address = set representor
>> MAC
>>> 	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
>> representor
>>> 	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>
>>> There is a big chance that the behaviour is not standardized in Linux
>>> (as usual). So it is already confusing for users of Linux.
>>>
>>>
Asaf Penso Nov. 4, 2019, 2:30 p.m. UTC | #22
Regards,
Asaf Penso

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ilya Maximets
> Sent: Monday, November 4, 2019 12:28 PM
> To: Shahaf Shuler <shahafs@mellanox.com>; Ilya Maximets
> <i.maximets@ovn.org>; Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Roni Bar Yanai <roniba@mellanox.com>;
> Rony Efraim <ronye@mellanox.com>; declan.doherty@intel.com;
> bernard.iremonger@intel.com; ajit.khaparde@broadcom.com; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
> 
> On 03.11.2019 7:48, Shahaf Shuler wrote:
> > Friday, November 1, 2019 11:33 AM, Ilya Maximets:
> >> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> >> host
> >>
> >> On 30.10.2019 22:42, Thomas Monjalon wrote:
> >>> 30/10/2019 17:09, Ilya Maximets:
> >>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
> >>>>> 30/10/2019 16:07, Ilya Maximets:
> >>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>>>>>> In a virtual environment, the network controller may have to
> >>>>>>> configure some SR-IOV VF parameters for security reasons.
> >>>>>>>
> > [...]
> >
> >>> If we consider what Intel did, i.e. configure VF in place of
> >>> representor for some operations, there are two drawbacks:
> >>> - confusing that some ops apply to representor, others apply to VF
> >>> - some ops are not possible on representor (because targetted to VF)
> >>>
> >>> I still feel that the addition of one single bit in the port ID is an
> >>> elegant solution to target either the VF or its representor.
> >>
> >> Since we already have a confusion about what is configured when
> operations
> >> are performed on a representor port we have 2 options:
> >
> > I don't agree we have. I don't think there is any design note or API doc that
> says the ethdev configuration on representor should be applied on VF
> (please share if I missed it).
> > The fact that there are some drivers that implemented it doesn't mean it is
> correct.
> >
> >> 1. Have this proposed API to configure representor itself while
> >>      setting config to representor and configuring VF if special
> >>      bit enabled.
> >> 2. Reverse the logic of current proposal, i.e. always apply
> >>      configuration to VF while working with representor and apply
> >>      configuration to representor itself if special bit is set.
> >>
> >> I'd probably prefer option #2, because:
> >> - From the OVS and OpenStack point of view, I think, we don't
> >>     really need to configure representor itself in most cases.
> >>     And OVS really should not know if it works with representor
> >>     or some real port.
> >
> > I don't thinks OVS can be really agnostic to the fact it runs on top of
> representors:
> > 1. probing of representor has different command line -w
> <bdf>,representor=XXX
> 
> OVS doesn't care about content of devargs. It just passes them to hotplug
> engine without any parsing (except a single case that must be eliminated
> with a proper device iterators, not an OVS issue).
> 
> > 2. the whole acceleration framework based on insertion of flow rules for
> direct forward from the VF to target entity. Rules are applied on the
> representor and would not work if port is not such.
> 
> OVS tries to offload rules to the netdev from which packet was received.
> That's it.  If it succeeds - OK.  If not, OVS doesn't care.
> 
> > 3. some multi-port devices cannot do direct fwd between its different port.
> This is why rep has switch_id and application should query it and act upon.
> 
> This is part of offloading engine that doesn't affect the generic code.
> If needed, OVS could request switch_id for netdev it tries to offload rules on.
> OVS should not know if it representor port or not. If this operation will not
> succeed for non-representors, OVS should not care because we can't offload
> anything for non-representors anyway.
> 
> > 4. representor carry the VF port id. This is how application know to which
> VF (or vport) they associated with on their other side.
> 
> This is just part of devargs, i.e. part of device unique identifier.
> Once again, OVS doesn't parse devargs and should not do that.
> 
> >
> >> - It seems that most of the existing code in DPDK already works
> >>     like this, i.e. applying configs to VF itself.  Intel drivers
> >>     works like this and  Mellanox drivers, as Thomas said, doesn't
> >>     have this functionality at all.
> >
> > As I said above, I don't think we need to refer to specific driver behavior,
> rather the API guidelines.
> > To me, it is a bit strange and not natural that ethdev configuration is applied
> to different port w/o any explicit request from the application.
> > This is why I would prefer #1 above.
> 
> IMHO, the whole concept of representors is that representor is a
> way of attaching same port both to VM and vSwitch/hypervisor.
> 
> If you're looking at representors as a separate ports on a switch, well..
> In this case, for me VF configuration looks like something that
> vSwitch should not do at all, because it should not configure ports
> that doesn't attached to it.  It's like configuring the other
> side of veth pair, which is nonsense.
> 
> 
> BTW, I don't know a way to find out if port is a representor of something
> or not in Linux kernel.

Specifically in OVS, in function dpdk_eth_dev_init, you can do this to detect:
*info.dev_flags & RTE_ETH_DEV_REPRESENTOR

> 
> >
> >>
> >>>
> >>>
> >>>>>> The this is that this new API will produce conceptual fragmentation
> >>>>>> between DPDK and the Linux kernel, because to do the same thing
> >>>>>> you'll have to use different ways. I mean, to change mac of VF in
> >>>>>> kernel you need to set mac to the representor, but in DPDK changing
> >>>>>> setting mac to representor will lead to changing the mac of the
> >>>>>> representor itself, not the VF. This will be really confusing for users.
> >>>>>
> >>>>> I am not responsible of the choices in Linux.
> >>>>> But I agree it would be interesting to check the reason of such
> decision.
> >>>>> Rony, please could you explain?
> >>>
> >>> I looked at few Linux drivers:
> >>>
> >>> 	bnxt_vf_rep_netdev_ops has no op to set MAC
> >>> 	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >>>
> >>> 	lio_vf_rep_ndev_ops has no op to set MAC
> >>> 	lionetdevops.ndo_set_vf_mac = set VF MAC from PF
> >>>
> >>> 	mlx5e_netdev_ops_rep has no op to set MAC
> >>> 	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >>> 	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
> >> PF
> >>>
> >>> 	nfp_repr_netdev_ops.ndo_set_mac_address = set representor
> >> MAC
> >>> 	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
> >> representor
> >>> 	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
> >>>
> >>> There is a big chance that the behaviour is not standardized in Linux
> >>> (as usual). So it is already confusing for users of Linux.
> >>>
> >>>
Ilya Maximets Nov. 4, 2019, 2:58 p.m. UTC | #23
On 04.11.2019 15:30, Asaf Penso wrote:
> 
> 
> Regards,
> Asaf Penso
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Ilya Maximets
>> Sent: Monday, November 4, 2019 12:28 PM
>> To: Shahaf Shuler <shahafs@mellanox.com>; Ilya Maximets
>> <i.maximets@ovn.org>; Thomas Monjalon <thomas@monjalon.net>
>> Cc: dev@dpdk.org; Jerin Jacob <jerinjacobk@gmail.com>; Andrew
>> Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
>> <ferruh.yigit@intel.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Roni Bar Yanai <roniba@mellanox.com>;
>> Rony Efraim <ronye@mellanox.com>; declan.doherty@intel.com;
>> bernard.iremonger@intel.com; ajit.khaparde@broadcom.com; Ananyev,
>> Konstantin <konstantin.ananyev@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>> host
>>
>> On 03.11.2019 7:48, Shahaf Shuler wrote:
>>> Friday, November 1, 2019 11:33 AM, Ilya Maximets:
>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>>>> host
>>>>
>>>> On 30.10.2019 22:42, Thomas Monjalon wrote:
>>>>> 30/10/2019 17:09, Ilya Maximets:
>>>>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>>>>>> 30/10/2019 16:07, Ilya Maximets:
>>>>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>>>>>> In a virtual environment, the network controller may have to
>>>>>>>>> configure some SR-IOV VF parameters for security reasons.
>>>>>>>>>
>>> [...]
>>>
>>>>> If we consider what Intel did, i.e. configure VF in place of
>>>>> representor for some operations, there are two drawbacks:
>>>>> - confusing that some ops apply to representor, others apply to VF
>>>>> - some ops are not possible on representor (because targetted to VF)
>>>>>
>>>>> I still feel that the addition of one single bit in the port ID is an
>>>>> elegant solution to target either the VF or its representor.
>>>>
>>>> Since we already have a confusion about what is configured when
>> operations
>>>> are performed on a representor port we have 2 options:
>>>
>>> I don't agree we have. I don't think there is any design note or API doc that
>> says the ethdev configuration on representor should be applied on VF
>> (please share if I missed it).
>>> The fact that there are some drivers that implemented it doesn't mean it is
>> correct.
>>>
>>>> 1. Have this proposed API to configure representor itself while
>>>>       setting config to representor and configuring VF if special
>>>>       bit enabled.
>>>> 2. Reverse the logic of current proposal, i.e. always apply
>>>>       configuration to VF while working with representor and apply
>>>>       configuration to representor itself if special bit is set.
>>>>
>>>> I'd probably prefer option #2, because:
>>>> - From the OVS and OpenStack point of view, I think, we don't
>>>>      really need to configure representor itself in most cases.
>>>>      And OVS really should not know if it works with representor
>>>>      or some real port.
>>>
>>> I don't thinks OVS can be really agnostic to the fact it runs on top of
>> representors:
>>> 1. probing of representor has different command line -w
>> <bdf>,representor=XXX
>>
>> OVS doesn't care about content of devargs. It just passes them to hotplug
>> engine without any parsing (except a single case that must be eliminated
>> with a proper device iterators, not an OVS issue).
>>
>>> 2. the whole acceleration framework based on insertion of flow rules for
>> direct forward from the VF to target entity. Rules are applied on the
>> representor and would not work if port is not such.
>>
>> OVS tries to offload rules to the netdev from which packet was received.
>> That's it.  If it succeeds - OK.  If not, OVS doesn't care.
>>
>>> 3. some multi-port devices cannot do direct fwd between its different port.
>> This is why rep has switch_id and application should query it and act upon.
>>
>> This is part of offloading engine that doesn't affect the generic code.
>> If needed, OVS could request switch_id for netdev it tries to offload rules on.
>> OVS should not know if it representor port or not. If this operation will not
>> succeed for non-representors, OVS should not care because we can't offload
>> anything for non-representors anyway.
>>
>>> 4. representor carry the VF port id. This is how application know to which
>> VF (or vport) they associated with on their other side.
>>
>> This is just part of devargs, i.e. part of device unique identifier.
>> Once again, OVS doesn't parse devargs and should not do that.
>>
>>>
>>>> - It seems that most of the existing code in DPDK already works
>>>>      like this, i.e. applying configs to VF itself.  Intel drivers
>>>>      works like this and  Mellanox drivers, as Thomas said, doesn't
>>>>      have this functionality at all.
>>>
>>> As I said above, I don't think we need to refer to specific driver behavior,
>> rather the API guidelines.
>>> To me, it is a bit strange and not natural that ethdev configuration is applied
>> to different port w/o any explicit request from the application.
>>> This is why I would prefer #1 above.
>>
>> IMHO, the whole concept of representors is that representor is a
>> way of attaching same port both to VM and vSwitch/hypervisor.
>>
>> If you're looking at representors as a separate ports on a switch, well..
>> In this case, for me VF configuration looks like something that
>> vSwitch should not do at all, because it should not configure ports
>> that doesn't attached to it.  It's like configuring the other
>> side of veth pair, which is nonsense.
>>
>>
>> BTW, I don't know a way to find out if port is a representor of something
>> or not in Linux kernel.
> 
> Specifically in OVS, in function dpdk_eth_dev_init, you can do this to detect:
> *info.dev_flags & RTE_ETH_DEV_REPRESENTOR

Sure, but I meant a way to do that for Linux network interface, not DPDK.

> 
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>>> The this is that this new API will produce conceptual fragmentation
>>>>>>>> between DPDK and the Linux kernel, because to do the same thing
>>>>>>>> you'll have to use different ways. I mean, to change mac of VF in
>>>>>>>> kernel you need to set mac to the representor, but in DPDK changing
>>>>>>>> setting mac to representor will lead to changing the mac of the
>>>>>>>> representor itself, not the VF. This will be really confusing for users.
>>>>>>>
>>>>>>> I am not responsible of the choices in Linux.
>>>>>>> But I agree it would be interesting to check the reason of such
>> decision.
>>>>>>> Rony, please could you explain?
>>>>>
>>>>> I looked at few Linux drivers:
>>>>>
>>>>> 	bnxt_vf_rep_netdev_ops has no op to set MAC
>>>>> 	bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>>>
>>>>> 	lio_vf_rep_ndev_ops has no op to set MAC
>>>>> 	lionetdevops.ndo_set_vf_mac = set VF MAC from PF
>>>>>
>>>>> 	mlx5e_netdev_ops_rep has no op to set MAC
>>>>> 	mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>>> 	mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from
>>>> PF
>>>>>
>>>>> 	nfp_repr_netdev_ops.ndo_set_mac_address = set representor
>>>> MAC
>>>>> 	nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from
>>>> representor
>>>>> 	nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
>>>>>
>>>>> There is a big chance that the behaviour is not standardized in Linux
>>>>> (as usual). So it is already confusing for users of Linux.
>>>>>
>>>>>
Shahaf Shuler Nov. 4, 2019, 8:33 p.m. UTC | #24
Monday, November 4, 2019 12:28 PM, Ilya Maximets:
> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
> host
> 
> On 03.11.2019 7:48, Shahaf Shuler wrote:
> > Friday, November 1, 2019 11:33 AM, Ilya Maximets:
> >> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF
> >> from host
> >>
> >> On 30.10.2019 22:42, Thomas Monjalon wrote:
> >>> 30/10/2019 17:09, Ilya Maximets:
> >>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
> >>>>> 30/10/2019 16:07, Ilya Maximets:
> >>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
> >>>>>>> In a virtual environment, the network controller may have to
> >>>>>>> configure some SR-IOV VF parameters for security reasons.
> >>>>>>>
> > [...]
> >
> >>> If we consider what Intel did, i.e. configure VF in place of
> >>> representor for some operations, there are two drawbacks:
> >>> - confusing that some ops apply to representor, others apply to VF
> >>> - some ops are not possible on representor (because targetted to VF)
> >>>
> >>> I still feel that the addition of one single bit in the port ID is
> >>> an elegant solution to target either the VF or its representor.
> >>
> >> Since we already have a confusion about what is configured when
> >> operations are performed on a representor port we have 2 options:
> >
> > I don't agree we have. I don't think there is any design note or API doc that
> says the ethdev configuration on representor should be applied on VF
> (please share if I missed it).
> > The fact that there are some drivers that implemented it doesn't mean it is
> correct.
> >
> >> 1. Have this proposed API to configure representor itself while
> >>      setting config to representor and configuring VF if special
> >>      bit enabled.
> >> 2. Reverse the logic of current proposal, i.e. always apply
> >>      configuration to VF while working with representor and apply
> >>      configuration to representor itself if special bit is set.
> >>
> >> I'd probably prefer option #2, because:
> >> - From the OVS and OpenStack point of view, I think, we don't
> >>     really need to configure representor itself in most cases.
> >>     And OVS really should not know if it works with representor
> >>     or some real port.
> >
> > I don't thinks OVS can be really agnostic to the fact it runs on top of
> representors:
> > 1. probing of representor has different command line -w
> > <bdf>,representor=XXX
> 
> OVS doesn't care about content of devargs. It just passes them to hotplug
> engine without any parsing (except a single case that must be eliminated
> with a proper device iterators, not an OVS issue).
> 
> > 2. the whole acceleration framework based on insertion of flow rules for
> direct forward from the VF to target entity. Rules are applied on the
> representor and would not work if port is not such.
> 
> OVS tries to offload rules to the netdev from which packet was received.
> That's it.  If it succeeds - OK.  If not, OVS doesn't care.
> 
> > 3. some multi-port devices cannot do direct fwd between its different port.
> This is why rep has switch_id and application should query it and act upon.
> 
> This is part of offloading engine that doesn't affect the generic code.
> If needed, OVS could request switch_id for netdev it tries to offload rules on.
> OVS should not know if it representor port or not. If this operation will not
> succeed for non-representors, OVS should not care because we can't offload
> anything for non-representors anyway.

I am not speaking on specific module in OVS if it is aware or not, rather OVS as a whole. 
There is a big code part in OVS that its sole purpose is to create flow rule on representors. 
There is no meaning for this code w/o representors and it should be used only when OVS runs on top of representors (otherwise I would expect bad perf due to the constant preparation of rte_flow rules). 

> 
> > 4. representor carry the VF port id. This is how application know to which
> VF (or vport) they associated with on their other side.
> 
> This is just part of devargs, 

It is not part of devargs, it is part of the ethdev info. 
OVS should expose such info to the user in order to help w/ the open flow rules creation. 

i.e. part of device unique identifier.
> Once again, OVS doesn't parse devargs and should not do that.

So how can OVS user know the mapping between representor and its VF?

> 
> >
> >> - It seems that most of the existing code in DPDK already works
> >>     like this, i.e. applying configs to VF itself.  Intel drivers
> >>     works like this and  Mellanox drivers, as Thomas said, doesn't
> >>     have this functionality at all.
> >
> > As I said above, I don't think we need to refer to specific driver behavior,
> rather the API guidelines.
> > To me, it is a bit strange and not natural that ethdev configuration is applied
> to different port w/o any explicit request from the application.
> > This is why I would prefer #1 above.
> 
> IMHO, the whole concept of representors is that representor is a way of
> attaching same port both to VM and vSwitch/hypervisor.
> 
> If you're looking at representors as a separate ports on a switch, well..

I think we cannot ignore the fact that both in Linux and DPDK those are netdev/ethdev and as such holds some network configuration (even though on some case it is very minimal or some random values).

> In this case, for me VF configuration looks like something that vSwitch should
> not do at all, because it should not configure ports that doesn't attached to it.
> It's like configuring the other side of veth pair, which is nonsense.

It seems to me we configure the remote VF on every possible solution. The suggest one discussed here just make it more explicit. 

> 
> 
> BTW, I don't know a way to find out if port is a representor of something or
> not in Linux kernel.

One example is by querying the netdev phys_port_name. representor ports will have specific strings to describe the remote end (e.g. pf0vf0) while regular port (e.g. uplink) will have pf0.
Ilya Maximets Nov. 5, 2019, 12:15 p.m. UTC | #25
On 04.11.2019 21:33, Shahaf Shuler wrote:
> Monday, November 4, 2019 12:28 PM, Ilya Maximets:
>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF from
>> host
>>
>> On 03.11.2019 7:48, Shahaf Shuler wrote:
>>> Friday, November 1, 2019 11:33 AM, Ilya Maximets:
>>>> Subject: Re: [dpdk-dev] [PATCH v2 0/3] ethdev: configure SR-IOV VF
>>>> from host
>>>>
>>>> On 30.10.2019 22:42, Thomas Monjalon wrote:
>>>>> 30/10/2019 17:09, Ilya Maximets:
>>>>>> On 30.10.2019 16:49, Thomas Monjalon wrote:
>>>>>>> 30/10/2019 16:07, Ilya Maximets:
>>>>>>>> On 29.10.2019 19:50, Thomas Monjalon wrote:
>>>>>>>>> In a virtual environment, the network controller may have to
>>>>>>>>> configure some SR-IOV VF parameters for security reasons.
>>>>>>>>>
>>> [...]
>>>
>>>>> If we consider what Intel did, i.e. configure VF in place of
>>>>> representor for some operations, there are two drawbacks:
>>>>> - confusing that some ops apply to representor, others apply to VF
>>>>> - some ops are not possible on representor (because targetted to VF)
>>>>>
>>>>> I still feel that the addition of one single bit in the port ID is
>>>>> an elegant solution to target either the VF or its representor.
>>>>
>>>> Since we already have a confusion about what is configured when
>>>> operations are performed on a representor port we have 2 options:
>>>
>>> I don't agree we have. I don't think there is any design note or API doc that
>> says the ethdev configuration on representor should be applied on VF
>> (please share if I missed it).
>>> The fact that there are some drivers that implemented it doesn't mean it is
>> correct.
>>>
>>>> 1. Have this proposed API to configure representor itself while
>>>>       setting config to representor and configuring VF if special
>>>>       bit enabled.
>>>> 2. Reverse the logic of current proposal, i.e. always apply
>>>>       configuration to VF while working with representor and apply
>>>>       configuration to representor itself if special bit is set.
>>>>
>>>> I'd probably prefer option #2, because:
>>>> - From the OVS and OpenStack point of view, I think, we don't
>>>>      really need to configure representor itself in most cases.
>>>>      And OVS really should not know if it works with representor
>>>>      or some real port.
>>>
>>> I don't thinks OVS can be really agnostic to the fact it runs on top of
>> representors:
>>> 1. probing of representor has different command line -w
>>> <bdf>,representor=XXX
>>
>> OVS doesn't care about content of devargs. It just passes them to hotplug
>> engine without any parsing (except a single case that must be eliminated
>> with a proper device iterators, not an OVS issue).
>>
>>> 2. the whole acceleration framework based on insertion of flow rules for
>> direct forward from the VF to target entity. Rules are applied on the
>> representor and would not work if port is not such.
>>
>> OVS tries to offload rules to the netdev from which packet was received.
>> That's it.  If it succeeds - OK.  If not, OVS doesn't care.
>>
>>> 3. some multi-port devices cannot do direct fwd between its different port.
>> This is why rep has switch_id and application should query it and act upon.
>>
>> This is part of offloading engine that doesn't affect the generic code.
>> If needed, OVS could request switch_id for netdev it tries to offload rules on.
>> OVS should not know if it representor port or not. If this operation will not
>> succeed for non-representors, OVS should not care because we can't offload
>> anything for non-representors anyway.
> 
> I am not speaking on specific module in OVS if it is aware or not, rather OVS as a whole.
> There is a big code part in OVS that its sole purpose is to create flow rule on representors.
> There is no meaning for this code w/o representors and it should be used only when OVS runs on top of representors (otherwise I would expect bad perf due to the constant preparation of rte_flow rules).

I tend to disagree with that statement.  You're missing cases like
MARK+RSS actions which is the only offloading type supported now
in OVS userspace datapath.  Few more cases I can think of are ingress
policing, access control, some fast rx drop rules that could be used
directly with any HW port that supports offloading, not only representors.
I don't know if all of this is possible right now with DPDK, but these
are definitely valid use cases.

Second point is that current implementation in OVS tries to offload
to any port, and you were one of the co-authors of that code.
After re-working of offloading achitecture that was done this year it's
possible to probe support of the feature before enabling offloading
for particular port, but it's 'TODO' item that is not implemented.

> 
>>
>>> 4. representor carry the VF port id. This is how application know to which
>> VF (or vport) they associated with on their other side.
>>
>> This is just part of devargs,
> 
> It is not part of devargs, it is part of the ethdev info.
> OVS should expose such info to the user in order to help w/ the open flow rules creation.

This info already exposed via devargs that are stored in ovsdb.

> 
> i.e. part of device unique identifier.
>> Once again, OVS doesn't parse devargs and should not do that.
> 
> So how can OVS user know the mapping between representor and its VF?

User already knows that because he/she is the one who wrote devargs.
The case is that OVS should not care about that. If users doesn't
want to check devargs passed to OVS to remember which port is the
representor of which VF, he/she could use meaningful port names.

> 
>>
>>>
>>>> - It seems that most of the existing code in DPDK already works
>>>>      like this, i.e. applying configs to VF itself.  Intel drivers
>>>>      works like this and  Mellanox drivers, as Thomas said, doesn't
>>>>      have this functionality at all.
>>>
>>> As I said above, I don't think we need to refer to specific driver behavior,
>> rather the API guidelines.
>>> To me, it is a bit strange and not natural that ethdev configuration is applied
>> to different port w/o any explicit request from the application.
>>> This is why I would prefer #1 above.
>>
>> IMHO, the whole concept of representors is that representor is a way of
>> attaching same port both to VM and vSwitch/hypervisor.
>>
>> If you're looking at representors as a separate ports on a switch, well..
> 
> I think we cannot ignore the fact that both in Linux and DPDK those are netdev/ethdev and as such holds some network configuration (even though on some case it is very minimal or some random values).
> 
>> In this case, for me VF configuration looks like something that vSwitch should
>> not do at all, because it should not configure ports that doesn't attached to it.
>> It's like configuring the other side of veth pair, which is nonsense.
> 
> It seems to me we configure the remote VF on every possible solution. The suggest one discussed here just make it more explicit.
> 
>>
>>
>> BTW, I don't know a way to find out if port is a representor of something or
>> not in Linux kernel.
> 
> One example is by querying the netdev phys_port_name. representor ports will have specific strings to describe the remote end (e.g. pf0vf0) while regular port (e.g. uplink) will have pf0.
>
Thomas Monjalon Nov. 7, 2019, 2:44 p.m. UTC | #26
Bad news for the DPDK API below:

03/11/2019 23:09, Thomas Monjalon:
> 03/11/2019 16:27, Ananyev, Konstantin:
> > 
> > > > > If we consider what Intel did, i.e. configure VF in place of
> > > > > representor for some operations, there are two drawbacks:
> > > > > - confusing that some ops apply to representor, others apply to VF
> > > > > - some ops are not possible on representor (because targetted to VF)

It seems this confusion and limitation is not scary enough to convince
of the right approach.

Most probably this figure was not well understood:

VF1 ------ VF1 rep /--------\
                   | switch | uplink rep ------ uplink ------ wire
VF2 ------ VF2 rep \--------/    (PF)

Please understand that a packet going from the VF rep to the VF,
is Tx from the representor point of view,
while it is Rx from the VF point of view.
It should explain why VF and rep are different.

One more explanation: doing Rx/Tx from the VF rep is the way,
for a vSwitch, to manage the forwarding in CPU,
before offloading it in the HW switch with flow rules.

After reading how the representor is implemented in Intel PMDs,
I understand there is no intent to do some vSwitch offload with SR-IOV.
The Mellanox PMD is capable of such full offload, that's why
the representor port must be a real DPDK port, not a ghost.


> > > > >
> > > > > I still feel that the addition of one single bit in the port ID is an
> > > > > elegant solution to target either the VF or its representor.
> > > >
> > > > Since we already have a confusion about what is configured when operations
> > > > are performed on a representor port we have 2 options:
> > > 
> > > I don't agree we have. I don't think there is any design note or API doc that says the ethdev configuration on representor should be applied
> > > on VF (please share if I missed it).
> > > The fact that there are some drivers that implemented it doesn't mean it is correct.
> > 
> > Well, it means that at least authors and reviewers of these patches,
> > plus probably next-net maintainers believe that it is correct.
> > At least they did - when patch was applied.

For information, a Mellanox paper of a netdevconf talk in 2016,
which is giving more information about the ideas behind the switchdev model:
https://netdevconf.info/1.2/papers/efraim-gerlitz-sriov-ovs-final.pdf


> > If that is not clearly stated in the doc - it might be just the gap in the documentation,
> 
> Gap in the documentation? We should state that the config should be applied
> to the port specified with port_id and no other one? Funny
> 
> > that needs to be fixed, not a mandate to break existing behavior.
> 
> So because you managed to have a wrong patch applied in your PMD,
> you want to make it the generic API in ethdev?
> What a process!
> 
> Hey guys, if you want to change an API behaviour in a way others don't,
> you just have to implement what you want in your PMD silently,
> then you will be able to change the API to comply with your behaviour.
> Wonderful.
> If we allow such practice, DPDK is dead.

During the Technical Board yesterday, it was decided to go with Intel
understanding of what is a representor, i.e. a ghost of the VF.
It was asked to implement the representor operations on the representor
port with a special flag to make explicit a non-VF operation.
I really tried to implement this reversed logic.
One blocker is to check the port ID flag in every functions,
including the inline Rx/Tx. So I came to the conclusion that the
reverse logic is really ugly and it has no short-term benefit
compared to the mix we have currently.
After long hours thinking in the night, I prefer focusing on the release.
It means we will continue to mix VF and representor operations
with the same port ID. For the record, I believe it is very bad.