[v5,0/9] ethdev: support SubFunction representor

Message ID	1611040409-11548-1-git-send-email-xuemingl@nvidia.com (mailing list archive)
Headers	From: Xueming Li <xuemingl@nvidia.com> To: Cc: dev@dpdk.org, Viacheslav Ovsiienko <viacheslavo@nvidia.com>, xuemingl@nvidia.com, Asaf Penso <asafp@nvidia.com> Date: Tue, 19 Jan 2021 07:13:25 +0000 Message-Id: <1611040409-11548-1-git-send-email-xuemingl@nvidia.com> In-Reply-To: <1608303356-13089-2-git-send-email-xuemingl@nvidia.com> References: <1608303356-13089-2-git-send-email-xuemingl@nvidia.com> Subject: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction representor Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>
Series	ethdev: support SubFunction representor \| [v5,0/9] ethdev: support SubFunction representor

Message ID

1611040409-11548-1-git-send-email-xuemingl@nvidia.com (mailing list archive)

Headers

From: Xueming Li <xuemingl@nvidia.com>
To: 
Cc: dev@dpdk.org, Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
 xuemingl@nvidia.com, Asaf Penso <asafp@nvidia.com>
Date: Tue, 19 Jan 2021 07:13:25 +0000
Message-Id: <1611040409-11548-1-git-send-email-xuemingl@nvidia.com>
In-Reply-To: <1608303356-13089-2-git-send-email-xuemingl@nvidia.com>
References: <1608303356-13089-2-git-send-email-xuemingl@nvidia.com>
Subject: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction representor
Precedence: list
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Series

ethdev: support SubFunction representor |

Message

Xueming Li Jan. 19, 2021, 7:13 a.m. UTC

  dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
offload similar to existing PF and VF representors. A SF shares PCI
level resources with other SFs and/or with its parent PCI function.

From SmartNIC perspective, when PCI device is shared for multi-host,
representors for host controller and host PF is required.

This patch set introduces new representor types in addtion to existing
VF representor. Syntax:

[[c#]pf#]vf#: VF port representor/s from controller/pf
[[c#]pf#]sf#: SF port representor/s from controller/pf
#: VF representor - for backwards compatibility

"#" is number instance, list or range, valid examples:
  1, [1,3,5], [0-3], [0,2-4,6]

For backward compatibility, this patch also introduces new netdev
capability to indicate the capability of supportting SF representor.

Version history:
 RFC:
 	initial version [2]
 V2:
    - separate patch for represnetor infrastructure, controller, pf and
      sf.
    - replace representor ID macro with functions:
      rte_eth_representor_id_encode()
      rte_eth_representor_id_parse()
    - new patch to allow devargs with same PCI BDF but different
      representors.
    - other minor code updates according to comments, thanks Andrew!
    - update document
 V3:
    - improve probing of allowed devargs with same name.
    - parse single word of kvargs as key. 
    - update kvargs test cases.
 V4:
    - split first representor refactor patch into
      1: add representor type
      2: refector representor list parsing
    - push the patch supporting multi-devargs for same device.
 V5:
    - add comments for parsing functions
    - update switch_representation.rst - Thanks Ajit



[1] SubFunction in kernel:
https://lore.kernel.org/netdev/20201112192424.2742-1-parav@nvidia.com/

[2] RFC:
http://patchwork.dpdk.org/project/dpdk/list/?series=14376

[3] V2:
http://patchwork.dpdk.org/project/dpdk/list/?series=14559

[4] V3:
1 http://patchwork.dpdk.org/patch/86460/
2 http://patchwork.dpdk.org/patch/86461/
3 http://patchwork.dpdk.org/patch/86462/
4 http://patchwork.dpdk.org/patch/86463/
5 http://patchwork.dpdk.org/patch/86464/
6 http://patchwork.dpdk.org/patch/86465/
7 http://patchwork.dpdk.org/patch/86467/
8 http://patchwork.dpdk.org/patch/86466/
9 http://patchwork.dpdk.org/patch/86468/

[5] V4:
http://patchwork.dpdk.org/project/dpdk/list/?series=14809

Xueming Li (9):
  ethdev: introduce representor type
  ethdev: support representor port list
  ethdev: support new VF representor syntax
  ethdev: support sub function representor
  ethdev: support PF index in representor
  ethdev: support multi-host in representor
  devarg: change representor ID to bitmap
  ethdev: add capability of sub-function representor
  kvargs: update parser to support lists

 app/test/test_kvargs.c                        |  46 +++++-
 config/rte_config.h                           |   1 +
 doc/guides/prog_guide/poll_mode_drv.rst       |  13 +-
 .../prog_guide/switch_representation.rst      |  35 ++--
 drivers/net/bnxt/bnxt_ethdev.c                |   7 +
 drivers/net/bnxt/bnxt_reps.c                  |   3 +-
 drivers/net/enic/enic_ethdev.c                |   7 +
 drivers/net/enic/enic_vf_representor.c        |   3 +-
 drivers/net/i40e/i40e_ethdev.c                |   8 +
 drivers/net/i40e/i40e_vf_representor.c        |   3 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   8 +
 drivers/net/ixgbe/ixgbe_vf_representor.c      |   3 +-
 drivers/net/mlx5/linux/mlx5_os.c              |  15 +-
 lib/librte_ethdev/ethdev_private.c            | 154 ++++++++++++------
 lib/librte_ethdev/ethdev_private.h            |   3 -
 lib/librte_ethdev/rte_class_eth.c             |  40 +++--
 lib/librte_ethdev/rte_ethdev.c                |  31 +++-
 lib/librte_ethdev/rte_ethdev.h                |   2 +
 lib/librte_ethdev/rte_ethdev_driver.h         |  66 ++++++++
 lib/librte_ethdev/version.map                 |   2 +
 lib/librte_kvargs/rte_kvargs.c                | 101 ++++++++----
 21 files changed, 428 insertions(+), 123 deletions(-)

Comments

Andrew Rybchenko Jan. 19, 2021, 8:40 a.m. UTC | #1

On 1/19/21 10:13 AM, Xueming Li wrote:
> dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
> offload similar to existing PF and VF representors. A SF shares PCI
> level resources with other SFs and/or with its parent PCI function.
> 
>>From SmartNIC perspective, when PCI device is shared for multi-host,
> representors for host controller and host PF is required.
> 
> This patch set introduces new representor types in addtion to existing
> VF representor. Syntax:
> 
> [[c#]pf#]vf#: VF port representor/s from controller/pf
> [[c#]pf#]sf#: SF port representor/s from controller/pf
> #: VF representor - for backwards compatibility
> 
> "#" is number instance, list or range, valid examples:
>   1, [1,3,5], [0-3], [0,2-4,6]
> 
> For backward compatibility, this patch also introduces new netdev
> capability to indicate the capability of supportting SF representor.

The patch series looks really nice. Thanks.

As before, my biggest concern is making representor ID
a bitmap. See my comments to a specific patch.
Plus absence of defined semantics of caller function.
Basically it is looks like it is assumed that
controller #0 and PF #0 is the caller. However, it
could be wrong.

The next biggest concert is the absence of capability
reporting API. How many controller are available?
How many PFs on each controller are available?
How many VFs on each controller/PF are available?
How many SFs?

From the first sight it sounds not that important right
now and an extra feature which could be added in the
follow up patches, but IMHO addition of the API
would allow to avoid making representor ID a bitmap.
Basically capabilities API can provide an array of
available functions with representor ID assigned to
each entry. Also it could make the entire patch
series optional since it would allow to interpret
numbers in representor=[....] as representor IDs
which are mapped to controller/PF/VF/SF by the
capabilities reporting API.

I realize that sometimes it could be more convenient to
use syntax suggested here. Mainly for human.

Xueming Li Jan. 19, 2021, 2:24 p.m. UTC | #2

>-----Original Message-----
>From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>Sent: Tuesday, January 19, 2021 4:41 PM
>To: Xueming(Steven) Li <xuemingl@nvidia.com>
>Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso
><asafp@nvidia.com>
>Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>representor
>
>On 1/19/21 10:13 AM, Xueming Li wrote:
>> dedicated queues(txq, rxq). A SF netdev supports E-Switch
>> representation offload similar to existing PF and VF representors. A
>> SF shares PCI level resources with other SFs and/or with its parent PCI
>function.
>>
>>>From SmartNIC perspective, when PCI device is shared for multi-host,
>> representors for host controller and host PF is required.
>>
>> This patch set introduces new representor types in addtion to existing
>> VF representor. Syntax:
>>
>> [[c#]pf#]vf#: VF port representor/s from controller/pf
>> [[c#]pf#]sf#: SF port representor/s from controller/pf
>> #: VF representor - for backwards compatibility
>>
>> "#" is number instance, list or range, valid examples:
>>   1, [1,3,5], [0-3], [0,2-4,6]
>>
>> For backward compatibility, this patch also introduces new netdev
>> capability to indicate the capability of supportting SF representor.
>
>The patch series looks really nice. Thanks.
>
>As before, my biggest concern is making representor ID a bitmap. See my
>comments to a specific patch.
>Plus absence of defined semantics of caller function.
>Basically it is looks like it is assumed that controller #0 and PF #0 is the caller.
>However, it could be wrong.

From devargs syntax perspective, a VF representor on PF1 could be referenced either way:
  1:  <PF0_BDF>,representor=pf1vfX
  2:  <PF1_BDF>,representor=vfX
  3:  <PF1_BDF>,representor=pf0vfX // works but not suggested, use PF0 BDF as caller
If probe a device with PF0 BDF and locate it with PF1 BDF, the device iterator will fail,
devargs is parsed by api, EAL layer simply compare PCI BDF and then representor ID, 
i.e. "c#pf#vf#". The caller device should be consistent, better to use the first device.

Devrg "<PF1_BDF>,representor=vfX" will work, representor controller ID and pf ID
default to #0, relative to caller context PF1 BDF.

Is it good to add such suggestion/behavior on rte_eth_devargs comments?

>
>The next biggest concert is the absence of capability reporting API. How many
>controller are available?
>How many PFs on each controller are available?
>How many VFs on each controller/PF are available?
>How many SFs?
>
>From the first sight it sounds not that important right now and an extra
>feature which could be added in the follow up patches, but IMHO addition of
>the API would allow to avoid making representor ID a bitmap.
>Basically capabilities API can provide an array of available functions with
>representor ID assigned to each entry. Also it could make the entire patch
>series optional since it would allow to interpret numbers in representor=[....]
>as representor IDs which are mapped to controller/PF/VF/SF by the
>capabilities reporting API.
>
>I realize that sometimes it could be more convenient to use syntax suggested
>here. Mainly for human.

Agree, mostly for human. Regarding to capability reporting API, how about remove
current one and enhance it later with a complete patch set?

Tu, Lijuan Jan. 21, 2021, 3:32 a.m. UTC | #3

> dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
> offload similar to existing PF and VF representors. A SF shares PCI level
> resources with other SFs and/or with its parent PCI function.
> 
> From SmartNIC perspective, when PCI device is shared for multi-host,
> representors for host controller and host PF is required.
> 
> This patch set introduces new representor types in addtion to existing VF
> representor. Syntax:
> 
> [[c#]pf#]vf#: VF port representor/s from controller/pf
> [[c#]pf#]sf#: SF port representor/s from controller/pf
> #: VF representor - for backwards compatibility
> 
> "#" is number instance, list or range, valid examples:
>   1, [1,3,5], [0-3], [0,2-4,6]
> 
> For backward compatibility, this patch also introduces new netdev capability
> to indicate the capability of supportting SF representor.
> 
> Version history:
>  RFC:
>  	initial version [2]
>  V2:
>     - separate patch for represnetor infrastructure, controller, pf and
>       sf.
>     - replace representor ID macro with functions:
>       rte_eth_representor_id_encode()
>       rte_eth_representor_id_parse()
>     - new patch to allow devargs with same PCI BDF but different
>       representors.
>     - other minor code updates according to comments, thanks Andrew!
>     - update document
>  V3:
>     - improve probing of allowed devargs with same name.
>     - parse single word of kvargs as key.
>     - update kvargs test cases.
>  V4:
>     - split first representor refactor patch into
>       1: add representor type
>       2: refector representor list parsing
>     - push the patch supporting multi-devargs for same device.
>  V5:
>     - add comments for parsing functions
>     - update switch_representation.rst - Thanks Ajit

Hi,

The series can't recognize i40e device, so testpmd start failed.

	./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -l 4-8 -n 6 -a 17:00.0 -- -i --txq=2 --rxq=2
	<...>
	EAL: Probe PCI driver: net_i40e (8086:158b) device: 0000:17:00.0 (socket 0)
	EAL: No legacy callbacks, legacy socket not created
	testpmd: No probed ethernet devices
	Interactive-mode selected
	Fail: input txq (2) can't be greater than max_tx_queues (0) of port 0
	EAL: Error - exiting with code: 1
	  Cause: txq 2 invalid - must be >= 0 && <= 0

thanks

Andrew Rybchenko Jan. 22, 2021, 8:21 a.m. UTC | #4

On 1/19/21 5:24 PM, Xueming(Steven) Li wrote:
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Tuesday, January 19, 2021 4:41 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso
>> <asafp@nvidia.com>
>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>> representor
>>
>> On 1/19/21 10:13 AM, Xueming Li wrote:
>>> dedicated queues(txq, rxq). A SF netdev supports E-Switch
>>> representation offload similar to existing PF and VF representors. A
>>> SF shares PCI level resources with other SFs and/or with its parent PCI
>> function.
>>>
>>> >From SmartNIC perspective, when PCI device is shared for multi-host,
>>> representors for host controller and host PF is required.
>>>
>>> This patch set introduces new representor types in addtion to existing
>>> VF representor. Syntax:
>>>
>>> [[c#]pf#]vf#: VF port representor/s from controller/pf
>>> [[c#]pf#]sf#: SF port representor/s from controller/pf
>>> #: VF representor - for backwards compatibility
>>>
>>> "#" is number instance, list or range, valid examples:
>>>   1, [1,3,5], [0-3], [0,2-4,6]
>>>
>>> For backward compatibility, this patch also introduces new netdev
>>> capability to indicate the capability of supportting SF representor.
>>
>> The patch series looks really nice. Thanks.
>>
>> As before, my biggest concern is making representor ID a bitmap. See my
>> comments to a specific patch.
>> Plus absence of defined semantics of caller function.
>> Basically it is looks like it is assumed that controller #0 and PF #0 is the caller.
>> However, it could be wrong.
> 
> From devargs syntax perspective, a VF representor on PF1 could be referenced either way:
>   1:  <PF0_BDF>,representor=pf1vfX
>   2:  <PF1_BDF>,representor=vfX
>   3:  <PF1_BDF>,representor=pf0vfX // works but not suggested, use PF0 BDF as caller
> If probe a device with PF0 BDF and locate it with PF1 BDF, the device iterator will fail,
> devargs is parsed by api, EAL layer simply compare PCI BDF and then representor ID, 
> i.e. "c#pf#vf#". The caller device should be consistent, better to use the first device.
> 
> Devrg "<PF1_BDF>,representor=vfX" will work, representor controller ID and pf ID
> default to #0, relative to caller context PF1 BDF.
> 
> Is it good to add such suggestion/behavior on rte_eth_devargs comments?

Sorry, but I disagree. Solution with vf1 and pf0vf1 making the
same representor ID sounds not acceptable to me. It is
definitely different things if I pass it to PF1.


>>
>> The next biggest concert is the absence of capability reporting API. How many
>> controller are available?
>> How many PFs on each controller are available?
>> How many VFs on each controller/PF are available?
>> How many SFs?
>>
>>From the first sight it sounds not that important right now and an extra
>> feature which could be added in the follow up patches, but IMHO addition of
>> the API would allow to avoid making representor ID a bitmap.
>> Basically capabilities API can provide an array of available functions with
>> representor ID assigned to each entry. Also it could make the entire patch
>> series optional since it would allow to interpret numbers in representor=[....]
>> as representor IDs which are mapped to controller/PF/VF/SF by the
>> capabilities reporting API.
>>
>> I realize that sometimes it could be more convenient to use syntax suggested
>> here. Mainly for human.
> 
> Agree, mostly for human. Regarding to capability reporting API, how about remove
> current one and enhance it later with a complete patch set?
> 

Current one? Sorry I don't understand.

Andrew.

Xueming Li Jan. 27, 2021, 3:04 a.m. UTC | #5

Hi Andrew,

>-----Original Message-----
>From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>Sent: Friday, January 22, 2021 4:21 PM
>To: Xueming(Steven) Li <xuemingl@nvidia.com>
>Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso
><asafp@nvidia.com>
>Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>representor
>
>On 1/19/21 5:24 PM, Xueming(Steven) Li wrote:
>>
>>> -----Original Message-----
>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>> Sent: Tuesday, January 19, 2021 4:41 PM
>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf
>>> Penso <asafp@nvidia.com>
>>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>>> representor
>>>
>>> On 1/19/21 10:13 AM, Xueming Li wrote:
>>>> dedicated queues(txq, rxq). A SF netdev supports E-Switch
>>>> representation offload similar to existing PF and VF representors. A
>>>> SF shares PCI level resources with other SFs and/or with its parent
>>>> PCI
>>> function.
>>>>
>>>> >From SmartNIC perspective, when PCI device is shared for
>>>> >multi-host,
>>>> representors for host controller and host PF is required.
>>>>
>>>> This patch set introduces new representor types in addtion to
>>>> existing VF representor. Syntax:
>>>>
>>>> [[c#]pf#]vf#: VF port representor/s from controller/pf
>>>> [[c#]pf#]sf#: SF port representor/s from controller/pf
>>>> #: VF representor - for backwards compatibility
>>>>
>>>> "#" is number instance, list or range, valid examples:
>>>>   1, [1,3,5], [0-3], [0,2-4,6]
>>>>
>>>> For backward compatibility, this patch also introduces new netdev
>>>> capability to indicate the capability of supportting SF representor.
>>>
>>> The patch series looks really nice. Thanks.
>>>
>>> As before, my biggest concern is making representor ID a bitmap. See
>>> my comments to a specific patch.
>>> Plus absence of defined semantics of caller function.
>>> Basically it is looks like it is assumed that controller #0 and PF #0 is the
>caller.
>>> However, it could be wrong.
>>
>> From devargs syntax perspective, a VF representor on PF1 could be
>referenced either way:
>>   1:  <PF0_BDF>,representor=pf1vfX
>>   2:  <PF1_BDF>,representor=vfX
>>   3:  <PF1_BDF>,representor=pf0vfX // works but not suggested, use PF0
>> BDF as caller If probe a device with PF0 BDF and locate it with PF1
>> BDF, the device iterator will fail, devargs is parsed by api, EAL
>> layer simply compare PCI BDF and then representor ID, i.e. "c#pf#vf#". The
>caller device should be consistent, better to use the first device.
>>
>> Devrg "<PF1_BDF>,representor=vfX" will work, representor controller ID
>> and pf ID default to #0, relative to caller context PF1 BDF.
>>
>> Is it good to add such suggestion/behavior on rte_eth_devargs comments?
>
>Sorry, but I disagree. Solution with vf1 and pf0vf1 making the same
>representor ID sounds not acceptable to me. It is definitely different things if I
>pass it to PF1.

My understanding is that representor is an "offset" to caller(context) PF, so the
caller PF has to be the lowest, i.e. PF0. The usage for mlx5 PMD is to probe VF 
representor on a kernel bonding:
1. PF0 is the only device for PMD to probe the bonding device, PMD detects underlay PFs.
2. PF0_BDF,reprensetor=pf1vf0, to probe the representor for first VF on PF1.
PF1_BDF,reprenstor=pf0vf0 works for the same representor, but not encouraged because 
it confuses EAL device iterator which can't tell the difference, as you said: "mainly for human"

Controller ID is not used by mlx5 PMD currently, just a place holder for now, but I think 
same policy applies: lowest as caller(context).

Your suggestion of reserving an c# and pf# ID for caller device which c# or pf# not specified looks
good, it makes the usage of the representor ID bitmap flexible, but considering some NIC with 4 PFs,
it's hard to choose, as you know, only 3 controller and 3 PFs left.

BTW, I guess your assumption is that  representor as "absolute" to caller PF, can't default to 0.
Is there a scenario for PF1 as caller?

>
>
>>>
>>> The next biggest concert is the absence of capability reporting API.
>>> How many controller are available?
>>> How many PFs on each controller are available?
>>> How many VFs on each controller/PF are available?
>>> How many SFs?
>>>
>>>From the first sight it sounds not that important right now and an
>>>extra  feature which could be added in the follow up patches, but IMHO
>>>addition of  the API would allow to avoid making representor ID a bitmap.
>>> Basically capabilities API can provide an array of available
>>>functions with  representor ID assigned to each entry. Also it could
>>>make the entire patch  series optional since it would allow to
>>>interpret numbers in representor=[....]  as representor IDs which are
>>>mapped to controller/PF/VF/SF by the  capabilities reporting API.
>>>
>>> I realize that sometimes it could be more convenient to use syntax
>>> suggested here. Mainly for human.
>>
>> Agree, mostly for human. Regarding to capability reporting API, how
>> about remove current one and enhance it later with a complete patch set?
>>
>
>Current one? Sorry I don't understand.

The patch of device SF capability, but seems I misunderstood your suggestion.
Let me explain process to create a SF:
1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
If using new api to return all representor IDs, need some way locate the new created SF by PF and SF number,
that's why "pf#sf#" is used in this patch set.

In the future, I think representor could be processed by PMD, so PMD could have enough flexibility
to support more device expressions and types. But that will introduce a fundamental change of devargs and 
device management, need a full plan.

>
>Andrew.

Andrew Rybchenko Jan. 27, 2021, 12:10 p.m. UTC | #6

On 1/27/21 6:04 AM, Xueming(Steven) Li wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Friday, January 22, 2021 4:21 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso
>> <asafp@nvidia.com>
>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>> representor
>>
>> On 1/19/21 5:24 PM, Xueming(Steven) Li wrote:
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Tuesday, January 19, 2021 4:41 PM
>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf
>>>> Penso <asafp@nvidia.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>>>> representor
>>>>
>>>> On 1/19/21 10:13 AM, Xueming Li wrote:
>>>>> dedicated queues(txq, rxq). A SF netdev supports E-Switch
>>>>> representation offload similar to existing PF and VF representors. A
>>>>> SF shares PCI level resources with other SFs and/or with its parent
>>>>> PCI
>>>> function.
>>>>>
>>>>> >From SmartNIC perspective, when PCI device is shared for
>>>>>> multi-host,
>>>>> representors for host controller and host PF is required.
>>>>>
>>>>> This patch set introduces new representor types in addtion to
>>>>> existing VF representor. Syntax:
>>>>>
>>>>> [[c#]pf#]vf#: VF port representor/s from controller/pf
>>>>> [[c#]pf#]sf#: SF port representor/s from controller/pf
>>>>> #: VF representor - for backwards compatibility
>>>>>
>>>>> "#" is number instance, list or range, valid examples:
>>>>>   1, [1,3,5], [0-3], [0,2-4,6]
>>>>>
>>>>> For backward compatibility, this patch also introduces new netdev
>>>>> capability to indicate the capability of supportting SF representor.
>>>>
>>>> The patch series looks really nice. Thanks.
>>>>
>>>> As before, my biggest concern is making representor ID a bitmap. See
>>>> my comments to a specific patch.
>>>> Plus absence of defined semantics of caller function.
>>>> Basically it is looks like it is assumed that controller #0 and PF #0 is the
>> caller.
>>>> However, it could be wrong.
>>>
>>> From devargs syntax perspective, a VF representor on PF1 could be
>> referenced either way:
>>>   1:  <PF0_BDF>,representor=pf1vfX
>>>   2:  <PF1_BDF>,representor=vfX
>>>   3:  <PF1_BDF>,representor=pf0vfX // works but not suggested, use PF0
>>> BDF as caller If probe a device with PF0 BDF and locate it with PF1
>>> BDF, the device iterator will fail, devargs is parsed by api, EAL
>>> layer simply compare PCI BDF and then representor ID, i.e. "c#pf#vf#". The
>> caller device should be consistent, better to use the first device.
>>>
>>> Devrg "<PF1_BDF>,representor=vfX" will work, representor controller ID
>>> and pf ID default to #0, relative to caller context PF1 BDF.
>>>
>>> Is it good to add such suggestion/behavior on rte_eth_devargs comments?
>>
>> Sorry, but I disagree. Solution with vf1 and pf0vf1 making the same
>> representor ID sounds not acceptable to me. It is definitely different things if I
>> pass it to PF1.
> 
> My understanding is that representor is an "offset" to caller(context) PF, so the
> caller PF has to be the lowest, i.e. PF0.

Sorry, how to address PF0 from PF1?

> The usage for mlx5 PMD is to probe VF 
> representor on a kernel bonding:
> 1. PF0 is the only device for PMD to probe the bonding device, PMD detects underlay PFs.
> 2. PF0_BDF,reprensetor=pf1vf0, to probe the representor for first VF on PF1.
> PF1_BDF,reprenstor=pf0vf0 works for the same representor, but not encouraged because 
> it confuses EAL device iterator which can't tell the difference, as you said: "mainly for human"
> 
> Controller ID is not used by mlx5 PMD currently, just a place holder for now, but I think 
> same policy applies: lowest as caller(context).
> 
> Your suggestion of reserving an c# and pf# ID for caller device which c# or pf# not specified looks
> good, it makes the usage of the representor ID bitmap flexible, but considering some NIC with 4 PFs,
> it's hard to choose, as you know, only 3 controller and 3 PFs left.

I'd prefer the option if we make a choice between these two.
At least it a bit more consistent.
But IMHO anyway bitmap is a road to wrong direction.

> BTW, I guess your assumption is that  representor as "absolute" to caller PF, can't default to 0.
> Is there a scenario for PF1 as caller?

Yes.

>>>>
>>>> The next biggest concert is the absence of capability reporting API.
>>>> How many controller are available?
>>>> How many PFs on each controller are available?
>>>> How many VFs on each controller/PF are available?
>>>> How many SFs?
>>>>
>>> >From the first sight it sounds not that important right now and an
>>>> extra  feature which could be added in the follow up patches, but IMHO
>>>> addition of  the API would allow to avoid making representor ID a bitmap.
>>>> Basically capabilities API can provide an array of available
>>>> functions with  representor ID assigned to each entry. Also it could
>>>> make the entire patch  series optional since it would allow to
>>>> interpret numbers in representor=[....]  as representor IDs which are
>>>> mapped to controller/PF/VF/SF by the  capabilities reporting API.
>>>>
>>>> I realize that sometimes it could be more convenient to use syntax
>>>> suggested here. Mainly for human.
>>>
>>> Agree, mostly for human. Regarding to capability reporting API, how
>>> about remove current one and enhance it later with a complete patch set?
>>>
>>
>> Current one? Sorry I don't understand.
> 
> The patch of device SF capability, but seems I misunderstood your suggestion.
> Let me explain process to create a SF:
> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.

Is there a maximum index and maximum total number of SF's
created? How to find it?

> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
> If using new api to return all representor IDs, need some way locate the new created SF by PF and SF number,
> that's why "pf#sf#" is used in this patch set.

I think the API should simply reserve/report space for maximum
number of SFs. So, IDs are stable across restart/reboot in
assumption that NIC is not reconfigured (changed maximum number
of VF or maximum number of SFs of any PF).

> 
> In the future, I think representor could be processed by PMD, so PMD could have enough flexibility
> to support more device expressions and types. But that will introduce a fundamental change of devargs and 
> device management, need a full plan.
> 
>>
>> Andrew.
>

Ajit Khaparde Jan. 27, 2021, 5:43 p.m. UTC | #7

On Tue, Jan 26, 2021 at 7:05 PM Xueming(Steven) Li <xuemingl@nvidia.com> wrote:
>
::::[snip]::::
> The patch of device SF capability, but seems I misunderstood your suggestion.
> Let me explain process to create a SF:
> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
> If using new api to return all representor IDs, need some way locate the new created SF by PF and SF number,
> that's why "pf#sf#" is used in this patch set.
>
> In the future, I think representor could be processed by PMD, so PMD could have enough flexibility
> to support more device expressions and types. But that will introduce a fundamental change of devargs and
> device management, need a full plan.
Do you mean all changes will be contained within the PMD? The
fundamental changes will be in the PMD?
More types of what?

>
> >
> >Andrew.
>

Xueming Li Jan. 28, 2021, 11:45 a.m. UTC | #8

Hi Ajit,

>-----Original Message-----
>From: Ajit Khaparde <ajit.khaparde@broadcom.com>
>Sent: Thursday, January 28, 2021 1:43 AM
>To: Xueming(Steven) Li <xuemingl@nvidia.com>
>Cc: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org;
>Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso <asafp@nvidia.com>
>Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>representor
>
>On Tue, Jan 26, 2021 at 7:05 PM Xueming(Steven) Li <xuemingl@nvidia.com>
>wrote:
>>
>::::[snip]::::
>> The patch of device SF capability, but seems I misunderstood your
>suggestion.
>> Let me explain process to create a SF:
>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-
>created.
>> 2. SF is created on a PF with a SF number. SF number is named per PF,
>different PF may have same SF number.
>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no
>need to use pf#sf# here.
>> 4. For bonding netdev, hot plug to DPDK using
>"PF0_BDF,representor=pf#sf#"
>> If using new api to return all representor IDs, need some way locate the new
>created SF by PF and SF number,
>> that's why "pf#sf#" is used in this patch set.
>>
>> In the future, I think representor could be processed by PMD, so PMD could
>have enough flexibility
>> to support more device expressions and types. But that will introduce a
>fundamental change of devargs and
>> device management, need a full plan.
>Do you mean all changes will be contained within the PMD? The
>fundamental changes will be in the PMD?

Yes, PMD detect device type and IDs from devargs and figure out a unique name for device.

>More types of what?

SF, representor of SF, representor of VF...

>
>>
>> >
>> >Andrew.
>>

Xueming Li Jan. 28, 2021, 2:31 p.m. UTC | #9

<snip>
>> The patch of device SF capability, but seems I misunderstood your suggestion.
>> Let me explain process to create a SF:
>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
>
>Is there a maximum index and maximum total number of SF's created? How to find it?

The maximum index is defined by firmware configuration, all SF's information could be found
from sysfs. To create a SF, both PCI and sfnum have to be specified.

>
>> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
>> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
>> If using new api to return all representor IDs, need some way locate
>> the new created SF by PF and SF number, that's why "pf#sf#" is used in this patch set.
>
>I think the API should simply reserve/report space for maximum number of SFs. So, IDs are stable across restart/reboot in assumption
>that NIC is not reconfigured (changed maximum number of VF or maximum number of SFs of any PF).

Yes, IDs should be stable as long as no  NIC firmware configuration change.

Just clarify, this api should be common enough to report all devices that a bus device supports:
1. name, might contains controller and pf info, example: "eth:representor:c0pf1vf"
2. ID range, example: 0-127
The api describes ID ranges for each sub device type, users have to query the api and choose representor ID to probe.

Prototype:
struct rte_bus_device_range {
	char name[64];
	uint32_t base;
	uint32_t number;
}
/* return number of ranges filled, or number of ranges if list is NULL. */
int rte_bus_ dev_range_get(struct rte_bus_device_range *list, int n);


>
<snip>

Andrew Rybchenko Feb. 1, 2021, 8:39 a.m. UTC | #10

On 1/28/21 5:31 PM, Xueming(Steven) Li wrote:
> <snip>
>>> The patch of device SF capability, but seems I misunderstood your suggestion.
>>> Let me explain process to create a SF:
>>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
>>
>> Is there a maximum index and maximum total number of SF's created? How to find it?
> 
> The maximum index is defined by firmware configuration, all SF's information could be found
> from sysfs. To create a SF, both PCI and sfnum have to be specified.

sysfs is obviously Linux specific. I think the information
should be available via DPDK API.

>>
>>> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
>>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
>>> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
>>> If using new api to return all representor IDs, need some way locate
>>> the new created SF by PF and SF number, that's why "pf#sf#" is used in this patch set.
>>
>> I think the API should simply reserve/report space for maximum number of SFs. So, IDs are stable across restart/reboot in assumption
>> that NIC is not reconfigured (changed maximum number of VF or maximum number of SFs of any PF).
> 
> Yes, IDs should be stable as long as no  NIC firmware configuration change.
> 
> Just clarify, this api should be common enough to report all devices that a bus device supports:
> 1. name, might contains controller and pf info, example: "eth:representor:c0pf1vf"
> 2. ID range, example: 0-127
> The api describes ID ranges for each sub device type, users have to query the api and choose representor ID to probe.
> 
> Prototype:
> struct rte_bus_device_range {
> 	char name[64];
> 	uint32_t base;
> 	uint32_t number;
> }
> /* return number of ranges filled, or number of ranges if list is NULL. */
> int rte_bus_ dev_range_get(struct rte_bus_device_range *list, int n);

Hm, I thought about more port representor specific API.
For me it is hard to tell if such generic naming is good or
bad. I think it should be proven that such generic API
makes sense. Any other potential users / use cases?

I've considered ethdev API which returns (in similar way as
above) list of possible port representors which could be
controlled by the device. Also I think it would be useful
to include type information (enum with PF, VF, SF),
controller ID.

There is one more bit which is not in the picture yet -
switch_info.port_id. Should it be equal to representor
ID? Or different and provided in the info structure?

Xueming Li Feb. 4, 2021, 2:15 p.m. UTC | #11

>-----Original Message-----
>From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>Sent: Monday, February 1, 2021 4:39 PM
>To: Xueming(Steven) Li <xuemingl@nvidia.com>
>Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso <asafp@nvidia.com>
>Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction representor
>
>On 1/28/21 5:31 PM, Xueming(Steven) Li wrote:
>> <snip>
>>>> The patch of device SF capability, but seems I misunderstood your suggestion.
>>>> Let me explain process to create a SF:
>>>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
>>>
>>> Is there a maximum index and maximum total number of SF's created? How to find it?
>>
>> The maximum index is defined by firmware configuration, all SF's
>> information could be found from sysfs. To create a SF, both PCI and sfnum have to be specified.
>
>sysfs is obviously Linux specific. I think the information should be available via DPDK API.

Yes, the new api discussed below should resolve this issue.

>
>>>
>>>> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
>>>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
>>>> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
>>>> If using new api to return all representor IDs, need some way locate
>>>> the new created SF by PF and SF number, that's why "pf#sf#" is used in this patch set.
>>>
>>> I think the API should simply reserve/report space for maximum number
>>> of SFs. So, IDs are stable across restart/reboot in assumption that NIC is not reconfigured (changed maximum number of VF or
>maximum number of SFs of any PF).
>>
>> Yes, IDs should be stable as long as no  NIC firmware configuration change.
>>
>> Just clarify, this api should be common enough to report all devices that a bus device supports:
>> 1. name, might contains controller and pf info, example: "eth:representor:c0pf1vf"
>> 2. ID range, example: 0-127
>> The api describes ID ranges for each sub device type, users have to query the api and choose representor ID to probe.
>>
>> Prototype:
>> struct rte_bus_device_range {
>> 	char name[64];
>> 	uint32_t base;
>> 	uint32_t number;
>> }
>> /* return number of ranges filled, or number of ranges if list is
>> NULL. */ int rte_bus_ dev_range_get(struct rte_bus_device_range *list,
>> int n);
>
>Hm, I thought about more port representor specific API.
>For me it is hard to tell if such generic naming is good or bad. I think it should be proven that such generic API makes sense. Any other
>potential users / use cases?

I was thinking about SF, but SF is PCI specific, not suitable for this api. So I'm fine to make it as ethdev api.
To append new api into eth_dev_ops, is there ABI concern?

>
>I've considered ethdev API which returns (in similar way as
>above) list of possible port representors which could be controlled by the device. Also I think it would be useful to include type
>information (enum with PF, VF, SF), controller ID.

Agree. 

There is a new concern from orchestration side, currently, no interface in openstack and OVS to retrieve representor ID range info,
It will take time to adapt this solution. To probe a representor, orchestration need to know how to calculate representor ID, 
and the ID might vary on different max SF number, i.e. VF4 on PP1 might got different ID. Representor ID change before that will
break the product.

Considering both orchestration and testpmd users, how about keeping both solution together? This will bring max flexibility IMHO.

In struct rte_eth_dev_data, reserved bits could be used to define controller and port, this will avoid bitmap. How do you think?

>
>There is one more bit which is not in the picture yet - switch_info.port_id. Should it be equal to representor ID? Or different and
>provided in the info structure?

Not exactly same AFAIK, the id used in e-switch.

Andrew Rybchenko Feb. 5, 2021, 7:34 a.m. UTC | #12

On 2/4/21 5:15 PM, Xueming(Steven) Li wrote:
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, February 1, 2021 4:39 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso <asafp@nvidia.com>
>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction representor
>>
>> On 1/28/21 5:31 PM, Xueming(Steven) Li wrote:
>>> <snip>
>>>>> The patch of device SF capability, but seems I misunderstood your suggestion.
>>>>> Let me explain process to create a SF:
>>>>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
>>>>
>>>> Is there a maximum index and maximum total number of SF's created? How to find it?
>>>
>>> The maximum index is defined by firmware configuration, all SF's
>>> information could be found from sysfs. To create a SF, both PCI and sfnum have to be specified.
>>
>> sysfs is obviously Linux specific. I think the information should be available via DPDK API.
> 
> Yes, the new api discussed below should resolve this issue.
> 
>>
>>>>
>>>>> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
>>>>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
>>>>> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
>>>>> If using new api to return all representor IDs, need some way locate
>>>>> the new created SF by PF and SF number, that's why "pf#sf#" is used in this patch set.
>>>>
>>>> I think the API should simply reserve/report space for maximum number
>>>> of SFs. So, IDs are stable across restart/reboot in assumption that NIC is not reconfigured (changed maximum number of VF or
>> maximum number of SFs of any PF).
>>>
>>> Yes, IDs should be stable as long as no  NIC firmware configuration change.
>>>
>>> Just clarify, this api should be common enough to report all devices that a bus device supports:
>>> 1. name, might contains controller and pf info, example: "eth:representor:c0pf1vf"
>>> 2. ID range, example: 0-127
>>> The api describes ID ranges for each sub device type, users have to query the api and choose representor ID to probe.
>>>
>>> Prototype:
>>> struct rte_bus_device_range {
>>> 	char name[64];
>>> 	uint32_t base;
>>> 	uint32_t number;
>>> }
>>> /* return number of ranges filled, or number of ranges if list is
>>> NULL. */ int rte_bus_ dev_range_get(struct rte_bus_device_range *list,
>>> int n);
>>
>> Hm, I thought about more port representor specific API.
>> For me it is hard to tell if such generic naming is good or bad. I think it should be proven that such generic API makes sense. Any other
>> potential users / use cases?
> 
> I was thinking about SF, but SF is PCI specific, not suitable for this api. So I'm fine to make it as ethdev api.
> To append new api into eth_dev_ops, is there ABI concern?

No, eth_dev_ops are internal

>> I've considered ethdev API which returns (in similar way as
>> above) list of possible port representors which could be controlled by the device. Also I think it would be useful to include type
>> information (enum with PF, VF, SF), controller ID.
> 
> Agree. 
> 
> There is a new concern from orchestration side, currently, no interface in openstack and OVS to retrieve representor ID range info,
> It will take time to adapt this solution. To probe a representor, orchestration need to know how to calculate representor ID, 
> and the ID might vary on different max SF number, i.e. VF4 on PP1 might got different ID. Representor ID change before that will
> break the product.

I see.

> Considering both orchestration and testpmd users, how about keeping both solution together? This will bring max flexibility IMHO.

As I said before I don't mind and I really think it is a good
idea to add suggested interface to specify representor
(i.e. cXpfYvfZ), but the problem is making bitmap from
representor ID.

ethdev API should use new representor info API to
make a representor ID from controller/PF/{VF,SF}.
Or do you see any problems with such approach?

> In struct rte_eth_dev_data, reserved bits could be used to define controller and port, this will avoid bitmap. How do you think?

Could you add a bit more on it? Just a bit more details to
the idea since I don't understand what exactly you mean and
how it could help.

>>
>> There is one more bit which is not in the picture yet - switch_info.port_id. Should it be equal to representor ID? Or different and
>> provided in the info structure?
> 
> Not exactly same AFAIK, the id used in e-switch.
> 
>

Xueming Li Feb. 5, 2021, 9:13 a.m. UTC | #13

>-----Original Message-----
>From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>Sent: Friday, February 5, 2021 3:35 PM
>To: Xueming(Steven) Li <xuemingl@nvidia.com>
>Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso <asafp@nvidia.com>; Thomas Monjalon
><tmonjalon@nvidia.com>
>Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction representor
>
>On 2/4/21 5:15 PM, Xueming(Steven) Li wrote:
>>
>>> -----Original Message-----
>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>> Sent: Monday, February 1, 2021 4:39 PM
>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf
>>> Penso <asafp@nvidia.com>
>>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>>> representor
>>>
>>> On 1/28/21 5:31 PM, Xueming(Steven) Li wrote:
>>>> <snip>
>>>>>> The patch of device SF capability, but seems I misunderstood your suggestion.
>>>>>> Let me explain process to create a SF:
>>>>>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
>>>>>
>>>>> Is there a maximum index and maximum total number of SF's created? How to find it?
>>>>
>>>> The maximum index is defined by firmware configuration, all SF's
>>>> information could be found from sysfs. To create a SF, both PCI and sfnum have to be specified.
>>>
>>> sysfs is obviously Linux specific. I think the information should be available via DPDK API.
>>
>> Yes, the new api discussed below should resolve this issue.
>>
>>>
>>>>>
>>>>>> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
>>>>>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
>>>>>> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
>>>>>> If using new api to return all representor IDs, need some way
>>>>>> locate the new created SF by PF and SF number, that's why "pf#sf#" is used in this patch set.
>>>>>
>>>>> I think the API should simply reserve/report space for maximum
>>>>> number of SFs. So, IDs are stable across restart/reboot in
>>>>> assumption that NIC is not reconfigured (changed maximum number of
>>>>> VF or
>>> maximum number of SFs of any PF).
>>>>
>>>> Yes, IDs should be stable as long as no  NIC firmware configuration change.
>>>>
>>>> Just clarify, this api should be common enough to report all devices that a bus device supports:
>>>> 1. name, might contains controller and pf info, example: "eth:representor:c0pf1vf"
>>>> 2. ID range, example: 0-127
>>>> The api describes ID ranges for each sub device type, users have to query the api and choose representor ID to probe.
>>>>
>>>> Prototype:
>>>> struct rte_bus_device_range {
>>>> 	char name[64];
>>>> 	uint32_t base;
>>>> 	uint32_t number;
>>>> }
>>>> /* return number of ranges filled, or number of ranges if list is
>>>> NULL. */ int rte_bus_ dev_range_get(struct rte_bus_device_range
>>>> *list, int n);
>>>
>>> Hm, I thought about more port representor specific API.
>>> For me it is hard to tell if such generic naming is good or bad. I
>>> think it should be proven that such generic API makes sense. Any other potential users / use cases?
>>
>> I was thinking about SF, but SF is PCI specific, not suitable for this api. So I'm fine to make it as ethdev api.
>> To append new api into eth_dev_ops, is there ABI concern?
>
>No, eth_dev_ops are internal
>
>>> I've considered ethdev API which returns (in similar way as
>>> above) list of possible port representors which could be controlled
>>> by the device. Also I think it would be useful to include type information (enum with PF, VF, SF), controller ID.
>>
>> Agree.
>>
>> There is a new concern from orchestration side, currently, no
>> interface in openstack and OVS to retrieve representor ID range info,
>> It will take time to adapt this solution. To probe a representor,
>> orchestration need to know how to calculate representor ID, and the ID might vary on different max SF number, i.e. VF4 on PP1
>might got different ID. Representor ID change before that will break the product.
>
>I see.
>
>> Considering both orchestration and testpmd users, how about keeping both solution together? This will bring max flexibility IMHO.
>
>As I said before I don't mind and I really think it is a good idea to add suggested interface to specify representor (i.e. cXpfYvfZ), but the
>problem is making bitmap from representor ID.
>
>ethdev API should use new representor info API to make a representor ID from controller/PF/{VF,SF}.
>Or do you see any problems with such approach?

Sorry I thought the user to figure out representor ID from api.
This combination look good, thanks for clarification :)

So the new api looks like this:
struct rte_eth_representor_info {
  Enum representor_type;
  Uint16_t controller; // -1 for any
  Uint16_t port; // -1 for any
  Uint16_t representor_id;
  Uint16_t count;
  char name[N];

int rte_eth_representor_info_get(struct rte_eth_representor_info *infos);
- Return number of entries.
- NULL infos just return number of entries supported.
Sample outputs:
 VF, -1, 0, 0, 		128, 	"pf0vf"
 SF, -1, 0, 128, 		2048, 	"pf0sf"
 PF, -1, 0, 32767, 	1, 	"pf"
 VF, -1, 1, 32768, 	128, 	"pf1vf"
 SF, -1, 0, (32768+128), 	2048, 	"pf1sf"
 PF, -1, 0, 65535, 	1,	 "pf"

>
>> In struct rte_eth_dev_data, reserved bits could be used to define controller and port, this will avoid bitmap. How do you think?
>
>Could you add a bit more on it? Just a bit more details to the idea since I don't understand what exactly you mean and how it could
>help.

The idea is replacing reserved_64s and adding more device location info in rte_eth-dev_data like this:
  Uint16_t representor_id;
  Uint16_t port_id;
  Uint16_t controller_id;
  Enum representor_type;
Compare them all when matching a device, this will also avoid bitmap encoding. 
Reserved_64s[] was added to mitigate ABI conflicts, IIRC.
But seems no need if making representor info API to make ID.

>
>>>
>>> There is one more bit which is not in the picture yet -
>>> switch_info.port_id. Should it be equal to representor ID? Or different and provided in the info structure?
>>
>> Not exactly same AFAIK, the id used in e-switch.
>>
>>

Andrew Rybchenko Feb. 5, 2021, 9:37 a.m. UTC | #14

On 2/5/21 12:13 PM, Xueming(Steven) Li wrote:
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Friday, February 5, 2021 3:35 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf Penso <asafp@nvidia.com>; Thomas Monjalon
>> <tmonjalon@nvidia.com>
>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction representor
>>
>> On 2/4/21 5:15 PM, Xueming(Steven) Li wrote:
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Monday, February 1, 2021 4:39 PM
>>>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>>>> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Asaf
>>>> Penso <asafp@nvidia.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v5 0/9] ethdev: support SubFunction
>>>> representor
>>>>
>>>> On 1/28/21 5:31 PM, Xueming(Steven) Li wrote:
>>>>> <snip>
>>>>>>> The patch of device SF capability, but seems I misunderstood your suggestion.
>>>>>>> Let me explain process to create a SF:
>>>>>>> 1. SF can be created on the fly with scripts, unlike VF which is statically pre-created.
>>>>>>
>>>>>> Is there a maximum index and maximum total number of SF's created? How to find it?
>>>>>
>>>>> The maximum index is defined by firmware configuration, all SF's
>>>>> information could be found from sysfs. To create a SF, both PCI and sfnum have to be specified.
>>>>
>>>> sysfs is obviously Linux specific. I think the information should be available via DPDK API.
>>>
>>> Yes, the new api discussed below should resolve this issue.
>>>
>>>>
>>>>>>
>>>>>>> 2. SF is created on a PF with a SF number. SF number is named per PF, different PF may have same SF number.
>>>>>>> 3. For standalone PF, hot plug to DPDK using "PF#_BDF,representor=sf#", no need to use pf#sf# here.
>>>>>>> 4. For bonding netdev, hot plug to DPDK using "PF0_BDF,representor=pf#sf#"
>>>>>>> If using new api to return all representor IDs, need some way
>>>>>>> locate the new created SF by PF and SF number, that's why "pf#sf#" is used in this patch set.
>>>>>>
>>>>>> I think the API should simply reserve/report space for maximum
>>>>>> number of SFs. So, IDs are stable across restart/reboot in
>>>>>> assumption that NIC is not reconfigured (changed maximum number of
>>>>>> VF or
>>>> maximum number of SFs of any PF).
>>>>>
>>>>> Yes, IDs should be stable as long as no  NIC firmware configuration change.
>>>>>
>>>>> Just clarify, this api should be common enough to report all devices that a bus device supports:
>>>>> 1. name, might contains controller and pf info, example: "eth:representor:c0pf1vf"
>>>>> 2. ID range, example: 0-127
>>>>> The api describes ID ranges for each sub device type, users have to query the api and choose representor ID to probe.
>>>>>
>>>>> Prototype:
>>>>> struct rte_bus_device_range {
>>>>> 	char name[64];
>>>>> 	uint32_t base;
>>>>> 	uint32_t number;
>>>>> }
>>>>> /* return number of ranges filled, or number of ranges if list is
>>>>> NULL. */ int rte_bus_ dev_range_get(struct rte_bus_device_range
>>>>> *list, int n);
>>>>
>>>> Hm, I thought about more port representor specific API.
>>>> For me it is hard to tell if such generic naming is good or bad. I
>>>> think it should be proven that such generic API makes sense. Any other potential users / use cases?
>>>
>>> I was thinking about SF, but SF is PCI specific, not suitable for this api. So I'm fine to make it as ethdev api.
>>> To append new api into eth_dev_ops, is there ABI concern?
>>
>> No, eth_dev_ops are internal
>>
>>>> I've considered ethdev API which returns (in similar way as
>>>> above) list of possible port representors which could be controlled
>>>> by the device. Also I think it would be useful to include type information (enum with PF, VF, SF), controller ID.
>>>
>>> Agree.
>>>
>>> There is a new concern from orchestration side, currently, no
>>> interface in openstack and OVS to retrieve representor ID range info,
>>> It will take time to adapt this solution. To probe a representor,
>>> orchestration need to know how to calculate representor ID, and the ID might vary on different max SF number, i.e. VF4 on PP1
>> might got different ID. Representor ID change before that will break the product.
>>
>> I see.
>>
>>> Considering both orchestration and testpmd users, how about keeping both solution together? This will bring max flexibility IMHO.
>>
>> As I said before I don't mind and I really think it is a good idea to add suggested interface to specify representor (i.e. cXpfYvfZ), but the
>> problem is making bitmap from representor ID.
>>
>> ethdev API should use new representor info API to make a representor ID from controller/PF/{VF,SF}.
>> Or do you see any problems with such approach?
> 
> Sorry I thought the user to figure out representor ID from api.
> This combination look good, thanks for clarification :)
> 
> So the new api looks like this:

Roughly speaking - yes

> struct rte_eth_representor_info {
>   Enum representor_type;
>   Uint16_t controller; // -1 for any

I'm not sure that I understand what does "any" mean in this
case. I think it should be the zero in examples below.
I think that API should return caller controller ID and
PF ID. It would allow to interpret "vf5" correctly when
caller is not controller #0 and/or PF #0.

>   Uint16_t port; // -1 for any

port sounds like physical port, but it should be PF
 (pf, phys_fn or something like this). It could be many
PFs per physical network port.

>   Uint16_t representor_id;

May be base_id? Or rep_base_id?

The question is what to do if range for VF or SF is
not contiguous. Should we have one more index after phys_fn
to  represent it? E.g.
union {
    uint16_t vf;
    uint16_t sf;
};

>   Uint16_t count;

May be id_range which should be 1 to show one function.
It could be convenient to treat 0 this way as well,
but I doubt that it is a good idea.

>   char name[N];
> 
> int rte_eth_representor_info_get(struct rte_eth_representor_info *infos);
> - Return number of entries.
> - NULL infos just return number of entries supported.
> Sample outputs:
>  VF, -1, 0, 0, 		128, 	"pf0vf"
>  SF, -1, 0, 128, 		2048, 	"pf0sf"
>  PF, -1, 0, 32767, 	1, 	"pf"
>  VF, -1, 1, 32768, 	128, 	"pf1vf"
>  SF, -1, 0, (32768+128), 	2048, 	"pf1sf"
>  PF, -1, 0, 65535, 	1,	 "pf"
> 
>>
>>> In struct rte_eth_dev_data, reserved bits could be used to define controller and port, this will avoid bitmap. How do you think?
>>
>> Could you add a bit more on it? Just a bit more details to the idea since I don't understand what exactly you mean and how it could
>> help.
> 
> The idea is replacing reserved_64s and adding more device location info in rte_eth-dev_data like this:
>   Uint16_t representor_id;
>   Uint16_t port_id;
>   Uint16_t controller_id;
>   Enum representor_type;
> Compare them all when matching a device, this will also avoid bitmap encoding. 
> Reserved_64s[] was added to mitigate ABI conflicts, IIRC.
> But seems no need if making representor info API to make ID.
> 
>>
>>>>
>>>> There is one more bit which is not in the picture yet -
>>>> switch_info.port_id. Should it be equal to representor ID? Or different and provided in the info structure?
>>>
>>> Not exactly same AFAIK, the id used in e-switch.
>>>
>>>
>