[v5,1/8] net/hns3: support runtime config to select IO burst func

Message ID 1616116046-47578-2-git-send-email-humin29@huawei.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Series: features and bugfixes for hns3

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

humin (Q) March 19, 2021, 1:07 a.m. UTC
  From: Chengwen Feng <fengchengwen@huawei.com>

Currently, the driver supports multiple IO burst functions and
automatically selects the most appropriate one based on the offload
configuration.

Most applications, such as l2fwd/l3fwd, don't provide a means to
change the offload configuration, so they always use the automatically
selected IO burst function.

This patch adds runtime configuration to select the IO burst function
through two options, rx_func_hint and tx_func_hint. Both accept
vec/sve/simple/common.

The driver uses the following rules to select the IO burst function:
a. if the hint is vec and the vec Rx/Tx usage condition is met, use
the neon function.
b. if the hint is sve and the sve Rx/Tx usage condition is met, use
the sve function.
c. if the hint is simple and the simple Rx/Tx usage condition is met,
use the simple function.
d. if the hint is common, use the common function.
e. if the hint is not set:
e.1. if the vec Rx/Tx usage condition is met, use the neon function.
e.2. if the simple Rx/Tx usage condition is met, use the simple
function.
e.3. otherwise use the common function.

Note: the sve Rx/Tx usage condition is the vec Rx/Tx usage condition
plus a runtime environment that supports SVE.

In previous versions, the driver preferred the sve function whenever
the sve Rx/Tx usage condition was met, but in this case the driver
could get better performance by using the neon function.
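
The rules a to e above can be sketched in C roughly as follows. This is a
minimal illustration, not the code from the patch: the enum, struct and
function names are hypothetical, the usage conditions are assumed to be
computed elsewhere, and two behaviours are assumptions the commit message
does not state: an unknown hint string is treated as "not set", and a hint
whose usage condition is not met falls back to the automatic order of rule e.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for illustration; not the hns3 driver's identifiers. */
enum func_hint { HINT_NONE, HINT_VEC, HINT_SVE, HINT_SIMPLE, HINT_COMMON };
enum burst_func { FUNC_COMMON, FUNC_SIMPLE, FUNC_NEON, FUNC_SVE };

/* Usage conditions, assumed to be derived elsewhere from offloads and CPU. */
struct burst_caps {
    bool vec_ok;    /* vec (neon) Rx/Tx usage condition is met */
    bool sve_ok;    /* vec condition is met and the runtime supports SVE */
    bool simple_ok; /* simple Rx/Tx usage condition is met */
};

/* Map a hint string to an enum value.
 * Assumption: NULL or an unknown string means "hint not set". */
static enum func_hint parse_hint(const char *value)
{
    if (value == NULL)
        return HINT_NONE;
    if (strcmp(value, "vec") == 0)
        return HINT_VEC;
    if (strcmp(value, "sve") == 0)
        return HINT_SVE;
    if (strcmp(value, "simple") == 0)
        return HINT_SIMPLE;
    if (strcmp(value, "common") == 0)
        return HINT_COMMON;
    return HINT_NONE;
}

static enum burst_func select_burst_func(enum func_hint hint,
                                         const struct burst_caps *caps)
{
    assert(caps != NULL);

    /* Rules a-d: honour the hint when its usage condition is met. */
    if (hint == HINT_VEC && caps->vec_ok)
        return FUNC_NEON;
    if (hint == HINT_SVE && caps->sve_ok)
        return FUNC_SVE;
    if (hint == HINT_SIMPLE && caps->simple_ok)
        return FUNC_SIMPLE;
    if (hint == HINT_COMMON)
        return FUNC_COMMON;

    /* Rule e: no hint. Assumption: the same order is also used when a
     * hint was given but its usage condition is not met. */
    if (caps->vec_ok)
        return FUNC_NEON;
    if (caps->simple_ok)
        return FUNC_SIMPLE;
    return FUNC_COMMON;
}
```

With all three conditions met, select_burst_func(parse_hint("sve"), &caps)
picks the sve function, while an unset hint picks neon, which matches the
new default described above.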

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
---
v6:
- document hns3.rst about description of vec, common and simple.
---
 doc/guides/nics/hns3.rst               | 19 +++++++++
 doc/guides/rel_notes/release_21_05.rst |  1 +
 drivers/net/hns3/hns3_ethdev.c         | 77 ++++++++++++++++++++++++++++++++++
 drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
 drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
 drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
 6 files changed, 156 insertions(+), 14 deletions(-)
  

Comments

Ferruh Yigit March 22, 2021, 1:58 p.m. UTC | #1
On 3/19/2021 1:07 AM, Min Hu (Connor) wrote:
> From: Chengwen Feng <fengchengwen@huawei.com>
> 
> Currently, the driver support multiple IO burst function and auto
> selection of the most appropriate function based on offload
> configuration.
> 
> Most applications such as l2fwd/l3fwd don't provide the means to
> change offload configuration, so it will use the auto selection's io
> burst function.
> 
> This patch support runtime config to select io burst function, which
> add two config: rx_func_hint and tx_func_hint, both could assign
> vec/sve/simple/common.
> 
> The driver will use the following rules to select io burst func:
> a. if hint equal vec and meet the vec Rx/Tx usage condition then use
> the neon function.
> b. if hint equal sve and meet the sve Rx/Tx usage condition then use
> the sve function.
> c. if hint equal simple and meet the simple Rx/Tx usage condition then
> use the simple function.
> d. if hint equal common then use the common function.
> e. if hint not set then:
> e.1. if meet the vec Rx/Tx usage condition then use the neon function.
> e.2. if meet the simple Rx/Tx usage condition then use the simple
> function.
> e.3. else use the common function.
> 
> Note: the sve Rx/Tx usage condition based on the vec Rx/Tx usage
> condition and runtime environment (which must support SVE).
> 
> In the previous versions, driver will preferred use the sve function
> when meet the sve Rx/Tx usage condition, but in this case driver could
> get better performance if use the neon function.
> 

Is this saying 'neon' is giving better performance even if 'sve' is supported?

> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
> ---
> v6:
> - document hns3.rst about description of vec, common and simple.
> ---
>   doc/guides/nics/hns3.rst               | 19 +++++++++
>   doc/guides/rel_notes/release_21_05.rst |  1 +
>   drivers/net/hns3/hns3_ethdev.c         | 77 ++++++++++++++++++++++++++++++++++
>   drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
>   drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
>   drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
>   6 files changed, 156 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
> index 84bd7a3..8f48240 100644
> --- a/doc/guides/nics/hns3.rst
> +++ b/doc/guides/nics/hns3.rst
> @@ -46,6 +46,25 @@ Prerequisites
>   - Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment.
>   
>   
> +Runtime Config Options
> +----------------------
> +
> +- ``rx_func_hint`` (default ``none``)
> +
> +  Used to select Rx burst function, supported value are "vec", "sve", "simple", "common".

``vec``, ``sve``, ``simple`` and ``common``. ??

> +  When equal "vec" and meet the vector Rx usage condition then use the default vector Rx implementation, 'neon' for Kunpeng Arm platform.
> +  When equal "sve" and meet the sve Rx usage condition then use the sve Rx function.
> +  When equal "simple" and meet the simple Rx usage condition then use the simple Rx function which indicates the Scalar algorithm obtained from rte_eth_rx_burst_mode_get.
> +  When equal "common" then use the common Rx function which indicates the Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
> +

A few comments on the documentation:

- What about using `` to highlight the parameter values, like ``vec``, on all
occurrences?

- What about adding a bullet point for each parameter value?

- I think you can drop the leading "When equal" from all of them.

- You can drop the "obtained from rte_eth_rx_burst_mode_get" part; the function
name is not needed here, and the example below gives the same information.

- Can "and meet the vector Rx usage condition" be simplified? Overall, what
about something like:
* ``simple``, if supported use the ``simple`` Rx function which indicates the
scalar algorithm.

- It is not clear what happens when the provided parameter is not supported.
For example, when I set 'vec' but the PMD doesn't support it, which function
will be used?

- Can you please try to limit the line length to around 80 columns?

- No need to start words with uppercase for 'Scalar' & 'Scalar Scattered'.

- Same for the Tx ones below.
  
Ferruh Yigit March 22, 2021, 2:03 p.m. UTC | #2
On 3/22/2021 1:58 PM, Ferruh Yigit wrote:
> On 3/19/2021 1:07 AM, Min Hu (Connor) wrote:
>> From: Chengwen Feng <fengchengwen@huawei.com>
>>
>> Currently, the driver support multiple IO burst function and auto
>> selection of the most appropriate function based on offload
>> configuration.
>>
>> Most applications such as l2fwd/l3fwd don't provide the means to
>> change offload configuration, so it will use the auto selection's io
>> burst function.
>>
>> This patch support runtime config to select io burst function, which
>> add two config: rx_func_hint and tx_func_hint, both could assign
>> vec/sve/simple/common.
>>
>> The driver will use the following rules to select io burst func:
>> a. if hint equal vec and meet the vec Rx/Tx usage condition then use
>> the neon function.
>> b. if hint equal sve and meet the sve Rx/Tx usage condition then use
>> the sve function.
>> c. if hint equal simple and meet the simple Rx/Tx usage condition then
>> use the simple function.
>> d. if hint equal common then use the common function.
>> e. if hint not set then:
>> e.1. if meet the vec Rx/Tx usage condition then use the neon function.
>> e.2. if meet the simple Rx/Tx usage condition then use the simple
>> function.
>> e.3. else use the common function.
>>
>> Note: the sve Rx/Tx usage condition based on the vec Rx/Tx usage
>> condition and runtime environment (which must support SVE).
>>
>> In the previous versions, driver will preferred use the sve function
>> when meet the sve Rx/Tx usage condition, but in this case driver could
>> get better performance if use the neon function.
>>
> 
> Is this saying 'neon' is giving better performance even if 'sve' is supported?
> 
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>> ---
>> v6:
>> - document hns3.rst about description of vec, common and simple.
>> ---
>>   doc/guides/nics/hns3.rst               | 19 +++++++++
>>   doc/guides/rel_notes/release_21_05.rst |  1 +
>>   drivers/net/hns3/hns3_ethdev.c         | 77 ++++++++++++++++++++++++++++++++++
>>   drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
>>   drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
>>   drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
>>   6 files changed, 156 insertions(+), 14 deletions(-)
>>
>> diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
>> index 84bd7a3..8f48240 100644
>> --- a/doc/guides/nics/hns3.rst
>> +++ b/doc/guides/nics/hns3.rst
>> @@ -46,6 +46,25 @@ Prerequisites
>>   - Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to 
>> setup the basic DPDK environment.
>> +Runtime Config Options
>> +----------------------
>> +
>> +- ``rx_func_hint`` (default ``none``)
>> +
>> +  Used to select Rx burst function, supported value are "vec", "sve", 
>> "simple", "common".
> 
> ``vec``, ``sve``, ``simple`` and ``common``. ??
> 
>> +  When equal "vec" and meet the vector Rx usage condition then use the 
>> default vector Rx implementation, 'neon' for Kunpeng Arm platform.
>> +  When equal "sve" and meet the sve Rx usage condition then use the sve Rx 
>> function.
>> +  When equal "simple" and meet the simple Rx usage condition then use the 
>> simple Rx function which indicates the Scalar algorithm obtained from 
>> rte_eth_rx_burst_mode_get.
>> +  When equal "common" then use the common Rx function which indicates the 
>> Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
>> +
> 
> A few comments on the documentation,
> 
> - What about using `` to highlight the parameter, like ``vec``, on all occurrences.
> 
> - What about adding bullet points for each parameter
> 
> - I think you can drop "When equal" start from all
> 
> - You can drop "obtained from rte_eth_rx_burst_mode_get" part, the function name 
> is not needed here, something like gives same information:
> 
> - Can "and meet the vector Rx usage condition" be simplified, overall what about 
> something like:
> * ``simple``, if supported use the ``simple`` Rx function which indicates the 
> scalar algorithm.
> 
> - It is not clear what happens when provided parameter is not supported, like 
> when I set 'vec' but if PMD doesn't support it, which function will be supported?
> 
> - Can you please try to limit the line length around 80 columns.
> 
> - No need to start words with uppercase for 'Scalar' & 'Scalar Scattered'
> 
> - Same for below Tx ones.

Can you also add a separate line documenting the Rx function selection order
when ``rx_func_hint`` is not provided? Same for Tx.
  
fengchengwen March 23, 2021, 3:31 a.m. UTC | #3
On 2021/3/22 21:58, Ferruh Yigit wrote:
> On 3/19/2021 1:07 AM, Min Hu (Connor) wrote:
>> From: Chengwen Feng <fengchengwen@huawei.com>
>>
>> Currently, the driver support multiple IO burst function and auto
>> selection of the most appropriate function based on offload
>> configuration.
>>
>> Most applications such as l2fwd/l3fwd don't provide the means to
>> change offload configuration, so it will use the auto selection's io
>> burst function.
>>
>> This patch support runtime config to select io burst function, which
>> add two config: rx_func_hint and tx_func_hint, both could assign
>> vec/sve/simple/common.
>>
>> The driver will use the following rules to select io burst func:
>> a. if hint equal vec and meet the vec Rx/Tx usage condition then use
>> the neon function.
>> b. if hint equal sve and meet the sve Rx/Tx usage condition then use
>> the sve function.
>> c. if hint equal simple and meet the simple Rx/Tx usage condition then
>> use the simple function.
>> d. if hint equal common then use the common function.
>> e. if hint not set then:
>> e.1. if meet the vec Rx/Tx usage condition then use the neon function.
>> e.2. if meet the simple Rx/Tx usage condition then use the simple
>> function.
>> e.3. else use the common function.
>>
>> Note: the sve Rx/Tx usage condition based on the vec Rx/Tx usage
>> condition and runtime environment (which must support SVE).
>>
>> In the previous versions, driver will preferred use the sve function
>> when meet the sve Rx/Tx usage condition, but in this case driver could
>> get better performance if use the neon function.
>>
> 
> Is this saying 'neon' is giving better performance even if 'sve' is supported?

Sorry for the confusion. Let me explain the history of the hns3 sve functions:
1. The sve instructions are only supported on our latest processor, Kunpeng930,
and the sve Rx/Tx functions are still being gradually optimized.
2. In the original scheme we defined a macro,
CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE, which defaulted to n, so the driver would
not select the sve Rx/Tx functions unless the user set
CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE=y.
3. We plan to switch CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE to y when the
sve Rx/Tx functions are fully optimized.
4. The build switched from makefiles to meson in 20.11, where defining macros
like the one above is not recommended, so the uploaded scheme was adjusted to
delete CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE. As a result, the driver selects
the sve Rx/Tx functions whenever the sve conditions are met (gcc can compile
sve, and the host CPU and OS support sve). That doesn't fit our plan, so we
modify it here.
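
As a rough illustration of the two sve conditions in point 4, here is a
standalone C sketch. It is not how the hns3 driver or DPDK performs the check:
the function names are hypothetical, the compile-time test uses the compiler's
__ARM_FEATURE_SVE macro, and the runtime test assumes Linux on arm64 and reads
the HWCAP_SVE bit through getauxval(). On any other platform the runtime check
in this sketch simply reports false.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22) /* arm64 Linux hwcap bit for SVE */
#endif
#endif

/* True when this translation unit was compiled with SVE enabled. */
static bool built_with_sve(void)
{
#ifdef __ARM_FEATURE_SVE
    return true;
#else
    return false;
#endif
}

/* True when the host CPU and OS report SVE (Linux/arm64 only here). */
static bool host_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* Hypothetical helper: the sve usage condition is the vec usage
 * condition plus compiler support and host (CPU and OS) support. */
static bool sve_condition_met(bool vec_condition_met, bool compiled, bool host)
{
    return vec_condition_met && compiled && host;
}
```

A caller would combine them as
sve_condition_met(vec_ok, built_with_sve(), host_has_sve()). Under this patch,
meeting that condition alone no longer selects the sve functions; the sve hint
is needed as well.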

> 
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>> ---
>> v6:
>> - document hns3.rst about description of vec, common and simple.
>> ---
>>   doc/guides/nics/hns3.rst               | 19 +++++++++
>>   doc/guides/rel_notes/release_21_05.rst |  1 +
>>   drivers/net/hns3/hns3_ethdev.c         | 77 ++++++++++++++++++++++++++++++++++
>>   drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
>>   drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
>>   drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
>>   6 files changed, 156 insertions(+), 14 deletions(-)
>>
>> diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
>> index 84bd7a3..8f48240 100644
>> --- a/doc/guides/nics/hns3.rst
>> +++ b/doc/guides/nics/hns3.rst
>> @@ -46,6 +46,25 @@ Prerequisites
>>   - Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment.
>>     +Runtime Config Options
>> +----------------------
>> +
>> +- ``rx_func_hint`` (default ``none``)
>> +
>> +  Used to select Rx burst function, supported value are "vec", "sve", "simple", "common".
> 
> ``vec``, ``sve``, ``simple`` and ``common``. ??
> 
>> +  When equal "vec" and meet the vector Rx usage condition then use the default vector Rx implementation, 'neon' for Kunpeng Arm platform.
>> +  When equal "sve" and meet the sve Rx usage condition then use the sve Rx function.
>> +  When equal "simple" and meet the simple Rx usage condition then use the simple Rx function which indicates the Scalar algorithm obtained from rte_eth_rx_burst_mode_get.
>> +  When equal "common" then use the common Rx function which indicates the Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
>> +
> 
> A few comments on the documentation,
> 
> - What about using `` to highlight the parameter, like ``vec``, on all occurrences.
> 
> - What about adding bullet points for each parameter
> 
> - I think you can drop "When equal" start from all
> 
> - You can drop "obtained from rte_eth_rx_burst_mode_get" part, the function name is not needed here, something like gives same information:
> 
> - Can "and meet the vector Rx usage condition" be simplified, overall what about something like:
> * ``simple``, if supported use the ``simple`` Rx function which indicates the scalar algorithm.
> 
> - It is not clear what happens when provided parameter is not supported, like when I set 'vec' but if PMD doesn't support it, which function will be supported?
> 
> - Can you please try to limit the line length around 80 columns.
> 
> - No need to start words with uppercase for 'Scalar' & 'Scalar Scattered'
> 
> - Same for below Tx ones.
> 
> .

OK, will fix in later patch
  
fengchengwen March 23, 2021, 3:37 a.m. UTC | #4
sorry to send again: +to Ferruh

On 2021/3/22 21:58, Ferruh Yigit wrote:
> On 3/19/2021 1:07 AM, Min Hu (Connor) wrote:
>> From: Chengwen Feng <fengchengwen@huawei.com>
>>
>> Currently, the driver support multiple IO burst function and auto
>> selection of the most appropriate function based on offload
>> configuration.
>>
>> Most applications such as l2fwd/l3fwd don't provide the means to
>> change offload configuration, so it will use the auto selection's io
>> burst function.
>>
>> This patch support runtime config to select io burst function, which
>> add two config: rx_func_hint and tx_func_hint, both could assign
>> vec/sve/simple/common.
>>
>> The driver will use the following rules to select io burst func:
>> a. if hint equal vec and meet the vec Rx/Tx usage condition then use
>> the neon function.
>> b. if hint equal sve and meet the sve Rx/Tx usage condition then use
>> the sve function.
>> c. if hint equal simple and meet the simple Rx/Tx usage condition then
>> use the simple function.
>> d. if hint equal common then use the common function.
>> e. if hint not set then:
>> e.1. if meet the vec Rx/Tx usage condition then use the neon function.
>> e.2. if meet the simple Rx/Tx usage condition then use the simple
>> function.
>> e.3. else use the common function.
>>
>> Note: the sve Rx/Tx usage condition based on the vec Rx/Tx usage
>> condition and runtime environment (which must support SVE).
>>
>> In the previous versions, driver will preferred use the sve function
>> when meet the sve Rx/Tx usage condition, but in this case driver could
>> get better performance if use the neon function.
>>
> 
> Is this saying 'neon' is giving better performance even if 'sve' is supported?

Sorry for the confusion. Let me explain the history of the hns3 sve functions:
1. The sve instructions are only supported on our latest processor, Kunpeng930,
and the sve Rx/Tx functions are still being gradually optimized.
2. In the original scheme we defined a macro,
CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE, which defaulted to n, so the driver would
not select the sve Rx/Tx functions unless the user set
CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE=y.
3. We plan to switch CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE to y when the
sve Rx/Tx functions are fully optimized.
4. The build switched from makefiles to meson in 20.11, where defining macros
like the one above is not recommended, so the uploaded scheme was adjusted to
delete CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE. As a result, the driver selects
the sve Rx/Tx functions whenever the sve conditions are met (gcc can compile
sve, and the host CPU and OS support sve). That doesn't fit our plan, so we
modify it here.

> 
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>> ---
>> v6:
>> - document hns3.rst about description of vec, common and simple.
>> ---
>>   doc/guides/nics/hns3.rst               | 19 +++++++++
>>   doc/guides/rel_notes/release_21_05.rst |  1 +
>>   drivers/net/hns3/hns3_ethdev.c         | 77 ++++++++++++++++++++++++++++++++++
>>   drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
>>   drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
>>   drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
>>   6 files changed, 156 insertions(+), 14 deletions(-)
>>
>> diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
>> index 84bd7a3..8f48240 100644
>> --- a/doc/guides/nics/hns3.rst
>> +++ b/doc/guides/nics/hns3.rst
>> @@ -46,6 +46,25 @@ Prerequisites
>>   - Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment.
>>     +Runtime Config Options
>> +----------------------
>> +
>> +- ``rx_func_hint`` (default ``none``)
>> +
>> +  Used to select Rx burst function, supported value are "vec", "sve", "simple", "common".
> 
> ``vec``, ``sve``, ``simple`` and ``common``. ??
> 
>> +  When equal "vec" and meet the vector Rx usage condition then use the default vector Rx implementation, 'neon' for Kunpeng Arm platform.
>> +  When equal "sve" and meet the sve Rx usage condition then use the sve Rx function.
>> +  When equal "simple" and meet the simple Rx usage condition then use the simple Rx function which indicates the Scalar algorithm obtained from rte_eth_rx_burst_mode_get.
>> +  When equal "common" then use the common Rx function which indicates the Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
>> +
> 
> A few comments on the documentation,
> 
> - What about using `` to highlight the parameter, like ``vec``, on all occurrences.
> 
> - What about adding bullet points for each parameter
> 
> - I think you can drop "When equal" start from all
> 
> - You can drop "obtained from rte_eth_rx_burst_mode_get" part, the function name is not needed here, something like gives same information:
> 
> - Can "and meet the vector Rx usage condition" be simplified, overall what about something like:
> * ``simple``, if supported use the ``simple`` Rx function which indicates the scalar algorithm.
> 
> - It is not clear what happens when provided parameter is not supported, like when I set 'vec' but if PMD doesn't support it, which function will be supported?
> 
> - Can you please try to limit the line length around 80 columns.
> 
> - No need to start words with uppercase for 'Scalar' & 'Scalar Scattered'
> 
> - Same for below Tx ones.
> 
> .
OK, will fix in later patch
  
Ferruh Yigit March 23, 2021, 10:31 a.m. UTC | #5
On 3/23/2021 3:31 AM, fengchengwen wrote:
> 
> 
> On 2021/3/22 21:58, Ferruh Yigit wrote:
>> On 3/19/2021 1:07 AM, Min Hu (Connor) wrote:
>>> From: Chengwen Feng <fengchengwen@huawei.com>
>>>
>>> Currently, the driver support multiple IO burst function and auto
>>> selection of the most appropriate function based on offload
>>> configuration.
>>>
>>> Most applications such as l2fwd/l3fwd don't provide the means to
>>> change offload configuration, so it will use the auto selection's io
>>> burst function.
>>>
>>> This patch support runtime config to select io burst function, which
>>> add two config: rx_func_hint and tx_func_hint, both could assign
>>> vec/sve/simple/common.
>>>
>>> The driver will use the following rules to select io burst func:
>>> a. if hint equal vec and meet the vec Rx/Tx usage condition then use
>>> the neon function.
>>> b. if hint equal sve and meet the sve Rx/Tx usage condition then use
>>> the sve function.
>>> c. if hint equal simple and meet the simple Rx/Tx usage condition then
>>> use the simple function.
>>> d. if hint equal common then use the common function.
>>> e. if hint not set then:
>>> e.1. if meet the vec Rx/Tx usage condition then use the neon function.
>>> e.2. if meet the simple Rx/Tx usage condition then use the simple
>>> function.
>>> e.3. else use the common function.
>>>
>>> Note: the sve Rx/Tx usage condition based on the vec Rx/Tx usage
>>> condition and runtime environment (which must support SVE).
>>>
>>> In the previous versions, driver will preferred use the sve function
>>> when meet the sve Rx/Tx usage condition, but in this case driver could
>>> get better performance if use the neon function.
>>>
>>
>> Is this saying 'neon' is giving better performance even if 'sve' is supported?
> 
> I'm sorry to confuse you, let me explain the hns3 sve function history:
> 1. The sve instruction only support on our latest processor Kunpeng930, and
> the sve Rx/Tx function is being gradually optimized.
> 2. We define a macro CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE which equal n
> default in the original scheme, so driver will not select sve Rx/Tx function
> unless user config CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE=y.
> 3. We plan to switch CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE equal y when the
> sve Rx/Tx function is fully optimized.
> 4. The makefile is switched to meson build in 20.11, and it's not recommended
> to define the macro such as above, so the upload scheme is adjusted which
> delete the macro CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE, this leads to driver
> select sve Rx/Tx function when meeting sve conditions (which are gcc support
> compile sve and the host cpu&os support sve), but it doesn't fit our plan, so
> here we modify it.
> 

Got it, so you want to keep the 'neon' path as the default until the 'sve' path
is more optimized. Thanks for the clarification.

>>
>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>> ---
>>> v6:
>>> - document hns3.rst about description of vec, common and simple.
>>> ---
>>>    doc/guides/nics/hns3.rst               | 19 +++++++++
>>>    doc/guides/rel_notes/release_21_05.rst |  1 +
>>>    drivers/net/hns3/hns3_ethdev.c         | 77 ++++++++++++++++++++++++++++++++++
>>>    drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
>>>    drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
>>>    drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
>>>    6 files changed, 156 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
>>> index 84bd7a3..8f48240 100644
>>> --- a/doc/guides/nics/hns3.rst
>>> +++ b/doc/guides/nics/hns3.rst
>>> @@ -46,6 +46,25 @@ Prerequisites
>>>    - Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment.
>>>      +Runtime Config Options
>>> +----------------------
>>> +
>>> +- ``rx_func_hint`` (default ``none``)
>>> +
>>> +  Used to select Rx burst function, supported value are "vec", "sve", "simple", "common".
>>
>> ``vec``, ``sve``, ``simple`` and ``common``. ??
>>
>>> +  When equal "vec" and meet the vector Rx usage condition then use the default vector Rx implementation, 'neon' for Kunpeng Arm platform.
>>> +  When equal "sve" and meet the sve Rx usage condition then use the sve Rx function.
>>> +  When equal "simple" and meet the simple Rx usage condition then use the simple Rx function which indicates the Scalar algorithm obtained from rte_eth_rx_burst_mode_get.
>>> +  When equal "common" then use the common Rx function which indicates the Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
>>> +
>>
>> A few comments on the documentation,
>>
>> - What about using `` to highlight the parameter, like ``vec``, on all occurrences.
>>
>> - What about adding bullet points for each parameter
>>
>> - I think you can drop "When equal" start from all
>>
>> - You can drop "obtained from rte_eth_rx_burst_mode_get" part, the function name is not needed here, something like gives same information:
>>
>> - Can "and meet the vector Rx usage condition" be simplified, overall what about something like:
>> * ``simple``, if supported use the ``simple`` Rx function which indicates the scalar algorithm.
>>
>> - It is not clear what happens when provided parameter is not supported, like when I set 'vec' but if PMD doesn't support it, which function will be supported?
>>
>> - Can you please try to limit the line length around 80 columns.
>>
>> - No need to start words with uppercase for 'Scalar' & 'Scalar Scattered'
>>
>> - Same for below Tx ones.
>>
>> .
> 
> OK, will fix in later patch
>
  
humin (Q) March 23, 2021, 11:22 a.m. UTC | #6
On 2021/3/23 18:31, Ferruh Yigit wrote:
> On 3/23/2021 3:31 AM, fengchengwen wrote:
>>
>>
>> On 2021/3/22 21:58, Ferruh Yigit wrote:
>>> On 3/19/2021 1:07 AM, Min Hu (Connor) wrote:
>>>> From: Chengwen Feng <fengchengwen@huawei.com>
>>>>
>>>> Currently, the driver support multiple IO burst function and auto
>>>> selection of the most appropriate function based on offload
>>>> configuration.
>>>>
>>>> Most applications such as l2fwd/l3fwd don't provide the means to
>>>> change offload configuration, so it will use the auto selection's io
>>>> burst function.
>>>>
>>>> This patch support runtime config to select io burst function, which
>>>> add two config: rx_func_hint and tx_func_hint, both could assign
>>>> vec/sve/simple/common.
>>>>
>>>> The driver will use the following rules to select io burst func:
>>>> a. if hint equal vec and meet the vec Rx/Tx usage condition then use
>>>> the neon function.
>>>> b. if hint equal sve and meet the sve Rx/Tx usage condition then use
>>>> the sve function.
>>>> c. if hint equal simple and meet the simple Rx/Tx usage condition then
>>>> use the simple function.
>>>> d. if hint equal common then use the common function.
>>>> e. if hint not set then:
>>>> e.1. if meet the vec Rx/Tx usage condition then use the neon function.
>>>> e.2. if meet the simple Rx/Tx usage condition then use the simple
>>>> function.
>>>> e.3. else use the common function.
>>>>
>>>> Note: the sve Rx/Tx usage condition based on the vec Rx/Tx usage
>>>> condition and runtime environment (which must support SVE).
>>>>
>>>> In the previous versions, driver will preferred use the sve function
>>>> when meet the sve Rx/Tx usage condition, but in this case driver could
>>>> get better performance if use the neon function.
>>>>
>>>
>>> Is this saying 'neon' is giving better performance even if 'sve' is 
>>> supported?
>>
>> I'm sorry to confuse you, let me explain the hns3 sve function history:
>> 1. The sve instruction only support on our latest processor 
>> Kunpeng930, and
>> the sve Rx/Tx function is being gradually optimized.
>> 2. We define a macro CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE which equal n
>> default in the original scheme, so driver will not select sve Rx/Tx 
>> function
>> unless user config CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE=y.
>> 3. We plan to switch CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE equal y 
>> when the
>> sve Rx/Tx function is fully optimized.
>> 4. The makefile is switched to meson build in 20.11, and it's not 
>> recommended
>> to define the macro such as above, so the upload scheme is adjusted which
>> delete the macro CONFIG_RTE_LIBRTE_HNS3_INC_VECTOR_SVE, this leads to 
>> driver
>> select sve Rx/Tx function when meeting sve conditions (which are gcc 
>> support
>> compile sve and the host cpu&os support sve), but it doesn't fit our 
>> plan, so
>> here we modify it.
>>
> 
> Got it, so you want keep the 'neon' path default until 'sve' path is 
> more optimized, thanks for clarification.
OK, v6 has been sent, which fixes all the bugs. Please check it, thanks.
> 
>>>
>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>> Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
>>>> ---
>>>> v6:
>>>> - document hns3.rst about description of vec, common and simple.
>>>> ---
>>>>    doc/guides/nics/hns3.rst               | 19 +++++++++
>>>>    doc/guides/rel_notes/release_21_05.rst |  1 +
>>>>    drivers/net/hns3/hns3_ethdev.c         | 77 
>>>> ++++++++++++++++++++++++++++++++++
>>>>    drivers/net/hns3/hns3_ethdev.h         | 15 +++++++
>>>>    drivers/net/hns3/hns3_ethdev_vf.c      |  4 ++
>>>>    drivers/net/hns3/hns3_rxtx.c           | 54 +++++++++++++++++-------
>>>>    6 files changed, 156 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
>>>> index 84bd7a3..8f48240 100644
>>>> --- a/doc/guides/nics/hns3.rst
>>>> +++ b/doc/guides/nics/hns3.rst
>>>> @@ -46,6 +46,25 @@ Prerequisites
>>>>    - Follow the DPDK :ref:`Getting Started Guide for Linux 
>>>> <linux_gsg>` to setup the basic DPDK environment.
>>>>      +Runtime Config Options
>>>> +----------------------
>>>> +
>>>> +- ``rx_func_hint`` (default ``none``)
>>>> +
>>>> +  Used to select Rx burst function, supported value are "vec", "sve", "simple", "common".
>>>
>>> ``vec``, ``sve``, ``simple`` and ``common``. ??
>>>
>>>> +  When equal "vec" and meet the vector Rx usage condition then use the default vector Rx implementation, 'neon' for Kunpeng Arm platform.
>>>> +  When equal "sve" and meet the sve Rx usage condition then use the sve Rx function.
>>>> +  When equal "simple" and meet the simple Rx usage condition then use the simple Rx function which indicates the Scalar algorithm obtained from rte_eth_rx_burst_mode_get.
>>>> +  When equal "common" then use the common Rx function which indicates the Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
>>>> +
>>>
>>> A few comments on the documentation,
>>>
>>> - What about using `` to highlight the parameter, like ``vec``, on 
>>> all occurrences.
>>>
>>> - What about adding bullet points for each parameter
>>>
>>> - I think you can drop "When equal" start from all
>>>
>>> - You can drop the "obtained from rte_eth_rx_burst_mode_get" part; the
>>> function name is not needed here.
>>>
>>> - Can "and meet the vector Rx usage condition" be simplified? Overall,
>>> what about something like the following, which gives the same information:
>>> * ``simple``, if supported use the ``simple`` Rx function which
>>> indicates the scalar algorithm.
>>>
>>> - It is not clear what happens when the provided parameter is not
>>> supported, like when I set 'vec' but the PMD doesn't support it. Which
>>> function will be used?
>>>
>>> - Can you please try to limit the line length to around 80 columns.
>>>
>>> - No need to start words with uppercase for 'Scalar' & 'Scalar 
>>> Scattered'
>>>
>>> - Same for below Tx ones.
>>>
>>> .
>>
>> OK, will fix in later patch
>>
> 
> .
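
As a reading aid for the question above about an unsupported hint: judging
from the hns3_rxtx.c hunks in this patch, a hint whose usage condition is not
met is ignored and the driver falls back to auto selection (vec, then simple,
then common). Below is a minimal standalone sketch of that order. The names
io_hint, rx_path and select_rx_path are hypothetical and only mirror the patch
logic; they are not driver API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the HNS3_IO_FUNC_HINT_* values in the patch. */
enum io_hint { HINT_NONE = 0, HINT_VEC, HINT_SVE, HINT_SIMPLE, HINT_COMMON };

/* Hypothetical labels for the four Rx burst functions. */
enum rx_path { PATH_VEC, PATH_SVE, PATH_SIMPLE, PATH_COMMON };

static enum rx_path
select_rx_path(enum io_hint hint, bool vec_allowed, bool sve_supported,
	       bool simple_allowed)
{
	/* The sve condition builds on the vec condition, as in the patch. */
	bool sve_allowed = vec_allowed && sve_supported;

	assert(hint <= HINT_COMMON);

	/* A hint is honoured only when its usage condition is met. */
	if (hint == HINT_VEC && vec_allowed)
		return PATH_VEC;
	if (hint == HINT_SVE && sve_allowed)
		return PATH_SVE;
	if (hint == HINT_SIMPLE && simple_allowed)
		return PATH_SIMPLE;
	if (hint == HINT_COMMON)
		return PATH_COMMON;

	/* No hint, or a hint that cannot be used: auto selection. */
	if (vec_allowed)
		return PATH_VEC;
	if (simple_allowed)
		return PATH_SIMPLE;
	return PATH_COMMON;
}
```

For instance, select_rx_path(HINT_VEC, false, false, true) yields PATH_SIMPLE:
the vec hint cannot be honoured, so auto selection picks the simple path. Note
that auto selection never picks sve in this scheme; sve is used only when it
is requested and allowed.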
  

Patch

diff --git a/doc/guides/nics/hns3.rst b/doc/guides/nics/hns3.rst
index 84bd7a3..8f48240 100644
--- a/doc/guides/nics/hns3.rst
+++ b/doc/guides/nics/hns3.rst
@@ -46,6 +46,25 @@  Prerequisites
 - Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment.
 
 
+Runtime Config Options
+----------------------
+
+- ``rx_func_hint`` (default ``none``)
+
+  Used to select Rx burst function, supported value are "vec", "sve", "simple", "common".
+  When equal "vec" and meet the vector Rx usage condition then use the default vector Rx implementation, 'neon' for Kunpeng Arm platform.
+  When equal "sve" and meet the sve Rx usage condition then use the sve Rx function.
+  When equal "simple" and meet the simple Rx usage condition then use the simple Rx function which indicates the Scalar algorithm obtained from rte_eth_rx_burst_mode_get.
+  When equal "common" then use the common Rx function which indicates the Scalar Scattered algorithm obtained from rte_eth_rx_burst_mode_get.
+
+- ``tx_func_hint`` (default ``none``)
+
+  Used to select Tx burst function, supported value are "vec", "sve", "simple", "common".
+  When equal "vec" and meet the vector Tx usage condition then use the default vector Tx implementation, 'neon' for Kunpeng Arm platform.
+  When equal "sve" and meet the sve Tx usage condition then use the sve Tx function.
+  When equal "simple" and meet the simple Tx usage condition then use the simple Tx function which indicates the Scalar Simple algorithm obtained from rte_eth_tx_burst_mode_get.
+  When equal "common" then use the common Tx function which indicated the Scalar algorithm obtained from rte_eth_tx_burst_mode_get.
+
 Driver compilation and testing
 ------------------------------
 
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index dc5399f..1d85942 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -60,6 +60,7 @@  New Features
   * Added support for module EEPROM dumping.
   * Added support for freeing Tx mbuf on demand.
   * Added support for copper port in Kunpeng930.
+  * Added support for runtime config to select IO burst function.
 
 * **Updated NXP DPAA2 driver.**
 
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index 9cbcc13..28aa27a 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -6,6 +6,7 @@ 
 #include <rte_bus_pci.h>
 #include <ethdev_pci.h>
 #include <rte_pci.h>
+#include <rte_kvargs.h>
 
 #include "hns3_ethdev.h"
 #include "hns3_logs.h"
@@ -6505,6 +6506,78 @@  hns3_get_module_info(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+hns3_parse_io_hint_func(const char *key, const char *value, void *extra_args)
+{
+	uint32_t hint = HNS3_IO_FUNC_HINT_NONE;
+
+	RTE_SET_USED(key);
+
+	if (strcmp(value, "vec") == 0)
+		hint = HNS3_IO_FUNC_HINT_VEC;
+	else if (strcmp(value, "sve") == 0)
+		hint = HNS3_IO_FUNC_HINT_SVE;
+	else if (strcmp(value, "simple") == 0)
+		hint = HNS3_IO_FUNC_HINT_SIMPLE;
+	else if (strcmp(value, "common") == 0)
+		hint = HNS3_IO_FUNC_HINT_COMMON;
+
+	/* If the hint is valid then update output parameters */
+	if (hint != HNS3_IO_FUNC_HINT_NONE)
+		*(uint32_t *)extra_args = hint;
+
+	return 0;
+}
+
+static const char *
+hns3_get_io_hint_func_name(uint32_t hint)
+{
+	switch (hint) {
+	case HNS3_IO_FUNC_HINT_VEC:
+		return "vec";
+	case HNS3_IO_FUNC_HINT_SVE:
+		return "sve";
+	case HNS3_IO_FUNC_HINT_SIMPLE:
+		return "simple";
+	case HNS3_IO_FUNC_HINT_COMMON:
+		return "common";
+	default:
+		return "none";
+	}
+}
+
+void
+hns3_parse_devargs(struct rte_eth_dev *dev)
+{
+	struct hns3_adapter *hns = dev->data->dev_private;
+	uint32_t rx_func_hint = HNS3_IO_FUNC_HINT_NONE;
+	uint32_t tx_func_hint = HNS3_IO_FUNC_HINT_NONE;
+	struct hns3_hw *hw = &hns->hw;
+	struct rte_kvargs *kvlist;
+
+	if (dev->device->devargs == NULL)
+		return;
+
+	kvlist = rte_kvargs_parse(dev->device->devargs->args, NULL);
+	if (!kvlist)
+		return;
+
+	rte_kvargs_process(kvlist, HNS3_DEVARG_RX_FUNC_HINT,
+			   &hns3_parse_io_hint_func, &rx_func_hint);
+	rte_kvargs_process(kvlist, HNS3_DEVARG_TX_FUNC_HINT,
+			   &hns3_parse_io_hint_func, &tx_func_hint);
+	rte_kvargs_free(kvlist);
+
+	if (rx_func_hint != HNS3_IO_FUNC_HINT_NONE)
+		hns3_warn(hw, "parsed %s = %s.", HNS3_DEVARG_RX_FUNC_HINT,
+			  hns3_get_io_hint_func_name(rx_func_hint));
+	hns->rx_func_hint = rx_func_hint;
+	if (tx_func_hint != HNS3_IO_FUNC_HINT_NONE)
+		hns3_warn(hw, "parsed %s = %s.", HNS3_DEVARG_TX_FUNC_HINT,
+			  hns3_get_io_hint_func_name(tx_func_hint));
+	hns->tx_func_hint = tx_func_hint;
+}
+
 static const struct eth_dev_ops hns3_eth_dev_ops = {
 	.dev_configure      = hns3_dev_configure,
 	.dev_start          = hns3_dev_start,
@@ -6625,6 +6698,7 @@  hns3_dev_init(struct rte_eth_dev *eth_dev)
 	hw->adapter_state = HNS3_NIC_UNINITIALIZED;
 	hns->is_vf = false;
 	hw->data = eth_dev->data;
+	hns3_parse_devargs(eth_dev);
 
 	/*
 	 * Set default max packet size according to the mtu
@@ -6758,5 +6832,8 @@  static struct rte_pci_driver rte_hns3_pmd = {
 RTE_PMD_REGISTER_PCI(net_hns3, rte_hns3_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_hns3, pci_id_hns3_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_hns3, "* igb_uio | vfio-pci");
+RTE_PMD_REGISTER_PARAM_STRING(net_hns3,
+		HNS3_DEVARG_RX_FUNC_HINT "=vec|sve|simple|common "
+		HNS3_DEVARG_TX_FUNC_HINT "=vec|sve|simple|common ");
 RTE_LOG_REGISTER(hns3_logtype_init, pmd.net.hns3.init, NOTICE);
 RTE_LOG_REGISTER(hns3_logtype_driver, pmd.net.hns3.driver, NOTICE);
diff --git a/drivers/net/hns3/hns3_ethdev.h b/drivers/net/hns3/hns3_ethdev.h
index 932600d..ec4b475 100644
--- a/drivers/net/hns3/hns3_ethdev.h
+++ b/drivers/net/hns3/hns3_ethdev.h
@@ -772,9 +772,23 @@  struct hns3_adapter {
 	bool tx_simple_allowed;
 	bool tx_vec_allowed;
 
+	uint32_t rx_func_hint;
+	uint32_t tx_func_hint;
+
 	struct hns3_ptype_table ptype_tbl __rte_cache_min_aligned;
 };
 
+enum {
+	HNS3_IO_FUNC_HINT_NONE = 0,
+	HNS3_IO_FUNC_HINT_VEC,
+	HNS3_IO_FUNC_HINT_SVE,
+	HNS3_IO_FUNC_HINT_SIMPLE,
+	HNS3_IO_FUNC_HINT_COMMON
+};
+
+#define HNS3_DEVARG_RX_FUNC_HINT	"rx_func_hint"
+#define HNS3_DEVARG_TX_FUNC_HINT	"tx_func_hint"
+
 #define HNS3_DEV_SUPPORT_DCB_B			0x0
 #define HNS3_DEV_SUPPORT_COPPER_B		0x1
 #define HNS3_DEV_SUPPORT_UDP_GSO_B		0x2
@@ -975,6 +989,7 @@  int hns3_dev_infos_get(struct rte_eth_dev *eth_dev,
 		       struct rte_eth_dev_info *info);
 void hns3vf_update_link_status(struct hns3_hw *hw, uint8_t link_status,
 			  uint32_t link_speed, uint8_t link_duplex);
+void hns3_parse_devargs(struct rte_eth_dev *dev);
 
 static inline bool
 is_reset_pending(struct hns3_adapter *hns)
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index fd20c52..f3eaefb 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -2834,6 +2834,7 @@  hns3vf_dev_init(struct rte_eth_dev *eth_dev)
 	hw->adapter_state = HNS3_NIC_UNINITIALIZED;
 	hns->is_vf = true;
 	hw->data = eth_dev->data;
+	hns3_parse_devargs(eth_dev);
 
 	ret = hns3_reset_init(hw);
 	if (ret)
@@ -2962,3 +2963,6 @@  static struct rte_pci_driver rte_hns3vf_pmd = {
 RTE_PMD_REGISTER_PCI(net_hns3_vf, rte_hns3vf_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_hns3_vf, pci_id_hns3vf_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_hns3_vf, "* igb_uio | vfio-pci");
+RTE_PMD_REGISTER_PARAM_STRING(net_hns3_vf,
+		HNS3_DEVARG_RX_FUNC_HINT "=vec|sve|simple|common "
+		HNS3_DEVARG_TX_FUNC_HINT "=vec|sve|simple|common ");
diff --git a/drivers/net/hns3/hns3_rxtx.c b/drivers/net/hns3/hns3_rxtx.c
index 00167c4..f5c7d71 100644
--- a/drivers/net/hns3/hns3_rxtx.c
+++ b/drivers/net/hns3/hns3_rxtx.c
@@ -2689,13 +2689,26 @@  hns3_get_rx_function(struct rte_eth_dev *dev)
 {
 	struct hns3_adapter *hns = dev->data->dev_private;
 	uint64_t offloads = dev->data->dev_conf.rxmode.offloads;
+	bool vec_allowed, sve_allowed, simple_allowed;
+
+	vec_allowed = hns->rx_vec_allowed &&
+		      hns3_rx_check_vec_support(dev) == 0;
+	sve_allowed = vec_allowed && hns3_check_sve_support();
+	simple_allowed = hns->rx_simple_allowed && !dev->data->scattered_rx &&
+			 (offloads & DEV_RX_OFFLOAD_TCP_LRO) == 0;
+
+	if (hns->rx_func_hint == HNS3_IO_FUNC_HINT_VEC && vec_allowed)
+		return hns3_recv_pkts_vec;
+	if (hns->rx_func_hint == HNS3_IO_FUNC_HINT_SVE && sve_allowed)
+		return hns3_recv_pkts_vec_sve;
+	if (hns->rx_func_hint == HNS3_IO_FUNC_HINT_SIMPLE && simple_allowed)
+		return hns3_recv_pkts;
+	if (hns->rx_func_hint == HNS3_IO_FUNC_HINT_COMMON)
+		return hns3_recv_scattered_pkts;
 
-	if (hns->rx_vec_allowed && hns3_rx_check_vec_support(dev) == 0)
-		return hns3_check_sve_support() ? hns3_recv_pkts_vec_sve :
-		       hns3_recv_pkts_vec;
-
-	if (hns->rx_simple_allowed && !dev->data->scattered_rx &&
-	    (offloads & DEV_RX_OFFLOAD_TCP_LRO) == 0)
+	if (vec_allowed)
+		return hns3_recv_pkts_vec;
+	if (simple_allowed)
 		return hns3_recv_pkts;
 
 	return hns3_recv_scattered_pkts;
@@ -3930,19 +3943,32 @@  hns3_get_tx_function(struct rte_eth_dev *dev, eth_tx_prep_t *prep)
 {
 	uint64_t offloads = dev->data->dev_conf.txmode.offloads;
 	struct hns3_adapter *hns = dev->data->dev_private;
+	bool vec_allowed, sve_allowed, simple_allowed;
 
-	if (hns->tx_vec_allowed && hns3_tx_check_vec_support(dev) == 0) {
-		*prep = NULL;
-		return hns3_check_sve_support() ? hns3_xmit_pkts_vec_sve :
-			hns3_xmit_pkts_vec;
-	}
+	vec_allowed = hns->tx_vec_allowed &&
+		      hns3_tx_check_vec_support(dev) == 0;
+	sve_allowed = vec_allowed && hns3_check_sve_support();
+	simple_allowed = hns->tx_simple_allowed &&
+			 offloads == (offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE);
 
-	if (hns->tx_simple_allowed &&
-	    offloads == (offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE)) {
-		*prep = NULL;
+	*prep = NULL;
+
+	if (hns->tx_func_hint == HNS3_IO_FUNC_HINT_VEC && vec_allowed)
+		return hns3_xmit_pkts_vec;
+	if (hns->tx_func_hint == HNS3_IO_FUNC_HINT_SVE && sve_allowed)
+		return hns3_xmit_pkts_vec_sve;
+	if (hns->tx_func_hint == HNS3_IO_FUNC_HINT_SIMPLE && simple_allowed)
 		return hns3_xmit_pkts_simple;
+	if (hns->tx_func_hint == HNS3_IO_FUNC_HINT_COMMON) {
+		*prep = hns3_prep_pkts;
+		return hns3_xmit_pkts;
 	}
 
+	if (vec_allowed)
+		return hns3_xmit_pkts_vec;
+	if (simple_allowed)
+		return hns3_xmit_pkts_simple;
+
 	*prep = hns3_prep_pkts;
 	return hns3_xmit_pkts;
 }
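
To try the hint parsing from the hns3_ethdev.c hunk outside of DPDK, the
handler can be reduced to a table lookup. This is only a sketch under stated
assumptions: parse_io_hint and the HINT_* names are hypothetical, and the
function merely keeps the (key, value, opaque) shape that rte_kvargs_process()
expects of its handler. As in the patch, an unrecognized value leaves the
output untouched and still returns 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the HNS3_IO_FUNC_HINT_* values in the patch. */
enum { HINT_NONE = 0, HINT_VEC, HINT_SVE, HINT_SIMPLE, HINT_COMMON };

static int
parse_io_hint(const char *key, const char *value, void *out)
{
	/* Accepted strings, matching the values named in the patch. */
	static const struct {
		const char *name;
		uint32_t hint;
	} map[] = {
		{ "vec", HINT_VEC },
		{ "sve", HINT_SVE },
		{ "simple", HINT_SIMPLE },
		{ "common", HINT_COMMON },
	};
	size_t i;

	(void)key; /* the same handler serves both the Rx and Tx keys */
	assert(value != NULL && out != NULL);

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(value, map[i].name) == 0) {
			*(uint32_t *)out = map[i].hint;
			return 0;
		}
	}

	/* Unknown value: keep the caller's current hint. */
	return 0;
}
```

In an application the hints would be given as device arguments appended to the
PCI address on the EAL allow list, for example
"-a 0000:7d:00.0,rx_func_hint=simple,tx_func_hint=common", where the address
is only a placeholder.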