[v3,0/5] vhost add vectorized data path

Message ID 20201009081410.63944-1-yong.liu@intel.com (mailing list archive)

Message

Marvin Liu Oct. 9, 2020, 8:14 a.m. UTC
  The packed ring format was introduced in virtio spec 1.1. All descriptors
are compacted into a single ring when the packed ring format is in use, so
ring operations can be accelerated in a straightforward way with SIMD
instructions.

This patch set introduces a vectorized data path in the vhost library. If
the vectorized option is on, operations such as descriptor checks,
descriptor writeback, and address translation are accelerated by SIMD
instructions. On a Skylake server, this brings about a 6% performance gain
in the loopback case and around a 4% gain in the PvP case.
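
To make the "descs check" step concrete, here is a minimal scalar sketch in C of checking a batch of packed ring descriptors. It is not code from this patch set. The descriptor layout and the AVAIL/USED flag bits follow the virtio 1.1 packed ring definition; the names packed_desc, BATCH_SIZE, desc_is_avail and batch_is_avail are assumptions made for illustration, and the memory barriers a real data path needs are left out. Four 16-byte descriptors occupy 64 bytes, which is why a vectorized path could in principle examine them with a single 512-bit load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits from the virtio 1.1 packed ring layout. */
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical batch size: four 16-byte descriptors span 64 bytes. */
#define BATCH_SIZE 4

/* Packed ring descriptor as laid out in virtio 1.1 (16 bytes). */
struct packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

static_assert(sizeof(struct packed_desc) == 16, "descriptor must be 16 bytes");

/* A descriptor is available when AVAIL matches the wrap counter
 * and USED does not. */
static bool desc_is_avail(const struct packed_desc *d, bool wrap)
{
    bool avail = (d->flags & DESC_F_AVAIL) != 0;
    bool used = (d->flags & DESC_F_USED) != 0;

    return avail == wrap && used != wrap;
}

/* Scalar check of one batch. The caller is assumed to guarantee that
 * idx + BATCH_SIZE does not run past the end of the ring. */
static bool batch_is_avail(const struct packed_desc *ring, uint16_t idx,
                           bool wrap)
{
    for (int i = 0; i < BATCH_SIZE; i++) {
        if (!desc_is_avail(&ring[idx + i], wrap))
            return false;
    }
    return true;
}
```

A SIMD version would replace the loop with one wide load and a masked compare of the four flags fields against the expected wrap-counter pattern; the scalar form above is only the reference behaviour such a path would have to match.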

A vhost application can choose whether to use vectorized acceleration, just
as with the external buffer feature. If the platform or ring format does
not support the vectorized functions, vhost falls back to the default batch
functions. The current data path is not affected.
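
The opt-in and fallback rule described above can be sketched as a small selection function in C. This is an illustration under stated assumptions, not the patch set's implementation: the struct dev_caps, its field names, and select_datapath are hypothetical and do not correspond to DPDK identifiers.

```c
#include <stdbool.h>

/* The two data paths named in the cover letter. */
enum datapath {
    DATAPATH_BATCH,      /* default batch functions */
    DATAPATH_VECTORIZED  /* SIMD-accelerated functions */
};

/* Hypothetical per-device state; field names are illustrative only. */
struct dev_caps {
    bool vectorized_requested; /* application opted in */
    bool cpu_has_simd;         /* platform supports the needed instructions */
    bool packed_ring;          /* packed ring format is in use */
};

/* Pick the vectorized path only when every condition holds;
 * otherwise fall back to the default batch path. */
static enum datapath select_datapath(const struct dev_caps *caps)
{
    if (caps->vectorized_requested && caps->cpu_has_simd &&
        caps->packed_ring)
        return DATAPATH_VECTORIZED;

    return DATAPATH_BATCH;
}
```

Making the decision once per device, rather than per packet, is one way to keep the default path free of extra branches when the option is off.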

v3:
* rename vectorized datapath file
* eliminate the impact when avx512 disabled
* dynamically allocate memory regions structure
* remove unlikely hint for in_order

v2:
* add vIOMMU support
* add dequeue offloading
* rebase code

Marvin Liu (5):
  vhost: add vectorized data path
  vhost: reuse packed ring functions
  vhost: prepare memory regions addresses
  vhost: add packed ring vectorized dequeue
  vhost: add packed ring vectorized enqueue

 doc/guides/nics/vhost.rst           |   5 +
 doc/guides/prog_guide/vhost_lib.rst |  12 +
 drivers/net/vhost/rte_eth_vhost.c   |  17 +-
 lib/librte_vhost/meson.build        |  16 ++
 lib/librte_vhost/rte_vhost.h        |   1 +
 lib/librte_vhost/socket.c           |   5 +
 lib/librte_vhost/vhost.c            |  11 +
 lib/librte_vhost/vhost.h            | 239 +++++++++++++++++++
 lib/librte_vhost/vhost_user.c       |  26 +++
 lib/librte_vhost/virtio_net.c       | 258 ++++-----------------
 lib/librte_vhost/virtio_net_avx.c   | 344 ++++++++++++++++++++++++++++
 11 files changed, 718 insertions(+), 216 deletions(-)
 create mode 100644 lib/librte_vhost/virtio_net_avx.c
  

Comments

Maxime Coquelin Oct. 12, 2020, 8:21 a.m. UTC | #1
Hi Marvin,

On 10/9/20 10:14 AM, Marvin Liu wrote:
> Packed ring format is imported since virtio spec 1.1. All descriptors
> are compacted into one single ring when packed ring format is on. It is
> straight forward that ring operations can be accelerated by utilizing
> SIMD instructions. 
> 
> This patch set will introduce vectorized data path in vhost library. If
> vectorized option is on, operations like descs check, descs writeback,
> address translation will be accelerated by SIMD instructions. On skylake
> server, it can bring 6% performance gain in loopback case and around 4%
> performance gain in PvP case.

IMHO, a 4% gain on PVP is not significant compared to the added
complexity. Moreover, I guess this is a 4% gain with testpmd-based PVP?
If so, it may be even lower with an OVS-DPDK PVP benchmark. I will try
to run one this week.

Thanks,
Maxime

  
Marvin Liu Oct. 12, 2020, 9:10 a.m. UTC | #2
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Monday, October 12, 2020 4:22 PM
> To: Liu, Yong <yong.liu@intel.com>; Xia, Chenbo <chenbo.xia@intel.com>;
> Wang, Zhihong <zhihong.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> 
> Hi Marvin,
> 
> On 10/9/20 10:14 AM, Marvin Liu wrote:
> > Packed ring format is imported since virtio spec 1.1. All descriptors
> > are compacted into one single ring when packed ring format is on. It is
> > straight forward that ring operations can be accelerated by utilizing
> > SIMD instructions.
> >
> > This patch set will introduce vectorized data path in vhost library. If
> > vectorized option is on, operations like descs check, descs writeback,
> > address translation will be accelerated by SIMD instructions. On skylake
> > server, it can bring 6% performance gain in loopback case and around 4%
> > performance gain in PvP case.
> 
> IMHO, 4% gain on PVP is not a significant gain if we compare to the
> added complexity. Moreover, I guess this is 4% gain with testpmd-based
> PVP? If this is the case it may be even lower with OVS-DPDK PVP
> benchmark, I will try to do a benchmark this week.
> 

Maxime,
I observed around a 3% gain with OVS-DPDK in the first version, but that number is not reliable because the datapath has changed since then.
I will try again after fixing the OVS integration issue with the latest DPDK.

  
Maxime Coquelin Oct. 12, 2020, 9:57 a.m. UTC | #3
Hi Marvin,

On 10/12/20 11:10 AM, Liu, Yong wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Monday, October 12, 2020 4:22 PM
>> To: Liu, Yong <yong.liu@intel.com>; Xia, Chenbo <chenbo.xia@intel.com>;
>> Wang, Zhihong <zhihong.wang@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
>>
>> Hi Marvin,
>>
>> On 10/9/20 10:14 AM, Marvin Liu wrote:
>>> Packed ring format is imported since virtio spec 1.1. All descriptors
>>> are compacted into one single ring when packed ring format is on. It is
>>> straight forward that ring operations can be accelerated by utilizing
>>> SIMD instructions.
>>>
>>> This patch set will introduce vectorized data path in vhost library. If
>>> vectorized option is on, operations like descs check, descs writeback,
>>> address translation will be accelerated by SIMD instructions. On skylake
>>> server, it can bring 6% performance gain in loopback case and around 4%
>>> performance gain in PvP case.
>>
>> IMHO, 4% gain on PVP is not a significant gain if we compare to the
>> added complexity. Moreover, I guess this is 4% gain with testpmd-based
>> PVP? If this is the case it may be even lower with OVS-DPDK PVP
>> benchmark, I will try to do a benchmark this week.
>>
> 
> Maxime, 
> I have observed around 3% gain with OVS-DPDK in first version. But the number is not reliable as datapath has been changed. 
> I will try again after fixed OVS integration issue with latest dpdk. 

Thanks for the information.

Also, wouldn't using AVX512 lower the CPU frequency?
If so, could it have an impact on the workload running on the other
CPUs?

Thanks,
Maxime

  
Marvin Liu Oct. 12, 2020, 1:24 p.m. UTC | #4
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Monday, October 12, 2020 5:57 PM
> To: Liu, Yong <yong.liu@intel.com>; Xia, Chenbo <chenbo.xia@intel.com>;
> Wang, Zhihong <zhihong.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> 
> Hi Marvin,
> 
> On 10/12/20 11:10 AM, Liu, Yong wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Monday, October 12, 2020 4:22 PM
> >> To: Liu, Yong <yong.liu@intel.com>; Xia, Chenbo <chenbo.xia@intel.com>;
> >> Wang, Zhihong <zhihong.wang@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> >>
> >> Hi Marvin,
> >>
> >> On 10/9/20 10:14 AM, Marvin Liu wrote:
> >>> Packed ring format is imported since virtio spec 1.1. All descriptors
> >>> are compacted into one single ring when packed ring format is on. It is
> >>> straight forward that ring operations can be accelerated by utilizing
> >>> SIMD instructions.
> >>>
> >>> This patch set will introduce vectorized data path in vhost library. If
> >>> vectorized option is on, operations like descs check, descs writeback,
> >>> address translation will be accelerated by SIMD instructions. On skylake
> >>> server, it can bring 6% performance gain in loopback case and around 4%
> >>> performance gain in PvP case.
> >>
> >> IMHO, 4% gain on PVP is not a significant gain if we compare to the
> >> added complexity. Moreover, I guess this is 4% gain with testpmd-based
> >> PVP? If this is the case it may be even lower with OVS-DPDK PVP
> >> benchmark, I will try to do a benchmark this week.
> >>
> >
> > Maxime,
> > I have observed around 3% gain with OVS-DPDK in first version. But the number is not reliable as datapath has been changed.
> > I will try again after fixed OVS integration issue with latest dpdk.
> 
> Thanks for the information.
> 
> Also, wouldn't using AVX512 lower the CPU frequency?
> If so, could it have an impact on the workload running on the other
> CPUs?
> 

All AVX512 instructions used in vhost are lightweight ones, so the CPU frequency won't be affected.
In theory, system performance won't be affected as long as only lightweight instructions are used.

Thanks.

  
Marvin Liu Oct. 15, 2020, 3:28 p.m. UTC | #5
Hi All,
The performance gain from the vectorized datapath in OVS-DPDK is around 1%, and it also has a small impact on the original datapath.
It also increases the complexity of vhost (a new parameter is introduced, and memory information must be prepared for address translation).
After weighing the pros and cons, I'd like to withdraw this patch set. Thanks for your time.

Regards,
Marvin

  
Maxime Coquelin Oct. 15, 2020, 3:35 p.m. UTC | #6
Hi Marvin,

On 10/15/20 5:28 PM, Liu, Yong wrote:
> Hi All,
> Performance gain from vectorized datapath in OVS-DPDK is around 1%, meanwhile it have a small impact of original datapath. 
> On the other hand, it will increase the complexity of vhost (new parameter introduced, prepare memory information for address translation). 
> After weighed the procs and co, I’d like to drawback this patch set.  Thanks for your time.

Thanks for running the test with the new version.
I have removed it from Patchwork.

Thanks,
Maxime
