mbox series

[v5,00/23] Add DLB2 PMD

Message ID 1604051021-26699-1-git-send-email-timothy.mcdaniel@intel.com (mailing list archive)
Headers
Series Add DLB2 PMD |

Message

Timothy McDaniel Oct. 30, 2020, 9:43 a.m. UTC
  The following patch series adds support for a new eventdev PMD. The DLB2
PMD adds support for the Intel Dynamic Load Balancer 2.0 (DLB2)
hardware.
The DLB2 is a PCIe device that provides load-balanced, prioritized
scheduling of core-to-core communication. The device consists of
queues and arbiters that connect producer and consumer cores, and
implements load-balanced queueing features including:
- Lock-free multi-producer/multi-consumer operation.
- Multiple priority levels for varying traffic types.
- 'Direct' traffic (i.e. multi-producer/single-consumer)
- Simple unordered load-balanced distribution.
- Atomic lock-free load balancing across multiple consumers.
- Queue element reordering feature allowing ordered load-balanced
  distribution.

The DLB2 hardware supports both load balanced and directed ports and
queues. Unlike other eventdev devices already in the repo,  not all
DLB2 ports and queues are equally capable. In particular, directed
ports are limited to a single link, and must be connected to a
directed queue. Additionally, even though LDB ports may link multiple
queues,
the number of queues that may be linked is limited by hardware.

While reviewing the code, please be aware that this PMD has full
control over the DLB2 hardware. Intel will be extending the DLB2 PMD
in the future (not as part of this first series) with a mode that we
refer to as the bifurcated PMD. The bifurcated PMD communicates with a
kernel driver to configure the device, ports, and queues, and memory
maps device MMIO so datapath operations occur purely in user-space.
Note that the DLB2 hardware is a successor of the DLB hardware, and
as such is structured similarly, both in terms of code layout and
implementation.

The framework to support both the PF PMD and bifurcated PMD exists in
this patchset, and is why the iface.[ch] layer is present.

Major changes since V4:
================
- moved introduction of dlb in relnotes_20_11 to first patch in series
- fixed underlines in dlb.rst that were too short
- note that the code still uses its private byte-encoded versions of
  umonitor/umwait, rather than the new functions in the power
  patch that are built on top of those intrinsics. This is intentional.

Changes since V3:
================
- updated MAINTAINERS file to alphabetically insert DLB
- don't create RTE_ symbols in pmd
- converted to use version.map scheme
- converted to use .._master_lcore instead of .._main_lcore
- this patch set is based on dpdk-next-eventdev

Changes since V2:
================
- fixed meson conditional build. Moved test into driver’s meson.build
  file instead of event/meson.build
- documentation is populated as associated code is introduced
- add log_register in add dynamic logging patch
- rename RTE_xxx symbol(s) as DLB2_xxx
- replaced function ptr enqueue_four with direct call to movdir64b
- remove unused port_pages
- broke up probe patch into 3 smaller patches for easier review
- changed param order of movdir64b/movntdq to match intrinsics
- added self to MAINTAINERS files
- squashed announcement of availability into last patch in series
- correct spelling errors and delete repeated words
- DPDK_21.0 -> DPDK 21 in map file
- add experimental banner to public structs and APIs

Changes since V1:
=================
- implement changes requested in code reviews by Gage Eads and Mike
  Chen
- fix a memzone leak
- convert to use eal rte-cpuflags patch from Liang Ma

Known Issues:
- the documentation contained in dlb2.rst is not complete, and
  should be updated to include command line parameters (class of service,
        etc ...).

Depends-on: patch-82202 ("eventdev: increase MAX QUEUES PER DEV to 255")
Depends-on: patch-79539 ("eal: add new x86 cpuid support for WAITPKG")

Timothy McDaniel (23):
  event/dlb2: add documentation and meson build infrastructure
  event/dlb2: add dynamic logging
  event/dlb2: add private data structures and constants
  event/dlb2: add definitions shared with LKM or shared code
  event/dlb2: add inline functions
  event/dlb2: add eventdev probe
  event/dlb2: add flexible interface
  event/dlb2: add probe-time hardware init
  event/dlb2: add xstats
  event/dlb2: add infos get and configure
  event/dlb2: add queue and port default conf
  event/dlb2: add queue setup
  event/dlb2: add port setup
  event/dlb2: add port link
  event/dlb2: add port unlink and port unlinks in progress
  event/dlb2: add eventdev start
  event/dlb2: add enqueue and its burst variants
  event/dlb2: add dequeue and its burst variants
  event/dlb2: add eventdev stop and close
  event/dlb2: add PMD's token pop public interface
  event/dlb2: add PMD self-tests
  event/dlb2: add queue and port release
  event/dlb2: add timeout ticks entry point

 MAINTAINERS                                    |    6 +-
 app/test/test_eventdev.c                       |    7 +
 config/rte_config.h                            |    7 +
 doc/api/doxy-api-index.md                      |    1 +
 doc/guides/eventdevs/dlb2.rst                  |  365 ++
 doc/guides/eventdevs/index.rst                 |    1 +
 doc/guides/rel_notes/release_20_11.rst         |    5 +
 drivers/event/dlb2/dlb2.c                      | 3951 ++++++++++++++++
 drivers/event/dlb2/dlb2_iface.c                |   74 +
 drivers/event/dlb2/dlb2_iface.h                |   74 +
 drivers/event/dlb2/dlb2_inline_fns.h           |   61 +
 drivers/event/dlb2/dlb2_log.h                  |   25 +
 drivers/event/dlb2/dlb2_priv.h                 |  578 +++
 drivers/event/dlb2/dlb2_selftest.c             | 1570 ++++++
 drivers/event/dlb2/dlb2_user.h                 |  679 +++
 drivers/event/dlb2/dlb2_xstats.c               | 1240 +++++
 drivers/event/dlb2/meson.build                 |   22 +
 drivers/event/dlb2/pf/base/dlb2_hw_types.h     |  367 ++
 drivers/event/dlb2/pf/base/dlb2_mbox.h         |  596 +++
 drivers/event/dlb2/pf/base/dlb2_osdep.h        |  230 +
 drivers/event/dlb2/pf/base/dlb2_osdep_bitmap.h |  440 ++
 drivers/event/dlb2/pf/base/dlb2_osdep_list.h   |  131 +
 drivers/event/dlb2/pf/base/dlb2_osdep_types.h  |   31 +
 drivers/event/dlb2/pf/base/dlb2_regs.h         | 2527 ++++++++++
 drivers/event/dlb2/pf/base/dlb2_resource.c     | 6027 ++++++++++++++++++++++++
 drivers/event/dlb2/pf/base/dlb2_resource.h     | 1913 ++++++++
 drivers/event/dlb2/pf/dlb2_main.c              |  673 +++
 drivers/event/dlb2/pf/dlb2_main.h              |   97 +
 drivers/event/dlb2/pf/dlb2_pf.c                |  728 +++
 drivers/event/dlb2/rte_pmd_dlb2.c              |   39 +
 drivers/event/dlb2/rte_pmd_dlb2.h              |   72 +
 drivers/event/dlb2/version.map                 |    9 +
 drivers/event/meson.build                      |    3 +-
 33 files changed, 22547 insertions(+), 2 deletions(-)
 create mode 100644 doc/guides/eventdevs/dlb2.rst
 create mode 100644 drivers/event/dlb2/dlb2.c
 create mode 100644 drivers/event/dlb2/dlb2_iface.c
 create mode 100644 drivers/event/dlb2/dlb2_iface.h
 create mode 100644 drivers/event/dlb2/dlb2_inline_fns.h
 create mode 100644 drivers/event/dlb2/dlb2_log.h
 create mode 100644 drivers/event/dlb2/dlb2_priv.h
 create mode 100644 drivers/event/dlb2/dlb2_selftest.c
 create mode 100644 drivers/event/dlb2/dlb2_user.h
 create mode 100644 drivers/event/dlb2/dlb2_xstats.c
 create mode 100644 drivers/event/dlb2/meson.build
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_hw_types.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_mbox.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep_bitmap.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep_list.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep_types.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_regs.h
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_resource.c
 create mode 100644 drivers/event/dlb2/pf/base/dlb2_resource.h
 create mode 100644 drivers/event/dlb2/pf/dlb2_main.c
 create mode 100644 drivers/event/dlb2/pf/dlb2_main.h
 create mode 100644 drivers/event/dlb2/pf/dlb2_pf.c
 create mode 100644 drivers/event/dlb2/rte_pmd_dlb2.c
 create mode 100644 drivers/event/dlb2/rte_pmd_dlb2.h
 create mode 100644 drivers/event/dlb2/version.map
  

Comments

Thomas Monjalon Oct. 30, 2020, 10:01 a.m. UTC | #1
30/10/2020 10:43, Timothy McDaniel:
> - note that the code still uses its private byte-encoded versions of
>   umonitor/umwait, rather than the new functions in the power
>   patch that are built on top of those intrinsics. This is intentional.

Why? Now these intrinsics are available in the main branch.
We should avoid duplicating such code.
  
Timothy McDaniel Oct. 30, 2020, 10:16 a.m. UTC | #2
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, October 30, 2020 5:02 AM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> Cc: dev@dpdk.org; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, Gage
> <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>;
> jerinj@marvell.com; david.marchand@redhat.com
> Subject: Re: [PATCH v5 00/23] Add DLB2 PMD
> 
> 30/10/2020 10:43, Timothy McDaniel:
> > - note that the code still uses its private byte-encoded versions of
> >   umonitor/umwait, rather than the new functions in the power
> >   patch that are built on top of those intrinsics. This is intentional.
> 
> Why? Now these intrinsics are available in the main branch.
> We should avoid duplicating such code.
> 
> 

I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so that DLB/DLB2 could use them instead of its own private byte-encoded versions, but instead we have these wrappers that call the low level intrinsics. Those wrappers
introduce additional overhead that is not required for DLB/DLB2. I have a meeting with Ma Liang on Monday to discuss.
  
Jerin Jacob Oct. 30, 2020, 10:32 a.m. UTC | #3
On Fri, Oct 30, 2020 at 3:46 PM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Friday, October 30, 2020 5:02 AM
> > To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> > Cc: dev@dpdk.org; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, Gage
> > <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>;
> > jerinj@marvell.com; david.marchand@redhat.com
> > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD
> >
> > 30/10/2020 10:43, Timothy McDaniel:
> > > - note that the code still uses its private byte-encoded versions of
> > >   umonitor/umwait, rather than the new functions in the power
> > >   patch that are built on top of those intrinsics. This is intentional.
> >
> > Why? Now these intrinsics are available in the main branch.
> > We should avoid duplicating such code.
> >
> >
>
> I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so that DLB/DLB2 could use them instead of its own private byte-encoded versions, but instead we have these wrappers that call the low level intrinsics. Those wrappers
> introduce additional overhead that is not required for DLB/DLB2. I have a meeting with Ma Liang on Monday to discuss.

Then why we merged the EAL patches? The all-purpose was to use this by
other subsystems. If it is only for the power library then we should
make specific to the power library.

Thomas, Should I take this series in eventdev or I need to wait to
sort out this?
  
Thomas Monjalon Oct. 30, 2020, 10:43 a.m. UTC | #4
30/10/2020 11:32, Jerin Jacob:
> McDaniel, Timothy <timothy.mcdaniel@intel.com> wrote:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 30/10/2020 10:43, Timothy McDaniel:
> > > > - note that the code still uses its private byte-encoded versions of
> > > >   umonitor/umwait, rather than the new functions in the power
> > > >   patch that are built on top of those intrinsics. This is intentional.
> > >
> > > Why? Now these intrinsics are available in the main branch.
> > > We should avoid duplicating such code.
> >
> > I had asked that the low level intrinsics (UMWAIT/UMONITOR)
> > be split out so that DLB/DLB2 could use them instead
> > of its own private byte-encoded versions,
> > but instead we have these wrappers that call the low level
> > intrinsics. Those wrappers introduce additional overhead
> > that is not required for DLB/DLB2.
> > I have a meeting with Ma Liang on Monday to discuss.

Why did not you tell it on the mailing list?
It would have prevented from merging a wrong/useless API.

I am now convinced that the hard push to get those intrinsics
which started with a lack of communication (no roadmap, no Cc, no reply)
is really a bad story in the community process.

> Then why we merged the EAL patches? The all-purpose was to use this by
> other subsystems. If it is only for the power library then we should
> make specific to the power library.

I agree with you Jerin.

> Thomas, Should I take this series in eventdev
> or I need to wait to sort out this?

I think you could merge those patches and I revert the EAL ones.
In any case, I won't merge any more patch about power monitor
in this release. I don't like swimming in the fog.
  
Timothy McDaniel Oct. 30, 2020, 11:58 a.m. UTC | #5
> -----Original Message-----
> From: McDaniel, Timothy
> Sent: Friday, October 30, 2020 5:17 AM
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Carrillo, Erik G <Erik.G.Carrillo@intel.com>; Eads, Gage
> <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>;
> jerinj@marvell.com; david.marchand@redhat.com
> Subject: RE: [PATCH v5 00/23] Add DLB2 PMD
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Friday, October 30, 2020 5:02 AM
> > To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> > Cc: dev@dpdk.org; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, Gage
> > <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>;
> > jerinj@marvell.com; david.marchand@redhat.com
> > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD
> >
> > 30/10/2020 10:43, Timothy McDaniel:
> > > - note that the code still uses its private byte-encoded versions of
> > >   umonitor/umwait, rather than the new functions in the power
> > >   patch that are built on top of those intrinsics. This is intentional.
> >
> > Why? Now these intrinsics are available in the main branch.
> > We should avoid duplicating such code.
> >
> >
> 
> I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so
> that DLB/DLB2 could use them instead of its own private byte-encoded versions,
> but instead we have these wrappers that call the low level intrinsics. Those
> wrappers
> introduce additional overhead that is not required for DLB/DLB2. I have a
> meeting with Ma Liang on Monday to discuss.

I thought the ask of DLB was to just substitute the low level umwait/umonitor byte
encoded instructions DLB has defined privately with similar byte-encoded instructions defined in the power
patch. The power patch does not directly expose those, which is why I did not update DLB/DLB2.
The power patch does have the advantage of centralizing the race avoidance
logic, which is a good thing for any PMD that wishes to take advantage of umwait/umonitor.  

Sorry for the confusion. I just misunderstood what was being asked of DLB in regard to switching over..  That being said, 
I am willing to convert DLB/DLB2 to use  rte_power_monitor(...) in a future patch-set.

Thanks,
Tim
  
Thomas Monjalon Oct. 30, 2020, 1:15 p.m. UTC | #6
30/10/2020 12:58, McDaniel, Timothy:
> From: McDaniel, Timothy
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 30/10/2020 10:43, Timothy McDaniel:
> > > > - note that the code still uses its private byte-encoded versions of
> > > >   umonitor/umwait, rather than the new functions in the power
> > > >   patch that are built on top of those intrinsics. This is intentional.
> > >
> > > Why? Now these intrinsics are available in the main branch.
> > > We should avoid duplicating such code.
> > >
> > >
> > 
> > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so
> > that DLB/DLB2 could use them instead of its own private byte-encoded versions,
> > but instead we have these wrappers that call the low level intrinsics. Those
> > wrappers
> > introduce additional overhead that is not required for DLB/DLB2. I have a
> > meeting with Ma Liang on Monday to discuss.
> 
> I thought the ask of DLB was to just substitute the low level umwait/umonitor byte
> encoded instructions DLB has defined privately with similar byte-encoded instructions defined in the power
> patch. The power patch does not directly expose those, which is why I did not update DLB/DLB2.
> The power patch does have the advantage of centralizing the race avoidance
> logic, which is a good thing for any PMD that wishes to take advantage of umwait/umonitor.

So you mean the overhead is a good thing?

> Sorry for the confusion. I just misunderstood what was being asked of DLB in regard to switching over..  That being said, 
> I am willing to convert DLB/DLB2 to use  rte_power_monitor(...) in a future patch-set.

Why not now?

Indeed there is a confusion and it looks like a lot of novlang
to exit from the situation.
We'll wait a clear decision with facts.
  
Jerin Jacob Oct. 30, 2020, 2:21 p.m. UTC | #7
On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel
<timothy.mcdaniel@intel.com> wrote:

>
> Timothy McDaniel (23):
>   event/dlb2: add documentation and meson build infrastructure
>   event/dlb2: add dynamic logging
>   event/dlb2: add private data structures and constants
>   event/dlb2: add definitions shared with LKM or shared code
>   event/dlb2: add inline functions
>   event/dlb2: add eventdev probe

There is build error with clang  and static build here.
Please send the next version with fix.

meson  -Dexamples=l3fwd --buildtype=debugoptimized --werror
--default-library=static /export/dpdk-next-eventdev/devtools/..
./build-clang-static
The Meson build system
Version: 0.55.3
Source dir: /export/dpdk-next-eventdev
Build dir: /export/dpdk-next-eventdev/build-clang-static
Build type: native build
Program cat found: YES
Using 'PKG_CONFIG_PATH' from environment with value: ''
Using 'PKG_CONFIG_PATH' from environment with value: ''
Project name: DPDK
Project version: 20.11.0-rc1
Using 'CC' from environment with value: 'ccache clang'
Using 'CFLAGS' from environment with value: ''
Using 'LDFLAGS' from environment with value: ''
Using 'CPPFLAGS' from environment with value: ''
Using 'CC' from environment with value: 'ccache clang'
Using 'CFLAGS' from environment with value: ''
Using 'LDFLAGS' from environment with value: ''
Using 'CPPFLAGS' from environment with value: ''
C compiler for the host machine: ccache clang (clang 10.0.1 "clang
version 10.0.1 ")
C linker for the host machine: clang ld.bfd 2.35.1
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program pkg-config found: YES
Program gen-pmdinfo-cfile.sh found: YES
Program list-dir-globs.py found: YES
Program check-symbols.sh found: YES
Program options-ibverbs-static.sh found: YES
Program binutils-avx512-check.sh found: YES
Program python3 found: YES (/usr/bin/python)



ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
-I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
-Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
-I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
nclude -Ilib/librte_eal/linux/include
-I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
-I../lib/librte_eal/x86/include -Ilib/librte_eal/common
-I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
-Ilib/librte_kvargs
 -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
-Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
-I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
-Ilib/librte_net -I../lib/librte_net -Il
ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
-I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
-Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
-I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
-I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
-g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
-Wformat-nonliteral -Wformat-security -Wmissing-declarations
-Wmissing-prototypes -Wnested-externs -Wold-style-definition
-Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
-DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF
drivers/l
ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o
drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c
../drivers/event/dlb2/pf/dlb2_main.c
In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22:
../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
unknown builtin '__builtin_ia32_movntdq'
[-Wimplicit-function-declaration]
        __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
        ^
../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean
'__builtin_ia32_movntq'?
/usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note:
'__builtin_ia32_movntq' declared here
  __builtin_ia32_movntq(__p, __a);

  [2010/2491] Compiling C object
drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
-I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
-Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
-I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
nclude -Ilib/librte_eal/linux/include
-I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
-I../lib/librte_eal/x86/include -Ilib/librte_eal/common
-I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
-Ilib/librte_kvargs
 -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
-Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
-I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
-Ilib/librte_net -I../lib/librte_net -Il
ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
-I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
-Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
-I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
-I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
-g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
-Wformat-nonliteral -Wformat-security -Wmissing-declarations
-Wmissing-prototypes -Wnested-externs -Wold-style-definition
-Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
-DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF
drivers/lib
tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o
drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c
../drivers/event/dlb2/pf/dlb2_pf.c
In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35:
../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
unknown builtin '__builtin_ia32_movntdq'
[-Wimplicit-function-declaration]
        __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
  
Timothy McDaniel Oct. 30, 2020, 3:25 p.m. UTC | #8
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, October 30, 2020 9:22 AM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads,
> Gage <gage.eads@intel.com>; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas
> Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD
> 
> On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel
> <timothy.mcdaniel@intel.com> wrote:
> 
> >
> > Timothy McDaniel (23):
> >   event/dlb2: add documentation and meson build infrastructure
> >   event/dlb2: add dynamic logging
> >   event/dlb2: add private data structures and constants
> >   event/dlb2: add definitions shared with LKM or shared code
> >   event/dlb2: add inline functions
> >   event/dlb2: add eventdev probe
> 
> There is build error with clang  and static build here.
> Please send the next version with fix.
> 
> meson  -Dexamples=l3fwd --buildtype=debugoptimized --werror
> --default-library=static /export/dpdk-next-eventdev/devtools/..
> ./build-clang-static
> The Meson build system
> Version: 0.55.3
> Source dir: /export/dpdk-next-eventdev
> Build dir: /export/dpdk-next-eventdev/build-clang-static
> Build type: native build
> Program cat found: YES
> Using 'PKG_CONFIG_PATH' from environment with value: ''
> Using 'PKG_CONFIG_PATH' from environment with value: ''
> Project name: DPDK
> Project version: 20.11.0-rc1
> Using 'CC' from environment with value: 'ccache clang'
> Using 'CFLAGS' from environment with value: ''
> Using 'LDFLAGS' from environment with value: ''
> Using 'CPPFLAGS' from environment with value: ''
> Using 'CC' from environment with value: 'ccache clang'
> Using 'CFLAGS' from environment with value: ''
> Using 'LDFLAGS' from environment with value: ''
> Using 'CPPFLAGS' from environment with value: ''
> C compiler for the host machine: ccache clang (clang 10.0.1 "clang
> version 10.0.1 ")
> C linker for the host machine: clang ld.bfd 2.35.1
> Host machine cpu family: x86_64
> Host machine cpu: x86_64
> Program pkg-config found: YES
> Program gen-pmdinfo-cfile.sh found: YES
> Program list-dir-globs.py found: YES
> Program check-symbols.sh found: YES
> Program options-ibverbs-static.sh found: YES
> Program binutils-avx512-check.sh found: YES
> Program python3 found: YES (/usr/bin/python)
> 
> 
> 
> ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
> -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
> -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
> -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
> nclude -Ilib/librte_eal/linux/include
> -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
> -I../lib/librte_eal/x86/include -Ilib/librte_eal/common
> -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
> -Ilib/librte_kvargs
>  -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
> -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
> -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
> -Ilib/librte_net -I../lib/librte_net -Il
> ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
> -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
> -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
> Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
> -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
> -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
> -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
> -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
> -Wformat-nonliteral -Wformat-security -Wmissing-declarations
> -Wmissing-prototypes -Wnested-externs -Wold-style-definition
> -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
> Wwrite-strings -Wno-address-of-packed-member
> -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF
> drivers/l
> ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o
> drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c
> ../drivers/event/dlb2/pf/dlb2_main.c
> In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22:
> ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> unknown builtin '__builtin_ia32_movntdq'
> [-Wimplicit-function-declaration]
>         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
>         ^
> ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean
> '__builtin_ia32_movntq'?
> /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note:
> '__builtin_ia32_movntq' declared here
>   __builtin_ia32_movntq(__p, __a);
> 
>   [2010/2491] Compiling C object
> drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
> FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
> ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
> -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
> -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
> -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
> nclude -Ilib/librte_eal/linux/include
> -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
> -I../lib/librte_eal/x86/include -Ilib/librte_eal/common
> -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
> -Ilib/librte_kvargs
>  -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
> -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
> -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
> -Ilib/librte_net -I../lib/librte_net -Il
> ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
> -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
> -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
> Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
> -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
> -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
> -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
> -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
> -Wformat-nonliteral -Wformat-security -Wmissing-declarations
> -Wmissing-prototypes -Wnested-externs -Wold-style-definition
> -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
> Wwrite-strings -Wno-address-of-packed-member
> -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF
> drivers/lib
> tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o
> drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c
> ../drivers/event/dlb2/pf/dlb2_pf.c
> In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35:
> ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> unknown builtin '__builtin_ia32_movntdq'
> [-Wimplicit-function-declaration]
>         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);

Not sure why this builds for me, but I do not see this error.
According to information online, '__builtin_ia32_movntdq' should be available if -msse2 is set.
According to the following snippet from config/x86/meson.build, it looks like msse4 is defined.
<snippet>
# we require SSE4.2 for DPDK
if cc.get_define('__SSE4_2__', args: machine_args) == ''
        message('SSE 4.2 not enabled by default, explicitly enabling')
        machine_args += '-msse4'
endif
<end snippet>

I realize this is from clang, not gcc, but why are they out of sync?
such that __builtin_ia32_movntdq is not available in clang, but is available in gcc.

Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ?

Any guidance on how to get past this would be greatly appreciated.

Thanks,
Tim
  
Jerin Jacob Oct. 30, 2020, 3:31 p.m. UTC | #9
+ @Richardson, Bruce  @Ananyev, Konstantin

On Fri, Oct 30, 2020 at 8:55 PM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, October 30, 2020 9:22 AM
> > To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads,
> > Gage <gage.eads@intel.com>; Van Haaren, Harry
> > <harry.van.haaren@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas
> > Monjalon <thomas@monjalon.net>
> > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD
> >
> > On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel
> > <timothy.mcdaniel@intel.com> wrote:
> >
> > >
> > > Timothy McDaniel (23):
> > >   event/dlb2: add documentation and meson build infrastructure
> > >   event/dlb2: add dynamic logging
> > >   event/dlb2: add private data structures and constants
> > >   event/dlb2: add definitions shared with LKM or shared code
> > >   event/dlb2: add inline functions
> > >   event/dlb2: add eventdev probe
> >
> > There is build error with clang  and static build here.
> > Please send the next version with fix.
> >
> > meson  -Dexamples=l3fwd --buildtype=debugoptimized --werror
> > --default-library=static /export/dpdk-next-eventdev/devtools/..
> > ./build-clang-static
> > The Meson build system
> > Version: 0.55.3
> > Source dir: /export/dpdk-next-eventdev
> > Build dir: /export/dpdk-next-eventdev/build-clang-static
> > Build type: native build
> > Program cat found: YES
> > Using 'PKG_CONFIG_PATH' from environment with value: ''
> > Using 'PKG_CONFIG_PATH' from environment with value: ''
> > Project name: DPDK
> > Project version: 20.11.0-rc1
> > Using 'CC' from environment with value: 'ccache clang'
> > Using 'CFLAGS' from environment with value: ''
> > Using 'LDFLAGS' from environment with value: ''
> > Using 'CPPFLAGS' from environment with value: ''
> > Using 'CC' from environment with value: 'ccache clang'
> > Using 'CFLAGS' from environment with value: ''
> > Using 'LDFLAGS' from environment with value: ''
> > Using 'CPPFLAGS' from environment with value: ''
> > C compiler for the host machine: ccache clang (clang 10.0.1 "clang
> > version 10.0.1 ")
> > C linker for the host machine: clang ld.bfd 2.35.1
> > Host machine cpu family: x86_64
> > Host machine cpu: x86_64
> > Program pkg-config found: YES
> > Program gen-pmdinfo-cfile.sh found: YES
> > Program list-dir-globs.py found: YES
> > Program check-symbols.sh found: YES
> > Program options-ibverbs-static.sh found: YES
> > Program binutils-avx512-check.sh found: YES
> > Program python3 found: YES (/usr/bin/python)
> >
> >
> >
> > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
> > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
> > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
> > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
> > nclude -Ilib/librte_eal/linux/include
> > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
> > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common
> > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
> > -Ilib/librte_kvargs
> >  -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
> > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
> > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
> > -Ilib/librte_net -I../lib/librte_net -Il
> > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
> > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
> > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
> > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
> > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
> > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
> > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
> > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
> > -Wformat-nonliteral -Wformat-security -Wmissing-declarations
> > -Wmissing-prototypes -Wnested-externs -Wold-style-definition
> > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
> > Wwrite-strings -Wno-address-of-packed-member
> > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF
> > drivers/l
> > ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c
> > ../drivers/event/dlb2/pf/dlb2_main.c
> > In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22:
> > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> > unknown builtin '__builtin_ia32_movntdq'
> > [-Wimplicit-function-declaration]
> >         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
> >         ^
> > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean
> > '__builtin_ia32_movntq'?
> > /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note:
> > '__builtin_ia32_movntq' declared here
> >   __builtin_ia32_movntq(__p, __a);
> >
> >   [2010/2491] Compiling C object
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
> > FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
> > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
> > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
> > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
> > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
> > nclude -Ilib/librte_eal/linux/include
> > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
> > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common
> > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
> > -Ilib/librte_kvargs
> >  -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
> > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
> > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
> > -Ilib/librte_net -I../lib/librte_net -Il
> > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
> > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
> > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
> > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
> > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
> > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
> > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
> > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
> > -Wformat-nonliteral -Wformat-security -Wmissing-declarations
> > -Wmissing-prototypes -Wnested-externs -Wold-style-definition
> > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
> > Wwrite-strings -Wno-address-of-packed-member
> > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF
> > drivers/lib
> > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c
> > ../drivers/event/dlb2/pf/dlb2_pf.c
> > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35:
> > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> > unknown builtin '__builtin_ia32_movntdq'
> > [-Wimplicit-function-declaration]
> >         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
>
> Not sure why this builds for me, but I do not see this error.

May be you can try with
[for-main][dpdk-next-eventdev] $ clang -v
clang version 10.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin


> According to information online, '__builtin_ia32_movntdq' should be available if -msse2 is set.
> According to the following snippet from config/x86/meson.build, it looks like msse4 is defined.
> <snippet>
> # we require SSE4.2 for DPDK
> if cc.get_define('__SSE4_2__', args: machine_args) == ''
>         message('SSE 4.2 not enabled by default, explicitly enabling')
>         machine_args += '-msse4'
> endif
> <end snippet>
>
> I realize this is from clang, not gcc, but why are they out of sync?
> such that __builtin_ia32_movntdq is not available in clang, but is available in gcc.
>
> Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ?
>
> Any guidance on how to get past this would be greatly appreciated.

Added @Richardson, Bruce  @Ananyev, Konstantin

I do see the same problem in DLB driver as well.


>
> Thanks,
> Tim
>
>
>
  
David Marchand Oct. 30, 2020, 3:33 p.m. UTC | #10
On Fri, Oct 30, 2020 at 4:25 PM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, October 30, 2020 9:22 AM
> > To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads,
> > Gage <gage.eads@intel.com>; Van Haaren, Harry
> > <harry.van.haaren@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas
> > Monjalon <thomas@monjalon.net>
> > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD
> >
> > On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel
> > <timothy.mcdaniel@intel.com> wrote:
> >
> > >
> > > Timothy McDaniel (23):
> > >   event/dlb2: add documentation and meson build infrastructure
> > >   event/dlb2: add dynamic logging
> > >   event/dlb2: add private data structures and constants
> > >   event/dlb2: add definitions shared with LKM or shared code
> > >   event/dlb2: add inline functions
> > >   event/dlb2: add eventdev probe
> >
> > There is build error with clang  and static build here.
> > Please send the next version with fix.
> >
> > meson  -Dexamples=l3fwd --buildtype=debugoptimized --werror
> > --default-library=static /export/dpdk-next-eventdev/devtools/..
> > ./build-clang-static
> > The Meson build system
> > Version: 0.55.3
> > Source dir: /export/dpdk-next-eventdev
> > Build dir: /export/dpdk-next-eventdev/build-clang-static
> > Build type: native build
> > Program cat found: YES
> > Using 'PKG_CONFIG_PATH' from environment with value: ''
> > Using 'PKG_CONFIG_PATH' from environment with value: ''
> > Project name: DPDK
> > Project version: 20.11.0-rc1
> > Using 'CC' from environment with value: 'ccache clang'
> > Using 'CFLAGS' from environment with value: ''
> > Using 'LDFLAGS' from environment with value: ''
> > Using 'CPPFLAGS' from environment with value: ''
> > Using 'CC' from environment with value: 'ccache clang'
> > Using 'CFLAGS' from environment with value: ''
> > Using 'LDFLAGS' from environment with value: ''
> > Using 'CPPFLAGS' from environment with value: ''
> > C compiler for the host machine: ccache clang (clang 10.0.1 "clang
> > version 10.0.1 ")
> > C linker for the host machine: clang ld.bfd 2.35.1
> > Host machine cpu family: x86_64
> > Host machine cpu: x86_64
> > Program pkg-config found: YES
> > Program gen-pmdinfo-cfile.sh found: YES
> > Program list-dir-globs.py found: YES
> > Program check-symbols.sh found: YES
> > Program options-ibverbs-static.sh found: YES
> > Program binutils-avx512-check.sh found: YES
> > Program python3 found: YES (/usr/bin/python)
> >
> >
> >
> > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
> > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
> > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
> > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
> > nclude -Ilib/librte_eal/linux/include
> > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
> > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common
> > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
> > -Ilib/librte_kvargs
> >  -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
> > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
> > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
> > -Ilib/librte_net -I../lib/librte_net -Il
> > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
> > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
> > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
> > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
> > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
> > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
> > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
> > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
> > -Wformat-nonliteral -Wformat-security -Wmissing-declarations
> > -Wmissing-prototypes -Wnested-externs -Wold-style-definition
> > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
> > Wwrite-strings -Wno-address-of-packed-member
> > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF
> > drivers/l
> > ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c
> > ../drivers/event/dlb2/pf/dlb2_main.c
> > In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22:
> > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> > unknown builtin '__builtin_ia32_movntdq'
> > [-Wimplicit-function-declaration]
> >         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
> >         ^
> > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean
> > '__builtin_ia32_movntq'?
> > /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note:
> > '__builtin_ia32_movntq' declared here
> >   __builtin_ia32_movntq(__p, __a);
> >
> >   [2010/2491] Compiling C object
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
> > FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o
> > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers
> > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2
> > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig
> > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i
> > nclude -Ilib/librte_eal/linux/include
> > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
> > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common
> > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal
> > -Ilib/librte_kvargs
> >  -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics
> > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring
> > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev
> > -Ilib/librte_net -I../lib/librte_net -Il
> > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter
> > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu
> > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer -
> > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci
> > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci
> > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe
> > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2
> > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
> > -Wformat-nonliteral -Wformat-security -Wmissing-declarations
> > -Wmissing-prototypes -Wnested-externs -Wold-style-definition
> > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -
> > Wwrite-strings -Wno-address-of-packed-member
> > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF
> > drivers/lib
> > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o
> > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c
> > ../drivers/event/dlb2/pf/dlb2_pf.c
> > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35:
> > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> > unknown builtin '__builtin_ia32_movntdq'
> > [-Wimplicit-function-declaration]
> >         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
>
> Not sure why this builds for me, but I do not see this error.

Reproduced the same error on fc31.
$ clang --version
clang version 9.0.1 (Fedora 9.0.1-2.fc31)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ ./devtools/test-meson-builds.sh
...

FAILED: drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o
ccache clang -Idrivers/a715181@@tmp_rte_event_dlb2@sta -Idrivers
-I../../dpdk/drivers -Idrivers/event/dlb2
-I../../dpdk/drivers/event/dlb2 -Ilib/librte_eventdev
-I../../dpdk/lib/librte_eventdev -I. -I../../dpdk/ -Iconfig
-I../../dpdk/config -Ilib/librte_eal/include
-I../../dpdk/lib/librte_eal/include -Ilib/librte_eal/linux/include
-I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
-I../../dpdk/lib/librte_eal/x86/include -Ilib/librte_eal/common
-I../../dpdk/lib/librte_eal/common -Ilib/librte_eal
-I../../dpdk/lib/librte_eal -Ilib/librte_kvargs
-I../../dpdk/lib/librte_kvargs
-Ilib/librte_telemetry/../librte_metrics
-I../../dpdk/lib/librte_telemetry/../librte_metrics
-Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry
-Ilib/librte_ring -I../../dpdk/lib/librte_ring -Ilib/librte_ethdev
-I../../dpdk/lib/librte_ethdev -Ilib/librte_net
-I../../dpdk/lib/librte_net -Ilib/librte_mbuf
-I../../dpdk/lib/librte_mbuf -Ilib/librte_mempool
-I../../dpdk/lib/librte_mempool -Ilib/librte_meter
-I../../dpdk/lib/librte_meter -Ilib/librte_hash
-I../../dpdk/lib/librte_hash -Ilib/librte_rcu
-I../../dpdk/lib/librte_rcu -Ilib/librte_timer
-I../../dpdk/lib/librte_timer -Ilib/librte_cryptodev
-I../../dpdk/lib/librte_cryptodev -Ilib/librte_pci
-I../../dpdk/lib/librte_pci -Idrivers/bus/pci
-I../../dpdk/drivers/bus/pci -I../../dpdk/drivers/bus/pci/linux
-I/home/dmarchan/intel-ipsec-mb/install/include -Xclang
-fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
-Wformat-nonliteral -Wformat-security -Wmissing-declarations
-Wmissing-prototypes -Wnested-externs -Wold-style-definition
-Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef
-Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
-DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o' -MF
'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o.d' -o
'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o' -c
../../dpdk/drivers/event/dlb2/dlb2.c
In file included from ../../dpdk/drivers/event/dlb2/dlb2.c:35:
../../dpdk/drivers/event/dlb2/dlb2_inline_fns.h:41:2: error: use of
unknown builtin '__builtin_ia32_movntdq'
[-Wimplicit-function-declaration]
        __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
        ^
../../dpdk/drivers/event/dlb2/dlb2_inline_fns.h:41:2: note: did you
mean '__builtin_ia32_movntq'?
/usr/lib64/clang/9.0.1/include/xmmintrin.h:2122:3: note:
'__builtin_ia32_movntq' declared here
  __builtin_ia32_movntq(__p, __a);
  ^
1 error generated.


FAILED: drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o
ccache clang -Idrivers/a715181@@tmp_rte_event_dlb2@sta -Idrivers
-I../../dpdk/drivers -Idrivers/event/dlb2
-I../../dpdk/drivers/event/dlb2 -Ilib/librte_eventdev
-I../../dpdk/lib/librte_eventdev -I. -I../../dpdk/ -Iconfig
-I../../dpdk/config -Ilib/librte_eal/include
-I../../dpdk/lib/librte_eal/include -Ilib/librte_eal/linux/include
-I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/x86/include
-I../../dpdk/lib/librte_eal/x86/include -Ilib/librte_eal/common
-I../../dpdk/lib/librte_eal/common -Ilib/librte_eal
-I../../dpdk/lib/librte_eal -Ilib/librte_kvargs
-I../../dpdk/lib/librte_kvargs
-Ilib/librte_telemetry/../librte_metrics
-I../../dpdk/lib/librte_telemetry/../librte_metrics
-Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry
-Ilib/librte_ring -I../../dpdk/lib/librte_ring -Ilib/librte_ethdev
-I../../dpdk/lib/librte_ethdev -Ilib/librte_net
-I../../dpdk/lib/librte_net -Ilib/librte_mbuf
-I../../dpdk/lib/librte_mbuf -Ilib/librte_mempool
-I../../dpdk/lib/librte_mempool -Ilib/librte_meter
-I../../dpdk/lib/librte_meter -Ilib/librte_hash
-I../../dpdk/lib/librte_hash -Ilib/librte_rcu
-I../../dpdk/lib/librte_rcu -Ilib/librte_timer
-I../../dpdk/lib/librte_timer -Ilib/librte_cryptodev
-I../../dpdk/lib/librte_cryptodev -Ilib/librte_pci
-I../../dpdk/lib/librte_pci -Idrivers/bus/pci
-I../../dpdk/drivers/bus/pci -I../../dpdk/drivers/bus/pci/linux
-I/home/dmarchan/intel-ipsec-mb/install/include -Xclang
-fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
-Wformat-nonliteral -Wformat-security -Wmissing-declarations
-Wmissing-prototypes -Wnested-externs -Wold-style-definition
-Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef
-Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
-DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o'
-MF 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o.d'
-o 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o'
-c ../../dpdk/drivers/event/dlb2/dlb2_selftest.c
../../dpdk/drivers/event/dlb2/dlb2_selftest.c:139:1: error: unused
function 'create_ordered_qids' [-Werror,-Wunused-function]
create_ordered_qids(struct test *t, int num_qids)
^
../../dpdk/drivers/event/dlb2/dlb2_selftest.c:145:1: error: unused
function 'create_unordered_qids' [-Werror,-Wunused-function]
create_unordered_qids(struct test *t, int num_qids)
^
../../dpdk/drivers/event/dlb2/dlb2_selftest.c:151:1: error: unused
function 'create_directed_qids' [-Werror,-Wunused-function]
create_directed_qids(struct test *t, int num_qids, const uint8_t ports[])
^
3 errors generated.




I also see a different issue with RHEL 7 gcc:

Found ninja-1.7.2 at /usr/bin/ninja-build
[1908/2413] Compiling C object
'drivers/drivers@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o'.
../drivers/event/dlb2/dlb2.c: In function ‘dlb2_hw_create_ldb_port’:
../drivers/event/dlb2/dlb2.c:1091:9: warning: missing braces around
initializer [-Wmissing-braces]
  struct dlb2_create_ldb_port_args cfg = {0};
         ^
../drivers/event/dlb2/dlb2.c:1091:9: warning: (near initialization for
‘cfg.response’) [-Wmissing-braces]
../drivers/event/dlb2/dlb2.c: In function ‘dlb2_hw_create_dir_port’:
../drivers/event/dlb2/dlb2.c:1263:9: warning: missing braces around
initializer [-Wmissing-braces]
  struct dlb2_create_dir_port_args cfg = {0};
         ^
../drivers/event/dlb2/dlb2.c:1263:9: warning: (near initialization for
‘cfg.response’) [-Wmissing-braces]
  
Timothy McDaniel Oct. 30, 2020, 3:35 p.m. UTC | #11
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, October 30, 2020 8:15 AM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> Cc: 'dev@dpdk.org' <dev@dpdk.org>; Carrillo, Erik G
> <erik.g.carrillo@intel.com>; Eads, Gage <gage.eads@intel.com>; Van Haaren,
> Harry <harry.van.haaren@intel.com>; 'jerinj@marvell.com'
> <jerinj@marvell.com>; 'david.marchand@redhat.com'
> <david.marchand@redhat.com>
> Subject: Re: [PATCH v5 00/23] Add DLB2 PMD
> 
> 30/10/2020 12:58, McDaniel, Timothy:
> > From: McDaniel, Timothy
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 30/10/2020 10:43, Timothy McDaniel:
> > > > > - note that the code still uses its private byte-encoded versions of
> > > > >   umonitor/umwait, rather than the new functions in the power
> > > > >   patch that are built on top of those intrinsics. This is intentional.
> > > >
> > > > Why? Now these intrinsics are available in the main branch.
> > > > We should avoid duplicating such code.
> > > >
> > > >
> > >
> > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so
> > > that DLB/DLB2 could use them instead of its own private byte-encoded
> versions,
> > > but instead we have these wrappers that call the low level intrinsics. Those
> > > wrappers
> > > introduce additional overhead that is not required for DLB/DLB2. I have a
> > > meeting with Ma Liang on Monday to discuss.
> >
> > I thought the ask of DLB was to just substitute the low level umwait/umonitor
> byte
> > encoded instructions DLB has defined privately with similar byte-encoded
> instructions defined in the power
> > patch. The power patch does not directly expose those, which is why I did not
> update DLB/DLB2.
> > The power patch does have the advantage of centralizing the race avoidance
> > logic, which is a good thing for any PMD that wishes to take advantage of
> umwait/umonitor.
> 
> So you mean the overhead is a good thing?
> 
> > Sorry for the confusion. I just misunderstood what was being asked of DLB in
> regard to switching over..  That being said,
> > I am willing to convert DLB/DLB2 to use  rte_power_monitor(...) in a future
> patch-set.
> 
> Why not now?
> 
> Indeed there is a confusion and it looks like a lot of novlang
> to exit from the situation.
> We'll wait a clear decision with facts.
> 

Hi Thomas,

I have updated DLB and DLB2 to use rte_power_monitor(...), and those patches are
ready if you are willing to accept them and the 3 power patches.

For the sake of consistency, I see the benefit of using the power patch, even if it is 
slightly less efficient that the DLB-specific implementation that I currently have.
We have already encountered an empty queue, so this is no longer fast path for the PMD.
  
Thomas Monjalon Oct. 30, 2020, 3:47 p.m. UTC | #12
30/10/2020 16:35, McDaniel, Timothy:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 30/10/2020 12:58, McDaniel, Timothy:
> > > From: McDaniel, Timothy
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > 30/10/2020 10:43, Timothy McDaniel:
> > > > > > - note that the code still uses its private byte-encoded versions of
> > > > > >   umonitor/umwait, rather than the new functions in the power
> > > > > >   patch that are built on top of those intrinsics. This is intentional.
> > > > >
> > > > > Why? Now these intrinsics are available in the main branch.
> > > > > We should avoid duplicating such code.
> > > > >
> > > > >
> > > >
> > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so
> > > > that DLB/DLB2 could use them instead of its own private byte-encoded
> > versions,
> > > > but instead we have these wrappers that call the low level intrinsics. Those
> > > > wrappers
> > > > introduce additional overhead that is not required for DLB/DLB2. I have a
> > > > meeting with Ma Liang on Monday to discuss.
> > >
> > > I thought the ask of DLB was to just substitute the low level umwait/umonitor
> > byte
> > > encoded instructions DLB has defined privately with similar byte-encoded
> > instructions defined in the power
> > > patch. The power patch does not directly expose those, which is why I did not
> > update DLB/DLB2.
> > > The power patch does have the advantage of centralizing the race avoidance
> > > logic, which is a good thing for any PMD that wishes to take advantage of
> > umwait/umonitor.
> > 
> > So you mean the overhead is a good thing?
> > 
> > > Sorry for the confusion. I just misunderstood what was being asked of DLB in
> > regard to switching over..  That being said,
> > > I am willing to convert DLB/DLB2 to use  rte_power_monitor(...) in a future
> > patch-set.
> > 
> > Why not now?
> > 
> > Indeed there is a confusion and it looks like a lot of novlang
> > to exit from the situation.
> > We'll wait a clear decision with facts.
> > 
> 
> Hi Thomas,
> 
> I have updated DLB and DLB2 to use rte_power_monitor(...), and those patches are
> ready if you are willing to accept them and the 3 power patches.
> 
> For the sake of consistency, I see the benefit of using the power patch, even if it is 
> slightly less efficient that the DLB-specific implementation that I currently have.
> We have already encountered an empty queue, so this is no longer fast path for the PMD.

I am really concerned that the API in EAL is not the most efficient.
Why is that? Can we improve the EAL API?
  
Timothy McDaniel Oct. 30, 2020, 4:02 p.m. UTC | #13
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, October 30, 2020 10:48 AM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> Cc: 'dev@dpdk.org' <dev@dpdk.org>; Carrillo, Erik G
> <erik.g.carrillo@intel.com>; Eads, Gage <gage.eads@intel.com>; Van Haaren,
> Harry <harry.van.haaren@intel.com>; 'jerinj@marvell.com'
> <jerinj@marvell.com>; 'david.marchand@redhat.com'
> <david.marchand@redhat.com>
> Subject: Re: [PATCH v5 00/23] Add DLB2 PMD
> 
> 30/10/2020 16:35, McDaniel, Timothy:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 30/10/2020 12:58, McDaniel, Timothy:
> > > > From: McDaniel, Timothy
> > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > 30/10/2020 10:43, Timothy McDaniel:
> > > > > > > - note that the code still uses its private byte-encoded versions of
> > > > > > >   umonitor/umwait, rather than the new functions in the power
> > > > > > >   patch that are built on top of those intrinsics. This is intentional.
> > > > > >
> > > > > > Why? Now these intrinsics are available in the main branch.
> > > > > > We should avoid duplicating such code.
> > > > > >
> > > > > >
> > > > >
> > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split
> out so
> > > > > that DLB/DLB2 could use them instead of its own private byte-encoded
> > > versions,
> > > > > but instead we have these wrappers that call the low level intrinsics.
> Those
> > > > > wrappers
> > > > > introduce additional overhead that is not required for DLB/DLB2. I have a
> > > > > meeting with Ma Liang on Monday to discuss.
> > > >
> > > > I thought the ask of DLB was to just substitute the low level
> umwait/umonitor
> > > byte
> > > > encoded instructions DLB has defined privately with similar byte-encoded
> > > instructions defined in the power
> > > > patch. The power patch does not directly expose those, which is why I did
> not
> > > update DLB/DLB2.
> > > > The power patch does have the advantage of centralizing the race
> avoidance
> > > > logic, which is a good thing for any PMD that wishes to take advantage of
> > > umwait/umonitor.
> > >
> > > So you mean the overhead is a good thing?
> > >
> > > > Sorry for the confusion. I just misunderstood what was being asked of DLB
> in
> > > regard to switching over..  That being said,
> > > > I am willing to convert DLB/DLB2 to use  rte_power_monitor(...) in a future
> > > patch-set.
> > >
> > > Why not now?
> > >
> > > Indeed there is a confusion and it looks like a lot of novlang
> > > to exit from the situation.
> > > We'll wait a clear decision with facts.
> > >
> >
> > Hi Thomas,
> >
> > I have updated DLB and DLB2 to use rte_power_monitor(...), and those
> patches are
> > ready if you are willing to accept them and the 3 power patches.
> >
> > For the sake of consistency, I see the benefit of using the power patch, even if
> it is
> > slightly less efficient that the DLB-specific implementation that I currently
> have.
> > We have already encountered an empty queue, so this is no longer fast path
> for the PMD.
> 
> I am really concerned that the API in EAL is not the most efficient.
> Why is that? Can we improve the EAL API?
> 

From an efficiency perspective, I only noticed 2 things.
1) The size check and associated logic. This could be avoided if we had _8, _16, _32, _64 variants instead of 1 function that handled all possibilities, but even that may be a wash
with branch prediction 
2) The spinlock - but this observation was my mistake, and was flawed. I was looking at the rte_power_monitor_sync(...), and not rte_power_monitor(...), the latter of which does not take a spinlock.

In summary, I am no longer concerned about efficiency and suitability for DLB/DLB2.

Thanks,
Tim
  
Van Haaren, Harry Oct. 30, 2020, 4:08 p.m. UTC | #14
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, October 30, 2020 3:31 PM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads,
> Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>;
> Jerin Jacob <jerinj@marvell.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD
> 
> + @Richardson, Bruce  @Ananyev, Konstantin
> 
> On Fri, Oct 30, 2020 at 8:55 PM McDaniel, Timothy
> <timothy.mcdaniel@intel.com> wrote:
> >

<snip backlog and compiler error output>

> > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF
> > > drivers/lib
> > > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o
> > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c
> > > ../drivers/event/dlb2/pf/dlb2_pf.c
> > > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35:
> > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> > > unknown builtin '__builtin_ia32_movntdq'
> > > [-Wimplicit-function-declaration]
> > >         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
> >
> > Not sure why this builds for me, but I do not see this error.
> 
> May be you can try with
> [for-main][dpdk-next-eventdev] $ clang -v
> clang version 10.0.1
> Target: x86_64-pc-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> 
> 
> > According to information online, '__builtin_ia32_movntdq' should be available if -
> msse2 is set.
> > According to the following snippet from config/x86/meson.build, it looks like
> msse4 is defined.
> > <snippet>
> > # we require SSE4.2 for DPDK
> > if cc.get_define('__SSE4_2__', args: machine_args) == ''
> >         message('SSE 4.2 not enabled by default, explicitly enabling')
> >         machine_args += '-msse4'
> > endif
> > <end snippet>
> >
> > I realize this is from clang, not gcc, but why are they out of sync?
> > such that __builtin_ia32_movntdq is not available in clang, but is available in gcc.
> >
> > Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ?
> >
> > Any guidance on how to get past this would be greatly appreciated.

Confirm that __builtin_ia32_* works on GCC and not on Clang, reproducing error above.
Using the intrinsic _mm_stream or _mm_store versions work with both Clang and GCC, and will fix this issue.
  
Timothy McDaniel Oct. 30, 2020, 4:13 p.m. UTC | #15
> -----Original Message-----
> From: Van Haaren, Harry <harry.van.haaren@intel.com>
> Sent: Friday, October 30, 2020 11:09 AM
> To: Jerin Jacob <jerinjacobk@gmail.com>; McDaniel, Timothy
> <timothy.mcdaniel@intel.com>
> Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads,
> Gage <gage.eads@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas
> Monjalon <thomas@monjalon.net>
> Subject: RE: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD
> 
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, October 30, 2020 3:31 PM
> > To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>;
> Eads,
> > Gage <gage.eads@intel.com>; Van Haaren, Harry
> <harry.van.haaren@intel.com>;
> > Jerin Jacob <jerinj@marvell.com>; Thomas Monjalon
> <thomas@monjalon.net>
> > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD
> >
> > + @Richardson, Bruce  @Ananyev, Konstantin
> >
> > On Fri, Oct 30, 2020 at 8:55 PM McDaniel, Timothy
> > <timothy.mcdaniel@intel.com> wrote:
> > >
> 
> <snip backlog and compiler error output>
> 
> > > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native
> > > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ
> > > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF
> > > > drivers/lib
> > > > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o
> > > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c
> > > > ../drivers/event/dlb2/pf/dlb2_pf.c
> > > > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35:
> > > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of
> > > > unknown builtin '__builtin_ia32_movntdq'
> > > > [-Wimplicit-function-declaration]
> > > >         __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
> > >
> > > Not sure why this builds for me, but I do not see this error.
> >
> > May be you can try with
> > [for-main][dpdk-next-eventdev] $ clang -v
> > clang version 10.0.1
> > Target: x86_64-pc-linux-gnu
> > Thread model: posix
> > InstalledDir: /usr/bin
> >
> >
> > > According to information online, '__builtin_ia32_movntdq' should be
> available if -
> > msse2 is set.
> > > According to the following snippet from config/x86/meson.build, it looks like
> > msse4 is defined.
> > > <snippet>
> > > # we require SSE4.2 for DPDK
> > > if cc.get_define('__SSE4_2__', args: machine_args) == ''
> > >         message('SSE 4.2 not enabled by default, explicitly enabling')
> > >         machine_args += '-msse4'
> > > endif
> > > <end snippet>
> > >
> > > I realize this is from clang, not gcc, but why are they out of sync?
> > > such that __builtin_ia32_movntdq is not available in clang, but is available in
> gcc.
> > >
> > > Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ?
> > >
> > > Any guidance on how to get past this would be greatly appreciated.
> 
> Confirm that __builtin_ia32_* works on GCC and not on Clang, reproducing
> error above.
> Using the intrinsic _mm_stream or _mm_store versions work with both Clang
> and GCC, and will fix this issue.

Thanks Harry.
I will convert to _mm_stream_si128.
  
Thomas Monjalon Oct. 30, 2020, 4:42 p.m. UTC | #16
30/10/2020 17:02, McDaniel, Timothy:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 30/10/2020 16:35, McDaniel, Timothy:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 30/10/2020 12:58, McDaniel, Timothy:
> > > > > From: McDaniel, Timothy
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > 30/10/2020 10:43, Timothy McDaniel:
> > > > > > > > - note that the code still uses its private byte-encoded versions of
> > > > > > > >   umonitor/umwait, rather than the new functions in the power
> > > > > > > >   patch that are built on top of those intrinsics. This is intentional.
> > > > > > >
> > > > > > > Why? Now these intrinsics are available in the main branch.
> > > > > > > We should avoid duplicating such code.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split
> > out so
> > > > > > that DLB/DLB2 could use them instead of its own private byte-encoded
> > > > versions,
> > > > > > but instead we have these wrappers that call the low level intrinsics.
> > Those
> > > > > > wrappers
> > > > > > introduce additional overhead that is not required for DLB/DLB2. I have a
> > > > > > meeting with Ma Liang on Monday to discuss.
> > > > >
> > > > > I thought the ask of DLB was to just substitute the low level
> > umwait/umonitor
> > > > byte
> > > > > encoded instructions DLB has defined privately with similar byte-encoded
> > > > instructions defined in the power
> > > > > patch. The power patch does not directly expose those, which is why I did
> > not
> > > > update DLB/DLB2.
> > > > > The power patch does have the advantage of centralizing the race
> > avoidance
> > > > > logic, which is a good thing for any PMD that wishes to take advantage of
> > > > umwait/umonitor.
> > > >
> > > > So you mean the overhead is a good thing?
> > > >
> > > > > Sorry for the confusion. I just misunderstood what was being asked of DLB
> > in
> > > > regard to switching over..  That being said,
> > > > > I am willing to convert DLB/DLB2 to use  rte_power_monitor(...) in a future
> > > > patch-set.
> > > >
> > > > Why not now?
> > > >
> > > > Indeed there is a confusion and it looks like a lot of novlang
> > > > to exit from the situation.
> > > > We'll wait a clear decision with facts.
> > > >
> > >
> > > Hi Thomas,
> > >
> > > I have updated DLB and DLB2 to use rte_power_monitor(...), and those
> > patches are
> > > ready if you are willing to accept them and the 3 power patches.
> > >
> > > For the sake of consistency, I see the benefit of using the power patch, even if
> > it is
> > > slightly less efficient that the DLB-specific implementation that I currently
> > have.
> > > We have already encountered an empty queue, so this is no longer fast path
> > for the PMD.
> > 
> > I am really concerned that the API in EAL is not the most efficient.
> > Why is that? Can we improve the EAL API?
> > 
> 
> From an efficiency perspective, I only noticed 2 things.
> 1) The size check and associated logic. This could be avoided if we had _8, _16, _32, _64 variants instead of 1 function that handled all possibilities, but even that may be a wash
> with branch prediction 
> 2) The spinlock - but this observation was my mistake, and was flawed. I was looking at the rte_power_monitor_sync(...), and not rte_power_monitor(...), the latter of which does not take a spinlock.
> 
> In summary, I am no longer concerned about efficiency and suitability for DLB/DLB2.

OK, so let's go ahead with a DLB PMD using this API.