Message ID | 1604051021-26699-1-git-send-email-timothy.mcdaniel@intel.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id D4122A04DF; Fri, 30 Oct 2020 10:49:16 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 89BDBC804; Fri, 30 Oct 2020 10:42:00 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id ADD3CBE45 for <dev@dpdk.org>; Fri, 30 Oct 2020 10:41:54 +0100 (CET) IronPort-SDR: 3sDc0D+bZ4ZkW4WXMKXZPEfvLS9xx2UZH0+scBhTGGTR3i3Zb1njS0nut3bomoV9jibA7fBWhA 952d2n51WOXA== X-IronPort-AV: E=McAfee;i="6000,8403,9789"; a="148439311" X-IronPort-AV: E=Sophos;i="5.77,432,1596524400"; d="scan'208";a="148439311" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 02:41:52 -0700 IronPort-SDR: hMDpdVdOASfJHSNhSx2YsnIL3hOH1BoveeZCoyuU+lvnrWxyxmQs+Va7/KSYzQPwSeDMbMmaDc NXrGrj3WeKYA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,432,1596524400"; d="scan'208";a="351808591" Received: from txasoft-yocto.an.intel.com ([10.123.72.192]) by fmsmga004.fm.intel.com with ESMTP; 30 Oct 2020 02:41:52 -0700 From: Timothy McDaniel <timothy.mcdaniel@intel.com> To: Cc: dev@dpdk.org, erik.g.carrillo@intel.com, gage.eads@intel.com, harry.van.haaren@intel.com, jerinj@marvell.com, thomas@monjalon.net Date: Fri, 30 Oct 2020 04:43:17 -0500 Message-Id: <1604051021-26699-1-git-send-email-timothy.mcdaniel@intel.com> X-Mailer: git-send-email 1.7.10 In-Reply-To: <1602958879-8558-2-git-send-email-timothy.mcdaniel@intel.com> References: <1602958879-8558-2-git-send-email-timothy.mcdaniel@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Subject: [dpdk-dev] =?utf-8?q?=5BPATCH_v5_00/23=5D_Add_DLB2_PMD?= X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Series |
Add DLB2 PMD
|
|
Message
Timothy McDaniel
Oct. 30, 2020, 9:43 a.m. UTC
The following patch series adds support for a new eventdev PMD. The DLB2 PMD adds support for the Intel Dynamic Load Balancer 2.0 (DLB2) hardware. The DLB2 is a PCIe device that provides load-balanced, prioritized scheduling of core-to-core communication. The device consists of queues and arbiters that connect producer and consumer cores, and implements load-balanced queueing features including: - Lock-free multi-producer/multi-consumer operation. - Multiple priority levels for varying traffic types. - 'Direct' traffic (i.e. multi-producer/single-consumer) - Simple unordered load-balanced distribution. - Atomic lock-free load balancing across multiple consumers. - Queue element reordering feature allowing ordered load-balanced distribution. The DLB2 hardware supports both load balanced and directed ports and queues. Unlike other eventdev devices already in the repo, not all DLB2 ports and queues are equally capable. In particular, directed ports are limited to a single link, and must be connected to a directed queue. Additionally, even though LDB ports may link multiple queues, the number of queues that may be linked is limited by hardware. While reviewing the code, please be aware that this PMD has full control over the DLB2 hardware. Intel will be extending the DLB2 PMD in the future (not as part of this first series) with a mode that we refer to as the bifurcated PMD. The bifurcated PMD communicates with a kernel driver to configure the device, ports, and queues, and memory maps device MMIO so datapath operations occur purely in user-space. Note that the DLB2 hardware is a successor of the DLB hardware, and as such is structured similarly, both in terms of code layout and implementation. The framework to support both the PF PMD and bifurcated PMD exists in this patchset, and is why the iface.[ch] layer is present. Major changes since V4: ================ - moved introduction of dlb in relnotes_20_11 to first patch in series - fixed underlines in dlb.rst that were too short - note that the code still uses its private byte-encoded versions of umonitor/umwait, rather than the new functions in the power patch that are built on top of those intrinsics. This is intentional. Changes since V3: ================ - updated MAINTAINERS file to alphabetically insert DLB - don't create RTE_ symbols in pmd - converted to use version.map scheme - converted to use .._master_lcore instead of .._main_lcore - this patch set is based on dpdk-next-eventdev Changes since V2: ================ - fixed meson conditional build. Moved test into driver’s meson.build file instead of event/meson.build - documentation is populated as associated code is introduced - add log_register in add dynamic logging patch - rename RTE_xxx symbol(s) as DLB2_xxx - replaced function ptr enqueue_four with direct call to movdir64b - remove unused port_pages - broke up probe patch into 3 smaller patches for easier review - changed param order of movdir64b/movntdq to match intrinsics - added self to MAINTAINERS files - squashed announcement of availability into last patch in series - correct spelling errors and delete repeated words - DPDK_21.0 -> DPDK 21 in map file - add experimental banner to public structs and APIs Changes since V1: ================= - implement changes requested in code reviews by Gage Eads and Mike Chen - fix a memzone leak - convert to use eal rte-cpuflags patch from Liang Ma Known Issues: - the documentation contained in dlb2.rst is not complete, and should be updated to include command line parameters (class of service, etc ...). Depends-on: patch-82202 ("eventdev: increase MAX QUEUES PER DEV to 255") Depends-on: patch-79539 ("eal: add new x86 cpuid support for WAITPKG") Timothy McDaniel (23): event/dlb2: add documentation and meson build infrastructure event/dlb2: add dynamic logging event/dlb2: add private data structures and constants event/dlb2: add definitions shared with LKM or shared code event/dlb2: add inline functions event/dlb2: add eventdev probe event/dlb2: add flexible interface event/dlb2: add probe-time hardware init event/dlb2: add xstats event/dlb2: add infos get and configure event/dlb2: add queue and port default conf event/dlb2: add queue setup event/dlb2: add port setup event/dlb2: add port link event/dlb2: add port unlink and port unlinks in progress event/dlb2: add eventdev start event/dlb2: add enqueue and its burst variants event/dlb2: add dequeue and its burst variants event/dlb2: add eventdev stop and close event/dlb2: add PMD's token pop public interface event/dlb2: add PMD self-tests event/dlb2: add queue and port release event/dlb2: add timeout ticks entry point MAINTAINERS | 6 +- app/test/test_eventdev.c | 7 + config/rte_config.h | 7 + doc/api/doxy-api-index.md | 1 + doc/guides/eventdevs/dlb2.rst | 365 ++ doc/guides/eventdevs/index.rst | 1 + doc/guides/rel_notes/release_20_11.rst | 5 + drivers/event/dlb2/dlb2.c | 3951 ++++++++++++++++ drivers/event/dlb2/dlb2_iface.c | 74 + drivers/event/dlb2/dlb2_iface.h | 74 + drivers/event/dlb2/dlb2_inline_fns.h | 61 + drivers/event/dlb2/dlb2_log.h | 25 + drivers/event/dlb2/dlb2_priv.h | 578 +++ drivers/event/dlb2/dlb2_selftest.c | 1570 ++++++ drivers/event/dlb2/dlb2_user.h | 679 +++ drivers/event/dlb2/dlb2_xstats.c | 1240 +++++ drivers/event/dlb2/meson.build | 22 + drivers/event/dlb2/pf/base/dlb2_hw_types.h | 367 ++ drivers/event/dlb2/pf/base/dlb2_mbox.h | 596 +++ drivers/event/dlb2/pf/base/dlb2_osdep.h | 230 + drivers/event/dlb2/pf/base/dlb2_osdep_bitmap.h | 440 ++ drivers/event/dlb2/pf/base/dlb2_osdep_list.h | 131 + drivers/event/dlb2/pf/base/dlb2_osdep_types.h | 31 + drivers/event/dlb2/pf/base/dlb2_regs.h | 2527 ++++++++++ drivers/event/dlb2/pf/base/dlb2_resource.c | 6027 ++++++++++++++++++++++++ drivers/event/dlb2/pf/base/dlb2_resource.h | 1913 ++++++++ drivers/event/dlb2/pf/dlb2_main.c | 673 +++ drivers/event/dlb2/pf/dlb2_main.h | 97 + drivers/event/dlb2/pf/dlb2_pf.c | 728 +++ drivers/event/dlb2/rte_pmd_dlb2.c | 39 + drivers/event/dlb2/rte_pmd_dlb2.h | 72 + drivers/event/dlb2/version.map | 9 + drivers/event/meson.build | 3 +- 33 files changed, 22547 insertions(+), 2 deletions(-) create mode 100644 doc/guides/eventdevs/dlb2.rst create mode 100644 drivers/event/dlb2/dlb2.c create mode 100644 drivers/event/dlb2/dlb2_iface.c create mode 100644 drivers/event/dlb2/dlb2_iface.h create mode 100644 drivers/event/dlb2/dlb2_inline_fns.h create mode 100644 drivers/event/dlb2/dlb2_log.h create mode 100644 drivers/event/dlb2/dlb2_priv.h create mode 100644 drivers/event/dlb2/dlb2_selftest.c create mode 100644 drivers/event/dlb2/dlb2_user.h create mode 100644 drivers/event/dlb2/dlb2_xstats.c create mode 100644 drivers/event/dlb2/meson.build create mode 100644 drivers/event/dlb2/pf/base/dlb2_hw_types.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_mbox.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep_bitmap.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep_list.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_osdep_types.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_regs.h create mode 100644 drivers/event/dlb2/pf/base/dlb2_resource.c create mode 100644 drivers/event/dlb2/pf/base/dlb2_resource.h create mode 100644 drivers/event/dlb2/pf/dlb2_main.c create mode 100644 drivers/event/dlb2/pf/dlb2_main.h create mode 100644 drivers/event/dlb2/pf/dlb2_pf.c create mode 100644 drivers/event/dlb2/rte_pmd_dlb2.c create mode 100644 drivers/event/dlb2/rte_pmd_dlb2.h create mode 100644 drivers/event/dlb2/version.map
Comments
30/10/2020 10:43, Timothy McDaniel: > - note that the code still uses its private byte-encoded versions of > umonitor/umwait, rather than the new functions in the power > patch that are built on top of those intrinsics. This is intentional. Why? Now these intrinsics are available in the main branch. We should avoid duplicating such code.
> -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Friday, October 30, 2020 5:02 AM > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > Cc: dev@dpdk.org; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, Gage > <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; > jerinj@marvell.com; david.marchand@redhat.com > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD > > 30/10/2020 10:43, Timothy McDaniel: > > - note that the code still uses its private byte-encoded versions of > > umonitor/umwait, rather than the new functions in the power > > patch that are built on top of those intrinsics. This is intentional. > > Why? Now these intrinsics are available in the main branch. > We should avoid duplicating such code. > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so that DLB/DLB2 could use them instead of its own private byte-encoded versions, but instead we have these wrappers that call the low level intrinsics. Those wrappers introduce additional overhead that is not required for DLB/DLB2. I have a meeting with Ma Liang on Monday to discuss.
On Fri, Oct 30, 2020 at 3:46 PM McDaniel, Timothy <timothy.mcdaniel@intel.com> wrote: > > > -----Original Message----- > > From: Thomas Monjalon <thomas@monjalon.net> > > Sent: Friday, October 30, 2020 5:02 AM > > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > > Cc: dev@dpdk.org; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, Gage > > <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; > > jerinj@marvell.com; david.marchand@redhat.com > > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD > > > > 30/10/2020 10:43, Timothy McDaniel: > > > - note that the code still uses its private byte-encoded versions of > > > umonitor/umwait, rather than the new functions in the power > > > patch that are built on top of those intrinsics. This is intentional. > > > > Why? Now these intrinsics are available in the main branch. > > We should avoid duplicating such code. > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so that DLB/DLB2 could use them instead of its own private byte-encoded versions, but instead we have these wrappers that call the low level intrinsics. Those wrappers > introduce additional overhead that is not required for DLB/DLB2. I have a meeting with Ma Liang on Monday to discuss. Then why we merged the EAL patches? The all-purpose was to use this by other subsystems. If it is only for the power library then we should make specific to the power library. Thomas, Should I take this series in eventdev or I need to wait to sort out this?
30/10/2020 11:32, Jerin Jacob: > McDaniel, Timothy <timothy.mcdaniel@intel.com> wrote: > > From: Thomas Monjalon <thomas@monjalon.net> > > > 30/10/2020 10:43, Timothy McDaniel: > > > > - note that the code still uses its private byte-encoded versions of > > > > umonitor/umwait, rather than the new functions in the power > > > > patch that are built on top of those intrinsics. This is intentional. > > > > > > Why? Now these intrinsics are available in the main branch. > > > We should avoid duplicating such code. > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) > > be split out so that DLB/DLB2 could use them instead > > of its own private byte-encoded versions, > > but instead we have these wrappers that call the low level > > intrinsics. Those wrappers introduce additional overhead > > that is not required for DLB/DLB2. > > I have a meeting with Ma Liang on Monday to discuss. Why did not you tell it on the mailing list? It would have prevented from merging a wrong/useless API. I am now convinced that the hard push to get those intrinsics which started with a lack of communication (no roadmap, no Cc, no reply) is really a bad story in the community process. > Then why we merged the EAL patches? The all-purpose was to use this by > other subsystems. If it is only for the power library then we should > make specific to the power library. I agree with you Jerin. > Thomas, Should I take this series in eventdev > or I need to wait to sort out this? I think you could merge those patches and I revert the EAL ones. In any case, I won't merge any more patch about power monitor in this release. I don't like swimming in the fog.
> -----Original Message----- > From: McDaniel, Timothy > Sent: Friday, October 30, 2020 5:17 AM > To: Thomas Monjalon <thomas@monjalon.net> > Cc: dev@dpdk.org; Carrillo, Erik G <Erik.G.Carrillo@intel.com>; Eads, Gage > <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; > jerinj@marvell.com; david.marchand@redhat.com > Subject: RE: [PATCH v5 00/23] Add DLB2 PMD > > > -----Original Message----- > > From: Thomas Monjalon <thomas@monjalon.net> > > Sent: Friday, October 30, 2020 5:02 AM > > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > > Cc: dev@dpdk.org; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, Gage > > <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; > > jerinj@marvell.com; david.marchand@redhat.com > > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD > > > > 30/10/2020 10:43, Timothy McDaniel: > > > - note that the code still uses its private byte-encoded versions of > > > umonitor/umwait, rather than the new functions in the power > > > patch that are built on top of those intrinsics. This is intentional. > > > > Why? Now these intrinsics are available in the main branch. > > We should avoid duplicating such code. > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so > that DLB/DLB2 could use them instead of its own private byte-encoded versions, > but instead we have these wrappers that call the low level intrinsics. Those > wrappers > introduce additional overhead that is not required for DLB/DLB2. I have a > meeting with Ma Liang on Monday to discuss. I thought the ask of DLB was to just substitute the low level umwait/umonitor byte encoded instructions DLB has defined privately with similar byte-encoded instructions defined in the power patch. The power patch does not directly expose those, which is why I did not update DLB/DLB2. The power patch does have the advantage of centralizing the race avoidance logic, which is a good thing for any PMD that wishes to take advantage of umwait/umonitor. Sorry for the confusion. I just misunderstood what was being asked of DLB in regard to switching over.. That being said, I am willing to convert DLB/DLB2 to use rte_power_monitor(...) in a future patch-set. Thanks, Tim
30/10/2020 12:58, McDaniel, Timothy: > From: McDaniel, Timothy > > From: Thomas Monjalon <thomas@monjalon.net> > > > 30/10/2020 10:43, Timothy McDaniel: > > > > - note that the code still uses its private byte-encoded versions of > > > > umonitor/umwait, rather than the new functions in the power > > > > patch that are built on top of those intrinsics. This is intentional. > > > > > > Why? Now these intrinsics are available in the main branch. > > > We should avoid duplicating such code. > > > > > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so > > that DLB/DLB2 could use them instead of its own private byte-encoded versions, > > but instead we have these wrappers that call the low level intrinsics. Those > > wrappers > > introduce additional overhead that is not required for DLB/DLB2. I have a > > meeting with Ma Liang on Monday to discuss. > > I thought the ask of DLB was to just substitute the low level umwait/umonitor byte > encoded instructions DLB has defined privately with similar byte-encoded instructions defined in the power > patch. The power patch does not directly expose those, which is why I did not update DLB/DLB2. > The power patch does have the advantage of centralizing the race avoidance > logic, which is a good thing for any PMD that wishes to take advantage of umwait/umonitor. So you mean the overhead is a good thing? > Sorry for the confusion. I just misunderstood what was being asked of DLB in regard to switching over.. That being said, > I am willing to convert DLB/DLB2 to use rte_power_monitor(...) in a future patch-set. Why not now? Indeed there is a confusion and it looks like a lot of novlang to exit from the situation. We'll wait a clear decision with facts.
On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel <timothy.mcdaniel@intel.com> wrote: > > Timothy McDaniel (23): > event/dlb2: add documentation and meson build infrastructure > event/dlb2: add dynamic logging > event/dlb2: add private data structures and constants > event/dlb2: add definitions shared with LKM or shared code > event/dlb2: add inline functions > event/dlb2: add eventdev probe There is build error with clang and static build here. Please send the next version with fix. meson -Dexamples=l3fwd --buildtype=debugoptimized --werror --default-library=static /export/dpdk-next-eventdev/devtools/.. ./build-clang-static The Meson build system Version: 0.55.3 Source dir: /export/dpdk-next-eventdev Build dir: /export/dpdk-next-eventdev/build-clang-static Build type: native build Program cat found: YES Using 'PKG_CONFIG_PATH' from environment with value: '' Using 'PKG_CONFIG_PATH' from environment with value: '' Project name: DPDK Project version: 20.11.0-rc1 Using 'CC' from environment with value: 'ccache clang' Using 'CFLAGS' from environment with value: '' Using 'LDFLAGS' from environment with value: '' Using 'CPPFLAGS' from environment with value: '' Using 'CC' from environment with value: 'ccache clang' Using 'CFLAGS' from environment with value: '' Using 'LDFLAGS' from environment with value: '' Using 'CPPFLAGS' from environment with value: '' C compiler for the host machine: ccache clang (clang 10.0.1 "clang version 10.0.1 ") C linker for the host machine: clang ld.bfd 2.35.1 Host machine cpu family: x86_64 Host machine cpu: x86_64 Program pkg-config found: YES Program gen-pmdinfo-cfile.sh found: YES Program list-dir-globs.py found: YES Program check-symbols.sh found: YES Program options-ibverbs-static.sh found: YES Program binutils-avx512-check.sh found: YES Program python3 found: YES (/usr/bin/python) ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i nclude -Ilib/librte_eal/linux/include -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include -I../lib/librte_eal/x86/include -Ilib/librte_eal/common -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal -Ilib/librte_kvargs -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev -Ilib/librte_net -I../lib/librte_net -Il ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - Wwrite-strings -Wno-address-of-packed-member -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF drivers/l ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c ../drivers/event/dlb2/pf/dlb2_main.c In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22: ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of unknown builtin '__builtin_ia32_movntdq' [-Wimplicit-function-declaration] __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); ^ ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean '__builtin_ia32_movntq'? /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note: '__builtin_ia32_movntq' declared here __builtin_ia32_movntq(__p, __a); [2010/2491] Compiling C object drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i nclude -Ilib/librte_eal/linux/include -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include -I../lib/librte_eal/x86/include -Ilib/librte_eal/common -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal -Ilib/librte_kvargs -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev -Ilib/librte_net -I../lib/librte_net -Il ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - Wwrite-strings -Wno-address-of-packed-member -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF drivers/lib tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c ../drivers/event/dlb2/pf/dlb2_pf.c In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35: ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of unknown builtin '__builtin_ia32_movntdq' [-Wimplicit-function-declaration] __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0);
> -----Original Message----- > From: Jerin Jacob <jerinjacobk@gmail.com> > Sent: Friday, October 30, 2020 9:22 AM > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, > Gage <gage.eads@intel.com>; Van Haaren, Harry > <harry.van.haaren@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas > Monjalon <thomas@monjalon.net> > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD > > On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel > <timothy.mcdaniel@intel.com> wrote: > > > > > Timothy McDaniel (23): > > event/dlb2: add documentation and meson build infrastructure > > event/dlb2: add dynamic logging > > event/dlb2: add private data structures and constants > > event/dlb2: add definitions shared with LKM or shared code > > event/dlb2: add inline functions > > event/dlb2: add eventdev probe > > There is build error with clang and static build here. > Please send the next version with fix. > > meson -Dexamples=l3fwd --buildtype=debugoptimized --werror > --default-library=static /export/dpdk-next-eventdev/devtools/.. > ./build-clang-static > The Meson build system > Version: 0.55.3 > Source dir: /export/dpdk-next-eventdev > Build dir: /export/dpdk-next-eventdev/build-clang-static > Build type: native build > Program cat found: YES > Using 'PKG_CONFIG_PATH' from environment with value: '' > Using 'PKG_CONFIG_PATH' from environment with value: '' > Project name: DPDK > Project version: 20.11.0-rc1 > Using 'CC' from environment with value: 'ccache clang' > Using 'CFLAGS' from environment with value: '' > Using 'LDFLAGS' from environment with value: '' > Using 'CPPFLAGS' from environment with value: '' > Using 'CC' from environment with value: 'ccache clang' > Using 'CFLAGS' from environment with value: '' > Using 'LDFLAGS' from environment with value: '' > Using 'CPPFLAGS' from environment with value: '' > C compiler for the host machine: ccache clang (clang 10.0.1 "clang > version 10.0.1 ") > C linker for the host machine: clang ld.bfd 2.35.1 > Host machine cpu family: x86_64 > Host machine cpu: x86_64 > Program pkg-config found: YES > Program gen-pmdinfo-cfile.sh found: YES > Program list-dir-globs.py found: YES > Program check-symbols.sh found: YES > Program options-ibverbs-static.sh found: YES > Program binutils-avx512-check.sh found: YES > Program python3 found: YES (/usr/bin/python) > > > > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i > nclude -Ilib/librte_eal/linux/include > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal > -Ilib/librte_kvargs > -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev > -Ilib/librte_net -I../lib/librte_net -Il > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated > -Wformat-nonliteral -Wformat-security -Wmissing-declarations > -Wmissing-prototypes -Wnested-externs -Wold-style-definition > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - > Wwrite-strings -Wno-address-of-packed-member > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF > drivers/l > ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c > ../drivers/event/dlb2/pf/dlb2_main.c > In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22: > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > unknown builtin '__builtin_ia32_movntdq' > [-Wimplicit-function-declaration] > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > ^ > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean > '__builtin_ia32_movntq'? > /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note: > '__builtin_ia32_movntq' declared here > __builtin_ia32_movntq(__p, __a); > > [2010/2491] Compiling C object > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o > FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i > nclude -Ilib/librte_eal/linux/include > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal > -Ilib/librte_kvargs > -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev > -Ilib/librte_net -I../lib/librte_net -Il > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated > -Wformat-nonliteral -Wformat-security -Wmissing-declarations > -Wmissing-prototypes -Wnested-externs -Wold-style-definition > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - > Wwrite-strings -Wno-address-of-packed-member > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF > drivers/lib > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c > ../drivers/event/dlb2/pf/dlb2_pf.c > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35: > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > unknown builtin '__builtin_ia32_movntdq' > [-Wimplicit-function-declaration] > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); Not sure why this builds for me, but I do not see this error. According to information online, '__builtin_ia32_movntdq' should be available if -msse2 is set. According to the following snippet from config/x86/meson.build, it looks like msse4 is defined. <snippet> # we require SSE4.2 for DPDK if cc.get_define('__SSE4_2__', args: machine_args) == '' message('SSE 4.2 not enabled by default, explicitly enabling') machine_args += '-msse4' endif <end snippet> I realize this is from clang, not gcc, but why are they out of sync? such that __builtin_ia32_movntdq is not available in clang, but is available in gcc. Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ? Any guidance on how to get past this would be greatly appreciated. Thanks, Tim
+ @Richardson, Bruce @Ananyev, Konstantin On Fri, Oct 30, 2020 at 8:55 PM McDaniel, Timothy <timothy.mcdaniel@intel.com> wrote: > > > > > -----Original Message----- > > From: Jerin Jacob <jerinjacobk@gmail.com> > > Sent: Friday, October 30, 2020 9:22 AM > > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, > > Gage <gage.eads@intel.com>; Van Haaren, Harry > > <harry.van.haaren@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas > > Monjalon <thomas@monjalon.net> > > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD > > > > On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel > > <timothy.mcdaniel@intel.com> wrote: > > > > > > > > Timothy McDaniel (23): > > > event/dlb2: add documentation and meson build infrastructure > > > event/dlb2: add dynamic logging > > > event/dlb2: add private data structures and constants > > > event/dlb2: add definitions shared with LKM or shared code > > > event/dlb2: add inline functions > > > event/dlb2: add eventdev probe > > > > There is build error with clang and static build here. > > Please send the next version with fix. > > > > meson -Dexamples=l3fwd --buildtype=debugoptimized --werror > > --default-library=static /export/dpdk-next-eventdev/devtools/.. > > ./build-clang-static > > The Meson build system > > Version: 0.55.3 > > Source dir: /export/dpdk-next-eventdev > > Build dir: /export/dpdk-next-eventdev/build-clang-static > > Build type: native build > > Program cat found: YES > > Using 'PKG_CONFIG_PATH' from environment with value: '' > > Using 'PKG_CONFIG_PATH' from environment with value: '' > > Project name: DPDK > > Project version: 20.11.0-rc1 > > Using 'CC' from environment with value: 'ccache clang' > > Using 'CFLAGS' from environment with value: '' > > Using 'LDFLAGS' from environment with value: '' > > Using 'CPPFLAGS' from environment with value: '' > > Using 'CC' from environment with value: 'ccache clang' > > Using 'CFLAGS' from environment with value: '' > > Using 'LDFLAGS' from environment with value: '' > > Using 'CPPFLAGS' from environment with value: '' > > C compiler for the host machine: ccache clang (clang 10.0.1 "clang > > version 10.0.1 ") > > C linker for the host machine: clang ld.bfd 2.35.1 > > Host machine cpu family: x86_64 > > Host machine cpu: x86_64 > > Program pkg-config found: YES > > Program gen-pmdinfo-cfile.sh found: YES > > Program list-dir-globs.py found: YES > > Program check-symbols.sh found: YES > > Program options-ibverbs-static.sh found: YES > > Program binutils-avx512-check.sh found: YES > > Program python3 found: YES (/usr/bin/python) > > > > > > > > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers > > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 > > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig > > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i > > nclude -Ilib/librte_eal/linux/include > > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include > > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common > > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal > > -Ilib/librte_kvargs > > -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics > > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring > > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev > > -Ilib/librte_net -I../lib/librte_net -Il > > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool > > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter > > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu > > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - > > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci > > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci > > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe > > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 > > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated > > -Wformat-nonliteral -Wformat-security -Wmissing-declarations > > -Wmissing-prototypes -Wnested-externs -Wold-style-definition > > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - > > Wwrite-strings -Wno-address-of-packed-member > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF > > drivers/l > > ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c > > ../drivers/event/dlb2/pf/dlb2_main.c > > In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22: > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > > unknown builtin '__builtin_ia32_movntdq' > > [-Wimplicit-function-declaration] > > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > > ^ > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean > > '__builtin_ia32_movntq'? > > /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note: > > '__builtin_ia32_movntq' declared here > > __builtin_ia32_movntq(__p, __a); > > > > [2010/2491] Compiling C object > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o > > FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o > > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers > > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 > > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig > > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i > > nclude -Ilib/librte_eal/linux/include > > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include > > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common > > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal > > -Ilib/librte_kvargs > > -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics > > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring > > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev > > -Ilib/librte_net -I../lib/librte_net -Il > > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool > > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter > > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu > > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - > > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci > > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci > > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe > > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 > > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated > > -Wformat-nonliteral -Wformat-security -Wmissing-declarations > > -Wmissing-prototypes -Wnested-externs -Wold-style-definition > > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - > > Wwrite-strings -Wno-address-of-packed-member > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF > > drivers/lib > > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c > > ../drivers/event/dlb2/pf/dlb2_pf.c > > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35: > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > > unknown builtin '__builtin_ia32_movntdq' > > [-Wimplicit-function-declaration] > > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > > Not sure why this builds for me, but I do not see this error. May be you can try with [for-main][dpdk-next-eventdev] $ clang -v clang version 10.0.1 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin > According to information online, '__builtin_ia32_movntdq' should be available if -msse2 is set. > According to the following snippet from config/x86/meson.build, it looks like msse4 is defined. > <snippet> > # we require SSE4.2 for DPDK > if cc.get_define('__SSE4_2__', args: machine_args) == '' > message('SSE 4.2 not enabled by default, explicitly enabling') > machine_args += '-msse4' > endif > <end snippet> > > I realize this is from clang, not gcc, but why are they out of sync? > such that __builtin_ia32_movntdq is not available in clang, but is available in gcc. > > Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ? > > Any guidance on how to get past this would be greatly appreciated. Added @Richardson, Bruce @Ananyev, Konstantin I do see the same problem in DLB driver as well. > > Thanks, > Tim > > >
On Fri, Oct 30, 2020 at 4:25 PM McDaniel, Timothy <timothy.mcdaniel@intel.com> wrote: > > > > > -----Original Message----- > > From: Jerin Jacob <jerinjacobk@gmail.com> > > Sent: Friday, October 30, 2020 9:22 AM > > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, > > Gage <gage.eads@intel.com>; Van Haaren, Harry > > <harry.van.haaren@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas > > Monjalon <thomas@monjalon.net> > > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD > > > > On Fri, Oct 30, 2020 at 3:19 PM Timothy McDaniel > > <timothy.mcdaniel@intel.com> wrote: > > > > > > > > Timothy McDaniel (23): > > > event/dlb2: add documentation and meson build infrastructure > > > event/dlb2: add dynamic logging > > > event/dlb2: add private data structures and constants > > > event/dlb2: add definitions shared with LKM or shared code > > > event/dlb2: add inline functions > > > event/dlb2: add eventdev probe > > > > There is build error with clang and static build here. > > Please send the next version with fix. > > > > meson -Dexamples=l3fwd --buildtype=debugoptimized --werror > > --default-library=static /export/dpdk-next-eventdev/devtools/.. > > ./build-clang-static > > The Meson build system > > Version: 0.55.3 > > Source dir: /export/dpdk-next-eventdev > > Build dir: /export/dpdk-next-eventdev/build-clang-static > > Build type: native build > > Program cat found: YES > > Using 'PKG_CONFIG_PATH' from environment with value: '' > > Using 'PKG_CONFIG_PATH' from environment with value: '' > > Project name: DPDK > > Project version: 20.11.0-rc1 > > Using 'CC' from environment with value: 'ccache clang' > > Using 'CFLAGS' from environment with value: '' > > Using 'LDFLAGS' from environment with value: '' > > Using 'CPPFLAGS' from environment with value: '' > > Using 'CC' from environment with value: 'ccache clang' > > Using 'CFLAGS' from environment with value: '' > > Using 'LDFLAGS' from environment with value: '' > > Using 'CPPFLAGS' from environment with value: '' > > C compiler for the host machine: ccache clang (clang 10.0.1 "clang > > version 10.0.1 ") > > C linker for the host machine: clang ld.bfd 2.35.1 > > Host machine cpu family: x86_64 > > Host machine cpu: x86_64 > > Program pkg-config found: YES > > Program gen-pmdinfo-cfile.sh found: YES > > Program list-dir-globs.py found: YES > > Program check-symbols.sh found: YES > > Program options-ibverbs-static.sh found: YES > > Program binutils-avx512-check.sh found: YES > > Program python3 found: YES (/usr/bin/python) > > > > > > > > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers > > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 > > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig > > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i > > nclude -Ilib/librte_eal/linux/include > > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include > > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common > > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal > > -Ilib/librte_kvargs > > -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics > > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring > > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev > > -Ilib/librte_net -I../lib/librte_net -Il > > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool > > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter > > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu > > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - > > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci > > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci > > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe > > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 > > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated > > -Wformat-nonliteral -Wformat-security -Wmissing-declarations > > -Wmissing-prototypes -Wnested-externs -Wold-style-definition > > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - > > Wwrite-strings -Wno-address-of-packed-member > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -MF > > drivers/l > > ibtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o.d -o > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_main.c.o -c > > ../drivers/event/dlb2/pf/dlb2_main.c > > In file included from ../drivers/event/dlb2/pf/dlb2_main.c:22: > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > > unknown builtin '__builtin_ia32_movntdq' > > [-Wimplicit-function-declaration] > > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > > ^ > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: note: did you mean > > '__builtin_ia32_movntq'? > > /usr/lib/clang/10.0.1/include/xmmintrin.h:2122:3: note: > > '__builtin_ia32_movntq' declared here > > __builtin_ia32_movntq(__p, __a); > > > > [2010/2491] Compiling C object > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o > > FAILED: drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o > > ccache clang -Idrivers/libtmp_rte_event_dlb2.a.p -Idrivers > > -I../drivers -Idrivers/event/dlb2 -I../drivers/event/dlb2 > > -Ilib/librte_eventdev -I../lib/librte_eventdev -I. -I.. -Iconfig > > -I../config -Ilib/librte_eal/include -I../lib/librte_eal/i > > nclude -Ilib/librte_eal/linux/include > > -I../lib/librte_eal/linux/include -Ilib/librte_eal/x86/include > > -I../lib/librte_eal/x86/include -Ilib/librte_eal/common > > -I../lib/librte_eal/common -Ilib/librte_eal -I../lib/librte_eal > > -Ilib/librte_kvargs > > -I../lib/librte_kvargs -Ilib/librte_metrics -I../lib/librte_metrics > > -Ilib/librte_telemetry -I../lib/librte_telemetry -Ilib/librte_ring > > -I../lib/librte_ring -Ilib/librte_ethdev -I../lib/librte_ethdev > > -Ilib/librte_net -I../lib/librte_net -Il > > ib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool > > -I../lib/librte_mempool -Ilib/librte_meter -I../lib/librte_meter > > -Ilib/librte_hash -I../lib/librte_hash -Ilib/librte_rcu > > -I../lib/librte_rcu -Ilib/librte_timer -I../lib/librte_timer - > > Ilib/librte_cryptodev -I../lib/librte_cryptodev -Ilib/librte_pci > > -I../lib/librte_pci -Idrivers/bus/pci -I../drivers/bus/pci > > -I../drivers/bus/pci/linux -Xclang -fcolor-diagnostics -pipe > > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 > > -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated > > -Wformat-nonliteral -Wformat-security -Wmissing-declarations > > -Wmissing-prototypes -Wnested-externs -Wold-style-definition > > -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef - > > Wwrite-strings -Wno-address-of-packed-member > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF > > drivers/lib > > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c > > ../drivers/event/dlb2/pf/dlb2_pf.c > > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35: > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > > unknown builtin '__builtin_ia32_movntdq' > > [-Wimplicit-function-declaration] > > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > > Not sure why this builds for me, but I do not see this error. Reproduced the same error on fc31. $ clang --version clang version 9.0.1 (Fedora 9.0.1-2.fc31) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin $ ./devtools/test-meson-builds.sh ... FAILED: drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o ccache clang -Idrivers/a715181@@tmp_rte_event_dlb2@sta -Idrivers -I../../dpdk/drivers -Idrivers/event/dlb2 -I../../dpdk/drivers/event/dlb2 -Ilib/librte_eventdev -I../../dpdk/lib/librte_eventdev -I. -I../../dpdk/ -Iconfig -I../../dpdk/config -Ilib/librte_eal/include -I../../dpdk/lib/librte_eal/include -Ilib/librte_eal/linux/include -I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/x86/include -I../../dpdk/lib/librte_eal/x86/include -Ilib/librte_eal/common -I../../dpdk/lib/librte_eal/common -Ilib/librte_eal -I../../dpdk/lib/librte_eal -Ilib/librte_kvargs -I../../dpdk/lib/librte_kvargs -Ilib/librte_telemetry/../librte_metrics -I../../dpdk/lib/librte_telemetry/../librte_metrics -Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry -Ilib/librte_ring -I../../dpdk/lib/librte_ring -Ilib/librte_ethdev -I../../dpdk/lib/librte_ethdev -Ilib/librte_net -I../../dpdk/lib/librte_net -Ilib/librte_mbuf -I../../dpdk/lib/librte_mbuf -Ilib/librte_mempool -I../../dpdk/lib/librte_mempool -Ilib/librte_meter -I../../dpdk/lib/librte_meter -Ilib/librte_hash -I../../dpdk/lib/librte_hash -Ilib/librte_rcu -I../../dpdk/lib/librte_rcu -Ilib/librte_timer -I../../dpdk/lib/librte_timer -Ilib/librte_cryptodev -I../../dpdk/lib/librte_cryptodev -Ilib/librte_pci -I../../dpdk/lib/librte_pci -Idrivers/bus/pci -I../../dpdk/drivers/bus/pci -I../../dpdk/drivers/bus/pci/linux -I/home/dmarchan/intel-ipsec-mb/install/include -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-address-of-packed-member -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o' -MF 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o.d' -o 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o' -c ../../dpdk/drivers/event/dlb2/dlb2.c In file included from ../../dpdk/drivers/event/dlb2/dlb2.c:35: ../../dpdk/drivers/event/dlb2/dlb2_inline_fns.h:41:2: error: use of unknown builtin '__builtin_ia32_movntdq' [-Wimplicit-function-declaration] __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); ^ ../../dpdk/drivers/event/dlb2/dlb2_inline_fns.h:41:2: note: did you mean '__builtin_ia32_movntq'? /usr/lib64/clang/9.0.1/include/xmmintrin.h:2122:3: note: '__builtin_ia32_movntq' declared here __builtin_ia32_movntq(__p, __a); ^ 1 error generated. FAILED: drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o ccache clang -Idrivers/a715181@@tmp_rte_event_dlb2@sta -Idrivers -I../../dpdk/drivers -Idrivers/event/dlb2 -I../../dpdk/drivers/event/dlb2 -Ilib/librte_eventdev -I../../dpdk/lib/librte_eventdev -I. -I../../dpdk/ -Iconfig -I../../dpdk/config -Ilib/librte_eal/include -I../../dpdk/lib/librte_eal/include -Ilib/librte_eal/linux/include -I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/x86/include -I../../dpdk/lib/librte_eal/x86/include -Ilib/librte_eal/common -I../../dpdk/lib/librte_eal/common -Ilib/librte_eal -I../../dpdk/lib/librte_eal -Ilib/librte_kvargs -I../../dpdk/lib/librte_kvargs -Ilib/librte_telemetry/../librte_metrics -I../../dpdk/lib/librte_telemetry/../librte_metrics -Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry -Ilib/librte_ring -I../../dpdk/lib/librte_ring -Ilib/librte_ethdev -I../../dpdk/lib/librte_ethdev -Ilib/librte_net -I../../dpdk/lib/librte_net -Ilib/librte_mbuf -I../../dpdk/lib/librte_mbuf -Ilib/librte_mempool -I../../dpdk/lib/librte_mempool -Ilib/librte_meter -I../../dpdk/lib/librte_meter -Ilib/librte_hash -I../../dpdk/lib/librte_hash -Ilib/librte_rcu -I../../dpdk/lib/librte_rcu -Ilib/librte_timer -I../../dpdk/lib/librte_timer -Ilib/librte_cryptodev -I../../dpdk/lib/librte_cryptodev -Ilib/librte_pci -I../../dpdk/lib/librte_pci -Idrivers/bus/pci -I../../dpdk/drivers/bus/pci -I../../dpdk/drivers/bus/pci/linux -I/home/dmarchan/intel-ipsec-mb/install/include -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-address-of-packed-member -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o' -MF 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o.d' -o 'drivers/a715181@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2_selftest.c.o' -c ../../dpdk/drivers/event/dlb2/dlb2_selftest.c ../../dpdk/drivers/event/dlb2/dlb2_selftest.c:139:1: error: unused function 'create_ordered_qids' [-Werror,-Wunused-function] create_ordered_qids(struct test *t, int num_qids) ^ ../../dpdk/drivers/event/dlb2/dlb2_selftest.c:145:1: error: unused function 'create_unordered_qids' [-Werror,-Wunused-function] create_unordered_qids(struct test *t, int num_qids) ^ ../../dpdk/drivers/event/dlb2/dlb2_selftest.c:151:1: error: unused function 'create_directed_qids' [-Werror,-Wunused-function] create_directed_qids(struct test *t, int num_qids, const uint8_t ports[]) ^ 3 errors generated. I also see a different issue with RHEL 7 gcc: Found ninja-1.7.2 at /usr/bin/ninja-build [1908/2413] Compiling C object 'drivers/drivers@@tmp_rte_event_dlb2@sta/event_dlb2_dlb2.c.o'. ../drivers/event/dlb2/dlb2.c: In function ‘dlb2_hw_create_ldb_port’: ../drivers/event/dlb2/dlb2.c:1091:9: warning: missing braces around initializer [-Wmissing-braces] struct dlb2_create_ldb_port_args cfg = {0}; ^ ../drivers/event/dlb2/dlb2.c:1091:9: warning: (near initialization for ‘cfg.response’) [-Wmissing-braces] ../drivers/event/dlb2/dlb2.c: In function ‘dlb2_hw_create_dir_port’: ../drivers/event/dlb2/dlb2.c:1263:9: warning: missing braces around initializer [-Wmissing-braces] struct dlb2_create_dir_port_args cfg = {0}; ^ ../drivers/event/dlb2/dlb2.c:1263:9: warning: (near initialization for ‘cfg.response’) [-Wmissing-braces]
> -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Friday, October 30, 2020 8:15 AM > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > Cc: 'dev@dpdk.org' <dev@dpdk.org>; Carrillo, Erik G > <erik.g.carrillo@intel.com>; Eads, Gage <gage.eads@intel.com>; Van Haaren, > Harry <harry.van.haaren@intel.com>; 'jerinj@marvell.com' > <jerinj@marvell.com>; 'david.marchand@redhat.com' > <david.marchand@redhat.com> > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD > > 30/10/2020 12:58, McDaniel, Timothy: > > From: McDaniel, Timothy > > > From: Thomas Monjalon <thomas@monjalon.net> > > > > 30/10/2020 10:43, Timothy McDaniel: > > > > > - note that the code still uses its private byte-encoded versions of > > > > > umonitor/umwait, rather than the new functions in the power > > > > > patch that are built on top of those intrinsics. This is intentional. > > > > > > > > Why? Now these intrinsics are available in the main branch. > > > > We should avoid duplicating such code. > > > > > > > > > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so > > > that DLB/DLB2 could use them instead of its own private byte-encoded > versions, > > > but instead we have these wrappers that call the low level intrinsics. Those > > > wrappers > > > introduce additional overhead that is not required for DLB/DLB2. I have a > > > meeting with Ma Liang on Monday to discuss. > > > > I thought the ask of DLB was to just substitute the low level umwait/umonitor > byte > > encoded instructions DLB has defined privately with similar byte-encoded > instructions defined in the power > > patch. The power patch does not directly expose those, which is why I did not > update DLB/DLB2. > > The power patch does have the advantage of centralizing the race avoidance > > logic, which is a good thing for any PMD that wishes to take advantage of > umwait/umonitor. > > So you mean the overhead is a good thing? > > > Sorry for the confusion. I just misunderstood what was being asked of DLB in > regard to switching over.. That being said, > > I am willing to convert DLB/DLB2 to use rte_power_monitor(...) in a future > patch-set. > > Why not now? > > Indeed there is a confusion and it looks like a lot of novlang > to exit from the situation. > We'll wait a clear decision with facts. > Hi Thomas, I have updated DLB and DLB2 to use rte_power_monitor(...), and those patches are ready if you are willing to accept them and the 3 power patches. For the sake of consistency, I see the benefit of using the power patch, even if it is slightly less efficient that the DLB-specific implementation that I currently have. We have already encountered an empty queue, so this is no longer fast path for the PMD.
30/10/2020 16:35, McDaniel, Timothy: > From: Thomas Monjalon <thomas@monjalon.net> > > 30/10/2020 12:58, McDaniel, Timothy: > > > From: McDaniel, Timothy > > > > From: Thomas Monjalon <thomas@monjalon.net> > > > > > 30/10/2020 10:43, Timothy McDaniel: > > > > > > - note that the code still uses its private byte-encoded versions of > > > > > > umonitor/umwait, rather than the new functions in the power > > > > > > patch that are built on top of those intrinsics. This is intentional. > > > > > > > > > > Why? Now these intrinsics are available in the main branch. > > > > > We should avoid duplicating such code. > > > > > > > > > > > > > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split out so > > > > that DLB/DLB2 could use them instead of its own private byte-encoded > > versions, > > > > but instead we have these wrappers that call the low level intrinsics. Those > > > > wrappers > > > > introduce additional overhead that is not required for DLB/DLB2. I have a > > > > meeting with Ma Liang on Monday to discuss. > > > > > > I thought the ask of DLB was to just substitute the low level umwait/umonitor > > byte > > > encoded instructions DLB has defined privately with similar byte-encoded > > instructions defined in the power > > > patch. The power patch does not directly expose those, which is why I did not > > update DLB/DLB2. > > > The power patch does have the advantage of centralizing the race avoidance > > > logic, which is a good thing for any PMD that wishes to take advantage of > > umwait/umonitor. > > > > So you mean the overhead is a good thing? > > > > > Sorry for the confusion. I just misunderstood what was being asked of DLB in > > regard to switching over.. That being said, > > > I am willing to convert DLB/DLB2 to use rte_power_monitor(...) in a future > > patch-set. > > > > Why not now? > > > > Indeed there is a confusion and it looks like a lot of novlang > > to exit from the situation. > > We'll wait a clear decision with facts. > > > > Hi Thomas, > > I have updated DLB and DLB2 to use rte_power_monitor(...), and those patches are > ready if you are willing to accept them and the 3 power patches. > > For the sake of consistency, I see the benefit of using the power patch, even if it is > slightly less efficient that the DLB-specific implementation that I currently have. > We have already encountered an empty queue, so this is no longer fast path for the PMD. I am really concerned that the API in EAL is not the most efficient. Why is that? Can we improve the EAL API?
> -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Friday, October 30, 2020 10:48 AM > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > Cc: 'dev@dpdk.org' <dev@dpdk.org>; Carrillo, Erik G > <erik.g.carrillo@intel.com>; Eads, Gage <gage.eads@intel.com>; Van Haaren, > Harry <harry.van.haaren@intel.com>; 'jerinj@marvell.com' > <jerinj@marvell.com>; 'david.marchand@redhat.com' > <david.marchand@redhat.com> > Subject: Re: [PATCH v5 00/23] Add DLB2 PMD > > 30/10/2020 16:35, McDaniel, Timothy: > > From: Thomas Monjalon <thomas@monjalon.net> > > > 30/10/2020 12:58, McDaniel, Timothy: > > > > From: McDaniel, Timothy > > > > > From: Thomas Monjalon <thomas@monjalon.net> > > > > > > 30/10/2020 10:43, Timothy McDaniel: > > > > > > > - note that the code still uses its private byte-encoded versions of > > > > > > > umonitor/umwait, rather than the new functions in the power > > > > > > > patch that are built on top of those intrinsics. This is intentional. > > > > > > > > > > > > Why? Now these intrinsics are available in the main branch. > > > > > > We should avoid duplicating such code. > > > > > > > > > > > > > > > > > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split > out so > > > > > that DLB/DLB2 could use them instead of its own private byte-encoded > > > versions, > > > > > but instead we have these wrappers that call the low level intrinsics. > Those > > > > > wrappers > > > > > introduce additional overhead that is not required for DLB/DLB2. I have a > > > > > meeting with Ma Liang on Monday to discuss. > > > > > > > > I thought the ask of DLB was to just substitute the low level > umwait/umonitor > > > byte > > > > encoded instructions DLB has defined privately with similar byte-encoded > > > instructions defined in the power > > > > patch. The power patch does not directly expose those, which is why I did > not > > > update DLB/DLB2. > > > > The power patch does have the advantage of centralizing the race > avoidance > > > > logic, which is a good thing for any PMD that wishes to take advantage of > > > umwait/umonitor. > > > > > > So you mean the overhead is a good thing? > > > > > > > Sorry for the confusion. I just misunderstood what was being asked of DLB > in > > > regard to switching over.. That being said, > > > > I am willing to convert DLB/DLB2 to use rte_power_monitor(...) in a future > > > patch-set. > > > > > > Why not now? > > > > > > Indeed there is a confusion and it looks like a lot of novlang > > > to exit from the situation. > > > We'll wait a clear decision with facts. > > > > > > > Hi Thomas, > > > > I have updated DLB and DLB2 to use rte_power_monitor(...), and those > patches are > > ready if you are willing to accept them and the 3 power patches. > > > > For the sake of consistency, I see the benefit of using the power patch, even if > it is > > slightly less efficient that the DLB-specific implementation that I currently > have. > > We have already encountered an empty queue, so this is no longer fast path > for the PMD. > > I am really concerned that the API in EAL is not the most efficient. > Why is that? Can we improve the EAL API? > From an efficiency perspective, I only noticed 2 things. 1) The size check and associated logic. This could be avoided if we had _8, _16, _32, _64 variants instead of 1 function that handled all possibilities, but even that may be a wash with branch prediction 2) The spinlock - but this observation was my mistake, and was flawed. I was looking at the rte_power_monitor_sync(...), and not rte_power_monitor(...), the latter of which does not take a spinlock. In summary, I am no longer concerned about efficiency and suitability for DLB/DLB2. Thanks, Tim
> -----Original Message----- > From: Jerin Jacob <jerinjacobk@gmail.com> > Sent: Friday, October 30, 2020 3:31 PM > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, > Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; > Jerin Jacob <jerinj@marvell.com>; Thomas Monjalon <thomas@monjalon.net> > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD > > + @Richardson, Bruce @Ananyev, Konstantin > > On Fri, Oct 30, 2020 at 8:55 PM McDaniel, Timothy > <timothy.mcdaniel@intel.com> wrote: > > <snip backlog and compiler error output> > > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF > > > drivers/lib > > > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o > > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c > > > ../drivers/event/dlb2/pf/dlb2_pf.c > > > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35: > > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > > > unknown builtin '__builtin_ia32_movntdq' > > > [-Wimplicit-function-declaration] > > > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > > > > Not sure why this builds for me, but I do not see this error. > > May be you can try with > [for-main][dpdk-next-eventdev] $ clang -v > clang version 10.0.1 > Target: x86_64-pc-linux-gnu > Thread model: posix > InstalledDir: /usr/bin > > > > According to information online, '__builtin_ia32_movntdq' should be available if - > msse2 is set. > > According to the following snippet from config/x86/meson.build, it looks like > msse4 is defined. > > <snippet> > > # we require SSE4.2 for DPDK > > if cc.get_define('__SSE4_2__', args: machine_args) == '' > > message('SSE 4.2 not enabled by default, explicitly enabling') > > machine_args += '-msse4' > > endif > > <end snippet> > > > > I realize this is from clang, not gcc, but why are they out of sync? > > such that __builtin_ia32_movntdq is not available in clang, but is available in gcc. > > > > Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ? > > > > Any guidance on how to get past this would be greatly appreciated. Confirm that __builtin_ia32_* works on GCC and not on Clang, reproducing error above. Using the intrinsic _mm_stream or _mm_store versions work with both Clang and GCC, and will fix this issue.
> -----Original Message----- > From: Van Haaren, Harry <harry.van.haaren@intel.com> > Sent: Friday, October 30, 2020 11:09 AM > To: Jerin Jacob <jerinjacobk@gmail.com>; McDaniel, Timothy > <timothy.mcdaniel@intel.com> > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; Eads, > Gage <gage.eads@intel.com>; Jerin Jacob <jerinj@marvell.com>; Thomas > Monjalon <thomas@monjalon.net> > Subject: RE: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD > > > -----Original Message----- > > From: Jerin Jacob <jerinjacobk@gmail.com> > > Sent: Friday, October 30, 2020 3:31 PM > > To: McDaniel, Timothy <timothy.mcdaniel@intel.com> > > Cc: dpdk-dev <dev@dpdk.org>; Carrillo, Erik G <erik.g.carrillo@intel.com>; > Eads, > > Gage <gage.eads@intel.com>; Van Haaren, Harry > <harry.van.haaren@intel.com>; > > Jerin Jacob <jerinj@marvell.com>; Thomas Monjalon > <thomas@monjalon.net> > > Subject: Re: [dpdk-dev] [PATCH v5 00/23] Add DLB2 PMD > > > > + @Richardson, Bruce @Ananyev, Konstantin > > > > On Fri, Oct 30, 2020 at 8:55 PM McDaniel, Timothy > > <timothy.mcdaniel@intel.com> wrote: > > > > > <snip backlog and compiler error output> > > > > > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native > > > > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -MD -MQ > > > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -MF > > > > drivers/lib > > > > tmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o.d -o > > > > drivers/libtmp_rte_event_dlb2.a.p/event_dlb2_pf_dlb2_pf.c.o -c > > > > ../drivers/event/dlb2/pf/dlb2_pf.c > > > > In file included from ../drivers/event/dlb2/pf/dlb2_pf.c:35: > > > > ../drivers/event/dlb2/pf/../dlb2_inline_fns.h:41:2: error: use of > > > > unknown builtin '__builtin_ia32_movntdq' > > > > [-Wimplicit-function-declaration] > > > > __builtin_ia32_movntdq((__v2di *)pp_addr, (__v2di)src_data0); > > > > > > Not sure why this builds for me, but I do not see this error. > > > > May be you can try with > > [for-main][dpdk-next-eventdev] $ clang -v > > clang version 10.0.1 > > Target: x86_64-pc-linux-gnu > > Thread model: posix > > InstalledDir: /usr/bin > > > > > > > According to information online, '__builtin_ia32_movntdq' should be > available if - > > msse2 is set. > > > According to the following snippet from config/x86/meson.build, it looks like > > msse4 is defined. > > > <snippet> > > > # we require SSE4.2 for DPDK > > > if cc.get_define('__SSE4_2__', args: machine_args) == '' > > > message('SSE 4.2 not enabled by default, explicitly enabling') > > > machine_args += '-msse4' > > > endif > > > <end snippet> > > > > > > I realize this is from clang, not gcc, but why are they out of sync? > > > such that __builtin_ia32_movntdq is not available in clang, but is available in > gcc. > > > > > > Should I convert to _mm_stream_si128 (__m128i *__A, __m128i __B) ? > > > > > > Any guidance on how to get past this would be greatly appreciated. > > Confirm that __builtin_ia32_* works on GCC and not on Clang, reproducing > error above. > Using the intrinsic _mm_stream or _mm_store versions work with both Clang > and GCC, and will fix this issue. Thanks Harry. I will convert to _mm_stream_si128.
30/10/2020 17:02, McDaniel, Timothy: > From: Thomas Monjalon <thomas@monjalon.net> > > 30/10/2020 16:35, McDaniel, Timothy: > > > From: Thomas Monjalon <thomas@monjalon.net> > > > > 30/10/2020 12:58, McDaniel, Timothy: > > > > > From: McDaniel, Timothy > > > > > > From: Thomas Monjalon <thomas@monjalon.net> > > > > > > > 30/10/2020 10:43, Timothy McDaniel: > > > > > > > > - note that the code still uses its private byte-encoded versions of > > > > > > > > umonitor/umwait, rather than the new functions in the power > > > > > > > > patch that are built on top of those intrinsics. This is intentional. > > > > > > > > > > > > > > Why? Now these intrinsics are available in the main branch. > > > > > > > We should avoid duplicating such code. > > > > > > > > > > > > > > > > > > > > > > > > > > I had asked that the low level intrinsics (UMWAIT/UMONITOR) be split > > out so > > > > > > that DLB/DLB2 could use them instead of its own private byte-encoded > > > > versions, > > > > > > but instead we have these wrappers that call the low level intrinsics. > > Those > > > > > > wrappers > > > > > > introduce additional overhead that is not required for DLB/DLB2. I have a > > > > > > meeting with Ma Liang on Monday to discuss. > > > > > > > > > > I thought the ask of DLB was to just substitute the low level > > umwait/umonitor > > > > byte > > > > > encoded instructions DLB has defined privately with similar byte-encoded > > > > instructions defined in the power > > > > > patch. The power patch does not directly expose those, which is why I did > > not > > > > update DLB/DLB2. > > > > > The power patch does have the advantage of centralizing the race > > avoidance > > > > > logic, which is a good thing for any PMD that wishes to take advantage of > > > > umwait/umonitor. > > > > > > > > So you mean the overhead is a good thing? > > > > > > > > > Sorry for the confusion. I just misunderstood what was being asked of DLB > > in > > > > regard to switching over.. That being said, > > > > > I am willing to convert DLB/DLB2 to use rte_power_monitor(...) in a future > > > > patch-set. > > > > > > > > Why not now? > > > > > > > > Indeed there is a confusion and it looks like a lot of novlang > > > > to exit from the situation. > > > > We'll wait a clear decision with facts. > > > > > > > > > > Hi Thomas, > > > > > > I have updated DLB and DLB2 to use rte_power_monitor(...), and those > > patches are > > > ready if you are willing to accept them and the 3 power patches. > > > > > > For the sake of consistency, I see the benefit of using the power patch, even if > > it is > > > slightly less efficient that the DLB-specific implementation that I currently > > have. > > > We have already encountered an empty queue, so this is no longer fast path > > for the PMD. > > > > I am really concerned that the API in EAL is not the most efficient. > > Why is that? Can we improve the EAL API? > > > > From an efficiency perspective, I only noticed 2 things. > 1) The size check and associated logic. This could be avoided if we had _8, _16, _32, _64 variants instead of 1 function that handled all possibilities, but even that may be a wash > with branch prediction > 2) The spinlock - but this observation was my mistake, and was flawed. I was looking at the rte_power_monitor_sync(...), and not rte_power_monitor(...), the latter of which does not take a spinlock. > > In summary, I am no longer concerned about efficiency and suitability for DLB/DLB2. OK, so let's go ahead with a DLB PMD using this API.