From patchwork Mon Mar 4 09:01:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Brandes, Shai"
X-Patchwork-Id: 137872
X-Patchwork-Delegate: ferruh.yigit@amd.com
Subject: [PATCH 32/33] net/ena: control path pure polling mode
Date: Mon, 4 Mar 2024 11:01:35 +0200
Message-ID: <20240304090136.861-33-shaibran@amazon.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20240304090136.861-1-shaibran@amazon.com>
References: <20240304090136.861-1-shaibran@amazon.com>
List-Id: DPDK patches and discussions

From: Shai Brandes

This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in
the control path. This mode is not activated by default and can be
toggled using the "control_path_poll_interval" devarg. When operating
in this mode, periodic alarms are used to monitor the control queues.
A non-zero value for this devarg is mandatory for control path
functionality when binding ports to the uio_pci_generic kernel module,
which lacks interrupt support.
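For illustration only (not part of the patch): a minimal, self-contained sketch of how the devarg value is validated and converted, assuming the 1000 ms cap and the millisecond-to-microsecond conversion that appear in the diff. The helper name parse_poll_interval_msec and the SKETCH_* constants are hypothetical stand-ins for the driver's own symbols, and no DPDK headers are used.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the limits used by the patch. */
#define SKETCH_MAX_POLL_INTERVAL_MSEC 1000ULL
#define SKETCH_USEC_PER_MSEC 1000ULL

/*
 * Parse a polling interval given in milliseconds and store it in
 * microseconds. A result of 0 means "stay in interrupt mode".
 * Returns 0 on success and -EINVAL on a non-numeric or too large value;
 * *interval_us is left untouched on failure.
 */
static int parse_poll_interval_msec(const char *value, uint64_t *interval_us)
{
	char *end;
	uint64_t msec = strtoull(value, &end, 10);

	if (end == value)
		return -EINVAL;	/* no digits were consumed */
	if (msec > SKETCH_MAX_POLL_INTERVAL_MSEC)
		return -EINVAL;	/* above the documented maximum */

	*interval_us = msec * SKETCH_USEC_PER_MSEC;
	return 0;
}
```

With this sketch, an input of "1000" yields 1000000 microseconds, "0" keeps the interval at zero (interrupt mode), and "1001" or "abc" are rejected with -EINVAL.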

Signed-off-by: Shai Brandes
Reviewed-by: Amit Bernstein
---
 doc/guides/nics/ena.rst                |  49 ++++++++---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c           | 108 ++++++++++++++++++++-----
 drivers/net/ena/ena_ethdev.h           |   5 ++
 4 files changed, 130 insertions(+), 34 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 53c9341859..a94397f9d3 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -109,12 +109,16 @@ Runtime Configuration
 
    * **llq_policy** (default 1)
 
-     Controls whether use device recommended header policy or override it.
+     Controls whether to use the device recommended header policy or to override it:
+
      0 - Disable LLQ.
-     **Use with extreme caution as it leads to a huge performance
-     degradation on AWS instances from 6th generation onwards.**
+       **Use with extreme caution as it leads to a huge performance
+       degradation on AWS instances from 6th generation onwards.**
+
      1 - Accept device recommended LLQ policy (Default).
+
      2 - Enforce normal LLQ policy.
+
      3 - Enforce large LLQ policy.
 
    * **miss_txc_to** (default 5)
@@ -126,6 +130,18 @@ Runtime Configuration
      timer service. Setting this parameter to 0 disables this feature. Maximum
      allowed value is 60 seconds.
 
+   * **control_path_poll_interval** (default 0)
+
+     Enable polling-based functionality of the admin queues, eliminating the
+     need for interrupts in the control-path:
+
+     0 - Disable (Admin queue will work in interrupt mode).
+
+     [1..1000] - Number of milliseconds to wait between periodic inspections of the admin queues.
+
+     **A non-zero value for this devarg is mandatory for control path functionality
+     when binding ports to the uio_pci_generic kernel module, which lacks interrupt support.**
+
 ENA Configuration Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -164,23 +180,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite. This includes environment
    variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.
 
    (*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
    reduces the latency of the packets by pushing the header directly through
    the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
    In DPDK ``igb_uio`` it must be enabled by loading module with
    ``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
    If vfio-pci is used user should follow `AWS ENA PMD documentation
    `_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``
 
    Please make sure that ``IOMMU`` is enabled in your system,
    or use ``vfio`` driver in ``noiommu`` mode::
@@ -189,7 +205,14 @@ Prerequisites
    To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
    ``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
+
+   Note that when launching the application, the ``control_path_poll_interval`` devarg must be used with a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control-path (admin queues) of the ENA requires poll-mode
+   to process command completions and asynchronous notifications from the device.
+
+#. Bind the intended ENA device to ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.
 
 At this point the system should be ready to run DPDK applications. Once the
 application runs to completion, the ENA can be detached from attached module if
@@ -198,7 +221,7 @@ necessary.
 **Rx interrupts support**
 
 ENA PMD supports Rx interrupts, which can be used to wake up lcores waiting for
-input. Please note that it won't work with ``igb_uio``, so to use this feature,
+input. Please note that it won't work with ``igb_uio`` or ``uio_pci_generic``, so to use this feature,
 the ``vfio-pci`` should be used.
 
 ENA handles admin interrupts and AENQ notifications on separate interrupt.
@@ -209,7 +232,7 @@ will fail.
 **Note about usage on \*.metal instances**
 
 On AWS, the metal instances are supporting IOMMU for both arm64 and x86_64
-hosts.
+hosts. Note that ``uio_pci_generic`` lacks IOMMU support and cannot be used for metal instances.
 
 * x86_64 (e.g. c5.metal, i3.metal):
    IOMMU should be disabled by default. In that situation, the ``igb_uio`` can
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 9823616eeb..d01236097a 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -109,6 +109,8 @@ New Features
   * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg `llq_policy`.
   * Added support for LLQ header size recommendation from the device.
   * Allowed large LLQ with 1024 entries when the device supports enlarged memory BAR.
+  * Added `control_path_poll_interval` devarg that configures the control path to work in poll mode.
+  * Added support for binding ports to `uio_pci_generic` kernel module.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 43693ee2ee..af1f6d6d05 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3,6 +3,7 @@
  * All rights reserved.
  */
 
+#include
 #include
 #include
 #include
@@ -36,6 +37,8 @@
 
 #define ENA_MIN_RING_DESC 128
 
+#define USEC_PER_MSEC 1000UL
+
 #define BITS_PER_BYTE 8
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
@@ -95,6 +98,14 @@ struct ena_stats {
  * considered as a missing.
  */
 #define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
+/*
+ * Controls the period of time (in milliseconds) between two consecutive inspections of
+ * the control queues when the driver is in poll mode and not using interrupts.
+ * By default, this value is zero, indicating that the driver will not be in poll mode and will
+ * use interrupts. A non-zero value for this argument is mandatory when using uio_pci_generic
+ * driver.
+ */
+#define ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "control_path_poll_interval"
 
 /*
  * Each rte_memzone should have unique name.
@@ -271,7 +282,8 @@ static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
 static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
 static int ena_infos_get(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_info *dev_info);
-static void ena_interrupt_handler_rte(void *cb_arg);
+static void ena_control_path_handler(void *cb_arg);
+static void ena_control_path_poll_handler(void *cb_arg);
 static void ena_timer_wd_callback(struct rte_timer *timer, void *arg);
 static void ena_destroy_device(struct rte_eth_dev *eth_dev);
 static int eth_ena_dev_init(struct rte_eth_dev *eth_dev);
@@ -882,10 +894,14 @@ static int ena_close(struct rte_eth_dev *dev)
 		ret = ena_stop(dev);
 	adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
-	rte_intr_disable(intr_handle);
-	rc = rte_intr_callback_unregister_sync(intr_handle, ena_interrupt_handler_rte, dev);
-	if (unlikely(rc != 0))
-		PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	if (!adapter->control_path_poll_interval) {
+		rte_intr_disable(intr_handle);
+		rc = rte_intr_callback_unregister_sync(intr_handle, ena_control_path_handler, dev);
+		if (unlikely(rc != 0))
+			PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	} else {
+		rte_eal_alarm_cancel(ena_control_path_poll_handler, dev);
+	}
 
 	ena_rx_queue_release_all(dev);
 	ena_tx_queue_release_all(dev);
@@ -1889,15 +1905,33 @@ static int ena_device_init(struct ena_adapter *adapter,
 	return rc;
 }
 
-static void ena_interrupt_handler_rte(void *cb_arg)
+static void ena_control_path_handler(void *cb_arg)
 {
 	struct rte_eth_dev *dev = cb_arg;
 	struct ena_adapter *adapter = dev->data->dev_private;
 	struct ena_com_dev *ena_dev = &adapter->ena_dev;
 
-	ena_com_admin_q_comp_intr_handler(ena_dev);
-	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED))
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_com_admin_q_comp_intr_handler(ena_dev);
 		ena_com_aenq_intr_handler(ena_dev, dev);
+	}
+}
+
+static void ena_control_path_poll_handler(void *cb_arg)
+{
+	struct rte_eth_dev *dev = cb_arg;
+	struct ena_adapter *adapter = dev->data->dev_private;
+	int rc;
+
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_control_path_handler(cb_arg);
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, cb_arg);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to retrigger control path alarm\n");
+			ena_trigger_reset(adapter, ENA_REGS_RESET_GENERIC);
+		}
+	}
 }
 
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
@@ -2362,20 +2396,28 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_spinlock_init(&adapter->admin_lock);
 
-	rte_intr_callback_register(intr_handle,
-				   ena_interrupt_handler_rte,
-				   eth_dev);
+	if (!adapter->control_path_poll_interval) {
+		/* Control path interrupt mode */
+		rte_intr_callback_register(intr_handle, ena_control_path_handler, eth_dev);
 	rte_intr_enable(intr_handle);
 	ena_com_set_admin_polling_mode(ena_dev, false);
 	ena_com_admin_aenq_enable(ena_dev);
-
+	} else { /* Control path polling mode */
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, eth_dev);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to set control path alarm\n");
+			goto err_control_path_destroy;
+		}
+	}
 	rte_timer_init(&adapter->timer_wd);
 
 	adapters_found++;
 	adapter->state = ENA_ADAPTER_STATE_INIT;
 
 	return 0;
-
+err_control_path_destroy:
+	rte_free(adapter->drv_stats);
 err_rss_destroy:
 	ena_com_rss_destroy(ena_dev);
 err_delete_debug_area:
@@ -3656,9 +3698,9 @@ static int ena_process_uint_devarg(const char *key,
 {
 	struct ena_adapter *adapter = opaque;
 	char *str_end;
-	uint64_t uint_value;
+	uint64_t uint64_value;
 
-	uint_value = strtoull(value, &str_end, DECIMAL_BASE);
+	uint64_value = strtoull(value, &str_end, DECIMAL_BASE);
 	if (value == str_end) {
 		PMD_INIT_LOG(ERR,
 			"Invalid value for key '%s'. Only uint values are accepted.\n",
@@ -3667,12 +3709,12 @@ static int ena_process_uint_devarg(const char *key,
 	}
 
 	if (strcmp(key, ENA_DEVARG_MISS_TXC_TO) == 0) {
-		if (uint_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
+		if (uint64_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
 			PMD_INIT_LOG(ERR,
 				"Tx timeout too high: %" PRIu64 " sec. Maximum allowed: %d sec.\n",
-				uint_value, ENA_MAX_TX_TIMEOUT_SECONDS);
+				uint64_value, ENA_MAX_TX_TIMEOUT_SECONDS);
 			return -EINVAL;
-		} else if (uint_value == 0) {
+		} else if (uint64_value == 0) {
 			PMD_INIT_LOG(INFO,
 				"Check for missing Tx completions has been disabled.\n");
 			adapter->missing_tx_completion_to =
@@ -3680,9 +3722,27 @@ static int ena_process_uint_devarg(const char *key,
 		} else {
 			PMD_INIT_LOG(INFO,
 				"Tx packet completion timeout set to %" PRIu64 " seconds.\n",
-				uint_value);
+				uint64_value);
 			adapter->missing_tx_completion_to =
-				uint_value * rte_get_timer_hz();
+				uint64_value * rte_get_timer_hz();
+		}
+	} else if (strcmp(key, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL) == 0) {
+		if (uint64_value > ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC) {
+			PMD_INIT_LOG(ERR,
+				"Control path polling interval is too long: %" PRIu64 " msecs. "
+				"Maximum allowed: %d msecs.\n",
+				uint64_value, ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC);
+			return -EINVAL;
+		} else if (uint64_value == 0) {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to zero. Operating in "
+				"interrupt mode.\n");
+			adapter->control_path_poll_interval = 0;
+		} else {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to %" PRIu64 " msecs.\n",
+				uint64_value);
+			adapter->control_path_poll_interval = uint64_value * USEC_PER_MSEC;
 		}
 	}
 
@@ -3712,6 +3772,7 @@ static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *de
 	static const char * const allowed_args[] = {
 		ENA_DEVARG_LLQ_POLICY,
 		ENA_DEVARG_MISS_TXC_TO,
+		ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
@@ -3734,6 +3795,10 @@ static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *de
 		ena_process_uint_devarg, adapter);
 	if (rc != 0)
 		goto exit;
+	rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
+		ena_process_uint_devarg, adapter);
+	if (rc != 0)
+		goto exit;
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -3954,7 +4019,8 @@ RTE_PMD_REGISTER_PCI_TABLE(net_ena, pci_id_ena_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ena, "* igb_uio | uio_pci_generic | vfio-pci");
 RTE_PMD_REGISTER_PARAM_STRING(net_ena,
 	ENA_DEVARG_LLQ_POLICY "=<0|1|2|3> "
-	ENA_DEVARG_MISS_TXC_TO "=");
+	ENA_DEVARG_MISS_TXC_TO "="
+	ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_driver, driver, NOTICE);
 #ifdef RTE_ETHDEV_DEBUG_RX
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 6716f01ba5..85e816ae72 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -44,6 +44,8 @@
 #define ENA_MONITORED_TX_QUEUES 3
 #define ENA_DEFAULT_MISSING_COMP 256U
 
+#define ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC 1000
+
 /* While processing submitted and completed descriptors (rx and tx path
  * respectively) in a loop it is desired to:
  * - perform batch submissions while populating submission queue
@@ -346,6 +348,9 @@ struct ena_adapter {
 
 	uint64_t memzone_cnt;
 
+	/* Time (in microseconds) of the control path queues monitoring interval */
+	uint64_t control_path_poll_interval;
+
 	/*
 	 * Helper variables for holding the information about the supported
 	 * metrics.