From patchwork Thu Jan 30 20:19:00 2020
X-Patchwork-Id: 65385
From: Scott Wasson
To: "dev@dpdk.org"
Date: Thu, 30 Jan 2020 20:19:00 +0000
Subject: [dpdk-dev] IOVA_CONTIG flag needed in kni initialization

Hi,

We're seeing an issue since upgrading to 19.08: the kni FIFOs apparently aren't contiguous. From user space's perspective, the kni's tx_q straddles the 2 MB page boundary at 0x17a600000. The mbuf pointers in the ring prior to this address are valid. The tx_q's write pointer indicates there are mbufs at 0x17a600000 and beyond, but those pointers are all NULL.

Because the rte_kni kernel module is loaded, iova_mode is forced to PA in eal.c:

    /* Workaround for KNI which requires physical address to work */
    if (iova_mode == RTE_IOVA_VA &&
            rte_eal_check_module("rte_kni") == 1) {
        if (phys_addrs) {
            iova_mode = RTE_IOVA_PA;

Through brute-force and experimentation, we determined that enabling --legacy-mem made the problem go away. But that also moved the locations of the kni's data structures, so they no longer straddled a hugepage boundary. Our concern is that the furniture may move around again and bring us back to where we were. Being tied to --legacy-mem is undesirable in the long term, anyway.
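For illustration only (check_mz_iova_contig() is a hypothetical helper, not something in our code or the DPDK tree), a per-page check along these lines can expose that kind of discontinuity: it walks a memzone and compares each page's IOVA against the offset expected from mz->iova.

    #include <stdbool.h>
    #include <stdio.h>
    #include <rte_memory.h>
    #include <rte_memzone.h>

    /* Hypothetical diagnostic: return true only if every page of the
     * memzone has the IOVA expected from a physically contiguous
     * allocation (mz->iova + offset). A memzone reserved without
     * RTE_MEMZONE_IOVA_CONTIG can fail this check in IOVA-as-PA mode
     * when it crosses a hugepage boundary. */
    static bool
    check_mz_iova_contig(const struct rte_memzone *mz)
    {
        const size_t step = 4096; /* probe at the smallest page granularity */
        size_t off;

        for (off = 0; off < mz->len; off += step) {
            rte_iova_t iova = rte_mem_virt2iova((const char *)mz->addr + off);

            if (iova == RTE_BAD_IOVA || iova != mz->iova + off) {
                printf("memzone %s: IOVA discontinuity at offset %zu\n",
                       mz->name, off);
                return false;
            }
        }
        return true;
    }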
Through further brute-force and experimentation, we found that the following code patch helps (even without --legacy-mem):

index 3d2ffb2..5cc9d69 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni)
        char mz_name[RTE_MEMZONE_NAMESIZE];

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, kni->name);
-       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail);

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, kni->name);
-       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail);

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, kni->name);
-       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail);

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, kni->name);
-       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail);

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, kni->name);
-       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail);

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, kni->name);
-       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail);

        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, kni->name);
-       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail);

        return 0;

I removed --legacy-mem; the tx_q still straddles the same 2 MB page boundary, yet it has now been running for a few hours and everything seems OK.

This would seem to follow precedent in rte_mempool.c:

        /* if we're trying to reserve contiguous memory, add appropriate
         * memzone flag.
         */
        if (try_contig)
                flags |= RTE_MEMZONE_IOVA_CONTIG;

which I think explains why our mbufs haven't seen data truncation issues.

Could you please explain why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode? Isn't contiguousness a fundamental property of physical addressing?

Are we still potentially vulnerable with --legacy-mem and without the above code change? Did we just get lucky because the furniture moved and doesn't straddle a page boundary at the moment?

We also tested with stock 19.11 and did not see the crash. However, the FIFOs there were not straddling a page boundary, so we believe it is also vulnerable.

Thanks!

-Scott
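P.S. As a sketch only (kni_reserve_contig_mz() is a hypothetical wrapper name, not part of the patch above), the mempool-style pattern would be to ask for RTE_MEMZONE_IOVA_CONTIG first and fall back to a plain reservation when no contiguous chunk of that size is available. For the KNI FIFOs a silent fallback is arguably unsafe, since the kernel side needs physical contiguity, so failing hard may be the better choice:

    #include <errno.h>
    #include <rte_errno.h>
    #include <rte_memory.h>
    #include <rte_memzone.h>

    /* Hypothetical wrapper, for discussion only: try an IOVA-contiguous
     * reservation first and retry without the flag if it fails for lack
     * of memory, in the spirit of what rte_mempool.c does. */
    static const struct rte_memzone *
    kni_reserve_contig_mz(const char *name, size_t len)
    {
        const struct rte_memzone *mz;

        mz = rte_memzone_reserve(name, len, SOCKET_ID_ANY,
                                 RTE_MEMZONE_IOVA_CONTIG);
        if (mz == NULL && rte_errno == ENOMEM) {
            /* No IOVA-contiguous chunk was available; fall back to a
             * plain reservation (which may again straddle a hugepage
             * boundary in PA mode). */
            mz = rte_memzone_reserve(name, len, SOCKET_ID_ANY, 0);
        }
        return mz;
    }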