From patchwork Fri Sep 22 08:19:07 2023
X-Patchwork-Submitter: Bruce Richardson
X-Patchwork-Id: 172
From: Bruce Richardson
To: dev@dpdk.org
Cc: Bruce Richardson
Subject: [RFC PATCH 0/5] Using shared mempools for zero-copy IO proxying
Date: Fri, 22 Sep 2023 09:19:07 +0100
Message-Id: <20230922081912.7090-1-bruce.richardson@intel.com>

Following my talk at the recent DPDK Summit [1], here is an RFC patchset
containing the prototypes I created which led to the talk. This patchset
is simply to demonstrate:

* what is currently possible with DPDK in terms of zero-copy IPC
* where the big gaps and general problem areas are
* what the performance is like doing zero-copy between processes
* how we may look to have new deployment models for DPDK apps

This cover letter is quite long, as it covers how to run the demo app and
use the drivers included in this set. I felt it more accessible this way
than putting it in rst files in the patches.
This patchset depends upon patchsets [2] and [3].

[1] https://dpdksummit2023.sched.com/event/1P9wU
[2] http://patches.dpdk.org/project/dpdk/list/?series=29536
[3] http://patches.dpdk.org/project/dpdk/list/?series=29538

Overview
--------

At a high level, the patchset contains the following parts: a proxy
application which performs packet IO and steers traffic on a per-queue
basis to other applications, which connect to it via unix sockets, and a
set of drivers to be used by those applications so that they can
(hopefully) receive packets from the proxy app without any changes to
their own code. Together these demonstrate the feasibility of zero-copy
packet transfer between independent DPDK apps.

The drivers are:

* a bus driver, which makes the connection to the proxy app via the unix
  socket. Thereafter it accepts the shared memory from the proxy and maps
  it into the running process for use for buffers, rings, etc. It also
  handles communication with the proxy app on behalf of the other two
  drivers.
* a mempool driver, which simply manages a set of buffers on the basis of
  offsets within the shared memory area rather than using pointers (a
  rough sketch of that offset conversion is given at the end of this
  section). The big downside of its use is that it assumes all the
  objects stored in the mempool are mbufs. (As described in my talk, this
  is a big issue where I'm not sure we have a good solution available
  right now to resolve it.)
* an ethernet driver, which creates an rx and tx ring in shared memory
  for use in communicating with the proxy app. All buffers sent/received
  are converted to offsets within the shared memory area.

The proxy app itself implements all the other logic - mostly inside
datapath.c - to allow the connecting app to run. When an app connects to
the unix socket, the proxy app uses memfd to create a hugepage block to
be passed through to the "guest" app, and then exchanges messages with
the drivers until the app connection is up and running and able to handle
traffic. [Ideally, this IPC-over-unix-socket mechanism should probably be
generalized into a library used by the app, but for now it's just
built-in.]

As stated above, the steering of traffic is done per-queue, that is, each
app connects to a specific socket corresponding to a NIC queue. For demo
purposes, the traffic to the queues is just distributed using RSS, but
obviously it would be possible to use e.g. rte_flow to do more
interesting distribution in future, as sketched at the end of this
section.
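To make the offset-based scheme used by the mempool and ethernet drivers
more concrete, here is a minimal sketch of the conversion involved. The
struct and helper names are made up for illustration - this is not the
actual driver code - but the idea is the same: only offsets are ever
exchanged, since each process may map the shared area at a different
address.

    #include <stdint.h>
    #include <rte_common.h>
    #include <rte_mbuf.h>

    /* Hypothetical view of the memfd-backed area shared by the proxy;
     * the real drivers keep equivalent state in their private data. */
    struct shm_region {
            void *base;   /* local mapping of the shared memory */
            size_t len;   /* size of that mapping */
    };

    /* Convert a local mbuf pointer to an offset that is valid in any
     * process mapping the same region, wherever it is mapped. */
    static inline uintptr_t
    mbuf_to_offset(const struct shm_region *r, const struct rte_mbuf *m)
    {
            return (uintptr_t)m - (uintptr_t)r->base;
    }

    /* Convert an offset received from the peer back into a pointer
     * within the local mapping. */
    static inline struct rte_mbuf *
    offset_to_mbuf(const struct shm_region *r, uintptr_t off)
    {
            return (struct rte_mbuf *)RTE_PTR_ADD(r->base, off);
    }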
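The memory handover itself relies only on standard Linux primitives: the
proxy creates a hugepage-backed memfd and passes the file descriptor over
the unix socket, so the guest can mmap the same pages. The fragment below
is an illustrative sketch of that mechanism only - made-up function name,
no error-path detail - and is not the code from datapath.c.

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Create a hugepage-backed memfd of "len" bytes (which must be a
     * multiple of the hugepage size) and send its fd to the guest over
     * the already-accepted unix-socket connection "conn_fd". */
    static int
    share_hugepage_memfd(int conn_fd, size_t len)
    {
            int mfd = memfd_create("proxy-shm", MFD_HUGETLB);
            if (mfd < 0 || ftruncate(mfd, len) < 0)
                    return -1;

            struct iovec iov = { .iov_base = &len, .iov_len = sizeof(len) };
            char cbuf[CMSG_SPACE(sizeof(int))] = {0};
            struct msghdr msg = {
                    .msg_iov = &iov, .msg_iovlen = 1,
                    .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
            };

            struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &mfd, sizeof(int));

            /* the guest mmap()s the fd it receives to see the same pages */
            return sendmsg(conn_fd, &msg, 0) < 0 ? -1 : mfd;
    }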
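As a hint of what non-RSS steering might look like, the following sketch
uses the standard rte_flow API to direct one destination IP to a
particular queue (and therefore to one guest's socket). This is not part
of the patchset - just an illustration of the kind of rule the proxy
could install on a NIC that supports it.

    #include <stdint.h>
    #include <rte_byteorder.h>
    #include <rte_flow.h>
    #include <rte_ip.h>

    /* Direct IPv4 packets for 192.0.2.1 (a documentation address) to the
     * given queue on the given port. Illustrative only - the proxy in
     * this patchset relies on plain RSS. */
    static struct rte_flow *
    steer_ip_to_queue(uint16_t port_id, uint16_t queue_id)
    {
            struct rte_flow_attr attr = { .ingress = 1 };
            struct rte_flow_item_ipv4 ip_spec = {
                    .hdr.dst_addr = rte_cpu_to_be_32(RTE_IPV4(192, 0, 2, 1)),
            };
            struct rte_flow_item_ipv4 ip_mask = {
                    .hdr.dst_addr = rte_cpu_to_be_32(UINT32_MAX),
            };
            struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_ETH },
                    { .type = RTE_FLOW_ITEM_TYPE_IPV4,
                      .spec = &ip_spec, .mask = &ip_mask },
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };
            struct rte_flow_action_queue queue = { .index = queue_id };
            struct rte_flow_action actions[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };
            struct rte_flow_error err;

            return rte_flow_create(port_id, &attr, pattern, actions, &err);
    }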
Running the Apps
----------------

To get things all working, just do a DPDK build as normal, then run the
io-proxy app. It takes only a single parameter: the core number to use.
For example, on my system I run it on lcore 25:

    ./build/app/dpdk-io-proxy 25

The sockets to be created, and how they map to ports/queues, are
controlled via the commandline, but a startup script can be provided,
which just needs to be in the current directory and named
"dpdk-io-proxy.cmds". Patch 5 of this set contains an example setup that
I use, so it's recommended that you run the proxy app from a directory
containing that file. If so, the proxy app will use two ports and create
two queues on each, mapping them to 4 unix socket files in /tmp. (Each
socket is created in its own directory to simplify use with docker
containers, as described in the next section.) No traffic is handled by
the app until other end-user apps connect to it. Testpmd works as that
second "guest" app without any changes to it.

To run multiple testpmd instances, each taking traffic from a unique RX
queue and forwarding it back, the following sequence of commands can be
used [in this case, doing forwarding on cores 26 through 29, and using
the 4 unix sockets configured using the startup file referenced above]:

    ./build/app/dpdk-testpmd -l 24,26 --no-huge -m1 --no-shconf \
        -a sock:/tmp/socket_0_0/sock -- --forward-mode=macswap
    ./build/app/dpdk-testpmd -l 24,27 --no-huge -m1 --no-shconf \
        -a sock:/tmp/socket_0_1/sock -- --forward-mode=macswap
    ./build/app/dpdk-testpmd -l 24,28 --no-huge -m1 --no-shconf \
        -a sock:/tmp/socket_1_0/sock -- --forward-mode=macswap
    ./build/app/dpdk-testpmd -l 24,29 --no-huge -m1 --no-shconf \
        -a sock:/tmp/socket_1_1/sock -- --forward-mode=macswap

NOTES:
* the "--no-huge -m1" is present to guarantee that no regular DPDK
  hugepage memory is being used by the app; it's all coming from the
  proxy app's memfd
* the "--no-shconf" parameter is necessary just to avoid us needing to
  specify a unique file-prefix for each instance
* the forwarding type to be used is optional; macswap is chosen just to
  have some work done inside testpmd to prove it can touch the packet
  payload, not just the mbuf header

Using with docker containers
----------------------------

The testpmd instances run above can also be run within a docker
container. Using a dockerfile like the one below, we can run testpmd in a
container, getting the packets in a zero-copy manner from the io-proxy
running on the host.

    # syntax=docker/dockerfile:1-labs
    FROM alpine
    RUN apk add --update alpine-sdk \
        py3-elftools meson ninja \
        bsd-compat-headers \
        linux-headers \
        numactl-dev \
        bash
    ADD . dpdk
    WORKDIR dpdk
    RUN rm -rf build
    RUN meson setup -Denable_drivers=*/shared_mem -Ddisable_libs=* \
        -Denable_apps=test-pmd -Dtests=false build
    RUN ninja -v -C build
    ENTRYPOINT ["/dpdk/build/app/dpdk-testpmd"]

To access the proxy, all the container needs is access to the unix socket
on the filesystem. Since, in the example startup script, each socket is
placed in its own directory, we can use the "--volume" parameter to give
each instance its own unique unix socket, and therefore its own proxied
NIC RX/TX queue. To run four testpmd instances as above, just in
containers, the following commands can be used - assuming the dockerfile
above has been built into an image called "testpmd".

    docker run -it --volume=/tmp/socket_0_0:/run testpmd \
        -l 24,26 --no-huge -a sock:/run/sock -- \
        --no-mlockall --forward-mode=macswap
    docker run -it --volume=/tmp/socket_0_1:/run testpmd \
        -l 24,27 --no-huge -a sock:/run/sock -- \
        --no-mlockall --forward-mode=macswap
    docker run -it --volume=/tmp/socket_1_0:/run testpmd \
        -l 24,28 --no-huge -a sock:/run/sock -- \
        --no-mlockall --forward-mode=macswap
    docker run -it --volume=/tmp/socket_1_1:/run testpmd \
        -l 24,29 --no-huge -a sock:/run/sock -- \
        --no-mlockall --forward-mode=macswap

NOTE: since these docker testpmd instances don't access IO or allocate
hugepages directly, they should be runnable without extra privileges, so
long as they can connect to the unix socket.

Additional info
---------------

* Stats are available via the app commandline.
* By default (#define in code), the proxy app only uses 2 queues per
  port, so you can't configure more than that via the cmdline.
* Any ports used by the proxy script must support queue reconfiguration
  at runtime without stopping the port.
* When a "guest" process connected to a socket terminates, all shared
  memory used by that process is destroyed and a new memfd is created on
  reconnect.
* The above setups using testpmd are the only ways in which this app and
  drivers have been tested. I would be hopeful that other apps would work
  too, but there are quite a few limitations (see my DPDK summit talk for
  some more details on those).

Congratulations on reading this far! :-)

All comments/feedback on this welcome.

Bruce Richardson (5):
  bus: new driver to accept shared memory over unix socket
  mempool: driver for mempools of mbufs on shared memory
  net: new ethdev driver to communicate using shared mem
  app: add IO proxy app using shared memory interfaces
  app/io-proxy: add startup commands

 app/io-proxy/command_fns.c                 | 160 ++++
 app/io-proxy/commands.list                 |   6 +
 app/io-proxy/datapath.c                    | 595 +++++++++++++++++++++
 app/io-proxy/datapath.h                    |  37 ++
 app/io-proxy/datapath_mp.c                 |  78 +++
 app/io-proxy/dpdk-io-proxy.cmds            |   6 +
 app/io-proxy/main.c                        |  71 +++
 app/io-proxy/meson.build                   |  12 +
 app/meson.build                            |   1 +
 drivers/bus/meson.build                    |   1 +
 drivers/bus/shared_mem/meson.build         |  11 +
 drivers/bus/shared_mem/shared_mem_bus.c    | 323 +++++++++++
 drivers/bus/shared_mem/shared_mem_bus.h    |  75 +++
 drivers/bus/shared_mem/version.map         |  11 +
 drivers/mempool/meson.build                |   1 +
 drivers/mempool/shared_mem/meson.build     |  10 +
 drivers/mempool/shared_mem/shared_mem_mp.c |  94 ++++
 drivers/net/meson.build                    |   1 +
 drivers/net/shared_mem/meson.build         |  11 +
 drivers/net/shared_mem/shared_mem_eth.c    | 295 ++++++++++
 20 files changed, 1799 insertions(+)
 create mode 100644 app/io-proxy/command_fns.c
 create mode 100644 app/io-proxy/commands.list
 create mode 100644 app/io-proxy/datapath.c
 create mode 100644 app/io-proxy/datapath.h
 create mode 100644 app/io-proxy/datapath_mp.c
 create mode 100644 app/io-proxy/dpdk-io-proxy.cmds
 create mode 100644 app/io-proxy/main.c
 create mode 100644 app/io-proxy/meson.build
 create mode 100644 drivers/bus/shared_mem/meson.build
 create mode 100644 drivers/bus/shared_mem/shared_mem_bus.c
 create mode 100644 drivers/bus/shared_mem/shared_mem_bus.h
 create mode 100644 drivers/bus/shared_mem/version.map
 create mode 100644 drivers/mempool/shared_mem/meson.build
 create mode 100644 drivers/mempool/shared_mem/shared_mem_mp.c
 create mode 100644 drivers/net/shared_mem/meson.build
 create mode 100644 drivers/net/shared_mem/shared_mem_eth.c

--
2.39.2