[RFC,0/5] Using shared mempools for zero-copy IO proxying

Message ID 20230922081912.7090-1-bruce.richardson@intel.com (mailing list archive)
Series Using shared mempools for zero-copy IO proxying


Bruce Richardson Sept. 22, 2023, 8:19 a.m. UTC
  Following my talk at the recent DPDK Summit [1], here is an RFC patchset
containing the prototypes I created which led to the talk.  This
patchset is simply to demonstrate:

* what is currently possible with DPDK in terms of zero-copy IPC
* where the big gaps and general problem areas are
* what the performance is like doing zero-copy between processes
* how we may look to have new deployment models for DPDK apps.

This cover letter is quite long, as it covers how to run the demo app
and use the drivers included in this set. I felt it more accessible this
way than putting it in rst files in the patches. This patchset depends
upon patchsets [2] and [3].

[1] https://dpdksummit2023.sched.com/event/1P9wU
[2] http://patches.dpdk.org/project/dpdk/list/?series=29536
[3] http://patches.dpdk.org/project/dpdk/list/?series=29538


The patchset contains, at a high level, the following parts: a proxy
application which performs packet IO and steers traffic on a per-queue
basis to other applications which connect to it via unix sockets, and a
set of drivers to be used by those applications so that they can
(hopefully) receive packets from the proxy app without any changes to
their own code. This all helps to demonstrate the feasibility of zero-
copy packet transfer between independent DPDK apps.

The drivers are:
* a bus driver, which makes the connection to the proxy app via
  the unix socket. Thereafter it accepts the shared memory from the
  proxy and maps it into the running process for use for buffers and
  rings etc. It also handles communication with the proxy app on behalf
  of the other two drivers.
* a mempool driver, which simply manages a set of buffers on the basis
  of offsets within the shared memory area rather than using pointers.
  The big downside of its use is that it assumes all the objects stored
  in the mempool are mbufs. (As described in my talk, this is a big
  issue for which I'm not sure we currently have a good solution.)
* an ethernet driver, which creates an RX and TX ring in shared memory
  for use in communicating with the proxy app. All buffers sent/received
  are converted to offsets within the shared memory area.

The proxy app itself implements all the other logic - mostly inside
datapath.c - to allow the connecting app to run. When an app connects to
the unix socket, the proxy app uses memfd to create a hugepage block to
be passed through to the "guest" app, and then sends/receives the
messages from the drivers until the app connection is up and running to
handle traffic. [Ideally, this IPC over unix socket mechanism should
probably be generalized into a library used by the app, but for now it's
just built-in]. As stated above, the steering of traffic is done
per-queue, that is, each app connects to a specific socket corresponding
to a NIC queue. For demo purposes, the traffic to the queues is just
distributed using RSS, but obviously it would be possible to use e.g.
rte_flow to do more interesting distribution in future.

Running the Apps

To get things working, just do a DPDK build as normal, then run the
io-proxy app. It takes a single parameter: the core number to use. For
example, on my system I run it on lcore 25:

	./build/app/dpdk-io-proxy 25

The sockets to be created, and how they map to ports/queues, are
controlled via the command line, but a startup script can be provided,
which just needs to be in the current directory and named
"dpdk-io-proxy.cmds". Patch 5 of this set contains an example setup that
I use, so it's recommended that you run the proxy app from a directory
containing that file. If so, the proxy app will use two ports and create
two queues on each, mapping them to 4 unix socket files in /tmp. (Each
socket is created in its own directory to simplify use with docker
containers, as described in the next section.)

No traffic is handled by the app until other end-user apps connect to
it. Testpmd works as that second "guest" app without any changes to it.
To run multiple testpmd instances, each taking traffic from a unique RX
queue and forwarding it back, the following sequence of commands can be
used [in this case, doing forwarding on cores 26 through 29, and using
the 4 unix sockets configured using the startup file referenced above].

	./build/app/dpdk-testpmd -l 24,26 --no-huge -m1 --no-shconf \
		-a sock:/tmp/socket_0_0/sock  -- --forward-mode=macswap
	./build/app/dpdk-testpmd -l 24,27 --no-huge -m1 --no-shconf \
		-a sock:/tmp/socket_0_1/sock  -- --forward-mode=macswap
	./build/app/dpdk-testpmd -l 24,28 --no-huge -m1 --no-shconf \
		-a sock:/tmp/socket_1_0/sock  -- --forward-mode=macswap
	./build/app/dpdk-testpmd -l 24,29 --no-huge -m1 --no-shconf \
		-a sock:/tmp/socket_1_1/sock  -- --forward-mode=macswap

* the "--no-huge -m1" options are present to guarantee that no regular
  DPDK hugepage memory is used by the app; all its memory comes from the
  proxy app's memfd
* the "--no-shconf" parameter is necessary just to avoid having to
  specify a unique file-prefix for each instance
* the forwarding mode is optional; macswap is chosen just to have some
  work done inside testpmd, proving it can touch the packet payload, not
  just the mbuf header

Using with docker containers

The testpmd instances run above can also be run within a docker
container. Using a dockerfile like below we can run testpmd in a
container getting the packets in a zero-copy manner from the io-proxy
running on the host.

   # syntax=docker/dockerfile:1-labs
   FROM alpine
   RUN apk add --update alpine-sdk \
           py3-elftools meson ninja \
           bsd-compat-headers \
           linux-headers \
           numactl-dev
   ADD . dpdk
   WORKDIR dpdk
   RUN rm -rf build
   RUN meson setup -Denable_drivers=*/shared_mem -Ddisable_libs=* \
        -Denable_apps=test-pmd -Dtests=false build
   RUN ninja -v -C build
   ENTRYPOINT ["/dpdk/build/app/dpdk-testpmd"]

To access the proxy, all the container needs is access to the unix
socket on the filesystem. Since, in the example startup script, each
socket is placed in its own directory, we can use the "--volume"
parameter to give each instance its own unique unix socket, and
therefore its own proxied NIC RX/TX queue. To run four testpmd instances
as above, just in containers, the following commands can be used,
assuming the dockerfile above has been built into an image called
"testpmd".

	docker run -it --volume=/tmp/socket_0_0:/run testpmd \
		-l 24,26 --no-huge -a sock:/run/sock -- \
		--no-mlockall --forward-mode=macswap
	docker run -it --volume=/tmp/socket_0_1:/run testpmd \
		-l 24,27 --no-huge -a sock:/run/sock -- \
		--no-mlockall --forward-mode=macswap
	docker run -it --volume=/tmp/socket_1_0:/run testpmd \
		-l 24,28 --no-huge -a sock:/run/sock -- \
		--no-mlockall --forward-mode=macswap
	docker run -it --volume=/tmp/socket_1_1:/run testpmd \
		-l 24,29 --no-huge -a sock:/run/sock -- \
		--no-mlockall --forward-mode=macswap

NOTE: since these docker testpmd instances don't access IO or allocate
hugepages directly, they should be runnable without extra privileges, so
long as they can connect to the unix socket.

Additional info

* Stats are available via the app command line
* By default (#define in code), the proxy app only uses 2 queues per
  port, so you can't configure more than that via the command line
* Any ports used by the proxy app must support queue reconfiguration at
  runtime without stopping the port
* When a "guest" process connected to a socket terminates, all shared
  memory used by that process is destroyed and a new memfd is created
  for the next connection
* The above setups using testpmd are the only ways in which this app and
  drivers have been tested. I would be hopeful that other apps would
  work too, but there are quite a few limitations (see my DPDK summit
  talk for some more details on those).

Congratulations on reading this far! :-)
All comments/feedback on this welcome.

Bruce Richardson (5):
  bus: new driver to accept shared memory over unix socket
  mempool: driver for mempools of mbufs on shared memory
  net: new ethdev driver to communicate using shared mem
  app: add IO proxy app using shared memory interfaces
  app/io-proxy: add startup commands

 app/io-proxy/command_fns.c                 | 160 ++++++
 app/io-proxy/commands.list                 |   6 +
 app/io-proxy/datapath.c                    | 595 +++++++++++++++++++++
 app/io-proxy/datapath.h                    |  37 ++
 app/io-proxy/datapath_mp.c                 |  78 +++
 app/io-proxy/dpdk-io-proxy.cmds            |   6 +
 app/io-proxy/main.c                        |  71 +++
 app/io-proxy/meson.build                   |  12 +
 app/meson.build                            |   1 +
 drivers/bus/meson.build                    |   1 +
 drivers/bus/shared_mem/meson.build         |  11 +
 drivers/bus/shared_mem/shared_mem_bus.c    | 323 +++++++++++
 drivers/bus/shared_mem/shared_mem_bus.h    |  75 +++
 drivers/bus/shared_mem/version.map         |  11 +
 drivers/mempool/meson.build                |   1 +
 drivers/mempool/shared_mem/meson.build     |  10 +
 drivers/mempool/shared_mem/shared_mem_mp.c |  94 ++++
 drivers/net/meson.build                    |   1 +
 drivers/net/shared_mem/meson.build         |  11 +
 drivers/net/shared_mem/shared_mem_eth.c    | 295 ++++++++++
 20 files changed, 1799 insertions(+)
 create mode 100644 app/io-proxy/command_fns.c
 create mode 100644 app/io-proxy/commands.list
 create mode 100644 app/io-proxy/datapath.c
 create mode 100644 app/io-proxy/datapath.h
 create mode 100644 app/io-proxy/datapath_mp.c
 create mode 100644 app/io-proxy/dpdk-io-proxy.cmds
 create mode 100644 app/io-proxy/main.c
 create mode 100644 app/io-proxy/meson.build
 create mode 100644 drivers/bus/shared_mem/meson.build
 create mode 100644 drivers/bus/shared_mem/shared_mem_bus.c
 create mode 100644 drivers/bus/shared_mem/shared_mem_bus.h
 create mode 100644 drivers/bus/shared_mem/version.map
 create mode 100644 drivers/mempool/shared_mem/meson.build
 create mode 100644 drivers/mempool/shared_mem/shared_mem_mp.c
 create mode 100644 drivers/net/shared_mem/meson.build
 create mode 100644 drivers/net/shared_mem/shared_mem_eth.c