Message ID | 20200306164104.15528-1-aostruszka@marvell.com (mailing list archive) |
---|---|
Headers |
From: Andrzej Ostruszka <aostruszka@marvell.com> To: <dev@dpdk.org> Date: Fri, 6 Mar 2020 17:41:00 +0100 Message-ID: <20200306164104.15528-1-aostruszka@marvell.com> Subject: [dpdk-dev] [PATCH 0/4] Introduce IF proxy library List-Id: DPDK patches and discussions <dev.dpdk.org> List-Archive: <http://mails.dpdk.org/archives/dev/> |
Series | Introduce IF proxy library | |
Message
Andrzej Ostruszka [C]
March 6, 2020, 4:41 p.m. UTC
What is this useful for
=======================

Usually, when an ethernet port is assigned to DPDK it vanishes from the
system and the user loses the ability to control it via normal configuration
utilities (e.g. those from the iproute2 package).  Moreover, by default a
DPDK application is not aware of the network configuration of the system.

To address both of these issues the application needs to:
- add some command line interface (or other mechanism) allowing for
  control of the port and its configuration
- query the status of network configuration and monitor its changes

The purpose of this library is to help with both of these tasks (as long
as they remain in the domain of configuration available to the system).  In
other words, if a DPDK application has some special needs that cannot be
addressed by the normal system configuration utilities, then they need
to be solved by the application itself.

The connection between DPDK and the system is based on the existence of
ports that are visible to both DPDK and the system (like Tap, KNI and
possibly some other drivers).  These ports serve as interface proxies.

Let's visualize the action of the library by the following example:

              Linux             |               DPDK
==============================================================
                                |
                                |   +-------+       +-------+
                                |   | Port1 |       | Port2 |
"ip link set dev tap1 mtu 1600" |   +-------+       +-------+
                          |     |       ^              ^ ^
                          |  +------+   | mtu_change   | |
                          `->| Tap1 |---' callback     | |
                             +------+                  | |
"ip addr add 198.51.100.14 \    |                      | |
                  dev tap2"     |                      | |
                          |  +------+                  | |
                          +->| Tap2 |------------------' |
                          |  +------+ addr_add callback  |
"ip route add 198.0.2.0/24 \    |  |                     |
                  dev tap2"     |  | route_add callback  |
                                |  `---------------------'

So we have two ports, Port1 and Port2, that are not visible to the system.
We create two proxy interfaces (here based on the Tap driver) and bind the
ports to their proxies.  When the user issues a command changing the MTU of
the Tap1 interface, the library notes this and calls the "mtu_change"
callback for Port1.
Similarly, when the user adds an IPv4 address to the Tap2 interface, the
"addr_add" callback is called for Port2, and the same happens for the
configuration of a routing rule pointing to Tap2.

Apart from callbacks, this library can notify about changes by adding
events to notification queues.  See below for more information about that
and a complete list of available callbacks.

Please note that nothing has been mentioned about forwarding of packets
between the system and DPDK.  Since the proxies are normal DPDK ports you
can receive/send on them via the usual RX/TX burst API.  However, since the
library is not aware of the structure of packet processing used by the
application, it cannot automatically forward the packets - it is the
responsibility of the application to include proxy ports in its packet
processing engine.

As mentioned above the intention of the library is to:
- provide information about network configuration that would allow the
  application to decide what to do with the packets received on DPDK ports,
- allow for control of the ports via standard configuration utilities

Although the library only helps you to identify the proxy for a given port
(and vice versa) and calls the appropriate callbacks, it does open some
interesting possibilities.  For example you can use the proxy ports to
forward packets for protocols that you do not wish to handle in the DPDK
application to the system protocol stack and just listen to the
configuration changes - that way you can "offload" handling of those
protocols to the system.

How to use it
=============

Usage of this library is rather simple.  You have to:
1. Create a proxy (if you don't have a port suitable for being a proxy, or
   you have one but do not wish to use it as a proxy).
2. Bind the port to the proxy.
3. Register callbacks and/or event queues.
4. Start listening to the network configuration.
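
The four steps above can be sketched roughly as follows, using the calls
described in the rest of this section.  This is only an illustration: the
stub functions stand in for the library so that the snippet compiles without
the series applied, and the enum name, the value of RTE_MAX_ETHPORTS, the
reduced callback structure and the 'int' return types of
rte_ifpx_port_bind(), rte_ifpx_callbacks_register() and rte_ifpx_listen()
are assumptions, not something stated in this cover letter.

```c
#include <stdint.h>

/* ---- stand-ins for <rte_if_proxy.h>; everything here is assumed ---- */
#define RTE_MAX_ETHPORTS 32                     /* assumed value */
enum rte_ifpx_proxy_type { RTE_IFPX_DEFAULT };  /* enum name is assumed */
struct rte_ifpx_mtu_change;                     /* fields not shown in this mail */

struct rte_ifpx_callbacks {                     /* reduced to two members */
	int (*mtu_change)(const struct rte_ifpx_mtu_change *event);
	int (*cfg_done)(void);
};

/* Stubs that always succeed, so the flow below can be exercised. */
static uint16_t rte_ifpx_create(enum rte_ifpx_proxy_type type)
{ (void)type; return 1; }
static int rte_ifpx_port_bind(uint16_t port_id, uint16_t proxy_id)
{ (void)port_id; (void)proxy_id; return 0; }
static int rte_ifpx_callbacks_register(const struct rte_ifpx_callbacks *cbs)
{ (void)cbs; return 0; }
static int rte_ifpx_listen(void)
{ return 0; }
/* ---- end of stand-ins ---- */

/* Non-zero return: the event is fully handled in the callback. */
static int on_mtu_change(const struct rte_ifpx_mtu_change *event)
{
	(void)event;
	return 1;
}

/* Called once the initial configuration dump has completed. */
static int on_cfg_done(void)
{
	return 1;
}

/* Hypothetical helper: returns 0 on success, -1 on any failure. */
static int setup_if_proxy(uint16_t port_id)
{
	static const struct rte_ifpx_callbacks cbs = {
		.mtu_change = on_mtu_change,
		.cfg_done = on_cfg_done,
	};
	uint16_t proxy_id;

	proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT);       /* step 1 */
	if (proxy_id == RTE_MAX_ETHPORTS)
		return -1;
	if (rte_ifpx_port_bind(port_id, proxy_id) != 0)     /* step 2 */
		return -1;
	if (rte_ifpx_callbacks_register(&cbs) != 0)         /* step 3 */
		return -1;
	return rte_ifpx_listen() == 0 ? 0 : -1;             /* step 4 */
}
```

An application would call setup_if_proxy(port_id) once per port it wants to
expose to the system, after the port itself has been configured.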

The only mandatory requirement for a DPDK port to be able to act as a proxy
is that it is visible in the system - this is checked during port to proxy
binding by calling rte_eth_dev_info_get() on the proxy port and inspecting
the 'if_index' field (it has to be non-zero).  One can create such a port in
the application by calling:

    proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT);

Upon success this returns the id of the DPDK proxy port created
(RTE_MAX_ETHPORTS on failure).  The argument selects the type of proxy port
to create (currently Tap/KNI only).  This function is actually just a
wrapper around:

    uint16_t rte_ifpx_create_by_devarg(const char *devarg);

creating a valid 'devarg' string for the chosen type of proxy.  If you have
another driver capable of acting as a proxy you can call
rte_ifpx_create_by_devarg() directly, passing an appropriate argument.

Once you have the ids of both port and proxy you can bind the two via:

    rte_ifpx_port_bind(port_id, proxy_id);

This creates a logical binding - as mentioned above there is no automatic
packet forwarding.  With this binding, whenever the user changes the state
of the proxy interface in the system (link up/down, change mac/mtu,
add/remove IPv4/IPv6) you get an appropriate notification for the bound
port.

So far we've mentioned several times that the library calls callbacks.
They are grouped in 'struct rte_ifpx_callbacks' and the user provides them
to the library via:

    rte_ifpx_callbacks_register(&cbs);

It is worth mentioning that the context (lcore/thread) in which these
callbacks are called is implementation defined.  It might differ between
platforms, so the application needs to assume that some kind of
inter lcore/thread synchronization/communication is required.

Apart from notification via callbacks this library also supports notifying
about the changes via adding events to the configured notification queues.
The queues are registered via:

    int rte_ifpx_queue_add(struct rte_ring *r);

and the actual logic used is: if there is a callback registered then it is
called; if it returns non-zero then the event is considered completed,
otherwise the event is added to each configured notification queue.  That
way the application can update data structures that are safe to be modified
by a single writer from within the callback, or do the common preprocessing
steps (if any are needed) in the callback, while data that is replicated can
be updated during handling of the queued events.

Once we have bindings in place and notification configured, the only
essential part that remains is to get the current network configuration and
start listening to its changes.  This is accomplished via a call to:

    rte_ifpx_listen();

And basically this is all one needs to understand how to use this library.
Other less essential parts include:
- ability to query what events are available for a given platform
- getting the mapping between proxy and port
- unbinding the ports from a proxy
- destroying a proxy port
- closing the listening service
- getting basic information about a proxy

Currently available features and implementation
===============================================

The library's API is system independent but it obviously needs some system
dependent parts.  We provide an exemplary Linux implementation (based on
netlink sockets).  A very similar implementation is possible for FreeBSD
(with the usage of PF_ROUTE sockets).  A Windows implementation would need
to differ considerably (probably the IP Helper library would be of some
help).
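
To make the netlink-based approach a bit more concrete, here is a minimal
sketch of how a Linux listener might extract the interface index and the new
MTU from a single RTM_NEWLINK message, which is the kind of information an
"mtu_change" notification would carry.  This is not the code from this
series; the helper name parse_link_mtu() is hypothetical and only the
standard rtnetlink macros from the kernel headers are used.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: pull if_index and MTU out of one RTM_NEWLINK message.
 * Returns 0 on success, -1 if the message has another type, is too short,
 * or carries no IFLA_MTU attribute.
 */
static int
parse_link_mtu(const struct nlmsghdr *nh, int *if_index, uint32_t *mtu)
{
	const struct ifinfomsg *ifi;
	const struct rtattr *rta;
	int len;

	if (nh->nlmsg_type != RTM_NEWLINK ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
		return -1;

	ifi = NLMSG_DATA(nh);
	len = IFLA_PAYLOAD(nh);         /* bytes of attributes after ifinfomsg */

	/* Walk the attribute list until IFLA_MTU is found. */
	for (rta = IFLA_RTA(ifi); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_MTU &&
		    RTA_PAYLOAD(rta) >= sizeof(*mtu)) {
			memcpy(mtu, RTA_DATA(rta), sizeof(*mtu));
			*if_index = ifi->ifi_index;
			return 0;
		}
	}
	return -1;
}
```

In a real listener the message would come from recv() on a NETLINK_ROUTE
socket; the resulting if_index would then be matched against the 'if_index'
of the registered proxy ports before any callback is invoked.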

Here is the list of currently implemented callbacks:

struct rte_ifpx_callbacks {
    int (*mac_change)(const struct rte_ifpx_mac_change *event);
    int (*mtu_change)(const struct rte_ifpx_mtu_change *event);
    int (*link_change)(const struct rte_ifpx_link_change *event);
    int (*addr_add)(const struct rte_ifpx_addr_change *event);
    int (*addr_del)(const struct rte_ifpx_addr_change *event);
    int (*addr6_add)(const struct rte_ifpx_addr6_change *event);
    int (*addr6_del)(const struct rte_ifpx_addr6_change *event);
    int (*route_add)(const struct rte_ifpx_route_change *event);
    int (*route_del)(const struct rte_ifpx_route_change *event);
    int (*route6_add)(const struct rte_ifpx_route6_change *event);
    int (*route6_del)(const struct rte_ifpx_route6_change *event);
    int (*neigh_add)(const struct rte_ifpx_neigh_change *event);
    int (*neigh_del)(const struct rte_ifpx_neigh_change *event);
    int (*neigh6_add)(const struct rte_ifpx_neigh6_change *event);
    int (*neigh6_del)(const struct rte_ifpx_neigh6_change *event);
    int (*cfg_done)(void);
};

They are all rather self-descriptive with the exception of the last one.
When the user calls rte_ifpx_listen() the library first queries the system
for its current configuration.  That might require several request/reply
exchanges between DPDK and the system, and once it is finished this callback
is called to let the application know that all info has been gathered.

It is also worth mentioning that while the typical case would be a 1-to-1
mapping between port and proxy, a 1-to-many mapping is also supported.  In
that case port related callbacks will be called for each port bound to the
given proxy interface, and it is the application's responsibility to define
the semantics of such a mapping (e.g. all changes apply to all ports, or
link changes apply to all but others are accepted in "round robin" fashion,
or ...).

As mentioned above the Linux implementation is based on a netlink socket.
This socket is registered as a file descriptor in EAL interrupts (similarly
to how EAL alarms are implemented).

What has changed since the RFC
==============================

- Platform dependent parts have been separated into an ifpx_platform
  structure with callbacks for initialization, getting information about
  the interface, listening to the changes and closing of the library.  That
  should allow easier reimplementation.

- Notification scheme has been changed - instead of having just callbacks,
  event queueing is now also available (or a mix of those two).

- Filtering of events only related to the proxy ports - previously all
  network configuration changes were reported.  But a DPDK application does
  not need to know the whole configuration - only the portion related to
  the proxy ports.  If a packet comes that does not match the rules then it
  can be forwarded via the proxy to the system to decide what to do with
  it.  If that is not desired and such packets should be dropped then a
  null port can be created with a proxy and e.g. a default route installed
  on it.

- Removed the previous example which was just printing notifications.
  Instead added a simplified (stripped of vectorization and other
  performance improvements) version of l3fwd that should serve as an
  example of using this library in real applications.
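
The mixed callback/queue notification scheme can be illustrated with a small
standalone model of the dispatch rule described earlier: the callback runs
first, and the event goes to every registered queue only if there is no
callback or the callback returns zero.  All types and names below (struct
event, struct event_queue, struct notifier, notify_event()) are hypothetical
and chosen for the illustration; the library itself uses its own event
structures and rte_ring queues.

```c
#include <stddef.h>

#define MAX_QUEUES  4
#define QUEUE_DEPTH 8

/* Hypothetical event; the real events carry per-type payloads. */
struct event {
	int type;
	int port_id;
};

/* Toy fixed-size queue standing in for an rte_ring. */
struct event_queue {
	struct event items[QUEUE_DEPTH];
	size_t count;
};

typedef int (*event_cb)(const struct event *ev);

struct notifier {
	event_cb cb;                            /* may be NULL */
	struct event_queue *queues[MAX_QUEUES];
	size_t nb_queues;
};

/* Example callbacks: one consumes the event, one passes it on. */
static int consume(const struct event *ev) { (void)ev; return 1; }
static int pass_on(const struct event *ev) { (void)ev; return 0; }

/*
 * Dispatch one event.  Returns the number of queues the event was added
 * to, which is 0 when the callback reported the event as completed.
 */
static size_t
notify_event(struct notifier *n, const struct event *ev)
{
	size_t i, queued = 0;

	if (n->cb != NULL && n->cb(ev) != 0)
		return 0;               /* completed inside the callback */

	for (i = 0; i < n->nb_queues; i++) {
		struct event_queue *q = n->queues[i];

		if (q->count < QUEUE_DEPTH) {   /* drop when the queue is full */
			q->items[q->count++] = *ev;
			queued++;
		}
	}
	return queued;
}
```

With 'pass_on' registered, every queue receives a copy of the event; with
'consume' registered, the queues stay untouched.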

With regards
Andrzej Ostruszka

Andrzej Ostruszka (4):
  lib: introduce IF Proxy library
  if_proxy: add library documentation
  if_proxy: add simple functionality test
  if_proxy: add example application

 MAINTAINERS                                   |    6 +
 app/test/Makefile                             |    5 +
 app/test/meson.build                          |    4 +
 app/test/test_if_proxy.c                      |  706 +++++++++++
 config/common_base                            |    5 +
 config/common_linux                           |    1 +
 doc/guides/prog_guide/if_proxy_lib.rst        |  142 +++
 doc/guides/prog_guide/index.rst               |    1 +
 examples/Makefile                             |    1 +
 examples/l3fwd-ifpx/Makefile                  |   60 +
 examples/l3fwd-ifpx/l3fwd.c                   | 1123 +++++++++++++++++
 examples/l3fwd-ifpx/l3fwd.h                   |   98 ++
 examples/l3fwd-ifpx/main.c                    |  729 +++++++++++
 examples/l3fwd-ifpx/meson.build               |   11 +
 examples/meson.build                          |    2 +-
 lib/Makefile                                  |    2 +
 .../common/include/rte_eal_interrupts.h       |    2 +
 lib/librte_eal/linux/eal/eal_interrupts.c     |   14 +-
 lib/librte_if_proxy/Makefile                  |   29 +
 lib/librte_if_proxy/if_proxy_common.c         |  494 ++++++++
 lib/librte_if_proxy/if_proxy_priv.h           |   97 ++
 lib/librte_if_proxy/linux/Makefile            |    4 +
 lib/librte_if_proxy/linux/if_proxy.c          |  552 ++++++++
 lib/librte_if_proxy/meson.build               |   19 +
 lib/librte_if_proxy/rte_if_proxy.h            |  561 ++++++++
 lib/librte_if_proxy/rte_if_proxy_version.map  |   19 +
 lib/meson.build                               |    2 +-
 27 files changed, 4683 insertions(+), 6 deletions(-)
 create mode 100644 app/test/test_if_proxy.c
 create mode 100644 doc/guides/prog_guide/if_proxy_lib.rst
 create mode 100644 examples/l3fwd-ifpx/Makefile
 create mode 100644 examples/l3fwd-ifpx/l3fwd.c
 create mode 100644 examples/l3fwd-ifpx/l3fwd.h
 create mode 100644 examples/l3fwd-ifpx/main.c
 create mode 100644 examples/l3fwd-ifpx/meson.build
 create mode 100644 lib/librte_if_proxy/Makefile
 create mode 100644 lib/librte_if_proxy/if_proxy_common.c
 create mode 100644 lib/librte_if_proxy/if_proxy_priv.h
 create mode 100644 lib/librte_if_proxy/linux/Makefile
 create mode 100644 lib/librte_if_proxy/linux/if_proxy.c
 create mode 100644 lib/librte_if_proxy/meson.build
 create mode 100644 lib/librte_if_proxy/rte_if_proxy.h
 create mode 100644 lib/librte_if_proxy/rte_if_proxy_version.map
Comments
My apologies - I forgot to run checkpatch on the series.  I will correct
the reported issues in version 2 - in the meantime please skip these minor
faults and comment on the rest.

With regards
Andrzej Ostruszka
On Fri, 6 Mar 2020 17:41:00 +0100
Andrzej Ostruszka <aostruszka@marvell.com> wrote:

> What is this useful for
> =======================
[...]

Has anyone investigated solving this in the kernel rather than
creating the added overhead of more Linux devices?

What I am thinking of is a netlink to userspace interface.
The kernel already has File-System-in-Userspace (FUSE) to allow
for filesystems. What about having a NUSE (Netlink in userspace)?

Then DPDK could have a daemon that is a provider to NUSE.
This solution would also benefit other non-DPDK projects like VPP
and allow DPDK to integrate with devlink etc.
On Thu, Apr 16, 2020 at 9:41 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 6 Mar 2020 17:41:00 +0100
> Andrzej Ostruszka <aostruszka@marvell.com> wrote:
>
[...]
>
> Has anyone investigated solving this in the kernel rather than
> creating the added overhead of more Linux devices?
>
> What I am thinking of is a netlink to userspace interface.
> The kernel already has File-System-in-Userspace (FUSE) to allow
> for filesystems. What about having a NUSE (Netlink in userspace)?

IMO, there is no issue with the Linux Netlink _userspace_ interface.
The goal of IF proxy is to abstract the OS differences so that it can
work with Linux, FreeBSD, and Windows (if needed).

>
> Then DPDK could have a daemon that is a provider to NUSE.
> This solution would also benefit other non-DPDK projects like VPP
> and allow DPDK to integrate with devlink etc.
On Thu, 16 Apr 2020 22:19:05 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Thu, Apr 16, 2020 at 9:41 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
[...]
> > Has anyone investigated solving this in the kernel rather than
> > creating the added overhead of more Linux devices?
> >
> > What I am thinking of is a netlink to userspace interface.
> > The kernel already has File-System-in-Userspace (FUSE) to allow
> > for filesystems. What about having a NUSE (Netlink in userspace)?
>
> IMO, there is no issue with the Linux Netlink _userspace_ interface.
> The goal of IF proxy is to abstract the OS differences so that it can
> work with Linux, FreeBSD, and Windows (if needed).
>
> > Then DPDK could have a daemon that is a provider to NUSE.
> > This solution would also benefit other non-DPDK projects like VPP
> > and allow DPDK to integrate with devlink etc.

With the wider use of tap devices like this, it may be a problem
for other usages of TAP. If nothing else, having to figure out which
tap is which would be error prone.

Also, TAP on Windows is only available as an out-of-tree driver
from OpenVPN. And the TAP on Windows is quite limited, deprecated,
poorly supported and buggy. There is no standard TAP-like interface
in Windows.

TAP on BSD is different than Linux and has different control functions.
I don't remember what the interface notification mechanism is on BSD,
but it is not netlink.

So is IF proxy even going to work on these other OSes?
On 4/16/20 6:49 PM, Jerin Jacob wrote:
> On Thu, Apr 16, 2020 at 9:41 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
[...]
>> Has anyone investigated solving this in the kernel rather than
>> creating the added overhead of more Linux devices?
>>
>> What I am thinking of is a netlink to userspace interface.
>> The kernel already has File-System-in-Userspace (FUSE) to allow
>> for filesystems. What about having a NUSE (Netlink in userspace)?
>
> IMO, there is no issue with the Linux Netlink _userspace_ interface.
> The goal of IF proxy is to abstract the OS differences so that it can
> work with Linux, FreeBSD, and Windows (if needed).

My understanding of Stephen's question is a bit different - Stephen,
please correct me if I'm wrong.  By the comparison with FUSE he was
thinking about providing a "kernel proxy" to a userspace-based
port/interface, which could be used not only by DPDK but by others too.

The answer from me is: no, I have not.  For two reasons:
- that would be Linux only
- if we created such a proxy, we would probably end up with a tap-like
  driver in the end

With regards
Andrzej Ostruszka
On Thu, 16 Apr 2020 17:12:07 +0000
"Andrzej Ostruszka [C]" <aostruszka@marvell.com> wrote:

> On 4/16/20 6:49 PM, Jerin Jacob wrote:
> > On Thu, Apr 16, 2020 at 9:41 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> [...]
> >> Has anyone investigated solving this in the kernel rather than
> >> creating the added overhead of more Linux devices?
> >>
> >> What I am thinking of is a netlink to userspace interface.
> >> The kernel already has File-System-in-Userspace (FUSE) to allow
> >> for filesystems. What about having a NUSE (Netlink in userspace)?
> >
> > IMO, there is no issue with the Linux Netlink _userspace_ interface.
> > The goal of IF proxy is to abstract the OS differences so that it can
> > work with Linux, FreeBSD, and Windows (if needed).
>
> My understanding of Stephen's question is a bit different - Stephen,
> please correct me if I'm wrong.  By the comparison with FUSE he was
> thinking about providing a "kernel proxy" to a userspace-based
> port/interface, which could be used not only by DPDK but by others too.
>
> The answer from me is: no, I have not.  For two reasons:
> - that would be Linux only
> - if we created such a proxy, we would probably end up with a tap-like
>   driver in the end
>
> With regards
> Andrzej Ostruszka

The point is to think of the problem beyond just DPDK.
On 4/16/20 7:04 PM, Stephen Hemminger wrote:
> On Thu, 16 Apr 2020 22:19:05 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
>> On Thu, Apr 16, 2020 at 9:41 PM Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
[...]
>>> Has anyone investigated solving this in the kernel rather than
>>> creating the added overhead of more Linux devices?
>>>
>>> What I am thinking of is a netlink to userspace interface.
>>> The kernel already has File-System-in-Userspace (FUSE) to allow
>>> for filesystems. What about having a NUSE (Netlink in userspace)?
>>
>> IMO, there is no issue with the Linux Netlink _userspace_ interface.
>> The goal of IF proxy is to abstract the OS differences so that it can
>> work with Linux, FreeBSD, and Windows (if needed).
>>
>>> Then DPDK could have a daemon that is a provider to NUSE.
>>> This solution would also benefit other non-DPDK projects like VPP
>>> and allow DPDK to integrate with devlink etc.
>
> With the wider use of tap devices like this, it may be a problem
> for other usages of TAP. If nothing else, having to figure out which
> tap is which would be error prone.

Stephen, the library does not require TAP - only some DPDK port that is
visible to the system (has a non-zero if_index).

As to the confusion - if we use TAP then it has an optional 'iface=...'
argument, so we can name those proxy interfaces 'iface=proxy0' or
something like that.  This is under the control of the application (just
call ...create_by_devarg() with the proper argument).

> Also, TAP on Windows is only available as an out-of-tree driver
> from OpenVPN. And the TAP on Windows is quite limited, deprecated,
> poorly supported and buggy. There is no standard TAP-like interface
> in Windows.
>
> TAP on BSD is different than Linux and has different control functions.
> Don't remember what the interface notification mechanism is on BSD,
> it is not netlink.
>
> So is IF proxy even going to work on these other OS?

No.  At the moment only Linux is supported.

I don't know much about Windows; it would need some TAP-like driver and
the implementation would probably make use of the "IP Helper" library (some
extra thread doing polling?).

As for FreeBSD, I'm convinced that a very similar implementation is
possible by using PF_ROUTE sockets.

What the library does to help with other platforms is that it defines the
following structure:

/* Every implementation should provide definition of this structure:
 * - init : called during library initialization (NULL when not needed)
 * - events : this should return bitmask of supported events (can be
 *   NULL if all defined events are supported by the implementation)
 * - listen : this function should start service listening to the
 *   network configuration events/changes,
 * - close : this function should close the service started by listen()
 * - get_info : this function should query system for current
 *   configuration of interface with index 'if_index'.  After
 *   successful initialization of listening service this function is
 *   called with 0 as an argument.  In that case configuration of all
 *   ports should be obtained - and when this procedure completes a
 *   RTE_IFPX_CFG_DONE event should be signaled via
 *   ifpx_notify_event().
 */
extern struct ifpx_platform_callbacks {
    void (*init)(void);
    uint64_t (*events)(void);
    int (*listen)(void);
    int (*close)(void);
    void (*get_info)(int if_index);
} ifpx_platform;

With regards
Andrzej Ostruszka
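
As an illustration of how a port to another OS might plug into the
structure quoted above, here is a minimal standalone sketch.  The structure
is copied from the message; everything else (the stub_* functions, the
EV_* bit numbers, the 'listening' flag and the platform_supports() helper)
is hypothetical and only meant to show the shape of such an implementation,
including the documented rule that a NULL 'events' member means that all
events are supported.

```c
#include <stddef.h>
#include <stdint.h>

/* Structure as quoted in the message above. */
struct ifpx_platform_callbacks {
	void (*init)(void);
	uint64_t (*events)(void);
	int (*listen)(void);
	int (*close)(void);
	void (*get_info)(int if_index);
};

/* Hypothetical event bit numbers; the real ones come from the library. */
enum { EV_MTU_CHANGE = 0, EV_LINK_CHANGE = 1 };

static int listening;   /* toy state standing in for an OS socket */

/* This imaginary platform only reports MTU and link changes. */
static uint64_t stub_events(void)
{
	return (UINT64_C(1) << EV_MTU_CHANGE) |
	       (UINT64_C(1) << EV_LINK_CHANGE);
}

static int stub_listen(void) { listening = 1; return 0; }
static int stub_close(void)  { listening = 0; return 0; }

/* A real port would query the OS here (all interfaces when if_index == 0). */
static void stub_get_info(int if_index) { (void)if_index; }

struct ifpx_platform_callbacks ifpx_platform = {
	.init = NULL,                   /* nothing to initialize */
	.events = stub_events,
	.listen = stub_listen,
	.close = stub_close,
	.get_info = stub_get_info,
};

/* How common code might check whether an event is available. */
static int platform_supports(unsigned int ev_bit)
{
	if (ifpx_platform.events == NULL)
		return 1;               /* NULL: all events supported */
	return (int)((ifpx_platform.events() >> ev_bit) & 1);
}
```

The common part of the library would then call ifpx_platform.listen() and
ifpx_platform.get_info(0) without knowing which OS mechanism sits behind
them.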
On Thu, Apr 16, 2020 at 10:34 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Thu, 16 Apr 2020 22:19:05 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
[...]
>
> With the wider use of tap devices like this, it may be a problem
> for other usages of TAP. If nothing else, having to figure out which
> tap is which would be error prone.
>
> Also, TAP on Windows is only available as an out-of-tree driver
> from OpenVPN. And the TAP on Windows is quite limited, deprecated,
> poorly supported and buggy. There is no standard TAP-like interface
> in Windows.
>
> TAP on BSD is different than Linux and has different control functions.
> Don't remember what the interface notification mechanism is on BSD,
> it is not netlink.
>
> So is IF proxy even going to work on these other OS?

I don't know about Windows. BSD has a control interface.

The library provides the abstraction, the public API definitions and the
driver interface. It is up to the implementer to implement the driver API
for a specific EAL environment. That would help us avoid calling
Linux-specific interfaces directly in the DPDK application.