[dpdk-dev,v3,6/8] vhost: handle VHOST_USER_SEND_RARP request
Commit Message
While in former patch we enabled GUEST_ANNOUNCE feature, so that the
guest OS will broadcast a GARP message after migration to notify the
switch about the new location of migrated VM, the thing is that
GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
VHOST_USER_SEND_RARP request comes to rescue.
The payload of this new request is the mac address of the migrated VM,
with that, we could construct a RARP message, and then broadcast it
to host interfaces.
That's how this patch works:
- list all interfaces, with the help of SIOCGIFCONF ioctl command
- construct an RARP message and broadcast it
Cc: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
Note that this patch did take effect in my test:
- it indeed updated target vswitch's mac learning table. (with
the "ovs fdb/show bridge" command)
- the ping request packets after migration were indeeded flowed
to the target (but not the source) host's vswitch. (with tcpdump
command)
However, I still saw ping lost. I asked help from Thibaut, the
original author of the VHOST_USER_SEND_RARP request, he suggested
that it might be an issue of my network topo, or ovs settings, which
is likely, regarding to what I observed above.
Anyway, I'd like to send this out, hopefully someone knows what's
wrong there if there any. In the meantime, I will do more debugs.
---
lib/librte_vhost/vhost_user/vhost-net-user.c | 4 +
lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
lib/librte_vhost/vhost_user/virtio-net-user.c | 125 ++++++++++++++++++++++++++
lib/librte_vhost/vhost_user/virtio-net-user.h | 5 +-
4 files changed, 134 insertions(+), 1 deletion(-)
Comments
Hi Yuanhan,
On 1/29/2016 12:58 PM, Yuanhan Liu wrote:
> While in former patch we enabled GUEST_ANNOUNCE feature, so that the
> guest OS will broadcast a GARP message after migration to notify the
> switch about the new location of migrated VM, the thing is that
> GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
> VHOST_USER_SEND_RARP request comes to rescue.
>
> The payload of this new request is the mac address of the migrated VM,
> with that, we could construct a RARP message, and then broadcast it
> to host interfaces.
>
> That's how this patch works:
>
> - list all interfaces, with the help of SIOCGIFCONF ioctl command
>
> - construct an RARP message and broadcast it
>
> Cc: Thibaut Collet <thibaut.collet@6wind.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
...
> +
> +/*
> + * Broadcast a RARP message to all interfaces, to update
> + * switch's mac table
> + */
> +int
> +user_send_rarp(struct VhostUserMsg *msg)
> +{
> + uint8_t *mac = (uint8_t *)&msg->payload.u64;
> + uint8_t rarp[RARP_BUF_SIZE];
> + struct ifconf ifc = {0, };
> + struct ifreq *ifr;
> + int nr = 16;
> + int fd;
> + uint32_t i;
> +
> + RTE_LOG(DEBUG, VHOST_CONFIG,
> + ":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
> + mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> +
> + make_rarp_packet(rarp, mac);
> +
> + /*
> + * Get all interfaces
> + */
> + fd = socket(AF_INET, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("failed to create AF_INET socket");
> + return -1;
> + }
> +
> +again:
> + ifc.ifc_len = sizeof(*ifr) * nr;
> + ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
> +
> + if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
> + perror("failed at SIOCGIFCONF");
> + close(fd);
> + return -1;
> + }
> +
> + if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
> + /*
> + * current ifc_buf is not big enough to hold
> + * all interfaces; double it and try again.
> + */
> + nr *= 2;
> + goto again;
> + }
> +
> + ifr = (struct ifreq *)ifc.ifc_buf;
> + for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
> + send_rarp(ifr[i].ifr_name, rarp);
> +
> + close(fd);
> +
> + return 0;
> +}
From how you implement user_send_rarp(), if I understand it correctly,
it broadcasts this ARP packets to all host interfaces, which I don't
think it's appropriate. This ARP packets should be sent to it's own L2
networking. You should not make the hypothesis that all interfaces
maintained in the kernel are in the same L2 networking. Even worse, this
could bring problems when used in overlay networking, in which two VM in
two different overlay networking, can have same MAC address.
What I suggest here is to move user_send_rarp() to
rte_vhost_dequeue_burst() using a flag to control, so that this arp
packet can be broadcasted in its own L2 network.
Thanks,
Jianfeng
On Fri, Feb 19, 2016 at 02:11:36PM +0800, Tan, Jianfeng wrote:
> Hi Yuanhan,
>
> On 1/29/2016 12:58 PM, Yuanhan Liu wrote:
> >While in former patch we enabled GUEST_ANNOUNCE feature, so that the
> >guest OS will broadcast a GARP message after migration to notify the
> >switch about the new location of migrated VM, the thing is that
> >GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
> >VHOST_USER_SEND_RARP request comes to rescue.
> >
> >The payload of this new request is the mac address of the migrated VM,
> >with that, we could construct a RARP message, and then broadcast it
> >to host interfaces.
> >
> >That's how this patch works:
> >
> >- list all interfaces, with the help of SIOCGIFCONF ioctl command
> >
> >- construct an RARP message and broadcast it
> >
> >Cc: Thibaut Collet <thibaut.collet@6wind.com>
> >Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >---
> ...
> >+
> >+/*
> >+ * Broadcast a RARP message to all interfaces, to update
> >+ * switch's mac table
> >+ */
> >+int
> >+user_send_rarp(struct VhostUserMsg *msg)
> >+{
> >+ uint8_t *mac = (uint8_t *)&msg->payload.u64;
> >+ uint8_t rarp[RARP_BUF_SIZE];
> >+ struct ifconf ifc = {0, };
> >+ struct ifreq *ifr;
> >+ int nr = 16;
> >+ int fd;
> >+ uint32_t i;
> >+
> >+ RTE_LOG(DEBUG, VHOST_CONFIG,
> >+ ":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
> >+ mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> >+
> >+ make_rarp_packet(rarp, mac);
> >+
> >+ /*
> >+ * Get all interfaces
> >+ */
> >+ fd = socket(AF_INET, SOCK_DGRAM, 0);
> >+ if (fd < 0) {
> >+ perror("failed to create AF_INET socket");
> >+ return -1;
> >+ }
> >+
> >+again:
> >+ ifc.ifc_len = sizeof(*ifr) * nr;
> >+ ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
> >+
> >+ if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
> >+ perror("failed at SIOCGIFCONF");
> >+ close(fd);
> >+ return -1;
> >+ }
> >+
> >+ if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
> >+ /*
> >+ * current ifc_buf is not big enough to hold
> >+ * all interfaces; double it and try again.
> >+ */
> >+ nr *= 2;
> >+ goto again;
> >+ }
> >+
> >+ ifr = (struct ifreq *)ifc.ifc_buf;
> >+ for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
> >+ send_rarp(ifr[i].ifr_name, rarp);
> >+
> >+ close(fd);
> >+
> >+ return 0;
> >+}
>
> From how you implement user_send_rarp(), if I understand it correctly, it
> broadcasts this ARP packets to all host interfaces, which I don't think it's
> appropriate. This ARP packets should be sent to it's own L2 networking. You
> should not make the hypothesis that all interfaces maintained in the kernel
> are in the same L2 networking. Even worse, this could bring problems when
> used in overlay networking, in which two VM in two different overlay
> networking, can have same MAC address.
>
> What I suggest here is to move user_send_rarp() to rte_vhost_dequeue_burst()
> using a flag to control, so that this arp packet can be broadcasted in its
> own L2 network.
I have thought of that, too. It was given up because SEND_RARP request was
handled in different thread from rte_vhost_dequeue_burst(), leading to the
fact that the RARP packet will not be broadcasted immediately after migration
is done: it will be broadcasted only when rte_vhost_dequeue_burst() is invoked.
I was thinking the delay might be a problem. While thinking it twice, it
doesn't look like one then. As GUEST_ANNOUNCE is also broadcasted by
rte_vhost_dequeue_burst(); it's enqueued by guest kernel though. And
judging that we are polling mode driver, it won't be an issue then.
So, thanks. I will give it a quick try; it should work.
--yliu
On Fri, Feb 19, 2016 at 03:03:26PM +0800, Yuanhan Liu wrote:
> On Fri, Feb 19, 2016 at 02:11:36PM +0800, Tan, Jianfeng wrote:
> > What I suggest here is to move user_send_rarp() to rte_vhost_dequeue_burst()
> > using a flag to control, so that this arp packet can be broadcasted in its
> > own L2 network.
>
> I have thought of that, too. It was given up because SEND_RARP request was
> handled in different thread from rte_vhost_dequeue_burst(), leading to the
> fact that the RARP packet will not be broadcasted immediately after migration
> is done: it will be broadcasted only when rte_vhost_dequeue_burst() is invoked.
>
> I was thinking the delay might be a problem. While thinking it twice, it
> doesn't look like one then. As GUEST_ANNOUNCE is also broadcasted by
> rte_vhost_dequeue_burst(); it's enqueued by guest kernel though. And
> judging that we are polling mode driver, it won't be an issue then.
>
> So, thanks. I will give it a quick try; it should work.
It worked like a charm :) Thanks.
--yliu
@@ -100,6 +100,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_SET_PROTOCOL_FEATURES] = "VHOST_USER_SET_PROTOCOL_FEATURES",
[VHOST_USER_GET_QUEUE_NUM] = "VHOST_USER_GET_QUEUE_NUM",
[VHOST_USER_SET_VRING_ENABLE] = "VHOST_USER_SET_VRING_ENABLE",
+ [VHOST_USER_SEND_RARP] = "VHOST_USER_SEND_RARP",
};
/**
@@ -437,6 +438,9 @@ vserver_message_handler(int connfd, void *dat, int *remove)
case VHOST_USER_SET_VRING_ENABLE:
user_set_vring_enable(ctx, &msg.payload.state);
break;
+ case VHOST_USER_SEND_RARP:
+ user_send_rarp(&msg);
+ break;
default:
break;
@@ -67,6 +67,7 @@ typedef enum VhostUserRequest {
VHOST_USER_SET_PROTOCOL_FEATURES = 16,
VHOST_USER_GET_QUEUE_NUM = 17,
VHOST_USER_SET_VRING_ENABLE = 18,
+ VHOST_USER_SEND_RARP = 19,
VHOST_USER_MAX
} VhostUserRequest;
@@ -34,11 +34,18 @@
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
+#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <net/ethernet.h>
+#include <netinet/in.h>
+#include <netinet/if_ether.h>
+#include <linux/if_packet.h>
#include <rte_common.h>
#include <rte_log.h>
@@ -413,3 +420,121 @@ user_set_log_base(struct vhost_device_ctx ctx,
return 0;
}
+
+#define RARP_BUF_SIZE 64
+
+static void
+make_rarp_packet(uint8_t *buf, uint8_t *mac)
+{
+ struct ether_header *eth_hdr;
+ struct ether_arp *rarp;
+
+ /* Ethernet header. */
+ eth_hdr = (struct ether_header *)buf;
+ memset(ð_hdr->ether_dhost, 0xff, ETH_ALEN);
+ memcpy(ð_hdr->ether_shost, mac, ETH_ALEN);
+ eth_hdr->ether_type = htons(ETH_P_RARP);
+
+ /* RARP header. */
+ rarp = (struct ether_arp *)(eth_hdr + 1);
+ rarp->ea_hdr.ar_hrd = htons(ARPHRD_ETHER);
+ rarp->ea_hdr.ar_pro = htons(ETHERTYPE_IP);
+ rarp->ea_hdr.ar_hln = ETH_ALEN;
+ rarp->ea_hdr.ar_pln = 4;
+ rarp->ea_hdr.ar_op = htons(ARPOP_RREQUEST);
+
+ memcpy(&rarp->arp_sha, mac, ETH_ALEN);
+ memset(&rarp->arp_spa, 0x00, 4);
+ memcpy(&rarp->arp_tha, mac, 6);
+ memset(&rarp->arp_tpa, 0x00, 4);
+}
+
+
+static void
+send_rarp(const char *ifname, uint8_t *rarp)
+{
+ int fd;
+ struct ifreq ifr;
+ struct sockaddr_ll addr;
+
+ fd = socket(AF_PACKET, SOCK_RAW, 0);
+ if (fd < 0) {
+ perror("socket failed");
+ return;
+ }
+
+ memset(&ifr, 0, sizeof(struct ifreq));
+ strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
+ if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) {
+ perror("failed to get interface index");
+ close(fd);
+ return;
+ }
+
+ addr.sll_ifindex = ifr.ifr_ifindex;
+ addr.sll_halen = ETH_ALEN;
+
+ if (sendto(fd, rarp, RARP_BUF_SIZE, 0,
+ (const struct sockaddr*)&addr, sizeof(addr)) < 0) {
+ perror("send rarp packet failed");
+ }
+}
+
+
+/*
+ * Broadcast a RARP message to all interfaces, to update
+ * switch's mac table
+ */
+int
+user_send_rarp(struct VhostUserMsg *msg)
+{
+ uint8_t *mac = (uint8_t *)&msg->payload.u64;
+ uint8_t rarp[RARP_BUF_SIZE];
+ struct ifconf ifc = {0, };
+ struct ifreq *ifr;
+ int nr = 16;
+ int fd;
+ uint32_t i;
+
+ RTE_LOG(DEBUG, VHOST_CONFIG,
+ ":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
+ mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+
+ make_rarp_packet(rarp, mac);
+
+ /*
+ * Get all interfaces
+ */
+ fd = socket(AF_INET, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("failed to create AF_INET socket");
+ return -1;
+ }
+
+again:
+ ifc.ifc_len = sizeof(*ifr) * nr;
+ ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
+
+ if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
+ perror("failed at SIOCGIFCONF");
+ close(fd);
+ return -1;
+ }
+
+ if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
+ /*
+ * current ifc_buf is not big enough to hold
+ * all interfaces; double it and try again.
+ */
+ nr *= 2;
+ goto again;
+ }
+
+ ifr = (struct ifreq *)ifc.ifc_buf;
+ for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
+ send_rarp(ifr[i].ifr_name, rarp);
+
+ close(fd);
+
+ return 0;
+}
@@ -38,8 +38,10 @@
#include "vhost-net-user.h"
#define VHOST_USER_PROTOCOL_F_MQ 0
+#define VHOST_USER_PROTOCOL_F_RARP 2
-#define VHOST_USER_PROTOCOL_FEATURES (1ULL << VHOST_USER_PROTOCOL_F_MQ)
+#define VHOST_USER_PROTOCOL_FEATURES ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
+ (1ULL << VHOST_USER_PROTOCOL_F_RARP))
int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
@@ -50,6 +52,7 @@ void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *);
void user_set_protocol_features(struct vhost_device_ctx ctx,
uint64_t protocol_features);
int user_set_log_base(struct vhost_device_ctx ctx, struct VhostUserMsg *);
+int user_send_rarp(struct VhostUserMsg *);
int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);