[dpdk-dev,v3,6/8] vhost: handle VHOST_USER_SEND_RARP request

Message ID 1454043483-24579-7-git-send-email-yuanhan.liu@linux.intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Yuanhan Liu Jan. 29, 2016, 4:58 a.m. UTC
  While in former patch we enabled GUEST_ANNOUNCE feature, so that the
guest OS will broadcast a GARP message after migration to notify the
switch about the new location of migrated VM, the thing is that
GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
VHOST_USER_SEND_RARP request comes to rescue.

The payload of this new request is the mac address of the migrated VM,
with that, we could construct a RARP message, and then broadcast it
to host interfaces.

That's how this patch works:

- list all interfaces, with the help of SIOCGIFCONF ioctl command

- construct an RARP message and broadcast it

Cc: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---

  Note that this patch did take effect in my test:

  - it indeed updated target vswitch's mac learning table. (with
    the "ovs fdb/show bridge" command)

  - the ping request packets after migration were indeeded flowed
    to the target (but not the source) host's vswitch. (with tcpdump
    command)

  However, I still saw ping lost. I asked help from Thibaut, the
  original author of the VHOST_USER_SEND_RARP request, he suggested
  that it might be an issue of my network topo, or ovs settings, which
  is likely, regarding to what I observed above.

  Anyway, I'd like to send this out, hopefully someone knows what's
  wrong there if there any.  In the meantime, I will do more debugs.
---
 lib/librte_vhost/vhost_user/vhost-net-user.c  |   4 +
 lib/librte_vhost/vhost_user/vhost-net-user.h  |   1 +
 lib/librte_vhost/vhost_user/virtio-net-user.c | 125 ++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |   5 +-
 4 files changed, 134 insertions(+), 1 deletion(-)
  

Comments

Jianfeng Tan Feb. 19, 2016, 6:11 a.m. UTC | #1
Hi Yuanhan,

On 1/29/2016 12:58 PM, Yuanhan Liu wrote:
> While in former patch we enabled GUEST_ANNOUNCE feature, so that the
> guest OS will broadcast a GARP message after migration to notify the
> switch about the new location of migrated VM, the thing is that
> GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
> VHOST_USER_SEND_RARP request comes to rescue.
>
> The payload of this new request is the mac address of the migrated VM,
> with that, we could construct a RARP message, and then broadcast it
> to host interfaces.
>
> That's how this patch works:
>
> - list all interfaces, with the help of SIOCGIFCONF ioctl command
>
> - construct an RARP message and broadcast it
>
> Cc: Thibaut Collet <thibaut.collet@6wind.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
...
> +
> +/*
> + * Broadcast a RARP message to all interfaces, to update
> + * switch's mac table
> + */
> +int
> +user_send_rarp(struct VhostUserMsg *msg)
> +{
> +	uint8_t *mac = (uint8_t *)&msg->payload.u64;
> +	uint8_t rarp[RARP_BUF_SIZE];
> +	struct ifconf ifc = {0, };
> +	struct ifreq *ifr;
> +	int nr = 16;
> +	int fd;
> +	uint32_t i;
> +
> +	RTE_LOG(DEBUG, VHOST_CONFIG,
> +		":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
> +		mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> +
> +	make_rarp_packet(rarp, mac);
> +
> +	/*
> +	 * Get all interfaces
> +	 */
> +	fd = socket(AF_INET, SOCK_DGRAM, 0);
> +	if (fd < 0) {
> +		perror("failed to create AF_INET socket");
> +		return -1;
> +	}
> +
> +again:
> +	ifc.ifc_len = sizeof(*ifr) * nr;
> +	ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
> +
> +	if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
> +		perror("failed at SIOCGIFCONF");
> +		close(fd);
> +		return -1;
> +	}
> +
> +	if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
> +		/*
> +		 * current ifc_buf is not big enough to hold
> +		 * all interfaces; double it and try again.
> +		 */
> +		nr *= 2;
> +		goto again;
> +	}
> +
> +	ifr = (struct ifreq *)ifc.ifc_buf;
> +	for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
> +		send_rarp(ifr[i].ifr_name, rarp);
> +
> +	close(fd);
> +
> +	return 0;
> +}

 From how you implement user_send_rarp(), if I understand it correctly, 
it broadcasts this ARP packets to all host interfaces, which I don't 
think it's appropriate. This ARP packets should be sent to it's own L2 
networking. You should not make the hypothesis that all interfaces 
maintained in the kernel are in the same L2 networking. Even worse, this 
could bring problems when used in overlay networking, in which two VM in 
two different overlay networking, can have same MAC address.

What I suggest here is to move user_send_rarp() to 
rte_vhost_dequeue_burst() using a flag to control, so that this arp 
packet can be broadcasted in its own L2 network.

Thanks,
Jianfeng
  
Yuanhan Liu Feb. 19, 2016, 7:03 a.m. UTC | #2
On Fri, Feb 19, 2016 at 02:11:36PM +0800, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> On 1/29/2016 12:58 PM, Yuanhan Liu wrote:
> >While in former patch we enabled GUEST_ANNOUNCE feature, so that the
> >guest OS will broadcast a GARP message after migration to notify the
> >switch about the new location of migrated VM, the thing is that
> >GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
> >VHOST_USER_SEND_RARP request comes to rescue.
> >
> >The payload of this new request is the mac address of the migrated VM,
> >with that, we could construct a RARP message, and then broadcast it
> >to host interfaces.
> >
> >That's how this patch works:
> >
> >- list all interfaces, with the help of SIOCGIFCONF ioctl command
> >
> >- construct an RARP message and broadcast it
> >
> >Cc: Thibaut Collet <thibaut.collet@6wind.com>
> >Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >---
> ...
> >+
> >+/*
> >+ * Broadcast a RARP message to all interfaces, to update
> >+ * switch's mac table
> >+ */
> >+int
> >+user_send_rarp(struct VhostUserMsg *msg)
> >+{
> >+	uint8_t *mac = (uint8_t *)&msg->payload.u64;
> >+	uint8_t rarp[RARP_BUF_SIZE];
> >+	struct ifconf ifc = {0, };
> >+	struct ifreq *ifr;
> >+	int nr = 16;
> >+	int fd;
> >+	uint32_t i;
> >+
> >+	RTE_LOG(DEBUG, VHOST_CONFIG,
> >+		":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
> >+		mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> >+
> >+	make_rarp_packet(rarp, mac);
> >+
> >+	/*
> >+	 * Get all interfaces
> >+	 */
> >+	fd = socket(AF_INET, SOCK_DGRAM, 0);
> >+	if (fd < 0) {
> >+		perror("failed to create AF_INET socket");
> >+		return -1;
> >+	}
> >+
> >+again:
> >+	ifc.ifc_len = sizeof(*ifr) * nr;
> >+	ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
> >+
> >+	if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
> >+		perror("failed at SIOCGIFCONF");
> >+		close(fd);
> >+		return -1;
> >+	}
> >+
> >+	if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
> >+		/*
> >+		 * current ifc_buf is not big enough to hold
> >+		 * all interfaces; double it and try again.
> >+		 */
> >+		nr *= 2;
> >+		goto again;
> >+	}
> >+
> >+	ifr = (struct ifreq *)ifc.ifc_buf;
> >+	for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
> >+		send_rarp(ifr[i].ifr_name, rarp);
> >+
> >+	close(fd);
> >+
> >+	return 0;
> >+}
> 
> From how you implement user_send_rarp(), if I understand it correctly, it
> broadcasts this ARP packets to all host interfaces, which I don't think it's
> appropriate. This ARP packets should be sent to it's own L2 networking. You
> should not make the hypothesis that all interfaces maintained in the kernel
> are in the same L2 networking. Even worse, this could bring problems when
> used in overlay networking, in which two VM in two different overlay
> networking, can have same MAC address.
> 
> What I suggest here is to move user_send_rarp() to rte_vhost_dequeue_burst()
> using a flag to control, so that this arp packet can be broadcasted in its
> own L2 network.

I have thought of that, too. It was given up because SEND_RARP request was
handled in different thread from rte_vhost_dequeue_burst(), leading to the
fact that the RARP packet will not be broadcasted immediately after migration
is done: it will be broadcasted only when rte_vhost_dequeue_burst() is invoked.

I was thinking the delay might be a problem. While thinking it twice, it
doesn't look like one then. As GUEST_ANNOUNCE is also broadcasted by
rte_vhost_dequeue_burst(); it's enqueued by guest kernel though. And
judging that we are polling mode driver, it won't be an issue then.

So, thanks. I will give it a quick try; it should work.

	--yliu
  
Yuanhan Liu Feb. 19, 2016, 8:55 a.m. UTC | #3
On Fri, Feb 19, 2016 at 03:03:26PM +0800, Yuanhan Liu wrote:
> On Fri, Feb 19, 2016 at 02:11:36PM +0800, Tan, Jianfeng wrote:
> > What I suggest here is to move user_send_rarp() to rte_vhost_dequeue_burst()
> > using a flag to control, so that this arp packet can be broadcasted in its
> > own L2 network.
> 
> I have thought of that, too. It was given up because SEND_RARP request was
> handled in different thread from rte_vhost_dequeue_burst(), leading to the
> fact that the RARP packet will not be broadcasted immediately after migration
> is done: it will be broadcasted only when rte_vhost_dequeue_burst() is invoked.
> 
> I was thinking the delay might be a problem. While thinking it twice, it
> doesn't look like one then. As GUEST_ANNOUNCE is also broadcasted by
> rte_vhost_dequeue_burst(); it's enqueued by guest kernel though. And
> judging that we are polling mode driver, it won't be an issue then.
> 
> So, thanks. I will give it a quick try; it should work.

It worked like a charm :) Thanks.

	--yliu
  

Patch

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 32ad6f6..cb18396 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -100,6 +100,7 @@  static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
 	[VHOST_USER_GET_QUEUE_NUM]  = "VHOST_USER_GET_QUEUE_NUM",
 	[VHOST_USER_SET_VRING_ENABLE]  = "VHOST_USER_SET_VRING_ENABLE",
+	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 };
 
 /**
@@ -437,6 +438,9 @@  vserver_message_handler(int connfd, void *dat, int *remove)
 	case VHOST_USER_SET_VRING_ENABLE:
 		user_set_vring_enable(ctx, &msg.payload.state);
 		break;
+	case VHOST_USER_SEND_RARP:
+		user_send_rarp(&msg);
+		break;
 
 	default:
 		break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 6d252a3..e3bb413 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -67,6 +67,7 @@  typedef enum VhostUserRequest {
 	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 	VHOST_USER_GET_QUEUE_NUM = 17,
 	VHOST_USER_SET_VRING_ENABLE = 18,
+	VHOST_USER_SEND_RARP = 19,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 0f3b163..cda330d 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -34,11 +34,18 @@ 
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <string.h>
 #include <unistd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <net/ethernet.h>
+#include <netinet/in.h>
+#include <netinet/if_ether.h>
+#include <linux/if_packet.h>
 
 #include <rte_common.h>
 #include <rte_log.h>
@@ -413,3 +420,121 @@  user_set_log_base(struct vhost_device_ctx ctx,
 
 	return 0;
 }
+
+#define RARP_BUF_SIZE	64
+
+static void
+make_rarp_packet(uint8_t *buf, uint8_t *mac)
+{
+	struct ether_header *eth_hdr;
+	struct ether_arp *rarp;
+
+	/* Ethernet header. */
+	eth_hdr = (struct ether_header *)buf;
+	memset(&eth_hdr->ether_dhost, 0xff, ETH_ALEN);
+	memcpy(&eth_hdr->ether_shost, mac,  ETH_ALEN);
+	eth_hdr->ether_type = htons(ETH_P_RARP);
+
+	/* RARP header. */
+	rarp = (struct ether_arp *)(eth_hdr + 1);
+	rarp->ea_hdr.ar_hrd = htons(ARPHRD_ETHER);
+	rarp->ea_hdr.ar_pro = htons(ETHERTYPE_IP);
+	rarp->ea_hdr.ar_hln = ETH_ALEN;
+	rarp->ea_hdr.ar_pln = 4;
+	rarp->ea_hdr.ar_op  = htons(ARPOP_RREQUEST);
+
+	memcpy(&rarp->arp_sha, mac, ETH_ALEN);
+	memset(&rarp->arp_spa, 0x00, 4);
+	memcpy(&rarp->arp_tha, mac, 6);
+	memset(&rarp->arp_tpa, 0x00, 4);
+}
+
+
+static void
+send_rarp(const char *ifname, uint8_t *rarp)
+{
+	int fd;
+	struct ifreq ifr;
+	struct sockaddr_ll addr;
+
+	fd = socket(AF_PACKET, SOCK_RAW, 0);
+	if (fd < 0) {
+		perror("socket failed");
+		return;
+	}
+
+	memset(&ifr, 0, sizeof(struct ifreq));
+	strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
+	if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) {
+		perror("failed to get interface index");
+		close(fd);
+		return;
+	}
+
+	addr.sll_ifindex = ifr.ifr_ifindex;
+	addr.sll_halen   = ETH_ALEN;
+
+	if (sendto(fd, rarp, RARP_BUF_SIZE, 0,
+		   (const struct sockaddr*)&addr, sizeof(addr)) < 0) {
+		perror("send rarp packet failed");
+	}
+}
+
+
+/*
+ * Broadcast a RARP message to all interfaces, to update
+ * switch's mac table
+ */
+int
+user_send_rarp(struct VhostUserMsg *msg)
+{
+	uint8_t *mac = (uint8_t *)&msg->payload.u64;
+	uint8_t rarp[RARP_BUF_SIZE];
+	struct ifconf ifc = {0, };
+	struct ifreq *ifr;
+	int nr = 16;
+	int fd;
+	uint32_t i;
+
+	RTE_LOG(DEBUG, VHOST_CONFIG,
+		":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
+		mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
+
+	make_rarp_packet(rarp, mac);
+
+	/*
+	 * Get all interfaces
+	 */
+	fd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("failed to create AF_INET socket");
+		return -1;
+	}
+
+again:
+	ifc.ifc_len = sizeof(*ifr) * nr;
+	ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
+
+	if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
+		perror("failed at SIOCGIFCONF");
+		close(fd);
+		return -1;
+	}
+
+	if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
+		/*
+		 * current ifc_buf is not big enough to hold
+		 * all interfaces; double it and try again.
+		 */
+		nr *= 2;
+		goto again;
+	}
+
+	ifr = (struct ifreq *)ifc.ifc_buf;
+	for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
+		send_rarp(ifr[i].ifr_name, rarp);
+
+	close(fd);
+
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index 013cf38..1e9ff9a 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -38,8 +38,10 @@ 
 #include "vhost-net-user.h"
 
 #define VHOST_USER_PROTOCOL_F_MQ	0
+#define VHOST_USER_PROTOCOL_F_RARP	2
 
-#define VHOST_USER_PROTOCOL_FEATURES	(1ULL << VHOST_USER_PROTOCOL_F_MQ)
+#define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_RARP))
 
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
 
@@ -50,6 +52,7 @@  void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *);
 void user_set_protocol_features(struct vhost_device_ctx ctx,
 				uint64_t protocol_features);
 int user_set_log_base(struct vhost_device_ctx ctx, struct VhostUserMsg *);
+int user_send_rarp(struct VhostUserMsg *);
 
 int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);