[dpdk-dev,v3,2/2] net/virtio: support GUEST ANNOUNCE

Message ID 1515051700-117262-3-git-send-email-xiao.w.wang@intel.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Xiao Wang Jan. 4, 2018, 7:41 a.m. UTC
  When live migration is done, for the backup VM, either the virtio
frontend or the vhost backend needs to send out gratuitous RARP packet
to announce its new network location.

This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support live
migration scenario where the vhost backend doesn't have the ability to
generate RARP packet.

Brief introduction of the work flow:
1. QEMU finishes live migration, pokes the backup VM with an interrupt.
2. Virtio interrupt handler reads out the interrupt status value, and
   realizes it needs to send out RARP packet to announce its location.
3. Pause device to stop worker thread touching the queues.
4. Inject a RARP packet into a Tx Queue.
5. Ack the interrupt via control queue.
6. Resume device to continue packet processing.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v3:
- Remove Tx function code duplication, use a special pointer for rarp
  injection.
- Rename function generate_rarp to virtio_notify_peers, replace
  'virtnet_' with 'virtio_'.
- Add comment for state_lock.
- Typo fix and comment improvement.

v2:
- Use rte_pktmbuf_mtod directly to get eth_hdr addr.
- Fix virtio_dev_pause return value check.
---
 drivers/net/virtio/virtio_ethdev.c      | 138 +++++++++++++++++++++++++++++++-
 drivers/net/virtio/virtio_ethdev.h      |   1 +
 drivers/net/virtio/virtio_pci.h         |   7 ++
 drivers/net/virtio/virtio_rxtx.c        |   2 +-
 drivers/net/virtio/virtio_rxtx_simple.c |   2 +-
 drivers/net/virtio/virtqueue.h          |  11 +++
 6 files changed, 157 insertions(+), 4 deletions(-)
  

Comments

Tiwei Bie Jan. 4, 2018, 2:51 a.m. UTC | #1
Hi Xiao,

On Wed, Jan 03, 2018 at 11:41:40PM -0800, Xiao Wang wrote:
[...]
> +static int
> +virtio_dev_pause(struct rte_eth_dev *dev)
> +{
> +	struct virtio_hw *hw = dev->data->dev_private;
> +
> +	if (hw->started == 0)
> +		return -1;
> +	hw->started = 0;
> +	/*
> +	 * Prevent the worker thread from touching queues to avoid contention,
> +	 * 1 ms should be enough for the ongoing Tx function to finish.
> +	 */
> +	rte_delay_ms(1);
> +	return 0;
> +}
> +
> +static void
> +virtio_dev_resume(struct rte_eth_dev *dev)
> +{
> +	struct virtio_hw *hw = dev->data->dev_private;
> +
> +	hw->started = 1;
> +}

Based on your current implementation, hw->state_lock needs to
be held during a call of virtio_dev_pause()..virtio_dev_resume().
So I think the code would be more readable and much easier to
use if we take the lock in virtio_dev_pause() and release the
lock in virtio_dev_resume().

> +
> +static void
> +virtio_notify_peers(struct rte_eth_dev *dev)
> +{
> +	struct virtio_hw *hw = dev->data->dev_private;
> +	struct virtnet_tx *txvq = dev->data->tx_queues[0];
> +	struct virtnet_rx *rxvq = dev->data->rx_queues[0];
> +
> +	hw->rarp_buf[0] = rte_mbuf_raw_alloc(rxvq->mpool);
> +	if (hw->rarp_buf[0] == NULL) {
> +		PMD_DRV_LOG(ERR, "first mbuf allocate failed");
> +		return;
> +	}
> +
> +	if (make_rarp_packet(hw->rarp_buf[0],
> +				(struct ether_addr *)hw->mac_addr)) {
> +		rte_pktmbuf_free(hw->rarp_buf[0]);
> +		return;
> +	}
> +
> +	/* If virtio port just stopped, no need to send RARP */
> +	if (virtio_dev_pause(dev) < 0) {
> +		rte_pktmbuf_free(hw->rarp_buf[0]);
> +		return;
> +	}
> +
> +	dev->tx_pkt_burst(txvq, hw->rarp_buf, 1);

You have already provided virtio_dev_pause()/virtio_dev_resume().
I think you can also make this part generic and provide an inject
function, e.g.:

uint16_t
virtio_inject_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
nb_pkts)
{
	......

	txvq->inject_pkts = tx_pkts;
	nb_tx = dev->tx_pkt_burst(txvq, tx_pkts, nb_pkts);
	txvq->inject_pkts = NULL;

	return nb_tx;
}

And you can introduce virtio_dev_pause()/virtio_dev_resume()/
virtio_injec... in a separate patch. And introduce the GUEST
ANNOUNCE support in the third patch.

Thanks,
Tiwei
  
Xiao Wang Jan. 4, 2018, 7:11 a.m. UTC | #2
Hi Tiwei,

> -----Original Message-----

> From: Bie, Tiwei

> Sent: Thursday, January 4, 2018 10:51 AM

> To: Wang, Xiao W <xiao.w.wang@intel.com>

> Cc: dev@dpdk.org; yliu@fridaylinux.org; stephen@networkplumber.org

> Subject: Re: [PATCH v3 2/2] net/virtio: support GUEST ANNOUNCE

> 

> Hi Xiao,

> 

> On Wed, Jan 03, 2018 at 11:41:40PM -0800, Xiao Wang wrote:

> [...]

> > +static int

> > +virtio_dev_pause(struct rte_eth_dev *dev)

> > +{

> > +	struct virtio_hw *hw = dev->data->dev_private;

> > +

> > +	if (hw->started == 0)

> > +		return -1;

> > +	hw->started = 0;

> > +	/*

> > +	 * Prevent the worker thread from touching queues to avoid

> contention,

> > +	 * 1 ms should be enough for the ongoing Tx function to finish.

> > +	 */

> > +	rte_delay_ms(1);

> > +	return 0;

> > +}

> > +

> > +static void

> > +virtio_dev_resume(struct rte_eth_dev *dev)

> > +{

> > +	struct virtio_hw *hw = dev->data->dev_private;

> > +

> > +	hw->started = 1;

> > +}

> 

> Based on your current implementation, hw->state_lock needs to

> be held during a call of virtio_dev_pause()..virtio_dev_resume().

> So I think the code would be more readable and much easier to

> use if we take the lock in virtio_dev_pause() and release the

> lock in virtio_dev_resume().


Agree, will improve it in next version.

> 

> > +

> > +static void

> > +virtio_notify_peers(struct rte_eth_dev *dev)

> > +{

> > +	struct virtio_hw *hw = dev->data->dev_private;

> > +	struct virtnet_tx *txvq = dev->data->tx_queues[0];

> > +	struct virtnet_rx *rxvq = dev->data->rx_queues[0];

> > +

> > +	hw->rarp_buf[0] = rte_mbuf_raw_alloc(rxvq->mpool);

> > +	if (hw->rarp_buf[0] == NULL) {

> > +		PMD_DRV_LOG(ERR, "first mbuf allocate failed");

> > +		return;

> > +	}

> > +

> > +	if (make_rarp_packet(hw->rarp_buf[0],

> > +				(struct ether_addr *)hw->mac_addr)) {

> > +		rte_pktmbuf_free(hw->rarp_buf[0]);

> > +		return;

> > +	}

> > +

> > +	/* If virtio port just stopped, no need to send RARP */

> > +	if (virtio_dev_pause(dev) < 0) {

> > +		rte_pktmbuf_free(hw->rarp_buf[0]);

> > +		return;

> > +	}

> > +

> > +	dev->tx_pkt_burst(txvq, hw->rarp_buf, 1);

> 

> You have already provided virtio_dev_pause()/virtio_dev_resume().

> I think you can also make this part generic and provide an inject

> function, e.g.:

> 

> uint16_t

> virtio_inject_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t

> nb_pkts)

> {

> 	......

> 

> 	txvq->inject_pkts = tx_pkts;

> 	nb_tx = dev->tx_pkt_burst(txvq, tx_pkts, nb_pkts);

> 	txvq->inject_pkts = NULL;

> 

> 	return nb_tx;

> }

> 

> And you can introduce virtio_dev_pause()/virtio_dev_resume()/

> virtio_injec... in a separate patch. And introduce the GUEST

> ANNOUNCE support in the third patch.


That would be a better patch organization, thanks! Will make a v4.

BRs,
Xiao
  
Xiao Wang Jan. 4, 2018, 3:59 p.m. UTC | #3
v4:
- Move spinlock lock/unlock into dev_pause/resume.
- Separate out a patch for packet injection.

v3:
- Remove Tx function code duplication, use a special pointer for rarp
  injection.
- Rename function generate_rarp to virtio_notify_peers, replace
  'virtnet_' with 'virtio_'.
- Add comment for state_lock.
- Typo fix and comment improvement.

v2:
- Use spaces instead of tabs between the code and comments.
- Remove unnecessary parentheses.
- Use rte_pktmbuf_mtod directly to get eth_hdr addr.
- Fix virtio_dev_pause return value check.

Xiao Wang (2):
  net/virtio: make control queue thread-safe
  net/virtio: support GUEST ANNOUNCE

 drivers/net/virtio/virtio_ethdev.c      | 145 +++++++++++++++++++++++++++++++-
 drivers/net/virtio/virtio_ethdev.h      |   1 +
 drivers/net/virtio/virtio_pci.h         |   7 ++
 drivers/net/virtio/virtio_rxtx.c        |   3 +-
 drivers/net/virtio/virtio_rxtx.h        |   1 +
 drivers/net/virtio/virtio_rxtx_simple.c |   2 +-

Xiao Wang (3):
  net/virtio: make control queue thread-safe
  net/virtio: add packet injection method
  net/virtio: support GUEST ANNOUNCE

 drivers/net/virtio/virtio_ethdev.c      | 159 +++++++++++++++++++++++++++++++-
 drivers/net/virtio/virtio_ethdev.h      |   7 ++
 drivers/net/virtio/virtio_pci.h         |   7 ++
 drivers/net/virtio/virtio_rxtx.c        |   3 +-
 drivers/net/virtio/virtio_rxtx.h        |   1 +
 drivers/net/virtio/virtio_rxtx_simple.c |   2 +-
 drivers/net/virtio/virtqueue.h          |  11 +++
 7 files changed, 185 insertions(+), 5 deletions(-)
  

Patch

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index ac73950..80bad52 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -48,6 +48,8 @@ 
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
 #include <rte_ether.h>
+#include <rte_ip.h>
+#include <rte_arp.h>
 #include <rte_common.h>
 #include <rte_errno.h>
 #include <rte_cpuflags.h>
@@ -55,6 +57,7 @@ 
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_dev.h>
+#include <rte_cycles.h>
 
 #include "virtio_ethdev.h"
 #include "virtio_pci.h"
@@ -106,6 +109,13 @@  static int virtio_dev_queue_stats_mapping_set(
 	uint8_t stat_idx,
 	uint8_t is_rx);
 
+static int make_rarp_packet(struct rte_mbuf *rarp_mbuf,
+		const struct ether_addr *mac);
+static int virtio_dev_pause(struct rte_eth_dev *dev);
+static void virtio_dev_resume(struct rte_eth_dev *dev);
+static void virtio_notify_peers(struct rte_eth_dev *dev);
+static void virtio_ack_link_announce(struct rte_eth_dev *dev);
+
 /*
  * The set of PCI devices this driver supports
  */
@@ -1249,9 +1259,116 @@  static int virtio_dev_xstats_get_names(struct rte_eth_dev *dev,
 	return 0;
 }
 
+#define RARP_PKT_SIZE	64
+static int
+make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
+{
+	struct ether_hdr *eth_hdr;
+	struct arp_hdr  *rarp;
+
+	if (rarp_mbuf->buf_len < RARP_PKT_SIZE) {
+		PMD_DRV_LOG(ERR, "mbuf size too small %u (< %d)",
+				rarp_mbuf->buf_len, RARP_PKT_SIZE);
+		return -1;
+	}
+
+	/* Ethernet header. */
+	eth_hdr = rte_pktmbuf_mtod(rarp_mbuf, struct ether_hdr *);
+	memset(eth_hdr->d_addr.addr_bytes, 0xff, ETHER_ADDR_LEN);
+	ether_addr_copy(mac, &eth_hdr->s_addr);
+	eth_hdr->ether_type = htons(ETHER_TYPE_RARP);
+
+	/* RARP header. */
+	rarp = (struct arp_hdr *)(eth_hdr + 1);
+	rarp->arp_hrd = htons(ARP_HRD_ETHER);
+	rarp->arp_pro = htons(ETHER_TYPE_IPv4);
+	rarp->arp_hln = ETHER_ADDR_LEN;
+	rarp->arp_pln = 4;
+	rarp->arp_op  = htons(ARP_OP_REVREQUEST);
+
+	ether_addr_copy(mac, &rarp->arp_data.arp_sha);
+	ether_addr_copy(mac, &rarp->arp_data.arp_tha);
+	memset(&rarp->arp_data.arp_sip, 0x00, 4);
+	memset(&rarp->arp_data.arp_tip, 0x00, 4);
+
+	rarp_mbuf->data_len = RARP_PKT_SIZE;
+	rarp_mbuf->pkt_len = RARP_PKT_SIZE;
+
+	return 0;
+}
+
+static int
+virtio_dev_pause(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	if (hw->started == 0)
+		return -1;
+	hw->started = 0;
+	/*
+	 * Prevent the worker thread from touching queues to avoid contention,
+	 * 1 ms should be enough for the ongoing Tx function to finish.
+	 */
+	rte_delay_ms(1);
+	return 0;
+}
+
+static void
+virtio_dev_resume(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	hw->started = 1;
+}
+
+static void
+virtio_notify_peers(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtnet_tx *txvq = dev->data->tx_queues[0];
+	struct virtnet_rx *rxvq = dev->data->rx_queues[0];
+
+	hw->rarp_buf[0] = rte_mbuf_raw_alloc(rxvq->mpool);
+	if (hw->rarp_buf[0] == NULL) {
+		PMD_DRV_LOG(ERR, "first mbuf allocate failed");
+		return;
+	}
+
+	if (make_rarp_packet(hw->rarp_buf[0],
+				(struct ether_addr *)hw->mac_addr)) {
+		rte_pktmbuf_free(hw->rarp_buf[0]);
+		return;
+	}
+
+	/* If virtio port just stopped, no need to send RARP */
+	if (virtio_dev_pause(dev) < 0) {
+		rte_pktmbuf_free(hw->rarp_buf[0]);
+		return;
+	}
+
+	dev->tx_pkt_burst(txvq, hw->rarp_buf, 1);
+	/* Recover the stored hw status to let worker thread continue */
+	virtio_dev_resume(dev);
+}
+
+static void
+virtio_ack_link_announce(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int len;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_ANNOUNCE;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_ANNOUNCE_ACK;
+	len = 0;
+
+	virtio_send_command(hw->cvq, &ctrl, &len, 0);
+}
+
 /*
- * Process Virtio Config changed interrupt and call the callback
- * if link state changed.
+ * Process virtio config changed interrupt. Call the callback
+ * if link state changed, generate gratuitous RARP packet if
+ * the status indicates an ANNOUNCE.
  */
 void
 virtio_interrupt_handler(void *param)
@@ -1274,6 +1391,12 @@  static int virtio_dev_xstats_get_names(struct rte_eth_dev *dev,
 						      NULL, NULL);
 	}
 
+	if (isr & VIRTIO_NET_S_ANNOUNCE) {
+		rte_spinlock_lock(&hw->state_lock);
+		virtio_notify_peers(dev);
+		virtio_ack_link_announce(dev);
+		rte_spinlock_unlock(&hw->state_lock);
+	}
 }
 
 /* set rx and tx handlers according to what is supported */
@@ -1786,6 +1909,8 @@  static int eth_virtio_pci_remove(struct rte_pci_device *pci_dev)
 			return -EBUSY;
 		}
 
+	rte_spinlock_init(&hw->state_lock);
+
 	hw->use_simple_rx = 1;
 	hw->use_simple_tx = 1;
 
@@ -1892,6 +2017,12 @@  static int eth_virtio_pci_remove(struct rte_pci_device *pci_dev)
 	/* Initialize Link state */
 	virtio_dev_link_update(dev, 0);
 
+	hw->rarp_buf = rte_zmalloc("rarp_buf", sizeof(struct rte_mbuf *), 0);
+	if (!hw->rarp_buf) {
+		PMD_INIT_LOG(ERR, "Failed to allocate rarp pointer");
+		return -ENOMEM;
+	}
+
 	return 0;
 }
 
@@ -1952,12 +2083,15 @@  static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)
 
 	PMD_INIT_LOG(DEBUG, "stop");
 
+	rte_spinlock_lock(&hw->state_lock);
 	if (intr_conf->lsc || intr_conf->rxq)
 		virtio_intr_disable(dev);
 
 	hw->started = 0;
 	memset(&link, 0, sizeof(link));
 	virtio_dev_atomic_write_link_status(dev, &link);
+	rte_free(hw->rarp_buf);
+	rte_spinlock_unlock(&hw->state_lock);
 }
 
 static int
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 2039bc5..58faa22 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -67,6 +67,7 @@ 
 	 1u << VIRTIO_NET_F_HOST_TSO6	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1u << VIRTIO_NET_F_MTU	| \
+	 1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE  | \
 	 1u << VIRTIO_RING_F_INDIRECT_DESC |    \
 	 1ULL << VIRTIO_F_VERSION_1       |	\
 	 1ULL << VIRTIO_F_IOMMU_PLATFORM)
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 3c5ce66..7bd6c50 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -270,6 +270,13 @@  struct virtio_hw {
 	struct virtio_pci_common_cfg *common_cfg;
 	struct virtio_net_config *dev_cfg;
 	void	    *virtio_user_dev;
+	/*
+	 * App management thread and virtio interrupt handler thread
+	 * both can change the 'started' flag, this lock is meant to
+	 * avoid such a contention.
+	 */
+	rte_spinlock_t state_lock;
+	struct rte_mbuf **rarp_buf;
 
 	struct virtqueue **vqs;
 };
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 6a24fde..6bae8a6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -1017,7 +1017,7 @@ 
 	uint16_t nb_used, nb_tx = 0;
 	int error;
 
-	if (unlikely(hw->started == 0))
+	if (unlikely(hw->started == 0) && tx_pkts != hw->rarp_buf)
 		return nb_tx;
 
 	if (unlikely(nb_pkts < 1))
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index b5bc1c4..a04f20e 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -99,7 +99,7 @@  int __attribute__((cold))
 	uint16_t desc_idx_max = (vq->vq_nentries >> 1) - 1;
 	uint16_t nb_tx = 0;
 
-	if (unlikely(hw->started == 0))
+	if (unlikely(hw->started == 0) && tx_pkts != hw->rarp_buf)
 		return nb_tx;
 
 	nb_used = VIRTQUEUE_NUSED(vq);
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 2305d91..d9045e1 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -158,6 +158,17 @@  struct virtio_net_ctrl_mac {
 #define VIRTIO_NET_CTRL_VLAN_ADD 0
 #define VIRTIO_NET_CTRL_VLAN_DEL 1
 
+/*
+ * Control link announce acknowledgement
+ *
+ * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
+ * driver has recevied the notification; device would clear the
+ * VIRTIO_NET_S_ANNOUNCE bit in the status field after it receives
+ * this command.
+ */
+#define VIRTIO_NET_CTRL_ANNOUNCE     3
+#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+
 struct virtio_net_ctrl_hdr {
 	uint8_t class;
 	uint8_t cmd;