[dpdk-dev,v2,08/16] fm10k: add Vector RX scatter function

Message ID 1445507104-22563-9-git-send-email-jing.d.chen@intel.com (mailing list archive)
State Superseded, archived

Commit Message

Chen, Jing D Oct. 22, 2015, 9:44 a.m. UTC
  From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
 drivers/net/fm10k/fm10k.h          |    2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)
  

Comments

Cunming Liang Oct. 27, 2015, 5:27 a.m. UTC | #1
Hi,

On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
>
> Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> with SSE instructions.
>
> Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> ---
>   drivers/net/fm10k/fm10k.h          |    2 +
>   drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 90 insertions(+), 0 deletions(-)
>
[...]
> +
> +/*
> + * vPMD receive routine that reassembles scattered packets
> + *
> + * Notice:
> + * - don't support ol_flags for rss and csum err
> + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan RTE_IXGBE_MAX_RX_BURST
> + *   numbers of DD bit
To make sure no more than RTE_IXGBE_MAX_RX_BURST descriptors are scanned when
nb_pkts is larger, nb_pkts needs to be clamped with RTE_MIN().
> + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> + */
> +uint16_t
> +fm10k_recv_scattered_pkts_vec(void *rx_queue,
> +				struct rte_mbuf **rx_pkts,
> +				uint16_t nb_pkts)
> +{
> +	struct fm10k_rx_queue *rxq = rx_queue;
> +	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
> +	unsigned i = 0;
> +
> +	/* get some new buffers */
> +	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
> +			split_flags);
> +	if (nb_bufs == 0)
> +		return 0;
> +
> +	/* happy day case, full burst + no packets to be joined */
> +	const uint64_t *split_fl64 = (uint64_t *)split_flags;
> +	if (rxq->pkt_first_seg == NULL &&
> +			split_fl64[0] == 0 && split_fl64[1] == 0 &&
> +			split_fl64[2] == 0 && split_fl64[3] == 0)
> +		return nb_bufs;
> +
> +	/* reassemble any packets that need reassembly*/
> +	if (rxq->pkt_first_seg == NULL) {
> +		/* find the first split flag, and only reassemble then*/
> +		while (i < nb_bufs && !split_flags[i])
> +			i++;
> +		if (i == nb_bufs)
> +			return nb_bufs;
> +	}
> +	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
> +		&split_flags[i]);
> +}
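
The clamp requested in the review comment above can be sketched in isolation.
This is a minimal illustration under stated assumptions, not the driver's code:
SKETCH_MAX_RX_BURST, SKETCH_DESCS_PER_LOOP, SKETCH_MIN and sketch_clamp_burst
are hypothetical names, the values 32 and 4 are assumed, and SKETCH_MIN stands
in for DPDK's RTE_MIN() so the snippet builds without DPDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's constants; the values are assumed. */
#define SKETCH_MAX_RX_BURST   32
#define SKETCH_DESCS_PER_LOOP 4

/* Local stand-in for RTE_MIN() so the sketch does not need DPDK headers. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Clamp the requested burst to the size of the split_flags array, then
 * round it down to a multiple of the per-loop descriptor count.
 */
static inline uint16_t
sketch_clamp_burst(uint16_t nb_pkts)
{
	/* the mask below only works for a power-of-two loop size */
	assert((SKETCH_DESCS_PER_LOOP & (SKETCH_DESCS_PER_LOOP - 1)) == 0);

	nb_pkts = SKETCH_MIN(nb_pkts, SKETCH_MAX_RX_BURST);
	return (uint16_t)(nb_pkts & ~(SKETCH_DESCS_PER_LOOP - 1));
}
```

With these assumed values, a request for 64 packets is reduced to 32, a request
for 7 is floor-aligned to 4, and a request for 3 yields 0, which corresponds to
the "just return no packet" note in the function comment.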
  
Chen, Jing D Oct. 27, 2015, 5:43 a.m. UTC | #2
Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Tuesday, October 27, 2015 1:28 PM
> To: Chen, Jing D; dev@dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v2 08/16] fm10k: add Vector RX scatter function
> 
> Hi,
> 
> On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> >
> > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > with SSE instructions.
> >
> > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > ---
> >   drivers/net/fm10k/fm10k.h          |    2 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
> >   2 files changed, 90 insertions(+), 0 deletions(-)
> >
> [...]
> > +
> > +/*
> > + * vPMD receive routine that reassembles scattered packets
> > + *
> > + * Notice:
> > + * - don't support ol_flags for rss and csum err
> > + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> > + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan RTE_IXGBE_MAX_RX_BURST
> > + *   numbers of DD bit
> To make sure no more than RTE_IXGBE_MAX_RX_BURST descriptors are scanned when
> nb_pkts is larger, nb_pkts needs to be clamped with RTE_MIN().

I'll remove the incorrect comments. fm10k_recv_raw_pkts_vec uses nb_pkts as the
bound for its own iteration.
After that, the function below iterates using nb_bufs, the number of packets actually received.
So I think RTE_MIN() is not necessary?

> > + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> > + */
> > +uint16_t
> > +fm10k_recv_scattered_pkts_vec(void *rx_queue,
> > +				struct rte_mbuf **rx_pkts,
> > +				uint16_t nb_pkts)
> > +{
> > +	struct fm10k_rx_queue *rxq = rx_queue;
> > +	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
> > +	unsigned i = 0;
> > +
> > +	/* get some new buffers */
> > +	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
> > +			split_flags);
> > +	if (nb_bufs == 0)
> > +		return 0;
> > +
> > +	/* happy day case, full burst + no packets to be joined */
> > +	const uint64_t *split_fl64 = (uint64_t *)split_flags;
> > +	if (rxq->pkt_first_seg == NULL &&
> > +			split_fl64[0] == 0 && split_fl64[1] == 0 &&
> > +			split_fl64[2] == 0 && split_fl64[3] == 0)
> > +		return nb_bufs;
> > +
> > +	/* reassemble any packets that need reassembly*/
> > +	if (rxq->pkt_first_seg == NULL) {
> > +		/* find the first split flag, and only reassemble then*/
> > +		while (i < nb_bufs && !split_flags[i])
> > +			i++;
> > +		if (i == nb_bufs)
> > +			return nb_bufs;
> > +	}
> > +	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
> > +		&split_flags[i]);
> > +}
  
Chen, Jing D Oct. 27, 2015, 5:55 a.m. UTC | #3
Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chen, Jing D
> Sent: Tuesday, October 27, 2015 1:44 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter
> function
> 
> Hi, Steve,
> 
> Best Regards,
> Mark
> 
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Tuesday, October 27, 2015 1:28 PM
> > To: Chen, Jing D; dev@dpdk.org
> > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > Subject: Re: [PATCH v2 08/16] fm10k: add Vector RX scatter function
> >
> > Hi,
> >
> > On 10/22/2015 5:44 PM, Chen Jing D(Mark) wrote:
> > > From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>
> > >
> > > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > > with SSE instructions.
> > >
> > > Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
> > > ---
> > >   drivers/net/fm10k/fm10k.h          |    2 +
> > >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88 ++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 90 insertions(+), 0 deletions(-)
> > >
> > [...]
> > > +
> > > +/*
> > > + * vPMD receive routine that reassembles scattered packets
> > > + *
> > > + * Notice:
> > > + * - don't support ol_flags for rss and csum err
> > > + * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
> > > + * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan RTE_IXGBE_MAX_RX_BURST
> > > + *   numbers of DD bit
> > To make sure no more than RTE_IXGBE_MAX_RX_BURST descriptors are scanned when
> > nb_pkts is larger, nb_pkts needs to be clamped with RTE_MIN().

My mistake. You are pointing out that nb_pkts should be less than or equal to RTE_IXGBE_MAX_RX_BURST.
I'll change it accordingly.

> 
> I'll remove the incorrect comments. fm10k_recv_raw_pkts_vec uses nb_pkts as the
> bound for its own iteration.
> After that, the function below iterates using nb_bufs, the number of packets actually received.
> So I think RTE_MIN() is not necessary?
> 
> > > + * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
> > > + */
> > > +uint16_t
> > > +fm10k_recv_scattered_pkts_vec(void *rx_queue,
> > > +				struct rte_mbuf **rx_pkts,
> > > +				uint16_t nb_pkts)
> > > +{
> > > +	struct fm10k_rx_queue *rxq = rx_queue;
> > > +	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
> > > +	unsigned i = 0;
> > > +
> > > +	/* get some new buffers */
> > > +	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
> > > +			split_flags);
> > > +	if (nb_bufs == 0)
> > > +		return 0;
> > > +
> > > +	/* happy day case, full burst + no packets to be joined */
> > > +	const uint64_t *split_fl64 = (uint64_t *)split_flags;
> > > +	if (rxq->pkt_first_seg == NULL &&
> > > +			split_fl64[0] == 0 && split_fl64[1] == 0 &&
> > > +			split_fl64[2] == 0 && split_fl64[3] == 0)
> > > +		return nb_bufs;
> > > +
> > > +	/* reassemble any packets that need reassembly*/
> > > +	if (rxq->pkt_first_seg == NULL) {
> > > +		/* find the first split flag, and only reassemble then*/
> > > +		while (i < nb_bufs && !split_flags[i])
> > > +			i++;
> > > +		if (i == nb_bufs)
> > > +			return nb_bufs;
> > > +	}
> > > +	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
> > > +		&split_flags[i]);
> > > +}
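
The fast-path test in the quoted function reads split_flags as four 64-bit
words, which covers the whole array only if RTE_FM10K_MAX_RX_BURST is 32. The
sketch below shows the same idea with the word count derived from the array
size, and with the bytes copied rather than read through a cast pointer. It is
an illustration under assumptions, not the patch's code: sketch_no_split and
SKETCH_MAX_RX_BURST are hypothetical names and the value 32 is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RTE_FM10K_MAX_RX_BURST; the value is assumed. */
#define SKETCH_MAX_RX_BURST 32

/*
 * Return 1 when none of the SKETCH_MAX_RX_BURST flag bytes is set,
 * 0 otherwise. The flags are checked eight bytes at a time.
 */
static inline int
sketch_no_split(const uint8_t *split_flags)
{
	uint64_t words[SKETCH_MAX_RX_BURST / sizeof(uint64_t)];
	uint64_t any = 0;
	size_t i;

	assert(split_flags != NULL);

	/* copy instead of casting, so alignment and aliasing are not an issue */
	memcpy(words, split_flags, sizeof(words));
	for (i = 0; i < sizeof(words) / sizeof(words[0]); i++)
		any |= words[i];

	return any == 0;
}
```

The loop bound follows the array size, so a different burst size (any multiple
of eight) would not leave part of the array unchecked.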
  

Patch

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 1502ae3..06697fa 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,6 @@  uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+					uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 96ca28b..237de9d 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -508,3 +508,91 @@  fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
 	return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+		struct rte_mbuf **rx_bufs,
+		uint16_t nb_bufs, uint8_t *split_flags)
+{
+	struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+	struct rte_mbuf *start = rxq->pkt_first_seg;
+	struct rte_mbuf *end =  rxq->pkt_last_seg;
+	unsigned pkt_idx, buf_idx;
+
+
+	for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+		if (end != NULL) {
+			/* processing a split packet */
+			end->next = rx_bufs[buf_idx];
+			start->nb_segs++;
+			start->pkt_len += rx_bufs[buf_idx]->data_len;
+			end = end->next;
+
+			if (!split_flags[buf_idx]) {
+				/* it's the last packet of the set */
+				start->hash = end->hash;
+				start->ol_flags = end->ol_flags;
+				pkts[pkt_idx++] = start;
+				start = end = NULL;
+			}
+		} else {
+			/* not processing a split packet */
+			if (!split_flags[buf_idx]) {
+				/* not a split packet, save and skip */
+				pkts[pkt_idx++] = rx_bufs[buf_idx];
+				continue;
+			}
+			end = start = rx_bufs[buf_idx];
+		}
+	}
+
+	/* save the partial packet for next time */
+	rxq->pkt_first_seg = start;
+	rxq->pkt_last_seg = end;
+	memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+	return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts < RTE_IXGBE_DESCS_PER_LOOP, just return no packet
+ * - nb_pkts > RTE_IXGBE_MAX_RX_BURST, only scan RTE_IXGBE_MAX_RX_BURST
+ *   numbers of DD bit
+ * - floor align nb_pkts to a RTE_IXGBE_DESC_PER_LOOP power-of-two
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts)
+{
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+	unsigned i = 0;
+
+	/* get some new buffers */
+	uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+			split_flags);
+	if (nb_bufs == 0)
+		return 0;
+
+	/* happy day case, full burst + no packets to be joined */
+	const uint64_t *split_fl64 = (uint64_t *)split_flags;
+	if (rxq->pkt_first_seg == NULL &&
+			split_fl64[0] == 0 && split_fl64[1] == 0 &&
+			split_fl64[2] == 0 && split_fl64[3] == 0)
+		return nb_bufs;
+
+	/* reassemble any packets that need reassembly*/
+	if (rxq->pkt_first_seg == NULL) {
+		/* find the first split flag, and only reassemble then*/
+		while (i < nb_bufs && !split_flags[i])
+			i++;
+		if (i == nb_bufs)
+			return nb_bufs;
+	}
+	return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+		&split_flags[i]);
+}
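
The chaining logic of fm10k_reassemble_packets above can be tried outside DPDK
with a cut-down buffer type. The following is a sketch under assumptions, not
the driver's implementation: struct sketch_mbuf, struct sketch_rxq,
sketch_reassemble and SKETCH_MAX_BURST are hypothetical stand-ins that keep
only the fields needed for chaining, and the copying of hash and ol_flags from
the last segment is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BURST 32	/* assumed upper bound on nb_bufs */

/* Hypothetical, cut-down stand-in for struct rte_mbuf. */
struct sketch_mbuf {
	struct sketch_mbuf *next;	/* next segment in the chain */
	uint32_t pkt_len;		/* total length, valid in first segment */
	uint16_t data_len;		/* length of this segment */
	uint8_t nb_segs;		/* segment count, valid in first segment */
};

/* Hypothetical stand-in for the two queue fields the patch uses. */
struct sketch_rxq {
	struct sketch_mbuf *pkt_first_seg;
	struct sketch_mbuf *pkt_last_seg;
};

/*
 * Join buffers into packets. split_flags[i] != 0 means buffer i is
 * followed by another segment of the same packet. Completed packets are
 * written back to the front of bufs; an unfinished chain is saved in rxq.
 */
uint16_t
sketch_reassemble(struct sketch_rxq *rxq, struct sketch_mbuf **bufs,
		uint16_t nb_bufs, const uint8_t *split_flags)
{
	struct sketch_mbuf *pkts[SKETCH_MAX_BURST];	/* finished packets */
	struct sketch_mbuf *start = rxq->pkt_first_seg;
	struct sketch_mbuf *end = rxq->pkt_last_seg;
	uint16_t buf_idx, pkt_idx = 0;

	assert(nb_bufs <= SKETCH_MAX_BURST);

	for (buf_idx = 0; buf_idx < nb_bufs; buf_idx++) {
		struct sketch_mbuf *m = bufs[buf_idx];

		if (end != NULL) {
			/* a packet is in progress: append this segment */
			end->next = m;
			start->nb_segs++;
			start->pkt_len += m->data_len;
			end = m;
			if (!split_flags[buf_idx]) {
				/* last segment: the packet is complete */
				pkts[pkt_idx++] = start;
				start = end = NULL;
			}
		} else if (!split_flags[buf_idx]) {
			/* single-segment packet, pass it through */
			pkts[pkt_idx++] = m;
		} else {
			/* first segment of a new multi-segment packet */
			start = end = m;
		}
	}

	/* keep any unfinished chain for the next call */
	rxq->pkt_first_seg = start;
	rxq->pkt_last_seg = end;
	memcpy(bufs, pkts, pkt_idx * sizeof(*pkts));
	return pkt_idx;
}
```

For four buffers with flags {0, 1, 0, 1}, the sketch returns two packets (the
first buffer alone, then the second and third chained together) and leaves the
fourth buffer in the queue as the start of a packet to be finished on the next
call.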