[v2,1/9] ethdev: introduce Rx buffer split

Message ID 1602083215-22921-2-git-send-email-viacheslavo@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Slava Ovsiienko Oct. 7, 2020, 3:06 p.m. UTC
  The DPDK datapath in the transmit direction is very flexible.
An application can build multi-segment packets and manage
almost all data aspects: the memory pools the segments
are allocated from, the segment lengths, the memory attributes
like external buffers registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receive queue, and nothing more. To extend the receive
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length */
    uint16_t offset; /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rx_queue_setup_ex() is introduced to
set up the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine adds two new parameters:
    rx_seg - pointer to the array of segment descriptions; each element
             describes the memory pool, maximal data length, and initial
             data offset from the beginning of the data buffer in the mbuf
    n_seg - number of elements in the array

The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in the device
capabilities is introduced to let a PMD report to the
application that it supports splitting received packets into
configurable segments. Before invoking the rte_eth_rx_queue_setup_ex()
routine, the application should check the DEV_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the packets
being received will be split into multiple segments pushed to mbufs
with the specified attributes. The PMD will allocate the first mbuf
from the pool specified in the first segment descriptor and put
the data starting at the specified offset in the allocated mbuf data
buffer. If the packet length exceeds the specified segment length,
the next mbuf will be allocated according to the next segment
descriptor (if any) and data will be put in its data buffer at the
specified offset, not exceeding the specified length. If there is
no next descriptor, the next mbuf will be allocated and filled in the
same way (from the same pool and with the same buffer offset/length)
as the current one.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
    seg1 - pool1, len1=20B, off1=0B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B long @ 0 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B @ 0 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The DEV_RX_OFFLOAD_SCATTER offload must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).

The new approach allows splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes of the split
parts may differ as well; for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This improves the
flexibility of the DPDK receive datapath while preserving
compatibility with the existing API.

Also, the proposed segment description might be used to specify
Rx packet split for some other features, for example, to provide
a way to specify an extra memory pool for the Header Split
feature of some Intel PMDs.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst           |  15 +++
 doc/guides/rel_notes/release_20_11.rst |   6 ++
 lib/librte_ethdev/rte_ethdev.c         | 174 +++++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h         |  16 +++
 lib/librte_ethdev/rte_ethdev_driver.h  |  10 ++
 5 files changed, 221 insertions(+)
  

Comments

Thomas Monjalon Oct. 11, 2020, 10:17 p.m. UTC | #1
07/10/2020 17:06, Viacheslav Ovsiienko:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length */

The "length" parameter is configuring a split point.
Worth noting in the comment, I think.

>     uint16_t offset; /* data offset from beginning of mbuf data buffer */

Is it replacing RTE_PKTMBUF_HEADROOM?

>     uint32_t reserved; /* reserved field */
> };
> 
> The new routine rte_eth_rx_queue_setup_ex() is introduced to
> setup the given Rx queue using the new extended Rx packet segment
> description:
> 
> int
> rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
>                           uint16_t nb_rx_desc, unsigned int socket_id,
>                           const struct rte_eth_rxconf *rx_conf,
> 		          const struct rte_eth_rxseg *rx_seg,
>                           uint16_t n_seg)

An alternative name for this function:
	rte_eth_rxseg_queue_setup

> This routine presents the two new parameters:
>     rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf
>     n_seg - number of elements in the array

Not clear why we need an array.
I suggest writing here that each segment of the same packet
can have different properties, the array representing the full packet.

> The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device

The name should start with RTE_ prefix.

> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup_ex() routine
> application should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will allocate the first mbuf
> from the pool specified in the first segment descriptor and puts
> the data staring at specified offset in the allocated mbuf data
> buffer. If packet length exceeds the specified segment length
> the next mbuf will be allocated according to the next segment
> descriptor (if any) and data will be put in its data buffer at
> specified offset and not exceeding specified length. If there is
> no next descriptor the next mbuf will be allocated and filled in the
> same way (from the same pool and with the same buffer offset/length)
> as the current one.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
>     seg1 - pool1, len1=20B, off1=0B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B long @ 0 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B @ 0 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
> 
> The offload DEV_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if n_seg
> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
> 
> Also, the proposed segment description might be used to specify
> Rx packet split for some other features. For example, provide
> the way to specify the extra memory pool for the Header Split
> feature of some Intel PMD.

I don't understand what you are referring to in this last paragraph.
I think explanation above is enough to demonstrate the flexibility.

> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

Thank you, I like this feature.
More minor comments below.

[...]
> +* **Introduced extended buffer description for receiving.**

Rewording:
	Introduced extended setup of Rx queue

> +  * Added extended Rx queue setup routine
> +  * Added description for Rx segment sizes

not only "sizes", but also offset and mempool.

> +  * Added capability to specify the memory pool for each segment

This one can be merged with the above, or offset should be added.

[...]
The doxygen comment is missing here.

> +int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> +		uint16_t nb_rx_desc, unsigned int socket_id,
> +		const struct rte_eth_rxconf *rx_conf,
> +		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);

This new function should be experimental and it should be added to the .map file.
  
Slava Ovsiienko Oct. 12, 2020, 9:40 a.m. UTC | #2
Hi, Thomas

Thank you for the comments, please, see my answers below.

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, October 12, 2020 1:18
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; ferruh.yigit@intel.com;
> olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com;
> arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split
> 
> 07/10/2020 17:06, Viacheslav Ovsiienko:
> > The DPDK datapath in the transmit direction is very flexible.
> > An application can build the multi-segment packet and manages almost
> > all data aspects - the memory pools where segments are allocated from,
> > the segment lengths, the memory attributes like external buffers,
> > registered for DMA, etc.
> >
> > In the receiving direction, the datapath is much less flexible, an
> > application can only specify the memory pool to configure the
> > receiving queue and nothing more. In order to extend receiving
> > datapath capabilities it is proposed to add the way to provide
> > extended information how to split the packets being received.
> >
> > The following structure is introduced to specify the Rx packet
> > segment:
> >
> > struct rte_eth_rxseg {
> >     struct rte_mempool *mp; /* memory pools to allocate segment from */
> >     uint16_t length; /* segment maximal data length */
> 
> The "length" parameter is configuring a split point.
> Worth to note in the comment I think.

OK, got it.

> 
> >     uint16_t offset; /* data offset from beginning of mbuf data buffer
> > */
> 
> Is it replacing RTE_PKTMBUF_HEADROOM?
> 
Actually it is adding to the HEADROOM. We should keep the HEADROOM intact,
so the actual data offset in the first mbuf must be the sum HEADROOM + offset.
The mlx5 PMD implementation follows this approach; the documentation will be updated in v3.

> >     uint32_t reserved; /* reserved field */ };
> >
> > The new routine rte_eth_rx_queue_setup_ex() is introduced to setup the
> > given Rx queue using the new extended Rx packet segment
> > description:
> >
> > int
> > rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> >                           uint16_t nb_rx_desc, unsigned int socket_id,
> >                           const struct rte_eth_rxconf *rx_conf,
> > 		          const struct rte_eth_rxseg *rx_seg,
> >                           uint16_t n_seg)
> 
> An alternative name for this function:
> 	rte_eth_rxseg_queue_setup
M-m-m... The routine name follows the object_verb pattern:
rx_queue is an object, setup is an action.
rxseg_queue is not an object.
What about "rte_eth_rx_queue_setup_seg"?

> 
> > This routine presents the two new parameters:
> >     rx_seg - pointer the array of segment descriptions, each element
> >              describes the memory pool, maximal data length, initial
> >              data offset from the beginning of data buffer in mbuf
> >     n_seg - number of elements in the array
> 
> Not clear why we need an array.
> I suggest writing here that each segment of the same packet can have
> different properties, the array representing the full packet.
OK, will write.

> 
> > The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
> 
> The name should start with RTE_ prefix.
It is an existing pattern for DEV_RX_OFFLOAD_xxxx, no RTE_ for the case.

> 
> > capabilities is introduced to present the way for PMD to report to
> > application about supporting Rx packet split to configurable segments.
> > Prior invoking the rte_eth_rx_queue_setup_ex() routine application
> > should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
> >
> > If the Rx queue is configured with new routine the packets being
> > received will be split into multiple segments pushed to the mbufs with
> > specified attributes. The PMD will allocate the first mbuf from the
> > pool specified in the first segment descriptor and puts the data
> > staring at specified offset in the allocated mbuf data buffer. If
> > packet length exceeds the specified segment length the next mbuf will
> > be allocated according to the next segment descriptor (if any) and
> > data will be put in its data buffer at specified offset and not
> > exceeding specified length. If there is no next descriptor the next
> > mbuf will be allocated and filled in the same way (from the same pool
> > and with the same buffer offset/length) as the current one.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
> >     seg1 - pool1, len1=20B, off1=0B
> >     seg2 - pool2, len2=20B, off2=0B
> >     seg3 - pool3, len3=512B, off3=0B
> >
> > The packet 46 bytes long will look like the following:
> >     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >     seg1 - 20B long @ 0 in mbuf from pool1
> >     seg2 - 12B long @ 0 in mbuf from pool2
> >
> > The packet 1500 bytes long will look like the following:
> >     seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >     seg1 - 20B @ 0 in mbuf from pool1
> >     seg2 - 20B @ 0 in mbuf from pool2
> >     seg3 - 512B @ 0 in mbuf from pool3
> >     seg4 - 512B @ 0 in mbuf from pool3
> >     seg5 - 422B @ 0 in mbuf from pool3
> >
> > The offload DEV_RX_OFFLOAD_SCATTER must be present and configured
> to
> > support new buffer split feature (if n_seg is greater than one).
> >
> > The new approach would allow splitting the ingress packets into
> > multiple parts pushed to the memory with different attributes.
> > For example, the packet headers can be pushed to the embedded data
> > buffers within mbufs and the application data into the external
> > buffers attached to mbufs allocated from the different memory pools.
> > The memory attributes for the split parts may differ either - for
> > example the application data may be pushed into the external memory
> > located on the dedicated physical device, say GPU or NVMe. This would
> > improve the DPDK receiving datapath flexibility with preserving
> > compatibility with existing API.
> >
> > Also, the proposed segment description might be used to specify Rx
> > packet split for some other features. For example, provide the way to
> > specify the extra memory pool for the Header Split feature of some
> > Intel PMD.
> 
> I don't understand what you are referring in this last paragraph.
> I think explanation above is enough to demonstrate the flexibility.
> 
Just noted that the segment description is a common thing and could be
promoted to be used in some other features.

> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 
> Thank you, I like this feature.
> More minor comments below.
> 
> [...]
> > +* **Introduced extended buffer description for receiving.**
> 
> Rewording:
> 	Introduced extended setup of Rx queue
OK, sounds better.

> 

> > +  * Added extended Rx queue setup routine
> > +  * Added description for Rx segment sizes
> 
> not only "sizes", but also offset and mempool.
> 
> > +  * Added capability to specify the memory pool for each segment
> 
> This one can be merged with the above, or offset should be added.
> 
> [...]
> The doxygen comment is missing here.
Yes, thank you. Also noted that, updating.

> 
> > +int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> > +		uint16_t nb_rx_desc, unsigned int socket_id,
> > +		const struct rte_eth_rxconf *rx_conf,
> > +		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
> 
> This new function should be experimental and it should be added to the
> .map file.
> 
OK.

With best regards, Slava
  
Thomas Monjalon Oct. 12, 2020, 10:09 a.m. UTC | #3
12/10/2020 11:40, Slava Ovsiienko:
> From: Thomas Monjalon <thomas@monjalon.net>
> > > int
> > > rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> > >                           uint16_t nb_rx_desc, unsigned int socket_id,
> > >                           const struct rte_eth_rxconf *rx_conf,
> > > 		          const struct rte_eth_rxseg *rx_seg,
> > >                           uint16_t n_seg)
> > 
> > An alternative name for this function:
> > 	rte_eth_rxseg_queue_setup
> M-m-m... Routine name follows patter object_verb:
> rx_queue is an object, setup is an action.
> rxseg_queue is not an object.
> What about "rte_eth_rx_queue_setup_seg"?

rte_eth_rxseg is the name of the struct,
so it looks natural to me to keep it as prefix (object name).

[...]
> > > The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
> > 
> > The name should start with RTE_ prefix.
> 
> It is an existing pattern for DEV_RX_OFFLOAD_xxxx, no RTE_ for the case.

It is a wrong pattern which must be fixed.
Please start fresh with the right prefix for new ones.
Thinking twice, it should be:
	RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT

[...]
> > > Also, the proposed segment description might be used to specify Rx
> > > packet split for some other features. For example, provide the way to
> > > specify the extra memory pool for the Header Split feature of some
> > > Intel PMD.
> > 
> > I don't understand what you are referring in this last paragraph.
> > I think explanation above is enough to demonstrate the flexibility.
> > 
> Just noted the segment description is common thing and could be
> promoted to be used in some other features. 

I think it is not needed. And giving Intel as an example is arbitrary.
  

Patch

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..ac9dfd7 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@  Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[implements] rte_eth_dev_data**: ``buffer_split``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related] API**: ``rte_eth_rx_queue_setup_ex()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 4bcf220..8da5cc9 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,12 @@  New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduced extended buffer description for receiving.**
+
+  * Added extended Rx queue setup routine
+  * Added description for Rx segment sizes
+  * Added capability to specify the memory pool for each segment
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index dfe5c1b..c626afa 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -128,6 +128,7 @@  struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
@@ -1933,6 +1934,179 @@  struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+			  uint16_t nb_rx_desc, unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
+{
+	int ret;
+	uint16_t seg_idx;
+	uint32_t mbp_buf_size;
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf local_conf;
+	void **rxq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	if (rx_seg == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
+		return -EINVAL;
+	}
+
+	if (n_seg == 0) {
+		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup_ex, -ENOTSUP);
+
+	/*
+	 * Check the size of the mbuf data buffer.
+	 * This value must be provided in the private data of the memory pool.
+	 * First check that the memory pool has a valid private data.
+	 */
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mp = rx_seg[seg_idx].mp;
+
+		if (mp->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mp->name, (int)mp->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		if (mbp_buf_size < rx_seg[seg_idx].length +
+				   rx_seg[seg_idx].offset +
+				   (seg_idx ? 0 :
+				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %d < %d"
+				" (segment length=%d + segment offset=%d)\n",
+				mp->name, (int)mbp_buf_size,
+				(int)(rx_seg[seg_idx].length +
+				      rx_seg[seg_idx].offset),
+				(int)rx_seg[seg_idx].length,
+				(int)rx_seg[seg_idx].offset);
+			return -EINVAL;
+		}
+	}
+
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0) {
+		nb_rx_desc = dev_info.default_rxportconf.ring_size;
+		/* If driver default is also zero, fall back on EAL default */
+		if (nb_rx_desc == 0)
+			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
+	}
+
+	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: "
+			"<= %hu, >= %hu, and a product of %hu\n",
+			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
+			dev_info.rx_desc_lim.nb_min,
+			dev_info.rx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+			RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+
+	if (rx_conf == NULL)
+		rx_conf = &dev_info.default_rxconf;
+
+	local_conf = *rx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
+			" 0x%"PRIx64" must be within per-queue offload"
+			" capabilities 0x%"PRIx64" in %s()\n",
+			port_id, rx_queue_id, local_conf.offloads,
+			dev_info.rx_queue_offload_capa,
+			__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * If LRO is enabled, check that the maximum aggregated packet
+	 * size is supported by the configured device.
+	 */
+	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
+		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
+			dev->data->dev_conf.rxmode.max_lro_pkt_size =
+				dev->data->dev_conf.rxmode.max_rx_pkt_len;
+		int ret = check_lro_pkt_size(port_id,
+				dev->data->dev_conf.rxmode.max_lro_pkt_size,
+				dev->data->dev_conf.rxmode.max_rx_pkt_len,
+				dev_info.max_lro_pkt_size);
+		if (ret != 0)
+			return ret;
+	}
+
+	ret = (*dev->dev_ops->rx_queue_setup_ex)(dev, rx_queue_id, nb_rx_desc,
+						 socket_id, &local_conf,
+						 rx_seg, n_seg);
+	if (!ret) {
+		if (!dev->data->min_rx_buf_size ||
+		    dev->data->min_rx_buf_size > mbp_buf_size)
+			dev->data->min_rx_buf_size = mbp_buf_size;
+	}
+
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			       uint16_t nb_rx_desc,
 			       const struct rte_eth_hairpin_conf *conf)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 645a186..553900b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@  struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
+	uint16_t length; /**< Segment maximal data length */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -1260,6 +1270,7 @@  struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define DEV_RX_OFFLOAD_BUFFER_SPLIT     0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2020,6 +2031,11 @@  int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		const struct rte_eth_rxconf *rx_conf,
 		struct rte_mempool *mb_pool);
 
+int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 04ac8e9..de4d7de 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -264,6 +264,15 @@  typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_queue_setup_ex_t)(struct rte_eth_dev *dev,
+				       uint16_t rx_queue_id,
+				       uint16_t nb_rx_desc,
+				       unsigned int socket_id,
+				       const struct rte_eth_rxconf *rx_conf,
+				       const struct rte_eth_rxseg *rx_seg,
+				       uint16_t n_seg);
+/**< @internal Extended setup of a receive queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -630,6 +639,7 @@  struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_queue_setup_ex_t    rx_queue_setup_ex;/**< Extended RX setup. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */