kernel/kni: retry the xmit in case ring is full

Message ID 1639753232-115930-1-git-send-email-tudor.cornea@gmail.com (mailing list archive)
State Rejected, archived
Delegated to: Ferruh Yigit
Series kernel/kni: retry the xmit in case ring is full

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/github-robot: build success github build: passed
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Tudor Cornea Dec. 17, 2021, 3 p.m. UTC
  This patch attempts to avoid dropping packets that are to be
transmitted, in case there is no space in the KNI ring.

We have a use case in which we leverage the Linux TCP / IP stack for
control plane, and some protocols might be sensitive to packet drops.

This can happen when the sender (the kernel) moves at a faster pace
than the receiver (the DPDK application), or during brief moments of
bursty traffic.

Requeuing the packets could add a kind of backpressure until a transmit
window is available to us.

The burden of retransmitting is shifted to the caller of ndo_start_xmit,
which in our case is the configured queuing discipline. This way, the
user can influence the behavior w.r.t. dropping packets by picking the
desired queuing discipline.

Although stopping the queue prior to returning NETDEV_TX_BUSY should
technically be the correct approach, in my tests it seems to add some
extra overhead and to slightly degrade control-plane performance.
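The requeue semantics described above can be sketched as a self-contained user-space model (the names kni_ring, try_xmit, drain_one and the ring size are illustrative, not the actual KNI code; the real path lives in kni_net_tx() in the patch below):

```c
#include <assert.h>
#include <string.h>

#define RING_SIZE 4          /* illustrative, not the real KNI fifo size */
#define TX_OK     0          /* stands in for NETDEV_TX_OK */
#define TX_BUSY   1          /* stands in for NETDEV_TX_BUSY */

struct kni_ring {            /* hypothetical stand-in for kni->tx_q */
	int pkts[RING_SIZE];
	int count;
};

/* Mirrors the patched behavior: when the ring is full, do not drop the
 * packet; return BUSY so the caller keeps ownership and requeues it. */
static int try_xmit(struct kni_ring *r, int pkt)
{
	if (r->count == RING_SIZE)
		return TX_BUSY;  /* caller must retry later */
	r->pkts[r->count++] = pkt;
	return TX_OK;
}

/* Mirrors the consumer (the DPDK application) draining the ring. */
static int drain_one(struct kni_ring *r)
{
	if (r->count == 0)
		return -1;
	int pkt = r->pkts[0];
	memmove(r->pkts, r->pkts + 1, (size_t)(--r->count) * sizeof(int));
	return pkt;
}
```

The point of the sketch is the ownership contract: on TX_BUSY the packet stays with the caller (the qdisc), which decides whether to hold, delay, or eventually drop it.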

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
---
 kernel/linux/kni/kni_net.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
  

Comments

Stephen Hemminger Dec. 17, 2021, 4:24 p.m. UTC | #1
On Fri, 17 Dec 2021 17:00:32 +0200
Tudor Cornea <tudor.cornea@gmail.com> wrote:

> This patch attempts to avoid dropping packets that are to be
> transmitted, in case there is no space in the KNI ring.
> 
> We have a use case in which we leverage the Linux TCP / IP stack for
> control plane, and some protocols might be sensitive to packet drops.
> 
> This can happen when the sender (the kernel) moves at a faster pace
> than the receiver (the DPDK application), or during brief moments of
> bursty traffic.
> 
> Requeuing the packets could add a kind of backpressure until a transmit
> window is available to us.
> 
> The burden of retransmitting is shifted to the caller of ndo_start_xmit,
> which in our case is the configured queuing discipline. This way, the
> user can influence the behavior w.r.t. dropping packets by picking the
> desired queuing discipline.
> 
> Although stopping the queue prior to returning NETDEV_TX_BUSY should
> technically be the correct approach, in my tests it seems to add some
> extra overhead and to slightly degrade control-plane performance.
> 
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>

NAK
Doing this risks having a CPU lockup if userspace does not keep up
or the DPDK application gets stuck.

There are better ways to solve the TCP stack queue overrun issue:
1. Use a better queueing discipline on the kni device. The Linux default
   of pfifo_fast has bufferbloat issues. Use fq_codel, fq, codel or pie?
2. KNI should implement BQL so that the TCP stack can get backpressure
   feedback about the possible queue depth.

As a simple workaround, increase the KNI ring size. It won't solve the whole
problem, but it can help.
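The BQL (Byte Queue Limits) idea in Stephen's second point can be modeled in user space as byte accounting with a stop/wake threshold; the struct and function names below are illustrative stand-ins, while the real kernel hooks are netdev_tx_sent_queue() and netdev_tx_completed_queue() on the tx queue:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of BQL accounting, not kernel code. */
struct bql_queue {
	unsigned int limit;      /* bytes allowed in flight (BQL auto-tunes this) */
	unsigned int in_flight;  /* bytes queued but not yet completed */
	bool stopped;            /* models netif_stop_queue()/netif_wake_queue() */
};

/* Called when bytes are handed to the ring (cf. netdev_tx_sent_queue). */
static void bql_sent(struct bql_queue *q, unsigned int bytes)
{
	q->in_flight += bytes;
	if (q->in_flight >= q->limit)
		q->stopped = true;   /* stack stops feeding us: backpressure */
}

/* Called when the consumer finishes bytes (cf. netdev_tx_completed_queue). */
static void bql_completed(struct bql_queue *q, unsigned int bytes)
{
	q->in_flight -= bytes;
	if (q->in_flight < q->limit)
		q->stopped = false;  /* queue woken, stack may transmit again */
}
```

The design point: the stack sees the stop condition *before* the ring overflows, so backpressure is delivered without spinning on NETDEV_TX_BUSY.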
  
Tudor Cornea Jan. 17, 2022, 3:29 p.m. UTC | #2
Hi Stephen,


> NAK
> Doing this risks having a CPU lockup if userspace does not keep up
> or the DPDK application gets stuck.
>
> There are better ways to solve the TCP stack queue overrun issue:
> 1. Use a better queueing discipline on the kni device. The Linux default
>    of pfifo_fast has bufferbloat issues. Use fq_codel, fq, codel or pie?
> 2. KNI should implement BQL so that the TCP stack can get backpressure
>    feedback about the possible queue depth.
>
>
Thanks for the suggestions.
I agree that we risk a lockup if the DPDK app gets stuck.

Indeed, I am running on an older Linux kernel, and the default queuing
discipline is pfifo_fast.
I'll experiment with the queuing disciplines you recommended.


> As a simple workaround, increase the KNI ring size. It won't solve the whole
> problem, but it can help.
>

I obtained moderate success with increasing MAX_MBUF_BURST_NUM from 32 to
1024 in librte_kni.
I'm not sure if such a change would be upstreamable. Perhaps it needs a bit
of testing.
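The workaround described above amounts to a one-line change in librte_kni (the exact file depends on the DPDK version; 1024 is the experimental value mentioned, not a recommended default):

```c
/* Larger bursts give the kernel side more slack before the ring fills up,
 * at the cost of more mbufs held in flight per KNI device. */
#define MAX_MBUF_BURST_NUM 1024   /* was 32 */
```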

I'll drop the current patch.
  

Patch

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 29e5b9e..db0330f 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -321,10 +321,9 @@  kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 	if (kni_fifo_free_count(kni->tx_q) == 0 ||
 			kni_fifo_count(kni->alloc_q) == 0) {
 		/**
-		 * If no free entry in tx_q or no entry in alloc_q,
-		 * drops skb and goes out.
+		 * Tell the caller to requeue, and retry at a later time.
 		 */
-		goto drop;
+		goto requeue;
 	}
 
 	/* dequeue a mbuf from alloc_q */
@@ -371,6 +370,10 @@  kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 	dev->stats.tx_dropped++;
 
 	return NETDEV_TX_OK;
+
+requeue:
+	/* Signal the caller to re-transmit at a later time */
+	return NETDEV_TX_BUSY;
 }
 
 /*