[dpdk-dev,1/6] cxgbe: Optimize forwarding performance for 40G
Commit Message
Update sge initialization with respect to free-list manager configuration
and ingress arbiter. Also update refill logic to refill mbufs only after
a certain threshold for rx. Optimize tx packet prefetch and free.
Approx. 4 MPPS improvement seen in forwarding performance after the
optimization.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
drivers/net/cxgbe/base/t4_regs.h | 16 ++++++++++++++++
drivers/net/cxgbe/cxgbe_main.c | 7 +++++++
drivers/net/cxgbe/sge.c | 17 ++++++++++++-----
3 files changed, 35 insertions(+), 5 deletions(-)
Comments
Hi Rahul,
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> Update sge initialization with respect to free-list manager configuration
> and ingress arbiter. Also update refill logic to refill mbufs only after
> a certain threshold for rx. Optimize tx packet prefetch and free.
<<snip>>
> for (i = 0; i < sd->coalesce.idx; i++) {
> - rte_pktmbuf_free(sd->coalesce.mbuf[i]);
> + struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
> +
> + do {
> + struct rte_mbuf *next = tmp->next;
> +
> + rte_pktmbuf_free_seg(tmp);
> + tmp = next;
> + } while (tmp);
> sd->coalesce.mbuf[i] = NULL;
Pardon my ignorance here, but rte_pktmbuf_free does this work. I can't
actually see much difference between your rewrite of this block, and
the implementation of rte_pktmbuf_free() (apart from moving your branch
to the end of the function). Did your microbenchmarking really show this
as an improvement?
Thanks for your time,
Aaron
Hi Aaron,
On Friday, October 10/02/15, 2015 at 14:48:28 -0700, Aaron Conole wrote:
> Hi Rahul,
>
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
>
> > Update sge initialization with respect to free-list manager configuration
> > and ingress arbiter. Also update refill logic to refill mbufs only after
> > a certain threshold for rx. Optimize tx packet prefetch and free.
> <<snip>>
> > for (i = 0; i < sd->coalesce.idx; i++) {
> > - rte_pktmbuf_free(sd->coalesce.mbuf[i]);
> > + struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
> > +
> > + do {
> > + struct rte_mbuf *next = tmp->next;
> > +
> > + rte_pktmbuf_free_seg(tmp);
> > + tmp = next;
> > + } while (tmp);
> > sd->coalesce.mbuf[i] = NULL;
> Pardon my ignorance here, but rte_pktmbuf_free does this work. I can't
> actually see much difference between your rewrite of this block, and
> the implementation of rte_pktmbuf_free() (apart from moving your branch
> to the end of the function). Did your microbenchmarking really show this
> as an improvement?
>
> Thanks for your time,
> Aaron
rte_pktmbuf_free calls rte_mbuf_sanity_check, which does a lot of
checks. This additional check seems redundant for single-segment
packets, since rte_pktmbuf_free_seg also performs rte_mbuf_sanity_check.
Several PMDs already prefer to use rte_pktmbuf_free_seg directly over
rte_pktmbuf_free as it is faster.
The forwarding perf. improvement with only this particular block is
around 1 Mpps for 64B packets when using l3fwd with 8 queues.
Thanks,
Rahul
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Rahul Lakkireddy
> Sent: Monday, October 05, 2015 11:06 AM
> To: Aaron Conole
> Cc: dev@dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan
> Subject: Re: [dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G
>
> Hi Aaron,
>
> [...]
>
> rte_pktmbuf_free calls rte_mbuf_sanity_check which does a lot of
> checks.
Only when RTE_LIBRTE_MBUF_DEBUG is enabled in your config.
By default it is switched off.
> This additional check seems redundant for single segment
> packets since rte_pktmbuf_free_seg also performs rte_mbuf_sanity_check.
>
> Several PMDs already prefer to use rte_pktmbuf_free_seg directly over
> rte_pktmbuf_free as it is faster.
Other PMDs use rte_pktmbuf_free_seg() as each TD has a segment
associated with it. So once HW is done with the TD, SW frees the associated segment.
In your case I don't see any point in re-implementing rte_pktmbuf_free() manually,
and I don't think it would be any faster.
Konstantin
>
> The forwarding perf. improvement with only this particular block is
> around 1 Mpps for 64B packets when using l3fwd with 8 queues.
>
> Thanks,
> Rahul
Hi Konstantin,
On Monday, October 10/05/15, 2015 at 04:46:40 -0700, Ananyev, Konstantin wrote:
>
>
> > [...]
> >
> > rte_pktmbuf_free calls rte_mbuf_sanity_check which does a lot of
> > checks.
>
> Only when RTE_LIBRTE_MBUF_DEBUG is enabled in your config.
> By default it is switched off.
Right. I clearly missed this.
I am running with default config only btw.
>
> > This additional check seems redundant for single segment
> > packets since rte_pktmbuf_free_seg also performs rte_mbuf_sanity_check.
> >
> > Several PMDs already prefer to use rte_pktmbuf_free_seg directly over
> > rte_pktmbuf_free as it is faster.
>
> Other PMDs use rte_pktmbuf_free_seg() as each TD has an associated
> with it segment. So as HW is done with the TD, SW frees associated segment.
> In your case I don't see any point in re-implementing rte_pktmbuf_free() manually,
> and I don't think it would be any faster.
>
> Konstantin
As I mentioned below, I am clearly seeing a difference of 1 Mpps. And 1
Mpps is not a small difference IMHO.
When running l3fwd with 8 queues, I also collected a perf report.
When using rte_pktmbuf_free, I see that it eats up around 6% CPU in the
perf top report below:-
--------------------
32.00% l3fwd [.] cxgbe_poll
22.25% l3fwd [.] t4_eth_xmit
20.30% l3fwd [.] main_loop
6.77% l3fwd [.] rte_pktmbuf_free
4.86% l3fwd [.] refill_fl_usembufs
2.00% l3fwd [.] write_sgl
.....
--------------------
When using rte_pktmbuf_free_seg directly, however, I don't see the above
problem. The perf top report now comes out as:-
-------------------
33.36% l3fwd [.] cxgbe_poll
32.69% l3fwd [.] t4_eth_xmit
19.05% l3fwd [.] main_loop
5.21% l3fwd [.] refill_fl_usembufs
2.40% l3fwd [.] write_sgl
....
-------------------
I obviously missed the debug flag for rte_mbuf_sanity_check.
However, there is a clear difference of 1 Mpps. I don't know whether it's
the change from the while construct used in rte_pktmbuf_free to the
do..while construct that I used that is making the difference.
>
> >
> > The forwarding perf. improvement with only this particular block is
> > around 1 Mpps for 64B packets when using l3fwd with 8 queues.
> >
> > Thanks,
> > Rahul
Hi Rahul,
> -----Original Message-----
> From: Rahul Lakkireddy [mailto:rahul.lakkireddy@chelsio.com]
> Sent: Monday, October 05, 2015 1:42 PM
> To: Ananyev, Konstantin
> Cc: Aaron Conole; dev@dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan
> Subject: Re: [dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G
>
> Hi Konstantin,
>
> On Monday, October 10/05/15, 2015 at 04:46:40 -0700, Ananyev, Konstantin wrote:
> >
> >
> > > [...]
> > >
> > > rte_pktmbuf_free calls rte_mbuf_sanity_check which does a lot of
> > > checks.
> >
> > Only when RTE_LIBRTE_MBUF_DEBUG is enabled in your config.
> > By default it is switched off.
>
> Right. I clearly missed this.
> I am running with default config only btw.
>
> >
> > > This additional check seems redundant for single segment
> > > packets since rte_pktmbuf_free_seg also performs rte_mbuf_sanity_check.
> > >
> > > Several PMDs already prefer to use rte_pktmbuf_free_seg directly over
> > > rte_pktmbuf_free as it is faster.
> >
> > Other PMDs use rte_pktmbuf_free_seg() as each TD has an associated
> > with it segment. So as HW is done with the TD, SW frees associated segment.
> > In your case I don't see any point in re-implementing rte_pktmbuf_free() manually,
> > and I don't think it would be any faster.
> >
> > Konstantin
>
> As I mentioned below, I am clearly seeing a difference of 1 Mpps. And 1
> Mpps is not a small difference IMHO.
Agree with you here - it is a significant difference.
>
> When running l3fwd with 8 queues, I also collected a perf report.
> When using rte_pktmbuf_free, I see that it eats up around 6% cpu as
> below in perf top report:-
> --------------------
> 32.00% l3fwd [.] cxgbe_poll
> 22.25% l3fwd [.] t4_eth_xmit
> 20.30% l3fwd [.] main_loop
> 6.77% l3fwd [.] rte_pktmbuf_free
> 4.86% l3fwd [.] refill_fl_usembufs
> 2.00% l3fwd [.] write_sgl
> .....
> --------------------
>
> While, when using rte_pktmbuf_free_seg directly, I don't see above
> problem. perf top report now comes as:-
> -------------------
> 33.36% l3fwd [.] cxgbe_poll
> 32.69% l3fwd [.] t4_eth_xmit
> 19.05% l3fwd [.] main_loop
> 5.21% l3fwd [.] refill_fl_usembufs
> 2.40% l3fwd [.] write_sgl
> ....
> -------------------
I don't think these 6% disappeared anywhere.
As far as I can see, t4_eth_xmit() now increased by roughly the same
amount (you still have the same job to do).
To me it looks like in that case the compiler didn't really inline rte_pktmbuf_free().
I wonder, can you add the 'always_inline' attribute to rte_pktmbuf_free()
and see whether it makes any difference?
Konstantin
On Monday, October 10/05/15, 2015 at 07:09:27 -0700, Ananyev, Konstantin wrote:
> Hi Rahul,
[...]
> > > [...]
> >
> > As I mentioned below, I am clearly seeing a difference of 1 Mpps. And 1
> > Mpps is not a small difference IMHO.
>
> Agree with you here - it is a significant difference.
>
> >
> > When running l3fwd with 8 queues, I also collected a perf report.
> > When using rte_pktmbuf_free, I see that it eats up around 6% cpu as
> > below in perf top report:-
> > --------------------
> > 32.00% l3fwd [.] cxgbe_poll
> > 22.25% l3fwd [.] t4_eth_xmit
> > 20.30% l3fwd [.] main_loop
> > 6.77% l3fwd [.] rte_pktmbuf_free
> > 4.86% l3fwd [.] refill_fl_usembufs
> > 2.00% l3fwd [.] write_sgl
> > .....
> > --------------------
> >
> > While, when using rte_pktmbuf_free_seg directly, I don't see above
> > problem. perf top report now comes as:-
> > -------------------
> > 33.36% l3fwd [.] cxgbe_poll
> > 32.69% l3fwd [.] t4_eth_xmit
> > 19.05% l3fwd [.] main_loop
> > 5.21% l3fwd [.] refill_fl_usembufs
> > 2.40% l3fwd [.] write_sgl
> > ....
> > -------------------
>
> I don't think these 6% disappeared anywhere.
> As I can see, now t4_eth_xmit() increased by roughly same amount
> (you still have same job to do).
Right.
> To me it looks like in that case compiler didn't really inline rte_pktmbuf_free().
> Wonder can you add 'always_inline' attribute to the rte_pktmbuf_free(),
> and see would it make any difference?
>
> Konstantin
I will try out above and update further.
Thanks,
Rahul.
On Monday, October 10/05/15, 2015 at 20:37:31 +0530, Rahul Lakkireddy wrote:
> On Monday, October 10/05/15, 2015 at 07:09:27 -0700, Ananyev, Konstantin wrote:
> > Hi Rahul,
>
> [...]
>
> > [...]
> >
> > I don't think these 6% disappeared anywhere.
> > As I can see, now t4_eth_xmit() increased by roughly same amount
> > (you still have same job to do).
>
> Right.
>
> > To me it looks like in that case compiler didn't really inline rte_pktmbuf_free().
> > Wonder can you add 'always_inline' attribute to the rte_pktmbuf_free(),
> > and see would it make any difference?
> >
> > Konstantin
>
> I will try out above and update further.
>
Tried always_inline and didn't see any difference in performance on
RHEL 6.4 with gcc 4.4.7, but was seeing the 1 MPPS improvement with the
above block.
I've moved to the latest RHEL 7.1 with gcc 4.8.3 and tried both
always_inline and the above block, and I'm not seeing any difference
with either.
Will drop this block and submit a v2.
Thanks for the review Aaron and Konstantin.
Thanks,
Rahul
@@ -266,6 +266,18 @@
#define A_SGE_FL_BUFFER_SIZE2 0x104c
#define A_SGE_FL_BUFFER_SIZE3 0x1050
+#define A_SGE_FLM_CFG 0x1090
+
+#define S_CREDITCNT 4
+#define M_CREDITCNT 0x3U
+#define V_CREDITCNT(x) ((x) << S_CREDITCNT)
+#define G_CREDITCNT(x) (((x) >> S_CREDITCNT) & M_CREDITCNT)
+
+#define S_CREDITCNTPACKING 2
+#define M_CREDITCNTPACKING 0x3U
+#define V_CREDITCNTPACKING(x) ((x) << S_CREDITCNTPACKING)
+#define G_CREDITCNTPACKING(x) (((x) >> S_CREDITCNTPACKING) & M_CREDITCNTPACKING)
+
#define A_SGE_CONM_CTRL 0x1094
#define S_EGRTHRESHOLD 8
@@ -361,6 +373,10 @@
#define A_SGE_CONTROL2 0x1124
+#define S_IDMAARBROUNDROBIN 19
+#define V_IDMAARBROUNDROBIN(x) ((x) << S_IDMAARBROUNDROBIN)
+#define F_IDMAARBROUNDROBIN V_IDMAARBROUNDROBIN(1U)
+
#define S_INGPACKBOUNDARY 16
#define M_INGPACKBOUNDARY 0x7U
#define V_INGPACKBOUNDARY(x) ((x) << S_INGPACKBOUNDARY)
@@ -422,6 +422,13 @@ static int adap_init0_tweaks(struct adapter *adapter)
t4_set_reg_field(adapter, A_SGE_CONTROL, V_PKTSHIFT(M_PKTSHIFT),
V_PKTSHIFT(rx_dma_offset));
+ t4_set_reg_field(adapter, A_SGE_FLM_CFG,
+ V_CREDITCNT(M_CREDITCNT) | M_CREDITCNTPACKING,
+ V_CREDITCNT(3) | V_CREDITCNTPACKING(1));
+
+ t4_set_reg_field(adapter, A_SGE_CONTROL2, V_IDMAARBROUNDROBIN(1U),
+ V_IDMAARBROUNDROBIN(1U));
+
/*
* Don't include the "IP Pseudo Header" in CPL_RX_PKT checksums: Linux
* adds the pseudo header itself.
@@ -286,8 +286,7 @@ static void unmap_rx_buf(struct sge_fl *q)
static inline void ring_fl_db(struct adapter *adap, struct sge_fl *q)
{
- /* see if we have exceeded q->size / 4 */
- if (q->pend_cred >= (q->size / 4)) {
+ if (q->pend_cred >= 64) {
u32 val = adap->params.arch.sge_fl_db;
if (is_t4(adap->params.chip))
@@ -995,7 +994,14 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq,
int i;
for (i = 0; i < sd->coalesce.idx; i++) {
- rte_pktmbuf_free(sd->coalesce.mbuf[i]);
+ struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
+
+ do {
+ struct rte_mbuf *next = tmp->next;
+
+ rte_pktmbuf_free_seg(tmp);
+ tmp = next;
+ } while (tmp);
sd->coalesce.mbuf[i] = NULL;
}
}
@@ -1054,7 +1060,6 @@ out_free:
return 0;
}
- rte_prefetch0(&((&txq->q)->sdesc->mbuf->pool));
pi = (struct port_info *)txq->eth_dev->data->dev_private;
adap = pi->adapter;
@@ -1070,6 +1075,7 @@ out_free:
txq->stats.mapping_err++;
goto out_free;
}
+ rte_prefetch0((volatile void *)addr);
return tx_do_packet_coalesce(txq, mbuf, cflits, adap,
pi, addr);
} else {
@@ -1454,7 +1460,8 @@ static int process_responses(struct sge_rspq *q, int budget,
unsigned int params;
u32 val;
- __refill_fl(q->adapter, &rxq->fl);
+ if (fl_cap(&rxq->fl) - rxq->fl.avail >= 64)
+ __refill_fl(q->adapter, &rxq->fl);
params = V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX);
q->next_intr_params = params;
val = V_CIDXINC(cidx_inc) | V_SEINTARM(params);