[dpdk-dev] Fixed spam from kni_allocate_mbufs() when no mbufs are free. If mbufs exhausted, 'out of memory' message logged at EXTREMELY high rates. Now logs no more than once per 10 mins

Message ID 1418824622-9212-1-git-send-email-rolette@infiniteio.com (mailing list archive)
State Changes Requested, archived
Headers

Commit Message

Jay Rolette Dec. 17, 2014, 1:57 p.m. UTC
  Signed-off-by: Jay Rolette <rolette@infiniteio.com>
---
 lib/librte_kni/rte_kni.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
  

Comments

Stephen Hemminger April 14, 2015, 1:09 a.m. UTC | #1
On Wed, 17 Dec 2014 07:57:02 -0600
Jay Rolette <rolette@infiniteio.com> wrote:

> Signed-off-by: Jay Rolette <rolette@infiniteio.com>
> ---
>  lib/librte_kni/rte_kni.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> index fdb7509..f89319c 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -40,6 +40,7 @@
>  #include <unistd.h>
>  #include <sys/ioctl.h>
>  
> +#include <rte_cycles.h>
>  #include <rte_spinlock.h>
>  #include <rte_string_fns.h>
>  #include <rte_ethdev.h>
> @@ -61,6 +62,9 @@
>  
>  #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)
>  
> +// Configure how often we log "out of memory" messages (in seconds)
> +#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
> +
>  /**
>   * KNI context
>   */
> @@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
>  static void
>  kni_allocate_mbufs(struct rte_kni *kni)
>  {
> +	static uint64_t no_mbufs = 0;
> +	static uint64_t spam_filter = 0;
> +	static uint64_t delayPeriod = 0;
> +
>  	int i, ret;
>  	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
>  
> @@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
>  		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>  		if (unlikely(pkts[i] == NULL)) {
>  			/* Out of memory */
> -			RTE_LOG(ERR, KNI, "Out of memory\n");
> +			no_mbufs++;
> +
> +			// Memory leak or need to tune? Regardless, if we get here once,
> +			// we will get here a *lot*. Don't spam the logs!
> +			now = rte_get_tsc_cycles();
> +			if (!delayPeriod)
> +			    delayPeriod = rte_get_tsc_hz() * KNI_SPAM_SUPPRESSION_PERIOD;
> +
> +			if (!spam_filter || (now - spam_filter) > delayPeriod) {
> +				RTE_LOG(ERR, KNI, "No mbufs available (%llu)\n", (unsigned long long)no_mbufs);
> +				spam_filter = now;
> +			}
>  			break;
>  		}
>  	}

I agree whole completely with the intent of this.
But just remove the log message completely. It doesn't
help at all, use a statistic instead.

If you want to do ratelimiting, then it is better to create
a common function (see net_ratelimit() in Linux kernel) to
have all code use the same method, rather than reinventing private
code to do it.

Minor style complaints:
  * don't use camelCase, it goes against the style of the rest of the code.
  * don't use C++ style comments.
  * always use rte_cycles() not TSC for things like this.


Please resubmit removing the log message and adding a statistic.
  
Jay Rolette April 14, 2015, 12:58 p.m. UTC | #2
Hi Stephen,

Thanks for the feedback. Comments and questions inline below.

Jay

On Mon, Apr 13, 2015 at 8:09 PM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Wed, 17 Dec 2014 07:57:02 -0600
> Jay Rolette <rolette@infiniteio.com> wrote:
>
> > Signed-off-by: Jay Rolette <rolette@infiniteio.com>
> > ---
> >  lib/librte_kni/rte_kni.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> > index fdb7509..f89319c 100644
> > --- a/lib/librte_kni/rte_kni.c
> > +++ b/lib/librte_kni/rte_kni.c
> > @@ -40,6 +40,7 @@
> >  #include <unistd.h>
> >  #include <sys/ioctl.h>
> >
> > +#include <rte_cycles.h>
> >  #include <rte_spinlock.h>
> >  #include <rte_string_fns.h>
> >  #include <rte_ethdev.h>
> > @@ -61,6 +62,9 @@
> >
> >  #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)
> >
> > +// Configure how often we log "out of memory" messages (in seconds)
> > +#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
> > +
> >  /**
> >   * KNI context
> >   */
> > @@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
> >  static void
> >  kni_allocate_mbufs(struct rte_kni *kni)
> >  {
> > +     static uint64_t no_mbufs = 0;
> > +     static uint64_t spam_filter = 0;
> > +     static uint64_t delayPeriod = 0;
> > +
> >       int i, ret;
> >       struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
> >
> > @@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
> >               pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
> >               if (unlikely(pkts[i] == NULL)) {
> >                       /* Out of memory */
> > -                     RTE_LOG(ERR, KNI, "Out of memory\n");
> > +                     no_mbufs++;
> > +
> > +                     // Memory leak or need to tune? Regardless, if we
> get here once,
> > +                     // we will get here a *lot*. Don't spam the logs!
> > +                     now = rte_get_tsc_cycles();
> > +                     if (!delayPeriod)
> > +                         delayPeriod = rte_get_tsc_hz() *
> KNI_SPAM_SUPPRESSION_PERIOD;
> > +
> > +                     if (!spam_filter || (now - spam_filter) >
> delayPeriod) {
> > +                             RTE_LOG(ERR, KNI, "No mbufs available
> (%llu)\n", (unsigned long long)no_mbufs);
> > +                             spam_filter = now;
> > +                     }
> >                       break;
> >               }
> >       }
>
> I agree whole completely with the intent of this.
> But just remove the log message completely. It doesn't
> help at all, use a statistic instead.
>

I'm fine with removing the log message completely. Can you point me to
where DPDK keeps stats generally? Stats like this are only useful if they
are accessible from some sort of monitoring process. There aren't any stats
in struct rte_kni right now.

If you want to do ratelimiting, then it is better to create
> a common function (see net_ratelimit() in Linux kernel) to
> have all code use the same method, rather than reinventing private
> code to do it.
>

I'll remove it.


> Minor style complaints:
>   * don't use camelCase, it goes against the style of the rest of the code.
>

ok


>   * don't use C++ style comments.
>

I didn't. I used C99 style comments :)


>   * always use rte_cycles() not TSC for things like this.
>

I don't see rte_cycles() defined anywhere. Did you mean some other function?

Please resubmit removing the log message and adding a statistic.
>
  

Patch

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index fdb7509..f89319c 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -40,6 +40,7 @@ 
 #include <unistd.h>
 #include <sys/ioctl.h>
 
+#include <rte_cycles.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
 #include <rte_ethdev.h>
@@ -61,6 +62,9 @@ 
 
 #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)
 
+// Configure how often we log "out of memory" messages (in seconds)
+#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
+
 /**
  * KNI context
  */
@@ -592,6 +596,10 @@  kni_free_mbufs(struct rte_kni *kni)
 static void
 kni_allocate_mbufs(struct rte_kni *kni)
 {
+	static uint64_t no_mbufs = 0;
+	static uint64_t spam_filter = 0;
+	static uint64_t delayPeriod = 0;
+
 	int i, ret;
 	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
 
@@ -620,7 +628,18 @@  kni_allocate_mbufs(struct rte_kni *kni)
 		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
 		if (unlikely(pkts[i] == NULL)) {
 			/* Out of memory */
-			RTE_LOG(ERR, KNI, "Out of memory\n");
+			no_mbufs++;
+
+			// Memory leak or need to tune? Regardless, if we get here once,
+			// we will get here a *lot*. Don't spam the logs!
+			now = rte_get_tsc_cycles();
+			if (!delayPeriod)
+			    delayPeriod = rte_get_tsc_hz() * KNI_SPAM_SUPPRESSION_PERIOD;
+
+			if (!spam_filter || (now - spam_filter) > delayPeriod) {
+				RTE_LOG(ERR, KNI, "No mbufs available (%llu)\n", (unsigned long long)no_mbufs);
+				spam_filter = now;
+			}
 			break;
 		}
 	}