[dpdk-dev,RFC,v3] latencystats: added new library for latency stats

Message ID 1476711584-25377-1-git-send-email-reshma.pattan@intel.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Pattan, Reshma Oct. 17, 2016, 1:39 p.m. UTC
Library is designed to calculate latency stats and report them to the
application when queried. Library measures minimum, average, maximum
latencies and jitter in nano seconds.
Current implementation supports global latency stats, i.e. per application
stats.

Added new field to mbuf struct to mark the packet arrival time on Rx and
use the times tamp to measure the latency on Tx.

Modified dpdk-procinfo process to display the new stats.

APIs:

Added APIs to initialize and un initialize latency stats
calculation.
Added API to retrieve latency stats names and values.

Functionality:

*Library will register ethdev Rx/Tx callbacks for each active port,
queue combinations.
*Library will register latency stats names with new stats library, which is under
design for now.
*Rx packets will be marked with time stamp on each sampling interval.
*On Tx side, packets with time stamp will be considered for calculating
the minimum, maximum, average latencies and jitter.
*Average latency is calculated by summing all the latencies measured for each
time stamped packet and dividing that by total time stamped packets.
*Minimum and maximum latencies will be low and high latency values observed
so far.
*Jitter calculation is done based on inter packet delay variation.
*Measured stats can be retrieved via get API of the libray (or)
by calling generic get API of the new stats library, in this case
callback is provided to update the stats  into new stats library.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
---
 MAINTAINERS                                        |   4 +
 app/proc_info/main.c                               |  60 ++++
 app/test-pmd/testpmd.c                             |   9 +
 config/common_base                                 |  10 +
 lib/Makefile                                       |   1 +
 lib/librte_latencystats/Makefile                   |  57 ++++
 lib/librte_latencystats/rte_latencystats.c         | 378 +++++++++++++++++++++
 lib/librte_latencystats/rte_latencystats.h         | 141 ++++++++
 .../rte_latencystats_version.map                   |  10 +
 lib/librte_mbuf/rte_mbuf.h                         |   3 +
 mk/rte.app.mk                                      |   1 +
 11 files changed, 674 insertions(+)
 create mode 100644 lib/librte_latencystats/Makefile
 create mode 100644 lib/librte_latencystats/rte_latencystats.c
 create mode 100644 lib/librte_latencystats/rte_latencystats.h
 create mode 100644 lib/librte_latencystats/rte_latencystats_version.map
  

Comments

Pattan, Reshma Oct. 18, 2016, 10:44 a.m. UTC | #1
Hi,

The below latency stats RFC adds new field to second cache line of mbuf. This is not an ABI break.
But I would like to emphasize on this  point so you can have a look into the changes and provide comments.

Thanks,
Reshma

> -----Original Message-----
> From: Pattan, Reshma
> Sent: Monday, October 17, 2016 2:40 PM
> To: dev@dpdk.org
> Cc: Pattan, Reshma <reshma.pattan@intel.com>
> Subject: [RFC v3] latencystats: added new library for latency stats
> 
> Library is designed to calculate latency stats and report them to the application
> when queried. Library measures minimum, average, maximum latencies and
> jitter in nano seconds.
> Current implementation supports global latency stats, i.e. per application stats.
> 
> Added new field to mbuf struct to mark the packet arrival time on Rx and use the
> times tamp to measure the latency on Tx.
> 
> Modified dpdk-procinfo process to display the new stats.
> 
> APIs:
> 
> Added APIs to initialize and un initialize latency stats calculation.
> Added API to retrieve latency stats names and values.
> 
> Functionality:
> 
> *Library will register ethdev Rx/Tx callbacks for each active port, queue
> combinations.
> *Library will register latency stats names with new stats library, which is under
> design for now.
> *Rx packets will be marked with time stamp on each sampling interval.
> *On Tx side, packets with time stamp will be considered for calculating the
> minimum, maximum, average latencies and jitter.
> *Average latency is calculated by summing all the latencies measured for each
> time stamped packet and dividing that by total time stamped packets.
> *Minimum and maximum latencies will be low and high latency values observed
> so far.
> *Jitter calculation is done based on inter packet delay variation.
> *Measured stats can be retrieved via get API of the libray (or) by calling generic
> get API of the new stats library, in this case callback is provided to update the
> stats  into new stats library.
> 
> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f5fa82..2db3365 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -698,3 +698,7 @@  F: examples/tep_termination/
 F: examples/vmdq/
 F: examples/vmdq_dcb/
 F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+
+Latency Stats
+M: Reshma Pattan <reshma.pattan@intel.com>
+F: lib/librte_latencystats/
diff --git a/app/proc_info/main.c b/app/proc_info/main.c
index 2c56d10..56c9509 100644
--- a/app/proc_info/main.c
+++ b/app/proc_info/main.c
@@ -57,6 +57,7 @@ 
 #include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
+#include <rte_stats.h>
 
 /* Maximum long option length for option parsing. */
 #define MAX_LONG_OPT_SZ 64
@@ -75,6 +76,8 @@  static uint32_t reset_xstats;
 /**< Enable memory info. */
 static uint32_t mem_info;
 
+static uint32_t enable_latnbit_stats = 1;
+
 /**< display usage */
 static void
 proc_info_usage(const char *prgname)
@@ -301,6 +304,60 @@  nic_xstats_clear(uint8_t port_id)
 	printf("\n  NIC extended statistics for port %d cleared\n", port_id);
 }
 
+static void
+latnbit_stats_display(void)
+{
+	struct rte_stat_value *lat_stats;
+	struct rte_stat_name *names;
+	int len, ret;
+	static const char *nic_stats_border = "########################";
+
+	memset(&lat_stats, 0, sizeof(struct rte_stat_value));
+	len = rte_stats_get_names(NULL, 0);
+	if (len < 0) {
+		printf("Cannot get latency and bitrate stats count\n");
+		return;
+	}
+
+	lat_stats = malloc(sizeof(struct rte_stat_value) * len);
+	if (lat_stats == NULL) {
+		printf("Cannot allocate memory for latency and bitrate stats\n");
+		return;
+	}
+
+	names =  malloc(sizeof(struct rte_stat_name) * len);
+	if (names == NULL) {
+		printf("Cannot allocate memory for latency and bitrate stats names\n");
+		free(lat_stats);
+		return;
+	}
+
+	if (len != rte_stats_get_names(names, len)) {
+		printf("Cannot get latency and bitrate stats names\n");
+		free(lat_stats);
+		free(names);
+		return;
+	}
+
+	printf("###### Latency and bitrate statistics #########\n");
+	printf("%s############################\n", nic_stats_border);
+	ret = rte_stats_get_values(RTE_STATS_NONPORT, lat_stats, len);
+	if (ret < 0 || ret > len) {
+		printf("Cannot get latency and bitrate stats\n");
+		free(lat_stats);
+		free(names);
+		return;
+	}
+
+	int i;
+	for (i = 0; i < len; i++)
+		printf("%s: %"PRIu64"\n", names[i].name, lat_stats[i].value);
+
+	printf("%s############################\n", nic_stats_border);
+	free(lat_stats);
+	free(names);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -363,5 +420,8 @@  main(int argc, char **argv)
 		}
 	}
 
+	if (enable_latnbit_stats)
+		latnbit_stats_display();
+
 	return 0;
 }
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6185be6..09dea69 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -78,6 +78,9 @@ 
 #ifdef RTE_LIBRTE_PDUMP
 #include <rte_pdump.h>
 #endif
+#ifdef RTE_LIBRTE_LATENCY_STATS
+#include <rte_latencystats.h>
+#endif
 
 #include "testpmd.h"
 
@@ -2070,6 +2073,9 @@  signal_handler(int signum)
 		/* uninitialize packet capture framework */
 		rte_pdump_uninit();
 #endif
+#ifdef RTE_LIBRTE_LATENCY_STATS
+		rte_latencystats_uninit();
+#endif
 		force_quit();
 		/* exit with the expected status */
 		signal(signum, SIG_DFL);
@@ -2128,6 +2134,9 @@  main(int argc, char** argv)
 	FOREACH_PORT(port_id, ports)
 		rte_eth_promiscuous_enable(port_id);
 
+#ifdef RTE_LIBRTE_LATENCY_STATS
+	rte_latencystats_init(1, NULL);
+#endif
 #ifdef RTE_LIBRTE_CMDLINE
 	if (interactive == 1) {
 		if (auto_start) {
diff --git a/config/common_base b/config/common_base
index 81b8e1e..9584727 100644
--- a/config/common_base
+++ b/config/common_base
@@ -557,9 +557,19 @@  CONFIG_RTE_LIBRTE_PDUMP=y
 #
 # Compile vhost user library
 #
+# Compile vhost library
+# fuse-devel is needed to run vhost-cuse.
+# fuse-devel enables user space char driver development
+# vhost-user is turned on by default.
+#
 CONFIG_RTE_LIBRTE_VHOST=n
 CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
+#
+# Compile the latency stats library
+#
+CONFIG_RTE_LIBRTE_LATENCY_STATS=y
+
 
 #
 # Compile vhost PMD
diff --git a/lib/Makefile b/lib/Makefile
index cbaeefa..7af7e1c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@  DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_STATS) += librte_stats
+DIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += librte_latencystats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_latencystats/Makefile b/lib/librte_latencystats/Makefile
new file mode 100644
index 0000000..2dd0381
--- /dev/null
+++ b/lib/librte_latencystats/Makefile
@@ -0,0 +1,57 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_latencystats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+LDLIBS += -lm
+LDLIBS += -lpthread
+
+EXPORT_MAP := rte_latencystats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) := rte_latencystats.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)-include := rte_latencystats.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_stats
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_latencystats/rte_latencystats.c b/lib/librte_latencystats/rte_latencystats.c
new file mode 100644
index 0000000..3d8b053
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats.c
@@ -0,0 +1,378 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <stdbool.h>
+#include <math.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_log.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_stats.h>
+#include <rte_memzone.h>
+#include <rte_lcore.h>
+#include <rte_timer.h>
+
+#include "rte_latencystats.h"
+
+/** Nano seconds per second */
+#define NS_PER_SEC 1E9
+
+/** Clock cycles per nano second */
+#define CYCLES_PER_NS (rte_get_timer_hz() / NS_PER_SEC)
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_LATENCY_STATS RTE_LOGTYPE_USER1
+
+static pthread_t latency_stats_thread;
+static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats";
+static int latency_stats_index;
+static uint64_t sampIntvl;
+static uint64_t timer_tsc;
+static uint64_t prev_tsc;
+
+struct rte_latency_stats {
+	float min_latency; /**< Minimum latency in nano seconds */
+	float avg_latency; /**< Average latency in nano seconds */
+	float max_latency; /**< Maximum latency in nano seconds */
+	float jitter; /** Latency variation between to packets */
+	uint64_t total_sampl_pkts;
+} *glob_stats;
+
+static struct rxtx_cbs {
+	struct rte_eth_rxtx_callback *cb;
+} rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+	tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+
+struct latency_stats_nameoff {
+	char name[RTE_ETH_XSTATS_NAME_SIZE];
+	unsigned int offset;
+};
+
+static const struct latency_stats_nameoff lat_stats_strings[] = {
+	{"min_latency_ns", offsetof(struct rte_latency_stats, min_latency)},
+	{"avg_latency_ns", offsetof(struct rte_latency_stats, avg_latency)},
+	{"max_latency_ns", offsetof(struct rte_latency_stats, max_latency)},
+	{"jitter_ns", offsetof(struct rte_latency_stats, jitter)},
+};
+
+#define NUM_LATENCY_STATS (sizeof(lat_stats_strings) / \
+				sizeof(lat_stats_strings[0]))
+
+static __attribute__((noreturn)) void *
+report_latency_stats(__rte_unused void *arg)
+{
+	for (;;) {
+		unsigned int i;
+		float *stats_ptr = NULL;
+		uint64_t values[NUM_LATENCY_STATS] = {0};
+		int ret;
+
+		for (i = 0; i < NUM_LATENCY_STATS; i++) {
+			stats_ptr = RTE_PTR_ADD(glob_stats,
+					lat_stats_strings[i].offset);
+			if (i == 1)
+				values[i] = (uint64_t)floor((*stats_ptr)/
+				(CYCLES_PER_NS * glob_stats->total_sampl_pkts));
+			else
+				values[i] = (uint64_t)floor((*stats_ptr)/
+						CYCLES_PER_NS);
+		}
+
+		ret = rte_stats_update_metrics(RTE_STATS_NONPORT, latency_stats_index,
+				values, NUM_LATENCY_STATS);
+		if (ret < 0)
+			RTE_LOG(INFO, LATENCY_STATS, "Failed to push the stats\n");
+	}
+}
+
+static void
+rte_latencystats_fill_values(struct rte_stat_value *values)
+{
+	unsigned int i;
+	float *stats_ptr = NULL;
+
+	for (i = 0; i < NUM_LATENCY_STATS; i++) {
+		stats_ptr = RTE_PTR_ADD(glob_stats,
+				lat_stats_strings[i].offset);
+		values[i].key = i;
+		if (i == 1)
+			values[i].value = (uint64_t)floor((*stats_ptr)/
+			(CYCLES_PER_NS * glob_stats->total_sampl_pkts));
+		else
+			values[i].value = (uint64_t)floor((*stats_ptr)/
+						CYCLES_PER_NS);
+	}
+}
+
+static uint16_t
+add_time_stamps(uint8_t pid __rte_unused,
+		uint16_t qid __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		uint16_t max_pkts __rte_unused,
+		void *user_cb __rte_unused)
+{
+	unsigned int i;
+	uint64_t diff_tsc, now;
+
+	/* for every sample interval,
+	 * time stamp is marked on one received packet
+	 */
+	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		diff_tsc = now - prev_tsc;
+		timer_tsc += diff_tsc;
+		if (timer_tsc >= sampIntvl) {
+			pkts[i]->time_arrived = now;
+			timer_tsc = 0;
+		}
+		prev_tsc = now;
+		now = rte_rdtsc();
+	}
+
+	return nb_pkts;
+}
+
+static uint16_t
+calc_latency(uint8_t pid __rte_unused,
+		uint16_t qid __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *_ __rte_unused)
+{
+	unsigned int i, cnt = 0;
+	uint64_t now;
+	float latency[nb_pkts];
+	static float prev_latency;
+
+	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		if (pkts[i]->time_arrived)
+			latency[cnt++] = now - pkts[i]->time_arrived;
+	}
+
+	for (i = 0; i < cnt; i++)
+	{
+		/**
+		*The jitter is calculated as statistical mean of interpacket
+		*delay variation. The "jitter estimate" is computed by taking the
+		*absolute values of the ipdv sequence and applying an exponential
+		*filter with parameter 1/16 to generate the estimate. i.e
+		*J=J+(|D(i-1,i)|-J)/16. Where J is jitter, D(i-1,i) is difference in
+		*latency of two consecutive packets i-1 and i.
+		*Reference: Calculated as per RFC 5481, sec 4.1, RFC 3393 sec 4.5,
+		*RFC 1889 sec.
+		*/
+		glob_stats->jitter +=  (abs(prev_latency - latency[i])
+					- glob_stats->jitter)/16;
+		if (glob_stats->min_latency == 0)
+			glob_stats->min_latency = latency[i];
+		else if (latency[i] < glob_stats->min_latency)
+			glob_stats->min_latency = latency[i];
+		else if (latency[i] > glob_stats->max_latency)
+			glob_stats->max_latency = latency[i];
+		glob_stats->avg_latency += latency[i]/16;
+		glob_stats->total_sampl_pkts++;
+		prev_latency = latency[i];
+	}
+
+	return nb_pkts;
+}
+
+int
+rte_latencystats_init(uint64_t samp_intvl,
+		rte_latency_stats_flow_type_fn user_cb)
+{
+	unsigned int i;
+	uint8_t pid;
+	uint16_t qid;
+	struct rxtx_cbs *cbs = NULL;
+	const uint8_t nb_ports = rte_eth_dev_count();
+	const char *ptr_strings[NUM_LATENCY_STATS] = {0};
+	const struct rte_memzone *mz = NULL;
+	const unsigned int flags = 0;
+
+	/** Allocate stats in shared memory fo muliti process support */
+	mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*glob_stats),
+					rte_socket_id(), flags);
+	if (mz == NULL) {
+		RTE_LOG(ERR, LATENCY_STATS, "Cannot reserve memory: %s:%d\n",
+			__func__, __LINE__);
+		return -ENOMEM;
+	}
+
+	glob_stats = mz->addr;
+	samp_intvl *= CYCLES_PER_NS;
+
+	/** Register latency stats with stats library */
+	for (i = 0; i < NUM_LATENCY_STATS; i++)
+		ptr_strings[i] = lat_stats_strings[i].name;
+
+	latency_stats_index = rte_stats_reg_metrics(ptr_strings,
+							NUM_LATENCY_STATS);
+	if (latency_stats_index < 0) {
+		RTE_LOG(DEBUG, LATENCY_STATS,
+			"Failed to register latency stats names\n");
+		return -1;
+	}
+
+	/** Register Rx/Tx callbacks */
+	for (pid = 0; pid < nb_ports; pid++) {
+		struct rte_eth_dev_info dev_info;
+		rte_eth_dev_info_get(pid, &dev_info);
+		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
+			cbs = &rx_cbs[pid][qid];
+			cbs->cb = rte_eth_add_first_rx_callback(pid, qid,
+					add_time_stamps, user_cb);
+			if (!cbs->cb)
+				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
+					"register Rx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
+			cbs = &tx_cbs[pid][qid];
+			cbs->cb =  rte_eth_add_tx_callback(pid, qid,
+					calc_latency, user_cb);
+			if (!cbs->cb)
+				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
+					"register Tx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+	}
+
+	int ret = 0;
+	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+
+	/** Create the host thread to update latency stats to stats library */
+	ret = pthread_create(&latency_stats_thread, NULL, report_latency_stats, NULL);
+	if (ret != 0) {
+		RTE_LOG(ERR, LATENCY_STATS,
+			"Failed to create the latency stats thread:%s, %s:%d\n",
+			strerror(errno), __func__, __LINE__);
+		return -1;
+	}
+	/** Set thread_name for aid in debugging */
+	snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "latency-stats-thread");
+	ret = rte_thread_setname(latency_stats_thread, thread_name);
+	if (ret != 0)
+		RTE_LOG(DEBUG, LATENCY_STATS,
+			"Failed to set thread name for latency stats handling\n");
+
+	return 0;
+}
+
+int
+rte_latencystats_uninit(void)
+{
+	uint8_t pid;
+	uint16_t qid;
+	int ret = 0;
+	struct rxtx_cbs *cbs = NULL;
+	const uint8_t nb_ports = rte_eth_dev_count();
+
+	/** De register Rx/Tx callbacks */
+	for (pid = 0; pid < nb_ports; pid++) {
+		struct rte_eth_dev_info dev_info;
+		rte_eth_dev_info_get(pid, &dev_info);
+		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
+			cbs = &rx_cbs[pid][qid];
+			ret = rte_eth_remove_rx_callback(pid, qid, cbs->cb);
+			if (ret)
+				RTE_LOG(INFO, LATENCY_STATS, "failed to "
+					"remove Rx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
+			cbs = &tx_cbs[pid][qid];
+			ret = rte_eth_remove_tx_callback(pid, qid, cbs->cb);
+			if (ret)
+				RTE_LOG(INFO, LATENCY_STATS, "failed to "
+					"remove Tx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+	}
+
+	/** Cancel the thread */
+	ret = pthread_cancel(latency_stats_thread);
+	if (ret != 0) {
+		RTE_LOG(ERR, LATENCY_STATS,
+			"Failed to cancel latency stats update thread:"
+			"%s,%s:%d\n",
+			strerror(errno), __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_latencystats_get_names(struct rte_stat_name *names, uint16_t size)
+{
+	unsigned int i;
+
+	if (names == NULL || size < NUM_LATENCY_STATS)
+		return NUM_LATENCY_STATS;
+
+	for (i = 0; i < NUM_LATENCY_STATS; i++)
+		snprintf(names[i].name, sizeof(names[i].name),
+				"%s", lat_stats_strings[i].name);
+
+	return NUM_LATENCY_STATS;
+}
+
+int
+rte_latencystats_get(struct rte_stat_value *values, uint16_t size)
+{
+	if (size < NUM_LATENCY_STATS || values == NULL)
+		return NUM_LATENCY_STATS;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		const struct rte_memzone *mz;
+		mz = rte_memzone_lookup(MZ_RTE_LATENCY_STATS);
+		if (mz == NULL) {
+			RTE_LOG(ERR, LATENCY_STATS,
+				"Latency stats memzone not found\n");
+			return -ENOMEM;
+		}
+		glob_stats =  mz->addr;
+	}
+
+	/* Retrieve latency stats */
+	rte_latencystats_fill_values(values);
+
+	return NUM_LATENCY_STATS;
+}
diff --git a/lib/librte_latencystats/rte_latencystats.h b/lib/librte_latencystats/rte_latencystats.h
new file mode 100644
index 0000000..9384c4a
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats.h
@@ -0,0 +1,141 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_LATENCYSTATS_H_
+#define _RTE_LATENCYSTATS_H_
+
+/**
+ * @file
+ * RTE latency stats
+ *
+ * library to provide application and flow based latency stats.
+ */
+
+#include <rte_stats.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Function type used for identifting flow types of a Rx packet.
+ *
+ * The callback function is called on Rx for each packet.
+ * This function is used for flow based latency calculations.
+ *
+ * @param pkt
+ *   Packet that has to be identified with its flow types.
+ * @param user_param
+ *   The arbitrary user parameter passed in by the application when
+ *   the callback was originally configured.
+ * @return
+ *   The flow_mask, representing the multiple flow types of a packet.
+ */
+typedef uint16_t (*rte_latency_stats_flow_type_fn)(struct rte_mbuf *pkt,
+							void *user_param);
+
+/**
+ * @internal
+ *  Registers Rx/Tx callbacks for each active port, queue.
+ *
+ * @param sampIntvl
+ *  Sampling time period in nano seconds, at which packet
+ *  should be marked with time stamp.
+ * @param user_cb
+ *  User callback to be called to get flow types of a packet.
+ *  Used for flow based latency calculation.
+ *  If the value is NULL, global stats will be calculated,
+ *  else flow based stats will be calculated.
+ *  @return
+ *   -1     : On error
+ *   -ENOMEM: On error
+ *    0     : On sucess
+ */
+int rte_latencystats_init(uint64_t samp_intvl,
+			rte_latency_stats_flow_type_fn user_cb);
+
+/**
+ * @internal
+ *  Removes registered Rx/Tx callbacks for each active port, queue.
+ *  @return
+ *   -1: On error
+ *    0: On suces
+ */
+int rte_latencystats_uninit(void);
+
+/**
+ * Retrieve names of latency statistics
+ *
+ * @param names
+ *  Block of memory to insert names into. Must be at least size in capacity.
+ *  If set to NULL, function returns required capacity.
+ * @param size
+ *  Capacity of latency stats names (number of names).
+ * @return
+ *   - positive value lower or equal to size: success. The return value
+ *     is the number of entries filled in the stats table.
+ *   - positive value higher than size: error, the given statistics table
+ *     is too small. The return value corresponds to the size that should
+ *     be given to succeed. The entries in the table are not valid and
+ *     shall not be used by the caller.
+ */
+int rte_latencystats_get_names(struct rte_stat_name *names,
+				uint16_t size);
+
+/**
+ * Retrieve latency statistics.
+ *
+ * @param values
+ *   A pointer to a table of structure of type *rte_stat_value*
+ *   to be filled with latency statistics ids and values.
+ *   This parameter can be set to NULL if size is 0.
+ * @param size
+ *   The size of the stats table, which should be large enough to store
+ *   all the latency stats.
+ * @return
+ *   - positive value lower or equal to size: success. The return value
+ *     is the number of entries filled in the stats table.
+ *   - positive value higher than size: error, the given statistics table
+ *     is too small. The return value corresponds to the size that should
+ *     be given to succeed. The entries in the table are not valid and
+ *     shall not be used by the caller.
+ *   -ENOMEM: On failure.
+ */
+int rte_latencystats_get(struct rte_stat_value *values,
+			uint16_t size);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_LATENCYSTATS_H_ */
diff --git a/lib/librte_latencystats/rte_latencystats_version.map b/lib/librte_latencystats/rte_latencystats_version.map
new file mode 100644
index 0000000..82dc5a7
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats_version.map
@@ -0,0 +1,10 @@ 
+DPDK_16.11 {
+	global:
+
+	rte_latencystats_get;
+	rte_latencystats_get_names;
+	rte_latencystats_init;
+	rte_latencystats_uninit;
+
+	local: *;
+};
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..4933bfc 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -493,6 +493,9 @@  struct rte_mbuf {
 
 	/** Timesync flags for use with IEEE1588. */
 	uint16_t timesync;
+
+	/** Timestamp for measuring latency stats. */
+	uint64_t time_arrived;
 } __rte_cache_aligned;
 
 /**
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 07ea98f..8c7d3d8 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -69,6 +69,7 @@  _LDLIBS-$(CONFIG_RTE_LIBRTE_TABLE)          += -lrte_table
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PORT)           += -lrte_port
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PDUMP)          += -lrte_pdump
+_LDLIBS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)   += -lrte_latencystats
 _LDLIBS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)    += -lrte_distributor
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IP_FRAG)        += -lrte_ip_frag