[v10,06/12] pdump: support pcapng and filtering

Message ID 20210916222630.71543-7-stephen@networkplumber.org (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series Packet capture framework enhancements |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Stephen Hemminger Sept. 16, 2021, 10:26 p.m. UTC
  This enhances the DPDK pdump library to support new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of exposed API or ABI)
is intentionally increased to cause any attempt to try
mismatched primary/secondary process to fail.

Add new API to do allow filtering of captured packets with
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 441 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 437 insertions(+), 128 deletions(-)
  

Comments

Pattan, Reshma Sept. 23, 2021, 4:11 p.m. UTC | #1
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> 
> +++ b/lib/meson.build
> +        'bpf',
>          'bitratestats',

If alphabetical order,  should bpf come after bitratestats?

> +/*
> + * Note: version numbers intentionally start at 3
> + * in order to catch any application built with older out
> + * version of DPDK using incompatible client request format.
> + */
>  enum pdump_version {
> -	V1 = 1
> +	PDUMP_CLIENT_LEGACY = 3,
> +	PDUMP_CLIENT_PCAPNG = 4,
The version numbering was internal to library,  applications do not have control over  it, can't we start  enumeration from 1?

>  struct pdump_request {
> +	uint16_t flags;
Why is the flags type changed from unit32_t  unint16_t?

> 
> +		 * Similar behavior to rte_bpf_eth callback.
> +		 * if BPF program returns zero value for a given packet,
> +		 * then it will be ignored.
> +		 */
Looks like wrong callback name referred in the comment, should be corrected?

> +		if (cbs->filter && rcs[i] == 0) {
Why do we need to do this again if some packets already filtered.


> +	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
> +	      p->ver == PDUMP_CLIENT_PCAPNG)) {
> +		PDUMP_LOG(ERR,
> +			  "incorrect client version %u\n", p->ver);
> +		return -EINVAL;
> +	}
This check is not useful here I guess, as we are setting the version in the library itself below.

> 
> +pdump_prepare_client_request(const char *device, uint16_t queue,
> +	req->queue = queue;
This assignment is done below as well, so here it is redundant I guess?

> -	} else {
> +		req->queue = queue;
>  	}
> 


> +	if (pdump_stats == NULL) {
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +			PDUMP_LOG(ERR,
> +				  "pdump not initialized\n");
Might be god to say "pdump stats" not initialized  instead of just saying "pdump"?

> 
> +/**
> + * Retrieve the packet capture statistics for a queue.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param stats
> + *   A pointer to structure of type *rte_pdump_stats* to be filled in.
> + * @return
> + *   Zero if successful. -1 on error and rte_errno is set.
> + */
Missing below experimental warning in   the above comments .

> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
  
Stephen Hemminger Sept. 23, 2021, 4:58 p.m. UTC | #2
On Thu, 23 Sep 2021 16:11:42 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > +/*
> > + * Note: version numbers intentionally start at 3
> > + * in order to catch any application built with older out
> > + * version of DPDK using incompatible client request format.
> > + */
> >  enum pdump_version {
> > -	V1 = 1
> > +	PDUMP_CLIENT_LEGACY = 3,
> > +	PDUMP_CLIENT_PCAPNG = 4,  
> The version numbering was internal to library,  applications do not have control over  it, can't we start  enumeration from 1?

Although, DPDK does not support mixing versions between primary/secondary
process. Someone is sure to try.

I wanted to make sure that if user did something invalid like using
old pdump (built with DPDK 20.11) and new application that it would
fail in a direct manner.
  
Stephen Hemminger Sept. 23, 2021, 6:14 p.m. UTC | #3
On Thu, 23 Sep 2021 16:11:42 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> >  struct pdump_request {
> > +	uint16_t flags;  
> Why is the flags type changed from unit32_t  unint16_t?

Only to pack the structure. They were unused.
  
Stephen Hemminger Sept. 23, 2021, 6:23 p.m. UTC | #4
On Thu, 23 Sep 2021 16:11:42 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > 
> > +		 * Similar behavior to rte_bpf_eth callback.
> > +		 * if BPF program returns zero value for a given packet,
> > +		 * then it will be ignored.
> > +		 */  
> Looks like wrong callback name referred in the comment, should be corrected?

It really is pcap_offline_filter() and Linux kernel socket filter.

> > +		if (cbs->filter && rcs[i] == 0) {  
> Why do we need to do this again if some packets already filtered.

The earlier call (rte_bpf_exec_burst) returns the number of packets that were
processed. Actually, the return value there is always equal n.
So this code is the filtering, there was an issue with checking return value of exec_burst.
  
Pattan, Reshma Sept. 24, 2021, 3:33 p.m. UTC | #5
> -----Original Message-----
> > >
> > > +		 * Similar behavior to rte_bpf_eth callback.
> > > +		 * if BPF program returns zero value for a given packet,
> > > +		 * then it will be ignored.
> > > +		 */
> > Looks like wrong callback name referred in the comment, should be
> corrected?
> 
> It really is pcap_offline_filter() and Linux kernel socket filter.

Oh ok, rcs[i] is basically "return value of the filter program. This will be zero if the packet doesn't match the filter and non-zero if the packet matches the filter." 
got it now on why is the below if check.

> 
> > > +		if (cbs->filter && rcs[i] == 0) {
  

Patch

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..1da521ea6185 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@  libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@  libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@  libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@ 
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..abc28fcee0ad 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@ 
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@  enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with older out
+ * version of DPDK using incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@  static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@  pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@  pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@  pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@  set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@  set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@  set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +406,7 @@  pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +421,18 @@  pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@  pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@  pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +531,26 @@  pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +568,17 @@  pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function, because although original API
+ * left place holder for future filter, it never checked the value.
+ * Therefore the API can't depend on application passing a non
+ * bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@  rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@  rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@  rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@  rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@ 
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@  enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@  rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@  rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  Use BPF program to run to filter packes (can be NULL)
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@  rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  unused should be NULL
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@  rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  Use BPF program to run to filter packes (can be NULL)
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@  int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@  DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};