[dpdk-dev,v3,2/4] net/mrvl: add mrvl net pmd driver

Message ID 1507031500-11473-3-git-send-email-tdu@semihalf.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Tomasz Duszynski Oct. 3, 2017, 11:51 a.m. UTC
  Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
The driver is based on the external, publicly available, lightweight
Marvell MUSDK library, which provides access to the network packet
processor.

The driver comes with support for the following features:

* Speed capabilities
* Link status
* Queue start/stop
* MTU update
* Jumbo frame
* Promiscuous mode
* Allmulticast mode
* Unicast MAC filter
* Multicast MAC filter
* RSS hash
* VLAN filter
* CRC offload
* L3 checksum offload
* L4 checksum offload
* Packet type parsing
* Basic stats
* Stats per queue

The driver was engineered cooperatively by the Semihalf and Marvell teams.

Semihalf:
Jacek Siuda <jck@semihalf.com>
Tomasz Duszynski <tdu@semihalf.com>

Marvell:
Dmitri Epshtein <dima@marvell.com>
Natalie Samsonov <nsamsono@marvell.com>

Signed-off-by: Jacek Siuda <jck@semihalf.com>
Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
---
 v3:
 * Removed LINE_SPACING, MULTILINE_DEREFERENCE and SPLIT_STRING
   checkpatch warnings.
 * Removed unnecessary forward declarations.
 * Fixed whitespace warnings.

 v2:
 * Removed a bunch of checkpatch warnings about unnecessary parentheses.

 config/common_base                        |    7 +
 drivers/net/Makefile                      |    2 +
 drivers/net/mrvl/Makefile                 |   69 +
 drivers/net/mrvl/mrvl_ethdev.c            | 2274 +++++++++++++++++++++++++++++
 drivers/net/mrvl/mrvl_ethdev.h            |  114 ++
 drivers/net/mrvl/mrvl_qos.c               |  628 ++++++++
 drivers/net/mrvl/mrvl_qos.h               |  112 ++
 drivers/net/mrvl/rte_pmd_mrvl_version.map |    3 +
 mk/rte.app.mk                             |    1 +
 9 files changed, 3210 insertions(+)
 create mode 100644 drivers/net/mrvl/Makefile
 create mode 100644 drivers/net/mrvl/mrvl_ethdev.c
 create mode 100644 drivers/net/mrvl/mrvl_ethdev.h
 create mode 100644 drivers/net/mrvl/mrvl_qos.c
 create mode 100644 drivers/net/mrvl/mrvl_qos.h
 create mode 100644 drivers/net/mrvl/rte_pmd_mrvl_version.map

--
2.7.4
  

Comments

Ferruh Yigit Oct. 4, 2017, 12:24 a.m. UTC | #1
On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> Driver is based on external, publicly available, light-weight Marvell
> MUSDK library that provides access to network packet processor.
> 
> Driver comes with support for the following features:
> 
> * Speed capabilities
> * Link status
> * Queue start/stop
> * MTU update
> * Jumbo frame
> * Promiscuous mode
> * Allmulticast mode
> * Unicast MAC filter
> * Multicast MAC filter
> * RSS hash
> * VLAN filter
> * CRC offload
> * L3 checksum offload
> * L4 checksum offload
> * Packet type parsing
> * Basic stats
> * Stats per queue

I have more detailed comments, but at a high level,
what do you think about splitting this patch into three patches:
- Skeleton
- Add Rx/Tx support
- Add features, like MTU update, promiscuous mode, etc.

> 
> Driver was engineered cooperatively by Semihalf and Marvell teams.
> 
> Semihalf:
> Jacek Siuda <jck@semihalf.com>
> Tomasz Duszynski <tdu@semihalf.com>
> 
> Marvell:
> Dmitri Epshtein <dima@marvell.com>
> Natalie Samsonov <nsamsono@marvell.com>
> 
> Signed-off-by: Jacek Siuda <jck@semihalf.com>
> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>

<...>

> +static struct rte_vdev_driver pmd_mrvl_drv = {
> +	.probe = rte_pmd_mrvl_probe,
> +	.remove = rte_pmd_mrvl_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);

Please help me understand.

This driver is implemented as a virtual driver because:
with the help of custom kernel modules, the musdk library already provides
userspace datapath support, and this PMD is an interface to the musdk
library. Is this correct?

If so, just thinking out loud:
- Why not implement this PMD directly on top of the kernel interface,
removing the musdk layer completely?
- How big a problem is it that this PMD depends on custom kernel code?
- How are the library and custom kernel code delivered? For which platforms?

<....>
  
Ferruh Yigit Oct. 4, 2017, 12:28 a.m. UTC | #2
On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> Driver is based on external, publicly available, light-weight Marvell
> MUSDK library that provides access to network packet processor.
> 
> Driver comes with support for the following features:
> 
> * Speed capabilities
> * Link status
> * Queue start/stop
> * MTU update
> * Jumbo frame
> * Promiscuous mode
> * Allmulticast mode
> * Unicast MAC filter
> * Multicast MAC filter
> * RSS hash
> * VLAN filter
> * CRC offload
> * L3 checksum offload
> * L4 checksum offload
> * Packet type parsing
> * Basic stats
> * Stats per queue
> 
> Driver was engineered cooperatively by Semihalf and Marvell teams.
> 
> Semihalf:
> Jacek Siuda <jck@semihalf.com>
> Tomasz Duszynski <tdu@semihalf.com>
> 
> Marvell:
> Dmitri Epshtein <dima@marvell.com>
> Natalie Samsonov <nsamsono@marvell.com>
> 
> Signed-off-by: Jacek Siuda <jck@semihalf.com>
> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>

<...>

> +++ b/config/common_base
> @@ -262,6 +262,13 @@ CONFIG_RTE_LIBRTE_NFP_PMD=n
>  CONFIG_RTE_LIBRTE_NFP_DEBUG=n
> 
>  #
> +# Compile Marvell PMD driver
> +#
> +CONFIG_RTE_LIBRTE_MRVL_PMD=n
> +CONFIG_RTE_LIBRTE_MRVL_DEBUG=n
> +CONFIG_RTE_MRVL_MUSDK_DMA_MEMSIZE=41943040

Does the DMA memsize need to be a configuration option?

<...>

> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +ifneq ($(MAKECMDGOALS),clean)
> +ifneq ($(MAKECMDGOALS),config)
> +ifeq ($(LIBMUSDK_PATH),)
> +$(error "Please define LIBMUSDK_PATH environment variable")

Not sure how to resolve this dependency.
What do you think about adding this as a configuration option?

Or DPDK just adds the -lmusdk external dependency, and while compiling
for Marvell the EXTRA_LDFLAGS parameter should be passed with
"-L$(LIBMUSDK_PATH)"; this can be documented in the Marvell doc. What do
you think?

> +endif
> +ifeq ($(CONFIG_RTE_LIBRTE_CFGFILE),n)
> +$(error "RTE_LIBRTE_CFGFILE must be enabled in configuration!")

This can also be handled in drivers/net/Makefile; it should be possible
to add a check there for the LIBRTE_CFGFILE dependency.

> +endif
> +endif
> +endif
> +
> +# library name
> +LIB = librte_pmd_mrvl.a
> +
> +# library version
> +LIBABIVER := 1
> +
> +# versioning export map
> +EXPORT_MAP := rte_pmd_mrvl_version.map
> +
> +# external library dependencies
> +CFLAGS += -I$(LIBMUSDK_PATH)/include
> +CFLAGS += -DMVCONF_ARCH_DMA_ADDR_T_64BIT
> +CFLAGS += -DCONF_PP2_BPOOL_COOKIE_SIZE=32
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -O3
> +LDLIBS += -L$(LIBMUSDK_PATH)/lib

This can be LDFLAGS instead of LDLIBS

> +LDLIBS += -lmusdk
> +
> +# library source files
> +SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_ethdev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_qos.c
> +
> +# library dependencies
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += lib/librte_cfgfile

These variables are no longer used; you can drop this. drivers/net/Makefile
is used for this now; you are already updating that file, and librte_cfgfile
needs to be added there.

> +
> +include $(RTE_SDK)/mk/rte.lib.mk

<...>

> +/*
> + * To use buffer harvesting based on loopback port shadow queue structure
> + * was introduced for buffers information bookkeeping.
> + *
> + * Before sending the packet, related buffer information (pp2_buff_inf) is
> + * stored in shadow queue. After packet is transmitted no longer used
> + * packet buffer is released back to it's original hardware pool,
> + * on condition it originated from interface.
> + * In case it  was generated by application itself i.e: mbuf->port field is
> + * 0xff then its released to software mempool.

You already explained here, but can you please give more details on why
the shadow queue is needed?

> + */
> +struct mrvl_shadow_txq {
> +	int head;           /* write index - used when sending buffers */
> +	int tail;           /* read index - used when releasing buffers */
> +	u16 size;           /* queue occupied size */
> +	u16 num_to_release; /* number of buffers sent, that can be released */
> +	struct buff_release_entry ent[MRVL_PP2_TX_SHADOWQ_SIZE]; /* q entries */
> +};
> +
> +struct mrvl_rxq {
> +	struct mrvl_priv *priv;
> +	struct rte_mempool *mp;
> +	int queue_id;
> +	int port_id;
> +	int cksum_enabled;
> +	uint64_t bytes_recv;
> +	uint64_t drop_mac;
> +};
> +
> +struct mrvl_txq {
> +	struct mrvl_priv *priv;
> +	int queue_id;
> +	int port_id;
> +	uint64_t bytes_sent;
> +};
> +

<...>

> +static int
> +mrvl_dev_start(struct rte_eth_dev *dev)
> +{
> +	struct mrvl_priv *priv = dev->data->dev_private;
> +	char match[MRVL_MATCH_LEN];
> +	int ret;
> +
> +	snprintf(match, sizeof(match), "ppio-%d:%d",
> +		 priv->pp_id, priv->ppio_id);
> +	priv->ppio_params.match = match;

Why is this match used? Just a reminder that match is only valid within
the scope of this function; after this function returns it will be invalid.

<...>

> +
> +	if (rte_spinlock_trylock(&q->priv->lock) == 1) {

Why getting lock in Rx data path?

> +		num = mrvl_get_bpool_size(bpool->pp2_id, bpool->id);
> +
> +		if (unlikely(num <= q->priv->bpool_min_size ||
> +			     (!rx_done && num < q->priv->bpool_init_size))) {
> +			ret = mrvl_fill_bpool(q, MRVL_BURST_SIZE);
> +			if (ret)
> +				RTE_LOG(ERR, PMD, "Failed to fill bpool\n");
> +		} else if (unlikely(num > q->priv->bpool_max_size)) {
> +			int i;
> +			int pkt_to_remove = num - q->priv->bpool_init_size;
> +			struct rte_mbuf *mbuf;
> +			struct pp2_buff_inf buff;
> +
> +			RTE_LOG(DEBUG, PMD,
> +				"\nport-%d:%d: bpool %d oversize - remove %d buffers (pool size: %d -> %d)\n",
> +				bpool->pp2_id, q->priv->ppio->port_id,
> +				bpool->id, pkt_to_remove, num,
> +				q->priv->bpool_init_size);
> +
> +			for (i = 0; i < pkt_to_remove; i++) {
> +				pp2_bpool_get_buff(hifs[core_id], bpool, &buff);
> +				mbuf = (struct rte_mbuf *)
> +					(cookie_addr_high | buff.cookie);
> +				rte_pktmbuf_free(mbuf);
> +			}
> +			mrvl_port_bpool_size
> +				[bpool->pp2_id][bpool->id][core_id] -=
> +								pkt_to_remove;
> +		}
> +		rte_spinlock_unlock(&q->priv->lock);
> +	}
> +
> +	return rx_done;
> +}

<...>

> +	cfgnum = rte_kvargs_count(kvlist, MRVL_CFG_ARG);
> +	if (cfgnum > 1) {
> +		RTE_LOG(ERR, PMD, "Cannot handle more than one config file!\n");
> +		goto out_free_kvlist;
> +	} else if (cfgnum == 1) {
> +		rte_kvargs_process(kvlist, MRVL_CFG_ARG,
> +				   mrvl_get_qoscfg, &mrvl_qos_cfg);

Is the expected format/content of the config file documented? How can
one know how to create a config file?

> +	}
> +
> +	/*
> +	 * ret == -EEXIST is correct, it means DMA
> +	 * has been already initialized (by another PMD).
> +	 */
> +	ret = mv_sys_dma_mem_init(RTE_MRVL_MUSDK_DMA_MEMSIZE);
> +	if (ret < 0 && ret != -EEXIST)
> +		goto out_free_kvlist;
> +
> +	ret = mrvl_init_pp2();
> +	if (ret) {
> +		RTE_LOG(ERR, PMD, "Failed to init PP!\n");
> +		goto out_deinit_dma;
> +	}
> +
> +	ret = mrvl_init_hifs();
> +	if (ret)
> +		goto out_deinit_hifs;
> +
> +	for (i = 0; i < ifnum; i++) {
> +		RTE_LOG(INFO, PMD, "Creating %s\n", ifnames[i]);
> +		ret = mrvl_eth_dev_create(vdev, ifnames[i]);

So you are supporting multiple ethdev devices created by a single vdev
device, by providing multiple "iface" arguments in the device args.

This will cause EAL to create a single virtual device but the driver to
create multiple ethdev devices. I don't see a direct problem with this,
but let's think about it.
This can be a problem if you want to provide ethdev-specific device
arguments. Perhaps that is why you need to provide a config file?

It can be an option to define each ethdev with:
"--vdev net_mrvl0,iface=xx0,config=yy0 --vdev
net_mrlv1,iface=xx1,config=yy1 ..."

This may remove your dependency on librte_cfgfile.

> +		if (ret)
> +			goto out_cleanup;
> +	}
> +
> +	rte_kvargs_free(kvlist);
> +
> +	memset(mrvl_port_bpool_size, 0, sizeof(mrvl_port_bpool_size));
> +
> +	mrvl_lcore_first = RTE_MAX_LCORE;
> +	mrvl_lcore_last = 0;
> +
> +	RTE_LCORE_FOREACH(core_id) {
> +		mrvl_set_first_last_cores(core_id);

This sets the limits of core_id. Why do you need to know this at the PMD level?

<...>
  
Tomasz Duszynski Oct. 4, 2017, 8:59 a.m. UTC | #3
On Wed, Oct 04, 2017 at 01:24:27AM +0100, Ferruh Yigit wrote:
> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> > Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> > Driver is based on external, publicly available, light-weight Marvell
> > MUSDK library that provides access to network packet processor.
> >
> > Driver comes with support for the following features:
> >
> > * Speed capabilities
> > * Link status
> > * Queue start/stop
> > * MTU update
> > * Jumbo frame
> > * Promiscuous mode
> > * Allmulticast mode
> > * Unicast MAC filter
> > * Multicast MAC filter
> > * RSS hash
> > * VLAN filter
> > * CRC offload
> > * L3 checksum offload
> > * L4 checksum offload
> > * Packet type parsing
> > * Basic stats
> > * Stats per queue
>
> I have more detailed comments but in high level,
> what do you think splitting this patch into three patches:
> - Skeleton
> - Add Rx/Tx support
> - Add features, like MTU update or Promiscuous etc.. support
If that's how the submission process works, then I think you've left me
with no other option than splitting the driver into a nice patchset :).
On the other hand, the driver is really a wrapper around the MUSDK
library and thus quite easy to follow. What are the benefits of such a
3-way split?
>
> >
> > Driver was engineered cooperatively by Semihalf and Marvell teams.
> >
> > Semihalf:
> > Jacek Siuda <jck@semihalf.com>
> > Tomasz Duszynski <tdu@semihalf.com>
> >
> > Marvell:
> > Dmitri Epshtein <dima@marvell.com>
> > Natalie Samsonov <nsamsono@marvell.com>
> >
> > Signed-off-by: Jacek Siuda <jck@semihalf.com>
> > Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
>
> <...>
>
> > +static struct rte_vdev_driver pmd_mrvl_drv = {
> > +	.probe = rte_pmd_mrvl_probe,
> > +	.remove = rte_pmd_mrvl_remove,
> > +};
> > +
> > +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
>
> Please help me understand.
>
> This driver implemented as virtual driver, because:
> With the help of custom kernel modules, musdk library already provides
> userspace datapath support. This PMD is an interface to musdk library.
> Is this correct?
That is right. Another reason is that this NIC is not a PCI device.
>
> If so, just thinking loud:
> - Why not implement this PMD directly on top of kernel interface,
> removing musdk layer completely?
> - How big problem that this PMD depends on custom kernel code?
I think the main reason is that MUSDK is already used in different projects.
Keeping multiple codebases offering similar functionality would be quite
demanding in terms of extra work needed.
> - How library and custom kernel code delivered? For which platforms?
The kernel and library sources are hosted in a publicly available
repository. The driver was tested on Armada 7k/8k SoCs.
>
> <....>
>

--
- Tomasz Duszyński
  
Tomasz Duszynski Oct. 4, 2017, 1:19 p.m. UTC | #4
On Wed, Oct 04, 2017 at 01:28:47AM +0100, Ferruh Yigit wrote:
> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> > Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> > Driver is based on external, publicly available, light-weight Marvell
> > MUSDK library that provides access to network packet processor.
> >
> > Driver comes with support for the following features:
> >
> > * Speed capabilities
> > * Link status
> > * Queue start/stop
> > * MTU update
> > * Jumbo frame
> > * Promiscuous mode
> > * Allmulticast mode
> > * Unicast MAC filter
> > * Multicast MAC filter
> > * RSS hash
> > * VLAN filter
> > * CRC offload
> > * L3 checksum offload
> > * L4 checksum offload
> > * Packet type parsing
> > * Basic stats
> > * Stats per queue
> >
> > Driver was engineered cooperatively by Semihalf and Marvell teams.
> >
> > Semihalf:
> > Jacek Siuda <jck@semihalf.com>
> > Tomasz Duszynski <tdu@semihalf.com>
> >
> > Marvell:
> > Dmitri Epshtein <dima@marvell.com>
> > Natalie Samsonov <nsamsono@marvell.com>
> >
> > Signed-off-by: Jacek Siuda <jck@semihalf.com>
> > Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
>
> <...>
>
> > +++ b/config/common_base
> > @@ -262,6 +262,13 @@ CONFIG_RTE_LIBRTE_NFP_PMD=n
> >  CONFIG_RTE_LIBRTE_NFP_DEBUG=n
> >
> >  #
> > +# Compile Marvell PMD driver
> > +#
> > +CONFIG_RTE_LIBRTE_MRVL_PMD=n
> > +CONFIG_RTE_LIBRTE_MRVL_DEBUG=n
> > +CONFIG_RTE_MRVL_MUSDK_DMA_MEMSIZE=41943040
>
> Is dma memsize needs to be a configuration option?

That config option is used by both the NET and CRYPTO drivers. In case
NET and CRYPTO are used together, e.g. in ipsec-secgw, DMA_MEMSIZE must
be set to the same size. Putting this configuration option in .config
makes sure DMA_MEMSIZE stays synchronized.
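
Concretely, both PMDs initialize the shared DMA region from the same
build-time constant and tolerate -EEXIST when the other PMD got there
first; a minimal sketch of that pattern (matching the probe code quoted
further below):

        /* in each PMD's probe path */
        ret = mv_sys_dma_mem_init(RTE_MRVL_MUSDK_DMA_MEMSIZE);
        if (ret < 0 && ret != -EEXIST)
                return ret; /* -EEXIST just means the other PMD already did it */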

>
> <...>
>
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +ifneq ($(MAKECMDGOALS),clean)
> > +ifneq ($(MAKECMDGOALS),config)
> > +ifeq ($(LIBMUSDK_PATH),)
> > +$(error "Please define LIBMUSDK_PATH environment variable")
>
> Not sure how to resolve this dependency.
> What do you think adding this as configuration option?

All other drivers with external dependencies follow the same approach.

>
> Or DPDK just adds the -lmusdk external dependency and while compiling
> for marvel EXTRA_LDFLAGS parameter should be pass with
> "-L$(LIBMUSDK_PATH)" and this can be documented in marvel doc. What do
> you think?

Both solutions are reasonable. The former was chosen because that's what the
other drivers do.

>
> > +endif
> > +ifeq ($(CONFIG_RTE_LIBRTE_CFGFILE),n)
> > +$(error "RTE_LIBRTE_CFGFILE must be enabled in configuration!")
>
> This can be also handled in drivers/net/Makefile, it can be possible to
> add check there for LIBRTE_CFGFILE dependency.
>

ACK

> > +endif
> > +endif
> > +endif
> > +
> > +# library name
> > +LIB = librte_pmd_mrvl.a
> > +
> > +# library version
> > +LIBABIVER := 1
> > +
> > +# versioning export map
> > +EXPORT_MAP := rte_pmd_mrvl_version.map
> > +
> > +# external library dependencies
> > +CFLAGS += -I$(LIBMUSDK_PATH)/include
> > +CFLAGS += -DMVCONF_ARCH_DMA_ADDR_T_64BIT
> > +CFLAGS += -DCONF_PP2_BPOOL_COOKIE_SIZE=32
> > +CFLAGS += $(WERROR_FLAGS)
> > +CFLAGS += -O3
> > +LDLIBS += -L$(LIBMUSDK_PATH)/lib
>
> This can be LDFLAGS instead of LDLIBS

Moving that to LDFLAGS will break compilation in case
CONFIG_RTE_BUILD_SHARED_LIB is set, as -L... does not show up on the
command line, so the linker does not know where to look up the extra
library. I may be wrong, but it looks as if specifying LDFLAGS in a
driver's Makefile is a no-op.

On the other hand, if we are building static libraries, both
-lmusdk and -L$(LIBMUSDK_PATH)/lib are added to the driver-specific
_LDLIBS, which in turn ends up in LDLIBS.

>
> > +LDLIBS += -lmusdk
> > +
> > +# library source files
> > +SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_ethdev.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_qos.c
> > +
> > +# library dependencies
> > +DEPDIRS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += lib/librte_cfgfile
>
> These variables no more used, you can drop this. drivers/net/Makefile
> used for this, you are already updating that file, librte_cfgfile needs
> to be added there.
>

ACK

> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
>
> <...>
>
> > +/*
> > + * To use buffer harvesting based on loopback port shadow queue structure
> > + * was introduced for buffers information bookkeeping.
> > + *
> > + * Before sending the packet, related buffer information (pp2_buff_inf) is
> > + * stored in shadow queue. After packet is transmitted no longer used
> > + * packet buffer is released back to it's original hardware pool,
> > + * on condition it originated from interface.
> > + * In case it  was generated by application itself i.e: mbuf->port field is
> > + * 0xff then its released to software mempool.
>
> You already explained here but can you please give more details why
> shadow queue needed?

It's used for mbuf harvesting in the tx path. Instead of releasing a
pushed-out mbuf to the mempool and allocating it once again later on, the
mbuf is stored in the shadow queue and returned back to the hardware
buffer manager after being sent.
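
To make the bookkeeping concrete, here is a rough sketch of the release
path (simplified and partly hypothetical: the real code also keeps
cookie_addr_high for the cookie-to-mbuf conversion and releases buffers
in batches; the buff_release_entry field names follow the MUSDK headers):

static void
mrvl_release_sent_buffers(struct mrvl_shadow_txq *sq, struct pp2_hif *hif)
{
        while (sq->num_to_release) {
                struct buff_release_entry *e = &sq->ent[sq->tail];
                struct rte_mbuf *mbuf =
                        (struct rte_mbuf *)(uintptr_t)e->buff.cookie;

                if (unlikely(mbuf->port == 0xff))
                        /* generated by the application: software mempool */
                        rte_pktmbuf_free(mbuf);
                else
                        /* originated from an interface: hardware pool */
                        pp2_bpool_put_buff(hif, e->bpool, &e->buff);

                /* assumes the shadow queue size is a power of two */
                sq->tail = (sq->tail + 1) & (MRVL_PP2_TX_SHADOWQ_SIZE - 1);
                sq->size--;
                sq->num_to_release--;
        }
}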

>
> > + */
> > +struct mrvl_shadow_txq {
> > +	int head;           /* write index - used when sending buffers */
> > +	int tail;           /* read index - used when releasing buffers */
> > +	u16 size;           /* queue occupied size */
> > +	u16 num_to_release; /* number of buffers sent, that can be released */
> > +	struct buff_release_entry ent[MRVL_PP2_TX_SHADOWQ_SIZE]; /* q entries */
> > +};
> > +
> > +struct mrvl_rxq {
> > +	struct mrvl_priv *priv;
> > +	struct rte_mempool *mp;
> > +	int queue_id;
> > +	int port_id;
> > +	int cksum_enabled;
> > +	uint64_t bytes_recv;
> > +	uint64_t drop_mac;
> > +};
> > +
> > +struct mrvl_txq {
> > +	struct mrvl_priv *priv;
> > +	int queue_id;
> > +	int port_id;
> > +	uint64_t bytes_sent;
> > +};
> > +
>
> <...>
>
> > +static int
> > +mrvl_dev_start(struct rte_eth_dev *dev)
> > +{
> > +	struct mrvl_priv *priv = dev->data->dev_private;
> > +	char match[MRVL_MATCH_LEN];
> > +	int ret;
> > +
> > +	snprintf(match, sizeof(match), "ppio-%d:%d",
> > +		 priv->pp_id, priv->ppio_id);
> > +	priv->ppio_params.match = match;
>
> Why this match is used, just a reminder that match is only valid for the
> scope of this function, after this function it will be invalid.
>

Keeping match locally is fine. It's used to tell MUSDK which physical
port to configure, i.e. ppio-0:1 means to configure port 1 on packet
processor 0. Armada 8k has two such packet processors, while Armada 7k
has only one.
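
For illustration, the intended usage looks roughly like this (a sketch;
it assumes, per the above, that MUSDK consumes the match string during
pp2_ppio_init() and does not keep the pointer afterwards):

        char match[MRVL_MATCH_LEN];

        /* "ppio-<pp>:<port>", e.g. "ppio-0:1" = port 1 on packet processor 0 */
        snprintf(match, sizeof(match), "ppio-%d:%d",
                 priv->pp_id, priv->ppio_id);
        priv->ppio_params.match = match;

        /* the stack buffer only has to stay valid until this call returns */
        ret = pp2_ppio_init(&priv->ppio_params, &priv->ppio);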

> <...>
>
> > +
> > +	if (rte_spinlock_trylock(&q->priv->lock) == 1) {
>
> Why getting lock in Rx data path?
>

In the multi-core and multi-queue case some kind of protection is
necessary so that several cores cannot modify the bpool at
the same time.
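
The key point is that rte_spinlock_trylock() never blocks: a core that
loses the race simply skips pool maintenance for this burst.
Schematically (refill_or_shrink_bpool() is a hypothetical stand-in for
the code quoted below):

        /* opportunistic, non-blocking pool maintenance in the Rx burst */
        if (rte_spinlock_trylock(&q->priv->lock) == 1) {
                refill_or_shrink_bpool(q); /* hypothetical helper */
                rte_spinlock_unlock(&q->priv->lock);
        }
        /* on failure another core is already maintaining the bpool */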

> > +		num = mrvl_get_bpool_size(bpool->pp2_id, bpool->id);
> > +
> > +		if (unlikely(num <= q->priv->bpool_min_size ||
> > +			     (!rx_done && num < q->priv->bpool_init_size))) {
> > +			ret = mrvl_fill_bpool(q, MRVL_BURST_SIZE);
> > +			if (ret)
> > +				RTE_LOG(ERR, PMD, "Failed to fill bpool\n");
> > +		} else if (unlikely(num > q->priv->bpool_max_size)) {
> > +			int i;
> > +			int pkt_to_remove = num - q->priv->bpool_init_size;
> > +			struct rte_mbuf *mbuf;
> > +			struct pp2_buff_inf buff;
> > +
> > +			RTE_LOG(DEBUG, PMD,
> > +				"\nport-%d:%d: bpool %d oversize - remove %d buffers (pool size: %d -> %d)\n",
> > +				bpool->pp2_id, q->priv->ppio->port_id,
> > +				bpool->id, pkt_to_remove, num,
> > +				q->priv->bpool_init_size);
> > +
> > +			for (i = 0; i < pkt_to_remove; i++) {
> > +				pp2_bpool_get_buff(hifs[core_id], bpool, &buff);
> > +				mbuf = (struct rte_mbuf *)
> > +					(cookie_addr_high | buff.cookie);
> > +				rte_pktmbuf_free(mbuf);
> > +			}
> > +			mrvl_port_bpool_size
> > +				[bpool->pp2_id][bpool->id][core_id] -=
> > +								pkt_to_remove;
> > +		}
> > +		rte_spinlock_unlock(&q->priv->lock);
> > +	}
> > +
> > +	return rx_done;
> > +}
>
> <...>
>
> > +	cfgnum = rte_kvargs_count(kvlist, MRVL_CFG_ARG);
> > +	if (cfgnum > 1) {
> > +		RTE_LOG(ERR, PMD, "Cannot handle more than one config file!\n");
> > +		goto out_free_kvlist;
> > +	} else if (cfgnum == 1) {
> > +		rte_kvargs_process(kvlist, MRVL_CFG_ARG,
> > +				   mrvl_get_qoscfg, &mrvl_qos_cfg);
>
> Is the expected format/contect of the config file documented? How one
> can know how to create a config file?
>

Right, documentation is missing for that. Will add in v4.
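
In the meantime, for reference: librte_cfgfile consumes INI-style files,
so the parsing side looks roughly like the sketch below (the section and
entry names are made up for illustration and are not the PMD's actual
schema):

#include <stdlib.h>
#include <rte_cfgfile.h>

/* returns the parsed default traffic class, or -1 on error */
static int
mrvl_load_qos_cfg(const char *path)
{
        struct rte_cfgfile *file = rte_cfgfile_load(path, 0);
        unsigned long default_tc = 0;
        const char *val;

        if (file == NULL)
                return -1;

        /* hypothetical layout: a [port 0 default] section, default_tc = 0 */
        val = rte_cfgfile_get_entry(file, "port 0 default", "default_tc");
        if (val != NULL)
                default_tc = strtoul(val, NULL, 0);

        rte_cfgfile_close(file);
        return (int)default_tc;
}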

> > +	}
> > +
> > +	/*
> > +	 * ret == -EEXIST is correct, it means DMA
> > +	 * has been already initialized (by another PMD).
> > +	 */
> > +	ret = mv_sys_dma_mem_init(RTE_MRVL_MUSDK_DMA_MEMSIZE);
> > +	if (ret < 0 && ret != -EEXIST)
> > +		goto out_free_kvlist;
> > +
> > +	ret = mrvl_init_pp2();
> > +	if (ret) {
> > +		RTE_LOG(ERR, PMD, "Failed to init PP!\n");
> > +		goto out_deinit_dma;
> > +	}
> > +
> > +	ret = mrvl_init_hifs();
> > +	if (ret)
> > +		goto out_deinit_hifs;
> > +
> > +	for (i = 0; i < ifnum; i++) {
> > +		RTE_LOG(INFO, PMD, "Creating %s\n", ifnames[i]);
> > +		ret = mrvl_eth_dev_create(vdev, ifnames[i]);
>
> So you are supporting multiple ethdev devices created by single vdev
> device, by providing multiple "iface" argument in device args.
>
> This will cause eal create single virtual device but driver create
> multiple ethdev devices. I don't see direct problem with this but lets
> think about it.
> This can be problem if you want to provide ethdev specific device
> arguments. Perhaps that is why you need to provide a config file ?
>
> It can be an option to define each ethdev with:
> "--vdev net_mrvl0,iface=xx0,config=yy0 --vdev
> net_mrlv1,iface=xx1,config=yy1 ..."
>
> This may remove your dependecy to librte_cfgfile.
>

Currently there's no need to pass separate options to each created
device. As for the configuration file, it handles all devices at once.

> > +		if (ret)
> > +			goto out_cleanup;
> > +	}
> > +
> > +	rte_kvargs_free(kvlist);
> > +
> > +	memset(mrvl_port_bpool_size, 0, sizeof(mrvl_port_bpool_size));
> > +
> > +	mrvl_lcore_first = RTE_MAX_LCORE;
> > +	mrvl_lcore_last = 0;
> > +
> > +	RTE_LCORE_FOREACH(core_id) {
> > +		mrvl_set_first_last_cores(core_id);
>
> This sets limits of core_id. Why you need to know this in PMD level?

It's just to limit the number of entries in mrvl_port_bpool_size we
iterate over every time we want to count the total number of buffers in
the hardware buffer pool.
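
I.e. the summing helper only has to walk lcores in that range; a sketch
consistent with the mrvl_get_bpool_size() call in the code quoted
earlier:

static unsigned int
mrvl_get_bpool_size(int pp2_id, int pool_id)
{
        unsigned int i, size = 0;

        /* only lcores in [mrvl_lcore_first, mrvl_lcore_last] touch pools */
        for (i = mrvl_lcore_first; i <= mrvl_lcore_last; i++)
                size += mrvl_port_bpool_size[pp2_id][pool_id][i];

        return size;
}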

>
> <...>
>

--
- Tomasz Duszyński
  
Ferruh Yigit Oct. 4, 2017, 4:59 p.m. UTC | #5
On 10/4/2017 9:59 AM, Tomasz Duszynski wrote:
> On Wed, Oct 04, 2017 at 01:24:27AM +0100, Ferruh Yigit wrote:
>> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
>>> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
>>> Driver is based on external, publicly available, light-weight Marvell
>>> MUSDK library that provides access to network packet processor.
>>>
>>> Driver comes with support for the following features:
>>>
>>> * Speed capabilities
>>> * Link status
>>> * Queue start/stop
>>> * MTU update
>>> * Jumbo frame
>>> * Promiscuous mode
>>> * Allmulticast mode
>>> * Unicast MAC filter
>>> * Multicast MAC filter
>>> * RSS hash
>>> * VLAN filter
>>> * CRC offload
>>> * L3 checksum offload
>>> * L4 checksum offload
>>> * Packet type parsing
>>> * Basic stats
>>> * Stats per queue
>>
>> I have more detailed comments but in high level,
>> what do you think splitting this patch into three patches:
>> - Skeleton
>> - Add Rx/Tx support
>> - Add features, like MTU update or Promiscuous etc.. support
> If it's how submission process works then I think you left me with no
> other option than splitting driver into nice patchset :). 

No, there is no defined submission process.

> On the other
> hand driver is really a wrapper to MUSDK library and thus quite easy to
> follow. What are the benefits of such 3-way split?

To help others review/understand your code. Big code chunks are scary,
and I believe most of the details get lost in big code chunks.

When someone from the community wants to understand and update/improve/fix
your code, it helps them if the code is logically split so that their
focus can go into a narrower part.

But this also means some effort on your side, so some kind of balance is
required.

I think splitting the patch into smaller logical parts is helpful for
others; what do you think, is it too much effort?

>>
>>>
>>> Driver was engineered cooperatively by Semihalf and Marvell teams.
>>>
>>> Semihalf:
>>> Jacek Siuda <jck@semihalf.com>
>>> Tomasz Duszynski <tdu@semihalf.com>
>>>
>>> Marvell:
>>> Dmitri Epshtein <dima@marvell.com>
>>> Natalie Samsonov <nsamsono@marvell.com>
>>>
>>> Signed-off-by: Jacek Siuda <jck@semihalf.com>
>>> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
>>
>> <...>
>>
>>> +static struct rte_vdev_driver pmd_mrvl_drv = {
>>> +	.probe = rte_pmd_mrvl_probe,
>>> +	.remove = rte_pmd_mrvl_remove,
>>> +};
>>> +
>>> +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
>>
>> Please help me understand.
>>
>> This driver implemented as virtual driver, because:
>> With the help of custom kernel modules, musdk library already provides
>> userspace datapath support. This PMD is an interface to musdk library.
>> Is this correct?
> That is right. Another reason this NIC is not PCI device.

We support more buses now :). Out of curiosity, which bus is the device on?

>>
>> If so, just thinking loud:
>> - Why not implement this PMD directly on top of kernel interface,
>> removing musdk layer completely?
>> - How big problem that this PMD depends on custom kernel code?
> I think the main reason is that MUSDK is already used in different projects.
> Keeping multiple codebases offering similar functionality would be quite
> demanding in terms of extra work needed.
>> - How library and custom kernel code delivered? For which platforms?
> Kernel and library sources are hosted on publicly available repository.

I guess it would be nice to highlight that a custom kernel with external
patches is required. This is not mentioned in the "Prerequisites" section
of the document.

> Driver was tested on Armada 7k/8k SoCs.

Can you please provide a link to the HW mentioned in the documentation?

>>
>> <....>
>>
> 
> --
> - Tomasz Duszyński
>
  
Tomasz Duszynski Oct. 5, 2017, 8:43 a.m. UTC | #6
On Wed, Oct 04, 2017 at 05:59:11PM +0100, Ferruh Yigit wrote:
> On 10/4/2017 9:59 AM, Tomasz Duszynski wrote:
> > On Wed, Oct 04, 2017 at 01:24:27AM +0100, Ferruh Yigit wrote:
> >> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> >>> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> >>> Driver is based on external, publicly available, light-weight Marvell
> >>> MUSDK library that provides access to network packet processor.
> >>>
> >>> Driver comes with support for the following features:
> >>>
> >>> * Speed capabilities
> >>> * Link status
> >>> * Queue start/stop
> >>> * MTU update
> >>> * Jumbo frame
> >>> * Promiscuous mode
> >>> * Allmulticast mode
> >>> * Unicast MAC filter
> >>> * Multicast MAC filter
> >>> * RSS hash
> >>> * VLAN filter
> >>> * CRC offload
> >>> * L3 checksum offload
> >>> * L4 checksum offload
> >>> * Packet type parsing
> >>> * Basic stats
> >>> * Stats per queue
> >>
> >> I have more detailed comments but in high level,
> >> what do you think splitting this patch into three patches:
> >> - Skeleton
> >> - Add Rx/Tx support
> >> - Add features, like MTU update or Promiscuous etc.. support
> > If it's how submission process works then I think you left me with no
> > other option than splitting driver into nice patchset :).
>
> No, there is no defined submission process.
>
> > On the other
> > hand driver is really a wrapper to MUSDK library and thus quite easy to
> > follow. What are the benefits of such 3-way split?
>
> To help others review/understand your code. Big code chunks are scary
> and I believe most of details gets lost in big code chunks.
>
> When someone from community wants to understand and update/improve/fix
> your code, to help them by logically split the code that their focus can
> go into more narrow part.
>
> But this also means some effort in your side, so some kind of balance is
> required.
>
> I think splitting patch into smaller logical part is helpful for others,
> what do you think, is it too much effort?
>

Fair enough. I'll split the driver as suggested. A few specific
questions about the functionality each patch should contain, though.

As for the skeleton, I see others just put driver probing here.

As for Rx/Tx support, it seems that there's no common pattern.
Should functionality like starting/stopping the device, queue
configuration, and all the other things related to Rx/Tx be here as well?

What's left are features, which go into the features patch.

> >>
> >>>
> >>> Driver was engineered cooperatively by Semihalf and Marvell teams.
> >>>
> >>> Semihalf:
> >>> Jacek Siuda <jck@semihalf.com>
> >>> Tomasz Duszynski <tdu@semihalf.com>
> >>>
> >>> Marvell:
> >>> Dmitri Epshtein <dima@marvell.com>
> >>> Natalie Samsonov <nsamsono@marvell.com>
> >>>
> >>> Signed-off-by: Jacek Siuda <jck@semihalf.com>
> >>> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
> >>
> >> <...>
> >>
> >>> +static struct rte_vdev_driver pmd_mrvl_drv = {
> >>> +	.probe = rte_pmd_mrvl_probe,
> >>> +	.remove = rte_pmd_mrvl_remove,
> >>> +};
> >>> +
> >>> +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
> >>
> >> Please help me understand.
> >>
> >> This driver implemented as virtual driver, because:
> >> With the help of custom kernel modules, musdk library already provides
> >> userspace datapath support. This PMD is an interface to musdk library.
> >> Is this correct?
> > That is right. Another reason this NIC is not PCI device.
>
> We support more bus now :). Out of curiosity, which bus is device on?

The bus is called Aurora2. That's a proprietary SoC interconnect fabric.

>
> >>
> >> If so, just thinking loud:
> >> - Why not implement this PMD directly on top of kernel interface,
> >> removing musdk layer completely?
> >> - How big problem that this PMD depends on custom kernel code?
> > I think the main reason is that MUSDK is already used in different projects.
> > Keeping multiple codebases offering similar functionality would be quite
> > demanding in terms of extra work needed.
> >> - How library and custom kernel code delivered? For which platforms?
> > Kernel and library sources are hosted on publicly available repository.
>
> I guess it would be nice to highlight custom kernel with external
> patches is required. This is not mentioned in "Prerequisites" section of
> the document.
>

ACK

> > Driver was tested on Armada 7k/8k SoCs.
>
> Can you please provide link to the HW mentioned in documentation?
>

You can find some info here:

https://www.marvell.com/embedded-processors/armada-70xx/
https://www.marvell.com/embedded-processors/armada-80xx/

> >>
> >> <....>
> >>
> >
> > --
> > - Tomasz Duszyński
> >
>

--
- Tomasz Duszyński
  
Ferruh Yigit Oct. 5, 2017, 5:29 p.m. UTC | #7
On 10/5/2017 9:43 AM, Tomasz Duszynski wrote:
> On Wed, Oct 04, 2017 at 05:59:11PM +0100, Ferruh Yigit wrote:
>> On 10/4/2017 9:59 AM, Tomasz Duszynski wrote:
>>> On Wed, Oct 04, 2017 at 01:24:27AM +0100, Ferruh Yigit wrote:
>>>> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
>>>>> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
>>>>> Driver is based on external, publicly available, light-weight Marvell
>>>>> MUSDK library that provides access to network packet processor.
>>>>>
>>>>> Driver comes with support for the following features:
>>>>>
>>>>> * Speed capabilities
>>>>> * Link status
>>>>> * Queue start/stop
>>>>> * MTU update
>>>>> * Jumbo frame
>>>>> * Promiscuous mode
>>>>> * Allmulticast mode
>>>>> * Unicast MAC filter
>>>>> * Multicast MAC filter
>>>>> * RSS hash
>>>>> * VLAN filter
>>>>> * CRC offload
>>>>> * L3 checksum offload
>>>>> * L4 checksum offload
>>>>> * Packet type parsing
>>>>> * Basic stats
>>>>> * Stats per queue
>>>>
>>>> I have more detailed comments but in high level,
>>>> what do you think splitting this patch into three patches:
>>>> - Skeleton
>>>> - Add Rx/Tx support
>>>> - Add features, like MTU update or Promiscuous etc.. support
>>> If it's how submission process works then I think you left me with no
>>> other option than splitting driver into nice patchset :).
>>
>> No, there is no defined submission process.
>>
>>> On the other
>>> hand driver is really a wrapper to MUSDK library and thus quite easy to
>>> follow. What are the benefits of such 3-way split?
>>
>> To help others review/understand your code. Big code chunks are scary
>> and I believe most of details gets lost in big code chunks.
>>
>> When someone from community wants to understand and update/improve/fix
>> your code, to help them by logically split the code that their focus can
>> go into more narrow part.
>>
>> But this also means some effort in your side, so some kind of balance is
>> required.
>>
>> I think splitting patch into smaller logical part is helpful for others,
>> what do you think, is it too much effort?
>>
> 
> Fair enough. I'll split the driver as suggested. A few specific
> questions about functionality each patch should contain though.
> 
> As for skeleton, I see others just put driver probing here.
> 
> As for Rx/Tx support it seems that there's no common pattern.
> Functionality like starting/stopping device, queues configuration
> and all the other things related to Rx/Tx should be here as well?

As you said there is no common pattern, but I think starting/stopping
the device and queue configuration can go into the skeleton, and mainly
the Rx/Tx burst functions can go into the Rx/Tx patch.
But please, what you think is more reasonable matters here.

> 
> What's left are features which go into features-patch.

Yes.
And the .ini file, currently part of the doc patch, can be part of this
features patch; it helps more to see the code adding a feature and the
doc documenting it in the same patch.

> 
>>>>
>>>>>
>>>>> Driver was engineered cooperatively by Semihalf and Marvell teams.
>>>>>
>>>>> Semihalf:
>>>>> Jacek Siuda <jck@semihalf.com>
>>>>> Tomasz Duszynski <tdu@semihalf.com>
>>>>>
>>>>> Marvell:
>>>>> Dmitri Epshtein <dima@marvell.com>
>>>>> Natalie Samsonov <nsamsono@marvell.com>
>>>>>
>>>>> Signed-off-by: Jacek Siuda <jck@semihalf.com>
>>>>> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
>>>>
>>>> <...>
>>>>
>>>>> +static struct rte_vdev_driver pmd_mrvl_drv = {
>>>>> +	.probe = rte_pmd_mrvl_probe,
>>>>> +	.remove = rte_pmd_mrvl_remove,
>>>>> +};
>>>>> +
>>>>> +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
>>>>
>>>> Please help me understand.
>>>>
>>>> This driver implemented as virtual driver, because:
>>>> With the help of custom kernel modules, musdk library already provides
>>>> userspace datapath support. This PMD is an interface to musdk library.
>>>> Is this correct?
>>> That is right. Another reason this NIC is not PCI device.
>>
>> We support more bus now :). Out of curiosity, which bus is device on?
> 
> Bus is called Aurora2. That's proprietary SoC interconnect fabric.
> 
>>
>>>>
>>>> If so, just thinking loud:
>>>> - Why not implement this PMD directly on top of kernel interface,
>>>> removing musdk layer completely?
>>>> - How big problem that this PMD depends on custom kernel code?
>>> I think the main reason is that MUSDK is already used in different projects.
>>> Keeping multiple codebases offering similar functionality would be quite
>>> demanding in terms of extra work needed.
>>>> - How library and custom kernel code delivered? For which platforms?
>>> Kernel and library sources are hosted on publicly available repository.
>>
>> I guess it would be nice to highlight custom kernel with external
>> patches is required. This is not mentioned in "Prerequisites" section of
>> the document.
>>
> 
> ACK
> 
>>> Driver was tested on Armada 7k/8k SoCs.
>>
>> Can you please provide link to the HW mentioned in documentation?
>>
> 
> You can find some info here:
> 
> https://www.marvell.com/embedded-processors/armada-70xx/
> https://www.marvell.com/embedded-processors/armada-80xx/

Thanks, would you mind putting these links into the driver documentation
as well?

> 
>>>>
>>>> <....>
>>>>
>>>
>>> --
>>> - Tomasz Duszyński
>>>
>>
> 
> --
> - Tomasz Duszyński
>
  
Ferruh Yigit Oct. 5, 2017, 5:37 p.m. UTC | #8
On 10/4/2017 2:19 PM, Tomasz Duszynski wrote:
> On Wed, Oct 04, 2017 at 01:28:47AM +0100, Ferruh Yigit wrote:
>> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
>>> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
>>> Driver is based on external, publicly available, light-weight Marvell
>>> MUSDK library that provides access to network packet processor.
>>>
>>> Driver comes with support for the following features:
>>>
>>> * Speed capabilities
>>> * Link status
>>> * Queue start/stop
>>> * MTU update
>>> * Jumbo frame
>>> * Promiscuous mode
>>> * Allmulticast mode
>>> * Unicast MAC filter
>>> * Multicast MAC filter
>>> * RSS hash
>>> * VLAN filter
>>> * CRC offload
>>> * L3 checksum offload
>>> * L4 checksum offload
>>> * Packet type parsing
>>> * Basic stats
>>> * Stats per queue
>>>
>>> Driver was engineered cooperatively by Semihalf and Marvell teams.
>>>
>>> Semihalf:
>>> Jacek Siuda <jck@semihalf.com>
>>> Tomasz Duszynski <tdu@semihalf.com>
>>>
>>> Marvell:
>>> Dmitri Epshtein <dima@marvell.com>
>>> Natalie Samsonov <nsamsono@marvell.com>
>>>
>>> Signed-off-by: Jacek Siuda <jck@semihalf.com>
>>> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
>>
>> <...>
>>
>>> +++ b/config/common_base
>>> @@ -262,6 +262,13 @@ CONFIG_RTE_LIBRTE_NFP_PMD=n
>>>  CONFIG_RTE_LIBRTE_NFP_DEBUG=n
>>>
>>>  #
>>> +# Compile Marvell PMD driver
>>> +#
>>> +CONFIG_RTE_LIBRTE_MRVL_PMD=n
>>> +CONFIG_RTE_LIBRTE_MRVL_DEBUG=n
>>> +CONFIG_RTE_MRVL_MUSDK_DMA_MEMSIZE=41943040
>>
>> Is dma memsize needs to be a configuration option?
> 
> That config option is used both by NET and CRYPTO drivers. In case NET
> and CRYPTO are used together i.e ipsec-secgw then DMA_MEMSIZE must be
> the set to the same size. Putting this configuration option in .config
> makes sure DMA_MEMSIZE stays synchronized.

OK.

> 
>>
>> <...>
>>
>>> +include $(RTE_SDK)/mk/rte.vars.mk
>>> +
>>> +ifneq ($(MAKECMDGOALS),clean)
>>> +ifneq ($(MAKECMDGOALS),config)
>>> +ifeq ($(LIBMUSDK_PATH),)
>>> +$(error "Please define LIBMUSDK_PATH environment variable")
>>
>> Not sure how to resolve this dependency.
>> What do you think adding this as configuration option?
> 
> All other drivers with external dependencies follow the same approach.
> 
>>
>> Or DPDK just adds the -lmusdk external dependency and while compiling
>> for marvel EXTRA_LDFLAGS parameter should be pass with
>> "-L$(LIBMUSDK_PATH)" and this can be documented in marvel doc. What do
>> you think?
> 
> Both solutions are reasonable. The former was chosen because that's what the
> other drivers do.

OK.

> 
>>
>>> +endif
>>> +ifeq ($(CONFIG_RTE_LIBRTE_CFGFILE),n)
>>> +$(error "RTE_LIBRTE_CFGFILE must be enabled in configuration!")
>>
>> This can be also handled in drivers/net/Makefile, it can be possible to
>> add check there for LIBRTE_CFGFILE dependency.
>>
> 
> ACK
> 
>>> +endif
>>> +endif
>>> +endif
>>> +
>>> +# library name
>>> +LIB = librte_pmd_mrvl.a
>>> +
>>> +# library version
>>> +LIBABIVER := 1
>>> +
>>> +# versioning export map
>>> +EXPORT_MAP := rte_pmd_mrvl_version.map
>>> +
>>> +# external library dependencies
>>> +CFLAGS += -I$(LIBMUSDK_PATH)/include
>>> +CFLAGS += -DMVCONF_ARCH_DMA_ADDR_T_64BIT
>>> +CFLAGS += -DCONF_PP2_BPOOL_COOKIE_SIZE=32
>>> +CFLAGS += $(WERROR_FLAGS)
>>> +CFLAGS += -O3
>>> +LDLIBS += -L$(LIBMUSDK_PATH)/lib
>>
>> This can be LDFLAGS instead of LDLIBS
> 
> Moving that to LDFLAGS will break compilation in case
> CONFIG_RTE_BUILD_SHARED_LIB is set as -L... does not show up on command
> line thus linker does not know where to look extra library up.
> I may be wrong but it looks as if specifying LDFLAGS in driver's
> Makefile is no-op.

I would expect LDFLAGS to work, but if it is breaking the build,
please keep it as it is; we can check and fix this later.

> 
> On the other hand, if we are building static libraries both
> -lmusdk and -L$(LIBMUSDK_PATH)/lib are added to specific _LDLIBS which in turn
> ends up in LDLIBS.
> 
>>
>>> +LDLIBS += -lmusdk
>>> +
>>> +# library source files
>>> +SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_ethdev.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_qos.c
>>> +
>>> +# library dependencies
>>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += lib/librte_cfgfile
>>
>> These variables no more used, you can drop this. drivers/net/Makefile
>> used for this, you are already updating that file, librte_cfgfile needs
>> to be added there.
>>
> 
> ACK
> 
>>> +
>>> +include $(RTE_SDK)/mk/rte.lib.mk
>>
>> <...>
>>
>>> +/*
>>> + * To use buffer harvesting based on loopback port shadow queue structure
>>> + * was introduced for buffers information bookkeeping.
>>> + *
>>> + * Before sending the packet, related buffer information (pp2_buff_inf) is
>>> + * stored in shadow queue. After packet is transmitted no longer used
>>> + * packet buffer is released back to it's original hardware pool,
>>> + * on condition it originated from interface.
>>> + * In case it  was generated by application itself i.e: mbuf->port field is
>>> + * 0xff then its released to software mempool.
>>
>> You already explained here but can you please give more details why
>> shadow queue needed?
> 
> It's used for mbuf harvesting in tx-path. Instead of releasing pushed
> out mbuf to mempool and allocating it once again later on, mbuf is
> stored in the shadow queue and returned back to hardware buffer manager
> after being sent.
> 
>>
>>> + */
>>> +struct mrvl_shadow_txq {
>>> +	int head;           /* write index - used when sending buffers */
>>> +	int tail;           /* read index - used when releasing buffers */
>>> +	u16 size;           /* queue occupied size */
>>> +	u16 num_to_release; /* number of buffers sent, that can be released */
>>> +	struct buff_release_entry ent[MRVL_PP2_TX_SHADOWQ_SIZE]; /* q entries */
>>> +};
>>> +
>>> +struct mrvl_rxq {
>>> +	struct mrvl_priv *priv;
>>> +	struct rte_mempool *mp;
>>> +	int queue_id;
>>> +	int port_id;
>>> +	int cksum_enabled;
>>> +	uint64_t bytes_recv;
>>> +	uint64_t drop_mac;
>>> +};
>>> +
>>> +struct mrvl_txq {
>>> +	struct mrvl_priv *priv;
>>> +	int queue_id;
>>> +	int port_id;
>>> +	uint64_t bytes_sent;
>>> +};
>>> +
>>
>> <...>
>>
>>> +static int
>>> +mrvl_dev_start(struct rte_eth_dev *dev)
>>> +{
>>> +	struct mrvl_priv *priv = dev->data->dev_private;
>>> +	char match[MRVL_MATCH_LEN];
>>> +	int ret;
>>> +
>>> +	snprintf(match, sizeof(match), "ppio-%d:%d",
>>> +		 priv->pp_id, priv->ppio_id);
>>> +	priv->ppio_params.match = match;
>>
>> Why this match is used, just a reminder that match is only valid for the
>> scope of this function, after this function it will be invalid.
>>
> 
> Keeping match locally is fine. That's used to tell MUSDK which physical
> port to configure, i.e ppio-0:1 means to configure port 1 on packet
> processor 0. Armada 8k has to such packet processor, while armada 7k
> only one.

Ok, thanks for the clarification.

> 
>> <...>
>>
>>> +
>>> +	if (rte_spinlock_trylock(&q->priv->lock) == 1) {
>>
>> Why getting lock in Rx data path?
>>
> 
> In multi-core and multi-queue case some kind of protection is
> necessary so that several cores cannot modify bpool at
> the same time.
> 
>>> +		num = mrvl_get_bpool_size(bpool->pp2_id, bpool->id);
>>> +
>>> +		if (unlikely(num <= q->priv->bpool_min_size ||
>>> +			     (!rx_done && num < q->priv->bpool_init_size))) {
>>> +			ret = mrvl_fill_bpool(q, MRVL_BURST_SIZE);
>>> +			if (ret)
>>> +				RTE_LOG(ERR, PMD, "Failed to fill bpool\n");
>>> +		} else if (unlikely(num > q->priv->bpool_max_size)) {
>>> +			int i;
>>> +			int pkt_to_remove = num - q->priv->bpool_init_size;
>>> +			struct rte_mbuf *mbuf;
>>> +			struct pp2_buff_inf buff;
>>> +
>>> +			RTE_LOG(DEBUG, PMD,
>>> +				"\nport-%d:%d: bpool %d oversize - remove %d buffers (pool size: %d -> %d)\n",
>>> +				bpool->pp2_id, q->priv->ppio->port_id,
>>> +				bpool->id, pkt_to_remove, num,
>>> +				q->priv->bpool_init_size);
>>> +
>>> +			for (i = 0; i < pkt_to_remove; i++) {
>>> +				pp2_bpool_get_buff(hifs[core_id], bpool, &buff);
>>> +				mbuf = (struct rte_mbuf *)
>>> +					(cookie_addr_high | buff.cookie);
>>> +				rte_pktmbuf_free(mbuf);
>>> +			}
>>> +			mrvl_port_bpool_size
>>> +				[bpool->pp2_id][bpool->id][core_id] -=
>>> +								pkt_to_remove;
>>> +		}
>>> +		rte_spinlock_unlock(&q->priv->lock);
>>> +	}
>>> +
>>> +	return rx_done;
>>> +}
>>
>> <...>
>>
>>> +	cfgnum = rte_kvargs_count(kvlist, MRVL_CFG_ARG);
>>> +	if (cfgnum > 1) {
>>> +		RTE_LOG(ERR, PMD, "Cannot handle more than one config file!\n");
>>> +		goto out_free_kvlist;
>>> +	} else if (cfgnum == 1) {
>>> +		rte_kvargs_process(kvlist, MRVL_CFG_ARG,
>>> +				   mrvl_get_qoscfg, &mrvl_qos_cfg);
>>
>> Is the expected format/contect of the config file documented? How one
>> can know how to create a config file?
>>
> 
> Right, documentation is missing for that. Will add in v4.
> 
>>> +	}
>>> +
>>> +	/*
>>> +	 * ret == -EEXIST is correct, it means DMA
>>> +	 * has been already initialized (by another PMD).
>>> +	 */
>>> +	ret = mv_sys_dma_mem_init(RTE_MRVL_MUSDK_DMA_MEMSIZE);
>>> +	if (ret < 0 && ret != -EEXIST)
>>> +		goto out_free_kvlist;
>>> +
>>> +	ret = mrvl_init_pp2();
>>> +	if (ret) {
>>> +		RTE_LOG(ERR, PMD, "Failed to init PP!\n");
>>> +		goto out_deinit_dma;
>>> +	}
>>> +
>>> +	ret = mrvl_init_hifs();
>>> +	if (ret)
>>> +		goto out_deinit_hifs;
>>> +
>>> +	for (i = 0; i < ifnum; i++) {
>>> +		RTE_LOG(INFO, PMD, "Creating %s\n", ifnames[i]);
>>> +		ret = mrvl_eth_dev_create(vdev, ifnames[i]);
>>
>> So you are supporting multiple ethdev devices created by single vdev
>> device, by providing multiple "iface" argument in device args.
>>
>> This will cause eal create single virtual device but driver create
>> multiple ethdev devices. I don't see direct problem with this but lets
>> think about it.
>> This can be problem if you want to provide ethdev specific device
>> arguments. Perhaps that is why you need to provide a config file ?
>>
>> It can be an option to define each ethdev with:
>> "--vdev net_mrvl0,iface=xx0,config=yy0 --vdev
>> net_mrlv1,iface=xx1,config=yy1 ..."
>>
>> This may remove your dependecy to librte_cfgfile.
>>
> 
> Currently there's not need to passing separate options to each created
> device. As for configuration file it handles all devices at once.

Ok, that was an option...

> 
>>> +		if (ret)
>>> +			goto out_cleanup;
>>> +	}
>>> +
>>> +	rte_kvargs_free(kvlist);
>>> +
>>> +	memset(mrvl_port_bpool_size, 0, sizeof(mrvl_port_bpool_size));
>>> +
>>> +	mrvl_lcore_first = RTE_MAX_LCORE;
>>> +	mrvl_lcore_last = 0;
>>> +
>>> +	RTE_LCORE_FOREACH(core_id) {
>>> +		mrvl_set_first_last_cores(core_id);
>>
>> This sets limits of core_id. Why you need to know this in PMD level?
> 
> It's just to limit number of entries in mrvl_port_bpool_size we iterate
> over every time we want to count the total number of buffers in the
> hardware buffer pool.
> 
>>
>> <...>
>>
> 
> --
> - Tomasz Duszyński
>
  
Tomasz Duszynski Oct. 6, 2017, 6:41 a.m. UTC | #9
On Thu, Oct 05, 2017 at 06:29:12PM +0100, Ferruh Yigit wrote:
> On 10/5/2017 9:43 AM, Tomasz Duszynski wrote:
> > On Wed, Oct 04, 2017 at 05:59:11PM +0100, Ferruh Yigit wrote:
> >> On 10/4/2017 9:59 AM, Tomasz Duszynski wrote:
> >>> On Wed, Oct 04, 2017 at 01:24:27AM +0100, Ferruh Yigit wrote:
> >>>> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> >>>>> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> >>>>> Driver is based on external, publicly available, light-weight Marvell
> >>>>> MUSDK library that provides access to network packet processor.
> >>>>>
> >>>>> Driver comes with support for the following features:
> >>>>>
> >>>>> * Speed capabilities
> >>>>> * Link status
> >>>>> * Queue start/stop
> >>>>> * MTU update
> >>>>> * Jumbo frame
> >>>>> * Promiscuous mode
> >>>>> * Allmulticast mode
> >>>>> * Unicast MAC filter
> >>>>> * Multicast MAC filter
> >>>>> * RSS hash
> >>>>> * VLAN filter
> >>>>> * CRC offload
> >>>>> * L3 checksum offload
> >>>>> * L4 checksum offload
> >>>>> * Packet type parsing
> >>>>> * Basic stats
> >>>>> * Stats per queue
> >>>>
> >>>> I have more detailed comments but in high level,
> >>>> what do you think splitting this patch into three patches:
> >>>> - Skeleton
> >>>> - Add Rx/Tx support
> >>>> - Add features, like MTU update or Promiscuous etc.. support
> >>> If it's how submission process works then I think you left me with no
> >>> other option than splitting driver into nice patchset :).
> >>
> >> No, there is no defined submission process.
> >>
> >>> On the other
> >>> hand driver is really a wrapper to MUSDK library and thus quite easy to
> >>> follow. What are the benefits of such 3-way split?
> >>
> >> To help others review/understand your code. Big code chunks are scary
> >> and I believe most of details gets lost in big code chunks.
> >>
> >> When someone from community wants to understand and update/improve/fix
> >> your code, to help them by logically split the code that their focus can
> >> go into more narrow part.
> >>
> >> But this also means some effort in your side, so some kind of balance is
> >> required.
> >>
> >> I think splitting patch into smaller logical part is helpful for others,
> >> what do you think, is it too much effort?
> >>
> >
> > Fair enough. I'll split the driver as suggested. A few specific
> > questions about functionality each patch should contain though.
> >
> > As for skeleton, I see others just put driver probing here.
> >
> > As for Rx/Tx support it seems that there's no common pattern.
> > Functionality like starting/stopping device, queues configuration
> > and all the other things related to Rx/Tx should be here as well?
>
> As you said there is no common pattern, but I think starting/stopping
> device, queues configuration can go into skeleton and mainly Rx/Tx burst
> functions can go into Rx/Tx patch.
> But please what you think more reasonable matters here.
>

ACK

> >
> > What's left are features which go into features-patch.
>
> Yes.
> And the .ini file, currently part of doc patch, can be part of this
> features patch, it is helps more to see the code add feature and doc
> documents it in same patch.
>

ACK

> >
> >>>>
> >>>>>
> >>>>> Driver was engineered cooperatively by Semihalf and Marvell teams.
> >>>>>
> >>>>> Semihalf:
> >>>>> Jacek Siuda <jck@semihalf.com>
> >>>>> Tomasz Duszynski <tdu@semihalf.com>
> >>>>>
> >>>>> Marvell:
> >>>>> Dmitri Epshtein <dima@marvell.com>
> >>>>> Natalie Samsonov <nsamsono@marvell.com>
> >>>>>
> >>>>> Signed-off-by: Jacek Siuda <jck@semihalf.com>
> >>>>> Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
> >>>>
> >>>> <...>
> >>>>
> >>>>> +static struct rte_vdev_driver pmd_mrvl_drv = {
> >>>>> +	.probe = rte_pmd_mrvl_probe,
> >>>>> +	.remove = rte_pmd_mrvl_remove,
> >>>>> +};
> >>>>> +
> >>>>> +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
> >>>>
> >>>> Please help me understand.
> >>>>
> >>>> This driver implemented as virtual driver, because:
> >>>> With the help of custom kernel modules, musdk library already provides
> >>>> userspace datapath support. This PMD is an interface to musdk library.
> >>>> Is this correct?
> >>> That is right. Another reason this NIC is not PCI device.
> >>
> >> We support more bus now :). Out of curiosity, which bus is device on?
> >
> > Bus is called Aurora2. That's proprietary SoC interconnect fabric.
> >
> >>
> >>>>
> >>>> If so, just thinking loud:
> >>>> - Why not implement this PMD directly on top of kernel interface,
> >>>> removing musdk layer completely?
> >>>> - How big problem that this PMD depends on custom kernel code?
> >>> I think the main reason is that MUSDK is already used in different projects.
> >>> Keeping multiple codebases offering similar functionality would be quite
> >>> demanding in terms of extra work needed.
> >>>> - How library and custom kernel code delivered? For which platforms?
> >>> Kernel and library sources are hosted on publicly available repository.
> >>
> >> I guess it would be nice to highlight custom kernel with external
> >> patches is required. This is not mentioned in "Prerequisites" section of
> >> the document.
> >>
> >
> > ACK
> >
> >>> Driver was tested on Armada 7k/8k SoCs.
> >>
> >> Can you please provide link to the HW mentioned in documentation?
> >>
> >
> > You can find some info here:
> >
> > https://www.marvell.com/embedded-processors/armada-70xx/
> > https://www.marvell.com/embedded-processors/armada-80xx/
>
> Thanks, would you mind putting these links into driver documentation as
> well?

ACK

>
> >
> >>>>
> >>>> <....>
> >>>>
> >>>
> >>> --
> >>> - Tomasz Duszyński
> >>>
> >>
> >
> > --
> > - Tomasz Duszyński
> >
>

--
- Tomasz Duszyński
  
Thomas Monjalon Oct. 10, 2017, 9:25 p.m. UTC | #10
05/10/2017 10:43, Tomasz Duszynski:
> On Wed, Oct 04, 2017 at 05:59:11PM +0100, Ferruh Yigit wrote:
> > On 10/4/2017 9:59 AM, Tomasz Duszynski wrote:
> > > On Wed, Oct 04, 2017 at 01:24:27AM +0100, Ferruh Yigit wrote:
> > >> On 10/3/2017 12:51 PM, Tomasz Duszynski wrote:
> > >>> Add support for the Marvell PPv2 (Packet Processor v2) 1/10 Gbps adapter.
> > >>> Driver is based on external, publicly available, light-weight Marvell
> > >>> MUSDK library that provides access to network packet processor.
[...]
> > >>> +static struct rte_vdev_driver pmd_mrvl_drv = {
> > >>> +	.probe = rte_pmd_mrvl_probe,
> > >>> +	.remove = rte_pmd_mrvl_remove,
> > >>> +};
> > >>> +
> > >>> +RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
> > >>
> > >> Please help me understand.
> > >>
> > >> This driver is implemented as a virtual driver, because:
> > >> With the help of custom kernel modules, the musdk library already provides
> > >> userspace datapath support. This PMD is an interface to the musdk library.
> > >> Is this correct?
> > > That is right. Another reason is that this NIC is not a PCI device.
> >
> > We support more buses now :). Out of curiosity, which bus is the device on?
> 
> The bus is called Aurora2. That's a proprietary SoC interconnect fabric.

So you should provide drivers/bus/aurora2/.
It would do a software scan of devices (probably by looking in sysfs).
The probe function would then be nearly the same as with vdev init.
It could provide a better user experience by removing the need for
explicit declaration of devices, and it would allow integration into
a more generic whitelist/blacklist mechanism.
Having such well-defined bus code and objects will probably help
in your future developments.
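
For reference, a minimal sketch of such a bus registration against the
rte_bus API could look as follows (the aurora2_* names are hypothetical,
not part of this patch):

#include <rte_bus.h>

/* Scan the platform (e.g. sysfs) and create an rte_device per port. */
static int
aurora2_scan(void)
{
	return 0;
}

/* Match scanned devices against drivers registered on this bus. */
static int
aurora2_probe(void)
{
	return 0;
}

static struct rte_bus aurora2_bus = {
	.scan = aurora2_scan,
	.probe = aurora2_probe,
};

RTE_REGISTER_BUS(aurora2, aurora2_bus);

EAL invokes each registered bus's scan and probe callbacks from
rte_eal_init(), so ports would be discovered without any explicit
--vdev arguments.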
  

Patch

diff --git a/config/common_base b/config/common_base
index 5e97a08..d05a60c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -262,6 +262,13 @@  CONFIG_RTE_LIBRTE_NFP_PMD=n
 CONFIG_RTE_LIBRTE_NFP_DEBUG=n

 #
+# Compile Marvell PMD driver
+#
+CONFIG_RTE_LIBRTE_MRVL_PMD=n
+CONFIG_RTE_LIBRTE_MRVL_DEBUG=n
+CONFIG_RTE_MRVL_MUSDK_DMA_MEMSIZE=41943040
+
+#
 # Compile burst-oriented Broadcom BNXT PMD driver
 #
 CONFIG_RTE_LIBRTE_BNXT_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index d33c959..4a3a205 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -73,6 +73,8 @@  DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DEPDIRS-mlx4 = $(core-libs)
 DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
 DEPDIRS-mlx5 = $(core-libs)
+DIRS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl
+DEPDIRS-mrvl = $(core-libs)
 DIRS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp
 DEPDIRS-nfp = $(core-libs)
 DIRS-$(CONFIG_RTE_LIBRTE_BNXT_PMD) += bnxt
diff --git a/drivers/net/mrvl/Makefile b/drivers/net/mrvl/Makefile
new file mode 100644
index 0000000..ab53f49
--- /dev/null
+++ b/drivers/net/mrvl/Makefile
@@ -0,0 +1,69 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Semihalf. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Semihalf nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifneq ($(MAKECMDGOALS),clean)
+ifneq ($(MAKECMDGOALS),config)
+ifeq ($(LIBMUSDK_PATH),)
+$(error "Please define LIBMUSDK_PATH environment variable")
+endif
+ifeq ($(CONFIG_RTE_LIBRTE_CFGFILE),n)
+$(error "RTE_LIBRTE_CFGFILE must be enabled in configuration!")
+endif
+endif
+endif
+
+# library name
+LIB = librte_pmd_mrvl.a
+
+# library version
+LIBABIVER := 1
+
+# versioning export map
+EXPORT_MAP := rte_pmd_mrvl_version.map
+
+# external library dependencies
+CFLAGS += -I$(LIBMUSDK_PATH)/include
+CFLAGS += -DMVCONF_ARCH_DMA_ADDR_T_64BIT
+CFLAGS += -DCONF_PP2_BPOOL_COOKIE_SIZE=32
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -O3
+LDLIBS += -L$(LIBMUSDK_PATH)/lib
+LDLIBS += -lmusdk
+
+# library source files
+SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl_qos.c
+
+# library dependencies
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += lib/librte_cfgfile
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
new file mode 100644
index 0000000..a260de5
--- /dev/null
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -0,0 +1,2274 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Semihalf. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Semihalf nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_ethdev.h>
+#include <rte_kvargs.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_vdev.h>
+#include <rte_cycles.h>
+
+/* Unfortunately, container_of is defined by both DPDK and MUSDK;
+ * undefine it here so that only a single definition remains.
+ *
+ * Note that it is not used in this PMD anyway.
+ */
+#ifdef container_of
+#undef container_of
+#endif
+
+#include <drivers/mv_pp2.h>
+#include <drivers/mv_pp2_bpool.h>
+#include <drivers/mv_pp2_hif.h>
+
+#include <assert.h>
+#include <fcntl.h>
+#include <linux/ethtool.h>
+#include <linux/sockios.h>
+#include <net/if.h>
+#include <net/if_arp.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#include "mrvl_ethdev.h"
+#include "mrvl_qos.h"
+
+/* bitmask with reserved hifs */
+#define MRVL_MUSDK_HIFS_RESERVED 0x0F
+/* bitmask with reserved bpools */
+#define MRVL_MUSDK_BPOOLS_RESERVED 0x07
+/* bitmask with reserved kernel RSS tables */
+#define MRVL_MUSDK_RSS_RESERVED 0x01
+/* maximum number of available hifs */
+#define MRVL_MUSDK_HIFS_MAX 9
+
+/* number of rx/tx descriptors to prefetch ahead of the current one */
+#define MRVL_MUSDK_PREFETCH_SHIFT 2
+
+/* TCAM has 25 entries reserved for uc/mc filter addresses */
+#define MRVL_MAC_ADDRS_MAX 25
+#define MRVL_MATCH_LEN 16
+#define MRVL_PKT_EFFEC_OFFS (MRVL_PKT_OFFS + MV_MH_SIZE)
+/* Maximum allowable packet size */
+#define MRVL_PKT_SIZE_MAX (10240 - MV_MH_SIZE)
+
+#define MRVL_IFACE_NAME_ARG "iface"
+#define MRVL_CFG_ARG "cfg"
+
+#define MRVL_BURST_SIZE 64
+
+#define MRVL_ARP_LENGTH 28
+
+#define MRVL_COOKIE_ADDR_INVALID ~0ULL
+
+#define MRVL_COOKIE_HIGH_ADDR_SHIFT	(sizeof(pp2_cookie_t) * 8)
+#define MRVL_COOKIE_HIGH_ADDR_MASK	(~0ULL << MRVL_COOKIE_HIGH_ADDR_SHIFT)
+
+static const char * const valid_args[] = {
+	MRVL_IFACE_NAME_ARG,
+	MRVL_CFG_ARG,
+	NULL
+};
+
+static int used_hifs = MRVL_MUSDK_HIFS_RESERVED;
+static struct pp2_hif *hifs[RTE_MAX_LCORE];
+static int used_bpools[PP2_NUM_PKT_PROC] = {
+	MRVL_MUSDK_BPOOLS_RESERVED,
+	MRVL_MUSDK_BPOOLS_RESERVED
+};
+
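+/*
+ * Per-port bpool lookup table, per-(packet processor, pool, lcore) buffer
+ * counters and the common high 32 bits of mbuf virtual addresses.
+ *
+ * The hardware cookie is only 32 bits wide (CONF_PP2_BPOOL_COOKIE_SIZE=32)
+ * while mbuf virtual addresses are 64-bit: the low 32 bits travel in the
+ * descriptor cookie and cookie_addr_high, assumed common to all mbufs, is
+ * OR-ed back in on receive. mrvl_fill_bpool() verifies that assumption.
+ */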
+struct pp2_bpool *mrvl_port_to_bpool_lookup[RTE_MAX_ETHPORTS];
+int mrvl_port_bpool_size[PP2_NUM_PKT_PROC][PP2_BPOOL_NUM_POOLS][RTE_MAX_LCORE];
+uint64_t cookie_addr_high = MRVL_COOKIE_ADDR_INVALID;
+
+/*
+ * A shadow queue structure was introduced for buffer information
+ * bookkeeping in order to support buffer harvesting based on the
+ * loopback port.
+ *
+ * Before a packet is sent, the related buffer information (pp2_buff_inf)
+ * is stored in the shadow queue. After the packet is transmitted, the no
+ * longer used packet buffer is released back to its original hardware
+ * pool, provided it originated from an interface.
+ * In case it was generated by the application itself, i.e. the mbuf->port
+ * field is 0xff, it is released to the software mempool instead.
+ */
+struct mrvl_shadow_txq {
+	int head;           /* write index - used when sending buffers */
+	int tail;           /* read index - used when releasing buffers */
+	u16 size;           /* queue occupied size */
+	u16 num_to_release; /* number of sent buffers that can be released */
+	struct buff_release_entry ent[MRVL_PP2_TX_SHADOWQ_SIZE]; /* q entries */
+};
+
+struct mrvl_rxq {
+	struct mrvl_priv *priv;
+	struct rte_mempool *mp;
+	int queue_id;
+	int port_id;
+	int cksum_enabled;
+	uint64_t bytes_recv;
+	uint64_t drop_mac;
+};
+
+struct mrvl_txq {
+	struct mrvl_priv *priv;
+	int queue_id;
+	int port_id;
+	uint64_t bytes_sent;
+};
+
+/*
+ * Every tx queue should have a dedicated shadow tx queue.
+ *
+ * Port ids assigned by DPDK might not start at zero or be contiguous, so
+ * as a workaround define shadow queues for each possible port so that
+ * we eventually fit somewhere.
+ */
+struct mrvl_shadow_txq shadow_txqs[RTE_MAX_ETHPORTS][RTE_MAX_LCORE];
+
+/** Number of ports configured. */
+int mrvl_ports_nb;
+static int mrvl_lcore_first;
+static int mrvl_lcore_last;
+
+static inline int
+mrvl_get_bpool_size(int pp2_id, int pool_id)
+{
+	int i;
+	int size = 0;
+
+	for (i = mrvl_lcore_first; i <= mrvl_lcore_last; i++)
+		size += mrvl_port_bpool_size[pp2_id][pool_id][i];
+
+	return size;
+}
+
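+/**
+ * Reserve the bit index just above the highest bit currently set.
+ *
+ * E.g. with *bitmap == 0x0F (bits 0-3 reserved) index 4 is returned and
+ * marked as used; freed lower bits are never reused by this scheme.
+ */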
+static inline int
+mrvl_reserve_bit(int *bitmap, int max)
+{
+	int n = sizeof(*bitmap) * 8 - __builtin_clz(*bitmap);
+
+	if (n >= max)
+		return -1;
+
+	*bitmap |= 1 << n;
+
+	return n;
+}
+
+/**
+ * Configure RSS based on the DPDK RSS configuration.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param rss_conf
+ *   Pointer to RSS configuration.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_configure_rss(struct mrvl_priv *priv, struct rte_eth_rss_conf *rss_conf)
+{
+	if (rss_conf->rss_key)
+		RTE_LOG(WARNING, PMD, "Changing hash key is not supported\n");
+
+	if (rss_conf->rss_hf == 0) {
+		priv->ppio_params.inqs_params.hash_type = PP2_PPIO_HASH_T_NONE;
+	} else if (rss_conf->rss_hf & ETH_RSS_IPV4) {
+		priv->ppio_params.inqs_params.hash_type =
+			PP2_PPIO_HASH_T_2_TUPLE;
+	} else if (rss_conf->rss_hf & ETH_RSS_NONFRAG_IPV4_TCP) {
+		priv->ppio_params.inqs_params.hash_type =
+			PP2_PPIO_HASH_T_5_TUPLE;
+		priv->rss_hf_tcp = 1;
+	} else if (rss_conf->rss_hf & ETH_RSS_NONFRAG_IPV4_UDP) {
+		priv->ppio_params.inqs_params.hash_type =
+			PP2_PPIO_HASH_T_5_TUPLE;
+		priv->rss_hf_tcp = 0;
+	} else {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * Ethernet device configuration.
+ *
+ * Prepare the driver for a given number of TX and RX queues and
+ * configure RSS.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_dev_configure(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	if (dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_NONE &&
+	    dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_RSS) {
+		RTE_LOG(INFO, PMD, "Unsupported rx multi queue mode %d\n",
+			dev->data->dev_conf.rxmode.mq_mode);
+		return -EINVAL;
+	}
+
+	if (!dev->data->dev_conf.rxmode.hw_strip_crc) {
+		RTE_LOG(INFO, PMD,
+			"L2 CRC stripping is always enabled in hw\n");
+		dev->data->dev_conf.rxmode.hw_strip_crc = 1;
+	}
+
+	if (dev->data->dev_conf.rxmode.hw_vlan_strip) {
+		RTE_LOG(INFO, PMD, "VLAN stripping not supported\n");
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_conf.rxmode.split_hdr_size) {
+		RTE_LOG(INFO, PMD, "Split headers not supported\n");
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_conf.rxmode.enable_scatter) {
+		RTE_LOG(INFO, PMD, "RX Scatter/Gather not supported\n");
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_conf.rxmode.enable_lro) {
+		RTE_LOG(INFO, PMD, "LRO not supported\n");
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_conf.rxmode.jumbo_frame)
+		dev->data->mtu = dev->data->dev_conf.rxmode.max_rx_pkt_len -
+				 ETHER_HDR_LEN - ETHER_CRC_LEN;
+
+	ret = mrvl_configure_rxqs(priv, dev->data->port_id,
+				  dev->data->nb_rx_queues);
+	if (ret < 0)
+		return ret;
+
+	priv->ppio_params.outqs_params.num_outqs = dev->data->nb_tx_queues;
+	priv->ppio_params.maintain_stats = 1;
+	priv->nb_rx_queues = dev->data->nb_rx_queues;
+
+	if (dev->data->nb_rx_queues == 1 &&
+	    dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_RSS) {
+		RTE_LOG(WARNING, PMD, "Disabling hash for 1 rx queue\n");
+		priv->ppio_params.inqs_params.hash_type = PP2_PPIO_HASH_T_NONE;
+
+		return 0;
+	}
+
+	return mrvl_configure_rss(priv,
+				  &dev->data->dev_conf.rx_adv_conf.rss_conf);
+}
+
+/**
+ * DPDK callback to change the MTU.
+ *
+ * Setting the MTU affects hardware MRU (packets larger than the MRU
+ * will be dropped).
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mtu
+ *   New MTU.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	/* extra MV_MH_SIZE bytes are required for Marvell tag */
+	uint16_t mru = mtu + MV_MH_SIZE + ETHER_HDR_LEN + ETHER_CRC_LEN;
+	int ret;
+
+	if (mtu < ETHER_MIN_MTU || mru > MRVL_PKT_SIZE_MAX)
+		return -EINVAL;
+
+	ret = pp2_ppio_set_mru(priv->ppio, mru);
+	if (ret)
+		return ret;
+
+	return pp2_ppio_set_mtu(priv->ppio, mtu);
+}
+
+/**
+ * DPDK callback to bring the link up.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	ret = pp2_ppio_enable(priv->ppio);
+	if (ret)
+		return ret;
+
+	/*
+	 * mtu/mru can be updated only after pp2_ppio_enable() has been
+	 * called at least once, as pp2_ppio_enable() changes port->t_mode
+	 * from the default 0 to PP2_TRAFFIC_INGRESS_EGRESS.
+	 *
+	 * Set the mtu to the default DPDK value here.
+	 */
+	ret = mrvl_mtu_set(dev, dev->data->mtu);
+	if (ret) {
+		pp2_ppio_disable(priv->ppio);
+		return ret;
+	}
+
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/**
+ * DPDK callback to bring the link down.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	ret = pp2_ppio_disable(priv->ppio);
+	if (ret)
+		return ret;
+
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+
+	return ret;
+}
+
+/**
+ * DPDK callback to start the device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+static int
+mrvl_dev_start(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	char match[MRVL_MATCH_LEN];
+	int ret;
+
+	snprintf(match, sizeof(match), "ppio-%d:%d",
+		 priv->pp_id, priv->ppio_id);
+	priv->ppio_params.match = match;
+
+	/*
+	 * Set the maximum bpool size for the refill feature to 1.5x the
+	 * configured size. In case the bpool size exceeds this value,
+	 * superfluous buffers are removed.
+	 */
+	priv->bpool_max_size = priv->bpool_init_size +
+			      (priv->bpool_init_size >> 1);
+	/*
+	 * Calculate the minimum bpool size for the refill feature as
+	 * 2 default burst sizes multiplied by the number of rx queues.
+	 * If the bpool size drops below this value, new buffers are
+	 * added to the pool.
+	 */
+	priv->bpool_min_size = priv->nb_rx_queues * MRVL_BURST_SIZE * 2;
+
+	ret = pp2_ppio_init(&priv->ppio_params, &priv->ppio);
+	if (ret)
+		return ret;
+
+	/*
+	 * In case there are some stale uc/mc mac addresses, flush them here.
+	 * It cannot be done during mrvl_dev_close() as port information
+	 * is already gone at that point (due to pp2_ppio_deinit() in
+	 * mrvl_dev_stop()).
+	 */
+	if (!priv->uc_mc_flushed) {
+		ret = pp2_ppio_flush_mac_addrs(priv->ppio, 1, 1);
+		if (ret) {
+			RTE_LOG(ERR, PMD,
+				"Failed to flush uc/mc filter list\n");
+			goto out;
+		}
+		priv->uc_mc_flushed = 1;
+	}
+
+	if (!priv->vlan_flushed) {
+		ret = pp2_ppio_flush_vlan(priv->ppio);
+		if (ret) {
+			RTE_LOG(ERR, PMD, "Failed to flush vlan list\n");
+			/*
+			 * TODO
+			 * once pp2_ppio_flush_vlan() is supported jump to out
+			 * goto out;
+			 */
+		}
+		priv->vlan_flushed = 1;
+	}
+
+	/* For default QoS config, don't start classifier. */
+	if (mrvl_qos_cfg) {
+		ret = mrvl_start_qos_mapping(priv);
+		if (ret) {
+			pp2_ppio_deinit(priv->ppio);
+			return ret;
+		}
+	}
+
+	ret = mrvl_dev_set_link_up(dev);
+	if (ret)
+		goto out;
+
+	return 0;
+out:
+	pp2_ppio_deinit(priv->ppio);
+	return ret;
+}
+
+/**
+ * Flush receive queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_flush_rx_queues(struct rte_eth_dev *dev)
+{
+	int i;
+
+	RTE_LOG(INFO, PMD, "Flushing rx queues\n");
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		int ret, num;
+
+		do {
+			struct mrvl_rxq *q = dev->data->rx_queues[i];
+			struct pp2_ppio_desc descs[MRVL_PP2_RXD_MAX];
+
+			num = MRVL_PP2_RXD_MAX;
+			ret = pp2_ppio_recv(q->priv->ppio,
+					    q->priv->rxq_map[q->queue_id].tc,
+					    q->priv->rxq_map[q->queue_id].inq,
+					    descs, (uint16_t *)&num);
+		} while (ret == 0 && num);
+	}
+}
+
+/**
+ * Flush transmit shadow queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_flush_tx_shadow_queues(struct rte_eth_dev *dev)
+{
+	int i;
+
+	RTE_LOG(INFO, PMD, "Flushing tx shadow queues\n");
+	for (i = 0; i < RTE_MAX_LCORE; i++) {
+		struct mrvl_shadow_txq *sq =
+			&shadow_txqs[dev->data->port_id][i];
+
+		while (sq->tail != sq->head) {
+			uint64_t addr = cookie_addr_high |
+					sq->ent[sq->tail].buff.cookie;
+			rte_pktmbuf_free((struct rte_mbuf *)addr);
+			sq->tail = (sq->tail + 1) & MRVL_PP2_TX_SHADOWQ_MASK;
+		}
+
+		memset(sq, 0, sizeof(*sq));
+	}
+}
+
+/**
+ * Flush hardware bpool (buffer-pool).
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_flush_bpool(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	uint32_t num;
+	int ret;
+
+	ret = pp2_bpool_get_num_buffs(priv->bpool, &num);
+	if (ret) {
+		RTE_LOG(ERR, PMD, "Failed to get bpool buffers number\n");
+		return;
+	}
+
+	while (num--) {
+		struct pp2_buff_inf inf;
+		uint64_t addr;
+
+		ret = pp2_bpool_get_buff(hifs[rte_lcore_id()], priv->bpool,
+					 &inf);
+		if (ret)
+			break;
+
+		addr = cookie_addr_high | inf.cookie;
+		rte_pktmbuf_free((struct rte_mbuf *)addr);
+	}
+}
+
+/**
+ * DPDK callback to stop the device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_dev_stop(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+
+	mrvl_dev_set_link_down(dev);
+	mrvl_flush_rx_queues(dev);
+	mrvl_flush_tx_shadow_queues(dev);
+	if (priv->qos_tbl)
+		pp2_cls_qos_tbl_deinit(priv->qos_tbl);
+	pp2_ppio_deinit(priv->ppio);
+	priv->ppio = NULL;
+}
+
+/**
+ * DPDK callback to close the device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_dev_close(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	size_t i;
+
+	for (i = 0; i < priv->ppio_params.inqs_params.num_tcs; ++i) {
+		struct pp2_ppio_tc_params *tc_params =
+			&priv->ppio_params.inqs_params.tcs_params[i];
+
+		if (tc_params->inqs_params) {
+			rte_free(tc_params->inqs_params);
+			tc_params->inqs_params = NULL;
+		}
+	}
+
+	mrvl_flush_bpool(dev);
+}
+
+/**
+ * DPDK callback to retrieve physical link information.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ *   Wait for request completion (ignored).
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
+{
+	/*
+	 * TODO
+	 * once MUSDK provides necessary API use it here
+	 */
+	struct ethtool_cmd edata;
+	struct ifreq req;
+	int ret, fd;
+
+	edata.cmd = ETHTOOL_GSET;
+
+	strcpy(req.ifr_name, dev->data->name);
+	req.ifr_data = (void *)&edata;
+
+	fd = socket(AF_INET, SOCK_DGRAM, 0);
+	if (fd == -1)
+		return -EFAULT;
+
+	ret = ioctl(fd, SIOCETHTOOL, &req);
+	if (ret == -1) {
+		close(fd);
+		return -EFAULT;
+	}
+
+	close(fd);
+
+	switch (ethtool_cmd_speed(&edata)) {
+	case SPEED_10:
+		dev->data->dev_link.link_speed = ETH_SPEED_NUM_10M;
+		break;
+	case SPEED_100:
+		dev->data->dev_link.link_speed = ETH_SPEED_NUM_100M;
+		break;
+	case SPEED_1000:
+		dev->data->dev_link.link_speed = ETH_SPEED_NUM_1G;
+		break;
+	case SPEED_10000:
+		dev->data->dev_link.link_speed = ETH_SPEED_NUM_10G;
+		break;
+	default:
+		dev->data->dev_link.link_speed = ETH_SPEED_NUM_NONE;
+	}
+
+	dev->data->dev_link.link_duplex = edata.duplex ? ETH_LINK_FULL_DUPLEX :
+							 ETH_LINK_HALF_DUPLEX;
+	dev->data->dev_link.link_autoneg = edata.autoneg ? ETH_LINK_AUTONEG :
+							   ETH_LINK_FIXED;
+
+	return 0;
+}
+
+/**
+ * DPDK callback to enable promiscuous mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	ret = pp2_ppio_set_uc_promisc(priv->ppio, 1);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Failed to enable promiscuous mode\n");
+}
+
+/**
+ * DPDK callback to enable allmulti mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	ret = pp2_ppio_set_mc_promisc(priv->ppio, 1);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Failed to enable all-multicast mode\n");
+}
+
+/**
+ * DPDK callback to disable promiscuous mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	ret = pp2_ppio_set_uc_promisc(priv->ppio, 0);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Failed to disable promiscuous mode\n");
+}
+
+/**
+ * DPDK callback to disable allmulticast mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int ret;
+
+	ret = pp2_ppio_set_mc_promisc(priv->ppio, 0);
+	if (ret)
+		RTE_LOG(ERR, PMD, "Failed to disable all-multicast mode\n");
+}
+
+/**
+ * DPDK callback to remove a MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param index
+ *   MAC address index.
+ */
+static void
+mrvl_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	char buf[ETHER_ADDR_FMT_SIZE];
+	int ret;
+
+	ret = pp2_ppio_remove_mac_addr(priv->ppio,
+				       dev->data->mac_addrs[index].addr_bytes);
+	if (ret) {
+		ether_format_addr(buf, sizeof(buf),
+				  &dev->data->mac_addrs[index]);
+		RTE_LOG(ERR, PMD, "Failed to remove mac %s\n", buf);
+	}
+}
+
+/**
+ * DPDK callback to add a MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ * @param vmdq
+ *   VMDq pool index to associate address with (unused).
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		  uint32_t index, uint32_t vmdq __rte_unused)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	char buf[ETHER_ADDR_FMT_SIZE];
+	int ret;
+
+	if (index == 0)
+		/* For setting index 0, mrvl_mac_addr_set() should be used.*/
+		return -1;
+
+	/*
+	 * The maximum number of uc addresses can be tuned via kernel module
+	 * mvpp2x parameter uc_filter_max. The maximum number of mc addresses
+	 * is then MRVL_MAC_ADDRS_MAX - uc_filter_max. Currently they default
+	 * to 4 and 21 respectively.
+	 *
+	 * If more than uc_filter_max uc addresses are added to the filter
+	 * list, the NIC switches to promiscuous mode automatically.
+	 *
+	 * If more than MRVL_MAC_ADDRS_MAX - uc_filter_max mc addresses are
+	 * added to the filter list, the NIC switches to all-multicast mode
+	 * automatically.
+	 */
+	ret = pp2_ppio_add_mac_addr(priv->ppio, mac_addr->addr_bytes);
+	if (ret) {
+		ether_format_addr(buf, sizeof(buf), mac_addr);
+		RTE_LOG(ERR, PMD, "Failed to add mac %s\n", buf);
+		return -1;
+	}
+
+	return 0;
+}
+
+/**
+ * DPDK callback to set the primary MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ */
+static void
+mrvl_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+
+	pp2_ppio_set_mac_addr(priv->ppio, mac_addr->addr_bytes);
+	/*
+	 * TODO
+	 * The port stops sending packets if pp2_ppio_set_mac_addr()
+	 * is called after pp2_ppio_enable(). As a quick fix, enable
+	 * the port once again.
+	 */
+	pp2_ppio_enable(priv->ppio);
+}
+
+/**
+ * DPDK callback to get device statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param stats
+ *   Stats structure output buffer.
+ */
+static void
+mrvl_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	struct pp2_ppio_statistics ppio_stats;
+	uint64_t drop_mac = 0;
+	unsigned int i, idx, ret;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct mrvl_rxq *rxq = dev->data->rx_queues[i];
+		struct pp2_ppio_inq_statistics rx_stats;
+
+		if (!rxq)
+			continue;
+
+		idx = rxq->queue_id;
+		if (unlikely(idx >= RTE_ETHDEV_QUEUE_STAT_CNTRS)) {
+			RTE_LOG(ERR, PMD,
+				"rx queue %d stats out of range (0 - %d)\n",
+				idx, RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
+			continue;
+		}
+
+		ret = pp2_ppio_inq_get_statistics(priv->ppio,
+						  priv->rxq_map[idx].tc,
+						  priv->rxq_map[idx].inq,
+						  &rx_stats, 0);
+		if (unlikely(ret)) {
+			RTE_LOG(ERR, PMD,
+				"Failed to update rx queue %d stats\n", idx);
+			break;
+		}
+
+		stats->q_ibytes[idx] = rxq->bytes_recv;
+		stats->q_ipackets[idx] = rx_stats.enq_desc - rxq->drop_mac;
+		stats->q_errors[idx] = rx_stats.drop_early +
+				       rx_stats.drop_fullq +
+				       rx_stats.drop_bm +
+				       rxq->drop_mac;
+		stats->ibytes += rxq->bytes_recv;
+		drop_mac += rxq->drop_mac;
+	}
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct mrvl_txq *txq = dev->data->tx_queues[i];
+		struct pp2_ppio_outq_statistics tx_stats;
+
+		if (!txq)
+			continue;
+
+		idx = txq->queue_id;
+		if (unlikely(idx >= RTE_ETHDEV_QUEUE_STAT_CNTRS)) {
+			RTE_LOG(ERR, PMD,
+				"tx queue %d stats out of range (0 - %d)\n",
+				idx, RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
+			continue;
+		}
+
+		ret = pp2_ppio_outq_get_statistics(priv->ppio, idx,
+						   &tx_stats, 0);
+		if (unlikely(ret)) {
+			RTE_LOG(ERR, PMD,
+				"Failed to update tx queue %d stats\n", idx);
+			break;
+		}
+
+		stats->q_opackets[idx] = tx_stats.deq_desc;
+		stats->q_obytes[idx] = txq->bytes_sent;
+		stats->obytes += txq->bytes_sent;
+	}
+
+	ret = pp2_ppio_get_statistics(priv->ppio, &ppio_stats, 0);
+	if (unlikely(ret)) {
+		RTE_LOG(ERR, PMD, "Failed to update port statistics\n");
+		return;
+	}
+
+	stats->ipackets += ppio_stats.rx_packets - drop_mac;
+	stats->opackets += ppio_stats.tx_packets;
+	stats->imissed += ppio_stats.rx_fullq_dropped +
+			  ppio_stats.rx_bm_dropped +
+			  ppio_stats.rx_early_dropped +
+			  ppio_stats.rx_fifo_dropped +
+			  ppio_stats.rx_cls_dropped;
+	stats->ierrors = drop_mac;
+}
+
+/**
+ * DPDK callback to clear device statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+static void
+mrvl_stats_reset(struct rte_eth_dev *dev)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct mrvl_rxq *rxq = dev->data->rx_queues[i];
+
+		pp2_ppio_inq_get_statistics(priv->ppio, priv->rxq_map[i].tc,
+					    priv->rxq_map[i].inq, NULL, 1);
+		rxq->bytes_recv = 0;
+		rxq->drop_mac = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct mrvl_txq *txq = dev->data->tx_queues[i];
+
+		pp2_ppio_outq_get_statistics(priv->ppio, i, NULL, 1);
+		txq->bytes_sent = 0;
+	}
+
+	pp2_ppio_get_statistics(priv->ppio, NULL, 1);
+}
+
+/**
+ * DPDK callback to get information about the device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure (unused).
+ * @param info
+ *   Info structure output buffer.
+ */
+static void
+mrvl_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
+		   struct rte_eth_dev_info *info)
+{
+	info->max_rx_queues = MRVL_PP2_RXQ_MAX;
+	info->max_tx_queues = MRVL_PP2_TXQ_MAX;
+	info->max_mac_addrs = MRVL_MAC_ADDRS_MAX;
+
+	info->rx_desc_lim.nb_max = MRVL_PP2_RXD_MAX;
+	info->rx_desc_lim.nb_min = MRVL_PP2_RXD_MIN;
+	info->rx_desc_lim.nb_align = MRVL_PP2_RXD_ALIGN;
+
+	info->tx_desc_lim.nb_max = MRVL_PP2_TXD_MAX;
+	info->tx_desc_lim.nb_min = MRVL_PP2_TXD_MIN;
+	info->tx_desc_lim.nb_align = MRVL_PP2_TXD_ALIGN;
+
+	info->rx_offload_capa = DEV_RX_OFFLOAD_IPV4_CKSUM |
+				DEV_RX_OFFLOAD_UDP_CKSUM |
+				DEV_RX_OFFLOAD_TCP_CKSUM;
+
+	info->tx_offload_capa = DEV_TX_OFFLOAD_IPV4_CKSUM |
+				DEV_TX_OFFLOAD_UDP_CKSUM |
+				DEV_TX_OFFLOAD_TCP_CKSUM;
+
+	info->flow_type_rss_offloads = ETH_RSS_IPV4 |
+				       ETH_RSS_NONFRAG_IPV4_TCP |
+				       ETH_RSS_NONFRAG_IPV4_UDP;
+
+	/* By default packets are dropped if no descriptors are available */
+	info->default_rxconf.rx_drop_en = 1;
+
+	info->max_rx_pktlen = MRVL_PKT_SIZE_MAX;
+}
+
+/**
+ * Return supported packet types.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure (unused).
+ *
+ * @return
+ *   Const pointer to the table with supported packet types.
+ */
+static const uint32_t *
+mrvl_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+	static const uint32_t ptypes[] = {
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L3_IPV4,
+		RTE_PTYPE_L3_IPV4_EXT,
+		RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L3_IPV6,
+		RTE_PTYPE_L3_IPV6_EXT,
+		RTE_PTYPE_L2_ETHER_ARP,
+		RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L4_UDP
+	};
+
+	return ptypes;
+}
+
+/**
+ * DPDK callback to get information about specific receive queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param rx_queue_id
+ *   Receive queue index.
+ * @param qinfo
+ *   Receive queue information structure.
+ */
+static void
+mrvl_rxq_info_get(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		  struct rte_eth_rxq_info *qinfo)
+{
+	struct mrvl_rxq *q = dev->data->rx_queues[rx_queue_id];
+	struct mrvl_priv *priv = dev->data->dev_private;
+	int inq = priv->rxq_map[rx_queue_id].inq;
+	int tc = priv->rxq_map[rx_queue_id].tc;
+	struct pp2_ppio_tc_params *tc_params =
+		&priv->ppio_params.inqs_params.tcs_params[tc];
+
+	qinfo->mp = q->mp;
+	qinfo->nb_desc = tc_params->inqs_params[inq].size;
+}
+
+/**
+ * DPDK callback to get information about specific transmit queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param tx_queue_id
+ *   Transmit queue index.
+ * @param qinfo
+ *   Transmit queue information structure.
+ */
+static void
+mrvl_txq_info_get(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		  struct rte_eth_txq_info *qinfo)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+
+	qinfo->nb_desc =
+		priv->ppio_params.outqs_params.outqs_params[tx_queue_id].size;
+}
+
+/**
+ * DPDK callback to configure a VLAN filter.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param vlan_id
+ *   VLAN ID to filter.
+ * @param on
+ *   Toggle filter.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+
+	return on ? pp2_ppio_add_vlan(priv->ppio, vlan_id) :
+		    pp2_ppio_remove_vlan(priv->ppio, vlan_id);
+}
+
+/**
+ * Release buffers to the hardware bpool (buffer-pool).
+ *
+ * @param rxq
+ *   Receive queue pointer.
+ * @param num
+ *   Number of buffers to release to bpool.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_fill_bpool(struct mrvl_rxq *rxq, int num)
+{
+	struct buff_release_entry entries[MRVL_PP2_TXD_MAX];
+	struct rte_mbuf *mbufs[MRVL_PP2_TXD_MAX];
+	int i, ret;
+	unsigned int core_id = rte_lcore_id();
+	struct pp2_hif *hif = hifs[core_id];
+	struct pp2_bpool *bpool = rxq->priv->bpool;
+
+	ret = rte_pktmbuf_alloc_bulk(rxq->mp, mbufs, num);
+	if (ret)
+		return ret;
+
+	if (cookie_addr_high == MRVL_COOKIE_ADDR_INVALID)
+		cookie_addr_high =
+			(uint64_t)mbufs[0] & MRVL_COOKIE_HIGH_ADDR_MASK;
+
+	for (i = 0; i < num; i++) {
+		if (((uint64_t)mbufs[i] & MRVL_COOKIE_HIGH_ADDR_MASK)
+			!= cookie_addr_high) {
+			RTE_LOG(ERR, PMD,
+				"mbuf virtual addr high 0x%lx out of range\n",
+				(uint64_t)mbufs[i] >> 32);
+			goto out;
+		}
+
+		entries[i].buff.addr =
+			rte_mbuf_data_dma_addr_default(mbufs[i]);
+		entries[i].buff.cookie = (pp2_cookie_t)(uint64_t)mbufs[i];
+		entries[i].bpool = bpool;
+	}
+
+	pp2_bpool_put_buffs(hif, entries, (uint16_t *)&i);
+	mrvl_port_bpool_size[bpool->pp2_id][bpool->id][core_id] += i;
+
+	if (i != num)
+		goto out;
+
+	return 0;
+out:
+	for (; i < num; i++)
+		rte_pktmbuf_free(mbufs[i]);
+
+	return -1;
+}
+
+/**
+ * DPDK callback to configure the receive queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param conf
+ *   Threshold parameters (unused).
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket,
+		    const struct rte_eth_rxconf *conf __rte_unused,
+		    struct rte_mempool *mp)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	struct mrvl_rxq *rxq;
+	uint32_t min_size,
+		 max_rx_pkt_len = dev->data->dev_conf.rxmode.max_rx_pkt_len;
+	int ret, tc, inq;
+
+	if (priv->rxq_map[idx].tc == MRVL_UNKNOWN_TC) {
+		/*
+		 * Unknown TC mapping; this rx queue cannot be mapped to
+		 * a correct hardware in-queue.
+		 */
+		RTE_LOG(ERR, PMD, "Unknown TC mapping for queue %hu eth%hhu\n",
+			idx, priv->ppio_id);
+		return -EFAULT;
+	}
+
+	min_size = rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM -
+		   MRVL_PKT_EFFEC_OFFS;
+	if (min_size < max_rx_pkt_len) {
+		RTE_LOG(ERR, PMD,
+			"Mbuf size must be increased to %u bytes to hold up to %u bytes of data.\n",
+			max_rx_pkt_len + RTE_PKTMBUF_HEADROOM +
+			MRVL_PKT_EFFEC_OFFS,
+			max_rx_pkt_len);
+		return -EINVAL;
+	}
+
+	if (dev->data->rx_queues[idx]) {
+		rte_free(dev->data->rx_queues[idx]);
+		dev->data->rx_queues[idx] = NULL;
+	}
+
+	rxq = rte_zmalloc_socket("rxq", sizeof(*rxq), 0, socket);
+	if (!rxq)
+		return -ENOMEM;
+
+	rxq->priv = priv;
+	rxq->mp = mp;
+	rxq->cksum_enabled = dev->data->dev_conf.rxmode.hw_ip_checksum;
+	rxq->queue_id = idx;
+	rxq->port_id = dev->data->port_id;
+	mrvl_port_to_bpool_lookup[rxq->port_id] = priv->bpool;
+
+	tc = priv->rxq_map[rxq->queue_id].tc;
+	inq = priv->rxq_map[rxq->queue_id].inq;
+	priv->ppio_params.inqs_params.tcs_params[tc].inqs_params[inq].size =
+		desc;
+
+	ret = mrvl_fill_bpool(rxq, desc);
+	if (ret) {
+		rte_free(rxq);
+		return ret;
+	}
+
+	priv->bpool_init_size += desc;
+
+	dev->data->rx_queues[idx] = rxq;
+
+	return 0;
+}
+
+/**
+ * DPDK callback to release the receive queue.
+ *
+ * @param rxq
+ *   Generic receive queue pointer.
+ */
+static void
+mrvl_rx_queue_release(void *rxq)
+{
+	struct mrvl_rxq *q = rxq;
+	struct pp2_ppio_tc_params *tc_params;
+	int i, num, tc, inq;
+
+	if (!q)
+		return;
+
+	tc = q->priv->rxq_map[q->queue_id].tc;
+	inq = q->priv->rxq_map[q->queue_id].inq;
+	tc_params = &q->priv->ppio_params.inqs_params.tcs_params[tc];
+	num = tc_params->inqs_params[inq].size;
+	for (i = 0; i < num; i++) {
+		struct pp2_buff_inf inf;
+		uint64_t addr;
+
+		pp2_bpool_get_buff(hifs[rte_lcore_id()], q->priv->bpool, &inf);
+		addr = cookie_addr_high | inf.cookie;
+		rte_pktmbuf_free((struct rte_mbuf *)addr);
+	}
+
+	rte_free(q);
+}
+
+/**
+ * DPDK callback to configure the transmit queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Transmit queue index.
+ * @param desc
+ *   Number of descriptors to configure in the queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param conf
+ *   Threshold parameters (unused).
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket,
+		    const struct rte_eth_txconf *conf __rte_unused)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	struct mrvl_txq *txq;
+
+	if (dev->data->tx_queues[idx]) {
+		rte_free(dev->data->tx_queues[idx]);
+		dev->data->tx_queues[idx] = NULL;
+	}
+
+	txq = rte_zmalloc_socket("txq", sizeof(*txq), 0, socket);
+	if (!txq)
+		return -ENOMEM;
+
+	txq->priv = priv;
+	txq->queue_id = idx;
+	txq->port_id = dev->data->port_id;
+	dev->data->tx_queues[idx] = txq;
+
+	priv->ppio_params.outqs_params.outqs_params[idx].size = desc;
+	priv->ppio_params.outqs_params.outqs_params[idx].weight = 1;
+
+	return 0;
+}
+
+/**
+ * DPDK callback to release the transmit queue.
+ *
+ * @param txq
+ *   Generic transmit queue pointer.
+ */
+static void
+mrvl_tx_queue_release(void *txq)
+{
+	struct mrvl_txq *q = txq;
+
+	if (!q)
+		return;
+
+	rte_free(q);
+}
+
+/**
+ * DPDK callback to update the RSS hash configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param rss_conf
+ *   Pointer to RSS configuration.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_rss_hash_update(struct rte_eth_dev *dev,
+		     struct rte_eth_rss_conf *rss_conf)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+
+	return mrvl_configure_rss(priv, rss_conf);
+}
+
+/**
+ * DPDK callback to get RSS hash configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param rss_conf
+ *   Pointer to RSS configuration.
+ *
+ * @return
+ *   Always 0.
+ */
+static int
+mrvl_rss_hash_conf_get(struct rte_eth_dev *dev,
+		       struct rte_eth_rss_conf *rss_conf)
+{
+	struct mrvl_priv *priv = dev->data->dev_private;
+	enum pp2_ppio_hash_type hash_type =
+		priv->ppio_params.inqs_params.hash_type;
+
+	rss_conf->rss_key = NULL;
+
+	if (hash_type == PP2_PPIO_HASH_T_NONE)
+		rss_conf->rss_hf = 0;
+	else if (hash_type == PP2_PPIO_HASH_T_2_TUPLE)
+		rss_conf->rss_hf = ETH_RSS_IPV4;
+	else if (hash_type == PP2_PPIO_HASH_T_5_TUPLE && priv->rss_hf_tcp)
+		rss_conf->rss_hf = ETH_RSS_NONFRAG_IPV4_TCP;
+	else if (hash_type == PP2_PPIO_HASH_T_5_TUPLE && !priv->rss_hf_tcp)
+		rss_conf->rss_hf = ETH_RSS_NONFRAG_IPV4_UDP;
+
+	return 0;
+}
+
+static const struct eth_dev_ops mrvl_ops = {
+	.dev_configure = mrvl_dev_configure,
+	.dev_start = mrvl_dev_start,
+	.dev_stop = mrvl_dev_stop,
+	.dev_set_link_up = mrvl_dev_set_link_up,
+	.dev_set_link_down = mrvl_dev_set_link_down,
+	.dev_close = mrvl_dev_close,
+	.link_update = mrvl_link_update,
+	.promiscuous_enable = mrvl_promiscuous_enable,
+	.allmulticast_enable = mrvl_allmulticast_enable,
+	.promiscuous_disable = mrvl_promiscuous_disable,
+	.allmulticast_disable = mrvl_allmulticast_disable,
+	.mac_addr_remove = mrvl_mac_addr_remove,
+	.mac_addr_add = mrvl_mac_addr_add,
+	.mac_addr_set = mrvl_mac_addr_set,
+	.mtu_set = mrvl_mtu_set,
+	.stats_get = mrvl_stats_get,
+	.stats_reset = mrvl_stats_reset,
+	.dev_infos_get = mrvl_dev_infos_get,
+	.dev_supported_ptypes_get = mrvl_dev_supported_ptypes_get,
+	.rxq_info_get = mrvl_rxq_info_get,
+	.txq_info_get = mrvl_txq_info_get,
+	.vlan_filter_set = mrvl_vlan_filter_set,
+	.rx_queue_setup = mrvl_rx_queue_setup,
+	.rx_queue_release = mrvl_rx_queue_release,
+	.tx_queue_setup = mrvl_tx_queue_setup,
+	.tx_queue_release = mrvl_tx_queue_release,
+	.rss_hash_update = mrvl_rss_hash_update,
+	.rss_hash_conf_get = mrvl_rss_hash_conf_get,
+};
+
+/**
+ * Return packet type information and l3/l4 offsets.
+ *
+ * @param desc
+ *   Pointer to the received packet descriptor.
+ * @param l3_offset
+ *   l3 packet offset.
+ * @param l4_offset
+ *   l4 packet offset.
+ *
+ * @return
+ *   Packet type information.
+ */
+static inline uint32_t
+mrvl_desc_to_packet_type_and_offset(struct pp2_ppio_desc *desc,
+				    uint8_t *l3_offset, uint8_t *l4_offset)
+{
+	enum pp2_inq_l3_type l3_type;
+	enum pp2_inq_l4_type l4_type;
+	uint64_t packet_type;
+
+	pp2_ppio_inq_desc_get_l3_info(desc, &l3_type, l3_offset);
+	pp2_ppio_inq_desc_get_l4_info(desc, &l4_type, l4_offset);
+
+	packet_type = RTE_PTYPE_L2_ETHER;
+
+	switch (l3_type) {
+	case PP2_INQ_L3_TYPE_IPV4_NO_OPTS:
+		packet_type |= RTE_PTYPE_L3_IPV4;
+		break;
+	case PP2_INQ_L3_TYPE_IPV4_OK:
+		packet_type |= RTE_PTYPE_L3_IPV4_EXT;
+		break;
+	case PP2_INQ_L3_TYPE_IPV4_TTL_ZERO:
+		packet_type |= RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+		break;
+	case PP2_INQ_L3_TYPE_IPV6_NO_EXT:
+		packet_type |= RTE_PTYPE_L3_IPV6;
+		break;
+	case PP2_INQ_L3_TYPE_IPV6_EXT:
+		packet_type |= RTE_PTYPE_L3_IPV6_EXT;
+		break;
+	case PP2_INQ_L3_TYPE_ARP:
+		packet_type |= RTE_PTYPE_L2_ETHER_ARP;
+		/*
+		 * In case of ARP, l4_offset is set to a wrong value.
+		 * Set it to a proper one so that later on mbuf->l3_len can
+		 * be calculated by subtracting l3_offset from l4_offset.
+		 */
+		*l4_offset = *l3_offset + MRVL_ARP_LENGTH;
+		break;
+	default:
+		RTE_LOG(DEBUG, PMD, "Failed to recognise l3 packet type\n");
+		break;
+	}
+
+	switch (l4_type) {
+	case PP2_INQ_L4_TYPE_TCP:
+		packet_type |= RTE_PTYPE_L4_TCP;
+		break;
+	case PP2_INQ_L4_TYPE_UDP:
+		packet_type |= RTE_PTYPE_L4_UDP;
+		break;
+	default:
+		RTE_LOG(DEBUG, PMD, "Failed to recognise l4 packet type\n");
+		break;
+	}
+
+	return packet_type;
+}
+
+/**
+ * Get offload information from the received packet descriptor.
+ *
+ * @param desc
+ *   Pointer to the received packet descriptor.
+ *
+ * @return
+ *   Mbuf offload flags.
+ */
+static inline uint64_t
+mrvl_desc_to_ol_flags(struct pp2_ppio_desc *desc)
+{
+	uint64_t flags;
+	enum pp2_inq_desc_status status;
+
+	status = pp2_ppio_inq_desc_get_l3_pkt_error(desc);
+	if (unlikely(status != PP2_DESC_ERR_OK))
+		flags = PKT_RX_IP_CKSUM_BAD;
+	else
+		flags = PKT_RX_IP_CKSUM_GOOD;
+
+	status = pp2_ppio_inq_desc_get_l4_pkt_error(desc);
+	if (unlikely(status != PP2_DESC_ERR_OK))
+		flags |= PKT_RX_L4_CKSUM_BAD;
+	else
+		flags |= PKT_RX_L4_CKSUM_GOOD;
+
+	return flags;
+}
+
+/**
+ * DPDK callback for receive.
+ *
+ * @param rxq
+ *   Generic pointer to the receive queue.
+ * @param rx_pkts
+ *   Array to store received packets.
+ * @param nb_pkts
+ *   Maximum number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully received.
+ */
+static uint16_t
+mrvl_rx_pkt_burst(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct mrvl_rxq *q = rxq;
+	struct pp2_ppio_desc descs[nb_pkts];
+	struct pp2_bpool *bpool;
+	int i, ret, rx_done = 0;
+	int num;
+	unsigned int core_id = rte_lcore_id();
+
+	if (unlikely(!q->priv->ppio))
+		return 0;
+
+	bpool = q->priv->bpool;
+
+	ret = pp2_ppio_recv(q->priv->ppio, q->priv->rxq_map[q->queue_id].tc,
+			    q->priv->rxq_map[q->queue_id].inq, descs, &nb_pkts);
+	if (unlikely(ret < 0)) {
+		RTE_LOG(ERR, PMD, "Failed to receive packets\n");
+		return 0;
+	}
+	mrvl_port_bpool_size[bpool->pp2_id][bpool->id][core_id] -= nb_pkts;
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *mbuf;
+		uint8_t l3_offset, l4_offset;
+		enum pp2_inq_desc_status status;
+		uint64_t addr;
+
+		if (likely(nb_pkts - i > MRVL_MUSDK_PREFETCH_SHIFT)) {
+			struct pp2_ppio_desc *pref_desc;
+			u64 pref_addr;
+
+			pref_desc = &descs[i + MRVL_MUSDK_PREFETCH_SHIFT];
+			pref_addr = cookie_addr_high |
+				    pp2_ppio_inq_desc_get_cookie(pref_desc);
+			rte_mbuf_prefetch_part1((struct rte_mbuf *)(pref_addr));
+			rte_mbuf_prefetch_part2((struct rte_mbuf *)(pref_addr));
+		}
+
+		addr = cookie_addr_high |
+		       pp2_ppio_inq_desc_get_cookie(&descs[i]);
+		mbuf = (struct rte_mbuf *)addr;
+		rte_pktmbuf_reset(mbuf);
+
+		/* drop packet in case of mac, overrun or resource error */
+		status = pp2_ppio_inq_desc_get_l2_pkt_error(&descs[i]);
+		if (unlikely(status != PP2_DESC_ERR_OK)) {
+			struct pp2_buff_inf binf = {
+				.addr = rte_mbuf_data_dma_addr_default(mbuf),
+				.cookie = (pp2_cookie_t)(uint64_t)mbuf,
+			};
+
+			pp2_bpool_put_buff(hifs[core_id], bpool, &binf);
+			mrvl_port_bpool_size
+				[bpool->pp2_id][bpool->id][core_id]++;
+			q->drop_mac++;
+			continue;
+		}
+
+		mbuf->data_off += MRVL_PKT_EFFEC_OFFS;
+		mbuf->pkt_len = pp2_ppio_inq_desc_get_pkt_len(&descs[i]);
+		mbuf->data_len = mbuf->pkt_len;
+		mbuf->port = q->port_id;
+		mbuf->packet_type =
+			mrvl_desc_to_packet_type_and_offset(&descs[i],
+							    &l3_offset,
+							    &l4_offset);
+		mbuf->l2_len = l3_offset;
+		mbuf->l3_len = l4_offset - l3_offset;
+
+		if (likely(q->cksum_enabled))
+			mbuf->ol_flags = mrvl_desc_to_ol_flags(&descs[i]);
+
+		rx_pkts[rx_done++] = mbuf;
+		q->bytes_recv += mbuf->pkt_len;
+	}
+
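+	/*
+	 * Only one lcore adjusts the pool at a time (trylock): refill by
+	 * one burst when the pool runs below the minimum (or is starved
+	 * while nothing was received), trim back to the initial size when
+	 * it grows past the maximum.
+	 */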
+	if (rte_spinlock_trylock(&q->priv->lock) == 1) {
+		num = mrvl_get_bpool_size(bpool->pp2_id, bpool->id);
+
+		if (unlikely(num <= q->priv->bpool_min_size ||
+			     (!rx_done && num < q->priv->bpool_init_size))) {
+			ret = mrvl_fill_bpool(q, MRVL_BURST_SIZE);
+			if (ret)
+				RTE_LOG(ERR, PMD, "Failed to fill bpool\n");
+		} else if (unlikely(num > q->priv->bpool_max_size)) {
+			int i;
+			int pkt_to_remove = num - q->priv->bpool_init_size;
+			struct rte_mbuf *mbuf;
+			struct pp2_buff_inf buff;
+
+			RTE_LOG(DEBUG, PMD,
+				"\nport-%d:%d: bpool %d oversize - remove %d buffers (pool size: %d -> %d)\n",
+				bpool->pp2_id, q->priv->ppio->port_id,
+				bpool->id, pkt_to_remove, num,
+				q->priv->bpool_init_size);
+
+			for (i = 0; i < pkt_to_remove; i++) {
+				pp2_bpool_get_buff(hifs[core_id], bpool, &buff);
+				mbuf = (struct rte_mbuf *)
+					(cookie_addr_high | buff.cookie);
+				rte_pktmbuf_free(mbuf);
+			}
+			mrvl_port_bpool_size
+				[bpool->pp2_id][bpool->id][core_id] -=
+								pkt_to_remove;
+		}
+		rte_spinlock_unlock(&q->priv->lock);
+	}
+
+	return rx_done;
+}
+
+/**
+ * Prepare offload information.
+ *
+ * @param ol_flags
+ *   Offload flags.
+ * @param packet_type
+ *   Packet type bitfield.
+ * @param l3_type
+ *   Pointer to the pp2_outq_l3_type value.
+ * @param l4_type
+ *   Pointer to the pp2_outq_l4_type value.
+ * @param gen_l3_cksum
+ *   Will be set to 1 in case the l3 checksum is to be computed.
+ * @param gen_l4_cksum
+ *   Will be set to 1 in case the l4 checksum is to be computed.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static inline int
+mrvl_prepare_proto_info(uint64_t ol_flags, uint32_t packet_type,
+			enum pp2_outq_l3_type *l3_type,
+			enum pp2_outq_l4_type *l4_type,
+			int *gen_l3_cksum,
+			int *gen_l4_cksum)
+{
+	/*
+	 * Based on ol_flags, prepare the information needed by
+	 * pp2_ppio_outq_desc_set_proto_info(), which sets up the
+	 * descriptor for offloading.
+	 */
+	if (ol_flags & PKT_TX_IPV4) {
+		*l3_type = PP2_OUTQ_L3_TYPE_IPV4;
+		*gen_l3_cksum = ol_flags & PKT_TX_IP_CKSUM ? 1 : 0;
+	} else if (ol_flags & PKT_TX_IPV6) {
+		*l3_type = PP2_OUTQ_L3_TYPE_IPV6;
+		/* no checksum for ipv6 header */
+		*gen_l3_cksum = 0;
+	} else {
+		/* stop processing for any other type */
+		return -1;
+	}
+
+	ol_flags &= PKT_TX_L4_MASK;
+	if ((packet_type & RTE_PTYPE_L4_TCP) &&
+	    ol_flags == PKT_TX_TCP_CKSUM) {
+		*l4_type = PP2_OUTQ_L4_TYPE_TCP;
+		*gen_l4_cksum = 1;
+	} else if ((packet_type & RTE_PTYPE_L4_UDP) &&
+		   ol_flags == PKT_TX_UDP_CKSUM) {
+		*l4_type = PP2_OUTQ_L4_TYPE_UDP;
+		*gen_l4_cksum = 1;
+	} else {
+		*l4_type = PP2_OUTQ_L4_TYPE_OTHER;
+		/* no checksum for other type */
+		*gen_l4_cksum = 0;
+	}
+
+	return 0;
+}
+
+/**
+ * Release already sent buffers to bpool (buffer-pool).
+ *
+ * @param ppio
+ *   Pointer to the port structure.
+ * @param hif
+ *   Pointer to the MUSDK hardware interface.
+ * @param sq
+ *   Pointer to the shadow queue.
+ * @param qid
+ *   Queue id number.
+ * @param force
+ *   Force releasing packets.
+ */
+static inline void
+mrvl_free_sent_buffers(struct pp2_ppio *ppio, struct pp2_hif *hif,
+		       struct mrvl_shadow_txq *sq, int qid, int force)
+{
+	struct buff_release_entry *entry;
+	uint16_t nb_done = 0, num = 0, skip_bufs = 0;
+	int i, core_id = rte_lcore_id();
+
+	pp2_ppio_get_num_outq_done(ppio, hif, qid, &nb_done);
+
+	sq->num_to_release += nb_done;
+
+	if (likely(!force &&
+		   sq->num_to_release < MRVL_PP2_BUF_RELEASE_BURST_SIZE))
+		return;
+
+	nb_done = sq->num_to_release;
+	sq->num_to_release = 0;
+
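+	/*
+	 * Release entries in bulk runs; an entry without a bpool (an mbuf
+	 * generated by the application or still referenced elsewhere) is
+	 * freed back to its mempool individually and ends the current run.
+	 */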
+	for (i = 0; i < nb_done; i++) {
+		entry = &sq->ent[sq->tail + num];
+		if (unlikely(!entry->buff.addr)) {
+			RTE_LOG(ERR, PMD,
+				"Shadow memory @%d: cookie(%lx), pa(%lx)!\n",
+				sq->tail, (u64)entry->buff.cookie,
+				(u64)entry->buff.addr);
+			skip_bufs = 1;
+			goto skip;
+		}
+
+		if (unlikely(!entry->bpool)) {
+			struct rte_mbuf *mbuf;
+
+			mbuf = (struct rte_mbuf *)
+			       (cookie_addr_high | entry->buff.cookie);
+			rte_pktmbuf_free(mbuf);
+			skip_bufs = 1;
+			goto skip;
+		}
+
+		mrvl_port_bpool_size
+			[entry->bpool->pp2_id][entry->bpool->id][core_id]++;
+		num++;
+		if (unlikely(sq->tail + num == MRVL_PP2_TX_SHADOWQ_SIZE))
+			goto skip;
+		continue;
+skip:
+		if (likely(num))
+			pp2_bpool_put_buffs(hif, &sq->ent[sq->tail], &num);
+		num += skip_bufs;
+		sq->tail = (sq->tail + num) & MRVL_PP2_TX_SHADOWQ_MASK;
+		sq->size -= num;
+		num = 0;
+	}
+
+	if (likely(num)) {
+		pp2_bpool_put_buffs(hif, &sq->ent[sq->tail], &num);
+		sq->tail = (sq->tail + num) & MRVL_PP2_TX_SHADOWQ_MASK;
+		sq->size -= num;
+	}
+}
+
+/**
+ * DPDK callback for transmit.
+ *
+ * @param txq
+ *   Generic pointer to the transmit queue.
+ * @param tx_pkts
+ *   Packets to transmit.
+ * @param nb_pkts
+ *   Number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully transmitted.
+ */
+static uint16_t
+mrvl_tx_pkt_burst(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct mrvl_txq *q = txq;
+	struct mrvl_shadow_txq *sq = &shadow_txqs[q->port_id][rte_lcore_id()];
+	struct pp2_hif *hif = hifs[rte_lcore_id()];
+	struct pp2_ppio_desc descs[nb_pkts];
+	int i, ret, bytes_sent = 0;
+	uint16_t num, sq_free_size;
+	uint64_t addr;
+
+	if (unlikely(!q->priv->ppio))
+		return 0;
+
+	if (sq->size)
+		mrvl_free_sent_buffers(q->priv->ppio, hif, sq, q->queue_id, 0);
+
+	sq_free_size = MRVL_PP2_TX_SHADOWQ_SIZE - sq->size - 1;
+	if (unlikely(nb_pkts > sq_free_size)) {
+		RTE_LOG(DEBUG, PMD,
+			"No room in shadow queue for %d packets! %d packets will be sent.\n",
+			nb_pkts, sq_free_size);
+		nb_pkts = sq_free_size;
+	}
+
+	for (i = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *mbuf = tx_pkts[i];
+		int gen_l3_cksum, gen_l4_cksum;
+		enum pp2_outq_l3_type l3_type;
+		enum pp2_outq_l4_type l4_type;
+
+		if (likely(nb_pkts - i > MRVL_MUSDK_PREFETCH_SHIFT)) {
+			struct rte_mbuf *pref_pkt_hdr;
+
+			pref_pkt_hdr = tx_pkts[i + MRVL_MUSDK_PREFETCH_SHIFT];
+			rte_mbuf_prefetch_part1(pref_pkt_hdr);
+			rte_mbuf_prefetch_part2(pref_pkt_hdr);
+		}
+
+		sq->ent[sq->head].buff.cookie = (pp2_cookie_t)(uint64_t)mbuf;
+		sq->ent[sq->head].buff.addr =
+			rte_mbuf_data_dma_addr_default(mbuf);
+		sq->ent[sq->head].bpool =
+			(unlikely(mbuf->port == 0xff || mbuf->refcnt > 1)) ?
+			 NULL : mrvl_port_to_bpool_lookup[mbuf->port];
+		sq->head = (sq->head + 1) & MRVL_PP2_TX_SHADOWQ_MASK;
+		sq->size++;
+
+		pp2_ppio_outq_desc_reset(&descs[i]);
+		pp2_ppio_outq_desc_set_phys_addr(&descs[i],
+						 rte_pktmbuf_mtophys(mbuf));
+		pp2_ppio_outq_desc_set_pkt_offset(&descs[i], 0);
+		pp2_ppio_outq_desc_set_pkt_len(&descs[i],
+					       rte_pktmbuf_pkt_len(mbuf));
+
+		bytes_sent += rte_pktmbuf_pkt_len(mbuf);
+		/*
+		 * in case unsupported ol_flags were passed
+		 * do not update descriptor offload information
+		 */
+		ret = mrvl_prepare_proto_info(mbuf->ol_flags, mbuf->packet_type,
+					      &l3_type, &l4_type, &gen_l3_cksum,
+					      &gen_l4_cksum);
+		if (unlikely(ret))
+			continue;
+
+		pp2_ppio_outq_desc_set_proto_info(&descs[i], l3_type, l4_type,
+						  mbuf->l2_len,
+						  mbuf->l2_len + mbuf->l3_len,
+						  gen_l3_cksum, gen_l4_cksum);
+	}
+
+	num = nb_pkts;
+	pp2_ppio_send(q->priv->ppio, hif, q->queue_id, descs, &nb_pkts);
+	/* roll back shadow queue state for packets that were not sent */
+	if (unlikely(num > nb_pkts)) {
+		for (i = nb_pkts; i < num; i++) {
+			sq->head = (MRVL_PP2_TX_SHADOWQ_SIZE + sq->head - 1) &
+				MRVL_PP2_TX_SHADOWQ_MASK;
+			addr = cookie_addr_high | sq->ent[sq->head].buff.cookie;
+			bytes_sent -=
+				rte_pktmbuf_pkt_len((struct rte_mbuf *)addr);
+		}
+		sq->size -= num - nb_pkts;
+	}
+
+	q->bytes_sent += bytes_sent;
+
+	return nb_pkts;
+}
+
+/**
+ * Initialize packet processor.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_init_pp2(void)
+{
+	struct pp2_init_params init_params;
+
+	memset(&init_params, 0, sizeof(init_params));
+	init_params.hif_reserved_map = MRVL_MUSDK_HIFS_RESERVED;
+	init_params.bm_pool_reserved_map = MRVL_MUSDK_BPOOLS_RESERVED;
+	init_params.rss_tbl_reserved_map = MRVL_MUSDK_RSS_RESERVED;
+
+	return pp2_init(&init_params);
+}
+
+/**
+ * Deinitialize packet processor.
+ */
+static void
+mrvl_deinit_pp2(void)
+{
+	pp2_deinit();
+}
+
+/**
+ * Create private device structure.
+ *
+ * @param dev_name
+ *   Pointer to the port name passed in the initialization parameters.
+ *
+ * @return
+ *   Pointer to the newly allocated private device structure.
+ */
+static struct mrvl_priv *
+mrvl_priv_create(const char *dev_name)
+{
+	struct pp2_bpool_params bpool_params;
+	char match[MRVL_MATCH_LEN];
+	struct mrvl_priv *priv;
+	int ret, bpool_bit;
+
+	priv = rte_zmalloc_socket(dev_name, sizeof(*priv), 0, rte_socket_id());
+	if (!priv)
+		return NULL;
+
+	ret = pp2_netdev_get_ppio_info((char *)(uintptr_t)dev_name,
+				       &priv->pp_id, &priv->ppio_id);
+	if (ret)
+		goto out_free_priv;
+
+	bpool_bit = mrvl_reserve_bit(&used_bpools[priv->pp_id],
+				     PP2_BPOOL_NUM_POOLS);
+	if (bpool_bit < 0)
+		goto out_free_priv;
+	priv->bpool_bit = bpool_bit;
+
+	snprintf(match, sizeof(match), "pool-%d:%d", priv->pp_id,
+		 priv->bpool_bit);
+	memset(&bpool_params, 0, sizeof(bpool_params));
+	bpool_params.match = match;
+	bpool_params.buff_len = MRVL_PKT_SIZE_MAX + MRVL_PKT_EFFEC_OFFS;
+	ret = pp2_bpool_init(&bpool_params, &priv->bpool);
+	if (ret)
+		goto out_clear_bpool_bit;
+
+	priv->ppio_params.type = PP2_PPIO_T_NIC;
+	rte_spinlock_init(&priv->lock);
+
+	return priv;
+out_clear_bpool_bit:
+	used_bpools[priv->pp_id] &= ~(1 << priv->bpool_bit);
+out_free_priv:
+	rte_free(priv);
+	return NULL;
+}
+
+/**
+ * Create a device representing an Ethernet port.
+ *
+ * @param vdev
+ *   Pointer to the virtual device.
+ * @param name
+ *   Pointer to the port's name.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_eth_dev_create(struct rte_vdev_device *vdev, const char *name)
+{
+	int ret, fd = socket(AF_INET, SOCK_DGRAM, 0);
+	struct rte_eth_dev *eth_dev;
+	struct mrvl_priv *priv;
+	struct ifreq req;
+
+	eth_dev = rte_eth_dev_allocate(name);
+	if (!eth_dev)
+		return -ENOMEM;
+
+	priv = mrvl_priv_create(name);
+	if (!priv) {
+		ret = -ENOMEM;
+		goto out_free_dev;
+	}
+
+	eth_dev->data->mac_addrs =
+		rte_zmalloc("mac_addrs",
+			    ETHER_ADDR_LEN * MRVL_MAC_ADDRS_MAX, 0);
+	if (!eth_dev->data->mac_addrs) {
+		RTE_LOG(ERR, PMD, "Failed to allocate space for eth addrs\n");
+		ret = -ENOMEM;
+		goto out_free_priv;
+	}
+
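+	/* Fetch the MAC address of the underlying netdev via SIOCGIFHWADDR. */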
+	memset(&req, 0, sizeof(req));
+	snprintf(req.ifr_name, sizeof(req.ifr_name), "%s", name);
+	ret = ioctl(fd, SIOCGIFHWADDR, &req);
+	close(fd);
+	if (ret)
+		goto out_free_mac;
+
+	memcpy(eth_dev->data->mac_addrs[0].addr_bytes,
+	       req.ifr_addr.sa_data, ETHER_ADDR_LEN);
+
+	eth_dev->rx_pkt_burst = mrvl_rx_pkt_burst;
+	eth_dev->tx_pkt_burst = mrvl_tx_pkt_burst;
+	eth_dev->data->dev_private = priv;
+	eth_dev->device = &vdev->device;
+	eth_dev->dev_ops = &mrvl_ops;
+
+	return 0;
+out_free_mac:
+	rte_free(eth_dev->data->mac_addrs);
+out_free_priv:
+	rte_free(priv);
+out_free_dev:
+	rte_eth_dev_release_port(eth_dev);
+
+	return ret;
+}
+
+/**
+ * Cleanup previously created device representing Ethernet port.
+ *
+ * @param name
+ *   Pointer to the port name.
+ */
+static void
+mrvl_eth_dev_destroy(const char *name)
+{
+	struct rte_eth_dev *eth_dev;
+	struct mrvl_priv *priv;
+
+	eth_dev = rte_eth_dev_allocated(name);
+	if (!eth_dev)
+		return;
+
+	priv = eth_dev->data->dev_private;
+	pp2_bpool_deinit(priv->bpool);
+	rte_free(priv);
+	rte_free(eth_dev->data->mac_addrs);
+	rte_eth_dev_release_port(eth_dev);
+}
+
+/**
+ * Callback used by rte_kvargs_process() during argument parsing.
+ *
+ * @param key
+ *   Pointer to the parsed key (unused).
+ * @param value
+ *   Pointer to the parsed value.
+ * @param extra_args
+ *   Pointer to the extra arguments which contains address of the
+ *   table of pointers to parsed interface names.
+ *
+ * @return
+ *   Always 0.
+ */
+static int
+mrvl_get_ifnames(const char *key __rte_unused, const char *value,
+		 void *extra_args)
+{
+	const char **ifnames = extra_args;
+
+	ifnames[mrvl_ports_nb++] = value;
+
+	return 0;
+}
+
+/**
+ * Initialize per-lcore MUSDK hardware interfaces (hifs).
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+mrvl_init_hifs(void)
+{
+	struct pp2_hif_params params;
+	char match[MRVL_MATCH_LEN];
+	int i, ret;
+
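+	/*
+	 * Each lcore gets its own hif (aggregated tx queue): reserve a free
+	 * slot in the bitmap and match the hif by name.
+	 */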
+	RTE_LCORE_FOREACH(i) {
+		ret = mrvl_reserve_bit(&used_hifs, MRVL_MUSDK_HIFS_MAX);
+		if (ret < 0)
+			return ret;
+
+		snprintf(match, sizeof(match), "hif-%d", ret);
+		memset(&params, 0, sizeof(params));
+		params.match = match;
+		params.out_size = MRVL_PP2_AGGR_TXQD_MAX;
+		ret = pp2_hif_init(&params, &hifs[i]);
+		if (ret) {
+			RTE_LOG(ERR, PMD, "Failed to initialize hif %d\n", i);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * Deinitialize per-lcore MUSDK hardware interfaces (hifs).
+ */
+static void
+mrvl_deinit_hifs(void)
+{
+	int i;
+
+	RTE_LCORE_FOREACH(i) {
+		if (hifs[i])
+			pp2_hif_deinit(hifs[i]);
+	}
+}
+
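+/**
+ * Track the lowest and highest lcore ids seen so far.
+ *
+ * @param core_id
+ *   Lcore id to account for.
+ */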
+static void
+mrvl_set_first_last_cores(int core_id)
+{
+	if (core_id < mrvl_lcore_first)
+		mrvl_lcore_first = core_id;
+
+	if (core_id > mrvl_lcore_last)
+		mrvl_lcore_last = core_id;
+}
+
+/**
+ * DPDK callback to register the virtual device.
+ *
+ * @param vdev
+ *   Pointer to the virtual device.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+rte_pmd_mrvl_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_kvargs *kvlist;
+	const char *ifnames[PP2_NUM_ETH_PPIO * PP2_NUM_PKT_PROC];
+	int ret = -EINVAL;
+	uint32_t i, ifnum, cfgnum, core_id;
+	const char *params;
+
+	params = rte_vdev_device_args(vdev);
+	if (!params)
+		return -EINVAL;
+
+	kvlist = rte_kvargs_parse(params, valid_args);
+	if (!kvlist)
+		return -EINVAL;
+
+	ifnum = rte_kvargs_count(kvlist, MRVL_IFACE_NAME_ARG);
+	if (ifnum > RTE_DIM(ifnames))
+		goto out_free_kvlist;
+
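+	/* Collect the interface names passed as device arguments. */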
+	rte_kvargs_process(kvlist, MRVL_IFACE_NAME_ARG,
+			   mrvl_get_ifnames, &ifnames);
+
+	cfgnum = rte_kvargs_count(kvlist, MRVL_CFG_ARG);
+	if (cfgnum > 1) {
+		RTE_LOG(ERR, PMD, "Cannot handle more than one config file!\n");
+		goto out_free_kvlist;
+	} else if (cfgnum == 1) {
+		rte_kvargs_process(kvlist, MRVL_CFG_ARG,
+				   mrvl_get_qoscfg, &mrvl_qos_cfg);
+	}
+
+	/*
+	 * ret == -EEXIST is correct, it means DMA
+	 * has been already initialized (by another PMD).
+	 */
+	ret = mv_sys_dma_mem_init(RTE_MRVL_MUSDK_DMA_MEMSIZE);
+	if (ret < 0 && ret != -EEXIST)
+		goto out_free_kvlist;
+
+	ret = mrvl_init_pp2();
+	if (ret) {
+		RTE_LOG(ERR, PMD, "Failed to init PP!\n");
+		goto out_deinit_dma;
+	}
+
+	ret = mrvl_init_hifs();
+	if (ret)
+		goto out_deinit_hifs;
+
+	for (i = 0; i < ifnum; i++) {
+		RTE_LOG(INFO, PMD, "Creating %s\n", ifnames[i]);
+		ret = mrvl_eth_dev_create(vdev, ifnames[i]);
+		if (ret)
+			goto out_cleanup;
+	}
+
+	rte_kvargs_free(kvlist);
+
+	memset(mrvl_port_bpool_size, 0, sizeof(mrvl_port_bpool_size));
+
+	mrvl_lcore_first = RTE_MAX_LCORE;
+	mrvl_lcore_last = 0;
+
+	RTE_LCORE_FOREACH(core_id) {
+		mrvl_set_first_last_cores(core_id);
+	}
+
+	return 0;
+out_cleanup:
+	while (i--)
+		mrvl_eth_dev_destroy(ifnames[i]);
+out_deinit_hifs:
+	mrvl_deinit_hifs();
+	mrvl_deinit_pp2();
+out_deinit_dma:
+	mv_sys_dma_mem_destroy();
+out_free_kvlist:
+	rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+/**
+ * DPDK callback to remove virtual device.
+ *
+ * @param vdev
+ *   Pointer to the removed virtual device.
+ *
+ * @return
+ *   0 on success, negative error value otherwise.
+ */
+static int
+rte_pmd_mrvl_remove(struct rte_vdev_device *vdev)
+{
+	int i;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (!name)
+		return -EINVAL;
+
+	RTE_LOG(INFO, PMD, "Removing %s\n", name);
+
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		char ifname[RTE_ETH_NAME_MAX_LEN];
+
+		rte_eth_dev_get_name_by_port(i, ifname);
+		mrvl_eth_dev_destroy(ifname);
+	}
+
+	mrvl_deinit_hifs();
+	mrvl_deinit_pp2();
+	mv_sys_dma_mem_destroy();
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_mrvl_drv = {
+	.probe = rte_pmd_mrvl_probe,
+	.remove = rte_pmd_mrvl_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_mrvl, pmd_mrvl_drv);
+RTE_PMD_REGISTER_ALIAS(net_mrvl, eth_mrvl);
diff --git a/drivers/net/mrvl/mrvl_ethdev.h b/drivers/net/mrvl/mrvl_ethdev.h
new file mode 100644
index 0000000..72af4c7
--- /dev/null
+++ b/drivers/net/mrvl/mrvl_ethdev.h
@@ -0,0 +1,114 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Semihalf. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Semihalf nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MRVL_ETHDEV_H_
+#define _MRVL_ETHDEV_H_
+
+#include <rte_spinlock.h>
+#include <drivers/mv_pp2_cls.h>
+#include <drivers/mv_pp2_ppio.h>
+
+/** Maximum number of rx queues per port */
+#define MRVL_PP2_RXQ_MAX 32
+
+/** Maximum number of tx queues per port */
+#define MRVL_PP2_TXQ_MAX 8
+
+/** Minimum number of descriptors in tx queue */
+#define MRVL_PP2_TXD_MIN 16
+
+/** Maximum number of descriptors in tx queue */
+#define MRVL_PP2_TXD_MAX 2048
+
+/** Tx queue descriptors alignment */
+#define MRVL_PP2_TXD_ALIGN 16
+
+/** Minimum number of descriptors in rx queue */
+#define MRVL_PP2_RXD_MIN 16
+
+/** Maximum number of descriptors in rx queue */
+#define MRVL_PP2_RXD_MAX 2048
+
+/** Rx queue descriptors alignment */
+#define MRVL_PP2_RXD_ALIGN 16
+
+/** Maximum number of descriptors in tx aggregated queue */
+#define MRVL_PP2_AGGR_TXQD_MAX 2048
+
+/** Maximum number of Traffic Classes. */
+#define MRVL_PP2_TC_MAX 8
+
+/** Packet offset inside RX buffer. */
+#define MRVL_PKT_OFFS 64
+
+/** Maximum number of descriptors in shadow queue. Must be power of 2 */
+#define MRVL_PP2_TX_SHADOWQ_SIZE MRVL_PP2_TXD_MAX
+
+/** Shadow queue size mask (since shadow queue size is power of 2) */
+#define MRVL_PP2_TX_SHADOWQ_MASK (MRVL_PP2_TX_SHADOWQ_SIZE - 1)
+
+/** Minimum number of sent buffers to release from shadow queue to BM */
+#define MRVL_PP2_BUF_RELEASE_BURST_SIZE	64
+
+struct mrvl_priv {
+	/* Hot fields, used in fast path. */
+	struct pp2_bpool *bpool;  /**< BPool pointer */
+	struct pp2_ppio	*ppio;    /**< Port handler pointer */
+	rte_spinlock_t lock;	  /**< Spinlock for checking bpool status */
+	uint16_t bpool_max_size;  /**< BPool maximum size */
+	uint16_t bpool_min_size;  /**< BPool minimum size  */
+	uint16_t bpool_init_size; /**< Configured BPool size  */
+
+	/** Mapping for DPDK rx queue->(TC, MRVL relative inq) */
+	struct {
+		uint8_t tc;  /**< Traffic Class */
+		uint8_t inq; /**< Relative in-queue number */
+	} rxq_map[MRVL_PP2_RXQ_MAX] __rte_cache_aligned;
+
+	/* Configuration data, used sporadically. */
+	uint8_t pp_id;
+	uint8_t ppio_id;
+	uint8_t bpool_bit;
+	uint8_t rss_hf_tcp;
+	uint8_t uc_mc_flushed;
+	uint8_t vlan_flushed;
+
+	struct pp2_ppio_params ppio_params;
+	struct pp2_cls_qos_tbl_params qos_tbl_params;
+	struct pp2_cls_tbl *qos_tbl;
+	uint16_t nb_rx_queues;
+};
+
+/** Number of ports configured. */
+extern int mrvl_ports_nb;
+
+#endif /* _MRVL_ETHDEV_H_ */
diff --git a/drivers/net/mrvl/mrvl_qos.c b/drivers/net/mrvl/mrvl_qos.c
new file mode 100644
index 0000000..925f881
--- /dev/null
+++ b/drivers/net/mrvl/mrvl_qos.c
@@ -0,0 +1,628 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Semihalf. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Semihalf nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <errno.h>
+#include <limits.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_cfgfile.h>
+#include <rte_log.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_string_fns.h>
+
+/* Unfortunately, container_of is defined by both DPDK and MUSDK,
+ * so only one version can be declared.
+ *
+ * Note that it is not used in this PMD anyway.
+ */
+#ifdef container_of
+#undef container_of
+#endif
+
+#include "mrvl_qos.h"
+
+/* Parsing tokens. Defined conveniently, so that any correction is easy. */
+#define MRVL_TOK_DEFAULT "default"
+#define MRVL_TOK_DEFAULT_TC "default_tc"
+#define MRVL_TOK_DSCP "dscp"
+#define MRVL_TOK_MAPPING_PRIORITY "mapping_priority"
+#define MRVL_TOK_IP "ip"
+#define MRVL_TOK_IP_VLAN "ip/vlan"
+#define MRVL_TOK_PCP "pcp"
+#define MRVL_TOK_PORT "port"
+#define MRVL_TOK_RXQ "rxq"
+#define MRVL_TOK_SP "SP"
+#define MRVL_TOK_TC "tc"
+#define MRVL_TOK_TXQ "txq"
+#define MRVL_TOK_VLAN "vlan"
+#define MRVL_TOK_VLAN_IP "vlan/ip"
+#define MRVL_TOK_WEIGHT "weight"
+
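+/*
+ * For illustration, a QoS configuration file built from the tokens above
+ * could look as follows (all values are hypothetical):
+ *
+ * [port 0 default]
+ * default_tc = 0
+ * mapping_priority = vlan/ip
+ *
+ * [port 0 tc 0]
+ * rxq = 0 1-2
+ * pcp = 0 1
+ * dscp = 0-15
+ *
+ * [port 0 tc 1]
+ * rxq = 3
+ * pcp = 5-7
+ *
+ * [port 0 txq 0]
+ * weight = 100
+ */
+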
+/** Number of tokens in range a-b = 2. */
+#define MAX_RNG_TOKENS 2
+
+/** Maximum possible value of PCP. */
+#define MAX_PCP 7
+
+/** Maximum possible value of DSCP. */
+#define MAX_DSCP 63
+
+/** Global QoS configuration. */
+struct mrvl_qos_cfg *mrvl_qos_cfg;
+
+/**
+ * Convert string to uint32_t with extra checks for result correctness.
+ *
+ * @param string String to convert.
+ * @param val Conversion result.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+static int
+get_val_securely(const char *string, uint32_t *val)
+{
+	char *endptr;
+	size_t len = strlen(string);
+
+	if (len == 0)
+		return -1;
+
+	errno = 0;
+	*val = strtoul(string, &endptr, 0);
+	if (errno != 0 || RTE_PTR_DIFF(endptr, string) != len)
+		return -2;
+
+	return 0;
+}
+
+/**
+ * Read out-queue configuration from file.
+ *
+ * @param file Path to the configuration file.
+ * @param port Port number.
+ * @param outq Out queue number.
+ * @param cfg Pointer to the Marvell QoS configuration structure.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+static int
+get_outq_cfg(struct rte_cfgfile *file, int port, int outq,
+		struct mrvl_qos_cfg *cfg)
+{
+	char sec_name[32];
+	const char *entry;
+	uint32_t val;
+
+	snprintf(sec_name, sizeof(sec_name), "%s %d %s %d",
+		MRVL_TOK_PORT, port, MRVL_TOK_TXQ, outq);
+
+	/* Skip non-existing */
+	if (rte_cfgfile_num_sections(file, sec_name, strlen(sec_name)) <= 0)
+		return 0;
+
+	entry = rte_cfgfile_get_entry(file, sec_name,
+			MRVL_TOK_WEIGHT);
+	if (entry) {
+		if (get_val_securely(entry, &val) < 0)
+			return -1;
+		cfg->port[port].outq[outq].weight = (uint8_t)val;
+	}
+
+	return 0;
+}
+
+/**
+ * Gets multiple-entry values and places them in table.
+ *
+ * Entry can be anything, e.g. "1 2-3 5 6 7-9". This needs to be converted to
+ * table entries, respectively: {1, 2, 3, 5, 6, 7, 8, 9}.
+ * As all elements of the result table are currently 1 byte long, the
+ * implementation is kept simple. The API, however, stays generic: the
+ * element size is validated, so extending it to other sizes is easy.
+ *
+ * This is purely a utility function: it does not print any errors, it
+ * only returns distinct error codes.
+ *
+ * @param entry[in] Values string to parse.
+ * @param tab[out] Results table.
+ * @param elem_sz[in] Element size (in bytes).
+ * @param max_elems[in] Number of results table elements available.
+ * @param max_val[in] Maximum value allowed.
+ * @returns Number of correctly parsed elements in case of success.
+ * @retval -1 Wrong element size.
+ * @retval -2 More tokens than result table allows.
+ * @retval -3 Wrong range syntax.
+ * @retval -4 Invalid or out-of-range value.
+ * @retval -5 Maximum value exceeded.
+ */
+static int
+get_entry_values(const char *entry, uint8_t *tab,
+	size_t elem_sz, uint8_t max_elems, uint8_t max_val)
+{
+	/* There should not be more tokens than max elements.
+	 * Add 1 for error trap.
+	 */
+	char *tokens[max_elems + 1];
+
+	/* Begin, End + error trap = 3. */
+	char *rng_tokens[MAX_RNG_TOKENS + 1];
+	long beg, end;
+	uint32_t token_val;
+	int nb_tokens, nb_rng_tokens;
+	int i;
+	int values = 0;
+	char val;
+	char entry_cpy[CFG_VALUE_LEN];
+
+	if (elem_sz != 1)
+		return -1;
+
+	/* Copy the entry to safely use rte_strsplit(). */
+	snprintf(entry_cpy, RTE_DIM(entry_cpy), "%s", entry);
+
+	/*
+	 * If there are more tokens than array size, rte_strsplit will
+	 * not return error, just array size.
+	 */
+	nb_tokens = rte_strsplit(entry_cpy, strlen(entry_cpy),
+		tokens, max_elems + 1, ' ');
+
+	/* Quick check, will be refined later. */
+	if (nb_tokens > max_elems)
+		return -2;
+
+	for (i = 0; i < nb_tokens; ++i) {
+		if (strchr(tokens[i], '-') != NULL) {
+			/*
+			 * Split to begin and end tokens.
+			 * We want to catch error cases too, thus we leave
+			 * option for number of tokens to be more than 2.
+			 */
+			nb_rng_tokens = rte_strsplit(tokens[i],
+					strlen(tokens[i]), rng_tokens,
+					RTE_DIM(rng_tokens), '-');
+			if (nb_rng_tokens != 2)
+				return -3;
+
+			/* Range and sanity checks. */
+			if (get_val_securely(rng_tokens[0], &token_val) < 0)
+				return -4;
+			beg = (char)token_val;
+			if (get_val_securely(rng_tokens[1], &token_val) < 0)
+				return -4;
+			end = (char)token_val;
+			if (beg < 0 || beg > UCHAR_MAX ||
+				end < 0 || end > UCHAR_MAX || end < beg)
+				return -4;
+
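+			/* Expand the range into consecutive values. */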
+			for (val = beg; val <= end; ++val) {
+				if (val > max_val)
+					return -5;
+				if (values == max_elems)
+					return -2;
+
+				*tab = val;
+				tab = RTE_PTR_ADD(tab, elem_sz);
+				++values;
+			}
+		} else {
+			/* Single values. */
+			if (get_val_securely(tokens[i], &token_val) < 0)
+				return -4;
+			val = (char)token_val;
+			if (val > max_val)
+				return -5;
+			if (values == max_elems)
+				return -2;
+
+			*tab = val;
+			tab = RTE_PTR_ADD(tab, elem_sz);
+			++values;
+		}
+	}
+
+	return values;
+}
+
+/**
+ * Parse a Traffic Class's mapping configuration.
+ *
+ * @param file Config file handle.
+ * @param port Which port to look for.
+ * @param tc Which Traffic Class to look for.
+ * @param cfg[out] Parsing results.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+static int
+parse_tc_cfg(struct rte_cfgfile *file, int port, int tc,
+		struct mrvl_qos_cfg *cfg)
+{
+	char sec_name[32];
+	const char *entry;
+	int n;
+
+	snprintf(sec_name, sizeof(sec_name), "%s %d %s %d",
+		MRVL_TOK_PORT, port, MRVL_TOK_TC, tc);
+
+	/* Skip non-existing */
+	if (rte_cfgfile_num_sections(file, sec_name, strlen(sec_name)) <= 0)
+		return 0;
+
+	entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_RXQ);
+	if (entry) {
+		n = get_entry_values(entry,
+			cfg->port[port].tc[tc].inq,
+			sizeof(cfg->port[port].tc[tc].inq[0]),
+			RTE_DIM(cfg->port[port].tc[tc].inq),
+			MRVL_PP2_RXQ_MAX);
+		if (n < 0) {
+			RTE_LOG(ERR, PMD, "Error %d while parsing: %s\n",
+				n, entry);
+			return n;
+		}
+		cfg->port[port].tc[tc].inqs = n;
+	}
+
+	entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_PCP);
+	if (entry) {
+		n = get_entry_values(entry,
+			cfg->port[port].tc[tc].pcp,
+			sizeof(cfg->port[port].tc[tc].pcp[0]),
+			RTE_DIM(cfg->port[port].tc[tc].pcp),
+			MAX_PCP);
+		if (n < 0) {
+			RTE_LOG(ERR, PMD, "Error %d while parsing: %s\n",
+				n, entry);
+			return n;
+		}
+		cfg->port[port].tc[tc].pcps = n;
+	}
+
+	entry = rte_cfgfile_get_entry(file, sec_name, MRVL_TOK_DSCP);
+	if (entry) {
+		n = get_entry_values(entry,
+			cfg->port[port].tc[tc].dscp,
+			sizeof(cfg->port[port].tc[tc].dscp[0]),
+			RTE_DIM(cfg->port[port].tc[tc].dscp),
+			MAX_DSCP);
+		if (n < 0) {
+			RTE_LOG(ERR, PMD, "Error %d while parsing: %s\n",
+				n, entry);
+			return n;
+		}
+		cfg->port[port].tc[tc].dscps = n;
+	}
+	return 0;
+}
+
+/**
+ * Parse QoS configuration - rte_kvargs_process handler.
+ *
+ * Opens configuration file and parses its content.
+ *
+ * @param key Unused.
+ * @param path Path to config file.
+ * @param extra_args Pointer to configuration structure.
+ * @returns 0 on success, negative value on parsing errors; exits on fatal ones.
+ */
+int
+mrvl_get_qoscfg(const char *key __rte_unused, const char *path,
+		void *extra_args)
+{
+	struct mrvl_qos_cfg **cfg = extra_args;
+	struct rte_cfgfile *file = rte_cfgfile_load(path, 0);
+	uint32_t val;
+	int n, i, ret;
+	const char *entry;
+	char sec_name[32];
+
+	if (file == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot load configuration %s\n", path);
+
+	/* Create configuration. This is never accessed on the fast path,
+	 * so we can ignore socket.
+	 */
+	*cfg = rte_zmalloc("mrvl_qos_cfg", sizeof(struct mrvl_qos_cfg), 0);
+	if (*cfg == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot allocate configuration %s\n",
+			path);
+
+	n = rte_cfgfile_num_sections(file, MRVL_TOK_PORT,
+		sizeof(MRVL_TOK_PORT) - 1);
+
+	if (n == 0) {
+		/* This is weird, but not bad. */
+		RTE_LOG(WARNING, PMD, "Empty configuration file?\n");
+		return 0;
+	}
+
+	/* Use the number of ports given as vdev parameters. */
+	for (n = 0; n < mrvl_ports_nb; ++n) {
+		snprintf(sec_name, sizeof(sec_name), "%s %d %s",
+			MRVL_TOK_PORT, n, MRVL_TOK_DEFAULT);
+
+		/* Skip ports non-existing in configuration. */
+		if (rte_cfgfile_num_sections(file, sec_name,
+				strlen(sec_name)) <= 0) {
+			(*cfg)->port[n].use_global_defaults = 1;
+			(*cfg)->port[n].mapping_priority =
+				PP2_CLS_QOS_TBL_VLAN_IP_PRI;
+			continue;
+		}
+
+		entry = rte_cfgfile_get_entry(file, sec_name,
+				MRVL_TOK_DEFAULT_TC);
+		if (entry) {
+			if (get_val_securely(entry, &val) < 0 ||
+				val > UCHAR_MAX)
+				return -1;
+			(*cfg)->port[n].default_tc = (uint8_t)val;
+		} else {
+			RTE_LOG(ERR, PMD,
+				"Default Traffic Class required in custom configuration!\n");
+			return -1;
+		}
+
+		entry = rte_cfgfile_get_entry(file, sec_name,
+				MRVL_TOK_MAPPING_PRIORITY);
+		if (entry) {
+			if (!strncmp(entry, MRVL_TOK_VLAN_IP,
+				sizeof(MRVL_TOK_VLAN_IP)))
+				(*cfg)->port[n].mapping_priority =
+					PP2_CLS_QOS_TBL_VLAN_IP_PRI;
+			else if (!strncmp(entry, MRVL_TOK_IP_VLAN,
+				sizeof(MRVL_TOK_IP_VLAN)))
+				(*cfg)->port[n].mapping_priority =
+					PP2_CLS_QOS_TBL_IP_VLAN_PRI;
+			else if (!strncmp(entry, MRVL_TOK_IP,
+				sizeof(MRVL_TOK_IP)))
+				(*cfg)->port[n].mapping_priority =
+					PP2_CLS_QOS_TBL_IP_PRI;
+			else if (!strncmp(entry, MRVL_TOK_VLAN,
+				sizeof(MRVL_TOK_VLAN)))
+				(*cfg)->port[n].mapping_priority =
+					PP2_CLS_QOS_TBL_VLAN_PRI;
+			else
+				rte_exit(EXIT_FAILURE,
+					"Error in parsing %s value (%s)!\n",
+					MRVL_TOK_MAPPING_PRIORITY, entry);
+		} else {
+			(*cfg)->port[n].mapping_priority =
+				PP2_CLS_QOS_TBL_VLAN_IP_PRI;
+		}
+
+		for (i = 0; i < MRVL_PP2_RXQ_MAX; ++i) {
+			ret = get_outq_cfg(file, n, i, *cfg);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Error %d parsing port %d outq %d!\n",
+					ret, n, i);
+		}
+
+		for (i = 0; i < MRVL_PP2_TC_MAX; ++i) {
+			ret = parse_tc_cfg(file, n, i, *cfg);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Error %d parsing port %d tc %d!\n",
+					ret, n, i);
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * Setup Traffic Class.
+ *
+ * Fill in TC parameters in single MUSDK TC config entry.
+ * @param param TC parameters entry.
+ * @param inqs Number of MUSDK in-queues in this TC.
+ * @param bpool Bpool for this TC.
+ * @returns 0 in case of success, negative error value otherwise.
+ */
+static int
+setup_tc(struct pp2_ppio_tc_params *param, uint8_t inqs,
+	struct pp2_bpool *bpool)
+{
+	struct pp2_ppio_inq_params *inq_params;
+
+	param->pkt_offset = MRVL_PKT_OFFS;
+	param->pools[0] = bpool;
+
+	inq_params = rte_zmalloc_socket("inq_params",
+		inqs * sizeof(*inq_params),
+		0, rte_socket_id());
+	if (!inq_params)
+		return -ENOMEM;
+
+	param->num_in_qs = inqs;
+
+	/* Release old config if necessary. */
+	if (param->inqs_params)
+		rte_free(param->inqs_params);
+
+	param->inqs_params = inq_params;
+
+	return 0;
+}
+
+/**
+ * Configure RX Queues in a given port.
+ *
+ * Sets up RX queues, their Traffic Classes and DPDK rxq->(TC,inq) mapping.
+ *
+ * @param priv Port's private data
+ * @param portid DPDK port ID
+ * @param max_queues Maximum number of queues to configure.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+int
+mrvl_configure_rxqs(struct mrvl_priv *priv, uint8_t portid,
+	uint16_t max_queues)
+{
+	size_t i, tc;
+	int ret;
+
+	if (mrvl_qos_cfg == NULL ||
+		mrvl_qos_cfg->port[portid].use_global_defaults) {
+		/* No port configuration, use default: 1 TC, no QoS. */
+		priv->ppio_params.inqs_params.num_tcs = 1;
+		ret = setup_tc(&priv->ppio_params.inqs_params.tcs_params[0],
+			max_queues, priv->bpool);
+		if (ret)
+			return ret;
+
+		/* Direct mapping of queues i.e. 0->0, 1->1 etc. */
+		for (i = 0; i < max_queues; ++i) {
+			priv->rxq_map[i].tc = 0;
+			priv->rxq_map[i].inq = i;
+		}
+		return 0;
+	}
+
+	/* We need only a subset of configuration. */
+	struct port_cfg *port_cfg = &mrvl_qos_cfg->port[portid];
+
+	priv->qos_tbl_params.type = port_cfg->mapping_priority;
+
+	/*
+	 * The mapping must be reversed: the configuration file uses tc->pcp
+	 * (more natural for the user), while MUSDK expects pcp->tc.
+	 * First, set all map elements to "default".
+	 */
+	for (i = 0; i < RTE_DIM(priv->qos_tbl_params.pcp_cos_map); ++i)
+		priv->qos_tbl_params.pcp_cos_map[i].tc = port_cfg->default_tc;
+
+	/* Then, fill in all known values. */
+	for (tc = 0; tc < RTE_DIM(port_cfg->tc); ++tc) {
+		if (port_cfg->tc[tc].pcps > RTE_DIM(port_cfg->tc[0].pcp)) {
+			/* Better safe than sorry. */
+			RTE_LOG(ERR, PMD,
+				"Too many PCPs configured in TC %zu!\n", tc);
+			return -1;
+		}
+		for (i = 0; i < port_cfg->tc[tc].pcps; ++i) {
+			priv->qos_tbl_params.pcp_cos_map[
+			  port_cfg->tc[tc].pcp[i]].tc = tc;
+		}
+	}
+
+	/*
+	 * The same logic goes with DSCP.
+	 * First, set all map elements to "default".
+	 */
+	for (i = 0; i < RTE_DIM(priv->qos_tbl_params.dscp_cos_map); ++i)
+		priv->qos_tbl_params.dscp_cos_map[i].tc =
+			port_cfg->default_tc;
+
+	/* Fill in all known values. */
+	for (tc = 0; tc < RTE_DIM(port_cfg->tc); ++tc) {
+		if (port_cfg->tc[tc].dscps > RTE_DIM(port_cfg->tc[0].dscp)) {
+			/* Better safe than sorry. */
+			RTE_LOG(ERR, PMD,
+				"Too many DSCPs configured in TC %zu!\n", tc);
+			return -1;
+		}
+		for (i = 0; i < port_cfg->tc[tc].dscps; ++i) {
+			priv->qos_tbl_params.dscp_cos_map[
+			  port_cfg->tc[tc].dscp[i]].tc = tc;
+		}
+	}
+
+	/*
+	 * Similar logic applies to the queue mapping: only the qid->tc
+	 * mapping needs to be stored, so the TC is known when a queue
+	 * is read.
+	 */
+	for (i = 0; i < RTE_DIM(priv->rxq_map); ++i)
+		priv->rxq_map[i].tc = MRVL_UNKNOWN_TC;
+
+	/* Set up DPDKq->(TC,inq) mapping. */
+	for (tc = 0; tc < RTE_DIM(port_cfg->tc); ++tc) {
+		if (port_cfg->tc[tc].inqs > RTE_DIM(port_cfg->tc[0].inq)) {
+			/* Overflow. */
+			RTE_LOG(ERR, PMD,
+				"Too many RX queues configured per TC %zu!\n",
+				tc);
+			return -1;
+		}
+		for (i = 0; i < port_cfg->tc[tc].inqs; ++i) {
+			uint8_t idx = port_cfg->tc[tc].inq[i];
+
+			priv->rxq_map[idx].tc = tc;
+			priv->rxq_map[idx].inq = i;
+		}
+	}
+
+	/*
+	 * Set up TC configuration. TCs need to be sequenced: 0, 1, 2
+	 * with no gaps. Empty TC means end of processing.
+	 */
+	for (i = 0; i < MRVL_PP2_TC_MAX; ++i) {
+		if (port_cfg->tc[i].inqs == 0)
+			break;
+		ret = setup_tc(&priv->ppio_params.inqs_params.tcs_params[i],
+				port_cfg->tc[i].inqs,
+				priv->bpool);
+		if (ret)
+			return ret;
+	}
+
+	priv->ppio_params.inqs_params.num_tcs = i;
+
+	return 0;
+}
+
+/**
+ * Start QoS mapping.
+ *
+ * Finalize QoS table configuration and initialize it in SDK. It can be done
+ * only after port is started, so we have a valid ppio reference.
+ *
+ * @param priv Port's private (configuration) data.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+int
+mrvl_start_qos_mapping(struct mrvl_priv *priv)
+{
+	size_t i;
+
+	if (priv->ppio == NULL) {
+		RTE_LOG(ERR, PMD, "ppio must not be NULL here!\n");
+		return -1;
+	}
+
+	for (i = 0; i < RTE_DIM(priv->qos_tbl_params.pcp_cos_map); ++i)
+		priv->qos_tbl_params.pcp_cos_map[i].ppio = priv->ppio;
+
+	for (i = 0; i < RTE_DIM(priv->qos_tbl_params.dscp_cos_map); ++i)
+		priv->qos_tbl_params.dscp_cos_map[i].ppio = priv->ppio;
+
+	/* Initialize Classifier QoS table. */
+
+	return pp2_cls_qos_tbl_init(&priv->qos_tbl_params, &priv->qos_tbl);
+}
diff --git a/drivers/net/mrvl/mrvl_qos.h b/drivers/net/mrvl/mrvl_qos.h
new file mode 100644
index 0000000..0fcc85c
--- /dev/null
+++ b/drivers/net/mrvl/mrvl_qos.h
@@ -0,0 +1,112 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Semihalf. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Semihalf nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MRVL_QOS_H_
+#define _MRVL_QOS_H_
+
+#include <rte_common.h>
+#include <rte_config.h>
+
+#include "mrvl_ethdev.h"
+
+/** Code Points per Traffic Class. Equals max(DSCP, PCP). */
+#define MRVL_CP_PER_TC (64)
+
+/** Value used as "unknown". */
+#define MRVL_UNKNOWN_TC (0xFF)
+
+/* QoS config. */
+struct mrvl_qos_cfg {
+	struct port_cfg {
+		struct {
+			uint8_t inq[MRVL_PP2_RXQ_MAX];
+			uint8_t dscp[MRVL_CP_PER_TC];
+			uint8_t pcp[MRVL_CP_PER_TC];
+			uint8_t inqs;
+			uint8_t dscps;
+			uint8_t pcps;
+		} tc[MRVL_PP2_TC_MAX];
+		struct {
+			uint8_t weight;
+		} outq[MRVL_PP2_RXQ_MAX];
+		enum pp2_cls_qos_tbl_type mapping_priority;
+		uint16_t inqs;
+		uint16_t outqs;
+		uint8_t default_tc;
+		uint8_t use_global_defaults;
+	} port[RTE_MAX_ETHPORTS];
+};
+
+/** Global QoS configuration. */
+extern struct mrvl_qos_cfg *mrvl_qos_cfg;
+
+/**
+ * Parse QoS configuration - rte_kvargs_process handler.
+ *
+ * Opens configuration file and parses its content.
+ *
+ * @param key Unused.
+ * @param path Path to config file.
+ * @param extra_args Pointer to configuration structure.
+ * @returns 0 on success, negative value on parsing errors; exits on fatal ones.
+ */
+int
+mrvl_get_qoscfg(const char *key __rte_unused, const char *path,
+		void *extra_args);
+
+/**
+ * Configure RX Queues in a given port.
+ *
+ * Sets up RX queues, their Traffic Classes and DPDK rxq->(TC,inq) mapping.
+ *
+ * @param priv Port's private data
+ * @param portid DPDK port ID
+ * @param max_queues Maximum number of queues to configure.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+int
+mrvl_configure_rxqs(struct mrvl_priv *priv, uint8_t portid,
+		    uint16_t max_queues);
+
+/**
+ * Start QoS mapping.
+ *
+ * Finalize QoS table configuration and initialize it in SDK. It can be done
+ * only after port is started, so we have a valid ppio reference.
+ *
+ * @param priv Port's private (configuration) data.
+ * @returns 0 in case of success, negative value otherwise.
+ */
+int
+mrvl_start_qos_mapping(struct mrvl_priv *priv);
+
+#endif /* _MRVL_QOS_H_ */
diff --git a/drivers/net/mrvl/rte_pmd_mrvl_version.map b/drivers/net/mrvl/rte_pmd_mrvl_version.map
new file mode 100644
index 0000000..a753031
--- /dev/null
+++ b/drivers/net/mrvl/rte_pmd_mrvl_version.map
@@ -0,0 +1,3 @@ 
+DPDK_17.11 {
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 94568a8..8df74bb 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -130,6 +130,7 @@  endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MRVL_PMD)       += -lrte_pmd_mrvl -L$(LIBMUSDK_PATH)/lib -lmusdk
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL)       += -lrte_pmd_null
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP)       += -lrte_pmd_pcap -lpcap