[v9,05/12] net/nfp: add flower PF setup logic

Message ID 1663238669-12244-6-git-send-email-chaoyong.he@corigine.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series: preparation for the rte_flow offload of nfp PMD

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Chaoyong He Sept. 15, 2022, 10:44 a.m. UTC
  Adds the vNIC initialization logic for the flower PF vNIC. The flower
firmware application exposes this vNIC for the purposes of fallback
traffic in the switchdev use-case.

Adds minimal dev_ops for this PF vNIC device. Because the device is
being exposed externally to DPDK, it needs to implement a minimal set
of dev_ops.

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
---
 drivers/net/nfp/flower/nfp_flower.c | 369 +++++++++++++++++++++++++++++++++++-
 drivers/net/nfp/flower/nfp_flower.h |   8 +
 drivers/net/nfp/nfp_common.h        |   3 +
 3 files changed, 377 insertions(+), 3 deletions(-)
  

Comments

Ferruh Yigit Sept. 20, 2022, 2:57 p.m. UTC | #1
On 9/15/2022 11:44 AM, Chaoyong He wrote:
> Adds the vNIC initialization logic for the flower PF vNIC. The flower
> firmware application exposes this vNIC for the purposes of fallback
> traffic in the switchdev use-case.
> 
> Adds minimal dev_ops for this PF vNIC device. Because the device is
> being exposed externally to DPDK it needs to implements a minimal set
> of dev_ops.
> 
> Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>

<...>

> +
> +struct dp_packet {
> +	struct rte_mbuf mbuf;
> +	uint32_t source;
> +};
> +
> +static void
> +nfp_flower_pf_mp_init(__rte_unused struct rte_mempool *mp,
> +		__rte_unused void *opaque_arg,
> +		void *packet,
> +		__rte_unused unsigned int i)
> +{
> +	struct dp_packet *pkt = packet;
> +	/* Indicate that this pkt is from DPDK */
> +	pkt->source = 3;
> +}
> +
> +static struct rte_mempool *
> +nfp_flower_pf_mp_create(void)
> +{
> +	uint32_t nb_mbufs;
> +	unsigned int numa_node;
> +	struct rte_mempool *pktmbuf_pool;
> +	uint32_t n_rxd = PF_VNIC_NB_DESC;
> +	uint32_t n_txd = PF_VNIC_NB_DESC;
> +
> +	nb_mbufs = RTE_MAX(n_rxd + n_txd + MAX_PKT_BURST + MEMPOOL_CACHE_SIZE, 81920U);
> +
> +	numa_node = rte_socket_id();
> +	pktmbuf_pool = rte_pktmbuf_pool_create("flower_pf_mbuf_pool", nb_mbufs,
> +			MEMPOOL_CACHE_SIZE, MBUF_PRIV_SIZE,
> +			RTE_MBUF_DEFAULT_BUF_SIZE, numa_node);
> +	if (pktmbuf_pool == NULL) {
> +		PMD_INIT_LOG(ERR, "Cannot init pf vnic mbuf pool");
> +		return NULL;
> +	}
> +
> +	rte_mempool_obj_iter(pktmbuf_pool, nfp_flower_pf_mp_init, NULL);
> +
> +	return pktmbuf_pool;
> +}
> +

Hi Chaoyong,

Again, similar comment to previous versions: what I understand is that this
new flower FW supports HW flow filtering and the intended use case is OvS
HW acceleration.
But does the DPDK driver need to know OvS data structures, like "struct
dp_packet"? Can this be transparent to the application? I am sure there are
other devices offloading some OvS tasks to HW.

@Ian, @David,

Can you please comment on the above usage, do you guys see any way to avoid
OvS-specific code in the driver?
  
Chaoyong He Sept. 21, 2022, 2:50 a.m. UTC | #2
> On 9/15/2022 11:44 AM, Chaoyong He wrote:
> > Adds the vNIC initialization logic for the flower PF vNIC. The flower
> > firmware application exposes this vNIC for the purposes of fallback
> > traffic in the switchdev use-case.
> >
> > Adds minimal dev_ops for this PF vNIC device. Because the device is
> > being exposed externally to DPDK it needs to implements a minimal set
> > of dev_ops.
> >
> > Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
> > Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
> 
> <...>
> 
> > +
> > +struct dp_packet {
> > +	struct rte_mbuf mbuf;
> > +	uint32_t source;
> > +};
> > +
> > +static void
> > +nfp_flower_pf_mp_init(__rte_unused struct rte_mempool *mp,
> > +		__rte_unused void *opaque_arg,
> > +		void *packet,
> > +		__rte_unused unsigned int i)
> > +{
> > +	struct dp_packet *pkt = packet;
> > +	/* Indicate that this pkt is from DPDK */
> > +	pkt->source = 3;
> > +}
> > +
> > +static struct rte_mempool *
> > +nfp_flower_pf_mp_create(void)
> > +{
> > +	uint32_t nb_mbufs;
> > +	unsigned int numa_node;
> > +	struct rte_mempool *pktmbuf_pool;
> > +	uint32_t n_rxd = PF_VNIC_NB_DESC;
> > +	uint32_t n_txd = PF_VNIC_NB_DESC;
> > +
> > +	nb_mbufs = RTE_MAX(n_rxd + n_txd + MAX_PKT_BURST +
> > +MEMPOOL_CACHE_SIZE, 81920U);
> > +
> > +	numa_node = rte_socket_id();
> > +	pktmbuf_pool = rte_pktmbuf_pool_create("flower_pf_mbuf_pool",
> nb_mbufs,
> > +			MEMPOOL_CACHE_SIZE, MBUF_PRIV_SIZE,
> > +			RTE_MBUF_DEFAULT_BUF_SIZE, numa_node);
> > +	if (pktmbuf_pool == NULL) {
> > +		PMD_INIT_LOG(ERR, "Cannot init pf vnic mbuf pool");
> > +		return NULL;
> > +	}
> > +
> > +	rte_mempool_obj_iter(pktmbuf_pool, nfp_flower_pf_mp_init,
> NULL);
> > +
> > +	return pktmbuf_pool;
> > +}
> > +
> 
> Hi Chaoyong,
> 
> Again, similar comment to previous versions, what I understand is this new
> flower FW supports HW flow filter and intended use case is for OvS HW
> acceleration.
> But is DPDK driver need to know OvS data structures, like "struct dp_packet",
> can it be transparent to application, I am sure there are other devices
> offloading some OvS task to HW.
> 
> @Ian, @David,
> 
> Can you please comment on above usage, do you guys see any way to
> escape from OvS specific code in the driver?

Firstly, I'll explain why we must include some OvS-specific code in the driver.
If we don't set `pkt->source = 3`, OvS will coredump like this:
```
(gdb) bt
#0  0x00007fe1d48fd387 in raise () from /lib64/libc.so.6
#1  0x00007fe1d48fea78 in abort () from /lib64/libc.so.6
#2  0x00007fe1d493ff67 in __libc_message () from /lib64/libc.so.6
#3  0x00007fe1d4948329 in _int_free () from /lib64/libc.so.6
#4  0x000000000049c006 in dp_packet_uninit (b=0x1f262db80) at lib/dp-packet.c:135
#5  0x000000000061440a in dp_packet_delete (b=0x1f262db80) at lib/dp-packet.h:261
#6  0x0000000000619aa0 in dpdk_copy_batch_to_mbuf (netdev=0x1f0a04a80, batch=0x7fe1b40050c0) at lib/netdev-dpdk.c:274
#7  0x0000000000619b46 in netdev_dpdk_common_send (netdev=0x1f0a04a80, batch=0x7fe1b40050c0, stats=0x7fe1be7321f0) at
#8  0x000000000061a0ba in netdev_dpdk_eth_send (netdev=0x1f0a04a80, qid=0, batch=0x7fe1b40050c0, concurrent_txq=true)
#9  0x00000000004fbd10 in netdev_send (netdev=0x1f0a04a80, qid=0, batch=0x7fe1b40050c0, concurrent_txq=true) at lib/n
#10 0x00000000004aa663 in dp_netdev_pmd_flush_output_on_port (pmd=0x7fe1be735010, p=0x7fe1b4005090) at lib/dpif-netde
#11 0x00000000004aa85d in dp_netdev_pmd_flush_output_packets (pmd=0x7fe1be735010, force=false) at lib/dpif-netdev.c:5
#12 0x00000000004aaaef in dp_netdev_process_rxq_port (pmd=0x7fe1be735010, rxq=0x16f3f80, port_no=3) at lib/dpif-netde
#13 0x00000000004af17a in pmd_thread_main (f_=0x7fe1be735010) at lib/dpif-netdev.c:6958
#14 0x000000000057da80 in ovsthread_wrapper (aux_=0x1608b30) at lib/ovs-thread.c:422
#15 0x00007fe1d51a6ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fe1d49c5b0d in clone () from /lib64/libc.so.6
```
The logic in `dp_packet_delete()` runs into the wrong branch.
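
A minimal sketch of the OvS side, to make the failure mode concrete (names follow OvS lib/dp-packet.h, but the struct and the delete path are simplified approximations, not the upstream code):
```
#include <stdlib.h>
#include <rte_mbuf.h>

/* Approximation of the OvS definitions in lib/dp-packet.h. */
enum dp_packet_source {
	DPBUF_MALLOC,	/* 0: buffer obtained with malloc(). */
	DPBUF_STACK,	/* 1: stack or static buffer. */
	DPBUF_STUB,	/* 2: starts on the stack, may move to the heap. */
	DPBUF_DPDK,	/* 3: buffer is a DPDK mbuf owned by a mempool. */
};

struct dp_packet {
	struct rte_mbuf mbuf;		/* DPDK mbuf header comes first. */
	enum dp_packet_source source;
	/* ... the real OvS struct carries more fields ... */
};

/* Simplified view of the branch taken by dp_packet_delete(). */
static void
dp_packet_delete_sketch(struct dp_packet *b)
{
	if (b->source == DPBUF_DPDK) {
		/* Correct branch for mbufs: return them to the rte_mempool. */
		rte_pktmbuf_free(&b->mbuf);
		return;
	}

	/*
	 * With source left at 0 (DPBUF_MALLOC) this branch is taken and
	 * free() ends up being called on mempool-owned memory, which is
	 * the abort in _int_free() shown in the backtrace above.
	 */
	free(b);
}
```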

Then, why does just our PMD need to do this while other PMDs don't?
Generally, it depends heavily on the hardware.

The Netronome Network Flow Processor 4xxx (NFP-4xxx) card is the target card of this patch series.
It has only one PF but 2 physical ports, and the NFP PMD can work with up to 8 ports on the same PF device.
Other PMDs' hardware seems to be all 'one PF <--> one physical port'.

For the OvS use case, we should add the representor port of the 'physical port' to the bridge, not the representor port of the PF like other PMDs do.

We use a two-layer poll mode architecture (other PMDs use a simple poll mode architecture); see the RX dispatch sketch below.
In the RX direction:
1. When the physical port or VF receives pkts, the firmware prepends meta-data (indicating the input port) to each pkt.
2. We use the PF vNIC as a multiplexer, which keeps polling pkts from the firmware.
3. The PF vNIC parses the meta-data and enqueues the pkt into the rte_ring of the corresponding representor port of the physical port or VF.
4. OvS polls pkts from the RX function of the representor port, which dequeues pkts from that rte_ring.
In the TX direction:
1. OvS sends pkts through the TX function of the representor port.
2. The representor port prepends meta-data (indicating the output port) to the pkt and sends it to the firmware through queue 0 of the PF vNIC.
3. The firmware parses the meta-data and forwards the pkt to the corresponding physical port or VF.

So OvS won't create the mempool for us, and we must create it ourselves for the PF vNIC to use.
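
To make the two-layer model concrete, here is a rough sketch of what the PF vNIC multiplexer does on the RX side (purely illustrative: `nfp_parse_meta_port()` and the per-representor rings are hypothetical names, not code from this series):
```
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define PF_VNIC_BURST 32

/*
 * Hypothetical helper: map the meta-data prepended by the firmware to the
 * rte_ring of the matching representor port. The real parsing is added in
 * later patches of this series.
 */
extern struct rte_ring *nfp_parse_meta_port(struct rte_mbuf *mbuf);

/* Illustrative RX multiplexer step: poll the PF vNIC and hand each pkt to
 * the ring that the representor port's RX burst function dequeues from. */
static void
flower_pf_dispatch_once(uint16_t pf_port_id)
{
	struct rte_mbuf *pkts[PF_VNIC_BURST];
	uint16_t nb_rx, i;

	nb_rx = rte_eth_rx_burst(pf_port_id, 0, pkts, PF_VNIC_BURST);
	for (i = 0; i < nb_rx; i++) {
		struct rte_ring *repr_ring = nfp_parse_meta_port(pkts[i]);

		if (repr_ring == NULL || rte_ring_enqueue(repr_ring, pkts[i]) != 0)
			rte_pktmbuf_free(pkts[i]);	/* unknown port or ring full */
	}
}
```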

Hopefully, I have explained things clearly. Thanks.
  
Thomas Monjalon Sept. 21, 2022, 7:35 a.m. UTC | #3
I don't understand your logic fully,
but I understand you need special code to make your hardware work with OvS,
meaning:
	- OvS must have special handling for your HW
	- other applications won't work
Tell me if I misunderstand,
but I feel we should not accept this patch;
there is probably a better way to manage the specifics of your HW.

You said "NFP PMD can work with up to 8 ports on the same PF device."
Let's imagine you have 8 ports for 1 PF device.
Do you allocate 8 ethdev ports?
If yes, then each ethdev should do the internal work,
and nothing is needed at application level.


21/09/2022 04:50, Chaoyong He:
> > On 9/15/2022 11:44 AM, Chaoyong He wrote:
> > Hi Chaoyong,
> > 
> > Again, similar comment to previous versions, what I understand is this new
> > flower FW supports HW flow filter and intended use case is for OvS HW
> > acceleration.
> > But is DPDK driver need to know OvS data structures, like "struct dp_packet",
> > can it be transparent to application, I am sure there are other devices
> > offloading some OvS task to HW.
> > 
> > @Ian, @David,
> > 
> > Can you please comment on above usage, do you guys see any way to
> > escape from OvS specific code in the driver?
> 
> Firstly, I'll explain why we must include some OvS specific code in the driver.
> If we don't set the `pkt->source = 3`, the OvS will coredump like this:
> ```
> (gdb) bt
> #0  0x00007fe1d48fd387 in raise () from /lib64/libc.so.6
> #1  0x00007fe1d48fea78 in abort () from /lib64/libc.so.6
> #2  0x00007fe1d493ff67 in __libc_message () from /lib64/libc.so.6
> #3  0x00007fe1d4948329 in _int_free () from /lib64/libc.so.6
> #4  0x000000000049c006 in dp_packet_uninit (b=0x1f262db80) at lib/dp-packet.c:135
> #5  0x000000000061440a in dp_packet_delete (b=0x1f262db80) at lib/dp-packet.h:261
> #6  0x0000000000619aa0 in dpdk_copy_batch_to_mbuf (netdev=0x1f0a04a80, batch=0x7fe1b40050c0) at lib/netdev-dpdk.c:274
> #7  0x0000000000619b46 in netdev_dpdk_common_send (netdev=0x1f0a04a80, batch=0x7fe1b40050c0, stats=0x7fe1be7321f0) at
> #8  0x000000000061a0ba in netdev_dpdk_eth_send (netdev=0x1f0a04a80, qid=0, batch=0x7fe1b40050c0, concurrent_txq=true)
> #9  0x00000000004fbd10 in netdev_send (netdev=0x1f0a04a80, qid=0, batch=0x7fe1b40050c0, concurrent_txq=true) at lib/n
> #10 0x00000000004aa663 in dp_netdev_pmd_flush_output_on_port (pmd=0x7fe1be735010, p=0x7fe1b4005090) at lib/dpif-netde
> #11 0x00000000004aa85d in dp_netdev_pmd_flush_output_packets (pmd=0x7fe1be735010, force=false) at lib/dpif-netdev.c:5
> #12 0x00000000004aaaef in dp_netdev_process_rxq_port (pmd=0x7fe1be735010, rxq=0x16f3f80, port_no=3) at lib/dpif-netde
> #13 0x00000000004af17a in pmd_thread_main (f_=0x7fe1be735010) at lib/dpif-netdev.c:6958
> #14 0x000000000057da80 in ovsthread_wrapper (aux_=0x1608b30) at lib/ovs-thread.c:422
> #15 0x00007fe1d51a6ea5 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007fe1d49c5b0d in clone () from /lib64/libc.so.6
> ```
> The logic in function `dp_packet_delete()` run into the wrong branch.
> 
> Then, why just our PMD need do this, and other PMDs don't?
> Generally, it's greatly dependent on the hardware.
> 
> The Netronome's Network Flow Processor 4xxx (NFP-4xxx) card is the target card of these series patches.
> Which only has one PF but has 2 physical ports, and the NFP PMD can work with up to 8 ports on the same PF device. 
> Other PMDs hardware seems all 'one PF <--> one physical port'.
> 
> For the use case of OvS, we should add the representor port of 'physical port' to the bridge, not the representor port of PF like other PMDs.
> 
> We use a two-layer poll mode architecture. (Other PMDs are simple poll mode architecture)
> In the RX direction:
> 1. When the physical port or vf receives pkts, the firmware will prepend a meta-data(indicating the input port) into the pkt.
> 2. We use the PF vNIC as a multiplexer, which keeps polling pkts from the firmware.
> 3. The PF vNIC will parse the meta-data, and enqueue the pkt into the corresponding rte_ring of the representor port of physical port or vf.
> 4. The OVS will polling pkts from the RX function of representor port, which dequeue pkts from the rte_ring.
> In the TX direction:
> 1. The OVS send the pkts from the TX functions of representor port.
> 2. The representor port will prepend a meta-data(indicating the output port) into the pkt and send the pkt to firmware through the queue 0 of PF vNIC.
> 3. The firmware will parse the meta-data, and forward the pkt to the corresponding physical port or vf.
> 
> So the OvS won't create the mempool for us and we must create it ourselves for the PF vNIC to use.
> 
> Hopefully, I explained the things clearly. Thanks.
  
Chaoyong He Sept. 21, 2022, 7:47 a.m. UTC | #4
> Subject: Re: [PATCH v9 05/12] net/nfp: add flower PF setup logic
> 
> I don't understand your logic fully,
> but I understand you need special code to make your hardware work with
> OvS,
> meaning:
> 	- OvS must have a special handling for your HW
> 	- other applications won't work
> Tell me I misunderstand,
> but I feel we should not accept this patch, there is probably a better way to
> manage the specific of your HW.

OvS does not need any special handling for our HW.
"Other applications won't work" -- sorry, I don't understand what you mean at this point.

> You said "NFP PMD can work with up to 8 ports on the same PF device."
> Let's imagine you have 8 ports for 1 PF device.
> Do you allocate 8 ethdev ports?
> If yes, then each ethdev should do the internal work, and nothing is needed
> at application level.

No, we still create just 1 PF vNIC to handle the fallback traffic.
Of course we will create 8 representor ports for the physical ports.
 
> 21/09/2022 04:50, Chaoyong He:
> > > On 9/15/2022 11:44 AM, Chaoyong He wrote:
> > > Hi Chaoyong,
> > >
> > > Again, similar comment to previous versions, what I understand is
> > > this new flower FW supports HW flow filter and intended use case is
> > > for OvS HW acceleration.
> > > But is DPDK driver need to know OvS data structures, like "struct
> > > dp_packet", can it be transparent to application, I am sure there
> > > are other devices offloading some OvS task to HW.
> > >
> > > @Ian, @David,
> > >
> > > Can you please comment on above usage, do you guys see any way to
> > > escape from OvS specific code in the driver?
> >
> > Firstly, I'll explain why we must include some OvS specific code in the driver.
> > If we don't set the `pkt->source = 3`, the OvS will coredump like this:
> > ```
> > (gdb) bt
> > #0  0x00007fe1d48fd387 in raise () from /lib64/libc.so.6
> > #1  0x00007fe1d48fea78 in abort () from /lib64/libc.so.6
> > #2  0x00007fe1d493ff67 in __libc_message () from /lib64/libc.so.6
> > #3  0x00007fe1d4948329 in _int_free () from /lib64/libc.so.6
> > #4  0x000000000049c006 in dp_packet_uninit (b=0x1f262db80) at
> > lib/dp-packet.c:135
> > #5  0x000000000061440a in dp_packet_delete (b=0x1f262db80) at
> > lib/dp-packet.h:261
> > #6  0x0000000000619aa0 in dpdk_copy_batch_to_mbuf
> (netdev=0x1f0a04a80,
> > batch=0x7fe1b40050c0) at lib/netdev-dpdk.c:274
> > #7  0x0000000000619b46 in netdev_dpdk_common_send
> (netdev=0x1f0a04a80,
> > batch=0x7fe1b40050c0, stats=0x7fe1be7321f0) at
> > #8  0x000000000061a0ba in netdev_dpdk_eth_send (netdev=0x1f0a04a80,
> > qid=0, batch=0x7fe1b40050c0, concurrent_txq=true)
> > #9  0x00000000004fbd10 in netdev_send (netdev=0x1f0a04a80, qid=0,
> > batch=0x7fe1b40050c0, concurrent_txq=true) at lib/n
> > #10 0x00000000004aa663 in dp_netdev_pmd_flush_output_on_port
> > (pmd=0x7fe1be735010, p=0x7fe1b4005090) at lib/dpif-netde
> > #11 0x00000000004aa85d in dp_netdev_pmd_flush_output_packets
> > (pmd=0x7fe1be735010, force=false) at lib/dpif-netdev.c:5
> > #12 0x00000000004aaaef in dp_netdev_process_rxq_port
> > (pmd=0x7fe1be735010, rxq=0x16f3f80, port_no=3) at lib/dpif-netde
> > #13 0x00000000004af17a in pmd_thread_main (f_=0x7fe1be735010) at
> > lib/dpif-netdev.c:6958
> > #14 0x000000000057da80 in ovsthread_wrapper (aux_=0x1608b30) at
> > lib/ovs-thread.c:422
> > #15 0x00007fe1d51a6ea5 in start_thread () from /lib64/libpthread.so.0
> > #16 0x00007fe1d49c5b0d in clone () from /lib64/libc.so.6 ``` The logic
> > in function `dp_packet_delete()` run into the wrong branch.
> >
> > Then, why just our PMD need do this, and other PMDs don't?
> > Generally, it's greatly dependent on the hardware.
> >
> > The Netronome's Network Flow Processor 4xxx (NFP-4xxx) card is the
> target card of these series patches.
> > Which only has one PF but has 2 physical ports, and the NFP PMD can work
> with up to 8 ports on the same PF device.
> > Other PMDs hardware seems all 'one PF <--> one physical port'.
> >
> > For the use case of OvS, we should add the representor port of 'physical
> port' to the bridge, not the representor port of PF like other PMDs.
> >
> > We use a two-layer poll mode architecture. (Other PMDs are simple poll
> > mode architecture) In the RX direction:
> > 1. When the physical port or vf receives pkts, the firmware will prepend a
> meta-data(indicating the input port) into the pkt.
> > 2. We use the PF vNIC as a multiplexer, which keeps polling pkts from the
> firmware.
> > 3. The PF vNIC will parse the meta-data, and enqueue the pkt into the
> corresponding rte_ring of the representor port of physical port or vf.
> > 4. The OVS will polling pkts from the RX function of representor port, which
> dequeue pkts from the rte_ring.
> > In the TX direction:
> > 1. The OVS send the pkts from the TX functions of representor port.
> > 2. The representor port will prepend a meta-data(indicating the output
> port) into the pkt and send the pkt to firmware through the queue 0 of PF
> vNIC.
> > 3. The firmware will parse the meta-data, and forward the pkt to the
> corresponding physical port or vf.
> >
> > So the OvS won't create the mempool for us and we must create it
> ourselves for the PF vNIC to use.
> >
> > Hopefully, I explained the things clearly. Thanks.
> 
>
  

Patch

diff --git a/drivers/net/nfp/flower/nfp_flower.c b/drivers/net/nfp/flower/nfp_flower.c
index 87cb922..34e60f8 100644
--- a/drivers/net/nfp/flower/nfp_flower.c
+++ b/drivers/net/nfp/flower/nfp_flower.c
@@ -14,12 +14,312 @@ 
 #include "../nfp_logs.h"
 #include "../nfp_ctrl.h"
 #include "../nfp_cpp_bridge.h"
+#include "../nfp_rxtx.h"
+#include "../nfpcore/nfp_mip.h"
+#include "../nfpcore/nfp_rtsym.h"
+#include "../nfpcore/nfp_nsp.h"
 #include "nfp_flower.h"
 
+#define MAX_PKT_BURST 32
+#define MBUF_PRIV_SIZE 128
+#define MEMPOOL_CACHE_SIZE 512
+#define DEFAULT_FLBUF_SIZE 9216
+
+#define PF_VNIC_NB_DESC 1024
+
+static const struct rte_eth_rxconf rx_conf = {
+	.rx_free_thresh = DEFAULT_RX_FREE_THRESH,
+	.rx_drop_en = 1,
+};
+
+static const struct rte_eth_txconf tx_conf = {
+	.tx_thresh = {
+		.pthresh  = DEFAULT_TX_PTHRESH,
+		.hthresh = DEFAULT_TX_HTHRESH,
+		.wthresh = DEFAULT_TX_WTHRESH,
+	},
+	.tx_free_thresh = DEFAULT_TX_FREE_THRESH,
+};
+
+static const struct eth_dev_ops nfp_flower_pf_vnic_ops = {
+	.dev_infos_get          = nfp_net_infos_get,
+};
+
+struct dp_packet {
+	struct rte_mbuf mbuf;
+	uint32_t source;
+};
+
+static void
+nfp_flower_pf_mp_init(__rte_unused struct rte_mempool *mp,
+		__rte_unused void *opaque_arg,
+		void *packet,
+		__rte_unused unsigned int i)
+{
+	struct dp_packet *pkt = packet;
+	/* Indicate that this pkt is from DPDK */
+	pkt->source = 3;
+}
+
+static struct rte_mempool *
+nfp_flower_pf_mp_create(void)
+{
+	uint32_t nb_mbufs;
+	unsigned int numa_node;
+	struct rte_mempool *pktmbuf_pool;
+	uint32_t n_rxd = PF_VNIC_NB_DESC;
+	uint32_t n_txd = PF_VNIC_NB_DESC;
+
+	nb_mbufs = RTE_MAX(n_rxd + n_txd + MAX_PKT_BURST + MEMPOOL_CACHE_SIZE, 81920U);
+
+	numa_node = rte_socket_id();
+	pktmbuf_pool = rte_pktmbuf_pool_create("flower_pf_mbuf_pool", nb_mbufs,
+			MEMPOOL_CACHE_SIZE, MBUF_PRIV_SIZE,
+			RTE_MBUF_DEFAULT_BUF_SIZE, numa_node);
+	if (pktmbuf_pool == NULL) {
+		PMD_INIT_LOG(ERR, "Cannot init pf vnic mbuf pool");
+		return NULL;
+	}
+
+	rte_mempool_obj_iter(pktmbuf_pool, nfp_flower_pf_mp_init, NULL);
+
+	return pktmbuf_pool;
+}
+
+static int
+nfp_flower_init_vnic_common(struct nfp_net_hw *hw, const char *vnic_type)
+{
+	uint32_t start_q;
+	uint64_t rx_bar_off;
+	uint64_t tx_bar_off;
+	const int stride = 4;
+	struct nfp_pf_dev *pf_dev;
+	struct rte_pci_device *pci_dev;
+
+	pf_dev = hw->pf_dev;
+	pci_dev = hw->pf_dev->pci_dev;
+
+	/* NFP can not handle DMA addresses requiring more than 40 bits */
+	if (rte_mem_check_dma_mask(40)) {
+		PMD_INIT_LOG(ERR, "Device %s can not be used: restricted dma mask to 40 bits!\n",
+				pci_dev->device.name);
+		return -ENODEV;
+	}
+
+	hw->device_id = pci_dev->id.device_id;
+	hw->vendor_id = pci_dev->id.vendor_id;
+	hw->subsystem_device_id = pci_dev->id.subsystem_device_id;
+	hw->subsystem_vendor_id = pci_dev->id.subsystem_vendor_id;
+
+	PMD_INIT_LOG(DEBUG, "%s vNIC ctrl bar: %p", vnic_type, hw->ctrl_bar);
+
+	/* Read the number of available rx/tx queues from hardware */
+	hw->max_rx_queues = nn_cfg_readl(hw, NFP_NET_CFG_MAX_RXRINGS);
+	hw->max_tx_queues = nn_cfg_readl(hw, NFP_NET_CFG_MAX_TXRINGS);
+
+	/* Work out where in the BAR the queues start */
+	start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_TXQ);
+	tx_bar_off = (uint64_t)start_q * NFP_QCP_QUEUE_ADDR_SZ;
+	start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_RXQ);
+	rx_bar_off = (uint64_t)start_q * NFP_QCP_QUEUE_ADDR_SZ;
+
+	hw->tx_bar = pf_dev->hw_queues + tx_bar_off;
+	hw->rx_bar = pf_dev->hw_queues + rx_bar_off;
+
+	/* Get some of the read-only fields from the config BAR */
+	hw->ver = nn_cfg_readl(hw, NFP_NET_CFG_VERSION);
+	hw->cap = nn_cfg_readl(hw, NFP_NET_CFG_CAP);
+	hw->max_mtu = nn_cfg_readl(hw, NFP_NET_CFG_MAX_MTU);
+	/* Set the current MTU to the maximum supported */
+	hw->mtu = hw->max_mtu;
+	hw->flbufsz = DEFAULT_FLBUF_SIZE;
+
+	/* read the Rx offset configured from firmware */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
+		hw->rx_offset = NFP_NET_RX_OFFSET;
+	else
+		hw->rx_offset = nn_cfg_readl(hw, NFP_NET_CFG_RX_OFFSET_ADDR);
+
+	hw->ctrl = 0;
+	hw->stride_rx = stride;
+	hw->stride_tx = stride;
+
+	/* Reuse cfg queue setup function */
+	nfp_net_cfg_queue_setup(hw);
+
+	PMD_INIT_LOG(INFO, "%s vNIC max_rx_queues: %u, max_tx_queues: %u",
+			vnic_type, hw->max_rx_queues, hw->max_tx_queues);
+
+	/* Initializing spinlock for reconfigs */
+	rte_spinlock_init(&hw->reconfig_lock);
+
+	return 0;
+}
+
+static int
+nfp_flower_init_pf_vnic(struct nfp_net_hw *hw)
+{
+	int ret;
+	uint16_t i;
+	uint16_t n_txq;
+	uint16_t n_rxq;
+	unsigned int numa_node;
+	struct rte_mempool *mp;
+	struct nfp_pf_dev *pf_dev;
+	struct rte_eth_dev *eth_dev;
+	struct nfp_app_fw_flower *app_fw_flower;
+
+	static const struct rte_eth_conf port_conf = {
+		.rxmode = {
+			.mq_mode  = RTE_ETH_MQ_RX_RSS,
+			.offloads = RTE_ETH_RX_OFFLOAD_CHECKSUM,
+		},
+	};
+
+	/* Set up some pointers here for ease of use */
+	pf_dev = hw->pf_dev;
+	app_fw_flower = NFP_PRIV_TO_APP_FW_FLOWER(pf_dev->app_fw_priv);
+
+	/*
+	 * Perform the "common" part of setting up a flower vNIC.
+	 * Mostly reading configuration from hardware.
+	 */
+	ret = nfp_flower_init_vnic_common(hw, "pf_vnic");
+	if (ret != 0) {
+		PMD_INIT_LOG(ERR, "Could not init pf vnic");
+		return -EINVAL;
+	}
+
+	hw->eth_dev = rte_eth_dev_allocate("nfp_pf_vnic");
+	if (hw->eth_dev == NULL) {
+		PMD_INIT_LOG(ERR, "Could not allocate pf vnic");
+		return -ENOMEM;
+	}
+
+	/* Grab the pointer to the newly created rte_eth_dev here */
+	eth_dev = hw->eth_dev;
+
+	numa_node = rte_socket_id();
+
+	/* Create a mbuf pool for the PF */
+	app_fw_flower->pf_pktmbuf_pool = nfp_flower_pf_mp_create();
+	if (app_fw_flower->pf_pktmbuf_pool == NULL) {
+		PMD_INIT_LOG(ERR, "Could not create mempool for pf vnic");
+		ret = -ENOMEM;
+		goto port_release;
+	}
+
+	mp = app_fw_flower->pf_pktmbuf_pool;
+
+	/* Add Rx/Tx functions */
+	eth_dev->dev_ops = &nfp_flower_pf_vnic_ops;
+
+	/* PF vNIC gets a random MAC */
+	eth_dev->data->mac_addrs = rte_zmalloc("mac_addr", RTE_ETHER_ADDR_LEN, 0);
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR, "Could not allocate mac addr");
+		ret = -ENOMEM;
+		goto mempool_cleanup;
+	}
+
+	rte_eth_random_addr(eth_dev->data->mac_addrs->addr_bytes);
+	rte_eth_dev_probing_finish(eth_dev);
+
+	/* Configure the PF device now */
+	n_rxq = hw->max_rx_queues;
+	n_txq = hw->max_tx_queues;
+	memcpy(&eth_dev->data->dev_conf, &port_conf, sizeof(struct rte_eth_conf));
+	eth_dev->data->rx_queues = rte_zmalloc("ethdev->rx_queues",
+		sizeof(eth_dev->data->rx_queues[0]) * n_rxq, RTE_CACHE_LINE_SIZE);
+	if (eth_dev->data->rx_queues == NULL) {
+		PMD_INIT_LOG(ERR, "rte_zmalloc failed for PF vNIC rx queues");
+		ret = -ENOMEM;
+		goto mac_cleanup;
+	}
+
+	eth_dev->data->tx_queues = rte_zmalloc("ethdev->tx_queues",
+		sizeof(eth_dev->data->tx_queues[0]) * n_txq, RTE_CACHE_LINE_SIZE);
+	if (eth_dev->data->tx_queues == NULL) {
+		PMD_INIT_LOG(ERR, "rte_zmalloc failed for PF vNIC tx queues");
+		ret = -ENOMEM;
+		goto rx_queue_free;
+	}
+
+	/* Fill in some of the eth_dev fields */
+	eth_dev->device = &pf_dev->pci_dev->device;
+	eth_dev->data->nb_tx_queues = n_txq;
+	eth_dev->data->nb_rx_queues = n_rxq;
+	eth_dev->data->dev_private = hw;
+	eth_dev->data->dev_configured = 1;
+
+	/* Set up the Rx queues */
+	for (i = 0; i < n_rxq; i++) {
+		ret = nfp_net_rx_queue_setup(eth_dev, i, PF_VNIC_NB_DESC, numa_node,
+				&rx_conf, mp);
+		if (ret != 0) {
+			PMD_INIT_LOG(ERR, "Configure flower PF vNIC Rx queue %d failed", i);
+			goto rx_queue_cleanup;
+		}
+	}
+
+	/* Set up the Tx queues */
+	for (i = 0; i < n_txq; i++) {
+		ret = nfp_net_nfd3_tx_queue_setup(eth_dev, i, PF_VNIC_NB_DESC, numa_node,
+				&tx_conf);
+		if (ret != 0) {
+			PMD_INIT_LOG(ERR, "Configure flower PF vNIC Tx queue %d failed", i);
+			goto tx_queue_cleanup;
+		}
+	}
+
+	return 0;
+
+tx_queue_cleanup:
+	for (i = 0; i < n_txq; i++)
+		nfp_net_tx_queue_release(eth_dev, i);
+rx_queue_cleanup:
+	for (i = 0; i < n_rxq; i++)
+		nfp_net_rx_queue_release(eth_dev, i);
+	rte_free(eth_dev->data->tx_queues);
+rx_queue_free:
+	rte_free(eth_dev->data->rx_queues);
+mac_cleanup:
+	rte_free(eth_dev->data->mac_addrs);
+mempool_cleanup:
+	rte_mempool_free(mp);
+port_release:
+	rte_eth_dev_release_port(hw->eth_dev);
+
+	return ret;
+}
+
+__rte_unused static void
+nfp_flower_cleanup_pf_vnic(struct nfp_net_hw *hw)
+{
+	uint16_t i;
+	struct nfp_app_fw_flower *app_fw_flower;
+
+	app_fw_flower = NFP_PRIV_TO_APP_FW_FLOWER(hw->pf_dev->app_fw_priv);
+
+	for (i = 0; i < hw->max_tx_queues; i++)
+		nfp_net_tx_queue_release(hw->eth_dev, i);
+
+	for (i = 0; i < hw->max_tx_queues; i++)
+		nfp_net_rx_queue_release(hw->eth_dev, i);
+
+	rte_free(hw->eth_dev->data->tx_queues);
+	rte_free(hw->eth_dev->data->rx_queues);
+	rte_free(hw->eth_dev->data->mac_addrs);
+	rte_mempool_free(app_fw_flower->pf_pktmbuf_pool);
+	rte_eth_dev_release_port(hw->eth_dev);
+}
+
 int
 nfp_init_app_fw_flower(struct nfp_pf_dev *pf_dev)
 {
+	int ret;
 	unsigned int numa_node;
+	struct nfp_net_hw *pf_hw;
 	struct nfp_app_fw_flower *app_fw_flower;
 
 	numa_node = rte_socket_id();
@@ -34,12 +334,75 @@ 
 
 	pf_dev->app_fw_priv = app_fw_flower;
 
+	/* Allocate memory for the PF AND ctrl vNIC here (hence the * 2) */
+	pf_hw = rte_zmalloc_socket("nfp_pf_vnic", 2 * sizeof(struct nfp_net_adapter),
+			RTE_CACHE_LINE_SIZE, numa_node);
+	if (pf_hw == NULL) {
+		PMD_INIT_LOG(ERR, "Could not malloc nfp pf vnic");
+		ret = -ENOMEM;
+		goto app_cleanup;
+	}
+
+	/* Grab the number of physical ports present on hardware */
+	app_fw_flower->nfp_eth_table = nfp_eth_read_ports(pf_dev->cpp);
+	if (app_fw_flower->nfp_eth_table == NULL) {
+		PMD_INIT_LOG(ERR, "error reading nfp ethernet table");
+		ret = -EIO;
+		goto vnic_cleanup;
+	}
+
+	/* Map the PF ctrl bar */
+	pf_dev->ctrl_bar = nfp_rtsym_map(pf_dev->sym_tbl, "_pf0_net_bar0",
+			32768, &pf_dev->ctrl_area);
+	if (pf_dev->ctrl_bar == NULL) {
+		PMD_INIT_LOG(ERR, "Could not map the PF vNIC ctrl bar");
+		ret = -ENODEV;
+		goto eth_tbl_cleanup;
+	}
+
+	/* Fill in the PF vNIC and populate app struct */
+	app_fw_flower->pf_hw = pf_hw;
+	pf_hw->ctrl_bar = pf_dev->ctrl_bar;
+	pf_hw->pf_dev = pf_dev;
+	pf_hw->cpp = pf_dev->cpp;
+
+	ret = nfp_flower_init_pf_vnic(app_fw_flower->pf_hw);
+	if (ret != 0) {
+		PMD_INIT_LOG(ERR, "Could not initialize flower PF vNIC");
+		goto pf_cpp_area_cleanup;
+	}
+
 	return 0;
+
+pf_cpp_area_cleanup:
+	nfp_cpp_area_free(pf_dev->ctrl_area);
+eth_tbl_cleanup:
+	free(app_fw_flower->nfp_eth_table);
+vnic_cleanup:
+	rte_free(pf_hw);
+app_cleanup:
+	rte_free(app_fw_flower);
+
+	return ret;
 }
 
 int
-nfp_secondary_init_app_fw_flower(__rte_unused struct nfp_cpp *cpp)
+nfp_secondary_init_app_fw_flower(struct nfp_cpp *cpp)
 {
-	PMD_INIT_LOG(ERR, "Flower firmware not supported");
-	return -ENOTSUP;
+	struct rte_eth_dev *eth_dev;
+	const char *port_name = "pf_vnic_eth_dev";
+
+	PMD_INIT_LOG(DEBUG, "Secondary attaching to port %s", port_name);
+
+	eth_dev = rte_eth_dev_attach_secondary(port_name);
+	if (eth_dev == NULL) {
+		PMD_INIT_LOG(ERR, "Secondary process attach to port %s failed", port_name);
+		return -ENODEV;
+	}
+
+	eth_dev->process_private = cpp;
+	eth_dev->dev_ops = &nfp_flower_pf_vnic_ops;
+	rte_eth_dev_probing_finish(eth_dev);
+
+	return 0;
 }
diff --git a/drivers/net/nfp/flower/nfp_flower.h b/drivers/net/nfp/flower/nfp_flower.h
index 8b9ef95..981d88d 100644
--- a/drivers/net/nfp/flower/nfp_flower.h
+++ b/drivers/net/nfp/flower/nfp_flower.h
@@ -8,6 +8,14 @@ 
 
 /* The flower application's private structure */
 struct nfp_app_fw_flower {
+	/* Pointer to a mempool for the PF vNIC */
+	struct rte_mempool *pf_pktmbuf_pool;
+
+	/* Pointer to the PF vNIC */
+	struct nfp_net_hw *pf_hw;
+
+	/* the eth table as reported by firmware */
+	struct nfp_eth_table *nfp_eth_table;
 };
 
 int nfp_init_app_fw_flower(struct nfp_pf_dev *pf_dev);
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index cefe717..aa6fdd4 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -446,6 +446,9 @@  int nfp_net_rss_hash_conf_get(struct rte_eth_dev *dev,
 #define NFP_PRIV_TO_APP_FW_NIC(app_fw_priv)\
 	((struct nfp_app_fw_nic *)app_fw_priv)
 
+#define NFP_PRIV_TO_APP_FW_FLOWER(app_fw_priv)\
+	((struct nfp_app_fw_flower *)app_fw_priv)
+
 #endif /* _NFP_COMMON_H_ */
 /*
  * Local variables: