[dpdk-dev,v2] doc: add Vector FM10K introductions

Message ID 1454741289-2965-1-git-send-email-jing.d.chen@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Chen, Jing D Feb. 6, 2016, 6:48 a.m. UTC
  From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

Add introductions on how to enable the Vector FM10K Rx/Tx functions,
and the preconditions and assumptions on the Rx/Tx configuration
parameters. The new content also lists the limitations of the vector
functions, so applications/customers can better select the best Rx/Tx
functions.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
---
v2:
 - rebase to latest repo
 - Reword a few sentences that did not follow coding style.

 doc/guides/nics/fm10k.rst |   98 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)
  

Comments

John McNamara Feb. 22, 2016, 1:47 p.m. UTC | #1
> -----Original Message-----

> From: Chen, Jing D

> Sent: Saturday, February 6, 2016 6:48 AM

> To: dev@dpdk.org

> Cc: Mcnamara, John <john.mcnamara@intel.com>; Chen, Jing D

> <jing.d.chen@intel.com>

> Subject: [PATCH v2] doc: add Vector FM10K introductions

> 

> From: "Chen Jing D(Mark)" <jing.d.chen@intel.com>

> 

> Add introductions on how to enable Vector FM10K Rx/Tx functions, the

> preconditions and assumptions on Rx/Tx configuration parameters.


Hi Mark,

Thanks for the update. A few minor comments below.



> +Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.

> +It improves load/store bandwidth efficiency of L1 data cache by using a

> +wider SSE/AVX register 1 (1).


This should probably be "register (1)"


> +The wider register gives space to hold multiple packet buffers so as to

> +save instruction number when processing bulk of packets.


Maybe a little clearer as:

The wider register gives space to hold multiple packet buffers so as to save
on the number of instructions when bulk processing packets.


> +

> +There is no change to PMD API. The RX/TX handler are the only two

> +entries for vPMD packet I/O. They are transparently registered at

> +runtime RX/TX execution if all condition checks pass.


s/if all condition checks pass./if all conditions are met./


> +As vPMD is focused on high throughput, it 4 packets at a time.  So it


s/it 4 packets at a time/it processes 4 packets at a time/

John
  
Chen, Jing D Feb. 23, 2016, 7:37 a.m. UTC | #2
Hi, John,

Many thanks for the comments. I'll change and send a new version soon.

Best Regards,
Mark
  

Patch

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..a502ffd 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -35,6 +35,104 @@  The FM10K poll mode driver library provides support for the Intel FM10000
 (FM10K) family of 40GbE/100GbE adapters.
 
 
+Vector PMD for FM10K
+--------------------
+
+Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
+It improves the load/store bandwidth efficiency of the L1 data cache by using
+a wider SSE/AVX register (1).
+The wider register gives space to hold multiple packet buffers so as to save
+on the number of instructions when bulk processing packets.
+
+There is no change to the PMD API. The RX/TX handlers are the only two entries
+for vPMD packet I/O. They are transparently registered for RX/TX execution at
+runtime if all conditions are met.
+
+1.  To date, only an SSE version of the FM10K vPMD is available.
+    To include vPMD in the binary code, ensure that the option
+    CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y is set in the configuration file.
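For illustration, the transparent registration described above can be sketched
as a runtime choice between two burst callbacks; the names and signature here
are hypothetical stand-ins, not the fm10k driver's actual symbols.

```c
#include <stddef.h>

/* Hypothetical burst-receive callback type, mirroring the idea of a
 * PMD RX handler; illustrative only, not the fm10k API. */
typedef int (*rx_burst_fn)(void *rxq, void **pkts, unsigned int n);

static int scalar_rx(void *rxq, void **pkts, unsigned int n)
{
    (void)rxq; (void)pkts;
    return (int)n;   /* stand-in body */
}

static int vector_rx(void *rxq, void **pkts, unsigned int n)
{
    (void)rxq; (void)pkts;
    return (int)n;   /* stand-in body */
}

/* Pick the vector path only when every precondition holds; the
 * application keeps calling the same entry point either way. */
static rx_burst_fn select_rx_handler(int ring_is_pow2, int offloads_ok)
{
    return (ring_is_pow2 && offloads_ok) ? vector_rx : scalar_rx;
}
```

The application never calls `vector_rx` directly; only the function pointer
installed at setup time changes.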
+
+Some constraints apply as pre-conditions for specific optimizations on bulk
+packet transfers. The following sections explain RX and TX constraints in the
+vPMD.
+
+
+RX Constraints
+~~~~~~~~~~~~~~
+
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For Vector RX it is assumed that the number of descriptors in the ring will be
+a power of 2. With this pre-condition, the ring pointer can easily scroll back
+to the head after hitting the tail without a conditional check. In addition
+Vector RX can use this assumption to apply a bit mask of ``ring_size - 1``.
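The power-of-two assumption can be illustrated with a small sketch; the
variable names are illustrative, not the driver's.

```c
#include <stdint.h>

/* With a power-of-two ring size, advancing an index wraps back to the
 * head with a single AND against (ring_size - 1), no conditional needed. */
static inline uint16_t ring_next(uint16_t idx, uint16_t step,
                                 uint16_t ring_size)
{
    /* Valid only when ring_size is a power of 2. */
    return (uint16_t)((idx + step) & (ring_size - 1));
}
```

For example, with a ring of 512 descriptors, advancing from index 510 by 4
lands on index 2 without any branch.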
+
+
+Features not Supported by Vector RX PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some features are not supported when trying to increase the throughput in
+vPMD. They are:
+
+*   IEEE1588
+
+*   Flow director
+
+*   Header split
+
+*   RX checksum offload
+
+Other features are supported using optional macro configuration. They include:
+
+*   HW VLAN strip
+
+*   L3/L4 packet type
+
+To enable them, set ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y`` in the
+configuration file.
+
+To guarantee these constraints, the following configuration flags in
+``dev_conf.rxmode`` will be checked:
+
+*   ``hw_vlan_extend``
+
+*   ``hw_ip_checksum``
+
+*   ``header_split``
+
+*   ``fdir_conf->mode``
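A sketch of this style of precondition check, using a stand-in struct rather
than DPDK's real ``rte_eth_conf``; the field names follow the list above.

```c
/* Stand-in for the RX-mode configuration fields that the vector RX
 * path cannot support; not DPDK's actual rte_eth_conf layout. */
struct rx_conf_sketch {
    int hw_vlan_extend;
    int hw_ip_checksum;
    int header_split;
    int fdir_mode;       /* nonzero means flow director is enabled */
};

/* Vector RX is allowed only when every unsupported offload is off. */
static int vector_rx_allowed(const struct rx_conf_sketch *c)
{
    return !c->hw_vlan_extend && !c->hw_ip_checksum &&
           !c->header_split && c->fdir_mode == 0;
}
```

If any one flag is set, the driver falls back to the scalar RX path.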
+
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+As vPMD is focused on high throughput, it processes 4 packets at a time. So it
+assumes that the RX burst size is at least 4 per burst. It returns zero if
+``nb_pkt`` < 4 is used in the receive handler. If ``nb_pkt`` is not a multiple
+of 4, a floor alignment will be applied.
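The floor alignment amounts to masking off the low two bits of the burst size;
a sketch, not the driver's code.

```c
#include <stdint.h>

/* Round a burst size down to a multiple of 4; requests below 4
 * therefore yield 0, matching the behaviour described above. */
static inline uint16_t floor_align4(uint16_t nb_pkts)
{
    return (uint16_t)(nb_pkts & ~(uint16_t)3);
}
```

So a request for 7 packets is served as 4, and a request for 3 returns none.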
+
+
+TX Constraint
+~~~~~~~~~~~~~
+
+Features not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when ``txq_flags`` is set to ``FM10K_SIMPLE_TX_FLAG``.
+This means that it does not support TX multi-segment, VLAN offload or TX
+checksum offload. The following macros are used for these three features:
+
+*   ``ETH_TXQ_FLAGS_NOMULTSEGS``
+
+*   ``ETH_TXQ_FLAGS_NOVLANOFFL``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMSCTP``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMUDP``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMTCP``
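A sketch of how such a flag check works; the bit values below are placeholders
standing in for the real ``ETH_TXQ_FLAGS_*`` macros from ``rte_ethdev.h``,
and the combined mask is an assumption for illustration.

```c
#include <stdint.h>

/* Placeholder bit values, NOT the real ETH_TXQ_FLAGS_* definitions. */
#define NOMULTSEGS  0x0001u
#define NOVLANOFFL  0x0100u
#define NOXSUMSCTP  0x0200u
#define NOXSUMUDP   0x0400u
#define NOXSUMTCP   0x0800u

/* For the vector TX path, all of the "no offload" bits must be set:
 * multi-segment, VLAN offload and checksum offloads are all disabled. */
#define SIMPLE_TX_FLAG \
    (NOMULTSEGS | NOVLANOFFL | NOXSUMSCTP | NOXSUMUDP | NOXSUMTCP)

static int vector_tx_allowed(uint32_t txq_flags)
{
    return (txq_flags & SIMPLE_TX_FLAG) == SIMPLE_TX_FLAG;
}
```

Any queue configured with fewer restrictions than the full mask falls back to
the scalar TX path.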
+
 Limitations
 -----------