[v4,00/34] Implementation of revised ml/cnxk driver

Message ID 20231017165951.27299-1-syalavarthi@marvell.com (mailing list archive)

Message

Srikanth Yalavarthi Oct. 17, 2023, 4:59 p.m. UTC
This patch series implements a revised ml/cnxk driver that supports
models compiled with the TVM compiler framework. TVM models use a
hybrid execution mode, with some regions of the model executing on
the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
for executing multiple regions within a TVM model.
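
As a rough illustration of the hybrid mode described above, the sketch below walks the regions of a model in order and dispatches each one to either the accelerator or the CPU. All names here (`layer_target`, `layer`, `run_hybrid_model`) are hypothetical and are not taken from the driver; a real driver would enqueue a job or call a compiled function, whereas this sketch only counts where each region would run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where each region (layer) of a hybrid model runs. */
enum layer_target {
	LAYER_TARGET_ACCEL, /* region offloaded to the ML accelerator */
	LAYER_TARGET_CPU,   /* region executed on CPU cores */
};

/* Hypothetical per-region descriptor. */
struct layer {
	const char *name;
	enum layer_target target;
};

/* Number of regions dispatched to each target. */
struct run_counts {
	size_t accel;
	size_t cpu;
};

/*
 * Walk the regions in order and dispatch each one to its target.
 * Only the dispatch decision is modelled; no work is executed.
 */
static struct run_counts
run_hybrid_model(const struct layer *layers, size_t nb_layers)
{
	struct run_counts counts = {0, 0};
	size_t i;

	assert(layers != NULL || nb_layers == 0);

	for (i = 0; i < nb_layers; i++) {
		if (layers[i].target == LAYER_TARGET_ACCEL)
			counts.accel++;
		else
			counts.cpu++;
	}

	return counts;
}
```

A model with two accelerator regions separated by one CPU region would report `accel == 2` and `cpu == 1`.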

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fixed xstats reporting
  - Fixed issues reported by the Klocwork static analysis tool
  - Updated external header inclusions

v1:
  - Initial changes
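
The v3 items about the stubs file and the reduced use of the build macro describe a common C pattern, sketched below under stated assumptions. The macro `SKETCH_ENABLE_TVM` stands in for `RTE_MLDEV_CNXK_ENABLE_MVTVM`, and the function and struct names are hypothetical. The return value `-ENOTSUP` is also an assumption, not a statement of what the driver's stubs return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical build switch; leave undefined to select the stubs. */
/* #define SKETCH_ENABLE_TVM 1 */

/* Hypothetical model handle. */
struct sketch_model {
	int id;
};

#ifdef SKETCH_ENABLE_TVM
/* Real implementation, compiled only when TVM support is enabled. */
static int
sketch_tvm_model_load(struct sketch_model *model)
{
	assert(model != NULL);
	model->id = 1;
	return 0;
}
#else
/* Stub with the same signature, used when TVM support is disabled. */
static int
sketch_tvm_model_load(struct sketch_model *model)
{
	(void)model;
	return -ENOTSUP; /* assumed error code for "not built in" */
}
#endif

/* Common code calls the function unconditionally: no #ifdef here. */
static int
sketch_model_load(struct sketch_model *model)
{
	return sketch_tvm_model_load(model);
}
```

With this layout the conditional compilation is confined to one place, and callers such as `sketch_model_load()` need no macro checks of their own.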

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst             |  111 +-
 doc/guides/rel_notes/release_23_11.rst |    4 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  401 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   79 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 30 files changed, 6173 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h
  

Comments

Jerin Jacob Oct. 18, 2023, 1:56 a.m. UTC | #1
On Tue, Oct 17, 2023 at 10:30 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> This patch series is an implementation of revised ml/cnxk driver
> to support models compiled with TVM compiler framework. TVM models
> use a hybrid mode for execution, with regions of the model executing
> on the ML accelerator and the rest executing on CPU cores.
>
> This series of commits reorganizes the ml/cnxk driver and adds support
> to execute multiple regions with-in a TVM model.
>

Found the following build error (may be due to gcc13) at this patch:

ml/cnxk: enable OCM check for multilayer TVM model

[2389/2660] Compiling C object
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o
FAILED: drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o
ccache gcc -Idrivers/libtmp_rte_ml_cnxk.a.p -Idrivers -I../drivers
-Idrivers/ml/cnxk -I../drivers/ml/cnxk -Ilib/mldev -I../lib/mldev -I.
-I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include
-Ilib/eal/linux/include -I../lib/eal/linux/include
-Ilib/eal/x86/include -I../lib/eal/x86/include
-Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal
-Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics
-I../lib/metrics -Ilib/telemetry -I../lib/telemetry
-Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/mbuf
-I../lib/mbuf -Idrivers/common/cnxk -I../drivers/common/cnxk
-Idrivers/bus/pci -I../drivers/bus/pci -Ilib/net -I../lib/net
-Ilib/ethdev -I../lib/ethdev -Ilib/meter -I../lib/meter
-Ilib/pci -I../lib/pci -I../drivers/bus/pci/linux -Ilib/security
-I../lib/security -Ilib/cryptodev -I../lib/cryptodev -Ilib/rcu
-I../lib/rcu -Ilib/hash -I../lib/hash -fdiagnostics-color=always
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Wextra -Werror -std=c11 -O2 -g -include rte_config.h -Wcast-qual
-Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security
-Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare
 -Wstrict-prototypes -Wundef -Wwrite-strings
-Wno-address-of-packed-member -Wno-packed-not-aligned
-Wno-missing-field-initializers -Wno-zero-length-bounds -D_GNU_SOURCE
-fPIC -march=native -mrtm -DALLOW_EXPERIMENTAL_API
-DALLOW_INTERNAL_API
 -Wno-format-truncation -DCNXK_ML_DEV_DEBUG
-DRTE_LOG_DEFAULT_LOGTYPE=pmd.ml.cnxk -MD -MQ
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o -MF
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o.d -o
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o -c
../drivers/ml/cnxk/cnxk_ml_ops.c
../drivers/ml/cnxk/cnxk_ml_ops.c: In function ‘cnxk_ml_model_load’:
../drivers/ml/cnxk/cnxk_ml_ops.c:527:18: error: ‘struct cnxk_ml_model’
has no member named ‘type’
  527 |         if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
      |                  ^~
../drivers/ml/cnxk/cnxk_ml_ops.c:527:28: error:
‘ML_CNXK_MODEL_TYPE_GLOW’ undeclared (first use in this function); did
you mean ‘ML_CNXK_MODEL_STATE_LOADED’?
  527 |         if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
      |                            ^~~~~~~~~~~~~~~~~~~~~~~
      |                            ML_CNXK_MODEL_STATE_LOADED
../drivers/ml/cnxk/cnxk_ml_ops.c:527:28: note: each undeclared
identifier is reported only once for each function it appears in
../drivers/ml/cnxk/cnxk_ml_ops.c:549:26: error: ‘struct cnxk_ml_model’
has no member named ‘type’
  549 |                 if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
      |                          ^~
../drivers/ml/cnxk/cnxk_ml_ops.c:568:26: error: ‘struct cnxk_ml_model’
has no member named ‘type’
  568 |                 if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
      |                          ^~
[2390/2660] Generating drivers/rte_bus_dpaa.sym_chk with a custom
command (wrapped by meson to capture output)
[2391/2660] Generating drivers/rte_bus_fslmc.sym_chk with a custom
command (wrapped by meson to capture output)
[2392/2660] Generating lib/pipeline.sym_chk with a custom command
(wrapped by meson to capture output)
[2393/2660] Generating lib/ethdev.sym_chk with a custom command
(wrapped by meson to capture output)
[2394/2660] Generating lib/eal.sym_chk with a custom command (wrapped
by meson to capture output)
[2395/2660] Generating drivers/rte_common_sfc_efx.sym_chk with a
custom command (wrapped by meson to capture output)
[2396/2660] Generating drivers/rte_common_cnxk.sym_chk with a custom
command (wrapped by meson to capture output)
ninja: build stopped: subcommand failed.
  
Srikanth Yalavarthi Oct. 18, 2023, 6:55 a.m. UTC | #2
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 18 October 2023 07:26
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>
> Subject: [EXT] Re: [PATCH v4 00/34] Implementation of revised ml/cnxk
> driver
> 
> On Tue, Oct 17, 2023 at 10:30 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > This patch series is an implementation of revised ml/cnxk driver to
> > support models compiled with TVM compiler framework. TVM models use a
> > hybrid mode for execution, with regions of the model executing on the
> > ML accelerator and the rest executing on CPU cores.
> >
> > This series of commits reorganizes the ml/cnxk driver and adds support
> > to execute multiple regions with-in a TVM model.
> >
> 
> Found following build error (may be due to gcc13)

The issue is due to the order of the patches in the series. This patch is incorrectly ordered and should be applied after
"ml/cnxk: add structures to support TVM model type".

I have fixed this and tested building all patches. No issues are observed now.

Submitted v5 with the required changes.
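
For readers following the compiler output, the sketch below shows the kind of definitions the failing code needs to see before it can compile. Only the member name `type` and the constant `ML_CNXK_MODEL_TYPE_GLOW` come from the error log; the enum name, the other enumerators, the struct layout and the helper `model_is_glow()` are assumptions for illustration and are not the driver's actual headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed enum; only ML_CNXK_MODEL_TYPE_GLOW appears in the error log. */
enum sketch_ml_cnxk_model_type {
	ML_CNXK_MODEL_TYPE_UNKNOWN, /* assumed */
	ML_CNXK_MODEL_TYPE_GLOW,
	ML_CNXK_MODEL_TYPE_TVM,     /* assumed */
};

/* Assumed layout; the log only shows that a 'type' member is expected. */
struct cnxk_ml_model {
	enum sketch_ml_cnxk_model_type type;
};

/*
 * Hypothetical helper mirroring the check at cnxk_ml_ops.c:527.
 * It compiles only if both the member and the constant are already
 * defined, which is why the patch that adds them must come first.
 */
static int
model_is_glow(const struct cnxk_ml_model *model)
{
	assert(model != NULL);
	return model->type == ML_CNXK_MODEL_TYPE_GLOW;
}
```

If a patch that reads `model->type` is applied before the patch that introduces the member, a build stopped at that intermediate commit fails as in the log above, even though the complete series builds.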
