From patchwork Tue Oct 17 16:59:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Srikanth Yalavarthi
X-Patchwork-Id: 132802
X-Patchwork-Delegate: jerinj@marvell.com
From: Srikanth Yalavarthi
To: Srikanth Yalavarthi
Subject: [PATCH v4 33/34] ml/cnxk: enable fast-path ops for TVM models
Date: Tue, 17 Oct 2023 09:59:46 -0700
Message-ID: <20231017165951.27299-34-syalavarthi@marvell.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231017165951.27299-1-syalavarthi@marvell.com>
References: <20230830155927.3566-1-syalavarthi@marvell.com>
 <20231017165951.27299-1-syalavarthi@marvell.com>
List-Id: DPDK patches and discussions

From: Anup Prabhu

Enable fast-path ops support for TVM models. Models use TVMDP library
function calls to execute inference operations for the Hybrid and LLVM
model sub-types. For TVM MRVL model sub-types that have a single MRVL
layer, the driver enqueues inference requests directly to hardware.

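To illustrate the split described above, here is a minimal sketch of picking a
per-model enqueue handler once, at load time, so the fast path only follows a
function pointer. This is not the driver code: model_subtype, path, model,
enqueue_hw, enqueue_tvmdp and model_set_fast_path are hypothetical stand-ins,
and the real handlers take driver-specific arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's TVM model sub-types. */
enum model_subtype { SUBTYPE_TVM_MRVL, SUBTYPE_TVM_HYBRID, SUBTYPE_TVM_LLVM };

/* Which execution path a handler represents (illustration only). */
enum path { PATH_HW_QUEUE, PATH_TVMDP };

struct model;
typedef enum path (*enqueue_fn)(const struct model *m);

struct model {
	enum model_subtype subtype;
	enqueue_fn enqueue_single; /* chosen once, when the model is loaded */
};

/* Stand-in for the path that hands the request directly to hardware. */
static enum path enqueue_hw(const struct model *m)
{
	(void)m;
	return PATH_HW_QUEUE;
}

/* Stand-in for the path that runs the model through the TVMDP library. */
static enum path enqueue_tvmdp(const struct model *m)
{
	(void)m;
	return PATH_TVMDP;
}

/* Select the handler from the sub-type; every non-MRVL sub-type uses TVMDP. */
static void model_set_fast_path(struct model *m)
{
	assert(m != NULL);
	m->enqueue_single = (m->subtype == SUBTYPE_TVM_MRVL) ? enqueue_hw : enqueue_tvmdp;
}
```

Calling model_set_fast_path() on a model whose subtype is SUBTYPE_TVM_MRVL
makes m.enqueue_single(&m) take the hardware path; Hybrid and LLVM models take
the TVMDP path. Resolving the choice at load time keeps the per-op code free
of sub-type checks.
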
Signed-off-by: Anup Prabhu
Signed-off-by: Srikanth Yalavarthi
---
 doc/guides/rel_notes/release_23_11.rst |   4 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h           |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  20 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 9 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 8701350b2e..ba4d162287 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -28,6 +28,10 @@ New Features
 
   Added support in mldev library for models with multiple inputs and outputs.
 
+ * **Added support for Marvell TVM models in ML CNXK driver.**
+
+  Added support for models compiled using TVM framework in ML CNXK driver.
+
 
 .. This section should contain new features added in this release.
    Sample format:
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index fd2c46ac1f..608e9fc4ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c12f584d5..1dfd0d176a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 776675843a..1e74b82a0a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* End ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
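
For readers following the latency accounting in mvtvm_ml_result_update(), the
sketch below isolates the same running min/max/total pattern, including the
re-arming of min and max on the first sample after a reset. It is only an
illustration under stated assumptions: struct latency_stats and
latency_stats_update() are hypothetical names that do not exist in the driver,
and the timestamps are assumed to be monotonic nanosecond values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down counterpart of the per-queue-pair burst stats. */
struct latency_stats {
	uint64_t latency_tot;    /* sum of all sampled latencies */
	uint64_t latency_min;    /* smallest latency since the last reset */
	uint64_t latency_max;    /* largest latency since the last reset */
	uint64_t dequeued_count; /* number of samples recorded so far */
	uint64_t reset_count;    /* value of dequeued_count at the last reset */
};

/* Record one completed inference that ran from start_ns to end_ns. */
static void latency_stats_update(struct latency_stats *s, uint64_t start_ns, uint64_t end_ns)
{
	uint64_t latency;

	assert(s != NULL);
	assert(end_ns >= start_ns);

	/* First sample after a reset: re-arm min/max so stale values are dropped. */
	if (s->dequeued_count == s->reset_count) {
		s->latency_min = UINT64_MAX;
		s->latency_max = 0;
	}

	latency = end_ns - start_ns;
	s->latency_tot += latency;
	if (latency < s->latency_min)
		s->latency_min = latency;
	if (latency > s->latency_max)
		s->latency_max = latency;
	s->dequeued_count++;
}
```

With a zero-initialized struct, recording the samples (100, 150) and (200, 220)
leaves latency_min at 20, latency_max at 50 and latency_tot at 70. An average
can then be derived as latency_tot / dequeued_count by whoever reads the stats,
which keeps the per-op update free of divisions.
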