From patchwork Tue Feb 7 16:07:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Srikanth Yalavarthi
X-Patchwork-Id: 123349
X-Patchwork-Delegate: thomas@monjalon.net
From: Srikanth Yalavarthi
To: Srikanth Yalavarthi
Subject: [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page
Date: Tue, 7 Feb 2023 08:07:19 -0800
Message-ID: <20230207160719.1307-40-syalavarthi@marvell.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230207160719.1307-1-syalavarthi@marvell.com>
References: <20221208200220.20267-1-syalavarthi@marvell.com>
 <20230207160719.1307-1-syalavarthi@marvell.com>
List-Id: DPDK patches and discussions

Enabled support for configurable OCM page size. A new device
argument "ocm_page_size" is added to specify the page size for
OCM management. Supported page sizes are 1KB, 2KB, 4KB, 8KB and
16KB. Default page size is 16KB.

Signed-off-by: Srikanth Yalavarthi
Acked-by: Shivah Shankar S
Acked-by: Prince Takkar
---
 doc/guides/mldevs/cnxk.rst       | 16 +++++++++
 drivers/ml/cnxk/cn10k_ml_dev.c   | 61 ++++++++++++++++++++++++++++----
 drivers/ml/cnxk/cn10k_ml_dev.h   |  3 ++
 drivers/ml/cnxk/cn10k_ml_model.c |  6 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.c   | 18 +++++++---
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 14 +++-----
 drivers/ml/cnxk/cn10k_ml_ops.c   | 17 ++++++---
 7 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index da40336299..f7f61e8bfa 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -175,6 +175,22 @@ Runtime Config Options
   With the above configuration, ML cnxk driver is configured to use ML registers
   for polling in fastpath requests.
 
+- ``OCM page size`` (default ``16384``)
+
+  Option to specify the page size in bytes to be used for OCM management. Available
+  OCM is split into multiple pages of specified sizes and the pages are allocated to
+  the models. The parameter ``ocm_page_size`` ``devargs`` is used to specify the page
+  size to be used.
+
+  Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB. Default
+  page size is 16 KB.
+
+  For example::
+
+     -a 0000:00:10.0,ocm_page_size=8192
+
+  With the above configuration, page size of OCM is set to 8192 bytes / 8 KB.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index a746a66849..6f9a1015a6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -24,6 +24,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE "ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK "hw_queue_lock"
 #define CN10K_ML_FW_POLL_MEM "poll_mem"
+#define CN10K_ML_OCM_PAGE_SIZE "ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -32,6 +33,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT "lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT 1
 #define CN10K_ML_FW_POLL_MEM_DEFAULT "ddr"
+#define CN10K_ML_OCM_PAGE_SIZE_DEFAULT 16384
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME "ml_cn10k_fw_mz"
@@ -53,8 +55,12 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 CN10K_ML_FW_POLL_MEM,
+					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
+/* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
+static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
@@ -95,12 +101,15 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
+	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool poll_mem_set = false;
 	bool fw_path_set = false;
 	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
+	bool found;
+	uint8_t i;
 
 	if (devargs == NULL)
 		goto check_args;
@@ -191,6 +200,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		poll_mem_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
+					 &mldev->ocm_page_size);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_page_size_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -272,6 +292,32 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
 
+	if (!ocm_page_size_set) {
+		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+	} else {
+		if (mldev->ocm_page_size < 0) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
+				mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		found = false;
+		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
+			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -814,10 +860,11 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=" CN10K_ML_DEV_HW_QUEUE_LOCK
+			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=" CN10K_ML_OCM_PAGE_SIZE
+			      "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 966d92e027..b4e46899c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -406,6 +406,9 @@ struct cn10k_ml_dev {
 	/* Use spinlock version of ROC enqueue */
 	int hw_queue_lock;
 
+	/* OCM page size */
+	int ocm_page_size;
+
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 0ded355d81..ceffde8459 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -339,11 +339,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			ML_CN10K_OCM_NUMPAGES);
+			mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -352,7 +352,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 */
 	if (!metadata->model.ocm_relocatable)
 		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 551faef7eb..d8d2c71a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -220,13 +220,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
-	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
 	uint16_t used_scratch_pages_max;
 	uint16_t scratch_page_start;
 	int used_last_wb_page_max;
 	uint16_t scratch_page_end;
 	uint8_t search_start_tile;
 	uint8_t search_end_tile;
+	uint8_t *local_ocm_mask;
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
@@ -268,6 +268,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		search_end_tile = start_tile;
 	}
 
+	/* nibbles + prefix '0x' */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+
 	tile_start = search_start_tile;
 start_search:
 	used_scratch_pages_max = 0;
@@ -279,7 +282,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -332,6 +335,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	if (wb_page_start != -1)
 		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
 
+	rte_free(local_ocm_mask);
+
 	return wb_page_start;
 }
 
@@ -480,7 +485,7 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char str[ML_CN10K_OCM_NUMPAGES / 4 + 2]; /* nibbles + prefix '0x' */
+	char *str;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
@@ -490,12 +495,15 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 	mldev = dev->data->dev_private;
 	ocm = &mldev->ocm;
 
+	/* nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+
 	fprintf(fp, "OCM State:\n");
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
 			wb_pages +=
 				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
@@ -506,4 +514,6 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
 			ocm->tile_ocm_info[tile_id].last_wb_page, str);
 	}
+
+	rte_free(str);
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 5f018b410a..3404e7fd65 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,25 +8,16 @@
 #include
 #include
 
-/* Page size in bytes. */
-#define ML_CN10K_OCM_PAGESIZE 0x4000
-
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
 /* OCM in bytes, per tile. */
 #define ML_CN10K_OCM_TILESIZE 0x100000
 
-/* OCM pages, per tile. */
-#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
-
-/* Maximum OCM mask words, per tile, 8 bit words. */
-#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
-
 /* OCM and Tile information structure */
 struct cn10k_ml_ocm_tile_info {
 	/* Mask of used / allotted pages on tile's OCM */
-	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+	uint8_t *ocm_mask;
 
 	/* Last pages in the tile's OCM used for weights and bias, default = -1 */
 	int last_wb_page;
@@ -78,6 +69,9 @@ struct cn10k_ml_ocm {
 
 	/* OCM memory info and status*/
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+
+	/* Memory for ocm_mask */
+	uint8_t *ocm_mask;
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 61e6d023c5..5b77e47322 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -311,8 +311,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
-		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -781,12 +781,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	ocm = &mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->page_size = mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
-	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+	/* Allocate memory for ocm_mask */
+	ocm->ocm_mask =
+		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		ocm->tile_ocm_info[tile_id].ocm_mask = ocm->ocm_mask + tile_id * ocm->mask_words;
 		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+	}
 
 	rte_spinlock_init(&ocm->lock);
 
@@ -856,6 +862,9 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Release ocm_mask memory */
+	rte_free(mldev->ocm.ocm_mask);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
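
Note for reviewers: with the page size now chosen at runtime, the per-tile page count and mask size are derived values rather than compile-time constants. The following is a minimal standalone sketch of that arithmetic, assuming the 1 MB tile size (ML_CN10K_OCM_TILESIZE) and the five supported page sizes from this patch. The names ocm_layout, ocm_page_size_is_supported and ocm_layout_init are hypothetical and exist only for illustration; they are not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of OCM per tile; same value as ML_CN10K_OCM_TILESIZE in the patch. */
#define OCM_TILE_SIZE 0x100000u

/* Page sizes accepted by the "ocm_page_size" devarg, in bytes. */
static const int supported_page_sizes[] = {1024, 2048, 4096, 8192, 16384};

/* Hypothetical container for the derived values; not a driver structure. */
struct ocm_layout {
	uint32_t page_size;  /* bytes per page */
	uint32_t num_pages;  /* pages per tile */
	uint32_t mask_words; /* 8-bit mask words per tile, one bit per page */
};

/* Return true when page_size is one of the supported values. */
static bool
ocm_page_size_is_supported(int page_size)
{
	size_t i;

	for (i = 0; i < sizeof(supported_page_sizes) / sizeof(supported_page_sizes[0]); i++) {
		if (page_size == supported_page_sizes[i])
			return true;
	}

	return false;
}

/* Fill layout for the given page size; return 0 on success, -1 on bad input. */
static int
ocm_layout_init(struct ocm_layout *layout, int page_size)
{
	if (layout == NULL || !ocm_page_size_is_supported(page_size))
		return -1;

	layout->page_size = (uint32_t)page_size;
	layout->num_pages = OCM_TILE_SIZE / layout->page_size;
	layout->mask_words = layout->num_pages / 8;

	return 0;
}
```

Under these assumptions, the default 16384-byte page gives 64 pages and 8 mask words per tile, while the smallest 1024-byte page gives 1024 pages and 128 mask words per tile. That growth is the reason the fixed-size ocm_mask array cannot stay sized by a compile-time constant.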