List cover comments
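
The response in this section advertises its pagination through the Link header. As a minimal sketch (not part of the Patchwork project itself), the header could be read as follows. The helper name `parse_link_header` is hypothetical, and the parsing assumes each entry has the form `<url>; rel="name"` with a quoted `rel` as its first parameter, which matches the header shown here but is not guaranteed for every server.

```python
import re

# Link header value copied from the response in this section.
LINK_HEADER = (
    '<http://patches.dpdk.org/api/covers/42498/comments/'
    '?format=api&order=date&page=1>; rel="first", '
    '<http://patches.dpdk.org/api/covers/42498/comments/'
    '?format=api&order=date&page=1>; rel="last"'
)


def parse_link_header(value: str) -> dict[str, str]:
    """Return a mapping of rel name to URL (hypothetical helper).

    Assumes every entry looks like <url>; rel="name"; entries that do
    not match this shape are silently ignored.
    """
    pairs = re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', value)
    return {rel: url for url, rel in pairs}


links = parse_link_header(LINK_HEADER)

# No rel="next" entry and first == last: all comments fit on one page.
single_page = "next" not in links and links.get("first") == links.get("last")
```

For this response `single_page` evaluates to true, so a client would not need to request any further pages to see every comment on the cover letter.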

GET /api/covers/42498/comments/?format=api&order=date
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Link: <http://patches.dpdk.org/api/covers/42498/comments/?format=api&order=date&page=1>; rel="first", <http://patches.dpdk.org/api/covers/42498/comments/?format=api&order=date&page=1>; rel="last"
Vary: Accept
[ { "id": 83804, "web_url": "http://patches.dpdk.org/comment/83804/", "msgid": "<cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/cca8de29-b578-eacc-14c3-b01253885d6c@intel.com", "date": "2018-07-13T17:10:40", "subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "submitter": { "id": 4, "url": "http://patches.dpdk.org/api/people/4/?format=api", "name": "Burakov, Anatoly", "email": "anatoly.burakov@intel.com" }, "content": "On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:\n> This is a proposal to enable using externally allocated memory\n> in DPDK.\n> \n> In a nutshell, here is what is being done here:\n> \n> - Index malloc heaps by NUMA node index, rather than NUMA node itself\n> - Add identifier string to malloc heap, to uniquely identify it\n> - Allow creating named heaps and add/remove memory to/from those heaps\n> - Allocate memseg lists at runtime, to keep track of IOVA addresses\n> of externally allocated memory\n> - If IOVA addresses aren't provided, use RTE_BAD_IOVA\n> - Allow malloc and memzones to allocate from named heaps\n> \n> The responsibility to ensure memory is accessible before using it is\n> on the shoulders of the user - there is no checking done with regards\n> to validity of the memory (nor could there be...).\n> \n> The following limitations are present:\n> \n> - No multiprocess support\n> - No thread safety\n> \n> There is currently no way to allocate memory during initialization\n> stage, so even if multiprocess support is added, it is not guaranteed\n> to work because of underlying issues with mapping fbarrays in\n> secondary processes. 
This is not an issue in single process scenario,\n> but it may be an issue in a multiprocess scenario in case where\n> primary doesn't intend to share the externally allocated memory, yet\n> adding such memory could fail because some other process failed to\n> attach to this shared memory when it wasn't needed.\n> \n> Anatoly Burakov (11):\n> mem: allow memseg lists to be marked as external\n> eal: add function to rerieve socket index by socket ID\n> malloc: index heaps using heap ID rather than NUMA node\n> malloc: add name to malloc heaps\n> malloc: enable retrieving statistics from named heaps\n> malloc: enable allocating from named heaps\n> malloc: enable creating new malloc heaps\n> malloc: allow adding memory to named heaps\n> malloc: allow removing memory from named heaps\n> malloc: allow destroying heaps\n> memzone: enable reserving memory from named heaps\n> \n> config/common_base | 1 +\n> lib/librte_eal/common/eal_common_lcore.c | 15 +\n> lib/librte_eal/common/eal_common_memory.c | 51 +++-\n> lib/librte_eal/common/eal_common_memzone.c | 283 ++++++++++++++----\n> .../common/include/rte_eal_memconfig.h | 5 +-\n> lib/librte_eal/common/include/rte_lcore.h | 19 +-\n> lib/librte_eal/common/include/rte_malloc.h | 158 +++++++++-\n> .../common/include/rte_malloc_heap.h | 2 +\n> lib/librte_eal/common/include/rte_memzone.h | 183 +++++++++++\n> lib/librte_eal/common/malloc_heap.c | 277 +++++++++++++++--\n> lib/librte_eal/common/malloc_heap.h | 26 ++\n> lib/librte_eal/common/rte_malloc.c | 197 +++++++++++-\n> lib/librte_eal/rte_eal_version.map | 10 +\n> 13 files changed, 1118 insertions(+), 109 deletions(-)\n> \n\nSo, now that the RFC is out, i would like to ask a general question.\n\nOne other thing that this patchset is missing, is the ability for data \nstructures (e.g. hash, mempool, etc.) to be allocated from external \nheaps. 
Currently, we can kinda sorta do that with various _init() API's \n(initializing a data structure over already allocated memzone), but this \nis not ideal and is a hassle for anyone using external memory in DPDK.\n\nThere are basically four ways to approach this problem (that i can see).\n\nFirst way is to change \"socket ID\" to mean \"heap ID\" everywhere. This \nhas an upside of having a consistent API to allocate from internal and \nexternal heaps, with little to no API additions, only internal changes \nto account for the fact that \"socket ID\" is now \"heap ID\".\n\nHowever, there is a massive downside to this approach: it is a *giant* \nAPI change, and it's also a giant *ABI-compatible* API change. Meaning, \nreplacing socket ID with heap ID will not cause compile failures for old \ncode, which would result in many subtle bugs in already existing \ncodebases. So, while in the perfect world this would've been my \npreferred approach, realistically i think this is a very, very bad idea.\n\nSecond one is to add a separate \"heap name\" API's to everything. This \nhas an upside of clean separation between allocation from internal and \nexternal heaps. (well, whether it's an upside is debatable...) This is \nthe approach i expected to take when i was creating this patchset.\n\nThe downside is that we have to add new API's to every library and every \nDPDK data structure, to allow explicit allocation from external heaps. \nWe will have to maintain both, and things like hardware drivers will \nneed to have a way to indicate the need to allocate things from a \nparticular external heap.\n\nThe third way is to expose the \"heap ID\" externally, and allow a single, \nunified API to reserve memory. That is, create an API that would map \neither a NUMA node ID or a heap name to an ID, and allow reserving \nmemory through that ID regardless of whether it's internal or external \nmemory. 
This would also allow to gradually phase out socket-based ID's \nin favor of heap ID API, should we choose to do so.\n\nThe downside for this is, it adds a layer of indirection between socket \nID and reserving memory on a particular NUMA node, and it makes it hard \nto produce a single value of \"heap ID\" in such a way as to replicate \ncurrent functionality of allocating with SOCKET_ID_ANY. Most likely user \nwill have to explicitly try to allocate on all sockets, unless we keep \nold API's around in parallel.\n\nFinally, a fourth way would be to abuse the socket ID to also mean \nsomething else, which is an approach i've seen numerous times already, \nand one that i don't like. We could register new heaps as a new, fake \nsocket ID, and use that to address external heaps (each heap would get \nits own socket). So, keep current socket ID behavior, but for \nnon-existent sockets it would be possible to be registered as a fake \nsocket pointing to an external heap.\n\nThe upside for this approach would be that no API changes are required \nwhatsoever to existing libraries - this scheme is compatible with both \ninternal and external heaps without adding a separate API.\n\nThe downside is bad semantics - \"special\" sockets, handling of \nSOCKET_ID_ANY, handling of \"invalid socket\" vs. \"invalid socket that \nhappens to correspond to an existing external heap\", and many other \nthings that can be confusing. I don't like this option, but it's an \noption :)\n\nThoughts? 
Comments?\n\nI myself still favor the second way, however there are good arguments to \nbe made for each of these options.", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [127.0.0.1])\n\tby dpdk.org (Postfix) with ESMTP id E07A92BAE;\n\tFri, 13 Jul 2018 19:10:50 +0200 (CEST)", "from mga06.intel.com (mga06.intel.com [134.134.136.31])\n\tby dpdk.org (Postfix) with ESMTP id D45541C0B\n\tfor <dev@dpdk.org>; Fri, 13 Jul 2018 19:10:48 +0200 (CEST)", "from fmsmga001.fm.intel.com ([10.253.24.23])\n\tby orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t13 Jul 2018 10:10:47 -0700", "from aburakov-mobl.ger.corp.intel.com (HELO [10.252.0.191])\n\t([10.252.0.191])\n\tby fmsmga001.fm.intel.com with ESMTP; 13 Jul 2018 10:10:41 -0700" ], "X-Amp-Result": "SKIPPED(no attachment in message)", "X-Amp-File-Uploaded": "False", "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"5.51,348,1526367600\"; d=\"scan'208\";a=\"72134412\"", "To": "dev@dpdk.org", "Cc": "srinath.mannam@broadcom.com, scott.branden@broadcom.com,\n\tajit.khaparde@broadcom.com, Thomas Monjalon <thomas@monjalon.net>,\n\tShreyansh Jain <shreyansh.jain@nxp.com>,\n\t\"jerin.jacob@caviumnetworks.com\" <jerin.jacob@caviumnetworks.com>,\n\tKeith Wiles <keith.wiles@intel.com>", "References": "<cover.1530881548.git.anatoly.burakov@intel.com>", "From": "\"Burakov, Anatoly\" <anatoly.burakov@intel.com>", "Message-ID": "<cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>", "Date": "Fri, 13 Jul 2018 18:10:40 +0100", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1", "MIME-Version": "1.0", "In-Reply-To": "<cover.1530881548.git.anatoly.burakov@intel.com>", "Content-Type": "text/plain; charset=utf-8; format=flowed", "Content-Language": "en-US", "Content-Transfer-Encoding": "7bit", "Subject": "Re: [dpdk-dev] [RFC 00/11] Support externally 
allocated memory in\n\tDPDK", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 83810, "web_url": "http://patches.dpdk.org/comment/83810/", "msgid": "<08D0040C-78AA-401A-863F-57B38533648F@intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/08D0040C-78AA-401A-863F-57B38533648F@intel.com", "date": "2018-07-13T17:56:40", "subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "submitter": { "id": 166, "url": "http://patches.dpdk.org/api/people/166/?format=api", "name": "Wiles, Keith", "email": "keith.wiles@intel.com" }, "content": "> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:\n> \n> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:\n>> This is a proposal to enable using externally allocated memory\n>> in DPDK.\n>> In a nutshell, here is what is being done here:\n>> - Index malloc heaps by NUMA node index, rather than NUMA node itself\n>> - Add identifier string to malloc heap, to uniquely identify it\n>> - Allow creating named heaps and add/remove memory to/from those heaps\n>> - Allocate memseg lists at runtime, to keep track of IOVA addresses\n>> of externally allocated memory\n>> - If IOVA addresses aren't provided, use RTE_BAD_IOVA\n>> - Allow malloc and memzones to allocate from named heaps\n>> The responsibility to ensure memory is accessible before using it is\n>> on the shoulders of the user - there is no checking 
done with regards\n>> to validity of the memory (nor could there be...).\n>> The following limitations are present:\n>> - No multiprocess support\n>> - No thread safety\n>> There is currently no way to allocate memory during initialization\n>> stage, so even if multiprocess support is added, it is not guaranteed\n>> to work because of underlying issues with mapping fbarrays in\n>> secondary processes. This is not an issue in single process scenario,\n>> but it may be an issue in a multiprocess scenario in case where\n>> primary doesn't intend to share the externally allocated memory, yet\n>> adding such memory could fail because some other process failed to\n>> attach to this shared memory when it wasn't needed.\n>> Anatoly Burakov (11):\n>> mem: allow memseg lists to be marked as external\n>> eal: add function to rerieve socket index by socket ID\n>> malloc: index heaps using heap ID rather than NUMA node\n>> malloc: add name to malloc heaps\n>> malloc: enable retrieving statistics from named heaps\n>> malloc: enable allocating from named heaps\n>> malloc: enable creating new malloc heaps\n>> malloc: allow adding memory to named heaps\n>> malloc: allow removing memory from named heaps\n>> malloc: allow destroying heaps\n>> memzone: enable reserving memory from named heaps\n>> config/common_base | 1 +\n>> lib/librte_eal/common/eal_common_lcore.c | 15 +\n>> lib/librte_eal/common/eal_common_memory.c | 51 +++-\n>> lib/librte_eal/common/eal_common_memzone.c | 283 ++++++++++++++----\n>> .../common/include/rte_eal_memconfig.h | 5 +-\n>> lib/librte_eal/common/include/rte_lcore.h | 19 +-\n>> lib/librte_eal/common/include/rte_malloc.h | 158 +++++++++-\n>> .../common/include/rte_malloc_heap.h | 2 +\n>> lib/librte_eal/common/include/rte_memzone.h | 183 +++++++++++\n>> lib/librte_eal/common/malloc_heap.c | 277 +++++++++++++++--\n>> lib/librte_eal/common/malloc_heap.h | 26 ++\n>> lib/librte_eal/common/rte_malloc.c | 197 +++++++++++-\n>> lib/librte_eal/rte_eal_version.map | 10 
+\n>> 13 files changed, 1118 insertions(+), 109 deletions(-)\n> \n> So, now that the RFC is out, i would like to ask a general question.\n> \n> One other thing that this patchset is missing, is the ability for data structures (e.g. hash, mempool, etc.) to be allocated from external heaps. Currently, we can kinda sorta do that with various _init() API's (initializing a data structure over already allocated memzone), but this is not ideal and is a hassle for anyone using external memory in DPDK.\n> \n> There are basically four ways to approach this problem (that i can see).\n> \n> First way is to change \"socket ID\" to mean \"heap ID\" everywhere. This has an upside of having a consistent API to allocate from internal and external heaps, with little to no API additions, only internal changes to account for the fact that \"socket ID\" is now \"heap ID\".\n> \n> However, there is a massive downside to this approach: it is a *giant* API change, and it's also a giant *ABI-compatible* API change. Meaning, replacing socket ID with heap ID will not cause compile failures for old code, which would result in many subtle bugs in already existing codebases. So, while in the perfect world this would've been my preferred approach, realistically i think this is a very, very bad idea.\n> \n> Second one is to add a separate \"heap name\" API's to everything. This has an upside of clean separation between allocation from internal and external heaps. (well, whether it's an upside is debatable...) This is the approach i expected to take when i was creating this patchset.\n> \n> The downside is that we have to add new API's to every library and every DPDK data structure, to allow explicit allocation from external heaps. 
We will have to maintain both, and things like hardware drivers will need to have a way to indicate the need to allocate things from a particular external heap.\n> \n> The third way is to expose the \"heap ID\" externally, and allow a single, unified API to reserve memory. That is, create an API that would map either a NUMA node ID or a heap name to an ID, and allow reserving memory through that ID regardless of whether it's internal or external memory. This would also allow to gradually phase out socket-based ID's in favor of heap ID API, should we choose to do so.\n> \n> The downside for this is, it adds a layer of indirection between socket ID and reserving memory on a particular NUMA node, and it makes it hard to produce a single value of \"heap ID\" in such a way as to replicate current functionality of allocating with SOCKET_ID_ANY. Most likely user will have to explicitly try to allocate on all sockets, unless we keep old API's around in parallel.\n> \n> Finally, a fourth way would be to abuse the socket ID to also mean something else, which is an approach i've seen numerous times already, and one that i don't like. We could register new heaps as a new, fake socket ID, and use that to address external heaps (each heap would get its own socket). So, keep current socket ID behavior, but for non-existent sockets it would be possible to be registered as a fake socket pointing to an external heap.\n> \n> The upside for this approach would be that no API changes are required whatsoever to existing libraries - this scheme is compatible with both internal and external heaps without adding a separate API.\n> \n> The downside is bad semantics - \"special\" sockets, handling of SOCKET_ID_ANY, handling of \"invalid socket\" vs. \"invalid socket that happens to correspond to an existing external heap\", and many other things that can be confusing. I don't like this option, but it's an option :)\n> \n> Thoughts? 
Comments?\n\n#1 is super clean, but very disruptive to everyone. Very Bad IMO\n#2 is also clean, but adds a lot of new APIs that everyone needs to use or at least in the external heap cases.\n#3 not sure I fully understand it, but reproducing heap IDs for testing is a problem and requires new/old APIs\n\n#4 Very easy to add, IMO it is clean and very small disruption to developers. It does require the special handling, but I feel it is OK and can be explained in the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is normal socket and > 64K is external id.\n\nMy vote would be #4, as it seems the least risk and work. :-)\n\n> \n> I myself still favor the second way, however there are good arguments to be made for each of these options.\n> \n> -- \n> Thanks,\n> Anatoly\n\nRegards,\nKeith", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [127.0.0.1])\n\tby dpdk.org (Postfix) with ESMTP id B9E16292D;\n\tFri, 13 Jul 2018 19:56:46 +0200 (CEST)", "from mga11.intel.com (mga11.intel.com [192.55.52.93])\n\tby dpdk.org (Postfix) with ESMTP id CB70925A1\n\tfor <dev@dpdk.org>; Fri, 13 Jul 2018 19:56:44 +0200 (CEST)", "from orsmga008.jf.intel.com ([10.7.209.65])\n\tby fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t13 Jul 2018 10:56:41 -0700", "from fmsmsx104.amr.corp.intel.com ([10.18.124.202])\n\tby orsmga008.jf.intel.com with ESMTP; 13 Jul 2018 10:56:41 -0700", "from fmsmsx117.amr.corp.intel.com ([169.254.3.3]) by\n\tfmsmsx104.amr.corp.intel.com ([169.254.3.114]) with mapi id\n\t14.03.0319.002; Fri, 13 Jul 2018 10:56:41 -0700" ], "X-Amp-Result": "SKIPPED(no attachment in message)", "X-Amp-File-Uploaded": "False", "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"5.51,348,1526367600\"; d=\"scan'208\";a=\"56685486\"", "From": "\"Wiles, Keith\" <keith.wiles@intel.com>", "To": "\"Burakov, Anatoly\" 
<anatoly.burakov@intel.com>", "CC": "\"dev@dpdk.org\" <dev@dpdk.org>, \"srinath.mannam@broadcom.com\"\n\t<srinath.mannam@broadcom.com>, \"scott.branden@broadcom.com\"\n\t<scott.branden@broadcom.com>, \"ajit.khaparde@broadcom.com\"\n\t<ajit.khaparde@broadcom.com>, Thomas Monjalon <thomas@monjalon.net>, \n\tShreyansh Jain <shreyansh.jain@nxp.com>, \"jerin.jacob@caviumnetworks.com\"\n\t<jerin.jacob@caviumnetworks.com>", "Thread-Topic": "[dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "Thread-Index": "AQHUFSvAEXj6WBxy6U2GppVpk3XrVKSN49oAgAAM2AA=", "Date": "Fri, 13 Jul 2018 17:56:40 +0000", "Message-ID": "<08D0040C-78AA-401A-863F-57B38533648F@intel.com>", "References": "<cover.1530881548.git.anatoly.burakov@intel.com>\n\t<cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>", "In-Reply-To": "<cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>", "Accept-Language": "en-US", "Content-Language": "en-US", "X-MS-Has-Attach": "", "X-MS-TNEF-Correlator": "", "x-originating-ip": "[10.252.133.123]", "Content-Type": "text/plain; charset=\"utf-8\"", "Content-ID": "<AF40E9B2AA9E2A42A35B98905391E579@intel.com>", "Content-Transfer-Encoding": "base64", "MIME-Version": "1.0", "Subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 83954, "web_url": "http://patches.dpdk.org/comment/83954/", "msgid": 
"<MWHPR15MB1918FE5BD47581BA2154942692520@MWHPR15MB1918.namprd15.prod.outlook.com>", "list_archive_url": "https://inbox.dpdk.org/dev/MWHPR15MB1918FE5BD47581BA2154942692520@MWHPR15MB1918.namprd15.prod.outlook.com", "date": "2018-07-19T10:58:22", "subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "submitter": { "id": 1085, "url": "http://patches.dpdk.org/api/people/1085/?format=api", "name": "László Vadkerti", "email": "laszlo.vadkerti@ericsson.com" }, "content": "> On Jul 13, 2018, at 7:57 PM, Wiles, Keith <keith.wiles@intel.com> wrote:\n>\n> > On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:\n> >\n> > On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:\n> >> This is a proposal to enable using externally allocated memory in\n> >> DPDK.\n> >> In a nutshell, here is what is being done here:\n> >> - Index malloc heaps by NUMA node index, rather than NUMA node itself\n> >> - Add identifier string to malloc heap, to uniquely identify it\n> >> - Allow creating named heaps and add/remove memory to/from those\n> >> heaps\n> >> - Allocate memseg lists at runtime, to keep track of IOVA addresses\n> >> of externally allocated memory\n> >> - If IOVA addresses aren't provided, use RTE_BAD_IOVA\n> >> - Allow malloc and memzones to allocate from named heaps The\n> >> responsibility to ensure memory is accessible before using it is on\n> >> the shoulders of the user - there is no checking done with regards to\n> >> validity of the memory (nor could there be...).\n> >> The following limitations are present:\n> >> - No multiprocess support\n> >> - No thread safety\n> >> There is currently no way to allocate memory during initialization\n> >> stage, so even if multiprocess support is added, it is not guaranteed\n> >> to work because of underlying issues with mapping fbarrays in\n> >> secondary processes. 
This is not an issue in single process scenario,\n> >> but it may be an issue in a multiprocess scenario in case where\n> >> primary doesn't intend to share the externally allocated memory, yet\n> >> adding such memory could fail because some other process failed to\n> >> attach to this shared memory when it wasn't needed.\n> >> Anatoly Burakov (11):\n> >> mem: allow memseg lists to be marked as external\n> >> eal: add function to rerieve socket index by socket ID\n> >> malloc: index heaps using heap ID rather than NUMA node\n> >> malloc: add name to malloc heaps\n> >> malloc: enable retrieving statistics from named heaps\n> >> malloc: enable allocating from named heaps\n> >> malloc: enable creating new malloc heaps\n> >> malloc: allow adding memory to named heaps\n> >> malloc: allow removing memory from named heaps\n> >> malloc: allow destroying heaps\n> >> memzone: enable reserving memory from named heaps\n> >> config/common_base | 1 +\n> >> lib/librte_eal/common/eal_common_lcore.c | 15 +\n> >> lib/librte_eal/common/eal_common_memory.c | 51 +++-\n> >> lib/librte_eal/common/eal_common_memzone.c | 283\n> ++++++++++++++----\n> >> .../common/include/rte_eal_memconfig.h | 5 +-\n> >> lib/librte_eal/common/include/rte_lcore.h | 19 +-\n> >> lib/librte_eal/common/include/rte_malloc.h | 158 +++++++++-\n> >> .../common/include/rte_malloc_heap.h | 2 +\n> >> lib/librte_eal/common/include/rte_memzone.h | 183 +++++++++++\n> >> lib/librte_eal/common/malloc_heap.c | 277 +++++++++++++++--\n> >> lib/librte_eal/common/malloc_heap.h | 26 ++\n> >> lib/librte_eal/common/rte_malloc.c | 197 +++++++++++-\n> >> lib/librte_eal/rte_eal_version.map | 10 +\n> >> 13 files changed, 1118 insertions(+), 109 deletions(-)\n> >\n> > So, now that the RFC is out, i would like to ask a general question.\n> >\n> > One other thing that this patchset is missing, is the ability for data\n> structures (e.g. hash, mempool, etc.) 
to be allocated from external heaps.\n> Currently, we can kinda sorta do that with various _init() API's (initializing a\n> data structure over already allocated memzone), but this is not ideal and is a\n> hassle for anyone using external memory in DPDK.\n> >\n> > There are basically four ways to approach this problem (that i can see).\n> >\n> > First way is to change \"socket ID\" to mean \"heap ID\" everywhere. This has\n> an upside of having a consistent API to allocate from internal and external\n> heaps, with little to no API additions, only internal changes to account for the\n> fact that \"socket ID\" is now \"heap ID\".\n> >\n> > However, there is a massive downside to this approach: it is a *giant* API\n> change, and it's also a giant *ABI-compatible* API change. Meaning,\n> replacing socket ID with heap ID will not cause compile failures for old code,\n> which would result in many subtle bugs in already existing codebases. So,\n> while in the perfect world this would've been my preferred approach,\n> realistically i think this is a very, very bad idea.\n> >\n> > Second one is to add a separate \"heap name\" API's to everything. This has\n> an upside of clean separation between allocation from internal and external\n> heaps. (well, whether it's an upside is debatable...) This is the approach i\n> expected to take when i was creating this patchset.\n> >\n> > The downside is that we have to add new API's to every library and every\n> DPDK data structure, to allow explicit allocation from external heaps. We will\n> have to maintain both, and things like hardware drivers will need to have a\n> way to indicate the need to allocate things from a particular external heap.\n> >\n> > The third way is to expose the \"heap ID\" externally, and allow a single,\n> unified API to reserve memory. 
That is, create an API that would map either\n> a NUMA node ID or a heap name to an ID, and allow reserving memory\n> through that ID regardless of whether it's internal or external memory. This\n> would also allow to gradually phase out socket-based ID's in favor of heap ID\n> API, should we choose to do so.\n> >\n> > The downside for this is, it adds a layer of indirection between socket ID\n> and reserving memory on a particular NUMA node, and it makes it hard to\n> produce a single value of \"heap ID\" in such a way as to replicate current\n> functionality of allocating with SOCKET_ID_ANY. Most likely user will have to\n> explicitly try to allocate on all sockets, unless we keep old API's around in\n> parallel.\n> >\n> > Finally, a fourth way would be to abuse the socket ID to also mean\n> something else, which is an approach i've seen numerous times already, and\n> one that i don't like. We could register new heaps as a new, fake socket ID,\n> and use that to address external heaps (each heap would get its own\n> socket). So, keep current socket ID behavior, but for non-existent sockets it\n> would be possible to be registered as a fake socket pointing to an external\n> heap.\n> >\n> > The upside for this approach would be that no API changes are required\n> whatsoever to existing libraries - this scheme is compatible with both internal\n> and external heaps without adding a separate API.\n> >\n> > The downside is bad semantics - \"special\" sockets, handling of\n> > SOCKET_ID_ANY, handling of \"invalid socket\" vs. \"invalid socket that\n> > happens to correspond to an existing external heap\", and many other\n> > things that can be confusing. I don't like this option, but it's an\n> > option :)\n> >\n> > Thoughts? Comments?\n> \n> #1 is super clean, but very disruptive to everyone. 
Very Bad IMO\n> #2 is also clean, but adds a lot of new APIs that everyone needs to use or at\n> least in the external heap cases.\n> #3 not sure I fully understand it, but reproducing heap IDs for testing is a\n> problem and requires new/old APIs\n> \n> #4 Very easy to add, IMO it is clean and very small disruption to developers.\n> It does require the special handling, but I feel it is OK and can be explained in\n> the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is\n> normal socket and > 64K is external id.\n> \n> My vote would be #4, as it seems the least risk and work. :-)\n> \nWe are living with #4 (overloaded socket_ids) since ~5 years now but it indeed generates some confusion and it is a kind of hack so it may not be the best choice going forward in official releases but for sure is the easiest/simplest solution requiring the least modifications.\nUsing an overloaded socket_id is especially disturbing in the dump memory config printout where the user will see multiple socket ids on a single socket system or more than the available real number of sockets, however it could still be explained in the notes and the documentation.\nThe allocation behavior with SOCKET_ID_ANY is also a question as I think it shouldn’t roll over to allocate memory in the external heap, we especially disabled this feature in our implementation. The reason behind is that the external memory may be a limited resource where only explicit allocation requests would be allowed and also in a multi-process environment we may not want all external heaps to be mapped into all other processes address space meaning that not all heaps are accessible from every process (I’m not sure if it is planned to be supported though but it would be an important and useful feature based on our experiences).\nAnyway I think the confusion with this option comes due to the misleading “socket_id” name which would not really mean socket id anymore. 
So we should probably just document it as pseudo socket_id and problem solved with #4 :)\n\nThe cleanest solution in my opinion would be #1 which could be combined with #4 so that the physical socket_id could be directly passed as the heap_id (or rather call it allocation id or just location?) so that backward compatibility could also be kept.\nMeaning to apply #1, change “socket_id” to “heap_id” (or “alloc_id”?) in all functions which are today expecting the socket_id to indicate the location of the allocations but keep a direct mapping from socket_id to heap_id, e.g. as Keith suggested lower range of heap_id would be equivalent to the socket_id and upper range would be the external id, this way even existing applications would still work without changing the code just by passing the socket_id in the heap_id parameter. However it is a question what would happen with the socket_id stored in data structures such as struct rte_mempool where socket_id is stored with a meaning “Socket id passed at create.”\nSOCKET_ID_ANY would only mean lower range of heap_ids (physical socket ids) but not the external heap and if needed a new HEAP_ID_ANY could be introduced.\n\nIf changing heap_id to socket_id in existing functions is a big issue then one option would be to keep the original API and introduce new equivalent functions allowing to use the heap_id instead of the socket_id, e.g. 
rte_mempool_create would have an equivalent function to use with the heap_id instead of the socket_id.\nSocket_id could then be converted to heap_id with a new function which should always be possible and can still use to direct mapping approach with lower/upper range convention.\nThe socket_id based functions would then just be wrappers calling the heap_id equivalent function after converting the socket_id to heap_id.\nUsing socket_id to indicate the location could still be relevant so the old socket_id based functions may not even need to be deprecated unless it would become hard to maintain.\n\nIrrespective of the chosen option, external heaps should be registered/identified by name and there could be a function to fetch/lookup the id (heap_id or pseudo socket_id) by registered heap name which then could be used in the related API calls.\n\nIt would also be another work item to update all the data structures which are storing the socket_id to use it as the location identifier and I think few of them may need to store both the real physical socket_id and the heap_id, e.g. 
in struct lcore_config where the user may want to know the real physical socket id but want to set specific heap_id as the default allocation location for the given lcore.\n\n> >\n> > I myself still favor the second way, however there are good arguments to\n> be made for each of these options.\n> >\n> > --\n> > Thanks,\n> > Anatoly\n> \n> Regards,\n> Keith\n\nThanks,\n Laszlo", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [127.0.0.1])\n\tby dpdk.org (Postfix) with ESMTP id A9361324B;\n\tThu, 19 Jul 2018 13:06:06 +0200 (CEST)", "from sesbmg23.ericsson.net (sesbmg23.ericsson.net [193.180.251.37])\n\tby dpdk.org (Postfix) with ESMTP id ED94831FC\n\tfor <dev@dpdk.org>; Thu, 19 Jul 2018 13:06:04 +0200 (CEST)", "from ESESSMB501.ericsson.se (Unknown_Domain [153.88.183.119])\n\tby sesbmg23.ericsson.net (Symantec Mail Security) with SMTP id\n\tEE.C6.25360.B90705B5; Thu, 19 Jul 2018 13:06:04 +0200 (CEST)", "from ESESSMB504.ericsson.se (153.88.183.165) by\n\tESESSMB501.ericsson.se (153.88.183.162) with Microsoft SMTP Server\n\t(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id\n\t15.1.1466.3; Thu, 19 Jul 2018 12:58:28 +0200", "from NAM01-BN3-obe.outbound.protection.outlook.com (153.88.183.157)\n\tby ESESSMB504.ericsson.se (153.88.183.165) with Microsoft SMTP Server\n\t(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id\n\t15.1.1466.3 via Frontend Transport; Thu, 19 Jul 2018 12:58:28 +0200", "from MWHPR15MB1918.namprd15.prod.outlook.com (10.174.100.143) by\n\tMWHPR15MB1437.namprd15.prod.outlook.com (10.173.234.139) with\n\tMicrosoft SMTP Server (version=TLS1_2,\n\tcipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id\n\t15.20.973.20; Thu, 19 Jul 2018 10:58:22 +0000", "from MWHPR15MB1918.namprd15.prod.outlook.com\n\t([fe80::cd75:9f8:91f7:436d]) 
by\n\tMWHPR15MB1918.namprd15.prod.outlook.com\n\t([fe80::cd75:9f8:91f7:436d%2]) with mapi id 15.20.0973.018;\n\tThu, 19 Jul 2018 10:58:22 +0000" ], "DKIM-Signature": [ "v=1; a=rsa-sha256; d=ericsson.com; s=mailgw201801;\n\tc=relaxed/simple; q=dns/txt; i=@ericsson.com; t=1531998364;\n\th=From:Sender:Reply-To:Subject:Date:Message-ID:To:CC:MIME-Version:Content-Type:\n\tContent-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:Resent-From:\n\tResent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id:\n\tList-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;\n\tbh=Bl5xpJfb1LuwqczK+nLMbO5gveGmRuQIfzNznQNFqfs=;\n\tb=LtwmMePiLGsrCtzsLVlLA/vnXWQ/Ut32AaNzOOnMJwS4VEfoVXKq0ZiHwXxVLwnf\n\tQtsybpXo3hIfuwMeeq/bkz9mBHp6a31ZNk0KRWAu3Vvt9e0oPQdXygt8MBvsZh+p\n\tT9Ib9xwtWvFMbaPvl8KUYL9l+RZLHqjW7Hm0t6QBgP8=;", "v=1; a=rsa-sha256; c=relaxed/relaxed; d=ericsson.com;\n\ts=selector1;\n\th=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;\n\tbh=Bl5xpJfb1LuwqczK+nLMbO5gveGmRuQIfzNznQNFqfs=;\n\tb=HZ7FxLEqz0N7ic37QQm26d9TJAIZ8tNM9USnL7yHmBGlJL1Noa6pB1wX3m1knzlOclV/iJ87brRIpbRJ9K7pepJvQy4W8bpC8IOuE8+9nV5qYS21SCURVH4QyRbQHeqJ1bW3LJlM+T2pi2vcNDKfwhVCLEZ0Ev0uC62OIx3Z1rk=" ], "X-AuditID": "c1b4fb25-202c69c000006310-38-5b50709b0d2f", "From": "=?utf-8?b?TMOhc3psw7MgVmFka2VydGk=?= <laszlo.vadkerti@ericsson.com>", "To": "\"Wiles, Keith\" <keith.wiles@intel.com>, \"Burakov, Anatoly\"\n\t<anatoly.burakov@intel.com>", "CC": "\"dev@dpdk.org\" <dev@dpdk.org>, \"srinath.mannam@broadcom.com\"\n\t<srinath.mannam@broadcom.com>, \"scott.branden@broadcom.com\"\n\t<scott.branden@broadcom.com>, \"ajit.khaparde@broadcom.com\"\n\t<ajit.khaparde@broadcom.com>, Thomas Monjalon <thomas@monjalon.net>, \n\tShreyansh Jain <shreyansh.jain@nxp.com>, \"jerin.jacob@caviumnetworks.com\"\n\t<jerin.jacob@caviumnetworks.com>", "Thread-Topic": "[dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "Thread-Index": 
"AQHUFSvEBvnznCHvikSa7AMdqi2eJ6SNboEAgAAM2wCACPStgA==", "Date": "Thu, 19 Jul 2018 10:58:22 +0000", "Message-ID": "<MWHPR15MB1918FE5BD47581BA2154942692520@MWHPR15MB1918.namprd15.prod.outlook.com>", "References": "<cover.1530881548.git.anatoly.burakov@intel.com>\n\t<cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>\n\t<08D0040C-78AA-401A-863F-57B38533648F@intel.com>", "In-Reply-To": "<08D0040C-78AA-401A-863F-57B38533648F@intel.com>", "Accept-Language": "en-US", "Content-Language": "en-US", "X-MS-Has-Attach": "", "X-MS-TNEF-Correlator": "", "authentication-results": "spf=none (sender IP is )\n\tsmtp.mailfrom=laszlo.vadkerti@ericsson.com; ", "x-originating-ip": "[80.99.119.114]", "x-ms-publictraffictype": "Email", "x-microsoft-exchange-diagnostics": "1; MWHPR15MB1437;\n\t6:ofMZHkehgaZ6jziIHX4YOF7rUtxmw9Y7k3zY8rtu8yCTgh7isDJks2nxNbdoYj5dXsggFjILxd/tI21RJz/NgKir8KMx6vNxDE3tktbP7kivScEW1WkA+SMxBDWhma9OH4TWwTY+Nvl4FB7YFiqRWoxomAsBjq2rJAMhAi/Y8IKrmvmdYvj7sJUIsGusGOkDZen+Vre4w0X7UTI+c8Rm9P4M3J/oCewP6Ec0FRjyU43NRf+k73ImyULohEQoEFIdUefArUJrZI/iaYVyCvowI7mMliUiSzhPhSOaYDlN2eWiLJFKjE4Wte0ZHU7eiFZv9OR1VUEhorEJ0tYMjk6oSSbOXsUgq2ZgEWVUQdLLkwfFOXW9tjstl3fizycjbpo3I/YSNAlH0HKMMgXJRzJp2HdbrU0gwbLyVit+07XDpYU8nTbXVR4vsOse7PqjMftg0VOkSmV1ElCvVu3u4zDDkA==;\n\t5:ZfhW1e/ECzYuG7Np+rIQwAN/EKL7lqnzQeLcJLaUOcKIMpu6bzqeL9Rgxxo+Ku0EZRyV2wG/mC5Abg2ZVeU2E4RtTqIM4A8f12aNyZibKbldAQhraTbyvZMn2rIKU+78jHiUTbUSPmSSTY/iONlcpFW2H5OJyr7J1gqBjfYupUA=;\n\t7:3FtlgQWELfmaZL4ZVG1KciWxBdvuzF9EgdzyeZYLw3LPkNdslc4hbadw/oKTyMM96FIqXzE4A7tkwI7z2TvYErEQCoy5AkEfoeLlp0zeVFc51YiAQ3nj2v/NwcWwVJiDxVkvbEO+gJpfEcH/fw2Dax/aanHKHUw6pNwob8ortAYV5pHLgy2fZq96nISwlDpPC1Gsi9oIyLazfRYMZ62rsvcAeRNx6UZwSPdcpr2Gidco4fpxdwHHoXAB3LIp/WBI", "x-ms-exchange-antispam-srfa-diagnostics": "SOS;", "x-ms-office365-filtering-correlation-id": "f7da1984-21c6-4241-7ffa-08d5ed668b6f", "x-microsoft-antispam": "UriScan:; BCL:0; 
PCL:0;\n\tRULEID:(7020095)(4652040)(8989117)(4534165)(4627221)(201703031133081)(201702281549075)(8990107)(5600053)(711020)(2017052603328)(7153060)(7193020);\n\tSRVR:MWHPR15MB1437; ", "x-ms-traffictypediagnostic": "MWHPR15MB1437:", "x-microsoft-antispam-prvs": "<MWHPR15MB14370957CF60ACA53C04915592520@MWHPR15MB1437.namprd15.prod.outlook.com>", "x-exchange-antispam-report-test": "UriScan:(278428928389397)(228905959029699); ", "x-ms-exchange-senderadcheck": "1", "x-exchange-antispam-report-cfa-test": "BCL:0; PCL:0;\n\tRULEID:(8211001083)(6040522)(2401047)(5005006)(8121501046)(10201501046)(93006095)(93001095)(3002001)(3231311)(944501410)(52105095)(149027)(150027)(6041310)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123560045)(20161123562045)(20161123558120)(20161123564045)(6072148)(201708071742011)(7699016);\n\tSRVR:MWHPR15MB1437; BCL:0; PCL:0; RULEID:; SRVR:MWHPR15MB1437; ", "x-forefront-prvs": "0738AF4208", "x-forefront-antispam-report": "SFV:NSPM;\n\tSFS:(10009020)(136003)(396003)(366004)(376002)(39860400002)(346002)(189003)(199004)(6116002)(3846002)(8676002)(86362001)(6436002)(256004)(14444005)(5024004)(186003)(81156014)(26005)(81166006)(2900100001)(8936002)(76176011)(9686003)(6506007)(53546011)(561944003)(99286004)(54906003)(33656002)(110136005)(316002)(446003)(11346002)(102836004)(55016002)(229853002)(14454004)(5250100002)(7696005)(478600001)(74316002)(6246003)(4326008)(105586002)(85202003)(106356001)(97736004)(7736002)(85182001)(305945005)(53936002)(5660300001)(68736007)(2906002)(476003)(486006)(25786009)(66066001);\n\tDIR:OUT; SFP:1101; SCL:1; SRVR:MWHPR15MB1437;\n\tH:MWHPR15MB1918.namprd15.prod.outlook.com; FPR:; SPF:None; LANG:en;\n\tPTR:InfoNoRecords; MX:1; A:1; ", "received-spf": "None (protection.outlook.com: ericsson.com does not designate\n\tpermitted sender hosts)", "x-microsoft-antispam-message-info": 
"VRf6CoH408kfN9nNcP+cTiC7taJujezif2yBI4QMclSjDNKjYXrQs0NqmnlfM0BwcKxaES27ZhT84avZThhIN+P2WYjdpz9MvUyTO+c7RKUdRXns2Z5//ZMwJgVriTfR6GryvT3T7rzAEbf9orjLvTEmLta6/nDzr8jyth2spU+adend8+ZbNfO7bXjT31/2ZwDAqe0jt/TZB+EMuk9HZ9vFOL49EZUkShcmDVMQvTobItDXRVy+7z96YAi67Yve+azofkFJAP1eYVsQR3n/XKG9D2dVMHN/D81vJkUhwd23zr7Dsz9UsBt/wC+E6Fsgz2vVZpGjE84d+xgJyHjEVwye71CZ6Ap/gijyY3j16HY=", "spamdiagnosticoutput": "1:99", "spamdiagnosticmetadata": "NSPM", "Content-Type": "text/plain; charset=\"utf-8\"", "Content-Transfer-Encoding": "base64", "MIME-Version": "1.0", "X-MS-Exchange-CrossTenant-Network-Message-Id": "f7da1984-21c6-4241-7ffa-08d5ed668b6f", "X-MS-Exchange-CrossTenant-originalarrivaltime": "19 Jul 2018 10:58:22.2861\n\t(UTC)", "X-MS-Exchange-CrossTenant-fromentityheader": "Hosted", "X-MS-Exchange-CrossTenant-id": "92e84ceb-fbfd-47ab-be52-080c6b87953f", "X-MS-Exchange-Transport-CrossTenantHeadersStamped": "MWHPR15MB1437", "X-OriginatorOrg": "ericsson.com", "X-Brightmail-Tracker": "H4sIAAAAAAAAA02Sa0hTYRjHec9lOxOXb8vsySxwIKWlZgUd6UJXWGC0yg95qbX0oCsz2THN\n\tCKwPiVqWilmuzJWXwrxnuGyLNEtbZuGFqA/pdGpKahpNtLB2PAZ9+z3///953ud9eRlS0UB7\n\tMrr4RE4fr41TSlyogiMNyf53EtQR6wtuIbZ21kiy/V+KSXZ8qoFgc3I3sT2t0yRbUHeNZucm\n\t0iXs9+xWmp3qa6N2yFSG3ncSVU3bdVo1ayylVcXmEUL1+lO+VFU7biLUknCXrdFcnC6J0wdu\n\tP+4Sa7VPSxJaUs/l3y2TXETGC5mIYQBvgletezORjFHgVwiyPoRlIhcnOxCUVv+RikUJAaau\n\takooKJxNwovJOYno5BEwOlRFif19CL44aIEleB9MVeRKBXbHkTDwcYwQGkg8SYCtqma+YQk+\n\tCIaudloMHYK+j8ULvAt+zlooYT8K+0C+9YAgy51zbj7socWz7iPI6TwksAxvg8qnE6TACHvA\n\ttLWCEJjEy+CzvWieAWMoMb8nRV4KIwNztJjXQPntdiQ+hTdUNXmIkZXQWXQFCSsDfi6FFsso\n\tJRr+8P3GDVLM74c8GytmPiCo6u1YmO8HaYX2hfwpsHR30iKHQNoDIxJ5FZRn2ahsFGT4b1WD\n\tcyyJfaG6MVCUvSHvik1qmL/9YnhTYKeMiCpHS3mOP3E6ZsPGAE6vi+L5M/EB8VxiHXJ+p6b6\n\tXz4m1PVtZzPCDFK6ygePqSMUtDaJTzndjIAhle7yfpVTkkdrU85z+jMa/dk4jm9GKxhKuUxu\n\t2/w4XIFjtIncKY5L4PT/XIKReV5EaZZ0LjLialnyk8zDWzTvNSakXbQy797q5b7T3ob6jMKo\n\to7+3m6B87ETw1zUvh62VuxPc+mfUM4MlqaF2H/eTbbqcR45L9qkM1+7PzFBoyJjFsRaedXh4\n\tXTazSYXmoeGgEHaLq9dzt6x1i1ODjwx49Z
RCv2KN156wxl7d2x8mVknxsdogP1LPa/8CPWGm\n\tXEoDAAA=", "Subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null }, { "id": 84235, "web_url": "http://patches.dpdk.org/comment/84235/", "msgid": "<f39aa913-b764-2c38-c947-90f12d18fcb5@intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/f39aa913-b764-2c38-c947-90f12d18fcb5@intel.com", "date": "2018-07-26T13:48:27", "subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "submitter": { "id": 4, "url": "http://patches.dpdk.org/api/people/4/?format=api", "name": "Burakov, Anatoly", "email": "anatoly.burakov@intel.com" }, "content": "On 19-Jul-18 11:58 AM, László Vadkerti wrote:\n>> On Jul 13, 2018, at 7:57 PM, Wiles, Keith <keith.wiles@intel.com> wrote:\n>>\n>>> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:\n>>>\n>>> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:\n>>>> This is a proposal to enable using externally allocated memory in\n>>>> DPDK.\n>>>> In a nutshell, here is what is being done here:\n>>>> - Index malloc heaps by NUMA node index, rather than NUMA node itself\n>>>> - Add identifier string to malloc heap, to uniquely identify it\n>>>> - Allow creating named heaps and add/remove memory to/from those\n>>>> heaps\n>>>> - Allocate memseg lists at runtime, to keep track of IOVA addresses\n>>>> of 
externally allocated memory\n>>>> - If IOVA addresses aren't provided, use RTE_BAD_IOVA\n>>>> - Allow malloc and memzones to allocate from named heaps The\n>>>> responsibility to ensure memory is accessible before using it is on\n>>>> the shoulders of the user - there is no checking done with regards to\n>>>> validity of the memory (nor could there be...).\n>>>> The following limitations are present:\n>>>> - No multiprocess support\n>>>> - No thread safety\n>>>> There is currently no way to allocate memory during initialization\n>>>> stage, so even if multiprocess support is added, it is not guaranteed\n>>>> to work because of underlying issues with mapping fbarrays in\n>>>> secondary processes. This is not an issue in single process scenario,\n>>>> but it may be an issue in a multiprocess scenario in case where\n>>>> primary doesn't intend to share the externally allocated memory, yet\n>>>> adding such memory could fail because some other process failed to\n>>>> attach to this shared memory when it wasn't needed.\n>>>> Anatoly Burakov (11):\n>>>> mem: allow memseg lists to be marked as external\n>>>> eal: add function to rerieve socket index by socket ID\n>>>> malloc: index heaps using heap ID rather than NUMA node\n>>>> malloc: add name to malloc heaps\n>>>> malloc: enable retrieving statistics from named heaps\n>>>> malloc: enable allocating from named heaps\n>>>> malloc: enable creating new malloc heaps\n>>>> malloc: allow adding memory to named heaps\n>>>> malloc: allow removing memory from named heaps\n>>>> malloc: allow destroying heaps\n>>>> memzone: enable reserving memory from named heaps\n>>>> config/common_base | 1 +\n>>>> lib/librte_eal/common/eal_common_lcore.c | 15 +\n>>>> lib/librte_eal/common/eal_common_memory.c | 51 +++-\n>>>> lib/librte_eal/common/eal_common_memzone.c | 283\n>> ++++++++++++++----\n>>>> .../common/include/rte_eal_memconfig.h | 5 +-\n>>>> lib/librte_eal/common/include/rte_lcore.h | 19 +-\n>>>> 
lib/librte_eal/common/include/rte_malloc.h | 158 +++++++++-\n>>>> .../common/include/rte_malloc_heap.h | 2 +\n>>>> lib/librte_eal/common/include/rte_memzone.h | 183 +++++++++++\n>>>> lib/librte_eal/common/malloc_heap.c | 277 +++++++++++++++--\n>>>> lib/librte_eal/common/malloc_heap.h | 26 ++\n>>>> lib/librte_eal/common/rte_malloc.c | 197 +++++++++++-\n>>>> lib/librte_eal/rte_eal_version.map | 10 +\n>>>> 13 files changed, 1118 insertions(+), 109 deletions(-)\n>>>\n>>> So, now that the RFC is out, i would like to ask a general question.\n>>>\n>>> One other thing that this patchset is missing, is the ability for data\n>> structures (e.g. hash, mempool, etc.) to be allocated from external heaps.\n>> Currently, we can kinda sorta do that with various _init() API's (initializing a\n>> data structure over already allocated memzone), but this is not ideal and is a\n>> hassle for anyone using external memory in DPDK.\n>>>\n>>> There are basically four ways to approach this problem (that i can see).\n>>>\n>>> First way is to change \"socket ID\" to mean \"heap ID\" everywhere. This has\n>> an upside of having a consistent API to allocate from internal and external\n>> heaps, with little to no API additions, only internal changes to account for the\n>> fact that \"socket ID\" is now \"heap ID\".\n>>>\n>>> However, there is a massive downside to this approach: it is a *giant* API\n>> change, and it's also a giant *ABI-compatible* API change. Meaning,\n>> replacing socket ID with heap ID will not cause compile failures for old code,\n>> which would result in many subtle bugs in already existing codebases. So,\n>> while in the perfect world this would've been my preferred approach,\n>> realistically i think this is a very, very bad idea.\n>>>\n>>> Second one is to add a separate \"heap name\" API's to everything. This has\n>> an upside of clean separation between allocation from internal and external\n>> heaps. (well, whether it's an upside is debatable...) 
This is the approach i\n>> expected to take when i was creating this patchset.\n>>>\n>>> The downside is that we have to add new API's to every library and every\n>> DPDK data structure, to allow explicit allocation from external heaps. We will\n>> have to maintain both, and things like hardware drivers will need to have a\n>> way to indicate the need to allocate things from a particular external heap.\n>>>\n>>> The third way is to expose the \"heap ID\" externally, and allow a single,\n>> unified API to reserve memory. That is, create an API that would map either\n>> a NUMA node ID or a heap name to an ID, and allow reserving memory\n>> through that ID regardless of whether it's internal or external memory. This\n>> would also allow to gradually phase out socket-based ID's in favor of heap ID\n>> API, should we choose to do so.\n>>>\n>>> The downside for this is, it adds a layer of indirection between socket ID\n>> and reserving memory on a particular NUMA node, and it makes it hard to\n>> produce a single value of \"heap ID\" in such a way as to replicate current\n>> functionality of allocating with SOCKET_ID_ANY. Most likely user will have to\n>> explicitly try to allocate on all sockets, unless we keep old API's around in\n>> parallel.\n>>>\n>>> Finally, a fourth way would be to abuse the socket ID to also mean\n>> something else, which is an approach i've seen numerous times already, and\n>> one that i don't like. We could register new heaps as a new, fake socket ID,\n>> and use that to address external heaps (each heap would get its own\n>> socket). 
So, keep current socket ID behavior, but for non-existent sockets it\n>> would be possible to be registered as a fake socket pointing to an external\n>> heap.\n>>>\n>>> The upside for this approach would be that no API changes are required\n>> whatsoever to existing libraries - this scheme is compatible with both internal\n>> and external heaps without adding a separate API.\n>>>\n>>> The downside is bad semantics - \"special\" sockets, handling of\n>>> SOCKET_ID_ANY, handling of \"invalid socket\" vs. \"invalid socket that\n>>> happens to correspond to an existing external heap\", and many other\n>>> things that can be confusing. I don't like this option, but it's an\n>>> option :)\n>>>\n>>> Thoughts? Comments?\n>>\n>> #1 is super clean, but very disruptive to everyone. Very Bad IMO\n>> #2 is also clean, but adds a lot of new APIs that everyone needs to use or at\n>> least in the external heap cases.\n>> #3 not sure I fully understand it, but reproducing heap IDs for testing is a\n>> problem and requires new/old APIs\n>>\n>> #4 Very easy to add, IMO it is clean and very small disruption to developers.\n>> It does require the special handling, but I feel it is OK and can be explained in\n>> the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is\n>> normal socket and > 64K is external id.\n>>\n>> My vote would be #4, as it seems the least risk and work. 
:-)\n>>\n> We are living with #4 (overloaded socket_ids) since ~5 years now but it indeed generates some confusion and it is a kind of hack so it may not be the best choice going forward in official releases but for sure is the easiest/simplest solution requiring the least modifications.\n> Using an overloaded socket_id is especially disturbing in the dump memory config printout where the user will see multiple socket ids on a single socket system or more than the available real number of sockets, however it could still be explained in the notes and the documentation.\n> The allocation behavior with SOCKET_ID_ANY is also a question as I think it shouldn’t roll over to allocate memory in the external heap, we especially disabled this feature in our implementation. The reason behind is that the external memory may be a limited resource where only explicit allocation requests would be allowed and also in a multi-process environment we may not want all external heaps to be mapped into all other processes address space meaning that not all heaps are accessible from every process (I’m not sure if it is planned to be supported though but it would be an important and useful feature based on our experiences).\n\nHi Laszlo,\n\nThat depends on what you mean by \"all other processes\". If they are all \npart of the primary-secondary process prefix, then my plan is to enable \nprivate and shared heaps - i.e. a heap is either available to a single \nprocess, or it is available to some or all processes within a prefix.\n\nIt is also not possible to share the same area with different process \nprefixes (i.e. between two different primaries) because each of the \nprocesses will think it owns the entire memory and will do with it as it \npleases. 
Using the same memory region with two different process \nprefixes will break many assumptions heap has - for example, it relies \non a per-heap lock to control access to the heap, and that will not work \nif you map the same memory area into multiple primary processes. I do \nnot foresee a mechanism to fix this problem within DPDK, but obviously \nif you have any suggestions, they will be considered :)\n\nThe reason we have to care about private vs. shared heaps is because of \nhow DPDK handles memory management. In order for DPDK facilities such as \nrte_mem_virt2iova() or rte_memseg_walk() to work, we need to keep track \nof the pages we use for the heap - i.e. from DPDK's point of view, \nexternal memory behaves just like regular memory and is tracked using \nthe same method of keeping page tables around (see rte_memseg_list).\n\nThese page tables need to be shared between all processes that use a \nspecific heap. This introduces an inherent point of failure - you may \nmmap() the *area itself* successfully at the same address, but you may \nstill fail to *attach to the page tables*, which will cause a particular \nheap to not be available in a process. This is a problem that i do not \nsee a solution for at the moment, and it is something that users \nattempting to use external memory in secondary processes will have to \ndeal with.\n\nI haven't yet decided whether this should be automatic (i.e. shared \nheaps \"automagically\" appearing in all processes) or manual (make the \nuser explicitly attach to an externally allocated heap in each process \nwithin the prefix). I would tend to go for the latter as it gives the \nuser more control, and it is easier to implement because there's no need \nto engage IPC to make this work.\n\n> Anyway I think the confusion with this option comes due to the misleading “socket_id” name which would not really mean socket id anymore. 
So we should probably just document it as pseudo socket_id and problem solved with #4 :)\n> \n> The cleanest solution in my opinion would be #1 which could be combined with #4 so that the physical socket_id could be directly passed as the heap_id (or rather call it allocation id or just location?) so that backward compatibility could also be kept.\n> Meaning to apply #1, change “socket_id” to “heap_id” (or “alloc_id”?) in all functions which are today expecting the socket_id to indicate the location of the allocations but keep a direct mapping from socket_id to heap_id, e.g. as Keith suggested, the lower range of heap_id would be equivalent to the socket_id and the upper range would be the external id; this way even existing applications would still work without changing the code just by passing the socket_id in the heap_id parameter. However it is a question what would happen with the socket_id stored in data structures such as struct rte_mempool where socket_id is stored with a meaning “Socket id passed at create.”\n> SOCKET_ID_ANY would only mean lower range of heap_ids (physical socket ids) but not the external heap and if needed a new HEAP_ID_ANY could be introduced.\n> \n> If changing socket_id to heap_id in existing functions is a big issue then one option would be to keep the original API and introduce new equivalent functions allowing to use the heap_id instead of the socket_id, e.g. 
rte_mempool_create would have an equivalent function to use with the heap_id instead of the socket_id.\n> Socket_id could then be converted to heap_id with a new function which should always be possible and can still use the direct mapping approach with lower/upper range convention.\n> The socket_id based functions would then just be wrappers calling the heap_id equivalent function after converting the socket_id to heap_id.\n> Using socket_id to indicate the location could still be relevant so the old socket_id based functions may not even need to be deprecated unless it would become hard to maintain.\n> \n> Irrespective of the chosen option, external heaps should be registered/identified by name and there could be a function to fetch/lookup the id (heap_id or pseudo socket_id) by registered heap name which then could be used in the related API calls.\n\nSo, in other words, the consensus seems to be that we need to stay with \nthe old socket_id and just use weird socket ID's for external heaps. \nOkay, so be it. Less work for me implementing it :)\n\n> \n> It would also be another work item to update all the data structures which are storing the socket_id to use it as the location identifier and I think a few of them may need to store both the real physical socket_id and the heap_id, e.g. in struct lcore_config where the user may want to know the real physical socket id but want to set a specific heap_id as the default allocation location for the given lcore.\n\nI do not see physical socket ID of externally allocated memory as a \nmatter of concern for DPDK. I think this information should be up to the \nuser application to handle, not DPDK. From my point of view, we \nshouldn't care where the memory came from, we just facilitate using it. 
\nIf the user chooses to store additional metadata about the memory \nsomewhere else - that is his prerogative, but i don't think having a \nprovision for \"physical socket ID\" etc for external heaps should be in DPDK.\n\n> \n>>>\n>>> I myself still favor the second way, however there are good arguments to\n>> be made for each of these options.\n>>>\n>>> --\n>>> Thanks,\n>>> Anatoly\n>>\n>> Regards,\n>> Keith\n> \n> Thanks,\n> Laszlo\n>", "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@dpdk.org", "Delivered-To": "patchwork@dpdk.org", "Received": [ "from [92.243.14.124] (localhost [127.0.0.1])\n\tby dpdk.org (Postfix) with ESMTP id 1C40C25D9;\n\tThu, 26 Jul 2018 15:48:40 +0200 (CEST)", "from mga14.intel.com (mga14.intel.com [192.55.52.115])\n\tby dpdk.org (Postfix) with ESMTP id AB6E0235\n\tfor <dev@dpdk.org>; Thu, 26 Jul 2018 15:48:37 +0200 (CEST)", "from fmsmga008.fm.intel.com ([10.253.24.58])\n\tby fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t26 Jul 2018 06:48:34 -0700", "from aburakov-mobl1.ger.corp.intel.com (HELO [10.237.220.86])\n\t([10.237.220.86])\n\tby fmsmga008.fm.intel.com with ESMTP; 26 Jul 2018 06:48:28 -0700" ], "X-Amp-Result": "SKIPPED(no attachment in message)", "X-Amp-File-Uploaded": "False", "X-ExtLoop1": "1", "X-IronPort-AV": "E=Sophos;i=\"5.51,405,1526367600\"; d=\"scan'208\";a=\"58074077\"", "To": "=?utf-8?b?TMOhc3psw7MgVmFka2VydGk=?= <laszlo.vadkerti@ericsson.com>,\n\t\"Wiles, Keith\" <keith.wiles@intel.com>", "Cc": "\"dev@dpdk.org\" <dev@dpdk.org>,\n\t\"srinath.mannam@broadcom.com\" <srinath.mannam@broadcom.com>,\n\t\"scott.branden@broadcom.com\" <scott.branden@broadcom.com>,\n\t\"ajit.khaparde@broadcom.com\" <ajit.khaparde@broadcom.com>,\n\tThomas Monjalon <thomas@monjalon.net>,\n\tShreyansh Jain <shreyansh.jain@nxp.com>,\n\t\"jerin.jacob@caviumnetworks.com\" <jerin.jacob@caviumnetworks.com>", "References": 
"<cover.1530881548.git.anatoly.burakov@intel.com>\n\t<cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>\n\t<08D0040C-78AA-401A-863F-57B38533648F@intel.com>\n\t<MWHPR15MB1918FE5BD47581BA2154942692520@MWHPR15MB1918.namprd15.prod.outlook.com>", "From": "\"Burakov, Anatoly\" <anatoly.burakov@intel.com>", "Message-ID": "<f39aa913-b764-2c38-c947-90f12d18fcb5@intel.com>", "Date": "Thu, 26 Jul 2018 14:48:27 +0100", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.9.1", "MIME-Version": "1.0", "In-Reply-To": "<MWHPR15MB1918FE5BD47581BA2154942692520@MWHPR15MB1918.namprd15.prod.outlook.com>", "Content-Type": "text/plain; charset=utf-8; format=flowed", "Content-Language": "en-US", "Content-Transfer-Encoding": "8bit", "Subject": "Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in\n\tDPDK", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.15", "Precedence": "list", "List-Id": "DPDK patches and discussions <dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "addressed": null } ]