[3/4] mem: allow registering external memory areas

Message ID: fc9b948318ec0ff265b69c080e70eff50b337723.1543495935.git.anatoly.burakov@intel.com (mailing list archive)
State: Superseded, archived
Delegated to: Thomas Monjalon
Series: Allow using external memory without malloc

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Anatoly Burakov Nov. 29, 2018, 1:48 p.m. UTC
The general use case for external memory is well covered by the
existing external memory APIs. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that APIs like
``rte_virt2memseg`` work on such external memory segments.

This commit adds such an API to DPDK: new functions to register
and unregister externally allocated memory areas, along with
documentation for them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
 lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 4 files changed, 189 insertions(+), 10 deletions(-)
  

Comments

Yongseok Koh Dec. 14, 2018, 9:55 a.m. UTC | #1
On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
> The general use-case of using external memory is well covered by
> existing external memory API's. However, certain use cases require
> manual management of externally allocated memory areas, so this
> memory should not be added to the heap. It should, however, be
> added to DPDK's internal structures, so that API's like
> ``rte_virt2memseg`` would work on such external memory segments.
> 
> This commit adds such an API to DPDK. The new functions will allow
> to register and unregister externally allocated memory areas, as
> well as documentation for them.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>  lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>  lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>  lib/librte_eal/rte_eal_version.map            |  2 +
>  4 files changed, 189 insertions(+), 10 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 8b5d050c7..d7799b626 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> -It is possible to use externally allocated memory in DPDK, using a set of malloc
> -heap API's. Support for externally allocated memory is implemented through
> -overloading the socket ID - externally allocated heaps will have socket ID's
> -that would be considered invalid under normal circumstances. Requesting an
> -allocation to take place from a specified externally allocated memory is a
> -matter of supplying the correct socket ID to DPDK allocator, either directly
> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
> -structure-specific allocation API's such as ``rte_ring_create``).
> +It is possible to use externally allocated memory in DPDK. There are two ways in
> +which using externally allocated memory can work: the malloc heap API's, and
> +manual memory management.
>  
> -Since there is no way DPDK can verify whether memory are is available or valid,
> -this responsibility falls on the shoulders of the user. All multiprocess
> ++ Using heap API's for externally allocated memory
> +
> +Using a set of malloc heap API's is the recommended way to use externally
> +allocated memory in DPDK. In this way, support for externally allocated memory
> +is implemented through overloading the socket ID - externally allocated heaps
> +will have socket ID's that would be considered invalid under normal
> +circumstances. Requesting an allocation to take place from a specified
> +externally allocated memory is a matter of supplying the correct socket ID to
> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
> +indirectly (through data structure-specific allocation API's such as
> +``rte_ring_create``). Using these API's also ensures that mapping of externally
> +allocated memory for DMA is also performed on any memory segment that is added
> +to a DPDK malloc heap.
> +
> +Since there is no way DPDK can verify whether memory is available or valid, this
> +responsibility falls on the shoulders of the user. All multiprocess
>  synchronization is also user's responsibility, as well as ensuring  that all
>  calls to add/attach/detach/remove memory are done in the correct order. It is
>  not required to attach to a memory area in all processes - only attach to memory
> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>  For more information, please refer to ``rte_malloc`` API documentation,
>  specifically the ``rte_malloc_heap_*`` family of function calls.
>  
> ++ Using externally allocated memory without DPDK API's
> +
> +While using heap API's is the recommended method of using externally allocated
> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
> +is undesirable - for example, when manual memory management is performed on an
> +externally allocated area. To support use cases where externally allocated
> +memory will not be used as part of normal DPDK workflow, there is also another
> +set of API's under the ``rte_extmem_*`` namespace.
> +
> +These API's are (as their name implies) intended to allow registering or
> +unregistering externally allocated memory to/from DPDK's internal page table, to
> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
> +memory. Memory added this way will not be available for any regular DPDK
> +allocators; DPDK will leave this memory for the user application to manage.
> +
> +The expected workflow is as follows:
> +
> +* Get a pointer to memory area
> +* Register memory within DPDK
> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
> +      unavailable
> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Use the memory area in your application
> +* If memory area is no longer needed, it can be unregistered
> +    - If the area was mapped for DMA, unmapping must be performed before
> +      unregistering memory
> +
> +Since these externally allocated memory areas will not be managed by DPDK, it is
> +therefore up to the user application to decide how to use them and what to do
> +with them once they're registered.
> +
>  Per-lcore and Shared Variables
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index d47ea4938..a2e085ae8 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -24,6 +24,7 @@
>  #include "eal_memalloc.h"
>  #include "eal_private.h"
>  #include "eal_internal_cfg.h"
> +#include "malloc_heap.h"
>  
>  /*
>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>  	return ret;
>  }
>  
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	unsigned int socket_id;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
> +			!rte_is_power_of_2(page_sz) ||
> +			RTE_ALIGN(len, page_sz) != len) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}

Wouldn't it be better to have more sanity checks? E.g., (len / page_sz == n_pages),
like rte_malloc_heap_memory_add(). And what about the alignment of va_addr?
Shouldn't it be page-aligned, if I'm not mistaken? rte_malloc_heap_memory_add()
doesn't check that either... Also, you might want to document that the
granularity of these registrations is a page.

Otherwise,

Acked-by: Yongseok Koh <yskoh@mellanox.com>
Thanks

> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* make sure the segment doesn't already exist */
> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
> +		rte_errno = EEXIST;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* get next available socket ID */
> +	socket_id = mcfg->next_socket_id;
> +	if (socket_id > INT32_MAX) {
> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
> +		rte_errno = ENOSPC;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* we can create a new memseg */
> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
> +			page_sz, "extmem", socket_id) == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* memseg list successfully created - increment next socket ID */
> +	mcfg->next_socket_id++;
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct rte_memseg_list *msl;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* find our segment */
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		rte_errno = ENOENT;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	ret = malloc_heap_destroy_external_seg(msl);
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index d970825df..4a43c1a9e 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -423,6 +423,69 @@ int __rte_experimental
>  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>  		size_t *offset);
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Register external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA mapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to register
> + * @param len
> + *   Length of virtual area to register
> + * @param iova_addrs
> + *   Array of page IOVA addresses corresponding to each page in this memory
> + *   area. Can be NULL, in which case page IOVA addresses will be set to
> + *   RTE_BAD_IOVA.
> + * @param n_pages
> + *   Number of elements in the iova_addrs array. Ignored if  ``iova_addrs``
> + *   is NULL.
> + * @param page_sz
> + *   Page size of the underlying memory
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     EEXIST - memory chunk is already registered
> + *     ENOSPC - no more space in internal config to store a new memory chunk
> + */
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA unmapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to unregister
> + * @param len
> + *   Length of virtual area to unregister
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len);
> +
>  /**
>   * Dump the physical memory layout to a file.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 3fe78260d..593691a14 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_eal_cleanup;
> +	rte_extmem_register;
> +	rte_extmem_unregister;
>  	rte_fbarray_attach;
>  	rte_fbarray_destroy;
>  	rte_fbarray_detach;
> -- 
> 2.17.1
  
Anatoly Burakov Dec. 14, 2018, 11:03 a.m. UTC | #2
On 14-Dec-18 9:55 AM, Yongseok Koh wrote:
> On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	unsigned int socket_id;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
>> +			!rte_is_power_of_2(page_sz) ||
>> +			RTE_ALIGN(len, page_sz) != len) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
> 
> Wouldn't it be better to have more sanity checks? E.g., (len / page_sz == n_pages),
> like rte_malloc_heap_memory_add(). And what about the alignment of va_addr?
> Shouldn't it be page-aligned, if I'm not mistaken? rte_malloc_heap_memory_add()
> doesn't check that either... Also, you might want to document that the
> granularity of these registrations is a page.
> 

Hi Yongseok,

Thanks for your review.

n_pages is allowed to be 0 if iova_addrs[] is NULL. However, you're correct
in that more sanity checking and documentation regarding page alignment
would be beneficial. I'll submit a v2.
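
For illustration, the stricter argument checks discussed above could look roughly like the sketch below. It is standalone C rather than DPDK code: `extmem_args_valid` is a hypothetical helper name, plain bit arithmetic stands in for `rte_is_power_of_2` and `RTE_ALIGN`, and `uint64_t` stands in for `rte_iova_t`. It is not the v2 patch, which may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true if the
 * arguments pass both the checks already in the patch and the two
 * extra checks suggested in review.
 */
static bool
extmem_args_valid(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	/* checks already present in the patch */
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;
	/* length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;

	/* suggested: the start address must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;

	/*
	 * suggested: an IOVA table, when supplied, must have one entry per
	 * page; n_pages is ignored when no table is supplied
	 */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;

	return true;
}
```

Under these assumptions, registering with a NULL IOVA table and n_pages of 0 remains valid, while a non-NULL table whose length differs from len / page_sz, or a va_addr that is not page-aligned, is rejected.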


> Otherwise,
> 
> Acked-by: Yongseok Koh <yskoh@mellanox.com>
> Thanks
> 
  

Patch

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 8b5d050c7..d7799b626 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -212,17 +212,26 @@  Normally, these options do not need to be changed.
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-It is possible to use externally allocated memory in DPDK, using a set of malloc
-heap API's. Support for externally allocated memory is implemented through
-overloading the socket ID - externally allocated heaps will have socket ID's
-that would be considered invalid under normal circumstances. Requesting an
-allocation to take place from a specified externally allocated memory is a
-matter of supplying the correct socket ID to DPDK allocator, either directly
-(e.g. through a call to ``rte_malloc``) or indirectly (through data
-structure-specific allocation API's such as ``rte_ring_create``).
+It is possible to use externally allocated memory in DPDK. There are two ways in
+which using externally allocated memory can work: the malloc heap API's, and
+manual memory management.
 
-Since there is no way DPDK can verify whether memory are is available or valid,
-this responsibility falls on the shoulders of the user. All multiprocess
++ Using heap API's for externally allocated memory
+
+Using a set of malloc heap API's is the recommended way to use externally
+allocated memory in DPDK. In this way, support for externally allocated memory
+is implemented through overloading the socket ID - externally allocated heaps
+will have socket ID's that would be considered invalid under normal
+circumstances. Requesting an allocation to take place from a specified
+externally allocated memory is a matter of supplying the correct socket ID to
+DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
+indirectly (through data structure-specific allocation API's such as
+``rte_ring_create``). Using these API's also ensures that mapping of externally
+allocated memory for DMA is also performed on any memory segment that is added
+to a DPDK malloc heap.
+
+Since there is no way DPDK can verify whether memory is available or valid, this
+responsibility falls on the shoulders of the user. All multiprocess
 synchronization is also user's responsibility, as well as ensuring  that all
 calls to add/attach/detach/remove memory are done in the correct order. It is
 not required to attach to a memory area in all processes - only attach to memory
@@ -246,6 +255,37 @@  The expected workflow is as follows:
 For more information, please refer to ``rte_malloc`` API documentation,
 specifically the ``rte_malloc_heap_*`` family of function calls.
 
++ Using externally allocated memory without DPDK API's
+
+While using heap API's is the recommended method of using externally allocated
+memory in DPDK, there are certain use cases where the overhead of DPDK heap API
+is undesirable - for example, when manual memory management is performed on an
+externally allocated area. To support use cases where externally allocated
+memory will not be used as part of normal DPDK workflow, there is also another
+set of API's under the ``rte_extmem_*`` namespace.
+
+These API's are (as their name implies) intended to allow registering or
+unregistering externally allocated memory to/from DPDK's internal page table, to
+allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
+memory. Memory added this way will not be available for any regular DPDK
+allocators; DPDK will leave this memory for the user application to manage.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Register memory within DPDK
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable
+* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Use the memory area in your application
+* If memory area is no longer needed, it can be unregistered
+    - If the area was mapped for DMA, unmapping must be performed before
+      unregistering memory
+
+Since these externally allocated memory areas will not be managed by DPDK, it is
+therefore up to the user application to decide how to use them and what to do
+with them once they're registered.
+
 Per-lcore and Shared Variables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index d47ea4938..a2e085ae8 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -24,6 +24,7 @@ 
 #include "eal_memalloc.h"
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
+#include "malloc_heap.h"
 
 /*
  * Try to mmap *size bytes in /dev/zero. If it is successful, return the
@@ -775,6 +776,79 @@  rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int socket_id;
+	int ret = 0;
+
+	if (va_addr == NULL || page_sz == 0 || len == 0 ||
+			!rte_is_power_of_2(page_sz) ||
+			RTE_ALIGN(len, page_sz) != len) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* make sure the segment doesn't already exist */
+	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
+		rte_errno = EEXIST;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* get next available socket ID */
+	socket_id = mcfg->next_socket_id;
+	if (socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we can create a new memseg */
+	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
+			page_sz, "extmem", socket_id) == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
+	/* memseg list successfully created - increment next socket ID */
+	mcfg->next_socket_id++;
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+
+	ret = malloc_heap_destroy_external_seg(msl);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index d970825df..4a43c1a9e 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -423,6 +423,69 @@  int __rte_experimental
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to register
+ * @param len
+ *   Length of virtual area to register
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to unregister
+ * @param len
+ *   Length of virtual area to unregister
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 3fe78260d..593691a14 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@  EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_register;
+	rte_extmem_unregister;
 	rte_fbarray_attach;
 	rte_fbarray_destroy;
 	rte_fbarray_detach;