[1/4] eal: add new API to register contiguous external memory
diff mbox series

Message ID 20191213141322.32730-2-maxime.coquelin@redhat.com
State Deferred
Delegated to: David Marchand
Headers show
Series
  • Add external contiguous memory support to Virtio-user
Related show

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/checkpatch warning coding style issues

Commit Message

Maxime Coquelin Dec. 13, 2019, 2:13 p.m. UTC
This new API allows to pass a file descriptor while
registering external and contiguous memory in the IOVA space.

This is required for using Virtio-user PMD with application
using external memory for the mbuf's buffers, like Seastar
or VPP.

FD is only attached to the segments if single_file_segment
option is enabled.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_eal/common/eal_common_memory.c  | 75 +++++++++++++++++++++-
 lib/librte_eal/common/include/rte_memory.h | 46 +++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 17 ++++-
 lib/librte_eal/common/malloc_heap.h        |  2 +-
 lib/librte_eal/common/rte_malloc.c         |  2 +-
 lib/librte_eal/rte_eal_version.map         |  3 +
 6 files changed, 141 insertions(+), 4 deletions(-)

Comments

Burakov, Anatoly Jan. 23, 2020, 1:06 p.m. UTC | #1
On 13-Dec-19 2:13 PM, Maxime Coquelin wrote:
> This new API allows to pass a file descriptor while
> registering external and contiguous memory in the IOVA space.
> 
> This is required for using Virtio-user PMD with application
> using external memory for the mbuf's buffers, like Seastar
> or VPP.
> 
> FD is only attached to the segments if single_file_segment
> option is enabled.
>  > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Hi Maxime,

We've discussed this privately already, but i'll send feedback anyway 
just so that it's visible.

> ---
>   lib/librte_eal/common/eal_common_memory.c  | 75 +++++++++++++++++++++-
>   lib/librte_eal/common/include/rte_memory.h | 46 +++++++++++++
>   lib/librte_eal/common/malloc_heap.c        | 17 ++++-
>   lib/librte_eal/common/malloc_heap.h        |  2 +-
>   lib/librte_eal/common/rte_malloc.c         |  2 +-
>   lib/librte_eal/rte_eal_version.map         |  3 +
>   6 files changed, 141 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index 4a9cc1f19a..7a4b371828 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -772,6 +772,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>   	return ret;
>   }
>   
> +int
> +rte_extmem_register_contig(void *va_addr, size_t len, rte_iova_t iova_addr,
> +		size_t page_sz, int fd)
> +{

I don't see why do you have to link IOVA-contiguousness with setting an 
fd - a chunk of memory backed by an fd does not necessarily have to be 
IOVA-contiguous.

In fact, registering IOVA-contiguous memory is already possible - you 
just specify the IOVA table such that each successive IOVA is contiguous 
to the preceding one.

So, IMO, the 'fd' part should be decoupled from registering.

Also, we support two modes of operation (fd per segment and fd per 
segment list), it would be nice to have both.

For example, consider the following:

// set fd for an entire segment list, addressed by VA and len.
// must match a memory chunk that was registered earlier
rte_extmem_set_seg_fd(..., int fd);

// set per-segment fd's for the segment list, addressed by VA and len.
// must match a memory chunk that was registered earlier, and length of
// fd array must add up to length of the chunk
rte_extmem_set_seg_fd_list(..., int *fd, int len);

This would essentially accomplish what you're trying to do by just 
adding the missing piece required.

Question: do we want to be able to set segment fd's for externally 
allocated by internally managed memory? E.g. something that was added 
via malloc_heap_add_memory rather than extmem_register? I would guess 
so, so maybe it shouldn't be under rte_extmem namespace.

Patch
diff mbox series

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4a9cc1f19a..7a4b371828 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -772,6 +772,79 @@  rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
+int
+rte_extmem_register_contig(void *va_addr, size_t len, rte_iova_t iova_addr,
+		size_t page_sz, int fd)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	rte_iova_t *iova_addrs = NULL;
+	unsigned int socket_id, n, i;
+	int ret = 0;
+
+	if (va_addr == NULL || page_sz == 0 || len == 0 ||
+			!rte_is_power_of_2(page_sz) ||
+			RTE_ALIGN(len, page_sz) != len ||
+			((len % page_sz) != 0  ||
+			!rte_is_aligned(va_addr, page_sz))) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	n = len / page_sz;
+	if (iova_addr != 0) {
+		iova_addrs = malloc(n * sizeof(*iova_addrs));
+		if (iova_addrs == NULL) {
+			rte_errno = -ENOMEM;
+			return -1;
+		}
+
+		for (i = 0; i < n; i++)
+			iova_addrs[i] = iova_addr + n * page_sz;
+
+	}
+
+
+	if (fd >= 0 && !internal_config.single_file_segments) {
+		RTE_LOG(INFO, EAL, "FD won't be attached to the external memory," \
+				" requires single file segments\n");
+		fd = -1;
+	}
+	rte_mcfg_mem_write_lock();
+
+	/* make sure the segment doesn't already exist */
+	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
+		rte_errno = EEXIST;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* get next available socket ID */
+	socket_id = mcfg->next_socket_id;
+	if (socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we can create a new memseg */
+	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
+			page_sz, "extmem_contig", socket_id, fd) == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
+	/* memseg list successfully created - increment next socket ID */
+	mcfg->next_socket_id++;
+unlock:
+	rte_mcfg_mem_write_unlock();
+
+	if (iova_addrs != NULL)
+		free(iova_addrs);
+
+	return ret;
+}
+
 int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz)
@@ -809,7 +882,7 @@  rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 	/* we can create a new memseg */
 	n = len / page_sz;
 	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
-			page_sz, "extmem", socket_id) == NULL) {
+			page_sz, "extmem", socket_id, -1) == NULL) {
 		ret = -1;
 		goto unlock;
 	}
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 3d8d0bd697..e274e47e5e 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -451,6 +451,10 @@  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
  *   is NULL.
  * @param page_sz
  *   Page size of the underlying memory
+ * @param fd
+ *   File descriptor for the external memory region registered. Must be set to
+ *   -1 if no FD, and ignored if single-segment isn't not used or if iova
+ *   aren't contiguous (iova_addrs != NULL).
  *
  * @return
  *   - 0 on success
@@ -461,6 +465,48 @@  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
  */
 __rte_experimental
 int
+rte_extmem_register_contig(void *va_addr, size_t len, rte_iova_t iova_addr,
+		size_t page_sz, int fd);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register external contiguous memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
+ * @param va_addr
+ *   Start of virtual area to register. Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ * @param iova_addr
+ *   IOVA address for the contiguous memory chunck. Can be 0, in which case
+ *   page IOVA addresses will be set to RTE_BAD_IOVA.
+ * @param page_sz
+ *   Page size of the underlying memory
+ * @param fd
+ *   File descriptor for the external memory region registered. Must be set to
+ *   -1 if no FD, and ignored if single-segment isn't not used.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ *     ENOMEM - failed to allocate pages IOVA addresses
+ */
+__rte_experimental
+int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz);
 
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 842eb9de75..229ab6563c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1096,7 +1096,7 @@  destroy_elem(struct malloc_elem *elem, size_t len)
 struct rte_memseg_list *
 malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
-		unsigned int socket_id)
+		unsigned int socket_id, int fd)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	char fbarray_name[RTE_FBARRAY_NAME_LEN];
@@ -1153,6 +1153,13 @@  malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	msl->version = 0;
 	msl->external = 1;
 
+	if (fd >= 0) {
+		int list_idx = msl - mcfg->memsegs;
+
+		if (eal_memalloc_set_seg_list_fd(list_idx, fd))
+			RTE_LOG(ERR, EAL, "Failed to set segment list FD\n");
+	}
+
 	return msl;
 }
 
@@ -1202,10 +1209,18 @@  malloc_heap_find_external_seg(void *va_addr, size_t len)
 int
 malloc_heap_destroy_external_seg(struct rte_memseg_list *msl)
 {
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int list_idx;
+
 	/* destroy the fbarray backing this memory */
 	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
 		return -1;
 
+	list_idx = msl - mcfg->memsegs;
+	if (eal_memalloc_set_seg_list_fd(list_idx, -1))
+		RTE_LOG(ERR, EAL, "Failed to reset segment list FD\n");
+
+
 	/* reset the memseg list */
 	memset(msl, 0, sizeof(*msl));
 
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 772736b53f..438ce908de 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -65,7 +65,7 @@  malloc_heap_destroy(struct malloc_heap *heap);
 struct rte_memseg_list *
 malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
-		unsigned int socket_id);
+		unsigned int socket_id, int fd);
 
 struct rte_memseg_list *
 malloc_heap_find_external_seg(void *va_addr, size_t len);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index d6026a2b17..aa19b0517f 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -389,7 +389,7 @@  rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	n = len / page_sz;
 
 	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n, page_sz,
-			heap_name, heap->socket_id);
+			heap_name, heap->socket_id, -1);
 	if (msl == NULL) {
 		ret = -1;
 		goto unlock;
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index e38d02530c..ddcb4b0512 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -332,4 +332,7 @@  EXPERIMENTAL {
 	# added in 19.11
 	rte_log_get_stream;
 	rte_mcfg_get_single_file_segments;
+
+	# added in 20.02
+	rte_extmem_register_contig;
 };