[v2] mem: fix cleanup when multi-process is disabled

Message ID 20210324193227.15497-1-dmitry.kozliuk@gmail.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-abi-testing           success  Testing PASS
ci/travis-robot              success  travis build: passed
ci/github-robot              success  github build: passed
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-testing               success  Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/Intel-compilation         success  Compilation OK
ci/intel-Testing             success  Testing PASS

Commit Message

Dmitry Kozlyuk March 24, 2021, 7:32 p.m. UTC
rte_eal_memory_detach() did not account for cases where multi-process
mode is disabled: --in-memory and --no-shconf. This resulted
in unmapping memory that had not been mapped, which caused errors:

    EAL: Could not unmap memory: No error   (Windows)
    EAL: Cannot munmap(0x1d47f40, 0x7000): Invalid argument  (Linux)

The confusing "No error" message was caused by reporting errno instead
of the rte_errno value set by rte_mem_unmap().

Skip detaching memory altogether when --in-memory is specified.
Skip unmapping configuration when it's not shared.
Fix and add error handling to produce proper log messages.

Fixes: dfbc61a2f9a6 ("mem: detach memsegs on cleanup")
Cc: Anatoly Burakov <anatoly.burakov@intel.com>

Reported-by: Jie Zhou <jizh@microsoft.com>
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
---
 lib/librte_eal/common/eal_common_memory.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
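
The errno versus rte_errno mix-up described in the commit message can be sketched in isolation. The block below is a minimal model, not DPDK code: demo_errno stands in for rte_errno, demo_unmap() stands in for a wrapper such as rte_mem_unmap() that records failures only in its own error variable, and all names are hypothetical. It assumes the wrapper leaves errno untouched, which is the situation the Windows log line suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rte_errno: a library-private error variable. */
int demo_errno;

/* Hypothetical stand-in for an unmap wrapper: on failure it records the
 * reason in demo_errno and does not touch errno (an assumption). */
int
demo_unmap(void *addr, size_t len)
{
	if (addr == NULL || len == 0) {
		demo_errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Buggy reporting: describes errno, which the wrapper never set. */
const char *
report_with_errno(void)
{
	return strerror(errno);
}

/* Fixed reporting: describes the variable the wrapper actually set. */
const char *
report_with_demo_errno(void)
{
	return strerror(demo_errno);
}

/* Usage: after a failed call errno is still 0, so only demo_errno
 * carries the real failure reason. */
void
demo_usage(void)
{
	errno = 0;
	assert(demo_unmap(NULL, 0) == -1);
	assert(errno == 0);
	assert(demo_errno == EINVAL);
}
```

With errno left at 0, report_with_errno() returns whatever text the C library uses for "no error", while report_with_demo_errno() describes the actual failure.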
  

Comments

Menon, Ranjit March 25, 2021, 3:39 p.m. UTC | #1
On 3/24/2021 12:32 PM, Dmitry Kozlyuk wrote:
> rte_eal_memory_detach() did not account for cases where multi-process
> mode is disabled: --in-memory and --no-shconf. This resulted
> in unmapping memory that had not been mapped, which caused errors:
>
>      EAL: Could not unmap memory: No error   (Windows)
>      EAL: Cannot munmap(0x1d47f40, 0x7000): Invalid argument  (Linux)
>
> Confusing "No error" was caused by using errno instead of rte_errno
> set by rte_mem_unmap().
>
> Skip detaching memory altogether when --in-memory is specified.
> Skip unmapping configuration when it's not shared.
> Fix and add error handling to produce proper log messages.
>
> Fixes: dfbc61a2f9a6 ("mem: detach memsegs on cleanup")
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>
> Reported-by: Jie Zhou <jizh@microsoft.com>
> Suggested-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> ---
>   lib/librte_eal/common/eal_common_memory.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index 0e99986d3d..9495170c86 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -1006,10 +1006,15 @@ rte_extmem_detach(void *va_addr, size_t len)
>   int
>   rte_eal_memory_detach(void)
>   {
> +	const struct internal_config *internal_conf =
> +		eal_get_internal_configuration();
>   	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>   	size_t page_sz = rte_mem_page_size();
>   	unsigned int i;
>   
> +	if (internal_conf->in_memory == 1)
> +		return 0;
> +
>   	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>   
>   	/* detach internal memory subsystem data first */
> @@ -1032,7 +1037,7 @@ rte_eal_memory_detach(void)
>   		if (!msl->external)
>   			if (rte_mem_unmap(msl->base_va, msl->len) != 0)
>   				RTE_LOG(ERR, EAL, "Could not unmap memory: %s\n",
> -						strerror(errno));
> +						rte_strerror(rte_errno));
>   
>   		/*
>   		 * we are detaching the fbarray rather than destroying because
> @@ -1050,7 +1055,10 @@ rte_eal_memory_detach(void)
>   	 * config - we can't zero it out because it might still be referenced
>   	 * by other processes.
>   	 */
> -	rte_mem_unmap(mcfg, RTE_ALIGN(sizeof(*mcfg), page_sz));
> +	if (internal_conf->no_shconf == 0)
> +		if (rte_mem_unmap(mcfg, RTE_ALIGN(sizeof(*mcfg), page_sz)) != 0)
> +			RTE_LOG(ERR, EAL, "Could not unmap shared memory config: %s\n",
> +					rte_strerror(rte_errno));
>   	rte_eal_get_configuration()->mem_config = NULL;
>   
>   	return 0;
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
  
Burakov, Anatoly March 26, 2021, 12:34 p.m. UTC | #2
On 24-Mar-21 7:32 PM, Dmitry Kozlyuk wrote:
> rte_eal_memory_detach() did not account for cases where multi-process
> mode is disabled: --in-memory and --no-shconf. This resulted
> in unmapping memory that had not been mapped, which caused errors:
> 
>      EAL: Could not unmap memory: No error   (Windows)
>      EAL: Cannot munmap(0x1d47f40, 0x7000): Invalid argument  (Linux)
> 
> Confusing "No error" was caused by using errno instead of rte_errno
> set by rte_mem_unmap().
> 
> Skip detaching memory altogether when --in-memory is specified.
> Skip unmapping configuration when it's not shared.
> Fix and add error handling to produce proper log messages.
> 
> Fixes: dfbc61a2f9a6 ("mem: detach memsegs on cleanup")
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> Reported-by: Jie Zhou <jizh@microsoft.com>
> Suggested-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> ---

LGTM

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
David Marchand March 26, 2021, 4:15 p.m. UTC | #3
On Wed, Mar 24, 2021 at 8:32 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
>
> rte_eal_memory_detach() did not account for cases where multi-process
> mode is disabled: --in-memory and --no-shconf. This resulted
> in unmapping memory that had not been mapped, which caused errors:
>
>     EAL: Could not unmap memory: No error   (Windows)
>     EAL: Cannot munmap(0x1d47f40, 0x7000): Invalid argument  (Linux)
>
> Confusing "No error" was caused by using errno instead of rte_errno
> set by rte_mem_unmap().
>
> Skip detaching memory altogether when --in-memory is specified.
> Skip unmapping configuration when it's not shared.
> Fix and add error handling to produce proper log messages.
>
> Fixes: dfbc61a2f9a6 ("mem: detach memsegs on cleanup")
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>
> Reported-by: Jie Zhou <jizh@microsoft.com>
> Suggested-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks Dmitry.
  
David Marchand April 9, 2021, noon UTC | #4
On Wed, Mar 24, 2021 at 8:32 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> @@ -1050,7 +1055,10 @@ rte_eal_memory_detach(void)
>          * config - we can't zero it out because it might still be referenced
>          * by other processes.
>          */
> -       rte_mem_unmap(mcfg, RTE_ALIGN(sizeof(*mcfg), page_sz));
> +       if (internal_conf->no_shconf == 0)
> +               if (rte_mem_unmap(mcfg, RTE_ALIGN(sizeof(*mcfg), page_sz)) != 0)
> +                       RTE_LOG(ERR, EAL, "Could not unmap shared memory config: %s\n",
> +                                       rte_strerror(rte_errno));
>         rte_eal_get_configuration()->mem_config = NULL;
>
>         return 0;

We have another issue: if EAL init fails early, the application exits
through rte_exit() -> rte_eal_cleanup() -> rte_eal_memory_detach().
The issue is not caused by this change but by
dfbc61a2f9a6 ("mem: detach memsegs on cleanup"); it only became visible
with the above log.


Example:
$ ./build/app/dpdk-testpmd --plop
...
EAL: FATAL: Invalid 'command line' arguments.
EAL: Invalid 'command line' arguments.
EAL: Error - exiting with code: 1
  Cause: Cannot init EAL: Invalid argument
EAL: Could not unmap shared memory config: Invalid argument
  

Patch

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0e99986d3d..9495170c86 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -1006,10 +1006,15 @@  rte_extmem_detach(void *va_addr, size_t len)
 int
 rte_eal_memory_detach(void)
 {
+	const struct internal_config *internal_conf =
+		eal_get_internal_configuration();
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	size_t page_sz = rte_mem_page_size();
 	unsigned int i;
 
+	if (internal_conf->in_memory == 1)
+		return 0;
+
 	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
 
 	/* detach internal memory subsystem data first */
@@ -1032,7 +1037,7 @@  rte_eal_memory_detach(void)
 		if (!msl->external)
 			if (rte_mem_unmap(msl->base_va, msl->len) != 0)
 				RTE_LOG(ERR, EAL, "Could not unmap memory: %s\n",
-						strerror(errno));
+						rte_strerror(rte_errno));
 
 		/*
 		 * we are detaching the fbarray rather than destroying because
@@ -1050,7 +1055,10 @@  rte_eal_memory_detach(void)
 	 * config - we can't zero it out because it might still be referenced
 	 * by other processes.
 	 */
-	rte_mem_unmap(mcfg, RTE_ALIGN(sizeof(*mcfg), page_sz));
+	if (internal_conf->no_shconf == 0)
+		if (rte_mem_unmap(mcfg, RTE_ALIGN(sizeof(*mcfg), page_sz)) != 0)
+			RTE_LOG(ERR, EAL, "Could not unmap shared memory config: %s\n",
+					rte_strerror(rte_errno));
 	rte_eal_get_configuration()->mem_config = NULL;
 
 	return 0;
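
The control flow this patch adds can be summarised as a small decision function. This is a sketch under stated assumptions: struct detach_plan and plan_detach() are hypothetical names introduced for illustration, and the two parameters mirror the in_memory and no_shconf fields of internal_config as they are tested in the diff above. It models only which cleanup steps run, not the unmapping itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one cleanup call would do. */
struct detach_plan {
	bool detach_memsegs; /* unmap memseg lists and detach fbarrays */
	bool unmap_config;   /* unmap the shared memory config */
};

/* Mirror the checks in the patched rte_eal_memory_detach(). */
struct detach_plan
plan_detach(int in_memory, int no_shconf)
{
	struct detach_plan plan = { false, false };

	/* --in-memory: skip detaching altogether. */
	if (in_memory == 1)
		return plan;

	plan.detach_memsegs = true;
	/* Unmap the config only when it is actually shared. */
	plan.unmap_config = (no_shconf == 0);
	return plan;
}

/* Usage: with --no-shconf alone, memsegs are detached but the config
 * is left alone. */
void
plan_detach_example(void)
{
	struct detach_plan plan = plan_detach(0, 1);

	assert(plan.detach_memsegs);
	assert(!plan.unmap_config);
}
```

Keeping the two flags as separate checks matches the diff, where the in_memory test returns before any lock is taken and the no_shconf test guards only the final unmap.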