pci: keep API compatibility with mmap values

Message ID 20200710115324.3902559-1-thomas@monjalon.net (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series pci: keep API compatibility with mmap values |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed
ci/Intel-compilation success Compilation OK
ci/iol-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Thomas Monjalon July 10, 2020, 11:53 a.m. UTC
The function pci_map_resource() returns MAP_FAILED in case of error.
When replacing the call to mmap() by rte_mem_map(),
the error code became NULL, breaking the API.
This function is probably not used outside of DPDK,
but it is still a problem for two reasons:
	- the deprecation process was not followed
	- the Linux function pci_vfio_mmap_bar() is broken for i40e

The error code is reverted to the Unix value MAP_FAILED.
Windows needs to define this special value (-1 as in Unix).
After proper deprecation process, the API could be changed again
if really needed.

Because of the switch from mmap() to rte_mem_map(),
another part of the API was changed: "int additional_flags"
are defined as "additional flags for the mapping range"
without mentioning it was directly used in mmap().
Currently it is directly used in rte_mem_map(),
that's why the values rte_map_flags must be mapped (sic) on the mmap ones
in case of Unix OS.

These are side effects of a badly defined API using Unix values.

Bugzilla ID: 503
Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
Cc: talshn@mellanox.com

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/bus/pci/bsd/pci.c                 | 2 +-
 drivers/bus/pci/linux/pci_uio.c           | 2 +-
 drivers/bus/pci/linux/pci_vfio.c          | 4 ++--
 drivers/bus/pci/pci_common_uio.c          | 2 +-
 lib/librte_eal/include/rte_eal_paging.h   | 8 ++++++++
 lib/librte_eal/windows/include/sys/mman.h | 9 +++++++++
 lib/librte_pci/rte_pci.c                  | 1 +
 lib/librte_pci/rte_pci.h                  | 2 +-
 8 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 lib/librte_eal/windows/include/sys/mman.h
  

Comments

David Marchand July 10, 2020, 1:34 p.m. UTC | #1
On Fri, Jul 10, 2020 at 1:53 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> The function pci_map_resource() returns MAP_FAILED in case of error.
> When replacing the call to mmap() by rte_mem_map(),
> the error code became NULL, breaking the API.
> This function is probably not used outside of DPDK,
> but it is still a problem for two reasons:
>         - the deprecation process was not followed
>         - the Linux function pci_vfio_mmap_bar() is broken for i40e
>
> The error code is reverted to the Unix value MAP_FAILED.
> Windows needs to define this special value (-1 as in Unix).
> After proper deprecation process, the API could be changed again
> if really needed.
>
> Because of the switch from mmap() to rte_mem_map(),
> another part of the API was changed: "int additional_flags"
> are defined as "additional flags for the mapping range"
> without mentioning it was directly used in mmap().
> Currently it is directly used in rte_mem_map(),
> that's why the values rte_map_flags must be mapped (sic) on the mmap ones
> in case of Unix OS.
>
> These are side effects of a badly defined API using Unix values.
>
> Bugzilla ID: 503
> Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> Cc: talshn@mellanox.com
>
> Reported-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  drivers/bus/pci/bsd/pci.c                 | 2 +-
>  drivers/bus/pci/linux/pci_uio.c           | 2 +-
>  drivers/bus/pci/linux/pci_vfio.c          | 4 ++--
>  drivers/bus/pci/pci_common_uio.c          | 2 +-
>  lib/librte_eal/include/rte_eal_paging.h   | 8 ++++++++
>  lib/librte_eal/windows/include/sys/mman.h | 9 +++++++++
>  lib/librte_pci/rte_pci.c                  | 1 +
>  lib/librte_pci/rte_pci.h                  | 2 +-
>  8 files changed, 24 insertions(+), 6 deletions(-)
>  create mode 100644 lib/librte_eal/windows/include/sys/mman.h
>
> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index 8bc473eb9a..6ec27b4b5b 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c
> @@ -192,7 +192,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
>         mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
>                         (size_t)dev->mem_resource[res_idx].len, 0);
>         close(fd);
> -       if (mapaddr == NULL)
> +       if (mapaddr == MAP_FAILED)
>                 goto error;
>
>         maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
> diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
> index b622001539..097dc19225 100644
> --- a/drivers/bus/pci/linux/pci_uio.c
> +++ b/drivers/bus/pci/linux/pci_uio.c
> @@ -345,7 +345,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
>         mapaddr = pci_map_resource(pci_map_addr, fd, 0,
>                         (size_t)dev->mem_resource[res_idx].len, 0);
>         close(fd);
> -       if (mapaddr == NULL)
> +       if (mapaddr == MAP_FAILED)
>                 goto error;
>
>         pci_map_addr = RTE_PTR_ADD(mapaddr,
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index fdeb9a8caf..07e072e13f 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -566,7 +566,7 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
>                 }
>
>                 /* if there's a second part, try to map it */
> -               if (map_addr != NULL
> +               if (map_addr != MAP_FAILED
>                         && memreg[1].offset && memreg[1].size) {
>                         void *second_addr = RTE_PTR_ADD(bar_addr,
>                                                 (uintptr_t)(memreg[1].offset -
> @@ -578,7 +578,7 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
>                                                         RTE_MAP_FORCE_ADDRESS);
>                 }
>
> -               if (map_addr == NULL) {
> +               if (map_addr == NULL || map_addr == MAP_FAILED) {
>                         munmap(bar_addr, bar->size);
>                         bar_addr = MAP_FAILED;
>                         RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
> index 793dfd0a7c..f4dca9da91 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -58,7 +58,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
>                                         "Cannot mmap device resource file %s to address: %p\n",
>                                         uio_res->maps[i].path,
>                                         uio_res->maps[i].addr);
> -                               if (mapaddr != NULL) {
> +                               if (mapaddr != MAP_FAILED) {
>                                         /* unmap addrs correctly mapped */
>                                         for (j = 0; j < i; j++)
>                                                 pci_unmap_resource(
> diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h
> index ed98e70e9e..680a7f2505 100644
> --- a/lib/librte_eal/include/rte_eal_paging.h
> +++ b/lib/librte_eal/include/rte_eal_paging.h
> @@ -3,6 +3,7 @@
>   */
>
>  #include <stdint.h>
> +#include <sys/mman.h>
>
>  #include <rte_compat.h>
>
> @@ -22,6 +23,7 @@ enum rte_mem_prot {
>
>  /** Additional flags for memory mapping. */
>  enum rte_map_flags {
> +#ifdef RTE_EXEC_ENV_WINDOWS
>         /** Changes to the mapped memory are visible to other processes. */
>         RTE_MAP_SHARED = 1 << 0,
>         /** Mapping is not backed by a regular file. */
> @@ -35,6 +37,12 @@ enum rte_map_flags {
>          * it is not required to do so, thus mapping with this flag may fail.
>          */
>         RTE_MAP_FORCE_ADDRESS = 1 << 3
> +#else /* map mmap flags because they are exposed in pci_map_resource() API */
> +       RTE_MAP_SHARED = MAP_SHARED,
> +       RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
> +       RTE_MAP_PRIVATE = MAP_PRIVATE,
> +       RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
> +#endif
>  };
>
>  /**
> diff --git a/lib/librte_eal/windows/include/sys/mman.h b/lib/librte_eal/windows/include/sys/mman.h
> new file mode 100644
> index 0000000000..0b4b10df1f
> --- /dev/null
> +++ b/lib/librte_eal/windows/include/sys/mman.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2020 Mellanox Technologies, Ltd
> + */
> +
> +/*
> + * The syscall mmap does not exist on Windows,
> + * but this error code is used in a badly defined DPDK API for PCI mapping.
> + */
> +#define MAP_FAILED ((void *) -1)
> diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
> index d8272b9076..1d1cbc75ac 100644
> --- a/lib/librte_pci/rte_pci.c
> +++ b/lib/librte_pci/rte_pci.c
> @@ -163,6 +163,7 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
>                         __func__, fd, requested_addr, size,
>                         (unsigned long long)offset,
>                         rte_strerror(rte_errno), mapaddr);
> +               mapaddr = MAP_FAILED; /* API uses mmap error code */
>         } else
>                 RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
>
> diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
> index 104b2bb858..a03235da1f 100644
> --- a/lib/librte_pci/rte_pci.h
> +++ b/lib/librte_pci/rte_pci.h
> @@ -160,7 +160,7 @@ int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
>   *      The additional flags for the mapping range.
>   * @return
>   *   - On success, the function returns a pointer to the mapped area.
> - *   - On error, NULL is returned.
> + *   - On error, MAP_FAILED is returned.
>   */
>  void *pci_map_resource(void *requested_addr, int fd, off_t offset,
>                 size_t size, int additional_flags);
> --
> 2.27.0
>

As we discussed offlist, I am not really ecstatic about this.
But the function was exposed, so no breaking.

Reviewed-by: David Marchand <david.marchand@redhat.com>
  
Anatoly Burakov July 10, 2020, 3:39 p.m. UTC | #2
On 10-Jul-20 12:53 PM, Thomas Monjalon wrote:
> The function pci_map_resource() returns MAP_FAILED in case of error.
> When replacing the call to mmap() by rte_mem_map(),
> the error code became NULL, breaking the API.
> This function is probably not used outside of DPDK,
> but it is still a problem for two reasons:
> 	- the deprecation process was not followed
> 	- the Linux function pci_vfio_mmap_bar() is broken for i40e
> 
> The error code is reverted to the Unix value MAP_FAILED.
> Windows needs to define this special value (-1 as in Unix).
> After proper deprecation process, the API could be changed again
> if really needed.
> 
> Because of the switch from mmap() to rte_mem_map(),
> another part of the API was changed: "int additional_flags"
> are defined as "additional flags for the mapping range"
> without mentioning it was directly used in mmap().
> Currently it is directly used in rte_mem_map(),
> that's why the values rte_map_flags must be mapped (sic) on the mmap ones
> in case of Unix OS.
> 
> These are side effects of a badly defined API using Unix values.
> 
> Bugzilla ID: 503
> Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> Cc: talshn@mellanox.com
> 
> Reported-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---

<snip>

>   					/* unmap addrs correctly mapped */
>   					for (j = 0; j < i; j++)
>   						pci_unmap_resource(
> diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h
> index ed98e70e9e..680a7f2505 100644
> --- a/lib/librte_eal/include/rte_eal_paging.h
> +++ b/lib/librte_eal/include/rte_eal_paging.h
> @@ -3,6 +3,7 @@
>    */
>   
>   #include <stdint.h>
> +#include <sys/mman.h>
>   
>   #include <rte_compat.h>
>   
> @@ -22,6 +23,7 @@ enum rte_mem_prot {
>   
>   /** Additional flags for memory mapping. */
>   enum rte_map_flags {
> +#ifdef RTE_EXEC_ENV_WINDOWS
>   	/** Changes to the mapped memory are visible to other processes. */
>   	RTE_MAP_SHARED = 1 << 0,
>   	/** Mapping is not backed by a regular file. */
> @@ -35,6 +37,12 @@ enum rte_map_flags {
>   	 * it is not required to do so, thus mapping with this flag may fail.
>   	 */
>   	RTE_MAP_FORCE_ADDRESS = 1 << 3
> +#else /* map mmap flags because they are exposed in pci_map_resource() API */
> +	RTE_MAP_SHARED = MAP_SHARED,
> +	RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
> +	RTE_MAP_PRIVATE = MAP_PRIVATE,
> +	RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
> +#endif

I'm probably missing something, but why is this needed? Doesn't 
rte_mem_map() automatically translate these flags into proper ones? 
pci_map_resource() will call rte_mem_map(), and that will translate 
these flags into their Unix equivalents.
  
Thomas Monjalon July 10, 2020, 4:17 p.m. UTC | #3
10/07/2020 17:39, Burakov, Anatoly:
> On 10-Jul-20 12:53 PM, Thomas Monjalon wrote:
> > The function pci_map_resource() returns MAP_FAILED in case of error.
> > When replacing the call to mmap() by rte_mem_map(),
> > the error code became NULL, breaking the API.
> > This function is probably not used outside of DPDK,
> > but it is still a problem for two reasons:
> > 	- the deprecation process was not followed
> > 	- the Linux function pci_vfio_mmap_bar() is broken for i40e
> > 
> > The error code is reverted to the Unix value MAP_FAILED.
> > Windows needs to define this special value (-1 as in Unix).
> > After proper deprecation process, the API could be changed again
> > if really needed.
> > 
> > Because of the switch from mmap() to rte_mem_map(),
> > another part of the API was changed: "int additional_flags"
> > are defined as "additional flags for the mapping range"
> > without mentioning it was directly used in mmap().
> > Currently it is directly used in rte_mem_map(),
> > that's why the values rte_map_flags must be mapped (sic) on the mmap ones
> > in case of Unix OS.
> > 
> > These are side effects of a badly defined API using Unix values.
[...]
> >   /** Additional flags for memory mapping. */
> >   enum rte_map_flags {
> > +#ifdef RTE_EXEC_ENV_WINDOWS
> >   	/** Changes to the mapped memory are visible to other processes. */
> >   	RTE_MAP_SHARED = 1 << 0,
> >   	/** Mapping is not backed by a regular file. */
> > @@ -35,6 +37,12 @@ enum rte_map_flags {
> >   	 * it is not required to do so, thus mapping with this flag may fail.
> >   	 */
> >   	RTE_MAP_FORCE_ADDRESS = 1 << 3
> > +#else /* map mmap flags because they are exposed in pci_map_resource() API */
> > +	RTE_MAP_SHARED = MAP_SHARED,
> > +	RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
> > +	RTE_MAP_PRIVATE = MAP_PRIVATE,
> > +	RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
> > +#endif
> 
> I'm probably missing something, but why is this needed? Doesn't 

Yes you missed reading the commit log :)
Or maybe it is not written clearly enough. Will try to rephrase.

> rte_mem_map() automatically translate these flags into proper ones? 
> pci_map_resource() will call rte_mem_map(), and that will translate 
> these flags into their Unix equivalents.

The problem is that we have an API which is taking mmap flags as input.
"int additional_flags" is a parameter of the function,
and are supposed to be mmap flags. But it is not stated clearly.
When Windows will use this function, it won't use mmap flags
but RTE_MAP_*. So we must accept both.
That's why the best is to make values the same.

In 20.11, we could change the API,
make clear that only RTE_MAP_* is accepted,
and remove this workaround.
Or even better, remove pci_map_resource from the PCI lib,
and implement it in the PCI bus driver.

pci_map_resource() function is a bad designed API
  
Thomas Monjalon July 10, 2020, 5:10 p.m. UTC | #4
10/07/2020 13:53, Thomas Monjalon:
> The function pci_map_resource() returns MAP_FAILED in case of error.
> When replacing the call to mmap() by rte_mem_map(),
> the error code became NULL, breaking the API.
> This function is probably not used outside of DPDK,
> but it is still a problem for two reasons:
> 	- the deprecation process was not followed
> 	- the Linux function pci_vfio_mmap_bar() is broken for i40e
> 
> The error code is reverted to the Unix value MAP_FAILED.
> Windows needs to define this special value (-1 as in Unix).
> After proper deprecation process, the API could be changed again
> if really needed.
> 
> Because of the switch from mmap() to rte_mem_map(),
> another part of the API was changed: "int additional_flags"
> are defined as "additional flags for the mapping range"
> without mentioning it was directly used in mmap().
> Currently it is directly used in rte_mem_map(),
> that's why the values rte_map_flags must be mapped (sic) on the mmap ones
> in case of Unix OS.
> 
> These are side effects of a badly defined API using Unix values.
> 
> Bugzilla ID: 503

This is urgent. i40e does not probe in some cases.
If no test is done, I don't merge it and Intel will complain about
validation issue in 20.08-rc1.
And next time, I won't rush on bug fixing.
  
Dmitry Kozlyuk July 10, 2020, 6:31 p.m. UTC | #5
My apologies for not examining mmap() usage thoroughly enough when designing
these internal APIs. I support your suggestion to move PCI mapping functions
to bus driver after proper deprecation.

On Fri, 10 Jul 2020 13:53:24 +0200, Thomas Monjalon wrote:
[snip]
> diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h
> index ed98e70e9e..680a7f2505 100644
> --- a/lib/librte_eal/include/rte_eal_paging.h
> +++ b/lib/librte_eal/include/rte_eal_paging.h
> @@ -3,6 +3,7 @@
>   */
>  
>  #include <stdint.h>
> +#include <sys/mman.h>
>  
>  #include <rte_compat.h>
>  
> @@ -22,6 +23,7 @@ enum rte_mem_prot {
>  
>  /** Additional flags for memory mapping. */
>  enum rte_map_flags {
> +#ifdef RTE_EXEC_ENV_WINDOWS
>  	/** Changes to the mapped memory are visible to other processes. */
>  	RTE_MAP_SHARED = 1 << 0,
>  	/** Mapping is not backed by a regular file. */
> @@ -35,6 +37,12 @@ enum rte_map_flags {
>  	 * it is not required to do so, thus mapping with this flag may fail.
>  	 */
>  	RTE_MAP_FORCE_ADDRESS = 1 << 3
> +#else /* map mmap flags because they are exposed in pci_map_resource() API */
> +	RTE_MAP_SHARED = MAP_SHARED,
> +	RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
> +	RTE_MAP_PRIVATE = MAP_PRIVATE,
> +	RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
> +#endif
>  };
>  
>  /**
> diff --git a/lib/librte_eal/windows/include/sys/mman.h b/lib/librte_eal/windows/include/sys/mman.h
> new file mode 100644
> index 0000000000..0b4b10df1f
> --- /dev/null
> +++ b/lib/librte_eal/windows/include/sys/mman.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2020 Mellanox Technologies, Ltd
> + */
> +
> +/*
> + * The syscall mmap does not exist on Windows,
> + * but this error code is used in a badly defined DPDK API for PCI mapping.
> + */
> +#define MAP_FAILED ((void *) -1)

I see why we can't help exposing <sys/mman.h> in <rte_eal_paging.h>, but I'm
against adding this stub header. Why not make #include <sys/mman.h>
conditional and define MAP_FAIELD in <rte_os.h> for Windows?

(While networking headers we have define things that are common to all
platforms, just expressed in incompatible ways, memory management is
inherently OS-specific.)
  
Thomas Monjalon July 10, 2020, 8:02 p.m. UTC | #6
10/07/2020 20:31, Dmitry Kozlyuk:
> On Fri, 10 Jul 2020 13:53:24 +0200, Thomas Monjalon wrote:
> > --- /dev/null
> > +++ b/lib/librte_eal/windows/include/sys/mman.h
> > @@ -0,0 +1,9 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2020 Mellanox Technologies, Ltd
> > + */
> > +
> > +/*
> > + * The syscall mmap does not exist on Windows,
> > + * but this error code is used in a badly defined DPDK API for PCI mapping.
> > + */
> > +#define MAP_FAILED ((void *) -1)
> 
> I see why we can't help exposing <sys/mman.h> in <rte_eal_paging.h>, but I'm
> against adding this stub header. Why not make #include <sys/mman.h>
> conditional and define MAP_FAIELD in <rte_os.h> for Windows?

Yes this macro can be in lib/librte_eal/windows/include/rte_os.h
I'll do a v2.
  
Ma, LihongX July 11, 2020, 3:18 a.m. UTC | #7
tested-by:ma,Lihong<lihongx.ma@intel.com>

Regards,
Ma,lihong

-----Original Message-----
From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
Sent: Friday, July 10, 2020 7:53 PM
To: dev@dpdk.org
Cc: david.marchand@redhat.com; Yigit, Ferruh <ferruh.yigit@intel.com>; grive@u256.net; Zhang, AlvinX <alvinx.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Guo, Jia <jia.guo@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; dmitry.kozliuk@gmail.com; navasile@linux.microsoft.com; dmitrym@microsoft.com; Kadam, Pallavi <pallavi.kadam@intel.com>; talshn@mellanox.com
Subject: [dpdk-dev] [PATCH] pci: keep API compatibility with mmap values

The function pci_map_resource() returns MAP_FAILED in case of error.
When replacing the call to mmap() by rte_mem_map(), the error code became NULL, breaking the API.
This function is probably not used outside of DPDK, but it is still a problem for two reasons:
	- the deprecation process was not followed
	- the Linux function pci_vfio_mmap_bar() is broken for i40e

The error code is reverted to the Unix value MAP_FAILED.
Windows needs to define this special value (-1 as in Unix).
After proper deprecation process, the API could be changed again if really needed.

Because of the switch from mmap() to rte_mem_map(), another part of the API was changed: "int additional_flags"
are defined as "additional flags for the mapping range"
without mentioning it was directly used in mmap().
Currently it is directly used in rte_mem_map(), that's why the values rte_map_flags must be mapped (sic) on the mmap ones in case of Unix OS.

These are side effects of a badly defined API using Unix values.

Bugzilla ID: 503
Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
Cc: talshn@mellanox.com

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 drivers/bus/pci/bsd/pci.c                 | 2 +-
 drivers/bus/pci/linux/pci_uio.c           | 2 +-
 drivers/bus/pci/linux/pci_vfio.c          | 4 ++--
 drivers/bus/pci/pci_common_uio.c          | 2 +-
 lib/librte_eal/include/rte_eal_paging.h   | 8 ++++++++
 lib/librte_eal/windows/include/sys/mman.h | 9 +++++++++
 lib/librte_pci/rte_pci.c                  | 1 +
 lib/librte_pci/rte_pci.h                  | 2 +-
 8 files changed, 24 insertions(+), 6 deletions(-)  create mode 100644 lib/librte_eal/windows/include/sys/mman.h

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c index 8bc473eb9a..6ec27b4b5b 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -192,7 +192,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 	mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
 			(size_t)dev->mem_resource[res_idx].len, 0);
 	close(fd);
-	if (mapaddr == NULL)
+	if (mapaddr == MAP_FAILED)
 		goto error;
 
 	maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c index b622001539..097dc19225 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -345,7 +345,7 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 	mapaddr = pci_map_resource(pci_map_addr, fd, 0,
 			(size_t)dev->mem_resource[res_idx].len, 0);
 	close(fd);
-	if (mapaddr == NULL)
+	if (mapaddr == MAP_FAILED)
 		goto error;
 
 	pci_map_addr = RTE_PTR_ADD(mapaddr,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fdeb9a8caf..07e072e13f 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -566,7 +566,7 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 		}
 
 		/* if there's a second part, try to map it */
-		if (map_addr != NULL
+		if (map_addr != MAP_FAILED
 			&& memreg[1].offset && memreg[1].size) {
 			void *second_addr = RTE_PTR_ADD(bar_addr,
 						(uintptr_t)(memreg[1].offset -
@@ -578,7 +578,7 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 							RTE_MAP_FORCE_ADDRESS);
 		}
 
-		if (map_addr == NULL) {
+		if (map_addr == NULL || map_addr == MAP_FAILED) {
 			munmap(bar_addr, bar->size);
 			bar_addr = MAP_FAILED;
 			RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n", diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 793dfd0a7c..f4dca9da91 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -58,7 +58,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 					"Cannot mmap device resource file %s to address: %p\n",
 					uio_res->maps[i].path,
 					uio_res->maps[i].addr);
-				if (mapaddr != NULL) {
+				if (mapaddr != MAP_FAILED) {
 					/* unmap addrs correctly mapped */
 					for (j = 0; j < i; j++)
 						pci_unmap_resource(
diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h
index ed98e70e9e..680a7f2505 100644
--- a/lib/librte_eal/include/rte_eal_paging.h
+++ b/lib/librte_eal/include/rte_eal_paging.h
@@ -3,6 +3,7 @@
  */
 
 #include <stdint.h>
+#include <sys/mman.h>
 
 #include <rte_compat.h>
 
@@ -22,6 +23,7 @@ enum rte_mem_prot {
 
 /** Additional flags for memory mapping. */  enum rte_map_flags {
+#ifdef RTE_EXEC_ENV_WINDOWS
 	/** Changes to the mapped memory are visible to other processes. */
 	RTE_MAP_SHARED = 1 << 0,
 	/** Mapping is not backed by a regular file. */ @@ -35,6 +37,12 @@ enum rte_map_flags {
 	 * it is not required to do so, thus mapping with this flag may fail.
 	 */
 	RTE_MAP_FORCE_ADDRESS = 1 << 3
+#else /* map mmap flags because they are exposed in pci_map_resource() API */
+	RTE_MAP_SHARED = MAP_SHARED,
+	RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
+	RTE_MAP_PRIVATE = MAP_PRIVATE,
+	RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/windows/include/sys/mman.h b/lib/librte_eal/windows/include/sys/mman.h
new file mode 100644
index 0000000000..0b4b10df1f
--- /dev/null
+++ b/lib/librte_eal/windows/include/sys/mman.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd  */
+
+/*
+ * The syscall mmap does not exist on Windows,
+ * but this error code is used in a badly defined DPDK API for PCI mapping.
+ */
+#define MAP_FAILED ((void *) -1)
diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c index d8272b9076..1d1cbc75ac 100644
--- a/lib/librte_pci/rte_pci.c
+++ b/lib/librte_pci/rte_pci.c
@@ -163,6 +163,7 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
 			__func__, fd, requested_addr, size,
 			(unsigned long long)offset,
 			rte_strerror(rte_errno), mapaddr);
+		mapaddr = MAP_FAILED; /* API uses mmap error code */
 	} else
 		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
 
diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h index 104b2bb858..a03235da1f 100644
--- a/lib/librte_pci/rte_pci.h
+++ b/lib/librte_pci/rte_pci.h
@@ -160,7 +160,7 @@ int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
  *      The additional flags for the mapping range.
  * @return
  *   - On success, the function returns a pointer to the mapped area.
- *   - On error, NULL is returned.
+ *   - On error, MAP_FAILED is returned.
  */
 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 		size_t size, int additional_flags);
--
2.27.0
  
Anatoly Burakov July 13, 2020, 8:56 a.m. UTC | #8
On 10-Jul-20 5:17 PM, Thomas Monjalon wrote:
> 10/07/2020 17:39, Burakov, Anatoly:
>> On 10-Jul-20 12:53 PM, Thomas Monjalon wrote:
>>> The function pci_map_resource() returns MAP_FAILED in case of error.
>>> When replacing the call to mmap() by rte_mem_map(),
>>> the error code became NULL, breaking the API.
>>> This function is probably not used outside of DPDK,
>>> but it is still a problem for two reasons:
>>> 	- the deprecation process was not followed
>>> 	- the Linux function pci_vfio_mmap_bar() is broken for i40e
>>>
>>> The error code is reverted to the Unix value MAP_FAILED.
>>> Windows needs to define this special value (-1 as in Unix).
>>> After proper deprecation process, the API could be changed again
>>> if really needed.
>>>
>>> Because of the switch from mmap() to rte_mem_map(),
>>> another part of the API was changed: "int additional_flags"
>>> are defined as "additional flags for the mapping range"
>>> without mentioning it was directly used in mmap().
>>> Currently it is directly used in rte_mem_map(),
>>> that's why the values rte_map_flags must be mapped (sic) on the mmap ones
>>> in case of Unix OS.
>>>
>>> These are side effects of a badly defined API using Unix values.
> [...]
>>>    /** Additional flags for memory mapping. */
>>>    enum rte_map_flags {
>>> +#ifdef RTE_EXEC_ENV_WINDOWS
>>>    	/** Changes to the mapped memory are visible to other processes. */
>>>    	RTE_MAP_SHARED = 1 << 0,
>>>    	/** Mapping is not backed by a regular file. */
>>> @@ -35,6 +37,12 @@ enum rte_map_flags {
>>>    	 * it is not required to do so, thus mapping with this flag may fail.
>>>    	 */
>>>    	RTE_MAP_FORCE_ADDRESS = 1 << 3
>>> +#else /* map mmap flags because they are exposed in pci_map_resource() API */
>>> +	RTE_MAP_SHARED = MAP_SHARED,
>>> +	RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
>>> +	RTE_MAP_PRIVATE = MAP_PRIVATE,
>>> +	RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
>>> +#endif
>>
>> I'm probably missing something, but why is this needed? Doesn't
> 
> Yes you missed reading the commit log :)
> Or maybe it is not written clearly enough. Will try to rephrase.
> 
>> rte_mem_map() automatically translate these flags into proper ones?
>> pci_map_resource() will call rte_mem_map(), and that will translate
>> these flags into their Unix equivalents.
> 
> The problem is that we have an API which is taking mmap flags as input.
> "int additional_flags" is a parameter of the function,
> and are supposed to be mmap flags. But it is not stated clearly.
> When Windows will use this function, it won't use mmap flags
> but RTE_MAP_*. So we must accept both.
> That's why the best is to make values the same.
> 
> In 20.11, we could change the API,
> make clear that only RTE_MAP_* is accepted,
> and remove this workaround.
> Or even better, remove pci_map_resource from the PCI lib,
> and implement it in the PCI bus driver.
> 
> pci_map_resource() function is a bad designed API
> 
> 

Right, this makes it clearer :) Thanks!
  
David Marchand July 15, 2020, 8:01 a.m. UTC | #9
On Fri, Jul 10, 2020 at 6:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> The problem is that we have an API which is taking mmap flags as input.
> "int additional_flags" is a parameter of the function,
> and are supposed to be mmap flags. But it is not stated clearly.
> When Windows will use this function, it won't use mmap flags
> but RTE_MAP_*. So we must accept both.
> That's why the best is to make values the same.
>
> In 20.11, we could change the API,
> make clear that only RTE_MAP_* is accepted,
> and remove this workaround.
> Or even better, remove pci_map_resource from the PCI lib,
> and implement it in the PCI bus driver.

My preference goes to simple removal.
Maybe it can be handled/rediscussed when the windows porting touches
pci resources mappings.

Removal needs to be announced, so something to plan right now.
  

Patch

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 8bc473eb9a..6ec27b4b5b 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -192,7 +192,7 @@  pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 	mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
 			(size_t)dev->mem_resource[res_idx].len, 0);
 	close(fd);
-	if (mapaddr == NULL)
+	if (mapaddr == MAP_FAILED)
 		goto error;
 
 	maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index b622001539..097dc19225 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -345,7 +345,7 @@  pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 	mapaddr = pci_map_resource(pci_map_addr, fd, 0,
 			(size_t)dev->mem_resource[res_idx].len, 0);
 	close(fd);
-	if (mapaddr == NULL)
+	if (mapaddr == MAP_FAILED)
 		goto error;
 
 	pci_map_addr = RTE_PTR_ADD(mapaddr,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fdeb9a8caf..07e072e13f 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -566,7 +566,7 @@  pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 		}
 
 		/* if there's a second part, try to map it */
-		if (map_addr != NULL
+		if (map_addr != MAP_FAILED
 			&& memreg[1].offset && memreg[1].size) {
 			void *second_addr = RTE_PTR_ADD(bar_addr,
 						(uintptr_t)(memreg[1].offset -
@@ -578,7 +578,7 @@  pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 							RTE_MAP_FORCE_ADDRESS);
 		}
 
-		if (map_addr == NULL) {
+		if (map_addr == NULL || map_addr == MAP_FAILED) {
 			munmap(bar_addr, bar->size);
 			bar_addr = MAP_FAILED;
 			RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 793dfd0a7c..f4dca9da91 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -58,7 +58,7 @@  pci_uio_map_secondary(struct rte_pci_device *dev)
 					"Cannot mmap device resource file %s to address: %p\n",
 					uio_res->maps[i].path,
 					uio_res->maps[i].addr);
-				if (mapaddr != NULL) {
+				if (mapaddr != MAP_FAILED) {
 					/* unmap addrs correctly mapped */
 					for (j = 0; j < i; j++)
 						pci_unmap_resource(
diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h
index ed98e70e9e..680a7f2505 100644
--- a/lib/librte_eal/include/rte_eal_paging.h
+++ b/lib/librte_eal/include/rte_eal_paging.h
@@ -3,6 +3,7 @@ 
  */
 
 #include <stdint.h>
+#include <sys/mman.h>
 
 #include <rte_compat.h>
 
@@ -22,6 +23,7 @@  enum rte_mem_prot {
 
 /** Additional flags for memory mapping. */
 enum rte_map_flags {
+#ifdef RTE_EXEC_ENV_WINDOWS
 	/** Changes to the mapped memory are visible to other processes. */
 	RTE_MAP_SHARED = 1 << 0,
 	/** Mapping is not backed by a regular file. */
@@ -35,6 +37,12 @@  enum rte_map_flags {
 	 * it is not required to do so, thus mapping with this flag may fail.
 	 */
 	RTE_MAP_FORCE_ADDRESS = 1 << 3
+#else /* map mmap flags because they are exposed in pci_map_resource() API */
+	RTE_MAP_SHARED = MAP_SHARED,
+	RTE_MAP_ANONYMOUS = MAP_ANONYMOUS,
+	RTE_MAP_PRIVATE = MAP_PRIVATE,
+	RTE_MAP_FORCE_ADDRESS = MAP_FIXED,
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/windows/include/sys/mman.h b/lib/librte_eal/windows/include/sys/mman.h
new file mode 100644
index 0000000000..0b4b10df1f
--- /dev/null
+++ b/lib/librte_eal/windows/include/sys/mman.h
@@ -0,0 +1,9 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+/*
+ * The syscall mmap does not exist on Windows,
+ * but this error code is used in a badly defined DPDK API for PCI mapping.
+ */
+#define MAP_FAILED ((void *) -1)
diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
index d8272b9076..1d1cbc75ac 100644
--- a/lib/librte_pci/rte_pci.c
+++ b/lib/librte_pci/rte_pci.c
@@ -163,6 +163,7 @@  pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
 			__func__, fd, requested_addr, size,
 			(unsigned long long)offset,
 			rte_strerror(rte_errno), mapaddr);
+		mapaddr = MAP_FAILED; /* API uses mmap error code */
 	} else
 		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
 
diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
index 104b2bb858..a03235da1f 100644
--- a/lib/librte_pci/rte_pci.h
+++ b/lib/librte_pci/rte_pci.h
@@ -160,7 +160,7 @@  int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
  *      The additional flags for the mapping range.
  * @return
  *   - On success, the function returns a pointer to the mapped area.
- *   - On error, NULL is returned.
+ *   - On error, MAP_FAILED is returned.
  */
 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 		size_t size, int additional_flags);