[2/2] eal: fix loading of shared libs from driver plugin directories

Message ID 4a04b06b040ac16c45addf92fb43f59b070f185a.1606229937.git.tredaelli@redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series Fix shared lib detection on Fedora/CentOS/RHEL |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed

Commit Message

Timothy Redaelli Nov. 24, 2020, 3:14 p.m. UTC
  Commit 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
introduced a check that any shared library must ends with .so, but it can't
work, at least, on Fedora/CentOS/RHEL since .so symlinks are not installed
when you install dpdk package, but only when you install dpdk-devel package.

This commit adds also a check for .so.ABI_VERSION to check for shared
lib.

See Fedora Packaging Guidelines for more informations:
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages

Fixes: 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
Cc: bruce.richardson@intel.com
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
---
 lib/librte_eal/common/eal_common_options.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

Bruce Richardson Nov. 24, 2020, 3:22 p.m. UTC | #1
On Tue, Nov 24, 2020 at 04:14:15PM +0100, Timothy Redaelli wrote:
> Commit 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
> introduced a check that any shared library must ends with .so, but it can't
> work, at least, on Fedora/CentOS/RHEL since .so symlinks are not installed
> when you install dpdk package, but only when you install dpdk-devel package.
> 
> This commit adds also a check for .so.ABI_VERSION to check for shared
> lib.
> 
> See Fedora Packaging Guidelines for more informations:
> https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages
> 
> Fixes: 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
> Cc: bruce.richardson@intel.com
> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
David Marchand Nov. 25, 2020, 10:01 a.m. UTC | #2
On Tue, Nov 24, 2020 at 4:15 PM Timothy Redaelli <tredaelli@redhat.com> wrote:
>
> Commit 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
> introduced a check that any shared library must ends with .so, but it can't
> work, at least, on Fedora/CentOS/RHEL since .so symlinks are not installed
> when you install dpdk package, but only when you install dpdk-devel package.
>
> This commit adds also a check for .so.ABI_VERSION to check for shared
> lib.
>
> See Fedora Packaging Guidelines for more informations:
> https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages
>
> Fixes: 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
> Cc: bruce.richardson@intel.com
> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
> ---
>  lib/librte_eal/common/eal_common_options.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
> index e6f77ad217..e9e2362c53 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -402,8 +402,10 @@ eal_plugindir_init(const char *path)
>                 struct stat sb;
>                 int nlen = strnlen(dent->d_name, sizeof(dent->d_name));
>
> -               /* check if name ends in .so */
> -               if (strcmp(&dent->d_name[nlen - 3], ".so") != 0)
> +               /* check if name ends in .so or .so.ABI_VERSION */
> +               if (strcmp(&dent->d_name[nlen - 3], ".so") != 0 &&
> +                   strcmp(&dent->d_name[nlen - 4 - strlen(ABI_VERSION)],
> +                          ".so."ABI_VERSION) != 0)
>                         continue;
>
>                 snprintf(sopath, sizeof(sopath), "%s/%s", path, dent->d_name);

A first doubt I had about this patch is about multiple loading of the
same driver, and its effects.
So I first had a look at the main branch current state (using
devtools/test-null.sh) and I can see multiple loading already happens
for statically linked drivers like the bond driver.

$ LD_LIBRARY_PATH=build/lib gdb build/app/dpdk-testpmd -ex "b main"
-ex "run -c 3 --no-huge -m 20 -d build/drivers '--log-level=*:debug'
-a 0:0.0 --vdev net_null1 --vdev net_null2 -- --no-mlockall
--total-num-mbufs=2048 -ia"
...
(gdb) info sharedlibrary
...
0x00007ffff796d5b0  0x00007ffff797d2d6  Yes
/home/dmarchan/dpdk/build/app/../drivers/librte_net_bond.so.21
...
(gdb) b rte_eal_init
(gdb) c
(gdb) finish
...
EAL: open shared lib build/drivers/librte_net_avp.so
EAL: open shared lib build/drivers/librte_net_bond.so
EAL: open shared lib build/drivers/librte_net_ixgbe.so
...
(gdb) set $elm=*(struct shared_driver *)solib_list.tqh_first
(gdb) while $elm
p $elm
set $elm=*(struct shared_driver *)($elm.next.tqe_next)
end
...
$67 = {next = {tqe_next = 0x5c3350, tqe_prev = 0x5c1310}, name =
"build/drivers/librte_net_avp.so", '\000' <repeats 4064 times>,
lib_handle = 0x616870}
$68 = {next = {tqe_next = 0x5c4370, tqe_prev = 0x5c2330}, name =
"build/drivers/librte_net_bond.so", '\000' <repeats 4063 times>,
lib_handle = 0x7ffff7995a80}
$69 = {next = {tqe_next = 0x5c5390, tqe_prev = 0x5c3350}, name =
"build/drivers/librte_net_ixgbe.so", '\000' <repeats 4062 times>,
lib_handle = 0x7ffff7942fc0}
...

I could not confirm the 0x7ffff7995a80 handle points at the first load
of the bond driver.
gdb does not seem to know of the linker internal structures, a bit
hard to tell but the addresses are in the same range, so likely the
same object.

I confirmed the bond constructor only being called once when the bond
driver is loaded because of static link:

$ LD_LIBRARY_PATH=build/lib gdb build/app/dpdk-testpmd -ex "b
vdrvinitfn_pmd_bond_drv" -ex "run -c 3 --no-huge -m 20 -d
build/drivers '--log-level=*:debug' -a 0:0.0 --vdev net_null1 --vdev
net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia"
GNU gdb (GDB) Fedora 8.3.50.20190824-30.fc31
...
Reading symbols from build/app/dpdk-testpmd...
Function "vdrvinitfn_pmd_bond_drv" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (vdrvinitfn_pmd_bond_drv) pending.
Starting program: /home/dmarchan/dpdk/build/app/dpdk-testpmd -c 3
--no-huge -m 20 -d build/drivers '--log-level=*:debug' -a 0:0.0 --vdev
net_null1 --vdev net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".

Breakpoint 1, vdrvinitfn_pmd_bond_drv () at
../drivers/net/bonding/rte_eth_bond_pmd.c:3755
3755    RTE_PMD_REGISTER_VDEV(net_bonding, pmd_bond_drv);
Missing separate debuginfos, use: dnf debuginfo-install
elfutils-libelf-0.179-2.fc31.x86_64 glibc-2.30-13.fc31.x86_64
jansson-2.12-4.fc31.x86_64 libbsd-0.9.1-4.fc31.x86_64
libfdt-1.6.0-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
zlib-1.2.11-20.fc31.x86_64

(gdb) c
Continuing.
...
EAL: open shared lib build/drivers/librte_net_avp.so
EAL: open shared lib build/drivers/librte_net_bond.so
EAL: open shared lib build/drivers/librte_net_ixgbe.so
...
testpmd>


Now, about this patch.
In this test, it has the effect of loading all drivers twice (or
thrice if statically linked).
I see no problem with it since the linker is intelligent enough and
constructors are only called once.


My only remaining doubt is whether we should be looking for
.ABI_VERSION or .ABI_MAJOR extension.
This could be discussed, but in its current form, this patch lgtm.

Acked-by: David Marchand <david.marchand@redhat.com>
  
Ferruh Yigit Nov. 25, 2020, 11:05 a.m. UTC | #3
On 11/25/2020 10:01 AM, David Marchand wrote:
> On Tue, Nov 24, 2020 at 4:15 PM Timothy Redaelli <tredaelli@redhat.com> wrote:
>>
>> Commit 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
>> introduced a check that any shared library must ends with .so, but it can't
>> work, at least, on Fedora/CentOS/RHEL since .so symlinks are not installed
>> when you install dpdk package, but only when you install dpdk-devel package.
>>
>> This commit adds also a check for .so.ABI_VERSION to check for shared
>> lib.
>>
>> See Fedora Packaging Guidelines for more informations:
>> https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages
>>
>> Fixes: 49b536fc3060 ("eal: load only shared libs from driver plugin directories")
>> Cc: bruce.richardson@intel.com
>> Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
>> ---
>>   lib/librte_eal/common/eal_common_options.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
>> index e6f77ad217..e9e2362c53 100644
>> --- a/lib/librte_eal/common/eal_common_options.c
>> +++ b/lib/librte_eal/common/eal_common_options.c
>> @@ -402,8 +402,10 @@ eal_plugindir_init(const char *path)
>>                  struct stat sb;
>>                  int nlen = strnlen(dent->d_name, sizeof(dent->d_name));
>>
>> -               /* check if name ends in .so */
>> -               if (strcmp(&dent->d_name[nlen - 3], ".so") != 0)
>> +               /* check if name ends in .so or .so.ABI_VERSION */
>> +               if (strcmp(&dent->d_name[nlen - 3], ".so") != 0 &&
>> +                   strcmp(&dent->d_name[nlen - 4 - strlen(ABI_VERSION)],
>> +                          ".so."ABI_VERSION) != 0)
>>                          continue;
>>
>>                  snprintf(sopath, sizeof(sopath), "%s/%s", path, dent->d_name);
> 
> A first doubt I had about this patch is about multiple loading of the
> same driver, and its effects.
> So I first had a look at the main branch current state (using
> devtools/test-null.sh) and I can see multiple loading already happens
> for statically linked drivers like the bond driver.
> 
> $ LD_LIBRARY_PATH=build/lib gdb build/app/dpdk-testpmd -ex "b main"
> -ex "run -c 3 --no-huge -m 20 -d build/drivers '--log-level=*:debug'
> -a 0:0.0 --vdev net_null1 --vdev net_null2 -- --no-mlockall
> --total-num-mbufs=2048 -ia"
> ...
> (gdb) info sharedlibrary
> ...
> 0x00007ffff796d5b0  0x00007ffff797d2d6  Yes
> /home/dmarchan/dpdk/build/app/../drivers/librte_net_bond.so.21
> ...
> (gdb) b rte_eal_init
> (gdb) c
> (gdb) finish
> ...
> EAL: open shared lib build/drivers/librte_net_avp.so
> EAL: open shared lib build/drivers/librte_net_bond.so
> EAL: open shared lib build/drivers/librte_net_ixgbe.so
> ...
> (gdb) set $elm=*(struct shared_driver *)solib_list.tqh_first
> (gdb) while $elm
> p $elm
> set $elm=*(struct shared_driver *)($elm.next.tqe_next)
> end
> ...
> $67 = {next = {tqe_next = 0x5c3350, tqe_prev = 0x5c1310}, name =
> "build/drivers/librte_net_avp.so", '\000' <repeats 4064 times>,
> lib_handle = 0x616870}
> $68 = {next = {tqe_next = 0x5c4370, tqe_prev = 0x5c2330}, name =
> "build/drivers/librte_net_bond.so", '\000' <repeats 4063 times>,
> lib_handle = 0x7ffff7995a80}
> $69 = {next = {tqe_next = 0x5c5390, tqe_prev = 0x5c3350}, name =
> "build/drivers/librte_net_ixgbe.so", '\000' <repeats 4062 times>,
> lib_handle = 0x7ffff7942fc0}
> ...
> 
> I could not confirm the 0x7ffff7995a80 handle points at the first load
> of the bond driver.
> gdb does not seem to know of the linker internal structures, a bit
> hard to tell but the addresses are in the same range, so likely the
> same object.
> 
> I confirmed the bond constructor only being called once when the bond
> driver is loaded because of static link:
> 
> $ LD_LIBRARY_PATH=build/lib gdb build/app/dpdk-testpmd -ex "b
> vdrvinitfn_pmd_bond_drv" -ex "run -c 3 --no-huge -m 20 -d
> build/drivers '--log-level=*:debug' -a 0:0.0 --vdev net_null1 --vdev
> net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia"
> GNU gdb (GDB) Fedora 8.3.50.20190824-30.fc31
> ...
> Reading symbols from build/app/dpdk-testpmd...
> Function "vdrvinitfn_pmd_bond_drv" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (vdrvinitfn_pmd_bond_drv) pending.
> Starting program: /home/dmarchan/dpdk/build/app/dpdk-testpmd -c 3
> --no-huge -m 20 -d build/drivers '--log-level=*:debug' -a 0:0.0 --vdev
> net_null1 --vdev net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib64/libthread_db.so.1".
> 
> Breakpoint 1, vdrvinitfn_pmd_bond_drv () at
> ../drivers/net/bonding/rte_eth_bond_pmd.c:3755
> 3755    RTE_PMD_REGISTER_VDEV(net_bonding, pmd_bond_drv);
> Missing separate debuginfos, use: dnf debuginfo-install
> elfutils-libelf-0.179-2.fc31.x86_64 glibc-2.30-13.fc31.x86_64
> jansson-2.12-4.fc31.x86_64 libbsd-0.9.1-4.fc31.x86_64
> libfdt-1.6.0-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
> zlib-1.2.11-20.fc31.x86_64
> 
> (gdb) c
> Continuing.
> ...
> EAL: open shared lib build/drivers/librte_net_avp.so
> EAL: open shared lib build/drivers/librte_net_bond.so
> EAL: open shared lib build/drivers/librte_net_ixgbe.so
> ...
> testpmd>
> 
> 
> Now, about this patch.
> In this test, it has the effect of loading all drivers twice (or
> thrice if statically linked).
> I see no problem with it since the linker is intelligent enough and
> constructors are only called once.
> 
> 
> My only remaining doubt is whether we should be looking for
> .ABI_VERSION or .ABI_MAJOR extension.
> This could be discussed, but in its current form, this patch lgtm.
> 
> Acked-by: David Marchand <david.marchand@redhat.com>
> 
> 

I also checked the same, yes same file processed twice, with actual name and 
symlink, but since eal using 'realpath()' before 'dlopen()' the path passed to 
the 'dlopen()' is exact same for both symlink and actual file.

And 'dlopen()' has following in the man page:
"
  If  the  same  shared object is opened again with dlopen(), the same object 
handle is returned.  The dynamic linker maintains reference counts for object 
handles, so a dynamically loaded shared object is not deallocated until 
dlclose() has been called on it as many times as dlopen() has succeeded on it. 
Constructors (see below) are called only when the object is actually loaded into 
memory (i.e., when the reference count increases to 1).
"

As I run/debug with or without the symlinks, it looks OK with patch,
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
  

Patch

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index e6f77ad217..e9e2362c53 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -402,8 +402,10 @@  eal_plugindir_init(const char *path)
 		struct stat sb;
 		int nlen = strnlen(dent->d_name, sizeof(dent->d_name));
 
-		/* check if name ends in .so */
-		if (strcmp(&dent->d_name[nlen - 3], ".so") != 0)
+		/* check if name ends in .so or .so.ABI_VERSION */
+		if (strcmp(&dent->d_name[nlen - 3], ".so") != 0 &&
+		    strcmp(&dent->d_name[nlen - 4 - strlen(ABI_VERSION)],
+			   ".so."ABI_VERSION) != 0)
 			continue;
 
 		snprintf(sopath, sizeof(sopath), "%s/%s", path, dent->d_name);