build: avoid --as-needed as it causes overlinking

Message ID 20190828122752.27887-1-christian.ehrhardt@canonical.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series build: avoid --as-needed as it causes overlinking |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-dpdk_compile_ovs success Compile Testing PASS
ci/iol-dpdk_compile_spdk success Compile Testing PASS
ci/iol-dpdk_compile success Compile Testing PASS
ci/intel-Performance success Performance Testing PASS
ci/mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Christian Ehrhardt Aug. 28, 2019, 12:27 p.m. UTC
  A while ago telemetry was added in 57ae0ec6 and it also added as-needed
to config/meson.build. This seems no more needed these days as due to other
build changes the ordering in buildlogs is:
  [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
Which means telemetry no more benefits from --no-as-needed anyway.

Overlinking problems get triggered by the meson generated pkgconfig which
will have:
   [...] -Wl,--no-as-needed <somelibsusedbydpdk>
This will overlink <somelibs> and in addition anything that follows
as it also doesn't wrap back to --as-needed. So if a projects includes
dpdk libs + <other> it will also consider <other> with --no-as-needed.

Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 config/meson.build | 3 ---
 1 file changed, 3 deletions(-)
  

Comments

Luca Boccassi Aug. 28, 2019, 1:49 p.m. UTC | #1
On Wed, 2019-08-28 at 14:27 +0200, Christian Ehrhardt wrote:
> A while ago telemetry was added in 57ae0ec6 and it also added as-
> needed
> to config/meson.build. This seems no more needed these days as due to
> other
> build changes the ordering in buildlogs is:
>   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
> Which means telemetry no more benefits from --no-as-needed anyway.
> 
> Overlinking problems get triggered by the meson generated pkgconfig
> which
> will have:
>    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
> This will overlink <somelibs> and in addition anything that follows
> as it also doesn't wrap back to --as-needed. So if a projects
> includes
> dpdk libs + <other> it will also consider <other> with --no-as-
> needed.
> 
> Fixes: 
> https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
> 
> 
> Signed-off-by: Christian Ehrhardt <
> christian.ehrhardt@canonical.com
> >
> ---
>  config/meson.build | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/config/meson.build b/config/meson.build
> index 2bafea530..58800a980 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -93,9 +93,6 @@ dpdk_conf.set('RTE_TOOLCHAIN_' +
> toolchain.to_upper(), 1)
>  
>  dpdk_conf.set('RTE_ARCH_64', cc.sizeof('void *') == 8)
>  
> -add_project_link_arguments('-Wl,--no-as-needed', language: 'c')
> -dpdk_extra_ldflags += '-Wl,--no-as-needed'
> -
>  # use pthreads
>  add_project_link_arguments('-pthread', language: 'c')
>  dpdk_extra_ldflags += '-pthread'
> 

Acked-by: Luca Boccassi <bluca@debian.org>
  
Aaron Conole Aug. 28, 2019, 1:53 p.m. UTC | #2
Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:

> A while ago telemetry was added in 57ae0ec6 and it also added as-needed
> to config/meson.build. This seems no more needed these days as due to other
> build changes the ordering in buildlogs is:
>   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
> Which means telemetry no more benefits from --no-as-needed anyway.
>
> Overlinking problems get triggered by the meson generated pkgconfig which
> will have:
>    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
> This will overlink <somelibs> and in addition anything that follows
> as it also doesn't wrap back to --as-needed. So if a projects includes
> dpdk libs + <other> it will also consider <other> with --no-as-needed.
>
> Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
>
> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> ---

Hi Christian,

I agree this is something to be fixed.  It will need additional work,
though:

  https://travis-ci.com/ovsrobot/dpdk/builds/124909245

The unit tests are failing with this patch.

>  config/meson.build | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/config/meson.build b/config/meson.build
> index 2bafea530..58800a980 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -93,9 +93,6 @@ dpdk_conf.set('RTE_TOOLCHAIN_' + toolchain.to_upper(), 1)
>  
>  dpdk_conf.set('RTE_ARCH_64', cc.sizeof('void *') == 8)
>  
> -add_project_link_arguments('-Wl,--no-as-needed', language: 'c')
> -dpdk_extra_ldflags += '-Wl,--no-as-needed'
> -
>  # use pthreads
>  add_project_link_arguments('-pthread', language: 'c')
>  dpdk_extra_ldflags += '-pthread'
  
Christian Ehrhardt Aug. 28, 2019, 2:48 p.m. UTC | #3
On Wed, Aug 28, 2019 at 3:53 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>
> > A while ago telemetry was added in 57ae0ec6 and it also added as-needed
> > to config/meson.build. This seems no more needed these days as due to other
> > build changes the ordering in buildlogs is:
> >   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
> > Which means telemetry no more benefits from --no-as-needed anyway.
> >
> > Overlinking problems get triggered by the meson generated pkgconfig which
> > will have:
> >    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
> > This will overlink <somelibs> and in addition anything that follows
> > as it also doesn't wrap back to --as-needed. So if a projects includes
> > dpdk libs + <other> it will also consider <other> with --no-as-needed.
> >
> > Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
> >
> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> > ---
>
> Hi Christian,
>
> I agree this is something to be fixed.  It will need additional work,
> though:
>
>   https://travis-ci.com/ovsrobot/dpdk/builds/124909245
>

Thanks for the Link Aaron, yet I'm puzzled what to do there atm.

The kind of error I found in the failing logs were misleading at first:
- linker can't find -lvirt / -lpqos / ...
  well the test env needs to install them, maybe it was added as
dependency by accident before?
  I'd understand (due to the change) if it would complain about missing symbols
  (no more added due to as-needed, but then for some reason needed)
  But this is vice versa, it just doesn't find the libs in the build env
- error: unrecognized command line option '-Wformat-truncation'
  I don't see how I'd cause this ...
=> Maybe this is just an artifact that is even part of the normal/good tests?


Comparing former logs - last good test was
https://travis-ci.com/ovsrobot/dpdk/builds/124875383
This first seemed more helpful.

DPDK:fast-tests / eal_flags_w_opt_autotest  FAIL
DPDK:fast-tests / func_reentrancy_autotest  FAIL
DPDK:fast-tests / mbuf_autotest         FAIL
DPDK:fast-tests / mempool_autotest      FAIL
DPDK:fast-tests / ring_pmd_autotest     FAIL
DPDK:fast-tests / sched_autotest        FAIL
DPDK:fast-tests / table_autotest        FAIL
[...]
Overall about 14/60 of the tests failed with no recognizable pattern
why just those and not the others.

I only see "Full log written ... on_error", so I can't directly
compare how a good run would look in the configure/build stage.
Looking just at the bad case there are plenty of messages like
- "no available hugepages"
- "cannot reserve memory", ..
But all those indicate more a flaky test(-env) than an error in the
commit, there must be more to it.

@Aaron is there a good way to get the rest of the log for a good case
to compare?

Maybe I'm yet to blind for all the potential side effects of the change?
  
Aaron Conole Aug. 28, 2019, 3:14 p.m. UTC | #4
Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:

> On Wed, Aug 28, 2019 at 3:53 PM Aaron Conole <aconole@redhat.com> wrote:
>>
>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>
>> > A while ago telemetry was added in 57ae0ec6 and it also added as-needed
>> > to config/meson.build. This seems no more needed these days as due to other
>> > build changes the ordering in buildlogs is:
>> >   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
>> > Which means telemetry no more benefits from --no-as-needed anyway.
>> >
>> > Overlinking problems get triggered by the meson generated pkgconfig which
>> > will have:
>> >    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
>> > This will overlink <somelibs> and in addition anything that follows
>> > as it also doesn't wrap back to --as-needed. So if a projects includes
>> > dpdk libs + <other> it will also consider <other> with --no-as-needed.
>> >
>> > Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
>> >
>> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
>> > ---
>>
>> Hi Christian,
>>
>> I agree this is something to be fixed.  It will need additional work,
>> though:
>>
>>   https://travis-ci.com/ovsrobot/dpdk/builds/124909245
>>
>
> Thanks for the Link Aaron, yet I'm puzzled what to do there atm.
>
> The kind of error I found in the failing logs were misleading at first:
> - linker can't find -lvirt / -lpqos / ...
>   well the test env needs to install them, maybe it was added as
> dependency by accident before?

Not sure about this.  It's strange to require that we *install* the
libraries before we can unit test them.  After all, if I'm going to
potentially replace my previously installed libraries, I definitely want
to know that the unit tests are passing.

>   I'd understand (due to the change) if it would complain about missing symbols
>   (no more added due to as-needed, but then for some reason needed)
>   But this is vice versa, it just doesn't find the libs in the build env
> - error: unrecognized command line option '-Wformat-truncation'
>   I don't see how I'd cause this ...
> => Maybe this is just an artifact that is even part of the normal/good tests?

I don't think so - but there's a simple change.  I've pushed to my own
branch and you can see the builds:

  https://travis-ci.org/orgcandman/dpdk/branches using the same
  series_6154 branch name.

> Comparing former logs - last good test was
> https://travis-ci.com/ovsrobot/dpdk/builds/124875383
> This first seemed more helpful.
>
> DPDK:fast-tests / eal_flags_w_opt_autotest  FAIL
> DPDK:fast-tests / func_reentrancy_autotest  FAIL
> DPDK:fast-tests / mbuf_autotest         FAIL
> DPDK:fast-tests / mempool_autotest      FAIL
> DPDK:fast-tests / ring_pmd_autotest     FAIL
> DPDK:fast-tests / sched_autotest        FAIL
> DPDK:fast-tests / table_autotest        FAIL
> [...]
> Overall about 14/60 of the tests failed with no recognizable pattern
> why just those and not the others.

Good question :)

> I only see "Full log written ... on_error", so I can't directly
> compare how a good run would look in the configure/build stage.
> Looking just at the bad case there are plenty of messages like
> - "no available hugepages"
> - "cannot reserve memory", ..
> But all those indicate more a flaky test(-env) than an error in the
> commit, there must be more to it.

Okay.  Fair enough.

> @Aaron is there a good way to get the rest of the log for a good case
> to compare?

Let's wait for https://travis-ci.org/orgcandman/dpdk/builds/577910388 to
spit out some details.

> Maybe I'm yet to blind for all the potential side effects of the change?
  
Aaron Conole Aug. 28, 2019, 3:23 p.m. UTC | #5
Aaron Conole <aconole@redhat.com> writes:

> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>
>> On Wed, Aug 28, 2019 at 3:53 PM Aaron Conole <aconole@redhat.com> wrote:
>>>
>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>>
>>> > A while ago telemetry was added in 57ae0ec6 and it also added as-needed
>>> > to config/meson.build. This seems no more needed these days as due to other
>>> > build changes the ordering in buildlogs is:
>>> >   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
>>> > Which means telemetry no more benefits from --no-as-needed anyway.
>>> >
>>> > Overlinking problems get triggered by the meson generated pkgconfig which
>>> > will have:
>>> >    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
>>> > This will overlink <somelibs> and in addition anything that follows
>>> > as it also doesn't wrap back to --as-needed. So if a projects includes
>>> > dpdk libs + <other> it will also consider <other> with --no-as-needed.
>>> >
>>> > Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
>>> >
>>> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
>>> > ---
>>>
>>> Hi Christian,
>>>
>>> I agree this is something to be fixed.  It will need additional work,
>>> though:
>>>
>>>   https://travis-ci.com/ovsrobot/dpdk/builds/124909245
>>>
>>
>> Thanks for the Link Aaron, yet I'm puzzled what to do there atm.
>>
>> The kind of error I found in the failing logs were misleading at first:
>> - linker can't find -lvirt / -lpqos / ...
>>   well the test env needs to install them, maybe it was added as
>> dependency by accident before?
>
> Not sure about this.  It's strange to require that we *install* the
> libraries before we can unit test them.  After all, if I'm going to
> potentially replace my previously installed libraries, I definitely want
> to know that the unit tests are passing.
>
>>   I'd understand (due to the change) if it would complain about missing symbols
>>   (no more added due to as-needed, but then for some reason needed)
>>   But this is vice versa, it just doesn't find the libs in the build env
>> - error: unrecognized command line option '-Wformat-truncation'
>>   I don't see how I'd cause this ...
>> => Maybe this is just an artifact that is even part of the normal/good tests?
>
> I don't think so - but there's a simple change.  I've pushed to my own
> branch and you can see the builds:
>
>   https://travis-ci.org/orgcandman/dpdk/branches using the same
>   series_6154 branch name.
>
>> Comparing former logs - last good test was
>> https://travis-ci.com/ovsrobot/dpdk/builds/124875383
>> This first seemed more helpful.
>>
>> DPDK:fast-tests / eal_flags_w_opt_autotest  FAIL
>> DPDK:fast-tests / func_reentrancy_autotest  FAIL
>> DPDK:fast-tests / mbuf_autotest         FAIL
>> DPDK:fast-tests / mempool_autotest      FAIL
>> DPDK:fast-tests / ring_pmd_autotest     FAIL
>> DPDK:fast-tests / sched_autotest        FAIL
>> DPDK:fast-tests / table_autotest        FAIL
>> [...]
>> Overall about 14/60 of the tests failed with no recognizable pattern
>> why just those and not the others.
>
> Good question :)
>
>> I only see "Full log written ... on_error", so I can't directly
>> compare how a good run would look in the configure/build stage.
>> Looking just at the bad case there are plenty of messages like
>> - "no available hugepages"
>> - "cannot reserve memory", ..
>> But all those indicate more a flaky test(-env) than an error in the
>> commit, there must be more to it.
>
> Okay.  Fair enough.
>
>> @Aaron is there a good way to get the rest of the log for a good case
>> to compare?
>
> Let's wait for https://travis-ci.org/orgcandman/dpdk/builds/577910388 to
> spit out some details.

Oops - forgot to push the revert:

https://travis-ci.org/orgcandman/dpdk/builds/577918381 is the correct
build.

Sorry.

>> Maybe I'm yet to blind for all the potential side effects of the change?
  
Christian Ehrhardt Aug. 29, 2019, 10:18 a.m. UTC | #6
On Wed, Aug 28, 2019 at 5:23 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Aaron Conole <aconole@redhat.com> writes:
>
> > Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
> >
> >> On Wed, Aug 28, 2019 at 3:53 PM Aaron Conole <aconole@redhat.com> wrote:
> >>>
> >>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
> >>>
> >>> > A while ago telemetry was added in 57ae0ec6 and it also added as-needed
> >>> > to config/meson.build. This seems no more needed these days as due to other
> >>> > build changes the ordering in buildlogs is:
> >>> >   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
> >>> > Which means telemetry no more benefits from --no-as-needed anyway.
> >>> >
> >>> > Overlinking problems get triggered by the meson generated pkgconfig which
> >>> > will have:
> >>> >    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
> >>> > This will overlink <somelibs> and in addition anything that follows
> >>> > as it also doesn't wrap back to --as-needed. So if a projects includes
> >>> > dpdk libs + <other> it will also consider <other> with --no-as-needed.
> >>> >
> >>> > Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
> >>> >
> >>> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> >>> > ---
> >>>
> >>> Hi Christian,
> >>>
> >>> I agree this is something to be fixed.  It will need additional work,
> >>> though:
> >>>
> >>>   https://travis-ci.com/ovsrobot/dpdk/builds/124909245
> >>>
> >>
> >> Thanks for the Link Aaron, yet I'm puzzled what to do there atm.
> >>
> >> The kind of error I found in the failing logs were misleading at first:
> >> - linker can't find -lvirt / -lpqos / ...
> >>   well the test env needs to install them, maybe it was added as
> >> dependency by accident before?
> >
> > Not sure about this.  It's strange to require that we *install* the
> > libraries before we can unit test them.  After all, if I'm going to
> > potentially replace my previously installed libraries, I definitely want
> > to know that the unit tests are passing.
> >
> >>   I'd understand (due to the change) if it would complain about missing symbols
> >>   (no more added due to as-needed, but then for some reason needed)
> >>   But this is vice versa, it just doesn't find the libs in the build env
> >> - error: unrecognized command line option '-Wformat-truncation'
> >>   I don't see how I'd cause this ...
> >> => Maybe this is just an artifact that is even part of the normal/good tests?
> >
> > I don't think so - but there's a simple change.  I've pushed to my own
> > branch and you can see the builds:
> >
> >   https://travis-ci.org/orgcandman/dpdk/branches using the same
> >   series_6154 branch name.
> >
> >> Comparing former logs - last good test was
> >> https://travis-ci.com/ovsrobot/dpdk/builds/124875383
> >> This first seemed more helpful.
> >>
> >> DPDK:fast-tests / eal_flags_w_opt_autotest  FAIL
> >> DPDK:fast-tests / func_reentrancy_autotest  FAIL
> >> DPDK:fast-tests / mbuf_autotest         FAIL
> >> DPDK:fast-tests / mempool_autotest      FAIL
> >> DPDK:fast-tests / ring_pmd_autotest     FAIL
> >> DPDK:fast-tests / sched_autotest        FAIL
> >> DPDK:fast-tests / table_autotest        FAIL
> >> [...]
> >> Overall about 14/60 of the tests failed with no recognizable pattern
> >> why just those and not the others.
> >
> > Good question :)
> >
> >> I only see "Full log written ... on_error", so I can't directly
> >> compare how a good run would look in the configure/build stage.
> >> Looking just at the bad case there are plenty of messages like
> >> - "no available hugepages"
> >> - "cannot reserve memory", ..
> >> But all those indicate more a flaky test(-env) than an error in the
> >> commit, there must be more to it.
> >
> > Okay.  Fair enough.
> >
> >> @Aaron is there a good way to get the rest of the log for a good case
> >> to compare?
> >
> > Let's wait for https://travis-ci.org/orgcandman/dpdk/builds/577910388 to
> > spit out some details.
>
> Oops - forgot to push the revert:
>
> https://travis-ci.org/orgcandman/dpdk/builds/577918381 is the correct
> build.
>
> Sorry.

Thanks Aaron, breaking down the actual different test results ...

Tests: eal_flags_w_opt_autotest eal_flags_b_opt_autotest
  EAL: failed to parse device "00FF:09:0B.3"
  EAL: Unable to parse device '00FF:09:0B.3'

Test: func_reentrancy_autotest
  mempool create/lookup: common object allocated 0 times (should be 1)

Tests: flow_classify_autotest ring_pmd_autotest sched_autotest
table_autotest bitratestats_autotest distributor_autotest
latencystats_autotest reorder_autotest
  MBUF: error setting mempool handler
  Cannot init mbuf pool on socket 0

Test: mbuf_autotest
  MBUF: error setting mempool handler
  cannot allocate mbuf pool

Test: mempool_autotest
  cannot allocate mp_nocache mempool

Test: eventdev_common_autotest
  Tests not executed


I built DPDK off of my commit locally and tried to run the tests, but
they work fine


# app/test/dpdk-test --no-huge -l 0-1 --pci-whitelist 00FF:09:0B.3
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can
be adjusted with -m or --socket-mem
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL:   cannot open VFIO container, error 2 (No such file or directory)
EAL: VFIO support could not be initialized
APP: HPET is not enabled, using TSC as default timer

Works just fine ?!
And if I really break it it fails as expected

# app/test/dpdk-test --no-huge -l 0-1 --pci-whitelist 00FF:09:0B.3xxxx
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can
be adjusted with -m or --socket-mem
EAL: failed to parse device "00FF:09:0B.3xxxx"
EAL: Unable to parse device '00FF:09:0B.3xxxx'

And trying another one of the cases that are most common "error
setting mempool handler" works fine as well

# app/test/dpdk-test --no-huge -l 0-1
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can
be adjusted with -m or --socket-mem
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL:   cannot open VFIO container, error 2 (No such file or directory)
EAL: VFIO support could not be initialized
EAL: PCI device 0000:00:1f.6 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 8086:15d8 net_e1000_em
APP: HPET is not enabled, using TSC as default timer
RTE>>flow_classify_autotest
Created table_acl for for IPv4 five tuple packets
Allocated mbuf pool on socket 0
Set up IPv4 UDP traffic
ETH  pktlen 14
ETH + IPv4 pktlen 34
ETH + IPv4 + UDP pktlen 42

Set up IPv4 TCP traffic
ETH  pktlen 14
ETH + IPv4 pktlen 34
ETH + IPv4 + TCP pktlen 54

Set up IPv4 SCTP traffic
ETH  pktlen 14
ETH + IPv4 pktlen 34
ETH + IPv4 + SCTP pktlen 42

Test OK


/me is still not seeing what would be wrong with it in this test environment :-/
Especially since the change is on linking, that should be a do-or-die
breakage and not a (what seems random) subset of test fails.

@Aaron, thanks for your help so far. Is there an easy way to run
exactly my code once again to see it if works this time?
Or was https://travis-ci.org/orgcandman/dpdk/builds/577910388 exactly
that already (and again failed at the same tests).
  
Christian Ehrhardt Aug. 29, 2019, 3:25 p.m. UTC | #7
On Thu, Aug 29, 2019 at 12:18 PM Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
>
> On Wed, Aug 28, 2019 at 5:23 PM Aaron Conole <aconole@redhat.com> wrote:
> >
> > Aaron Conole <aconole@redhat.com> writes:
> >
> > > Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
> > >
> > >> On Wed, Aug 28, 2019 at 3:53 PM Aaron Conole <aconole@redhat.com> wrote:
> > >>>
> > >>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
> > >>>
> > >>> > A while ago telemetry was added in 57ae0ec6 and it also added as-needed
> > >>> > to config/meson.build. This seems no more needed these days as due to other
> > >>> > build changes the ordering in buildlogs is:
> > >>> >   [...] -lrte_telemetry [...] -Wl,--no-as-needed [...]
> > >>> > Which means telemetry no more benefits from --no-as-needed anyway.
> > >>> >
> > >>> > Overlinking problems get triggered by the meson generated pkgconfig which
> > >>> > will have:
> > >>> >    [...] -Wl,--no-as-needed <somelibsusedbydpdk>
> > >>> > This will overlink <somelibs> and in addition anything that follows
> > >>> > as it also doesn't wrap back to --as-needed. So if a projects includes
> > >>> > dpdk libs + <other> it will also consider <other> with --no-as-needed.
> > >>> >
> > >>> > Fixes: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1841759
> > >>> >
> > >>> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> > >>> > ---
> > >>>
> > >>> Hi Christian,
> > >>>
> > >>> I agree this is something to be fixed.  It will need additional work,
> > >>> though:
> > >>>
> > >>>   https://travis-ci.com/ovsrobot/dpdk/builds/124909245
> > >>>
> > >>
> > >> Thanks for the Link Aaron, yet I'm puzzled what to do there atm.
> > >>
> > >> The kind of error I found in the failing logs were misleading at first:
> > >> - linker can't find -lvirt / -lpqos / ...
> > >>   well the test env needs to install them, maybe it was added as
> > >> dependency by accident before?
> > >
> > > Not sure about this.  It's strange to require that we *install* the
> > > libraries before we can unit test them.  After all, if I'm going to
> > > potentially replace my previously installed libraries, I definitely want
> > > to know that the unit tests are passing.
> > >
> > >>   I'd understand (due to the change) if it would complain about missing symbols
> > >>   (no more added due to as-needed, but then for some reason needed)
> > >>   But this is vice versa, it just doesn't find the libs in the build env
> > >> - error: unrecognized command line option '-Wformat-truncation'
> > >>   I don't see how I'd cause this ...
> > >> => Maybe this is just an artifact that is even part of the normal/good tests?
> > >
> > > I don't think so - but there's a simple change.  I've pushed to my own
> > > branch and you can see the builds:
> > >
> > >   https://travis-ci.org/orgcandman/dpdk/branches using the same
> > >   series_6154 branch name.
> > >
> > >> Comparing former logs - last good test was
> > >> https://travis-ci.com/ovsrobot/dpdk/builds/124875383
> > >> This first seemed more helpful.
> > >>
> > >> DPDK:fast-tests / eal_flags_w_opt_autotest  FAIL
> > >> DPDK:fast-tests / func_reentrancy_autotest  FAIL
> > >> DPDK:fast-tests / mbuf_autotest         FAIL
> > >> DPDK:fast-tests / mempool_autotest      FAIL
> > >> DPDK:fast-tests / ring_pmd_autotest     FAIL
> > >> DPDK:fast-tests / sched_autotest        FAIL
> > >> DPDK:fast-tests / table_autotest        FAIL
> > >> [...]
> > >> Overall about 14/60 of the tests failed with no recognizable pattern
> > >> why just those and not the others.
> > >
> > > Good question :)
> > >
> > >> I only see "Full log written ... on_error", so I can't directly
> > >> compare how a good run would look in the configure/build stage.
> > >> Looking just at the bad case there are plenty of messages like
> > >> - "no available hugepages"
> > >> - "cannot reserve memory", ..
> > >> But all those indicate more a flaky test(-env) than an error in the
> > >> commit, there must be more to it.
> > >
> > > Okay.  Fair enough.
> > >
> > >> @Aaron is there a good way to get the rest of the log for a good case
> > >> to compare?
> > >
> > > Let's wait for https://travis-ci.org/orgcandman/dpdk/builds/577910388 to
> > > spit out some details.
> >
> > Oops - forgot to push the revert:
> >
> > https://travis-ci.org/orgcandman/dpdk/builds/577918381 is the correct
> > build.
> >
> > Sorry.
>
> Thanks Aaron, breaking down the actual different test results ...
>
> Tests: eal_flags_w_opt_autotest eal_flags_b_opt_autotest
>   EAL: failed to parse device "00FF:09:0B.3"
>   EAL: Unable to parse device '00FF:09:0B.3'
>
> Test: func_reentrancy_autotest
>   mempool create/lookup: common object allocated 0 times (should be 1)
>
> Tests: flow_classify_autotest ring_pmd_autotest sched_autotest
> table_autotest bitratestats_autotest distributor_autotest
> latencystats_autotest reorder_autotest
>   MBUF: error setting mempool handler
>   Cannot init mbuf pool on socket 0
>
> Test: mbuf_autotest
>   MBUF: error setting mempool handler
>   cannot allocate mbuf pool
>
> Test: mempool_autotest
>   cannot allocate mp_nocache mempool
>
> Test: eventdev_common_autotest
>   Tests not executed
>
>
> I built DPDK off of my commit locally and tried to run the tests, but
> they work fine
>
>
> # app/test/dpdk-test --no-huge -l 0-1 --pci-whitelist 00FF:09:0B.3
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Static memory layout is selected, amount of reserved memory can
> be adjusted with -m or --socket-mem
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: Probing VFIO support...
> EAL:   cannot open VFIO container, error 2 (No such file or directory)
> EAL: VFIO support could not be initialized
> APP: HPET is not enabled, using TSC as default timer
>
> Works just fine ?!
> And if I really break it it fails as expected
>
> # app/test/dpdk-test --no-huge -l 0-1 --pci-whitelist 00FF:09:0B.3xxxx
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Static memory layout is selected, amount of reserved memory can
> be adjusted with -m or --socket-mem
> EAL: failed to parse device "00FF:09:0B.3xxxx"
> EAL: Unable to parse device '00FF:09:0B.3xxxx'
>
> And trying another one of the cases that are most common "error
> setting mempool handler" works fine as well
>
> # app/test/dpdk-test --no-huge -l 0-1
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Static memory layout is selected, amount of reserved memory can
> be adjusted with -m or --socket-mem
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: Probing VFIO support...
> EAL:   cannot open VFIO container, error 2 (No such file or directory)
> EAL: VFIO support could not be initialized
> EAL: PCI device 0000:00:1f.6 on NUMA socket -1
> EAL:   Invalid NUMA socket, default to 0
> EAL:   probe driver: 8086:15d8 net_e1000_em
> APP: HPET is not enabled, using TSC as default timer
> RTE>>flow_classify_autotest
> Created table_acl for for IPv4 five tuple packets
> Allocated mbuf pool on socket 0
> Set up IPv4 UDP traffic
> ETH  pktlen 14
> ETH + IPv4 pktlen 34
> ETH + IPv4 + UDP pktlen 42
>
> Set up IPv4 TCP traffic
> ETH  pktlen 14
> ETH + IPv4 pktlen 34
> ETH + IPv4 + TCP pktlen 54
>
> Set up IPv4 SCTP traffic
> ETH  pktlen 14
> ETH + IPv4 pktlen 34
> ETH + IPv4 + SCTP pktlen 42
>
> Test OK
>
>
> /me is still not seeing what would be wrong with it in this test environment :-/
> Especially since the change is on linking, that should be a do-or-die
> breakage and not a (what seems random) subset of test fails.
>
> @Aaron, thanks for your help so far. Is there an easy way to run
> exactly my code once again to see it if works this time?
> Or was https://travis-ci.org/orgcandman/dpdk/builds/577910388 exactly
> that already (and again failed at the same tests).
>

Thanks to Luca's support I set up travis for my github and
experimented with some ideas we had.
While I still don't understand how exactly we break those tests in
particular, we have found a way to fix the generated pkg-config
without breaking the tests:

Bad: https://travis-ci.org/cpaelzer/dpdk/builds/578391327
Good: https://travis-ci.org/cpaelzer/dpdk/builds/578370028

I'll submit a v2 to this mail in a few minutes.
  

Patch

diff --git a/config/meson.build b/config/meson.build
index 2bafea530..58800a980 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -93,9 +93,6 @@  dpdk_conf.set('RTE_TOOLCHAIN_' + toolchain.to_upper(), 1)
 
 dpdk_conf.set('RTE_ARCH_64', cc.sizeof('void *') == 8)
 
-add_project_link_arguments('-Wl,--no-as-needed', language: 'c')
-dpdk_extra_ldflags += '-Wl,--no-as-needed'
-
 # use pthreads
 add_project_link_arguments('-pthread', language: 'c')
 dpdk_extra_ldflags += '-pthread'