[v2,01/10] build: add an option to enable LTO build

Message ID 20190917075754.8310-2-amo@semihalf.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series Add an option to use LTO for DPDK build |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Andrzej Ostruszka Sept. 17, 2019, 7:57 a.m. UTC
  This patch adds an option to enable link time optimization.  In addition
to LTO option itself (-flto) fat-lto-objects are being used.  This is
because during the build pmdinfogen scans the generated ELF objects to
find this_pmd_name* symbol in symbol table.  Without fat-lto-objects gcc
produces ELF only with extra symbols for internal use during linking and
clang does not produce ELF at all (only LLVM IR bitcode).

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
---
 .travis.yml                                  |  7 ++++
 config/common_base                           |  5 +++
 config/meson.build                           | 15 ++++++++
 doc/guides/prog_guide/lto.rst                | 37 ++++++++++++++++++++
 doc/guides/rel_notes/release_19_11.rst       |  8 +++++
 lib/librte_distributor/rte_distributor.c     | 18 +++++-----
 lib/librte_distributor/rte_distributor_v20.c | 18 +++++-----
 lib/librte_lpm/rte_lpm.c                     | 28 +++++++--------
 lib/librte_lpm/rte_lpm6.c                    | 16 ++++-----
 lib/librte_timer/rte_timer.c                 | 20 +++++------
 mk/toolchain/clang/rte.toolchain-compat.mk   |  4 +++
 mk/toolchain/clang/rte.vars.mk               |  8 +++++
 mk/toolchain/gcc/rte.toolchain-compat.mk     |  4 +++
 mk/toolchain/gcc/rte.vars.mk                 | 12 +++++++
 mk/toolchain/icc/rte.vars.mk                 |  8 +++++
 15 files changed, 158 insertions(+), 50 deletions(-)
 create mode 100644 doc/guides/prog_guide/lto.rst
  

Comments

Bruce Richardson Sept. 18, 2019, 10:36 a.m. UTC | #1
On Tue, Sep 17, 2019 at 09:57:45AM +0200, Andrzej Ostruszka wrote:
> This patch adds an option to enable link time optimization.  In addition
> to LTO option itself (-flto) fat-lto-objects are being used.  This is
> because during the build pmdinfogen scans the generated ELF objects to
> find this_pmd_name* symbol in symbol table.  Without fat-lto-objects gcc
> produces ELF only with extra symbols for internal use during linking and
> clang does not produce ELF at all (only LLVM IR bitcode).
> 
> Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
>
For meson changes part:

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Ray Kinsella Sept. 18, 2019, 1:32 p.m. UTC | #2
this is cool, good work.
comments below.

On 17/09/2019 08:57, Andrzej Ostruszka wrote:
> This patch adds an option to enable link time optimization.  In addition
> to LTO option itself (-flto) fat-lto-objects are being used.  This is
> because during the build pmdinfogen scans the generated ELF objects to
> find this_pmd_name* symbol in symbol table.  Without fat-lto-objects gcc
> produces ELF only with extra symbols for internal use during linking and
> clang does not produce ELF at all (only LLVM IR bitcode).
> 
> Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
> ---
>  .travis.yml                                  |  7 ++++
>  config/common_base                           |  5 +++
>  config/meson.build                           | 15 ++++++++
>  doc/guides/prog_guide/lto.rst                | 37 ++++++++++++++++++++
>  doc/guides/rel_notes/release_19_11.rst       |  8 +++++
>  lib/librte_distributor/rte_distributor.c     | 18 +++++-----
>  lib/librte_distributor/rte_distributor_v20.c | 18 +++++-----
>  lib/librte_lpm/rte_lpm.c                     | 28 +++++++--------
>  lib/librte_lpm/rte_lpm6.c                    | 16 ++++-----
>  lib/librte_timer/rte_timer.c                 | 20 +++++------
>  mk/toolchain/clang/rte.toolchain-compat.mk   |  4 +++
>  mk/toolchain/clang/rte.vars.mk               |  8 +++++
>  mk/toolchain/gcc/rte.toolchain-compat.mk     |  4 +++
>  mk/toolchain/gcc/rte.vars.mk                 | 12 +++++++
>  mk/toolchain/icc/rte.vars.mk                 |  8 +++++
>  15 files changed, 158 insertions(+), 50 deletions(-)
>  create mode 100644 doc/guides/prog_guide/lto.rst
> 
> diff --git a/.travis.yml b/.travis.yml
> index 781f9f666..70d221852 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -31,6 +31,7 @@ env:
>    - DEF_LIB="static" OPTS="-Denable_kmods=false"
>    - DEF_LIB="shared" OPTS="-Denable_kmods=false"
>    - DEF_LIB="shared" RUN_TESTS=1 BUILD_DOCS=1
> +  - DEF_LIB="shared" OPTS="-Db_lto=true"
>  
>  matrix:
>    include:
> @@ -100,6 +101,12 @@ matrix:
>        apt:
>          packages:
>            - *extra_packages
> +  - env: DEF_LIB="shared" OPTS="-Db_lto=true" EXTRA_PACKAGES=1
> +    compiler: gcc
> +    addons:
> +      apt:
> +        packages:
> +          - *extra_packages
>  
>  
>  script: ./.ci/${TRAVIS_OS_NAME}-build.sh
> diff --git a/config/common_base b/config/common_base
> index 8ef75c203..73a55fdec 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -49,6 +49,11 @@ CONFIG_RTE_FORCE_INTRINSICS=n
>  #
>  CONFIG_RTE_ARCH_STRICT_ALIGN=n
>  
> +#
> +# Enable link time optimization
> +#
> +CONFIG_RTE_ENABLE_LTO=n
> +
>  #
>  # Compile to share library
>  #

Why would we make this optional in this way and expand the matrix of
different ways to build DPDK. To ask another way, why wouldn't a user
turn on GSO.

> diff --git a/config/meson.build b/config/meson.build
> index 2bafea530..97bbc323b 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -196,3 +196,18 @@ add_project_arguments('-D_GNU_SOURCE', language: 'c')
>  if is_freebsd
>  	add_project_arguments('-D__BSD_VISIBLE', language: 'c')
>  endif
> +
> +if get_option('b_lto')
> +	if cc.has_argument('-ffat-lto-objects')
> +		add_project_arguments('-ffat-lto-objects', language: 'c')
> +	else
> +		error('compiler does not support fat LTO objects - please turn LTO off')
> +	endif
> +	if cc.get_id() == 'gcc'
> +		# workaround for bug 81440
> +		if cc.version().version_compare('<8.0')
> +			add_project_arguments('-Wno-lto-type-mismatch', language: 'c')
> +			add_project_link_arguments('-Wno-lto-type-mismatch', language: 'c')
> +		endif
> +	endif
> +endif
> diff --git a/doc/guides/prog_guide/lto.rst b/doc/guides/prog_guide/lto.rst
> new file mode 100644
> index 000000000..b2b36e51c
> --- /dev/null
> +++ b/doc/guides/prog_guide/lto.rst
> @@ -0,0 +1,37 @@
> +Link Time Optimization
> +======================
> +
> +The DPDK framework supports compilation with link time optimization
> +turned on.  This depends obviously on the capabilities of the compiler
> +to do "whole program" optimization at link time and is available only
> +for compilers that support that feature (gcc, clang and icc).  To be
> +more specific compiler have to support creation of ELF objects
> +containing both normal code and internal representation
> +(fat-lto-objects).  This is required since during build some code is
> +generated by parsing produced ELF objects (pmdinfogen).
> +
> +The amount of performance gain that one can get from LTO depends on the
> +compiler and the code that is being compiled.  However LTO is also
> +useful for additional code analysis done by the compiler.  In particular
> +due to interprocedural analysis compiler can produce additional warnings
> +about variables that might be used uninitialized.  Some of these
> +warnings might be "false positives" though and you might need to
> +explicitly initialize variable in order to silence the compiler.
> +
> +Link time optimization can be enabled for whole DPDK framework by
> +setting:
> +
> +.. code-block:: console
> +    CONFIG_ENABLE_LTO=y
> +
> +in config file for the case of make based build and by:
> +
> +.. code-block:: console
> +    meson build -Db_lto=true -Ddefault_library=shared
> +    ninja -C build
> +
> +for the case of meson based build (only shared libraries are supported
> +when building with meson and LTO enabled).
> +
> +Please note that turning LTO on causes considerable extension of
> +compilation time.
> diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
> index 8490d897c..97b4f4083 100644
> --- a/doc/guides/rel_notes/release_19_11.rst
> +++ b/doc/guides/rel_notes/release_19_11.rst
> @@ -56,6 +56,14 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>  
> +**Added build support for Link Time Optimization.**
> +
> + LTO is an optimization technique used by the compiler to perform whole
> + program analysis and optimization at link time.  In order to do that
> + compilers store their internal representation of the source code that
> + the linker uses at the final stage of compilation process.
> +
> + See :doc:`../prog_guide/lto` for more information:
>  
>  Removed Items
>  -------------
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> index 21eb1fb0a..848250f4a 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -32,7 +32,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
>  
>  /**** Burst Packet APIs called by workers ****/
>  
> -void
> +void __vsym

all these additional __vsym annotations looks like they belong in a
seperate patch, as they are fixing a bug and are not directly related to
adding LTO the build system.

>  rte_distributor_request_pkt_v1705(struct rte_distributor *d,
>  		unsigned int worker_id, struct rte_mbuf **oldpkt,
>  		unsigned int count)
> @@ -84,7 +84,7 @@ MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
>  		unsigned int count),
>  		rte_distributor_request_pkt_v1705);
>  
> -int
> +int __vsym
>  rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
>  		unsigned int worker_id, struct rte_mbuf **pkts)
>  {
> @@ -124,7 +124,7 @@ MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
>  		unsigned int worker_id, struct rte_mbuf **pkts),
>  		rte_distributor_poll_pkt_v1705);
>  
> -int
> +int __vsym
>  rte_distributor_get_pkt_v1705(struct rte_distributor *d,
>  		unsigned int worker_id, struct rte_mbuf **pkts,
>  		struct rte_mbuf **oldpkt, unsigned int return_count)
> @@ -159,7 +159,7 @@ MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
>  		struct rte_mbuf **oldpkt, unsigned int return_count),
>  		rte_distributor_get_pkt_v1705);
>  
> -int
> +int __vsym
>  rte_distributor_return_pkt_v1705(struct rte_distributor *d,
>  		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
>  {
> @@ -335,7 +335,7 @@ release(struct rte_distributor *d, unsigned int wkr)
>  
>  
>  /* process a set of packets to distribute them to workers */
> -int
> +int __vsym
>  rte_distributor_process_v1705(struct rte_distributor *d,
>  		struct rte_mbuf **mbufs, unsigned int num_mbufs)
>  {
> @@ -476,7 +476,7 @@ MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
>  		rte_distributor_process_v1705);
>  
>  /* return to the caller, packets returned from workers */
> -int
> +int __vsym
>  rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
>  		struct rte_mbuf **mbufs, unsigned int max_mbufs)
>  {
> @@ -526,7 +526,7 @@ total_outstanding(const struct rte_distributor *d)
>   * Flush the distributor, so that there are no outstanding packets in flight or
>   * queued up.
>   */
> -int
> +int __vsym
>  rte_distributor_flush_v1705(struct rte_distributor *d)
>  {
>  	unsigned int flushed;
> @@ -561,7 +561,7 @@ MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
>  		rte_distributor_flush_v1705);
>  
>  /* clears the internal returns array in the distributor */
> -void
> +void __vsym
>  rte_distributor_clear_returns_v1705(struct rte_distributor *d)
>  {
>  	unsigned int wkr;
> @@ -581,7 +581,7 @@ MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
>  		rte_distributor_clear_returns_v1705);
>  
>  /* creates a distributor instance */
> -struct rte_distributor *
> +struct rte_distributor * __vsym
>  rte_distributor_create_v1705(const char *name,
>  		unsigned int socket_id,
>  		unsigned int num_workers,
> diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
> index cdc0969a8..31c766421 100644
> --- a/lib/librte_distributor/rte_distributor_v20.c
> +++ b/lib/librte_distributor/rte_distributor_v20.c
> @@ -27,7 +27,7 @@ EAL_REGISTER_TAILQ(rte_distributor_tailq)
>  
>  /**** APIs called by workers ****/
>  
> -void
> +void __vsym
>  rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
>  		unsigned worker_id, struct rte_mbuf *oldpkt)
>  {
> @@ -40,7 +40,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
>  }
>  VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
>  
> -struct rte_mbuf *
> +struct rte_mbuf * __vsym
>  rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
>  		unsigned worker_id)
>  {
> @@ -54,7 +54,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
>  }
>  VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
>  
> -struct rte_mbuf *
> +struct rte_mbuf * __vsym
>  rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
>  		unsigned worker_id, struct rte_mbuf *oldpkt)
>  {
> @@ -66,7 +66,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
>  }
>  VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
>  		unsigned worker_id, struct rte_mbuf *oldpkt)
>  {
> @@ -191,7 +191,7 @@ process_returns(struct rte_distributor_v20 *d)
>  }
>  
>  /* process a set of packets to distribute them to workers */
> -int
> +int __vsym
>  rte_distributor_process_v20(struct rte_distributor_v20 *d,
>  		struct rte_mbuf **mbufs, unsigned num_mbufs)
>  {
> @@ -296,7 +296,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
>  VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
>  
>  /* return to the caller, packets returned from workers */
> -int
> +int __vsym
>  rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
>  		struct rte_mbuf **mbufs, unsigned max_mbufs)
>  {
> @@ -334,7 +334,7 @@ total_outstanding(const struct rte_distributor_v20 *d)
>  
>  /* flush the distributor, so that there are no outstanding packets in flight or
>   * queued up. */
> -int
> +int __vsym
>  rte_distributor_flush_v20(struct rte_distributor_v20 *d)
>  {
>  	const unsigned flushed = total_outstanding(d);
> @@ -347,7 +347,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
>  VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
>  
>  /* clears the internal returns array in the distributor */
> -void
> +void __vsym
>  rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
>  {
>  	d->returns.start = d->returns.count = 0;
> @@ -358,7 +358,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
>  VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
>  
>  /* creates a distributor instance */
> -struct rte_distributor_v20 *
> +struct rte_distributor_v20 * __vsym
>  rte_distributor_create_v20(const char *name,
>  		unsigned socket_id,
>  		unsigned num_workers)
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..a2fba8d61 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -89,7 +89,7 @@ depth_to_range(uint8_t depth)
>  /*
>   * Find an existing lpm table and return a pointer to it.
>   */
> -struct rte_lpm_v20 *
> +struct rte_lpm_v20 * __vsym
>  rte_lpm_find_existing_v20(const char *name)
>  {
>  	struct rte_lpm_v20 *l = NULL;
> @@ -115,7 +115,7 @@ rte_lpm_find_existing_v20(const char *name)
>  }
>  VERSION_SYMBOL(rte_lpm_find_existing, _v20, 2.0);
>  
> -struct rte_lpm *
> +struct rte_lpm * __vsym
>  rte_lpm_find_existing_v1604(const char *name)
>  {
>  	struct rte_lpm *l = NULL;
> @@ -146,7 +146,7 @@ MAP_STATIC_SYMBOL(struct rte_lpm *rte_lpm_find_existing(const char *name),
>  /*
>   * Allocates memory for LPM object
>   */
> -struct rte_lpm_v20 *
> +struct rte_lpm_v20 * __vsym
>  rte_lpm_create_v20(const char *name, int socket_id, int max_rules,
>  		__rte_unused int flags)
>  {
> @@ -219,7 +219,7 @@ rte_lpm_create_v20(const char *name, int socket_id, int max_rules,
>  }
>  VERSION_SYMBOL(rte_lpm_create, _v20, 2.0);
>  
> -struct rte_lpm *
> +struct rte_lpm * __vsym
>  rte_lpm_create_v1604(const char *name, int socket_id,
>  		const struct rte_lpm_config *config)
>  {
> @@ -328,7 +328,7 @@ MAP_STATIC_SYMBOL(
>  /*
>   * Deallocates memory for given LPM table.
>   */
> -void
> +void __vsym
>  rte_lpm_free_v20(struct rte_lpm_v20 *lpm)
>  {
>  	struct rte_lpm_list *lpm_list;
> @@ -357,7 +357,7 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm)
>  }
>  VERSION_SYMBOL(rte_lpm_free, _v20, 2.0);
>  
> -void
> +void __vsym
>  rte_lpm_free_v1604(struct rte_lpm *lpm)
>  {
>  	struct rte_lpm_list *lpm_list;
> @@ -1176,7 +1176,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  /*
>   * Add a route
>   */
> -int
> +int __vsym
>  rte_lpm_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  		uint8_t next_hop)
>  {
> @@ -1217,7 +1217,7 @@ rte_lpm_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  }
>  VERSION_SYMBOL(rte_lpm_add, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_lpm_add_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  		uint32_t next_hop)
>  {
> @@ -1263,7 +1263,7 @@ MAP_STATIC_SYMBOL(int rte_lpm_add(struct rte_lpm *lpm, uint32_t ip,
>  /*
>   * Look for a rule in the high-level rules table
>   */
> -int
> +int __vsym
>  rte_lpm_is_rule_present_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  uint8_t *next_hop)
>  {
> @@ -1290,7 +1290,7 @@ uint8_t *next_hop)
>  }
>  VERSION_SYMBOL(rte_lpm_is_rule_present, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_lpm_is_rule_present_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  uint32_t *next_hop)
>  {
> @@ -1843,7 +1843,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  /*
>   * Deletes a rule
>   */
> -int
> +int __vsym
>  rte_lpm_delete_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth)
>  {
>  	int32_t rule_to_delete_index, sub_rule_index;
> @@ -1897,7 +1897,7 @@ rte_lpm_delete_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth)
>  }
>  VERSION_SYMBOL(rte_lpm_delete, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_lpm_delete_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth)
>  {
>  	int32_t rule_to_delete_index, sub_rule_index;
> @@ -1956,7 +1956,7 @@ MAP_STATIC_SYMBOL(int rte_lpm_delete(struct rte_lpm *lpm, uint32_t ip,
>  /*
>   * Delete all rules from the LPM table.
>   */
> -void
> +void __vsym
>  rte_lpm_delete_all_v20(struct rte_lpm_v20 *lpm)
>  {
>  	/* Zero rule information. */
> @@ -1973,7 +1973,7 @@ rte_lpm_delete_all_v20(struct rte_lpm_v20 *lpm)
>  }
>  VERSION_SYMBOL(rte_lpm_delete_all, _v20, 2.0);
>  
> -void
> +void __vsym
>  rte_lpm_delete_all_v1604(struct rte_lpm *lpm)
>  {
>  	/* Zero rule information. */
> diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
> index 9b8aeb972..49a7fea1d 100644
> --- a/lib/librte_lpm/rte_lpm6.c
> +++ b/lib/librte_lpm/rte_lpm6.c
> @@ -811,7 +811,7 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
>  /*
>   * Add a route
>   */
> -int
> +int __vsym
>  rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
>  		uint8_t next_hop)
>  {
> @@ -861,7 +861,7 @@ simulate_add(struct rte_lpm6 *lpm, const uint8_t *masked_ip, uint8_t depth)
>  	return 0;
>  }
>  
> -int
> +int __vsym
>  rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
>  		uint32_t next_hop)
>  {
> @@ -954,7 +954,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
>  /*
>   * Looks up an IP
>   */
> -int
> +int __vsym
>  rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
>  {
>  	uint32_t next_hop32 = 0;
> @@ -972,7 +972,7 @@ rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
>  }
>  VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
>  		uint32_t *next_hop)
>  {
> @@ -1007,7 +1007,7 @@ MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
>  /*
>   * Looks up a group of IP addresses
>   */
> -int
> +int __vsym
>  rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
>  		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
>  		int16_t * next_hops, unsigned n)
> @@ -1048,7 +1048,7 @@ rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
>  }
>  VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
>  		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
>  		int32_t *next_hops, unsigned int n)
> @@ -1098,7 +1098,7 @@ MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
>  /*
>   * Look for a rule in the high-level rules table
>   */
> -int
> +int __vsym
>  rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
>  		uint8_t *next_hop)
>  {
> @@ -1118,7 +1118,7 @@ rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
>  }
>  VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
>  		uint32_t *next_hop)
>  {
> diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
> index bdcf05d06..e560ace06 100644
> --- a/lib/librte_timer/rte_timer.c
> +++ b/lib/librte_timer/rte_timer.c
> @@ -131,7 +131,7 @@ rte_timer_data_dealloc(uint32_t id)
>  	return 0;
>  }
>  
> -void
> +void __vsym
>  rte_timer_subsystem_init_v20(void)
>  {
>  	unsigned lcore_id;
> @@ -153,7 +153,7 @@ VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
>   * secondary processes should be empty, the zeroth entry can be shared by
>   * multiple processes.
>   */
> -int
> +int __vsym
>  rte_timer_subsystem_init_v1905(void)
>  {
>  	const struct rte_memzone *mz;
> @@ -551,7 +551,7 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
>  }
>  
>  /* Reset and start the timer associated with the timer handle tim */
> -int
> +int __vsym
>  rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
>  		    enum rte_timer_type type, unsigned int tim_lcore,
>  		    rte_timer_cb_t fct, void *arg)
> @@ -574,7 +574,7 @@ rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
>  }
>  VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
>  		      enum rte_timer_type type, unsigned int tim_lcore,
>  		      rte_timer_cb_t fct, void *arg)
> @@ -657,14 +657,14 @@ __rte_timer_stop(struct rte_timer *tim, int local_is_locked,
>  }
>  
>  /* Stop the timer associated with the timer handle tim */
> -int
> +int __vsym
>  rte_timer_stop_v20(struct rte_timer *tim)
>  {
>  	return __rte_timer_stop(tim, 0, &default_timer_data);
>  }
>  VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_timer_stop_v1905(struct rte_timer *tim)
>  {
>  	return rte_timer_alt_stop(default_data_id, tim);
> @@ -817,14 +817,14 @@ __rte_timer_manage(struct rte_timer_data *timer_data)
>  	priv_timer[lcore_id].running_tim = NULL;
>  }
>  
> -void
> +void __vsym
>  rte_timer_manage_v20(void)
>  {
>  	__rte_timer_manage(&default_timer_data);
>  }
>  VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_timer_manage_v1905(void)
>  {
>  	struct rte_timer_data *timer_data;
> @@ -1074,14 +1074,14 @@ __rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
>  #endif
>  }
>  
> -void
> +void __vsym
>  rte_timer_dump_stats_v20(FILE *f)
>  {
>  	__rte_timer_dump_stats(&default_timer_data, f);
>  }
>  VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
>  
> -int
> +int __vsym
>  rte_timer_dump_stats_v1905(FILE *f)
>  {
>  	return rte_timer_alt_dump_stats(default_data_id, f);
> diff --git a/mk/toolchain/clang/rte.toolchain-compat.mk b/mk/toolchain/clang/rte.toolchain-compat.mk
> index e6189b498..78f96c648 100644
> --- a/mk/toolchain/clang/rte.toolchain-compat.mk
> +++ b/mk/toolchain/clang/rte.toolchain-compat.mk
> @@ -20,3 +20,7 @@ CLANG_MINOR_VERSION := $(shell echo $(CLANG_VERSION) | cut -f2 -d.)
>  ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -lt 35 && echo 1), 1)
>  	CC_SUPPORTS_Z := false
>  endif
> +
> +ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -lt 60 && echo 1), 1)
> +	CONFIG_RTE_ENABLE_LTO=n
> +endif
> diff --git a/mk/toolchain/clang/rte.vars.mk b/mk/toolchain/clang/rte.vars.mk
> index 3c49dc568..3b1fa05f9 100644
> --- a/mk/toolchain/clang/rte.vars.mk
> +++ b/mk/toolchain/clang/rte.vars.mk
> @@ -48,6 +48,14 @@ endif
>  # process cpu flags
>  include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
>  
> +ifeq ($(CONFIG_RTE_ENABLE_LTO),y)
> +# 'fat-lto' is used since pmdinfogen needs to have 'this_pmd_nameX'
> +# exported in symbol table and without this option only internal
> +# representation is present.
> +TOOLCHAIN_CFLAGS += -flto -ffat-lto-objects
> +TOOLCHAIN_LDFLAGS += -flto
> +endif
> +
>  # workaround clang bug with warning "missing field initializer" for "= {0}"
>  WERROR_FLAGS += -Wno-missing-field-initializers
>  
> diff --git a/mk/toolchain/gcc/rte.toolchain-compat.mk b/mk/toolchain/gcc/rte.toolchain-compat.mk
> index ea40a11c0..ad4fad83c 100644
> --- a/mk/toolchain/gcc/rte.toolchain-compat.mk
> +++ b/mk/toolchain/gcc/rte.toolchain-compat.mk
> @@ -88,6 +88,10 @@ else
>  		MACHINE_CFLAGS := $(filter-out -march% -mtune% -msse%,$(MACHINE_CFLAGS))
>  	endif
>  
> +	ifeq ($(shell test $(GCC_VERSION) -lt 45 && echo 1), 1)
> +		CONFIG_RTE_ENABLE_LTO=n
> +	endif
> +
>  	# Disable thunderx PMD for gcc < 4.7
>  	ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
>  		CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD=d
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index b852fcfd7..9fc704193 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -62,6 +62,18 @@ endif
>  # process cpu flags
>  include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
>  
> +ifeq ($(CONFIG_RTE_ENABLE_LTO),y)
> +# 'fat-lto' is used since pmdinfogen needs to have 'this_pmd_nameX'
> +# exported in symbol table and without this option only internal
> +# representation is present.
> +TOOLCHAIN_CFLAGS += -flto -ffat-lto-objects
> +TOOLCHAIN_LDFLAGS += -flto
> +# workaround for GCC bug 81440
> +ifeq ($(shell test $(GCC_VERSION) -lt 80 && echo 1), 1)
> +WERROR_FLAGS += -Wno-lto-type-mismatch
> +endif
> +endif
> +
>  # workaround GCC bug with warning "missing initializer" for "= {0}"
>  ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
>  WERROR_FLAGS += -Wno-missing-field-initializers
> diff --git a/mk/toolchain/icc/rte.vars.mk b/mk/toolchain/icc/rte.vars.mk
> index aa1422bf1..8aa87aa1e 100644
> --- a/mk/toolchain/icc/rte.vars.mk
> +++ b/mk/toolchain/icc/rte.vars.mk
> @@ -54,5 +54,13 @@ endif
>  # process cpu flags
>  include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
>  
> +ifeq ($(CONFIG_RTE_ENABLE_LTO),y)
> +# 'fat-lto' is used since pmdinfogen needs to have 'this_pmd_nameX'
> +# exported in symbol table and without this option only internal
> +# representation is present.
> +TOOLCHAIN_CFLAGS += -flto -ffat-lto-objects
> +TOOLCHAIN_LDFLAGS += -flto
> +endif
> +
>  export CC AS AR LD OBJCOPY OBJDUMP STRIP READELF
>  export TOOLCHAIN_CFLAGS TOOLCHAIN_LDFLAGS TOOLCHAIN_ASFLAGS
>
  
Andrzej Ostruszka Sept. 19, 2019, 12:35 p.m. UTC | #3
On 9/18/19 3:32 PM, Ray Kinsella wrote:
> this is cool, good work.
> comments below.
[...]>> +CONFIG_RTE_ENABLE_LTO=n
>> +
>>  #
>>  # Compile to share library
>>  #
>
> Why would we make this optional in this way and expand the matrix of
> different ways to build DPDK. To ask another way, why wouldn't a user
> turn on GSO.

Compilation time is much longer.  In a normal hack|fix/compile/repeat
cycle with "compile" part being simple "make" the link time might be a
bit annoying.  So I imagine keeping LTO off for the most part of the dev
cycle and then at the end when doing release/cleanup turn LTO on - to
either get release build ready or to get some set of warnings that you
address before yet another attempt to release build.  By the way - this
make config option is equivalent to meson 'b_lto' option, which by
default is off, so we have similar behaviour in both build types.

Regards
Andrzej

PS. I assumed that you've meant "LTO" not "GSO" - if not, then please
explain what you've meant.
  
Ray Kinsella Sept. 19, 2019, 1:28 p.m. UTC | #4
On 19/09/2019 13:35, Andrzej Ostruszka wrote:
> On 9/18/19 3:32 PM, Ray Kinsella wrote:
...>
> Compilation time is much longer.  In a normal hack|fix/compile/repeat
> cycle with "compile" part being simple "make" the link time might be a
> bit annoying.  So I imagine keeping LTO off for the most part of the dev
> cycle and then at the end when doing release/cleanup turn LTO on - to
> either get release build ready or to get some set of warnings that you
> address before yet another attempt to release build.  

Well look, I would say a few things.

1. I see build times going down dramatically with Meson/Ninja in any case.
2. You don't want any false positivess - i.e. misleading the contributor
into thinking their code builds fine, and then someone switches on LTO
and suddenly there is a problem that was missed (and hopefully gets
caught by CI) - and BTW I have seen these LTO specific issues in the past.

I strongly believe we will make all our lives easier, by having as few
ways of building DPDK as possible.

> By the way - this
> make config option is equivalent to meson 'b_lto' option, which by
> default is off, so we have similar behaviour in both build types.
> 
> Regards
> Andrzej
> 
> PS. I assumed that you've meant "LTO" not "GSO" - if not, then please
> explain what you've meant.
> 

Your are correct, GSO was on my mind apologies.
  
Bruce Richardson Sept. 19, 2019, 3:16 p.m. UTC | #5
On Thu, Sep 19, 2019 at 02:28:04PM +0100, Ray Kinsella wrote:
> 
> 
> On 19/09/2019 13:35, Andrzej Ostruszka wrote:
> > On 9/18/19 3:32 PM, Ray Kinsella wrote:
> ...>
> > Compilation time is much longer.  In a normal hack|fix/compile/repeat
> > cycle with "compile" part being simple "make" the link time might be a
> > bit annoying.  So I imagine keeping LTO off for the most part of the dev
> > cycle and then at the end when doing release/cleanup turn LTO on - to
> > either get release build ready or to get some set of warnings that you
> > address before yet another attempt to release build.  
> 
> Well look, I would say a few things.
> 
> 1. I see build times going down dramatically with Meson/Ninja in any case.

In the general case yes, but mainly it helps if you have a large number of
cores. For anyone building in a CI with only a couple of cores, meson+ninja
isn't going to help much.

/Bruce
  
Ray Kinsella Sept. 20, 2019, 7:38 a.m. UTC | #6
On 19/09/2019 16:16, Bruce Richardson wrote:
> On Thu, Sep 19, 2019 at 02:28:04PM +0100, Ray Kinsella wrote:
>>
>>
>> On 19/09/2019 13:35, Andrzej Ostruszka wrote:
>>> On 9/18/19 3:32 PM, Ray Kinsella wrote:
>> ...>
>>> Compilation time is much longer.  In a normal hack|fix/compile/repeat
>>> cycle with "compile" part being simple "make" the link time might be a
>>> bit annoying.  So I imagine keeping LTO off for the most part of the dev
>>> cycle and then at the end when doing release/cleanup turn LTO on - to
>>> either get release build ready or to get some set of warnings that you
>>> address before yet another attempt to release build.  
>>
>> Well look, I would say a few things.
>>
>> 1. I see build times going down dramatically with Meson/Ninja in any case.
> 
> In the general case yes, but mainly it helps if you have a large number of
> cores. For anyone building in a CI with only a couple of cores, meson+ninja
> isn't going to help much.
> 
> /Bruce
> 

Very true, and I completely acknowledge that LTO adds compilation time.
I would just like to see it in or out, not another build time option.

Ray K
  
Thomas Monjalon Sept. 23, 2019, 7:23 a.m. UTC | #7
20/09/2019 09:38, Ray Kinsella:
> 
> On 19/09/2019 16:16, Bruce Richardson wrote:
> > On Thu, Sep 19, 2019 at 02:28:04PM +0100, Ray Kinsella wrote:
> >>
> >>
> >> On 19/09/2019 13:35, Andrzej Ostruszka wrote:
> >>> On 9/18/19 3:32 PM, Ray Kinsella wrote:
> >> ...>
> >>> Compilation time is much longer.  In a normal hack|fix/compile/repeat
> >>> cycle with "compile" part being simple "make" the link time might be a
> >>> bit annoying.  So I imagine keeping LTO off for the most part of the dev
> >>> cycle and then at the end when doing release/cleanup turn LTO on - to
> >>> either get release build ready or to get some set of warnings that you
> >>> address before yet another attempt to release build.  
> >>
> >> Well look, I would say a few things.
> >>
> >> 1. I see build times going down dramatically with Meson/Ninja in any case.
> > 
> > In the general case yes, but mainly it helps if you have a large number of
> > cores. For anyone building in a CI with only a couple of cores, meson+ninja
> > isn't going to help much.
> > 
> > /Bruce
> > 
> 
> Very true, and I completely acknowledge that LTO adds compilation time.
> I would just like to see it in or out, not another build time option.

Please can we get some numbers to understand how longer it is?
  
Ray Kinsella Sept. 23, 2019, 9:36 a.m. UTC | #8
On 23/09/2019 08:23, Thomas Monjalon wrote:
> 20/09/2019 09:38, Ray Kinsella:
>>
>> On 19/09/2019 16:16, Bruce Richardson wrote:
>>> On Thu, Sep 19, 2019 at 02:28:04PM +0100, Ray Kinsella wrote:
>>>>
>>>>
>>>> On 19/09/2019 13:35, Andrzej Ostruszka wrote:
>>>>> On 9/18/19 3:32 PM, Ray Kinsella wrote:
>>>> ...>
>>>>> Compilation time is much longer.  In a normal hack|fix/compile/repeat
>>>>> cycle with "compile" part being simple "make" the link time might be a
>>>>> bit annoying.  So I imagine keeping LTO off for the most part of the dev
>>>>> cycle and then at the end when doing release/cleanup turn LTO on - to
>>>>> either get release build ready or to get some set of warnings that you
>>>>> address before yet another attempt to release build.  
>>>>
>>>> Well look, I would say a few things.
>>>>
>>>> 1. I see build times going down dramatically with Meson/Ninja in any case.
>>>
>>> In the general case yes, but mainly it helps if you have a large number of
>>> cores. For anyone building in a CI with only a couple of cores, meson+ninja
>>> isn't going to help much.
>>>
>>> /Bruce
>>>
>>
>> Very true, and I completely acknowledge that LTO adds compilation time.
>> I would just like to see it in or out, not another build time option.
> 
> Please can we get some numbers to understand how longer it is?
> 

I will note here, that it will always be a function of the number of
object files involved. We should measure it now to understand ...
  
Mattias Rönnblom Sept. 23, 2019, 10:16 a.m. UTC | #9
On 2019-09-23 11:36, Ray Kinsella wrote:
> 
> 
> On 23/09/2019 08:23, Thomas Monjalon wrote:
>> 20/09/2019 09:38, Ray Kinsella:
>>>
>>> On 19/09/2019 16:16, Bruce Richardson wrote:
>>>> On Thu, Sep 19, 2019 at 02:28:04PM +0100, Ray Kinsella wrote:
>>>>>
>>>>>
>>>>> On 19/09/2019 13:35, Andrzej Ostruszka wrote:
>>>>>> On 9/18/19 3:32 PM, Ray Kinsella wrote:
>>>>> ...>
>>>>>> Compilation time is much longer.  In a normal hack|fix/compile/repeat
>>>>>> cycle with "compile" part being simple "make" the link time might be a
>>>>>> bit annoying.  So I imagine keeping LTO off for the most part of the dev
>>>>>> cycle and then at the end when doing release/cleanup turn LTO on - to
>>>>>> either get release build ready or to get some set of warnings that you
>>>>>> address before yet another attempt to release build.
>>>>>
>>>>> Well look, I would say a few things.
>>>>>
>>>>> 1. I see build times going down dramatically with Meson/Ninja in any case.
>>>>
>>>> In the general case yes, but mainly it helps if you have a large number of
>>>> cores. For anyone building in a CI with only a couple of cores, meson+ninja
>>>> isn't going to help much.
>>>>
>>>> /Bruce
>>>>
>>>
>>> Very true, and I completely acknowledge that LTO adds compilation time.
>>> I would just like to see it in or out, not another build time option.
>>
>> Please can we get some numbers to understand how longer it is?
>>
> 
> I will note here, that it will always be a function of the number of
> object files involved. We should measure it now to understand ...
> 
> 

On my system the non-LTO from-scratch build takes about one minute. With 
LTO enabled, it takes about five minutes.

In addition, it makes our application *run* slower as well.

This reinforces past experience I have with LTO - it's more cool than 
useful, unless you care much about the resulting code size.
  
Andrzej Ostruszka Sept. 23, 2019, 12:03 p.m. UTC | #10
On 9/23/19 9:23 AM, Thomas Monjalon wrote:
[...]
> Please can we get some numbers to understand how longer it is?

Below numbers are for make based (make -j8) clean build on my system:

non-LTO
real: 144.56s, user:451.81s, sys:48.46s, CPU:346%

LTO
real: 607.20s, user:2141.71s, sys:88.36s, CPU:367%

So it is similar ~5x increase as Mattias has reported.  Have not
measured it, but the lion share of that increase is due to linking of
'test' apps.

I would vote for leaving LTO as an option - although I must admit I did
not get what Ray meant by saying:

20/09/2019 09:38, Ray Kinsella:
[...]
> I would just like to see it in or out, not another build time option.

If "in or out" means "either accept the patches with LTO on and no
config option or reject them" then I disagree.  Even if run time
improvements are questionable I find the additional link time warnings
beneficial and would like to have an easy way to turn them on when doing
final touches before pushing out.

Regards
Andrzej
  
Bruce Richardson Sept. 23, 2019, 12:06 p.m. UTC | #11
On Mon, Sep 23, 2019 at 02:03:35PM +0200, Andrzej Ostruszka wrote:
> On 9/23/19 9:23 AM, Thomas Monjalon wrote:
> [...]
> > Please can we get some numbers to understand how longer it is?
> 
> Below numbers are for make based (make -j8) clean build on my system:
> 
> non-LTO
> real: 144.56s, user:451.81s, sys:48.46s, CPU:346%
> 
> LTO
> real: 607.20s, user:2141.71s, sys:88.36s, CPU:367%
> 
> So it is similar ~5x increase as Mattias has reported.  Have not
> measured it, but the lion share of that increase is due to linking of
> 'test' apps.
> 

Interesting. Do we want to explicitly not use lto for the test app?

> I would vote for leaving LTO as an option - although I must admit I did
> not get what Ray meant by saying:
> 
> 20/09/2019 09:38, Ray Kinsella:
> [...]
> > I would just like to see it in or out, not another build time option.
> 
> If "in or out" means "either accept the patches with LTO on and no
> config option or reject them" then I disagree.  Even if run time
> improvements are questionable I find the additional link time warnings
> beneficial and would like to have an easy way to turn them on when doing
> final touches before pushing out.

Looking at it from the meson build view-point, support for lto is built
into the tool itself, so the support can't be removed as such - it will
either result in a working build or not. Therefore, I'm for taking a patch
to support lto in meson, whatever about supporting it through make etc.

/Bruce
  
Ray Kinsella Sept. 23, 2019, 12:16 p.m. UTC | #12
> If "in or out" means "either accept the patches with LTO on and no
> config option or reject them" then I disagree.  Even if run time
> improvements are questionable I find the additional link time warnings
> beneficial and would like to have an easy way to turn them on when doing
> final touches before pushing out.

My rational for my "in or out" suggestion, is always to have as few
rarely used build options as possible. To reduce the number of things
the community has to support.

Ray K
  
Andrzej Ostruszka Sept. 23, 2019, 1:02 p.m. UTC | #13
On 9/23/19 2:06 PM, Bruce Richardson wrote:
> On Mon, Sep 23, 2019 at 02:03:35PM +0200, Andrzej Ostruszka wrote:
[...]
>> So it is similar ~5x increase as Mattias has reported.  Have not
>> measured it, but the lion share of that increase is due to linking of
>> 'test' apps.
>>
> 
> Interesting. Do we want to explicitly not use lto for the test app?

It is the linking of these apps where LTO really kicks in.  During
compilation of objects compiler just additionally generates internal
representation (in extra sections of ELF object).  Linking these objects
into library does not do much - so the actual optimization is done when
producing final executable.

It might be so that when using dedicated user app the penalty would be
much smaller since the amount of code used would be much smaller - but
FWIW all the warnings that I was fixing were produced by the linking of
app/test* binaries.

Regards
Andrzej
  
Bruce Richardson Sept. 23, 2019, 4:13 p.m. UTC | #14
On Mon, Sep 23, 2019 at 03:02:25PM +0200, Andrzej Ostruszka wrote:
> On 9/23/19 2:06 PM, Bruce Richardson wrote:
> > On Mon, Sep 23, 2019 at 02:03:35PM +0200, Andrzej Ostruszka wrote:
> [...]
> >> So it is similar ~5x increase as Mattias has reported.  Have not
> >> measured it, but the lion share of that increase is due to linking of
> >> 'test' apps.
> >>
> > 
> > Interesting. Do we want to explicitly not use lto for the test app?
> 
> It is the linking of these apps where LTO really kicks in.  During
> compilation of objects compiler just additionally generates internal
> representation (in extra sections of ELF object).  Linking these objects
> into library does not do much - so the actual optimization is done when
> producing final executable.
> 
> It might be so that when using dedicated user app the penalty would be
> much smaller since the amount of code used would be much smaller - but
> FWIW all the warnings that I was fixing were produced by the linking of
> app/test* binaries.
> 
I wasn't suggesting removing it for testpmd test-acl etc., just for the
autotest "test" binary since it's the one with the most .o files.

However, testing on my system with the meson build, I'm getting lots of
link errors [See below]. Any suggestions?

/Bruce

/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `ena_stop':
<artificial>:(.text+0x9ed6): undefined reference to `rte_timer_stop'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `ena_start':
<artificial>:(.text+0xae23): undefined reference to `rte_timer_reset'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `eth_ena_dev_init.cold':
<artificial>:(.text.unlikely+0x224a): undefined reference to `rte_timer_subsystem_init'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans119.ltrans.o: in function `rte_distributor_flush_v1705':
<artificial>:(.text+0xa229): undefined reference to `rte_distributor_process'
/usr/bin/ld: <artificial>:(.text+0xa256): undefined reference to `rte_distributor_process'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans119.ltrans.o: in function `rte_distributor_get_pkt_v1705':
<artificial>:(.text+0xd24b): undefined reference to `rte_distributor_request_pkt'
/usr/bin/ld: <artificial>:(.text+0xd259): undefined reference to `rte_distributor_poll_pkt'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_ipv6_lookup.lto_priv.0':
<artificial>:(.text+0x810c): undefined reference to `rte_lpm6_lookup'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_ipv6_entry_delete.lto_priv.0':
<artificial>:(.text+0x81bb): undefined reference to `rte_lpm6_is_rule_present'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_ipv6_entry_add.lto_priv.0':
<artificial>:(.text+0x8e13): undefined reference to `rte_lpm6_is_rule_present'
/usr/bin/ld: <artificial>:(.text+0x8e79): undefined reference to `rte_lpm6_add'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_entry_delete.lto_priv.0':
<artificial>:(.text+0x9b15): undefined reference to `rte_lpm_is_rule_present'
/usr/bin/ld: <artificial>:(.text+0x9b4c): undefined reference to `rte_lpm_delete'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_entry_add.lto_priv.0':
<artificial>:(.text+0x9c06): undefined reference to `rte_lpm_is_rule_present'
/usr/bin/ld: <artificial>:(.text+0x9c73): undefined reference to `rte_lpm_add'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_free.lto_priv.0':
<artificial>:(.text+0x9d12): undefined reference to `rte_lpm_free'
/usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans122.ltrans.o: in function `rte_table_lpm_create.lto_priv.0':
<artificial>:(.text+0x9e01): undefined reference to `rte_lpm_create'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
  
Andrzej Ostruszka Sept. 24, 2019, 6:46 a.m. UTC | #15
On 9/23/19 6:13 PM, Bruce Richardson wrote:
[...]
> However, testing on my system with the meson build, I'm getting lots of
> link errors [See below]. Any suggestions?
> 
> /Bruce
> 
> /usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `ena_stop':
> <artificial>:(.text+0x9ed6): undefined reference to `rte_timer_stop'
[...]

What is 'default_library'?  It should be 'shared' as mentioned in the
docs.  The problem is that RTE_BUILD_SHARED_LIB is statically defined in
rte_config.h used by meson build.  This results in broken static
libraries (those that are using versioned symbols - like
timer/lpm/distributor) - since the MAP_STATIC_SYMBOL macro defining the
default alias is empty.  With LTO compiler (or rather linker) is quite
aggressive in removing stuff that it thinks is not needed.

Regards
Andrzej

PS. IMHO this SHARED_LIB define should be removed from the rte_config.h
and meson.build should be updated to detect 'default_library' and add it
as needed.  Don't know exactly how meson behaves if 'default_library' is
'both' - the docs say that it reuses objects from static build, so we
might have to work around it for LTO & 'both'.
  
Bruce Richardson Sept. 24, 2019, 10:25 a.m. UTC | #16
On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
> On 9/23/19 6:13 PM, Bruce Richardson wrote:
> [...]
> > However, testing on my system with the meson build, I'm getting lots of
> > link errors [See below]. Any suggestions?
> > 
> > /Bruce
> > 
> > /usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `ena_stop':
> > <artificial>:(.text+0x9ed6): undefined reference to `rte_timer_stop'
> [...]
> 
> What is 'default_library'?  It should be 'shared' as mentioned in the
> docs.  The problem is that RTE_BUILD_SHARED_LIB is statically defined in
> rte_config.h used by meson build.  This results in broken static
> libraries (those that are using versioned symbols - like
> timer/lpm/distributor) - since the MAP_STATIC_SYMBOL macro defining the
> default alias is empty.  With LTO compiler (or rather linker) is quite
> aggressive in removing stuff that it thinks is not needed.
> 
> Regards
> Andrzej
> 
> PS. IMHO this SHARED_LIB define should be removed from the rte_config.h
> and meson.build should be updated to detect 'default_library' and add it
> as needed.  Don't know exactly how meson behaves if 'default_library' is
> 'both' - the docs say that it reuses objects from static build, so we
> might have to work around it for LTO & 'both'.

That proposal won't work either as we build both static and shared
libraries in all cases - the default_library value only affects whether the
test apps are linked against the .a or .so files.

The real issue seems to be that the compat.h header has different
compilation paths for static and shared libraries, which means that any C
file including it can't have a .o file that can be used for both a .a and a
.so simultaneously. Having it default to the shared library path seems to
work fine thus far, but with LTO it seems broken as you say. Adding Neil as
the original file author of this in case he has any suggestions here. I'd
really rather not have to go back to building .a's and .so's separately.

Regards,
/Bruce
  
Andrzej Ostruszka Sept. 24, 2019, 11:52 a.m. UTC | #17
On 9/24/19 12:25 PM, Bruce Richardson wrote:
> On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
[...]
>> PS. IMHO this SHARED_LIB define should be removed from the rte_config.h
>> and meson.build should be updated to detect 'default_library' and add it
>> as needed.  Don't know exactly how meson behaves if 'default_library' is
>> 'both' - the docs say that it reuses objects from static build, so we
>> might have to work around it for LTO & 'both'.
> 
> That proposal won't work either as we build both static and shared
> libraries in all cases - the default_library value only affects whether the
> test apps are linked against the .a or .so files.
> 
> The real issue seems to be that the compat.h header has different
> compilation paths for static and shared libraries, which means that any C
> file including it can't have a .o file that can be used for both a .a and a
> .so simultaneously. Having it default to the shared library path seems to
> work fine thus far, but with LTO it seems broken as you say. Adding Neil as
> the original file author of this in case he has any suggestions here. I'd
> really rather not have to go back to building .a's and .so's separately.

I thought about this already a bit and I see two ways out.

The one which depends on meson's 'default_library' - for
"static"/"shared" the case would be easy (along the lines I've
mentioned).  The "both" would be tricky if we want to rely on meson
built-in behaviour.

The other is that we use 'default_library' as we do today (as an
indicator which library should be used when linking apps).  In that case
all is good apart from building static libs out of object files.  That
would require additional step.  That is each object, for which source
uses BIND_DEFAULT_SYMBOL/MAP_STATIC_SYMBOL, would need to be processed
with "objcopy --add-symbol" adding correct alias before being used to
create static lib.

Just my 2 cents.

Regards
Andrzej
  
Bruce Richardson Sept. 24, 2019, 12:11 p.m. UTC | #18
On Tue, Sep 24, 2019 at 01:52:23PM +0200, Andrzej Ostruszka wrote:
> 
> On 9/24/19 12:25 PM, Bruce Richardson wrote:
> > On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
> [...]
> >> PS. IMHO this SHARED_LIB define should be removed from the rte_config.h
> >> and meson.build should be updated to detect 'default_library' and add it
> >> as needed.  Don't know exactly how meson behaves if 'default_library' is
> >> 'both' - the docs say that it reuses objects from static build, so we
> >> might have to work around it for LTO & 'both'.
> > 
> > That proposal won't work either as we build both static and shared
> > libraries in all cases - the default_library value only affects whether the
> > test apps are linked against the .a or .so files.
> > 
> > The real issue seems to be that the compat.h header has different
> > compilation paths for static and shared libraries, which means that any C
> > file including it can't have a .o file that can be used for both a .a and a
> > .so simultaneously. Having it default to the shared library path seems to
> > work fine thus far, but with LTO it seems broken as you say. Adding Neil as
> > the original file author of this in case he has any suggestions here. I'd
> > really rather not have to go back to building .a's and .so's separately.
> 
> I thought about this already a bit and I see two ways out.
> 
> The one which depends on meson's 'default_library' - for
> "static"/"shared" the case would be easy (along the lines I've
> mentioned).  The "both" would be tricky if we want to rely on meson
> built-in behaviour.
> 
Ignoring the both case, which I think is just unsupported right now - it
wasn't there when we added support originally for building DPDK - using the
shared/static setting still won't solve the problem since it only applied
to DPDK's own apps. The reason for building both static and shared libs is
to give apps the flexibility to link against either as they prefer - using
the pkg-config appropriately. It's perfectly valid to have testpmd linked
statically in a DPDK build so that it can be run from the build directory,
yet have other apps on the system linked dynamically.

> The other is that we use 'default_library' as we do today (as an
> indicator which library should be used when linking apps).  In that case
> all is good apart from building static libs out of object files.  That
> would require additional step.  That is each object, for which source
> uses BIND_DEFAULT_SYMBOL/MAP_STATIC_SYMBOL, would need to be processed
> with "objcopy --add-symbol" adding correct alias before being used to
> create static lib.
> 
Interesting. It seems a bit awkward a solution, but may be doable.

> Just my 2 cents.
> 
> Regards
> Andrzej
  
Neil Horman Sept. 24, 2019, 12:59 p.m. UTC | #19
On Tue, Sep 24, 2019 at 12:25:35PM +0200, Bruce Richardson wrote:
> On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
> > On 9/23/19 6:13 PM, Bruce Richardson wrote:
> > [...]
> > > However, testing on my system with the meson build, I'm getting lots of
> > > link errors [See below]. Any suggestions?
> > > 
> > > /Bruce
> > > 
> > > /usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `ena_stop':
> > > <artificial>:(.text+0x9ed6): undefined reference to `rte_timer_stop'
> > [...]
> > 
> > What is 'default_library'?  It should be 'shared' as mentioned in the
> > docs.  The problem is that RTE_BUILD_SHARED_LIB is statically defined in
> > rte_config.h used by meson build.  This results in broken static
> > libraries (those that are using versioned symbols - like
> > timer/lpm/distributor) - since the MAP_STATIC_SYMBOL macro defining the
> > default alias is empty.  With LTO compiler (or rather linker) is quite
> > aggressive in removing stuff that it thinks is not needed.
> > 
> > Regards
> > Andrzej
> > 
> > PS. IMHO this SHARED_LIB define should be removed from the rte_config.h
> > and meson.build should be updated to detect 'default_library' and add it
> > as needed.  Don't know exactly how meson behaves if 'default_library' is
> > 'both' - the docs say that it reuses objects from static build, so we
> > might have to work around it for LTO & 'both'.
> 
> That proposal won't work either as we build both static and shared
> libraries in all cases - the default_library value only affects whether the
> test apps are linked against the .a or .so files.
> 
> The real issue seems to be that the compat.h header has different
> compilation paths for static and shared libraries, which means that any C
> file including it can't have a .o file that can be used for both a .a and a
> .so simultaneously. Having it default to the shared library path seems to
> work fine thus far, but with LTO it seems broken as you say. Adding Neil as
> the original file author of this in case he has any suggestions here. I'd
> really rather not have to go back to building .a's and .so's separately.
> 
The notion of using the same object file to link to a static archive and a dso
seems somewhat suspect to me in general.  I say that because I don't see a way
for the linker to know/prove at link time that the options used to compile an
object for target (a) will be the same as those used to compile the same object
for target (b).  In this particular case, you've identified an issue in
compilation changes that triggers off the building of dso's vs static archives,
but I could envision a scenario in which you might try to build targets for BSD
and Linux in parallel, or even for different machines (i.e. build for a least
common denominator x8664 target, and a highly optimized recent x8664 processor
with all the ISA extensions enabled). We don't do that currently now of course,
but we could, and the only way we could do so would be to rebuild all the
objects with the compilation flags for each separately.

That said, if the goal is to just overcome this particular situation, it might
(strong might), be sufficient to simply augment the MAP_STATIC_SYMBOL macro in
the CONFIG_RTE_BUILD_SHARED_LIB=n case to append the 'used' attribute.
Ostensibly, LTO would be smart enough then to not eliminate the symbol?  Just a
thought.

But I think we need to take care here.  While its fine solve this particular
situation, I think the notion of reusing objects for multiple link targets has
the potential to uncover many issues of this class, which won't be as solveable
without having to just rebuild objects from scratch.

Neil

> Regards,
> /Bruce
>
  
Ray Kinsella Sept. 24, 2019, 4:01 p.m. UTC | #20
On 24/09/2019 13:59, Neil Horman wrote:
> On Tue, Sep 24, 2019 at 12:25:35PM +0200, Bruce Richardson wrote:
>> On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
>>> On 9/23/19 6:13 PM, Bruce Richardson wrote:
>>> [...]
>>>> However, testing on my system with the meson build, I'm getting lots of
>>>> link errors [See below]. Any suggestions?
>>>>
>>>> /Bruce
>>>>
>>>> /usr/bin/ld: /tmp/dpdk-testpmd.hncbtE.ltrans93.ltrans.o: in function `ena_stop':
>>>> <artificial>:(.text+0x9ed6): undefined reference to `rte_timer_stop'
>>> [...]
>>>
> The notion of using the same object file to link to a static archive and a dso
> seems somewhat suspect to me in general.

Well I would say there are two different things going on here, one
doesn't exclude the other.

> I say that because I don't see a way
> for the linker to know/prove at link time that the options used to compile an
> object for target (a) will be the same as those used to compile the same object
> for target (b).  

Yes and no. So I might build an object differently for say a different
platform, but I am not sure I would necessarily build it differently for
static versus dynamic (linking is a different story of course).

(lto is also a different story, as when you build objects with lto on,
as I remember you end up with guile not bytecode)

> In this particular case, you've identified an issue in
> compilation changes that triggers off the building of dso's vs static archives,
> but I could envision a scenario in which you might try to build targets for BSD
> and Linux in parallel, or even for different machines (i.e. build for a least
> common denominator x8664 target, and a highly optimized recent x8664 processor
> with all the ISA extensions enabled). We don't do that currently now of course,
> but we could, and the only way we could do so would be to rebuild all the
> objects with the compilation flags for each separately.

Ok - but there is nothing in the above in this that precludes all of
these object variants being used in both static and dynamic builds, it's
all down to how they are integrated - FD.io VPP does this out of the box
as it happens.

> 
> That said, if the goal is to just overcome this particular situation, it might
> (strong might), be sufficient to simply augment the MAP_STATIC_SYMBOL macro in
> the CONFIG_RTE_BUILD_SHARED_LIB=n case to append the 'used' attribute.
> Ostensibly, LTO would be smart enough then to not eliminate the symbol?  Just a
> thought.

+1, that would be a simple solution.

> 
> But I think we need to take care here.  While its fine solve this particular
> situation, I think the notion of reusing objects for multiple link targets has
> the potential to uncover many issues of this class, which won't be as solveable
> without having to just rebuild objects from scratch.
> 
> Neil
> 
>> Regards,
>> /Bruce
>>
  
Andrzej Ostruszka Sept. 26, 2019, 3:32 p.m. UTC | #21
On 9/24/19 2:59 PM, Neil Horman wrote:
> On Tue, Sep 24, 2019 at 12:25:35PM +0200, Bruce Richardson wrote:
>> On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
>>> On 9/23/19 6:13 PM, Bruce Richardson wrote:
[...]
>> The real issue seems to be that the compat.h header has different
>> compilation paths for static and shared libraries, which means that any C
>> file including it can't have a .o file that can be used for both a .a and a
>> .so simultaneously. Having it default to the shared library path seems to
>> work fine thus far, but with LTO it seems broken as you say. Adding Neil as
>> the original file author of this in case he has any suggestions here. I'd
>> really rather not have to go back to building .a's and .so's separately.
>>
> The notion of using the same object file to link to a static archive and a dso
> seems somewhat suspect to me in general.

I'd think so too ... but there might be something fishy with gcc here.
More on this below.

[...]
> That said, if the goal is to just overcome this particular situation, it might
> (strong might), be sufficient to simply augment the MAP_STATIC_SYMBOL macro in
> the CONFIG_RTE_BUILD_SHARED_LIB=n case to append the 'used' attribute.
> Ostensibly, LTO would be smart enough then to not eliminate the symbol?  Just a
> thought.
Just to clarify things here is the current status in a nutshell.
1. The make based build work with LTO (both static and shared).  It is
using CONFIG_*_SHARED_LIB option and thus different paths of
rte_compat.h are used.

2. The meson build is not using config and has "SHARED" fixed to "y" in
config/rte_config.h.  This works and produces:
- shared library that is working when linking both w/ and w/o LTO
- static library that is only working when linking w/o LTO - when
linking with LTO then it complains about missing symbols for which
different symbol versions are defined.

Augmenting MAP_STATIC_SYMBOL won't help since it is not used at all in
meson build.

I've played around with couple ideas.  Some of them might sound stupid
for you - but I'll report them anyway

Modifying symbol tables
-----------------------
I thought that since problems are only with versioned symbols then I'll
try to modify the symbols tables archives so they look like in "static
make" case e.g. (for just one symbol):

$ objcopy --strip-symbol=rte_timer_subsystem_init@DPDK_2.0 \
	--redefine-sym \
rte_timer_subsystem_init@@DPDK_19.05=rte_timer_subsystem_init \
	librte_timer.a

After this the symbol table looks like in fully static case but this
doesn't work so I'm pretty sure that when using LTO linker does not
check the symbol table at all and just looks in "lto" sections.

Adding static macro
-------------------
I've added for the shared case macro:
#define MAP_STATIC_SYMBOL(f, p) f \
	__attribute__((alias(RTE_STR(p)),weak))
with the idea that maybe during linking against *.a versioned symbols
are not taken into account and if I add weak non-versioned symbol then
it will be used when linking against *.a and when linking against *.so
the strong versioned one will be used.  This thinking is in contrast
with the fact that meson build works w/o LTO where only versioned
symbols are present in those problematic libs :) - and it doesn't work.
Linker reports multiple definitions.

So adding these two together it seems to me that:
a) gcc is using only internal "lto" sections when linking with LTO
b) gcc is storing those versioned symbols in those "lto" sections
c) when linking w/ LTO against archives is not capable to use those
versioned symbols in "lto" sections
d) when linking w/ LTO against shared libraries is capable to use
versioned symbols from "lto" sections
e) when linking w/o LTO against archive is capable to use versioned
symbols from global symbol table (in contrast with point 'c' above)

I'd appreciate some input from those who know gcc internals.  In the
meantime I'll try to come up with minimal example and follow up on gcc
related lists/groups.

Regards
Andrzej
  
Bruce Richardson Sept. 27, 2019, 7:55 p.m. UTC | #22
On Thu, Sep 26, 2019 at 05:32:12PM +0200, Andrzej Ostruszka wrote:
> On 9/24/19 2:59 PM, Neil Horman wrote:
> > On Tue, Sep 24, 2019 at 12:25:35PM +0200, Bruce Richardson wrote:
> >> On Tue, Sep 24, 2019 at 08:46:25AM +0200, Andrzej Ostruszka wrote:
> >>> On 9/23/19 6:13 PM, Bruce Richardson wrote:
> [...]
> >> The real issue seems to be that the compat.h header has different
> >> compilation paths for static and shared libraries, which means that any C
> >> file including it can't have a .o file that can be used for both a .a and a
> >> .so simultaneously. Having it default to the shared library path seems to
> >> work fine thus far, but with LTO it seems broken as you say. Adding Neil as
> >> the original file author of this in case he has any suggestions here. I'd
> >> really rather not have to go back to building .a's and .so's separately.
> >>
> > The notion of using the same object file to link to a static archive and a dso
> > seems somewhat suspect to me in general.
> 
> I'd think so too ... but there might be something fishy with gcc here.
> More on this below.
> 
> [...]
> > That said, if the goal is to just overcome this particular situation, it might
> > (strong might), be sufficient to simply augment the MAP_STATIC_SYMBOL macro in
> > the CONFIG_RTE_BUILD_SHARED_LIB=n case to append the 'used' attribute.
> > Ostensibly, LTO would be smart enough then to not eliminate the symbol?  Just a
> > thought.
> Just to clarify things here is the current status in a nutshell.
> 1. The make based build work with LTO (both static and shared).  It is
> using CONFIG_*_SHARED_LIB option and thus different paths of
> rte_compat.h are used.
> 
> 2. The meson build is not using config and has "SHARED" fixed to "y" in
> config/rte_config.h.  This works and produces:
> - shared library that is working when linking both w/ and w/o LTO
> - static library that is only working when linking w/o LTO - when
> linking with LTO then it complains about missing symbols for which
> different symbol versions are defined.
> 
> Augmenting MAP_STATIC_SYMBOL won't help since it is not used at all in
> meson build.
> 
> I've played around with couple ideas.  Some of them might sound stupid
> for you - but I'll report them anyway
> 
> Modifying symbol tables
> -----------------------
> I thought that since problems are only with versioned symbols then I'll
> try to modify the symbols tables archives so they look like in "static
> make" case e.g. (for just one symbol):
> 
> $ objcopy --strip-symbol=rte_timer_subsystem_init@DPDK_2.0 \
> 	--redefine-sym \
> rte_timer_subsystem_init@@DPDK_19.05=rte_timer_subsystem_init \
> 	librte_timer.a
> 
> After this the symbol table looks like in fully static case but this
> doesn't work so I'm pretty sure that when using LTO linker does not
> check the symbol table at all and just looks in "lto" sections.
> 
> Adding static macro
> -------------------
> I've added for the shared case macro:
> #define MAP_STATIC_SYMBOL(f, p) f \
> 	__attribute__((alias(RTE_STR(p)),weak))
> with the idea that maybe during linking against *.a versioned symbols
> are not taken into account and if I add weak non-versioned symbol then
> it will be used when linking against *.a and when linking against *.so
> the strong versioned one will be used.  This thinking is in contrast
> with the fact that meson build works w/o LTO where only versioned
> symbols are present in those problematic libs :) - and it doesn't work.
> Linker reports multiple definitions.
> 
> So adding these two together it seems to me that:
> a) gcc is using only internal "lto" sections when linking with LTO
> b) gcc is storing those versioned symbols in those "lto" sections
> c) when linking w/ LTO against archives is not capable to use those
> versioned symbols in "lto" sections
> d) when linking w/ LTO against shared libraries is capable to use
> versioned symbols from "lto" sections
> e) when linking w/o LTO against archive is capable to use versioned
> symbols from global symbol table (in contrast with point 'c' above)
> 
> I'd appreciate some input from those who know gcc internals.  In the
> meantime I'll try to come up with minimal example and follow up on gcc
> related lists/groups.
> 
Please have a look over the patchset I just posted, as one possible
solution to this. http://patches.dpdk.org/project/dpdk/list/?series=6594

For the general case of compiling DPDK, a given block of C code is going to
result in the same object file whether compiled for static or shared (so
long as -fPIC is passed to the compile), so running two compiles for each
and every C file would be wasted effort. However, for those files with
function versioning, I think any proper solution has to involve compiling
twice with different macros, so that's what the patchset seeks to achieve.

/Bruce
  
Thomas Monjalon Oct. 27, 2019, 11:31 a.m. UTC | #23
18/09/2019 15:32, Ray Kinsella:
> this is cool, good work.
> comments below.
> 
> On 17/09/2019 08:57, Andrzej Ostruszka wrote:
> > --- a/lib/librte_distributor/rte_distributor.c
> > +++ b/lib/librte_distributor/rte_distributor.c
> > @@ -32,7 +32,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
> >  
> >  /**** Burst Packet APIs called by workers ****/
> >  
> > -void
> > +void __vsym
> 
> all these additional __vsym annotations looks like they belong in a
> seperate patch, as they are fixing a bug and are not directly related to
> adding LTO the build system.

Andrzej, you did not reply to this question.
This is a real blocker for merging this series.

Should __vsym addition be in a separate patch?
Should we document its use in rte_function_versioning.h
and versioning.rst?

Please explain, thanks.
  
Andrzej Ostruszka Oct. 28, 2019, 8:36 a.m. UTC | #24
On 10/27/19 12:31 PM, Thomas Monjalon wrote:
> 18/09/2019 15:32, Ray Kinsella:
>> this is cool, good work.
>> comments below.
>>
>> On 17/09/2019 08:57, Andrzej Ostruszka wrote:
>>> --- a/lib/librte_distributor/rte_distributor.c
>>> +++ b/lib/librte_distributor/rte_distributor.c
>>> @@ -32,7 +32,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
>>>  
>>>  /**** Burst Packet APIs called by workers ****/
>>>  
>>> -void
>>> +void __vsym
>>
>> all these additional __vsym annotations looks like they belong in a
>> seperate patch, as they are fixing a bug and are not directly related to
>> adding LTO the build system.
> 
> Andrzej, you did not reply to this question.
> This is a real blocker for merging this series.

Thomas, thank you for the reminder.  Somehow that comment has escaped me
- although I've read it then.

> Should __vsym addition be in a separate patch?

I'm fine both ways.  You could argue that:
- it is a bug since '__vsym' clearly annotates the function as being
  used as a particular version of a symbol and as such it was missing
- or as a part of enablement for LTO since without it compiler/linker
  should not be removing given function and '__vsym' really is just a
  "attribute(used)" to tell optimizing compiler/linker that this
  function should not be removed.

Since you raised that question I'm guessing that you prefer it to be in
a separate patch - so, unless you object now, I'm going to split it in
the next version and have it as a first patch.

> Should we document its use in rte_function_versioning.h
> and versioning.rst?

Yes, I think so.  I'll add that.

Again, thank you for the reminder and comments.

Regards
Andrzej
  
Thomas Monjalon Oct. 28, 2019, 9:07 a.m. UTC | #25
28/10/2019 09:36, Andrzej Ostruszka:
> On 10/27/19 12:31 PM, Thomas Monjalon wrote:
> > 18/09/2019 15:32, Ray Kinsella:
> >> this is cool, good work.
> >> comments below.
> >>
> >> On 17/09/2019 08:57, Andrzej Ostruszka wrote:
> >>> --- a/lib/librte_distributor/rte_distributor.c
> >>> +++ b/lib/librte_distributor/rte_distributor.c
> >>> @@ -32,7 +32,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
> >>>  
> >>>  /**** Burst Packet APIs called by workers ****/
> >>>  
> >>> -void
> >>> +void __vsym
> >>
> >> all these additional __vsym annotations looks like they belong in a
> >> seperate patch, as they are fixing a bug and are not directly related to
> >> adding LTO the build system.
> > 
> > Andrzej, you did not reply to this question.
> > This is a real blocker for merging this series.
> 
> Thomas, thank you for the reminder.  Somehow that comment has escaped me
> - although I've read it then.
> 
> > Should __vsym addition be in a separate patch?
> 
> I'm fine both ways.  You could argue that:
> - it is a bug since '__vsym' clearly annotates the function as being
>   used as a particular version of a symbol and as such it was missing
> - or as a part of enablement for LTO since without it compiler/linker
>   should not be removing given function and '__vsym' really is just a
>   "attribute(used)" to tell optimizing compiler/linker that this
>   function should not be removed.
> 
> Since you raised that question I'm guessing that you prefer it to be in
> a separate patch - so, unless you object now, I'm going to split it in
> the next version and have it as a first patch.
> 
> > Should we document its use in rte_function_versioning.h
> > and versioning.rst?
> 
> Yes, I think so.  I'll add that.
> 
> Again, thank you for the reminder and comments.

Thank you, a separate patch with doc update is very welcome.
  
Andrzej Ostruszka Oct. 28, 2019, 12:12 p.m. UTC | #26
On 10/28/19 9:36 AM, Andrzej Ostruszka wrote:
> On 10/27/19 12:31 PM, Thomas Monjalon wrote:
[...]
>> Should we document its use in rte_function_versioning.h
>> and versioning.rst?
> 
> Yes, I think so.  I'll add that.

One quick notice.  There is a slight mismatch between documentation of
VERSION_SYMBOL/BIND_DEFAULT_SYMBOL and their implementation i.e. the
docs claim that the underscore is added by these macros (b_e) while it
is currently being supplied at the macro invocations.

I'll try to update docs to match the header but if you'd like to change
it the other way* please let me know.

Regards
Andrzej

[*] Change the implementation of these macros to use:
    ... RTE_STR(b) "_" RTE_STR(e)
  and remove underscore from all invocations of these macros.
  
Thomas Monjalon Oct. 28, 2019, 5:16 p.m. UTC | #27
28/10/2019 13:12, Andrzej Ostruszka:
> On 10/28/19 9:36 AM, Andrzej Ostruszka wrote:
> > On 10/27/19 12:31 PM, Thomas Monjalon wrote:
> [...]
> >> Should we document its use in rte_function_versioning.h
> >> and versioning.rst?
> > 
> > Yes, I think so.  I'll add that.
> 
> One quick notice.  There is a slight mismatch between documentation of
> VERSION_SYMBOL/BIND_DEFAULT_SYMBOL and their implementation i.e. the
> docs claim that the underscore is added by these macros (b_e) while it
> is currently being supplied at the macro invocations.
> 
> I'll try to update docs to match the header but if you'd like to change
> it the other way* please let me know.
> 
> Regards
> Andrzej
> 
> [*] Change the implementation of these macros to use:
>     ... RTE_STR(b) "_" RTE_STR(e)
>   and remove underscore from all invocations of these macros.

Please do the doc fix in a separate patch, Cc'ing Neil Horman
who can decide what is best.
Thanks
  

Patch

diff --git a/.travis.yml b/.travis.yml
index 781f9f666..70d221852 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -31,6 +31,7 @@  env:
   - DEF_LIB="static" OPTS="-Denable_kmods=false"
   - DEF_LIB="shared" OPTS="-Denable_kmods=false"
   - DEF_LIB="shared" RUN_TESTS=1 BUILD_DOCS=1
+  - DEF_LIB="shared" OPTS="-Db_lto=true"
 
 matrix:
   include:
@@ -100,6 +101,12 @@  matrix:
       apt:
         packages:
           - *extra_packages
+  - env: DEF_LIB="shared" OPTS="-Db_lto=true" EXTRA_PACKAGES=1
+    compiler: gcc
+    addons:
+      apt:
+        packages:
+          - *extra_packages
 
 
 script: ./.ci/${TRAVIS_OS_NAME}-build.sh
diff --git a/config/common_base b/config/common_base
index 8ef75c203..73a55fdec 100644
--- a/config/common_base
+++ b/config/common_base
@@ -49,6 +49,11 @@  CONFIG_RTE_FORCE_INTRINSICS=n
 #
 CONFIG_RTE_ARCH_STRICT_ALIGN=n
 
+#
+# Enable link time optimization
+#
+CONFIG_RTE_ENABLE_LTO=n
+
 #
 # Compile to share library
 #
diff --git a/config/meson.build b/config/meson.build
index 2bafea530..97bbc323b 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -196,3 +196,18 @@  add_project_arguments('-D_GNU_SOURCE', language: 'c')
 if is_freebsd
 	add_project_arguments('-D__BSD_VISIBLE', language: 'c')
 endif
+
+if get_option('b_lto')
+	if cc.has_argument('-ffat-lto-objects')
+		add_project_arguments('-ffat-lto-objects', language: 'c')
+	else
+		error('compiler does not support fat LTO objects - please turn LTO off')
+	endif
+	if cc.get_id() == 'gcc'
+		# workaround for bug 81440
+		if cc.version().version_compare('<8.0')
+			add_project_arguments('-Wno-lto-type-mismatch', language: 'c')
+			add_project_link_arguments('-Wno-lto-type-mismatch', language: 'c')
+		endif
+	endif
+endif
diff --git a/doc/guides/prog_guide/lto.rst b/doc/guides/prog_guide/lto.rst
new file mode 100644
index 000000000..b2b36e51c
--- /dev/null
+++ b/doc/guides/prog_guide/lto.rst
@@ -0,0 +1,37 @@ 
+Link Time Optimization
+======================
+
+The DPDK framework supports compilation with link time optimization
+turned on.  This depends obviously on the capabilities of the compiler
+to do "whole program" optimization at link time and is available only
+for compilers that support that feature (gcc, clang and icc).  To be
+more specific compiler have to support creation of ELF objects
+containing both normal code and internal representation
+(fat-lto-objects).  This is required since during build some code is
+generated by parsing produced ELF objects (pmdinfogen).
+
+The amount of performance gain that one can get from LTO depends on the
+compiler and the code that is being compiled.  However LTO is also
+useful for additional code analysis done by the compiler.  In particular
+due to interprocedural analysis compiler can produce additional warnings
+about variables that might be used uninitialized.  Some of these
+warnings might be "false positives" though and you might need to
+explicitly initialize variable in order to silence the compiler.
+
+Link time optimization can be enabled for whole DPDK framework by
+setting:
+
+.. code-block:: console
+    CONFIG_ENABLE_LTO=y
+
+in config file for the case of make based build and by:
+
+.. code-block:: console
+    meson build -Db_lto=true -Ddefault_library=shared
+    ninja -C build
+
+for the case of meson based build (only shared libraries are supported
+when building with meson and LTO enabled).
+
+Please note that turning LTO on causes considerable extension of
+compilation time.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..97b4f4083 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,14 @@  New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+**Added build support for Link Time Optimization.**
+
+ LTO is an optimization technique used by the compiler to perform whole
+ program analysis and optimization at link time.  In order to do that
+ compilers store their internal representation of the source code that
+ the linker uses at the final stage of compilation process.
+
+ See :doc:`../prog_guide/lto` for more information:
 
 Removed Items
 -------------
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 21eb1fb0a..848250f4a 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -32,7 +32,7 @@  EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 
 /**** Burst Packet APIs called by workers ****/
 
-void
+void __vsym
 rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
@@ -84,7 +84,7 @@  MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
 		unsigned int count),
 		rte_distributor_request_pkt_v1705);
 
-int
+int __vsym
 rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
@@ -124,7 +124,7 @@  MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts),
 		rte_distributor_poll_pkt_v1705);
 
-int
+int __vsym
 rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
@@ -159,7 +159,7 @@  MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
 		struct rte_mbuf **oldpkt, unsigned int return_count),
 		rte_distributor_get_pkt_v1705);
 
-int
+int __vsym
 rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
@@ -335,7 +335,7 @@  release(struct rte_distributor *d, unsigned int wkr)
 
 
 /* process a set of packets to distribute them to workers */
-int
+int __vsym
 rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
@@ -476,7 +476,7 @@  MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
 		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
-int
+int __vsym
 rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
@@ -526,7 +526,7 @@  total_outstanding(const struct rte_distributor *d)
  * Flush the distributor, so that there are no outstanding packets in flight or
  * queued up.
  */
-int
+int __vsym
 rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	unsigned int flushed;
@@ -561,7 +561,7 @@  MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
 		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
-void
+void __vsym
 rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
@@ -581,7 +581,7 @@  MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
 		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
-struct rte_distributor *
+struct rte_distributor * __vsym
 rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index cdc0969a8..31c766421 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -27,7 +27,7 @@  EAL_REGISTER_TAILQ(rte_distributor_tailq)
 
 /**** APIs called by workers ****/
 
-void
+void __vsym
 rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
@@ -40,7 +40,7 @@  rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 }
 VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
-struct rte_mbuf *
+struct rte_mbuf * __vsym
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id)
 {
@@ -54,7 +54,7 @@  rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 }
 VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
-struct rte_mbuf *
+struct rte_mbuf * __vsym
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
@@ -66,7 +66,7 @@  rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 }
 VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
-int
+int __vsym
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
@@ -191,7 +191,7 @@  process_returns(struct rte_distributor_v20 *d)
 }
 
 /* process a set of packets to distribute them to workers */
-int
+int __vsym
 rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs)
 {
@@ -296,7 +296,7 @@  rte_distributor_process_v20(struct rte_distributor_v20 *d,
 VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
-int
+int __vsym
 rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs)
 {
@@ -334,7 +334,7 @@  total_outstanding(const struct rte_distributor_v20 *d)
 
 /* flush the distributor, so that there are no outstanding packets in flight or
  * queued up. */
-int
+int __vsym
 rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 {
 	const unsigned flushed = total_outstanding(d);
@@ -347,7 +347,7 @@  rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
-void
+void __vsym
 rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 {
 	d->returns.start = d->returns.count = 0;
@@ -358,7 +358,7 @@  rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
-struct rte_distributor_v20 *
+struct rte_distributor_v20 * __vsym
 rte_distributor_create_v20(const char *name,
 		unsigned socket_id,
 		unsigned num_workers)
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..a2fba8d61 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -89,7 +89,7 @@  depth_to_range(uint8_t depth)
 /*
  * Find an existing lpm table and return a pointer to it.
  */
-struct rte_lpm_v20 *
+struct rte_lpm_v20 * __vsym
 rte_lpm_find_existing_v20(const char *name)
 {
 	struct rte_lpm_v20 *l = NULL;
@@ -115,7 +115,7 @@  rte_lpm_find_existing_v20(const char *name)
 }
 VERSION_SYMBOL(rte_lpm_find_existing, _v20, 2.0);
 
-struct rte_lpm *
+struct rte_lpm * __vsym
 rte_lpm_find_existing_v1604(const char *name)
 {
 	struct rte_lpm *l = NULL;
@@ -146,7 +146,7 @@  MAP_STATIC_SYMBOL(struct rte_lpm *rte_lpm_find_existing(const char *name),
 /*
  * Allocates memory for LPM object
  */
-struct rte_lpm_v20 *
+struct rte_lpm_v20 * __vsym
 rte_lpm_create_v20(const char *name, int socket_id, int max_rules,
 		__rte_unused int flags)
 {
@@ -219,7 +219,7 @@  rte_lpm_create_v20(const char *name, int socket_id, int max_rules,
 }
 VERSION_SYMBOL(rte_lpm_create, _v20, 2.0);
 
-struct rte_lpm *
+struct rte_lpm * __vsym
 rte_lpm_create_v1604(const char *name, int socket_id,
 		const struct rte_lpm_config *config)
 {
@@ -328,7 +328,7 @@  MAP_STATIC_SYMBOL(
 /*
  * Deallocates memory for given LPM table.
  */
-void
+void __vsym
 rte_lpm_free_v20(struct rte_lpm_v20 *lpm)
 {
 	struct rte_lpm_list *lpm_list;
@@ -357,7 +357,7 @@  rte_lpm_free_v20(struct rte_lpm_v20 *lpm)
 }
 VERSION_SYMBOL(rte_lpm_free, _v20, 2.0);
 
-void
+void __vsym
 rte_lpm_free_v1604(struct rte_lpm *lpm)
 {
 	struct rte_lpm_list *lpm_list;
@@ -1176,7 +1176,7 @@  add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 /*
  * Add a route
  */
-int
+int __vsym
 rte_lpm_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
 		uint8_t next_hop)
 {
@@ -1217,7 +1217,7 @@  rte_lpm_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
 }
 VERSION_SYMBOL(rte_lpm_add, _v20, 2.0);
 
-int
+int __vsym
 rte_lpm_add_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
 		uint32_t next_hop)
 {
@@ -1263,7 +1263,7 @@  MAP_STATIC_SYMBOL(int rte_lpm_add(struct rte_lpm *lpm, uint32_t ip,
 /*
  * Look for a rule in the high-level rules table
  */
-int
+int __vsym
 rte_lpm_is_rule_present_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
 uint8_t *next_hop)
 {
@@ -1290,7 +1290,7 @@  uint8_t *next_hop)
 }
 VERSION_SYMBOL(rte_lpm_is_rule_present, _v20, 2.0);
 
-int
+int __vsym
 rte_lpm_is_rule_present_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
 uint32_t *next_hop)
 {
@@ -1843,7 +1843,7 @@  delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 /*
  * Deletes a rule
  */
-int
+int __vsym
 rte_lpm_delete_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth)
 {
 	int32_t rule_to_delete_index, sub_rule_index;
@@ -1897,7 +1897,7 @@  rte_lpm_delete_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth)
 }
 VERSION_SYMBOL(rte_lpm_delete, _v20, 2.0);
 
-int
+int __vsym
 rte_lpm_delete_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth)
 {
 	int32_t rule_to_delete_index, sub_rule_index;
@@ -1956,7 +1956,7 @@  MAP_STATIC_SYMBOL(int rte_lpm_delete(struct rte_lpm *lpm, uint32_t ip,
 /*
  * Delete all rules from the LPM table.
  */
-void
+void __vsym
 rte_lpm_delete_all_v20(struct rte_lpm_v20 *lpm)
 {
 	/* Zero rule information. */
@@ -1973,7 +1973,7 @@  rte_lpm_delete_all_v20(struct rte_lpm_v20 *lpm)
 }
 VERSION_SYMBOL(rte_lpm_delete_all, _v20, 2.0);
 
-void
+void __vsym
 rte_lpm_delete_all_v1604(struct rte_lpm *lpm)
 {
 	/* Zero rule information. */
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 9b8aeb972..49a7fea1d 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -811,7 +811,7 @@  add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 /*
  * Add a route
  */
-int
+int __vsym
 rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
@@ -861,7 +861,7 @@  simulate_add(struct rte_lpm6 *lpm, const uint8_t *masked_ip, uint8_t depth)
 	return 0;
 }
 
-int
+int __vsym
 rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint32_t next_hop)
 {
@@ -954,7 +954,7 @@  lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 /*
  * Looks up an IP
  */
-int
+int __vsym
 rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 {
 	uint32_t next_hop32 = 0;
@@ -972,7 +972,7 @@  rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 }
 VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
 
-int
+int __vsym
 rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
 		uint32_t *next_hop)
 {
@@ -1007,7 +1007,7 @@  MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
 /*
  * Looks up a group of IP addresses
  */
-int
+int __vsym
 rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
@@ -1048,7 +1048,7 @@  rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 }
 VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
 
-int
+int __vsym
 rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int32_t *next_hops, unsigned int n)
@@ -1098,7 +1098,7 @@  MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 /*
  * Look for a rule in the high-level rules table
  */
-int
+int __vsym
 rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t *next_hop)
 {
@@ -1118,7 +1118,7 @@  rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 }
 VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
 
-int
+int __vsym
 rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint32_t *next_hop)
 {
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index bdcf05d06..e560ace06 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -131,7 +131,7 @@  rte_timer_data_dealloc(uint32_t id)
 	return 0;
 }
 
-void
+void __vsym
 rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
@@ -153,7 +153,7 @@  VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
  * secondary processes should be empty, the zeroth entry can be shared by
  * multiple processes.
  */
-int
+int __vsym
 rte_timer_subsystem_init_v1905(void)
 {
 	const struct rte_memzone *mz;
@@ -551,7 +551,7 @@  __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 }
 
 /* Reset and start the timer associated with the timer handle tim */
-int
+int __vsym
 rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned int tim_lcore,
 		    rte_timer_cb_t fct, void *arg)
@@ -574,7 +574,7 @@  rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
 }
 VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
 
-int
+int __vsym
 rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
 		      enum rte_timer_type type, unsigned int tim_lcore,
 		      rte_timer_cb_t fct, void *arg)
@@ -657,14 +657,14 @@  __rte_timer_stop(struct rte_timer *tim, int local_is_locked,
 }
 
 /* Stop the timer associated with the timer handle tim */
-int
+int __vsym
 rte_timer_stop_v20(struct rte_timer *tim)
 {
 	return __rte_timer_stop(tim, 0, &default_timer_data);
 }
 VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
 
-int
+int __vsym
 rte_timer_stop_v1905(struct rte_timer *tim)
 {
 	return rte_timer_alt_stop(default_data_id, tim);
@@ -817,14 +817,14 @@  __rte_timer_manage(struct rte_timer_data *timer_data)
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
-void
+void __vsym
 rte_timer_manage_v20(void)
 {
 	__rte_timer_manage(&default_timer_data);
 }
 VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
 
-int
+int __vsym
 rte_timer_manage_v1905(void)
 {
 	struct rte_timer_data *timer_data;
@@ -1074,14 +1074,14 @@  __rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 #endif
 }
 
-void
+void __vsym
 rte_timer_dump_stats_v20(FILE *f)
 {
 	__rte_timer_dump_stats(&default_timer_data, f);
 }
 VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
 
-int
+int __vsym
 rte_timer_dump_stats_v1905(FILE *f)
 {
 	return rte_timer_alt_dump_stats(default_data_id, f);
diff --git a/mk/toolchain/clang/rte.toolchain-compat.mk b/mk/toolchain/clang/rte.toolchain-compat.mk
index e6189b498..78f96c648 100644
--- a/mk/toolchain/clang/rte.toolchain-compat.mk
+++ b/mk/toolchain/clang/rte.toolchain-compat.mk
@@ -20,3 +20,7 @@  CLANG_MINOR_VERSION := $(shell echo $(CLANG_VERSION) | cut -f2 -d.)
 ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -lt 35 && echo 1), 1)
 	CC_SUPPORTS_Z := false
 endif
+
+ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -lt 60 && echo 1), 1)
+	CONFIG_RTE_ENABLE_LTO=n
+endif
diff --git a/mk/toolchain/clang/rte.vars.mk b/mk/toolchain/clang/rte.vars.mk
index 3c49dc568..3b1fa05f9 100644
--- a/mk/toolchain/clang/rte.vars.mk
+++ b/mk/toolchain/clang/rte.vars.mk
@@ -48,6 +48,14 @@  endif
 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
 
+ifeq ($(CONFIG_RTE_ENABLE_LTO),y)
+# 'fat-lto' is used since pmdinfogen needs to have 'this_pmd_nameX'
+# exported in symbol table and without this option only internal
+# representation is present.
+TOOLCHAIN_CFLAGS += -flto -ffat-lto-objects
+TOOLCHAIN_LDFLAGS += -flto
+endif
+
 # workaround clang bug with warning "missing field initializer" for "= {0}"
 WERROR_FLAGS += -Wno-missing-field-initializers
 
diff --git a/mk/toolchain/gcc/rte.toolchain-compat.mk b/mk/toolchain/gcc/rte.toolchain-compat.mk
index ea40a11c0..ad4fad83c 100644
--- a/mk/toolchain/gcc/rte.toolchain-compat.mk
+++ b/mk/toolchain/gcc/rte.toolchain-compat.mk
@@ -88,6 +88,10 @@  else
 		MACHINE_CFLAGS := $(filter-out -march% -mtune% -msse%,$(MACHINE_CFLAGS))
 	endif
 
+	ifeq ($(shell test $(GCC_VERSION) -lt 45 && echo 1), 1)
+		CONFIG_RTE_ENABLE_LTO=n
+	endif
+
 	# Disable thunderx PMD for gcc < 4.7
 	ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
 		CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD=d
diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index b852fcfd7..9fc704193 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -62,6 +62,18 @@  endif
 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
 
+ifeq ($(CONFIG_RTE_ENABLE_LTO),y)
+# 'fat-lto' is used since pmdinfogen needs to have 'this_pmd_nameX'
+# exported in symbol table and without this option only internal
+# representation is present.
+TOOLCHAIN_CFLAGS += -flto -ffat-lto-objects
+TOOLCHAIN_LDFLAGS += -flto
+# workaround for GCC bug 81440
+ifeq ($(shell test $(GCC_VERSION) -lt 80 && echo 1), 1)
+WERROR_FLAGS += -Wno-lto-type-mismatch
+endif
+endif
+
 # workaround GCC bug with warning "missing initializer" for "= {0}"
 ifeq ($(shell test $(GCC_VERSION) -lt 47 && echo 1), 1)
 WERROR_FLAGS += -Wno-missing-field-initializers
diff --git a/mk/toolchain/icc/rte.vars.mk b/mk/toolchain/icc/rte.vars.mk
index aa1422bf1..8aa87aa1e 100644
--- a/mk/toolchain/icc/rte.vars.mk
+++ b/mk/toolchain/icc/rte.vars.mk
@@ -54,5 +54,13 @@  endif
 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
 
+ifeq ($(CONFIG_RTE_ENABLE_LTO),y)
+# 'fat-lto' is used since pmdinfogen needs to have 'this_pmd_nameX'
+# exported in symbol table and without this option only internal
+# representation is present.
+TOOLCHAIN_CFLAGS += -flto -ffat-lto-objects
+TOOLCHAIN_LDFLAGS += -flto
+endif
+
 export CC AS AR LD OBJCOPY OBJDUMP STRIP READELF
 export TOOLCHAIN_CFLAGS TOOLCHAIN_LDFLAGS TOOLCHAIN_ASFLAGS