[v2,00/71] replace use of fixed size rte_mempcy

Message ID 20240301171707.95242-1-stephen@networkplumber.org (mailing list archive)
Headers
Series replace use of fixed size rte_mempcy |

Message

Stephen Hemminger March 1, 2024, 5:14 p.m. UTC
  The DPDK has a lot of "cargo cult" usage of rte_memcpy.
This patch set replaces cases where rte_memcpy is used with a fixed
size constant size.

Typical example is:
	rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
which can be replaced with:
	memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);

This has two benefits. Gcc (and clang) are smart enough that for
all small fixed size values, they just generate the necessary instructions
to do it inline. It also means that fortify, Coverity, and ASAN
analyzers can check these memcpy's.

So faster, better, safer.

The first patch is a simple coccinelle script to do the replacement
and the rest are the results broken out by module.

The coccinelle script can be used again to make sure more bad
usage doesn't creep in with new drivers.

v2 - fix CI failure on some OS by adding string.h
     remove rte_memcpy.h if no longer used

Stephen Hemminger (71):
  cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy
  eal: replace use of fixed size rte_memcpy
  ethdev: replace use of fixed size rte_memcpy
  eventdev: replace use of fixed size rte_memcpy
  cryptodev: replace use of fixed size rte_memcpy
  ip_frag: replace use of fixed size rte_memcpy
  net: replace use of fixed size rte_memcpy
  lpm: replace use of fixed size rte_memcpy
  node: replace use of fixed size rte_memcpy
  pdcp: replace use of fixed size rte_memcpy
  pipeline: replace use of fixed size rte_memcpy
  rib: replace use of fixed size rte_memcpy
  security: replace use of fixed size rte_memcpy
  net/mlx5: replace use of fixed size rte_memcpy
  net/nfp: replace use of fixed size rte_memcpy
  net/ngbe: replace use of fixed size rte_memcpy
  net/null: replace use of fixed size rte_memcpy
  net/pcap: replace use of fixed size rte_memcpy
  net/sfc: replace use of fixed size rte_memcpy
  net/tap: replace use of fixed size rte_memcpy
  net/txgbe: replace use of fixed size rte_memcpy
  raw/ifpga: replace use of fixed size rte_memcpy
  raw/skeleton: replace use of fixed size rte_memcpy
  net/hns3: replace use of fixed size rte_memcpy
  net/i40e: replace use of fixed size rte_memcpy
  net/iavf: replace use of fixed size rte_memcpy
  net/ice: replace use of fixed size rte_memcpy
  net/idpf: replace use of fixed size rte_memcpy
  net/ipn3ke: replace use of fixed size rte_memcpy
  net/ixgbe: replace use of fixed size rte_memcpy
  net/memif: replace use of fixed size rte_memcpy
  net/qede: replace use of fixed size rte_memcpy
  baseband/acc: replace use of fixed size rte_memcpy
  baseband/la12xx: replace use of fixed size rte_memcpy
  common/idpf: replace use of fixed size rte_memcpy
  common/qat: replace use of fixed size rte_memcpy
  compress/qat: replace use of fixed size rte_memcpy
  crypto/ccp: replace use of fixed size rte_memcpy
  crypto/cnxk: replace use of fixed size rte_memcpy
  crypto/dpaa_sec: replace use of fixed size rte_memcpy
  crypto/ipsec_mb: replace use of fixed size rte_memcpy
  crypto/qat: replace use of fixed size rte_memcpy
  crypto/scheduler: replace use of fixed size rte_memcpy
  event/cnxk: replace use of fixed size rte_memcpy
  event/dlb2: replace use of fixed size rte_memcpy
  event/dpaa2: replace use of fixed size rte_memcpy
  event/octeontx: replace use of fixed size rte_memcpy
  mempool/dpaa: replace use of fixed size rte_memcpy
  mempool/dpaa2: replace use of fixed size rte_memcpy
  ml/cnxk: replace use of fixed size rte_memcpy
  net/af_xdp: replace use of fixed size rte_memcpy
  net/avp: replace use of fixed size rte_memcpy
  net/axgbe: replace use of fixed size rte_memcpy
  net/bnx2x: replace use of fixed size rte_memcpy
  net/bnxt: replace use of fixed size rte_memcpy
  net/bonding: replace use of fixed size rte_memcpy
  net/cnxk: replace use of fixed size rte_memcpy
  net/cpfl: replace use of fixed size rte_memcpy
  net/cxgbe: replace use of fixed size rte_memcpy
  net/dpaa2: replace use of fixed size rte_memcpy
  net/e1000: replace use of fixed size rte_memcpy
  net/enic: replace use of fixed size rte_memcpy
  net/failsafe: replace use of fixed size rte_memcpy
  net/gve/base: replace use of fixed size rte_memcpy
  net/hinic: replace use of fixed size rte_memcpy
  net/mvpp2: replace use of fixed size rte_memcpy
  app/test-pmd: replace use of fixed size rte_memcpy
  app/graph: replace use of fixed size rte_memcpy
  app/test-eventdev: replace use of fixed size rte_memcpy
  app/test: replace use of fixed size rte_memcpy
  examples: replace use of fixed size rte_memcpy

 app/graph/neigh.c                             |   8 +-
 app/test-eventdev/test_pipeline_common.c      |  19 ++-
 app/test-pmd/cmdline.c                        |  48 ++++----
 app/test-pmd/cmdline_flow.c                   |  24 ++--
 app/test-pmd/config.c                         |   8 +-
 app/test-pmd/csumonly.c                       |   1 -
 app/test-pmd/flowgen.c                        |   1 -
 app/test-pmd/iofwd.c                          |   1 -
 app/test-pmd/macfwd.c                         |   1 -
 app/test-pmd/macswap.c                        |   1 -
 app/test-pmd/noisy_vnf.c                      |   1 -
 app/test-pmd/rxonly.c                         |   1 -
 app/test-pmd/testpmd.c                        |   1 -
 app/test/commands.c                           |   1 -
 app/test/packet_burst_generator.c             |   4 +-
 app/test/test_crc.c                           |   5 +-
 app/test/test_cryptodev.c                     |  18 ++-
 app/test/test_cryptodev_asym.c                |   1 -
 app/test/test_cryptodev_security_pdcp.c       |   1 -
 app/test/test_efd.c                           |   1 -
 app/test/test_efd_perf.c                      |   1 -
 app/test/test_event_crypto_adapter.c          |  12 +-
 app/test/test_event_dma_adapter.c             |   4 +-
 app/test/test_eventdev.c                      |   1 -
 app/test/test_ipsec.c                         |   6 +-
 app/test/test_link_bonding_mode4.c            |   8 +-
 app/test/test_mbuf.c                          |   1 -
 app/test/test_member.c                        |   1 -
 app/test/test_member_perf.c                   |   1 -
 app/test/test_rawdev.c                        |   1 -
 app/test/test_security_inline_proto.c         |  36 +++---
 app/test/test_service_cores.c                 |   1 -
 app/test/virtual_pmd.c                        |   3 +-
 devtools/cocci/rte_memcpy.cocci               |  11 ++
 drivers/baseband/acc/rte_acc100_pmd.c         |  17 ++-
 drivers/baseband/acc/rte_vrb_pmd.c            |  21 ++--
 drivers/baseband/la12xx/bbdev_la12xx.c        |   4 +-
 drivers/common/idpf/idpf_common_device.c      |   4 +-
 drivers/common/idpf/idpf_common_virtchnl.c    |   8 +-
 drivers/common/qat/qat_qp.c                   |  10 +-
 drivers/compress/qat/qat_comp.c               |   8 +-
 drivers/crypto/ccp/ccp_crypto.c               |  14 +--
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
 drivers/crypto/cnxk/cnxk_se.h                 |   2 +-
 drivers/crypto/dpaa_sec/dpaa_sec.c            |   2 +-
 drivers/crypto/ipsec_mb/pmd_snow3g.c          |   4 +-
 drivers/crypto/qat/qat_sym_session.c          |  52 ++++-----
 .../scheduler/rte_cryptodev_scheduler.c       |   6 +-
 drivers/crypto/scheduler/scheduler_failover.c |  12 +-
 drivers/event/cnxk/cnxk_tim_evdev.c           |   4 +-
 drivers/event/dlb2/dlb2.c                     |   6 +-
 drivers/event/dpaa2/dpaa2_eventdev.c          |   6 +-
 drivers/event/octeontx/timvf_evdev.c          |   4 +-
 drivers/mempool/dpaa/dpaa_mempool.c           |   4 +-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |   4 +-
 drivers/ml/cnxk/cn10k_ml_model.c              |   8 +-
 drivers/ml/cnxk/cn10k_ml_ops.c                |  11 +-
 drivers/ml/cnxk/cnxk_ml_ops.c                 |   2 +-
 drivers/ml/cnxk/mvtvm_ml_model.c              |   8 +-
 drivers/ml/cnxk/mvtvm_ml_ops.c                |   8 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c           |   2 +-
 drivers/net/avp/avp_ethdev.c                  |   4 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
 drivers/net/bnx2x/bnx2x.c                     |  32 +++--
 drivers/net/bnx2x/bnx2x_stats.c               |  10 +-
 drivers/net/bnx2x/bnx2x_vfpf.c                |  19 +--
 drivers/net/bnxt/bnxt_flow.c                  |  34 +++---
 drivers/net/bonding/rte_eth_bond_8023ad.c     |   4 +-
 drivers/net/bonding/rte_eth_bond_flow.c       |   2 +-
 drivers/net/cnxk/cnxk_ethdev_ops.c            |   2 +-
 drivers/net/cnxk/cnxk_tm.c                    |   5 +-
 drivers/net/cpfl/cpfl_ethdev.c                |   3 +-
 drivers/net/cpfl/cpfl_vchnl.c                 |   4 +-
 drivers/net/cxgbe/clip_tbl.c                  |   2 +-
 drivers/net/cxgbe/cxgbe_filter.c              |   8 +-
 drivers/net/cxgbe/l2t.c                       |   4 +-
 drivers/net/cxgbe/smt.c                       |  20 ++--
 drivers/net/dpaa2/base/dpaa2_hw_dpni.c        |   1 -
 drivers/net/dpaa2/dpaa2_ethdev.c              |   1 -
 drivers/net/dpaa2/dpaa2_recycle.c             |   1 -
 drivers/net/dpaa2/dpaa2_rxtx.c                |   1 -
 drivers/net/dpaa2/dpaa2_sparser.c             |   1 -
 drivers/net/dpaa2/dpaa2_tm.c                  |   2 +-
 drivers/net/e1000/em_rxtx.c                   |   1 -
 drivers/net/e1000/igb_flow.c                  |  22 ++--
 drivers/net/e1000/igb_pf.c                    |   7 +-
 drivers/net/e1000/igb_rxtx.c                  |   1 -
 drivers/net/enic/enic_main.c                  |   8 +-
 drivers/net/failsafe/failsafe_ops.c           |   6 +-
 drivers/net/gve/base/gve_adminq.c             |   2 +-
 drivers/net/hinic/hinic_pmd_flow.c            |  40 +++----
 drivers/net/hns3/hns3_fdir.c                  |   2 +-
 drivers/net/hns3/hns3_flow.c                  |   4 +-
 drivers/net/i40e/i40e_ethdev.c                | 109 ++++++++----------
 drivers/net/i40e/i40e_fdir.c                  |  28 +++--
 drivers/net/i40e/i40e_flow.c                  |  56 +++++----
 drivers/net/i40e/i40e_pf.c                    |   3 +-
 drivers/net/i40e/i40e_tm.c                    |  11 +-
 drivers/net/i40e/rte_pmd_i40e.c               |  34 +++---
 drivers/net/iavf/iavf_fdir.c                  |  93 +++++++--------
 drivers/net/iavf/iavf_fsub.c                  |  50 ++++----
 drivers/net/iavf/iavf_generic_flow.c          |   2 +-
 drivers/net/iavf/iavf_tm.c                    |  11 +-
 drivers/net/iavf/iavf_vchnl.c                 |   9 +-
 drivers/net/ice/ice_dcf.c                     |   5 +-
 drivers/net/ice/ice_dcf_parent.c              |   2 +-
 drivers/net/ice/ice_dcf_sched.c               |  11 +-
 drivers/net/ice/ice_diagnose.c                |   4 +-
 drivers/net/ice/ice_ethdev.c                  |  14 +--
 drivers/net/ice/ice_fdir_filter.c             |  37 +++---
 drivers/net/ice/ice_generic_flow.c            |   2 +-
 drivers/net/ice/ice_hash.c                    |   2 +-
 drivers/net/ice/ice_tm.c                      |  11 +-
 drivers/net/idpf/idpf_ethdev.c                |   7 +-
 drivers/net/idpf/idpf_rxtx.c                  |  10 +-
 drivers/net/ipn3ke/ipn3ke_flow.c              |  32 +++--
 drivers/net/ipn3ke/ipn3ke_representor.c       |  16 +--
 drivers/net/ipn3ke/ipn3ke_tm.c                |   6 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   9 +-
 drivers/net/ixgbe/ixgbe_fdir.c                |   7 +-
 drivers/net/ixgbe/ixgbe_flow.c                |  65 +++++------
 drivers/net/ixgbe/ixgbe_ipsec.c               |   8 +-
 drivers/net/ixgbe/ixgbe_pf.c                  |   5 +-
 drivers/net/ixgbe/ixgbe_tm.c                  |  11 +-
 drivers/net/ixgbe/rte_pmd_ixgbe.c             |   4 +-
 drivers/net/memif/memif_socket.c              |   4 +-
 drivers/net/mlx5/mlx5_devx.c                  |   4 +-
 drivers/net/mlx5/mlx5_flow.c                  |  38 +++---
 drivers/net/mlx5/mlx5_flow_aso.c              |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c               |  16 +--
 drivers/net/mlx5/mlx5_rx.c                    |   6 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c              |   8 +-
 drivers/net/mvpp2/mrvl_tm.c                   |   2 +-
 drivers/net/nfp/flower/nfp_conntrack.c        |   2 +-
 drivers/net/nfp/flower/nfp_flower_flow.c      |  16 +--
 .../net/nfp/flower/nfp_flower_representor.c   |   2 +-
 drivers/net/nfp/nfp_mtr.c                     |  10 +-
 drivers/net/ngbe/ngbe_pf.c                    |   4 +-
 drivers/net/null/rte_eth_null.c               |   6 +-
 drivers/net/pcap/pcap_ethdev.c                |   2 +-
 drivers/net/pcap/pcap_osdep_freebsd.c         |   2 +-
 drivers/net/pcap/pcap_osdep_linux.c           |   2 +-
 drivers/net/qede/qede_main.c                  |   2 +-
 drivers/net/sfc/sfc.c                         |   2 +-
 drivers/net/sfc/sfc_ef10_tx.c                 |   2 +-
 drivers/net/sfc/sfc_ethdev.c                  |  11 +-
 drivers/net/sfc/sfc_flow.c                    |  20 ++--
 drivers/net/sfc/sfc_flow_rss.c                |   2 +-
 drivers/net/sfc/sfc_mae.c                     |   2 +-
 drivers/net/sfc/sfc_rx.c                      |   2 +-
 drivers/net/sfc/sfc_tso.c                     |   2 +-
 drivers/net/sfc/sfc_tso.h                     |   9 +-
 drivers/net/tap/rte_eth_tap.c                 |  14 +--
 drivers/net/txgbe/txgbe_ethdev.c              |   9 +-
 drivers/net/txgbe/txgbe_fdir.c                |   6 +-
 drivers/net/txgbe/txgbe_flow.c                |  65 +++++------
 drivers/net/txgbe/txgbe_ipsec.c               |   8 +-
 drivers/net/txgbe/txgbe_pf.c                  |   4 +-
 drivers/net/txgbe/txgbe_tm.c                  |  11 +-
 drivers/raw/ifpga/afu_pmd_he_hssi.c           |   3 +-
 drivers/raw/ifpga/afu_pmd_he_lpbk.c           |   3 +-
 drivers/raw/ifpga/afu_pmd_he_mem.c            |   3 +-
 drivers/raw/ifpga/afu_pmd_n3000.c             |   8 +-
 drivers/raw/ifpga/ifpga_rawdev.c              |  11 +-
 drivers/raw/skeleton/skeleton_rawdev.c        |   7 +-
 examples/bbdev_app/main.c                     |   2 +-
 examples/l2fwd-cat/cat.c                      |   4 +-
 examples/ptpclient/ptpclient.c                |  11 +-
 examples/vhost/main.c                         |   5 +-
 examples/vmdq/main.c                          |   6 +-
 examples/vmdq_dcb/main.c                      |  15 +--
 lib/cryptodev/rte_cryptodev.c                 |   2 +-
 lib/eal/common/eal_common_options.c           |   7 +-
 lib/ethdev/rte_ethdev.c                       |   3 +-
 lib/ethdev/rte_flow.c                         |   5 +-
 lib/eventdev/rte_event_crypto_adapter.c       |   2 +-
 lib/eventdev/rte_event_dma_adapter.c          |   4 +-
 lib/eventdev/rte_event_timer_adapter.c        |   2 +-
 lib/fib/trie.c                                |   2 +-
 lib/ip_frag/rte_ipv6_fragmentation.c          |   4 +-
 lib/ip_frag/rte_ipv6_reassembly.c             |   6 +-
 lib/lpm/rte_lpm6.c                            |   3 +-
 lib/net/rte_ether.c                           |   2 +-
 lib/node/ip6_lookup.c                         |   8 +-
 lib/pdcp/pdcp_process.c                       |  36 +++---
 lib/pipeline/rte_table_action.c               |   8 +-
 lib/rib/rte_rib6.h                            |   5 +-
 lib/security/rte_security.c                   |   4 +-
 188 files changed, 881 insertions(+), 998 deletions(-)
 create mode 100644 devtools/cocci/rte_memcpy.cocci
  

Comments

Tyler Retzlaff March 1, 2024, 5:50 p.m. UTC | #1
On Fri, Mar 01, 2024 at 09:14:56AM -0800, Stephen Hemminger wrote:
> The DPDK has a lot of "cargo cult" usage of rte_memcpy.
> This patch set replaces cases where rte_memcpy is used with a fixed
> size constant size.
> 
> Typical example is:
> 	rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
> which can be replaced with:
> 	memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
> 
> This has two benefits. Gcc (and clang) are smart enough that for
> all small fixed size values, they just generate the necessary instructions
> to do it inline. It also means that fortify, Coverity, and ASAN
> analyzers can check these memcpy's.
> 
> So faster, better, safer.
> 
> The first patch is a simple coccinelle script to do the replacement
> and the rest are the results broken out by module.
> 
> The coccinelle script can be used again to make sure more bad
> usage doesn't creep in with new drivers.
> 
> v2 - fix CI failure on some OS by adding string.h
>      remove rte_memcpy.h if no longer used
> 
> Stephen Hemminger (71):
>   cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy
>   eal: replace use of fixed size rte_memcpy
>   ethdev: replace use of fixed size rte_memcpy
>   eventdev: replace use of fixed size rte_memcpy
>   cryptodev: replace use of fixed size rte_memcpy
>   ip_frag: replace use of fixed size rte_memcpy
>   net: replace use of fixed size rte_memcpy
>   lpm: replace use of fixed size rte_memcpy
>   node: replace use of fixed size rte_memcpy
>   pdcp: replace use of fixed size rte_memcpy
>   pipeline: replace use of fixed size rte_memcpy
>   rib: replace use of fixed size rte_memcpy
>   security: replace use of fixed size rte_memcpy
>   net/mlx5: replace use of fixed size rte_memcpy
>   net/nfp: replace use of fixed size rte_memcpy
>   net/ngbe: replace use of fixed size rte_memcpy
>   net/null: replace use of fixed size rte_memcpy
>   net/pcap: replace use of fixed size rte_memcpy
>   net/sfc: replace use of fixed size rte_memcpy
>   net/tap: replace use of fixed size rte_memcpy
>   net/txgbe: replace use of fixed size rte_memcpy
>   raw/ifpga: replace use of fixed size rte_memcpy
>   raw/skeleton: replace use of fixed size rte_memcpy
>   net/hns3: replace use of fixed size rte_memcpy
>   net/i40e: replace use of fixed size rte_memcpy
>   net/iavf: replace use of fixed size rte_memcpy
>   net/ice: replace use of fixed size rte_memcpy
>   net/idpf: replace use of fixed size rte_memcpy
>   net/ipn3ke: replace use of fixed size rte_memcpy
>   net/ixgbe: replace use of fixed size rte_memcpy
>   net/memif: replace use of fixed size rte_memcpy
>   net/qede: replace use of fixed size rte_memcpy
>   baseband/acc: replace use of fixed size rte_memcpy
>   baseband/la12xx: replace use of fixed size rte_memcpy
>   common/idpf: replace use of fixed size rte_memcpy
>   common/qat: replace use of fixed size rte_memcpy
>   compress/qat: replace use of fixed size rte_memcpy
>   crypto/ccp: replace use of fixed size rte_memcpy
>   crypto/cnxk: replace use of fixed size rte_memcpy
>   crypto/dpaa_sec: replace use of fixed size rte_memcpy
>   crypto/ipsec_mb: replace use of fixed size rte_memcpy
>   crypto/qat: replace use of fixed size rte_memcpy
>   crypto/scheduler: replace use of fixed size rte_memcpy
>   event/cnxk: replace use of fixed size rte_memcpy
>   event/dlb2: replace use of fixed size rte_memcpy
>   event/dpaa2: replace use of fixed size rte_memcpy
>   event/octeontx: replace use of fixed size rte_memcpy
>   mempool/dpaa: replace use of fixed size rte_memcpy
>   mempool/dpaa2: replace use of fixed size rte_memcpy
>   ml/cnxk: replace use of fixed size rte_memcpy
>   net/af_xdp: replace use of fixed size rte_memcpy
>   net/avp: replace use of fixed size rte_memcpy
>   net/axgbe: replace use of fixed size rte_memcpy
>   net/bnx2x: replace use of fixed size rte_memcpy
>   net/bnxt: replace use of fixed size rte_memcpy
>   net/bonding: replace use of fixed size rte_memcpy
>   net/cnxk: replace use of fixed size rte_memcpy
>   net/cpfl: replace use of fixed size rte_memcpy
>   net/cxgbe: replace use of fixed size rte_memcpy
>   net/dpaa2: replace use of fixed size rte_memcpy
>   net/e1000: replace use of fixed size rte_memcpy
>   net/enic: replace use of fixed size rte_memcpy
>   net/failsafe: replace use of fixed size rte_memcpy
>   net/gve/base: replace use of fixed size rte_memcpy
>   net/hinic: replace use of fixed size rte_memcpy
>   net/mvpp2: replace use of fixed size rte_memcpy
>   app/test-pmd: replace use of fixed size rte_memcpy
>   app/graph: replace use of fixed size rte_memcpy
>   app/test-eventdev: replace use of fixed size rte_memcpy
>   app/test: replace use of fixed size rte_memcpy
>   examples: replace use of fixed size rte_memcpy
> 
>  app/graph/neigh.c                             |   8 +-
>  app/test-eventdev/test_pipeline_common.c      |  19 ++-
>  app/test-pmd/cmdline.c                        |  48 ++++----
>  app/test-pmd/cmdline_flow.c                   |  24 ++--
>  app/test-pmd/config.c                         |   8 +-
>  app/test-pmd/csumonly.c                       |   1 -
>  app/test-pmd/flowgen.c                        |   1 -
>  app/test-pmd/iofwd.c                          |   1 -
>  app/test-pmd/macfwd.c                         |   1 -
>  app/test-pmd/macswap.c                        |   1 -
>  app/test-pmd/noisy_vnf.c                      |   1 -
>  app/test-pmd/rxonly.c                         |   1 -
>  app/test-pmd/testpmd.c                        |   1 -
>  app/test/commands.c                           |   1 -
>  app/test/packet_burst_generator.c             |   4 +-
>  app/test/test_crc.c                           |   5 +-
>  app/test/test_cryptodev.c                     |  18 ++-
>  app/test/test_cryptodev_asym.c                |   1 -
>  app/test/test_cryptodev_security_pdcp.c       |   1 -
>  app/test/test_efd.c                           |   1 -
>  app/test/test_efd_perf.c                      |   1 -
>  app/test/test_event_crypto_adapter.c          |  12 +-
>  app/test/test_event_dma_adapter.c             |   4 +-
>  app/test/test_eventdev.c                      |   1 -
>  app/test/test_ipsec.c                         |   6 +-
>  app/test/test_link_bonding_mode4.c            |   8 +-
>  app/test/test_mbuf.c                          |   1 -
>  app/test/test_member.c                        |   1 -
>  app/test/test_member_perf.c                   |   1 -
>  app/test/test_rawdev.c                        |   1 -
>  app/test/test_security_inline_proto.c         |  36 +++---
>  app/test/test_service_cores.c                 |   1 -
>  app/test/virtual_pmd.c                        |   3 +-
>  devtools/cocci/rte_memcpy.cocci               |  11 ++
>  drivers/baseband/acc/rte_acc100_pmd.c         |  17 ++-
>  drivers/baseband/acc/rte_vrb_pmd.c            |  21 ++--
>  drivers/baseband/la12xx/bbdev_la12xx.c        |   4 +-
>  drivers/common/idpf/idpf_common_device.c      |   4 +-
>  drivers/common/idpf/idpf_common_virtchnl.c    |   8 +-
>  drivers/common/qat/qat_qp.c                   |  10 +-
>  drivers/compress/qat/qat_comp.c               |   8 +-
>  drivers/crypto/ccp/ccp_crypto.c               |  14 +--
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/cnxk/cnxk_se.h                 |   2 +-
>  drivers/crypto/dpaa_sec/dpaa_sec.c            |   2 +-
>  drivers/crypto/ipsec_mb/pmd_snow3g.c          |   4 +-
>  drivers/crypto/qat/qat_sym_session.c          |  52 ++++-----
>  .../scheduler/rte_cryptodev_scheduler.c       |   6 +-
>  drivers/crypto/scheduler/scheduler_failover.c |  12 +-
>  drivers/event/cnxk/cnxk_tim_evdev.c           |   4 +-
>  drivers/event/dlb2/dlb2.c                     |   6 +-
>  drivers/event/dpaa2/dpaa2_eventdev.c          |   6 +-
>  drivers/event/octeontx/timvf_evdev.c          |   4 +-
>  drivers/mempool/dpaa/dpaa_mempool.c           |   4 +-
>  drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |   4 +-
>  drivers/ml/cnxk/cn10k_ml_model.c              |   8 +-
>  drivers/ml/cnxk/cn10k_ml_ops.c                |  11 +-
>  drivers/ml/cnxk/cnxk_ml_ops.c                 |   2 +-
>  drivers/ml/cnxk/mvtvm_ml_model.c              |   8 +-
>  drivers/ml/cnxk/mvtvm_ml_ops.c                |   8 +-
>  drivers/net/af_xdp/rte_eth_af_xdp.c           |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |   4 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x.c                     |  32 +++--
>  drivers/net/bnx2x/bnx2x_stats.c               |  10 +-
>  drivers/net/bnx2x/bnx2x_vfpf.c                |  19 +--
>  drivers/net/bnxt/bnxt_flow.c                  |  34 +++---
>  drivers/net/bonding/rte_eth_bond_8023ad.c     |   4 +-
>  drivers/net/bonding/rte_eth_bond_flow.c       |   2 +-
>  drivers/net/cnxk/cnxk_ethdev_ops.c            |   2 +-
>  drivers/net/cnxk/cnxk_tm.c                    |   5 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   3 +-
>  drivers/net/cpfl/cpfl_vchnl.c                 |   4 +-
>  drivers/net/cxgbe/clip_tbl.c                  |   2 +-
>  drivers/net/cxgbe/cxgbe_filter.c              |   8 +-
>  drivers/net/cxgbe/l2t.c                       |   4 +-
>  drivers/net/cxgbe/smt.c                       |  20 ++--
>  drivers/net/dpaa2/base/dpaa2_hw_dpni.c        |   1 -
>  drivers/net/dpaa2/dpaa2_ethdev.c              |   1 -
>  drivers/net/dpaa2/dpaa2_recycle.c             |   1 -
>  drivers/net/dpaa2/dpaa2_rxtx.c                |   1 -
>  drivers/net/dpaa2/dpaa2_sparser.c             |   1 -
>  drivers/net/dpaa2/dpaa2_tm.c                  |   2 +-
>  drivers/net/e1000/em_rxtx.c                   |   1 -
>  drivers/net/e1000/igb_flow.c                  |  22 ++--
>  drivers/net/e1000/igb_pf.c                    |   7 +-
>  drivers/net/e1000/igb_rxtx.c                  |   1 -
>  drivers/net/enic/enic_main.c                  |   8 +-
>  drivers/net/failsafe/failsafe_ops.c           |   6 +-
>  drivers/net/gve/base/gve_adminq.c             |   2 +-
>  drivers/net/hinic/hinic_pmd_flow.c            |  40 +++----
>  drivers/net/hns3/hns3_fdir.c                  |   2 +-
>  drivers/net/hns3/hns3_flow.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                | 109 ++++++++----------
>  drivers/net/i40e/i40e_fdir.c                  |  28 +++--
>  drivers/net/i40e/i40e_flow.c                  |  56 +++++----
>  drivers/net/i40e/i40e_pf.c                    |   3 +-
>  drivers/net/i40e/i40e_tm.c                    |  11 +-
>  drivers/net/i40e/rte_pmd_i40e.c               |  34 +++---
>  drivers/net/iavf/iavf_fdir.c                  |  93 +++++++--------
>  drivers/net/iavf/iavf_fsub.c                  |  50 ++++----
>  drivers/net/iavf/iavf_generic_flow.c          |   2 +-
>  drivers/net/iavf/iavf_tm.c                    |  11 +-
>  drivers/net/iavf/iavf_vchnl.c                 |   9 +-
>  drivers/net/ice/ice_dcf.c                     |   5 +-
>  drivers/net/ice/ice_dcf_parent.c              |   2 +-
>  drivers/net/ice/ice_dcf_sched.c               |  11 +-
>  drivers/net/ice/ice_diagnose.c                |   4 +-
>  drivers/net/ice/ice_ethdev.c                  |  14 +--
>  drivers/net/ice/ice_fdir_filter.c             |  37 +++---
>  drivers/net/ice/ice_generic_flow.c            |   2 +-
>  drivers/net/ice/ice_hash.c                    |   2 +-
>  drivers/net/ice/ice_tm.c                      |  11 +-
>  drivers/net/idpf/idpf_ethdev.c                |   7 +-
>  drivers/net/idpf/idpf_rxtx.c                  |  10 +-
>  drivers/net/ipn3ke/ipn3ke_flow.c              |  32 +++--
>  drivers/net/ipn3ke/ipn3ke_representor.c       |  16 +--
>  drivers/net/ipn3ke/ipn3ke_tm.c                |   6 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   9 +-
>  drivers/net/ixgbe/ixgbe_fdir.c                |   7 +-
>  drivers/net/ixgbe/ixgbe_flow.c                |  65 +++++------
>  drivers/net/ixgbe/ixgbe_ipsec.c               |   8 +-
>  drivers/net/ixgbe/ixgbe_pf.c                  |   5 +-
>  drivers/net/ixgbe/ixgbe_tm.c                  |  11 +-
>  drivers/net/ixgbe/rte_pmd_ixgbe.c             |   4 +-
>  drivers/net/memif/memif_socket.c              |   4 +-
>  drivers/net/mlx5/mlx5_devx.c                  |   4 +-
>  drivers/net/mlx5/mlx5_flow.c                  |  38 +++---
>  drivers/net/mlx5/mlx5_flow_aso.c              |   6 +-
>  drivers/net/mlx5/mlx5_flow_hw.c               |  16 +--
>  drivers/net/mlx5/mlx5_rx.c                    |   6 +-
>  drivers/net/mlx5/mlx5_rxtx_vec.c              |   8 +-
>  drivers/net/mvpp2/mrvl_tm.c                   |   2 +-
>  drivers/net/nfp/flower/nfp_conntrack.c        |   2 +-
>  drivers/net/nfp/flower/nfp_flower_flow.c      |  16 +--
>  .../net/nfp/flower/nfp_flower_representor.c   |   2 +-
>  drivers/net/nfp/nfp_mtr.c                     |  10 +-
>  drivers/net/ngbe/ngbe_pf.c                    |   4 +-
>  drivers/net/null/rte_eth_null.c               |   6 +-
>  drivers/net/pcap/pcap_ethdev.c                |   2 +-
>  drivers/net/pcap/pcap_osdep_freebsd.c         |   2 +-
>  drivers/net/pcap/pcap_osdep_linux.c           |   2 +-
>  drivers/net/qede/qede_main.c                  |   2 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/sfc/sfc_ef10_tx.c                 |   2 +-
>  drivers/net/sfc/sfc_ethdev.c                  |  11 +-
>  drivers/net/sfc/sfc_flow.c                    |  20 ++--
>  drivers/net/sfc/sfc_flow_rss.c                |   2 +-
>  drivers/net/sfc/sfc_mae.c                     |   2 +-
>  drivers/net/sfc/sfc_rx.c                      |   2 +-
>  drivers/net/sfc/sfc_tso.c                     |   2 +-
>  drivers/net/sfc/sfc_tso.h                     |   9 +-
>  drivers/net/tap/rte_eth_tap.c                 |  14 +--
>  drivers/net/txgbe/txgbe_ethdev.c              |   9 +-
>  drivers/net/txgbe/txgbe_fdir.c                |   6 +-
>  drivers/net/txgbe/txgbe_flow.c                |  65 +++++------
>  drivers/net/txgbe/txgbe_ipsec.c               |   8 +-
>  drivers/net/txgbe/txgbe_pf.c                  |   4 +-
>  drivers/net/txgbe/txgbe_tm.c                  |  11 +-
>  drivers/raw/ifpga/afu_pmd_he_hssi.c           |   3 +-
>  drivers/raw/ifpga/afu_pmd_he_lpbk.c           |   3 +-
>  drivers/raw/ifpga/afu_pmd_he_mem.c            |   3 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   8 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |  11 +-
>  drivers/raw/skeleton/skeleton_rawdev.c        |   7 +-
>  examples/bbdev_app/main.c                     |   2 +-
>  examples/l2fwd-cat/cat.c                      |   4 +-
>  examples/ptpclient/ptpclient.c                |  11 +-
>  examples/vhost/main.c                         |   5 +-
>  examples/vmdq/main.c                          |   6 +-
>  examples/vmdq_dcb/main.c                      |  15 +--
>  lib/cryptodev/rte_cryptodev.c                 |   2 +-
>  lib/eal/common/eal_common_options.c           |   7 +-
>  lib/ethdev/rte_ethdev.c                       |   3 +-
>  lib/ethdev/rte_flow.c                         |   5 +-
>  lib/eventdev/rte_event_crypto_adapter.c       |   2 +-
>  lib/eventdev/rte_event_dma_adapter.c          |   4 +-
>  lib/eventdev/rte_event_timer_adapter.c        |   2 +-
>  lib/fib/trie.c                                |   2 +-
>  lib/ip_frag/rte_ipv6_fragmentation.c          |   4 +-
>  lib/ip_frag/rte_ipv6_reassembly.c             |   6 +-
>  lib/lpm/rte_lpm6.c                            |   3 +-
>  lib/net/rte_ether.c                           |   2 +-
>  lib/node/ip6_lookup.c                         |   8 +-
>  lib/pdcp/pdcp_process.c                       |  36 +++---
>  lib/pipeline/rte_table_action.c               |   8 +-
>  lib/rib/rte_rib6.h                            |   5 +-
>  lib/security/rte_security.c                   |   4 +-
>  188 files changed, 881 insertions(+), 998 deletions(-)
>  create mode 100644 devtools/cocci/rte_memcpy.cocci
> 
> -- 

Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

> 2.43.0
  
Mattias Rönnblom March 2, 2024, 11:14 a.m. UTC | #2
On 2024-03-01 18:14, Stephen Hemminger wrote:
> The DPDK has a lot of "cargo cult" usage of rte_memcpy.
> This patch set replaces cases where rte_memcpy is used with a fixed
> size constant size.
> 
> Typical example is:
> 	rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
> which can be replaced with:
> 	memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
> 
> This has two benefits. Gcc (and clang) are smart enough that for
> all small fixed size values, they just generate the necessary instructions
> to do it inline. It also means that fortify, Coverity, and ASAN
> analyzers can check these memcpy's.
> 

Instead of smearing out the knowledge of when to use rte_memcpy(), and 
when to use memcpy() across the code base, wouldn't it be better to 
*always* call rte_memcpy() in the fast path, and leave the policy 
decision to the rte_memcpy() implementation?

In rte_memcpy(), add:
if (__builtin_constant_p(n) && n < RTE_LIBC_MEMCPY_SIZE_THRESHOLD)
	memcpy(/../);
..or something to that effect.

Could you have a #ifdef for dumb static analysis tools? To make it look 
like you are always using memcpy()?

> So faster, better, safer.
> 

What is "faster" based on?

My experience with replacing rte_memcpy() with memcpy() (or vice versa) 
is mixed.

I've also tried just dropping the DPDK-custom memcpy() implementation 
altogether, and that caused a performance drop (in a particular app, on 
a particular compiler and CPU).

> The first patch is a simple coccinelle script to do the replacement
> and the rest are the results broken out by module.
> 
> The coccinelle script can be used again to make sure more bad
> usage doesn't creep in with new drivers.
> 
> v2 - fix CI failure on some OS by adding string.h
>       remove rte_memcpy.h if no longer used
> 
> Stephen Hemminger (71):
>    cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy
>    eal: replace use of fixed size rte_memcpy
>    ethdev: replace use of fixed size rte_memcpy
>    eventdev: replace use of fixed size rte_memcpy
>    cryptodev: replace use of fixed size rte_memcpy
>    ip_frag: replace use of fixed size rte_memcpy
>    net: replace use of fixed size rte_memcpy
>    lpm: replace use of fixed size rte_memcpy
>    node: replace use of fixed size rte_memcpy
>    pdcp: replace use of fixed size rte_memcpy
>    pipeline: replace use of fixed size rte_memcpy
>    rib: replace use of fixed size rte_memcpy
>    security: replace use of fixed size rte_memcpy
>    net/mlx5: replace use of fixed size rte_memcpy
>    net/nfp: replace use of fixed size rte_memcpy
>    net/ngbe: replace use of fixed size rte_memcpy
>    net/null: replace use of fixed size rte_memcpy
>    net/pcap: replace use of fixed size rte_memcpy
>    net/sfc: replace use of fixed size rte_memcpy
>    net/tap: replace use of fixed size rte_memcpy
>    net/txgbe: replace use of fixed size rte_memcpy
>    raw/ifpga: replace use of fixed size rte_memcpy
>    raw/skeleton: replace use of fixed size rte_memcpy
>    net/hns3: replace use of fixed size rte_memcpy
>    net/i40e: replace use of fixed size rte_memcpy
>    net/iavf: replace use of fixed size rte_memcpy
>    net/ice: replace use of fixed size rte_memcpy
>    net/idpf: replace use of fixed size rte_memcpy
>    net/ipn3ke: replace use of fixed size rte_memcpy
>    net/ixgbe: replace use of fixed size rte_memcpy
>    net/memif: replace use of fixed size rte_memcpy
>    net/qede: replace use of fixed size rte_memcpy
>    baseband/acc: replace use of fixed size rte_memcpy
>    baseband/la12xx: replace use of fixed size rte_memcpy
>    common/idpf: replace use of fixed size rte_memcpy
>    common/qat: replace use of fixed size rte_memcpy
>    compress/qat: replace use of fixed size rte_memcpy
>    crypto/ccp: replace use of fixed size rte_memcpy
>    crypto/cnxk: replace use of fixed size rte_memcpy
>    crypto/dpaa_sec: replace use of fixed size rte_memcpy
>    crypto/ipsec_mb: replace use of fixed size rte_memcpy
>    crypto/qat: replace use of fixed size rte_memcpy
>    crypto/scheduler: replace use of fixed size rte_memcpy
>    event/cnxk: replace use of fixed size rte_memcpy
>    event/dlb2: replace use of fixed size rte_memcpy
>    event/dpaa2: replace use of fixed size rte_memcpy
>    event/octeontx: replace use of fixed size rte_memcpy
>    mempool/dpaa: replace use of fixed size rte_memcpy
>    mempool/dpaa2: replace use of fixed size rte_memcpy
>    ml/cnxk: replace use of fixed size rte_memcpy
>    net/af_xdp: replace use of fixed size rte_memcpy
>    net/avp: replace use of fixed size rte_memcpy
>    net/axgbe: replace use of fixed size rte_memcpy
>    net/bnx2x: replace use of fixed size rte_memcpy
>    net/bnxt: replace use of fixed size rte_memcpy
>    net/bonding: replace use of fixed size rte_memcpy
>    net/cnxk: replace use of fixed size rte_memcpy
>    net/cpfl: replace use of fixed size rte_memcpy
>    net/cxgbe: replace use of fixed size rte_memcpy
>    net/dpaa2: replace use of fixed size rte_memcpy
>    net/e1000: replace use of fixed size rte_memcpy
>    net/enic: replace use of fixed size rte_memcpy
>    net/failsafe: replace use of fixed size rte_memcpy
>    net/gve/base: replace use of fixed size rte_memcpy
>    net/hinic: replace use of fixed size rte_memcpy
>    net/mvpp2: replace use of fixed size rte_memcpy
>    app/test-pmd: replace use of fixed size rte_memcpy
>    app/graph: replace use of fixed size rte_memcpy
>    app/test-eventdev: replace use of fixed size rte_memcpy
>    app/test: replace use of fixed size rte_memcpy
>    examples: replace use of fixed size rte_memcpy
> 
>   app/graph/neigh.c                             |   8 +-
>   app/test-eventdev/test_pipeline_common.c      |  19 ++-
>   app/test-pmd/cmdline.c                        |  48 ++++----
>   app/test-pmd/cmdline_flow.c                   |  24 ++--
>   app/test-pmd/config.c                         |   8 +-
>   app/test-pmd/csumonly.c                       |   1 -
>   app/test-pmd/flowgen.c                        |   1 -
>   app/test-pmd/iofwd.c                          |   1 -
>   app/test-pmd/macfwd.c                         |   1 -
>   app/test-pmd/macswap.c                        |   1 -
>   app/test-pmd/noisy_vnf.c                      |   1 -
>   app/test-pmd/rxonly.c                         |   1 -
>   app/test-pmd/testpmd.c                        |   1 -
>   app/test/commands.c                           |   1 -
>   app/test/packet_burst_generator.c             |   4 +-
>   app/test/test_crc.c                           |   5 +-
>   app/test/test_cryptodev.c                     |  18 ++-
>   app/test/test_cryptodev_asym.c                |   1 -
>   app/test/test_cryptodev_security_pdcp.c       |   1 -
>   app/test/test_efd.c                           |   1 -
>   app/test/test_efd_perf.c                      |   1 -
>   app/test/test_event_crypto_adapter.c          |  12 +-
>   app/test/test_event_dma_adapter.c             |   4 +-
>   app/test/test_eventdev.c                      |   1 -
>   app/test/test_ipsec.c                         |   6 +-
>   app/test/test_link_bonding_mode4.c            |   8 +-
>   app/test/test_mbuf.c                          |   1 -
>   app/test/test_member.c                        |   1 -
>   app/test/test_member_perf.c                   |   1 -
>   app/test/test_rawdev.c                        |   1 -
>   app/test/test_security_inline_proto.c         |  36 +++---
>   app/test/test_service_cores.c                 |   1 -
>   app/test/virtual_pmd.c                        |   3 +-
>   devtools/cocci/rte_memcpy.cocci               |  11 ++
>   drivers/baseband/acc/rte_acc100_pmd.c         |  17 ++-
>   drivers/baseband/acc/rte_vrb_pmd.c            |  21 ++--
>   drivers/baseband/la12xx/bbdev_la12xx.c        |   4 +-
>   drivers/common/idpf/idpf_common_device.c      |   4 +-
>   drivers/common/idpf/idpf_common_virtchnl.c    |   8 +-
>   drivers/common/qat/qat_qp.c                   |  10 +-
>   drivers/compress/qat/qat_comp.c               |   8 +-
>   drivers/crypto/ccp/ccp_crypto.c               |  14 +--
>   drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>   drivers/crypto/cnxk/cnxk_se.h                 |   2 +-
>   drivers/crypto/dpaa_sec/dpaa_sec.c            |   2 +-
>   drivers/crypto/ipsec_mb/pmd_snow3g.c          |   4 +-
>   drivers/crypto/qat/qat_sym_session.c          |  52 ++++-----
>   .../scheduler/rte_cryptodev_scheduler.c       |   6 +-
>   drivers/crypto/scheduler/scheduler_failover.c |  12 +-
>   drivers/event/cnxk/cnxk_tim_evdev.c           |   4 +-
>   drivers/event/dlb2/dlb2.c                     |   6 +-
>   drivers/event/dpaa2/dpaa2_eventdev.c          |   6 +-
>   drivers/event/octeontx/timvf_evdev.c          |   4 +-
>   drivers/mempool/dpaa/dpaa_mempool.c           |   4 +-
>   drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |   4 +-
>   drivers/ml/cnxk/cn10k_ml_model.c              |   8 +-
>   drivers/ml/cnxk/cn10k_ml_ops.c                |  11 +-
>   drivers/ml/cnxk/cnxk_ml_ops.c                 |   2 +-
>   drivers/ml/cnxk/mvtvm_ml_model.c              |   8 +-
>   drivers/ml/cnxk/mvtvm_ml_ops.c                |   8 +-
>   drivers/net/af_xdp/rte_eth_af_xdp.c           |   2 +-
>   drivers/net/avp/avp_ethdev.c                  |   4 +-
>   drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>   drivers/net/bnx2x/bnx2x.c                     |  32 +++--
>   drivers/net/bnx2x/bnx2x_stats.c               |  10 +-
>   drivers/net/bnx2x/bnx2x_vfpf.c                |  19 +--
>   drivers/net/bnxt/bnxt_flow.c                  |  34 +++---
>   drivers/net/bonding/rte_eth_bond_8023ad.c     |   4 +-
>   drivers/net/bonding/rte_eth_bond_flow.c       |   2 +-
>   drivers/net/cnxk/cnxk_ethdev_ops.c            |   2 +-
>   drivers/net/cnxk/cnxk_tm.c                    |   5 +-
>   drivers/net/cpfl/cpfl_ethdev.c                |   3 +-
>   drivers/net/cpfl/cpfl_vchnl.c                 |   4 +-
>   drivers/net/cxgbe/clip_tbl.c                  |   2 +-
>   drivers/net/cxgbe/cxgbe_filter.c              |   8 +-
>   drivers/net/cxgbe/l2t.c                       |   4 +-
>   drivers/net/cxgbe/smt.c                       |  20 ++--
>   drivers/net/dpaa2/base/dpaa2_hw_dpni.c        |   1 -
>   drivers/net/dpaa2/dpaa2_ethdev.c              |   1 -
>   drivers/net/dpaa2/dpaa2_recycle.c             |   1 -
>   drivers/net/dpaa2/dpaa2_rxtx.c                |   1 -
>   drivers/net/dpaa2/dpaa2_sparser.c             |   1 -
>   drivers/net/dpaa2/dpaa2_tm.c                  |   2 +-
>   drivers/net/e1000/em_rxtx.c                   |   1 -
>   drivers/net/e1000/igb_flow.c                  |  22 ++--
>   drivers/net/e1000/igb_pf.c                    |   7 +-
>   drivers/net/e1000/igb_rxtx.c                  |   1 -
>   drivers/net/enic/enic_main.c                  |   8 +-
>   drivers/net/failsafe/failsafe_ops.c           |   6 +-
>   drivers/net/gve/base/gve_adminq.c             |   2 +-
>   drivers/net/hinic/hinic_pmd_flow.c            |  40 +++----
>   drivers/net/hns3/hns3_fdir.c                  |   2 +-
>   drivers/net/hns3/hns3_flow.c                  |   4 +-
>   drivers/net/i40e/i40e_ethdev.c                | 109 ++++++++----------
>   drivers/net/i40e/i40e_fdir.c                  |  28 +++--
>   drivers/net/i40e/i40e_flow.c                  |  56 +++++----
>   drivers/net/i40e/i40e_pf.c                    |   3 +-
>   drivers/net/i40e/i40e_tm.c                    |  11 +-
>   drivers/net/i40e/rte_pmd_i40e.c               |  34 +++---
>   drivers/net/iavf/iavf_fdir.c                  |  93 +++++++--------
>   drivers/net/iavf/iavf_fsub.c                  |  50 ++++----
>   drivers/net/iavf/iavf_generic_flow.c          |   2 +-
>   drivers/net/iavf/iavf_tm.c                    |  11 +-
>   drivers/net/iavf/iavf_vchnl.c                 |   9 +-
>   drivers/net/ice/ice_dcf.c                     |   5 +-
>   drivers/net/ice/ice_dcf_parent.c              |   2 +-
>   drivers/net/ice/ice_dcf_sched.c               |  11 +-
>   drivers/net/ice/ice_diagnose.c                |   4 +-
>   drivers/net/ice/ice_ethdev.c                  |  14 +--
>   drivers/net/ice/ice_fdir_filter.c             |  37 +++---
>   drivers/net/ice/ice_generic_flow.c            |   2 +-
>   drivers/net/ice/ice_hash.c                    |   2 +-
>   drivers/net/ice/ice_tm.c                      |  11 +-
>   drivers/net/idpf/idpf_ethdev.c                |   7 +-
>   drivers/net/idpf/idpf_rxtx.c                  |  10 +-
>   drivers/net/ipn3ke/ipn3ke_flow.c              |  32 +++--
>   drivers/net/ipn3ke/ipn3ke_representor.c       |  16 +--
>   drivers/net/ipn3ke/ipn3ke_tm.c                |   6 +-
>   drivers/net/ixgbe/ixgbe_ethdev.c              |   9 +-
>   drivers/net/ixgbe/ixgbe_fdir.c                |   7 +-
>   drivers/net/ixgbe/ixgbe_flow.c                |  65 +++++------
>   drivers/net/ixgbe/ixgbe_ipsec.c               |   8 +-
>   drivers/net/ixgbe/ixgbe_pf.c                  |   5 +-
>   drivers/net/ixgbe/ixgbe_tm.c                  |  11 +-
>   drivers/net/ixgbe/rte_pmd_ixgbe.c             |   4 +-
>   drivers/net/memif/memif_socket.c              |   4 +-
>   drivers/net/mlx5/mlx5_devx.c                  |   4 +-
>   drivers/net/mlx5/mlx5_flow.c                  |  38 +++---
>   drivers/net/mlx5/mlx5_flow_aso.c              |   6 +-
>   drivers/net/mlx5/mlx5_flow_hw.c               |  16 +--
>   drivers/net/mlx5/mlx5_rx.c                    |   6 +-
>   drivers/net/mlx5/mlx5_rxtx_vec.c              |   8 +-
>   drivers/net/mvpp2/mrvl_tm.c                   |   2 +-
>   drivers/net/nfp/flower/nfp_conntrack.c        |   2 +-
>   drivers/net/nfp/flower/nfp_flower_flow.c      |  16 +--
>   .../net/nfp/flower/nfp_flower_representor.c   |   2 +-
>   drivers/net/nfp/nfp_mtr.c                     |  10 +-
>   drivers/net/ngbe/ngbe_pf.c                    |   4 +-
>   drivers/net/null/rte_eth_null.c               |   6 +-
>   drivers/net/pcap/pcap_ethdev.c                |   2 +-
>   drivers/net/pcap/pcap_osdep_freebsd.c         |   2 +-
>   drivers/net/pcap/pcap_osdep_linux.c           |   2 +-
>   drivers/net/qede/qede_main.c                  |   2 +-
>   drivers/net/sfc/sfc.c                         |   2 +-
>   drivers/net/sfc/sfc_ef10_tx.c                 |   2 +-
>   drivers/net/sfc/sfc_ethdev.c                  |  11 +-
>   drivers/net/sfc/sfc_flow.c                    |  20 ++--
>   drivers/net/sfc/sfc_flow_rss.c                |   2 +-
>   drivers/net/sfc/sfc_mae.c                     |   2 +-
>   drivers/net/sfc/sfc_rx.c                      |   2 +-
>   drivers/net/sfc/sfc_tso.c                     |   2 +-
>   drivers/net/sfc/sfc_tso.h                     |   9 +-
>   drivers/net/tap/rte_eth_tap.c                 |  14 +--
>   drivers/net/txgbe/txgbe_ethdev.c              |   9 +-
>   drivers/net/txgbe/txgbe_fdir.c                |   6 +-
>   drivers/net/txgbe/txgbe_flow.c                |  65 +++++------
>   drivers/net/txgbe/txgbe_ipsec.c               |   8 +-
>   drivers/net/txgbe/txgbe_pf.c                  |   4 +-
>   drivers/net/txgbe/txgbe_tm.c                  |  11 +-
>   drivers/raw/ifpga/afu_pmd_he_hssi.c           |   3 +-
>   drivers/raw/ifpga/afu_pmd_he_lpbk.c           |   3 +-
>   drivers/raw/ifpga/afu_pmd_he_mem.c            |   3 +-
>   drivers/raw/ifpga/afu_pmd_n3000.c             |   8 +-
>   drivers/raw/ifpga/ifpga_rawdev.c              |  11 +-
>   drivers/raw/skeleton/skeleton_rawdev.c        |   7 +-
>   examples/bbdev_app/main.c                     |   2 +-
>   examples/l2fwd-cat/cat.c                      |   4 +-
>   examples/ptpclient/ptpclient.c                |  11 +-
>   examples/vhost/main.c                         |   5 +-
>   examples/vmdq/main.c                          |   6 +-
>   examples/vmdq_dcb/main.c                      |  15 +--
>   lib/cryptodev/rte_cryptodev.c                 |   2 +-
>   lib/eal/common/eal_common_options.c           |   7 +-
>   lib/ethdev/rte_ethdev.c                       |   3 +-
>   lib/ethdev/rte_flow.c                         |   5 +-
>   lib/eventdev/rte_event_crypto_adapter.c       |   2 +-
>   lib/eventdev/rte_event_dma_adapter.c          |   4 +-
>   lib/eventdev/rte_event_timer_adapter.c        |   2 +-
>   lib/fib/trie.c                                |   2 +-
>   lib/ip_frag/rte_ipv6_fragmentation.c          |   4 +-
>   lib/ip_frag/rte_ipv6_reassembly.c             |   6 +-
>   lib/lpm/rte_lpm6.c                            |   3 +-
>   lib/net/rte_ether.c                           |   2 +-
>   lib/node/ip6_lookup.c                         |   8 +-
>   lib/pdcp/pdcp_process.c                       |  36 +++---
>   lib/pipeline/rte_table_action.c               |   8 +-
>   lib/rib/rte_rib6.h                            |   5 +-
>   lib/security/rte_security.c                   |   4 +-
>   188 files changed, 881 insertions(+), 998 deletions(-)
>   create mode 100644 devtools/cocci/rte_memcpy.cocci
>
  
Mattias Rönnblom March 2, 2024, 12:01 p.m. UTC | #3
On 2024-03-02 12:14, Mattias Rönnblom wrote:
> On 2024-03-01 18:14, Stephen Hemminger wrote:
>> The DPDK has a lot of "cargo cult" usage of rte_memcpy.
>> This patch set replaces cases where rte_memcpy is used with a fixed
>> size constant size.
>>
>> Typical example is:
>>     rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
>> which can be replaced with:
>>     memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
>>
>> This has two benefits. Gcc (and clang) are smart enough that for
>> all small fixed size values, they just generate the necessary 
>> instructions
>> to do it inline. It also means that fortify, Coverity, and ASAN
>> analyzers can check these memcpy's.
>>
> 
> Instead of smearing out the knowledge of when to use rte_memcpy(), and 
> when to use memcpy() across the code base, wouldn't it be better to 
> *always* call rte_memcpy() in the fast path, and leave the policy 
> decision to the rte_memcpy() implementation?
> 
> In rte_memcpy(), add:
> if (__builtin_constant_p(n) && n < RTE_LIBC_MEMCPY_SIZE_THRESHOLD)
>      memcpy(/../);
> ..or something to that effect.
> 
> Could you have a #ifdef for dumb static analysis tools? To make it look 
> like you are always using memcpy()?
> 
>> So faster, better, safer.
>>
> 
> What is "faster" based on?
> 

I ran some DSW benchmarks, and if you add

diff --git a/lib/eal/x86/include/rte_memcpy.h 
b/lib/eal/x86/include/rte_memcpy.h
index 72a92290e0..64cd82d78d 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src, 
size_t n)
  static __rte_always_inline void *
  rte_memcpy(void *dst, const void *src, size_t n)
  {
+       if (__builtin_constant_p(n) && n <= 32) {
+               memcpy(dst, src, n);
+               return dst;
+       }
+
         if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
                 return rte_memcpy_aligned(dst, src, n);
         else

...the overhead increases from roughly 48 core clock cycles/event to 59 
cc/event. The same for "n < 128". (I'm not sure what counts as "small" 
here.)

So something rte_memcpy() does for small and constant memory copies does 
make things go *significantly* faster, at least in certain cases.

(Linux, GCC 11.4, Intel Gracemont.)

> My experience with replacing rte_memcpy() with memcpy() (or vice versa) 
> is mixed.
> 
> I've also tried just dropping the DPDK-custom memcpy() implementation 
> altogether, and that caused a performance drop (in a particular app, on 
> a particular compiler and CPU).
> 
>> The first patch is a simple coccinelle script to do the replacement
>> and the rest are the results broken out by module.
>>
>> The coccinelle script can be used again to make sure more bad
>> usage doesn't creep in with new drivers.
>>
>> v2 - fix CI failure on some OS by adding string.h
>>       remove rte_memcpy.h if no longer used
>>
>> Stephen Hemminger (71):
>>    cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy
>>    eal: replace use of fixed size rte_memcpy
>>    ethdev: replace use of fixed size rte_memcpy
>>    eventdev: replace use of fixed size rte_memcpy
>>    cryptodev: replace use of fixed size rte_memcpy
>>    ip_frag: replace use of fixed size rte_memcpy
>>    net: replace use of fixed size rte_memcpy
>>    lpm: replace use of fixed size rte_memcpy
>>    node: replace use of fixed size rte_memcpy
>>    pdcp: replace use of fixed size rte_memcpy
>>    pipeline: replace use of fixed size rte_memcpy
>>    rib: replace use of fixed size rte_memcpy
>>    security: replace use of fixed size rte_memcpy
>>    net/mlx5: replace use of fixed size rte_memcpy
>>    net/nfp: replace use of fixed size rte_memcpy
>>    net/ngbe: replace use of fixed size rte_memcpy
>>    net/null: replace use of fixed size rte_memcpy
>>    net/pcap: replace use of fixed size rte_memcpy
>>    net/sfc: replace use of fixed size rte_memcpy
>>    net/tap: replace use of fixed size rte_memcpy
>>    net/txgbe: replace use of fixed size rte_memcpy
>>    raw/ifpga: replace use of fixed size rte_memcpy
>>    raw/skeleton: replace use of fixed size rte_memcpy
>>    net/hns3: replace use of fixed size rte_memcpy
>>    net/i40e: replace use of fixed size rte_memcpy
>>    net/iavf: replace use of fixed size rte_memcpy
>>    net/ice: replace use of fixed size rte_memcpy
>>    net/idpf: replace use of fixed size rte_memcpy
>>    net/ipn3ke: replace use of fixed size rte_memcpy
>>    net/ixgbe: replace use of fixed size rte_memcpy
>>    net/memif: replace use of fixed size rte_memcpy
>>    net/qede: replace use of fixed size rte_memcpy
>>    baseband/acc: replace use of fixed size rte_memcpy
>>    baseband/la12xx: replace use of fixed size rte_memcpy
>>    common/idpf: replace use of fixed size rte_memcpy
>>    common/qat: replace use of fixed size rte_memcpy
>>    compress/qat: replace use of fixed size rte_memcpy
>>    crypto/ccp: replace use of fixed size rte_memcpy
>>    crypto/cnxk: replace use of fixed size rte_memcpy
>>    crypto/dpaa_sec: replace use of fixed size rte_memcpy
>>    crypto/ipsec_mb: replace use of fixed size rte_memcpy
>>    crypto/qat: replace use of fixed size rte_memcpy
>>    crypto/scheduler: replace use of fixed size rte_memcpy
>>    event/cnxk: replace use of fixed size rte_memcpy
>>    event/dlb2: replace use of fixed size rte_memcpy
>>    event/dpaa2: replace use of fixed size rte_memcpy
>>    event/octeontx: replace use of fixed size rte_memcpy
>>    mempool/dpaa: replace use of fixed size rte_memcpy
>>    mempool/dpaa2: replace use of fixed size rte_memcpy
>>    ml/cnxk: replace use of fixed size rte_memcpy
>>    net/af_xdp: replace use of fixed size rte_memcpy
>>    net/avp: replace use of fixed size rte_memcpy
>>    net/axgbe: replace use of fixed size rte_memcpy
>>    net/bnx2x: replace use of fixed size rte_memcpy
>>    net/bnxt: replace use of fixed size rte_memcpy
>>    net/bonding: replace use of fixed size rte_memcpy
>>    net/cnxk: replace use of fixed size rte_memcpy
>>    net/cpfl: replace use of fixed size rte_memcpy
>>    net/cxgbe: replace use of fixed size rte_memcpy
>>    net/dpaa2: replace use of fixed size rte_memcpy
>>    net/e1000: replace use of fixed size rte_memcpy
>>    net/enic: replace use of fixed size rte_memcpy
>>    net/failsafe: replace use of fixed size rte_memcpy
>>    net/gve/base: replace use of fixed size rte_memcpy
>>    net/hinic: replace use of fixed size rte_memcpy
>>    net/mvpp2: replace use of fixed size rte_memcpy
>>    app/test-pmd: replace use of fixed size rte_memcpy
>>    app/graph: replace use of fixed size rte_memcpy
>>    app/test-eventdev: replace use of fixed size rte_memcpy
>>    app/test: replace use of fixed size rte_memcpy
>>    examples: replace use of fixed size rte_memcpy
>>
>>   app/graph/neigh.c                             |   8 +-
>>   app/test-eventdev/test_pipeline_common.c      |  19 ++-
>>   app/test-pmd/cmdline.c                        |  48 ++++----
>>   app/test-pmd/cmdline_flow.c                   |  24 ++--
>>   app/test-pmd/config.c                         |   8 +-
>>   app/test-pmd/csumonly.c                       |   1 -
>>   app/test-pmd/flowgen.c                        |   1 -
>>   app/test-pmd/iofwd.c                          |   1 -
>>   app/test-pmd/macfwd.c                         |   1 -
>>   app/test-pmd/macswap.c                        |   1 -
>>   app/test-pmd/noisy_vnf.c                      |   1 -
>>   app/test-pmd/rxonly.c                         |   1 -
>>   app/test-pmd/testpmd.c                        |   1 -
>>   app/test/commands.c                           |   1 -
>>   app/test/packet_burst_generator.c             |   4 +-
>>   app/test/test_crc.c                           |   5 +-
>>   app/test/test_cryptodev.c                     |  18 ++-
>>   app/test/test_cryptodev_asym.c                |   1 -
>>   app/test/test_cryptodev_security_pdcp.c       |   1 -
>>   app/test/test_efd.c                           |   1 -
>>   app/test/test_efd_perf.c                      |   1 -
>>   app/test/test_event_crypto_adapter.c          |  12 +-
>>   app/test/test_event_dma_adapter.c             |   4 +-
>>   app/test/test_eventdev.c                      |   1 -
>>   app/test/test_ipsec.c                         |   6 +-
>>   app/test/test_link_bonding_mode4.c            |   8 +-
>>   app/test/test_mbuf.c                          |   1 -
>>   app/test/test_member.c                        |   1 -
>>   app/test/test_member_perf.c                   |   1 -
>>   app/test/test_rawdev.c                        |   1 -
>>   app/test/test_security_inline_proto.c         |  36 +++---
>>   app/test/test_service_cores.c                 |   1 -
>>   app/test/virtual_pmd.c                        |   3 +-
>>   devtools/cocci/rte_memcpy.cocci               |  11 ++
>>   drivers/baseband/acc/rte_acc100_pmd.c         |  17 ++-
>>   drivers/baseband/acc/rte_vrb_pmd.c            |  21 ++--
>>   drivers/baseband/la12xx/bbdev_la12xx.c        |   4 +-
>>   drivers/common/idpf/idpf_common_device.c      |   4 +-
>>   drivers/common/idpf/idpf_common_virtchnl.c    |   8 +-
>>   drivers/common/qat/qat_qp.c                   |  10 +-
>>   drivers/compress/qat/qat_comp.c               |   8 +-
>>   drivers/crypto/ccp/ccp_crypto.c               |  14 +--
>>   drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>>   drivers/crypto/cnxk/cnxk_se.h                 |   2 +-
>>   drivers/crypto/dpaa_sec/dpaa_sec.c            |   2 +-
>>   drivers/crypto/ipsec_mb/pmd_snow3g.c          |   4 +-
>>   drivers/crypto/qat/qat_sym_session.c          |  52 ++++-----
>>   .../scheduler/rte_cryptodev_scheduler.c       |   6 +-
>>   drivers/crypto/scheduler/scheduler_failover.c |  12 +-
>>   drivers/event/cnxk/cnxk_tim_evdev.c           |   4 +-
>>   drivers/event/dlb2/dlb2.c                     |   6 +-
>>   drivers/event/dpaa2/dpaa2_eventdev.c          |   6 +-
>>   drivers/event/octeontx/timvf_evdev.c          |   4 +-
>>   drivers/mempool/dpaa/dpaa_mempool.c           |   4 +-
>>   drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |   4 +-
>>   drivers/ml/cnxk/cn10k_ml_model.c              |   8 +-
>>   drivers/ml/cnxk/cn10k_ml_ops.c                |  11 +-
>>   drivers/ml/cnxk/cnxk_ml_ops.c                 |   2 +-
>>   drivers/ml/cnxk/mvtvm_ml_model.c              |   8 +-
>>   drivers/ml/cnxk/mvtvm_ml_ops.c                |   8 +-
>>   drivers/net/af_xdp/rte_eth_af_xdp.c           |   2 +-
>>   drivers/net/avp/avp_ethdev.c                  |   4 +-
>>   drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>>   drivers/net/bnx2x/bnx2x.c                     |  32 +++--
>>   drivers/net/bnx2x/bnx2x_stats.c               |  10 +-
>>   drivers/net/bnx2x/bnx2x_vfpf.c                |  19 +--
>>   drivers/net/bnxt/bnxt_flow.c                  |  34 +++---
>>   drivers/net/bonding/rte_eth_bond_8023ad.c     |   4 +-
>>   drivers/net/bonding/rte_eth_bond_flow.c       |   2 +-
>>   drivers/net/cnxk/cnxk_ethdev_ops.c            |   2 +-
>>   drivers/net/cnxk/cnxk_tm.c                    |   5 +-
>>   drivers/net/cpfl/cpfl_ethdev.c                |   3 +-
>>   drivers/net/cpfl/cpfl_vchnl.c                 |   4 +-
>>   drivers/net/cxgbe/clip_tbl.c                  |   2 +-
>>   drivers/net/cxgbe/cxgbe_filter.c              |   8 +-
>>   drivers/net/cxgbe/l2t.c                       |   4 +-
>>   drivers/net/cxgbe/smt.c                       |  20 ++--
>>   drivers/net/dpaa2/base/dpaa2_hw_dpni.c        |   1 -
>>   drivers/net/dpaa2/dpaa2_ethdev.c              |   1 -
>>   drivers/net/dpaa2/dpaa2_recycle.c             |   1 -
>>   drivers/net/dpaa2/dpaa2_rxtx.c                |   1 -
>>   drivers/net/dpaa2/dpaa2_sparser.c             |   1 -
>>   drivers/net/dpaa2/dpaa2_tm.c                  |   2 +-
>>   drivers/net/e1000/em_rxtx.c                   |   1 -
>>   drivers/net/e1000/igb_flow.c                  |  22 ++--
>>   drivers/net/e1000/igb_pf.c                    |   7 +-
>>   drivers/net/e1000/igb_rxtx.c                  |   1 -
>>   drivers/net/enic/enic_main.c                  |   8 +-
>>   drivers/net/failsafe/failsafe_ops.c           |   6 +-
>>   drivers/net/gve/base/gve_adminq.c             |   2 +-
>>   drivers/net/hinic/hinic_pmd_flow.c            |  40 +++----
>>   drivers/net/hns3/hns3_fdir.c                  |   2 +-
>>   drivers/net/hns3/hns3_flow.c                  |   4 +-
>>   drivers/net/i40e/i40e_ethdev.c                | 109 ++++++++----------
>>   drivers/net/i40e/i40e_fdir.c                  |  28 +++--
>>   drivers/net/i40e/i40e_flow.c                  |  56 +++++----
>>   drivers/net/i40e/i40e_pf.c                    |   3 +-
>>   drivers/net/i40e/i40e_tm.c                    |  11 +-
>>   drivers/net/i40e/rte_pmd_i40e.c               |  34 +++---
>>   drivers/net/iavf/iavf_fdir.c                  |  93 +++++++--------
>>   drivers/net/iavf/iavf_fsub.c                  |  50 ++++----
>>   drivers/net/iavf/iavf_generic_flow.c          |   2 +-
>>   drivers/net/iavf/iavf_tm.c                    |  11 +-
>>   drivers/net/iavf/iavf_vchnl.c                 |   9 +-
>>   drivers/net/ice/ice_dcf.c                     |   5 +-
>>   drivers/net/ice/ice_dcf_parent.c              |   2 +-
>>   drivers/net/ice/ice_dcf_sched.c               |  11 +-
>>   drivers/net/ice/ice_diagnose.c                |   4 +-
>>   drivers/net/ice/ice_ethdev.c                  |  14 +--
>>   drivers/net/ice/ice_fdir_filter.c             |  37 +++---
>>   drivers/net/ice/ice_generic_flow.c            |   2 +-
>>   drivers/net/ice/ice_hash.c                    |   2 +-
>>   drivers/net/ice/ice_tm.c                      |  11 +-
>>   drivers/net/idpf/idpf_ethdev.c                |   7 +-
>>   drivers/net/idpf/idpf_rxtx.c                  |  10 +-
>>   drivers/net/ipn3ke/ipn3ke_flow.c              |  32 +++--
>>   drivers/net/ipn3ke/ipn3ke_representor.c       |  16 +--
>>   drivers/net/ipn3ke/ipn3ke_tm.c                |   6 +-
>>   drivers/net/ixgbe/ixgbe_ethdev.c              |   9 +-
>>   drivers/net/ixgbe/ixgbe_fdir.c                |   7 +-
>>   drivers/net/ixgbe/ixgbe_flow.c                |  65 +++++------
>>   drivers/net/ixgbe/ixgbe_ipsec.c               |   8 +-
>>   drivers/net/ixgbe/ixgbe_pf.c                  |   5 +-
>>   drivers/net/ixgbe/ixgbe_tm.c                  |  11 +-
>>   drivers/net/ixgbe/rte_pmd_ixgbe.c             |   4 +-
>>   drivers/net/memif/memif_socket.c              |   4 +-
>>   drivers/net/mlx5/mlx5_devx.c                  |   4 +-
>>   drivers/net/mlx5/mlx5_flow.c                  |  38 +++---
>>   drivers/net/mlx5/mlx5_flow_aso.c              |   6 +-
>>   drivers/net/mlx5/mlx5_flow_hw.c               |  16 +--
>>   drivers/net/mlx5/mlx5_rx.c                    |   6 +-
>>   drivers/net/mlx5/mlx5_rxtx_vec.c              |   8 +-
>>   drivers/net/mvpp2/mrvl_tm.c                   |   2 +-
>>   drivers/net/nfp/flower/nfp_conntrack.c        |   2 +-
>>   drivers/net/nfp/flower/nfp_flower_flow.c      |  16 +--
>>   .../net/nfp/flower/nfp_flower_representor.c   |   2 +-
>>   drivers/net/nfp/nfp_mtr.c                     |  10 +-
>>   drivers/net/ngbe/ngbe_pf.c                    |   4 +-
>>   drivers/net/null/rte_eth_null.c               |   6 +-
>>   drivers/net/pcap/pcap_ethdev.c                |   2 +-
>>   drivers/net/pcap/pcap_osdep_freebsd.c         |   2 +-
>>   drivers/net/pcap/pcap_osdep_linux.c           |   2 +-
>>   drivers/net/qede/qede_main.c                  |   2 +-
>>   drivers/net/sfc/sfc.c                         |   2 +-
>>   drivers/net/sfc/sfc_ef10_tx.c                 |   2 +-
>>   drivers/net/sfc/sfc_ethdev.c                  |  11 +-
>>   drivers/net/sfc/sfc_flow.c                    |  20 ++--
>>   drivers/net/sfc/sfc_flow_rss.c                |   2 +-
>>   drivers/net/sfc/sfc_mae.c                     |   2 +-
>>   drivers/net/sfc/sfc_rx.c                      |   2 +-
>>   drivers/net/sfc/sfc_tso.c                     |   2 +-
>>   drivers/net/sfc/sfc_tso.h                     |   9 +-
>>   drivers/net/tap/rte_eth_tap.c                 |  14 +--
>>   drivers/net/txgbe/txgbe_ethdev.c              |   9 +-
>>   drivers/net/txgbe/txgbe_fdir.c                |   6 +-
>>   drivers/net/txgbe/txgbe_flow.c                |  65 +++++------
>>   drivers/net/txgbe/txgbe_ipsec.c               |   8 +-
>>   drivers/net/txgbe/txgbe_pf.c                  |   4 +-
>>   drivers/net/txgbe/txgbe_tm.c                  |  11 +-
>>   drivers/raw/ifpga/afu_pmd_he_hssi.c           |   3 +-
>>   drivers/raw/ifpga/afu_pmd_he_lpbk.c           |   3 +-
>>   drivers/raw/ifpga/afu_pmd_he_mem.c            |   3 +-
>>   drivers/raw/ifpga/afu_pmd_n3000.c             |   8 +-
>>   drivers/raw/ifpga/ifpga_rawdev.c              |  11 +-
>>   drivers/raw/skeleton/skeleton_rawdev.c        |   7 +-
>>   examples/bbdev_app/main.c                     |   2 +-
>>   examples/l2fwd-cat/cat.c                      |   4 +-
>>   examples/ptpclient/ptpclient.c                |  11 +-
>>   examples/vhost/main.c                         |   5 +-
>>   examples/vmdq/main.c                          |   6 +-
>>   examples/vmdq_dcb/main.c                      |  15 +--
>>   lib/cryptodev/rte_cryptodev.c                 |   2 +-
>>   lib/eal/common/eal_common_options.c           |   7 +-
>>   lib/ethdev/rte_ethdev.c                       |   3 +-
>>   lib/ethdev/rte_flow.c                         |   5 +-
>>   lib/eventdev/rte_event_crypto_adapter.c       |   2 +-
>>   lib/eventdev/rte_event_dma_adapter.c          |   4 +-
>>   lib/eventdev/rte_event_timer_adapter.c        |   2 +-
>>   lib/fib/trie.c                                |   2 +-
>>   lib/ip_frag/rte_ipv6_fragmentation.c          |   4 +-
>>   lib/ip_frag/rte_ipv6_reassembly.c             |   6 +-
>>   lib/lpm/rte_lpm6.c                            |   3 +-
>>   lib/net/rte_ether.c                           |   2 +-
>>   lib/node/ip6_lookup.c                         |   8 +-
>>   lib/pdcp/pdcp_process.c                       |  36 +++---
>>   lib/pipeline/rte_table_action.c               |   8 +-
>>   lib/rib/rte_rib6.h                            |   5 +-
>>   lib/security/rte_security.c                   |   4 +-
>>   188 files changed, 881 insertions(+), 998 deletions(-)
>>   create mode 100644 devtools/cocci/rte_memcpy.cocci
>>
  
Morten Brørup March 2, 2024, 1:05 p.m. UTC | #4
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Saturday, 2 March 2024 13.02
> 
> On 2024-03-02 12:14, Mattias Rönnblom wrote:
> > On 2024-03-01 18:14, Stephen Hemminger wrote:
> >> The DPDK has a lot of "cargo cult" usage of rte_memcpy.
> >> This patch set replaces cases where rte_memcpy is used with a fixed
> >> size constant size.
> >>
> >> Typical example is:
> >>     rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
> >> which can be replaced with:
> >>     memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN);
> >>
> >> This has two benefits. Gcc (and clang) are smart enough that for
> >> all small fixed size values, they just generate the necessary
> >> instructions
> >> to do it inline. It also means that fortify, Coverity, and ASAN
> >> analyzers can check these memcpy's.
> >>
> >
> > Instead of smearing out the knowledge of when to use rte_memcpy(), and
> > when to use memcpy() across the code base, wouldn't it be better to
> > *always* call rte_memcpy() in the fast path, and leave the policy
> > decision to the rte_memcpy() implementation?
> >
> > In rte_memcpy(), add:
> > if (__builtin_constant_p(n) && n < RTE_LIBC_MEMCPY_SIZE_THRESHOLD)
> >      memcpy(/../);
> > ..or something to that effect.
> >
> > Could you have a #ifdef for dumb static analysis tools? To make it
> look
> > like you are always using memcpy()?
> >
> >> So faster, better, safer.
> >>
> >
> > What is "faster" based on?
> >
> 
> I ran some DSW benchmarks, and if you add
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h
> b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e0..64cd82d78d 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src,
> size_t n)
>   static __rte_always_inline void *
>   rte_memcpy(void *dst, const void *src, size_t n)
>   {
> +       if (__builtin_constant_p(n) && n <= 32) {
> +               memcpy(dst, src, n);
> +               return dst;
> +       }
> +
>          if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
>                  return rte_memcpy_aligned(dst, src, n);
>          else
> 
> ...the overhead increases from roughly 48 core clock cycles/event to 59
> cc/event. The same for "n < 128". (I'm not sure what counts as "small"
> here.)

Thank you for digging deep into this, Mattias.
Your performance data are very interesting.

> 
> So something rte_memcpy() does for small and constant memory copies does
> make things go *significantly* faster, at least in certain cases.

Interesting.
Perhaps something with aligned copies...
The performance benefits of well known alignment was something I was looking into when working on non-temporal memcpy functions, because non-temporal load/store has some alignment requirements. (NB: NT memcpy development is hold, until I get more time to work on it again.)
Passing alignment information as a flag to an extended memcpy, to be used by __builtin_constant_p(n), could speed up copying when alignment is known by the developer, but impossible for the compiler to determine at build time.
The rte_memcpy() checks for one specific alignment criteria at runtime. I suppose the branch predictor makes it execute nearly as fast as if determined at build time, but it still consumes a lot more instruction memory.
Perhaps something else...?

> 
> (Linux, GCC 11.4, Intel Gracemont.)
> 
> > My experience with replacing rte_memcpy() with memcpy() (or vice
> versa)
> > is mixed.
> >
> > I've also tried just dropping the DPDK-custom memcpy() implementation
> > altogether, and that caused a performance drop (in a particular app,
> on
> > a particular compiler and CPU).

I guess the compilers are just not where we want them to be yet.

I don't mind generally replacing rte_memcpy() with memcpy() in the control plane.
But we should use whatever is more efficient in the data plane.

We must also keep in mind that DPDK supports old distros with old compilers. We should not remove a superfluous hand crafted optimization if a supported old compiler hasn't caught up with it yet, i.e. if it isn't superfluous on some of the old compilers supported by DPDK.
  
Stephen Hemminger March 2, 2024, 4:35 p.m. UTC | #5
On Sat, 2 Mar 2024 13:01:51 +0100
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> I ran some DSW benchmarks, and if you add
> 
> diff --git a/lib/eal/x86/include/rte_memcpy.h 
> b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e0..64cd82d78d 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src, 
> size_t n)
>   static __rte_always_inline void *
>   rte_memcpy(void *dst, const void *src, size_t n)
>   {
> +       if (__builtin_constant_p(n) && n <= 32) {
> +               memcpy(dst, src, n);
> +               return dst;
> +       }
> +

The default GCC inline threshold is 64 bytes (ie cache line size)
and that makes sense.  Since using __builtin_constant_p could
do:
	if (__builtin_constant_p(p) && n < RTE_CACHE_LINE_SIZE)
		return __builtin_memcpy(dst, src, n);
  
Stephen Hemminger March 2, 2024, 4:37 p.m. UTC | #6
On Sat, 2 Mar 2024 14:05:45 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:

> >   
> > > My experience with replacing rte_memcpy() with memcpy() (or vice  
> > versa)  
> > > is mixed.
> > >
> > > I've also tried just dropping the DPDK-custom memcpy() implementation
> > > altogether, and that caused a performance drop (in a particular app,  
> > on  
> > > a particular compiler and CPU).  
> 
> I guess the compilers are just not where we want them to be yet.
> 
> I don't mind generally replacing rte_memcpy() with memcpy() in the control plane.
> But we should use whatever is more efficient in the data plane.
> 
> We must also keep in mind that DPDK supports old distros with old compilers. We should not remove a superfluous hand crafted optimization if a supported old compiler hasn't caught up with it yet, i.e. if it isn't superfluous on some of the old compilers supported by DPDK.

When I scanned the result.
	1. Most copies were small (like Ether address or IPv6 address) and compiler
           inlining should beat a function call every time.
        2. Larger structure copies were in control path.
  
Morten Brørup March 2, 2024, 5:32 p.m. UTC | #7
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 March 2024 17.38
> 
> On Sat, 2 Mar 2024 14:05:45 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > >
> > > > My experience with replacing rte_memcpy() with memcpy() (or vice
> > > versa)
> > > > is mixed.
> > > >
> > > > I've also tried just dropping the DPDK-custom memcpy()
> implementation
> > > > altogether, and that caused a performance drop (in a particular
> app,
> > > on
> > > > a particular compiler and CPU).
> >
> > I guess the compilers are just not where we want them to be yet.
> >
> > I don't mind generally replacing rte_memcpy() with memcpy() in the
> control plane.
> > But we should use whatever is more efficient in the data plane.
> >
> > We must also keep in mind that DPDK supports old distros with old
> compilers. We should not remove a superfluous hand crafted optimization
> if a supported old compiler hasn't caught up with it yet, i.e. if it
> isn't superfluous on some of the old compilers supported by DPDK.
> 
> When I scanned the result.
>         1. Most copies were small (like Ether address or IPv6 address)
>            and compiler
>            inlining should beat a function call every time.

Please note that rte_memcpy() is inline, so no function call is involved.

>         2. Larger structure copies were in control path.

Yep, I saw the same two things when scanning v1 of the series before acking it.
If we didn't overlook any fast path copies, this series is a good clean-up

I must admit that I assume that any compiler's built-in memcpy() is able to efficiently copy small structures of build time constant size.
Assumptions are the mother of all FU's, but being wrong on this would be a very big surprise to me.