[v4] lib/table: fix cache alignment issue

Message ID 20200722021628.17194-1-ting.xu@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series [v4] lib/table: fix cache alignment issue

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-intel-Performance     success  Performance Testing PASS
ci/travis-robot              success  Travis build: passed
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-testing               success  Testing PASS
ci/Intel-compilation         success  Compilation OK

Commit Message

Xu, Ting July 22, 2020, 2:16 a.m. UTC
Creating a softnic hash table with 16-byte keys fails in a 32-bit
environment, because the pointer field in structure rte_bucket_4_16
is only 32 bits wide. Add a padding field in 32-bit builds to keep
the structure size a multiple of 64 bytes. Apply the same change to
the 8-byte and 32-byte key hash tables as well.

Fixes: 8aa327214c ("table: hash")
Cc: stable@dpdk.org

Signed-off-by: Ting Xu <ting.xu@intel.com>

---
v3->v4: Change design based on comment
v2->v3: Rebase
v1->v2: Correct patch time
---
 lib/librte_table/rte_table_hash_key16.c | 17 +++++++++++++++++
 lib/librte_table/rte_table_hash_key32.c | 17 +++++++++++++++++
 lib/librte_table/rte_table_hash_key8.c  | 16 ++++++++++++++++
 3 files changed, 50 insertions(+)
  

Comments

Cristian Dumitrescu July 22, 2020, 8:26 a.m. UTC | #1
> +#ifdef RTE_ARCH_64
>  struct rte_bucket_4_32 {
>  	/* Cache line 0 */
>  	uint64_t signature[4 + 1];
> @@ -46,6 +47,22 @@ struct rte_bucket_4_32 {
>  	/* Cache line 3 */
>  	uint8_t data[0];
>  };
> +#else
> +struct rte_bucket_4_32 {
> +	/* Cache line 0 */
> +	uint64_t signature[4 + 1];
> +	uint64_t lru_list;
> +	struct rte_bucket_4_32 *next;
> +	uint32_t pad;
> +	uint64_t next_valid;
> +
> +	/* Cache lines 1 and 2 */
> +	uint64_t key[4][4];
> +
> +	/* Cache line 3 */
> +	uint8_t data[0];
> +};
> +#endif
> 

Hi Ting,

Yes, it looks good, but as mentioned previously please do the same on the other files in the same folder and add the changes to your patch, as we need to keep all these files in sync:

rte_table_hash_key8.c, struct rte_bucket_4_8
rte_table_hash_key32.c, struct rte_bucket_4_32

Thanks,
Cristian
  
Xu, Ting July 22, 2020, 8:30 a.m. UTC | #2
Hi, Cristian,

> -----Original Message-----
> From: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Sent: Wednesday, July 22, 2020 4:27 PM
> To: Xu, Ting <ting.xu@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: RE: [PATCH v4] lib/table: fix cache alignment issue
> 
> 
> 
> > +#ifdef RTE_ARCH_64
> >  struct rte_bucket_4_32 {
> >  	/* Cache line 0 */
> >  	uint64_t signature[4 + 1];
> > @@ -46,6 +47,22 @@ struct rte_bucket_4_32 {
> >  	/* Cache line 3 */
> >  	uint8_t data[0];
> >  };
> > +#else
> > +struct rte_bucket_4_32 {
> > +	/* Cache line 0 */
> > +	uint64_t signature[4 + 1];
> > +	uint64_t lru_list;
> > +	struct rte_bucket_4_32 *next;
> > +	uint32_t pad;
> > +	uint64_t next_valid;
> > +
> > +	/* Cache lines 1 and 2 */
> > +	uint64_t key[4][4];
> > +
> > +	/* Cache line 3 */
> > +	uint8_t data[0];
> > +};
> > +#endif
> >
> 
> Hi Ting,
> 
> Yes, it looks good, but as mentioned previously please do the same on the
> other files in the same folder and add the changes to your patch, as we need
> to keep all these files in sync:
> 
> rte_table_hash_key8.c, struct rte_bucket_4_8 rte_table_hash_key32.c, struct
> rte_bucket_4_32
> 

I have already made the changes for the 8-byte and 32-byte key files in this patch.

> Thanks,
> Cristian
  
Cristian Dumitrescu July 22, 2020, 8:48 a.m. UTC | #3
> -----Original Message-----
> From: Xu, Ting <ting.xu@intel.com>
> Sent: Wednesday, July 22, 2020 3:16 AM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Xu, Ting
> <ting.xu@intel.com>; stable@dpdk.org
> Subject: [PATCH v4] lib/table: fix cache alignment issue
> 
> Creating a softnic hash table with 16-byte keys fails in a 32-bit
> environment, because the pointer field in structure rte_bucket_4_16
> is only 32 bits wide. Add a padding field in 32-bit builds to keep
> the structure size a multiple of 64 bytes. Apply the same change to
> the 8-byte and 32-byte key hash tables as well.
> 
> Fixes: 8aa327214c ("table: hash")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ting Xu <ting.xu@intel.com>
> 
> ---
> v3->v4: Change design based on comment
> v2->v3: Rebase
> v1->v2: Correct patch time
> ---

Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
  
Cristian Dumitrescu July 22, 2020, 8:49 a.m. UTC | #4
> -----Original Message-----
> From: Xu, Ting <ting.xu@intel.com>
> Sent: Wednesday, July 22, 2020 9:31 AM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org
> Subject: RE: [PATCH v4] lib/table: fix cache alignment issue
> 
> Hi, Cristian,
> 
> > -----Original Message-----
> > From: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> > Sent: Wednesday, July 22, 2020 4:27 PM
> > To: Xu, Ting <ting.xu@intel.com>; dev@dpdk.org
> > Cc: stable@dpdk.org
> > Subject: RE: [PATCH v4] lib/table: fix cache alignment issue
> >
> >
> >
> > > +#ifdef RTE_ARCH_64
> > >  struct rte_bucket_4_32 {
> > >  	/* Cache line 0 */
> > >  	uint64_t signature[4 + 1];
> > > @@ -46,6 +47,22 @@ struct rte_bucket_4_32 {
> > >  	/* Cache line 3 */
> > >  	uint8_t data[0];
> > >  };
> > > +#else
> > > +struct rte_bucket_4_32 {
> > > +	/* Cache line 0 */
> > > +	uint64_t signature[4 + 1];
> > > +	uint64_t lru_list;
> > > +	struct rte_bucket_4_32 *next;
> > > +	uint32_t pad;
> > > +	uint64_t next_valid;
> > > +
> > > +	/* Cache lines 1 and 2 */
> > > +	uint64_t key[4][4];
> > > +
> > > +	/* Cache line 3 */
> > > +	uint8_t data[0];
> > > +};
> > > +#endif
> > >
> >
> > Hi Ting,
> >
> > Yes, it looks good, but as mentioned previously please do the same on the
> > other files in the same folder and add the changes to your patch, as we
> need
> > to keep all these files in sync:
> >
> > rte_table_hash_key8.c, struct rte_bucket_4_8 rte_table_hash_key32.c,
> struct
> > rte_bucket_4_32
> >
> 
> I have already made the changes for the 8-byte and 32-byte key files in this patch.
> 
> > Thanks,
> > Cristian

Oops, sorry, somehow I missed it. I just acked this patch. Thanks, Ting!

Regards,
Cristian
  
David Marchand July 29, 2020, 12:01 p.m. UTC | #5
On Wed, Jul 22, 2020 at 4:13 AM Ting Xu <ting.xu@intel.com> wrote:
>
> Creating a softnic hash table with 16-byte keys fails in a 32-bit
> environment, because the pointer field in structure rte_bucket_4_16
> is only 32 bits wide. Add a padding field in 32-bit builds to keep
> the structure size a multiple of 64 bytes. Apply the same change to
> the 8-byte and 32-byte key hash tables as well.

Please correct me if I am wrong, but it simply means this part of the
table library never worked for 32-bit.
It looks more like adding 32-bit support than a fix, so I wonder
whether it has its place in rc3.



Now, looking at the details:

For 64-bit on my x86, we have:

struct rte_bucket_4_8 {
    uint64_t                   signature;            /*     0     8 */
    uint64_t                   lru_list;             /*     8     8 */
    struct rte_bucket_4_8 *    next;                 /*    16     8 */
    uint64_t                   next_valid;           /*    24     8 */
    uint64_t                   key[4];               /*    32    32 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    uint8_t                    data[];               /*    64     0 */

    /* size: 64, cachelines: 1, members: 6 */
};


For 32-bit, we have:

struct rte_bucket_4_8 {
    uint64_t                   signature;            /*     0     8 */
    uint64_t                   lru_list;             /*     8     8 */
    struct rte_bucket_4_8 *    next;                 /*    16     4 */
    uint64_t                   next_valid;           /*    20     8 */
    uint64_t                   key[4];               /*    28    32 */
    uint8_t                    data[];               /*    60     0 */

    /* size: 60, cachelines: 1, members: 6 */
    /* last cacheline: 60 bytes */
} __attribute__((__packed__));

^^ it is interesting that a packed attribute ends up here.
I saw no such attribute in the library code.
Compiler black magic at work I guess...


>
> Fixes: 8aa327214c ("table: hash")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ting Xu <ting.xu@intel.com>
>
> ---
> v3->v4: Change design based on comment
> v2->v3: Rebase
> v1->v2: Correct patch time
> ---
>  lib/librte_table/rte_table_hash_key16.c | 17 +++++++++++++++++
>  lib/librte_table/rte_table_hash_key32.c | 17 +++++++++++++++++
>  lib/librte_table/rte_table_hash_key8.c  | 16 ++++++++++++++++
>  3 files changed, 50 insertions(+)
>
> diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c
> index 2cca1c924..c4384b114 100644
> --- a/lib/librte_table/rte_table_hash_key16.c
> +++ b/lib/librte_table/rte_table_hash_key16.c
> @@ -33,6 +33,7 @@
>
>  #endif
>
> +#ifdef RTE_ARCH_64
>  struct rte_bucket_4_16 {
>         /* Cache line 0 */
>         uint64_t signature[4 + 1];
> @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
>         /* Cache line 2 */
>         uint8_t data[0];
>  };
> +#else
> +struct rte_bucket_4_16 {
> +       /* Cache line 0 */
> +       uint64_t signature[4 + 1];
> +       uint64_t lru_list;
> +       struct rte_bucket_4_16 *next;
> +       uint32_t pad;
> +       uint64_t next_valid;
> +
> +       /* Cache line 1 */
> +       uint64_t key[4][2];
> +
> +       /* Cache line 2 */
> +       uint8_t data[0];
> +};
> +#endif

The change could simply be:

@@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
        uint64_t signature[4 + 1];
        uint64_t lru_list;
        struct rte_bucket_4_16 *next;
+#ifndef RTE_ARCH_64
+       uint32_t pad;
+#endif
        uint64_t next_valid;

        /* Cache line 1 */

It avoids duplicating the whole structure definition (we could miss
updating one side of the #ifdef later).
Idem for the other "8" and "32" structures.
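
Note: the single-definition variant suggested above can be sketched as a standalone C file. This is only an illustration under stated assumptions: SKETCH_ARCH_64 is a hypothetical stand-in for RTE_ARCH_64 (derived here from the pointer width), the struct is renamed sketch_bucket_4_16, and the two static_assert checks are additions that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_ARCH_64: set when pointers are wider than 32 bits. */
#if UINTPTR_MAX > 0xffffffffu
#define SKETCH_ARCH_64 1
#endif

struct sketch_bucket_4_16 {
	/* Cache line 0 */
	uint64_t signature[4 + 1];
	uint64_t lru_list;
	struct sketch_bucket_4_16 *next;
#ifndef SKETCH_ARCH_64
	uint32_t pad;	/* fills the 4 bytes a 64-bit pointer would occupy */
#endif
	uint64_t next_valid;

	/* Cache line 1 */
	uint64_t key[4][2];

	/* Cache line 2 */
	uint8_t data[];
};

/* Compile-time checks (not part of the patch): the layout must not depend on pointer width. */
static_assert(sizeof(struct sketch_bucket_4_16) % 64 == 0,
	"bucket size must be a multiple of 64 bytes");
static_assert(offsetof(struct sketch_bucket_4_16, key) == 64,
	"keys must start on the second cache line");

/* Run-time accessors, so the layout can also be inspected from a test. */
static size_t
sketch_bucket_size(void)
{
	return sizeof(struct sketch_bucket_4_16);
}

static size_t
sketch_bucket_key_offset(void)
{
	return offsetof(struct sketch_bucket_4_16, key);
}
```

With GCC or Clang, compiling this once as a 64-bit and once as a 32-bit object (where a 32-bit toolchain is available) should satisfy both assertions; dropping the pad member is expected to break them on 32-bit x86.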
  
Cristian Dumitrescu July 29, 2020, 1:13 p.m. UTC | #6
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, July 29, 2020 1:01 PM
> To: Xu, Ting <ting.xu@intel.com>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>
> Cc: dev <dev@dpdk.org>; dpdk stable <stable@dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH v4] lib/table: fix cache alignment issue
> 
> On Wed, Jul 22, 2020 at 4:13 AM Ting Xu <ting.xu@intel.com> wrote:
> >
> > When create softnic hash table with 16 keys, it failed on 32-bit
> > environment, because the pointer field in structure rte_bucket_4_16
> > is only 32 bits. Add a padding field in 32-bit environment to keep
> > the structure to a multiple of 64 bytes. Apply this to 8-byte and
> > 32-byte key hash function as well.
> 
> Please correct me if I am wrong, but it simply means this part of the
> table library never worked for 32-bit.
> It seems more adding 32-bit support rather than a fix and then I
> wonder if it has its place in rc3.
> 

Functionally, the code works, but performance is affected.

The only thing that prevents the code from working is the check in the table create function, which verifies that the size of the above structure is a multiple of 64 bytes and which caught this issue.
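
Note: a minimal sketch of the kind of create-time size check described here. All names (sketch_table, sketch_table_create, sketch_table_free, SKETCH_CACHE_LINE_SIZE) are hypothetical and this is not the librte_table code; the bucket size is passed in as a parameter so that the 60-byte case can be exercised on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE_SIZE 64

static_assert((SKETCH_CACHE_LINE_SIZE & (SKETCH_CACHE_LINE_SIZE - 1)) == 0,
	"cache line size is assumed to be a power of two");

/* Hypothetical table handle, not the librte_table structure. */
struct sketch_table {
	size_t bucket_size;
	uint32_t n_buckets;
	uint8_t *buckets;
};

/*
 * Refuse to create the table when a bucket does not cover a whole number
 * of cache lines. Returns NULL on invalid parameters or allocation failure.
 */
static struct sketch_table *
sketch_table_create(size_t bucket_size, uint32_t n_buckets)
{
	struct sketch_table *t;

	if (bucket_size == 0 || n_buckets == 0 ||
	    (bucket_size % SKETCH_CACHE_LINE_SIZE) != 0)
		return NULL;

	t = calloc(1, sizeof(*t));
	if (t == NULL)
		return NULL;

	t->buckets = calloc(n_buckets, bucket_size);
	if (t->buckets == NULL) {
		free(t);
		return NULL;
	}

	t->bucket_size = bucket_size;
	t->n_buckets = n_buckets;
	return t;
}

/* Release a table returned by sketch_table_create(); NULL is accepted. */
static void
sketch_table_free(struct sketch_table *t)
{
	if (t == NULL)
		return;
	free(t->buckets);
	free(t);
}
```

Calling sketch_table_create(60, 4) returns NULL, which mirrors the 60-byte layout reported for 32-bit above, while sketch_table_create(64, 4) succeeds.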

> 
> 
> Now, looking at the details:
> 
> For 64-bit on my x86, we have:
> 
> struct rte_bucket_4_8 {
>     uint64_t                   signature;            /*     0     8 */
>     uint64_t                   lru_list;             /*     8     8 */
>     struct rte_bucket_4_8 *    next;                 /*    16     8 */
>     uint64_t                   next_valid;           /*    24     8 */
>     uint64_t                   key[4];               /*    32    32 */
>     /* --- cacheline 1 boundary (64 bytes) --- */
>     uint8_t                    data[];               /*    64     0 */
> 
>     /* size: 64, cachelines: 1, members: 6 */
> };
> 
> 
> For 32-bit, we have:
> 
> struct rte_bucket_4_8 {
>     uint64_t                   signature;            /*     0     8 */
>     uint64_t                   lru_list;             /*     8     8 */
>     struct rte_bucket_4_8 *    next;                 /*    16     4 */
>     uint64_t                   next_valid;           /*    20     8 */
>     uint64_t                   key[4];               /*    28    32 */
>     uint8_t                    data[];               /*    60     0 */
> 
>     /* size: 60, cachelines: 1, members: 6 */
>     /* last cacheline: 60 bytes */
> } __attribute__((__packed__));
> 
> ^^ it is interesting that a packed attribute ends up here.
> I saw no such attribute in the library code.
> Compiler black magic at work I guess...
> 

Where do you see the packed attribute? I don't see it in the code.

A packed attribute would explain this issue, i.e. why the compiler decided not to insert the expected 4 bytes of padding right after the "next" field, which would allow the "next_valid" field to be aligned to its natural boundary of 8 bytes.

> 
> >
> > Fixes: 8aa327214c ("table: hash")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ting Xu <ting.xu@intel.com>
> >
> > ---
> > v3->v4: Change design based on comment
> > v2->v3: Rebase
> > v1->v2: Correct patch time
> > ---
> >  lib/librte_table/rte_table_hash_key16.c | 17 +++++++++++++++++
> >  lib/librte_table/rte_table_hash_key32.c | 17 +++++++++++++++++
> >  lib/librte_table/rte_table_hash_key8.c  | 16 ++++++++++++++++
> >  3 files changed, 50 insertions(+)
> >
> > diff --git a/lib/librte_table/rte_table_hash_key16.c
> b/lib/librte_table/rte_table_hash_key16.c
> > index 2cca1c924..c4384b114 100644
> > --- a/lib/librte_table/rte_table_hash_key16.c
> > +++ b/lib/librte_table/rte_table_hash_key16.c
> > @@ -33,6 +33,7 @@
> >
> >  #endif
> >
> > +#ifdef RTE_ARCH_64
> >  struct rte_bucket_4_16 {
> >         /* Cache line 0 */
> >         uint64_t signature[4 + 1];
> > @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
> >         /* Cache line 2 */
> >         uint8_t data[0];
> >  };
> > +#else
> > +struct rte_bucket_4_16 {
> > +       /* Cache line 0 */
> > +       uint64_t signature[4 + 1];
> > +       uint64_t lru_list;
> > +       struct rte_bucket_4_16 *next;
> > +       uint32_t pad;
> > +       uint64_t next_valid;
> > +
> > +       /* Cache line 1 */
> > +       uint64_t key[4][2];
> > +
> > +       /* Cache line 2 */
> > +       uint8_t data[0];
> > +};
> > +#endif
> 
> The change could simply be:
> 
> @@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
>         uint64_t signature[4 + 1];
>         uint64_t lru_list;
>         struct rte_bucket_4_16 *next;
> +#ifndef RTE_ARCH_64
> +       uint32_t pad;
> +#endif
>         uint64_t next_valid;
> 
>         /* Cache line 1 */
> 
> It avoids duplicating the whole structure definition (we could miss
> updating one side of the #ifdef later).
> Idem for the other "8" and "32" structures.
> 
> 
> --
> David Marchand
  
David Marchand July 29, 2020, 1:28 p.m. UTC | #7
On Wed, Jul 29, 2020 at 3:14 PM Dumitrescu, Cristian
<cristian.dumitrescu@intel.com> wrote:
> > Please correct me if I am wrong, but it simply means this part of the
> > table library never worked for 32-bit.
> > It seems more adding 32-bit support rather than a fix and then I
> > wonder if it has its place in rc3.
> >
>
> Functionally, the code works, but performance is affected.
>
> The only thing that prevents the code from working is the check in the table create function, which verifies that the size of the above structure is a multiple of 64 bytes and which caught this issue.

Yes, and that's my point.
It was not working.
It was not tested.


This patch asks for backport in stable branches, I will let Kevin and
Luca comment.


>
> >
> >
> > Now, looking at the details:
> >
> > For 64-bit on my x86, we have:
> >
> > struct rte_bucket_4_8 {
> >     uint64_t                   signature;            /*     0     8 */
> >     uint64_t                   lru_list;             /*     8     8 */
> >     struct rte_bucket_4_8 *    next;                 /*    16     8 */
> >     uint64_t                   next_valid;           /*    24     8 */
> >     uint64_t                   key[4];               /*    32    32 */
> >     /* --- cacheline 1 boundary (64 bytes) --- */
> >     uint8_t                    data[];               /*    64     0 */
> >
> >     /* size: 64, cachelines: 1, members: 6 */
> > };
> >
> >
> > For 32-bit, we have:
> >
> > struct rte_bucket_4_8 {
> >     uint64_t                   signature;            /*     0     8 */
> >     uint64_t                   lru_list;             /*     8     8 */
> >     struct rte_bucket_4_8 *    next;                 /*    16     4 */
> >     uint64_t                   next_valid;           /*    20     8 */
> >     uint64_t                   key[4];               /*    28    32 */
> >     uint8_t                    data[];               /*    60     0 */
> >
> >     /* size: 60, cachelines: 1, members: 6 */
> >     /* last cacheline: 60 bytes */
> > } __attribute__((__packed__));
> >
> > ^^ it is interesting that a packed attribute ends up here.
> > I saw no such attribute in the library code.
> > Compiler black magic at work I guess...
> >
>
> Where do you see the packed attribute? I don't see it in the code.

That's pahole reporting this.
Maybe the tool extrapolates this attribute based on the next_valid
field placement... I don't know.

> A packed attribute would explain this issue, i.e. why the compiler decided not to insert the expected 4 bytes of padding right after the "next" field, which would allow the "next_valid" field to be aligned to its natural boundary of 8 bytes.

Or a 64-bit field on 32-bit has a special alignment that I am not aware of.
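
Note: the alignment question can be checked directly with offsetof instead of relying on the tool output. The sketch below uses hypothetical names (sketch_bucket_4_8, sketch_align_probe) and only reports what the compiler in use does. As far as I know, the i386 System V ABI requires only 4-byte alignment for 64-bit integers inside structures, which would account for the offset of 20 shown in the 32-bit dump without any packed attribute; treat that as an assumption to verify on the target toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same member order as rte_bucket_4_8 in the dumps above, under a hypothetical name. */
struct sketch_bucket_4_8 {
	uint64_t signature;
	uint64_t lru_list;
	struct sketch_bucket_4_8 *next;
	uint64_t next_valid;
	uint64_t key[4];
	uint8_t data[];
};

/* Two uint64_t members precede the pointer on any target. */
static_assert(offsetof(struct sketch_bucket_4_8, next) == 16,
	"next must follow two 8-byte members");

/* Probe: the offset of v is the alignment actually applied to a uint64_t member. */
struct sketch_align_probe {
	char c;
	uint64_t v;
};

static size_t
sketch_u64_member_align(void)
{
	return offsetof(struct sketch_align_probe, v);
}

/* Where the compiler placed next_valid (24 in the 64-bit dump, 20 in the 32-bit one). */
static size_t
sketch_next_valid_offset(void)
{
	return offsetof(struct sketch_bucket_4_8, next_valid);
}

/* Total bucket size (64 in the 64-bit dump, 60 in the 32-bit one). */
static size_t
sketch_bucket_size(void)
{
	return sizeof(struct sketch_bucket_4_8);
}
```

If sketch_u64_member_align() reports 4 on the 32-bit build, no padding is needed between "next" and "next_valid", so the compiler is following its ABI rather than packing the structure.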


>
> >
> > >
> > > Fixes: 8aa327214c ("table: hash")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ting Xu <ting.xu@intel.com>
> > >
> > > ---
> > > v3->v4: Change design based on comment
> > > v2->v3: Rebase
> > > v1->v2: Correct patch time
> > > ---
> > >  lib/librte_table/rte_table_hash_key16.c | 17 +++++++++++++++++
> > >  lib/librte_table/rte_table_hash_key32.c | 17 +++++++++++++++++
> > >  lib/librte_table/rte_table_hash_key8.c  | 16 ++++++++++++++++
> > >  3 files changed, 50 insertions(+)
> > >
> > > diff --git a/lib/librte_table/rte_table_hash_key16.c
> > b/lib/librte_table/rte_table_hash_key16.c
> > > index 2cca1c924..c4384b114 100644
> > > --- a/lib/librte_table/rte_table_hash_key16.c
> > > +++ b/lib/librte_table/rte_table_hash_key16.c
> > > @@ -33,6 +33,7 @@
> > >
> > >  #endif
> > >
> > > +#ifdef RTE_ARCH_64
> > >  struct rte_bucket_4_16 {
> > >         /* Cache line 0 */
> > >         uint64_t signature[4 + 1];
> > > @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
> > >         /* Cache line 2 */
> > >         uint8_t data[0];
> > >  };
> > > +#else
> > > +struct rte_bucket_4_16 {
> > > +       /* Cache line 0 */
> > > +       uint64_t signature[4 + 1];
> > > +       uint64_t lru_list;
> > > +       struct rte_bucket_4_16 *next;
> > > +       uint32_t pad;
> > > +       uint64_t next_valid;
> > > +
> > > +       /* Cache line 1 */
> > > +       uint64_t key[4][2];
> > > +
> > > +       /* Cache line 2 */
> > > +       uint8_t data[0];
> > > +};
> > > +#endif
> >
> > The change could simply be:
> >
> > @@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
> >         uint64_t signature[4 + 1];
> >         uint64_t lru_list;
> >         struct rte_bucket_4_16 *next;
> > +#ifndef RTE_ARCH_64
> > +       uint32_t pad;
> > +#endif
> >         uint64_t next_valid;
> >
> >         /* Cache line 1 */
> >
> > It avoids duplicating the whole structure definition (we could miss
> > updating one side of the #ifdef later).
> > Idem for the other "8" and "32" structures.


What about this comment?
  
Cristian Dumitrescu July 29, 2020, 1:54 p.m. UTC | #8
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, July 29, 2020 2:28 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Xu, Ting <ting.xu@intel.com>; dev <dev@dpdk.org>; dpdk stable
> <stable@dpdk.org>; Kevin Traynor <ktraynor@redhat.com>; Luca Boccassi
> <bluca@debian.org>
> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4] lib/table: fix cache
> alignment issue
> 
> On Wed, Jul 29, 2020 at 3:14 PM Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com> wrote:
> > > Please correct me if I am wrong, but it simply means this part of the
> > > table library never worked for 32-bit.
> > > It seems more adding 32-bit support rather than a fix and then I
> > > wonder if it has its place in rc3.
> > >
> >
> > Functionally, the code works, but performance is affected.
> >
> > The only thing that prevents the code from working is the check in the
> > table create function, which verifies that the size of the above structure
> > is a multiple of 64 bytes and which caught this issue.
> 
> Yes, and that's my point.
> It was not working.
> It was not tested.
> 
> 

Not sure when this code was last tested on 32-bit systems, I'll let the validation folks comment on this, but I cannot rule out a change in compiler behavior either.

This is a low complexity and low impact change, hence low risk IMO.

> This patch asks for backport in stable branches, I will let Kevin and
> Luca comment.
> 
> 
> >
> > >
> > >
> > > Now, looking at the details:
> > >
> > > For 64-bit on my x86, we have:
> > >
> > > struct rte_bucket_4_8 {
> > >     uint64_t                   signature;            /*     0     8 */
> > >     uint64_t                   lru_list;             /*     8     8 */
> > >     struct rte_bucket_4_8 *    next;                 /*    16     8 */
> > >     uint64_t                   next_valid;           /*    24     8 */
> > >     uint64_t                   key[4];               /*    32    32 */
> > >     /* --- cacheline 1 boundary (64 bytes) --- */
> > >     uint8_t                    data[];               /*    64     0 */
> > >
> > >     /* size: 64, cachelines: 1, members: 6 */
> > > };
> > >
> > >
> > > For 32-bit, we have:
> > >
> > > struct rte_bucket_4_8 {
> > >     uint64_t                   signature;            /*     0     8 */
> > >     uint64_t                   lru_list;             /*     8     8 */
> > >     struct rte_bucket_4_8 *    next;                 /*    16     4 */
> > >     uint64_t                   next_valid;           /*    20     8 */
> > >     uint64_t                   key[4];               /*    28    32 */
> > >     uint8_t                    data[];               /*    60     0 */
> > >
> > >     /* size: 60, cachelines: 1, members: 6 */
> > >     /* last cacheline: 60 bytes */
> > > } __attribute__((__packed__));
> > >
> > > ^^ it is interesting that a packed attribute ends up here.
> > > I saw no such attribute in the library code.
> > > Compiler black magic at work I guess...
> > >
> >
> > Where do you see the packed attribute? I don't see it in the code.
> 
> That's pahole reporting this.
> Maybe the tool extrapolates this attribute based on the next_valid
> field placement... I don't know.
> 
> > A packed attribute would explain this issue, i.e. why the compiler decided
> > not to insert the expected 4 bytes of padding right after the "next" field,
> > which would allow the "next_valid" field to be aligned to its natural
> > boundary of 8 bytes.
> 
> Or a 64-bit field on 32-bit has a special alignment that I am not aware of.
> 
> 
> >
> > >
> > > >
> > > > Fixes: 8aa327214c ("table: hash")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Ting Xu <ting.xu@intel.com>
> > > >
> > > > ---
> > > > v3->v4: Change design based on comment
> > > > v2->v3: Rebase
> > > > v1->v2: Correct patch time
> > > > ---
> > > >  lib/librte_table/rte_table_hash_key16.c | 17 +++++++++++++++++
> > > >  lib/librte_table/rte_table_hash_key32.c | 17 +++++++++++++++++
> > > >  lib/librte_table/rte_table_hash_key8.c  | 16 ++++++++++++++++
> > > >  3 files changed, 50 insertions(+)
> > > >
> > > > diff --git a/lib/librte_table/rte_table_hash_key16.c
> > > b/lib/librte_table/rte_table_hash_key16.c
> > > > index 2cca1c924..c4384b114 100644
> > > > --- a/lib/librte_table/rte_table_hash_key16.c
> > > > +++ b/lib/librte_table/rte_table_hash_key16.c
> > > > @@ -33,6 +33,7 @@
> > > >
> > > >  #endif
> > > >
> > > > +#ifdef RTE_ARCH_64
> > > >  struct rte_bucket_4_16 {
> > > >         /* Cache line 0 */
> > > >         uint64_t signature[4 + 1];
> > > > @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
> > > >         /* Cache line 2 */
> > > >         uint8_t data[0];
> > > >  };
> > > > +#else
> > > > +struct rte_bucket_4_16 {
> > > > +       /* Cache line 0 */
> > > > +       uint64_t signature[4 + 1];
> > > > +       uint64_t lru_list;
> > > > +       struct rte_bucket_4_16 *next;
> > > > +       uint32_t pad;
> > > > +       uint64_t next_valid;
> > > > +
> > > > +       /* Cache line 1 */
> > > > +       uint64_t key[4][2];
> > > > +
> > > > +       /* Cache line 2 */
> > > > +       uint8_t data[0];
> > > > +};
> > > > +#endif
> > >
> > > The change could simply be:
> > >
> > > @@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
> > >         uint64_t signature[4 + 1];
> > >         uint64_t lru_list;
> > >         struct rte_bucket_4_16 *next;
> > > +#ifndef RTE_ARCH_64
> > > +       uint32_t pad;
> > > +#endif
> > >         uint64_t next_valid;
> > >
> > >         /* Cache line 1 */
> > >
> > > It avoids duplicating the whole structure definition (we could miss
> > > updating one side of the #ifdef later).
> > > Idem for the other "8" and "32" structures.
> 
> 
> What about this comment?
> 
> 
> --
> David Marchand
  
David Marchand July 29, 2020, 1:59 p.m. UTC | #9
On Wed, Jul 29, 2020 at 3:54 PM Dumitrescu, Cristian
<cristian.dumitrescu@intel.com> wrote:
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Wednesday, July 29, 2020 2:28 PM
> > To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> > Cc: Xu, Ting <ting.xu@intel.com>; dev <dev@dpdk.org>; dpdk stable
> > <stable@dpdk.org>; Kevin Traynor <ktraynor@redhat.com>; Luca Boccassi
> > <bluca@debian.org>
> > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4] lib/table: fix cache
> > alignment issue
> >
> > On Wed, Jul 29, 2020 at 3:14 PM Dumitrescu, Cristian
> > <cristian.dumitrescu@intel.com> wrote:
> > > > Please correct me if I am wrong, but it simply means this part of the
> > > > table library never worked for 32-bit.
> > > > It seems more adding 32-bit support rather than a fix and then I
> > > > wonder if it has its place in rc3.
> > > >
> > >
> > > Functionally, the code works, but performance is affected.
> > >
> > > The only thing that prevents the code from working is the check in the
> > > table create function, which verifies that the size of the above structure
> > > is a multiple of 64 bytes and which caught this issue.
> >
> > Yes, and that's my point.
> > It was not working.
> > It was not tested.
> >
> >
>
> Not sure when this code was last tested on 32-bit systems, I'll let the validation folks comment on this, but I cannot rule out a change in compiler behavior either.
>
> This is a low complexity and low impact change, hence low risk IMO.

Risk is to be evaluated when there is a need.
I got pinged on this, like it was the end of the times.

Then I find something that is not worth looking at, hence I am a bit irritated.

And please, for the 2nd time, can you look at my comment below?


> > > > > diff --git a/lib/librte_table/rte_table_hash_key16.c
> > > > b/lib/librte_table/rte_table_hash_key16.c
> > > > > index 2cca1c924..c4384b114 100644
> > > > > --- a/lib/librte_table/rte_table_hash_key16.c
> > > > > +++ b/lib/librte_table/rte_table_hash_key16.c
> > > > > @@ -33,6 +33,7 @@
> > > > >
> > > > >  #endif
> > > > >
> > > > > +#ifdef RTE_ARCH_64
> > > > >  struct rte_bucket_4_16 {
> > > > >         /* Cache line 0 */
> > > > >         uint64_t signature[4 + 1];
> > > > > @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
> > > > >         /* Cache line 2 */
> > > > >         uint8_t data[0];
> > > > >  };
> > > > > +#else
> > > > > +struct rte_bucket_4_16 {
> > > > > +       /* Cache line 0 */
> > > > > +       uint64_t signature[4 + 1];
> > > > > +       uint64_t lru_list;
> > > > > +       struct rte_bucket_4_16 *next;
> > > > > +       uint32_t pad;
> > > > > +       uint64_t next_valid;
> > > > > +
> > > > > +       /* Cache line 1 */
> > > > > +       uint64_t key[4][2];
> > > > > +
> > > > > +       /* Cache line 2 */
> > > > > +       uint8_t data[0];
> > > > > +};
> > > > > +#endif
> > > >
> > > > The change could simply be:
> > > >
> > > > @@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
> > > >         uint64_t signature[4 + 1];
> > > >         uint64_t lru_list;
> > > >         struct rte_bucket_4_16 *next;
> > > > +#ifndef RTE_ARCH_64
> > > > +       uint32_t pad;
> > > > +#endif
> > > >         uint64_t next_valid;
> > > >
> > > >         /* Cache line 1 */
> > > >
> > > > It avoids duplicating the whole structure definition (we could miss
> > > > updating one side of the #ifdef later).
> > > > Idem for the other "8" and "32" structures.
> >
> >
> > What about this comment?

What about this comment?
  
Cristian Dumitrescu July 29, 2020, 2:53 p.m. UTC | #10
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, July 29, 2020 3:00 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Xu, Ting <ting.xu@intel.com>; dev <dev@dpdk.org>; dpdk stable
> <stable@dpdk.org>; Kevin Traynor <ktraynor@redhat.com>; Luca Boccassi
> <bluca@debian.org>
> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4] lib/table: fix cache
> alignment issue
> 
> On Wed, Jul 29, 2020 at 3:54 PM Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com> wrote:
> > > -----Original Message-----
> > > From: David Marchand <david.marchand@redhat.com>
> > > Sent: Wednesday, July 29, 2020 2:28 PM
> > > To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> > > Cc: Xu, Ting <ting.xu@intel.com>; dev <dev@dpdk.org>; dpdk stable
> > > <stable@dpdk.org>; Kevin Traynor <ktraynor@redhat.com>; Luca
> Boccassi
> > > <bluca@debian.org>
> > > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4] lib/table: fix cache
> > > alignment issue
> > >
> > > On Wed, Jul 29, 2020 at 3:14 PM Dumitrescu, Cristian
> > > <cristian.dumitrescu@intel.com> wrote:
> > > > > Please correct me if I am wrong, but it simply means this part of the
> > > > > table library never worked for 32-bit.
> > > > > It seems more adding 32-bit support rather than a fix and then I
> > > > > wonder if it has its place in rc3.
> > > > >
> > > >
> > > > Functionally, the code works, but performance is affected.
> > > >
> > > > The only thing that prevents the code from working is the check in the
> > > > table create function, which verifies that the size of the above
> > > > structure is a multiple of 64 bytes and which caught this issue.
> > >
> > > Yes, and that's my point.
> > > It was not working.
> > > It was not tested.
> > >
> > >
> >
> > Not sure when this code was last tested on 32-bit systems, I'll let the
> validation folks comment on this, but I cannot rule out a change in compiler
> behavior either.
> >
> > This is a low complexity and low impact change, hence low risk IMO.
> 
> Risk is to be evaluated when there is a need.
> I got pinged on this, like it was the end of the times.
> 
> Then I find something that is not worth looking at, hence I am a bit irritated.
> 

I got pinged as well, and I also had to allocate time on this patch. It probably means it is important for somebody.

> And please, for the 2nd time, can you look at my comment below?
> 
Sorry, I missed it the first time.

> 
> > > > > > diff --git a/lib/librte_table/rte_table_hash_key16.c
> > > > > b/lib/librte_table/rte_table_hash_key16.c
> > > > > > index 2cca1c924..c4384b114 100644
> > > > > > --- a/lib/librte_table/rte_table_hash_key16.c
> > > > > > +++ b/lib/librte_table/rte_table_hash_key16.c
> > > > > > @@ -33,6 +33,7 @@
> > > > > >
> > > > > >  #endif
> > > > > >
> > > > > > +#ifdef RTE_ARCH_64
> > > > > >  struct rte_bucket_4_16 {
> > > > > >         /* Cache line 0 */
> > > > > >         uint64_t signature[4 + 1];
> > > > > > @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
> > > > > >         /* Cache line 2 */
> > > > > >         uint8_t data[0];
> > > > > >  };
> > > > > > +#else
> > > > > > +struct rte_bucket_4_16 {
> > > > > > +       /* Cache line 0 */
> > > > > > +       uint64_t signature[4 + 1];
> > > > > > +       uint64_t lru_list;
> > > > > > +       struct rte_bucket_4_16 *next;
> > > > > > +       uint32_t pad;
> > > > > > +       uint64_t next_valid;
> > > > > > +
> > > > > > +       /* Cache line 1 */
> > > > > > +       uint64_t key[4][2];
> > > > > > +
> > > > > > +       /* Cache line 2 */
> > > > > > +       uint8_t data[0];
> > > > > > +};
> > > > > > +#endif
> > > > >
> > > > > The change could simply be:
> > > > >
> > > > > @@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
> > > > >         uint64_t signature[4 + 1];
> > > > >         uint64_t lru_list;
> > > > >         struct rte_bucket_4_16 *next;
> > > > > +#ifndef RTE_ARCH_64
> > > > > +       uint32_t pad;
> > > > > +#endif
> > > > >         uint64_t next_valid;
> > > > >
> > > > >         /* Cache line 1 */
> > > > >
> > > > > It avoids duplicating the whole structure definition (we could miss
> > > > > updating one side of the #ifdef later).
> > > > > Idem for the other "8" and "32" structures.
> > >
> > >
> > > What about this comment?
> 
> What about this comment?
> 

You might suspect I also thought about this option. I prefer the approach in the patch because, IMO, it makes the difference between the two layouts easier to read and understand, even though the code is slightly larger. It also leaves the 64-bit code untouched, so the 32-bit variant is easier to remove if we decide at some point to drop 32-bit support.

But I can live with the option you describe as well. Thanks for the input.

It would be great if somebody on this list could explain why the compiler did not insert the 4-byte padding automatically, which is what makes this fix necessary.
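
One likely explanation, offered as a sketch rather than a confirmed diagnosis: the i386 System V ABI, as far as I know, aligns 8-byte integers to only 4 bytes inside structures, so no hole is needed between a 4-byte pointer and the uint64_t that follows it. The small C model below replays the usual layout rules for the members of rte_bucket_4_8. The helper names and the size/alignment tables are assumptions for illustration, not DPDK code. Under these assumptions it yields 64 bytes for x86-64, 60 bytes for i386, and 64 bytes again once the explicit pad is added.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one struct member as (size, alignment). */
struct member {
	size_t size;
	size_t align;
};

/* Round offset up to the next multiple of align. */
static size_t
align_up(size_t offset, size_t align)
{
	assert(align != 0);
	return (offset + align - 1) / align * align;
}

/*
 * Model of how a C compiler lays out a struct: each member goes at the
 * next offset that satisfies its alignment, and the total size is rounded
 * up to the largest member alignment.
 */
static size_t
struct_size(const struct member *m, size_t n)
{
	size_t offset = 0, max_align = 1, i;

	for (i = 0; i < n; i++) {
		offset = align_up(offset, m[i].align) + m[i].size;
		if (m[i].align > max_align)
			max_align = m[i].align;
	}
	return align_up(offset, max_align);
}

/* Members: signature, lru_list, next, next_valid, key[4]. */
static const struct member bucket_x86_64[] = {
	{8, 8}, {8, 8}, {8, 8}, {8, 8}, {32, 8},
};

/* Assumption: i386 ABI, 4-byte pointer, uint64_t aligned to 4 in structs. */
static const struct member bucket_i386[] = {
	{8, 4}, {8, 4}, {4, 4}, {8, 4}, {32, 4},
};

/* Same as bucket_i386, with the explicit uint32_t pad after next. */
static const struct member bucket_i386_pad[] = {
	{8, 4}, {8, 4}, {4, 4}, {4, 4}, {8, 4}, {32, 4},
};
```

Calling struct_size(bucket_i386, 5) reproduces the 60-byte size that pahole reports earlier in the thread. That would also be consistent with pahole guessing a packed attribute from the offset of next_valid, although this sketch does not verify how pahole derives that annotation.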

> 
> --
> David Marchand
  
Xu, Ting July 30, 2020, 6:57 a.m. UTC | #11
Hi, all,

Thanks for your help and the additional work on this patch.
The validation team tested this case in a 32-bit environment, and there is a series of similar tests in the 32-bit environment as well, so there may be a practical need for this.
Therefore, until we formally decide to drop 32-bit support, I think this modification is OK if we cannot fix the compiler issue directly.

Shall I update the patch as David suggested to make it simpler?

> >
> > --
> > David Marchand
  
Kevin Traynor July 30, 2020, 10:35 a.m. UTC | #12
On 29/07/2020 14:28, David Marchand wrote:
> On Wed, Jul 29, 2020 at 3:14 PM Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com> wrote:
>>> Please correct me if I am wrong, but it simply means this part of the
>>> table library never worked for 32-bit.
>>> It seems more adding 32-bit support rather than a fix and then I
>>> wonder if it has its place in rc3.
>>>
>>
>> Functionally, the code works, but performance is affected.
>>
>> The only thing that prevents the code from working is the check in the table create function that checks the size of the above structure is 64 bytes, which caught this issue.
> 
> Yes, and that's my point.
> It was not working.
> It was not tested.
> 
> 
> This patch asks for backport in stable branches, I will let Kevin and
> Luca comment.
> 

It should be in master first, but then it's a fix so I think it can go
to stable if requested and supported by the table maintainer in the
event of any (future) regressions.

> 
>>
>>>
>>>
>>> Now, looking at the details:
>>>
>>> For 64-bit on my x86, we have:
>>>
>>> struct rte_bucket_4_8 {
>>>     uint64_t                   signature;            /*     0     8 */
>>>     uint64_t                   lru_list;             /*     8     8 */
>>>     struct rte_bucket_4_8 *    next;                 /*    16     8 */
>>>     uint64_t                   next_valid;           /*    24     8 */
>>>     uint64_t                   key[4];               /*    32    32 */
>>>     /* --- cacheline 1 boundary (64 bytes) --- */
>>>     uint8_t                    data[];               /*    64     0 */
>>>
>>>     /* size: 64, cachelines: 1, members: 6 */
>>> };
>>>
>>>
>>> For 32-bit, we have:
>>>
>>> struct rte_bucket_4_8 {
>>>     uint64_t                   signature;            /*     0     8 */
>>>     uint64_t                   lru_list;             /*     8     8 */
>>>     struct rte_bucket_4_8 *    next;                 /*    16     4 */
>>>     uint64_t                   next_valid;           /*    20     8 */
>>>     uint64_t                   key[4];               /*    28    32 */
>>>     uint8_t                    data[];               /*    60     0 */
>>>
>>>     /* size: 60, cachelines: 1, members: 6 */
>>>     /* last cacheline: 60 bytes */
>>> } __attribute__((__packed__));
>>>
>>> ^^ it is interesting that a packed attribute ends up here.
>>> I saw no such attribute in the library code.
>>> Compiler black magic at work I guess...
>>>
>>
>> Where do you see the packed attribute? I don't see it in the code.
> 
> That's pahole reporting this.
> Maybe the tool extrapolates this attribute based on the next_valid
> field placement... I don't know.
> 
>> A packed attribute would explain this issue, i.e. why the compiler decided not to insert the expected padding of 4 bytes right after the "next" field, which would allow the field "next_valid" to be aligned to its natural boundary of 8 bytes.
> 
> Or a 64-bit field on 32-bit has a special alignment that I am not aware of.
> 
> 
>>
>>>
>>>>
>>>> Fixes: 8aa327214c ("table: hash")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Ting Xu <ting.xu@intel.com>
>>>>
>>>> ---
>>>> v3->v4: Change design based on comment
>>>> v2->v3: Rebase
>>>> v1->v2: Correct patch time
>>>> ---
>>>>  lib/librte_table/rte_table_hash_key16.c | 17 +++++++++++++++++
>>>>  lib/librte_table/rte_table_hash_key32.c | 17 +++++++++++++++++
>>>>  lib/librte_table/rte_table_hash_key8.c  | 16 ++++++++++++++++
>>>>  3 files changed, 50 insertions(+)
>>>>
>>>> diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c
>>>> index 2cca1c924..c4384b114 100644
>>>> --- a/lib/librte_table/rte_table_hash_key16.c
>>>> +++ b/lib/librte_table/rte_table_hash_key16.c
>>>> @@ -33,6 +33,7 @@
>>>>
>>>>  #endif
>>>>
>>>> +#ifdef RTE_ARCH_64
>>>>  struct rte_bucket_4_16 {
>>>>         /* Cache line 0 */
>>>>         uint64_t signature[4 + 1];
>>>> @@ -46,6 +47,22 @@ struct rte_bucket_4_16 {
>>>>         /* Cache line 2 */
>>>>         uint8_t data[0];
>>>>  };
>>>> +#else
>>>> +struct rte_bucket_4_16 {
>>>> +       /* Cache line 0 */
>>>> +       uint64_t signature[4 + 1];
>>>> +       uint64_t lru_list;
>>>> +       struct rte_bucket_4_16 *next;
>>>> +       uint32_t pad;
>>>> +       uint64_t next_valid;
>>>> +
>>>> +       /* Cache line 1 */
>>>> +       uint64_t key[4][2];
>>>> +
>>>> +       /* Cache line 2 */
>>>> +       uint8_t data[0];
>>>> +};
>>>> +#endif
>>>
>>> The change could simply be:
>>>
>>> @@ -38,6 +38,9 @@ struct rte_bucket_4_16 {
>>>         uint64_t signature[4 + 1];
>>>         uint64_t lru_list;
>>>         struct rte_bucket_4_16 *next;
>>> +#ifndef RTE_ARCH_64
>>> +       uint32_t pad;
>>> +#endif
>>>         uint64_t next_valid;
>>>
>>>         /* Cache line 1 */
>>>
>>> It avoids duplicating the whole structure definition (we could miss
>>> updating one side of the #ifdef later).
>>> Idem for the other "8" and "32" structures.
> 
> 
> What about this comment?
> 
>
  
Xu, Ting Sept. 9, 2020, 6:18 a.m. UTC | #13
Hi, All

Sorry to bother you again. Since the next release is coming and this patch has been deferred for some time, I'd like to know whether we can go ahead and merge it.

What is the key issue that blocks us? Thanks!

Best Regards,
Xu Ting

  
David Marchand Sept. 15, 2020, 8:03 a.m. UTC | #14
On Wed, Sep 9, 2020 at 8:19 AM Xu, Ting <ting.xu@intel.com> wrote:
> Sorry to bother you again. Since the next release is coming, and this patch is deferred for some time, I'd like to know that shall we continue to merge it?
>
> What is the key issue that blocks us? Thanks!

Afaics, Kevin's email did not get a reply, so I understand this as an
implicit ack from the table maintainers.
I will take this patch as is (i.e. with the request for backport) in rc1.
  
Xu, Ting Oct. 14, 2020, 8:26 a.m. UTC | #15
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, September 15, 2020 4:03 PM
> To: Xu, Ting <ting.xu@intel.com>
> Cc: Kevin Traynor <ktraynor@redhat.com>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>; dev <dev@dpdk.org>; dpdk stable
> <stable@dpdk.org>; Luca Boccassi <bluca@debian.org>
> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4] lib/table: fix cache alignment
> issue
> 
> On Wed, Sep 9, 2020 at 8:19 AM Xu, Ting <ting.xu@intel.com> wrote:
> > Sorry to bother you again. Since the next release is coming, and this patch is
> > deferred for some time, I'd like to know that shall we continue to merge it?
> >
> > What is the key issue that blocks us? Thanks!
> 
> Afaics, Kevin email did not get a reply, so I understand this as an implicit ack
> from the table maintainers.
> I will take this patch as is (i.e. with the request for backport) in rc1.
> 

A kind reminder that RC1 is coming soon. If there are no additional comments, could this be merged?

> 
> --
> David Marchand
  
David Marchand Oct. 14, 2020, 1:53 p.m. UTC | #16
On Wed, Jul 22, 2020 at 4:13 AM Ting Xu <ting.xu@intel.com> wrote:
>
> When create softnic hash table with 16 keys, it failed on 32-bit
> environment, because the pointer field in structure rte_bucket_4_16
> is only 32 bits. Add a padding field in 32-bit environment to keep
> the structure to a multiple of 64 bytes. Apply this to 8-byte and
> 32-byte key hash function as well.
>
> Fixes: 8aa327214c ("table: hash")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

Applied, thanks.
  

Patch

diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c
index 2cca1c924..c4384b114 100644
--- a/lib/librte_table/rte_table_hash_key16.c
+++ b/lib/librte_table/rte_table_hash_key16.c
@@ -33,6 +33,7 @@ 
 
 #endif
 
+#ifdef RTE_ARCH_64
 struct rte_bucket_4_16 {
 	/* Cache line 0 */
 	uint64_t signature[4 + 1];
@@ -46,6 +47,22 @@  struct rte_bucket_4_16 {
 	/* Cache line 2 */
 	uint8_t data[0];
 };
+#else
+struct rte_bucket_4_16 {
+	/* Cache line 0 */
+	uint64_t signature[4 + 1];
+	uint64_t lru_list;
+	struct rte_bucket_4_16 *next;
+	uint32_t pad;
+	uint64_t next_valid;
+
+	/* Cache line 1 */
+	uint64_t key[4][2];
+
+	/* Cache line 2 */
+	uint8_t data[0];
+};
+#endif
 
 struct rte_table_hash {
 	struct rte_table_stats stats;
diff --git a/lib/librte_table/rte_table_hash_key32.c b/lib/librte_table/rte_table_hash_key32.c
index a137c5028..3e0031fe1 100644
--- a/lib/librte_table/rte_table_hash_key32.c
+++ b/lib/librte_table/rte_table_hash_key32.c
@@ -33,6 +33,7 @@ 
 
 #endif
 
+#ifdef RTE_ARCH_64
 struct rte_bucket_4_32 {
 	/* Cache line 0 */
 	uint64_t signature[4 + 1];
@@ -46,6 +47,22 @@  struct rte_bucket_4_32 {
 	/* Cache line 3 */
 	uint8_t data[0];
 };
+#else
+struct rte_bucket_4_32 {
+	/* Cache line 0 */
+	uint64_t signature[4 + 1];
+	uint64_t lru_list;
+	struct rte_bucket_4_32 *next;
+	uint32_t pad;
+	uint64_t next_valid;
+
+	/* Cache lines 1 and 2 */
+	uint64_t key[4][4];
+
+	/* Cache line 3 */
+	uint8_t data[0];
+};
+#endif
 
 struct rte_table_hash {
 	struct rte_table_stats stats;
diff --git a/lib/librte_table/rte_table_hash_key8.c b/lib/librte_table/rte_table_hash_key8.c
index 1811ad8d0..34e3ed1af 100644
--- a/lib/librte_table/rte_table_hash_key8.c
+++ b/lib/librte_table/rte_table_hash_key8.c
@@ -31,6 +31,7 @@ 
 
 #endif
 
+#ifdef RTE_ARCH_64
 struct rte_bucket_4_8 {
 	/* Cache line 0 */
 	uint64_t signature;
@@ -43,6 +44,21 @@  struct rte_bucket_4_8 {
 	/* Cache line 1 */
 	uint8_t data[0];
 };
+#else
+struct rte_bucket_4_8 {
+	/* Cache line 0 */
+	uint64_t signature;
+	uint64_t lru_list;
+	struct rte_bucket_4_8 *next;
+	uint32_t pad;
+	uint64_t next_valid;
+
+	uint64_t key[4];
+
+	/* Cache line 1 */
+	uint8_t data[0];
+};
+#endif
 
 struct rte_table_hash {
 	struct rte_table_stats stats;
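
For comparison, the single-definition variant that David suggested in the thread could look like the sketch below. It is not what was merged. The sketch_ names and the SKETCH_ARCH_64 macro (a stand-in for RTE_ARCH_64, derived here from the pointer width) are assumptions so that the snippet compiles outside DPDK. The compile-time assertions are an addition for illustration; they mirror the size check that, according to the thread, the table create function performs at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for DPDK's RTE_ARCH_64, based on pointer width. */
#if UINTPTR_MAX > 0xffffffffu
#define SKETCH_ARCH_64 1
#endif

/* Hypothetical copy of rte_bucket_4_16 with a conditional pad field. */
struct sketch_bucket_4_16 {
	/* Cache line 0 */
	uint64_t signature[4 + 1];
	uint64_t lru_list;
	struct sketch_bucket_4_16 *next;
#ifndef SKETCH_ARCH_64
	uint32_t pad; /* fills the 4 bytes a 64-bit pointer would occupy */
#endif
	uint64_t next_valid;

	/* Cache line 1 */
	uint64_t key[4][2];

	/* Cache line 2 */
	uint8_t data[];
};

/* Fail the build, rather than table creation, if the layout drifts. */
static_assert(sizeof(struct sketch_bucket_4_16) % 64 == 0,
	"bucket size must be a multiple of the 64-byte cache line");
static_assert(offsetof(struct sketch_bucket_4_16, key) == 64,
	"keys must start on cache line 1");
```

The trade-off discussed in the thread still applies: this form avoids two definitions that can drift apart, while the merged patch keeps the 64-bit definition untouched and the 32-bit variant easy to delete later.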