[v5] eal: use memzone to share tsc hz with secondary processes
Checks
Commit Message
Ideally, get_tsc_freq_arch() is able to provide the
TSC rate using arch-specific means. When that is not
possible, DPDK reverts to calculating the TSC rate with
a 100ms nanosleep or 1s sleep. The latter occurs more
frequently in VMs which often do not have access to the
data they need from arch-specific means (CPUID leaf 0x15
or MSR 0xCE on x86).
In secondary processes, the extra 100ms is especially
noticeable and consumes the bulk of rte_eal_init()
execution time. To resolve this extra delay, have
the primary process put the TSC rate into a shared
memory region that the secondary process can lookup.
Reduces rte_eal_init() execution time in a secondary
process from 165ms to 66ms on my test system.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
---
lib/librte_eal/common/eal_common_timer.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
Comments
On Mon, Aug 26, 2019 at 06:44:46AM -0700, Jim Harris wrote:
> Ideally, get_tsc_freq_arch() is able to provide the
> TSC rate using arch-specific means. When that is not
> possible, DPDK reverts to calculating the TSC rate with
> a 100ms nanosleep or 1s sleep. The latter occurs more
> frequently in VMs which often do not have access to the
> data they need from arch-specific means (CPUID leaf 0x15
> or MSR 0xCE on x86).
>
> In secondary processes, the extra 100ms is especially
> noticeable and consumes the bulk of rte_eal_init()
> execution time. To resolve this extra delay, have
> the primary process put the TSC rate into a shared
> memory region that the secondary process can lookup.
>
> Reduces rte_eal_init() execution time in a secondary
> process from 165ms to 66ms on my test system.
>
> Signed-off-by: Jim Harris <james.r.harris@intel.com>
> ---
I think using shared memory is a lot simpler to manage, so LGTM.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
On 26-Aug-19 2:44 PM, Jim Harris wrote:
> Ideally, get_tsc_freq_arch() is able to provide the
> TSC rate using arch-specific means. When that is not
> possible, DPDK reverts to calculating the TSC rate with
> a 100ms nanosleep or 1s sleep. The latter occurs more
> frequently in VMs which often do not have access to the
> data they need from arch-specific means (CPUID leaf 0x15
> or MSR 0xCE on x86).
>
> In secondary processes, the extra 100ms is especially
> noticeable and consumes the bulk of rte_eal_init()
> execution time. To resolve this extra delay, have
> the primary process put the TSC rate into a shared
> memory region that the secondary process can lookup.
>
> Reduces rte_eal_init() execution time in a secondary
> process from 165ms to 66ms on my test system.
>
> Signed-off-by: Jim Harris <james.r.harris@intel.com>
> ---
I think this is a bad idea. If you're allocating something, you're
supposed to free it in rte_eal_cleanup(). If you don't, that memory
leaks (i.e. there are leftover hugepages after process exit). Since both
primary and secondary are referencing it (even if only at init), there
is no safe way to free this memzone.
On Tue, Aug 27, 2019 at 01:04:18PM +0100, Burakov, Anatoly wrote:
> On 26-Aug-19 2:44 PM, Jim Harris wrote:
> > Ideally, get_tsc_freq_arch() is able to provide the
> > TSC rate using arch-specific means. When that is not
> > possible, DPDK reverts to calculating the TSC rate with
> > a 100ms nanosleep or 1s sleep. The latter occurs more
> > frequently in VMs which often do not have access to the
> > data they need from arch-specific means (CPUID leaf 0x15
> > or MSR 0xCE on x86).
> >
> > In secondary processes, the extra 100ms is especially
> > noticeable and consumes the bulk of rte_eal_init()
> > execution time. To resolve this extra delay, have
> > the primary process put the TSC rate into a shared
> > memory region that the secondary process can lookup.
> >
> > Reduces rte_eal_init() execution time in a secondary
> > process from 165ms to 66ms on my test system.
> >
> > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > ---
>
> I think this is a bad idea. If you're allocating something, you're supposed
> to free it in rte_eal_cleanup(). If you don't, that memory leaks (i.e. there
> are leftover hugepages after process exit). Since both primary and secondary
> are referencing it (even if only at init), there is no safe way to free this
> memzone.
>
What is the issue with not freeing it? How do we handle this for other
global data to be shared?
On Tue, Aug 27, 2019 at 01:04:18PM +0100, Burakov, Anatoly wrote:
> On 26-Aug-19 2:44 PM, Jim Harris wrote:
> > Ideally, get_tsc_freq_arch() is able to provide the
> > TSC rate using arch-specific means. When that is not
> > possible, DPDK reverts to calculating the TSC rate with
> > a 100ms nanosleep or 1s sleep. The latter occurs more
> > frequently in VMs which often do not have access to the
> > data they need from arch-specific means (CPUID leaf 0x15
> > or MSR 0xCE on x86).
> >
> > In secondary processes, the extra 100ms is especially
> > noticeable and consumes the bulk of rte_eal_init()
> > execution time. To resolve this extra delay, have
> > the primary process put the TSC rate into a shared
> > memory region that the secondary process can lookup.
> >
> > Reduces rte_eal_init() execution time in a secondary
> > process from 165ms to 66ms on my test system.
> >
> > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > ---
>
> I think this is a bad idea. If you're allocating something, you're supposed
> to free it in rte_eal_cleanup(). If you don't, that memory leaks (i.e. there
> are leftover hugepages after process exit). Since both primary and secondary
> are referencing it (even if only at init), there is no safe way to free this
> memzone.
>
Actually, after looking at the code again, the secondary processes only
reference the memzone once on init, and fallback to the current path if the
memzone does not exist. Therefore, it should be safe to remove the memzone
on termination of the primary process.
On 27-Aug-19 1:48 PM, Bruce Richardson wrote:
> On Tue, Aug 27, 2019 at 01:04:18PM +0100, Burakov, Anatoly wrote:
>> On 26-Aug-19 2:44 PM, Jim Harris wrote:
>>> Ideally, get_tsc_freq_arch() is able to provide the
>>> TSC rate using arch-specific means. When that is not
>>> possible, DPDK reverts to calculating the TSC rate with
>>> a 100ms nanosleep or 1s sleep. The latter occurs more
>>> frequently in VMs which often do not have access to the
>>> data they need from arch-specific means (CPUID leaf 0x15
>>> or MSR 0xCE on x86).
>>>
>>> In secondary processes, the extra 100ms is especially
>>> noticeable and consumes the bulk of rte_eal_init()
>>> execution time. To resolve this extra delay, have
>>> the primary process put the TSC rate into a shared
>>> memory region that the secondary process can lookup.
>>>
>>> Reduces rte_eal_init() execution time in a secondary
>>> process from 165ms to 66ms on my test system.
>>>
>>> Signed-off-by: Jim Harris <james.r.harris@intel.com>
>>> ---
>>
>> I think this is a bad idea. If you're allocating something, you're supposed
>> to free it in rte_eal_cleanup(). If you don't, that memory leaks (i.e. there
>> are leftover hugepages after process exit). Since both primary and secondary
>> are referencing it (even if only at init), there is no safe way to free this
>> memzone.
>>
> What is the issue with not freeing it? How do we handle this for other
> global data to be shared?
>
I don't think we have any shared data that persists after process exit.
Or at least not in EAL we don't - i'm aware that some drivers leak
memory in that way, but that's on them, not on me :) The immediate
consequence of such an approach is failed EAL flags unit tests (because
they expect that nothing is left after DPDK process termination).
Most shared data in libraries is explicitly initialized/deinitialized by
the library user, so it's not an issue. An exception to that is the
timer library, which uses refcounting to initialize/deinitialize shared
state.
@@ -15,9 +15,12 @@
#include <rte_log.h>
#include <rte_cycles.h>
#include <rte_pause.h>
+#include <rte_memzone.h>
#include "eal_private.h"
+static const char *MZ_RTE_TSC_FREQ = "rte_tsc_freq";
+
/* The frequency of the RDTSC timer resolution */
static uint64_t eal_tsc_resolution_hz;
@@ -77,9 +80,16 @@ estimate_tsc_freq(void)
void
set_tsc_freq(void)
{
- uint64_t freq;
+ const struct rte_memzone *mz;
+ uint64_t freq = 0;
- freq = get_tsc_freq_arch();
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ mz = rte_memzone_lookup(MZ_RTE_TSC_FREQ);
+ if (mz != NULL)
+ freq = *(uint64_t *)mz->addr;
+ }
+ if (!freq)
+ freq = get_tsc_freq_arch();
if (!freq)
freq = get_tsc_freq();
if (!freq)
@@ -87,6 +97,12 @@ set_tsc_freq(void)
RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
eal_tsc_resolution_hz = freq;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ mz = rte_memzone_reserve(MZ_RTE_TSC_FREQ, sizeof(uint64_t),
+ SOCKET_ID_ANY, 0);
+ if (mz != NULL)
+ *(uint64_t *)mz->addr = freq;
+ }
}
void rte_delay_us_callback_register(void (*userfunc)(unsigned int))