[v5] eal: use memzone to share tsc hz with secondary processes

Message ID 156682708634.28714.543470193614987025.stgit@jrharri1-skx (mailing list archive)
State Superseded, archived
Headers
Series [v5] eal: use memzone to share tsc hz with secondary processes |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-Compile-Testing success Compile Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS
ci/Intel-compilation success Compilation OK

Commit Message

Harris, James R Aug. 26, 2019, 1:44 p.m. UTC
  Ideally, get_tsc_freq_arch() is able to provide the
TSC rate using arch-specific means.  When that is not
possible, DPDK reverts to calculating the TSC rate with
a 100ms nanosleep or 1s sleep.  The latter occurs more
frequently in VMs which often do not have access to the
data they need from arch-specific means (CPUID leaf 0x15
or MSR 0xCE on x86).

In secondary processes, the extra 100ms is especially
noticeable and consumes the bulk of rte_eal_init()
execution time.  To resolve this extra delay, have
the primary process put the TSC rate into a shared
memory region that the secondary process can lookup.

Reduces rte_eal_init() execution time in a secondary
process from 165ms to 66ms on my test system.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
---
 lib/librte_eal/common/eal_common_timer.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
  

Comments

Bruce Richardson Aug. 27, 2019, 8:14 a.m. UTC | #1
On Mon, Aug 26, 2019 at 06:44:46AM -0700, Jim Harris wrote:
> Ideally, get_tsc_freq_arch() is able to provide the
> TSC rate using arch-specific means.  When that is not
> possible, DPDK reverts to calculating the TSC rate with
> a 100ms nanosleep or 1s sleep.  The latter occurs more
> frequently in VMs which often do not have access to the
> data they need from arch-specific means (CPUID leaf 0x15
> or MSR 0xCE on x86).
> 
> In secondary processes, the extra 100ms is especially
> noticeable and consumes the bulk of rte_eal_init()
> execution time.  To resolve this extra delay, have
> the primary process put the TSC rate into a shared
> memory region that the secondary process can lookup.
> 
> Reduces rte_eal_init() execution time in a secondary
> process from 165ms to 66ms on my test system.
> 
> Signed-off-by: Jim Harris <james.r.harris@intel.com>
> ---

I think using shared memory is a lot simpler to manage, so LGTM.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Burakov, Anatoly Aug. 27, 2019, 12:04 p.m. UTC | #2
On 26-Aug-19 2:44 PM, Jim Harris wrote:
> Ideally, get_tsc_freq_arch() is able to provide the
> TSC rate using arch-specific means.  When that is not
> possible, DPDK reverts to calculating the TSC rate with
> a 100ms nanosleep or 1s sleep.  The latter occurs more
> frequently in VMs which often do not have access to the
> data they need from arch-specific means (CPUID leaf 0x15
> or MSR 0xCE on x86).
> 
> In secondary processes, the extra 100ms is especially
> noticeable and consumes the bulk of rte_eal_init()
> execution time.  To resolve this extra delay, have
> the primary process put the TSC rate into a shared
> memory region that the secondary process can lookup.
> 
> Reduces rte_eal_init() execution time in a secondary
> process from 165ms to 66ms on my test system.
> 
> Signed-off-by: Jim Harris <james.r.harris@intel.com>
> ---

I think this is a bad idea. If you're allocating something, you're 
supposed to free it in rte_eal_cleanup(). If you don't, that memory 
leaks (i.e. there are leftover hugepages after process exit). Since both 
primary and secondary are referencing it (even if only at init), there 
is no safe way to free this memzone.
  
Bruce Richardson Aug. 27, 2019, 12:48 p.m. UTC | #3
On Tue, Aug 27, 2019 at 01:04:18PM +0100, Burakov, Anatoly wrote:
> On 26-Aug-19 2:44 PM, Jim Harris wrote:
> > Ideally, get_tsc_freq_arch() is able to provide the
> > TSC rate using arch-specific means.  When that is not
> > possible, DPDK reverts to calculating the TSC rate with
> > a 100ms nanosleep or 1s sleep.  The latter occurs more
> > frequently in VMs which often do not have access to the
> > data they need from arch-specific means (CPUID leaf 0x15
> > or MSR 0xCE on x86).
> > 
> > In secondary processes, the extra 100ms is especially
> > noticeable and consumes the bulk of rte_eal_init()
> > execution time.  To resolve this extra delay, have
> > the primary process put the TSC rate into a shared
> > memory region that the secondary process can lookup.
> > 
> > Reduces rte_eal_init() execution time in a secondary
> > process from 165ms to 66ms on my test system.
> > 
> > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > ---
> 
> I think this is a bad idea. If you're allocating something, you're supposed
> to free it in rte_eal_cleanup(). If you don't, that memory leaks (i.e. there
> are leftover hugepages after process exit). Since both primary and secondary
> are referencing it (even if only at init), there is no safe way to free this
> memzone.
> 
What is the issue with not freeing it? How do we handle this for other
global data to be shared?
  
Bruce Richardson Aug. 27, 2019, 2 p.m. UTC | #4
On Tue, Aug 27, 2019 at 01:04:18PM +0100, Burakov, Anatoly wrote:
> On 26-Aug-19 2:44 PM, Jim Harris wrote:
> > Ideally, get_tsc_freq_arch() is able to provide the
> > TSC rate using arch-specific means.  When that is not
> > possible, DPDK reverts to calculating the TSC rate with
> > a 100ms nanosleep or 1s sleep.  The latter occurs more
> > frequently in VMs which often do not have access to the
> > data they need from arch-specific means (CPUID leaf 0x15
> > or MSR 0xCE on x86).
> > 
> > In secondary processes, the extra 100ms is especially
> > noticeable and consumes the bulk of rte_eal_init()
> > execution time.  To resolve this extra delay, have
> > the primary process put the TSC rate into a shared
> > memory region that the secondary process can lookup.
> > 
> > Reduces rte_eal_init() execution time in a secondary
> > process from 165ms to 66ms on my test system.
> > 
> > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > ---
> 
> I think this is a bad idea. If you're allocating something, you're supposed
> to free it in rte_eal_cleanup(). If you don't, that memory leaks (i.e. there
> are leftover hugepages after process exit). Since both primary and secondary
> are referencing it (even if only at init), there is no safe way to free this
> memzone.
> 
Actually, after looking at the code again, the secondary processes only
reference the memzone once on init, and fallback to the current path if the
memzone does not exist. Therefore, it should be safe to remove the memzone
on termination of the primary process.
  
Burakov, Anatoly Aug. 28, 2019, 9:01 a.m. UTC | #5
On 27-Aug-19 1:48 PM, Bruce Richardson wrote:
> On Tue, Aug 27, 2019 at 01:04:18PM +0100, Burakov, Anatoly wrote:
>> On 26-Aug-19 2:44 PM, Jim Harris wrote:
>>> Ideally, get_tsc_freq_arch() is able to provide the
>>> TSC rate using arch-specific means.  When that is not
>>> possible, DPDK reverts to calculating the TSC rate with
>>> a 100ms nanosleep or 1s sleep.  The latter occurs more
>>> frequently in VMs which often do not have access to the
>>> data they need from arch-specific means (CPUID leaf 0x15
>>> or MSR 0xCE on x86).
>>>
>>> In secondary processes, the extra 100ms is especially
>>> noticeable and consumes the bulk of rte_eal_init()
>>> execution time.  To resolve this extra delay, have
>>> the primary process put the TSC rate into a shared
>>> memory region that the secondary process can lookup.
>>>
>>> Reduces rte_eal_init() execution time in a secondary
>>> process from 165ms to 66ms on my test system.
>>>
>>> Signed-off-by: Jim Harris <james.r.harris@intel.com>
>>> ---
>>
>> I think this is a bad idea. If you're allocating something, you're supposed
>> to free it in rte_eal_cleanup(). If you don't, that memory leaks (i.e. there
>> are leftover hugepages after process exit). Since both primary and secondary
>> are referencing it (even if only at init), there is no safe way to free this
>> memzone.
>>
> What is the issue with not freeing it? How do we handle this for other
> global data to be shared?
> 

I don't think we have any shared data that persists after process exit. 
Or at least not in EAL we don't - i'm aware that some drivers leak 
memory in that way, but that's on them, not on me :) The immediate 
consequence of such an approach is failed EAL flags unit tests (because 
they expect that nothing is left after DPDK process termination).

Most shared data in libraries is explicitly initialized/deinitialized by 
the library user, so it's not an issue. An exception to that is the 
timer library, which uses refcounting to initialize/deinitialize shared 
state.
  

Patch

diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
index 145543de7..2aeab6462 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -15,9 +15,12 @@ 
 #include <rte_log.h>
 #include <rte_cycles.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
 
 #include "eal_private.h"
 
+static const char *MZ_RTE_TSC_FREQ = "rte_tsc_freq";
+
 /* The frequency of the RDTSC timer resolution */
 static uint64_t eal_tsc_resolution_hz;
 
@@ -77,9 +80,16 @@  estimate_tsc_freq(void)
 void
 set_tsc_freq(void)
 {
-	uint64_t freq;
+	const struct rte_memzone *mz;
+	uint64_t freq = 0;
 
-	freq = get_tsc_freq_arch();
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		mz = rte_memzone_lookup(MZ_RTE_TSC_FREQ);
+		if (mz != NULL)
+			freq = *(uint64_t *)mz->addr;
+	}
+	if (!freq)
+		freq = get_tsc_freq_arch();
 	if (!freq)
 		freq = get_tsc_freq();
 	if (!freq)
@@ -87,6 +97,12 @@  set_tsc_freq(void)
 
 	RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
 	eal_tsc_resolution_hz = freq;
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mz = rte_memzone_reserve(MZ_RTE_TSC_FREQ, sizeof(uint64_t),
+					 SOCKET_ID_ANY, 0);
+		if (mz != NULL)
+			*(uint64_t *)mz->addr = freq;
+	}
 }
 
 void rte_delay_us_callback_register(void (*userfunc)(unsigned int))