Message ID | 20210325082125.37488-1-xiangxia.m.yue@gmail.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Thomas Monjalon |
Series | eal/linux: add operation LOCK_NB to flock() |

Context | Check | Description |
---|---|---|
ci/intel-Testing | success | Testing PASS |
ci/Intel-compilation | success | Compilation OK |
ci/iol-mellanox-Performance | success | Performance Testing PASS |
ci/github-robot | success | github build: passed |
ci/travis-robot | success | travis build: passed |
ci/iol-testing | success | Testing PASS |
ci/iol-abi-testing | success | Testing PASS |
ci/iol-intel-Performance | success | Performance Testing PASS |
ci/checkpatch | success | coding style OK |
On Thu, Mar 25, 2021 at 4:25 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

ping

> The hugepage of different size, 2MB, 1GB may be mounted on
> the same directory (e.g /dev/hugepages). Then dpdk
> primary process will be blocked. To address this issue,
> add the LOCK_NB flags to flock().
>
> $ cat /proc/mounts
> ...
> none /dev/hugepages hugetlbfs rw,seclabel,relatime,pagesize=1024M 0 0
> none /dev/hugepages hugetlbfs rw,seclabel,relatime,pagesize=2M 0 0
>
> Add more details for err logs.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>  lib/librte_eal/linux/eal_hugepage_info.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal_hugepage_info.c b/lib/librte_eal/linux/eal_hugepage_info.c
> index d97792cadeb6..1ff76e539053 100644
> --- a/lib/librte_eal/linux/eal_hugepage_info.c
> +++ b/lib/librte_eal/linux/eal_hugepage_info.c
> @@ -451,9 +451,12 @@ hugepage_info_init(void)
>  		hpi->lock_descriptor = open(hpi->hugedir, O_RDONLY);
>
>  		/* if blocking lock failed */
> -		if (flock(hpi->lock_descriptor, LOCK_EX) == -1) {
> +		if (flock(hpi->lock_descriptor, LOCK_EX | LOCK_NB) == -1) {
>  			RTE_LOG(CRIT, EAL,
> -				"Failed to lock hugepage directory!\n");
> +				"Failed to lock hugepage directory! "
> +				"The hugepage dir (%s) was locked by "
> +				"other processes or self twice.\n",
> +				hpi->hugedir);
>  			break;
>  		}
>  		/* clear out the hugepages dir from unused pages */
> --
> 2.27.0
>
On 25-Mar-21 8:21 AM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> The hugepage of different size, 2MB, 1GB may be mounted on
> the same directory (e.g /dev/hugepages). Then dpdk
> primary process will be blocked. To address this issue,
> add the LOCK_NB flags to flock().
>
> $ cat /proc/mounts
> ...
> none /dev/hugepages hugetlbfs rw,seclabel,relatime,pagesize=1024M 0 0
> none /dev/hugepages hugetlbfs rw,seclabel,relatime,pagesize=2M 0 0
>
> Add more details for err logs.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>  lib/librte_eal/linux/eal_hugepage_info.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal_hugepage_info.c b/lib/librte_eal/linux/eal_hugepage_info.c
> index d97792cadeb6..1ff76e539053 100644
> --- a/lib/librte_eal/linux/eal_hugepage_info.c
> +++ b/lib/librte_eal/linux/eal_hugepage_info.c
> @@ -451,9 +451,12 @@ hugepage_info_init(void)
>  		hpi->lock_descriptor = open(hpi->hugedir, O_RDONLY);
>
>  		/* if blocking lock failed */
> -		if (flock(hpi->lock_descriptor, LOCK_EX) == -1) {
> +		if (flock(hpi->lock_descriptor, LOCK_EX | LOCK_NB) == -1) {
>  			RTE_LOG(CRIT, EAL,
> -				"Failed to lock hugepage directory!\n");
> +				"Failed to lock hugepage directory! "
> +				"The hugepage dir (%s) was locked by "
> +				"other processes or self twice.\n",
> +				hpi->hugedir);
>  			break;
>  		}
>  		/* clear out the hugepages dir from unused pages */
>

Use cases such as "having two hugetlbfs page sizes on the same hugetlbfs mountpoint" are user error, but I agree that deadlocking is probably not the way we want to go about it.

An alternative would be to check whether we already have a mountpoint with the same path. This would produce a better error message (as a user, "hugepage dir is locked by self twice" doesn't tell me anything useful), at the cost of slightly more complicated code. I'm not sure which way I want to go here.

Normally, hugetlbfs shouldn't stay locked for long, so I'm wary of adding LOCK_NB here and feel slightly uneasy about this patch. Do you have any opinions? Also, do other OSes' EALs need a similar fix?
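
For reference, the alternative raised in the review (detecting a hugetlbfs mount path that appears more than once) could look roughly like the sketch below. This is not DPDK code: `find_duplicate_hugetlbfs_dir` and `MAX_HUGE_MOUNTS` are hypothetical names, and the parser assumes the usual /proc/mounts field order (device, mount point, filesystem type, options) with no whitespace inside the mount path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define MAX_HUGE_MOUNTS 16 /* illustrative limit, not a DPDK constant */

/*
 * Hypothetical helper: scan a /proc/mounts-style stream and report the
 * first hugetlbfs mount point that is listed more than once.
 * Returns 1 and copies the path into dup if one is found, 0 otherwise.
 */
int
find_duplicate_hugetlbfs_dir(FILE *mounts, char *dup, size_t dup_len)
{
	char seen[MAX_HUGE_MOUNTS][PATH_MAX];
	char line[PATH_MAX * 2];
	char dir[PATH_MAX], fstype[64];
	int n_seen = 0;

	assert(mounts != NULL && dup != NULL && dup_len > 0);

	while (fgets(line, sizeof(line), mounts) != NULL) {
		/* skip the device field, keep mount point and fs type */
		if (sscanf(line, "%*s %4095s %63s", dir, fstype) != 2)
			continue;
		if (strcmp(fstype, "hugetlbfs") != 0)
			continue;

		for (int i = 0; i < n_seen; i++) {
			if (strcmp(seen[i], dir) == 0) {
				snprintf(dup, dup_len, "%s", dir);
				return 1;
			}
		}
		/* remember this mount point for later comparisons */
		if (n_seen < MAX_HUGE_MOUNTS)
			snprintf(seen[n_seen++], PATH_MAX, "%s", dir);
	}
	return 0;
}
```

A caller would pass `fopen("/proc/mounts", "r")` and, on a return value of 1, log the duplicated path before attempting any flock(). With the two /dev/hugepages entries from the commit message, the function reports /dev/hugepages.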
diff --git a/lib/librte_eal/linux/eal_hugepage_info.c b/lib/librte_eal/linux/eal_hugepage_info.c
index d97792cadeb6..1ff76e539053 100644
--- a/lib/librte_eal/linux/eal_hugepage_info.c
+++ b/lib/librte_eal/linux/eal_hugepage_info.c
@@ -451,9 +451,12 @@ hugepage_info_init(void)
 		hpi->lock_descriptor = open(hpi->hugedir, O_RDONLY);
 
 		/* if blocking lock failed */
-		if (flock(hpi->lock_descriptor, LOCK_EX) == -1) {
+		if (flock(hpi->lock_descriptor, LOCK_EX | LOCK_NB) == -1) {
 			RTE_LOG(CRIT, EAL,
-				"Failed to lock hugepage directory!\n");
+				"Failed to lock hugepage directory! "
+				"The hugepage dir (%s) was locked by "
+				"other processes or self twice.\n",
+				hpi->hugedir);
 			break;
 		}
 		/* clear out the hugepages dir from unused pages */
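
The behaviour this patch relies on can be reproduced outside DPDK. On Linux, flock() locks belong to the open file description, so two separate open() calls on the same directory conflict with each other even inside a single process. The following is a minimal sketch under that assumption; `second_lock_would_block` and the temporary directory name are hypothetical and only stand in for a hugepage directory that is mounted twice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of DPDK.
 * Locks a fresh temporary directory, then tries to lock it again through
 * a second open(). Returns 1 if the second, non-blocking attempt fails
 * with EWOULDBLOCK, 0 if it unexpectedly succeeds, -1 on any other error.
 */
int
second_lock_would_block(void)
{
	char dir[] = "/tmp/flock_demo_XXXXXX";
	int fd1 = -1, fd2 = -1, ret = -1;

	if (mkdtemp(dir) == NULL)
		return -1;

	fd1 = open(dir, O_RDONLY);
	fd2 = open(dir, O_RDONLY);
	if (fd1 < 0 || fd2 < 0)
		goto out;

	/* first lock: nobody else holds it, so this succeeds at once */
	if (flock(fd1, LOCK_EX | LOCK_NB) == -1)
		goto out;

	/*
	 * Second lock on the same directory via a different descriptor.
	 * With plain LOCK_EX this call would wait until fd1 is unlocked,
	 * which never happens in this single-threaded function.
	 */
	if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
		ret = 0;
	else if (errno == EWOULDBLOCK)
		ret = 1;
out:
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	rmdir(dir);
	return ret;
}
```

Note that EWOULDBLOCK alone does not say who holds the lock: another process and the caller's own earlier descriptor look the same, which is why the new log message has to mention both possibilities.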