Message ID | 20200901165643.15668-1-stephen@networkplumber.org (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Thomas Monjalon |
Series | usertools: add huge page setup script |
Context | Check | Description |
---|---|---|
ci/checkpatch | success | coding style OK |
ci/Intel-compilation | success | Compilation OK |
ci/travis-robot | success | Travis build: passed |
ci/iol-mellanox-Performance | success | Performance Testing PASS |
ci/iol-testing | fail | Testing issues |
ci/iol-intel-Performance | success | Performance Testing PASS |
ci/iol-intel-Functional | success | Functional Testing PASS |
On 9/1/2020 5:56 PM, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
>    * it autodetects NUMA vs non NUMA
>    * it allows setting different page sizes
>      recent kernels support multiple sizes.
>    * it accepts a parameter in bytes (not pages).
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> This is lightly tested, it still needs testing on multiple architectures
> etc.
>

Thanks. It can be useful to have options to display current hugepage settings and remove the allocation.

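As a rough illustration of the "display current hugepage settings" option suggested above, the following is a minimal sketch and not part of the posted patch. The function name `show_hugepages` and its optional directory argument are assumptions made for illustration; the `nr_hugepages` and `free_hugepages` files are the per-size counters the kernel exposes under `/sys/kernel/mm/hugepages`.

```shell
#!/bin/sh
# Hypothetical helper, not part of the posted patch: print reserved and free
# huge pages for each page size found under a sysfs-style directory.
show_hugepages()
{
    # The root is overridable so the function can be tried on a test tree.
    root="${1:-/sys/kernel/mm/hugepages}"

    for dir in "$root"/hugepages-*kB; do
        [ -d "$dir" ] || continue        # glob did not match: nothing to report
        size="${dir##*/hugepages-}"      # e.g. "2048kB"
        total=$(cat "$dir/nr_hugepages")
        free=$(cat "$dir/free_hugepages")
        echo "$size: total=$total free=$free"
    done
}
```

Calling `show_hugepages` with no argument reads the system-wide counters; per-node counters under `/sys/devices/system/node/node*/hugepages` could be listed by passing that directory instead.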
On Tue, Sep 01, 2020 at 09:56:43AM -0700, Stephen Hemminger wrote:
> This is an improved version of the setup of huge pages
> bases on earlier DPDK setup. Differences are:
>    * it autodetects NUMA vs non NUMA
>    * it allows setting different page sizes
>      recent kernels support multiple sizes.
>    * it accepts a parameter in bytes (not pages).
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> This is lightly tested, it still needs testing on multiple architectures
> etc.
>
>  usertools/hugepage-setup.sh | 169 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 169 insertions(+)
>  create mode 100755 usertools/hugepage-setup.sh
>
> diff --git a/usertools/hugepage-setup.sh b/usertools/hugepage-setup.sh
> new file mode 100755
> index 000000000000..df132e2f8d64
> --- /dev/null
> +++ b/usertools/hugepage-setup.sh
> @@ -0,0 +1,169 @@
> +#! /bin/bash

Is there a good reason to limit this to bash rather than general "sh"?

Also, if we ever see this script being expanded to cover more, would it be more extensible in python rather than shell?

On Wed, 2 Sep 2020 10:55:07 +0100 Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Tue, Sep 01, 2020 at 09:56:43AM -0700, Stephen Hemminger wrote:
> > This is an improved version of the setup of huge pages
> > bases on earlier DPDK setup. Differences are:
> >    * it autodetects NUMA vs non NUMA
> >    * it allows setting different page sizes
> >      recent kernels support multiple sizes.
> >    * it accepts a parameter in bytes (not pages).
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > This is lightly tested, it still needs testing on multiple architectures
> > etc.
> >
> >  usertools/hugepage-setup.sh | 169 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 169 insertions(+)
> >  create mode 100755 usertools/hugepage-setup.sh
> >
> > diff --git a/usertools/hugepage-setup.sh b/usertools/hugepage-setup.sh
> > new file mode 100755
> > index 000000000000..df132e2f8d64
> > --- /dev/null
> > +++ b/usertools/hugepage-setup.sh
> > @@ -0,0 +1,169 @@
> > +#! /bin/bash
>
> Is there a good reason to limit this to bash rather than general "sh"?
>
> Also, if we ever see this script being expanded to cover more, would it be
> more extensible in python rather than shell?

Mainly because bash has arithmetic operations, and doing it with normal shell requires using expr.

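For comparison on the bash versus "sh" question, here is a hedged sketch of how the size parsing might look in plain POSIX sh. It relies on the `$((...))` arithmetic expansion that POSIX specifies for sh, and uses `case` patterns in place of the bash-only `[[ ... =~ ... ]]` test. The function name `size_to_kb` is hypothetical; the conversion rules mirror `get_pagesize` in the patch.

```shell
#!/bin/sh
# Hypothetical POSIX sh counterpart of get_pagesize from the patch:
# convert "<n>", "<n>K", "<n>M" or "<n>G" to a size in kB.
size_to_kb()
{
    case "$1" in
        *G) num="${1%G}"; mult=1048576 ;;
        *M) num="${1%M}"; mult=1024 ;;
        *K) num="${1%K}"; mult=1 ;;
        *)  num="$1";     mult=0 ;;      # no suffix: plain byte count
    esac

    # Reject empty or non-numeric values.
    case "$num" in
        ''|*[!0-9]*) return 1 ;;
    esac

    if [ "$mult" -eq 0 ]; then
        # Byte counts must be a whole number of kB, as in the patch.
        [ $((num % 1024)) -eq 0 ] || return 1
        echo $((num / 1024))
    else
        echo $((num * mult))
    fi
}
```

For example, `size_to_kb 2M` prints `2048`, and `size_to_kb 1000` fails because 1000 bytes is not a multiple of 1024. Values with leading zeros are not handled specially in this sketch.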
diff --git a/usertools/hugepage-setup.sh b/usertools/hugepage-setup.sh
new file mode 100755
index 000000000000..df132e2f8d64
--- /dev/null
+++ b/usertools/hugepage-setup.sh
@@ -0,0 +1,169 @@
+#! /bin/bash
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2014 Intel Corporation
+#
+
+usage()
+{
+    echo "Usage: $0 size [pagesize]"
+    echo " size is in bytes with optional M or G suffix"
+    echo " pagesize is the pagesize to use"
+    exit 1
+}
+
+get_pagesize()
+{
+    SIZE="$1"
+
+    if [[ "$SIZE" =~ ^[0-9]+G$ ]]; then
+        echo $((${SIZE%%G} * 1024 * 1024))
+    elif [[ "$SIZE" =~ ^[0-9]+M$ ]]; then
+        echo $((${SIZE%%M} * 1024))
+    elif [[ "$SIZE" =~ ^[0-9]+K$ ]]; then
+        echo ${SIZE%%K}
+    elif [[ "$SIZE" =~ ^[0-9]+$ ]]; then
+        if [ $((SIZE % 1024)) -ne 0 ]; then
+            exit 1
+        else
+            echo $((SIZE / 1024))
+        fi
+    else
+        exit 1
+    fi
+}
+
+#
+# Creates hugepage filesystem.
+#
+create_mnt_huge()
+{
+    echo "Creating /mnt/huge and mounting as hugetlbfs"
+    mkdir -p /mnt/huge
+
+    grep -s '/mnt/huge' /proc/mounts > /dev/null
+    if [ $? -ne 0 ] ; then
+        mount -t hugetlbfs -o pagesize=${PAGESIZE} nodev /mnt/huge
+    fi
+}
+
+#
+# Removes hugepage filesystem.
+#
+remove_mnt_huge()
+{
+    echo "Unmounting /mnt/huge and removing directory"
+    grep -s '/mnt/huge' /proc/mounts > /dev/null
+    if [ $? -eq 0 ] ; then
+        umount /mnt/huge
+    fi
+
+    if [ -d /mnt/huge ] ; then
+        rm -R /mnt/huge
+    fi
+}
+#
+# Removes all reserved hugepages.
+#
+clear_huge_pages()
+{
+    echo > .echo_tmp
+    for d in /sys/devices/system/node/node? ; do
+        for sz in $d/hugepages/hugepages-* ; do
+            echo "echo 0 > ${sz}/nr_hugepages" >> .echo_tmp
+        done
+    done
+    echo "Removing currently reserved hugepages"
+    sh .echo_tmp
+    rm -f .echo_tmp
+
+    remove_mnt_huge
+}
+
+#
+# Creates hugepages.
+#
+set_non_numa_pages()
+{
+    path=/sys/kernel/mm/hugepages/hugepages-${HUGEPGSZ}kB
+    if [ ! -d $path ]; then
+        >&2 echo "${HUGEPGSZ}K is not a valid huge page size"
+        exit 1
+    fi
+    for sz in /sys/kernel/mm/hugepages/hugepages-* ; do
+        echo "echo 0 > ${sz}/nr_hugepages" >> .echo_tmp
+    done
+
+    echo "Reserving $PAGES hugepages of size $HUGEPGSZ kB"
+    echo $PAGES > $path/nr_hugepages
+
+    create_mnt_huge
+}
+
+#
+# Creates hugepages on specific NUMA nodes.
+#
+set_numa_pages()
+{
+    clear_huge_pages
+
+    echo > .echo_tmp
+    for d in /sys/devices/system/node/node? ; do
+        node=$(basename $d)
+        path="$d/hugepages/hugepages-${HUGEPGSZ}kB"
+        if [ ! -d $path ]; then
+            >&2 echo "${HUGEPGSZ}K is not a valid huge page size"
+            exit 1
+        fi
+
+        echo "echo $Pages > $path" >> .echo_tmp
+    done
+    echo "Reserving $PAGES hugepages of size $HUGEPGSZ kB (numa)"
+    sh .echo_tmp
+    rm -f .echo_tmp
+
+    create_mnt_huge
+}
+
+#
+# Need size argument
+#
+[ $# -ge 1 ] || usage
+
+#
+# Convert from size to pages
+#
+KSIZE=$(get_pagesize $1)
+if [ $? -ne 0 ]; then
+    >&2 echo "Invalid huge area size: $1"
+    exit 1
+fi
+
+#
+# Optional second argument is pagesize
+#
+if [ $# -gt 1 ]; then
+    HUGEPGSZ=$(get_pagesize $2)
+    if [ $? -ne 0 ]; then
+        >&2 echo "Invalid huge page size: $2"
+        exit 1
+    fi
+else
+    HUGEPGSZ=$(awk '/^Hugepagesize/ { print $2 }' /proc/meminfo )
+fi
+
+if [ $((KSIZE % HUGEPGSZ)) -ne 0 ] ; then
+    echo "Invalid number of huge pages $KSIZE K, should be multiple of $HUGEPGSZ K"
+    exit 1
+fi
+
+PAGES=$((KSIZE / HUGEPGSZ))
+PAGESIZE=$((HUGEPGSZ * 1024))
+
+#
+# Do NUMA if necessary
+#
+if [ -e /sys/devices/numa/node ]; then
+    set_numa_pages
+else
+    set_non_numa_pages
+fi

This is an improved version of the setup of huge pages
bases on earlier DPDK setup. Differences are:
   * it autodetects NUMA vs non NUMA
   * it allows setting different page sizes
     recent kernels support multiple sizes.
   * it accepts a parameter in bytes (not pages).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
This is lightly tested, it still needs testing on multiple architectures
etc.

 usertools/hugepage-setup.sh | 169 ++++++++++++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)
 create mode 100755 usertools/hugepage-setup.sh
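
To make the "parameter in bytes (not pages)" conversion concrete, the following is a small worked sketch of the script's arithmetic. It assumes a request of 1G backed by 2M huge pages; both values are chosen only for illustration. The variable names follow the script, with sizes held in kB.

```shell
#!/bin/sh
# Worked example of the size-to-pages arithmetic, assuming a 1G request
# and a 2M huge page size (illustrative values only).
KSIZE=$((1 * 1024 * 1024))       # "1G" expressed in kB
HUGEPGSZ=$((2 * 1024))           # "2M" expressed in kB

PAGES=$((KSIZE / HUGEPGSZ))      # number of huge pages to reserve
PAGESIZE=$((HUGEPGSZ * 1024))    # page size in bytes, as passed to mount -o pagesize=

echo "pages=$PAGES pagesize=$PAGESIZE"   # prints: pages=512 pagesize=2097152
```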