From patchwork Mon Jun 25 15:59:44 2018
From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: john.mcnamara@intel.com, bruce.richardson@intel.com,
 pablo.de.lara.guarch@intel.com, david.hunt@intel.com,
 mohammad.abdul.awal@intel.com
Date: Mon, 25 Jun 2018 16:59:44 +0100
Message-Id: <064ea16d773eef551c174f27f0a331d1871c91fc.1529940601.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [RFC 7/9] usertools/lib: add hugepage information library

Add a library for getting hugepage information on Linux systems.
Supported functionality:
- List active hugetlbfs mountpoints
- Change hugetlbfs mountpoints
  - Supports both transient and persistent (fstab) mountpoints
- Display/change number of allocated hugepages
  - Supports both total and per-NUMA node page counts

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Note: a short usage sketch of the new API follows the diff, for reviewers.

 usertools/DPDKConfigLib/HugeUtil.py | 309 ++++++++++++++++++++++++++++
 usertools/DPDKConfigLib/Util.py     |  49 +++++
 2 files changed, 358 insertions(+)
 create mode 100755 usertools/DPDKConfigLib/HugeUtil.py

diff --git a/usertools/DPDKConfigLib/HugeUtil.py b/usertools/DPDKConfigLib/HugeUtil.py
new file mode 100755
index 000000000..79ed97bb7
--- /dev/null
+++ b/usertools/DPDKConfigLib/HugeUtil.py
@@ -0,0 +1,309 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+
+from .PlatformInfo import *
+from .Util import *
+import re
+import os
+import subprocess
+
+__KERNEL_NUMA_HP_PATH = \
+    "/sys/devices/system/node/node%i/hugepages/hugepages-%ikB/"
+__KERNEL_HP_PATH = "/sys/kernel/mm/hugepages/hugepages-%ikB/"
+__NR_HP_FNAME = "nr_hugepages"
+# check if we have systemd
+_have_systemd = run(["which", "systemctl"])
+
+# local copy of platform info
+info = PlatformInfo()
+
+
+def _find_runtime_hugetlbfs_mountpoints():
+    mountpoints = {}
+    with open("/proc/mounts") as f:
+        for line in f:
+            if not _is_hugetlbfs_mount(line):
+                continue
+            line = line.strip()
+            _, path, _, options, _, _ = line.split()
+
+            m = re.search(r"pagesize=(\d+\w)", options)
+            if m:
+                pagesz = human_readable_to_kilobytes(m.group(1))
+            else:
+                # if no size specified, assume default hugepage size
+                pagesz = info.default_hugepage_size
+            if pagesz in mountpoints:
+                raise RuntimeError("Multiple mountpoints for same hugetlbfs")
+            mountpoints[pagesz] = path
+    return mountpoints
+
+
+def _find_nr_hugepages(page_sz, node=None):
+    if node is not None:
+        path = os.path.join(__KERNEL_NUMA_HP_PATH % (node, page_sz),
+                            __NR_HP_FNAME)
+    else:
+        path = os.path.join(__KERNEL_HP_PATH % (page_sz), __NR_HP_FNAME)
+    return int(read_file(path))
+
+
+def _write_nr_hugepages(page_sz, nr_pages, node=None):
+    if node is not None:
+        path = os.path.join(__KERNEL_NUMA_HP_PATH % (node, page_sz),
+                            __NR_HP_FNAME)
+    else:
+        path = os.path.join(__KERNEL_HP_PATH % (page_sz), __NR_HP_FNAME)
+    write_file(path, str(nr_pages))
+
+
+def _is_hugetlbfs_mount(line):
+    # ignore comments
+    if line.strip().startswith("#"):
+        return False
+    tokens = line.split()
+    if len(tokens) != 6:
+        return False
+    return tokens[2] == "hugetlbfs"
+
+
+def _update_fstab_hugetlbfs_mounts(mountpoints):
+    # read in the current fstab
+    with open("/etc/fstab") as f:
+        lines = f.readlines()
+    mount_idxs = [idx for idx, line in enumerate(lines)
+                  if _is_hugetlbfs_mount(line)]
+
+    # delete all lines with hugetlbfs mountpoints
+    for idx in reversed(sorted(mount_idxs)):
+        del lines[idx]
+
+    # append new mountpoints
+    lines.extend(["hugetlbfs %s hugetlbfs pagesize=%s 0 0\n" %
+                  (mountpoints[size], kilobytes_to_human_readable(size))
+                  for size in mountpoints.keys() if mountpoints[size] != ""])
+
+    # finally, write everything back
+    with open("/etc/fstab", "w") as f:
+        f.writelines(lines)
+
+
+def _find_fstab_hugetlbfs_mounts():
+    mountpoints = {}
+    with open("/etc/fstab") as f:
+        for line in f:
+            if not _is_hugetlbfs_mount(line):
+                continue
+            line = line.strip()
+            _, path, _, options, _, _ = line.split()
+
+            m = re.search(r"pagesize=(\d+\w)", options)
+            if m:
+                pagesz = human_readable_to_kilobytes(m.group(1))
+            else:
+                # if no size specified, assume default hugepage size
+                pagesz = info.default_hugepage_size
+            if pagesz in mountpoints:
+                raise RuntimeError("Multiple mountpoints for same hugetlbfs")
+            mountpoints[pagesz] = path
+    return mountpoints
+
+
+def _find_systemd_hugetlbfs_mounts():
+    # we find systemd mounts by virtue of them not being in fstab,
+    # so check each mount unit
+    units = []
+    out = subprocess.check_output(["systemctl", "-t", "mount", "--all"],
+                                  stderr=None)
+    lines = out.decode("utf-8").splitlines()
+    for line in lines:
+        line = line.strip()
+
+        tokens = line.split()
+
+        if len(tokens) == 0:
+            continue
+
+        # for masked units, the unit name is the second token
+        if tokens[0].endswith(".mount"):
+            unit = tokens[0]
+        elif len(tokens) > 1 and tokens[1].endswith(".mount"):
+            tokens = tokens[1:]
+            unit = tokens[0]
+        else:
+            continue  # not a unit line
+
+        # if this is inactive and masked, we don't care
+        load, active, sub = tokens[1:4]
+        if load == "masked" and active == "inactive":
+            continue
+
+        units.append({"unit": unit, "load": load, "active": active,
+                      "sub": sub})
+
+    for unit_dict in units:
+        # status may return non-zero, but we don't care
+        try:
+            out = subprocess.check_output(["systemctl", "status",
+                                           unit_dict["unit"]], stderr=None)
+        except subprocess.CalledProcessError as e:
+            out = e.output
+        lines = out.decode("utf-8").splitlines()
+        for line in lines:
+            line = line.strip()
+            if line.startswith("What"):
+                unit_dict["fs"] = line.split()[1]
+            elif line.startswith("Where"):
+                unit_dict["path"] = line.split()[1]
+
+    fstab_mountpoints = _find_fstab_hugetlbfs_mounts().values()
+    filter_func = (lambda x: x.get("fs", "") == "hugetlbfs" and
+                   x.get("path", "") not in fstab_mountpoints)
+    return {u["unit"]: u["path"] for u in filter(filter_func, units)}
+
+
+def _disable_systemd_hugetlbfs_mounts():
+    mounts = _find_systemd_hugetlbfs_mounts()
+    for unit in mounts:
+        run(["systemctl", "stop", unit])  # unmount
+        run(["systemctl", "mask", unit])  # prevent this from ever running
+
+
+class PersistentMountpointConfig:
+    def __init__(self):
+        self.update()
+
+    def update(self):
+        self.reset()
+        self.mountpoints = _find_fstab_hugetlbfs_mounts()
+        for sz in info.hugepage_sizes_enabled:
+            self.mountpoints.setdefault(sz, "")
+
+    def commit(self):
+        # check if we are trying to mount hugetlbfs of unsupported size
+        supported = set(info.hugepage_sizes_supported)
+        all_sizes = set(self.mountpoints.keys())
+        if not all_sizes.issubset(supported):
+            diff = all_sizes.difference(supported)
+            raise ValueError("Unsupported hugepage sizes: %s" %
+                             [kilobytes_to_human_readable(s) for s in diff])
+
+        if _have_systemd:
+            # dealing with fstab is easier, so disable all systemd mounts
+            _disable_systemd_hugetlbfs_mounts()
+
+        _update_fstab_hugetlbfs_mounts(self.mountpoints)
+
+        if _have_systemd:
+            run(["systemctl", "daemon-reload"])
+        self.update()
+
+    def reset(self):
+        self.mountpoints = {}  # pagesz : path
+
+
+class RuntimeMountpointConfig:
+    def __init__(self):
+        self.update()
+
+    def update(self):
+        self.reset()
+        self.mountpoints = _find_runtime_hugetlbfs_mountpoints()
+        for sz in info.hugepage_sizes_enabled:
+            self.mountpoints.setdefault(sz, "")
+
+    def commit(self):
+        # check if we are trying to mount hugetlbfs of unsupported size
+        supported = set(info.hugepage_sizes_supported)
+        all_sizes = set(self.mountpoints.keys())
+        if not all_sizes.issubset(supported):
+            diff = all_sizes.difference(supported)
+            raise ValueError("Unsupported hugepage sizes: %s" %
+                             [kilobytes_to_human_readable(s) for s in diff])
+
+        cur_mp = _find_runtime_hugetlbfs_mountpoints()
+        sizes = set(cur_mp.keys()).union(self.mountpoints.keys())
+
+        for size in sizes:
+            old = cur_mp.get(size, "")
+            new = self.mountpoints.get(size, "")
+
+            is_unmount = old != "" and new == ""
+            is_mount = old == "" and new != ""
+            is_remount = old != "" and new != "" and old != new
+
+            mount_param = ["-t", "hugetlbfs", "-o",
+                           "pagesize=%dM" % (size // 1024)]
+
+            if is_unmount:
+                run(["umount", old])
+            elif is_mount:
+                mkpath(new)
+                run(["mount"] + mount_param + [new])
+            elif is_remount:
+                mkpath(new)
+                run(["umount", old])
+                run(["mount"] + mount_param + [new])
+
+        if _have_systemd:
+            run(["systemctl", "daemon-reload"])
+        self.update()
+
+    def reset(self):
+        self.mountpoints = {}  # pagesz : path
+
+
+class RuntimeHugepageConfig:
+    def __init__(self):
+        self.update()
+
+    def update(self):
+        self.reset()
+
+        hugepage_sizes = info.hugepage_sizes_enabled
+        if len(hugepage_sizes) == 0:
+            raise RuntimeError("Hugepages appear to be disabled")
+        self.total_nr_hugepages = \
+            {page_sz: _find_nr_hugepages(page_sz)
+             for page_sz in hugepage_sizes}
+        for node in info.numa_nodes:
+            for page_sz in hugepage_sizes:
+                self.hugepages_per_node[node, page_sz] = \
+                    _find_nr_hugepages(page_sz, node)
+
+    def commit(self):
+        # sanity checks
+
+        # check if user has messed with hugepage sizes
+        supported_sizes = set(info.hugepage_sizes_supported)
+        keys = set(self.total_nr_hugepages.keys())
+        if keys != supported_sizes:
+            diff = supported_sizes.difference(keys)
+            raise ValueError("Missing hugepage sizes: %s" %
+                             [kilobytes_to_human_readable(s) for s in diff])
+
+        # per-node counts are keyed by (node, size) tuples, so collect
+        # the sizes present for each node before comparing
+        for node in info.numa_nodes:
+            keys = {sz for n, sz in self.hugepages_per_node if n == node}
+            if keys != supported_sizes:
+                diff = supported_sizes.difference(keys)
+                raise ValueError("Missing hugepage sizes: %s" %
+                                 [kilobytes_to_human_readable(s)
+                                  for s in diff])
+
+        # check if all hugepage numbers add up
+        for size in supported_sizes:
+            total_hps = sum([self.hugepages_per_node[node, size]
+                             for node in info.numa_nodes])
+            if total_hps != self.total_nr_hugepages[size]:
+                raise ValueError("Total number of hugepages not equal to "
+                                 "sum of pages on all NUMA nodes")
+
+        # now, commit our configuration
+        for size, value in self.total_nr_hugepages.items():
+            _write_nr_hugepages(size, value)
+        for (node, size), value in self.hugepages_per_node.items():
+            _write_nr_hugepages(size, value, node)
+        self.update()
+
+    def reset(self):
+        self.total_nr_hugepages = {}
+        self.hugepages_per_node = {}
diff --git a/usertools/DPDKConfigLib/Util.py b/usertools/DPDKConfigLib/Util.py
index eb21cce15..ba0c36537 100755
--- a/usertools/DPDKConfigLib/Util.py
+++ b/usertools/DPDKConfigLib/Util.py
@@ -2,6 +2,25 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+import subprocess
+import re
+import os
+import errno
+
+__PGSZ_UNITS = ['k', 'M', 'G', 'T', 'P']
+
+
+# equivalent to mkdir -p
+def mkpath(path):
+    try:
+        os.makedirs(path)
+    except OSError as e:
+        # an already existing path is fine; re-raise anything else
+        if e.errno != errno.EEXIST:
+            raise
+
+
 # read entire file and return the result
 def read_file(path):
     with open(path, 'r') as f:
@@ -21,6 +40,36 @@ def append_file(path, value):
         f.write(value)
 
 
+# run command while suppressing its output
+def run(args):
+    try:
+        subprocess.check_output(args, stderr=subprocess.STDOUT)
+    except subprocess.CalledProcessError:
+        return False
+    return True
+
+
+# convert a size in kilobytes to a human-readable string, e.g. 2048 -> "2M"
+def kilobytes_to_human_readable(value):
+    for unit in __PGSZ_UNITS:
+        if abs(value) < 1024:
+            cur_unit = unit
+            break
+        value //= 1024
+    else:
+        raise ValueError("Value too large")
+    return "%i%s" % (value, cur_unit)
+
+
+# convert a human-readable size to kilobytes, e.g. "2M" -> 2048
+def human_readable_to_kilobytes(value):
+    m = re.match(r"(\d+)([%s])$" % ''.join(__PGSZ_UNITS), value)
+    if not m:
+        raise ValueError("Invalid value format: %s" % value)
+    ival = int(m.group(1))
+    suffix = m.group(2)
+    power = __PGSZ_UNITS.index(suffix)
+    return ival * (1024 ** power)
+
+
 # split line into key-value pair, cleaning up the values in the process
 def kv_split(line, separator):
     # just in case
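
For reviewers, here is a minimal usage sketch of the API added above. It is
illustrative only and not part of the patch: it assumes the DPDKConfigLib
package is importable, that the set of enabled hugepage sizes matches the
supported ones (as commit() requires), and that it runs as root, since
commit() writes to sysfs and calls mount. The /mnt/huge path and the figure
of 512 pages are arbitrary examples.

    #!/usr/bin/env python
    from DPDKConfigLib.HugeUtil import (RuntimeHugepageConfig,
                                        RuntimeMountpointConfig)
    from DPDKConfigLib.PlatformInfo import PlatformInfo

    info = PlatformInfo()
    page_sz = 2048  # 2MB pages; all sizes in this library are in kilobytes

    # reserve 512 x 2MB pages in total, spread across NUMA nodes
    config = RuntimeHugepageConfig()  # reads current counts from sysfs
    config.total_nr_hugepages[page_sz] = 512
    n_nodes = len(info.numa_nodes)
    per_node, remainder = divmod(512, n_nodes)
    for i, node in enumerate(info.numa_nodes):
        # hand out the remainder one page at a time so the totals add up
        config.hugepages_per_node[node, page_sz] = \
            per_node + (1 if i < remainder else 0)
    config.commit()  # validates the counts, then writes nr_hugepages

    # mount a transient hugetlbfs instance for the 2MB pages
    mounts = RuntimeMountpointConfig()
    mounts.mountpoints[page_sz] = "/mnt/huge"  # hypothetical mountpoint
    mounts.commit()  # effectively: mount -t hugetlbfs -o pagesize=2M /mnt/huge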