[v4] dts: Change hugepage runtime config to 2MB Exclusively

Message ID 20240418161026.2839-1-npratte@iol.unh.edu (mailing list archive)
State New
Delegated to: Thomas Monjalon
Series [v4] dts: Change hugepage runtime config to 2MB Exclusively

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/loongarch-compilation        success  Compilation OK
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/loongarch-unit-testing       success  Unit Testing PASS
ci/intel-Functional             success  Functional PASS
ci/github-robot: build          success  github build: passed
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-abi-testing              success  Testing PASS
ci/iol-compile-amd64-testing    success  Testing PASS
ci/iol-unit-amd64-testing       fail     Testing issues
ci/iol-sample-apps-testing      success  Testing PASS
ci/iol-unit-arm64-testing       success  Testing PASS
ci/iol-compile-arm64-testing    success  Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-broadcom-Functional      success  Functional Testing PASS

Commit Message

Nicholas Pratte April 18, 2024, 4:10 p.m. UTC
  The previous implementation configures and allocates hugepage sizes
based on a system default. This can lead to two problems: overallocation of
hugepages (which may crash the remote host), and configuration of hugepage
sizes that are not recommended during runtime. This new implementation
allows only 2MB hugepage allocation during runtime; any other
hugepage size must be configured by the end-user before initializing DTS.

If the number of 2MB hugepages requested exceeds the number of 2MB
hugepages already configured on the system, then the system will remount
hugepages to cover the difference. If the number of hugepages requested is
less than or equal to the number already configured on the system,
then nothing is done.
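
The rule described above can be sketched as a small pure function. This is
only an illustration, not the DTS code: the helper name
needs_reconfiguration and its parameter names are hypothetical. The
force_first_numa case mirrors the condition in the linux_session.py hunk of
this patch.

```python
def needs_reconfiguration(
    requested: int, configured: int, force_first_numa: bool = False
) -> bool:
    """Return True when 2MB hugepages should be (re)configured.

    Hypothetical helper: reconfigure only when more hugepages are requested
    than the system already provides, or when placement on the first NUMA
    node is forced.
    """
    return force_first_numa or configured < requested
```

With this rule, a request for 256 hugepages on a system that already has 512
configured leaves the system untouched unless force_first_numa is set.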

Bugzilla ID: 1370
Signed-off-by: Nicholas Pratte <npratte@iol.unh.edu>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
---

v4:
 * dts.rst punctuation/grammar corrections and 2mb exclusivity
   justifications included in documentation
---
 doc/guides/tools/dts.rst                     |  6 ++++-
 dts/conf.yaml                                |  4 ++--
 dts/framework/config/__init__.py             |  4 ++--
 dts/framework/config/conf_yaml_schema.json   |  6 ++---
 dts/framework/config/types.py                |  2 +-
 dts/framework/testbed_model/linux_session.py | 24 +++++++++++---------
 dts/framework/testbed_model/node.py          |  8 ++++++-
 dts/framework/testbed_model/os_session.py    |  5 +++-
 8 files changed, 37 insertions(+), 22 deletions(-)
  

Comments

Juraj Linkeš April 25, 2024, 8 a.m. UTC | #1
Just a few minor points, otherwise this looks good.

> diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
> index 47b218b2c6..71473dbb3d 100644
> --- a/doc/guides/tools/dts.rst
> +++ b/doc/guides/tools/dts.rst
> @@ -131,7 +131,11 @@ There are two areas that need to be set up on a System Under Test:
>
>       You may specify the optional hugepage configuration in the DTS config file.
>       If you do, DTS will take care of configuring hugepages,
> -     overwriting your current SUT hugepage configuration.
> +     overwriting your current SUT hugepage configuration. Configuration of hugepages via DTS
> +     allows only for allocation of 2MB hugepages, as doing so prevents accidental/over
> +     allocation of hugepages and hugepages sizes not recommended during runtime due to

This wording is a bit confusing. What would make sense to me is
"allocation of hugepages with hugepage sizes not recommended". Or am I
missing something?

> +     contiguous memory space requirements. Thus, if you require hugepage
> +     sizes not equal to 2MB, then this configuration must be done outside of the DTS framework.
>
>     * System under test configuration
>
> diff --git a/dts/conf.yaml b/dts/conf.yaml
> index 8068345dd5..56c3ae6f4c 100644
> --- a/dts/conf.yaml
> +++ b/dts/conf.yaml
> @@ -35,7 +35,7 @@ nodes:
>      lcores: "" # use all the available logical cores
>      use_first_core: false # tells DPDK to use any physical core
>      memory_channels: 4 # tells DPDK to use 4 memory channels
> -    hugepages:  # optional; if removed, will use system hugepage configuration
> +    hugepages_2mb: # optional; if removed, will use system hugepage configuration
>          amount: 256

I noticed this mistake I made a while back, but this looks like a good
opportunity to point it out. Amount is used with uncountable nouns and
hugepages is countable. We should correct this mistake (rename to
number in all places, I think that's used somewhere in Linux) and this
patch could be a good place to do it. Or maybe a separate one or even
a separate patchset. What do you think, would you please fix this
while you're making hugepage changes?

>          force_first_numa: false
>      ports:
> @@ -71,7 +71,7 @@ nodes:
>          os_driver: rdma
>          peer_node: "SUT 1"
>          peer_pci: "0000:00:08.1"
> -    hugepages:  # optional; if removed, will use system hugepage configuration
> +    hugepages_2mb: # optional; if removed, will use system hugepage configuration
>          amount: 256
>          force_first_numa: false
>      traffic_generator:

<snip>

> diff --git a/dts/framework/testbed_model/node.py b/dts/framework/testbed_model/node.py
> index 74061f6262..056a031ca0 100644
> --- a/dts/framework/testbed_model/node.py
> +++ b/dts/framework/testbed_model/node.py
> @@ -97,6 +97,10 @@ def __init__(self, node_config: NodeConfiguration):
>          self.virtual_devices = []
>          self._init_ports()
>
> +    @property
> +    def _hugepage_default_size(self) -> int:
> +        return 2048
> +

I'm thinking about the placement of this and I'm kinda torn between
Node and OSSession. In Node, it makes sense as it basically means "no
matter what sort of node we're connected to, we're going to use this
hugepage size". In OSSession, it would make more sense if we need a
different hugepage size based on the OS (and possibly arch, which
could be passed to it). For now, leaving it in Node is probably
better. We can always move it if need be.

But I don't really get why it's a property. Using properties doesn't
really make sense if we're not putting any logic into it. This could
just as easily be self._hugepage_default_size = 2048 in __init__().
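
To make the two options concrete, here is a minimal sketch of both variants.
The class names are hypothetical stand-ins, not the real Node class; the
value 2048 is taken from the patch and is interpreted as kB, which is an
assumption based on the "hugepages-{size}kB" sysfs path used in
linux_session.py.

```python
class NodeWithProperty:
    """Variant from the patch: a read-only property with no logic."""

    @property
    def _hugepage_default_size(self) -> int:
        return 2048  # assumed to be in kB


class NodeWithAttribute:
    """Variant suggested in the review: a plain attribute set in __init__()."""

    def __init__(self) -> None:
        self._hugepage_default_size = 2048  # assumed to be in kB
```

Both variants read the same way from the caller's side. The practical
difference is that the property cannot be reassigned without adding a
setter, while the attribute can.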

>      def _init_ports(self) -> None:
>          self.ports = [Port(self.name, port_config) for port_config in self.config.ports]
>          self.main_session.update_ports(self.ports)
> @@ -266,7 +270,9 @@ def _setup_hugepages(self) -> None:
>          """
>          if self.config.hugepages:
>              self.main_session.setup_hugepages(
> -                self.config.hugepages.amount, self.config.hugepages.force_first_numa
> +                self.config.hugepages.amount,
> +                self._hugepage_default_size,
> +                self.config.hugepages.force_first_numa,
>              )
>
>      def configure_port_state(self, port: Port, enable: bool = True) -> None:
> diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
> index d5bf7e0401..5d58400cbe 100644
> --- a/dts/framework/testbed_model/os_session.py
> +++ b/dts/framework/testbed_model/os_session.py
> @@ -345,7 +345,9 @@ def get_dpdk_file_prefix(self, dpdk_prefix: str) -> str:
>          """
>
>      @abstractmethod
> -    def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
> +    def setup_hugepages(
> +        self, hugepage_count: int, hugepage_size: int, force_first_numa: bool
> +    ) -> None:
>          """Configure hugepages on the node.
>
>          Get the node's Hugepage Size, configure the specified count of hugepages
> @@ -353,6 +355,7 @@ def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
>
>          Args:
>              hugepage_count: Configure this many hugepages.
> +            hugepage_size: Configure hugepages of this size (currently not used in the config)

The dot at the end of the sentence is missing. The reminder in the
parentheses is not needed; the OSSession class is decoupled from the
rest of the classes (and is thus not aware of the config).
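
For illustration, the docstring with both points applied might read as
follows. This is a sketch under assumptions: OSSessionSketch is a
hypothetical stand-in for OSSession, and the docstring body is abridged to
the lines visible in the hunk.

```python
from abc import ABC, abstractmethod


class OSSessionSketch(ABC):
    """Hypothetical stand-in for the abstract OS session class."""

    @abstractmethod
    def setup_hugepages(
        self, hugepage_count: int, hugepage_size: int, force_first_numa: bool
    ) -> None:
        """Configure hugepages on the node.

        Args:
            hugepage_count: Configure this many hugepages.
            hugepage_size: Configure hugepages of this size.
            force_first_numa: If :data:`True`, configure hugepages just on the first numa node.
        """
```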

>              force_first_numa:  If :data:`True`, configure hugepages just on the first numa node.
>          """
>
> --
> 2.44.0
>
  
Nicholas Pratte April 29, 2024, 5:26 p.m. UTC | #2
I fixed the docstring under setup_hugepages in os_session, and I also
made a quick fix to the dts.rst documentation. For the dts.rst
documentation, I think the following changes make more sense, based on
the concerns outlined:

(here is a snip of the documentation with the change I made)
"as doing so prevents accidental/over
allocation of with hugepage sizes not recommended during runtime due to
contiguous memory space requirements."

With regard to the wording used for the total number of hugepages, I
could change the wording to either "quantity" or "number;" I think
quantity makes more sense and is less ambiguous, but I'm curious what
you think. With reference to your comments about putting this in a
different patch set, I think a good argument could be made to put this
kind of a change in our currently existing patch, but I understand the
argument at both ends. Personally, I am in favor of adding this fix to
the current patch since we're renaming key/value pairs in the schema
and yaml already.

As far as the property is concerned, when Jeremy and I discussed how
to best implement this fix, he suggested that a property might make
more sense here because of the potential changes that we might make to
the default size in the future (whether that be by OS, arch or
otherwise). Ultimately, we settled on inserting a property that
returns 2048 for now with the understanding that, in the future,
developers can add logic to the property as needed. Initially, I had
the hugepage size configured in the manner you described, so the
property implementation is not something I'm adamant on. I can make
the suggested change you gave above, or alternatively, if my provided
reasoning makes sense, I can insert a comment explaining the existence
of the property.
  
Juraj Linkeš April 30, 2024, 8:42 a.m. UTC | #3
Please leave the context you're addressing. Reading this was a bit
confusing and also hard to understand.

On Mon, Apr 29, 2024 at 7:26 PM Nicholas Pratte <npratte@iol.unh.edu> wrote:
>
> I fixed the docstring under setup_hugepages in os_session, and I also
> made a quick fix to the dts.rst documentation. For the dts.rst
> documentation, I think the following changes make more sense, based on
> the concerns outlined:
>
> (here is a snip of the documentation with the change I made)
> "as doing so prevents accidental/over
> allocation of with hugepage sizes not recommended during runtime due to
> contiguous memory space requirements."
>

Looks like there's a typo: allocation of with hugepage sizes

> With regard to the wording used for the total number of hugepages, I
> could change the wording to either "quantity" or "number;" I think
> quantity makes more sense and is less ambiguous, but I'm curious what
> you think. With reference to your comments about putting this in a
> different patch set, I think a good argument could be made to put this
> kind of a change in our currently existing patch, but I understand the
> argument at both ends. Personally, I am in favor of adding this fix to
> the current patch since we're renaming key/value pairs in the schema
> and yaml already.
>

It looks like quantity is used with both countable and uncountable
nouns, whereas number is only used with countable nouns, so quantity
is also fine.
Let's put it into this patch series.
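
As a rough sketch of what the rename could look like on the Python side,
assuming the field ends up being called "quantity" (the thread has not
settled on a final name, and the class and function names below are
hypothetical, not the DTS ones):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HugepageConfigSketch:
    """Hypothetical stand-in for the DTS hugepage configuration."""

    quantity: int  # assumed new name for the field currently called "amount"
    force_first_numa: bool = False


def parse_hugepages(node: dict) -> Optional[HugepageConfigSketch]:
    """Read the optional ``hugepages_2mb`` section of a node dictionary."""
    section = node.get("hugepages_2mb")
    if section is None:
        return None  # fall back to the system hugepage configuration
    return HugepageConfigSketch(
        quantity=section["quantity"],
        force_first_numa=section.get("force_first_numa", False),
    )
```

The same rename would also have to be applied to conf.yaml and
conf_yaml_schema.json so the key names stay consistent.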

> As far as the property is concerned, when Jeremy and I discussed how
> to best implement this fix, he suggested that a property might make
> more sense here because of the potential changes that we might make to
> the default size in the future (whether that be by OS, arch or
> otherwise). Ultimately, we settled on inserting a property that
> returns 2048 for now with the understanding that, in the future,
> developers can add logic to the property as needed. Initially, I had
> the hugepage size configured in the manner you described, so the
> property implementation is not something I'm adamant on. I can make
> the suggested change you gave above, or alternatively, if my provided
> reasoning makes sense, I can insert a comment explaining the existence
> of the property.

The current expectation, based on the previous discussion, is we won't
be needing any logic, so I'd just make it a class variable (defined in
OSSession, as it's the same for all sessions). We can change it in the
future if we uncover a case where we might need it.
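
A minimal sketch of the class-variable approach, under stated assumptions:
BaseSession and LinuxLikeSession are hypothetical stand-ins for OSSession
and LinuxSession, the attribute name hugepage_default_size is made up for
this example, and the concrete method only records its arguments instead of
touching a real system.

```python
from abc import ABC, abstractmethod


class BaseSession(ABC):
    """Hypothetical stand-in for the abstract OS session class."""

    #: Default hugepage size (assumed to be in kB), shared by all sessions.
    hugepage_default_size: int = 2048

    @abstractmethod
    def setup_hugepages(
        self, hugepage_count: int, hugepage_size: int, force_first_numa: bool
    ) -> None:
        """Configure hugepages on the node."""


class LinuxLikeSession(BaseSession):
    """Hypothetical concrete session that just records the request."""

    def setup_hugepages(
        self, hugepage_count: int, hugepage_size: int, force_first_numa: bool
    ) -> None:
        self.last_call = (hugepage_count, hugepage_size, force_first_numa)


session = LinuxLikeSession()
session.setup_hugepages(256, session.hugepage_default_size, False)
```

If a per-OS or per-architecture size is ever needed, a subclass can override
the class variable without changing any caller.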
  

Patch

diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 47b218b2c6..71473dbb3d 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -131,7 +131,11 @@  There are two areas that need to be set up on a System Under Test:
 
      You may specify the optional hugepage configuration in the DTS config file.
      If you do, DTS will take care of configuring hugepages,
-     overwriting your current SUT hugepage configuration.
+     overwriting your current SUT hugepage configuration. Configuration of hugepages via DTS
+     allows only for allocation of 2MB hugepages, as doing so prevents accidental/over
+     allocation of hugepages and hugepages sizes not recommended during runtime due to
+     contiguous memory space requirements. Thus, if you require hugepage
+     sizes not equal to 2MB, then this configuration must be done outside of the DTS framework.
 
    * System under test configuration
 
diff --git a/dts/conf.yaml b/dts/conf.yaml
index 8068345dd5..56c3ae6f4c 100644
--- a/dts/conf.yaml
+++ b/dts/conf.yaml
@@ -35,7 +35,7 @@  nodes:
     lcores: "" # use all the available logical cores
     use_first_core: false # tells DPDK to use any physical core
     memory_channels: 4 # tells DPDK to use 4 memory channels
-    hugepages:  # optional; if removed, will use system hugepage configuration
+    hugepages_2mb: # optional; if removed, will use system hugepage configuration
         amount: 256
         force_first_numa: false
     ports:
@@ -71,7 +71,7 @@  nodes:
         os_driver: rdma
         peer_node: "SUT 1"
         peer_pci: "0000:00:08.1"
-    hugepages:  # optional; if removed, will use system hugepage configuration
+    hugepages_2mb: # optional; if removed, will use system hugepage configuration
         amount: 256
         force_first_numa: false
     traffic_generator:
diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py
index 4cb5c74059..b6f820e39e 100644
--- a/dts/framework/config/__init__.py
+++ b/dts/framework/config/__init__.py
@@ -255,8 +255,8 @@  def from_dict(
             Either an SUT or TG configuration instance.
         """
         hugepage_config = None
-        if "hugepages" in d:
-            hugepage_config_dict = d["hugepages"]
+        if "hugepages_2mb" in d:
+            hugepage_config_dict = d["hugepages_2mb"]
             if "force_first_numa" not in hugepage_config_dict:
                 hugepage_config_dict["force_first_numa"] = False
             hugepage_config = HugepageConfiguration(**hugepage_config_dict)
diff --git a/dts/framework/config/conf_yaml_schema.json b/dts/framework/config/conf_yaml_schema.json
index 4731f4511d..f4d7199523 100644
--- a/dts/framework/config/conf_yaml_schema.json
+++ b/dts/framework/config/conf_yaml_schema.json
@@ -146,7 +146,7 @@ 
         "compiler"
       ]
     },
-    "hugepages": {
+    "hugepages_2mb": {
       "type": "object",
       "description": "Optional hugepage configuration. If not specified, hugepages won't be configured and DTS will use system configuration.",
       "properties": {
@@ -253,8 +253,8 @@ 
             "type": "integer",
             "description": "How many memory channels to use. Optional, defaults to 1."
           },
-          "hugepages": {
-            "$ref": "#/definitions/hugepages"
+          "hugepages_2mb": {
+            "$ref": "#/definitions/hugepages_2mb"
           },
           "ports": {
             "type": "array",
diff --git a/dts/framework/config/types.py b/dts/framework/config/types.py
index 1927910d88..016e0c3dbd 100644
--- a/dts/framework/config/types.py
+++ b/dts/framework/config/types.py
@@ -46,7 +46,7 @@  class NodeConfigDict(TypedDict):
     """Allowed keys and values."""
 
     #:
-    hugepages: HugepageConfigurationDict
+    hugepages_2mb: HugepageConfigurationDict
     #:
     name: str
     #:
diff --git a/dts/framework/testbed_model/linux_session.py b/dts/framework/testbed_model/linux_session.py
index 5d24030c3d..d0f7cfa77c 100644
--- a/dts/framework/testbed_model/linux_session.py
+++ b/dts/framework/testbed_model/linux_session.py
@@ -15,7 +15,7 @@ 
 
 from typing_extensions import NotRequired
 
-from framework.exception import RemoteCommandExecutionError
+from framework.exception import ConfigurationError, RemoteCommandExecutionError
 from framework.utils import expand_range
 
 from .cpu import LogicalCore
@@ -84,14 +84,20 @@  def get_dpdk_file_prefix(self, dpdk_prefix: str) -> str:
         """Overrides :meth:`~.os_session.OSSession.get_dpdk_file_prefix`."""
         return dpdk_prefix
 
-    def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
+    def setup_hugepages(
+        self, hugepage_count: int, hugepage_size: int, force_first_numa: bool
+    ) -> None:
         """Overrides :meth:`~.os_session.OSSession.setup_hugepages`."""
         self._logger.info("Getting Hugepage information.")
-        hugepage_size = self._get_hugepage_size()
-        hugepages_total = self._get_hugepages_total()
+        hugepages_total = self._get_hugepages_total(hugepage_size)
+        if (
+            f"hugepages-{hugepage_size}kB"
+            not in self.send_command("ls /sys/kernel/mm/hugepages").stdout
+        ):
+            raise ConfigurationError("hugepage size not supported by operating system")
         self._numa_nodes = self._get_numa_nodes()
 
-        if force_first_numa or hugepages_total != hugepage_count:
+        if force_first_numa or hugepages_total < hugepage_count:
             # when forcing numa, we need to clear existing hugepages regardless
             # of size, so they can be moved to the first numa node
             self._configure_huge_pages(hugepage_count, hugepage_size, force_first_numa)
@@ -99,13 +105,9 @@  def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
             self._logger.info("Hugepages already configured.")
         self._mount_huge_pages()
 
-    def _get_hugepage_size(self) -> int:
-        hugepage_size = self.send_command("awk '/Hugepagesize/ {print $2}' /proc/meminfo").stdout
-        return int(hugepage_size)
-
-    def _get_hugepages_total(self) -> int:
+    def _get_hugepages_total(self, hugepage_size: int) -> int:
         hugepages_total = self.send_command(
-            "awk '/HugePages_Total/ { print $2 }' /proc/meminfo"
+            f"cat /sys/kernel/mm/hugepages/hugepages-{hugepage_size}kB/nr_hugepages"
         ).stdout
         return int(hugepages_total)
 
diff --git a/dts/framework/testbed_model/node.py b/dts/framework/testbed_model/node.py
index 74061f6262..056a031ca0 100644
--- a/dts/framework/testbed_model/node.py
+++ b/dts/framework/testbed_model/node.py
@@ -97,6 +97,10 @@  def __init__(self, node_config: NodeConfiguration):
         self.virtual_devices = []
         self._init_ports()
 
+    @property
+    def _hugepage_default_size(self) -> int:
+        return 2048
+
     def _init_ports(self) -> None:
         self.ports = [Port(self.name, port_config) for port_config in self.config.ports]
         self.main_session.update_ports(self.ports)
@@ -266,7 +270,9 @@  def _setup_hugepages(self) -> None:
         """
         if self.config.hugepages:
             self.main_session.setup_hugepages(
-                self.config.hugepages.amount, self.config.hugepages.force_first_numa
+                self.config.hugepages.amount,
+                self._hugepage_default_size,
+                self.config.hugepages.force_first_numa,
             )
 
     def configure_port_state(self, port: Port, enable: bool = True) -> None:
diff --git a/dts/framework/testbed_model/os_session.py b/dts/framework/testbed_model/os_session.py
index d5bf7e0401..5d58400cbe 100644
--- a/dts/framework/testbed_model/os_session.py
+++ b/dts/framework/testbed_model/os_session.py
@@ -345,7 +345,9 @@  def get_dpdk_file_prefix(self, dpdk_prefix: str) -> str:
         """
 
     @abstractmethod
-    def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
+    def setup_hugepages(
+        self, hugepage_count: int, hugepage_size: int, force_first_numa: bool
+    ) -> None:
         """Configure hugepages on the node.
 
         Get the node's Hugepage Size, configure the specified count of hugepages
@@ -353,6 +355,7 @@  def setup_hugepages(self, hugepage_count: int, force_first_numa: bool) -> None:
 
         Args:
             hugepage_count: Configure this many hugepages.
+            hugepage_size: Configure hugepages of this size (currently not used in the config)
             force_first_numa:  If :data:`True`, configure hugepages just on the first numa node.
         """