[v1,1/5] devtools: script to remove unused headers includes

Message ID 20211004101058.2396458-2-sean.morrissey@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series introduce IWYU |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-testing warning apply patch failure

Commit Message

Sean Morrissey Oct. 4, 2021, 10:10 a.m. UTC
  This script can be used for removing headers flagged for removal by the
include-what-you-use (IWYU) tool. The script has the ability to remove
headers from specified sub-directories or dpdk as a whole.

example usages:

Remove headers flagged by iwyu_tool output file
$ ./devtools/process_iwyu.py iwyu.out -b build

Remove headers flagged by iwyu_tool output file from sub-directory
$ ./devtools/process_iwyu.py iwyu.out -b build -d lib/kvargs

Remove headers directly piped from the iwyu_tool
$ iwyu_tool -p build | ./devtools/process_iwyu.py - -b build

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
---
 devtools/process_iwyu.py | 109 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100755 devtools/process_iwyu.py
  

Comments

Bruce Richardson Oct. 4, 2021, 3:07 p.m. UTC | #1
On Mon, Oct 04, 2021 at 10:10:54AM +0000, Sean Morrissey wrote:
> This script can be used for removing headers flagged for removal by the
> include-what-you-use (IWYU) tool. The script has the ability to remove
> headers from specified sub-directories or dpdk as a whole.
> 
Since it also is importing meson and calling "meson compile" it appears to
be testing the build after each removal too. I think this should be called
out, to make it clear it's not a "blind" removal of headers.

Further review comments inline below.

/Bruce

> example usages:
> 
> Remove headers flagged by iwyu_tool output file
> $ ./devtools/process_iwyu.py iwyu.out -b build
> 
> Remove headers flagged by iwyu_tool output file from sub-directory
> $ ./devtools/process_iwyu.py iwyu.out -b build -d lib/kvargs
> 
> Remove headers directly piped from the iwyu_tool
> $ iwyu_tool -p build | ./devtools/process_iwyu.py - -b build
> 
> Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
> Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
> ---
>  devtools/process_iwyu.py | 109 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 109 insertions(+)
>  create mode 100755 devtools/process_iwyu.py
> 
> diff --git a/devtools/process_iwyu.py b/devtools/process_iwyu.py
> new file mode 100755
> index 0000000000..ddc4ceafa4
> --- /dev/null
> +++ b/devtools/process_iwyu.py
> @@ -0,0 +1,109 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 Intel Corporation
> +#
> +
> +import argparse
> +import fileinput
> +import sys
> +from os.path import abspath, relpath, join
> +from pathlib import Path
> +from mesonbuild import mesonmain
> +
> +def args_parse():
> +    parser = argparse.ArgumentParser(description='This script can be used to remove includes which are not in use\n')
> +    parser.add_argument('-b', '--build_dir', type=str, help='Name of the build directory in which the IWYU tool was used in.', default="build")
> +    parser.add_argument('-d', '--sub_dir', type=str, help='The sub-directory to remove headers from.', default="")
> +    parser.add_argument('file', type=Path, help='The path to the IWYU log file or output from stdin.')
> +

These lines are all very long. While the text strings shouldn't be split
across lines, you can break the line across multiple ones between
parameters. I suggest checking this whole script using "flake8" to check
for style errors.

> +    args = parser.parse_args()
> +
> +    return args
> +

"args" is unneeded here. "return parse.parse_args()" is shorter. :-)

> +
> +def run_meson(args):
> +    "Runs a meson command logging output to process.log"
> +    with open('process.log', 'a') as sys.stdout:
> +        ret = mesonmain.run(args, abspath('meson'))
> +    sys.stdout = sys.__stdout__
> +    return ret
> +

I think process.log should be renamed to "process_iwyu.log" to match the
script name.
Also, it's nice to see a few functions like this with a docstring at the
start. It would be good to have a one-line summary at the start of every
fn in this file.

> +
> +def remove_includes(filename, include, dpdk_dir, build_dir):
> +    # Load in file - readlines()
> +    # loop through list once in mem -> make cpy of list with line removed
> +    # write cpy  -> stored in memory so write cpy to file then check
> +    # run test build -> call ninja on the build folder, ninja -C build, subprocess
You actually call "meson compile" rather than ninja -C build. Please make
sure the comments match the code as the code is reworked.

If you take the approach of adding a one-line string at the start of each
function, I'd suggest splitting up this comment into smaller comments
spread throughout the code, explaining each short block as it appears.
[Though I see it's reasonably well commented below as-is]

> +    # if fails -> write original back to file otherwise continue on
> +    # newlist = [ln for ln in lines if not ln.startswith(...)] filters out one element
> +    filepath = filename
> +
> +    with open(filepath, 'r+') as f:
> +        lines = f.readlines()  # Read lines when file is opened
> +
> +    with open(filepath, 'w') as f:
> +        for ln in lines:  # Removes the include passed in
> +            if ln.strip("\n") != include:
Strip without any parameters removes all whitespace, which is probably ok
here, so drop the explicit "\n".

> +                f.write(ln)
> +
> +    ret = run_meson(['compile', '-C', join(dpdk_dir, build_dir)])
> +    if (ret == 0):  # Include is not needed -> build is successful
> +        print('SUCCESS')
> +    else:
> +        # failed, catch the error
> +        # return file to original state
> +        with open(filepath, 'w') as f:
> +            f.writelines(lines)
> +            print('FAILED')
> +
> +
> +def get_build_config(builddir, condition):
> +    "returns contents of rte_build_config.h"
> +    with open(join(builddir, 'rte_build_config.h')) as f:
> +        return [ln for ln in f.readlines() if condition(ln)]
> +
> +
> +def uses_libbsd(builddir):
> +    "return whether the build uses libbsd or not"
> +    return bool(get_build_config(builddir, lambda ln: 'RTE_USE_LIBBSD' in ln))
> +
> +
> +def process(args):
> +    filename = None
> +    build_dir = args.build_dir
> +    dpdk_dir = abspath(__file__).split('/devtools')[0]
This assumes that we are using the script from the devtools dir of the copy
of DPDK we want to edit. That's probably a fair enough assumption and good
enough, though I'd like more input on it.
The other alternative is to work of the build dir, which will have the
paths to the DPDK source code directory in it.  Running "meson configure"
in the build dir will print out the source directory location at the start:

$ meson configure | head
Core properties:
  Source dir /home/bruce/dpdk.org
  Build dir  /home/bruce/dpdk.org/build


> +    directory = args.sub_dir
> +    # Use stdin if no iwyu_tool out file given
> +    logfile = abspath(args.file) if str(args.file) != '-' else args.file

Not sure this line is necessary. Assuming that we don't chdir anywhere,
using args.file directly below in call to "fileinput.input" should work
perfectly.

> +
> +    keep_str_fns = uses_libbsd(join(dpdk_dir, build_dir)) # check for libbsd
> +    if keep_str_fns:
> +        print('Warning: libbsd is present, build will fail to detect incorrect removal of rte_string_fns.h',
> +              file=sys.stderr)
> +    run_meson(['configure', dpdk_dir + "/" + build_dir, '-Dwerror=true'])  # turn on werror
> +
> +    for line in fileinput.input(logfile):
> +        if 'should remove' in line:
> +            # If the file path in the iwyu_tool output is an absolute path
> +            # it means the file is outside of the dpdk directory, therefore ignore it
> +            # Also check to see if the file is within the specified sub directory
> +            if line.split()[0] != abspath(line.split()[0]) and directory in line.split()[0]:
> +                filename = relpath(join(build_dir, line.split()[0]))
> +        elif line.startswith('-') and filename:
> +            include = '#include ' + line.split()[2]
> +            print(f"Remove {include} from {filename} ... ", end='', flush=True)
> +            if keep_str_fns and '<rte_string_fns.h>' in include:
> +                print('skipped')
> +                continue
> +            remove_includes(filename, include, dpdk_dir, build_dir)
> +        else:
> +            filename = None
> +
> +
> +def main():
> +    args = args_parse()
> +    process(args)
> +
Nit: again, args variable is unnecessary, and you can shorten to single
line if you like "process(args_parse())"

> +
> +if __name__ == '__main__':
> +    main()
> -- 
> 2.25.1
>
  
Thomas Monjalon Oct. 6, 2021, 1:37 p.m. UTC | #2
04/10/2021 12:10, Sean Morrissey:
> This script can be used for removing headers flagged for removal by the
> include-what-you-use (IWYU) tool. The script has the ability to remove
> headers from specified sub-directories or dpdk as a whole.

My fear is that it could flag an include as useless,
while it is required for a different system (BSD, Windows).
  
Sean Morrissey Oct. 6, 2021, 3:19 p.m. UTC | #3
On 06/10/2021 14:37, Thomas Monjalon wrote:
> 04/10/2021 12:10, Sean Morrissey:
>> This script can be used for removing headers flagged for removal by the
>> include-what-you-use (IWYU) tool. The script has the ability to remove
>> headers from specified sub-directories or dpdk as a whole.
> My fear is that it could flag an include as useless,
> while it is required for a different system (BSD, Windows).
>
>
Hi Thomas,

Stephen raised a similar concern. Please see Bruce's reply here: 
https://patches.dpdk.org/project/dpdk/cover/20211004101058.2396458-1-sean.morrissey@intel.com/

Thanks,

Sean.
  
Bruce Richardson Oct. 6, 2021, 4:28 p.m. UTC | #4
On Wed, Oct 06, 2021 at 04:19:46PM +0100, Morrissey, Sean wrote:
> 
> On 06/10/2021 14:37, Thomas Monjalon wrote:
> > 04/10/2021 12:10, Sean Morrissey:
> > > This script can be used for removing headers flagged for removal by the
> > > include-what-you-use (IWYU) tool. The script has the ability to remove
> > > headers from specified sub-directories or dpdk as a whole.
> > My fear is that it could flag an include as useless,
> > while it is required for a different system (BSD, Windows).
> > 
> > 
> Hi Thomas,
> 
> Stephen raised a similar concern. Please see Bruce's reply here: https://patches.dpdk.org/project/dpdk/cover/20211004101058.2396458-1-sean.morrissey@intel.com/
> 
One additional data point. Checking the output of a run I did of IWYU a
month or two ago, shows 2254 header includes that IWYU thinks should be
removed. With that sort of scale of an issue, I think automation will be
needed.

/Bruce
  
Thomas Monjalon Oct. 6, 2021, 4:44 p.m. UTC | #5
06/10/2021 18:28, Bruce Richardson:
> On Wed, Oct 06, 2021 at 04:19:46PM +0100, Morrissey, Sean wrote:
> > 
> > On 06/10/2021 14:37, Thomas Monjalon wrote:
> > > 04/10/2021 12:10, Sean Morrissey:
> > > > This script can be used for removing headers flagged for removal by the
> > > > include-what-you-use (IWYU) tool. The script has the ability to remove
> > > > headers from specified sub-directories or dpdk as a whole.
> > > My fear is that it could flag an include as useless,
> > > while it is required for a different system (BSD, Windows).
> > > 
> > > 
> > Hi Thomas,
> > 
> > Stephen raised a similar concern. Please see Bruce's reply here: https://patches.dpdk.org/project/dpdk/cover/20211004101058.2396458-1-sean.morrissey@intel.com/
> > 
> One additional data point. Checking the output of a run I did of IWYU a
> month or two ago, shows 2254 header includes that IWYU thinks should be
> removed. With that sort of scale of an issue, I think automation will be
> needed.

I agree to integrate this tool, but we must make clear there are false positives.
Please add a warning when running it.
  

Patch

diff --git a/devtools/process_iwyu.py b/devtools/process_iwyu.py
new file mode 100755
index 0000000000..ddc4ceafa4
--- /dev/null
+++ b/devtools/process_iwyu.py
@@ -0,0 +1,109 @@ 
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+#
+
+import argparse
+import fileinput
+import sys
+from os.path import abspath, relpath, join
+from pathlib import Path
+from mesonbuild import mesonmain
+
+def args_parse():
+    parser = argparse.ArgumentParser(description='This script can be used to remove includes which are not in use\n')
+    parser.add_argument('-b', '--build_dir', type=str, help='Name of the build directory in which the IWYU tool was used in.', default="build")
+    parser.add_argument('-d', '--sub_dir', type=str, help='The sub-directory to remove headers from.', default="")
+    parser.add_argument('file', type=Path, help='The path to the IWYU log file or output from stdin.')
+
+    args = parser.parse_args()
+
+    return args
+
+
+def run_meson(args):
+    "Runs a meson command logging output to process.log"
+    with open('process.log', 'a') as sys.stdout:
+        ret = mesonmain.run(args, abspath('meson'))
+    sys.stdout = sys.__stdout__
+    return ret
+
+
+def remove_includes(filename, include, dpdk_dir, build_dir):
+    # Load in file - readlines()
+    # loop through list once in mem -> make cpy of list with line removed
+    # write cpy  -> stored in memory so write cpy to file then check
+    # run test build -> call ninja on the build folder, ninja -C build, subprocess
+    # if fails -> write original back to file otherwise continue on
+    # newlist = [ln for ln in lines if not ln.startswith(...)] filters out one element
+    filepath = filename
+
+    with open(filepath, 'r+') as f:
+        lines = f.readlines()  # Read lines when file is opened
+
+    with open(filepath, 'w') as f:
+        for ln in lines:  # Removes the include passed in
+            if ln.strip("\n") != include:
+                f.write(ln)
+
+    ret = run_meson(['compile', '-C', join(dpdk_dir, build_dir)])
+    if (ret == 0):  # Include is not needed -> build is successful
+        print('SUCCESS')
+    else:
+        # failed, catch the error
+        # return file to original state
+        with open(filepath, 'w') as f:
+            f.writelines(lines)
+            print('FAILED')
+
+
+def get_build_config(builddir, condition):
+    "returns contents of rte_build_config.h"
+    with open(join(builddir, 'rte_build_config.h')) as f:
+        return [ln for ln in f.readlines() if condition(ln)]
+
+
+def uses_libbsd(builddir):
+    "return whether the build uses libbsd or not"
+    return bool(get_build_config(builddir, lambda ln: 'RTE_USE_LIBBSD' in ln))
+
+
+def process(args):
+    filename = None
+    build_dir = args.build_dir
+    dpdk_dir = abspath(__file__).split('/devtools')[0]
+    directory = args.sub_dir
+    # Use stdin if no iwyu_tool out file given
+    logfile = abspath(args.file) if str(args.file) != '-' else args.file
+
+    keep_str_fns = uses_libbsd(join(dpdk_dir, build_dir)) # check for libbsd
+    if keep_str_fns:
+        print('Warning: libbsd is present, build will fail to detect incorrect removal of rte_string_fns.h',
+              file=sys.stderr)
+    run_meson(['configure', dpdk_dir + "/" + build_dir, '-Dwerror=true'])  # turn on werror
+
+    for line in fileinput.input(logfile):
+        if 'should remove' in line:
+            # If the file path in the iwyu_tool output is an absolute path
+            # it means the file is outside of the dpdk directory, therefore ignore it
+            # Also check to see if the file is within the specified sub directory
+            if line.split()[0] != abspath(line.split()[0]) and directory in line.split()[0]:
+                filename = relpath(join(build_dir, line.split()[0]))
+        elif line.startswith('-') and filename:
+            include = '#include ' + line.split()[2]
+            print(f"Remove {include} from {filename} ... ", end='', flush=True)
+            if keep_str_fns and '<rte_string_fns.h>' in include:
+                print('skipped')
+                continue
+            remove_includes(filename, include, dpdk_dir, build_dir)
+        else:
+            filename = None
+
+
+def main():
+    args = args_parse()
+    process(args)
+
+
+if __name__ == '__main__':
+    main()