[v2,1/2] devtools: script to check meson indentation of lists

Message ID 20210426105403.226004-1-bruce.richardson@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series [v2,1/2] devtools: script to check meson indentation of lists |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Bruce Richardson April 26, 2021, 10:54 a.m. UTC
  This is a script to fix up minor formatting issues in meson files.
It scans for, and can optionally fix, indentation issues and missing
trailing commas in the lists in meson.build files. It also detects,
and can fix, multi-line lists where more than one entry appears on a
line.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 devtools/dpdk_meson_check.py | 125 +++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100755 devtools/dpdk_meson_check.py
  

Comments

Burakov, Anatoly April 26, 2021, 1:40 p.m. UTC | #1
On 26-Apr-21 11:54 AM, Bruce Richardson wrote:
> This is a script to fix up minor formatting issues in meson files.
> It scans for, and can optionally fix, indentation issues and missing
> trailing commas in the lists in meson.build files. It also detects,
> and can fix, multi-line lists where more than one entry appears on a
> line.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

<snip>

> +def split_code_comments(line):
> +    'splits a line into a code part and a comment part, returns (code, comment) tuple'
> +    if line.lstrip().startswith('#'):
> +        return ('', line)
> +    elif '#' in line and '#include' not in line:  # catch 99% of cases, not 100% > +        idx = line.index('#')
> +        while (line[idx - 1].isspace()):
> +            idx -= 1
> +        return line[:idx], line[idx:]


I think this could be simplified with regex:

# find any occurrences of '#' but only if it's not an '#include'
if not re.search(r'#(?!include)', line)
     return line, ''
return line.split('#', maxsplit=1)

> +    else:
> +        return (line, '')
> +
> +
> +def setline(contents, index, value):
> +    'sets the contents[index] to value. Returns the line, along with code and comments part'
> +    line = contents[index] = value
> +    code, comments = split_code_comments(line)
> +    return line, code, comments
> +
> +
> +def check_indentation(filename, contents):
> +    '''check that a list or files() is correctly indented'''
> +    infiles = False
> +    inlist = False
> +    edit_count = 0
> +    for lineno, line in enumerate(contents):
> +        code, comments = split_code_comments(line)

Nitpicking, but maybe instead of calling strip() all over the place, 
just count the number of spaces and strip right at the outset? E.g. 
something like:

stripped = code.strip()
line_indent = len(code) - len(stripped)

You can then reason about indent levels by comparing stripped to code 
afterwards, and avoid doing this:

> +            # skip further subarrays or lists
> +            if '[' in code or ']' in code:
> +                continue
> +            if not code.startswith(indent) or code[len(indent)] == ' ':

Opting to just check the indent size you calculated initially. Unless 
i'm missing something :)

You could also increment edit_count if `calculated indent + stripped` is 
equal to `code`. Seems easier logic than raw string manipulation you're 
going for here...

<snip>

> +def process_file(filename, fix):
> +    '''run checks on file "filename"'''
> +    if VERBOSE:
> +        print(f'Processing {filename}')
> +    with open(filename) as f:
> +        contents = [ln.rstrip() for ln in f.readlines()]

So any trailing whitespace gets automatically and silently fixed?

> +
> +    if check_indentation(filename, contents) > 0 and fix:
> +        print(f"Fixing {filename}")
> +        with open(filename, 'w') as f:
> +            f.writelines([f'{ln}\n' for ln in contents])

Something seems suspect here. So, if `fix` is *not* specified, the 
script just opens the file, reads it, and... does nothing else?
  
Bruce Richardson April 26, 2021, 2:05 p.m. UTC | #2
On Mon, Apr 26, 2021 at 02:40:25PM +0100, Burakov, Anatoly wrote:
> On 26-Apr-21 11:54 AM, Bruce Richardson wrote:
> > This is a script to fix up minor formatting issues in meson files.
> > It scans for, and can optionally fix, indentation issues and missing
> > trailing commas in the lists in meson.build files. It also detects,
> > and can fix, multi-line lists where more than one entry appears on a
> > line.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> <snip>
> 
> > +def split_code_comments(line):
> > +    'splits a line into a code part and a comment part, returns (code, comment) tuple'
> > +    if line.lstrip().startswith('#'):
> > +        return ('', line)
> > +    elif '#' in line and '#include' not in line:  # catch 99% of cases, not 100% > +        idx = line.index('#')
> > +        while (line[idx - 1].isspace()):
> > +            idx -= 1
> > +        return line[:idx], line[idx:]
> 
> 
> I think this could be simplified with regex:
> 
> # find any occurrences of '#' but only if it's not an '#include'
> if not re.search(r'#(?!include)', line)
>     return line, ''
> return line.split('#', maxsplit=1)

Not sure that is simpler, and just splitting on '#' is actually not what we
want either.

Firstly, while r'#(?!include)' is not a massively complex regex, just
checking for "'#' in line and '#include' not in line" is just easier to
read for most mortals.

In terms of the split, I did initially do as you have here and split on
'#', but we don't actually want that, because we want to preserve the
whitespace in the line before the comment too - as part of the comment, not
the code. This is why after finding the '#' we walk backwards to find the
end of the code and find that as the split point. It then saves us worrying
about any strips() breaking any comment alignment the user has explicitly
set up. Not using split also means that we can just merge the strings back
with '+' rather than having to use "'#'.join()".

> 
> > +    else:
> > +        return (line, '')
> > +
> > +
> > +def setline(contents, index, value):
> > +    'sets the contents[index] to value. Returns the line, along with code and comments part'
> > +    line = contents[index] = value
> > +    code, comments = split_code_comments(line)
> > +    return line, code, comments
> > +
> > +
> > +def check_indentation(filename, contents):
> > +    '''check that a list or files() is correctly indented'''
> > +    infiles = False
> > +    inlist = False
> > +    edit_count = 0
> > +    for lineno, line in enumerate(contents):
> > +        code, comments = split_code_comments(line)
> 
> Nitpicking, but maybe instead of calling strip() all over the place, just
> count the number of spaces and strip right at the outset? E.g. something
> like:
> 
> stripped = code.strip()
> line_indent = len(code) - len(stripped)
> 
> You can then reason about indent levels by comparing stripped to code
> afterwards, and avoid doing this:
> 
> > +            # skip further subarrays or lists
> > +            if '[' in code or ']' in code:
> > +                continue
> > +            if not code.startswith(indent) or code[len(indent)] == ' ':
> 
> Opting to just check the indent size you calculated initially. Unless i'm
> missing something :)
> 
> You could also increment edit_count if `calculated indent + stripped` is
> equal to `code`. Seems easier logic than raw string manipulation you're
> going for here...
> 
> <snip>

Interesting. That could be a good approach alright. If I do a V3 (not
guaranteed for this release) I can try taking that idea on board.

> 
> > +def process_file(filename, fix):
> > +    '''run checks on file "filename"'''
> > +    if VERBOSE:
> > +        print(f'Processing {filename}')
> > +    with open(filename) as f:
> > +        contents = [ln.rstrip() for ln in f.readlines()]
> 
> So any trailing whitespace gets automatically and silently fixed?
> 
Hadn't actually thought of that, but yes, that will happen if --fix is
given and other changes are made to the file. Ideally, that should be fixed
to "non-silently" do so, but I'd view it as low priority since other tools
tend to be good at flagging trailing whitespace issues anyway.

> > +
> > +    if check_indentation(filename, contents) > 0 and fix:
> > +        print(f"Fixing {filename}")
> > +        with open(filename, 'w') as f:
> > +            f.writelines([f'{ln}\n' for ln in contents])
> 
> Something seems suspect here. So, if `fix` is *not* specified, the script
> just opens the file, reads it, and... does nothing else?
> 
No, it prints out all the errors without actually fixing them.

Regards,
/Bruce
  
Burakov, Anatoly April 26, 2021, 2:48 p.m. UTC | #3
On 26-Apr-21 3:05 PM, Bruce Richardson wrote:
> On Mon, Apr 26, 2021 at 02:40:25PM +0100, Burakov, Anatoly wrote:
>> On 26-Apr-21 11:54 AM, Bruce Richardson wrote:
>>> This is a script to fix up minor formatting issues in meson files.
>>> It scans for, and can optionally fix, indentation issues and missing
>>> trailing commas in the lists in meson.build files. It also detects,
>>> and can fix, multi-line lists where more than one entry appears on a
>>> line.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>
>> <snip>
>>
>>> +def split_code_comments(line):
>>> +    'splits a line into a code part and a comment part, returns (code, comment) tuple'
>>> +    if line.lstrip().startswith('#'):
>>> +        return ('', line)
>>> +    elif '#' in line and '#include' not in line:  # catch 99% of cases, not 100% > +        idx = line.index('#')
>>> +        while (line[idx - 1].isspace()):
>>> +            idx -= 1
>>> +        return line[:idx], line[idx:]
>>
>>
>> I think this could be simplified with regex:
>>
>> # find any occurrences of '#' but only if it's not an '#include'
>> if not re.search(r'#(?!include)', line)
>>      return line, ''
>> return line.split('#', maxsplit=1)
> 
> Not sure that is simpler, and just splitting on '#' is actually not what we
> want either.
> 
> Firstly, while r'#(?!include)' is not a massively complex regex, just
> checking for "'#' in line and '#include' not in line" is just easier to
> read for most mortals.
> 
> In terms of the split, I did initially do as you have here and split on
> '#', but we don't actually want that, because we want to preserve the
> whitespace in the line before the comment too - as part of the comment, not
> the code. This is why after finding the '#' we walk backwards to find the
> end of the code and find that as the split point. It then saves us worrying
> about any strips() breaking any comment alignment the user has explicitly
> set up. Not using split also means that we can just merge the strings back
> with '+' rather than having to use "'#'.join()".

Fair enough so!

> 
>>
>>> +    else:
>>> +        return (line, '')
>>> +
>>> +
>>> +def setline(contents, index, value):
>>> +    'sets the contents[index] to value. Returns the line, along with code and comments part'
>>> +    line = contents[index] = value
>>> +    code, comments = split_code_comments(line)
>>> +    return line, code, comments
>>> +
>>> +
>>> +def check_indentation(filename, contents):
>>> +    '''check that a list or files() is correctly indented'''
>>> +    infiles = False
>>> +    inlist = False
>>> +    edit_count = 0
>>> +    for lineno, line in enumerate(contents):
>>> +        code, comments = split_code_comments(line)
>>
>> Nitpicking, but maybe instead of calling strip() all over the place, just
>> count the number of spaces and strip right at the outset? E.g. something
>> like:
>>
>> stripped = code.strip()
>> line_indent = len(code) - len(stripped)
>>
>> You can then reason about indent levels by comparing stripped to code
>> afterwards, and avoid doing this:
>>
>>> +            # skip further subarrays or lists
>>> +            if '[' in code or ']' in code:
>>> +                continue
>>> +            if not code.startswith(indent) or code[len(indent)] == ' ':
>>
>> Opting to just check the indent size you calculated initially. Unless i'm
>> missing something :)
>>
>> You could also increment edit_count if `calculated indent + stripped` is
>> equal to `code`. Seems easier logic than raw string manipulation you're
>> going for here...
>>
>> <snip>
> 
> Interesting. That could be a good approach alright. If I do a V3 (not
> guaranteed for this release) I can try taking that idea on board.
> 
>>
>>> +def process_file(filename, fix):
>>> +    '''run checks on file "filename"'''
>>> +    if VERBOSE:
>>> +        print(f'Processing {filename}')
>>> +    with open(filename) as f:
>>> +        contents = [ln.rstrip() for ln in f.readlines()]
>>
>> So any trailing whitespace gets automatically and silently fixed?
>>
> Hadn't actually thought of that, but yes, that will happen if --fix is
> given and other changes are made to the file. Ideally, that should be fixed
> to "non-silently" do so, but I'd view it as low priority since other tools
> tend to be good at flagging trailing whitespace issues anyway.

Yeah, i think there's not a lot of trailing whitespace there in the 
first place, so probably a non issue.

> 
>>> +
>>> +    if check_indentation(filename, contents) > 0 and fix:
>>> +        print(f"Fixing {filename}")
>>> +        with open(filename, 'w') as f:
>>> +            f.writelines([f'{ln}\n' for ln in contents])
>>
>> Something seems suspect here. So, if `fix` is *not* specified, the script
>> just opens the file, reads it, and... does nothing else?
>>
> No, it prints out all the errors without actually fixing them.
> 

Oh, right, check_indentation will run first. Never mind!

In any case,

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Thomas Monjalon May 4, 2021, 1:05 p.m. UTC | #4
26/04/2021 12:54, Bruce Richardson:
> This is a script to fix up minor formatting issues in meson files.
> It scans for, and can optionally fix, indentation issues and missing
> trailing commas in the lists in meson.build files. It also detects,
> and can fix, multi-line lists where more than one entry appears on a
> line.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  devtools/dpdk_meson_check.py | 125 +++++++++++++++++++++++++++++++++++

Renamed to check-meson.py and added in MAINTAINERS.

Series applied, thanks.
  
David Marchand May 4, 2021, 1:34 p.m. UTC | #5
On Tue, May 4, 2021 at 3:05 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 26/04/2021 12:54, Bruce Richardson:
> > This is a script to fix up minor formatting issues in meson files.
> > It scans for, and can optionally fix, indentation issues and missing
> > trailing commas in the lists in meson.build files. It also detects,
> > and can fix, multi-line lists where more than one entry appears on a
> > line.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  devtools/dpdk_meson_check.py | 125 +++++++++++++++++++++++++++++++++++
>
> Renamed to check-meson.py and added in MAINTAINERS.
>
> Series applied, thanks.

The default behavior is not that friendly (to me).
I have build directories in my sources with stuff from older branches
where the coding fixes are not backported.
This results in the check complaining on all those directories.

Can we have a default behavior where only committed files are considered?
Maybe filtering with git ls-files **/meson.build.
  

Patch

diff --git a/devtools/dpdk_meson_check.py b/devtools/dpdk_meson_check.py
new file mode 100755
index 000000000..29f788796
--- /dev/null
+++ b/devtools/dpdk_meson_check.py
@@ -0,0 +1,125 @@ 
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+
+'''
+A Python script to run some checks on meson.build files in DPDK
+'''
+
+import sys
+import os
+from os.path import relpath, join
+from argparse import ArgumentParser
+
+VERBOSE = False
+
+
+def scan_dir(path):
+    '''return meson.build files found in path'''
+    for root, dirs, files in os.walk(path):
+        if 'meson.build' in files:
+            yield(relpath(join(root, 'meson.build')))
+
+
+def split_code_comments(line):
+    'splits a line into a code part and a comment part, returns (code, comment) tuple'
+    if line.lstrip().startswith('#'):
+        return ('', line)
+    elif '#' in line and '#include' not in line:  # catch 99% of cases, not 100%
+        idx = line.index('#')
+        while (line[idx - 1].isspace()):
+            idx -= 1
+        return line[:idx], line[idx:]
+    else:
+        return (line, '')
+
+
+def setline(contents, index, value):
+    'sets the contents[index] to value. Returns the line, along with code and comments part'
+    line = contents[index] = value
+    code, comments = split_code_comments(line)
+    return line, code, comments
+
+
+def check_indentation(filename, contents):
+    '''check that a list or files() is correctly indented'''
+    infiles = False
+    inlist = False
+    edit_count = 0
+    for lineno, line in enumerate(contents):
+        code, comments = split_code_comments(line)
+        if not code.strip():
+            continue
+        if code.endswith('files('):
+            if infiles:
+                raise(f'Error parsing {filename}:{lineno}, got "files(" when already parsing files list')
+            if inlist:
+                print(f'Error parsing {filename}:{lineno}, got "files(" when already parsing array list')
+            infiles = True
+            indent_count = len(code) - len(code.lstrip(' '))
+            indent = ' ' * (indent_count + 8)  # double indent required
+        elif code.endswith('= ['):
+            if infiles:
+                raise(f'Error parsing {filename}:{lineno}, got start of array when already parsing files list')
+            if inlist:
+                print(f'Error parsing {filename}:{lineno}, got start of array when already parsing array list')
+            inlist = True
+            indent_count = len(code) - len(code.lstrip(' '))
+            indent = ' ' * (indent_count + 8)  # double indent required
+        elif infiles and (code.endswith(')') or code.strip().startswith(')')):
+            infiles = False
+            continue
+        elif inlist and (code.endswith(']') or code.strip().startswith(']')):
+            inlist = False
+            continue
+        elif inlist or infiles:
+            # skip further subarrays or lists
+            if '[' in code or ']' in code:
+                continue
+            if not code.startswith(indent) or code[len(indent)] == ' ':
+                print(f'Error: Incorrect indent at {filename}:{lineno + 1}')
+                line, code, comments = setline(contents, lineno, indent + line.strip())
+                edit_count += 1
+            if not code.endswith(','):
+                print(f'Error: Missing trailing "," in list at {filename}:{lineno + 1}')
+                line, code, comments = setline(contents, lineno, code + ',' + comments)
+                edit_count += 1
+            if len(code.split(',')) > 2:  # only one comma per line
+                print(f'Error: multiple entries per line in list at {filename}:{lineno +1}')
+                entries = [e.strip() for e in code.split(',') if e.strip()]
+                line, code, comments = setline(contents, lineno,
+                                               indent + (',\n' + indent).join(entries) +
+                                               ',' + comments)
+                edit_count += 1
+    return edit_count
+
+
+def process_file(filename, fix):
+    '''run checks on file "filename"'''
+    if VERBOSE:
+        print(f'Processing {filename}')
+    with open(filename) as f:
+        contents = [ln.rstrip() for ln in f.readlines()]
+
+    if check_indentation(filename, contents) > 0 and fix:
+        print(f"Fixing {filename}")
+        with open(filename, 'w') as f:
+            f.writelines([f'{ln}\n' for ln in contents])
+
+
+def main():
+    '''parse arguments and then call other functions to do work'''
+    global VERBOSE
+    parser = ArgumentParser(description='Run syntax checks on DPDK meson.build files')
+    parser.add_argument('-d', metavar='directory', default='.', help='Directory to process')
+    parser.add_argument('--fix', action='store_true', help='Attempt to fix errors')
+    parser.add_argument('-v', action='store_true', help='Verbose output')
+    args = parser.parse_args()
+
+    VERBOSE = args.v
+    for f in scan_dir(args.d):
+        process_file(f, args.fix)
+
+
+if __name__ == "__main__":
+    main()