Using Bash to automate Rubocop fixes

At my day job, we have a medium-sized Rails project that powers various internal tools. The engineering team sat down a few months ago to align along code formatting and style guidelines for this project in an effort to encode and enforce them with the use of Rubocop, a linter for Ruby projects. Having a standard code style has many benefits: it introduces consistency across the project making for a more ergonomic experience for developers, it lessens the amount of time spent addressing style discussions in PRs, and it adds guardrails for new programmers to the project.

As the style guide was applied a year or two into the lifetime of the project, various parts of the project were in violation of our new style guide. Fixing parts of the codebase in violation of the style guide has been a slow burn for the engineering team. Updating the style for existing code has usually happened as a side effect from other, more tangible changes to the project. While we eventually want to achieve full compliance, we haven’t spent dedicated cycles on fixing outstanding issues.

Helpfully, Rubocop includes an “autocorrect” feature for a subset of style violations. This could be leveraged to accelerate style guide adherence. By letting Rubocop do a portion of the work, developers don’t have to spend as much time manually fixing out-of-date styles.

I quickly cooked up a script that finds all “autocorrectable” style violations, fixes them across the project, commits and pushes them, and then opens up a pull request.

Editor’s Note When I wrote this post, I thought tac was a command included with OS X. Turns out, it’s not, and I must’ve installed it in the past. If you need to get tac and you’re on OS X, try following this helpful SO question.

#!/usr/bin/env bash
#
# This script will run `rubocop` for a specific directory, find the "worst" offense, autocorrect offenses, and then open a PR
# This requires `hub`, which can be installed on macOS via `brew install hub`
#
# Usage: ./open-next-rubocop-pr <directory>
# Example: ./open-next-rubocop-pr spec
set -euo pipefail

if [ -z "${1:-}" ]
then
  echo "Please pass the directory to run Rubocop in as the first argument"
  exit 1
fi
dir=$1

echo "Making sure that the repo is clean..."
git diff --exit-code
git diff --cached --exit-code

if [ $current_branch_hash != $default_branch_hash ]
then
  echo "Please run this on a branch that matches the default branch for the repo"
  exit 1
fi

echo "Getting all cops that can be autocorrected..."
autocorrectable_cops=$(rubocop --show-cops | grep "# Supports --auto-correct" -A1 -B0 | sed -e 's/\# Supports --auto-correct//' -e 's/--//' -e 's/://' -e '/^$/d')
non_only_cops=( Lint/UnneededCopDisableDirective Lint/UnneededCopEnableDirective )

for non_only_cop in "${non_only_cops[@]}"
do
  autocorrectable_cops=$(echo "$autocorrectable_cops" | sed "s~$non_only_cop~~")
done
autocorrectable_cops=$(echo "$autocorrectable_cops" | sed '/^$/d')

echo "Getting all cops present in $dir..."
rubocop_output=$(rubocop $dir --format o || true) # rubocop returns a non-success exit code if there are violations; `|| true` prevents `set -e` from exiting this script
offenses_and_counts=$(echo "$rubocop_output" | tac | tail -n +3 | tac)
offenses=$(echo "$offenses_and_counts" | sed -e 's/.*  //' -e 's/-//g' -e '/^$/d')

echo "Getting a cop to fix..."
autocorrectable_offenses=$(comm -1 -2 <(echo "$offenses" | sort) <(echo "$autocorrectable_cops" | sort))
offense=$(echo "$autocorrectable_offenses" | head -n 1)

echo "Checking out a new branch..."
branch_name="rubocop-${offense}"
git checkout -b $branch_name

echo "Autocorrecting $offense..."
rubocop $dir --format q --only $offense -a

echo "Committing..."
for file in $(git diff --name-only)
do
  git add $file
  git commit -m "[rubocop] fix $offense in $file"
done

echo "Pushing..."
git push -u origin $branch_name
hub pull-request -m "[rubocop] fix $offense" -l rubocop
echo "All done!"

echo "Making sure that the repo is clean..."

We want to make sure that the local repo is clean. This is mostly a sanity check. We want to ensure that our Rubocop changes are performed in isolation so that the eventual PR that’s opened is clear and that the description is correct. As this script runs end-to-end from enacting the changes to opening a PR, we don’t want to entangle other changes to the repository.

echo "Getting all cops that can be autocorrected..."

I wasn’t able to find an easy, machine-readable way via the Rubocop CLI to list all cops that can be autocorrected. Rubocop does expose a --show-cops flag that includes information about whether or not a cop is autocorrectable, though obtaining that information requires a bit more output manipulation than I would’ve liked. For example, below is a portion of the output from rubocop --show-cops:

# Supports --auto-correct
Style/Strip:
  Description: Use `strip` instead of `lstrip.rstrip`.
  Enabled: true
  VersionAdded: '0.36'

Style/StructInheritance:
  Description: Checks for inheritance from Struct.new.
  StyleGuide: "#no-extend-struct-new"
  Enabled: true
  VersionAdded: '0.29'

Style/Strip is prepended with the line # Supports --auto-correct, indicating it’s autocorrectable. Style/StructInheritance isn’t prepended with anything, so it can’t be autocorrected.

I’m able to get a list of the violation names (i.e., Style/Strip) using a combination of grep and sed. grep uses the -A1 argument to indicate that the result should include 1 trailing line after the match, where the -B0 argument ensures 0 lines of leading context are included.

The sed command within the loop is a little funky too. Here, sed uses ~ as a delimiter instead of / for the substitute command, so that the value of the variable $non_only_cop can be used as it will include slashes (such as Lint/UselessAssignment) and would otherwise be misinterpreted by sed.

There are also some cops that can be autocorrected but can’t be run in isolation. This is an issue, because I want to only fix a single violation per PR. These are Lint/UnneededCopDisableDirective and Lint/UnneededCopEnableDirective, which both seem to point out when a comment enabling or disabling a cop is unnecessary. My guess is that when these are run in isolation, they can’t work because they don’t know whats unnecessary.

echo "Getting all cops present in $dir..."

The project has a configuration specified in rubocop.yml that enables a certain set of style rules that our project follows. In order to know which cops to fix, we needed to find which violations were present in the project. By default, Rubocop will use this configuration file to only report violations of styles enabled for the project. This was obtained by just running rubocop on the directory and specifying the format of the output to just include a count of each offense (using the flag --format [o]ffenses). The violation names present could be gleaned from that output. Running rubocop --format o resulted in:

595/595 files |================================================================================== 100 ===================================================================================>| Time: 00:00:00

98   Lint/UselessAssignment
36   Style/BlockDelimiters
20   Layout/SpaceInsideHashLiteralBraces
10   Layout/EmptyLinesAroundBlockBody
... # more actual output
1    Style/Semicolon
1    Style/UnneededInterpolation
--
290  Total

I need to do this odd looking thing $(rubocop $dir --format o || true) because I’ve use set -e (along with some other settings), which you can find near the top of the file. set -e will cause a bash script to exit immediately if a command fails. rubocop returns a non-success (non-zero) exit code if there are violations. So, || true will prevent set -e from exiting the script when there are violations (and of course there are violations, that’s what we’re trying to fix with this script), as this option will know not to trigger on failing commands that are part of conditional statements. A little hacky, but setting the option provides a much safer environment to run the script in, as we an unexpected failure of one of our commands will stop the script from proceeding.

Using tac + tail removed non-violation rows and sed plucked the violation names.

echo "Getting a cop to fix..."

The script needs to find the intersection between violations present in the project and the autocorrectable violations, so that it can know which violation it can and should automatically fix. Having both pieces of information, comm was able to find the intersection. comm is a utility to find the intersection (or the symmetric difference) between two lists. One quirk of comm is that both input lists need to be sorted in the same manner, hence the usage of sort.

From that intersection list, head -n 1 just pops the first violation name into the $offense variable.

echo "Checking out a new branch..."

We want a new branch to make these changes on so that we can open up a PR.

echo "Autocorrecting $offense..."

The bulk of the work here. We know the violation exists in the project and we know that it can be fixed. The --only flag ignores all other violations. --format [q]uiet supresses Rubocop output so that the script isn’t too noisy. Last but not least, -a enables autocorrect, so that any violations discovered are fixed.

echo "Committing..."

Thanks to our earlier confirmation that the git repo is clean, we know that all changed files are Rubocop fixes. We just iterate through all modified files and place them in their own commit.

echo "Pushing..."

With our branch fixing all instances of the violation, we open a PR using hub. hub is a wrapper around git that allows for GitHub actions to be performed. In this case, we just want to open up a PR with a title identifying which violation the PR fixes. It also adds a “rubocop” label via -l.

echo "All done!"

This script comes with a few caveats. First, not all changes will be the optimal changes. For example, take this snippet of code, which has a Layout/AlignHash violation (sourced from an actual example in our project):

player.update!(name: 'Jason Varitek',
  position: 'Catcher')

Rubocop may autocorrect this like so:

player.update!(name: 'Jason Varitek',
               position: 'Catcher')

However, the preferred version by the team might be:

player.update!(
  name: 'Jason Varitek',
  position: 'Catcher'
)

So, there may be some manually inspection of changes after usage of this script. However, this script automates a ton of the work.

It’s much easier to look at the diff generated by the PR from this script and update areas that may be fixed in a better way than having to hunt down each instance, fix it, commit it, etc. This script offloads much of the busy work and leaves the non-trivial changes to the developer. Use of this script has certainly increased the velocity of our style improvements, allowing us to work toward our goal of total style guide adherence.