Bulk refactoring files with regular expressions

The other week at work, my team decided to adopt the airbnb/javascript conventions for our existing monorepo. That committed us to a lot of refactoring, since we had, somewhat arbitrarily, been using kebab-case for all of our TypeScript filenames. Obviously, it would have been better if we had committed to some coding conventions well ahead of time, but that’s not the point of this post. Most IDEs have refactoring tools that can change all references for updated filenames or variable names, but as a NeoVim user, I don’t use a standard IDE. Even with an IDE, do they offer the ability to rename files using regex? I’m not sure I’ve come across one yet, but that’s not to say that they don’t exist. I know that renaming files in VSCode automatically updates any relevant import statements. That’s pretty nice, but it still requires typing the change manually for each and every file, which is tedious, time-consuming, and error-prone.

One of my fellow NeoVim-loving coworkers recently told me about a plugin called oil.nvim that re-imagines the Vim file exploring experience by utilizing the buffer for viewing and managing files one directory at a time. The power of this is the ability to edit files and folders as normal text just like any other buffer, so all of your existing plugins that aid that action are now just as useful for managing files and folders. Backspace allows you to move up the file tree, and Enter allows you to move down the tree when your cursor is on a directory. On a file, Enter simply opens the file in a buffer as you would likely expect.

You can probably see where some of this is going: with oil.nvim, I am able to use the built-in :substitute command with regex to change the names of a bunch of files at once without having to type the names of each file.

:%s/-\([a-z]\)/\u\1/g


That’s it!

Let me explain what this command is actually doing.

element        description
:s             shorthand for :substitute, which is the built-in find-replace in vi
%              apply to all lines within the buffer (all files and folders in this case)
-\([a-z]\)     LHS: match every dash (-) followed by a lowercase letter; capture that letter
\u\1           RHS: make uppercase the first letter of the first capture group
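
To see the same substitution outside of the editor, here is a minimal sketch that pipes a few sample names through GNU sed. It assumes GNU sed is reachable as gsed or as plain sed, and the helper name kebab_to_camel is hypothetical. It uses -E, so the parentheses need no backslashes, unlike in Vim.

```shell
# Assumption: GNU sed is installed as `gsed` (Homebrew) or is the system `sed`.
# BSD/macOS sed does not understand \u, so this sketch will not work with it.
SED="$(command -v gsed || command -v sed)"

# Hypothetical helper: convert one kebab-case name to camelCase.
kebab_to_camel() {
  # -([a-z]) captures the letter after each dash; \u\1 emits it uppercased.
  printf '%s\n' "$1" | "$SED" -E 's/-([a-z])/\u\1/g'
}

kebab_to_camel 'my-controller.ts'       # prints myController.ts
kebab_to_camel 'their-service.spec.ts'  # prints theirService.spec.ts
```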

In order to change all of the necessary files, I simply have to navigate into each directory within the file structure and execute that command. It’s not perfectly automated, but it automates the more tedious part, which saves me quite a lot of time, and our directory structure is neither broad nor deep. After I’ve executed the above command once, NeoVim remembers it, so in each subsequent directory I can repeat it quickly from the command-line history (or with @:).
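
For a broader or deeper tree, the directory-by-directory step could be scripted instead. This is not what I did; it is a rough sketch of an alternative that swaps oil.nvim for a find loop. The helper name rename_kebab and its dry-run/apply argument are hypothetical, it assumes GNU sed, and it assumes no filename contains a newline.

```shell
# Assumption: GNU sed is installed as `gsed` (Homebrew) or is the system `sed`.
SED="$(command -v gsed || command -v sed)"

# Hypothetical helper: rename kebab-case files and folders under a root directory.
# Usage: rename_kebab <root> [apply]   (without "apply" it only prints the mv commands)
rename_kebab() {
  root="$1"
  mode="${2:-dry-run}"
  # -depth visits children before their parent directory, so a path is
  # still valid at the moment it gets renamed.
  find "$root" -mindepth 1 -depth -name '*-*' ! -path '*/node_modules/*' |
    while IFS= read -r path; do
      dir=$(dirname "$path")
      base=$(basename "$path")
      new=$(printf '%s\n' "$base" | "$SED" -E 's/-([a-z])/\u\1/g')
      [ "$base" = "$new" ] && continue
      if [ "$mode" = "apply" ]; then
        mv "$path" "$dir/$new"
      else
        printf 'mv %s %s\n' "$path" "$dir/$new"
      fi
    done
}
```

Running rename_kebab src first and reading the printed mv commands before running rename_kebab src apply keeps the same "look before you save" feel that the oil.nvim buffer gives you.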


This solves one of our problems, but we’re still left with a very obvious mismatched state: what about the imports within the files that reference these now-renamed files?

Solving this problem was equally simple, but it did require me to install a more modern version of sed (brew install gsed). The sed that ships with macOS can’t uppercase letters in the replacement the way :substitute does, but GNU sed can. With the new version of sed, I am now able to perform a similar substitution across a bunch of files.

gsed -r -i"" '/from ['"'"'"]\./s/-([a-z])/\u\1/g' **/*.ts

Again, that’s it.

element                  description
-r                       use extended regular expressions
-i""                     save the substitutions to the files in place without keeping backups (the shell collapses -i"" to plain -i, which GNU sed treats as “no backup suffix”)
/from ['"'"'"]\./        only match lines that have the text “from” followed by a single- or double-quote followed by a period
s/-([a-z])/\u\1/g        you should recognize this substitute from earlier, the only difference being the lack of backslash before the parentheses due to using extended regex
**/*.ts                  run gsed against all .ts files in this directory and all directories below it (the recursive ** works out of the box in zsh; bash 4+ needs shopt -s globstar)
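
Since -i edits files in place, it can be worth trying the script on a few sample lines first. The sketch below feeds three made-up import lines through the same script on stdin; the helper name rewrite_imports and the package name some-lib are hypothetical, and it assumes GNU sed. It uses -E, which GNU sed accepts as a synonym for -r.

```shell
# Assumption: GNU sed is installed as `gsed` (Homebrew) or is the system `sed`.
SED="$(command -v gsed || command -v sed)"

# Hypothetical helper: read source lines on stdin and print them with
# relative imports rewritten; nothing on disk is touched.
rewrite_imports() {
  "$SED" -E '/from ['"'"'"]\./s/-([a-z])/\u\1/g'
}

rewrite_imports <<'EOF'
import helper from 'some-lib'
import MyController from './controller/my-controller'
import TheirService from "../service/their-service"
EOF
```

The library import comes back untouched, while the two relative imports come back as './controller/myController' and "../service/theirService".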

The quote situation is chaotic due to how embedding quotes on the command line works in Unix. To explain what’s going on with the quotes after the from:
Because the gsed script is wrapped in single-quotes, it’s not simple to make the script itself contain a single-quote. To do so, the first single-quote closes the quoted string, and another quoted string has to start immediately so that the script keeps going. That next piece uses double-quotes, and it contains exactly one character: the single-quote we’re after. Generally, you don’t want to wrap scripts for Unix commands in double-quotes, because the shell performs variable expansion and command substitution inside them, so we keep the double-quoted piece as short as possible. With the single-quote included, a new single-quoted string carries the remainder of the script. It begins with a double-quote, which completes the character class of one single-quote and one double-quote.

Our codebase isn’t yet standardized (obviously…the point of this task), so some of our imports use single-quotes and some use double-quotes. For this reason, my regex has to account for both possibilities.

If I didn’t have to account for putting the script in the command line, it would look like this instead:

/from ['"]\./s/-([a-z])/\u\1/g

So, you can see how '"'"' converts to '. Really, I could have just used the dot wildcard to account for the single- or double-quote, and that would have really simplified the regex, but I wanted to be more precise with my regex.
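
One way to check the quoting, and one way to sidestep it, can be sketched as follows. The first helper just prints what the shell hands to the command after it processes the quotes. The second keeps the unquoted script in a temporary file, written through a quoted heredoc, and passes it with -f. The helper names and the temp file name are hypothetical, and it assumes GNU sed.

```shell
# Assumption: GNU sed is installed as `gsed` (Homebrew) or is the system `sed`.
SED="$(command -v gsed || command -v sed)"

# Hypothetical helper: show the address pattern as the command receives it.
quoted_pattern() {
  printf '%s\n' 'from ['"'"'"]\.'
}
quoted_pattern   # prints: from ['"]\.

# A quoted heredoc delimiter ('EOF') turns off all shell expansion,
# so the script can be written exactly as it reads.
script_file="$(mktemp "${TMPDIR:-/tmp}/kebab.XXXXXX")"
cat > "$script_file" <<'EOF'
/from ['"]\./s/-([a-z])/\u\1/g
EOF

# Hypothetical helper: apply the script file to stdin.
rewrite_with_script_file() {
  "$SED" -E -f "$script_file"
}
```

The same file could then be used for the real run, along the lines of gsed -E -i -f "$script_file" **/*.ts.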

That was quite a tangent for explaining that one command, and it makes the one-liner seem less “simple” as I claimed earlier, but once you get the hang of both regex and the command line, it’s really not too bad. The last thing that I’ll mention is that the . after the character class for the single- and double-quote was my solution for only matching import statements that aren’t libraries, i.e. imports of our files. Libraries don’t generally start with a . in their name, and they are also not relative paths like our developer-created files.

import dotenv from 'dotenv'
// vs
import MyController from './controller/my-controller'
import TheirService from '../service/their-service'

Anyway, the next question is how are we supposed to verify the results of these commands. Did we break anything? tsc should be enough, but just to be sure, we should probably do some regression testing. Our codebase has a requirement for 80% code coverage, but our team tries to do 90%+. Our unit test setup is comprehensive enough to do a good job of regression testing, so to ensure that my changes didn’t break anything, I ran npm run test before even making the changes to get a snapshot of the state of the unit tests.

All tests passed.
No surprises.

With that baseline, I ran the tests again after making the filename changes using oil.nvim.

Virtually no tests pass.
No surprises.

Lastly, I ran the aforementioned gsed command to update all of the import statements and ran the tests one more time.

All tests passed.
At this point, I actually was a bit surprised, because these solutions don’t always work out as cleanly as they do in your head. There are often edge-cases that you forgot to account for, but I somehow nailed it on the first try.

My solution worked for this particular codebase, but I’m sure there are other setups where this wouldn’t have worked as nicely. This method of problem solving definitely requires understanding your problem space. It can be difficult or impossible to find a one-size-fits-all solution for any given problem type, so creativity is an important element to the problem solving process.

Some other checks that I did just to get some numbers:

# To see all of the relevant imports
grep -E 'from ['"'"'"]\.' **/*.ts

# To see all of the files that have those imports, using `sort -u` instead of `uniq` as a general precaution to guarantee a unique count
grep -E 'from ['"'"'"]\.' **/*.ts | awk -F: '{ print $1 }' | sort -u

# To see the count of those files
grep -E 'from ['"'"'"]\.' **/*.ts | awk -F: '{ print $1 }' | sort -u | wc -l
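
One more check in the same spirit, which I did not run at the time, would be to look for relative imports that still contain a dash followed by a lowercase letter. The helper name leftover_kebab_imports is hypothetical, and the sketch assumes a grep that supports -r and --include (both GNU grep and the macOS grep do).

```shell
# Hypothetical helper: list relative imports under a directory that still
# contain a kebab-case segment. Prints file:line:text for each hit.
leftover_kebab_imports() {
  # [^'"]* stays inside the import path; -[a-z] is a dash the rewrite missed.
  # `|| true` keeps "no matches" from counting as a failure.
  grep -rnE --include='*.ts' 'from ['"'"'"]\.[^'"'"'"]*-[a-z]' "$1" || true
}
```

After the gsed run, leftover_kebab_imports . should print nothing. Any output points at an import the substitution missed, or at a path that legitimately keeps its dash.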