Encoding knowledge with automation scripts
In the previous post I shared a couple small scripts I use with Git. I automate a lot of the repetition in my development workflow to make tasks not just easier but also more reliable. Automation encodes knowledge. Scripts give me a place to capture things I learn.
Below are descriptions of several scripts I use on a daily basis.
Most of my scripts aren't portable. They're tailored to the specific tools I use, such as Tmux and Alfred, they often require particular libraries and environment variables, and while some are Bash, most of them favor Clojure (Babashka). Rather than sharing the scripts themselves, this post focuses on the problems that prompted them and what I automated.
ws & friends
I use Git worktrees extensively, checking out work-in-progress branches and code to review in separate folders and Tmux sessions. Here's one version of the script. My current iteration is called ws and takes two or three parameters:
ws <type> <name> [<ticket-number>]
type can be feat, fix, task, or cve. name is the name to use for the branch and Tmux session. If there's a Jira ticket associated with the work, ticket-number captures that.
ws creates a new Git branch using the given information (e.g. feature/some-branch-name-TICKET-5555), then checks out the branch in a new Git worktree and creates a Tmux session for it. It also captures the data in a couple of files local to the worktree. One is a local draft of the PR, which is Markdown but has YAML frontmatter where I specify the title of the PR, any labels, reviewers, and the ticket number. Those will be used when creating the PR (see gh-open below). Some of that information is also captured in a .workspace.yaml file, which can be used to re-launch the Tmux session. Both .workspace.yaml and the PR file are created by ws and ignored by Git via a global gitignore.
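A rough sketch of the branch-naming step in POSIX shell. Only the feat to feature/ mapping comes from the example above; the other prefixes, the TICKET project key, and the function name ws_branch_name are placeholders for illustration.

```shell
# Hypothetical helper: build a branch name from ws's arguments.
# Only feat -> feature/ appears in the post; the remaining prefixes
# and the TICKET project key are assumptions.
ws_branch_name() {
  kind=$1
  name=$2
  ticket=${3:-}
  case $kind in
    feat) prefix=feature ;;
    fix)  prefix=fix ;;
    task) prefix=task ;;
    cve)  prefix=cve ;;
    *) echo "unknown type: $kind" >&2; return 1 ;;
  esac
  if [ -n "$ticket" ]; then
    printf '%s/%s-TICKET-%s\n' "$prefix" "$name" "$ticket"
  else
    printf '%s/%s\n' "$prefix" "$name"
  fi
}
```

The resulting name could then be handed to git worktree add -b and tmux new-session -s, which is roughly the sequence described above.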
ws creates the workspace files, but opening the workspace is performed by wso ("workspace open"), which ws invokes but which I can also invoke manually on any directory with a .workspace.yaml file. Initializing the workspace is done by wsi ("workspace initialize"), which sends the init command (see below) to the first Tmux pane in the workspace.
Similar to ws is wsc ("workspace checkout"), useful for branches that exist but for which I don't have a local worktree.
To review someone else's pull request locally, I use wsr ("workspace review"). It pulls the URL for the PR from the clipboard, fetches the PR from GitHub's API, switches to the local directory for that repo, then invokes wsc for the PR's branch.
When I'm done with a workspace, I run wsd ("workspace destroy") to check for pending changes, and if there aren't any, the script switches the Tmux client to a Tmux session that won't be deleted, deletes the Git worktree, and kills the workspace's Tmux session.
init and friends
I have an init-<repo-name> script for each repo I work in. They do all the setup for a new worktree, such as creating a Python virtual env, activating it, and installing dependencies. Most repos have a Makefile or install.sh script which does the heavy lifting, so these init-* scripts capture what's specific to my workflow, such as using pyenv rather than some other virtual environment tool.
init is just a wrapper script that 1) inspects the Git repo to figure out which init-* script to run, 2) verifies the init-* exists, and 3) runs it. ws launches this automatically in new workspaces, but on occasion I invoke it manually.
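The three steps might look like the sketch below. It assumes the repo name is taken from the origin remote's URL and that the init-* scripts are on the PATH; the post doesn't say how the repo is identified, and the function names are made up.

```shell
# Derive a repo name from a Git remote URL, e.g.
# git@github.com:acme/widgets.git -> widgets
repo_name_from_url() {
  name=${1%/}        # drop a trailing slash, if any
  name=${name%.git}  # drop a trailing .git, if any
  name=${name##*/}   # keep what follows the last slash
  name=${name##*:}   # handle scp-style URLs with no slash after the colon
  printf '%s\n' "$name"
}

# Hypothetical wrapper: find, verify, and run init-<repo-name>.
run_init() {
  url=$(git remote get-url origin) || return 1
  script="init-$(repo_name_from_url "$url")"
  if ! command -v "$script" >/dev/null 2>&1; then
    echo "no $script on PATH" >&2
    return 1
  fi
  "$script"
}
```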
cve
A few weeks ago I addressed a lengthy backlog of security vulnerabilities. I chose to tackle them in bulk—rather than chip away here and there—so I could focus on developing a system and encoding it in a script, smoothing the process for future fixes. The result was the script cve with several subcommands.
cve check reads the system clipboard to get the URL for a Jira ticket, parses the ticket's description, finds the name of the affected Docker image and the package that needs to be upgraded, then opens a URL to our internal artifact repository that lists the image's current vulnerabilities. Occasionally vulnerabilities have been fixed by other changes and the ticket remains open. When that happens, cve close will comment on the Jira ticket with a link to the vulnerabilities page, assign it to me, and close the ticket. If the vulnerability still exists, cve ws finds the repo that produces the affected Docker image and invokes ws cve with the CVE's ID.
cve repro builds the Docker image, then runs docker scout cves and picks out the sections mentioning the CVE's identifier. This is the most complicated subcommand. Some of the Dockerfiles need tweaks to build locally (for which I keep a set of patches), several of the images depend on other images having been built first, some images are built by specific Makefiles, and building can require particular environment variables. All that information is tracked in a data structure in the script.
cve locations runs docker scout cves with the --format sarif option, which emits JSON, then the script plucks out the buried .physicalLocation fields to get the full paths to the files that introduce the vulnerability. Where the file lives indicates whether the vulnerability is due to an NPM dependency, a Python dependency, or a package installed in the Docker image itself.
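Reading a path that way can be sketched as a simple pattern match. The directory patterns below (node_modules, site-packages, dist-packages) are common conventions chosen for illustration, not the rules the cve script uses, and classify_location is a hypothetical name.

```shell
# Hypothetical classifier: guess where a vulnerable file came from,
# based only on its path. The patterns are assumptions.
classify_location() {
  case $1 in
    */node_modules/*)                    echo npm ;;
    */site-packages/*|*/dist-packages/*) echo python ;;
    *)                                   echo image ;;
  esac
}
```

Anything that matches neither language-specific directory is treated as belonging to the image itself, which is a crude default.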
testit
Different projects have different ways to run tests, and even a single repo can have separate test runners for backend unit tests, frontend unit tests, and integration tests. When I want to run a single test, I never remember whether the runner wants a line number or the name of the test. Rather than trying to keep it all straight, I use testit.
The script takes a path and line number. It checks the path against several regex patterns to find the function that can run the test. For test runners that take a path and line number, the function calls babashka.process/shell with the proper directory, environment variables, and shell command. For test runners that want the name of a test, the script first parses the test file by shelling out to tree-sitter with tree-sitter parse <path> --xml, then it finds the appropriate concrete syntax tree expression by pattern-matching with Meander. For example, in a Python unit test suite, it looks for methods starting with test_ and classes starting with Test, picking the last one that's above or at the given line number, then it constructs the TestClass::test_method identifier that the command line test runner expects.
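To make the Python case concrete, here is a regex-based stand-in written with awk. It is not the tree-sitter and Meander approach described above, just a simpler approximation that assumes classes start at column zero and methods are indented; python_test_id is a made-up name.

```shell
# Print TestClass::test_method for the last test method at or above
# the given line. Regex approximation, not a real syntax-tree search.
python_test_id() {
  awk -v target="$2" '
    NR > target { exit }               # stop once past the target line
    /^class Test[A-Za-z0-9_]*/ {
      cls = $2
      sub(/[(:].*/, "", cls)           # strip base classes and the colon
      method = ""                      # a new class resets the method
    }
    /^[ \t]+def test_[A-Za-z0-9_]*/ {
      method = $0
      sub(/^.*def[ \t]+/, "", method)  # drop indentation and "def"
      sub(/\(.*/, "", method)          # drop the parameter list
    }
    END {
      if (cls != "" && method != "") print cls "::" method
      else if (cls != "") print cls
    }
  ' "$1"
}
```

A regex version like this breaks on nested classes, decorators split across lines, and multi-line strings that look like code, which is a fair argument for parsing a real syntax tree instead.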
Rather than invoking testit directly, I have a NeoVim keystroke that grabs the current filename and line number and submits a testit command in a separate Tmux pane. That makes it easy to trigger tests from my editor, as well as re-run tests without having to find the same place in the editor.
gh-open & gh-update
When a pull request is ready to open, I polish up my local Markdown draft and run gh-open. That parses the YAML frontmatter out of the Markdown, constructs a title, submits the PR via GitHub's API, requests reviewers and adds labels, and opens the PR in my default browser. If the PR needs any screenshots, I edit the description via the browser. I've aliased gh-open to a gh subcommand with gh alias set open '!gh-open $@'.
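Pulling the frontmatter out of a draft can be approximated with awk and sed, as in this sketch. It handles only simple top-level "key: value" lines rather than full YAML, and the field names used in the example (title, ticket) are assumptions about what the draft contains.

```shell
# Print the YAML frontmatter between the first pair of --- lines.
frontmatter() {
  awk '
    NR == 1 && $0 != "---" { exit }              # file has no frontmatter
    $0 == "---" { if (++fences == 2) exit; next }
    { print }
  ' "$1"
}

# Read a top-level scalar such as "title: Fix login redirect".
# Lists (labels, reviewers) would need a real YAML parser.
frontmatter_field() {
  frontmatter "$1" | sed -n "s/^$2:[[:space:]]*//p" | head -n 1
}
```

For example, frontmatter_field pr.md title would print the title line's value, where pr.md is a hypothetical draft file name.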
I also have gh-update, which takes the latest version of the local Markdown file and updates the pull request. If the PR has images, they'll be lost; I should probably add a script that copies the current PR description into my local file.
qlink
I use Espanso as a text expander, most often for generating links to PRs and tickets. Those links go in Slack, personal notes, or comments in GitHub and Jira, and depending on context they need different formats. For that I use qlink ("quick link"). qlink takes a small specifier that names the source of the URL and the output format to use. The p. specifier means "get the URL from the clipboard and output a short-form Markdown link". j5555 means "create a long-form Markdown link for Jira TICKET-5555". To generate a long-form link, the script fetches the title from the appropriate API.
Here's a poor-man's version to get a long-form Markdown link from the URL of a pull request:
gh pr view <URL> \
--json title,url,headRepository,number \
--jq '"[\(.headRepository.name) #\(.number) - \(.title)](\(.url))"'
gh-pr-list
This script I trigger from Alfred. I have a workflow that runs it for a handful of different keywords, one for each repo I work in. Besides taking the repo name, the script takes a shorthand search query, constructs a URL for a GitHub pull request or list of pull requests, then opens the URL in my browser. The supported shorthands are:
- no query: open the list of all PRs in the repo
- digits: open the PR with the given number
- starts with #: search for PRs with the given label
- alphanumeric characters: search for PRs by the given author
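The shorthand rules can be sketched as a single case statement. The function name pr_url and the owner/repo argument are hypothetical, and the search URLs follow GitHub's usual pulls?q= query format.

```shell
# Hypothetical sketch: map a shorthand query to a GitHub URL.
# $1 is owner/repo, $2 is the optional shorthand.
pr_url() {
  base="https://github.com/$1"
  query=${2:-}
  case $query in
    '')       printf '%s\n' "$base/pulls" ;;                                # no query
    '#'*)     printf '%s\n' "$base/pulls?q=is%3Apr+label%3A${query#?}" ;;   # label
    *[!0-9]*) printf '%s\n' "$base/pulls?q=is%3Apr+author%3A$query" ;;      # author
    *)        printf '%s\n' "$base/pull/$query" ;;                          # digits only
  esac
}
```

The order of the patterns matters: the label check comes before the author check because a # is also a non-digit. Labels containing spaces would need URL encoding, which this sketch skips.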
grase
I prefer rebasing my branches to merging in the mainline branch. Rebasing keeps my commits together, and only rarely have teammates complained about the change in branch reference. I always use interactive rebasing so I can verify what commits will be moved. Originally grase was a Bash alias for git rebase -i <mainline-branch>, but since .workspace.yaml captures the target branch, grase is now a script that reads the .workspace.yaml field and rebases on the correct branch. This is especially useful when I have stacked PRs, where changing one branch requires rebasing downstream branches.
grase is an example of how these automation scripts feed off each other. The .workspace.yaml file I originally created for wso has useful information for other scripts.
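A minimal version of that lookup, assuming the workspace file stores the branch in a top-level field called target_branch. The post doesn't name the field, so treat both the field name and the function names here as guesses.

```shell
# Read the target branch from a workspace file.
# "target_branch" is an assumed field name.
workspace_target_branch() {
  sed -n 's/^target_branch:[[:space:]]*//p' "${1:-.workspace.yaml}" | head -n 1
}

# Interactive rebase onto whatever branch the workspace file names.
grase_sketch() {
  target=$(workspace_target_branch)
  if [ -z "$target" ]; then
    echo "no target branch in .workspace.yaml" >&2
    return 1
  fi
  git rebase -i "$target"
}
```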
checkin
These scripts cross repo boundaries, so none of them live in any of the projects I use them for. Instead I keep them in a few external repos (plus some private repos). After tweaking things here and there through the day, it's tough to remember what changed where, so checkin reads directory paths from a CHECKIN_PATHS env var, then goes through the directories one by one, spawning a subshell in the given directory. Then I stage code in the Git index. When I'm done, I exit the subshell with Ctrl-D, and checkin commits the changes with a generic message and pushes to the remote repository.
Running checkin is an easy routine to follow at the end of the day, and it reminds me of some fun I had.
The script takes an optional argument to read from a different environment variable, allowing me to check in specific groups of repos. checkin I run daily; checkin weekly I run on Friday to capture changes in less frequently touched repos.
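The overall loop could be sketched like this. Several details are assumptions: that the paths are colon-separated, that checkin weekly reads a variable named CHECKIN_PATHS_WEEKLY, and that the commit message is "Check in". None of those are stated above.

```shell
# Pick the variable to read: CHECKIN_PATHS by default, or
# CHECKIN_PATHS_<GROUP> for a named group (assumed convention).
checkin_var() {
  if [ -n "${1:-}" ]; then
    printf 'CHECKIN_PATHS_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  else
    echo CHECKIN_PATHS
  fi
}

# Print one directory per line from the colon-separated variable.
checkin_dirs() {
  var=$(checkin_var "${1:-}")
  case $var in *[!A-Z0-9_]*) return 1 ;; esac  # refuse odd names before eval
  eval "paths=\${$var:-}"
  printf '%s\n' "$paths" | tr ':' '\n' | sed '/^$/d'
}

# Visit each directory, pause in a shell for staging, then commit and push.
checkin_sketch() {
  checkin_dirs "${1:-}" | while IFS= read -r dir; do
    (cd "$dir" && "${SHELL:-sh}" </dev/tty)  # stage changes, exit with Ctrl-D
    git -C "$dir" commit -m "Check in" && git -C "$dir" push
  done
}
```

Because git commit fails when nothing is staged, directories with no changes are skipped without pushing.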
If you have scripts you use to automate your workflow or make it more fun, I'd be happy to hear about them.