CodePatterns: Find & Replace for Code.

Introduction

Plain text find & replace features work well for names and single-line patterns that happen to map well to regular expressions. For more complex manipulations, these tools fall short and we need something purpose-built.

To solve this problem I introduce CodePatterns, an intuitive and expressive language for specifying code modifications.

CodePatterns works like regular find & replace, with extensions that make multiple lines and indentation easy to deal with, and that allow matching and manipulating arbitrary language-aware syntax elements using the Lisp-like Tree-sitter query language.

It is my idea of what a find & replace feature should look like in a modern code editor.

Overview & Approach

When designing CodePatterns I had a few specific limitations of plain text and regex find & replace in mind, as well as a concrete use case to support (the one in the video). The limitations I saw were:

  • Takes whitespace literally. If newlines are supported, they are taken to mean exactly one newline character.

    Similarly with indentation; to perform a find & replace on indented code the exact number and type of indent characters must be entered into the box.

  • No knowledge of syntax elements. Every non-trivial automated refactoring requires you to write an ad-hoc, slow (etc.) regex implementation of half of the grammar of the language you’re working in.

  • Capturing groups, lookaheads, and lookbehinds are awkward in regexes. (See the [ and ] characters in Find Expressions below for how CodePatterns does lookahead/lookbehind.)

I considered various designs for the syntax, including a scripting language-like version with explicit directives, and ended up settling on a minimal set of constructs that allows for a gradual increase in complexity starting from plain text.

The simplest form of CodePatterns is plain text find & replace:

Find

foo

Replace with

bar

//

A newline means “go to the next line of code”—CodePatterns doesn’t care exactly how many newlines there are (either in the code or in the find expression):

Find

foo\();
bar\();

Note that the opening parens in the find expression are escaped—this is necessary because a bare open paren indicates a Tree-sitter query (see below).

Replace with

foo();
baz();

//
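The newline rule above can be approximated with an ordinary regular expression. The sketch below is illustrative only and is not taken from any CodePatterns implementation: escapeRegExp and flexibleNewlines are hypothetical helper names, the pattern is treated as raw text (no escapes, regexes, or Tree-sitter queries), and indentation is simply ignored.

```javascript
// Hypothetical helper: escape regex metacharacters so text matches literally.
function escapeRegExp(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a RegExp from a multi-line plain-text pattern in
// which each newline matches one or more line breaks, skipping over
// whitespace-only lines in the code.
function flexibleNewlines(pattern) {
	const source = pattern
		.split("\n")
		.filter((line) => line.trim() !== "") // blank pattern lines are ignored
		.map((line) => escapeRegExp(line.trim()))
		.join("[ \\t]*(?:\\r?\\n[ \\t]*)+");
	return new RegExp(source);
}

const find = flexibleNewlines("foo();\n\nbar();");

const adjacent = find.test("foo();\nbar();");
const spacedOut = find.test("foo();\n\n\n   \nbar();");
const sameLine = find.test("foo(); bar();");
```

Here adjacent and spacedOut are both true, because the number of line breaks between the two statements does not matter, while sameLine is false because at least one line break is required.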

Indentation is context-relative, and again CodePatterns is intelligent enough to know that it doesn’t matter how the indentation is done. A single CodePatterns refactoring works across multiple files with different indentation characters.

Find

foo\();

if \(condition) {
	bar\();
}

Replace with

foo();

if (condition) {
	baz();
}

//

This works no matter what indentation level the occurrences start at.
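One way to picture context-relative indentation is to reduce every line to a depth relative to where the snippet starts, so that tabs, spaces, and the absolute starting column all disappear. The following is a minimal sketch under that assumption; indentLevels is a hypothetical name, and it assumes indentation is consistent within each snippet.

```javascript
// Hypothetical helper: convert text into a list of { level, text } entries,
// where level is the nesting depth relative to the shallowest line so far.
function indentLevels(text) {
	const stack = []; // widths of the currently open indentation levels
	const result = [];
	for (const line of text.split("\n")) {
		if (line.trim() === "") continue; // blank lines carry no indentation
		const width = /^[ \t]*/.exec(line)[0].length;
		// Close any levels deeper than this line.
		while (stack.length > 0 && stack[stack.length - 1] > width) stack.pop();
		// Open a new level if this line is deeper than the current one.
		if (stack.length === 0 || stack[stack.length - 1] < width) stack.push(width);
		result.push({ level: stack.length - 1, text: line.trim() });
	}
	return result;
}

const withTabs = indentLevels("if (condition) {\n\tbar();\n}");
const withSpaces = indentLevels("        if (condition) {\n            bar();\n        }");
const sameShape = JSON.stringify(withTabs) === JSON.stringify(withSpaces);
```

Both snippets reduce to levels 0, 1, 0, so sameShape is true even though one uses tabs at the top level and the other uses spaces starting eight columns in.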

To add flexibility, you can include a regex:

Find

foo\();

/\w+/@object.bar\();

Replace with

foo();

bar(@object);

//

In the above find expression we used JavaScript literal syntax to create a regex (/ must also be escaped to match literally). Note that no capture group is needed inside the regex in order to use its value in the replacement: instead we use an @-prefixed capture label to give it a semantic name.
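To make the relationship between capture labels and ordinary regex groups concrete, here is a rough sketch that compiles a single-line subset of find expressions (plain text, backslash escapes, and /regex/@label) into a JavaScript RegExp with named groups. The function names are hypothetical, and the sketch deliberately ignores newlines, indentation, Tree-sitter queries, the [ and ] markers, regex flags, and slashes inside character classes.

```javascript
// Hypothetical helper: escape regex metacharacters so text matches literally.
function escapeRegExp(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical compiler for a single-line subset of find expressions.
function compileLine(expr) {
	let source = "";
	let i = 0;
	while (i < expr.length) {
		const ch = expr[i];
		if (ch === "\\") {
			// Backslash escape: match the next character literally.
			source += escapeRegExp(expr[i + 1] ?? "");
			i += 2;
		} else if (ch === "/") {
			// Regex literal: read up to the next unescaped slash.
			let end = i + 1;
			while (end < expr.length && expr[end] !== "/") {
				end += expr[end] === "\\" ? 2 : 1;
			}
			const body = expr.slice(i + 1, end);
			i = end + 1;
			// An @label directly after the regex names the whole match.
			const label = /^@([A-Za-z]+)/.exec(expr.slice(i));
			if (label) {
				source += `(?<${label[1]}>${body})`;
				i += label[0].length;
			} else {
				source += `(?:${body})`;
			}
		} else {
			source += escapeRegExp(ch);
			i += 1;
		}
	}
	return new RegExp(source);
}

const compiled = compileLine("/\\w+/@object.bar\\();");
const match = compiled.exec("widget.bar();");
const replaced = `bar(${match.groups.object});`;
```

In this sketch the label becomes the name of a group wrapped around the whole regex, which is why no explicit capture group is needed inside it; replaced ends up as bar(widget);.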

When plain text and regexes are not sufficient to describe the required operation, you can use the full power of Tree-sitter queries:

Find

(expression_statement
	(call_expression
		(identifier) @name
		(#eq? @name "foo")
	)
)

(expression_statement
	(call_expression
		(member_expression
			(identifier) @obj
			(property_identifier) @method
		)
	)
)

Note: the outer expression_statement wrappers are only there to capture the semicolons. An alternative would be to match the semicolon with either plain text or a regex:

(call_expression
	(identifier) @name
	(#eq? @name "foo")
);

or to make it optional:

(call_expression
	(identifier) @name
	(#eq? @name "foo")
)/;?/

Replace with

foo();

@method(@obj);

//

See the video for a complete walked-through example of a real-world refactoring using Tree-sitter queries.

Comparison with Other Tools

Existing tools for structural find & replace-like behaviour can be time-consuming to learn (e.g. driven by a programmatic API), require stepping out of the editor, and/or are language-specific.

CodePatterns is language-agnostic, uses a simple declarative syntax, and integrates into the editor as a drop-in replacement or alternative to plain text find & replace.

Tool                                     Type          API style     Languages                                    Live preview
CodePatterns                             In-editor     Declarative   Any (Tree-sitter)                            Yes
JetBrains structural search and replace  In-editor     Declarative   Java, Kotlin and Groovy (as of 20 Jan 2023)  Yes
ast-grep                                 Command line  Declarative   Any (Tree-sitter)
jscodeshift                              Command line  Programmatic  JavaScript

Find Expressions

A find expression consists of one or more of the following:

  • Plain text, which matches itself.

  • A newline, which matches one or more newlines, skipping over whitespace-only lines.

  • An increase or decrease in indentation, which matches exactly that and is relative to the current context.

  • On its own line, a line quantifier, which matches a run of whole lines:

    Quantifier  Repeat     Lazy/greedy
    *           0 or more  Greedy
    *?          0 or more  Lazy
    +           1 or more  Greedy
    +?          1 or more  Lazy

    This can be followed by an optional capture label (see below).

    Laziness/greediness

    A greedy line quantifier tries to include as many lines as possible in the match, whereas a lazy one tries to continue with the rest of the query after matching as few lines as possible.

    Example

    * @someLines

  • A regular expression (in JavaScript literal syntax) followed by an optional capture label with no whitespace in between, e.g. /\w+/@functionName.

    Note: in the context of a CodePatterns find expression, most regex flags don’t make sense and will not be interpreted as being part of the regex. Available flags are i (case-insensitive), u (unicode), and v (improved unicode). See MDN for more details.

  • A Tree-sitter query which matches the text of the matching nodes, followed by an optional capture label, e.g. (function_declaration) @fn.

  • [ and ] which mark the start and end of the text to replace, respectively. Either or both can be omitted, defaulting to the start and end of the match. Expressions before [ can be thought of as a lookbehind, and after the ] as a lookahead.
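The difference between greedy and lazy line quantifiers can be illustrated with plain JavaScript regexes, where (?:.*\n)* stands in for a greedy * on its own line and (?:.*\n)*? for a lazy *?. This is only an analogy under that assumption, not how a CodePatterns implementation necessarily works; the sample code and the someLines group name are made up for the example.

```javascript
const code = ["start();", "a();", "end();", "b();", "end();", ""].join("\n");

// Roughly equivalent to the find expression:
//   start\();
//   * @someLines
//   end\();
const greedy = /start\(\);\n(?<someLines>(?:.*\n)*)end\(\);/;

// The same expression with the lazy quantifier *? in place of *.
const lazy = /start\(\);\n(?<someLines>(?:.*\n)*?)end\(\);/;

const greedyLines = greedy.exec(code).groups.someLines;
const lazyLines = lazy.exec(code).groups.someLines;
```

The greedy version captures everything up to the last end(); (the lines a();, end();, and b();), whereas the lazy version stops at the first end(); and captures only a();.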

Capture Labels

A capture label consists of an @ followed by an alphabetic name for the capture, and makes the associated match available to use in the replacement (see Replacement Expressions).

To continue matching alphabetic literals directly after a capture label, escape the first character with a backslash:

class /\w+/@name\Factory

The above query matches class and then any class name ending in Factory, with the first part of the name captured as @name.

Examples

  1. Combining literals, regular expressions, and line quantifiers to match a JavaScript function (just for illustration):

    function /\w+/@name\(/[^)]*/@args) {
    	* @body
    }

  2. Matching one or more JavaScript functions in a much nicer way, with a Tree-sitter query:

    (function_declaration)+ @fns

Escaping

The following characters must be escaped with a backslash in literals:

  • \, /, [, ], and (.
  • @ if preceded by a regular expression or Tree-sitter query.
  • * and + if at the start of a line.
  • *, +, and ? if preceded by a Tree-sitter query as in the example above.
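When pasting existing code into the find box, the unconditional rules above can be applied mechanically. The helper below is a hypothetical sketch (escapeLiteral is not part of CodePatterns) that handles only the context-free cases: the always-escaped characters, and * or + at the start of a line. It assumes "start of a line" means after any leading indentation, and it does not handle the cases that depend on a preceding regular expression or Tree-sitter query.

```javascript
// Hypothetical helper: escape plain text so that it matches itself when used
// as a literal in a find expression.
function escapeLiteral(text) {
	return text
		// Always escaped: backslash, slash, square brackets, and open paren.
		.replace(/[\\\/\[\](]/g, "\\$&")
		// Escaped at the start of a line (assumed: after indentation): * and +.
		.replace(/^([ \t]*)([*+])/gm, "$1\\$2");
}

const call = escapeLiteral("foo();");
const block = escapeLiteral("if (condition) {\n\tbar();\n}");
const mixed = escapeLiteral("a[i] / b(x)");
const bullet = escapeLiteral("* item");
```

For example, call is foo\(); and block is the three-line if statement with only its two opening parens escaped; closing parens are left alone because only ( introduces a Tree-sitter query.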

Capturing & Deleting Tree-sitter Nodes

Captured nodes within Tree-sitter queries are available in the replacement, and the names can be prefixed with a dash (e.g. @-name) to delete those nodes from the result (i.e. they will not be there when a surrounding capture is inserted into the replacement).

Deleted nodes are available to use elsewhere in the replacement without the prefix, e.g. @name. This allows portions of code to be selected and moved around in a semantic way.

Replacement Expressions

A replacement expression consists of one or more of the following:

  • Plain text, which produces itself.

  • A newline, which produces a newline and preserves the current indentation.

  • An increase or decrease in indentation, which indents or dedents relative to the current context.

  • A capture reference, e.g. @captureName, which produces the corresponding regular expression match, lines, or syntax nodes. Multi-line captures are re-indented to the current context.

To insert a literal @ in the replacement, two @s are used (@@).
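The replacement rules above are simple enough to sketch in a few lines. The following is an illustrative approximation, not the actual implementation: renderReplacement is a hypothetical name, captures are passed in as plain strings, and each multi-line capture is assumed to be stored with indentation relative to its own first line.

```javascript
// Hypothetical renderer for replacement expressions: "@@" produces a literal
// "@", and "@name" produces the capture called "name".
function renderReplacement(template, captures) {
	return template.replace(/@@|@([A-Za-z]+)/g, (token, name, offset) => {
		if (token === "@@") return "@";
		if (!Object.hasOwn(captures, name)) {
			throw new Error(`Unknown capture: @${name}`);
		}
		// Re-indent continuation lines to the indentation of the line that
		// contains the capture reference.
		const lineStart = template.lastIndexOf("\n", offset - 1) + 1;
		const indent = /^[ \t]*/.exec(template.slice(lineStart))[0];
		return captures[name].split("\n").join("\n" + indent);
	});
}

const output = renderReplacement(
	"function() {\n\t@initBody\n\treturn @obj; // @@generated\n}",
	{ initBody: "setup();\nstart();", obj: "{\n\tname: \"app\"\n}" },
);
```

Both lines of @initBody end up indented one level inside the function, the object literal in @obj is re-indented to sit under the return statement, and @@generated comes out as @generated.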

When performing the replacement (including when removing nodes prefixed with @-), blank lines are inserted or deleted according to context and preferences; for example, spacing may be maintained between code blocks and other lines. This behaviour is left open to implementors.

Examples

Converting a JavaScript module that exports an object with an init method, to a function that performs the body of the init method and returns the original object with the init method removed:

Find

module.exports = (object
	(method_definition
		(property_identifier) @p
		(statement_block "{" (_)+ @initBody "}")
	) @-init
	.
	"," @-c
	(#eq? @p "init")
) @obj/;?/

Replace with

module.exports = function() {
	@initBody
	
	return @obj;
}

//

Converting AMD modules to ES6:

Find

define\((function
	(statement_block
		(_)+ @body
		(return_statement "return" (_) @value)
	)
)\);

Note that the outermost parens are escaped, as we’re matching the name and parens of the define() call as plain text for simplicity. The next inner pair marks the Tree-sitter node for the function passed to define.

Within the function we ignore the arguments and step into a statement_block, which is the function body in { ... }. Within that we match a repeated wildcard and capture it as the “body”. We then match a return_statement at the same level, so @body will contain everything in the function except the final return statement.

Finally, we match a wildcard within the return statement to capture the value of the module as @value.

Replace with

@body

export default @value;

To generate the ES6 version we put the body, and then export default followed by the module value.

//

Great! When can I use it?

CodePatterns is currently only implemented in Edita, as far as I know, and is not quite production-ready. If you’d like to try it, you can email me at gus@gushogg-blake.com.

If you are an implementor and would like help with adding CodePatterns to another editor or IDE, I am available for consulting work.