CodePatterns: Find & Replace for Code.
Introduction
Plain text find & replace features work well for names and single-line patterns that happen to map well to regular expressions. For more complex manipulations, these tools fall short and we need something purpose-built.
To solve this problem I introduce CodePatterns, an intuitive and expressive language for specifying code modifications.
CodePatterns works like regular find & replace, but with some extensions to make dealing with multiple lines and indentation easy, and to allow matching and manipulating arbitrary language-aware syntax elements using the lisp-like Tree-sitter query language.
It is my idea of what a find & replace feature should look like in a modern code editor.
Overview & Approach
When designing CodePatterns I had a few specific limitations of plain text and regex find & replace in mind, as well as a concrete use case to support (the one in the video). The limitations I saw were:
Takes whitespace literally. If newlines are supported, they are taken to mean exactly one newline character.
Similarly with indentation; to perform a find & replace on indented code the exact number and type of indent characters must be entered into the box.
No knowledge of syntax elements. Every non-trivial automated refactoring requires you to write an ad-hoc, slow regex implementation of half the grammar of the language you’re working in.
Capturing groups, lookaheads, and lookbehinds are awkward in regexes. (See the [ and ] characters in Find Expressions below for how CodePatterns does lookahead/lookbehind.)
I considered various designs for the syntax, including a scripting language-like version with explicit directives, and ended up settling on a minimal set of constructs that allows for a gradual increase in complexity starting from plain text.
The simplest form of CodePatterns is plain text find & replace:
Find
foo
Replace with
bar
A newline means “go to the next line of code”—CodePatterns doesn’t care exactly how many newlines there are (either in the code or in the find expression):
Find
foo\();
bar\();
Note that the opening parens in the find expression are escaped—this is necessary because a bare open paren indicates a Tree-sitter query (see below).
Replace with
foo();
baz();
Indentation is context-relative, and again CodePatterns is intelligent enough to know that it doesn’t matter how the indentation is done. A single CodePatterns refactoring works across multiple files with different indentation characters.
Find
foo\();
if \(condition) {
    bar\();
}
Replace with
foo();
if (condition) {
    baz();
}
This works no matter what indentation level the occurrences start at.
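One way to picture newline- and indentation-insensitive matching is to reduce both the find expression and the code to a list of relative indentation depths before comparing them. The JavaScript below is a minimal sketch of that idea, not a description of how CodePatterns is actually implemented; `normalizeLines` and `sameShape` are hypothetical names, and the sketch assumes each snippet indents consistently.

```javascript
// Hypothetical helper: reduce text to { depth, text } records, where depth is
// relative to the surrounding lines rather than an absolute column.
function normalizeLines(source) {
  const stack = []; // indent widths of the currently open levels
  const result = [];
  for (const line of source.split(/\r?\n/)) {
    if (line.trim() === "") continue; // whitespace-only lines are skipped
    const width = line.match(/^[ \t]*/)[0].length;
    // Dedent: close every level that is deeper than this line.
    while (stack.length > 0 && width < stack[stack.length - 1]) stack.pop();
    // Indent (or first line): open a new level.
    if (stack.length === 0 || width > stack[stack.length - 1]) stack.push(width);
    result.push({ depth: stack.length - 1, text: line.trim() });
  }
  return result;
}

// Two snippets have the same shape if they normalise to the same records.
function sameShape(a, b) {
  return JSON.stringify(normalizeLines(a)) === JSON.stringify(normalizeLines(b));
}
```

Under this normalisation, a tab-indented pattern and a space-indented occurrence nested several levels deep, with extra blank lines, reduce to the same list.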
To add flexibility, you can include a regex:
Find
foo\();
/\w+/@object.bar\();
Replace with
foo();
bar(@object);
In the above find expression we used JavaScript literal syntax to create a regex (which means / must also be escaped to match literally). Note that no capture group is needed inside the regex in order to use its value in the replacement: instead we use an @-prefixed capture label to give it a semantic name.
When plain text and regexes are not sufficient to describe the required operation, you can use the full power of Tree-sitter queries:
Find
(expression_statement
    (call_expression
        (identifier) @name
        (#eq? @name "foo")
    )
)
(expression_statement
    (call_expression
        (member_expression
            (identifier) @obj
            (property_identifier) @method
        )
    )
)
Note: the outer expression_statement wrappers are only there to capture the semicolons. An alternative would be to use either plain text or regexes to capture it:
(call_expression
    (identifier) @name
    (#eq? @name "foo")
);
or to make it optional:
(call_expression
    (identifier) @name
    (#eq? @name "foo")
)/;?/
Replace with
foo();
@method(@obj);
See the video for a complete walked-through example of a real-world refactoring using Tree-sitter queries.
Comparison with Other Tools
Existing tools for structural find & replace-like behaviour can be time-consuming to learn (e.g. driven by a programmatic API), require stepping out of the editor, and/or are language-specific.
CodePatterns is language-agnostic, uses a simple declarative syntax, and integrates into the editor as a drop-in replacement or alternative to plain text find & replace.
Tool | Type | API style | Languages | Live preview |
---|---|---|---|---|
CodePatterns | In-editor | Declarative | Any (Tree-sitter) | Yes |
JetBrains structural search and replace | In-editor | Declarative | Java, Kotlin and Groovy (as of 20 Jan 2023) | Yes |
ast-grep | Command line | Declarative | Any (Tree-sitter) | – |
jscodeshift | Command line | Programmatic | JavaScript | – |
Find Expressions
A find expression consists of one or more of the following:
Plain text, which matches itself.
A newline, which matches one or more newlines, skipping over whitespace-only lines.
An increase or decrease in indentation, which matches exactly that and is relative to the current context.
On its own line, a line quantifier which matches zero or more whole lines:
Quantifier | Repeat | Lazy/greedy |
---|---|---|
* | 0 or more | Greedy |
*? | 0 or more | Lazy |
+ | 1 or more | Greedy |
+? | 1 or more | Lazy |
This can be followed by an optional capture label (see below).
Laziness/greediness
A greedy line quantifier tries to include as many lines as possible in the match, whereas a lazy one tries to continue with the rest of the query after matching as few lines as possible.
Example
* @someLines
A regular expression (in JavaScript literal syntax) followed by an optional capture label with no whitespace in between, e.g. /\w+/@functionName.
Note: in the context of a CodePatterns find expression, most regex flags don’t make sense and will not be interpreted as being part of the regex. Available flags are i (case-insensitive), u (unicode), and v (improved unicode). See MDN for more details.
A Tree-sitter query which matches the text of the matching nodes, followed by an optional capture label, e.g. (function_declaration) @fn.
[ and ] which mark the start and end of the text to replace, respectively. Either or both can be omitted, defaulting to the start and end of the match. Expressions before [ can be thought of as a lookbehind, and after the ] as a lookahead.
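The difference between greedy and lazy line quantifiers described above can be sketched in JavaScript for the simplest case: a fixed start line, a line quantifier, and a fixed end line. This is only an illustration under those assumptions; `matchBetween` is a hypothetical name and not part of CodePatterns.

```javascript
// Hypothetical matcher for the pattern: <startLine>, then "*" or "*?", then <endLine>.
// Returns the lines captured by the quantifier, or null if there is no match.
function matchBetween(lines, startLine, endLine, { lazy }) {
  const start = lines.indexOf(startLine);
  if (start === -1) return null;

  // Every position where the rest of the pattern could continue.
  const ends = [];
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i] === endLine) ends.push(i);
  }
  if (ends.length === 0) return null;

  // Lazy: stop at the first opportunity. Greedy: take as many lines as possible.
  const end = lazy ? ends[0] : ends[ends.length - 1];
  return lines.slice(start + 1, end);
}
```

For the lines `begin`, `a`, `end`, `b`, `end`, the lazy form captures only `a`, while the greedy form captures `a`, `end`, `b`.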
Capture Labels
A capture label consists of an @ followed by an alphabetic name for the capture, and makes the associated match available to use in the replacement (see Replacement Expressions).
To continue matching alphabetic literals directly after a capture label, escape the first character with a backslash:
class /\w+/@name\Factory
The above query matches class and then any class name ending in Factory, with the first part of the name captured as @name.
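To make the label rules concrete, here is a rough JavaScript sketch that compiles a single-line find expression (literals plus regexes with optional labels) into a JavaScript RegExp with named groups. It is an assumption about one possible implementation, not the actual CodePatterns parser: `compileLine` and `escapeForRegExp` are hypothetical names, and Tree-sitter queries, [ and ], and line quantifiers are deliberately ignored.

```javascript
// Escape a literal so it matches itself inside a RegExp source string.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical compiler for one line of a find expression.
function compileLine(expr) {
  let source = "";
  let i = 0;
  while (i < expr.length) {
    const ch = expr[i];
    if (ch === "\\" && i + 1 < expr.length) {
      // Backslash escape: the next character is taken literally.
      source += escapeForRegExp(expr[i + 1]);
      i += 2;
    } else if (ch === "/") {
      // Regex literal: read up to the next unescaped slash.
      let j = i + 1;
      while (j < expr.length && expr[j] !== "/") {
        j += expr[j] === "\\" ? 2 : 1;
      }
      const body = expr.slice(i + 1, j);
      i = j + 1;
      // Optional capture label: "@" followed by alphabetic characters.
      const label = /^@([A-Za-z]+)/.exec(expr.slice(i));
      if (label) {
        source += `(?<${label[1]}>${body})`;
        i += label[0].length;
      } else {
        source += `(?:${body})`;
      }
    } else {
      source += escapeForRegExp(ch);
      i += 1;
    }
  }
  return new RegExp(source);
}
```

With the backslash before Factory, the label ends at @name and Factory is matched as a literal suffix. Without it, the whole of nameFactory would be read as the label.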
Examples
Combining literals, regular expressions, and line quantifiers to match a JavaScript function (just for illustration):
function /\w+/@name\(/[^)]*/@args) {
    * @body
}
Matching one or more JavaScript functions in a much nicer way, with a Tree-sitter query:
(function_declaration)+ @fns
Escaping
The following characters must be escaped with a backslash in literals:
\, /, [, ], and (.
@ if preceded by a regular expression or Tree-sitter query.
* and + if at the start of a line.
*, +, and ? if preceded by a Tree-sitter query as in the example above.
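As a sketch of how an editor might turn selected code into a find expression that matches it literally, the JavaScript below applies the two context-free rules above (the always-escaped characters, and * or + at the start of a line). `escapeLiteral` is a hypothetical helper. It assumes "start of a line" means after any indentation, and it does not handle the rules that depend on a preceding regex or Tree-sitter query.

```javascript
// Hypothetical helper: escape plain code so it can be used as a literal
// in a find expression. Only the context-free escaping rules are applied.
function escapeLiteral(text) {
  return text
    .split("\n")
    .map((line) =>
      line
        // \ / [ ] and ( are always escaped.
        .replace(/[\\\/\[\](]/g, "\\$&")
        // * and + are escaped only at the start of a line (after indentation,
        // which is an assumption), where they would read as line quantifiers.
        .replace(/^(\s*)([*+])/, "$1\\$2")
    )
    .join("\n");
}
```

For example, `foo();` becomes `foo\();`, matching the escaped form used in the find expressions earlier in this article.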
Capturing & Deleting Tree-sitter Nodes
Captured nodes within Tree-sitter queries are available in the replacement, and the names can be prefixed with a dash (e.g. @-name) to delete those nodes from the result (i.e. they will not be there when a surrounding capture is inserted into the replacement).
Deleted nodes are available to use elsewhere in the replacement without the prefix, e.g. @name. This allows portions of code to be selected and moved around in a semantic way.
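The effect of a dash-prefixed capture can be sketched in terms of text ranges: the surrounding capture is produced with the deleted ranges cut out, while each deleted range remains available on its own. The JavaScript below is a minimal illustration under the assumption that captures are plain { start, end } offsets into the source. `textWithoutDeleted` is a hypothetical name, and a real implementation would work with Tree-sitter nodes.

```javascript
// Hypothetical helper: text of an outer capture with the deleted ranges removed.
// All ranges are { start, end } character offsets, with end exclusive.
function textWithoutDeleted(source, outer, deleted) {
  let text = "";
  let pos = outer.start;
  for (const range of [...deleted].sort((a, b) => a.start - b.start)) {
    text += source.slice(pos, range.start); // keep everything before the deleted node
    pos = range.end; // skip over the deleted node
  }
  return text + source.slice(pos, outer.end);
}
```

The text of a deleted node is still just `source.slice(range.start, range.end)`, which is why it can be reinserted elsewhere in the replacement.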
Replacement Expressions
A replacement expression consists of one or more of the following:
Plain text, which produces itself.
A newline, which produces a newline and preserves the current indentation.
An increase or decrease in indentation, which indents or dedents relative to the current context.
A capture reference, e.g. @captureName, which produces the corresponding regular expression match, lines, or syntax nodes. Multi-line captures are re-indented to the current context.
To insert a literal @ in the replacement, two @s are used (@@).
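The replacement rules above can be sketched as a small template renderer. This JavaScript is an illustration only, not the actual CodePatterns implementation: `renderReplacement` is a hypothetical name, captures are assumed to be plain strings already dedented to their own first line, and leaving unknown labels untouched is an assumption.

```javascript
// Hypothetical renderer for a replacement expression.
// captures maps label names to captured text, e.g. { body: "a();\nb();" }.
function renderReplacement(template, captures) {
  return template
    .split("\n")
    .map((line) => {
      const indent = line.match(/^[ \t]*/)[0];
      return line.replace(/@@|@([A-Za-z]+)/g, (token, name) => {
        if (token === "@@") return "@"; // a doubled @ produces a literal @
        if (!Object.prototype.hasOwnProperty.call(captures, name)) {
          return token; // unknown label: left as-is (assumption)
        }
        // Re-indent the continuation lines of a multi-line capture
        // to the indentation of the line the reference appears on.
        return captures[name].split("\n").join("\n" + indent);
      });
    })
    .join("\n");
}
```

A reference placed on an indented line therefore produces a capture whose every line starts at that indentation.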
When performing the replacement (including when removing nodes captured with an @- prefix), blank lines are inserted or deleted according to context and preferences; for example spacing may be maintained between code blocks and other lines. This behaviour is left open to implementors.
Examples
Converting a JavaScript module that exports an object with an init method, to a function that performs the body of the init method and returns the original object with the init method removed:
Find
module.exports = (object
    (method_definition
        (property_identifier) @p
        (statement_block "{" (_)+ @initBody "}")
    ) @-init
    .
    "," @-c
    (#eq? @p "init")
) @obj/;?/
Replace with
module.exports = function() {
    @initBody
    return @obj;
}
Converting AMD modules to ES6:
Find
define\((function
    (statement_block
        (_)+ @body
        (return_statement "return" (_) @value)
    )
)\);
Note that the outermost parens are escaped, as we’re matching the name and parens of the define() call as plain text for simplicity. The next inner pair marks the Tree-sitter node for the function passed to define.
Within the function we ignore the arguments and step into a statement_block, which is the function body in { ... }. Within that we match a repeated wildcard and capture it as the “body”. We then match a return_statement at the same level, so @body will contain everything in the function except the final return statement.
Finally, we match a wildcard within the return statement to capture the value of the module as @value.
Replace with
@body
export default @value;
To generate the ES6 version we put the body, and then export default followed by the module value.
Great! When can I use it?
CodePatterns is currently only implemented in Edita, as far as I know, and is not quite production-ready. If you’d like to try it, you can email me at gus@gushogg-blake.com.
If you are an implementor and would like help with adding CodePatterns to another editor or IDE, I am available for consulting work.