Creating Regular Expressions
You want to use a Vim regular expression, but don't know how they work.
For example, you want to search a document for a word that begins with a vowel.
Vim allows you to use regular expressions (regexps) in many areas. The Searching for Any Word recipe, for example, explains how to search a file for a regexp.
A regexp is a pattern that describes a string. We will use the /pattern/ notation for describing patterns, and the "string" notation to represent the text the pattern is being tested against.
The simplest form of pattern is a literal string, which matches that exact string. For example, /cow/ matches "cow", "Don't have a cow", and "cower".
The period (.) has special significance in a regexp: it matches any single character. So /.ow/ matches "cow", and also "sow" and "tow".
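A minimal sketch of the two ideas so far, written in Vim script so it can be pasted into a file and run with :source. It assumes the =~# operator, which tests the string on its left against the pattern on its right while respecting case. The strings "crow" and "ow" are illustrative additions, not part of the recipe.

```vim
" A literal pattern matches anywhere inside the string.
" Each echo prints 1 for a match and 0 for no match.
echo 'cower' =~# 'cow'
" -> 1
echo "Don't have a cow" =~# 'cow'
" -> 1
echo 'crow' =~# 'cow'
" -> 0

" The period stands for exactly one character, whatever it is.
echo 'tow' =~# '.ow'
" -> 1
echo 'ow' =~# '.ow'
" -> 0 (there is no character before "ow")
```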
You can use character ranges to indicate that any one of the specified characters is acceptable. For example, /[cs]ow/ matches "cow", "sow", and "undersow".
If your range consists of alphabetically or numerically consecutive characters, you can specify the start character and end character separated by a hyphen. For example, to match "b", "c", "d", "e", or "f", you can use /[b-f]/. To match a single digit from 1 to 5, use /[1-5]/.
You can invert character ranges so they match any character not specified. For example, /[^cs]ow/ matches any character that isn't a "c" or an "s" followed by "ow", e.g. "acknowledge", 'I said "ow"!', and "bellow".
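The character ranges above can be tried out with matchstr(), which returns the part of the string that matched, or an empty string when nothing did. This is only a sketch: "room 42" and "the quick otter" are made-up test strings, and the last line borrows two atoms this recipe does not cover, \< (start of a word) and \a (an alphabetic character), to approximate the "word that begins with a vowel" search from the introduction.

```vim
" A set: "c" or "s", followed by "ow".
echo matchstr('undersow', '[cs]ow')
" -> sow

" A range: one digit from 1 to 5.
echo matchstr('room 42', '[1-5]')
" -> 4

" An inverted set: any character except "c" or "s", followed by "ow".
echo matchstr('bellow', '[^cs]ow')
" -> low
echo matchstr('cow', '[^cs]ow')
" -> (empty string: no match)

" A lowercase vowel at the start of a word, then the rest of the word.
echo matchstr('the quick otter', '\<[aeiou]\a*')
" -> otter
```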
Another useful concept of regexps is repetition. If you wanted to match strings containing two consecutive "o"s followed by an "i", like "cooing" and "tattooist", you could use /ooi/. If you wanted to abstract this pattern, however, to match one or more "o"s followed by an "i", you'd have a problem.
The solution is to suffix the part of the pattern that can be repeated with a metacharacter which specifies the type of repetition. A metacharacter is simply a character that has special significance in a regular expression. For example, the "+" metacharacter (written \+ in Vim) requires that what precedes it occurs one or more times. (In fact, it requires that the atom that precedes it occurs one or more times, but this recipe is already too complex. If you want this level of detail see :help pattern or a regular expression book.) For example, /o\+i/ matches one or more "o"s followed by an "i": "abattoir", "cooing", and "oii".
The "*" metacharacter represents any number of occurrences of the preceding character, including none, so /o*i/ matches "zucchini", "boating", and "zooming". This time the "o" is made optional. (Given that it starts the pattern it's actually unnecessary: /i/ will match everything that /o*i/ matches.)
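To see the difference between the two repetition metacharacters, the following sketch prints the text each pattern actually matched. It assumes matchstr(), which returns the matched portion of a string or an empty string when there is no match.

```vim
" \+ needs at least one "o" before the "i".
echo matchstr('cooing', 'o\+i')
" -> ooi
echo matchstr('abattoir', 'o\+i')
" -> oi
echo matchstr('zooming', 'o\+i')
" -> (empty string: no "o" is directly followed by "i")

" * also accepts zero "o"s, so a bare "i" is enough.
echo matchstr('zooming', 'o*i')
" -> i
```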
A more useful example is /[a-c]t*o\+i/ which matches either "a", "b", or "c" followed by any number of "t"s, followed by at least one "o", followed by an "i". The following words satisfy the pattern: "tattooing", "coins", and "limboing". It may not be intuitive that "tattooing" would match, so let's walk through it: The "a" satisfies /[a-c]/, the following two "t"s match /t*/, the following two "o"s match /o\+/, then the "i" matches /i/.
A key concept to grasp here is that a string matches a regexp as long as a contiguous portion of it matches. In the example above the regexp looks at the first character of "tattooing" and tries applying the pattern to it. This fails because "t" is not a member of the character range [a-c]. So it moves on to the next letter and starts again. This time it matches up to "i", as explained above, and because the pattern has now been exhausted, the rest of the string is ignored.
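The walk-through above can be checked from Vim script. This sketch assumes the built-in functions match(), matchstr(), and matchend(), which report where a match starts, what text it covers, and where it ends. Positions count from zero. The variable name pat is just a convenience.

```vim
let pat = '[a-c]t*o\+i'

" The match does not start at the first "t" (index 0) but at the "a".
echo match('tattooing', pat)
" -> 1

" The matched portion stops at the "i"; "ng" is ignored.
echo matchstr('tattooing', pat)
" -> attooi

" Index of the first character after the match.
echo matchend('tattooing', pat)
" -> 7
```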
You can make a portion of the regexp optional (i.e. insisting that it matches 0 or 1 times) with \=. You can generalise this with the \{min,max\} notation which matches at least min times, but no more than max times. For example, /[^a-c][a-c]\{2,4\}[hero]/ matches "yachts" ("yach"), and "blabbed" ("labbe"), but doesn't match "cabbage".
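A short sketch of both notations, again using matchstr() to print the matched text. The /colou\=r/ pattern is an illustrative example that is not part of the recipe; the second pattern is the one discussed above.

```vim
" \= makes the "u" optional, so both spellings match.
echo matchstr('colour', 'colou\=r')
" -> colour
echo matchstr('color', 'colou\=r')
" -> color

" One character outside a-c, then two to four from a-c, then h, e, r, or o.
let pat = '[^a-c][a-c]\{2,4\}[hero]'
echo matchstr('yachts', pat)
" -> yach
echo matchstr('blabbed', pat)
" -> labbe
echo matchstr('cabbage', pat)
" -> (empty string: no match)
```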
Like character ranges, alternation allows you to specify a list of alternatives that can match at a given point. Whereas character ranges specify sets of characters, alternation is used for sets of strings. For example, /ing\|ed/ matches the string "ing" or the string "ed", e.g. "simpered" and "attacking". If you used a character range here, e.g. /[inged]/, the pattern would match any string that contained an "i", an "n", a "g", an "e", or a "d"; i.e. it would match all the strings the alternation approach does, but also many, many more.
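The contrast between alternation and a character range shows up clearly on a string that contains the individual letters but neither whole alternative. In this sketch "dig" is an illustrative string chosen for that purpose.

```vim
" Alternation matches one of the whole strings.
echo matchstr('simpered', 'ing\|ed')
" -> ed
echo matchstr('attacking', 'ing\|ed')
" -> ing
echo matchstr('dig', 'ing\|ed')
" -> (empty string: neither "ing" nor "ed" occurs)

" A character range matches any single listed character.
echo matchstr('dig', '[inged]')
" -> d
```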
All the patterns so far have been allowed to match at any point in the string. That is to say, before Vim gives up on a match it will try applying the pattern at every point in the text. You can change this behaviour by using anchors: ^ matches the start of a line, while $ matches the end. So, /^\s\=\uo/ matches a line that begins with an optional white space character, which is followed by an uppercase letter, which is followed by an "o". The following strings will all match: " Popes are religious", "Roman", and "Soviet Union".
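You can combine the two anchors to require that the whole line matches the pattern. For example, /^\uo\%(v\|ma\).\+[rnt]$/ will match "Tomahawk thrown", "November" and "Soviet", but will reject "Soviet Russia", which doesn't end in "r", "n", or "t", and "During November", which doesn't begin with an uppercase letter followed by "o". (The \%(...\) pair groups the alternation without affecting anything else.)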
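The effect of the anchors can be sketched as follows. The sketch assumes the alternation is written \| and the end anchor is a bare $, which is what Vim's default pattern syntax expects. "The Popes" is an illustrative string added to show what the start anchor rules out.

```vim
" Anchored at the start: optional white space, an uppercase letter, then "o".
echo ' Popes are religious' =~# '^\s\=\uo'
" -> 1
echo 'The Popes' =~# '^\s\=\uo'
" -> 0
" Without the anchor, the same pattern finds " Po" in the middle of the line.
echo 'The Popes' =~# '\s\=\uo'
" -> 1

" Anchored at both ends: the whole line has to fit the pattern.
let pat = '^\uo\%(v\|ma\).\+[rnt]$'
echo 'Tomahawk thrown' =~# pat
" -> 1
echo 'November' =~# pat
" -> 1
echo 'During November' =~# pat
" -> 0
```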
The features described above are common to most regexp implementations. Vim offers some extensions, though, that users familiar with other regexp implementations may not be aware of.
By default regexps are case sensitive. That is to say, /cow/ will not match "Cow". You can make all patterns ignore case with :set ignorecase. To override the case sensitivity for a particular pattern, include \c (to ignore case) or \C (to respect case) in it. Either one applies to the whole pattern, wherever in the pattern it appears, and takes precedence over the ignorecase setting. For example, /\ccow/ matches "cow", "coW", and "Cow", whereas /\Ccow/ matches only "cow", even when ignorecase is set.
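A sketch of how the ignorecase option and the \c and \C escapes interact. It assumes the plain =~ operator, which follows the ignorecase option in the same way a / search does. The variable save_ic is a hypothetical name used only to put the option back the way it was.

```vim
let save_ic = &ignorecase

" The option changes how an ordinary pattern behaves.
set noignorecase
echo 'Cow' =~ 'cow'
" -> 0
set ignorecase
echo 'Cow' =~ 'cow'
" -> 1

" \C in the pattern respects case even though ignorecase is on.
echo 'Cow' =~ '\Ccow'
" -> 0
echo 'cow' =~ '\Ccow'
" -> 1

" \c in the pattern ignores case even though ignorecase is off.
set noignorecase
echo 'coW' =~ '\ccow'
" -> 1

let &ignorecase = save_ic
```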
Some characters in a regexp have a special significance and don't match themselves literally in the string. For example, /^foo/ matches a line starting with "foo"; it doesn't match "^foo". To match a special character you need to precede it with a backslash. For example, \^ matches "^", \$ matches "$", \. matches ".", etc.
Of particular note in Vim are \n, which matches a newline character, \r, which matches a carriage return character, and \t, which matches a <Tab>.
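Escaping can be sketched like this. The price strings are illustrative and not from the recipe. Note that in Vim script a double-quoted string turns \t into a real tab character, while a single-quoted string leaves the backslash alone so that the regexp engine sees it.

```vim
" Escaped, the caret is an ordinary character; unescaped, it is an anchor.
echo matchstr('^foo', '\^foo')
" -> ^foo
echo matchstr('^foo', '^foo')
" -> (empty string: the text starts with "^", not with "foo")

" \$ and \. match a literal dollar sign and a literal period.
echo 'cost: $5.00' =~# '\$5\.00'
" -> 1
echo 'cost: $5x00' =~# '\$5\.00'
" -> 0
" An unescaped period still matches any character, including "x".
echo 'cost: $5x00' =~# '\$5.00'
" -> 1

" \t in the pattern matches the tab inside the double-quoted string.
echo "a\tb" =~# 'a\tb'
" -> 1
```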
Vim supports backreferences, which allow you to refer to part of a match later in the same match. For example, /\([a-z]\)\1/ matches a lowercase letter followed by the same character that just matched. This would match "zoom" and "seeing". The parenthesised portion of the pattern is a group, and a backreference of the form \N, where N is a digit from 1 to 9, refers to the Nth group. So, /\([a-z]\)\([a-z]\)\2\1/ matches two lowercase letters, followed by the second one again, then the first one again. This matches strings like "abba".
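The backreference examples can be sketched with matchstr(), which shows exactly which characters the groups and their backreferences covered. "abab" is an illustrative counterexample that is not in the recipe.

```vim
" One group, repeated once: a doubled lowercase letter.
echo matchstr('zoom', '\([a-z]\)\1')
" -> oo
echo matchstr('seeing', '\([a-z]\)\1')
" -> ee

" Two groups, replayed in reverse order.
let pat = '\([a-z]\)\([a-z]\)\2\1'
echo matchstr('abba', pat)
" -> abba
echo matchstr('abab', pat)
" -> (empty string: no match)
```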