The behavior of regex quantifiers is a common source of woes for the regex apprentice. You may have heard that they can be "greedy" or "lazy", sometimes even "possessive", but sometimes they don't seem to behave the way you had expected. Is there a bug in your regex engine?

As it turns out, there is more to quantifiers than just "greedy" and "lazy". This page digs deep into the details of quantifiers and shows you the traps you need to be aware of and the tricks you need to master in order to wield them effectively.

For easy navigation, here are some jumping points to various sections of the page:
✽ Quantifier Basics ( greedy / docile / lazy / helpful / possessive )
✽ The Longest Match and Shortest Match Traps

Before we dive into quantifier tricks and traps, let's have a quick reminder of the basics because I don't know what you've read so far and there's no shortage of incomplete regex tutorials.

A regex quantifier such as + tells the regex engine to match a certain quantity of the character, token or subexpression immediately to its left. For instance:
✽ in A+ the quantifier + applies to the character A
✽ in \w* the quantifier * applies to the token \w
✽ in carrots? the quantifier ? applies to the character s, not to carrots
✽ in (?:apple,|carrot,)+ the quantifier + applies to the subexpression (?:apple,|carrot,)

One place where the "stick-to-the-left" rule is not immediately obvious is with the \Q…\E sequence that escapes all of the characters it contains. If you stick a + after such a sequence, should it apply to the whole sequence, or only to its last character? The engine treats the content of the sequence as a series of literals, so the quantifier only applies to the last character. For instance, \QC+\E+ matches all of C++++, but against C+C+ it only matches C+.

Greedy: As Many As Possible (longest match)

By default, a quantifier tells the engine to match as many instances of its quantified token or subpattern as possible. This behavior is called greedy.

For instance, take the + quantifier. It allows the engine to match one or more of the token it quantifies: \d+ can therefore match one or more digits. But "one or more" is rather vague: in the string 123, "one or more digits" (starting from the left) could be 1, 12 or 123.
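
To make the quantifier basics concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not part of the original page: the helper first_match is a hypothetical name made up for this example, and because Python's re module does not support the \Q…\E sequence, the pattern \QC+\E+ is written out as its equivalent C\++ (a literal C followed by one or more literal + characters).

```python
import re


def first_match(pattern, text):
    """Hypothetical helper for this sketch: text of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# A quantifier applies to the token or group immediately to its left.
print(first_match(r"carrots?", "carrot"))   # only the s is optional
print(first_match(r"(?:apple,|carrot,)+", "apple,carrot,apple,"))  # the whole group repeats

# Assumption: C\++ stands in for \QC+\E+, since Python's re has no \Q...\E.
# The trailing + quantifies only the last literal, the escaped plus sign.
print(first_match(r"C\++", "C++++"))  # C++++
print(first_match(r"C\++", "C+C+"))   # C+

# Quantifiers are greedy by default: \d+ takes as many digits as it can.
print(first_match(r"\d+", "123"))     # 123
```

In an engine that does support \Q…\E, such as Java or PCRE, the original pattern can be used as written.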