Caret (^) usage in regex with Java examples
The caret anchor (^) matches the position at the beginning of a string; it does not consume a character itself. Therefore ^abc matches abc only when abc appears at the beginning of the string.
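Because ^ is an anchor, it matches a position rather than a character. The minimal sketch below compares ^abc with plain abc using Matcher.find(). The class AnchorDemo and its helpers findAt and firstMatch are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Returns the start index of the first match of regex in input, or -1 if there is none.
    static int findAt(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.start() : -1;
    }

    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String... args) {
        System.out.println(findAt("^abc", "abcx"));  // 0: abc is at the start
        System.out.println(findAt("^abc", "xabc"));  // -1: abc exists, but not at the start
        System.out.println(findAt("abc", "xabc"));   // 1: without ^, abc is found anywhere
        // ^ consumes no characters, so the matched text is just "abc"
        System.out.println(firstMatch("^abc", "abcx"));
    }
}
```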
Example 1 - ^ to match the start of a string
The program below matches ‘th’ at the beginning of a string. It also matches ‘Th’ because the pattern is compiled with the Pattern.CASE_INSENSITIVE flag.
package devsought;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex5 {
    public static void main(String... args) {
        String str = "The chef is in the kitchen preparing a meal.";
        // ^ anchors "th" to the start of the input; CASE_INSENSITIVE lets it match "Th"
        Pattern pattern = Pattern.compile("^th", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(str);
        if (matcher.find()) {
            System.out.println("Matched::" + matcher.group());
        } else {
            System.out.println("NO MATCH");
        }
    }
}
The above program outputs
Matched::Th
Explanation: ‘Th’ in ‘The’ matches because it is at the beginning of our string. The second ‘the’ does not match because it is not at the start of the string. We can confirm this by changing our string to ‘chef is in the kitchen preparing a meal.’; the output would then be NO MATCH.
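The explanation can be checked by collecting every match with a while (matcher.find()) loop instead of a single if. This sketch uses a hypothetical helper, allMatches, and compiles each pattern with Pattern.CASE_INSENSITIVE as the program above does. Without ^ both occurrences are found; with ^ only the one at the start is found, and the shortened string yields nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretCaseInsensitiveDemo {
    // Collects every case-insensitive match of regex in input, in order.
    static List<String> allMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String... args) {
        String original = "The chef is in the kitchen preparing a meal.";
        String modified = "chef is in the kitchen preparing a meal.";
        System.out.println(allMatches("th", original));   // [Th, th]
        System.out.println(allMatches("^th", original));  // [Th]
        System.out.println(allMatches("^th", modified));  // []
    }
}
```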
Example 2 - when used outside (before) a bracket expression, e.g. ^[…]+
Used this way, ^ anchors the bracket expression to the start of the string: the match must begin with one of the characters inside the brackets. For example, ^[abc]+ finds a match in abefgj, aacblkja, cbegh and b12c, but not in dabgh, gccb or 1abcc.
The table below lists inputs and matches for ^[abc]+, which matches one or more occurrences (the + quantifier) of a, b or c at the start of the string.
Input string | Matched result | Explanation
--- | --- | ---
abefgj | ab | a and b match at the start of the string. Matching stops at e, which is not in [abc].
aacblkja | aacb | The two a's match, followed by c and b. Matching stops at l, which is not in [abc].
cbegh | cb | c and b match. Matching stops at e, so the rest of the string is not included.
b12c | b | Only b matches. 1 is not in [abc], so the match stops there even though a c appears later.
dabgh | NO MATCH | d is not one of the characters allowed at the start of the string.
gccb | NO MATCH | g is not one of the characters allowed at the start of the string.
1abcc | NO MATCH | 1 is not one of the characters allowed at the start of the string.
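A short sketch that runs ^[abc]+ over every input in the table above; the class StartBracketDemo and its method matchAtStart are hypothetical names. Each printed line should agree with the Matched result column.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartBracketDemo {
    // One or more of a, b or c, anchored at the start of the input
    private static final Pattern START_ABC = Pattern.compile("^[abc]+");

    // Returns the match at the start of input, or "NO MATCH".
    static String matchAtStart(String input) {
        Matcher matcher = START_ABC.matcher(input);
        return matcher.find() ? matcher.group() : "NO MATCH";
    }

    public static void main(String... args) {
        String[] inputs = {"abefgj", "aacblkja", "cbegh", "b12c", "dabgh", "gccb", "1abcc"};
        for (String input : inputs) {
            System.out.println(input + " -> " + matchAtStart(input));
        }
    }
}
```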
Example 3 - when used inside brackets as the first character, e.g. [ad23][^bc][e-z]+
In this scenario, ^ negates the bracket expression: it excludes the characters listed after it from being matched. [^bc] therefore matches any single character except b or c. The matched outputs in the table below illustrate this.
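To see that [^bc] still consumes exactly one character, just not b or c, the sketch below tests single characters against it with Pattern.matches, which requires the whole input to match. NegatedClassDemo and notBOrC are hypothetical names.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // True if the single character c is matched by the negated class [^bc].
    static boolean notBOrC(char c) {
        return Pattern.matches("[^bc]", String.valueOf(c));
    }

    public static void main(String... args) {
        for (char c : new char[] {'a', 'b', 'c', 'd', '2', ' ', '@'}) {
            System.out.println("'" + c + "' -> " + notBOrC(c));
        }
    }
}
```

Only b and c print false; digits, spaces and punctuation such as @ are all accepted by the negated class.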
Regex [ad23][^bc][e-z]+: match a substring that starts with a, d, 2 or 3, followed by one character that is not b or c, followed by one or more characters in the range e to z. Because there is no ^ anchor here, matcher.find() may start a match at any position in the input.
Input string | Matched result | Explanation
--- | --- | ---
aafe2bh | aafe | The first a matches the first position, the second a matches because it is neither b nor c, and f and e match because they are in the range e-z. Matching stops at 2, which is not in e-z. A new attempt starting at 2 fails because the next character is b.
cad22 | NO MATCH | c cannot start a match. Starting at a: d is neither b nor c, but 2 is not in e-z. Starting at d: 2 is neither b nor c, but the next 2 is not in e-z. The last two positions leave too few characters for a match.
dadiseatingbreadatthelounge | adise, ating and datthelounge | adise: the first d can start a match and a is neither b nor c, but the next d is not in e-z, so the attempt at the first d fails. Starting at the first a, d is neither b nor c and i, s and e are in e-z; the match ends before the next a, which is not in e-z. ating: that a (the second a in the string) was left out of the previous match, but it can start a new one; t is neither b nor c, and i, n and g are in e-z; the match ends at b. datthelounge: b, r and e cannot start a match. The a that follows starts an attempt and d is neither b nor c, but the next a is not in e-z, so that attempt fails. The third d then starts a new match: a is neither b nor c, and t, t, h, e, l, o, u, n, g and e are all in e-z, giving the third match.
23dibjilm | 3di | You might expect the match to start at 2, but it does not. 2 matches the first position and 3 the second (it is neither b nor c), but the third character d is not in e-z, so the attempt at 2 fails. The matcher retries from 3: 3 matches the first position, d is neither b nor c, and i is in e-z. The match ends at b, which is not in e-z.
2dcfg2ak | 2ak | Exercise: using the logic so far, try to walk through how the matcher arrives at the output 2ak.
Sample code with its output is shown below.
package devsought;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex6 {
    public static void main(String... args) {
        String str = "aafe2bh";
        Pattern pattern = Pattern.compile("[ad23][^bc][e-z]+");
        Matcher matcher = pattern.matcher(str);
        boolean matchFound = false;
        // find() resumes scanning after the previous match, so every match is printed
        while (matcher.find()) {
            System.out.println("Matched::" + matcher.group());
            matchFound = true;
        }
        if (!matchFound) {
            System.out.println("NO MATCH");
        }
    }
}
The output of the above is
Matched::aafe
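The sample above prints matches for one input only. This sketch, with the hypothetical names NegatedTableDemo and matchesWithPositions, runs every input from the table and also prints each match's start and end index (end exclusive), which makes the walk-throughs easier to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedTableDemo {
    private static final Pattern PATTERN = Pattern.compile("[ad23][^bc][e-z]+");

    // Collects each match as "text start-end", where end is exclusive.
    static List<String> matchesWithPositions(String input) {
        Matcher matcher = PATTERN.matcher(input);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group() + " " + matcher.start() + "-" + matcher.end());
        }
        return results;
    }

    public static void main(String... args) {
        String[] inputs = {"aafe2bh", "cad22", "dadiseatingbreadatthelounge", "23dibjilm", "2dcfg2ak"};
        for (String input : inputs) {
            List<String> results = matchesWithPositions(input);
            System.out.println(input + " -> " + (results.isEmpty() ? "NO MATCH" : results.toString()));
        }
    }
}
```

For dadiseatingbreadatthelounge the reported positions are 1-6, 6-11 and 15-27: each new search resumes where the previous match ended, which is why the a at index 6 can begin ating right after adise.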
Example 4 - when ^ is not the first character inside brackets, e.g. [abc][@^12][\w]+
In this scenario, ^ loses its special meaning and is an ordinary character to be matched like any other. The regex above matches a substring that starts with a, b or c, followed by @, ^, 1 or 2, followed by one or more word characters (\w, i.e. letters, digits or underscore).
Input string | Matched result | Explanation
--- | --- | ---
ab^ccd | b^ccd | The first a matches, but the second character b is not one of @, ^, 1 or 2, so the attempt fails. Starting again at b, b matches the first position and ^ the second. From c to the end of the string every character is a word character, so all of them are included because [\w]+ takes one or more.
qd^2abc@ | NO MATCH | q, d, ^ and 2 cannot start a match. Attempts starting at a or b fail because the next character (b or c) is not one of @, ^, 1 or 2. Starting at c, @ matches the second position, but no word character follows it, so this attempt fails too.
c^2abc@ | c^2abc | c matches the first position and ^ the second. The word characters 2, a, b and c follow. Matching stops at @, which is not a word character.
abcab^12 | b^12 | Attempts starting at a, b, c and a all fail because the character that follows is not @, ^, 1 or 2. The second b matches the first position, followed by ^ in the second; 1 and 2 follow as word characters.
cb@k^bb | b@k | Exercise: using the logic so far, try to walk through how the matcher arrives at the output b@k.
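As with Example 3, a sketch that runs all of the table's inputs through [abc][@^12][\w]+; the class LiteralCaretDemo and its method allMatches are hypothetical names. In the Java string literal only the \w needs its backslash doubled; the ^ inside [@^12] needs no escaping because it is not the first character in the brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralCaretDemo {
    // ^ is not first inside [@^12], so it is an ordinary character there
    private static final Pattern PATTERN = Pattern.compile("[abc][@^12][\\w]+");

    // Collects every match of PATTERN in input, in order.
    static List<String> allMatches(String input) {
        Matcher matcher = PATTERN.matcher(input);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String... args) {
        String[] inputs = {"ab^ccd", "qd^2abc@", "c^2abc@", "abcab^12", "cb@k^bb"};
        for (String input : inputs) {
            List<String> results = allMatches(input);
            System.out.println(input + " -> " + (results.isEmpty() ? "NO MATCH" : results.toString()));
        }
    }
}
```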
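Example 5 - when used as \^ inside brackets, e.g. [abc][@\^12][\w]+. This regex gives the same outcome as in Example 4; we have only added \ before the ^. Escaping makes ^ a literal character in any position inside the brackets, including the first. In a Java string literal the backslash itself must be escaped, so the pattern is written as "[abc][@\\^12][\\w]+".
Sample code and its output are shown below.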
package devsought;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Named Regex7 so it does not clash with Regex6 from Example 3 in the same package
public class Regex7 {
    public static void main(String... args) {
        String str = "ab^ccd";
        // "\\^" in Java source is the regex \^, a literal caret
        Pattern pattern = Pattern.compile("[abc][@\\^12][\\w]+");
        Matcher matcher = pattern.matcher(str);
        boolean matchFound = false;
        while (matcher.find()) {
            System.out.println("Matched::" + matcher.group());
            matchFound = true;
        }
        if (!matchFound) {
            System.out.println("NO MATCH");
        }
    }
}
The output generated is
Matched::b^ccd
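Escaping matters most when ^ must be the first character inside the brackets, where an unescaped ^ would negate the class. The sketch below, using a hypothetical class EscapedCaretDemo with a firstMatch helper, checks that the Example 4 and Example 5 patterns agree and contrasts [\^bc] with [^bc] on the same input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedCaretDemo {
    // Returns the first match of regex in input, or "NO MATCH".
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : "NO MATCH";
    }

    public static void main(String... args) {
        // Not first in the class: ^ and \^ both mean a literal caret
        System.out.println(firstMatch("[abc][@^12][\\w]+", "ab^ccd"));    // b^ccd
        System.out.println(firstMatch("[abc][@\\^12][\\w]+", "ab^ccd"));  // b^ccd
        // First in the class: escaped \^ is literal, unescaped ^ negates
        System.out.println(firstMatch("[\\^bc]", "a^"));  // ^
        System.out.println(firstMatch("[^bc]", "a^"));    // a
    }
}
```

On the input a^, [\^bc] skips a and matches the literal ^, while [^bc] matches a immediately because a is neither b nor c.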