Using Java regexes #
Escaping \ #
A regex in Java is a string.
Recall that within a Java string, a \ character must be escaped (i.e. written \\) to be read as a normal character.
So in a Java string that represents a regex, every \ should be written \\.
Example. The string "[A-Z]\\d+" represents the regex [A-Z]\d+
Hint. This is another good reason for using a regex validation tool (e.g. regex101) to test and debug a regex, before incorporating it into a program.
Warning. We saw earlier that when \ is treated as a normal character in a regex, it must be escaped (\\). Therefore in this specific case, the Java string should contain \\\\.
Example. The string "\\d+\\\\[A-Z]" represents the regex \d+\\[A-Z]
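To make the two levels of escaping visible, here is a minimal sketch that prints the regex actually stored in each Java string and then uses it. The class name EscapingDemo and the constant names are hypothetical, chosen only for this illustration.

```java
class EscapingDemo {
    // Java string for the regex [A-Z]\d+ (one uppercase letter, then digits)
    static final String CODE_REGEX = "[A-Z]\\d+";
    // Java string for the regex \d+\\[A-Z] (digits, a literal backslash, one uppercase letter)
    static final String BACKSLASH_REGEX = "\\d+\\\\[A-Z]";

    public static void main(String[] args) {
        // Printing a string shows the regex as the regex engine receives it
        System.out.println(CODE_REGEX);      // prints [A-Z]\d+
        System.out.println(BACKSLASH_REGEX); // prints \d+\\[A-Z]

        // Outputs true
        System.out.println("B42".matches(CODE_REGEX));
        // The Java literal "42\\B" is the 4-character string 42\B. Outputs true
        System.out.println("42\\B".matches(BACKSLASH_REGEX));
    }
}
```

Printing the string is a quick way to check what a regex validation tool should be given.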
Regexes and String instance methods #
Some instance methods of the class String take a regex as input.
Among these:
boolean matches(String regex)
returns true iff the whole string belongs to the language described by the regex or, equivalently, iff there is a match for the regex ^(regex)$. The parentheses matter when the regex contains a top-level |.
String input = "ab";
// Outputs false
System.out.println(input.matches("a"));
// Outputs true
System.out.println(input.matches("ab|a"));
// Outputs true
System.out.println(input.matches("a|ab"));
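The equivalence with an anchored regex can be checked with the sketch below. The helper anchoredFind is hypothetical: it wraps the regex in a non-capturing group between ^ and $ and searches for a match, using the classes of java.util.regex. The sketch assumes that the input contains no line terminator, because in Java $ can also match just before a final line terminator.

```java
import java.util.regex.Pattern;

class WholeStringMatchDemo {
    // Hypothetical helper: emulates String.matches with an anchored search.
    // Assumes the input has no line terminator.
    static boolean anchoredFind(String input, String regex) {
        return Pattern.compile("^(?:" + regex + ")$").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "ab";
        for (String regex : new String[] {"a", "ab|a", "a|ab"}) {
            System.out.println(regex
                + " -> matches=" + input.matches(regex)
                + ", anchoredFind=" + anchoredFind(input, regex));
        }
        // Prints:
        // a -> matches=false, anchoredFind=false
        // ab|a -> matches=true, anchoredFind=true
        // a|ab -> matches=true, anchoredFind=true
    }
}
```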
String replaceFirst(String regex, String replacement)
replaces the best first match with the input replacement string:
String input = "aba";
// Outputs "ba"
System.out.println(input.replaceFirst("ab|a", "b"));
// Outputs "bba"
System.out.println(input.replaceFirst("a|ab", "b"));
String replaceAll(String regex, String replacement)
replaces all (successive) best first matches with the input replacement string:
String input = "aba";
// Outputs "bb"
System.out.println(input.replaceAll("ab|a", "b"));
// Outputs "bbb"
System.out.println(input.replaceAll("a|ab", "b"));
For both replaceFirst and replaceAll, the replacement string can use a reference to a group, written $n, where n is the group number:
String input = "The windows are open. The apples are green.";
String output = input.replaceAll("(\\w+)s\\s+are", "$1 is");
// Outputs "The window is open. The apple is green."
System.out.println(output);
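As a further illustration of group references, here is a minimal sketch that reorders the parts of a date with several groups. The class GroupReferenceDemo, the method toDayFirst and the date format are assumptions made for this example.

```java
class GroupReferenceDemo {
    // Rewrites every date of the form yyyy-mm-dd as dd/mm/yyyy.
    // Group 1 is the year, Group 2 the month, Group 3 the day.
    static String toDayFirst(String text) {
        return text.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        String input = "Released on 2024-01-31, patched on 2024-02-15.";
        // Outputs "Released on 31/01/2024, patched on 15/02/2024."
        System.out.println(toDayFirst(input));
    }
}
```

Since $ has a special meaning in the replacement string, a literal $ must itself be escaped there (written \\$ in the Java string), or the whole replacement can be passed through Matcher.quoteReplacement.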
Reminder. Strings in Java are immutable. So when the methods replaceFirst and replaceAll successfully “modify” the string, they return a different object.
String[] split(String regex)
splits the input string around the (successive) best first matches:
String input = "This is weird.\n" +
    "Or not.";
// Contains [ "This", "is", "weird.", "Or", "not." ]
String[] output = input.split("\\s+");
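A common use of split is to break a line into fields while discarding the whitespace around the separator. The following is a minimal sketch under that assumption. The class SplitDemo and the method splitFields are hypothetical names.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits around each comma, together with any whitespace surrounding it.
    static String[] splitFields(String line) {
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] fields = splitFields("Alice ,  Bob,Carol");
        // Outputs [Alice, Bob, Carol]
        System.out.println(Arrays.toString(fields));
    }
}
```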
java.util.regex #
The standard package java.util.regex contains, among others, the following classes and interfaces:
Pattern #
A Pattern is a regex.
A Pattern can be created with the static method Pattern Pattern.compile(String regex).
Matcher #
A Matcher is a “regex engine” for a specific regex and a specific string.
A Matcher can be created out of a Pattern, with the instance method Matcher matcher(String inputString) of the class Pattern.
Pattern pattern = Pattern.compile("\\d+[a-z]*");
Matcher matcher = pattern.matcher("Alice787@unibz");
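Because a Matcher is tied to one input string while a Pattern is not, a Pattern can be compiled once and reused to create one Matcher per input. The sketch below illustrates this. The class PatternReuseDemo, the constant CODE and the method isCode are hypothetical names, and the regex is only an example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once, then shared by every call to isCode
    private static final Pattern CODE = Pattern.compile("[A-Z]\\d+");

    // Creates a new Matcher for this specific input and tests the whole string
    static boolean isCode(String candidate) {
        Matcher matcher = CODE.matcher(candidate);
        return matcher.matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"A123", "a123", "A12b"}) {
            System.out.println(candidate + ": " + isCode(candidate));
        }
        // Prints:
        // A123: true
        // a123: false
        // A12b: false
    }
}
```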
Among others, the class Matcher provides the following instance methods:
- boolean matches(), String replaceFirst(String replacement) and String replaceAll(String replacement) behave analogously to their counterparts in the class String, described above.
- boolean find() tries to match the next best first match. If this method succeeds, then information can be retrieved about the matched segment (see below). The next call to find() will find the following best first match (if any).
Pattern pattern = Pattern.compile("[A-Z][a-z]*");
Matcher matcher = pattern.matcher("Alice and Bob are exhausted.");
// Outputs true and matches the segment with word "Alice"
System.out.println(matcher.find());
// Outputs true and matches the segment with word "Bob"
System.out.println(matcher.find());
// Outputs false
System.out.println(matcher.find());
- String group(int i) returns the substring captured by Group i in the latest match. Group 0 stands for the whole regex.
- String group() is equivalent to group(0).
- int start() returns the start index (included) of the latest matched segment.
- int end() returns the end index (excluded) of the latest matched segment.
Pattern pattern = Pattern.compile("([A-Z])[a-z]*");
Matcher matcher = pattern.matcher("Alice and Bob are exhausted.");
//Find the best first match
matcher.find();
// Outputs "Alice"
System.out.println(matcher.group());
// Outputs "A"
System.out.println(matcher.group(1));
// Outputs 0
System.out.println(matcher.start());
// Outputs 5
System.out.println(matcher.end());
//Find the next best first match
matcher.find();
// Outputs "Bob"
System.out.println(matcher.group());
// Outputs "B"
System.out.println(matcher.group(1));
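In practice, find() is often called in a loop, which stops when no further match exists. The following minimal sketch combines it with group, start and end to describe every match. The class FindLoopDemo, the method describeMatches and the output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopDemo {
    // Collects one description per capitalized word found in the text
    static List<String> describeMatches(String text) {
        Pattern pattern = Pattern.compile("([A-Z])[a-z]*");
        Matcher matcher = pattern.matcher(text);
        List<String> descriptions = new ArrayList<>();
        // Each successful call to find() moves to the next match
        while (matcher.find()) {
            descriptions.add(matcher.group()
                + " [" + matcher.start() + ", " + matcher.end() + ")"
                + " initial=" + matcher.group(1));
        }
        return descriptions;
    }

    public static void main(String[] args) {
        for (String line : describeMatches("Alice and Bob are exhausted.")) {
            System.out.println(line);
        }
        // Prints:
        // Alice [0, 5) initial=A
        // Bob [10, 13) initial=B
    }
}
```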
Stream<MatchResult> results()
returns all (successive) best first matches (available since Java 9).
Pattern pattern = Pattern.compile("[A-Z][a-z]*");
Matcher matcher = pattern.matcher("Alice and Bob are exhausted.");
// Contains two match results:
// - one for the segment with word "Alice",
// - one for the segment with word "Bob"
List<MatchResult> matches = matcher.results().toList();
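Since results() returns a stream, the usual stream operations apply to it. As a minimal sketch, the matches of a regex can be counted without writing an explicit loop. The class ResultsCountDemo and the method countMatches are hypothetical names, and the code assumes Java 9 or later.

```java
import java.util.regex.Pattern;

class ResultsCountDemo {
    // Counts the successive matches of the regex in the text
    static long countMatches(String text, String regex) {
        return Pattern.compile(regex).matcher(text).results().count();
    }

    public static void main(String[] args) {
        // Outputs 2
        System.out.println(countMatches("Alice and Bob are exhausted.", "[A-Z][a-z]*"));
    }
}
```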
MatchResult #
A MatchResult is a matched segment.
The interface MatchResult provides (among others) the instance methods String group(int i), String group(), int start() and int end(), which behave analogously to their counterparts in the class Matcher.
Pattern pattern = Pattern.compile("[A-Z][a-z]*");
Matcher matcher = pattern.matcher("Alice and Bob are exhausted.");
// Get all best first matches
List<MatchResult> matches = matcher.results().toList();
// Outputs "Alice"
System.out.println(matches.get(0).group());
// Outputs "Bob"
System.out.println(matches.get(1).group());
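Capture groups are also accessible on each MatchResult, which makes it possible to turn a list of matches into structured data. The sketch below builds a map out of key=value pairs. The class MatchResultDemo, the method parsePairs and the input format are assumptions made for this example, and the code assumes Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MatchResultDemo {
    // Group 1 captures the key, Group 2 captures the value
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Maps each key to its value, in order of appearance.
    // If a key appears twice, the later value wins.
    static Map<String, String> parsePairs(String text) {
        return PAIR.matcher(text)
            .results()
            .collect(Collectors.toMap(
                (MatchResult result) -> result.group(1),
                (MatchResult result) -> result.group(2),
                (first, second) -> second,
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Outputs {host=localhost, port=8080}
        System.out.println(parsePairs("host=localhost port=8080"));
    }
}
```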