Wednesday, May 04, 2005

Repetition and groups in Regular Expressions

Category: Programming technique
Details:
Regular expressions are a powerful way to match parts of a section of text. A group followed by a repetition operator matches as expected, but the group only remembers the last value it matched.

An example in Java:
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
Matcher m = Pattern.compile("([a-z])+").matcher("abc");
if (m.matches())
    for (int i = 1; i <= m.groupCount(); i++)
        System.out.println("Found: " + m.group(i));
When executed, the above will produce the following:
Found: c
Not exactly what I had in mind. When you retrieve group 1 after evaluation, you get only "c". Even though the pattern matches successfully, the "a" capture is overwritten by "b", which in turn is overwritten by "c".

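To get the desired result, use the following instead (note the added \G anchor, the removal of the '+' after the group, and the while loop calling find() instead of a single matches()). The \G anchor makes each match start exactly where the previous one ended, so the loop walks the input one character at a time: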
// The backslash must be escaped inside a Java string literal.
Matcher m = Pattern.compile("\\G([a-z])").matcher("abc");
while (m.find())
    for (int i = 1; i <= m.groupCount(); i++)
        System.out.println("Found: " + m.group(i));
This should now produce the following:
Found: a
Found: b
Found: c
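
For anyone who wants to try both behaviours side by side, here is a minimal self-contained sketch. The class name RepeatedGroupDemo and the helper methods lastCapture and allCaptures are names I made up for illustration; only Pattern.compile, Matcher.matches, Matcher.find and Matcher.group come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not part of any library.
class RepeatedGroupDemo {

    // Matches the whole input with a repeated group and returns what
    // group 1 holds afterwards: only the last repetition survives.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("([a-z])+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Matches one character per find() call. \G anchors each match at the
    // end of the previous one, so every capture can be collected in turn.
    static List<String> allCaptures(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\G([a-z])").matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("abc")); // prints: c
        System.out.println(allCaptures("abc")); // prints: [a, b, c]
    }
}
```

One thing to be aware of with this approach: because of \G, the loop stops at the first character that does not match, so an input such as "ab1c" yields only "a" and "b".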
Where I learned it:
While attempting to extract parts of a date expression in a project I'm working on.

2 comments:

Anonymous said...

How can I get the repetitions of a subgroup that is part of the group?

e.g:
pattern (abb)(abc){1,8}
String abbabcabc

David Peterson said...

To be honest, not sure. I think the simplest would be to use the pattern you have, and then split the second group every three characters. That will give you the set of 'abc' strings in an array. Although if your pattern is more complex, this won't work so well...
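
A rough sketch of that two-step idea, under one assumption that goes beyond the pattern as posted: the repetition is wrapped in its own capturing group, as in (abb)((?:abc){1,8}), because with (abb)(abc){1,8} group 2 would again hold only the last 'abc'. Rather than splitting every three characters, the sketch runs a second matcher over the captured run, which should also cope with sub-patterns of varying length. The class name SubgroupRepetitionDemo and the method repetitions are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not part of any library.
class SubgroupRepetitionDemo {

    // Step 1: capture the whole repeated run in group 2.
    // Step 2: scan that run with the inner pattern to list each repetition.
    static List<String> repetitions(String input) {
        List<String> parts = new ArrayList<>();
        Matcher outer = Pattern.compile("(abb)((?:abc){1,8})").matcher(input);
        if (outer.matches()) {
            Matcher inner = Pattern.compile("abc").matcher(outer.group(2));
            while (inner.find()) {
                parts.add(inner.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(repetitions("abbabcabc")); // prints: [abc, abc]
    }
}
```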