Regular Expressions

09/15/10
Andrew Butcher
abutcher@afrolegs.com

Some Examples

Simple Character Classes

Brackets denote a character class. A class on its own matches exactly one character from the set it lists. Following the class with a + means the class is repeated one or more times.

    [a]             Just a, once.
    [a-z]+          Any lower case letter, one or more times.
    [a-zA-Z]+       Any upper or lower case letter, one or more times.
    [a-zA-Z0-9]+    Any upper or lower case letter or digit, one or more times.
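As a rough illustration of the classes above, the following Java sketch checks whole strings against each pattern with Pattern.matches, which succeeds only when the entire input matches. The class name CharacterClassDemo, the helper matches, and the sample inputs are made up for this example.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Hypothetical helper: true when the whole input matches the pattern.
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a]", "a"));                 // true
        System.out.println(matches("[a]", "aa"));                // false: a bare class matches one character
        System.out.println(matches("[a-z]+", "hello"));          // true
        System.out.println(matches("[a-z]+", "Hello"));          // false: H is upper case
        System.out.println(matches("[a-zA-Z]+", "Hello"));       // true
        System.out.println(matches("[a-zA-Z0-9]+", "Route66"));  // true
    }
}
```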

Matching Strings

Placing .* between an opening tag and its closing tag will allow you to match the opening tag, its closing tag, and everything in between.

The Kleene Star or Wildcard *

The wildcard * means that the preceding class is repeated zero or more times. The . stands for any single character, once (in most engines, anything except a line break).

    [a-z]*[0-9]+    Zero or more a-z characters followed by one or more digits.
    .*              Anything at all, zero or more times.
    .at             Any one character followed by at: hat, cat, bat, fat, etc.
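The difference between zero or more and one or more is easiest to see on small inputs. This is a minimal Java sketch using whole-input matching; StarDemo and its matches helper are names invented for the example.

```java
import java.util.regex.Pattern;

public class StarDemo {
    // Hypothetical helper: true when the whole input matches the pattern.
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-z]*[0-9]+", "abc123")); // true
        System.out.println(matches("[a-z]*[0-9]+", "123"));    // true: zero letters is allowed
        System.out.println(matches("[a-z]*[0-9]+", "abc"));    // false: at least one digit is required
        System.out.println(matches(".*", ""));                 // true: zero characters is allowed
        System.out.println(matches(".at", "cat"));             // true
        System.out.println(matches(".at", "at"));              // false: . needs exactly one character
    }
}
```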

Searching At the Beginning or End of Strings

A caret ^ means match this class at the beginning of my string. The ^ must precede your class. A dollar sign $ means match this class at the end of my string. The $ must follow your class.

    ^[A-Z].*    Caret means 'begins with', so: any upper case letter followed by anything, zero or more times.
    .*;$        Anything ending in a semicolon.
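Anchors only make a visible difference when the engine is searching inside a string rather than matching all of it. The Java sketch below therefore uses Matcher.find, which looks for the pattern anywhere in the input. AnchorDemo, the found helper, and the sample strings are assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Hypothetical helper: true when the pattern occurs somewhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("[A-Z]", "hello World"));    // true: W appears somewhere
        System.out.println(found("^[A-Z]", "hello World"));   // false: the first character is lower case
        System.out.println(found("^[A-Z].*", "Hello world")); // true
        System.out.println(found(";", "int x; // count"));    // true: a semicolon appears somewhere
        System.out.println(found(";$", "int x; // count"));   // false: the string does not end with ;
        System.out.println(found(".*;$", "int x = 1;"));      // true
    }
}
```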

Logical Expressions

Placing a bar | in your expression signifies a logical OR.

    ([a-b]+|[0-1]+)    One or more a-b's OR one or more 0-1's.
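A short Java sketch of the OR expression above, again matching whole inputs. Note that the alternation picks one branch for the entire match, so a mix of letters and digits is rejected. OrDemo and its helper are hypothetical names.

```java
import java.util.regex.Pattern;

public class OrDemo {
    // The expression from the example above: letters a-b OR digits 0-1.
    private static final String LETTERS_OR_BITS = "([a-b]+|[0-1]+)";

    // Hypothetical helper: true when the whole input matches the expression.
    public static boolean matches(String input) {
        return Pattern.matches(LETTERS_OR_BITS, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("abba")); // true: first branch
        System.out.println(matches("0110")); // true: second branch
        System.out.println(matches("ab01")); // false: neither branch covers the whole input
        System.out.println(matches(""));     // false: both branches need at least one character
    }
}
```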

Practical Applications

Matching an IP Address

This regular expression will match an IP address. What it matches is not necessarily a valid IP address (999.999.999.999 matches too), but it looks like one, at least.

    ([0-9]{1,3}[.]){3}([0-9]{1,3})

Curly braces specify how many times the preceding class occurs: {1,3} means one to three times, and {3} means exactly three times. Also note that you have to wrap special characters such as the period in brackets, [.], although you could also escape them with a backslash, like \.
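The sketch below shows the pattern accepting an out-of-range address, and one possible way to tighten the check by parsing each octet after the regex has matched. The range check is an addition for illustration, not part of the original example; IpDemo, looksLikeIp, and isValidIpv4 are hypothetical names. It also shows that an escaping backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class IpDemo {
    // The pattern from the example above, compiled once.
    private static final Pattern LOOKS_LIKE_IP =
            Pattern.compile("([0-9]{1,3}[.]){3}([0-9]{1,3})");

    public static boolean looksLikeIp(String s) {
        return LOOKS_LIKE_IP.matcher(s).matches();
    }

    // Assumed stricter check: the shape must match and every octet must be 0..255.
    // Leading zeros such as 001 are accepted by this sketch.
    public static boolean isValidIpv4(String s) {
        if (!looksLikeIp(s)) {
            return false;
        }
        for (String octet : s.split("[.]")) {
            if (Integer.parseInt(octet) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIp("192.168.1.1"));      // true
        System.out.println(looksLikeIp("999.999.999.999"));  // true: right shape, wrong range
        System.out.println(isValidIpv4("999.999.999.999"));  // false
        System.out.println(looksLikeIp("1.2.3"));            // false: only three groups

        // An unescaped . matches any character; "\\." in Java source is the regex \.
        System.out.println(Pattern.matches("[0-9]+.[0-9]+", "3x14"));   // true
        System.out.println(Pattern.matches("[0-9]+\\.[0-9]+", "3x14")); // false
    }
}
```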

Matching a MAC Address

See if you can do this yourself, using the IP address example. Valid MAC addresses include:

    C0:FF:EE:BA:BE:EE
    DE:AD:BE:EF:BA:BE

sed: Stream Editor

sed stands for stream editor and allows you to apply regular expressions to files to make quick fixes. The -r flag enables extended regular expressions (needed here for +), and adding -i edits the file in place instead of printing the result.

whitespace.txt

    abutcher@shell003:~$ cat whitespace.txt
            <---- Notice the leading whitespace.
    abutcher@shell003:~$ sed -r 's/^[ ]+//' whitespace.txt
    <---- Notice the leading whitespace.
    abutcher@shell003:~$ sed -ri 's/^[ ]+//' whitespace.txt
    abutcher@shell003:~$ cat whitespace.txt
    <---- Notice the leading whitespace.
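For comparison, the same substitution can be sketched in Java with String.replaceFirst, which takes a regular expression and a replacement. This is only an approximation of what sed does to each line; the class and method names are made up for the example.

```java
public class StripLeadingSpaces {
    // Mirrors s/^[ ]+// : remove one or more spaces at the start of the line.
    public static String strip(String line) {
        return line.replaceFirst("^[ ]+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("        <---- Notice the leading whitespace."));
        // prints: <---- Notice the leading whitespace.
        System.out.println(strip("inner  spaces stay"));
        // prints: inner  spaces stay
    }
}
```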

Regular Expressions in Java

RegexTest.java

    package edu.wvu.mix.abutcher;

    import java.io.IOException;
    import java.util.regex.Pattern;

    import jline.ConsoleReader;

    public class RegexTest {
        public static void main(String[] args) throws IOException {
            ConsoleReader reader = new ConsoleReader();
            String line;

            // Read lines from the console until end of input.
            // Pattern.matches succeeds only if the whole line matches the expression.
            while ((line = reader.readLine("$> ")) != null) {
                if (Pattern.matches(".*;$", line)) {
                    System.out.println("'" + line + "' ends in a semicolon.");
                }
                if (Pattern.matches("([0-9]{1,3}[.]){3}([0-9]{1,3})", line)) {
                    System.out.println("'" + line + "' looks like an IP address.");
                }
                if (Pattern.matches("^[A-Z].*", line)) {
                    System.out.println("'" + line + "' begins with a capital letter.");
                }
            }
        }
    }
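Pattern.matches recompiles its expression on every call. One possible variation, sketched here without the jline dependency, compiles the three patterns from RegexTest once and collects the descriptions in a list so they can be checked without a console. LineClassifier and describe are hypothetical names, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LineClassifier {
    // The same three expressions as RegexTest, compiled once and reused.
    private static final Pattern ENDS_IN_SEMICOLON = Pattern.compile(".*;$");
    private static final Pattern LOOKS_LIKE_IP =
            Pattern.compile("([0-9]{1,3}[.]){3}([0-9]{1,3})");
    private static final Pattern BEGINS_WITH_CAPITAL = Pattern.compile("^[A-Z].*");

    // Returns one note per pattern that matches the whole line.
    public static List<String> describe(String line) {
        List<String> notes = new ArrayList<>();
        if (ENDS_IN_SEMICOLON.matcher(line).matches()) {
            notes.add("ends in a semicolon");
        }
        if (LOOKS_LIKE_IP.matcher(line).matches()) {
            notes.add("looks like an IP address");
        }
        if (BEGINS_WITH_CAPITAL.matcher(line).matches()) {
            notes.add("begins with a capital letter");
        }
        return notes;
    }

    public static void main(String[] args) {
        System.out.println(describe("Hello;"));   // [ends in a semicolon, begins with a capital letter]
        System.out.println(describe("10.0.0.1")); // [looks like an IP address]
        System.out.println(describe("hello"));    // []
    }
}
```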