Introduction to Regular Expressions for Beginners

May 9th, 2011

Regular expressions (Regex for short) are patterns built from characters and operators that let us perform complex searches for particular characters, whole words, or patterns within a text. In a very basic text editor that does not support regular expressions, all you can do is perform simple searches for literal text.

For instance, almost every text editor lets you find a specific word and replace it with another one, but with regular expressions you can search for words with, say, four vowels, words that start with the letter “z,” or words that are seven letters long. This comes in handy in many situations, especially if you spend a lot of time building web sites, work extensively with text files, or work as a computer programmer.
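To make these searches concrete, here is a minimal sketch using Python's standard `re` module. Python is an assumption (the article does not tie these examples to any language), the sample sentence is made up, and “four vowels” is read as “exactly four of a, e, i, o, u,” with y treated as a consonant.

```python
import re

# Made-up sample text for illustration.
text = "The zany zebra zipped past seventeen curious animals in Zanzibar"

# \b is a word boundary, so each pattern matches whole words only.
# Words that start with "z" (re.IGNORECASE also accepts "Z").
z_words = re.findall(r"\bz\w*", text, flags=re.IGNORECASE)

# Words that are exactly seven letters long.
seven_letter = re.findall(r"\b[A-Za-z]{7}\b", text)

# Words with exactly four vowels: four rounds of "any consonants, then a vowel",
# followed only by consonants up to the end of the word.
four_vowels = re.findall(
    r"\b(?:[b-df-hj-np-tv-z]*[aeiou]){4}[b-df-hj-np-tv-z]*\b", text, flags=re.IGNORECASE
)

print(z_words)       # ['zany', 'zebra', 'zipped', 'Zanzibar']
print(seven_letter)  # ['curious', 'animals']
print(four_vowels)   # ['seventeen', 'curious']
```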

Regular expressions are used by many advanced text editors and programming languages. Perl, Awk, Ruby, and Tcl have them built into the language, while many other programming languages provide access to them through libraries and modules. The easiest way to understand how regular expressions work, and just how powerful they are, is to look at a few basic examples. Before we proceed, however, note that although the idea of regular expressions is language-independent, the exact syntax and features can differ between text editors, programming languages, and other software.

What can I do with regular expressions?

You can manipulate strings on a mass scale! Suppose you have a .doc file containing a table of customers’ names and you want to transpose those names. With regular expressions, you enter one search pattern made of characters and wildcards and one replacement pattern, and the whole table is rewritten in a single go. For instance, in a Word document you can transpose names, with or without middle initials, and turn:

Janet X. Robinson

Tanya F. Johnston

Peter Adams

into:

Robinson, Janet X.

Johnston, Tanya F.

Adams, Peter

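This is accomplished by using the Replace function: paste (*) ([! ]@)^13 into the “Find” field, paste \2, \1^p into the “Replace” field, select “Use wildcards,” and press the “Replace All” button. In Word’s wildcard syntax, (*) captures any run of characters, [! ] matches any character except a space, @ repeats the preceding item one or more times, and ^13 is the paragraph mark; in the replacement, \1 and \2 insert the two bracketed groups and ^p adds a new paragraph mark. How easy is that? Even if you have hundreds of names in the table, the replacement takes a split second, while doing the same text manipulation by hand could take hours.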
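Word’s wildcards are its own regex-like dialect, so the pattern above will not work unchanged in other tools. As a rough equivalent, here is a minimal Python sketch using the standard `re` module, assuming the names are stored one per line in a plain string rather than in a Word table:

```python
import re

names = "Janet X. Robinson\nTanya F. Johnston\nPeter Adams"

# ^ and $ match at the start and end of every line because of re.MULTILINE.
# (.*) is greedy, so it captures everything up to the LAST space on the line;
# (\S+) then captures the final run of non-space characters (the surname).
pattern = re.compile(r"^(.*) (\S+)$", flags=re.MULTILINE)

# \2 is the surname and \1 is the first name plus any middle initial.
transposed = pattern.sub(r"\2, \1", names)
print(transposed)
```

This prints “Robinson, Janet X.”, “Johnston, Tanya F.”, and “Adams, Peter” on three separate lines. Like the Word version, it assumes the surname is always the last word, so “Ludwig van Beethoven” would become “Beethoven, Ludwig van.”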

Here is another quick example: if you work as an accountant for a large US company and need to send an annual report to the EU branch, you might need to change the date format from YYYY/MM/DD to DD/MM/YYYY. If the software that you use supports regular expressions, you can do the replacement and compile the report in a matter of seconds. Beyond simple reformatting, regular expressions let you run much more complex and powerful searches, for example to find and manipulate alphanumeric strings or hexadecimal and binary numbers.
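The date conversion boils down to capturing the year, month, and day and writing them back in reverse order. Below is a minimal sketch using Python’s `re` module; the report text and the 0x-prefixed hexadecimal value are made up, and the patterns assume dates always use slashes with two-digit months and days.

```python
import re

# Made-up report text for illustration.
report = "Q1 closed on 2011/03/31; Q2 closed on 2011/06/30. Ledger ref 0x1F3A."

# Three numbered groups: (year)/(month)/(day).
date_pattern = re.compile(r"\b(\d{4})/(\d{2})/(\d{2})\b")

# Rewrite each date as day/month/year using backreferences.
eu_report = date_pattern.sub(r"\3/\2/\1", report)
print(eu_report)

# Hexadecimal numbers written with a 0x prefix.
hex_numbers = re.findall(r"\b0x[0-9A-Fa-f]+\b", report)
print(hex_numbers)
```

The date pattern only checks the shape of the text, so something like 2011/13/45 would be rewritten too; validating the actual values needs extra logic.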

To find a character, number, or pattern, regular expressions use a specific syntax consisting of literals, a group of special characters known as metacharacters, anchors, repeats, and more. The commonly listed metacharacters are eleven: the asterisk, the plus sign, the dollar sign, the opening square bracket, the backslash, the period, the pipe, the opening and closing round brackets, the caret, and the question mark. Most flavors also treat the opening curly brace as special, since it starts a repeat such as {3}.

One of the most widely used metacharacters is the dot, which matches almost any character (by default, everything except a line break) and can help you construct some powerful searches. The question mark, on the other hand, makes the preceding token optional, so the regular expression colou?r finds both “color” and “colour” in any text. Regular expressions also have character classes, which match exactly one out of several characters. For example, [sz] matches either of the two characters, so the regular expression reali[sz]e finds both the US spelling “realize” and the UK spelling “realise.”
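The snippets below try these tokens in Python’s `re` module (an assumption; details can differ in other flavors). The last one shows that metacharacters such as $ and . must be escaped with a backslash when you want to match them literally. The sample strings are made up.

```python
import re

# ? makes the preceding "u" optional.
spellings = re.findall(r"colou?r", "The color chart, the colour chart")

# [sz] matches exactly one character: either "s" or "z".
realize = re.findall(r"reali[sz]e", "US: realize, UK: realise")

# . matches any single character except a line break (by default).
dots = re.findall(r"c.t", "cat cot c\nt")

# Escape $ and . with a backslash to match them literally.
prices = re.findall(r"\$\d+\.\d{2}", "Total: $5.00 or $12.50")

print(spellings)  # ['color', 'colour']
print(realize)    # ['realize', 'realise']
print(dots)       # ['cat', 'cot']
print(prices)     # ['$5.00', '$12.50']
```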

If you want to learn more about regular expressions, you can start with this simple, easy-to-understand, one-page tutorial. It is packed with examples at both the basic and intermediate levels, and it even touches on some advanced features such as non-greedy quantifiers, pattern-match modifiers, changing backreference behavior, naming backreferences, and lookahead assertions.
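For a taste of some of those advanced features, here is a short sketch in Python’s `re` module. The flavor is an assumption: Python spells named groups (?P<name>...) and refers to them in a replacement as \g<name>, and other flavors may use different syntax. The sample strings are made up.

```python
import re

html = "<b>bold</b> and <b>more bold</b>"

# Greedy .* runs to the LAST "</b>"; the non-greedy .*? stops at the first one.
greedy = re.findall(r"<b>(.*)</b>", html)
lazy = re.findall(r"<b>(.*?)</b>", html)

# Named groups make backreferences self-describing.
swapped = re.sub(r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Peter Adams")

# Lookahead (?=...) requires " dollars" to follow but leaves it out of the match.
amounts = re.findall(r"\d+(?= dollars)", "5 dollars, 7 euros, 12 dollars")

print(greedy)   # ['bold</b> and <b>more bold']
print(lazy)     # ['bold', 'more bold']
print(swapped)  # Adams, Peter
print(amounts)  # ['5', '12']
```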

Another great Regex tutorial can be found here. It is split into ten parts, is very well written, and comes with a handful of common examples.

If you want to learn anything and everything about regular expressions, then this is probably the most complete tutorial that you can find online. The site covers a lot of ground, contains useful links to books and external sources, and is very easy to navigate as well.

You do not have to know regular expressions inside out to be able to use them: there are literally thousands of examples that you can easily find online, and quite a few text editors and programs that support them. Listed below are some very popular programs, text editors, servers, and utilities for Windows, Mac OS X, and other platforms that support regular expressions:

  • Adobe Dreamweaver
  • Apache HTTP Server
  • Elvis
  • GNU Grep
  • Microsoft Visual Studio
  • Microsoft Word
  • Notepad++
  • NoteTab
  • OpenOffice.org Base
  • OpenOffice.org Calc
  • OpenOffice.org Writer
  • Oracle Database
  • TextMate
  • Vim

Moreover, if you expect to work with regular expressions often, you might consider using some of the great tools listed below:

RegexBuddy

An excellent all-in-one Regex builder and tester that supports almost all Regex flavors, including Java, Perl, .NET, PCRE, ECMAScript, Python, Ruby, XML Schema, XPath, Tcl ARE, POSIX BRE/ERE, and GNU BRE/ERE.

Regular Expression Tester

A free Firefox plug-in that is ideal for developers who use the popular browser. It comes with a multitude of useful features such as special-character display, color highlighting of found expressions, global and multiline search, case-sensitive search, smart bracket interpretation, and many more.
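Options such as global, multiline, and case-sensitive search correspond to standard regex modifiers. As an illustration (not how the plug-in itself works), here is how they look in Python’s `re` module, using a made-up log:

```python
import re

log = "Error: disk full\nwarning: low memory\nERROR: fan failure"

# re.findall is a "global" search: it returns every match, not just the first.
# re.MULTILINE lets ^ match at the start of every line, not only the string.
# re.IGNORECASE makes "error" also match "Error" and "ERROR".
every_line = re.findall(r"^error: (.*)", log, flags=re.MULTILINE | re.IGNORECASE)

# Without re.MULTILINE, ^ anchors only at the very start of the string.
first_line_only = re.findall(r"^error: (.*)", log, flags=re.IGNORECASE)

# Case-sensitive (the default): only the exact lowercase "error:" would match.
case_sensitive = re.findall(r"^error: (.*)", log, flags=re.MULTILINE)

print(every_line)       # ['disk full', 'fan failure']
print(first_line_only)  # ['disk full']
print(case_sensitive)   # []
```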

Regular Expression Generator for HTML Element

This is a Ruby program that you can use when testing web applications or scraping HTML code; it helps you generate regular expressions that match various HTML elements.
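To give an idea of what such a generator produces, here is a tiny hypothetical helper, element_pattern, written in Python rather than Ruby and not taken from that program. It builds a regular expression for a given tag name. Regular expressions are not a full HTML parser, so this sketch only copes with simple, non-nested elements.

```python
import re

def element_pattern(tag: str) -> re.Pattern:
    """Hypothetical helper: build a regex matching <tag ...>content</tag>.

    Only handles simple, non-nested elements; it is not an HTML parser.
    """
    name = re.escape(tag)
    # <name, a word boundary (so <p> does not match <param>), any attributes
    # up to ">", the shortest possible content, then the closing </name>.
    # DOTALL lets the content span lines; IGNORECASE accepts <P> as well as <p>.
    return re.compile(rf"<{name}\b[^>]*>(.*?)</{name}>", flags=re.IGNORECASE | re.DOTALL)

page = '<p class="intro">Hello</p><div>skip</div><P>World</P>'
paragraphs = element_pattern("p").findall(page)
print(paragraphs)  # ['Hello', 'World']
```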

RegExr

An online tool that helps you learn, write, and test regular expressions right from within your web browser. It comes with code hinting, real-time results, and a built-in Regex guide, and there is also a desktop version that can be downloaded and run on Mac, Windows, or Linux.