Regular Expressions: To Compile or Not to Compile?

When using regular expressions, one often has to decide whether to compile the expressions before applying them.
Below is a simple test I ran, along with the results.
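
A comparison of this kind can be sketched roughly as follows. This is only an illustration, not the script behind the results: the pattern, the input string, and all function names are assumptions. It times three variants: calling re.match with a pattern string, reusing one pre-compiled object, and calling re.compile on every iteration.

```python
import re
import timeit

# Hypothetical pattern and input, chosen only for illustration.
PATTERN = r"(\d+)-(\d+)"
TEXT = "1234-5678"


def match_uncompiled(n):
    """Pass the pattern string to re.match on every iteration."""
    for _ in range(n):
        re.match(PATTERN, TEXT)


def match_precompiled(n):
    """Compile once, then reuse the compiled pattern object."""
    compiled = re.compile(PATTERN)
    for _ in range(n):
        compiled.match(TEXT)


def match_compile_each_call(n):
    """Call re.compile inside the loop before every match."""
    for _ in range(n):
        re.compile(PATTERN).match(TEXT)


def run(n=100000):
    """Return a dict mapping each variant's name to its elapsed seconds."""
    results = {}
    for func in (match_uncompiled, match_precompiled, match_compile_each_call):
        results[func.__name__] = timeit.timeit(lambda: func(n), number=1)
    return results


if __name__ == "__main__":
    for name, seconds in run().items():
        print("%-26s %.4f s" % (name, seconds))
```

The pattern and the input are built outside the loops so that no variant pays for extra string work the others skip. Absolute timings depend on the machine and the Python version, so only the ratios between the variants are worth comparing.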

Results

1-1.000.000 Calls

re.py module

Conclusion

1. Across several test runs, the very first call is always somewhat slower than the others.
This could be due to the loading and initialisation of the re module.

2. Reusing pre-compiled regular expressions does indeed improve execution speed (1.5-2x).
My previous tests showed much bigger differences, but that turned out to be nothing but a test error.
Python string concatenation “got” in the way.

regex2_max = regex2+str(max) <= this is expensive, especially when done in a loop.
The current tests now ensure that the same number of string concatenations is performed in all three types of test.

3. Pre-compiling on every call is pointless, because re.match compiles the regular expression before matching anyway.

4. The timing difference between pre-compiled and uncompiled regular expressions is much smaller (1.5-2x) than I had anticipated.

5. Based on the numbers above, pre-compiling regular expressions is an over-optimization UNLESS:
a. one has more than 100 different regular expressions, and
b. they are ALL used very frequently, as in thousands of times within a few minutes.
It may be worth actually counting how often each regular expression is used beforehand.
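
Counting how often each expression is used can be done with a thin wrapper around re.match. The sketch below is a minimal illustration: the names usage and counted_match are hypothetical, and the sample patterns are made up.

```python
import re
from collections import Counter

# Hypothetical tally: pattern string -> number of times it was used.
usage = Counter()


def counted_match(pattern, text, flags=0):
    """Behave like re.match, but record one use of the pattern first."""
    usage[pattern] += 1
    return re.match(pattern, text, flags)


if __name__ == "__main__":
    # Made-up sample calls, for illustration only.
    for line in ["abc 1", "abc 2", "xyz"]:
        counted_match(r"abc \d", line)
    counted_match(r"xyz", "xyz")

    # List patterns from most used to least used.
    for pattern, count in usage.most_common():
        print(count, pattern)
```

Running real traffic through such a wrapper for a while shows how many distinct patterns are in play and which are called most often, which are the two conditions listed in point 5.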

References

1. Python Regular Expression Documentation: http://docs.python.org/2/library/re.html
2. Is Regex Pre-Compilation Worth It: http://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile

About Gugulethu Ncube

IT enthusiast getting things done.