CRegExp

CRegExp class

Available since ver.1.3.1
CRegExp(string pattern, string flags)

Creating instance: local regex = CRegExp(pattern, flags);
pattern - the regular expression;
flags - a string of modifier letters (may be empty), any of:
i - PCRE_CASELESS
If this modifier is set, letters in the pattern match both upper and lower case letters.

m - PCRE_MULTILINE
By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.

s - PCRE_DOTALL
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

x - PCRE_EXTENDED
If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl's /x modifier, and makes it possible to include commentary inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.

u - PCRE_UTF8|PCRE_UCP
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. Five and six octet UTF-8 sequences are regarded as invalid.

g - global
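
As a rough illustration of how the flags string is passed, the sketch below builds two objects using only the constructor and search() documented on this page. The sample strings are made up, and the behaviour noted in the comments follows from the flag descriptions above rather than from a recorded run.

```squirrel
// "i": letters in the pattern should match both upper and lower case.
local caseless = CRegExp("hello", "i");
if (caseless.search("HELLO world")) {
    print("caseless pattern matched\r\n");
}

// "m": ^ and $ should also match right after / before a newline.
local multiline = CRegExp("^second$", "m");
if (multiline.search("first\nsecond\nthird")) {
    print("multiline pattern matched\r\n");
}
```

Several modifier letters can be written together in one string, as the findAll() example on this page does.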

Class methods:

bool search(string stuff) Do a search on the given string.
This method does the actual search on the given string.
@param "stuff" the string in which you want to search for something.
@return boolean true if the regular expression matched. false if not.

bool match(string stuff) Same as search().

bool searchWithOffset(string stuff, int offset) Do a search on the given string beginning at the given offset.
This method does the actual search on the given string.
@param "stuff" the string in which you want to search for something.
@param "offset" the offset where to start the search.
@return boolean true if the regular expression matched. false if not.
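
A minimal sketch of searchWithOffset(), assuming the offset is a 0-based character position in the same sense as the positions returned by getMatchStart(). The sample text and the offset value are illustrative only.

```squirrel
local regex = CRegExp("([0-9]+)", "");
local text = "10 apples and 20 pears";

// Skip the first two characters so that the leading "10" is not considered.
if (regex.searchWithOffset(text, 2)) {
    print("match starts at " + regex.getMatchStart(0) + "\r\n");
} else {
    print("no match after offset 2\r\n");
}
```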

string getMatch(int pos) Get a substring at a known position.
This method throws an out-of-range exception if the given position
is invalid.
@param "pos" the position of the substring to return, starting at 0. Corresponds to Perl's $1..$n, so position 0 is Perl's $1.
@return the substring at the given position.
Example:

local mysub = regex.getMatch(0); 
Get the first substring that matched the expression in the "regex" object.

string replace(string piece, string with) Replace parts of a string using regular expressions.
This method is the counterpart of the perl s/// operator.
It replaces the substrings which matched the given regular expression
(given to the constructor) with the supplied string.

@param "piece" the string in which you want to search and replace.
@param "with" the string which you want to place on the positions which match the expression (given to the constructor).
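
The following sketch shows one way replace() might be called. The input string and the placeholder are made up. The "g" flag is passed on the assumption that it makes the replacement apply to every match; this page does not describe that flag in detail, so treat it as a guess.

```squirrel
// Replace runs of digits with a placeholder character.
local regex = CRegExp("[0-9]+", "g");   // "g" assumed to mean "replace all matches"
local result = regex.replace("Room 101, floor 3", "#");
print(result + "\r\n");
```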

bool matched() Test if a search was successful.
This method must be invoked after calling search().
@return boolean true if the search was successful at all, or false if not.

int matchesCount() Get the number of substrings generated by RegExp.
@return the number of substrings generated by RegExp.

array getSubStrings() Return an array of substrings, if any.

array split(string piece) Split a string into pieces.
This method will split the given string into an array
of strings using the compiled expression (given to the constructor).
@param "piece" The string you want to split into its parts.
@return an array of strings

array splitWithLimit(string piece, int limit) Split a string into pieces.
This method will split the given string into an array
of strings using the compiled expression (given to the constructor).

@param "piece" The string you want to split into its parts.
@param "limit" the maximum number of elements you want to get back from split().
@return an array of strings

array splitWithLimitOffset(string piece, int limit, int start_offset) Split a string into pieces.
This method will split the given string into an array
of strings using the compiled expression (given to the constructor).

@param "piece" The string you want to split into its parts.
@param "limit" the maximum number of elements you want to get back from split().
@param "start_offset" at which substring the returned array should start.
@return an array of strings

array splitWithLimitStartEndOffset(string piece, int limit, int start_offset, int end_offset) Split a string into pieces.
This method will split the given string into an array
of strings using the compiled expression (given to the constructor).
@param "piece" The string you want to split into its parts.
@param "limit" the maximum number of elements you want to get back from split().
@param "start_offset" at which substring the returned array should start.
@param "end_offset" at which substring the returned array should end.
@return an array of strings
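
A short sketch of the split family, using only split() and splitWithLimit() as declared above. The separator pattern and the input string are illustrative; no output is listed because the exact handling of the limit was not verified here.

```squirrel
// Split a comma-separated list; optional whitespace after a comma
// is treated as part of the separator.
local regex = CRegExp(",\\s*", "");
local parts = regex.split("red, green,blue");
for (local i = 0; i < parts.len(); i++) {
    print("part[" + i + "]=" + parts[i] + "\r\n");
}

// Ask for at most two elements back.
local limited = regex.splitWithLimit("red, green,blue", 2);
print("limited.len() = " + limited.len() + "\r\n");
```
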
int getMatchStart(int pos) Get the start position of a substring within the searched string.
This method returns the character position of the first character of
a substring within the searched string.
@param "pos" the position of the substring.
@return the integer character position of the first character of a substring.
Positions start at 0.

Example:

local regex = CRegExp("([0-9]+)", "");  // search for numerical characters
regex.search("The 11th september.");    // do the search on this string
local day = regex.getMatch(0);          // returns "11"
local pos = regex.getMatchStart(0);     // returns 4, because "11" begins at
                                        // position 4 (0-based) of the searched string.

int getMatchEnd(int pos) Get the end position of a substring within the searched string.
This method returns the character position of the last character of
a substring within the searched string.
@param "pos" the position of the substring.
@return the integer character position of the last character of a substring.
Positions start at 0.

Example:

local regex = CRegExp("([0-9]+)", "");  // search for numerical characters
regex.search("The 11th september.");    // do the search on this string
local day = regex.getMatch(0);          // returns "11"
local pos = regex.getMatchEnd(0);       // returns 5, because "11" ends at
                                        // position 5 (0-based) of the searched string.

int getEntireMatchStart() Get the start position of the entire match within the searched string.
This method returns the character position of the first character of
the entire match within the searched string.
@return the integer character position of the first character of the entire match.
Example:

local regex = CRegExp("([0-9]+)([a-z]+)\\s([a-z]+)", "");  // search for the date (makes 3 substrings)
regex.search("The 11th september.");        // do the search on this string
local pos = regex.getEntireMatchStart();    // returns 4, because "11th september" begins at
                                            // position 4 (0-based) of the searched string.

int getEntireMatchEnd() Get the end position of the entire match within the searched string.
This method returns the character position of the last character of
the entire match within the searched string.
@return the integer character position of the last character of the entire match.

Example:

local regex = CRegExp("([0-9]+)([a-z]+)\\s([a-z]+)", "");  // search for the date (makes 3 substrings)
regex.search("The 11th september.");      // do the search on this string
local pos = regex.getEntireMatchEnd();    // returns 17, because "11th september", which is
                                          // the entire match, ends at position 17 (0-based)
                                          // of the searched string.

array findAll(string stuff) Perform a global regular expression match.
It works like the PHP function 'preg_match_all' with flags = PREG_SET_ORDER.
@param "stuff" the string in which you want to search for something.
@return array (2-dimensional) or an empty array if match failed.
Example:
local regex = CRegExp("(http|https|ftp)://([\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)","imcu");
local matches = regex.findAll(GetFileContents("d:\\links.html"));
for ( local i = 0; i< matches.len(); i++ ) {
	for ( local j = 0; j< matches[i].len(); j++ ) {
		print("match["+i +"]["+j+"]=" + matches[i][j]+"\r\n");
	}
}
links.html content:
<p><a href="https://getsharex.com" target="_blank">ShareX<br></a> 
<a href="https://code.google.com/p/sharexmod/" target="_blank">ShareXmod</a>
<a href="https://getsharex.com" target="_blank"><br></a><a href="https://app.prntscr.com/" target="_blank">Lightshot</a>
<br> <a href="http://www.nteworks.com/picpick/en/" target="_blank">PickPick</a><br> 
<a href="http://monosnap.com/" target="_blank">monosnap</a> 
Output:
match[0][0]=https
match[0][1]=getsharex.com
match[1][0]=https
match[1][1]=code.google.com/p/sharexmod/
match[2][0]=https
match[2][1]=getsharex.com
match[3][0]=https
match[3][1]=app.prntscr.com/
match[4][0]=http
match[4][1]=www.nteworks.com/picpick/en/
match[5][0]=http
match[5][1]=monosnap.com/

Tests:

local regex = CRegExp("([0-9]+)", "");  // search for numerical characters
regex.search("The 11th september.");  // do the search on this string
print( "regex.matchesCount() returned " + regex.matchesCount().tostring()+"\r\n");
print("regex.getMatch(0) returned " + regex.getMatch(0)+"\r\n");
print( "regex.getMatchStart(0) returned " + regex.getMatchStart(0).tostring()+"\r\n");
print("regex.getMatchEnd(0) returned "+ regex.getMatchEnd(0).tostring()+"\r\n");
  
regex = CRegExp("([0-9]+)([a-z]+)\\s([a-z]+)", "i");  // search for the date (makes 3 substrings)
regex.search("The 11th september."+"\r\n");  // do the search on this string
print( "regex.getEntireMatchStart() returned "+ regex.getEntireMatchStart().tostring()+"\r\n");
print( "regex.getEntireMatchEnd() returned "+ regex.getEntireMatchEnd().tostring()+"\r\n");
local substrings = regex.getSubStrings();
for ( local i = 0; i< substrings.len(); i++ ) {
	print("match["+i +"]=" + substrings[i]+"\r\n");
}
Output:
regex.matchesCount() returned 1
regex.getMatch(0) returned 
regex.getMatchStart(0) returned 4
regex.getMatchEnd(0) returned 5
regex.getEntireMatchStart() returned 4
regex.getEntireMatchEnd() returned 17
match[0]=11
match[1]=th
match[2]=september