libcamgm
ca_mgm::PerlRegEx Class Reference

#include <PerlRegEx.hpp>

Classes

struct  match_t
 POSIX RegEx-like structure for a captured substring offset pair.
 

Public Types

typedef std::vector< int > MatchVector
 
typedef std::vector< match_t > MatchArray
 POSIX RegEx-like match array with captured substring offsets.
 

Public Member Functions

 PerlRegEx ()
 
 PerlRegEx (const std::string &regex, int cflags=0)
 
 PerlRegEx (const PerlRegEx &ref)
 
 ~PerlRegEx ()
 
PerlRegEx & operator= (const PerlRegEx &ref)
 
bool compile (const std::string &regex, int cflags=0)
 
int errorCode ()
 
std::string errorString () const
 
std::string patternString () const
 
int compileFlags () const
 
bool isCompiled () const
 
std::vector< std::string > capture (const std::string &str, size_t index=0, size_t count=0, int eflags=0)
 
std::string replace (const std::string &str, const std::string &rep, bool global=false, int eflags=0)
 
std::vector< std::string > split (const std::string &str, bool empty=false, int eflags=0)
 
std::vector< std::string > grep (const std::vector< std::string > &src, int eflags=0)
 
bool match (const std::string &str, size_t index=0, int eflags=0) const
 
bool execute (MatchVector &sub, const std::string &str, size_t index=0, size_t count=0, int eflags=0)
 
bool execute (MatchArray &sub, const std::string &str, size_t index=0, size_t count=0, int eflags=0)
 

Private Attributes

pcre * m_pcre
 
int m_flags
 
int m_ecode
 
std::string m_error
 
std::string m_rxstr
 

Detailed Description

Perl compatible Regular Expression wrapper class and utility functions.

The PerlRegEx implementation depends on the availability of the pcre library.

Consult the pcre_compile(3), pcre_exec(3) and pcreapi(3) manual pages for details of the pcre implementation.

Note
This class does NOT wrap all features provided by the pcre library!

Member Typedef Documentation

typedef std::vector<match_t> ca_mgm::PerlRegEx::MatchArray

POSIX RegEx like match array with captured substring offsets.

typedef std::vector<int> ca_mgm::PerlRegEx::MatchVector

Native PCRE vector of integers containing pairs of captured substring offsets. Each even index holds the start offset and the following odd index the corresponding end offset of a matched substring.
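The flat layout described above can be sketched without the library itself. The snippet below is only an illustration: OffsetPair is a hypothetical stand-in for match_t (its field names rm_so and rm_eo are taken from the execute() example), and toPairs() and substrings() are helper names invented here, not part of PerlRegEx. It assumes the end offset points one past the last matched character, as the execute() example suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for PerlRegEx::match_t (not the library type).
struct OffsetPair {
    int rm_so;  // start offset of the captured substring
    int rm_eo;  // end offset (assumed to be one past the last character)
};

// Convert a flat vector {so0, eo0, so1, eo1, ...} into offset pairs.
std::vector<OffsetPair> toPairs(const std::vector<int>& flat) {
    assert(flat.size() % 2 == 0 && "offsets must come in start/end pairs");
    std::vector<OffsetPair> pairs;
    for (std::size_t i = 0; i + 1 < flat.size(); i += 2) {
        pairs.push_back(OffsetPair{flat[i], flat[i + 1]});
    }
    return pairs;
}

// Cut the substrings out of str; unused pairs (-1) yield an empty string.
std::vector<std::string> substrings(const std::string& str,
                                    const std::vector<int>& flat) {
    std::vector<std::string> out;
    for (const OffsetPair& p : toPairs(flat)) {
        if (p.rm_so < 0 || p.rm_eo < p.rm_so) {
            out.push_back(std::string());
            continue;
        }
        out.push_back(str.substr(static_cast<std::size_t>(p.rm_so),
                                 static_cast<std::size_t>(p.rm_eo - p.rm_so)));
    }
    return out;
}
```

For the string "Foo = bar trala hoho" and the offsets {0, 20, 0, 3, 6, 20}, substrings() yields the whole string, "Foo", and "bar trala hoho". A MatchVector with 2*n entries therefore corresponds to a MatchArray with n entries.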

Constructor & Destructor Documentation

ca_mgm::PerlRegEx::PerlRegEx ( )

Create a new PerlRegEx object without compilation.

ca_mgm::PerlRegEx::PerlRegEx ( const std::string &  regex,
int  cflags = 0 
)

Create a new PerlRegEx object and compile the regular expression.

Parameters
regex   A Perl regular expression pattern.
cflags  Bitwise-or of compile() flags.
Exceptions
RegExCompileException  on compilation failure.
ca_mgm::PerlRegEx::PerlRegEx ( const PerlRegEx &  ref)

Create a new PerlRegEx as (deep) copy of the specified reference. If the reference is compiled, the new object will be compiled as well.

Parameters
ref  The PerlRegEx object reference to copy.
Exceptions
RegExCompileException  on compilation failure.
ca_mgm::PerlRegEx::~PerlRegEx ( )

Destroy this PerlRegEx object.

Member Function Documentation

std::vector<std::string> ca_mgm::PerlRegEx::capture ( const std::string &  str,
size_t  index = 0,
size_t  count = 0,
int  eflags = 0 
)

Search in string and return an array of captured substrings.

Parameters
str     string to search in
index   start matching at this index in the string
count   expected substring count
eflags  execution flags, see execute()
Returns
array of captured substrings
Exceptions
RegExCompileException  if regex is not compiled.
RegExExecuteException  on execute failures.
OutOfBoundsException   if the index is greater than the string length.
Example:
std::string str("Foo = bar trala hoho");
PerlRegEx reg("^((?i)[a-z]+)[ \t]*=[ \t]*(.*)$");
std::vector<std::string> out = reg.capture(str);
//
// out is { "Foo = bar trala hoho",
//          "Foo",
//          "bar trala hoho"
//        }
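For readers without libcamgm at hand, the shape of the capture() result can be reproduced with the standard library. This is a sketch using std::regex as a stand-in, not the PerlRegEx implementation; captureSketch is a hypothetical name. The ECMAScript grammar used by std::regex does not accept the inline (?i) option, so a test pattern would spell out [A-Za-z] instead.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Return the whole match followed by each captured group, or an empty
// vector when the pattern does not match. std::regex stand-in only.
std::vector<std::string> captureSketch(const std::string& str,
                                       const std::regex& re) {
    std::vector<std::string> out;
    std::smatch m;
    if (std::regex_search(str, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            out.push_back(m[i].str());
        }
    }
    return out;
}
```

Element 0 is the whole match and elements 1..n are the capture groups, which mirrors the layout shown in the example above.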
bool ca_mgm::PerlRegEx::compile ( const std::string &  regex,
int  cflags = 0 
)

Compile the regular expression pattern contained in the string.

Parameters
regex   A regular expression pattern.
cflags  Bitwise-or of compilation flags.
Returns
True on successful compilation, false on failure.

The cflags parameter can be set to one or a bitwise-or of the following option flags. Consult the pcre_compile(3) and pcreapi(3) manual pages for the complete list and detailed description.

Most of the compile options can also be set directly in the pattern string using the (?<option character>) notation, for example (?i), as listed below.

  • i PCRE_CASELESS match upper- and lowercase letters
  • m PCRE_MULTILINE "^" and "$" match the beginning and end of a line instead of the whole string
  • s PCRE_DOTALL the dot metacharacter also matches newlines
  • x PCRE_EXTENDED ignore unescaped whitespace in the pattern
  • U PCRE_UNGREEDY invert the "greediness" of quantifiers
  • PCRE_UTF8 treat pattern and subject strings as UTF-8
  • PCRE_ANCHORED force the pattern to be "anchored"
  • PCRE_NO_AUTO_CAPTURE behave as if every "(" parenthesis is followed by "?:"
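The effect of combining compile flags with a bitwise-or can be illustrated with std::regex, used here only as a self-contained stand-in. In this sketch std::regex::icase plays the role of PCRE_CASELESS; matchesFoo is a hypothetical helper, and none of this is libcamgm API.

```cpp
#include <regex>
#include <string>

// Search for the literal "foo", optionally ignoring case.
// The flags are combined with bitwise-or, as cflags would be.
bool matchesFoo(const std::string& str, bool caseless) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (caseless) {
        flags = flags | std::regex::icase;  // analogous to PCRE_CASELESS
    }
    return std::regex_search(str, std::regex("foo", flags));
}
```

With PerlRegEx the same idea would presumably be expressed either through cflags or through the inline (?i) notation in the pattern.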
int ca_mgm::PerlRegEx::compileFlags ( ) const
Returns
The compilation flags passed to the compile() method.
int ca_mgm::PerlRegEx::errorCode ( )

Return the last error code generated by compile or one of the executing methods.

In case of a compile error, the returned value is the position (character offset) in the regex pattern string where the error was discovered.

In all other cases, the result of the pcre_exec function call is returned.

Returns
pcre_exec result or compile error position.
std::string ca_mgm::PerlRegEx::errorString ( ) const

Return the error message string for the last error code.

Returns
The error message or empty string if no expression was compiled.
bool ca_mgm::PerlRegEx::execute ( MatchVector &  sub,
const std::string &  str,
size_t  index = 0,
size_t  count = 0,
int  eflags = 0 
)

Execute regular expression matching against the string. The matching starts at the specified index and returns true on a match or false if no match is found.

Note
In contrast to the PosixRegEx class, the PCRE library supports a string index (startoffset) and is able to look behind the starting point. If the regex makes use of the "start of string/line" metacharacter (^), the regex may not match if index is greater than 0.

The expected maximum number of matching substrings can be specified in count. If the default value of 0 is used, the count detected by pcre_fullinfo is used.

Note
If the specified count is greater than 0 but smaller than the effective number of found matches, false is returned (failure, error code 0). If the specified count is greater than 0 and greater than the effective number of found matches, the unused offsets at the end are filled with -1.
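The count rule from the note can be modelled in a few lines. The helper below is a sketch of the documented behavior only, not the library code: fitToCount is a hypothetical name, "found" stands for the flat offsets produced by a match, and for count == 0 the sketch simply keeps what was found, which is an assumption standing in for the pcre_fullinfo lookup.

```cpp
#include <cstddef>
#include <vector>

// Model of the documented count handling for a flat offset vector.
// Returns false when count is too small; pads with -1 when it is larger.
bool fitToCount(const std::vector<int>& found, std::size_t count,
                std::vector<int>& sub) {
    sub.clear();
    const std::size_t pairs = found.size() / 2;
    if (count == 0) {          // assumption: use the detected number of pairs
        sub = found;
        return true;
    }
    if (count < pairs) {       // fewer slots than found matches: failure
        return false;
    }
    sub = found;
    sub.resize(2 * count, -1); // unused offsets at the end are set to -1
    return true;
}
```

With three found pairs, a count of 2 fails and leaves sub empty, while a count of 5 succeeds and yields ten integers of which the last four are -1.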

If no match was found, the sub array will be empty and false is returned. If a match is found and the expression was compiled to capture substrings, the sub array will be filled with the captured substring offsets. The first (index 0) offset pair points to the start of the first match and the end of the last match. Unused / optional capturing subpattern offsets will be set to -1.

The resulting MatchVector is twice as large as the number of captured substrings; the resulting MatchArray has exactly one entry per captured substring.

Consult the pcre_exec(3) and pcreapi(3) manual pages for the complete and detailed description.

Parameters
sub     array for substring offsets
str     string to match
index   start matching at this index in the string
count   number of expected substring matches
eflags  execution flags described below
Returns
true on match, false otherwise
Exceptions
RegExCompileException  if regex is not compiled.
AssertionException     if the count value is too big (would cause integer overflow).
OutOfBoundsException   if the index is greater than the string length.

The eflags parameter can be set to 0, to one, or to a bitwise-or of the following options:

  • PCRE_NOTBOL The circumflex character (^) will not match the beginning of string.
  • PCRE_NOTEOL The dollar sign ($) will not match the end of string.
  • PCRE_ANCHORED Match only at the first position.
  • PCRE_NOTEMPTY An empty string is not a valid match.
  • PCRE_NO_UTF8_CHECK Do not check the string for UTF-8 validity (only relevant if PCRE_UTF8 was set at compile time).
Example:
PerlRegEx::MatchVector vsub;  // flat start/end offsets
PerlRegEx::MatchArray  rsub;  // one match_t per captured substring
std::string str("foo = bar trala hoho");
if( PerlRegEx("=").execute(vsub, str) && !vsub.empty())
{
    //
    // vsub[0] is 4,
    // vsub[1] is 5
    //
}
if( PerlRegEx("=").execute(rsub, str) && !rsub.empty())
{
    //
    // rsub[0].rm_so is 4,
    // rsub[0].rm_eo is 5
    //
}
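The "not beginning of line" and "not end of line" execution flags have close counterparts in the standard library, which makes their effect easy to try out. The sketch below uses std::regex with match_not_bol and match_not_eol as stand-ins for PCRE_NOTBOL and PCRE_NOTEOL; searchWith is a hypothetical helper and not part of PerlRegEx.

```cpp
#include <regex>
#include <string>

// Search str for pattern with the given execution-time flags.
// std::regex stand-in; flags may be combined with bitwise-or.
bool searchWith(const std::string& str, const std::string& pattern,
                std::regex_constants::match_flag_type flags =
                    std::regex_constants::match_default) {
    return std::regex_search(str, std::regex(pattern), flags);
}
```

For "foo = bar", the pattern "^foo" matches by default but not with match_not_bol, and "bar$" matches by default but not with match_not_eol. Passing PCRE_NOTBOL or PCRE_NOTEOL in eflags is described above as having the same kind of effect.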
bool ca_mgm::PerlRegEx::execute ( MatchArray &  sub,
const std::string &  str,
size_t  index = 0,
size_t  count = 0,
int  eflags = 0 
)

Execute regular expression matching against the string. The matching starts at the specified index and returns true on a match or false if no match is found.

Note
In contrast to the PosixRegEx class, the PCRE library supports a string index (startoffset) and is able to look behind the starting point. If the regex makes use of the "start of string/line" metacharacter (^), the regex may not match if index is greater than 0.

The expected maximum number of matching substrings can be specified in count. If the default value of 0 is used, the count detected by pcre_fullinfo is used.

Note
If the specified count is greater than 0 but smaller than the effective number of found matches, false is returned (failure, error code 0). If the specified count is greater than 0 and greater than the effective number of found matches, the unused offsets at the end are filled with -1.

If no match was found, the sub array will be empty and false is returned. If a match is found and the expression was compiled to capture substrings, the sub array will be filled with the captured substring offsets. The first (index 0) offset pair points to the start of the first match and the end of the last match. Unused / optional capturing subpattern offsets will be set to -1.

The resulting MatchVector is twice as large as the number of captured substrings; the resulting MatchArray has exactly one entry per captured substring.

Consult the pcre_exec(3) and pcreapi(3) manual pages for the complete and detailed description.

Parameters
sub     array for substring offsets
str     string to match
index   start matching at this index in the string
count   number of expected substring matches
eflags  execution flags described below
Returns
true on match, false otherwise
Exceptions
RegExCompileException  if regex is not compiled.
AssertionException     if the count value is too big (would cause integer overflow).
OutOfBoundsException   if the index is greater than the string length.

The eflags parameter can be set to 0, to one, or to a bitwise-or of the following options:

  • PCRE_NOTBOL The circumflex character (^) will not match the beginning of string.
  • PCRE_NOTEOL The dollar sign ($) will not match the end of string.
  • PCRE_ANCHORED Match only at the first position.
  • PCRE_NOTEMPTY An empty string is not a valid match.
  • PCRE_NO_UTF8_CHECK Do not check the string for UTF-8 validity (only relevant if PCRE_UTF8 was set at compile time).
Example:
PerlRegEx::MatchVector vsub;  // flat start/end offsets
PerlRegEx::MatchArray  rsub;  // one match_t per captured substring
std::string str("foo = bar trala hoho");
if( PerlRegEx("=").execute(vsub, str) && !vsub.empty())
{
    //
    // vsub[0] is 4,
    // vsub[1] is 5
    //
}
if( PerlRegEx("=").execute(rsub, str) && !rsub.empty())
{
    //
    // rsub[0].rm_so is 4,
    // rsub[0].rm_eo is 5
    //
}
std::vector<std::string> ca_mgm::PerlRegEx::grep ( const std::vector< std::string > &  src,
int  eflags = 0 
)

Match all strings in the array against regular expression. Returns an array of matching strings.

Parameters
src     list of strings to match
eflags  execution flags, see execute() method
Exceptions
RegExCompileException  if regex is not compiled.
RegExExecuteException  on execute failures.
OutOfBoundsException   if the index is greater than the string length.
Example:
std::vector<std::string> src;
src.push_back("\t");
src.push_back("one");
src.push_back("");
src.push_back("two");
src.push_back(" ");
std::vector<std::string> out = PerlRegEx("[^ \t]").grep(src);
//
// out is { "one", "two" }
//
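The filtering that grep() performs can be mimicked with std::copy_if. The following is a sketch with std::regex as a stand-in, assuming that a string is kept when the pattern matches anywhere in it; grepSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Keep only those strings in which the pattern matches somewhere.
std::vector<std::string> grepSketch(const std::vector<std::string>& src,
                                    const std::regex& re) {
    std::vector<std::string> out;
    std::copy_if(src.begin(), src.end(), std::back_inserter(out),
                 [&re](const std::string& s) {
                     return std::regex_search(s, re);
                 });
    return out;
}
```

Applied to the five strings from the example with the pattern "[^ \t]", only "one" and "two" contain a character that is neither a space nor a tab, so only they are kept.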
bool ca_mgm::PerlRegEx::isCompiled ( ) const
Returns
true, if the current regex object is compiled.
bool ca_mgm::PerlRegEx::match ( const std::string &  str,
size_t  index = 0,
int  eflags = 0 
) const

Execute regular expression matching against the string. The matching starts at the specified index and returns true on a match or false if no match is found.

See execute() method for description of the index and eflags parameters.

Parameters
str     string to match
index   start matching at this index in the string
eflags  execution flags, see execute() method
Returns
true on match, false otherwise
Exceptions
RegExCompileException  if regex is not compiled.
RegExExecuteException  on execute failures.
OutOfBoundsException   if the index is greater than the string length.
Example:
std::string str("foo = bar ");
if( PerlRegEx("^[a-z]+[ \t]*=[ \t]*.*$").match(str))
{
}
PerlRegEx& ca_mgm::PerlRegEx::operator= ( const PerlRegEx &  ref)

Assign the specified PerlRegEx reference. If the reference is compiled, the current object will be (re)compiled.

Parameters
ref  The PerlRegEx object reference to assign from.
Exceptions
RegExCompileException  on compilation failure.
std::string ca_mgm::PerlRegEx::patternString ( ) const
Returns
The regular expression pattern string.
std::string ca_mgm::PerlRegEx::replace ( const std::string &  str,
const std::string &  rep,
bool  global = false,
int  eflags = 0 
)

Replace (substitute) the first or all matching substrings.

Substring(s) matching regular expression are replaced with the string provided in rep and a new, modified string is returned. If no matches are found, a copy of 'str' string is returned.

The rep string can contain capturing references "\\1" to "\\9" that will be substituted with the corresponding captured string. A "\\" prepended to the reference disables the substitution (the reference is skipped). Note that the notation is a double backslash followed by a digit character, not just "\1" in the style of the "\n" escape sequence.

Parameters
str     string that should be matched
rep     replacement substring with optional references
global  whether to replace all matches (true) or only the first one (false)
eflags  execution flags, see execute() method
Returns
new string with modification(s)
Exceptions
RegExCompileException  if regex is not compiled.
RegExExecuteException  on execute failures.
OutOfBoundsException   if the index is greater than the string length.
Example:
std::string str("//foo/.//bar/hoho");
PerlRegEx reg("([/]+(\\.?[/]+)?)");
std::string out = reg.replace(str, "/", true);  // global: replace all matches
//
// out is "/foo/bar/hoho"
//
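The difference between replacing only the first match and replacing all matches can be tried out with std::regex_replace. This is a stand-in sketch, not the PerlRegEx implementation; replaceSketch is a hypothetical name. Note that std::regex writes capture references as "$1" rather than the "\\1" notation described above.

```cpp
#include <regex>
#include <string>

// Replace the first match (global == false) or every match (global == true).
std::string replaceSketch(const std::string& str, const std::regex& re,
                          const std::string& rep, bool global = false) {
    const std::regex_constants::match_flag_type flags =
        global ? std::regex_constants::format_default
               : std::regex_constants::format_first_only;
    return std::regex_replace(str, re, rep, flags);
}
```

For "//foo/.//bar/hoho" and the pattern from the example, replacing only the first match gives "/foo/.//bar/hoho", while replacing all matches gives "/foo/bar/hoho".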
std::vector<std::string> ca_mgm::PerlRegEx::split ( const std::string &  str,
bool  empty = false,
int  eflags = 0 
)

Split the specified string into an array of substrings. The regular expression is used to match the separators.

If the empty flag is true, empty substrings are included in the resulting array.

If no separators were found and the empty flag is true, the array will contain the input string as its only element. If the empty flag is false, an empty array is returned.

Parameters
str     string that should be split
empty   whether to capture empty substrings
eflags  execution flags, see execute() method
Returns
array of resulting substrings or empty array on failure
Exceptions
RegExCompileException  if regex is not compiled.
RegExExecuteException  on execute failures.
OutOfBoundsException   if the index is greater than the string length.
Example:
std::string str("1.23, .50 , , 71.00 , 6.00");
std::vector<std::string> out1 = PerlRegEx("([ \t]*,[ \t]*)").split(str);
//
// out1 is { "1.23", ".50", "71.00", "6.00" }
//
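The split rules described above, including the handling of the empty flag and the case where no separator is found, can be sketched with std::sregex_iterator. This models the documented behavior with std::regex as a stand-in; splitSketch is a hypothetical name and the code is not taken from libcamgm.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Split str at every separator match. Empty pieces are kept only when
// 'empty' is true. Without any separator match, the result is {str} when
// 'empty' is true and an empty vector otherwise, as documented above.
std::vector<std::string> splitSketch(const std::string& str,
                                     const std::regex& sep,
                                     bool empty = false) {
    std::vector<std::string> out;
    const std::sregex_iterator first(str.begin(), str.end(), sep);
    const std::sregex_iterator last;
    if (first == last) {
        if (empty) {
            out.push_back(str);
        }
        return out;
    }
    std::size_t pos = 0;  // start of the piece currently being collected
    for (std::sregex_iterator it = first; it != last; ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position(0));
        const std::size_t len = static_cast<std::size_t>(it->length(0));
        const std::string piece = str.substr(pos, start - pos);
        if (empty || !piece.empty()) {
            out.push_back(piece);
        }
        pos = start + len;
    }
    const std::string tail = str.substr(pos);
    if (empty || !tail.empty()) {
        out.push_back(tail);
    }
    return out;
}
```

For the example string, the two adjacent separators " , " and ", " enclose an empty piece. It is dropped by default, giving four elements, and kept when empty is true, giving five.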

Member Data Documentation

int ca_mgm::PerlRegEx::m_ecode
mutable private
std::string ca_mgm::PerlRegEx::m_error
mutable private
int ca_mgm::PerlRegEx::m_flags
private
pcre* ca_mgm::PerlRegEx::m_pcre
private
std::string ca_mgm::PerlRegEx::m_rxstr
private
