Regex Match very slow...

Discussion:

Tom

2016-05-24 12:11:48 UTC

Hi,

I have the following code:

$Tel_File = gc '.\out.tel'

foreach ($r in $array_ref) {
    Write-Host "Looking for:" $r
    if ($Tel_File -match ".*_AC_.*$r.*")
    {
        write-host "$r Connect to AC"
    }
    elseif ($Tel_File -match ".*VCC(\d+)?_.*$r.*")
    {
        write-host "$r Connect to VCC"
    }
}

The $Tel_File is a little over 2000 lines long and looks like:

'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39

Some lines are much longer..

The foreach loop can run over well over 100 entries, each a value like "C170" ($r). It's the regex matching that appears to be very slow...

How can I get this to run faster?

Thank you for any help in advance!

-Tom

Jürgen Exner

2016-05-25 00:26:52 UTC

Post by Tom
The foreach loop can run over well over 100 entries, each a value like "C170" ($r). It's the regex matching that appears to be very slow...

Well, yes, evaluating REs is complex and expensive.

Post by Tom
How can I get this to run faster?

There are 3 steps that come to my mind.

First, the leading and trailing '.*' don't do anything useful except
making the RE more complex and more expensive to execute.
So get rid of them.
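
As a rough sketch of that first step: -match already finds a pattern anywhere in a line, so the patterns can be written without the surrounding '.*'. The sample lines and the contents of $array_ref below are made up for illustration, and the [regex]::Escape call is an addition that the original code does not have.

```powershell
# Hypothetical sample data standing in for the contents of out.tel
$Tel_File = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39",
    "'PWR_AC_IN' ; C171:F1.1 R10:F2.2",
    "'VCC3_CORE' ; C172:F3.1 U7:F4.4"
)
# Hypothetical list of names to look for
$array_ref = @('C170', 'C171', 'C172')

$results = @(foreach ($r in $array_ref) {
    # Addition, not in the original: escape $r so it is treated as literal text
    $name = [regex]::Escape($r)

    # -match is unanchored, so no leading or trailing '.*' is needed
    if ($Tel_File -match "_AC_.*$name") {
        "$r Connect to AC"
    }
    elseif ($Tel_File -match "VCC(\d+)?_.*$name") {
        "$r Connect to VCC"
    }
})

$results
```

The inner '.*' stays, because it still expresses "the name appears somewhere after _AC_ on the same line".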

Second, you are running this expensive combined RE match 2000x100x2
times. This can be optimized.
First filter for those elements which match just e.g. "_AC_", and then
filter this much smaller result list again for the matches from $r.
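
One possible way to sketch this second step is to filter the file once, before the loop, and then search only the reduced lists inside it. The sample data and the variable names $acLines and $vccLines are assumptions made for this illustration.

```powershell
# Hypothetical sample data standing in for the contents of out.tel
$Tel_File = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39",
    "'PWR_AC_IN' ; C171:F1.1 R10:F2.2",
    "'VCC3_CORE' ; C172:F3.1 U7:F4.4"
)
# Hypothetical list of names to look for
$array_ref = @('C170', 'C171', 'C172')

# Filter once, outside the loop; @() keeps the result an array even if it is empty
$acLines  = @($Tel_File -match '_AC_')
$vccLines = @($Tel_File -match 'VCC(\d+)?_')

$results = @(foreach ($r in $array_ref) {
    # Addition, not in the original: escape $r so it is treated as literal text
    $name = [regex]::Escape($r)

    # Each test now runs only against the lines that passed the prefilter
    if ($acLines -match "_AC_.*$name") {
        "$r Connect to AC"
    }
    elseif ($vccLines -match "VCC(\d+)?_.*$name") {
        "$r Connect to VCC"
    }
})

$results
```

How much this saves depends on how many of the roughly 2000 lines contain "_AC_" or a VCC name at all; Measure-Command can be used to compare it against the original loop.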

And third, with one exception you are not really using any RE features;
you are doing a simple string compare. So why kick off the
expensive RE engine when a simple substring compare will do?
Please see https://technet.microsoft.com/en-us/library/ee692804.aspx,
section "Checking For Strings Within Strings" for the cheap and fast way
to test whether $b is a substring of $a.
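
A minimal sketch of this third step, assuming the .NET String.Contains method is an acceptable substitute for the "_AC_" test (the linked article may describe a different technique). Two behaviours differ from the regex version: Contains is case-sensitive, while -match is not, and this check does not require the name to come after "_AC_" on the line. The VCC test keeps its regex, since the optional digits are the one real RE feature in use. The sample data is made up.

```powershell
# Hypothetical sample data standing in for the contents of out.tel
$Tel_File = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39",
    "'PWR_AC_IN' ; C171:F1.1 R10:F2.2",
    "'VCC3_CORE' ; C172:F3.1 U7:F4.4"
)
# Hypothetical list of names to look for
$array_ref = @('C170', 'C171', 'C172')

$results = @(foreach ($r in $array_ref) {
    # Plain substring test for the AC case; stop at the first matching line
    $acHit = $false
    foreach ($line in $Tel_File) {
        if ($line.Contains('_AC_') -and $line.Contains($r)) {
            $acHit = $true
            break
        }
    }

    if ($acHit) {
        "$r Connect to AC"
    }
    else {
        # The VCC pattern still needs a regex because of the optional digits
        $name = [regex]::Escape($r)
        if ($Tel_File -match "VCC(\d+)?_.*$name") {
            "$r Connect to VCC"
        }
    }
})

$results
```

Whether the explicit loop actually beats -match on a given file is worth timing with Measure-Command rather than assuming.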

jue

Tom

2016-05-25 12:14:16 UTC

Awesome, thank you very much. Just removing the leading and trailing .* gave a huge performance boost. I'll try the other suggestions as well.

-Tom