Discussion:
Regex Match very slow...
Tom
2016-05-24 12:11:48 UTC
Hi,

I have the following code:

$Tel_File = gc '.\out.tel'

foreach ($r in $array_ref) {
    Write-Host "Looking for:" $r
    if ($Tel_File -match ".*_AC_.*$r.*")
    {
        write-host "$r Connect to AC"
    }
    elseif ($Tel_File -match ".*VCC(\d+)?_.*$r.*")
    {
        write-host "$r Connect to VCC"
    }
}

The $Tel_File is a little over 2000 lines long and looks like:

'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39

Some lines are much longer..

The foreach can have well over 100 entries, each like "C170" ($r). It's the regex matching that appears to be very slow...

How can I get this to run faster?

Thank you for any help in advance!

-Tom
Jürgen Exner
2016-05-25 00:26:52 UTC
Post by Tom
The foreach can have well over 100 entries, each like "C170" ($r). It's the regex matching that appears to be very slow...
Well, yes, evaluating REs is complex and expensive.
Post by Tom
How can I get this to run faster?
Three steps come to mind.

First, the leading and trailing '.*' don't do anything useful except
making the RE more complex and more expensive to execute. -match is not
anchored, so it already finds the pattern anywhere in the line.
So get rid of them.
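
A minimal sketch of this first step. The sample lines and the value of $r are made up for illustration, in the format shown in the question; only the two patterns come from the original code.

```powershell
# Hypothetical sample lines in the format shown in the question.
$lines = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39",
    "'P3V3_AC_IN' ; C170:F10.1 R12:F11.2",
    "'VCC12_CORE' ; C201:F20.1 U45:F21.3"
)
$r = 'C170'

# -match searches anywhere in the string, so the leading and trailing '.*'
# add work for the regex engine without changing which lines are selected.
$withWildcards    = @($lines -match ".*_AC_.*$r.*")
$withoutWildcards = @($lines -match "_AC_.*$r")

$sameResult = ($withWildcards -join "`n") -eq ($withoutWildcards -join "`n")
Write-Host "Same result: $sameResult"
```

If $r can ever contain regex metacharacters, wrapping it in [regex]::Escape($r) before building the pattern is probably worth doing as well.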

Second, you are running this expensive combined RE match 2000x100x2
times. This can be optimized.
First filter for those elements which match just e.g. "_AC_", and then
filter this much smaller result list again for the matches from $r.
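
Roughly, the two-stage filter could look like the sketch below. The sample lines and $refs (a stand-in for $array_ref) are hypothetical. Note one assumption: the second stage looks for $r anywhere on the line, whereas the original pattern required it to come after "_AC_"; that is equivalent only if the net name always comes first, as in the sample line from the question.

```powershell
# Hypothetical sample lines in the format shown in the question.
$lines = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39",
    "'P3V3_AC_IN' ; C170:F10.1 R12:F11.2",
    "'VCC12_CORE' ; C201:F20.1 U45:F21.3"
)
$refs = @('C170', 'C201', 'R999')   # hypothetical stand-in for $array_ref

# Stage 1: scan the whole file once per net type, outside the loop.
$acLines  = @($lines -match '_AC_')
$vccLines = @($lines -match 'VCC(\d+)?_')

# Stage 2: per reference, search only the much smaller pre-filtered lists.
$results = @(foreach ($r in $refs) {
    $pattern = [regex]::Escape($r)   # treat the reference as literal text
    if     ($acLines  -match $pattern) { "$r Connect to AC" }
    elseif ($vccLines -match $pattern) { "$r Connect to VCC" }
})

$results | ForEach-Object { Write-Host $_ }
```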

And third, with one exception (the optional digits after "VCC") you are
not really using any RE features; you are doing a simple string compare.
So why kick off the expensive RE engine when a simple substring compare
will do?
Please see https://technet.microsoft.com/en-us/library/ee692804.aspx,
section "Checking For Strings Within Strings", for the cheap and fast way
to test whether $b is a substring of $a.
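
As a sketch of the third step, here is one way to do the substring test with the .NET String.Contains method. Whether this is the exact technique the linked article shows is an assumption on my part, and the sample lines are again hypothetical. Be aware that Contains is case-sensitive, while -match ignores case by default, and that it tests for $r anywhere on the line.

```powershell
# Hypothetical sample lines in the format shown in the question.
$lines = @(
    "'V2P5_AVIN' ; C170:F133.1 R354:F136.1 U45:F132.39",
    "'P3V3_AC_IN' ; C170:F10.1 R12:F11.2",
    "'VCC12_CORE' ; C201:F20.1 U45:F21.3"
)
$r = 'C170'

# Plain substring tests: no regex engine involved for the _AC_ case.
$acHits = @(foreach ($line in $lines) {
    if ($line.Contains('_AC_') -and $line.Contains($r)) { $line }
})

# The VCC case is the one exception: the optional digits still need a regex,
# but the reference itself can remain a plain substring test.
$vccHits = @(foreach ($line in $lines) {
    if ($line -match 'VCC(\d+)?_' -and $line.Contains($r)) { $line }
})

Write-Host "AC hits: $($acHits.Count), VCC hits: $($vccHits.Count)"
```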

jue
Tom
2016-05-25 12:14:16 UTC
Awesome, thank you very much. Just removing the leading and trailing .* gave a huge performance boost. I'll try the other suggestions as well.

-Tom
