More Pattern Matching

In my last post I introduced the principles of using the pattern matching functions within JMP.  Once you start using pattern matching you will discover that you need to use some additional features, which I discuss here.  

I’m going to take a look at some additional pattern matching functions.  In isolation these functions may appear to have limited utility, but in combination they can become very powerful.

Pat Len( length )

This function matches a section of text of predefined length.  It’s useful if the string contains fields of fixed width.
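A minimal sketch of a fixed-width match, assuming a hypothetical input string “abcdef” and the conditional assignment operator >> to capture the matched text:

```jsl
// Hypothetical input; any string of at least three characters would do
str = "abcdef";

// Pat Len( 3 ) matches exactly three characters; >> stores the matched
// text in var once the overall match succeeds
Pat Match( str, Pat Len( 3 ) >> var );

Show( var );
```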

In the above example the variable var is assigned the value “abc”.

Pat Span( string )

Sometimes we want to match a specific set of characters.  Here is a simple string:

“wafer=3,site=1”

I could use the following pattern definition:

However, I would have a problem for wafer 10!

The Pat Span function allows me to define a set of allowable characters for the purpose of matching:

The following example correctly identifies the wafer number as “10”:
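A minimal sketch of such a pattern, assuming the wafer number is a run of decimal digits (the variable names str, digits and wnum are chosen for illustration):

```jsl
str = "wafer=10,site=1";

// Set of allowable characters: Pat Span matches the longest run drawn from it
digits = Pat Span( "0123456789" );

// Literal prefix followed by one or more digits, captured into wnum
Pat Match( str, "wafer=" + (digits >> wnum) );

Show( wnum );
```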

As a side note, if the pattern definition starts getting too long it can be broken up into multiple variables, for example:
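One way this might look, with the pattern split into named pieces (waferPat, sitePat and snum are hypothetical names, not taken from the original code):

```jsl
str = "wafer=10,site=1";

// Reusable building blocks
digits   = Pat Span( "0123456789" );
waferPat = "wafer=" + (digits >> wnum);
sitePat  = ",site=" + (digits >> snum);

// The pieces are concatenated with + just like inline patterns
Pat Match( str, waferPat + sitePat );

Show( wnum, snum );
```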

Pat Any( string )

This function is similar to Pat Span in that it specifies a string of allowable characters.  The Pat Any function, however, restricts the match to a single character.
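A minimal sketch, reusing the wafer string from the Pat Span discussion with Pat Any swapped in (the variable names are assumptions):

```jsl
str = "wafer=10,site=1";

// Pat Any matches exactly one character from the set, so only the
// first digit after "wafer=" is captured
Pat Match( str, "wafer=" + (Pat Any( "0123456789" ) >> wnum) );

Show( wnum );
```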

The above code results in a value of “1” for wnum.  Clearly it is inappropriate for this example, but it serves as a comparison with the earlier Pat Span example.

In essence Pat Any combines the character set of Pat Span with the ability of Pat Len to constrain the length, here to a single character.

Pat Altern

Sometimes a single pattern isn’t sufficient to define a generalised rule for a match:

This code example generates the following results:

The user-defined function Get Value works correctly for all but the first field.  The function uses the following pattern:

For the first field this doesn’t work; it requires an alternate pattern:

The Pat Altern function generates a pattern that matches any one of the supplied patterns:
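A minimal sketch of Pat Altern on its own, using a deliberately simple pattern rather than the Get Value function; the field names and the variable val are assumptions made for illustration:

```jsl
str = "site=1";
digits = Pat Span( "0123456789" );

// Accept either literal prefix, then capture the digits that follow
rule = Pat Altern( "wafer=", "site=" ) + (digits >> val);

isMatch = Pat Match( str, rule );
Show( isMatch, val );
```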

The function is performing a logical-OR on the patterns and can be abbreviated with the OR operator:

In this example, rather than creating entire patterns for the two matches, the OR could be used simply to make the rule pattern more flexible with respect to the initial comma:

Pat Repeat( pattern )

Here is an example of using the Pat Repeat function, based loosely on the example from the online help:
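A minimal sketch along these lines (not the help example itself), assuming the optional minimum, maximum and GREEDY arguments of Pat Repeat and a hypothetical input string:

```jsl
str = "xaaaay";

// Match between 2 and 3 consecutive "a" characters; GREEDY tries the
// larger count first and works back toward the minimum
Pat Match( str, Pat Repeat( "a", 2, 3, GREEDY ) >> run );

Show( run );
```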

This generates:

In this example the argument to the Pat Repeat function is a string, but in the general case it is a pattern:

This identifies the substring containing the data fields:

Pat Pos( position )

Associated with pattern matching is the concept of a cursor position.  Pattern matching is performed relative to the cursor position, which by default is at the start of the string.  The Pat Pos function can be used to manipulate the cursor position, or to report its value.
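A minimal sketch, assuming Pat Pos( n ) succeeds only when the cursor is n characters from the start of the string, and assuming a hypothetical input string “abcdefg”:

```jsl
str = "abcdefg";

// Pat Pos( 2 ) holds the match until two characters have been passed,
// so Pat Len( 3 ) then consumes the third to fifth characters
Pat Match( str, Pat Pos( 2 ) + (Pat Len( 3 ) >> var) );

Show( var );
```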

In the above example var contains “cde”.  Notice that pattern matching starts at the character following the cursor position (you might have expected the variable to contain “bcd”).

Once the pattern match has completed, where is the cursor?  This information can be revealed using the function without an argument:

The variable pos has the value 5.

In the following example the position is adjusted on each iteration of the loop:

This generates an array arr with the following key/value pairs:

Pat Test( expression )

Let’s say I have the following string:

“catch a catnapping cat in a catsup factory”

and I would like to transform it into:

“catch a catnapping dog in a catsup factory”

This is based on the example code found in the JMP online help.

To replace “cat” with “dog” I can take advantage of the third optional argument associated with the Pat Match function:
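A minimal sketch of the replacement form of Pat Match (the variable names str and isMatch are assumptions):

```jsl
str = "catch a catnapping cat in a catsup factory";

// Third argument: replacement text for the matched portion of str.
// With no further constraint, the first "cat" (in "catch") is replaced.
isMatch = Pat Match( str, "cat", "dog" );

Show( isMatch, str );
```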

This gives me:

The challenge for pattern matching is that I want to locate “cat” but there are multiple instances of the text “cat” within the string.

I want to locate the third instance.

I’d like my pattern matching to “work” only on the third instance of a successful match.  The Pat Test function allows me to do this.  It allows me to evaluate an expression each time a match is found, and it gives me the ability to “disallow” the match.  So let’s take a look at how it works:
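A minimal sketch of Pat Test used as a gate on the match, assuming a variable testValue that holds 1 (allow) or 0 (disallow):

```jsl
testValue = 1;
str = "catch a catnapping cat in a catsup factory";

// Pat Test matches the null string when its expression is nonzero and
// fails otherwise, so it accepts or rejects the preceding "cat"
isMatch = Pat Match( str, "cat" + Pat Test( testValue ), "dog" );

Show( isMatch, str );
```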

The above code behaves identically to the first example.  Pat Test is always true.  Change the assignment of testValue from 1 to 0 and Pat Test will always be false.  This “disables” a successful match and isMatch is false.  In fact what happens is this:

  1. The first match for cat is located.  But Pat Test is false so the match is discarded, and the pattern matching is re-evaluated.
  2. The second match for cat is located.  But Pat Test is false so the match is discarded, and the pattern matching is re-evaluated.
  3. The third match for cat is located.  But Pat Test is false so the match is discarded, and the pattern matching is re-evaluated.
  4. The fourth match for cat is located.  But Pat Test is false so the match is discarded, and the pattern matching is re-evaluated.
  5. No more matches are found, so isMatch is set to zero.

Instead of using testValue, a JSL expression can be used that turns Pat Test on or off:

The expression testValue will evaluate to zero unless nCats is 3.
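A minimal sketch in the spirit of the help example, assuming a counter nCats that is incremented each time “cat” is found:

```jsl
nCats = 0;
str = "catch a catnapping cat in a catsup factory";

isMatch = Pat Match(
	str,
	"cat" + Pat Test(
		nCats = nCats + 1;        // count each candidate match
		testValue = (nCats == 3); // nonzero only for the third "cat"
		testValue                 // value of the whole expression
	),
	"dog"
);

Show( isMatch, str, nCats );
```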

Success! This achieves the correct substitution:

In the following example Pat Test is defined so that it always returns false:

This might seem a strange thing to do.  But recall that a false value forces a re-evaluation of the string.  Each re-evaluation shifts the cursor position until the pattern matching is exhausted.

The following code modification allows the number of matches to be counted:
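A minimal sketch of the counting idea, assuming a hypothetical input string containing five “=” characters (the original input is not shown here):

```jsl
// Hypothetical input with five "=" characters
str = "a=1,b=2,c=3,d=4,e=5";
n = 0;

// The expression increments n and then yields 0, so every candidate
// match is rejected and the scan moves on to the next one
isMatch = Pat Match( str, "=" + Pat Test( n = n + 1; 0 ) );

Show( isMatch, n );
```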

Once the pattern matching is performed n takes the value 5.  Why stop there?

This generates an array arr with the following key/value pairs:

This performs the same task as the last example I used in the section on Pat Pos.  But this revised code doesn’t require an explicit loop for iteration.

Pat Break( string )

Pat Break matches characters up to the first occurrence of a character in the string argument.  For example:
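A minimal sketch, assuming a hypothetical source string “abcdefgh” and a break character of “f”:

```jsl
src = "abcdefgh";

// Match everything up to, but not including, the first "f"
Pat Match( src, Pat Break( "f" ) >> str );

Show( str );
```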

The variable str contains the string “abcde”.

If the string argument contains more than one character then the match stops at whichever of those characters is found first.  For example, if the argument in the above example were “cf” then str would be “ab”.

Note that the match does not include the character in the matching string.

The following example retrieves the first line from a text file:
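A minimal sketch of the idea, using an in-memory string as a stand-in for the file contents (for a real file, Load Text File would supply the text; the sample text and variable names are assumptions):

```jsl
// Stand-in for file contents; "\!n" is the JSL escape for a linefeed
text = "first line\!nsecond line\!nthird line";

// Break on either carriage return or linefeed to cope with both
// styles of line ending
Pat Match( text, Pat Break( "\!r\!n" ) >> firstLine );

Show( firstLine );
```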

3 thoughts on “More Pattern Matching”

    1. That’s a very good question. For a long time I couldn’t see a use for it but recently I had a challenging problem with the speed of importing text files. It was taking about 3 seconds to read a file – not a problem for just one file but I had to read about 50, so that converted to a wait of 2 to 3 minutes. Using pattern matching I got the time down from minutes to seconds and Pat Fence was an essential element of the solution. I’ll see if I can write a post to illustrate the logic.
