Basic Pattern Matching

Pattern matching is a powerful technique for searching text strings, extracting the parts you need and manipulating them. In this post I will illustrate some of the basic principles of pattern matching in JSL. More advanced scenarios will follow.

Pattern Matching Variables

Let’s say I have the following text string:

Such a string (but probably much larger!) might be derived from loading data from an external file (using the function Load Text File).

Let’s say I want to extract the name of the equipment (“e1”) from this text string. I can locate the value by noting that it is preceded by the text “eqpt=” and is followed by a comma. If I assume that these delimiters are the same for all text strings then I can create a JSL pattern variable:

Pat Arb is a pattern matching function that matches an arbitrary run of text (in this example that text is “e1”). JSL ignores spaces in function names, so PatArb and Pat Arb refer to the same function.

Pattern Matching

Now that I have described the pattern that I am interested in I can ask whether my text string contains the pattern:

isMatch will have the value 1 (true) if the match is found, otherwise it will be 0 (false).
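As a minimal sketch of the two steps so far, assuming a sample string in which only the “eqpt=e1,” fragment comes from the text above (the lot entry is hypothetical), the pattern and the match test might look like this:

```jsl
// Hypothetical sample string: only "eqpt=e1," is taken from the post
str = "eqpt=e1,lot=L001";

// The literal "eqpt=", then any run of text, then a comma
pat = "eqpt=" + Pat Arb() + ",";

// Pat Match returns 1 if the pattern is found anywhere in str, otherwise 0
isMatch = Pat Match( str, pat );
Show( isMatch );
```

Because the match is unanchored by default, the pattern does not need to start at the beginning of the string.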

Extracting Matched Patterns

Knowing that I have found a match is useful but it doesn’t tell me the value of the match – remember, I want to find the name of the equipment (“e1” in this example).

The information I am interested in corresponds to the arbitrary text string identified by the Pat Arb function.  I can ask JMP to store the matched text in a variable using the following notation:

PatArb() >> variableName

Using this notation my pattern variable becomes:

Now when I apply the pattern matching I can identify the piece of equipment:

If you run this code you should see in the JMP log window that the variable eqptName has been assigned the value “e1”.
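A minimal sketch of the capturing version, again assuming a hypothetical sample string that contains the “eqpt=e1,” fragment. The parentheses around the capture are not required but make the grouping explicit:

```jsl
// Hypothetical sample string: only "eqpt=e1," is taken from the post
str = "eqpt=e1,lot=L001";

// >> stores the text matched by Pat Arb() in eqptName,
// but only once the whole pattern has matched
pat = "eqpt=" + (Pat Arb() >> eqptName) + ",";

If( Pat Match( str, pat ),
	Show( eqptName ),
	Print( "No equipment name found" )
);
```

Testing the return value before using eqptName avoids reading a variable that was never assigned when the string has no “eqpt=” entry.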

Extracting a Data Assignment

Let’s take a look at another example.  In addition to the header section there is also a data section:

In particular it contains assignments with the following pattern:

variableName = value

I want to extract this information.  Let’s try the simplest pattern definition that we could use to define this:

To apply the pattern I use this code:

This correctly identifies the variable name “a” but the assigned value is “10,b=20,c=30,d=40,e=50”.  It’s taken everything to the right of the equals sign.  I can fix that by saying that the value is delimited by a comma:
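One way to sketch the comma-delimited pattern is shown here. The sample string and the “data:” marker that introduces the data section are assumptions made for illustration; the real file format may mark the section differently:

```jsl
// Hypothetical sample string; the "data:" marker is an assumption
str = "eqpt=e1,data:a=10,b=20,c=30,d=40,e=50";

// The name runs up to "=", and the value runs up to the next comma
pat = "data:" + (Pat Arb() >> varName) + "=" + (Pat Arb() >> varValue) + ",";

If( Pat Match( str, pat ),
	Show( varName, varValue )
);
```

Pat Arb tries the shortest possible match first, so the value stops at the first comma it can reach rather than running to the end of the string.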

This works; my log window contains the following output:

Or at least, it works in the sense that I have extracted the first variable.  If I wanted the next variable I would need a different pattern:

Writing a separate pattern for each variable isn’t a viable solution.  I want to define a single pattern and use it iteratively.

Iteration with String Replacement

One way of doing this is to apply a pattern to the string, then throw away the part of the string that matched the pattern.  Then I apply the pattern again.  This is easier to understand by example.

First I am going to simplify the problem by extracting just the data component from the string:

The variable strdata now contains the string:

 “a=10,b=20,c=30,d=40,e=50”
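A sketch of how the data component could be isolated, assuming (hypothetically) that the data section is introduced by a “data:” marker and runs to the end of the string:

```jsl
// Hypothetical sample string; the "data:" marker is an assumption
str = "eqpt=e1,data:a=10,b=20,c=30,d=40,e=50";

// Pat Rem() matches everything from the current position to the end of the string
If( Pat Match( str, "data:" + (Pat Rem() >> strData) ),
	Show( strData )
);
```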

I want to create a pattern to pick out the first variable assignment:

This new pattern can be applied to the data string:

This successfully extracts the first name and value:

Now I want to throw the first part of the data string away, so that I can apply the pattern to the subsequent text.  The Pat Match function has an optional third argument.  This argument defines some replacement text.

If a pattern match is found, then the text that matched the pattern is replaced with the replacement text.  If I want to throw away the text that matched the pattern I can use a null string.
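A minimal sketch of the replacement form, using the data string shown earlier:

```jsl
strData = "a=10,b=20,c=30,d=40,e=50";
pat = (Pat Arb() >> varName) + "=" + (Pat Arb() >> varValue) + ",";

// The third argument replaces the matched text ("a=10,") with an empty string.
// The first argument must be a variable, because Pat Match modifies it in place.
Pat Match( strData, pat, "" );
Show( varName, varValue, strData );
```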

The log window contains the following output:

Notice that the data string now starts with my next variable assignment.  Now I just need to re-apply my pattern.  I can do this using a While loop.  Also, since I am now creating multiple variable name/value pairs, it is more convenient to place them in an associative array:
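The loop might be sketched as follows; the name aa for the associative array is my own choice:

```jsl
strData = "a=10,b=20,c=30,d=40,e=50";
pat = (Pat Arb() >> varName) + "=" + (Pat Arb() >> varValue) + ",";

aa = Associative Array();

// Each successful match removes one "name=value," prefix from strData.
// The loop ends when the pattern no longer matches.
While( Pat Match( strData, pat, "" ),
	aa[varName] = varValue
);
Show( aa, strData );
```

Every successful match shortens strData, so the loop is guaranteed to terminate.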

My log window looks like this:

I’ve successfully iterated through the text string creating pairs of assignments stored in an associative array.

There is one problem to deal with. It hasn’t picked out the last assignment (“e=50”). My pattern explicitly states that the assignment is followed by a comma, and that condition is not satisfied for the last assignment. I’m going to take a pragmatic approach and simply append a comma to the end of my data string. Here is my final code:
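A sketch of the complete approach, starting from the data string and using the same illustrative variable names as before:

```jsl
strData = "a=10,b=20,c=30,d=40,e=50";

// Append a comma so that the last assignment is also comma-terminated
strData = strData || ",";

pat = (Pat Arb() >> varName) + "=" + (Pat Arb() >> varValue) + ",";
aa = Associative Array();

// Consume one "name=value," pair per iteration until nothing is left to match
While( Pat Match( strData, pat, "" ),
	aa[varName] = varValue
);
Show( aa );
```

The values are stored as strings; if numbers are needed, Num( varValue ) can be used when storing each value.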

 

 
