Scripting Table Subsets

The best way to script table manipulation tasks such as joins and subsets is to first perform the task interactively and then make a copy of the source JSL that is automatically generated by JMP.  In many instances this code is sufficient, but sometimes you need to make the code more general, and that’s where things can get tricky.

In this post I will take you through the process of transforming the JMP-generated code into a more flexible piece of JSL.

So What’s The Problem?

I want to make a copy of a table, but to only copy specific columns.  I can do that in JMP using Table>Subset.  The resultant output table has a table script source attached to it:

This script works fine against my existing table Demo Data, but will not necessarily run correctly against other tables that I am working with.  The reason is that different tables will contain different variables names: the column Glucose is specific to this table.  Other tables will contain a different variable name.  So I need to generalise the code to work with a variety of different tables and columns names.

Step 1: Referencing the Source Table

The JMP-generated code hard-codes the name of the table acting as the source data table.  In practice, my script will open the data table and return a table reference dt.  For the purposes of testing, I will specify the table reference as the currently active data table.  I’ll also take the opportunity to assign a table reference (dtSub) to the new output table.

Step 2: Handling Column References

This is the important step.  Specifically I have a hard-coded reference to the column Glucose; I want to change the code so that the name of the column is stored in a variable var.

In most languages you would do the following:

However, this code will fail.  The JMP log window will contain the message Column not found in access or evaluation of ‘Bad Argument’.

The solution is to explicitly evaluate the variable by replacing var with Eval(var);

Step 3: Generating the Variable

In the previous step I verified that the code successfully runs using a variable to store the column name.  But the value is still hard-coded.  Now I can derive the value from the actual table.  For my data I know it will be the last column:

Step 4: Being Thorough

The script will fail if the columns Bioreactor, Age and Time do not exist.  If I am in a controlled environment, or using the script just for myself then almost certainly consider that to be an unlikely risk and not worry about it.  However, if I want to generate “production quality” code then I could be more rigorous and explicitly check the for existence of these columns.

Here I am taking advantage of two user-defined functions: Missing Columns and Assert.

An Alternative to Step 3

In step 3 I used Eval(var) to substitute an individual column name.  In practice I typically use a more general approach which is to use a variable for the entire list of columns.

First I rewrite the code to verify that the column specifications can be replaced with a list:

Now I am free to perform whatever tasks are necessary to generate the items within the list colList; for example:

This might seem overly complicated.  But the benefit is that I have totally decoupled the code that performs the table subset from the code that identifies the source columns.

Share the joy:

One thought on “Scripting Table Subsets”

Leave a Reply