Scripting Table Subsets

The best way to script table manipulation tasks such as joins and subsets is to first perform the task interactively and then make a copy of the source JSL that is automatically generated by JMP. In many instances this code is sufficient, but sometimes you need to make the code more general, and that’s where things can get tricky.

In this post I will take you through the process of transforming the JMP-generated code into a more flexible piece of JSL.

So What’s The Problem?

I want to make a copy of a table, but to only copy specific columns. I can do that in JMP using Table>Subset. The resultant output table has a table script source attached to it:

Data Table( "Demo Data" ) << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( :Bioreactor, :Age, :Time, :Glucose )
)

Data Table( "Demo Data" ) << Subset(

Output Table( "My Subset" ),

All rows,

columns( :Bioreactor, :Age, :Time, :Glucose )

)

This script works fine against my existing table Demo Data, but will not necessarily run correctly against other tables that I am working with. The reason is that different tables will contain different variables names: the column Glucose is specific to this table. Other tables will contain a different variable name. So I need to generalise the code to work with a variety of different tables and columns names.

Step 1: Referencing the Source Table

The JMP-generated code hard-codes the name of the table acting as the source data table. In practice, my script will open the data table and return a table reference dt. For the purposes of testing, I will specify the table reference as the currently active data table. I’ll also take the opportunity to assign a table reference (dtSub) to the new output table.

dt = Current Data Table();
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( :Bioreactor, :Age, :Time, :Glucose )
)

dt = Current Data Table();

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( :Bioreactor, :Age, :Time, :Glucose )

)

Step 2: Handling Column References

This is the important step. Specifically I have a hard-coded reference to the column Glucose; I want to change the code so that the name of the column is stored in a variable var.

In most languages you would do the following:

dt = Current Data Table();
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( :Bioreactor, :Age, :Time, var )
)

dt = Current Data Table();

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( :Bioreactor, :Age, :Time, var )

)

However, this code will fail. The JMP log window will contain the message Column not found in access or evaluation of ‘Bad Argument’.

The solution is to explicitly evaluate the variable by replacing var with Eval(var);

dt = Current Data Table();
var = "Glucose";
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( :Bioreactor, :Age, :Time, Eval(var) )
)

dt = Current Data Table();

var = "Glucose";

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( :Bioreactor, :Age, :Time, Eval(var) )

)

Step 3: Generating the Variable

In the previous step I verified that the code successfully runs using a variable to store the column name. But the value is still hard-coded. Now I can derive the value from the actual table. For my data I know it will be the last column:

dt = Current Data Table();
var = Column(dt,NCols(dt)) << Get Name;
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( :Bioreactor, :Age, :Time, Eval(var) )
)

dt = Current Data Table();

var = Column(dt,NCols(dt)) << Get Name;

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( :Bioreactor, :Age, :Time, Eval(var) )

)

Step 4: Being Thorough

The script will fail if the columns Bioreactor, Age and Time do not exist. If I am in a controlled environment, or using the script just for myself then almost certainly consider that to be an unlikely risk and not worry about it. However, if I want to generate “production quality” code then I could be more rigorous and explicitly check the for existence of these columns.

dt = Current Data Table();
lstMissing = Missing Columns(dt,"Bioreactor","Age","Time");
Assert(NItems(lstMissing)==0,
	"One or more required columns are missing"
);
var = Column(dt,NCols(dt)) << Get Name;
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( :Bioreactor, :Age, :Time, Eval(var) )
)

dt = Current Data Table();

lstMissing = Missing Columns(dt,"Bioreactor","Age","Time");

Assert(NItems(lstMissing)==0,

"One or more required columns are missing"

);

var = Column(dt,NCols(dt)) << Get Name;

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( :Bioreactor, :Age, :Time, Eval(var) )

)

Here I am taking advantage of two user-defined functions: Missing Columns and Assert.

An Alternative to Step 3

In step 3 I used Eval(var) to substitute an individual column name. In practice I typically use a more general approach which is to use a variable for the entire list of columns.

First I rewrite the code to verify that the column specifications can be replaced with a list:

dt = Current Data Table();
colList = {"Bioreactor","Age","Time","Glucose"};
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( colList )
)

dt = Current Data Table();

colList = {"Bioreactor","Age","Time","Glucose"};

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( colList )

)

Now I am free to perform whatever tasks are necessary to generate the items within the list colList; for example:

dt = Current Data Table();
colList = {"Bioreactor","Age","Time"};
var = Column(dt,NCols(dt)) << Get Name;
InsertInto(colList ,var);
dtSub = dt << Subset(
	Output Table( "My Subset" ),
	All rows,
	columns( colList )
)

dt = Current Data Table();

colList = {"Bioreactor","Age","Time"};

var = Column(dt,NCols(dt)) << Get Name;

InsertInto(colList ,var);

dtSub = dt << Subset(

Output Table( "My Subset" ),

All rows,

columns( colList )

)

This might seem overly complicated. But the benefit is that I have totally decoupled the code that performs the table subset from the code that identifies the source columns.

2 thoughts on “Scripting Table Subsets”

Mitch says:

February 14, 2017 at 2:02 pm

Interessante Website. Das Design und die nuetzlichen Informationen gefallen mir besonders gut.

Lew says:

July 11, 2019 at 10:31 am

Maybe give an example of using this method where rows are filtered as well?

JMP Ahead

Scripting Table Subsets

So What’s The Problem?

Step 1: Referencing the Source Table

Step 2: Handling Column References

Step 3: Generating the Variable

Step 4: Being Thorough

An Alternative to Step 3

2 thoughts on “Scripting Table Subsets”

Leave a Reply Cancel reply

Insights in the use of JMP and the scripting language JSL