The best way to script table manipulation tasks such as joins and subsets is to first perform the task interactively and then make a copy of the source JSL that is automatically generated by JMP. In many instances this code is sufficient, but sometimes you need to make the code more general, and that’s where things can get tricky.
In this post I will take you through the process of transforming the JMP-generated code into a more flexible piece of JSL.
So What’s The Problem?
I want to make a copy of a table, but to only copy specific columns. I can do that in JMP using Table>Subset. The resultant output table has a table script source attached to it:
1 2 3 4 5 |
Data Table( "Demo Data" ) << Subset( Output Table( "My Subset" ), All rows, columns( :Bioreactor, :Age, :Time, :Glucose ) ) |
This script works fine against my existing table Demo Data, but will not necessarily run correctly against other tables that I am working with. The reason is that different tables will contain different variables names: the column Glucose is specific to this table. Other tables will contain a different variable name. So I need to generalise the code to work with a variety of different tables and columns names.
Step 1: Referencing the Source Table
The JMP-generated code hard-codes the name of the table acting as the source data table. In practice, my script will open the data table and return a table reference dt. For the purposes of testing, I will specify the table reference as the currently active data table. I’ll also take the opportunity to assign a table reference (dtSub) to the new output table.
1 2 3 4 5 6 |
dt = Current Data Table(); dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( :Bioreactor, :Age, :Time, :Glucose ) ) |
Step 2: Handling Column References
This is the important step. Specifically I have a hard-coded reference to the column Glucose; I want to change the code so that the name of the column is stored in a variable var.
In most languages you would do the following:
1 2 3 4 5 6 |
dt = Current Data Table(); dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( :Bioreactor, :Age, :Time, var ) ) |
However, this code will fail. The JMP log window will contain the message Column not found in access or evaluation of ‘Bad Argument’.
The solution is to explicitly evaluate the variable by replacing var with Eval(var);
1 2 3 4 5 6 7 |
dt = Current Data Table(); var = "Glucose"; dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( :Bioreactor, :Age, :Time, Eval(var) ) ) |
Step 3: Generating the Variable
In the previous step I verified that the code successfully runs using a variable to store the column name. But the value is still hard-coded. Now I can derive the value from the actual table. For my data I know it will be the last column:
1 2 3 4 5 6 7 |
dt = Current Data Table(); var = Column(dt,NCols(dt)) << Get Name; dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( :Bioreactor, :Age, :Time, Eval(var) ) ) |
Step 4: Being Thorough
The script will fail if the columns Bioreactor, Age and Time do not exist. If I am in a controlled environment, or using the script just for myself then almost certainly consider that to be an unlikely risk and not worry about it. However, if I want to generate “production quality” code then I could be more rigorous and explicitly check the for existence of these columns.
1 2 3 4 5 6 7 8 9 10 11 |
dt = Current Data Table(); lstMissing = Missing Columns(dt,"Bioreactor","Age","Time"); Assert(NItems(lstMissing)==0, "One or more required columns are missing" ); var = Column(dt,NCols(dt)) << Get Name; dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( :Bioreactor, :Age, :Time, Eval(var) ) ) |
Here I am taking advantage of two user-defined functions: Missing Columns and Assert.
An Alternative to Step 3
In step 3 I used Eval(var) to substitute an individual column name. In practice I typically use a more general approach which is to use a variable for the entire list of columns.
First I rewrite the code to verify that the column specifications can be replaced with a list:
1 2 3 4 5 6 7 |
dt = Current Data Table(); colList = {"Bioreactor","Age","Time","Glucose"}; dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( colList ) ) |
Now I am free to perform whatever tasks are necessary to generate the items within the list colList; for example:
1 2 3 4 5 6 7 8 9 |
dt = Current Data Table(); colList = {"Bioreactor","Age","Time"}; var = Column(dt,NCols(dt)) << Get Name; InsertInto(colList ,var); dtSub = dt << Subset( Output Table( "My Subset" ), All rows, columns( colList ) ) |
This might seem overly complicated. But the benefit is that I have totally decoupled the code that performs the table subset from the code that identifies the source columns.
Interessante Website. Das Design und die nuetzlichen Informationen gefallen mir besonders gut.
Maybe give an example of using this method where rows are filtered as well?