It turns out that the prediction profiler has a secret, and not just some Easter egg feature that is a bit of fun. This secret is core to how you use the profiler, and it might just change how you use it in future.
In my previous post I introduced the sample data table Pet Survey. I created a column formula to classify each respondent according to whether they owned a cat, a dog, or both. Even in this simple example, there were signs of the problems that arise when processing unstructured text data. My classification of “dog” missed responses referring to huskies; my classification of “cat” incorrectly included references to cattle. I looked at the Text Explorer platform and focused on the output contained in the lists of terms and phrases. In this post I want to focus on workflow: using the functionality within the Text Explorer platform to gain meaningful insights into my data and to answer specific questions.
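The “cat” versus “cattle” problem is easy to reproduce outside JMP. Here is a minimal sketch in Python (not the JSL column formula from the post), assuming a handful of invented responses and a deliberately short, hypothetical list of dog and cat terms. It matches whole words only, so “cattle” no longer counts as a cat and “husky” is picked up as a dog.

```python
import re

# Hypothetical survey responses, invented purely for illustration.
responses = [
    "I have two cats",
    "We keep cattle on the farm",
    "My husky loves snow",
    "A dog and a cat",
]

# Whole-word patterns: \b stops "cat" from matching inside "cattle".
# The term lists are assumptions; real data would need far longer lists.
CAT = re.compile(r"\b(cat|cats|kitten|kittens)\b", re.IGNORECASE)
DOG = re.compile(r"\b(dog|dogs|husky|huskies)\b", re.IGNORECASE)


def classify(text):
    """Return 'cat', 'dog', 'both' or 'neither' for one free-text response."""
    has_cat = bool(CAT.search(text))
    has_dog = bool(DOG.search(text))
    if has_cat and has_dog:
        return "both"
    if has_cat:
        return "cat"
    if has_dog:
        return "dog"
    return "neither"


labels = [classify(r) for r in responses]
```

Even this version only knows the breeds I thought to list, which is exactly the kind of gap that a term list from a text-analysis tool helps to expose.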
Traditionally, statistical methods have focused on numerical data, perhaps partitioned by classification data. A classic example would be one-way analysis of variance, or multiple linear regression containing classification variables that have been internally coded as integer values.
JSL is often described as a scripting language. Personally I think that doesn’t do it justice. I prefer to think of it as a programming language. The difference? For me an obvious difference is that instead of using hard-coded values I want to use variables. In particular I want to use variables to handle column references.
It’s been a while – so, since it’s Friday, here is a collection of Friday’s Functions … some of my favourite user-defined functions.
The JMP website has introduced a similar theme, JSL Cookbook, so I will probably change the tag associated with these posts to JSL Cookbook instead of Friday’s Functions.
Most of the work involved in building a predictive model goes into either performance tuning or data preparation.
I’m almost halfway through prepping some data. It’s not necessary to script this, but a script allows me to adjust the data preparation in the future and, more importantly, to document the sequence of steps that I have taken.
I’m sure there is a more technically correct term for this: I use the phrase segmented regression to describe the process whereby I select a segment of data within a curve and build a regression model for just that segment.
I have some code to aid the process. The code illustrates how to perform regression on-the-fly as well as how to utilise the MouseTrap function to handle mouse movement events.
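To make the idea of fitting just one segment concrete, here is a minimal sketch in Python rather than JSL. It is not the code referred to above and it leaves out the mouse interaction entirely: the segment is chosen by two hypothetical bounds, `x_min` and `x_max`, and the data are invented. The fit is an ordinary least-squares straight line using only the points inside those bounds.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_segment(xs, ys, x_min, x_max):
    """Fit a line using only the points whose x lies in [x_min, x_max]."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x_min <= x <= x_max]
    if len(pairs) < 2:
        raise ValueError("need at least two points in the segment")
    seg_x, seg_y = zip(*pairs)
    return fit_line(seg_x, seg_y)


# Invented curve: flat up to x = 4, then rising with slope 2.
xs = list(range(10))
ys = [1.0 if x <= 4 else 1.0 + 2.0 * (x - 4) for x in xs]

# Regression on the rising segment only.
intercept, slope = fit_segment(xs, ys, 5, 9)
```

In an interactive version, the two bounds would come from wherever the mouse is dragged, and the fit would be recomputed on each movement.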