In this post I will walk through some of the common tasks that are undertaken when we process unstructured text-based data. This will also give me the opportunity to introduce the terminology associated with text processing.
Traditionally, statistical methods have focused on numerical data, perhaps partitioned by classification data. A classic example would be one-way analysis of variance, or multiple linear regression containing classification variables that have been internally coded as integer values.
JSL is often described as a scripting language. Personally I think that doesn’t do it justice. I prefer to think of it as a programming language. The difference? For me an obvious difference is that instead of using hard-coded values I want to use variables. In particular I want to use variables to handle column references.
It’s been a while – so, since it’s Friday, here is a collection of Friday’s Functions … some of my favourite user-defined functions.
The JMP website has introduced a similar theme, the JSL Cookbook, so I will probably change the tag on these posts from Friday’s Functions to JSL Cookbook.
Most of the work in building a predictive model goes into either performance tuning or data preparation.
I’m almost halfway through prepping some data. It’s not necessary to script this, but a script allows me to adjust the data preparation in the future and, more importantly, to document the sequence of steps that I have taken.
I was recently asked a question about updating display boxes. Display boxes are the building blocks of JMP output windows. Fundamentally there are two methods of updating these display boxes, which I will take a closer look at.
I’m sure there is a more technically correct term for this: I use the phrase segmented regression to describe the process whereby I select a segment of data within a curve and build a regression model for just that segment.
I have some code to aid the process. The code illustrates how to perform regression on-the-fly as well as how to utilise the MouseTrap function to handle mouse movement events.
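As a rough illustration of the idea (in Python rather than JSL, and not the code referred to above), here is a minimal sketch of fitting a regression to just one segment of a curve. The function names `fit_line` and `fit_segment`, the segment bounds, and the sample data are all hypothetical; the interactive, mouse-driven selection is replaced here by a fixed x-range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_segment(xs, ys, x_min, x_max):
    """Fit a line using only the points whose x lies in [x_min, x_max]."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x_min <= x <= x_max]
    seg_x = [x for x, _ in pairs]
    seg_y = [y for _, y in pairs]
    return fit_line(seg_x, seg_y)


# Hypothetical curve: flat up to x = 4, then rising linearly.
xs = list(range(11))
ys = [1.0] * 5 + [1.0 + 2.0 * (x - 4) for x in range(5, 11)]

# Model only the rising segment, ignoring the flat region.
intercept, slope = fit_segment(xs, ys, 5, 10)
print(f"segment fit: y = {intercept:.2f} + {slope:.2f} * x")
```

Re-running `fit_segment` with different bounds is the non-interactive equivalent of dragging the selection to a new part of the curve.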
An Easter egg is an intentional inside joke, a hidden message, or a secret feature of an interactive work (often a computer program, video game or DVD menu screen). The name is used to evoke the idea of a traditional Easter egg hunt.
The above visualisation is a 3D tree view of a decision tree generated with the Partition platform.
However, if you look under the red triangle hotspot for the platform you won’t find an option to create this output.
I have been investigating the use of logistic regression to model image pixel data. Now I want to take a look at the use of neural networks. In this post I am going to build the simplest possible neural network and compare it against a simple logistic regression.
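One way to read "simplest possible" is a network with a single unit and a logistic (sigmoid) activation, which has the same functional form as a one-predictor logistic regression. That reading is an assumption on my part, not necessarily the network built in the post. The sketch below, in Python rather than JSL, trains such a unit by gradient descent on a small made-up data set; the names `train_single_neuron` and `predict`, the learning rate, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_single_neuron(xs, ys, lr=1.0, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent
    on the mean cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the loss w.r.t. z
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted probability that the label is 1."""
    return sigmoid(w * x + b)


# Hypothetical pixel intensities scaled to 0-1; label 1 means "bright" class.
# The classes overlap slightly so the weights stay finite.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_single_neuron(xs, ys)
for x in (0.1, 0.5, 0.9):
    print(f"x = {x:.1f}  p(bright) = {predict(w, b, x):.3f}")
```

Because the model and the loss are the same as in logistic regression, the fitted `w` and `b` should approach the maximum-likelihood logistic regression estimates; any difference between the two approaches only appears once hidden units or other activations are added.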