Linearly Separable Data

In my last post I outlined some “homework” that I had set myself: to write a script that would create linearly separable data. I want the ability to create it in an interactive environment, but before I add the interactivity I want to get the foundations right. So in this post I will build the code, but without the interactive elements.

In the last post I identified these four key elements to the script:

  1. Data
  2. Graph
  3. Line
  4. Response

Let’s take a look at each of these in turn.

1. Data

The start point is to create a data table containing columns for two input variables (X1 and X2) and a binary response (Y).

Lines 5 & 7
The number of data points in the table has been set as a variable so that it can be easily changed (and ultimately specified by the user).

Lines 6 to 25
This is the New Table function.  If it looks like a lot of code that’s just because I have put each New Column field on a separate line.  It could have been written like this:

– more compact but ultimately less readable.

Lines 12 & 18
The function Random Uniform generates random numbers uniformly from the range 0 to 1.
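
As an aside for readers without JMP to hand, the same idea can be sketched in Python. This is not the JSL from this post, only a minimal illustration: the function name `make_points` and the `seed` argument are hypothetical, and Python's `random()` stands in for Random Uniform.

```python
import random

def make_points(n_points, seed=None):
    """Return n_points (x1, x2) pairs drawn uniformly from the unit square."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    # rng.random() returns a float in [0, 1), i.e. a uniform draw on 0 to 1
    return [(rng.random(), rng.random()) for _ in range(n_points)]

# The number of points is a variable, so it is easy to change later
n_points = 100
points = make_points(n_points, seed=1)
```

Keeping the count in `n_points` mirrors the choice made in the script: one value to edit now, and one value to hand over to the user later.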

Lines 11 & 17
The numbers have been formatted so that they display with two decimal places.  This is purely cosmetic.

Lines 20 to 24
This is the definition of the response column.  The response is binary.  Setting the modelling type to Nominal will assist when I want to colour the data points based on the binary value (it will prevent JMP trying to use a continuous scale for the colour).

Line 23
The Value Colors property of the column can be used to pre-assign colours to the binary levels (-1 and +1) of the response.  To figure out the syntax for this property you can refer to the documentation or get JMP to do it for you:

2. Graph

The above code can be used to create a data table containing 100 rows of X1 and X2 data.   Using Graph Builder a scatterplot of X2 versus X1 can be produced.  Y can be applied to the Color role (even though it doesn’t yet have any data).  The range of the axes can be modified to be from 0 to 1 and the marker sizes can be made as large as possible.  With all of these edits JMP will create the following code (with a couple of exceptions listed below):

– line numbers are based on the full script, so this block of code follows on from the data creation.

Line 26
I have added an explicit reference to the table (dt) and added a reference to the Graph Builder object, gb.

3. Line

Here is the code followed by an explanation:

Lines 41 to 44
The equation of a line can be written as:

X2 = m.X1 + c

Drawing the line only requires the equation to be evaluated at two points (p1 and p2), corresponding to X1 = 0 and X1 = 1 respectively.

p1 = {0,c}  and  p2 = {1, m+c}
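
The two endpoints follow directly from substituting X1 = 0 and X1 = 1 into the equation. A minimal Python sketch of that step (again, not the JSL from this post; `line_endpoints` and the example values of m and c are made up for illustration):

```python
def line_endpoints(m, c):
    """Evaluate X2 = m*X1 + c at X1 = 0 and X1 = 1."""
    p1 = (0.0, c)        # X1 = 0 gives X2 = c
    p2 = (1.0, m + c)    # X1 = 1 gives X2 = m + c
    return p1, p2

# Hypothetical slope and intercept: a line running from (0, 1) down to (1, 0)
p1, p2 = line_endpoints(m=-1.0, c=1.0)
```

Because the axes run from 0 to 1, these two points are enough to span the full width of the graph.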

Line 45
A reference to the Graph Builder window is obtained by sending the Report message to the Graph Builder object.

Line 46
This line obtains a reference to the FrameBox, which is the graphical region of the graph (where the markers are plotted).

Lines 47 to 49
The contents of the FrameBox can be customised by adding graphics scripts.  The code uses the Line function to plot a line between the points p1 and p2.

4. Response

The line defines a boundary.  Data points on or above the boundary need to be assigned a Y value of +1.  Those points satisfy the criterion:

X2 ≥ m.X1 + c

The data points below the line are set to -1.
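
The rule is easy to check by hand. Here is a minimal Python sketch of it, under the same caveat as before: it is an illustration and not the JSL formula used in the script, and the function name `label` and the sample points are hypothetical.

```python
def label(x1, x2, m, c):
    """Return +1 for a point on or above the line X2 = m*X1 + c, else -1."""
    return 1 if x2 >= m * x1 + c else -1

# Hypothetical line and sample points
m, c = -1.0, 1.0
samples = [(0.2, 0.9), (0.5, 0.5), (0.1, 0.3)]
labels = [label(x1, x2, m, c) for x1, x2 in samples]
```

With this line the first point lies above the boundary, the second sits exactly on it (so it also gets +1), and the third lies below.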

Ultimately the definition of the line will be interactive (next post!), so it is helpful if the response is defined as a formula, and the formula is based on externally defined variables.  So I am going to use table variables to store the values of m and c:

The Y column has already been assigned as a Color role on the graph so this formula also results in the markers being colour-coded:


I now have a script which generates a Y column containing binary values that are separated by a straight line of arbitrary orientation.  In my next post I will add “grab-handles” to the line so that the orientation can be defined interactively by the user.

Finally, here is a full listing of the code:
