Linearly Separable Data

In my last post I outlined some “homework” that I had set myself – to write a script that would create linearly separable data. I want the ability to create it in an interactive environment, but before I add the interactivity I want to get the foundations correct. So in this post I will build the code without the interactive elements.

In the last post I identified these four key elements to the script:

Data
Graph
Line
Response

Let’s take a look at each of these in turn.

1. Data

The start point is to create a data table containing columns for two input variables (X1 and X2) and a binary response (Y).

Names Default To Here(1);

// create X1 and X2 data and set colour 
// properties for binary response
nDataPoints = 100;
dt = New Table("Linearly Separable Data",
	Add Rows(nDataPoints),
	New Column("X1", 
		Numeric, 
		Continuous, 
		Format( "Fixed Dec", 10, 2 ),
		Set Formula( Random Uniform() ) 
	),
	New Column("X2", 
		Numeric, 
		Continuous, 
		Format( "Fixed Dec", 10, 2 ),
		Set Formula( Random Uniform() ) 
	),
	New column("Y",
		Numeric, 
		Nominal, 
		Set Property( "Value Colors", {-1 = 3, 1 = 4} )
	)
);

Lines 5 & 7
The number of data points in the table has been set as a variable so that it can be easily changed (and ultimately specified by the user).

Lines 6 to 25
This is the New Table function. If it looks like a lot of code that’s just because I have put each New Column field on a separate line. It could have been written like this:

dt = New Table("Linearly Separable Data",
	Add Rows(nDataPoints),
	New Column("X1", Numeric, Continuous, Format( "Fixed Dec", 10, 2 ),Set Formula( Random Uniform() ) ),
	New Column("X2", Numeric, Continuous, Format( "Fixed Dec", 10, 2 ),Set Formula( Random Uniform() ) ),
	New column("Y",Numeric, Nominal, Set Property( "Value Colors", {-1 = 3, 1 = 4} ))
);

– more compact but ultimately less readable.

Lines 12 & 18
The function Random Uniform generates random numbers uniformly from the range 0 to 1.
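
While experimenting it can be handy to get the same random draws on every run. The following is a minimal sketch of one way to do that in a script, and it is not part of the original code: the seed value 1234 and the variable names n and draws are assumptions chosen purely for illustration.

```jsl
Names Default To Here( 1 );

// Assumption: the seed 1234 is arbitrary and only used for illustration.
Random Reset( 1234 );

// Draw a handful of values; J() evaluates the expression once per element.
n = 5;
draws = J( n, 1, Random Uniform() );
Show( draws );

// Confirm that every draw lies in the unit interval.
Show( Min( draws ) >= 0 & Max( draws ) <= 1 );
```

Running the snippet twice should print the same matrix both times, because the generator is reseeded at the start.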

Lines 11 & 17
The numbers have been formatted so that they display with two decimal places. This is purely cosmetic.

Lines 20 to 24
This is the definition of the response column. The response is binary. Setting the modelling type to Nominal will assist when I want to colour the data points based on the binary value (it will prevent JMP trying to use a continuous scale for the colour).

Line 23
The Value Colors property of the column can be used to pre-assign colours to the binary levels (-1 and +1) of the response. To figure out the syntax for this property you can refer to the documentation or get JMP to do it for you.
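
For reference, the same property can also be set on an existing column and then read back, which is a quick way to check how JMP stores it. This is only a sketch: the table name "Value Colors Demo" and the variable dtDemo are made up for the example, and it relies on JMP's standard colour numbering, in which 3 is red and 4 is green.

```jsl
Names Default To Here( 1 );

// Small throwaway table, used only to demonstrate the property syntax.
dtDemo = New Table( "Value Colors Demo",
	New Column( "Y", Numeric, Nominal, Set Values( [-1, 1, 1, -1] ) )
);

// Assign colour index 3 to the level -1 and colour index 4 to the level 1.
Column( dtDemo, "Y" ) << Set Property( "Value Colors", {-1 = 3, 1 = 4} );

// Read the property back and print it to the log.
Show( Column( dtDemo, "Y" ) << Get Property( "Value Colors" ) );
```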

2. Graph

The above code can be used to create a data table containing 100 rows of X1 and X2 data. Using Graph Builder a scatterplot of X2 versus X1 can be produced. Y can be applied to the Color role (even though it doesn’t yet have any data). The range of the axes can be modified to be from 0 to 1 and the marker sizes can be made as large as possible. With all of these edits JMP will create the following code (with a couple of exceptions listed below):

// create graph of X1 versus X2 with Y assigned as colour
gb = dt << Graph Builder(
	Size( 500, 500 ),
	Show Control Panel( 0 ),
	Variables( X( :X1 ), Y( :X2 ), Color( :Y ) ),
	Elements( Points( X, Y, Legend( 17 ) ) ),
	SendToReport(
		Dispatch( {}, "X1", ScaleBox, {Min(0),Max(1),Inc(1)} ),
		Dispatch( {}, "X2", ScaleBox, {Min(0),Max(1),Inc(1)} ),
		Dispatch( {}, "graph title", TextEditBox, {Set Text( "" )} ),
		Dispatch( {}, "Graph Builder", FrameBox, {Marker Size( 6 )} )
	)
);

– line numbers are based on the full script, so this block of code follows on from the data creation.

Line 26
I have added an explicit reference to the table (dt) and added a reference to the Graph Builder object, gb.

3. Line

Here is the code followed by an explanation:

// draw line: y=mx+c
m = -0.4;
c = 0.7;
p1 = {0,c};
p2 = {1,m+c};
rep = gb << Report;
fb = rep[FrameBox(1)];
fb << Add Graphics Script(
	Line(p1,p2)
);

Lines 41 to 44
The equation of a line can be written as:

$X2 = m.X1 + c$

Drawing the line only requires the line to be evaluated at two points (p1 and p2), corresponding to X1 = 0 and X1 = 1 respectively.

p1 = {0,c} and p2 = {1, m+c}
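
As a quick check with the values used in the script, $m = -0.4$ and $c = 0.7$, the two endpoints work out as:

$p1 = (0,\ c) = (0,\ 0.7)$

$p2 = (1,\ m + c) = (1,\ -0.4 + 0.7) = (1,\ 0.3)$

Both X2 values fall between 0 and 1, so the whole line sits inside the plotted region.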

Line 45
A reference to the Graph Builder window is obtained by sending the Report message to the Graph Builder object.

Line 46
This line obtains a reference to the FrameBox, which is the graphical region of the graph (where the markers are plotted).

Lines 47 to 49
The contents of the FrameBox can be customised by adding graphics scripts. The code uses the Line function to plot a line between the points p1 and p2.
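
A graphics script can hold more than one drawing command, so the appearance of the line can be adjusted too. The sketch below is a standalone illustration rather than part of the post's script: it uses a plain Graph Box window instead of Graph Builder so that it runs on its own, and the window title, the "demo" label, the red colour and the pen size are all assumptions picked for the example.

```jsl
Names Default To Here( 1 );

// Same line definition as in the script: y = mx + c
m = -0.4;
c = 0.7;
p1 = {0, c};
p2 = {1, m + c};

// Standalone graph used only for this illustration (not Graph Builder).
win = New Window( "Line Demo",
	Graph Box(
		Frame Size( 300, 300 ),
		X Scale( 0, 1 ),
		Y Scale( 0, 1 ),
		Text( {0.05, 0.05}, "demo" )
	)
);

// Same pattern as in the post: find the FrameBox, then add a graphics script.
fb = win[FrameBox( 1 )];
fb << Add Graphics Script(
	Pen Color( "red" );  // colour chosen only for illustration
	Pen Size( 2 );       // thicker than the default pen
	Line( p1, p2 )
);
```

The pen settings apply to the drawing commands that follow them within the same graphics script, so they need to come before the call to Line.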

4. Response

The line defines a boundary. Data points above the boundary need to be assigned a Y value of +1. Those points satisfy the criterion:

$X2 > m.X1 + c$

The remaining data points (those on or below the line) are set to -1.
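
To make the rule concrete, here is a small sketch that applies it to a few hand-picked points. The function name classify and the test points are assumptions made for the example; the comparison is the strict "greater than" that the script's column formula uses.

```jsl
Names Default To Here( 1 );

// Line parameters as used in the script
m = -0.4;
c = 0.7;

// Hypothetical helper: returns 1 if the point is above the line, otherwise -1.
classify = Function( {x1, x2},
	If( x2 > m * x1 + c, 1, -1 )
);

// At X1 = 0.5 the line passes through X2 = 0.5
Show( classify( 0.5, 0.6 ) ); // above the line, gives 1
Show( classify( 0.5, 0.4 ) ); // below the line, gives -1
Show( classify( 0, 0.7 ) );   // exactly on the line, gives -1
```

With continuous random inputs a point landing exactly on the line is very unlikely, so the choice between a strict and a non-strict comparison makes little practical difference here.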

Ultimately the definition of the line will be interactive (next post!) and so it is helpful if the response is defined as a formula, and the formula is based on externally defined variables. So I am going to use table variables to store the values of m and c:

// binary response
// 1 if above line, -1 if below
dt << Set Table Variable("m",m);
dt << Set Table Variable("c",c);
Column(dt,"Y") << Set Formula(
	If ( :X2 > m*:X1+c,
		1
	,
		-1
	)
);

The Y column has already been assigned to the Color role on the graph, so this formula also results in the markers being colour-coded.

I now have a script which generates a Y column containing binary values that are separated by a straight line of arbitrary orientation. In my next post I will add “grab-handles” to the line so that the orientation can be defined interactively by the user.

Finally, here is a full listing of the code:

Names Default To Here(1);

// create X1 and X2 data and set colour properties for binary response
nDataPoints = 100;
dt = New Table("Linearly Separable Data",
	Add Rows(nDataPoints),
	New Column("X1", 
		Numeric, 
		Continuous, 
		Format( "Fixed Dec", 10, 2 ),
		Set Formula( Random Uniform() ) 
	),
	New Column("X2", 
		Numeric, 
		Continuous, 
		Format( "Fixed Dec", 10, 2 ),
		Set Formula( Random Uniform() ) 
	),
	New column("Y",
		Numeric, 
		Nominal, 
		Set Property( "Value Colors", {-1 = 3, 1 = 4} )
	)
);
	
// create graph of X1 versus X2 with Y assigned as colour
gb = dt << Graph Builder(
	Size( 500, 500 ),
	Show Control Panel( 0 ),
	Variables( X( :X1 ), Y( :X2 ), Color( :Y ) ),
	Elements( Points( X, Y, Legend( 17 ) ) ),
	SendToReport(
		Dispatch( {}, "X1", ScaleBox, {Min(0),Max(1),Inc(1)} ),
		Dispatch( {}, "X2", ScaleBox, {Min(0),Max(1),Inc(1)} ),
		Dispatch( {}, "graph title", TextEditBox, {Set Text( "" )} ),
		Dispatch( {}, "Graph Builder", FrameBox, {Marker Size( 6 )} )
	)
);

// draw line: y=mx+c
m = -0.4;
c = 0.7;
p1 = {0,c};
p2 = {1,m+c};
rep = gb << Report;
fb = rep[FrameBox(1)];
fb << Add Graphics Script(
	Line(p1,p2)
);

// binary response
// 1 if above line, -1 if below
dt << Set Table Variable("m",m);
dt << Set Table Variable("c",c);
Column(dt,"Y") << Set Formula(
	If ( :X2 > m*:X1+c,
		1
	,
		-1
	)
);

JMP Ahead

Insights in the use of JMP and the scripting language JSL