Segmented Regression

I’m sure there is a more technically correct term for this, but I use the phrase segmented regression to describe the process of selecting a segment of data within a curve and building a regression model for just that segment.


I have written some JSL code to aid the process. It illustrates how to perform regression on the fly, and how to utilise the MouseTrap function to handle mouse events.

I’m going to list the code below with line numbers and do my best to provide a high-level narrative of how it works.

Line 1

Good (best) practice for JSL coding. This statement makes variables members of a namespace local to the script, rather than global throughout JMP. Less well known, perhaps, is that you can reference this namespace explicitly using the prefix Here (as in Here:dt). I use this notation so that I can reference variables across the functions that are defined within the script.

Lines 3 – 5

This script uses a variable dt to reference a data table and the variables xvar and yvar to reference the data columns. A trivial example data table is used, but any data table could be referenced at this point. I reference the columns by their string names; that is just a personal preference.

Lines 6 – 9

An Overlay Plot is constructed to display a scatterplot.  Graph Builder could have been used, but I like the brevity of code associated with the Overlay Plot.

Line 10

JMP distinguishes between the analysis platform (think of objects) and the container that displays the results (think of windows).  This line retrieves a reference to the report window for the overlay plot.

Lines 11 – 13

This is the important code. MouseTrap is used to assign two event handlers. The first handler, which I have named MouseDown, deals with mouse movement events (these events occur continuously whilst the mouse button is held down and the mouse is moving). The second handler, which I have named MouseUp, handles the mouse click event (a click doesn’t occur until you release the mouse button!). Just to be clear, MouseTrap is a JSL function (you can find it in the online help), whereas the two event handlers are my own user-defined functions. I could have written all of this on a single line, but splitting it across lines makes it more readable and doesn’t affect execution time.
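To make the division of labour between the two handlers concrete, here is a small Python analogue. It is a sketch, not JSL: the class DragRecorder and its methods mouse_down and mouse_up are hypothetical names that mirror the roles of MouseDown and MouseUp, and the loop simulates the stream of events that JMP would deliver during a drag.

```python
class DragRecorder:
    """Hypothetical stand-in for the MouseDown/MouseUp handler pair."""

    def __init__(self):
        # Coordinate histories, analogous to the lists initialised on lines 20-21
        self.xs = []
        self.ys = []

    def mouse_down(self, px, py):
        # Called repeatedly while the button is held and the mouse moves:
        # append the latest position to the history
        self.xs.append(px)
        self.ys.append(py)

    def mouse_up(self):
        # Called once on release: clear the history ready for the next drag
        self.xs.clear()
        self.ys.clear()


rec = DragRecorder()
for px, py in [(2.0, 1.0), (3.5, 1.2), (5.0, 1.1)]:  # simulated drag
    rec.mouse_down(px, py)
print(rec.xs)  # [2.0, 3.5, 5.0]
rec.mouse_up()
print(rec.xs)  # []
```

The shared state (the two lists) is what lets the handlers cooperate: one builds it up, the other resets it.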

Lines 14 – 19

Display boxes have been appended to the Overlay Plot report window. The Text Boxes contain brief user instructions. The Outline Box is a standard JMP user interface component for organising content (it appears as a bordered heading with a disclosure triangle).

Lines 20 – 21

As the mouse moves, a sequence of mouse coordinates is generated. The x and y components of these coordinates are stored in the lists initialised on these lines.

Lines 23 – 43

These lines define the function MouseDown. Because it is registered with MouseTrap, this function is called whenever the user clicks, or clicks and drags. Details of the function are explained below.

Line 23

This is the declaration of the MouseDown function. Note that the function takes two arguments, px and py. These are the x and y coordinates of the mouse position, expressed in the graph's axis coordinate system.

Lines 24 – 25

The first thing I do is save the current coordinates to the lists. As the mouse is moved, the lists build up a history of the mouse movements.

Lines 26 – 33

I identify the first and last elements of the list of x-coordinates. These two values define the x range used to select the corresponding data in the data table.
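The selection step can be sketched in Python as a filter on the x values. This is an illustration under two assumptions of mine: that rows on the boundary count as selected, and that a right-to-left drag should work as well as a left-to-right one. The function select_rows is hypothetical; in JMP the equivalent step selects rows in the data table rather than returning a list.

```python
def select_rows(x_values, first_x, last_x):
    """Return 0-based indices of rows whose x lies between first_x and last_x.

    The drag may run right-to-left, so the two endpoints are ordered first.
    """
    lo, hi = min(first_x, last_x), max(first_x, last_x)
    return [i for i, x in enumerate(x_values) if lo <= x <= hi]


xs = [0, 1, 2, 3, 4, 5, 6]
# Dragging from x=4.5 back to x=1.5 selects the rows with x = 2, 3, 4
print(select_rows(xs, 4.5, 1.5))  # [2, 3, 4]
```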

Lines 34 – 39

In addition to selecting the data points, a light gray rectangle is constructed to show the range of the selected data. This rectangle is drawn by adding a graphics script to the Overlay Plot (specifically to the FrameBox, the central graphical area of the plot).
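The geometry of the shaded band is simple enough to sketch. The Python below is a hypothetical helper that returns the band's corners as (left, top, right, bottom). It assumes the band spans the full height of the y axis, which the post does not state; it only says the rectangle shows the range of the selected data.

```python
def selection_rectangle(first_x, last_x, y_min, y_max):
    """Corners (left, top, right, bottom) of the shaded selection band.

    Assumes the band covers the whole y-axis range; only the x extent
    comes from the mouse drag.
    """
    left, right = min(first_x, last_x), max(first_x, last_x)
    return left, y_max, right, y_min


print(selection_rectangle(4.5, 1.5, 0, 10))  # (1.5, 10, 4.5, 0)
```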

Lines 40 – 42

Line 41 performs the segmented regression by making a call to the user-defined function Fit Line Segment.  The call is within a try-block to protect against any unexpected errors that might occur (e.g. trying to fit a line when no data is selected).

Lines 45 – 49

These lines define the function MouseUp. When the user releases the mouse button we no longer want to select data or update the regression line, so this function simply does some housekeeping: clearing the row selection in the table and clearing the lists that contain the saved position coordinates.

Lines 51 – 82

These lines define the function Fit Line Segment.  This function is called by the MouseDown function (line 41) and is responsible for constructing a linear regression using only the data that has been selected.  The regression itself is performed using a matrix formulation that I describe in a separate post.

Lines 52 – 57

The MouseDown function selects data in the data table before calling this function. These lines identify the selected rows and abort if none are selected. The variable sel is a matrix of selected row numbers; it is applied to the data in lines 56 and 57 to build vectors of the data values in the selected rows.
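In Python terms, indexing the columns with the selected row numbers, plus an early exit when nothing is selected, looks roughly like this. The function selected_vectors is a hypothetical illustration, and returning None stands in for the abort.

```python
def selected_vectors(x_col, y_col, selected):
    """Gather the x and y values for the selected row indices.

    Returns None when nothing is selected, mirroring the early abort.
    """
    if not selected:
        return None
    x_vec = [x_col[i] for i in selected]
    y_vec = [y_col[i] for i in selected]
    return x_vec, y_vec


print(selected_vectors([0, 1, 2, 3], [5, 6, 7, 8], [1, 2]))  # ([1, 2], [6, 7])
print(selected_vectors([0, 1], [5, 6], []))  # None
```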

Lines 58 – 59

These lines identify the low and high x values within the selected data.  The values are used to control the range over which the regression line will be drawn.

Lines 60 – 61

Line 60 constructs an X matrix (confusingly called XX!). This matrix, together with a vector of y values (yVector), is used to calculate the parameter coefficients of the least squares regression (line 61, in matrix form).
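For reference, this is the textbook ordinary least squares result in matrix form. I am assuming X is a column of ones alongside the selected x values (an intercept-plus-slope model); the post does not spell out the layout of XX.

$$
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},\qquad
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},\qquad
\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \left(X^\top X\right)^{-1} X^\top \mathbf{y}
$$

Here $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$ is the slope of the fitted segment.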

Lines 62 – 63

Given the low and high values of x (startX and endX from lines 58 and 59) and the equation of the line, the start and end Y values can now be calculated. All the information required to draw the regression line has now been established.
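Here is a runnable Python sketch of the fit and the endpoint calculation together. It is not a transcription of the JSL: rather than inverting X'X with a matrix library, it writes out the same normal equations for the two-parameter case. The function names fit_line and line_endpoints are hypothetical.

```python
def fit_line(x_vec, y_vec):
    """Ordinary least squares for y = b0 + b1*x via the 2x2 normal equations.

    Solves (X'X) beta = X'y with X = [1, x], written out explicitly.
    """
    n = len(x_vec)
    sx = sum(x_vec)
    sy = sum(y_vec)
    sxx = sum(x * x for x in x_vec)
    sxy = sum(x * y for x, y in zip(x_vec, y_vec))
    det = n * sxx - sx * sx  # determinant of X'X; zero if all x are equal
    if det == 0:
        raise ValueError("need at least two distinct x values")
    b1 = (n * sxy - sx * sy) / det  # slope
    b0 = (sy - b1 * sx) / n         # intercept, from n*b0 + sx*b1 = sy
    return b0, b1


def line_endpoints(x_vec, y_vec):
    """Return ((startX, startY), (endX, endY)) for drawing the fitted segment."""
    b0, b1 = fit_line(x_vec, y_vec)
    start_x, end_x = min(x_vec), max(x_vec)  # range over which to draw
    return (start_x, b0 + b1 * start_x), (end_x, b0 + b1 * end_x)


# Points lying exactly on y = 1 + 2x
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))        # (1.0, 2.0)
print(line_endpoints([1, 2, 3, 4], [3, 5, 7, 9]))  # ((1, 3.0), (4, 9.0))
```

Raising an error when every selected x is identical plays the same protective role as the try-block around the call on line 41.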

Lines 64 – 71

This code is responsible for drawing the regression line on the graph.

Lines 72 – 81

Drawing the regression line is really just a way of providing visual feedback to the user. The real goal is to use the regression line to measure the slope. These lines of code display the slope as a message in the top-left of the graph (at the position determined on lines 73 and 74) and also in the status area of the window (line 81).

Lines 84 – 106

These are just utility functions to determine the range of the axes for the plot. They are used in lines 73 and 74 to calculate a position at the top-left of the graph with coordinates {xpos, ypos}.
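Once the axis ranges are known, placing the message near the top-left corner is a small calculation. The Python below is a hypothetical helper. Its inset parameter, a fraction of each axis range, is my assumption, since the post does not say how far from the corner the message is placed.

```python
def top_left_position(x_min, x_max, y_min, y_max, inset=0.05):
    """Return (xpos, ypos) a fraction `inset` inside the top-left corner.

    x is measured in from the left edge, y down from the top edge.
    """
    xpos = x_min + inset * (x_max - x_min)
    ypos = y_max - inset * (y_max - y_min)
    return xpos, ypos


print(top_left_position(0, 10, 0, 20))  # (0.5, 19.0)
```

Working in fractions of the axis range keeps the label in the same visual spot however the axes are scaled.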
