A neuron is a single node within a neural network. By analogy with neurons in the brain, we can think of a neuron “firing” in response to an input trigger, and of machine learning as the process of training the neuron to recognise that trigger.
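To make the analogy concrete, here is a minimal sketch of a single artificial neuron in Python (the weights, bias, and input values are made up purely for illustration): it forms a weighted sum of its inputs, adds a bias, and passes the result through an activation function; the output is the degree to which it “fires”.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # A single neuron: weighted sum of inputs plus a bias,
    # passed through an activation function. The output can
    # be read as how strongly the neuron "fires" for this input.
    return sigmoid(np.dot(w, x) + b)

# Hypothetical weights, bias, and input, for illustration only.
w = np.array([0.8, -0.4, 0.2])
b = -0.1
x = np.array([1.0, 0.5, 2.0])

print(neuron(x, w, b))  # a value near 1 means strong firing
```

Training then amounts to adjusting `w` and `b` until the neuron fires for the inputs it is meant to recognise.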
Most of the work in building a predictive model goes into either performance tuning or data preparation.
I’m almost halfway through prepping some data. It’s not strictly necessary to script this, but a script allows me to adjust the data preparation in the future and, more importantly, to document the sequence of steps I have taken.
In a recent post I created a table containing two classes of data: images representing either the handwritten digit ‘5’ or the digit ‘6’. In this post I’ll model those data using logistic regression. I’ll also take the opportunity to look at the role of training and test datasets, and to highlight the distinction between testing and validation.
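The details of that post aren’t reproduced here, but the sketch below shows the workflow in Python, substituting scikit-learn’s bundled digits dataset for the table built in the earlier post: fit a logistic regression on a training set, then judge it on a held-out test set that it never saw during training.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for the table from the earlier post: pixel data for
# the digits 5 and 6 from scikit-learn's bundled digits dataset.
digits = load_digits()
mask = (digits.target == 5) | (digits.target == 6)
X, y = digits.data[mask], digits.target[mask]

# Hold out a test set; the model never sees it during training,
# so accuracy on it estimates performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("training accuracy:", model.score(X_train, y_train))
print("test accuracy:    ", model.score(X_test, y_test))
```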
In my last post I illustrated the performance boost generated by using matrix operations to conduct least squares regression calculations. Matrices by their nature require numerical data, so what about handling a categorical predictor variable? To do this it’s necessary to create dummy variables – a separate indicator variable for each unique level of the predictor.
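As a quick illustration in Python (the `colour` column below is invented for the example), pandas’ `get_dummies` performs this expansion:

```python
import pandas as pd

# A hypothetical categorical predictor with three levels.
df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One indicator (dummy) column per level; drop_first removes the
# redundant column that would make the design matrix singular.
dummies = pd.get_dummies(df["colour"], prefix="colour",
                         drop_first=True, dtype=float)
print(dummies)
#    colour_green  colour_red
# 0           0.0         1.0
# 1           1.0         0.0
# 2           0.0         0.0
# 3           1.0         0.0
```

The resulting columns are purely numerical, so they can be dropped straight into the matrix formulation of the regression.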
I’m working on some predictive modelling projects and I need to iteratively compute R² statistics over hundreds of variables. Each time I run the calculations I have to go and have an extended coffee break – and I’m starting to buzz with too much caffeine, so I thought I’d see whether I could make my code more efficient!
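The original calculation isn’t shown, but one common speed-up, sketched below in Python with made-up data, is to vectorise: when each variable is used in its own simple regression against the response, R² is just the squared correlation, and all of the correlations can be computed in a single matrix operation rather than a per-variable loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 500          # hypothetical sizes: observations x variables
X = rng.normal(size=(n, p))
y = X[:, 0] * 2.0 + rng.normal(size=n)

# For a simple linear regression of y on each column of X separately,
# R-squared equals the squared correlation between that column and y.
# Centring once and using a single matrix product computes all p
# correlations without a Python-level loop over the columns.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
r_squared = r ** 2

print(r_squared[:3])  # R-squared for the first three variables
```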