"the programmer should make a number of definite assertions which can be checked individually, and from which the correctness of the whole program follows" - Alan Turing
In my last post I illustrated the performance boost you get from using matrix operations for least squares regression calculations. Matrices by their nature require numerical data, so what about handling a categorical predictor variable? To do this it’s necessary to create dummy variables: separate 0/1 indicator variables, one for each unique level of the predictor variable.
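To make the idea concrete, here is a minimal sketch of dummy coding in plain Python. The function name `dummy_encode`, the `drop_first` option and the `colour` data are all made up for illustration. Each level of the predictor becomes its own column of 0s and 1s, which can then sit alongside the numeric columns of a design matrix.

```python
def dummy_encode(values, drop_first=False):
    """Turn a categorical variable into 0/1 indicator columns.

    Returns (levels, rows): rows[i][j] is 1 when values[i] equals
    levels[j], otherwise 0. Levels are sorted so the column order
    is reproducible.
    """
    levels = sorted(set(values))
    if drop_first:
        # Assumption: the model has an intercept, so one level is
        # dropped and acts as the reference category.
        levels = levels[1:]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical predictor with three levels
colour = ["red", "green", "blue", "green"]

levels, rows = dummy_encode(colour)
print(levels)  # ['blue', 'green', 'red']
for row in rows:
    print(row)
```

If the regression includes an intercept column, keeping every level makes the design matrix rank deficient, because the dummy columns sum to the intercept. Calling `dummy_encode(colour, drop_first=True)` is one common way around that; the dropped level becomes the baseline the other coefficients are measured against.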
I’m working on some predictive modelling projects and I need to iteratively compute R² statistics over hundreds of variables. Each time I run the calculations I have to go and take an extended coffee break, and I’m starting to buzz with too much caffeine, so I thought I would see whether I could make my code more efficient!