Performance Trap

I was recently processing a number of files using pattern matching. During the processing I was storing information in lists that were subsequently used to populate new JMP data tables.

Everything worked fine until I increased the number of files by a factor of 10.

After some time I started hitting ‘escape’ and ‘CTRL-Z’ in a frenetic effort to seize control of my laptop.

I had hit a performance trap associated with inserting data into a list.

InsertInto( myList, value )

If you insert 1000 items into a list you would expect it to take 10 times longer than inserting 100 items. That is, you expect performance to be linear:

[Figure: linear performance. The x-axis is the number of items inserted into the list; the y-axis is the overall execution time.]

However, as the number of items gets larger, performance degrades significantly:

[Figure: nonlinear performance. Same axes as above, showing execution time growing faster than linearly.]

You can investigate this yourself by experimenting with the value for maxN in the code below:
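The original script isn't reproduced here, so the following is only a minimal JSL sketch of that kind of timing experiment; apart from maxN, the variable names and the use of Tick Seconds() for timing are my own assumptions:

maxN = 100000;                   // number of items to insert; increase this to see the slowdown
myList = {};                     // start with an empty list
start = Tick Seconds();          // Tick Seconds() returns elapsed time in seconds
For( i = 1, i <= maxN, i++,
	Insert Into( myList, i )     // default behaviour: append each item to the end of the list
);
elapsed = Tick Seconds() - start;
Show( maxN, elapsed );           // report the number of items and the time taken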

Fortunately, there is a simple way out of this performance trap. The default behaviour of the InsertInto function is to append the item to the end of the list. It is, however, possible to place the item at the front:

InsertInto( myList, value, 1 )

Unlike the default behaviour, this operation scales linearly with the number of items. Once the list is fully constructed, the order of items can be reversed to restore the original ordering of the data:

ReverseInto( myList )

Here is the code restructured to use this new approach:
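The restructured script isn't shown on the page either; a sketch of the same idea, inserting at the front of the list and reversing once at the end, could look like this:

maxN = 10000000;                 // 10 million items
myList = {};
start = Tick Seconds();
For( i = 1, i <= maxN, i++,
	Insert Into( myList, i, 1 )  // insert at position 1 (the front of the list)
);
Reverse Into( myList );          // a single reversal restores the original order
elapsed = Tick Seconds() - start;
Show( maxN, elapsed );           // report the number of items and the time taken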

And here is the performance (with maxN = 10 million!):

[Figure: performance for large N, up to 10 million inserts.]

2 thoughts on “Performance Trap”

  1. wow. looking at the numbers on the last graph you did 5 million inserts in 10 seconds whereas before you got less than 20k for the same duration. impressed!
