
Geek Corner: Parallelization Of Forecasting and Risk

20 Apr

A problem that has plagued me greatly here in the cave is the lack of computational resources for generating much of the data I use to build a better picture of the bond markets. Generating ARIMA-based forecasts for 40+ instruments and derivative time series took around 5 minutes. That's not a long time, but when I recently shipped the task off to a significantly slower computer, the ARIMA selection process went from 5 to nearly 30 minutes. Likewise, generating VaR sensitivities for around 12 instruments took even longer. In short, producing most of the analytical content and data needed to write my commentary and research here, from getting the data into the database, to managing it, to applying processes in R, took an excruciatingly long time.

I recently installed the foreach and SNOW (Simple Network Of Workstations) R packages for Windows and set out to cut the time for these two tasks, since they represent the lion's share of the overall process. The test machine was an 8-core box running Windows Server 2008. Here are the results for the VaR sensitivity generation:

         User   System   Elapsed
Before   722.03   1.57   763.18
After      0.04   0.25   572.01

(The near-zero user and system times in the "After" row simply reflect that the heavy lifting now happens in the worker processes; elapsed time is the number to compare.)

ARIMA forecasts across 45 instruments now take less than a minute. That includes selecting the best ARIMA model for each series, fitting it, and generating an N-day forecast.
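To make the parallel version below easier to follow, here is roughly what the sequential, single-instrument version of that step looks like (just a sketch using the same auto.arima() and forecast() calls that appear later; MyGlobalInstruments is the list of instrument series used further down):

library(forecast)

fit <- auto.arima(ts(MyGlobalInstruments[[1]]), approximation=TRUE, allowdrift=TRUE) #select and fit the best ARIMA model for one series

forecast(fit, h=50) #generate a 50-step-ahead forecast from that fit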

So how did I achieve this exactly? Simple. SNOW can be used across multiple cores/processors on the same machine, or across a network of machines. While I won't get into setting up a SNOW cluster across a network here, the syntax is mostly the same, with a few extra steps. For those trying to troubleshoot a SNOW cluster spanning remote Windows/Linux workstations, I have one hint: set the master option when initializing the cluster, so the workers know which host to connect back to.
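For the curious, a cross-machine cluster call looks roughly like this; the hostnames are placeholders rather than my actual setup:

#a sketch of a two-machine SOCK cluster; 'workstation1', 'workstation2' and 'master-host' are hypothetical hostnames

cl.remote = makeCluster(c('workstation1', 'workstation2'), type='SOCK', master='master-host')

#register and use it exactly like the local cluster below, and stopCluster(cl.remote) when finished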

print('Setting Up SNOW Cluster for parallelization')

require(snow)

require(doSNOW)

require(foreach)

#Define the functions that we want parallelized across cores/cpus

parallel.arima <- function(data) {

library(forecast) #loaded inside the function so each worker has the package available

fit <- auto.arima(ts(data), approximation=TRUE, allowdrift=TRUE) #select and fit the best ARIMA model for this series

fit #return the fit so foreach can collect it

}

cl.tmp = makeCluster(rep('localhost', 8), type='SOCK')  #define that we want 8 instances, one for each core

registerDoSNOW(cl.tmp) #register the cluster with the ‘foreach’ command

res = foreach(dat=MyGlobalInstruments) %dopar% parallel.arima(dat)

Now, you may ask: how do you get the results (the ARIMA fit for each instrument) out of the res object that foreach returns? Simple. res is just a list with one element per instrument:

res[[1]] #retrieves the first result

res[[2]] #retrieves the second result

forecast(res[[1]], h=50) #generates a 50-step forecast from the first fitted model
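One small convenience: if MyGlobalInstruments happens to be a named list, you can carry those names over to res and pull out a fit by instrument name instead of by position (the instrument name below is just a placeholder):

names(res) <- names(MyGlobalInstruments) #assumes MyGlobalInstruments is a named list

res[['SomeInstrument']] #hypothetical name; retrieves that instrument's fit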

Now, for the VaR sensitivity graphs, I recommend that you run these on the SAME machine (i.e. create a cluster spanning multiple cores rather than multiple machines). The function we're going to pass to foreach creates a PNG file for each VaR sensitivity graph, so you can guess what happens if you distribute the task across multiple machines: you end up with graphs scattered all throughout your cluster.

Using the same cluster object as before, the steps are pretty much the same:

#define the parallelization function

run.sens <- function(R) {

library(PerformanceAnalytics)

#name the output file after the series; this assumes each return series carries its instrument name as its column name

png(file=paste('VAR-Sens-', colnames(R)[1], '.png', sep=''), width=500, height=500)

chart.VaRSensitivity(R, methods=c('HistoricalVaR', 'ModifiedVaR', 'GaussianVaR'), clean='geltner', colorset=bluefocus, lwd=2)

dev.off()

}

#let's do it, using the instrument returns

foreach(R=MyGlobalInstruments.returns) %dopar% run.sens(R)
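Incidentally, the User/System/Elapsed columns in the before/after table near the top are what R's system.time() reports, so you can benchmark your own runs by wrapping the foreach call:

system.time(foreach(R=MyGlobalInstruments.returns) %dopar% run.sens(R)) #prints user, system and elapsed times for the run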

#Always good to shut down the cluster when we're done, to free up resources

stopCluster(cl.tmp)

rm(cl.tmp)

Doing just those few things can make a vast improvement to a process in R. I wish I had learned this little trick of parallelizing 'embarrassingly parallel' tasks earlier.
