Sunday Geekfest: R, Blotter and SNOW

27 Mar

I’ve been experimenting with a number of “R” packages in my work for quite a few years now, and recently I’ve stepped up testing ideas. The other week I ran into an interesting problem while using the blotter package, and the thought that popped into my head was: this is an “embarrassingly parallel” problem. Today’s “Geekfest” shows how to parallelize backtests from the blotter package using SNOW.

I’ve written about SNOW lightly in the past. In short, it lets R spread work across a cluster of workstations to cut down computing time. My earlier posts used it to speed up VaR sensitivity runs and ARIMA forecasting; today’s example shows how to parallelize a simple trend-following system based on Mebane Faber’s work.

The Problem

Each “run” of the LongTrend.r script in blotter is fine if you have a small set of symbols and a relatively small amount of data (back to 2005 or so). What if we wanted to remove as much human interaction from the tests as possible and expand them to the entire S&P 500, though? Running the tests sequentially would simply eat up your whole day.

LongTrend.r in its original (synchronous) state:

# This is a very simple trend following strategy for testing the results of:
# Faber, Mebane T., "A Quantitative Approach to Tactical Asset Allocation."
# Journal of Risk Management (Spring 2007).
# The article proposes a very simple quantitative market-timing model. They
# test the model in sample on the US stock market since 1900 before testing
# it out-of-sample in twenty other markets.

# The article discusses a 200-day simple moving average, which is proposed
# in Jeremy Siegel's book "Stocks for the Long Run" for timing the DJIA. He
# concludes that a simple market timing strategy improves the absolute and
# risk adjusted returns over a buy-and-hold strategy. After all transaction
# costs are included, the timing strategy falls short on the absolute return,
# but still provides a better risk-adjusted return. Siegel also tests timing on
# the Nasdaq composite since 1972 and finds better absolute and risk adjusted
# returns.

# The article implements a simpler version of the 200-day SMA, opting for a
# 10-month SMA. Monthly data is more easily available for long periods of time,
# and the lower granularity should translate to lower transaction costs.

# The rules of the system are relatively simple:
# - Buy when monthly price > 10-month SMA
# - Sell and move to cash when monthly price < 10-month SMA

# 1. All entry and exit prices are on the day of the signal at the close.
# 2. All data series are total return series including dividends, updated monthly.
# For the purposes of this demo, we only use price returns.
# 3. Cash returns are estimated with 90-day commercial paper. Margin rates for
# leveraged models are estimated with the broker call rate. Again, for the
# purposes of this demo, we ignore interest and leverage.
# 4. Taxes, commissions, and slippage are excluded.

# This simple strategy is different from well-known trend-following systems in
# three respects. First, there's no shorting. Positions are converted to cash on
# a 'sell' signal, rather than taking a short position. Second, the entire position
# is put on at trade inception. No assumptions are made about increasing position
# size as the trend progresses. Third, there are no stops. If the trend reverts
# quickly, this system will wait for a sell signal before selling the position.

# Data
# Instead of using total returns data, this demo uses monthly data for the SP500
# downloaded from Yahoo Finance. We'll use about 10 years of data, starting at
# the beginning of 1998.

# Load required libraries
require(quantmod)
require(TTR)
require(blotter)

# Try to clean up in case the demo was run previously
try(rm("account.longtrend","portfolio.longtrend",pos=.blotter),silent=TRUE)
try(rm("ltaccount","ltportfolio","ClosePrice","CurrentDate","equity","GSPC","i","initDate","initEq","Posn","UnitSize","verbose"),silent=TRUE)

# Set initial values
initDate='1997-12-31'
initEq=100000

# Load data with quantmod
print("Loading data")
currency("USD")
stock("GSPC",currency="USD",multiplier=1)
getSymbols('^GSPC', src='yahoo', index.class=c("POSIXt","POSIXct"),from='1998-01-01')
GSPC=to.monthly(GSPC, indexAt='endof')

# Set up indicators with TTR
print("Setting up indicators")
GSPC$SMA10m <- SMA(GSPC[,grep('Adj',colnames(GSPC))], 10)

# Set up a portfolio object and an account object in blotter
print("Initializing portfolio and account structure")
ltportfolio='longtrend'
ltaccount='longtrend'
initPortf(ltportfolio,'GSPC', initDate=initDate)
initAcct(ltaccount,portfolios='longtrend', initDate=initDate, initEq=initEq)
verbose=TRUE

# Create trades
for( i in 10:NROW(GSPC) ) {
    # browser()
    CurrentDate=time(GSPC)[i]
    cat(".")
    equity = getEndEq(ltaccount, CurrentDate)
    ClosePrice = as.numeric(Ad(GSPC[i,]))
    Posn = getPosQty(ltportfolio, Symbol='GSPC', Date=CurrentDate)
    UnitSize = as.numeric(trunc(equity/ClosePrice))

    # Position Entry (assume fill at close)
    if( Posn == 0 ) {
        # No position, so test to initiate Long position
        if( as.numeric(Ad(GSPC[i,])) > as.numeric(GSPC[i,'SMA10m']) ) {
            cat('\n')
            # Store trade with blotter
            addTxn(ltportfolio, Symbol='GSPC', TxnDate=CurrentDate, TxnPrice=ClosePrice, TxnQty = UnitSize , TxnFees=0, verbose=verbose)
        }
    } else {
        # Have a position, so check exit
        if( as.numeric(Ad(GSPC[i,])) < as.numeric(GSPC[i,'SMA10m'])) {
            cat('\n')
            # Store trade with blotter
            addTxn(ltportfolio, Symbol='GSPC', TxnDate=CurrentDate, TxnPrice=ClosePrice, TxnQty = -Posn , TxnFees=0, verbose=verbose)
        }
    }

    # Calculate P&L and resulting equity with blotter
    updatePortf(ltportfolio, Dates = CurrentDate)
    updateAcct(ltaccount, Dates = CurrentDate)
    updateEndEq(ltaccount, Dates = CurrentDate)
} # End dates loop
cat('\n')

# Chart results with quantmod
chart.Posn(ltportfolio, Symbol = 'GSPC', Dates = '1998::')
plot(add_SMA(n=10,col='darkgreen', on=1))

#look at a transaction summary
getTxns(Portfolio="longtrend", Symbol="GSPC")

# Copy the results into the local environment
print("Retrieving resulting portfolio and account")
ltportfolio = getPortfolio("longtrend")
ltaccount = getAccount("longtrend")

 

###############################################################################
# Blotter: Tools for transaction-oriented trading systems development
# for R (see http://r-project.org/)
# Copyright (c) 2008 Peter Carl and Brian G. Peterson
#
# This library is distributed under the terms of the GNU Public License (GPL)
# for full details see the file COPYING
#
# $Id$
#
###############################################################################
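Stripped of the blotter bookkeeping, the trading rule in the comments above is just a comparison of each month’s close with its 10-month SMA. The following is a minimal base-R sketch of that comparison, assuming a made-up price vector instead of downloaded data and using `stats::filter()` in place of TTR’s `SMA()`; it illustrates the rule and is not a replacement for the script.

```r
# Hypothetical month-end closes (made-up numbers, not market data)
prices <- c(100, 102, 101, 103, 105, 104, 106, 108, 107, 110,
            112, 109, 105, 101, 98, 97, 99, 103, 106, 110)

# Trailing 10-month simple moving average via stats::filter(); the first
# 9 values are NA, like the warm-up period of TTR's SMA(x, 10)
sma10 <- as.numeric(stats::filter(prices, rep(1 / 10, 10), sides = 1))

# Desired state each month: long when the close is above its SMA.
# Months without an SMA yet count as flat. (The blotter loop holds an
# existing position when close == SMA; this sketch ignores that tie case.)
long <- !is.na(sma10) & prices > sma10

# A trade happens whenever the desired state changes from the prior month
change  <- diff(c(FALSE, long))
entries <- which(change == 1)    # flat -> long: buy at that month's close
exits   <- which(change == -1)   # long -> flat: sell at that month's close

print(data.frame(month = seq_along(prices), close = prices,
                 sma10 = round(sma10, 2), long = long))
cat("Entries at months:", entries, "\n")
cat("Exits at months:", exits, "\n")
```

With these numbers the sketch buys at month 10, sells at month 13 and buys again at month 19. Month 10 is also the first month the script’s loop (`for( i in 10:NROW(GSPC) )`) examines, since that is the first month with a full 10-month average.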

We’re going to make some changes to this, mainly:

  1. Turn this script into a function
  2. Remove all hardcoded symbol references (e.g. GSPC) within the function
  3. Initialize a SNOW cluster
  4. Gather our symbols
  5. Run it, grab a coffee
  6. Compile the results

Steps 1 and 2 are done here:


# This is a very simple trend following strategy for testing the results of:
# Faber, Mebane T., "A Quantitative Approach to Tactical Asset Allocation."
# Journal of Risk Management (Spring 2007).
# The article proposes a very simple quantitative market-timing model. They
# test the model in sample on the US stock market since 1900 before testing
# it out-of-sample in twenty other markets.

# The article discusses a 200-day simple moving average, which is proposed
# in Jeremy Siegel's book "Stocks for the Long Run" for timing the DJIA. He
# concludes that a simple market timing strategy improves the absolute and
# risk adjusted returns over a buy-and-hold strategy. After all transaction
# costs are included, the timing strategy falls short on the absolute return,
# but still provides a better risk-adjusted return. Siegel also tests timing on
# the Nasdaq composite since 1972 and finds better absolute and risk adjusted
# returns.

# The article implements a simpler version of the 200-day SMA, opting for a
# 10-month SMA. Monthly data is more easily available for long periods of time,
# and the lower granularity should translate to lower transaction costs.

# The rules of the system are relatively simple:
# - Buy when monthly price > 10-month SMA
# - Sell and move to cash when monthly price < 10-month SMA

# 1. All entry and exit prices are on the day of the signal at the close.
# 2. All data series are total return series including dividends, updated monthly.
# For the purposes of this demo, we only use price returns.
# 3. Cash returns are estimated with 90-day commercial paper. Margin rates for
# leveraged models are estimated with the broker call rate. Again, for the
# purposes of this demo, we ignore interest and leverage.
# 4. Taxes, commissions, and slippage are excluded.

# This simple strategy is different from well-known trend-following systems in
# three respects. First, there's no shorting. Positions are converted to cash on
# a 'sell' signal, rather than taking a short position. Second, the entire position
# is put on at trade inception. No assumptions are made about increasing position
# size as the trend progresses. Third, there are no stops. If the trend reverts
# quickly, this system will wait for a sell signal before selling the position.

# Data
# Instead of using total returns data, this version downloads daily data for the
# requested symbol from Google Finance and converts it to monthly bars.

longTrend <- function(data) {

# Load required libraries
require(quantmod)
require(TTR)
require(blotter)

# Try to clean up in case the demo was run previously
try(rm("account.longtrend","portfolio.longtrend",pos=.blotter),silent=TRUE)
try(rm("ltaccount","ltportfolio","ClosePrice","CurrentDate","equity",as.character(data),"i","initDate","initEq","Posn","UnitSize","verbose"),silent=TRUE)

print("Loading Symbol from Google Finance")
getSymbols(as.character(data), src="google")

xdat <- get(data)

# Set initial values
## For starters, let's just go with the first day of the timeseries --CJ
initDate=index(xdat)[1]
initEq=100000

# Load data with quantmod
print("Loading data")
currency("USD")
stock(data,currency="USD",multiplier=1)
xdat=to.monthly(xdat, indexAt='endof')

# Set up indicators with TTR
# Using the Close value, ignoring the "Adjusted" column (adjusted for dividends, splits, etc.). Can further refine
# this in later versions if using a provider that doesn't automatically do this for you
print("Setting up indicators")
xdat$SMA10m <- SMA(Cl(xdat), 10)

# Set up a portfolio object and an account object in blotter
print("Initializing portfolio and account structure")
ltportfolio='longtrend'
ltaccount='longtrend'
initPortf(ltportfolio,data, initDate=initDate)
initAcct(ltaccount,portfolios='longtrend', initDate=initDate, initEq=initEq)
verbose=TRUE

# Create trades
for( i in 10:NROW(xdat) ) {
    # browser()
    CurrentDate=time(xdat)[i]
    cat(".")
    equity = getEndEq(ltaccount, CurrentDate)
    ClosePrice = as.numeric(Cl(xdat[i,]))
    Posn = getPosQty(ltportfolio, Symbol=data, Date=CurrentDate)
    UnitSize = as.numeric(trunc(equity/ClosePrice))

    # Position Entry (assume fill at close)
    if( Posn == 0 ) {
        # No position, so test to initiate Long position
        if( as.numeric(Cl(xdat[i,])) > as.numeric(xdat[i,'SMA10m']) ) {
            cat('\n')
            # Store trade with blotter
            addTxn(ltportfolio, Symbol=data, TxnDate=CurrentDate, TxnPrice=ClosePrice, TxnQty = UnitSize , TxnFees=0, verbose=verbose)
        }
    } else {
        # Have a position, so check exit
        if( as.numeric(Cl(xdat[i,])) < as.numeric(xdat[i,'SMA10m'])) {
            cat('\n')
            # Store trade with blotter
            addTxn(ltportfolio, Symbol=data, TxnDate=CurrentDate, TxnPrice=ClosePrice, TxnQty = -Posn , TxnFees=0, verbose=verbose)
        }
    }

    # Calculate P&L and resulting equity with blotter
    updatePortf(ltportfolio, Dates = CurrentDate)
    updateAcct(ltaccount, Dates = CurrentDate)
    updateEndEq(ltaccount, Dates = CurrentDate)
} # End dates loop
cat('\n')

# Chart results with quantmod
#chart.Posn(ltportfolio, Symbol = 'GSPC', Dates = '1998::')
#plot(add_SMA(n=10,col='darkgreen', on=1))

#look at a transaction summary
#getTxns(Portfolio="longtrend", Symbol="GSPC")

# Copy the results into the local environment
#print("Retrieving resulting portfolio and account")
#ltportfolio = getPortfolio("longtrend")
#ltaccount = getAccount("longtrend")

return(getPortfolio('longtrend'))
}
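Steps 1 and 2 boil down to making the backtest a function of the symbol alone, so it can be handed to any worker. Below is a hypothetical, blotter-free analogue of `longTrend()` for illustration: `sma_timing_txns()` is a made-up name, it takes prices as arguments instead of downloading them, and it ignores fees and interest on cash. It does use the same sizing rule as the script, `trunc(equity/ClosePrice)` whole shares, and tags every transaction with the symbol it was given.

```r
# Hypothetical, blotter-free analogue of longTrend(): a pure function of the
# symbol and its monthly closes, with no global state, so any worker can run it.
# Simplifications: no fees, no interest on cash, fills at the monthly close.
sma_timing_txns <- function(symbol, dates, closes, n = 10, init_eq = 100000) {
  stopifnot(length(dates) == length(closes), length(closes) >= n)
  sma  <- as.numeric(stats::filter(closes, rep(1 / n, n), sides = 1))
  cash <- init_eq
  posn <- 0
  rows <- list()
  for (i in n:length(closes)) {
    price <- closes[i]
    if (posn == 0 && price > sma[i]) {
      # Flat: buy as many whole shares as equity allows (trunc(equity/price))
      posn <- trunc(cash / price)
      cash <- cash - posn * price
      rows[[length(rows) + 1]] <- data.frame(Symbol = symbol, Date = dates[i],
        Qty = posn, Price = price, stringsAsFactors = FALSE)
    } else if (posn > 0 && price < sma[i]) {
      # Long: sell the whole position and move to cash
      cash <- cash + posn * price
      rows[[length(rows) + 1]] <- data.frame(Symbol = symbol, Date = dates[i],
        Qty = -posn, Price = price, stringsAsFactors = FALSE)
      posn <- 0
    }
  }
  txns <- if (length(rows) > 0) do.call(rbind, rows) else NULL
  list(symbol = symbol, txns = txns,
       end_equity = cash + posn * closes[length(closes)])
}

# Usage with made-up prices; month-start dates are used purely as labels
closes <- c(100, 102, 101, 103, 105, 104, 106, 108, 107, 110,
            112, 109, 105, 101, 98, 97, 99, 103, 106, 110)
dates  <- seq(as.Date("2009-01-01"), by = "month", length.out = length(closes))
res <- sma_timing_txns("TOY", dates, closes)
print(res$txns)
cat("Ending equity:", res$end_equity, "\n")
```

With these prices the function buys 909 shares at 110, sells them at 105, then buys 900 shares at 106. Keeping all state inside the function (the blotter version instead keeps its portfolio and account in blotter’s `.blotter` environment) means the result depends only on the arguments, which is what makes handing one symbol to each worker safe.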

 

Now, let’s test it out. In the console, type the following:

> source("LongTrend.r")
> longTrend("BAC")
Loading required package: quantmod
Loading required package: Defaults
Loading required package: xts
Loading required package: zoo
Loading required package: TTR
Loading required package: blotter
Loading required package: FinancialInstrument
[1] "Loading Symbol from Google Finance"
[1] "Loading data"
[1] "Setting up indicators"
[1] "Initializing portfolio and account structure"
......................
[1] "2009-07-31 BAC 6761 @ 14.79"
..........
[1] "2010-05-31 BAC -6761 @ 15.74"
........
[1] "2011-01-31 BAC 6766 @ 13.73"
..

If all goes well, the function returns a portfolio object containing the following (make a note of these; we’re going to need them for analyzing a LOT more results):

  • $symbols$BAC
  • $symbols$BAC$txn: all of our transactions from the test
  • $symbols$BAC$posPL and $symbols$BAC$posPL.USD: running P&L, and P&L in the portfolio currency
  • $summary
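Once the symbol lives in a variable instead of in the code, these elements are reached with `[[` rather than `$BAC`. The snippet below uses a small hand-built mock list with the same `symbols`/`txn` nesting. It is not a real blotter portfolio (the real `txn` element is an xts object with more columns); the three rows are simply the BAC trades from the console output.

```r
# Mock of the nesting described above; NOT a real blotter portfolio object.
# Rows are the three BAC trades from the console output.
mock_portfolio <- list(
  symbols = list(
    BAC = list(
      txn = data.frame(
        Date  = as.Date(c("2009-07-31", "2010-05-31", "2011-01-31")),
        Qty   = c(6761, -6761, 6766),
        Price = c(14.79, 15.74, 13.73)
      )
    )
  )
)

sym <- "BAC"                               # e.g. one element of a symbol list
txn <- mock_portfolio$symbols[[sym]]$txn   # same as mock_portfolio$symbols$BAC$txn
print(txn)

# Shares still held after the last trade
cat(sym, "open position:", sum(txn$Qty), "shares\n")
```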

Here’s a closer look at the BAC summary:

   Long.Value Short.Value Net.Value Gross.Value Realized.PL Unrealized.PL Gross.Trading.PL Txn.Fees Net.Trading.PL
1        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
2        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
3        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
4        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
5        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
6        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
7        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
8        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
9        0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
10       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
11       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
12       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
13       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
14       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
15       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
16       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
17       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
18       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
19       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
20       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
21       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
22       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
23   99995.19        0.00  99995.19    99995.19        0.00       5544.02          5544.02     0.00        5544.02
24  118925.99        0.00 118925.99   118925.99        0.00      -2636.79         -2636.79     0.00       -2636.79
25  114396.12        0.00 114396.12   114396.12        0.00      -1622.64         -1622.64     0.00       -1622.64
26   98575.38        0.00  98575.38    98575.38        0.00      -7775.15         -7775.15     0.00       -7775.15
27  107161.85        0.00 107161.85   107161.85        0.00       2569.18          2569.18     0.00        2569.18
28  101820.66        0.00 101820.66   101820.66        0.00        -67.61           -67.61     0.00         -67.61
29  102631.98        0.00 102631.98   102631.98        0.00      -1284.59         -1284.59     0.00       -1284.59
30  112638.26        0.00 112638.26   112638.26        0.00        743.71           743.71     0.00         743.71
31  120683.85        0.00 120683.85   120683.85        0.00        608.49           608.49     0.00         608.49
32  120548.63        0.00 120548.63   120548.63        0.00      -3177.67         -3177.67     0.00       -3177.67
33       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
34       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
35       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
36       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
37       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
38       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
39       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
40       0.00        0.00      0.00        0.00        0.00          0.00             0.00     0.00           0.00
41   92897.18        0.00  92897.18    92897.18        0.00        879.58           879.58     0.00         879.58
42   96686.14        0.00  96686.14    96686.14        0.00        608.94           608.94     0.00         608.94

Starting a SNOW Cluster

Now, let’s say we have a list of symbols that we want to test. Yes, we could sit here and go through them one by one manually, but what fun would that be? Not much at all. SNOW stands for Simple Network of Workstations; in theory you could do the same thing with several other R packages (multicore, MPI, doSMP, etc.). You will need the following to get a SNOW cluster running:

  1. Workstations running Linux (either on your LAN or hosted with a provider such as Linode), reachable over SSH on port 22 (SNOW doesn’t support non-standard SSH ports at this time)
  2. Preferably, SSH key (passwordless) authentication set up between your “Master” and the SNOW “Slaves”
  3. The SAME versions of R, quantmod, blotter and any other required packages on all workstations

If one of the nodes fails while processing the job, it will mess up the entire result set. The way I handle this is quite simple: all R instances on all machines run the same version (2.12.1), and all packages are kept up to date. If I make any changes to local packages, I build and deploy them to every workstation. That makes troubleshooting a heck of a lot easier.
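The version requirement is easy to check from the master. The sketch below uses the `parallel` package bundled with R since 2.14.0 (newer than the 2.12.1 used here), which includes snow’s socket-cluster functions. It starts two local workers, asks each for its R and package versions with `clusterCall()`, and compares the answers with the master’s. `version_info()` is a hypothetical helper; for real nodes you would pass your host vector to `makeCluster()` instead of `2`, and list quantmod, blotter and friends instead of `stats`.

```r
library(parallel)

# Packages to compare. "stats" is used only because it is always installed;
# on a real cluster you would list quantmod, blotter, etc.
pkgs <- c("stats")

# R version plus the version of each package, as a named character vector
version_info <- function(pkgs) {
  c(R = R.version.string,
    vapply(pkgs, function(p) as.character(utils::packageVersion(p)), character(1)))
}

master_info <- version_info(pkgs)

cl <- makeCluster(2)                                # two local workers for the demo
worker_info <- clusterCall(cl, version_info, pkgs)  # runs once on every worker
stopCluster(cl)

same <- vapply(worker_info, identical, logical(1), master_info)
if (all(same)) {
  cat("All", length(same), "workers match the master\n")
} else {
  cat("Workers with different versions:", which(!same), "\n")
}
```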

So, let’s get started by loading SNOW and the related packages:

> require(snow)
require(doSNOW)
require(foreach)
Loading required package: snow

Attaching package: 'snow'

The following object(s) are masked from 'package:base':

enquote

 

Loading required package: doSNOW
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
foreach: simple, scalable parallel programming from REvolution Computing
Use REvolution R for scalability, fault tolerance and more.
http://www.revolution-computing.com
>

Next, let’s create the actual cluster and register it as foreach’s parallel backend with registerDoSNOW() (we’ll see why this is important later):

cl.tmp = makeCluster(c(rep("localhost",2), rep("10.80.10.1",4), rep("10.80.11.20",2)), type="SOCK", master="10.80.11.6")
registerDoSNOW(cl.tmp)

You should see the snow cluster start with output such as this:

Attaching package: 'snow'

The following object(s) are masked from 'package:base':

enquote

Once the cluster is initialized, we need to do a few more things to get this to work properly. First, I’ll explain what the previous steps did: the call to makeCluster started 8 R instances across 3 different machines. 2 of those instances run on the localhost (you can change this up or down depending on your available processing power). It’s important to define the master (in this case 10.80.11.6, which is the localhost) so that the spawned R processes know where to send their data back to when we call foreach. The type argument is pretty self-explanatory: “SOCK” means the master and the workers talk over socket connections. You can see more about the inner workings of the cluster by typing cl.tmp at the R command line, which will show you something similar to this:

[[6]]
$con
description class mode text opened can read
"<-unknown:10187" "sockconn" "a+b" "binary" "opened" "yes"
can write
"yes"

$host
[1] "10.80.11.1"

attr(,"class")
[1] "SOCKnode"

 

attr(,"class")
[1] "SOCKcluster" "cluster"
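The host vector handed to makeCluster is plain R, so the arithmetic behind “8 R instances across 3 machines” can be checked before anything is started:

```r
# The host vector from the makeCluster() call above;
# each entry becomes one R worker process
hosts <- c(rep("localhost", 2), rep("10.80.10.1", 4), rep("10.80.11.20", 2))

cat("Workers:", length(hosts), "\n")            # R instances started
cat("Machines:", length(unique(hosts)), "\n")   # distinct hosts
print(table(hosts))                             # workers per machine
```

Scaling up or down is then just a matter of changing the `rep()` counts for each machine.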

So, let’s get our symbols together, shall we?

Let’s say I want to test just a few of the big financials for now using Mebane’s system. That’s simple. At the command prompt, type:
SymbolList = c("BAC","C","JPM","GS","JEF","MS")

We already have the function defined (longTrend), so let’s go ahead and kick off the job to the cluster(s):

testResult = foreach(dat=SymbolList) %dopar% longTrend(dat)
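On a newer R, the same fan-out can also be sketched with `parallel::parLapply()` instead of foreach/doSNOW. This is not the post’s code: it assumes two local workers and a hypothetical `toy_backtest()` that generates its own reproducible fake prices, because `longTrend()` needs quantmod, blotter and network access on every node. It also shows one practical difference: foreach exports the variables referenced in the `%dopar%` expression for you, whereas `parLapply()` serializes only the function it is given, so globals that function reads (here the hypothetical `n_months`) must be sent with `clusterExport()`.

```r
library(parallel)

# Hypothetical setting read by the worker function from the global environment
n_months <- 36

# Hypothetical stand-in for longTrend(): fake prices seeded from the symbol name
# (so every run is reproducible), then the 10-month SMA rule.
# Returns the number of state changes (entries plus exits).
toy_backtest <- function(sym) {
  set.seed(sum(utf8ToInt(sym)))
  closes <- 50 * cumprod(1 + rnorm(n_months, mean = 0.005, sd = 0.05))
  sma  <- as.numeric(stats::filter(closes, rep(1 / 10, 10), sides = 1))
  long <- !is.na(sma) & closes > sma
  list(symbol = sym, trades = sum(diff(c(FALSE, long)) != 0))
}

SymbolList <- c("BAC", "C", "JPM", "GS", "JEF", "MS")

cl <- makeCluster(2)   # two local workers; a host vector works here too
# parLapply() ships toy_backtest itself but not the globals it reads
clusterExport(cl, "n_months", envir = environment())
results <- parLapply(cl, SymbolList, toy_backtest)
stopCluster(cl)

names(results) <- SymbolList   # look results up by symbol
str(results)
```

Like `testResult`, `results` is a plain list with one element per symbol, in the same order as `SymbolList`.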

Now go get some coffee; you can keep an eye on the R processes on each machine with htop.

In Part II we will look at the results from the cluster. That’s enough for one post.
