Nonlinear Support Vector Machines Can Systematically Identify Stocks with High and Low Future Returns
15 Pages. Posted: 20 Sep 2011. Last revised: 20 Jan 2014
Date Written: September 6, 2012
This paper investigates the profitability of a trading strategy based on training a model to identify stocks with high or low predicted returns. A tail set is defined as a group of stocks whose volatility-adjusted price change falls in the highest or lowest quantile, for example the highest or lowest 5%. Each stock is represented by a set of technical and fundamental features computed from CRSP and Compustat data. A classifier is trained on historical tail sets and tested on future data. A nonlinear support vector machine (SVM) is chosen as the classifier for its simplicity and effectiveness. The SVM is trained once per month to adjust to changing market conditions. Portfolios are formed by ranking stocks by the classifier output: the highest-ranked stocks are held long and the lowest-ranked ones are sold short. The Global Industry Classification Standard is used to build one model per sector, giving a total of 8 long-short portfolios: Energy, Materials, Industrials, Consumer Discretionary, Consumer Staples, Health Care, Financials, and Information Technology. The data range from 1981 to 2010. Trading costs are not measured, but 91-day holding periods are used to keep them low. The strategy yields annual excess returns (Jensen alpha) of 15% with volatilities under 8% when the top 25% of stocks in the distribution are used to train the long positions and the bottom 25% to train the short ones.
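The tail-set labeling and rank-based portfolio formation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `tail_set_labels` and `long_short`, the tickers, and all numbers are hypothetical, the volatility-adjusted price changes are taken as given inputs, and the scores merely stand in for the output of a trained classifier such as an SVM.

```python
def tail_set_labels(adj_changes, quantile=0.25):
    """Label each stock +1 (upper tail set), -1 (lower tail set), or 0 (neither).

    adj_changes: dict mapping ticker -> volatility-adjusted price change
                 (assumed to be computed elsewhere).
    quantile:    fraction of stocks placed in each tail, e.g. 0.25 or 0.05.
    """
    if not 0 < quantile <= 0.5:
        raise ValueError("quantile must be in (0, 0.5]")
    ranked = sorted(adj_changes, key=adj_changes.get)  # ascending order
    k = max(1, int(len(ranked) * quantile))            # stocks per tail
    labels = {ticker: 0 for ticker in ranked}
    for ticker in ranked[:k]:
        labels[ticker] = -1   # lowest volatility-adjusted changes
    for ticker in ranked[-k:]:
        labels[ticker] = 1    # highest volatility-adjusted changes
    return labels


def long_short(scores, n_positions):
    """Rank stocks by classifier score; return (longs, shorts)."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    return ranked[:n_positions], ranked[-n_positions:]


# Hypothetical volatility-adjusted price changes for eight stocks in one sector.
adj_changes = {"A": 2.1, "B": -0.3, "C": 0.4, "D": -1.8,
               "E": 1.2, "F": -0.9, "G": 0.1, "H": 0.7}
labels = tail_set_labels(adj_changes, quantile=0.25)

# Hypothetical classifier scores for stocks in a later period.
scores = {"A": 0.9, "B": -0.2, "C": 0.3, "D": -1.1}
longs, shorts = long_short(scores, n_positions=1)
```

In this sketch, only stocks labeled +1 or -1 would serve as training examples; stocks labeled 0 lie outside both tail sets. Running one such pipeline per sector would give the sector-level long-short portfolios the abstract describes.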
Keywords: support vector machines, machine learning, reinforcement learning, sector neutral