Machine learning and AI are the hot concepts being applied across every possible industry, given the shift to digital and the associated ease of collecting data. For a bit of background: the core concepts of AI, including neural networks, came into being decades ago. Yet the firehose of data and the economics of Moore's law meant that ML/AI only really hit its stride around the mid-2000s, as Google, Facebook, Amazon, and others of their ilk centered their business models on data, and Silicon Valley got busy investing in research to push the scaling envelope.
History of ML/AI
The same family of concepts that was refined while optimizing DNA sequencing for the Human Genome Project now decides the Google ad you see, personalizes your Facebook feed, guides Amazon's product recommendation engine, and plans the best delivery routes in our day-to-day lives. Financial services jumped onto the gravy train as well. The sector is very broad, but the obvious use cases revolve around fraud detection by credit card companies, monitoring for stock manipulation by regulatory bodies, and other operational areas where productivity can be boosted. At the bleeding edge, we have automated trade execution algorithms in the trading domain, and firms like Renaissance Technologies and Two Sigma that really squeeze predictions out of available data sets to directly impact their returns.
With this background, in this blog we consider how we could apply ML/AI techniques to investing in general and stock selection in particular: how retail investors could really amp up their stock coverage universe and use the cold logic of mathematics to set emotions aside. This is a problem that people at large can relate to, and it is a good starting point for ML/AI given the easy availability of data. So, without further ado, here are two approaches to this problem:
1. Using historical technical data:
When we talk technical analysis, we are approaching the problem of predicting stock price behavior by analyzing stock charts. Say we are looking at a stock chart: what is it that we are really looking at, apart from the squiggly lines? What information does a chart encapsulate? Can this information be used to predict the future? Are there patterns that will repeat themselves? Can we throw enough charts into a machine learning system and teach it to discriminate between a potentially rising stock and a potentially falling stock, in the same way that we could teach a machine learning program to discriminate between a cat and a dog given enough data? Think pattern recognition!
The first issue to consider in a pattern recognition system for stocks is price data and its frequency. In the current context, we are not building a solution for an HFT shop, so tick-level data is a bit much; a simple OHLCV (open, high, low, close, volume) dataset works. The other issue is how much data to use. If the data is too old, we could be feeding a different market regime to the ML program; if we use too little, we might not be feeding enough patterns to be statistically relevant. Ultimately, a few years of data should be sufficient (as opposed to, say, 10 years, which would include the Great Financial Crisis and its skewed, perhaps irrelevant behavior), based purely on the limitations of personal computing power and the time required to process the information.
So, coming back to the information in the charts: we want to move the information from a visual domain (where a convolutional neural network would be suited) to a slightly more tabular domain where SVMs and decision trees can be used. We can start considering features like distances from periodic highs and lows, momentum/trend/volatility indicators, volume trends, as well as some metrics that place the current stock's performance within the context of the rest of the universe. This is feature engineering. The problem with this approach is that the results depend on the features we choose to present to the ML algo.

The other approach would be to use deep learning (neural networks, RNNs, CNNs, etc.) with multiple layers, which would obviate the need to engineer features, since the network would learn its own representations. The problem we face with deep learning is that, philosophically, we would like to believe that each stock's data tells its own story. So the behavior of a large-cap like RELIANCE may not be appropriate for a small-cap like VENKYS. This means we would have to train as many neural nets as there are stocks in our investable universe (1,000+). Besides, any neural network worth its salt needs data points in the millions, as opposed to the thousands that we want to use. Given the enormity of the exercise, we could eschew deep learning and stick to shallow machine learning with ensemble approaches, and with features that we believe cover a wide gamut of indicators.
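To make the feature engineering concrete, here is a minimal sketch with pandas. The look-back windows (252 days for yearly highs/lows, 20 days for momentum and volatility) and the synthetic price series are illustrative assumptions, not recommendations:

```python
import numpy as np
import pandas as pd

# Synthetic daily close/volume data standing in for a real OHLCV download
rng = np.random.default_rng(0)
n = 500
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
df = pd.DataFrame({
    "close": close,
    "volume": rng.integers(1_000, 10_000, n).astype(float),
})

# Distance from periodic highs/lows (here, trailing ~52-week extremes)
df["dist_52w_high"] = df["close"] / df["close"].rolling(252).max() - 1
df["dist_52w_low"] = df["close"] / df["close"].rolling(252).min() - 1

# Momentum: trailing 20-day return
df["mom_20"] = df["close"].pct_change(20)

# Volatility: rolling std of daily returns, annualized
df["vol_20"] = df["close"].pct_change().rolling(20).std() * np.sqrt(252)

# Volume trend: today's volume relative to its 20-day average
df["vol_ratio"] = df["volume"] / df["volume"].rolling(20).mean()

features = df.dropna()  # drop warm-up rows where rolling windows are incomplete
print(features.shape)
```

Each row of `features` is then one observation for a shallow learner such as an SVM or a tree ensemble.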
2. Using historical fundamental data:
When we talk in terms of fundamental data and creating time series around it, the first challenge is that fundamental data changes far less frequently than price data: companies typically report quarterly. So even if we collect 10 years' worth of data, we are looking at just 40 snapshots of information to process. If we start sampling valuation metrics (say Price/Earnings, Price/Book Value, EV/EBITDA) too frequently, the delta between these samples starts looking similar to the delta between price changes in technical data, since the market valuation changes with price but the operational metrics change only quarterly. So when we look at time series of fundamental data, we typically look at small data sets.
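A practical consequence is that quarterly fundamentals have to be aligned with daily prices before any valuation time series can be built. A minimal sketch with pandas (all dates and values are made up): forward-fill each quarterly snapshot until the next report, then compute a daily ratio:

```python
import pandas as pd

# Made-up quarterly earnings-per-share snapshots
eps = pd.Series(
    [10.0, 11.5, 9.8, 12.1],
    index=pd.to_datetime(["2023-03-31", "2023-06-30", "2023-09-30", "2023-12-31"]),
)

# Daily price index over the same period (flat price, just for illustration)
days = pd.date_range("2023-03-31", "2023-12-31", freq="B")
price = pd.Series(100.0, index=days)

# Forward-fill the last reported EPS onto every trading day,
# then compute a daily Price/Earnings series
eps_daily = eps.reindex(days, method="ffill")
pe = price / eps_daily
print(pe.head())
```

Note that the denominator only steps at report dates, so most of the day-to-day variation in such a ratio comes from the price, which is exactly the sampling issue described above.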
Given that we are using stock price returns as the outcome that defines success and failure, it is relatively easy to label the data for supervised learning approaches. So data collection, while non-trivial, is not that difficult, and neither is selecting the different algos/packages to push the data through.
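Labelling along these lines takes only a few lines of pandas. The rule here is an assumption for illustration: label 1 ("success") if the stock is up more than 5% over the next 20 trading days, else 0:

```python
import numpy as np
import pandas as pd

# Synthetic daily close series standing in for real price history
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))

horizon, threshold = 20, 0.05
fwd_return = close.shift(-horizon) / close - 1     # return over the next 20 days
label = (fwd_return > threshold).astype(int)       # 1 = "success", 0 = "failure"

# The last `horizon` rows have no forward return yet and must be dropped
labelled = pd.DataFrame(
    {"close": close, "fwd_return": fwd_return, "label": label}
).dropna()
print(labelled["label"].value_counts())
```

The key detail is the `shift(-horizon)`: labels must look strictly forward, or the model ends up trained on information it would not have had at the time.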
For both of these approaches, before we run the predictive algorithms, we have to put the data through the wringer of normalization/standardization and principal component analysis, and separate out training data and testing data to check the accuracy of the system.
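With scikit-learn, that pre-processing chain is short. One ordering detail worth showing: split first, then fit the scaler and PCA on the training data only, so no test-set statistics leak into the transforms. The feature matrix below is random, purely to show the mechanics:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))           # 400 samples, 12 engineered features
y = (rng.random(400) > 0.5).astype(int)  # dummy success/failure labels

# Hold out 20% for testing *before* fitting any transforms
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler().fit(X_train)              # fit on training data only
pca = PCA(n_components=5).fit(scaler.transform(X_train))

X_train_p = pca.transform(scaler.transform(X_train))
X_test_p = pca.transform(scaler.transform(X_test))  # same fitted transforms
print(X_train_p.shape, X_test_p.shape)
```

The transformed matrices then feed whatever classifier we settle on; the test split stays untouched until the final accuracy check.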
How was the experience?
Having done this exercise in-house, our experience is that there is definitely a 'learning curve' in machine learning as applied to investing. We can conceptually understand the different approaches; it is what goes on inside, and the interpretation of the results, that is the biggest challenge. These exercises inherently create statistical black boxes: systems that predict how a stock is going to perform given its history. And by default, these are garbage-in, garbage-out systems. As long as we don't forget that the output from the algo reflects only the data that we have fed it, we are doing fine. There is no guarantee that a pattern the system saw in the data will repeat. But what we can do is calculate the odds. We can calculate that, given the current pattern, the stock has historically been up by 5% in 5 days, say, 60% of the time; or that valuation has continued to get more expensive 75% of the time after the stock has hit a new high.

A pure fundamental approach would caution us against buying a stock when it is more expensive, but an ML/AI-based pattern recognition system could tell us that, historically, the valuation has continued to rise from the current state of affairs. This is a nice cushion when weighing a decision. Emotional investing vs. pure math: as long as we are aware that we make an emotional decision whenever we decide to buy or sell (we 'like' or 'dislike' a stock), and we have an unemotional system to either back or refute us, we are happy with the investment process. Welcome, Quantamental!
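The 'calculate the odds' step is, at bottom, conditional frequency counting. A toy sketch on synthetic data: on days when the stock closed at a 20-day high, how often was it up more than 5% over the next 5 days? (The condition, horizon, and numbers are all illustrative assumptions.)

```python
import numpy as np
import pandas as pd

# Synthetic daily close series with a slight upward drift
rng = np.random.default_rng(3)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, 1000))))

at_high = close == close.rolling(20).max()   # condition: at a 20-day closing high
fwd_5d = close.shift(-5) / close - 1         # outcome: return over the next 5 days

valid = at_high & fwd_5d.notna()             # occurrences with a known outcome
odds = (fwd_5d[valid] > 0.05).mean()         # P(up >5% in 5d | at 20-day high)
print(f"historical odds: {odds:.0%} over {valid.sum()} occurrences")
```

The output is exactly the kind of cushion described above: not a prediction, just the historical frequency of one outcome given one condition.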
Current state of the art
Finally, in a recent report, AQR argues that machine learning 'will likely apply to problems involving optimizing portfolio construction, such as risk management, transaction cost analysis, and factor construction at first. That is because finance and markets are different from other areas where ML has come to offer up breakthrough research'. What this means is that markets are inherently noisy: a cat can easily morph into a dog. Besides, markets show reflexivity, the perpetuation of self-fulfilling prophecies, and markets are also adaptive. Machine learning, at least in its current avatar, cannot handle this complexity; AI in its current journey is still not fully sentient, and Skynet is still under construction. Overall, when painting the portrait before a buy or sell decision, we can still find it very useful to check the output of the system and keep emotion under control.