I am using techniques from information theory as applied to financial market trading, based on the approach of theorist Thomas Cover and his idea of universal portfolios, but combined with pattern-matching algorithms implemented in a highly optimised manner for large data sets.
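Cover's universal portfolio idea can be illustrated with a minimal two-asset sketch, which is my own illustration and not the code used in this research: maintain a grid of constant-rebalanced portfolios and, each day, invest according to their wealth-weighted average allocation.

```python
import numpy as np

def universal_portfolio(price_relatives, n_grid=101):
    """Minimal Cover-style universal portfolio for two assets.

    price_relatives: array of shape (T, 2); entry [t, i] is
    price_today / price_yesterday for asset i on day t.
    Returns the sequence of daily allocations to asset 0.
    """
    bs = np.linspace(0.0, 1.0, n_grid)   # candidate fractions held in asset 0
    wealth = np.ones(n_grid)             # wealth of each constant-rebalanced portfolio
    allocations = []
    for x in price_relatives:
        # today's allocation: wealth-weighted average over all candidate portfolios
        b_t = np.sum(bs * wealth) / np.sum(wealth)
        allocations.append(b_t)
        # update each candidate portfolio's wealth with today's price relatives
        wealth *= bs * x[0] + (1.0 - bs) * x[1]
    return np.array(allocations)
```

The allocation starts balanced and drifts towards whichever constant-rebalanced portfolio would have accumulated the most wealth so far.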
If an investor wants to know how to distribute their wealth among a number of stocks on any given trading day, the approach is to look at how the market behaved over the past few days and to find days further in the past when the market behaved in a similar manner.
The assumption is that what happened in the past is likely to happen again. This assumption needs to be carefully tested, but it does appear to be a useful idea.
Each trader or investment agent, which is represented by a strategy or model, uses different methods of defining these similarities or patterns in the market.
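One simple way to define such a similarity, sketched here for illustration (the actual agents use a variety of definitions), is a nearest-neighbour search comparing the most recent window of daily returns against every past window of the same length:

```python
import numpy as np

def similar_days(returns, window=5, k=10):
    """Find past days whose preceding `window` days of returns most
    resemble the most recent `window` days, by Euclidean distance.

    returns: 1-D array of daily returns. Returns the indices of the k
    closest matches, each pointing at the day *after* a matching window.
    """
    recent = returns[-window:]
    # candidate windows must leave at least one later day to learn from
    n_candidates = len(returns) - window - 1
    dists = np.array([
        np.linalg.norm(returns[i:i + window] - recent)
        for i in range(n_candidates)
    ])
    nearest = np.argsort(dists)[:k]
    return nearest + window          # index of the day following each window
```

An agent could then use the average return on the days following the matched windows to decide how to tilt its portfolio for the next day.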
I am testing these strategies on daily data from the New York Stock Exchange and the JSE. The agents are then tested against higher-frequency data, for example on minute timescales.
Our research is important from the perspective of learning how to manage big-data pipelines and high-performance computing infrastructure, but also for making such technologies and techniques more accessible when teaching the next generation of applied mathematicians.
For us, however, this is primarily about understanding our fundamental science questions: Are similar patterns of information detectable in high-frequency financial market data? How can one represent financial markets on the microscale? Can this be used for autonomous unsupervised portfolio control and, if so, on what timescales?
The implications for our understanding of stock markets are interesting because not only do these types of strategies provide simple computer-based learning agents the ability to outperform human traders and investors, they can also teach us something about how predictable or unpredictable stock markets are.
Some of the parameters in the models that define the trading agents are lengths of time: how far ahead to make predictions. This work can therefore also teach us something about predictions and the timescales over which they remain useful before they are washed away by other news or information.
I am also collaborating with scientific and research systems managers Shunmuga Pillay and Brian Miastry, from mathematical sciences support services at the University of the Witwatersrand, to build and manage a 64-CPU distributed computing framework and workflow using the Matlab MDCE model for both our research projects and for teaching the use of high-performance cloud computing.
With the massive data sets prepared by graduate students Michael Harvey and Dieter Hendricks in conjunction with our research group, we use this distributed computing facility to try to better understand predictability on the JSE. This high-powered computing facility is central to our big-data projects and forms a framework directly and easily accessible to students in the school of computer science and applied mathematics, such as those in the advanced mathematics of finance programme.
I have successfully defended my master's-by-research project proposal at Wits University while preparing the required data sets and computational infrastructure. I plan two papers: one to be presented at an international conference in December 2015, and another being prepared as a research article.
Fayyaaz Loonat is a master’s student at Wits University’s school of computer science and applied mathematics
The quest for quick-time prediction and data-driven perfection
I am investigating whether the high-frequency trade price of an asset trading on the JSE can be predicted on the fastest timescales and, if so, how the ability to predict changes with time.
For example, is there enough information, with low enough noise, to make a useful prediction one trade ahead into the future, or one second ahead, perhaps 50 seconds, or three minutes? After what length of time are predictions no longer possible? Five minutes? Fifteen minutes?

Trades are events that occur irregularly in time and are the result of buyers and sellers exchanging shares of an asset. The trade price is the price at which a single exchange takes place. To predict the trade price, this research uses ideas from nonlinear time series analysis to build a model. The model needs to be represented in terms of variables that describe the price behaviour from an observed price sequence. The sequence of prices we are interested in is a special version of what one actually measures in real financial markets.

With such a model, one can then use machine-learning techniques, such as nearest-neighbour searches, to find similar pieces of information in the past, patterns in high-frequency trade prices, and use these regularities to understand predictions and how long these periods of predictability can last.

The implications for our understanding of stock markets are profound. Of course, if the trade price is predictable and this predictability is exploitable, an agent can use that information to inform how to trade and invest in real markets. More important, though, is that if we can understand different kinds of regularities in financial markets, perhaps we can better understand which properties are common to all markets and which are the result of how particular markets function, or of their different components. This can teach us something about how markets should be built to better serve society, the ultimate end user of financial markets.

Working with colleagues, I am developing various data-processing techniques for the many terabytes of financial data we need to manage to estimate and investigate the models.
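The kind of nearest-neighbour machinery described above can be sketched as follows, under simplifying assumptions of my own (a scalar price series, a plain delay embedding, and Euclidean distance; the research models are more elaborate):

```python
import numpy as np

def nn_forecast(prices, dim=4, horizon=1):
    """Nearest-neighbour forecast on a delay embedding.

    Embed the series in `dim`-dimensional delay vectors, find the past
    vector closest to the most recent one, and read off what happened
    `horizon` steps after that neighbour.
    """
    prices = np.asarray(prices, dtype=float)
    T = len(prices)
    # delay vectors: row t is (p[t], p[t+1], ..., p[t+dim-1])
    states = np.array([prices[t:t + dim] for t in range(T - dim + 1)])
    current = states[-1]
    # only neighbours with `horizon` known future values are candidates
    n_cand = T - dim - horizon + 1
    dists = np.linalg.norm(states[:n_cand] - current, axis=1)
    best = np.argmin(dists)
    return prices[best + dim - 1 + horizon]
```

Sweeping `horizon` over increasing values, and comparing the forecasts with what actually happened, is one way to probe the timescale on which predictability dies out.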
The construction and management of a data pipeline is central to any data-driven project, and more so for big-data projects, in which significant time and care are required to prepare and manage the data: small mistakes in data processing and management have significant consequences for everything downstream. Moving large data sets around for computation is a bottleneck in the pipeline and has to be managed with thoughtful planning.

Michael Harvey is a master's student at Wits University's school of computer science and applied mathematics