Machine Learning Prediction Model for SPY

A quantitative research project using machine learning to predict 15-day forward returns on SPY, with multiple models tested and validated against buy-and-hold performance.

The Brief

The client wanted to explore whether a machine learning model could identify market conditions in SPY where the probability of a positive return over the next 15 days was high enough to act on, with better risk-adjusted performance than simply holding the market.

What We Did

Data Collection: We built a custom MQL5 tool to calculate and record indicator values directly from the chart into a structured dataset covering over 5 years of SPY daily history. This ensured the data matched exactly what a trader would see.

Variable Analysis: Before building any model, we analysed each variable independently against forward returns to understand which conditions were actually predictive. The feature set covered RSI, CCI, ADX, and Z-Score, along with their rate of change over multiple lookback periods, and directional momentum over 15-, 25-, and 40-day windows.
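A minimal sketch of this kind of single-variable check, assuming plain Python lists of daily closes and indicator values. The function names, the bucket edges, and the use of a simple hit rate are illustrative assumptions, not the project's actual tooling.

```python
def forward_returns(closes, horizon=15):
    """Simple return from day t to day t + horizon; None where the future is unknown."""
    out = []
    for t in range(len(closes)):
        if t + horizon < len(closes):
            out.append(closes[t + horizon] / closes[t] - 1.0)
        else:
            out.append(None)  # not enough future data to label this day
    return out


def hit_rate_by_bucket(feature, fwd, edges):
    """Share of positive forward returns within each bucket [edges[i], edges[i+1])."""
    stats = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        rets = [r for x, r in zip(feature, fwd) if r is not None and lo <= x < hi]
        stats[(lo, hi)] = sum(r > 0 for r in rets) / len(rets) if rets else None
    return stats
```

Comparing hit rates across buckets of one indicator (for example, hypothetical RSI ranges of 0 to 30, 30 to 70, and 70 to 100) gives a quick read on whether that variable separates favourable from unfavourable conditions before any model is fitted.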

Model Testing: We tested multiple approaches: logistic regression, random forest, gradient boosting, and ensemble combinations. Each model was evaluated on training, validation, and test sets split chronologically to avoid look-ahead bias. Random forest showed clear signs of overfitting. Gradient boosting performed comparably to logistic regression but offered no meaningful improvement. Logistic regression produced the most reliable and generalisable results.
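The chronological split can be sketched as follows. The 60/20/20 proportions and the optional `gap` parameter are assumptions for illustration; the write-up does not state the actual split sizes or whether a buffer was left between sets.

```python
def chronological_split(rows, train_pct=60, val_pct=20, gap=0):
    """Split time-ordered rows into train, validation, and test without shuffling.

    `gap` (hypothetical) drops that many rows at the start of the validation
    and test sets, so forward-looking labels from the previous set cannot
    overlap with them.
    """
    n = len(rows)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    train = rows[:train_end]
    val = rows[train_end + gap:val_end]
    test = rows[val_end + gap:]
    return train, val, test
```

Because the rows are never shuffled, every training observation precedes every validation observation, which in turn precedes every test observation. With a 15-day forward label, a gap of 15 rows would be one way to keep labels from straddling a boundary.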

Strategy Logic: Rather than investing all capital at once on a signal, the strategy allocates a small slice of capital on each day a buy signal fires, spreading entries across a 15-day window. This staggered entry approach smooths out timing risk. On average, the model kept around 50% of capital deployed, committing only when signals were strong.
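One way to picture the staggered entry is the sketch below. It assumes each signal day commits an equal 1/15 slice of capital that stays open for 15 days; the equal slice size and the fixed holding period are assumptions, since the exact sizing rule is not given here.

```python
def deployed_fraction(signals, window=15):
    """Fraction of capital deployed each day.

    Assumes every signal day (1) opens a slice worth 1/window of capital
    that is held for `window` days, and non-signal days (0) open nothing.
    """
    deployed = []
    for t in range(len(signals)):
        # Slices opened within the last `window` days (today included) are still open.
        active = sum(signals[max(0, t - window + 1): t + 1])
        deployed.append(active / window)
    return deployed
```

Under these assumptions, capital is fully deployed only after 15 consecutive signal days, and exposure decays gradually once signals stop, so a single mistimed signal puts only a small slice at risk.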

Results

On the combined validation and test set, the model produced a 34.5% cumulative return with a maximum drawdown of 6.87%, giving a return-to-drawdown ratio of 5.02. Across the full dataset, it returned 90.93% with an 11.84% max drawdown and a ratio of 7.68.
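The ratios quoted above are the cumulative return divided by the maximum drawdown. A short sketch of both calculations follows; the function names are illustrative, and the peak-to-trough definition of drawdown is the conventional one, assumed here.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def return_to_drawdown(cum_return_pct, max_dd_pct):
    """Cumulative return divided by maximum drawdown, both in percent."""
    return cum_return_pct / max_dd_pct


print(round(return_to_drawdown(34.5, 6.87), 2))    # 5.02 (validation + test)
print(round(return_to_drawdown(90.93, 11.84), 2))  # 7.68 (full dataset)
```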

The model achieves this while keeping only around 50% of capital deployed on average, committing selectively when signals are strong rather than holding through all market conditions. This results in significantly lower drawdowns and smoother capital growth than passive exposure.