Imports
import pandas as pdimport numpy as npfrom fbprophet import Prophetimport matplotlib.pyplot as pltfrom functools import reduce%matplotlib inlineimport warningswarnings.filterwarnings('ignore') plt.style.use('seaborn-deep')pd.options.display.float_format = "{:,.2f}".format
For this project we will be importing the standard libraries for data anaysis with Python. We will also import Prophet and reduce from functools which will be used to help simulate our Forecasts.
The Data
stock_price = pd.read_csv('^GSPC.csv',parse_dates=['Date'])
stock_price.info()
<class 'pandas.core.frame.DataFrame'>RangeIndex: 9885 entries, 0 to 9884Data columns (total 7 columns):Date 9885 non-null datetime64[ns]Open 9885 non-null float64High 9885 non-null float64Low 9885 non-null float64Close 9885 non-null float64Adj Close 9885 non-null float64Volume 9885 non-null int64dtypes: datetime64[ns](1), float64(5), int64(1)memory usage: 540.7 KB
stock_price.describe()
Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|
count | 9,885.00 | 9,885.00 | 9,885.00 | 9,885.00 | 9,885.00 | 9,885.00 |
mean | 958.64 | 964.26 | 952.65 | 958.86 | 958.86 | 1,628,930,637.33 |
std | 696.19 | 699.39 | 692.67 | 696.24 | 696.24 | 1,765,031,087.55 |
min | 98.22 | 99.58 | 94.23 | 98.22 | 98.22 | 14,990,000.00 |
25% | 326.24 | 328.00 | 322.98 | 326.35 | 326.35 | 169,010,000.00 |
50% | 955.40 | 965.38 | 949.45 | 955.41 | 955.41 | 805,900,000.00 |
75% | 1,347.74 | 1,355.87 | 1,336.36 | 1,347.74 | 1,347.74 | 3,150,330,000.00 |
max | 2,936.76 | 2,940.91 | 2,927.11 | 2,930.75 | 2,930.75 | 11,456,230,000.00 |
The data we are using is the historical S&P500 prices dating back to 1980. You can find the data here.
Data Preparation
stock_price = stock_price[['Date','Adj Close']]
stock_price.columns = ['ds', 'y']stock_price.head(10)
ds | y | |
---|---|---|
0 | 1980-01-02 | 105.76 |
1 | 1980-01-03 | 105.22 |
2 | 1980-01-04 | 106.52 |
3 | 1980-01-07 | 106.81 |
4 | 1980-01-08 | 108.95 |
5 | 1980-01-09 | 109.05 |
6 | 1980-01-10 | 109.89 |
7 | 1980-01-11 | 109.92 |
8 | 1980-01-14 | 110.38 |
9 | 1980-01-15 | 111.14 |
For prophet to work, we need to change the names of the 'Date' and 'Adj Close' columns to 'ds' and 'y'. The term 'y' is typically used for the target column (what you are trying to predict) in most machine learning projects.
Prophet
stock_price.set_index('ds').y.plot(figsize=(12,6), grid=True);
Before we use Prophet to create a forecast let's visualize our data. It's always a good idea to create a few visualitions to gain a better understanding of the data you are working with.
model = Prophet()model.fit(stock_price)
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
<fbprophet.forecaster.Prophet at 0x21216301c18>
To activate the Prophet Model we simply call Prophet()
and assign it to a variabl called model
. Next fit our stock data to the model by calling the fit
method.
future = model.make_future_dataframe(1095, freq='d')future_boolean = future['ds'].map(lambda x : True if x.weekday() in range(0, 5) else False)future = future[future_boolean] future.tail()
ds | |
---|---|
10973 | 2022-03-07 |
10974 | 2022-03-08 |
10975 | 2022-03-09 |
10976 | 2022-03-10 |
10977 | 2022-03-11 |
To create a forecast with our model we need to create some futue dates. Prophet provides us with a helper function called make_future_dataframe
. We pass in the number of future periods and frequency. Above we created a forecast for the next 1095 days or 3 years.
Since stocks can only be traded on weekdays we need to remove the weekends from our forecast dataframe. To do so we create a boolean expression where if a day does not equal 0 - 4 then return False. "0 = Monday, 6=Saturday, etc.."
We then pass the boolean expression to our dataframe with returns only True values. We now have a forecast dataframe comprised of the next 3 years of weekdays.
forecast = model.predict(future)
forecast.tail()
ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | weekly | weekly_lower | weekly_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10661 | 2022-03-07 | 3,398.27 | 3,043.92 | 3,765.33 | 3,050.21 | 3,757.76 | 0.46 | 0.46 | 0.46 | 0.90 | 0.90 | 0.90 | -0.44 | -0.44 | -0.44 | 0.00 | 0.00 | 0.00 | 3,398.73 |
10662 | 2022-03-08 | 3,398.82 | 3,035.27 | 3,784.78 | 3,049.62 | 3,758.73 | 1.09 | 1.09 | 1.09 | 1.40 | 1.40 | 1.40 | -0.30 | -0.30 | -0.30 | 0.00 | 0.00 | 0.00 | 3,399.92 |
10663 | 2022-03-09 | 3,399.38 | 3,044.68 | 3,769.69 | 3,049.04 | 3,759.69 | 1.25 | 1.25 | 1.25 | 1.37 | 1.37 | 1.37 | -0.12 | -0.12 | -0.12 | 0.00 | 0.00 | 0.00 | 3,400.63 |
10664 | 2022-03-10 | 3,399.93 | 3,034.44 | 3,796.22 | 3,048.49 | 3,760.65 | 1.43 | 1.43 | 1.43 | 1.32 | 1.32 | 1.32 | 0.10 | 0.10 | 0.10 | 0.00 | 0.00 | 0.00 | 3,401.36 |
10665 | 2022-03-11 | 3,400.49 | 3,040.25 | 3,798.08 | 3,047.96 | 3,761.62 | 1.48 | 1.48 | 1.48 | 1.11 | 1.11 | 1.11 | 0.37 | 0.37 | 0.37 | 0.00 | 0.00 | 0.00 | 3,401.97 |
To create the forecast we call predict
from our model
and pass in the future
dataframe we created earlier. We return the results in a new dataframe called forecast
.
When we inspect the forecast
dataframe we see a bunch of new terms. The one we are most interested in is yhat
which is our forecasted value.
model.plot(forecast);
model.plot_components(forecast);
All the new fields appear a bit daunting but fortunately Prophet comes with two handy visualization helpers, plot
and plot_components
. The plot
functions creates a graph of our actuals and forecast and plot_components
provides us a graph of our trend and seasonality.
stock_price_forecast = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]df = pd.merge(stock_price, stock_price_forecast, on='ds', how='right')df.set_index('ds').plot(figsize=(16,8), color=['royalblue', "#34495e", "#e74c3c", "#e74c3c"], grid=True);
The visualization helpers are just using the data in our forecast
dataframe. We can actually recreate the same graphs. Above I recreated the plot
graph.
Simulating Forecasts
While the 3 year forecast we created above is pretty cool we don't want to make any trading decisions on it without backtesting the performance and a trading strategy.
In this section we will simulate as if Prophet existed back in 1980 and we used it to creat a monthly forecast through 2019. We will then use this data in the following section to simulate how various trading strategies did vs if we just bought and held on to the stock.
stock_price['dayname'] = stock_price['ds'].dt.day_name()stock_price['month'] = stock_price['ds'].dt.monthstock_price['year'] = stock_price['ds'].dt.yearstock_price['month/year'] = stock_price['month'].map(str) + '/' + stock_price['year'].map(str) stock_price = pd.merge(stock_price, stock_price['month/year'].drop_duplicates().reset_index(drop=True).reset_index(), on='month/year', how='left')stock_price = stock_price.rename(columns={'index':'month/year_index'})
stock_price.tail()
ds | y | dayname | month | year | month/year | month/year_index | |
---|---|---|---|---|---|---|---|
9880 | 2019-03-08 | 2,743.07 | Friday | 3 | 2019 | 3/2019 | 470 |
9881 | 2019-03-11 | 2,783.30 | Monday | 3 | 2019 | 3/2019 | 470 |
9882 | 2019-03-12 | 2,791.52 | Tuesday | 3 | 2019 | 3/2019 | 470 |
9883 | 2019-03-13 | 2,810.92 | Wednesday | 3 | 2019 | 3/2019 | 470 |
9884 | 2019-03-14 | 2,808.48 | Thursday | 3 | 2019 | 3/2019 | 470 |
Before we simulate the monthly forecasts we need to add some columns to our stock_price
dataframe we created in the beginning of this project to make it a bit easier to work with. We add month, year, month/year, and month/year_index.
loop_list = stock_price['month/year'].unique().tolist()max_num = len(loop_list) - 1forecast_frames = []for num, item in enumerate(loop_list): if num == max_num: pass else: df = stock_price.set_index('ds')[ stock_price[stock_price['month/year'] == loop_list[0]]['ds'].min():\ stock_price[stock_price['month/year'] == item]['ds'].max()] df = df.reset_index()[['ds', 'y']] model = Prophet() model.fit(df) future = stock_price[stock_price['month/year_index'] == (num + 1)][['ds']] forecast = model.predict(future) forecast_frames.append(forecast)
stock_price_forecast = reduce(lambda top, bottom: pd.concat([top, bottom], sort=False), forecast_frames)stock_price_forecast = stock_price_forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]stock_price_forecast.to_csv('stock_price_forecast.csv', index=False)
Above is a lot but essentially we are looping through each unique month/year
in the stock_price and fitting the Prophet
model with the stock data available to that period and then forecasting out one month ahead. We continue to do this until we hit the last unique month/year
. Finally we combine these forecasts into a single dataframe called stock_price_forecast
. I save the results as it take a while to run and in case I need to reset I can pull the csv file instead of running the model again.
stock_price_forecast = pd.read_csv('stock_price_forecast.csv', parse_dates=['ds'])
df = pd.merge(stock_price[['ds','y', 'month/year_index']], stock_price_forecast, on='ds')df['Percent Change'] = df['y'].pct_change()df.set_index('ds')[['y', 'yhat', 'yhat_lower', 'yhat_upper']].plot(figsize=(16,8), color=['royalblue', "#34495e", "#e74c3c", "#e74c3c"], grid=True)
<matplotlib.axes._subplots.AxesSubplot at 0x2121698dc18>
df.head()
ds | y | month/year_index | yhat | yhat_lower | yhat_upper | Percent Change | |
---|---|---|---|---|---|---|---|
0 | 1980-02-01 | 115.12 | 1 | 115.23 | 114.60 | 115.85 | nan |
1 | 1980-02-04 | 114.37 | 1 | 115.74 | 115.12 | 116.46 | -0.01 |
2 | 1980-02-05 | 114.66 | 1 | 116.05 | 115.38 | 116.75 | 0.00 |
3 | 1980-02-06 | 115.72 | 1 | 116.79 | 116.15 | 117.50 | 0.01 |
4 | 1980-02-07 | 116.28 | 1 | 116.50 | 115.79 | 117.23 | 0.00 |
Finally we combine our forecast with the actual prices and create a Percent Change
column which will be used in our Trading Algorithms below. Lastly, I plot the forecasts with the actuals to see how well it did. As you can see there is a bit of a delay. It kind of behaves a lot like a moving average would.
Trading Algorithms
df['Hold'] = (df['Percent Change'] + 1).cumprod()df['Prophet'] = ((df['yhat'].shift(-1) > df['yhat']).shift(1) * (df['Percent Change']) + 1).cumprod()df['Prophet Thresh'] = ((df['y'] > df['yhat_lower']).shift(1)* (df['Percent Change']) + 1).cumprod()df['Seasonality'] = ((~df['ds'].dt.month.isin([8,9])).shift(1) * (df['Percent Change']) + 1).cumprod()
Above we create four initial trading algorithms:
- Hold: Our bench mark. This is a buy and hold strategy. Meaning we buy the stock and hold on to it until the end time period.
- Prophet: This strategy is to sell when our forecast indicates a down trend and buy back in when it iindicates an upward trend
- Prophet Thresh: This strategy is to only sell when the stock price fall below our yhat_lower boundary.
- Seasonality: This strategy is to exit the market in August and re-enter in Ocober. This was based on the seasonality chart from above.
(df.dropna().set_index('ds')[['Hold', 'Prophet', 'Prophet Thresh','Seasonality']] * 1000).plot(figsize=(16,8), grid=True)print(f"Hold = {df['Hold'].iloc[-1]*1000:,.0f}")print(f"Prophet = {df['Prophet'].iloc[-1]*1000:,.0f}")print(f"Prophet Thresh = {df['Prophet Thresh'].iloc[-1]*1000:,.0f}")print(f"Seasonality = {df['Seasonality'].iloc[-1]*1000:,.0f}")
Hold = 24,396Prophet = 13,366Prophet Thresh = 17,087Seasonality = 30,861
Above we plot the results simulating an initial investment of $1,000 dollars. As you can see our Seasonality did best and our benchmark strategy of Hold did the second best. Both Prophet based strategies didn't do so well. Let's see if we can improve the Prophet Thresh
buy optimizing the threshold.
performance = {}for x in np.linspace(.9,.99,10): y = ((df['y'] > df['yhat_lower']*x).shift(1)* (df['Percent Change']) + 1).cumprod() performance[x] = y best_yhat = pd.DataFrame(performance).max().idxmax()pd.DataFrame(performance).plot(figsize=(16,8), grid=True);f'Best Yhat = {best_yhat:,.2f}'
'Best Yhat = 0.92'
Above we loop through various percents of the thresh to find the optimal thresh. It appears the best threshhold is 92% of our current yhat_lower.
df['Optimized Prophet Thresh'] = ((df['y'] > df['yhat_lower'] * best_yhat).shift(1) * (df['Percent Change']) + 1).cumprod()
(df.dropna().set_index('ds')[['Hold', 'Prophet', 'Prophet Thresh', 'Seasonality', 'Optimized Prophet Thresh']] * 1000).plot(figsize=(16,8), grid=True)print(f"Hold = {df['Hold'].iloc[-1]*1000:,.0f}")print(f"Prophet = {df['Prophet'].iloc[-1]*1000:,.0f}")print(f"Prophet Thresh = {df['Prophet Thresh'].iloc[-1]*1000:,.0f}")print(f"Seasonality = {df['Seasonality'].iloc[-1]*1000:,.0f}")print(f"Optimized Prophet Thresh = {df['Optimized Prophet Thresh'].iloc[-1]*1000:,.0f}")
Hold = 24,396Prophet = 13,366Prophet Thresh = 17,087Seasonality = 30,861Optimized Prophet Thresh = 36,375
Above we see our new Optimized Prophet Thresh
is the best trading stratege. Unfortunately, both the Seasonaity
and Optimized Prophet Thresh
are both cheating since they are using data from the future that wouldn't be available at the time of our trade. We are going to need to create an Optimized Thresh for each curent point in time of our Forecast.
fcst_thresh = {}for num, index in enumerate(df['month/year_index'].unique()): temp_df = df.set_index('ds')[ df[df['month/year_index'] == df['month/year_index'].unique()[0]]['ds'].min():\ df[df['month/year_index'] == index]['ds'].max()] performance = {} for thresh in np.linspace(0, .99, 100): percent = ((temp_df['y'] > temp_df['yhat_lower'] * thresh).shift(1)* (temp_df['Percent Change']) + 1).cumprod() performance[thresh] = percent best_thresh = pd.DataFrame(performance).max().idxmax() if num == len(df['month/year_index'].unique())-1: pass else: fcst_thresh[df['month/year_index'].unique()[num+1]] = best_thresh
fcst_thresh = pd.DataFrame([fcst_thresh]).T.reset_index().rename(columns={'index':'month/year_index', 0:'Fcst Thresh'})
fcst_thresh['Fcst Thresh'].plot(figsize=(16,8), grid=True);
Above, like how we created our monthly forecast, we loop through the data and find the optimal thresh percent period to date for that current point in time. As you can see the % of the current thresh jumps around as we get further in to the periods (1/1/1980 - 3/18/2019).
df['yhat_optimized'] = pd.merge(df, fcst_thresh, on='month/year_index', how='left')['Fcst Thresh'].shift(1) * df['yhat_lower']
df['Prophet Fcst Thresh'] = ((df['y'] > df['yhat_optimized']).shift(1)* (df['Percent Change']) + 1).cumprod()
(df.dropna().set_index('ds')[['Hold', 'Prophet', 'Prophet Thresh', 'Prophet Fcst Thresh']] * 1000).plot(figsize=(16,8), grid=True)print(f"Hold = {df['Hold'].iloc[-1]*1000:,.0f}")print(f"Prophet = {df['Prophet'].iloc[-1]*1000:,.0f}")print(f"Prophet Thresh = {df['Prophet Thresh'].iloc[-1]*1000:,.0f}")# print(f"Seasonality = {df['Seasonality'].iloc[-1]*1000:,.0f}")print(f"Prophet Fcst Thresh = {df['Prophet Fcst Thresh'].iloc[-1]*1000:,.0f}")
Hold = 24,396Prophet = 13,366Prophet Thresh = 17,087Prophet Fcst Thresh = 20,620
As we did before we create the new trading strategy and graph it. Unfortunately are results have gotten worse but we did do better then our initial Prophet Thresh
. Instead of calculating the thresh using the full period to date let's try various rolling windows of time like you would see with a moving average(30, 60, 90, etc.).
rolling_thresh = {}for num, index in enumerate(df['month/year_index'].unique()): rolling_performance = {} for roll in range(10, 400, 10): temp_df = df.set_index('ds')[ df[df['month/year_index'] == index]['ds'].min() - pd.DateOffset(months=roll):\ df[df['month/year_index'] == index]['ds'].max()] performance = {} for thresh in np.linspace(.0,.99, 100): percent = ((temp_df['y'] > temp_df['yhat_lower'] * thresh).shift(1)* (temp_df['Percent Change']) + 1).cumprod() performance[thresh] = percent per_df = pd.DataFrame(performance) best_thresh = per_df.iloc[[-1]].max().idxmax() percents = per_df[best_thresh] rolling_performance[best_thresh] = percents per_df = pd.DataFrame(rolling_performance) best_rolling_thresh = per_df.iloc[[-1]].max().idxmax() if num == len(df['month/year_index'].unique())-1: pass else: rolling_thresh[df['month/year_index'].unique()[num+1]] = best_rolling_thresh
rolling_thresh = pd.DataFrame([rolling_thresh]).T.reset_index().rename(columns={'index':'month/year_index', 0:'Fcst Thresh'})
rolling_thresh['Fcst Thresh'].plot(figsize=(16,8), grid=True);
Above is very simliar to before but now we are trying out various moving windows along with various threshold percents. This is getting quite complex. No wonder Quants make so much money. As you can see from above the thresh percents change over time. Now let's see how we did.
df['yhat_optimized'] = pd.merge(df, rolling_thresh, on='month/year_index', how='left')['Fcst Thresh'].fillna(1).shift(1) * df['yhat_lower']
df['Prophet Rolling Thresh'] = ((df['y'] > df['yhat_optimized']).shift(1)* (df['Percent Change']) + 1).cumprod()
(df.dropna().set_index('ds')[['Hold', 'Prophet', 'Prophet Thresh', 'Prophet Fcst Thresh', 'Prophet Rolling Thresh']] * 1000).plot(figsize=(16,8), grid=True)print(f"Hold = {df['Hold'].iloc[-1]*1000:,.0f}")print(f"Prophet = {df['Prophet'].iloc[-1]*1000:,.0f}")print(f"Prophet Thresh = {df['Prophet Thresh'].iloc[-1]*1000:,.0f}")# print(f"Seasonality = {df['Seasonality'].iloc[-1]*1000:,.0f}")print(f"Prophet Fcst Thresh = {df['Prophet Fcst Thresh'].iloc[-1]*1000:,.0f}")print(f"Prophet Rolling Thresh = {df['Prophet Rolling Thresh'].iloc[-1]*1000:,.0f}")
Hold = 24,396Prophet = 13,366Prophet Thresh = 17,087Prophet Fcst Thresh = 20,620Prophet Rolling Thresh = 23,621
As you can see our new Porphet Rolling Thresh
did pretty well but still did't beat out the simpliest Hold
strategy. Perhaps the saying "Time in the Market is better then Timing the Market" has some truth to it.
df['Time Traveler'] = ((df['y'].shift(-1) > df['yhat']).shift(1) * (df['Percent Change']) + 1).cumprod()
(df.dropna().set_index('ds')[['Hold', 'Prophet', 'Prophet Thresh', 'Prophet Fcst Thresh', 'Prophet Rolling Thresh', 'Time Traveler']] * 1000).plot(figsize=(16,8), grid=True)print(f"Hold = {df['Hold'].iloc[-1]*1000:,.0f}")print(f"Prophet = {df['Prophet'].iloc[-1]*1000:,.0f}")print(f"Prophet Thresh = {df['Prophet Thresh'].iloc[-1]*1000:,.0f}")# print(f"Seasonality = {df['Seasonality'].iloc[-1]*1000:,.0f}")print(f"Prophet Fcst Thresh = {df['Prophet Fcst Thresh'].iloc[-1]*1000:,.0f}")print(f"Prophet Rolling Thresh = {df['Prophet Rolling Thresh'].iloc[-1]*1000:,.0f}")print(f"Time Traveler = {df['Time Traveler'].iloc[-1]*1000:,.0f}")
Hold = 24,396Prophet = 13,366Prophet Thresh = 17,087Prophet Fcst Thresh = 20,620Prophet Rolling Thresh = 23,621Time Traveler = 288,513
Above I implemented my Time Traveler
strategy. This of course would be a perfect trading strategy as I know in advance when the market moves up or down. As you can the most you could make is $288,513 from $1,000.
Summary
Time Series Forecasting can be quite complex however Prophet makes it very easy to create robust forecasts with little effort. While it didn't make us rich with its stock market predictions it is still very useful and can be implemented quickly to solve many use cases in various areas.