Abstract
This is a practitioner's exercise and, as such, mathematical and statistical theory will be kept to a minimum for easier reading, but embedded links on theories and further background are highlighted for the interested reader.
Bitcoin (BTC/USD) price is modeled as a stochastic process following a fractional Brownian motion (fBm), with a Hurst exponent (H) used to measure the long-term memory in the time series. Monte Carlo simulations were performed on this model to extend historical data and forecast Bitcoin price. Out-of-sample simulation results were accurate to within ~10% of current prices. The 180-day (6-month) most probable (median) forward-looking Bitcoin price prediction is ~USD 14,211 by May 2018, implying upside risk of ~95%. In addition, within this time frame, quantile risk/loss estimates show that there is only a 5% tail-end risk of a drop back to the ~$2,000 price level (a ~70% price drop).
Introduction
This simulation exercise is a practical extension of an original paper by Tarnopolski (2017). Here we have used a ~30% larger historical set of aggregated Bitcoin price exchange data from Cryptocompare.com (vs data from BitcoinCharts in the initial paper). We examined a variety of different methods (including R/S, time domain partitioning, d4 & d6 Wavelet transforms and a Wavelet lifting transform) to estimate the Hurst exponent from the historical data (vs using a Haar (d2) wavelet transform in the original paper). Also, our engine for simulating discrete fractional Brownian motion paths is the more robust dvfBm package in R (see Achard & Coeurjolly 2009), compared to the Mathematica FractionalBrownianMotionProcess implementation used by Tarnopolski.
Why model Bitcoin as a fractional Brownian motion (fBm)?
The fractional Brownian motion (fBm) is a popular model for both short-range dependent and long-range dependent phenomena in various fields, including physics, biology, hydrology, network research, financial mathematics etc. There are many good sources devoted to the fBm. For financial time series, there have been studies based on fBm and estimation of the Hurst exponent, including an analysis of the S&P500 using one-minute data examined over 11.5 years! (see Bayraktar, Poor & Sircar 2003)
In our case a large number (20,000) of fBm paths will be simulated, with drift (“mean”) (μ), volatility (“standard deviation”) (σ) and the Hurst exponent (H) as input parameters, and inserted into the solution
X(t) = X(0) · exp( μ·t + σ·B(H,t) )
of the stochastic differential equation for a geometric fBm:
dX(t) = μ·X(t)·dt + σ·X(t)·dB(H,t)
where X(0) is taken as the last historical price in the data set being examined, and B(H,t) is a generated fBm path with H as an input.
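As a rough illustration of what "simulating fBm paths" involves, here is a minimal Python sketch. It assumes the geometric form X(t) = X(0)·exp(μ·t + σ·B(H,t)) with daily time steps, and it generates B(H,t) with a plain Cholesky factorisation of the increment covariance. That is a simple textbook technique, not the dvfBm routine used for the results in this post, and the function names `fgn_cov` and `simulate_gfbm` are made up for this example.

```python
import numpy as np

def fgn_cov(n, hurst):
    """Covariance matrix of n unit-step fBm increments (fractional Gaussian noise)."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + np.abs(k - 1) ** (2 * hurst))

def simulate_gfbm(x0, mu, sigma, hurst, n_days, n_paths, seed=0):
    """Return an (n_paths, n_days + 1) array of simulated prices; column 0 is x0."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(fgn_cov(n_days, hurst))   # lower-triangular factor
    z = rng.standard_normal((n_paths, n_days))          # independent N(0, 1) draws
    increments = z @ chol.T                             # correlated daily increments
    b_h = np.cumsum(increments, axis=1)                 # B(H, t) for t = 1..n_days
    t = np.arange(1, n_days + 1)
    prices = x0 * np.exp(mu * t + sigma * b_h)
    return np.hstack([np.full((n_paths, 1), float(x0)), prices])

# Small demo using the data set (B) parameters quoted later in the post
# (2,000 paths here just to keep it quick; the post uses 20,000).
paths = simulate_gfbm(x0=1752.0, mu=0.0072, sigma=0.0962, hurst=0.5453,
                      n_days=180, n_paths=2000)
print(paths.shape, np.median(paths[:, -1]))
```

Cholesky costs O(n³) in the path length, which is fine for 180 steps but slow for long paths; faster exact methods such as circulant embedding exist for that case.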
And how is the Hurst (H) exponent involved?
The Hurst exponent for a data set provides a measure of whether the data is a pure “white noise” random process or has underlying trends (i.e. some degree of autocorrelation). When the autocorrelation decays very slowly, the process is sometimes referred to as a long memory process. Processes that we might assume are purely white noise may sometimes turn out to exhibit Hurst exponent statistics for long memory processes (they are “colored noise”). Unfortunately, the Hurst exponent cannot be calculated in closed form and must instead be estimated from a data set; hence there are different methods to try and arrive at a stable and accurate estimate, which in itself is a whole area of research!
By definition the estimated value of H (where 0<H<1) determines what kind of process the fBm is:
- if H = 0.5 then the process is in fact a simple Brownian motion (“random motion”)
- if 0.5 < H < 1 then the process increments are positively correlated (prices tend to “trend” more)
- if 0 < H < 0.5 then the process increments are negatively correlated (prices tend to be more “mean reverting”)
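The three cases above follow from the standard autocorrelation of unit-step fBm increments, ρ(k) = ½(|k+1|^2H − 2|k|^2H + |k−1|^2H). A small Python sketch (the function name `fgn_autocorr` is just for illustration) shows how the sign of the lag-1 correlation flips around H = 0.5:

```python
def fgn_autocorr(lag, hurst):
    """Autocorrelation of unit-step fBm increments at an integer lag."""
    k = abs(lag)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))

# H below, at, and above 0.5 (0.5713 is the full-data-set estimate in this post)
for h in (0.3, 0.5, 0.5713):
    print(f"H = {h}: lag-1 autocorrelation = {fgn_autocorr(1, h):+.4f}")
```

At lag 1 the expression reduces to 2^(2H−1) − 1, which is negative for H < 0.5, zero at H = 0.5 and positive for H > 0.5.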
Data and Method:
Data: The examined data set spans from 2010-07-17 to 2017-11-06 generating a total of 2670 daily price observations below:
Hurst exponent: To estimate H, the variety of methods described earlier were examined. In the end, a more robust “Wavelet lifting transform” method was chosen, which has been documented to cope well with irregularities and missing data. (In one line: it is the spectral analysis of a fitted linear interpolation wavelet transform as a high/low pass filter to a data set.) (See Knight, Nason & Nunes (2016).)
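For readers who want to experiment, the simplest of the methods mentioned earlier, rescaled range (R/S), can be sketched in a few lines of Python. To be clear, this is NOT the wavelet lifting estimator used for the results in this post; it is a basic, uncorrected R/S estimate that is known to be biased on short samples, and the function name `hurst_rs` and its window choices are assumptions for this example.

```python
import numpy as np

def hurst_rs(returns, min_window=8):
    """Basic rescaled-range (R/S) estimate of the Hurst exponent of a return series."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    log_w, log_rs = [], []
    w = min_window
    while w <= n // 2:                                    # window sizes double each step
        chunks = returns[: (n // w) * w].reshape(-1, w)   # non-overlapping windows
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = dev.max(axis=1) - dev.min(axis=1)             # range of cumulative deviations
        s = chunks.std(axis=1)                            # per-window standard deviation
        ok = s > 0
        if ok.any():
            log_w.append(np.log(w))
            log_rs.append(np.log(np.mean(r[ok] / s[ok])))
        w *= 2
    if len(log_w) < 2:
        raise ValueError("series too short for an R/S estimate")
    slope, _ = np.polyfit(log_w, log_rs, 1)               # slope of log(R/S) vs log(window)
    return slope

# Demo on synthetic white noise, which has no long memory
rng = np.random.default_rng(0)
print(hurst_rs(rng.standard_normal(4096)))
```

The input should be the log returns, not the prices. Different estimators can disagree noticeably on the same data, which is one reason the standard deviation of the estimate is reported alongside H.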
Method: We will use 2 data sets:
(A) the full data set of 2670 days from 2010-07-17 to 2017-11-06,
(B) the full data set minus the last 180 days (~7% of the data), which are left for out-of-sample testing. So this data set runs from 2010-07-17 to 2017-05-10.
Key parameters for both data sets will be calculated including log-returns, drift (μ), volatility (σ) and the Hurst exponent (H).
A large number of fBm paths will be simulated on the smaller data set (B) to check the validity and accuracy of the simulated forecast compared to current price levels.
Finally, the values of the full data set (A) will be used to make a 180-day forward-looking forecast on Bitcoin price.
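The parameter step can be sketched as follows in Python, assuming the daily closes are already loaded into an array in date order. The function names are hypothetical and the price series in the demo is synthetic, not the Cryptocompare data.

```python
import numpy as np

def split_out_of_sample(closes, horizon=180):
    """Split daily closes into an in-sample part and the last `horizon` days."""
    closes = np.asarray(closes, dtype=float)
    return closes[:-horizon], closes[-horizon:]

def drift_and_volatility(closes):
    """Daily drift (mean) and volatility (sample std dev) of the log returns."""
    log_returns = np.diff(np.log(np.asarray(closes, dtype=float)))
    return log_returns.mean(), log_returns.std(ddof=1)

# Demo on a synthetic series with the same length as data set (A)
rng = np.random.default_rng(0)
fake_closes = 0.05 * np.exp(np.cumsum(rng.normal(0.007, 0.09, size=2670)))
in_sample, hold_out = split_out_of_sample(fake_closes, horizon=180)
print(len(in_sample), len(hold_out), drift_and_volatility(in_sample))
```

Here μ and σ are per-day quantities, so the time variable in the simulation is measured in days as well.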
Key computed values:
The following drift and volatility parameters were calculated from each data set of daily log returns. The Hurst exponent was estimated via Wavelet lifting transform.
Parameter | Data set (A) | Data set (B) |
Drift (µ) | 0.0073 | 0.0072 |
Volatility (σ) | 0.0938 | 0.0962 |
Hurst (H) | 0.5713 | 0.5453 |
Hurst Std Dev | +/-0.0860 | +/-0.1290 |
It is interesting to note that the smaller data set (B) actually has higher volatility than the full data set (A), which has the Bitcoin price climbing to $7296 at the close of 2017-11-06 vs a close of $1752 at the end of data set (B).
Results!
(1) Out of Sample testing – Smaller Data Set B:
180-day forward-looking fBm paths were simulated from the closing price ($1752) of data set (B) to see how close the median forecast would be to the actual close on the 180th day (i.e. $7296 on 2017-11-06).
Sample paths taken from 20,000 Simulated fBm trajectories:
Clearly some individual paths hit Bitcoin prices of over $100,000 (!), but we are interested in examining the overall distribution of simulated paths. This can be seen via the empirical Probability Density Function (PDF), which is overlaid below with a fitted log-normal curve. This yielded a median price (blue dot) of $6799 and a mean price (red dot) of $14,000:
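For a log-normal distribution the median (i.e. exp(meanlog)), and not the mean, is the more representative central value: half of the simulated outcomes fall on either side of it, whereas the mean is pulled upward by the long right tail. (Strictly speaking, the single most probable value of a log-normal is its mode, exp(meanlog − sdlog²), which is lower still; the median is used here as the central forecast.) (A good chart here shows the transform relationships of the mean and median between a normal and a lognormal distribution.)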
Hence $6799 is taken as the predicted value for the 180-day forward forecast vs the actual price of $7296 on 2017-11-06. This is an acceptable discrepancy of only ~7%, and its accuracy helps to validate the methodology used.
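For anyone reproducing this step, the summary statistics of a fitted log-normal can be sketched in Python as below. It fits by taking the mean and standard deviation of the log prices, which is an assumption about the fitting method (the fit used for the chart may differ), and the helper names are made up for this example. The last line simply re-checks the ~7% discrepancy from the two prices quoted above.

```python
import numpy as np

def lognormal_summary(terminal_prices):
    """Median, mean and mode of a log-normal fitted to simulated end prices."""
    logs = np.log(np.asarray(terminal_prices, dtype=float))
    meanlog, sdlog = logs.mean(), logs.std(ddof=1)
    return {
        "median": np.exp(meanlog),
        "mean": np.exp(meanlog + 0.5 * sdlog ** 2),
        "mode": np.exp(meanlog - sdlog ** 2),
    }

def relative_error(forecast, actual):
    """Signed forecast error as a fraction of the actual value."""
    return (forecast - actual) / actual

# Demo on synthetic log-normal draws (not the post's simulated paths)
rng = np.random.default_rng(0)
print(lognormal_summary(rng.lognormal(mean=np.log(1000.0), sigma=1.0, size=20000)))

# Out-of-sample check using the two figures quoted in the post
print(f"{relative_error(6799, 7296):+.1%}")
```

The ordering mode < median < mean always holds for a log-normal, which is why the mean sits so far above the median in the chart.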
(2) 180-day forward forecast using Full Data Set (A):
We are now aiming to forecast the most probable value for Bitcoin price 180 days forward from the last data point of the Full Data Set (A) (i.e. May 5th 2018). The same exercise as above is repeated, but this time using the parameters estimated from the full data set.
This yielded the empirical PDF:
The fitted lognormal distribution has a median of $14,211 and a mean of $33,350 (the latter shown just for completeness).
We can go one step further and calculate an empirical Cumulative Distribution Function (CDF) to evaluate (risk/loss) probabilities at any given level:
By definition, the cumulative probability at the median price is 0.5 (i.e. there is a 50% chance that the price ends at or below our forecast median level of $14,211, and a 50% chance that it ends above it). From the empirical CDF we can also calculate specific tail-end risk/loss probabilities, as summarized below:
Est. from CDF | Est. Price ($), 180 days forward | Est. Risk/Loss, 180 days forward |
50% probability | 14,211 | +95% |
10% probability | 3,321 | -54% |
5% probability | 2,095 | -71% |
1% probability | 784 | -89% |
Hence from the simulated paths there is only a 5% chance of a drop to $2,095 or below (a -71% drop) and a 10% chance of a drop to $3,321 or below (a -54% drop). And for upside risk I’ve added a line on the CDF showing that the probability of prices reaching the $20,000 level is ~32%. How’s that for a “blue sky” scenario!
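Reading these levels off the simulated end prices amounts to taking empirical quantiles. A minimal Python sketch, with hypothetical helper names and synthetic demo data rather than the actual simulation output:

```python
import numpy as np

def tail_table(terminal_prices, current_price, probs=(0.50, 0.10, 0.05, 0.01)):
    """Rows of (probability, price level, change vs current price) from empirical quantiles."""
    prices = np.asarray(terminal_prices, dtype=float)
    rows = []
    for p in probs:
        level = np.quantile(prices, p)                  # price with probability p of ending at or below it
        rows.append((p, level, level / current_price - 1.0))
    return rows

def prob_at_or_above(terminal_prices, level):
    """Share of simulated end prices at or above a given level."""
    return float(np.mean(np.asarray(terminal_prices, dtype=float) >= level))

# Demo on synthetic log-normal end prices (illustrative only)
rng = np.random.default_rng(0)
demo_prices = rng.lognormal(mean=np.log(14000.0), sigma=1.0, size=20000)
for p, level, change in tail_table(demo_prices, current_price=7296.0):
    print(f"{p:.0%} probability | {level:,.0f} | {change:+.0%}")
print(prob_at_or_above(demo_prices, 20000.0))
```

The risk/loss column is just the price level divided by the current price, minus one; for example 2,095 / 7,296 − 1 is about -71%.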
Concluding remarks:
Bitcoin price was modeled as a geometric fBm via an estimated Hurst exponent, and a Monte Carlo approach was used to simulate 20,000 paths. Out-of-sample testing showed that the model's predicted level was within ~10% of, and lower than, the actual price. From the overall simulated distribution, the 180-day forward (early May 2018) most probable predicted price is $14,211, which implies about a +95% upside risk from current price levels. Tail-end risk/loss estimates were also calculated at the 1%, 5% & 10% levels.
Caveats: Models are always an approximation of reality! And the key Hurst exponent parameter for the simulated fBm paths is particularly difficult to estimate accurately. The drift, volatility and also the Hurst exponent itself will continuously change over time as market prices and conditions change. Therefore, it is better to perform shorter-term forward forecasts with periodically updated parameters. It is also true that the model cannot predict “Black Swan” events (like the recent cancellation of the upcoming Segwit2X hard fork, which prompted a sudden ~90% drop in B2X futures!) even though it can generate risk/loss probability levels as general guidance.
Disclaimer: This is not investment advice and is a practical example for illustrative purposes only. Please note that historical gains may not be representative of future returns. And as always before any investment, especially in cryptoworld, please thoroughly DYOR. #DoYourOwnResearch
Happy Trading!
AJ
Hey mate,
can the hurst number be calculated on excel easily, or will it be better to use something like R?
Cheers
Great post btw !
Better off using some “built in” function in R (or other language) to do it. If u want to try using Excel for the simpler Rescaled Range model – u would still need to write a script.
@andrewjim111: I tried to replicate your result in R but for some reason it doesn’t work. The fact that standard deviation is really small result in a narrow, symmetric-looking distribution that doesn’t look log-normal. Any ideas on what went wrong?
Hmm not sure .. perhaps check that u are using LOG returns and check the time period of the data. Given the volatility of bitcoin, there’s little chance of a narrow StdDev too.
I am using LOG return. I actually got around the same mean and standard deviation as you did. This is basically the formula I used:
np.exp(mu * t + sigma * simulated_log_returns)
But somehow sigma (which is around .04 for me) is so tiny that e^(sigma*simulated_log_returns) is a really small number.
which kinda makes sense, since the simulated_log_returns (my fractional brownian) is hovering around 1, so e^(sigma*simulated_log_returns) ~ e^(0.04*1) is still a small number.
Do you mind sending me the original R code? I would really appreciate it.
Sorry it’s still proprietary at the moment.