Calculating the RSI indicator from a Pandas DataFrame?

My problem

I tried many libraries on GitHub, but none of them produced results matching TradingView, so I followed the formula on this link to calculate the RSI indicator. I calculated it in Excel and compared the results with TradingView.
I know it's absolutely correct, but I didn't find a way to calculate it with Pandas.

Formula

                      100
        RSI = 100 - --------
                     1 + RS

        RS = Average Gain / Average Loss

    The very first calculations for average gain and average loss are simple
    14-period averages:

        First Average Gain = Sum of Gains over the past 14 periods / 14.
        First Average Loss = Sum of Losses over the past 14 periods / 14

    The second, and subsequent, calculations are based on the prior averages
    and the current gain loss:

        Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
        Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.

Expected Results

           close   change     gain     loss     avg_gain    avg_loss        rs     rsi_14
    0    4724.89      NaN      NaN      NaN          NaN         NaN       NaN        NaN
    1    4378.51  -346.38     0.00   346.38          NaN         NaN       NaN        NaN
    2    6463.00  2084.49  2084.49     0.00          NaN         NaN       NaN        NaN
    3    9838.96  3375.96  3375.96     0.00          NaN         NaN       NaN        NaN
    4   13716.36  3877.40  3877.40     0.00          NaN         NaN       NaN        NaN
    5   10285.10 -3431.26     0.00  3431.26          NaN         NaN       NaN        NaN
    6   10326.76    41.66    41.66     0.00          NaN         NaN       NaN        NaN
    7    6923.91 -3402.85     0.00  3402.85          NaN         NaN       NaN        NaN
    8    9246.01  2322.10  2322.10     0.00          NaN         NaN       NaN        NaN
    9    7485.01 -1761.00     0.00  1761.00          NaN         NaN       NaN        NaN
    10   6390.07 -1094.94     0.00  1094.94          NaN         NaN       NaN        NaN
    11   7730.93  1340.86  1340.86     0.00          NaN         NaN       NaN        NaN
    12   7011.21  -719.72     0.00   719.72          NaN         NaN       NaN        NaN
    13   6626.57  -384.64     0.00   384.64          NaN         NaN       NaN        NaN
    14   6371.93  -254.64     0.00   254.64   931.605000  813.959286  1.144535  53.369848
    15   4041.32 -2330.61     0.00  2330.61   865.061786  922.291480  0.937948  48.399038
    16   3702.90  -338.42     0.00   338.42   803.271658  880.586374  0.912201  47.704239
    17   3434.10  -268.80     0.00   268.80   745.895111  836.887347  0.891273  47.125561
    18   3813.69   379.59   379.59     0.00   719.730460  777.109680  0.926163  48.083322
    19   4103.95   290.26   290.26     0.00   689.053999  721.601845  0.954895  48.846358
    20   5320.81  1216.86  1216.86     0.00   726.754428  670.058856  1.084613  52.029461
    21   8555.00  3234.19  3234.19     0.00   905.856968  622.197509  1.455899  59.281719
    22  10854.10  2299.10  2299.10     0.00  1005.374328  577.754830  1.740140  63.505515

My Code

Import

    import pandas as pd
    import numpy as np

Load data

    df = pd.read_csv("rsi_14_test_data.csv")
    close = df['close']
    print(close)

    0      4724.89
    1      4378.51
    2      6463.00
    3      9838.96
    4     13716.36
    5     10285.10
    6     10326.76
    7      6923.91
    8      9246.01
    9      7485.01
    10     6390.07
    11     7730.93
    12     7011.21
    13     6626.57
    14     6371.93
    15     4041.32
    16     3702.90
    17     3434.10
    18     3813.69
    19     4103.95
    20     5320.81
    21     8555.00
    22    10854.10
    Name: close, dtype: float64

Change

Calculate the change for every row

    change = close.diff(1)
    print(change)

    0         NaN
    1     -346.38
    2     2084.49
    3     3375.96
    4     3877.40
    5    -3431.26
    6       41.66
    7    -3402.85
    8     2322.10
    9    -1761.00
    10   -1094.94
    11    1340.86
    12    -719.72
    13    -384.64
    14    -254.64
    15   -2330.61
    16    -338.42
    17    -268.80
    18     379.59
    19     290.26
    20    1216.86
    21    3234.19
    22    2299.10
    Name: close, dtype: float64

Gain and loss

Get gain and loss from change

    is_gain, is_loss = change > 0, change < 0
    # copy so that zeroing the losses does not also modify `change`
    gain, loss = change.copy(), -change
    gain[is_loss] = 0
    loss[is_gain] = 0
    gain.name = 'gain'
    loss.name = 'loss'
    print(loss)

    0         NaN
    1      346.38
    2        0.00
    3        0.00
    4        0.00
    5     3431.26
    6        0.00
    7     3402.85
    8        0.00
    9     1761.00
    10    1094.94
    11       0.00
    12     719.72
    13     384.64
    14     254.64
    15    2330.61
    16     338.42
    17     268.80
    18       0.00
    19       0.00
    20       0.00
    21       0.00
    22       0.00
    Name: loss, dtype: float64

Calculate first avg gain and loss

Mean of n prior rows

    n = 14
    avg_gain = change * np.nan
    avg_loss = change * np.nan
    # rows 1..n hold the first n changes (row 0 is NaN and is skipped by mean())
    avg_gain[n] = gain[:n+1].mean()
    avg_loss[n] = loss[:n+1].mean()
    avg_gain.name = 'avg_gain'
    avg_loss.name = 'avg_loss'
    avg_df = pd.concat([gain, loss, avg_gain, avg_loss], axis=1)
    print(avg_df)

           gain     loss  avg_gain    avg_loss
    0       NaN      NaN       NaN         NaN
    1      0.00   346.38       NaN         NaN
    2   2084.49     0.00       NaN         NaN
    3   3375.96     0.00       NaN         NaN
    4   3877.40     0.00       NaN         NaN
    5      0.00  3431.26       NaN         NaN
    6     41.66     0.00       NaN         NaN
    7      0.00  3402.85       NaN         NaN
    8   2322.10     0.00       NaN         NaN
    9      0.00  1761.00       NaN         NaN
    10     0.00  1094.94       NaN         NaN
    11  1340.86     0.00       NaN         NaN
    12     0.00   719.72       NaN         NaN
    13     0.00   384.64       NaN         NaN
    14     0.00   254.64   931.605  813.959286
    15     0.00  2330.61       NaN         NaN
    16     0.00   338.42       NaN         NaN
    17     0.00   268.80       NaN         NaN
    18   379.59     0.00       NaN         NaN
    19   290.26     0.00       NaN         NaN
    20  1216.86     0.00       NaN         NaN
    21  3234.19     0.00       NaN         NaN
    22  2299.10     0.00       NaN         NaN

The very first calculation of the average gain and average loss is OK, but I don't know how to apply pandas.core.window.Rolling.apply for the second and subsequent ones, because each value depends on the previous row and on a different column. It may be something like this:

    avg_gain[n] = (avg_gain[n-1]*13 + gain[n]) / 14

My Wish - My Question

- The best way to calculate and work with technical indicators?
- Complete the above code in "Pandas Style".
- Does the traditional way of coding with loops reduce performance compared to Pandas?

Solution

The average gain and loss are calculated by a recursive formula, which can't be vectorized directly with numpy. We can, however, try to find an analytical (i.e. non-recursive) solution for calculating the individual elements. Such a solution can then be implemented using numpy.

Denoting the average gain as y and the current gain as x, we get y[i] = a*y[i-1] + b*x[i], where a = 13/14 and b = 1/14 for n = 14. Unwrapping the recursion, with y[0] being the first (simple) average, leads to:

$$y_i = a^i\,y_0 + b \sum_{j=1}^{i} a^{i-j}\,x_j$$

This can be efficiently calculated in numpy using cumsum (rma = running moving average):

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'close': [4724.89, 4378.51, 6463.00, 9838.96, 13716.36, 10285.10,
                                 10326.76, 6923.91, 9246.01, 7485.01, 6390.07, 7730.93,
                                 7011.21, 6626.57, 6371.93, 4041.32, 3702.90, 3434.10,
                                 3813.69, 4103.95, 5320.81, 8555.00, 10854.10]})
    n = 14

    def rma(x, n, y0):
        # x: values after the seed row, y0: first (simple) average
        a = (n-1) / n
        # descending powers of a; dividing the cumsum by ak yields sum_j a**(i-j) * x[j]
        ak = a**np.arange(len(x)-1, -1, -1)
        return np.append(y0, np.cumsum(ak * x) / ak / n + y0 * a**np.arange(1, len(x)+1))

    df['change'] = df['close'].diff()
    df['gain'] = df.change.mask(df.change < 0, 0.0)
    df['loss'] = -df.change.mask(df.change > 0, -0.0)
    # seed with the mean of rows 1..n, then smooth the remaining rows
    df.loc[n:, 'avg_gain'] = rma(df.gain[n+1:].values, n, df.loc[:n, 'gain'].mean())
    df.loc[n:, 'avg_loss'] = rma(df.loss[n+1:].values, n, df.loc[:n, 'loss'].mean())
    df['rs'] = df.avg_gain / df.avg_loss
    df['rsi_14'] = 100 - (100 / (1 + df.rs))

Output of df.round(2):

           close   change     gain     loss  avg_gain  avg_loss    rs  rsi_14
    0    4724.89      NaN      NaN      NaN       NaN       NaN   NaN     NaN
    1    4378.51  -346.38     0.00   346.38       NaN       NaN   NaN     NaN
    2    6463.00  2084.49  2084.49     0.00       NaN       NaN   NaN     NaN
    3    9838.96  3375.96  3375.96     0.00       NaN       NaN   NaN     NaN
    4   13716.36  3877.40  3877.40     0.00       NaN       NaN   NaN     NaN
    5   10285.10 -3431.26     0.00  3431.26       NaN       NaN   NaN     NaN
    6   10326.76    41.66    41.66     0.00       NaN       NaN   NaN     NaN
    7    6923.91 -3402.85     0.00  3402.85       NaN       NaN   NaN     NaN
    8    9246.01  2322.10  2322.10     0.00       NaN       NaN   NaN     NaN
    9    7485.01 -1761.00     0.00  1761.00       NaN       NaN   NaN     NaN
    10   6390.07 -1094.94     0.00  1094.94       NaN       NaN   NaN     NaN
    11   7730.93  1340.86  1340.86     0.00       NaN       NaN   NaN     NaN
    12   7011.21  -719.72     0.00   719.72       NaN       NaN   NaN     NaN
    13   6626.57  -384.64     0.00   384.64       NaN       NaN   NaN     NaN
    14   6371.93  -254.64     0.00   254.64    931.61    813.96  1.14   53.37
    15   4041.32 -2330.61     0.00  2330.61    865.06    922.29  0.94   48.40
    16   3702.90  -338.42     0.00   338.42    803.27    880.59  0.91   47.70
    17   3434.10  -268.80     0.00   268.80    745.90    836.89  0.89   47.13
    18   3813.69   379.59   379.59     0.00    719.73    777.11  0.93   48.08
    19   4103.95   290.26   290.26     0.00    689.05    721.60  0.95   48.85
    20   5320.81  1216.86  1216.86     0.00    726.75    670.06  1.08   52.03
    21   8555.00  3234.19  3234.19     0.00    905.86    622.20  1.46   59.28
    22  10854.10  2299.10  2299.10     0.00   1005.37    577.75  1.74   63.51

Concerning your last question about performance: explicit loops in Python / pandas are terrible; avoid them whenever you can.
If you can't, try Cython or Numba.

To illustrate this, I made a small comparison of my numpy solution with dimitris_ps' loop solution:

    import pandas as pd
    import numpy as np
    import timeit

    mult = 1       # length of dataframe = 23 * mult
    number = 1000  # number of loops for timeit

    df0 = pd.DataFrame({'close': [4724.89, 4378.51, 6463.00, 9838.96, 13716.36, 10285.10,
                                  10326.76, 6923.91, 9246.01, 7485.01, 6390.07, 7730.93,
                                  7011.21, 6626.57, 6371.93, 4041.32, 3702.90, 3434.10,
                                  3813.69, 4103.95, 5320.81, 8555.00, 10854.10] * mult})
    n = 14

    def rsi_np():
        # my numpy solution from above
        return df

    def rsi_loop():
        # loop solution https://stackoverflow.com/a/57008625/3944322
        # without the wrong alternative calculation of df['avg_gain'][14]
        return df

    # total seconds / number of loops * 1000 = milliseconds per loop
    df = df0.copy()
    time_np = timeit.timeit('rsi_np()', globals=globals(), number=number) / number * 1000
    df = df0.copy()
    time_loop = timeit.timeit('rsi_loop()', globals=globals(), number=number) / number * 1000

    print(f'rows\tnp\tloop\n{len(df0)}\t{time_np:.1f}\t{time_loop:.1f}')

    assert np.allclose(rsi_np(), rsi_loop(), equal_nan=True)

Results (ms / loop):

    rows    np    loop
    23      4.9   9.2
    230     5.0   112.3
    2300    5.5   1122.7

So even for 8 rows (rows 15...22) the loop solution takes about twice the time of the numpy solution. Numpy scales well, whereas the loop solution isn't feasible for large datasets.
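As a cross-check on the closed-form approach, the same recursion y[i] = a*y[i-1] + b*x[i] can also be written with pandas' exponentially weighted mean: with `adjust=False`, `Series.ewm(alpha=1/n).mean()` applies exactly this update and uses the first value of the series as the starting point. The sketch below is not part of the original answer; the function name `wilder_rsi` and the trick of overwriting row n with the simple average before smoothing are assumptions made for illustration. It may also be worth considering for very long series, since the closed form divides by powers of a that get extremely small after several thousand rows.

```python
import pandas as pd

close = pd.Series([4724.89, 4378.51, 6463.00, 9838.96, 13716.36, 10285.10,
                   10326.76, 6923.91, 9246.01, 7485.01, 6390.07, 7730.93,
                   7011.21, 6626.57, 6371.93, 4041.32, 3702.90, 3434.10,
                   3813.69, 4103.95, 5320.81, 8555.00, 10854.10])


def wilder_rsi(close: pd.Series, n: int = 14) -> pd.Series:
    """RSI with a simple-average seed at row n (hypothetical helper).

    Assumes len(close) > n.
    """
    change = close.diff()
    gain = change.clip(lower=0)    # negative changes become 0
    loss = -change.clip(upper=0)   # positive changes become 0, sign flipped

    def smooth(x: pd.Series) -> pd.Series:
        # Keep rows n.. and replace row n by the mean of the first n changes
        # (rows 1..n), so that ewm starts from the simple average.
        seeded = x.iloc[n:].copy()
        seeded.iloc[0] = x.iloc[1:n + 1].mean()
        # adjust=False gives y[i] = (1 - 1/n) * y[i-1] + (1/n) * x[i]
        return seeded.ewm(alpha=1 / n, adjust=False).mean()

    rs = smooth(gain) / smooth(loss)
    rsi = 100 - 100 / (1 + rs)
    # Rows before n have no RSI yet; reindex fills them with NaN.
    return rsi.reindex(close.index)


rsi = wilder_rsi(close)
print(rsi.round(2).tail())
```

On the sample data this should agree with the rsi_14 column of the expected results, which makes it a convenient independent check of the rma helper.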