ONLINE PORTFOLIO OPTIMIZATION WITH EXPONENTIAL GRADIENT AND TIME-VARYING CAPM

Since Harry Markowitz's seminal work in 1952, which initiated modern portfolio theory, portfolio allocation strategies have been intensely discussed in the literature. With the development of online optimization techniques, dynamic learning algorithms have proven to be an effective approach to building portfolios. The purpose of this paper is to implement a new version of the Exponential Gradient (EG) algorithm in which information about the risk of stocks is considered in the algorithm's projection step. The portfolios built were compared with the Dow Jones Industrial Average Index (DJIA) and the Best Constant Rebalanced Portfolio (BCRP). We used daily DJIA data from January 2000 to December 2017. The EG beta algorithm outperformed the DJIA in all tests performed; it came very close to the BCRP in periods of market upturn and outperformed it in downturns.


INTRODUCTION
The portfolio selection process is a decision problem in which the investor must allocate a quantity of wealth to a finite set of assets within a time horizon. In order to solve this problem, the investor decides how much of his wealth will be allocated to each of the assets available in the market. Each asset represents a distinct investment opportunity and the decision made for an allocation is a portfolio. In this problem, the investor seeks to allocate his money in a stock market in order to obtain a good relationship between expected return and risk.
Choosing the optimal portfolio is as old a problem as the stock market itself. However, it was from the work of (MARKOWITZ, 1952) that this question became a mathematical problem. The modern Portfolio Theory, introduced by Markowitz, presented portfolio risk and diversification as factors inherent in investment decisions, as opposed to common sense at the time, which was the concentration of resources on the highest expected return asset.
Approaches based on machine learning techniques have been intensively applied in recent decades and have become an important and active research area, as they make it possible to define online investment strategies that maximize wealth without relying on statistical assumptions about asset prices.
The use of machine learning in portfolio selection problems is based on works such as (BELL; COVER, 1988); (ALGOET; , which showed that a constant rebalanced portfolio for a specific allocation (Constant Rebalancing Portfolio) may be more beneficial than selecting a portfolio based on a performance measure and maintaining it throughout the period. In addition, (COVER, 1991) has shown that there is a constant allocation strategy that provides the greatest long-term wealth growth (Best Constant Rebalancing Portfolio).
Portfolio selection techniques derived from the mean-variance (MV) model rely on statistical assumptions, whereas online optimization algorithms look only at portfolio returns. The decision to invest in an asset, however, does not depend on total return alone: the associated risk must also be evaluated, since it is the combination of these two factors that determines the value of the asset. In this sense, this work combines the strengths of portfolio selection methods derived from the MV model, which account for risk, with online methods that are nonparametric in nature, highly adaptive and computationally efficient.
The remainder of this paper is organized as follows. Section 2 presents the reference investment strategy, formally states the online portfolio selection problem and presents the CAPM model. Section 3 proposes a novel projection strategy to incorporate risk in the online portfolio selection algorithm. Section 4 presents and analyzes the results of our experiments. Section 5 concludes this paper.
Revista Mundi, Engenharia e Gestão, Paranaguá, PR, v. 5, n. 2, p. 220-01, 220-13, 2020 DOI: 10.21575/25254782rmetg2020vol5n21151

Constant Rebalanced Portfolio
Finding the absolute maximum wealth portfolio within the market is a very ambitious task (HAZAN; ARORA, 2006), so what we seek is to minimize the distance between the portfolio chosen by the online algorithm and the Constant Rebalanced Portfolio (CRP), as proposed by (COVER, 1991).
CRP is an investment strategy in which the portfolio is rebalanced so that the proportion allocated to each asset in the original portfolio is preserved. Daily market movements cause the allocation to drift away from the original one, so in each period t purchase and sale operations are performed such that $b_{i,t} = b_{i,t+1}$ for every period t and every asset i in the portfolio; that is, the percentage of wealth allocated to each asset remains constant over time. Let $b = (b_1, b_2, \ldots, b_n)$ be the vector containing the proportion allocated to each portfolio asset, $r_t$ the vector containing the return of each of the n assets at time t, and $S_n$ the set of feasible allocations. The optimal offline CRP can be defined as:

$$b^{*} = \arg\max_{b \in S_n} \sum_{t=1}^{T} \log(b \cdot r_t) \qquad (1)$$

The function to be maximized is concave, so the problem is a convex optimization problem and can be solved efficiently. The vector $b^{*}$ is called the Best Constant Rebalanced Portfolio (BCRP). The BCRP is found through offline algorithms, that is, the algorithm operates on the whole set of returns within the evaluated time horizon T. For this reason, the BCRP is considered a hard-to-beat benchmark (HAZAN; ARORA, 2006).
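As a minimal sketch of the offline problem, the BCRP of a two-asset toy market can be found by brute force. The function names, the grid search and the toy data are illustrative assumptions, not the solver used in the paper; returns are assumed to be price relatives (closing price at t divided by closing price at t − 1).

```python
import math

def crp_log_wealth(b, price_relatives):
    """Log of final wealth when rebalancing to the fixed allocation b every period."""
    return sum(math.log(sum(bi * ri for bi, ri in zip(b, r))) for r in price_relatives)

def bcrp_two_assets(price_relatives, steps=100):
    """Brute-force BCRP for two assets: try b = (1 - a, a) on a grid of a in [0, 1]."""
    best_b, best_value = None, -math.inf
    for k in range(steps + 1):
        a = k / steps
        value = crp_log_wealth((1.0 - a, a), price_relatives)
        if value > best_value:
            best_b, best_value = (1.0 - a, a), value
    return best_b, best_value

# Toy market (assumed data): asset 0 is flat, asset 1 alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 10
b_star, log_wealth = bcrp_two_assets(market)
print(b_star, math.exp(log_wealth))
```

In this toy market, holding either asset alone ends with the initial wealth, whereas the evenly split CRP multiplies wealth by 1.5 × 0.75 = 1.125 every two periods, which illustrates why rebalancing to a constant allocation can beat buy-and-hold.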

Online Portfolio Selection
An online convex optimization problem can be written as an infinite sequence of convex cost functions $\{c_1, c_2, \ldots\}$ and a set of feasible solutions $S_n \subseteq \mathbb{R}^n$. At each step t, the algorithm chooses a strategy $x_t \in S_n$ and, after choosing it, observes the cost $c_t(x_t)$. Since not all information is available, the algorithm uses the performance observed in steps 1, ..., t − 1 to define the next position. It is noteworthy that the cost functions can be quite distinct at each step as long as they are convex.
As the cost function sequence $\{c_1, c_2, \ldots\}$ is not known in advance, it is not possible to choose $x_t$ such that $c_t(x_t)$ is minimal, so online optimization algorithms do not search for optimal solutions but pursue other objectives. One of the techniques used aims to minimize the Regret function. According to (ZINKEVICH, 2003), the Regret can be calculated by comparing the online algorithm with the decision made by an offline algorithm that has access to the first T cost functions $\{c_1, \ldots, c_T\}$ in advance. The idea is to measure the distance (cost) to some optimal strategy; in the specific case of online portfolio selection, the reference strategy is the BCRP.
Online portfolio optimization algorithms are based on the current asset price to select future portfolio allocation, meaning that asset price information arrives sequentially and the allocation decision is made immediately.
In each trading period t, for t = 1, ..., T, an investor observes a return vector $r_t$, where each component $r_{i,t}$ is the return of stock i at time t relative to its closing price at t − 1. The vector $b_t$ contains the proportion of wealth applied to each of the n assets. The wealth of a portfolio invested in $b_t$, considering only one period, can be calculated by the inner product $b_t \cdot r_t$. After T periods the accumulated wealth is given by $\prod_{t=1}^{T} (b_t \cdot r_t)$. Taking the logarithm of the wealth evolution we get the growth rate $\sum_{t=1}^{T} \log(b_t \cdot r_t)$. Similarly, taking the log wealth of a portfolio using the CRP investment strategy with $b \in S_n$, we get the logarithmic growth rate $\sum_{t=1}^{T} \log(b \cdot r_t)$. The Regret of an online algorithm that selects $b_t$ for t = 1, ..., T is then given by:

$$\mathrm{Regret}_T = \max_{b \in S_n} \sum_{t=1}^{T} \log(b \cdot r_t) - \sum_{t=1}^{T} \log(b_t \cdot r_t) \qquad (2)$$

Thus, the online algorithm attempts to minimize the distance between the portfolio $b_t$ and the optimal CRP, the BCRP. A portfolio selected by a low-regret algorithm is expected to have the same asymptotic behavior as the BCRP (HAZAN; ARORA, 2006).
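The regret computation can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, returns are treated as price relatives, and the reference portfolio is passed in by the caller rather than computed (it equals the true regret only when that reference is the BCRP).

```python
import math

def log_growth(allocations, price_relatives):
    """Sum over t of log(b_t . r_t) for a sequence of allocations b_t."""
    total = 0.0
    for b, r in zip(allocations, price_relatives):
        total += math.log(sum(bi * ri for bi, ri in zip(b, r)))
    return total

def regret(online_allocations, price_relatives, reference_b):
    """Log-wealth of the fixed reference portfolio minus that of the online sequence."""
    T = len(price_relatives)
    reference = log_growth([reference_b] * T, price_relatives)
    return reference - log_growth(online_allocations, price_relatives)

# Toy market (assumed data): asset 0 is flat, asset 1 alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 10
b_ref = (0.5, 0.5)                         # the BCRP of this toy market
always_asset_0 = [(1.0, 0.0)] * len(market)
print(regret(always_asset_0, market, b_ref))
```

A strategy that plays the reference allocation in every period has zero regret by construction; the buy-and-hold strategy above has positive regret because it forgoes the gains from rebalancing.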

CAPM Model
Starting from the theory proposed by (MARKOWITZ, 1952), (SHARPE, 1964) and (LINTNER, 1965), working independently, developed a capital asset pricing model classically called the CAPM (Capital Asset Pricing Model). The CAPM introduced two new premises to Markowitz's classic model: homogeneous expectations and a risk-free rate. The assumption of homogeneous expectations says that investors have the same views on expected returns, standard deviations and covariances of assets (efficient market assumption). The assumption of the risk-free rate is that there is an investment in the market whose remuneration is assured exactly as expected, and whose liquidity is not affected by economic and cyclical factors. The CAPM relates non-diversifiable risk to return for each asset and is represented by Equation 3, where $E(R_m)$ is the expected return on the market portfolio, $R_f$ is the return rate of the risk-free asset and $\sigma_p$ is the portfolio standard deviation.
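For reference, the textbook statement of the CAPM (the security market line) is reproduced below. It is offered as the standard form of the relationship, on the assumption that it is what the paper's Equation 3 expresses; the symbols $R_i$ and $\beta_i$ (return and beta of asset i) are introduced here for illustration.

```latex
E(R_i) = R_f + \beta_i \,\bigl( E(R_m) - R_f \bigr)
```

The term $E(R_m) - R_f$ is the market risk premium, so an asset with $\beta_i > 1$ is expected to earn more than the market premium and an asset with $\beta_i < 1$ less.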
The beta coefficient β indicates the degree of variability of an asset's return in response to a change in the market return, and it is given by Equation 4:

$$\beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)} \qquad (4)$$

where $R_i$ is the return on asset i. Despite its simple form, the relationship that the CAPM establishes between risk and return, measured through the beta (β) that links the return on the asset to the return on the market, laid the basis for the theory of investment analysis and, more specifically, for performance appraisal methods.
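A static beta can be estimated from historical returns as the ratio of sample covariance to sample variance. The sketch below uses made-up return series and a hypothetical function name; it illustrates the definition only and is not the time-varying estimator used later in the paper.

```python
def mean(xs):
    return sum(xs) / len(xs)

def estimate_beta(asset_returns, market_returns):
    """Sample covariance of asset and market returns divided by market variance."""
    ma, mm = mean(asset_returns), mean(market_returns)
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var          # the 1/(n-1) factors cancel in the ratio

# Assumed toy data: one asset moves half as much as the market, another 1.5 times as much.
market = [0.01, -0.02, 0.015, 0.003, -0.007]
defensive = [0.5 * m for m in market]
aggressive = [1.5 * m + 0.001 for m in market]
print(estimate_beta(defensive, market), estimate_beta(aggressive, market))
```

Because the estimate depends on which window of returns is fed in, re-running it on different sub-periods of real data gives different betas, which is the sensitivity that motivates a time-varying model.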
Naturally, there are criticisms of the model, the most notable being the sensitivity of β to the estimation period. It is reasonable to assume that the risk of a particular company changes over time, and investors' perception of this risk is very difficult to predict.
In order to incorporate this aspect, some authors propose the use of structural models in which β is a latent variable in time, that is, it is not directly observed or estimated (CARMONA, 2014). To estimate such structural models, a widely used tool is the Kalman filter: under certain normality hypotheses on the observations, it is possible to propose a structure for the dynamics of the β of each asset and to extract it with optimization algorithms (DURBIN; KOOPMAN, 2012).

Dataset
In order to evaluate the results of the proposed algorithm, data on Dow Jones stocks were collected from the Yahoo Finance repository, as made available by the rugarch package for the R software. The data comprise 29 DJIA companies, with daily observations from January 2000 to December 2017. To estimate the CAPM model, we considered the New York Stock Exchange (NYSE) index as the market return and the 10-year T-Bond issued by the US Treasury as the risk-free rate.

Algorithm
In this paper we test a direct implementation of the Exponential Gradient (EG) algorithm (HELMBOLD et al., 1998) and also a Beta-constrained Exponential Gradient algorithm, in which the beta of the built portfolios is restricted.
The return on a portfolio for each period can be calculated as:

$$x_t = b_t \cdot r_t \qquad (5)$$

where $x_t$ can be interpreted as the increase or decrease of wealth in each of the trading periods. The first-order information is then given by:

$$\Theta_i^t = \frac{r_{i,t}}{b_t \cdot r_t} \qquad (6)$$

A value $\Theta_i^t > 1$ indicates that the performance of asset i during period t was better than that of the current portfolio allocation $b_t$. In the same way, $\Theta_i^t < 1$ indicates that the performance of asset i was worse than that of the allocation $b_t$.
Starting from an arbitrary initial allocation in $S_n$, the allocation $b_{t+1}$ is based on the current allocation $b_t$ and on the first-order information. The idea of EG is that, if the return of asset i in trading period t is higher than the wealth change calculated by Equation 5, then the proportion of the portfolio allocated to asset i at t + 1 must be increased. The learning rate η regulates the intensity of this change of position in the portfolio. Thus, the EG algorithm update is given by:

$$b_{i,t+1} = \frac{b_{i,t} \exp(\eta\, \Theta_i^t)}{\sum_{j=1}^{n} b_{j,t} \exp(\eta\, \Theta_j^t)} \qquad (7)$$

for i = 1, ..., n. In the extreme case where η = 0, the proportion defined at t = 1 remains constant for subsequent periods.
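One step of the unconstrained EG update described above can be sketched as follows. The function name and the toy numbers are illustrative assumptions, and the returns are taken to be price relatives so that the first-order information compares each asset with the portfolio as a ratio.

```python
import math

def eg_update(b, r, eta):
    """One Exponential Gradient step: b is the current allocation, r the price relatives."""
    portfolio_return = sum(bi * ri for bi, ri in zip(b, r))     # wealth change b_t . r_t
    theta = [ri / portfolio_return for ri in r]                 # first-order information
    unnormalised = [bi * math.exp(eta * th) for bi, th in zip(b, theta)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]                    # back onto the simplex

# Assumed toy period: asset 0 gains 2%, asset 1 loses 1%, asset 2 is flat.
b = [1 / 3, 1 / 3, 1 / 3]
r = [1.02, 0.99, 1.00]
b_next = eg_update(b, r, eta=0.05)
print(b_next)
```

The normalisation keeps the weights non-negative and summing to one, so the update needs no separate projection onto the simplex; weight shifts toward the asset that beat the portfolio, and with `eta=0` the allocation is returned unchanged.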
All portfolio positions greater than or equal to zero were considered as belonging to the feasible solution set, so the portfolio cannot assume short positions; and since the sum of all positions must not exceed one, portfolio leverage is not allowed. The feasible set $S_n$ is thus defined as a polytope with $\|b\|_1 = 1$ and $b_i \geq 0$ for i = 1, ..., n, where n is the number of assets. Constraints were also imposed on the CAPM β of the investor's portfolio. As the portfolio's β is the weighted sum of the β of the individual assets that make up the portfolio, it is possible to set upper and lower limits as linear constraints of the form $Ax \leq b$ while maintaining the convexity of the solution space. So,

$$\beta_{\min} \leq \sum_{i=1}^{n} b_i\, \beta_i^t \leq \beta_{\max} \qquad (8)$$

where $\beta_i^t$ is the time-varying beta of asset i at time t.
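The beta constraint can be illustrated with a simple feasibility repair. The sketch below is an assumption-laden stand-in, not the paper's projection step: it blends the allocation with a single asset (the lowest-beta or highest-beta one) until the portfolio beta reaches the nearest bound, which yields a feasible long-only portfolio but not the closest one. All names and numbers are hypothetical.

```python
def portfolio_beta(b, betas):
    """Portfolio beta as the weighted sum of the asset betas."""
    return sum(bi * beta_i for bi, beta_i in zip(b, betas))

def pull_into_beta_range(b, betas, beta_min, beta_max):
    """Blend b with one asset until the portfolio beta lies in [beta_min, beta_max].

    Illustrative heuristic only: it returns a feasible point, not a projection.
    """
    beta_p = portfolio_beta(b, betas)
    if beta_min <= beta_p <= beta_max:
        return list(b)
    if beta_p > beta_max:
        k = min(range(len(betas)), key=lambda i: betas[i])    # lowest-beta asset
        target = beta_max
    else:
        k = max(range(len(betas)), key=lambda i: betas[i])    # highest-beta asset
        target = beta_min
    if (betas[k] - target) * (beta_p - target) > 0:
        raise ValueError("no long-only portfolio satisfies the beta range")
    lam = (target - betas[k]) / (beta_p - betas[k])           # weight kept on b
    return [lam * bi + (1.0 - lam) * (1.0 if i == k else 0.0) for i, bi in enumerate(b)]

# Assumed betas: index 0 plays the role of the risk-free asset (beta 0).
betas = [0.0, 0.8, 1.2, 1.5]
b = [0.1, 0.3, 0.3, 0.3]
defensive = pull_into_beta_range(b, betas, -1.0, 0.2)
aggressive = pull_into_beta_range(b, betas, 1.3, 2.0)
print(defensive, aggressive)
```

With a low upper bound and no short positions, the only way to lower the portfolio beta in this example is to move wealth into the zero-beta asset, so the defensive allocation ends up mostly in the risk-free position.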
In order to fit the CAPM model with a time-varying β, a structural model with random shocks in the β of each asset was used. Mathematically, we have:

$$R_{j,t} - r_t = \beta_{j,t}\,(R_{m,t} - r_t) + \epsilon_{j,t} \qquad (9)$$

$$\beta_{j,t+1} = \beta_{j,t} + \eta_{j,t} \qquad (10)$$

where $r_t$ represents the risk-free rate, $R_m$ the market return, j a specific asset, and η and ε are independent normal variables (η here denotes a disturbance term, not the learning rate). Thus, the β of each asset varies over time as a random walk, indicating that the underlying risk of a stock changes over time without the investor knowing in advance whether the company will become more or less risky at a later time. This defines a martingale for the β of each asset, since its increments are independent. In continuous time, the analog of this model would be a Brownian motion for the β of each asset. The algorithm with the update step considering β is presented in Algorithm 1.
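A random-walk beta of this kind can be tracked with a scalar Kalman filter. The sketch below is a minimal illustration under stated assumptions: the observation is the asset's excess return, the regressor is the market's excess return, and the two noise variances (`obs_var`, `state_var`) are supplied by hand, whereas in practice they would be estimated, for example by maximum likelihood. Names, starting values and data are hypothetical.

```python
def kalman_beta(asset_excess, market_excess, obs_var, state_var,
                beta0=1.0, p0=1.0):
    """Filtered estimates of a random-walk beta in y_t = beta_t * x_t + noise."""
    beta, p = beta0, p0
    path = []
    for y, x in zip(asset_excess, market_excess):
        p = p + state_var                  # predict: the random walk inflates the variance
        innovation = y - beta * x          # one-step forecast error
        s = x * x * p + obs_var            # innovation variance
        gain = p * x / s                   # Kalman gain
        beta = beta + gain * innovation    # update the state estimate
        p = (1.0 - gain * x) * p           # update the state variance
        path.append(beta)
    return path

# Assumed noiseless toy data in which the true beta is fixed at 1.5.
market_excess = [0.01 * (1 + t % 3) * (-1) ** t for t in range(200)]
asset_excess = [1.5 * x for x in market_excess]
path = kalman_beta(asset_excess, market_excess, obs_var=1e-6, state_var=1e-6)
print(path[0], path[-1])
```

A larger `state_var` lets the estimate react faster to changes in risk at the price of noisier betas, which is the trade-off the variance estimation has to resolve.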

RESULTS
Initially, we analyzed the behavior of the EG algorithm without any additional restrictions on the portfolio beta and compared the result with the BCRP and the DJIA. In addition to the 29 stocks analyzed, EG was also allowed to invest in a fixed-income asset; the same opportunity was given to the offline optimization algorithm used to find the BCRP. Thus, the feasible portfolio set consists of 30 assets. Although EG underperformed the BCRP, the portfolio built was able to outperform the Dow Jones Index. These results are shown in Figure 1.
Next, we tested risk control through the proposed algorithm, EG beta, in two scenarios. In the first, we evaluate the performance of the algorithm when the set of feasible solutions is restricted to the range $\beta_{\min} = -1$ to $\beta_{\max} = 0.2$, which forces the portfolio mostly to go against the DJIA. In Figure 2, it can be seen that between 2000 and 2005 the built portfolio outperformed both the DJIA and the BCRP. This is because it adopted a position contrary to the market movement, allocating much of the wealth to the risk-free asset. However, during the period of gains after 2009, the conservative position leads the built portfolio to gains above the Dow Jones but much lower than the BCRP. Looking at the behavior of beta in Figure 2 (b), we see that beta is positive throughout the interval. As we are not allowing short positions, it is difficult to build negative-beta portfolios, because most stocks available for investment are positively correlated with the Dow Jones Index. Thus, the algorithm's alternative for reducing the correlation with the market is to invest most of the wealth in the risk-free asset, which is what actually happened, as can be seen in Figure 3.
In the second evaluated scenario, the defined interval ranges from $\beta_{\min} = 1$ to $\beta_{\max} = 1.6$.
From CAPM theory, we know that forcing the formation of a portfolio with β > 1 will result in higher-risk portfolios. In Figure 4 we see that by forcing a positive correlation with the market it is possible to keep up with DJIA growth periods, but when the market index falls, the built portfolio has even greater losses. This suggests that in order to achieve substantial market gains it is important to adapt the accepted range more frequently, so the investment decision should take place at shorter time intervals.
In order to demonstrate that a proper choice of the beta interval leads to consistent performance improvements in both rising and falling markets, we selected two subsets of the historical series. To assess the risk of the constructed portfolios we calculated two risk measures widely used in financial theory, the Value-at-Risk (VaR) and the Conditional Value-at-Risk (CVaR) (ROCKAFELLAR; URYASEV et al., 2000). The first period selected is a clear downturn in the DJIA, from May 24, 2002 until March 12, 2003. The second period is one of ample growth of the DJIA, from August 08, 2017 to December 29, 2017.
In the downturn period, restricting the portfolio beta to between $\beta_{\min} = -1$ and $\beta_{\max} = 0.1$ caused the portfolio to avoid losses, especially by investing in the risk-free asset, since short positions were not allowed. The results with the respective betas are shown in Figure 5 (b). In terms of risk, we see in Table 1 that the portfolio built by the EG Beta algorithm presented considerably lower risk than the BCRP and Dow Jones portfolios. This reduction in VaR and CVaR is due to the fact that much of the wealth was allocated to the risk-free asset and the remainder of the portfolio was invested in stocks of companies that had a negative correlation with the market in the downward period.
Figure 6 shows that by forcing the portfolio beta between $\beta_{\min} = 1.3$ and $\beta_{\max} = 2$ during the projection stage, the number of stocks more closely correlated with the market increased, allowing the portfolio to outperform the Dow Jones index and stay very close to the BCRP, even surpassing it at times. In Figure 4 (b) it can be observed that the EG Beta algorithm outperforms the BCRP exactly when market conditions allowed the construction of higher-beta portfolios. Regarding risk measures, there was a substantial increase in VaR and CVaR in the built portfolios. This result was expected and is consistent with CAPM theory, since by allowing only betas greater than 1 we are willing to take risks higher than the market risk. Risk measures are presented in Table 2.

          EG_BETA                    BCRP                       Dow Jones
          1%           5%            1%           5%            1%           5%
VaR       -0.0147841   -0.0085257    -0.0105320   -0.0078903    -0.0107870   -0.0047868
CVaR      -0.0331748   -0.0149133    -0.0245809   -0.0118162    -0.0233727   -0.0096561
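The two risk measures can be sketched with a simple historical (empirical) estimator. The paper does not state which estimator it used, so the quantile rule below is an assumption, as are the function names and the toy data; values are reported as returns, so losses are negative, matching the sign convention of the table above.

```python
import math

def historical_var(returns, level):
    """Empirical level-quantile of the returns (e.g. level=0.05), reported as a return."""
    ordered = sorted(returns)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]

def historical_cvar(returns, level):
    """Average of the returns that are at or below the historical VaR."""
    var = historical_var(returns, level)
    tail = [r for r in returns if r <= var]
    return sum(tail) / len(tail)

# Assumed toy data: twenty daily returns from -1.0% to +0.9% in steps of 0.1%.
daily = [i / 1000 for i in range(-10, 10)]
print(historical_var(daily, 0.05), historical_cvar(daily, 0.05))
```

Because CVaR averages the outcomes beyond the VaR threshold, it is never less severe than VaR at the same level, which is the pattern visible in every column of the table.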

CONCLUSIONS
In this paper, the benefits of combining portfolio risk control with the EG algorithm were explored. Working with a time-varying beta was critical to properly capturing the correlation of each asset with the market.
We saw no gain in controlling the portfolio beta over long periods of time. In fact, selecting just one beta range and keeping it fixed for almost 17 years would not be an efficient decision, as market behavior changes over time. However, forcing a beta greater than one in bull markets or less than one in bear markets proved to be an efficient way to improve the returns of the EG algorithm.
During downturns, forcing a negative or slightly positive beta left the algorithm with no stock-market solutions that offered capital growth, which forced it to allocate most of the capital to the risk-free asset and prevented losses during that period.
As a direction for future research, we can propose the use of dynamic learning functions to define the best beta range for the moment, allowing the application of the technique in different scenarios within the market.