Manual 04A for MD++
Conjugate Gradient Method in MD++

Keonwook Kang and Wei Cai

Mar 19 , 2009

Conjugate Gradient Method

The conjugate gradient (CG) method was initially proposed as an iterative method to solve a large linear system of equations,^[1] such as $\mathrm {A} \mathbf {x} =\mathbf {b}$ , where A is an n by n symmetric (positive-definite) matrix, and b is an n by 1 vector. The basic idea is that the quadratic form of a function Ψ(x),

\Psi (\mathbf {x} )={\frac {1}{2}}\mathbf {x} ^{T}\mathrm {A} \mathbf {x} -\mathbf {b} ^{T}\mathbf {x} +c

is minimized by the solution of $\mathrm {A} \mathbf {x} =\mathbf {b}$ if A is symmetric and positive-definite. Later, Fletcher and Reeves^[2] adopted it to solve nonlinear optimization problems. In MD++, the CG method is used to find a meta-stable structure by minimizing a given (usu. nonlinear) potential energy function.

The algorithm of the CG method is structured by two nested loops. The outer loop tries different search directions, and the inner loop aims to find the optimal step length which minimizes the object function along a given search direction. The inner loop may not be needed when the exact expression of the optimal step length can be given. If a certain condition is met, the program exits out of the loops and stops. The algorithm is shown in figure 1.

Figure 1: Algorithm of the (nonlinear) CG method.

In figure 1, Ψ(x) is a function to be minimized, g is the gradient of Ψ, and s is the search direction. α_k is the optimal step length that minimizes the function Ψ(x_k+α_ks_k) at the k-th iteration. β_k is the parameter related with update of the search direction.

Depending on how we choose β_k, the minimization methods can be different. For example, if β_k=0, it is the steepest descent (SD) method.

While there exists the analytical expression of α for a linear system, or

$\alpha _{k}={\Bigg \{}$	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \frac{\mathbf{r}^T_k\mathbf{r}_k}{\mathbf{r}^T_k\mathrm{A}\mathbf{r}_k}}	for SD,^[3]
$\alpha _{k}={\Bigg \{}$	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \frac{\mathbf{r}^T_k\mathbf{r}_k}{\mathbf{s}^T_k\mathrm{A}\mathbf{s}_k} }	for CG,

where r_k is the residual and is defined as Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{r}_k \equiv \mathbf{b} - \mathrm{A}\mathbf{x}_k} , for nonlinear problems, α is generally obtained by a numerical method such as Newton's method, quasi-Newton method, secant method, interpolation method, or other similiar methods. These methods of finding an optimal α are called as the line search methods. For details of the line search method, refer Nocedal and Wright's Numerial Optimization^[4]

For the CG method, the search directions are constructed by using the conjugacy of the residuals, or

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{r}_i^T \mathbf{r}_j = 0} for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle i \neq j} .

From this condition, β_k is obtained as

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta_k \equiv \frac{\mathbf{r}_{k}^T\mathbf{r}_{k}}{\mathbf{r}_{k-1}^T\mathbf{r}_{k-1}}} .

For details of the derivation, refer J. R. Shewchuk's.^[5] The benefit of this choice is that a new search direction is orthogonal to all the previous search directions. In other words, we don't need to store all the previous search directions to produce a linearly independent new search direction.

Non-linear CG method

For non-linear CG method, there exist different choices of β as shown in table 1, in which the residual r_k is now defined as the negative gradient or Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{r}_k = -\mathbf{g}_k} .

Table 1. The update parameter β in non-linear CG. For more different choices of β, refer Hager and Zhang's.^[6]
Name	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta}	Reference	Features
HS	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta^{HS}_k= \frac{\mathbf{g}_{k}^T\mathbf{g}_{k} - \mathbf{g}_{k}^T\mathbf{g}_{k-1}} {(\mathbf{g}_{k} - \mathbf{g}_{k-1})^T\mathbf{s}_{k-1}}} Cite error: Closing `</ref>` missing for `<ref>` tag	Hestenes and Stiefel (1952)
FR	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta^{FR}_k= \frac{\mathbf{r}_{k}^T\mathbf{r}_{k}}{\mathbf{r}_{k-1}^T\mathbf{r}_{k-1}} = \frac{\mathbf{g}_{k}^T\mathbf{g}_{k}}{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}}	Fletcher and Reeves (1964)	Considered as the 1st non-linear CG method. Same with β in linear CG. Jamming problem^[7]
PR	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta^{PR}_k= \frac{\mathbf{r}_{k}^T\mathbf{r}_{k}-\mathbf{r}_{k}^T\mathbf{r}_{k-1} }{\mathbf{r}_{k-1}^T\mathbf{r}_{k-1}} = \frac{\mathbf{g}_{k}^T\mathbf{g}_{k}-\mathbf{g}_{k}^T\mathbf{g}_{k-1}}{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}}	Polak and Ribiere (1969), and Polyak (1969)	If an object function Ψ is strongly convex quadratic and the line search is exact, β^PR = β^FR because r_i^Tr_j=0 for Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle i \neq j} . Can avoid jamming problem by effectively restarting CG. ^[8]
PR+	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta^{PR+}_k= \max\{\beta^{PR}_k,0\}}	Powell (1984)^[9]	Convergence guaranteed. Restart CG if β_k<0. Restarting CG means that all the previous search directions are to be forgotten and the new search direction is set to be the steepest descent direction.
HS+	Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta^{HS+}_k= \max\{\beta^{HS}_k,0\}}

Hybrid method

For better performance, people may use a hybrid method, which can select different β at each iteration. One example of hybrid methods is PR-FR method, in which method β is selected as

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta_k = \left\{ \begin{array}{rll} -\beta_k^{FR} & \mathrm{if} & \beta_k^{PR}<-\beta_k^{FR} \\ \beta_k^{PR} & \mathrm{if} & |\beta_k^{PR}| \leq \beta_k^{FR}\\ \beta_k^{FR} & \mathrm{if} &\beta_k^{PR}<\beta_k^{FR} \end{array}. \right. }

Stopping conditions

Below is listed the conditions to be used for stopping the CG program.

Energy criterion
Force criterion
Max. evaluations
Max. iterations

Implementation of the CG method in MD++

MD++ is equipped with two different CG methods. One is what we call as zxcgr and the other is PR+. Details of each method will be explained below.

zxcgr

zxcgr code was written by Dongyi Liao at MIT in 1999 based on IMSL(International Mathematics and Statistics Library)'s U2CGG.F. This code follows Powell's restart procedures for the CG method.^[10] According to his paper, the search direction is updated as

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{s}_k = -\mathbf{g}_k + \beta_k\mathbf{s}_{k-1} + \lambda_k\mathbf{s}_t \quad \mathrm{for} \quad k \geq 1 \quad \mathrm{and} \quad \mathbf{s}_0 = -\mathbf{g}_0} ,

where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta_k = \beta_k^{HS}} and the extra term Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda_k\mathbf{s}_t} is related with how we restart CG. If we need to restart CG at k-th iteration, we set t = k - 1 and λ_k = 0. Then the restarting direction simply becomes s_k = -g_k + β_ks_k-1. After restarting CG, λ_k is defined as

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda_k = \frac{\mathbf{g}_k^T(\mathbf{g}_{t+1}-\mathbf{g}_t)} {\mathbf{s}_t^T(\mathbf{g}_{t+1}-\mathbf{g}_t)}, }

where t is the iteration number that is one time earlier than the iteration number at which restarting happened. Thus, λ_k can be summarized as

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lambda_k = \left\{ \begin{array}{ll} 0 & \mathrm{if} \quad k=t+1 \\ \frac{\mathbf{g}_k^T(\mathbf{g}_{t+1}-\mathbf{g}_t)} {\mathbf{s}_t^T(\mathbf{g}_{t+1}-\mathbf{g}_t)} & \mathrm{if} \quad k > t+1. \end{array} \right.}

The initial value of t is zero and will not change until the 1st restart occurs.

Unlike the other CG methods, this variation of CG method provides a restarting direction that is not a steepest descent direction. This modification usually takes less iterations to get to the minimum.

PR+

Comparision of the CG methods in different implementation

In the first part of this section, I compare the computation results obtained from CG relxation between MD++ (ver.Mar202009) and LAMMPS (ver.Mar262009). The test system is a 3[100] x 3[010] x 3[001] Si diamond cubic structure stretched by 10% in all three directions with two missing atoms (of atom ID 0 and 99). The number of atoms in the test system is 8x3³ - 2 = 214. The potential fuction is MEAM Siz with the cut-off radius r_c=6(Å). The CG method is implemented by using the Polak-Ribiere method (PR+) and the backtracking method. The energy and the force tolerances are given as etol = 1e-6 (no unit) and ftol = 1e-8 (eV/Å), respectively. In the 1st column of table 1, the potential energies of the perfect structure from MD++ and LAMMPS are compared, and they are identical. In the 2nd column, the potential energies of the defect structure before relaxation are compared, and they are again same. In the 4th column, the energies of the relaxed structure are listed in different colors depending on the values of d_max (0.1, 0.01, and 0.001), and we can say that the both results are effectively identical. For this small test system, MD++ is faster than LAMMPS, because of less evaluation of the potential function. For a larger system, LAMMPS can perform better due to its ability of parallel run.

Table 1. Relaxation of 3[100]x3[010]x3[001] Si structure with a defect. Potential file : MEAM Siz w/ r_c = 6 (A). CG method = (PR+) + (backtracking). Conditions: etol = 1e-6 (no unit), ftol = 1e-8 (eV/Å). Compiler = intel compiler
	E_perf (eV)	E_defect (eV)	d_max (Å)	E_rlx (eV)	Iter. No.	No. of Calls	CPU time
MD++ (PR+)	-1.00008000022397e+03	-9.00066105589930e+02	0.1	-9.04000029718829e+02	15	46	0.30002
			0.01	-9.04000167032814e+02	46	51	0.33202
			0.001	-9.03987772936424e+02	424	425	2.2201
LAMMPS (Serial Run)	-1.00008000022397e+03	-9.00066105589930e+02	0.1	-9.04000029718830e+02	15	60	0.75296
			0.01	-9.04000167032815e+02	46	96	1.2149
			0.001	-9.03987772936423e+02	424	848	10.281

The distinct difference of the CG method between MD++ and LAMMPS lies in the fact that MD++ runs the CG method in scaled coordinate (x^*) while LAMMPS does in real coordinate (x). Because of this scaling effect, d_max must be devided by the box length L for MD++, or d^*_max = d_max/L. By the same reason, the force tolerance in MD++ needs to be multiplied by the box length, or ftol^* = L·ftol.

MD++ is equipped with another implementation of the CG method, called as "zxcgr". This "zxcgr" code was written by Dongyi Liao at MIT in 1999 based on IMSL's U2CGG.F, and used the Hestenes-Stiefel method and the quadratic spline interpolation. Now I will compare two implemetations; MD++(zxcgr) and MD++(PR+). Since it is only the force tolerance that the "zxcgr" code considers to check the convergence, the energy tolerance will be set to be 0 in the comparison below. In table 2, the numbers are same upto 1e-8. Note that the energy obtained by the PR+ method becomes lower than the number (in table 1) from the same method with nonzero etol (=1e-6). This further relaxation can be understood, if we consider that in the case of nonzero energy tolerance a small step makes the energy change small enough to satisfy the energy tolerance and the structure is prone to stop relaxing further even though it is able to do because of its nonzero gradient (or force).

Table 2. Relaxation of 3[100]x3[010]x3[001] Si structure with a defect. Potential file : MEAM Siz w/ r_c = 6 (A). Conditions: etol = 0 (no unit), ftol = 1e-4 (eV/Å).
	d_max (Å)	E_rlx (eV)	Iter. No.	No. of Calls	CPU time
MD++ (zxcgr)		-9.04000245866814e+02	31	45	0.31602
MD++ (PR+)	0.1	-9.04000245866095e+02	35	124	0.70004
	0.01	-9.04000245869413e+02	63	117	0.70004
	0.001	-9.04000245869795e+02	460	483	2.6002

At this point, we may raise a question. Does the scaling effect ever affect the relaxation of a structure in a non-cubic cell? Let's say that we have a 6[100] x 3[010] x 3[001] Si diamond cubic structure stretched by 20% in x direction with two missing atoms (of atom ID 0 and 99). The relaxation results are listed in table 3 below.

Table 2. Relaxation of 6[100]x3[010]x3[001] Si structure with a defect. Potential file : MEAM Siz w/ r_c = 6 (A). Conditions: etol = 0 (no unit), ftol = 1e-4 (eV/Å). For the PR+ method, d_max = 0.1 (Å). No. of atoms = 430
	E_perf (eV)	E_defect (eV)	E_rlx (eV)	Iter. No.	No. of Calls
MD++ (zxcgr)	-2.00016000044795e+03	-1.86579478735442e+03	-1.86764046397026e+03	129	175
MD++ (PR+)			-1.86764046392565e+03	168	686
LAMMPS	-2.00016000044795e+03	-1.86579478735442e+03	-1.86764046393067e+03	74	387

The relaxed energy values are same upto 1e-6.

Notes

↑ M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems", Journal of Research of the National Bureau of Standards, 49(1952) pp.409-436
↑ R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients", Computer Journal, 7(1964) pp.149-154
↑ This expression can be obtained by using two consecutive gradients are orthogonal, or g^T_k+1g_k=0. In this case, we don't need the inner loop.
↑ Jorge Nocedal and Stephen J. Wright, Numerical Optimization 2nd ed. Springer (2006)
↑ J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (1994)
↑ W. W. Hager and H. Zhang, "A survey of nonlinear conjugate gradient methods", Pacific J. of Optimization, 2(2006) pp.35-58
↑ While you minimize a general object function, you may have a bad search direction, which is almost orthogonal to the steepest descent direction, -g_k. Once you have a bad direction, the FR method is likely to give you a next search direction close to the previous direction, or Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{s}_{k+1}\sim \mathbf{s}_{k}} , which is another bad direction. See Nocedal and Wright pp. 125 - 127 Behavior of the Fletcher-Reeves Method
↑ When a search direction is almost orthogonal to the steepest descent direction, β^PR becomes almost zero (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta \sim 0} ) and the search direction is close to the steepest descent direction.
↑ M. J. D. Powell, "Nonconvex minimization calculations and the conjugate gradient method", Numerical Ananlysis (Dundee, 1983), Lecture Notes in Mathematics, vol. 1066, Springer-Verlag, Berlin, 1984 pp.122-141
↑ M. J. D. Powell, "Restart Procedures for the Conjugate Gradient Method", Mathematical Programming 12 (1977) pp.241-254

[1] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems", Journal of Research of the National Bureau of Standards, 49(1952) pp.409-436

[2] R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients", Computer Journal, 7(1964) pp.149-154

[3] This expression can be obtained by using two consecutive gradients are orthogonal, or g^T_k+1g_k=0. In this case, we don't need the inner loop.

[Nocedal-4] Jorge Nocedal and Stephen J. Wright, Numerical Optimization 2nd ed. Springer (2006)

[5] J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (1994)

[6] W. W. Hager and H. Zhang, "A survey of nonlinear conjugate gradient methods", Pacific J. of Optimization, 2(2006) pp.35-58

[7] While you minimize a general object function, you may have a bad search direction, which is almost orthogonal to the steepest descent direction, -g_k. Once you have a bad direction, the FR method is likely to give you a next search direction close to the previous direction, or Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathbf{s}_{k+1}\sim \mathbf{s}_{k}} , which is another bad direction. See Nocedal and Wright pp. 125 - 127 Behavior of the Fletcher-Reeves Method

[8] When a search direction is almost orthogonal to the steepest descent direction, β^PR becomes almost zero (Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta \sim 0} ) and the search direction is close to the steepest descent direction.

[9] M. J. D. Powell, "Nonconvex minimization calculations and the conjugate gradient method", Numerical Ananlysis (Dundee, 1983), Lecture Notes in Mathematics, vol. 1066, Springer-Verlag, Berlin, 1984 pp.122-141

[10] M. J. D. Powell, "Restart Procedures for the Conjugate Gradient Method", Mathematical Programming 12 (1977) pp.241-254

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

M04A Conjugate Gradient Method in MD++

Contents

Manual 04A for MD++
Conjugate Gradient Method in MD++

Conjugate Gradient Method

Non-linear CG method

Hybrid method

Stopping conditions

Implementation of the CG method in MD++

zxcgr

PR+

Comparision of the CG methods in different implementation

Notes

Navigation menu

M04A Conjugate Gradient Method in MD++

Manual 04A for MD++ Conjugate Gradient Method in MD++

Conjugate Gradient Method

Non-linear CG method

Hybrid method

Stopping conditions

Implementation of the CG method in MD++

zxcgr

PR+

Comparision of the CG methods in different implementation

Notes

Navigation menu

Search

Manual 04A for MD++
Conjugate Gradient Method in MD++