Solution Methods for Microeconomic
Dynamic Stochastic Optimization Problems

2024-06-08
 
Christopher D. Carroll1

Note: The GitHub repo SolvingMicroDSOPs associated with this document contains Python code that produces all results, from scratch, except for the last section on indirect inference. The numerical results have been confirmed by showing that the answers produced by the raw Python code correspond to the answers produced by tools available in the Econ-ARK toolkit, more specifically those in HARK, which has full documentation. The MSM results at the end have been superseded by tools in the EstimatingMicroDSOPs repo.


_____________________________________________________________________________________

Abstract
These notes describe tools for solving microeconomic dynamic stochastic optimization problems, and show how to use those tools for efficiently estimating a standard life cycle consumption/saving model using microeconomic data. No attempt is made at a systematic overview of the many possible technical choices; instead, I present a specific set of methods that have proven useful in my own work (and explain why other popular methods, such as value function iteration, are a bad idea). Paired with these notes is Python code that solves the problems described in the text.

            Keywords 

Dynamic Stochastic Optimization, Method of Simulated Moments, Structural Estimation, Indirect Inference

            JEL codes 

E21, F41

    PDF:  https://github.com/llorracc/SolvingMicroDSOPs/blob/master/SolvingMicroDSOPs.pdf

 Slides:  https://github.com/llorracc/SolvingMicroDSOPs/blob/master/SolvingMicroDSOPs-Slides.pdf

    Web:  https://llorracc.github.io/SolvingMicroDSOPs

   Code:  https://github.com/llorracc/SolvingMicroDSOPs/tree/master/Code

Archive:  https://github.com/llorracc/SolvingMicroDSOPs

          (Contains LaTeX code for this document and software producing figures and results)

1 Carroll: Department of Economics, Johns Hopkins University, Baltimore, MD, ccarroll@jhu.edu. The notes were originally written for my Advanced Topics in Macroeconomic Theory class at Johns Hopkins University; instructors elsewhere are welcome to use them for teaching purposes. Relative to earlier drafts, this version incorporates several improvements related to new results in the paper “Theoretical Foundations of Buffer Stock Saving” (especially tools for approximating the consumption and value functions). Like the last major draft, it also builds on material in “The Method of Endogenous Gridpoints for Solving Dynamic Stochastic Optimization Problems” published in Economics Letters, available at http://www.econ2.jhu.edu/people/ccarroll/EndogenousArchive.zip, and includes sample code for a method of simulated moments estimation of the life cycle model a la ? and Cagetti (?). Background derivations, notation, and related subjects are treated in my class notes for first year macro, available at http://www.econ2.jhu.edu/people/ccarroll/public/lecturenotes/consumption. I am grateful to several generations of graduate students for helping me to refine these notes, to Marc Chan for help in updating the text and software to be consistent with ?, to Kiichi Tokuoka for drafting the section on structural estimation, to Damiano Sandri for exceptionally insightful help in revising and updating the method of simulated moments estimation section, and to Weifeng Wu and Metin Uyanik for revising to be consistent with the ‘method of moderation’ and other improvements. All errors are my own. This document can be cited as ? in the references.

1 Introduction

These lecture notes provide a gentle introduction to a particular set of solution tools for the canonical consumption-saving/portfolio allocation problem. Specifically, the notes describe and solve optimization problems for a consumer facing uninsurable idiosyncratic risk to nonfinancial income (e.g., labor or transfer income), first without and then with optimal portfolio choice,1 with detailed intuitive discussion of various mathematical and computational techniques that, together, speed the solution by many orders of magnitude. The problem is solved with and without liquidity constraints, and the infinite horizon solution is obtained as the limit of the finite horizon solution. After the basic consumption/saving problem with a deterministic interest rate is described and solved, an extension with portfolio choice between a riskless and a risky asset is also solved. Finally, a simple example shows how to use these methods (via the statistical ‘method of simulated moments’ (MSM for short)) to estimate structural parameters like the coefficient of relative risk aversion (a la Gourinchas and Parker (?) and Cagetti (?)).

2 The Problem

The usual analysis of dynamic stochastic programming problems packs a great many events (intertemporal choice, stochastic shocks, intertemporal returns, income growth, the taking of expectations, time discounting, and more) into a complex decision in which the agent makes an optimal choice simultaneously taking all these elements into account. For the dissection here, we will be careful to break down everything that happens into distinct operations so that each element can be scrutinized and understood in isolation.

We are interested in the behavior of a consumer who begins period $t$ with a certain amount of ‘capital’ $\mathbf{k}_t$, which is immediately rewarded by a return factor $R_t$, with the proceeds deposited in a bank balance:

\begin{equation}
\mathbf{b}_t = \mathbf{k}_t R_t. \tag{1}
\end{equation}

Simultaneously with the realization of the capital return, the consumer also receives noncapital income yt  , which is determined by multiplying the consumer’s ‘permanent income’ pt  by a transitory shock 𝜃t  :

\begin{equation}
\mathbf{y}_t = \mathbf{p}_t \theta_t \tag{2}
\end{equation}

whose expectation is 1 (that is, before realization of the transitory shock, the consumer’s expectation is that actual income will on average be equal to permanent income $\mathbf{p}_t$).

The combination of bank balances $\mathbf{b}$ and income $\mathbf{y}$ defines the consumer’s ‘market resources’ (sometimes called ‘cash-on-hand,’ following ?):

\begin{equation}
\mathbf{m}_t = \mathbf{b}_t + \mathbf{y}_t, \tag{3}
\end{equation}

available to be spent on consumption ct  for a consumer subject to a liquidity constraint that requires c ≤ m  (though we are not imposing such a constraint yet - see subsection 6.7). Finally we define

\begin{equation}
\mathbf{a}_t = \mathbf{m}_t - \mathbf{c}_t \tag{4}
\end{equation}

mnemonically as ‘assets-after-all-actions-are-accomplished.’

The consumer’s goal is to maximize discounted utility from consumption over the rest of a lifetime ending at date T  :

\begin{equation}
\max~ \mathbb{E}_t\left[\sum_{n=0}^{T-t} \beta^{n} u(\mathbf{c}_{t+n})\right]. \tag{5}
\end{equation}

Income evolves according to:

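\begin{equation}
\begin{aligned}
\mathbf{p}_{t+1} &= \mathcal{G}_{t+1}\mathbf{p}_t && \text{– permanent labor income dynamics} \\
\log \theta_{t+n} &\sim \mathcal{N}(-\sigma_{\theta}^{2}/2, \sigma_{\theta}^{2}) && \text{– lognormal transitory shocks } \forall~ n > 0.
\end{aligned} \tag{6}
\end{equation}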

Equation (6) indicates that we are allowing for a predictable average profile of income growth over the lifetime $\{\mathcal{G}\}_{0}^{T}$ (to capture typical career wage paths, pension arrangements, etc.).2 Finally, the utility function is of the Constant Relative Risk Aversion (CRRA) form, $u(\bullet) = \bullet^{1-\rho}/(1-\rho)$.

It is well known that this problem can be rewritten in recursive (Bellman) form:

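\begin{equation}
\mathbf{v}_t(\mathbf{m}_t, \mathbf{p}_t) = \max_{\mathbf{c}}~ u(\mathbf{c}) + \beta\, \mathbb{E}_t[\mathbf{v}_{t+1}(\mathbf{m}_{t+1}, \mathbf{p}_{t+1})] \tag{7}
\end{equation}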

subject to the Dynamic Budget Constraint (DBC) implicitly defined by equations (1)-(3) and to the transition equation that defines next period’s initial capital as this period’s end-of-period assets:

\begin{equation}
\mathbf{k}_{t+1} = \mathbf{a}_t. \tag{8}
\end{equation}

3 Normalization

The single most powerful method for speeding the solution of such models is to redefine the problem in a way that reduces the number of state variables (if at all possible). In the consumption context, the obvious idea is to see whether the problem can be rewritten in terms of the ratio of various variables to permanent noncapital (‘labor’) income pt  (henceforth for brevity, ‘permanent income.’)

In the last period of life $T$, there is no future value, $\mathbf{v}_{T+1} = 0$, so the optimal plan is to consume everything:

\begin{equation}
\mathbf{v}_T(\mathbf{m}_T, \mathbf{p}_T) = \frac{\mathbf{m}_T^{1-\rho}}{1-\rho}. \tag{9}
\end{equation}

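Now define nonbold variables as the bold variable divided by the level of permanent income in the same period, so that, for example, $m_T = \mathbf{m}_T/\mathbf{p}_T$; and define $v_T(m_T) = u(m_T)$.3 For our CRRA utility function, $u(xy) = x^{1-\rho}u(y)$, so (9) can be rewritten as

\begin{equation}
\begin{aligned}
\mathbf{v}_T(\mathbf{m}_T, \mathbf{p}_T) &= \mathbf{p}_T^{1-\rho}\, \frac{m_T^{1-\rho}}{1-\rho} \\
&= \mathbf{p}_{T-1}^{1-\rho}\, \mathcal{G}_T^{1-\rho}\, v_T(m_T).
\end{aligned} \tag{10}
\end{equation}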
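The homogeneity property behind (10) is easy to confirm numerically. The following is a minimal sketch, not code from the notebook; the parameter values and the names `crra_utility`, `p_T`, `m_level`, and `m_norm` are illustrative assumptions.

```python
import math


def crra_utility(c, rho):
    """CRRA utility u(c) = c**(1 - rho) / (1 - rho), valid for rho != 1."""
    return c ** (1.0 - rho) / (1.0 - rho)


rho = 2.0        # illustrative coefficient of relative risk aversion (assumed)
p_T = 1.5        # illustrative level of permanent income (assumed)
m_level = 3.0    # illustrative level of market resources, the bold m_T (assumed)
m_norm = m_level / p_T  # normalized market resources, the nonbold m_T

# Value in levels in the last period, as in equation (9)
v_level = crra_utility(m_level, rho)

# The same number recovered from the normalized value, as in equation (10)
v_from_norm = p_T ** (1.0 - rho) * crra_utility(m_norm, rho)

print(v_level, v_from_norm)
```

The two printed numbers agree up to floating-point rounding, which is the sense in which the normalized problem carries all the information in the level problem.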

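Now define a new optimization problem:

\begin{equation}
\begin{aligned}
v_t(m_t) &= \max_{c_t}~ u(c_t) + \beta\, \mathbb{E}_t[\mathcal{G}_{t+1}^{1-\rho} v_{t+1}(m_{t+1})] \\
&\text{s.t.} \\
a_t &= m_t - c_t \\
k_{t+1} &= a_t \\
b_{t+1} &= \underbrace{(R/\mathcal{G}_{t+1})}_{\equiv \mathcal{R}_{t+1}} k_{t+1} \\
m_{t+1} &= b_{t+1} + \theta_{t+1},
\end{aligned} \tag{11}
\end{equation}

where division by $\mathcal{G}_{t+1}$ in the second-to-last equation yields a normalized return factor $\mathcal{R}_{t+1}$, a consequence of the fact that we have divided $t+1$ level variables by $\mathbf{p}_{t+1} = \mathcal{G}_{t+1}\mathbf{p}_t$.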

Then it is easy to see that for $t = T-1$, we can write boldface (nonnormalized) $\mathbf{v}$ as a function of $v$ (normalized value) and permanent income:

\begin{equation}
\mathbf{v}_t(\mathbf{m}_t, \mathbf{p}_t) = \mathbf{p}_t^{1-\rho}\, v_t(m_t), \tag{12}
\end{equation}

and so on back to all earlier periods. Hence, if we solve the problem (11), which has only a single state variable $m_t$, we can obtain the levels of the value function from (12), and of consumption and all other variables from the corresponding permanent-income-normalized solution objects by multiplying each by $\mathbf{p}_t$, e.g.

\[
\mathbf{c}_t(\mathbf{m}_t, \mathbf{p}_t) = \mathbf{p}_t\, c_t(\overbrace{\mathbf{m}_t/\mathbf{p}_t}^{m_t}).
\]

We have thus reduced the problem from two continuous state variables to one (and thereby enormously simplified its solution).

For future reference it is useful to write (11) in the traditional way, by substituting $b_{t+1}$, $k_{t+1}$, and $a_t$ into $m_{t+1}$:

\begin{equation}
v_t(m_t) = \max_{c}~ u(c) + \beta\, \mathbb{E}_t[\mathcal{G}_{t+1}^{1-\rho}\, v_{t+1}(\overbrace{(m_t - c)(R/\mathcal{G}_{t+1}) + \theta_{t+1}}^{m_{t+1}})]. \tag{13}
\end{equation}

4 Notation

4.1 Periods, Stages, Steps

The problem so far assumes that the agent has only one decision problem to solve in any period. But it is increasingly common to model agents who have multiple choice stages per period; a problem might have, say, a consumption decision (call it the c  stage), a labor supply stage (call it ℓ  ) and a choice of what proportion ς  of their assets to invest in a risky asset (the portfolio-choice stage).

The modeler might well want to explore whether the order in which the stages are solved makes any difference, either to the substantive results or to aspects of the computational solution like speed and accuracy.

If, as in section 2, we hard-wire into the solution code for each stage an assumption that its successor stage will be something in particular (say, the consumption stage assumes that the portfolio choice is next), then if we want to change the order of the stages (say, labor supply after consumption, followed by portfolio choice), we will need to re-hard-wire each of the stages to know particular things about its new successor (for example, the specifics of the distribution of the rate of return on the risky asset must be known by whatever stage precedes the portfolio choice stage).

But one of the cardinal insights of Bellman’s (1957, “Dynamic Programming”) original work is that everything that matters for the solution to the current problem is encoded in a ‘continuation-value function.’ Using Bellman’s insight, we describe here a framework for isolating the stage problems within a period from each other, and the period from its successors in any future period; the advantage of this is that the isolated stage and period problems will then be ‘modular’: We can solve them in any order without changing any code (only transitions need to be rewired). After considering the stage-order [ℓ,c,ς]  , the modeler can costlessly reorder the stages to consider, say, the order [ℓ,ς,c]  .4

4.2 Steps

The key is to distinguish, within each stage’s Bellman problem, three steps:

  1. Arrival: Incoming state variables (e.g., k  ) are known, but any shocks associated with the period have not been realized and decision(s) have not yet been made

  2. Decision: The agent solves the decision problem for the period

  3. Continuation: After all decisions have been made, their consequences are measured by evaluation of the continuing-value function at the values of the ‘outgoing’ state variables (sometimes called ‘post-state’ variables)

Notice that this specification is silent about when the stochastic shocks are realized; this may occur either before or after the decision step. In the consumption problem we are studying, the natural choice is to assume that the shocks have been realized before the decision is made so that the consumer knows what their income has been for the period. In the portfolio problem we will examine below, the portfolio share decision must be made before the stochastic returns are realized.

When we want to refer to a specific step in the stage we will do so by using an indicator which identifies that step. Here we use the consumption stage problem described above to exemplify the usage:

        Step | Indicator | State | Usage  | Explanation
-------------|-----------|-------|--------|----------------------------------------
     Arrival | ↼         | k     | v↼(k)  | value at entry to stage (before shocks)
 Decision(s) | (blank)   | m     | v(m)   | value of stage-decision (after shocks)
Continuation | ⇁         | a     | v⇁(a)  | value at exit (after decision)

Notice that the value functions at different steps of the stage have distinct state variables. Only k  is known at the beginning of the stage, and other variables take on their values with equations like b = k ℛ  and m =  b + 𝜃.  We will refer to such within-the-stage creation of variables as ‘evolutions.’ So, the consumption stage problem has two evolutions: from k  to m  and from m  to a  .
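The two evolutions of the consumption stage, and the way the arrival value is built from the decision-step value, can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the function names, the two-point shock distribution, and the placeholder decision-step value function are all hypothetical.

```python
def evolve_k_to_m(k, R_norm, theta):
    """Evolution from the arrival state k to the decision state m."""
    b = k * R_norm      # b = k * R: bank balances once the return is realized
    return b + theta    # m = b + theta: market resources once income is realized


def evolve_m_to_a(m, c):
    """Evolution from the decision state m to the continuation state a."""
    return m - c


def arrival_value(k, v_decision, R_norm, thetas, weights):
    """Arrival value: expectation over shocks of the decision-step value v(m)."""
    return sum(
        w * v_decision(evolve_k_to_m(k, R_norm, theta))
        for theta, w in zip(thetas, weights)
    )


# Hypothetical inputs, chosen only to make the sketch runnable
thetas = [0.9, 1.1]     # two equiprobable transitory shock values (assumed)
weights = [0.5, 0.5]


def v_decision(m):
    """Placeholder decision-step value function (assumed, not a model solution)."""
    return -1.0 / m


value_at_entry = arrival_value(1.0, v_decision, 1.02, thetas, weights)
assets_at_exit = evolve_m_to_a(2.0, 0.5)
```

Keeping each evolution in its own function is one way to make the stage modular: a stage with a different set of shocks would only need a different `evolve_k_to_m`.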

4.3 Transitions

In the backward-induction world of Bellman solutions, to solve the problem of a particular period we must start with an end-of-period (continuation) value function, which we designate by explicitly including the period indicator in the subscript (the $:=$ symbol denotes that the object on the right hand side is assigned to the object on the left hand side; the left object ‘gets’ the right object):

\begin{equation}
v_{t\rightharpoondown}(a) := \beta\, v_{\leftharpoonup(t+1)}(\overbrace{a}^{=k}), \tag{14}
\end{equation}

and we are not done solving the problem of period $t$ until we have constructed a beginning-of-period value function $v_{\leftharpoonup t}(k)$.

Similarly, in order to solve the problem of any stage, we must endow it with an end-of-stage continuation-value function. For the last stage in a period, the end-of-stage function is taken to be the end-of-period value function; in our case where there is only one stage, this can be written cleanly as:

\begin{equation}
v_{\rightharpoondown}(a) := v_{t\rightharpoondown}(a). \tag{15}
\end{equation}
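One way to express the assignments (14) and (15) in code is sketched below. It is an illustration only: here the assignment is implemented as a closure holding a reference to the successor's arrival value function, which is one possible choice among several and is an assumption of this sketch, as are the names and the placeholder successor function.

```python
def make_end_of_period_value(v_arrival_next, beta):
    """Build the end-of-period value of t from the arrival value of t+1, as in (14)."""
    def v_end_of_period(a):
        # The outgoing state a of period t is the incoming state k of period t+1
        return beta * v_arrival_next(a)
    return v_end_of_period


def v_arrival_next(k):
    """Placeholder arrival value function for period t+1 (assumed)."""
    return -1.0 / (1.0 + k)


beta = 0.96  # illustrative discount factor (assumed)

v_end_of_period = make_end_of_period_value(v_arrival_next, beta)

# With a single stage, the end-of-stage function is the end-of-period function, as in (15)
v_end_of_stage = v_end_of_period

print(v_end_of_stage(1.0))
```

Because the stage only ever sees `v_end_of_stage`, reordering stages or swapping the successor period means changing which function is passed in, not the stage's own code.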

4.4 The Decision Problem in the New Notation

From ‘inside’ the decision stage, the Decision problem can now be written much more cleanly than in equation (11):

\begin{equation}
v(m) = \max_{c}~ u(c) + v_{\rightharpoondown}(\overbrace{m - c}^{=a}) \tag{16}
\end{equation}

5 The Usual Theory, and a Bit More Notation

For reference and to illustrate our new notation, we will now derive the Euler equation and other standard results for the problem described above. Since we can write value as of the end of the consumption stage as a function of $a$:

\[
v_{\rightharpoondown}(a) := v_{t\rightharpoondown}(a) := \beta\, v_{\leftharpoonup(t+1)}(a) = \beta\, \mathbb{E}_{\leftharpoonup(t+1)}[\mathcal{G}_{t+1}^{1-\rho}\, v_{t+1}(\overbrace{a(R/\mathcal{G}_{t+1}) + \theta_{t+1}}^{m_{t+1}})],
\]

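the first order condition for (13) with respect to $a$ (given $m_t$) is

\begin{equation}
\begin{aligned}
u^{c}(m_t - a) = v^{a}_{t\rightharpoondown}(a) &= \mathbb{E}_{\leftharpoonup(t+1)}[\beta\, \mathcal{R}_{t+1}\, \mathcal{G}_{t+1}^{1-\rho}\, v^{m}_{t+1}(m_{t+1})] \\
&= \mathbb{E}_{\leftharpoonup(t+1)}[\beta R\, \mathcal{G}_{t+1}^{-\rho}\, v^{m}_{t+1}(m_{t+1})]
\end{aligned} \tag{17}
\end{equation}

and because the Envelope theorem tells us that

\begin{equation}
v^{m}_{t}(m_t) = \mathbb{E}_{\leftharpoonup(t+1)}[\beta R\, \mathcal{G}_{t+1}^{-\rho}\, v^{m}_{t+1}(m_{t+1})] \tag{18}
\end{equation}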

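we can substitute the LHS of (18) for the RHS of (17) to get

\begin{equation}
u^{c}(c_t) = v^{m}_{t}(m_t) \tag{19}
\end{equation}

and rolling forward one period,

\begin{equation}
u^{c}(c_{t+1}) = v^{m}_{t+1}(a_t \mathcal{R}_{t+1} + \theta_{t+1}) \tag{20}
\end{equation}

so that substituting the LHS in equation (17) finally gives us the Euler equation for consumption:

\begin{equation}
u^{c}(c_t) = \mathbb{E}_{t\rightharpoondown}[\beta R\, \mathcal{G}_{t+1}^{-\rho}\, u^{c}(c_{t+1})]. \tag{21}
\end{equation}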

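We can now restate the problem (13) with our new within-stage notation:

\begin{equation}
v(m) = \max_{c}~ u(c) + v_{\rightharpoondown}(m - c) \tag{22}
\end{equation}

whose first order condition with respect to $c$ is

\begin{equation}
u^{c}(c) = v^{a}_{\rightharpoondown}(m - c) \tag{23}
\end{equation}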

which is mathematically equivalent to the usual Euler equation for consumption.

We will revert to this formulation when we reach section 6.8.

6 Solving the Next-to-Last Period

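To reduce clutter, we now temporarily assume that $\mathcal{G}_t = 1$ for all $t$, so that the $\mathcal{G}$ terms from the earlier derivations disappear. Setting $t = T$, the problem in the second-to-last period of life can now be expressed as

\begin{equation}
v_{(t-1)}(m) = \max_{c}~ u(c) + v_{(t-1)\rightharpoondown}(\overbrace{m - c}^{a}) \tag{24}
\end{equation}

where

\[
v_{(t-1)\rightharpoondown}(a) := \beta\, v_{\leftharpoonup t}(a) \equiv \beta\, \mathbb{E}_{\leftharpoonup t}\left[v_t(\underbrace{a\mathcal{R}_t + \theta_t}_{m_t})\right]
\]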

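Using (0) $t = T$; (1) $v_t(m) = u(m)$; (2) the definition of $u(m)$; and (3) the definition of the expectations operator,

\begin{equation}
v_{\leftharpoonup t}(a) = \int_{0}^{\infty} \frac{(a\mathcal{R}_t + \vartheta)^{1-\rho}}{1-\rho}\, d\mathcal{F}(\vartheta) \tag{25}
\end{equation}

where $\mathcal{F}(\theta)$ is the cumulative distribution function for $\theta$.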

This maximization problem implicitly defines a ‘local function’ $c_{t-1}(m)$ that yields optimal consumption in period $t-1$ for any specific numerical level of resources like $m = 1.7$.

But because there is no general analytical solution to this problem, for any given m  we must use numerical computational tools to find the c  that maximizes the expression. This is excruciatingly slow because for every potential c  to be considered, a definite integral over the interval (0,∞  )  must be calculated numerically, and numerical integration is very slow (especially over an unbounded domain!).

6.1 Discretizing the Distribution

Our first speedup trick is therefore to construct a discrete approximation to the lognormal distribution that can be used in place of numerical integration. That is, we want to approximate the expectation over $\theta$ of a function $g(\theta)$ by calculating its value at a set of $n_\theta$ points $\theta_i$, each of which has an associated probability weight $w_i$:

\[
\begin{aligned}
\mathbb{E}[g(\theta)] &= \int_{\underline{\theta}}^{\bar{\theta}} g(\vartheta)\, d\mathcal{F}(\vartheta) \\
&\approx \sum_{i=1}^{n_\theta} w_i\, g(\theta_i)
\end{aligned}
\]

(because adding n  weighted values to each other is enormously faster than general-purpose numerical integration).

Such a procedure is called a ‘quadrature’ method of integration; ? survey a number of options, but for our purposes we choose the one which is easiest to understand: An ‘equiprobable’ approximation (that is, one where each of the values of 𝜃i  has an equal probability, equal to 1∕n𝜃  ).

We calculate such an n  -point approximation as follows.

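Define a set of points from $\sharp_0$ to $\sharp_{n_\theta}$ on the $[0,1]$ interval as the elements of the set $\sharp = \{0, 1/n, 2/n, \ldots, 1\}$.5 Call the inverse of the $\theta$ distribution $\mathcal{F}^{-1}$, and define the points $\sharp_i^{-1} = \mathcal{F}^{-1}(\sharp_i)$. Then, because each of the intervals numbered 1 to $n$ has probability $1/n$, the conditional mean of $\theta$ in each interval is:

\begin{equation}
\theta_i \equiv \mathbb{E}[\theta \mid \sharp_{i-1}^{-1} \leq \theta < \sharp_i^{-1}] = n \int_{\sharp_{i-1}^{-1}}^{\sharp_i^{-1}} \vartheta\, d\mathcal{F}(\vartheta), \tag{26}
\end{equation}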

and when the integral is evaluated numerically for each i  the result is a set of values of 𝜃  that correspond to the mean value in each of the n  intervals.

The method is illustrated in Figure 1. The solid continuous curve represents the “true” CDF ℱ (𝜃)  for a lognormal distribution such that 𝔼[𝜃] = 1  , σ𝜃 = 0.1  . The short vertical line segments represent the n 𝜃  equiprobable values of 𝜃i  which are used to approximate this distribution.6

Figure 1: Equiprobable Discrete Approximation to Lognormal Distribution ℱ

Because one of the purposes of these notes is to connect the math to the code that solves the math, we display here a brief snippet from the notebook that constructs these points.
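A minimal sketch of such a construction follows. It is not the notebook's own code: the function name and arguments are illustrative assumptions, and instead of evaluating the integral in (26) numerically it uses the closed form available for this particular lognormal, namely that the mean of $\theta$ over the region below $\mathcal{F}^{-1}(p)$ equals $\Phi(\Phi^{-1}(p) - \sigma_\theta)$ when $\mathbb{E}[\theta] = 1$, where $\Phi$ is the standard normal CDF.

```python
from statistics import NormalDist


def equiprobable_lognormal(n, sigma):
    """Equiprobable n-point approximation to a lognormal theta with E[theta] = 1.

    Assumes log(theta) ~ N(-sigma**2 / 2, sigma**2), as in equation (6).
    Returns (points, weights): points[i] is the mean of theta conditional on
    theta lying in the i-th of n intervals that each have probability 1/n.
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

    def partial_mean(p):
        # E[theta * 1{theta <= F^{-1}(p)}], which for this lognormal equals
        # Phi(Phi^{-1}(p) - sigma); the limits are 0 at p = 0 and 1 at p = 1.
        if p <= 0.0:
            return 0.0
        if p >= 1.0:
            return 1.0
        return std_normal.cdf(std_normal.inv_cdf(p) - sigma)

    edges = [i / n for i in range(n + 1)]  # the set {0, 1/n, 2/n, ..., 1}
    # Conditional mean = (mean over the interval) / (probability 1/n of the interval)
    points = [
        n * (partial_mean(edges[i + 1]) - partial_mean(edges[i]))
        for i in range(n)
    ]
    weights = [1.0 / n] * n
    return points, weights


theta_points, theta_weights = equiprobable_lognormal(n=7, sigma=0.1)
print(theta_points)
```

By construction the weighted points average to 1, so the discrete approximation preserves the mean of the distribution it replaces.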

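We now substitute our approximation (27) for $v_{(t-1)\rightharpoondown}(a)$ in (24); the approximation is simply the sum of $n_\theta$ numbers and is therefore easy to calculate (compared to the full-fledged numerical integration (25) that it replaces).

\begin{equation}
v_{(t-1)\rightharpoondown}(a) = \beta \left(\frac{1}{n_\theta}\right) \sum_{i=1}^{n_\theta} \frac{(\mathcal{R}_t a + \theta_i)^{1-\rho}}{1-\rho} \tag{27}
\end{equation}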
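Equation (27) translates almost line for line into code. The sketch below is illustrative only; the function name, the two-point shock set, and the parameter values are assumptions made to keep the example short.

```python
import math


def end_of_period_value(a, thetas, beta, R_norm, rho):
    """Discretized end-of-period value of assets a, following equation (27)."""
    n = len(thetas)
    total = sum(
        (R_norm * a + theta) ** (1.0 - rho) / (1.0 - rho) for theta in thetas
    )
    return beta * total / n


# Hypothetical inputs (assumed values, not the notebook's baseline calibration)
thetas = [0.9, 1.1]
beta, R_norm, rho = 0.96, 1.02, 2.0

value_at_one = end_of_period_value(1.0, thetas, beta, R_norm, rho)
print(value_at_one)
```

Each evaluation costs only `len(thetas)` power operations, which is the source of the speedup relative to numerical integration over an unbounded domain.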

6.2 The Approximate Consumption and Value Functions

Given any particular value of m  , a numerical maximization tool can now find the c  that solves (24) in a reasonable amount of time.

The notebook code responsible for computing an estimated consumption function begins in “Solving the Model by Value Function Maximization,” where a vector containing a set of possible values of market resources $m$ is created. In the code, various $m$ vectors have names beginning mVec; in these notes we will use a boldface monotype (computer) font to represent vectors, so for example we can refer to our collection of $m$ points as $\boldsymbol{\mathtt{m}}$ with values indexed by brackets: $\boldsymbol{\mathtt{m}}[1]$ is the first entry in the vector, up to a last entry $\boldsymbol{\mathtt{m}}[-1]$. We arbitrarily (and suboptimally) pick the first five nonnegative integers as our five mVec gridpoints (in the code, mVec-int = {0.,1.,2.,3.,4.}).

6.3 An Interpolated Consumption Function

Finding the optimal $c$ at each of these gridpoints and connecting the resulting points is accomplished in “An Interpolated Consumption Function,” which generates an interpolating function that we designate $\grave{c}_{(t-1)}(m)$.
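The procedure can be sketched as follows. This is a hedged illustration rather than the notebook's implementation: the golden-section maximizer is a technique swapped in here for whatever optimizer the notebook uses, the three-point shock set and parameter values are hypothetical, and the upper bound on $c$ is the level at which next period's resources would hit zero in the worst shock (no liquidity constraint is imposed).

```python
import math
from bisect import bisect_right


def crra_utility(c, rho):
    return c ** (1.0 - rho) / (1.0 - rho)


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a single-peaked function f on the open interval (lo, hi)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:   # the peak lies in (x1, hi)
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
        else:         # the peak lies in (lo, x2)
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


def make_linear_interp(xs, ys):
    """Piecewise-linear interpolant through (xs, ys); extrapolates linearly."""
    def interp(x):
        j = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
        return ys[j] + slope * (x - xs[j])
    return interp


# Hypothetical inputs (assumed values, not the notebook's baseline calibration)
beta, R, rho = 0.96, 1.02, 2.0
thetas = [0.9, 1.0, 1.1]
m_grid = [0.0, 1.0, 2.0, 3.0, 4.0]


def objective(c, m):
    """Right-hand side of (24), with the expectation discretized as in (27)."""
    a = m - c
    future = sum(crra_utility(R * a + th, rho) for th in thetas) / len(thetas)
    return crra_utility(c, rho) + beta * future


c_grid = []
for m in m_grid:
    c_upper = m + min(thetas) / R  # beyond this, worst-case resources next period are <= 0
    c_grid.append(golden_section_max(lambda c: objective(c, m), 0.0, c_upper))

c_interp = make_linear_interp(m_grid, c_grid)
print(c_grid)
```

Every call to `objective` recomputes the whole expectation, which is the inefficiency that the later sections set out to remove.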

Figures 2 and 3 show plots of the constructed $\grave{c}_{t-1}$ and $\grave{v}_{t-1}$. While the $\grave{c}_{t-1}$ function looks very smooth, the fact that the $\grave{v}_{t-1}$ function is a set of line segments is very evident. Figure 3 provides the beginning of the intuition for why trying to approximate the value function directly is a bad idea (in this context).7

Figure 2: $c_{T-1}(m)$ (solid) versus $\grave{c}_{T-1}(m)$ (dashed)

Figure 3: $v_{T-1}$ (solid) versus $\grave{v}_{T-1}(m)$ (dashed)

6.4 Interpolating Expectations

Piecewise linear ‘spline’ interpolation as described above works well for generating a good approximation to the true optimal consumption function. However, there is a clear inefficiency in the program: Since it uses equation (24), for every value of $m$ the program must calculate the utility consequences of various possible choices of $c$ (and therefore $a_{t-1}$) as it searches for the best choice.

For any given index $j$ in $\boldsymbol{\mathtt{m}}[j]$, as it searches for the corresponding optimal $a$, the algorithm will end up calculating $v_{(t-1)\rightharpoondown}(\tilde{a})$ for many $\tilde{a}$ values close to the optimal $a_{t-1}$. Indeed, even when searching for the optimal $a$ for a different $m$ (say $\boldsymbol{\mathtt{m}}[k]$ for $k \neq j$) the search process might compute $v_{(t-1)\rightharpoondown}(a)$ for an $a$ close to the correct optimal $a$ for $\boldsymbol{\mathtt{m}}[j]$. But if that difficult computation does not correspond to the exact solution to the $\boldsymbol{\mathtt{m}}[k]$ problem, it is discarded.

The notebook section “Interpolating Expectations,” now interpolates the expected value of ending the period with a given amount of assets.8

Figure 4 compares the true value function to the approximation produced by following the interpolation procedure; the approximated and exact functions are of course identical at the gridpoints of $\boldsymbol{\mathtt{a}}$ and they appear reasonably close except in the region below $m = 1$.

Figure 4: End-Of-Period Value $v_{(t-1)\rightharpoondown}(a_{t-1})$ (solid) versus $\grave{v}_{(T-1)\rightharpoondown}(a_{T-1})$ (dashed)

Figure 5: $c_{T-1}(m)$ (solid) versus $\grave{c}_{T-1}(m)$ (dashed)

Nevertheless, the consumption rule obtained when the approximating $\grave{v}_{(t-1)\rightharpoondown}(a_{t-1})$ is used instead of $v_{(t-1)\rightharpoondown}(a_{t-1})$ is surprisingly bad, as shown in Figure 5. For example, when $m$ goes from 2 to 3, $\grave{c}_{t-1}$ goes from about 1 to about 2, yet when $m$ goes from 3 to 4, $\grave{c}_{t-1}$ goes from about 2 to about 2.05. The function fails even to be concave, which is distressing because Carroll and Kimball (?) prove that the correct consumption function is strictly concave in a wide class of problems that includes this one.

6.5 Value Function versus First Order Condition

Loosely speaking, our difficulty reflects the fact that the consumption choice is governed by the marginal value function, not by the level of the value function (which is the object that we approximated). To understand this point, recall that a quadratic utility function exhibits risk aversion because with a stochastic $c$,

\begin{equation}
\mathbb{E}[-(c - \not{c})^{2}] < -(\mathbb{E}[c] - \not{c})^{2} \tag{28}
\end{equation}

(where $\not{c}$ is the ‘bliss point’ which is assumed always to exceed feasible $c$). However, unlike the CRRA utility function, with quadratic utility the consumption/saving behavior of consumers is unaffected by risk since behavior is determined by the first order condition, which depends on marginal utility, and when utility is quadratic, marginal utility is unaffected by risk:

\begin{equation}
\mathbb{E}[-2(c - \not{c})] = -2(\mathbb{E}[c] - \not{c}). \tag{29}
\end{equation}

Intuitively, if one’s goal is to accurately capture choices that are governed by marginal value, numerical techniques that approximate the marginal value function will yield a more accurate approximation to optimal behavior than techniques that approximate the level of the value function.

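The first order condition of the maximization problem in period $T-1$ is:

\begin{equation}
\begin{aligned}
u^{c}(c) &= \beta\, \mathbb{E}_{(T-1)\rightharpoondown}[R\, u^{c}(c_t)] \\
c^{-\rho} &= R\beta \left(\frac{1}{n_\theta}\right) \sum_{i=1}^{n_\theta} (R(m - c) + \theta_i)^{-\rho}.
\end{aligned} \tag{30}
\end{equation}

Figure 6: $u^{c}(c)$ versus $v^{a}_{(T-1)\rightharpoondown}(3 - c)$, $v^{a}_{(T-1)\rightharpoondown}(4 - c)$, $\grave{v}^{a}_{(T-1)\rightharpoondown}(3 - c)$, $\grave{v}^{a}_{(T-1)\rightharpoondown}(4 - c)$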

The downward-sloping curve in Figure 6 shows the value of $c^{-\rho}$ for our baseline parameter values for $0 \leq c \leq 4$ (the horizontal axis). The solid upward-sloping curve shows the value of the RHS of (30) as a function of $c$ under the assumption that $m = 3$. Constructing this figure is time-consuming, because for every value of $c$ plotted we must calculate the RHS of (30). The value of $c$ for which the RHS and LHS of (30) are equal is the optimal level of consumption given that $m = 3$, so the intersection of the downward-sloping and the upward-sloping curves gives the (approximated) optimal value of $c$. As we can see, the two curves intersect just below $c = 2$. Similarly, the upward-sloping dashed curve shows the expected value of the RHS of (30) under the assumption that $m = 4$, and the intersection of this curve with $u^{c}(c)$ yields the optimal level of consumption if $m = 4$. These two curves intersect slightly below $c = 2.5$. Thus, increasing $m$ from 3 to 4 increases optimal consumption by about 0.5.
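Finding the intersection of the two sides of (30) is a one-dimensional root-finding problem. The sketch below uses simple bisection, which works because the left side falls and the right side rises as $c$ increases. It is an illustration under stated assumptions: the function name, the three-point shock set, and the parameter values are hypothetical and are not the baseline values behind Figure 6, so the numbers it produces need not match the figure.

```python
def solve_foc(m, thetas, beta, R, rho, tol=1e-12):
    """Find c such that c**(-rho) equals the right-hand side of equation (30)."""
    n = len(thetas)

    def foc_gap(c):
        marginal_utility = c ** (-rho)
        expected_marginal_value = (
            R * beta * sum((R * (m - c) + th) ** (-rho) for th in thetas) / n
        )
        return marginal_utility - expected_marginal_value

    # The gap is positive for tiny c and negative as c approaches the level at
    # which worst-case resources next period would reach zero.
    lo = 1e-9
    hi = m + min(thetas) / R - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs (assumed values)
beta, R, rho = 0.96, 1.02, 2.0
thetas = [0.9, 1.0, 1.1]

c_at_3 = solve_foc(3.0, thetas, beta, R, rho)
c_at_4 = solve_foc(4.0, thetas, beta, R, rho)
print(c_at_3, c_at_4)
```

As in the figure, raising $m$ by one unit raises optimal consumption by less than one unit.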

Now consider the derivative of our function $\grave{v}_{(t-1)\rightharpoondown}(a_{t-1})$. Because we have constructed $\grave{v}_{(t-1)\rightharpoondown}$ as a linear interpolation, the slope of $\grave{v}_{(t-1)\rightharpoondown}(a_{t-1})$ between any two adjacent points $\{\boldsymbol{\mathtt{a}}[i], \boldsymbol{\mathtt{a}}[i+1]\}$ is constant. The level of the slope immediately below any particular gridpoint is different, of course, from the slope above that gridpoint, a fact which implies that the derivative of $\grave{v}_{(t-1)\rightharpoondown}(a_{t-1})$ follows a step function.

The solid-line step function in Figure 6 depicts the actual value of $\grave{v}^{a}_{(t-1)\rightharpoondown}(3 - c)$. When we attempt to find optimal values of $c$ given $m$ using $\grave{v}_{(t-1)\rightharpoondown}(a_{t-1})$, the numerical optimization routine will return the $c$ for which $u^{c}(c) = \grave{v}^{a}_{(t-1)\rightharpoondown}(m - c)$. Thus, for $m = 3$ the program will return the value of $c$ for which the downward-sloping $u^{c}(c)$ curve intersects with the $\grave{v}^{a}_{(t-1)\rightharpoondown}(3 - c)$ step function; as the diagram shows, this value is exactly equal to 2. Similarly, if we ask the routine to find the optimal $c$ for $m = 4$, it finds the point of intersection of $u^{c}(c)$ with $\grave{v}^{a}_{(t-1)\rightharpoondown}(4 - c)$; and as the diagram shows, this intersection is only slightly above 2. Hence, this figure illustrates why the numerical consumption function plotted earlier returned values very close to $c = 2$ for both $m = 3$ and $m = 4$.

We would obviously obtain much better estimates of the point of intersection between $u^{c}(c)$ and $v^{a}_{(t-1)\rightharpoondown}(m - c)$ if our estimate of $\grave{v}^{a}_{(t-1)\rightharpoondown}$ were not a step function. In fact, we already know how to construct linear interpolations to functions, so the obvious next step is to construct a linear interpolating approximation to the expected marginal value of end-of-period assets function at the points in $\boldsymbol{\mathtt{a}}$:

\begin{equation}
v^{a}_{(t-1)\rightharpoondown}(\boldsymbol{\mathtt{a}}) = \beta R \left(\frac{1}{n_\theta}\right) \sum_{i=1}^{n_\theta} (\mathcal{R}_t \boldsymbol{\mathtt{a}} + \theta_i)^{-\rho} \tag{31}
\end{equation}

yielding $\boldsymbol{\mathtt{v}}^{a}_{(t-1)\rightharpoondown}$ (the vector of expected end-of-period-$(T-1)$ marginal values of assets corresponding to aVec), and construct $\grave{v}^{a}_{(t-1)\rightharpoondown}(a_{t-1})$ as the linear interpolating function that fits this set of points.
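A minimal sketch of this step follows, under the maintained assumption $\mathcal{G} = 1$ (so that $\mathcal{R}_t = R$). The helper names, grid, shock set, and parameter values are illustrative assumptions rather than the notebook's.

```python
from bisect import bisect_right


def expected_marginal_value(a, thetas, beta, R, rho):
    """Equation (31) evaluated at a single value of end-of-period assets a."""
    n = len(thetas)
    return beta * R * sum((R * a + th) ** (-rho) for th in thetas) / n


def make_linear_interp(xs, ys):
    """Piecewise-linear interpolant through (xs, ys); extrapolates linearly."""
    def interp(x):
        j = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
        return ys[j] + slope * (x - xs[j])
    return interp


# Hypothetical inputs (assumed values)
beta, R, rho = 0.96, 1.02, 2.0
thetas = [0.9, 1.0, 1.1]
a_grid = [0.0, 1.0, 2.0, 3.0, 4.0]

# The vector of marginal values at the gridpoints, then its interpolant
va_grid = [expected_marginal_value(a, thetas, beta, R, rho) for a in a_grid]
va_interp = make_linear_interp(a_grid, va_grid)

exact_mid = expected_marginal_value(0.5, thetas, beta, R, rho)
approx_mid = va_interp(0.5)
print(exact_mid, approx_mid)
```

Because the marginal value function is convex in $a$, the linear interpolant lies above the true function between gridpoints, and the gap is largest where the curvature is greatest, at low $a$.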

Figure 7: $v^{a}_{(t-1)\rightharpoondown}(a_{t-1})$ versus $\grave{v}^{a}_{(t-1)\rightharpoondown}(a_{t-1})$

The results are shown in Figure 7. The linear interpolating approximation looks roughly as good (or bad) for the marginal value function as it was for the level of the value function. However, Figure 8 shows that the new consumption function (long dashes) is a considerably better approximation of the true consumption function (solid) than was the consumption function obtained by approximating the level of the value function (short dashes).

Figure 8: $c_{t-1}(m)$ (solid) Versus Two Methods for Constructing $\grave{c}_{t-1}(m)$

6.6 Transformation

Even the new-and-improved consumption function diverges notably from the true solution, especially at lower values of $m$. That is because the linear interpolation does an increasingly poor job of capturing the nonlinearity of $v^{a}_{(t-1)\rightharpoondown}$ at lower and lower levels of $a$.

This is where we unveil our next trick. To understand the logic, start by considering the case where $\mathcal{R}_t = \beta = \mathcal{G}_t = 1$ and there is no uncertainty (that is, we know for sure that income next period will be $\theta_t = 1$). The final Euler equation (recall that we are still assuming that $t = T$) is then:

\begin{equation}
c_{t-1}^{-\rho} = c_{t}^{-\rho}. \tag{32}
\end{equation}

In the case we are now considering with no uncertainty and no liquidity constraints, the optimizing consumer does not care whether a unit of income is scheduled to be received in the future period $t$ or the current period $t-1$; there is perfect certainty that the income will be received, so the consumer treats its PDV as equivalent to a unit of current wealth. Total resources available at the point when the consumption decision is made therefore comprise two types: current market resources $m$ and ‘human wealth’ (the PDV of future income) of $h_{t-1} = 1$ (because it is the value of human wealth as of the end of the period, there is only one more period of income of 1 left). Since (32) implies $c_{t-1} = c_t$, the consumer spends half of total resources $m + 1$ in each period, and by (19) the marginal value function is:

\begin{equation}
v^{m}_{(t-1)}(m) = \left(\frac{m + 1}{2}\right)^{-\rho}. \tag{33}
\end{equation}

Of course, this is a highly nonlinear function. However, if we raise both sides of (33) to the power $(-1/\rho)$ the result is a linear function:

\begin{equation}
\left[v^{m}_{(t-1)}(m)\right]^{-1/\rho} = \frac{m + 1}{2}. \tag{34}
\end{equation}

This is a specific example of a general phenomenon: A theoretical literature discussed in ? establishes that under perfect certainty, if the period-by-period marginal utility function is of the form $c_t^{-\rho}$, the marginal value function will be of the form $(\gamma m_t + \zeta)^{-\rho}$ for some constants $\{\gamma, \zeta\}$. This means that if we were solving the perfect foresight problem numerically, we could always calculate a numerically exact (because linear) interpolation.
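The exactness claim can be checked directly with the perfect foresight marginal value function (33). In this sketch the choice $\rho = 2$, the two gridpoints, and the function names are illustrative assumptions.

```python
rho = 2.0  # illustrative coefficient of relative risk aversion (assumed)


def marginal_value_pf(m):
    """Perfect foresight marginal value, equation (33)."""
    return ((m + 1.0) / 2.0) ** (-rho)


def inverse_marginal_value_pf(m):
    """Marginal value raised to the power -1/rho, equation (34); linear in m."""
    return marginal_value_pf(m) ** (-1.0 / rho)


m_lo, m_hi = 0.0, 4.0        # two gridpoints (assumed)
m_mid = 0.5 * (m_lo + m_hi)  # point at which both interpolants are evaluated

# Interpolate the marginal value function directly
direct = 0.5 * (marginal_value_pf(m_lo) + marginal_value_pf(m_hi))

# Interpolate the transformed function, then undo the transformation
transformed = (
    0.5 * (inverse_marginal_value_pf(m_lo) + inverse_marginal_value_pf(m_hi))
) ** (-rho)

exact = marginal_value_pf(m_mid)
print(direct, transformed, exact)
```

With only two gridpoints the transformed interpolation reproduces the exact value up to rounding, while direct interpolation of the marginal value function is far off.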

To put the key insight in intuitive terms, the nonlinearity we are facing springs in large part from the fact that the marginal value function is highly nonlinear. But we have a compelling solution to that problem, because the nonlinearity springs largely from the fact that we are raising something to the power − ρ  . In effect, we can ‘unwind’ all of the nonlinearity owing to that operation and the remaining nonlinearity will not be nearly so great. Specifically, applying the foregoing insights to the end-of-period value function va(t−1)(a)  , we can define an ‘inverse marginal value’ function

\[
\Lambda^{a}_{t\rightharpoondown}(a) \equiv \left(v^{a}_{t\rightharpoondown}(a)\right)^{-1/\rho} \tag{35}
\]

which would be linear in the perfect foresight case.9 We then construct a piecewise-linear interpolating approximation to the $\Lambda^{a}_{t}$ function, $\grave{\Lambda}^{a}_{t\rightharpoondown}(a_t)$, and for any $a$ that falls in the range $\{\boldsymbol{a}[1], \boldsymbol{a}[-1]\}$ we obtain our approximation of marginal value from:

\[
\grave{v}^{a}_{t\rightharpoondown}(a) = \left[\grave{\Lambda}^{a}_{t\rightharpoondown}(a)\right]^{-\rho} \tag{36}
\]

The most interesting thing about all of this, though, is that the $\Lambda^{a}_{t}$ function has another interpretation. Recall our point in (23) that $u^{c}(c_t) = v^{a}_{t\rightharpoondown}(m_t - c_t)$. Since with CRRA utility $u^{c}(c) = c^{-\rho}$, this can be rewritten and inverted

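\[
\begin{aligned}
(c_t)^{-\rho} &= v^{a}_{t\rightharpoondown}(a_t) \\
c_t &= \left(v^{a}_{t\rightharpoondown}(a)\right)^{-1/\rho}.
\end{aligned} \tag{37}
\]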

What this means is that for any given a  , if we can calculate the marginal value associated with ending the period with that a  , then we can learn the level of c  that the consumer must have chosen if they ended up with that a  as the result of an optimal unconstrained choice. This leads us to an alternative interpretation of Λa  . It is the function that reveals, for any ending a  , how much the agent must have consumed to (optimally) get to that a  . We will therefore henceforth refer to it as the ‘consumed function:’

\[
\grave{c}_{t\rightharpoondown}(a_t) \equiv \grave{\Lambda}^{a}_{t\rightharpoondown}(a_t). \tag{38}
\]

Thus, for example, for period $t-1$ our procedure is to calculate the vector of $\boldsymbol{c}$ points on the consumed function:

\[
\boldsymbol{c} = c_{(t-1)\rightharpoondown}(\boldsymbol{a}) \tag{39}
\]

with the idea that we will construct an approximation of the consumed function $\grave{c}_{(t-1)\rightharpoondown}(a)$ as the interpolating function connecting these $\{\boldsymbol{a}, \boldsymbol{c}\}$ points.
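As a minimal Python sketch of (35) through (39), the following computes points on the consumed function for period $t-1$ when the next period is the last (so that $c_t(m) = m$). The parameter values, the shock discretization (quantile midpoints rescaled to mean one), and the names `discretize_lognormal`, `v_a_end`, and `consumed` are illustrative assumptions rather than the notebook's own code.

```python
import numpy as np
from statistics import NormalDist

rho, beta, R = 2.0, 0.96, 1.02   # illustrative values (assumed, not the notes' calibration)

def discretize_lognormal(n, sigma):
    """n equiprobable points for a mean-one lognormal (quantile midpoints, rescaled)."""
    z = np.array([NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)])
    theta = np.exp(sigma * z)
    return theta / theta.mean()

def v_a_end(a, theta):
    """End-of-period marginal value at t-1 when c_t(m) = m (next period is the last)."""
    m_next = np.atleast_1d(a)[:, None] * R + theta[None, :]
    return beta * R * np.mean(m_next ** (-rho), axis=1)

def consumed(a, theta):
    """Inverse marginal value: the c that makes ending the period with a optimal."""
    return v_a_end(a, theta) ** (-1.0 / rho)

a_vec = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
theta = discretize_lognormal(7, 0.2)
c_vec = consumed(a_vec, theta)             # points on the consumed function
c_pf = consumed(a_vec, np.array([1.0]))    # perfect foresight: linear in a
```

With a single certain income realization the consumed function is a straight line in $a$, which is why a piecewise-linear interpolant of it loses so little accuracy.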

6.7 The Natural Borrowing Constraint and the $\underline{a}_{t-1}$ Lower Bound

This is the appropriate moment to ask an awkward question: How should an interpolated, approximated ‘consumed’ function like `c(t−1)⇁(at−1)  be extrapolated to return an estimated ‘consumed’ amount when evaluated at an at−1   outside the range spanned by {aaa [1],...,aaa[n]} ?

For most canned piecewise-linear interpolation tools like scipy.interpolate, when the ‘interpolating’ function is evaluated at a point outside the provided range (and is allowed to extrapolate), the algorithm assumes that the slope of the function remains constant beyond its measured boundaries (that is, the slope is assumed to be equal to the slope of the nearest piecewise segment within the interpolated range); for example, if the bottommost gridpoint is $a_1 = \boldsymbol{a}[1]$ and the corresponding consumed level is $c_1 = c_{(t-1)\rightharpoondown}(a_1)$ we could calculate the ‘marginal propensity to have consumed’ $\varkappa_1 = \grave{c}^{a}_{(t-1)\rightharpoondown}(a_1)$ and construct the approximation as the linear extrapolation below $\boldsymbol{a}[1]$ from:

\[
\grave{c}_{(t-1)\rightharpoondown}(a) \equiv c_1 + (a - a_1)\varkappa_1. \tag{40}
\]

To see that this will lead us into difficulties, consider what happens to the true (not approximated) $v^{a}_{(t-1)\rightharpoondown}(a_{t-1})$ as $a_{t-1}$ approaches a quantity we will call the ‘natural borrowing constraint’: $\underline{a}_{t-1} = -\underline{\theta}\mathcal{R}_t^{-1}$. From (31) we have

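\[
\lim_{a \downarrow \underline{a}_{t-1}} v^{a}_{(t-1)\rightharpoondown}(a) = \lim_{a \downarrow \underline{a}_{t-1}} \beta R \left(\frac{1}{n_{\theta}}\right) \sum_{i=1}^{n_{\theta}} \left(a\mathcal{R}_t + \theta_i\right)^{-\rho}. \tag{41}
\]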

But since $\underline{\theta} = \theta_1$, exactly at $a = \underline{a}_{t-1}$ the first term in the summation would be $(-\underline{\theta} + \theta_1)^{-\rho} = 1/0^{\rho}$ which is infinity. The reason is simple: $-\underline{a}_{t-1}$ is the PDV, as of $t-1$, of the minimum possible realization of income in $t$ ($\mathcal{R}_t\underline{a}_{t-1} = -\theta_1$). Thus, if the consumer borrows an amount greater than or equal to $\underline{\theta}\mathcal{R}_t^{-1}$ (that is, if the consumer ends $t-1$ with $a_{t-1} \leq -\underline{\theta}\mathcal{R}_t^{-1}$) and then draws the worst possible income shock in period $t$, they will have to consume zero in period $t$, which yields $-\infty$ utility and $+\infty$ marginal utility.

As ? first noticed, this means that the consumer faces a ‘self-imposed’ (or, as above, ‘natural’) borrowing constraint (which springs from the precautionary motive): They will never borrow an amount greater than or equal to $\underline{\theta}\mathcal{R}_t^{-1}$ (that is, assets will never reach the lower bound of $\underline{a}_{t-1}$). The constraint is ‘self-imposed’ in the precise sense that if the utility function were different (say, Constant Absolute Risk Aversion), the consumer might be willing to borrow more than $\underline{\theta}\mathcal{R}_t^{-1}$ because a choice of zero or negative consumption in period $t$ would yield some finite amount of utility.10

This self-imposed constraint cannot be captured well when the $v^{a}_{(t-1)\rightharpoondown}$ function is approximated by a piecewise linear function like $\grave{v}^{a}_{(t-1)\rightharpoondown}$, because it is impossible for the linear extrapolation below $\underline{a}$ to correctly predict $v^{a}_{(t-1)\rightharpoondown}(\underline{a}_{t-1}) = \infty$.

So, the marginal value of saving approaches infinity as $a \downarrow \underline{a}_{t-1} = -\underline{\theta}\mathcal{R}_t^{-1}$. But this implies that $\lim_{a \downarrow \underline{a}_{t-1}} c_{(t-1)\rightharpoondown}(a) = \left(v^{a}_{(t-1)\rightharpoondown}(a)\right)^{-1/\rho} = 0$; that is, as $a$ approaches its ‘natural borrowing constraint’ minimum possible value, the corresponding amount of worst-case $c$ must approach its lower bound: zero.

The upshot is a realization that all we need to do to address these problems is to prepend each of the $\boldsymbol{a}_{t-1}$ and $\boldsymbol{c}_{t-1}$ from (39) with an extra point so that the first element in the mapping that produces our interpolation function is $\{\underline{a}_{t-1}, 0.\}$. This is done in section “The Self-Imposed ‘Natural’ Borrowing Constraint and the $\underline{a}_{t-1}$ Lower Bound” of the notebook.
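The prepending step can be sketched in a few lines of Python. This is an illustration under assumed parameter values and a hypothetical three-point income distribution; `consumed` and `c_hat` are names chosen here, and `np.interp` stands in for whatever interpolator the notebook actually uses.

```python
import numpy as np

rho, beta, R = 2.0, 0.96, 1.02            # illustrative values (assumed)
theta = np.array([0.7, 1.0, 1.3])         # hypothetical equiprobable income shocks

def consumed(a):
    """c implied by ending period t-1 with assets a, when c_t(m) = m."""
    m_next = np.atleast_1d(a)[:, None] * R + theta[None, :]
    return (beta * R * np.mean(m_next ** (-rho), axis=1)) ** (-1.0 / rho)

a_lower = -theta.min() / R                # natural borrowing constraint
a_vec = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Prepend the limiting point {a_lower, 0.} before building the interpolant
a_aug = np.concatenate(([a_lower], a_vec))
c_aug = np.concatenate(([0.0], consumed(a_vec)))

def c_hat(a):
    """Piecewise-linear approximation to the consumed function on [a_lower, 4]."""
    return np.interp(a, a_aug, c_aug)
```

Because the first interpolation node is the limit itself, the approximation returns zero consumption at the natural borrowing constraint instead of whatever a constant-slope extrapolation from the first segment would imply.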


pict
Figure 9: True $\Lambda^{a}_{(t-1)\rightharpoondown}(a)$ vs its approximation $\grave{\Lambda}^{a}_{(t-1)\rightharpoondown}(a)$

Figure 9 shows the result. The solid line plots the exact numerical value of the consumed function $c_{(t-1)\rightharpoondown}(a)$ while the dashed line is the linear interpolating approximation $\grave{c}_{(t-1)\rightharpoondown}(a)$. This figure illustrates the value of the transformation: The true function is close to linear, and so the linear approximation is almost indistinguishable from the true function except at the very lowest values of $a$.

Figure 10 similarly shows that when we generate $\grave{\grave{v}}^{a}_{(t-1)\rightharpoondown}(a)$ using our augmented $[\grave{c}_{(t-1)\rightharpoondown}(a)]^{-\rho}$ (dashed line) we obtain a much closer approximation to the true marginal value function $v^{a}_{(t-1)\rightharpoondown}(a)$ (solid line) than we obtained in the previous exercise which did not do the transformation (Figure 7).11


pict
Figure 10: True $v^{a}_{(t-1)\rightharpoondown}(a)$ vs. $\grave{\grave{v}}^{a}_{(t-1)\rightharpoondown}(a)$ Constructed Using $\grave{c}_{(t-1)\rightharpoondown}(a)$

6.8 The Method of Endogenous Gridpoints (‘EGM’)


The solution procedure above for finding $c_{t-1}(m)$ still requires us, for each point in $\boldsymbol{m}_{t-1}$, to use a numerical rootfinding algorithm to search for the value of $c$ that solves $u^{c}(c) = v^{a}_{(t-1)\rightharpoondown}(m - c)$. Though sections 6.6 and 6.7 developed a highly efficient and accurate procedure to calculate $\grave{v}^{a}_{(t-1)\rightharpoondown}$, those approximations do nothing to eliminate the need for using a rootfinding operation for calculating, for an arbitrary $m$, the optimal $c$. And rootfinding is a notoriously computation-intensive (that is, slow!) operation.

Fortunately, it turns out that there is a way to completely skip this slow rootfinding step. The method can be understood by noting that we have already calculated, for a set of arbitrary values of $\boldsymbol{a} = \boldsymbol{a}_{t-1}$, the corresponding $\boldsymbol{c}$ values for which this $\boldsymbol{a}$ is optimal.

But with mutually consistent values of $\boldsymbol{c}_{t-1}$ and $\boldsymbol{a}_{t-1}$ (consistent, in the sense that they are the unique optimal values that correspond to the solution to the problem), we can obtain the $\boldsymbol{m}_{t-1}$ vector that corresponds to both of them from

\[
\boldsymbol{m}_{t-1} = \boldsymbol{c}_{t-1} + \boldsymbol{a}_{t-1}. \tag{42}
\]


These $m$ gridpoints are “endogenous” in contrast to the usual solution method of specifying some ex-ante (exogenous) grid of values of $\boldsymbol{m}$ and then using a rootfinding routine to locate the corresponding optimal consumption vector $\boldsymbol{c}$.

This routine is performed in the “Endogenous Gridpoints” section of the notebook. First, the gothic.C_Tminus1 function is called for each of the pre-specified values of end-of-period assets stored in aVec. These values of consumption and assets are used to produce the list of endogenous gridpoints, stored in the object mVec_egm. With the $\boldsymbol{c}$ values in hand, the notebook can generate a set of $\boldsymbol{m}_{t-1}$ and $\boldsymbol{c}_{t-1}$ pairs that can be interpolated between in order to yield $\grave{c}_{(t-1)}(m)$ at virtually zero computational cost!12

One might worry about whether the {m,  c} points obtained in this way will provide a good representation of the consumption function as a whole, but in practice there are good reasons why they work well (basically, this procedure generates a set of gridpoints that is naturally dense right around the parts of the function with the greatest nonlinearity).
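To make the contrast concrete, here is a hedged sketch that builds the consumption function by endogenous gridpoints and, for comparison, solves the same first order condition by bisection at a single $m$. The parameter values and the three-point income distribution are assumptions, and `c_egm` and `c_rootfind` are hypothetical names (the notebook's own objects are gothic.C_Tminus1 and mVec_egm).

```python
import numpy as np

rho, beta, R = 2.0, 0.96, 1.02            # illustrative values (assumed)
theta = np.array([0.7, 1.0, 1.3])         # hypothetical equiprobable income shocks
a_lower = -theta.min() / R                # natural borrowing constraint

def v_a_end(a):
    """End-of-period marginal value at t-1 when c_t(m) = m."""
    m_next = np.atleast_1d(a)[:, None] * R + theta[None, :]
    return beta * R * np.mean(m_next ** (-rho), axis=1)

# EGM: start from an exogenous a grid, back out c, then m = c + a
a_vec = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
c_vec = v_a_end(a_vec) ** (-1.0 / rho)
m_vec = c_vec + a_vec                     # endogenous gridpoints

def c_egm(m):
    """Interpolated consumption function, including the limiting point (a_lower, 0)."""
    return np.interp(m, np.concatenate(([a_lower], m_vec)),
                     np.concatenate(([0.0], c_vec)))

def c_rootfind(m, n_iter=200):
    """Slow alternative: bisect on u'(c) = v_a_end(m - c) for a single m."""
    lo, hi = 1e-10, m - a_lower - 1e-10
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mid ** (-rho) > v_a_end(m - mid)[0]:
            lo = mid                      # marginal utility too high: consume more
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At each endogenous gridpoint the two approaches agree up to rounding, but the EGM version needs one evaluation of the marginal value function per gridpoint while the bisection needs hundreds per value of $m$.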

pict
Figure 11: $c_{t-1}(m)$ (solid) versus $\grave{c}_{t-1}(m)$ (dashed)

Figure 11 plots the actual consumption function $c_{t-1}$ and the approximated consumption function $\grave{c}_{t-1}$ derived by the method of endogenous gridpoints. Compared to the approximate consumption functions illustrated in Figure 8, $\grave{c}_{t-1}$ is quite close to the actual consumption function.

6.9 Improving the $a$ Grid

Thus far, we have arbitrarily used $a$ gridpoints of {0.,1.,2.,3.,4.} (augmented in the last subsection by $\underline{a}_{t-1}$). But it has been obvious from the figures that the approximated $\grave{c}_{(t-1)\rightharpoondown}$ function tends to be farthest from its true value at low values of $a$. Combining this with our insight that $\underline{a}_{t-1}$ is a lower bound, we are now in position to define a more deliberate method for constructing gridpoints for $a$ – a method that yields values that are more densely spaced at low values of $a$ where the function is more nonlinear.

A pragmatic choice that works well is to find the values such that (1) the last value exceeds the lower bound by the same amount $\bar{a}$ as our original maximum gridpoint (in our case, 4.); (2) we have the same number of gridpoints as before; and (3) the multi-exponential growth rate (that is, $e^{e^{e^{\cdots}}}$ for some number of exponentiations $n$ – our default is 3) from each point to the next point is constant (instead of, as previously, imposing constancy of the absolute gap between points).
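One way to build such a grid is to space points evenly after applying a nested $\log(1+x)$ transformation and then undo the nesting. The sketch below takes that approach; the function name `multi_exp_grid` and its arguments are hypothetical, and the construction is similar in spirit to, but not necessarily identical to, the one in the notebook.

```python
import numpy as np

def multi_exp_grid(a_lower, a_span, n_points, n_nest=3):
    """Grid from a_lower to a_lower + a_span, evenly spaced after n_nest log(1+x) transforms."""
    hi = a_span
    for _ in range(n_nest):
        hi = np.log1p(hi)                 # nested log of the upper end (lower end stays 0)
    grid = np.linspace(0.0, hi, n_points)
    for _ in range(n_nest):
        grid = np.expm1(grid)             # undo the nesting
    return a_lower + grid

a_grid = multi_exp_grid(a_lower=-0.7, a_span=4.0, n_points=6)
```

The gaps between successive gridpoints widen as $a$ rises, so most of the points sit near the lower bound where the function being approximated is most nonlinear.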

pict
Figure 12: $c_{(t-1)\rightharpoondown}(a)$ versus $\grave{c}_{(t-1)\rightharpoondown}(a)$, Multi-Exponential aVec

pict

Figure 13: $v^{a}_{(t-1)\rightharpoondown}(a)$ vs. $\grave{\grave{v}}^{a}_{(t-1)\rightharpoondown}(a)$, Multi-Exponential aVec

Section “Improve the Ggrid  ” begins by defining a function which takes as arguments the specifications of an initial grid of assets and returns the new grid incorporating the multi-exponential approach outlined above.

Notice that the graphs depicted in Figures 12 and 13 are notably closer to their respective truths than the corresponding figures that used the original grid.

6.10 Program Structure

In section “Solve for ct(m )  in Multiple Periods,” the natural and artificial borrowing constraints are combined with the endogenous gridpoints method to approximate the optimal consumption function for a specific period. Then, this function is used to compute the approximated consumption in the previous period, and this process is repeated for some specified number of periods.

The essential structure of the program is a loop that iteratively solves for consumption functions by working backward from an assumed final period, using the dictionary cFunc_life to store the interpolated consumption functions up to the beginning period. Consumption in a given period is utilized to determine the endogenous gridpoints for the preceding period. This is the sense in which the computation of optimal consumption is done recursively.
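A stripped-down version of that backward loop might look as follows. It is a sketch under assumed parameter values, with a hypothetical three-point income distribution, only the natural borrowing constraint, and a hand-rolled interpolator `make_linear`; only the dictionary name `cFunc_life` is taken from the text.

```python
import numpy as np

rho, beta, R = 2.0, 0.96, 1.02            # illustrative values (assumed)
theta = np.array([0.7, 1.0, 1.3])         # hypothetical equiprobable income shocks
# a gridpoints measured from each period's lower bound, denser near the bound
offsets = np.expm1(np.linspace(np.log1p(0.01), np.log1p(40.0), 60))
T = 20

def make_linear(x, y):
    """Piecewise-linear interpolant that extrapolates with the end segments' slopes."""
    def f(q):
        q = np.asarray(q, dtype=float)
        i = np.clip(np.searchsorted(x, q) - 1, 0, len(x) - 2)
        return y[i] + (y[i + 1] - y[i]) / (x[i + 1] - x[i]) * (q - x[i])
    return f

cFunc_life = {T: lambda m: np.asarray(m, dtype=float)}   # last period: consume everything
m_lower = 0.0                             # lowest feasible m in period T

for t in range(T - 1, -1, -1):
    a_lower = (m_lower - theta.min()) / R # natural borrowing constraint for period t
    a_vec = a_lower + offsets
    m_next = a_vec[:, None] * R + theta[None, :]
    v_a = beta * R * np.mean(cFunc_life[t + 1](m_next) ** (-rho), axis=1)
    c_vec = v_a ** (-1.0 / rho)
    m_vec = a_vec + c_vec                 # endogenous gridpoints
    cFunc_life[t] = make_linear(np.concatenate(([a_lower], m_vec)),
                                np.concatenate(([0.0], c_vec)))
    m_lower = a_lower                     # c -> 0 as m -> a_lower

early_gap = abs(float(cFunc_life[T - 1](2.0)) - float(cFunc_life[T - 2](2.0)))
late_gap = abs(float(cFunc_life[1](2.0)) - float(cFunc_life[0](2.0)))
```

Each pass uses only the consumption function stored for the following period, so the whole life cycle is solved with one EGM step per period and no rootfinding.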

For a realistic life cycle problem, it would also be necessary at a minimum to calibrate a nonconstant path of expected income growth over the lifetime that matches the empirical profile; allowing for such a calibration is the reason we have included the $\{\mathcal{G}\}_{t}^{T}$ vector in our computational specification of the problem.

6.11 Results

The code creates the relevant `ct(m )  functions for any period in the horizon, at the given values of m  . Figure 14 shows `cT −n(m )  for n =  {20,15,10,5, 1} . At least one feature of this figure is encouraging: the consumption functions converge as the horizon extends, something that ? shows must be true under certain parametric conditions that are satisfied by the baseline parameter values being used here.

pict

Figure 14: Converging $\grave{c}_{T-n}(m)$ Functions as $n$ Increases

7 The Infinite Horizon


All of the solution methods presented so far have involved period-by-period iteration from an assumed last period of life, as is appropriate for life cycle problems. However, if the parameter values for the problem satisfy certain conditions (detailed in ?), the consumption rules (and the rest of the problem) will converge to a fixed rule as the horizon (remaining lifetime) gets large, as illustrated in Figure 14. Furthermore, Deaton (?), Carroll (??) and others have argued that the ‘buffer-stock’ saving behavior that emerges under some further restrictions on parameter values is a good approximation of the behavior of typical consumers over much of the lifetime. Methods for finding the converged functions are therefore of interest, and are dealt with in this section.

Of course, the simplest such method is to solve the problem as specified above for a large number of periods. This is feasible, but there are much faster methods.

7.1 Convergence


In solving an infinite-horizon problem, it is necessary to have some metric that determines when to stop because a solution that is ‘good enough’ has been found.

A natural metric is defined by the unique ‘target’ level of wealth that ? proves will exist in problems of this kind under certain conditions: The $\hat{m}$ such that

\[
\mathbb{E}_t[m_{t+1}/m_t] = 1 \text{ if } m_t = \hat{m} \tag{43}
\]

where the accent is meant to signify that this is the value that other m  ’s ‘point to.’

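Given a consumption rule $c(m)$ it is straightforward to find the corresponding $\hat{m}$. So for our problem, a solution is declared to have converged if the following criterion is met: $|\hat{m}_{t+1} - \hat{m}_{t}| < \epsilon$, where $\hat{m}_{t}$ and $\hat{m}_{t+1}$ are the targets implied by the consumption rules from successive iterations and $\epsilon$ is a very small number that defines our convergence tolerance.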

Similar criteria can obviously be specified for other problems. However, it is always wise to plot successive function differences and to experiment a bit with convergence criteria to verify that the function has converged for all practical purposes.
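The target-based criterion can be sketched as below, assuming normalized variables with $\mathcal{G} = 1$, a mean-one income shock, and an interest factor of 1.03, so that $\mathbb{E}[m_{t+1}] = (m - c(m))R + 1$. The functions `find_target_m` and `has_converged` and the two linear consumption rules are hypothetical stand-ins for the rules produced by successive iterations.

```python
def find_target_m(c_func, R=1.03, lo=0.0, hi=50.0, tol=1e-12):
    """Bisect for m_hat such that E[m_next] = m_hat, with m_next = (m - c(m)) R + theta."""
    def expected_growth_gap(m):
        return (m - c_func(m)) * R + 1.0 - m
    assert expected_growth_gap(lo) > 0 > expected_growth_gap(hi), "target not bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_growth_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def has_converged(c_new, c_old, eps=1e-6):
    """Convergence criterion: successive targets differ by less than eps."""
    return abs(find_target_m(c_new) - find_target_m(c_old)) < eps

# Hypothetical linear consumption rules from two successive iterations
c_old = lambda m: 0.2 + 0.5 * m
c_new = lambda m: 0.2 + 0.5 * m + 1e-9
m_hat = find_target_m(c_new)
```

A single scalar summarizes each iteration here, which is convenient, but as the text cautions it is still worth plotting the difference between successive consumption functions over the whole range of $m$.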

8 Multiple Control Variables


We now consider how to solve problems with multiple control variables. Specifically, we will examine a consumer who has both a choice of how much to consume and a choice of how much of their unconsumed resources to invest in risky versus safe assets.

8.1 Theory


The portfolio-share control variable is captured by the archaic Greek character ‘stigma’: $\varsigma$ represents the share of their available assets the agent invests in the risky asset (conventionally, the stock market). Designating the return factor for the risky asset as $\mathbf{R}$, the riskless return factor as $\mathsf{R}$, and the share of the portfolio invested in the risky asset as $\varsigma$, the realized portfolio return factor $\mathfrak{R}$ as a function of the share $\varsigma$ is:

\[
\mathfrak{R}(\varsigma) = \mathsf{R} + (\mathbf{R} - \mathsf{R})\varsigma. \tag{44}
\]

If we imagine the portfolio share decision as being made simultaneously with the c  decision, the traditional way of writing the problem is (substituting the budget constraint):

\[
v_t(m) = \max_{\{c,\varsigma\}} ~ u(c) + \mathbb{E}\left[\beta v_{t+1}\left((m - c)\mathfrak{R}(\varsigma) + \theta_{t+1}\right)\right] \tag{45}
\]

where we have deliberately omitted the period-designating subscripts for ς  and the return factors to highlight the point that, once the consumption and ς  decisions have been made, it makes no difference to this equation whether the risky return factor R  is revealed a nanosecond before the end of the current period or a nanosecond after the beginning of the successor period.

8.2 Stages Within a Period


In most cases it is possible to take multiple-control problems and turn them into a sequence of single-control ‘stages’ which can be solved sequentially. For this problem we will call the ‘consumption stage’ c  and the ‘portfolio stage’ ς  . Our earlier point that, substantively, the timing of the realization of the return shocks does not matter means that these could come in either order in the period: We designate the ‘portfolio choice first, then consumption’ version by [ς,c]  and the ‘consumption choice first, then portfolio’ scheme as [c,ς]  .

In a problem with multiple stages, if we want to refer to a sub-step of a particular stage – say, the Arrival step of the portfolio stage – we simply add a stage-indicator subscript (in square brackets) to the notation we have been using until now. That is, the Arrival value function of the portfolio stage would be $v_{\leftharpoonup[\varsigma]}$. (The version where both choices are made simultaneously could be designated as a single stage named $[c\varsigma]$, with arrival value function $v_{\leftharpoonup[c\varsigma]}$.)

8.2.1 The (Revised) Consumer’s Problem

A slight modification to the consumer’s problem specified earlier is necessary to make the stages of the problem completely modular. The difficulty with the earlier formulation is that it assumed that asset returns occurred in the middle step of the consumption problem. Our revised version of the consumption problem takes as its input state the amount of bank balances that have resulted from any prior portfolio decision. The problem is therefore:

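\[
\begin{aligned}
v_{[c]}(m) &= \max_{c} ~ u(c) + v_{[c]\rightharpoondown}(\underbrace{m - c}_{a}) \\
v_{\leftharpoonup[c]}(b) &= \mathbb{E}_{\leftharpoonup[c]}\left[v_{[c]}(\overbrace{b + \theta}^{m})\right]
\end{aligned} \tag{46}
\]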

8.2.2 The Investor’s Problem
{subsubsec:investors-problem}

Consider the standalone problem of an ‘investor’ whose continuation-value function $v_{[\varsigma]\rightharpoondown}$ depends on how much wealth $\acute{w}$ they end up with after the realization of the stochastic risky return.

Using the ˘  accent to designate the optimized value of the accented control, the Decision stage of this problem yields the portfolio share function:

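\[
\breve{\varsigma}(w) = \arg\max_{\varsigma} ~ \mathbb{E}_{[\varsigma]}\left[v_{[\varsigma]\rightharpoondown}(\overbrace{w\mathfrak{R}(\varsigma)}^{\acute{w}})\right], \tag{47}
\]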

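and the Arrival value function is the expectation of the Continuation-value function over the wealth that results from the portfolio returns obtained under the choice of portfolio share made in the Decision step of the problem, $\breve{\mathfrak{R}} = \mathsf{R} + \breve{\varsigma}(w)(\mathbf{R} - \mathsf{R})$ (with $\mathbf{R}$ the risky and $\mathsf{R}$ the riskless return factor):

\[
v_{\leftharpoonup[\varsigma]}(w) = \mathbb{E}_{[\varsigma]}\left[v_{[\varsigma]\rightharpoondown}(w\breve{\mathfrak{R}})\right]. \tag{48}
\]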
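The Decision and Arrival steps in (47) and (48) can be illustrated with a small Python sketch. It assumes a CRRA continuation value over next-period wealth, a lognormal risky return discretized into equiprobable points (quantile midpoints with a mean-preserving rescale), and a grid search over the share; all names and the riskless rate of 0.03 are assumptions, while the risk aversion, premium, and standard deviation echo values quoted later in the text.

```python
import numpy as np
from statistics import NormalDist

rho = 6.0                                 # relative risk aversion
r, phi, sigma_r = 0.03, 0.04, 0.15        # riskless rate (assumed), equity premium, return std
R_free = np.exp(r)

# Equiprobable discretization of the lognormal risky return
n_r = 15
z = np.array([NormalDist().inv_cdf((j + 0.5) / n_r) for j in range(n_r)])
R_risky = np.exp(sigma_r * z)
R_risky *= np.exp(phi + r) / R_risky.mean()

def v_cont(w_next):
    """Hypothetical continuation value: CRRA utility of next-period wealth."""
    return w_next ** (1.0 - rho) / (1.0 - rho)

share_grid = np.linspace(0.0, 1.0, 1001)

def optimal_share(w):
    """Decision step: share maximizing expected continuation value, by grid search."""
    R_port = R_free + (R_risky[None, :] - R_free) * share_grid[:, None]
    return share_grid[np.argmax(np.mean(v_cont(w * R_port), axis=1))]

def v_arrival(w):
    """Arrival value: expected continuation value under the optimal share."""
    R_opt = R_free + optimal_share(w) * (R_risky - R_free)
    return np.mean(v_cont(w * R_opt))
```

With this particular continuation value the chosen share does not depend on $w$, because wealth only rescales the objective; dependence on wealth arises once the continuation value reflects labor income, as in the consumer's problem.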

The reward for all this notational investment is that it is now clear that exactly the same code for solving the portfolio share problem can be used in two distinct problems: a ‘beginning-of-period-returns’ model and an ‘end-of-period-returns’ model.

8.2.3 The ‘beginning-of-period returns’ Problem

The beginning-returns problem effectively just inserts a portfolio choice that happens at a stage immediately before the consumption stage in the optimal consumption problem described in (46), for which we had a beginning-of-stage value function v↼ [c](b)  . The agent makes their portfolio share decision within the stage but (obviously) before the risky returns R  for the period have been realized. So the problem’s portfolio-choice stage also takes k  as its initial state and solves the investor’s problem outlined in section 8.2.2:

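\[
\begin{aligned}
v_{\leftharpoonup[\varsigma]}(k) &= \mathbb{E}_{\leftharpoonup[\varsigma]}\left[v_{[\varsigma]\rightharpoondown}(\underbrace{k\breve{\mathfrak{R}}}_{b})\right] \\
v_{[\varsigma]\rightharpoondown}(b) &= v_{\leftharpoonup[c]}(b)
\end{aligned} \tag{49}
\]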

Since in this setup bank balances have been determined before the consumption problem starts, we need to rewrite the consumption stage as a function of the bank balances $b$ that will have resulted from the portfolio investment, combined with the income shocks $\theta$:

\[
v_{\leftharpoonup[c]}(b) = \max_{c} ~ u(c) + \mathbb{E}_{\leftharpoonup[c]}\left[v_{[c]\rightharpoondown}(\underbrace{\overbrace{b + \theta}^{m} - c}_{a})\right] \tag{50}
\]

and since the consumption stage is the last stage in the period, the (undated) $a$ that emerges from this equation is equivalent to the $a_t$ characterizing the end of the period. The ‘state transition’ equation between $t$ and $t+1$ is simply $k_{t+1} = a_t$ and the continuation-value function transition is $v_{t\rightharpoondown}(k) := \beta v_{\leftharpoonup(t+1)}(k)$ which reflects the above-mentioned point that there is no substantive difference between the two problems (their $v_{[c]}(m)$ value functions and $c(m)$ functions will be identical).

\[
v_{[c]\rightharpoondown}(a) = v_{t\rightharpoondown}(a) \tag{51}
\]

(and recall that vt⇁ (a )  is exogenously provided as an input to the period’s problem via the transition equation assumed earlier: vt⇁(a) = βv ↼(t+1)(a)  ).

8.2.4 The ‘end-of-period-returns’ Problem

If the portfolio share is chosen and risky returns are realized at the end of the period, we need to move the portfolio choice stage to immediately before the point at which returns are realized (and after the $c$ choice has been made). This creates a slight awkwardness because the variable we have heretofore dubbed $a$ is no longer the end-of-period state, since this money must be invested and the returns realized before the end of the period. We want to continue using $a$ for ‘assets-after-all-actions-are-accomplished’ but now to include the ‘actions’ of the market, so we will temporarily designate the consumer’s unspent market resources by $w = m - c$, because we defined $w$ earlier as the input to the investor’s problem. So, the portfolio stage of the problem is

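\[
v_{\leftharpoonup[\varsigma]}(w) = \mathbb{E}_{\leftharpoonup[\varsigma]}\left[v_{[\varsigma]\rightharpoondown}(\underbrace{\breve{\mathfrak{R}}w}_{\equiv a})\right] \tag{52}
\]

so the continuation-value function $v_{[\varsigma]\rightharpoondown}(a)$, with $a \equiv \breve{\mathfrak{R}}w$, is still a function of $a$ (and the ‘state transition’ equation between $t$ and $t+1$ remains $k_{t+1} = a_t$ and the continuation-value function transition is $v_{t\rightharpoondown}(a) := \beta v_{\leftharpoonup(t+1)}(k)$).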

(Note that we are assuming that there will be only one consumption function in the period, so no stage subscript is necessary to pick out ‘the consumption function’).

8.2.5 Numerical Solution

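We can solve the portfolio-share problem numerically for the optimal $\varsigma$ at a vector of $\boldsymbol{a}$ (aVec in the code) and then construct an approximated optimal portfolio share function $\grave{\breve{\varsigma}}(a)$ as the interpolating function among the members of the $\{\boldsymbol{a}, \boldsymbol{\varsigma}\}$ mapping. Having done this, we can now calculate a vector of values and marginal values that correspond to aVec:

\[
\begin{aligned}
\boldsymbol{v} &= v_{\leftharpoonup[\varsigma]}(\boldsymbol{a}) \\
\boldsymbol{v}^{a} &= v^{a}_{\leftharpoonup[\varsigma]}(\boldsymbol{a}).
\end{aligned} \tag{53}
\]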

With the $\boldsymbol{v}^{a}$ approximation described in hand, we can construct our approximation to the consumption function using exactly the same EGM procedure that we used in solving the problem without a portfolio choice (see (35)):

\[
\boldsymbol{c} \equiv \left(\boldsymbol{v}^{a}\right)^{-1/\rho}, \tag{54}
\]

which, following a procedure identical to that in the EGM subsection 6.8, yields an approximated consumption function $\grave{c}_t(m)$. Thus, again, we can construct the consumption function at nearly zero cost (once we have calculated $\boldsymbol{v}^{a}$).

8.2.6 The Point

The upshot is that all we need to do is change some of the transition equations and we can use the same solution code (both for the $\varsigma$-stage and the $c$-stage) to solve the problem with either assumption (beginning-of-period or end-of-period) about the timing of portfolio choice. There is even an obvious notation for the two problems: $v_{\leftharpoonup t[\varsigma c]}$ can be the period-arrival value function for the version where the portfolio share is chosen at the beginning of the period, and $v_{\leftharpoonup t[c\varsigma]}$ is the period-arrival value function for the problem where the share choice is at the end.

What is the benefit of writing effectively the identical problem in two different ways? There are several:

8.3 Application


In specifying the stochastic process for the risky return $\mathbf{R}_{t+1}$, we follow the common practice of assuming that returns are lognormally distributed, $\log \mathbf{R} \sim \mathcal{N}(\phi + r - \sigma^{2}_{r}/2, \sigma^{2}_{r})$ where $\phi$ is the equity premium over the thin returns $r$ available on the riskless asset.13

As with labor income uncertainty, it is necessary to discretize the rate-of-return risk in order to have a problem that is soluble in a reasonable amount of time. We follow the same procedure as for labor income uncertainty, generating a set of nr  equiprobable shocks to the rate of return; in a slight abuse of notation, we will designate the portfolio-weighted return (contingent on the chosen portfolio share in equity, and potentially contingent on any other aspect of the consumer’s problem) simply as ℜi,j  (where dependence on i  is allowed to permit the possibility of nonzero correlation between the return on the risky asset and the 𝜃  shock to labor income (for example, in recessions the stock market falls and labor income also declines)).

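The direct expressions for the derivatives of $v_{\rightharpoondown}$ (with $\mathbf{R}$ the risky and $\mathsf{R}$ the riskless return factor) are

\[
\begin{aligned}
v^{a}_{\rightharpoondown}(a_t, \varsigma_t) &= \beta \left(\frac{1}{n_r n_{\theta}}\right) \sum_{i=1}^{n_{\theta}} \sum_{j=1}^{n_r} \mathfrak{R}_{i,j} \left(c_{t+1}(\mathfrak{R}_{i,j}a_t + \theta_i)\right)^{-\rho} \\
v^{\varsigma}_{\rightharpoondown}(a_t, \varsigma_t) &= \beta \left(\frac{1}{n_r n_{\theta}}\right) \sum_{i=1}^{n_{\theta}} \sum_{j=1}^{n_r} (\mathbf{R}_{i,j} - \mathsf{R}) \left(c_{t+1}(\mathfrak{R}_{i,j}a_t + \theta_i)\right)^{-\rho}.
\end{aligned} \tag{55}
\]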

Writing these equations out explicitly makes a problem very apparent: For every different combination of $\{a_t, \varsigma_t\}$ that the routine wishes to consider, it must perform two double-summations of $n_r \times n_{\theta}$ terms. Once again, there is an inefficiency if it must perform these same calculations many times for the same or nearby values of $\{a_t, \varsigma_t\}$, and again the solution is to construct an approximation to the (inverses of the) derivatives of the $v_{\rightharpoondown}$ function.

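Details of the construction of the interpolating approximations are given below; assume for the moment that we have the approximations $\grave{v}^{a}_{\rightharpoondown}$ and $\grave{v}^{\varsigma}_{\rightharpoondown}$ in hand and we want to proceed. As noted above in the discussion of (45), nonlinear equation solvers can find the solution to a set of simultaneous equations. Thus we could ask one to solve

\[
\begin{aligned}
c_t^{-\rho} &= \grave{v}^{a}_{t\rightharpoondown}(m_t - c_t, \varsigma_t) \\
0 &= \grave{v}^{\varsigma}_{t\rightharpoondown}(m_t - c_t, \varsigma_t)
\end{aligned} \tag{56}
\]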

simultaneously for c  and ς  at the set of potential mt  values defined in mVec. However, as noted above, multidimensional constrained maximization problems are difficult and sometimes quite slow to solve.

There is a better way. Define the problem

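\[
\begin{aligned}
\breve{v}_{t\rightharpoondown}(a_t) = \max_{\varsigma_t} ~ & v_{\rightharpoondown}(a_t, \varsigma_t) \\
\text{s.t.} ~ & 0 \leq \varsigma_t \leq 1
\end{aligned}
\]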

where the breve accent over $\breve{v}(a)$ indicates that this is the $v$ that has been optimized with respect to all of the arguments other than the one still present ($a_t$). We solve this problem for the set of gridpoints in aVec and use the results to construct the interpolating function $\grave{\breve{v}}^{a}_{t}(a_t)$.14 With this function in hand, we can use the first order condition from the single-control problem

\[
c_t^{-\rho} = \grave{\breve{v}}^{a}_{t}(m_t - c_t)
\]

to solve for the optimal level of consumption as a function of mt  using the endogenous gridpoints method described above. Thus we have transformed the multidimensional optimization problem into a sequence of two simple optimization problems.
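The two-step approach can be sketched as follows for period $t-1$ with $c_t(m) = m$. The three-point return and income distributions, the parameter values, and the names `marginal_values` and `best_share` are all assumptions made for illustration; the share is found by bisection on $v^{\varsigma} = 0$ with the corners handled explicitly, and the consumption function then follows from the endogenous gridpoints method.

```python
import numpy as np

rho, beta, R_free = 6.0, 0.96, 1.03       # illustrative values (assumed)
theta = np.array([0.7, 1.0, 1.3])         # hypothetical equiprobable income shocks
R_risky = np.array([0.85, 1.07, 1.29])    # hypothetical equiprobable risky returns

def marginal_values(a, share):
    """(v^a, v^share) at end of t-1, with c_t(m) = m and independent shocks (double sum)."""
    R_port = R_free + (R_risky - R_free) * share
    mu_next = (a * R_port[None, :] + theta[:, None]) ** (-rho)   # u'(c_t) for each (i, j)
    v_a = beta * np.mean(R_port[None, :] * mu_next)
    v_s = beta * np.mean((R_risky - R_free)[None, :] * mu_next)
    return v_a, v_s

def best_share(a, n_iter=60):
    """Step 1: solve v^share = 0 on [0, 1] by bisection, respecting the corners."""
    if marginal_values(a, 1.0)[1] >= 0.0:
        return 1.0
    if marginal_values(a, 0.0)[1] <= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if marginal_values(a, mid)[1] > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

# Step 2: EGM using the marginal value already optimized over the share
a_vec = np.linspace(0.1, 10.0, 25)
share_vec = np.array([best_share(a) for a in a_vec])
v_a_opt = np.array([marginal_values(a, s)[0] for a, s in zip(a_vec, share_vec)])
c_vec = v_a_opt ** (-1.0 / rho)
m_vec = a_vec + c_vec
```

Each step is a one-dimensional problem: a bounded search over the share at each $a$, followed by a closed-form inversion for $c$, so no multidimensional solver is required.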

Note the parallel between this trick and the fundamental insight of dynamic programming: Dynamic programming techniques transform a multi-period (or infinite-period) optimization problem into a sequence of two-period optimization problems which are individually much easier to solve; we have done the same thing here, but with multiple dimensions of controls rather than multiple periods.

8.4 Implementation

Following the discussion from section 8.1, to provide a numerical solution to the problem with multiple control variables, we must define expressions that capture the expected marginal value of end-of-period assets with respect to the level of assets and the share invested in risky assets. This is addressed in “Multiple Control Variables.”

8.5 Results With Multiple Controls


Figure 15 plots the t − 1  consumption function generated by the program; qualitatively it does not look much different from the consumption functions generated by the program without portfolio choice.

But Figure 16, which plots the optimal portfolio share as a function of the level of assets, exhibits several interesting features. First, even with a coefficient of relative risk aversion of 6, an equity premium of only 4 percent, and an annual standard deviation in equity returns of 15 percent, the optimal choice at values of $a_t$ less than about 2 is for the agent to invest a proportion 1 (100 percent) of the portfolio in stocks (instead of the safe bank account with riskless return R). Second, the proportion of the portfolio kept in stocks is declining in the level of wealth - i.e., the poor should hold all of their meager assets in stocks, while the rich should be cautious, holding more of their wealth in safe bank deposits and less in stocks. This seemingly bizarre (and highly counterfactual – see ?) prediction reflects the nature of the risks the consumer faces. Those consumers who are poor in measured financial wealth will likely derive a high proportion of future consumption from their labor income. Since by assumption labor income risk is uncorrelated with rate-of-return risk, the covariance between their future consumption and future stock returns is relatively low. By contrast, persons with relatively large wealth will be paying for a large proportion of future consumption out of that wealth, and hence if they invest too much of it in stocks their consumption will have a high covariance with stock returns. Consequently, they reduce that covariance by holding some of their wealth in the riskless form.

pict

Figure 15: $c(m_1)$ With Portfolio Choice

pict

Figure 16: Portfolio Share in Risky Assets in First Period $\varsigma(a)$

9 Structural Estimation


This section describes how to use the methods developed above to structurally estimate a life-cycle consumption model, following closely the work of ?.15 The key idea of structural estimation is to look for the parameter values (for the time preference rate, relative risk aversion, or other parameters) which lead to the best possible match between simulated and empirical moments.

9.1 Life Cycle Model


Realistic calibration of a life cycle model needs to take into account a few things that we omitted from the bare-bones model described above. For example, the whole point of the life cycle model is that life is finite, so we need to include a realistic treatment of life expectancy; this is done easily enough, by assuming that utility accrues only if you live, so effectively the rising mortality rate with age is treated as an extra reason for discounting the future. Similarly, we may want to capture the demographic evolution of the household (e.g., arrival and departure of kids). A common way to handle that, too, is by modifying the discount factor (arrival of a kid might increase the total utility of the household by, say, 0.2, so if the ‘pure’ time preference factor were 1.0 the ‘household-size-adjusted’ discount factor might be 1.2). We therefore modify the model presented above to allow age-varying discount factors that capture both mortality and family-size changes (we just adopt the factors used by ? directly), with the probability of remaining alive between $t$ and $t+n$ captured by $\mathcal{L}$ and with $\hat{\beta}$ now reflecting all the age-varying discount factor adjustments (mortality, family-size, etc). Using $\beth$ (the Hebrew cognate of $\beta$) for the ‘pure’ time preference factor, the value function for the revised problem is

\[
v_{t}(p_{t}, m_{t}) = \max_{\{c\}_{t}^{T}} ~ u(c_{t}) + \mathbb{E}_{t}\left[\sum_{n=1}^{T-t} \beth^{n} \mathcal{L}_{t}^{t+n} \hat{\beta}_{t}^{t+n} u(c_{t+n})\right] \tag{57}
\]

subject to the constraints

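\[
\begin{aligned}
a_{t} & = m_{t} - c_{t} \\
p_{t+1} & = \mathcal{G}_{t+1} p_{t} \psi_{t+1} \\
y_{t+1} & = p_{t+1} \theta_{t+1} \\
m_{t+1} & = R a_{t} + y_{t+1}
\end{aligned}
\]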

where

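\[
\begin{aligned}
\mathcal{L}_{t}^{t+n} & : \text{probability to } \mathcal{L}\text{ive until age } t+n \text{ given alive at age } t \\
\hat{\beta}_{t}^{t+n} & : \text{age-varying discount factor between ages } t \text{ and } t+n \\
\psi_{t} & : \text{mean-one shock to permanent income} \\
\beth & : \text{time-invariant ‘pure’ discount factor}
\end{aligned}
\]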

and all the other variables are defined as in section 2.

Households start life at age s = 25  and live with probability 1 until retirement (s = 65  ). Thereafter the survival probability shrinks every year and agents are dead by s = 91  as assumed by Cagetti.
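The survival assumptions can be sketched in a few lines. The linear decline after retirement is purely illustrative (the actual probabilities come from Cagetti's tables, which are not reproduced here), and the names `one_period_survival` and `cumulative_survival` are hypothetical.

```python
def one_period_survival(age, retire_age=65, max_age=91):
    """Probability of surviving from `age` to `age + 1`.

    Hypothetical shape: certain survival before retirement, then a linear
    decline that reaches zero so that nobody is alive at `max_age`.
    """
    if age < retire_age:
        return 1.0
    if age >= max_age - 1:
        return 0.0
    # Linear decline between retirement and the last possible year of life.
    return 1.0 - (age - retire_age + 1) / (max_age - retire_age)


def cumulative_survival(start_age, n):
    """Probability of being alive at start_age + n, given alive at start_age.

    This plays the role of the cumulative survival term in equation (57).
    """
    prob = 1.0
    for age in range(start_age, start_age + n):
        prob *= one_period_survival(age)
    return prob
```

Under these assumed numbers, `cumulative_survival(25, 40)` is 1 (everyone reaches age 65) and `cumulative_survival(25, 66)` is 0 (nobody reaches age 91).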

Transitory and permanent shocks are distributed as follows:

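\[
\begin{aligned}
\Xi_{s} & = \begin{cases}
0 & \text{with probability } q > 0 \\
\theta_{s}/(1-q) & \text{with probability } (1-q), \text{ where } \log \theta_{s} \sim \mathcal{N}(-\sigma_{\theta}^{2}/2, \sigma_{\theta}^{2})
\end{cases} \\
\log \psi_{s} & \sim \mathcal{N}(-\sigma_{\psi}^{2}/2, \sigma_{\psi}^{2})
\end{aligned} \tag{58}
\]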

where q  is the probability of unemployment (and unemployment shocks are turned off after retirement).
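A minimal sketch of how such shocks might be discretized, using only the Python standard library. It assumes an equiprobable approximation to the lognormal (one of several possible choices) and assumes that employed-state draws are scaled by 1/(1 − q) so that the transitory shock keeps a mean of one. The function names are hypothetical and are not taken from the accompanying code.

```python
import math
from statistics import NormalDist


def discretize_mean_one_lognormal(sigma, n):
    """n equiprobable points approximating log X ~ N(-sigma^2/2, sigma^2).

    Each point is the conditional mean of X within one of n equal-probability
    bins, so the points average to E[X] = 1.
    """
    std_normal = NormalDist()
    mu = -0.5 * sigma ** 2
    # Bin edges in standardized (z) units; the outer edges are +/- infinity.
    z_edges = ([-math.inf]
               + [std_normal.inv_cdf(i / n) for i in range(1, n)]
               + [math.inf])
    points = []
    for lo, hi in zip(z_edges[:-1], z_edges[1:]):
        # E[X * 1{lo < z < hi}] = exp(mu + sigma^2/2) * (Phi(hi - sigma) - Phi(lo - sigma))
        mass = std_normal.cdf(hi - sigma) - std_normal.cdf(lo - sigma)
        points.append(math.exp(mu + 0.5 * sigma ** 2) * mass * n)
    return points


def add_unemployment(theta_points, q):
    """Return (probabilities, values) once a zero-income event is added."""
    n = len(theta_points)
    probs = [q] + [(1.0 - q) / n] * n
    # Assumed rescaling by 1/(1 - q) keeps the overall mean equal to one.
    values = [0.0] + [theta / (1.0 - q) for theta in theta_points]
    return probs, values


theta_points = discretize_mean_one_lognormal(sigma=0.1, n=7)
probs, values = add_unemployment(theta_points, q=0.005)
```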

The shock parameters q = 0.5∕100, σ𝜃 = 0.1, and σψ = 0.1 are taken from Carroll (?).16 The income growth profile 𝒢t is from Carroll (?) and the values of ℒt and β̂t are obtained from Cagetti (?) (Figure 17).17 The gross interest factor R is assumed to equal 1.03. The model parameters are included in Table 1.

pict

Figure 17: Time Varying Parameters

Table 1: Parameter Values

------------------------------------------
 Parameter    Value        Source
------------------------------------------
 σ𝜃           0.1          Carroll (?)
 σψ           0.1          Carroll (?)
 q            0.005        Carroll (?)
 𝒢s           Figure 17    Carroll (?)
 β̂s, ℒs       Figure 17    Cagetti (?)
 R            1.03         Cagetti (?)
------------------------------------------

The structural estimation of the parameters ℶ  and ρ  is carried out using the procedure specified in the following section, which is then implemented in the StructEstimation.py file. This file consists of two main components. The first section defines the objects required to execute the structural estimation procedure, while the second section executes the procedure and various optional experiments with their corresponding commands. The next section elaborates on the procedure and its accompanying code implementation in greater detail.

9.2 Estimation

When economists say that they are performing “structural estimation” of a model like this, they mean that they have devised a formal procedure for searching for values for the parameters ℶ and ρ at which some measure of the model’s outcome (like “median wealth by age”) is as close as possible to an empirical measure of the same thing. Here, we choose to match the median of the wealth to permanent income ratio across 7 age groups, from age 26 − 30 up to 56 − 60.18 The choice of matching the medians rather than the means is motivated by the fact that the wealth distribution is much more concentrated at the top than the model is capable of explaining using a single set of parameter values. This means that in practice one must pick some portion of the population whom one wants to match well; since the model has little hope of capturing the behavior of Bill Gates, but might conceivably match the behavior of Homer Simpson, we choose to match medians rather than means.

As explained in section 3, it is convenient to work with the normalized version of the model which can be written in Bellman form as:

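\[
\begin{aligned}
v_{t}(m_{t}) & = \max_{c_{t}} ~ u(c_{t}) + \beth \mathcal{L}_{t+1} \hat{\beta}_{t+1} \mathbb{E}_{t}\left[(\psi_{t+1} \mathcal{G}_{t+1})^{1-\rho} v_{t+1}(m_{t+1})\right] \\
& \text{s.t.} \\
a_{t} & = m_{t} - c_{t} \\
m_{t+1} & = a_{t} \underbrace{\left(\frac{R}{\psi_{t+1} \mathcal{G}_{t+1}}\right)}_{\equiv \mathcal{R}_{t+1}} + \theta_{t+1}
\end{aligned}
\]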

with the first order condition:

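\[
u^{c}(c_{t}) = \beth \mathcal{L}_{t+1} \hat{\beta}_{t+1} R \, \mathbb{E}_{t}\left[u^{c}\left(\psi_{t+1} \mathcal{G}_{t+1} c_{t+1}(a_{t} \mathcal{R}_{t+1} + \theta_{t+1})\right)\right]. \tag{59}
\]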

The first substantive step in this estimation procedure is to solve for the consumption functions at each age. We need to discretize the shock distribution and solve for the policy functions by backward induction using equation (59) following the procedure in sections 6 and ‘Recursion.’ The latter routine is slightly complicated by the fact that we are considering a life-cycle model and therefore the growth rate of permanent income, the probability of death, the time-varying discount factor and the distribution of shocks will be different across the years. We thus must ensure that at each backward iteration the right parameter values are used.
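One backward step of this recursion might look as follows. This is a sketch under stated assumptions, not the code in the repository: it assumes CRRA utility, so that marginal utility is c^(−ρ) and equation (59) can be inverted analytically on a grid of end-of-period assets, and the function name and argument names are hypothetical.

```python
def backward_step(a_grid, c_next, rho, beth, alive, beta_hat, R, G, shocks):
    """One backward-induction step based on the first order condition (59).

    a_grid   : end-of-period asset values a_t (normalized by permanent income)
    c_next   : next period's consumption function c_{t+1}(m)
    alive, beta_hat, G : age-specific survival probability, discount-factor
                         adjustment and income growth factor for period t+1
    shocks   : list of (probability, psi, theta) triples for period t+1
    Returns the lists (m_grid, c_grid) that trace out c_t(m).
    Note: if zero-income events are possible, a_grid must be strictly positive.
    """
    m_grid, c_grid = [], []
    for a in a_grid:
        expected_mu = 0.0
        for prob, psi, theta in shocks:
            m_next = a * R / (psi * G) + theta
            expected_mu += prob * (psi * G * c_next(m_next)) ** (-rho)
        # Invert u'(c) = c**(-rho) to recover consumption today.
        c = (beth * alive * beta_hat * R * expected_mu) ** (-1.0 / rho)
        c_grid.append(c)
        m_grid.append(a + c)
    return m_grid, c_grid


# Illustration with made-up numbers: no uncertainty and beth * R = 1,
# starting from the terminal rule "consume everything".
m_grid, c_grid = backward_step(
    [1.0], lambda m: m, rho=3.0, beth=1 / 1.03, alive=1.0, beta_hat=1.0,
    R=1.03, G=1.0, shocks=[(1.0, 1.0, 1.0)],
)
```

In a life-cycle setting this step would be called once per age, working backward from the final period and passing in that age's values of `alive`, `beta_hat`, `G` and `shocks`, with `c_next` replaced by an interpolant through the previous step's `(m_grid, c_grid)`.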

Correspondingly, the first part of the StructEstimation.py file begins by defining the agent type by inheriting from the baseline agent type IndShockConsumerType, with the modification to include time-varying discount factors. Next, an instance of this “life-cycle” consumer is created for the estimation procedure. The number of periods for the life cycle of a given agent is set and, following Cagetti (?), we initialize the wealth to income ratio of agents at age 25 by randomly assigning each agent one of the values 0.17, 0.50 and 0.83 with equal probability. In particular, we consider a population of agents at age 25 and follow their consumption and wealth accumulation dynamics as they reach the age of 60, using the appropriate age-specific consumption functions and the age-varying parameters. The simulated medians are obtained by taking the medians of the wealth to income ratio of the 7 age groups.
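The initialization and the grouping into age bands can be illustrated with a small standalone sketch. It does not use HARK; the function names are hypothetical, and pooling all agent-years within a five-year band before taking the median is an assumption about how the groups are formed.

```python
import random
from statistics import median

# The seven five-year age bands, 26-30 up to 56-60.
AGE_GROUPS = [(26, 30), (31, 35), (36, 40), (41, 45), (46, 50), (51, 55), (56, 60)]


def draw_initial_ratios(n_agents, seed=0):
    """Assign each agent an initial wealth to income ratio at age 25."""
    rng = random.Random(seed)
    return [rng.choice([0.17, 0.50, 0.83]) for _ in range(n_agents)]


def medians_by_age_group(history):
    """history maps age -> list of wealth to income ratios, one per agent."""
    result = []
    for first, last in AGE_GROUPS:
        pooled = [x for age in range(first, last + 1) for x in history[age]]
        result.append(median(pooled))
    return result
```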

To complete the creation of the consumer type needed for the simulation, a history of shocks is drawn for each agent across all periods by invoking the make_shock_history function. This involves discretizing the shock distribution into as many points as the number of agents we want to simulate and then randomly permuting this shock vector once for each period we need to simulate. In this way, we obtain a time-varying shock for each agent. This is much more time efficient than drawing a fresh shock from the distribution for each agent in each period, and it also ensures a stable distribution of shocks across the simulation periods even for a small number of agents. (Similarly, in order to speed up the process, at each backward iteration we compute the consumption function and other variables as a vector at once.)
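The permutation idea can be shown in isolation. The sketch below is not the make_shock_history function from the toolkit; `make_permuted_shock_history` is a hypothetical stand-in that takes a vector with one equiprobable shock value per agent and reshuffles it for every period.

```python
import random


def make_permuted_shock_history(shock_points, n_periods, seed=0):
    """Return a list of length n_periods; entry t holds one shock per agent.

    shock_points has one (equiprobable) point per agent. Every period hands
    out the same points in a fresh random order, so the cross-sectional
    distribution of shocks is identical in every period.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n_periods):
        period_shocks = list(shock_points)
        rng.shuffle(period_shocks)
        history.append(period_shocks)
    return history
```

Because each period is a permutation of the same vector, the sample mean and variance of the shocks are exactly the same in every period, which is the stability property mentioned above.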

With the age-varying consumption functions derived from the life-cycle agent, we can proceed to generate simulated data and compute the corresponding medians. Estimating the model involves comparing these simulated medians with empirical medians, measuring the model’s success by calculating the difference between the two. However, before performing the necessary steps of solving and simulating the model to generate simulated moments, it’s important to note a difficulty in producing the target moments using the available data.

Specifically, defining ξ  as the set of parameters to be estimated (in the current case ξ = {ρ, ℶ} ), we could search for the parameter values which solve

\[
\min_{\xi} \sum_{\tau=1}^{7} |\varsigma^{\tau} - s^{\tau}(\xi)| \tag{60}
\]

where ςτ  and sτ  are respectively the empirical and simulated medians of the wealth to permanent income ratio for age group τ  . A drawback of proceeding in this way is that it treats the empirically estimated medians as though they reflected perfect measurements of the truth. Imagine, however, that one of the age groups happened to have (in the consumer survey) four times as many data observations as another age group; then we would expect the median to be more precisely estimated for the age group with more observations; yet (60) assigns equal importance to a deviation between the model and the data for all age groups.

We can get around this problem (and a variety of others) by instead minimizing a slightly more complex object:

\[
\min_{\xi} \sum_{i}^{N} \omega_{i} |\varsigma_{i}^{\tau} - s^{\tau}(\xi)| \tag{61}
\]

where ωi  is the weight of household i  in the entire population,19 and ςiτ  is the empirical wealth to permanent income ratio of household i  whose head belongs to age group τ  . ωi  is needed because unequal weight is assigned to each observation in the Survey of Consumer Finances (SCF). The absolute value is used since the formula is based on the fact that the median is the value that minimizes the sum of the absolute deviations from itself.
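The link between the weighted median and the weighted sum of absolute deviations can be checked numerically. The following sketch uses made-up numbers; `weighted_median_sketch` is a hypothetical illustration and is not the weighted_median object from the accompanying code.

```python
def weighted_abs_deviation(target, values, weights):
    """Weighted sum of absolute deviations of `values` from `target`."""
    return sum(w * abs(v - target) for v, w in zip(values, weights))


def weighted_median_sketch(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value


# Made-up data: the observation 3.0 carries most of the weight.
values = [1.0, 2.0, 3.0, 10.0]
weights = [1.0, 1.0, 5.0, 1.0]
center = weighted_median_sketch(values, weights)
# No other observed value gives a smaller weighted absolute deviation.
best = min(values, key=lambda v: weighted_abs_deviation(v, values, weights))
```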

With this in mind, we turn our attention to the computation of the weighted median wealth target moments for each age cohort using data on household wealth from the Survey of Consumer Finances. The objects necessary to accomplish this task are weighted_median and get_targeted_moments. The actual data are taken from several waves of the SCF and the medians and means for each age category are plotted in figure 18. More details on the SCF data are included in appendix A.

pict

Figure 18: Wealth to Permanent Income Ratios from SCF (means (dashed) and medians (solid))

We now turn our attention to the two key functions in this section of the code file. The first, simulate_moments, executes the solving (solve) and simulation (simulation) steps for the defined life-cycle agent. Subsequently, the function uses the agents’ tracked levels of wealth based on their optimal consumption behavior to compute and store the simulated median wealth to income ratio for each age cohort. The second function, smmObjectiveFxn, calls the simulate_moments function to create the objective function described in (61), which is necessary to perform the SMM estimation.

Thus, for a given pair of the parameters to be estimated, the single call to the function smmObjectiveFxn executes the following:

  1. solves for the consumption functions for the life-cycle agent

  2. simulates the data and computes the simulated medians

  3. returns the value of equation (61)
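The three steps can be sketched as a small function factory. This is not the smmObjectiveFxn from the file: `make_smm_objective` is a hypothetical name, and the solve-and-simulate work is delegated to a user-supplied callable so that the sketch stays self-contained.

```python
def make_smm_objective(simulate_medians, data):
    """Build a function of xi = (rho, beth) that evaluates equation (61).

    simulate_medians : callable (rho, beth) -> list of 7 simulated medians;
                       it stands in for the solve and simulate steps
    data             : list of (weight, ratio, age_group_index) per household
    """
    def objective(xi):
        rho, beth = xi
        simulated = simulate_medians(rho, beth)       # steps 1 and 2
        return sum(w * abs(ratio - simulated[g])      # step 3
                   for w, ratio, g in data)
    return objective


# Toy stand-in for the model: every age group has median rho * beth.
toy_model = lambda rho, beth: [rho * beth] * 7
toy_data = [(1.0, 2.0, 0), (2.0, 3.0, 6)]
toy_objective = make_smm_objective(toy_model, toy_data)
```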

We delegate the task of finding the coefficients that minimize the smmObjectiveFxn function to the minimize_nelder_mead function, which is defined elsewhere and called in the second part of this file. This task can be quite slow and rather problematic if the smmObjectiveFxn function has very flat regions or sharp features. It is thus wise to verify the accuracy of the solution, for example by experimenting with a variety of alternative starting values for the parameter search.
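The advice about alternative starting values can be made concrete. The sketch below is not the minimize_nelder_mead routine used in the file; it is a simplified Nelder-Mead-style simplex search (reflection, expansion, inside contraction, shrink) written in pure Python so that it runs without dependencies, and `simplex_search` and `multi_start` are hypothetical names.

```python
def simplex_search(f, x0, step=0.1, tol=1e-12, max_iter=5000):
    """Simplified Nelder-Mead-style search; returns (best_point, best_value)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0], f(simplex[0])


def multi_start(f, starts):
    """Run the search from each starting point and return all results."""
    return [simplex_search(f, start) for start in starts]
```

If the results from different starting points disagree materially, that is a sign that the search has stalled in a flat region or at a local minimum.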

The final object defined in this first part of the StructEstimation.py file is calculateStandardErrorsByBootstrap. As the name suggests, the purpose of this function is to compute the standard errors by bootstrap.20 This involves:

  1. drawing new shocks for the simulation

  2. drawing a random sample (with replacement) of actual data from the SCF

  3. obtaining new estimates for ρ  and ℶ

We repeat the above procedure several times (Bootstrap) and take the standard deviation for each of the estimated parameters across the various bootstrap iterations.
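A generic version of this loop, as a sketch only: `bootstrap_standard_error` is a hypothetical name, and the `estimator` argument stands in for the full re-estimation of ρ and ℶ (one parameter at a time). Passing the random number generator to the estimator is an assumed design that lets it draw fresh simulation shocks on every replication.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(data, estimator, n_boot=200, seed=0):
    """Standard deviation of `estimator` across bootstrap resamples.

    estimator : callable (sample, rng) -> float
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]   # resample with replacement
        estimates.append(estimator(sample, rng))
    return stdev(estimates)


# Toy usage: the "estimator" is just the sample mean of made-up data.
toy_data = [float(i) for i in range(100)]
toy_se = bootstrap_standard_error(toy_data, lambda sample, rng: mean(sample),
                                  n_boot=300, seed=1)
```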

9.2.1 An Aside to Computing Sensitivity Measures

A common drawback of widely used structural estimation procedures is a lack of transparency in their estimates. As ? notes, a researcher employing such structural empirical methods may be interested in how alternative assumptions (such as misspecification or measurement bias in the data) would “change the moments of the data that the estimator uses as inputs, and how changes in these moments affect the estimates.” The authors provide a measure of sensitivity for a given estimator that makes it easy to map the effects of different assumptions on the moments into predictable bias in the estimates for non-linear models.

In the language of ?, section 9 is aimed at providing an estimator ξ = {ρ, ℶ} that has some true value ξ0 by assumption. Under the researcher’s assumption a0, the empirical targets computed from the SCF are measured accurately. These moments of the data are precisely what determine our estimate ξ̂, which minimizes (61). Under alternative assumptions a, such that a given cohort is mismeasured in the survey, a different estimate is computed. Using the plug-in estimate provided by the authors, we can see quantitatively how our estimate changes under these alternative assumptions a, which correspond to mismeasurement in the median wealth to income ratio for a given age cohort.
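The text does not spell out the plug-in formula. As an illustration only, the sketch below assumes the form that is standard for minimum-distance estimators with a quadratic criterion, Λ = (J′WJ)⁻¹J′W, where J is the Jacobian of the simulated moments with respect to the two parameters and W is a diagonal weighting matrix. Since the objective in (61) uses absolute deviations, this should be read as a local approximation under that assumption; `sensitivity_matrix` is a hypothetical name and the Jacobian entries are made up.

```python
def sensitivity_matrix(jacobian, weights):
    """Lambda = (J' W J)^{-1} J' W for two parameters and a diagonal W.

    jacobian : list of rows [d s_k / d rho, d s_k / d beth], one per moment
    weights  : list of positive diagonal weights, one per moment
    Returns a 2 x K matrix as a list of two rows.
    """
    # Build the symmetric 2 x 2 matrix A = J' W J.
    a11 = sum(w * r[0] * r[0] for r, w in zip(jacobian, weights))
    a12 = sum(w * r[0] * r[1] for r, w in zip(jacobian, weights))
    a22 = sum(w * r[1] * r[1] for r, w in zip(jacobian, weights))
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("parameters are not locally identified by these moments")
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    # Column k of Lambda is A^{-1} times (w_k * row k of J).
    return [[w * (inv[i][0] * r[0] + inv[i][1] * r[1])
             for r, w in zip(jacobian, weights)] for i in range(2)]


# Made-up Jacobian for three moments and two parameters.
toy_jacobian = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
toy_lambda = sensitivity_matrix(toy_jacobian, [1.0, 1.0, 1.0])
```

Entry (i, k) of the result approximates how much parameter i would move if moment k were shifted by one unit, which is the quantity plotted cohort by cohort in a sensitivity figure.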

9.3 Results

The second part of the file StructEstimation.py defines a function main which produces our ρ  and ℶ  estimates with standard errors using 10,000 simulated agents by setting the positional arguments estimate_model and compute_standard_errors to true.21 Results are reported in Table 2.22

Table 2: Estimation Results
-------------------
    ρ         ℶ
-------------------
  3.69      0.88
 (0.047)  (0.002)
-------------------

The literature on consumption and saving behavior over the life cycle in the presence of labor income uncertainty23 warns us to be careful in disentangling the effect of time preference and risk aversion when describing the optimal behavior of households in this setting. Since the precautionary saving motive dominates in the early stages of life, the coefficient of relative risk aversion (as well as expected labor income growth) has a larger effect on optimal consumption and saving behavior through their magnitude relative to the interest rate. Over time, life-cycle considerations (such as saving for retirement) become more important and the time preference factor plays a larger role in determining optimal behavior for this cohort.

Using the positional argument compute_sensitivity, Figure 19 provides a plot of the plug-in estimate of the sensitivity measure described in section 9.2.1. As you can see from the figure, the inverse relationship between ρ and ℶ over the life-cycle is retained by the sensitivity measure. Specifically, under the alternative assumption that a particular cohort is mismeasured in the SCF dataset, the values on the y-axis indicate that our estimates of ρ and ℶ change in a predictable way.

Suppose that there are not enough observations of the oldest cohort of households in the sample. Suppose further that the researcher predicts that adding more observations of these households to correct this mismeasurement would correspond to a higher median wealth to income ratio for this cohort. In this case, our estimate of the time preference factor should increase: the behavior of these older households is driven by their time preference, so a higher value of ℶ  is required to match the affected wealth to income targets under this alternative assumption. Since risk aversion is less important in explaining the behavior of this cohort, a lower value of ρ  is required to match the affected empirical moments.

To recap, the sensitivity measure not only matches our intuition about the inverse relationship between ρ  and ℶ  over the life-cycle, but provides a quantitative estimate of what would happen to our estimates of these parameters under the alternative assumption that the data is mismeasured in some way.

PIC

Figure 19: Sensitivity of Estimates {ρ, ℶ} regarding Alternative Mismeasurement Assumptions.

By setting the positional argument make_contour_plot to true, Figure 20 shows the contour plot of the smmObjectiveFxn function and the parameter estimates. The contour plot shows equally spaced isoquants of the smmObjectiveFxn function, i.e. the pairs of ρ  and ℶ  which lead to the same deviations between simulated and empirical medians (equivalent values of equation (61)). Interestingly, there is a large rather flat region; or, more formally speaking, there exists a broad set of parameter pairs which leads to similar simulated wealth to income ratios. Intuitively, the flatter and larger is this region, the harder it is for the structural estimation procedure to precisely identify the parameters.
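The flatness of the objective can also be inspected numerically before plotting. The sketch below evaluates an objective on a grid of parameter pairs and reports the share of grid points that lie within a chosen tolerance of the minimum; the function names and the toy objective are hypothetical.

```python
def evaluate_on_grid(objective, rho_values, beth_values):
    """Return grid[i][j] = objective((rho_values[i], beth_values[j]))."""
    return [[objective((rho, beth)) for beth in beth_values]
            for rho in rho_values]


def flat_region_share(grid, tolerance):
    """Share of grid points whose value lies within `tolerance` of the minimum."""
    values = [v for row in grid for v in row]
    lowest = min(values)
    return sum(1 for v in values if v - lowest <= tolerance) / len(values)


# Toy objective with a ridge of equally good pairs along rho * beth = 3.
toy_objective = lambda xi: abs(xi[0] * xi[1] - 3.0)
toy_grid = evaluate_on_grid(toy_objective, [2.0, 3.0, 4.0], [0.75, 1.0, 1.5])
toy_share = flat_region_share(toy_grid, tolerance=0.0)
```

A grid laid out this way has one row per ρ value and one column per ℶ value, so with matplotlib it could be drawn with `matplotlib.pyplot.contour(beth_values, rho_values, grid)`. A large share of near-minimal points corresponds to the wide flat region visible in the contour plot.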

PIC

Figure 20: Contour Plot (larger values are shown lighter) with {ρ,ℶ } Estimates (red dot).

10 Conclusion

Many choices can be made for solving microeconomic dynamic stochastic optimization problems. The set of techniques, and associated code, described in these notes represents an approach that I have found to be powerful, flexible, and efficient, but other problems may require other techniques. For a much broader treatment of many of the issues considered here, see Judd (?).

Appendices

A SCF Data


The data used in the estimation are constructed using the SCF 1992, 1995, 1998, 2001 and 2004 waves. The definition of wealth is net worth including housing wealth, but excluding pensions and Social Security. The data set contains only households whose heads are aged 26-60 and excludes singles, following Cagetti (?).24 Furthermore, the data set contains only households whose heads are college graduates. The total sample size is 4,774.

In the waves between 1995 and 2004 of the SCF, levels of normal income are reported. The question in the questionnaire is "About what would your income have been if it had been a normal year?" We consider the level of normal income as corresponding to the model’s theoretical object P, permanent noncapital income. Levels of normal income are not reported in the 1992 wave. Instead, in this wave there is a variable which reports whether the level of income is normal or not. For the 1992 wave, only observations which report that the level of income is normal are used, and the income levels of these retained observations are interpreted as the levels of permanent income.

Normal income levels in the SCF are before-tax figures. These before-tax permanent income figures must be rescaled so that the median of the rescaled permanent income of each age group matches the median of each age group’s income which is assumed in the simulation. This rescaled permanent income is interpreted as after-tax permanent income. Rescaling is crucial since in the estimation empirical profiles are matched with simulated ones which are generated using after-tax permanent income (remember the income process assumed in the main text). The wealth to permanent income ratio is computed by dividing the level of wealth by the level of (after-tax) permanent income, and this ratio is used for the estimation.25
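The rescaling step can be illustrated for a single age group. The sketch assumes a proportional rescaling and, for brevity, an unweighted median (the SCF observations carry weights); the function names and numbers are hypothetical.

```python
from statistics import median


def rescale_to_median(incomes, target_median):
    """Scale incomes proportionally so that their median equals target_median."""
    factor = target_median / median(incomes)
    return [factor * y for y in incomes]


def wealth_income_ratios(wealth, incomes, target_median):
    """Wealth divided by rescaled (after-tax) permanent income."""
    rescaled = rescale_to_median(incomes, target_median)
    return [w / y for w, y in zip(wealth, rescaled)]


# Made-up before-tax incomes for one age group and the median income
# assumed for that group in the simulation.
toy_incomes = [40.0, 50.0, 100.0]
toy_wealth = [20.0, 50.0, 100.0]
toy_ratios = wealth_income_ratios(toy_wealth, toy_incomes, target_median=25.0)
```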