
How Do People Learn to Allocate Resources?

Comparing Two Learning Theories

Jörg Rieskamp, Jerome R. Busemeyer, and Tei Laine

Indiana University Bloomington

How do people learn to allocate resources? To answer this question, 2 major learning models are

compared, each incorporating different learning principles. One is a global search model, which assumes

that allocations are made probabilistically on the basis of expectations formed through the entire history

of past decisions. The 2nd is a local adaptation model, which assumes that allocations are made by

comparing the present decision with the most successful decision up to that point, ignoring all other past

decisions. In 2 studies, participants repeatedly allocated a capital resource to 3 financial assets. Substan-

tial learning effects occurred, although the optimal allocation was often not found. From the calibrated

models of Study 1, a priori predictions were derived and tested in Study 2. This generalization test shows

that the local adaptation model provides a better account of learning in resource allocations than the

global search model.

How do people learn to improve their decision-making behavior

through past experience? The purpose of this article is to compare

two fundamentally different learning approaches introduced in the

decision-making literature that address this issue. One approach,

called global search models, assumes that individuals form expec-

tancies for every feasible choice alternative by keeping track of the

history of all previous decisions and searching for the strongest of

all these expectancies. Prominent recent examples that belong to

this approach are the reinforcement-learning models of Erev and

Roth (1998; see also Erev, 1998; Roth & Erev, 1995). These

models follow a long tradition of stochastic learning models. The second approach, called local adaptation models, assumes that allocations are made by comparing the present decision with the most successful decision up to that point; both approaches are described in detail below.

Resource Allocation Decision Making

Allocating resources to different assets is a decision problem

people often face. A few examples of resource allocation decision

making are dividing work time between different activities, divid-

ing attention between different tasks, allocating a portfolio to

different financial assets, or devoting land to different types of

farming. Despite its ubiquity in real life, resource allocation decision making has received little attention in the psychological literature.

How good are people at making resource allocation decisions?

Only a handful of studies have tried to address this question. In one

of the earliest studies by Gingrich and Soli (1984), participants

Jörg Rieskamp and Jerome R. Busemeyer, Department of Psychology,

Indiana University Bloomington; Tei Laine, Computer Science Depart-

ment, Indiana University Bloomington.

This study was supported in part by Grants SBR9521918 and

SES0083511 from the Center for the Study of Institutions, Population, and

Environmental Change through the National Science Foundation.

We acknowledge helpful comments by Jim Walker and Hugh Kelley

with whom we worked on a similar research project on which the present

study is based. In addition, we thank Ido Erev, Scott Fisher, Wieland

Müller, Elinor Ostrom, Reinhard Selten, the members of the bio-

complexity project at Indiana University Bloomington and two anonymous

reviewers for helpful comments.

Correspondence concerning this article should be addressed to Jörg

Rieskamp, who is now at the Max Planck Institute for Human Develop-

ment, Lentzeallee 94, 14195 Berlin, Germany. E-mail: rieskamp@mpib-

berlin.mpg.de

Journal of Experimental Psychology: Learning, Memory, and Cognition, 2003, Vol. 29, No. 6, 1066–1081

Copyright 2003 by the American Psychological Association, Inc. 0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.6.1066

were asked to evaluate all of the potential assets before making

their allocations. Although the assets were evaluated accurately,

the majority of participants failed to find the optimal allocation.

Northcraft and Neale (1986) also demonstrated individuals’ diffi-

culties with allocation decisions when attention had to be paid to

financial setbacks and opportunity costs.

Benartzi and Thaler (2001) studied retirement asset allocations.

For this allocation problem the strategy of diversifying one’s

investment among assets (i.e., bonds and stocks) appears to be

reasonable (Brennan, Schwartz, & Lagnado, 1997). Benartzi and

Thaler showed that many people follow a “1/n strategy” by equally

dividing the resource among the suggested investment assets.

Although such a strategy leads to a sufficient diversification, the

final allocation depends on the number of assets and, thus, can lead

to inconsistent decisions.

The above studies show that individuals often do not allocate

their resources in an optimal way, which is not surprising given the

complexity of most allocation problems. Furthermore, in the above

studies, little opportunity was provided for learning because the

allocation decisions were made only once or infrequently. In

contrast, Langholtz, Gettys, and Foote (1993) required participants

to make allocations repeatedly (eight times). Two resources—fuel

and personnel hours for helicopters—were allocated across a

working week to maximize the operating hours of the helicopters.

Participants improved their performance substantially through

learning and almost reached the optimal allocation. However,

under conditions of risk or uncertainty, in which the amount of the

resource fluctuated over time, the improvement was less substan-

tial. For similar allocation problems, Langholtz, Gettys, and Foote

(1994, 1995) and Langholtz, Ball, Sopchak, and Auble (1997)

showed that learning leads to substantially improved allocations.

Interestingly, a tendency to allocate the resource equally among

the assets was found here also. Ball, Langholtz, Auble, and

Sopchak (1998) investigated participants’ verbal protocols when

they solved an allocation problem. According to these protocols,

participants seemed to use simplifying heuristics, which brought

them surprisingly close to the optimal allocation (i.e., reached on

average 94% efficiency).

In a study by Busemeyer, Swenson, and Lazarte (1986), an

extensive learning opportunity was provided, as participants made

30 resource allocations. Participants quickly found the optimal

allocation for a simple allocation problem that had a single global

maximum. However, when the payoff function had several max-

ima, the optimal allocation was frequently not found. Busemeyer

and Myung (1987) studied the effect of the range of payoffs

between the best and worst allocation and the variability of return

rates for the assets. For the majority of conditions, participants

reached good allocations through substantial learning. However,

under a condition with widely varying return rates and with a low

range of payoffs, participants got lost and did not exhibit much

learning effect. Furthermore, Busemeyer and Myung showed that

a hill-climbing learning model, which assumes that individuals

improve their allocations step-by-step, provided a good description

of the learning process. However, this model was not compared

against alternative models, and therefore it remains unclear

whether this is the best way to characterize learning in these tasks.

It can be concluded that when individuals make only a single

resource allocation decision, with no opportunity to learn from

experience, they generally do not find good allocations at once. In

such situations, individuals have a tendency to allocate an equal

share of the resource to the different assets, which, depending on

the situation, can lead to bad outcomes. Alternatively, when indi-

viduals are given the opportunity to improve their allocations

through feedback, substantial learning effects are found and indi-

viduals often approach optimal allocations. However, if local

maxima are present or the payoffs are highly variable, then sub-

optimal allocations can result even after extensive training.

In the following two studies, repeated allocation decisions with

outcome feedback were made, providing sufficient opportunity for

learning. The allocation problem of both studies can be defined as

follows: The decision maker is provided with a financial resource

that can be invested in three financial assets. A particular alloca-

tion, i (allocation alternative), can be represented by a three-

dimensional vector, X, where each dimension represents the pro-

portion of the resource invested in one of the three assets. For

repeated decisions, the symbol X_t represents the allocation made at Trial t. The distance between two allocations can be measured by the Euclidean distance, which is the length of the vector that leads from one allocation to the other.¹ The restriction of proportions to integer percentages implies a finite number (N = 5,151) of possible allocations.
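The size of this allocation space and the distance measure can be illustrated with a short sketch (Python is used here purely for illustration; it is not part of the original article):

```python
from math import sqrt

def euclidean_distance(x_i, x_j):
    """Length of the vector leading from allocation x_i to allocation x_j."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

# All allocations of 100% across three assets, in integer percentages.
allocations = [(a, b, 100 - a - b)
               for a in range(101) for b in range(101 - a)]

print(len(allocations))                              # 5151 possible allocations
print(euclidean_distance((100, 0, 0), (0, 100, 0)))  # ~141.4, the maximum distance
```

The count of 5,151 and the maximum distance of about 141 match the figures cited in the text.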

The first learning model proposed to describe learning effects of

repeated resource allocations represents the global search (GLOS)

model approach. The GLOS model presented here is a modified

version of the reinforcement-learning model proposed by Erev

(1998). The second learning model represents the local adaptation

(LOCAD) model approach. The LOCAD model presented here is

a modified version of the hill-climbing learning model proposed

by Busemeyer and Myung (1987).

As already pointed out above, previous research has demon-

strated that the two learning model approaches have been success-

fully applied to describe people’s learning processes for various

decision problems: For example, in Busemeyer and Myung's

(1987) research, a hill-climbing learning model was appropriate to

describe people’s learning process for a resource allocation task;

Busemeyer and Myung (1992) also applied a hill-climbing model

successfully to describe criterion learning for a probabilistic cate-

gorization task. In contrast, Erev (1998) has shown that a

reinforcement-learning model is also appropriate to describe the

learning process for a categorization task. Furthermore, Erev ex-

plicitly proposed the reinforcement-learning model as an alterna-

tive model to the hill-climbing model. Moreover, Erev and Gopher

(1999) suggested the reinforcement-learning model for a resource

allocation task in which attention was the resource to be allocated

and showed, by simulations, that the model's predictions are

consistent with experimental findings. In summary, direct compar-

isons of the two approaches appear necessary to decide in which

domain each learning model approach works best. To extend the

generality of our comparison of the GLOS and LOCAD models,

we first compare the models with respect to how well they predict

a learning process for repeated resource allocations and, second,

we test whether the models are also capable of predicting individ-

ual characteristics of the learning process. Finally, although it is true that we can only test special cases for each approach, we currently do not know of any other examples within either approach that can outperform the versions that we are testing.

¹ The Euclidean distance between two allocations X_i and X_j, with components X_ki and X_kj for the three possible assets k, is defined as D_ij = √[Σ_{k=1}^{3} (X_ki − X_kj)²].

Global Search Model

Erev (1998), Roth and Erev (1995), and Erev and Roth (1998)

have proposed in varying forms a reinforcement-learning model

for learning in different decision problems. The GLOS model was

particularly designed for the resource allocation problem. The

basic idea of the model is that decisions are made probabilistically

proportional to expectancies (called propensities by Erev and

colleagues). The expectancy for a particular option increases

whenever a positive payoff or reward is provided after it is chosen.

This general reinforcement idea can be traced back to early work

by Bush and Mosteller (1955), Estes (1950), and Luce (1959); for

more recent learning models see Bo ¨ rgers and Sarin (1997), Cam-

erer and Ho (1999a, 1999b), Harley (1981), Stahl (1996), and

Sutton and Barto (1998).

The GLOS learning model for the resource allocation problem is

based on the following assumptions: Each allocation alternative is

assigned a particular expectancy. First, an allocation is selected

probabilistically proportional to the expectancies. Second, the re-

ceived payoff is used to determine the reinforcement for all allo-

cation alternatives, in such a way that the chosen allocation alter-

native receives reinforcement equal to the obtained payoff;

allocation alternatives close to the chosen one receive slightly less

reinforcement; and allocation alternatives that are far away from

the chosen allocation alternative receive very little reinforcement.

Finally, the reinforcement is used to update the expectancies of

each allocation alternative and the process returns to the first step.

In more detail, GLOS is defined as follows: The preferences for the different allocation alternatives are expressed by expectancies q_it, where i is an index over the finite number of possible allocations. The probability p_it that a particular allocation, i, is chosen at Trial t is defined by (cf. Erev & Roth, 1998)

p_it = q_it / Σ_{i=1}^{N} q_it .  (1)

For the first trial, all expectancies are assumed to be equal and

determined by the average payoff that can be expected from

random choice, multiplied by w, which is a free, so-called "initial strength parameter" and is restricted by w > 0. After a choice of

allocation alternative j on Trial t is made, the expectancies are updated by the reinforcement received from the decision, which is defined as the received payoff, r_jt. For a large grid of allocation alternatives, it is reasonable to assume that not only the chosen allocation is reinforced but also similar allocations. Therefore, to update the expectancies of any given allocation alternative, i, the reinforcement r_it is determined by the following generalization function (cf. Erev, 1998):

r_it = r_jt · g(x_ij) = r_jt · exp[−x_ij² / (2σ_R²)],  (2)

where x_ij is the Euclidean distance of a particular allocation, i, to the chosen allocation, j, and the standard deviation σ_R is the second free parameter. This function was chosen so that the reinforcement r_it for the chosen allocation j equals the received payoff r_jt.² In the case of a negative payoff, r_jt < 0, Equation 2 was modified as follows: r_it = r_jt · g(x_ij) − r_jt. With this modification, if the current payoff is negative, then the chosen allocation receives a reinforcement of zero, whereas all other allocation alternatives receive positive reinforcements. Finally, the determined reinforcement is used to update the expectancies by the following updating rule (see Erev & Roth, 1998):

q_i,t+1 = (1 − φ) · q_it + φ · r_it,  (3)

where φ ∈ [0, 1] is the third free parameter, the forgetting rate. The

forgetting rate determines how strongly previous expectancies

affect new expectancies. If the forgetting rate is large, the obtained

reinforcement has a strong effect on the new expectancies. To

ensure that all possible allocation alternatives are chosen, at least

with a small probability, the minimum expectancy for all options

is restricted to v = 0.0001 (according to Erev, 1998). After the

updating process, the probability of selecting any particular allo-

cation alternative is determined again.

In summary, the GLOS learning model has three free parame-

ters: (a) the initial strength parameter, w, which determines the

impact of the initial expectancies; (b) the standard deviation, σ_R, of

the generalization function, which determines how similar (close)

allocations have to be to the chosen allocation to receive substan-

tial reinforcement; and (c) the forgetting rate, φ, which determines the

impact of past experience compared with present experience. It is

important to limit the number of parameters to a relatively small

number because models built on the basis of too many parameters

will fail to generalize to new experimental conditions.
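Putting Equations 1–3 together, one GLOS trial can be sketched as follows. This is an illustrative Python sketch under our reading of the model, not the authors' code; the function and variable names are ours, and `payoff_of` stands in for whatever payoff function the task defines:

```python
import numpy as np

def glos_trial(q, payoff_of, distances, phi, sigma_r, rng, v=0.0001):
    """One GLOS trial: choose probabilistically (Eq. 1), generalize the
    reinforcement over nearby allocations (Eq. 2), update expectancies (Eq. 3)."""
    p = q / q.sum()                                       # Equation 1
    j = rng.choice(len(q), p=p)                           # probabilistic choice
    r_j = payoff_of(j)                                    # received payoff
    g = np.exp(-distances[j] ** 2 / (2 * sigma_r ** 2))   # generalization, g = 1 at j
    if r_j >= 0:
        r = r_j * g                                       # Equation 2
    else:
        r = r_j * g - r_j                                 # negative payoff: chosen gets 0
    q = (1 - phi) * q + phi * r                           # Equation 3
    return np.maximum(q, v), j                            # enforce minimum expectancy v
```

On the first trial, all entries of `q` would be initialized to w times the average payoff expected from random choice; `distances` is a precomputed matrix of Euclidean distances between allocation alternatives.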

Local Adaptation Learning Model

The LOCAD learning model incorporates the idea of a hill-

climbing learning mechanism. In general, hill-climbing mecha-

nisms are widely used heuristics for optimization problems whose

analytic solutions are too complex (Russell & Norvig, 1995). The

basic idea is to start with a randomly chosen decision as a tempo-

rary solution and to change the decision slightly in the next trial.

If the present decision leads to a better outcome than a reference

outcome (i.e., the best previous outcome), this decision is taken as

a new temporary solution. Starting from this solution, a slightly

different decision is made in the same direction as the present one.

If the present decision leads to an inferior outcome, the temporary

solution is kept, and starting from this solution a new decision is

made in the opposite direction from the present decision. The step

size, that is, the distance between successive decisions, usually

declines during search. The search stops when no further changes

using this method yield substantial improvement. This process

requires that the available decision alternatives have an underlying

causal structure, such that they can be ordered by some criteria and

² This is one aspect where the GLOS model varies from Erev's (1998)

reinforcement model, where the density of the generalization function is set

equal to the received payoff. This constraint, set by Erev, has the disad-

vantage that the standard deviation of the generalization function, which is

supposed to be a free parameter, interacts with the received payoff used as

a reinforcement, such that a large reinforcement for a chosen allocation is,

for example, only possible if a small standard deviation of the generaliza-

tion function is chosen. We are confident that this difference and all other

differences of the GLOS model from the reinforcement model by Erev

represent improvements, in particular for the allocation task, which makes

it a strong competitor to the LOCAD model.

a direction of change exists. Consequently, for decision problems

that do not fulfill this requirement, the LOCAD learning model

cannot be applied. It is well known that hill-climbing heuristics are

efficient, because they often require little search, but their disad-

vantage can be suboptimal convergence (Russell & Norvig, 1995),

in other words, “getting stuck” in a local maximum.

LOCAD is defined as follows: It is assumed that decisions are

made probabilistically as in the GLOS learning model. In the first

trial, identical to GLOS, an initial allocation is selected with equal probability from among all possible allocations. For the second alloca-

tion, the probability of selecting any particular allocation is defined

by the following distribution function:

p_it = f_S(x_ij) / K = exp[−(x_ij − s_t)² / (2σ_S²)] / K,  (4)

where x_ij is the Euclidean distance of any allocation, i, to the first chosen allocation, j, with a standard deviation σ_S as the first free parameter, and K is simply a constant that normalizes the probabilities so that they sum to one. The step size, s_t, changes across trials as follows:

s_t = (s_1 / 2) · (v_{t−1} + v_{t−2}) / v_b + s_1 / t,  (5)

where s_1 is the initial step size, the second free parameter; v_t is the received payoff on Trial t (with v_0 = 0); and v_b is the payoff of the reference

allocation. The reference allocation is the allocation alternative

that produced the highest payoff in the past and is represented by

the index b for best allocation so far. Accordingly, the step size is

defined by two components. The first component depends on the

payoffs of the preceding allocations and the maximum payoff

received so far. The second component is time, manipulated so that

the step size automatically declines over time. Note that for Trial t = 2, the step size, s_2, equals the initial step size, s_1.

For the third and all following trials, the probability of selecting

any particular allocation is determined by the product of two

operations, one that selects the step size and the other that selects

the direction of change. More formally, the probability of selecting an allocation alternative i on Trial t > 2 is given by

p_it = f_S(x_ib) · f_A(y_ij) / K.  (6)

In the above equation, the probability of selecting a step size is determined by the function f_S(x_ib), which is the same function previously defined in Equation 4, with the distance x_ib defined as the Euclidean distance from any allocation i to the reference allocation b. The second function is f_A(y_ij) = exp[−(y_ij − a_t)² / (2σ_A²)], where y_ij is the angle between the direction vector of any allocation i and the direction vector of the preceding allocation, j; a_t equals 0° if the preceding allocation led to a payoff higher than or equal to that of the reference allocation, and otherwise a_t equals 180°. The direction vector of any allocation i is defined as the vector from the preceding allocation j to the allocation i (i.e., X_i − X_j). The angle between the two direction vectors ranges from 0° to 180° (mathematically, the angle is determined by the arccosine of the dot product of the two direction vectors normalized to length one). The function f_A(y_ij) has a standard deviation σ_A as the third free parameter.

In summary, the LOCAD learning model has the following

steps. In the first trial, an allocation alternative is chosen with

equal probability, and in the second trial a slightly different allo-

cation alternative is selected. For selecting an allocation alternative

in the third and all following trials, the payoff received in the

preceding trial is compared with the reference allocation that

produced the maximum payoff received so far (this is an important

difference to the model proposed by Busemeyer & Myung, 1987,

where the reference allocation was the previous allocation). If the

payoff increased (or stayed the same), allocations in the same

direction as the preceding allocation are likely to be selected. On

the other hand, if the payoff decreased, allocations in the opposing

direction are more likely to be selected. The LOCAD learning

model has three free parameters: (a) the initial step size, s_1, which is used to determine the most likely distance between the first and second allocations, and on which the succeeding step sizes depend; (b) the standard deviation, σ_S, of the distribution function, f_S, which determines how likely it is that the distance between new allocations and the reference allocation differs from the distance defined by the step size, s_t; and (c) the standard deviation, σ_A, of the distribution function, f_A, which determines how likely it is that the direction of new allocations differs from the direction (or opposing direction) of the preceding allocation.
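Equations 4–6 can be sketched as follows. Again this is an illustrative Python sketch under our reading of the model, not the authors' implementation; the preceding allocation's direction vector is passed in directly for simplicity:

```python
import numpy as np

def step_size(s1, t, v_prev1, v_prev2, v_best):
    """Step size on Trial t (Equation 5); for t = 2 it reduces to s1."""
    return (s1 / 2) * (v_prev1 + v_prev2) / v_best + s1 / t

def locad_probabilities(allocs, ref, prev, prev_dir, s_t, a_t, sigma_s, sigma_a):
    """Choice probabilities for Trial t > 2 (Equation 6).
    allocs: (N, 3) array of allocations; ref, prev: indices of the reference
    and preceding allocations; prev_dir: direction vector of the preceding
    allocation; a_t: 0 (keep direction) or 180 (reverse direction), in degrees."""
    x_ib = np.linalg.norm(allocs - allocs[ref], axis=1)        # distance to reference
    f_s = np.exp(-(x_ib - s_t) ** 2 / (2 * sigma_s ** 2))      # Equation 4 kernel
    d = allocs - allocs[prev]                                   # direction vectors X_i - X_j
    norms = np.linalg.norm(d, axis=1) * np.linalg.norm(prev_dir)
    cos = np.divide(d @ prev_dir, norms,
                    out=np.ones(len(allocs)), where=norms > 0)
    y = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))          # angle in [0, 180]
    f_a = np.exp(-(y - a_t) ** 2 / (2 * sigma_a ** 2))
    p = f_s * f_a
    return p / p.sum()                                          # K normalizes to one
```

Allocations near the step-size distance from the reference, and in (or opposite to) the preceding direction, receive most of the probability mass, which is the hill-climbing behavior described above.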

The LOCAD learning model has similarities to the learning

direction theory proposed by Selten and Stöcker (1986) and to the

hill-climbing learning model proposed by Busemeyer and Myung

(1987). Learning direction theory also assumes that decisions are

slightly adjusted on the basis of feedback, by comparing the

outcome of a decision with hypothetical outcomes of alternative

decisions. The LOCAD model represents a simple learning model

with only three free parameters, compared with the hill-climbing

model proposed by Busemeyer and Myung (1987) with eight free

parameters.

The LOCAD model is to some extent also related to so-called

belief-based learning models (Brown, 1951; Cheung & Friedman,

1997; Fudenberg & Levine, 1995; see also Camerer & Ho, 1999a, 1999b). Like these models, LOCAD implicitly assumes beliefs about which alternative decisions will produce higher payoffs compared with the present decision. How-

ever, in contrast to belief-based models, these beliefs are not based

on foregone payoffs that are determined by the total history of past

decisions but are based on an assumption of the underlying causal

structure of the decision problem.

The Relationship of the Two Learning Models

The two models presented, in our view, are appropriate imple-

mentations of the two approaches of learning models we consider.

Any empirical test of the two models, strictly speaking, only

allows conclusions on the empirical accuracy of the particular

learning models implemented. However, keeping this restriction in

mind, both learning models are provided with a flexibility (ex-

pressed in the three free parameters of each model) that allows

them to predict various learning processes. Variations of our

implementations (e.g., using an exponential choice rule for deter-

mining choice probabilities instead of the implemented linear

choice rule) might increase the empirical fit of the model but will

not abolish the substantially different predictions made by the two

learning models for the allocation decision problem we consider.³

What are the different predictions that can be derived from the

two learning models? In general, the GLOS model predicts that the

probabilities with which decision alternatives are selected depend

on the total stock of previous reinforcements for these alternatives.

This implies a global search process within the entire set of

alternatives, which should frequently find the optimal alternative.

In contrast, the LOCAD model only compares the outcome of the

present decision with the best outcome so far and ignores all other

experienced outcomes, which are not integrated in an expectancy

score for each alternative. Instead, which alternatives will be

chosen depends on the success and the direction of the present

decision; thereby, an alternative similar to the present alternative

will most likely be selected. This implies a strong path depen-

dency, so that depending on the starting point of the learning

process, the model will often not converge to the optimal outcome

if several payoff maxima exist.

However, the specific predictions of the models depend on the

parameters, so that particular parameter values could lead to sim-

ilar behavior for both models. For example, if the GLOS model has

a high forgetting rate, the present allocation strongly influences the

succeeding allocation, resulting in a local search similar to the

LOCAD model, so that it could also explain convergence to local

payoff maxima. Likewise, if the LOCAD model incorporates a

large initial step size it implies a more global, random search

process, and therefore it could explain convergence to a global

maximum. Because of this flexibility of both models, one can

expect a relatively good fit of both models when the parameter

values are fitted to the data.

Therefore, we used the generalization method (Busemeyer &

Wang, 2000) to compare the models, which entails using a two-

stage procedure. As a first stage, in Study 1, each model was fit to

the individual learning data, and the fits of the two models were

compared. These fits provided estimates of the distribution of the

parameters over individuals for each model. As a second stage, the

parameter distributions estimated from Study 1 were used to

generate model predictions for a new learning condition presented

in Study 2. The accuracies of the a priori predictions of the two

models for the new condition in Study 2 provide the basis for a

rigorous comparison of the two models.
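Schematically, this two-stage procedure can be expressed as follows (a hypothetical sketch; the function names and fitting details are ours, not Busemeyer and Wang's, 2000, formulation):

```python
def generalization_test(models, study1_people, study2_people, fit, predict, score):
    """Stage 1: fit each model to each person's Study 1 data; Stage 2: score
    the a priori predictions those parameters generate for Study 2."""
    results = {}
    for name, model in models.items():
        params = [fit(model, person) for person in study1_people]          # Stage 1
        predictions = [predict(model, params, person)
                       for person in study2_people]                         # Stage 2
        results[name] = score(predictions, study2_people)
    return results
```

The key point the sketch makes explicit is that no parameter is re-estimated on Study 2: the Study 2 accuracy of each model is an a priori prediction from the Study 1 parameter distributions.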

Study 1

In this experiment, the decision problem consisted of repeatedly

allocating a resource among three financial assets. The rates of

return were initially unknown, but they could be learned by feed-

back from past decisions. To add a level of difficulty to the

decision problem, the rate of return for each asset varied depending

on the amount invested in that asset and on the amount invested in

the other assets. One could imagine a real-life analogue in which

financial assets have varying returns because of fixed costs, econ-

omies of scale, or efficiency, depending on investments in other

assets. The purpose of the first study was to explore how people

learn to improve their allocation decisions and whether they are

able to find the optimal allocation that leads to the maximum

payoff. Study 1 was also used to compare the fits of the two

models with the individual data and to estimate the distribution of

parameters for each model.

Method

Participants. Twenty persons (14 women and 6 men), with an average

age of 22 years, participated in the experiment. The computerized task

lasted approximately 1 h. Most participants (95%) were students in various

departments of Indiana University. For their participation, they received an

initial payment of $2. All additional payments depended on the partici-

pants’ performance; the average payment was $18.

Procedure. The total payoff from an allocation is defined as the sum of

payoffs obtained from the three assets. The selection of the particular

payoff function was motivated by the two learning models’ predictions. As

can be seen in Figure 1, the allocation problem was constructed such that

a local and a global maximum with respect to the possible payoffs resulted.

In general, one would expect that people will get stuck at the local payoff

maximum if their learning process is consistent with the LOCAD model. In

contrast, the GLOS model predicts a learning process that frequently

should converge at the global payoff maximum.

Figure 1 only shows the proportion invested in Asset B and Asset C,

with the rest being invested in Asset A. High investments in Asset C lead

to low payoffs (in the worst case, a payoff of $3.28), whereas low

investments in Asset C result in higher payoffs. The difficult part is to find

out that there are two payoff maxima: first, the local maximum with a

payoff of $32.82 when investing 28% in Asset B and 19% in Asset C and,

second, the global maximum with a payoff of $34.46 when investing 88%

in Asset B and 12% in Asset C, yielding a difference of $1.64 between the

two maxima. Note that there is no variability in the payoffs at each

allocation alternative so that if a person compares the local and the global

maximum, it is perfectly obvious that there is a payoff difference between

them favoring the latter. The main difficulty for the person is finding the

global maximum, not detecting a difference between the local and global

maxima. The Euclidean distance between the corresponding allocations of

the local and global maximum is 80 (the maximum possible distance

between two allocations is 141). From random choice an average payoff of

$24.39 can be expected. The payoff functions for each asset are provided

in the Appendix.

The participants received the following instructions: They were to make

repeated allocation decisions in two phases of 100 trials. On each trial, they

would receive a loan of $100 that had to be allocated among three

“financial assets” from which they could earn profit. The loan had to be

repaid after each round, so that the profit from the investment decisions

equaled the participant’s gains. The three assets were described as follows:

Investments in Asset A “pay a guaranteed return equal to 10% of your

investment,” whereas the returns from Asset B and Asset C depended on

how much of the loan was invested in the asset. Participants were informed

that there existed an allocation among Asset A, Asset B, and Asset C that

would maximize the total payoffs and that the return rates for the three

assets were fixed for the whole experiment. It was explained that they

would receive 0.25% of their total gains as payment for their participation.

After the first phase of 100 trials, participants took a small break. There-

after, they received the information that the payoff functions for Asset B

³ In fact, when we constructed the learning models for Study 1, various

modifications of both learning models were tested. For example, for the

GLOS model, among other things, we used different generalization func-

tions to determine reinforcements (see also footnote 2) or different methods

to determine the reinforcement in case of negative payoffs. For the

LOCAD model, among other things, we used different reference outcomes

with which the outcome of a present decision was compared with deter-

mining the success of a decision or different methods for how the step size

of the current trial was determined. In summary, the specified LOCAD and

GLOS learning models were the best models (according to the goodness-

of-fit criterion in Study 1) representing the two approaches of learning

models. Therefore, the conclusions we draw from the results of our model

comparisons are robust to variations of the present definition of the two

learning models.

and Asset C were changed but that everything else was identical to the first

block.

In fact, the payoff functions for Asset B and Asset C were interchanged

for the second phase. To control for any order effects of which payoff function

was assigned to Asset B and which to Asset C, for half of the participants

the payoff function of Asset B in the first phase was assigned to Asset C

and the payoff function for Asset C was assigned to Asset B. For the

second phase, the reverse order was used.

Results

First, a potential learning effect is analyzed before the two

learning models are compared and more specific characteristics of

the learning process are considered.

Learning effects. The average investment in Asset B increased from 26% (SD = 15%) in the 1st trial to an average investment of 36% (SD = 21%) in the 100th trial, whereas the average investment in Asset C decreased from an average of 34% (SD = 21%) in the 1st trial to an average of 19% (SD = 8%) in the 100th trial. This difference represents a substantial change in allocations, corresponding to an average Euclidean distance of 28 (SD = 29), t(19) = 4.21, p < .001, d = 0.94. Furthermore, this change leads

to an improvement in payoffs, which is discussed in more detail

below. Figure 2 shows the learning curve for the first phase of the

experiment. The percentages invested in Asset B and Asset C are

plotted as a function of training (with a moving average of 9 trials).
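The nine-trial moving average used for the learning curves can be sketched as follows (helper name is ours); as noted in the figure captions, the window shrinks to five to eight trials near the start and end of the series.

```python
def moving_average(values, half_window=4):
    """Centered moving average over up to 2 * half_window + 1 points;
    the window is truncated at the start and end of the series."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_window):i + half_window + 1]
        out.append(sum(window) / len(window))
    return out

# A 9-point window everywhere except near the edges.
print(moving_average(list(range(9))))
# -> [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
```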

To investigate the potential learning effect, the 100 trials of each

phase were aggregated into blocks of 10 trials (trial blocks). A

repeated measure analysis of variance (ANOVA) was conducted,

with the average obtained payoff as the dependent variable, the

trial blocks and the two phases of 100 trials as two within-subject

factors, and the order in which the payoff functions were assigned

to the assets as a between-subjects factor. A strong learning effect

could be documented, as the average obtained payoff of $28 in the first block (SD = 2.2) increased substantially across the 100 trials to an average payoff of $32 (SD = 1.6) in the last block, F(9, 10) = 5.18, p = .008, η² = 0.82. In addition, there was a learning

effect between the two phases, as participants on average did better in the second phase (M = $30 for the first phase, SD = 2.1, vs. M = $31 for the second phase, SD = 1.9), F(1, 18) = 12.69, p = .002, η² = 0.41. However, this effect was moderated by an interaction between trial blocks and the two phases, F(9, 10) = 3.30, p = .038, η² = 0.75. This interaction can be attributed to a

more rapid learning process for the second phase compared with

the first phase: The average obtained payoff was higher in the

second phase from the 2nd to 5th trial blocks, whereas for the 1st

trial block and last 5 trial blocks, the payoffs did not differ. The

order in which the payoff functions were assigned to Asset B and

Asset C had no effect on the average payoffs (therefore, for

simplicity, in the following and for the presented figures, the

investments in Assets B and C are interchanged for half of the

participants). No other interactions were observed.
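The aggregation into trial blocks used for the ANOVA can be sketched as follows (toy payoffs, not the experimental data; the helper assumes the number of trials is a multiple of the block size):

```python
def block_means(payoffs, block_size=10):
    """Average a sequence of per-trial payoffs over consecutive blocks.
    Assumes len(payoffs) is a multiple of block_size."""
    return [sum(payoffs[i:i + block_size]) / block_size
            for i in range(0, len(payoffs), block_size)]

# 100 trials collapse into 10 trial blocks.
trial_payoffs = [28.0] * 50 + [32.0] * 50   # toy data
print(block_means(trial_payoffs))  # five blocks of 28.0, then five of 32.0
```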

Model comparison. How well do the two learning models fit

the observed learning data? We wished to compare the models

under conditions where participants had no prior knowledge, and

so we only used the data from the first phase to test the models.

Each model was fit separately to each individual’s learning data as

follows.

First, a set of parameter values was selected for a model

separately for each individual. Using the model and parameters,

we generated a prediction for each new trial, conditioned on the

past allocations and received payoffs of the participant before that

trial. The model’s predictions are represented by a probability distribution across all possible 5,151 allocation alternatives; the observed response was coded so that the allocation alternative selected by the participant received a value of 1 and all other allocation alternatives received values of 0. The

accuracy of the prediction for each trial was evaluated using the

Figure 1. The payoff function for the total payoff of the allocation

problem in Study 1. The figure shows the investment in Asset B and Asset

C (which determines the investment in Asset A) and the corresponding

payoff.

Figure 2. Average participants’ allocations and average predictions of the

two learning models fitted to each individual. The figure shows a moving

average of nine trials, such that for each trial the average of the present

allocation and the preceding and succeeding four allocations are presented.

(Note for the first 4 trials the moving average is determined by five to eight

trials.) GLOS = global search model; LOCAD = local adaptation model.

Solid diamonds represent Real Asset B; solid triangles represent Real Asset

C; solid lines represent GLOS Asset B; hatched lines represent GLOS

Asset C; open squares represent LOCAD Asset B; and open triangles

represent LOCAD Asset C.

sum of squared error. That is, we computed the squared error of the

observed (0 or 1) response and the predicted probability for each

of the 5,151 allocation alternatives and summed these squared

errors across all the alternatives for each trial to obtain the sum of

squared error for each trial (this score ranged from 0 to 2). To

assess the overall fit for a given individual, model, and set of

parameters, we determined the average of the sum of squared error

(SSE) across all 100 trials (see Footnote 4).
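The per-trial fit score just defined can be sketched in a few lines (helper name is ours; 5,151 is the number of distinct ways to allocate $100 across three assets in 1% steps):

```python
def trial_sse(predicted, chosen_index):
    """Sum of squared error between a model's predicted probability
    distribution and the one-hot observed choice; ranges from 0 to 2."""
    return sum((p - (1.0 if i == chosen_index else 0.0)) ** 2
               for i, p in enumerate(predicted))

n = 5151                      # distinct 1%-step allocations across three assets
uniform = [1.0 / n] * n       # a maximally uninformative prediction
print(round(trial_sse(uniform, 0), 4))  # 0.9998, i.e., 1 - 1/n
```

Putting all probability on the chosen allocation gives an error of 0; putting all probability on any other allocation gives the maximum of 2.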

To compare the fits of the two learning models for Study 1, we

searched for the parameter values that minimized the SSE for each

model and individual. To optimize the parameters for each partic-

ipant and model, reasonable parameter values were first selected

by a grid-search technique, and thereafter the best fitting grid

values were used as a starting point for a subsequent optimization

using the Nelder–Mead simplex method (Nelder & Mead, 1965).

For the optimization process, the parameter values for the GLOS model were restricted to initial strength values w between 0 and 10, standard deviations σR of the generalization function between 1 and 141, and forgetting rates between 0 and 1. The parameter values for the LOCAD model were restricted to initial step sizes between 1 and 141, a standard deviation σS of the distribution function fS between 1 and 141, and a standard deviation σA of the distribution function fA between 0° and 360°.
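The two-stage estimation can be sketched as follows. The sketch is ours: a coarse grid search picks a starting point, and a simple compass (coordinate) search stands in for the Nelder–Mead simplex refinement the authors used, so the example stays dependency-free; `toy_sse` is a stand-in objective, not the models' actual SSE surface.

```python
def compass_refine(f, x, step=1.0, tol=1e-4):
    """Local refinement from the best grid point (a dependency-free stand-in
    for the Nelder-Mead simplex step used in the paper)."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                if f(y) < fx:
                    x, fx, improved = y, f(y), True
        if not improved:
            step /= 2.0  # no neighbor improved: search on a finer scale
    return x, fx

def fit_model(sse, grid):
    """Two-stage fit: coarse grid search, then local refinement."""
    return compass_refine(sse, min(grid, key=sse))

# Toy SSE surface standing in for a model's fit to one participant's data.
toy_sse = lambda p: (p[0] - 3.4) ** 2 + (p[1] - 0.24) ** 2
grid = [(w, f) for w in range(0, 11, 2) for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
params, err = fit_model(toy_sse, grid)
```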

The above procedure was applied to each of the 20 participants

to obtain 20 sets of optimal parameter estimates. For the GLOS

model, this produced the following means and standard deviations

for the three parameters: initial strength mean, w = 3.4 (SD = 4.4); forgetting rate mean, 0.24 (SD = 0.18); and a standard deviation mean of the generalization function, σR = 1.8 (SD = 1.5). The mean SSE for the GLOS model was 0.94 (SD = 0.10).

For the LOCAD learning model, this estimation procedure produced the following means and standard deviations: initial step size mean, s1 = 23 (SD = 30); standard deviation mean for the distribution function fS, σS = 22 (SD = 40); and standard deviation mean for the distribution function fA, σA = 119° (SD = 72°). The mean and standard deviation of the SSE for the LOCAD model were 0.91 and 0.18, respectively. In summary, for Study 1 the LOCAD model was slightly more appropriate than the GLOS model, according to the SSE, in predicting participants’ allocations (Z = 1.5, p = .135; Wilcoxon signed rank test).

Figure 2 shows the average allocation of the participants across

the first 100 trials. Additionally, Figure 2 shows the predicted

average allocation by both learning models when fitted to each

participant. Both models adequately describe the last two thirds of

the learning process. However, for the first third, GLOS predicts

an excessively large proportion invested in Asset C, whereas

LOCAD overestimates the proportion invested in Asset B and

underestimates the proportion invested in Asset C (see Footnote 5).

Individual characteristics of the learning process. In addition

to analyzing the allocations of the participants, one can ask

whether the learning models are also capable of predicting indi-

vidual characteristics of the learning process. One characteristic is

whether a participant eventually found the global maximum, only

came close to the local maximum, or was not close to either

maximum. Figure 3A shows the percentage of participants who

were close (within 5%) to the allocations that produced the global or

local maximum across the 100 trials. In the first trial, no partici-

pant made an allocation corresponding to the local or global

maximum. At the end of training, only 10% of participants were

able to find the optimal allocation producing the maximum payoff,

whereas 50% of the participants ended up choosing allocations

close to the local maximum. Figure 3A also shows the predictions

of the models. Both models accurately describe the proportion of

participants who make allocations according to the local or global

maximum.
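One way to operationalize the 5% tolerance (our reading: every asset share within 5 percentage points of the maximum-producing allocation; helper name is ours) is:

```python
GLOBAL_MAX = (0, 88, 12)   # Study 1: 88% in Asset B, 12% in Asset C
LOCAL_MAX = (53, 28, 19)   # Study 1: 28% in Asset B, 19% in Asset C

def near(allocation, target, tol=5):
    """True if every asset share lies within tol percentage points of the
    target allocation (one reading of the paper's 5% tolerance)."""
    return all(abs(a - t) <= tol for a, t in zip(allocation, target))

print(near((5, 85, 10), GLOBAL_MAX))   # True
print(near((40, 40, 20), GLOBAL_MAX))  # False
```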

As noted earlier, participants were able to increase their payoffs

over the 100 trials through learning (see Figure 3B). Both learning

models also accurately describe this increase in payoffs.

As a third criterion for comparing the two learning models, the

effect of training on the magnitude with which individuals changed

their decisions was considered. To describe these changes during

learning, the Euclidean distances between successive trials were

determined. Figure 4A shows that in the beginning of the learning

process, succeeding allocations differed substantially with an av-

erage Euclidean distance of 30 units, whereas at the end of the

task, small changes in allocations were observed (M distance 9

units). LOCAD more accurately predicts the magnitude with

which participants change their allocation than GLOS. GLOS on

average predicts a too small magnitude with which allocations are

changed in successive trials.

A fourth characteristic to examine is the direction of change

in allocations that individuals made following different types of

outcome feedback. The LOCAD model predicts that the out-

come of a decision is compared with the outcome of the most

successful decision to that point, and, if the present decision

leads to a greater payoff, the direction of the succeeding deci-

sion is likely to be in the same direction as the present decision.

If a decision leads to a smaller payoff, the succeeding decision

is likely to be in the opposite direction. In contrast, the GLOS

model predicts that a decision is based on the aggregated

success and failure of all past decisions, so that no strong

correlation between the success of a present decision and the

direction of the succeeding decision is expected. To test this

prediction, the angles between the direction of an allocation and

the direction of the preceding allocation were determined for all

allocations. Figure 4B shows the proportion of the preceding

allocations that were successful for all preceding allocations

(i.e., led to a greater payoff than the allocation before), cate-

gorized with respect to the angle between the direction of an

allocation and the direction of the preceding allocation. Con-

sistent with the LOCAD model, we observed an association

Footnote 4:

As an alternative method for parameter estimation, compared with the

least-squares estimation, maximum likelihood estimation has the drawback

that it is sensitive to very small predicted probabilities, which frequently

occurred for the present task with the large number of possible allocations;

for advantages of the least-squares estimation see Selten (1998). Further-

more, the optimal properties of maximum likelihood only hold when the

model is the true model, which is almost never correct. In addition, these

properties only hold if the parameters fall inside the convex boundary of

the parameters, which is not guaranteed in our models. In summary, under

conditions of possible model misspecification, least-squares estimation is more robust than maximum likelihood estimation, so the statistical justifications for maximum likelihood do not hold up under these conditions.

Footnote 5:

Note that the models’ parameters were not fitted by optimizing the

predicted average allocations compared with the observed average alloca-

tions but by optimizing the predicted probabilities of which allocation was

selected; otherwise a closer fit would result.

between the participants’ allocation directions and their success: For 70% of all allocations made in the same direction as

the preceding allocation, the preceding allocation was success-

ful compared with only 35% of all allocations made in an

opposite direction.
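The angle categorization used here can be sketched as follows: each change is a difference vector between successive allocations, and the angle between the current and preceding change vector follows from their dot product (helper names are ours; zero-length changes would need separate handling).

```python
import math

def change_angle(prev_step, step):
    """Angle in degrees between two successive allocation-change vectors
    (both assumed nonzero)."""
    dot = sum(a * b for a, b in zip(prev_step, step))
    norms = math.hypot(*prev_step) * math.hypot(*step)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))

# Three successive allocations; a "step" is the change between two of them.
a0, a1, a2 = (40, 30, 30), (35, 35, 30), (30, 40, 30)
step1 = tuple(b - a for a, b in zip(a0, a1))
step2 = tuple(b - a for a, b in zip(a1, a2))
print(round(change_angle(step1, step2)))  # 0: the change continues in the same direction
```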

This association was predicted, although to different extents,

by both models. As expected for LOCAD, a preceding alloca-

tion was likely to be successful (in 67% of all cases) when the

direction of an allocation was the same as the direction of the

preceding allocation (angles between 0° and 30°), whereas the

preceding allocation was unlikely to be successful (only in

41% of all cases) when the direction of an allocation was

opposite to the preceding direction. Surprisingly, this associa-

tion was also observed for GLOS: For 73% of all allocations

Figure 3. Individual characteristics of the decision process in Study 1. A: Percentage of allocations corre-

sponding to the local or global payoff maximum across all trials (with a tolerated deviation of 5% from the

allocations that lead to the global or local maximum), presented with a moving average of nine trials. B: Average

payoff across all trials, presented with a moving average of nine trials. GLOS = global search model; LOCAD = local adaptation model.

made in a similar direction to the preceding allocation, the

preceding allocation was successful, compared with 39% of all

allocations made in an opposite direction. However, the proportions of successful preceding allocations for the different angles were more strongly correlated with LOCAD’s predictions (r = .95) than with GLOS’s predictions (r = .84).

Summary of Study 1

In Study 1, we showed that people are able to improve their

decisions in an allocation situation substantially when provided

with feedback. However, only a few participants were able to find

the allocation that produced the maximum possible payoff. This

Figure 4. Individual characteristics of the decision process in Study 1. A: Average magnitude of changes (step

size) measured with the Euclidean distance between the allocations of successive trials (with possible values

ranging from 0 to 141), presented with a moving average of nine trials. B: The angles between allocations’

directions compared with the direction of preceding allocations were determined and categorized in six intervals.

For each category, the percentage of successful preceding allocations (i.e., those leading to a higher payoff than

the allocations before) are presented. GLOS = global search model; LOCAD = local adaptation model.

result can be explained by the LOCAD learning model, which

described the empirical results slightly better than the GLOS

learning model, on the basis of the goodness-of-fit criterion. If

people start with a particular allocation and try to improve their

situation by slightly adapting their decisions, as predicted by

LOCAD, depending on their starting position, they will often not

find the global payoff maximum.
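The intuition — a learner that only compares nearby allocations ends up wherever the nearest peak is — can be illustrated with a one-dimensional greedy hill climber on a toy bimodal payoff (this illustrates the local-adaptation idea only; it is not the LOCAD model itself):

```python
def hill_climb(payoff, start, step=1, n_steps=100):
    """Greedy local search: move to an adjacent point only if it pays more."""
    x = start
    for _ in range(n_steps):
        best = max((x - step, x, x + step), key=payoff)
        if payoff(best) <= payoff(x):
            break  # no neighbor improves: a (possibly local) maximum
        x = best
    return x

# Toy bimodal payoff: a local peak at 30 and a higher global peak at 90.
payoff = lambda x: max(10 - 0.1 * abs(x - 30), 12 - 0.1 * abs(x - 90), 0)
print(hill_climb(payoff, 25))  # 30: stuck at the local maximum
print(hill_climb(payoff, 70))  # 90: reaches the global maximum
```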

However, because both models were fitted to each individual

separately, it is difficult to decide which model is more appropri-

ate, as the two models make similar predictions. When focusing on several individual learning characteristics, only one out of four characteristics supports the LOCAD model: the magnitude with

which allocations are changed in successive trials. The other three

process characteristics are appropriately described by both learn-

ing models. This result is not very surprising if one considers that

both models were fitted for each individual and only predicted

each new trial on the basis of the information of previous trials. In

contrast, in Study 2 both models made a priori predictions for

independent data, enabling a rigorous comparison of the two

models.

Study 2

In light of the results found in Study 1 that people, even when

provided with substantial learning opportunity, often end up with

suboptimal outcomes, one might object that the function of the

total payoff used in Study 1 only produced a relatively small

payoff difference between the two maxima, providing small in-

centives for participants to search for the global maximum. In

addition, if one takes the opportunity costs of search into account,

it might be reasonable to stay at the local maximum. One could

criticize that the small difference between the payoffs does not

satisfy the criterion of payoff dominance (Smith, 1982), that is, the

additional payoff does not dominate any (subjective) costs of

finding the optimal outcome, so that participants are not suffi-

ciently motivated to find the global payoff maximum. In Study 2,

we addressed this critique by increasing the payoff difference

between the local and global payoff maximum but keeping the

shape of the total payoff function similar to that in Study 1.

Increasing the payoff difference between the local and global

payoff maximum has direct implications for the predictions of the

GLOS learning model: If the reinforcement for the global payoff

maximum increases relative to the local payoff maximum, the

probability of selecting the allocation alternative corresponding to

the global maximum should increase according to the GLOS

model. Therefore, one would expect the GLOS model to predict

that more people will find the global maximum. In contrast, a

larger payoff difference between the local and global payoff max-

imum does not affect the prediction of the LOCAD model.

Study 2 also provides an opportunity to test the two learning

models on new independent data, by simulating 50,000 agents

using the model parameter values randomly selected from normal

distributions with the means and standard deviations of the param-

eter values derived from the individual fitting process of Study 1.

Given that the models’ parameter values are not fitted by the data

of Study 2, the models’ predictions provide a stronger empirical

generalization test of the models, which has been often asked for

but seldom done (Busemeyer & Wang, 2000).

Method

Participants. Twenty persons (13 women and 7 men) with an average

age of 21 years participated in the experiment. The duration of the com-

puterized task was approximately 1 h. Most participants (90%) were

students in various departments of Indiana University. For their participa-

tion they received an initial payment of $2. Additional payment was

contingent on the participants’ performance; the average payment was $20.

Procedure. The allocation problem was identical to the one used in

Study 1, with the only difference being the modified payoff functions. The

payoff functions differed by an increase in the payoff difference between

the local and global payoff maximum (see Figure 5). Again, high invest-

ments in Asset C led to low payoffs, in the worst case to a payoff of

–$34.55, whereas small investments in Asset C resulted in higher payoffs.

The local maximum with a payoff of $32.48 was obtained when investing

29% in Asset B and 21% in Asset C (cf. 28% and 19%, respectively, with

a payoff of $32.82 in Study 1), whereas the global maximum with a payoff

of $41.15 was reached when investing 12% in Asset B and 88% in Asset

C (the same allocation led to the global payoff maximum of $34.46 in

Study 1). From random choice, an average payoff of $17.44 could be

expected. The payoff functions yielded a difference of $8.67 and a Euclid-

ean distance of 79 between the allocations corresponding to the local and

global payoff maximum.

The instructions for the task in Study 2 were identical to those used in

Study 1.

Results

As in Study 1, first we analyze a potential learning effect before

the two learning models are compared and more specific charac-

teristics of the learning process are considered.

Learning effects. In the 1st trial, the average allocation con-

sisted of an investment of 26% in Asset B (SD = 12%), which increased to an average investment of 48% in Asset B (SD = 27%) in the 100th trial. The investment in Asset C decreased from 27% (SD = 13%) in the 1st trial to 22% (SD = 12%) in the 100th trial.

As in Study 1, participants in Study 2 had the tendency in the first

trial to invest slightly more in Asset A, which guaranteed a fixed

return. The allocation in the first trial substantially differed from

Figure 5. The payoff function for the total payoff of the allocation

problem in Study 2.

that in the 100th trial, with a Euclidean distance of 41, t(19) = 6.93, p < .001, d = 1.55.

To investigate any learning effect, the 100 trials of both phases

were aggregated into blocks of 10 trials (trial blocks). A repeated

measure ANOVA was conducted, with the obtained payoff as the

dependent variable, the trial blocks and the two phases of 100 trials

as two within-subject factors, and the order in which the payoff

functions were assigned to the assets as a between-subjects factor.

A strong learning effect was documented, as the average obtained payoff of $25 in the first block (SD = 3.6) increased substantially across the 100 trials to an average payoff of $34 (SD = 4.7) in the last block, F(9, 10) = 4.09, p = .019, η² = 0.79. In addition, there was a learning effect between the two phases, as participants on average did better in the second phase (M = $30, SD = 3.7 vs. M = $33, SD = 4.5), F(1, 18) = 8.59, p = .009, η² = 0.32. In contrast to Study 1, the interaction between trial blocks and the two phases was not significant, F(9, 10) = 2.24, p = .112, η² = 0.67. The order in which the payoff functions were assigned

to Asset B and Asset C had no effect on the average payoffs

(therefore, for simplicity, in the following, the investments in

Assets B and C are interchanged for half of the participants). No

other interactions were observed.

Model comparison. How well did the two learning models

predict participants’ allocations across the first 100 trials? For

Study 2, no parameter values were estimated. Instead, our testing

approach consisted of simulating a large number of agents with the

models’ parameter values randomly selected from normal distri-

butions, with the means and standard deviations of the parameter

values derived from the fitting process of Study 1. Finally, the

models’ fits were assessed by calculating the mean squared error

(MSE) of the average observed and average predicted allocations

(the deviation between two allocations is defined by the Euclidean

distance).
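This generalization test can be sketched as follows. The sketch is ours: parameter vectors are drawn from normal distributions with the Study 1 LOCAD means and SDs reported above and clipped to the admissible fitting ranges, and the MSE is read as the mean squared Euclidean deviation between average observed and average predicted allocations per trial; the agent model itself is not reproduced.

```python
import random

# Study 1 LOCAD estimates (mean, SD) with the ranges used during fitting.
PARAMS = {
    "initial_step": (23.0, 30.0, (1.0, 141.0)),
    "sigma_S": (22.0, 40.0, (1.0, 141.0)),
    "sigma_A": (119.0, 72.0, (0.0, 360.0)),
}

def sample_agent_params(rng):
    """Draw one simulated agent's parameters from normal distributions,
    clipped to the admissible range (our reading of the procedure)."""
    return {name: min(hi, max(lo, rng.gauss(mean, sd)))
            for name, (mean, sd, (lo, hi)) in PARAMS.items()}

def mse(observed, predicted):
    """Mean squared Euclidean deviation between average observed and
    average predicted allocations across trials (our reading of the MSE)."""
    sq = [sum((o - p) ** 2 for o, p in zip(obs, pred))
          for obs, pred in zip(observed, predicted)]
    return sum(sq) / len(sq)

rng = random.Random(0)
agents = [sample_agent_params(rng) for _ in range(1000)]  # 50,000 in the paper
```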

Figure 6 shows the development of the average allocation of the

participants across all 100 trials. In addition, the figure shows the

predicted average allocation of both learning models. The LOCAD

learning model better describes the development of the allocations across the 100 trials, with an MSE of 39. In contrast, the GLOS learning model describes the learning process less appropriately,

with an MSE of the predicted and observed average allocation of

117. GLOS underestimates the magnitude of the learning effect for

the allocation task.

Characteristics of the learning process. Does the LOCAD learning model predict individual characteristics of the learning process more suitably than the GLOS model? Figure 7A shows

again for Study 2 the proportion of allocations across all trials that

correspond to the allocations that led to the local or global payoff

maximum (with a tolerated deviation of 5%). Similar to Study 1,

the proportion of participants that made allocations according to

the local or global maximum increased substantially through learn-

ing across the 100 trials. However, again only a small number of

participants (20%) finally found the allocation corresponding to

the global payoff maximum, whereas a larger proportion (40%) got

stuck at the allocation corresponding to the local payoff maximum.

This result was again predicted by the LOCAD learning model.

Although both models underestimate the proportion of allocations

according to the local or global payoff maximum, the predicted

proportions by LOCAD were closer to the observed data.

Through learning, participants were able to increase their payoff

over the 100 trials (see Figure 7B). Both models underestimated

the payoff increase, but LOCAD’s prediction was closer to the

observed payoff increase than GLOS’s prediction.

The effect of training on the magnitude with which the partic-

ipants changed their decisions was similar to Study 1 (see Figure

8A), starting with an average magnitude of a Euclidean distance of

29 for the first 10 trials and ending with an average magnitude of

5 for the last 10 trials. Although both models underestimated the

decline in the magnitude with which decisions were adapted, the

predictions of LOCAD come closer to the observed development.

Similar to Study 1, an association between allocations’ direc-

tions and their success was observed for the participants’ deci-

sions: For 74% of all allocations in the same direction as the preceding allocation (angles between 0° and 30°), the preceding allocation was successful, compared with only 35% of all allocations made in an opposite direction (angles between 150° and 180°; see Figure 8B).

An even stronger association was predicted by the LOCAD

model: For 92% of all allocations made in the same direction as the

preceding allocation, the preceding allocation was successful,

compared with 20% of all allocations made in an opposite direc-

tion. In contrast, the GLOS model predicted a weak association:

For 61% of all allocations made in the same direction as the

preceding allocation, the preceding allocation was successful,

compared with 46% of all allocations made in an opposite direc-

tion. The proportions of successful preceding allocations for the

different angles were strongly correlated with both models’ pre-

dictions (r = .93 for LOCAD and r = .92 for GLOS).

Figure 6. Average participants’ allocations and average predictions of the

two learning models when simulating 50,000 agents. The figure shows a

moving average of nine trials, such that for each trial the average of the

present allocation and the preceding and succeeding four allocations are

presented. (Note for the first 4 trials, the moving average is determined by

five to eight trials.) GLOS = global search model; LOCAD = local

adaptation model. Solid diamonds represent Real Asset B; solid triangles

represent Real Asset C; solid lines represent GLOS Asset B; hatched lines

represent GLOS Asset C; open squares represent LOCAD Asset B; and

open triangles represent LOCAD Asset C.

Summary of Study 2

Study 2 illustrates the robustness of the findings from Study 1.

Although the payoff difference between the local and global payoff

maximum was substantially increased, only a small proportion of

participants were able to find the global maximum, whereas many

participants got stuck at the local maximum. Such a result is consis-

tent with the main learning mechanism of the LOCAD learning

model, which better predicted the observed learning process for the

allocation problem compared with the GLOS learning model.

Of course, one aspect of the payoff function that influences the

difficulty with which the local or global payoff maxima can be

detected is their localizations in the search space of possible

allocations. The allocation corresponding to the local payoff max-

imum was located near the center of the search space, that is, near

an allocation with an equal share invested in all three assets. In

contrast, the allocation producing the global payoff maximum was

located at the border of the search space, that is, an allocation with

disproportional investments in the different assets. If people tend

Figure 7. Individual characteristics of the decision process in Study 2. A: Percentage of allocations corre-

sponding to the local or global payoff maximum across all trials (with a tolerated deviation of 5% from the

allocations that lead to the global or local maximum), presented with a moving average of nine trials. B: Average

payoff across all trials, presented with a moving average of nine trials. GLOS = global search model; LOCAD = local adaptation model.

to start with evenly distributed investments in all three assets and

if they follow a learning process as predicted by the LOCAD

model, they should frequently get stuck at the local payoff maxi-

mum. In contrast, one could imagine a payoff function for which

the positions of the allocations corresponding to the local and

global payoff maxima were interchanged. For such a payoff func-

tion, the majority of participants would presumably find the global

payoff maximum. However, such a function would not allow

discrimination between the predictions of the two learning models

and was therefore not used.

In summary, the finding that many participants got stuck at the local payoff maximum in both of our studies is a consequence of the

Figure 8. Characteristics of the decision process in Study 2. A: Average magnitude of changes (step size)

measured with the Euclidean distance between the allocations of successive trials (with possible values ranging

from 0 to 141), presented with a moving average of nine trials. B: The angles between allocations’ directions

compared with the direction of preceding allocations were determined and categorized in six intervals. For each

category, the percentage of successful preceding allocations (i.e., those leading to a higher payoff than the

allocations before) are presented. GLOS = global search model; LOCAD = local adaptation model.

payoff function used and can be predicted with the proposed

LOCAD learning model. The generalization test of the learning

models in Study 2 was more substantial than that in Study 1,

because no parameter values were fitted to the data; instead the

models predicted independent behavior of a different decision

problem.

Discussion

Recently, several learning theories for decision-making prob-

lems have been proposed (e.g., Börgers & Sarin, 1997; Busemeyer & Myung, 1992; Camerer & Ho, 1999a, 1999b; Erev & Roth, 1998; Selten & Stöcker, 1986; Stahl, 1996). Most of these learning

theories build on the basic idea that people do not solve a problem

from scratch but adapt their behavior on the basis of experience.

The theories differ according to the learning mechanism that

people apply, that is, their assumptions about cognitive processes.

The reinforcement-learning model proposed by Erev and Roth

(1998) and the experience-weighted attraction learning model pro-

posed by Camerer and Ho (1999a, 1999b) in general belong to the

class of global search models. These models assume that all

possible decision alternatives can be assigned an overall evalua-

tion. Whereas the evaluation for the reinforcement-learning model

only depends on the experienced consequences of past decisions,

the experience-weighted attraction model additionally can take

hypothetical consequences and foregone payoffs into account.

Both models make the assumption that people integrate their

experience for an overall evaluation, and alternatives that are

evaluated positively are more likely to be selected.
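The core choice logic shared by this class of models can be sketched in a few lines. The following is only an illustration of the global-search principle — propensities accumulated over the entire decision history, with a probabilistic strength-based choice rule — not the authors' implementation; the class name and the sensitivity parameter are invented for the sketch.

```python
import math
import random

# Illustrative global-search (reinforcement) learner: every alternative
# keeps a propensity built from the entire payoff history, and choice
# probabilities follow from the relative strengths of those propensities.

class GlobalSearchLearner:
    def __init__(self, alternatives, sensitivity=0.1):
        self.alternatives = list(alternatives)
        self.propensity = {a: 1.0 for a in self.alternatives}  # prior strength
        self.sensitivity = sensitivity  # payoff sensitivity (free parameter)

    def choose(self, rng=random):
        # Exponential (softmax-style) choice rule over all alternatives.
        weights = [math.exp(self.sensitivity * self.propensity[a])
                   for a in self.alternatives]
        r = rng.random() * sum(weights)
        for a, w in zip(self.alternatives, weights):
            r -= w
            if r <= 0:
                return a
        return self.alternatives[-1]

    def update(self, chosen, payoff):
        # Reinforce only the chosen alternative with its obtained payoff;
        # nothing else about the history is stored explicitly.
        self.propensity[chosen] += payoff
```

Because the propensities never decay in this sketch, early experience keeps its full weight forever; the weighting of early versus late experience is exactly the kind of free parameter on which models in this class differ.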

The other approach—local adaptation models—does not as-

sume that people necessarily acquire a global representation of the

consequences of the available decision alternatives through learn-

ing. Instead, the hill-climbing model by Busemeyer and Myung

(1987) and the learning direction theory of Selten and Stöcker

(1986) assume that decisions are adapted locally, so that a preced-

ing decision might be slightly modified according to its success or

failure.
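A minimal sketch of this local-adaptation principle — keep only the best allocation found so far, try a small step, and retreat when the step fails — might look as follows. The step rule and function names are illustrative assumptions, not the exact update equations of the hill-climbing model or of LOCAD.

```python
import random

def hill_climb(payoff, start, step=1.0, trials=2000, rng=None):
    """Local adaptation: compare each new allocation only with the best
    allocation found so far, ignoring the rest of the decision history."""
    rng = rng or random.Random()
    best = list(start)
    best_payoff = payoff(best)
    current = list(best)
    for _ in range(trials):
        # Perturb locally: shift a small amount from one asset to another,
        # keeping the total resource constant.
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        moved = min(step, candidate[i])
        candidate[i] -= moved
        candidate[j] += moved
        cand_payoff = payoff(candidate)
        if cand_payoff > best_payoff:   # success: keep moving from here
            best, best_payoff = candidate, cand_payoff
            current = candidate
        else:                           # failure: return to the best so far
            current = list(best)
    return best, best_payoff
```

On a payoff surface with several peaks, such a learner climbs the nearest one, which is the attraction to a local payoff maximum discussed throughout this article.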

Busemeyer and Myung (1992) suggested that models in the

global search class may be applicable to situations in which the

decision alternatives form a small set of qualitatively different

strategies, whereas models in the local adaptation class may be

applicable in situations in which the decision alternatives form a

continuous metric space of strategies. Global search models have

been successfully applied to constant-sum games, in which there

are only a small number of options. The purpose of this research

was to examine learning processes in a resource allocation task,

which provides a continuous metric space of strategies.

A new version of the global search model, called the GLOS

model, and a new version of the local adaptation model, called the

LOCAD model, were developed for this task. These two models

were the best representations of the two classes that we constructed

for the resource allocation task. The models were compared in two

different studies. In the first study, the model parameters were

estimated separately for each participant, and the model fits were

compared with the individual data. In the second study, we used

the estimated parameters from the first study to generate a priori

predictions for a new payoff condition, and the predictions of the

models were compared with the mean learning curves.

In both studies, the resource allocation task consisted of repeat-

edly allocating a capital resource to different financial assets. The

task was difficult because the rates of return were unknown for two

assets, the rates of return depended in a nonlinear manner on the

amount invested in the assets, and the number of allocation alter-

natives was quite large. However, because any investment led to a

deterministic return, it was always obvious which of two alloca-

tions performed better after the payoffs for these allocation alter-

natives were presented. Therefore, the essence of the task that the

participants faced in both studies consisted of a search problem for

a good allocation alternative. Given that the participants were

provided with a large number of trials, finding the best possible

allocation alternative was possible. However, it turned out that the

majority of participants did not find the best possible allocation

corresponding to the global payoff maximum but became dis-

tracted by the local payoff maximum. Nevertheless, a substantial

learning process was observed: At the beginning of the task there

was a tendency to allocate an equal proportion of the resource to

all three assets with a slightly larger proportion invested in the

asset that guaranteed a fixed return. These allocations led to

relatively low average payoffs, which then increased substantially

over the 100 trials through learning. This learning process can be

characterized by substantial changes of allocations at the begin-

ning of the task, which then declined substantially over time. The

direction in which the allocations were changed depended strongly

on the success of previous changes, characterizing a directional

learning process.

These central findings correspond to the learning principles of

the local adaptation model. Therefore, it is not surprising that the

local adaptation model reached a better fit compared with the

global search model in predicting individuals’ allocations in both

studies. In Study 1, when fitting both models to each individual

separately, LOCAD reached a slightly better fit in describing the

learning process. In Study 2, the a priori predicted average allo-

cations by the LOCAD model (see Figure 6) properly described

the observed average allocation across 100 trials, corresponding to

a smaller MSE for LOCAD compared with GLOS. Given that in

Study 2 the payoff function differed substantially from the payoff

function of Study 1, these results provide strong empirical support

for LOCAD.

The appropriateness of LOCAD to describe the learning process

is also supported by individual characteristics of the process. In

Study 1, the LOCAD model, compared with GLOS, more accu-

rately predicted the magnitude with which successive allocations

were changed. In contrast, the other three individual characteristics

of the learning process are equally well described by the two

models in Study 1. This result changes substantially when turning

to Study 2; here the LOCAD model also more suitably described

the development of payoffs and the development of the number of

allocations corresponding to the local and global payoff maximum.

Unexpectedly, in both studies, the association between the direc-

tion of allocations and the success of previous allocations was

appropriately described by the LOCAD model as well as the

GLOS model.

Why is it that the LOCAD model, compared with the GLOS

model, better describes the learning process in the resource allo-

cation task? Although the predictions of the two models can be

similar with respect to specific aspects, the learning principles of

the models are quite different. The learning principles of LOCAD

seem to correspond more accurately to individuals’ behavior for

this task. According to LOCAD, starting with a specific allocation,

new allocations are made in the same direction as the direction of

the preceding successful allocation. Although this learning principle is very effective at improving allocations, it can lead to the

result of missing the global maximum, as decisions have to be

changed substantially to find the global maximum. Yet this result

is exactly what was found in both studies. In contrast, the GLOS

model eventually found a global payoff maximum, especially

when experience gained at the beginning of a learning process was

not given too strong a weight. In this case the GLOS model

selected all different kinds of allocations and eventually at some

point also selected allocations corresponding to the global payoff

maximum, for which it then developed a preference. However,

given that most participants did not find the global payoff maxi-

mum, when fitting the GLOS model to the data, parameter values

were selected so that the model would not lead to a convergence to

the global payoff maximum. However, with these parameter val-

ues, the model also does not converge frequently to any allocation,

so that it still does not predict the convergence to the local payoff

maximum, which was found for most participants.

To what extent can the results of the present studies be gener-

alized to different learning models? The two models that we

implemented are the best examples of the two approaches of

learning models we found. Both were supported by past research

and both were directly compared in previous theoretical analyses

(see, e.g., Erev, 1998). More important, we also compared many

variations of each model, although because of space limitations,

we only present the results for the best version of each model.

Nevertheless, our conclusions are supported in the sense that no

variation of the GLOS model outperformed the LOCAD model

that we present here, and instead, all the variations did worse than

the GLOS model that we present here. Furthermore, because of the

flexibility of the implemented models, that is, their free parame-

ters, we doubt that slight modifications of the presented models

would lead to substantially different results that would challenge

our claim that the LOCAD learning model is better able to predict the

learning process for the resource allocation problem.

To what extent did Study 2 provide a fair test of the two models?

The answer, we argue, is more than fair. First, both types of

learning models have been applied in previous theoretical analyses

to resource allocation tasks similar to the one used in Study 1 (Erev

& Gopher, 1999). Thus, there is no reason to claim that Study 1

does not provide a suitable test ground. In the second study, we

simply increased the difference between the local and global

maxima, which encouraged more participants to find the global

maximum. This manipulation actually favors the GLOS model

because the a priori tendency for the LOCAD model is to be

attracted to the local maximum. Thus, the second study provided

the best possible a priori chance for the GLOS model to outper-

form the LOCAD model in the generalization test.

To what extent can the results of the present studies be gener-

alized to other decision problems? It should be emphasized that the

current conclusions are restricted to the decision problem we

considered. We expect that in similar decision problems that

provide a large number of strategies that form a natural order, the

LOCAD model would better describe the learning process. In such

situations, people can form a hypothesis about the underlying

causal structure of the decision process that enables a directed

learning process. For example, when deciding how much to invest

in a repeated public-good game, a local adaptation learning process

might occur.

However, there are many situations for which global search

learning models describe learning processes better. For example,

there is a large amount of empirical evidence that global search

models appropriately describe the learning process for constant-

sum games with a small number of actions (Erev & Roth, 1998).

In a constant-sum game, no possibility exists for the players to

increase the mutual payoff by “cooperation.” The prediction from

game theory asserts that the different decision strategies (options)

should be selected with a particular probability. In such a situation,

there are only a small number of categorically different alterna-

tives, making it difficult to apply a local adaptation model, because

the set of alternatives provides no natural order to define directions

for changes in strategies.

The present article demonstrates a rigorous test of two learning

models representing two approaches in the recent learning litera-

ture. It also provides an illustration that learning often does not

lead to optimal outcomes, as claimed, for example, by Simon

(1990) or Selten (1991). Yet, people improve their decisions

substantially through learning: For example, even when individu-

als start with a suboptimal decision of allocating an equal share to

the different assets, they quickly change their decision by making

allocations that produce higher payoffs. This learning process can

be described by the local adaptation learning model, which is

commonly characterized by high efficiency but can lead to sub-

optimal outcomes. For other domains, other learning mechanisms might govern behavior, and each learning model might

have its own domain in which it works well. Identifying these

domains is a promising enterprise.

References

Ball, C. T., Langholtz, H. J., Auble, J., & Sopchak, B. (1998). Resource-

allocation strategies: A verbal protocol analysis. Organizational Behav-

ior & Human Decision Processes, 76, 70–88.

Benartzi, S., & Thaler, R. H. (2001). Naive diversification strategies in

defined contribution saving plans. American Economic Review, 91,

79–98.

Börgers, T., & Sarin, R. (1997). Learning through reinforcement and

replicator dynamics. Journal of Economic Theory, 77, 1–14.

Brennan, M. J., Schwartz, E. S., & Lagnado, R. (1997). Strategic asset

allocation. Journal of Economic Dynamics & Control, 21, 1377–1403.

Brown, G. W. (1951). Iterative solution of games by fictitious play. In T. C.

Koopmans (Ed.), Activity analysis of production and allocation (pp.

374–376). New York: Wiley.

Busemeyer, J. R., & Myung, I. J. (1987). Resource allocation decision-

making in an uncertain environment. Acta Psychologica, 66, 1–19.

Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human

decision-making: Learning theory, decision theory, and human perfor-

mance. Journal of Experimental Psychology: General, 121, 177–194.

Busemeyer, J. R., Swenson, K., & Lazarte, A. (1986). An adaptive ap-

proach to resource allocation. Organizational Behavior & Human De-

cision Processes, 38, 318–341.

Busemeyer, J. R., & Wang, Y.-M. (2000). Model comparisons and model

selections based on generalization criterion methodology. Journal of

Mathematical Psychology, 44, 171–189.

Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New

York: Wiley.

Camerer, C., & Ho, T.-H. (1999a). Experience-weighted attraction learning

in games: Estimates from weak-link games. In D. V. Budescu & I. Erev

(Eds.), Games and human behavior: Essays in honor of Amnon Rap-

oport (pp. 31–51). Mahwah, NJ: Erlbaum.

Camerer, C., & Ho, T.-H. (1999b). Experience-weighted attraction learning

in normal form games. Econometrica, 67, 827–874.

Cheung, Y.-W., & Friedman, D. (1997). Individual learning in normal form

games: Some laboratory results. Games & Economic Behavior, 19,

46–76.

Dorfman, D. D., Saslow, C. F., & Simpson, J. C. (1975). Learning models

for a continuum of sensory states reexamined. Journal of Mathematical

Psychology, 12, 178–211.

Erev, I. (1998). Signal detection by human observers: A cutoff

reinforcement-learning model of categorization decisions under uncer-

tainty. Psychological Review, 105, 280–298.

Erev, I., & Gopher, D. (1999). A cognitive game-theoretic analysis of

attention strategies, ability, and incentives. In D. Gopher & A. Koriat

(Eds.), Attention and performance XVII: Cognitive regulation of perfor-

mance. Interaction of theory and application (pp. 343–371). Cambridge,

MA: MIT Press.

Erev, I., & Roth, A. E. (1998). Predicting how people play games: Rein-

forcement learning in experimental games with unique, mixed strategy

equilibria. American Economic Review, 88, 848–881.

Estes, W. K. (1950). Toward a statistical theory of learning. Psychological

Review, 57, 94–107.

Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious

play. Journal of Economic Dynamics & Control, 19, 1065–1089.

Gingrich, G., & Soli, S. D. (1984). Subjective evaluation and allocation of

resources in routine decision-making. Organizational Behavior & Hu-

man Decision Processes, 33, 187–203.

Harley, C. B. (1981). Learning the evolutionary stable strategy. Journal of

Theoretical Biology, 89, 611–633.

Langholtz, H. J., Ball, C., Sopchak, B., & Auble, J. (1997). Resource-

allocation behavior in complex but commonplace tasks. Organizational

Behavior & Human Decision Processes, 70, 249–266.

Langholtz, H., Gettys, C., & Foote, B. (1993). Resource-allocation behav-

ior under certainty, risk, and uncertainty. Organizational Behavior &

Human Decision Processes, 54, 203–224.

Langholtz, H., Gettys, C., & Foote, B. (1994). Allocating resources over

time in benign and harsh environments. Organizational Behavior &

Human Decision Processes, 58, 28–50.

Langholtz, H., Gettys, C., & Foote, B. (1995). Are resource fluctuations

anticipated in resource allocation tasks? Organizational Behavior &

Human Decision Processes, 64, 274–282.

Luce, R. D. (1959). Individual choice behavior. New York: Wiley.

Nelder, J. A., & Mead, R. (1965). A simplex method for function mini-

mization. Computer Journal, 7, 308–313.

Northcraft, G. B., & Neale, M. A. (1986). Opportunity costs and the

framing of resource allocation decisions. Organizational Behavior &

Human Decision Processes, 37, 348–356.

Roth, A. E., & Erev, I. (1995). Learning in extensive-form games: Exper-

imental data and simple dynamic models in the intermediate term.

Games & Economic Behavior, 8, 164–212.

Russell, S. J., & Norvig, P. (1995). Artificial intelligence: A modern approach. Englewood Cliffs,

NJ: Prentice Hall.

Selten, R. (1991). Evolution, learning, and economic behavior. Games &

Economic Behavior, 3, 3–24.

Selten, R. (1998). Axiomatic characterization of the quadratic scoring

rules. Experimental Economics, 1, 43–62.

Selten, R., & Stöcker, R. (1986). End behavior in sequences of finite

prisoner’s dilemma supergames: A learning theory approach. Journal of

Economic Behavior & Organization, 7, 47–70.

Simon, H. A. (1990). Invariants of human behavior. Annual Review of

Psychology, 41, 1–19.

Smith, V. L. (1982). Microeconomic systems as an experimental science.

American Economic Review, 72, 923–955.

Stahl, D. O. (1996). Boundedly rational rule learning in a guessing game.

Games & Economic Behavior, 16, 303–330.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An intro-

duction. Cambridge, MA: MIT Press.

Thomas, E. A. C. (1973). On a class of additive learning models: Error

correcting and probability matching. Journal of Mathematical Psychol-

ogy, 10, 241–264.

Appendix

Payoff Functions Used in Study 1 and Study 2

In Study 1 the payoff functions were defined as follows. The first asset produced a fixed rate of return of 10%, with the payoff function $u_A(p_A) = 0.1\,p_A R$, where $p_A \in [0,1]$ is the proportion of the resource $R$ invested in asset A. For the other two assets, the rate of return varied with the amount invested in the asset. For asset B, the payoff function was defined as
$u_B(p_B, p_A) = 10 - 0.1\,p_A R + 40\,\dfrac{\sin(3.2(p_B - 0.781) - 9)}{3.2(p_B - 0.781) - 9}$, with $p_B, p_A \in [0,1]$.
For asset C, the payoff function was defined as
$u_C(p_C) = 5 + 4R\,\dfrac{\sin(1.1(p_C - 0.781) - 24.6)}{1.1(p_C - 0.781) - 24.6}$, with $p_C \in [0,1]$.

In Study 2 the payoff functions were defined as follows. The payoff function for asset A was identical to the one used in Study 1. For asset B, the payoff function was defined as
$u_B(p_B, p_A) = 6 - 0.2\,p_A R + 80\,\dfrac{\sin(3.2(p_B - 0.781) - 9)}{3.2(p_B - 0.781) - 9}$, with $p_B, p_A \in [0,1]$,
and for asset C, the payoff function was defined as
$u_C(p_C) = -4 + 8R\,\dfrac{\sin(1.1(p_C - 0.781) - 24.6)}{1.1(p_C - 0.781) - 24.6}$, with $p_C \in [0,1]$.
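The printed formulas lost their operators in extraction, so any executable rendering involves guesswork. The sketch below assumes the missing symbols were plus and multiplication signs and that R denotes the total resource (e.g., 100 units); it is a hedged reading of the Study 1 functions, not the authors' verified payoff function.

```python
import math

def damped_sine(x):
    # The sin(x)/x term shared by the two nonlinear assets. For the
    # arguments used below it is never evaluated at x = 0 on [0, 1],
    # so no division-by-zero guard is needed.
    return math.sin(x) / x

def payoffs_study1(p_a, p_b, p_c, R=100.0):
    """Hedged reconstruction of the Study 1 payoff functions; the '+'
    and '*' placements are assumptions where the source lost operators."""
    u_a = 0.1 * p_a * R  # asset A: fixed 10% rate of return
    u_b = 10 - 0.1 * p_a * R + 40 * damped_sine(3.2 * (p_b - 0.781) - 9)
    u_c = 5 + 4 * R * damped_sine(1.1 * (p_c - 0.781) - 24.6)
    return u_a + u_b + u_c
```

Under this reading, the sin(x)/x terms give each nonlinear asset an oscillating return, which is what produces the separate local and global payoff maxima discussed in the main text.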

Received October 22, 2002

Revision received March 26, 2003

Accepted March 30, 2003
