
How Do People Learn to Allocate Resources?
Comparing Two Learning Theories
Jörg Rieskamp, Jerome R. Busemeyer, and Tei Laine
Indiana University Bloomington
How do people learn to allocate resources? To answer this question, 2 major learning models are
compared, each incorporating different learning principles. One is a global search model, which assumes
that allocations are made probabilistically on the basis of expectations formed through the entire history
of past decisions. The 2nd is a local adaptation model, which assumes that allocations are made by
comparing the present decision with the most successful decision up to that point, ignoring all other past
decisions. In 2 studies, participants repeatedly allocated a capital resource to 3 financial assets. Substan-
tial learning effects occurred, although the optimal allocation was often not found. From the calibrated
models of Study 1, a priori predictions were derived and tested in Study 2. This generalization test shows
that the local adaptation model provides a better account of learning in resource allocations than the
global search model.
How do people learn to improve their decision-making behavior
through past experience? The purpose of this article is to compare
two fundamentally different learning approaches introduced in the
decision-making literature that address this issue. One approach,
called global search models, assumes that individuals form expec-
tancies for every feasible choice alternative by keeping track of the
history of all previous decisions and searching for the strongest of
all these expectancies. Prominent recent examples that belong to
this approach are the reinforcement-learning models of Erev and
Roth (1998; see also Erev, 1998; Roth & Erev, 1995). These
models follow a long tradition of stochastic learning models. The second approach we are considering, called local adaptation models, instead assumes that decisions are adjusted gradually by comparing the outcome of the present decision with the outcome of the most successful decision made so far.
Resource Allocation Decision Making
Allocating resources to different assets is a decision problem
people often face. A few examples of resource allocation decision
making are dividing work time between different activities, divid-
ing attention between different tasks, allocating a portfolio to
different financial assets, or devoting land to different types of
farming. Despite its ubiquity in real life, resource allocation decision making has received relatively little attention in the psychological literature.
How good are people at making resource allocation decisions?
Only a handful of studies have tried to address this question. In one
of the earliest studies by Gingrich and Soli (1984), participants
were asked to evaluate all of the potential assets before making their allocations. Although the assets were evaluated accurately, the majority of participants failed to find the optimal allocation. Northcraft and Neale (1986) also demonstrated individuals' difficulties with allocation decisions when attention had to be paid to financial setbacks and opportunity costs.

Jörg Rieskamp and Jerome R. Busemeyer, Department of Psychology, Indiana University Bloomington; Tei Laine, Computer Science Department, Indiana University Bloomington.
This study was supported in part by Grants SBR9521918 and SES0083511 from the Center for the Study of Institutions, Population, and Environmental Change through the National Science Foundation.
We acknowledge helpful comments by Jim Walker and Hugh Kelley, with whom we worked on a similar research project on which the present study is based. In addition, we thank Ido Erev, Scott Fisher, Wieland Müller, Elinor Ostrom, Reinhard Selten, the members of the biocomplexity project at Indiana University Bloomington, and two anonymous reviewers for helpful comments.
Correspondence concerning this article should be addressed to Jörg Rieskamp, who is now at the Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. E-mail: rieskamp@mpib-berlin.mpg.de
Journal of Experimental Psychology: Learning, Memory, and Cognition, 2003, Vol. 29, No. 6, 1066–1081. Copyright 2003 by the American Psychological Association, Inc. 0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.6.1066
Benartzi and Thaler (2001) studied retirement asset allocations.
For this allocation problem the strategy of diversifying one’s
investment among assets (i.e., bonds and stocks) appears to be
reasonable (Brennan, Schwartz, & Lagnado, 1997). Benartzi and
Thaler showed that many people follow a “1/n strategy” by equally
dividing the resource among the suggested investment assets.
Although such a strategy leads to a sufficient diversification, the
final allocation depends on the number of assets and, thus, can lead
to inconsistent decisions.
The above studies show that individuals often do not allocate
their resources in an optimal way, which is not surprising given the
complexity of most allocation problems. Furthermore, in the above
studies, little opportunity was provided for learning because the
allocation decisions were made only once or infrequently. In
contrast, Langholtz, Gettys, and Foote (1993) required participants
to make allocations repeatedly (eight times). Two resources—fuel
and personnel hours for helicopters—were allocated across a
working week to maximize the operating hours of the helicopters.
Participants improved their performance substantially through
learning and almost reached the optimal allocation. However,
under conditions of risk or uncertainty, in which the amount of the
resource fluctuated over time, the improvement was less substan-
tial. For similar allocation problems, Langholtz, Gettys, and Foote
(1994, 1995) and Langholtz, Ball, Sopchak, and Auble (1997)
showed that learning leads to substantially improved allocations.
Interestingly, a tendency to allocate the resource equally among
the assets was found here also. Ball, Langholtz, Auble, and
Sopchak (1998) investigated participants’ verbal protocols when
they solved an allocation problem. According to these protocols,
participants seemed to use simplifying heuristics, which brought
them surprisingly close to the optimal allocation (i.e., reached on
average 94% efficiency).
In a study by Busemeyer, Swenson, and Lazarte (1986), an
extensive learning opportunity was provided, as participants made
30 resource allocations. Participants quickly found the optimal
allocation for a simple allocation problem that had a single global
maximum. However, when the payoff function had several max-
ima, the optimal allocation was frequently not found. Busemeyer
and Myung (1987) studied the effect of the range of payoffs
between the best and worst allocation and the variability of return
rates for the assets. For the majority of conditions, participants
reached good allocations through substantial learning. However,
under a condition with widely varying return rates and with a low
range of payoffs, participants got lost and did not exhibit much
learning effect. Furthermore, Busemeyer and Myung showed that
a hill-climbing learning model, which assumes that individuals
improve their allocations step-by-step, provided a good description
of the learning process. However, this model was not compared
against alternative models, and therefore it remains unclear
whether this is the best way to characterize learning in these tasks.
It can be concluded that when individuals make only a single
resource allocation decision, with no opportunity to learn from
experience, they generally do not find good allocations at once. In
such situations, individuals have a tendency to allocate an equal
share of the resource to the different assets, which, depending on
the situation, can lead to bad outcomes. Alternatively, when indi-
viduals are given the opportunity to improve their allocations
through feedback, substantial learning effects are found and indi-
viduals often approach optimal allocations. However, if local
maxima are present or the payoffs are highly variable, then sub-
optimal allocations can result even after extensive training.
In the following two studies, repeated allocation decisions with
outcome feedback were made, providing sufficient opportunity for
learning. The allocation problem of both studies can be defined as
follows: The decision maker is provided with a financial resource
that can be invested in three financial assets. A particular alloca-
tion, i (allocation alternative), can be represented by a three-
dimensional vector, X, where each dimension represents the pro-
portion of the resource invested in one of the three assets. For repeated decisions, the symbol X_t represents the allocation made at Trial t. The distance between two allocations can be measured by the Euclidean distance, which is the length of the vector that leads from one allocation to the other.¹ The restriction of proportions to integer percentages implies a finite number (N = 5,151) of possible allocations.
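For concreteness, the following short Python sketch (ours, not part of the original study) enumerates the integer-percentage allocations over three assets and computes the Euclidean distance defined in Footnote 1; the function names are illustrative only.

```python
import numpy as np

# Enumerate every allocation (a, b, c) of 100% in integer steps with a + b + c = 100,
# and compute the Euclidean distance between two allocations (Footnote 1).

def all_allocations():
    return [(a, b, 100 - a - b) for a in range(101) for b in range(101 - a)]

def euclidean_distance(x, y):
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

allocations = all_allocations()
print(len(allocations))                                     # 5151 possible allocations
print(round(euclidean_distance((100, 0, 0), (0, 100, 0))))  # 141, the largest possible distance
```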

The first learning model proposed to describe learning effects of
repeated resource allocations represents the global search (GLOS)
model approach. The GLOS model presented here is a modified
version of the reinforcement-learning model proposed by Erev
(1998). The second learning model represents the local adaptation
(LOCAD) model approach. The LOCAD model presented here is
a modified version of the hill-climbing learning model proposed
by Busemeyer and Myung (1987).
As already pointed out above, previous research has demon-
strated that the two learning model approaches have been success-
fully applied to describe people’s learning processes for various
decision problems: For example, in Busemeyer and Myung's
(1987) research, a hill-climbing learning model was appropriate to
describe people’s learning process for a resource allocation task;
Busemeyer and Myung (1992) also applied a hill-climbing model
successfully to describe criterion learning for a probabilistic cate-
gorization task. In contrast, Erev (1998) has shown that a
reinforcement-learning model is also appropriate to describe the
learning process for a categorization task. Furthermore, Erev ex-
plicitly proposed the reinforcement-learning model as an alterna-
tive model to the hill-climbing model. Moreover, Erev and Gopher
(1999) suggested the reinforcement-learning model for a resource
allocation task in which attention was the resource to be allocated
and showed, by simulations, that the model's predictions are
consistent with experimental findings. In summary, direct compar-
isons of the two approaches appear necessary to decide in which
domain each learning model approach works best. To extend the
generality of our comparison of the GLOS and LOCAD models,
we first compare the models with respect to how well they predict
a learning process for repeated resource allocations and, second,
we test whether the models are also capable of predicting individ-
ual characteristics of the learning process. Finally, although it is
true that we can only test special cases for each approach, we currently do not know of any other examples within either approach that can outperform the versions that we are testing.

¹ The Euclidean distance between two allocations $X_{ki}$ and $X_{kj}$ with three possible assets $k$ is defined as $D_{ij} = \sqrt{\sum_{k=1}^{3} (X_{ki} - X_{kj})^2}$.
Global Search Model
Erev (1998), Roth and Erev (1995), and Erev and Roth (1998)
have proposed in varying forms a reinforcement-learning model
for learning in different decision problems. The GLOS model was
particularly designed for the resource allocation problem. The
basic idea of the model is that decisions are made probabilistically
proportional to expectancies (called propensities by Erev and
colleagues). The expectancy for a particular option increases
whenever a positive payoff or reward is provided after it is chosen.
This general reinforcement idea can be traced back to early work
by Bush and Mosteller (1955), Estes (1950), and Luce (1959); for
more recent learning models see Börgers and Sarin (1997), Cam-
erer and Ho (1999a, 1999b), Harley (1981), Stahl (1996), and
Sutton and Barto (1998).
The GLOS learning model for the resource allocation problem is
based on the following assumptions: Each allocation alternative is
assigned a particular expectancy. First, an allocation is selected
probabilistically proportional to the expectancies. Second, the re-
ceived payoff is used to determine the reinforcement for all allo-
cation alternatives, in such a way that the chosen allocation alter-
native receives reinforcement equal to the obtained payoff;
allocation alternatives close to the chosen one receive slightly less
reinforcement; and allocation alternatives that are far away from
the chosen allocation alternative receive very little reinforcement.
Finally, the reinforcement is used to update the expectancies of
each allocation alternative and the process returns to the first step.
In more detail, GLOS is defined as follows: The preferences for the different allocation alternatives are expressed by expectancies q_it, where i is an index of the finite number of possible allocations. The probability p_it that a particular allocation, i, is chosen at Trial t is defined by (cf. Erev & Roth, 1998)

$$p_{it} = q_{it} \Big/ \sum_{i=1}^{N} q_{it}. \tag{1}$$
For the first trial, all expectancies are assumed to be equal and determined by the average payoff that can be expected from random choice, multiplied by w, which is a free, so-called "initial strength parameter" and is restricted by w > 0. After a choice of allocation alternative j on Trial t is made, the expectancies are updated by the reinforcement received from the decision, which is defined as the received payoff, r_jt. For a large grid of allocation alternatives, it is reasonable to assume that not only the chosen allocation is reinforced but also similar allocations. Therefore, to update the expectancies of any given allocation alternative, i, the reinforcement r_it is determined by the following generalization function (cf. Erev, 1998):

$$r_{it} = r_{jt} \cdot g(x_{ij}) = r_{jt} \cdot \exp\!\left(-\frac{x_{ij}^2}{2\sigma_R^2}\right), \tag{2}$$
where x_ij is the Euclidean distance of a particular allocation, i, to the chosen allocation, j, and with the standard deviation σ_R as the second free parameter. This function was chosen so that the reinforcement r_it for the chosen allocation j should equal the received payoff r_jt.² In the case of a negative payoff, r_jt < 0, Equation 2 was modified as follows: r_it = r_jt · g(x_ij) − r_jt. By using this modification, if the current payoff is negative, then the chosen allocation receives the reinforcement of zero, whereas all other allocation alternatives receive positive reinforcements. Finally, the determined reinforcement is used to update the expectancies by the following updating rule (see Erev & Roth, 1998):

$$q_{it} = (1 - \phi)\, q_{i,t-1} + r_{it}, \tag{3}$$

where φ ∈ [0, 1] is the third free parameter, the forgetting rate. The forgetting rate determines how strongly previous expectancies affect new expectancies. If the forgetting rate is large, the obtained reinforcement has a strong effect on the new expectancies. To ensure that all possible allocation alternatives are chosen, at least with a small probability, the minimum expectancy for all options is restricted to v = 0.0001 (according to Erev, 1998). After the updating process, the probability of selecting any particular allocation alternative is determined again.
In summary, the GLOS learning model has three free parameters: (a) the initial strength parameter, w, which determines the impact of the initial expectancies; (b) the standard deviation, σ_R, of the generalization function, which determines how similar (close) allocations have to be to the chosen allocation to receive substantial reinforcement; and (c) the forgetting rate, φ, which determines the impact of past experience compared with present experience. It is important to limit the number of parameters to a relatively small number because models built on the basis of too many parameters will fail to generalize to new experimental conditions.
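As an illustration, the following minimal Python sketch (ours) runs one GLOS choice-and-update cycle following Equations 1–3. The allocation_grid helper, the class name, and the mean_random_payoff argument are our own conventions rather than the authors'.

```python
import numpy as np

def allocation_grid():
    """All integer-percentage allocations over three assets (N = 5,151)."""
    return np.array([[a, b, 100 - a - b]
                     for a in range(101) for b in range(101 - a)], dtype=float)

class GLOS:
    def __init__(self, w, sigma_r, phi, mean_random_payoff, grid):
        self.grid = grid
        self.sigma_r = sigma_r   # standard deviation of the generalization function
        self.phi = phi           # forgetting rate
        # Initial expectancies: expected payoff of random choice scaled by w (w > 0).
        self.q = np.full(len(grid), w * mean_random_payoff)

    def choose(self, rng):
        p = self.q / self.q.sum()                           # Equation 1
        return rng.choice(len(self.grid), p=p)

    def update(self, j, payoff):
        dist = np.linalg.norm(self.grid - self.grid[j], axis=1)
        g = np.exp(-dist ** 2 / (2 * self.sigma_r ** 2))    # generalization function
        if payoff >= 0:
            r = payoff * g                                  # Equation 2
        else:
            r = payoff * g - payoff                         # modification for negative payoffs
        self.q = (1 - self.phi) * self.q + r                # Equation 3
        self.q = np.maximum(self.q, 0.0001)                 # minimum expectancy v
```

A simulated GLOS agent simply alternates choose and update against the payoff function of the task.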
Local Adaptation Learning Model
The LOCAD learning model incorporates the idea of a hill-
climbing learning mechanism. In general, hill-climbing mecha-
nisms are widely used heuristics for optimization problems whose
analytic solutions are too complex (Russell & Norvig, 1995). The
basic idea is to start with a randomly chosen decision as a tempo-
rary solution and to change the decision slightly in the next trial.
If the present decision leads to a better outcome than a reference
outcome (i.e., the best previous outcome), this decision is taken as
a new temporary solution. Starting from this solution, a slightly
different decision is made in the same direction as the present one.
If the present decision leads to an inferior outcome, the temporary
solution is kept, and starting from this solution a new decision is
made in the opposite direction from the present decision. The step
size, that is, the distance between successive decisions, usually
declines during search. The search stops when no further changes
using this method yield substantial improvement. This process
requires that the available decision alternatives have an underlying
causal structure, such that they can be ordered by some criteria and
a direction of change exists. Consequently, for decision problems that do not fulfill this requirement, the LOCAD learning model cannot be applied. It is well known that hill-climbing heuristics are efficient, because they often require little search, but their disadvantage can be suboptimal convergence (Russell & Norvig, 1995), in other words, "getting stuck" in a local maximum.

² This is one aspect where the GLOS model varies from Erev's (1998) reinforcement model, where the density of the generalization function is set equal to the received payoff. This constraint, set by Erev, has the disadvantage that the standard deviation of the generalization function, which is supposed to be a free parameter, interacts with the received payoff used as a reinforcement, such that a large reinforcement for a chosen allocation is, for example, only possible if a small standard deviation of the generalization function is chosen. We are confident that this difference and all other differences of the GLOS model from Erev's reinforcement model represent improvements, in particular for the allocation task, making GLOS a strong competitor to the LOCAD model.
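To make the hill-climbing scheme described above concrete, the following small one-dimensional sketch (ours; the payoff function is invented purely for illustration) keeps improvements, reverses and halves the step after failures, and ends up at a local maximum.

```python
def hill_climb(payoff, start, step=8.0, min_step=1.0):
    x, best, direction = start, payoff(start), 1
    while step >= min_step:
        candidate = x + direction * step
        value = payoff(candidate)
        if value > best:
            x, best = candidate, value              # keep the improvement, same direction
        else:
            direction, step = -direction, step / 2  # reverse the direction, shrink the step
    return x, best

# Two maxima: a local one at x = 25 (payoff 30) and a global one at x = 80 (payoff 34).
payoff = lambda x: max(30 - (x - 25) ** 2 / 30, 34 - (x - 80) ** 2 / 30)
print(hill_climb(payoff, start=10))  # ends at x = 25; the higher maximum at x = 80 is never visited
```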
LOCAD is defined as follows: It is assumed that decisions are made probabilistically as in the GLOS learning model. In the first trial, identical to GLOS, an initial allocation is selected with equal probability, p_it, from all possible allocations. For the second allocation, the probability of selecting any particular allocation is defined by the following distribution function:

$$p_{it} = f_S(x_{ij})/K = \exp\!\left(-\frac{(x_{ij} - s_t)^2}{2\sigma_S^2}\right)\Big/ K, \tag{4}$$
where x_ij is the Euclidean distance of any allocation, i, to the first chosen allocation, j, with a standard deviation σ_S as the first free parameter, and K is simply a constant that normalizes the probabilities so that they sum to one. The step size, s_t, changes across trials as follows:

$$s_t = \frac{s_1}{2}\,\frac{v_{t-1} - v_{t-2}}{v_b} + \frac{s_1}{t}, \tag{5}$$
where s_1 is the initial step size as the second free parameter, v_t is the received payoff (with v_0 = 0), and v_b is the payoff of the reference allocation. The reference allocation is the allocation alternative that produced the highest payoff in the past and is represented by the index b for the best allocation so far. Accordingly, the step size is defined by two components. The first component depends on the payoffs of the preceding allocations and the maximum payoff received so far. The second component is time, manipulated so that the step size automatically declines over time. Note that for Trial t = 2, the step size, s_2, equals the initial step size, s_1.
For the third and all following trials, the probability of selecting any particular allocation is determined by the product of two operations, one that selects the step size and the other that selects the direction of change. More formally, the probability of selecting an allocation alternative i on Trial t > 2 is given by

$$p_{it} = f_S(x_{ib})\, f_A(y_{ij})/K. \tag{6}$$

In the above equation, the probability of selecting a step size is determined by the function f_S(x_ib), which is the same function previously defined in Equation 4, with the distance x_ib defined as the Euclidean distance from any allocation i to the reference allocation b. The second function is represented by f_A(y_ij) = exp[−(y_ij − a_t)² / (2σ_A²)], where y_ij is the angle between the direction vector of any allocation i and the direction vector of the preceding allocation, j, and a_t equals 0° if the preceding allocation led to a higher or equal payoff than the reference allocation; otherwise a_t equals 180°. The direction vector of any allocation i is defined as the vector from the preceding allocation j to the allocation i (defined as X_i − X_j). The angle between the two direction vectors ranges from 0° to 180° (mathematically, the angle is determined by the arccosine of the dot product of the two direction vectors normalized to a length of one). The function f_A(y_ij) has a standard deviation σ_A as the third free parameter.
In summary, the LOCAD learning model has the following steps. In the first trial, an allocation alternative is chosen with equal probability, and in the second trial a slightly different allocation alternative is selected. For selecting an allocation alternative in the third and all following trials, the payoff received in the preceding trial is compared with the payoff of the reference allocation that produced the maximum payoff received so far (this is an important difference from the model proposed by Busemeyer & Myung, 1987, where the reference allocation was the previous allocation). If the payoff increased (or stayed the same), allocations in the same direction as the preceding allocation are likely to be selected. On the other hand, if the payoff decreased, allocations in the opposing direction are more likely to be selected. The LOCAD learning model has three free parameters: (a) the initial step size, s_1, which determines the most likely distance between the first and second allocation, and on which the succeeding step sizes depend; (b) the standard deviation, σ_S, of the distribution function, f_S, which determines how strongly the distance between new allocations and the reference allocation is likely to differ from the distance defined by the step size, s_t; and (c) the standard deviation, σ_A, of the distribution function, f_A, which determines how strongly the direction of new allocations is likely to differ from the direction (or opposing direction) of the preceding allocation.
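For illustration, the following Python sketch (ours) implements the LOCAD selection rule of Equations 4–6 over a precomputed grid of allocations such as the allocation_grid helper in the GLOS sketch above. The bookkeeping, the handling of zero-length direction vectors, and the assumption of a positive reference payoff in the step-size rule are our own simplifications.

```python
import numpy as np

class LOCAD:
    def __init__(self, s1, sigma_s, sigma_a, grid):
        self.grid = grid                    # (N, 3) array of allocation vectors
        self.s1, self.sigma_s, self.sigma_a = s1, sigma_s, sigma_a
        self.prev = self.prev_prev = None   # indices of the two preceding allocations
        self.best = None                    # index of the reference (best-so-far) allocation
        self.payoffs = [0.0]                # v_0 = 0
        self.best_payoff = -np.inf

    def _step_size(self, t):
        v = self.payoffs                    # Equation 5 (as given above)
        return (self.s1 / 2) * (v[-1] - v[-2]) / self.best_payoff + self.s1 / t

    def choose(self, t, rng):
        if t == 1:                          # first trial: all allocations equally likely
            return rng.integers(len(self.grid))
        dist = np.linalg.norm(self.grid - self.grid[self.best], axis=1)
        f_s = np.exp(-(dist - self._step_size(t)) ** 2 / (2 * self.sigma_s ** 2))  # Equation 4
        if t == 2:                          # at t = 2 the reference is the first allocation
            p = f_s
        else:
            prev_dir = self.grid[self.prev] - self.grid[self.prev_prev]
            cand_dir = self.grid - self.grid[self.prev]
            cos = cand_dir @ prev_dir / (np.linalg.norm(cand_dir, axis=1)
                                         * np.linalg.norm(prev_dir) + 1e-12)
            angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
            a_t = 0.0 if self.payoffs[-1] >= self.best_payoff else 180.0
            f_a = np.exp(-(angle - a_t) ** 2 / (2 * self.sigma_a ** 2))
            p = f_s * f_a                                                          # Equation 6
        p /= p.sum()                        # the normalizing constant K
        return rng.choice(len(self.grid), p=p)

    def update(self, choice, payoff):
        self.prev_prev, self.prev = self.prev, choice
        self.payoffs.append(payoff)
        if payoff > self.best_payoff:
            self.best_payoff, self.best = payoff, choice
```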
The LOCAD learning model has similarities to the learning
direction theory proposed by Selten and Stöcker (1986) and to the
hill-climbing learning model proposed by Busemeyer and Myung
(1987). Learning direction theory also assumes that decisions are
slightly adjusted on the basis of feedback, by comparing the
outcome of a decision with hypothetical outcomes of alternative
decisions. The LOCAD model represents a simple learning model
with only three free parameters, compared with the hill-climbing
model proposed by Busemeyer and Myung (1987) with eight free
parameters.
The LOCAD model is to some extent also related to so-called
belief-based learning models (Brown, 1951; Cheung & Friedman,
1997; Fudenberg & Levine, 1995; see also Camerer & Ho's, 1999a, 1999b, related model). Like belief-based models, LOCAD can be viewed as forming beliefs about which decisions will produce higher payoffs compared with the present decision. How-
ever, in contrast to belief-based models, these beliefs are not based
on foregone payoffs that are determined by the total history of past
decisions but are based on an assumption of the underlying causal
structure of the decision problem.
The Relationship of the Two Learning Models
The two models presented, in our view, are appropriate imple-
mentations of the two approaches of learning models we consider.
Any empirical test of the two models, strictly speaking, only
allows conclusions on the empirical accuracy of the particular
learning models implemented. However, keeping this restriction in
mind, both learning models are provided with a flexibility (ex-
pressed in the three free parameters of each model) that allows
them to predict various learning processes. Variations of our
implementations (e.g., using an exponential choice rule for deter-
mining choice probabilities instead of the implemented linear
choice rule) might increase the empirical fit of the model but will
not abolish the substantially different predictions made by the two learning models for the allocation decision problem we consider.³
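The two choice rules mentioned in parentheses can be sketched as follows (ours; the temperature parameter tau of the exponential rule is illustrative and not part of the models as specified here).

```python
import numpy as np

def linear_choice_probabilities(q):
    q = np.maximum(np.asarray(q, dtype=float), 1e-12)
    return q / q.sum()              # the ratio rule used in Equation 1

def exponential_choice_probabilities(q, tau=1.0):
    z = np.asarray(q, dtype=float) / tau
    z -= z.max()                    # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()              # softmax-type alternative mentioned in the text
```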
What are the different predictions that can be derived from the
two learning models? In general, the GLOS model predicts that the
probabilities with which decision alternatives are selected depend
on the total stock of previous reinforcements for these alternatives.
This implies a global search process within the entire set of
alternatives, which should frequently find the optimal alternative.
In contrast, the LOCAD model only compares the outcome of the
present decision with the best outcome so far and ignores all other
experienced outcomes, which are not integrated in an expectancy
score for each alternative. Instead, which alternatives will be
chosen depends on the success and the direction of the present
decision; thereby, an alternative similar to the present alternative
will most likely be selected. This implies a strong path depen-
dency, so that depending on the starting point of the learning
process, the model will often not converge to the optimal outcome
if several payoff maxima exist.
However, the specific predictions of the models depend on the
parameters, so that particular parameter values could lead to sim-
ilar behavior for both models. For example, if the GLOS model has
a high forgetting rate, the present allocation strongly influences the
succeeding allocation, resulting in a local search similar to the
LOCAD model, so that it could also explain convergence to local
payoff maxima. Likewise, if the LOCAD model incorporates a
large initial step size it implies a more global, random search
process, and therefore it could explain convergence to a global
maximum. Because of this flexibility of both models, one can
expect a relatively good fit of both models when the parameter
values are fitted to the data.
Therefore, we used the generalization method (Busemeyer &
Wang, 2000) to compare the models, which entails using a two-
stage procedure. As a first stage, in Study 1, each model was fit to
the individual learning data, and the fits of the two models were
compared. These fits provided estimates of the distribution of the
parameters over individuals for each model. As a second stage, the
parameter distributions estimated from Study 1 were used to
generate model predictions for a new learning condition presented
in Study 2. The accuracies of the a priori predictions of the two
models for the new condition in Study 2 provide the basis for a
rigorous comparison of the two models.
Study 1
In this experiment, the decision problem consisted of repeatedly
allocating a resource among three financial assets. The rates of
return were initially unknown, but they could be learned by feed-
back from past decisions. To add a level of difficulty to the
decision problem, the rate of return for each asset varied depending
on the amount invested in that asset and on the amount invested in
the other assets. One could imagine a real-life analogue in which
financial assets have varying returns because of fixed costs, econ-
omies of scale, or efficiency, depending on investments in other
assets. The purpose of the first study was to explore how people
learn to improve their allocation decisions and whether they are
able to find the optimal allocation that leads to the maximum
payoff. Study 1 was also used to compare the fits of the two
models with the individual data and to estimate the distribution of
parameters for each model.
Method
Participants. Twenty persons (14 women and 6 men), with an average
age of 22 years, participated in the experiment. The computerized task
lasted approximately 1 h. Most participants (95%) were students in various
departments of Indiana University. For their participation, they received an
initial payment of $2. All additional payments depended on the partici-
pants’ performance; the average payment was $18.
Procedure. The total payoff from an allocation is defined as the sum of
payoffs obtained from the three assets. The selection of the particular
payoff function was motivated by the two learning models’ predictions. As
can be seen in Figure 1, the allocation problem was constructed such that
a local and a global maximum with respect to the possible payoffs resulted.
In general, one would expect that people will get stuck at the local payoff
maximum if their learning process is consistent with the LOCAD model. In
contrast, the GLOS model predicts a learning process that frequently
should converge at the global payoff maximum.
Figure 1 only shows the proportion invested in Asset B and Asset C,
with the rest being invested in Asset A. High investments in Asset C lead
to low payoffs (in the worst case, a payoff of $3.28), whereas low
investments in Asset C result in higher payoffs. The difficult part is to find
out that there are two payoff maxima: first, the local maximum with a
payoff of $32.82 when investing 28% in Asset B and 19% in Asset C and,
second, the global maximum with a payoff of $34.46 when investing 88%
in Asset B and 12% in Asset C, yielding a difference of $1.64 between the
two maxima. Note that there is no variability in the payoffs at each
allocation alternative so that if a person compares the local and the global
maximum, it is perfectly obvious that there is a payoff difference between
them favoring the latter. The main difficulty for the person is finding the
global maximum, not detecting a difference between the local and global
maxima. The Euclidean distance between the corresponding allocations of
the local and global maximum is 80 (the maximum possible distance
between two allocations is 141). From random choice an average payoff of
$24.39 can be expected. The payoff functions for each asset are provided
in the Appendix.
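A quick check (ours) of the distances quoted above, writing each allocation as (A, B, C) percentages:

```python
import numpy as np

local_maximum = np.array([53.0, 28.0, 19.0])   # 28% in Asset B, 19% in Asset C, remainder in Asset A
global_maximum = np.array([0.0, 88.0, 12.0])   # 88% in Asset B, 12% in Asset C

print(round(np.linalg.norm(local_maximum - global_maximum)))                     # 80
print(round(np.linalg.norm(np.array([100.0, 0, 0]) - np.array([0, 100.0, 0]))))  # 141
```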
The participants received the following instructions: They were to make
repeated allocation decisions in two phases of 100 trials. On each trial, they
would receive a loan of $100 that had to be allocated among three
“financial assets” from which they could earn profit. The loan had to be
repaid after each round, so that the profit from the investment decisions
equaled the participant’s gains. The three assets were described as follows:
Investments in Asset A “pay a guaranteed return equal to 10% of your
investment,” whereas the returns from Asset B and Asset C depended on
how much of the loan was invested in the asset. Participants were informed
that there existed an allocation among Asset A, Asset B, and Asset C that
would maximize the total payoffs and that the return rates for the three
assets were fixed for the whole experiment. It was explained that they
would receive 0.25% of their total gains as payment for their participation.
After the first phase of 100 trials, participants took a small break. There-
after, they received the information that the payoff functions for Asset B
and Asset C were changed but that everything else was identical to the first block.

³ In fact, when we constructed the learning models for Study 1, various modifications of both learning models were tested. For example, for the GLOS model, among other things, we used different generalization functions to determine reinforcements (see also Footnote 2) or different methods to determine the reinforcement in the case of negative payoffs. For the LOCAD model, among other things, we used different reference outcomes with which the outcome of a present decision was compared to determine the success of a decision, or different methods for how the step size of the current trial was determined. In summary, the specified LOCAD and GLOS learning models were the best models (according to the goodness-of-fit criterion in Study 1) representing the two approaches of learning models. Therefore, the conclusions we draw from the results of our model comparisons are robust to variations of the present definition of the two learning models.
In fact, the payoff functions for Asset B and Asset C were interchanged
for the second phase. To control any order effects of which payoff function
was assigned to Asset B and which to Asset C, for half of the participants
the payoff function of Asset B in the first phase was assigned to Asset C
and the payoff function for Asset C was assigned to Asset B. For the
second phase, the reverse order was used.
Results
First, a potential learning effect is analyzed before the two
learning models are compared and more specific characteristics of
the learning process are considered.
Learning effects. The average investment in Asset B increased from 26% (SD = 15%) in the 1st trial to an average investment of 36% (SD = 21%) in the 100th trial, whereas the average investment in Asset C decreased from an average of 34% (SD = 21%) in the 1st trial to an average of 19% (SD = 8%) in the 100th trial. This difference represents a substantial change in allocations, corresponding to an average Euclidean distance of 28 (SD = 29), t(19) = 4.21, p < .001, d = 0.94. Furthermore, this change leads
to an improvement in payoffs, which is discussed in more detail
below. Figure 2 shows the learning curve for the first phase of the
experiment. The percentages invested in Asset B and Asset C are
plotted as a function of training (with a moving average of 9 trials).
To investigate the potential learning effect, the 100 trials of each
phase were aggregated into blocks of 10 trials (trial blocks). A
repeated measure analysis of variance (ANOVA) was conducted,
with the average obtained payoff as the dependent variable, the
trial blocks and the two phases of 100 trials as two within-subject
factors, and the order in which the payoff functions were assigned
to the assets as a between-subjects factor. A strong learning effect
could be documented, as the average obtained payoff of $28 in the first block (SD = 2.2) increased substantially across the 100 trials to an average payoff of $32 (SD = 1.6) in the last block, F(9, 10) = 5.18, p = .008, η² = 0.82. In addition, there was a learning effect between the two phases, as participants on average did better in the second phase (M = $30 for the first phase, SD = 2.1, vs. M = $31 for the second phase, SD = 1.9), F(1, 18) = 12.69, p = .002, η² = 0.41. However, this effect was moderated by an interaction between trial blocks and the two phases, F(9, 10) = 3.30, p = .038, η² = 0.75. This interaction can be attributed to a
more rapid learning process for the second phase compared with
the first phase: The average obtained payoff was higher in the
second phase from the 2nd to 5th trial blocks, whereas for the 1st
trial block and last 5 trial blocks, the payoffs did not differ. The
order in which the payoff functions were assigned to Asset B and
Asset C had no effect on the average payoffs (therefore, for
simplicity, in the following and for the presented figures, the
investments in Assets B and C are interchanged for half of the
participants). No other interactions were observed.
Model comparison. How well do the two learning models fit
the observed learning data? We wished to compare the models
under conditions where participants had no prior knowledge, and
so we only used the data from the first phase to test the models.
Each model was fit separately to each individual’s learning data as
follows.
First, a set of parameter values were selected for a model
separately for each individual. Using the model and parameters,
we generated a prediction for each new trial, conditioned on the
past allocations and received payoffs of the participant before that
trial. The model’s predictions are represented by a probability
distribution across all possible 5,151 allocation alternatives, where
the selected allocation alternative of a participant received a value
of 1 and all other allocation alternatives received values of 0. The
accuracy of the prediction for each trial was evaluated using the
sum of squared error. That is, we computed the squared error of the observed (0 or 1) response and the predicted probability for each of the 5,151 allocation alternatives and summed these squared errors across all the alternatives for each trial to obtain the sum of squared error for each trial (this score ranged from 0 to 2). To assess the overall fit for a given individual, model, and set of parameters, we determined the average of the sum of squared error (SSE) across all 100 trials.⁴

Figure 1. The payoff function for the total payoff of the allocation problem in Study 1. The figure shows the investment in Asset B and Asset C (which determines the investment in Asset A) and the corresponding payoff.

Figure 2. Average participants' allocations and average predictions of the two learning models fitted to each individual. The figure shows a moving average of nine trials, such that for each trial the average of the present allocation and the preceding and succeeding four allocations is presented. (Note that for the first 4 trials the moving average is determined by five to eight trials.) GLOS = global search model; LOCAD = local adaptation model. Solid diamonds represent Real Asset B; solid triangles represent Real Asset C; solid lines represent GLOS Asset B; hatched lines represent GLOS Asset C; open squares represent LOCAD Asset B; and open triangles represent LOCAD Asset C.
To compare the fits of the two learning models for Study 1, we
searched for the parameter values that minimized the SSE for each
model and individual. To optimize the parameters for each partic-
ipant and model, reasonable parameter values were first selected
by a grid-search technique, and thereafter the best fitting grid
values were used as a starting point for a subsequent optimization
using the Nelder–Mead simplex method (Nelder & Mead, 1965).
For the optimization process, the parameter values for the GLOS model were restricted to initial strength values w between 0 and 10, standard deviations σ_R of the generalization function between 1 and 141, and forgetting rates φ between 0 and 1. The parameter values for the LOCAD model were restricted to initial step sizes s_1 between 1 and 141, a standard deviation σ_S of the distribution function f_S between 1 and 141, and a standard deviation σ_A of the distribution function f_A between 0° and 360°.
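The fitting procedure can be sketched as follows (ours). Here model_probs stands for a model's one-trial-ahead predicted probabilities conditioned on the participant's history, grid_of_params for a coarse grid of candidate parameter vectors, and scipy is assumed to be available for the simplex search.

```python
import numpy as np
from scipy.optimize import minimize

def sse_score(params, model_probs, choices, payoffs):
    """Average per-trial sum of squared error between the one-hot observed choice
    and the predicted probabilities over all 5,151 alternatives."""
    errors = []
    for t, chosen in enumerate(choices):
        p = model_probs(params, choices[:t], payoffs[:t])  # prediction conditioned on history
        observed = np.zeros_like(p)
        observed[chosen] = 1.0
        errors.append(np.sum((observed - p) ** 2))         # each trial's score lies in [0, 2]
    return float(np.mean(errors))

def fit(model_probs, choices, payoffs, grid_of_params):
    # Grid search for a starting point, then Nelder-Mead simplex refinement.
    start = min(grid_of_params,
                key=lambda g: sse_score(np.asarray(g, dtype=float), model_probs, choices, payoffs))
    return minimize(sse_score, np.asarray(start, dtype=float),
                    args=(model_probs, choices, payoffs), method="Nelder-Mead")
```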
The above procedure was applied to each of the 20 participants to obtain 20 sets of optimal parameter estimates. For the GLOS model, this produced the following means and standard deviations for the three parameters: initial strength mean, w = 3.4 (SD = 4.4); forgetting rate mean, φ = 0.24 (SD = 0.18); and a standard deviation mean of the generalization function, σ_R = 1.8 (SD = 1.5). The mean and standard deviation of the SSE for the GLOS model were 0.94 and 0.10, respectively.
For the LOCAD learning model, this estimation procedure produced the following means and standard deviations: initial step size mean of s_1 = 23 (SD = 30), standard deviation mean for the distribution function f_S of σ_S = 22 (SD = 40), and a standard deviation mean for the distribution function f_A of σ_A = 119° (SD = 72). The mean and standard deviation of the SSE for the LOCAD model were 0.91 and 0.18, respectively. In summary, for Study 1 the LOCAD model was slightly more appropriate than the GLOS model according to the SSE for predicting participants' allocations (Z = 1.5, p = .135; Wilcoxon signed rank test).
Figure 2 shows the average allocation of the participants across
the first 100 trials. Additionally, Figure 2 shows the predicted
average allocation by both learning models when fitted to each
participant. Both models adequately describe the last two thirds of
the learning process. However, for the first third, GLOS predicts
an excessively large proportion invested in Asset C, whereas
LOCAD overestimates the proportion invested in Asset B and
underestimates the proportion invested in Asset C.⁵
Individual characteristics of the learning process. In addition
to analyzing the allocations of the participants, one can ask
whether the learning models are also capable of predicting indi-
vidual characteristics of the learning process. One characteristic is
whether a participant eventually found the global maximum, only
came close to the local maximum, or was not close to either
maximum. Figure 3A shows the percentage of participants who
were close ( 5%) to the allocations that produced the global or
local maximum across the 100 trials. In the first trial, no partici-
pant made an allocation corresponding to the local or global
maximum. At the end of training, only 10% of participants were
able to find the optimal allocation producing the maximum payoff,
whereas 50% of the participants ended up choosing allocations
close to the local maximum. Figure 3A also shows the predictions
of the models. Both models accurately describe the proportion of
participants who make allocations according to the local or global
maximum.
As noted earlier, participants were able to increase their payoffs
over the 100 trials through learning (see Figure 3B). Both learning
models also accurately describe this increase in payoffs.
As a third criterion for comparing the two learning models, the
effect of training on the magnitude with which individuals changed
their decisions was considered. To describe these changes during
learning, the Euclidean distances between successive trials were
determined. Figure 4A shows that in the beginning of the learning
process, succeeding allocations differed substantially with an av-
erage Euclidean distance of 30 units, whereas at the end of the
task, small changes in allocations were observed (M distance = 9 units). LOCAD predicts the magnitude with which participants change their allocations more accurately than GLOS does; GLOS, on average, predicts too small a magnitude of change between successive trials.
A fourth characteristic to examine is the direction of change
in allocations that individuals made following different types of
outcome feedback. The LOCAD model predicts that the out-
come of a decision is compared with the outcome of the most
successful decision to that point, and, if the present decision
leads to a greater payoff, the direction of the succeeding deci-
sion is likely to be in the same direction as the present decision.
If a decision leads to a smaller payoff, the succeeding decision
is likely to be in the opposite direction. In contrast, the GLOS
model predicts that a decision is based on the aggregated
success and failure of all past decisions, so that no strong
correlation between the success of a present decision and the
direction of the succeeding decision is expected. To test this
prediction, the angles between the direction of an allocation and
the direction of the preceding allocation were determined for all
allocations. Figure 4B shows the proportion of preceding allocations that were successful (i.e., led to a greater payoff than the allocation before), cate-
gorized with respect to the angle between the direction of an
allocation and the direction of the preceding allocation. Con-
sistent with the LOCAD model, we observed an association
between the participants' allocation directions and their success: For 70% of all allocations made in the same direction as the preceding allocation, the preceding allocation was successful, compared with only 35% of all allocations made in an opposite direction.

⁴ As an alternative method for parameter estimation, compared with least-squares estimation, maximum likelihood estimation has the drawback that it is sensitive to very small predicted probabilities, which frequently occurred for the present task with its large number of possible allocations; for advantages of least-squares estimation, see Selten (1998). Furthermore, the optimal properties of maximum likelihood only hold when the model is the true model, which is almost never correct. In addition, these properties only hold if the parameters fall inside the convex boundary of the parameter space, which is not guaranteed in our models. In summary, under conditions of possible model misspecification, least-squares estimation is more robust than maximum likelihood estimation, so the statistical justifications for maximum likelihood do not hold up under these conditions.

⁵ Note that the models' parameters were not fitted by optimizing the predicted average allocations compared with the observed average allocations but by optimizing the predicted probabilities of which allocation was selected; otherwise a closer fit would result.
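The direction–success analysis reported above can be sketched as follows (ours): for every trial from the third onward, the angle between the current change and the preceding change is paired with whether the preceding change raised the payoff.

```python
import numpy as np

def direction_success(allocations, payoffs):
    """allocations: (T, 3) array of percentage allocations; payoffs: length-T sequence."""
    allocations = np.asarray(allocations, dtype=float)
    records = []
    for t in range(2, len(allocations)):
        prev_dir = allocations[t - 1] - allocations[t - 2]
        curr_dir = allocations[t] - allocations[t - 1]
        norms = np.linalg.norm(prev_dir) * np.linalg.norm(curr_dir)
        if norms == 0:
            continue                                    # skip trials without a change
        cos = float(np.dot(prev_dir, curr_dir) / norms)
        angle = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
        successful = payoffs[t - 1] > payoffs[t - 2]    # preceding change raised the payoff
        records.append((angle, successful))
    return records

# Example: proportion of successful preceding changes among same-direction moves (angle <= 30 deg).
# same = [ok for angle, ok in direction_success(allocs, pays) if angle <= 30]
# print(sum(same) / len(same))
```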
This association was predicted, although to different extents,
by both models. As expected for LOCAD, a preceding alloca-
tion was likely to be successful (in 67% of all cases) when the
direction of an allocation was the same as the direction of the
preceding allocation (angles between 0° and 30°), whereas the
preceding allocation was unlikely to be successful (only in
41% of all cases) when the direction of an allocation was
opposite to the preceding direction. Surprisingly, this associa-
tion was also observed for GLOS: For 73% of all allocations
made in a similar direction to the preceding allocation, the preceding allocation was successful, compared with 39% of all allocations made in an opposite direction. However, the proportions of successful preceding allocations for the different angles were more strongly correlated with LOCAD's predictions (r = .95) than with GLOS's predictions (r = .84).

Figure 3. Individual characteristics of the decision process in Study 1. A: Percentage of allocations corresponding to the local or global payoff maximum across all trials (with a tolerated deviation of ±5% from the allocations that lead to the global or local maximum), presented with a moving average of nine trials. B: Average payoff across all trials, presented with a moving average of nine trials. GLOS = global search model; LOCAD = local adaptation model.
Summary of Study 1
In Study 1, we showed that people are able to improve their
decisions in an allocation situation substantially when provided
with feedback. However, only a few participants were able to find
the allocation that produced the maximum possible payoff. This
result can be explained by the LOCAD learning model, which described the empirical results slightly better than the GLOS learning model, on the basis of the goodness-of-fit criterion. If people start with a particular allocation and try to improve their situation by slightly adapting their decisions, as predicted by LOCAD, then, depending on their starting position, they will often not find the global payoff maximum.

Figure 4. Individual characteristics of the decision process in Study 1. A: Average magnitude of changes (step size) measured with the Euclidean distance between the allocations of successive trials (with possible values ranging from 0 to 141), presented with a moving average of nine trials. B: The angles between allocations' directions compared with the direction of preceding allocations were determined and categorized in six intervals. For each category, the percentage of successful preceding allocations (i.e., those leading to a higher payoff than the allocations before) is presented. GLOS = global search model; LOCAD = local adaptation model.
However, because both models were fitted to each individual
separately, it is difficult to decide which model is more appropri-
ate, as the two models make similar predictions. When focusing on several individual learning characteristics, only one out of four
characteristics supports the LOCAD model: the magnitude with
which allocations are changed in successive trials. The other three
process characteristics are appropriately described by both learn-
ing models. This result is not very surprising if one considers that
both models were fitted for each individual and only predicted
each new trial on the basis of the information of previous trials. In
contrast, in Study 2 both models made a priori predictions for
independent data, enabling a rigorous comparison of the two
models.
Study 2
In light of the results found in Study 1 that people, even when
provided with substantial learning opportunity, often end up with
suboptimal outcomes, one might object that the function of the
total payoff used in Study 1 only produced a relatively small
payoff difference between the two maxima, providing small in-
centives for participants to search for the global maximum. In
addition, if one takes the opportunity costs of search into account,
it might be reasonable to stay at the local maximum. One could
criticize that the small difference between the payoffs does not
satisfy the criterion of payoff dominance (Smith, 1982), that is, the
additional payoff does not dominate any (subjective) costs of
finding the optimal outcome, so that participants are not suffi-
ciently motivated to find the global payoff maximum. In Study 2,
we addressed this critique by increasing the payoff difference
between the local and global payoff maximum but keeping the
shape of the total payoff function similar to that in Study 1.
Increasing the payoff difference between the local and global
payoff maximum has direct implications for the predictions of the
GLOS learning model: If the reinforcement for the global payoff
maximum increases relative to the local payoff maximum, the
probability of selecting the allocation alternative corresponding to
the global maximum should increase according to the GLOS
model. Therefore, one would expect the GLOS model to predict
that more people will find the global maximum. In contrast, a
larger payoff difference between the local and global payoff max-
imum does not affect the prediction of the LOCAD model.
Study 2 also provides an opportunity to test the two learning
models on new independent data, by simulating 50,000 agents
using the model parameter values randomly selected from normal
distributions with the means and standard deviations of the param-
eter values derived from the individual fitting process of Study 1.
Given that the models’ parameter values are not fitted by the data
of Study 2, the models’ predictions provide a stronger empirical
generalization test of the models, which has often been asked for
but seldom done (Busemeyer & Wang, 2000).
Method
Participants. Twenty persons (13 women and 7 men) with an average
age of 21 years participated in the experiment. The duration of the com-
puterized task was approximately 1 h. Most participants (90%) were
students in various departments of Indiana University. For their participa-
tion they received an initial payment of $2. Additional payment was
contingent on the participants’ performance; the average payment was $20.
Procedure. The allocation problem was identical to the one used in
Study 1, with the only difference being the modified payoff functions. The
payoff functions differed by an increase in the payoff difference between
the local and global payoff maximum (see Figure 5). Again, high invest-
ments in Asset C led to low payoffs, in the worst case to a payoff of
–$34.55, whereas small investments in Asset C resulted in higher payoffs. The local maximum with a payoff of $32.48 was obtained when investing 29% in Asset B and 21% in Asset C (cf. 28% and 19%, respectively, with a payoff of $32.82 in Study 1), whereas the global maximum with a payoff of $41.15 was reached when investing 88% in Asset B and 12% in Asset C (the same allocation led to the global payoff maximum of $34.46 in
Study 1). From random choice, an average payoff of $17.44 could be
expected. The payoff functions yielded a difference of $8.67 and a Euclid-
ean distance of 79 between the allocations corresponding to the local and
global payoff maximum.
The instructions for the task in Study 2 were identical to those used in
Study 1.
Results
As in Study 1, first we analyze a potential learning effect before
the two learning models are compared and more specific charac-
teristics of the learning process are considered.
Learning effects. In the 1st trial, the average allocation con-
sisted of an investment of 26% in Asset B (SD = 12%), which increased to an average investment of 48% in Asset B (SD = 27%) in the 100th trial. The investment in Asset C decreased from 27% (SD = 13%) in the 1st trial to 22% (SD = 12%) in the 100th trial.
As in Study 1, participants in Study 2 had the tendency in the first
trial to invest slightly more in Asset A, which guaranteed a fixed
return. The allocation in the first trial substantially differed from
that in the 100th trial, with a Euclidean distance of 41, t(19) = 6.93, p < .001, d = 1.55.

Figure 5. The payoff function for the total payoff of the allocation problem in Study 2.
To investigate any learning effect, the 100 trials of both phases
were aggregated into blocks of 10 trials (trial blocks). A repeated
measure ANOVA was conducted, with the obtained payoff as the
dependent variable, the trial blocks and the two phases of 100 trials
as two within-subject factors, and the order in which the payoff
functions were assigned to the assets as a between-subjects factor.
A strong learning effect was documented, as the average ob-
tained payoff of $25 in the first block (SD = 3.6) increased substantially across the 100 trials to an average payoff of $34 (SD = 4.7) in the last block, F(9, 10) = 4.09, p = .019, η² = 0.79. In addition, there was a learning effect between the two phases, as participants on average did better in the second phase (M = $30, SD = 3.7, vs. M = $33, SD = 4.5), F(1, 18) = 8.59, p = .009, η² = 0.32. In contrast to Study 1, the interaction between trial blocks and the two phases was not significant, F(9, 10) = 2.24, p = .112, η² = 0.67. The order in which the payoff functions were assigned
to Asset B and Asset C had no effect on the average payoffs
(therefore, for simplicity, in the following, the investments in
Assets B and C are interchanged for half of the participants). No
other interactions were observed.
Model comparison. How well did the two learning models
predict participants’ allocations across the first 100 trials? For
Study 2, no parameter values were estimated. Instead, our testing
approach consisted of simulating a large number of agents with the
models’ parameter values randomly selected from normal distri-
butions, with the means and standard deviations of the parameter
values derived from the fitting process of Study 1. Finally, the
models’ fits were assessed by calculating the mean squared error
(MSE) of the average observed and average predicted allocations
(the deviation between two allocations is defined by the Euclidean
distance).
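The generalization test can be sketched as follows (ours). Here simulate_agent stands for running either learning model on the Study 2 payoff function with a given parameter vector, and clipping the sampled parameters to the fitting bounds of Study 1 is our assumption about how out-of-range draws are handled.

```python
import numpy as np

def generalization_test(simulate_agent, param_means, param_sds, bounds,
                        observed_mean_allocations, n_agents=50_000, seed=0):
    """Simulate agents with parameters drawn from Normal(mean, SD) estimated in Study 1,
    then score the average predicted allocation per trial against the observed average."""
    rng = np.random.default_rng(seed)
    lower, upper = np.transpose(bounds)
    runs = []
    for _ in range(n_agents):
        params = np.clip(rng.normal(param_means, param_sds), lower, upper)
        runs.append(simulate_agent(params, rng))   # (100, 3) array of allocations per agent
    predicted_mean = np.mean(runs, axis=0)
    distances = np.linalg.norm(predicted_mean - observed_mean_allocations, axis=1)
    return float(np.mean(distances ** 2))          # MSE based on Euclidean distance per trial
```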
Figure 6 shows the development of the average allocation of the
participants across all 100 trials. In addition, the figure shows the
predicted average allocation of both learning models. The LOCAD
learning model better describes the development of the allocations
across the 100 trials, and MSE equals 39. In contrast, the GLOS
learning model less appropriately describes the learning process,
with an MSE of the predicted and observed average allocation of
117. GLOS underestimates the magnitude of the learning effect for
the allocation task.
Characteristics of the learning process. Does the LOCAD
learning model predict individual characteristics of the learn-
ing process more suitably than the GLOS model? Figure 7A shows
again for Study 2 the proportion of allocations across all trials that
correspond to the allocations that led to the local or global payoff
maximum (with a tolerated deviation of ±5%). Similar to Study 1,
the proportion of participants that made allocations according to
the local or global maximum increased substantially through learn-
ing across the 100 trials. However, again only a small number of
participants (20%) finally found the allocation corresponding to
the global payoff maximum, whereas a larger proportion (40%) got
stuck at the allocation corresponding to the local payoff maximum.
This result was again predicted by the LOCAD learning model.
Although both models underestimate the proportion of allocations
according to the local or global payoff maximum, the predicted
proportions by LOCAD were closer to the observed data.
Through learning, participants were able to increase their payoff
over the 100 trials (see Figure 7B). Both models underestimated
the payoff increase, but LOCAD’s prediction was closer to the
observed payoff increase than GLOS’s prediction.
The effect of training on the magnitude with which the participants changed their decisions was similar to Study 1 (see Figure 8A), starting with an average magnitude of a Euclidean distance of 29 for the first 10 trials and ending with an average magnitude of 5 for the last 10 trials. Although both models underestimated the decline in the magnitude with which decisions were adapted, the predictions of LOCAD came closer to the observed development.
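The step-size measure used here is simply the Euclidean distance between the allocations of successive trials (ranging from 0 to roughly 141 when allocations are expressed in percentages; see Figure 8A). A minimal sketch, assuming the allocations are stored as a hypothetical trials-by-assets array of percentages:

```python
import numpy as np

def step_sizes(allocations):
    """Euclidean distance between the allocations of successive trials.
    allocations: (n_trials, 3) array of percentages summing to 100, so the
    distances range from 0 to about 141 (= 100 * sqrt(2))."""
    diffs = np.diff(allocations, axis=0)
    return np.sqrt((diffs ** 2).sum(axis=1))

# Illustrative use (allocations is a hypothetical (100, 3) array):
# average step size over the first and last 10 transitions
# first_block = step_sizes(allocations[:11]).mean()
# last_block  = step_sizes(allocations[-11:]).mean()
```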
Similar to Study 1, an association between allocations' directions and their success was observed for the participants' decisions: For 74% of all allocations in the same direction as the preceding allocation (angles between 0° and 30°), the preceding allocation was successful, compared with only 35% of all allocations made in an opposite direction (angles between 150° and 180°; see Figure 8B).
An even stronger association was predicted by the LOCAD
model: For 92% of all allocations made in the same direction as the
preceding allocation, the preceding allocation was successful,
compared with 20% of all allocations made in an opposite direc-
tion. In contrast, the GLOS model predicted a weak association:
For 61% of all allocations made in the same direction as the
preceding allocation, the preceding allocation was successful,
compared with 46% of all allocations made in an opposite direc-
tion. The proportions of successful preceding allocations for the
different angles were strongly correlated with both models’ pre-
dictions (r = .93 for LOCAD and r = .92 for GLOS).
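This directional analysis can likewise be illustrated with a short sketch that computes, for hypothetical allocation and payoff arrays, the angle between successive allocation changes and the percentage of successful preceding changes per angle category. The six intervals mirror those used for Figure 8B; the input format is an assumption.

```python
import numpy as np

def direction_success_table(allocations, payoffs,
                            bins=(0, 30, 60, 90, 120, 150, 180)):
    """For each trial, compute the angle (in degrees) between the current
    allocation change and the preceding change, and record whether the
    preceding change was successful (i.e., raised the payoff). Returns the
    percentage of successful preceding changes per angle category.
    Illustrative sketch with hypothetical array inputs."""
    changes = np.diff(allocations, axis=0)   # change vectors between successive trials
    success = np.diff(payoffs) > 0           # did each change raise the payoff?
    angles, prev_success = [], []
    for t in range(1, len(changes)):
        a, b = changes[t - 1], changes[t]
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        if na == 0 or nb == 0:
            continue                         # no direction defined for a zero change
        cos = np.clip(a @ b / (na * nb), -1.0, 1.0)
        angles.append(np.degrees(np.arccos(cos)))
        prev_success.append(success[t - 1])
    angles = np.array(angles)
    prev_success = np.array(prev_success, dtype=bool)
    rates = {}
    for lo, hi in zip(bins[:-1], bins[1:]):
        upper = angles <= hi if hi == bins[-1] else angles < hi
        mask = (angles >= lo) & upper
        rates[(lo, hi)] = 100 * prev_success[mask].mean() if mask.any() else float("nan")
    return rates
```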
Figure 6. Participants' average allocations and the average predictions of the two learning models when simulating 50,000 agents. The figure shows a moving average of nine trials, such that for each trial the average of the present allocation and the four preceding and four succeeding allocations is presented. (Note that for the first four trials, the moving average is determined by five to eight trials.) GLOS = global search model; LOCAD = local adaptation model. Solid diamonds represent Real Asset B; solid triangles represent Real Asset C; solid lines represent GLOS Asset B; hatched lines represent GLOS Asset C; open squares represent LOCAD Asset B; and open triangles represent LOCAD Asset C.
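The nine-trial moving average referred to in the figure captions can be sketched as follows; the shortened windows near the ends of the series follow the note in the caption, and any further edge handling is an assumption.

```python
import numpy as np

def centered_moving_average(series, half_window=4):
    """Centered moving average over (up to) nine trials: each point is the mean
    of the current value plus the four preceding and four succeeding values,
    with shorter windows near the ends of the series (as in Figures 6-8)."""
    series = np.asarray(series, dtype=float)
    out = np.empty_like(series)
    for t in range(len(series)):
        lo = max(0, t - half_window)
        hi = min(len(series), t + half_window + 1)
        out[t] = series[lo:hi].mean()
    return out
```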
Summary of Study 2
Study 2 illustrates the robustness of the findings from Study 1.
Although the payoff difference between the local and global payoff
maximum was substantially increased, only a small proportion of
participants were able to find the global maximum, whereas many
participants got stuck at the local maximum. Such a result is consis-
tent with the main learning mechanism of the LOCAD learning
model, which better predicted the observed learning process for the
allocation problem compared with the GLOS learning model.
Of course, one aspect of the payoff function that influences the
difficulty with which the local or global payoff maxima can be
detected is their location in the search space of possible
allocations. The allocation corresponding to the local payoff max-
imum was located near the center of the search space, that is, near
an allocation with an equal share invested in all three assets. In
contrast, the allocation producing the global payoff maximum was
located at the border of the search space, that is, an allocation with
disproportional investments in the different assets. If people tend
Figure 7. Individual characteristics of the decision process in Study 2. A: Percentage of allocations corresponding to the local or global payoff maximum across all trials (with a tolerated deviation of ±5% from the allocations that lead to the global or local maximum), presented with a moving average of nine trials. B: Average payoff across all trials, presented with a moving average of nine trials. GLOS = global search model; LOCAD = local adaptation model.
to start with evenly distributed investments in all three assets and
if they follow a learning process as predicted by the LOCAD
model, they should frequently get stuck at the local payoff maxi-
mum. In contrast, one could imagine a payoff function for which
the positions of the allocations corresponding to the local and
global payoff maxima were interchanged. For such a payoff func-
tion, the majority of participants would presumably find the global
payoff maximum. However, such a function would not allow
discrimination between the predictions of the two learning models
and was therefore not used.
In summary, the finding that many participants got stuck at the local payoff maximum in both of our studies is a consequence of the
Figure 8. Characteristics of the decision process in Study 2. A: Average magnitude of changes (step size) measured as the Euclidean distance between the allocations of successive trials (with possible values ranging from 0 to 141), presented with a moving average of nine trials. B: The angles between allocations' directions and the directions of the preceding allocations were determined and categorized into six intervals. For each category, the percentage of successful preceding allocations (i.e., those leading to a higher payoff than the allocations before them) is presented. GLOS = global search model; LOCAD = local adaptation model.
payoff function used and can be predicted with the proposed
LOCAD learning model. The generalization test of the learning
models in Study 2 was more substantial than that in Study 1,
because no parameter values were fitted to the data; instead the
models predicted independent behavior of a different decision
problem.
Discussion
Recently, several learning theories for decision-making prob-
lems have been proposed (e.g., Börgers & Sarin, 1997; Busemeyer & Myung, 1992; Camerer & Ho, 1999a, 1999b; Erev & Roth, 1998; Selten & Stöcker, 1986; Stahl, 1996). Most of these learning
theories build on the basic idea that people do not solve a problem
from scratch but adapt their behavior on the basis of experience.
The theories differ according to the learning mechanism that
people apply, that is, their assumptions about cognitive processes.
The reinforcement-learning model proposed by Erev and Roth
(1998) and the experience-weighted attraction learning model pro-
posed by Camerer and Ho (1999a, 1999b) in general belong to the
class of global search models. These models assume that all
possible decision alternatives can be assigned an overall evalua-
tion. Whereas the evaluation for the reinforcement-learning model
only depends on the experienced consequences of past decisions,
the experience-weighted attraction model additionally can take
hypothetical consequences and foregone payoffs into account.
Both models make the assumption that people integrate their
experience for an overall evaluation, and alternatives that are
evaluated positively are more likely to be selected.
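As a rough illustration of this class of models, the following sketch implements a generic propensity-based reinforcement update in the spirit of Erev and Roth (1998): every discrete alternative carries a propensity, choice is probabilistic in those propensities, and the chosen alternative is reinforced by the payoff it produced. The forgetting parameter phi and the specific update are illustrative assumptions, not the GLOS model's actual equations.

```python
import numpy as np

def global_search_step(propensities, payoff_of, phi=0.1, rng=None):
    """One step of a generic global-search (reinforcement) learner: choose an
    alternative with probability proportional to its propensity, observe its
    payoff, and reinforce it while gradually discounting old experience.
    propensities is assumed to be a positive float array (updated in place)."""
    rng = rng or np.random.default_rng()
    probs = propensities / propensities.sum()      # choice probabilities
    k = rng.choice(len(propensities), p=probs)     # select an alternative
    r = payoff_of(k)                               # payoff of the chosen alternative
    propensities *= (1 - phi)                      # forget old experience
    propensities[k] += r                           # reinforce the chosen alternative
    return k, r
```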
The other approach—local adaptation models—does not as-
sume that people necessarily acquire a global representation of the
consequences of the available decision alternatives through learn-
ing. Instead, the hill-climbing model by Busemeyer and Myung (1987) and the learning direction theory of Selten and Stöcker (1986) assume that decisions are adapted locally, so that a preced-
ing decision might be slightly modified according to its success or
failure.
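The local adaptation principle can be illustrated with an equally rough sketch: if the preceding change improved the payoff, the next allocation moves further in the same direction; otherwise a new direction is tried, and the step size gradually shrinks. The step and shrink parameters and the renormalization scheme are illustrative assumptions, not LOCAD's exact update rule.

```python
import numpy as np

def local_adaptation_step(alloc, last_change, last_improved, payoff_of,
                          step=10.0, shrink=0.9, rng=None):
    """Illustrative hill-climbing step on an allocation (percentages summing
    to 100): repeat the previous direction if it paid off, otherwise try a
    random new direction, and gradually shrink the step size."""
    rng = rng or np.random.default_rng()
    if last_improved and np.linalg.norm(last_change) > 0:
        direction = last_change / np.linalg.norm(last_change)
    else:
        direction = rng.normal(size=alloc.size)
        direction -= direction.mean()              # keep the total investment constant
        direction /= np.linalg.norm(direction)
    new_alloc = np.clip(alloc + step * direction, 0, 100)
    new_alloc *= 100 / new_alloc.sum()             # renormalize to 100%
    improved = payoff_of(new_alloc) > payoff_of(alloc)
    return new_alloc, new_alloc - alloc, improved, step * shrink
```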
Busemeyer and Myung (1992) suggested that models in the
global search class may be applicable to situations in which the
decision alternatives form a small set of qualitatively different
strategies, whereas models in the local adaptation class may be
applicable in situations in which the decision alternatives form a
continuous metric space of strategies. Global search models have
been successfully applied to constant-sum games, in which there
are only a small number of options. The purpose of this research
was to examine learning processes in a resource allocation task,
which provides a continuous metric space of strategies.
A new version of the global search model, called the GLOS
model, and a new version of the local adaptation model, called the
LOCAD model, were developed for this task. These two models
were the best representations of the two classes that we constructed
for the resource allocation task. The models were compared in two
different studies. In the first study, the model parameters were
estimated separately for each participant, and the model fits were
compared with the individual data. In the second study, we used
the estimated parameters from the first study to generate a priori
predictions for a new payoff condition, and the predictions of the
models were compared with the mean learning curves.
In both studies, the resource allocation task consisted of repeat-
edly allocating a capital resource to different financial assets. The
task was difficult because the rates of return were unknown for two
assets, the rates of return depended in a nonlinear manner on the
amount invested in the assets, and the number of allocation alter-
natives was quite large. However, because any investment led to a
deterministic return, it was always obvious which of two alloca-
tions performed better after the payoffs for these allocation alter-
natives were presented. Therefore, the essence of the task that the
participants faced in both studies consisted of a search problem for
a good allocation alternative. Given that the participants were
provided with a large number of trials, finding the best possible
allocation alternative was possible. However, it turned out that the
majority of participants did not find the best possible allocation
corresponding to the global payoff maximum but became dis-
tracted by the local payoff maximum. Nevertheless, a substantial
learning process was observed: At the beginning of the task there
was a tendency to allocate an equal proportion of the resource to
all three assets with a slightly larger proportion invested in the
asset that guaranteed a fixed return. These allocations led to
relatively low average payoffs, which then increased substantially
over the 100 trials through learning. This learning process can be
characterized by substantial changes of allocations at the begin-
ning of the task, which then declined substantially over time. The
direction in which the allocations were changed depended strongly
on the success of previous changes, characterizing a directional
learning process.
These central findings correspond to the learning principles of
the local adaptation model. Therefore, it is not surprising that the
local adaptation model reached a better fit compared with the
global search model in predicting individuals’ allocations in both
studies. In Study 1, when fitting both models to each individual
separately, LOCAD reached a slightly better fit in describing the
learning process. In Study 2, the a priori predicted average allo-
cations by the LOCAD model (see Figure 6) properly described
the observed average allocation across 100 trials, corresponding to
a smaller MSE for LOCAD compared with GLOS. Given that in
Study 2 the payoff function differed substantially from the payoff
function of Study 1, these results provide strong empirical support
for LOCAD.
The appropriateness of LOCAD to describe the learning process
is also supported by individual characteristics of the process. In
Study 1, the LOCAD model, compared with GLOS, more accu-
rately predicted the magnitude with which successive allocations
were changed. In contrast, the other three individual characteristics of the learning process were equally well described by the two models in Study 1. This result changed substantially when turning
to Study 2; here the LOCAD model also more suitably described
the development of payoffs and the development of the number of
allocations corresponding to the local and global payoff maximum.
Unexpectedly, in both studies, the association between the direc-
tion of allocations and the success of previous allocations was
appropriately described by the LOCAD model as well as the
GLOS model.
Why is it that the LOCAD model, compared with the GLOS
model, better describes the learning process in the resource allo-
cation task? Although the predictions of the two models can be
similar with respect to specific aspects, the learning principles of
the models are quite different. The learning principles of LOCAD
seem to correspond more accurately to individuals’ behavior for
this task. According to LOCAD, starting with a specific allocation,
new allocations are made in the same direction as the direction of
the preceding successful allocation. Although this learning princi-
ple is very effective at improving allocations, it can lead to the
result of missing the global maximum, as decisions have to be
changed substantially to find the global maximum. Yet this result
is exactly what was found in both studies. In contrast, the GLOS model eventually found the global payoff maximum, especially when the experience gained at the beginning of the learning process was not weighted too strongly. In that case, the GLOS model selected all kinds of different allocations and, at some point, also selected allocations corresponding to the global payoff maximum, for which it then developed a preference. However, given that most participants did not find the global payoff maximum, when fitting the GLOS model to the data, parameter values were selected so that the model would not converge to the global payoff maximum. With these parameter values, however, the model also did not converge frequently to any allocation, so it still did not predict the convergence to the local payoff maximum that was found for most participants.
To what extent can the results of the present studies be gener-
alized to different learning models? The two models that we
implemented are the best examples of the two approaches of
learning models we found. Both were supported by past research
and both were directly compared in previous theoretical analyses
(see, e.g., Erev, 1998). More important, we also compared many
variations of each model, although because of space limitations,
we only present the results for the best version of each model.
Nevertheless, our conclusions are supported in the sense that no variation of the GLOS model outperformed the LOCAD model that we present here; indeed, all of the variations did worse than the GLOS model that we present here. Furthermore, because of the
flexibility of the implemented models, that is, their free parame-
ters, we doubt that slight modifications of the presented models
would lead to substantially different results that would challenge
our claim that the LOCAD learning model better predicts the learning process for the resource allocation problem.
To what extent did Study 2 provide a fair test of the two models?
The answer, we argue, is more than fair. First, both types of
learning models have been applied in previous theoretical analyses
to resource allocation tasks similar to the one used in Study 1 (Erev
& Gopher, 1999). Thus, there is no reason to claim that Study 1
does not provide a suitable test ground. In the second study, we
simply increased the difference between the local and global
maxima, which encouraged more participants to find the global
maximum. This manipulation actually favors the GLOS model
because the a priori tendency for the LOCAD model is to be
attracted to the local maximum. Thus, the second study provided
the best possible a priori chance for the GLOS model to outper-
form the LOCAD model in the generalization test.
To what extent can the results of the present studies be gener-
alized to other decision problems? It should be emphasized that the
current conclusions are restricted to the decision problem we
considered. We expect that in similar decision problems that
provide a large number of strategies that form a natural order, the
LOCAD model would better describe the learning process. In such
situations, people can form a hypothesis about the underlying
causal structure of the decision process that enables a directed
learning process. For example, when deciding how much to invest
in a repeated public-good game, a local adaptation learning process
might occur.
However, there are many situations for which global search
learning models describe learning processes better. For example,
there is a large amount of empirical evidence that global search
models appropriately describe the learning process for constant-
sum games with a small number of actions (Erev & Roth, 1998).
In a constant-sum game, no possibility exists for the players to
increase the mutual payoff by “cooperation.” The prediction from
game theory asserts that the different decision strategies (options)
should be selected with a particular probability. In such a situation,
there are only a small number of categorically different alterna-
tives, making it difficult to apply a local adaptation model, because
the set of alternatives provides no natural order to define directions
for changes in strategies.
The present article demonstrates a rigorous test of two learning
models representing two approaches in the recent learning litera-
ture. It also provides an illustration that learning often does not
lead to optimal outcomes as claimed, for example, by Simon
(1990) or Selten (1991). Yet, people improve their decisions
substantially through learning: For example, even when individu-
als start with a suboptimal decision of allocating an equal share to
the different assets, they quickly change their decision by making
allocations that produce higher payoffs. This learning process can
be described by the local adaptation learning model, which is
commonly characterized by high efficiency but can lead to sub-
optimal outcomes. For other domains, other learning mechanisms might govern behavior, and each learning model might have its own domain in which it works well. Identifying these
domains is a promising enterprise.
References
Ball, C. T., Langholtz, H. J., Auble, J., & Sopchak, B. (1998). Resource-
allocation strategies: A verbal protocol analysis. Organizational Behav-
ior & Human Decision Processes, 76, 70–88.
Benartzi, S., & Thaler, R. H. (2001). Naive diversification strategies in
defined contribution saving plans. American Economic Review, 91,
79–98.
Börgers, T., & Sarin, R. (1997). Learning through reinforcement and
replicator dynamics. Journal of Economic Theory, 77, 1–14.
Brennan, M. J., Schwartz, E. S., & Lagnado, R. (1997). Strategic asset
allocation. Journal of Economic Dynamics & Control, 21, 1377–1403.
Brown, G. W. (1951). Iterative solution of games by fictitious play. In T. C.
Koopmans (Ed.), Activity analysis of production and allocation (pp.
374–376). New York: Wiley.
Busemeyer, J. R., & Myung, I. J. (1987). Resource allocation decision-
making in an uncertain environment. Acta Psychologica, 66, 1–19.
Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human
decision-making: Learning theory, decision theory, and human perfor-
mance. Journal of Experimental Psychology: General, 121, 177–194.
Busemeyer, J. R., Swenson, K., & Lazarte, A. (1986). An adaptive ap-
proach to resource allocation. Organizational Behavior & Human De-
cision Processes, 38, 318–341.
Busemeyer, J. R., & Wang, Y.-M. (2000). Model comparisons and model
selections based on generalization criterion methodology. Journal of
Mathematical Psychology, 44, 171–189.
Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New
York: Wiley.
Camerer, C., & Ho, T.-H. (1999a). Experience-weighted attraction learning
in games: Estimates from weak-link games. In D. V. Budescu & I. Erev
(Eds.), Games and human behavior: Essays in honor of Amnon Rap-
oport (pp. 31–51). Mahwah, NJ: Erlbaum.
Camerer, C., & Ho, T.-H. (1999b). Experience-weighted attraction learning
in normal form games. Econometrica, 67, 827–874.
Cheung, Y.-W., & Friedman, D. (1997). Individual learning in normal form
games: Some laboratory results. Games & Economic Behavior, 19,
46–76.
Dorfman, D. D., Saslow, C. F., & Simpson, J. C. (1975). Learning models
for a continuum of sensory states reexamined. Journal of Mathematical
Psychology, 12, 178–211.
Erev, I. (1998). Signal detection by human observers: A cutoff
reinforcement-learning model of categorization decisions under uncer-
tainty. Psychological Review, 105, 280–298.
Erev, I., & Gopher, D. (1999). A cognitive game-theoretic analysis of
attention strategies, ability, and incentives. In D. Gopher & A. Koriat
(Eds.), Attention and performance XVII: Cognitive regulation of perfor-
mance. Interaction of theory and application (pp. 343–371). Cambridge,
MA: MIT Press.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: Rein-
forcement learning in experimental games with unique, mixed strategy
equilibria. American Economic Review, 88, 848–881.
Estes, W. K. (1950). Toward a statistical theory of learning. Psychological
Review, 57, 94–107.
Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious
play. Journal of Economic Dynamics & Control, 19, 1065–1089.
Gingrich, G., & Soli, S. D. (1984). Subjective evaluation and allocation of
resources in routine decision-making. Organizational Behavior & Hu-
man Decision Processes, 33, 187–203.
Harley, C. B. (1981). Learning the evolutionary stable strategy. Journal of
Theoretical Biology, 89, 611–633.
Langholtz, H. J., Ball, C., Sopchak, B., & Auble, J. (1997). Resource-
allocation behavior in complex but commonplace tasks. Organizational
Behavior & Human Decision Processes, 70, 249–266.
Langholtz, H., Gettys, C., & Foote, B. (1993). Resource-allocation behav-
ior under certainty, risk, and uncertainty. Organizational Behavior &
Human Decision Processes, 54, 203–224.
Langholtz, H., Gettys, C., & Foote, B. (1994). Allocating resources over
time in benign and harsh environments. Organizational Behavior &
Human Decision Processes, 58, 28–50.
Langholtz, H., Gettys, C., & Foote, B. (1995). Are resource fluctuations
anticipated in resource allocation tasks? Organizational Behavior &
Human Decision Processes, 64, 274–282.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Nelder, J. A., & Mead, R. (1965). A simplex method for function mini-
mization. Computer Journal, 7, 308–313.
Northcraft, G. B., & Neale, M. A. (1986). Opportunity costs and the
framing of resource allocation decisions. Organizational Behavior &
Human Decision Processes, 37, 348–356.
Roth, A. E., & Erev, I. (1995). Learning in extensive-form games: Exper-
imental data and simple dynamic models in the intermediate term.
Games & Economic Behavior, 8, 164–212.
Russell, S. J., & Norvig, P. (1995). Artificial intelligence. Englewood Cliffs,
NJ: Prentice Hall.
Selten, R. (1991). Evolution, learning, and economic behavior. Games &
Economic Behavior, 3, 3–24.
Selten, R. (1998). Axiomatic characterization of the quadratic scoring
rules. Experimental Economics, 1, 43–62.
Selten, R., & Stöcker, R. (1986). End behavior in sequences of finite
prisoner’s dilemma supergames: A learning theory approach. Journal of
Economic Behavior & Organization, 7, 47–70.
Simon, H. A. (1990). Invariants of human behavior. Annual Review of
Psychology, 41, 1–19.
Smith, V. L. (1982). Microeconomic systems as an experimental science.
American Economic Review, 72, 923–955.
Stahl, D. O. (1996). Boundedly rational rule learning in a guessing game.
Games & Economic Behavior, 16, 303–330.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An intro-
duction. Cambridge, MA: MIT Press.
Thomas, E. A. C. (1973). On a class of additive learning models: Error
correcting and probability matching. Journal of Mathematical Psychol-
ogy, 10, 241–264.
Appendix
Payoff Functions Used in Study 1 and Study 2
In Study 1 the payoff functions were defined as follows: The first asset (Asset A) produced a fixed rate of return of 10%, with the payoff function

u_A(p_A) = 0.1 × p_A × R,

where p_A ∈ [0, 1] is the percentage of the resource R invested in Asset A. For the other two assets, the rate of return varied with the amount invested in the asset. For Asset B, the payoff function was defined as

u_B(p_B, p_A) = 10 − 0.1 × p_A × R + 40 × [sin(3.2 × (p_B − 0.781) − 9) / (3.2 × (p_B − 0.781) − 9)],

with p_B, p_A ∈ [0, 1]. For Asset C, the payoff function was defined as

u_C(p_C) = 5 + 4R × [sin(1.1 × (p_C − 0.781) − 24.6) / (1.1 × (p_C − 0.781) − 24.6)],

with p_C ∈ [0, 1].

In Study 2 the payoff functions were defined as follows: The payoff function for Asset A was identical to the one used in Study 1. For Asset B, the payoff function was defined as

u_B(p_B, p_A) = 6 − 0.2 × p_A × R + 80 × [sin(3.2 × (p_B − 0.781) − 9) / (3.2 × (p_B − 0.781) − 9)],

with p_B, p_A ∈ [0, 1], and for Asset C the payoff function was defined as

u_C(p_C) = −4 + 8R × [sin(1.1 × (p_C − 0.781) − 24.6) / (1.1 × (p_C − 0.781) − 24.6)],

with p_C ∈ [0, 1].
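For readers who wish to explore these payoff landscapes, the following sketch implements the functions as reconstructed above. The value of the total resource R (set to 100 here) and the placement of operators lost in reproduction are assumptions and may differ from the original specification.

```python
import numpy as np

def sinc_term(p, a, b, c):
    """Shared sin(x)/x component of the nonlinear payoff functions,
    with x = a * (p - b) + c."""
    x = a * (p - b) + c
    return np.sin(x) / x

def payoffs_study1(p_a, p_b, p_c, R=100.0):
    """Per-asset payoffs under the Study 1 functions as reconstructed above.
    p_a, p_b, p_c are the proportions (0-1) invested in Assets A, B, and C;
    R is the total resource, whose value here is an assumption."""
    u_a = 0.1 * p_a * R
    u_b = 10 - 0.1 * p_a * R + 40 * sinc_term(p_b, 3.2, 0.781, -9)
    u_c = 5 + 4 * R * sinc_term(p_c, 1.1, 0.781, -24.6)
    return u_a, u_b, u_c

def payoffs_study2(p_a, p_b, p_c, R=100.0):
    """Per-asset payoffs under the Study 2 functions as reconstructed above."""
    u_a = 0.1 * p_a * R
    u_b = 6 - 0.2 * p_a * R + 80 * sinc_term(p_b, 3.2, 0.781, -9)
    u_c = -4 + 8 * R * sinc_term(p_c, 1.1, 0.781, -24.6)
    return u_a, u_b, u_c
```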
Received October 22, 2002
Revision received March 26, 2003
Accepted March 30, 2003 