
www.Usenet.com
| <-- __Chronological__ --> | <-- __Thread__ --> |
Problem Area: action selection
AI Type: decision making, learning (secondary), personality
(secondary)
Detail Level: mid-level
Technique: matching law
Assumptions: Options are relatively equal
Example Uses: sports: choosing a shot type
FPS: choosing a weapon
RPG: choosing a spell
RTS: choosing a build unit type
Explanation
-----------
In 1962, psychologist Herrnstein discovered that, when an animal is
presented with two options, the relative rate of responding (choosing)
of a given option is equal to the relative rate of reinforcement
(success) of that option. This was called the matching law and written
as RA/RA+RB = rA/rA+rB. In 1974, psychologist Baum refined the formula
by adding the notion of bias and sensitivity, resulting in the new
equation RA/RB = b(rA/rB)^s. The matching law has been repeatedly
upheld in human and non-human animal studies
In 2000, Vollmer and Bourret studied the shot selection of 26 college
basketball players. They found that how often a player attempted a
three-point shot or a two-point shot was proportional to the relative
rates of reinforcement (success rates) for each. If a player
successfully completed 10% more two-point shots than three-point
shots, the player would should 10% more two-point shots than
three-point shots
Variables
---------
RA/RA+RB = b(rA/rA+rB)^s = matching law
RA = Rate of response for option A.
How often option A is chosen.
This is a counter
RA/RA+RB = Relative rate of response for option A.
The percentage of time option A is chosen from options A&B
rA = Rate of reinforcement for option A.
The percentage of time choosing option A has lead to a good
result
rA/rA+rB = Relative rate of response for option A.
The percentage of overall successes that have come from
choosing option A
b = Response bias.
A preference for a given option or outcome.
b>1 means prefers
s = Sensitivity.
How much better an option must be to switch to it.
s<1 means must be better than the other option(s).
A major cause for sensitivity is switching cost.
In an FPS, switching cost would be the time spent
unarmed while switching to the new weapon
Game Example
------------
A wizard is 30 meters away from a group of orcs. He has three third
level spells that are appropriate to use - flamestrike, iceblade and
stonestorm. Question: which spell should the wizard cast?
Assume that the wizard has successfully hit his enemies 3/10 times
with flamestrike, 2/4 times with iceblade and 6/7 times with
stonestorm.
r(flamestrike) = 3/10 = 0.3 (30%)
r(iceblade) = 2/4 = 0.5 (50%)
r(stonestorm) = 6/7 = 0.86 (86%)
relative r(flamestrike) = .3/.3+.5+.86 = .3/1.66 = 0.18 (18%)
relative r(iceblade) = .5/.3+.5+.86 = .5/1.66 = 0.30 (30%)
relative r(stonestorm) = .86/.3+.5+.86 = .86/1.66 = 0.52 (52%)
So the wizard would cast stonestorm 52% of the time, iceblade 30% of
the time and flamestrike 18% of the time
Personality. To see how this affects the personality of the character,
assume we have an NPC named Pyro the Flame Wizard. The NPC has no
special advantages with fire, he just likes how pretty it is. Pyro's
bias for fire-based spells is 1.25 (25% bias).
relative r(flamestrike) = 1.25(.3/.3+.5+.86) = 1.25(.3/1.66) = 0.23
(23%)
Note that if the other options (iceblade and stonestorm) do not have
their biases downgraded accordingly, the percentages (.23, .30, .52)
will add up to 105%, not 100%. Given that this is, by definition,
impossible, all numbers would need to be normalized. We do this by
dividing each result by the overage (1.05)
relative r(flamestrike) = 0.23/1.05 = 0.22 (22%)
relative r(iceblade) = 0.30/1.05 = 0.29 (29%)
relative r(stonestorm) = 0.52/1.05 = 0.50 (50%)
The numbers here add up to 101 because i did a lot of rounding, but if
you don't round they should add to 100%
Note that, even with the bias, Pyro the Flame Wizard still only casts
flamestrike 22% of the time. This is because he'd be suicidal or a
fool to cast it more than that given his poor track record with the
spell
Learning. Action selection is driven by two counters, numAttempted and
numAttemptsSuccessful. Because of that, the agents will constantly
adapt their actions based on feedback and their history
Limitations
-----------
- Learning via the matching law is slow. Does not smoothly accomodate
rapid changes in skills such as happens in games when leveling up
or aquiring a skill-enhancing weapon
- As stated, does not accomodate learning by observation (ie,
switching
to a sniper rifle because the winner of the last four rounds used
the sniper rifle)
- Modifications must be made to accomodate context-specific decisions
(ie, tracking numAttempts at a per-enemy or per-enemyType level
rather than
at a global level)
Notes
-----
- Bias and sensitivity are not learned, they are hard coded by the
designer
- Sensitivity should be situation specific. Sensitivity to switching
weapons (armor, etc.) should be normal (~1) in safe spots and should
be lower when surrounded by enemies. The sensitivity of the current
item should reflect the fact that there is no switching cost (ie,
s=1)
| <-- __Chronological__ --> | <-- __Thread__ --> |
Please check out one of the premium Usenet Newsgroup Service Providers below for access to Usenet.