Usenet.com

Comp Thread Archive from Usenet.com

Bag of Tricks: Choosing actions via the matching law



Problem Area: action selection
     AI Type: decision making, learning (secondary),
              personality (secondary)
Detail Level: mid-level
   Technique: matching law
 Assumptions: Options are relatively equal
Example Uses: sports: choosing a shot type
              FPS: choosing a weapon
              RPG: choosing a spell
              RTS: choosing a build unit type

Explanation
-----------
In 1961, the psychologist Richard Herrnstein found that, when an animal
is presented with two options, the relative rate of responding
(choosing) for a given option equals the relative rate of reinforcement
(success) of that option. This is called the matching law and is
written as

    RA/(RA+RB) = rA/(rA+rB)

In 1974, the psychologist William Baum refined the formula by adding
the notions of bias and sensitivity, giving the equation

    RA/RB = b(rA/rB)^s

The matching law has been repeatedly upheld in studies of both human
and non-human animals.
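For two options, Baum's equation can be solved for A's share of the
choices: if x = b(rA/rB)^s, then RA/(RA+RB) = x/(1+x). The Python sketch
below shows both forms; the function names are made up for illustration.
With b=1 and s=1 it reduces to Herrnstein's original law.

```python
def choice_ratio(r_a: float, r_b: float, b: float = 1.0, s: float = 1.0) -> float:
    """Baum's generalized matching law: RA/RB = b * (rA/rB)**s."""
    if r_a <= 0 or r_b <= 0:
        raise ValueError("reinforcement rates must be positive")
    return b * (r_a / r_b) ** s


def share_of_a(r_a: float, r_b: float, b: float = 1.0, s: float = 1.0) -> float:
    """Convert the ratio RA/RB into the proportion RA/(RA+RB)."""
    x = choice_ratio(r_a, r_b, b, s)
    return x / (1.0 + x)


if __name__ == "__main__":
    # Strict matching (b = 1, s = 1): the share equals rA/(rA+rB).
    print(round(share_of_a(0.3, 0.5), 3))  # 0.375
    # s < 1: A succeeds 4x as often as B but is chosen only 2x as often.
    print(choice_ratio(0.8, 0.2, s=0.5))   # 2.0
```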

In 2000, Vollmer and Bourret studied the shot selection of 26 college
basketball players. They found that how often a player attempted a
three-point shot or a two-point shot was proportional to the relative
rates of reinforcement (success rates) for each. If a player
successfully completed 10% more two-point shots than three-point
shots, the player would shoot 10% more two-point shots than
three-point shots.


Variables
---------
RA/(RA+RB) = rA/(rA+rB)  = matching law (Herrnstein)
RA/RB      = b(rA/rB)^s  = generalized matching law (Baum)
RA       = Rate of response for option A. 
           How often option A is chosen.
           This is a counter
RA/(RA+RB) = Relative rate of response for option A.
           The percentage of time option A is chosen from options A&B
rA       = Rate of reinforcement for option A.
           The percentage of time choosing option A has led to a good result.
rA/(rA+rB) = Relative rate of reinforcement for option A.
           Option A's success rate as a share of the summed success
             rates of options A and B.
b        = Response bias.
           A preference for a given option or outcome.
             b>1 means option A is favored beyond what its success
             rate alone would predict; b<1 means it is disfavored.
s        = Sensitivity.
           How much better an option must be before the agent switches to it.
             s=1 is strict matching; s<1 means an option must be clearly
             better than the other option(s) before it is strongly preferred.
           Switching cost is a major influence on sensitivity.
             In an FPS, switching cost would be the time spent
             unarmed while switching to the new weapon.

Game Example
------------
A wizard is 30 meters away from a group of orcs. He has three
third-level spells that are appropriate to use: flamestrike, iceblade
and stonestorm. Question: which spell should the wizard cast?

Assume that the wizard has successfully hit his enemies 3/10 times
with flamestrike, 2/4 times with iceblade and 6/7 times with
stonestorm.
  r(flamestrike) = 3/10 = 0.3  (30%)
  r(iceblade)    = 2/4  = 0.5  (50%)
  r(stonestorm)  = 6/7  = 0.86 (86%)

  relative r(flamestrike) = .3/(.3+.5+.86)  = .3/1.66  = 0.18 (18%)
  relative r(iceblade)    = .5/(.3+.5+.86)  = .5/1.66  = 0.30 (30%)
  relative r(stonestorm)  = .86/(.3+.5+.86) = .86/1.66 = 0.52 (52%)

So, under the matching law, the wizard would cast stonestorm 52% of the
time, iceblade 30% of the time and flamestrike 18% of the time.
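The same arithmetic as a small Python sketch. It uses exact fractions
(6/7 rather than the rounded 0.86), which gives the same two-digit
percentages; the dictionary layout and the function name are only
illustrative.

```python
from fractions import Fraction

# Hits / casts for each spell, as given in the example.
history = {
    "flamestrike": (3, 10),
    "iceblade": (2, 4),
    "stonestorm": (6, 7),
}


def relative_success(history):
    """Each option's success rate divided by the sum of all success rates."""
    rates = {name: Fraction(hits, casts) for name, (hits, casts) in history.items()}
    total = sum(rates.values())
    return {name: r / total for name, r in rates.items()}


if __name__ == "__main__":
    for name, p in relative_success(history).items():
        print(f"{name:12s} {float(p):.2f}")
```

Running it prints 0.18, 0.30 and 0.52 for the three spells.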


Personality. To see how this affects the personality of the character,
assume we have an NPC named Pyro the Flame Wizard. The NPC has no
special advantage with fire; he just likes how pretty it is. Pyro's
bias for fire-based spells is 1.25 (a 25% bias).
  relative r(flamestrike) = 1.25(.3/(.3+.5+.86)) = 1.25(.3/1.66) = 0.23 (23%)
Note that if the other options (iceblade and stonestorm) do not have
their biases lowered accordingly, the percentages (.23, .30, .52) add
up to 105%, not 100%. Since choice percentages must sum to 100%, all
the numbers need to be normalized. We do this by dividing each result
by the total (1.05):
  relative r(flamestrike) = 0.23/1.05 = 0.22 (22%)
  relative r(iceblade)    = 0.30/1.05 = 0.29 (29%)
  relative r(stonestorm)  = 0.52/1.05 = 0.50 (50%)
These add up to 101% only because of rounding; unrounded, they add up
to exactly 100%.
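A Python sketch of the bias-then-normalize step. Because normalization
divides out any common factor, applying the bias to the raw success
rates (as here) gives the same result as applying it to the relative
rates as above. The exponent s is an assumption of this sketch: raising
each rate to s is one way to extend Baum's sensitivity term to more
than two options, and for two options it reduces to his equation with
b = bA/bB. The rounded rate 0.86 is used so the output matches the
figures above.

```python
def choice_probabilities(rates, biases=None, s=1.0):
    """Return P(option) proportional to bias * rate**s, normalized to sum to 1.

    rates  -- success rate per option, e.g. {"iceblade": 0.5}
    biases -- optional per-option bias; options not listed get b = 1
    s      -- sensitivity exponent (assumed to be shared by all options)
    """
    biases = biases or {}
    weights = {name: biases.get(name, 1.0) * r ** s for name, r in rates.items()}
    total = sum(weights.values())  # normalizing constant so results sum to 1
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    rates = {"flamestrike": 0.3, "iceblade": 0.5, "stonestorm": 0.86}
    pyro = choice_probabilities(rates, biases={"flamestrike": 1.25})
    for name, p in pyro.items():
        print(f"{name:12s} {p:.2f}")
```

Running it prints 0.22, 0.29 and 0.50, the normalized figures for Pyro.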

Note that, even with the bias, Pyro the Flame Wizard still casts
flamestrike only 22% of the time. This is because he'd be suicidal or
a fool to cast it more often given his poor track record with the
spell.

Learning. Action selection is driven by two counters per option,
numAttempted and numAttemptsSuccessful. Because the success rates are
recomputed from these counters after every attempt, the agents
constantly adapt their actions based on feedback and their history.
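A minimal Python sketch of such a learning agent. The counters are named
after the post's numAttempted and numAttemptsSuccessful; the class name,
the starting prior of one success in two attempts (so an untried option
is never stuck at zero) and the use of random.choices for sampling are
assumptions of this sketch, not part of the original technique.

```python
import random


class MatchingChooser:
    """Chooses among options in proportion to bias * success rate,
    where each success rate is learned from two per-option counters."""

    def __init__(self, options, biases=None, prior=(1, 2)):
        self.biases = biases or {}
        # Hypothetical prior: every option starts at 1 success in 2 attempts.
        # prior hits must be > 0, or an option can get a zero rate forever.
        hits, tries = prior
        self.num_attempts_successful = {o: hits for o in options}
        self.num_attempted = {o: tries for o in options}

    def probabilities(self):
        """Matching-law probabilities from the current counters."""
        weights = {
            o: self.biases.get(o, 1.0)
            * self.num_attempts_successful[o] / self.num_attempted[o]
            for o in self.num_attempted
        }
        total = sum(weights.values())
        return {o: w / total for o, w in weights.items()}

    def choose(self, rng=random):
        """Sample one option according to probabilities()."""
        probs = self.probabilities()
        options = list(probs)
        return rng.choices(options, weights=[probs[o] for o in options], k=1)[0]

    def record(self, option, success):
        """Update the counters after the outcome of an attempt is known."""
        self.num_attempted[option] += 1
        if success:
            self.num_attempts_successful[option] += 1


if __name__ == "__main__":
    rng = random.Random(42)  # seeded for repeatable runs
    wizard = MatchingChooser(["flamestrike", "iceblade", "stonestorm"])
    for _ in range(20):
        spell = wizard.choose(rng)
        # Hypothetical outcome model: stonestorm hits more often than the others.
        hit = rng.random() < (0.85 if spell == "stonestorm" else 0.4)
        wizard.record(spell, hit)
    print(wizard.probabilities())
```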


Limitations
-----------
- Learning via the matching law is slow: success rates are averaged
  over the agent's whole history, so they do not smoothly accommodate
  rapid changes in skill, such as happen in games when leveling up or
  acquiring a skill-enhancing weapon.
- As stated, it does not accommodate learning by observation (e.g.,
  switching to a sniper rifle because the winner of the last four
  rounds used the sniper rifle).
- Modifications must be made to accommodate context-specific decisions
  (e.g., tracking numAttempts per enemy or per enemy type rather than
  at a global level).
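The third limitation can be addressed by keying the counters on context
as well as option. A minimal Python sketch, assuming the context is an
enemy type; the class name, the enemy-type strings and the one-in-two
starting prior are hypothetical. It addresses only that limitation, not
the slow learning or the lack of observational learning.

```python
from collections import defaultdict


class ContextualCounters:
    """Success counters keyed by (context, option) instead of option alone."""

    def __init__(self, prior_hits=1, prior_tries=2):
        # Unseen (context, option) pairs start from the hypothetical prior.
        self.hits = defaultdict(lambda: prior_hits)
        self.tries = defaultdict(lambda: prior_tries)

    def record(self, context, option, success):
        key = (context, option)
        self.tries[key] += 1
        if success:
            self.hits[key] += 1

    def probabilities(self, context, options):
        """Matching-law probabilities using only this context's history."""
        rates = {o: self.hits[(context, o)] / self.tries[(context, o)] for o in options}
        total = sum(rates.values())
        return {o: r / total for o, r in rates.items()}


if __name__ == "__main__":
    counters = ContextualCounters()
    for _ in range(5):
        counters.record("troll", "flamestrike", success=True)
        counters.record("armored_orc", "flamestrike", success=False)
    spells = ["flamestrike", "iceblade"]
    print(counters.probabilities("troll", spells))
    print(counters.probabilities("armored_orc", spells))
```

Flamestrike ends up favored against trolls and disfavored against
armored orcs, even though it is the same spell.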

Notes
-----
- Bias and sensitivity are not learned; they are hard-coded by the
  designer.
- Sensitivity should be situation-specific. Sensitivity to switching
  weapons (armor, etc.) should be normal (~1) in safe spots and lower
  when surrounded by enemies. The sensitivity of the currently equipped
  item should reflect the fact that there is no switching cost (i.e.,
  s=1).


