Usenet.com

Comp Thread Archive from Usenet.com

Re: Bag of Tricks: Choosing actions via the matching law



It would be really nice to verify this and other strategies in a
simulated game of some sort on a common dataset. Or much better, might
there be a proof that says this is the optimal strategy for the wizard
game?
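
One way to try this is a small simulation. The sketch below is not from the quoted post: it assumes each spell has a fixed, independent hit probability (taken from the wizard example's observed rates), and the names `simulate`, `matching_policy` and `best_spell_policy` are made up for illustration. It compares matching against a baseline that always casts the spell with the highest true hit probability.

```python
import random

# Assumption: fixed "true" hit probabilities, borrowed from the wizard example.
TRUE_HIT_PROB = {"flamestrike": 0.3, "iceblade": 0.5, "stonestorm": 0.86}


def simulate(policy, rounds=10_000, seed=0):
    """Return the fraction of casts that hit when spells are picked by `policy`."""
    rng = random.Random(seed)
    # Start every counter at 1 attempt / 1 success so no rate is 0/0 or 0.
    attempts = {spell: 1 for spell in TRUE_HIT_PROB}
    successes = {spell: 1 for spell in TRUE_HIT_PROB}
    hits = 0
    for _ in range(rounds):
        rates = {s: successes[s] / attempts[s] for s in TRUE_HIT_PROB}
        spell = policy(rates, rng)
        attempts[spell] += 1
        if rng.random() < TRUE_HIT_PROB[spell]:
            successes[spell] += 1
            hits += 1
    return hits / rounds


def matching_policy(rates, rng):
    """Pick a spell with probability proportional to its observed success rate."""
    spells = list(rates)
    return rng.choices(spells, weights=[rates[s] for s in spells])[0]


def best_spell_policy(rates, rng):
    """Oracle baseline: always cast the spell with the highest true hit probability."""
    return max(TRUE_HIT_PROB, key=TRUE_HIT_PROB.get)


if __name__ == "__main__":
    print("matching  :", simulate(matching_policy))
    print("best spell:", simulate(best_spell_policy))
```

Under these assumptions the baseline hits about 86% of the time, while matching settles near sum(p^2)/sum(p), roughly 65%. So if hit probabilities never change and hits are all that matter, matching is not the hit-maximizing strategy. Its appeal is believable, varied behavior rather than optimal play.
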

[EMAIL PROTECTED] (baylor) wrote in message news:<[EMAIL PROTECTED]>...
> Problem Area: action selection
>      AI Type: decision making, learning (secondary), personality (secondary)
> Detail Level: mid-level
>    Technique: matching law
>  Assumptions: Options are relatively equal
> Example Uses: sports: choosing a shot type
>               FPS: choosing a weapon
>               RPG: choosing a spell
>               RTS: choosing a build unit type
> 
> Explanation
> -----------
> In 1961, psychologist Herrnstein discovered that, when an animal is
> presented with two options, the relative rate of responding (choosing)
> of a given option is equal to the relative rate of reinforcement
> (success) of that option. This was called the matching law and written
> as RA/(RA+RB) = rA/(rA+rB). In 1974, psychologist Baum generalized the
> formula by adding the notions of bias and sensitivity, resulting in the
> new equation RA/RB = b(rA/rB)^s. The matching law has been repeatedly
> upheld in human and non-human animal studies.
> 
> In 2000, Vollmer and Bourret studied the shot selection of 26 college
> basketball players. They found that how often a player attempted a
> three-point shot or a two-point shot was proportional to the relative
> rates of reinforcement (success rates) for each. If a player
> successfully completed 10% more two-point shots than three-point
> shots, the player would shoot 10% more two-point shots than
> three-point shots.
> 
> 
> Variables
> ---------
> RA/(RA+RB) = rA/(rA+rB) = matching law
> RA/RB = b(rA/rB)^s      = generalized matching law (bias and sensitivity)
> RA         = Rate of response for option A.
>              How often option A is chosen.
>              This is a counter.
> RA/(RA+RB) = Relative rate of response for option A.
>              The percentage of time option A is chosen from options A and B.
> rA         = Rate of reinforcement for option A.
>              The percentage of time choosing option A has led to a good
>              result.
> rA/(rA+rB) = Relative rate of reinforcement for option A.
>              The percentage of overall successes that have come from
>                choosing option A.
> b          = Response bias.
>              A preference for a given option or outcome.
>                b>1 means option A is preferred; b<1 means option B is.
> s          = Sensitivity.
>              How much better an option must be to switch to it.
>                s<1 means choices shift less than proportionally, so an
>                option must be much better than the other option(s) to
>                win a large share of choices.
>              A major cause of reduced sensitivity is switching cost.
>                In an FPS, switching cost would be the time spent
>                unarmed while switching to the new weapon.
> 
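
The variables above can be tied together in a few lines. This is a minimal sketch of the generalized form RA/RB = b(rA/rB)^s for two options; the function names are hypothetical and not from the post.

```python
def generalized_matching_ratio(r_a, r_b, bias=1.0, sensitivity=1.0):
    """Return RA/RB = b * (rA/rB)^s for two options A and B."""
    return bias * (r_a / r_b) ** sensitivity


def ratio_to_share(ratio):
    """Convert the ratio RA/RB into the relative rate RA/(RA+RB)."""
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Option A succeeds 60% of the time, option B 30% of the time.
    for b, s in [(1.0, 1.0), (1.25, 1.0), (1.0, 0.5)]:
        share = ratio_to_share(generalized_matching_ratio(0.6, 0.3, b, s))
        print(f"b={b}, s={s}: choose A {share:.0%} of the time")
```

With b=1 and s=1 this reduces to the plain matching law. Raising b shifts choices toward A, and lowering s pulls the split back toward 50/50.
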
> Game Example
> ------------
> A wizard is 30 meters away from a group of orcs. He has three
> third-level spells that are appropriate to use: flamestrike, iceblade
> and stonestorm. Question: which spell should the wizard cast?
> 
> Assume that the wizard has successfully hit his enemies 3/10 times
> with flamestrike, 2/4 times with iceblade and 6/7 times with
> stonestorm.
>   r(flamestrike) = 3/10 = 0.3  (30%)
>   r(iceblade)    = 2/4  = 0.5  (50%)
>   r(stonestorm)  = 6/7  = 0.86 (86%)
> 
>   relative r(flamestrike) = .3/(.3+.5+.86)  = .3/1.66  = 0.18 (18%)
>   relative r(iceblade)    = .5/(.3+.5+.86)  = .5/1.66  = 0.30 (30%)
>   relative r(stonestorm)  = .86/(.3+.5+.86) = .86/1.66 = 0.52 (52%)
> 
> So the wizard would cast stonestorm 52% of the time, iceblade 30% of
> the time and flamestrike 18% of the time.
> 
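
The wizard example can be sketched in code as follows. The hit counts come from the post; the helper names `relative_rates` and `choose_spell` are made up for illustration, and `random.choices` does the weighted pick.

```python
import random

# Observed success rates from the example: hits / attempts.
hit_rates = {"flamestrike": 3 / 10, "iceblade": 2 / 4, "stonestorm": 6 / 7}


def relative_rates(rates):
    """Divide each success rate by the sum of all rates (the matching law)."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


def choose_spell(rates, rng=random):
    """Pick one spell with probability equal to its relative success rate."""
    names = list(rates)
    return rng.choices(names, weights=[rates[n] for n in names])[0]


probs = relative_rates(hit_rates)

if __name__ == "__main__":
    for name, p in probs.items():
        print(f"{name}: {p:.0%}")
    print("cast:", choose_spell(hit_rates))
```

`random.choices` normalizes the weights itself, so the raw success rates can be passed directly; `relative_rates` is only needed to display the percentages.
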
> 
> Personality. To see how this affects the personality of the character,
> assume we have an NPC named Pyro the Flame Wizard. The NPC has no
> special advantages with fire, he just likes how pretty it is. Pyro's
> bias for fire-based spells is 1.25 (a 25% bias).
>   relative r(flamestrike) = 1.25(.3/(.3+.5+.86)) = 1.25(.3/1.66) = 0.23 (23%)
> Note that if the other options (iceblade and stonestorm) do not have
> their biases downgraded accordingly, the percentages (.23, .30, .52)
> will add up to 105%, not 100%. Given that this is, by definition,
> impossible, all numbers need to be normalized. We do this by
> dividing each result by the total (1.05).
>   relative r(flamestrike) = 0.23/1.05 = 0.22 (22%)
>   relative r(iceblade)    = 0.30/1.05 = 0.29 (29%)
>   relative r(stonestorm)  = 0.52/1.05 = 0.50 (50%)
> The numbers here add up to 101% because I did a lot of rounding, but if
> you don't round they add up to 100%.
> 
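
The bias-then-normalize step can be sketched like this. The function name `biased_probabilities` is hypothetical, and any spell without an explicit bias is assumed to have b=1.

```python
def biased_probabilities(rates, biases):
    """Multiply each success rate by its bias, then normalize to sum to 1."""
    weighted = {name: biases.get(name, 1.0) * rate for name, rate in rates.items()}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


hit_rates = {"flamestrike": 3 / 10, "iceblade": 2 / 4, "stonestorm": 6 / 7}
pyro = biased_probabilities(hit_rates, {"flamestrike": 1.25})

if __name__ == "__main__":
    for name, p in pyro.items():
        print(f"{name}: {p:.0%}")
```

Working from unrounded values gives about 22%, 29% and 49%, which sums to 100% as the post predicts. Applying the bias to the raw rate or to the relative rate gives the same result once everything is normalized.
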
> Note that, even with the bias, Pyro the Flame Wizard still only casts
> flamestrike 22% of the time. This is because he'd be suicidal or a
> fool to cast it more than that given his poor track record with the
> spell.
> 
> Learning. Action selection is driven by two counters per action,
> numAttempted and numAttemptsSuccessful. Because the success rates are
> recomputed from these counters, the agents constantly adapt their
> actions based on feedback and their history.
> 
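
A minimal sketch of the two counters, using the post's names numAttempted and numAttemptsSuccessful. Everything else here is an assumption for illustration: the `ActionStats` class, the neutral 0.5 rate for untried actions, and the small weight floor that keeps an action with no successes from being dropped forever.

```python
import random
from dataclasses import dataclass


@dataclass
class ActionStats:
    numAttempted: int = 0
    numAttemptsSuccessful: int = 0

    def record(self, success: bool) -> None:
        """Update both counters after an action has been tried."""
        self.numAttempted += 1
        if success:
            self.numAttemptsSuccessful += 1

    def rate(self, prior: float = 0.5) -> float:
        """Success rate so far; untried actions get an assumed neutral prior."""
        if self.numAttempted == 0:
            return prior
        return self.numAttemptsSuccessful / self.numAttempted


def choose_action(stats, rng=random, floor=0.01):
    """Weighted pick by success rate; `floor` is an assumed minimum weight."""
    names = list(stats)
    weights = [max(stats[n].rate(), floor) for n in names]
    return rng.choices(names, weights=weights)[0]


if __name__ == "__main__":
    stats = {
        "flamestrike": ActionStats(10, 3),
        "iceblade": ActionStats(4, 2),
        "stonestorm": ActionStats(7, 6),
    }
    spell = choose_action(stats)
    stats[spell].record(success=True)  # feedback shifts future choices
    print(spell, stats[spell])
```
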
> 
> Limitations
> -----------
> - Learning via the matching law is slow. It does not smoothly accommodate
>   rapid changes in skill, such as happen in games when leveling up
>   or acquiring a skill-enhancing weapon.
> - As stated, it does not accommodate learning by observation (e.g.,
>   switching to a sniper rifle because the winner of the last four
>   rounds used the sniper rifle).
> - Modifications must be made to accommodate context-specific decisions
>   (e.g., tracking numAttempts at a per-enemy or per-enemyType level
>   rather than at a global level).
> 
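
The last limitation could be handled by keying the counters on context as well as action. This is only a sketch: the `ContextualStats` class, the string context labels and the fallback rate for unseen pairs are all assumptions, not part of the post.

```python
from collections import defaultdict


class ContextualStats:
    """Track attempts and successes per (context, action) pair."""

    def __init__(self):
        # (context, action) -> [numAttempted, numAttemptsSuccessful]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, context, action, success):
        entry = self._counts[(context, action)]
        entry[0] += 1
        entry[1] += int(success)

    def rate(self, context, action, fallback=0.5):
        """Success rate in this context; unseen pairs get an assumed fallback."""
        attempted, successful = self._counts.get((context, action), (0, 0))
        return successful / attempted if attempted else fallback


if __name__ == "__main__":
    stats = ContextualStats()
    stats.record("orc", "flamestrike", True)
    stats.record("orc", "flamestrike", False)
    stats.record("troll", "flamestrike", False)
    print(stats.rate("orc", "flamestrike"), stats.rate("troll", "flamestrike"))
```

Splitting the counters this way makes learning even slower, since each context starts from zero, so a coarse key such as enemy type is probably a safer starting point than per-enemy tracking.
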
> Notes
> -----
> - Bias and sensitivity are not learned; they are hard-coded by the
>   designer.
> - Sensitivity should be situation-specific. Sensitivity to switching
>   weapons (armor, etc.) should be normal (~1) in safe spots and should
>   be lower when surrounded by enemies. The sensitivity of the current
>   item should reflect the fact that there is no switching cost (i.e.,
>   s=1).
