
It would be really nice to verify this and other strategies in a simulated game of some sort on a common dataset. Or, much better, might there be a proof that this is the optimal strategy for the wizard game?

[EMAIL PROTECTED] (baylor) wrote in message news:<[EMAIL PROTECTED]>...
> Problem Area: action selection
> AI Type: decision making, learning (secondary), personality (secondary)
> Detail Level: mid-level
> Technique: matching law
> Assumptions: Options are relatively equal
> Example Uses: sports: choosing a shot type
>               FPS: choosing a weapon
>               RPG: choosing a spell
>               RTS: choosing a build unit type
>
> Explanation
> -----------
> In 1961, psychologist Herrnstein discovered that, when an animal is
> presented with two options, the relative rate of responding (choosing)
> of a given option is equal to the relative rate of reinforcement
> (success) of that option. This was called the matching law and written
> as RA/(RA+RB) = rA/(rA+rB). In 1974, psychologist Baum refined the
> formula by adding the notions of bias and sensitivity, resulting in the
> new equation RA/RB = b(rA/rB)^s. The matching law has been repeatedly
> upheld in human and non-human animal studies
>
> In 2000, Vollmer and Bourret studied the shot selection of 26 college
> basketball players. They found that how often a player attempted a
> three-point shot or a two-point shot was proportional to the relative
> rates of reinforcement (success rates) for each. If a player
> successfully completed 10% more two-point shots than three-point
> shots, the player would shoot 10% more two-point shots than
> three-point shots
>
>
> Variables
> ---------
> RA/(RA+RB) = b(rA/(rA+rB))^s = matching law
> RA         = Rate of response for option A.
>              How often option A is chosen.
>              This is a counter
> RA/(RA+RB) = Relative rate of response for option A.
>              The percentage of time option A is chosen from options A&B
> rA         = Rate of reinforcement for option A.
>              The percentage of time choosing option A has led to a good
>              result
> rA/(rA+rB) = Relative rate of reinforcement for option A.
>              The percentage of overall successes that have come from
>              choosing option A
> b          = Response bias.
>              A preference for a given option or outcome.
>              b>1 means the option is preferred
> s          = Sensitivity.
>              How much better an option must be to switch to it.
>              s<1 means it must be better than the other option(s).
>              A major cause for sensitivity is switching cost.
>              In an FPS, switching cost would be the time spent
>              unarmed while switching to the new weapon
>
> Game Example
> ------------
> A wizard is 30 meters away from a group of orcs. He has three third
> level spells that are appropriate to use - flamestrike, iceblade and
> stonestorm. Question: which spell should the wizard cast?
>
> Assume that the wizard has successfully hit his enemies 3/10 times
> with flamestrike, 2/4 times with iceblade and 6/7 times with
> stonestorm.
> r(flamestrike) = 3/10 = 0.3  (30%)
> r(iceblade)    = 2/4  = 0.5  (50%)
> r(stonestorm)  = 6/7  = 0.86 (86%)
>
> relative r(flamestrike) = 0.3/(0.3+0.5+0.86)  = 0.3/1.66  = 0.18 (18%)
> relative r(iceblade)    = 0.5/(0.3+0.5+0.86)  = 0.5/1.66  = 0.30 (30%)
> relative r(stonestorm)  = 0.86/(0.3+0.5+0.86) = 0.86/1.66 = 0.52 (52%)
>
> So the wizard would cast stonestorm 52% of the time, iceblade 30% of
> the time and flamestrike 18% of the time
>
>
> Personality. To see how this affects the personality of the character,
> assume we have an NPC named Pyro the Flame Wizard. The NPC has no
> special advantages with fire, he just likes how pretty it is. Pyro's
> bias for fire-based spells is 1.25 (25% bias).
> relative r(flamestrike) = 1.25(0.3/(0.3+0.5+0.86)) = 1.25(0.3/1.66)
>                         = 0.23 (23%)
> Note that if the other options (iceblade and stonestorm) do not have
> their biases downgraded accordingly, the percentages (.23, .30, .52)
> will add up to 105%, not 100%. Given that this is, by definition,
> impossible, all numbers would need to be normalized.
> We do this by dividing each result by the total (1.05)
> relative r(flamestrike) = 0.23/1.05 = 0.22 (22%)
> relative r(iceblade)    = 0.30/1.05 = 0.29 (29%)
> relative r(stonestorm)  = 0.52/1.05 = 0.50 (50%)
> The numbers here add up to 101% because I did a lot of rounding, but
> if you don't round they should add to 100%
>
> Note that, even with the bias, Pyro the Flame Wizard still only casts
> flamestrike 22% of the time. This is because he'd be suicidal or a
> fool to cast it more than that given his poor track record with the
> spell
>
> Learning. Action selection is driven by two counters, numAttempted and
> numAttemptsSuccessful. Because of that, the agents will constantly
> adapt their actions based on feedback and their history
>
>
> Limitations
> -----------
> - Learning via the matching law is slow. It does not smoothly
>   accommodate rapid changes in skill, such as happen in games when
>   leveling up or acquiring a skill-enhancing weapon
> - As stated, it does not accommodate learning by observation (i.e.,
>   switching to a sniper rifle because the winner of the last four
>   rounds used the sniper rifle)
> - Modifications must be made to accommodate context-specific decisions
>   (i.e., tracking numAttempts at a per-enemy or per-enemyType level
>   rather than at a global level)
>
> Notes
> -----
> - Bias and sensitivity are not learned, they are hard coded by the
>   designer
> - Sensitivity should be situation specific. Sensitivity to switching
>   weapons (armor, etc.) should be normal (~1) in safe spots and should
>   be lower when surrounded by enemies. The sensitivity of the current
>   item should reflect the fact that there is no switching cost (i.e.,
>   s=1)
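
As a starting point for the kind of simulated check asked for above, here is a minimal Python sketch of the quoted scheme. It is only one reading of the post, not baylor's code: the function names (matching_probabilities, choose, record) and the (successes, attempts) layout are made up for illustration, bias is treated as a per-option multiplier, and sensitivity is assumed to be a positive exponent applied to each success rate before normalizing.

```python
import random


def matching_probabilities(history, bias=None, sensitivity=1.0):
    """Return {option: probability} from {option: (successes, attempts)}.

    Assumption: each option's weight is bias * success_rate ** sensitivity,
    and the weights are then normalized so they sum to 1.
    """
    bias = bias or {}
    weights = {}
    for option, (successes, attempts) in history.items():
        # Options never tried get a rate of 0 (an illustrative choice).
        rate = successes / attempts if attempts else 0.0
        weights[option] = bias.get(option, 1.0) * rate ** sensitivity
    total = sum(weights.values())
    if total == 0:
        # No option has ever succeeded: fall back to a uniform choice.
        return {option: 1.0 / len(weights) for option in weights}
    return {option: w / total for option, w in weights.items()}


def choose(history, bias=None, sensitivity=1.0, rng=random):
    """Pick one option at random, weighted by its matching probability."""
    probs = matching_probabilities(history, bias, sensitivity)
    options = list(probs)
    return rng.choices(options, weights=[probs[o] for o in options], k=1)[0]


def record(history, option, success):
    """Update the two counters (successes, attempts) after an attempt."""
    successes, attempts = history[option]
    history[option] = (successes + int(success), attempts + 1)


if __name__ == "__main__":
    wizard = {"flamestrike": (3, 10), "iceblade": (2, 4), "stonestorm": (6, 7)}
    print(matching_probabilities(wizard))
    print(matching_probabilities(wizard, bias={"flamestrike": 1.25}))
    spell = choose(wizard)
    record(wizard, spell, success=True)
    print(spell, wizard[spell])
```

With the wizard's counters this gives roughly 18% / 30% / 52% without bias, and about 22% / 29% / 49% with Pyro's 1.25 bias on flamestrike, in line with the hand-computed figures up to rounding. Calling choose and record in a loop against some simulated hit chance per spell would be one way to compare this strategy with others on common data.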