
"baylor" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> "Bryan" <[EMAIL PROTECTED]> wrote:
> > Totally agree. The wizard game is an "n-armed bandit problem", where the
> > analogy is in putting your money in a set of slot machines where you must
> > learn the payoff over a sequence of tries. Perhaps read:
>
> Um, not exactly...
>
> Well, sorta. Here's the deal for people who aren't PhDs in AI. AI is
> an engineering discipline mostly concerned with finding the one true
> answer to a finite, non-changing problem using math. The goal in AI is
> to find perfect answers
>
> Psychology is a science that seeks to explain how things really work.
> For humans and anyone wanting to simulate humans (ie, games), the
> perfect answer isn't always the best answer and often isn't
> achievable. And recent studies have shown that, in real world
> settings, human decision making processes are often more successful
> than engineering solutions like GAs, ANNs, DTs, linear multiple
> regression, etc.
>
> So given an n-armed bandit (a slot machine), infinite time, infinite
> resources, no penalty for play and no change in anything, you could
> get accurate probability distributions using Q-learning or, if you
> have a model, TDL or ADP
>
> However, in the real world, you have limits - you don't have that much
> money, you can't use all the machines (lest you get evicted from the
> casino), the machines are moved weekly and some of the networked ones
> change probabilities every few hours. So in real world settings for
> common tasks, the AI approach wouldn't work. But if you applied an RL
> to a domain you had full control over and which was small/simple
> enough or given infinite resources, you could use it
>
> Back to the point, i'm assuming the point of a game is to have an NPC
> wizard act like another human. If the wizard has an "AI" brain, he'll
> generate very artificial behaviors which make it obvious that, while
> it might look like a human, it's really just another damn piece of
> software. Whereas if you use the psychology stuff (study of how humans
> work, not how to get the "best" answers), the NPC could be as "stupid"
> as the player or opponents you might find online
>
> So Brian is right that this could be dealt with using AI (the field)
> techniques, but that wouldn't result in AI (software acting human)
>
> -b

The point you made about the problem dynamically changing is a good one, and I think it explains well why Falk and I saw the matching law as inconsistent with the strategy of reducing exploration and sticking to the best option as time goes on.

It would be nice if there were results for n-armed bandit strategies where the payoff probabilities vary over time instead of being fixed. (I suppose this is what the basketball players are up to when they encounter different games with different opponents.) In this situation, the amount of exploration should level off at a plateau determined by the rate of change of the environment. I wonder if these results would be consistent with the matching law? If so, we would have both AI researchers and experimental psychologists converging on the same law.

Being no expert, I don't have an answer to this. While there doesn't seem to be a provably optimal solution for the static version of the problem, one popular strategy is to use a Boltzmann distribution (http://www-anw.cs.umass.edu/~rich/book/2/node4.html) over the actions, also known as softmax action selection. The formula has a temperature parameter that can be used to vary between random selection of actions (high temperature) and selection of the current best action (low temperature). In the dynamic problem, one would expect this temperature to be controlled so as to start high and converge on a plateau somewhere. The Boltzmann formula "looks" somewhat similar to the matching law, but it is not the same. Whether there are experimental comparisons out there, who knows.
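
To make the comparison concrete, here is a rough Python sketch of the two rules side by side. It is only an illustration under my own assumptions: the function names, the example payoffs, and the decay schedule in `temperature_at` are made up for this post and are not taken from the book or from any experiment.

```python
import math

def boltzmann_probs(values, temperature):
    """Boltzmann (softmax) selection probabilities over estimated action values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(values)  # subtracting the max avoids overflow; the result is unchanged
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def matching_probs(rewards):
    """Matching law: choose each option in proportion to the reward it has yielded."""
    total = sum(rewards)
    return [r / total for r in rewards]

def temperature_at(step, start=5.0, floor=0.2, decay=0.01):
    """Hypothetical schedule: decays from `start` toward a plateau at `floor`."""
    return floor + (start - floor) * math.exp(-decay * step)

# Assumed payoff estimates for three slot machines (illustrative numbers only).
payoffs = [0.2, 0.5, 0.8]

hot = boltzmann_probs(payoffs, temperature=100.0)   # close to uniform choice
cold = boltzmann_probs(payoffs, temperature=0.01)   # close to always picking the best arm
matched = matching_probs(payoffs)                   # linear in the payoffs

# Softmax over log-payoffs at temperature 1 reproduces the matching proportions.
log_softmax = boltzmann_probs([math.log(p) for p in payoffs], temperature=1.0)
```

The difference shows up in the arithmetic: the matching law is linear in the payoffs, while the Boltzmann rule is exponential in the values, so they generally disagree. They do coincide in one special case, when the values fed to the Boltzmann formula are the logarithms of the payoffs and the temperature is 1. Whether that case has anything to do with what animals or players actually do is a separate question that this sketch does not address.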