Usenet.com


Comp Thread Archive from Usenet.com


Re: Bag of Tricks: Choosing actions via the matching law



"baylor" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> "Bryan" <[EMAIL PROTECTED]> wrote:
> > Totally agree. The wizard game is an "n-armed bandit problem", where the
> > analogy is in putting your money in a set of slot machines where you must
> > learn the payoff over a sequence of tries. Perhaps read:
>
> Um, not exactly...
>
> Well, sorta. Here's the deal for people who aren't PhDs in AI. AI is
> an engineering discipline mostly concerned with finding the one true
> answer to a finite, non-changing problem using math. The goal in AI is
> to find perfect answers
>
> Psychology is a science that seeks to explain how things really work.
> For humans and anyone wanting to simulate humans (ie, games), the
> perfect answer isn't always the best answer and often isn't
> achievable. And recent studies have shown that, in real world
> settings, human decision making processes are often more successful
> than engineering solutions like GAs, ANNs, DTs, linear multiple
> regression, etc.
>
> So given an n-armed bandit (a slot machine), infinite time, infinite
> resources, no penalty for play and no change in anything, you could
> get accurate probability distributions using Q-learning or, if you
> have a model, TDL or ADP
>
> However, in the real world, you have limits - you don't have that much
> money, you can't use all the machines (lest you get evicted from the
> casino), the machines are moved weekly and some of the networked ones
> change probabilities every few hours. So in real world settings for
> common tasks, the AI approach wouldn't work. But if you applied an RL
> to a domain you had full control over and which was small/simple
> enough or given infinite resources, you could use it
>
> Back to the point, i'm assuming the point of a game is to have an NPC
> wizard act like another human. If the wizard has an "AI" brain, he'll
> generate very artificial behaviors which make it obvious that, while
> it might look like a human, it's really just another damn piece of
> software. Whereas if you use the psychology stuff (study of how humans
> work, not how to get the "best" answers), the NPC could be as "stupid"
> as the player or opponents you might find online
>
> So Bryan is right that this could be dealt with using AI (the field)
> techniques, but that wouldn't result in AI (software acting human)
>
> -b

The point you made about the problem changing dynamically is a good one, and
I think it explains why Falk and I saw the matching law as inconsistent with
the strategy of reducing exploration and sticking to the best option as time
goes on. It would be nice if there were results for n-armed bandit strategies
where the payoff probabilities vary over time instead of being fixed. (I
suppose this is what the basketball players are up to when they encounter
different games with different opponents.) In that situation, the amount of
exploration should settle at a plateau determined by the rate of change of
the environment rather than decaying to zero. I wonder whether such results
would be consistent with the matching law? If so, we would have both AI
researchers and experimental psychologists converging on the same law.
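
For anyone who wants to try the time-varying case, here is a minimal sketch
of one way to set it up. Everything in it is an assumption made for
illustration: the payoff probabilities follow a clipped random walk, the
agent is epsilon-greedy with a fixed epsilon (a stand-in for exploration
that has already reached its plateau), and estimates use a constant step
size so that recent payoffs count more. The name drifting_bandit and all of
its parameters are hypothetical.

```python
import random

def drifting_bandit(n_arms=3, steps=5000, drift=0.01,
                    epsilon=0.1, alpha=0.1, seed=0):
    """Epsilon-greedy play on a bandit whose payoff probabilities drift.

    Returns (counts, total): how often each arm was pulled and the total
    reward collected. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    probs = [rng.random() for _ in range(n_arms)]  # true payoff probabilities
    values = [0.0] * n_arms                        # agent's running estimates
    counts = [0] * n_arms
    total = 0
    for _ in range(steps):
        # Explore with fixed probability epsilon, otherwise pick the arm
        # with the highest current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < probs[arm] else 0
        # Constant step size: old payoffs are gradually forgotten.
        values[arm] += alpha * (reward - values[arm])
        counts[arm] += 1
        total += reward
        # The environment changes: each probability takes a small random
        # step and is clipped to stay in [0, 1].
        probs = [min(1.0, max(0.0, p + rng.gauss(0.0, drift))) for p in probs]
    return counts, total

if __name__ == "__main__":
    counts, total = drifting_bandit()
    print("pulls per arm:", counts)
    print("total reward:", total)
```

Rerunning with different values of drift and epsilon is one rough way to
see how much exploration a given rate of change seems to call for, and the
pull counts can then be compared against the payoffs each arm delivered.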

Being no expert, I don't have an answer to this. While there doesn't seem to
be a provably optimal solution for the static version of the problem, one
popular strategy is to use a Boltzmann distribution
(http://www-anw.cs.umass.edu/~rich/book/2/node4.html) over the actions. This
formula has a temperature parameter that can be used to vary between random
selection of actions (high temperature) and selection of the current best
action (low temperature). In the dynamic problem, one would expect the
temperature to be controlled so that it starts high and settles on a plateau
somewhere. The Boltzmann formula "looks" somewhat similar to the matching
law, but it is not the same: it weights each action by the exponential of
its estimated value, whereas the matching law makes choice proportions
directly proportional to the payoffs obtained. Whether there are
experimental comparisons out there, I don't know.
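
To make the comparison concrete, here is a small sketch of the two rules
side by side. The function names are hypothetical, and the matching-law
version assumes that an action's estimated payoff can stand in for its
reinforcement rate and that payoffs are non-negative.

```python
import math

def boltzmann_probs(values, temperature):
    """Boltzmann (softmax) action probabilities.

    High temperature gives near-uniform choice; low temperature
    concentrates on the current best action.
    """
    best = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - best) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def matching_probs(values):
    """Matching-law probabilities: choice share equals payoff share.

    Assumes non-negative payoffs; falls back to uniform if all are zero.
    """
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

if __name__ == "__main__":
    payoffs = [1.0, 3.0]
    print("matching:        ", matching_probs(payoffs))
    for t in (0.1, 1.0, 10.0):
        print("boltzmann T=%-5s" % t, boltzmann_probs(payoffs, t))
```

One thing the sketch makes visible: if the Boltzmann rule is applied to the
logarithm of the payoffs with temperature 1, it produces exactly the
matching-law proportions, so the two differ mainly in how payoffs are
scaled before being normalised.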










