Usenet.com

Comp Thread Archive from Usenet.com

Re: Expert System / Network Management



On Sun, 05 Oct 2003 02:09:14 GMT, Christoph Hahn <[EMAIL PROTECTED]>
wrote:
> I'm doing my master thesis in network management using an expert system. 
> I try to design a system that does somehow the following:
> - Scan the network
> - Find problems in the network (like Intrusion detection, wrong user 
> behavior, tech. problems) using an expert system.
> - Simulate the result in the network simulator (ns2).

Christoph, i'm a global architect for a big financial company and
could use a tool similar to what you're describing (both at the
network level and, much better, at the system/application level). So i
have the money and the need but haven't found much that does it well.

There are expensive tools to detect problems. You can Google for "RCA"
and "root cause analysis". BMC's Patrol has a rule-based expert system
to detect problems and CA's Unicenter has Neugents. For monitoring
tools, Mercury's Topaz has some kind of RCA tool; i think it's
Bayesian-based. BMC's R&D group also does a lot of work with people at
the University of Massachusetts (Lowell).

As for doing the above as a master's project, i highly doubt
you'll create what you're describing. It's too big. There are many
different problems and you'll need lots of different techniques to
catch them.

To use a really simple example, let's say you have a network going to
a Web server to a network to an app server to a network to a DBMS
server. An app stops working. Can you tell which component broke? The
answer is to monitor traffic on each segment, which isn't terribly AI,
although you could use production rules to run the investigation if
you really wanted to (by the way, a tool named Tonic does something
similar to this). This diagnostic module would kick off when alerted
by a monitoring process.
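
A minimal sketch of that segment-by-segment check, in Python. The hop
names and the traffic_seen dictionary are hypothetical stand-ins for
whatever the probes on each segment report; this is one way to walk
the path, not how Tonic or any other product does it.

```python
# Hypothetical hop names for the Web -> app -> DBMS path in the example.
PATH = ["client-net", "web-server", "web-app-net",
        "app-server", "app-db-net", "dbms-server"]


def first_broken_hop(path, traffic_seen):
    """Return the first hop where no traffic was seen, or None.

    traffic_seen maps hop name -> True if a probe saw traffic there.
    A hop with no probe result is treated as "no traffic seen".
    """
    for hop in path:
        if not traffic_seen.get(hop, False):
            return hop
    return None


def diagnose(path, traffic_seen):
    """Turn the probe results into a one-line suggestion."""
    hop = first_broken_hop(path, traffic_seen)
    if hop is None:
        return "traffic seen on every hop; look above the network layer"
    return "traffic stops at " + hop


if __name__ == "__main__":
    seen = {hop: True for hop in PATH}
    seen["app-db-net"] = False
    seen["dbms-server"] = False
    print(diagnose(PATH, seen))
```

The monitoring process would call diagnose() when it raises an alert,
passing in the latest probe results.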

There are plenty of low-level problems you'd have to know about. For
example, say that you see an Ethernet packet that you know is corrupt
because the frame CRC is wrong. The header is full of weird results.
In raw mode, they'd look something like 110110110 (i forget the exact
sequence; it's been a long time since i was allowed to get my hands
dirty). That happens to be the leading signature that comes before the
actual header. What does it mean to see that in a packet? It means a
late collision. And how would you get a late collision? Two ways:
your Ethernet segments are more than 990 feet long (which means the
bits can't travel from one end to the other and back again in the time
it takes to broadcast all the header bits) and/or you have a bad
transceiver (it can't detect collisions, which a CSMA/CD system
depends on). Putting it all together, you can take the information
and run it through a rule base to suggest a diagnosis, which in this
case requires equipment changes.
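
The late-collision reasoning can be sketched as a tiny forward-chaining
rule base in Python. The fact names (crc_bad, preamble_in_frame,
segment_over_limit and so on) are made up for illustration; a real
system would derive them from captured frames and from the cabling
records.

```python
# Each rule: (set of facts that must all be known, fact to conclude).
# Fact names are hypothetical labels, not fields from any real tool.
RULES = [
    ({"crc_bad", "preamble_in_frame"}, "late_collision"),
    ({"late_collision", "segment_over_limit"}, "shorten_segment"),
    ({"late_collision"}, "check_transceiver"),
]


def diagnose_frame(observed, rules=RULES):
    """Forward-chain over the rules until nothing new can be concluded.

    Returns only the facts that were derived, not the observations.
    """
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(observed)


if __name__ == "__main__":
    print(sorted(diagnose_frame({"crc_bad", "preamble_in_frame"})))
```

Because the two causes are "and/or", the transceiver check is suggested
for every late collision and the segment-length fix is added only when
the segment is known to be over the limit.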

But what about system crackers or bad employees? These people execute
perfectly legal commands, so how are you going to detect them? The
answer is that you need a way to make a profile of people's actions.
Sorta like the concept of Zipfian distributions in IR - some actions
are common, some are rare and some are common or rare given a certain
context. Testing a port is common but testing a port 200 times in 30
seconds is not. If certain actions are happening far more often than
is normal (and part of profiling is to determine what the proper
distribution and frequency of certain actions are), flag it for further
review. And if a user accumulates lots of flags, start paging people.
Another option is to use a Markov chain. That's something that says
"if X happens, normally Y happens n% of the time, Z happens n2% of the
time and A happens n3% of the time". String them together like n-grams
and you can build a profile of a given user's behavior. That is a
very different approach from the ones described above.
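
Both profiling ideas can be sketched in a few lines of Python. This is
an illustration under assumptions, not a tested intrusion detector: the
action names, the min_prob cutoff and the window sizes are all
hypothetical, and a first-order chain (bigrams) is the simplest case of
the n-gram idea.

```python
from collections import Counter, defaultdict


def transition_probs(actions):
    """Estimate P(next action | current action) from one user's history."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(actions, actions[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for cur, c in counts.items()}


def unusual_steps(profile, session, min_prob=0.05):
    """Return transitions in session that the profile rates below min_prob.

    Transitions never seen in the history count as probability 0.
    """
    return [(cur, nxt) for cur, nxt in zip(session, session[1:])
            if profile.get(cur, {}).get(nxt, 0.0) < min_prob]


def too_frequent(timestamps, max_events, window_seconds):
    """True if more than max_events fall within any window_seconds span."""
    ts = sorted(timestamps)
    start = 0
    for end, t in enumerate(ts):
        while t - ts[start] > window_seconds:
            start += 1
        if end - start + 1 > max_events:
            return True
    return False


if __name__ == "__main__":
    history = ["login", "ls", "cd", "ls", "cd", "ls", "logout"]
    profile = transition_probs(history)
    print(unusual_steps(profile, ["login", "ls", "su"]))
    # 200 port tests spread over just under 30 seconds.
    print(too_frequent([i * 0.15 for i in range(200)], 50, 30))
```

Each flagged transition or rate violation would count as one flag
against the user; paging someone only after several flags keeps a
single odd command from waking anybody up.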

And then, of course, there are general system faults that happen based
on system counters like CPU, cache hits, file node usage, disk space
usage, etc. When your system matches a certain profile, there might be
a problem. This one can probably be done best by running a decision
tree (CART or C4.8) against a bunch of data. Data which it's unlikely
you have: few companies generate enough traffic to produce a good
sampling of problems.

Predictive monitoring is a nice thing too. CPU was 10% then 40% then
70%. More than just raw numbers, there's a trend line here that, taken
to a linear conclusion, says you're going to run out of CPU soon. If
i'm going to run out of resources in an hour, i want to know. BUT,
suppose i told you the above numbers were for 10am, 11am and noon?
Since i know my volume maxes at lunch, the trend might not be linear
and is simply peaking at a nice number then dropping. At that point i
don't want to be alerted. And then there are seasonal fluctuations - my
volume really is higher in November and December and is 3x as high the
day before and after Easter (i used to work in retail, and these
trends were a big deal).
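
The naive linear version of that prediction is easy to sketch in
Python: a least-squares line through the samples, extended until it
hits the limit. The function names are made up for illustration, and
the sketch deliberately ignores the lunch peak and seasonal effects
described above, which is exactly why a plain trend line will raise
false alarms.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until(limit, hours, values):
    """Hours after the last sample until the trend line reaches limit.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_fit(hours, values)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - hours[-1]


if __name__ == "__main__":
    # CPU was 10% at 10am, 40% at 11am and 70% at noon.
    print(hours_until(100, [10, 11, 12], [10, 40, 70]))
```

With the numbers from the example the line climbs 30 points an hour,
so it reaches 100% one hour after the noon sample. Comparing against a
baseline for the same hour on earlier days, rather than against a
straight line, would be one way to handle the daily peak.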

The point is that system admins, like humans in the wild, encounter
many different types of problems and need many different types of
solutions to handle them.

Throw in the fact that you'll need probes to collect all this info
(and with the creation of switches, collecting network data has become
a major pain) and will want a repository to store it in (check out the
DMTF's CIM for a recommendation and schema) and you'll want to avoid
overloading the system and will inevitably lose data and will have all
sorts of things being added and removed from the system (AppleTalk,
FDDI, change in CIRs on frames, adding EJB systems, taking out LU6.2
traffic, etc.) and you have one heck of a thing to keep track of. As
humans, we do OK, although lose a long-term employee and you really
hurt yourself (this stuff is a bear to document, especially when
factoring in how changes affect things, so human memory works best).
So i wouldn't try to conquer it all as a master's project. But if you
succeed, give me a call :)

-baylor

[ comp.ai is moderated.  To submit, just post and be patient, or if ]
[ that fails mail your article to <[EMAIL PROTECTED]>, and ]
[ ask your news administrator to fix the problems with your system. ]


