
On Sun, 05 Oct 2003 02:09:14 GMT, Christoph Hahn <[EMAIL PROTECTED]> wrote:
> I'm doing my master thesis in network management using an expert system.
> I try to design a system that does somehow the following:
> - Scan the network
> - Find problems in the network (like Intrusion detection, wrong user
> behavior, tech. problems) using an expert system.
> - Simulate the result in the network simulator (ns2).

Christoph,

i'm a global architect for a big financial company and could use a tool similar to what you're describing (both at the network level and, much better, at the system/application level). So i have the money and the need, but haven't found much that does it well.

There are expensive tools to detect problems. You can Google for "RCA" and "root cause analysis". BMC's Patrol has a rule-based expert system to detect problems and CA's Unicenter has Neugents. For monitoring tools, Mercury's Topaz has some kind of RCA tool; i think it's Bayesian-based. BMC's R&D group also does a lot of work with people at the University of Massachusetts (Lowell).

As for you doing the above for a masters project, i highly doubt you'll create what you're describing. It's too big. There are many different problems and you'll need lots of different techniques to catch them.

To use a really simple example, let's say you have a chain that runs network, Web server, network, app server, network, DBMS server. An app stops working. Can you tell which component broke? The answer is to monitor traffic on each segment, which isn't terribly AI, although you could use production rules to run the investigation if you really wanted to (by the way, a tool named Tonic does something similar to this). This diagnostic module would kick off when alerted by a monitoring process.

There are plenty of low-level problems you'd have to know about. For example, say that you see an Ethernet packet that you know is corrupt because the frame CRC is wrong. The header is full of weird results.
In raw mode, they'd look something like 110110110 (i forget the exact sequence; it's been a long time since i was allowed to get my hands dirty). That happens to be the leading signature that comes before the actual header. What does it mean to see that in a packet? It means a late collision. And how would you get a late collision? Two ways: your Ethernet segments are more than 990 feet long (which means the bits can't travel from one end to the other and back again in the time it takes to broadcast all the header bits) and/or you have a bad transceiver (it can't detect collisions, which is necessary for a CSMA/CD system). Putting it all together, you can take the information and run it through a rule base to suggest a diagnosis, which in this case requires equipment changes.

But what about system crackers or bad employees? These people execute perfectly legal commands, so how are you going to detect them? The answer is that you need a way to make a profile of people's actions. Sorta like the concept of Zipfian distributions in IR: some actions are common, some are rare, and some are common or rare given a certain context. Testing a port is common, but testing a port 200 times in 30 seconds is not. If certain actions are happening far more often than is normal (and part of profiling is to determine what the proper distribution and frequency of certain actions are), flag them for further review. And if a user accumulates lots of flags, start paging people.

Another option is to use a Markov chain. That's something that says "if X happens, normally Y happens n% of the time, Z happens n2% of the time and A happens n3% of the time". String them together like n-grams and you can build a profile of a certain user's behavior. That's a very different approach from the ones described above.

And then, of course, there are general system faults that happen based on system counters like CPU, cache hits, file node usage, disk space usage, etc.
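The segment-by-segment investigation for the Web server / app server / DBMS chain can be sketched as a handful of production rules. This is a minimal Python sketch, assuming a probe already reports for each network segment whether requests and responses were seen recently; the `SegmentObservation` type, its fields and the rule wording are hypothetical and are not how Tonic or any other product works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentObservation:
    """Hypothetical probe summary for one network segment."""
    name: str
    upstream: str        # component sending requests onto the segment
    downstream: str      # component receiving them
    requests_seen: bool
    responses_seen: bool


def diagnose(chain: List[SegmentObservation]) -> str:
    """Apply simple production rules front to back along the chain."""
    for i, seg in enumerate(chain):
        if not seg.requests_seen:
            # Rule 1: nothing is arriving here, so blame whatever feeds this segment.
            return f"suspect {seg.upstream} or segment {seg.name}: no requests seen"
        if not seg.responses_seen:
            nxt = chain[i + 1] if i + 1 < len(chain) else None
            if nxt is None or not nxt.requests_seen:
                # Rule 2: requests arrive but are neither answered nor passed on.
                return f"suspect {seg.downstream}: requests arrive, no responses"
            # Rule 3: requests were passed downstream, so keep walking the chain.
    return "no fault isolated from traffic alone"


chain = [
    SegmentObservation("net1", "client", "web", True, False),
    SegmentObservation("net2", "web", "app", True, False),
    SegmentObservation("net3", "app", "dbms", True, False),
]
print(diagnose(chain))  # suspect dbms: requests arrive, no responses
```

Walking the chain front to back means the first segment where traffic stops being answered or forwarded points at the component to look at; anything subtler (slow responses, partial failures) would need more rules.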
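The late-collision reasoning above can be sketched as a tiny forward-chaining rule base. This is an illustrative Python sketch, not the rule language of BMC Patrol or any other product: the fact names are made up, the 990-foot limit is simply the figure given above and is passed in as a parameter, and the "and/or" from the text is simplified to one diagnosis per case.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]

# Each rule: if all premises are known facts, assert the conclusion.
RULES: List[Rule] = [
    (frozenset({"crc_error", "preamble_pattern_in_header"}), "late_collision"),
    (frozenset({"late_collision", "segment_over_length"}),
     "diagnosis: segment too long, shorten it or re-segment"),
    (frozenset({"late_collision", "segment_within_length"}),
     "diagnosis: suspect a bad transceiver that misses collisions"),
]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Fire rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def observe(crc_ok: bool, preamble_in_header: bool, segment_length_ft: float,
            max_length_ft: float = 990.0) -> Set[str]:
    """Turn raw (hypothetical) probe readings into initial facts."""
    facts = set()
    if not crc_ok:
        facts.add("crc_error")
    if preamble_in_header:
        facts.add("preamble_pattern_in_header")
    if segment_length_ft > max_length_ft:
        facts.add("segment_over_length")
    else:
        facts.add("segment_within_length")
    return facts


derived = forward_chain(observe(False, True, 300.0), RULES)
print(sorted(f for f in derived if f.startswith("diagnosis")))
```

Keeping the rules as data, separate from the chaining loop, is what lets new failure signatures be added without touching the engine.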
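The frequency-based profiling idea can be sketched as a sliding-window counter per user and action: raise a flag when a count is far above normal, and page someone once a user has piled up enough flags. The 30-second window and 200-event threshold come from the port-testing example above; the `RateProfiler` class and the three-flag paging rule are assumptions, and in practice the threshold would come out of the profiling step rather than being hard-coded.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class RateProfiler:
    """Flag (user, action) pairs that occur too often within a sliding window."""

    def __init__(self, window_s: float = 30.0, max_events: int = 200,
                 page_after_flags: int = 3) -> None:
        self.window_s = window_s
        self.max_events = max_events
        self.page_after_flags = page_after_flags
        self.events: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)
        self.flags: Dict[str, int] = defaultdict(int)

    def record(self, user: str, action: str, t: float) -> str:
        """Record one action at time t (seconds); return 'ok', 'flag' or 'page'."""
        q = self.events[(user, action)]
        q.append(t)
        # Drop events that have fallen out of the window ending at t.
        while q and t - q[0] > self.window_s:
            q.popleft()
        if len(q) < self.max_events:
            return "ok"
        self.flags[user] += 1
        q.clear()  # start a fresh window so one burst counts as one flag
        return "page" if self.flags[user] >= self.page_after_flags else "flag"


p = RateProfiler()
# 200 port tests in 20 seconds: the 200th one raises a flag.
results = [p.record("mallory", "port_test", i * 0.1) for i in range(200)]
print(results[-1])  # flag
```

Clearing the window after a flag is a design choice so that one burst yields one flag instead of a flag on every later event.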
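The Markov-chain profile can be sketched by learning first-order transition counts (bigrams, in n-gram terms) from a user's past command sequences, then scoring a new session by its average log-probability per transition; lower means less typical for that user. The command names and training data are made up, and add-one smoothing is an assumed choice so that unseen transitions don't get probability zero.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set


class MarkovProfile:
    """First-order Markov model of one user's command sequences."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def train(self, session: List[str]) -> None:
        self.vocab.update(session)
        for prev, nxt in zip(session, session[1:]):
            self.counts[prev][nxt] += 1

    def prob(self, prev: str, nxt: str) -> float:
        """P(nxt | prev) with add-one smoothing (one extra slot for unseen commands)."""
        row = self.counts.get(prev, Counter())
        return (row[nxt] + 1) / (sum(row.values()) + len(self.vocab) + 1)

    def score(self, session: List[str]) -> float:
        """Average log-probability per transition; lower means less typical."""
        pairs = list(zip(session, session[1:]))
        if not pairs:
            return 0.0
        return sum(math.log(self.prob(a, b)) for a, b in pairs) / len(pairs)


profile = MarkovProfile()
for _ in range(5):
    profile.train(["login", "ls", "cd", "vi", "logout"])

usual = profile.score(["login", "ls", "cd", "vi", "logout"])
odd = profile.score(["login", "sudo", "cat_shadow", "scp", "logout"])
print(usual > odd)  # True
```

Each command in the odd session is individually legal; it's the unusual sequence that drags the score down, which is the point of profiling behavior rather than single commands.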
When your system's counters match a certain profile, there might be a problem. This one can probably be done best by running a decision tree (CART or C4.8) against a bunch of data, which it's unlikely you have: few companies generate enough traffic to produce a good sampling of problems.

Predictive monitoring is a nice thing too. CPU was 10%, then 40%, then 70%. More than just raw numbers, there's a trend line here that, taken to a linear conclusion, says you're going to run out of CPU soon. If i'm going to run out of resources in an hour, i want to know. BUT, suppose i told you the above numbers were for 10am, 11am and noon? Since i know my volume maxes at lunch, the trend might not be linear; it may simply be peaking at a nice number and then dropping. At that point i don't want to be alerted. And then there are seasonal fluctuations: my volume really is higher in November and December and is 3x as high the day before and after Easter (i used to work in retail, and these trends were a big deal).

The point is that system admins, like humans in the wild, encounter many different types of problems and need many different types of solutions to handle them.

Throw in the fact that you'll need probes to collect all this info (and with the creation of switches, collecting network data has become a major pain), you'll want a repository to store it in (check out CIM or DMTF for a recommendation and schema), you'll want to avoid overloading the system, you will inevitably lose data, and you will have all sorts of things being added to and removed from the system (AppleTalk, FDDI, changes in CIRs on frames, adding EJB systems, taking out LU6.2 traffic, etc.), and you have one heck of a thing to keep track of. As humans, we do OK, although lose a long-term employee and you really hurt yourself (this stuff is a bear to document, especially when factoring in how changes affect things, so human memory works best). So i wouldn't try to conquer it all as a masters project.
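The decision-tree idea can be illustrated with the very first step of CART: search every counter and threshold for the single split that minimizes weighted Gini impurity (a depth-1 tree, or decision stump). This is a from-scratch Python sketch rather than a real CART or C4.5 implementation, and the six rows below are invented toy data; as noted above, collecting real fault data is the hard part.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity for 0/1 labels: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows: List[Tuple[float, ...]], labels: List[int],
               names: List[str]) -> Optional[Tuple[float, str, float]]:
    """Return (weighted impurity, feature name, threshold) of the best single split."""
    best = None
    n = len(rows)
    for j, name in enumerate(names):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, name, thr)
    return best


# Toy samples: (cpu %, cache hit %, disk used %); label 1 = a fault followed.
names = ["cpu_pct", "cache_hit_pct", "disk_used_pct"]
rows = [(20, 95, 40), (35, 90, 55), (50, 92, 60),
        (45, 60, 50), (30, 55, 45), (60, 50, 70)]
labels = [0, 0, 0, 1, 1, 1]
print(best_split(rows, labels, names))  # (0.0, 'cache_hit_pct', 75.0)
```

A full tree would apply the same search recursively to each side of the split until the leaves are pure enough or too small to split further.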
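Predictive monitoring with a seasonal check can be sketched as follows: fit a least-squares line to recent CPU readings, estimate the hours left until 100%, and alert only when exhaustion is near and the latest reading is also well above what is normal for that hour. The hourly baselines, the 10-point tolerance, the two-hour horizon and the `season_factor` multiplier are all hypothetical parameters, not values from the text.

```python
from typing import Dict, Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def hours_until_exhausted(hours: Sequence[int], cpu_pct: Sequence[float],
                          capacity: float = 100.0) -> Optional[float]:
    """Hours after the last reading until the fitted line reaches capacity."""
    slope, intercept = linear_fit(hours, cpu_pct)
    if slope <= 0:
        return None  # flat or falling: no exhaustion predicted
    return (capacity - intercept) / slope - hours[-1]


def should_alert(hours: Sequence[int], cpu_pct: Sequence[float],
                 expected_by_hour: Dict[int, float], horizon_h: float = 2.0,
                 tolerance_pct: float = 10.0, season_factor: float = 1.0) -> bool:
    """Alert only if the trend hits capacity soon AND load is unusual for this hour."""
    remaining = hours_until_exhausted(hours, cpu_pct)
    if remaining is None or remaining > horizon_h:
        return False
    expected = expected_by_hour.get(hours[-1] % 24)
    if expected is None:
        return True  # no baseline for this hour: trust the raw trend
    return cpu_pct[-1] > expected * season_factor + tolerance_pct


hrs, cpu = [10, 11, 12], [10.0, 40.0, 70.0]
print(hours_until_exhausted(hrs, cpu))                        # 1.0
print(should_alert(hrs, cpu, {10: 15.0, 11: 45.0, 12: 75.0}))  # False: normal lunch peak
print(should_alert(hrs, cpu, {10: 10.0, 11: 12.0, 12: 15.0}))  # True: unusual for this hour
```

The raw trend alone says "one hour left"; the hourly baseline is what suppresses the alert when a lunch peak is expected. Seasonal effects such as the pre-Easter spike could be folded in by passing a larger `season_factor` on those days.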
But if you succeed, give me a call :)

-baylor

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to <[EMAIL PROTECTED]>, and ]
[ ask your news administrator to fix the problems with your system. ]