
> I want to build a tool for data mining from an HTML page. I want the user to
> select an element from a web page, and train my application to recognize it
> in its later updates. For example, suppose the user wants to extract some
> data from a financial page. He wants to extract his total balance, plus the
> table of the last transactions. What he should do is highlight the elements
> inside the HTML page. After doing that, the application should analyze the
> HTML element structure and learn how to find it in similar pages (even
> when they are not identical). What I really need is an algorithm to
> "understand" a single element (by its structure, position in page, or any
> other method), and then I want to look in a new page and choose the most
> similar element (which should probably be the right one).

It seems you are trying to "learn" a structure, for example a grammar for a
pattern language. There are a number of algorithms that can learn text
patterns well. I have seen something like what you describe before; I think
it was the Lexikon Project at DFKI (www.dfki.de). I don't know of any
publications off the top of my head, though.

Markus

[ comp.ai is moderated.  To submit, just post and be patient, or if ]
[ that fails mail your article to <[EMAIL PROTECTED]>, and ]
[ ask your news administrator to fix the problems with your system. ]
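
For illustration only, the "choose the most similar element" step from the
question could be sketched roughly as below. This is not grammar learning and
not anything from the Lexikon Project; it is a much simpler stand-in that
describes each element by its chain of ancestor tags plus its attributes, and
scores candidates in the new page against the highlighted one. The names
ElementCollector, collect, similarity and find_most_similar, the 50/50
weighting, and the two sample pages are all assumptions made up for this
sketch. It uses only the Python standard library.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Records (tag path, attributes) for every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open tags from the root down to the current element
        self.elements = []   # list of (path tuple, attribute dict)

    def handle_starttag(self, tag, attrs):
        path = tuple(self.stack) + (tag,)
        self.elements.append((path, dict(attrs)))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def collect(html):
    """Return every element of the page as a (path, attributes) pair."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


def similarity(a, b):
    """Assumed score in [0, 1]: tag-path similarity plus attribute overlap."""
    path_score = SequenceMatcher(None, a[0], b[0]).ratio()
    attrs_a, attrs_b = set(a[1].items()), set(b[1].items())
    union = attrs_a | attrs_b
    attr_score = len(attrs_a & attrs_b) / len(union) if union else 1.0
    return 0.5 * path_score + 0.5 * attr_score  # arbitrary equal weighting


def find_most_similar(target, html):
    """Pick the element of the new page that scores highest against target."""
    return max(collect(html), key=lambda el: similarity(target, el))


# Made-up sample pages: the new page gains an ad block and an extra cell.
old_page = ("<html><body><div id='hdr'>Bank</div>"
            "<table><tr><td class='balance'>100</td></tr></table>"
            "</body></html>")
new_page = ("<html><body><div id='hdr'>Bank</div><div class='ad'>Ad</div>"
            "<table><tr><td class='balance'>250</td>"
            "<td class='date'>today</td></tr></table></body></html>")

target = collect(old_page)[-1]   # stands in for the user-highlighted element
match = find_most_similar(target, new_page)
print(match)
```

The sketch only locates the element; it ignores text content and sibling
position, so pages that differ more heavily would need a richer description
of the element (or a learned pattern, as suggested above).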