Usenet.com

Comp Thread Archive from Usenet.com

Re: Finding an HTML element



> I want to build a tool for data mining from HTML pages. I want the user to
> select an element on a web page and train my application to recognize it
> in later updates of that page. For example, suppose the user wants to
> extract some data from a financial web page: his total balance, plus the
> table of his latest transactions. All he should have to do is highlight
> those elements in the HTML page. The application should then analyze the
> element's structure and learn how to find it in similar pages (even when
> they are not identical). What I really need is an algorithm that
> "understands" a single element (by its structure, its position on the
> page, or any other method), so that I can then search a new page and
> choose the most similar element (which should probably be the right one).
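
The element-matching idea in the question can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a method from this thread: each element is characterized only by its root-to-element path of (tag, class, index among same-tag siblings) steps, similarity is `difflib.SequenceMatcher.ratio()` over those paths, and "highlighting" is simulated by picking the element whose text equals a given string. `PathCollector`, `locate`, `best_match`, and the two sample pages are hypothetical names for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags that never get a closing tag, so they are not pushed on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Record every element's root-to-element path and the text it contains.

    A path is a tuple of (tag, class, n) steps, where n is the element's index
    among earlier siblings with the same tag (a rough notion of 'position').
    """

    def __init__(self):
        super().__init__()
        self.stack = []           # steps of the currently open elements
        self.open_idx = []        # indices into self.elements for open elements
        self.child_counts = [{}]  # per open element (plus the root): tag -> count
        self.elements = []        # [path, text chunks] in document order

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        n = counts.get(tag, 0)
        counts[tag] = n + 1
        step = (tag, dict(attrs).get("class") or "", n)
        self.elements.append([tuple(self.stack) + (step,), []])
        if tag not in VOID_TAGS:
            self.stack.append(step)
            self.open_idx.append(len(self.elements) - 1)
            self.child_counts.append({})

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.open_idx[i:]
                del self.child_counts[i + 1:]
                break

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for idx in self.open_idx:
            self.elements[idx][1].append(data)


def parse(html):
    """Return a list of (path, whitespace-normalized text) pairs, one per element."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return [(path, " ".join(" ".join(chunks).split()))
            for path, chunks in collector.elements]


def locate(html, text):
    """Stand-in for highlighting: the deepest element whose text equals `text`."""
    candidates = [path for path, t in parse(html) if t == text]
    if not candidates:
        raise ValueError(f"no element with text {text!r}")
    return max(candidates, key=len)


def best_match(signature, html):
    """Score every element of a new page against the learned path; return the best."""
    def score(path):
        return SequenceMatcher(None, signature, path).ratio()
    path, text = max(parse(html), key=lambda element: score(element[0]))
    return score(path), text


EXAMPLE_PAGE = """<html><body>
<div class="header">Welcome</div>
<table class="summary"><tr><td>Balance</td><td class="amount">1,234.56</td></tr></table>
</body></html>"""

# A later update: a banner was added and the table got wrapped in another div.
NEW_PAGE = """<html><body>
<div class="banner">News</div>
<div class="main"><table class="summary"><tr><td>Balance</td>
<td class="amount">987.00</td></tr></table></div>
</body></html>"""

if __name__ == "__main__":
    signature = locate(EXAMPLE_PAGE, "1,234.56")
    print(best_match(signature, NEW_PAGE))
```

Because only the path is compared, the balance value can change between updates and the element is still found, and the extra wrapper `div` lowers the score without changing the winner. A purely structural score like this will break down once the layout changes drastically; text, attributes, or neighbouring labels (such as the "Balance" cell) could be added as further features.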

It seems you are trying to "learn" a structure, for example a grammar for
a pattern language. There are a number of algorithms out there that can
learn such text patterns quite well (the field is usually called
grammatical inference).
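
As a much weaker stand-in for real grammar induction, several highlighted examples can be generalized into a single path pattern with wildcards, so the tool learns "any row of this table" rather than "row 0". This Python sketch assumes the highlighted elements have already been reduced to root-to-element paths of (tag, class, sibling index) steps; `generalize`, `matches`, and the sample rows are hypothetical, and the technique is a simple least-general generalization over equal-depth paths, not a grammar learner.

```python
from typing import List, Optional, Sequence, Tuple

Step = Tuple[str, str, int]  # (tag, class, index among same-tag siblings)
PatternStep = Tuple[Optional[str], Optional[str], Optional[int]]  # None = wildcard


def generalize(paths: Sequence[Sequence[Step]]) -> List[PatternStep]:
    """Most specific pattern that covers all example paths.

    A field is kept when every example agrees on it and becomes a wildcard
    (None) otherwise. Only paths of equal depth are handled in this sketch.
    """
    if not paths:
        raise ValueError("need at least one example path")
    if len({len(p) for p in paths}) != 1:
        raise ValueError("example paths must have the same depth")
    pattern = []
    for steps in zip(*paths):  # the i-th step of every example
        pattern.append(tuple(
            values[0] if len(set(values)) == 1 else None
            for values in zip(*steps)  # all tags, then all classes, then all indices
        ))
    return pattern


def matches(pattern: Sequence[PatternStep], path: Sequence[Step]) -> bool:
    """True if every non-wildcard field of the pattern equals the path's field."""
    if len(pattern) != len(path):
        return False
    return all(want is None or want == got
               for pattern_step, step in zip(pattern, path)
               for want, got in zip(pattern_step, step))


# The user highlights two rows of the transactions table.
ROW_0 = [("html", "", 0), ("body", "", 0), ("table", "transactions", 0), ("tr", "", 0)]
ROW_2 = [("html", "", 0), ("body", "", 0), ("table", "transactions", 0), ("tr", "", 2)]
# Candidate elements from a later version of the page.
ROW_5 = [("html", "", 0), ("body", "", 0), ("table", "transactions", 0), ("tr", "", 5)]
SUMMARY_ROW = [("html", "", 0), ("body", "", 0), ("table", "summary", 1), ("tr", "", 0)]

if __name__ == "__main__":
    pattern = generalize([ROW_0, ROW_2])
    print(pattern[-1])  # the row index has become a wildcard
    print(matches(pattern, ROW_5), matches(pattern, SUMMARY_ROW))
```

Each extra example can only turn more fields into wildcards, so the pattern grows more general as the user highlights more rows; a real learner would also need to cope with paths of different depths and with noisy examples.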

I've seen something like what you described before; I think it was
with the Lexikon Project at DFKI (www.dfki.de). I don't know of any
publications off the top of my head, though.

Markus

[ comp.ai is moderated.  To submit, just post and be patient, or if ]
[ that fails mail your article to <[EMAIL PROTECTED]>, and ]
[ ask your news administrator to fix the problems with your system. ]


