[thesite] Tip Harvester question (was: [***] Formatting tips)
rudy
r937 at interlog.com
Fri Mar 30 14:42:27 CST 2001
> Sometimes thinking "outside the box" is a challenge for me.
i dunno, i think you've been doing fine so far
as far as the harvesting logic is concerned, i dunno if scanning for
"\n<tip" is flexible enough
why not just scan for "<tip"?
then look for the closing ">" and parse everything inside the tag as
name/value pairs
then scan ahead and find "</tip>" and everything in between is the tip body
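
roughly, the scan described above could look like the sketch below. this is
only an illustration in python, not the actual harvest code: the function name
harvest_tips, the attribute names in the example, and the rule that values may
be quoted or bare are all assumptions of mine. it is case-sensitive and does
nothing about "> " quote prefixes in replies.

```python
import re

# assumed attribute syntax: name="value", name='value', or name=value
ATTR_RE = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))')


def harvest_tips(text):
    """Return a list of (attrs, body) pairs, one per <tip ...>...</tip>."""
    tips = []
    pos = 0
    while True:
        # scan for "<tip" anywhere, not only at the start of a line
        start = text.find("<tip", pos)
        if start == -1:
            break
        # skip lookalikes such as "<tipsy>": the next char must end the name
        next_char = text[start + 4:start + 5]
        if next_char not in (">", " ", "\t", "\n", "\r"):
            pos = start + 4
            continue
        # closing ">" of the opening tag
        tag_end = text.find(">", start)
        if tag_end == -1:
            break
        # scan ahead for the closing tag; give up if it is missing
        close = text.find("</tip>", tag_end)
        if close == -1:
            break
        # everything inside the opening tag is parsed as name/value pairs
        attrs = {}
        for m in ATTR_RE.finditer(text[start + 4:tag_end]):
            attrs[m.group(1).lower()] = m.group(2) or m.group(3) or m.group(4) or ""
        # everything in between is the tip body
        body = text[tag_end + 1:close].strip()
        tips.append((attrs, body))
        pos = close + len("</tip>")
    return tips
```

a tip that gets quoted in a reply is picked up a second time, which is the
duplication mentioned below.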
the name/value pairs get stored in the database under multiple categories
or whatever(*)
yes, this might result in duplications when tips are (re)posted in replies,
etc.
but so what? look at how matt and michele have gone through tons of old
articles cleaning them up
besides, we are not taking advantage of a huge untapped labour force -- the
authors themselves
(*) special note -- please let me know when you want to start testing
harvest code and i'll be sure to have a document ready within a day
explaining the evolt tables and relationships
oh, and i just remembered, you must extract the date and author's email id
from the post (headers?)
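
if the harvester sees each post as a raw message with its headers intact
(an assumption, i don't know how the posts will arrive), python's standard
email module can pull out both values. the function name
post_author_and_date is made up for this sketch.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def post_author_and_date(raw_post):
    """Return (author_email, posted_datetime) from a raw message string.

    Either value may come back empty/None when the header is missing
    or cannot be parsed.
    """
    msg = message_from_string(raw_post)
    # parseaddr splits 'Name <addr>' into (name, addr)
    _name, address = parseaddr(msg.get("From", ""))
    posted = None
    date_header = msg.get("Date")
    if date_header:
        try:
            posted = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            # unparseable Date header: leave posted as None
            posted = None
    return address, posted
```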
rudy