[thesite] Tip Harvester question (was: [***] Formatting tips )

rudy r937 at interlog.com
Fri Mar 30 14:42:27 CST 2001


> Sometimes thinking "outside the box" is a challenge for me.

i dunno, i think you've been doing fine so far

as far as the harvesting logic is concerned, i dunno if scanning for
"\n<tip" is flexible enough

why not just scan for "<tip"?

then look for the closing ">" and parse everything inside it as name/value
pairs

then scan ahead and find "</tip>" and everything in between is the tip body
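
a rough sketch of that scan in python, just to make the steps concrete. the function name, the regexes and the attribute syntax (name="value", name='value' or bare name=value) are all assumptions for illustration, not an agreed tip format. it also assumes no ">" appears inside an attribute value and that tips are not nested

```python
import re

# Assumed tag and attribute syntax; the real tip format may differ.
OPEN_RE = re.compile(r"<tip\b([^>]*)>", re.IGNORECASE)
CLOSE_RE = re.compile(r"</tip\s*>", re.IGNORECASE)
ATTR_RE = re.compile(r'''([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''')


def harvest_tips(text):
    """Return a list of (attributes, body) pairs, one per <tip ...>...</tip>."""
    tips = []
    pos = 0
    while True:
        # Step 1: scan for "<tip" anywhere, not only at the start of a line.
        opened = OPEN_RE.search(text, pos)
        if opened is None:
            break
        # Step 3: scan ahead for the matching "</tip>".
        closed = CLOSE_RE.search(text, opened.end())
        if closed is None:
            break  # unterminated tip: stop rather than guess where it ends
        # Step 2: everything before the closing ">" is name/value pairs.
        attrs = {}
        for m in ATTR_RE.finditer(opened.group(1)):
            # Exactly one of the three value groups matched.
            value = next(g for g in m.groups()[1:] if g is not None)
            attrs[m.group(1).lower()] = value
        body = text[opened.end():closed.start()].strip()
        tips.append((attrs, body))
        pos = closed.end()
    return tips
```

because it does not anchor on a newline, this also picks up tips quoted in replies (lines starting with "> "), which is where the duplicates mentioned below would come from. quote markers inside a multi-line tip body are left in place here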

the name/value pairs get stored in the database under multiple categories
or whatever(*)

yes, this might result in duplications when tips are (re)posted in replies,
etc.

but so what?  look at how matt and michele have gone through tons of old
articles cleaning them up

besides, we are not taking advantage of a huge untapped labour force -- the
authors themselves


(*) special note -- please let me know when you want to start testing
harvest code and i'll be sure to have a document ready within a day
explaining the evolt tables and relationships

oh, and i just remembered, you must extract the date and author's email id
from the post (headers?)
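
if the harvester sees the raw message (an assumption, the archive pages reformat the headers), python's standard email package can pull both values out. a minimal sketch, with post_metadata as a made-up name:

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def post_metadata(raw_post):
    """Return (author_address, posted_datetime) from a raw RFC 2822 message.

    Either value may be empty/None if the header is missing or malformed.
    """
    msg = message_from_string(raw_post)
    # parseaddr splits 'name <addr>' into (name, addr); we only keep addr.
    _name, address = parseaddr(msg.get("From", ""))
    posted = None
    date_header = msg.get("Date")
    if date_header:
        try:
            posted = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            posted = None  # unparseable Date header
    return address, posted
```

the address could then serve as the author's id when a tip is stored, and the date as the posting date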

rudy







