[thesite] thetips database

Daniel J. Cody djc at starkmedia.com
Mon Jun 18 12:19:34 CDT 2001


Seth Bienek wrote:

> Hey Dean,
> 
> 
>>Okay, so the problem that you are having is getting the whole e-mail
>>message into the database?
>>
> 
> Nope.  The database is fine. The problem is parsing the entire .mbox archive (over 50 MB) without any errors.  And since it's so memory- and processor-intensive, I can only run the template against the entire archive every couple of hours; otherwise I end up with overlapping threads and other issues. Once the initial database population is done, there shouldn't be any more problems, but that first step is the big one.
> 
> I have some ideas that I will test today, and I'll let you know if I can't get it squared away by this evening.


I think I may have mentioned this before, Seth, but rather than parsing
that big-ass mbox file, you could parse all those little weekly text
files instead, e.g. http://lists.evolt.org/archive/Week-of-Mon-20010611.txt
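
Since the weekly files have predictable names, the URL for any given week can be computed rather than scraped. A minimal Python sketch, assuming (not confirmed) that every file is named after the Monday that opens its week, as in the Week-of-Mon-20010611.txt example, and that all of them live in the same archive directory; `week_file_url` is a hypothetical name.

```python
from datetime import date, timedelta

# Base URL taken from the example link above; assumed to hold every weekly file.
ARCHIVE_BASE = "http://lists.evolt.org/archive/"

def week_file_url(day: date) -> str:
    """Return the URL of the weekly text file that should contain `day`.

    Assumption: files are named Week-of-Mon-YYYYMMDD.txt after the Monday
    that starts the week.
    """
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return f"{ARCHIVE_BASE}Week-of-Mon-{monday:%Y%m%d}.txt"

print(week_file_url(date(2001, 6, 14)))
# -> http://lists.evolt.org/archive/Week-of-Mon-20010611.txt
```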

These are split up into easy-to-digest weekly files of 500 KB to 1 MB.
They're also what Dean's tip harvester is using to extract shit now.
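
Parsing one of those weekly files mostly comes down to splitting it into individual messages. A rough Python sketch, assuming the .txt files keep the mbox convention where each message begins with a line starting "From " (body lines that begin that way are usually escaped as ">From "). `split_messages` and `headers` are hypothetical helpers, not part of Dean's harvester.

```python
import email
import re

# Assumption: a new message starts at any line beginning with "From " (mbox style).
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)

def split_messages(archive_text: str) -> list[str]:
    """Split the text of one weekly archive file into raw messages."""
    starts = [m.start() for m in _FROM_LINE.finditer(archive_text)]
    starts.append(len(archive_text))
    # Each message runs from one "From " line up to the next one.
    return [archive_text[a:b] for a, b in zip(starts, starts[1:])]

def headers(raw: str) -> tuple[str, str]:
    """Return (Subject, From) for one raw message, skipping the mbox separator line."""
    rest = raw.split("\n", 1)[1] if "\n" in raw else ""
    msg = email.message_from_string(rest)
    return msg.get("Subject", ""), msg.get("From", "")
```

Each weekly file is small enough to read into memory in one go, which is the point of using them instead of the 50 MB archive.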

This would also spare you from parsing the 50 MB file every time you
want to update the DB. Just a thought...
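
With weekly files, an update run only has to touch the weeks since the last run. A sketch, assuming the loader records the Monday of the last week it loaded (a hypothetical `last_loaded_monday` value) and reads that week again, since the current week's file presumably keeps growing until the week is over:

```python
from datetime import date, timedelta

def weeks_to_load(last_loaded_monday: date, today: date) -> list[date]:
    """Mondays of the weekly files to (re)process on this run.

    Starts at the last loaded week (it may have been incomplete) and
    runs up to and including the week containing `today`.
    """
    this_monday = today - timedelta(days=today.weekday())
    weeks = []
    week = last_loaded_monday
    while week <= this_monday:
        weeks.append(week)
        week += timedelta(weeks=1)
    return weeks
```

Because the last loaded week is read again, the loader would need to skip messages it has already stored.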


> As far as the structure of 'thetips' as it is now goes, I haven't looked at it, but I'm sure it will need to be reworked.


-- Current definition of THETIPS (Oracle syntax)
CREATE TABLE THETIPS (
   TIP_ID     NUMBER (8)    NOT NULL,
   TIP_DATE   DATE          NOT NULL,
   AUTHOR_ID  NUMBER (8),
   TIP_TYPE   VARCHAR2 (200),
   AUTHOR     VARCHAR2 (50),
   BODY       LONG,
   PRIMARY KEY ( TIP_ID )
);

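For trying out a loader locally without the Oracle instance, a rough stand-in using Python's built-in sqlite3 with an analogous table might look like this. The type mapping (NUMBER to INTEGER, VARCHAR2 and LONG to TEXT, DATE stored as ISO-8601 text), the `store_tip` helper, and the sample row are assumptions for illustration, not the site's actual loader. SQLite also won't enforce the VARCHAR2 length limits.

```python
import sqlite3
from datetime import date

# SQLite approximation of THETIPS; type mapping is an assumption, see above.
SCHEMA = """
CREATE TABLE THETIPS (
    TIP_ID     INTEGER NOT NULL PRIMARY KEY,
    TIP_DATE   TEXT    NOT NULL,
    AUTHOR_ID  INTEGER,
    TIP_TYPE   TEXT,
    AUTHOR     TEXT,
    BODY       TEXT
)
"""

def store_tip(conn, tip_id, tip_date, author, body, tip_type=None, author_id=None):
    """Insert one harvested tip (hypothetical helper); arguments map onto the columns."""
    conn.execute(
        "INSERT INTO THETIPS (TIP_ID, TIP_DATE, AUTHOR_ID, TIP_TYPE, AUTHOR, BODY) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (tip_id, tip_date.isoformat(), author_id, tip_type, author, body),
    )

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Sample row for illustration only.
store_tip(conn, 1, date(2001, 6, 18), "Daniel J. Cody", "sample tip text")
print(conn.execute("SELECT COUNT(*) FROM THETIPS").fetchone()[0])  # 1
```
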
If you guys need anything else, please lemme know :)

.djc.
