[thesite] thetips database
Daniel J. Cody
djc at starkmedia.com
Mon Jun 18 12:19:34 CDT 2001
Seth Bienek wrote:
> Hey Dean,
>
>
>>Okay, so the problem that you are having is getting the whole e-mail
>>message into the database?
>>
>
> Nope. The database is fine. The problem is parsing the entire .mbox archive (over 50 meg) without any errors. And since it's so memory and processor-intensive, I can only run the template against the entire archive every couple of hours, or else there ends up being overlapping threads and other issues.. Once the initial database population is done, there shouldn't be any more problems, but that first step is the big one.
>
> I have some ideas that I will test today, and I'll let you know if I can't get it squared away by this evening.
I think I may have mentioned this before seth, but rather than parsing
that big ass mbox file, you could parse all those little weekly text
files instead. e.g. http://lists.evolt.org/archive/Week-of-Mon-20010611.txt
these are split up into easy to digest 500KB - 1MB weekly files. they're
also what dean's tip harvester is using to extract shit now.
this would also save you from parsing the whole 50MB file every time you
wanted to update the DB. just a thought..
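
a rough sketch of what that could look like, in python just for
illustration (not what seth's template or dean's harvester actually
does). it assumes the weekly files use mbox-style "From " separator
lines and that the file name is always the monday of the week, which
i'm guessing from the one URL above. week_file and iter_messages are
made-up names:

```python
import email
from datetime import date, timedelta


def week_file(day):
    """Guess the weekly archive name for the week containing `day`.

    Assumption: files are named after that week's Monday, as in
    Week-of-Mon-20010611.txt.
    """
    monday = day - timedelta(days=day.weekday())
    return "Week-of-Mon-%s.txt" % monday.strftime("%Y%m%d")


def iter_messages(archive_text):
    """Yield one parsed message per mbox-style 'From ' separator line.

    Assumption: body lines never start with 'From ' unescaped.
    """
    chunk = []
    for line in archive_text.splitlines(keepends=True):
        if line.startswith("From "):
            # Separator line: flush the previous message, drop the separator.
            if chunk:
                yield email.message_from_string("".join(chunk))
                chunk = []
            continue
        chunk.append(line)
    if chunk:
        yield email.message_from_string("".join(chunk))
```

to update the DB you'd only fetch week_file(date.today()) (plus any
weeks you missed) and loop over iter_messages() on its contents, so the
big mbox never gets touched after the first load.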
> As far as the structure of 'thetips' as it is now, I haven't looked at it but I'm sure it will need to be reworked.
CREATE TABLE THETIPS (
  TIP_ID NUMBER (8) NOT NULL,
  TIP_DATE DATE NOT NULL,
  AUTHOR_ID NUMBER (8),
  TIP_TYPE VARCHAR2 (200),
  AUTHOR VARCHAR2 (50),
  BODY LONG,
  PRIMARY KEY ( TIP_ID )
);
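
if you want to poke at that structure without touching the real DB,
here's a minimal sketch that loads the same columns into an in-memory
SQLite database from python. SQLite is only a stand-in here (the real
table is Oracle, and SQLite just treats NUMBER/VARCHAR2/LONG as loose
type names), and make_db, add_tip and the sample row are all made up
for illustration:

```python
import sqlite3

# Same columns as the THETIPS table above; SQLite accepts the Oracle
# type names but does not enforce their lengths.
DDL = """
CREATE TABLE THETIPS (
  TIP_ID NUMBER (8) NOT NULL,
  TIP_DATE DATE NOT NULL,
  AUTHOR_ID NUMBER (8),
  TIP_TYPE VARCHAR2 (200),
  AUTHOR VARCHAR2 (50),
  BODY LONG,
  PRIMARY KEY ( TIP_ID )
)
"""


def make_db():
    """Create a throwaway in-memory database holding an empty THETIPS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def add_tip(conn, tip_id, tip_date, author, tip_type, body, author_id=None):
    """Insert one tip; raises sqlite3.IntegrityError on a duplicate TIP_ID."""
    conn.execute(
        "INSERT INTO THETIPS "
        "(TIP_ID, TIP_DATE, AUTHOR_ID, TIP_TYPE, AUTHOR, BODY) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (tip_id, tip_date, author_id, tip_type, author, body),
    )
    conn.commit()
```

the primary key on TIP_ID is what would stop the same tip from being
loaded twice if an import gets re-run, as long as the ids are assigned
the same way each time.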
if you guys need anything else, please lemme know :)
.djc.