[thelist] Local Storage of RSS

Kelly Hallman khallman at wrack.org
Wed Mar 19 17:18:28 CST 2003


On Wed, 19 Mar 2003, Hassan Schroeder wrote:
> Seth Fitzsimmons wrote:
> > How would you store large numbers (500,000+) of RSS documents for 
> > periodic local parsing or serving?
> 
> I'd think the RDBMS overhead would be less than going through that
> much content with an XML parser, but that's a guess not based on
> experience with that volume of data :-)

Yes, I'd assume parsing that much XML would take quite some time if done
on-the-fly.  I also assume that due to the nature of RSS data, once it's
there, it doesn't really need to be changed, correct?  (i.e. for the most
part you are just appending data at the end of the file)

What if the data were in one or several large files and you stored the
offset of each article somewhere else, such as in an RDBMS?  If you know
which byte marks the start of the message you want, and how long the
message data is, it shouldn't be a big performance hit to pull that out of
the large file using the file seek/read functions of almost any language.
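
A rough sketch of that idea in Python, using sqlite3 as a stand-in for
the RDBMS.  The table name, column names, and function names here are
made up for illustration; nothing about them comes from a real RSS
storage system.

```python
import os
import sqlite3


def open_index(db_path):
    """Open (or create) the index that maps article id -> byte range."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "id INTEGER PRIMARY KEY, "
        "offset INTEGER NOT NULL, "
        "length INTEGER NOT NULL)"
    )
    return db


def append_article(db, data_path, payload):
    """Append payload (bytes) to the big file and record where it landed."""
    with open(data_path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make sure tell() reports the end of file
        offset = f.tell()
        f.write(payload)
    cur = db.execute(
        "INSERT INTO articles (offset, length) VALUES (?, ?)",
        (offset, len(payload)),
    )
    db.commit()
    return cur.lastrowid


def read_article(db, data_path, article_id):
    """Look up the byte range in the index, then seek and read just that."""
    row = db.execute(
        "SELECT offset, length FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        raise KeyError(article_id)
    offset, length = row
    with open(data_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Reading an article never touches an XML parser; the cost is one index
lookup plus one seek and one read, regardless of how big the file is.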

If they are sequential, even better -- you'd be able to pull message
ranges out of the file just as quickly as one message.
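
To illustrate the range case, here is a small Python sketch.  It
assumes you already have the (offset, length) pairs for a run of
articles stored back to back, in file order; the function name
read_range is hypothetical.

```python
def read_range(data_path, entries):
    """Read a run of adjacent articles with a single seek and read.

    entries is a list of (offset, length) pairs in file order, with each
    article starting where the previous one ends.
    """
    if not entries:
        return []
    start = entries[0][0]
    end = entries[-1][0] + entries[-1][1]
    with open(data_path, "rb") as f:
        f.seek(start)
        blob = f.read(end - start)
    # Slice the one big read back into individual articles.
    return [blob[off - start:off - start + length] for off, length in entries]
```

The file is touched once no matter how many messages are in the range;
splitting the result happens in memory.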

This could be a fairly efficient way of grabbing data from a large file.
Of course, it is only easy if you never change the data, or change it
carefully: an in-place edit that alters an article's length shifts every
stored offset after it.

RSS files don't lend themselves to a relational structure, but SQL
provides a powerful query language (and speed), relational data or no.
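
As an illustration of what the SQL side might buy you, here is a Python
sketch where the index also carries a little metadata per article.  The
schema (feed, published, offset, length) and the function names are
assumptions made up for this example, and sqlite3 again stands in for
whatever RDBMS you use.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory index from (feed, published, offset, length) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE articles ("
        "id INTEGER PRIMARY KEY, feed TEXT, published TEXT, "
        "offset INTEGER, length INTEGER)"
    )
    db.executemany(
        "INSERT INTO articles (feed, published, offset, length) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return db


def locate(db, feed, since):
    """Return (offset, length) pairs for one feed's articles since a date.

    Dates are ISO 8601 strings (YYYY-MM-DD), so string comparison
    orders them correctly.
    """
    return db.execute(
        "SELECT offset, length FROM articles "
        "WHERE feed = ? AND published >= ? ORDER BY offset",
        (feed, since),
    ).fetchall()
```

The query only returns byte ranges; the article text itself still comes
out of the flat file with a seek and a read.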

-- 
Kelly Hallman
http://wrack.org/




