[thelist] search engine issue (development issue that is)

Colin Mitchell colspan at jerkvision.com
Thu Jun 20 11:00:00 CDT 2002


Hmmm... this is an interesting question.  I can think of a few changes I
might consider, both to make things simpler, and (possibly) a little
quicker:

1 - I would consider basing your temp table not on a user's session_id, but
on the search terms themselves.  This way, if multiple users are searching
for the same keywords (which may or may not be the case for you, but it's
certainly something I've seen), then they can rely on the same entries in
the temp table.  You save space, and you aren't repeating DB queries when
you don't need to.
2 - In terms of cleaning out the table, I would take an approach like this -
put a timestamp into the table, and run a process every now and then to
clean out old searches.  This leads me to idea #3...
3 - If a user comes back to the website and tries to do a search for which
the data has been cleaned out of your table, just regenerate the entries and
go from there.  This is something that you're probably going to need to be
able to do anyway, in case a user bookmarks a search result page, etc.
Are you passing in your search terms on the URL?

Hope that helps - Colin



> -----Original Message-----
> From: thelist-admin at lists.evolt.org
> [mailto:thelist-admin at lists.evolt.org]On Behalf Of Chris Blessing
> Sent: Thursday, June 20, 2002 11:45 AM
> To: thelist at lists.evolt.org
> Subject: [thelist] search engine issue (development issue that is)
>
>
> Hi all once again-
>
> I'm currently working on a project for my company's website whereby I'm
> creating a search engine to search our database of content.  The details
> aren't important but I am looking for some pointers on how to handle the
> actual search results.  Here's my current thought process:
>
> 1) stored procedure searches content and plops results (consisting of the
> user's session id (IIS SID, not SQL Server SID), search id (an incremental
> number starting at 1, increasing per search that the user performs), article
> id and the rank for that article) into a "search results" table which I've
> created ahead of time (i.e. it's not a temp table).
>
> It would look like this:
>
> session_id   search_id   article_id   rank
> ----------   ---------   ----------   ----
> 123-134-ab   1           101          1000
> 123-134-ab   1           104          988
> 123-134-ab   1           198          450
> 323-333-de   1           987          980
> 323-333-de   1           239          430
>
> That's two users' search results.  One user (123-134-ab) got back 3 results
> based on the search terms and they are ranked accordingly.  The other user
> (323-333-de) got back 2 results for his/her search.  Note that the
> session_id changes per user but the search_id is constant FOR THIS SEARCH.
> If 123-134-ab decided to search again, the search_id would increment (via
> some ASP code) to 2, and so on.
>
> 2) The actual search results are displayed by going back to this table of
> search results, selecting all rows based on search_id and session_id
> (search_id would be generated before the search itself and held as a
> session-level variable so we can increment it every time the user searches),
> and then getting the article details based on article_id (probably a simple
> table join, nothing really fancy).
>
> Seems pretty straightforward, no?
>
> My problem is that this search results table is going to fill up mighty fast
> if users are getting 20-200 results per query and doing 1-5 searches per
> session (that's what I'm guessing will happen, anyhow).  I thought about
> cleaning the search results table out at Session_OnEnd but that poses a
> problem.  If the user lets their session time out by sitting idle, the
> results are lost and the user would probably not be too happy since they
> most likely forgot whatever their search terms were AND they'd have to go
> back and search again.
>
> Also, I'm not so sure I can get the actual session ID generated by IIS once
> the Session_OnEnd event begins.  If that's the case, I have no way of
> identifying the session and therefore cannot delete anything reliably from
> the search results table!
>
> I suppose I could circumvent this by writing the session id to a real cookie
> (instead of a session variable) and just overwrite that cookie the next time
> the user comes to the site, but that seems a bit awkward.  It also still
> doesn't help me with the session timeout problem, which btw is 20 minutes on
> our servers.
>
> Anyone have any insight or advice for this type of a project?  I apologize
> for the lengthy message... it's so much to explain!
>
> TIA!
>
> Chris Blessing
> webguy at mail.rit.edu
> http://www.330i.net




