[thelist] extracting site "visits" from a table tracking page views (mysql)

Ken Schaefer ken.schaefer at gmail.com
Tue Aug 24 19:05:36 CDT 2004


Jewel's suggestion may be a good one...

Otherwise, if you wish to do it the way you are currently doing it
then you need to count only those records where:
- the time difference between the current and previous record is more
than 4 hours
- the user IDs for the two records are the same
- and the RecordID for the previous record is the one immediately
prior to the recordID for the current record (for the same user),
otherwise, you will be counting many prior records for the current
user.

This may not be the most efficient way to do this, but it might get you started:

SELECT
	a.UserID,
	COUNT(a.RecordID) as UserVisits,
	a.Country
FROM
	Stats_Pages AS a
WHERE EXISTS
	(
	SELECT	
		NULL
	FROM
		Stats_Pages AS b
	WHERE
		a.UserID = b.UserID
	AND
		DateDiff(hh, b.TimeStamp, a.TimeStamp) >= 4
	AND
		b.RecordID = 
		(
			SELECT 
				Max(c.RecordID)
			FROM
				Stats_Pages c
			WHERE
				b.UserID = a.UserID
			AND
				c.RecordID < a.RecordID
		)
GROUP BY  
	a.UserID
	a.Country
ORDER BY  
	COUNT(a.RecordID) DESC

Cheers
Ken

On Tue, 24 Aug 2004 16:59:31 -0300, Sarah Sweeney
<mr.sanders at designshift.com> wrote:
> I'm totally stumped by this query I'm trying to write. I currently have
> a table in my database that tracks the page views by registered users of
> the site. This is used to compile stats for the client such as frequency
> of site use by user country, user organization, etc. I've been asked now
> to compiled similar reports based on user "visits" rather than
> individual page views. I'm not exactly sure how to classify a "visit",
> but I'm thinking that I will say a new visit begins when a user accesses
> a page at least 4 hours after the last page view (this is debatable, but
> not the most important aspect of my question).
> 
> My question is, how do I select only those records from my stats table
> that have a timestamp of at least 4 hours more than the previous record
> for the client. I'm thinking something along these lines...
> 
> SELECT    COUNT(spcurr.UserID) AS UserCount, spcurr.Country
> FROM      Stats_Pages spcurr, Stats_Pages spprev
> WHERE     spcurr.UserID = spprev.UserID
> AND       MAX(spprev.Timestamp) < DATE_SUB(spcurr.Timestamp, INTERVAL 4
> HOUR)
> GROUP BY  sp.Country
> ORDER BY  UserCount DESC
> 
> ....but I know that the MAX() in the WHERE statement is not quite right.
> Any ideas on how this can be accomplished?


More information about the thelist mailing list