[thelist] [MYSQL] - The most effective algorithm for the shortest path.

Matt Warden mwarden at gmail.com
Sat Dec 10 05:05:05 CST 2005

Hash: SHA1

> I had done something similar before, but it was not this much complex:
> It was going only three levels deep.
> Currently the depth of the path is undefined (although there will be
> an upper-limit I suppose).
> And we really have a high server load (2000+ users logged in within
> working hours 200-300 users logged in at night and on weekends).
> I thought about using temporary tables. Will it be as overkilling as
> nested queries or joining the table with itself?
> Have you done something similar?
> Do you have any suggestions?

I designed a cache of this exact information for a social networking
site prototype I worked on about 13 months ago. Basically, I stored 4
pieces of data in a table: the user id we're interested in, the user id
this user is connected to (call this end-user), the number of "hops"
between the two users, and the user id of the person directly connected
to the end-user. In my documentation, I wrote this in English as "userid
x is connected to userid y via userid z in h hops"

So, for a connection of 1-5-12-54, this data would be the first row in
the table:

userid      userid_connected       hops      userid_via
   1                54              3             12
   1                12              2              5
   1                 5              1            NULL

As you can see from this cache of the partial results of the path
calculation, you can recreate the path fully. I tried to think of a way
to structure the data such that I could return only the rows relevant to
the path I am interested in, but I could not figure this out, since it
seems like I would need to know the ids in the path beforehand (which
obviously gets me nowhere).

But, basically what I did was select rows where userid=x and order by
hops descending. Then I would go through the resultset using userid_via
in each loop iteration to find the next relevant record.

As I said, this was a prototype, so it was not ever put to any stress
testing or anything of that sort. The number of relevant cache records
still grows in size exponentially, but you do get to skip the heavy
computation of finding the shortest path dynamically. And it is not
difficult to keep the cache up-to-date, as it requires relatively minor
code additions on the add/remove friend algorithms.

Hopefully this makes sense. If it doesn't, let me know. I spent a
considerable amount of time on this project, so if you want to discuss
this issue or other related issues, let me know.

- --
Matt Warden
Miami University
Oxford, OH, USA

This email proudly and graciously contributes to entropy.
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org


More information about the thelist mailing list