Fabulous idea! You could expand it to be self-healing: give each machine two IP addresses, one that is "live" and one for "testing". Your existing routine would keep testing the "live" address.

Suppose machine B goes down and machine A takes over B's LIVE address. When machine B comes back online (gets fixed), its attempt to bring up its own LIVE address will fail, because machine A already has it. But machine B's TEST address WILL get assigned.

Meanwhile, machine A has been checking whether machine B has come back online by testing for machine B's TEST address. When it sees machine B online, it removes machine B's LIVE address from its own interface (ifconfig), and machine B can then assign that address back to itself.

Glenn Hunt
ghunt at hds.ca

> Call 'em A, B and C. A cron job on each box pings the next
> machine [ A -> B, B -> C, C -> A ] at relatively short intervals.
>
> If, say, A gets no response from B, it waits for a small period of
> time (in case the machine's rebooting for some reason), tries
> again, and if no response this time ifconfigs up a virtual
> interface with the missing machine's IP address (and sends an
> alert to someone's pager, of course!).
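
Taken together, the quoted takeover routine and the self-healing extension amount to a small decision rule that each watcher runs against the next machine in the ring. Here is a rough Python sketch of that rule only, with no real networking. The names `Action`, `decide` and `run_checks` are made up for illustration, and the three booleans are assumed to come from whatever ping check the cron job already does (after the retry delay). One detail is my own assumption rather than part of the scheme as described: the watcher does not take over while the peer's TEST address answers, so that it does not grab the LIVE address again in the gap before the recovered peer reassigns it.

```python
from enum import Enum


class Action(Enum):
    NONE = "do nothing"
    TAKE_OVER = "bring up the peer's LIVE address locally"
    RELEASE = "remove the peer's LIVE address locally"


def decide(holding_peer_live, peer_live_up, peer_test_up):
    """Pick the watcher's next step for the peer it monitors.

    holding_peer_live: True if this machine currently answers on the
        peer's LIVE address.
    peer_live_up / peer_test_up: ping results for the peer's two
        addresses (assumed to be taken after the retry delay).
    """
    if holding_peer_live:
        # We answer on the LIVE address ourselves, so only the TEST
        # address can tell us whether the peer is really back.
        return Action.RELEASE if peer_test_up else Action.NONE
    if peer_live_up:
        return Action.NONE
    if peer_test_up:
        # Assumption (not in the original scheme): the peer is up and
        # is expected to reclaim its LIVE address, so leave it alone.
        return Action.NONE
    return Action.TAKE_OVER


def run_checks(observations):
    """Replay (peer_live_up, peer_test_up) pairs; return the actions chosen."""
    holding = False
    actions = []
    for live_up, test_up in observations:
        action = decide(holding, live_up, test_up)
        if action is Action.TAKE_OVER:
            holding = True
        elif action is Action.RELEASE:
            holding = False
        actions.append(action)
    return actions
```

In a real cron job, TAKE_OVER and RELEASE would map to the ifconfig up/down calls on the virtual interface (plus the pager alert), and `holding` would have to be remembered between runs, for example in a small state file.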