[thelist] Google Pagerank & Validation

Beau Hartshorne beau at members.evolt.org
Thu Oct 18 22:37:40 CDT 2001


Why does it have to check plugins, etc.? All it needs to do is check for
a syntactically correct page. Validator.w3.org does this, and returns
errors and/or warnings. An accessibility checker does the same sort of
thing -- returns errors and/or warnings. The more errors, the lower the
pagerank.
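
To make that concrete, here's a minimal sketch (hypothetical function
name and penalty factor -- nothing Google actually publishes) of how an
error count might be folded into a page's score:

    def adjusted_rank(pagerank, error_count, penalty=0.05):
        """Scale a page's rank down as its validation error count grows.

        The penalty factor is an arbitrary illustration; real tuning
        would take experimentation against search-quality metrics.
        """
        return pagerank / (1.0 + penalty * error_count)

    # A clean page keeps its rank; a page with 40 errors loses most of it.
    adjusted_rank(1.0, 0)   # 1.0
    adjusted_rank(1.0, 40)  # ~0.33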

Google maintains a cached copy of every single page it has indexed. This
is refreshed every 30 days. How long would it take for a validator (an
optimized one running on a supercomputer) to go through all of the
pages, and return how many errors each had? Google already does some
crazy calculations to figure out pagerank. Read the paper below:

http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
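
For the per-page step, here's a rough sketch of what counting errors
could look like, using lxml's recovering HTML parser as a stand-in for a
real validator (so the counts only approximate what validator.w3.org
would report):

    from lxml import etree

    def count_markup_errors(html_source):
        # Parse in recovery mode so malformed markup doesn't abort the
        # run; every problem the parser recovers from stays in its log.
        parser = etree.HTMLParser(recover=True)
        etree.fromstring(html_source, parser)
        return len(parser.error_log)

    # e.g. run it over a cached copy fetched during the normal crawl:
    # errors = count_markup_errors(cached_html_bytes)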

Say it's possible, and technically practical. The question is: how would
ranking validated pages higher benefit people who are looking for web
pages? So much of the web's content is old, but so much
of it is still very useful. For example, try running the link above
through http://validator.w3.org/, using any HTML version. Ouch.

Maybe it's a feature that could be turned on or off. Who would want to
turn it on, and why? Why would you want to keep it off?

More ideas, arguments?

Cheers,

Beau

> Well, yeah, it's possible.  But the amount of data that would have 
> to be generated and stored would be flipping insane.  Just think 
> about the possible combinations of browser and plugin.  Plus, 
> having to parse and categorize every page.  Then searching 
> through it.  The maintenance on the browser+plugin combo 
> tables alone would be massive.  
> 
> Finally, define "appropriate".  Complete rendering with no 
> visual or script errors?  Good enough to read the words but 
> the design breaks? 
> The choice combinations here are massive as well.




