[thelist] html page analyser program

Ken Schaefer ken at adOpenStatic.com
Wed Jul 14 06:36:01 CDT 2004


I don't think the problem is "that their server isn't set up that well". I
think the problem is with your code. Your code appears to be using HTTP/1.0.
The Host: header belongs to HTTP/1.1 (where it is required), so if you want
to send it, make the request as HTTP/1.1.
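
Here is a minimal sketch of what I mean, assuming the rest of your function
stays as it is. build_get_request() is a hypothetical helper name, not
something in your script. It sends HTTP/1.1 with "Connection: close" so the
server closes the socket after the response. It also falls back to "/" when
parse_url() returns no path: for a URL such as http://www.photos.org your
current code would send "GET ? HTTP/1.0", which may be enough on its own to
earn a 400.

```php
<?php
// Hypothetical helper (not part of the original script): builds the raw
// request text so it can be inspected before it is written to the socket.
function build_get_request($url)
{
    $url_info = parse_url($url);

    // parse_url() omits 'path' and 'query' when the URL lacks them,
    // so default the path to "/" and append "?" only when a query exists.
    $target = isset($url_info['path']) ? $url_info['path'] : '/';
    if (isset($url_info['query'])) {
        $target .= '?' . $url_info['query'];
    }

    $head  = "GET " . $target . " HTTP/1.1\r\n";
    $head .= "Host: " . $url_info['host'] . "\r\n";
    // Ask the server to close the connection once the response is sent,
    // so a feof() loop on the socket terminates.
    $head .= "Connection: close\r\n\r\n";

    return $head;
}
```

You would then pass the result to fputs($fp, ...) in place of the $head
string you build now, for example fputs($fp, build_get_request($url)).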

Cheers
Ken

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Alex Beston" <alex at deltatraffic.co.uk>
Subject: [thelist] html page analyser program


: I've written a little PHP script which extracts useful info from a webpage.
:
: Here's the URL:
:
: http://www.deltatraffic.co.uk/regexp/elements.php
:
: Put in the full URL of any page and it will come back at you with the
: info.
:
: Now the problem is that if I try www.photos.org, the header shown in
: the program is this:
:
: Headers Content:
:
: HTTP/1.1 400 Bad Request
: Date: Wed, 14 Jul 2004 11:00:50 GMT
: Server: Apache/1.2.6
: Connection: close
: Content-Type: text/html
:
: Now, looking for "400 Bad Request" on a Google search,
:
: this page:
:
: http://www.codestyle.org/sitemanager/FAQ.shtml#why400
:
: sheds some light: it suggests that their server isn't set up that well.
:
: However, when I run Ethereal and use a browser, the server returns a
: 200 OK response.
:
: GET / HTTP/1.1
: Host: www.photos.org
: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707
: Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
: Accept-Language: en-us,en;q=0.5
: Accept-Encoding: gzip,deflate
: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
: Keep-Alive: 300
: Connection: keep-alive
:
: HTTP/1.1 200 OK
: Date: Wed, 14 Jul 2004 11:05:46 GMT
: Server: Apache/1.2.6
: Keep-Alive: timeout=15, max=100
: Connection: Keep-Alive
: Transfer-Encoding: chunked
: Content-Type: text/html
:
: Going back to my program, the code that gets the headers is this:
:
: function get_headers_php4($url)
: {
:    $url_info = parse_url($url);
:    $fp = fsockopen($url_info['host'],80,$errno,$errstr,30);
:    if (!$fp)
:    {
:        print("failed to get headers");
:        exit;
:    }
:    else
:    {
:        $head = "GET ".$url_info['path']."?".$url_info['query'];
:        $head .= " HTTP/1.0\r\nHost: ".$url_info['host']."\r\n\r\n";
:        fputs($fp,$head);
:        echo "<pre>";
:        while(!feof($fp))
:        {
:            $line = fgets($fp,1024);
:            echo($line);
:            if (strpos($line,"\r\n",0) === 0)
:            {
:                fclose($fp); echo "</pre>";
:                return $header;
:            }
:            else
:            {
:                $header[] = $line;
:            }
:        }
:
:    }
: }
:
: So the question is: how can I modify this code so that I don't see a 400
: response code?
:
: Maybe I ought to add something like:
:
: $head .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko";
:
: I tried that, but it comes back with the same 400 code.


