[thelist] html page analyser program
Ken Schaefer
ken at adOpenStatic.com
Wed Jul 14 06:36:01 CDT 2004
I don't think the problem is "that their server isnt setup that well". I
think the problem is with your code. Your code appears to be using HTTP/1.0.
If you want to use the HTTP Host: header you need to use HTTP/1.1.
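
Something along these lines might do it. This is only a rough sketch, not tested against www.photos.org: build_http11_request() is a made-up name, and it assumes the URL passed in includes the scheme (http://...) so that parse_url() can find the host. It sends the request as HTTP/1.1 and adds "Connection: close", since HTTP/1.1 connections are persistent by default. It also falls back to "/" when the URL has no path, because the quoted code would send "GET ? HTTP/1.0" for a URL like http://www.photos.org. Whether that empty path plays a part in the 400 is a guess.

```php
<?php
// Hypothetical helper (not from the original post): builds an HTTP/1.1
// GET request string for a URL that includes its scheme.
function build_http11_request($url)
{
    $info = parse_url($url);

    // Fall back to "/" when the URL has no path component.
    $path = (isset($info['path']) && $info['path'] !== '') ? $info['path'] : '/';

    // Append the query string only when one is present.
    if (isset($info['query']) && $info['query'] !== '') {
        $path .= '?' . $info['query'];
    }

    $request  = "GET " . $path . " HTTP/1.1\r\n";
    $request .= "Host: " . $info['host'] . "\r\n";
    // Ask the server to close the connection after the response.
    $request .= "Connection: close\r\n";
    // An empty line ends the header block.
    $request .= "\r\n";

    return $request;
}
```

In the quoted function, the two lines that build $head could then be replaced with $head = build_http11_request($url);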
Cheers
Ken
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Alex Beston" <alex at deltatraffic.co.uk>
Subject: [thelist] html page analyser program
: I've written a little PHP script which extracts useful info from a webpage.
:
: here's the URL:
:
: http://www.deltatraffic.co.uk/regexp/elements.php
:
: put in the full url of any page and it will come back at you with the info.
:
: now the problem is that if I try www.photos.org, the header shown in
: the prog is this:
:
: Headers Content:
:
: HTTP/1.1 400 Bad Request
: Date: Wed, 14 Jul 2004 11:00:50 GMT
: Server: Apache/1.2.6
: Connection: close
: Content-Type: text/html
:
: now, looking for 400 bad request on a google search,
:
: this page:
:
: http://www.codestyle.org/sitemanager/FAQ.shtml#why400
:
: sheds some light that their server isnt setup that well.
:
: however, when I run ethereal and use a browser, it returns a 200 OK
: code.
:
: GET / HTTP/1.1
: Host: www.photos.org
: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707
: Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
: Accept-Language: en-us,en;q=0.5
: Accept-Encoding: gzip,deflate
: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
: Keep-Alive: 300
: Connection: keep-alive
:
: HTTP/1.1 200 OK
: Date: Wed, 14 Jul 2004 11:05:46 GMT
: Server: Apache/1.2.6
: Keep-Alive: timeout=15, max=100
: Connection: Keep-Alive
: Transfer-Encoding: chunked
: Content-Type: text/html
:
: going back to my program, the code that gets the headers is this:
:
: function get_headers_php4($url)
: {
:     $url_info = parse_url($url);
:     $fp = fsockopen($url_info['host'],80,$errno,$errstr,30);
:     if (!$fp)
:     {
:         print("failed to get headers");
:         exit;
:     }
:     else
:     {
:         $head = "GET ".$url_info['path']."?".$url_info['query'];
:         $head .= " HTTP/1.0\r\nHost: ".$url_info['host']."\r\n\r\n";
:         fputs($fp,$head);
:         echo "<pre>";
:         while(!feof($fp))
:         {
:             $line = fgets($fp,1024);
:             echo($line);
:             if (strpos($line,"\r\n",0) === 0)
:             {
:                 fclose($fp); echo "</pre>";
:                 return $header;
:             }
:             else
:             {
:                 $header[] = $line;
:             }
:         }
:
:     }
: }
:
: so the question is, how can I modify this code so that I don't see a 400
: response code?
:
: maybe i ought to put some thing like:
:
: $head .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko";
:
: tried that but it comes back with the same 400 code.
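
One thing to check with an extra header like that is where it ends up in the request. Every HTTP header line has to end with "\r\n", and the header block is closed by a single empty line. The quoted function already ends $head with "\r\n\r\n", so anything appended afterwards would fall outside the headers. The following is a minimal sketch of one way to keep that straight. format_request() is a hypothetical helper, and the header values are simply copied from the browser capture above, not a recommendation.

```php
<?php
// Hypothetical helper (not from the original post): joins a request line
// and an array of header name => value pairs into one request string.
function format_request($request_line, $headers)
{
    $out = $request_line . "\r\n";
    foreach ($headers as $name => $value) {
        // Each header goes on its own CRLF-terminated line.
        $out .= $name . ": " . $value . "\r\n";
    }
    // A single empty line marks the end of the headers.
    return $out . "\r\n";
}

$request = format_request("GET / HTTP/1.1", array(
    "Host"       => "www.photos.org",
    "User-Agent" => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707",
    "Connection" => "close",
));
```

Building the headers from an array means the terminating blank line is written once, after the last header, no matter how many headers are added.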