[thelist] Grabbing some information (text and images) from a website with PHP

ben morrison morrison.ben at gmail.com
Thu Apr 23 03:54:23 CDT 2009


On Thu, Apr 23, 2009 at 4:26 AM, Phyu Phyu Aung <phyuag at gmail.com> wrote:

> I'd like to grab part of the description and an image from another website
> and show them on my website.
> Can anyone help me with a way to do this?
>
> Thanks in advance.


You could use YQL:

http://developer.yahoo.com/yql/

It lets you select parts of a page with an XPath expression, for example:

select * from html where url="http://finance.yahoo.com/q?s=yhoo" and
xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'

This takes the div with id="yfi_headlines", then its second child div, then
the ul > li > a elements inside it; in other words, it grabs all the headline
anchors from this HTML:

<div class="yfi_quote_headline" id="yfi_headlines">
  <div class="hd"><h2 class="inline">Headlines</h2><a href="http://edit.finance.yahoo.com/e1a?.done=http%3A%2F%2Ffinance.yahoo.com%2Fq%3Fs%3Dyhoo">Filter Headlines</a></div>
  <div class="bd">
    <ul>
      <li><a href="http://us.rd.yahoo.com/finance/external/cbsm/SIG=1291gpt38/*http://www.marketwatch.com/news/story/sad-tale-cuil-far-cool/story.aspx?guid=%7B4814DBD8%2DBCC5%2D4E3C%2D8902%2DEDC107773452%7D&siteid=yhoof">Therese Poletti's Tech Tales: The sad tale of Cuil is far from cool</a><cite>at MarketWatch<span>(Thu 12:01am)</span></cite></li>
      <li><a href="http://us.rd.yahoo.com/finance/external/wsj/SIG=11pb3828g/*http://online.wsj.com/article/SB124044762578645961.html?ru=yahoo&amp;mod=yahoo_hs">[$$] AT&T Backs Privacy Rules</a><cite>at The Wall Street Journal Online<span>(Wed 11:34pm)</span></cite></li>
      <li><a href="http://us.rd.yahoo.com/finance/external/reuters/SIG=1165jh9eu/*http://www.reuters.com/legacyArticle?duid=mtfh58520_2009-04-23_03-00-12_seo330624_newsml&rpc=44&type=marketsNews">EBay wins regulator approval for Gmarket deal</a><cite>at Reuters<span>(Wed 11:00pm)</span></cite></li>
      <li><a href="http://us.rd.yahoo.com/finance/editorial/xbizwk/SIG=12cecn6um/*http://www.businessweek.com/innovate/content/apr2009/id20090413_723482.htm?campaign_id=yhoo">Can Widgets Save the Television Industry?</a><cite>at BusinessWeek<span>(Wed 10:52pm)</span></cite></li>
      <li><a href="http://us.rd.yahoo.com/finance/external/wsj/SIG=11prs7l7s/*http://online.wsj.com/article/SB124033650697439719.html?ru=yahoo&amp;mod=yahoo_hs">Squeezed in the Middle</a><cite>at The Wall Street Journal Online<span>(Wed 9:03pm)</span></cite></li>
      <li><a href="http://biz.yahoo.com/ap/090422/lt_latam_markets.html?.v=1">Latin American stocks gain on credit moves, rates</a><cite>AP<span>(Wed 7:18pm)</span></cite></li>
      <li><a href="http://biz.yahoo.com/ap/090422/business_highlights.html?.v=1">Business Highlights</a><cite>AP<span>(Wed  5:49pm)</span></cite></li>
      <li><a href="http://biz.yahoo.com/ap/090422/us_wall_street_stocks.html?.v=2">Morgan Stanley, Ford, Yahoo, AirTran big movers</a><cite>AP<span>(Wed 5:33pm)</span></cite></li>
      <li><a href="http://biz.yahoo.com/ap/090422/na_us_dollar.html?.v=1">Dollar mixed on news global economy will shrink</a><cite>AP<span>(Wed 5:05pm)</span></cite></li>
      <li><a href="http://us.rd.yahoo.com/finance/external/cbsm/SIG=12ilvbi40/*http://www.marketwatch.com/news/story/yahoo-sandisk-lead-techs-higher/story.aspx?guid=%7B31367FD7%2DAB65%2D465B%2DAABE%2DAB481E914B40%7D&siteid=yhoof">Tech Stocks: Yahoo, SanDisk lead techs higher</a><cite>at MarketWatch<span>(Wed 5:03pm)</span></cite></li>
    </ul>
  </div>
  <div class="ft"><a href="/q/h?s=YHOO&t=2009-04-22T17:03:00-04:00" class="view_more">» More Headlines for YHOO</a><div class="myLinks"><p>Add YHOO Headlines  to My Yahoo!</p><a href="http://add.my.yahoo.com/add/module?url=http://finance.yahoo.com/rss/headline%3Fs=YHOO&lg=us" class="my_yahoo_us">+ My Yahoo!</a> <a href="http://us.rd.yahoo.com/finance/news/rss/add/*http://finance.yahoo.com/rss/headline?s=YHOO" class="rss_feed">RSS</a></div></div>
</div>
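
If you want to see what that expression picks out before trying it in YQL, here is a rough sketch in Python. (The thread is about PHP, where DOMXPath::query() would accept the path unchanged.) It uses the standard-library ElementTree, which supports only a subset of XPath, on a trimmed, well-formed copy of the markup above; the sample and the function name select_headlines are made up for illustration. Note that div[2] counts only div siblings, so it lands on the "bd" block and skips both "hd" and "ft".

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the markup above: two headlines only, with the bare
# '&' in the "More Headlines" href escaped as '&amp;' so the strict XML
# parser accepts it. Real pages are rarely well-formed enough for this.
SAMPLE = """
<html><body>
<div class="yfi_quote_headline" id="yfi_headlines">
  <div class="hd"><h2 class="inline">Headlines</h2></div>
  <div class="bd">
    <ul>
      <li><a href="http://biz.yahoo.com/ap/090422/business_highlights.html?.v=1">Business Highlights</a><cite>AP<span>(Wed  5:49pm)</span></cite></li>
      <li><a href="http://biz.yahoo.com/ap/090422/na_us_dollar.html?.v=1">Dollar mixed on news global economy will shrink</a><cite>AP<span>(Wed 5:05pm)</span></cite></li>
    </ul>
  </div>
  <div class="ft"><a href="/q/h?s=YHOO&amp;t=2009-04-22T17:03:00-04:00" class="view_more">More Headlines for YHOO</a></div>
</div>
</body></html>
"""

# The YQL XPath, prefixed with '.' because ElementTree's findall() only
# accepts paths relative to the element it is called on.
HEADLINE_PATH = ".//div[@id='yfi_headlines']/div[2]/ul/li/a"


def select_headlines(markup):
    """Return (text, href) for each anchor matched by HEADLINE_PATH."""
    root = ET.fromstring(markup.strip())
    return [(a.text, a.get("href")) for a in root.findall(HEADLINE_PATH)]


for text, href in select_headlines(SAMPLE):
    print(text, "->", href)
```

The "More Headlines" link is not printed because it lives in the third div, which is exactly what the div[2] step filters out.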


You can experiment in the YQL console; there are examples on the right-hand side:
http://developer.yahoo.com/yql/console/

Other options include fetching the page yourself with PHP and cURL and parsing the HTML.
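
For the PHP and cURL route, the usual shape is: download the page (curl_init(), curl_setopt() with CURLOPT_RETURNTRANSFER, curl_exec()), load it with DOMDocument::loadHTML(), then query it with DOMXPath. The sketch below shows the same two steps with Python's standard library instead, as an illustration under stated assumptions rather than a drop-in solution. fetch() plays the part of curl_exec(), and a small html.parser subclass collects the link text, hrefs and image src/alt values inside one element chosen by id, since the question asks for both text and images. The names fetch, BlockScraper and scrape, the User-Agent string and the sample markup are all hypothetical, and the depth tracking assumes elements inside the target are explicitly closed.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements that never get a closing tag; they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


def fetch(url, timeout=10):
    """Download a page as text (roughly what curl_exec() does in PHP)."""
    # Hypothetical User-Agent; some sites refuse the default one.
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class BlockScraper(HTMLParser):
    """Collect links and images inside the element whose id is target_id.

    Assumes every non-void element inside the target is explicitly closed;
    html.parser itself tolerates sloppiness such as unescaped '&'.
    """

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the target element
        self.links = []      # (text, href) pairs
        self.images = []     # (src, alt) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Outside the target: only look for its opening tag.
            if attrs.get("id") == self.target_id and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag == "img":
            self.images.append((attrs.get("src"), attrs.get("alt") or ""))
        elif tag == "a":
            self._href = attrs.get("href")
            self._text = []
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        if tag == "a" and self._href is not None:
            # Collapse whitespace and line breaks inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None
        self.depth -= 1

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def scrape(html, target_id):
    """Return (links, images) found inside the element with id=target_id."""
    parser = BlockScraper(target_id)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images


# Offline demo on a hand-written fragment (not real Yahoo! markup).
SAMPLE = """<div id="other"><a href="/skip">Not this one</a></div>
<div id="yfi_headlines"><div class="bd"><ul>
<li><img src="/logo.gif" alt="AP"><a href="/q/h?s=YHOO&t=1">Business
Highlights</a></li>
</ul></div></div>"""

links, images = scrape(SAMPLE, "yfi_headlines")
print(links)   # [('Business Highlights', '/q/h?s=YHOO&t=1')]
print(images)  # [('/logo.gif', 'AP')]
```

Against a live page you would call something like scrape(fetch(url), "yfi_headlines"); the 2009 Yahoo! Finance markup quoted above is unlikely to still be served, so expect to adjust the id for whatever site you target.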

-- 
Ben Morrison


