[thelist] Re: Google and PDF's

Laura Carlson lcarlson at d.umn.edu
Thu Feb 19 10:28:52 CST 2004


>> you can select all text from a pdf and paste it into another file
>> (edit->select all or ctrl+a) - is this what you mean?

> No, dynamically: I need to strip text from over 1,000 articles.

Would this technique from the University of Minnesota Web Accessibility 
Standards help?

A link can be created that passes the URL of a PDF document (as a query string) to an Adobe Acrobat conversion utility script on the access.adobe.com server. An HTML document is returned, which approximates the logical reading order of the text in the PDF document and formats it as a single column of text.

All existing hypertext links are converted into HTML links. This includes intra-document links as well as links to other documents on the Internet. Extra HTML links are also created to enable easy navigation between pages.

Link for the PDF version:
<a href="http://www.domain.com/example.pdf">Example Document</a>

Link for the HTML version:
<a href="http://access.adobe.com/perl/convertPDF.pl?url=http://www.domain.com/example.pdf">Convert "Example Document" to HTML</a>
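With over 1,000 articles, the conversion links could be generated by a script instead of written by hand. Below is a minimal Python sketch. It assumes the converter endpoint and its `url` parameter behave exactly as in the example link. The service dates from this post and may no longer be available. The names `conversion_url` and `conversion_link`, along with the sample article list, are hypothetical. Unlike the example above, the sketch percent-encodes the PDF URL so that a `?` or `&` inside it cannot break the outer query string. This relies on the assumption that the script decodes the parameter, as CGI query parsing normally does.

```python
import html
from urllib.parse import urlencode

# Endpoint copied from the example link; whether it still responds is an assumption.
CONVERTER = "http://access.adobe.com/perl/convertPDF.pl"


def conversion_url(pdf_url: str) -> str:
    """Return a URL asking the (assumed) converter to render pdf_url as HTML."""
    # urlencode percent-escapes the PDF URL, so characters such as '?' or '&'
    # in it are not mistaken for separators in the outer query string.
    return CONVERTER + "?" + urlencode({"url": pdf_url})


def conversion_link(pdf_url: str, title: str) -> str:
    """Return an HTML anchor in the style of the example link above."""
    # html.escape turns '&' into '&amp;' and escapes quotes, keeping the markup valid.
    href = html.escape(conversion_url(pdf_url))
    return f'<a href="{href}">Convert "{html.escape(title)}" to HTML</a>'


if __name__ == "__main__":
    # Hypothetical list standing in for the real article URLs and titles.
    articles = [("http://www.domain.com/example.pdf", "Example Document")]
    for url, title in articles:
        print(conversion_link(url, title))
```

For a plain URL such as the example PDF, the only difference from the hand-written link is the encoding, for instance `http%3A%2F%2F` in place of `http://`.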

For more info see:
http://cap.umn.edu/ait/Web/Downloads.html

Laura
___________________________________________
Laura L. Carlson
Information Technology Systems and Services
University of Minnesota Duluth
Duluth, MN  55812-3009
http://www.d.umn.edu/goto/webdesign/

