[thelist] Stripping HTML from Emails

Ken Schaefer ken at adOpenStatic.com
Tue May 25 09:11:16 CDT 2004


The "content-type" stuff is not "added by some email browsers". It's part of
the MIME spec. You have a multi-part MIME message, and what you are seeing
are the part boundaries and the meta-information (headers) describing each
part.

If you followed the MIME spec, or used an email client implementation, you'd
be able to get a collection of parts. Find the part(s) whose content type is
text/plain and extract the body of that part.
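
As a rough sketch of that approach (written in Python rather than PHP, only
because its standard "email" package makes the idea easy to show), the code
below parses a message, walks its parts and keeps the text/plain bodies. The
function name plain_text_parts, the boundary "XYZ" and the trimmed-down
sample message are made up for illustration; a PHP version would need an
equivalent MIME parser.

```python
from email import message_from_string


def plain_text_parts(raw_message):
    """Return the decoded body of every text/plain part in a raw message."""
    msg = message_from_string(raw_message)
    bodies = []
    # walk() yields the container itself and then each nested part.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue  # skips the multipart container and the text/html part
        # decode=True undoes the Content-Transfer-Encoding and returns bytes.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "us-ascii"
        bodies.append(payload.decode(charset, errors="replace"))
    return bodies


# Hypothetical, shortened version of the message quoted in this thread.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="XYZ"',
    "",
    "--XYZ",
    'Content-Type: text/plain; charset="US-ASCII"',
    "Content-Transfer-Encoding: 7bit",
    "",
    "LX0008",
    "LX0010",
    "",
    "--XYZ",
    'Content-Type: text/html; charset="US-ASCII"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "<HTML><BODY><DIV>LX0008</DIV></BODY></HTML>",
    "",
    "--XYZ--",
    "",
])

if __name__ == "__main__":
    for body in plain_text_parts(SAMPLE):
        # split() drops the surrounding whitespace and gives one code per item.
        print(body.split())
```

The point is that the parser handles the boundaries and per-part headers, so
the script never has to strip them out by hand.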

If you are doing this by treating the email message as a bit of plain text
(rather than as a MIME message), then things will be a bit harder for you.

Surely there must be some kind of component you can call from PHP that acts
like a POP3 email client?

Cheers
Ken

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: <admin at antonakis.co.uk>
Subject: Re: [thelist] Stripping HTML from Emails


: Liam,
:
: Thanks for that, however it won't work.
:
: Perhaps I didn't make myself clear in what I meant by stripping out the
: HTML from emails. I didn't just mean the HTML itself, I meant everything
: else associated with it as well.
:
: For example, here is an HTML email minus its headers:
: Basically I just want the LX0008 & LX0010 part, and not all the
: content-type, etc that's added by some email browsers.
:
: MIME-Version: 1.0
: Content-Type: multipart/alternative;
: boundary="-----------------------------1085435574"
: X-Mailer: 9.0 for Windows sub 630
:
:
: -------------------------------1085435574
: Content-Type: text/plain; charset="US-ASCII"
: Content-Transfer-Encoding: 7bit
:
: LX0008
: LX0010
:
: -------------------------------1085435574
: Content-Type: text/html; charset="US-ASCII"
: Content-Transfer-Encoding: quoted-printable
:
: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
: <HTML><HEAD>
: <META http-equiv=3DContent-Type content=3D"text/html; charset=3DUS-ASCII">
: <META content=3D"MSHTML 6.00.2737.800" name=3DGENERATOR></HEAD>
: <BODY id=3Drole_body style=3D"FONT-SIZE: 10pt; COLOR: #000000;
: FONT-FAMILY:=20=
: Arial"=20
: bottomMargin=3D7 leftMargin=3D7 topMargin=3D7 rightMargin=3D7><FONT
: id=3Drol=
: e_document=20
: face=3DArial color=3D#000000 size=3D2>
: <DIV>LX0008</DIV></FONT></BODY></HTML>
:
: Regards
: Alexis
: -------------------------------1085435574--
:
: Liam Delahunty wrote:
: > on 24/05/2004 22:04 Alexis Antonakis wrote:
: >
: >> Hi,
: >>
: >> I have a script written in PHP which extracts details from emails.
: >> Everything works fine for plain text emails, but HTML ones are a
: >> nightmare.
: >> Can anybody point me in the right direction as to how I can deal
: >> with these emails.



