[thelist] character encoding nightmare

Garth Hagerman hagerman at mcn.org
Thu Mar 16 23:35:50 UTC 2017

>How are you copying data from DB 1 to DB 2? Or are you sending data to both at the same time via the same script? Either way, how are you escaping / making the data safe for insertion into the DB?

The updating scripts are not public; they’re in a password-protected folder for just me and my client. So, I’m not concerned about injection attacks. Maybe I should be.

When we update, we enter data into forms on site1 (universitypressaudiobooks.com). The data goes through a script which uses htmlentities(). It also does some other character replacements, but I don’t think those are likely to be the issue. The entity-ized strings get inserted or updated into DB1, and the updating page then auto-submits a form with the same strings to a page on site2 (militaryaudiobooks.com), which runs the same query with what’s supposedly the same data against DB2. But somehow the apostrophes and quotes go cuckoo on site2.
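
To illustrate the kind of gibberish involved, here is a minimal Python sketch (not the PHP the update scripts use) of what happens when a curly apostrophe is written as UTF-8 by one side and read as Windows-1252 by the other. MySQL's latin1 character set is effectively Windows-1252. The function name misread and its parameters are made up for illustration, and whether this is the actual failure on site2 is an assumption, not something confirmed.

```python
# Hypothetical helper: encode text with one charset, decode it with another,
# the way a mismatched form page / DB connection pair would.
def misread(text, written_as="utf-8", read_as="cp1252"):
    return text.encode(written_as).decode(read_as)

apostrophe = "\u2019"          # right single quotation mark (curly apostrophe)

# UTF-8 bytes E2 80 99 read one byte at a time as Windows-1252.
garbled = misread(apostrophe)
print(garbled)                 # prints: â€™

# The damage is reversible as long as the bytes were not altered again.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == apostrophe)  # prints: True

# The opposite mismatch: the single Windows-1252 byte 0x92 is not valid
# UTF-8, so a lenient decoder substitutes the replacement character.
opposite = apostrophe.encode("cp1252").decode("utf-8", errors="replace")
print(opposite == "\ufffd")    # prints: True
```

If the gibberish on site2 looks like the three-character sequence above, that would suggest one step in the chain (the form page, the receiving script, or the DB connection) is treating UTF-8 bytes as a single-byte encoding.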


> On Mar 16, 2017, at 4:02 PM, Garth Hagerman <hagerman at mcn.org> wrote:
> Hello-
> Is The List still alive?
> I’ve got a problem which would probably be simple to solve if I knew more about character encoding, but, since I don’t, I’ve been flailing wildly and wasting a lot of time.
> I’ll give you the short version. If somebody replies, I can supply further details.
> I’ve got two MySQL databases running on different hosts. I’ve written a series of scripts which keep certain tables and fields synced between them.
> The basic mechanics of this work fine, but certain characters turn to gibberish on the second site. The exact same html entity-ized strings are being inserted into both DBs, but apostrophes, etc. turn to gibberish on the second site.
> Originally, I had everything with the sites’ pages using iso-8859-1 encoding and the DBs set for Latin-Swedish collation. As I read about encoding problems, it was suggested I convert to UTF-8. That made things worse.
> I really don’t know what to try.
>  thanks in advance
> Garth

