[thelist] sql: order by and umlauts/entities

Joe Crawford jcrawford at avencom.com
Tue Apr 24 16:40:04 CDT 2001


rudy wrote:
> hi joxn
> 
> sounds to me like it's not a question of the  ö  sorting ahead of the  a
> but rather the  &  sorting ahead of the  a
> 
> if somehow the entities *are* getting evaluated, then yes, if you control
> the server and happen to have an alternate character set lying around, you
> can change the sort sequence, but i wouldn't go near something like that
> 
>      "10.1.1 The Character Set Used for Data and Sorting
>        By default, MySQL uses the ISO-8859-1 (Latin1) character set....
>        The character set determines what characters are allowed in
>        names and how things are sorted by the ORDER BY and GROUP BY
>        clauses of the SELECT statement. You can change the character set
>        with the --default-character-set option when you start the server.
> http://www.mysql.com/documentation/mysql/bychapter/manual_Server.html#Chara
> cter_sets
> 
> maybe use a regexp? (i'm guessing, because don't know how to write
> those)...

I have a small tidbit of regular expressions in MySQL - it involves
using rlike - which is a function to do a match for regular expressions.
I used it so I could do strange stuff like getting all the entries which
start with B, R, T and Y.

So on the page below I'd have a querystring where slice=BRTY

$query = "select
          id,
          last,
          first,
          address,
          city,
          state,
          zip,
          phone_home,
          phone_work,
          phone_fax,
          phone_cell,
          email,
          category
from      crawford_addresses";
if(isset($slice))
        { $query .= " where last rlike '^[$slice]'";
        };


Then it would grab those entries where "last" matches the regular
expression ^[BRTY]
(the beginning of the string, then B, R, T, or Y immediately following.

I'm wondering if maybe there is similar functionality in a SORT BY? So
that if an entry had an ampersand AND was also an entity, it would go
through some additional process?

> it would have to search the string for each entity and replace it with the
> unaccented character for sorting purposes, i.e. replace the six characters
> "ö" with the single character "o"
> 
> and you'd have to do that for all the entities you're likely to find in the
> table

This could get hairy - so knowing what entities to expect would be key.

A lazy way to go would be to make an index of all entries with the same
names, but with the entities replaced by standard ascii. Ugly, but
probably how I, not knowing any better, might approach it. :-\

	- Joe <http://artlung.com/>
--
Joe Crawford ||||||||||||||       mailto:jcrawford at avencom.com
||||||||||||||||||||||||             http://www.avencom.com
|||||||||||||||||||||||||||      Avencom: Set Your Sites Higher




More information about the thelist mailing list