[thelist] [TIP] - Use UTF-8 whenever possible, or get used to extra doses of caffeine.

Luther, Ron Ron.Luther at hp.com
Thu May 11 08:29:39 CDT 2006

VOLKAN ÖZÇELIK offered some suggestions:

Hi Volkan!

>>Well I guess it depends on the type of application. I generally expect the users to type the extended characters correctly.

Well yeah, getting users to type things correctly would certainly be helpful.   ;-P

I was thinking more of a case where the data contains several spellings of the same item. Let's say we're looking at 
purchase order data from around the world: for POs entered in the US, buyers type in "battery", but for POs 
entered in Brazil, buyers enter "bẫttery".  [I know, probably a contrived example; it's for illustrative purposes only.]

Now, unless you write your own extremely fancy search engine, I suspect that a single standard search initiated by 
an end user will return either the US-entered 'battery' POs or the Brazilian ones, but not both.  My guess is that the 
user will expect the results to be complete. ["Hell, I entered 'battery', didn't I?"]

>>Just out of curiosity I did several searches on google to test it:

Thanks for taking the question seriously ... But I suspect Google does some pretty fancy stuff and doesn’t represent 
the kind of results you would get from a simple "SELECT <stuff> FROM <UTF-8 back end> WHERE <field> = [user input]".

>>It is easy to write a conversion function that takes "bahçeçİlİk" and converts it to "bahcecilik"

Yup. In the past, my European counterparts did a nice job of stripping out and converting the multi-byte characters in 
their data. My Latin American and Asian buddies did not do quite so good a job.
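
For illustration, here is a minimal sketch of the kind of conversion function Volkan describes, written in Python. 
The function name fold_accents is made up for this example, and the approach (decompose, drop combining marks, 
lowercase) is only one way to do it; it is not necessarily what anyone in this thread actually used.

```python
import unicodedata


def fold_accents(text):
    """Return a lowercase copy of text with combining accent marks removed.

    Hypothetical helper for illustration only. It handles letters that
    Unicode can decompose into a base letter plus marks (c-cedilla,
    dotted capital I, and so on). Letters with no decomposition, such as
    the Turkish dotless i (U+0131), pass through unchanged and would
    need an explicit mapping table.
    """
    # NFKD splits each accented letter into its base letter followed by
    # one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


print(fold_accents("bahçeçİlİk"))
print(fold_accents("bẫttery"))
```

Lowercasing after the marks are stripped avoids the leftover combining dot that lowercasing a dotted capital I 
would otherwise produce.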

>>SELECT (WHATEVER) FROM ADozenOfJoinedTables WHERE field LIKE "bahçeçİlİk" OR field LIKE "%bahcecilik%"
>>(using LIKE operator excessively can kill performance) at a cost of extra storage space.

I agree with you on the performance.  I try to stay away from LIKE operators as well.  

However, that is still an interesting idea about maintaining a 'shadow' search field with the characters adjusted ... I'll 
have to think about that some more.  I've used some similar techniques (displaying the 'long text' in the user drop-down, 
but using the 'short text' for the SQL). Perhaps I can get the designers to add a few of these for 'fields highly likely to 
be searched like this'.
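
A rough sketch of the 'shadow' field idea, using Python with an in-memory SQLite database. The table name 
purchase_orders, the column item_search, and the helpers fold_accents and search_items are all invented for this 
example; a real back end would have its own schema and might fill the shadow column with a trigger or a batch job 
instead.

```python
import sqlite3
import unicodedata


def fold_accents(text):
    # Hypothetical helper: decompose, drop combining marks, lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_orders ("
    " po_id INTEGER PRIMARY KEY,"
    " country TEXT,"
    " item TEXT,"          # what the buyer typed, kept for display
    " item_search TEXT)"   # shadow copy with accents folded, used for lookups
)
# Index the shadow column so lookups can use plain equality instead of LIKE.
conn.execute("CREATE INDEX idx_item_search ON purchase_orders (item_search)")

for country, item in [("US", "battery"), ("BR", "bẫttery"), ("US", "cable")]:
    conn.execute(
        "INSERT INTO purchase_orders (country, item, item_search) VALUES (?, ?, ?)",
        (country, item, fold_accents(item)),
    )


def search_items(conn, user_input):
    # Fold the user's input the same way the shadow column was folded.
    cur = conn.execute(
        "SELECT country, item FROM purchase_orders"
        " WHERE item_search = ? ORDER BY po_id",
        (fold_accents(user_input),),
    )
    return cur.fetchall()


# Equality on the original column only finds the US spelling.
exact_countries = [
    row[0]
    for row in conn.execute(
        "SELECT country FROM purchase_orders WHERE item = ? ORDER BY po_id",
        ("battery",),
    )
]
# Equality on the shadow column finds both spellings.
shadow_countries = [row[0] for row in search_items(conn, "battery")]
print(exact_countries, shadow_countries)
```

The trade-off is the one Volkan mentions: one extra column of storage per searchable field, in exchange for an 
indexed equality match instead of LIKE.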

Thanks!  I'll try pushing in that direction.

