[thelist] Address Standardization

Luther, Ron Ron.Luther at hp.com
Wed Dec 11 07:41:01 CST 2002


Hi Tab,

I haven't followed developments in this area for a long time, but it
used to be painful to do well.  (I think it was called 'intelligent
matching', and it was sometimes quite hard.)

IIRC, company names and street addresses were particularly tough.  (For
example, the company that makes Mustangs and Thunderbirds might show up in
your records as "Ford", "FoMoCo", "Ford Motor", "Ford Inc.", "Ford Motor
Company", and a dozen other variants differing in spelling, spacing,
capitalization, and punctuation.)
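
A crude first pass at the spacing/capitalization/punctuation part is to
compare names on a normalized key instead of the raw string.  This is only
a sketch: the SUFFIXES set and the function names are hypothetical, and a
key like this will NOT connect 'FoMoCo' or plain 'Ford' to 'Ford Motor';
those still need an alias table or a human.

```python
import re

# Hypothetical list of trailing legal-form words to ignore; extend for your data.
SUFFIXES = {"inc", "co", "corp", "corporation", "company", "ltd", "llc"}

def match_key(name: str) -> str:
    """Reduce a company name to a key that ignores case, punctuation and spacing."""
    # Lowercase, turn punctuation into spaces, then split (which collapses runs of spaces).
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Drop trailing suffixes such as "Inc." or "Co." (this is an assumption, not a rule).
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def group_variants(names):
    """Group raw variants under their key so a reviewer can see them side by side."""
    groups = {}
    for n in names:
        groups.setdefault(match_key(n), []).append(n)
    return groups
```

With this, "Ford Motor Company" and "FORD  MOTOR CO." share the key
"ford motor", while "FoMoCo" stays in a group of its own for someone to
decide on.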

I would think States and Countries wouldn't be too bad, since the lists
aren't all that big and there is pretty good agreement on what is 'correct'
(both parts are important).  I'd approach them like this:

1) Create/acquire a 'good' list.
2) Run your data against the known 'good' values and drop the exceptions
to a temp file.
3) Add a field to the temp file for 'corrected' values.
4) Have someone manually review and populate that field.
5) Update and correct your data.  (This also provides you with an audit
trail explaining what got changed and how it was changed.)
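
For a single field such as State, steps 1 through 5 might look roughly
like the Python sketch below.  GOOD_STATES is a made-up stand-in for a
real 'good' list, and the function names and the two-column CSV layout
('original', 'corrected') are my own assumptions, not any particular
product.

```python
import csv

# Step 1: a 'good' list. This is a hypothetical subset of US state codes;
# in practice you would load a complete, authoritative list.
GOOD_STATES = {"CA", "NY", "TX", "WA"}

def find_exceptions(records, field, good_values):
    """Steps 2-3: distinct values not on the good list, each with an empty 'corrected' slot."""
    bad = sorted({r[field] for r in records if r[field] not in good_values})
    return [{"original": v, "corrected": ""} for v in bad]

def write_review_file(exceptions, path):
    """Write the exceptions to a temp CSV for someone to fill in by hand (step 4)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["original", "corrected"])
        writer.writeheader()
        writer.writerows(exceptions)

def read_review_file(path):
    """Read the reviewed CSV back in."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def apply_corrections(records, field, reviewed):
    """Step 5: apply only the filled-in corrections and return an audit trail."""
    fixes = {row["original"]: row["corrected"] for row in reviewed if row["corrected"]}
    audit = []
    for i, r in enumerate(records):
        old = r[field]
        if old in fixes:
            r[field] = fixes[old]
            audit.append((i, field, old, fixes[old]))  # (row, field, old value, new value)
    return audit
```

Blank 'corrected' values are skipped, so nothing changes unless someone
has actually made a decision, and the returned tuples double as the audit
trail of what got changed and how.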

City names would be more painful, not only because of the volume but
also because of duplicates and near-duplicates across States.  (Not to
mention cities where the 'correct' spelling itself may be in dispute:
you can find references to both 'McAllen, Texas' and 'Mc Allen, Texas',
for example.)
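
For cities, it probably helps to compare only against names in the same
State and to treat the output as suggestions for the reviewer rather than
automatic fixes.  The sketch below uses Python's difflib as an arbitrary
stand-in for 'intelligent matching'; known_cities, squash, suggest_city
and the 0.85 cutoff are all hypothetical and would need tuning.

```python
import difflib

def squash(name: str) -> str:
    """Comparison key: lowercase letters/digits only, so 'Mc Allen' -> 'mcallen'.
    str.isalnum keeps accented letters, which matters for French place names."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def suggest_city(city, state, known_cities, cutoff=0.85):
    """Return accepted city names in the same state that closely match `city`.

    known_cities is a hypothetical dict: state code -> list of accepted names.
    The result is a list of suggestions for a human, not a correction.
    """
    by_key = {}
    for c in known_cities.get(state, []):
        by_key.setdefault(squash(c), []).append(c)
    keys = difflib.get_close_matches(squash(city), list(by_key), n=3, cutoff=cutoff)
    return [name for key in keys for name in by_key[key]]
```

Because both 'McAllen' and 'Mc Allen' squash to the same key, a lookup
for 'Mcallen' in Texas surfaces both spellings and leaves the choice to
whoever reviews the exceptions, while a city from another State is never
offered.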

Whichever way you go, I would recommend carefully monitoring what records
and values actually get changed.  (Quebecois residents may object to having
'Saint Paule' replaced with 'Saint Paul' for example.)

Good Luck & HTH (some),

RonL.

-----Original Message-----
From: Tab Alleman [mailto:Tab.Alleman at MetroGuide.com]

Does anybody have or know of some software that "guesses" at misspelled
address data and corrects it?

We'd want it to do the same for States and Countries as well.


