[thelist] CF: List Manipulations

Frank framar at interlog.com
Fri Oct 19 17:00:24 CDT 2001


>: I've created a little utility to import a tab/return/comma delimited
>: text file into a database. I've got two questions.
>:
>: 1) I want to make it a little more robust, so that it returns an
>: error if the list is somehow inconsistent. Can someone suggest
>: how I might go about verifying the consistency of a list?

Clarification on the above. The utility serves to allow the user to 
upload a text file that has been exported by something like an 
emailer or spreadsheet. The reason I'm doing this is that the target 
user is rather unsophisticated.

>  1. Does each delimited item in the import file represent
>  a single record?

>  2. Does each row of data in the import file represent a
>  record where each otherwise delimited item represents a
>  field in that record?

Yes. One list item is one email address, to be imported into a 
table where each email address is one row.

>  3. Are any values quoted?

That depends on whatever program exports the file. I would expect a 
comma-delimited export to quote its values, but it's not a given.

>  4. If so, and the field and/or record delimiters appear
>  within the quotes, are they included within the field?

No. An ordinary email address does not contain commas, returns or 
tabs, so that would preclude them.
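
Since none of the three delimiters is expected inside an address, one option is to split on all of them at once instead of guessing which one the file uses. A minimal sketch, in Python purely for illustration (the function name split_items is hypothetical, and stripping surrounding quotes is an assumption about how a spreadsheet export might look):

```python
import re

def split_items(text):
    """Split raw text on commas, tabs and line breaks in one pass."""
    # Any run of delimiter characters counts as a single separator.
    items = re.split(r"[,\t\r\n]+", text)
    # Trim whitespace first, then any surrounding single or double quotes.
    cleaned = (item.strip().strip("\"'") for item in items)
    # Drop the empty strings left by leading or trailing delimiters.
    return [item for item in cleaned if item]
```

For example, split_items('a@x.com,"b@y.org"\r\nc@z.net') gives one list entry per address regardless of which delimiter separated them.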

>  5. If row=record, is there a set number of fields that
>  you expect in each row?

One field, one row.

>  6. Are you able to determine ahead of time the number of
>  expected records?

No, that's variable.

>: 2) I've been trying to figure out how to get my app to identify
>: what delimits a list. Imagine I have a list of emails, delimited
>: by commas. From a machine point of view '@', '.' or ',' might be
>: the delimiters. Now if one of the emails happens to be missing
>: the '@' or the '.', one can see how quickly a problem could arise.

What I mean by this, more specifically: a human can recognize an 
email address by sight, even if it's malformed. But how on earth does 
one instruct a machine to detect that pattern when the address may 
be malformed? If it were a given that all emails would be perfect, 
I could use a simple regex like this: 
[A-Za-z0-9]+@[A-Za-z0-9]+\.[A-Za-z]+

It might not be, though: someone might have abc@domain with no .* 
part (no top-level domain). That might cause the next email to be 
read as abc@domaindef@domain.com, and thus I would have a corrupt 
entry.
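
One way to catch cases like these is to test each item against the pattern as a whole and collect the failures, rather than searching for the pattern inside a longer string. The sketch below is only an illustration in Python: the names EMAIL_RE and check_items are hypothetical, and the pattern is a loosened version of the simple regex above (it also allows dots, hyphens and multi-part domains), not a full address validator.

```python
import re

# Assumed pattern: simple on purpose, it will reject some legal addresses.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]+"
)

def check_items(items):
    """Return (good, bad): items that fully match the pattern, and the rest."""
    good, bad = [], []
    for item in items:
        # fullmatch requires the entire item to fit the pattern, so a run-on
        # such as two addresses glued together is rejected.
        if EMAIL_RE.fullmatch(item):
            good.append(item)
        else:
            bad.append(item)
    return good, bad
```

If the second list comes back non-empty, the import can stop and show the user exactly which entries look wrong.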

This is what I mean by ensuring that the list data is consistent. So 
I would like to be able to figure out how the list is delimited 
without the necessity for the user to know, and then to ensure that 
what I import is consistent.
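
A possible heuristic for this: try each candidate delimiter, split the text on it, and keep the delimiter that yields the most items that look like complete addresses. The Python sketch below is an assumption-laden illustration, not a tested solution; guess_delimiter, CANDIDATES and the loose pattern are all hypothetical, and only the three delimiters mentioned above are tried.

```python
import re

# Loose check: exactly one '@', no whitespace or commas, a dotted ending.
EMAIL_RE = re.compile(r"[^@\s,]+@[^@\s,]+\.[A-Za-z]+")
CANDIDATES = [",", "\t", "\n"]

def guess_delimiter(text):
    """Return (delimiter, valid_count, total_count) for the best candidate."""
    # Normalize Windows and old Mac line endings to a single newline.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    best = None
    for delim in CANDIDATES:
        items = [part.strip() for part in text.split(delim) if part.strip()]
        valid = sum(1 for part in items if EMAIL_RE.fullmatch(part))
        # Prefer more valid items; on a tie, prefer fewer items overall.
        score = (valid, -len(items))
        if best is None or score > best[0]:
            best = (score, delim, valid, len(items))
    return best[1], best[2], best[3]
```

The list can be treated as consistent only when valid_count equals total_count; anything less means at least one entry needs a human look before import.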
-- 

Our best destiny, as planetary cohabitants, is the development
of what has been called "species consciousness" - something over
and above nationalisms, blocs, religions, ethnicities.


Frank Marion                      Framar Studios
frank at framarstudios.com           http://www.framarstudios.com



