[thelist] Text Comparison

Ken Moore psm2713 at hotmail.com
Thu Jun 8 15:24:53 CDT 2006


Hi all,

John Hicks has a text processing question.

You need to use the concept of "tokens". Compilers use them when they check 
for errors: when a compiler sees an "if", it starts looking for a "then". In 
the example below, "To", "or", and "the question" are the tokens.
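
The question leaves open how the tokens get chosen. One possible approach,
offered only as a sketch, is to take the words that appear in every line in
the same order, using a longest-common-subsequence (LCS) pass folded across
the lines. The names lcs and common_words are made up for illustration, and
folding pairwise is a heuristic: it is not guaranteed to find the longest
sequence shared by all lines.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two word lists (classic DP)."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one LCS.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def common_words(lines):
    """Words shared by all lines, in order (pairwise-folded heuristic)."""
    if not lines:
        return []
    return reduce(lcs, (line.split() for line in lines))
```

On the three sample lines quoted below, this yields To, or, the, question.
(with the period attached, since words are split on whitespace only).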

You will need to process the list twice. The first time, find the maximum 
number of characters (words and spaces) between any two consecutive tokens 
in the input strings. Below, there are 6 before the "or" in line 3 and 19 
before "the question" in line 1.

The second time through, you will know how many extra spaces to put before 
each token.
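
The two passes could be sketched in Python roughly as follows. This assumes
the tokens are already known and occur in the same order in every line.
split_on_tokens and align are hypothetical names, and matching is done on
whole whitespace-separated words, so the last token has to be written with
its period.

```python
def split_on_tokens(line, tokens):
    """Return (gaps, tail): the text before each token, and what follows the last one."""
    words = line.split()
    gaps = []
    pos = 0
    for token in tokens:
        token_words = token.split()
        n = len(token_words)
        # Find the next whole-word occurrence of the token at or after pos.
        for start in range(pos, len(words) - n + 1):
            if words[start:start + n] == token_words:
                break
        else:
            raise ValueError(f"token {token!r} not found in {line!r}")
        gaps.append(" ".join(words[pos:start]))
        pos = start + n
    return gaps, " ".join(words[pos:])


def align(lines, tokens):
    parsed = [split_on_tokens(line, tokens) for line in lines]

    # Pass 1: widest stretch of text found before each token.
    widths = [max(len(gaps[i]) for gaps, _ in parsed)
              for i in range(len(tokens))]

    # Pass 2: pad each gap to that width so the tokens share a column.
    result = []
    for gaps, tail in parsed:
        parts = []
        for gap, width, token in zip(gaps, widths, tokens):
            if width:  # skip columns that are empty in every line
                parts.append(gap.ljust(width))
            parts.append(token)
        if tail:
            parts.append(tail)
        result.append(" ".join(parts))
    return result


if __name__ == "__main__":
    sample = [
        "1 To be or not to be that is the question.",
        "2 To or not that's the question.",
        "3 To been or to bend is the question.",
    ]
    for row in align(sample, ["To", "or", "the question."]):
        print(row.replace(" ", "_"))
```

The padding goes right before each token, so the words inside a gap end up
left-aligned rather than lined up word by word as in the sample output.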

Ken

> > Let's say this is the input:
> >
> > 1 To be or not to be that is the question.
> > 2 To or not that's the question.
> > 3 To been or to bend is the question.
> >
> > The output is like this, but I will use _ instead of spaces:
> >
> > 1_To___be_or_not_to_be_that_is_the_question.
> > 2_To______or_not_______that's__the_question.
> > 3_To_been_or_____to_bend____is_the_question.
> >
> > So now you can easily see that all texts begin with 'To' and then have
> > 'or' and end with 'the question.' Now you can also see where some texts
> > have more words or less words or different words than the other texts.
> > All the algorithm must do is figure the common words and then output
> > them lined up vertically so one can visually scan and see the
> > differences easily.
