[thelist] Text Comparison

Hershel Robinson hershel at galleryrobinson.com
Thu Jun 8 13:30:15 CDT 2006


Peter Brunone (EasyListBox.com) wrote:
> I'm not even sure I understand the requirement.  Users are going to enter sentences without spaces and you're going to split them up into their component words?  That doesn't make sense, so I'm pretty sure I don't get it.
>
> What's the final goal here?

Let me clarify. This is a tool to compare various texts of ancient 
manuscripts. So the idea is to be able to easily see which texts have 
extra words or are missing words, compared to other texts.

Perhaps the spaces didn't render correctly in everyone's mail client. 
Let's say this is the input:

1 To be or not to be that is the question.
2 To or not that's the question.
3 To been or to bend is the question.

The output is like this, but I will use _ instead of spaces:

1_To___be_or_not_to_be_that_is_the_question.
2_To______or_not_______that's__the_question.
3_To_been_or_____to_bend____is_the_question.

So now you can easily see that all texts begin with 'To', then have 
'or', and end with 'the question.' You can also see where some texts 
have more words, fewer words, or different words than the other texts. 
All the algorithm must do is figure out which words the texts have in 
common and then output them lined up vertically, so one can visually 
scan and see the differences easily.

It's easier to see with spaces, but this is the basic idea.
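
To make the idea concrete, here is a rough Python sketch of one possible
approach. It is only an illustration under some assumptions, not a
finished algorithm: the function names align_texts and render are made
up for this example, the texts are passed in without their leading
numbers, and words only count as "common" when they match exactly
(ignoring case and trailing punctuation). It uses the standard difflib
module to merge each text, one at a time, into a growing list of
columns, so the result depends on the order of the texts and similar
words such as 'be' and 'been' end up in separate columns rather than
sharing one.

```python
from difflib import SequenceMatcher
from typing import List, Optional

Column = List[Optional[str]]  # one entry per text: a word, or None for a gap


def _key(word: str) -> str:
    """Comparison key: lowercase, trailing/leading punctuation ignored."""
    return word.lower().strip(".,;:!?")


def align_texts(texts: List[str]) -> List[Column]:
    """Merge the texts one by one into columns of matching words."""
    rows = [t.split() for t in texts]
    columns: List[Column] = [[w] for w in rows[0]]
    for r, words in enumerate(rows[1:], start=1):
        # Each existing column is represented by its first non-gap word.
        col_keys = [_key(next(w for w in col if w is not None)) for col in columns]
        new_keys = [_key(w) for w in words]
        matcher = SequenceMatcher(None, col_keys, new_keys, autojunk=False)
        merged: List[Column] = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Same word: the new text joins the existing column.
                for i, j in zip(range(i1, i2), range(j1, j2)):
                    merged.append(columns[i] + [words[j]])
            else:
                # Existing columns get a gap for the new text ...
                for i in range(i1, i2):
                    merged.append(columns[i] + [None])
                # ... and unmatched new words get columns of their own.
                for j in range(j1, j2):
                    merged.append([None] * r + [words[j]])
        columns = merged
    return columns


def render(columns: List[Column], gap: str = "_") -> List[str]:
    """Return one padded line per text, using `gap` as filler and separator."""
    widths = [max(len(w) for w in col if w is not None) for col in columns]
    lines = []
    for r in range(len(columns[0])):
        cells = [(col[r] or "").ljust(width, gap)
                 for col, width in zip(columns, widths)]
        lines.append(gap.join(cells))
    return lines


texts = [
    "To be or not to be that is the question.",
    "To or not that's the question.",
    "To been or to bend is the question.",
]
for number, line in enumerate(render(align_texts(texts)), start=1):
    print(number, line)
```

Every word gets a column of its own here, so the lines come out wider
than in my hand-made example above, but 'To', 'or', 'the' and
'question.' still line up vertically across all three texts. Treating
near matches like 'that' and "that's" as the same column would need
some kind of fuzzy comparison on top of this.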

Hershel

-- 
Gallery Robinson Web Services
http://web.galleryrobinson.com/