[thelist] Text Comparison

Peter Brunone (EasyListBox.com) peter at easylistbox.com
Thu Jun 8 12:41:10 CDT 2006


Aha!  I get it now.  It would still be nice to know what happened to the spaces, though.

This sounds very similar to the presentation you get with source/version control software like Visual SourceSafe (though in that case it's usually line breaks rather than spaces).  Any chance you could find a free SCC app and bend it to your will?

Just a thought...

Peter

From: Hershel Robinson hershel at galleryrobinson.com

Peter Brunone (EasyListBox.com) wrote:
> I'm not even sure I understand the requirement. Users are going to enter sentences without spaces and you're going to split them up into their component words? That doesn't make sense, so I'm pretty sure I don't get it.
> 
> What's the final goal here?

Let me clarify. This is a tool to compare various texts of ancient 
manuscripts. So the idea is to be able to easily see which texts have 
extra words or are missing words, compared to other texts.

Perhaps the spaces didn't render correctly in everyone's mail client. 
Let's say this is the input:

1 To be or not to be that is the question.
2 To or not that's the question.
3 To been or to bend is the question.

The output is like this, but I will use _ instead of spaces:

1_To___be_or_not_to_be_that_is_the_question.
2_To______or_not_______that's__the_question.
3_To_been_or_____to_bend____is_the_question.

So now you can easily see that all texts begin with 'To', then have 
'or', and end with 'the question.' You can also see where some texts 
have more words, fewer words, or different words than the other texts. 
All the algorithm must do is figure out the common words and then output 
them lined up vertically so one can visually scan and see the 
differences easily.

It's easier to see with spaces, but this is the basic idea.
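
One way to approach the "figure out the common words, then line them up" step is sketched below in Python. This is only an illustration under stated assumptions, not a finished tool: the function names (lcs, split_at_anchors, align) are made up for this example, words are compared exactly (so 'To' and 'to' differ), and the common words are found by folding a pairwise longest-common-subsequence over the texts, which is a simple heuristic rather than an optimal multi-text alignment. Words that fall between two common words are just left-justified in a shared gap, so the result will not match the hand-made layout above column for column.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two word lists (dynamic programming)."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one LCS.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def split_at_anchors(words, anchors):
    """Return the runs of words before, between, and after the anchors."""
    gaps, current = [], []
    remaining = list(anchors)
    for word in words:
        if remaining and word == remaining[0]:
            gaps.append(current)
            current = []
            remaining.pop(0)
        else:
            current.append(word)
    gaps.append(current)
    return gaps


def align(texts, fill=" "):
    """Line up the words common to all texts; pad the gaps between them."""
    rows = [t.split() for t in texts]
    # Words shared by every text, in order (pairwise heuristic).
    anchors = reduce(lcs, rows)
    gap_rows = [split_at_anchors(r, anchors) for r in rows]
    # Each gap is as wide as the longest run of words any text puts there.
    widths = [max(len(" ".join(g[k])) for g in gap_rows)
              for k in range(len(anchors) + 1)]
    lines = []
    for gaps in gap_rows:
        cells = []
        for k, width in enumerate(widths):
            if width:
                cells.append(" ".join(gaps[k]).ljust(width))
            if k < len(anchors):
                cells.append(anchors[k])
        lines.append(" ".join(cells).rstrip().replace(" ", fill))
    return lines


texts = [
    "To be or not to be that is the question.",
    "To or not that's the question.",
    "To been or to bend is the question.",
]
for line in align(texts, fill="_"):
    print(line)
```

With the three sample texts, the words common to all of them come out as 'To', 'or', 'the' and 'question.', and those are what end up in the same columns on every line. Lining up the words inside each gap as well (so that 'be' sits over 'been', for instance) would need a fuzzier comparison than exact equality.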

Hershel


