[thelist] Text Comparison
Hershel Robinson
hershel at galleryrobinson.com
Thu Jun 8 13:30:15 CDT 2006
Peter Brunone (EasyListBox.com) wrote:
> I'm not even sure I understand the requirement. Users are going to enter sentences without spaces and you're going to split them up into their component words? That doesn't make sense, so I'm pretty sure I don't get it.
>
> What's the final goal here?
Let me clarify. This is a tool for comparing different texts of ancient
manuscripts. The idea is to make it easy to see which texts have extra
words or are missing words compared to the other texts.
Perhaps the spaces didn't render correctly in everyone's mail client.
Let's say this is the input:
1 To be or not to be that is the question.
2 To or not that's the question.
3 To been or to bend is the question.
The output should look like this (I'll use _ instead of spaces so the
alignment survives any mail client):
1_To___be_or_not_to_be_that_is_the_question.
2_To______or_not_______that's__the_question.
3_To_been_or_____to_bend____is_the_question.
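Once you know which words share a column, producing that display is just padding. Here is a minimal sketch. It assumes the alignment is already available as one row per text, with None marking a missing word. The render function and this row format are hypothetical and not part of any existing tool. The rows are my hand-entered reading of the example above. Each column is left-aligned and padded to its widest word.

```python
from typing import List, Optional

Row = List[Optional[str]]  # one entry per column; None marks a missing word


def render(rows: List[Row], fill: str = " ") -> List[str]:
    """Pad every column to its widest word so shared words line up vertically."""
    if not rows:
        return []
    # Width of each column = length of the longest word in it (gaps count as 0).
    widths = [max(len(row[c] or "") for row in rows) for c in range(len(rows[0]))]
    label_width = len(str(len(rows)))  # keeps alignment when there are 10+ texts
    lines = []
    for number, row in enumerate(rows, start=1):
        cells = [(word or "").ljust(width, fill) for word, width in zip(row, widths)]
        lines.append(str(number).ljust(label_width, fill) + fill + fill.join(cells))
    return lines


# Hypothetical hand-entered alignment of the three example texts.
example = [
    ["To", "be", "or", "not", "to", "be", "that", "is", "the", "question."],
    ["To", None, "or", "not", None, None, "that's", None, "the", "question."],
    ["To", "been", "or", None, "to", "bend", None, "is", "the", "question."],
]
for line in render(example, fill="_"):
    print(line)
```

With the default fill=" ", the same function produces the space-separated version.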
So now you can easily see that all texts begin with 'To', then have
'or', and end with 'the question.' You can also see where some texts
have more, fewer, or different words than the other texts.
All the algorithm has to do is find the common words and then output
them lined up vertically, so that one can scan the texts and see the
differences easily.
It's easier to see with spaces, but this is the basic idea.
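Finding the common words is a form of multiple sequence alignment. Here is a minimal sketch, not a definitive method. It assumes words are split on whitespace and that only identical words (case-sensitive) count as common. The texts are aligned progressively: each new text is folded into the existing columns using a longest-common-subsequence table. The function names are hypothetical. Progressive alignment depends on the order of the texts and is not guaranteed to be optimal across all of them. Near matches such as "be"/"been" or "that"/"that's" also get separate columns here, unlike the hand-made example.

```python
from typing import List, Optional

Column = List[Optional[str]]  # one slot per text aligned so far; None = gap


def align_texts(texts: List[str]) -> List[List[Optional[str]]]:
    """Progressively align texts word by word; returns one row per text."""
    if not texts:
        return []
    seqs = [t.split() for t in texts]
    columns: List[Column] = [[w] for w in seqs[0]]  # first text seeds the columns
    for depth, words in enumerate(seqs[1:], start=1):
        columns = merge_into_columns(columns, words, depth)
    # Transpose columns into rows, one row per input text.
    return [[col[k] for col in columns] for k in range(len(seqs))]


def merge_into_columns(columns: List[Column], words: List[str], depth: int) -> List[Column]:
    """Add one word sequence to `depth` already-aligned texts via an LCS table."""
    n, m = len(columns), len(words)
    # score[i][j] = most words matched when aligning columns[i:] with words[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(score[i + 1][j], score[i][j + 1])
            if words[j] in columns[i]:  # word already appears in this column
                best = max(best, score[i + 1][j + 1] + 1)
            score[i][j] = best
    # Walk the table forward, emitting merged columns.
    merged: List[Column] = []
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and words[j] in columns[i] and score[i][j] == score[i + 1][j + 1] + 1:
            merged.append(columns[i] + [words[j]])  # shared word: same column
            i, j = i + 1, j + 1
        elif i < n and (j == m or score[i][j] == score[i + 1][j]):
            merged.append(columns[i] + [None])  # new text has no word here
            i += 1
        else:
            merged.append([None] * depth + [words[j]])  # word only in new text
            j += 1
    return merged


texts = [
    "To be or not to be that is the question.",
    "To or not that's the question.",
    "To been or to bend is the question.",
]
rows = align_texts(texts)
widths = [max(len(row[c] or "") for row in rows) for c in range(len(rows[0]))]
for number, row in enumerate(rows, start=1):
    print(number, " ".join((word or "").ljust(width) for word, width in zip(row, widths)))
```

Matching words by exact equality keeps the sketch simple. Treating variant spellings as the same word would need a similarity test in place of the `in` check.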
Hershel
--
Gallery Robinson Web Services
http://web.galleryrobinson.com/