[thelist] reg exp for last whack

Canfield, Joel JCanfield at PacAdvantage.org
Sun Apr 16 14:14:16 CDT 2006


on win xp pro, trying to parse a list of file paths to find dups; about
6,000 files in an extensive directory structure

i have a text dump with one full path and file name per line. my
thinking was to split the path from the filename, then dump it to SQL
and query for dups.
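for illustration, the split-then-look-for-dups idea can be sketched without SQL. this is only a rough python sketch, not the textpad/SQL route described above: the function name find_duplicate_names is made up, the second sample path is hypothetical (added so there is a dup to find), and comparing names case-insensitively is an assumption.

```python
import ntpath
from collections import defaultdict


def find_duplicate_names(lines):
    """Group folders by file name; keep only names seen more than once."""
    by_name = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # ntpath.split cuts at the last backslash regardless of host OS
        folder, name = ntpath.split(line)
        # assumption: names differing only by case count as the same file
        by_name[name.lower()].append(folder)
    return {name: folders for name, folders in by_name.items() if len(folders) > 1}


sample = [
    r"C:\music\+SortedByYear\2000\Don Henley Inside Job mp3",
    # hypothetical second copy, not from the real listing
    r"C:\music\Don Henley\Inside Job\Don Henley Inside Job mp3",
    r"C:\music\10,000 Maniacs\Blind Man's Zoo\Jubilee wma",
]
dups = find_duplicate_names(sample)
```

each key in dups is a file name that shows up in more than one folder, with the list of folders as its value.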

tried this in textpad for a reg exp and it says it's invalid:

    \\\([a-z0-9]*\)\n

what i'm trying to write is

    find a whack (escaped)       \\
    tagged expression opening    \(
    any number of alphanumerics  [a-z0-9]* 
        (textpad is not case sensitive unless you specify)
    tagged expression closing    \)
    newline                      \n

and replace it with

    \t\1\n

a tab, the tagged expression, a newline
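the same replace can be sketched with python's re module. this uses python regex syntax, not textpad's, so it says nothing about why textpad rejects the expression above. the names LAST_WHACK and tab_before_filename are made up for the example. one deliberate difference: the class here is "anything except a backslash or newline" rather than [a-z0-9], because the file names in the sample data contain spaces, which [a-z0-9]* would not match.

```python
import re

# a backslash, then a run of non-backslash characters up to the end of the line;
# only the last backslash on each line can satisfy this
LAST_WHACK = re.compile(r"\\([^\\\n]*)$", re.MULTILINE)


def tab_before_filename(text):
    """Replace the last backslash on each line with a tab."""
    return LAST_WHACK.sub(r"\t\1", text)


listing = (
    "C:\\music\\10,000 Maniacs\\Blind Man's Zoo\\Dust Bowl wma\n"
    "C:\\music\\10,000 Maniacs\\Blind Man's Zoo\\Jubilee wma\n"
)
converted = tab_before_filename(listing)
```

the newline is left out of the replacement because the $ anchor matches just before it, so the line ending is never consumed.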

i've stripped all non-alphanumerics from the file names, trying to avoid
having to include every possible special character in the reg exp.

here's a sample of the data

C:\music\+SortedByYear\2000\Don Henley Nobody Else in the World But You mp3
C:\music\+SortedByYear\2000\Don Henley The Genie mp3
C:\music\+SortedByYear\2000\Don Henley They're Not Here, They're Not Coming mp3
C:\music\+SortedByYear\2000\Don Henley Workin It mp3
C:\music\+SortedByYear\2000\Don Henley Taking you Home mp3
C:\music\+SortedByYear\2000\Don Henley Inside Job mp3
C:\music\10,000 Maniacs\Blind Man's Zoo\Dust Bowl wma
C:\music\10,000 Maniacs\Blind Man's Zoo\Eat for Two wma
C:\music\10,000 Maniacs\Blind Man's Zoo\Hateful Hate wma
C:\music\10,000 Maniacs\Blind Man's Zoo\Headstrong wma
C:\music\10,000 Maniacs\Blind Man's Zoo\Jubilee wma
C:\music\10,000 Maniacs\Blind Man's Zoo\Please Forgive Us wma
C:\music\10,000 Maniacs\Blind Man's Zoo\Poison in the Well wma

i'm stumped; just dug out my mastering reg exp book to finally go
through it; any help in the meantime?

spinhead


