Okay, so on a work computer I have a file directory similar to this:
STATES
|_ ALABAMA
|_ ALASKA
. . .
|_ WYOMING
Each folder contains a large number of "proprietary" binary files, each with large amounts of ASCII text inside. The files are named like so: "DATA - CITY1.stl". In my example, that's "Joe Smith - Detroit.stl".
The inside of a file would look something like this:

...header...
"NAME"="Joe Smith"
"CITY2"="Detroit"
"PHONE"="3015551212"
"IMAGE"=encoded monochrome image
...other fields...
...footer...
I'd like a script/app that goes through the folder, including all subdirectories, extracts the DATA, CITY1, NAME, CITY2 and PHONE fields from each file, and concatenates it all into a CSV file where a sample CSV would look like so:
DATA,CITY1,NAME,CITY2,PHONE
Joe Smith,Detroit,Joe Smith,Detroit,3015551212
The character combination to parse the filename will always be " - ". That's right, whitespace-hyphen-whitespace.
grep should do the trick, as would a Perl script, but it's been a good decade since I've dealt with either.
Anyone kind enough to save me some time? Once I have a feel for it, I should be able to fill in the particular ASCII strings to hunt for in the file data myself.
I _can_ do it myself, but it will just take longer that way. I have a feeling someone here should be able to code this without even realizing it.
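A minimal sketch in Python of the walk-parse-write loop described above, assuming the fields really do appear as ASCII "KEY"="value" pairs that a regex can pick out of the binary (the root path, output filename, and field names are taken from the examples here and may need adjusting):

```python
import csv
import os
import re

# Field keys, taken from the sample file contents above.
FIELDS = ("NAME", "CITY2", "PHONE")

def extract_fields(blob):
    """Pull "KEY"="value" pairs out of the raw bytes of one file."""
    out = {}
    for key in FIELDS:
        m = re.search(b'"' + key.encode() + b'"="([^"]*)"', blob)
        out[key] = m.group(1).decode("ascii", "replace") if m else ""
    return out

def build_csv(root, out_path):
    """Walk root (and all subdirectories), emit one CSV row per .stl file."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["DATA", "CITY1", "NAME", "CITY2", "PHONE"])
        for dirpath, _dirs, files in os.walk(root):
            for fname in files:
                if not fname.lower().endswith(".stl"):
                    continue
                stem = fname[:-4]
                # Split the filename on whitespace-hyphen-whitespace, as specified.
                data, sep, city1 = stem.partition(" - ")
                if not sep:
                    continue  # filename doesn't match the pattern; skip it
                with open(os.path.join(dirpath, fname), "rb") as fh:
                    fields = extract_fields(fh.read())
                writer.writerow([data, city1] + [fields[k] for k in FIELDS])

if __name__ == "__main__":
    build_csv("STATES", "output.csv")  # hypothetical paths -- adjust as needed
```

Running it from the directory that holds STATES should produce output.csv with one row per file; reading the files in binary mode sidesteps any decoding errors from the non-ASCII parts.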
For bonus points, insert another field in the CSV that shows a "% similar" statistic comparing the CITY1 and CITY2 fields.
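For that bonus column, one option from Python's standard library is difflib.SequenceMatcher, whose ratio() returns a 0-to-1 similarity that can be scaled to a percentage; a sketch (the function name and case-folding choice are my own):

```python
from difflib import SequenceMatcher

def pct_similar(a, b):
    """Rough percent similarity between two strings, case-insensitive."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())
```

Appending pct_similar(city1, fields["CITY2"]) as a sixth column (with a matching header) would give 100 when the filename city and the embedded city agree exactly, and something lower when they differ.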
In return, you get a tasty pudding snack.
- Stiletto
Edited by Stiletto (03/29/07 08:47 AM)