I wrote the following script to diff the contents of two directories that hold the same set of files:
#!/bin/bash
for file in `find . -name "*.csv"`
do
echo "file = $file";
diff $file /some/other/path/$file;
read char;
done
I know there are other ways to achieve this. Curiously though, this script fails when the file names contain spaces. How can I deal with this?
Example output of find:
./zQuery - abc - Do Not Prompt for Date.csv
ANSWER:-
Short answer (closest to your answer, but handles spaces)

    OIFS="$IFS"
    IFS=$'\n'
    for file in `find . -type f -name "*.csv"`
    do
        echo "file = $file"
        diff "$file" "/some/other/path/$file"
        read line
    done
    IFS="$OIFS"

Better answer (also handles wildcards and newlines in file names)

    find . -type f -name "*.csv" -print0 | while IFS= read -r -d '' file; do
        echo "file = $file"
        diff "$file" "/some/other/path/$file"
        read line </dev/tty
    done

Best answer (based on Gilles' answer)

    find . -type f -name '*.csv' -exec sh -c '
      file="$0"
      echo "$file"
      diff "$file" "/some/other/path/$file"
      read line </dev/tty
    ' {} ';'

Or even better, to avoid running one sh per file:

    find . -type f -name '*.csv' -exec sh -c '
      for file do
        echo "$file"
        diff "$file" "/some/other/path/$file"
        read line </dev/tty
      done
    ' sh {} +
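The extra sh before {} in the last variant is easy to overlook. Here is a minimal sketch of why it is there, using two made-up file names passed by hand instead of by find: the first argument after the script text becomes $0, and for file do only visits the arguments after it.

```shell
# Hypothetical file names, passed by hand the way find's "{} +" would pass them.
# The first argument after the script text fills $0; "for file do" loops over
# the remaining arguments ($1, $2, ...), so a placeholder should occupy $0.
with_placeholder=$(sh -c 'for file do printf "%s\n" "file = $file"; done' sh 'a b.csv' 'c d.csv')

# Without the placeholder, the first file name lands in $0 and is skipped.
without_placeholder=$(sh -c 'for file do printf "%s\n" "file = $file"; done' 'a b.csv' 'c d.csv')

printf '%s\n' "with placeholder:" "$with_placeholder"
printf '%s\n' "without placeholder:" "$without_placeholder"
```

The one-file-per-sh variant reads "$0" directly, which is why it can pass {} with no placeholder in front of it.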
Long answer

You have three problems:
- By default, the shell splits the output of a command on spaces, tabs, and newlines
- Filenames could contain wildcard characters which would get expanded
- What if there is a directory whose name ends in .csv?

1. Splitting only on newlines

To figure out what to set file to, the shell has to take the output of find and interpret it somehow, otherwise file would just be the entire output of find.

The shell reads the IFS variable, which is set to <space><tab><newline> by default.

Then it looks at each character in the output of find. As soon as it sees any character that's in IFS, it thinks that marks the end of the file name, so it sets file to whatever characters it saw until now and runs the loop. Then it starts where it left off to get the next file name, and runs the next loop, etc., until it reaches the end of output.

So it's effectively doing this:

    for file in "./zQuery" "-" "abc" ...

To tell it to only split the input on newlines, you need to do

    IFS=$'\n'

before your for ... find command.

That sets IFS to a single newline, so it only splits on newlines, and not spaces and tabs as well.

If you are using sh or dash instead of ksh93, bash or zsh, you need to write IFS=$'\n' like this instead (an opening quote, a line break, and a closing quote at the very start of the next line, with nothing else between them):

    IFS='
    '

That is probably enough to get your script working, but if you're interested in handling some other corner cases properly, read on...

2. Expanding $file without wildcards

Inside the loop where you do

    diff $file /some/other/path/$file

the shell tries to expand $file (again!).

It could contain spaces, but since we already set IFS above, that won't be a problem here.

But it could also contain wildcard characters such as * or ?, which would lead to unpredictable behavior. (Thanks to Gilles for pointing this out.)

To tell the shell not to expand wildcard characters, put the variable inside double quotes, e.g.

    diff "$file" "/some/other/path/$file"

The same problem could also bite us in

    for file in `find . -name "*.csv"`

For example, if you had these three files

    file1.csv
    file2.csv
    *.csv

(very unlikely, but still possible)

It would be as if you had run

    for file in file1.csv file2.csv *.csv

which will get expanded to

    for file in file1.csv file2.csv *.csv file1.csv file2.csv

causing file1.csv and file2.csv to be processed twice.

Instead, we have to do

    find . -name "*.csv" -print | while IFS= read -r file; do
        echo "file = $file"
        diff "$file" "/some/other/path/$file"
        read line </dev/tty
    done

read reads lines from standard input, splits the line into words according to IFS and stores them in the variable names that you specify. Here, we're telling it not to split the line into words, and to store the line in $file.

Also note that read line has changed to read line </dev/tty. This is because inside the loop, standard input is coming from find via the pipeline. If we just did read, it would be consuming part or all of a file name, and some files would be skipped.

/dev/tty is the terminal where the user is running the script from. Note that this will cause an error if the script is run via cron, but I assume this is not important in this case.

Then, what if a file name contains newlines?

We can handle that by changing -print to -print0 and using read -d '' on the end of a pipeline:

    find . -name "*.csv" -print0 | while IFS= read -r -d '' file; do
        echo "file = $file"
        diff "$file" "/some/other/path/$file"
        read line </dev/tty
    done

This makes find put a null byte at the end of each file name. A null byte is the one character that can never appear in a path, so this should handle all possible file names, no matter how weird.

To get the file name on the other side, we use IFS= read -r -d ''.

Where we used read above, we used the default line delimiter of newline, but now, find is using null as the line delimiter. In bash, you can't pass a NUL character in an argument to a command (even builtin ones), but bash understands -d '' as meaning NUL delimited. So we use -d '' to make read use the same line delimiter as find.
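To compare the newline-delimited and null-delimited pipelines without touching real data, here is a minimal sketch for bash. The sandbox directory and its three file names are made up for illustration, and the loops only count the items they receive instead of running diff.

```shell
#!/bin/bash
# Hypothetical sandbox with awkward names: a space, a wildcard, and a newline.
sandbox=$(mktemp -d)
touch "$sandbox/with space.csv" "$sandbox/star*.csv" "$sandbox/two
lines.csv"

# Newline-delimited: the name containing a newline arrives as two items.
items_by_line=$(find "$sandbox" -type f -name '*.csv' -print |
    while IFS= read -r file; do
        echo item
    done | grep -c item)

# NUL-delimited: each file name arrives as exactly one item.
items_by_nul=$(find "$sandbox" -type f -name '*.csv' -print0 |
    while IFS= read -r -d '' file; do
        echo item
    done | grep -c item)

echo "3 files created"
echo "items seen via -print:  $items_by_line"
echo "items seen via -print0: $items_by_nul"
rm -rf "$sandbox"
```

If the two counts differ, the newline-delimited loop has split at least one name into pieces that diff would then fail to open.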
Note that -d $'\0', incidentally, works as well, because bash, not supporting NUL bytes, treats it as the empty string.

To be correct, we also add -r, which says don't handle backslashes in file names specially. For example, without -r, \<newline> are removed, and \n is converted into n.

A more portable way of writing this that doesn't require bash or zsh or remembering all the above rules about null bytes (again, thanks to Gilles):

    find . -name '*.csv' -exec sh -c '
      file="$0"
      echo "$file"
      diff "$file" "/some/other/path/$file"
      read line </dev/tty
    ' {} ';'

3. Skipping directories whose names end in .csv

    find . -name "*.csv"

will also match directories that are called something.csv.

To avoid this, add -type f to the find command.

    find . -type f -name '*.csv' -exec sh -c '
      file="$0"
      echo "$file"
      diff "$file" "/some/other/path/$file"
      read line </dev/tty
    ' {} ';'

As glenn jackman points out, in both of these examples, the commands to execute for each file are being run in a subshell, so if you change any variables inside the loop, they will be forgotten.

If you need to set variables and have them still set at the end of the loop, you can rewrite it to use process substitution like this:

    i=0
    while IFS= read -r -d '' file; do
        echo "file = $file"
        diff "$file" "/some/other/path/$file"
        read line </dev/tty
        i=$((i+1))
    done < <(find . -type f -name '*.csv' -print0)
    echo "$i files processed"

Note that if you try copying and pasting this at the command line, read line will consume the echo "$i files processed", so that command won't get run.

To avoid this, you could remove read line </dev/tty and send the result to a pager like less.
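The point about variables being forgotten can be checked with a small sketch, assuming bash with its default settings (no lastpipe option) and a made-up sandbox directory holding two files. The same counter loop is run once at the end of a pipeline and once with process substitution.

```shell
#!/bin/bash
# Hypothetical sandbox with two files whose names contain spaces.
sandbox=$(mktemp -d)
touch "$sandbox/a b.csv" "$sandbox/c d.csv"

# Pipeline: the while loop runs in a subshell, so the increments are lost.
piped=0
find "$sandbox" -type f -name '*.csv' -print0 |
    while IFS= read -r -d '' file; do
        piped=$((piped + 1))
    done

# Process substitution: the loop runs in the current shell, so they persist.
kept=0
while IFS= read -r -d '' file; do
    kept=$((kept + 1))
done < <(find "$sandbox" -type f -name '*.csv' -print0)

echo "counter after pipeline:             $piped"
echo "counter after process substitution: $kept"
rm -rf "$sandbox"
```

Other shells may behave differently here; ksh93 and zsh, for example, run the last element of a pipeline in the current shell.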
NOTES

- I removed the semi-colons (;) inside the loop. You can put them back if you want, but they are not needed.
- These days, $(command) is more common than `command`. This is mainly because it's easier to write $(command1 $(command2)) than `command1 \`command2\``.
- read char doesn't really read a character. It reads a whole line so I changed it to read line.
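Two of these notes can be tried directly. This is only an illustrative sketch: the input line is made up, and /some/other/path/file.csv is just a string handed to dirname and basename, so the path does not need to exist.

```shell
# read stores the whole line, whatever the variable is called.
read char <<'EOF'
hello world
EOF
echo "char = $char"

# Nesting with $(...) needs no escaping...
new_style=$(basename "$(dirname /some/other/path/file.csv)")
# ...while the inner pair of nested backticks has to be backslash-escaped.
old_style=`basename \`dirname /some/other/path/file.csv\``
echo "new_style = $new_style, old_style = $old_style"
```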