chhotii: (caffeine)
[personal profile] chhotii

UNIX Geek Of The Day bragging rights to whoever can propose the best solution to this problem: Identify all text files in the current directory which contain the string "jpg" on two consecutive lines, using the typical command-line tools such as grep, not by writing a program. Extra points if the location (line number) of these consecutive lines of interest is output.


Date: 2013-12-10 08:25 pm (UTC)
From: [personal profile] ceo
With GNU grep, the -P option enables Perl regular expressions, which handle newlines better than plain ol' regexes do.

grep -Pn '.*jpg.*\n.*jpg.*' * appears to work.
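
Whether a pattern containing \n can match across lines seems to depend on the grep version, since grep normally applies the pattern one line at a time. Here is a sketch that sidesteps the question, assuming GNU grep built with PCRE support. The -z option makes grep read each file as a single NUL-terminated record, so \n becomes an ordinary character. The function name is made up for illustration, and -n is left out because line numbers carry no information once the whole file is one record.

```sh
# Hypothetical helper: list files in which two adjacent lines both contain "jpg".
# Assumes GNU grep with PCRE support (-P).
list_consecutive_jpg_files() {
    # -z: read each file as one NUL-terminated record, so \n is matchable
    # -l: print only the names of matching files
    # PCRE "." does not match a newline, so each ".*" stays on its own line
    grep -Pzl -- 'jpg.*\n.*jpg' "$@"
}
```

Calling list_consecutive_jpg_files * in the directory of interest prints one matching file name per line. This only identifies the files; it does not give the extra-points line numbers.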

Date: 2013-12-11 12:25 am (UTC)
From: [identity profile] chhotii.livejournal.com
Whoa, clever! Thanks!

Date: 2013-12-11 02:16 pm (UTC)
From: [identity profile] achinhibitor.livejournal.com
[edited for correctness]

Given that grep is defined to filter lines, and lines by definition do not contain newlines, is there any guarantee that this works in general?

Also, what does it do if there are three jpg lines in sequence? There are *two* matches that should be reported.
Edited Date: 2013-12-12 08:10 pm (UTC)
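
Both concerns above can be avoided by tracking the previous line explicitly instead of asking grep to match across a newline. This is a rough sketch, not anyone's posted solution: the function name and the FILE_NAME environment variable are invented for illustration, and it leans on awk, which arguably counts as "writing a program". Every adjacent pair is reported, so three jpg lines in a row yield two reports.

```sh
# Hypothetical helper: print "FILE:N-M" for every pair of adjacent lines
# N and M that both contain "jpg". Overlapping pairs are all reported.
report_consecutive_jpg() {
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip directories and other non-regular files
        # Pass the name through the environment so odd file names are not
        # reinterpreted by awk's argument handling.
        FILE_NAME=$f awk '
            /jpg/ && prev { printf "%s:%d-%d\n", ENVIRON["FILE_NAME"], NR - 1, NR }
            { prev = /jpg/ }         # remember whether this line matched
        ' < "$f"
    done
}
```

Running report_consecutive_jpg * covers the current directory. It makes no attempt to check that a file is actually text.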

Date: 2013-12-11 01:21 am (UTC)
From: [identity profile] frobzwiththingz.livejournal.com
Ok, Chip's got a good answer... Here's a truly horrible one:

for I in *; do test -f "$I" && awk ' BEGIN { last = 0 ; status = 1 } { if (/jpg/) { if (last == 1) { print $0, "line number ", NR ; status = 0 } else last = 1 } else last=0 } END { if (status == 1) exit 1 }' < "$I" && echo " file $I matches"; done

Don't use this. There's *got* to be a horrible bug in there somewhere. Unless you hate your workplace. In that case, make sure it gets into some important automated build process.

Date: 2013-12-11 03:18 am (UTC)
From: [identity profile] whitebird.livejournal.com
You are a nicely evil person. :)

Date: 2013-12-11 10:22 am (UTC)
From: [identity profile] chhotii.livejournal.com
So totally NOT what I was looking for.

Date: 2013-12-11 02:11 pm (UTC)
From: [identity profile] achinhibitor.livejournal.com
I can do this without using anything that is programmable (like sed or awk):

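for FILE in *
do
    [ -f "$FILE" ] || continue    # skip directories and other non-regular files
    # Column 1: lines 1..N-1; column 2: a sentinel; column 3: lines 2..N.
    paste <(head --lines=-1 "$FILE") \
         <(while read -r X; do echo Throat-warbler-mangrove ; done <"$FILE") \
         <(tail --lines=+2 "$FILE") |
    grep -n $'jpg.*\tThroat-warbler-mangrove\t.*jpg' &&
    echo "$FILE"
done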
Edited Date: 2013-12-11 02:12 pm (UTC)
