Linux Command Grep

Tags: linux / software utility

The grep command searches file contents for lines that match a pattern. The pattern is written in regular expression (regex) syntax.

Ryan’s Tutorials website has a nice introduction to grep and regular expressions. Note that egrep is a shortcut for grep -E on Ubuntu. The grep man page states that “in GNU grep there is no difference in available functionality between basic and extended syntaxes”.

$which egrep
$file /bin/egrep
/bin/egrep: POSIX shell script, ASCII text executable
$vim /bin/egrep  # file content is
exec grep -E "$@"
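
To see what that man page sentence means in practice, here is a small sketch, assuming GNU grep and a made-up word list. The same "zero or one" operator is written \? in basic syntax and ? in extended syntax, and both commands should select the same lines.

```shell
# Sample input (hypothetical data, not from any real file)
sample='color
colour
colr'

# Basic syntax: GNU grep treats \? as "zero or one of the preceding item"
basic=$(printf '%s\n' "$sample" | grep 'colou\?r')

# Extended syntax: the same operator is written without the backslash
extended=$(printf '%s\n' "$sample" | grep -E 'colou?r')

printf 'basic:\n%s\nextended:\n%s\n' "$basic" "$extended"
```

Both variables end up holding color and colour, while colr is skipped because the second o is not optional.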

Here is a very simple grep example:

$grep "Jane Williams" filename.txt

The grep command has many options. A few common ones are:

  • -w : match whole words only
  • -i : ignore case
  • -n : print line numbers
  • -B 4 : print 4 lines before each match
  • -A 4 : print 4 lines after each match
  • -C 2 : print 2 lines before and 2 lines after each match
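
The options above can be tried on a throwaway file. This is only a sketch: the file name people.txt and its contents are made up for illustration.

```shell
# Build a small sample file (hypothetical contents) in a temporary directory
tmpdir=$(mktemp -d)
cat > "$tmpdir/people.txt" <<'EOF'
id,name
1,Jane Williams
2,jane williamson
3,John Smith
4,JANE WILLIAMS
EOF

# -i ignores case, -w requires whole-word matches, -n prefixes line numbers
matches=$(grep -win "jane williams" "$tmpdir/people.txt")
printf '%s\n' "$matches"

# -A 1 also prints one line after each match (-B and -C work the same way)
context=$(grep -n -A 1 "John Smith" "$tmpdir/people.txt")
printf '%s\n' "$context"

# Clean up the temporary directory
rm -r "$tmpdir"
```

The first command skips "jane williamson" because -w does not accept a match that runs into more word characters.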

The filename.txt in the above example can also be a glob such as ./* or ./*.txt to search several files at once.

For example, this command searches the current directory and its subdirectories.

$grep -winr "Jane Williams" ./

Here are a few more options:

  • -r : recursive, search subdirectories too
  • -l : print file names only
  • -c : print a count of matching lines
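
Here is a short sketch of these three options on a made-up directory tree. The notes/ layout and file names are hypothetical, and LC_ALL=C sort is only there to make the output order predictable.

```shell
# Create a tiny directory tree with hypothetical files
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/notes/archive"
printf 'Jane Williams\nJohn Smith\n' > "$tmpdir/notes/a.txt"
printf 'Jane Williams\nJane Williams\n' > "$tmpdir/notes/archive/b.txt"
printf 'nothing here\n' > "$tmpdir/notes/c.txt"

# -r descends into subdirectories, -l prints only names of files with a match
files=$(cd "$tmpdir/notes" && grep -rl "Jane Williams" . | LC_ALL=C sort)
printf '%s\n' "$files"

# -c prints the number of matching lines instead of the lines themselves
count=$(grep -c "Jane Williams" "$tmpdir/notes/archive/b.txt")
printf '%s\n' "$count"

# Clean up the temporary directory
rm -r "$tmpdir"
```

Note that -c counts matching lines, not individual occurrences, so a line containing the pattern twice still counts once.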

You can use pipes to search the results of another command.

$history | grep "git commit"
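
The same idea works with any command that writes to standard output. In this sketch, printf stands in for the other command, since history has no content in a non-interactive script. The command list is made up.

```shell
# Stand-in for another command's output (hypothetical lines)
log='git status
git commit -m "fix typo"
ls -la
git commit --amend'

# grep reads standard input when no file name is given
commits=$(printf '%s\n' "$log" | grep "git commit")
printf '%s\n' "$commits"

# Options still apply on piped input: -c counts the matching lines
n=$(printf '%s\n' "$log" | grep -c "git commit")
printf '%s\n' "$n"
```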

The grep -P option interprets the pattern as a Perl-compatible regex. Python and Perl have essentially the same regex syntax, so with the -P option turned on, grep accepts Python-style special characters in the pattern. Here is a list of common regex special characters in Python:

  • . (dot) : any character
  • \w : word character, [A-Za-z0-9_]
  • \d : digit
  • \s : whitespace
  • * : zero or more of the preceding item
  • + : one or more of the preceding item
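
A quick way to try these characters is a sketch like the one below. It assumes a GNU grep built with -P support, and the sample lines are made up.

```shell
# Hypothetical front-matter style lines, used only for illustration
lines='date: 2020-08-28
date: 2019-04-01
title: grep notes
date:2020-01-15'

# \w+ is one or more word characters, \s+ one or more whitespace characters,
# \d+ one or more digits
dated=$(printf '%s\n' "$lines" | grep -P '^\w+:\s+\d+-\d+-\d+$')
printf '%s\n' "$dated"

# \s* allows zero or more whitespace characters, so "date:2020" matches too
in2020=$(printf '%s\n' "$lines" | grep -P '^date:\s*2020')
printf '%s\n' "$in2020"
```

The first pattern rejects date:2020-01-15 because + needs at least one whitespace character, while * in the second pattern accepts none.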

Update on 8/28/2020:

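Today I wanted to see how many blog posts I have written this year and compare the number with last year. I wanted to search for date: 2020-, but grep did not seem to like the - character. A Google search for “grep search dash” turned up a Stack Overflow post. The fix suggested there is to escape the dash twice, as in date: 2020\\-. Inside double quotes the shell reduces \\- to \-, which GNU grep reads as a literal dash. (A dash normally causes trouble only when the pattern begins with one, because grep then takes it for an option; grep -e or -- handles that case.) So the commands to tally blog posts become: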

grep -r "date: 2020\\-" | wc -l
grep -r "date: 2019\\-" | wc -l

The result is 26 vs. 26. The 2019 count covers only the 9 months starting from 4/1, and the 2020 count covers 8 months so far, so the two years are close.
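
As a cross-check on how grep treats dashes, here is a small sketch, assuming GNU grep and made-up sample lines instead of real blog files. As far as I can tell, a dash in the middle of a pattern is matched literally, and only a pattern that begins with a dash needs -e (or --) so it is not taken for an option.

```shell
# Hypothetical front-matter lines standing in for blog post files
posts='date: 2020-08-28
date: 2019-04-01
date: 2020-01-15'

# A dash in the middle of a pattern is an ordinary character
plain=$(printf '%s\n' "$posts" | grep -c "date: 2020-")

# A pattern that starts with a dash must follow -e, otherwise grep
# tries to parse it as an option
leading=$(printf '%s\n' "$posts" | grep -c -e "-08-")

printf '%s %s\n' "$plain" "$leading"
```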