Here I save odd combinations of commands that have helped me in my daily work
Get all unique dates in a file where the dates appear in the second column (but not on all rows)
awk -F " " '{print $2}' <filename> | egrep "^[0-9]{4}" | sort | uniq
With awk I select the second column ($2) in the file. Columns are separated by spaces (" ").
egrep selects all rows that start with 4 digits, as in "2011-07-07"
The sort command sorts all rows, which uniq needs because it only collapses duplicates that are next to each other
The uniq command removes all duplicated rows
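To try the pipeline without a real log file, here is a minimal sketch run against made-up sample data. The function name unique_dates and the sample rows are my own inventions for illustration. It uses grep -E (the current spelling of egrep) and sort -u, which does the same job as sort | uniq.

```shell
# Hypothetical helper: print the unique dates found in column 2.
# Reads the files given as arguments, or stdin when there are none.
unique_dates() {
    awk '{print $2}' "$@" | grep -E '^[0-9]{4}' | sort -u
}

# Made-up sample data: only some rows carry a date in the second column.
sample=$(mktemp)
cat > "$sample" <<'EOF'
INFO 2011-07-07 service started
continuation line without a date
WARN 2011-07-06 disk almost full
INFO 2011-07-07 service stopped
EOF

unique_dates "$sample"
rm -f "$sample"
```

For the sample above this prints 2011-07-06 and 2011-07-07, one per row.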
Get all unique rows that match regexp "<xml-tag>.*</xml-tag>"
egrep "<xml-tag>.*</xml-tag>" /path/to/file | sort | uniq
With egrep I get all rows that match the regular expression
The sort command sorts all rows returned from egrep, which is needed for the uniq command
The uniq command removes all duplicated rows
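A variant I find handy, sketched here under assumptions: if the rows carry other text around the element, grep -o prints only the matching part, so rows that differ elsewhere still count as duplicates. The -o option is not in POSIX but GNU and BSD grep both have it. The function name unique_tags and the sample rows are made up.

```shell
# Hypothetical helper: print each distinct <xml-tag>...</xml-tag> element once.
# -o prints only the matching part of a row instead of the whole row.
unique_tags() {
    grep -oE '<xml-tag>.*</xml-tag>' "$@" | sort -u
}

# Made-up sample rows: two of them contain the same element.
printf '%s\n' \
    'req=1 <xml-tag>alpha</xml-tag> ok' \
    'req=2 <xml-tag>beta</xml-tag> ok' \
    'req=3 <xml-tag>alpha</xml-tag> ok' \
    'req=4 no tag here' | unique_tags
```

Note that .* is greedy, so a row holding two such elements comes out as one long match.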
Get a list of occurrences of unique rows in a gzipped text file based on date (in second column) of rows containing a search string
zgrep search_string filename.gz | awk -F " " '{print $2}' | sort | uniq -c
This will give you a list of dates together with the number of rows containing search_string on each date, like this:
909 2011-07-01
1608 2011-07-02
1604 2011-07-03
2775 2011-07-04
2765 2011-07-05
1757 2011-07-06
3716 2011-07-07
2785 2011-07-08
1711 2011-07-09
1655 2011-07-10
With zgrep we grep in a gzipped file without unzipping it first
With awk we select the second column of each row (in this case a date formatted as YYYY-MM-DD)
The sort is only needed if the dates do not come in order
The uniq -c collapses the dates to one row per unique date and prefixes each with its number of occurrences
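The same steps can be tried end to end on a small made-up file. This is only a sketch: the function name count_per_date, the sample rows and the file name are hypothetical, and it assumes gzip and zgrep are installed.

```shell
# Hypothetical helper: count rows matching a pattern, grouped by the date in column 2.
# Usage: count_per_date PATTERN FILE.gz
count_per_date() {
    zgrep "$1" "$2" | awk '{print $2}' | sort | uniq -c
}

# Made-up sample log, compressed so that zgrep has something to read.
dir=$(mktemp -d)
printf '%s\n' \
    'app 2011-07-01 search_string found' \
    'app 2011-07-02 search_string found' \
    'app 2011-07-01 unrelated row' \
    'app 2011-07-01 search_string found' | gzip > "$dir/sample.log.gz"

count_per_date search_string "$dir/sample.log.gz"
rm -rf "$dir"
```

To see the busiest dates first, append | sort -rn to the call.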
Sum up integer values in a specific column in a file
awk -F " " '{tot+=$1} END {print tot}' /path/to/the/numbers
Here the values to sum up are in the first column ($1) in the file. The -F " " option sets the column separator to a space. For awk a single space is also the default separator, and it means that any run of spaces or tabs separates two columns
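A small sketch of the summing, run on made-up rows. The function name sum_first_column is hypothetical. I leave out -F " " because it is awk's default, and I print tot + 0 so that an empty input gives 0 instead of an empty row.

```shell
# Hypothetical helper: sum the integers in column 1 of the input.
# tot + 0 forces a numeric result, so empty input prints 0.
sum_first_column() {
    awk '{tot += $1} END {print tot + 0}' "$@"
}

# Made-up sample rows; the leading spaces on the second row are ignored by awk.
printf '%s\n' '10 apples' '  5 pears' '27 plums' | sum_first_column
```

For the sample rows this prints 42.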
Get min/max integer from a file with integers (one per row)
awk -F " " 'value=="" || $1 < value {value=$1} END {print value}' /path/to/file
This will give you the min value of the numbers in the first column in the file. The -F " " option sets the column separator to a space, which is also awk's default. To look for the max value just change the < to >
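If both values are wanted, one pass over the file is enough. The following is a sketch of that idea, not the command above: the function name min_max and the output format are my own choices. Adding 0 to $1 makes awk compare the values as numbers.

```shell
# Hypothetical helper: print the smallest and largest value in column 1.
# The first row seeds both values; $1 + 0 forces a numeric comparison.
min_max() {
    awk 'NR == 1 {min = max = $1 + 0}
         $1 + 0 < min {min = $1 + 0}
         $1 + 0 > max {max = $1 + 0}
         END {if (NR > 0) print "min=" min, "max=" max}' "$@"
}

# Made-up sample: one integer per row.
printf '%s\n' 7 -3 42 0 | min_max
```

For the sample this prints min=-3 max=42. On empty input it prints nothing.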
Create pretty-printed copies of XML one-liners using xmllint
for f in * ; do xmllint.exe --format "$f" --output "prettyprint/${f%}.df" ; done
This will run all files in the current directory through xmllint with the --format option and place the results as new files in a folder called prettyprint
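A slightly more defensive sketch of the same loop, under a few assumptions of mine: the binary is called xmllint (not xmllint.exe), only *.xml files should be processed, and the copies keep their original file names. It also creates the prettyprint folder first, since the loop over * would otherwise fail without it and would try to format the folder itself. The function name pretty_print_all is hypothetical.

```shell
# Hypothetical helper: pretty-print every *.xml file in the current directory
# into prettyprint/, keeping the original file names (an assumption).
pretty_print_all() {
    command -v xmllint >/dev/null 2>&1 || { echo "xmllint not found" >&2; return 127; }
    mkdir -p prettyprint || return 1
    for f in *.xml; do
        [ -f "$f" ] || continue   # skips the literal "*.xml" when nothing matches
        xmllint --format "$f" --output "prettyprint/$f"
    done
}
```

Run pretty_print_all from the directory that holds the XML files.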