Here are several 100% working, simple, yet very powerful tips on how AWK can be used to parse/process the server log (Apache’s “access.log”, in the common or combined log format, where the client IP address is the first field) to return data about website visitors and other useful statistics:
1. Find the total number of unique visitors (counted as unique client IP addresses):
awk '{print $1}' access.log | sort -u | wc -l
2. Find the number of unique visitors today:
grep "$(LC_ALL=C date '+%d/%b/%Y')" access.log | awk '{print $1}' | sort -u | wc -l   # %d, %Y and LC_ALL=C match Apache's "05/Mar/2007" style
3. Find the number of unique visitors this month:
grep "$(LC_ALL=C date '+%b/%Y')" access.log | awk '{print $1}' | sort -u | wc -l   # %Y is the calendar year (%G is the ISO week-based year)
4. Find the number of unique visitors on an arbitrary date, for example March 22nd, 2007:
grep 22/Mar/2007 access.log | awk '{print $1}' | sort -u | wc -l
5. (Based on #3) Find the number of unique visitors for the month of March 2007:
grep Mar/2007 access.log | awk '{print $1}' | sort -u | wc -l
6. Show sorted statistics of the “number of visits/requests” per “visitor’s IP address”:
awk '{print "requests from " $1}' access.log | sort | uniq -c | sort -n   # -n sorts by the count numerically
- for a better understanding, here is part of the result (the end of the list, where the busiest IPs are):
- …. …. …. …. …. ….
4217 requests from 68.25.101.134
4374 requests from 78.14.245.20
4601 requests from 71.222.80.119
4829 requests from 92.73.226.209
4892 requests from 70.45.131.7
5003 requests from 214.178.52.97
5294 requests from 129.21.217.229
7249 requests from 68.32.32.64
15739 requests from 68.46.26.47
16105 requests from 61.299.15.129
29140 requests from 68.208.154.18
196452 requests from 139.21.68.20
603581 requests from 102.78.30.12
7. Similarly, by adding “grep date” as in the tips above, the same statistics will be produced for “that” date:
grep 26/Mar/2007 access.log | awk '{print "requests from " $1}' | sort | uniq -c | sort -n
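Since AWK has associative arrays, the sort/uniq part of tip 1 can also be done inside AWK itself, in a single pass and without sorting the list of IPs. Below is a minimal sketch as a shell function; the name count_unique_ips is just an illustrative choice, and it assumes, like the tips, that the client IP is the first field of each line.

```shell
#!/bin/sh
# count_unique_ips [FILE...]  (hypothetical helper name)
# Counts distinct client IPs, assumed to be the first field ($1).
# seen[] is an awk associative array keyed by IP; "!seen[$1]++" is true
# only the first time a given IP shows up, so n ends up as the number of
# distinct IPs. With no FILE arguments, awk reads standard input.
count_unique_ips() {
    awk '!seen[$1]++ { n++ } END { print n + 0 }' "$@"
}

# Usage, equivalent to tip 1:
#   count_unique_ips access.log
# and, combined with a date filter as in tip 4:
#   grep 22/Mar/2007 access.log | count_unique_ips
```

Only the distinct IPs are kept in memory, and "n + 0" makes an empty log print 0 instead of an empty line.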
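The grep filters in tips 2 to 5 and 7 match the date anywhere in the line, so a request URL or referrer that happens to contain “Mar/2007” would be counted too. Below is a minimal sketch that compares only the timestamp field instead. It assumes Apache’s common/combined log format, where the 4th whitespace-separated field looks like “[22/Mar/2007:10:15:32”, and unique_ips_on is a hypothetical function name.

```shell
#!/bin/sh
# unique_ips_on DATE [FILE...]  (hypothetical helper name)
# Prints the number of distinct client IPs ($1) whose timestamp falls on
# DATE, given as "DD/Mon/YYYY" (a day), "Mon/YYYY" (a month) or "YYYY".
# Assumes the 4th field looks like "[22/Mar/2007:10:15:32".
unique_ips_on() {
    want=$1
    shift
    awk -v want="$want" '
        # true if string s ends with string t
        function ends_with(s, t) {
            return length(s) >= length(t) && substr(s, length(s) - length(t) + 1) == t
        }
        # d becomes "/22/Mar/2007"; the leading "/" makes "Mar/2007" or
        # "22/Mar/2007" match only whole date components, so "2/Mar/2007"
        # does not match "12/Mar/2007"
        { d = "/" substr($4, 2, 11) }
        ends_with(d, "/" want) && !seen[$1]++ { n++ }
        END { print n + 0 }
    ' "$@"
}

# Usage: tip 2 (today) and tip 5 (a whole month)
#   unique_ips_on "$(LC_ALL=C date '+%d/%b/%Y')" access.log
#   unique_ips_on Mar/2007 access.log
```

Taking substr($4, 2, 11) relies on Apache always writing a two-digit, zero-padded day, which is also why the date command uses %d rather than %e.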
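Tip 6 can be done the same way: AWK counts the requests per IP in an associative array, and only the final per-IP lines go through sort. Because the counts printed here are not padded, a plain text sort would put “15739 …” before “4217 …”; sort -n compares them as numbers. A minimal sketch, with requests_per_ip as a hypothetical function name:

```shell
#!/bin/sh
# requests_per_ip [FILE...]  (hypothetical helper name)
# Prints "COUNT requests from IP" for every client IP ($1), smallest
# count first, like tip 6. Counting happens in one awk pass over the
# associative array count[]; sort -n orders lines by the leading number.
requests_per_ip() {
    awk '{ count[$1]++ }
         END { for (ip in count) print count[ip], "requests from", ip }' "$@" |
        sort -n
}

# Usage:
#   requests_per_ip access.log
#   grep 26/Mar/2007 access.log | requests_per_ip   (as in tip 7)
```

Use sort -rn instead to list the busiest IPs first.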
Below is the “very general” background to all of the above :)
There is a certain point in the life of any website owner when he/she stops in the middle of the street/hall/train/plane, puzzled by a question: “hm… how many creatures actually saw my work?”, or even “how many creatures saw my work today?” The answer lies right in front of our eyes; we just often do not see it, due to the simple fact that we are sure we see everything ;) But it is there… there, in the server log.
There are tons of log analyzers out there; just go to sourceforge.net and look for one, and you’ll see plenty (a good example is AWStats). But what if something very simple is needed, and what if installing yet another piece of software is overhead and, in most cases, another security risk? Then we go to Google, find this blog and keep reading :)
AWK is a very powerful, yet underused (little-known?) language/tool that is available on almost any Linux/Unix system. Its purpose is simple:
“AWK is a general purpose programming language that is designed for processing text-based data, either in files or data streams.”
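This is why $1 is the visitor’s IP in all of the tips: AWK reads its input one line (record) at a time, splits each line on whitespace into fields $1, $2, …, $NF, and runs the { action } on each line. Below is a minimal sketch on a made-up line, assuming Apache’s combined log format; show_fields is a hypothetical name, and 203.0.113.7 is a documentation address, not real data.

```shell
#!/bin/sh
# show_fields [FILE...]  (hypothetical helper name)
# Prints a few of the whitespace-separated fields awk sees in each line.
show_fields() {
    awk '{ print "ip=" $1, "time=" $4, "path=" $7, "status=" $9 }' "$@"
}

# A made-up combined-format line.
line='203.0.113.7 - - [22/Mar/2007:10:15:32 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"'

# $1 is the IP, $4 still carries the "[" of the timestamp, and the quoted
# request is split too, so the path lands in $7 and the status in $9.
# prints: ip=203.0.113.7 time=[22/Mar/2007:10:15:32 path=/index.html status=200
printf '%s\n' "$line" | show_fields
```

Since the split is purely on whitespace, a quoted field such as the User-Agent can span several AWK fields, so field numbers after the request are only reliable up to the status and size.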