"; */ ?>


22
Mar 07

Convert TABs to spaces using shell scripting

The purpose is simple and clear: to achieve “indentational independence”, where indentation looks the same across all platforms, editors and systems.

Here is a simple shell script that takes a file as its command line argument and substitutes all TABs with spaces:

#!/bin/sh

#
#   Script converts TABs to spaces in a given file
#

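if [ $# -ne 1 ]; then
     # Wrong number of arguments: print usage to stderr and bail out
     printf 'This script converts TABs to spaces in a given file\n\n\tUsage: %s FILE\n' "$0" 1>&2
     exit 127
fi

# Expand into a temporary file next to the original, then replace the original
expand "$1" > "$1.tmp" && mv "$1.tmp" "$1"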

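As the script shows, all the work is done by the “expand” command, which does just that ;)

And there is no need to worry about sed's s/tab/space/g pattern, because “expand” does exactly what we need: it pads each TAB out to the next tab stop, so columns stay aligned, which a plain substitution would not do ;)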
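By default “expand” places tab stops every 8 columns. If your code uses a different indentation width, the -t option sets it. Here is a minimal sketch of that idea; the helper name tabs_to_spaces and the throwaway demo file are made up for illustration and are not part of the script above:

```shell
#!/bin/sh

# tabs_to_spaces FILE [WIDTH]  (hypothetical helper, for illustration only)
# Rewrites FILE in place with tab stops every WIDTH columns (default 8).
tabs_to_spaces() {
    file=$1
    width=${2:-8}
    expand -t "$width" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a throwaway file containing one TAB-indented line
demo=$(mktemp)
printf '\tindented\n' > "$demo"
tabs_to_spaces "$demo" 4
result=$(cat "$demo")
echo "[$result]"
rm -f "$demo"
```

With a width of 4, the leading TAB in the demo file becomes four spaces.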


22
Mar 07

Golden AWK tips to analyze a server log

Here are several simple, yet very powerful tips on how AWK can be used to parse a server log (Apache's “access.log”) and return data about website visitors and other useful statistics:

1. Find the number of total unique visitors:

cat access.log | awk '{print $1}' | sort | uniq -c | wc -l

2. Find the number of unique visitors today:

cat access.log | grep "`date '+%d/%b/%Y'`" | awk '{print $1}' | sort | uniq -c | wc -l

3. Find the number of unique visitors this month:

cat access.log | grep "`date '+%b/%Y'`" | awk '{print $1}' | sort | uniq -c | wc -l

4. Find the number of unique visitors on an arbitrary date, for example March 22nd, 2007:

cat access.log | grep 22/Mar/2007 | awk '{print $1}' | sort | uniq -c | wc -l

5. (based on #3) Find the number of unique visitors for the month of March:

cat access.log | grep Mar/2007 | awk '{print $1}' | sort | uniq -c | wc -l

6. Show the number of visits/requests per visitor's IP address, sorted by count:

cat access.log | awk '{print "requests from " $1}' | sort | uniq -c | sort -n

    For better understanding, here is part of the result:
    …. …. …. …. …. ….
    4217 requests from 68.25.101.134
    4374 requests from 78.14.245.20
    4601 requests from 71.222.80.119
    4829 requests from 92.73.226.209
    4892 requests from 70.45.131.7
    5003 requests from 214.178.52.97
    5294 requests from 129.21.217.229
    7249 requests from 68.32.32.64
    15739 requests from 68.46.26.47
    16105 requests from 61.299.15.129
    29140 requests from 68.208.154.18
    196452 requests from 139.21.68.20
    603581 requests from 102.78.30.12

7. Similarly, by adding a “grep date” step as in the tips above, the same statistics will be produced for that date:

cat access.log | grep 26/Mar/2007 | awk '{print "requests from " $1}' | sort | uniq -c | sort -n
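The tips above use AWK only to pick out the first field and leave the counting to sort, uniq and wc. As a rough sketch of an alternative, AWK's associative arrays can do the counting in a single pass. The sample log entries, variable names and the temporary file below are all made up for illustration, and the field positions assume Apache's common log format:

```shell
#!/bin/sh

# Build a tiny sample log (made-up entries in Apache's common log format)
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [22/Mar/2007:10:00:00 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [23/Mar/2007:10:05:00 +0000] "GET /a HTTP/1.1" 200 128
10.0.0.1 - - [23/Mar/2007:11:00:00 +0000] "GET /b HTTP/1.1" 200 256
EOF

# Unique visitors overall: count each IP only the first time it is seen
total=$(awk '!seen[$1]++ { n++ } END { print n + 0 }' "$log")

# Unique visitors on one date: same idea, restricted to lines whose
# timestamp field ($4) contains the date string
day=$(awk -v d="22/Mar/2007" 'index($4, d) && !seen[$1]++ { n++ } END { print n + 0 }' "$log")

# Requests per IP, sorted numerically by count
per_ip=$(awk '{ c[$1]++ } END { for (ip in c) print c[ip], "requests from", ip }' "$log" | sort -n)

echo "unique visitors: $total"
echo "unique visitors on 22/Mar/2007: $day"
echo "$per_ip"
rm -f "$log"
```

Matching the date against $4 rather than the whole line avoids false hits when a date-like string appears in a requested URL.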


Below is some general background on the above: :)
There is a certain point in the life of any website owner when he/she stops in the middle of the street/hall/train/plane, puzzled by a question: “hm… how many creatures actually saw my work?”, or even “how many creatures saw my work today?” The answer lies right in front of our eyes, we just often do not see it, due to the simple fact that

we are sure we see everything ;)

but it is there… there, in a server log.

There is plenty of log analyzer software out there, just go to sourceforge.net and look for one (a good example is awstats). But what if something very simple is needed, and what if installing other software is overhead and, in most cases, another security risk? Then we go to Google, find this blog and keep reading :)

There is a very powerful, yet not well used (known?) language/tool called AWK, which is available on almost any Linux/Unix system. The purpose is simple:

“AWK is a general purpose programming language that is designed for processing text-based data, either in files or data streams.”
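To make that definition a bit more concrete, here is a small sketch of how AWK sees a log line: each line is split into whitespace-separated fields ($1, $2, …), and a program is a list of “pattern { action }” rules. The two log lines are made up, and the field numbers assume Apache's common log format:

```shell
#!/bin/sh

# Two made-up log lines in Apache's common log format
sample='10.0.0.1 - - [22/Mar/2007:10:00:00 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [22/Mar/2007:10:05:00 +0000] "GET /missing HTTP/1.1" 404 128'

# $1 is the IP, $7 the requested path, $9 the status code, $10 the size.
# The pattern ($9 == 200) selects lines; the action in braces runs on them.
ok_paths=$(printf '%s\n' "$sample" | awk '$9 == 200 { print $1, $7 }')

# END runs once after the last line; here it prints the summed sizes
bytes=$(printf '%s\n' "$sample" | awk '{ sum += $10 } END { print sum }')

echo "$ok_paths"
echo "total bytes: $bytes"
```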