Unique visitors in Apache logs


I have an Apache web server connected to the internet and I want to know how many unique visitors it receives.
I will distinguish visitors by their source IP address.


By default, the Apache access log files have the following syntax:

IP  RFC1413_identity userid [date] HTTP_request HTTP_code_response Downloaded_bytes

You can configure this output following the instructions given by Apache:

This is a piece of my Apache access log:

- - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
- - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
- - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
- - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
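To make the field positions concrete, here is a small sketch that splits one log line on whitespace with awk. The line itself is hypothetical: the IP 192.0.2.10 is a made-up documentation address, and the sketch assumes that neither the userid nor the requested path contains spaces.

```shell
# Hypothetical log line in the format described above.
# 192.0.2.10 is a made-up documentation address, not a real visitor.
line='192.0.2.10 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335'

# Splitting on whitespace: $1 is the IP, $9 the HTTP response code
# and $10 the number of downloaded bytes (the date takes up $4 and $5,
# the quoted request takes up $6 to $8).
fields=$(printf '%s\n' "$line" | awk '{print $1, $9, $10}')
echo "$fields"
```

Only the first field matters for counting visitors, which is why the commands in this post never look past column 1.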

I can get the list of IPs by extracting the first column, using a space as the separator:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1

or with awk:

cat webgenekfca.com-access_log  | awk -F" " '{print $1}'

However, many IP addresses appear more than once, so I filter out the duplicates with uniq. Because uniq only removes adjacent duplicate lines, the list has to go through sort first:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1 | sort | uniq
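The sort step is easy to forget, so here is a minimal sketch of what happens without it. The three addresses are made up for the illustration (documentation ranges), with one of them repeated but not on consecutive lines.

```shell
# Made-up list of IPs: 192.0.2.10 appears twice, but not on adjacent lines.
ips=$(printf '%s\n' 192.0.2.10 198.51.100.7 192.0.2.10)

# uniq alone compares each line only with the previous one,
# so all 3 lines survive here.
without_sort=$(printf '%s\n' "$ips" | uniq | wc -l)

# After sort the duplicates are adjacent and uniq collapses them: 2 lines remain.
with_sort=$(printf '%s\n' "$ips" | sort | uniq | wc -l)

echo "uniq alone: $without_sort lines, sort | uniq: $with_sort lines"
```

If you only need the deduplicated list, `sort -u` does the same as `sort | uniq` in a single command.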

Finally, if I just want to count the number of different IPs, I use wc -l to count the remaining lines:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1 | sort | uniq | wc -l
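If you want to try the whole pipeline without a real log at hand, the following sketch builds a tiny sample file and counts its unique IPs in two ways. The sample reuses the requests shown earlier, but the IP addresses are invented (documentation ranges), and the awk one-liner is just an alternative I am adding for comparison, not something the pipeline above depends on.

```shell
# Build a small sample log in a temporary file.
# The IPs are made-up documentation addresses, not real visitors.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
198.51.100.7 - - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
192.0.2.10 - - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
203.0.113.5 - - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
EOF

# Same steps as in the post: first column, sort, drop duplicates, count lines.
count_pipeline=$(cut -d ' ' -f 1 "$log" | sort | uniq | wc -l)

# Alternative in a single awk pass: count an IP only the first time it is seen.
count_awk=$(awk '!seen[$1]++ { n++ } END { print n + 0 }' "$log")

echo "cut | sort | uniq | wc: $count_pipeline unique IPs"
echo "awk only: $count_awk unique IPs"

rm -f "$log"
```

Both approaches report three distinct addresses for this sample. The awk version does not need to sort, which may help on very large logs, while the pipeline is easier to build up and inspect step by step.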

I know there are better ways to estimate the number of users. Cookies are very helpful, because they make it possible to track a user's behaviour even if their IP address changes. This post is simply meant as a usage example of awk, cut, sort, uniq and wc.

