Unique visitors in Apache logs


I have an Apache web server connected to the Internet, and I want to know the number of unique visitors it receives.
I want to distinguish visitors by their source IP address.


By default, each line of the Apache access log has the following syntax:

IP  RFC1413_identity userid [date] HTTP_request HTTP_code_response Downloaded_bytes

You can change this format by following the instructions in the Apache documentation (the LogFormat and CustomLog directives of mod_log_config).

This is a piece of my Apache access log:

- - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
- - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
- - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
- - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
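If you want to try the following commands without a real log, a minimal sketch is to recreate the excerpt in a local file. The file name sample-access_log is hypothetical, and the IP addresses are placeholders taken from the RFC 5737 documentation ranges, not the addresses from my log. The first two requests are deliberately given the same IP so that there is a duplicate to remove.

```shell
#!/bin/sh
# Write a small hypothetical access log to play with.
# The IPs are RFC 5737 documentation addresses (placeholders only);
# the rest of each line follows the excerpt above.
cat > sample-access_log <<'EOF'
192.0.2.10 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
192.0.2.10 - - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
198.51.100.7 - - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
203.0.113.42 - - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
EOF
```

Replacing webgenekfca.com-access_log with sample-access_log in the pipelines below should then report 3 different IPs.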

I can get the list of IPs by taking the first field, using a single space as the separator:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1

or with awk:

cat webgenekfca.com-access_log  | awk -F" " '{print $1}'
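Since awk splits each line on runs of whitespace, the other fields of the default format can be picked out the same way. A minimal sketch, where log_fields is a hypothetical helper name: it assumes well-formed request lines of the form "METHOD path protocol", so the bracketed timestamp ends up in $4 and $5 and the quoted request in $6 to $8.

```shell
#!/bin/sh
# log_fields: print the main fields of each default-format log line read
# on stdin. With awk's default whitespace splitting:
#   $1 = IP, $4-$5 = [date zone], $6-$8 = "request", $9 = status, $10 = bytes
# (assumes the request line itself contains exactly three words).
log_fields() {
    awk '{
        printf "ip=%s time=%s %s request=%s %s %s status=%s bytes=%s\n",
               $1, $4, $5, $6, $7, $8, $9, $10
    }'
}
```

Usage: `log_fields < webgenekfca.com-access_log`. The IP is always the first field, which is why `cut -f 1` and `awk '{print $1}'` give the same list.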

But there are many duplicate IP addresses. The command uniq removes them, but it only collapses adjacent identical lines, so the list has to be sorted first:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1 | sort | uniq
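Two variants of the deduplication step, as a sketch with hypothetical function names that read the log on stdin. `sort -u` sorts and drops duplicates in one step, giving the same output as `sort | uniq`. The awk version swaps in a different technique: an associative array used as a set, which keeps the IPs in order of first appearance instead of sorting them.

```shell
#!/bin/sh
# unique_ips: sorted list of distinct client IPs (same as sort | uniq).
unique_ips() {
    cut -d ' ' -f 1 | sort -u
}

# unique_ips_in_order: distinct IPs in order of first appearance.
# seen[$1]++ returns the previous count, so !seen[$1]++ is true only
# the first time a given IP shows up.
unique_ips_in_order() {
    awk '!seen[$1]++ { print $1 }'
}
```

Usage: `unique_ips < webgenekfca.com-access_log`. The awk variant avoids sorting the whole log, at the cost of keeping every distinct IP in memory.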

Finally, if I just want to count the number of different IPs, I use wc -l, which counts lines:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1 | sort | uniq | wc -l
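The count can also be computed in a single awk pass, and the same set idea gives unique visitors per day. A minimal sketch with hypothetical function names, assuming the default timestamp format, where the fourth field looks like "[15/Feb/2014:10:04:03" so characters 2 to 12 are the day. Days are taken as logged, in the server's own time zone offset, and are printed in order of first appearance.

```shell
#!/bin/sh
# count_unique_ips: number of distinct IPs in a log on stdin, without
# sorting. "n + 0" makes an empty log print 0 instead of an empty line.
count_unique_ips() {
    awk '!seen[$1]++ { n++ } END { print n + 0 }'
}

# unique_ips_per_day: "day count" lines, one per day, where count is the
# number of distinct IPs seen that day. seen[day, $1] is keyed on the
# (day, IP) pair, so the same IP is counted once per day.
unique_ips_per_day() {
    awk '
        { day = substr($4, 2, 11) }
        !(day in count) { order[++ndays] = day; count[day] = 0 }
        !seen[day, $1]++ { count[day]++ }
        END { for (i = 1; i <= ndays; i++) print order[i], count[order[i]] }
    '
}
```

Usage: `count_unique_ips < webgenekfca.com-access_log` should print the same number as the pipeline above.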

I know that there are better ways to estimate the number of users. Cookies, for example, are very helpful because they let you follow a user's behaviour even if their IP changes. This post simply tries to give an example of using awk, cut, sort, uniq and wc.
