Unique visitors in Apache logs

Problem

I have an Apache web server connected to the internet and I want to know how many unique visitors it receives.
I want to tell visitors apart by their source IP address.

Solution

By default, Apache access log files have the following syntax:

IP  RFC1413_identity userid [date] HTTP_request HTTP_code_response Downloaded_bytes

You can configure this output following the instructions given by Apache:
https://httpd.apache.org/docs/2.2/logs.html

This is an excerpt from my Apache access log:

180.76.5.196 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
157.55.32.58 - - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
65.55.52.117 - - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
65.55.52.117 - - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
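To see how the fields of that format line up with whitespace-separated columns, here is a small sketch using one line from the excerpt. It assumes the request part contains exactly three space-separated pieces (method, path, protocol), which holds for the lines above but is not guaranteed for every request; the variable names are only for illustration.

```shell
# One line in the format shown above (taken from the excerpt).
line='180.76.5.196 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335'

# Split on whitespace: $1 = IP, $9 = HTTP status, $10 = bytes sent.
# The date and the request each span several columns because they contain spaces.
fields=$(printf '%s\n' "$line" | awk '{print $1, $9, $10}')
echo "$fields"
```

The IP is always the first column, which is why the commands in this post only need field 1.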

I can get the list of IPs by getting the first column when using a whitespace as separator:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1

or with awk:

cat webgenekfca.com-access_log  | awk -F" " '{print $1}'

But there are many duplicate IP addresses, so I filter them out with the command uniq. Since uniq only removes duplicates that are on adjacent lines, the list has to be sorted first:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1 | sort | uniq
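As a side note, uniq can also count how many times each IP appears with its -c option, which gives a rough idea of the number of requests per visitor. The sketch below writes the excerpt above to a temporary file (a stand-in for the real log, so it can be run anywhere) and lists the IPs from most to least active.

```shell
# Temporary sample log built from the excerpt above (stand-in for the real file).
log=$(mktemp)
cat > "$log" <<'EOF'
180.76.5.196 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
157.55.32.58 - - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
65.55.52.117 - - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
65.55.52.117 - - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
EOF

# uniq -c prefixes each distinct IP with its number of occurrences;
# sort -rn then orders the result by that count, highest first.
per_ip=$(cut -d ' ' -f 1 "$log" | sort | uniq -c | sort -rn)
echo "$per_ip"

# Most active IP as "count ip" with the padding removed.
top=$(echo "$per_ip" | head -n 1 | awk '{print $1, $2}')
rm -f "$log"
```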

Finally, if I just want to count the number of different IPs, I pipe the result to wc -l:

cat webgenekfca.com-access_log  | cut -d ' ' -f 1 | sort | uniq | wc -l
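The same count can be obtained with fewer processes. This is only a sketch of two common alternatives, run against a temporary copy of the excerpt above: sort -u sorts and removes duplicates in one step, and awk can remember the IPs it has already seen in an array, so no sorting is needed at all.

```shell
# Temporary sample log built from the excerpt above (stand-in for the real file).
log=$(mktemp)
cat > "$log" <<'EOF'
180.76.5.196 - - [15/Feb/2014:10:04:03 +0100] "GET /datamatrixes/1 HTTP/1.1" 200 5335
157.55.32.58 - - [15/Feb/2014:10:09:41 +0100] "GET /preprocessors/5 HTTP/1.1" 200 29680
65.55.52.117 - - [15/Feb/2014:23:10:30 +0100] "GET /robots.txt HTTP/1.1" 200 102
65.55.52.117 - - [15/Feb/2014:23:11:50 +0100] "GET /datamatrixes?page=1&size=25 HTTP/1.1" 200 10731
EOF

# sort -u replaces "sort | uniq"; tr strips the padding some wc versions add.
unique_sort=$(cut -d ' ' -f 1 "$log" | sort -u | wc -l | tr -d ' ')

# awk only: count an IP the first time it shows up in column 1.
unique_awk=$(awk '!seen[$1]++ { n++ } END { print n + 0 }' "$log")

echo "$unique_sort $unique_awk"
rm -f "$log"
```

Both commands read the file directly, so the leading cat is not needed either.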

I know there are better ways to estimate the number of users. Cookies, for example, are very helpful because they make it possible to follow a user's behaviour even if their IP address changes. This post simply tries to give a usage example of awk, cut, sort, uniq and wc.
