Computing: Website and Database Programming

Website visits analysis.


2. Simple visits count.

2.1. What and how to count?
 
Aim of the application: daily counting of the visits on the site (home page), as well as on its major parts: sections of Computing (Computer Basics, Lazarus/Free Pascal Programming, Website and Database Applications) and subsections of Homless Life in Luxembourg. First approach: each time one of these pages is accessed, increment a counter in a MySQL database for this page and the actual date.
Is this really what I wanted? The answer is No! Proceeding this way, I count all hits on the different pages getting values normally lots bigger than the visits number (i.e. number of different visits to a given section). In fact if the visitor goes to Computer Basics, than back to the home page in order to visit the Lazarus page and the same to take a look at the Website and Database section, the count for the home page would be incremented by 3 and this value not really gives an estimate of the number of visits. Thus, second approach: check if the visitor comes from "a level below" the page where the counting is done (e.g. the Lazarus GUI Applications page, when counting visitors for the Lazarus page); increment the counter only if this is not the case.
The PHP variable $_SERVER['HTTP_REFERER'] contains the URL from where the visitor comes, thus checking the "level below" is nothing else than checking if this URL refers to the hosting server and if it contains the URL part of the page where the counting is done (e.g. /computing/website/ for the Website and Database page). Thus, 1. writing a script containing the database code and 2. in the HTML of the site sections main pages, inserting a variable with the URL to compare the referer to and an include instruction for the script - and it's done! Perhaps you'll find my code bizarre. The reason is that I have no idea about PHP! No importance, the script is small and quick and does exactely what I wanted (in fact not exactly: if the visitor goes to the different Database pages, then to the home page and finally returns to the main Database page, the count is incremented by 2. What's not a real problem, as it is not so different from 2 visits at 2 different moments of the day).
2.2. What and how to do?
 
For now, just considering a simple PHP application with MySQL backend (the frontend with statistics display to be created later). I chose to create 2 MySQL tables, the first (web_sites) defining the pages to be monitored and the second (site_hits) containing the daily counts. This is not necessary but more in the sense of how relational databases are organized. Thus, to do:
  • Creating the MySQL database (statistics) and a user (stats) with insert and update access to its tables
  • Filling the web_sites database table
  • Creating the script "site_hits.php" (and place it for example in "/sitestats")
  • Modifying the concerned index.html files (and renaming them to index.php!) by inserting the URL variable and the include instruction
  • In all HTML files, changing the links refering to the now called index.php files
Code preview for the PHP script:
  1. Split the referer URL in order to check if the URL of the page where the count is actually done is part of it
  2. If this is NOT the case, proceed with the increment of the counter in the MySQL table
  3. Read the visits count for the actual URL and the actual date from the database
  4. If the count is 0, insert a new record in the site_hits table
  5. If the count is not 0, add 1 to the counter and update the site_hits table
  6. If the URL refers to a "level below", do nothing (incrementing a second counter here, the sum of the 2 counters would give the total site hits)
2.3. Bypassing the index pages.
 
The script actually does what I expected it to do, but what happens if the visitor bypassses the index pages? Well, nothing at all, no count update is done in this case! This is a real issue: I could imagine that people knowing my site and knowing that all new staff is mentionned on the "What's new?" page, use the links there in preference to passing through the index page. What can I do to also count these visits? The links on the "What's new?" page don't go deeper than 1 level below the index page. Thus transforming the HTML files there to PHP files, including the code to run the site_hits.php script (with e.g. specifying $url = '/computing/lazarus/'; in lazarus_gui.php ... and that's it! However, there's still one case where visitors access the site and counting isn't correctly done: people exclusively interrested in my Homeless site perhaps bookmark this site and always go there without using the main index page. In this case, the count for /sdf/ is correctly updated, but no count increment is done for /. Nothing's perfect in this imperfect world! To get statistics with really real numbers of visitors, visits, site hits, file access... a lots more complicated approach, updating counts based on the visitor's IP address is necessary. And that's not what I intended to do with my "Simple visits count" application!
2.4. Running CGI scripts.
 
The script, as previewed so far, will fail to correctly update the visits count if CGI scripts are executed. That's simply because the referer URL now contains a "/cgi-bin/" chunk and by that the visits count script doesn't anymore be able to see that the page where the count is actually done is part of the referer URL and increments the count, as if a new access to the page would have been done.
Not difficult to resolve this issue. Just check if the referer URL contains a "/cgi-bin/" chunk and don't do the visits count update if the visitor comes form this server and either the actual URL is part of the referer URL or contains the substring "/cgi-bin/".
2.5. SQL and PHP code.
 
You can view the code by opening the tabs below. Or click the following link to download all you need to install this appliation.
a. SQL to create (and fill) the tables.
CREATE TABLE web_sites (
    site_id CHAR(2) NOT NULL,
    site_seqnum TINYINT UNSIGNED NOT NULL,
    site_url VARCHAR(32) NOT NULL,
    site_description VARCHAR(40) NOT NULL,
    PRIMARY KEY (site_id)
);

CREATE TABLE site_hits (
    web_site CHAR(2) NOT NULL,
    access_date DATE NOT NULL,
    access_count SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (web_site, access_date),
    FOREIGN KEY (web_site)
        REFERENCES web_sites (site_id)
        ON DELETE RESTRICT
        ON UPDATE RESTRICT
);

INSERT INTO web_sites
    VALUES ('AL', 0, '/', '');
INSERT INTO web_sites
    VALUES ('CB', 10, '/computing/basics/', 'Computer basics');
INSERT INTO web_sites
    VALUES ('CL', 20, '/computing/lazarus/', 'Lazarus / Free Pascal programming');
INSERT INTO web_sites
    VALUES ('CW', 30, '/computing/website/', 'Website and database applications');
...
You may wonder what the "site_seqnum" column in the "web_sites" database is for. I included it to explicitely specify a sort order, to get the listings of my counts display scripts sorted exactly as I want.
b. PHP code to be inserted into the index.php files.
<?php

// Change $url variable accordingly to the page for which the count is done
$url = '/computing/website/';
$path = $_SERVER['DOCUMENT_ROOT'] . '/sitestats/site_hits.php';
include($path);

?>
c. The visits counting script (/sitestats/site_hits.php).
<?php

if (!isset($url)) {
    $url = '/';
}
$server = ''; $ref = ''; $cgi = '';
// Extract the server name and the URL, where the visitor comes from, from $_SERVER['HTTP_REFERER'] variable
if (isset($_SERVER['HTTP_REFERER'])) {
    $ref = $_SERVER['HTTP_REFERER'];
    $refchunks = explode('http://', $ref);
    $ref = $refchunks[1];
    $refchunks = preg_split("/\//", $ref, 2);
    $server = $refchunks[0]; $ref = '/' . $refchunks[1];
    $refchunks = str_split($ref, strlen($url));
    $ref = $refchunks[0];
    $refchunks = str_split($ref, 9);
    $cgi = $refchunks[0];
}
// Do database update only if the visitor comes from a URL that is not "situated at a level
// below" the actual URL and that is not a cgi-bin URL on this server
if (!($server == $_SERVER['SERVER_NAME'] and ($ref == $url or $cgi == '/cgi-bin/'))) {
    // Connect to the "statistics" database
    $user = 'stats'; $passwd = 'password';
    $pdo = new PDO('mysql:host=localhost;dbname=statistics', $user, $passwd);
    // Get the site_id for the web site to be counted from the web_sites table
    $sql = "SELECT site_id AS _site FROM web_sites ";
    $sql .= "WHERE site_url = '$url'";
    $sth = $pdo->query($sql);
    $row = $sth->fetch(PDO::FETCH_ASSOC);
    $site = $row['_site'];
    if ($site) {
        // Get the count for the site to be counted at the actual date from the site_hits table
        $date = date("Y") . "-" . date("m") . "-" . date("d");
        $sql = "SELECT access_count AS _count FROM site_hits ";
        $sql .= "WHERE (web_site = '$site') AND (access_date = '$date')";
        $sth = $pdo->query($sql);
        $row = $sth->fetch(PDO::FETCH_ASSOC);
        $count = $row['_count'];
        // If this count is zero (i.e. there is not yet an entry for this site at this date
        // in the table), insert a new record in the site_hits table
        if ($count == 0) {
            $count = 1;
            $sql = "INSERT INTO site_hits ";
            $sql .= "VALUES ('$site', '$date', $count)";
            }
        // If this count is not zero (i.e. there is already an entry for this site at this date
        // in the table), update the record in the site_hits table, incrementing the counter by 1
        else {
            $count++;
            $sql = "UPDATE site_hits ";
            $sql .= "SET access_count = $count ";
            $sql .= "WHERE (web_site = '$site' AND access_date = '$date')";
        }
    }
    $sth = $pdo->query($sql);
}

?>
2.6. Security and error handling.
 
You may have wondered if including username and password in the site_hits.php script wouldn't be a security issue, if doing so couldn't make these data visible to the website's visitor. The answer is NO! In fact, the PHP code itself is never included into a website; the only thing that appears there is the output of the script (and that's nothing at all in this case). So there's no possibility to see your credentials in the webbrowser - so long as the webserver itself is configured correctly. This means first that the directory where the script is stored is configured as "No Indexing", i.e. that you can't list the directory's content (but get an "Error 403 - Forbidden" if you access the directory). A second point is that the directory containing the script must not be accessible via (anonymous) FTP. If there is a security issue to be considered, then it's on the server itself, in the case where other people have direct access to your web folders. As conclusion, you may say that including passwords in PHP scripts doesn't normally constitute any real security problem, even though a better choice would be doing this data in an INI file, stored at a secure place in your home directory (outsite your webserver's htdocs) and reading it in using a database_connect script, included in all PHP scripts that access the database.
The script doesn't include any code to handle errors when connecting to or otherwise accessing the MySQL database. This is not actually needed! In fact, if something goes wrong (e.g. the database server is down), PHP outputs an error message by itself. The script being placed in my "clearfloat" <div>, defined with attributes "height:0; line-height: 0px; font-size: 1px;" there isn't any display in the browser (except a small black line) and the visitor can continue using the site just as if all would work fine (the difference being that the database counts aren't updated, of course).
2.7. Extending the script.
 
It would be easy to make some (useful) extensions to the site_hits.php script. Here some possibilities:
  • As mentionned above, counting total site hits by using a second counter
  • Doing a more detailed counting by considering the URL of pages at one level below the index page
  • Counting per half-day or even per hour instead of per day
2.8. Does this application really do what I wanted?
 
My "visits counts" application is running for several months now, I installed further scripts to display statistics for the last 7 days and the current month, as well as various charts showing the visits counts at the different sections of my website. But do these counts really reflect the visits on my site? No idea, why I didn't realize from the beginning, what I'm aware of now: Statistics, based on the visits counts, as described here, are of limited interest only! In fact, I suppose that a large part of the visitors coming to my website, come there by using a link provided by search engines. This means, they go directly to a particular page (where no counting is done) and even, if from there they go the main or section index, as coming from sub-level URLs is filtered out, there visit will not be recorded in the database at all! There is, of course, the possibility to transform all HTML to PHP pages, and to include the code, that calls /sitestats/site_hits.php, within all pages of the site.
In order to remain simple, but at least doing the counting if visitors navigate from a search engine document to an index page, removing the "not coming from a sub-level page" filter would resolve the issue (partially resolve the issue, because if the visitor only looks at the individual document, there were still no counting done). On the other side, counting 4 visits to the homeless section, if a visitor opens 4 times the homeless section index page to go to the different subsections, is not really useful statistics data either... Decided to let all as is and previewing a completely different approach visits counts application some day: Parsing the Apache access log files, counting access to individual documents and summing up these to get the visits counts for the site's sections and subsections...