All posts by Ruturaj Vartak

Script to make entries

We have our tables in place.
It's time to get on with the code.

Let's fetch all the search engines and their attributes first.

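// $conn is assumed to be an open mysqli connection; the old mysql_* functions were removed in PHP 7
// ORDER BY se_id keeps the engines in a fixed order, because the first matching engine wins later on
$res_se = mysqli_query($conn, 'SELECT se_id, se_name, se_regex FROM search_engines ORDER BY se_id');
$se = array();
while ($row_se = mysqli_fetch_object($res_se)) {
    $se['id'][] = $row_se->se_id;
    $se['name'][] = $row_se->se_name;
    $se['regex'][] = $row_se->se_regex;
}
mysqli_free_result($res_se);

Let's collect our referers:

$sql = 'SELECT view_ref FROM view_log';
$res = mysqli_query($conn, $sql);
$refs = array();
while ($row = mysqli_fetch_object($res)) {
    $refs[] = $row->view_ref;
}
mysqli_free_result($res);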

Before we extract the keywords from the referers, let me explain how to form a regular expression that grabs the keyword.
Suppose someone searches for the keyword “ruturaj” on Google.
The URL of the Google results page that lists my page will be
http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta=
This is also the referer my page receives when the visitor clicks through to it.

The most important part of this URL is the string “q=ruturaj”, and within it the keyword “ruturaj”.
Let us build the pattern piece by piece…

/ Start of the pattern
.*google Allow any characters before the string “google”
.*? Lazily skip as few characters as possible (the ? makes .* non-greedy; it does not match a literal “?”)
q= The characters immediately before the keyword
([^&]*) A capturing group around a negated character class: it matches any characters except &, so it stops where the next GET parameter begins
.* Allow any trailing characters
/i End of the pattern; the i modifier makes it case-insensitive

So the final pattern would be…
/.*google.*?q=([^&]*).*/i
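To check the pattern, here is a small PHP sketch that applies it to the sample referer with preg_match() and decodes the captured group with urldecode(). The helper name extract_keyword() is only for illustration, and the second URL is a made-up multi-word search.

```php
<?php
// Hypothetical helper: return the decoded keyword captured by $pattern, or null if no match
function extract_keyword(string $pattern, string $referer): ?string
{
    if (!preg_match($pattern, $referer, $matches)) {
        return null;
    }
    // $matches[0] is the whole referer, $matches[1] is the ([^&]*) group
    return urldecode($matches[1]);
}

$google = '/.*google.*?q=([^&]*).*/i';

echo extract_keyword($google, 'http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta='), "\n"; // ruturaj
// Multi-word searches arrive URL-encoded, with "+" standing for a space
echo extract_keyword($google, 'http://www.google.com/search?hl=en&q=guitar+chords&btnG=Search'), "\n"; // guitar chords
```

Note that “Google” appears twice in the first URL (also in btnG=Google+Search). The greedy leading .* first tries the later occurrence, finds no q= after it, and backtracks to www.google.co.in, so the right keyword is still captured.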

We now have the search engine attributes and the referers, so it is time to apply each search engine's regular expression to the referers and grab the keyword.

$keywords = array(); // keyword => number of hits
$se_cnt = array();   // search engine name => number of referrals

for ($i = 0; $i < count($refs); $i++) {
    for ($j = 0; $j < count($se['id']); $j++) {
        if (preg_match($se['regex'][$j], $refs[$i], $matches)) {
            $k = strtolower($matches[1]);
            if (!isset($keywords[$k])) { // first hit for this keyword
                $keywords[$k] = 1;
            } else {
                $keywords[$k] += 1;
            }
            if (!isset($se_cnt[$se['name'][$j]])) { // first referral from this engine
                $se_cnt[$se['name'][$j]] = 1;
            } else {
                $se_cnt[$se['name'][$j]] += 1;
            }
            // Debug output:
            //echo "<p>$refs[$i]<br/><b>{$se['name'][$j]}</b> - " . urldecode($matches[1]) . '</p>';
            break; // stop at the first engine whose pattern matches this referer
        }
    }
}

Sort both arrays in descending order of count, preserving their keys:

arsort($keywords);
arsort($se_cnt);

Pull the keys (the keywords) and the values (the hit counts) into separate arrays:

$query_term = array_keys($keywords);
$query_term_cnt = array_values($keywords);

Finally, strip the URL encoding from each keyword:

$final = array();
for ($i=0; $i<count($query_term); $i++) {
    $final[] = array(urldecode($query_term[$i]), $query_term_cnt[$i]);
}
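To try the whole pipeline without a database, here is a self-contained sketch of the same loop. The three patterns are copied from the search_engines rows, but the referers are made-up sample data and tally_keywords() is a hypothetical helper name. One deliberate difference from the script above: it urldecode()s each keyword before counting, so differently encoded copies of the same query (guitar+chords and guitar%20chords) are tallied together.

```php
<?php
// Hypothetical helper: count keywords and referrals per engine.
// $engines maps engine name => regex, in the order they should be tried.
function tally_keywords(array $engines, array $refs): array
{
    $keywords = [];
    $se_cnt = [];
    foreach ($refs as $ref) {
        foreach ($engines as $name => $regex) {
            if (preg_match($regex, $ref, $matches)) {
                // Decode first so differently encoded copies of a query count as one keyword
                $k = strtolower(urldecode($matches[1]));
                $keywords[$k] = ($keywords[$k] ?? 0) + 1;
                $se_cnt[$name] = ($se_cnt[$name] ?? 0) + 1;
                break; // the first matching engine wins
            }
        }
    }
    arsort($keywords); // highest counts first, keys preserved
    arsort($se_cnt);
    return [$keywords, $se_cnt];
}

// Patterns taken from the search_engines table
$engines = [
    'Google' => '/.*google.*?q=([^&]*).*/i',
    'Yahoo'  => '/.*yahoo.*?p=([^&]*).*/i',
    'MSN'    => '/.*msn.*?q=([^&]*).*/i',
];

// Made-up sample referers; the last one is not a search engine and is ignored
$refs = [
    'http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta=',
    'http://www.google.com/search?q=Ruturaj',
    'http://search.yahoo.com/search?p=guitar+chords',
    'http://search.msn.com/results.aspx?q=guitar%20chords',
    'http://www.ruturaj.net/',
];

[$keywords, $se_cnt] = tally_keywords($engines, $refs);

foreach ($keywords as $keyword => $count) {
    printf("%-15s %d\n", $keyword, $count);
}
foreach ($se_cnt as $engine => $count) {
    printf("%-15s %d\n", $engine, $count);
}
```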

Creating tables

The first thing we need is an access log. Since this tutorial logs to MySQL, we need a table that holds the access log.

An access log needs the following data fields:

  • Accessed URL
  • Referer of the accessed URL
  • User Agent accessing the document
  • IP of the Client
  • Datetime stamp


Access Log Table
Here is the table view_log

CREATE TABLE `view_log` (
 `view_id` int(11) NOT NULL auto_increment,
 `view_url` varchar(255) NOT NULL default '',
 `view_ref` varchar(255) NOT NULL default '',
 `view_ua` varchar(255) NOT NULL default '',
 `view_ip` varchar(20) NOT NULL default '',
 `view_datetime` datetime NOT NULL default '0000-00-00 00:00:00',
 PRIMARY KEY  (`view_id`)
) ENGINE=MyISAM COMMENT='Log of page views';
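Each page request has to write one row into view_log. Here is a minimal sketch of how a page could do that; build_view_log_row() and log_view() are hypothetical helpers, the values come from PHP's $_SERVER array (REQUEST_URI, HTTP_REFERER, HTTP_USER_AGENT, REMOTE_ADDR, REQUEST_TIME), and the insert assumes $conn is an open mysqli connection. Missing values fall back to '' to mirror the NOT NULL default '' columns, and each value is cut to its column width.

```php
<?php
// Hypothetical helper: build one view_log row from the request's server variables
function build_view_log_row(array $server): array
{
    // Read a server variable, defaulting to '' and truncating to the column width
    $field = function (string $key, int $max) use ($server): string {
        return substr((string) ($server[$key] ?? ''), 0, $max);
    };
    return [
        'view_url'      => $field('REQUEST_URI', 255),
        'view_ref'      => $field('HTTP_REFERER', 255),
        'view_ua'       => $field('HTTP_USER_AGENT', 255),
        'view_ip'       => $field('REMOTE_ADDR', 20),
        'view_datetime' => date('Y-m-d H:i:s', $server['REQUEST_TIME'] ?? time()),
    ];
}

// Hypothetical helper: insert the row with a prepared statement ($conn assumed to be mysqli)
function log_view(mysqli $conn, array $row): bool
{
    $stmt = $conn->prepare(
        'INSERT INTO view_log (view_url, view_ref, view_ua, view_ip, view_datetime) VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->bind_param('sssss', $row['view_url'], $row['view_ref'], $row['view_ua'],
        $row['view_ip'], $row['view_datetime']);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}

// Usage at the top of a page:
// log_view($conn, build_view_log_row($_SERVER));
```

An IPv6 address can be longer than 20 characters, so view_ip varchar(20) would truncate it; widen the column if your server is reachable over IPv6.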

Search Engines Table
Next we need a list of the search engines that link to the site. I’ve included the most popular ones, and the ones I know.
The most important field in this table is se_regex: it stores the regular expression that parses the keyword out of the referring URL.

CREATE TABLE `search_engines` (
`se_id` int(11) NOT NULL auto_increment,
`se_name` varchar(255) NOT NULL default '',
`se_regex` varchar(255) NOT NULL default '',
PRIMARY KEY (`se_id`)
) ENGINE=MyISAM COMMENT='Search Engines';

Now that the table structure is in place, let's populate it.
I’ll explain the regular expressions for the engines on the next page.

INSERT INTO search_engines VALUES (1,'Google','/.*google.*?q=([^&]*).*/i');
INSERT INTO search_engines VALUES (2,'Yahoo','/.*yahoo.*?p=([^&]*).*/i');
INSERT INTO search_engines VALUES (3,'MSN','/.*msn.*?q=([^&]*).*/i');
INSERT INTO search_engines VALUES (4,'Netscape','/.*netscape.*?search=([^&]*).*/i');
INSERT INTO search_engines VALUES (5,'AOL','/.*aol.*?query=([^&]*).*/i');
INSERT INTO search_engines VALUES (6,'Alexa','/.*alexa.*?q=([^&]*).*/i');
INSERT INTO search_engines VALUES (7,'AltaVista','/.*altavista.*?q=([^&]*).*/i');
INSERT INTO search_engines VALUES (8,'AllTheWeb','/.*alltheweb.*?q=([^&]*).*/i');
INSERT INTO search_engines VALUES (9,'A9','/.*a9.*?search=([^&]*).*/i');
INSERT INTO search_engines VALUES (10,'DMoz','/.*dmoz.*?search=([^&]*).*/i');
INSERT INTO search_engines VALUES (11,'Lycos','/.*lycos.*?query=([^&]*).*/i');
INSERT INTO search_engines VALUES (12,'Terra Lycos','/.*terra.*?query=([^&]*).*/i');
INSERT INTO search_engines VALUES (14,'Rediff','/.*rediff.*?MT=([^&]*).*/i');

Keyword Statistics Table
This table holds each keyword and its hit count.

CREATE TABLE `keyword_search_stats` (
`keyword_search_stats` int(11) NOT NULL auto_increment,
`keyword` varchar(255) default NULL,
`keyword_count` int(11) default NULL,
`update` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`keyword_search_stats`),
KEY `keyword_index` (`keyword`)
) ENGINE=MyISAM;

I’m creating an index on the keyword column so that lookups by keyword are fast.

Search Engine Statistics Table
This table holds each search engine and the number of referrals from it.

CREATE TABLE `search_engine_stats` (
`search_engine_stats_id` int(11) NOT NULL auto_increment,
`search_engine` varchar(255) default NULL,
`search_engine_count` int(11) NOT NULL default '0',
`update` datetime default NULL,
PRIMARY KEY (`search_engine_stats_id`)
) ENGINE=MyISAM;
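The keyword script ends with $final and $se_cnt in memory; here is a hedged sketch of writing them into the two statistics tables. save_stats() is a hypothetical helper, $conn is assumed to be an open mysqli connection, and inserting a fresh snapshot stamped with NOW() on every run is an assumption, since the tables could equally be cleared and refilled.

```php
<?php
// Hypothetical helper: store one snapshot of the statistics.
// $final is the list of [keyword, hits] pairs and $se_cnt maps engine name => referrals,
// as produced by the keyword script. $conn is assumed to be an open mysqli connection.
function save_stats(mysqli $conn, array $final, array $se_cnt): void
{
    $stmt = $conn->prepare(
        'INSERT INTO keyword_search_stats (keyword, keyword_count, `update`) VALUES (?, ?, NOW())'
    );
    foreach ($final as [$keyword, $count]) {
        $stmt->bind_param('si', $keyword, $count);
        $stmt->execute();
    }
    $stmt->close();

    $stmt = $conn->prepare(
        'INSERT INTO search_engine_stats (search_engine, search_engine_count, `update`) VALUES (?, ?, NOW())'
    );
    foreach ($se_cnt as $engine => $count) {
        $stmt->bind_param('si', $engine, $count);
        $stmt->execute();
    }
    $stmt->close();
}

// Usage, after the keyword script has run:
// save_stats($conn, $final, $se_cnt);
```

The column name update is a reserved word in MySQL, which is why it stays in backticks here just as in the CREATE TABLE statements.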

Search Engine Referer Keyword Tracking

You really want to analyze the sources of your traffic. Most of the time you install and use one of the free software packages available on the net. But if you are a programmer, you will want to know how to track these visitors, search engine keywords, etc. yourself.

Here I’ll show how to develop a solution from the programmer’s point of view.

The most important variables for tracking search engine referrals are HTTP_REFERER and HTTP_USER_AGENT.

I’m assuming you have Apache as the web server and PHP as the scripting language with my favourite MySQL as the database server.

There are two ways that you can track this information:

  • Apache access logs
  • Database logging

Keyword Hits
The final result would look like this:

Keyword       Hit Count
keyword 1     100
keyword 2     70
keyword 3     60
Search Engine Referers
Search engine hit counts:

Search Engine   Referral Count
Google          100
Yahoo           70
MSN             60

In this tutorial, I’ll focus on MySQL logging, so let's begin with that.

Scales and Chords in Scale of C

This entry is part 3 of 41 in the series Guitar

We’ll now play the scale of C. The C major scale has no sharps or flats, only natural notes: C, D, E, F, G, A, B.

In any major scale, the 3rd and 4th notes are separated by a half step (one fret), and so are the 7th and 8th notes; every other pair of neighbouring notes is a whole step (two frets) apart.
So when moving up the frets, the 4th note is just one fret above the 3rd note.
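The half-step rule can be turned into fret arithmetic: a half step is one fret and a whole step is two frets, so a major scale follows the pattern 2, 2, 1, 2, 2, 2, 1 frets (whole, whole, half, whole, whole, whole, half). Here is a small PHP sketch that builds a major scale from that pattern; major_scale() is a hypothetical helper, and since it names notes with sharps only, keys normally written with flats (F, Bb, ...) come out with sharp spellings.

```php
<?php
// The 12 notes of the octave, one fret apart, named with sharps only
const CHROMATIC = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];
// Major scale pattern in semitones (= frets): W W H W W W H
const MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1];

// Hypothetical helper: the 8 notes of a major scale, tonic through the octave
function major_scale(string $tonic): array
{
    $index = array_search($tonic, CHROMATIC, true);
    if ($index === false) {
        throw new InvalidArgumentException("Unknown note: $tonic");
    }
    $scale = [$tonic];
    foreach (MAJOR_STEPS as $step) {
        $index = ($index + $step) % 12; // wrap around after B
        $scale[] = CHROMATIC[$index];
    }
    return $scale;
}

echo implode(' ', major_scale('C')), "\n"; // C D E F G A B C
echo implode(' ', major_scale('G')), "\n"; // G A B C D E F# G
```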

Let us play them in this fashion.

e||----------------------|----------------------|-------0----1----3----|
B||----------------------|------------0----1----|--3-------------------|
G||----------------------|--0----2--------------|----------------------|
D||-------0----2----3----|----------------------|----------------------|
A||--3-------------------|----------------------|----------------------|
E||----------------------|----------------------|----------------------|

--5----7----8----8----|--7----5----3----1----|--0-------------------|
----------------------|----------------------|-------3----1----0----|
----------------------|----------------------|----------------------|
----------------------|----------------------|----------------------|
----------------------|----------------------|----------------------|
----------------------|----------------------|----------------------|

----------------------|------------||
----------------------|------------||
--2----0--------------|------------||
------------3----2----|--0---------||
----------------------|-------3----||
----------------------|------------||

Play all the notes with down strokes, very slowly and at a regular interval.

Let every note ring.

Practice this 20-30 times till you can play the whole thing easily.

Alternate Picking
Now that you can play the scale of C easily, let us start with a new way of picking strings: “Alternate Picking”.
In alternate picking, you pick each note in the opposite direction to the note before it.
i.e. start the C note with a down-stroke; the next note is the open D string, which you pick with an up-stroke; then pick the E note with a down-stroke, and so on…

You must master this picking. To start with, do it very slowly: watch every note you pick and register it in your mind, and while playing that note, work out the position of the next note and the stroke direction it needs.

Chords in Scale of C

[Tonic / Root / Key Note]    [Sub-Dominant]    [Dominant]    [Dominant 7th]
C                                  F                 G             G7
|                                  |                 |
Am                                 Dm                Em
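The chords in this table are the triads built on the notes of the scale: take a scale note, skip one, take the next, skip one, take the next (C-E-G, D-F-A, and so on). The PHP sketch below, with hypothetical helper names, derives them and names each one from its intervals, where a major third is 4 semitones (frets) and a minor third is 3. It also yields B diminished on the 7th note, which the table doesn't list, and G7 is simply the G triad plus the next stacked note, F.

```php
<?php
const NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];
const C_MAJOR = ['C', 'D', 'E', 'F', 'G', 'A', 'B'];

// Distance in semitones (frets) going up from note $from to note $to
function semitones(string $from, string $to): int
{
    $a = array_search($from, NOTES, true);
    $b = array_search($to, NOTES, true);
    return ($b - $a + 12) % 12;
}

// Hypothetical helper: chord on scale degree $degree (0 = C) by stacking every other scale note
function chord_notes(int $degree, bool $seventh = false): array
{
    $steps = $seventh ? [0, 2, 4, 6] : [0, 2, 4];
    return array_map(fn (int $s) => C_MAJOR[($degree + $s) % 7], $steps);
}

// Name a triad from its intervals: major = 4 + 3 semitones, minor = 3 + 4, diminished = 3 + 3
function triad_name(array $triad): string
{
    [$root, $third, $fifth] = $triad;
    $lower = semitones($root, $third);
    $upper = semitones($third, $fifth);
    if ($lower === 4 && $upper === 3) {
        return $root;
    }
    if ($lower === 3 && $upper === 4) {
        return $root . 'm';
    }
    if ($lower === 3 && $upper === 3) {
        return $root . 'dim';
    }
    return $root . '?';
}

foreach (range(0, 6) as $degree) {
    $triad = chord_notes($degree);
    printf("%-5s %s\n", triad_name($triad), implode(' ', $triad));
}
printf("%-5s %s\n", 'G7', implode(' ', chord_notes(4, true))); // G7    G B D F
```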

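Here is how to play each chord. In these diagrams, each number is the finger to use: 1 is the index finger, 3 is the ring finger, etc. A 0 at the beginning of a line means play the string open, and x means do not play the string, or mute it. Each horizontal line is a string, with the high e string at the top and the low E string at the bottom, and the cells between the | marks are the frets, starting with the 1st fret on the left.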
C

0---|---|---|---|
|-1-|---|---|---|
0---|---|---|---|
|---|-2-|---|---|
|---|---|-3-|---|
|---|---|-4-|---|

Am

0---|---|---|---|
|-1-|---|---|---|
|---|-2-|---|---|
|---|-3-|---|---|
0---|---|---|---|
0---|---|---|---|

F

|-1-|---|---|---|
|-1-|---|---|---|
|---|-2-|---|---|
|---|---|-3-|---|
|---|---|-4-|---|
x---|---|---|---|

Notice that two notes are played with the index (1st) finger: this means you need to barre the first finger, i.e. fret both the e and B strings with the 1st finger.
Dm

|-1-|---|---|---|
|---|---|-3-|---|
|---|-2-|---|---|
0---|---|---|---|
0---|---|---|---|
x---|---|---|---|

G

|---|---|-4-|---|
|---|---|-3-|---|
0---|---|---|---|
0---|---|---|---|
|---|-1-|---|---|
|---|---|-2-|---|

Em

0---|---|---|---|
0---|---|---|---|
0---|---|---|---|
|---|-3-|---|---|
|---|-2-|---|---|
0---|---|---|---|

G7

|-1-|---|---|---|
0---|---|---|---|
0---|---|---|---|
0---|---|---|---|
|---|-2-|---|---|
|---|---|-3-|---|

Firefox 1.5 released

The latest version of Firefox, Firefox 1.5 (Deer Park), has been released, along with a new home: Mozilla.com.

Here are some of the features in Firefox 1.5

  • Firefox 1.5 provides easier navigation for everyone, including those who are visually or motor-impaired. Firefox is now the first browser to support DHTML accessibility, which enables Web content to be read aloud – even new kinds of graphics-rich content. Users may navigate with keystrokes rather than mouse clicks, reducing the tabbing required to navigate documents such as spreadsheets. Firefox 1.5 is also the first browser to meet government requirements that software be easily accessible to users with physical impairments.
  • Clear Private Data: protect your privacy with the new Clear Private Data tool. With a single click, you can delete all personal data, including browsing history, cookies, web form entries and passwords.
  • New support for Web Standards including SVG, CSS 2 and CSS 3, and JavaScript 1.6.
  • Improved pop-up blocking
  • Faster browser navigation with improvements to back and forward button performance.
  • Drag and drop reordering for browser tabs.

VirtualHosts: a little bit more…

You can make a domain run on a port other than 80, the default HTTP port. In the previous VirtualHost configuration examples I haven’t specified a port, so it was implicitly 80.

If you want to run the website on a different port, you need to make sure Apache is listening on that port. To do that, set the Listen directive:

Listen 8080

Alternatively, you can also specify the IP address on which it should listen:

Listen 67.66.65.64:8080

If you want to run a name-based VirtualHost on a specific port, make sure you set the NameVirtualHost directive to that port as well:

NameVirtualHost 67.66.65.64:8080

Once you’ve set NameVirtualHost, you need to update the actual VirtualHost configuration as well.
There is just one change to be made:

<VirtualHost 67.66.65.64:8080>
...
</VirtualHost>

Important: all the domains (ruturaj.net, www.ruturaj.net, yourname.com) must resolve to the IP address on which NameVirtualHost is defined; otherwise the configuration has no effect for them.

Setting VirtualHosts

VirtualHosts
The most important part of setting up Apache is setting up the hosts, or VirtualHosts. The term “VirtualHost” comes from the fact that a single host or computer serves many hostnames. Apache was the one to start off this type of hosting: it picks up the Host header from a standard HTTP request to determine the website associated with that host. This is known as name-based virtual hosting, and it is the most common hosting type. The other type is IP-based hosting, which requires each domain to have a separate IP address.

Here I will show you how to set up a name-based virtual host.

A simple GET request for the root of my site looks like this:

GET / HTTP/1.1
Host: www.ruturaj.net

Apache picks up “www.ruturaj.net” from the Host header and maps the request to the virtual host configured for www.ruturaj.net.

Let's assume you have an IP address, 67.66.65.64, that you want to use for virtual hosting. First, you need to tell Apache that this IP is used for name-based virtual hosting:

NameVirtualHost 67.66.65.64

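Now that the IP is set up for virtual hosting, you need to configure the VirtualHosts. (On Apache 2.4 and later, the NameVirtualHost line above is unnecessary: the directive is deprecated, and name-based hosting is enabled automatically when several virtual hosts share an IP address and port.)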

Let us take ruturaj.net as the domain to set up. Here it goes:

<VirtualHost 67.66.65.64>
  ServerName ruturaj.net
  ServerAlias www.ruturaj.net
  DocumentRoot /www/domains/ruturaj.net
  CustomLog logs/ruturaj.net-access_log combined
  ErrorLog logs/ruturaj.net-error_log
  DirectoryIndex index.php
  ServerAdmin ruturaj@ruturaj.net
</VirtualHost>

Now let us review the configuration directives:

  • ServerName: the main server name; it should be the domain name.
  • ServerAlias: an alias, e.g. www.ruturaj.net should mean the same as ruturaj.net over HTTP.
    You can set anything, such as default.ruturaj.net, as well; just make sure that default.ruturaj.net points to 67.66.65.64.
  • DocumentRoot: the file system path of the main directory served for the ruturaj.net domain.
  • CustomLog: the access_log for ruturaj.net. Remember, we set up the “combined” log format in httpd.conf, and we are using it here; if you want a different format, define it with LogFormat before the CustomLog directive.
  • ErrorLog: any errors while serving are logged in this file.
  • DirectoryIndex: defines the default document for a directory, e.g. when you request http://ruturaj.net/ it tells the server to serve “index.php”. You can set it to whatever you want: default-page.html, default.pl, etc.
  • ServerAdmin: the administrator’s email address, which shows up on server error pages.

So if you want to add a configuration for the host “johnsmith.com”, add another block:

<VirtualHost 67.66.65.64>
  ServerName johnsmith.com
  ServerAlias www.johnsmith.com
  DocumentRoot /www/domains/johnsmith.com
  CustomLog logs/johnsmith.com-access_log combined
  ErrorLog logs/johnsmith.com-error_log
  DirectoryIndex index.php
  ServerAdmin admin@johnsmith.com
</VirtualHost>
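To make the Host-header matching concrete, here is a simplified PHP model of how name-based hosting picks a virtual host: compare the Host header (without any :port) case-insensitively against each ServerName and ServerAlias in configuration order, and fall back to the first VirtualHost listed for that IP and port when nothing matches. select_vhost() is a hypothetical function, not Apache's code, and it ignores the * and ? wildcards that ServerAlias also accepts.

```php
<?php
// Hypothetical, simplified model of name-based virtual host selection (not Apache's source).
// $vhosts lists the <VirtualHost> blocks for one IP:port in configuration-file order.
function select_vhost(array $vhosts, string $hostHeader): string
{
    // Drop an optional :port and compare case-insensitively, as host names are not case-sensitive
    $host = strtolower(preg_replace('/:\d+$/', '', trim($hostHeader)));
    foreach ($vhosts as $vhost) {
        $names = array_merge([$vhost['ServerName']], $vhost['ServerAlias'] ?? []);
        foreach ($names as $name) {
            if (strtolower($name) === $host) {
                return $vhost['ServerName'];
            }
        }
    }
    // No ServerName or ServerAlias matched: the first listed virtual host is used
    return $vhosts[0]['ServerName'];
}

$vhosts = [
    ['ServerName' => 'ruturaj.net',   'ServerAlias' => ['www.ruturaj.net']],
    ['ServerName' => 'johnsmith.com', 'ServerAlias' => ['www.johnsmith.com']],
];

echo select_vhost($vhosts, 'www.ruturaj.net'), "\n";        // ruturaj.net
echo select_vhost($vhosts, 'WWW.JohnSmith.com:8080'), "\n"; // johnsmith.com
echo select_vhost($vhosts, 'unknown.example'), "\n";        // ruturaj.net (first vhost)
```

Because of that fallback, the first VirtualHost block for an address also catches requests with an unknown Host header, so it is worth making it a sensible default site.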

The httpd.conf file

The httpd.conf file is the main configuration file of Apache. It lives in “apache-install-dir/conf”.

Now let's take a look at some important and useful directives.

ServerName
This directive sets the default server name. It should generally be the FQDN (Fully Qualified Domain Name) of the machine, or its IP address if the machine doesn’t have an FQDN.

Directory
This section encloses settings that apply to the given directory, which you specify as a physical (file system) path. So if you have a directory /websites/mywebsite/somedir, you would write the following:

<Directory /websites/mywebsite/somedir>
... your settings
...
</Directory>

AllowOverride
AllowOverride lets users override some of the settings with their own file: the magical .htaccess file. By default it is set to None, which means users can’t override the settings by placing a .htaccess file in the directory. To allow it, change AllowOverride None to AllowOverride All.

Options
This directive takes several options; I’ll explain some of them.
Indexes: this allows a directory listing when the directory has no index document. You must have come across something like this:
Directory Listing

FollowSymLinks: this allows Apache to follow symbolic links. Symbolic links are file system links on *nix systems, e.g. “files” in /etc/ can point to /files/myfiles/files.
You can set both options at once:

Options +Indexes -FollowSymLinks

The above setting allows directory listings but not symbolic links: “+” enables an option and “-” disables it.

AccessFileName
I talked about the magic .htaccess file; this directive specifies the name of that file. By default it is “.htaccess”.
The leading “.” (period) makes it a hidden file on *nix systems.

Denying files
Denying access to files over the web is the server's job. In Apache, we can do exactly that with the Files directive:

<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</Files>

Note the ~ sign: it indicates that the argument is a regular expression matched against file names. Once the files are selected, they can be denied with the Deny directive.
The above regex denies all files whose names start with “.ht”, such as .htaccess and .htpasswd.
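Apache uses the PCRE regular expression library, and so do PHP's preg_* functions, so a quick way to check which file names the ^\.ht pattern selects is a few lines of PHP. is_denied() is a hypothetical helper; the point to notice is that the ^ anchor restricts the match to names that begin with .ht.

```php
<?php
// Hypothetical helper: does the <Files ~ "^\.ht"> pattern select this file name?
function is_denied(string $filename): bool
{
    return preg_match('/^\.ht/', $filename) === 1;
}

foreach (['.htaccess', '.htpasswd', 'index.html', 'my.htaccess'] as $name) {
    printf("%-12s %s\n", $name, is_denied($name) ? 'denied' : 'served');
}
// my.htaccess is served: ".ht" appears in it, but not at the start of the name
```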

Access Logs
To create access logs, we need to specify the log format and the file path.
First, set the LogFormat directive.
The most common format is the “combined” log, which records the client IP, identity, user, time, request line, status code, response size, referer and user agent:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Note: the log format has been given the name “combined”; feel free to create different formats for your needs and name them accordingly.
Then set the file name of the log:

CustomLog /usr/local/apache/logs/access_log combined

The second argument of the CustomLog directive (after the file name) is the log format name that we defined earlier.
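Since the “combined” format has a fixed field order, each access_log line can be split back into its fields; this is how the Apache access log route could feed the same referer analysis instead of the MySQL table. Here is a minimal PHP sketch; parse_combined_line() is a hypothetical helper and the sample line is made up. For simplicity it does not handle the \" escapes Apache writes when a quoted field itself contains a double quote.

```php
<?php
// Hypothetical helper: split one "combined" format line into named fields, or return null
function parse_combined_line(string $line): ?array
{
    // %h %l %u [%t] "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // not a combined-format line
    }
    return [
        'host'       => $m[1],
        'ident'      => $m[2],
        'user'       => $m[3],
        'time'       => $m[4],
        'request'    => $m[5],
        'status'     => (int) $m[6],
        'bytes'      => $m[7] === '-' ? 0 : (int) $m[7], // %b logs "-" when no body was sent
        'referer'    => $m[8],
        'user_agent' => $m[9],
    ];
}

$line = '192.168.0.84 - - [10/Oct/2005:13:55:36 +0530] "GET /index.php HTTP/1.1" 200 2326 '
      . '"http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta=" '
      . '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5"';

$fields = parse_combined_line($line);
echo $fields['referer'], "\n";
```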

Server-Status
When you want to look at the current status of the server, i.e. whom it is responding to, which pages it is serving, how many server processes are running, and so on,
there is no better way than to enable server-status.
Check the screenshot of it:

server-status

To enable it …

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 192.168.0.84
</Location>

Note that this configuration allows only the IP 192.168.0.84 to view the status page; all other clients are forbidden. Set your own IP as you wish.
If you want even more information, turn on extended status:

ExtendedStatus On