We have our tables in place.
It's time to get on with the code.
Let's fetch all the search engines and their attributes first.
// Load every search engine row into parallel arrays, one per attribute.
$res_se = mysql_query('select * from search_engines', $conn);
$se = array();
while ($row_se = mysql_fetch_object($res_se)) {
    $se['id'][]    = $row_se->se_id;
    $se['name'][]  = $row_se->se_name;
    $se['regex'][] = $row_se->se_regex;
}
mysql_free_result($res_se);
Let's collect our referers:
$sql = 'select view_ref from view_log';
$res = mysql_query($sql, $conn);
$refs = array();
while ($row = mysql_fetch_object($res)) {
    $refs[] = $row->view_ref;
}
mysql_free_result($res);
Before we extract the keywords from the referers, let me explain how to form a regular expression that grabs the keyword.
Suppose someone searches for the keyword “ruturaj” on google.com.
The URL of the page where Google lists my page will then be
http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta=
This will also be the referer.
The most important part of this URL is the string “q=ruturaj”, and from that string we want the keyword “ruturaj”.
Let us build the pattern piece by piece…
| Part | Meaning |
| --- | --- |
| / | Pattern starts |
| .*google.* | Allow any characters around the string “google” |
| [?&]q= | The characters prefixing the keyword: the parameter name “q=”, preceded by “?” or “&”. An unescaped “?” on its own would act as a quantifier rather than a literal character, so it goes inside a character class |
| ([^&]*) | Start a capturing group that accepts any characters except &, which marks the start of the next GET parameter |
| .* | Allow any trailing characters |
| /i | End the pattern, specifying that it is case-insensitive |

So the final pattern would be… /.*google.*[?&]q=([^&]*).*/i
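As a quick check, here is a minimal sketch that applies the pattern to the example referer using `preg_match`. It writes the prefix as `[?&]q=`, because an unescaped `?` in a regular expression is a quantifier and not a literal character; treat that as an assumption about how the stored regex should look. The variable names are only for illustration.

```php
<?php
// The example referer discussed above.
$referer = 'http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta=';

// Assumption: the prefix is written as [?&]q= so that only a GET parameter
// named exactly "q" is matched, whether it comes first or later in the query.
$pattern = '/.*google.*[?&]q=([^&]*).*/i';

$keyword = null;
if (preg_match($pattern, $referer, $matches)) {
    // $matches[0] holds the whole match, $matches[1] the first captured group.
    $keyword = urldecode($matches[1]);
}

echo $keyword, "\n"; // prints "ruturaj"
```

The captured group is still URL-encoded when it comes out of `preg_match`, which is why it is passed through `urldecode` before printing.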
We now have the search engine attributes and the referers. It is time to apply each search engine's regular expression to the referers and grab the keyword.
$keywords = array(); // keyword => number of hits
$se_cnt = array();   // search engine name => number of hits
for ($i = 0; $i < count($refs); $i++) {
    for ($j = 0; $j < count($se['id']); $j++) {
        if (preg_match($se['regex'][$j], $refs[$i], $matches)) {
            // Lowercase so that "Ruturaj" and "ruturaj" are counted together.
            $k = strtolower($matches[1]);
            if (!isset($keywords[$k])) { // first time we see this keyword
                $keywords[$k] = 1;
            } else {
                $keywords[$k] += 1;
            }
            if (!isset($se_cnt[$se['name'][$j]])) { // first hit for this engine
                $se_cnt[$se['name'][$j]] = 1;
            } else {
                $se_cnt[$se['name'][$j]] += 1;
            }
            //echo "<b>{$se['name'][$j]}</b> - " . urldecode($matches[1]) . '<br/>';
            // A referer belongs to at most one engine, so stop at the first match.
            break;
        }
    }
}
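To try the matching loop without a database, here is a minimal sketch that runs the same logic over hard-coded data. The two engines, their patterns, and the four referers are hypothetical stand-ins for the rows of search_engines and view_log; the Yahoo pattern assumes its query parameter is `p`.

```php
<?php
// Hypothetical stand-in for the rows of the search_engines table.
$se = array(
    'id'    => array(1, 2),
    'name'  => array('Google', 'Yahoo'),
    'regex' => array(
        '/.*google.*[?&]q=([^&]*).*/i',
        '/.*yahoo.*[?&]p=([^&]*).*/i',
    ),
);

// Hypothetical stand-in for the view_ref column of view_log.
$refs = array(
    'http://www.google.co.in/search?hl=en&q=ruturaj&btnG=Google+Search&meta=',
    'http://www.google.com/search?q=Ruturaj',
    'http://search.yahoo.com/search?p=php+referer&ei=UTF-8',
    'http://example.com/some/page.html', // not a search engine, matches nothing
);

$keywords = array(); // keyword => number of hits
$se_cnt = array();   // search engine name => number of hits
foreach ($refs as $ref) {
    foreach ($se['regex'] as $j => $regex) {
        if (preg_match($regex, $ref, $matches)) {
            $k = strtolower($matches[1]);
            $keywords[$k] = isset($keywords[$k]) ? $keywords[$k] + 1 : 1;
            $name = $se['name'][$j];
            $se_cnt[$name] = isset($se_cnt[$name]) ? $se_cnt[$name] + 1 : 1;
            break; // stop at the first engine that matches this referer
        }
    }
}

print_r($keywords); // ruturaj => 2, php+referer => 1
print_r($se_cnt);   // Google => 2, Yahoo => 1
```

Note that the keys of `$keywords` are still URL-encoded at this stage ("php+referer"), so decoding remains a separate step.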
Sort both arrays in descending order of count, preserving their keys:
arsort($keywords);
arsort($se_cnt);
Grab the keys and the values into separate arrays…
$query_term = array_keys($keywords);
$query_term_cnt = array_values($keywords);
Finally, strip the URL encoding from each keyword.
$final = array();
for ($i = 0; $i < count($query_term); $i++) {
    $final[] = array(urldecode($query_term[$i]), $query_term_cnt[$i]);
}
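The sorting and decoding steps can be seen end to end in the sketch below. The counts are made-up sample data shaped like the `$keywords` array, and a single `foreach` replaces the `array_keys`/`array_values` pair, which is a stylistic variation and not a change in behaviour.

```php
<?php
// Hypothetical counts, shaped like the $keywords array built earlier.
$keywords = array(
    'php+referer' => 1,
    'ruturaj'     => 3,
    'caf%c3%a9'   => 2,
);

// Highest count first; arsort keeps each keyword attached to its count.
arsort($keywords);

// Build (decoded keyword, count) pairs in sorted order.
$final = array();
foreach ($keywords as $term => $count) {
    $final[] = array(urldecode($term), $count);
}

foreach ($final as $row) {
    echo $row[0], ': ', $row[1], "\n";
}
```

`urldecode` turns "+" into a space and "%c3%a9" into the UTF-8 bytes for "é", so the report shows "php referer" and "café" in place of their encoded forms.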