Many content sites (blogs, news sites) that you see these days have a specific URL for each page, e.g. on News.com.
Many of these sites use something like /news/75-news-title.html. You can’t keep an actual distinct page for every piece of content, so the most common solution is to rewrite the URL.
The idea is to grab a specific piece of the URL (the content ID, or whatever identifies the content) and pass it as the value of a GET parameter to a single page.
In the above scenario, look at the URL /news/75-news-title.html. What is commonly done is that the content_id, the key by which the content is mapped in the content table, is placed in the URL along with the title.
In this case, Content ID: 75, Title Text: news-title (the hyphens keep it readable, instead of %20 for each space).
So let’s assume that we have a page called news.php which takes a GET parameter called newsid. All we have to do now is write the URL rewrite rule using Apache’s mod_rewrite engine.
First we use the server variable REQUEST_URI to match the pattern against the request. The variable is referenced using the %{SERVER_VARIABLE} format.
RewriteCond is basically an if condition: only if the condition is true will the rule below it be executed. That means the pattern must match for the rule to work.
The regular expression pattern accepts an integer value after /news/. After that integer, any text can come, but the URL must end with .html, as enforced by the $ at the end.
Now if the condition holds, we need to write the rule for it, so we use RewriteRule. The first argument is .*, which means accept any URL.
The second argument is the actual mapping to news.php with the newsid parameter. Note that we’ve used %1, which is the first back reference of the RewriteCond regex pattern.
Since our pattern was /news/([0-9]+).*\.html$ and had just one group in it, that group, i.e. ([0-9]+), is referenced by %1 in the RewriteRule directive.
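Putting the pieces described above together, the rule could look roughly like the sketch below. It assumes the rules live in a .htaccess file in the DocumentRoot, that news.php sits in that same directory, and that mod_rewrite is loaded. The RewriteEngine line and the [L] flag are my additions and are not taken from the explanation above.

```apache
# Switch the rewrite engine on for this directory (assumed, needed once)
RewriteEngine On

# Condition: the request must look like /news/<number><any text>.html
RewriteCond %{REQUEST_URI} ^/news/([0-9]+).*\.html$

# Rule: accept any URL and map it to news.php;
# %1 is the ([0-9]+) group captured by the RewriteCond above.
# [L] (assumed) stops processing further rules once this one matches.
RewriteRule .* /news.php?newsid=%1 [L]
```

With this in place, a request for /news/75-news-title.html would be served by /news.php?newsid=75.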
Long before anything like web.config or web.xml was invented, Apache had this wonderful file, “.htaccess”.
This file, as you would expect, controls the web application’s behaviour. The possibilities with this file are endless: from password-protected directories to complex URL rewrites, all of it can be done using this file.
.htaccess
The file’s name is just the extension “htaccess”, with nothing in front of the period. This comes from the *nix convention that hidden files start with a period “.”.
This file can be placed in any directory of your web application. Let’s say your DocumentRoot is /domains/ruturaj.net. If you place the .htaccess file in the DocumentRoot itself, any configuration present in that file also applies to all the subfolders of ruturaj.net.
So if I put the following code in the .htaccess file,
DirectoryIndex rutu-default.php
all the subdirectories or folders in the directory ruturaj.net will have rutu-default.php as the default index page.
But to ensure that the .htaccess file is read and applied, you need to tell Apache.
To tell Apache what the per-directory configuration file is called, you need to modify the entry in the httpd.conf file. AccessFileName is the directive which names that file; by default, its value is “.htaccess”.
AccessFileName .htaccess
There is another directive, AllowOverride, which tells Apache whether to read and apply the AccessFileName file. You need to make the following setting in the Directory section for your site (for example, the one inside your VirtualHost):
AllowOverride All
This will enable the implementation of the .htaccess file.
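As a rough sketch, the two settings above could sit together in httpd.conf like this. The directory path is the example DocumentRoot used earlier, and wrapping AllowOverride in a Directory section is my assumption about where it goes.

```apache
# Name of the per-directory configuration file (this is already the default)
AccessFileName .htaccess

# Let .htaccess files under the example DocumentRoot override settings
<Directory "/domains/ruturaj.net">
    AllowOverride All
</Directory>
```

If you only need a few features, AllowOverride also accepts narrower values such as FileInfo or AuthConfig instead of All.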
You really want to analyze your sources of traffic. Most of the time you install and use some of the free software available on the net. But if you are a programmer, you will want to know how to track these visitors, search engine keywords, etc. yourself.
Here I’ll be showing the programmer’s point of view on developing a solution.
The most important inputs for tracking search engine referrals are the HTTP_REFERER and HTTP_USER_AGENT variables.
I’m assuming you have Apache as the web server and PHP as the scripting language, with my favourite, MySQL, as the database server.
There are two ways that you can track the above:
Apache access logs
Database logging
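For the first option, Apache access logs, a minimal sketch could be two extra log files that record only the referer and the user agent. The format nicknames and the log file paths here are assumptions for illustration, so adjust them to your setup.

```apache
# Log the referring URL together with the page that was requested
LogFormat "%{Referer}i -> %U" referer
CustomLog logs/referer_log referer

# Log the browser or crawler user agent
LogFormat "%{User-agent}i" agent
CustomLog logs/agent_log agent
```

The search keywords would then have to be parsed out of the logged referer URLs by a separate script.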
Keyword Hits
So the final result would look like
Keyword        Hit Count
keyword 1      100
keyword 2      70
keyword 3      60
Search Engine Referers
Search engine hit counts
Search Engine  Ref. Count
Google         100
Yahoo          70
MSN            60
In this tutorial, I’ll be focusing on the MySQL logging. So let’s begin with it.
You can make a domain run on a port other than 80, which is the default HTTP port. In the previous examples of VirtualHost configurations I haven’t specified the port, so it is implicitly 80.
If you want to run the website on a different port, you need to make sure Apache is listening on that port. To do that, you set the Listen directive:
Listen 8080
Alternatively you can also specify the IP on which it should listen.
Listen 67.66.65.64:8080
Now if you want to run a name-based VirtualHost on a specific port, make sure that you set the NameVirtualHost directive to that port as well.
NameVirtualHost 67.66.65.64:8080
Once you’ve set NameVirtualHost, you need to set the actual VirtualHost configuration as well.
There is just one change to be made…
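A minimal sketch of what that could look like: the only difference from a port 80 setup is that the VirtualHost argument carries the same IP and port as NameVirtualHost. The DocumentRoot path is an assumption borrowed from the .htaccess example. On Apache 2.4 the NameVirtualHost line is deprecated and can be dropped.

```apache
Listen 67.66.65.64:8080
NameVirtualHost 67.66.65.64:8080

# The address and port here must match the NameVirtualHost line
<VirtualHost 67.66.65.64:8080>
    ServerName ruturaj.net
    ServerAlias www.ruturaj.net
    # Assumed path, for illustration only
    DocumentRoot /domains/ruturaj.net
</VirtualHost>
```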
Important: You should note that all the domains (ruturaj.net, www.ruturaj.net, yourname.com) should always resolve to an IP address on which NameVirtualHost is defined. Without that, the configuration does not make any sense.
VirtualHosts
The most important part of setting up Apache is setting the hosts, or VirtualHosts. The term “VirtualHost” comes from the fact that one single host or computer is hosting many hostnames. Apache was one of the first servers to offer this type of hosting. Here Apache picks up the Host header from a standard HTTP request to find the website associated with that host. This is known as name-based virtual hosting, which is the most common of all the hosting types. The other one is IP-based hosting, which requires each domain to have a separate IP.
What I will show you is how to set up a name-based VirtualHost.
Now, a simple GET request for my root page would look like
GET / HTTP/1.1
Host: www.ruturaj.net
Now Apache picks up “www.ruturaj.net” from the request header and translates it to the virtual host that is mapped to www.ruturaj.net.
Let’s assume you have an IP, 67.66.65.64, that you need to set up for virtual hosting. First, you need to tell Apache that this IP is used for name-based virtual hosting.
NameVirtualHost 67.66.65.64
Now that you are done with setting the IP for virtual hosting, you need to configure the VirtualHosts.
Let us take ruturaj.net as the domain that needs to be set up. So here it goes
ServerName: This is the main server name. It should be the domain name.
ServerAlias: This is an alias, e.g. www.ruturaj.net should mean the same as ruturaj.net over HTTP.
You can set anything like default.ruturaj.net as well. Just make sure that default.ruturaj.net points to 67.66.65.64.
DocumentRoot: This is the main directory for the ruturaj.net domain, given as the file system path to the directory.
CustomLog: This is the access_log for ruturaj.net. Remember, we’d defined the “combined” log format, and we are using it here. If you want a different format, you can specify a LogFormat before the CustomLog directive.
ErrorLog: Any errors while serving are logged in this file.
DirectoryIndex: Defines the default document page for a directory, e.g. when you request http://ruturaj.net/ it tells the server to serve “index.php”. You can set it to whatever you want: default-page.html, default.pl, etc.
ServerAdmin: Just specify the email address. It shows up when there is any server error.
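Assembled into one block, the directives described above could look like the sketch below. The domain and IP come from the text. The file system paths and the email address are placeholders I have assumed for illustration.

```apache
<VirtualHost 67.66.65.64>
    ServerName ruturaj.net
    ServerAlias www.ruturaj.net

    # Assumed paths; point these at your own directories
    DocumentRoot /domains/ruturaj.net
    CustomLog /usr/local/apache/logs/ruturaj.net-access_log combined
    ErrorLog /usr/local/apache/logs/ruturaj.net-error_log

    DirectoryIndex index.php

    # Placeholder address shown on server error pages
    ServerAdmin webmaster@ruturaj.net
</VirtualHost>
```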
So now if you want to add a configuration for host “johnsmith.com”…
The httpd.conf file is the main configuration file of Apache. It lives in “apache-install-dir/conf”.
Now let’s take a look at some important and useful parameters.
ServerName
This parameter sets the default server name. It should generally be the FQDN (Fully Qualified Domain Name) of the machine, or the IP if the machine doesn’t have an FQDN.
Directory
This section encloses settings that apply to the given directory. You specify the physical directory as the argument, so if you have a directory /websites/mywebsite/somedir, you would do the following.
<Directory /websites/mywebsite/somedir>
... your settings
...
</Directory>
AllowOverride
AllowOverride allows the user to override some of the settings by using their own file. That file is the magical .htaccess file. In the stock httpd.conf it is usually set to None, which means the user can’t override the settings by placing a .htaccess file in the directory. But you can change the AllowOverride None setting to AllowOverride All.
Options
This directive takes several options. I’ll explain some of them. Indexes: This allows a directory listing. You must have come across such a listing when opening a directory that has no index page.
FollowSymLinks: This allows Apache to follow symbolic links. Symbolic links are nothing but links in *nix systems, e.g. “files” in /etc/ can point to /files/myfiles/files.
You can set both these options at once with
Options +Indexes -FollowSymLinks
The above setting will allow directory listings but won’t follow symbolic links. So “+” adds an option and “-” removes it.
AccessFileName
I talked about the magic file .htaccess. This is the place where you specify the name of that file. By default it is “.htaccess”.
The leading period “.” makes it a hidden file on *nix systems.
Denying files
Denying access to files over the web is the job of the server. In Apache, we can do exactly that by using the Files directive.
<Files ~ "^\.ht">
Order allow,deny
Deny from all
Satisfy All
</Files>
Note the ~ sign: it is used when you are giving a regular expression to match the files. Once the files are selected, they can be denied by using the Deny directive.
The above regex denies all files whose names start with “.ht”.
Access Logs
To create access logs, we need to specify the format of the log and the file path.
First we need to set the LogFormat directive.
The most common is the “combined” log, which logs IP, user, time, request, status code, referer and user agent.
Note: the log format has been given the name “combined”. Feel free to create different formats for your needs and name them accordingly.
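For reference, the “combined” format as it normally ships in Apache’s default httpd.conf looks like the line below. Treat it as a sketch of the standard definition rather than the exact line from my setup.

```apache
# %h = client IP, %l = identd user, %u = authenticated user, %t = time,
# %r = request line, %>s = final status code, %b = response size in bytes,
# followed by the Referer and User-Agent request headers
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
```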
Then we need to set the filename of the log,
CustomLog /usr/local/apache/logs/access_log combined
The first parameter of the CustomLog directive sets the filename of the log. The second is the name of the log format that we defined earlier.
Server-Status
When you want to look at the current status of the server, i.e. whom it is responding to, what pages it is serving, how many server processes are running and so on,
there is no better way than to set up server-status.
Check the screenshot of it.
To enable it…
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 192.168.0.84
</Location>
Check the configuration: it allows only the IP 192.168.0.84 to view the stats, and all others are forbidden. You can set your own IP as you wish.
If you want even more info, you can turn on the extended status.
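Extended status is switched on with a single mod_status directive. As a minimal sketch, assuming mod_status is already loaded, place it in the main server configuration and not inside the Location block:

```apache
# Collect per-request details (client, vhost, request line) for /server-status
ExtendedStatus On
```

Keep in mind that collecting the extra information adds a little overhead to every request.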
For those who have reached here but still don’t know what httpd is:
Apache is a web server. For all the web pages, websites, blogs and image galleries that are hosted on the web, there needs to be a server that “serves” these documents (pages, images, files) to the client (the user’s browser).
Apache got its name from… well… it’s nothing but “a patchy server”. Apache httpd is an open-source project, programmed by many programmers around the world. Every time a bug fix or a new feature was required, the main code was just “patched”, and hence it got the name Apache.
Apache, being a standard web server, runs on port 80, the standard HTTP port. Before you go ahead, let me warn you that changing the settings of Apache can change the way a website behaves, and to edit its settings you need root or Administrator access.
To control Apache, you basically need to edit two important files: “httpd.conf” and “.htaccess”.
This is my favourite one. Apache is the best and probably the most popular server software on the Web. It runs over all other web servers with its popularity, support, stability and scalability… words are just not enough.