Category Archives: Web

on the Web

Flickr

flickr.com

twitter

Picasa

Picasa

Hiphop PHP

Facebook has been working a lot, Cassandra, Scribe, u name it. The new kid from Facebook is Hiphop PHP which is an automated system that converts PHP to C++

HipHop programmatically transforms your PHP source code into highly optimized C++ and then uses g++ to compile it.

Read More here: HipHop

Redis, Memcached, Tokyo Tyrant and MySQL comparision

I wanted to compare the following DBs, NoSQLs and caching solutions for speed and connections. Tested the following

My test had the following criteria

  • 2 client boxes
  • All clients connecting to the server using Python
  • Used Python’s threads to create concurrency
  • Each thread made 10,000 open-close connections to the server
  • The server was
    • Intel(R) Pentium(R) D CPU 3.00GHz
    • Fedora 10 32bit
    • Intel(R) Pentium(R) D CPU 3.00GHz
    • 2.6.27.38-170.2.113.fc10.i686 #1 SMP
    • 1GB RAM
  • Used a md5 as key and a value that was saved
  • Created an index on the key column of the table
  • Each server had SET and GET requests as a different test at same concurrency

Results please !

Work sheet

throughput set

throughput get

I wanted to simulate a situation where I had 2 servers (clients) serving my code, which connected to the 1 server (memcached, redis, or whatever). Another thing to note was that I used Python as the client in all the tests, definately the tests would give a different output had I used PHP. Again the test was done to check how well the clients could make and break the connections to the server, and I wanted the overall throughput after making and breaking the connections. I did not monitor the response times. I didnt change absolutely any parameters for the servers, eg didn’t change the innodb_buffer_pool_size or key_buffer_size.

MySQL

MySQL lacked the whole scene terribly, I monitored the MySQL server via the MySQL Administrator and found that hardly there were any conncurrent inserts or selects, I could see the unauthenticated users, which meant that the client had connected to MySQL and was doing a handshake using MySQL authentication (using username and password). As you could see I didn’t even perform the 40 and 60 thread tests.

I truncated the table before I swtiched my tests from MyISAM to InnoDB. And always started the tests from lesser threads. My table was as follows

CREATE TABLE `comp_dump` (
  `k` char(32) DEFAULT NULL,
  `v` char(32) DEFAULT NULL,
  KEY `ix_k` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

NoSQL

For Tokyo Tyrant I used a file.tch as the DB, which is a hash database. I also tried MongoDB as u may find if u have opened the worksheet, But the server kept failing or actually the mongod failed after coming at an unhandled Exception. I found something similar over here. I tried 1.0.1, 1.1.3 and the available Nightly build, but all failed and I lost my patience.

Now what

If you need speed just to fetch a data for a given combination or key, Redis is a solution that you need to look at. MySQL can no way compare to Redis and Memcache. If you find Memcache good enough, you may want to look at Tokyo Tyrant as it does a synchronous writes. But you need to check for your application which server/combination suits you the best. In Marathi there is a saying “मेल्या शिवाय स्वर्ग दिसत नाही”, which means “You can’t see heaven without dieing” or need to do your hard work, can’t escape that 😉

I’ve attached the source code used to test, if anybody has any doubts, questions feel free to ask

How sessions work in PHP

HTTP is a stateless protocol. Which means that every request the browser makes to the server cant be identified by the server as a subsequent request of that user/IP/browser or a brand new request.

HTTP doesn’t understand who is requesting. So how do sessions manage to make HTTP look intelligent? The Answer lies in the request-response model with data.


When a normal request is made, eg my website, the minimalistic data passed by the client/browser is this

GET / HTTP/1.1
Host: ruturaj.net

The server responds by giving the output. But when a developer does a session_start();, What actually happens is, the PHP engine sets a PHPSESSID cookie. This data is sent from the Server as Set-Cookie header. So the response goes somewhat like this

HTTP/1.x 200 OK
Date: xxxx
Set-Cookie: PHPSESSID=<32charhexvalue>; expires=xxxx
...

Now considering the browser does accept the cookies, it saves the PHPSESSID cookie. Consequently the server also creates a file in the specified directory (by default on Linux as /tmp) as /tmp/sess_32charid.

Now when another request is made by the user/browser, the Cookie header is passed through the GET request back to the server, something like this…

GET /session2.php HTTP/1.1
Host: ruturaj.net
Cookie: PHPSESSID=<32charid>; othercookies=othervalues;

The session2.php, for example, is setting a value of name in session, by this

$_SESSION['name'] = $name_obtained_from_somewhere;

Now as the script finishes, the script flushes all the $_SESSION data into the /tmp/sess_32charid file associated to that session id. It saves all the data in the serialized format

Consider the browser makes another request to session3.php where $_SESSION['name'] is echoed. Now when the request is made, just like previous case, the PHPSESSID is passed in the cookie.

Now as mandated by php.net, that every page where sessions should be needed, a session_start(); is required. So as soon this function is invoked, PHP checks if the browser’s request had any PHPSESSID cookie sent in the header, as it was sent in our case, PHP Engine will open /tmp/sess_32charid file (with the same session id) and unserialize the contents of the file. It then assigns the values of the unserialized data structures to the $_SESSION variable.

The simple echo $_SESSION['name']; will now be able to output the name!! Sessions working…

On a session_destroy();, PHP sends a destructive, previous timestamp cookie for PHPSESSID and unlinks or deletes the /tmp/sess_32charid file. This ensures that no reference of that session is left.

References

  • http://in3.php.net/manual/en/session.configuration.php

Scribe PHP logging

I’d put some efforts to make scribed logging work with PHP, what I did was follow python’s example script “scribe_cat”. And made a similar PHP Script out of it, I’d to create many PHP scripts out of n number of .thrift files. Anyways I’ve got a working example. Here it is.

<?php
/*
 * As found on http://highscalability.com/product-scribe-facebooks-scalable-logging-system
        $messages = array();
        $entry = new LogEntry;
        $entry->category = "buckettest";
        $entry->message = "something very interesting happened";
        $messages []= $entry;
        $result = $conn->Log($messages);
*/

$GLOBALS['THRIFT_ROOT'] = './includes';

include_once $GLOBALS['THRIFT_ROOT'] . '/scribe.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/transport/TSocket.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/transport/TFramedTransport.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/protocol/TBinaryProtocol.php';
//include_once '/usr/local/src/releases/scribe-2.0/src/gen-php/scribe.php';

$msg1['category'] = 'keyword';
$msg1['message'] = "This is some message for the category\n";
$msg2['category'] = 'keyword';
$msg2['message'] = "Some other message for the category\n";
$entry1 = new LogEntry($msg1);
$entry2 = new LogEntry($msg2);
$messages = array($entry1, $entry2);

$socket = new TSocket('localhost', 1464, true);
$transport = new TFramedTransport($socket);
$protocol = new TBinaryProtocol($transport, false, false);
$scribe_client = new scribeClient($protocol, $protocol);

$transport->open();
$scribe_client->Log($messages);
$transport->close();

You can have as many messages or entries into one log, as I’ve demonstrated or tried above, please change the corresponding scribed’s host and port values. I’ve attached a working file and all the required includes generated by scribe. Except for the above script everything is generated by Scribe/Thrift.

PHP, Python Consistent Hashing

I found out the hashing algorithm used in PHP-Memcache is different from that of Python-Memcache. The keys went to different servers as the hash created by python and php were different.

I posted a question on the memcache groups and was lucky to find this wonderful reply.

import memcache
import binascii
m = memcache.Client(['192.168.28.7:11211', '192.168.28.8:11211
', '192.168.28.9:11211'])

def php_hash(key):
    return (binascii.crc32(key) >> 16) & 0x7fff

for i in range(30):
       key = 'key' + str(i)
       a = m.get((php_hash(key), key))
       print i, a

This is the only thing that has to be done on Python’s end, change the way the hash is calculated. The coding on PHP end remains same. All you guys using PHP for web based front-end with MySQL and Python for back-end scripts shall find this helpful.

Thanks Brian Rue.

Reference: http://groups.google.com/group/memcached/msg/7bb75a026c44ec43

Google Browser, gBrowser or Google Chrome

It was there … hanging, if Google was getting into the Browser market, now its almost a reality with Google releasing features of its browser in a comic strip.

Google Chrome comic strip

Baptised as Google Chrome, It has some of these features

Google Chrome Features

  • Each tab is a process
  • Each plugin has its own addres space
  • New JavaScript Engine V8 (Virtualized)
  • Tab is the primary UI (so no tabs in browsers, but browsers in tabs)
  • better autocomplete in the Address bar !! (Firefox 3 ?)
  • Tab page (9 most visited pages, Opera has a similar feature of quick links)
  • A Privacy Tab ! (nothing saved back to browser, readonly)
  • Sanboxing of Tabs’ memory (no reading from other tabs’s process, forget writing)
  • Google Gears in-built

You can enjoy the comic strip here. Or read more.

Photography is like Javascript

A lovely quote from one of my favourite feeds, dustindiaz.com

Photography is like JavaScript
As if you didn’t know it was coming to this. But let me be the first to put it into words. Photography is like JavaScript, and JavaScript is like photography. They are both expressive and beautiful. They can be done bad or good … Both can detailed and complicated. Both can be simple. They are both object-oriented.

..more