Facebook has been busy lately: Cassandra, Scribe, you name it. The new kid from Facebook is HipHop PHP, an automated system that converts PHP to C++.
HipHop programmatically transforms your PHP source code into highly optimized C++ and then uses g++ to compile it.
Read More here: HipHop
It features some of these
- Even better performance compared to v. 1.3.x
- Serialization like PHP, same param serialized like foo=bar1&foo=bar2
- Easy Setter Functions
- The performance of .css() and .attr() has been improved
- All Events Can Be Live Events
I wanted to compare the following databases, NoSQL stores and caching solutions for speed and connection handling. I tested the following:
- Tokyo Tyrant / Tokyo Cabinet
- MySQL 5.1.40 (MyISAM)
- MySQL 5.1.40 (with Innodb Plugin 1.0.4), compiled into source of MySQL
My test had the following criteria
- 2 client boxes
- All clients connecting to the server using Python
- Used Python’s threads to create concurrency
- Each thread made 10,000 open-close connections to the server
- The server was
- Intel(R) Pentium(R) D CPU 3.00GHz
- Fedora 10 32bit
- 184.108.40.206-170.2.113.fc10.i686 #1 SMP
- 1GB RAM
- Used an MD5 hash as the key, together with a value that was saved
- Created an index on the key column of the table
- Each server was tested with SET and GET requests as separate tests at the same concurrency
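To give an idea of what such a client looks like, here is a minimal sketch of a threaded open-close connection test in Python. It is not the attached test script: the names (NoopHandler, LocalServer, hammer, run_benchmark, demo) are made up for illustration, and it talks to a throwaway local TCP server rather than to Memcache, Redis or MySQL, so it only measures raw connect/close throughput.

```python
import socket
import socketserver
import threading
import time


class NoopHandler(socketserver.BaseRequestHandler):
    """Stand-in for the server under test: accepts a connection, does nothing."""

    def handle(self):
        pass


class LocalServer(socketserver.TCPServer):
    allow_reuse_address = True
    request_queue_size = 128  # roomier accept backlog for bursts of connects


def hammer(host, port, connections, counter, lock):
    """Open and close `connections` TCP connections, then record the count."""
    done = 0
    for _ in range(connections):
        with socket.create_connection((host, port)):
            done += 1
    with lock:
        counter[0] += done


def run_benchmark(host, port, threads, connections_per_thread):
    """Return (total_connections, elapsed_seconds) for one concurrency level."""
    counter, lock = [0], threading.Lock()
    workers = [
        threading.Thread(
            target=hammer,
            args=(host, port, connections_per_thread, counter, lock),
        )
        for _ in range(threads)
    ]
    start = time.time()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter[0], time.time() - start


def demo(threads=4, connections_per_thread=50):
    # Port 0 asks the OS for any free port. A real run would point
    # run_benchmark at the actual server under test instead.
    server = LocalServer(("127.0.0.1", 0), NoopHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return run_benchmark(host, port, threads, connections_per_thread)
    finally:
        server.shutdown()
        server.server_close()
```

Dividing the total by the elapsed time gives connections per second for that thread count; repeating the run with more threads gives one data point per concurrency level.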
Results please !
I wanted to simulate a situation where I had 2 servers (clients) serving my code, both connecting to 1 server (memcached, redis, or whatever). Another thing to note is that I used Python as the client in all the tests; the results would definitely differ had I used PHP. Again, the test was done to check how well the clients could make and break connections to the server, and I wanted the overall throughput including that connection overhead. I did not monitor response times. I didn’t change any server parameters at all, e.g. I didn’t touch innodb_buffer_pool_size or key_buffer_size.
MySQL lagged terribly throughout. I monitored the MySQL server via MySQL Administrator and found hardly any concurrent inserts or selects. Instead I could see unauthenticated users, which meant that the client had connected to MySQL and was still doing the handshake using MySQL authentication (username and password). As you can see, I didn’t even perform the 40 and 60 thread tests.
I truncated the table before I switched my tests from MyISAM to InnoDB, and I always started the tests from the lower thread counts. My table was as follows:
CREATE TABLE `comp_dump` (
  `k` char(32) DEFAULT NULL,
  `v` char(32) DEFAULT NULL,
  KEY `ix_k` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
For Tokyo Tyrant I used a file.tch as the DB, which is a hash database. I also tried MongoDB, as you may have noticed if you opened the worksheet, but the server kept failing; more precisely, mongod died on an unhandled exception. I found something similar over here. I tried 1.0.1, 1.1.3 and the available nightly build, but all failed and I lost my patience.
If you need speed just to fetch data for a given combination or key, Redis is a solution that you need to look at. MySQL can in no way compare to Redis and Memcache. If you find Memcache good enough, you may want to look at Tokyo Tyrant, as it does synchronous writes. But you need to check which server/combination suits your application best. In Marathi there is a saying “मेल्या शिवाय स्वर्ग दिसत नाही”, which means “You can’t see heaven without dying”: you need to do the hard work yourself, and you can’t escape that 😉
I’ve attached the source code used for the tests. If anybody has any doubts or questions, feel free to ask.
HTTP is a stateless protocol, which means the server can’t tell whether a request from the browser is a subsequent request from the same user/IP/browser or a brand new one.
HTTP doesn’t understand who is requesting. So how do sessions manage to make HTTP look intelligent? The answer lies in the data carried by the request-response model.
When a normal request is made, e.g. to my website, the minimal data passed by the client/browser is this:
GET / HTTP/1.1
Host: ruturaj.net
The server responds by giving the output. But when a developer calls session_start();, what actually happens is that the PHP engine sets a PHPSESSID cookie. This data is sent from the server as a Set-Cookie header, so the response goes somewhat like this:
HTTP/1.x 200 OK
Date: xxxx
Set-Cookie: PHPSESSID=<32charhexvalue>; expires=xxxx
...
Now, assuming the browser accepts cookies, it saves the PHPSESSID cookie. Meanwhile the server creates a file in the configured session directory (by default /tmp on Linux) named /tmp/sess_32charid.
Now when another request is made by the user/browser, the cookie is passed back to the server in the Cookie header of the GET request, something like this…
GET /session2.php HTTP/1.1
Host: ruturaj.net
Cookie: PHPSESSID=<32charid>; othercookies=othervalues;
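The cookie round trip can be sketched with Python’s standard http.cookies module. This is only an illustration of the two headers involved, not what the PHP engine runs internally; the helper names build_set_cookie and session_id_from_request are hypothetical.

```python
from http.cookies import SimpleCookie
import secrets


def build_set_cookie(session_id):
    """Return a Set-Cookie header value carrying the session id."""
    cookie = SimpleCookie()
    cookie["PHPSESSID"] = session_id
    cookie["PHPSESSID"]["path"] = "/"
    return cookie["PHPSESSID"].OutputString()


def session_id_from_request(cookie_header):
    """Extract PHPSESSID from a Cookie request header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("PHPSESSID")
    return morsel.value if morsel is not None else None


# A fresh 32-character hex id, similar in shape to the one PHP hands out.
new_id = secrets.token_hex(16)
header_value = build_set_cookie(new_id)

# The browser later echoes the id back in its Cookie header.
echoed = session_id_from_request("PHPSESSID=%s; othercookies=othervalues;" % new_id)
```

The server never trusts anything but the id itself: everything else about the session lives on the server side, keyed by that value.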
session2.php, for example, sets a value for name in the session, like this:
$_SESSION['name'] = $name_obtained_from_somewhere;
Now, as the script finishes, PHP flushes all the $_SESSION data into the /tmp/sess_32charid file associated with that session id. It saves all the data in a serialized format.
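To make “serialized format” concrete: with PHP’s default session serialize handler, each top-level entry is written as the key, a pipe character, and then the serialize() form of the value. Below is a rough Python sketch that reproduces this layout for flat string, integer and boolean values only. The function name php_session_encode is made up, and nested arrays or objects are deliberately not handled.

```python
def php_session_encode(data):
    """Encode a flat dict the way PHP's default session handler lays it out.

    Only str, int and bool values are supported in this sketch.
    """
    parts = []
    for key, value in data.items():
        if isinstance(value, bool):
            # bool must be checked before int, since bool is a subclass of int
            parts.append("%s|b:%d;" % (key, int(value)))
        elif isinstance(value, int):
            parts.append("%s|i:%d;" % (key, value))
        elif isinstance(value, str):
            # PHP records the length in bytes, not in characters
            length = len(value.encode("utf-8"))
            parts.append('%s|s:%d:"%s";' % (key, length, value))
        else:
            raise TypeError("unsupported value type for key %r" % key)
    return "".join(parts)


encoded = php_session_encode({"name": "Ruturaj", "visits": 3})
```

Here encoded holds name|s:7:"Ruturaj";visits|i:3; which is the kind of content you would find inside the session file.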
Now consider that the browser makes another request, to session3.php, where $_SESSION['name'] is echoed. When the request is made, just like in the previous case, the PHPSESSID is passed in the cookie.
As mandated by php.net, every page that needs sessions must call session_start();. As soon as this function is invoked, PHP checks whether the browser’s request carried a PHPSESSID cookie in its headers. Since it did in our case, the PHP engine opens the /tmp/sess_32charid file (with the same session id) and unserializes the contents of the file. It then assigns the unserialized data structures to the $_SESSION array, so echo $_SESSION['name']; will now be able to output the name!! Sessions working…
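When the script calls session_destroy();, PHP unlinks (deletes) the /tmp/sess_32charid file. Note that session_destroy() does not remove the cookie by itself: to clear it, the script must also send a PHPSESSID cookie with a timestamp in the past. Together these ensure that no reference to that session is left.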
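The whole lifecycle (start, save at the end of the script, load on the next request, destroy) can be sketched as a tiny file-backed store in Python. This is an illustration of the idea and not PHP’s implementation: the class FileSessionStore and its methods are hypothetical, and JSON stands in for PHP’s own serialization format.

```python
import json
import os
import secrets
import string
import tempfile


class FileSessionStore:
    """Keeps each session in <directory>/sess_<id>, like PHP's file handler."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, session_id):
        return os.path.join(self.directory, "sess_" + session_id)

    @staticmethod
    def _valid(session_id):
        # Accept only hex ids so a forged cookie cannot point outside the directory.
        return bool(session_id) and all(c in string.hexdigits for c in session_id)

    def start(self, session_id=None):
        """Return (session_id, data); unknown or missing ids get a fresh session."""
        if self._valid(session_id) and os.path.exists(self._path(session_id)):
            with open(self._path(session_id)) as handle:
                return session_id, json.load(handle)
        session_id = secrets.token_hex(16)
        self.save(session_id, {})
        return session_id, {}

    def save(self, session_id, data):
        """Flush the session data to its file (what happens as the script ends)."""
        with open(self._path(session_id), "w") as handle:
            json.dump(data, handle)

    def destroy(self, session_id):
        """Delete the session file; clearing the cookie is a separate step."""
        if self._valid(session_id):
            try:
                os.remove(self._path(session_id))
            except FileNotFoundError:
                pass
```

A first request would call start() with no id, a later request would pass the id from the cookie, and destroy() removes the file so that the same id starts a brand new session afterwards.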
I’d put in some effort to make scribed logging work with PHP. What I did was follow Python’s example script “scribe_cat” and make a similar PHP script out of it. I had to generate many PHP scripts out of a number of .thrift files. Anyway, I’ve got a working example. Here it is.
<?php
/*
 * As found on http://highscalability.com/product-scribe-facebooks-scalable-logging-system
 $messages = array();
 $entry = new LogEntry;
 $entry->category = "buckettest";
 $entry->message = "something very interesting happened";
 $messages = $entry;
 $result = $conn->Log($messages);
 */

$GLOBALS['THRIFT_ROOT'] = './includes';

include_once $GLOBALS['THRIFT_ROOT'] . '/scribe.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/transport/TSocket.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/transport/TFramedTransport.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/protocol/TBinaryProtocol.php';
//include_once '/usr/local/src/releases/scribe-2.0/src/gen-php/scribe.php';

$msg1['category'] = 'keyword';
$msg1['message'] = "This is some message for the category\n";
$msg2['category'] = 'keyword';
$msg2['message'] = "Some other message for the category\n";

$entry1 = new LogEntry($msg1);
$entry2 = new LogEntry($msg2);
$messages = array($entry1, $entry2);

$socket = new TSocket('localhost', 1464, true);
$transport = new TFramedTransport($socket);
$protocol = new TBinaryProtocol($transport, false, false);
$scribe_client = new scribeClient($protocol, $protocol);

$transport->open();
$scribe_client->Log($messages);
$transport->close();
You can send as many messages or entries in one Log call as you like, as I’ve demonstrated above. Please change the host and port values to match your scribed. I’ve attached a working file and all the required includes. Except for the above script, everything is generated by Scribe/Thrift.
I found out that the hashing algorithm used in PHP-Memcache is different from that of Python-Memcache. Keys went to different servers because the hashes computed by Python and PHP were different.
I posted a question on the memcache groups and was lucky to find this wonderful reply.
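import memcache
import binascii

m = memcache.Client(['192.168.28.7:11211', '192.168.28.8:11211', '192.168.28.9:11211'])

def php_hash(key):
    # Same server hash as PHP-Memcache: bits 16-30 of the CRC32 of the key
    return (binascii.crc32(key.encode('utf-8')) >> 16) & 0x7fff

for i in range(30):
    key = 'key' + str(i)
    # A (hash, key) tuple makes python-memcache pick the server using our hash
    a = m.get((php_hash(key), key))
    print(i, a)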
This is the only thing that has to be done on Python’s end: change the way the hash is calculated. The code on the PHP end remains the same. All you guys using PHP for a web-based front end with MySQL and Python for back-end scripts should find this helpful.
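To see why matching the hash is enough, here is a small sketch that needs no memcached at all. It assumes both clients pick a server by taking the hash modulo the number of equally weighted servers; that is my understanding of the default behaviour rather than something checked against either client’s source. pick_server is a made-up helper name.

```python
import binascii


def php_hash(key):
    """Bits 16-30 of the CRC32 of the key, as in the snippet above."""
    return (binascii.crc32(key.encode("utf-8")) >> 16) & 0x7fff


def pick_server(key, servers):
    """Assumed selection rule: hash modulo the number of servers."""
    return servers[php_hash(key) % len(servers)]


servers = ["192.168.28.7:11211", "192.168.28.8:11211", "192.168.28.9:11211"]
placement = {"key%d" % i: pick_server("key%d" % i, servers) for i in range(30)}
```

As long as both sides compute the same number for a key and list the servers in the same order, the key lands on the same server from either language.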
Thanks Brian Rue.
It had been hanging in the air for a while whether Google was getting into the browser market; now it’s almost a reality, with Google revealing the features of its browser in a comic strip.
Baptised as Google Chrome, it has some of these features
Google Chrome Features
- Each tab is a process
- Each plugin has its own address space
- Tab is the primary UI (so no tabs in browsers, but browsers in tabs)
- better autocomplete in the Address bar !! (Firefox 3 ?)
- Tab page (9 most visited pages, Opera has a similar feature of quick links)
- A Privacy Tab ! (nothing saved back to browser, readonly)
- Sandboxing of tabs’ memory (no reading from other tabs’ processes, forget writing)
- Google Gears in-built
A lovely quote from one of my favourite feeds, dustindiaz.com