Category Archives: Web

JavaScript Docs – Here we come

A new campaign has started – Promoting JavaScript Documentation, Promote JS.

One passage that echoed my own thoughts about JavaScript was this:

This is the first impression of JS by the general masses who are coming to this language and once you see this, you can see why people consider it a “toy language” and understand how so much bad code and disdain can exist for JS. We have hidden the better tutorials, learnings, and documentations away from ourselves AND more to the point, those trying to learn this language. New entrants struggle to learn JS, but eventually just adopt what they know from PHP, Java, Perl, Python and Ruby to a close approximation of runnable code that suffices. They then publish it back out, proud of what they have done, and continue to perpetuate this plague of improper JS coding.

http://hacks.mozilla.org/2010/10/promotejs-a-worldwide-call-for-improving-js-documentation-visibility/

The aims of the campaign are

  • Improve JavaScript documentation
  • Add more back links to the current MDN / MDC documentation, so that a search term like “JavaScript” returns MDN as the #1 result instead of Wikipedia

 


HipHop PHP

Facebook has been building a lot lately: Cassandra, Scribe, you name it. The new kid from Facebook is HipHop PHP, an automated system that converts PHP to C++.

HipHop programmatically transforms your PHP source code into highly optimized C++ and then uses g++ to compile it.

Read More here: HipHop

Redis, Memcached, Tokyo Tyrant and MySQL comparison

I wanted to compare a few databases, NoSQL stores and caching solutions for speed and connection handling. I tested Redis, Memcached, Tokyo Tyrant and MySQL (with both MyISAM and InnoDB tables).

My test had the following criteria

  • 2 client boxes
  • All clients connecting to the server using Python
  • Used Python’s threads to create concurrency
  • Each thread made 10,000 open-close connections to the server
  • The server was
    • Intel(R) Pentium(R) D CPU 3.00GHz
    • Fedora 10 32bit
    • 2.6.27.38-170.2.113.fc10.i686 #1 SMP
    • 1GB RAM
  • Used an MD5 hash as the key, with a value stored against it
  • Created an index on the key column of the MySQL table
  • Each server was tested with SET and GET requests as separate tests, at the same concurrency

Results, please!

Work sheet

[Chart: SET throughput]

[Chart: GET throughput]

I wanted to simulate a situation where I had 2 servers (the clients) running my code, both connecting to 1 server (Memcached, Redis, or whatever). Note that I used Python as the client in all the tests; the results would definitely have been different had I used PHP. The test was meant to check how well the clients could make and break connections to the server, so what I measured was the overall throughput including the cost of opening and closing each connection. I did not monitor response times. I also did not change any server parameters at all, e.g. I left innodb_buffer_pool_size and key_buffer_size at their defaults.
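To make the "make and break connections" idea concrete, here is a minimal sketch of that kind of harness in Python. It is not the script used for the numbers above: the real tests talked to Redis, Memcached, Tokyo Tyrant and MySQL, whereas this sketch uses a local echo server as a stand-in so that it runs on its own. The names EchoHandler, worker, run_benchmark and demo are all made up for illustration.

```python
import socket
import socketserver
import threading
import time


class EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in for the real server: read one request and echo it back."""

    def handle(self):
        data = self.request.recv(1024)
        if data:
            self.request.sendall(data)


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
    daemon_threads = True


def worker(address, n_requests, counter, lock):
    """Open a connection, do one round trip, close it; repeat n_requests times."""
    done = 0
    for _ in range(n_requests):
        with socket.create_connection(address) as conn:
            conn.sendall(b"ping")
            if conn.recv(1024) == b"ping":
                done += 1
    with lock:
        counter[0] += done


def run_benchmark(address, n_threads, n_requests):
    """Return (completed round trips, round trips per second)."""
    counter = [0]
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(address, n_requests, counter, lock))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return counter[0], counter[0] / elapsed


def demo(n_threads=4, n_requests=50):
    """Start the stand-in server on a free local port and benchmark it."""
    server = ThreadedServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return run_benchmark(server.server_address, n_threads, n_requests)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    completed, throughput = demo()
    print("completed:", completed, "throughput (req/s):", round(throughput, 1))
```

Because every iteration opens a fresh connection, the figure it prints includes connection setup and teardown, which is exactly the cost the tests were meant to expose. To point it at a real server you would replace the ping round trip inside worker with that server's own SET or GET call.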

MySQL

MySQL lagged badly throughout. I monitored the MySQL server via MySQL Administrator and found hardly any concurrent inserts or selects. What I could see were unauthenticated users, which meant that the clients had connected to MySQL and were still going through the handshake with MySQL authentication (username and password). As you can see in the results, I didn’t even run the 40 and 60 thread tests for MySQL.

I truncated the table before I switched my tests from MyISAM to InnoDB, and I always started the tests with the lowest thread count. My table was as follows

CREATE TABLE `comp_dump` (
  `k` char(32) DEFAULT NULL,
  `v` char(32) DEFAULT NULL,
  KEY `ix_k` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

NoSQL

For Tokyo Tyrant I used a file.tch as the DB, which is a hash database. I also tried MongoDB, as you may have noticed if you opened the worksheet, but the mongod server kept failing with an unhandled exception. I found something similar reported over here. I tried 1.0.1, 1.1.3 and the available nightly build, but all of them failed and I lost my patience.

Now what

If you need speed just to fetch data for a given key or combination, Redis is the solution you need to look at. MySQL can in no way compare to Redis and Memcached. If you find Memcached good enough, you may also want to look at Tokyo Tyrant, as it does synchronous writes. But you need to check for your own application which server or combination suits you best. In Marathi there is a saying “मेल्या शिवाय स्वर्ग दिसत नाही”, which means “You can’t see heaven without dying”: you need to do the hard work yourself, you can’t escape that 😉

I’ve attached the source code used for the tests. If anybody has any doubts or questions, feel free to ask.

How sessions work in PHP

HTTP is a stateless protocol, which means the server cannot tell whether a given request is a follow-up from the same user/IP/browser or a brand new request.

HTTP doesn’t understand who is requesting. So how do sessions manage to make HTTP look intelligent? The answer lies in the request-response model and the data carried along with it.

When a normal request is made, e.g. to my website, the minimal data passed by the client/browser is this

GET / HTTP/1.1
Host: ruturaj.net

The server responds with the output. But when a developer calls session_start();, the PHP engine sets a PHPSESSID cookie. This is sent from the server as a Set-Cookie header, so the response goes somewhat like this

HTTP/1.x 200 OK
Date: xxxx
Set-Cookie: PHPSESSID=<32charhexvalue>; expires=xxxx
...

Assuming the browser accepts cookies, it saves the PHPSESSID cookie. On its side, the server creates a file in the configured session directory (on Linux this is typically /tmp) named /tmp/sess_32charid.

Now when another request is made by the user/browser, the Cookie header is passed through the GET request back to the server, something like this…

GET /session2.php HTTP/1.1
Host: ruturaj.net
Cookie: PHPSESSID=<32charid>; othercookies=othervalues;

Suppose session2.php stores a name in the session, like this

$_SESSION['name'] = $name_obtained_from_somewhere;

When the script finishes, PHP flushes all the $_SESSION data into the /tmp/sess_32charid file associated with that session id. The data is saved in serialized form.

Consider the browser makes another request to session3.php where $_SESSION['name'] is echoed. Now when the request is made, just like previous case, the PHPSESSID is passed in the cookie.

As php.net mandates, every page that needs sessions must call session_start();. As soon as this function is invoked, PHP checks whether the browser’s request carried a PHPSESSID cookie in its headers. Since it did in our case, the PHP engine opens the /tmp/sess_32charid file (with the same session id) and unserializes its contents. It then assigns the unserialized data structures to the $_SESSION variable.

The simple echo $_SESSION['name']; will now be able to output the name!! Sessions working…

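On a session_destroy();, PHP deletes the stored session data, i.e. it unlinks the /tmp/sess_32charid file. It does not expire the PHPSESSID cookie on its own; to remove that as well, the script has to send a PHPSESSID cookie with a timestamp in the past using setcookie(). Doing both ensures that no reference to that session is left.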
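The whole round trip can be modelled in a few lines. The sketch below is in Python and is only a model of the flow described above, not PHP’s actual implementation: the function names mirror the PHP ones but are hypothetical helpers, the data is stored as JSON rather than in PHP’s own serialization format, and the save directory is a temporary one instead of /tmp.

```python
import json
import os
import re
import secrets
import tempfile
from http.cookies import SimpleCookie

SESSION_COOKIE = "PHPSESSID"
VALID_ID = re.compile(r"[0-9a-f]{32}")  # only accept ids shaped like the ones we issue


def session_path(save_dir, session_id):
    """Location of the data file for a session, e.g. <save_dir>/sess_<id>."""
    return os.path.join(save_dir, "sess_" + session_id)


def session_start(cookie_header, save_dir):
    """Return (session_id, data, set_cookie_header_or_None).

    If the request's Cookie header carries a known session id, load its data.
    Otherwise create a new id and return the Set-Cookie header to send back.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(SESSION_COOKIE)
    if morsel is not None and VALID_ID.fullmatch(morsel.value):
        path = session_path(save_dir, morsel.value)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                return morsel.value, json.load(fh), None
    session_id = secrets.token_hex(16)  # 32 hex characters
    header = "Set-Cookie: %s=%s; path=/" % (SESSION_COOKIE, session_id)
    return session_id, {}, header


def session_write_close(save_dir, session_id, data):
    """Flush the session data to its file at the end of the request."""
    with open(session_path(save_dir, session_id), "w", encoding="utf-8") as fh:
        json.dump(data, fh)


def session_destroy(save_dir, session_id):
    """Delete the stored data file for this session, if it exists."""
    path = session_path(save_dir, session_id)
    if os.path.exists(path):
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as save_dir:
        # Request 1: no cookie yet, so a new session is created.
        sid, data, set_cookie = session_start("", save_dir)
        data["name"] = "example"
        session_write_close(save_dir, sid, data)
        # Request 2: the browser sends the cookie back and the data is restored.
        cookie_header = "%s=%s; othercookies=othervalues;" % (SESSION_COOKIE, sid)
        _, restored, _ = session_start(cookie_header, save_dir)
        print(set_cookie)
        print(restored["name"])
        session_destroy(save_dir, sid)
```

The id taken from the cookie is checked against a fixed pattern before it is used in a file name, since it comes straight from the client.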

References

  • http://in3.php.net/manual/en/session.configuration.php

Scribe PHP logging

I put some effort into making scribed logging work with PHP. What I did was follow Python’s example script “scribe_cat” and write a similar PHP script. I had to generate quite a few PHP scripts from a number of .thrift files. Anyway, I’ve got a working example. Here it is.

<?php
/*
 * As found on http://highscalability.com/product-scribe-facebooks-scalable-logging-system
        $messages = array();
        $entry = new LogEntry;
        $entry->category = "buckettest";
        $entry->message = "something very interesting happened";
        $messages []= $entry;
        $result = $conn->Log($messages);
*/

$GLOBALS['THRIFT_ROOT'] = './includes';

include_once $GLOBALS['THRIFT_ROOT'] . '/scribe.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/transport/TSocket.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/transport/TFramedTransport.php';
include_once $GLOBALS['THRIFT_ROOT'] . '/protocol/TBinaryProtocol.php';
//include_once '/usr/local/src/releases/scribe-2.0/src/gen-php/scribe.php';

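// Build two log entries; each one carries a category and a message.
$msg1 = array(
    'category' => 'keyword',
    'message'  => "This is some message for the category\n",
);
$msg2 = array(
    'category' => 'keyword',
    'message'  => "Some other message for the category\n",
);
$entry1 = new LogEntry($msg1);
$entry2 = new LogEntry($msg2);
$messages = array($entry1, $entry2);

// Connect to scribed over a framed transport with the binary protocol.
// Change the host and port to match your scribed instance.
$socket = new TSocket('localhost', 1464, true);
$transport = new TFramedTransport($socket);
$protocol = new TBinaryProtocol($transport, false, false);
$scribe_client = new scribeClient($protocol, $protocol);

// Send all entries in a single Log() call, then close the connection.
$transport->open();
$scribe_client->Log($messages);
$transport->close();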

You can send as many messages or entries as you like in one Log call, as I’ve demonstrated above. Please change the host and port values to match your scribed instance. I’ve attached a working file and all the required includes; except for the script above, everything is generated by Scribe/Thrift.