<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <title>Pankaj Kumar&apos;s Weblog</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/" />
   <link rel="self" type="application/atom+xml" href="http://pankaj-k.net/weblog/atom.xml" />
   <id>tag:pankaj-k.net,2008:/weblog//1</id>
   <updated>2008-04-01T01:51:45Z</updated>
   <subtitle>Random thoughts, musings, experiences, ideas, and opinions</subtitle>
   <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.33</generator>

<entry>
   <title>Did you know that each integer in a PHP array takes 68 bytes of storage?</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2008/03/did_you_know_that_each_integer.html" />
   <id>tag:pankaj-k.net,2008:/weblog//1.191</id>
   
   <published>2008-04-01T00:49:32Z</published>
   <updated>2008-04-01T01:51:45Z</updated>
   
   <summary> I should clarify upfront that I love PHP for its simplicity in developing web applications and this post is not meant to be a PHP bashing by any stretch of imagination. My only motivation is to plainly state certain...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>
I should clarify upfront that I love PHP for its simplicity in developing web applications, and this post is not meant as PHP bashing by any stretch of the imagination. My only motivation is to plainly state certain facts that I came across while researching and experimenting for a design decision: how best to keep track of structured information within a PHP program. What I found was quite surprising, to say the least.
<p>
One of my function calls returned a collection of pairs of integers and I was wondering whether to store each pair as an array of two named values (as in <code>array('value1' => $value1, 'value2' => $value2)</code>) or as an instance of a PHP5 class (as in <code>class ValuePair { var $value1; var $value2; }</code>). As the number of pairs could be quite large, I thought I would optimize for memory. Based on experience with compiled languages such as C/C++ and Java, I expected the class-based implementation to take less space. A simple memory measurement program, which I'll explain later, showed this expectation to be misplaced. Apparently PHP implements both arrays and objects as hash tables and, in fact, objects require a little more memory than arrays with the same members. In hindsight, this doesn't appear so surprising. Compiled languages can convert member accesses to fixed offsets, but this is not generally possible for dynamic languages.
<p>
But what did surprise me was the amount of space used by an array of two elements. Each array holding two integers, when placed in another array representing the collection, was using around 300 bytes. The corresponding number for objects was around 350 bytes. I did some googling and found out that <a href="http://bugs.php.net/bug.php?id=41053">a single integer value stored within a PHP array uses 68 bytes</a>: 16 bytes for the value structure (zval), 36 bytes for the hash bucket, and 2*8 = 16 bytes for memory allocation headers. No wonder an array with two named integer values takes up around 300 bytes.
<p>
I am not really complaining -- PHP is not designed for writing data-intensive programs. After all, how much data are you going to display on a single web page? But it is still nice to know the actual memory usage of variables within your program. What if your PHP program is not generating an HTML page to be rendered in the browser but a PDF or Excel report to be saved on disk? Would you want your program to exceed the memory limit on a slightly larger data set?
<p>
Coming back to the original problem -- how should I store a collection of pairs of values: as an array of arrays or as an array of objects? For memory optimization, the answer may be neither, but two arrays, one for each value.
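<p>
A minimal sketch of the two-array layout, assuming the same 1000 pairs and the same <code>$i</code>, <code>$i + $i</code> values used in the measurement program below. The names <code>$value1_array</code> and <code>$value2_array</code> are made up for illustration, and no numbers are given here because the result depends on the PHP version:
<pre>&lt;?php
// Hypothetical layout: one plain integer array per member of the pair.
$num = 1000;
$before = memory_get_usage();
$value1_array = array();
$value2_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $value1_array[$i] = $i;       // first value of pair $i
  $value2_array[$i] = $i + $i;  // second value of pair $i
}
$after = memory_get_usage();
echo "Space Used by two parallel arrays: " . ($after - $before) . "\n";
// Pair $i is read back as ($value1_array[$i], $value2_array[$i]).
?&gt;</pre>
<p>
The trade-off is that the two arrays must be kept in step by hand: adding or removing a pair means touching both.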
<p>
For those who care about the nitty-gritty, here is the program I used for measurements:
<pre>&lt;?php
class EmptyObject { };
class NonEmptyObject {
  var $int1;
  var $int2;
  function NonEmptyObject($a1, $a2){
    $this-&gt;int1= $a1;
    $this-&gt;int2= $a2;
  }
};
$num = 1000;
$u1 = memory_get_usage();
$int_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $int_array[$i] = $i;
}
$u2 = memory_get_usage();
$str_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $str_array[$i] = "$i";
}
$u3 = memory_get_usage();
$arr_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $arr_array[$i] = array();
}
$u4 = memory_get_usage();
$obj_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $obj_array[$i] = new EmptyObject();
}
$u5 = memory_get_usage();
$arr2_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $arr2_array[$i] = array('int1' => $i, 'int2' => $i + $i);
}
$u6 = memory_get_usage();
$obj2_array = array();
for ($i = 0; $i &lt; $num; $i++){
  $obj2_array[$i] = new NonEmptyObject($i, $i + $i);
}
$u7 = memory_get_usage();

echo "Space Used by int_array: " . ($u2 - $u1) . "\n";
echo "Space Used by str_array: " . ($u3 - $u2) . "\n";
echo "Space Used by arr_array: " . ($u4 - $u3) . "\n";
echo "Space Used by obj_array: " . ($u5 - $u4) . "\n";
echo "Space Used by arr2_array: " . ($u6 - $u5) . "\n";
echo "Space Used by obj2_array: " . ($u7 - $u6) . "\n";
?&gt;</pre>
And here is a sample run:
<pre>[pankaj@fc7-dev ~]$ php -v
PHP 5.2.4 (cli) (built: Sep 18 2007 08:50:58)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
[pankaj@fc7-dev ~]$ php -C memtest.php
Space Used by int_array: 72492
Space Used by str_array: 88264
Space Used by arr_array: 160292
Space Used by obj_array: 180316
Space Used by arr2_array: 304344
Space Used by obj2_array: 349144
[pankaj@fc7-dev ~]$</pre>
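<p>
Dividing each total by the 1000 elements gives the per-element cost. The integer array, for example, works out to 72492 / 1000, or about 72.5 bytes per integer, close to the 68 bytes quoted earlier. The sketch below just repeats that arithmetic for all six totals; <code>$totals</code> is a hypothetical name holding the numbers copied from the run above:
<pre>&lt;?php
// Byte totals copied from the sample run; each array held $num elements.
$num = 1000;
$totals = array(
  'int_array'  => 72492,
  'str_array'  => 88264,
  'arr_array'  => 160292,
  'obj_array'  => 180316,
  'arr2_array' => 304344,
  'obj2_array' => 349144,
);
foreach ($totals as $name => $bytes){
  // Each figure includes a share of the outer array's own overhead.
  printf("%-10s %.1f bytes per element\n", $name, $bytes / $num);
}
?&gt;</pre>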

]]>
      
   </content>
</entry>
<entry>
   <title>All you would ever need to know about Ajax</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2008/03/all_you_would_ever_need_to_kno.html" />
   <id>tag:pankaj-k.net,2008:/weblog//1.190</id>
   
   <published>2008-03-12T00:20:58Z</published>
   <updated>2008-03-19T00:51:20Z</updated>
   
   <summary> Okay, a short blog post like this (or even a big one, like those penned by Steve Yegge) can&apos;t tell you everything *known today* about Ajax, forget &quot;all you ever need to know&quot;. In fact, it can&apos;t tell you...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="AJAX" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>
Okay, a short blog post like this (or even a big one, like <a href="http://steve-yegge.blogspot.com/">those penned by Steve Yegge</a>) can't tell you everything *known today* about Ajax, let alone "all you ever need to know". In fact, it can't tell you everything about anything worth knowing. There is just way too much information and knowledge around us about almost everything, consequential or not. To make things worse, at least for those who claim to "tell everything", this body of information and knowledge keeps growing every minute.
<p>
So why did I choose this particular title? No, I didn't intend to write everything <em>I know</em> about Ajax. It is just link bait. It seems to have worked quite well for others, so it might work for me as well.
<p>
What I really want to do in this post is to write a short review of "Ajax -- The Definitive Guide", a book published by O'Reilly. Those who are familiar with O'Reilly's <em>The Definitive Guide</em> series know that these books have a reputation for being very comprehensive and all-encompassing about the chosen topic. This certainly seems to be the case for a number of books in this series on my bookshelf, such as "JavaScript: The Definitive Guide" and "SSH, The Secure Shell: The Definitive Guide". But a definitive guide on something like Ajax? It would have to cover a lot of stuff, in full and in fine detail, to do justice to the title: the basics of Ajax interactions, (X)HTML, JavaScript, XML, XMLHttpRequest, CSS, DOM, browser idiosyncrasies, Ajax programming style and design patterns, tips and tricks, numerous browser-side Ajax libraries such as prototype, the YUI library, jQuery etc. and their integration with server-side frameworks such as RoR, Drupal etc. The list is fairly long, if not endless. And each topic is worthy of a book by itself.
<p>
<center><iframe marginwidth="0" marginheight="0" width="120" height="240" scrolling="no" frameborder="0"
  src="http://rcm.amazon.com/e/cm?o=1&l=as1&f=ifr&t=charteous-20&p=8&asins=0596528388&IS2=1&lt1=_blank">
  <MAP NAME="boxmap-p8"><AREA SHAPE="RECT" COORDS="14, 200, 103, 207" HREF="http://rcm.amazon.com/e/cm/privacy-policy.html?o=1" >
	<AREA COORDS="0,0,10000,10000" HREF="http://www.amazon.com/exec/obidos/redirect-home/charteous-20" >
  </MAP>
  <img src="http://rcm-images.amazon.com/images/G/01/rcm/120x240.gif" width="120" height="240" border="0" usemap="#boxmap-p8" alt="Shop at Amazon.com">
</iframe></center>
</p>
Fortunately, <em>Ajax -- The Definitive Guide</em> doesn't try to be a definitive guide for everything that goes or could go into an Ajaxy application. I found the book to be more of a good collection of interesting and relevant topics that the author, Anthony T. Holdener III, has had first-hand experience with. Most of these I knew about, some I was vaguely familiar with and a few were quite new to me. However, I wouldn't call the collection a "definitive guide for Ajax". If you are new to Ajax and are somewhat lost, in terms of where to start and how things relate to each other, then this book is certainly worth paying for. However, if you have already been doing Ajax development for some time and are craving a single text to answer recurring questions around Ajax-specific patterns, solutions to common problems, browser differences and ways to tackle them, then this is perhaps not the book for you. In this sense, the book doesn't really fit into the "The Definitive Guide" pattern.
<p>
On the other hand, the book does provide a good introduction to basic concepts, is quite readable, includes a lot of source code for non-trivial working programs and lists relevant resources, such as Ajax libraries, frameworks and applications, in its References section. I especially liked the "chat" and "whiteboard" application that allows two or more users to share a whiteboard and chat through their browsers.
<p>
Okay, so how does this book compare with other books on the same topic? This is a tough question, for I haven't been paying attention to most books that have come out on this topic. There is an answer, though, and it comes from this Amazon Sales Rank comparison chart:
<script type="text/javascript">
Charteous = {params: {amzn: { asins: ["0596528388", "1932394613", "0596102259", "1932394990"]}}};
</script>
<script type="text/javascript"
src="http://charteo.us/charteous/show-chart.js">
</script>
<p>
A higher Sales Rank for an item implies that more people are buying it from Amazon. This doesn't tell you how well a particular book will meet your needs, just that the high-ranking items, in general, are being bought by more people than the low-ranking ones. The above chart does indicate that <em>Ajax -- The Definitive Guide</em> is outselling its rivals, at least at the time of this review (March 17-18, 2008).]]>
      
   </content>
</entry>
<entry>
   <title>Google -- Innovation Model or Anomaly?</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/11/google_innovation_model_or_ano.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.188</id>
   
   <published>2007-11-27T22:24:21Z</published>
   <updated>2007-11-27T23:06:30Z</updated>
   
   <summary> &quot;Should innovation-minded managers look at the fast-growing Internet company as a model — or an anomaly?&quot; This is the question posed by Nick G Carr in a Strategy &amp; Business article. Delving into various aspects of the enigmatic company,...</summary>
   <author>
      <name></name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>
"Should innovation-minded managers look at the fast-growing Internet company as a model — or an anomaly?" This is the question posed by <a href="http://roughtype.com/">Nick G Carr</a> in a <a href="http://www.strategy-business.com/press/article/07404?gko=a2bce-1876-26510326">Strategy & Business article</a>. Delving into various aspects of the enigmatic company, he opines:
<p>
<blockquote>The way Google makes money is actually straightforward: It brokers and publishes advertisements through digital media. ... snip ... Google’s protean appearance is not a reflection of its core business. Rather, it stems from the vast number of complements to its core business. ... snip ...  For Google, literally everything that happens on the Internet is a complement to its main business. The more things that people and companies do online, the more ads they see and the more money Google makes. In addition, as Internet activity increases, Google collects more data on consumers’ needs and behavior and can tailor its ads more precisely, strengthening its competitive advantage and further increasing its income. As more and more products and services are delivered digitally over computer networks - entertainment, news, software programs, financial transactions - Google’s range of complements is expanding into ever more industry sectors.</blockquote> 
<p>
Though this argument appears plausible, I don't think it will withstand critical scrutiny. Not all online activities can be equally monetized through ads. It is well documented that ads alongside search results perform much better than ads on content pages, email messages, online productivity apps, video clips or social networks (to be fair, the jury is still out on the last two). Would a company as focussed on effectiveness as Google try to increase the online ad market by doing things which are proven not to be very effective?
<p>
In my opinion, Google's core competency is in developing and running highly customized hardware and software systems and they will use this competency to solve mega-problems that others are ill-equipped to address. In the process, they will disrupt a number of established businesses.


]]>
      
   </content>
</entry>
<entry>
   <title>Amazon Games Its own Ranking System to Put Kindle on Top</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/11/amazon_games_its_own_ranking_s.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.187</id>
   
   <published>2007-11-20T17:30:11Z</published>
   <updated>2007-11-20T18:28:23Z</updated>
   
   <summary> With all the buzz around Amazon&apos;s latest ebook reader Kindle, I tried to add it to Sales Rank tracking site Charteous, hoping to compare its popularity with other similar readers from Sony and Franklin over a period of time....</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Trends" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>
With all the buzz around <a href="http://www.amazon.com/o/ASIN/B000FI73MA/charteous-20">Amazon's latest ebook reader Kindle</a>, I <em>tried</em> to add it to the Sales Rank tracking site <a href="http://charteo.us">Charteous</a>, hoping to compare its popularity with other similar readers from <a href="http://www.amazon.com/o/ASIN/B000WPXQ2M/charteous-20">Sony</a> and <a href="http://www.amazon.com/o/ASIN/B00004WHFK/charteous-20">Franklin</a> over a period of time. Notice the emphasis on the word <em>tried</em>, for although I could easily add Kindle to Charteous, it isn't possible to compare its Sales Rank with those of other eBook Readers. Why so? Just look at the Product Details section of Kindle:
<p>
<img alt="kindle-product-description.gif" src="http://pankaj-k.net/weblog/kindle-product-description.gif" width="490" height="203" />
<p>
As you can see, Amazon has created a whole new product category called <strong><a href="http://www.amazon.com/exec/obidos/tg/browse/-/133141011">Kindle Store</a></strong> for items related to its own eBook Reader. As the other eBook Readers are ranked within the Electronics category, it is just not possible to compare their relative popularity with Kindle's through an <a href="http://charteo.us/amzn/compare/B000WPXQ2M,B00004WHFK">eBook Readers Sales Rank Comparison Chart</a>.
<p>
While digging for this blog post, I came across something else that didn't make sense: Kindle is not the only item within the Kindle Store with a Sales Rank of 1 (the highest possible rank). It shared this rank with the digital book <a href="http://www.amazon.com/You-Staying-Owners-Extending-Warranty/dp/B000UZNS36/ref=pd_ts_kinc_1?ie=UTF8&s=digital-text">You: Staying Young: The Owner's Manual for Extending Your Warranty (You) (Kindle Edition)</a> when I looked at the two at almost the same time (note: the ranking is updated every hour, so this may not hold when you try to confirm). Now, no one knows the secret algorithm that Amazon uses to calculate this rank, and it is possible that the equal ranking is assigned by the algorithm, though that seems very unlikely to me.
<p>
A single instance of gaming the ranking system doesn't invalidate the whole notion of gauging popularity through <a href="http://www.amazon.com/gp/help/customer/display.html?ie=UTF8&nodeId=525376">Sales Rank</a>, but it does confirm that even a company like Amazon is not above manipulating its system of recommendations and measurements to gain an unfair advantage.
]]>
      
   </content>
</entry>
<entry>
   <title>Named Captures Are Cool</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/08/named_captures_are_cool.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.186</id>
   
   <published>2007-08-26T04:30:27Z</published>
   <updated>2007-08-26T07:11:02Z</updated>
   
   <summary>Regular Expressions are well known for their power and brevity in validating textual patterns. Less known is their ability to extract substrings surrounded by known patterns of text through a construct known as round bracket groupings. The text matching the...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>Regular Expressions are well known for their power and brevity in validating textual patterns. Less known is their ability to extract substrings surrounded by known patterns of text through a construct known as <a href="http://www.regular-expressions.info/brackets.html">round bracket groupings</a>. The text matching the sub-expression within a pair of round brackets is captured and is available as a backreference within the regular expression itself or as an indexed variable outside. For example, the PHP statement
<pre>
preg_match('/Name: (.+), Age: (\d+)/', $text, $matches);
</pre>
<p>would return 1 on finding a substring that matches the specified pattern and store the matched name, i.e., the first captured group, in <code>$matches[1]</code> and the matched age, i.e., the second captured group, in <code>$matches[2]</code>. <code>$matches[0]</code> stores the full matched text. Other languages that support regular expressions, and the list of such languages is pretty long, have similar conventions.
<p>
Counting the capturing groups to get the index of the captured text works okay with short regular expressions that don't change often. However, counting the position becomes tedious and error-prone when the number of groups is large and new groups may get introduced or existing ones removed as the code evolves.
<p>
If you just rely on the documentation accompanying your programming language, such as <a href="http://us.php.net/manual/en/reference.pcre.pattern.syntax.php">this regex syntax for PHP</a>, or <a href="http://java.sun.com/javase/6/docs/api/index.html?java/util/regex/package-summary.html">this Javadoc page for Java</a>, then you are not likely to find a better solution to this problem. At least this is what happened to me, for I wrote code that had the magic indexes all over until I started reading Jeffrey E.F. Friedl's excellent <em>Mastering Regular Expressions</em> and came across PHP's support for <a href="http://www.regular-expressions.info/named.html">named captures</a>, a mechanism to associate symbolic names with captured groups.
<center><iframe marginwidth="0" marginheight="0" width="120" height="240" scrolling="no" frameborder="0"
  src="http://rcm.amazon.com/e/cm?o=1&l=as1&f=ifr&t=charteous-20&p=8&asins=0596528124&IS2=1&lt1=_blank">
  <MAP NAME="boxmap-p8"><AREA SHAPE="RECT" COORDS="14, 200, 103, 207" HREF="http://rcm.amazon.com/e/cm/privacy-policy.html?o=1" >
	<AREA COORDS="0,0,10000,10000" HREF="http://www.amazon.com/exec/obidos/redirect-home/charteous-20" >
  </MAP>
  <img src="http://rcm-images.amazon.com/images/G/01/rcm/120x240.gif" width="120" height="240" border="0" usemap="#boxmap-p8" alt="Shop at Amazon.com">
</iframe></center>
<p>
What it essentially means is that I could rewrite the previous statement as
<pre>
preg_match('/Name: (?P&lt;Name&gt;.+), Age: (?P&lt;Age&gt;\d+)/', $text, $matches);
</pre>
<p>and access the matched name and age as <code>$matches['Name']</code> and <code>$matches['Age']</code> and need not worry about introducing (or dropping) groups. It not only improves the readability but also makes the code more robust.
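<p>
To make this concrete, here is a small self-contained sketch. The input string <code>$text</code> is made up for illustration. It also shows that the numeric indexes remain populated alongside the names, so code that still uses them keeps working:
<pre>
&lt;?php
// Made-up sample input, not taken from any real data.
$text = 'Name: Alice, Age: 30';
if (preg_match('/Name: (?P&lt;Name&gt;.+), Age: (?P&lt;Age&gt;\d+)/', $text, $matches)) {
  echo $matches['Name'] . "\n";  // prints Alice
  echo $matches['Age'] . "\n";   // prints 30
  // Each named group is also stored under its numeric position.
  var_dump($matches[1] === $matches['Name']);  // bool(true)
}
?&gt;
</pre>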
<p>
At this point one could argue that in this particular case the book was just incidental, for the information on named captures was already available on the Web, as my link shows, and I should just have googled it. Unfortunately, you need to know a little bit about something to search for more. Google and the Web are no good if you don't know what you don't know. This is exactly where I think the book <em>Mastering Regular Expressions</em> really shines. You need to go through it to realize what you didn't know and what you should look for. And be assured that there are enough aspects of regular expressions and their implementations in various languages that you may not know to justify the cost of the book. By the way, named captures are not the only thing that I learned from this book. Other things I learned include the 'x' modifier, conditionals within regular expressions, lookaheads and lookbehinds, and many others. No wonder this book is selling almost as well as Programming Perl, 3rd Edition, the all-time programming best seller from O'Reilly.
<script type="text/javascript">
Charteous = {params: {amzn: { asins: ["0596528124", "0596000278"]}}};
</script>
<script type="text/javascript"
src="http://charteo.us/charteous/show-chart.js">
</script>
<p>
At this point I should add that named captures may not yet be available in all languages. In fact, as per the book, Perl doesn't have them, though my research for this post led me to <a href="http://use.perl.org/articles/07/08/26/0142221.shtml">this page</a> and eventually to this <a href="http://www.regex-engineer.org/slides/img17.html">page stating that Perl 5.10 has named captures</a>. In fact, the support in Perl 5.10 is much more powerful and makes available not only the last match, as we saw in PHP, but <a href="http://www.regex-engineer.org/slides/img19.html">all the matches in an array</a>.
Java and JavaScript programmers may have to wait longer for named captures, though!
]]>
      
   </content>
</entry>
<entry>
   <title>Are Freakonomics copycats dud?</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/08/are_freakonomics_copycats_dud.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.185</id>
   
   <published>2007-08-09T21:30:11Z</published>
   <updated>2007-08-10T17:22:02Z</updated>
   
   <summary>The copycats of the 2005 mega seller Freakonomics, such as Discover Your Inner Economist and The Economic Naturalist, aren&apos;t doing well -- says The Wall Street Journal. The story backs it up with some interesting statistics from Nielsen BookScan sales...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Trends" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>The copycats of the 2005 mega seller <a href="http://charteo.us/amzn/items/0061234001"><em>Freakonomics</em></a>, such as <a href="http://charteo.us/amzn/items/0525950257"><em>Discover Your Inner Economist </em></a>and <a href="http://charteo.us/amzn/items/046500217X"><em>The Economic Naturalist</em></a>, aren't doing well -- <a href="http://online.wsj.com/public/article/SB118609843574586762.html">says The Wall Street Journal</a>. The story backs it up with some interesting statistics from Nielsen BookScan sales data: the original has sold 119,000 copies since January whereas the copycats have sold only 12,000 copies combined since their spring releases. Seth Godin <a href="http://sethgodin.typepad.com/seths_blog/2007/08/the-801-freakon.html">comments on the story</a> and makes the guess that the original is outselling the copycats 80:1.

<p>Let us take a look at how all these statistics compare with the Amazon Sales Rank comparison charts at <a href="http://charteo.us">charteous</a>:
<script type="text/javascript">
Charteous = {params: {amzn: { asins: ["0061234001", "006073132X", "0525950257", "046500217X"]}}};
</script>
<script type="text/javascript"
src="http://charteo.us/charteous/show-chart.js">
</script>

<p>No doubt the expanded/revised Freakonomics is doing much better than the copycats. Even the first version (lower line in the chart) is not doing badly. But I wouldn't call the copycats complete failures. At least not at their current Sales Rank level of between 100 and 1000. It would be interesting to watch this chart over time, though.

<p>There is something else that caught my attention -- the WSJ story compares sales numbers for different time periods: the publish date for <em>Discover Your Inner Economist </em> is Aug. 2, 2007 and that of <em>The Economic Naturalist</em> is May 21, 2007, whereas the reported sales of 119,000 for <em>Freakonomics</em> are counted from Jan. 1, 2007. So, the copycats may not be doing as badly as a cursory look at the numbers might suggest.

<p>I read the older release of <em>Freakonomics</em> a few weeks ago and was pretty impressed by the basic notion of how the economics of incentives drives human behavior as well as the specific case stories. The first point is easy to understand but its implications in specific situations are usually non-obvious. The specific stories make the connection and often make for very good reading. I am assuming that what WSJ is calling copycats essentially analyze research and observations in different fields with the theory of economic incentives. If so, I wouldn't consider them copycats at all. In fact, I would buy them, at least the ones that become popular, and read them for the stories.
]]>
      
   </content>
</entry>
<entry>
   <title>Is GNU Sort Broken?</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/08/is_gnu_sort_broken.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.184</id>
   
   <published>2007-08-08T03:48:20Z</published>
   <updated>2007-08-08T18:07:09Z</updated>
   
   <summary>Humor me with this simple task -- arrange the following list of strings in lexicographically ascending order: a.b aab aaa Keep in mind that the ASCII value of &apos;.&apos; is 46, which is less than 97, the ASCII value of...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
         <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
         <category term="Software Development" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>Humor me with this simple task -- arrange the following list of strings in <a href="http://en.wikipedia.org/wiki/Lexicographical_order">lexicographically</a> ascending order:
<pre>
a.b
aab
aaa
</pre>
<p>Keep in mind that the ASCII value of '.' is 46, which is less than 97, the ASCII value of 'a'. Note down your arranged list. Now, create a text file <code>list.txt</code> with the above strings on separate lines and sort them on a Linux system using the <strong>sort</strong> utility with the following command:
<pre>
$ sort list.txt
</pre>
<p>Did you get what you were expecting? I didn't. Here is what I was expecting and what I got under three different Linux systems (Fedora Core, Mandrake and Ubuntu):
<pre>
Expected    sort output
========    ===========
a.b         aaa
aaa         aab
aab         a.b
</pre>
<p>What is going on here? Looks like sort is simply ignoring the '.' character. It shouldn't, at least not as per the <a href="http://www.hmug.org/man/1/sort.php">sort man page</a>. There is this option '-d' to ignore all characters except letters, digits and blanks, and hence '.', but this is not a default option.

<p>Just to confirm that I didn't make a mistake in my manual sort to arrive at the expected list, I sorted the strings within the PHP command-line shell:
<pre>
php > $a = array("a.b", "aab", "aaa");
php > sort($a);
php > print_r($a);
Array
(
    [0] => a.b
    [1] => aaa
    [2] => aab
)
</pre>
<p>This output is the same as what I expected. So, no mistake on my part!
<p>And this led me to the question: is GNU Sort broken, or did I miss something? After sifting through the sort man pages on different machines, I noticed this warning on a Fedora Core 6 box:
<blockquote>       
 *** WARNING *** The locale specified by the  environment  affects  sort order.  Set LC_ALL=C to get the traditional sort order that uses native  byte values.
</blockquote>
<p>So, this is what I was missing! Btw, this is not something obvious that I just didn't pay attention to. Rechecking the <a href="http://www.hmug.org/man/1/sort.php">online man page</a>, something that I tend to use more often than the man output on a 20x80 terminal screen, confirmed that the warning wasn't there. Also, none of the machines I had tried, all installed for the US locale, had LC_ALL set to C by default. And keep in mind that I came across the above discrepancy in sort output only after a program of mine that finds the difference of two sorted files failed on certain specific input values. Like most normal folks, I suspected my program first and it took a while to suspect the sort output as the culprit.
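<p>
On the command line, the warning's advice amounts to running <code>LC_ALL=C sort list.txt</code>. The same distinction between byte-wise and locale-aware ordering can be sketched in PHP, to stay with the language used above. The locale name <code>en_US.UTF-8</code> is an assumption and may not be installed on your machine, and the order printed by the second sort depends on the system's collation rules, so it is not shown here:
<pre>
&lt;?php
$list = array("a.b", "aab", "aaa");

// Byte-wise comparison, the analogue of running sort with LC_ALL=C.
$bytes = $list;
sort($bytes, SORT_STRING);
print_r($bytes);  // a.b, aaa, aab

// Locale-aware comparison, which uses the current LC_COLLATE setting.
// setlocale() returns false when the requested locale is unavailable.
if (setlocale(LC_COLLATE, 'en_US.UTF-8') !== false) {
  $locale = $list;
  sort($locale, SORT_LOCALE_STRING);
  print_r($locale);  // order depends on the locale's collation rules
}
?&gt;
</pre>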
<p>Sorry for the provocative title -- I found out about the LC_ALL environment variable only while writing this blog post and double-checking my facts (one of the few advantages of writing things down) and didn't feel like changing the title. After all, how many of us will think of setting LC_ALL=C before issuing sort! In that sense, GNU sort IS broken.]]>
      
   </content>
</entry>
<entry>
   <title>July-August 2007 HBR Case Study: Monolithic Enterprise Software or SOA</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/07/julyaugust_2007_hbr_case_study.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.183</id>
   
   <published>2007-07-14T06:36:55Z</published>
   <updated>2007-07-14T19:05:00Z</updated>
   
   <summary> The HBR Case Study in July-August 2007 issue Too Far Ahead of the IT Curve, authored by John P. Glaser, CIO of Partners Healthcare Systems and co-author of Managing Health Care Information Systems, presents the case of failing IT...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Software Development" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>
The HBR Case Study in the July-August 2007 issue, <a href="http://harvardbusinessonline.hbsp.harvard.edu/b01/en/common/item_detail.jhtml?id=R0707A">Too Far Ahead of the IT Curve</a>, authored by John P. Glaser, CIO of Partners Healthcare Systems and co-author of <em><a href="http://charteo.us/amzn/items/0787974684">Managing Health Care Information Systems</a></em>, presents the case of the failing IT infrastructure of Peachtree Healthcare, "a federation of 11 hospitals of assorted sizes and special purposes, each with its own proud history and culture, and each with its own weird mishmash of IT systems of various vintages and vendor pedigrees".
<p>
The main problems with the existing system and goals for the future system identified in the study are:
<p>
<ul>
<li>Keeping all the different systems running with acceptable up-time and performance is a strain on the IT department: "the IT infrastructure was consuming so much maintenance energy that further innovation was becoming a luxury".</li>
<li>Sharing of patient records, ensuring quality, consistency, and continuity of care across the entire network of hospitals and physicians.</li>
<li>"Selective" standardization of certain medical procedures across the network, while allowing sufficient flexibility to individual hospitals and professionals in other areas.</li>
</ul>
<p>
Of course, these points are not so neatly laid out but are embedded within the story in a typical HBR case study style. I had to read it twice.
<p>
Two options are presented to address the current problems and meet future objectives:
<p>
<ul>
<li>Deploy a monolithic enterprise software system that will be much more manageable but will also standardize the business processes across the network. Peachtree Healthcare CEO Max Berndt does not like the brute force homogenization across the network hospitals, especially for non-routine stuff.</li>
<li>Adopt Service Oriented Architecture (SOA), which will enable selective standardization. The details are somewhat hazy, though: are they talking about (a) integrating the existing IT systems within the various hospitals using SOA; or (b) completely replacing the existing systems and building the equivalent functionality on top of SOA building blocks such as SOA-capable app servers, registries, business process engines and so on? Option (a) will not address the up-time and performance problems being faced by individual hospitals. Option (b) will require a costly redesign and rewrite of systems, but will provide the desired flexibility and agility.</li>
</ul>
<p>
As usual, the expert opinions on this case are varied: <a href="https://newsmedia.kaiserpermanente.org/kpweb/executiveprofiles/detailpage.do?bodyContainer=/htmlapp/feature/119executiveprofiles/nat_georgechalvorson.html">George C. Halvorson</a>, the chairman and CEO of Kaiser Permanente, is concerned that the CIO of Peachtree is not enthusiastic about SOA and recommends more work around defining the vision and identifying the objectives. Typical CEO speak, but it might help the CIO in better understanding the pros and cons of the two options. <a href="http://www.aa.com/content/amrcorp/corporateInformation/bios/ford.jhtml">Monte Ford</a>, senior VP and CIO at American Airlines, recommends SOA based on his own experience in adopting it. <a href="http://www.forrester.com/ER/Research/List/Analyst/Personal/0,,784,00.html">Randy Heffner</a>, a VP at Forrester Research, makes the comment that "by goofing around SOA as a product category instead of looking at it as a methodology, the CIO has missed key perspectives" and recommends SOA. John A. Kastor, a professor of medicine at the Univ. of Maryland School of Medicine, agrees with Peachtree CEO Max that indiscriminate standardization of all medical processes is not the right thing to do, but offers no recommendation for IT infrastructure modernization.
<p>
The interesting thing to note is that none of the experts recommend a monolithic enterprise software system.
]]>
      
   </content>
</entry>
<entry>
   <title>What is wrong with this widely used AJAX event handler registration code?</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/07/what_is_wrong_with_this_widely.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.182</id>
   
   <published>2007-07-06T07:32:27Z</published>
   <updated>2007-07-06T09:32:07Z</updated>
   
   <summary>John Resig&apos;s blog post on Flexible Javascript Events presents cross-browser functions to register and deregister DOM events to/from any DOM element: addEvent() and removeEvent(). He wrote these functions in response to an addEvent() recoding contest that was published at a...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="AJAX" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[John Resig's blog post on <a href="http://ejohn.org/projects/flexible-javascript-events/">Flexible Javascript Events</a> presents cross-browser functions to register and deregister DOM events to/from any DOM element: <font size="+1"><code>addEvent()</code></font> and <font size="+1"><code>removeEvent()</code></font>. He wrote these functions in response to an <a href="http://www.quirksmode.org/blog/archives/2005/09/addevent_recodi.html">addEvent() recoding contest</a> that was published at <a href="http://www.quirksmode.org/">a well-known site for Web developers run by Peter-Paul Koch</a> and included <a href="http://jszen.blogspot.com/">Scott Andrew LePera</a>, <a href="http://dean.edwards.name/about/">Dean Edwards</a> and John Resig himself as co-judges. The recoding contest itself was a response to wide interest in Peter-Paul Koch's blog post <a href="http://www.quirksmode.org/blog/archives/2005/08/addevent_consid.html">addEvent() considered harmful</a>, where he outlined a problem with a widely used <a href="http://www.scottandrew.com/weblog/articles/cbs-events">function addEvent() published by Scott Andrew LePera</a>. It should also be noted that <a href="http://www.quirksmode.org/blog/archives/2005/10/_and_the_winner_1.html">John Resig's entry was judged as the winning entry</a>.

Most web developers are familiar with the names mentioned in the previous paragraph. They have published books, maintain highly visible websites (Google PageRank of websites/blogs maintained by <a href="http://www.quirksmode.org/">Peter-Paul Koch</a>, <a href="http://dean.edwards.name/">Dean Edwards</a>, <a href="http://ejohn.org/">John Resig</a>, <a href="http://jszen.blogspot.com/">Scott Andrew LePera</a> are 9, 8, 7 and 7, respectively at the time of this blog post), blog regularly and are generally considered gurus in the area of client side web development.

I add all this background only to make the point that writing cross-browser DOM event handling code is non-trivial and has attracted the attention of the best minds in the field. With that feeling of comfort that comes with being in good hands, one would think that the problem, although considered difficult in the past, has been solved once and for all and that the solution can be reused without much thought.

At least this is what I thought till some strange behavior in my AJAX code that used John Resig's winning addEvent() and removeEvent() forced me to analyze each and every line of the whole program, and I <em>discovered</em> a couple of really interesting things about the addEvent() function. But before I get into my discovery, let us take a look at the addEvent() code from John Resig's page:

<pre><font size="+1">
function addEvent( obj, type, fn ) { 
  if ( obj.attachEvent ) { 
    obj['e'+type+fn] = fn; 
    obj[type+fn] = function()
      {obj['e'+type+fn]( window.event );} 
    obj.attachEvent( 'on'+type, obj[type+fn] ); 
  } else 
    obj.addEventListener( type, fn, false ); 
}
</font></pre>

As you can see, this code takes on two issues with IE's support for DOM events: (a) IE uses a non-standard method attachEvent() to register event handlers; and (b) it runs the handler code in the global context (i.e., the built-in variable <code><font size="+1">this</font></code> is set to the <code><font size="+1">window</font></code> object during handler execution) and not in the context of the element to which the handler is registered.

The removeEvent() code is very similar and doesn't need to be reproduced here.

So, what is the problem? Actually, none whatsoever, at least not until you have an event handler function that is a few tens of lines long and you pass the name of the function as the last argument to the addEvent() function. If you are like me, you would think that the code will either use the function name string or some kind of address to create a short string as the key to store the handler function reference within the DOM element object. But what really happens is that the whole text of the handler function, consisting of a few tens of lines of code, becomes part of the key (the keys are 'e' + type + fn and type + fn, and concatenating fn into a string yields its source text). In my code I had a key with length greater than 2000! This in itself would not be much of a problem if the key was created only once during registration and then used for lookup during handler execution, though even a lookup in a hash table with very long strings is probably going to tax the JavaScript interpreter badly. The killer is that the key gets created every time the handler is run. This could be very frequent if the event type is 'mousemove' and could easily result in excessive memory use and sluggish behavior.
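
To see how the key ends up containing the whole function, here is a small sketch that can be run outside a browser. The names makeKey and longHandler are made up for this illustration; only the concatenation 'e' + type + fn is taken from the addEvent() code above.

```javascript
// Hypothetical handler, standing in for a much longer one.
function longHandler(event) {
  var total = 0;
  for (var i = 0; i !== 10; i++) { total += i; }
  return total;
}

// The same concatenation addEvent() performs to build its property name.
// Concatenating a function into a string converts it via toString(),
// which yields the function's source text.
function makeKey(type, fn) {
  return 'e' + type + fn;
}

var key = makeKey('mousemove', longHandler);

// The body of the handler is now part of the key.
var keyContainsSource = key.indexOf('total += i') !== -1;
// The key is far longer than one built from the function's name would be.
var keyLongerThanName = key.length > ('e' + 'mousemove' + 'longHandler').length;

console.log(keyContainsSource, keyLongerThanName); // true true
```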

"This doesn't sound like an insurmountable problem," you may say, "just wrap your long function within another function that simply invokes the long function. This way the addEvent() code will use the body of the wrapper function for forming the key and avoid creation of long strings."

Actually, this is very similar to what I tried, my motivation being two-fold: reduce the length of the code that gets used as part of the key and also pass an argument at the time of event handler registration. The wrapper creation function looked something like this:

<pre><font size="+1">
// lfunc: the real (long) handler; arg1: extra argument fixed at registration
function create_handler(lfunc, arg1){
  return function(event){ 
    return lfunc.call(null, 
      event || window.event, arg1); 
  }
}
</font></pre>

And I used it as follows:

<pre><font size="+1">
function long_function(event, arg1){
 ... tens of lines of code ...
}
addEvent(obj, 'mousemove', 
  create_handler(long_function, arg1));
</font></pre>

which, actually, ended up creating this fixed text for every function: "function(event){ return lfunc.call(null, event || window.event, arg1); }". As the key is created by concatenating the event type and the function text, the same key will be created for different handlers if the event type remains the same, causing an overwrite! This actually happened in my code!
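
The overwrite can be reproduced without a browser. In the sketch below a plain object stands in for the DOM element, and the handlers first and second are invented for the illustration; the key construction mirrors the 'e' + type + fn expression from addEvent().

```javascript
// Wrapper factory as described above (window.event fallback left out,
// since this sketch runs outside a browser).
function create_handler(lfunc, arg1) {
  return function (event) {
    return lfunc.call(null, event, arg1);
  };
}

// Two different, made-up handlers.
function first(event, arg) { return 'first:' + arg; }
function second(event, arg) { return 'second:' + arg; }

var h1 = create_handler(first, 1);
var h2 = create_handler(second, 2);

// Plain object standing in for the DOM element.
var obj = {};
obj['e' + 'mousemove' + h1] = h1;
obj['e' + 'mousemove' + h2] = h2; // same key, so h1 is silently replaced

// Both wrappers have identical source text, hence identical keys.
var sameText = String(h1) === String(h2);
var storedCount = Object.keys(obj).length;
var survivor = obj['e' + 'mousemove' + h1](null);

console.log(sameText, storedCount, survivor); // true 1 second:2
```

Looking up the entry with h1 returns the wrapper around second, which is the kind of strange behavior I was chasing.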

So, even the winning entry has skeletons in the cupboard. It is not that every use would result in broken programs, but there certainly are situations where it falls short. In fact, this is true for most library functions and it is always a good practice to know not only the interface and purpose but also the underlying assumptions and how the thing actually works. To be fair to the author John Resig, the recoding contest post had a strict set of requirements and being a reusable function under different conditions was not one of those.
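
For what it is worth, the "short string as key" idea mentioned earlier is easy to sketch. This is not John Resig's code and not a drop-in replacement for addEvent(); the names handlerKey and nextHandlerId and the __handlerId property are invented for this illustration.

```javascript
var nextHandlerId = 1;

// Return a short, unique key for a (type, function) pair by tagging the
// function object with a numeric id the first time it is seen.
function handlerKey(type, fn) {
  if (!fn.__handlerId) {
    fn.__handlerId = nextHandlerId++;
  }
  return 'e' + type + '#' + fn.__handlerId;
}

// Same wrapper factory as in the post (browser-specific fallback omitted).
function create_handler(lfunc, arg1) {
  return function (event) {
    return lfunc.call(null, event, arg1);
  };
}
function longFunction(event, arg1) { return arg1; }

var h1 = create_handler(longFunction, 'a');
var h2 = create_handler(longFunction, 'b');

var key1 = handlerKey('mousemove', h1);
var key2 = handlerKey('mousemove', h2);
var key1Again = handlerKey('mousemove', h1);

console.log(key1, key2, key1Again); // emousemove#1 emousemove#2 emousemove#1
```

The keys stay short no matter how long the handler is, and two wrappers with identical source text no longer collide. The price is an extra property on every registered function.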
]]>
      
   </content>
</entry>
<entry>
   <title>The most informative iPhone article (Or why I haven&apos;t bought one yet!)</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/07/most_informative_iphone_articl.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.181</id>
   
   <published>2007-07-05T22:41:23Z</published>
   <updated>2007-07-06T07:04:41Z</updated>
   
   <summary>Like most techno geeks, I have been reading an awful lot about iPhone. Note the emphasis on reading, for I haven&apos;t got one yet! Based on all the glowing reports about its ruggedness, record sales and continuing surge in AAPL...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="apple" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[Like most techno geeks, I have been <em>reading</em> an awful lot about iPhone. Note the emphasis on reading, for I haven't got one yet! Based on all the glowing reports about <a href="http://youtube.com/watch?v=czCCavcnNd8">its ruggedness</a>, <a href="http://www.waitingforiphone.com/2007/07/05/att-activates-over-1-million-iphones/">record sales</a> and continuing surge in AAPL stock price post-launch, it seems to be living up to the hype that was created prior to the launch.

However, <a href="http://mailbox.allthingsd.com/20070705/questions-about-apples-iphone/">the article I found to be most informative on iPhone</a>, which is actually not even published in an article format -- it is just a set of questions and answers, makes me feel that it is essentially a version 1.0 product. This Q&A column by Walt Mossberg, a WSJ technology columnist, addresses some of the questions I had, such as can I change its dead battery when the inevitable happens (I recently replaced the two 512MB memory modules of my Mac mini with two 1GB memory modules, with great effort but with the kind of satisfaction that only a techno-geek can experience, and wanted to know whether something similar was possible with the iPhone battery); or can I watch YouTube clips on it; or can I use it like a handheld computer with wi-fi connectivity without signing up for an AT&T service. Unfortunately, the short answer is NO for these questions (and for a few others as well!). One question that it didn't answer and for which I think the answer is a NO is this: Can I access my favorite Web Apps such as <a href="http://www.google.com/ig">iGoogle</a>, <a href="https://www.google.com/adsense/login/en_US/">GMail</a>, <a href="http://www.google.com/analytics/">Google Analytics</a>, <a href="https://www.google.com/adsense/l">Google AdSense</a>, <a href="http://www.movabletype.org/">MovableType Blogging Interface</a>, <a href="http://drupal.org/">Drupal Admin Interface</a> etc. from an iPhone? This actually makes me feel good, for I didn't queue up and have no <a href="http://en.wikipedia.org/wiki/Buyer's_remorse">buyers' remorse</a>.

As you can see, what I am looking for in iPhone is not just a cool phone with an MP3 player but a handheld thin client that can also be used as a phone, camera, music player, and personal TV. I have no doubt that iPhone, or its clones, will eventually become this dream device. And that would be a good time to retire my minimal Samsung phone with T-Mobile service.]]>
      
   </content>
</entry>
<entry>
   <title>Save time and money -- install from ISO image file without burning a CD</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/07/save_time_and_money_install_fr.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.180</id>
   
   <published>2007-07-03T20:17:21Z</published>
   <updated>2007-07-03T20:50:53Z</updated>
   
   <summary>Every now and then I need to download, install and experiment with Windows-only enterprise software products as part of my job. Often these software products are distributed as ISO image files. These are opaque files and need to be burned...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Software Development" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[Every now and then I need to download, install and experiment with Windows-only enterprise software products as part of my job. Often these software products are distributed as ISO image files. These are opaque files and need to be burned to a CD before one can see inside them and install the software. Or at least this is what I thought till yesterday, when I found out about a free utility for Windows XP distributed by Microsoft that lets you mount an ISO image file and map it to an unused Windows drive.

Here are the steps to download and use this free utility known as "Virtual CD Control Panel":
<ol>
<li>Download the <a href="http://download.microsoft.com/download/7/b/6/7b6abd84-7841-4978-96f5-bd58df02efa2/winxpvirtualcdcontrolpanel_21.exe">Virtual CD Control Panel bits</a> from the Microsoft site. What you get is a .exe file that creates three files when executed: VCdControlTool.exe, VCdRom.sys and readme.txt.</li>
<li>Follow the instructions in readme.txt to load the driver and mount the ISO file as a Windows drive. You would actually need to follow the steps closely as the UI is not very intuitive. One thing I did notice is that there was no need to actually copy the VCdRom.sys file to the %systemroot%\system32\drivers folder.</li>
</ol>
Not the best tool I have come across but what I can say is that this has been saving me a lot of time and money (in wasted CD ROMs) lately.]]>
      
   </content>
</entry>
<entry>
   <title>Linus on Source Code Management (SCM)</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/06/linus_on_source_code_managemen.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.179</id>
   
   <published>2007-06-04T19:01:54Z</published>
   <updated>2007-06-04T19:37:41Z</updated>
   
   <summary>Came across this interesting and thought provoking YouTube video of Linus Torvalds talking about Source Code Management and GIT to Google employees. It is a long one, over an hour, and with pretty harsh words (sales technique -- I assume)...</summary>
   <author>
      <name></name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[Came across this interesting and thought-provoking <a href="http://www.youtube.com/watch?v=4XpnKHJAok8">YouTube video</a> of Linus Torvalds talking about Source Code Management and <a href="http://git.or.cz/">GIT</a> to Google employees. It is a long one, over an hour, and has pretty harsh words (a sales technique, I assume) for CVS/SVN, most other commercial SCM tools, and their users, but it is worth watching.

My key takeaways:
<ul>
<li>Use a tool that fits your or your organization's development process. Conversely, the tool you use will define the process. Linus illustrates this well with the "committers club" based process that has developed around most OSS projects that use CVS/SVN as their SCM tool. He also explains how these tools fall short in supporting his main activity of accepting patches (read: merging branches) into his tree. However, I can't really see most commercial organizations using the development process that Linus follows for the Linux kernel.</li>

<li>CVS/SVN work on a centralized repository with "slow" over-the-network check-ins and checkouts. GIT, and other similar tools, work on a local repository with good support for merging branches from remote repositories. CVS/SVN suck at merging different branches and hence branches are rarely used. I agree that maintaining branches with either CVS or SVN is a pain. I haven't used GIT or other similar tools, so I can't really comment on them.</li>

<li>Speed of doing SCM activities matters a lot to Linus. And he is not talking about the response time for specific commands but the overall time consumed in SCM activities. Having witnessed the frustration of multi-hour checkouts from a remote ClearCase repository, I cannot agree more.</li>

<li>Integrity guarantees provided by SHA-1 hashes (though, for some reason, Linus referred to this as "Consistency" in his talk) as a safeguard against break-ins and/or disk corruption seem to matter a lot to Linus.</li>
</ul>
There are more comments at <a href="http://developers.slashdot.org/article.pl?sid=07/06/03/004214">Slashdot coverage</a> and another <a href="http://codicesoftware.blogspot.com/2007/05/linus-torvalds-on-git-and-scm.html">blog post</a>.
]]>
      
   </content>
</entry>
<entry>
   <title>Backing up PC data to a Linux box with rsync</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/04/backing_up_pc_data_to_a_linux.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.178</id>
   
   <published>2007-04-21T00:15:51Z</published>
   <updated>2007-04-21T01:38:37Z</updated>
   
   <summary>A few days ago I figured out a simple, inexpensive and elegant solution to my backup needs and am thinking this may be of interest to others as well. Though keep in mind that this solution is somewhat specific to my...</summary>
   <author>
      <name></name>
      
   </author>
         <category term="Linux" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>A few days ago I figured out a simple, inexpensive and elegant solution to my backup needs and am thinking this may be of interest to others as well. Though keep in mind that this solution is somewhat specific to my setup and your mileage may vary.</p>

<p>I use a Compaq nc4000 laptop running Windows XP as the main system and have a number of other machines running either Windows or Linux. The main Windows XP system also includes <a href="http://www.cygwin.com/">Cygwin</a>, a collection of GNU tools such as ssh, svn, cvs, tar, wget etc. Use of ssh keys allows ssh sessions to the Linux system without the need to enter a password. All my data files, including Outlook PST files, are under a single directory on the C drive.</p>

<p>Copying a directory tree from a Windows XP system to a Linux host machine is fairly straightforward with <a href="http://samba.anu.edu.au/rsync/">rsync</a> (available within Cygwin):
<font size=+1><pre>
rsync -avz &lt;datadir&gt; &lt;user&gt;@&lt;host&gt;:backup
</pre></font>
Running this command for the first time takes some time (proportional to the size of the data directory), but subsequent runs are pretty fast as it transfers only new or modified files.</p>

<p>Turning this simple command into a daily backup is just as straightforward: create a command file backup.cmd with the following text (or adapt it to your setup):
<font size=+1><pre>
c:
cd \
rsync -avz &lt;datadir&gt; &lt;user&gt;@&lt;host&gt;:backup >> backup.log
</pre></font>
and set up a Windows Scheduled Task by clicking on All Programs --> Accessories --> System Tools --> Scheduled Tasks. Besides specifying the task schedule, you would need to specify the exact pathname of the backup.cmd file.</p>

<p>That is all you need to do. Just look at c:\backup.log once in a while to make sure that the new or modified files are indeed being transferred.</p>

<p>No backup process is complete without the restore capability. So, here is the restore equivalent:
<font size=+1><pre>
rsync -avz  &lt;user&gt;@&lt;host&gt;:backup/&lt;datadir&gt; .
</pre></font>
Simple, inexpensive and elegant, isn't it!</p>]]>
      
   </content>
</entry>
<entry>
   <title>Sudoku Solving Program : Translating Python to JavaScript</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/03/sudoku_solving_program_transla.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.177</id>
   
   <published>2007-03-13T05:39:17Z</published>
   <updated>2007-03-13T06:42:39Z</updated>
   
   <summary>I had thought of writing a Sudoku solving program while in India last year, but then my laptop died on a sudden surge of voltage (not uncommon in India) and I had to shelve the idea. Once back in the US,...</summary>
   <author>
      <name></name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[I had thought of writing a <a href="http://en.wikipedia.org/wiki/Sudoku">Sudoku</a> solving program while in India last year, but then my laptop died on a sudden surge of voltage (not uncommon in India) and I had to shelve the idea. Once back in the US, I never found enough time to code the solution and the whole thing just faded away from my memory.

So when I came across <a href="http://norvig.com/bio.html">Peter Norvig</a>'s <a href="http://norvig.com/sudoku.html">Python program to solve every Sudoku</a> last week, I took the time to understand the underlying data-structure and the algorithm and couldn't help but appreciate the beauty of the solution (and Python code). I even downloaded <a href="http://norvig.com/sudo.py">the code</a> and tried it out on a couple of puzzles. It worked fast and without any flaw, but with a minor annoyance -- the program must be run as a command line tool and the input must be entered as a string of 81 characters. Not a problem for me, but perhaps not very "user friendly" to most people on the Internet.

So I decided to rewrite the program in JavaScript, the most ubiquitous language on the Web, with a pretty GUI and make it available as a <a href="http://pankaj-k.net/sudoku/sudoku.html">Web based program to solve Sudoku Puzzles</a>. In this program you can either enter a puzzle on a 9x9 grid in a WYSIWYG manner or just pick one from <a href="http://magictour.free.fr/top95">95 hard puzzles</a> as a starting point and then click on the "Solve" button to get a solution.

]]>
      <![CDATA[One thing I noticed is that the JavaScript version is not particularly fast. Some of the puzzles take long enough (more than a couple of seconds, even on 2+ GHz machines) that the browser may prompt you to abort or continue (you should click on "Continue"). The original Python code solves the same puzzle on the same machine much faster, sometimes by a factor of 10. This could be due to the fact that I did a literal translation of Python constructs to JavaScript and haven't paid any attention to reducing hash look-ups or to any other optimization. This could also be due to slow JavaScript interpreters (I tried Firefox 1.5+ and IE 7.0).

Another significant observation I have is about the relative compactness and homogeneity of Python code. Python treats iteration over strings, lists, dicts, and sets uniformly and in a very natural fashion. The corresponding JavaScript code lacks this elegance. You should compare the <a href="http://norvig.com/sudo.py">Python code</a> with the <a href="http://pankaj-k.net/sudoku/sudoku.js">JavaScript Code</a> to get an idea of what I am talking about.
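
To make the contrast concrete: in Python a single "for x in collection" works for strings, lists, dicts and sets. The sketch below, which is not taken from sudoku.js and uses made-up helper names, shows the three different loops that JavaScript of this vintage (with no common iteration protocol) needs for a string, an array and an object used as a dict.

```javascript
// Characters of a string: index loop plus charAt().
function charsOf(str) {
  var out = [];
  for (var i = 0; i !== str.length; i++) {
    out.push(str.charAt(i));
  }
  return out;
}

// Items of an array: index loop again, but with [] access.
function itemsOf(arr) {
  var out = [];
  for (var i = 0; i !== arr.length; i++) {
    out.push(arr[i]);
  }
  return out;
}

// Keys of an object used as a dict: for-in, filtered to own properties.
function keysOf(dict) {
  var out = [];
  for (var key in dict) {
    if (Object.prototype.hasOwnProperty.call(dict, key)) {
      out.push(key);
    }
  }
  return out;
}

console.log(charsOf('ABC').join(','));                 // A,B,C
console.log(itemsOf(['A1', 'A2']).join(','));          // A1,A2
console.log(keysOf({ A1: '123', A2: '4' }).join(',')); // A1,A2
```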

PS: I came across <a href="http://www.ecclestoad.co.uk/blog/2005/06/02/sudoku_solver_in_three_lines_explained.html">this super compact Perl script</a> to solve Sudoku puzzles while researching for this post, <em>after</em> I had already written the JavaScript code. This code employs a simpler data-structure and strategy to eliminate values and <em>find</em> the right solution, keeping the code size to a minimum, but it is perhaps not as efficient. It would be interesting to see if this code can be modified to incorporate the elimination and search improvements of the former.]]>
   </content>
</entry>
<entry>
   <title>Slashdot Book Review Effect -- Visual Illustration</title>
   <link rel="alternate" type="text/html" href="http://pankaj-k.net/weblog/2007/02/slashdot_book_review_effect_vi.html" />
   <id>tag:pankaj-k.net,2007:/weblog//1.176</id>
   
   <published>2007-02-02T07:59:10Z</published>
   <updated>2007-02-02T08:39:55Z</updated>
   
   <summary> The slashdot effect on websites -- a crippling rush of visitors to the website linked to by a slashdot story -- is well known. Less well known is the effect on sale of books reviewed at Slashdot. Authors, including...</summary>
   <author>
      <name></name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://pankaj-k.net/weblog/">
      <![CDATA[<p>
The slashdot effect on websites -- a crippling rush of visitors to the website linked to by a slashdot story -- is well known. Less well known is the effect on the sales of books reviewed at Slashdot. Authors, <a href="http://books.slashdot.org/article.pl?sid=03/12/22/1730229">including myself</a>, have seen their <a href="http://charteo.us/definitions#sales-rank">Amazon Sales Rank</a>, a measure of relative strength in sales by Amazon in the recent past, soar (or dip, if you are looking at pure numbers, but a dip in Sales Rank is actually quite good) in the wake of a Slashdot review. But this phenomenon is relatively unknown outside the inner circle of authors.
<p>
As Amazon revises the Sales Rank every hour, and does not publish historical values, it has been hard to document the short term and long term effect of "Slashdot Book Review Effect".
<p>
Not anymore. You can use <a href="http://charteo.us">Charteous</a>, a website that tracks historical Amazon Sales Rank and presents them through nice charts, to visually observe "Slashdot Book Review Effect". This is actually quite fun!
<p>
Slashdot published the <a href="http://books.slashdot.org/article.pl?sid=07/01/29/1452212">review of <em>CSS: The Definitive Guide</em></a> on Jan 29th with a rating of 9 (out of 10, implying VERY GOOD). Watch the impact on Amazon Sales Rank through this <a href="http://charteo.us/site-tour">Charteous widget</a>:
<p>
<em>Warning: Charteous widgets use SVG and work fine under Firefox1.5+ and Opera9+. If you are using IE6 or IE7 without Adobe SVG Viewer plug-in then it will ask you to download the plug-in. It is okay if you click Cancel. The widget will then display a PNG image of the chart.</em>
<p>
<script type="text/javascript">
Charteous = {params: {amzn: { asins: ["0596527330"]}}};
</script>
<script type="text/javascript"
src="http://charteo.us/charteous/show-chart.js">
</script>
<p>
Not enough time has passed since the day the review was published, so it is hard to see the long term effect. But as the chart is "live", you should be able to see the effect if you come back after a few days.
<p>
Let us look at another example, one that has a few weeks of post-review data: Slashdot published the review of Wikinomics on Jan 3rd. Here is its Amazon Sales Rank chart.
<p>
<script type="text/javascript">
Charteous = {params: {amzn: { asins: ["1591841380"]}}};
</script>
<script type="text/javascript"
src="http://charteo.us/charteous/show-chart.js">
</script>
<p>
As you can see in both cases, there is a marked uptick in Sales Rank, and hence sales volume, just after the review hits the Slashdot main page. It lasts for a few days and then comes back down to normal (whatever that may be!).]]>
      
   </content>
</entry>

</feed>
