Cookbook-style programming

This topic came up during a debate with a friend about how working on hard problems in software engineering (AI, etc.) is more fun and rewarding than working on plain-old CRUD web applications.

I like reading about research and implementing random algorithms that I run into in publications and blogs. But I would still consider that cookbook-style programming, where I look up an algorithm and cook it up according to a predefined recipe.

I might be able to understand the recipe, and even modify it a little to make it sweeter, but I still would not know how to make a new recipe.

And is that wrong? Is it wrong to enjoy cooking from a cookbook, without knowing how, or even wanting, to create your own recipes?

Just sayin'.

Posted in Programming

The problem of escaping in programming languages

In almost any kind of textual data format, you'll have to implement escaping of some sort. For example:

  1. In programming languages, string literals are delimited by a quote or double-quote (' or ").
  2. Variants of SGML (including XML and HTML) use <, >, and / extensively to define markup.

In the above formats, ', ", <, > and / all have special meaning. However, we may still want to represent them as ordinary data. For example, how would we distinguish between a quote that starts a string literal ("foo") and a quote inside a string literal ("& and " are special characters")?

To solve this problem, we introduce the concept of escaping. Characters that have special meaning within the data format are represented using sequences of two or more characters. In many programming languages, for example, \" is used to represent a double quote.
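As a quick illustration, Python's standard json module happens to use the same backslash convention for string literals. This sketch takes the example sentence from the quote example above, shows the inner double quote coming out as \", and decodes it back to the original text. JSON is only a convenient stand-in here; nothing in the argument depends on it.

```python
import json

# The text we want to embed in a double-quoted literal.
text = '& and " are special characters'

# json.dumps wraps the text in quotes and escapes the inner quote as \"
literal = json.dumps(text)
print(literal)  # "& and \" are special characters"

# Decoding reverses the escaping and recovers the original text.
assert json.loads(literal) == text
```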

To explain this in a more complicated way: this problem arises because we're trying to represent more information than the raw bytes alone can hold. A document with N bytes can represent 256^N different variations of data. However, by defining a data format on top of a sequence of bytes, we're adding more semantics on top of it. If we still want to represent all 256^N variations along with the additional semantics, we need more bytes to accommodate them.
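The counting argument can be made a little more precise with the pigeonhole principle. As a sketch, write S_{≤N} for all byte strings of at most N bytes, and V_{≤N} for those that are valid string-literal bodies (a body consisting of a single unescaped " or a lone trailing \ is not valid):

$$
|S_{\le N}| = \sum_{k=0}^{N} 256^{k}, \qquad V_{\le N} \subsetneq S_{\le N} \;\Longrightarrow\; |V_{\le N}| < |S_{\le N}| \quad (N \ge 1)
$$

An escaping scheme has to be an injective map e from raw strings to valid bodies, otherwise two different strings would decode the same way. Since there are fewer valid bodies of at most N bytes than raw strings of at most N bytes, e cannot send every string of length at most N to a body of length at most N: some input needs more than N bytes once escaped.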

There are a few problems with escaping, though.

Escaping can introduce invalidity or ambiguity into data formats

Without escaping, there is no such thing as invalid text, because every one of the 256^N byte sequences is a valid string. However, as soon as you introduce escaping, you're also introducing the possibility of invalid strings.

In many languages, \t is a tab, \n is a newline, and \\ is a backslash. What about \a, \b, or \c? In C and Python, \a is a bell and \b is a backspace (Java has \b but not \a), while \c is not a defined escape in any of the three. If no such escape sequence exists, what exactly should happen?

  • The input is rejected (Java)
  • The sequence is kept as is, as if \ were not an escape character (Python)
  • The backslash is dropped, usually with a compiler warning (C)

strlen("\k") is 1 in C (at least with GCC), len("\k") is 2 in Python, and "\k".length() doesn't even compile in Java. This is problematic because very few people remember all the escape sequences, let alone how they vary between programming languages.
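To make the three behaviours concrete, here is a minimal Python sketch of a decoder for literal bodies that supports only a handful of escapes (\n, \t, \\ and \") and takes a policy for unknown ones. The function decode and the policy names reject, keep and drop are made up for this example; they imitate Java, Python and GCC-compiled C respectively, but this is not how any of those compilers is actually implemented.

```python
# The only escape sequences this toy decoder knows about.
KNOWN = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def decode(body, policy):
    """Decode backslash escapes in a literal body.

    policy decides what happens to an unknown escape such as \\k:
      'reject' - raise an error (like Java)
      'keep'   - keep the backslash and the character (like Python)
      'drop'   - drop the backslash, keep the character (like C with GCC)
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash at end of input")
        nxt = body[i + 1]
        if nxt in KNOWN:
            out.append(KNOWN[nxt])
        elif policy == "reject":
            raise ValueError("illegal escape sequence: \\" + nxt)
        elif policy == "keep":
            out.append("\\" + nxt)
        elif policy == "drop":
            out.append(nxt)
        else:
            raise ValueError("unknown policy: " + policy)
        i += 2  # consume the backslash and the character after it
    return "".join(out)


print(len(decode(r"\k", "keep")))  # 2, like len("\k") in Python
print(len(decode(r"\k", "drop")))  # 1, like strlen("\k") in C with GCC
```

Known escapes decode the same way under every policy; the policies only disagree on input the format never defined, which is exactly where the languages diverge.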

Introducing escaping in a format introduces more escaping

To have a double quote in a string literal, one needs to type \". Now we can have double quotes in string literals, but what about a backslash? Since the backslash now carries additional meaning (it initiates an escape sequence), a literal backslash must itself be escaped, as \\. The lesson here is that the character that initiates escape sequences needs to be escaped as well. The same is true for & in XML, which has to be written as &amp;.
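A minimal Python sketch of why the backslash has to be escaped too. The helpers naive_escape, escape and read_literal are hypothetical; read_literal stands in for a lexer that scans to the first unescaped quote. If only the quote is escaped, a string ending in a backslash (such as the Windows path C:\) produces a literal whose closing quote looks escaped.

```python
def naive_escape(s):
    """Quote s, escaping only the double quote (and forgetting the backslash)."""
    return '"' + s.replace('"', '\\"') + '"'


def escape(s):
    """Quote s, escaping the escape character first, then the double quote."""
    # Order matters: escaping the backslash second would double the
    # backslashes that were just inserted in front of the quotes.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def read_literal(src):
    """Return the still-escaped body of the literal at the start of src."""
    if not src.startswith('"'):
        raise ValueError("expected an opening quote")
    i = 1
    while i < len(src):
        if src[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif src[i] == '"':
            return src[1:i]  # the first unescaped quote closes the literal
        else:
            i += 1
    raise ValueError("unterminated string literal")


path = "C:\\"  # the three characters C, :, backslash
print(escape(path))        # "C:\\"
print(naive_escape(path))  # "C:\"  (the closing quote now looks escaped)
print(read_literal(escape(path)))  # C:\\
try:
    read_literal(naive_escape(path))
except ValueError as err:
    print(err)  # unterminated string literal
```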

If you have data embedded in other data, you may need to perform multiple rounds of escaping.

I don't mean something contrived like a CSV file with XML documents in its cells, where the XML documents contain Java code. Nesting formats sounds like an anti-pattern, but it is quite common in real life:

  • Regular expressions in Java. As in many other languages, a backslash initiates an escape sequence in a string literal. However, a backslash also initiates an escape sequence within a regular expression. Thus, to write a regular expression that matches a single backslash, you need to write 4 backslashes (!!!!) in the string literal.
  • Embedded JavaScript in HTML. JavaScript code in <script> tags needs to be aware of HTML's parsing and escaping rules. This question on Stack Overflow (http://stackoverflow.com/questions/4176511/embedding-json-objects-in-script-tags) gives a detailed discussion.
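Both cases can be reproduced in a few lines of Python, which has the same double-escaping problem with regular expressions as Java. The script-tag half is a simplified sketch: the data value is made up, counting "</script>" is only a rough stand-in for how an HTML parser finds the end of a script element, and replacing "</" with "<\/" is one common mitigation (JSON allows \/ as an escaped slash), not the only one.

```python
import json
import re

# Regular expressions: two layers of escaping.
# The regex language needs \\ to match one literal backslash, and the
# string literal needs each of those two backslashes escaped again.
pattern = "\\\\"                    # four backslashes in source, two characters at runtime
assert len(pattern) == 2
assert re.fullmatch(pattern, "\\")  # matches a single backslash
assert pattern == r"\\"             # a raw string removes the outer layer

# JSON inside a <script> tag: the HTML parser knows nothing about JSON strings,
# so a "</script>" inside a JSON string still closes the element.
data = {"comment": "</script><script>alert(1)</script>"}  # made-up hostile input
naive_html = "<script>var data = %s;</script>" % json.dumps(data)
print(naive_html.count("</script>"))  # 3: the element ends early

# Escaping "</" as "<\/" keeps the JSON equivalent but hides the end tags.
safe_json = json.dumps(data).replace("</", "<\\/")
safe_html = "<script>var data = %s;</script>" % safe_json
print(safe_html.count("</script>"))  # 1: only the real closing tag
assert json.loads(safe_json) == data
```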
Posted in Uncategorized

Grepability 101

Programming in dynamically typed languages such as JavaScript or Python gives you a lot of freedom over how you structure your code. However, you lose a lot of powerful static analysis and refactoring tools that are available in languages like Java. For example, in Eclipse, you can easily perform tasks such as:

  • Find all the occurrences of a given class or method.
  • Rename or change the signature of a given method, and update all the references to it.

The fundamental nature of dynamic languages makes it very challenging to perform those tasks reliably. Thus, instead of relying on sophisticated tools and IDEs, I end up using traditional Unix tools such as grep and find.

To use those tools efficiently, though, you need to give your code more structure. Here are a few things I've learned:

Avoid short and common names - If a variable has a short name such as x, y, or i, it is almost impossible to find all of its occurrences efficiently. Generic names such as min, max, and index are bad for the same reason. Names such as CategoricalFilter or twitter_api_handler are good examples.

Prevent names from being a substring of another - This one is less detrimental than the previous one, but it can be annoying if you run into it frequently. If you have identifiers of both the form X and YX (say, View and PanelView), grepping for X also returns YX. This can be avoided by, say, renaming View to BaseView.

Try to use fully qualified names as much as possible - By fully qualified name, I mean a name that includes the module name or namespace identifier. The reason is simple: fully qualified names are easier to grep for. In Python, this means that you should prefer "import x.y.z" over "from x.y import z", because the first construct forces you to refer to z by its fully qualified name. In CoffeeScript, the destructuring syntax {identifiers...} = module plays the same role as "from x.y import z" here, so it should be avoided too.
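Here is a small Python sketch of the difference, using os.path.join as a stand-in for x.y.z (importing the module os.path and calling os.path.join through it). The two source snippets and the grep helper, a crude stand-in for the Unix tool, are made up for illustration.

```python
import re

# The same code written with a qualified and an unqualified import.
qualified_src = '''
import os.path
config = os.path.join(base, "config.ini")
label = ", ".join(parts)
'''

unqualified_src = '''
from os.path import join
config = join(base, "config.ini")
label = ", ".join(parts)
'''


def grep(pattern, source):
    """Return the lines of source that match pattern, roughly like grep."""
    return [line for line in source.splitlines() if re.search(pattern, line)]


# Searching for the fully qualified name finds exactly the call we care about.
qualified_hits = grep(r"os\.path\.join", qualified_src)
print(len(qualified_hits))  # 1

# With the unqualified import, the best we can search for is "join(",
# which also hits the unrelated str.join call on the last line.
unqualified_hits = grep(r"\bjoin\(", unqualified_src)
print(len(unqualified_hits))  # 2
```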

Avoid splitting references to a name over multiple lines - This can happen when the fully qualified name is too long. If the name is x.y.z, keep it on one line.

(JavaScript) Modularize / namespace your code - By default, JavaScript doesn't provide any abstraction for modules, and it's very tempting to place all the function declarations, global variables, and prototypes in the single global scope. But this is bad and doesn't scale, because it becomes hard to trace back where a given function or variable is declared. To prevent this, do something to give the code more structure. You don't have to do anything fancy:

  • Make sure that every global variable in a JavaScript file has a file-specific prefix. For example, all the functions declared in foo.js may start with the prefix foo_.
  • Each JavaScript file defines an object with the same name as the file, and all the variables are defined in that object.
  • Use a module management library such as require.js.

I might be missing a lot of stuff, but those are a few things I think about whenever I'm coding in JavaScript and Python.

Posted in Programming, Uncategorized

Judging a book by its cover

There's a popular English idiom: "Don't judge a book by its cover." It means that you should not judge others solely on their appearance or first impressions.

However, it's helpful to know that this phrase was first said in the 1800s. Back then, all books looked the same: bland covers in gloomy hues, with centered titles in the same fonts. When all covers are the same, it is futile to judge books by them.

It is different nowadays. Authors and publishers have full control over the covers of their books. They can make them as unique as they want. A cover is yet another surface through which they can express the book's message. And if they don't do anything with it, that's inexcusable.

The phrase still holds a valuable message: we should not approach the world with prejudice. But we should not take shelter behind its wisdom either. Don't expect others not to judge you by your cover, especially when you, like everyone else, have full control over it.

Posted in Uncategorized

The Unbearable Lightness of Chocolate

Imagine a young, passionate, yet inexperienced man trying to impress his dear lady with a gift. He's faced with two conflicting objectives. First, he wants to make an everlasting impact. He wants his gift to leave some form of permanent impression on her. The old and wise may ridicule him, telling him how futile and greedy he is to pursue permanence in this world. However, even that cynical, laughing crowd will have some sympathy for this young, inexperienced soul.

Second, he is afraid that his gift will not be welcome. What if the gift does not suit her taste? Even worse, what if it makes her angry or hostile? What if he succeeds in leaving an everlasting impression on her by upsetting her?

As he looked around the mall for a suitable gift, he passed by a chocolatier. Chocolates - aren't they wonderful? The young man thought to himself. Sweet, bitter, deep, smooth, and thus indulgent. No one dislikes chocolates, and there's no reason to be hostile towards them.

But they are so fleeting. Once they've all melted in her mouth, only crumbs and wrappers will remain in the box. At some point, even this lonely box will be thrown out, leaving no trace of itself. Not even in her memory. Eventually, even he will forget the box of chocolates that he poured his heart into. It will be as if this box of chocolates had never existed in this world. That's a bit sad, even depressing, but isn't everything in the world like that? We tell ourselves that there's an element of permanence in our lives. But everything will be forgotten, flushed into oblivion...

"Sir, may I help you?" The clerk's voice woke him from his train of thought.

"Yes, can I have a box of milk chocolate? Gift wrap it for me, please."

Posted in Kundera

Graphomania - Desire to write

In his book The Book of Laughter and Forgetting, Kundera makes a cynical critique of graphomania, the desire to write.

The reason we write books is that our kids don't give a damn. We turn to an anonymous world because our wife stops up her ears when we talk to her ... Graphomania (an obsession with writing books) takes on the proportions of a mass epidemic whenever a society develops to the point where it can provide three basic conditions:

1. a high enough degree of general well-being to enable people to devote their energies to useless activities;

2. an advanced state of social atomization and the resultant general feeling of the isolation of the individual;

3. a radical absence of significant social change in the internal development of the nation. (In this connection I find it symptomatic that in France, a country where nothing really happens, the percentage of writers is twenty-one times higher than in Israel).

It is strange to find that even an influential writer like Kundera feels uncertain about himself: a combination of humility and self-deprecation. Regardless, his observation of our society is surprisingly accurate. If I had to summarize Kundera's point, we will have an unstoppable desire to write if the following conditions hold in our world:

  1. We only face first world problems.
  2. We are lonely.
  3. We are bored.

With the advent of the internet and social networking services, Kundera's vision has never been truer. Remember, The Book of Laughter and Forgetting was published in 1979. Blogging, Facebook, and Twitter are supposed to make us more open, social, and connected. But do they? In my opinion, those services do not eliminate Kundera's three conditions of graphomania (the first condition is not a problem, so it is better left without a solution). Social media services do not stop social atomization, and they are not a significant social change.

The last sentence is controversial without an explanation. The advent of the internet and social media is arguably the biggest change since the industrial revolution. It promotes the free, uncensored spread of messages, and it played influential roles in recent movements such as the 2011 Egyptian Revolution and the Occupy movement. Arguably, if Facebook had existed during the Cold War, it could have stopped the Soviet-led invasion of Czechoslovakia in 1968. Prague would have rejoiced in its perpetual spring, and Kundera would not have been bitter enough to write his novels.

I'm not disputing their monumental importance. But I do think that social media is not a change by itself, but a catalyst for other changes. It is a very powerful catalyst, fertile soil for future revolutions. But without a powerful social phenomenon to exploit this catalyst, social media will not stop our craving to write. So I have to rephrase myself: social networking is not the significant social change that would make us forget our boredom.

So social media do not solve our problems. However, they are very good at alleviating the symptoms. They fill our graphomaniac needs. I can write about Kundera, about how influential I find him, and force it down my friends' throats via Facebook. I can tweet about it too, and hope that the entire world cares.

I will end the post with a demotivational poster.

BLOGGING - Never before have so many people with so little to say said so much to so few

Posted in Kundera, Uncategorized

Email notification whenever a user logs in to your Linux machine via SSH

TL;DR - Instructions

  1. Install the pam_python module. On Ubuntu, this can be done with "sudo apt-get install libpam-python".
  2. Download the following script (https://gist.github.com/2380454) to a convenient location. I saved it to /lib/security/pam_notify.py
  3. In the file pam_notify.py, modify the variables FROM_ADDRESS and TO_ADDRESS.
  4. Add the following line to /etc/pam.d/sshd:

    session optional pam_python.so /lib/security/pam_notify.py

  5. Done. SSH into your server, and see if you get an email. Check your spam filter, too.

Introduction

For security reasons, you might want to be notified whenever someone logs into your server. I asked the following question on Server Fault - http://serverfault.com/questions/375558/how-to-email-notify-admin-when-users-log-in-to-the-linux-server - and was referred to a C PAM module that does the job. For some reason I couldn't get the module to work, and I wasn't too happy about debugging C code to figure out what was going on behind the scenes, so I rewrote it as a Python script.

Explanation

We're using the following technologies:

  • Pluggable Authentication Module (PAM)
  • pam_python
  • Python
  1. PAM gives you fine-grained control over the authentication system in a *nix environment. For our purposes, we're adding logic to send an email whenever an SSH session is created.
  2. PAM modules need to be compiled into *.so files (native shared library objects). This is annoying for debugging and development, because it means that the modules need to be written in C and compiled. However, the pam_python library provides a bridge between PAM and Python scripts, allowing you to write modules in Python.
  3. pam_notify.py is our nifty little script that sends the email notifications.
  4. Finally, the PAM module needs to be configured on the machine. All the PAM configuration files are stored in the directory /etc/pam.d/. The line "session optional pam_python.so /lib/security/pam_notify.py" in /etc/pam.d/sshd configures PAM to call the Python script whenever an SSH session is created.
Posted in Uncategorized

Startup snippets

A few snippets of conversations I had with other people while running Polychart...

Snippet #1

A: Do you guys have a website ready?

B: No, but we will by this Friday.

Snippet #2

A: But isn't it impolite to act so?

B: Bah, you're running a startup. Being polite is the last thing you're caring about. Stop be so Asian.

Posted in Uncategorized

On triviality

As a math student, one of my favorite pieces of mathematical jargon is "trivial". In mathematics, a problem or structure is trivial if it is relatively simple or well understood.

Some trivial structures in mathematics are semi-formally defined. The empty set, the singleton group, and the singleton ring are considered trivial, and most mathematicians refer to them as the trivial set, the trivial group, and the trivial ring respectively.

Informally, mathematicians say that a problem is trivial if it is well understood. This is very subjective, because it really depends on one's mathematical knowledge. "Union of two finite sets is finite" may be trivial to almost everyone. "Differentiable functions are continuous" is a trivial fact for senior math students, but it's a totally reasonable question to appear in a first year calculus exam. Algebraists say that finite abelian groups are trivial. They're all isomorphic to direct products of cyclic groups, which are relatively well understood structures.

I started to use the term trivial in my everyday conversations. When someone told me that they did well on their exam, I replied, "Well, that's trivial, you normally get over 80 on your tests anyway". If someone complained about problems they were having with their girlfriend, I would reply, "You know what, I think your problem is trivial. So many people have already gone through this problem".

Then I came back to reality, and realized that the word trivial has a more negative, non-mathematical meaning. The Oxford English Dictionary says:

Of small account, little esteemed, paltry, poor; trifling, inconsiderable, unimportant, slight.

The problems my friends were having were not inconsiderable or unimportant. They were all grave matters to me. I understood their gravity, but the problems themselves fell within the predictable boundaries of my expectations, and were thus trivial.

In some sense, mathematicians are always addicted to new, unsolved problems. Their job is to turn mysteries into trivialities. I personally think that's the biggest philosophical difference between mathematicians and engineers. Whenever mathematicians discover a pattern in a structure, they are awed by its beauty, then move on to the next problem, bored by its triviality. Engineers, however, when they discover a pattern in nature, want to understand it, exploit its predictability, and integrate it into their work. A fine example of this is regular expressions. Regular languages are well understood within the hierarchy of formal languages. They are not very interesting from a theoretical computer scientist's perspective, but regular expressions are among the most useful tools in computer programming and software development. They're simple to understand, and they're very powerful. Engineers love them.

Posted in Mathematics, Uncategorized

A new year

In a few hours it is going to be 2012, and that makes it a good time to start a blog.

-Jee

Posted in Uncategorized