Deep learning and Bayesian Nonparametrics

Deep learning thrives on problems where automatic feature learning is crucial and lots of data are available. Large amounts of data are needed to optimize the large number of free parameters in a deep network. In many domains, that would benefit from automatic feature learning, large amounts of data are not available, or not available initially. This means that models with low capacity must be used with hand crafted features, and if large amounts of data become available at a later time, it may be worthwhile to learn a deep network on the problem. If deep learning really reflects the way the human brain works, then I imagine that the network being trained by a baby must not be very deep, and must have few parameters than networks that are trained later in life, as more data becomes available. This is reminiscent of Bayesian Nonparametric models, wherein the model capacity is allowed to grow with increasing amounts of data. 

Creativity is Just Connecting Things

"Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn’t really do it, they just saw something. It seemed obvious to them after a while. That’s because they were able to connect experiences they’ve had and synthesize new things. And the reason they were able to do that was that they’ve had more experiences or they have thought more about their experiences than other people. (...)

Unfortunately, that’s too rare a commodity. A lot of people in our industry haven’t had very diverse experiences. So they don’t have enough dots to connect, and they end up with very linear solutions without a broad perspective on the problem. The broader one’s understanding of the human experience, the better design we will have."

- Steve Jobs

Defining Intelligence

Intelligence is a slippery thing to define. The following definition recently struck me:

That which allows one to maximize happiness over long term.

I like this definition because it is short (c.f. MDL, Occam's Razor), it makes logical sense, and it carries a lot of meaning without going into details of how to be intelligent. It is logical to me because of the following argument: Suppose a person is allowed to life two versions of his life starting from some fixed point in his life. All events and circumstances in the two versions are the same except for actions taken by the person. Then he can be said to be more intelligent in that version of his life in which he achieves greater happiness over the rest of his life.

Intelligence is needed in order to understand what actions will make us happy, for how long, and whether there will be any effects of those actions on our future happiness. Making decisions to maximize cumulative happiness is certainly a non-trivial task. Sometimes one must put oneself through short-term adversity (e.g. graduate school at little on no stipend, or an athlete undergoing gruelling training for a race) to be able to do well later. Sometimes, one decides to undertake an action that provides short term happiness, but at the cost of long term happiness. It takes intelligence to learn to avoid such behaviour n the future.

Modern definitions of intelligence from the scientific and psychology community are incredibly long-winded [Wikipedia]

A very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—"catching on," "making sense" of things, or "figuring out" what to do.

and:

Individuals differ from one another in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought. Although these individual differences can be substantial, they are never entirely consistent: a given person's intellectual performance will vary on different occasions, in different domains, as judged by different criteria. Concepts of "intelligence" are attempts to clarify and organize this complex set of phenomena. Although considerable clarity has been achieved in some areas, no such conceptualization has yet answered all the important questions, and none commands universal assent. Indeed, when two dozen prominent theorists were recently asked to define intelligence, they gave two dozen, somewhat different, definitions.

The same Wikipedia page also lists various different definitions given by researchers. The long-windedness of these definitions is somewhat excusable as an attempt to be all-inclusive and general. But in the end, the notion of intelligence is a man-made model, invented to try and explain phenomena. I think a focus on happiness as the central phenomenon to be explained goes a long way in simplifying our understanding of intelligence.

What is the biggest problem in the world?

I have been posing this question to friends and acquaintances (and to myself) in one form or another for a while now. The answers I have received have varied significantly. I am not the first to pose this question of course. Here is one of several online polls, posing the same question, with 700+ responses so far. Here are some others. Some of the responses I have received personally and gathered from various online postings like the ones above, in no particular order include:

Environmental change Poverty, Hunger, clean drinking water
The P vs NP problem Ego
War Communication between people
Lack of tolerance Jobs, economy
Ignorance Fear
Greed Lack of genuine love, hatred
Religion Racism
Moral decline Energy shortage
Sin Drugs
Terrorism Apathy, lack of empathy
Anger Pollution
Love for Money Politics
Forgetfulness of God Overpopulation, limited world resources
Toxic waste Consumerism
Death Selfishness
HIV All –isms: Nationalism, sexism, racism..
Cancer Envy

One of the problems with the way the question is posed above is that it does not specify what 'problem' and 'biggest' mean.

Define problem. We will define problem as 'That which brings suffering to humans'.

Define biggest.  Biggest could mean 'one that affects the largest number of people', 'the scientific problem that would create the biggest impact if solved', or 'one with the greatest economic impact', etc. I am interested in a specific version of this question, in which 'biggest' means 'most fundamental', i.e. one which can be said to be a root cause of many other problems.

Causal structure. A natural question to pose in order to move in the direction of getting an answer to my version of 'biggest problem' is: how many degrees of freedom are really present in the above responses (and what are they)? That is, are they all independent problems, or do they stem from a relatively small set (1-2) of root causes (with others being effects)? For example, lack of tolerance and energy shortage can be said to be causes of war.  It is also clear that not all the problems listed above are at the same level of generality -- some seem intuitively more abstract or fundamental than others. For e.g., war seems more in the realm of effects or symptoms, compared to say anger, fear or greed. In other words, even though they are all problems, some of the items in the list above are really effects rather than causes, and I am interested in the causes. To restate the question properly:

What is the true causal structure of the world problems?

Here is a small toy example of what I mean by causal structure:

An arrow from A to B indicates 'A causes B'. In the above example, energy shortage is stated to be a cause for war, and lack of tolerance is also stated as a cause for war. Also, once energy shortage is taken into account as a cause for war, then war is not caused by overpopulation or consumerism. In other words, overpopulation and consumerism do lead to war, but only through energy shortage.

One correct answer. What strikes me most about the restated question above is that there must exist a definite answer to it. That is, there is an objective reality associated with the question. The causal structure is not a matter of subjective opinion. There is one true structure of cause and effect. I am not claiming the number of independent root causes at the very top of the causal structure is 1 (perhaps this is the case). All I am saying there is one definite causal structure. The 'one correct answer' aspect is interesting because while it is arduous to build a causal structure, checking whether a proposed structure makes sense should be much easier.

I am looking for this causal structure. I think that gaining an understanding of the causal structure can be more insightful than an understanding of the each of the problems in isolation [1]. If you think you have have a causal structure of even part of the list of problems above, please write to me or leave me a comment. If you contact me with a proposed causal structure, please use the following format:

Cause1 -> Effect1 Cause2 -> Effect2

and so on, with one cause-effect pair per line. For the above toy example, this would be:

Overpopulation -> Energy shortage Consumerism -> Energy shortage Energy shortage -> War Lack of tolerance -> War

Think of this as a jigsaw puzzle, in which the problems are the blocks (feel free to pick whatever set of problems you want from the above list, or otherwise. Of course, the more complete the set, the better.), and one has access to as many arrows as needed (The fewer the arrows, the better).

_____

Notes

[1] I think this may be true in general. In middle school I recall homework and exam questions in various subjects asking us to fill in the blanks or match entries in column A with the entries in column B. I feel explaining the causal structure between a set of things would make a very instructive exercise in school because it would force a student to think.

Habit Formation, Part 2

This post builds upon an earlier post on modeling habit formation. To follow this post, please read this post first. Go on, I'll wait. The urn model in that post is not really a complete description of habit formation because the limit distribution of the system (i.e., fraction of white balls after a very large number of draws) is completely determined by the initial configuration of the urn (i.e., number of white and black balls). In reality, habits are formed from repeated actions, and our actions are based on not just our inclinations but also the conscious choices that we make, which can sometimes be against our inclinations. Free will is missing from this model. There is no way to break a bad habit if one start out with one.

Therefore, let us alter the basic urn model slightly to capture choice: At each step, before picking a ball, we explicitly declare which color we are interested in for that pick: a white ball ( = follow the desirable habit) or a black ball ( = follow the undesirable habit). Suppose we choose to aim for a white ball. A ball is then sampled from the urn at random, and if a white ball does indeed come up, then:

  1. A payoff of $W is received (We wanted white and we got white, so we are happy to the extent of $W).
  2. The white ball is placed back into the urn, along with an extra white ball (Reinforcement).

But if we decide to pick white and a black ball comes up, then there is no payoff, and the black ball is placed back in the urn without any extra ball. However, if we chose black, a black ball is guaranteed to show up, we get a payoff of #B, where # is a different currency from $, and the black ball is placed back along with an extra black. The extra black ball makes picking a white ball harder when a white is desired.

Consider the implications of this model:

  1. Suppose there are very few white balls compared to black. A white ball would rarely show up. With the decision to pick a white, most of the time there will be no payoff and no reinforcement. But with the decision to pick a black ball, there is guaranteed payoff of #B units, but at the cost of picking a white later on harder.
  2. As the bad habit is reinforced, the chance of reverting to the good habit never dies away completely, but diminishes as the number of black balls grows in comparison to the whites. ('It's never too late, but it does get harder').
  3. The purpose of the different currencies is to model the fact that the pleasure obtained from following a bad habit is qualitatively different from the one obtained from following a good one. Since most habits are based on short-term pleasure at the expense of long term benefit (e.g. cigarette smoking, binge eating /drinking, putting off exercise),  currency # may correspond to lots of short term kicks, while $ may correspond to long term well being.

Short-term thrills

I believe human behavior is fundamentally driven by a desire to be happy. However, different individuals can have very different ideas about what will make them happy. Since most bad habits result from a focus on short term happiness, at the expense of long term happiness, we would do well to make the following tweak: # and $ can be added to give a total payoff, but each unit of # currency is worth only $latex \epsilon < 1$ unit, 1 iteration after it was earned. The other currency, $, however, does not decay with time. The question is what behavior emerges when the objective is simply maximization of total payoff? It is intuitive that the answer will depend upon whether the game has a finite horizon or an infinite horizon. Since players have finite lifetimes, it is appropriate to use a finite time horizon, but since a person does not know when he/she will die, the horizon must be a life expectancy with some uncertainty (variance). In other words, nobody knows when they will die, but everybody knows they will die. The only time it makes logical sense to pick a black ball is if the player believes his remaining lifetime is so small that the expected cumulative gain from deciding to pick whites is smaller than the expected cumulative gain (including the decay effect) from picking blacks. Of course this depends upon how he spent his past -- i.e. whether or not he built up a high fraction of whites.

Breaking a bad habit

Suppose a person starts out with a predominant inclination towards an undesirable habit and wishes to switch to the good habit. It is clear that if she always chooses the white ball, then she will eventually succeed in creating a larger number of white balls and a good $ balance. But she will have to go many tries and no payoff before that happens. In practice, this requires determination, and she may be tempted to pick the black ball because it is so easily available.

Suppose $latex D \in [0,1]$ is the fraction of times that (s)he will decide to aim for a white ball, i.e. the player's determination. It is intuitive that a larger value  $latex D$ would help her switch to the good habit in fewer iterations. It would be interesting to see if there is a threshold value of $latex D$ below which the bad habit is never broken. In particular, I expect there to be a threshold value $latex D(w,N-w)$, below which the bad hait is never broken, w.p. 1, where $latex w$ is the number of whites in the urn and $latex N-w$ is the number of blacks.

Game over when wealth < 1 unit

Now let us further suppose, that in the course of playing the game, a player can never have less than 1 currency units. In other words, the game stops if the total currency units with a player reaches zero. All other rules are the same: 1 unit earned from a white ball, 1 unit from a black ball but units earned from black balls decay to $latex \epsilon$ of their value every time slot. Deciding to pick a black guarantees a black, and an extra black goes in. Deciding to pick a white, provides a white with a probability proportional to the fraction of whites in the urn, and an extra white goes in. With the new rule stipulating 'game over' when the player's wealth falls below 1, the player is incentivized to pick a black if she ever approaches 1 unit of wealth. This is meant to model the inclination of individuals with low self worth and happiness levels to engage in activity that would yield short term happiness (alcohol, drugs, binge eating, etc. ) at the expense of long term prospects.

Learning using Puzzles

Mathematical and logic puzzles are a great way to learn. I think they should be used to a greater extent in school and college education than they are, because they:

  1. Create an excitement in the mind of the student/solver.
  2. Force the solver to think about what method to use to attack the problem. This is much closer to real world situations, where one is not told about what area a problem is in. Tech companies recognize this when they use puzzles  while interviewing job candidates as a way to observe how the candidate thinks.
  3. Over time, allow students to recognize patterns in problems and the best way to attack them, so that wehn they are faced with a new, real world problem, they have a good idea of what to try first.

I am certainly not the first one to note the educational value ofpuzzles. Martin Gardner, whose puzzles I have tremendously enjoyed has himself advocated for puzzle-based learning.

Each puzzle is usually based on one (or, sometimes two, but very rarely more than two) fundamental areas of math: probability, number theory, geometry, critical thinking etc. For example, take the following puzzle (try these puzzles before reading answers at the end of the post):

[1] By moving just one of the digits in the following equation, make the equation true:

$latex 62 - 63 = 1$

It should be clear that this puzzle is part math and part logic.

Often there is more than one way to arrive at the solutions -- e.g. using a geometric visualization for a algebraic problem in 2 or 3 variables. For example take the following puzzle:

[2] A stick is broken up into 3 parts at random. What is the probability that the thee parts form a triangle?

One may choose to solve the above problem using algebra, but it can be visualized geometrically too.

Speaking of geometric methods, there is also something to be said about solving puzzles inside one's head, rather than using pen and paper. I feel it adds to the thrill because it can be more challenging. It is also fun to visualize objects and their relationships in mathematical terms,  completely inside one's head. The following is an example of a pretty hard puzzle, which was solved entirely inside one mathematicians head :

[3] Peter is given the product of two integers both of which are between 2 and 99, both included. Sally is given only their sum. The following conversation takes place between them:

P: I cannot figure out the two numbers from the product alone.

S: I already knew that you couldn't have figured them out.

P: Oh, now that you say that, I know what the two numbers are.

S: I see. Now that you say you know what they are, I also know what they are.

What are the two integers?

___

ANSWERS:

[1] $latex 2^6 - 63 = 1$

[2] Without loss of generality, assume that the stick is of unit length. Since the three lengths add up to 1, and are each $latex \in [0,1]$, they can be treated as three random variables $latex (x,y,z)$ in 3D space, lying on the 2-dimensional triangular simplex with vertices at (1,0,0), (0,1,0) and (0,0,1). The problem now boils down to determining what fraction of the surface of this triangle corresponds to tuples that cannot form a triangle. From the triangle inequality, we know that $latex x + y > z$ and so on for the other two pairs. A little mental visualization reveals that these boundary conditions are met when each side of these inequalities is $latex \frac{1}{2}$. The region corresponding to feasible points is a smaller equilateral triangle inscribed within the simplex and is one-fourth the area of the simplex. The answer therefore must be $latex \frac{1}{4}$.

[3] The answer can be found in this PDF of a note written by EW Dijkstra, who solved the problem in his head.

Andre Geim

"Many people choose a subject for their PhD and then continue the same subject until they retire. I despise this approach. I have changed my subject five times before I got my first tenured position and that helped me to learn different subjects." - Sir Andre Konstantin Geim (2010 Nobel Prize winner in Physics).

You are my hero Andre.

Compressive sensing of vacant parking spaces with mobile sensors

My work on mobile sensing for automotive parking applications has just been accepted to ACM Mobisys 2010 and I will br presenting this work at Mobisys in San Francisco in June 2010. The paper is about the design, implementation and evaluation of a mobile sensing system called ParkNet, for the purpose of harvesting in as close to real time as possible, information about the availability street-parking spaces in urban areas. ParkNet was recently featured in the MIT Technology review. It has just been covered by Rutgers Today, an organization within Rutgers University that produces news articles about promising research efforts within Rutgers. ParkNet proposes that sensors for sensing vacant street side parking spaces be made mobile (see this project trial that uses stationary sensors installed in the asphalt road, one sensor per spot, presumably at a huge cost) by exploiting vehicles that regularly comb city environment, such as taxicabs. ParkNet can be thought of as a compressive sensing system because it drastically reduces the number of sensors needed [with a corresponding dramatic decrease in overall cost], with a very small loss in spatio-temporal accuracy.

Naturally, one of the things I needed to figure out was: how well can a fleet of N taxicabs cover a given geographical area, in terms of mean time between successive cabs visiting a given street? How does this number vary with N. The San Francisco Taxicab dataset came in handy. The visualization on the right is made with good old Matlab©. It shows GPS coordinates of a single taxicab in the San Francisco area, measured about once per minute, over a period of 30 days. When I plotted this, I immediately realized that taxicabs would be great for ParkNet because most taxicabs tend to spend most of their time in the crowded downtown area of the city (dense, upper right corner in the plot), and this is where the parking problem is most serious.

Update: My ParkNet paper received the best paper award at ACM's annual Mobile Systems and Applications Conference [ACM Mobisys]!