After reading no more than 10 pages of a new chapter of my new book, a basic but problematic thought came to mind. First of all, I was thinking about how modern object recognition algorithms work. They rely, somewhat precariously, on statistics that may or may not work depending on the training samples. Regardless of the classifier used, which may be an SVM, k-means, a neural network, etc., they all end up separating a high-dimensional feature space with a hyperplane so that the user or a downstream algorithm can differentiate classes and draw certain conclusions. As far as I know, this is the general idea of how classifiers work, despite refined techniques to improve results such as boosting or cross-validation, to mention just a couple of examples.
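To make the "separating a feature space with a hyperplane" idea concrete, here is a minimal sketch using a plain perceptron, one of the simplest linear classifiers. The 2D points and labels are invented for illustration; real classifiers work in far higher dimensions, but the geometry is the same.

```python
# Toy sketch: a perceptron learns weights w and bias b so that
# sign(w . x + b) places each point on the correct side of a hyperplane
# (a line, in this 2D example). Data points are invented for illustration.

def perceptron(points, labels, epochs=50, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
            if pred != y:  # misclassified: nudge the hyperplane toward the point
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

# Two linearly separable clusters of feature vectors.
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = perceptron(points, labels)

# Check that every training point now falls on the correct side.
correct = all(
    (1 if w[0] * x1 + w[1] * x2 + b > 0 else -1) == y
    for (x1, x2), y in zip(points, labels)
)
print(correct)  # True
```

An SVM refines this same picture by choosing, among all separating hyperplanes, the one with the largest margin; boosting and cross-validation then help squeeze better generalization out of whatever boundary is learned.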

Therefore, I was thinking about the perfect way to recognize (classify) objects by looking, as is usual in engineering, at nature: how it works in humans. Let us say that I want a machine to learn how to detect a pen in a picture. Let us not go in depth into certain intricacies such as size (proportion), number of pens, orientation, or color (I am being extremely optimistic). A pen, like any other object, has some features that we can enumerate, mostly based on its contour and texture:
- Rectangular contour (or cylindrical in three dimensions)
- Pointy end (the part that we use to write)
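Those two cues could be hand-coded as a rule over extracted shape features. The sketch below is hypothetical: the feature dictionary and the elongation threshold are invented, and extracting such features reliably from a real image is the hard part being optimistically skipped.

```python
# Hypothetical, hand-written test for the two pen cues listed above:
# an elongated rectangular contour and a pointy end.
# The thresholds and feature names are invented for illustration.

def looks_like_pen(features):
    width = features["width"]
    height = features["height"]
    # Ratio of the long side to the short side of the bounding contour.
    elongation = max(width, height) / max(1e-9, min(width, height))
    elongated = elongation > 5          # thin, roughly rectangular shape
    pointy = features["has_pointy_end"]
    return elongated and pointy

print(looks_like_pen({"width": 140, "height": 12, "has_pointy_end": True}))   # True
print(looks_like_pen({"width": 30, "height": 30, "has_pointy_end": False}))   # False
```

Note that a chopstick would pass this exact test too, which is precisely the ambiguity discussed next.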

Even if we are somehow able to model that information mathematically (optimistically), let us not forget that those features may correspond to thousands of different objects in the world. How do we know, then, that we have a pen on the table rather than a chopstick? Here comes the key: environment. We will probably find a pen in an office and a chopstick in an Asian kitchen, and when I say probably, that means statistics are inevitably needed.
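One standard way to phrase "the environment decides" is Bayes' rule: if the shape features alone give identical likelihoods for pen and chopstick, the posterior is driven entirely by the prior P(object | environment). All the probability values below are invented for illustration.

```python
# Minimal sketch of the environment acting as a prior.
# P(obj | features, env) is proportional to P(features | obj) * P(obj | env).
# All numbers are invented for illustration.

def posterior(likelihood, prior):
    unnorm = {obj: likelihood[obj] * prior[obj] for obj in likelihood}
    z = sum(unnorm.values())
    return {obj: p / z for obj, p in unnorm.items()}

# The elongated, pointy shape fits both objects equally well.
likelihood = {"pen": 0.9, "chopstick": 0.9}

office_prior = {"pen": 0.8, "chopstick": 0.2}
kitchen_prior = {"pen": 0.1, "chopstick": 0.9}

print(posterior(likelihood, office_prior))   # pen wins in the office
print(posterior(likelihood, kitchen_prior))  # chopstick wins in the kitchen
```

The same ambiguous shape flips its most probable label depending only on where we believe we are.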

The question now is: how do we know we are in a kitchen or some other place? Just imagine that you have a button that can randomly teleport you into any room of your house. After you open your eyes and analyze every object in the room, you will be able to make a very accurate guess. But how can you distinguish or classify objects if you have no prior information about where you are (the environment)? Here is the paradox.

A clever reader may have realized that it is not always necessary to analyze every single object to know where you are. In the case of your bedroom, you may decide that the most important object is the bed. Thus, after detecting this one element you have an environment, and you can continue recognizing objects. Nonetheless, despite this new conclusion that we can recognize an environment given certain recognized objects, many unanswered questions come to my mind:
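The "anchor object" idea can be sketched as a two-pass procedure: first label only the objects we are confident about without any context (a bed is rarely anything else), use them to fix the environment, then re-score the ambiguous objects with that environment as a prior. The anchor table, priors, and confidence threshold below are all invented for illustration.

```python
# Toy two-pass scene classifier: anchors fix the environment,
# the environment disambiguates the rest. All tables are invented.

ENV_OF_ANCHOR = {"bed": "bedroom", "stove": "kitchen", "desk": "office"}

ENV_PRIOR = {
    "bedroom": {"pen": 0.3, "chopstick": 0.05},
    "kitchen": {"pen": 0.1, "chopstick": 0.9},
    "office":  {"pen": 0.9, "chopstick": 0.05},
}

def classify_scene(detections):
    # Pass 1: infer the environment from the first confident anchor object.
    env = None
    for scores in detections:
        label, conf = max(scores.items(), key=lambda kv: kv[1])
        if conf > 0.8 and label in ENV_OF_ANCHOR:
            env = ENV_OF_ANCHOR[label]
            break
    # Pass 2: re-score every object using the environment as a prior.
    labels = []
    for scores in detections:
        if env is not None:
            scores = {l: s * ENV_PRIOR[env].get(l, 1.0) for l, s in scores.items()}
        labels.append(max(scores.items(), key=lambda kv: kv[1])[0])
    return env, labels

scene = [
    {"desk": 0.95},                    # anchor: confidently a desk
    {"pen": 0.5, "chopstick": 0.5},    # ambiguous elongated object
]
env, labels = classify_scene(scene)
print(env, labels)  # office ['desk', 'pen']
```

This also makes the questions below concrete: if the anchor is cropped out of the image, or if two anchors share features and fire at once, pass 1 fails and the ambiguity never gets resolved.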

- What if we only have a partial image that does not include any crucial recognizable object from which to infer the environment?
- Is it possible to model any environment? (How do you realize that you are floating in space or relaxing in the countryside?)
- What if those crucial recognizable objects share common features and tend to confuse our algorithm?

It is not especially hard to come up with innumerable questions like these.