Personal Blog

    I temporarily decided to stop writing about general stuff and focus on Artificial Intelligence. Because of this, I opened a new WordPress blog on a subdomain of mine.

    Neural Networks, Image Processing, [Un]supervised Learning, and so on. I also decided to share every piece of code by uploading it to my personal GitHub repositories, which can be found in the new blog.

    Long ago, around four years ago already, I implemented a Twitter fingerprinting tool in PHP. It was available to everybody but unfortunately stopped working when the API changed. I was really excited because I could also help the Spanish police with another version of this online tool.

    Recently, I decided to implement something similar in Python, but with a considerable difference: it does not use the Twitter API. There are several reasons:

    • Due to the Twitter API rate limits, you cannot use the application intensively. I do not know the current limit, but it used to be around 150 requests per hour per IP.
    • I do not particularly like Twitter knowing what I am doing through the API.
    • If you use the API, you have to create an account and use certain credentials. Some people may not like this.
    • Less flexibility. With the API you can only do what the API allows you to do. Obviously.

    On the other hand, the basic disadvantage of not using the API and parsing the HTML directly with regular expressions is that a small change to their website can break the code.
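
    Just to illustrate what "parsing with regexps" means here, a rough sketch of the idea (this is not the actual library code; the URL and the regular expression are assumptions that will break whenever Twitter changes its markup):

    import re
    import urllib.request

    # Hypothetical example: fetch a user's public timeline page and pull the
    # tweet texts out with a regular expression. The pattern below assumes the
    # text lives in a <p class="tweet-text"> element, which is only a guess.
    user = "google"
    html = urllib.request.urlopen("https://twitter.com/" + user).read().decode("utf-8")
    tweets = re.findall(r'<p class="tweet-text"[^>]*>(.*?)</p>', html, re.DOTALL)

    for text in tweets:
        print(re.sub(r"<[^>]+>", "", text))  # strip any remaining inline tags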

    Having said this, here is the GitHub link to this small library, where you can find more information.

    An example of how to use it:

    Get all the tweets from a specific user:

    from TwitterFingerprint import TwitterFingerprint
    tw = TwitterFingerprint("google")
    tw.obtainLastTweets() # Get all tweets

    Get the language, hashtags, and text (tweet) of the last 30 tweets:

    from TwitterFingerprint import TwitterFingerprint
    tw = TwitterFingerprint("google")
    tw.obtainLastTweets(30)  # assumption: the method accepts how many tweets to fetch
    for tweet in tw.tweets:
        # attribute names assumed from the description above
        print(tweet.language, tweet.hashtags, tweet.text)

    Get three histograms (months, weekdays, hours) of the last 500 tweets. Useful when analyzing at what times someone uses Twitter.

    from TwitterFingerprint import TwitterFingerprint
    tw = TwitterFingerprint("google")
    tw.obtainLastTweets(500)  # assumption: fetch the last 500 tweets first
    [histMonths, histWeekdays, histHours] = tw.getHistograms()
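
    If you want to visualize the result, a minimal sketch with matplotlib, continuing from the snippet above (it assumes each histogram is a plain list of counts, which is how I read the description):

    import matplotlib.pyplot as plt

    # Assumed sizes: 12 months, 7 weekdays, 24 hours.
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    for ax, hist, title in zip(axes, [histMonths, histWeekdays, histHours],
                               ["Months", "Weekdays", "Hours"]):
        ax.bar(range(len(hist)), hist)
        ax.set_title(title)
    plt.tight_layout()
    plt.show()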

    As a Japanese language learner, any tool that makes this difficult journey easier is always welcome. I do not want to debate anything about the learning process, so I will just say that, as with any other language, the “reading” skill is a very useful proof of understanding both grammar and vocabulary. For this reason, every day I have to say “thank you” to the people at NHK Web Easy, who upload news in very simple Japanese (the grammar is very simple, and the vocabulary is somewhere around an intermediate level).

    On my way to the university I always read as many news articles as I can on the metro, but very often I have to switch to my dictionary because I do not know certain words. This makes reading more difficult because, during the “long” process of memorizing the word (reading + writing), minimizing the browser, opening the dictionary, writing the word, and understanding it, I usually forget what I was reading before. And it is not a matter of memory. When you are reading in a language you are not good at, it is extremely difficult to keep track of everything, especially in Japanese, where the grammar is absolutely different from any European language (even more different than Finnish).

    Therefore, I decided to make a tool to avoid all those previously mentioned steps (except for the “understanding” part, of course). This tool allows me to read very fast and makes reading way more pleasant. I called it “NHK Reader” and, in a few words, it is a language parsing tool. It works as follows:


    1. Takes the text from the news article
    2. Uses an external POS tagger to separate the words (POS tagging)
    3. Uses Jisho to get the meanings of the words
    4. Pastes everything back into the webpage (using jQuery)

    POS tagging is not easy, so it does not detect all words correctly and, sadly, I cannot do anything about that since I simply take the output from the external POS tagger.
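
    For the curious, here is a very rough sketch of the idea behind steps 1 and 3 above (this is not the actual NHK Reader code; the URLs, element names, and regular expressions are assumptions for illustration, and the POS-tagging step is left out):

    import re
    import urllib.request
    import urllib.parse

    def fetch_nhk_easy_text(url):
        # Assumption: the article body sits inside a <div id="newsarticle"> element.
        html = urllib.request.urlopen(url).read().decode("utf-8")
        match = re.search(r'<div[^>]*id="newsarticle"[^>]*>(.*?)</div>', html, re.DOTALL)
        return re.sub(r"<[^>]+>", "", match.group(1)) if match else ""

    def lookup_jisho(word):
        # Assumption: the first English meaning appears in a "meaning-meaning" span.
        url = "https://jisho.org/search/" + urllib.parse.quote(word)
        html = urllib.request.urlopen(url).read().decode("utf-8")
        match = re.search(r'<span class="meaning-meaning">(.*?)</span>', html, re.DOTALL)
        return match.group(1).strip() if match else "?"

    # Hypothetical usage:
    # text = fetch_nhk_easy_text("https://www3.nhk.or.jp/news/easy/article.html")
    # print(lookup_jisho("新聞"))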

    A couple of screenshots


    This tool parses Jisho and NHK directly, so the regular expressions are hardcoded and it might fail if the owners decide to change the HTML, but it should not be difficult to fix.

    The code will be available on my GitHub when version 1.0 is ready.

    My computer:
    Windows 10, Python 3.4, OpenCV 3.0.0

    I made countless desperate attempts to install OpenCV 3.0 on my machine, including the usual last resort when installing software: building it from source. There is an apparently nice tutorial on the official OpenCV webpage which did not work for me because I needed a couple of extra steps, so please try to follow that tutorial (section “Building OpenCV from source”) with these additional tips:

    • 7.2: Use whatever path you want, but remember that you will not be able to delete it afterwards (because the installation refers to that path), so choose wisely.
    • 7.4: You can use any Visual Studio. In fact, I used Visual Studio 2013. Just remember to specify it when configuring CMake.
    • 8, 9, 10, 11: When you are checking and unchecking all those checkboxes you will realize that many of the options are not listed in the provided pictures. What to do in that case? Easy: just leave them as they are.
    • 16: Some of the projects will not be built and will be skipped. Don’t worry.

    Now OpenCV is supposed to be installed on your computer, and if you try to import cv2, it should work.
    However, at least for me, it didn’t. The final step to get rid of that annoying error is to add the appropriate path to the PATH system variable. This is the path you have to add: X\bin\Release, where X is the folder where you compiled it.
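
    A quick sanity check (the build folder below is a placeholder; use your own path, or simply set PATH in the system settings as described above):

    import os

    # Hypothetical build folder; replace it with the folder where you compiled OpenCV.
    os.environ["PATH"] += os.pathsep + r"C:\opencv\build\bin\Release"

    import cv2
    print(cv2.__version__)  # should print 3.0.0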

    I hope this is helpful. This could have saved me many hours…

    After reading no more than 10 pages of a new chapter of my new book, a basic and problematic thought came to my mind. First of all, I was thinking about how modern object recognition algorithms work. They are precariously based on statistics, which may or may not work depending on the training samples. Regardless of the classifier used, which may be SVM, K-means, NN, etc., they all end up separating a hyperspace through a hyperplane to later let the user or the corresponding algorithm differentiate and draw certain conclusions. As far as I know, this is the general idea of how classifiers work, despite refined techniques to improve results such as boosting or cross-validation, to mention just a couple of examples.
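
    To make the “separating hyperplane” idea concrete, a minimal sketch with scikit-learn and a linear SVM (the toy points are made up purely for illustration):

    from sklearn import svm

    # Toy 2-D training samples: two loose clusters labelled 0 and 1.
    X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4],   # class 0
         [2.0, 1.8], [1.7, 2.1], [2.2, 2.0]]   # class 1
    y = [0, 0, 0, 1, 1, 1]

    clf = svm.SVC(kernel="linear")
    clf.fit(X, y)

    # The learned hyperplane (a line here, since the space is 2-D) separates the
    # two classes; any new point is classified by the side it falls on.
    print(clf.predict([[0.1, 0.3], [1.9, 2.2]]))  # -> [0 1]
    print(clf.coef_, clf.intercept_)              # hyperplane parameters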

    Therefore, I was thinking about the perfect way to recognize (classify) objects by taking a look, as is usual in engineering, at nature: how it works in humans. Let us say that I want a machine to learn how to detect a pen given a picture. Let us not go in depth into certain intricacies such as the size (proportion), the number of pens, the orientation, or the color (I am being extremely optimistic). A pen, like any other object, has some features that we can enumerate, mostly based on its contour and texture:
    -Rectangular contour (or cylindrical in three dimensions)
    -Pointy end (the part that we use to write)

    Even if we are somehow able to mathematically model that information (optimistically), let us not forget that those features may correspond to thousands of different objects in the world. How do we know, then, that we have a pen on the table rather than a chopstick? Here comes the key: the environment. We will probably find a pen in an office and a chopstick in an Asian kitchen, and when I say probably, that means that statistics are inevitably needed.
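
    A tiny numeric sketch of that “probably” (every number below is invented purely to illustrate how the environment acts as a prior):

    # Likelihood of the observed elongated, pointy shape given each object:
    p_shape_given_pen = 0.9
    p_shape_given_chopstick = 0.9          # the visual features are almost identical

    # Prior of each object given the environment we believe we are in (an office):
    p_pen_given_office = 0.30
    p_chopstick_given_office = 0.01

    # Unnormalized posteriors (Bayes' rule, dropping the common denominator):
    score_pen = p_shape_given_pen * p_pen_given_office                     # 0.27
    score_chopstick = p_shape_given_chopstick * p_chopstick_given_office   # 0.009

    print(score_pen, score_chopstick)      # bet on the pen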

    The question now is: how do we know that we are in a kitchen or some other place? You only need to imagine that you have a button which can randomly teleport you into any room of your house. After you open your eyes and analyze every object in the room, you will be able to make a very accurate guess. But how can you distinguish or classify objects if you previously have no information about where you are (the environment)? Here is the paradox.

    A clever guy could have realized that it is not always necessary to analyze every single object to know where you are. In the case of your bedroom, you may think that the most important object is the bed. Thus, after detecting this element you have an environment and you can continue recognizing objects. Nonetheless, in spite of this new conclusion that we can recognize an environment given certain recognized objects, many unanswered questions come to my mind:

    -What if we have just a partial image which does not include any crucial recognizable object to learn about the environment?
    -Is it possible to model any environment? (How do you realize that you are floating in space or relaxing in the countryside?)
    -What if those crucial recognizable objects have common features and tend to confuse our algorithm?

    It is not especially hard to come up with innumerable questions like those.