In this post I am excited to interview LI Zhe, a software engineer in Bordeaux, whom I had only known through email and Twitter for the past two years, and only managed to meet in real life about a month ago.
We share a common passion for Machine Learning and mathematics, and for this interview I asked her to tell us a little bit more about one of her favorite topics: Deep Learning.
And for those who live in Bordeaux, LI Zhe is giving a talk on Machine Learning with Spark on Thursday the 19th.
Ludwine. Can you introduce yourself?
My name is LI Zhe. I am Chinese and I have been living in France for 8 years. After graduating with a master's degree in computer science, I started working as a Java developer in Paris. I later moved to Bordeaux, where I live and work as a full-stack developer at ARCA Computing.
I love science in general, whether it's mathematics or physics, and especially astronomy and aeronautics. I am attracted to science because I love the way we look at the world through it. Science possesses not only truth, but supreme beauty: logic.
I am also interested in electronics: I am making my first small robot at the moment.
Otherwise, I play the piano and the GuZheng (a traditional Chinese musical instrument). I play with various Rubik's Cubes or the Tangram to relax. I also like travelling and discovering different cultures.
Why did you choose to work in computer science? Did you have some bias about working in IT?
I fell into computer science by chance. After graduating from high school, I chose science and engineering of materials at university. I joined a program to study this field after a partnership between my university and an engineering school in France was launched in 2003. I was attracted to the program because there were not just theoretical courses, but also practical and experimental ones, like working with Cisco routers or Oracle databases, and building websites. And I learned to speak French 🙂
Since middle school, I had been told that boys were smarter than girls at science in general. Before I started working in this domain, I wasn't sure I would find my place.
Today I love my job: I love coding, thinking, studying and sharing. When I am working, I only see the job, and whether you are a girl or a boy, what matters is how passionate and committed you are to succeeding.
In October you gave a talk about Deep Learning and neural networks at the bdx.io conference. Can you tell us more about these topics?
Neural networks are a family of models inspired by the biological neural networks in the brain. They are used to solve a variety of tasks that are hard to solve with ordinary Machine Learning methods, especially in computer vision, natural language processing and speech recognition. A neural network does not follow a linear path: information is processed collectively, in parallel, throughout a network of nodes (the nodes being neurons), as in the image below, which represents a feedforward neural network.
For each layer we want to compute the value of each neuron.
Given a neuron A, we take all the neurons from the previous layer that are connected to A. Each of these input connections carries a weight, and we compute the weighted sum of the input neurons' values.
This sum is then passed through an activation function, such as the sigmoid, which gives us the "activation", i.e. the value of neuron A.
These computations are applied to all the neurons of the network, layer by layer, until the output neurons are activated.
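As a rough sketch of the steps above, in pure Python and with made-up weights, inputs and bias, computing the activation of a single neuron could look like this:

```python
import math

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def activate(inputs, weights, bias):
    # weighted sum of the previous layer's values, then the activation function
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# neuron A receiving three (made-up) values from the previous layer
a = activate([0.5, 0.1, 0.9], [0.4, -0.6, 0.2], 0.1)
print(a)
```

Repeating this for every neuron of a layer, and then layer after layer, is exactly the forward pass described above.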
The crux of every Deep Learning task is to find the optimal value of each weight. And of course, changing the weights affects the accuracy of the network and the output of the model.
Now, let's take handwriting recognition as an example for the explanations that follow.
Let's say that we have a collection of images representing handwritten digits. Given an image of a handwritten digit, we want to predict this digit, knowing that it was not in the data used to train the model.
We start by choosing input neurons, where each neuron represents a pixel's intensity. To get these neurons, we first normalize every input image to a grayscale one with a size of 28 x 28, i.e. 784 pixels. To extract the pixels from the grayscale image we can use OpenCV, an open-source computer vision library. In the end, we obtain 784 input neurons.
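To make the sizes concrete, here is a minimal sketch of flattening an already-normalized 28 x 28 grayscale image into the input layer, with a dummy all-black image standing in for a real digit:

```python
# a dummy 28 x 28 grayscale image (all-black) standing in for a real,
# already-normalized digit; each value is a pixel intensity in 0-255
image = [[0] * 28 for _ in range(28)]

# flatten row by row into the input layer, scaling intensities to [0, 1]
inputs = [pixel / 255.0 for row in image for pixel in row]

print(len(inputs))  # one input neuron per pixel: 28 * 28 = 784
```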
The weight values are chosen randomly at first. Then, for each layer, we compute the activation functions up to the output layer. Finally, we compare the computed output value with the correct one.
To improve the model's accuracy and reduce the difference between our output and the correct one, we can apply the back-propagation method: it determines how much each weight (via a delta) is responsible for the final error, so that the weight values can be adjusted accordingly. Testing data can also be used to verify the accuracy at each iteration.
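A minimal sketch of this idea, reduced to a single sigmoid neuron, one toy training example and a made-up learning rate (real back-propagation applies the same delta rule layer by layer across all weights):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# one toy training example: input x with correct output t
x, t = 1.5, 0.0
w = 0.8            # the weight, chosen randomly at first
rate = 0.5         # made-up learning rate

for _ in range(200):
    y = sigmoid(w * x)            # forward pass
    error = y - t                 # difference from the correct output
    delta = error * y * (1 - y)   # how much this weight caused the error
    w -= rate * delta * x         # adjust the weight accordingly

print(abs(sigmoid(w * x) - t))    # the error has shrunk after training
```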
Well, but why do we need hidden layers?
The hidden layers' job is to transform the inputs into something that the output layer can use.
As an example let’s look at how image classification works.
As you can see in the picture above, the first hidden layer looks for short pieces of edges in the image: these are very easy to find from raw pixel data, but not very useful for telling you whether you are looking at a face, a bus or an elephant.
Each successive hidden layer helps detect specific areas of the face, like the nose or the eyes, until the entire face can be composed.
It would be too hard for a single layer to detect all the specific areas of a face from the raw pixels; that's why it is important to have enough hidden layers in your model.
Deep Learning is mostly based on Artificial Neural Networks. Since 2006, a set of techniques has been developed to enhance learning in deep neural nets. These methods have enabled much deeper (and larger) networks to be trained: people now routinely train networks with 5 to 10 hidden layers. And it turns out that they perform far better on many problems than shallow neural networks, i.e. networks with fewer than three layers. The reason, of course, is the ability of deep nets to build up a complex hierarchy of concepts.
The network I decided to use in my presentation is the Recurrent Neural Network (or RNN). It is a popular model which has shown great promise in natural language processing because it can use sequential information. For example, if you want to predict the next word in a sentence, you had better know the previous ones.
RNNs have an internal state that can be thought of as "memory", which captures information about what has been computed so far.
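A toy illustration of that "memory", with a single scalar state and made-up weights: the state after each step still carries a trace of earlier inputs, even once the input drops back to zero.

```python
import math

# made-up scalar weights for a toy RNN cell
w_in, w_rec = 0.5, 0.9

def step(x, h_prev):
    # the new state blends the current input with the previous state
    return math.tanh(w_in * x + w_rec * h_prev)

h = 0.0
for x in [1.0, 0.0, 0.0]:   # a short input sequence
    h = step(x, h)
    print(h)
# the final state is still non-zero: it "remembers" the first input
```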
Regarding Deep Learning, the theory has existed for thirty years, and Yann LeCun is one of the researchers who has contributed the most to the development of this field of research. Today I think that Deep Learning is a real buzzword, and data analysis has become a new challenge for many companies. What do you think of that? Can we put Machine Learning everywhere?
It is true that Deep Learning has shown us great results so far, from generating text written in the style of Shakespeare to composing music. For example, Google DeepMind created a neural network that learns how to play video games in a manner similar to humans.
In my opinion, Deep learning will help us make real progress towards artificial intelligence.
But there is still much to do. Fortunately, today, Machine Learning can be used wherever there is data 🙂
What fascinates you in AI?
Artificial intelligence can make our lives more efficient. And I think I would like to work on projects that use AI to make an impact on people's everyday lives.
Another example I like is Argus II, an epiretinal prosthesis that can be implanted in the eyes of blind people so that they can regain some sight.
For those who would like to learn more about AI, what do you recommend? Do we need to have strong knowledge in mathematics?
Regarding neural networks, I found the book Neural Networks and Deep Learning to be good: Michael Nielsen explains the techniques used to improve the performance of neural networks from a mathematical point of view.
If you are interested in working in this domain, I think you should have a minimal knowledge of statistics, linear algebra and calculus. There is no need to become an expert mathematician, so do not be afraid to go back and learn some basic maths.
In the future, I think it will be easier to get started without a big maths background, thanks to high-level APIs that can help you get going without caring much about nifty optimization details. There are already nice frameworks and libraries going in this direction, like Theano in Python, Torch in Lua, or Caffe from the University of California, Berkeley. And recently Google open-sourced its in-house deep learning library, TensorFlow, available for free for everyone to use.
Finally, are there any apps or tools you cannot live without?
Google search engine 🙂
PS: Thanks to Ludwine for her beautiful paintings for this interview!!
A big thank you to LI Zhe for this interview!