Neural networks, a type of machine-learning system, have changed the face of artificial intelligence (AI). They learn to perform tasks, such as recognizing images or translating text from one language to another, by training on large sets of data. After training, however, even the designers of a neural network often cannot say which elements of the data it relies on to perform its task. In other words, we know how to get neural networks up and running, but we have very little insight into how they actually work once the training wheels come off. Researchers are trying to gain insight into the workings of neural networks, specifically those trained to process natural language.

The idea is that understanding what neural networks are doing, and how they do it, will help researchers boost their performance. Hopefully these insights can be transferred to other applications too. So how do we peer into these neural networks? Unlike with humans, a psychiatrist is not going to be of much use here. Instead, computer scientists have developed some interesting techniques to delve into their inner workings.

Technique

At the 2017 Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory presented a new general-purpose technique for making sense of neural networks trained for natural language processing (NLP) tasks. These are tasks in which computers interpret free-form text written in ordinary, or “natural,” language.

The technique applies to any system that takes text as input and produces a string of symbols as output. It can therefore be used on automatic translators and online NLP services without access to the underlying software. It works purely by analyzing inputs and outputs.

Because the underlying mechanics of the software do not affect the technique, it can be used on any black-box text-processing system. Interestingly, experiments have shown that the technique also identifies idiosyncrasies in the work of human translators.

The technique works similarly to those used on neural networks that perform object recognition and other computer vision tasks. Such software perturbs, or changes, different parts of an image and resubmits it, so that analysis can identify which image features the neural network uses to classify images. The same approach is being applied to neural networks that work on NLP. With language, however, things aren’t quite as simple. Sentences have meaning, so how can a sentence be perturbed without turning it into nonsense? Language is inherently different from, and more complex than, an image. This was the problem encountered by Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science at MIT and one of the new paper’s two authors.

Strangely enough, to generate test sentences for black-box neural nets, the scientists turned to a black-box neural net. What does it say when we need machines to help us test other machines? An interesting dilemma, to say the least. The researchers (Jaakkola and Alvarez-Melis) trained a network to compress and decompress sentences. During training, the encoder and the decoder were evaluated at the same time, according to how well the decoder’s output matched the encoder’s input.

Neural networks are inherently probabilistic, and the networks in this experiment provided alternatives for each word along with probabilities of correctness. Because certain words are often used together, co-occurrences of words are used to improve decoding accuracy. So for any sentence, the system generated a list of closely related, likely alternatives. The researchers fed these variations to the black-box system and then examined the resulting input-output pairs to see how variations in the input correlated with variations in the output.
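The perturb-and-compare loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the researchers' implementation: the `toy_translator` black box, its `LEXICON`, and the hand-written `alternatives` table (standing in for the sentence variations a trained network would propose) are all hypothetical, and plain counting replaces whatever statistical analysis the paper itself uses. The sketch also assumes the output has exactly as many words as the input, so that positions can be compared directly.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical word-for-word "translator" used as the black box under test.
LEXICON = {"the": "el", "doctor": "doctor", "nurse": "enfermera",
           "teacher": "profesor", "is": "es", "tall": "alto", "short": "bajo"}


def toy_translator(words: List[str]) -> List[str]:
    """Toy black box with a built-in quirk: 'nurse' forces feminine forms."""
    out = [LEXICON.get(w, w) for w in words]
    if "nurse" in words:
        feminine = {"el": "la", "alto": "alta", "bajo": "baja"}
        out = [feminine.get(t, t) for t in out]
    return out


def dependency_scores(
    sentence: List[str],
    black_box: Callable[[List[str]], List[str]],
    alternatives: Dict[str, List[str]],
) -> Dict[Tuple[int, int], float]:
    """For each (input position i, output position j), return the fraction of
    substitutions at i that changed the output word at j."""
    base_out = black_box(sentence)
    scores: Dict[Tuple[int, int], float] = {}
    for i, word in enumerate(sentence):
        subs = alternatives.get(word, [])
        if not subs:
            continue  # no variations proposed for this word
        changed = defaultdict(int)
        for sub in subs:
            perturbed = sentence[:i] + [sub] + sentence[i + 1:]
            new_out = black_box(perturbed)
            # Assumes equal-length outputs; compare position by position.
            for j, (before, after) in enumerate(zip(base_out, new_out)):
                if before != after:
                    changed[j] += 1
        for j in range(len(base_out)):
            scores[(i, j)] = changed[j] / len(subs)
    return scores


if __name__ == "__main__":
    sentence = ["the", "doctor", "is", "tall"]
    # Hand-written stand-in for the alternatives a trained network would supply.
    alternatives = {"doctor": ["nurse", "teacher"], "tall": ["short"]}
    for (i, j), score in sorted(
        dependency_scores(sentence, toy_translator, alternatives).items()
    ):
        print(f"input[{i}]={sentence[i]!r} -> output[{j}]: {score:.2f}")
```

In this toy run, varying the noun at input position 1 always changes the output noun (score 1.0), but it also changes the article and the adjective half of the time (score 0.5). That is the kind of unexpected cross-word dependency, here a planted gender quirk, that an input-output analysis can surface without looking inside the model.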

Test cases

The technique was tested on three different types of NLP systems. The first was a set of translators, both automatic and human. The second was a chatbot of sorts, a simple computer dialog system. The third was a system that infers the pronunciation of words.

When it came to the translators, one of the interesting findings was the identification of gender biases in the text used for training the machine translators. There were also strong dependencies between individual words in the input and output.

The dialog system was trained on Hollywood movie lines, with a large training set provided to a deliberately underpowered network. Alvarez-Melis explained that this experiment was about flawed systems: if a black-box model is not doing its job well, this approach can help figure out what the problems are. It provides insight into what is going wrong and why, which allows the system to be fixed and improved.

Time will tell if we succeed in fully understanding the inner workings of neural networks. Whatever the findings turn out to be, they are sure to have intriguing ramifications for computer scientists and AI.