
In this paper we modeled the process of learning orthographic structure from sequences of letters using a recently proposed generative neural network, the Recurrent Temporal Restricted Boltzmann Machine (RTRBM; Sutskever et al., 2008). We showed that this sequential network is able to learn the structure of English monosyllables in a completely unsupervised fashion, solely by learning to accurately reproduce input wordforms presented one letter at a time.
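As an illustration, presenting a wordform "one letter at a time" amounts to encoding it as a sequence of one-hot vectors. The following minimal sketch assumes a 27-symbol alphabet with a hypothetical end-of-word marker "#"; the encoding details are our own illustrative assumptions, not a specification from the model.

```python
import numpy as np

# Hypothetical alphabet: 26 lowercase letters plus an end-of-word marker "#".
ALPHABET = "abcdefghijklmnopqrstuvwxyz#"
IDX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_word(word):
    """Encode a wordform as a sequence of one-hot vectors, one per letter,
    terminated by the end-of-word marker."""
    seq = np.zeros((len(word) + 1, len(ALPHABET)))
    for t, ch in enumerate(word + "#"):
        seq[t, IDX[ch]] = 1.0
    return seq

x = encode_word("cat")  # shape (4, 27): c, a, t, #
```

Each row is then fed to the network as the input for one timestep.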

We first demonstrated that the RTRBM successfully learned graphotactics by testing its performance on a prediction task, in which the initial part of a word was given as context and the network predicted the probability distribution over the following letter. The RTRBM achieved prediction performance comparable to that of a simple recurrent network (SRN), the most widely used connectionist architecture for modeling sequential data. We also compared the RTRBM with other popular (non-connectionist) probabilistic generative models, namely Hidden Markov Models and n-gram models, which constitute the state of the art in several sequence learning tasks but provide no insight into the underlying neural computation. We then assessed the generative ability of the considered models by letting them autonomously produce sequences of letters and measuring their well-formedness. In particular, we calculated the average length, orthographic neighborhood size, and constrained bigram and trigram frequencies of the generated strings, and used these indicators to compare the quality of the generated pseudowords with that of two pseudoword generators used in psycholinguistic studies, namely the ARC nonwords database (Rastle et al., 2002) and Wuggy (Keuleers & Brysbaert, 2010). We found that the RTRBM produced very high-quality pseudowords, confirming that it correctly learned the orthographic structure of English monosyllables. In this regard, it is worth noting that our results should readily extend to other alphabetic languages.
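To make the prediction task concrete, the n-gram baseline can be sketched as a letter trigram model that, given the initial part of a word, returns a probability distribution over the following letter. This is a minimal sketch under our own assumptions (add-one smoothing, "#" as a word-boundary pad), not the exact configuration used in the experiments.

```python
from collections import defaultdict

class TrigramModel:
    """Letter trigram model with add-one smoothing; '#' pads word boundaries."""

    def __init__(self, alphabet="abcdefghijklmnopqrstuvwxyz#"):
        self.alphabet = alphabet
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, words):
        for w in words:
            s = "##" + w + "#"  # pad so every letter has a 2-letter context
            for i in range(2, len(s)):
                self.counts[s[i - 2:i]][s[i]] += 1

    def next_letter_dist(self, context):
        """P(next letter | last two letters of the given word-initial context)."""
        ctx = ("##" + context)[-2:]
        c = self.counts[ctx]
        total = sum(c.values()) + len(self.alphabet)  # add-one smoothing
        return {ch: (c[ch] + 1) / total for ch in self.alphabet}

m = TrigramModel()
m.train(["cat", "can", "car"])
dist = m.next_letter_dist("ca")  # letters seen after "ca" get higher probability
```

The prediction performance of such a model can then be scored against held-out words, exactly as in the comparison described above.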

In order to allow autonomous generation of sequences with the SRN, which is inherently deterministic and input-driven (i.e., bottom-up), we had to extend its basic formulation. We therefore presented a stochastic variant of the SRN, in which the activations of the output units are first normalized so that they can be treated as a conditional probability distribution. An external stochastic process is then used to sample the next element of the sequence, which is fed back as input to the network at the following timestep. Interestingly, the pattern of results obtained with the RTRBM was very similar to that obtained with the stochastic variant of the SRN. On the one hand, this is not surprising, because both connectionist models try to predict the next element of a sequence by learning conditional probabilities from the training data; indeed, there is a tight formal relationship between probabilistic graphical models and recursive neural networks (Baldi & Rosen-Zvi, 2005). However, the two...
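The sampling loop just described can be sketched as follows. This is a minimal illustration of the normalize-sample-feedback scheme for an Elman-style network with assumed weight matrices (W_xh, W_hh, W_hy are hypothetical placeholders for trained parameters, and softmax stands in for the output normalization); it is not the paper's actual implementation.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz#"  # assumed alphabet; '#' ends a word

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def generate(W_xh, W_hh, W_hy, max_len=12, rng=None):
    """Autonomously generate a letter string from a stochastic SRN.

    At each timestep the output activations are normalized (softmax) and
    treated as P(next letter | history); an external stochastic process
    draws a letter from that distribution, which is fed back as the
    one-hot input at the following timestep."""
    rng = rng or np.random.default_rng()
    h = np.zeros(W_hh.shape[0])
    x = np.zeros(len(ALPHABET))  # empty start-of-word input
    out = []
    for _ in range(max_len):
        h = np.tanh(W_xh @ x + W_hh @ h)    # recurrent hidden update
        p = softmax(W_hy @ h)               # conditional distribution
        k = rng.choice(len(ALPHABET), p=p)  # external stochastic sampling
        if ALPHABET[k] == "#":
            break
        out.append(ALPHABET[k])
        x = np.zeros(len(ALPHABET))
        x[k] = 1.0                          # feedback as next input
    return "".join(out)
```

With trained weights, repeated calls yield pseudowords whose well-formedness can be scored with the indicators used in the comparison above.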
