Academic publications

The following academic publications arose from research carried out in the FIRST project:

Year 3:

  • Dornescu, I., Evans, R. and Orasan, C. (2014). 'Relative clause extraction for syntactic simplification'. In Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014), Dublin, Ireland, pp.1-10 (Full text)

 Abstract: This paper investigates non-destructive simplification, a type of syntactic text simplification which focuses on extracting embedded clauses from structurally complex sentences and rephrasing them without affecting their original meaning. This process reduces the average sentence length and complexity to make text simpler. Although relevant for human readers with low reading skills or language disabilities, the process has direct applications in NLP. In this paper we analyse the extraction of relative clauses through a tagging approach. A dataset covering three genres was manually annotated and used to develop and compare several approaches for automatically detecting appositions and non-restrictive relative clauses. The best results are obtained by an ML model developed using crfsuite, followed by a rule-based method.

  • Evans, R., Dornescu, I., and Orasan, C. (2014). ‘An evaluation of syntactic simplification rules for people with autism’. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR). Gothenburg, Sweden, pp. 131-140 (Full text)

 Abstract: Syntactically complex sentences constitute an obstacle for some people with Autistic Spectrum Disorders. This paper evaluates a set of simplification rules specifically designed for tackling complex and compound sentences. In total, 127 different rules were developed for the rewriting of complex sentences and 56 for the rewriting of compound sentences. The evaluation assessed the accuracy of these rules individually and revealed that fully automatic conversion of these sentences into a more accessible form is not very reliable.

  • González-Navarro, Ana; Freire-Prudencio, Sandra; Gil, David; Martos-Pérez, Juan; Jordanova, Vesna; Cerga-Pashoja, Arlinda; Shishkova, Antoneta & Evans, Richard (2014), ‘FIRST: una herramienta para facilitar la comprensión lectora en el trastorno del espectro autista de alto funcionamiento’, Revista Neurología, 58, Suplemento 1: XVI Curso Internacional de Actualización en Neuropediatría y Neuropsicología Infantil, Spain (PDF)

 Abstract: This article presents the work developed by a multidisciplinary team under the framework of a project funded by the European Union. It is an explanatory document intended to justify the needs of the population with high-functioning ASD in relation to accessing written information. The project is developing a tool (Open Book) designed not only for people with ASD, but with people with ASD.

  • Martín Valdivia, María-Teresa; Martínez Cámara, Eugenio; Barbu, Eduard L; Ureña López, Alfonso; Moreda, Paloma; Lloret, Elena (2014). ‘Proyecto FIRST (Flexible Interactive Reading Support Tool): Desarrollo de una herramienta para ayudar a personas con autismo mediante la simplificación de textos’, Procesamiento de Lenguaje Natural, 53, pp.143-146 (Full text)

 Abstract: This article presents the process of developing the Open Book tool within the multidisciplinary FIRST Project. It introduces Autism Spectrum Disorder (ASD), a disorder that prevents proper development of cognitive, social and communication skills, and the difficulties encountered by people with ASD in relation to reading comprehension. It then introduces the FIRST project, a European project aimed at developing a multilingual tool called Open Book that uses Human Language Technologies to identify barriers to reading comprehension, and explains its purpose: helping carers and people with autism to transform written documents into a simpler format by removing the obstacles identified in the text.

  • Pavlov, Nikolay (2014), ‘User Interface for People with Autism Spectrum Disorders’, Journal of Software Engineering and Applications, 7, pp.128-134. (Full text)

 Abstract: This paper describes the requirements for building an accessible user interface for users with autism spectrum disorders (ASD) and presents the user interface (UI) of Open Book, a reading assistive tool for people with ASD. The requirements are extracted both from existing research on improving reading comprehension for people with ASD and from the feedback of users and clinical professionals. The findings are applied in practice to create the user interface of the Open Book tool. Key screens of the user interface are presented. It is implied that the features of the UI for people with ASD can be successfully applied to improving the overall accessibility of any graphical user interface.

  • Štajner, S., Evans, R. and Dornescu, I. (2014). ‘Assessing Conformance of Manually Simplified Corpora with User Requirements: the Case of Autistic Readers’. In Proceedings of the COLING workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA), Dublin, Ireland, 24 August 2014, pp. 53-63. (Full text)

 Abstract: In the state of the art, there are scarce resources available to support development and evaluation of automatic text simplification (TS) systems for specific target populations. These comprise parallel corpora consisting of texts in their original form and in a form that is more accessible for different categories of target reader, including neurotypical second language learners and young readers. In this paper, we investigate the potential to exploit resources developed for such readers to support the development of a text simplification system for use by people with autistic spectrum disorders (ASD). We analysed four corpora in terms of nineteen linguistic features which pose obstacles to reading comprehension for people with ASD. The results indicate that the Britannica TS parallel corpus (aimed at young readers) and the Weekly Reader TS parallel corpus (aimed at second language learners) may be suitable for training a TS system to assist people with ASD. Two sets of classification experiments intended to discriminate between original and simplified texts according to the nineteen features lent further support for those findings.

  • Štajner, S., Mitkov, R. and Corpas Pastor, G. (2014). 'Simple or not simple? A readability question'. To appear in N. Gala, R. Rapp, and G. Bel-Enguix (eds), Text, Speech and Language Technology: Recent Advances in Language Production, Cognition and the Lexicon, 48, Springer, pp.379-398. (Abstract)

 Abstract: Text Simplification (TS) has taken off as an important Natural Language Processing (NLP) application which promises to offer a significant societal impact in that it can be employed to the benefit of users with limited language comprehension skills such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question we are faced with is how to automatically evaluate their performance given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted in a way that allows them to be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus has been compiled as part of the Simplext project targeting people with Down syndrome, and the second corpus as part of the FIRST project, where the users are people with autism spectrum disorder. The experiments show that there is a significant correlation between each of the readability indices and eighteen linguistically motivated features which might be seen as reading obstacles for various target populations, thus indicating the possibility of using those indices as a measure of the degree of simplification achieved by TS systems. Various ways they can be used in TS are further illustrated by comparing their values when applied to four different corpora.

  • Štajner, S., Mitkov, R. and Saggion, H. (2014). ‘One Step Closer to Automatic Evaluation of Text Simplification Systems’. In Proceedings of the EACL workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Gothenburg, Sweden, 27 April, pp. 1-10. (Full text)

 Abstract: This study explores the possibility of replacing the costly and time-consuming human evaluation of the grammaticality and meaning preservation of the output of text simplification (TS) systems with some automatic measures. The focus is on six widely used machine translation (MT) evaluation metrics and their correlation with human judgements of grammaticality and meaning preservation in text snippets. As the results show a significant correlation between them, we go further and try to classify simplified sentences into: (1) those which are acceptable; (2) those which need minimal post-editing; and (3) those which should be discarded. The preliminary results, reported in this paper, are promising.


Year 2:

  • Barbu, E.; Martín-Valdivia, M. & Ureña-López, L. A. (2013), ‘Open Book: a tool for helping ASD users’ semantic comprehension’, Proceedings of the 2nd Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Atlanta, United States, 14 June, pp. 11-19. (Full text)

 Abstract: In the FIRST project we are building a multilingual tool called Open Book that helps people with autism to better understand texts. The tool applies a series of automatic transformations to users’ documents to identify and remove obstacles to reading comprehension. In this paper, we focus on three semantic components: an image component that retrieves images for the concepts in the text, an idiom detection component and a topic model component.

  • Dornescu, I.; Evans, R. & Orasan, C. (2013), ‘A Tagging Approach to Identify Complex Constituents for Text Simplification’, Proceedings of Recent Advances in Natural Language Processing (RANLP 2013), Hissar, Bulgaria, pp. 221-229. (Full text)

 Abstract: The occurrence of syntactic phenomena such as coordination and subordination is characteristic of long, complex sentences. Text simplification systems need to detect and categorise constituents in order to generate simpler sentences. These constituents are typically bounded or linked by signs of syntactic complexity, which include conjunctions, complementisers, wh-words, and punctuation marks. This paper proposes a supervised tagging approach to classify these signs in accordance with their linking and bounding functions. The performance of the approach is evaluated both intrinsically, using an annotated corpus covering three different genres, and extrinsically, by evaluating the impact of classification errors on an automatic text simplification system. The results are encouraging.

  • Drndarevic, B.; Štajner, S.; Bott, S.; Bautista, S. & Saggion, H. (2013), ‘Automatic Text Simplification in Spanish: A Comparative Evaluation of Complementing Modules’, Lecture Notes in Computer Science: Computational Linguistics and Intelligent Text Processing (Part II) (Proceedings of the ‘CICLing 2013: The 14th International Conference on Intelligent Text Processing and Computational Linguistics’, Samos, Greece, 24-30 March), 7817, Springer. (More information)

 Abstract: In this paper we present two components of an automatic text simplification system for Spanish, aimed at making news articles more accessible to readers with cognitive disabilities. Our system in its current state consists of a rule-based lexical transformation component and a module for syntactic simplification. We evaluate the two components separately and as a whole, with a view to determining the level of simplification and the preservation of meaning and grammaticality. In order to test the readability level pre- and post-simplification, we apply seven readability measures for Spanish to three sets of randomly chosen news articles: the original texts, the output obtained after lexical transformations, the syntactic simplification output, and the output of both system components. To test whether the simplification output is grammatically correct and semantically adequate, we ask human annotators to grade pairs of original and simplified sentences according to these two criteria. Our results suggest that both components of our system produce simpler output when compared to the original, and that grammaticality and meaning preservation are positively rated by the annotators.

  • Evans, R. & Orasan, C. (2013), ‘Annotating signs of syntactic complexity to support sentence simplification’, in I. Habernal & V. Matousek (eds.), Text, Speech and Dialogue. Proceedings of the 16th International Conference TSD 2013, Plzen, Czech Republic, Springer. pp. 92-104. (Abstract)

 Abstract: This article presents a new annotation scheme for syntactic complexity in text. Its advantages over existing syntactic annotation schemes are that it is easy to apply, is reliable, and is able to encode a wide range of phenomena. It is based on the notion that the syntactic complexity of sentences is explicitly indicated by signs such as conjunctions, complementisers and punctuation marks. The article describes the annotation scheme developed to annotate these signs and evaluates three corpora containing texts from three genres that were annotated using it. Inter-annotator agreement calculated on the three corpora shows that there is at least “substantial agreement” and motivates directions for future work.

  • Glavaš G. & Štajner S. (2013), ‘Event-Centered Simplification of News Stories’, Proceedings of the Student Research Workshop at the International Conference on Recent Advances in Natural Language Processing (RANLP 2013), Hissar, Bulgaria, 9-11 September, pp.71-78. (Full text)

 Abstract: Newswire text is often linguistically complex and stylistically decorated, hence very difficult to comprehend for people with reading disabilities. Acknowledging that events represent the most important information in news, we propose an event-centered approach to news simplification. Our method relies on robust extraction of factual events and elimination of surplus information which is not part of event mentions. Experimental results obtained by combining automated readability measures with human evaluation of correctness justify the proposed event-centered approach to text simplification.

  • Ianakiev, Y. (2013), ‘Overcoming obstacles to reading comprehension in people with autistic spectrum disorders’, Reading and Dyslexia (Proceedings of the International Conference on Reading and Dyslexia), Logopedic Centre ‘Romel’ PH, Albena, Bulgaria, 27 May - 2 June, pp.61-79. (More information / Programme).

 Abstract: Children with Autism Spectrum Disorders (ASD) have difficulties in understanding written text. These difficulties present a significant barrier to the participation and inclusion of people with autism in all aspects of society, including education, employment, health care and social activities. The software Open Book aims to reduce these barriers by providing a tool that empowers people with autism to read a broad range of documents without assistance.

  • Ianakiev, Y. (2013), Reading Comprehension: Information and Communication Technologies for Social Inclusion of People with Autism, Plovdiv University Publishing House, Plovdiv, p.204. (More information)

 Abstract: The book begins with a theoretical overview of the different levels of processing in reading comprehension. This is followed by a detailed look at the psychological approaches related to the specificity of language comprehension among autistic children and adults. Emphasis is placed on the interplay between theory, empirical research and practical application that is relevant to the development of software products such as Open Book, designed to adapt written documents into a format that is easier to understand.

  • Niculae, V. & Yaneva, V. (2013), ‘Computational considerations of comparisons and similes’, Proceedings of the ACL Student Research Workshop, Sofia, Bulgaria, August 4-9, pp. 89-95. (Full text)

 Abstract: This paper presents work in progress towards automatic recognition and classification of comparisons and similes. Among possible applications, we discuss the place of this task in text simplification for readers with Autism Spectrum Disorders (ASD), who are known to have deficits in comprehending figurative language. We propose an approach to comparison recognition through the use of syntactic patterns. Keeping in mind the requirements of autistic readers, we discuss the properties relevant for distinguishing semantic criteria like figurativeness and abstractness.

  • Orasan, C.; Evans, R. & Dornescu, I. (2013), ‘Text Simplification for People with Autistic Spectrum Disorders’, in D. Tufis; V. Rus; & C. Forascu (eds.), Towards Multilingual Europe 2020: A Romanian Perspective, Romanian Academy Publishing House, Bucharest, pp. 287-312.

 Abstract: People affected by autism spectrum disorders usually have language deficits which limit their ability to comprehend speech and written text. This is usually caused by the presence in text of linguistic phenomena such as long and syntactically complex sentences, figurative language including metaphor and idioms, semantically ambiguous words and phrases, and technical/specialised words. The FIRST project’s main objective is to implement, deploy and evaluate NLP technologies to support the authoring of accessible content in Bulgarian, English and Spanish. Two experiments aimed at determining the needs of people with ASD confirmed that syntactic simplification transformations are needed to make texts more accessible. In addition to presenting the FIRST project and the experiments which determined the users’ requirements, this paper presents a syntactic simplification method which combines a machine learning approach with manually created rules.

  • Štajner, S.; Drndarevic, B. & Saggion, H. (2013), ‘Corpus-based Sentence Deletion and Split Decisions for Spanish Text Simplification’, Proceedings of CICLing 2013: The 14th International Conference on Intelligent Text Processing and Computational Linguistics, Samos, Greece, 24-30 March.

 Abstract: This study addresses the automatic simplification of texts in Spanish in order to make them more accessible to people with cognitive disabilities. A corpus analysis of original and manually simplified news articles was undertaken in order to identify and quantify relevant operations to be implemented in a text simplification system. The articles were further compared at sentence and text level by means of automatic feature extraction and various machine learning classification algorithms, using three different groups of features (POS frequencies, syntactic information, and text complexity measures) with the aim of identifying features that help separate original documents from their simple equivalents. Finally, it was investigated whether these features can be used to decide upon simplification operations to be carried out at the sentence level (split, delete, and reduce). Automatic classification of original sentences into those to be kept and those to be eliminated outperformed the classification that was previously conducted on the same corpus. Kept sentences were further classified into those to be split or significantly reduced in length and those to be left largely unchanged, with the overall F-measure up to 0.92. Both experiments were conducted and compared on two different sets of features: all features and the best subset returned by an attribute selection algorithm.

  • Štajner, S. & Saggion, H. (2013), ‘Adapting Text Simplification Decisions to Different Text Genres and Target Users’, Procesamiento del Lenguaje Natural, 51, pp. 135-142. (Full text)

 Abstract: We investigate sentence deletion and split decisions in Spanish text simplification for two different corpora aimed at different groups of users. We analyse sentence transformations in two parallel corpora of original and manually simplified texts for two different types of users and then conduct two classification experiments -- classifying between those sentences to be deleted and those to be kept; and classifying between sentences to be split and those to be left unsplit. Both experiments were first run on each of the two corpora separately and then run by using one corpus for the training and the other for testing. The results indicated that both sentence decision systems could be successfully trained on one corpus and then used for a different text genre in a text simplification system aimed at a different target population.

  • Vodolazova, T.; Lloret, E.; Muñoz, R. & Palomar, M. (2013), ‘Extractive Text Summarization: Can We Use the Same Techniques for Any Text?’, Natural Language Processing and Information Systems: Proceedings of the 18th International Conference on Applications of Natural Language to Information Systems (NLDB 2013), Salford, United Kingdom, 19-21 June, pp.164-175. (Full text)

 Abstract: In this paper we analyze whether the performance of a text summarization method depends on the topic of a document, and how certain linguistic properties of a text may affect the performance of automatic text summarization methods. These issues are tackled with the aim of furthering the research in the FIRST project, which focuses on the automatic summarisation of text to help people with ASD to understand and identify the relevant aspects of a given text.

Year 1:

  • Ianakiev, Youri (2012), 'Creation of communication networks through the theory of hierarchical small worlds', Scientific Days, May, University ed. of Veliko Tarnovo University.

Abstract: This article applies ‘small world theory’ (Frigyes Karinthy, Stanley Milgram) to the development of innovative tools for visualisation and analysis of real networks (e.g. social, communication and cognitive networks). The research aims to build a stronger knowledge base to inform the development of tools to assist people with autism to understand the world around them more clearly.

  • Ianakiev, Youri (2013), 'High technology for autism: creating specialized software to facilitate reading comprehension through conversion and adaptation of the text' in Research Reports, University ed. Paisiy Hilendarski, Plovdiv.

Abstract: The Flexible Interactive Reading Support Tool (FIRST) project is developing a tool called 'Open Book' to assist people with autism spectrum disorders to adapt written documents into a format that is easier for them to read and understand. The paper presents the goals and objectives of the project, the organisation of activities in work packages and partners, focusing on results from the first year of its implementation.

  • Ianakiev, Youri (2013), Communication and space knowledge. University ed. Paisiy Hilendarski, Plovdiv.

Abstract: The book presents a theoretical overview of the concept of 'communicative space' in the humanities and social sciences. One of the main applications of theories for organisation and visualisation of the cognitive space is oriented towards implementation of methods for simplification of the text for people with autism. 

  • Mosquera, Alejandro; Lloret, Elena & Moreda, Paloma (2012), 'Towards Facilitating the Accessibility of Web 2.0 Texts through Text Normalization Resources', Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey (PDF).

Abstract: Web 2.0 allows users to freely write content on the internet. However, the non-standard features of the language can make social media content less accessible than traditional texts. This paper proposes TENOR, a multilingual lexical approach for normalising web 2.0 texts in English or Spanish.

  • Štajner, Sanja; Evans, Richard; Orasan, Constantin & Mitkov, Ruslan (2012), 'What can readability measures really tell us about text complexity?', Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey (PDF).

Abstract: This paper investigates new methods to assess the complexity of documents in three genres and compares their performance with existing readability metrics. The assessment of text complexity, and the system's response to this assessment, is an important aspect of the functionality supported by the software to be developed in FIRST.

  • Yaneva, Victoria (2012), 'Psycholinguistic aspects of reading comprehension process in children with ASD', .

Abstract: This thesis presents the way children with autism interpret various aspects of language. Two main categories are highlighted - literal meaning and figurative speech. We discuss children with ASD's comprehension of pronouns, hyperbole, understatement, irony, indirect and rhetorical questions, and metonymy. Metaphor is emphasised as an initial cognitive tool.