    Script Acquisition: A Crowdsourcing and Text mining approach

    Publication Date
    2019
    Author
    WANZARE, Lilian Diana Awuor
    Abstract/Overview
    According to Grice’s (1975) theory of pragmatics, people tend to omit basic information when participating in a conversation (or writing a narrative), on the assumption that the omitted details are already known to the hearer (or reader) or can be inferred from common-sense knowledge. Writing and understanding texts draws on a specific kind of common-sense knowledge, referred to as script knowledge. Schank and Abelson (1977) proposed scripts as a model of human knowledge, represented in memory, that stores frequent habitual activities, called scenarios (e.g. eating in a fast food restaurant), and the different courses of action in those routines. This thesis addresses measures to provide a sound empirical basis for high-quality script models. We work on three key areas related to script modeling: script knowledge acquisition, script induction, and script identification in text. We extend the existing repository of script knowledge bases in two ways. First, we crowdsource a corpus of 40 scenarios with 100 event sequence descriptions (ESDs) each, thus going beyond the size of previous script collections. Second, the corpus is enriched with partial alignments of ESDs, done by human annotators. The crowdsourced partial alignments are used as prior knowledge to guide the semi-supervised script-induction algorithm proposed in this dissertation. We further present a semi-supervised clustering approach that induces script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets and inducing their temporal order. The proposed semi-supervised clustering model better handles order variation in scripts and extends the script representation formalism, Temporal Script Graphs, by incorporating "arbitrary order" equivalence classes to allow for the flexible event order inherent in scripts.
    In the third part of this dissertation, we introduce the task of scenario detection, in which we identify references to scripts in narrative texts. We curate a benchmark dataset of annotated narrative texts, with segments labeled according to the scripts they instantiate. The dataset is the first of its kind. The analysis of the annotation shows that one can identify scenario references in text with reasonable reliability. Subsequently, we propose a benchmark model that automatically segments and identifies text fragments referring to given scenarios. The proposed model achieved promising results and therefore opens up research on script parsing and wide-coverage script acquisition.
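As a rough illustration of the script-induction idea summarized in the abstract (grouping event descriptions into paraphrase sets and ordering them in time), the sketch below may help. It is not the thesis's algorithm: it only merges event descriptions that annotators have explicitly linked (a plain union-find over must-link pairs) and orders the resulting sets by their mean normalized position. It performs no clustering of unlinked events and does not model "arbitrary order" equivalence classes. The function name `induce_script`, the toy ESDs, and the `must_link` pairs are all hypothetical.

```python
from collections import defaultdict


def induce_script(esds, must_link):
    """Group event descriptions into paraphrase sets and order the sets.

    esds: list of event sequence descriptions, each a list of strings.
    must_link: pairs of (esd_index, event_index) mentions that annotators
        aligned as paraphrases (hypothetical input format).
    """
    # Each event mention starts in its own set (union-find forest).
    parent = {(i, j): (i, j) for i, esd in enumerate(esds) for j in range(len(esd))}

    def find(x):
        # Follow parent pointers to the set representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge mentions that the partial alignments mark as paraphrases.
    for a, b in must_link:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for mention in parent:
        clusters[find(mention)].append(mention)

    def mean_position(members):
        # Position of a mention, scaled to [0, 1] within its own ESD.
        return sum(j / max(len(esds[i]) - 1, 1) for i, j in members) / len(members)

    ordered = sorted(clusters.values(), key=mean_position)
    return [[esds[i][j] for i, j in sorted(members)] for members in ordered]


# Toy data, invented for illustration only.
esds = [
    ["enter restaurant", "order food", "pay", "eat"],
    ["walk in", "order meal", "eat meal", "leave"],
]
must_link = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 3), (1, 2))]

script = induce_script(esds, must_link)
for paraphrase_set in script:
    print(paraphrase_set)
```

Ordering by mean position is a deliberately crude stand-in for temporal-order induction; it forces a single linear order, which is exactly the limitation that flexible event order in scripts runs up against.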
    Permalink
    https://repository.maseno.ac.ke/handle/123456789/5395
    Collections
    • School of Computing & Informatics [5]

    Maseno University. All rights reserved | Copyright © 2022 