Document Summarization

  • Introduction

    • I worked on this project as part of a team during my tenure as a Software Engineer at Mphasis NEXT Labs.
    • The project aimed to summarize grammatically well-formed English documents by building a queryable directed graph based on semantics and context (i.e., events and actions). Queries on the graph retrieve information about an action and its effect, or about an entity and its role. We applied various NLP and text-mining methods to build the graph and stored it in Neo4j for effective information retrieval.
    • The project used the R and Python programming languages; the CoreNLP package for tokenization, POS tagging, and finding co-references among sentences; the Senna framework for semantic role labelling; Semafor for frame-semantic parsing; and Neo4j for graph querying.
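
To illustrate the kind of graph described above, here is a minimal in-memory sketch in Python. It is not the project's implementation: the class name `SemanticGraph`, the role labels `agent` and `patient`, and the two example sentences are all assumptions made for illustration, and the real pipeline stored its graph in Neo4j rather than in a dictionary.

```python
from collections import defaultdict


class SemanticGraph:
    """Directed graph: each edge runs from an action (verb) to an entity
    and is labelled with the role the entity plays in that action."""

    def __init__(self):
        # action -> list of (role, entity) pairs; hypothetical layout
        self.edges = defaultdict(list)

    def add(self, action, role, entity):
        self.edges[action].append((role, entity))

    def participants(self, action):
        """Query by action: which entities take part, and in which role?"""
        return dict(self.edges.get(action, []))

    def roles_of(self, entity):
        """Query by entity: in which actions does it appear, and as what?"""
        return sorted(
            (action, role)
            for action, pairs in self.edges.items()
            for role, name in pairs
            if name == entity
        )


# Hypothetical input: "The storm damaged the bridge. Engineers repaired the bridge."
graph = SemanticGraph()
graph.add("damage", "agent", "storm")
graph.add("damage", "patient", "bridge")
graph.add("repair", "agent", "engineers")
graph.add("repair", "patient", "bridge")

print(graph.participants("damage"))
print(graph.roles_of("bridge"))
```

The two query methods mirror the two kinds of question mentioned above: one starts from an action and asks what it affected, the other starts from an entity and asks what roles it played.
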
  • My Role

    • I implemented a co-reference relationship builder using the CoreNLP package (which internally uses the Stanford CoreNLP framework) to replace pronouns with the nouns they refer to.
    • I translated the relationships between prominent nouns and verbs, retrieved during document pre-processing, into a Neo4j graph.
    • (Team of two) We created a GUI in R Shiny for a prototype demonstration.
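
The first two tasks above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code used in the project: the pronoun-to-noun mapping is supplied by hand here (in the project it came from CoreNLP's co-reference output), and the helper names `resolve_pronouns` and `to_cypher`, the `Entity` label, and the `name` property are hypothetical. The sketch only builds a Cypher query string and its parameters; it does not connect to a Neo4j server.

```python
import re


def resolve_pronouns(sentence, antecedents):
    """Replace each pronoun in `antecedents` (pronoun -> noun phrase)
    with its noun phrase. The mapping is assumed to be given."""
    if not antecedents:
        return sentence
    pattern = r"\b(" + "|".join(map(re.escape, antecedents)) + r")\b"

    def swap(match):
        word = match.group(0)
        return antecedents.get(word.lower(), word)

    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)


def to_cypher(subject, verb, obj):
    """Build a parameterised Cypher MERGE for one noun-verb-noun relation.
    Relationship types cannot be passed as parameters in Cypher, so the
    verb is sanitised and embedded in the query text directly."""
    rel = re.sub(r"\W+", "_", verb.strip()).upper()
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{rel}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


resolved = resolve_pronouns(
    "The bridge collapsed after it was damaged.", {"it": "the bridge"}
)
query, params = to_cypher("engineers", "repaired", "bridge")
print(resolved)
print(query, params)
```

Using `MERGE` rather than `CREATE` means a noun that appears in several sentences maps to a single node, which is what lets a query collect every action an entity takes part in.
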