
This presentation introduces the University of Pittsburgh English Language Institute Corpus (PELIC; Juffs et al., 2020), a publicly available 4.2-million-word learner corpus of written texts. Collected over seven years in the University of Pittsburgh’s Intensive English Program, these texts were produced by more than 1,100 students with diverse linguistic backgrounds and proficiency levels. Unlike most learner corpora, which are cross-sectional, PELIC is longitudinal, offering greater opportunities for tracking development in a natural classroom setting. This potential is illustrated in an overview of the research conducted to date with these data. The presentation also provides a description of PELIC’s creation and contents, including how the texts have been managed to facilitate natural language processing. Overall, the corpus contributes to the field of learner corpus research by adding to the pool of freely and publicly available learner corpora, supplemented by a useful set of Python tools and tutorials for accessing these data.


Presenters:

Alan Juffs is a Professor in the Department of Linguistics at the University of Pittsburgh. He was the Director of the English Language Institute at the University of Pittsburgh from 1998 to 2020. His research interests include the semantics-syntax interface, second language sentence processing, and corpus linguistics. In addition to more theoretical aspects of second language acquisition, he conducts classroom research in English as a Second Language vocabulary teaching and materials development.

Na-Rae Han is a Senior Lecturer in Linguistics and Director of the Robert Henderson Media Center at the University of Pittsburgh. Her research interests include computational linguistics, corpus linguistics, and NLP (natural language processing) methods for educational assessment and instruction. She has also worked in the following areas: computational morphology, computational semantics/pragmatics, corpus construction and analysis, and computational stylistics and authorship attribution.

Ben Naismith is a fifth-year Linguistics PhD candidate at the University of Pittsburgh and has been working in the field of English Language Teaching for nearly 20 years. His research interests include lexis, teacher pedagogy, second language acquisition, and corpus linguistics. As part of the University of Pittsburgh English Language Institute (ELI) Data Mining Group, he investigates aspects of lexical development by applying computational methods to the Pitt ELI Corpus (PELIC).

This event is hosted by Digital Scholarship Services.

Date: Friday, March 25, 2022
Time: 1:00pm - 2:00pm
Campus: Pittsburgh
Categories: Coding and Computational Methods, Dealing with Data, Digital Scholarship Workshop/Presentation, Online / Webinar
Registration has closed.

Event Organizer

Gesina Phillips