Methods in Dialectology 14

«DIALECT AND HERITAGE LANGUAGE CORPORA FOR THE GOOGLE GENERATION»

Isabelle Buchstaller, Karen Corrigan, Adam Mearns and Hermann Moisl.

Scope of Session:

The increasingly widespread use of Information Technology (IT) in most spheres of human activity since the mid-20th century has facilitated, and continues to facilitate, the generation of digital natural language text, audio and graphics on a huge scale. Research in the Arts, Humanities and Social Sciences has benefitted from this in that the volume of such materials available for study has greatly increased. Moreover, other fields like Statistics, Information Retrieval and Data Mining have provided computational tools for the analysis and interpretation of data abstracted from these resources. A major aspect of the impact of IT and allied subject areas on research in language variation and change across time and space has been the creation of innovative public and private corpora via digitisation of legacy materials or the synthesis of these with new ones. After several decades of activity, the volume of such collections worldwide is very large and shows no sign of diminishing. As the number of collections has grown, a variety of conceptual, technical and ethical issues concerning the preservation and re-use of such resources by academic and non-academic audiences have arisen.

This workshop focuses on four of them: (i) Given that such collections are typically generated in academic environments for academic use and resourced by public funding bodies, how can one ensure their longevity so that the financial and human investment in them is not wasted, i.e. how can they be made sustainable? (ii) What kinds of metadata do these corpora require so that the resources are used accountably by future generations of scholars? (iii) How can the potential of such collections be extended beyond Higher Education to schools, museums and the general public, i.e. how can they achieve even greater social impact? (iv) How can users be helped to navigate the access and usage policies of different public and private corpora, and how can these policies be administered efficiently?

Panel of Speakers:

These questions will be addressed in this workshop by a panel of speakers who have created a wide range of diverse digital corpora relevant to the preservation and analysis of dialect and heritage language materials. The workshop will be coordinated by Karen Corrigan of Newcastle University, UK and will include a presentation on the new Diachronic Electronic Corpus of Tyneside English by her and her team (Isabelle Buchstaller, Adam Mearns and Hermann Moisl) as well as inputs from:

1. Joan Beal (University of Sheffield, UK) on historical phonological corpora –
- 'Explaining the Present: Why dialectologists need a historical corpus of English phonology'.

2. Isabelle Buchstaller, Karen Corrigan, Adam Mearns and Hermann Moisl (Newcastle University, UK) on the Linguistic Time-Capsule for the Google Generation project, http://research.ncl.ac.uk/decte –
- 'The Diachronic Electronic Corpus of Tyneside English: Issues of preservation and public engagement'.

3. Jenny Cheshire and Sue Fox (Queen Mary, University of London, UK) on the project Linguistic Innovators: the English of Adolescents in London, http://www.lancs.ac.uk/fss/projects/linguistics/innovators/index.htm –
- 'From Sociolinguistic Research to English Language Teaching'.

4. Sandra Clarke (Memorial University of Newfoundland, Canada) on the DANL Corpus, http://www.mun.ca/linguistics/research/language/danl.php –
- 'Adapting Legacy Regional Language Materials to an Interactive Online Format: The Dialect Atlas of Newfoundland and Labrador (DANL) project'.

5. Tyler Kendall (University of Oregon, US) on the SLAAP Corpus, http://ncslaap.lib.ncsu.edu –
- 'Beyond research alone: Considering sociolinguistic archives as "public" resources'.

6. Naomi Nagy (University of Toronto, Canada), on the Heritage Language Variation and Change project, http://individual.utoronto.ca/ngn/research/heritage_lgs.htm –
- 'Heritage Language Variation and Change: Corpus construction and use'.

7. Sali Tagliamonte (University of Toronto, Canada) on the Directions of Change, York English, Roots, Toronto English and Kids' Corpora, http://individual.utoronto.ca/tagliamonte/Sociolinguistic_Lab.html –
- 'Can I use your corpus? The joys and perils of building, analyzing and sharing corpora'.

Arrangements for Abstract Review:

Abstracts for each of the presentations will be reviewed by at least two members of the panel of speakers.

Contact