Identification of Languages in Linked Data: A Diachronic-Diatopic Case Study of French

Sabine Tittel; Frances Gillis-Webber

1. Sabine Tittel: Heidelberg Academy of Sciences and Humanities, Stadtkreis Heidelberg, Germany
2. Frances Gillis-Webber: University of Cape Town, City of Cape Town, South Africa
Published in: Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal / Iztok Kosem (ed.), Tanara Zingano Kuhn (ed.), Margarita Correia (ed.), José Pedro Ferreira (ed.), Maarten Jansen (ed.), Isabel Pereira (ed.), Jelena Kallas (ed.), Miloš Jakubíček (ed.), Simon Krek (ed.), Carole Tiberius (ed.), 2019, pp. 547-569
Language: Spanish
Abstract
- When modelling linguistic resources as Linked Data, the identification of languages using language tags and language codes is a mandatory task. IETF's BCP 47 defines the standard for tags, and ISO 639 provides the codes. However, these codes are insufficient for identifying diatopic variation within a language and for distinguishing different historical language stages. This weakness hampers the accurate identification of data, which in turn leads to ambiguity when this data is extended, aggregated and re-used, a key notion of Linked Open Data and the Semantic Web. We show the limitations of language identification with a case study of French linguistic data from both a diachronic and a diatopic perspective. Our exemplary data derives from dictionaries of Old French, Middle French, and Modern French dialects, and from a Modern French linguistic atlas. For each exemplar, we propose a solution using the privateuse sub-tag of BCP 47's language tag, staying within the boundaries of existing standards. Using a predefined pattern for the privateuse sub-tag, the solutions allow a dialect or patois to be defined and identified in combination with a time period. This can lead to shared agreement on language tags, which will increase interoperability within the context of Linked Data.
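
To make the idea of a privateuse sub-tag concrete, the following is a minimal Python sketch of how such a tag could be assembled and checked against the BCP 47 grammar, in which a private-use sequence is `x` followed by one or more sub-tags of 1 to 8 alphanumeric characters. The abstract does not spell out the authors' predefined pattern, so the function name `with_private_use` and the sub-tags `lorraine` and `1250` are hypothetical illustrations, not the pattern proposed in the paper. The language check is deliberately simplified to 2 or 3 letter codes such as `fr`, `frm` (Middle French) and `fro` (Old French).

```python
import re

# BCP 47 (RFC 5646): privateuse = "x" 1*("-" (1*8alphanum))
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")
# Simplification (assumption): accept only 2 or 3 letter primary language subtags.
_LANGUAGE = re.compile(r"[A-Za-z]{2,3}")


def with_private_use(language: str, *subtags: str) -> str:
    """Append a private-use sequence ("x-...") to a primary language subtag."""
    if not _LANGUAGE.fullmatch(language):
        raise ValueError(f"unsupported language subtag: {language!r}")
    if not subtags:
        raise ValueError("at least one private-use subtag is required")
    for subtag in subtags:
        if not _PRIVATE_SUBTAG.fullmatch(subtag):
            raise ValueError(f"invalid private-use subtag: {subtag!r}")
    # BCP 47 tags are case-insensitive; lowercase is used here for consistency.
    return "-".join([language, "x", *subtags]).lower()


# Hypothetical sub-tags: a dialect area and a date, attached to Old French.
tag = with_private_use("fro", "lorraine", "1250")
print(tag)  # fro-x-lorraine-1250
```

Because everything after `x` is private by definition, such a tag remains well-formed for any BCP 47 parser, but its meaning is only shared among parties who agree on the pattern, which is why the abstract stresses a predefined pattern and shared agreement.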