Research Institute for Linguistics HAS

  Research Group for Language Technology



Hungarian Kindergarten Language Corpus (HUKILC)

The Hungarian Kindergarten Language Corpus (HUKILC) was compiled primarily for studies of variation in child language. It contains 62 interviews with kindergarten children aged 4.5 to 5.5 years from Budapest, recorded in spring 2012. Each interview is 20–30 minutes long and consists of various story-telling tasks as well as guided and free conversation. The story-telling tasks range from describing picture stories told to the child beforehand to describing well-known games.

The corpus consists of 39,000 utterances comprising about 140,000 words. Because of the interview format, the researcher's speech accounts for about half of the corpus and contains many phatic expressions.

The interviews were conducted in two types of kindergarten that differ in socio-economic status (SES): kindergartens where the parents belong to a higher SES and kindergartens where the parents belong to a lower SES. The children are therefore divided into four groups by SES and sex: higher-SES males (hm), higher-SES females (hf), lower-SES males (lm) and lower-SES females (lf). The corpus is also a useful resource for other fields of child language research.

The recordings were transcribed following the CHAT guidelines of CHILDES.
Funding: CESAR project
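In a CHAT transcript, each utterance sits on a main-tier line that begins with `*` followed by a speaker code and a colon. Header lines begin with `@`, dependent tiers begin with `%`, and a long utterance continues on lines that start with a tab. The Python below is a minimal sketch under those conventions. It counts utterances per speaker, which can be used to check how much of a transcript is researcher speech. It assumes the standard CHILDES codes `CHI` (child) and `INV` (investigator); the codes actually used in HUKILC are not stated here. The sample transcript is invented for illustration and is not taken from the corpus. The functions `count_utterances` and `speaker_share` are hypothetical helpers, not part of any CHILDES tool.

```python
from collections import Counter


def count_utterances(chat_text: str) -> Counter:
    """Count main-tier utterances per speaker code in CHAT-formatted text."""
    counts: Counter = Counter()
    for line in chat_text.splitlines():
        # Main tiers look like "*CHI:\t...". Headers ("@"), dependent tiers ("%")
        # and tab-indented continuation lines are not counted as new utterances.
        if line.startswith("*") and ":" in line:
            speaker = line[1:line.index(":")]
            counts[speaker] += 1
    return counts


def speaker_share(counts: Counter, speaker: str) -> float:
    """Fraction of all counted utterances produced by one speaker."""
    total = sum(counts.values())
    return counts[speaker] / total if total else 0.0


# Invented sample; speaker codes CHI/INV are assumed, not confirmed for HUKILC.
sample = "\n".join([
    "@Begin",
    "@Participants:\tCHI Target_Child, INV Investigator",
    "*INV:\tmit csinál a kutya ?",
    "*CHI:\tfut .",
    "%com:\tlaughs",
    "*INV:\taha .",
    "*CHI:\tés aztán elbújik",
    "\ta fa mögé .",
    "@End",
])

counts = count_utterances(sample)
print(counts)
print(speaker_share(counts, "INV"))
```

In this toy sample the investigator produces two of the four utterances. On real transcripts, established CLAN programs are the more robust choice, because they handle the full CHAT specification.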