

Transcription rules

 

Codes

Brackets

 


12#12

The uncertainty of the transcriber

 


13#13

Missing elements

 
14#14

Pauses

 
15#15

Hesitation

 
16#16

Non-conforming suffix

 
17#17

Hypercorrect ik verb form

 
18#18

-suk/-sük, -szuk/-szük

 
19#19
N.B. The codes <s> (for -suk/-sük) and <z> (for -szuk/-szük) must be followed by an explanation. The codes <suk> and <zuk> unambiguously stand for hypercorrect uses of -suk/-sük and -szuk/-szük respectively.

-nák

 
20#20

-e interrogative particle

 
21#21

-ba/-be, -ban/-ben

 
22#22

l-, t-, d- deletion

 
23#23

The shortening of phonologically long l, t, d is usually not transcribed, i.e. kelett is recorded in its standard form kellett, and nőtem as nőttem. However, if the shortening results in a form that belongs to another lexeme, it is recorded in the shortened form and followed by an explanation, e.g. halom <=hallom>.

Consonant clusters

 
24#24

Overlapping speech

 

Overlapping speech is transcribed within asterisks. The speech of the speaker who was speaking when the overlap began is transcribed up to the end of the overlap, with an asterisk marking both the beginning and the end of the overlap. The overlapping speech of the intervening speaker follows underneath, also bounded by asterisks. If the second speaker takes over, his/her speech is transcribed continuously after the asterisk that terminates the overlap. If the overlap is followed by the speech of the first speaker, a new line is opened with the code of the speaker (a or t), followed by : if the first speaker paused or by > if s/he carried on without a pause, e.g.


a:  jók vo<:><l>tak a do<:><l>gozatok. 25#25 Sza<l> ezér<t>,

a   *ezér<t> volt*

t:  *Igen*.

a>  nála kü<l>önösen *furcsa az, hogy*

t:  *Igen, 25#25 igen*.

The * can also be used word-internally, where it must be placed at a syllable boundary, e.g.


t:  25#25 és ezzz 25#25 nem volt megfelelő? 25#25 Rosszul esett, 25#25vagy

t   25#25 *nem tartotta megfelelőnek*?

a:  *Ez most 25#25 a munkám*mal kapcsolatosan van, ugye?

If a word is broken up by overlapping speech and then continued, both the end of the first fragment and the beginning of the second are marked with =, e.g.


t:  25#25 *nem tartotta megfelelő*=

a:  *Hát nem csak az*

t>  =nek?
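The asterisk convention can be processed mechanically. The following is a minimal sketch, not part of the original BSI tooling: it assumes that asterisks always come in pairs within a single line of text, and the function names are hypothetical.

```python
import re
from typing import List


def overlap_segments(text: str) -> List[str]:
    """Return the stretches of speech enclosed in paired asterisks."""
    # Assumption: every opening * has its closing * in the same text line.
    return re.findall(r"\*([^*]*)\*", text)


def without_overlap_marks(text: str) -> str:
    """Drop the asterisks, rejoining words split by a word-internal *."""
    return text.replace("*", "")
```

Because a word-internal asterisk sits at a syllable boundary, removing the marks restores the full word, while `overlap_segments` yields only the part of the utterance that was spoken simultaneously with the other speaker.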

Slips, self-corrections, false starts

 


26#26

Response giving suffix only

 
27#27

Quotation

 
28#28

Extralinguistic remarks

 
29#29

Foreign words

 
30#30

Lengthened variant consonants

  They are standardized and not coded.
31#31

Instructions

Codes within words

  The following codes can occur inside words: 25#25, (), *

Long pauses

  At the beginning and end of long pauses or noises, the tape counter setting must be recorded in [ ]; see A.1.18.

Pronunciation variants: vowels

 

- In tests: transcribed
- In conversations: standardized, including short/long variants,
  e.g. lakóság 32#32 lakosság

  Exceptions:
  - special words (see dictionary)
  - compensatory lengthening (see [*], A.1.20)
  - e/ö variants, e.g. fel - föl
  - the történetibe-type.

Pronunciation variants: consonants

 

The following phenomena are standardized:

- shortening (see A.1.12)

- deletion (except l, t, d deletion, see A.1.12)

- lengthening (except ss, see A.1.20)

Compensatory lengthening following vowel shortening (e.g. szöllő, hüttő) is not recorded.

Dialect speech

  Distinctly dialectal features (such as diphthongisation) should be recorded in the general profile of the informant. BSI transcripts only monitor e/ö usage.

25#25ö25#25ö25#25ö

  25#25ö25#25ö25#25ö is recorded as many times as the informant utters it, but continuous hesitation is transcribed as ööö (see A.1.5).

Syllable deletion

  Deletion of one syllable is phonetically transcribed and then explained,

e.g. szöveki <=szövetkezeti>
but: szövetkeeti 32#32 szövetkezeti (standardized and not explained).

BSI version 3 transcripts will transcribe not only the deletion of whole syllables but also vowel deletion (including the concomitant deletion of any neighbouring consonant(s)), e.g. tulanképpen <=tulajdonképpen>.
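Explanations of the form recorded form <=standard form> lend themselves to automatic extraction. The sketch below is illustrative only: the regular expression is an assumption about how such explanations are laid out (a single recorded word, optional spaces, then the bracketed standard form), and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed layout: one recorded word, optional spaces, then "<=standard form>".
EXPLANATION = re.compile(r"(\S+)\s*<=\s*([^<>]+?)\s*>")


def find_explanations(text: str) -> List[Tuple[str, str]]:
    """Return (recorded form, standard form) pairs found in a text line."""
    return [(m.group(1), m.group(2)) for m in EXPLANATION.finditer(text)]
```

A pair list of this kind could be used, for instance, to count how often each non-standard form occurs, without re-reading the running text.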

Pauses

  keret25#25tet, but: keret ö -tet (pauses can be marked inside words; hesitation ö must be marked separately). See A.1.12 for how silence should be recorded.

Items to be standardized

1. Pronunciation variants (except A.2.3, A.2.4)
2. close ë (except in card-based test data)

Items not to be transcribed or standardized

1. every -ja, -je possessive suffix, e.g. ablaka-ablakja, farka-farkja
2. Mistakes in agreement

Dictionary (not to be standardised or explained)

- ovoda, bölcsöde, kőrút, pósta, öntöde,

- mit tom én, asszem-asziszem, aszondja

- szal-szoal-sza-szoval

- kommonista, Ejrópa, inekció, Sofiane-Sofiané

- spré, sztressz

- gyün

- viszonlag

-

- oszt <=aztán>

- mert <=miért> and derived forms (mer, me, miér, mié).

Form conventions of transcribed text

 

Division of the transcription

Each conversation module forms a separate unit of text. Each unit has an identifier and a tape counter setting.

The identifier consists of 8 characters: the first five are the ID of the informant and the remaining three are the three-letter code of the conversation module, e.g. B7307bio.
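As an illustration, an identifier of this shape can be split into its two parts as follows. This is a minimal sketch: the function name is hypothetical, and only the overall length is checked, since the text does not specify which characters are allowed.

```python
from typing import Tuple


def split_identifier(identifier: str) -> Tuple[str, str]:
    """Split an 8-character unit identifier into (informant ID, module code)."""
    if len(identifier) != 8:
        raise ValueError("a unit identifier must be exactly 8 characters long")
    # Characters 1-5: informant ID; characters 6-8: conversation module code.
    return identifier[:5], identifier[5:]
```

For the example above, `split_identifier("B7307bio")` separates the informant ID `B7307` from the module code `bio`.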

Important formal conventions:

The format of text lines

Each line has 80 characters, divided into the following fixed-format fields:


 
Figure 5.5: The menu system and format of the reading passages
Columns   Content
1 - 5     identifier of the informant
6 - 8     identifier of the conversational unit
10 - 13   line number within the conversation module (CM)
15        identifier of current speaker[*]
16        continuity marker[*]
17 - 72   text
74 - 79   location on tape
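The column layout above can be read with plain string slicing. The following is a sketch under stated assumptions, not the program used in the project: the class and function names are hypothetical, short lines are padded with spaces to 80 characters, and the unlisted columns (9, 14, 73, 80) are treated as unused separators.

```python
from dataclasses import dataclass


@dataclass
class TranscriptLine:
    informant: str      # columns 1-5
    module: str         # columns 6-8
    line_number: str    # columns 10-13
    speaker: str        # column 15
    continuity: str     # column 16 (empty when the turn simply continues)
    text: str           # columns 17-72
    tape_location: str  # columns 74-79


def parse_line(raw: str) -> TranscriptLine:
    """Slice one fixed-format line into its fields (1-based columns above)."""
    padded = raw.rstrip("\n").ljust(80)  # assumption: pad short lines
    return TranscriptLine(
        informant=padded[0:5],
        module=padded[5:8],
        line_number=padded[9:13].strip(),
        speaker=padded[14].strip(),
        continuity=padded[15].strip(),
        text=padded[16:72].rstrip(),
        tape_location=padded[73:79].strip(),
    )
```

Fields are kept as strings because the text does not say how line numbers or tape counter settings are formatted; converting them to integers would be a further assumption.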

Figure A.1 illustrates the above conventions. Transcribers were instructed to carefully observe the following points:

The body of transcribed text occupies character positions 17 - 72. The program breaks the lines automatically, so <ENTER> should only be used to insert empty lines to set off text units from each other.

Character position 16 is filled in only at the beginning of each turn. If the turn extends over several lines, this position remains empty, meaning that there was no change of speaker.

Turns must not be separated with empty lines.

Transcribers only need to fill in the speaker and the continuity positions on the left margin. The identifier and the line numbers are supplied automatically. The tape counter setting should be recorded at roughly 2-minute intervals.


  
Figure A.1: A sample page of transcription
33#33


Tamás Váradi
12/26/1997