A dataset containing 1,155 five-minute conversations among 441 speakers of American English, created in 1997 and tagged with a shallow discourse tagset of approximately 60 basic dialog act tags (DAMSL) and their combinations.

Usage

data("swda")

Format

A data frame with 223,606 observations on the following 16 variables.

doc_id

ID for each conversation document

topic_num

Topic number associated with the conversation

topicality

Annotator's subjective rating of whether the callers generally conversed about the topic suggested by the recorded prompt. Scale of 1 to 5, with 1 being the most on topic.

naturalness

Annotator's subjective rating of whether the conversation sounded natural. Scale of 1 to 5, with 1 being the most natural.

damsl_tag

DAMSL dialog act annotation labels

speaker

Label for each speaker in the conversation

turn_num

Number of contiguous utterance turns for a given speaker

utterance_num

The cumulative number of utterances in the conversation

utterance_text

The actual dialog utterance. Includes disfluency annotation (see Details below).

speaker_id

ID for each speaker

sex

The biological sex of the speaker

birth_year

Year that the speaker was born

dialect_area

Region of the US where the speaker spent their first 10 years

education

Highest educational level attained: values 0, 1, 2, 3, and 9

topic

Topic description

topic_prompt

Specific topic prompt for the conversation

Details

More information on the metadata in this dataset can be found here: https://catalog.ldc.upenn.edu/docs/LDC97S62/swb1_manual.txt.

The SWBD-DAMSL manual can be found here: https://web.stanford.edu/~jurafsky/ws97/manual.august1.html.

The Dysfluency Annotation Stylebook for the Switchboard Corpus can be found here: https://staff.fnwi.uva.nl/r.fernandezrovira/teaching/DM-materials/DFL-book.pdf.
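Because utterance_text keeps the disfluency markup inline, it may be useful to strip the markup symbols before further text processing. The following is only a minimal sketch: it assumes markers of the form `{F ... }`, `[ ... + ... ]`, and a trailing `/`, the helper name `strip_markup` is hypothetical, and the example string is invented rather than taken from the dataset. Consult the stylebook for the full set of markers.

```r
# Hypothetical helper: removes assumed disfluency markup symbols but keeps
# the words themselves (including the repeated material inside restarts).
strip_markup <- function(x) {
  x <- gsub("\\{[A-Z] ", "", x)      # drop openers such as "{F " or "{D "
  x <- gsub("[][{}+/#]", "", x)      # drop leftover brackets, braces, "+", "/", "#"
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace
  trimws(x)
}

# Invented example utterance, not a row from swda
cleaned <- strip_markup("{F Uh, } I think [ it's, + it's ] fine. /")
```

With the real data, the same function could be applied to the utterance_text column, for example as `strip_markup(swda$utterance_text)`.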

Source

Switchboard-1 Release 2 https://catalog.ldc.upenn.edu/docs/LDC97S62/

References

Godfrey, John J., and Edward Holliman. Switchboard-1 Release 2 LDC97S62. Web Download. Philadelphia: Linguistic Data Consortium, 1993.

Jurafsky, Daniel, Elizabeth Shriberg, and Debra Biasca. 1997. "Switchboard SWBD-DAMSL Shallow-Discourse-Function Annotation Coders Manual, Draft 13." University of Colorado, Boulder Institute of Cognitive Science Technical Report 97-02.

Meteer, Marie, and Ann Taylor. 1995. Dysfluency Annotation Stylebook for the Switchboard Corpus.

Examples

data(swda)
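Once the data is loaded, typical first steps are tabulating the dialog act tags and counting utterances per conversation. The sketch below uses a small toy data frame so that it runs on its own: the column names match the documented variables, but every value in it is invented for illustration.

```r
# Toy stand-in for swda; the values are invented, only the column names
# follow the documented format.
toy <- data.frame(
  doc_id        = c("a", "a", "a", "b", "b"),
  speaker       = c("A", "B", "A", "A", "B"),
  damsl_tag     = c("sd", "b", "sd", "qy", "ny"),
  utterance_num = c(1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Frequency of each dialog act tag, most common first
tag_counts <- sort(table(toy$damsl_tag), decreasing = TRUE)

# Number of utterances in each conversation
utterances_per_doc <- tapply(toy$utterance_num, toy$doc_id, length)
```

With the real data, the same calls would be applied to swda in place of toy.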