Computerized Profiling

Sample Archive


The CP archive contains four types of files:

  1. Orthographic transcript files contain verbatim renderings of the language produced by a speaker, along with notations about the context (e.g., the people present or toys available to a child) in which the language occurred.
  2. Phonetic transcript files contain a set of words produced by a speaker, transcribed in a version of the International Phonetic Alphabet modified for computer entry and display.
  3. Sound files contain digital recordings of the speakers transcribed in the orthographic and phonetic files.
  4. Analysis files contain sets of codes that mark utterance constituents at different linguistic levels. For example, LARSP (Crystal, Fletcher, & Garman, 1989) analysis files contain codes that mark syntactic constituents such as subject, direct object, personal pronoun, and contracted negative. PRISM-L (Crystal, 1982) analysis files code each lexeme in a sample as a member of a semantic field such as MAN>GROUP or TOOL>ACTION.
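
As an illustration only, the four file types above can be modeled as a small catalog. The Python sketch below is not part of CP: the class names, the field names, and the reading of ">" as a separator between two levels of a PRISM-L semantic field code are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    """The four kinds of files described in the archive listing."""
    ORTHOGRAPHIC = "orthographic transcript"
    PHONETIC = "phonetic transcript"
    SOUND = "sound"
    ANALYSIS = "analysis"


@dataclass(frozen=True)
class ArchiveFile:
    name: str            # hypothetical file name, not a real CP naming scheme
    speaker: str         # hypothetical speaker label
    file_type: FileType


def group_by_type(files):
    """Return a dict mapping each FileType to the names of its files."""
    grouped = defaultdict(list)
    for f in files:
        grouped[f.file_type].append(f.name)
    return dict(grouped)


def split_semantic_field(code):
    """Split a code such as 'MAN>GROUP' at the first '>'.

    Assumption: '>' separates two levels of the semantic field label.
    """
    major, _, minor = code.partition(">")
    return major, minor
```

Grouping a list of ArchiveFile records with group_by_type gives one list of file names per file type, and split_semantic_field("MAN>GROUP") separates the two parts of the label.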

The archive is compiled from transcripts, recordings, and analyses contributed by CP users around the world.

These files are available for download by all visitors to the Web site. Files are cross-indexed by


