Newsletter 12

December 1997



CONTENTS


Dutch, English and English L2 business language in contrast: working with the ACID corpus

What is ACID?

A few years ago, the Centre for Applied Linguistics (ICTL) of the University of Antwerp started to set up a computer-coded database of institutional discourse, which was named the Antwerp Corpus of Institutional Discourse (ACID). Its main aim, formulated at the time, was "the study of pragmatic aspects of spoken and written discourse in an institutional context, both from a native speaker and a non-native speaker perspective" (Geluykens & Van Rillaer 1995:79). The long-term goal of the database is to collect all sorts of data, both written and spoken, and in various languages. For the time being, however, the type of data that have been collected in some quantity is business letters in native Dutch, native English, and Dutch-English interlanguage (i.e. English produced by native speakers of Dutch). "Business" should be interpreted broadly and refers to the fact that all these letters were written as a part of people's work in or with organisations (commercial, non-profit and academic).

The coding system that was used for the database is a revised version of the CHAT/CLAN coding system first developed by MacWhinney (1991), originally for spoken child discourse, but in effect also adaptable to fairly "interactive" forms of written language. The system allows for the coding of both formal and functional aspects, and is very flexible in that features can be deleted and added as the need arises. Most importantly, it allows for quantitative analyses of large samples. A systematic description of the coding labels that were used in ACID is given in Geluykens & Van Rillaer (1995).

With this system, three samples of 100 business letters for each language (in all 300 letters) have been coded so far, and this is presently being expanded to double that size.[1] The three (parallel) subsets of the corpus that are fully coded therefore consist of 100 Dutch letters written by native speakers of (mostly Flemish) Dutch; 100 English letters written by native speakers of (mostly British) English; and 100 English letters written by native speakers of Dutch. The database thus not only allows for contrastive research in English and Dutch, but also for comparative research of both native and interlanguage data. The database was indeed conceived to make this latter kind of research possible.

What is coded in the data?

In this section I will give a broad overview of the most important types of information that are coded for the data. A more systematic and detailed description can be found in Geluykens & Van Rillaer (1995). A first group of coding labels is used for situational information. In so far as possible, writer and addressee are coded in terms of the kind of organisation they act on behalf of and their language background in relation to the language used in the letter. Independently of the specific interpretation of the text, the (approximate) power relationship between writer and addressee is also coded, on three levels: the writer is more, equally or less powerful than the addressee. Further, the letters are coded in terms of their position in a larger correspondence (e.g. initiation; invites a reply), in terms of their domain or communicative act (e.g. business offer), and in terms of their geographical origin and destination (many different labels are possible here, including a label for organisation-internal letters).

A second group of codes is used for text-internal forms and functions, the most important of which relate to politeness phenomena. Indeed, the kind of pragmatic analysis first envisaged and carried out concerns the realisation of face-threatening acts (FTAs, cf. Brown & Levinson 1987) in the three sets of data. Each individual utterance in the letter thus first received a code specifying the identity of the FTA it realised or helped to realise (e.g. request). The labels for the individual FTAs were based on Brown & Levinson's (1987) categories, but had to be adapted and extended. Moreover, the sequential environment of the labelled utterances was taken into account, in order to avoid a purely local interpretation. Second, each utterance with an FTA label received a further label for the specific politeness strategy or strategies used in that utterance, based on the classification developed by Brown & Levinson (1987) (e.g. for a request, negative politeness strategy 1: be conventionally indirect). For some utterances, there is also a so-called 'conventional strategic position' label, which is especially relevant for linguistic material outside the body of the text (e.g. salutation).

It should be noted that at present the number of labels used for classes of FTA, and even more so for politeness strategies, is huge, while the assignment of labels to utterances is often problematic, as was noted by Geluykens & Van Rillaer (1995: 88; 90). Future adaptations of the coding system will therefore have to move towards fewer labels. Coupled with an expansion of the database itself, this will allow clearer distinctions to be made and will make the quantitative results more significant.

A third group of codes involves syntactic and semantic information about the utterances in the letter. Here the labels were chosen with an eye to their potential use for the analysis of politeness phenomena, and no attempt was made to be exhaustive. Here too, codes may be changed or added if this proves necessary. At present, the main clause of each utterance is coded for features such as mood, voice, type of subject and type of main verb. Where an utterance contains subordinate clauses, the type of subordinate clause is specified. Finally, in view of the relevance of modal auxiliaries for the analysis of FTA strategies, utterances are also coded for the presence of these auxiliary verbs (e.g. "mdlD" for an utterance that contains a deontic modal).
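To make the kind of quantitative analysis that such coding allows more concrete, the following is a minimal sketch in Python. It is not the CHAT/CLAN format or the actual ACID label set: the class, the field names, the subcorpus tags ("EN", "NL", "IL") and the toy records are all hypothetical, and only the label "mdlD" is taken from the description above. The sketch assumes that each coded utterance carries a subcorpus tag, an FTA label and a list of modal codes, and it computes the share of utterances per subcorpus and FTA that contain at least one modal.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class CodedUtterance:
    """One coded utterance (hypothetical structure, not the ACID format)."""
    subcorpus: str  # hypothetical tags: "EN", "NL", "IL" (interlanguage)
    fta: str        # FTA label, e.g. "request"
    modal_codes: List[str] = field(default_factory=list)  # e.g. ["mdlD"]


def modal_rate_by_fta(
    utterances: Iterable[CodedUtterance],
) -> Dict[Tuple[str, str], float]:
    """Share of utterances with at least one modal code, per (subcorpus, FTA)."""
    totals: Counter = Counter()
    with_modal: Counter = Counter()
    for utt in utterances:
        key = (utt.subcorpus, utt.fta)
        totals[key] += 1
        if utt.modal_codes:
            with_modal[key] += 1
    return {key: with_modal[key] / totals[key] for key in totals}


# Invented toy records, for illustration only; these are not ACID data.
sample = [
    CodedUtterance("EN", "request", ["mdlD"]),
    CodedUtterance("EN", "request"),
    CodedUtterance("NL", "request"),
    CodedUtterance("IL", "warning", ["mdlD"]),
]

rates = modal_rate_by_fta(sample)
for (subcorpus, fta), rate in sorted(rates.items()):
    print(f"{subcorpus:2} {fta:8} {rate:.2f}")
```

Keeping the situational, functional and formal codes as separate fields on one record is one way of allowing any functional label to be cross-tabulated with any formal one. Further fields (mood, voice, politeness strategy) could be added in the same manner.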

ACID-related work

In this section I will mention a few lines of investigation for which the ACID database has proved useful, and which have resulted in publications. This is obviously not an exhaustive discussion of the research that has preceded and come out of the set-up of this database, but rather a very partial selection aimed at illustrating some of the possibilities this database creates.

As pointed out above, ACID was first of all aimed at the analysis of politeness phenomena and the comparison of English and Dutch native vs. Dutch-English non-native strategies in the data. Geluykens (to appear) investigates politeness strategies in native and non-native letters with reference to one particular kind of FTA, viz. requests. He found that, on the whole, requests in native English letters show a lower frequency of bare imperatives, a higher frequency of imperatives with "please", a more abundant use of modal auxiliaries, an infrequent use of request performatives and a relatively high frequency of off-record strategies.[2] Dutch requests, conversely, have more bare imperatives, fewer modals, more performatives and fewer off-record strategies. He also found that Dutch-English interlanguage users have their own set of politeness strategies, which are not straightforwardly similar to either the English or the Dutch trends. In some respects, the interlanguage data are closer to the Dutch data, e.g. in terms of the infrequent use of off-record politeness strategies. In other respects, however, they come closer to the native English data, e.g. in terms of the modals that are used.

In another study about politeness strategies in non-native letters, Braecke, Cuyckens, Geluykens & Jacobs (1997) investigate the use of one particular lexico-grammatical device, viz. modal auxiliaries, in four different FTAs in the three parallel corpora. The FTAs involved are announcements, offers, requests and warnings (in ascending order of face-threatening force). It was found that on the whole and for all four FTAs together, native English writers use more modals than native Dutch writers, while interlanguage users occupy a position in between. It is also clear that for the native data generally, a rising force of face-threat is accompanied by a rising frequency of modals. Here the interlanguage data are sometimes closer to native English and sometimes closer to native Dutch, and for offers they show a break in the pattern. If the data are viewed in terms of figures per modal, however, it can be observed that as face-threat increases, transfer seems to increase too. In other words, for relatively weak FTAs like announcements, there was great overlap between native English and Dutch-English usage. For warnings, at the other end of the face-threat scale, substantial differences were noted. Whereas native English writers prefer mitigating modals (would, should, could), non-native English writers use modals that are far more directly restrictive (have to, shall), which is similar to what native Dutch writers use in these FTAs (moeten, zullen).

Whereas the previous studies started from functional entities coded in the database and correlated those with formal properties of the data in question, another study (Pelsmaekers & Braecke, to appear) approaches the data the other way round, in a more traditionally contrastive fashion. It investigates the discourse-functional properties of utterances in which one particular lexico-grammatical form appears, viz. can in the native English letters and kunnen in the native Dutch corpus. For this purpose, the coded part of the ACID corpus was extended with another 50 native English and native Dutch letters. It was found that although the British letters have more modal auxiliaries, Dutch finite and non-past kunnen is used far more frequently than English can. In addition, when can appears, it does so more in speech functions that are formulated from a writer perspective (with a first-person subject), while kunnen is used far more with a second-person subject, i.e. in speech functions formulated from a reader perspective. Kunnen and can help to realise six different interactive functions with some frequency. Whereas kunnen typically helps to convey (positive terms of) offers and good news, can appears in all six functions, to a lesser extent, without being very dominant in any. It does, however, frequently appear in hedges and stock phrases, which often fall outside the main business of the letter or the main force of the utterance in which they appear. In two frequent uses of kunnen with second-person subjects, can does not appear at all. These functions are, roughly, details of offers (as in: met uw kredietkaart kunt u helemaal vrij over uw reserve beschikken, 'with your credit card you can? freely dispose of your reserve funds') and procedural directives (as in: u kunt inschrijven aan de hand van het bijgevoegd formulier, 'you can? register by means of the enclosed registration form').

The studies outlined above do not only have descriptive relevance, but may also be valuable from a pedagogical point of view. They point to areas in ESP teaching which are likely to need fine-tuning for native speakers of Dutch. Without wanting to rashly impose British English usage as the absolute norm for non-native or "outer-circle" English usage [3], teachers of Dutch and Flemish EFL students may use findings such as those described above to increase their students' awareness of potential politeness problems in the perception of their prospective British audiences. Studies like these can therefore help advanced L2 learners to make more conscious choices about the kind of style they want to adopt in their business letters.

A few more studies could be mentioned here that were also devised with an eye to EFL teaching, but that make more marginal use of the ACID database, in that they considered phenomena other than politeness or interpersonal strategies. These phenomena were not always explicitly coded, but were nonetheless easily retrievable. Pelsmaekers, Braecke & Geluykens (to appear) analyse the use of clause linking devices in both English L2 student writing and the non-native English letters of the database, in order to identify clause linking strategies that Dutch-speaking users of English employ in their writing, some of which may be more acceptable in Dutch than in native English. Building on the same data, Braecke, Geluykens & Pelsmaekers (1997) discuss the important role that the positioning of subclauses can play in achieving greater cohesion in English L2 writing, which is again somewhat less relevant for native Dutch writing.

Conclusion and prospects

It is clear that in its present form, the ACID database has already provided us with a great deal of information that is not easily available in other corpora. Other, and even much larger, corpora of professional letters exist (see e.g. Pilegaard 1997, Yli-Jokipii 1996) and are used to study politeness, but they are usually less directly relevant for Dutch-English interlanguage phenomena. In the future, the ACID database may even increase its significance through a further extension of its input and a well-considered simplification of its coding system in certain tiers.

Notes

1. Although the funding for the research project that initially set up the database at the University of Antwerp was discontinued, several people have continued to work either with or on the database (see also list of references). Most notable among them is , now professor at the Englisches Seminar of the University of Münster, Germany.
2. When a writer uses an off-record strategy, s/he does not literally perform the FTA. Although this notion is not without problems, it is usually taken to mean that the utterance is indirect but not conventionally so, so that the reader has the real option of ignoring the utterance as an FTA.
3. See e.g. Kachru (1995) for a discussion of the desirability that users of English who do not belong to native British, American or Australian speech communities, or the "inner circle" of users, adapt their usage to the norms of that "inner circle".

References

  • Braecke, C., H. Cuyckens, R. Geluykens & G. Jacobs (1997). The use of modal auxiliaries in non-native communicative style. In M. Pütz (ed.), 67-84.
  • Braecke, C., R. Geluykens & K. Pelsmaekers (1997). Clause ordering as a text-building device in written L2. In M. Pütz (ed.), 35-52.
  • Brown, P. & S. Levinson (1987). Politeness: Some universals in language usage (Studies in interactional sociolinguistics, 4). Cambridge: C.U.P.
  • Geluykens, R. (to appear). Requests and politeness in native vs. non-native business correspondence.
  • Geluykens, R. & G. Van Rillaer (1995). Introducing ACID: The Antwerp Corpus of Institutional Discourse. In K. Pelsmaekers & R. Geluykens (eds.), Analysing Institutional Discourse (Interface 10/1), 79-98.
  • Kachru, Y. (1995). Contrastive rhetoric in World Englishes. English Today 41, 11/1: 21-31.
  • MacWhinney, B. (1991). The CHILDES project: Tools for analyzing talk. Hillsdale, N.J.: Erlbaum.
  • Pelsmaekers, K. & C. Braecke (to appear). The interactive functions of can and kunnen in British and Flemish business letters.
  • Pelsmaekers, K., C. Braecke & R. Geluykens (to appear). Rhetorical Relations and Subordination in L2 Writing.
  • Pilegaard, M. (1997). Politeness in written discourse: a textlinguistic perspective on requests. Journal of Pragmatics 28: 223-244.
  • Pütz, M. (ed.) (1997). The Cultural Context in Foreign Language Teaching. Frankfurt am Main: Peter Lang. (= Duisburger Arbeiten zur Sprach- und Kulturwissenschaft, 32)
  • Yli-Jokipii, H. (1996). An approach to contrasting languages and cultures in the corporate context: Finnish, British and American business letters and telefax messages. Multilingua 15, 3: 305-327.



Book notice

Bart Defrancq

Fuchs, C. and S. Robert (eds.) (1997) Diversité des langues et représentations cognitives. Gap: Ophrys. 288 pp. ISBN: 2-7080-850-1.

This book is a collection of papers read at an international round-table conference held in Paris, which gathered several renowned specialists around a series of fundamental questions on the relationship between language(s) and other fields of human cognition. The idea behind the conference was to offer a two-way approach to the relationship between language and cognition. A first set of papers (Systèmes linguistiques et représentations construites) illustrates how linguistic diversity provides interesting though puzzling data on the way in which the human mind considers the world. The second set (Opérations linguistiques et processus cognitifs) analyses the interaction between language, cognitive processes and brain activity. It is especially the last of these aspects that makes this collection innovative, from a linguistic point of view at least.

The first part of the collection is indeed rather "classic" in that a majority of the papers deal with a topic that has enjoyed a lot of success in cognitive science, namely the linguistic expression of spatial relations. The already well-documented linguistic variety is confirmed here by papers on Maya Mopan (E. Danziger) and Austronesian languages (F. Ozanne-Rivière). Other papers include at least some information on spatial relations in Ancient Greek (H. Seiler), Chinese (M.-C. Paris) and sign language (Ch. Cuxac). Yet only the paper by Danziger provides an interesting, independent cognitive test, viz. perception, for its hypotheses on the distribution of universal and language-specific procedures in the description of images. The other articles have more modest ambitions and concentrate on cross-linguistic description, in some cases complemented by a diachronic perspective (F. Ozanne-Rivière, for instance). In one case, though, there seem to be serious problems with the interpretation of the data: the contrastive paper on Chinese and French by M.-C. Paris contains at least two conclusions that do not seem to fit the data provided. First, in an identical context of complex NPs, the complete absence of a preposition with a first person form and its occurrence in 11% of the cases with a second person form are called "une distribution (presque) semblable". Second, Chinese is said to make no distinction between direct and indirect imperatives, although the former forbids the use of a verb form which is allowed by the latter.

The diachronic aspect of linguistic variety has also inspired the papers by A. Culioli and Ch. Marchello-Nizia. Culioli focuses on the development of polysemy and on the grammaticalization of lexical items in different languages, and Marchello-Nizia analyses the breakthrough of a new demonstrative paradigm in French during the 12th-16th centuries. The comparison of different stages in the evolution of a language is indeed, as she says, another way of studying linguistic variety. The first part of the collection is completed by an original paper on drum signals, which emphasises the importance of standard formulae in a sonically limited code.

A more dynamic picture of cognition is drawn in part two of the volume, which starts with another "classic" in cognitive linguistics: metaphors. In the opening paper G. Lakoff provides an impressive collection of pieces of evidence in favour of a universalist thesis on their formation: data from English, Hopi, Chinese and Japanese illustrate that very similar processes are at work, which partly contradicts earlier investigations, in particular those by Whorf on Hopi. For G. Fauconnier metaphors are only one aspect of what he calls conceptual integration ("intégration conceptuelle"), a concept which allows him to account for certain syntactic phenomena in French (cliticization of the agent or beneficiary).

A short article by J. Lassègue on possible cognitive causes of linguistic variety and evolution leads us into the world of psycho-experimental and brain-oriented research. M. Kail gives an overview of research on comprehension in an attempt to introduce a hierarchy in the linguistic parameters the subject uses to interpret the role of the different parts of an utterance. This research covers a considerable number of languages, which perhaps explains why the synoptic table on p. 215 contains at least one mistake: case is alleged to be the most important factor for agent assignment in Dutch ('Hollandais'), which is simply not true, especially if case is understood in its morphological sense, as appears to be the situation here.

Kail's overview of research on comprehension is nicely complemented by two papers by J.-L. Nespoulous and B. Pachoud on language production, considered from a pathological point of view. Nespoulous shows how research on aphasic phenomena in different languages calls a particular traditional classification of them into question, in that the concrete manifestations of the linguistic deficiency are highly dependent upon the properties of the language involved. Pachoud, on the other hand, analyses the linguistic behaviour of schizophrenic patients and arrives at the conclusion that their linguistic deficiency parallels their motor disability to the extent that they are unable to keep control of an initiated action or utterance. The linguistic disability of these patients is therefore mainly of a pragmatic nature and, unlike aphasia, probably not dependent upon language-specific properties. Finally, M. Besson and M. Kutas present us with a few interesting ideas about the relation between registered brain activity and language perception, based on the measurement of electric brain reactions to errors. They conclude that reactions to syntactic errors appear to parallel reactions to errors in musical performances, which seems to cast at least some doubt on the autonomy of linguistic competence in the brain and opens up new vistas of research.

My overall impression of this volume is that it offers a well-balanced overview of different approaches to the relationship between language, cognition and the brain, and that the papers present rich, solid data that will encourage further research in a still relatively unexplored area of the human sciences.

 




To the table of contents of other CONTRAGRAM issues