Newsletter 6

No 6
June - 1996




Types of information in the CVVD

Filip Devos

One of the major steps in the methodology used in compiling CVVD lemmata is the formal analysis of verb complementation patterns. Different types of information are distinguished: distributional, categorial, relational, lexical and transformational information. In addition to these formal types of information, the lemmata contain information in their semantic organization, in the examples, in the frequency data and in indications of stylistic and geographical variation (Devos, 1996). In this contribution I will briefly describe the main types of structural information and the semantic information given in the CVVD.

1. Distributional information

For each verb, the description gives the distribution or syntagmatic potential of the verb, i.e. it describes the valency of the verb or the number of "Mitspieler". This is mainly indicated typographically:

het___         = avalent use
NP___          = monovalent use
NP___X         = bivalent use
NP___X (Y)     = bivalent use with optional complement  
NP___X Y       = trivalent use
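The valency notation above is regular enough to be read mechanically. The following minimal Python sketch is not part of the CVVD itself; it counts the slots in a pattern under stated assumptions: each complement is written as a single whitespace-separated label (X, Y), parentheses mark an optional complement, and "het" before the bar is an empty subject that is not counted as a Mitspieler. The function names are hypothetical.

```python
LABELS = {0: "avalent", 1: "monovalent", 2: "bivalent", 3: "trivalent"}


def valency(pattern: str) -> tuple[int, int]:
    """Return (obligatory, maximum) number of slots in a pattern such as "NP___X (Y)".

    Assumptions: the verb is written "___"; "het" before the bar is an empty
    subject and is not counted; each complement is one whitespace-separated
    label; a label in parentheses is optional.
    """
    left, right = pattern.split("___", 1)
    subject = left.strip()
    slots = [] if subject in ("", "het") else [subject]
    slots += right.split()
    obligatory = [s for s in slots if not (s.startswith("(") and s.endswith(")"))]
    return len(obligatory), len(slots)


def describe(pattern: str) -> str:
    """Name the use after its obligatory slots, following the list above."""
    obligatory, maximum = valency(pattern)
    name = f"{LABELS[obligatory]} use"
    if maximum > obligatory:
        name += " with optional complement"
    return name
```

For example, describe("NP___X (Y)") yields "bivalent use with optional complement", matching the fourth line of the list.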

2. Categorial information

For each verb, formal-categorial patterns or paradigms are listed (e.g. NP for noun phrases, AdjP for adjective phrases, Pfin for finite clauses or Pinf for infinitive clauses). Starting the analysis by providing functional-relational labels like "direct object" would imply that these labels have to be specified formally anyway, as, for instance, a direct object may occur in the form of a noun phrase, a finite clause, an infinitive phrase, etc. Moreover, a "superficial" approach implies that one interprets verb complements as directly as possible in their formal setup, describing the categorial type of complementation and the possible transformations, which is less interpretative than identifying syntactic relations, i.e. sentence functions. Sinclair (1987) argues that such a superficial approach has proven to be very profitable:

"If [...] the objective is to observe and record behaviour and make generalisations based on the observations, a means of recording structures must be devised which depends as little as possible on a theory. The more superficial, the better. As a general rule, in the research which underlies this project, it has proved profitable to remain quite superficial in terms of linguistic units through much of the description. Terminology was ad hoc and really no more than a labelling for identification. There is virtue in this, especially since we were examining evidence of a kind which had not been gathered before. It was a refreshing change from the usual unseemly rush of linguists to kick aside the concrete linguistic object in favour of some idealised abstraction. We have discovered a new respect for the raw substance of language." (Sinclair, 1987:107)

Moreover, the formal (and hence more transparent) apparatus, which should be as clear and uniform as possible, will contribute to the user-friendliness of the dictionary. The verbal complements are given to the right of the verb form, i.e. after the bar. We distinguish the following major categories:

NP:
noun phrase, i.e. a phrase having a noun or pronoun as head (and, optionally, a determiner, quantitative or qualitative element)
PP:
prepositional phrase, i.e. a phrase introduced by a preposition
AdvP:
adverb phrase, i.e. a phrase having an adverb as head
AdjP:
adjective phrase, i.e. a phrase having an adjective as head
(to) Pinf1/2:
(to +) infinitive clause in which the subject of the infinitive is/is not co-referential with the subject of the main clause
Pger1/2:
gerund phrase in which the subject of the gerund is/is not co-referential with the subject of the main clause
Pfin:
finite clause introduced by dat/que/that, by of/si/whether or by a wh-word

PP stands for non-restricted prepositional paradigms. When the paradigms are restricted, on the other hand, the specific prepositions are mentioned:

NP___PP:
achter de schermen, tussen de bomen, in de kast, op het bord, etc. kijken.
NP___op NP:
niet op een frank kijken
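The distinction between non-restricted PP and restricted prepositional patterns can be illustrated with a small Python sketch. It is a hypothetical helper, not the CVVD's own tooling, and it assumes that a restricted complement is written as a preposition followed by a category label (as in "op NP"), while "to" and "wh" are treated as introducers rather than prepositions.

```python
CATEGORIES = {"NP", "PP", "AdvP", "AdjP", "Pinf1", "Pinf2", "Pger1", "Pger2", "Pfin"}
# Assumption: words that introduce a category without being a preposition.
INTRODUCERS = {"to", "(to)", "wh"}


def classify_complement(complement: str) -> str:
    """Classify one complement string such as "PP", "op NP" or "wh Pfin"."""
    words = complement.split()
    category = words[-1]
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if category == "PP":
        return "PP (non-restricted prepositional paradigm)"
    if len(words) == 2 and words[0] not in INTRODUCERS:
        # A preposition named in the pattern restricts the paradigm to that preposition.
        return f"PP restricted to the preposition {words[0]!r}, governing {category}"
    return category
```

With this sketch, "op NP" from niet op een frank kijken is reported as a PP restricted to op, whereas "PP" stays open to achter, tussen, in, op and so on.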

3. Relational information

Obviously, formal notions alone, like NP, Pfin or AdjP, do not suffice to describe verbal structures exhaustively and unambiguously. Other differentiations will have to be made. One should, for instance, differentiate between (1) and (2) in English, which both have the structure NP___NP:

(1) John felt an insect (on his arm). = NP___NP
(2) John felt a fool. = NP___NP

These differentiations can best be found by looking at the grammatical potential of these elements, more particularly, by looking at possible transformations. Transformational procedures, like passivization, word order and pronominal or other substitution tests, show (1) to behave in a different way from (2):

(1') John felt an insect (on his arm). --> John felt it.
(2') John felt a fool. --> *John felt it.

On the basis of these transformational tests, a second level of description may be introduced in the analysis. Here, when necessary, relational notions will be mentioned: for instance, (2) will be described as a pattern allowing a subject complement.

(2'') John felt a fool. = NP___C1

Besides specifying the construction type, the introduction of relational notions has another advantage: it allows us to reduce the amount of structural information given, as different categorial elements expressing the same relation are reduced to a common denominator. For instance, in listing an obligatory local complement (i.e. LOC), the categorial realisation is left unspecified by default: when no categorial specifications are given, LOC, and likewise TEMP, stands for "either AdvP, PP or NP". The following relational notions are used:

C1 = subject complement
C2 = object complement
MAN = manner complement
LOC = local complement
TEMP = temporal complement
QUANT = quantitative complement
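The economy gained by the default reading of LOC and TEMP can be made concrete: one relational pattern stands for several categorial ones. The Python sketch below is illustrative only; the expand function and the DEFAULTS table are assumptions that encode just the default stated above ("either AdvP, PP or NP") and do not handle explicit categorial specifications.

```python
from itertools import product

# Default categorial realisations of relational labels when no categorial
# specification is given: LOC and TEMP stand for "either AdvP, PP or NP".
DEFAULTS = {"LOC": ("AdvP", "PP", "NP"), "TEMP": ("AdvP", "PP", "NP")}


def expand(pattern: str) -> list[str]:
    """Expand relational labels to the right of "___" into categorial patterns."""
    left, right = pattern.split("___", 1)
    # Labels without a default (NP, C1, "at", ...) are kept unchanged.
    options = [DEFAULTS.get(slot, (slot,)) for slot in right.split()]
    return [f"{left}___" + " ".join(combo) for combo in product(*options)]
```

Thus expand("NP___LOC") gives the three patterns NP___AdvP, NP___PP and NP___NP, and a pattern with both LOC and TEMP would stand for nine.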

In many valency studies these "hidden characteristics", i.e. the transformational procedures mentioned above, serve as a criterion for distinguishing between nuclear and peripheral sentence elements. The nuclear elements are traditionally said to make up, or belong to, the valency of the verb, while the peripheral ones do not, as only sentence elements with structural properties of the kind described above are considered to be closely related to the verb. But how many structural properties, and which ones exactly, make a sentence element nuclear, i.e. dependent on the verb valency? In the literature this question has received various answers, though passivization, pronominalisation and non-deletability have been cited most often as criteria. We consider passivization, pronominalisation and word order to provide quite good criteria for distinguishing between valent and non-valent sentence elements. For all three, transformation tests can be used.

4. Lexical information

The formal typology, which is of a distributional, categorial and relational nature, is completed by lexical information, more particularly lexical specifications and restrictions. Restrictions are given between square brackets, for instance NP[nh]___, referring to a non-human noun phrase. Semantic restrictions pertaining to single lexical items are indicated in italics (e.g. NP[cats] in Cats swear when they are angry), while restrictions pertaining to conceptual domains are indicated between single quotes (e.g. NP['wonde'] in De wonde begon lelijk te zweren). For these specifications and restrictions a balance has to be found between relevant and non-relevant features. Specifications are most often relevant for the subject, which may have (optional) specifications or semantic labels like [h], [nh], [anim], [abstr], [concr], [plur] and [sg]. An indication about the subject is only given when the subject is non-personal, or when a construction with a personal subject has a different meaning from a construction with a non-personal subject. Further, abstract or empty subjects, and obligatory forms concerning the number and/or person of the subject, are mentioned.
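The three kinds of restriction can be told apart by their bracket contents. The following Python sketch is a hypothetical parser, not part of the CVVD. Because italics cannot be represented in plain text, it assumes that bracketed content which is neither quoted nor one of the subject labels listed above is a single lexical item.

```python
import re
from typing import Optional

# Subject labels listed above; any other unquoted bracket content is treated
# here as a lexical item (assumption: italics are not available in plain text).
FEATURES = {"h", "nh", "anim", "abstr", "concr", "plur", "sg"}
SLOT = re.compile(r"^(?P<category>[A-Za-z]+)\s*\[(?P<restriction>[^\]]+)\]$")


def parse_restriction(slot: str) -> Optional[tuple[str, str, str]]:
    """Return (category, kind, value) for a slot like "NP[nh]", or None if unrestricted."""
    match = SLOT.match(slot.strip())
    if match is None:
        return None
    category = match.group("category")
    value = match.group("restriction").strip()
    if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
        return category, "conceptual domain", value[1:-1]
    if value in FEATURES:
        return category, "semantic feature", value
    return category, "lexical item", value
```

Under these assumptions, NP['wonde'] is read as a conceptual domain, NP[nh] as a semantic feature and NP[cats] as a lexical item.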

5. Verb specificational and transformational information

Finally, verb specificational or transformational information may be listed. In the lemma, the verb form is indicated either by a bar, standing for the verb itself, or by another verb, which is a direct reference. Information about modality, tense, imperative, subjunctive or indicative forms, obligatory negation, obligatory passive form, obligatory use in particular sentence types (like questions), and the like, is also given here when it is obligatory for a certain verb form. E.g., ___[Imp] NP Pinf2 refers to the obligatory imperative form of the verb in this construction type. For instance, unlike French, where regarder can easily take an NP Pinf2 pattern (e.g. Il ne lui restait plus qu'à regarder des Japonais, des Russes l'expulser du podium, le repousser à la huitième place), Dutch and English normally have a wh Pfin, i.e. a finite clause introduced by a wh-word. Dutch does allow the NP Pinf2 pattern, but only when the verb has an imperative form (e.g. Kijk hem eens knoeien). Reference will also be made to constructions such as Er komen steeds meer verantwoordelijkheden bij kijken and Daar stond ze van te kijken, as kijken can only be used in these meanings with the auxiliaries komen and staan respectively. Further, for the sake of economy only one structure is given for sentences like She gave him a book and She gave a book to him, viz. NP___NP (to) NP. Optional rewriting rules are mentioned for introductory constituents of the kind found in: Zij beschouwde het als haar taak om te redden wat er te redden was. Here, the pattern NP___NP C2 <___NP ( ___ HET + Pfin/(om) te Pinf> will be given, indicating that the construction between the brackets is an optional rewriting of the complement NP.

The lemma verb can be accompanied by the reflexive pronoun zich, se or oneself, which means that the construction with that verb takes an obligatory reflexive pronoun (e.g. zich vergissen: NP___ zich). Optional reflexive pronouns, on the other hand, are given under the complement paradigm (e.g. zich wassen: NP___NP). Further, Dutch complex verbs (e.g. opkijken, uitkijken) are listed separately, as is done in traditional dictionaries. English phrasal verbs (e.g. look up, look forward) will also be listed separately in the CVVD database, as they have a valency pattern of their own (e.g. uitkijken naar NP, look forward to NP). As will be clear, verb specificational or transformational data are only given when they affect the distributional and/or categorial-relational paradigms.

6. Semantic information

A further step in the analysis is the semantic regrouping of the patterns resulting from the formal analysis: all construction types will be re-categorized into a number of semantically distinct groups in one single schema. These meanings are distinguished by means of Roman numerals. Frequency data will then determine the order in which the semantic groups are listed in the lemma, the most frequent ones being listed first with the most frequent formal patterns being mentioned before the less frequent ones.
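The ordering rule can be sketched as follows, assuming each corpus attestation has already been assigned a sense and a formal pattern. The data layout, the function names and the tie-breaking rule (first encountered comes first) are assumptions of this Python sketch, not documented CVVD procedure.

```python
from collections import Counter


def roman(n: int) -> str:
    """Roman numeral for 1 <= n <= 39, enough for sense numbering."""
    out = ""
    for value, symbol in ((10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")):
        while n >= value:
            out += symbol
            n -= value
    return out


def order_lemma(attestations):
    """Order senses, and patterns within each sense, by descending frequency.

    `attestations` is an iterable of (sense, pattern) pairs, one per corpus
    example. Ties keep the order in which they were first encountered.
    """
    attestations = list(attestations)
    sense_counts = Counter(sense for sense, _ in attestations)
    pattern_counts = Counter(attestations)
    lemma = []
    for number, (sense, _) in enumerate(sense_counts.most_common(), start=1):
        patterns = [(p, n) for (s, p), n in pattern_counts.most_common() if s == sense]
        lemma.append((roman(number), sense, patterns))
    return lemma
```

With, say, eight attestations for one sense and three for another, the first receives numeral I and its patterns are listed from most to least frequent.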

7. Summary

In this short contribution, the main types of information given in the CVVD entries have been mentioned. Table 1 summarizes and illustrates these main types by means of some descriptions from the entry kijken/regarder/look.

Table 1: Summary of information types in the CVVD

Type of information                        Example                   Explanation
distributional                             NP___at NP                bivalent verb (i.e. subject + verb + complement)
categorial                                 NP___wh Pfin              complement specified as a finite clause introduced by a wh-word
relational                                 NP___LOC                  complement specified as a local complement
lexical                                    NP[nh]___LOC              subject specified as being non-human
verb specificational and transformational  NP___[Imp] NP Pinf2       verb obligatory in the imperative form
semantic                                   '(not) pay attention to'  grammatical patterns are glossed between inverted commas in a separate column

In sum, the notion of valency can best be defined on the basis of the distributional, categorial, relational, lexical and verb transformational properties we have distinguished. We take verb valency to refer to the number of slots a verb may have or prototypically has (i.e. distributional properties) related to the particular form in which the verb may occur (i.e. verb transformational properties) and to the particular form (i.e. the categorial properties), the particular function (i.e. relational properties) and/or lexical make-up (i.e. lexical properties) of the elements that can enter these slots. These interrelated properties are determined by the semantics of the verb.

References

  • Devos, Filip (1996), Contrastive verb valency: overview, criteria, methodology and applications. In: Anne-Marie Simon-Vandenbergen (ed.), Aspects of contrastive verb valency (Studia Germanica Gandensia, 40) (to appear).
  • Sinclair, John (1987), Grammar in the dictionary. In: J. Sinclair (ed.), Looking Up: An Account of the COBUILD Project in Lexical Computing and the Development of the Collins COBUILD English Language Dictionary. London: HarperCollins, pp. 103-115.



The Odense Valency Dictionary

Lene Schøsler and Karen Van Durme

0. Introduction

The Odense Valency Dictionary (OVD) Project was launched in 1991 under the auspices of the Danish National Research Council for the Humanities with the aim of developing a human- and machine-readable valency dictionary of Danish verbs. The project was started by Dr Lene Schøsler in cooperation with PhD student Sabine Kirchmeier-Andersen at the University of Odense. Since then, two more PhD students have joined the OVD: Jan Daugaard and Karen Van Durme.

There are no valency dictionaries for Danish, and existing monolingual dictionaries are too inconsistent and heterogeneous [1] to be useful as a source for a Danish valency dictionary. From the start, therefore, we wished to design a valency dictionary for Danish on the basis of language-immanent criteria and linguistic tests that can ensure intra- and intercoding consistency. When we were introduced to the Pronominal Approach (PA), this method seemed an obvious point of departure for a valency description of Danish. The PA is a constructivist method developed by Prof Karel van den Eynde and his PROTON group at the Catholic University of Leuven in Belgium and presented in, among others, Van den Eynde & Blanche-Benveniste (1978), Blanche-Benveniste et alii (1984), Eggermont & Melis (1992), Melis & Eggermont (1994) and Melis (1995). One advantage of the method is that it is based on language-immanent criteria which have proven their worth in the Dutch and French valency dictionaries produced by PROTON. Another advantage is that the PA is methodologically closely related to Danish structuralism, with which we were familiar. Outside Denmark, Danish structuralism is best known through Louis Hjelmslev's glossematic theory and Knud Togeby's descriptive studies of the Romance languages.

1. General Principles for the PA and the OVD

In agreement with Harris (1965), the fundamental principle of the PA is that linguistic facts must be examined distributionally; only this allows us to work with procedures and linguistic tests that are operational and language-immanent. The valency or combinatorial potential of a lexical element can therefore be described by examining the elements which are selected by the valency kernel. Here we have to distinguish between restrictions that are category-specific and belong to the grammar, and restrictions that are specific to a particular valency kernel and must therefore be included in the dictionary. Only the latter are relevant for a valency description [2]. In order to uncover the combinatorial potential of a particular valency kernel, one has to study its quantitative and qualitative valency. One way of doing this is to base one's investigations on actual sentences, for example from a corpus. The problem, however, is that the possible combinations of lexical valency elements are legion, so that it is rather difficult to extract syntactic and, even more so, unambiguous semantic features from lexicalised constructions [3]. The PA, on the other hand, solves this problem by stating that there is a constant relationship of proportionality between lexicalised entities and pronominal paradigms. In accordance with the PA's assumption that a finite number of pronouns or proforms constitute the syntactico-semantic skeleton of the language (see Gebruers 1991), the OVD bases its analysis of the restrictions of the valency kernel on the distribution of the proforms. The first and most urgent task for the OVD was therefore to establish and describe an inventory of proforms for Danish. The results of this work are presented in Schøsler (1994).

As Danish has few cases and, at the level of sentence constituents, can formally distinguish only between subject and non-subject case, it immediately became clear that additional syntactic tests had to be devised in order to give an exhaustive description of valency patterns. The OVD has by now established formal distinctions [4] between 10 syntactic functions and 20 syntactico-semantic slot fillers or argument types for verbal valency kernels, and has coded ca 3000 of the planned 4000 verb readings in the database. These 3000 readings correspond to approximately 1500 different verbs. A reading of a verb is defined by its quantitative and qualitative valency. If a valency kernel has more than one valency pattern that cannot be reduced, it has to be divided into distinct readings (see also section 2.5).

2. Particularities of the OVD

2.1. General differences between the OVD and the PROTON dictionaries

The OVD differs from the PROTON dictionaries in several respects which we cannot treat exhaustively here. These differences are not so much due to methodological disagreements as to the fact that the OVD could take advantage of the experiences and results that were gathered in the course of the PROTON project.

The first difference between the OVD and the Dutch and French dictionaries of PROTON results from differences in language structure, since the method of analysis is based on proforms and is therefore language-specific, as already mentioned. Another difference is that the OVD, unlike the other two dictionaries, does not operate with a full pronominal inventory but instead has attempted to derive syntactico-semantic features from a representative sample of proforms. One of the advantages of the feature system of the OVD is greater transparency [5]. In addition, the OVD records several types of linguistic data, some of them obtained by means of systematic tests, that have proved relevant to the valency description of verbs. The data include information on diathesis, auxiliaries, verbal aspect, preliminary subject or expletive, existential, verb type and linking. Daugaard & Kirchmeier-Andersen (1995) give an overview of the different types of linguistic data and argue for and exemplify their relevance for a description of the valency of Danish verbs. Besides being relevant to valency description, the additional data were included because of our institutionalised collaboration with a research team from the Business School of Southern Jutland, who are working on a Danish version of the machine translation system METAL. When the PROTON dictionaries for Dutch and French were implemented in METAL, this kind of data was often needed. For further details see Adriaens (1992) and Christoffersen (1995).

2.2. Verb typology

The need for a consistent verb typology arose in the course of the actual verb coding in the OVD database. This practical coding work confronted us with a number of verbs which seemed to defy description by means of the PA. The verb typology that gradually emerged from our work on the OVD [6] is by no means final or comprehensive but it attempts to offer some operational procedures to distinguish between verb uses.

The classification is based on the concept of proportionality as defined in section 1 and on obligatoriness/optionality. As for the coding format of the OVD, our point of departure is declarative sentences in the present tense with canonical word order. The term extension is used to refer to any pronominal, nominal (including prepositional), verbal, adjectival, adverbial or sentential constituent of the verb that in the broadest possible sense can be regarded as falling under the combinatorial potential or valency of the verb. We reserve the notion of valency element for extensions that are proportional to pronominal paradigms. Constituents that do not fall under the valency of the verb are called adjuncts. Our working hypothesis is that adjuncts are optional, that extensions which are not proportional to pronominal paradigms are obligatory, and that valency elements can be either obligatory or optional.

2.2.1. Full verbs and non-full verbs

A first, very general distinction we want to make is between full verbs (FV) and non-full verbs (NFV). Verbs are used as FV when their nominal, verbal and adjectival extensions take the form of pronominal paradigms or corresponding lexicalised phrases. Pronominal and adverbial extensions do not seem to be relevant for the typology and will therefore not be referred to or included in the term "extension" in what follows. In other words, FV have a valency pattern of their own [7]. Verbs with extensions that are not proportional to pronominal paradigms, i.e. pseudo elements, are used as NFV. In examples (1a,b,c) the nominal, verbal and adjectival extensions of the verb are proportional to pronominal paradigms; in examples (2a,b,c) they are not:

(1) a. I refuse his offer to assist me  
       I refuse it/what                                 
    b. I refuse to sing                                 
       I refuse that/what                               
    c. I paint the house blue                           
       I paint the house like that                      

(2) a. I have the right to be angry
       * I have it/what 
    b. I have to sing
       * I have that/what
    c. I am good at languages
       * I am such/like that

As should be clear from our classification criteria for FV and NFV, the notion of proportionality alone is not sufficient to distinguish FV from NFV; we need both the concept of proportionality and that of obligatoriness/optionality. It is, for example, neither correct nor sufficient to say that FV always and exclusively occur with constituents that are (lexicalisations of) pronominal paradigms and NFV do not. Adjuncts can also be proportional (e.g. adverbials of time, place and manner) or non-proportional (e.g. sentence adverbials or modal adjuncts like of course, not, surely...) to proforms, and they occur with both FV and NFV. They have no influence on the classification of verbs, though. We therefore need the additional criterion of obligatoriness/optionality to classify verb uses. Obligatory non-proportional extensions only occur with NFV. Non-proportional adjuncts are optional and do not affect the verb classification. A further typology of adjuncts is beyond the scope of this study. The terminological vagueness of the terms extension and adjunct is not really problematic here. Even though proportional extensions or valency elements are sometimes difficult to distinguish from adjuncts because both can be optional, no such difficulty exists with non-proportional extensions: they are necessarily obligatory, while non-proportional adjuncts are optional (see table 1 below). We assume here, incidentally, that valency-boundness is a relatively unproblematic issue, which of course it is not; discussing it is simply beyond the scope of this paper. For a brief discussion of criteria for valency-boundness see, among others, Kjærsgaard & Schøsler (1992).

Table 1

                    PROPORTIONAL   OBLIGATORY   VALENCY
VALENCY ELEMENT     +              +/-          +
PSEUDO ELEMENT      -              +            +
ADJUNCT             +              -            -
MODAL ADJUNCT       -              -            -
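The criteria summarised in Table 1 can be read as a small decision procedure. The Python sketch below is one possible encoding, not the OVD's implementation. It assumes that proportionality, obligatoriness and valency-boundness have already been established by the tests described in the text (valency-boundness is simply given as an input here), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Constituent:
    text: str
    proportional: bool   # proportional to a pronominal paradigm?
    obligatory: bool
    valency_bound: bool  # result of separate valency-boundness tests (not modelled here)


def classify(c: Constituent) -> str:
    """Assign one of the four constituent types of Table 1."""
    if c.proportional:
        # Working hypothesis: adjuncts are optional, so an obligatory
        # proportional constituent can only be a valency element.
        return "valency element" if (c.obligatory or c.valency_bound) else "adjunct"
    # Non-proportional: obligatory ones are pseudo elements, optional ones modal adjuncts.
    return "pseudo element" if c.obligatory else "modal adjunct"


def verb_use(constituents: list[Constituent]) -> str:
    """A verb is used as NFV if it has at least one pseudo element, otherwise as FV."""
    return "NFV" if any(classify(c) == "pseudo element" for c in constituents) else "FV"
```

Applied to examples (1b) and (2b), the proportional extension in I refuse to sing yields FV, while the non-proportional, obligatory extension in I have to sing yields NFV. Adding an optional modal adjunct such as surely changes neither result.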

Having introduced proportionality as a means of distinguishing FV from NFV, we can base the further subclassification of FV on the different pronominal valency paradigms of the FV. A possible subcategory of FV use would, for example, be incorporation. Incorporating verbs are not stressable and in this respect resemble NFV. On the other hand, the extension of an incorporating verb is proportional to a pronominal paradigm (3a), but in this case the paradigm is reduced in comparison with other FV uses (3b).

(3) a. He eats fish             
       * He eats it                     
       What does he eat?                
    b. He eats this apple 
       He eats it 
       What does he eat?

NFV can be subdivided into the same groups as FV, namely verbs with nominal extensions (support verbs), verbs with verbal extensions (auxiliaries and modals) and verbs with adjectival extensions (copula verbs). The difference from FV is, as already mentioned, that the extensions of NFV are not proportional to pronominal paradigms and can therefore be regarded as part of a complex predicator together with the verb. For a further classification we refer to Van Durme & Van den Eynde (forthcoming) and Van den Eynde & Van Durme (forthcoming).

2.3. Linking in the OVD

The main objective of the PROTON project and the OVD is to give a systematic valency description of the verbal kernel (only FV in a first stage, complex predicators afterwards) and to arrive at consistent sense distinctions on the basis of syntactico-semantic features. The method developed allows for a very fine-grained system of sense distinctions but as a result occasionally lacks transparency. Another problem is that sense distinctions between unrelated homonyms, as in (4), in practice occur at the same level as sense distinctions between related homonyms, as in (5):

(4) Det regner                                          
    It rains
    Jeg regner med dig  
    I count on you
(5) Jeg fylder flasken med mælk         
    I fill the bottle with milk
    Jeg fylder mælk i flasken   
    I fill (= pour) milk in the bottle

The PROTON dictionaries partly solved this problem by grouping related homonyms under the same heading with corresponding reading numbers, but this is rather an ad hoc solution. One possible means of interconnecting related homonyms is to examine the nominalisation possibilities of the verbs, as outlined in Schøsler & Weilgaard (1995) and in Schøsler & Kirchmeier-Andersen (1996). Another solution, which however does not apply in all cases, is to connect related homonyms by means of linking. Within the framework of the OVD, S. Kirchmeier-Andersen has developed the notion of linking on the basis of research by Carmen Eggermont (see Eggermont 1992 and 1994). For a more detailed discussion of linking see Schøsler & Kirchmeier-Andersen (1996), chapter 2.

2.3.1. Criteria for linked constructions

In our typology linked sentence constructions in modern Danish are characterised by:

a. Stability of syntactic word order

This means that linking is based on a fixed surface word order (subject - verb - object) as it is given for declarative sentences in the present tense.

b. Stability of verb morphology

This means that there are no changes in the diathesis or in the temporal features of the verb group. Since the passive in Danish can be morphologically marked, it is not considered a linked construction.

c. Stability of pronominal paradigms

This means that the valency slots of the alternating syntactic functions are filled by the same pronominal paradigms.

d. A relation of entailment

This means that one construction entails or implies the other, as in (6) and (7) below, where (6) entails (7):

(6) Jeg knækker grenen      I break the branch
(7) Grenen knækker          The branch breaks

This entailment relation is generally unidirectional; (7), for example, does not entail (6). In certain cases bidirectional entailment seems possible.

On the basis of these criteria a typology with four main types of linked constructions was developed and is now included as a submenu in the OVD reading format.

2.4. Use of corpora

As mentioned in section 1, the PROTON dictionaries and the OVD describe valency patterns with the help of pronominal paradigms. Unlike PROTON, however, the OVD also consults corpora, for at least two reasons. First, intuition or introspection alone does not ensure the identification of all relevant valency patterns. Since Danish monolingual dictionaries are of little help in valency research, the coding of valency patterns in the OVD necessarily starts from scratch (see note 1). For each of the readings of a verb the PA ensures systematic handling of the information, but it does not ensure the inclusion of all readings. A large-scale corpus helps the lexicographer identify all relevant readings and reduces the likelihood that different lexicographers produce different codings. It is particularly useful for listing complex predicators, since it is much more difficult to identify e.g. all the nominal extensions of a support verb than all the construction possibilities of an ordinary verb. Second, information about the relative frequencies of verb readings is extremely interesting as it tells us about ongoing changes in the language. As the corpus we use is coded with text type information, the relative frequencies also provide information of "stylistic" interest, a point which we will discuss further in section 2.5. The OVD has two corpora at its disposal. The Bergenholtz Corpus (Bergenholtz 1992), which we usually consult, is a general language corpus of 4 million words. It comprises four annual volumes of 1 million words each from the years 1987-1990, consisting of excerpts from 50 newspapers, 50 weekly magazines and 100 newly published novels. In addition, we have access to the database of the forthcoming Dictionary of Modern Danish, a stylistically balanced representative corpus of 40 million words from the years 1983-1992.

2.5. Text type research

Yet another reason to use corpora springs from our cooperation with the Business School of Southern Jutland, who study valency in language for specific purposes. In order to examine and compare valency patterns in language for general purposes (LGP) and language for specific purposes (LSP), the OVD is used as a reference frame to establish differences in valency patterns between LGP and LSP.

Our comparison between LSP and LGP is based on the following assumptions about how differences may be described systematically. We believe that differences may be defined on the basis of a classification of verbs and readings of verbs, in the following types of subsets:

1. a) Verbs which occur in LSP corpora only
   b) Verbs which occur in LGP corpora only
2. a) Verbs which occur in both types of corpora, but with readings
      which appear in LSP corpora only
   b) Verbs which occur in both types of corpora, but with readings
      which appear in LGP corpora only
3. Verbs whose valency frames contain the same number of arguments, but
   with different slot fillers for the arguments in LSP and LGP
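Once verbs and readings have been coded for each corpus type, the subsets above amount to set differences. The Python sketch below illustrates this under an assumed data layout: each corpus is a mapping from verb to readings, and each reading maps to a tuple of slot-filler types. The layout, names and labels are hypothetical, not the OVD's actual format.

```python
def compare_corpora(lsp: dict, lgp: dict) -> dict:
    """Sort verbs and readings into the subsets 1a-3 listed above.

    Both arguments map verb -> {reading: tuple of slot-filler types}, as
    attested in an LSP or an LGP corpus (a hypothetical data layout).
    """
    result = {
        "1a": sorted(lsp.keys() - lgp.keys()),
        "1b": sorted(lgp.keys() - lsp.keys()),
        "2a": {}, "2b": {}, "3": {},
    }
    for verb in sorted(lsp.keys() & lgp.keys()):
        only_lsp = sorted(lsp[verb].keys() - lgp[verb].keys())
        only_lgp = sorted(lgp[verb].keys() - lsp[verb].keys())
        if only_lsp:
            result["2a"][verb] = only_lsp
        if only_lgp:
            result["2b"][verb] = only_lgp
        for reading in sorted(lsp[verb].keys() & lgp[verb].keys()):
            a, b = lsp[verb][reading], lgp[verb][reading]
            # Subset 3: same number of arguments, different slot fillers.
            if len(a) == len(b) and a != b:
                result["3"].setdefault(verb, []).append(reading)
    return result
```

The sketch only classifies attestations. Measuring the frequency variations mentioned below requires counts rather than sets.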

Differences such as those in 2 and 3 can be measured in terms of frequency variations within the following parameters:

- are all readings represented in both types of corpora?
- are the valency patterns identical?

Other parameters are not associated with valency:

- do we find the same frequency, within the two types of corpora, of subjectless constructions, defined as constructions in which the grammatical subject of the active sentence has not been realised, i.e. passive constructions and infinitive constructions?
- do we find the same distribution of tense and the same distribution of grammatical person in the two types of corpora?
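Parameters such as tense, grammatical person or the presence of a subject can be compared as relative frequencies in each corpus type. The following Python sketch assumes that each clause has been coded as a record with named features, for example {"tense": "present"}. That coding scheme is a hypothetical illustration, and the sketch performs no significance testing.

```python
from collections import Counter


def relative_frequencies(clauses: list[dict], feature: str) -> dict:
    """Share of clauses for each value of `feature` (e.g. "tense", "person")."""
    counts = Counter(clause[feature] for clause in clauses)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def compare_feature(lsp: list[dict], lgp: list[dict], feature: str) -> dict:
    """Map each observed value to its (LSP share, LGP share)."""
    a = relative_frequencies(lsp, feature)
    b = relative_frequencies(lgp, feature)
    return {value: (a.get(value, 0.0), b.get(value, 0.0)) for value in sorted(a.keys() | b.keys())}
```

Relative shares rather than raw counts are used so that corpora of different sizes can be compared directly.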

The results of our investigations so far (see Schøsler et alii 1994, Schøsler & Weilgaard 1995 and Schøsler & Kirchmeier-Andersen 1996) show that the OVD is a suitable tool both for offering a detailed valency description and for establishing a norm against which a particular "sublanguage" can be characterised as a subset. In this context the Bergenholtz Corpus is taken to represent "general language", and it is noteworthy that even in a corpus of 4 million words not all identified readings are represented. On the basis of the criteria established for the comparison, the LSP corpora investigated can be characterised as essentially different from LGP corpora, in that the former display a fairly restricted choice of readings and slot fillers (mostly concrete slot fillers). In addition, they are dominated by constructions without a specific subject, as well as by present tense and third person constructions.

We therefore conclude that describing differences between types of language on the basis of frequency variations necessarily requires the use of corpora. We intend to continue our comparative study of LGP and LSP corpora in order to identify more specific features of the latter.

3. Spin-off

In the course of the OVD project, several other valency databases have been compiled on the basis of the OVD coding format. These less comprehensive databases serve primarily as research data for three PhD projects, on the valency of nouns (Sabine Kirchmeier-Andersen), adjectives (Jan Daugaard) and support verb constructions (Karen Van Durme) respectively.

Notes

1. See Schøsler & Kirchmeier-Andersen (1996), where a comparison in chapter 3 between the OVD and other dictionaries reveals the inconsistent, ad hoc procedures for sense distinction used in existing monolingual dictionaries.
2. See Welke (1988) and Gebruers (1991).
3. See section 2.4 for a justification of corpus use at a particular step in the valency analysis and for specific purposes.
4. The number of formal distinctions results from a practical classification, not from theoretical necessity. The terminology of the slot fillers is described in Daugaard & Kirchmeier-Andersen (1995) and conforms to PROTON terminology.
5. For more details see Kirchmeier-Andersen (1995) and Schøsler & Kirchmeier-Andersen (1996), chapter 2.
6. For different versions of the verb typology see Schøsler et alii (1994) and Van Durme & Van den Eynde (forthcoming).
7. FV can of course also be avalent: in the example "It rains", the pronominal extension "it" is a mere placeholder and not a genuine valency element, because it is not proportional to a pronominal paradigm.

References

  • Adriaens, Geert & Gert De Braekeleer (1992). Semiautomatic lexical acquisition for machine translation: from Proton descriptions to Metal frames. In: Eggermont & Melis.
  • Bergenholtz, Henning (1992). DK87-90. Århus: Handelshøjskolen i Århus.
  • Blanche-Benveniste, Claire et alii (1984). Pronom et Syntaxe. L'Approche pronominale et son application au français. Paris: SELAF.
  • Boje, Frede & Lene Schøsler (1992). DISEM, A Semantic MT-Component. CST Working Paper 1. Copenhagen: CST.
  • Christoffersen, Ellen (1995). Testning af fagsproglige valensrammer i et maskinoversættelsessystem. In: Kjærsgaard & Schøsler, pp. 75-88.
  • Daugaard, Jan, ed. (1995). Valency. The Pronominal Approach applied to Danish, Russian, and Chinese. Odense Working Papers in Language and Communication 8. Odense: Institute of Language and Communication, OU.
  • Daugaard, Jan, Sabine Kirchmeier-Andersen & Lene Schøsler (1992). Parsing large scale corpora for valency information. In: Eggermont & Melis.
  • Daugaard, Jan & Sabine Kirchmeier-Andersen (1995). The Odense Valency Dictionary Programme for Verb Coding. In: Daugaard, pp. 3-35.
  • Eggermont, Carmen (1994). Reformulations et reconstructions. Deux aspects de la systématique des verbes français. Doctoral thesis.
  • Eggermont, Carmen & Ludo Melis, eds. (1992). The Pronominal Approach: From verb to noun phrase. Texts of the final workshop PROTON I-II. Leuven: Linguistics Department, KUL.
  • Gebruers, Rudi (1991). On Valency and Transfer-Based Machine Translation. Leuven: KUL.
  • Harris, Zellig (1965). String analysis of sentence structure. The Hague: Mouton.
  • Hjelmslev, Louis (1943, 1966). Omkring Sprogteoriens Grundlæggelse. Copenhagen: Akademisk Forlag.
  • Jansen, Steen et alii, eds. (1992). Computational Approaches to Text Understanding. Copenhagen: Museum Tusculanum.
  • Kirchmeier-Andersen, Sabine (1995). Valency, Sense Distinction and Inheritance in Different Types of Nominalizations. In: Daugaard, pp. 59-74.
  • Kirchmeier-Andersen, Sabine (forthcoming). Valency, Sense Distinction and Inheritance in Different Types of Nominalizations. In: Van Durme, pp. 59-88.
  • Kirchmeier-Andersen, Sabine, Bolette Sandford Pedersen & Lene Schøsler (1994). Combining Semantics and Syntax in Monolingual Dictionaries. Attacking the Enemy from Both Flanks. In: Martin et alii, pp. 136-146.
  • Kjærsgaard, Poul Søren & Lene Schøsler (1992). A Valency Based Description of Danish Verbs. In: Jansen et alii, pp. 45-61.
  • Kjærsgaard, Poul Søren & Lene Schøsler (1995). UDOG-rapport 3. Odense: Institute of Language and Communication, OU.
  • Maegaard, Bente & Bolette Sandford Pedersen (1994). UDOG-rapport 2. Copenhagen: Center for Language Technology, KUA.
  • Martin, Willy et alii, eds. (1994). Euralex Proceedings 1994. Amsterdam.
  • Melis, Ludo (1995). Les dictionnaires automatisés des valences verbales du français et du néerlandais développés à la K.U. Leuven. Présentation. In: Contragram 4, pp. 4-12. Gent: Department of Dutch Linguistics, University of Gent.
  • Melis, Ludo & Carmen Eggermont, eds. (1994). A pronominal approach to valency dictionaries. International Journal of Lexicography 7. Special issue.
  • Schøsler, Lene (1994). Feature Analysis of Danish Pronominal Paradigms with a View to a Danish Application of the Pronominal Approach. In: Melis & Eggermont, pp. 118-141.
  • Schøsler, Lene et alii (1994). Valensbeskrivelse af verberne FYLDE og FÅ. In: Maegaard & Pedersen, pp. 27-106.
  • Schøsler, Lene & Lotte Weilgaard (1995). Valensbeskrivelse af verberne TRYKKE og TRÆKKE. In: Kjærsgaard & Schøsler, pp. 31-74.
  • Schøsler, Lene & Sabine Kirchmeier-Andersen (1996). Studies in Valency II. Odense: Odense University Press.
  • Togeby, Knud (1951). Structure immanente de la langue française. Copenhagen: Travaux du Cercle Linguistique de Copenhague VI.
  • Van den Eynde, Karel & Claire Blanche-Benveniste (1978). Syntaxe et mécanismes descriptifs: présentation de l'approche pronominale. Cahiers de Lexicologie XXXII 1, pp. 3-27.
  • Van den Eynde, Karel & Karen Van Durme (forthcoming). A Classification of Dutch Modals. In: Studies in Valency III. Odense: Odense University Press.
  • Van Durme, Karen (1995). Valency of Support Verb Constructions. Some Problems. In: Daugaard, pp. 37-58.
  • Van Durme, Karen, ed. (forthcoming). The valency of Nouns. Odense Working Papers in Language and Communication. Odense: Institute of Language and Communication, OU.
  • Van Durme, Karen & Karel Van den Eynde (forthcoming). A Verb Typology on a Distributional Basis. In: Studies in Valency III. Odense: Odense University Press.
  • Welke, Klaus (1988). Einführung in die Valenz- und Kasustheorie. Leipzig.



Book notice

Dirk Noël

Karin Aijmer, Bengt Altenberg, Mats Johansson (eds.) (1996) Languages in Contrast: Papers from a Symposium on Text-based Cross-linguistic Studies, Lund 4-5 March 1994. Lund: Lund University Press. 200 pp. (= Lund Studies in English, 88.) (ISBN 91 7966 365 6)

Several contributors to this collective volume of papers presented at a symposium on corpus-based contrastive linguistics point out that cross-linguistic analysis, after a period in decline, is thriving again, and that a major contributing factor has been the availability of the resources and methodology of computerized corpus linguistics: we now have the technology to give contrastive descriptions a firm empirical basis. Contrastive corpus linguistics is still an embryonic discipline, though, and for ideal results it will not do simply to combine the existing monolingual resources. New bilingual or multilingual corpora will have to be compiled, and their design involves a whole new set of considerations.

The most basic of these is the choice between compiling a corpus of original texts and their translations, and compiling a corpus of La and Lb texts which are not translations of each other but which are closely comparable in content and genre. It is symptomatic of the novelty of the discipline that there seems to be no agreement yet on what to call these two basic kinds of bilingual (or multilingual) text corpora. At a translation studies conference held in Dublin last month, I heard Carol Peters and others refer to sets of translationally equivalent texts as 'parallel' corpora, and to sets of texts from two or more languages that share content and text type features as 'comparable' corpora. (Together with Eugenio Picchi, Peters is involved in developing procedures for the construction and querying of corpora of both kinds at the Istituto di Linguistica Computazionale in Pisa.) In this collection, however, most contributors reserve the term 'parallel' for comparable corpora, calling sets of translationally equivalent texts 'translational' corpora, although even within this volume 'parallel' is sometimes used to refer to both kinds.

A number of contributors repeat the caveat, also voiced by Carol Peters (to the dissatisfaction of the translators in her audience), that translational corpora risk containing 'translationese'. I use this term here to refer both to linguistic features more typical of the source language than of the target language and to linguistic realizations of universal features of translated text, whose existence the group around Mona Baker at the University of Manchester Institute of Science and Technology is trying to prove. If used with caution, however, i.e. in combination with a comparable corpus of original texts against which statements on differences and similarities between La and Lb can be checked, translational corpora have the advantage that their sets of texts are more truly "parallel" and, as a result, can be aligned much more finely than the sets of texts in comparable corpora. Because of this greater potential for alignment, translational corpora are also more apt to reveal features of both the source and the target language which monolingual analyses of the two languages would be unlikely to bring to the surface.

All this, to me, constitutes the main and most generally useful message that can be gleaned from the individual contributions to this volume, which I will not discuss in detail. I will just mention that, in addition to a programmatic paper on "new challenges for contrastive linguistics" (Kari Sajavaara), the volume contains reports on contrastive corpus projects undertaken in Belgium (Sylviane Granger, Louvain), Denmark (Karen Lauridsen, Aarhus Business School), Sweden (Karin Aijmer, Bengt Altenberg, Mats Johansson and Martin Gellerstam, Lund; Lars Ahrenberg and Magnus Merkel, Linköping) and Norway (Hilde Hasselgård, Stig Johansson, Jarle Ebeling and Knut Hofland*, Oslo/*Bergen).




To the table of contents of other CONTRAGRAM issues