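A Quantitative Study of Modern Chinese Word Formation

Published: 2018-05-24 20:48

Topic: Modern Chinese + word formation; Source: Zhejiang University, 2016 doctoral dissertation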

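[Abstract]: Word formation is an important area of Chinese linguistics. A large body of research has examined the internal structure of Chinese words and related word-formation phenomena, and has accumulated rich insights. The traditional approach, however, tends to restrict its view to individual words, easily overlooks the real contexts in which words occur, and gives insufficient weight to quantitative methods. As a result, it struggles to examine word-formation phenomena that are closely tied to actual language use, in particular the mechanisms by which Chinese words form in actual use and the associated word-formation properties.

This study explores a new approach to Chinese word formation that is based on real corpus data and quantitative methods. It is a usage-based approach: the words it studies are words as they are used in real Chinese contexts. Because the character (字) is the only linguistic unit that can be directly observed in real Chinese text, the approach takes the character as the basic unit of word formation and treats a word (词) as a fixed sequence of characters formed through actual language use. The approach fully respects the fact that Chinese words are linguistic units hidden beneath a context that presents itself as a continuous stream of characters. Besides examining individual words in context, it extends the field of view to the whole Chinese context, which lacks word boundaries. This wider view helps in exploring how Chinese words form in actual use and what word-formation properties they have. The approach also selects quantitative measures that are relatively simple and consistent with human cognition, so that the results are persuasive from the standpoint of language cognition.

The study uses this approach to address three questions. (1) How do the statistical properties of character-character co-occurrence in real contexts reflect the free/bound status and the positional fixedness of characters? (2) What quantitative characteristics of word-formation productivity do different characters have, and how are these related to the statistical properties of character-character co-occurrence? (3) How do the statistical properties of character-character co-occurrence reflect the mechanism by which two-character words form in real contexts?

The first question is answered mainly with models and measures from linguistic complex networks, supplemented by traditional methods of quantitative linguistics. On the one hand, drawing on the idea of the breadth of a character's combinatory range, the study proposes the concept of co-occurring character richness: the higher the co-occurring character richness of a given character, the wider its combinatory range. Using real Chinese corpus data, a given character and its co-occurrence relations with other characters are modeled as a directed, weighted character co-occurrence network, and the average n-th cumulative out-/in-degree and entropy are used to measure the co-occurring character richness on each side of the character. From the perspective of probabilistic valency, such measures reflect how efficiently a character forms character-character co-occurrence relations. On the other hand, using traditional quantitative-linguistic methods, the growth of the total number of types and of the entropy of the characters co-occurring on one side of a given character is fitted with nonlinear curves, and the quantitative features of the fitted curves are used to reflect the character's one-sided co-occurring character richness.

The results show that co-occurring character richness reflects the free/bound status and the positional fixedness of characters to a considerable extent. The one-sided richness of a character is significantly negatively correlated with its probability of forming words on that side. The higher the one-sided richness, the less the character tends to form words with the characters co-occurring on that side (that is, the stronger its tendency toward freeness on that side); conversely, the lower the richness, the more it tends to form words with them (the stronger its tendency toward boundness on that side). A given character usually forms a word with its most frequent co-occurring character on a given side, and this tendency is especially clear for characters with low one-sided richness. One-sided richness is also significantly negatively correlated with the character's word-formation preference on the same side. The difference between the richness on the two sides of a character roughly reflects its positional fixedness. The statistics show that, for common characters, the overall level of richness on the right side is significantly lower than on the left, which suggests that these characters may more readily form words or phrases with the characters on their right. A character's overall richness roughly reflects its overall free/bound status. Characters with the highest overall richness often show a tendency toward grammaticalization, and they include characters used as single-character function words. These function words are important word-segmentation markers in real contexts, and the high richness of the corresponding characters reflects the mechanism by which these words form in real contexts. The nonlinear fitting results show that the growth of the number of types and of the entropy of one-sided co-occurring characters can largely be described by suitable curve equations. The quantitative features of these equations also roughly reflect a character's one-sided richness and its free/bound status on that side. However, the nonlinear fitting method is less widely applicable than the method based on the average n-th cumulative degree/entropy.

Answering the second question first requires clarifying the nature of word-formation productivity and distinguishing what the different productivity measures mean. Productivity is divided into two categories, past and current. Past productivity refers to productivity before the current synchronic stage; current productivity refers to productivity in the current synchronic stage and can predict the capacity to produce new words in the future. A sample of characters was drawn from different levels of co-occurring character richness, and the quantitative characteristics of their past and current productivity were examined. The results show that a character's past productivity is related to its co-occurring character richness. From the perspective of probabilistic valency, this reflects the relation between the efficiency with which a character forms co-occurrence relations and its word-forming capacity. Characters with extremely high or extremely low richness generally have very low past productivity, whereas characters with high past productivity should have a moderate level of richness. The nonlinear fitting results show that the growth of the total number of types of words formed by a character can largely be described by suitable curve equations, and the statistical features of the curves roughly reflect the character's current productivity. However, the nonlinear fitting method is less widely applicable than productivity measures based on words that occur only once (hapax legomena). The richness of the words a character forms also reflects its current productivity. Past and current productivity are clearly positively correlated. Differences in current productivity reflect the balance between the lexicon and morphological rules: the stronger a character's current productivity, the more its word formation may rely on morphological rules; conversely, the weaker it is, the more likely the words it forms are stored in the mental lexicon.

For the third question, the study starts from the concept of frequency and investigates the mechanism by which Chinese two-character words form in real contexts. It first hypothesizes that the specific role of the frequency effect in the formation of a two-character word is to make it salient, within its local context, as a structural whole whose internal cohesion is stronger than its external cohesion. Character pairs that satisfy this condition are two-node edge islands in the directed, weighted character co-occurrence network of the text. Using real Chinese corpus data from two genres, two language network models were built, each with characters as nodes, directed character-character co-occurrence relations as edges, and co-occurrence frequencies as edge weights. Island extraction and analysis on the two networks show that, for the most part, only Chinese two-character words can form two-character edge islands (that is, ordered character pairs whose internal cohesion is stronger than their external cohesion) in such networks. Moreover, any two-character word has a chance of forming a two-character edge island in a suitable context. These findings indicate that the nature of Chinese two-character words as structural wholes can emerge on its own in real contexts that lack word boundaries. The edge island concept provides an explicit mechanism for the salience of two-character words as structural wholes in contexts without word-boundary markers, and this mechanism is crucial for the formation and acquisition of two-character words in the actual use of Chinese. In addition, the island extraction process reveals how some multi-character words form and what their internal hierarchical structure is.

This study fills certain gaps in Chinese word-formation research from an empirical and quantitative perspective and helps deepen the understanding of the relevant phenomena. Its findings show that the features and mechanisms of Chinese word formation can largely be revealed through suitable statistical properties of real contexts. These statistical properties are often closely tied to the relative frequencies of the relevant linguistic units. For example, a character's co-occurring character richness is determined by the relative frequency distribution of its co-occurring characters; a character's current productivity (especially measures based on word richness) is reflected by the relative frequency distribution of the words it forms; and the mechanism of two-character word formation involves the relative magnitude of an ordered character pair's frequency in its local context. In short, Chinese words and their word-formation properties (for example, the free/bound status, positional fixedness, and word-formation productivity of characters) can largely be revealed, and acquired, through simple statistical properties of real contexts.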
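To make the notion of one-sided co-occurring character richness more concrete, the following Python sketch counts, for a given character, how many distinct characters occur immediately to its left or right and computes the Shannon entropy of that distribution. It is only an illustration under stated assumptions: the abstract does not define the "average n-th cumulative" degree and entropy measures, so plain type counts and entropy are used here as a simpler stand-in, and the function names `build_cooccurrence` and `side_profile` as well as the toy text are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_cooccurrence(text):
    """Directed, weighted adjacency: edges[a][b] = how often b directly follows a.

    Assumption: the text has already been stripped of punctuation and spaces,
    so every adjacent pair is a genuine character-character co-occurrence.
    """
    edges = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        edges[a][b] += 1
    return edges


def side_profile(edges, char, side="right"):
    """Return (type count, Shannon entropy in bits) of the characters
    co-occurring on one side of `char`."""
    if side == "right":
        counts = edges.get(char, Counter())
    else:
        # Left neighbours are the sources of edges that point at `char`.
        counts = Counter(
            {a: nbrs[char] for a, nbrs in edges.items() if nbrs[char] > 0}
        )
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return len(counts), entropy


# Toy example (not data from the dissertation).
edges = build_cooccurrence("我们的书我们的笔他们的书")
print(side_profile(edges, "们", "right"))  # only 的 follows 们: (1, 0.0)
print(side_profile(edges, "的", "right"))  # 书 twice, 笔 once: 2 types
```

In this toy text 们 is always followed by 的, so its right-side richness is minimal, while 的 is followed by two different characters and has higher right-side entropy. On a real corpus the same comparison would be made between the left and right profiles of each character.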
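The two-node edge island idea described in the abstract can be sketched in Python as follows. This is a simplified reading, not the dissertation's procedure: it assumes that an ordered character pair counts as an island when the weight of its internal edge is strictly greater than the weight of every edge that connects exactly one of the two characters to the rest of the network, and it works on the whole network rather than on a local context window. The function names and the toy text are hypothetical.

```python
from collections import Counter


def edge_weights(text):
    """Weight of each directed edge (a, b) = frequency of b directly after a."""
    return Counter(zip(text, text[1:]))


def is_two_node_island(weights, a, b):
    """True if the ordered pair (a, b) is internally stronger than all of its
    external links (assumed criterion, see the lead-in above)."""
    internal = weights.get((a, b), 0)
    if a == b or internal == 0:
        return False
    pair = {a, b}
    # External edges have exactly one endpoint inside the pair.
    external = [
        w for (x, y), w in weights.items() if (x in pair) != (y in pair)
    ]
    return internal > max(external, default=0)


def find_islands(weights):
    """All ordered pairs that satisfy the island criterion (quadratic scan,
    adequate for a small illustration only)."""
    return sorted(
        (a, b) for (a, b) in weights if is_two_node_island(weights, a, b)
    )


# Toy example (not data from the dissertation).
weights = edge_weights("葡萄好吃葡萄很甜吃葡萄")
print(find_islands(weights))  # [('葡', '萄')]
```

In the toy text the pair 葡→萄 occurs three times, more often than any edge linking 葡 or 萄 to another character, so it stands out as an island, whereas 吃→葡 (twice) is outweighed by 葡→萄 and does not.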
[Degree-granting institution]: Zhejiang University
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: H146.1

Article ID: 1930495


Article link: https://www.wllwen.com/shoufeilunwen/rwkxbs/1930495.html
