
A Study of Corpus Structure and Its Application

Published: 2018-05-06 14:43

  Topics: corpus + structure; Source: Jiangnan University, master's thesis, 2012


[Abstract]: Corpus linguistics, which is grounded in authentic language data and analyzes language from a macroscopic perspective by probabilistic means, is increasingly favored by language researchers. The corpus is the foundation of corpus linguistics, and building a comprehensive, representative corpus is critically important to research results. Building a corpus requires weighing many factors, such as its size and the sources and types of the texts it contains.
Whether a corpus is representative, that is, whether its texts fully represent the domain under study, reflects whether its structure is sound. Corpus structure mainly involves two things: the criteria used to stratify the texts and the proportion each stratum occupies in the corpus. Starting from a survey of the structures of major Western corpora and drawing on systemic functional linguistics, this study tries to answer what underlying regularities govern the way corpora are structured. Halliday, the founder of systemic functional linguistics, gave a systematic account of language. He holds that language as a whole is a continuum, with spoken and written language at its two ends. He points out in particular that the varieties in the middle of the continuum have both spoken and written features, and that they shade toward the two ends into typical spoken and typical written language. The continuum theory rejects claims of primacy for either written or spoken language and describes language, in terms of register, in a comprehensive and dialectically unified way. With the help of this theory, the author finds that the structures of the SEU, Brown, LOB, and ICE-GB corpora take register fully into account, the SEU corpus most notably. SEU adopts three main divisions, written origin, scripted to be spoken, and Spoken origin, so that register progresses step by step from written to spoken. The scripted to be spoken stratum, which includes interviews, play scripts, and speech texts, precisely captures the continuity between spoken and written language on the continuum. The Brown and LOB corpora include no spoken register, and for that very reason their classification of written language serves as a model. Using a diagram of the continuum, the thesis maps the results of the analysis above, together with the proportions of each main stratum, onto its coordinates one by one and obtains a fairly symmetrical figure, which indicates that these corpora are reasonably representative. Register, however, is not the only basis for stratification: the BNC, LLELC, and MCLC corpora instead use divisions by subject area, such as applied science, social science, and arts. Further investigation shows that the two kinds of stratification criteria are not isolated from each other: the sub-branches of the learned and popular categories in ICE-GB reuse social sciences and natural sciences, which confirms that this corpus adopts both stratification models at once.
These two stratification patterns are the more common strategies for arranging corpus structure. Not confining itself to them, the study takes the structure of a self-built corpus of knowledge relevant to English majors as an example and examines in depth, from a practical standpoint, how that structure is built. It first analyzes, from the internship logs of English majors, the industries the students worked in and the uses to which they put English, so as to characterize society's demand for knowledge relevant to English majors. The study used the internship logs of 102 graduates of the class of 2006; the tally showed that 34 students did not work in English-related jobs. Judging by the main focus of each log, the internships of the remaining students mainly involved foreign trade English, English teaching, English translation, secretarial English, and mechanical English, among others. The proportion for each industry was computed from the number of students who actually worked in it, which gave the weight of each stratum. Drawing on the subject-based stratification model and combining it with the industry statistics, the thesis gives a preliminary reference scheme with strata such as foreign trade, machinery, computing, and teaching. Within each stratum, taking foreign trade English as the example, the thesis applies the results of the continuum-based analysis of corpus structure and tentatively explores how to subdivide further and collect texts.
Focusing on the structural analysis of major Western corpora, this thesis discusses how to divide a corpus into strata with a worked example. Because research time and energy were limited, much remains to be improved. The logs of only 102 students cannot effectively represent the full range of knowledge relevant to English majors. For example, it may be that none of the students did law-related English work, but that does not show that knowledge relevant to English majors excludes legal English. Further research is therefore still needed. Even so, the thesis mainly aims to open up a new line of thinking and to offer a constructive reference for self-built corpora, especially for arranging their structure. As small corpora receive growing attention from language professionals, it is hoped that this thesis will be of some benefit to the theory of corpus construction.
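
The proportion step in the abstract (turning per-industry head counts into stratum weights) can be illustrated with a minimal Python sketch. The abstract reports only the totals (102 logs, 34 excluded, hence 68 remaining); it does not give the per-industry counts, so the counts below are hypothetical placeholders chosen only to sum to 68, and the function names and the 100,000-token budget are likewise assumptions, not the thesis's own procedure.

```python
from fractions import Fraction


def stratum_weights(counts):
    """Return each stratum's share of the total as an exact fraction."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: Fraction(n, total) for name, n in counts.items()}


def allocate(weights, budget):
    """Split an integer budget across strata in proportion to their weights.

    Uses the largest-remainder method so the parts always sum to the budget.
    """
    exact = {name: w * budget for name, w in weights.items()}
    # Floor of each exact share (shares are non-negative fractions).
    alloc = {name: x.numerator // x.denominator for name, x in exact.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    order = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for name in order[:leftover]:
        alloc[name] += 1
    return alloc


# HYPOTHETICAL counts: the source gives only the total of 68 (102 - 34),
# not the breakdown, and it names further unlisted industries ("among others").
counts = {
    "foreign_trade": 30,
    "teaching": 15,
    "translation": 10,
    "secretarial": 8,
    "mechanical": 5,
}

weights = stratum_weights(counts)
plan = allocate(weights, 100_000)  # assumed corpus size in tokens

for name in counts:
    print(f"{name:14s} weight={float(weights[name]):.3f} tokens={plan[name]}")
```

Exact fractions are used so that the weights sum to exactly 1, and the largest-remainder step keeps the per-stratum token targets adding up to the chosen corpus size.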

[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: H08



