Research on Sparse Matrix Imputation and Its Application in Large-Scale Questionnaire Surveys
[Abstract]: Since 2012, the term "big data" has appeared more and more often in people's work, study, and daily life. According to a study conducted by IBM, 90% of all the data humanity has ever produced was generated in the past two years, and the total amount of data is expected to reach 44 times its earlier volume by around 2020. As data are generated and accumulated on this scale, incomplete data are inevitable, and the missing values they contain often significantly affect how usable the data are.

The rating systems of online shopping platforms are a major source of large amounts of incomplete data. Consumers rate the products they have bought, and the platform collects all ratings into a matrix with a large number of missing values, which we call a "sparse matrix". Whenever a consumer buys a product without rating it, the matrix becomes even sparser. Judging from the data structures produced by such rating systems and by the movie rating system of Netflix, the US online movie rental company, it is easy to see that simple small-sample surveys can no longer meet present-day needs. Breakthroughs are needed in both questionnaire length and sample size. Traditionally, surveyors have secured respondents' cooperation by offering rewards or feedback. This approach requires considerable human, material, and financial resources, and it still cannot guarantee the quality of the questionnaire data. This paper therefore uses a questionnaire segmentation method, which divides a large survey questionnaire into several small sub-questionnaires according to the correlations among the questions.
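The abstract does not say exactly how questions are grouped by correlation. The following is therefore only a minimal sketch under stated assumptions:

- Two questions are treated as belonging together when the absolute Pearson correlation of their answers in some pilot data reaches a hypothetical `threshold`.
- Each connected group of linked questions becomes one sub-questionnaire ("unit").

The function names, the threshold rule, and the pilot data are all illustrative assumptions, not the thesis's own procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)


def segment_questionnaire(responses, threshold=0.5):
    """Split question indices into units (hypothetical grouping rule).

    responses: one row per pilot respondent, one column per question.
    Questions i and j are linked when |r(i, j)| >= threshold; each
    connected component of the resulting graph becomes one unit.
    """
    k = len(responses[0])
    columns = [[row[j] for row in responses] for j in range(k)]
    parent = list(range(k))  # union-find over question indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(k), 2):
        if abs(pearson(columns[i], columns[j])) >= threshold:
            parent[find(i)] = find(j)  # merge the two questions' groups

    units = {}
    for q in range(k):
        units.setdefault(find(q), []).append(q)
    return sorted(units.values())


# Toy pilot data (invented for illustration): q0/q1 move together, q2/q3 move together.
pilot = [
    [1, 1, 5, 4],
    [2, 2, 1, 1],
    [3, 3, 4, 5],
    [4, 5, 2, 2],
    [5, 4, 3, 3],
]
print(segment_questionnaire(pilot))  # → [[0, 1], [2, 3]]
```

Raising `threshold` splits the questionnaire into more, smaller units. Lowering it merges weakly related questions into larger ones. A design that instead spreads correlated questions across different units, so that a missing unit can be predicted from answered ones, would need a different rule.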
During the survey, each respondent answers only a few randomly selected sub-questionnaires. Collating the responses yields a sparse matrix with a large number of missing values, which is then completed by missing-value imputation. Two imputation methods are applied: random-number imputation and multinomial logistic model imputation. Conclusions are drawn by comparing their results.

Owing to limited manpower and time, the data in this paper are simulated. The study proceeds in three steps:

1. Simulated data are generated in RStudio. Each respondent answers questions in whole "units" (sub-questionnaires), so missingness is introduced block-wise: entire units are missing. In the final sparse matrix, each respondent has answered a specific number of units.
2. Questions answered by different respondents serve as linking questions, and the correlation between respondents' answers to the same questions is calculated.
3. The imputed data are compared with the original data to verify the feasibility and accuracy of the questionnaire segmentation method and the imputation methods used in this paper.

Because the data are simulated, the study rests on certain idealized assumptions. The number of units each respondent answers can be controlled during the survey. However, respondents must be assumed to answer every question within each assigned unit. In other words, the data matrix contains only unit-level missingness and no item-level missingness.

The thesis consists of five chapters:

- Chapter 1 introduces the basic content of the thesis: the research background and purpose, the literature review, the research methods, and the innovations of the thesis.
- Chapter 2 introduces methods for handling missing data and explains the imputation methods and basic concepts that scholars have used in recent years.
- Chapter 3, the core of the thesis, works from simple to complex, from generating the data to introducing missing values.
- Chapter 4 uses the methods of Chapter 3 and the large sparse matrix generated there in software to verify the feasibility and accuracy of the thesis's theory and methods.
- Chapter 5 summarizes the whole thesis, discusses prospects for further development of its research content, and proposes ways to address its shortcomings.
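The simulation design described in the abstract can be sketched in Python. It covers complete data, block-wise "unit" missingness, imputation, and comparison with the original values. This is a hedged illustration, not the thesis's R code. Several elements are hypothetical:

- the 1 to 5 answer scale (`LEVELS`);
- the latent-attitude data-generating process;
- the matrix sizes;
- all function names.

"Random-number imputation" is assumed here to mean drawing each missing answer uniformly from the answer scale. The multinomial logistic model imputation is not shown.

```python
import random

LEVELS = 5  # hypothetical 1-5 answer scale


def simulate_complete(n_resp, n_units, unit_size, rng):
    """Complete answer matrix with n_resp rows and n_units * unit_size columns.

    Assumed data-generating process: each respondent has a latent attitude in
    1..LEVELS, and each answer is that attitude plus noise from {-1, 0, 0, 1},
    clipped to the 1..LEVELS scale.
    """
    data = []
    for _ in range(n_resp):
        attitude = rng.randint(1, LEVELS)
        row = [min(LEVELS, max(1, attitude + rng.choice((-1, 0, 0, 1))))
               for _ in range(n_units * unit_size)]
        data.append(row)
    return data


def mask_units(data, n_units, unit_size, units_answered, rng):
    """Block-wise ("unit") missingness: each respondent keeps only
    `units_answered` randomly chosen units; all other cells become None."""
    sparse = []
    for row in data:
        kept = set(rng.sample(range(n_units), units_answered))
        sparse.append([v if j // unit_size in kept else None
                       for j, v in enumerate(row)])
    return sparse


def random_number_impute(sparse, rng):
    """Fill each missing cell with a uniform random level in 1..LEVELS."""
    return [[v if v is not None else rng.randint(1, LEVELS) for v in row]
            for row in sparse]


def imputation_accuracy(original, sparse, imputed):
    """Share of originally missing cells whose imputed value equals the truth."""
    hits = total = 0
    for o_row, s_row, i_row in zip(original, sparse, imputed):
        for o, s, i in zip(o_row, s_row, i_row):
            if s is None:
                total += 1
                hits += (o == i)
    return hits / total if total else float("nan")


rng = random.Random(2017)  # fixed seed for a reproducible run
full = simulate_complete(n_resp=200, n_units=6, unit_size=5, rng=rng)
sparse = mask_units(full, n_units=6, unit_size=5, units_answered=2, rng=rng)
filled = random_number_impute(sparse, rng)
print(round(imputation_accuracy(full, sparse, filled), 3))
```

A uniform draw matches the true answer with probability 1/`LEVELS`, whatever that answer is. Under this interpretation the expected accuracy is therefore 0.2, which makes random-number imputation a natural baseline for judging a model-based method.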
[Degree-granting institution]: Hebei University of Economics and Business
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: O151.21