Classification and Parameter Measurement of Stellar Spectra Based on Lick Line Indices in a Hadoop Environment
Published: 2018-09-18 15:50
[Abstract]: Celestial spectra carry a wealth of astrophysical information; analyzing them yields the physical properties, chemical composition, and atmospheric parameters of the observed objects. Large-scale sky surveys such as LAMOST and SDSS produce massive volumes of spectral data. Since LAMOST entered regular operation in particular, each observing night yields roughly 20,000 to 40,000 spectra. Data at this scale place higher demands on fast and effective spectral processing, and this thesis was proposed against that background: its goal is to study automatic processing techniques for massive sets of stellar spectra. Automatic processing of stellar spectra falls into two categories: automatic spectral classification, and automatic measurement of the physical parameters of stellar atmospheres. Based on the relative strength of spectral lines and the continuum, together with other spectral features, stars are divided into seven main types: O, B, A, F, G, K, and M. The shape of the stellar continuum and the profiles of the spectral lines are determined by physical conditions in the stellar atmosphere, whose most basic parameters are the effective surface temperature (Teff), the surface gravity (log g), and the chemical abundance ([Fe/H]). Many existing methods classify spectra and measure parameters directly from wavelength and flux information, but spectral data are very high-dimensional and usually require a series of preprocessing steps such as normalization and dimensionality reduction, so the computational cost is very high. This thesis studies methods for spectral classification and atmospheric parameter measurement based on Lick line indices. To handle massive numbers of spectra, the computation of Lick line indices and a Bayesian-decision method for spectral classification are implemented on the Hadoop platform. The high throughput and fault tolerance of Hadoop HDFS, combined with the parallelism of the Hadoop MapReduce programming model, improve the efficiency of analyzing and processing large-scale spectral data. The contributions of this thesis are: (1) using Lick line indices as features, stellar spectral classification is implemented with a Bayesian algorithm, and stellar atmospheric parameters are measured with kernel partial least squares regression; (2) the parallel computation of Lick line indices and the parallelization of the Bayesian classification process are implemented on the Hadoop MapReduce distributed computing framework.
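As an illustration of the feature the abstract relies on, the following is a minimal Python sketch of how a single Lick line index can be computed from one spectrum. It is not the thesis's Hadoop code. It assumes the standard definition: a pseudo-continuum is drawn as a straight line through the mean fluxes of a blue and a red sideband, and the index is either an equivalent width in angstroms (atomic features) or a magnitude (molecular features). The function names and the `HBETA` dictionary are hypothetical, and the band limits are illustrative values close to the commonly quoted Hβ definition; check them against a published index table before use.

```python
import numpy as np

# Illustrative band limits in angstroms (assumed, close to the usual H-beta definition).
HBETA = {
    "blue": (4827.875, 4847.875),   # blue pseudo-continuum sideband
    "band": (4847.875, 4876.625),   # central index passband
    "red": (4876.625, 4891.625),    # red pseudo-continuum sideband
}

def _integrate(y, x):
    """Trapezoidal integral of y sampled at x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_mean(wave, flux, lo, hi):
    """Mean flux over the pixels whose wavelength lies in [lo, hi]."""
    m = (wave >= lo) & (wave <= hi)
    w = wave[m]
    return _integrate(flux[m], w) / (w[-1] - w[0])

def lick_index(wave, flux, blue, band, red, molecular=False):
    """Return one line index: angstroms if atomic, magnitudes if molecular."""
    f_blue = band_mean(wave, flux, *blue)
    f_red = band_mean(wave, flux, *red)
    x_blue = 0.5 * (blue[0] + blue[1])   # sideband midpoints
    x_red = 0.5 * (red[0] + red[1])

    m = (wave >= band[0]) & (wave <= band[1])
    w = wave[m]
    # Straight-line pseudo-continuum through the two sideband means.
    continuum = f_blue + (f_red - f_blue) * (w - x_blue) / (x_red - x_blue)
    ratio = flux[m] / continuum

    if molecular:
        return -2.5 * np.log10(_integrate(ratio, w) / (w[-1] - w[0]))
    return _integrate(1.0 - ratio, w)

def map_spectrum(spec_id, wave, flux, bands):
    """Map-style step: one spectrum in, one (id, feature vector) pair out."""
    feats = [lick_index(wave, flux, b["blue"], b["band"], b["red"]) for b in bands]
    return spec_id, feats

# Usage on a synthetic spectrum: flat continuum with a 10 A wide, 50% deep box line.
wave = np.arange(4800.0, 4900.0, 0.125)
flux = np.ones_like(wave)
flux[(wave >= 4855.0) & (wave <= 4865.0)] = 0.5
print(map_spectrum("demo", wave, flux, [HBETA]))
```

The synthetic line should give an equivalent width of roughly 5 Å. Because each spectrum is processed independently, a function like `map_spectrum` fits the map phase of a MapReduce job naturally, and the resulting low-dimensional index vectors can then be passed to a classifier or regressor.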
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: P144
Article ID: 2248386
Article link: https://www.wllwen.com/kejilunwen/tianwen/2248386.html