Research on a Directed-Graph-Based Method for Detecting False VAT Invoicing

Published: 2018-06-27 07:58

  Topics: false VAT invoicing + directed graph; Source: Xi'an University of Science and Technology, master's thesis, 2017


[Abstract]: In the investigation of false value-added tax (VAT) invoicing cases, finding fund loops in the flow of fund transactions is the key step in detecting false invoicing. The flow of fund transactions can be abstracted as a directed graph of vertices and edges, which motivates the study of a directed-graph-based detection method. This thesis first studies how circular false VAT invoicing manifests itself and how it can be detected, and abstracts the search for fund loops in the transaction flow as a cycle search problem on a directed graph with multiple edges. It then reviews directed graph storage and the commonly used cycle search algorithms and compares their applicability, finding that none of the common algorithms handles cycle search on directed graphs with multiple edges well. A depth-first directed graph cycle search algorithm is therefore proposed. The raw data are first preprocessed: fund transaction records and account information are linked through account numbers, and the edge set of each vertex is extracted and sorted in ascending order of transaction time, so that the full edge list does not have to be traversed repeatedly during the computation. Constraints on edge weights are used to avoid traversing invalid edges, and the search backtracks level by level so that every valid multiple edge is traversed and all cycles in the directed graph are found. The correctness of the algorithm was verified on a single machine. However, once the number of edges exceeds 1 million, the algorithm runs out of memory and produces no result. To address this, a Spark-based directed graph cycle search algorithm is proposed. The raw data are first abstracted into the GraphX graph data structure in Spark; the algorithm was then implemented with the GraphX framework in Scala, using techniques such as data caching and tail recursion, and its correctness was verified in local mode. On the same data set, however, its time efficiency was markedly lower than that of the first algorithm. To improve its efficiency, it was optimized through loop optimization and index building. After verification in local mode, experiments were run on a Spark cluster using graph data sets with more than 10,000 edges. The results show that the optimized algorithm can solve the cycle search problem on large-scale directed graphs with multiple edges, scales well, and runs markedly faster. Finally, a user-friendly and easy-to-operate system for detecting false VAT invoicing was designed and implemented. Tests show that the system can find fund loops in the transaction flow and retrieve detailed information on the accounts involved in each loop, providing an important basis for tax inspection departments to detect false VAT invoicing and reducing inspection costs.
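The single-machine approach described in the abstract (per-account edge lists sorted by transaction time, edge constraints to prune invalid edges, and level-by-level backtracking) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Transfer` record, the strictly-increasing-time rule, the `tolerance` amount constraint, and the `max_len` bound are all assumptions introduced here for illustration, since the abstract does not state the exact edge constraints.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Transfer(NamedTuple):
    """One edge of the multigraph (hypothetical record layout)."""
    src: str       # paying account
    dst: str       # receiving account
    time: int      # transaction timestamp
    amount: float  # transferred amount


def build_index(transfers):
    """Group outgoing transfers per account, sorted by time ascending."""
    out_edges = defaultdict(list)
    for t in transfers:
        out_edges[t.src].append(t)
    for edges in out_edges.values():
        edges.sort(key=lambda t: t.time)
    return out_edges


def find_money_loops(transfers, tolerance=0.1, max_len=6):
    """Depth-first search for fund loops with at most max_len transfers.

    Assumed constraints (not taken from the thesis): times strictly
    increase along a loop, and every amount stays within `tolerance`
    (relative) of the first transfer's amount.
    """
    out_edges = build_index(transfers)
    out_times = {a: [t.time for t in es] for a, es in out_edges.items()}
    loops = []

    def dfs(start, path, on_path):
        last = path[-1]
        edges = out_edges.get(last.dst, [])
        # Binary search skips every edge that is not later than `last`,
        # so the sorted edge list is never scanned from the beginning.
        first = bisect_right(out_times.get(last.dst, []), last.time)
        for edge in edges[first:]:
            base = path[0].amount
            if abs(edge.amount - base) > tolerance * base:
                continue  # amount constraint: treat as an invalid edge
            if edge.dst == start:
                loops.append(path + [edge])  # loop closed
            elif edge.dst not in on_path and len(path) + 1 < max_len:
                on_path.add(edge.dst)
                path.append(edge)
                dfs(start, path, on_path)
                # Backtrack one level so parallel edges are tried too.
                path.pop()
                on_path.discard(edge.dst)

    for start in list(out_edges):
        for first_edge in out_edges[start]:
            if first_edge.dst != start:  # ignore self-transfers
                dfs(start, [first_edge], {start, first_edge.dst})
    return loops
```

Because times must strictly increase, each loop is reported once, starting from its earliest transfer, and parallel transfers between the same two accounts yield separate loops. The recursion depth is bounded by `max_len`, but the number of loops can still grow quickly on dense graphs, which is consistent with the memory limits the abstract reports for large edge counts.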
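[Degree-granting institution]: Xi'an University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: F812.42; O157.5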

Article ID: 2073231


Article link: https://www.wllwen.com/guanlilunwen/shuishoucaizhenglunwen/2073231.html

