
An Effective Algorithm for Mining Probabilistic Frequent Itemsets over Uncertain Data


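Abstract: Existing algorithms for mining probabilistic frequent itemsets build their trees by pattern growth, which generates a large number of tree nodes; as a result, they occupy a large amount of memory and find probabilistic frequent itemsets inefficiently. To address these problems, an improved uncertain-data frequent pattern growth algorithm, PUFPGrowth, was proposed. The algorithm reads the uncertain transaction database one transaction at a time and builds a compact tree structure similar to the Frequent Pattern Tree (FPTree), while updating the dynamic arrays in the header table that store the expected values of all itemsets sharing the same tail node. Once all transactions have been inserted into the improved uncertain-data frequent pattern tree (PUFPTree), all probabilistic frequent itemsets are obtained by traversing these arrays. Experimental results and theoretical analysis show that PUFPGrowth finds probabilistic frequent itemsets effectively and that, compared with the Uncertain Frequent pattern Growth (UFGrowth) and Compressed Uncertain Frequent Pattern Mine (CUFPMine) algorithms, it mines probabilistic frequent itemsets from uncertain data more efficiently while using less memory.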

Keywords: data mining; uncertain data; possible world model; probabilistic frequent itemset; frequent pattern

CLC number: TP301.6 Document code: A
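The abstract ranks itemsets by their expected values, and the keywords name the possible world model. As background, here is a minimal Python sketch of the standard expected-support definition under that model, assuming that items occur independently: each possible world keeps or drops every item according to its probability, and the expected support of an itemset X equals the sum, over all transactions, of the product of the probabilities of X's items. The toy database `db` and both function names are hypothetical; the brute-force enumeration is exponential and serves only to check the closed form.

```python
from itertools import product
from math import prod

# Hypothetical toy database: each transaction maps an item to the
# probability that the item is actually present.
db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6, "c": 0.8},
    {"a": 0.4, "b": 0.7, "c": 0.5},
]


def expected_support(itemset, db):
    """Closed form: sum over transactions of the product of the itemset's
    item probabilities (assumes items occur independently)."""
    return sum(
        prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def expected_support_by_worlds(itemset, db):
    """Brute force over every possible world: weight the itemset's support
    in each world by that world's probability. Exponential; for checking only."""
    units = [(tid, item, p) for tid, t in enumerate(db) for item, p in t.items()]
    target = set(itemset)
    total = 0.0
    for present in product([True, False], repeat=len(units)):
        weight = 1.0
        world = [set() for _ in db]
        for (tid, item, p), kept in zip(units, present):
            weight *= p if kept else 1.0 - p
            if kept:
                world[tid].add(item)
        total += weight * sum(1 for t in world if target <= t)
    return total


print(round(expected_support(("a", "b"), db), 4))            # 0.73
print(round(expected_support_by_worlds(("a", "b"), db), 4))  # 0.73
```

Both routes give 0.9 × 0.5 + 0.4 × 0.7 = 0.73 for {a, b}. The closed form is what makes one-pass accumulation of expected values possible, since each transaction contributes its product independently of the others.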

English abstract

Abstract: When the pattern-growth approach is used to construct the tree structure, existing algorithms for mining probabilistic frequent itemsets suffer from several problems: they generate a large number of tree nodes, occupy a large amount of memory, and run inefficiently. To solve these problems, a Progressive Uncertain Frequent Pattern Growth algorithm, named PUFPGrowth, was proposed. Reading the uncertain database transaction by transaction, the proposed algorithm constructs a tree structure as compact as the Frequent Pattern Tree (FPTree) and updates the dynamic arrays in the header table that store the expected values of all itemsets sharing the same tail node. When all transactions have been inserted into the Progressive Uncertain Frequent Pattern tree (PUFPTree), all probabilistic frequent itemsets can be mined by traversing the dynamic arrays. The experimental results and theoretical analysis show that the PUFPGrowth algorithm finds probabilistic frequent itemsets effectively. Compared with the Uncertain Frequent pattern Growth (UFGrowth) algorithm and the Compressed Uncertain Frequent Pattern Mine (CUFPMine) algorithm, PUFPGrowth improves the efficiency of mining probabilistic frequent itemsets from uncertain datasets and reduces memory usage to a certain degree.

English key words

Key words: data mining; uncertain data; possible world model; probabilistic frequent itemset; frequent pattern
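To make the incremental construction described in the abstract more concrete, here is a hypothetical Python sketch. It is not the paper's PUFPTree: the class and method names, the fixed global item order, the use of a dict in place of a dynamic array, and the rule that an itemset is reported when its expected support reaches `min_expected_support` are all assumptions. Each transaction is inserted into an FP-tree-style prefix tree, and the header-table entry of each tail item accumulates the expected supports of every itemset ending in that item; after the last insertion, one pass over the header table yields the result.

```python
from itertools import combinations
from math import prod


class Node:
    """Prefix-tree node; children are keyed by item, FP-tree style."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}


class ProgressiveTreeSketch:
    """Hypothetical illustration only (not the published PUFPTree)."""

    def __init__(self, order):
        # Assumed fixed global item order used to sort every transaction.
        self.rank = {item: r for r, item in enumerate(order)}
        self.root = Node()
        self.node_count = 0
        # Header table: tail item -> {itemset ending in that item: expected support}.
        self.header = {}

    def insert(self, transaction):
        """Insert one uncertain transaction given as {item: probability}."""
        items = sorted(transaction, key=self.rank.__getitem__)
        node = self.root
        for pos, tail in enumerate(items):
            # Share the prefix path when it already exists.
            child = node.children.get(tail)
            if child is None:
                child = Node(tail, node)
                node.children[tail] = child
                self.node_count += 1
            node = child
            # Accumulate the expected support of every itemset in this
            # transaction whose last item (in the global order) is `tail`.
            bucket = self.header.setdefault(tail, {})
            earlier = items[:pos]
            for r in range(len(earlier) + 1):
                for prefix in combinations(earlier, r):
                    itemset = prefix + (tail,)
                    p = prod(transaction[i] for i in itemset)
                    bucket[itemset] = bucket.get(itemset, 0.0) + p

    def frequent_itemsets(self, min_expected_support):
        """Single pass over the header table after all insertions."""
        return {
            itemset: esup
            for bucket in self.header.values()
            for itemset, esup in bucket.items()
            if esup >= min_expected_support
        }


db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6, "c": 0.8},
    {"a": 0.4, "b": 0.7, "c": 0.5},
]
tree = ProgressiveTreeSketch(order=["a", "b", "c"])
for t in db:
    tree.insert(t)
print(sorted(tree.frequent_itemsets(0.7)))  # [('a',), ('a', 'b'), ('b',), ('c',)]
print(tree.node_count)                      # 4
```

Enumerating every itemset that ends at a tail item costs time exponential in the transaction length, so the sketch only illustrates the bookkeeping of a single-pass build followed by one traversal of the header table; how the published algorithm keeps this cost under control is not stated in the abstract.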

0 Introduction

