A Vertex-extension based Algorithm for Frequent Pattern Mining from Graph Databases

Md Tanvir Alam; Chowdhury Farhan Ahmed; Md Samiullah

doi:https://doi.org/10.3329/dujase.v7i1.62887

Dhaka University Journal of Applied Science & Engineering

Issue: Vol. 7, No. 1, January 2022

Title:	A Vertex-extension based Algorithm for Frequent Pattern Mining from Graph Databases
Authors:	Md Tanvir Alam Department of Computer Science and Engineering, University of D haka, Bangladesh Chowdhury Farhan Ahmed Department of Computer Science and Engineering, University of D haka, Bangladesh Md Samiullah Department of Computer Science and Engineering, University of D haka, Bangladesh
DOI:	https://doi.org/10.3329/dujase.v7i1.62887 PDF
Keywords:	Graph mining, Knowledge graphs.
Abstract:	Frequent pattern mining is a core problem in data mining. Algorithms for frequent pattern mining have been proposed for itemsets, sequences, and graphs. However, existing graph mining frameworks follow an edge-growth approach to building patterns which limits many applications. Motivated by real-life problems, in this work, we define a novel graph mining framework that incorporates vertex-based extensions along with the edge-growth approach. We also propose an efficient algorithm for mining frequent subgraphs. To deal with the exploding search space, we introduce a canonical labeling technique for isomorphic candidates as well as downward closure property-based search space pruning. We present an experimental analysis of our algorithm on real-life benchmark graph datasets to demonstrate the performance in terms of runtime.
References:	H. Cheng, X. Yan, J. Han, and C.-W. Hsu, “Discriminative frequent pattern analysis for effective classification,” in 2007 IEEE 23rd international conference on data engineering. IEEE, pp. 716–725, 2007 M. T. Alam, C. F. Ahmed, M. Samiullah, and C. K. Leung, “Discriminating frequent pattern based supervised graph embedding for classification,” in Advances in Knowledge Discovery and Data Mining, pp. 16–28, 2021 J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” vol. 29, no. 2. ACM, pp. 1–12, 2000. J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “Prefixspan: Mining sequential patterns efficiently by prefixprojected pattern growth,” in ICDE. IEEE, 2001. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tompkins, and E. Upfal, “The web as a graph,” in Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 1–10, 2000 L. Tang and H. Liu, “Graph mining applications to social network analysis,” in Managing and Mining Graph Data. Springer, pp. 487–513, 2010. X. Yan and J. Han, “gspan: graph-based substructure pattern mining,” in ICDM, 2002 M. Kuramochi and G. Karypis, “Frequent subgraph discovery,” in ICDM. IEEE, 2001 M. T. Alam, C. F. Ahmed, M. Samiullah, and C. K. Leung, “Mining frequent patterns from hypergraph databases,” in Advances in Knowledge Discovery and Data Mining, pp. 3–15, 2021 Q. Wang, Z. Mao, B. Wang, and L. Guo, “Knowledge graph embedding: A survey of approaches and applications,” IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 12, pp. 2724–2743, 2017 J. Pujara, H. Miao, L. Getoor, and W. Cohen, “Knowledge graph identification,” in International Semantic Web Conference. Springer, pp. 542–557, 2013 X. Chen, S. Jia, and Y. Xiang, “A review: Knowledge reasoning over knowledge graph,” Expert Systems with Applications, vol. 141, p. 112948, 2020 R. Srikant, Q. Vu, and R. Agrawal, “Mining association rules with item constraints.” in KDD, vol. 97, pp. 67–73, 1997. A. Inokuchi, T. Washio, and H. Motoda, “An apriori-based algorithm for mining frequent substructures from graph data,” in European conference on principles of data mining and knowledge discovery. Springer, 2000 N. Wale, I. A. Watson, and G. Karypis, “Comparison of descriptor spaces for chemical compound retrieval and classification,” Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375, 2008 K. M. Borgwardt, C. S. Ong, S. Schönauer, S. Vishwanathan, A. J. Smola, and H.-P. Kriegel, “Protein function prediction via graph kernels,” Bioinformatics, vol. 21, no. suppl_1, pp. i47–i56, 2005 P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in ACM SIGKDD, pp. 1365–1374, 2015. A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, “Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity,” Journal of medicinal chemistry, vol. 34, no. 2, pp. 786–797, 1991 H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma, “Statistical evaluation of the predictive toxicology challenge 2000–2001,” Bioinformatics, vol. 19, no. 10, pp. 1183–1193, 2003 S. Nijssen and J. N. Kok, “A quickstart in frequent structure mining can make a difference,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 647–652, 2004 B. Vo, D. Nguyen, and T.-L. Nguyen, “A parallel algorithm for frequent subgraph mining,” in Advanced Computational Methods for Knowledge Engineering. Springer, pp. 163–173, 2015. D. Nguyen, W. Luo, T. D. Nguyen, S. Venkatesh, and D. Phung, “Learning graph representation via frequent subgraphs,” in Proceedings of the 2018 SIAM International Conference on Data Mining. SIAM, pp. 306–314, 2018.