|Title:||Data Mining Based Motif Detection In Biological Sequences|
|Keywords:||Motif Discovery, Frequent Pattern, FP-Growth, DNA, Protein, Bioinformatics|
This paper considers the problem of discovering motifs in DNA and protein sequences. Motif finding has important applications in understanding gene regulation, identifying protein families, and determining functionally and structurally important identities. Biological approaches to this problem are long-winded, complex, and time-consuming. Here, we have developed a data mining method to detect frequent residue motifs. The proposed method is based on the FP-tree and FP-growth algorithms from frequent pattern mining. The FP-tree based method overcomes the limitation imposed by the iterative nature of the existing Apriori based method. We have also developed a tool based on the proposed method that can expeditiously detect novel motifs based on information content and that performs better than the existing Apriori based method. Experimental results show that the new method successfully elucidates true motifs in real biological sequence datasets, which supports the effectiveness of the method.
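
To illustrate the FP-tree idea named above, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each sequence is treated as a transaction whose items are its distinct k-mers; the abstract does not say how sequences are encoded. The names `kmers`, `Node`, and `build_fp_tree` are hypothetical. The sketch shows why the tree avoids Apriori-style repeated candidate generation: the data are scanned twice, once to count item support and once to insert each transaction into a shared prefix tree.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq (assumed encoding)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class Node:
    """One FP-tree node: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from item sets using two passes over the data."""
    # Pass 1: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Pass 2: insert each transaction, keeping only frequent items and
    # ordering them by descending support (ties broken alphabetically)
    # so that common prefixes share nodes.
    root = Node(None, None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Toy DNA sequences, invented for this example only.
sequences = ["ACGTAC", "TACGTT", "GGACGT"]
transactions = [kmers(s, 3) for s in sequences]
root, frequent = build_fp_tree(transactions, min_support=2)
```

In this toy input the 3-mers `ACG` and `CGT` occur in all three sequences and `TAC` in two, so all three transactions share a single path in the tree. A full FP-growth miner would then recurse over conditional trees to enumerate frequent item sets; that step is omitted here.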