A novel technique for duplicate detection and classification of bug reports

Tao Zhang, Byungjeong Lee

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Software products are increasingly complex, so finding and correcting bugs in large programs is becoming more difficult. Software developers rely on bug reports to fix bugs; thus, bug-tracking tools have been introduced to allow developers to upload, manage, and comment on bug reports to guide corrective software maintenance. However, the very high frequency of duplicate bug reports means that the triagers who help software developers eliminate bugs must devote large amounts of time and effort to identifying and analyzing these reports. In addition, classifying bug reports can help triagers arrange bugs into categories for the fixers who have more experience in resolving historical bugs in the same category. Unfortunately, because a large number of bug reports are submitted every day, classifying them manually increases the triagers' workload. To resolve these problems, in this study we develop a novel technique for automatic duplicate detection and classification of bug reports, which reduces the time and effort that triagers spend on bug fixing. The technique uses a support vector machine to check whether a new bug report is a duplicate. A concept profile is also used to classify the bug reports into related categories in a taxonomic tree. Finally, we conduct experiments that demonstrate the feasibility of the proposed approach using bug reports extracted from the large-scale open source project Mozilla.
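
The SVM-based duplicate check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features or kernel the paper uses, so the Jaccard token-overlap features, the `summary` and `description` field names, the toy reports, and the small linear SVM trained by sub-gradient descent on the hinge loss are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts' token sets (0.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pair_features(r1, r2):
    """Hypothetical feature vector for a pair of reports (an assumption)."""
    return [
        jaccard(r1["summary"], r2["summary"]),
        jaccard(r1["description"], r2["description"]),
    ]


def decision(w, b, x):
    """Signed score of the linear model; positive means 'duplicate'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM: sub-gradient descent on L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, b, xi) < 1
            # Regularisation step: shrink the weights slightly.
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss step: move toward satisfying the margin.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


# Toy reports, invented for this example only.
reports = {
    "a": {"summary": "Crash when opening bookmarks menu",
          "description": "Firefox crashes after clicking the bookmarks menu"},
    "b": {"summary": "Browser crash on opening bookmarks menu",
          "description": "Clicking the bookmarks menu crashes Firefox"},
    "c": {"summary": "Print preview shows blank page",
          "description": "Selecting print preview renders an empty page"},
}
# Labelled pairs: +1 = duplicate, -1 = not a duplicate.
pairs = [("a", "b", 1), ("a", "c", -1), ("b", "c", -1)]
X = [pair_features(reports[i], reports[j]) for i, j, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_linear_svm(X, y)

new_report = {"summary": "Crash opening the bookmarks menu",
              "description": "Firefox crashes when the bookmarks menu is clicked"}
for key, old in reports.items():
    score = decision(w, b, pair_features(new_report, old))
    print(key, "duplicate" if score > 0 else "not duplicate")
```

In practice a triage tool would compare a new report against many historical reports and would likely use richer textual features; a library SVM implementation would replace the hand-written trainer shown here.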

Original language: English
Pages (from-to): 1756-1768
Number of pages: 13
Journal: IEICE Transactions on Information and Systems
Issue number: 7
State: Published - Jul 2014


Keywords

  • Bug report classification
  • Concept profile
  • Duplicate detection
  • Software maintenance
  • Support vector machine

