About

Researchers in various areas (e.g., defect prediction, sentiment mining, developer social networks) analyze software projects to develop new ideas or validate their assumptions about software development processes. Such research consists of two fundamental steps:

  1. the collection of project data, including pre-processing steps and the synthesis of intermediate results; and
  2. the analysis of this data.

SmartSHARK is an ecosystem for mining software repository data that integrates these two steps into a single environment. The concept behind SmartSHARK is to build the tools for collecting, processing, and analyzing data around a single database. All collected data, intermediate results, and analysis results are stored in this central location.

The first major advantage of this approach is the strong synergy between the different tools in the SmartSHARK ecosystem. For example, the results of a source-code-based refactoring detection tool can be used to improve the labeling of commits, which is less accurate when it relies on the commit messages alone.

The second major advantage of SmartSHARK is the high replicability of work performed within the ecosystem. The complete software stack is available as well-documented open source software, and all tools for the experimental pipeline are shared with the community. Moreover, the plug-in system of SmartSHARK makes it easy to extend, not only for us but also for other researchers. Since the plug-ins are in essence command line tools, they can also be used outside the SmartSHARK ecosystem, which further increases their reusability.

SmartSHARK is actively developed by the AI Engineering group from the Institute of Software and Systems Engineering at the TU Clausthal and the Software Engineering for Distributed Systems group from the Institute of Computer Science at the Georg-August-University Göttingen.