VERB Replication Package

Replication package for the paper "API Method Recommendation via Explicit Matching of Functionality Verb Phrases".



Open Source Python Library

We are building an open source Python library, funcverbnet, that provides easy-to-use interfaces to our data and analysis technologies. Any suggestions and feedback are welcome!


We call a functionality sentence an f_sentence.

An f_sentence can be classified into a functionality category (or f_category).

The f_sentences in an f_category share a set of phrase patterns (or p_patterns).
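
The relationships among these terms can be illustrated with a minimal sketch. The class and field names below are ours, chosen only to mirror the terminology above; they are not part of funcverbnet's API, and the example p_pattern is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FSentence:
    text: str        # the functionality sentence (f_sentence)
    f_verb: str      # its functionality verb
    f_category: str  # the f_category it is classified into
    p_pattern: str   # the p_pattern it matches within that category

# A sample f_sentence from the "cancel" category described below.
example = FSentence(
    text="Cancels the speech recognition.",
    f_verb="cancel",
    f_category="cancel",
    p_pattern="cancel [object]",  # hypothetical pattern notation
)
print(example.f_category)  # -> cancel
```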

We conducted an empirical study to understand which f_verbs, p_patterns, and f_categories are present in API f_sentences. Specifically, we focus on answering the following research questions:

RQ1: What verbs are used in the API f_sentences?
  1. sorted_verb_count.txt lists all verbs we collected from f_sentences of Java and Android API methods. There are 931 verbs including the verb "return", of which 87 (9.34%) appear in 80% of the f_sentences; excluding "return" leaves 930 verbs, of which 115 (12.37%) appear in 80% of the f_sentences.
  2. verb_count.xlsx records the number of times each of the 931 verbs appears in f_sentences. Each row has four parts: the verb, its appearance count, its index, and the logarithm of the count. Four figures show the trend of the verb counts.
  3. total_functionality_descriptions.csv contains the 54,256 f_sentences filtered from Java and Android API methods. Each record contains an API's qualified name and its f_sentence.
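
The coverage statistic above (how many of the most frequent verbs account for 80% of f_sentences) can be computed as in the sketch below. The verb counts here are synthetic toy numbers, not the real data from sorted_verb_count.txt.

```python
from collections import Counter

# Toy verb counts standing in for sorted_verb_count.txt (synthetic, not real data).
verb_counts = Counter({
    "return": 500, "get": 400, "set": 300, "create": 100,
    "cancel": 50, "parse": 30, "flush": 10, "unbind": 5,
})

def verbs_covering(counts, fraction=0.8):
    """Smallest number of top verbs whose counts cover `fraction` of all occurrences."""
    total = sum(counts.values())
    covered, n = 0, 0
    for _, c in counts.most_common():
        covered += c
        n += 1
        if covered >= fraction * total:
            return n
    return n

print(verbs_covering(verb_counts))  # -> 3 (get/return/set cover 80% of the toy data)
```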
RQ2: What f_categories can the f_sentences be classified into?
  1. functionality_category.tsv shows the f_categories we summarized. It has four columns: the first lists all verbs of each f_category, the second gives the label verb of each f_category, the third gives the number of verbs the f_category contains, and the last shows three example f_sentences of the category.

    For example, the f_category “cancel/deregister/undo/deny/unset/unschedule/unregister/unbind/uninitialize/unload/deselect/unlock/unblock” has the label verb “cancel” and contains 13 verbs. Three example f_sentences of this category are ‘Cancels the speech recognition.’, ‘Instructs the WebView to cancel the authentication request.’, and ‘Cancels the current editing session.’

  2. annotation_website is a screenshot of our annotation website.

  3. 1,139 f_sentences annotated during the coding protocol definition phase.

  4. 13,635 f_sentences annotated during the coding protocol definition and annotation phases.

  5. 14,774 f_sentences annotated during the coding protocol definition and annotation phases. “single_description” is the sentence and “final_annotation_type” is an id representing its f_category.
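
A file with these two columns can be loaded and grouped by f_category as sketched below. The rows are synthetic stand-ins and the category ids are invented; only the column names “single_description” and “final_annotation_type” come from the description above, and the real file's layout may differ.

```python
import csv
import io

# Synthetic TSV rows mimicking the annotated f_sentence file (ids are made up).
tsv_data = (
    "single_description\tfinal_annotation_type\n"
    "Cancels the speech recognition.\t3\n"
    "Returns the current editing session.\t1\n"
)

rows = list(csv.DictReader(io.StringIO(tsv_data), delimiter="\t"))

# Group sentences by their annotated f_category id.
by_category = {}
for row in rows:
    by_category.setdefault(row["final_annotation_type"], []).append(
        row["single_description"]
    )

print(len(rows))            # -> 2
print(sorted(by_category))  # -> ['1', '3']
```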

RQ3: What p_patterns are used in the f_sentences from each f_category?
  1. p_pattern_for_category.txt shows all p_patterns used in each f_category.
  2. 5-fold cross validation accuracy reports the accuracy of each fold of the cross validation for f_category classification.

  3. poi_test_sentences contains the data for the p_pattern evaluation on POI. The first column is the sentence; the second and third columns are the annotated f_category and p_pattern; the fourth and fifth columns are the automatically predicted f_category and p_pattern; the last two columns indicate whether the automatic f_category and p_pattern are correct.

  4. evaluation_questions_with_answers.xlsx shows the task questions and answers; the group each task belongs to is also recorded in a column of the file. Task1.docx shows the template participants need to fill in. Evaluation_Answer.xlsx shows the data participants recorded in the tasks: the first column is the question's Stack Overflow URL, the second is the question title, and the third is the standard answer. Columns 4~15 indicate whether each participant found the correct answer via the tools; next to them are the total and average number of participants who found the answer using the tools.
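
The correctness columns in poi_test_sentences lend themselves to a simple accuracy computation, sketched below. The rows and column names here are synthetic illustrations of the layout described above, not the real evaluation data.

```python
import csv
import io

# Synthetic rows in the shape described for poi_test_sentences
# (column names are ours; p_pattern notation is hypothetical).
csv_data = (
    "sentence,ann_category,ann_pattern,auto_category,auto_pattern\n"
    "Writes the cell value.,write,write [object],write,write [object]\n"
    "Creates a new sheet.,create,create [object],create,create [object] [prep]\n"
)
rows = list(csv.DictReader(io.StringIO(csv_data)))

# Fraction of rows where the automatic prediction matches the annotation.
cat_acc = sum(r["ann_category"] == r["auto_category"] for r in rows) / len(rows)
pat_acc = sum(r["ann_pattern"] == r["auto_pattern"] for r in rows) / len(rows)
print(cat_acc, pat_acc)  # -> 1.0 0.5
```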