We thank the reviewers for their suggestions and comments. Since our previous submission, we have run more experiments (with more datasets and longer maximum running times) and made a slight modification to the DFS initialization heuristic (to improve its performance). This new version of the manuscript contains the modifications suggested by the reviewers and the results with an extended collection of data sets; it also contains the modified version of the DFS heuristic. In particular, we have:

>> Moved the definition of parental graph from Sec. 2 to Sec. 5, where it is required.
>> Modified Subsection 5.1 to describe the new, slightly modified form of the DFS-based informed heuristic (which performed better than the previous form).
>> Improved the explanation of the FAS-based heuristic based on Figure 2.
>> Included more experiments, with 20 data sets and a maximum of 2 minutes for parent set selection per variable.

Below you can find our responses to the specific comments made by the reviewers. The original reviews are preceded by - and our replies are preceded by >>.

------------------------------------------------------
Reviewer A
------------------------------------------------------

- The information contained in the abstract is rather vague. With a few more sentences the authors could convey more precise information about what is the characteristic feature of the "informed heuristic" they propose.

>> We extended the abstract with more precise information about our contributions.

- Although the presentation of the problem given in the introduction is in general good, there are some statements that may be misleading. For instance: "An effective approach for BN learning is to resort to local search methods that find an approximate solution". This is true but does not reflect the whole picture. Finding approximate solutions to the BN learning problem is a must, since the exact problem is prohibitive. But local methods are only one of the possibilities for finding good BN approximations. There is an impressive amount of work on population-based heuristics for learning BNs (see the survey at the end of this review), including evolutionary algorithms for finding the best ordering of the nodes. In their analysis, the authors should separate the question of finding an approximate solution from the question of the class of optimizers to use. It is fair to focus on local methods for learning the BN structure, but some motivation should be given for the choice of local methods over population-based methods.

>> We have rephrased that paragraph to give credit to other approaches such as evolutionary algorithms, systematic search, and mathematical programming; we note, however, that to our knowledge local search is still the only approach that has been shown to cope with large domains (say, over 1000 variables).

- Some other statements in the introduction could be more precise, e.g. "...show that our heuristics significantly improve the quality of order-based local search". Which kind of order-based local search? State-of-the-art order-based local search?

>> By order-based local search we mean Teyssier and Koller's approach, which continues to be the state of the art for large domains. We have stressed that in the text.

- One of the weakest parts of the paper is the little attention given to the solution of the min-cost FAS problem, which the authors acknowledge is NP-hard. On Page 9, it is said that there are efficient and effective approximations (no reference is given) and the Minimum Cost FAS approximation (Algorithm 4) is presented. However, no discussion is given about this algorithm, its scalability, its limits, etc. Since this is an essential building block for the second heuristic, more details and a critical analysis of the real capabilities of this algorithm to solve the min-cost FAS problem should be given. References to previous uses of the algorithm on large problems would enhance this part of the paper.

>> We have included references to the literature on minimum-cost FAS, and we have mentioned its complexity.
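>> To give a flavor of the class of approximations we refer to, the sketch below shows a simple greedy heuristic for weighted FAS in the style of Eades, Lin and Smyth (1993), adapted to arc costs. This is an illustration only: the function name and graph representation are ours, and it is not a verbatim transcription of Algorithm 4.

    # Illustrative sketch only; not a verbatim transcription of Algorithm 4.
    def greedy_weighted_fas(n, arcs):
        """n: number of vertices (0..n-1); arcs: dict (u, v) -> positive cost.
        Returns (order, fas): a vertex order and the set of arcs pointing
        backwards in that order (a feedback arc set)."""
        succ = {v: {} for v in range(n)}
        pred = {v: {} for v in range(n)}
        for (u, v), w in arcs.items():
            succ[u][v] = w
            pred[v][u] = w
        left, right = [], []        # prefix and (reversed) suffix of the order
        alive = set(range(n))
        while alive:
            changed = True
            while changed:          # peel off sinks and sources: they are free
                changed = False
                for v in list(alive):
                    if all(t not in alive for t in succ[v]):   # sink: place last
                        right.append(v); alive.discard(v); changed = True
                for v in list(alive):
                    if all(s not in alive for s in pred[v]):   # source: place first
                        left.append(v); alive.discard(v); changed = True
            if alive:               # greedy step: largest out-cost minus in-cost
                def delta(v):
                    out_w = sum(w for t, w in succ[v].items() if t in alive)
                    in_w = sum(w for s, w in pred[v].items() if s in alive)
                    return out_w - in_w
                v = max(alive, key=delta)
                left.append(v); alive.discard(v)
        order = left + right[::-1]
        pos = {v: i for i, v in enumerate(order)}
        return order, {(u, v) for (u, v) in arcs if pos[u] > pos[v]}

>> The heuristic builds a vertex order greedily and returns the arcs that point backwards in that order; removing them necessarily yields an acyclic graph, and the total cost of the removed arcs is what the heuristic tries to keep small.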
- The statistical analysis of the experiments should be improved for the paper to be considered for publication. Using alpha = 0.10 seems a very odd choice for testing significance. The usual value is alpha = 0.05, and more rigorous testing requires alpha = 0.01. Furthermore, presenting the values of the statistics (Q) instead of directly presenting the corresponding p-values obtained from the test is rather unusual and confusing. I recommend repeating the statistical analysis for a stricter value of alpha. Similarly, for the post-hoc Nemenyi test, alpha = 0.05 or alpha = 0.01 is recommended. Furthermore, it is not clear whether the critical distance includes a penalization for multiple comparisons (similar to the Bonferroni test). Similarly, in the subsequent tests, instead of providing the Q values, provide the corresponding significance associated with these Q values.

>> We adopted the significance level of 0.05 for deciding statistical validity, and we have reported p-values; in most cases, even a significance level of 0.01 would not alter the conclusions.

>> The Friedman and Nemenyi tests already include corrections for multiple comparisons (e.g., this is the constant q_alpha in the Nemenyi test). One could additionally consider a Bonferroni correction (or some other, more powerful correction) to account for the fact that we ran 4 comparisons; this would alter our conclusions only marginally (as it would be equivalent to adopting a significance level of 0.05/4 = 0.0125). However, we understand that the first 3 tests actually evaluate the parent set selection (not our contribution), whereas the last experiment (which aggregates all parent set selection approaches) is the one that evaluates the initialization heuristics; thus we do not apply a correction for this part.

>> We have updated the visualizations of the Nemenyi test for alpha = 0.05. Note that the Nemenyi test may reject even when the Friedman test accepts (this happens in the sequential parent set selection experiment).
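>> For concreteness, the analysis follows the pipeline of Demsar (2006): a Friedman test over the ranks of the algorithms on the data sets, followed by the Nemenyi post-hoc test with critical difference CD = q_alpha * sqrt(k(k+1)/(6N)), where k is the number of compared algorithms and N the number of data sets. The sketch below illustrates the computation; the scores and dimensions are placeholders, not our actual results.

    # Illustrative sketch of the Demsar (2006) pipeline; placeholder data.
    import numpy as np
    from scipy.stats import friedmanchisquare, rankdata

    # q_alpha for the two-tailed Nemenyi test at alpha = 0.05, indexed by
    # the number of compared algorithms k (Demsar 2006, Table 5).
    Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}

    def nemenyi_cd(k, n_datasets):
        # Two average ranks further apart than this differ significantly.
        return Q_ALPHA_005[k] * np.sqrt(k * (k + 1) / (6.0 * n_datasets))

    # scores[i, j]: score of algorithm j on data set i (placeholder numbers).
    scores = np.random.rand(20, 4)                # 20 data sets, 4 algorithms
    _, p_value = friedmanchisquare(*scores.T)     # we report the p-value, not Q
    ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, scores)
    avg_ranks = ranks.mean(axis=0)                # rank 1 = best (higher is better)
    cd = nemenyi_cd(k=4, n_datasets=20)
    # Algorithms i and j differ significantly iff |avg_ranks[i] - avg_ranks[j]| > cd.

>> Note that the q_alpha constant already accounts for the k(k-1)/2 pairwise comparisons within a single Nemenyi test; the Bonferroni correction discussed above would apply across the 4 separate experiments.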
- Reference: "LarraƱaga, Pedro; Karshenas, Hossein; Bielza, Concha; Santana, Roberto; ",A review on evolutionary algorithms in Bayesian network learning and inference tasks,Information Sciences,233,,109-125,2013,Elsevier >> We included a comment about evolutionary algorithms, and added the above reference in the introduction. ------------------------------------------------------ Reviewer D ------------------------------------------------------ Authors can improve their paper by: - correcting minor writting problems; - Evaluating their approach using metrics and databases used by other authors in similar problems >> We used the sama real-world data sets and score function used in a previous study on local search algorithms publised at NIPS 2015 (one of the top conferences in the field). Our evaluation methodology is also standard in machine learning (see Demsar 2006). We have note these facts in the text. - Authors need to compare their results with other similar works, such as: Pedro Larrafiaga, Mike Poza, Yosu Yurramendi, Roberto H. Murga, and Cindy M.H. Kuijpers Structure Learning of Bayesian Networks by Genetic Algorithms: Performance Analysis of Control Parameters. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 18, NO. 9, SEPTEMBER 1996 >> We have better justified the use of local search, and cited the work above. As far as we know, genetic algorithms for structure learning do not scale for domains with thousands of variables; at least no work which we know have shown results for such domains. We note that developing genetic algorithms is far from trivial, and that there are no available packages which we could use. - There are some typos and ortographic issues, such as missing prepositions. ------------------------------------------------------ Reviewer E ------------------------------------------------------ - The recommendations are included as comments in the uploaded file. >> We implemented the suggestions of the reviewer. In particular, we swaped the last two paragraphs of the introduction, we included a comment on small/large datasets, and corrected the spotted typos. The differences with respect to the KDMILE paper were noted in the introduction; we did not repreat them in the conclusions due to the limited space.