Please find below the answers to the reviewers' comments.

Reviewer A:

1. In algorithm 1 line 12, the method C.getDuration returns a list of durations for each POI type and activity extracted by census data. I wonder whether
census data could be an adequate source for this kind of information. Usually these data are not very detailed.
Answer: We first tried to extract the activity durations from the tweets themselves: we took the sequence of tweets posted by an account while performing an activity and used the difference between the post times of the tweets as the duration. However, the dataset did not contain enough tweets from the same accounts. We therefore chose the American Time Use Survey (https://www.bls.gov/tus/), in which respondents fill in a questionnaire about the activities they perform each day, including their duration. Although the survey was conducted in a different country, where people's habits may differ, we found similar durations for common activities.
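
As an illustration only (this is not the code used in the paper), the tweet-based attempt described above could be sketched as follows. The tuple layout, the function name, and the choice of taking the first and last tweet of a run are assumptions made for the example.

```python
from datetime import datetime
from itertools import groupby


def estimate_durations(tweets):
    """Estimate activity durations from runs of tweets by the same account.

    tweets: list of (user, activity, posted_at) tuples (hypothetical layout).
    Returns (user, activity, minutes) for every run of at least two
    consecutive tweets by the same user about the same activity.
    """
    # Sort by user, then by post time, so runs of one activity are adjacent.
    ordered = sorted(tweets, key=lambda t: (t[0], t[2]))
    durations = []
    for (user, activity), run in groupby(ordered, key=lambda t: (t[0], t[1])):
        times = [t[2] for t in run]
        if len(times) >= 2:  # a single tweet gives no duration
            minutes = (times[-1] - times[0]).total_seconds() / 60
            durations.append((user, activity, minutes))
    return durations


# Invented sample data, for illustration only.
sample = [
    ("u1", "eating", datetime(2016, 5, 1, 12, 0)),
    ("u1", "eating", datetime(2016, 5, 1, 12, 45)),
    ("u1", "shopping", datetime(2016, 5, 1, 14, 0)),
    ("u2", "eating", datetime(2016, 5, 1, 13, 0)),
]
result = estimate_durations(sample)
```

In this toy input only one run contains two tweets, which mirrors the problem reported above: most accounts contribute a single tweet per activity, so no duration can be derived for them.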

2. In algorithm 2 line 1, in order to compute sub-stops the CB-SMoT algorithm is used but in the previous page (page 7 line 1) the authors state that
"such a method is not able to distinguish different sub-stops inside a single place". This seems a contradiction: the authors should explain how they 
apply CB-SMoT and why it is adequate. Moreover, I suggest the authors to improve the explanation of lines 7-9.
Answer: We clarified the approach in the text.

3. Functions getFrequency, getSimTime, getSimDuration, getRankedActivities returns SETS of values since they apply TimeSim, DurationSim and ModelSim to
each activity performed in the given POI type. This is not clear from the description of the algorithm. 
Answer: We changed the description of the algorithm to state that these are indeed sets. 

4. As far as the complexity of algorithm 2 is concerned, in my opinion it is O(n_s * n^2_{sub}) instead of  O(n_s + n^2_{sub}) since for each stop,
sub-stops are considered. 
Answer: We had taken n_sub to be the total number of sub-stops, hence the plus sign. We changed the text so that n_sub denotes the number of sub-stops at each stop.
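
The two readings can be related by a short derivation. This is a sketch that assumes the work done at each stop is quadratic in the number of sub-stops of that stop, as the reviewer's bound suggests; the symbols m_i and N are introduced here only for illustration.

```latex
% n_s : number of stops
% m_i : number of sub-stops of stop i (hypothetical symbol)
% N = \sum_{i=1}^{n_s} m_i : total number of sub-stops
%
% Reading 1: the bound is expressed in the total number of sub-stops N.
\sum_{i=1}^{n_s} \left(1 + m_i^2\right)
  = n_s + \sum_{i=1}^{n_s} m_i^2
  \le n_s + N^2
%
% Reading 2: n_{sub} bounds the number of sub-stops of each stop, m_i \le n_{sub}.
\sum_{i=1}^{n_s} \left(1 + m_i^2\right)
  \le n_s \left(1 + n_{sub}^2\right)
  = O\!\left(n_s \cdot n_{sub}^2\right)
```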

5. Furthermore, the complexity of CB-SMoT should be reported: it is the first step of the algorithm T-Activity and it could impact on the overall complexity.
Answer: We clarified the approach in the text and added the complexity of the algorithm that was used to find the sub-stops.

6. Section 4.3 is difficult to understand. In Definition 4.3 functions R_f, R_d and R_a are not explained and it is not clear how they are computed.
Answer: We changed the examples to better explain the section. We changed the text to clarify how the functions R_f, R_d and R_a are computed.

7. Algorithm 3, Filter Encounters, uses a data structure FilteredEncounters that is strange. It is initialized by an empty list (line 1) but then it is
used as an indexed structure (line 8, 10, 12, 17). In my opinion this algorithm should be rewritten in a more formal way. It is really difficult
to understand what it computes.
Answer: The algorithm receives an encounter as input and outputs groups of individuals that have a strong relationship, as defined by the connections in the graph. FilteredEncounters is indeed a list indexed by group: each element of this list is itself a list of participants. We detailed this in the pseudo-code.
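
To make the list-of-lists structure concrete, here is a minimal sketch. It is not Algorithm 3 from the paper: it assumes that a "strong relationship" is given as a set of pairs and that a group is a connected set of participants, and it drops participants with no strong relationship. All names are hypothetical.

```python
def filter_encounter(participants, strong_pairs):
    """Split the participants of one encounter into related groups.

    participants: list of participant ids.
    strong_pairs: set of frozenset({a, b}) pairs with a strong relationship
                  (assumed representation of the graph connections).
    Returns a list of groups; each group is a list of participant ids.
    """
    filtered_encounters = []          # list indexed by group
    unvisited = set(participants)
    for person in participants:
        if person not in unvisited:
            continue
        unvisited.discard(person)
        group, frontier = [person], [person]
        while frontier:               # grow the group along strong connections
            current = frontier.pop()
            for other in participants:
                if other in unvisited and frozenset((current, other)) in strong_pairs:
                    unvisited.discard(other)
                    group.append(other)
                    frontier.append(other)
        if len(group) > 1:            # assumption: isolated participants are dropped
            filtered_encounters.append(group)
    return filtered_encounters


# Invented example: a-b and b-c are connected, d-e are connected, f is alone.
pairs = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("d", "e")]}
groups = filter_encounter(["a", "b", "c", "d", "e", "f"], pairs)
```

Here groups[0] and groups[1] are the two groups, which is the indexed use of the list that the reviewer pointed out.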

8. I suggest the authors to add a simple example in order to clarify the result of the computation. 
Answer: We changed the description of the algorithm to reflect the change in question 7, and we added a textual example as asked.

9. Algorithm 4, G-Activity, is not easy to understand. I suggest the authors to try to improve also the description of this algorithm, and
provide a simple example that can be of great help for the reader.
Answer: We changed the description of the algorithm and added an example to make it easier to understand.

10. Additionally, please discuss the relation with the previous paper  "Lucas Andre de Alencar, Luis Otávio Alvares, Chiara Renso, Alessandra
Raffaetà, Vania Bogorny: A Rule-based Method for Discovering Trajectory Profiles. SEKE 2015: 244-249" where POI type profiles were also used. [Note: to do, one paragraph in the related work.]
Answer: That article does not use POI type profiles, but profile rules for POI types, which are used to infer visits and the possible profiles of the users who visit these places. Also, that approach does not involve activities. We added a reference to this work in the introduction to briefly explain what it does.

Reviewer B:

I only have few comments as a suggestion to improve the paper:

1. At page 2 the sentence: "Large sets of semantic trajectories in conceptual models such as Constant and Baquara enable the derivation of more
information, such as the frequency of the visits to each POI, and the sequence of visited POIs. ": I disagree, CONSTANT and BAQUARA enrich the
trajectory data with semantic information, not with the frequency or the sequence of POIs. I think this sentence should be rephrased.
R: We changed the text to reflect the reviewer’s suggestion.

2. Page 2 later on the sentence: "Furthermore, there are different ways to identify group activities, such as a group of individuals who do not know
each other but by chance are performing the same activity at the same place, or a group of individuals that know each other and are performing an
activity together. " It is confusing please rephrase giving a precise definition of what you intend for group activity.
R: Here the idea is not to define group activities, but to show that the concept can be interpreted in different ways. In section 4.3 we added a definition of group activity.

3. Pg 2. later on the sentence:"(i) we build a knowledge base with POI type profiles based on activities observed in tweets sent from each POI type; " I
disagree that the knowledge base is a contribution of the paper, it is part of the method you propose to infer the activities
R: We changed the text to reflect the reviewer’s suggestion.

4. I suggest, if possible, to make it public the datasets and the algorithms, this would be a great additional contribution of the paper encouraging many citations.
R: We will make the code (written in Python and R), the ATUS dataset, and the Pisa dataset available on Vania’s page. Unfortunately, we cannot make the Florianópolis dataset public for privacy reasons.

5. In the definition of semantic trajectory pg 4: I disagree with the definition of semantic trajectory. Not all the stops can be associated to a
POI, while you are assuming each stop is associated to a POI. I think you should clarify this difference in the definition. You can say that in your 
definition you consider ONLY the stops that have been associated to a POI, you should specify.
R: We changed the text before the definition of semantic trajectory to state that we consider only the stops associated with POIs.

6. Pg 6 and 7 I got confused with the references for the stop and sub-stop computations: you say Palma with CB-SMOT compute stops but not sub-stops,
while Moreno compute sub-stops. But then I see only references to CB-SMOT. Please check that the references are correct and explain the difference
between the two methods. 
R: We checked the references and improved the explanations of the algorithms that were used.

7. The group activity part lacks a definition of what you intend by group activity. Figure 2 and 3 do not help.
R: We provided a definition for group activity at the beginning of section 4.3.

8. The experimental evaluation is organized by datasets, not very intuitive, I would reorganize by problem solved (multiple activity or group activity). 
R: We restructured the section to describe the experiments by problem.

Reviewer C:

1. In the introduction, the authors should better describe their approach. The author mention "we present a solution to recognise activities ... in
their trajectories, based on tweets sent from the visited POIs." However, from this fragment, the reader would infer that trajectory data are built
from twitter data, which is not true. In addition, nothing is said in the introduction about the use of census data (ATUS).
R: We clarified the text and added information about the census data.

2. It is not clear how twitter messages are used... is the text considered to infer location?
R: We do not use the text to infer the location, since the location is already given by the coordinates of the tweet, which correspond to Foursquare POIs. We identify these tweets as those whose source attribute (https://dev.twitter.com/overview/api/tweets) is equal to foursquare. The text, along with other features, is used to infer activities, as described at the beginning of section 5.1.
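
A minimal sketch of this selection step is shown below. It is not the code used in the paper. It assumes tweets are available as dictionaries in the Twitter API JSON layout, where source is an HTML-formatted string and coordinates is a GeoJSON point in [longitude, latitude] order; the function names and the output layout are hypothetical.

```python
import re


def source_name(tweet):
    """Return the tweet's source with the HTML anchor markup stripped."""
    return re.sub(r"<[^>]+>", "", tweet.get("source") or "").strip().lower()


def foursquare_checkins(tweets):
    """Keep geotagged tweets posted through Foursquare.

    Returns dictionaries with the text (used later to infer activities)
    and the latitude/longitude taken from the tweet coordinates.
    """
    checkins = []
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if source_name(tweet) == "foursquare" and coords:
            lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
            checkins.append({"text": tweet["text"], "lat": lat, "lon": lon})
    return checkins


# Invented sample tweets, for illustration only.
sample_tweets = [
    {"text": "Lunch time",
     "source": '<a href="http://foursquare.com" rel="nofollow">Foursquare</a>',
     "coordinates": {"type": "Point", "coordinates": [10.40, 43.72]}},
    {"text": "Hello",
     "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
     "coordinates": None},
]
checkins = foursquare_checkins(sample_tweets)
```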

3. Although the paper discusses related work, the comparison to existing work could be improved. A table comparing the works based on some selected
characteristics might be useful;
R: We tried to describe the related works in a short and precise way. As the paper is limited to 16 pages, and we already had to cut and rearrange the experiments, we did not include a table to compare the characteristics of the related work. However, a more detailed explanation can be found in (Beber, 2017). 

4. Equation (3): You should keep the same names for functions D and T as presented in previous equations. 
R: We renamed the functions D and T to DurationSim and TimeSim, respectively.

5. Moreover, I am not sure that the use of frequency as it is proposed is the best solution. The authors should better justify this solution and the possible drawbacks.
R: Indeed, using the frequency can lead to low scores for unbalanced activities, but it also reflects the true distribution of the activities at a given POI type. We added a sentence at equation (4) to better explain the solution and its drawbacks. We also added, as future work, replacing the frequency with a probability density function.
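
The drawback can be seen on a toy example. The sketch below assumes, only for illustration, that the frequency of an activity is its share of the tweets observed at a POI type; it does not reproduce equation (4), and the data are invented.

```python
from collections import Counter


def activity_frequencies(activities):
    """Relative frequency of each activity label observed at one POI type."""
    counts = Counter(activities)
    total = sum(counts.values())
    return {activity: count / total for activity, count in counts.items()}


# Invented observations at one POI type: nine "eating" tweets, one "working".
observed = ["eating"] * 9 + ["working"]
freqs = activity_frequencies(observed)
```

The rare activity receives a score of about 0.1 even when a given visit matches it well on time and duration, which is the imbalance effect mentioned above; at the same time, the scores follow the observed distribution of activities at that POI type.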

6. Equation (4): its components are not sufficiently described in the text.
R: We changed the text to better describe the components.

7. The authors should better provide more details on the process of building the knowledge base from twitter data.
R: We included more details in section 5.1 about the feature selection and the data splitting and evaluation.