% !TEX root = thesis.tex
\startchapter{Discussion}
\label{chap:disc}
In this chapter we discuss our approach and how the research presented in the last three chapters supports it (Section~\ref{ch:dis:app}).
We then present our second contribution, the individual insights gained from the five studies we presented (Section~\ref{sec:cont:emp}).
Finally, we discuss the threats to validity (Section~\ref{sec:threat}) before ending this thesis with concluding remarks and future work (Section~\ref{ch:dis:con}).

\section{An Approach For Improving Social Interactions}
\label{ch:dis:app}
We derived the approach presented in Chapter~\ref{chap:approach} from two case studies that investigate the usefulness of social and socio-technical networks for predicting build outcomes (Chapters~\ref{chap:soc-net} and~\ref{chap:stc-net2}).
In Chapter~\ref{chap:stc-net} we conducted a study to see whether the approach can generate relevant recommendations.
The studies in the subsequent Chapters~\ref{chap:talk} and~\ref{chap:actionable} further explore the usefulness of this information: whether experts consider recommendations at this level of granularity useful, and whether such recommendations can be produced in real time and potentially prevent issues from arising.
The approach we presented in Chapter~\ref{chap:approach} consists of five steps:

\begin{enumerate}
\item Define scope of interest.
\item Define outcome metric.
\item Build social networks.
\item Build technical networks.
\item Generate actionable insights.
\end{enumerate}

In Chapter~\ref{chap:soc-net} we showed that the communication structure of a software development team influences build success, suggesting that there is value in manipulating this structure to improve the likelihood of a successful build.
This evidence is further supported by our finding that gaps in the social network constructed from developer communication, that is, missing communication between developers who share technical dependencies, also affect build success (Chapter~\ref{chap:stc-net2}).

In those two studies we already applied the first four steps of the approach presented in Chapter~\ref{chap:approach}.
We defined the build as the scope and the build outcome (success or failure) as the outcome metric.
In both Chapters~\ref{chap:soc-net} and~\ref{chap:stc-net2}, we used this scope to construct social networks from the communication among developers that can be related to a build, as described in Chapter~\ref{chap:bg}.
Chapter~\ref{chap:stc-net2} used the dependencies among change-sets committed by developers that are relevant to a given build to construct a technical network that complements the social network, together forming a socio-technical network.
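The two network construction steps can be sketched as follows. This is a minimal Python illustration, not the implementation used in our studies. It assumes hypothetical record tuples of the form (build, work item, comment author) and (build, change-set owner, file path). It treats developers who commented on the same work item as communicating. As in our technical dependency construct, it links developers whose change-sets in the build touched the same file. The function names \texttt{social\_network} and \texttt{technical\_network} are illustrative only.

\begin{verbatim}
from collections import defaultdict
from itertools import combinations


def _co_occurrence_edges(groups):
    """Link every pair of developers that appear together in a group."""
    edges = set()
    for members in groups.values():
        # sorted() gives each undirected pair a canonical (a, b) order
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


def social_network(comments, build_id):
    """Hypothetical input: (build_id, work_item_id, author) tuples.
    Developers commenting on the same work item are assumed to communicate."""
    authors_by_item = defaultdict(set)
    for build, item, author in comments:
        if build == build_id:
            authors_by_item[item].add(author)
    return _co_occurrence_edges(authors_by_item)


def technical_network(changes, build_id):
    """Hypothetical input: (build_id, owner, file_path) tuples.
    Developers whose change-sets touch the same file are dependent."""
    owners_by_file = defaultdict(set)
    for build, owner, path in changes:
        if build == build_id:
            owners_by_file[path].add(owner)
    return _co_occurrence_edges(owners_by_file)


comments = [("b1", "wi7", "alice"), ("b1", "wi7", "bob"),
            ("b2", "wi9", "carol")]
changes = [("b1", "alice", "Build.java"), ("b1", "carol", "Build.java"),
           ("b1", "bob", "Ui.java")]
print(social_network(comments, "b1"))    # {('alice', 'bob')}
print(technical_network(changes, "b1"))  # {('alice', 'carol')}
\end{verbatim}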

The three studies we presented in Part~\ref{part3} of this thesis focused on the last step to \emph{generate actionable insights}.
The study in Chapter~\ref{chap:stc-net} showed that we are able to produce recommendations from available repository data that affect build success.
These recommendations highlight pairs of developers who have a technical dependency but did not communicate in the context of the build.
%These recommendations in the form of recommending developers to communicate changes the structure of the social network, which we intend in order to improve build success.
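Given the two edge sets of a build, the recommendations described above reduce to a set difference. The following minimal Python sketch assumes that each pair is an undirected pair of developer names. The functions \texttt{coordination\_gaps} and \texttt{recommendations}, as well as the message wording, are hypothetical.

\begin{verbatim}
def _canonical(pair):
    """Order an undirected developer pair so (a, b) and (b, a) match."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def coordination_gaps(technical_edges, social_edges):
    """Pairs with a technical dependency but no communication."""
    technical = {_canonical(p) for p in technical_edges}
    social = {_canonical(p) for p in social_edges}
    return sorted(technical - social)


def recommendations(technical_edges, social_edges):
    """Turn each gap into a readable suggestion (hypothetical wording)."""
    return [f"{a} and {b} changed interdependent code for this build "
            f"but have not discussed it yet."
            for a, b in coordination_gaps(technical_edges, social_edges)]


gaps = coordination_gaps({("carol", "alice"), ("bob", "alice")},
                         {("alice", "bob")})
print(gaps)  # [('alice', 'carol')]
\end{verbatim}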

We chose to generate recommendations that encourage developers to communicate, rather than recommendations that suggest code changes altering the dependencies among developers, for two reasons:
(1) appropriate code changes are difficult to suggest without a sufficient understanding of the program, which requires more in-depth program analysis, and
(2) developers need to trust the recommendations, which is easier to achieve if we limit ourselves to suggesting people who are affected by a change.

In Chapter~\ref{chap:talk} we explored developers' views on recommender systems and whether and when recommendations at the change-set level would be appropriate.
Developers generally welcomed such recommendations as long as they were not perceived as irrelevant, corroborating the point made by Murphy and Murphy-Hill~\cite{murphy:rsse:2010}.
Developers do discuss change-sets in general, but especially towards the end of a release cycle, when each change becomes more important for the stability of the overall project.

Through an in-class study with students in Canada and Finland, we investigated whether we can collect the necessary data to compute recommendations at the appropriate time and whether those recommendations could actually prevent builds from failing (Chapter~\ref{chap:actionable}).
We recognize that a student project is substantially smaller in size and complexity than an industrial project such as Rational Team Concert; nevertheless, the students worked on an existing open source project that is used by several companies.
Furthermore, the in-class study gave us the chance to see the possible effect of our recommendations more clearly, because the lower complexity meant that builds failed for fewer reasons, which made the impact of the recommendations easier to observe.

Overall, we provided evidence for the usefulness of our approach by selecting a specific scope and outcome metric (builds and build success, respectively) and by defining the construction of the social and technical networks in detail (see Chapter~\ref{chap:bg}).
Furthermore, we showed that the approach can generate actionable insights that are acceptable to developers.
In a final study involving students from two countries, we found evidence that we are able to generate recommendations early enough to be acted upon, and that such recommendations could actually prevent build failures.


\section{Contributions through Empirical Studies}
\label{sec:cont:emp}
Each study by itself contributed to the overall body of knowledge of software development team coordination.
Below, we present the contribution of each study in the order of our five research questions.


\subsection{Using Build Success as Communication Quality Indicator}
\label{subsec:practicalimpl}
%With this study we gave empirical evidence of communication among software developers influencing software quality.
%Although by itself not surprising that issues in communication can hinder productivity and introduce ambiguities that might lead to problems with respect to software quality, it is, to the best of our knowledge, the first study that instead of looking into content of individual  conversations takes a higher level approach and relates communication structures to software quality.
%
We started our investigation by exploring whether there exists a relationship between build success and communication by using prediction models (Chapter~\ref{chap:soc-net}).
Our models can be used by Jazz teams to assess the quality of their current
communication in relation to the result of their upcoming integration. If a team
is currently working on a component and an integration build is planned in the
near future, the measures of the current communication in the team can be
provided as input to our prediction model, which then predicts whether the
build will fail with the precision shown in Table~\ref{tab:PredictionResultTable}.
For example, if team P is working towards a build and our model predicts that the
structure of its current communication will lead to a failed build, the team can be
76\% confident (see Table~\ref{tab:PredictionResultTable} in Chapter~\ref{chap:soc-net}) that the build is
going to fail. This information can be used by developers in monitoring their
team communication behaviour, or by management in decisions with respect to
adjusting collaborative tools or processes towards improving the integration.
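Assuming that the 76\% figure is the model's precision for the \emph{failed} class, this interpretation follows from the standard definition of precision, restated here in our own notation:
\[
  \mathrm{precision}_{\mathrm{fail}} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}},
\]
where $\mathit{TP}$ is the number of builds predicted to fail that did fail, and $\mathit{FP}$ is the number of builds predicted to fail that succeeded.
A precision of $0.76$ thus means that 76\% of the builds the model flagged as failing actually failed in the evaluation data.
It does not state how many failing builds the model missed; recall captures that.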

\subsection{Unmet Coordination Needs Matter}
% stc and build success
The relationship between communication structure and build failures, although significant, has only a small effect on the overall success rate of software builds, the outcome metric we studied.
This led us to include information about the system by adding the technical dependencies among software developers as expressed in the source code.
Building on findings in the research area of socio-technical congruence, we hypothesized that technical relationships help to zero in on the relationships among developers that are important for build failures.
As the relationship between socio-technical congruence and productivity suggested an influence on software quality as well, we showed in Chapter~\ref{chap:stc-net2} that socio-technical congruence indeed predicts build failures, with accuracy varying by the type of build.
Thus, not meeting the coordination needs demanded by technical dependencies among software developers has a negative effect on build success.
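One way to make unmet coordination needs concrete is the ratio commonly used in socio-technical congruence research. We restate it here in our own illustrative notation.
For a build, let $N$ be the set of developer pairs with a coordination need, that is, a technical dependency between their changes. Let $C$ be the set of pairs that actually communicated. Then
\[
  \mathrm{STC} = \frac{|N \cap C|}{|N|}, \qquad G = N \setminus C,
\]
where $G$ is the set of unmet coordination needs, and $\mathrm{STC}$ is undefined for builds without any coordination needs.
For instance, if a build involves three technically dependent pairs and only one of them communicated, then $\mathrm{STC} = 1/3$ and $|G| = 2$.
The pairs in $G$ are the candidates that our recommendations target.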

\subsection{Developers That Induce Build Failures}
\label{sec:implications}
% failure inducing pairs
Being able to predict whether a build fails already helps developers to plan ahead with respect to future work, such as stabilizing the system in contrast to working on new features, but ultimately we want to be able to prevent builds from failing.
To that purpose we need to influence the socio-technical network such that it takes a structure that is more favourable to build success.
We found that certain constellations within a socio-technical network, more precisely pairs of software developers and their respective relationships, seem to correlate with build success (Chapter~\ref{chap:stc-net}).
This evidence can be used to recommend actions before a build starts: developers can, for example, investigate their relationship by discussing the code changes that created a technical dependency between them.

Thus, our findings have several implications for the design of collaborative systems.
By automating the analyses presented here we can incorporate the knowledge about
developer pairs that tend to be failure related in a real-time recommender
system. Not only do we provide the recommendations that matter to the upcoming
build, we also provide incentives to motivate developers to talk about their
technical dependencies.
Such a recommender system can use project historical data to
calculate the likelihood that an upcoming build fails given a particular
developer pair that worked on that build without communicating with each other.
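A naive version of this estimate counts, for each developer pair, how often builds failed when the pair worked on them without communicating. The sketch below is a simple frequency estimate under an assumed input format. It is not the statistical analysis of Chapter~\ref{chap:stc-net}. The \texttt{min\_builds} threshold is a hypothetical guard against drawing conclusions from pairs that appear in very few builds.

\begin{verbatim}
from collections import defaultdict


def pair_failure_rates(history, min_builds=1):
    """Estimate P(build fails | pair worked on it without communicating).

    history: iterable of (failed, silent_pairs), one entry per past build,
    where silent_pairs lists developer pairs with a technical dependency
    but no communication in that build (hypothetical input format).
    Returns {pair: (failure_rate, number_of_builds)}.
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [failed, all builds]
    for failed, silent_pairs in history:
        # Count each unordered pair at most once per build.
        for pair in {tuple(sorted(p)) for p in silent_pairs}:
            counts[pair][1] += 1
            if failed:
                counts[pair][0] += 1
    return {pair: (fails / total, total)
            for pair, (fails, total) in counts.items()
            if total >= min_builds}


history = [
    (True, [("bob", "alice")]),
    (False, [("alice", "bob")]),
    (True, [("alice", "bob"), ("carol", "dave")]),
]
rates = pair_failure_rates(history, min_builds=2)
print(rates)  # {('alice', 'bob'): (0.6666666666666666, 3)}
\end{verbatim}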

For management, such a recommender system can provide details about the
individual developers in, and properties of, these potentially problematic
developer pairs. Individual developers may be an explanation for the behaviour of
the pairs we found in Rational Team Concert. This may indicate developers that are
harder to work with or too busy to coordinate appropriately, prompting management
to reorganize teams and workloads. This could reduce the likelihood of a build
failing by removing the underlying cause that makes a pair failure-related.
Similarly, as another example from our study, most developer pairs
consisted of developers that were part of different teams. In such
situations management may decide to investigate reasons for coordination
problems that include factors such as geographical or functional distance in the project.

\subsection{Recommender System Design Guidelines}
\label{sec:sub:tools}
% talk or not to talk
In our first qualitative study (Chapter~\ref{chap:talk}) we explored whether developers would accept recommendations produced by our approach.
It turns out that developers are generally open to low-level recommendations, such as recommendations on a change-set basis, but that this openness depends on external factors such as the development process.
For instance, we found that the closer a development team is to a software release, the more its members focus on the implications of individual changes, whereas at the beginning of a release cycle developers focus more on high-level reusability issues.

Nakakoji et al.~\cite{nakakoji2010:rdc} formulated nine design guidelines for systems that support information seeking in software teams. Some of them deal with minimizing the interruptions experienced by the developers who are asked for information, while others refer to enabling the information seeker to contact the right people. Our findings help us refine Nakakoji et al.'s guidelines:

\paragraph{Guideline \#1} \emph{Recommender systems should adjust to the development mode.}
Our first finding strongly suggests that a developer's information needs can dramatically change between development modes.
%
When in normal iteration mode, developers act upon planned work and can therefore anticipate the information they need, but in endgame mode, developers react to unplanned incoming work, such as bug reports or requests for code reviews.

Many tools, such as Codebook~\cite{begel:icse:2010} and Ensemble~\cite{xiang:rsse:2008}, provide information and recommendations in a fixed way.
Codebook enables developers to discover other developers whose code is related.
In contrast, Ensemble provides a constant stream of potentially relevant events for each developer.
In the case of Codebook, this might lead to extra overhead in endgame mode, when developers frequently need to search for information instead of having it provided automatically, whereas Ensemble might overload developers during feature development by providing a constant stream of information.

To avoid overwhelming developers and to further reduce their overhead, recommender systems should either adjust automatically to the development mode or offer customizable templates that can easily be switched.
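A minimal Python sketch of such switchable templates follows. The two modes mirror the iteration and endgame modes described above. The template fields and their values are hypothetical and would need tuning for a given team. Recommendations are pushed to developers only in endgame mode, where developers react to unplanned work. During normal iterations they are capped more tightly to avoid a constant stream of information.

\begin{verbatim}
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ITERATION = "iteration"  # planned feature work
    ENDGAME = "endgame"      # stabilization before a release


@dataclass(frozen=True)
class Template:
    push: bool       # deliver proactively instead of on request
    max_items: int   # cap on recommendations shown at once


# Hypothetical defaults: pull-based and sparse while developers work on
# planned features, push-based and fuller during the endgame.
TEMPLATES = {
    Mode.ITERATION: Template(push=False, max_items=3),
    Mode.ENDGAME: Template(push=True, max_items=10),
}


def deliver(recommendations, mode):
    """Return (push?, items to show) according to the active template."""
    template = TEMPLATES[mode]
    return template.push, list(recommendations)[: template.max_items]


print(deliver(range(20), Mode.ITERATION))  # (False, [0, 1, 2])
\end{verbatim}

Switching the active mode could be done manually or, for instance, be derived from the release schedule.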

\paragraph{Guideline \#2} \emph{Recommender systems should account for perceived knowledge of other developers.}
Our second and third findings revealed factors unrelated to the code that trigger developers to seek information about a change-set.
Instead, developers pay close attention to the experience level as well as the quality of previously delivered work to determine whether to talk to the change-set owner.

Traditional recommender systems in software engineering focus on the source code to determine useful recommendations, e.g. Codebook~\cite{begel:icse:2010} and Ensemble~\cite{xiang:rsse:2008}.
This might lead to providing developers with information about changes that are of little interest to them because of the trust they place in more experienced developers.

But because developers often look beyond source code and perform an additional step, namely considering the change-set owner's experience and recent work, information solely created from source code might miss interesting instances where novices to the code made inappropriate changes.
Recommender systems might report issues that are of less importance due to the substantial experience of the change-set owner.

Implementing filtering mechanisms based on author characteristics such as experience and quality of previously delivered work can help developers focus on the information that is important to them.
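Such a filter could be as simple as the following Python sketch. All of its elements are hypothetical: the experience measure (months spent on a component), the quality measure (the share of the owner's recent change-sets involved in failures), the thresholds, and the names. A real system would have to derive these measures from repository data.

\begin{verbatim}
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnerProfile:
    months_on_component: int    # hypothetical experience measure
    recent_failure_rate: float  # hypothetical quality measure in [0, 1]


def filter_by_owner(recommendations, profiles,
                    min_months=12, max_failure_rate=0.1):
    """Drop recommendations about change-sets whose owner is both
    experienced and has a good recent track record; keep the rest.

    recommendations: iterable of (owner, message) tuples.
    profiles: dict mapping owner names to OwnerProfile.
    Owners without a profile are treated as untrusted and kept.
    """
    kept = []
    for owner, message in recommendations:
        profile = profiles.get(owner)
        trusted = (profile is not None
                   and profile.months_on_component >= min_months
                   and profile.recent_failure_rate <= max_failure_rate)
        if not trusted:
            kept.append((owner, message))
    return kept


profiles = {"senior": OwnerProfile(48, 0.02), "novice": OwnerProfile(2, 0.0)}
recs = [("senior", "check Build.java"), ("novice", "check Ui.java"),
        ("unknown", "check Db.java")]
print(filter_by_owner(recs, profiles))
# [('novice', 'check Ui.java'), ('unknown', 'check Db.java')]
\end{verbatim}

Keeping recommendations for unknown owners errs on the side of showing too much rather than hiding changes made by newcomers.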


\paragraph{Guideline \#3} \emph{Recommender systems should assist in non-implementation tasks such as code reviews and risk assessment.}
We observed, as described in the fourth, fifth, and sixth findings, that developers are highly engaged in discussions when performing risk assessments or reviews of change-sets.

In software engineering, most recommender systems focus on providing information to support concrete tasks such as bug fixes or refactorings, but not tasks such as reviews. To provide information for non-coding tasks, recommender systems should be configurable to display relevant information beyond the tasks they are intended to support, so that developers can easily access this information when performing code reviews or risk assessments.

\paragraph{Guideline \#4} \emph{Recommender systems should account for business goals.}
Our last finding points to internal conflicts within teams and among developers caused by the desire to create a flawless product under the restriction of a set of business goals such as shipping the product on time.
Thus, developers often need to be reminded that they must focus their efforts on fulfilling business goals rather than on polishing the product as they see fit. Existing recommenders that use code-related metrics such as quality or productivity may shift attention away from fulfilling business goals.

To support developers to focus on business goals, systems supporting the information-seeking behaviour of developers should be able to prioritize information related to tasks that are mission-critical to the organization, helping the team focus its attention on the most relevant problems for the upcoming release.


\subsection{STC in Real Time}
% leveraging stc in real time
Knowing that socio-technical congruence lends itself to producing actionable knowledge in a form that is acceptable to developers in the wild led us to our last study (Chapter~\ref{chap:actionable}).
In this study we showed the feasibility of generating recommendations at the right time, by gathering data to generate socio-technical congruence in real time.
Thus, we showed that socio-technical congruence can be used in real time to create actionable knowledge that might be of use to developers.
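The real-time aspect can be illustrated with an incremental variant of the network construction. Instead of rebuilding the networks after the fact, each delivered change-set and each message updates the state immediately. The current gaps can therefore be queried at any time before the build. This Python sketch assumes hypothetical event hooks (\texttt{on\_change}, \texttt{on\_message}) fed by repository and communication tools. It is not the tooling used in Chapter~\ref{chap:actionable}.

\begin{verbatim}
from collections import defaultdict


class LiveGapTracker:
    """Maintains, for one upcoming build, the developer pairs that share a
    technical dependency (edited the same file) and the pairs that have
    communicated, so that gaps can be reported before the build runs."""

    def __init__(self):
        self._editors = defaultdict(set)  # file path -> developers
        self._technical = set()
        self._social = set()

    @staticmethod
    def _pair(a, b):
        return (a, b) if a <= b else (b, a)

    def on_change(self, developer, path):
        """Call when a change-set touching `path` is delivered."""
        for other in self._editors[path]:
            if other != developer:
                self._technical.add(self._pair(developer, other))
        self._editors[path].add(developer)

    def on_message(self, sender, recipient):
        """Call when one developer communicates with another."""
        if sender != recipient:
            self._social.add(self._pair(sender, recipient))

    def gaps(self):
        """Current technically dependent pairs without communication."""
        return sorted(self._technical - self._social)


tracker = LiveGapTracker()
tracker.on_change("alice", "Build.java")
tracker.on_change("bob", "Build.java")
print(tracker.gaps())  # [('alice', 'bob')]
tracker.on_message("bob", "alice")
print(tracker.gaps())  # []
\end{verbatim}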


\section{Threats to Validity}
\label{sec:threat}
In this section we detail the threats to the validity of this thesis.

% limited number of studies
\paragraph{External Validity}
In part of this work we draw on information from observational studies (Chapters~\ref{chap:talk} and~\ref{chap:actionable}) and studies relying on development repositories (Chapters~\ref{chap:soc-net},~\ref{chap:stc-net2}, and~\ref{chap:stc-net}) that cover two development projects.
Although this limits the generalizability of the presented findings as well as the validity of the inferred approach, we think that the approach still holds merit, as the studies that lay the foundation for generating insights in real time are derived from an industrial project comprising more than one hundred developers at a large software corporation.
Our close collaboration with the IBM Rational Team Concert development team limits the amount of data available for the studies we presented, but it enables us to better interpret the collected data and to gain a deeper understanding of the organization, its processes, and how they influence the data.
In the case of the in-class study, we limited our conclusions to those of a feasibility study: demonstrating that technical networks can be constructed in real time and giving some evidence that potential recommendations can prevent build failures from occurring.

In our close relationship with the IBM Rational Team Concert team we had the chance to interview ten developers, which represent a fraction of the development team at large. These ten developers were all located at the same site. As a result of this, our interview data could be biased and unrepresentative of the RTC team at large.
However, we are confident that this threat is minor, due to the mix of developers we interviewed, including novices, senior developers, and team members that had been part of the group since its beginning.
Furthermore, the triangulation with our observations and survey responses increases our confidence in our findings.

\paragraph{Construct Validity}
In this thesis we conceptualized social dependencies among developers using digitally recorded communication artifacts in the form of work item discussions, and we inferred technical dependencies from developers changing the same source code files.
Both constructs are used by the software engineering research community in several studies (e.g.~\cite{cataldo:cscw:2006}).
Nevertheless, both the social and the technical dependency characterizations carry the danger that they do not necessarily measure relevant social or technical dependencies, or that they miss existing ones.
This leads to the threat that our inferences might be based on inconsistencies in the data such as meaningless communication among developers or file changes that are not technical in nature.
Missing data is a further threat: due to storage problems, the Jazz teams erased some build results. In the case of
nightly builds, we expected 90 builds (according to the project duration) but found
only 15, roughly 17\% of the expected number. This might affect our results, but we argue that, given the richness of our
data, the general trend is still preserved.
Given that our data was generated by highly disciplined professionals or by students whom we monitored, we are confident that the data available for analysis is of high quality.

\paragraph{Internal Validity}
% never traced found patterns to issues/rellied on statistical analysis
Chapters~\ref{chap:stc-net2} and~\ref{chap:stc-net} demonstrated that constructing the socio-technical networks is feasible and in Chapter~\ref{chap:stc-net} we showed that there is a relationship between the network configuration and build success that can be used to generate recommendations.
One issue that we will need to address in future work is showing a definite link between the insights presented in Chapter~\ref{chap:stc-net} and actual build failures, and to what extent the recommendations can actually prevent build failures from happening.
To mitigate this threat we showed some initial evidence of tracing a failed build back to its original failure source and showed that the failure could have been prevented with the socio-technical information available at the point in time when the error was introduced into the code base.

%We conceptualized communication based on comments on work items. Besides
%that, the Jazz team communicates via email, chat, web-based information and
%face-to-face meetings. Based on our observations and conversations with the Jazz
%team, we are certain that comments are mostly used to communicate about work
%items. Since they are work item-specific and immediately available.

Another threat to the approach, related to the previously mentioned lack of tracing the basis of the recommendations back to actual build failures, is that we did not test it in the field to see how the recommendations affect the development process.
In Chapter~\ref{chap:talk} we presented a study that explored whether the recommendations are made at an appropriate level of granularity and gathered feedback on their usefulness.
Furthermore, the study conducted in a classroom setting also suggests that there is value in generating such recommendations.

The surveys we deployed in our qualitative studies (Chapters~\ref{chap:talk} and~\ref{chap:actionable}) asked developers to answer closed questions with a predefined list of answers, which might introduce a bias.
This bias poses a threat to our findings due to the possibility that we were missing important items.
We mitigated it by developing the survey iteratively by piloting and discussing it with one of the development teams to identify the most important items, and by relying on our other two sources of data to triangulate our findings.


\section{Conclusions and Future Work}
\label{ch:dis:con}
In this thesis we illustrated an approach to leverage the concept of socio-technical congruence to generate actionable knowledge.
This five-step approach focuses on defining two key parameters up front: (1) the scope of interest and (2) the outcome metric of interest.
The first parameter, the scope, helps with constructing the social networks (the third step) and the technical networks (the fourth step) by supporting the selection of the best data sources.
The outcome metric guides the analysis to produce actionable knowledge in the form of indicators that positively or negatively influence the outcome metric (step 5).

% future work
The work presented in this thesis lends itself to several avenues of future work, such as building the recommender system and testing it with several software development teams to study its impact.
A more interesting avenue is to explore which software architectures can support which kinds of communication and organizational structures.
So far, research on socio-technical congruence points towards changing how software developers coordinate their work, but we propose to return to Conway's original observation that the software architecture will change to accommodate the communication structures in an organization.
Therefore, analyzing software architectures with respect to project properties, such as the distribution of the development team or the organizational hierarchy, might yield valuable insights for guiding design decisions. The software product would then be designed not only for feature richness or maintainability but also to fit the properties of the organization and the development team, in order to increase productivity and quality.