In collaborative learning, multiple parties contribute their datasets to
jointly train global machine learning models for numerous predictive tasks.
Despite its efficacy, this learning paradigm fails to encompass critical
application domains that involve highly sensitive data, such as healthcare and
security analytics, where privacy risks restrict entities to training models
individually on their own datasets. In this work, we target
privacy-preserving collaborative hierarchical clustering. We introduce a formal
security definition that strikes a balance between utility and privacy, and
we present a two-party protocol that provably satisfies it. We then
extend our protocol with (i) an optimized version for single-linkage
clustering and (ii) scalable approximation variants. We implement all our
schemes and experimentally evaluate their performance and accuracy on synthetic
and real datasets, with encouraging results. For example, end-to-end execution
of our secure approximate protocol on over 1M 10-dimensional data samples
requires 35 seconds of computation and achieves 97.09% accuracy.
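For readers unfamiliar with single-linkage hierarchical clustering, the following is a minimal plaintext sketch of the clustering itself, computed in the clear by one party with no privacy protection. It is not the secure two-party protocol. The function names `euclidean` and `single_linkage`, the Euclidean metric, and the "stop at k clusters" rule are illustrative assumptions. Single linkage repeatedly merges the two clusters whose closest pair of members is nearest.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(points, k):
    """Hypothetical helper: plaintext agglomerative single-linkage clustering.

    Starts with one cluster per point and merges the closest pair of clusters
    until k clusters remain. No privacy protection is applied here.
    Returns (clusters as sorted index lists, merge history).
    """
    clusters = [{i} for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, linkage distance) per merge step
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance = distance of the closest member pair.
            d = min(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    return sorted(sorted(c) for c in clusters), merges


clusters, merges = single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], 2)
print(clusters)  # [[0, 1, 2, 3], [4]]
```

This naive loop recomputes all cluster distances at every step and is meant only to illustrate the linkage rule. Single-linkage clustering is also known to be derivable from a minimum spanning tree of the pairwise-distance graph, which is one reason this linkage admits more efficient algorithms than the generic approach.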