Predicting click-through rate (CTR) and conversion rate (CVR) plays a
crucial role in the success of ad-recommendation systems. A Deep Hierarchical
Ensemble Network (DHEN) has been proposed to integrate multiple feature
crossing modules and has achieved great success in CTR prediction. However, its
performance for CVR prediction is unclear in the conversion ads setting, where
an ad bids for the probability of a user's off-site actions on a third-party
website or app, such as a purchase, add-to-cart, or sign-up. Applying DHEN to
this setting raises several open questions: 1) Which feature-crossing modules
(MLP, DCN, Transformer, to name a few) should be included in DHEN? 2) How deep
and wide should DHEN be to achieve the best trade-off between efficiency and
efficacy? 3) Which hyper-parameters should be chosen in each feature-crossing
module? Orthogonal to the model architecture,
the choice of input personalization features, which has a high degree of
freedom, also significantly impacts model performance. In this paper, we
address these questions and present three contributions oriented toward
applied data science.
First, we propose a multitask learning framework with DHEN as the single
backbone model architecture to predict all CVR tasks, with a detailed study on
how to make DHEN work effectively in practice. Second, we build both on-site
real-time user behavior sequences and off-site conversion event sequences for
CVR prediction, and conduct an ablation study on their importance. Finally, we
propose a self-supervised auxiliary loss that predicts future actions in the
input sequence, to help mitigate the label sparsity issue in CVR prediction.
Our method achieves state-of-the-art performance compared to previous single
feature-crossing modules with pre-trained user personalization features.
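
As a rough illustration of how a multitask CVR objective might be combined with
a self-supervised next-action loss, the sketch below sums a binary
cross-entropy term per conversion task and adds a weighted cross-entropy term
in which the logits at each sequence position predict the action at the next
position. The function names, the mean reduction over positions, and the
`aux_weight` value are assumptions made for illustration; the abstract does not
specify the exact form of the loss.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """BCE for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_action_loss(logits, next_action_id):
    """Cross-entropy between predicted action distribution and the true next action."""
    return -math.log(softmax(logits)[next_action_id])


def combined_loss(task_probs, task_labels, seq_logits, seq_action_ids, aux_weight=0.1):
    """Hypothetical total loss: supervised multitask BCE + weighted auxiliary loss.

    task_probs / task_labels: dicts keyed by task name (e.g. "purchase").
    seq_logits[i]: action logits produced at sequence position i.
    seq_action_ids[i]: id of the action observed at position i.
    """
    # Supervised part: one BCE term per labeled conversion task.
    supervised = sum(
        binary_cross_entropy(task_probs[t], task_labels[t]) for t in task_labels
    )
    # Self-supervised part: position i predicts the action at position i + 1.
    pairs = list(zip(seq_logits[:-1], seq_action_ids[1:]))
    aux = sum(next_action_loss(l, a) for l, a in pairs) / len(pairs) if pairs else 0.0
    return supervised + aux_weight * aux
```

Because the auxiliary targets come from the input sequence itself, this term
gives a training signal even for examples whose conversion labels are all
negative, which is the intuition behind using it against label sparsity.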