Data is the new oil for the car industry. Cars generate data about how they
are used and who is behind the wheel, which gives rise to a novel way of
profiling individuals. Several prior works have successfully demonstrated the
feasibility of driver re-identification using the in-vehicle network data
captured on the vehicle's CAN (Controller Area Network) bus. However, all of
them used signals (e.g., velocity, brake pedal or accelerator position) that
had already been extracted from the CAN log, a step that is itself not a
straightforward process. Indeed, car manufacturers intentionally do not reveal
the exact signal location within CAN logs. Nevertheless, we show that signals
can be efficiently extracted from CAN logs using machine learning techniques.
We exploit the fact that signals have several distinguishing statistical features which
can be learnt and effectively used to identify them across different vehicles,
that is, to quasi "reverse-engineer" the CAN protocol. We also demonstrate that
the extracted signals can be successfully used to re-identify individuals in a
dataset of 33 drivers. Therefore, not revealing signal locations does not, per
se, prevent CAN logs from being regarded as personal data of drivers.
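
To make the idea of "distinguishing statistical features" concrete, the following is a minimal sketch of how simple per-byte statistics could be computed from the payloads of one CAN ID. It is an illustration only: the function name byte_position_features, the chosen statistics, and the toy frames are assumptions made here, not the feature set or data used in this work.

```python
from statistics import mean, pstdev


def byte_position_features(payloads):
    """Compute simple statistics for each byte position of a CAN payload.

    payloads: time-ordered list of equal-length bytes objects, all taken
    from the same CAN ID (hypothetical input format).
    """
    if not payloads:
        return []
    features = []
    for pos in range(len(payloads[0])):
        # Time series of the value observed at this byte position.
        series = [p[pos] for p in payloads]
        # Absolute change between consecutive samples.
        deltas = [abs(b - a) for a, b in zip(series, series[1:])]
        features.append({
            "position": pos,
            "distinct": len(set(series)),
            "mean": mean(series),
            "std": pstdev(series),
            "mean_abs_delta": mean(deltas) if deltas else 0.0,
        })
    return features


# Toy frames (made up): byte 0 is constant, byte 1 ramps up, byte 2 is unused.
frames = [bytes([0x10, i, 0x00]) for i in range(0, 50, 5)]
feats = byte_position_features(frames)
```

In this toy input, the constant byte positions have a single distinct value and zero spread, while the ramping position shows many distinct values and a steady change between samples. Feature vectors of this kind could then be passed to a classifier trained on a vehicle whose signal locations are known.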