$n$-gram profiles have been successfully and widely used to analyse long
sequences of potentially differing lengths for clustering or classification.
Machine learning algorithms have mainly been used for this purpose but,
despite their predictive performance, these methods cannot discover hidden
structures or provide a full probabilistic representation of the data. A novel
class of Bayesian generative models for $n$-gram profiles used as
binary attributes has been designed to address this. The flexibility of the
proposed modelling allows a straightforward approach to feature
selection in the generative model. Furthermore, a slice sampling algorithm is
derived for fast inference. Applications to synthetic and real data scenarios
show that feature selection can improve classification
accuracy.