Modal verbs have different interpretations depending on their context. Their sense categories – epistemic, deontic and dynamic – provide important dimensions of meaning for the interpretation of discourse. Previous work on modal sense classification achieved relatively high performance using shallow lexical and syntactic features drawn from small-size annotated corpora. Due to the restricted empirical basis, it is difficult to assess the particular difficulties of modal sense classification and the generalization capacity of the proposed models. In this work we create large-scale, high-quality annotated corpora for modal sense classification using an automatic paraphrase-driven projec- tion approach. Using the acquired corpora, we investigate the modal sense classification task from different perspectives. We uncover the difficulty of specific sense distinctions by investigat-ing distributional bias and reducing the sparsity of existing small-scale corpora used in prior work. We build a semantically enriched model for modal sense classification by designing novel features related to lexical, proposition-level and discourse-level semantic factors. Besides improved classification performance, closer examination of interpretable feature sets unveils relevant semantic and contextual factors in modal sense classification. Finally, we investigate genre effects on modal sense distribution and how they affect classification performance. Our investigations uncover the difficulty of specific sense distinctions and how they are affected by training set size and distributional bias. Our large-scale experiments confirm that semantically enriched models outperform models built on shallow feature sets. Cross-genre experiments shed light on differences in sense distributions across genres and confirm that semantically enriched models have high generalization capacity, especially in unstable distributional settings.