“When everything fails, ask for additional domain knowledge” is the current motto of machine learning. Assessing the real added value of prior/domain knowledge is therefore both a deep and a practical question. Most commercial data mining programs accept data pre-formatted as a table, with each example encoded as a fixed set of features. Is it worth spending time engineering elaborate features that incorporate domain knowledge and/or designing ad hoc algorithms? Or can off-the-shelf programs, working on simple features that encode the raw data without much domain knowledge, put skilled data analysts out of business?

In this challenge, participants may compete in two tracks:

  • The “prior knowledge” track, for which they will have access to the original raw data representation and as much knowledge as possible about the data.
  • The “agnostic learning” track, for which they will be forced to use a data representation encoding the raw data with dummy features.