The goal of this project is to investigate whether image representations based on local invariant features and document analysis algorithms such as probabilistic latent semantic analysis (pLSA) can be successfully adapted and combined for the specific problem of scene categorisation. More precisely, our aim is to distinguish between indoor/outdoor or city/landscape images and, in a later stage, more diverse scene categories. This is interesting in its own right in the context of image retrieval and automatic image annotation, and it also provides context information to guide other processes such as object recognition or categorisation.

So far, the intuitive analogy between local invariant features in an image and words in a text document has been explored only at the level of object categories rather than scene categories. Moreover, it has mostly been limited to a bag-of-keywords representation. The prime research objective of this project is to introduce visual equivalents of more evolved text retrieval methods that deal with word stemming, spatial relations between words, synonyms and polysemy. In parallel, we will study the statistics of the extracted local features to determine to what degree the analogy between local visual features and words really holds in the context of scene classification, and how the local-feature-based description needs to be adapted to make it hold.
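To make the bag-of-keywords baseline concrete, the sketch below shows one common way the text analogy is instantiated: local descriptors are vector-quantised against a learned visual vocabulary, each image becomes a word-count histogram, and pLSA fitted by expectation-maximisation extracts latent aspects. This is a minimal illustration, not the project's actual pipeline; the function names, the vocabulary size, and the number of topics are illustrative assumptions, and the local descriptors (e.g. SIFT) are assumed to be precomputed.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_bow_histograms(descriptor_sets, vocab_size=200, seed=0):
    """Quantise local descriptors into a visual vocabulary and build
    per-image bag-of-visual-words histograms.

    descriptor_sets: list of (n_i, d) arrays, one per image, holding
    local invariant feature descriptors (e.g. SIFT)."""
    all_desc = np.vstack(descriptor_sets)
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed)
    kmeans.fit(all_desc)  # cluster centres act as the "visual words"
    hists = np.array([
        np.bincount(kmeans.predict(d), minlength=vocab_size)
        for d in descriptor_sets
    ])
    return hists, kmeans

def plsa(X, n_topics, n_iter=50, seed=0):
    """Fit pLSA to a document-word count matrix X (docs x words) via EM."""
    rng = np.random.default_rng(seed)
    D, W = X.shape
    # P(z|d): topic mixture per document; P(w|z): word distribution per topic.
    p_z_d = rng.random((D, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, W)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ~ P(z|d) P(w|z), shape (D, K, W).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        joint /= joint.sum(1, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by the observed counts n(d,w).
        weighted = X[:, None, :] * joint
        p_w_z = weighted.sum(0)                       # sum over documents
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(2)                       # sum over words
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

In this setup the rows of `p_z_d` (the per-image topic mixtures) can serve as a compact scene representation to be fed to a standard classifier for, say, the indoor/outdoor decision. Note what the sketch deliberately ignores: spatial relations between features are discarded, and distinct clusters with the same meaning (visual synonyms) or one cluster covering several meanings (visual polysemy) are not handled, which is precisely the gap the research objectives above target.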