Semantic information recognition and extraction is the major enabler for next-generation information retrieval and natural language processing. Yet it is currently successful only in small domains of limited scope. We claim that moving beyond this restriction requires (1) performing integrated semantic extraction that incorporates a probabilistic representation of semantic content, and (2) making better use of the broader semantic resources now coming online. This project will explore both fundamental research and large-scale applications, using the public domain Wikipedia as a driver and a resource. Research will explore the integration of semantic information into the language processing chain. Applications will employ this integration in broad-spectrum named-entity recognition and in cross-lingual information retrieval, using the rich but incomplete data available from Wikipedia. Three PASCAL sites will contribute pre-existing software, theory, and skills to the range of tasks involved.