CS Lakes – Unstructured Data for US lakes

To address science questions at the seam between multiple disciplines, researchers often need to access and explore environmental, socioeconomic, and demographic information about lakes. They also often want to discover and follow other researchers' work on lakes. For example, researchers might ask how water quality relates to average home sale prices in lakeside areas, or whether the demographics of nearby cities and towns are influenced by the presence and quality of lakes. They might also want to know whether a particular lake has a body of research associated with it.

Answering such questions requires retrieving raw, largely unstructured data, such as Web pages, from multiple disparate sources (e.g., various local real-estate websites for house listings, and Google Scholar for publications on lakes), then extracting and inferring useful information from the retrieved data. A system that can automatically collect, process, and aggregate such information would therefore be highly valuable.

Towards this goal, we have been building a structured Web portal for our research community using a variety of information extraction, integration, and database techniques. The portal regularly queries and crawls various Web sources, automatically extracts and integrates structured information about lakes, and links the lakes with useful information sources, such as National Atmospheric Deposition Networks and USGS Water Resources. Finally, the portal presents all this information to users via a Web interface.
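
The crawl, extract, integrate, and link steps described above can be illustrated with a minimal Python sketch. It is not the portal's actual implementation: the page snippets, the regular expressions, the `normalize` key, and the `example.org` link template are all hypothetical stand-ins chosen for illustration, and a real system would fetch pages over the network and use far more robust extraction.

```python
import re
from collections import defaultdict

# Hypothetical raw pages standing in for crawled Web sources (assumption).
RAW_PAGES = [
    ("listings", "<li>Home near Lake Example, price: $250,000</li>"),
    ("listings", "<li>Home near lake example, price: $350,000</li>"),
    ("papers", "<p>Title: Water clarity trends. Lake: Lake Example</p>"),
]

# Placeholder link template; not a real data source (assumption).
LINK_TEMPLATES = {"external_source": "https://example.org/lakes/{key}"}


def normalize(name):
    """Build a canonical key so spelling variants of a lake name match."""
    return re.sub(r"\s+", "-", name.strip().lower())


def extract(source, html):
    """Pull one structured record out of a raw page, or None if no match."""
    if source == "listings":
        m = re.search(r"near (.+?), price: \$([\d,]+)", html)
        if m:
            price = int(m.group(2).replace(",", ""))
            return {"lake": m.group(1), "price": price}
    elif source == "papers":
        m = re.search(r"Title: (.+?)\. Lake: (.+?)<", html)
        if m:
            return {"lake": m.group(2), "paper": m.group(1)}
    return None


def integrate(records):
    """Merge records that refer to the same lake and attach links."""
    lakes = defaultdict(lambda: {"prices": [], "papers": [], "links": {}})
    for rec in records:
        key = normalize(rec["lake"])
        entry = lakes[key]
        if "price" in rec:
            entry["prices"].append(rec["price"])
        if "paper" in rec:
            entry["papers"].append(rec["paper"])
        entry["links"] = {
            name: template.format(key=key)
            for name, template in LINK_TEMPLATES.items()
        }
    return dict(lakes)


def build_portal(raw_pages):
    """Run extraction over every page, then integrate the results."""
    records = [r for r in (extract(s, h) for s, h in raw_pages) if r]
    return integrate(records)


portal = build_portal(RAW_PAGES)
```

In this sketch, the two listing snippets and the publication snippet all resolve to the same key, so `portal["lake-example"]` holds both prices, the paper title, and the placeholder link in a single integrated record that a Web interface could then display.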