[BDCSG2008] Text Information Management: Challenges and Opportunities (ChengXiang Zhai)

UIUC CS professor Zhai reviews text information management. ChengXiang starts by reviewing the importance of text as a natural way to encode human knowledge. His main focus is how to support the different uses of text information, and how those uses interact with models, applications, systems, and algorithms. This allowed him to motivate future research directions in information retrieval. Some of his interesting points:

Future research directions require improvements in both IR and NLP. Current NLP is shallow (POS tagging, partial parsing, fragmentary semantic analysis), fragile, and domain-specific. Machine learning algorithms still do not scale, and there is not enough training data to satisfy their requirements. Data mining offers many algorithms, but they only capture salient patterns.

ChengXiang says there is a triangle involving: (1) keyword queries (search history, complete user models), (2) bags of words (entity-relations, knowledge representation), and (3) search (access, mining, and task support). That leads to personalized search, large-scale semantic analysis, and full-fledged text information management. Scalability is certainly a challenge along the way (he demoed the UCAIR project as a leap toward new search engines). On large-scale semantics, he emphasizes the importance of graph representations for the analysis and how graph analysis techniques can be applied to them. Changing gears, his third topic is how to create multi-resolution topic maps for navigation. The basic idea is a zoom-in and zoom-out strategy: drill down into finer topics, or aggregate them into coarser ones, while navigating.
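To make the zoom-in and zoom-out idea a bit more concrete, here is a minimal sketch of how a multi-resolution topic map could be navigated. This is not the system shown in the talk: the `TopicNode` class, the `zoom_in` and `zoom_out` helpers, the topic names, and the document IDs are all hypothetical, and the topic hierarchy is assumed to be given rather than learned from text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TopicNode:
    """One node of a hypothetical topic map (illustrative only)."""
    name: str
    docs: List[str] = field(default_factory=list)
    children: List["TopicNode"] = field(default_factory=list)
    # Excluded from repr/eq to avoid parent <-> child recursion.
    parent: Optional["TopicNode"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "TopicNode") -> "TopicNode":
        child.parent = self
        self.children.append(child)
        return child

    def doc_count(self) -> int:
        # Aggregation: a coarse topic covers its own documents
        # plus those of all its descendants.
        return len(self.docs) + sum(c.doc_count() for c in self.children)


def zoom_in(node: TopicNode) -> Dict[str, int]:
    """Drill down: list the finer-grained subtopics and their sizes."""
    return {c.name: c.doc_count() for c in node.children}


def zoom_out(node: TopicNode) -> TopicNode:
    """Move up to the coarser topic; the root stays where it is."""
    return node.parent if node.parent is not None else node


# Toy hierarchy with made-up topics and document IDs.
root = TopicNode("text information management")
ir = root.add_child(TopicNode("information retrieval"))
ir.add_child(TopicNode("personalized search", docs=["d1", "d2"]))
ir.add_child(TopicNode("retrieval models", docs=["d3"]))
root.add_child(TopicNode("text mining", docs=["d4", "d5", "d6"]))

print(zoom_in(root))      # {'information retrieval': 3, 'text mining': 3}
print(zoom_in(ir))        # {'personalized search': 2, 'retrieval models': 1}
print(zoom_out(ir).name)  # text information management
```

In this sketch, zooming in exposes subtopics one level down, and zooming out returns to the parent, whose size is the aggregate of everything beneath it. A real topic map would presumably derive the hierarchy from the text collection itself.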