Time: Monday, January 7, 2019, 3:00 PM
Venue: 6th-floor conference room, College of Computer Science and Technology
Title: Linguistic Information: Concepts, Structures, and Quantitative Assessments
-- A Linguistic and Cognitive Approach to Natural
Language Processing and Machine Learning
Abstract: In this talk, I will discuss a linguistic and
cognitive approach to solving certain fundamental issues in the fields of
natural language processing and machine learning with textual data.
Specifically, in contrast to the currently prevailing methods that are mainly
based on statistical approaches or mathematical operations on data as symbols,
I will address the informational nature of textual contents as data that carry
meaning and information.
NLP, as a subfield of Artificial Intelligence, has had a long history with many attempts
to make significant progress. However, the fundamental issues in the field have
proven to be much more complicated than people originally expected. Over the
years, mainstream NLP approaches gradually settled on using statistical methods
for text classification and identifying information in “unstructured data”,
after hand-written heuristics and rule-based systems failed to scale up to be
generally applicable. But progress has remained limited, because understanding
natural language is an extremely challenging task.
In this talk, I will present the major parts of a theoretical framework and
implementation methods that I proposed for representing the relationships
between language, knowledge, and information. The topics covered will include
the concepts, structures, and quantitative measurements of what I call
“linguistic information”. I will compare this framework with the modern
information theory based on Claude Shannon’s ground-breaking work, and further
extend Shannon’s basic concepts to natural language data and the process of
linguistic communication.
Demos will also be provided to show that more intelligent technologies and products can be
built using the linguistic and cognitive approaches, in addition to the
statistical approaches.
Speaker bio: Dr. Ronald
(Guangsheng) Zhang conducted his first graduate study at Shanghai Jiao Tong
University with a major in Applied Linguistics and Foreign Languages for Science
and Technology. After that he was retained as an assistant professor at the
same university. After coming to the United States, he earned his PhD in
Linguistics from the University of Delaware. During this period, he developed
deep insights into how natural languages encode and carry information, and how
human brains comprehend language, together with a keen interest in building
natural language-based technologies and products for solving practical
problems and for verifying theoretical hypotheses.
In recent years, Dr. Zhang has worked as the CTO and Chief Scientist at Linfo Research in
Silicon Valley, California, USA, and as the Chief Scientist at Bello Intelligent
Technologies Co. in Shenzhen, China. Dr. Zhang is the inventor of 32 issued
patents covering new technologies that include topic modeling, intelligent search
engines, information extraction from unstructured data, rule-based methods for
high-accuracy sentiment analysis, unsupervised machine learning methods for
knowledge discovery and knowledge representation, and information management
methods and user interface functionalities for large amounts of unstructured
data. Some of these patents are also being turned into academic papers.
Author: College of Computer Science and Technology  Reviewed by: Liu Xuejun (刘学军)