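The College of Control Science and Engineering at Zhejiang University invited Professor MengChu Zhou to give a lecture titled "Big Data Research and Applications". The college currently comprises the Institute of Industrial Control, the Institute of Automation Instrumentation, the Institute of Intelligent Systems and Control, the State Key Laboratory of Industrial Control Technology, the National Engineering Research Center for Industrial Automation, the Automation Experimental Teaching Center, and the Scientific Instrument Research Center. The main content of this lecture for in-service graduate students is as follows: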
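Research on big data analytics and applications has received considerable attention from industry and academia, owing to its vast application potential in business, manufacturing, health care, and government agencies, ever since Google launched the MapReduce project about ten years ago. New data-intensive and interesting research fields and problems have emerged. As the range of applications for the technology grows, so does the volume of data. Because datasets are unbounded in size and imbalanced in nature, class imbalance has become the greatest problem in data mining. Imbalanced datasets, in which one class has far more samples than the others, are frequently encountered in real-world classification problems such as fraud detection, hazard event detection, disease diagnosis, face recognition, graph classification, and text classification. Minority-class data occur rarely but are very important, and traditional classifiers often misclassify them. The lecture covers the concept of big data, fundamental processing methods, especially those suited to imbalanced datasets, and some interesting applications.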
Original text: Big data research has been receiving numerous attentions from industry and academia due to its vast application potential in business, manufacturing, health-care, and government agencies since about ten years ago when Google initiated the MapReduce project. Some new data-intensive and interesting research fields and problems have emerged. As the application area of technology increases, the size of data also increases. Class imbalance problems become the greatest issue in data mining because of the unbounded size and imbalance nature of datasets. An imbalance dataset in which one class having much more samples than other classes are often encountered in real-world classification problems, i.e., fraud detection, hazard event detection, disease diagnosis, face recognition, graph classification, and text classification. Minority data rarely occur but very important and often misclassified by traditional classifiers. This talk will cover the big data concept, fundamental processing methods especially for imbalanced data sets, and some interesting applications.
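The abstract's point that minority data are rare but important, and are often misclassified, can be illustrated with a minimal sketch. The example below is not taken from the lecture: the dataset (990 "legit" and 10 "fraud" labels) and the helper names `majority_baseline`, `accuracy`, and `recall` are hypothetical, chosen only to show how a classifier that ignores the minority class can still report high accuracy.

```python
from collections import Counter


def majority_baseline(labels):
    """Return a classifier that always predicts the most common training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda _sample: most_common


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of truly positive samples that were predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical imbalanced dataset: 99% majority class, 1% minority class.
y_true = ["legit"] * 990 + ["fraud"] * 10

# The baseline never looks at the sample, so a placeholder input is enough.
predict = majority_baseline(y_true)
y_pred = [predict(None) for _ in y_true]

baseline_accuracy = accuracy(y_true, y_pred)
fraud_recall = recall(y_true, y_pred, positive="fraud")

print(f"accuracy: {baseline_accuracy:.2f}")
print(f"recall on the minority class: {fraud_recall:.2f}")
```

Under these assumptions the baseline is right on every majority sample and wrong on every minority sample, which is why accuracy alone can be misleading on imbalanced data and why per-class measures such as recall are usually reported alongside it.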