Life is full of data, so what does data analysis actually do? Faced with massive amounts of data, we cannot extract valuable information by relying on the human brain and hands alone. Even if we could, the conclusions would have no scientific basis. This is why "data mining", which combines statistical techniques with IT, came into being.
At first, data mining followed the "One to One" and "CRM (Customer Relationship Management)" trends. It was mainly used to analyze customer behavior, acquire new customers, forecast new products, and manage inventory, and it was especially expected to be applied in marketing. As a result, there has been a growing tendency to store customer data together with POS data in a data warehouse (see the figure below):
By using a data warehouse together with data mining, practitioners have obtained a great deal of useful information, knowledge, hypotheses, and topics in the marketing field. In recent years, these methods have also been widely applied in finance, quality management, medical care, scientific research, and other fields.
What should we prepare before doing actual data mining? Data mining tools include S-PLUS, SAS, SPSS, and many other general and specialized applications. In the U.S., data mining tools are called "siftware", and there are about 200 of them. Some offer comprehensive functionality, while others do one thing very well. For example, IBM's Intelligent Miner, SAS's Enterprise Miner, SPSS's Clementine, and Mathematical Systems, Inc.'s VMS (Visual Mining Studio) are all comprehensive packages, whereas SPSS's AnswerTree is siftware built on decision-tree prediction.
With that brief overview of what data analysis does, here are a few important words for beginners. Beginners do not need to master complex operations or have deep professional knowledge, and they certainly do not need to spend a lot of money. Excel, which many people use every day, is an excellent data mining tool. Let's learn Excel together! Choose the right tool for the job according to the purpose of the data mining, the nature and scale of the data, and your budget.
What is the purpose of data mining? We can summarize three main purposes:
(1) Grasping trends and patterns. By analyzing online shopping transaction records, call-center complaint data, customer satisfaction survey data, purchase data, and so on, we can grasp customers' purchase intentions and types, the types of complaints, and other information. The data mining tools (methods) used here include neural networks, market basket analysis, rough sets, correspondence analysis (dual scaling), principal component analysis, and cluster analysis.
(2) Forecasting. When predicting from tens of thousands of data points, the most effective method is the neural network, a powerful tool that works even when the data are nonlinear. Its disadvantages are that it needs a lot of data and is weak at factor analysis. Methods for prediction (and factor analysis) from dozens or hundreds of data points include regression analysis, discriminant analysis, logistic regression analysis, quantification theory type I, and quantification theory type II. In addition, methods for forecasting time-series data include grey theory, the nearest neighbor method, Holt's method, exponential smoothing, moving averages, the Box-Jenkins method (ARIMA models), and quantification theory type I.
(3) Finding the optimal solution. Under a variety of constraints, how do we solve for the parameters (unknowns) that maximize benefit or minimize cost? Excel's Solver makes this kind of problem easy to solve.
These three points are the main purposes of data mining. I hope they help you understand it.
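Two of the time-series methods named under purpose (2), moving averages and exponential smoothing, can be sketched in a few lines. This is only an illustrative sketch: the sales figures, the window size of 3, and the smoothing constant of 0.5 are made-up assumptions, not values from this article.

```python
def moving_average(series, window):
    """Trailing moving average; the first (window - 1) points have no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * y[t] + (1 - alpha) * s[t-1]."""
    smoothed = [float(series[0])]  # start from the first observation
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales figures, for illustration only.
sales = [120, 130, 125, 140, 150, 145]

ma3 = moving_average(sales, window=3)
ses = exponential_smoothing(sales, alpha=0.5)

# With simple exponential smoothing, the one-step-ahead forecast
# is the last smoothed value.
next_forecast = ses[-1]
print(ma3)
print(ses)
print(next_forecast)
```

A larger window or a smaller alpha gives a smoother curve that reacts more slowly to recent changes, so both are usually tuned to the data at hand.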
Excel is generally regarded as "spreadsheet software", but it also has the functions of a data mining tool and a database, which makes it very practical software for learning data mining. Next, we introduce the Excel tools and analysis methods used for mining data.
As a powerful data mining tool, Excel has the following five features: ① functions, ② charts, ③ Data Analysis, ④ PivotTables, and ⑤ Solver.
Why do these features count as "data mining tools"? The following sections describe what each one does and how it relates to data mining.
1. Functions in Excel
We all keep storing data on our computers, but raw data cannot be analyzed directly; it needs "statistics and analysis". Before data mining, you need to compute the average, sum, maximum, and minimum of the data. Once mining is underway, further "statistics" and "analysis" are needed to obtain deeper results. What makes this "statistics and analysis" efficient is functions. Excel has about 350 functions, which can be used flexibly according to the purpose of the analysis and the nature of the data.
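For readers who want to check a worksheet result outside Excel, the four basic summaries mentioned above (average, sum, maximum, minimum) can be reproduced as follows. This is a minimal sketch; the data values are invented for illustration, and the Excel function names in the comments are the usual worksheet equivalents.

```python
import statistics

# Hypothetical daily order counts, for illustration only.
data = [23, 17, 35, 29, 41]

summary = {
    "sum": sum(data),                  # Excel: SUM
    "average": statistics.mean(data),  # Excel: AVERAGE
    "max": max(data),                  # Excel: MAX
    "min": min(data),                  # Excel: MIN
}
print(summary)
```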
2. Charts in Excel
One of the important methods of data mining is "data visualization". Instead of listing every data point, it presents the data in a clear visual form, which often reveals new and valuable results. For visualizing data, charts are without doubt the first choice, and they are indispensable when giving a presentation. Excel's charting tools support many features and are very practical.
Excel offers more than 70 chart types. Commonly used ones include: a. bar charts, b. line charts, c. scatter plots, d. histograms, and e. Pareto charts (see the figure below):
Many readers have probably used functions and charts in their daily work, but how many know the following three features? "Data Analysis" and "Solver" in particular often do not appear in the menu bar automatically, depending on how Excel was installed. Enabling them as add-ins (features not included in the initial setup) greatly improves Excel's capabilities for data mining and data analysis.
3. Data Analysis in Excel
Data mining tools include S-PLUS, SAS, SPSS, and other general and specialized applications. Using them requires a certain degree of expertise, and they cost money. By contrast, Excel's "Data Analysis" feature is a simple and practical analysis tool for data mining beginners.
Before learning that Excel had a "Data Analysis" feature, the author also used other software. After coming to understand its convenience and practicality, however, the author began using it in graduate courses and consulting work in order to popularize data mining and data analysis.
4. PivotTables in Excel
Excel can convert the data in a table into a "PivotTable". A PivotTable is also called a "crosstab": a table that stratifies the data. Stratification is a very important part of data mining. For example, when analyzing sales data, looking at it from different angles such as gender, age, day of the week, and weather usually yields very interesting results. However, reworking the table by hand every time you change the angle is very tedious. With "Data" → "PivotTables and PivotCharts" in the menu bar, stratified tables are easy to make, and a click of the mouse switches the angle of analysis. In addition, double-clicking a number in a crosstab cell displays the detailed data behind that number (see the table below).
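To make the idea of stratifying data and switching the angle of analysis concrete, here is a small sketch of a crosstab built by hand. The records, field layout, and the `crosstab` helper are all hypothetical and only mimic what a PivotTable does; they are not how Excel implements it.

```python
from collections import defaultdict

# Hypothetical sales records: (gender, day of week, amount).
records = [
    ("F", "Mon", 120),
    ("M", "Mon", 80),
    ("F", "Tue", 200),
    ("M", "Tue", 150),
    ("F", "Mon", 60),
]


def crosstab(rows, row_key, col_key, value_key):
    """Sum the value field for every (row, column) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for record in rows:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


# Angle 1: gender in rows, day of week in columns.
by_gender_day = crosstab(records, row_key=0, col_key=1, value_key=2)

# Angle 2: swap rows and columns, like dragging fields in a PivotTable.
by_day_gender = crosstab(records, row_key=1, col_key=0, value_key=2)

print(by_gender_day)
print(by_day_gender)
```

Changing the angle is only a matter of passing different keys, which is the same convenience the PivotTable offers through the mouse.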
5. Solver in Excel
Solver (see the figure below) is, in a nutshell, a "mathematical programming" program that covers linear programming, nonlinear programming, and integer programming. On hearing "linear programming", readers may assume it is difficult to use. In fact, Solver is a tool that, under a number of constraints (expressed as formulas), solves for the unknowns (also called parameters) that maximize (or minimize) a target variable, and it is very widely used.
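The kind of problem described here, maximizing a target under constraints written as formulas, can be illustrated with a tiny two-variable linear program. The sketch below is not Solver's algorithm: it simply checks every corner of the feasible region, which works only for small problems with two unknowns and a bounded feasible region. The objective and constraints are a made-up textbook-style example.

```python
from itertools import combinations


def solve_lp_2d(objective, constraints, tol=1e-9):
    """Maximize objective[0]*x + objective[1]*y subject to a*x + b*y <= c.

    Each constraint is a tuple (a, b, c). Returns (value, x, y) for the
    best feasible corner, or None if no feasible corner exists.
    """
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


# Hypothetical problem: maximize profit 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
constraints = [
    (1, 0, 4),
    (0, 2, 12),
    (3, 2, 18),
    (-1, 0, 0),  # x >= 0 written as -x <= 0
    (0, -1, 0),  # y >= 0 written as -y <= 0
]
best_value, best_x, best_y = solve_lp_2d((3, 5), constraints)
print(best_value, best_x, best_y)
```

In Excel the same problem would be set up with the objective in one cell, the unknowns in two changing cells, and the constraints entered in the Solver dialog.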
Professor Fanghe, who once worked at Tokyo University of Science in Japan, said: "Excel's Solver is very powerful. As its use spreads in the future, statistics courses will probably also need to be substantially revised!"
Previously, to solve for the parameters of growth curves such as the logistic curve and the Gompertz curve, for example a, b, and c in the logistic curve y = a/[1 + b exp(-cx)], the author used S-PLUS. Later, the author learned that Solver can do the same job.
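Fitting the logistic curve y = a/[1 + b exp(-cx)] amounts to choosing a, b, and c so that the sum of squared errors against the observed data is as small as possible. The sketch below shows that idea with a simple compass (coordinate pattern) search. This is a stand-in technique chosen because it needs no libraries; it is not the algorithm Solver uses. The data are synthetic, generated from assumed parameters a = 100, b = 20, c = 0.5, and the starting guess is arbitrary.

```python
import math


def logistic(x, a, b, c):
    """Logistic growth curve y = a / (1 + b * exp(-c * x))."""
    return a / (1.0 + b * math.exp(-c * x))


def sse(params, xs, ys):
    """Sum of squared errors; non-positive parameters are rejected."""
    a, b, c = params
    if a <= 0 or b <= 0 or c <= 0:
        return float("inf")
    return sum((y - logistic(x, a, b, c)) ** 2 for x, y in zip(xs, ys))


def compass_search(f, start, step=1.0, min_step=1e-6, max_iter=20000):
    """Try +/- step on each parameter; halve the step when nothing improves."""
    point = list(start)
    best = f(point)
    iterations = 0
    while step > min_step and iterations < max_iter:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
        iterations += 1
    return point, best


# Synthetic observations from assumed "true" parameters (illustration only).
xs = list(range(15))
ys = [logistic(x, 100.0, 20.0, 0.5) for x in xs]

start = [90.0, 10.0, 1.0]  # arbitrary starting guess
start_sse = sse(start, xs, ys)
fitted, fitted_sse = compass_search(lambda p: sse(p, xs, ys), start)
print(fitted, fitted_sse)
```

In Excel, the equivalent setup puts a, b, and c in three changing cells, computes the sum of squared errors in a target cell, and asks Solver to minimize that cell.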
Recently, Solver has also been used for calculations such as covariance analysis, and its range of applications keeps expanding. Given this trend, choosing Excel to learn data mining is a smart move, and Solver may well make neural networks practical in Excel before long. Solver is the representative tool for the data mining purpose of finding the optimal solution.