Information Overload in the Big Data Era: How to Build a Big Data Recommendation System, and Where Is the Trend Heading?

Authors: Li Cuiping, Lan Mengwei, et al. 2015-12-28
With the advent of the big data era, the amount of information on the network is growing exponentially, which brings about the problem of information overload. Recommendation systems are one of the most effective ways to address information overload, and big data recommendation systems have gradually become a research hotspot in the information field.

1 Recommendation systems and network big data
With the rapid development of science and information technology, society has entered a new era of pervasive informatization. The Internet is everywhere; it touches every aspect of human life and has thoroughly changed the way people live. In particular, since the start of the Web 2.0 era and the sudden rise of social network media, Internet users have become both consumers and producers of online content, and the amount of information on the Internet has grown exponentially. Because users have limited ability to sift through it, they often do not know where to start when faced with this huge and complex body of information, so the cost of finding useful information becomes very high. This is the so-called "information overload" problem.
Search engines and recommendation systems provide important technical means of addressing information overload. With a search engine, the user enters query keywords; the search engine matches them in the background and displays the information related to the query. If the user cannot think of keywords that accurately describe the need, however, the search engine is of little help. A recommendation system, by contrast, does not require the user to state a clear need. Instead, it models the user's interests by analyzing the user's historical behavior and proactively recommends information that may match those interests and needs. Search engines and recommendation systems are therefore complementary tools: with the former the user actively seeks information, and with the latter the user receives it passively.
In recent years e-commerce has boomed, and the importance of recommendation systems on the Internet has become increasingly apparent. Internationally, well-known e-commerce websites include Amazon and eBay, and the recommendation algorithm used on the Amazon platform is considered very successful. In China, the larger e-commerce platforms include Taobao (including Tmall) and Jingdong Mall, among others. These platforms offer an enormous number of products and serve a very large number of users. According to incomplete statistics, Tmall alone lists more than 40 million products. On a site of this size, a keyword query based on purchase intent returns many similar results, and users find it hard to tell them apart and to choose a suitable item. There is therefore an obvious need for a recommendation system that suggests products matching the user's interests and supports the user's shopping decisions. The more successful e-commerce websites today all use recommendation systems to some degree to recommend products while users shop, thereby increasing sales.
At the same time, the development of smartphones has driven the growth of the mobile Internet. While users are on the mobile Internet, information such as their geographic location can be obtained quite accurately, and a large number of location-based websites have appeared at home and abroad. Well-known examples abroad include Meetup and Flickr; well-known examples in China include Douban and Dianping. On a location-based service such as Dianping, users can search for restaurants, hotels, cinemas, tourist attractions and other services near their current location. They can also comment on places at that location, rate their real-world experience, and share their impressions. Users of such location-based services run into the same information overload problem. A recommendation system can use the user's location to recommend content of interest at the current location, provide content that meets the user's real needs, and improve satisfaction with the website.
With the rise of social networks, users' behavior on the Internet is no longer limited to obtaining information; increasingly it involves interacting with other users. Well-known social networks abroad include Facebook, LinkedIn and Twitter; in China they include Sina Weibo and Tencent Weibo, among others. On a social networking site a user is no longer an isolated individual but has intricate relationships with many other people on the network. The most important resource in a social network is the relationship data between users. These relationships vary: they may reflect real-world ties such as relatives, classmates, colleagues and friends, or they may be purely online, for example membership in the same interest group. Connections between users reflect trust relationships, and a user's behavior in a social network is influenced to a greater or lesser extent by these relationships. Research on and applications of recommendation systems for social networking sites should therefore take the influence of users' social relationships into account.
2 The emergence and development of recommendation systems
The concept of a "recommendation system" was proposed in 1995 at a meeting of the American Association for Artificial Intelligence (AAAI). Robert Armstrong, then at CMU, proposed the concept and presented WebWatcher, a prototype recommendation system. At the same meeting, Marko Balabanovic and others from Stanford University presented the personalized recommendation system LIRA[1]. Research on recommendation systems then began to grow slowly. In 1996, Yahoo launched the personalized portal My Yahoo, which can be regarded as the first commercial recommendation system. Since the start of the 21st century, research on and applications of recommendation systems have surged along with the rapid development of e-commerce. All major e-commerce websites have deployed recommendation systems, and Amazon's is among the best known. It has been reported that 35% of Amazon's revenue comes from its recommendation system. In 2006, the American DVD rental company Netflix launched a public online recommendation algorithm competition, the Netflix Prize. Netflix released a portion of real data from its website, containing users' ratings of movies[2]. The competition effectively promoted research on recommendation algorithms in academia and industry, and many effective algorithms were proposed during this period. In recent years, with the development of social networks, recommendation systems have been widely applied in industry and have made significant progress. Well-known applications include the e-commerce recommendation systems of Amazon and Taobao, the movie recommendation systems of Netflix and MovieLens, YouTube's video recommendation system, the music recommendation systems of Douban and others, Google's news recommendation system, and the friend recommendation systems of Facebook and Twitter.
Since the birth of recommendation systems, the academic community has paid increasing attention to them. Since 1999, the Association for Computing Machinery (ACM) has held the ACM Conference on Electronic Commerce (ACM EC) every year, and more and more papers on recommendation systems have been published there. The ACM Special Interest Group on Information Retrieval (ACM SIGIR) began to treat recommendation systems as an independent research topic of its conference in 2001. The 17th International Joint Conference on Artificial Intelligence, held the same year, also treated recommendation systems as a separate topic. This attention has continued to grow over the last 10 years. To date, major international conferences in databases, data mining, artificial intelligence and machine learning (such as SIGMOD, VLDB, ICDE, KDD, AAAI, SIGIR, ICDM, WWW and ICML) have published a large number of research results on recommendation systems. The first international conference named after recommendation systems, the ACM Recommender Systems Conference (ACM RecSys), was first held in 2007. In the KDD Cup competitions held in recent years by the International Conference on Knowledge Discovery and Data Mining (KDD), the theme was recommendation systems for two consecutive years. In KDD Cup 2011, the two tasks were "music rating prediction" and "identifying whether a piece of music has been rated by a user". In KDD Cup 2012, the two tasks were "friend recommendation on Tencent Weibo" and "click-through rate prediction in computational advertising".
3 Domain requirements and system architecture of recommendation systems
As described above, recommendation systems have been widely applied in many fields, such as news, microblog, book, movie, product, music, restaurant and video recommendation. Recommendation systems in different fields differ in data sparsity and place different demands on scalability and on the relevance, popularity, freshness, diversity and novelty of the recommended results. Table 1 compares the requirements of recommendation systems in different fields.
Although requirements differ, a complete recommendation system usually consists of four parts: data modeling, user modeling, the recommendation engine and the user interface, as shown in Figure 1. The data modeling module prepares the item data to be recommended, represents it in a form suitable for analysis, determines the candidate items, and performs preprocessing such as classification and clustering of items. The user modeling module analyzes user behavior information to infer the user's latent preferences. Behavior information includes question answering, rating, purchasing, downloading, browsing, bookmarking, dwell time and so on. The recommendation engine module uses the back-end recommendation algorithm to select, in real time, the items of interest to the user from the candidate set, ranks them, and presents them to the user as a list. The recommendation engine is the core of the recommendation system and also the part that consumes the most system resources and time. The user interface module displays the recommended results and collects user feedback. Beyond basic requirements such as a sensible layout, an attractive appearance and ease of use, the user interface should also encourage users to provide feedback. There are two main types of interface: web-based and mobile-based. Owing to space limitations, only the two key modules, user modeling and the recommendation engine, are described in detail below.
3.1 User modeling
The user model reflects the user's interests and preferences. Feedback on user interests can be divided into explicit feedback and implicit feedback. Explicit feedback comes in two forms: user customization and user rating. User customization refers to the user's answers to questions posed by the system, such as age, gender and occupation. Ratings can be two-level or multi-level. For example, Yahoo News uses a two-level rating: like (more like this) and dislike (less like this). Multi-level ratings describe in more detail how much a user likes an item; in GroupLens, for example, users rate their preference for news from 1 to 5. News Dude supports four levels of user feedback (interested, not interested, already known, want to know more), which are then normalized.
Users are often unable to state their preferences accurately, unwilling to provide them explicitly, and even less willing to maintain them frequently. Implicit feedback, by contrast, can often reflect user preferences and changes in them correctly. Commonly used implicit feedback signals include whether an item was clicked, dwell time, click time, click position, whether the item was added to favorites, comment content (from which the user's sentiment can be inferred), the user's search queries, social network, fashion trends and click order. Collaborative filtering methods often convert implicit feedback into user ratings of items. In Google News, for example, a news story that the user has read is recorded as liked and given a rating of 1, and a story that has not been read is given 0. In the Daily Learner system, clicking a news headline yields a score of 0.8, and reading the entire article raises the score to 1. If the user skips a story recommended by the system, 0.2 is subtracted from the system's predicted score to give the final score.
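As a rough illustration of turning implicit feedback into rating-like scores, the sketch below encodes the Daily Learner constants quoted above. The event names (`click_title`, `read_full`, `skip`) and the clamping of the result at 0 are assumptions made for this example, not details of the original system.

```python
def implicit_score(event, predicted=0.0):
    """Map one implicit-feedback event to a rating-like score.

    The constants 0.8, 1.0 and 0.2 come from the Daily Learner example in
    the text; the event names and the floor at 0.0 are assumptions.
    """
    if event == "read_full":
        return 1.0
    if event == "click_title":
        return 0.8
    if event == "skip":
        # A skipped story is penalized relative to the system's own prediction.
        return max(0.0, predicted - 0.2)
    raise ValueError(f"unknown event: {event}")


# Hypothetical usage: three events observed for one user.
scores = [
    implicit_score("click_title"),
    implicit_score("read_full"),
    implicit_score("skip", predicted=0.5),
]
```

The resulting scores can then be fed to a collaborative filtering method in place of explicit ratings.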
User interests can be divided into long-term and short-term interests. Long-term interests reflect the user's true interests. Short-term interests are often tied to hot topics and change frequently; a short-term interest model learned from recent behavior can quickly reflect changes in user interests. Commonly used models include vector space models, semantic network models and classifier-based models. Because user interests are affected by the periodicity of the items themselves, by hot events and by emergencies, they are highly variable, so user models need to be updated frequently.
3.2 Recommendation engine
The basic methods used by recommendation engines can be divided into content-based recommendation and collaborative-filtering-based recommendation.
The basic principle of content-based recommendation is to select, as recommendations, other items similar to those the user liked in the past[2]. For example, if a new movie has the same actors as, or a similar theme to, a movie the user has watched, the user may like the new movie. The user's interests are usually described by a feature vector in the user model; features are likewise extracted for each item, and its content features serve as the item model. The degree of match between the user model's feature vector and each candidate item's feature vector is then computed, and the candidates with the highest match are pushed to the target user as recommendations.
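The matching step can be sketched as follows, assuming cosine similarity as the matching measure (the text does not name a specific one). The feature names and the toy movie catalogue are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_by_content(user_profile, items, top_n=2):
    """Rank candidate items by how well their features match the user profile."""
    scored = [(cosine(user_profile, feats), name) for name, feats in items.items()]
    # Highest similarity first; ties are broken by item name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0.0]


# Hypothetical user model and item models.
user_profile = {"action": 1.0, "actor:A": 0.5}
items = {
    "movie_1": {"action": 1.0, "actor:A": 1.0},
    "movie_2": {"drama": 1.0},
    "movie_3": {"action": 1.0},
}
recommended = recommend_by_content(user_profile, items)
```

Here `movie_2` shares no feature with the profile and is never recommended, while the two action movies are ranked by how closely they match.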
Collaborative filtering was proposed by David Goldberg in 1992 and is currently the most successful and widely used technique in personalized recommendation systems. Amazon, the well-known foreign e-commerce website, and well-known websites in China all use collaborative filtering. In essence it is a technique based on correlation analysis: it uses the shared preferences of the group a user belongs to in order to make recommendations to that user. Collaborative filtering uses users' historical behavior (preferences, habits and so on) to group users into clusters. Recommendations are based on finding similar users and assuming that the current user is also interested in the items that those similar users like. Collaborative filtering usually involves two steps: first, use behavior data to find the set of users (a group or cluster) whose interests are similar to the target user's; second, find the items that users in this set like and that the target user has not purchased, and recommend them to the target user.
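The two steps can be sketched on purchase sets. Jaccard similarity is used here only as a simple stand-in for "similar interests"; it is an assumption of this example, as are the user and item names.

```python
def jaccard(a, b):
    """Overlap between two purchase sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbors(target, purchases, k=2, top_n=3):
    mine = purchases[target]
    # Step 1: find the k users whose purchase history is most similar.
    neighbors = sorted(
        (u for u in purchases if u != target),
        key=lambda u: (-jaccard(mine, purchases[u]), u),
    )[:k]
    # Step 2: score items the neighbors bought that the target has not.
    scores = {}
    for u in neighbors:
        sim = jaccard(mine, purchases[u])
        for item in purchases[u] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical purchase histories.
purchases = {
    "alice": {"book", "pen"},
    "bob": {"book", "pen", "lamp"},
    "carol": {"book", "desk"},
    "dave": {"sofa"},
}
suggestions = recommend_from_neighbors("alice", purchases)
```

In this toy data, `bob` and `carol` are the nearest neighbors of `alice`, so the items they bought that she has not are suggested, weighted by neighbor similarity.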
In practice, collaborative filtering faces two major constraints: data sparsity and cold start. Collaborative filtering makes recommendations using correlations between users or between items. The most popular memory-based collaborative filtering methods are neighborhood-based. Such a method first finds the neighbors of a given user, that is, users with a similar rating history, and predicts results from the behavior of these neighbors; alternatively, it finds items similar to the query item. The underlying premise is that if two users have rated one set of items similarly, they will rate other items similarly; likewise, if two items have received similar ratings from one group of users, they will receive similar ratings from other users.
The key to collaborative filtering is finding the nearest neighbors of a user (or item). When data is sparse, the items purchased by different users rarely overlap and collaborative filtering performs poorly. One improvement is to consider indirect neighbors as well as direct ones, since their behavior can also influence the current user's decisions. Another way to address sparsity is to fill in default values to make the data artificially denser, or to use iterative completion, in which some values are filled in first and further values are then added on that basis. Transfer learning has also been used to compensate for sparse data. These methods only alleviate data sparsity to a certain extent; they do not overcome it. In real applications the data is very large, which makes sparsity even more pronounced and limits the effectiveness of collaborative filtering. Identifying algorithms that suit the sparsity of the data, so that the right choice can be made for a specific application, is a valuable research topic.
There are two commonly used types of collaborative filtering: memory-based methods and model-based methods. The former work directly on the stored data and derive results from the relationships between users and items; the latter first fit a suitable parameterized model and then derive results from it.
User-based collaborative filtering[4] identifies users similar to the query user and uses the average of their ratings of an item as the estimate of the query user's rating. Similarly, item-based collaborative filtering identifies items similar to the query item and uses the average rating of those items as the estimate for the query item. Neighborhood-based methods differ in how the weights of the weighted average are computed. Commonly used measures are the Pearson correlation coefficient, vector cosine similarity and mean squared difference (MSD).
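A minimal sketch of a user-based prediction with Pearson weights is given below. It uses the common mean-centered form of the weighted average and ignores neighbors with non-positive correlation; both choices are assumptions of this example rather than something the text prescribes, and the ratings are made up.

```python
import math


def pearson(ra, rb):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(ra) & set(rb))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_b = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((rb[i] - mean_b) ** 2 for i in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def predict_rating(ratings, user, item):
    """Weighted average of neighbors' mean-centered ratings for `item`."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        if w <= 0.0:
            continue  # assumption: only positively correlated neighbors count
        mean_o = sum(r.values()) / len(r)
        num += w * (r[item] - mean_o)
        den += w
    return mean_u + num / den if den else mean_u


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}
estimate = predict_rating(ratings, "u1", "d")
```

Swapping `pearson` for a cosine or MSD-based weight changes only the similarity function; the prediction step stays the same.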
Model-based methods predict results by fitting a parameterized model to the training set. They include clustering-based CF[5~7], Bayesian classifiers[8,9] and regression-based methods[10]. The basic idea of clustering-based methods is to group similar users (or items) into clusters, which helps with data sparsity and computational complexity. The basic idea of the Bayesian approach is, given user A's other ratings and the ratings of other users, to compute the conditional probability of each possible rating value (such as 1 to 5 in movie recommendation) and select the value with the highest probability as the prediction. The basic idea of regression-based methods is to first learn the relationships between ratings with a linear regression model and then predict the user's rating of an item from these relationships. The Slope One algorithm[13] applies a linear model to the rating matrix, which lets it compute results quickly with relatively good accuracy.
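To make the Slope One idea concrete, here is a small sketch of the weighted variant: for each item the user has rated, it takes the average rating difference between the target item and that item over all users who rated both, and weights it by the number of such users. The ratings are invented for illustration.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One estimate of `user`'s rating for `target`."""
    num = 0.0
    den = 0
    for item, r_ui in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) from users who rated both.
        diffs = [r[target] - r[item] for r in ratings.values()
                 if target in r and item in r]
        if diffs:
            dev = sum(diffs) / len(diffs)
            num += (dev + r_ui) * len(diffs)
            den += len(diffs)
    return num / den if den else None


# Hypothetical ratings.
ratings = {
    "john": {"a": 5, "b": 3, "c": 2},
    "mark": {"a": 3, "b": 4},
    "lucy": {"b": 2, "c": 5},
}
estimate = slope_one_predict(ratings, "lucy", "a")
```

For `lucy` and item `a`, the difference a−b averages 0.5 over two users and a−c is 3 over one user, so the estimate is ((0.5 + 2) × 2 + (3 + 5) × 1) / 3.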
A recent class of successful model-based methods is based on low-rank matrix factorization. For example, SVD[11] and SVD++[12] decompose the rating matrix into three low-rank matrices whose product approximately reconstructs the original matrix, so that missing values can be estimated. Another method is non-negative matrix factorization[13], which differs in that the factors must not contain negative values. Low-rank factorization extracts a set of latent (hidden) factors from the rating matrix and describes users and items by vectors of these factors. In the movie domain, the automatically identified factors may correspond to familiar attributes of a movie, such as style or genre (drama or action), or they may have no obvious interpretation.
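The latent-factor idea can be illustrated with a small sketch. Note that this is not a literal three-matrix SVD: it is the simpler two-factor model (one vector per user, one per item) trained by stochastic gradient descent on the observed ratings only. The hyperparameters and the toy ratings are assumptions chosen for the example.

```python
import random


def factorize(ratings, n_users, n_items, k=2, epochs=2000,
              lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mf_predict(P, Q, u, i):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


# Hypothetical observed ratings; the pair (user 0, item 2) is missing.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (1, 2, 5.0)]
P, Q = factorize(observed, n_users=2, n_items=3)
missing_estimate = mf_predict(P, Q, 0, 2)
```

After training, `mf_predict` reproduces the observed ratings approximately and also yields an estimate for the missing entry, which is how the factorization fills in the rating matrix.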
Matrix factorization models can predict the interaction between two types of variables. Tensor factorization models extend this prediction to interactions among more types of variables, that is, to higher dimensions. However, applying a factorization model to a new task usually requires deriving a new model and learning algorithm from the original factorization and implementing them. Models such as SVD++, STE, FPMC, timeSVD++ and BPTF are all refinements of the original factorization model for specific problems. Ordinary factorization models therefore generalize poorly across tasks. As for learning algorithms, many exist for the basic matrix factorization model, such as (stochastic) gradient descent, alternating least squares, variational Bayes and MCMC (Markov chain Monte Carlo), but for more complex factorization models the most widely used method is gradient descent.
The factorization machine is a general model proposed by Steffen Rendle in 2010[3]. With this model, Rendle took 2nd place in Track 1 and 3rd place in Track 2 of KDD Cup 2012. Compared with the original factorization models, it combines the generality of feature engineering with the strengths of factorization models, and it can mimic most factorization models through feature engineering. LibFM is an open source implementation of factorization machines. It is easy to use, requires little specialist knowledge, and includes three learning algorithms: stochastic gradient descent, alternating least squares and MCMC.
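As an illustration of how feature engineering lets a factorization machine mimic matrix factorization, the sketch below evaluates the second-order factorization machine prediction, a global bias plus linear terms plus factorized pairwise interactions, on a one-hot encoding of a (user, item) pair. The parameter values are made up, and this is a plain-Python sketch of the model equation, not LibFM's code.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    x: feature vector, w0: global bias, w: linear weights,
    V: one k-dimensional factor vector per feature.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        # Per-factor identity: the sum over pairs i<j of v_if*v_jf*x_i*x_j
        # equals 0.5 * ((sum_i v_if*x_i)^2 - sum_i (v_if*x_i)^2).
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Features: [user_0, user_1, item_0, item_1], one-hot for (user_0, item_0).
x = [1.0, 0.0, 1.0, 0.0]
w0 = 0.1
w = [0.2, 0.0, 0.3, 0.0]
V = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [0.0, 1.0]]
score = fm_predict(x, w0, w, V)
```

With this encoding only the user's and the item's factor vectors interact, so the prediction reduces to a bias, a user bias, an item bias and their dot product, which is the biased matrix factorization model.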
The tensor factorization models and factorization machines mentioned here both belong to the family of context-aware recommendation algorithms, which extend two-dimensional collaboration to multi-dimensional collaboration. In terms of disciplinary origin, a context-aware recommendation system is both a recommendation system and a context-aware application. Adomavicius, Tuzhilin and colleagues pointed out early on that integrating contextual information into a recommendation system helps improve recommendation accuracy, and they proposed the widely cited concept of "context-aware recommender systems" (CARS). They extended the traditional two-dimensional "user-item" rating utility model into a multi-dimensional one that includes multiple kinds of contextual information. Sun et al. were the first to apply the HOSVD method to web search and proposed the CubeSVD algorithm[14], which uses the user's location as contextual information to rank search engine results and achieves good results. Rendle et al. proposed the RTF algorithm[15]; unlike HOSVD, RTF optimizes for the user's ranking and achieves better accuracy.
Content-based and collaborative-filtering-based recommendation methods each have their own advantages and disadvantages. Most existing systems are hybrids that combine the strengths of different algorithms and models and compensate for their weaknesses, thereby achieving better recommendation accuracy.
4 Recommendation systems in a big data environment
4.1 Features and Challenges
Although recommendation systems have been used successfully in many large systems and websites, in the current big data era their application scenarios are becoming increasingly diverse. Recommendation systems face not only traditional problems such as data sparsity, cold start and interest bias, but also more numerous and more complex practical problems brought by big data. For example, as the number of users grows, the load caused by massive numbers of users accessing the recommendation system at the same time makes the traditional architecture based on a single LVS node no longer adequate. The volume of requests handled by web servers also grows with the data set, and slow web server responses limit the ability of current recommendation systems to serve recommendations over large data sets. In addition, real-time recommendation faces severe challenges on large data sets, because users will not tolerate response times beyond the order of seconds. The single-database storage used by traditional recommendation systems is no longer suitable for large data sets; there is an urgent need for a storage architecture that exposes a unified external interface while using a mix of storage modes internally to hold the various kinds of data files. Moreover, traditional recommendation systems run their algorithms on a single machine, which cannot meet the computing needs of the large data sets generated by massive numbers of users[16]. The complexity, uncertainty and emergent properties of big data itself also pose many new challenges, and traditional recommendation systems have hit serious bottlenecks in time efficiency, space efficiency and recommendation accuracy.
4.2 Key Technologies
4.2.1 Using a distributed file system to manage data
Traditional recommendation system technology mainly handles small files and small-scale computation, and is mostly built on a server-centric architecture: a central server collects large amounts of interaction data, such as browsing history, purchase records and rating records, in order to tailor personalized recommendations for each user. When the data is too large to fit in the server's memory, I/O still becomes a performance bottleneck even if external-memory replacement algorithms and multi-threading are used, so tasks run inefficiently and recommendations take too long to generate. For systems serving massive numbers of users and massive data, a recommendation system based on a centralized server cannot, in terms of time and space complexity, meet the rapidly changing demands placed on recommendation systems in the big data setting[16].
Big data recommendation systems use a distributed file system built on cluster technology to manage data. Establishing an architecture that offers high concurrency, scalability and the ability to handle massive data is critical, as it provides strong support for processing large data sets. The Hadoop Distributed File System (HDFS) architecture is a typical example. Unlike a traditional file system, data files are not stored locally on a single node but are spread over multiple nodes across the network, and the index of file locations is generally managed by one or a few central nodes[16]. When a client reads or writes data, it first obtains the file's location from the central node and then communicates with the nodes in the cluster, reading data from those nodes over the network or writing local data to them. Throughout this process HDFS takes care of redundant storage, splitting of large files, intermediate network communication and recovery from data errors; the client simply calls the interface that HDFS provides, which is very convenient.
4.2.2 Using a distributed computing framework based on cluster technology
There are many frameworks for distributed computing on clusters. MapReduce in Hadoop, used as a platform for parallelizing recommendation algorithms, is both a distributed computing framework and a programming model for distributed parallel computing. It is applied to parallel processing of large-scale data and is a common open source computing framework. The core idea of MapReduce is "divide and conquer": an operation on a large data set is distributed to the worker nodes managed by a master node, which complete it together, and the final result is obtained by combining the intermediate results from each node. The MapReduce framework handles the complex issues of parallel programming, such as distributed storage, job scheduling, load balancing, fault tolerance and network communication, and abstracts the processing into two functions, map and reduce. map breaks a job down into multiple tasks, and reduce aggregates the results of those tasks[16]. For example, in 2010 Zhao et al., addressing the limits that computational complexity places on collaborative filtering in large-scale recommendation systems, implemented an item-based collaborative filtering algorithm on the Hadoop platform. In 2011, to address the problem that a recommendation system could not produce recommendations for a large number of users per second, Jiang et al. divided the three main computation stages of item-based collaborative filtering into four MapReduce stages, each of which can run in parallel across the nodes of the cluster. They also proposed a data partitioning strategy for the Hadoop platform that reduces communication overhead between nodes and improves the efficiency of the recommendation system.
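The map and reduce roles can be sketched in plain Python, with no Hadoop involved. The example counts how often two items appear in the same user's history, a typical building block of item-based collaborative filtering. It is an illustrative single-process imitation of the programming model, not the pipeline of Zhao et al. or Jiang et al., and the function names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user, items):
    """Emit ((item_a, item_b), 1) for every item pair in one user's history."""
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts for one item pair."""
    return key, sum(values)


def cooccurrence(histories):
    emitted = [kv for user, items in histories.items()
               for kv in map_phase(user, items)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


# Hypothetical user histories.
histories = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "c"]}
counts = cooccurrence(histories)
```

Because each call to `map_phase` depends on a single user's history and each call to `reduce_phase` on a single key, both phases could be spread across cluster nodes without changing the logic.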
4.2.3 Parallelization of recommendation algorithms
The recommendation algorithms that many large enterprises need must process very large amounts of data, from the TB level to the PB level or even higher. For example, Tencent's Peacock topic model analysis system must train topic models over as many as one billion documents, millions of words and millions of topics. A matrix of one million words by one million topics alone requires 3 TB of storage, and a matrix of one billion documents by one million topics would require as much as 3 PB[17]. With data on this scale, a traditional serial recommendation algorithm takes far too long. Serial algorithms with high time complexity work acceptably when the data is small, but once the data volume grows rapidly their performance is too low for use in a real recommendation system. Recommendation systems for large data sets should therefore be designed with distributed parallelization of the algorithms in mind, so that the algorithms can run efficiently in massive, distributed and heterogeneous data environments.
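The storage figures can be checked with a few lines of arithmetic. The calculation below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and derives the per-entry size implied by the 3 TB figure; that per-entry size is an inference made here, not a number stated in the source.

```python
TB = 10 ** 12  # assumption: decimal terabyte
PB = 10 ** 15  # assumption: decimal petabyte

words = 10 ** 6
topics = 10 ** 6
documents = 10 ** 9

# Entries in the word-topic matrix and the bytes per entry implied by 3 TB.
word_topic_entries = words * topics
bytes_per_entry = 3 * TB // word_topic_entries

# Applying the same entry size to the document-topic matrix.
doc_topic_entries = documents * topics
doc_topic_bytes = doc_topic_entries * bytes_per_entry
doc_topic_petabytes = doc_topic_bytes // PB
```

Under these assumptions each entry takes about 3 bytes, and the document-topic matrix, with 1000 times as many entries, comes out at 3 PB, consistent with the figures quoted above.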
5 Typical open source recommendation software for big data
5.1 Mahout
Mahout is an open source project under the Apache Software Foundation (ASF). Its main goal is to provide scalable implementations of classic machine learning algorithms, free for developers to use under the Apache license, so that developers can build applications on large-scale data more conveniently and quickly. In addition to common data mining algorithms such as classification and clustering, it includes collaborative filtering (CF), dimensionality reduction, topic models and more. Mahout integrates "Taste", a Java-based recommendation engine for generating personalized recommendations. Taste supports user-based, item-based and Slope One-based recommenders. The recommendation algorithms in Mahout mainly include user-based collaborative filtering (user-based CF), item-based collaborative filtering (item-based CF), alternating least squares (ALS), ALS on implicit feedback, weighted matrix factorization (weighted MF), SVD++ and parallel stochastic gradient descent (parallel SGD).
5.2 Spark MLlib
Spark MLlib implements commonly used machine learning algorithms, including classification and prediction algorithms such as logistic regression, support vector machines and naive Bayes, the K-means clustering algorithm, various gradient descent optimization algorithms, and collaborative filtering recommendation algorithms. MLlib currently supports collaborative filtering based on matrix factorization. The objective function can be optimized with the alternating least squares method or the gradient descent method that MLlib provides, and both explicit and implicit feedback are supported.
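The following is a minimal sketch of matrix-factorization collaborative filtering trained by stochastic gradient descent on explicit ratings. It is plain single-machine Python, not MLlib code; the function names, hyperparameters and toy ratings are assumptions for illustration only.

```python
import random
from math import sqrt

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Fit R ~ P Q^T on observed (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error on the given triples."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
                / len(ratings))

# Hypothetical toy data: users 0-1 like items 0-1, users 2-3 like items 2-3.
full = [[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]]
held_out = {(0, 3), (3, 0)}  # pretend these ratings are unknown
observed = [(u, i, full[u][i]) for u in range(4) for i in range(4)
            if (u, i) not in held_out]

P, Q = factorize(observed, n_users=4, n_items=4)
print(rmse(P, Q, observed), predict(P, Q, 0, 3))
```

Alternating least squares optimizes the same objective but fixes one factor matrix at a time and solves for the other in closed form, which makes each half-step easy to parallelize.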
5.3 EasyRec
EasyRec is an open source project hosted on SourceForge. It is aimed at individual users and provides a recommendation system that has a low entry threshold and is easy to integrate, extend and manage. This open source product includes data entry, data management, recommendation mining, offline analysis and other functions. It can provide recommendation services to several different websites at the same time. A website that needs recommendation services only has to send some user behavior data to EasyRec. EasyRec performs the recommendation analysis in the background and sends the recommendation results back to the website in XML or JSON format. User behavior data includes which products the user viewed, which products the user bought, which products the user rated, etc. EasyRec provides websites with an interface to all of its functions, and the recommendation business can be implemented by calling these interfaces.
5.4 GraphLab
GraphLab started in 2009 as a project developed at Carnegie Mellon University in the United States. It is written in C++, and its main function is to provide a graph-based, high-performance distributed computing framework. GraphLab can efficiently execute the data-dependent iterative algorithms used in machine learning. It provides scalable machine learning algorithm modules for boosted decision trees, deep learning, text analysis, etc., can automatically tune parameters in classification and recommendation models, and integrates with Spark, Hadoop, Apache Avro, ODBC connectors, etc. Because of these distinctive features, GraphLab is well known in the industry. For large-scale data sets, GraphLab is very effective for random walk or other graph-based recommendation algorithms. In addition, GraphLab implements alternating least squares (ALS), stochastic gradient descent (SGD), SVD++, Weighted-ALS, Sparse-ALS, non-negative matrix factorization and other algorithms.
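As an illustration of a graph-based recommendation algorithm of the random walk kind, the sketch below runs a random walk with restart on a small user-item graph. It is single-machine Python and does not use GraphLab's API; the function names, the restart probability and the toy edges are assumptions.

```python
def random_walk_scores(edges, start, restart=0.15, iterations=50):
    """Random walk with restart on an undirected graph.
    edges: list of (user, item) pairs. Returns node -> visit probability."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {node: 0.0 for node in neighbours}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in neighbours}
        for node, mass in scores.items():
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1 - restart) * mass / len(neighbours[node])
            for nb in neighbours[node]:
                nxt[nb] += share
        nxt[start] += restart  # the restarting mass jumps back to the start
        scores = nxt
    return scores

def recommend(edges, user, top_n=2):
    """Rank items the user has not interacted with by visit probability."""
    scores = random_walk_scores(edges, user)
    items = {item for _, item in edges}
    seen = {item for u, item in edges if u == user}
    candidates = sorted(items - seen, key=lambda i: scores[i], reverse=True)
    return candidates[:top_n]

# Hypothetical toy interactions.
toy_edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"),
             ("u2", "i3"), ("u3", "i3"), ("u3", "i4")]
print(recommend(toy_edges, "u1"))
```

Items that are reachable through many short paths from the user receive more probability mass, so i3 (shared with the similar user u2) ranks above i4. Each iteration only touches a node and its neighbours, which is the kind of data-dependent computation a graph framework distributes.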
5.5 Duine
The Duine framework is a set of software libraries written in Java that helps developers build prediction engines. Duine provides hybrid algorithm configuration, that is, it can switch dynamically between content-based recommendation and collaborative filtering depending on the data available. For example, under cold start conditions (for example, when there are no ratings yet), it relies mainly on content-based analysis. The prediction module uses algorithms to extract information from user profiles and product information and to compute predicted values. Its methods mainly include collaborative filtering, instance-based reasoning (based on items to which the user has given similar ratings) and GenreLMS (reasoning over categories). Duine also has a feedback processor module, which aims to improve predictions: it uses learning procedures to collect explicit and implicit feedback from users and, after processing the feedback with its algorithms, uses it to update the user profile [18].
6 Problems faced by big data recommendation system research
6.1 Feature extraction problem
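The switching behaviour described above can be sketched as follows. This is not Duine's API: the function, the `min_ratings` threshold and the stub predictors are hypothetical and only show the control flow of a hybrid that falls back to content-based prediction under cold start.

```python
from typing import Callable, Dict

Predictor = Callable[[str, str], float]  # (user, item) -> predicted rating

def hybrid_predict(user_ratings: Dict[str, Dict[str, float]],
                   cf_predict: Predictor,
                   content_predict: Predictor,
                   user: str, item: str,
                   min_ratings: int = 5) -> float:
    """Use the content-based predictor while the user has fewer than
    min_ratings ratings (cold start), collaborative filtering otherwise."""
    if len(user_ratings.get(user, {})) < min_ratings:
        return content_predict(user, item)
    return cf_predict(user, item)

# Hypothetical stub predictors, standing in for real models.
def stub_cf(user: str, item: str) -> float:
    return 4.0

def stub_content(user: str, item: str) -> float:
    return 3.0

history = {"veteran": {"i%d" % n: 4.0 for n in range(10)}, "newcomer": {}}
print(hybrid_predict(history, stub_cf, stub_content, "newcomer", "i1"))
print(hybrid_predict(history, stub_cf, stub_content, "veteran", "i99"))
```

A hard threshold is the simplest possible policy; a weighted blend whose weights shift as ratings accumulate is a common alternative.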
Recommendation systems deal with a rich variety of recommended objects: text objects such as news and blogs, multimedia objects such as video, pictures and music, and entity objects that can be described by text. How to extract features from these objects has long been a popular research topic in academia and industry. For text objects, features can be extracted with mature text feature extraction techniques from the field of information retrieval. For multimedia objects, feature extraction has to draw on techniques from multimedia content analysis, which still need improvement in both academia and industry. Feature extraction for multimedia objects is therefore a major problem currently faced by recommendation systems [19]. In addition, how well the features discriminate between recommended objects has a very important impact on the performance of the recommendation system, and there is currently no particularly effective method for improving feature discrimination.
6.2 Data sparsity problem
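For text objects, one of the mature information retrieval techniques referred to above is TF-IDF weighting. The sketch below is a minimal version assuming already tokenized documents; the function name and the toy documents are illustrative.

```python
from collections import Counter
from math import log
from typing import Dict, List

def tfidf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """documents: one token list per document.
    Returns one {term: weight} feature vector per document."""
    n = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * log(n / df[term])
                        for term, count in tf.items()})
    return vectors

# Hypothetical toy corpus.
toy_docs = [
    ["news", "system", "sports"],
    ["news", "system", "finance"],
    ["music", "system"],
]
toy_vectors = tfidf(toy_docs)
print(toy_vectors[0])
```

A term that occurs in every document ("system") gets weight zero, while a term unique to one document ("sports") gets the highest weight, which is the discriminating behaviour a recommender wants from item features.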
Most existing recommendation algorithms are based on user-item rating matrix data. The data sparsity problem refers mainly to the sparsity of this matrix, that is, there is too little interaction between users and items. A large website may have hundreds of millions of users and items. Although the total amount of user rating data is soaring, it is still only a very small fraction of the even faster-growing user-item rating matrix. Sparsity is measured here as the fraction of user-item pairs that have a rating. For MovieLens, a classic data set in recommendation system research, it is only 4.5%, and for the movie rating data set provided for the Netflix Prize (the million-dollar contest) it is 1.2%. These are data sets that have already been preprocessed. In real data sets the figure is far below 1%. For example, it is 0.35% for Bibsonomy, 0.046% for Delicious, and only about 0.01% for Taobao data [19]. Experience shows that the more user behavior data a data set contains, the higher the accuracy of the recommendation algorithm and the better its performance. If the data set is very sparse and contains only a very small amount of user behavior data, the accuracy of the recommendation algorithm drops sharply and the algorithm overfits very easily, which hurts its performance.
6.3 Cold start problem
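The percentages quoted above are the ratio of observed ratings to the number of cells in the user-item matrix. A minimal sketch of that calculation follows; the function names and the toy numbers are assumptions, not figures from any of the data sets mentioned.

```python
from typing import Dict

def rating_density(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of user-item pairs that have a rating."""
    return n_ratings / (n_users * n_items)

def density_of(ratings: Dict[str, Dict[str, float]]) -> float:
    """Same ratio computed from a user -> {item -> rating} mapping."""
    items = {item for user_items in ratings.values() for item in user_items}
    n_ratings = sum(len(user_items) for user_items in ratings.values())
    return rating_density(n_ratings, len(ratings), len(items))

# Toy example: 4 users, 5 items, 6 ratings.
print(rating_density(6, 4, 5))

# Hypothetical large site: 10^8 users, 10^8 items, 100 ratings per user.
print(rating_density(100 * 10 ** 8, 10 ** 8, 10 ** 8))
```

The second call illustrates why large sites end up far below 1%: even if every user rates 100 items, only one cell in a million is filled.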
The cold start problem is one of the biggest problems faced by recommendation systems. Cold start problems can generally be divided into three categories: the system cold start problem, the new user problem and the new item problem. The system cold start problem arises when the data is too sparse: the density of the user-item rating matrix is too low, so the accuracy of the recommendations produced by the system is extremely low. The new item problem arises because a new item has no user ratings. Such an item is difficult to recommend through the recommendation system, so users rarely see it and rarely rate it, which creates a vicious circle in which some new items are never recommended effectively. The new item problem affects different recommendation systems to different degrees. For websites where users can find items in many ways, such as movie recommendation systems, it has little impact, because users have many ways to find movies to watch and rate. For websites where recommendation is the main way of discovering items, the new item problem has a serious impact on the recommendation system. The usual solution is to incentivize or hire a small number of users to rate each new item. The new user problem is currently the cold start problem that poses the greatest challenge to real-world recommendation systems. When a new user starts using the system, he or she has not rated any items, so the system cannot make personalized recommendations. Even after the new user has rated a small number of items, the ratings are still too few for the system to give accurate recommendations, and the poor recommendation experience may even cause the user to stop using the system [20]. At present, the new user problem is mainly addressed by combining content-based methods with methods based on user characteristics: by understanding users' statistical (demographic) characteristics and interest characteristics, the system can make more accurate recommendations when users have few or even no ratings.
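One simple way to use user characteristics for a new user is to recommend items liked by existing users with a similar profile. The sketch below assumes hypothetical profile attributes, a hypothetical `min_rating` threshold for "liked", and toy data; it illustrates the idea rather than any specific published method.

```python
from collections import Counter
from typing import Dict, List

def recommend_for_new_user(profile: Dict[str, str],
                           users: Dict[str, Dict[str, str]],
                           ratings: Dict[str, Dict[str, float]],
                           top_n: int = 2,
                           min_rating: float = 4) -> List[str]:
    """Rank items liked by existing users, weighting each vote by the
    number of profile attributes shared with the new user."""
    votes: Counter = Counter()
    for uid, attrs in users.items():
        overlap = sum(1 for key, value in profile.items()
                      if attrs.get(key) == value)
        if overlap == 0:
            continue  # no shared characteristics, ignore this user
        for item, rating in ratings.get(uid, {}).items():
            if rating >= min_rating:
                votes[item] += overlap
    return [item for item, _ in votes.most_common(top_n)]

# Hypothetical toy data.
toy_users = {
    "u1": {"age": "18-25", "city": "Beijing"},
    "u2": {"age": "18-25", "city": "Shanghai"},
    "u3": {"age": "40-50", "city": "Guangzhou"},
}
toy_user_ratings = {
    "u1": {"film_a": 5, "film_b": 2},
    "u2": {"film_a": 4, "film_c": 5},
    "u3": {"film_d": 5},
}
new_user = {"age": "18-25", "city": "Beijing"}
print(recommend_for_new_user(new_user, toy_users, toy_user_ratings))
```

No rating from the new user is needed, so the system can return something better than a global popularity list on the very first visit.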
6.4 Scalability Issues
The scalability problem is another problem faced by recommendation systems. With the advent of the big data era in particular, the numbers of users and items have soared, and the efficiency of traditional recommendation systems drops sharply as the scale of the problem grows. Spending a long time to produce recommendations is unacceptable, especially for online recommendation systems with strict real-time requirements. In a memory-based recommendation system, computing the similarity between users or items takes a lot of time. In a model-based recommendation system, learning the model parameters with machine learning algorithms also takes a lot of time, most of which is spent solving a global optimization problem. To address scalability, the industry generally adopts offline learning with online use: the similarities between users or items, or the model parameters, are first computed from offline data, and the online system then only needs these precomputed values to make recommendations [20]. However, this does not fundamentally improve the efficiency of the recommendation algorithm. Sarwar et al. proposed an incremental SVD collaborative filtering algorithm in 2002: when several new ratings are added to the rating matrix, the system does not need to recompute the entire matrix, and only a small amount of computation is needed to adjust the original model, which greatly speeds up model updates. Several papers have also proposed using clustering to solve the scalability problem. Clustering can effectively reduce the number of users and items to be handled, but it lowers recommendation accuracy to some extent. Researchers have also done a lot of work on speeding up the convergence of the global optimization of the model, for example by proposing parallel stochastic gradient descent and alternating least squares methods.
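The "offline learning, online use" pattern can be sketched as two functions: an expensive offline step that precomputes item-item similarities, and a cheap online step that only performs table lookups. The function names, the cosine similarity and the toy ratings are assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt

def build_item_neighbours(ratings, k=2):
    """Offline step: for each item, precompute its k most similar items
    (cosine similarity over the users who rated them)."""
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            by_item[item][user] = r
    table = {}
    for a, ua in by_item.items():
        sims = []
        for b, ub in by_item.items():
            common = set(ua) & set(ub)
            if a == b or not common:
                continue
            dot = sum(ua[u] * ub[u] for u in common)
            norm = (sqrt(sum(v * v for v in ua.values()))
                    * sqrt(sum(v * v for v in ub.values())))
            sims.append((dot / norm, b))
        sims.sort(reverse=True)
        table[a] = sims[:k]
    return table

def recommend_online(table, user_items, top_n=2):
    """Online step: table lookups and a weighted sum only."""
    scores = defaultdict(float)
    for item, r in user_items.items():
        for sim, other in table.get(item, []):
            if other not in user_items:
                scores[other] += sim * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data.
offline_ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"C": 5, "D": 4},
    "u4": {"B": 2, "C": 4, "D": 5},
}
neighbour_table = build_item_neighbours(offline_ratings)  # run periodically
print(recommend_online(neighbour_table, {"A": 5}))        # run per request
```

The online cost depends only on the number of items in the user's history and on k, not on the total number of users or items. The price is that the table becomes stale between offline runs, which is the gap incremental methods aim to close.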
7 Summary and Outlook
With the rapid development of the Internet, people's demand for personalized information has become very strong, and recommendation systems are a good solution to the "information explosion" problem that users face on the Internet and on e-commerce websites. This article has introduced the emergence and development status of recommendation systems in the Internet big data era; domain requirements and system architecture; user modeling and recommendation engines; the characteristics, challenges and key technologies of recommendation systems in the big data era; open source big data recommendation software; and the problems faced by big data recommendation system research.
The future research directions of big data recommendation systems lie mainly in the following aspects.
From system recommendation to social recommendation. That is, in addition to the user's historical behavior, the recommendation process should also use the user's social network information to improve recommendation quality. Conversely, when recommending people to each other on social networks, users' historical behavior should also be used, so that social network information and historical behavior information reinforce each other.
From accuracy-centered evaluation to evaluation that jointly considers accuracy, diversity and novelty.
From a single data source to a cross-fused data platform, for example using a user's cross-site behavior data to solve the cold start problem on a particular website.
From high-speed servers to parallel processing to cloud computing.
From static recommendation to dynamic incremental recommendation and adaptive recommendation, and from fragile recommendation to robust recommendation.

