How Expedia performs near-real-time analysis of interdependent data sets

Author of this article: CSDN 2015-07-14
This article describes how Expedia, Pariveda, and the AWS team used AWS Lambda, Amazon DynamoDB, Amazon EMR, and Amazon S3 as building blocks to find a novel way of processing data in near real time. You will learn how to implement a similar pipeline without managing any infrastructure.

Batch and real-time processing models for big data already exist. However, no existing model lets us process interdependent data sets in batches in near real time. Expedia's marketing team needs to analyze interdependent data sets so that partners can receive operational guidance. The existing system runs on an on-premises Hadoop cluster, and the team has been struggling to meet its internal SLA (service level agreement). The information is also time-sensitive: getting the data faster means giving partners better operational guidance.

The Pariveda team working with Expedia engaged AWS Solutions Architects to address three distinct challenges: how to deliver analysis results as quickly as possible after the source data becomes available; how to process data sets that depend on each other but are generated at different times; and how to manage dependencies between data sets that arrive at different times.

Removing interdependencies between data sets

One problem we need to solve is the interdependence between data sets. Our goal is to provide a unified, clean data feed that another system can analyze in more detail. To create these feeds, hundreds to thousands of data sets of different kinds, from multiple partners and internal systems, must be ingested, aggregated, and queried every day. The data from each partner arrives at a different time each day. This means processing has to stay pending until all the data required for a feed has arrived.

The solution outlined below is what we arrived at. We use an AWS Lambda function subscribed to an S3 bucket to update tasks defined in DynamoDB. A task definition includes a name, a list of the files it depends on, their status (arrived or not arrived), and the parameters required to run the task in EMR. Once all the files required for a specific task have arrived, the Lambda function queues the task and starts a cluster in EMR. EMR pushes the results back to S3 so that applications using S3 can retrieve them when needed.

Configuring tasks

Tasks are the core of the system. The task object holds all the information used to determine the data dependencies, the status of each dependency, and what is required to produce the results. By defining tasks, you configure all the work that needs to happen. From the task list, you can easily see all data dependencies.

Establish mapping between S3 events and tasks

When we receive an event from S3, the event raised in AWS Lambda only has context about the modified object in S3. Nothing that comes directly from S3 carries any information about the task. But in Node.js, a single line of code does give us one very valuable piece of information: the key of the new S3 object.

var srcKey = event.Records[0].s3.object.key;
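One detail to keep in mind when using this key for lookups: S3 event notifications deliver object keys URL-encoded, with spaces written as `+`, so the raw value may not match a path stored elsewhere. A minimal sketch of normalizing it follows; `extractSourceKey` is a hypothetical helper name, not part of the original code.

```javascript
// Hypothetical helper: returns the decoded key of the first record in an S3 event.
function extractSourceKey(event) {
  const rawKey = event.Records[0].s3.object.key;
  // S3 event notifications URL-encode keys and use '+' for spaces,
  // so restore the spaces first and then decode the percent escapes.
  return decodeURIComponent(rawKey.replace(/\+/g, ' '));
}
```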

From this new key, we need a way to get to the task information. To do this, we created the FileUnit table. This table is essentially an inverted view of the task: it uses the S3 key as the range key and carries the task key as the data payload. This lets us take the source key and find the task it belongs to with a single DynamoDB query.

From here, we can update the task, determine whether all the data it depends on has arrived, and start Amazon EMR.

Resulting flow chart

Create DynamoDB table

We create the following three tables in DynamoDB:

Task table

For the Task table, we use a HashKey/RangeKey primary key configuration, with Date as the hash key and a TaskKey string as the range key. The TaskKey can be any name, as long as it is unique across all your tasks. For the Expedia project, Date as a hash key does not follow the guidelines for time series data, so you may want to create a different, predictable hash key. But if your tasks repeat daily, Date is a good choice of hash key because it makes later lookups easier.

Here is a sample entry:

When creating the table, we only need to care about the TaskKey and Date attributes. However, the input files (note that these are the paths of the files on S3) and ScriptParameters are required for the system as a whole to work. This task was created in the console. In production, this configuration should be loaded from a file on a set schedule, before the data files are loaded.
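As a rough illustration of the shape such an entry might take, here is a minimal sketch in Node.js. Every value below, and the `InputFiles` attribute name used to group the paths, is a hypothetical assumption rather than Expedia's actual configuration.

```javascript
// Hypothetical Task table entry; all names and values are illustrative only.
const sampleTask = {
  Date: '2015-07-14',               // hash key
  TaskKey: 'daily-partner-summary', // range key, unique across tasks
  // Each S3 path is an attribute name; null means the file has not arrived.
  InputFiles: {
    'incoming/2015-07-14/bookings.csv': null,
    'incoming/2015-07-14/clicks.csv': null,
    'incoming/2015-07-14/partners.csv': null
  },
  // Parameters handed to the EMR script once every file is present.
  ScriptParameters: {
    INPUT: 's3://example-bucket/incoming/2015-07-14/',
    OUTPUT: 's3://example-bucket/results/2015-07-14/'
  }
};

// Lists the input files that are still outstanding for a task.
function pendingInputs(task) {
  return Object.keys(task.InputFiles).filter(function (path) {
    return task.InputFiles[path] === null;
  });
}
```

Calling `pendingInputs(sampleTask)` on a fresh entry returns all three paths, which is the state the Lambda function would start from each day under these assumptions.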

FileUnit table

FileUnit is a reference to the Task table, keyed by S3 path. It has three attributes:

  • Date (as hash key)

  • Filename (as range key) – the path to the file on S3

  • Task – the task being referenced

Strictly speaking, Date is not a required attribute of the FileUnit table. In fact, if you can drop it without affecting how your tasks run, that is the better choice, but keeping it makes our description easier to follow. If your task names do not repeat daily, it would be better to use the S3 path as the hash key and the task name as the range key. This lets you query on the hash key alone, and it makes data sets that are shared as dependencies across multiple tasks easier to manage.

Batch table

The Batch table should be created with the Date attribute as the hash key and the Task attribute as the range key. The value of Task is the same as the value of TaskKey in the Task table. For ease of querying, we also added a global secondary index to the Batch table, with Date as the hash key and ProcessingState as the range key. This lets us query unprocessed items easily.
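The Batch table definition described above could be expressed as parameters for DynamoDB's CreateTable operation. This is a sketch under assumptions: the table name `Batch`, the attribute name `Task`, the index name, and the throughput numbers are all hypothetical choices, not values from the original project.

```javascript
// Parameters in the shape accepted by DynamoDB's CreateTable operation.
const batchTableParams = {
  TableName: 'Batch', // assumed table name
  // Only attributes used in a key schema are declared here.
  AttributeDefinitions: [
    { AttributeName: 'Date', AttributeType: 'S' },
    { AttributeName: 'Task', AttributeType: 'S' },
    { AttributeName: 'ProcessingState', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'Date', KeyType: 'HASH' },
    { AttributeName: 'Task', KeyType: 'RANGE' }
  ],
  GlobalSecondaryIndexes: [{
    IndexName: 'Date-ProcessingState-index', // hypothetical index name
    KeySchema: [
      { AttributeName: 'Date', KeyType: 'HASH' },
      { AttributeName: 'ProcessingState', KeyType: 'RANGE' }
    ],
    Projection: { ProjectionType: 'ALL' },
    ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
  }],
  ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
};
```

An object like this would be passed to the `createTable` call of the AWS SDK's DynamoDB client. Querying the index by Date and a given ProcessingState then returns the unprocessed items for that day.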

Test data

For testing, create an entry similar to the one above in the Task table. Make sure the input paths are used as attribute names, with their values set to NULL. Next, take these input paths and create entries in the FileUnit table. Each path must exactly match the value of the Filename column (including case). The value of Task in the FileUnit table must match the task's TaskKey value. Using the sample task above, you would create the following three FileUnit entries:
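One way to keep the two tables consistent is to derive the FileUnit entries from the task itself. The sketch below assumes a task whose input paths live under a hypothetical `InputFiles` map; the task name and paths are illustrative only.

```javascript
// Builds one FileUnit entry per input path of a task, so that Filename and
// Task always match the Task table exactly (including case).
function fileUnitEntriesFor(task) {
  return Object.keys(task.InputFiles).map(function (path) {
    return { Date: task.Date, Filename: path, Task: task.TaskKey };
  });
}

// Hypothetical test task with three input files.
const exampleTask = {
  Date: '2015-07-14',
  TaskKey: 'daily-partner-summary',
  InputFiles: {
    'incoming/2015-07-14/bookings.csv': null,
    'incoming/2015-07-14/clicks.csv': null,
    'incoming/2015-07-14/partners.csv': null
  }
};

const fileUnitEntries = fileUnitEntriesFor(exampleTask);
```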

With the tables created and the test data loaded, we can implement the AWS Lambda function.

Writing the AWS Lambda function

Code skeleton

The code skeleton is very similar to the flowchart shown above.
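The core decision in that flow (look up the task for a file, mark the file as arrived, check whether anything is still missing) can be sketched without any AWS calls. This is not the project's code: the in-memory objects stand in for the FileUnit and Task tables, and the function name, the `InputFiles` map, and the `ARRIVED` marker are assumptions for illustration.

```javascript
// Sketch of the arrival logic using plain objects in place of DynamoDB.
// fileUnits maps an S3 key to { Task }, tasks maps a task key to its entry.
function handleFileArrival(srcKey, fileUnits, tasks) {
  const unit = fileUnits[srcKey];
  if (!unit) {
    // The file does not belong to any configured task.
    return { status: 'ignored' };
  }
  const task = tasks[unit.Task];
  // Mark this dependency as arrived (the real function would call updateItem).
  task.InputFiles[srcKey] = 'ARRIVED';
  const missing = Object.keys(task.InputFiles).filter(function (path) {
    return task.InputFiles[path] === null;
  });
  if (missing.length > 0) {
    return { status: 'waiting', task: unit.Task, missing: missing };
  }
  // All dependencies are present: this is where the task would be added
  // to the Batch table and an EMR cluster started.
  return { status: 'ready', task: unit.Task };
}
```

Keeping this decision in a pure function makes it easy to test with simulated file arrivals before wiring it to S3 events.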

We will gloss over the code that adds completed items to the Batch table, because it is very similar to the updateTask function above. The one difference is that we use the putItem function instead of the updateItem function.

Starting EMR

Starting tasks in EMR is easy. We start an EMR cluster and then add a job flow step. There is a lot of configuration code, but the essence is simple. You must install the right applications to do your work. In this case you must install Hive, so you will see a step for it added when the cluster is launched.

From here, we call the addJobFlowSteps function with the script parameters to add the processing task to the job flow. There is a small conversion step that needs to be done here. You can find the conversion code in the GitHub repository.
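The repository's conversion code is not reproduced here. As a sketch of what such a conversion might look like, the function below turns a map of script parameters into Hive `-d KEY=VALUE` arguments and wraps them in the parameter shape that `addJobFlowSteps` expects. The function name, the use of `command-runner.jar` with `hive-script` (the convention on EMR 4.x and later release labels), and all example values are assumptions.

```javascript
// Builds addJobFlowSteps parameters for running a Hive script on a cluster.
function buildHiveStepParams(jobFlowId, stepName, scriptPath, scriptParameters) {
  // Convert { KEY: value } into ['-d', 'KEY=value', ...]; keys are sorted
  // so the argument order is deterministic.
  const variableArgs = [];
  Object.keys(scriptParameters).sort().forEach(function (key) {
    variableArgs.push('-d', key + '=' + scriptParameters[key]);
  });
  return {
    JobFlowId: jobFlowId,
    Steps: [{
      Name: stepName,
      ActionOnFailure: 'CONTINUE',
      HadoopJarStep: {
        Jar: 'command-runner.jar', // assumption: EMR 4.x+ style step
        Args: ['hive-script', '--run-hive-script', '--args', '-f', scriptPath]
          .concat(variableArgs)
      }
    }]
  };
}
```

The returned object would be passed to the EMR client's `addJobFlowSteps` call once the cluster ID is known.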

Deploying the AWS Lambda function

To deploy the application as an AWS Lambda function, you need to:

1. Download the source code from GitHub.

2. Use npm install to install the async dependency.

3. In the FunctionConstants.js file, update the value of logsPath so that it points to a bucket and path prefix where you want EMR to place the log files.

4. Package the function and deploy it, as shown in this example or walkthrough.

5. Make sure your Lambda execution IAM role has the following permissions:

a. In DynamoDB – permission to call the getItem, updateItem, and putItem functions

b. In EMR – permission to call the runJobFlow and addJobFlowSteps functions

Quick test in the console

Once all parameters are configured correctly, you can open the Edit/Test page of the AWS Lambda function in the AWS console and simulate a new file being added to S3.

Using the S3 sample event, modify the values in the s3 and object sections so that the event simulates the arrival of a file that belongs to the task.
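If you prefer to generate the test event rather than edit it by hand, a minimal sketch could look like the following. It includes only a small subset of the fields found in a real S3 notification, and the helper name, bucket, and key are hypothetical.

```javascript
// Builds a minimal S3 "object created" event for a given bucket and key.
// Real notifications carry more fields; only the commonly read ones appear here.
function makeS3PutEvent(bucketName, objectKey) {
  return {
    Records: [{
      eventSource: 'aws:s3',
      eventName: 'ObjectCreated:Put',
      s3: {
        bucket: { name: bucketName },
        object: { key: objectKey }
      }
    }]
  };
}

// Example: simulate the arrival of one of the task's input files.
const testEvent = makeS3PutEvent('example-bucket', 'incoming/2015-07-14/bookings.csv');
```

`JSON.stringify(testEvent, null, 2)` gives text that can be pasted into the console's test event editor.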

In the Execution results window, you should see the message "Files processed successfully".

Publishing S3 bucket events to the AWS Lambda function

From the console, select the Add event source option from the Actions menu and add the Object Created event from the S3 bucket to the AWS Lambda function, setting the event source to the S3 bucket.

End-to-end testing

Now that everything is ready, you can create new files in the S3 bucket. You should see a message similar to the screenshot below:

If the EMR cluster has not started, you can view the CloudWatch logs generated by the function to identify the problem. You can also simulate each file arrival through the console to determine where the function hits an error.

Congratulations, you have set up a working system!

Optimization

There are several ways to optimize this system, depending on your use case. The following are some examples.

One file for multiple tasks

If multiple tasks start from the same file, you must adjust the FileUnit table accordingly, and then adjust the task lookup to handle the structural change. You can use the format described above (with the file name as the hash key), or you can keep the current format but set the Task value of the entry to a set of values instead of a single value.
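For the second option, the lookup code has to accept both shapes while entries are being migrated. A small sketch, assuming the set is represented as an array once read into JavaScript and that the attribute is named `Task` (both assumptions):

```javascript
// Returns the task keys referenced by a FileUnit entry, whether its Task
// attribute holds a single value or a set (modelled here as an array).
function tasksForFileUnit(fileUnit) {
  return Array.isArray(fileUnit.Task) ? fileUnit.Task.slice() : [fileUnit.Task];
}
```

The caller would then update every returned task and check each one for completeness, instead of assuming a single task per file.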

Cleaning up tasks

If your tasks are small and you expect data to arrive at a certain frequency, you can set the batch size to a value greater than 1 (this value is set in the configuration file). If you do, you may want to add a cleanup function that runs on a timer and processes any tasks that have not yet been run. This way you gain efficiency, and pending tasks do not have to wait too long for the batch limit to be reached before running.
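The trade-off between batch size and waiting time can be captured in one small decision function. This is a sketch under assumptions: the `batchSize` and `maxWaitMs` configuration names and the idea of tracking when the oldest pending task was queued are illustrative, not taken from the original configuration file.

```javascript
// Decides whether the pending tasks should be submitted now.
// config.batchSize: number of tasks that triggers an immediate run.
// config.maxWaitMs: longest time a pending task may wait before a
//                   timer-driven cleanup submits it anyway.
function shouldRunBatch(pendingCount, oldestQueuedAtMs, nowMs, config) {
  if (pendingCount === 0) {
    return false; // nothing to do
  }
  if (pendingCount >= config.batchSize) {
    return true; // batch limit reached
  }
  // Batch is not full: run only if the oldest task has waited long enough.
  return nowMs - oldestQueuedAtMs >= config.maxWaitMs;
}
```

The Lambda function would call this after each arrival, and the timer-driven cleanup function would call it on its own schedule, so a partially filled batch is never stranded.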


This article comes from CSDN, and the copyright belongs to the original author.