Batch and real-time processing models for big data already exist. However, no existing model lets us batch-process interdependent data sets in real time. To give partners operational guidance, Expedia's marketing team needs to analyze interdependent data sets. The existing system runs on an on-premises Hadoop cluster, but the team has been struggling to meet its internal SLA (service level agreement). The information is also time-sensitive: getting data faster means giving partners better operational guidance.
The Pariveda team working with Expedia collaborated with AWS Solutions Architects to address three distinct challenges: how to deliver analysis results as quickly as possible after the source data becomes available; how to process data sets that depend on each other but are generated at different times; and how to manage dependencies between data sets that arrive at different times.
In this article, I describe how Expedia, Pariveda, and the AWS team used AWS Lambda, Amazon DynamoDB, Amazon EMR, and Amazon S3 as building blocks to find a unique way to process data in real time. You will learn how to implement a similar delivery pipeline without managing any infrastructure.
Removing interdependencies between data sets
One problem we needed to solve was the interdependence between data sets. Our goal is to provide a unified, clean data feed to another system for more detailed analysis. To create these data feeds, hundreds or thousands of data sets of different kinds, from multiple partners and internal systems, must be ingested, aggregated, and queried every day. Every day, the data from each partner arrives at a different time. This means that processing has to wait until all the data required for a given feed has arrived.
The solution outlined below is what we came up with. We use an AWS Lambda function subscribed to an S3 bucket to update tasks defined in DynamoDB. A task definition includes a name, a list of dependent files, their status (arrived or not arrived), and the parameters required to run the task in EMR. Once all files required for a specific task have arrived, the Lambda function updates the task queue and starts a cluster in EMR. EMR pushes the results back to S3 so that applications using S3 can retrieve them when needed.
Configuring tasks
Tasks are the core of the system. A task object stores all the information used to determine data dependencies, the status of those dependencies, and what is required to process the data once it is complete. By defining tasks, you configure all the work that needs to happen. From the task list, you can easily see all data dependencies.
Establish mapping between S3 events and tasks
When we receive an event from S3, the event passed to AWS Lambda only carries context about the object that was modified in S3. The data fetched directly from S3 carries no information about the task it belongs to. But the event does give us one very valuable piece of information, available in Node.js with a single line of code: the key of the new S3 object.
var srcKey = event.Records[0].s3.object.key;
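One detail worth handling when reading the key: S3 URL-encodes object keys in event notifications, and spaces arrive as `+`. A minimal sketch of decoding the key before using it for lookups (the helper name `getSourceKey` is hypothetical, not taken from the original code):

```javascript
// Hypothetical helper: return the decoded key of the first record in an S3 event.
function getSourceKey(event) {
  const rawKey = event.Records[0].s3.object.key;
  // Keys are URL-encoded in S3 event notifications; '+' stands for a space.
  return decodeURIComponent(rawKey.replace(/\+/g, ' '));
}
```

Without this step, a key that contains spaces or special characters would not match the path stored in DynamoDB.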
From this new key, we need a way to find the task information. For this purpose, we created the FileUnit table. This table is essentially the task table inverted: it uses the S3 key as the range key and carries the task key as the data payload. Given the source key, we can find the task it belongs to with a single DynamoDB query.
From here, we can update the task, determine whether all dependent data has arrived, and start Amazon EMR.
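The flow described above can be sketched without any AWS calls. In this minimal sketch, plain objects stand in for the FileUnit and task tables, and every name and value is hypothetical:

```javascript
// In-memory stand-ins for the FileUnit and task tables (hypothetical data).
const fileUnits = {
  'incoming/a.csv': 'daily-report',
  'incoming/b.csv': 'daily-report'
};
const tasks = {
  'daily-report': { files: { 'incoming/a.csv': false, 'incoming/b.csv': false } }
};

// Mark a file as arrived and report whether its task is ready to run.
function handleArrival(srcKey) {
  const taskKey = fileUnits[srcKey];        // FileUnit lookup: S3 key -> task key
  if (taskKey === undefined) return null;   // the file does not belong to any task
  const task = tasks[taskKey];
  task.files[srcKey] = true;                // update the task
  const ready = Object.values(task.files).every(Boolean);
  return { taskKey: taskKey, ready: ready }; // the caller starts EMR when ready is true
}
```

In the real system each of these steps would be a DynamoDB call, but the decision logic is the same: look up, update, then check for completeness.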
Generated flow chart
Create DynamoDB table
We create the following three tables in DynamoDB:
Task table
For the task table, we use a hash key/range key primary key, with the date as the hash key and the TaskKey string as the range key. A task can have any name, as long as the name is unique across all your tasks. For the Expedia project, using Date as the hash key does not follow the guidelines for time series data, so you may want to create another predictable hash key. But if your tasks repeat daily, Date is a good choice for the hash key because it simplifies later lookups.
Here is a sample entry:
When creating the table, we only need to define the TaskKey and Date attributes. But the input files (note that these are the paths of the files on S3) and ScriptParameters are required for the system to work. This task was created in the console. In production, this configuration should be loaded from a file at a set frequency before the data files arrive.
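As an illustration, the key layout described above could be expressed as `createTable` parameters, followed by one task item in DynamoDB's low-level attribute format. The table name, throughput values, file paths, and the shape of ScriptParameters are all assumptions made for this sketch:

```javascript
// Parameters for DynamoDB createTable; the table name 'Task' is an assumption.
const taskTableParams = {
  TableName: 'Task',
  AttributeDefinitions: [
    { AttributeName: 'Date', AttributeType: 'S' },
    { AttributeName: 'TaskKey', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'Date', KeyType: 'HASH' },     // hash key
    { AttributeName: 'TaskKey', KeyType: 'RANGE' }  // range key
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 }
};

// Hypothetical task item: each input path is an attribute whose value stays
// NULL until the file arrives.
const sampleTask = {
  Date: { S: '2015-06-01' },
  TaskKey: { S: 'daily-report' },
  'incoming/a.csv': { NULL: true },
  'incoming/b.csv': { NULL: true },
  ScriptParameters: { S: 'INPUT=s3://example-bucket/incoming/' } // assumed shape
};
```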
FileUnit table
FileUnit references the task table by S3 path. It has three attributes:
Date (as hash key)
Filename (as range key) – the path to the file on S3
Task – the task to be referenced
In practice, Date is not a required attribute of the FileUnit table. In fact, if you can drop this attribute without affecting how your tasks run, that is the better choice, but keeping it makes our description easier to follow. If your task names do not repeat daily, it is better to use the S3 path as the hash key and the task name as the range key. This lets you query on the hash key alone, which makes it easier to manage data sets that several tasks depend on.
Batch table
Create the Batch table with the Date attribute as the hash key and the Task attribute as the range key. The value of Task is the same as the value of TaskKey in the task table. To make querying easier, we also added a global secondary index to the Batch table, with Date as the hash key and ProcessingState as the range key. This lets us query unprocessed items easily.
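A query against that index might be parameterized as follows. This is a sketch only: the table name, index name, and state value are assumptions, and the parameters use the low-level DynamoDB `query` format.

```javascript
// Build DynamoDB query parameters for batch items in a given state on a given date.
// Table, index, and state names are assumptions for illustration.
function buildStateQuery(date, state) {
  return {
    TableName: 'Batch',
    IndexName: 'Date-ProcessingState-index',
    KeyConditionExpression: '#d = :d AND ProcessingState = :s',
    // DATE is a DynamoDB reserved word, so it is aliased in the expression.
    ExpressionAttributeNames: { '#d': 'Date' },
    ExpressionAttributeValues: { ':d': { S: date }, ':s': { S: state } }
  };
}
```

For example, `buildStateQuery('2015-06-01', 'Pending')` would return the parameters for listing that day's unprocessed items.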
Test task
For testing, create an entry similar to the one above in the task table. Make sure each input path is specified as an attribute name, with its value set to NULL. Next, take these input paths and create entries in the FileUnit table. Each path must exactly match the value of the Filename column (including case). The value of Task in the FileUnit table must match the task's TaskKey value. For the sample task above, you would create the following three FileUnit entries:
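The FileUnit entries mirror the input paths of the task, so they can be derived mechanically. A minimal sketch, assuming that every attribute other than Date, TaskKey, and ScriptParameters is an input path (the helper name and sample paths are hypothetical):

```javascript
// Derive FileUnit entries from a task item. Attributes other than the keys and
// ScriptParameters are treated as input file paths (an assumption).
function toFileUnits(task) {
  const reserved = ['Date', 'TaskKey', 'ScriptParameters'];
  return Object.keys(task)
    .filter(function (name) { return reserved.indexOf(name) === -1; })
    .map(function (path) {
      return { Date: task.Date, Filename: path, Task: task.TaskKey };
    });
}
```

A task with three input paths yields three entries, one per path, each pointing back to the same TaskKey.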
Once the tables are created and the test task is loaded, we can implement the AWS Lambda function.
Write AWS Lambda function
Code Framework
The code skeleton is very similar to the flowchart shown above.
We will gloss over the code that adds completed items to the Batch table, because it is very similar to the updateTask function above. The one difference is that it uses the putItem function instead of the updateItem function.
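For orientation, the parameters passed to `putItem` might look like the sketch below. The table name, the Task attribute name, and the initial state value are assumptions; the repository code may differ.

```javascript
// Build putItem parameters for a new Batch entry (low-level attribute format).
// 'Batch', 'Task', and 'Pending' are assumed names for illustration.
function buildBatchPutParams(date, taskKey) {
  return {
    TableName: 'Batch',
    Item: {
      Date: { S: date },
      Task: { S: taskKey },
      ProcessingState: { S: 'Pending' }
    }
  };
}
```

Unlike updateItem, which modifies individual attributes, putItem writes the whole item and replaces any existing item with the same key.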
Start EMR
Starting a task in EMR is easy. We start an EMR cluster, then add a job flow step. There is a lot of configuration code, but the essence is simple. You must install the right applications to do your work; in this case you must install Hive, so you will see a step for it added to the job flow when you launch EMR.
From here, we call the addJobFlowSteps function with the script parameters to add the processing task to the job flow. A small conversion step is needed here; you can find the conversion code in the GitHub repository.
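The repository's conversion code is not reproduced here. As a rough sketch of what such a conversion can look like, the following turns a map of script parameters into Hive `-d NAME=value` arguments and wraps them in `addJobFlowSteps` parameters. The helper names, the step name, the parameter map format, and the use of `command-runner.jar` are assumptions for illustration:

```javascript
// Turn a map of script parameters into Hive '-d NAME=value' arguments.
function toHiveArgs(scriptParameters) {
  const args = [];
  Object.keys(scriptParameters).sort().forEach(function (name) {
    args.push('-d', name + '=' + scriptParameters[name]);
  });
  return args;
}

// Build addJobFlowSteps parameters for a single Hive script step.
function buildHiveStep(clusterId, scriptPath, scriptParameters) {
  return {
    JobFlowId: clusterId,
    Steps: [{
      Name: 'Run Hive script',
      ActionOnFailure: 'TERMINATE_CLUSTER',
      HadoopJarStep: {
        Jar: 'command-runner.jar',
        Args: ['hive-script', '--run-hive-script', '--args', '-f', scriptPath]
          .concat(toHiveArgs(scriptParameters))
      }
    }]
  };
}
```

Sorting the parameter names keeps the generated argument list stable between runs, which makes steps easier to compare in the EMR console.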
Deploy AWS Lambda function
To deploy the application as an AWS Lambda function, you need to:
1. Download the source code from GitHub.
2. Run npm install to install the async dependency.
3. In the FunctionConstants.js file, update the value of logsPath so that it points to the bucket and path prefix where you want EMR to place the log files.
4. Package and deploy the function, as shown in this example or walkthrough.
5. Make sure your Lambda execution IAM role has the following permissions:
a. In DynamoDB – permission to call the getItem, updateItem, and putItem functions
b. In EMR – permission to call the runJobFlow and addJobFlowSteps functions
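The permissions in step 5 can be written as an IAM policy document. The sketch below uses the IAM action names for the DynamoDB item calls and the EMR job flow calls; the resource scoping is deliberately broad and is an assumption you would tighten to your own tables. Launching a cluster usually also requires permission to pass the EMR roles, and querying a table requires its own action; neither is shown here.

```javascript
// Sketch of an execution-role policy; resource ARNs are placeholders to narrow down.
const lambdaExecutionPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: ['dynamodb:GetItem', 'dynamodb:UpdateItem', 'dynamodb:PutItem'],
      Resource: 'arn:aws:dynamodb:*:*:table/*'
    },
    {
      Effect: 'Allow',
      Action: ['elasticmapreduce:RunJobFlow', 'elasticmapreduce:AddJobFlowSteps'],
      Resource: '*'
    }
  ]
};

// Serialize for use in the console or a deployment script.
const lambdaExecutionPolicyJson = JSON.stringify(lambdaExecutionPolicy, null, 2);
```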
Quick test on console
After making sure that all parameters are configured correctly, you can open the Edit/Test page of the AWS Lambda function in the AWS console and simulate a new file being added to S3:
Using the S3 sample event, modify the values in the s3 and object sections to simulate the arrival of a file that belongs to a task.
In the Execution results window, you should see the message Files processed successfully.
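For reference, a trimmed test event might look like the following. It keeps only the fields a function like this one would typically read; the bucket name and key are hypothetical and must match a Filename value in your FileUnit table.

```javascript
// Trimmed S3 "object created" test event; bucket and key are placeholders.
const sampleEvent = {
  Records: [{
    eventSource: 'aws:s3',
    eventName: 'ObjectCreated:Put',
    s3: {
      bucket: { name: 'example-bucket' },
      object: { key: 'incoming/a.csv', size: 1024 }
    }
  }]
};
```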
Publish S3 bucket events to AWS Lambda function
From the console, choose Add event source from the Actions menu and add the Object Created event from the S3 bucket to the AWS Lambda function, setting the event source information for the S3 bucket:
End-to-end testing
Now that everything is ready, you can create new files in the S3 bucket. You should see a message similar to the screenshot below:
If the EMR cluster has not started, you can review the CloudWatch logs generated by the function to identify the problem. You can also simulate each file arrival through the console to find where the function encounters an error.
Congratulations, you have set up a working system!
Optimization
There are several ways to optimize this system, depending on your use case. Here are a few.
One file for multiple tasks
If multiple tasks depend on the same file, you must adjust the FileUnit table accordingly, then adjust the task query parameters to handle the structural change. You can use the format described above (with the file name as the hash key), or you can keep the current format but store the task attribute of each entry as a set of values instead of a single value.
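The second option can be sketched in memory as follows. Each file maps to a list of task keys, and an arrival is recorded against every task that uses the file; all names and data here are hypothetical.

```javascript
// Hypothetical FileUnit layout: one file maps to several task keys.
const sharedFileUnits = {
  'incoming/a.csv': ['daily-report', 'weekly-rollup'],
  'incoming/b.csv': ['daily-report']
};
// Files each task is still waiting for.
const pendingFiles = {
  'daily-report': new Set(['incoming/a.csv', 'incoming/b.csv']),
  'weekly-rollup': new Set(['incoming/a.csv'])
};

// Record an arrival against every task that uses the file; return tasks now ready.
function handleSharedArrival(srcKey) {
  const ready = [];
  (sharedFileUnits[srcKey] || []).forEach(function (taskKey) {
    pendingFiles[taskKey].delete(srcKey);
    if (pendingFiles[taskKey].size === 0) ready.push(taskKey);
  });
  return ready;
}
```

A single arrival can therefore complete several tasks at once, so the caller should be prepared to queue more than one task per event.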
Cleanup task
If your tasks are small and you expect data to arrive at a steady rate, you can set the batch size to a value greater than 1 (this value is set in the configuration file). If you do, you may want to add a cleanup function that runs on a timer and flushes any tasks that have not yet been run. This way you gain efficiency, and pending tasks do not have to wait too long for the batch limit to be met before running.
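The selection logic of such a cleanup function could be as simple as the sketch below, which picks out pending entries that have waited longer than a threshold. The entry fields (`state`, `queuedAtMs`) and the state value are assumptions for illustration.

```javascript
// Return pending batch entries that have waited longer than maxWaitMs.
// The entry shape { task, state, queuedAtMs } is hypothetical.
function findStaleEntries(entries, nowMs, maxWaitMs) {
  return entries.filter(function (entry) {
    return entry.state === 'Pending' && nowMs - entry.queuedAtMs > maxWaitMs;
  });
}
```

A timer-driven function would call this with the current time and then submit the returned entries to EMR even though the batch limit has not been reached.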