Innovation from the Big Data Challenges

Experience of TransLab (2011-2013)

The TransLab team has participated successfully in various Big Data Challenges, such as the WISE Challenge (Weibo Performance Track 2012 [1] and Entity Linking Track 2013 [2]) and the Alibaba 2013 Challenge [3].

In terms of scale, we processed 101 GB of data from Twitter, 75 GB from Sina Weibo, 2 GB from Alibaba (within Alibaba's cloud computing environment), and the Wikilinks dataset, which comprises 40 million Wikipedia mentions and over 3 million entities.

In terms of variety, we analyzed data from online social networks (Twitter [4], ScienceNet Blog [5], Sina and Tencent Weibo), e-commerce (Alibaba) and encyclopedias (Baidu Baike and Wikipedia [2]).

In the course of these studies, we developed a series of new methods and solutions, including:

  • the Follow Model, which describes the logical relationships between users on Twitter and Weibo [1, 6];
  • the W-entropy index, which measures a user's influence while accounting for the uneven distribution of information across Internet platforms [7];
  • a system that uses a straightforward two-fold unsupervised strategy to extract and tag entities in Wiki texts [2];
  • the E-commerce Game Model, which simulates the negotiation between the hosting company and its vendors, seeking the largest payoff through Nash equilibrium [3].

Here, we briefly present these projects, concentrating on the analytical skills and innovative methods drawn from our Big Data experience.

One of the harder tasks in Big Data research is obtaining suitable data. This problem can be addressed by joining Big Data Challenges, where the organizers make data available to participants. Over the last three years, we have taken part in the WISE Challenges, as winners of the Weibo Performance Track 2012 and the Entity Linking Track 2013, and in the Alibaba 2013 Challenge.

The goals of our research cover the following aspects:

  1. Practical Experience: sharing with attendees our experience in various Big Data Challenges, such as the WISE Challenge (Weibo Performance Track 2012 [1] and Entity Linking Track 2013 [2]) and the Alibaba 2013 Challenge [3].
  2. Variety of Data Sources: presentation of Big Data problem solving and analytical skills for diverse data from online social networks (Twitter and Weibo), e-commerce (Alibaba) and encyclopedia text processing (Baidu Baike and Wikipedia).
  3. Innovative Methods: presentation of how we developed new solutions such as the Follow Model for Twitter and Weibo and the W-entropy index for measuring user influence.

From our research, readers will gain the basic knowledge and skills needed to solve Big Data problems related to online social networks, e-commerce and encyclopedias.

Because our tutorial focuses on practical experience, a variety of data sources and newly developed innovative methods in Big Data, it closely matches the spirit of IEEE Big Data 2013, and it will pass on our hands-on experience to attendees.

• General description of research content:

           1) Introduction

The WISE Challenge (Weibo Performance Track 2012 and Entity Linking Track 2013) and the Alibaba 2013 Challenge.

           2) Twitter/Weibo query performance analyses [1,6]

Description of the network and tweet data (101 GB from Twitter [5], 75 GB from Weibo [1]); the Follow Model for representing user relationships; a MapReduce solution for querying Twitter data; Hadoop implementation.
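To make the MapReduce idea concrete, the following is a minimal in-memory sketch of the map, shuffle and reduce steps for one plausible top-X query: "the X most-followed users". The edge format (follower, followee), the function names and the query itself are illustrative assumptions, not the implementation from [1, 6]; on Hadoop, the shuffle step is performed by the framework rather than by user code.

```python
from collections import defaultdict
import heapq

# Hypothetical input: each record is a directed follow edge (follower, followee).
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("erin", "dave"),
]

def map_phase(edge):
    """Emit (followee, 1) for every follow edge."""
    _follower, followee = edge
    yield followee, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the 1s to obtain the follower count of one user."""
    return key, sum(values)

def top_x_followed(edges, x):
    intermediate = (pair for edge in edges for pair in map_phase(edge))
    counts = [reduce_phase(k, vs) for k, vs in shuffle(intermediate).items()]
    # Keep the X users with the most followers; ties are broken by user name.
    return heapq.nsmallest(x, counts, key=lambda kv: (-kv[1], kv[0]))

print(top_x_followed(edges, 2))  # [('bob', 3), ('carol', 2)]
```

Because the map and reduce functions are independent per record and per key, the same logic could be distributed across many workers without changing the result.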

           3) Wiki Entity Extraction from Plain-Text Collections [2]

Wiki entity linking (3 million entities, 40 million mentions); proper nouns and concrete concepts; manual and automatic labeling; a straightforward two-fold unsupervised strategy to extract and tag entities; precision analysis.
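The details of the two-fold strategy from [2] are not given here, so the following is only an illustrative sketch under our own assumption that the two passes are (1) collecting candidate mentions as runs of capitalized tokens, as a rough proxy for proper nouns, and (2) tagging a candidate only if it matches a known entity title. The dictionary ENTITY_TITLES and all function names are hypothetical.

```python
import re

# Hypothetical entity dictionary mapping surface forms to entity titles;
# in practice such a list could be derived from Wikipedia page titles.
ENTITY_TITLES = {
    "Sina Weibo": "Sina_Weibo",
    "Wikipedia": "Wikipedia",
}

# A token starts with a letter; digits and punctuation-only strings are skipped.
TOKEN = re.compile(r"[A-Za-z][\w'-]*")

def candidate_mentions(text):
    """Pass 1 (assumed): maximal runs of capitalized tokens become candidates."""
    mentions, current = [], []
    for token in TOKEN.findall(text):
        if token[0].isupper():
            current.append(token)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions

def link_entities(text, titles):
    """Pass 2 (assumed): keep only candidates found in the entity dictionary."""
    return [(m, titles[m]) for m in candidate_mentions(text) if m in titles]

text = "The challenge used data from Sina Weibo and Wikipedia."
print(candidate_mentions(text))  # ['The', 'Sina Weibo', 'Wikipedia']
print(link_entities(text, ENTITY_TITLES))
```

The second pass discards the sentence-initial "The", showing why a cheap, high-recall first pass is usually paired with a stricter filter.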

           4) W-entropy index for measuring user influence [7]

Measurement of user influence; Shannon entropy; uneven information distribution; definition of the W-entropy index; application to data from Google, YouTube, Twitter and Facebook.
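The W-entropy index builds on Shannon entropy, H = -Σ p_i log2 p_i, where p_i is the share of activity on platform i. Its exact definition from [7] is not reproduced here; the sketch below computes only the Shannon entropy ingredient, to show how an uneven distribution lowers entropy. The platform counts are hypothetical.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution given by non-negative counts."""
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:  # zero shares contribute nothing (0 * log 0 is taken as 0)
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical activity counts of one user across four platforms.
even = [25, 25, 25, 25]
uneven = [97, 1, 1, 1]
print(shannon_entropy(even))    # 2.0, the maximum for four platforms
print(shannon_entropy(uneven))  # much lower: activity is concentrated on one platform
```

An index that corrects for uneven distribution therefore has to look beyond raw entropy, which is the gap the W-entropy index is described as addressing.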

           5) Conclusions

A brief introduction to the E-commerce Game Model for Alibaba [3], and conclusions.
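The E-commerce Game Model relies on Nash equilibrium. As a generic illustration of that concept only, the following sketch enumerates the pure-strategy Nash equilibria of a toy simultaneous-move game between a platform and a vendor. The strategy names and payoff numbers are invented for illustration and are not taken from [3].

```python
from itertools import product

# Hypothetical strategies and payoffs: (platform payoff, vendor payoff).
PLATFORM_MOVES = ["low_fee", "high_fee"]
VENDOR_MOVES = ["join", "stay_out"]
PAYOFFS = {
    ("low_fee", "join"): (3, 4),
    ("low_fee", "stay_out"): (0, 0),
    ("high_fee", "join"): (4, 1),
    ("high_fee", "stay_out"): (0, 0),
}

def pure_nash_equilibria(row_moves, col_moves, payoffs):
    """Return every (row, col) pair where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(row_moves, col_moves):
        row_pay, col_pay = payoffs[(r, c)]
        row_best = all(payoffs[(alt, c)][0] <= row_pay for alt in row_moves)
        col_best = all(payoffs[(r, alt)][1] <= col_pay for alt in col_moves)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

print(pure_nash_equilibria(PLATFORM_MOVES, VENDOR_MOVES, PAYOFFS))
# [('high_fee', 'join')]
```

With these made-up payoffs, the platform prefers the high fee whenever the vendor joins, and the vendor still earns more by joining than by staying out, so (high_fee, join) is the only pure equilibrium. Changing the fee payoffs shifts the equilibrium, which is the kind of balance between service fees and vendor profitability that a fuller model would explore.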

References

[1] Edans F. de O. Sandes, Li Weigang, Alba C. M. A. de Melo: Logical Model of Relationship for Online Social Networks and Performance Optimizing of Queries. WISE 2012 Challenge, T1: Performance Track, Scalability Winner. In: 13th International Conference on Web Information Systems Engineering (WISE 2012), pp. 726-736, 2012.

[2] Carolina Abreu, Flavio Costa, Laecio Santos, Lucas Monteiro, Luiz Peres, Patricia Lustosa, Li Weigang: Entity Extraction within Plain-Text Collections. WISE 2013 Challenge, T1: Entity Linking Track. Accepted to the 14th International Conference on Web Information Systems Engineering (WISE 2013), Nanjing, 2013.

[3] Zheng Jianya, Daniel L. Li, Patrick K. Cisuaka, Li Weigang: E-commerce Game Model: Balancing Platform Service Fees with Vendor Profitability. Submitted to the 2013 Chinese Conference of Complex Networks (CCCN 2013), Hangzhou, China, 2013.

[4] J. Yang, J. Leskovec: Patterns of Temporal Variation in Online Media. In: ACM International Conference on Web Search and Data Mining (WSDM 2011), pp. 177-186, 2011.

[5] Weigang Li, Jianya Zheng: Using W-Entropy Rank as a Unified Reference for Search Engines and Blogging Websites. In: WEBIST 2012 (Selected Papers), pp. 252-266.

[6] Zheng Jianya, Li Weigang, Lorna Uden: Top-X Querying in Online Social Networks with MapReduce Solution. Accepted to the 2013 Eighth International Knowledge Management in Organizations Conference: Social and Big Data Computing for Knowledge Management (KMO 2013), Taiwan, 2013.

[7] Li Weigang, Zheng Jianya, Daniel Li: Analysis of W-entropy Index: the Impact of Members on Social Networks. In: Proceedings of the IADIS International Conference WWW/Internet, Rio de Janeiro, Brazil, pp. 171-178, 2011. Best Paper Award.