Friday, September 21, 2012

Promising BigQuery platform

Big Data is a growing buzzword that many companies are leveraging to define various business strategies. My interpretation, in simple language: how to manage (store, control access to, read, analyse) massive amounts of information or data, generated on the order of tera-, peta-, exa- or zettabytes.

Google, like other major technology companies such as EMC, IBM, Microsoft and Yahoo, has been dealing with data of this size. Google developed an internal engine called Dremel, a scalable, interactive ad-hoc query system for analysis of read-only nested data sets. This engine makes large data sets feel small by returning meaningful results to the application quickly. BigQuery wraps this engine with APIs for developers to use. Wired.com has a good article on this.

My team gets excited about any new offering from Google, especially one for its enterprise customers, so we got our hands dirty to learn what this one is all about. Reading through the developers site and watching the I/O event session got us started, followed by a small POC done by me and the team. Here is what we learned.

Architecture
Existing Models
Relational databases work very well when queries are executed against primary and foreign keys, because the indexes on those keys are typically stored as B-trees. For example, customer IDs form the basis for profile information, transactional data and similar.

Challenge
When a query seeks information based on a column that is not a primary or foreign key, the entire table has to be scanned to retrieve the result set. For example, to identify the list of users from a particular geography, the data set is indexed on customer ID and not on geography, so the entire table has to be scanned to check each row's geography. With Big Data (at least terabytes of records), this gets expensive and time-consuming.
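The difference between a keyed lookup and a full scan can be sketched in a few lines of Python. This is only an illustration: the records and field names are made up, and a dict stands in for the primary-key index (a real database would typically use a B-tree, not a hash table).

```python
# Hypothetical customer records; field names are illustrative only.
customers = [
    {"customer_id": 101, "name": "Asha", "geography": "IN"},
    {"customer_id": 102, "name": "Bob", "geography": "US"},
    {"customer_id": 103, "name": "Chen", "geography": "IN"},
]

# A dict keyed on customer_id stands in for the primary-key index.
by_id = {row["customer_id"]: row for row in customers}


def find_by_id(customer_id):
    """Keyed lookup: touches one entry, however large the table is."""
    return by_id.get(customer_id)


def find_by_geography(geography):
    """No index on geography, so every row has to be inspected."""
    return [row for row in customers if row["geography"] == geography]
```

With three rows the scan is harmless; with terabytes of rows, `find_by_geography` is the expensive path described above.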

Alternatives
  • Avoid the table scan
  • Speed up the table scan (Dremel leverages this concept)
Dremel leverages the following:
Instead of record-oriented storage, it uses column-oriented storage, i.e. each column is stored in a separate file. There are two advantages to column-oriented storage:
  1. Only the columns needed for the requested information are read.
  2. A compression algorithm can be applied to the content. Values within a column tend to be similar in nature, so compression works to a great extent.
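Both advantages can be sketched with a toy example. The rows and column names below are hypothetical, and run-length encoding is used only as a simple stand-in for "a compression algorithm"; it is not necessarily what Dremel uses.

```python
from itertools import groupby

# Hypothetical record-oriented rows: (customer_id, geography, amount).
rows = [
    (1, "IN", 250),
    (2, "IN", 100),
    (3, "US", 75),
    (4, "US", 300),
]


def to_columns(records, names):
    """Pivot row tuples into one list per column."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["customer_id", "geography", "amount"])

# Advantage 1: summing amounts reads only the "amount" column.
total = sum(columns["amount"])

# Advantage 2: similar values within a column compress well.
encoded = run_length_encode(columns["geography"])
```

Here `encoded` holds two pairs instead of four values; on a real column with long runs of repeated values the saving would be far larger.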





In the above figure, the Mixers and Shards at each level are compute machines with their own processing power, disk and RAM. As soon as a query is requested, parallel requests are fired to the next-level machines (children) to read, process and aggregate data. Each result is passed on to the parent, where it is reduced as relevant. These machines talk to each other through RPC over a high-bandwidth network.
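The fan-out and reduce pattern can be sketched as below. This is a single-process toy under stated assumptions: threads stand in for separate machines, plain function calls stand in for RPC, and the function names and the two-level layout are my own, not Dremel's.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: the root holds mixers, each mixer holds shards,
# and each shard holds a slice of the rows.
mixers = [
    [  # mixer 1 with two shards
        [{"geography": "IN"}, {"geography": "US"}],
        [{"geography": "IN"}],
    ],
    [  # mixer 2 with one shard
        [{"geography": "US"}, {"geography": "IN"}],
    ],
]


def shard_count(shard_rows, geography):
    """Leaf: scan only the local slice and return a partial count."""
    return sum(1 for row in shard_rows if row["geography"] == geography)


def mixer_count(shards, geography):
    """Mixer: query child shards in parallel, then reduce their results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: shard_count(s, geography), shards))
    return sum(partials)


def root_count(tree, geography):
    """Root: combine the already-reduced results from each mixer."""
    return sum(mixer_count(shards, geography) for shards in tree)
```

Each level only sees small, already-aggregated results from its children, which is what keeps the parents from becoming a bottleneck.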

Execution
Managing data
Data has to be uploaded in a denormalized form. A CSV table can be uploaded via the BigQuery tool or through Google Cloud Storage (GCS).


Data Schematics
While importing data, the data schema has to be explicitly defined, so a text file with field names and schema definitions comes in handy. (The schema is pretty simple: FieldName:DataType, FieldName2:DataType.) Data types are integer, float and string.
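Since the schema is just a comma-separated list of FieldName:DataType pairs, it is easy to sanity-check a schema file before uploading. The helper below is a hypothetical local check of my own, not part of any BigQuery library, and it accepts only the three data types listed above.

```python
# Data types as listed in this post; treat this set as an assumption.
ALLOWED_TYPES = {"integer", "float", "string"}


def parse_schema(text):
    """Parse 'Name:Type, Name2:Type' into a list of (name, type) pairs."""
    fields = []
    for part in text.split(","):
        name, sep, data_type = part.strip().partition(":")
        name = name.strip()
        data_type = data_type.strip().lower()
        if not sep or not name:
            raise ValueError(f"expected FieldName:DataType, got {part!r}")
        if data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported data type {data_type!r} for {name!r}")
        fields.append((name, data_type))
    return fields


schema = parse_schema("customer_id:integer, geography:string, amount:float")
```

Catching a typo in the schema locally is cheaper than discovering it after a failed import.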

Update: When I started playing with the tool, not many third-party ETL tools were available. Now I do see a bunch of them.

Query
BigQuery follows a SQL dialect with a basic set of SQL features. As it is developed for analytics, it does not support update or delete requests on data records.

The browser-based BigQuery tool is very handy for getting your hands dirty: running a few queries to see how it behaves, and trying ad-hoc queries.

Integration
There are two sets of integration points to make a full-fledged system: (1) pulling in data from an existing data warehouse, and (2) visually representing the business intelligence derived by running BigQuery on a huge data set.

The great part of the integration story is the availability of BigQuery through a REST interface, with a rich set of integration libraries (Java, JavaScript, Python, PHP, .NET, Apps Script).

Data Connectors
I do see ways to connect data sets to the BigQuery platform, but have not explored them in detail.

Visualization & business Intelligence
The JSON response notation varies with the library used to interact with BigQuery. Use the sample applications for each library to get started, and learn more by inspecting responses in the browser debugger. As the data set received is in JSON, it can be plugged into any third-party tool for visualization and business representation. For my playground, I used the Google Charting APIs with the JavaScript library to draw charts.
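As a rough sketch of the reshaping step, the snippet below turns a query response into a header row plus data rows, which is the kind of table most charting tools accept. The response shape (a "schema" with "fields", and "rows" of "f"/"v" cells) is an assumption modelled on the REST-style response; as noted above, the notation differs between libraries, so check what yours actually returns. The sample data and the helper name are made up.

```python
import json

# Assumed response shape; the values are invented sample data.
sample_response = json.loads("""
{
  "schema": {"fields": [{"name": "geography", "type": "STRING"},
                        {"name": "orders", "type": "INTEGER"}]},
  "rows": [{"f": [{"v": "IN"}, {"v": "42"}]},
           {"f": [{"v": "US"}, {"v": "17"}]}]
}
""")

# Cell values arrive as strings here, so cast them by declared type.
CASTS = {"INTEGER": int, "FLOAT": float, "STRING": str}


def to_chart_table(response):
    """Return [header, row, row, ...] with cells cast to Python types."""
    fields = response["schema"]["fields"]
    table = [[field["name"] for field in fields]]
    for row in response.get("rows", []):
        cells = []
        for field, cell in zip(fields, row["f"]):
            value = cell["v"]
            cast = CASTS.get(field["type"], str)
            cells.append(None if value is None else cast(value))
        table.append(cells)
    return table


chart_table = to_chart_table(sample_response)
```

Serialized back to JSON, a table in this form can be handed to a charting library in the browser.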

Security
BigQuery follows the standard project hosting model of Google projects, which comes with standard security measures.

Business Applicability
As claimed, this platform seems worth experimenting with, given Google's credentials in scalability and speed of execution. We could set it up and get it running very quickly. With a low investment and by leveraging the cloud platform, we could see results almost instantaneously.

For a business, insight into its data is vital for growth. Analytics can be derived quickly using the BigQuery platform, as there is no need to invest heavily in infrastructure and setup costs. Most importantly, mining information from a large data set, and building on it incrementally, is pretty much instantaneous (compared with waiting to configure the system, pull out data, define a model and so on).

Applicability: This platform is applicable to any industry vertical, from telco to retail to online e-commerce platforms to automobile companies.

It's worth exploring benefits of this promising platform. 

Learnings & Challenges
  • After adding OAuth2 client IDs, all new users (signing in with their Gmail IDs) who tried to access the web application (which triggers a BigQuery request) received an error about not having accepted the BigQuery TOS. (Not sure why end users have to accept it?) I have yet to find a way to fix this behavior smoothly.
  • In the data set used for the POC, a TimeStamp data type is missing, so such values need to be parsed and formatted for better representation.
  • Documentation for the different BigQuery libraries and their JSON response notation is not detailed.
  • For experimental purposes, do refer to the quota policy for your needs.
  • The BigQuery tool restricts uploading of larger files. Use GCS for managing larger chunks of data.

Wednesday, August 1, 2012

Raksha bandhan - Indian festival over hangout?

Raksha Bandhan is an Indian festival celebrating the bond of protection between brother and sister: the sister ties a rakhi (a sacred thread) on her brother's wrist, praying for his well-being and long life, and the brother vows to take care of her.


 


With growing urbanization, people who have moved to new cities miss their families back home, especially during festivals. Technology has been bridging the gap and bringing them closer. It's easy to reach them over a phone call and hear their voices, or to have a video chat and see them in action.


To help such distant families celebrate this festival in a special way, we thought of building a Rakhi celebration application on the Google+ Hangout platform. The application helps in performing tilak and sharing rakhi, sweets and gifts. It is in alpha stage (feedback is welcome), built to learn how technology can bring fun and liveliness into our daily lives.


Try it here.
Start a Hangout

Steps to use the app

  • Start a Hangout and invite your family members to join.
  • Install the Rakhi celebration app, and you will see it running on the left side of the Hangout page.
  • From the Rakhi app, select your family member's name before performing the rakhi ceremony: selecting tilak, or sharing rakhi, sweets or gifts.
We will be happy to hear about your experience of the app and the platform, and the fun you had with it.

Friday, April 6, 2012

Chrome OS

Google's Chrome project, aka Chromium in its open-source form, covers both the Chrome browser and Chrome OS. Loving the browser, I thought of checking out the OS.


chromium-os




Downloading image
Google Chrome OS is available only on hardware from its partner OEMs. So to find out how it looks and what the experience is like, I started googling around. Chromium OS and Google Chrome OS share the same code base, except that the latter has additional packages to ensure a better experience on partner OEM hardware. Chromium OS can be downloaded, compiled and run. To avoid all those tricky steps, I looked for a readily available image.


There is not much information available, but whatever there is all pointed towards Hexxeh, who seems to check out the latest publicly available version of the OS, build it, and release an installable image for end users.


Installation

  • I started with the Vanilla variant.
  • Downloaded the image and copied it to a USB drive using an image writer.
  • Plugged the USB drive into my old HP laptop (512 MB RAM, Intel Pentium Dual Core processor).
  • Booted the computer from USB. Saw Google Chrome's logo with the text 'Chromium'. Got excited!
    NOTE:
    Google Chrome OS has a green/yellow/red logo, while Chromium OS has a blue/bluer/bluest logo.
  • After a few minutes, saw the initial screen asking for keyboard language and network connectivity.
    NOTE: I could not proceed until I connected an Ethernet cable. For some reason the WiFi option was not available.
  • The next step was to configure a Google account. I gave my existing Google account credentials and it started syncing content.
    NOTE: For security reasons, a secondary encryption password is requested before sync can be performed (it can be found in the Google account dashboard).
  • After syncing and configuring, I was asked to select my login profile picture (it also pulled up my Google+ profile image).

Experience

Applications installed in my Chrome browser (via the Chrome Web Store) showed up as my applications on the machine. Almost all of these applications run in the browser, so they worked equally well here. I do not have enough applications installed to give a detailed analysis.

File Manager: The file manager is part of the browser, where all files are visible. The UX looks very similar to Google Docs listing all the files in a collection.

Printer: Printer on cloud. I haven't tried it, so can't comment. 

Settings: Could see Ethernet, WiFi, keyboard and trackpad settings. Again, not as a separate application, but all under a Chrome browser page.

Other Info: I could see my computer's battery status in the bottom right corner. Could not see other hardware or similar details.

Summary
Honestly, this was not an official Google Chrome OS image but a similar variant, so it gave me only a basic feel of what the OS is all about: an operating system with a home page and all applications hosted (or embedded) in a browser. Data is pulled from my cloud services and processed locally (wherever applicable). Certain services support offline data availability to ensure work continuity, especially when there is no or limited internet connectivity.

Google's official Chrome OS comes with pre-bundled extras such as Adobe Flash, Adobe PDF, the Google Talk client and 3G cellular support, which make the overall OS better.

For users who can get all their major work done in a browser (mail, document processing, finances, image and video processing), it seems to be a good option. But hold on: does one really need to buy new hardware for this? Maybe not. We all use a browser on our regular laptops and tablets, so why new hardware? Perhaps trimmed-down hardware, like a tablet or an ultrabook, would be a good fit.

I'll definitely look forward to using a Chromebook soon.