Hadoop Ecosystem

A few intrinsic features of Apache Zookeeper and their importance

At a bird's-eye view, Apache Zookeeper provides coordination services for managing distributed applications. It holds responsibility for configuration information, naming, synchronization, and group services across large clusters in distributed systems. As an example, Apache Kafka uses Zookeeper to elect the leader node for its topic partitions. Please click here if you want to read about how to set up a multi-node Apache Zookeeper cluster on Ubuntu/Linux. zNodes: The key concept of Zookeeper is the znode, which can act...
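As a minimal sketch of the znode idea (my own illustration, not taken from the linked post), the snippet below uses the standard ZooKeeper Java client to create a persistent znode and read it back; the connection string localhost:2181 and the path /app-config are placeholder assumptions.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeDemo {
        public static void main(String[] args) throws Exception {
            // Connect to a (placeholder) ZooKeeper ensemble; the lambda simply ignores watch events.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

            // Create a persistent znode holding a small piece of shared configuration.
            zk.create("/app-config", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Any client connected to the same ensemble reads back the same value.
            byte[] data = zk.getData("/app-config", false, null);
            System.out.println(new String(data));

            zk.close();
        }
    }

An ephemeral znode (CreateMode.EPHEMERAL) disappears when the session that created it ends, which is the building block behind leader-election patterns like the one mentioned above for Kafka.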

Read more...

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from a Relational Database Management System (RDBMS) into the Hadoop Distributed File System (HDFS). Because of HDFS's distributed storage mechanism, we can store data of any format in huge volumes. In an RDBMS, data persists in row-and-column format (known as structured data). In order to process huge volumes of enterprise data, we can leverage HDFS as a basic data lake. In this...
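As a rough sketch of what such a transfer looks like in practice (my own example, not from the post), the Java snippet below simply shells out to the Sqoop 1 command line with a typical Oracle import; the JDBC URL, credentials, table name, and HDFS target directory are all placeholder assumptions, and it presumes the sqoop binary and a Hadoop cluster are available.

    import java.util.Arrays;

    public class OracleToHdfsImport {
        public static void main(String[] args) throws Exception {
            // Typical Sqoop 1 import arguments; every value below is a placeholder.
            ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
                    "sqoop", "import",
                    "--connect", "jdbc:oracle:thin:@db-host:1521:ORCL",
                    "--username", "scott",
                    "--password", "tiger",
                    "--table", "ORDERS",
                    "--target-dir", "/data/lake/orders",   // HDFS directory for the imported files
                    "--num-mappers", "4"));                // parallel map tasks doing the copy

            pb.inheritIO();                                // stream Sqoop's console output
            int exitCode = pb.start().waitFor();
            System.out.println("sqoop exited with code " + exitCode);
        }
    }

In day-to-day use the same arguments are normally typed straight on the command line; the small wrapper is only there to keep the example in one language.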

Read more...

Technology Platform behind Aadhaar card implementation

Most of us are familiar with the Aadhaar card, an initiative first rolled out in 2003. It is a 12-digit unique identification number issued by the Indian government to every individual resident of India, and it can be used to access a variety of services and benefits. Hadoop is an open-source big data processing framework that has been heavily customized by the company MapR to boost performance. The Aadhaar card project has been developed using MapR's customized Hadoop and...

Read more...

Mainframe Applications slowly migrating to Hadoop

Giant organizations across the globe still run legacy mainframe systems because of their scalability, security, and the reliability of the machines' processing capacity under heavy and large workloads. Of course, these infrastructures demand huge hardware, software, and processing capacity. As technology advances rapidly, the scarcity of mainframe technicians and developers keeps growing, and it has become a major challenge for those organizations to continue their operations. The maintenance and replacement of this hardware is another threat due to low...

Read more...

Establishing a Data Lake for a multi-channel e-commerce application to understand customers' buying patterns

Post-order-fulfillment data is becoming a very important asset for e-commerce vendors to understand the complete buying pattern of their customers, especially for vendors who sell multiple products ranging from electronics to apparel. Extraction and transformation become time-consuming operations when partially structured data starts moving from various sources and finally lands in a relational data warehouse. Data extracted from social media is semi-structured (JSON or XML). As an example, Facebook provides information in JSON format through the Graph API, and the same...
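As a small illustration of that semi-structured shape (my own sketch, not taken from the post), the Java snippet below uses the Jackson library to pull a few fields out of a Graph-API-style JSON record before it would be flattened for the data lake; the field names and values are invented.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class SocialFeedParser {
        public static void main(String[] args) throws Exception {
            // Invented sample record shaped roughly like a Graph API post.
            String json = "{\"id\":\"101_202\",\"message\":\"Great deal on headphones\","
                        + "\"created_time\":\"2017-05-01T10:15:00+0000\"}";

            // Jackson parses the semi-structured payload into a navigable tree.
            JsonNode post = new ObjectMapper().readTree(json);

            // Keep only the fields that would become columns in the lake.
            System.out.println(post.get("id").asText() + " | "
                    + post.get("created_time").asText() + " | "
                    + post.get("message").asText());
        }
    }

XML payloads from other channels would go through the same kind of flattening step before they reach the warehouse.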

Read more...