Blog

Apache Flink – A 4G Data Processing Engine

With digital data generation sources mushrooming around the globe, including social media, analyzing streaming data in large-scale systems is becoming more of a focal point every day for making accurate business decisions. Real-time analytics are especially attractive because of the insights to be gained from the time value of data, in other words, while the data is still in motion. Apache Flink, a highly innovative open-source stream processing engine, has emerged to help take advantage of such stream-based approaches. Besides [...]
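As a flavour of the stream-based approach, here is a minimal sketch using Flink's Python API (PyFlink); the tiny in-memory input and the filter threshold are illustrative assumptions, not taken from the post.

```python
# Hedged sketch: a tiny PyFlink streaming job over an in-memory collection.
# Assumes the apache-flink package is installed; the input values are made up.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Build a small stream of "sensor,reading" records and transform it.
ds = env.from_collection(["sensor-1,20", "sensor-2,35", "sensor-1,40"])
(ds
 .map(lambda line: (line.split(",")[0], int(line.split(",")[1])))
 .filter(lambda reading: reading[1] > 30)   # keep only the higher readings
 .print())

env.execute("streaming_sketch")
```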

Read more...

Steering the number of mappers (MapReduce) in Sqoop for parallel data ingestion into the Hadoop Distributed File System (HDFS)

To import data from most data sources, such as an RDBMS, Sqoop internally uses mappers. Before delegating the work to the mappers, Sqoop performs a few initial operations in sequence once we execute the command on a terminal of any node in the Hadoop cluster. Ideally, in a production environment, Sqoop is installed on a separate node and the .bashrc file is updated to append Sqoop's binary and configuration paths, which lets us execute Sqoop commands from anywhere in the multi-node cluster. Most of the [...]
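For a rough picture of how that parallelism is steered, here is a minimal sketch that wraps a Sqoop import in Python's subprocess module; the JDBC URL, credentials path, table name and mapper count are placeholder assumptions, not values from the post.

```python
# Hedged sketch: launch a Sqoop import with an explicit degree of parallelism.
# Assumes Sqoop 1 is on the PATH of the node; all connection details below
# are placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db-host:3306/sales",      # placeholder RDBMS source
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.db_password",  # keep secrets off the command line
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    # --num-mappers controls how many parallel map tasks (and output files)
    # Sqoop launches; splitting needs a primary key or an explicit --split-by column.
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```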

Read more...

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from a relational database management system (RDBMS) into the Hadoop Distributed File System (HDFS). Thanks to HDFS's distributed storage mechanism, we can store data of any format in huge volumes. In an RDBMS, data persists in row-and-column format, known as structured data. To process huge volumes of enterprise data, we can leverage HDFS as a basic data lake. In this [...]
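As one hedged illustration of such a transfer, the sketch below runs a Sqoop import from Oracle into HDFS from Python; the Oracle host, service name, schema, table and target directory are placeholders, not details from the post.

```python
# Hedged sketch: Sqoop import from Oracle into HDFS, driven from Python.
# Assumes Sqoop and the Oracle JDBC driver are available on the node.
import subprocess

subprocess.run([
    "sqoop", "import",
    # Oracle thin-driver JDBC URL: jdbc:oracle:thin:@//host:port/service
    "--connect", "jdbc:oracle:thin:@//ora-host:1521/ORCLPDB1",
    "--username", "HR",
    "--password-file", "hdfs:///user/etl/.ora_password",
    "--table", "EMPLOYEES",                      # placeholder Oracle table
    "--target-dir", "/datalake/raw/hr/employees",
    "--num-mappers", "2",
], check=True)
```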

Read more...

Data Ingestion phase for migrating enterprise data into Hadoop Data Lake

Big Data solutions help extract valuable information for ironing out accurate strategic business decisions. The exponential growth of digitalization, social media, telecommunication and the like is fueling enormous data generation everywhere. Before processing such huge volumes of data, we need an efficient, distributed storage mechanism that can hold any form of data, from structured to unstructured. The Hadoop Distributed File System (HDFS), installed on a multi-node cluster, can be leveraged efficiently as a data lake. [...]
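As a minimal, hedged illustration of the ingestion step itself, the sketch below simply lands a raw export file in HDFS without any transformation; the file and directory paths are hypothetical.

```python
# Hedged sketch: copy a raw export into the HDFS data lake as-is.
# Assumes the Hadoop client ("hdfs" command) is installed and configured.
import subprocess

local_file = "/var/exports/clickstream_2019-01-01.json"   # hypothetical local export
lake_dir = "/datalake/raw/clickstream/"                   # hypothetical landing zone

# Create the landing directory (idempotent), then copy the file unchanged.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", lake_dir], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, lake_dir], check=True)
```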

Read more...

Why Lambda Architecture in Big Data Processing

Due to the exponential growth of digitalization, the globe is creating at least 2.5 quintillion (2,500,000,000,000 million) bytes of data every day, and that is what we denote as Big Data. Data is being generated everywhere: social media sites, various sensors, satellites, purchase transactions, mobile devices, GPS signals and much more. With the advancement of technology there is no sign of data generation slowing down; instead it will keep growing in massive volume. All the major organizations, retailers, [...]

Read more...

Apache Kafka, the Next-Generation Distributed Messaging System

In a Big Data project, the main challenge is collecting an enormous volume of data, and we need a distributed, high-throughput messaging system to overcome it. Apache Kafka is designed to address exactly this challenge. It was originally developed at LinkedIn and later became an Apache project. A messaging system is typically responsible for transferring data from one application to another; a message is nothing but a bundle of data/information. To ingest huge volumes of data into [...]
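To make the idea of producers and consumers concrete, here is a minimal sketch using the kafka-python client; the broker address, topic name and payload are assumptions rather than details from the post.

```python
# Hedged sketch: send one message to Kafka and read it back.
# Assumes a broker at localhost:9092 and the kafka-python package installed.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("ingest-events", b'{"source": "web", "clicks": 3}')
producer.flush()   # ensure the message actually leaves the client buffer

consumer = KafkaConsumer(
    "ingest-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # read from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating if nothing arrives
)
for record in consumer:
    print(record.topic, record.partition, record.offset, record.value)
```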

Read more...

Fog Computing

Fog computing is also referred to as edge computing. Cisco Systems introduced the term "Fog Computing", and it is not a replacement for cloud computing. Cloud computing generally means storing and accessing data and programs over the Internet instead of on a local computer's hard drive or storage; the cloud is simply a metaphor for the Internet. In fog computing, data, processing and applications are concentrated in devices at the network edge. Here devices communicate peer-to-peer so that data storage and share [...]

Read more...

Mobile Phone Authentication and Fraud

Day by day we are becoming addicted to the mobile phone, especially the smartphone, since a smartphone performs many of the functions of a computer, typically offering a touch-screen interface and Internet access. With the extensive growth of mobile applications, we can use apps for everything from games to financial transactions, including stock-market brokerage. Many banks have launched their own mobile applications so that customers can download them and carry out financial transactions such as balance verification, money transfer [...]

Read more...

Basic concept of a Data Lake

The infographic represents the basic concept of a Data Lake, where we can use an ELT approach (extraction, loading and then transformation) instead of the traditional ETL process (extraction, transformation and then loading). ETL applies to traditional data warehousing systems, where data follows a structured (row-and-column) format. By leveraging HDFS (Hadoop Distributed File System), we can build a data lake that stores data of any format for processing and analysis. Data can be loaded directly into the lake without transformation; later, transformation can [...]
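As a small, hedged sketch of that load-first, transform-later pattern, the example below uses PySpark against HDFS; the paths and column names are hypothetical.

```python
# Hedged ELT sketch with PySpark: load raw data from the lake first,
# apply the transformation afterwards. All paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("elt_sketch").getOrCreate()

# "Load" step: read raw, untransformed JSON straight from the data lake.
raw = spark.read.json("hdfs:///datalake/raw/orders/")

# "Transform" step: applied later, only when needed for analysis.
daily_totals = (raw
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_amount")))

daily_totals.write.mode("overwrite").parquet("hdfs:///datalake/curated/daily_totals/")
```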

Read more...

Real-time data analytics helps mobile service providers achieve a competitive advantage

Smartphone usage has become an integral part of our daily routine. Setting aside calling and SMS, we are constantly engaged in lots of other activities, from entertainment to domestic shopping, social engagement and so on, by installing various types of mobile applications. Of course, mobile internet is mandatory to carry all of this out. Mobile service providers are facing new and difficult challenges: due to the exponential growth of customer expectations, they need to serve customers with advanced mobile technology and handle unprecedented [...]

Read more...