Hadoop Hbase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data sets, which are common in many big data use cases. An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database. Each table must have an element defined as a Primary Key, and all access attempts…

Read More

Hadoop Sqoop efficiently transfers bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop helps offload certain tasks (such as ETL processing) from the EDW to Hadoop for efficient execution at a much lower cost. Sqoop can also be used to extract data from Hadoop and export it into external structured datastores. Sqoop works with relational databases such as Teradata, Netezza, Oracle,…

Read More

Hadoop Hive is a runtime Hadoop support structure that allows anyone who is already fluent with SQL (which is commonplace for relational data-base developers) to leverage the Hadoop platform right out of the gate. Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements. HQL is limited in the commands it understands, but it is still useful….

Read More

Hadoop Pig was initially developed at Yahoo to allow people using Hadoop to focus more on analyzing large datasets and spend less time writing mappers and reduce programs. This would allow people to do what they want to do instead of thinking about mapper and reducer tasks. Name Pig was given to the programming language with a hint on it being designed to handle any…

Read More

Z-Score or Standard Score in statistics is the signed number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured. Observed values above the mean have positive standard scores, while values below the mean have negative standard scores. The standard score is a dimensionless quantity obtained by subtracting the population…

Read More

Unsupervised Learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data. The clusters are modelled using a measure of similarity which is defined upon metrics such as Euclidean or probabilistic…

Read More

Type II Error in statistical hypothesis testing is incorrectly retaining a false null hypothesis (a “false negative”). A type II error (or error of the second kind) is the failure to reject a false null hypothesis. Examples of type II errors would be a blood test failing to detect the disease it was designed to detect, in a patient who really has the disease; a…

Read More

Type I Error in statistical hypothesis testing is the incorrect rejection of a true null hypothesis (a false positive). More simply stated, a type I error is detecting an effect that is not present. A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. Usually, a type I error leads one to conclude that a supposed…

Read More

True Positive Rate (Sensitivity) is a statistical measure which measures the proportion of positives that are correctly identified as such (for example, the percentage of sick people who are correctly identified as having the condition). Another way to understand it, with examples in the context of medical tests is that sensitivity is the extent to which true positives are not missed/overlooked. Thus a highly sensitive…

Read More

True Negative Rate (Specificity) is a statistical measure which measures the proportion of negatives that are correctly identified as such (for example, the percentage of healthy people who are correctly identified as not having the condition). Specificity is the extent to which positives really represent the condition of interest and not some other condition being mistaken for it. A highly specific test rarely registers a…

Read More
Show Buttons
Hide Buttons