An HDFS Tutorial for Data Analysts Stuck With Relational Databases

Introduction

By now, you have probably heard of the Hadoop Distributed File System (HDFS), especially if you are a data analyst or someone who is responsible for moving data from one system to another. But what benefits does HDFS offer over relational databases?

HDFS is a scalable, open source solution for storing and processing large volumes of data. HDFS has been proven to be reliable and efficient across many modern data centers.

HDFS utilizes commodity hardware along with open source software to reduce the overall cost per byte of storage.

With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics. Unlike traditional relational database systems, it does not carry the machinery and overhead needed to guarantee atomicity, consistency, isolation, and durability (ACID) for transactions.
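
To make the replication point concrete, here is a back-of-the-envelope Python sketch of how much raw disk a file consumes once HDFS splits it into blocks and replicates each block. It assumes the stock Hadoop 2.x+ settings of a 128 MiB block size (`dfs.blocksize`) and three replicas (`dfs.replication`); your cluster may be configured differently, and the function name is purely illustrative.

```python
# Assumed defaults: dfs.blocksize = 128 MiB and dfs.replication = 3.
# Real clusters often override both settings.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def hdfs_footprint(file_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block_replicas, raw_bytes) for one file stored in HDFS."""
    # Integer ceiling division: a partially filled last block still counts as a block.
    blocks = (file_bytes + block_size - 1) // block_size
    # Each block is copied to `replication` DataNodes, so losing one disk
    # (or a whole node) still leaves other copies readable.
    block_replicas = blocks * replication
    # The last block only occupies its actual size on disk, so raw usage
    # is the file size multiplied by the replication factor.
    raw_bytes = file_bytes * replication
    return blocks, block_replicas, raw_bytes


one_gib = 1024 ** 3
blocks, replicas, raw = hdfs_footprint(one_gib)
print(blocks, replicas, raw / one_gib)  # 8 24 3.0
```

The trade-off is visible in the last number: fault tolerance on commodity disks is bought with roughly three times the raw storage.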

Moreover, when compared with enterprise and commercial databases, such as Oracle, utilizing Hadoop as the analytics platform avoids any extra licensing costs.

One of the questions many people ask when first learning about HDFS is: How do I get my existing data into HDFS?

In this article, we will examine how to import data from a PostgreSQL database into HDFS. We will use Apache Sqoop, currently one of the most efficient open source tools for transferring data between HDFS and relational database systems. Apache Sqoop is designed to bulk-load data from a relational database into HDFS (import) and to bulk-write data from HDFS to a relational database (export).
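
As a preview of what such an import looks like, the following Python sketch assembles a typical `sqoop import` command line from Sqoop’s `--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, and `--num-mappers` options. The host, database, user, table, and paths are hypothetical placeholders, and the sketch assumes Sqoop and the PostgreSQL JDBC driver are installed on whichever machine eventually runs the command.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Build the argument list for a Sqoop import of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # HDFS directory that receives the files
        "--num-mappers", str(mappers),     # parallel map tasks doing the transfer
    ]


# All values below are hypothetical placeholders.
cmd = sqoop_import_command(
    jdbc_url="jdbc:postgresql://db.example.com:5432/sales",
    username="analyst",
    password_file="/user/analyst/.pg-password",
    table="orders",
    target_dir="/data/sales/orders",
)
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting surprises; on a machine with Sqoop installed, `subprocess.run(cmd, check=True)` would launch the actual import.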


After All These Years, the World is Still Powered by C Programming


Many of the C projects that exist today were started decades ago.

The UNIX operating system’s development started in 1969, and its kernel was rewritten in C in 1973. The C language was in fact created to move the UNIX kernel code from assembly to a higher-level language, which could do the same tasks with fewer lines of code.

Oracle database development started in 1977, and its code was rewritten from assembly to C in 1983. It became one of the most popular databases in the world.

In 1985, Windows 1.0 was released. Although Windows source code is not publicly available, it has been stated that its kernel is mostly written in C, with some parts in assembly. Linux kernel development started in 1991, and it, too, is written in C. The next year, it was released under the GNU General Public License and was used as part of the GNU operating system. The GNU operating system itself was started using the C and Lisp programming languages, so many of its components are written in C.

But C programming isn’t limited to projects that started decades ago, when there weren’t as many programming languages as today. Many C projects are still started today, and there are some good reasons for that.

Fixing the “Heartbleed” OpenSSL Bug: A Tutorial for Sys Admins

So what exactly is the bug anyway?

Here’s a very quick rundown:

A potentially critical problem has surfaced in the widely used OpenSSL cryptographic library. It is nicknamed “Heartbleed” because the vulnerability exists in the “heartbeat extension” (RFC 6520) to the Transport Layer Security (TLS) protocol, and it is a memory leak (“bleed”) issue: a missing bounds check lets an attacker read up to 64 KB of the vulnerable process’s memory with each malicious heartbeat request. User passwords and other important data may have been compromised on any site affected by the vulnerability.
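
The mechanism can be illustrated with a toy Python model. A heartbeat request carries a 1-byte type, a 2-byte payload length, and the payload itself, and the peer is supposed to echo the payload back. The vulnerable code trusted the length field instead of the number of bytes actually received, so a request that claimed more than it sent got back whatever happened to follow the payload in memory. The “memory” layout and the secret below are invented for illustration; this is a simplified model, not OpenSSL’s code.

```python
import struct

# Invented stand-in for unrelated data that happens to sit after the payload in memory.
SECRET = b"password=hunter2;session=abc123"


def parse_heartbeat(record):
    """Split a heartbeat record into (claimed_length, payload_bytes_actually_sent)."""
    # Simplified RFC 6520 layout: 1-byte type, 2-byte payload_length, then payload.
    _msg_type, claimed_len = struct.unpack("!BH", record[:3])
    return claimed_len, record[3:]


def vulnerable_echo(record):
    claimed_len, payload = parse_heartbeat(record)
    memory = payload + SECRET  # whatever follows the payload in process memory
    # Bug: trusts claimed_len and copies that many bytes back to the sender.
    return memory[:claimed_len]


def patched_echo(record):
    claimed_len, payload = parse_heartbeat(record)
    # Fix: silently discard requests that claim more data than was received.
    # (RFC 6520 also requires at least 16 bytes of padding; omitted in this toy.)
    if claimed_len > len(payload):
        return b""
    return payload[:claimed_len]


evil = struct.pack("!BH", 1, 40) + b"hi"  # type 1 = heartbeat_request; claims 40 bytes, sends 2
print(vulnerable_echo(evil))  # b'hipassword=hunter2;session=abc123'
print(patched_echo(evil))     # b''
```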

The vulnerability is particularly dangerous for two reasons:

  1. Potentially critical data is leaked.
  2. The attack leaves no trace.

The affected OpenSSL versions are 1.0.1 through 1.0.1f, 1.0.2-beta, and 1.0.2-beta1.
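
For a quick first pass over an inventory of machines, the version range above can be encoded as a small Python check against the version string printed by `openssl version` (for example, `OpenSSL 1.0.1f 6 Jan 2014`). This is a sketch: the helper name is hypothetical, and because Linux distributions often backport the fix without changing the upstream version letter, a match should be read as “investigate further” rather than proof of vulnerability.

```python
import re

# Pre-release versions listed as affected.
AFFECTED_BETAS = {"1.0.2-beta", "1.0.2-beta1"}


def is_heartbleed_vulnerable(version):
    """Return True if an upstream OpenSSL version string is in the affected range."""
    if version in AFFECTED_BETAS:
        return True
    # 1.0.1 (no letter) through 1.0.1f are affected; 1.0.1g shipped the fix.
    m = re.fullmatch(r"1\.0\.1([a-z]?)", version)
    return bool(m) and m.group(1) <= "f"


# Example: take the second field of the `openssl version` output.
reported = "OpenSSL 1.0.1f 6 Jan 2014"
print(is_heartbleed_vulnerable(reported.split()[1]))  # True
```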

Are We Creating An Insecure Internet of Things (IoT)? Security Challenges and Concerns

The Internet of Things (IoT) has been an industry buzzword for years, but sluggish development and limited commercialization have led some industry watchers to start calling it the “Internet of NoThings”.

Double puns aside, IoT development is in trouble. Apart from spawning geeky jokes unfit for most social occasions, the hype did not help; in fact, I believe it did a lot more harm than good. IoT has a few problems, and one we could do without is all the positive coverage and baseless hype. The upside of generating more attention is clear: more investment, more VC funding, more consumer interest.


However, these come with an added level of scrutiny, which has made a number of shortcomings painfully obvious. After a couple of years of bullish forecasts and big promises, IoT security seems to be the biggest concern. The first few weeks of 2015 were not kind to this emerging industry, and most of the negative press revolved around security.

Was it justified? Was it just “fear, uncertainty and doubt” (FUD), brought about by years of hype? It was a bit of both; although some issues may have been overblown, the problems are very real, indeed.

10 Most Common Web Security Vulnerabilities

For all too many companies, it’s not until after a breach has occurred that web security becomes a priority. During my years working as an IT security professional, I have seen time and time again how obscure the world of IT security is to so many of my fellow programmers.

An effective approach to IT security must, by definition, be proactive and defensive. Toward that end, this post is aimed at sparking a security mindset, hopefully injecting the reader with a healthy dose of paranoia.

In particular, this guide focuses on 10 common and significant web security pitfalls to be aware of, including recommendations on how they can be avoided. The focus is on the Top 10 Web Vulnerabilities identified by the Open Web Application Security Project (OWASP), an international, non-profit organization whose goal is to improve software security across the globe.
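
As a taste of the kind of recommendation involved, here is a minimal Python sketch of one long-standing OWASP Top 10 category, injection, using the standard-library `sqlite3` module. The table and rows are made up; the point is the difference between pasting user input into the SQL text and passing it to the driver as a bound parameter.

```python
import sqlite3

# Made-up in-memory table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])


def find_user_unsafe(name):
    # Vulnerable: user input becomes part of the SQL statement itself.
    sql = f"SELECT name FROM users WHERE name = '{name}' ORDER BY name"
    return conn.execute(sql).fetchall()


def find_user_safe(name):
    # Safe: the value is sent separately as a parameter and never parsed as SQL.
    sql = "SELECT name FROM users WHERE name = ? ORDER BY name"
    return conn.execute(sql, (name,)).fetchall()


attack = "nobody' OR '1'='1"
print(find_user_unsafe(attack))  # [('alice',), ('root',)]
print(find_user_safe(attack))    # []
```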

Android Developer’s Guide to the Google Location Services API

Knowing your user’s location is useful information in many of the applications we develop and use today. There are a lot of popular location-based applications out there that are making our lives easier, as well as changing the way we use these services. One example is the wildly popular Foursquare, where users who frequent an establishment and “check in” often win discounts. Another is Uber, which helps you get a ride from your mobile phone at a lower rate than a normal taxi. The list is large and still growing.


Hosting For Freelance Developers: PaaS, VPS, Cloud, And More

At a glance, the hosting industry may not appear exciting, but it’s the grunts in data centres the world over who keep our industry going. They are, quite literally, the backbone of the Internet, and as such they make everything possible: from e-commerce sites to smart mobile apps for our latest toys. The heavy lifting is done in boring data centres, not on our flashy smartphones and wafer-thin notebooks.

Whether you’re creating a virtual storefront, deploying an app, or simply doing some third-party testing and development, chances are you need some server muscle. The good news is that there is a lot to choose from. The hosting industry may not be loud or exciting, but it never sleeps; it’s a dog-eat-dog world, with cutthroat pricing, a lot of innovation behind the scenes, and cyclical hardware updates. Cloud, IaaS, and PaaS have changed the way many developers and businesses operate, and these are relatively recent innovations.

In this post, I will look at some hosting basics from the perspective of a freelance developer: what to choose and what to stay away from. Why single out freelance software engineers? Because many of them need their own dev environment while working with various clients at the same time. Unfortunately, this also means they usually have no say when it comes to deployment: it’s the client’s decision how and where a particular web app will be hosted, and a freelancer hired on a short-term basis is rarely consulted. This is a management issue, so I will not address it in this post, other than to say that even freelancers need to be aware of the options out there. Their hands may be tied, but in some cases clients will ask for their input, and software engineers should help them make an informed decision. Earlier this week, we covered one way of blurring the line between development and operations: DevOps. In case you missed that post, I urge you to check it out and see why DevOps integration can have an impact on hosting as well.
