Computer Lingo!

Or, “techno-babble,” as I like to call it. If you don’t know your IEEE 1394s from your PCI Express x16s, you may run into trouble when building or upgrading the hardware in your computer. Knowing the lingo is essential for getting parts that are compatible and not blowing up your house with parts that conflict (just kidding, you’d blow up your whole neighborhood) (just kidding again, you’d just have to return the part and lower your head in shame for not reading this article).

Here’s some techno-babble that is useful for building or upgrading a computer:

Mobo: Motherboard – The circuit board that all of the computer’s components attach to.

AGP: Accelerated Graphics Port – A high-speed slot on the motherboard for attaching a video card (no longer the standard; PCI Express x16 is today’s standard).

ATX: Advanced Technology Extended – Standard for computer motherboard and case design.

PCI: Peripheral Component Interconnect – Slot on the motherboard used to attach peripherals such as a sound card.

PCI Express x16: A faster successor to standard PCI; it is today’s standard slot for video cards on a motherboard.

GPU: Graphics Processing Unit – The processor on your video card (often used to refer to the video card itself).

SLI: Scalable Link Interface – NVIDIA’s technology for linking two or more video cards to produce a single output. What’s better than one video card? Two video cards.

SATA: Serial ATA – Interface for transferring data between your drives and your motherboard and devices. This is today’s standard; it provides increased bandwidth (it’s faster than PATA). Used for hard drives.

PATA: Parallel ATA – Interface for transferring data between your drives and your motherboard and devices. It is still the most common, but it is no longer the preferred choice. It can connect to all drive types: hard drives and CD/DVD drives.

CPU: Central Processing Unit – Crunches the 1s and 0s and turns your $1000 box into a computer.

AMD: Advanced Micro Devices – A CPU manufacturer known for making cheap, efficient processors.

Socket 939/AM2: Socket types associated with AMD processors; this is the physical layout of the pins that connect the CPU to the motherboard.

Intel: A CPU manufacturer known for making powerful processors that aren’t cheap.

LGA 775: The socket type associated with Intel processors; this is the physical layout of the pins that connect the CPU to the motherboard.

Gigahertz (GHz): The unit of clock speed, used to gauge how fast the processor runs.

RAM: Random Access Memory – Your computer uses this to store data used by open applications.

DDR2 400: One example of a RAM type. DDR2 stands for double-data-rate two, and 400 is the speed of the memory. It is critical that you check the motherboard and RAM to be sure these match.

Gigabytes (GB): The unit used to measure the size of storage. I recommend having at least 1 GB of RAM. …

Large-Scale Data Processing Frameworks – What Is Apache Spark?

Apache Spark is one of the newest open-source data processing frameworks. It is a large-scale data processing engine that will most likely replace Hadoop’s MapReduce. Apache Spark and Scala are inseparable terms in the sense that the easiest way to begin using Spark is via the Scala shell, but the framework also offers support for Java and Python. It was produced in UC Berkeley’s AMPLab in 2009, and so far a group of four hundred developers from more than fifty companies has built on Spark. It is clearly a huge investment.

A brief description

Apache Spark is a general-purpose cluster computing framework that is also very fast and provides rich, high-level APIs. In memory, the system executes programs up to 100 times quicker than Hadoop’s MapReduce; on disk, it runs 10 times quicker. Spark comes with many sample programs written in Java, Python, and Scala. The system is also made to support a set of other high-level functions: interactive SQL and NoSQL, MLlib (for machine learning), GraphX (for processing graphs), structured data processing, and streaming. Spark introduces a fault-tolerant abstraction for in-memory cluster computing called Resilient Distributed Datasets (RDDs), a form of restricted distributed shared memory. When working with Spark, we want a concise API for users that can still operate on large datasets. Many scripting languages do not fit this scenario, but Scala has that capability because of its statically typed nature.
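To make the RDD abstraction concrete, here is a minimal Scala word-count sketch using Spark’s RDD API; the input path, app name, and local master setting are placeholder assumptions for illustration, not part of the original article:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Local mode for illustration; a real cluster URL would go here instead
        val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // textFile returns an RDD[String]; transformations stay lazy until an action runs
        val counts = sc.textFile("input.txt") // hypothetical input file
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println) // action: triggers the computation
        sc.stop()
      }
    }

Note how the transformations only describe the computation; nothing runs until the take action asks for results, which is what lets Spark plan the work across the cluster.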

Usage tips

As a developer eager to use Apache Spark for bulk data processing or other activities, you should learn how to use it first. The latest documentation on how to use Apache Spark, including the programming guide, can be found on the official project website. Start with the README file, then follow the simple setup instructions. It is advisable to download a pre-built package to avoid building Spark from scratch; those who choose to build Spark and Scala themselves will have to use Apache Maven. A configuration guide is also downloadable. Remember to check out the examples directory, which contains many sample programs that you can run, and see the shell sketch below for a first session.
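The quickest first run is the interactive Scala shell, started with ./bin/spark-shell from the Spark directory, where a SparkContext named sc is already created for you. A tiny session might look like this (README.md is the file shipped with the Spark distribution):

    // Inside ./bin/spark-shell, where the SparkContext `sc` is predefined
    val lines = sc.textFile("README.md")      // file shipped in the Spark directory
    lines.count()                             // total number of lines
    lines.filter(_.contains("Spark")).count() // lines that mention "Spark"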

Requirements

Spark is built for Windows, Linux, and Mac operating systems. You can run it locally on a single computer as long as you already have Java installed on your system PATH. The system runs on Scala 2.10, Java 6+, and Python 2.6+.

Spark and Hadoop

The two large-scale data processing engines are interrelated. Spark depends on Hadoop’s core library to interact with HDFS and can also use most of its storage systems. Hadoop has been available for a long time, and different versions of it have been released, so you have to build Spark against the same version of Hadoop that your cluster runs. The main innovation behind Spark was to introduce an in-memory caching abstraction, which makes Spark ideal for workloads where multiple operations access the same input data.

Users can instruct Spark to cache input data sets in …
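As a sketch of how that caching instruction looks in Scala (the HDFS path and filter strings below are hypothetical examples, not from the article):

    // Assumes a SparkContext `sc`; the HDFS path is a made-up example
    val logs = sc.textFile("hdfs:///logs/access.log").cache()

    // The first action materializes the RDD and keeps its partitions in memory
    val errors = logs.filter(_.contains("ERROR")).count()

    // Later actions reuse the cached data instead of re-reading from HDFS
    val warnings = logs.filter(_.contains("WARN")).count()

Without the cache() call, each count would re-read the input from HDFS; with it, the second pass runs against memory, which is exactly the repeated-access workload described above.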