YT's Big Data Analytics Blog

“Big data” is a new catchphrase that has bubbled up recently to describe the current explosion in digital data. The extremely large number of status updates, Likes, and photo/video shares posted on social networks every day, combined with the data produced by businesses and governments computerizing their operations, is behind this explosive growth. This explosion of data, dubbed “Big Data,” has resulted in a corresponding explosion in opportunities for professionals and businesses alike.

Tuesday, October 9, 2012

Using Lean Agile Methodologies for Planning & Implementing a Big Data Project @ "Data Informed Live!" on Dec. 10

I am scheduled to speak at the "Data Informed Live!" event being held in San Jose, California, on December 10-11, 2012, at the San Jose Marriott. The event is focused on planning and implementing big data projects. This two-day event targets business and IT managers, with the goal of giving them the knowledge they need to develop and execute a "big data" plan for their companies.

Click here to register.

The first day of this two-day event is dedicated to planning, while day 2 focuses on implementation success factors. I am speaking on day 1, and my talk is about using lean agile methodologies for defining product requirements. Everyone knows that requirements change in data and software projects as things start taking shape, especially when the project involves new concepts and technologies such as big data, yet most traditional project management approaches treat requirement changes as exceptions. I will be talking about how the agile approach to requirements gathering and product design embraces change; because agile frameworks anticipate change, they allow projects to stay on track.

I believe that traditional requirements gathering processes do not work for big data projects because end users can't yet grasp the full capabilities and power of big data and hence cannot describe what they need.

An iterative agile approach, where requirements gathering, design, and implementation are done in small (2- to 4-week) iterations, allows end users to visualize what can be done and what is needed, and helps the development team understand how long the work takes. It also allows the project to keep moving ahead while providing the flexibility to accommodate changes as end users discover new requirements and developers figure out technical nuances. My session will explain how the agile approach works, provide advice for using it, and give real-world examples of how others have used it successfully.

Whether you are planning your "Big Data" project, or implementing it, "Data Informed Live!" will prepare you for achieving success in your endeavors by covering the following critical issues:

- Process: The key processes which both impact and will be impacted by the proposed big data project
- Organization: How to design and re-engineer your organization to implement and utilize big data
- Tools, Platforms and Technology: Understand what platforms and tools can be utilized to assist you in the design and implementation of the big data project.

Click here to register or to find out more.

Friday, May 11, 2012

Oracle Unfolds Details of Its Arsenal of Speed-of-Thought Big Data Analytics Tools

May 3, San Francisco – Today, Oracle laid out details of its “speed of thought” big data analytics suite at its Big Data and Extreme Analytics Summit.

The Oracle Exalytics in-memory BI machine forms the core of the analytics suite. Oracle Exalytics is a co-packaged combination of Sun hardware and various in-memory analytics software tools which have been co-designed for optimal combined performance. The set of analytics tools includes Oracle’s in-memory BI Foundation, an architecturally unified business intelligence solution which caters to reporting, ad hoc query, and analysis needs. It also includes the Endeca Information Discovery tool, which was acquired by Oracle in December of last year. This combination of co-optimized hardware and software is touted as “speed of thought” analytics, which allows advanced, intuitive exploration and analysis of data – whether structured or unstructured.

Oracle Exalytics can work against the data from both the Oracle Big Data Appliance (a distribution of Hadoop with matching co-designed Sun server) and Oracle RDBMS.

Endeca, a suite of applications for unstructured data management, Web commerce, and business intelligence that Oracle acquired in December of last year, has been positioned as the primary information discovery tool in Oracle’s BI suite.

The combination of Oracle Exalytics hardware and matching BI software gives businesses access to both structured and unstructured information from different sources using just one interface, thus giving them the ability to gain a deep, wide, and on-time view of their customers and allowing them to do in-context exploration of their business data in a timely manner. In addition to answering questions of whether a business has increased its revenues or missed its sales targets, businesses today want – need – to know why.

In the past, to gain such information, businesses have had to go to separate data sources for promotions, time frames, locations, and channels – and still managed to get only an incomplete view of what they were looking for, as other influencing factors such as current events and climatic changes were easily missed.

With the Oracle Exalytics In-Memory machine and Oracle BI tools, businesses will be able to gather information, both structured and unstructured, from a far wider spectrum of sources – survey companies, CMS systems, customer reviews, tweets, news reports – and consolidate it all into a single view. Then, to facilitate big data analysis, the software allows you to arrange the data according to demographic categories such as gender, age, or location.
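To make the idea of arranging consolidated data by demographic category concrete, here is a minimal sketch in plain Python. It does not use or represent any Oracle API; the records, field names, and age brackets are hypothetical and invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical feedback records consolidated from several sources.
# All field names and values are made up for this illustration.
records = [
    {"source": "tweet", "gender": "F", "age": 27, "location": "Denver"},
    {"source": "review", "gender": "M", "age": 45, "location": "San Jose"},
    {"source": "survey", "gender": "F", "age": 62, "location": "Denver"},
    {"source": "review", "gender": "M", "age": 33, "location": "San Francisco"},
]


def age_band(age):
    """Map an exact age to a coarse bracket (brackets are an assumption)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


def group_by(items, key_fn):
    """Arrange records into buckets keyed by whatever key_fn returns."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)


by_location = group_by(records, lambda r: r["location"])
by_age = group_by(records, lambda r: age_band(r["age"]))
by_gender = group_by(records, lambda r: r["gender"])

# Print how many records fall into each age bracket.
for band, items in sorted(by_age.items()):
    print(band, len(items))
```

The same `group_by` helper works for any category, which is the point: once data from different sources sits in one view, slicing it by gender, age, or location is a matter of choosing a different key.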

The key features of the co-engineered Oracle Exalytics In-Memory BI machine and software were described as follows:

- Advanced data visualization and exploration that quickly provide actionable insight from large amounts of data. Oracle claims that the software can be used by somebody with no previous training, thanks to its user-friendly interface.
- The ability to download data as is, without the need for costly cleansing, so iteration and evolution can be accelerated.
- Performance that is faster than competing solutions for business intelligence, modeling, forecasting, and planning applications.
- Comprehensiveness. The hybrid search/analytical database was designed to collect and compile all the information – regardless of source, format, or type – that a business needs to make informed critical decisions.

Saturday, April 28, 2012

IBM Makes a Big Deal About Big Data by Acquiring Vivisimo

IBM has confirmed that it has entered into a definitive agreement to acquire Vivisimo, a privately held, Pittsburgh-based company and a leading provider of federated discovery and navigation software that companies use to access and analyze big data.

No financial terms were revealed during the announcement.

Vivisimo software is well known for its ability to collect and deliver high-quality information across the widest range of data sources, in every format and location. The software not only automates the searching and collection of data, it also helps human users navigate it through a single enterprise-wide view, allowing them to gain important insights and find better solutions to the challenges encountered during operation.

This acquisition of Vivisimo by IBM speeds up the latter’s initiatives toward big data analytics with advanced federated capabilities, as it allows businesses and other entities to access, view, and analyze the complete repertoire of available data, both structured and unstructured, without having to transfer this data to another location.

As IBM combines its capabilities for big data analytics with Vivisimo’s software, IBM’s efforts towards automating data flow to business analytics applications will be getting a good push forward, resulting in greater capabilities for assisting clients in understanding customer behavior, managing network performance and customer churn, performing data-intensive marketing campaigns, and detecting fraud even as it happens.

"Navigating big data to uncover the right information is a key challenge for all industries," said IBM Information Management general manager Arvind Krishna. "The winners in the era of big data will be those who unlock their information assets to drive innovation, make real-time decisions, and gain actionable insights to be more competitive."

“As part of IBM, we can bring clients the quickest and most accurate access to information necessary to drive growth initiatives that increase customer satisfaction, streamline processes, and boost sales,” said Vivisimo CEO John Kealey.

According to IBM estimates, 2.5 quintillion bytes of data (roughly 2.5 million terabytes) are created per day from mobile phones, tablets, social media, sensors, and many other sources. The sheer quantity of this data makes it difficult for businesses to analyze it thoroughly enough to maximize company efficiency, competitiveness, and profitability.

Vivisimo has more than a decade’s experience in harvesting and navigating humongous amounts of data, helping businesses get full value from their data and content. Vivisimo distinguishes itself from similar software by its ability to search and index data from multiple repositories. Currently, it serves over 140 clients in the financial industry, consumer goods, electronics, manufacturing, life sciences, and government. Some of the bigger names it serves include Procter & Gamble, the US Navy and the US Air Force, the Defense Intelligence Agency, LexisNexis, Bupa, and Airbus.

Upon the completion of IBM’s purchase of Vivisimo, around 120 of the latter’s employees will be transferred to IBM.

Monday, March 26, 2012

Oracle Shares Details of the Oracle Big Data Appliance, which includes a Distribution of Hadoop from Cloudera

Oracle shared more details about the Oracle Big Data Appliance at Oracle Day in Redwood Shores on March 22. The Oracle Big Data Appliance, which was announced in October 2011 at the Oracle OpenWorld Conference, is now shipping and is very aggressively priced. The software, which includes both Hadoop and Oracle NoSQL database, comes pre-installed on Sun server–based hardware. Oracle claims that by delivering hardware and software which are bundled and engineered to work together, it is considerably simplifying IT.

The Oracle Big Data Appliance incorporates a new version of Oracle NoSQL database, a Cloudera-sourced Hadoop distribution, and an open-source R statistical software distribution. In addition, it supports Big Data Connectors – a new set of Oracle tools – which allow businesses to transfer data from Hadoop into Oracle Database 11g. It has been designed to work with Oracle Exadata appliance, the Exalytics business-intelligence applications appliance, and Oracle Database 11g.

The Big Data Appliance comes in configurations ranging from 2 processor cores to 24 processor cores, up to 864 gigabytes memory, 648 terabytes disk storage, and 40GB/sec InfiniBand connectivity. This bundled Hadoop offering is not unique in the market; if anything, it serves as a validation of the bundled approach. Other vendors such as NetApp and Dell also offer bundled Hadoop based on Cloudera. EMC Greenplum has a similar bundled offering based on MapR.

A bundled database appliance (whether it's targeting big data or just regular RDBMS) provides customers with a single, easy-to-deploy-and-manage system, simplifying deployment, maintenance and support, and saving time. Buying and quickly deploying a big data appliance is a lot easier than separately procuring server and storage hardware, installing and configuring software, and then going through the tedious process of integrating all of it with the existing infrastructure.

Still, Oracle is the first among the large enterprise computing vendors to offer such a bundle. We believe IBM and HP will soon follow suit, since a complete, out-of-the-box packaging of software, server, storage, and network that's configured together does save IT departments valuable time by eliminating installation, configuration, and tuning.

When Oracle announced in 2011 that it planned to do a Hadoop distribution, few expected it would go with Cloudera. This decision is in fact a very smart step. By choosing Cloudera, Oracle saves time and resources, allowing the company to focus on optimizing and tuning the whole bundle.

The detailed technical description of the Oracle Big Data Appliance is part of Oracle's overall message of hardware and software engineered to work together. The other products for which additional details were released at Oracle Day included Oracle Exadata, in which Oracle's flagship RDBMS is bundled with Sun hardware and which Oracle touts as "the world's fastest database machine"; the Oracle Exalytics In-Memory Machine; and Oracle Exalogic Elastic Cloud.

With this level of focus on engineered systems, Oracle is ahead of its competitors, namely HP and SAP, in branding itself as a one-stop shop for all IT needs, covering both hardware and software, including software for big data, relational data, and tools, including analytics, applications, and transaction processing.

Yash Talreja, VP Engineering, The Technology Gurus

Tuesday, February 14, 2012

Big Data Brings Big Opportunities for Data Analytic Professionals

Do you have a knack for numbers? Do you find data as fascinating to read as a spy novel?

Then America needs you!

Well, to be more specific, America needs data analysts, a group of people whose vocational calling is to make heads or tails of data and use it to help businesses lower costs, increase sales, and make sound decisions in general.

They call it “big data” (presumably because of the actual immensity of its magnitude). But where does it come from?

Big data comes from people visiting websites and joining and commenting on social networks. It comes from sensors and software that monitor where shipments go, what they contain, who is sending and receiving them, and what the environmental conditions are around them.

It comes from gadgets we carry around, such as mobile phones and other specialized devices and appliances equipped with mobile capabilities, like Amazon’s Kindle and Barnes and Noble eBook readers. It comes from the photos we upload using these phones, the moods and status updates we share, the tweets we tweet, the highlights and bookmarks we make public on Amazon Kindle. It comes from our location and who we are with – the very personal information we were cajoled into “opting in” for sharing by our social network(s).

It comes from search engines, generated every time we type something in the search bar and click on one of the results.

As a result, we have been creating 2.5 quintillion bytes of data every day – so much that 90% of all the data in the world today was created in the last two years alone.
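For a sense of scale, the daily figure can be converted into more familiar units. The short sketch below assumes decimal (SI) units, where one terabyte is 10^12 bytes and one exabyte is 10^18 bytes; the variable names are only illustrative.

```python
# Back-of-the-envelope conversion of the daily data volume quoted above.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
BYTES_PER_DAY = 25 * 10**17  # 2.5 quintillion bytes, kept as an exact integer

BYTES_PER_TERABYTE = 10**12
BYTES_PER_EXABYTE = 10**18

terabytes_per_day = BYTES_PER_DAY / BYTES_PER_TERABYTE
exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE

print(f"{terabytes_per_day:,.0f} TB per day")  # 2,500,000 TB per day
print(f"{exabytes_per_day:.1f} EB per day")    # 2.5 EB per day
```

In other words, 2.5 quintillion bytes is about 2.5 million terabytes, or 2.5 exabytes, every day.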

There is so much data waiting to be analyzed that a 2011 McKinsey Global Institute report projected that the United States will need up to 190,000 additional workers with “deep analytical” expertise, not to mention an additional 1.5 million data-literate managers.

Incidentally, this need for data analysts also goes beyond the traditional business world. Take, for instance, Justin Grimmer, a young assistant professor at Stanford, whose work revolves around using the computer to analyze news articles, press releases, Congressional speeches, and blog postings, all in the interest of understanding how political ideas are spread.

In sports, data analysis is being used to spot undervalued players. Shipping companies use data on traffic patterns and delivery times to fine-tune their routing. Online dating systems use algorithms to find better matches for their members. Police departments use data on holidays, weather, sports events, pay days, and arrest patterns to identify criminal hot spots.

Data analysts have found that a few weeks before a certain region’s hospital emergency rooms started getting flooded with patients, there was a spike in online searches for “flu treatments” and “flu symptoms.”

Similarly, research has also shown that you can get a more accurate picture of next quarter's real estate sales by looking at the number of housing-related searches than by asking the opinion of experts.

This scene is repeated in many other fields. Discovery and decision making are both moving under the influence of data analysis. It makes no difference whether we’re looking at the business world, government, or academia. As Harvard’s Institute for Quantitative Social Science director Gary King succinctly puts it, “There is no area that is going to be untouched.”

And as the data grows, it also helps the technology for collecting and analyzing it grow as well. The more data they take in, the more intelligent the machines get. Today, not only numbers and words can be analyzed. Even the so-called unstructured data – videos, audio tracks and images – are fair game.

It is a virtuous cycle, so to speak.

Of course, while it is easy to be impressed at our current ability to harness and analyze data, it still helps to remember the old-fashioned and time-tested belief that more is not necessarily better.

As Stanford statistics professor Trevor Hastie puts it, “The trouble with seeking a meaningful needle in massive haystacks of data is that many bits of straw look like needles.”

In addition, with such a huge amount of data, it’s far too easy for people deliberately seeking to skew opinions to draw the conclusion first and come up with the supporting “facts” later.

Nonetheless, despite these caveats, it is undeniable that data and the opportunities it brings are here to stay. We will be using it in different ways, towards different ends – but use it we will, and if we do not have the ability to do so ourselves, we will need people who do to help us out.

Yash Talreja, Vice President, Engineering, The Technology Gurus.

An Educational Video on Big Data: Technologies & Techniques

Ben Lorica converses with Roger Magoulas (Director of Research at O'Reilly) on Big Data. Roger describes the key technology factors which are important while looking at solutions for management of big data. The video also gives a peek into the future, i.e. their opinion on where big data technologies are headed.

Yash Talreja, Vice President, Engineering, The Technology Gurus.

Thursday, January 5, 2012

What is Big Data?

“Big data” is a new catchphrase that has bubbled up recently to describe the explosion in digital data created by people, corporations and the government.

On a personal level, there has been a sharp increase in data in terms of the large number of status updates and photos and video uploads on social networks.

On a corporate level, there has been an explosion of structured data as an increasing number of corporations have moved their internal and external functions – from expense and time reporting to employees' FSA claims – to online systems. In addition, a dramatic amount of data is now being produced by the digitization of paper forms that customers fill out before they are processed by companies, hospitals, and government – ranging from auto and health insurance claims to medical records and bank, brokerage, and credit card statements. And finally, government agencies have also been increasingly digitizing data: readings from geographical and climate sensors; space, ocean, and land data collected by NASA and other agencies; census data; public benefits data; and even crime profiling data from agencies such as the FBI.

As a result, we have been creating 2.5 quintillion bytes of data every day – so much that 90% of all the data in the world today was created in the last two years alone. This enormous amount of data – which the new catchphrase "Big Data" refers to – creates both a challenge and an opportunity for software and Web service providers, especially those involved in the field of analytics.

As you might imagine, contextualizing this information is an enormous challenge – fortunately, there are many innovative techniques and tools which allow us to analyze and make sense of the big data and use it for business benefit.

The ability to mine big data will be a boon to social and mobile commerce. Companies will finally be able to serve advertisements and present offers that are precisely targeted to consumers based not only on traditional demographic data (age, sex, marital status, income, zip code) but also on their current location as captured by their GPS-enabled phones, and not only on factors such as what they recently bought but also on what their friends bought.

In B2B scenarios, companies will be able to effectively qualify and manage leads across all the novel channels, including social media, and present customized offers to their potential customers. They will also be able to analyze customers’ issues faster and serve personalized offers for repeat business.

Yash Talreja, Vice President, Engineering, The Technology Gurus