Kill Your Data Warehouse!

Preparing to take maximum advantage of all of the new data and analytical capabilities rapidly arriving on the computing scene will mean that most businesses have to rethink how they assemble, distill, and use information. My advice: Plan on killing your data warehouse.

Actually, you won’t have to take it out back, have it kneel down, and shoot it in the back of the head, gangster style. This is more of a Dr. Frankenstein operation in which you will put the data warehouse on the operating table, cut it up, and create a new way to process information out of a combination of old and new parts. In fact, parts of your data warehouse can stay alive during this process. Consider it a form of vivisection. What has to die is the idea that the information a business needs comes from data warehouses the way they are currently implemented.

The vision that I am crafting for a replacement of a data warehouse is a data lake, a concept I’ve written about on Forbes.com and in a problem statement on CITOResearch.com (Preparing for Big Data).

The basic idea is simple. A data lake contains a large amount of data from various sources and forms that is ready to be distilled into information to support decisions or business processes.

Here are the primary differences between a data warehouse and a data lake:

  • In a data lake, end-users are far more involved in deciding when and how the information will be distilled. The creation of the equivalent of data cubes and other summaries to speed analysis is far faster and less intermediated by experts.
  • A data lake contains many more types of data than a data warehouse, which usually has transactional records from enterprise applications. In a data lake you will also find machine data from server logs, networking equipment, telecommunications equipment, and lots of different kinds of sensors. In addition, you will find unstructured information that can be used to add context to numerical information.
  • A data lake will use many more techniques to correlate and understand data than a data warehouse. Capabilities like Splunk, along with Hadoop and other MapReduce implementations, will be employed to distill and summarize machine data. Complex event processing systems will sift through many streams of data looking for patterns. Unstructured data will be analyzed and correlated to structured data using capabilities like Attivio or Autonomy.
  • A data lake will be far more oriented toward in-memory processing in real time than batch processing, which dominates the world of data warehouses.
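
To make the idea of distilling machine data a little more concrete, here is a minimal sketch of the map-and-reduce pattern in plain Python. It is not Hadoop or Splunk code. The log format and the names RAW_LOGS, map_line, and reduce_counts are my own assumptions for illustration.

```python
from collections import Counter
from itertools import chain

# Hypothetical raw server-log lines: "host method path status".
# The format is an assumption made for this sketch.
RAW_LOGS = [
    "web01 GET /cart 200",
    "web01 GET /checkout 500",
    "web02 POST /login 200",
    "web02 GET /cart 500",
    "web01 GET /cart 200",
]


def map_line(line):
    """Map step: emit ((host, status), 1) for one raw log line."""
    host, _method, _path, status = line.split()
    yield (host, status), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Distill the raw lines into a small summary of responses per host.
summary = reduce_counts(chain.from_iterable(map_line(line) for line in RAW_LOGS))
print(summary)
```

The point of the sketch is that the summary is defined when the question is asked, by whoever is asking it, rather than being fixed in advance the way a data cube would be.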

The transformation driven by a data lake will implement the paradigm of operational intelligence: a more real-time way of using both structured and unstructured data, from both live sources and historical repositories, in which analysis is as automated as possible.

The question that interests me now is how we can craft a meaningful architecture for a data lake. What will be included from the world of business intelligence and what will be tossed off the operating table? How will the capabilities that I assert should be part of a data lake support each other? How can we make the idea of a data lake more than just a list of new ideas and new technology?

Right now, the data lake is still a somewhat fuzzy vision, but it is a vision that must be pursued. Current data warehouses are not up to the task of handling the volumes of machine data, aka Big Data, in ways that allow businesses to be responsive.