Re: [xsl] Data science, data analytics using XSLT streaming

Subject: Re: [xsl] Data science, data analytics using XSLT streaming
From: Ihe Onwuka <ihe.onwuka@xxxxxxxxx>
Date: Tue, 5 Nov 2013 10:41:35 +0000

On Tue, Nov 5, 2013 at 10:12 AM, Costello, Roger L. <costello@xxxxxxxxx> wrote:
> Hi Folks,
> Apparently "data science" is the hot buzzword these days:
> Data Scientist: The Sexiest Job of the 21st Century (http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/)
> I think that, in a nutshell, data science is about analyzing large amounts of data.

No, it's not. The data don't necessarily have to be large. And shorn
of that prerequisite, almost any form of computation entails analyzing
data.

> It seems that most people believe that the Hadoop, parallel processing paradigm is the sole way of doing data science/data analytics.

No, they don't. First up, Hadoop is not the paradigm; MapReduce is.
Hadoop is just an open source project that implements the paradigm.
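
To illustrate the distinction, the paradigm itself can be sketched in a
few lines of plain Python with no Hadoop involved. This is only an
in-memory toy: the function names and the word-count task are
illustrative assumptions, and an implementation such as Hadoop adds
distribution and fault tolerance on top.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence, all in one process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records, chosen only for illustration.
counts = map_reduce(["data science", "Data analytics", "streaming data"])
```

Nothing here depends on a cluster; the same three phases could be run
by any engine that implements them.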

> However, I think that streaming is an equally valuable approach.
> XSLT streaming is all about processing large amounts of (XML-formatted) data.

But just because XSLT only just got streaming doesn't mean streaming
is new.

> So XSLT streaming should fit in the "data science" and "data analytics" categories.

If the source data is in XML, then XSLT streaming is useful for
extracting the data and handing it off to an environment properly
equipped with primitives for the requisite statistical analysis.
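
As a rough sketch of that hand-off, here is a streaming pass in Python
rather than XSLT. It assumes a hypothetical input made of <reading>
elements, each with a numeric "value" attribute (neither name comes
from this thread). It uses xml.etree.ElementTree.iterparse so the whole
tree is never built up front, and the standard statistics module stands
in for the analysis environment.

```python
import io
import statistics
import xml.etree.ElementTree as ET


def stream_values(source, tag="reading", attr="value"):
    """Yield float(attr) for each <tag> element as parsing proceeds.

    The element and attribute names are hypothetical placeholders.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield float(elem.get(attr))
            elem.clear()  # drop the element's content once consumed


def summarize(source):
    """Hand the extracted numbers to the statistics module."""
    # Only the extracted numbers are held in memory, not the XML tree.
    values = list(stream_values(source))
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Tiny in-memory sample standing in for a large XML file.
sample = io.BytesIO(
    b"<readings>"
    b'<reading value="1.0"/><reading value="2.0"/><reading value="3.0"/>'
    b"</readings>"
)
summary = summarize(sample)
```

The extraction step and the analysis step stay separate, so the second
could just as well be R or any other statistics package.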

> Broad Question: Would you provide a scenario/example of doing data science/data analytics using XSLT streaming please?
> I realize that the question is rather vague and broad. I am hoping we can collectively come up with ideas on how to do data analytics (data science) using XSLT streaming. Any ideas you might have would be appreciated.

See the previous answer.
