Data; let’s get personal

Keeping in contact with your customers is a vital ingredient in ensuring that the hard, expensive work of acquiring customers and then delivering a quality product is translated into brand loyalty, repeat orders and ultimately high lifetime value.

Mobile and browser notifications and direct social media messages are the hot channels these days, but good old email, despite its relative long-in-the-tooth-ness, is still an extremely powerful communications medium. Building an effective communications strategy that makes the most of the weapons at your disposal is the difference between a great strategy and a mediocre one; one that works, and one that does not.

Data sits at the core of many of the tools at the disposal of marketeers. Measuring effectiveness is of course hugely important, but data can also be used much more tactically than that. This blog post discusses one of the ways data can be used to develop content that is personal, targeted and relevant to each individual, and how you can expect this to increase your campaign click-through and key interactions by around 25%.

The right message

Content may be king, but it has to be relevant to each individual. Sending a broad-brush, generic message to the whole of your customer base may struggle to accomplish this. Amazon founder Jeff Bezos famously said “If we have 6.2 million customers, we should have 6.2 million stores”, meaning of course that the customer experience should adapt to suit each individual visitor, highlighting things of interest to the individual and suppressing those that are irrelevant.

Of course, doing this manually on any kind of scale quickly becomes impossible, and this is where data comes in. A first example of using data to build personalised content is to look at a customer’s purchasing behaviour and then recommend products that are often purchased together. For example, those who purchase a top-end product, such as a ride-on lawnmower, may also be interested in other top-end products that can be used to furnish their ample gardens. Product-to-product relationships can be discovered using a technique called Association Rules (or Basket Analysis), which finds products that tend to be purchased together. The famous example is nappies and beer, which are apparently often found in the same basket at supermarket checkouts due to tired new dads being sent on the shopping run; though this may just be a slightly sexist data-driven old wives’ tale.
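To make the basket idea a little more concrete, here is a minimal sketch of how pairwise association rules might be scored using the standard support, confidence and lift measures. The baskets, the pair_rules function and the min_count threshold are all invented for illustration; a production system would normally use a proper frequent-itemset algorithm over far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented purely for illustration.
baskets = [
    {"nappies", "beer"},
    {"nappies", "beer", "wipes"},
    {"nappies", "beer"},
    {"bread", "milk"},
    {"bread", "beer"},
    {"milk", "wipes"},
]


def pair_rules(baskets, min_count=2):
    """Score 'lhs => rhs' rules for every product pair seen at least min_count times."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue
        support = count / n  # share of all baskets containing both products
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]     # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1 means bought together more than chance
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} => {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two products turn up together more often than their individual popularity would predict, which is the signal a "customers also bought" recommendation relies on.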

While use of this technique is a good first step down the personalisation road (those using it will garner no criticism from me), it does still tend to create generalised buckets. If a person purchases Product A, we will always try to sell them Products B and C, no matter what else we know about them. To get much more granular personalisation, the kind that really gets inside the mind of each individual customer, it is worth considering another technique: collaborative filtering.

Collaborative filtering is in concept an extension of Association Rules but can be applied to far larger datasets. By combining the myriad data sources available to you, such as order history, each click, touch and scroll on your website and mobile app, and all the views, opens and clicks from your marketing campaigns (email and mobile push notifications), as well as pulling in social-networking cues, it becomes possible to develop algorithms that build a deep understanding of the intents of individuals as they interact with your business. Combining these datasets in effect harnesses the wisdom of the crowd and gives your communication strategy a layer of collective intelligence. From this you can draw strong inferences about what individuals are likely to be interested in, and derive content specifically for each individual you are talking to.
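As a rough illustration of the principle, the sketch below implements a very small user-based collaborative filter using cosine similarity. The customers, products and interaction weights are hypothetical (imagine purchases scoring higher than clicks), and the cosine and recommend functions are names chosen for this example only; real systems work with far sparser data and more sophisticated models.

```python
from math import sqrt

# Hypothetical interaction strengths per customer, invented for illustration.
interactions = {
    "alice": {"lawnmower": 3.0, "patio_set": 2.0},
    "bob": {"lawnmower": 3.0, "patio_set": 1.0, "hot_tub": 2.0},
    "carol": {"trowel": 1.0, "seeds": 2.0},
    "dave": {"trowel": 2.0, "seeds": 1.0, "watering_can": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse interaction vectors."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(user, interactions, top_n=3):
    """Rank items the user has not touched by similarity-weighted interest from other users."""
    seen = interactions[user]
    scores = {}
    for other, items in interactions.items():
        if other == user:
            continue
        similarity = cosine(seen, items)
        if similarity <= 0:
            continue  # nothing in common, so this customer tells us nothing
        for item, strength in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + similarity * strength
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", interactions))
print(recommend("carol", interactions))
```

Unlike a fixed "bought A, so offer B" rule, the suggestion here depends on everything known about the individual and on the behaviour of the customers who most resemble them.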

From a data-engineering perspective, working with data of this nature can be extremely challenging. It has the verbose, high-velocity and schema-free properties of typical big data workloads. However, the majority of the value is contained in the dense and complex relationships between customers and the ‘things’ that they might be associated with, be they your products, or their likes and dislikes gleaned from social media.

There are a number of data technologies that can help here, but at BiG we have found that a slightly esoteric but increasingly popular offshoot of the NoSQL ecosystem forms a highly effective backbone for this type of work: the Graph database. This type of database, typified by market leader OrientDB, contains no tables, rows or even cells. Instead, all data is held either as atomic units called Nodes (or Vertices) or as Relationships (or Edges) between two Nodes. Thus you may be represented by one Node and a Product by another. Your interest, viewing history and purchasing history will be represented by the relationships between those two Nodes.

Graph database products, like OrientDB, allow this structure to contain many billions of records and enable you to form an extremely semantically rich picture of your data domain, be that for an individual, product or even product category. This forms the core of a highly scalable recommendation system against which you can develop queries that give you truly personalised content.
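The node-and-relationship model can be sketched in a few lines of plain Python. To be clear, this is an in-memory stand-in written for illustration, not OrientDB's API or query language; the Graph class, the PURCHASED and VIEWED labels and the also_purchased traversal are all assumptions of this example.

```python
from collections import defaultdict


class Graph:
    """A toy property graph: nodes plus labelled, directed relationships."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # source id -> [(label, target id)]
        self.in_edges = defaultdict(list)   # target id -> [(label, source id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def neighbours(self, node_id, label):
        """Nodes reached by following outgoing edges with this label."""
        return [target for edge_label, target in self.out_edges[node_id] if edge_label == label]

    def sources(self, node_id, label):
        """Nodes that point at node_id through edges with this label."""
        return [source for edge_label, source in self.in_edges[node_id] if edge_label == label]


def also_purchased(graph, customer):
    """Traverse customer -> products -> other customers -> their other products."""
    owned = set(graph.neighbours(customer, "PURCHASED"))
    counts = defaultdict(int)
    for product in owned:
        for other in graph.sources(product, "PURCHASED"):
            if other == customer:
                continue
            for candidate in graph.neighbours(other, "PURCHASED"):
                if candidate not in owned:
                    counts[candidate] += 1
    # Most frequently co-purchased first; ties broken alphabetically.
    return sorted(counts, key=lambda product: (-counts[product], product))


g = Graph()
for person in ("you", "sam", "kim"):
    g.add_node(person, kind="Customer")
for product in ("lawnmower", "patio_set", "hot_tub"):
    g.add_node(product, kind="Product")

g.add_edge("you", "PURCHASED", "lawnmower")
g.add_edge("you", "VIEWED", "hot_tub")
g.add_edge("sam", "PURCHASED", "lawnmower")
g.add_edge("sam", "PURCHASED", "patio_set")
g.add_edge("kim", "PURCHASED", "lawnmower")
g.add_edge("kim", "PURCHASED", "patio_set")
g.add_edge("kim", "PURCHASED", "hot_tub")

print(also_purchased(g, "you"))
```

The point of the structure is that a recommendation becomes a short walk along relationships rather than a series of table joins, and new relationship types (views, likes, opens) can be added without redesigning a schema.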

Is it worth it?

The short answer is yes. Personalisation is a huge driver of positive customer engagement; for example, we routinely see CTR of around 25%. At BiG we work closely with the world’s leading equity crowdfunding platform, Crowdcube. Via this and other data-centric hyper-personalisation marketing techniques we have driven significant increases in money invested into businesses raising on the Crowdcube platform. In one recent case, the difference between hyper-personalisation and control was an increase of over 56%; that equates to £8.26 vs. £0.22 investment made per email recipient. Great news for entrepreneurs seeking to take their businesses to the next level, and also for the investor membership, who have been connected directly with businesses they are keen to support.

About the companies in this article

Crowdcube is the world’s leading equity-based crowdfunding platform; with an investor membership of over 350,000, they have so far raised over £200 million for 500 growing businesses.

OrientDB is the world’s leading (and fastest) 2nd generation NoSQL Distributed Graph database offering a blazingly fast and scalable multi-model document and graph platform; benchmarked at 10× faster than rivals.

Big Consultancy – Gerry McNicol is co-founder at Big Consultancy. We are a team of data engineers, analysts and consultants who specialise in helping companies successfully deploy data strategies and technologies. Whether you are dealing with real-time and high-volume streaming data, want to leverage the new wave of machine learning technologies or want to transform your business into a data-driven enterprise, we want to solve your problems. Tell us your data ideas, hopes and data dreams. Connect on LinkedIn or via

Data Carrots

Do yellow elephants dream of data carrots?

You don’t have to go far to find a lot of people talking about how data, analytics and machine learning can help take your company to the next level, transforming it into a sharper, leaner, fitter entity, fit to embrace the current age. But when it comes to reality, implementing data and analytics can often feel like a large-scale engineering effort coupled with expensive tools and people. Rather than making you leaner, faster, fitter, your business gets slower, fatter and heavier. A frustrating place to be.

Counteracting this situation, or ensuring you do not slip unwittingly into it, requires a cohesive data strategy, one that blends technology, psychology, business transformation and data. And it is up to the business management to work in unison with the data team to understand and shape the use of data rather than relegating it to a purely technical or IT function.

A recent study carried out at Virgin Atlantic offers a great illustration. In this parable, the airline wanted to increase the efficiency of their aircraft by reducing the cost of fuel used per flight. Airplane modifications and upgrades are of course costly and thus are not the lowest-hanging fruit to start picking. It was decided that the first port of call would be to work with the pilots to see if anything could be done to improve operational efficiency (Gosnell G. K., List J. A. and Metcalfe R.). An edict duly came down from on high with instructions asking pilots to optimise where possible and find ways to use a little bit less fuel per flight.

The data team then got involved and an experiment began. The pilots were split into three groups:

The first group were asked to “save a bit of fuel, please”. Fuel use was recorded and analysed in great detail to see if they did save any fuel. However, these pilots were not told about the new fuel use data logging.

The second group were also asked to “save a bit of fuel, please”. This group were told about the new detailed fuel data logging. However, the resulting analytics and data were not shared; the message was simply that management “are watching!”

The third group, also asked to “save a bit of fuel, please”, were both told about the detailed data logging and, after every flight, given a comprehensive breakdown of their fuel use. Detailed analytics on fuel used during taxiing, take-off, cruising and landing were provided for every flight, with comparisons and averages for other pilots, routes and planes. Additional gamification techniques, with the airline donating money to the pilot’s choice of charity for every £1 of fuel saved, were also used to incentivise the pilots.

It does not take a great leap of the imagination to work out what happened next. The first group’s behaviour changed very little and fuel use remained the same. The second did save a little bit of fuel, but not much. But the third group started using a lot less fuel. Taxiing using one engine only, less aggressive take-off and more relaxed engine braking at landing, as well as optimising cruising altitude, speed and routes, all helped save an estimated 6,828 metric tons of fuel, worth £3.3 million, during the course of the study. And an additional and unexpected side benefit was that the group exposed to the charitable donation gamification reported the highest level of job satisfaction.

This data-led feedback loop, known as nudging, helped the pilots execute a key company policy and provides a perfect example of exactly the type of thinking that sits at the core of any business use of data and analytics. Data is not just important; it is *really* important. It should be used to shape and inform the many thousands of decisions and actions each member of your team makes every day. And for this to happen you, the business managers, need to define clearly what you consider to be important, then work with your data team to measure it and share it, so allowing your team to shape their work to push that measure in the right direction.

Trends in Big Data: Time to ditch the Three V’s?

Working in the world of Big Data can be exhausting. As soon as you get your head around one complex technology, another three spring up behind you to replace it. While you were looking the other way, head buried in code, the state-of-the-art goalposts moved a little bit further into the future.

It’s easy to resent the next generation of tech when it renders all the strife you’ve been through to master the current generation worthless. Until of course you realise that this next wave solves so many of the problems you’ve been wrestling with and, by embracing change, your life becomes much easier and more opportunities open up. I have lost track of the hours I have spent walking round and round the block, trying to grapple with the mechanics of early Hadoop MapReduce code, building up a library of routines and patterns that could be called upon to solve most problems. Then along came Apache Spark and suddenly it all became so much easier. Damn you in-memory resilient-distributed-datasets making my life better and my code run faster!

I’m joking of course. Well, mostly. Progress in Big Data technology has been staggering. A few years ago the challenge with Big Data projects was often defined by technical practicalities, like how do we store this amount of data and how do we even begin to process it? We don’t have to search too far back in time to remember how technically challenging, and thus costly, this was. However, as we reach the end of 2016, the toolsets now available, as well as a significant focus by cloud platforms, in particular Amazon Web Services and Microsoft Azure, on automating much of the heavy lifting, mean that there is now a smorgasbord of cost-effective architectures that enable companies of all sizes to work with Big Data. Demonstrating this shift, a recent Gartner report stated that, “In 4 years 90% of all data will be on Next generation technology”. From start-ups to multinationals, if you want to work with Big Data, you can.

Many companies have now been through the pain (and cost) of experimenting with Big Data projects. Through these platform iterations we’ve reached a technological level where we can now deploy reasonably cost effective solutions of staggering scale. But so what? We’ve collected this data, we’ve deployed scalable infrastructure, but what’s the benefit?

It’s a fair question. Why do this stuff? Perhaps due to the historic technical difficulties, wide scope and far-reaching possibilities, Big Data projects have tended to be led by Engineers or Data Scientists, rather than by Management. Even the famous three Vs of Big Data (Velocity, Volume, Variety) describe the nature of the data, rather than what we are trying to do with it. The three Vs are great, but so what? It’s not exactly something you can put in a board-level presentation and leave your audience with a sense of direction. Over the years the three Vs have evolved … to the four Vs. We’ve added Veracity! I’ve even seen the slightly tongue-in-cheek Ten Vs of Big Data as well…

“Vast, Volumes of Vigorously, Vexingly,
Variable, Verbose, yet Valuable, Visualised
high Velocity data, from Silicon Valley”

This is not helping. The only sane response from anybody trying to run a business is “OK, we have lots of data. So what?”

We Big Data experts really need to start thinking of a better definition. Something that others can engage with as well as guide us as to what we are trying to achieve. We need to stop the ‘so what’ question from coming back to bite us on our behinds.

How about…

Big Data is the attempt to gain competitive
advantage by exploiting new ways of storing
and processing data.

“Ah ha!” our audience now says. “I’d quite like to gain a competitive advantage. That sounds advantageous!”

“It is”, we reply. “It can be most advantageous indeed”.

“But how, and what sort of advantage would I be looking to gain?”

“What a good-looking question”, we respond.

And indeed it is a good question. We can now start discussing the hopes, dreams and worries of the organisation, and how data solutions can be put to use to solve each of them in a targeted, costed and, most importantly, effective way. Invariably in business, most hopes, dreams and worries come down to creating more revenue, improving operational efficiency, reducing risk or driving business change (in order to create more revenue, improve operational efficiency and reduce risk), but the key is that by starting from a new, better definition, all data projects, big or small, will now be driven by a focused need. So what? Well, there will be no more “So whats” for a start.

Gartner (again) predicts that by 2020, 75% of large and midsize organisations will compete using advanced analytics and proprietary algorithms. Deploying this type of tech throughout a whole organisation is no small feat; the challenge is both technical and business transformational. Tackling both these aspects of data projects is increasingly moving into the domain of Big Data projects. In order for data to drive the enterprise, the trend over the next few years has to be the engagement with a more holistic approach, perhaps one that starts with a better definition of Big Data.

What do you think? Should we abandon the Three, Four or even Ten Vs of Big Data? Can you come up with something better?