Read how Gareth, an MSc mathematics award winner from Exeter University, applied his knowledge at BiG

This summer I got the chance to work with BiG Consultancy whilst finishing my Masters course in Computational Finance at Exeter University. I was interning at Crowdcube, an equity crowdfunding platform, which uses the BiG team as its data experts. I first met the team at BiG in my third week, and I ended up enjoying it down there so much that I worked from their office for the remainder of my internship!

My task was to help implement a data-driven process to find companies suitable for crowdfunding.

BiG were on track to achieve this through machine learning, with a dataset deep and rich enough to facilitate it. However, before the (in my view) more glamorous machine learning algorithms could be deployed to achieve speed efficiencies, there was necessary grunt work involved in building our own clean and complete dataset. I was shown how to call APIs in Java, a programming language I had never used before, and write the results to a database.

The project began with warehousing the data in Amazon Redshift. However, due to the large volumes of data we were collecting, we moved over to Cassandra, a distributed NoSQL database – now that’s Big Data! Random Forest, an algorithm built on ensembles of decision trees, was the model of choice. A training set of pre-classified companies was used to train the model, which was then applied to evaluate the larger dataset. By the time I had finished my internship, we had created an environment capable of producing hundreds of leads a week, with the model becoming more and more intelligent as lead-quality feedback was fed back in.
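To give a flavour of the approach, here is a minimal sketch of training a Random Forest on pre-classified companies and scoring new ones as leads, using scikit-learn. The feature names, thresholds and data below are entirely invented for illustration; the real model was trained on a far richer dataset.

```python
# A minimal sketch of the lead-classification idea described above.
# Features and labels are synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical features: e.g. [revenue growth, team size, years trading]
X_train = rng.random((200, 3))
# Pre-classified labels: 1 = suitable for crowdfunding, 0 = not
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0.8).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score a batch of new companies and keep the most promising as leads
X_new = rng.random((50, 3))
lead_scores = model.predict_proba(X_new)[:, 1]
leads = X_new[lead_scores > 0.7]
print(f"{len(leads)} leads generated from {len(X_new)} companies")
```

Feeding the quality of each generated lead back in as a new labelled example is what lets a model like this improve over time.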

My time at BiG definitely complemented my academic work, where I was focusing on applying Monte Carlo methods to price path-dependent options. This is a process where the path of an asset’s price is simulated thousands of times and the discounted average payoff gives the option’s price – a technique which is useful when no analytical solution is available.
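As an illustration of the technique, the following sketch prices an arithmetic-average (Asian) call – a classic path-dependent option – by simulating geometric Brownian motion paths. The parameters are illustrative, not taken from the dissertation.

```python
# Monte Carlo pricing of a path-dependent option: an arithmetic-average
# (Asian) call under geometric Brownian motion. Parameters are illustrative.
import numpy as np

def asian_call_mc(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                  n_steps=252, n_paths=20_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Simulate log-price increments for every path at once
    z = rng.standard_normal((n_paths, n_steps))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = S0 * np.exp(np.cumsum(increments, axis=1))
    # The payoff depends on the whole path (the average), not just the end
    avg_price = paths.mean(axis=1)
    payoff = np.maximum(avg_price - K, 0.0)
    # The discounted average payoff is the Monte Carlo price estimate
    return np.exp(-r * T) * payoff.mean()

price = asian_call_mc()
print(f"Asian call price estimate: {price:.2f}")
```

Because the payoff here depends on the average price along the path, there is no simple closed-form solution, which is exactly why simulation is used.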

Both projects had a heavy emphasis on data and programming, with the methods and theories applied in one feeding into the other. For example, I found the role of random number generators very interesting – in particular, how they can be replaced with low-discrepancy sequences in order to reduce variance, whilst still keeping the “random” properties needed for the Monte Carlo method to work. This means that the option price converges to the correct solution more quickly than with the standard approach.
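The effect is easy to demonstrate on a toy integral. The sketch below compares pseudo-random sampling against a hand-rolled 2-D Halton sequence (one common low-discrepancy sequence) when estimating E[XY] for independent uniforms; the integrand and point count are chosen purely for illustration.

```python
# Comparing pseudo-random Monte Carlo with a low-discrepancy (Halton)
# sequence on a toy integral whose true value is known to be 0.25.
import numpy as np

def van_der_corput(n, base):
    """n-th element of the van der Corput sequence in the given base."""
    q, bk = 0.0, 1.0 / base
    while n > 0:
        q += (n % base) * bk
        n //= base
        bk /= base
    return q

def halton_2d(n_points):
    """First n_points of the 2-D Halton sequence (bases 2 and 3)."""
    return np.array([(van_der_corput(i, 2), van_der_corput(i, 3))
                     for i in range(1, n_points + 1)])

n = 4096
rng = np.random.default_rng(1)
pseudo = rng.random((n, 2))
quasi = halton_2d(n)

# Estimate E[X*Y] for X, Y ~ Uniform(0, 1); the true value is 0.25
mc_est = (pseudo[:, 0] * pseudo[:, 1]).mean()
qmc_est = (quasi[:, 0] * quasi[:, 1]).mean()
print(f"pseudo-random error: {abs(mc_est - 0.25):.2e}")
print(f"Halton error:        {abs(qmc_est - 0.25):.2e}")
```

The low-discrepancy points cover the unit square far more evenly than random draws, which is why quasi-Monte Carlo estimates typically converge faster on smooth integrands like option payoffs.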

Even the simple programming practices I was taught at BiG, such as how to structure code and export results to a database, helped me when it came to writing up my report and visualising the results.

Big Data is an extremely fascinating topic and I would advise anyone with a passion for numbers to read into it. Thanks to the team at BiG for teaching me so much, it was great having the chance to work with you!

Getting to grips with Graph, by Jenny, a mathematics undergraduate at the University of Exeter

As a mathematics undergraduate student, I arrived at BiG Consultancy with relatively little knowledge or experience of working with big data or real-life data problems. However, during my time interning there over the summer, I was able to learn a great deal about the issues commonly faced every day by businesses and institutions in the real world.

Big Data is definitely a buzzword these days, and refers to particularly large volumes of data that simply cannot be processed in the same ways that smaller datasets historically have been. Organisations from almost every industry can encounter problems if they get inundated with data: in health care, patient records, treatments and response information need to be organised efficiently so that they can be accessed quickly and accurately; in banking, a huge amount of data can stream in concerning risk, investments and regulations, as well as customer details; and in retail and manufacturing, businesses need to be able to access intelligence regarding the market and analyse their own company strategies effectively. On top of the sheer volume of the data, it can also be the speed at which it flows in, as well as its variety, complexity and security, that causes issues.

To tackle these difficulties, new software, technologies and methods are being developed all the time, and companies are turning to specialists for help managing their data. At BiG Consultancy, I experienced first-hand some up-to-date challenges faced by real companies, as well as their route to a solution. In particular, I especially enjoyed learning about complex data visualisation, including how graph databases can be used to navigate multifaceted stores of information much more quickly and easily than conventional databases can. Graph databases also allow data relationships to be explored more thoroughly, and this visualisation of connections can really help to see how improvements in one area of a company, for example, can benefit another. As part of my work at BiG, I helped to classify entities in a dataset presented to us, in preparation for input into a particular graph database, and was also involved with the early stages of learning how to query it. This was a brand-new area of analysis for me, but I felt I was able to really throw myself into it and utilise the mathematical skills already acquired through my university studies.

Working at BiG really opened my eyes to a field of industry that I hadn’t been able to truly get a feel for before. I am now inspired to use my developing data analysis skills on larger, more exciting projects in the future, and hopefully contribute to this fast-paced, leading area of expertise that is sure to be instrumental for years to come.

Learning Java with Monsters

At BiG we’re passionate about innovating with data, but it’s the coming together of brilliant minds that makes what we do possible. A rich flow of fresh talent and attitudes is crucial to keeping our enthusiasm for problem solving with data infectious.

Here is what our intern, 4th-year Mathematics student Conor, made of working at BiG. His responsibilities included writing code, analysing data and learning to operate office robots (a form of machine learning!).

My name is Conor and I’m a 4th year Mathematics student at the University of Exeter. In June I spent one month as an Intern for BiG Consultancy. Before my internship, I had limited experience with programming and knew very little about big data. Once I had arrived, met the team and had a tour of the office, I was given my first task. I was to spend my first two weeks learning how to use and program in Java. I had no prior experience with Java, so it was a steep learning curve, but Gerry helped me to overcome any obstacles that I faced. I was challenged to create a text-based game by the end of the two weeks where the hero, Conor, had to go through different rooms, encountering and defeating monsters along the way. I was proud to have created this game from scratch after such a short space of time. It really opened up my eyes to the many applications of Java.
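The original game was, of course, written in Java; the compressed sketch below just illustrates the core structure of such a game – rooms, monsters and a simple combat loop – with every name and number invented for illustration.

```python
# A compressed sketch of the structure behind a room-and-monster text game
# like the one described above. All names and stats are invented.
rooms = [
    {"name": "Dungeon", "monster": "Goblin", "monster_hp": 3},
    {"name": "Crypt", "monster": "Skeleton", "monster_hp": 5},
    {"name": "Lair", "monster": "Dragon", "monster_hp": 8},
]

hero = {"name": "Conor", "attack": 2}

for room in rooms:
    print(f"{hero['name']} enters the {room['name']}...")
    hp = room["monster_hp"]
    blows = 0
    while hp > 0:               # fight until the monster is defeated
        hp -= hero["attack"]
        blows += 1
    print(f"  {room['monster']} defeated in {blows} blows!")
print("All rooms cleared!")
```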

After successfully playing Frankenstein and creating an army of monsters, I moved from learning about the software development aspect of BiG Consultancy to data visualisation. In the next two weeks, I learnt how to use SQL to extract data. I found this to be significantly easier to work with than Java! With the help of Chris, once I had created data tables, I was able to manipulate them to create visual representations using programs such as Tableau and KNIME. This made the data easier for a client to analyse.
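The workflow described above – use SQL to pull an aggregated table out of a database, then hand it to a visualisation tool – can be sketched with Python’s built-in sqlite3 module. The table and column names here are invented for illustration.

```python
# A minimal sketch of the extract-then-visualise workflow: a SQL
# aggregation query produces a tidy table of the kind a tool like
# Tableau or KNIME would consume. Table/column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("North", 120.0), ("North", 80.0), ("South", 200.0), ("South", 50.0),
])

# The kind of aggregation a visualisation tool would be fed
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()

for region, total in rows:
    print(f"{region}: {total:.2f}")
```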

I really appreciated the time that I spent working at BiG Consultancy. With the help and guidance of the team I was able to develop my skills in programming in Java, as well as learning about data analysis and visualisation. In addition, this was the first time that I had worked in an office environment, so it was also a very good experience. I especially enjoyed the coffee machine! Throughout the month-long internship, I learnt a lot about the rapidly developing world of big data and am excited to use the skills that I acquired here at BiG in my future endeavours.

Data; let’s get personal

Keeping in contact with your customers is a vital ingredient to ensuring that the hard, expensive work of acquiring customers and then delivering a quality product is translated into brand loyalty, repeat orders and ultimately high lifetime value.

Mobile and browser notifications and direct social media messages are the hot channels these days, but good old email, despite its relative long-in-the-tooth-ness, is still an extremely powerful communications medium. So building an effective communications strategy that maximises the weapons at your disposal makes the difference between a great and a mediocre strategy; one that works, and one that does not.

Data sits at the core of many of the tools at the disposal of marketeers. Measuring effectiveness is of course hugely important, but data can also be used much more tactically than that. This blog post discusses one of the ways data can be used to develop content that is personal, targeted and relevant to each individual, and how you can expect this to increase your campaign click-through and key interactions by around 25%.

The right message

Content may be king, but it has to be relevant to each individual – something that sending a broad-brush, generic message to the whole of your customer base will struggle to accomplish. Amazon founder Jeff Bezos famously said “If we have 6.2 million customers, we should have 6.2 million stores”, meaning of course that the customer experience should adapt to suit each individual visitor, highlighting things of interest to the individual and suppressing those that are irrelevant.

Of course, doing this manually on any kind of scale quickly becomes impossible, and this is where use of data comes in. A first example of using data to build personalised content is to look at a customer’s purchasing behaviour and then recommend to them products that are often purchased together. For example, those that purchase a top-end product, such as a ride-on lawnmower, may also be interested in other top-end products that can be used to furnish their ample gardens. Product-to-product relationships can be discovered using a technique called Association Rules (or Basket Analysis), which finds products that tend to be purchased together. The famous example of this is nappies and beer, which are apparently often found in the same basket at supermarket checkouts due to tired new dads being sent on the shopping run – though this may just be a slightly sexist data-driven old wives’ tale.
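The core arithmetic of a basket-analysis rule is simple enough to fit in a few lines. The toy sketch below scans invented baskets and computes the three standard rule metrics – support, confidence and lift – for the (in)famous beer-and-nappies pair.

```python
# A toy illustration of Association Rules / Basket Analysis: compute
# support, confidence and lift for a candidate rule. Baskets are invented.
from itertools import combinations
from collections import Counter

baskets = [
    {"nappies", "beer"},
    {"nappies", "beer", "crisps"},
    {"nappies", "beer", "wipes"},
    {"crisps", "bread"},
    {"bread", "milk"},
]
n = len(baskets)

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

a, b = "beer", "nappies"
support = pair_counts[(a, b)] / n                  # P(A and B)
confidence = pair_counts[(a, b)] / item_counts[a]  # P(B | A)
lift = confidence / (item_counts[b] / n)           # > 1 => positive association
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the pair appears together more often than chance alone would suggest, which is the signal a recommender acts on.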

While use of this technique is a good first step down the personalisation road – those using it will garner no criticism from me – it does still tend to create generalised buckets: if a person purchases Product A, we will always try to sell them Products B and C, no matter what else we know about them. In order to get much more granular personalisation, one that really gets inside the mind of each individual customer, it is worth considering another technique: collaborative filtering.

Collaborative filtering is in concept an extension of Association Rules, but can be applied to far larger datasets. By combining the myriad data sources available to you – order history; every click, touch and scroll point on your website and mobile app; all the views, opens and clicks from your marketing campaigns (email and mobile push notifications); as well as social-networking cues – it becomes possible to develop algorithms that build a deep understanding of the intents of individuals as they interact with your business. Combining these datasets in effect harnesses the wisdom of the crowd and gives your communication strategy a layer of collective intelligence, from which you can draw strong inferences about individuals and what they are likely to be interested in, and so derive content specifically for each individual you are talking to.
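To make the idea concrete, here is a minimal item-based collaborative filtering sketch: item-to-item similarities are computed from everyone’s interactions, then used to score unseen items for one customer. The interaction matrix is invented, and real systems add weighting, decay and much larger sparse matrices.

```python
# A minimal sketch of item-based collaborative filtering: recommend an
# unseen item using similarities learned from everyone's interactions.
import numpy as np

# Rows = customers, columns = products; 1 = interacted (viewed/bought)
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Cosine similarity between item columns
norms = np.linalg.norm(interactions, axis=0)
item_sim = (interactions.T @ interactions) / np.outer(norms, norms)
np.fill_diagonal(item_sim, 0.0)  # an item shouldn't recommend itself

# Score items for customer 0 and recommend the best unseen one
customer = interactions[0]
scores = item_sim @ customer
scores[customer > 0] = -np.inf   # mask items already interacted with
recommended = int(np.argmax(scores))
print(f"Recommend product {recommended} to customer 0")
```

This is the “wisdom of the crowd” in miniature: customer 0 has never touched product 2, but the people who share their tastes have.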

From a data-engineering perspective, working with data of this nature can be extremely challenging. It has the high-volume, high-velocity and schema-free properties of typical big data workloads. However, the majority of the value is contained in the dense and complex relationships between customers and the ‘things’ they might be associated with, be they your products or their likes and dislikes gleaned from social media.

There are a number of data technologies that can help here, but at BiG we have found that a slightly esoteric but increasingly popular off-shoot of the NoSQL ecosystem forms a highly effective backbone to this type of work: the Graph database. This type of database, typified by market leader OrientDB, contains no tables, rows or even cells. Instead, all data is held either as atomic units called Nodes (or Vertices) or as Relationships (Edges) between two Nodes. Thus you may be represented by one Node and a product by another; your interest, viewing history and purchasing history will be represented by the relationships between those two Nodes.

Graph database products like OrientDB allow this structure to contain many billions of records and enable you to form an extremely semantically rich picture of your data domain, be that of an individual, a product or even a product category. This forms the core of a highly scalable recommendation system, against which you can develop queries that give you truly personalised content.
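The kind of query a graph database excels at can be mimicked in plain Python on a tiny in-memory graph: customer and product nodes, ‘purchased’ edges, and a two-hop traversal that finds what similar customers also bought. This is only a stand-in for what a graph engine such as OrientDB executes natively at scale; all names below are invented.

```python
# An in-memory stand-in for a graph-database traversal: customer and
# product nodes connected by 'purchased' edges, and a two-hop walk that
# finds what similar customers also bought. All names are invented.
from collections import Counter

# Edges: customer -> products purchased (plus the reverse index)
purchased = {
    "alice": {"mower", "shears"},
    "bob":   {"mower", "gazebo"},
    "carol": {"shears", "gazebo", "hot_tub"},
}
purchased_by = {}
for customer, products in purchased.items():
    for product in products:
        purchased_by.setdefault(product, set()).add(customer)

def recommend(customer):
    """Two-hop traversal: my products -> their other buyers -> their products."""
    mine = purchased[customer]
    scores = Counter()
    for product in mine:
        for other in purchased_by[product] - {customer}:
            for candidate in purchased[other] - mine:
                scores[candidate] += 1
    return [p for p, _ in scores.most_common()]

print(recommend("alice"))
```

In a relational database this traversal would need repeated self-joins; in a graph database each hop is a constant-time edge lookup, which is why the approach scales to billions of records.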

Is it worth it?

The short answer is yes. Personalisation is a huge driver of positive customer engagement; for example, we routinely see click-through rates of around 25%. At BiG we work closely with the world’s leading equity crowdfunding platform, Crowdcube. Via this and other data-centric hyper-personalisation marketing techniques, we have driven significant increases in the money invested into businesses raising on the Crowdcube platform. In one recent case, the difference between hyper-personalisation and the control was an increase of over 56%; that equates to £8.26 vs £0.22 invested per email recipient. Great news for entrepreneurs seeking to take their businesses to the next level, and also for the investor membership, who have been connected directly with businesses they are keen to support.

About the companies in this article

Crowdcube is the world’s leading equity-based crowdfunding platform; with an investor membership of over 350,000, it has so far raised over £200 million for 500 growing businesses.

OrientDB is the world’s leading (and fastest) 2nd generation NoSQL Distributed Graph database offering a blazingly fast and scalable multi-model document and graph platform; benchmarked at 10× faster than rivals.

BiG Consultancy – Gerry McNicol is co-founder at BiG Consultancy. We are a team of data engineers, analysts and consultants who specialise in helping companies successfully deploy data strategies and technologies. Whether you are dealing with real-time, high-volume streaming data, want to leverage the new wave of machine learning technologies or want to transform your business into a data-driven enterprise, we want to solve your problems. Tell us your data ideas, hopes and dreams. Connect on LinkedIn or via