Tackling common data science challenges
https://peak.ai/hub/blog/tackling-common-data-science-challenges/

Author: Emma Bellamy

By Emma Bellamy on February 4, 2022 – 5 Minute Read

When AI is used to solve real world problems and make decisions for businesses, it can be very powerful.

Yet, as data scientists know, there’s a lot of work that goes into getting it there, and the problems facing businesses aren’t always easily solved. The Data Community has come up against some big challenges this year, and in this blog we discuss these common problems and how best to tackle them.

1. Poor quality data

We’ve all heard the saying “garbage in, garbage out.” In other words, flawed or nonsensical input data produces nonsensical output.

The data that data scientists receive can be incomplete, faulty, inaccurate or scarce. On top of this, data columns or data labels can be inconsistent or unstructured. It’s just plain messy! This makes it hard to achieve high-performing, optimized results from machine learning models.

Dr. Andrew Ng recently highlighted a fundamental skill: “using a data-centric approach to machine learning, rather than a model-centric approach, will consistently improve model performance.” This means improving the data in a more systematic way, from removing missing data to making the data more consistent and ensuring that there is enough historic data for the model. Investing effort in improving the quality of existing data can be as effective as collecting three times as much data.

Data quality has to be monitored and improved at every step, so having alerts for when data is missing or outliers are identified is a key step when productionizing the model. 
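As a rough illustration of the kind of alerting described above, here is a minimal sketch in plain Python. It is not Peak's implementation: the function name `check_quality`, the `units_sold` column, the 5% missing-data threshold and the use of a median-based (modified z-score) outlier rule are all assumptions chosen for the example.

```python
from statistics import median


def check_quality(rows, column, max_missing_share=0.05, outlier_threshold=3.5):
    """Return alert messages for one numeric column (hypothetical helper)."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    alerts = []

    # Alert when too large a share of the values is missing.
    missing_share = 1 - len(present) / len(values) if values else 1.0
    if missing_share > max_missing_share:
        alerts.append(f"{column}: {missing_share:.0%} of values are missing")

    # Flag outliers with a modified z-score based on the median absolute
    # deviation (MAD), which is less distorted by extreme values than the mean.
    if len(present) >= 3:
        med = median(present)
        mad = median([abs(v - med) for v in present])
        if mad > 0:
            outliers = [
                v for v in present
                if 0.6745 * abs(v - med) / mad > outlier_threshold
            ]
            if outliers:
                alerts.append(f"{column}: possible outliers {outliers}")
    return alerts


# Made-up example data: one missing value and one suspicious spike.
rows = [{"units_sold": v} for v in [10, 12, None, 11, 13, 500]]
for alert in check_quality(rows, "units_sold"):
    print(alert)
```

In a production pipeline, checks like these would typically run on every data refresh and feed whatever alerting channel the team already uses.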

For the Peak data science ops team, who work closely with customers, ensuring that an organization has AI-ready data is a key part of initial conversations with prospective customers. Performing quality assurance and analyzing the accuracy of the data is one of the first things that Peak’s data scientists do during those conversations. The same principle applies to any data science project: it’s important to have confidence in the data before starting to build a model. That is equally true for in-house data professionals.

To learn more about how Peak ingests data, and how the Peak Decision Intelligence platform can help data scientists to analyze the data they will be working with, be sure to read my colleague Vanessa’s blog: Kick-starting a data science project with Peak.


2. Difficult stakeholders

Getting new stakeholders excited about AI is often a hurdle for a data scientist or project owner. Some stakeholders are skeptical of the benefits of the tech, or mistrustful of the machine learning algorithms, and would like all parts of a solution to be transparent and explainable. 

Other stakeholders may never have worked on a data science project before, and expect the project management to be similar to that of a typical IT project. The main differences are that data science projects can be pretty unpredictable, and often involve more exploration and iteration than IT projects.

This iterative approach to building data science models helps teams deliver value to their customers quickly, but relies on stakeholders being prepared to work with the data scientists and give feedback on the models. These projects often start with a narrow scope, and a minimum viable product (MVP) will be built on a small subset of a business’s portfolio of products. When the MVP has been tested and feedback received on the decisions that it makes, the data scientist can then quickly iterate and expand the scope of the solution, testing and gathering feedback from the stakeholders at each step.

An all too common problem is a misalignment between the data scientist and the stakeholders on the project goals or outcomes. This may happen when the project proposal is too vague: the data scientist builds the MVP, and it then emerges that what the data scientist expects to deliver and what the stakeholder expects to receive are totally different.

Alternatively, they may agree on the end goal of the project, but the solution build gets held up because stakeholders have different priorities, request many changes, add requirements or expect lots of ad-hoc analysis. A good way to avoid this scenario is to have a tight project scope agreed by everyone before the project begins.

Finally, one of the biggest challenges faced by data scientists is getting buy-in from many different stakeholders within an organization. The original project sponsors may be on board, but other departments can remain reluctant to accept that data science can solve the problems they have been experiencing. Data scientists often have to use their communication skills to influence people who don’t believe in data-driven decision making or who don’t trust the decisions delivered by the solution (often via super cool data visualizations!). When all stakeholders are aligned, the project can really come alive and turn into a reality.

In our blog, ‘Top tips for data scientists dealing with difficult stakeholders’, Peak’s data scientists share their best nuggets that they’ve learned through experience when it comes to managing data projects.


3. Ensuring that model outputs are reliable, accurate and interpretable

One of the main challenges that data scientists across the industry face is the need to deliver outputs that are reliable, accurate and interpretable. If stakeholders begin to lose trust in the solution, the project could be derailed.

After a model has been trained and deployed, the next step requires developing an understanding of the predictive outputs and communicating them back to the relevant project stakeholders. Data scientists must develop tools that allow these results to be interpretable by subject matter experts, who understand the business needs of the project but not necessarily the technical details. One way this can be done is by showing particular example outputs from a model. 

For example, if the model is predicting customer churn then the activities of a few customers can be shown along with their predicted churn score, highlighting the factors that influence the score the most. This communication is crucial, because if it’s not possible to convince others of the business value of the project, then it’s almost certainly destined for failure. 
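The churn example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is a simple logistic model, and the feature names, weights and bias are hypothetical values invented for the example rather than anything fitted to real data. For a linear model like this, each feature's contribution to the score is simply its weight multiplied by its value; more complex models usually need a dedicated attribution method.

```python
import math

# Hypothetical weights of a logistic churn model (illustrative only).
WEIGHTS = {
    "days_since_last_order": 0.04,
    "support_tickets": 0.5,
    "orders_last_90d": -0.6,
}
BIAS = -2.0


def explain_churn(customer, top_n=2):
    """Return the churn score and the features that move it the most."""
    contributions = {name: WEIGHTS[name] * customer[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    score = 1 / (1 + math.exp(-logit))
    # Rank features by the absolute size of their contribution.
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, top[:top_n]


# Two made-up customers to show to stakeholders side by side.
customers = {
    "lapsing customer": {"days_since_last_order": 60, "support_tickets": 3, "orders_last_90d": 0},
    "regular customer": {"days_since_last_order": 5, "support_tickets": 0, "orders_last_90d": 4},
}
for label, customer in customers.items():
    score, top = explain_churn(customer)
    factors = ", ".join(f"{name} ({value:+.1f})" for name, value in top)
    print(f"{label}: churn score {score:.2f}, main factors: {factors}")
```

Showing a handful of contrasting customers in this way lets subject matter experts check whether the factors driving each score match their own intuition about the business.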

Once the stakeholders are happy that they understand the outputs, the focus can then move to deploying the solution so that it produces reliable and accurate outputs, using the most appropriate computing infrastructure.

Model outputs are often displayed in up-to-date dashboards using hosted web apps. These dashboards provide a great way to display the model outputs in an interactive way, providing useful analysis to end users or the project stakeholders.


4. Building trust through partnerships

In the majority of projects, the main stakeholders that data scientists talk to don’t have technical expertise. They haven’t studied for a Master’s in data science, and often this is their first encounter with Decision Intelligence. Explaining both the solution and the outputs to a non-technical audience can be hard. In fact, one of the reasons why the majority of AI projects ultimately fail is a poor relationship between stakeholders and the data scientist. The stakeholders’ vision, guidance and input ultimately play a big part in a project’s success.

At the start of a data science project, it’s important that the data science team and stakeholders get to know each other and reaffirm the shared vision of the project – this is where the project vision starts to become a reality. The next most important stage is to explore the stakeholder’s business in detail and understand exactly how it works in relation to the proposed solution. When data scientists ask probing questions to build up this context, they come to understand the business, how it operates, and the requirements of its end users and stakeholders. This tightens up the definition of the solution and reduces the risk of misalignment on project goals.

For the company’s main stakeholders, the outputs of the data science solution sometimes look very different from the decisions a human would make. Often the data scientist will need to convince the stakeholders that the model is efficient, accurate and adds value to their business.

The project is successful when the end users rely on the decisions output by the model, either because these decisions are reached faster or, in many cases, because they are more cost effective. Therefore, user adoption is the main goal of a data science project.

If there is a lot of skepticism regarding the model, one way to increase trust is to start by replicating what the business currently does, building plenty of business logic and additional constraints into the solution. As the solution matures and the stakeholders come to trust the model, these constraints and some of the business logic can be removed.
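One simple way to picture this is a guardrail that keeps the model's recommendation close to what the business's existing rule would have decided, and is widened as trust grows. The sketch below is only an assumption about how such a constraint might look: the function name, the order-quantity scenario and the deviation limits are hypothetical, and it assumes the current rule produces a positive value.

```python
def constrained_decision(model_value, current_rule_value, max_deviation=None):
    """Clamp the model's recommendation around the current business rule.

    max_deviation is a fraction (0.1 allows +/-10%); None removes the constraint.
    """
    if max_deviation is None:
        return model_value
    lower = current_rule_value * (1 - max_deviation)
    upper = current_rule_value * (1 + max_deviation)
    return min(max(model_value, lower), upper)


# Hypothetical order quantities: the existing rule says 100, the model says 150.
for limit in (0.1, 0.5, None):
    decision = constrained_decision(150, 100, max_deviation=limit)
    print(f"max_deviation={limit}: order {decision:.0f} units")
```

Early on, a tight limit means the solution behaves much like the existing process; relaxing it later lets more of the model's recommendation through.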

A solution that no one uses is useless. This risk can be mitigated by constant guidance and input from the main stakeholders. Data scientists can build trust by starting simple, whether by using a simpler machine learning model, by concentrating efforts on a subset of products, or both. As data scientists and stakeholders together assess this first model, they will understand its shortcomings and develop solutions to overcome them. They can also work together to understand what is missing from the model and codify any business logic or practical considerations that are required. The purely optimal solution produced by the machine learning model isn’t always the best one in practice.

By encouraging open communication and receiving detailed, informative feedback, machine learning model parameters can be tuned and the solution can be iteratively improved to meet the requirements of all stakeholders. 

Peak’s Women in Data Science
https://peak.ai/hub/blog/peaks-women-in-data-science/

Author: Emma Bellamy

By Emma Bellamy on October 5, 2021

Historically, there has always been a lower proportion of women in data science and in the wider Science, Technology, Engineering and Maths (STEM) fields.

To delve deeper into why this might be the case, we wanted to get to know our own team a little better. We recently asked our data scientists to complete a survey so that we could learn more about the backgrounds and career journeys of the women in Peak’s data science team!

How did they get into data science?

All of the data scientists currently at Peak continued their education after high school to university, once they had completed their A-levels or equivalent qualifications. Having graduated with a Bachelor’s degree, 57% of Peak’s female data scientists then continued into postgraduate studies and were awarded a Master’s degree as their highest level of education.

A further 36% completed a doctorate degree, such as a PhD. For further advice on whether you need a PhD to be a data scientist, check out this blog written by some of Peak’s data science management team.

chart showing the education level of Peak's women in data science

The routes into our female data scientists’ careers were diverse, and covered a wide range of different subjects at A-level. The most popular subjects were Mathematics and Further Mathematics, Sciences (including Biology and Chemistry), then English and other Modern Languages. Interestingly, only a small percentage of women had studied Computer Science or Business Studies at A-level – even though these courses cover some of the key skills required for a data science position!

chart showing the most common A-level subjects studied by Peak's female data scientists

This solid background in mathematics continues to give Peak’s female data scientists high levels of confidence in their mathematics and statistics skill sets. It is perhaps a lack of computer science and programming opportunities in their high school years that led to fewer women choosing this field in their undergraduate degree, which continues to be an area with average levels of confidence.

What motivates them as data scientists?

When asked for their preferred machine learning model, algorithm or domain, the female data scientists chose an impressive range, potentially due to the diversity of the Peak platform that we work on and our varied data science backgrounds. The top three were Decision Trees/Random Forest/XGBoost, Optimisation and KNN/K-means Clustering, with 29%, 21% and 14% of the votes respectively.

This highlights the varied technical domains that data scientists can become experts in. As data science is a relatively new and rapidly expanding field, all data scientists have many opportunities to take their careers into a wide variety of different directions.

Chart showing the preferred machine learning model, algorithm or domain of Peak's female data scientists

In order to be able to write machine learning models and algorithms, data scientists use various coding languages. The preferred programming language of the women in the Peak team is a closely fought battle between R and Python. SQL is another important programming language for data scientists at Peak, and is included in the ‘Mixture’ category.

43% of the female data scientists chose R as their favorite programming language, 36% chose Mixture and a minority of 21% chose Python. In contrast, 56% of the male data scientists chose Python as their favorite programming language. A data scientist’s favorite programming language is likely to be a factor in how confident they feel in their programming skills.
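For readers who want to run a similar survey, shares like these can be tallied in a few lines of Python. The response list below is hypothetical: the survey's actual sample size is not stated, so the counts were chosen only so that the rounded shares match the figures quoted above.

```python
from collections import Counter


def shares(responses):
    """Return each answer's share of the votes as a rounded percentage."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: round(100 * n / total) for answer, n in counts.most_common()}


# Hypothetical responses (assumed counts, not the real survey data).
responses = ["R"] * 6 + ["Mixture"] * 5 + ["Python"] * 3
print(shares(responses))  # {'R': 43, 'Mixture': 36, 'Python': 21}
```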

plot showing peak data scientists confidence levels in different languages

From the density plot above, we can see that the levels of confidence of women and men are fairly similar when R is chosen as their favorite language. However, when Python or the Mixture category is chosen, men are likely to be more confident in their programming skills. A question that we can ask ourselves is whether this is a valid reflection of people’s actual skills and capabilities, given their background and experience, or whether women are portraying themselves as less confident or limiting their belief in themselves.

Data scientists need to have many strings to their bow, with programming skills being only one of them. In order to be able to turn a real world problem into a data science problem and successfully deliver a Decision Intelligence solution, data scientists must also possess machine learning, mathematics and statistics skills and an understanding of business and commercial applications.

The overall diversity in a team, including gender and skill set, brings a broad range of ideas to the problems that we solve. Although men may be more confident in computer science and programming, the density plot below shows that women data scientists at Peak may have more confidence in business understanding and commercial applications. All data scientists have a curiosity to understand the customers that they work with so they can help shape the future of their businesses.

chart showing the preferred programming language of Peak's women in data science

The company values that resonate the most with the women in the Peak data science team are collaborative, driven, approachable, open and responsible. These traits help to build effective work connections, increase productivity at work and ensure high-performing teams remain relevant.

Collaboration, drive and openness are also echoed in the responses from male data scientists at Peak. However, the company values that inspire male data scientists the most are smart and curious. Curiosity is clearly resonating with men as it helps to develop new ideas in creative brainstorming sessions and work through complex problem solving, which are all essential skills of a data scientist.

chart showing the company values that resonate most with peak's data scientists

We have seen small differences in the responses from male and female data scientists at Peak. We continue to instill confidence in other women through a variety of ways including mentoring, our new graduate scheme and by reaching out to schools and universities.

However, we also embrace differences within the team, as diversity encourages more diverse thinking and problem solving. At Peak, the shared values of being collaborative, driven and open result in a positive working environment for all data scientists. 

About the author

I work as a data scientist in Peak’s Supply Intelligence team. Our team combines data from across the supply chain to give a unified view of demand to help businesses optimize stock levels or resource planning. I love working on projects that first add value to our customers by extracting valuable insights from their data – and then deliver practical and innovative solutions so that our customers can become more efficient!

I enjoy applying my previous experience in logistics, process improvements and project management to successfully drive business results with Decision Intelligence.

I hope you’ve found some of my research into the background of Peak’s women in data science interesting! Feel free to reach out to me on LinkedIn if you have any questions.
