Seven Ways to Avoid Bias in Your Data

Posted: 05/01/2020 - 05:34

AI is taking off in all areas of business and in our daily lives – from improving agriculture and predicting where forest fires might erupt to determining who is likely to return to a hospital after discharge. With advanced GPUs that can crunch more data faster and growing demand from companies looking to increase competitive advantage, machine learning and other forms of AI are expected to become more pervasive.

Today, many companies are relying on smart apps to provide the insight needed to make decisions that can affect people’s lives, such as who qualifies for a mortgage or who will be insured. Because of this responsibility, it’s more important than ever that data professionals don’t inadvertently automate any biases into the AI algorithm because of the data they use or don’t use, and how they use it.

While AI should be regulated to ensure the fair and ethical use of data, particularly as it impacts decision-making and people’s lives, we still have a long way to go before that happens. In the interim, it’s up to companies to adopt the best anti-bias data practices. It’s not only the right thing to do, but it will also help them avoid missteps that can damage their brand reputation and hurt the customer experience.

Following are seven best practices for avoiding data bias when building an AI solution.

Hire a Diverse Team

Since people from different backgrounds and cultures bring different perspectives, sensitivities and ways of thinking, it’s important to have a team of data scientists that reflects this diversity. The majority of data scientists today are white males; according to Harnham’s Diversity Report for the U.S. Data & Analytics Industry, only about 16% of data scientists are women. Without diversity, there is a greater chance that unintentional racial and gender bias will creep into the data. For example, a study conducted by ProPublica found that algorithms used by Broward County courts to predict repeat offenders incorrectly identified black defendants as higher risk twice as often as white ones. This is particularly problematic since these risk assessments can be used to help determine a defendant’s treatment by the court, such as whether or not he or she is offered probation.

Conduct Diversity and Anti-Bias Training

Even with a diverse team and the best intentions, people may bring unconscious biases and assumptions into the algorithms they develop. Data scientists, data engineers and other team members today need more than technical skills – they need to think about ethics and potential bias. HR or other experts should train these professionals to become aware of and guard against hidden biases, and every organization should establish guidelines and best practices that all professionals must follow on how to recognize and avoid bias.

Use Diverse Data

It’s important to continually consider whether there are inherent biases in the data being used. For example, are mortgages denied based on a person’s ZIP code, which may be biased against an ethnic group or class, regardless of a person’s credit score or ability to pay? Which questions in the risk assessment algorithms used by courts might create bias against blacks and other minorities? When Amazon tried to vet the best talent from resumes, for example, it used the data from 10 years of resumes to train the algorithm. Since the company received many more resumes from men over that timeframe, the AI program “learned,” for example, that the word “women” (as in “women’s organization”) was not desirable and gave those candidates a lower rating. In another case, AI and facial recognition programs have difficulty identifying non-white skin tones, which may be due to the greater availability of facial data for whites, while other groups are underrepresented in the data set.
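One practical way to catch this kind of gap early is to compare the makeup of the training data against a reference population before training begins. The following is a minimal sketch, assuming a pandas DataFrame with a hypothetical gender column and an illustrative 50/50 reference split; the column name and figures are placeholders, not drawn from any real project.

```python
import pandas as pd

def representation_gap(df: pd.DataFrame, column: str, reference: dict) -> pd.DataFrame:
    """Compare each group's share of the training data against a
    reference distribution (e.g., census or customer-base figures)."""
    observed = df[column].value_counts(normalize=True)
    rows = []
    for group, expected_share in reference.items():
        observed_share = float(observed.get(group, 0.0))
        rows.append({
            "group": group,
            "expected_share": expected_share,
            "observed_share": observed_share,
            "gap": observed_share - expected_share,
        })
    return pd.DataFrame(rows)

# Hypothetical usage: resumes labeled with self-reported gender.
resumes = pd.DataFrame({"gender": ["male"] * 800 + ["female"] * 200})
print(representation_gap(resumes, "gender", {"male": 0.5, "female": 0.5}))
```

A report like this won’t fix a skewed data set on its own, but it flags underrepresented groups before the model has a chance to “learn” from the imbalance.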

Be Transparent

It’s critical that companies are transparent about algorithms that impact people’s lives. They should disclose the type of data used to train the algorithm and the criteria used to make decisions, such as who gets a mortgage or how a court’s risk assessment is determined. Then, these companies need to carefully consider the feedback they receive from the public and adjust their algorithms accordingly.

Hire Tech Linguists to Develop Appropriate Conversations

The development of conversational AI apps for chatbots adds another level of responsibility for organizations. Companies must ensure that they are communicating with their customers appropriately and with respect. They should bring in tech linguists to make sure the chatbots give natural, appropriate responses and can follow the flow of a conversation clearly.

Develop Chatbot Conversations in the Language in Which They Will Be Used

A chatbot needs to understand cultural norms, idioms and dialect – as well as accents, for call centers and other telephone applications – in order to communicate clearly and accurately and avoid misunderstandings. As an example of what can go wrong, Jimmy Carter’s interpreter famously mistranslated his remarks to the Polish people as stating that he had sexual desire for them, when he was actually talking about the Polish people’s “desires for the future.”

Continuously Test and Monitor the Algorithm

It’s critical to monitor your algorithm continually, not just to improve results, but also to make sure new data is not bringing new biases into the application. For example, bad actors on Twitter fed a Coke bot offensive content, and it ended up posting Nazi propaganda before the company recognized the problem and took it down. Conduct ongoing testing against diverse audiences. For chatbots, analyze and report on the questions they couldn’t answer, and solicit ongoing feedback and recommendations from users.
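As part of that ongoing testing, teams can track simple fairness metrics on each fresh batch of data – for instance, whether false positive rates diverge across groups, the kind of disparity ProPublica reported. Below is a minimal sketch under the assumption that you have arrays of true labels, model predictions and a group attribute for each record; the alert threshold and sample values are purely illustrative.

```python
import numpy as np

def false_positive_rates(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> dict:
    """Compute the false positive rate separately for each group."""
    rates = {}
    for group in np.unique(groups):
        mask = groups == group
        negatives = y_true[mask] == 0
        if negatives.sum() == 0:
            continue  # no negative examples for this group in this batch
        false_positives = (y_pred[mask] == 1) & negatives
        rates[str(group)] = false_positives.sum() / negatives.sum()
    return rates

# Hypothetical monitoring check on a freshly scored batch of records.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])
groups = np.array(["a", "a", "a", "b", "b", "b", "b", "b"])

rates = false_positive_rates(y_true, y_pred, groups)
if max(rates.values()) - min(rates.values()) > 0.1:  # illustrative alert threshold
    print("Warning: false positive rates diverge across groups:", rates)
```

Running a check like this on a schedule, rather than once at launch, is what catches the biases that new data introduces over time.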

As AI becomes more pervasive across organizations and in our everyday lives, data professionals must be vigilant against bias. In addition to being unfair and unethical, bias can put an organization out of compliance with regulations and expose it to fines and penalties. And in the court of public opinion, as companies rely more heavily on these apps as their public face, unchecked bias can damage a company’s reputation beyond repair.


About The Author


Carlos M. Meléndez is the COO and Co-Founder of Wovenware, an artificial intelligence and software development company based in San Juan, Puerto Rico. Mr. Meléndez has a bachelor’s degree in Electrical Engineering and a Juris Doctor, both from the University of Puerto Rico. He is also the Vice Chairman of the Board of ConPRmetidos, a non-profit organization that connects people to foster commitment to the personal, social and economic development of Puerto Rican communities wherever they are.