Advanced Analytics: Transforming Advertising Regulation
A three-day employee hackathon event a few years ago resulted in an idea that has since transformed the way FINRA’s Advertising Regulation group does its work.
On this episode, we are joined by Amy Sochard, Vice President of FINRA’s Advertising Regulation program, and Lisa Fair and Saugata Chakraborty of FINRA’s Technology, to hear about the impressive collaborative effort to build a tool to apply advanced analytics and machine learning to the review of firm advertising and public communications materials.
Listen and subscribe to our podcast on Apple Podcasts, Google Play, Spotify or wherever you listen to your podcasts. Below is a transcript of the episode. Transcripts are generated using a combination of speech recognition software and human editors and may contain errors. Please check the corresponding audio before quoting in print.
00:00 – 00:26
Kaitlyn Kiernan: A three-day employee hackathon event a few years ago resulted in an idea that has since transformed the way FINRA's Advertising Regulation Group does its work.
On this episode, we hear from the head of FINRA's Ad Reg program and two members of FINRA Technology's team about the impressive collaborative effort to build a tool to apply advanced analytics and machine learning to the review of firm advertising and public communications materials.
00:26 – 00:37
00:37 – 00:56
Kaitlyn Kiernan: Welcome to FINRA Unscripted. I'm your host, Kaitlyn Kiernan. Today, I'm pleased to welcome back to the show Amy Sochard, vice president of FINRA's Advertising Regulation team. And we've got two new guests as well from FINRA Technology. We have Saugata Chakraborty and Lisa Fair. Welcome to the show.
00:57 – 00:58
Saugata Chakraborty: Thanks Kaitlyn, great to be here.
00:58 - 01:17
Kaitlyn Kiernan: On Episode 53 last year, Amy introduced us to FINRA's Advertising Regulation team. We encourage listeners to check that episode out for more information if you haven't listened to it already.
Amy, just at a high level to kick us off, can you refresh us on what FINRA's Advertising Regulation Group does?
01:17 - 01:53
Amy Sochard: Sure. Happy to oblige. So, the Advertising Regulation Department helps protect investors by helping ensure that broker-dealers' communications with the public, including their websites, their social media, their print advertising, any of those communications are fair, balanced and not misleading. And we do that with a couple of regulatory programs. One is where firms submit material to our staff and we review those communications and we provide the firms with written analyses of those communications as to whether they comply or if there are things that need to be changed to bring them into compliance.
01:54 - 02:02
Kaitlyn Kiernan: And Saugata and Lisa can you kick us off with telling us what FINRA means when we use the phrase advanced analytics?
02:02 - 03:03
Saugata Chakraborty: We are all familiar with the conventional analytics approaches such as business intelligence and pattern recognition. But advanced analytics adds much more to this equation of technologies. It's the use of very sophisticated mathematical techniques to analyze data and get deeper insights into it that are not possible using traditional approaches. We are familiar with many of these techniques of advanced analytics in our daily lives, such as facial recognition, image recognition, speech recognition. Now we have also branched out into natural language processing, where we review documents and we render judgments or dispositions on them just like a human being would. Speech recognition, we find it when we call any customer support center, or when we find a chatbot that understands how we speak and what we mean, the context, and then can answer accordingly. And FINRA is increasingly making use of these advanced techniques to improve our regulatory capabilities and serve investors better.
03:03 - 03:29
Lisa Fair: Advanced analytics really allows us to do things that wouldn't be possible with the manpower that we have at hand. So, it takes that brain of the advertising regulatory analyst and supercharges it so that we can perform these tasks at scale. And that's really what we're looking at when we receive hundreds of thousands of documents a year in Advertising Regulation.
03:30 - 03:39
Kaitlyn Kiernan: Amy, what's the genesis of this advanced analytics project with Advertising Regulation? When did Ad Reg and Technology begin this work together?
03:39 - 04:30
Amy Sochard: So, this project came out of something that FINRA calls the Createathon, which we host every year for all of our staff. And it's an opportunity for several days for staff to get together and work on some really tough technological and also regulatory problems together in mixed teams that might involve folks from different departments, including Technology and Advertising. So, it's one of the ways FINRA really supports an innovative workplace.
And in 2018, during the Createathon, a couple of my staff got together with some of the Technology folks and came up with this idea of using an advanced analytics tool that would help us triage and evaluate which were the most risky communications that were coming into our filings program even before a human analyst had a chance to look at them.
04:31 - 04:52
Kaitlyn Kiernan: And we did do a podcast on the Createathon, so we encourage listeners to check out Episode 43 for more information on that overall program. But it is great to hear about actual projects coming out of that and being implemented. And Saugata, what makes the data so useful for this type of work with the advanced analytics team?
04:52 - 06:06
Saugata Chakraborty: Actually, a very good point, because 80 percent of the time spent on any analytics project goes to making the data ready to apply the analytics algorithms to it. This project uses machine learning, and the big difference between conventional analytics and machine learning is the way we develop the tool. In conventional analytics we have to explicitly tell the tool what to do with the inputs that we provide it. With machine learning, on the other hand, it learns by example. So, we provide historical data and it learns from the data and then applies that knowledge to incoming data.
Especially in the case of advertising, it categorizes the incoming new advertising filing into some categories that are defined by the analyst. The way it does it, it looks at how the historical data has been categorized and then applies that knowledge to the new incoming file. So, for that, we need a couple of things. First is, we need the data already categorized historically. Second, we need large amounts of data. That's a characteristic of machine learning. We are fortunate that we have about five years of data well categorized, high quality, clean data for the machine to learn from, and that's absolutely critical to the success of this program.
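As an illustration of "learning by example," here is a minimal sketch of a text classifier that learns word frequencies from already categorized filings and applies them to a new one. It is a small naive Bayes model, chosen only for brevity; it is not the tool described in this episode. The sample texts, the `high_risk` and `low_risk` labels, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count words per label from (text, label) pairs of historical filings."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, hand-made "historical" data for illustration only.
HISTORY = [
    ("guaranteed returns with no risk", "high_risk"),
    ("double your money guaranteed", "high_risk"),
    ("past performance does not guarantee future results", "low_risk"),
    ("fund fact sheet with quarterly performance", "low_risk"),
]
model = train(HISTORY)
label = predict("guaranteed money with no risk", *model)
```

Nothing in `predict` encodes a rule about any particular word; the behavior comes entirely from the labeled examples, which is why the quantity and quality of historical data matter so much.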
06:06 - 07:28
Lisa Fair: We took this idea from the Createathon and started working on it in earnest. We put together this risk profile that Saugata was just talking about based on historical data, and we had a small group of analysts that we piloted this information with so they could go in and see how the tool was doing, and then they would give us feedback along the way. This worked or this didn't work. Maybe it is reading things out of order. So, we made sure that we adjusted our technologies, our tools, so that the text was read in a specific order because that was important to the meaning and understanding of the document.
And then we started thinking about other ways that we could use this tool. And we worked on expanding the use of this pattern analysis for similarity, so now when analysts are looking at an incoming document, they can go and see, oh there are five other documents that others have reviewed that I can take a look at and see how we reviewed them and help provide that consistency. So, this one idea has actually launched a few different things that we can do within the department to help us better understand what's coming into that inbox and more quickly adjudicate riskier things.
07:29 - 07:44
Kaitlyn Kiernan: So it sounds like the Createathon, it's just a three-day event, so you have an idea that comes out of it, but when you guys were actually working on implementing the idea and developing the project, that's where you fully fleshed out what the tool could do.
07:45 - 08:02
Lisa Fair: Absolutely. And the pilot program was super helpful in that respect because we were getting real time information back from the analysts as they were reviewing cases. And that allowed us to do that fine tuning before we released it out to the rest of the department.
08:02 - 08:12
Kaitlyn Kiernan: So, Amy, tell me about this final tool that your team is now currently using. What does it do and what is it looking to accomplish for your team?
08:13 - 09:05
Amy Sochard: So one of the things that we always want to make sure is that we're looking at the most problematic communications that may have been filed with us as quickly as we possibly can, because that's part of our investor protection mission and it's also a help to the firms to get to them as quickly as possible about communications that may need adjustment or may frankly not comply with the rules so that they can be pulled from the marketplace and protect investors that way.
So the tool identifies for us the communications that appear to have the highest risk, and that enables us in turn to focus on those communications early and where we need to bring in, say, more experienced resources, maybe more senior staff to review them. We can also bring those resources to bear in order to process those effectively. And that's kind of the core of the technology.
09:05 - 09:14
Kaitlyn Kiernan: And so, what is it looking at to determine what makes the piece of communication that's coming in more risky and making that determination for the team?
09:14 - 10:01
Amy Sochard: So, I'm going to take it from the business side of it and then I will give it over to my technology colleagues to talk about technologically how that happens. But essentially it is analyzing the text to look for relationships in the text and words that are used and the way they're used in relationship to one another based on the training it's received from our previous reviews to say, hey, this configuration is more risky and to flag it for the analysts to review.
One thing I want to be really clear about is that the tool doesn't replace the analysis that the staff does, that the staff are in fact analyzing all the communications and writing the comments that we provide to firms. This is simply a tool to help us decide which ones we want to look at earliest in our workflow.
10:02 - 11:19
Saugata Chakraborty: The way it works from a technical point of view is, as Amy mentioned, it learns from what's already been done in the past and analyzes those documents and creates kind of a map of the different words and their context and their proximity as to what constitutes a risky document versus what's less risky.
So, when a new document comes in, it extracts the words and relevant information from the document and creates a map for that document. Then it matches those two maps together, the closer the fit is to a risky map, the higher the probability the document will be risky. The further away it is from those two maps overlapping, the less risky it is.
So that's one of the big differences between advanced analytics and machine learning, and conventional analytics, because there's an amount of probability involved in the advanced and machine learning aspect of it. It says if the document is 99 percent risky with a 95 percent confidence, which is different from conventional analytics where it's one equation connecting inputs and outputs so it's always the same answer over and over again.
This is more probabilistic, which is kind of how the human mind works, there's an amount of judgment in it, there's an amount of context, there's an amount of interpretation. And that's what it is trying to learn, how does a human being, an Advertising Regulation analyst, interpret those documents and apply that to a new incoming document?
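The "map matching" idea can be sketched in a few lines, under heavy simplification. Here a "map" is just a bag of word counts, and the score compares how closely a new document resembles the combined map of previously risky documents versus previously non-risky ones. The tool discussed here is built on a neural network, so this word-count version is only a stand-in, the score is a relative measure rather than a calibrated probability, and all names and sample texts are hypothetical.

```python
import math
from collections import Counter

def word_map(text):
    """A very crude 'map' of a document: its word counts."""
    return Counter(text.lower().split())

def combined_map(texts):
    """Merge the maps of several historical documents into one."""
    total = Counter()
    for text in texts:
        total.update(word_map(text))
    return total

def cosine(a, b):
    """Cosine similarity between two word-count maps (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def risk_score(text, risky_map, safe_map):
    """Relative closeness to the risky map, between 0.0 and 1.0."""
    doc = word_map(text)
    r = cosine(doc, risky_map)
    s = cosine(doc, safe_map)
    # With no overlap at all, fall back to a neutral score.
    return r / (r + s) if (r + s) else 0.5

# Hypothetical historical documents.
risky_map = combined_map(["guaranteed returns no risk", "guaranteed profit"])
safe_map = combined_map(["investing involves risk including loss of principal"])

score = risk_score("guaranteed risk", risky_map, safe_map)
```

A document that overlaps only with the risky map scores 1.0, one that overlaps only with the non-risky map scores 0.0, and mixed documents land in between, which mirrors the "closer fit, higher probability" description.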
11:20 - 11:39
Kaitlyn Kiernan: Seems like the machine learning aspect of it might even be able to pick up on patterns that an analyst would have a hard time vocalizing. But when the computer is comparing multiple documents that in the past the analyst determined there were issues with it, the machine can see, oh, there are patterns across these documents.
11:39 - 12:30
Saugata Chakraborty: That's absolutely right. That's one of the powers of advanced analytics, where it can bring forward hidden insights because an analyst who is performing this task has been trained over years, had experience looking at various different patterns of documents and that knowledge is subconsciously embedded. And when they're applying it, it's not explicitly just by experience, but how they're doing it. When the machine is trying to analyze it, it kind of can extract those insights.
And as Lisa and Amy said, we always go back to the analyst to make sure that the insight the machine is providing is number one correct? Number two is it providing for the right reasons? And it does look reasonable over time, because ultimately our aim is to make the task easier so that they can delve into the higher-level analysis and get the tedious task taken care of by this machine learning.
12:31 - 13:23
Lisa Fair: And I can give an example that illustrates how this works in practice. One of the rules that we have is with respect to proximity. So, if you have a statement that is potentially risky, there is a rule that says that the risk disclosure needs to be close in proximity to that statement. So, when we look at these documents that come in and there is a risk disclosure included, we have found that the model will flag as potentially risky, documents where that proximity is further apart. In the map that we've created for that document, you see that they're further apart. And when we see that they're further apart in that map, that looks like other risky documents. And that's how we're able to flag this as a potential concern for one of the analysts to take a look at.
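To make the proximity example concrete, the sketch below measures how many words separate a potentially risky term from the nearest disclosure term. In the tool described here the model picks up this pattern from training data rather than from an explicit rule, so this hand-written feature is only an illustration; the function name and the sample terms are hypothetical.

```python
import re

def min_distance(text, claim_term, disclosure_term):
    """Smallest gap, in words, between a claim term and a disclosure term.

    Returns None when either term is absent from the text.
    """
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    claims = [i for i, tok in enumerate(tokens) if tok == claim_term]
    disclosures = [i for i, tok in enumerate(tokens) if tok == disclosure_term]
    if not claims or not disclosures:
        return None
    return min(abs(c - d) for c in claims for d in disclosures)

near = min_distance("High yield fund. Risk of loss applies.", "yield", "risk")
far = min_distance("yield " + "filler " * 50 + "risk", "yield", "risk")
```

A large gap, as in the second example, is the kind of signal that would make a document's map resemble other documents where the disclosure sat too far from the claim.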
13:24 - 13:38
Kaitlyn Kiernan: Lisa, you mentioned earlier that in addition to the tool providing an estimate of the overall risk profile of a filing, it also provides the analysts with similar filings that have been addressed in the past. Can you tell us a little bit more about how that works?
13:38 - 15:11
Lisa Fair: When this idea first came to us, it was one sentence: Find similar documents that have been previously reviewed. And we started asking questions about what similarity meant because our developers are technologists, we're thinking that the document would look exactly alike and that meant similar. But when we started talking to the analysts, it wasn't really about exact matches. It was about being able to find similar issues so that when the analyst was reviewing that document, it could be consistent with how other analysts had treated that same issue.
This pattern or this map that we're creating for the risk assessment is the same tool that can help us identify these similar documents. And we were able to pilot that with our user group and say, hey, we're able to surface these similar documents. Is this helpful to you? And we've gotten good feedback from that tool.
And we found that there was another idea related to this where we find the most recent document, and that's a slightly different tool with a twist on it, where we go and find fund fact sheets that are repeatedly filed with updates on performance indicators or whatever may have changed with respect to that firm. This allows us to surface the last one that was submitted so that they can do that side by side comparison, see where the differences are and identify if there are any concerns with those changes.
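The "most recent document" comparison can be sketched with Python's standard `difflib` module: find the latest earlier filing for the same fund, then list the lines that changed. The record layout (`fund_id`, `filed`, `text`), the ISO date strings, and the sample fact sheets are assumptions made for this illustration.

```python
import difflib

def latest_prior(filings, fund_id, before_date):
    """Most recent filing for fund_id filed before before_date, or None.

    Dates are ISO strings (YYYY-MM-DD), so string comparison orders them.
    """
    prior = [f for f in filings if f["fund_id"] == fund_id and f["filed"] < before_date]
    return max(prior, key=lambda f: f["filed"]) if prior else None

def changed_lines(old_text, new_text):
    """Lines removed ('- ') or added ('+ ') between two versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]

# Hypothetical filing history.
FILINGS = [
    {"fund_id": "A", "filed": "2020-03-31", "text": "Fund A\n1-year return: 4%"},
    {"fund_id": "A", "filed": "2020-06-30", "text": "Fund A\n1-year return: 5%"},
    {"fund_id": "B", "filed": "2020-06-30", "text": "Fund B\n1-year return: 2%"},
]
new_text = "Fund A\n1-year return: 7%"

previous = latest_prior(FILINGS, "A", "2020-09-30")
changes = changed_lines(previous["text"], new_text)
```

Only the performance line shows up in `changes`, so a reviewer can go straight to what is different from the last submission.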
15:11 - 15:14
Kaitlyn Kiernan: And Amy, how does that help benefit your team in practice?
15:15 - 16:14
Amy Sochard: So, one of the things that we always strive for and is something that I think the firms look to us for is consistency in the way we apply the rules. And our database, now, is at about, I think, over half a million records of different filings and there's a lot to digest. As an individual human being, you really can't review half a million records every time a new piece comes in.
So, by having this tool, it really helps the staff look across the entire database and get the pieces that are most relevant to their work to see - here's something similar. How did we review this? What were the issues that were presented? And then to analyze the communication with that information in hand. Of course, this gets its own separate review by the analysts, but this just makes it that much more certain that we're not wasting time and that we're not overlooking those needles in the haystack that could trip us up in the past.
16:15 - 16:21
Kaitlyn Kiernan: It sounds like there's a lot of great functionality, but what are some of the current limitations of the tool?
16:21 - 17:05
Amy Sochard: So, I can start on that and then I'll defer to my tech colleagues, but the biggest thing that it doesn't do right now, it is a very text, natural language processing-oriented tool. We want to give it the eyeglasses, if you will, so that it can look at graphics and tables and charts, which are something that you see in many, many financial services communications. And even more importantly, we want it to be able to analyze video, which is a growing medium for the industry, particularly during the pandemic. There's a lot more video communications being created and filed with us and also audio, such as this podcast, also very popular means of communication. So we want the tool to be able to look at those things as well.
17:06 - 18:05
Saugata Chakraborty: Another challenge, I would say, stems from the very nature of advanced analytics and especially neural networks, which is what this tool is built upon. The very power of neural networks comes from being able to behave like the human brain's neurons where there are layers of layers of neurons each processing data and finally coming up with an output. That's very close to what a human being does. But because it's so complex, it's very difficult to explain why you came up with that result.
And one of the common things in regulation is if we say a filing is risky, a common question to ask, why is it risky? Which words or which phrases or which combination, what sentiment, contributes to that? And that's very hard to do with a neural network. It is an area of academic research, and the team is currently engaged in looking at various techniques to help solve that kind of problem. Because it would help us not only make the tool better, but also help the analyst in understanding and interpreting the results of the analysis.
18:06 - 18:21
Kaitlyn Kiernan: Well, it sounds like the FINRA technology staff and Ad Reg have worked tremendously closely together on this. So how did the technology team get such great business knowledge and how have the two teams worked together to develop these capabilities?
18:22 - 19:14
Lisa Fair: We leveraged design thinking techniques to bring the two teams together. It's a method for looking at problems and making sure that we all understand the problem we're trying to solve before we start developing solutions. And particularly in technology, I think that's a blind spot for us sometimes. We want to start talking about how to solve things and how to build cool new tools. But we need to make sure that the things that we're building are really specific to and supportive of the analysts who are doing work and bringing everyone together in the room, it's a commitment from the business to set aside that time and make that a priority. And from Technology to really take the time to listen and make sure that we understand the business needs before we start moving forward with building the technology.
19:15 - 20:13
Saugata Chakraborty: Sometimes in technology we come in with blinders. We have a tool and we want to use it. But as I said, we are using a neural network and we're trying to replicate the human brain. So, it's very, very important to understand how the brains in this case, the analyst, how they think. And the design thinking workshops that Lisa mentioned, that approach, it really helps us clarify the thought process of an analyst, as I said they are highly experienced and they do things subconsciously and sometimes explicitly, just by asking questions, that we can translate into the neural network. So having an open forum where we can all discuss and we come about the insights which otherwise we wouldn't have had, and that has tremendously helped us expand our knowledge base on the Advertising Regulation processes and how we can best help automate them using these advanced technologies.
20:14 - 20:30
Kaitlyn Kiernan: It seems like part of that is making sure that when you're talking about things, you're speaking the same language too. In terms of like Lisa, you mentioned similarity meant different things to the technology team versus the Ad Reg team. And it took discussing what do we mean by similar?
20:30 - 20:51
Lisa Fair: Yes. And honestly, just having the conversation where everyone realized that we weren't speaking the same language was the first step in that process. And I think that was helpful as well to get people around the table and talking more about what we wanted to do with the tools that we could build.
20:52 - 21:07
Kaitlyn Kiernan: And Amy, you've mentioned this doesn't replace any of the analytical work that your team does. So how does it fit in? And do you ever see this tool being able to write the letter to firms, or do you think that analysts will always have a role?
21:07 - 22:13
Amy Sochard: So, I think analysts are always going to have a role. I feel very strongly that advanced analytics is a way to help expand our reach and our ability to do our job, which is very important to the investing public. And what I see is this kind of tool enabling us to look at bigger data sets of communications and to help firms ensure that those communications comply with the rules and that the analysts can focus on developing the analyses, writing the comments and working with the firms to ensure that they understand the rules and apply them correctly. So, I don't see the tool as a replacement at all. I think what I see it doing is really, as I said, expanding our reach and ability to do our jobs that much better.
And I want to say I just have the greatest respect for my Technology colleagues on this. I think we've really seen an incredible spirit of collaboration, and it's a tribute to their ability to listen. And our team is very committed to continuing the collaborative work as we go forward.
22:13 - 22:29
Kaitlyn Kiernan: And I think maybe the only industry that evolves and changes faster than technology might be advertising and marketing. Amy, how does this technology help the team keep up with changes and trends in that space?
22:29 - 23:24
Amy Sochard: Well I think, as I mentioned before, we've seen, and I think everyone's seen a tremendous growth in video and audio communications with the public. And one of the things that I've heard from industry colleagues as well as my own staff, is that when you have audio or video, you have to review that in real time. And it's not quite as fast as if you're a very speedy reader. You really have to listen to the whole thing or view the whole thing and really understand what you're seeing.
So, having a tool that will enable us to parse out which are the most risky videos and audio communications would help us keep pace with one of the biggest innovations that's going on in the way people communicate. And I venture to say in some of the other types of communications that are becoming ever more popular these days, such as all the texting and the social media as well. So, I think that this technology is going to prove vital for FINRA to keep pace with what's going on.
23:25 - 23:31
Kaitlyn Kiernan: Anything else that you think is next up for this technology and how you're going to look to continue to improve these tools?
23:31 - 25:13
Saugata Chakraborty: As you mentioned, Kaitlyn, technology is something that changes very, very fast. Especially in this kind of machine learning. There's a tremendous amount of interest and investment being applied to it in the industry. And what's current two weeks ago is obsolete a week from now. And the same thing applies to us. We have been continuously improving the toolset by updating our algorithms, as I say, bringing in new techniques which provide better insights.
But we also want to apply other techniques for detecting, for example, data drift. Which is the tool has been trained on a certain data set to learn from those examples. If incoming filings start changing in nature, then the tool's accuracy will drop. So, we want to be able to detect that early and make changes early to the tool. We also want to ultimately make the tool self-learning so that we don't have to train it separately, it learns as it goes along. These are some of the good goals that we have. The technology is improving to get us to that point.
And as Amy mentioned, increased coverage of the data that we have today. A limitation of it is it can only detect what it has been trained for. For example, when COVID-19 came up there were advertising filings coming in with COVID-19. The tool couldn't detect it because it hadn't been trained for it. Now, there are new techniques that will flag some anomalies, even though the tool may not know what it is. It might say you might want to look at this area. So emerging issues, things that are on the watch list, those kinds of things we want to incorporate into the toolset so that ultimately the analyst is relieved of the burden of looking at all the information presented at their fingertips from all different sources within and outside the company.
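One very simple way to picture data drift is to track how much of the incoming text uses words the model never saw in training. The sketch below does only that; production drift detection typically relies on richer statistics, and the 0.3 threshold, the function names, and the sample texts are all assumptions made for illustration.

```python
def vocabulary(texts):
    """Set of all words seen in the training texts."""
    return {word for text in texts for word in text.lower().split()}

def unseen_rate(texts, known_vocab):
    """Fraction of incoming words that were never seen in training."""
    tokens = [word for text in texts for word in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(1 for word in tokens if word not in known_vocab) / len(tokens)

def drift_detected(texts, known_vocab, threshold=0.3):
    """Flag a batch when the unseen-word rate exceeds the threshold."""
    return unseen_rate(texts, known_vocab) > threshold

# Hypothetical training texts and incoming batches.
known = vocabulary(["fund performance update", "retirement savings plan"])
familiar_batch = ["fund performance update"]
novel_batch = ["covid-19 relief fund"]

rate = unseen_rate(novel_batch, known)
```

A batch mentioning a topic absent from the training data, as in the COVID-19 example, produces a high unseen-word rate and would be surfaced for a closer look even though the model has no category for it.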
25:14 - 25:51
Lisa Fair: So right now, this tool is giving a piece of information and the analysts are needing to look at several different things in order to determine what they need to look at next and what may potentially be a violation or not a violation. So one of the things I'd like to see is that we provide a more consolidated or a quick view that takes all of this information and gives the analyst an easier picture to respond to so that the tool isn't sitting off to the side, that it's integrated into that stop light, so to speak, for an incoming piece.
25:54 - 26:26
Amy Sochard: So just another thing that we're looking at with the use of the advanced analytics for this tool, as Lisa alluded, it is going out to public data sources and pulling in the information that the analyst now may have to manually research and to put that at their fingertips. And as she said, a consolidated view again, that would just streamline the work process and ensure the computer can check the facts and the analysts can draw the conclusions. And I think that would just speed things along that much.
26:28 - 27:08
Kaitlyn Kiernan: It sounds like it's a very exciting time for FINRA's Advertising Regulation group, with this new technology. I really appreciated the opportunity to hear more about it today. So, Amy, Lisa, Saugata, thank you so much for joining us today. I think that we will have to have an update in the future as this tool continues to evolve and look at video and audio. But for our listeners, if you don't already, make sure you subscribe to FINRA Unscripted on Apple Podcasts, Overcast, Pandora or wherever you listen to podcasts. If you have any ideas for future episodes, you can send us an email at [email protected]. Til next time.
27:08 – 27:16
27:16 - 27:39
Disclaimer: Please note FINRA podcasts are the sole property of FINRA and the information provided is for informational and educational purposes only. The content of the podcast does not constitute any rule amendment or interpretation to such rules. Compliance with any recommended conduct presented does not mean that a firm or person has complied with the full extent of their obligations under FINRA rules, the rules of any other SRO or securities laws. This podcast is provided as is. FINRA and its affiliates are not responsible for any human or mechanical errors or omissions. Parties may not reproduce these podcasts in any form without the express written consent of FINRA.
27:39 – 27:49
Music Fades Out