13 Awesome Ideas from our Createathon

September 22, 2016

This past August, over 250 people at FINRA spent two days finding data-driven solutions to the organization’s challenges. With more than 45 teams taking part, the hackathon produced some amazing ideas. Below are 13 of the most awesome.

1. Automating Processes with Machine Learning: NG Sonar

The NG Sonar team realized that our data models rely on manual processes for testing, implementation, and improvement. Instead of a manual process, they showed how machine learning could be used to set up feedback loops that continuously evaluate a model’s effectiveness.

To demonstrate this, the team copied a model currently in production. Integrating R with Java, they used logistic regression to build a new model, then ran the two side by side against historical data. Finally, they generated reports comparing the results, letting teams choose the model that best fits their needs.
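A champion/challenger comparison like the one NG Sonar ran can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the team’s R/Java code. `production_model` is a hypothetical fixed rule standing in for the copied production model. The challenger is a logistic regression fit with plain batch gradient descent. `HISTORY` is made-up toy data. Both models score the same historical records, and the report is simply accuracy per model.

```python
import math

# Hypothetical historical records: (feature vector, observed outcome 0/1).
HISTORY = [
    ([0.2, 1.0], 0), ([0.4, 0.8], 0), ([0.9, 0.1], 1), ([0.7, 0.3], 1),
    ([0.3, 0.9], 0), ([0.8, 0.2], 1), ([0.6, 0.5], 1), ([0.1, 0.7], 0),
]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, lr=0.5, epochs=2000):
    """Fit logistic regression weights (bias stored last) by batch gradient descent."""
    w = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def production_model(x):
    # Hypothetical stand-in for the copied production model: a fixed threshold rule.
    return 1 if x[0] > 0.75 else 0

def make_challenger(w):
    return lambda x: 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) >= 0.5 else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def compare(models, rows):
    """Run every model on the same historical rows; return {name: accuracy}."""
    return {name: accuracy(model, rows) for name, model in models.items()}

if __name__ == "__main__":
    challenger = make_challenger(fit_logistic(HISTORY))
    report = compare({"production": production_model, "challenger": challenger}, HISTORY)
    for name, acc in report.items():
        print(f"{name:>10}: accuracy {acc:.2f}")
```

In practice a comparison report would probably include more than accuracy, such as precision, recall, or a cost-weighted error, since the right metric depends on what each team needs from the model.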

2. A Virtual Assistant: A-Team

With the rise of technology like Alexa, Siri, and Cortana, the A-Team thought a virtual personal assistant could help business users locate scattered, hard-to-find data. They created a framework that lets other developers add a conversation-like interaction model to their own applications.

To build it, the team used Alexa hardware, the Alexa Skills Kit, AWS Lambda, and Apache Solr. Not wanting to rely solely on the Amazon Echo, they designed the framework so that other speech recognition services or text input could be substituted. Their prototype integrated with Request Manager, an internal application for requesting documents from member brokerage firms. It could also retrieve data from a time tracking app and look up investment advisor firm information. The A-Team plans to add machine learning capabilities that will allow the assistant to offer suggestions based on the current context and historical data.
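Keeping the framework independent of the Echo amounts to separating input adapters from a shared intent dispatcher. The sketch below assumes that structure. The intent names (`FindFirmIntent`, `HoursLoggedIntent`), the handlers, and the keyword matching in `handle_text` are all hypothetical. The Alexa adapter reads the standard Alexa Skills Kit request JSON (`request.intent.name` and `slots`) and returns the standard response envelope with `outputSpeech`.

```python
from typing import Callable, Dict

# Hypothetical intent handlers; a real prototype would query Solr or internal apps.
def find_firm(slots: Dict[str, str]) -> str:
    firm = slots.get("Firm", "that firm")
    return f"Looking up investment advisor information for {firm}."

def hours_logged(slots: Dict[str, str]) -> str:
    return f"Checking your logged hours for {slots.get('Period', 'this week')}."

INTENTS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "FindFirmIntent": find_firm,
    "HoursLoggedIntent": hours_logged,
}

def dispatch(intent: str, slots: Dict[str, str]) -> str:
    """Channel-agnostic core: every front end resolves input to (intent, slots)."""
    handler = INTENTS.get(intent)
    return handler(slots) if handler else "Sorry, I can't help with that yet."

def handle_alexa(event: dict) -> dict:
    """Adapter for an Alexa Skills Kit request (JSON already parsed into a dict)."""
    request = event["request"]
    if request.get("type") != "IntentRequest":
        text = "What would you like to find?"  # e.g. a LaunchRequest
    else:
        intent = request["intent"]
        slots = {name: s.get("value", "") for name, s in intent.get("slots", {}).items()}
        text = dispatch(intent["name"], slots)
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

def handle_text(utterance: str) -> str:
    """Adapter for typed input; naive keyword matching stands in for real NLU."""
    lowered = utterance.lower()
    if "hours" in lowered:
        return dispatch("HoursLoggedIntent", {})
    if "firm" in lowered:
        return dispatch("FindFirmIntent", {"Firm": utterance.rsplit(" ", 1)[-1]})
    return dispatch("Unknown", {})
```

Supporting another speech service would mean writing one more adapter that resolves its input to an intent name and slots. The handlers themselves would stay untouched.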

3. Predicting Risk: Fraud Busters

Recently, our Enforcement Department asked whether qualification exam data could improve risk assessment of registered representatives (RRs). The Fraud Busters team took this idea and combined a wealth of data from multiple departments, including Registration and Disclosure, Member Regulation, and Market Regulation, to build predictive models that score RR risk.

Using random forests, a machine learning technique, they created a prototype that builds separate predictive models for different registration categories. The models predicted with 93% accuracy whether an RR would receive one or more complaints during their career. Automated and incorporated into Enforcement’s workflow, this application could help the department make better-informed risk assessments.

4. Classifying Documents: WonderDocs

WonderDocs saw a problem facing our business users: a case file can contain hundreds of documents with little organization. People struggled to figure out what kinds of documents a case file contained and how to find the ones they needed most. WonderDocs saw an opportunity for an automated classification engine.

So they created a program that recognizes the types of documents commonly found in case files. It can take users directly to their high-value documents and even flag when those documents are missing from the file. Instead of building the engine from scratch, they worked with UMass’s open source classification engine, training it on six document types with 50 examples of each. Not only was the engine 96% accurate, it also classified each document in under a second. By integrating with Alfresco, they could automatically apply the classification as soon as a document is uploaded to a case folder.
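The post doesn’t describe the UMass engine’s interface, so the sketch below does not attempt to reproduce it. Instead it substitutes a multinomial naive Bayes classifier with add-one smoothing, a common baseline for document-type classification, written with only the Python standard library. The document types and training snippets in `TRAINING` are hypothetical stand-ins for the 50 examples per type the team used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled snippets standing in for real case-file documents.
TRAINING = [
    ("account statement balance period ending", "statement"),
    ("monthly statement of account holdings balance", "statement"),
    ("trade confirmation bought shares price settlement", "confirmation"),
    ("confirmation of trade sold shares settlement date", "confirmation"),
    ("dear customer letter regarding your complaint", "correspondence"),
    ("letter to customer regarding account complaint response", "correspondence"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

if __name__ == "__main__":
    clf = NaiveBayesClassifier()
    for text, label in TRAINING:
        clf.train(text, label)
    print(clf.predict("settlement of shares bought"))
```

Because each prediction is a single pass over the document’s tokens, this kind of model is fast enough to run on upload, which matches the Alfresco integration described above.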

5. Access Insights from Disparate Sources: Heisenberg

The Heisenberg team tackled the challenge of drawing insights from many different sources. Beyond internal information, they wanted to incorporate external sources as well, to support stronger strategies and tactics.

They created INSYT, a service that continuously streams information. Both internal and external sources are indexed by Apache Solr, and the service can be customized to watch for specific trigger words. Going beyond a simple list of events, they also built an extensible correlation engine to identify relationships across the data, along with a dashboard that visualizes the data around trigger words and their relationships.
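INSYT’s trigger-word watching and correlation engine are not described in detail. The code below is a rough sketch of the idea. It assumes relationships can be approximated by how often trigger words co-occur in the same item, flags the trigger words in each streamed item, and counts co-occurring pairs. The `TRIGGERS` watch list and the sample items are hypothetical. A production version would more likely query the Solr index than scan raw text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical watch list; INSYT let users customize their own trigger words.
TRIGGERS = {"merger", "fine", "lawsuit", "acquisition", "breach"}

def find_triggers(text, triggers=TRIGGERS):
    """Return the sorted trigger words that appear in one streamed item."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(words & triggers)

def correlate(items, triggers=TRIGGERS):
    """Count how often each pair of trigger words appears in the same item."""
    pairs = Counter()
    for text in items:
        pairs.update(combinations(find_triggers(text, triggers), 2))
    return pairs

if __name__ == "__main__":
    stream = [
        "Firm A announces merger after breach.",
        "Breach leads to fine.",
        "Merger talks; breach disclosed.",
    ]
    for pair, count in correlate(stream).most_common():
        print(pair, count)
```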

6. What They Say vs What They Do: Data Go

Since some brokers have used social media to spread false or misleading information, the Data Go team saw an opportunity for technology to help prevent this behavior. They built a prototype that converts tweets into sentiment metrics, along with a separate price model. Using machine learning, they then mapped sentiment onto prices so it could be interpreted in a market context.
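Data Go’s prototype has two halves: turning tweets into sentiment scores, and relating sentiment to prices. Both can be illustrated with deliberately simple stand-ins. The sketch assumes a tiny hand-made word lexicon (`POSITIVE`, `NEGATIVE`) in place of a real sentiment model. It also uses an ordinary least-squares line in place of whatever machine learning model the team trained. The tweets and returns in the demo are made up.

```python
from statistics import mean

# Hypothetical sentiment lexicon; a real system would use a trained model.
POSITIVE = {"buy", "soar", "great", "strong", "bullish"}
NEGATIVE = {"sell", "crash", "weak", "bearish", "fraud"}

def sentiment(tweet: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = [w.strip(".,!?#$").lower() for w in tweet.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def fit_price_model(daily_sentiment, daily_return):
    """Ordinary least squares: return ~ intercept + slope * sentiment."""
    mx, my = mean(daily_sentiment), mean(daily_return)
    sxx = sum((x - mx) ** 2 for x in daily_sentiment)
    sxy = sum((x - mx) * (y - my) for x, y in zip(daily_sentiment, daily_return))
    slope = sxy / sxx
    return my - slope * mx, slope

if __name__ == "__main__":
    tweets_by_day = [
        ["Strong buy, this stock will soar!"],
        ["Mixed news today"],
        ["Weak guidance, sell now"],
    ]
    returns = [0.02, 0.0, -0.01]  # hypothetical daily returns
    scores = [mean(sentiment(t) for t in day) for day in tweets_by_day]
    intercept, slope = fit_price_model(scores, returns)
    print(f"return = {intercept:.4f} + {slope:.4f} * sentiment")
```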

This becomes powerful when cross-checked against FINRA’s internal records, where users can spot inconsistencies between brokers’ financial activities and their public messages. By combining the data we have with public information, this project showed real potential to enhance our regulatory capability.

7. Connecting Data’s Dots: Data Troll

The Data Troll team saw a lot of data that was hard to access or difficult to connect with other data sets. Like connecting dots on a page, they wanted to create a solution that would bring together structured and unstructured content, improving not only access to information but also analysis.

To solve this, they created a self-service data discovery tool that identifies related data sets and documents. It uses herd, FINRA’s open source data catalog, so users can find internal FINRA data. From there, the program uses machine learning to identify associated internal and external data sets and previous analytics. Data modeling shows the relationships between data sets. Finally, data visualization helps users see system-recommended analytics and select available data sets, and even lets them create new analyses with charts and pivot tables.

8. Customized Investor Portal: Guardians of Investors

An important part of FINRA’s mission to protect investors is providing the public with educational resources. Various FINRA websites host this information, including BrokerCheck, FINRA.org, and The Alert Investor. The Guardians of Investors team wanted to develop a service that assesses which information might be most valuable to a given user and delivers it in a personalized dashboard. They called this dashboard MyFINRA.

Investors can enter information such as their age, location, and life events. Based on that data, MyFINRA fills the user’s dashboard with customized content from various FINRA properties. This can include updates when a broker or firm the user follows has a significant change in its information, such as a disclosure or a firm merger. The goal of MyFINRA is to provide the latest personalized content that informs and protects investors.

9. Crowdsourcing Meets Oversight: Falcon

The Falcon team also tackled investor protection and education pain points. Many people don’t have a clear picture of their own finances, and after hearing that the average household carries over $15,000 in credit card debt, the team wanted to create more opportunities for targeted investor education.

So they created FinGuide, a financial crowdsourcing application built on a microservices architecture using AngularJS, Node, and MongoDB. People can come to it seeking the financial strategies that best fit them. If none fit, they can describe their financial situation and ask for advice, much as programmers do on Stack Overflow, without revealing too much personal information. Registered brokers can share advice and link to their business and BrokerCheck profiles, so investors not only get a variety of advice but can also check brokers’ histories on BrokerCheck.

More than a unique social application, FinGuide could also strengthen FINRA’s oversight. With this crowdsourcing platform, FINRA would be able to see whom brokers target, which, the team emphasized, would help FINRA understand brokers’ patterns.

10. Finding Relationships in an Ocean of Big Data: Finding Nemo and Dory

With petabytes of data on hand, this team also tackled the problem of finding relationships and connections in FINRA’s vast data repository. Traditional tabular formats make it even harder for users to find meaningful relationships in the data, especially at scale.

To make this work, the Finding Nemo and Dory team harnessed various AWS tools. With data placed in S3 buckets, the team used Lambda to ingest the data, after which an Elasticsearch cluster indexes it. Kibana handles visualization, and Graph is used to find relationships.
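The S3-to-Lambda-to-Elasticsearch flow can be sketched as a Lambda handler. It reads the bucket and key from an S3 event notification and builds a body for Elasticsearch’s `_bulk` API. That body is newline-delimited JSON: an action line followed by a document line, ending with a trailing newline. The index name `finra-data` and the `load_rows` loader are hypothetical. A real function would read the object with boto3 and send the body to the cluster. Both steps are left out here so the sketch stays self-contained.

```python
import json
from urllib.parse import unquote_plus

INDEX = "finra-data"  # hypothetical index name

def s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])

def to_bulk(docs, index=INDEX):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON)."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

def load_rows(bucket, key):
    """Hypothetical loader; a real function would read and parse the S3 object."""
    return []

def handler(event, context=None, fetch=load_rows):
    """Lambda entry point sketch: turn every new S3 object into bulk index actions."""
    docs = []
    for bucket, key in s3_objects(event):
        for i, row in enumerate(fetch(bucket, key)):
            docs.append((f"{bucket}/{key}#{i}", row))
    return to_bulk(docs)
```

Batching documents into one `_bulk` call per invocation keeps the number of requests to the cluster low, which matters once the data volume reaches FINRA’s scale.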

Not only does this approach visualize relationships, it also lets users ask more questions of the data. And because it runs on AWS, it can do this work at scale for wider analysis.

11. Using Social Media to Determine Consumer Sentiment: The Hound

The faster we identify potential violations, the less impact they can have on the market. The Hound team wondered: could analyzing social media and user-generated content help sniff out violations more quickly? They decided to test this against five regulated broker-dealer firms.

After selecting the firms, they scraped complaint and review data from Consumer Affairs, Twitter, and Facebook. They then cleaned the data, performed sentiment analysis, and loaded the results into Amazon RDS. Finally, the data was visualized with JavaScript, NodeJS, and Highcharts to show trends, and plotted against a regression fit curve to reveal any outliers.
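The final step, plotting sentiment against a regression fit curve to spot outliers, can be approximated as follows. The sketch assumes a straight-line least-squares fit over time. Any point whose residual exceeds `k` standard deviations of the residuals is treated as an outlier. Both the linear fit and the `k=2.0` cutoff are assumptions, not details from the team’s project.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def outliers(daily_sentiment, k=2.0):
    """Return indices whose residual from the fitted trend exceeds k standard deviations."""
    xs = list(range(len(daily_sentiment)))
    intercept, slope = fit_line(xs, daily_sentiment)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, daily_sentiment)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if spread and abs(r) > k * spread]

if __name__ == "__main__":
    # Hypothetical daily average sentiment for one firm; day 4 is a sharp drop.
    series = [0.1, 0.2, 0.3, 0.4, -0.9, 0.6, 0.7, 0.8]
    print(outliers(series))
```

With small samples this test is conservative. One extreme point also inflates the standard deviation it is measured against, so very short series may never produce an outlier.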

12. Gaming as a Service: Killer App

Many FINRA analysts comb through documents of 400+ pages during investigations. Natural Language Processing (NLP) helps lighten the load, but it’s not a perfect system. Confirming its output is key, especially for matches with low confidence, and that process can be tedious and repetitive. The Killer App team saw an opportunity: a game for validating NLP output.

They did this by creating a simple matching app, scaled for either desktop or mobile. The user only has to answer simple yes-or-no questions (e.g., Makr Jnhnsno: is this a name?). These answers strengthen confidence in the results and can be fed back into the algorithm to improve its accuracy. Turning verification into a game motivates users with points, high scores, and leaderboards, transforming idle time into something both productive and fun. Through a simple game experience, the Killer App team turned confirming NLP results into entertainment.
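One way the game’s answers could feed back into confidence scores is a Beta-Bernoulli update. The NLP model’s original confidence is treated as a prior worth a couple of votes, and each player’s yes or no is added as an observation. The sketch below assumes that scheme. `Candidate`, `prior_weight`, and the `award_points` rule are hypothetical and not taken from the Killer App team’s design.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One NLP match awaiting validation, e.g. a possible person name."""
    text: str
    nlp_confidence: float  # the model's original confidence, between 0 and 1
    yes: int = 0
    no: int = 0

    def record(self, answer: bool) -> None:
        if answer:
            self.yes += 1
        else:
            self.no += 1

    def confidence(self, prior_weight: float = 2.0) -> float:
        """Posterior mean of a Beta prior centred on the NLP confidence,
        updated with player votes (each vote counts as one observation)."""
        alpha = self.nlp_confidence * prior_weight + self.yes
        beta = (1 - self.nlp_confidence) * prior_weight + self.no
        return alpha / (alpha + beta)

def award_points(answer: bool, consensus: bool, base: int = 10) -> int:
    """Hypothetical scoring rule: full points for agreeing with the crowd consensus."""
    return base if answer == consensus else 0

if __name__ == "__main__":
    match = Candidate("Makr Jnhnsno", nlp_confidence=0.5)
    for vote in (True, True, True, False):
        match.record(vote)
    print(f"{match.text}: confidence {match.confidence():.2f}")
```

Low-confidence candidates move quickly with just a few votes. A larger `prior_weight` would make the model’s original score harder for players to override.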

13. More than an Outlier: Team What

Many Knowledge Discovery in Databases (KDD) algorithms look for patterns in large data sets. Outlier detection often treats outliers in black and white: a point either is one or it isn’t. Team What wanted to investigate the degree of outlierness, especially in more complex data sets.

To do this, they looked at Market Participant data. They applied a density-based outlier algorithm that calculates the distances between data points and finds each point’s closest neighbors. The algorithm identifies the outliers as well as the distances between them and how far they lie from other data points. This gives the user a more nuanced view of the patterns.
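The post doesn’t name the density-based algorithm Team What used. Local Outlier Factor (LOF) is a well-known method of that kind and fits the description. It finds each point’s nearest neighbors, compares the point’s local density with theirs, and returns a graded score instead of a yes/no label. The sketch below is a simplified LOF under that assumption. It uses exactly `k` neighbors (ignoring distance ties) and assumes there are no duplicate points.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def local_outlier_factor(points, k=3):
    """Return one LOF score per point; values well above 1 indicate outliers."""
    n = len(points)
    neighbours = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[neighbours[i][-1]]) for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i from neighbour j.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

if __name__ == "__main__":
    # Hypothetical 2-D features: a tight cluster plus one distant point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for p, score in zip(pts, local_outlier_factor(pts)):
        print(p, round(score, 2))
```

Scores near 1 mean a point is about as dense as its neighbors. The further a score rises above 1, the stronger the outlier, which is the graded notion of outlierness the team was after.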