Our Work
Research & Academia
DataHex and academia join forces.
Seamlessly access, query and analyze petabytes of diverse data, from real-time tick data and reference data to structured data sets and more.
Utilizing NLP to enhance discovery of unstructured data
Situation
- Third Bridge is a market-leading global investment research provider whose human-led insights support capital markets firms in their decision-making.
- Third Bridge engaged RoZetta to provide an innovative solution that automatically tags forum transcripts with the companies mentioned in interviews, matched against a reference data set.
- Customers needed a better search experience and wanted to discover relevant content more efficiently.
- Tagging the transcripts would also provide a strong foundation for further enhancements.
Solution
- RoZetta’s data science experts, using the DataHex platform, mapped entities across 23,000 transcripts, identifying over 2.9 million entity mentions, or 125 mentions per transcript on average.
- Enabled linking to additional data sources such as company fundamentals, news, data from other providers and alternative sources of unstructured text.
- Tracked sentiment of entities over time.
- Automated summarization of transcripts.
- Tagged additional entity types such as people, locations and industries.
- Developed an objective relevance measure for the identified company mentions, initially based on transcript content, with the option to enhance it further using customer activity insights.
Benefits
Using NLP methods, RoZetta successfully tagged over 152,000 entities, ten times more than the client had previously identified
RoZetta’s models raised the discoverability of entity mentions to 98.3%, up from 37.4% previously, resulting in a substantially better search and transcript filtering experience, with more relevant results and watch list notifications
Automated summarization reduced manual processes and improved efficiency
Additional entity tags allowed the client to enhance its search and discoverability
Linking various datasets made it possible to extract relationships between entities
A Sentiment Index efficiently generated insights into market perception
An improved customer interface increased client engagement, reducing attrition, driving customer growth and lifting revenue
Financial cloud analytics platform for academics
Situation
- SIRCA had been operating an on-premises data portal to serve its university client base, which focuses on PhD-level academic research in capital markets. The portal provided a download service primarily for Australian and New Zealand historical tick data, company announcements and CoreLogic property data
- This platform had been in production since 1997 and, given its age, could no longer be maintained or supported
- RoZetta Technology collaborated with SIRCA to take an enhanced proposition to market by expanding the range of data on offer and migrating the service to a modern cloud analytics solution
Solution
- RoZetta Technology and SIRCA partnered with Morningstar to expand the data offering to new geographies, adding datasets covering company fundamentals, corporate actions and company announcements, as well as expanded tick data
- The solution was deployed on a Databricks managed cloud environment hosted on AWS
- The solution also normalises, conforms and enriches this broad set of up-to-date historical datasets
Benefits
A fast, accessible and usable solution that lets end users fulfil large data requests in seconds or minutes rather than hours
No need to download data to a local environment. Develop and run code in the cloud interactively
A “one-stop shop” for researchers to access, query, join and analyse more than 100 data assets spanning 15 years of history
Highly scalable Spark compute clusters distribute queries across multiple machines
Choice of programming languages (R, Scala, Python and SQL), with access to each language’s full set of libraries, for example for graphing or machine learning
Strong security and full control over collaboration: researchers can share their work with a supervisor or other researchers