In 2018, ThinkData formed the DataLabs team to tackle the problem of disambiguating real world entities from data and to provide our users with entity resolution tools that are performant, flexible, and customized to their particular use case.

Our Lead Data Scientist, Hoyoung Jang, and Data Science Co-op alum, Cheng Lin, had the opportunity to speak at the Toronto Machine Learning Summit in 2019.

In this video, they present the details behind ThinkData Works’ entity resolution tech, as well as some of the technical aspects and challenges faced in building a scalable solution to record linkage.

Are you working…


With Toronto being a world-leading market for tech companies and talent, startups are as closely tied to this city as the CN Tower. There are a few different ideas that come to mind when we hear the word “startup” — sometimes, we picture millennials working at standing desks in a factory-turned-office, petting a dog, getting ready to play pool in the lounge, eating snacks and playing video games; other times we picture a closet-turned-office, where 10 developers work 80 hours per week and barely scrape by, with somebody at the helm who’s never managed a team before.

So, which one are we?

Even though the…


With Canada’s 43rd Federal Election not too far in the rearview mirror, we at ThinkData Works were curious about what we could learn about our most recent election by stepping back from the punditry and analyzing some data. After all, using government data is a great way to understand how our government works.

There are dozens of open data sources we could use to explore this subject, but we wanted to primarily use data released through Elections Canada and Statistics Canada. Overall, we used 8 datasets to help us discover insights about the election:

  • Contributions to all political entities…

On September 3, 2019, ThinkData Works partner Landgrid.com officially launched a nationwide vacancy data set from the United States Postal Service, making it possible to easily see vacant properties of any zoning type in any neighbourhood or city.

Landgrid.com, a product deployment wing of Loveland Technologies, built the data set out of USPS data, and plans to update it on a regular basis. This is the first time vacancy data has ever been released at the parcel level, making this data set a unique and powerful addition to the Landgrid.com product offering.

While aggregate USPS vacancy data is a widely used…


In May 2014, Milwaukee experienced 82 water main breaks in five days, sending thousands of people scrambling for water and costing the city hundreds of thousands of dollars in infrastructure repair and property damage.

The series of breaks, although surprising in scale, is an all-too-frequent occurrence across North America. Aging infrastructure is an increasingly critical problem in the United States, and the American Society of Civil Engineers has recently calculated that the government needs to invest around $3.6T in water infrastructure alone by 2020.

According to some studies, there are around 240,000 water main breaks annually in the…


Entity resolution, also known as record linkage, is the task of disambiguating real world entities from data. That is to say, it’s the process of identifying and resolving multiple occurrences of a single entity to reveal a clearer picture of the information within the data. It’s simple enough conceptually, but exceedingly difficult to achieve in practice and at scale, which is why there aren’t many master data management solutions available.

In 2019, we formed the DataLabs team to tackle this problem and provide our users with entity resolution tools that are performant, flexible, and customized to their particular use case.
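
As a rough illustration of the idea described above, here is a minimal Python sketch that groups records likely to refer to the same entity. It is a toy example under stated assumptions, not the DataLabs approach: the `normalize` and `resolve` helpers, the suffix list, and the 0.85 threshold are all hypothetical choices made for illustration, and string similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop common company suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    suffixes = {"inc", "ltd", "corp", "co", "llc"}  # hypothetical, illustrative list
    return " ".join(tok for tok in cleaned.split() if tok not in suffixes)


def resolve(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group records whose normalized names are at least `threshold` similar."""
    parent = list(range(len(records)))  # union-find forest, one node per record

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [normalize(r) for r in records]
    # Naive all-pairs comparison: fine for a demo, quadratic in the record count.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                parent[find(j)] = find(i)

    clusters: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)
    return list(clusters.values())


if __name__ == "__main__":
    sample = ["ThinkData Works Inc.", "Thinkdata Works", "THINKDATA WORKS, INC", "Landgrid.com"]
    for group in resolve(sample):
        print(group)
```

The all-pairs loop hints at why the problem gets hard at scale: the number of comparisons grows quadratically with the number of records, so practical systems typically need some way to limit which pairs are compared.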


We’ve written a lot about finding and preparing data lately. But that’s only the beginning when it comes to extracting value from it. When organizations begin flowing external data through their teams and systems, the various issues associated with data sharing start causing problems. Some of these issues include:

  • Controlling permissions and data access, including sharing partial/filtered views or picture-in-picture data
  • Accurately tracking changes to data and monitoring how that affects the performance of models, even (and especially) when the data updates regularly
  • Opening up and sharing the data with other people and/or organizations that may benefit from it


As leaders in data technology, every aspect of what we do is driven by data — Engineering, Marketing, Sales, and Design decisions all need to be supported by numbers and evidence.

User research data provides important insight that shapes our entire company and the way we design our platform. By leveraging user data, we can better understand the needs, behaviours, and context of our audience to yield tailored solutions to complement user workflows.

We have built and designed the ultimate data workbench for data professionals. Namara, our end-to-end data management solution, streams data from any source in the world…


A bit of history

A few months back, our company enjoyed a retreat a few hours north of the city. We bonded, brainstormed, and set visions and goals for the development of our company and our products.

We also had bonfires with s’mores, and that’s more relevant than you might think.

We ended up with a surplus of marshmallows — about 4 extra bags. …


There are some expressions about data that are getting a bit tired: Data is the new oil; In God We Trust (all others must bring data); Buy data, sell high…

Okay, you caught me, I kind of made that last one up. But the point stands: the narrative around data tells us we all understand that it has tremendous value.

The 4 Ms of Data

Data teams are tasked with gathering information from within their company and from any number of external sources, tying it all together, and conducting analysis and modeling to hit the four Ms of data: “Make Me More Money.”

Whether they’re…

ThinkData Works

Toronto-based startup easing access to external data for everyone from civic hackers to business leaders
