Our “INNERJOIN” series features collaborations, integrations, and thought partners. We’re offering perspectives from thought leaders in different industries to show the value, flexibility, and potential in data.
In this blog post, we host a commentary from Alejandro García Magos, PhD, Lecturer in the Department of Political Science at the University of Toronto, where he teaches Quantitative Methods. He analyzes the promises, risks, and perspectives of political science, big data, and elections.
One of the rules in political science is to not let data drive your theories. In other words, one should try to come up with a theory before looking…
Data is the key to unlocking insights. But with the velocity and variety of today’s data, it’s difficult to pinpoint what you need to achieve your unique business goals. Data adds value not because you have it, but because you use it.
The Namara Marketplace is the easiest point of access for hundreds of thousands of normalized, queryable datasets. It’s the first piece of the puzzle to unlock your organization’s data strategy.
Google Dataset Search is…
Data can predict a lot of things, but right now, it’s impossible to know for certain how long our routines and lives will be disrupted as a result of the COVID pandemic. We will continue to follow the advice of experts and authorities to try to flatten the curve, and we urge every individual to do their part.
Here are a few snapshots from the team in their home working environments:
I recently saw a Facebook post that captured the perfect sunset with a not-so-perfect caption that read “as dust fell” instead of “as dusk fell.” Often when I’m browsing Facebook, I’ll read something that doesn’t seem quite right. When the correct words in a phrase are replaced with similar-sounding ones that, more often than not, don’t make sense, the result is called an eggcorn.
The term was coined in 2003 by linguist Geoffrey Pullum, after fellow linguist Mark Liberman shared a story about a woman who mistakenly referred to “acorns” as “eggcorns.” Since there was no specific word for this erroneous phrase, eggcorn was chosen in the same…
In 2018, ThinkData formed the DataLabs team to tackle the problem of disambiguating real world entities from data and to provide our users with entity resolution tools that are performant, flexible, and customized to their particular use case.
Our Lead Data Scientist, Hoyoung Jang, and Data Science Co-op Alumnus, Cheng Lin, had the opportunity to speak at the Toronto Machine Learning Summit in 2019.
In this video, they present the details behind ThinkData Works’ entity resolution tech, as well as some of the technical aspects and challenges faced in building a scalable solution to record linkage.
With Toronto being a world-leading market for tech companies and talent, startups are as closely tied to this city as the CN Tower. A few different ideas come to mind when we hear the word “startup.” Sometimes we picture millennials working at standing desks in a factory-turned-office, petting a dog, getting ready to play pool in the lounge, eating snacks, and playing video games. Other times we picture a closet-turned-office where 10 developers work 80 hours per week and barely scrape by, with somebody at the helm who’s never managed a team before.
Even though the…
With Canada’s 43rd Federal Election not too far in the rearview mirror, we at ThinkData Works were curious about what we could learn about our most recent election by stepping back from the punditry and analyzing some data. After all, using government data is a great way to understand how our government works.
There are dozens of open data sources we could use to explore this subject, but we wanted to primarily use data released through Elections Canada and Statistics Canada. Overall, we used eight datasets to help us discover insights about the election:
On September 3, 2019, ThinkData Works partner Landgrid.com officially launched a nationwide vacancy dataset from the United States Postal Service, making it possible to easily see vacant properties of any zoning type in any neighbourhood or city.
Landgrid.com, a product deployment wing of Loveland Technologies, built the dataset out of USPS data and plans to update it on a regular basis. This is the first time vacancy data has ever been released at the parcel level, making this dataset a unique and powerful addition to the Landgrid.com product offering.
While aggregate USPS vacancy data is a widely-used…
In May 2014, Milwaukee experienced 82 water main breaks in five days, sending thousands of people scrambling for water and costing the city hundreds of thousands of dollars in infrastructure repair and property damage.
A series of breaks like this, although surprising in scale, is an all-too-frequent occurrence across North America. Aging infrastructure is an increasingly critical problem in the United States, and the American Society of Civil Engineers has recently calculated that the government needs to invest around $3.6T in water infrastructure alone by 2020.
According to some studies, there are around 240,000 water main breaks annually in the…
Entity resolution, also known as record linkage, is the task of disambiguating real world entities from data. That is to say, it’s the process of identifying and resolving multiple occurrences of a single entity to reveal a clearer picture of the information within the data. It’s simple enough conceptually, but exceedingly difficult to achieve in practice and at scale, which is why there aren’t many master data management solutions available.
In 2019, we formed the DataLabs team to tackle this problem and provide our users with entity resolution tools that are performant, flexible, and customized to their particular use case.
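To make the idea of resolving multiple occurrences of a single entity concrete, here is a minimal sketch in Python. It is not the DataLabs approach, which this post does not describe. It is a toy illustration that assumes entities are plain name strings, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and groups matches with a small union-find. The function names (`normalize`, `similarity`, `resolve`), the suffix list, and the 0.85 threshold are all hypothetical choices made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "corp", "co", "llc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group records whose pairwise similarity meets the threshold."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of records: simple, but quadratic in the input size.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters: dict[int, list[str]] = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


if __name__ == "__main__":
    names = [
        "ThinkData Works Inc.",
        "Thinkdata Works",
        "THINKDATA WORKS, INC",
        "Landgrid.com",
        "Loveland Technologies",
    ]
    for cluster in resolve(names):
        print(cluster)
```

The all-pairs loop is the part that breaks down at scale: the number of comparisons grows quadratically with the number of records, which hints at why entity resolution is conceptually simple but hard to do in practice on large datasets.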