Large amounts of quality data are a key requisite for machine learning, but such data are often difficult to find and very expensive. Akshay Pai, co-founder and CTO of Cerebriu, may have found a novel approach.
Last week, Asia House hosted an event for the artificial intelligence (AI) and machine learning (ML) community. The focus of the evening was a foundational problem for companies and researchers in this area: you need a good-quality dataset to train your machine learning model, but how do you get that data to begin with?
High cost and privacy
The four speakers participating in the event came from the healthcare sector, where this issue is particularly acute. Obtaining large amounts of quality health data is costly and time-consuming, and it is complicated by the thorny issue of patient privacy.
While the four speakers used a variety of tools to overcome this hurdle, one approach stood out. Akshay Pai from Cerebriu bought his data from Asia. Cerebriu is a company that helps augment the radiologist workflow within brain imaging.
Pros & Cons of going to India
Working with Indian hospitals, Cerebriu was able to access very large and diverse datasets at low cost. One center alone would often have more than 500,000 images in its picture archives, ensuring a high level of variance.
Furthermore, building a relationship with the hospitals was fast and easy. The hospitals were private and the bureaucracy was relatively straightforward, with few decision makers involved. Finally, the hospitals had already obtained data acquisition compliance from patients, meaning patient privacy issues were taken care of.
The main downside was data quality. The images provided were mainly 2D, whereas 3D images are much more powerful. Furthermore, since there were no standards for labelling images, radiologists created their own individual labels, which added considerable time to the crucial task of labelling and validating the data. On the upside, the data was in English, making it workable for non-Asian readers.
Looking for a well-trodden path
As more startups with data requirements look to Asia, some of these obstacles may be removed, either through improved standardisation at the clinical source or by hiring local consultants to perform data labelling. Data acquisition for machine learning may become a future path for collaboration between the Danish and Asian innovation ecosystems.