Tech Start-up YData’s Open-Source Synthetic Data Community Aims to Improve Access to High-Quality Synthetic Data

With research suggesting that 60% of the data used to develop AI and analytics projects will be synthetically generated by 2024*, tech start-up YData has created a Synthetic Data Community to promote an open-source approach to improving access to tabular and time-series data, the most common formats for storing data.

The Synthetic Data Community was established by YData, which built a data preparation platform to accelerate the development of AI solutions. It aims to break down barriers for data science teams, researchers, and beginners, and in doing so unlock the power of synthetic data. YData’s Synthesizer uses state-of-the-art deep learning techniques to learn the statistical properties of the real data and reproduce them in a new dataset, without transforming the original data or copying real records.
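To make the idea of "learning the statistics and reproducing them" concrete, here is a minimal sketch. It is not YData's Synthesizer, which the article describes as deep-learning based. As a deliberately simpler stand-in, it fits a multivariate Gaussian (column means plus a covariance matrix) to a numeric table and samples new rows from it. The `age` and `income` columns and all function names are hypothetical, chosen only for illustration.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate column means and the sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return means, cov


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_synthetic(means, cov, n_rows, rng):
    """Draw new rows that follow the fitted means and covariance."""
    low = cholesky(cov)
    d = len(means)
    out = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([means[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


rng = random.Random(0)

# Hypothetical "real" table with two correlated columns: age and income.
real = []
for _ in range(500):
    age = rng.gauss(40, 10)
    income = 1000 * age + rng.gauss(0, 5000)
    real.append([age, income])

means, cov = fit_gaussian(real)
synthetic = sample_synthetic(means, cov, 500, rng)

# The synthetic table should show similar means and a similar correlation.
syn_means, syn_cov = fit_gaussian(synthetic)
```

The sampled rows are generated from the fitted parameters rather than copied from the input, which is the property the article emphasizes. A Gaussian captures only means and linear correlations, so it is a teaching aid rather than a substitute for a learned generative model.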

“We believe that having quality data is truly a game-changer and that by creating high-quality data that resembles real-world data that was initially inaccessible, endless possibilities can be unlocked,” explains YData co-founder Gonçalo Martins Ribeiro.

Synthetic data is artificially created but keeps the properties of the original data, preserving its business value while remaining compliant. Using synthetic data reduces the risk of profile re-identification and opens up potential for innovation, collaboration and new revenue streams. Mathematical methods safeguard individuals’ privacy and protect against re-identification attacks.

Besides preserving the statistical properties of the original data, YData’s synthesis approach preserves its quality and structure, yielding high-quality data for purposes such as training ML models.

Moreover, by leveraging synthetic data, organizations can balance their datasets, helping to address issues such as bias and improve fairness in the datasets used to develop AI initiatives. YData also accelerates and eases data sharing and selling, speeding up the building of a trustworthy data economy.
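Dataset balancing with synthetic rows can be illustrated with a short sketch. The article does not say how YData performs balancing, so the code below uses a different, simplified technique: each minority class is topped up with points interpolated between two randomly chosen real rows of that class. This is loosely in the spirit of SMOTE, which interpolates toward nearest neighbours rather than random pairs. The function name, the labels, and the sample values are hypothetical, and the sketch assumes every class has at least two rows.

```python
import random
from collections import Counter


def balance_by_interpolation(rows, labels, rng):
    """Oversample each minority class up to the size of the largest class.

    Every synthetic row lies on the line segment between two distinct
    real rows of the same class (assumes each class has >= 2 rows).
    """
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            a, b = rng.sample(members, 2)
            t = rng.random()  # position along the segment from a to b
            new_rows.append([p + t * (q - p) for p, q in zip(a, b)])
            new_labels.append(cls)
    return new_rows, new_labels


# Hypothetical imbalanced table: four "ok" rows and two "fraud" rows.
rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1], [8.0, 9.0], [9.0, 8.5]]
labels = ["ok", "ok", "ok", "ok", "fraud", "fraud"]

balanced_rows, balanced_labels = balance_by_interpolation(
    rows, labels, random.Random(0)
)
class_counts = Counter(balanced_labels)
```

After balancing, both classes have the same number of rows, and the added "fraud" rows stay within the range spanned by the real ones. Interpolation only fills in space between existing points, so it cannot correct bias that stems from groups missing from the original data altogether.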

“In 2020 we conducted a study that found that the biggest problem faced by data scientists was the unavailability of high-quality data even though it is widely accepted that data is the most valuable resource,” continues Ribeiro.

“Not every company, researcher, or student has access to the most valuable data like some tech giants do. As machine learning algorithms and coding frameworks evolve rapidly, it’s safe to say the scarcest resource in AI is high-quality data at scale. The Synthetic Data Community is a step towards addressing that.”

CIO Influence News Desk
