Highlights:

  • AWS Glue DataBrew, a new visual data preparation tool from AWS, lets data analysts and data scientists clean and normalize data up to 80% faster than conventional data preparation approaches.
  • Customers such as INVISTA, NTT DOCOMO, Inc., and bp are using AWS Glue DataBrew.

All-new AWS Glue DataBrew

Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, announced the general availability of AWS Glue DataBrew, a new visual data preparation tool that lets data analysts and data scientists clean and normalize data up to 80% faster without writing code. The cleaned and normalized data is intended mainly for machine learning (ML) and analytics.

AWS Glue DataBrew is a visual data preparation tool for AWS Glue, the AWS service that simplifies extracting, loading, and orchestrating data in the cloud through both visual and code-based interfaces. DataBrew lets users explore and experiment with data directly in their data lakes, data warehouses, and databases without writing any code.

AWS’s new tool offers customers more than 250 pre-built transformations to automate data preparation tasks (e.g., correcting invalid values, filtering anomalies, and standardizing formats) that could otherwise take days or weeks of writing hand-coded transformations. Once the data is prepared, customers can immediately start using it with AWS and third-party analytics and ML services (for example, natural language processing) to query the data and train ML models. There is no upfront commitment or cost to use AWS Glue DataBrew; customers pay only for creating and running transformations on datasets.
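To make the three example transformation types concrete, here is a minimal plain-Python sketch of what each one does to a small table. It is not DataBrew code and does not use the DataBrew API; the function name `clean_orders`, the `country` and `amount` columns, and the "more than 10x the median" anomaly rule are all hypothetical choices for illustration. In DataBrew, equivalent steps would be picked visually from the built-in transformations instead of hand-coded like this.

```python
from statistics import median
from typing import Dict, List, Optional


def clean_orders(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Hypothetical hand-coded cleanup showing three kinds of transformation."""
    out: List[Dict[str, object]] = []
    for row in rows:
        # Standardize formats: trim whitespace and upper-case the country code.
        country = row["country"].strip().upper()
        # Correct invalid values: unparsable or negative amounts become None.
        try:
            amount: Optional[float] = float(row["amount"])
        except ValueError:
            amount = None
        if amount is not None and amount < 0:
            amount = None
        out.append({"country": country, "amount": amount})

    # Filter anomalies: drop rows whose amount exceeds 10x the median of the
    # valid amounts (an arbitrary threshold chosen for this sketch).
    valid = [r["amount"] for r in out if r["amount"] is not None]
    cutoff = 10 * median(valid) if valid else float("inf")
    return [r for r in out if r["amount"] is None or r["amount"] <= cutoff]


rows = [
    {"country": " us ", "amount": "10"},
    {"country": "de", "amount": "abc"},
    {"country": "Fr", "amount": "-5"},
    {"country": "jp", "amount": "12"},
    {"country": "US", "amount": "9999"},
]
print(clean_orders(rows))
```

Even this toy version needs explicit decisions about parsing, invalid values, and thresholds for every column, which is the kind of repetitive hand-coding the pre-built transformations are meant to replace.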

With AWS Glue DataBrew, users can visually explore and access any amount of data across their organization directly from Amazon Simple Storage Service (S3) data lakes, Amazon Redshift data warehouses, and Amazon Aurora and Amazon Relational Database Service (RDS) databases.

Steps – how AWS Glue DataBrew works

  • Connect one or more datasets from the AWS Glue Data Catalog.
  • Create a project to visually explore, understand, clean, combine, and normalize the data in the dataset.
  • Produce a data profile for the selected dataset.
  • Clean and normalize the data using more than 250 built-in transformations.
  • Automate data preparation tasks by applying versioned recipes to all incoming data.
  • Visually track and explore how datasets are linked to projects, recipes, and job runs.
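The steps above are performed in the visual console, but the non-visual parts (datasets, profiles, recipes, and jobs) can also be scripted. The following is a minimal sketch assuming the AWS SDK for Python (boto3) and its `databrew` client. The database, table, bucket, IAM role ARN, resource names, the single `UPPER_CASE` recipe step on a `country_code` column, and the `"1.0"` version label for a first publish are assumptions for illustration; check them against the DataBrew API and recipe action references before use. The visual steps (projects and lineage tracking) are omitted.

```python
"""Sketch of scripting the DataBrew workflow with boto3 (assumptions noted inline)."""
from typing import Any, Dict, List


def run_databrew_pipeline(client: Any, database: str, table: str,
                          bucket: str, role_arn: str) -> Dict[str, str]:
    """Register a dataset, profile it, publish a recipe, and run it as a job.

    `client` is expected to be boto3.client("databrew"); any object with the
    same methods works, which keeps this sketch testable offline.
    """
    dataset = "sales-raw"      # hypothetical dataset name
    recipe = "sales-cleanup"   # hypothetical recipe name

    # Connect a dataset from an AWS Glue Data Catalog table.
    client.create_dataset(
        Name=dataset,
        Input={"DataCatalogInputDefinition": {"DatabaseName": database,
                                              "TableName": table}},
    )

    # A profile job computes column statistics and writes them to S3.
    client.create_profile_job(
        Name=f"{dataset}-profile",
        DatasetName=dataset,
        OutputLocation={"Bucket": bucket, "Key": "profiles/"},
        RoleArn=role_arn,
    )

    # A recipe is an ordered list of transformation steps.
    # Assumption: UPPER_CASE with a sourceColumn parameter; column is hypothetical.
    steps: List[Dict[str, Any]] = [
        {"Action": {"Operation": "UPPER_CASE",
                    "Parameters": {"sourceColumn": "country_code"}}},
    ]
    client.create_recipe(Name=recipe, Steps=steps)

    # Publishing freezes a recipe version so a job can reapply it to new data.
    client.publish_recipe(Name=recipe)
    client.create_recipe_job(
        Name=f"{recipe}-job",
        DatasetName=dataset,
        RecipeReference={"Name": recipe, "RecipeVersion": "1.0"},  # assumed first version
        Outputs=[{"Location": {"Bucket": bucket, "Key": "clean/"}}],
        RoleArn=role_arn,
    )

    # Start both jobs; each call returns a RunId that can be polled later.
    return {
        "profile": client.start_job_run(Name=f"{dataset}-profile")["RunId"],
        "recipe": client.start_job_run(Name=f"{recipe}-job")["RunId"],
    }
```

To run it against AWS, pass `boto3.client("databrew")` with valid credentials and an IAM role that DataBrew can assume. Passing the client in as a parameter is a design choice that lets the call sequence be checked with a stand-in object, without touching (or being billed by) a real account.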

Preparation of data for ML and analytics

Preparing data for ML and analytics involves many necessary but time-consuming tasks, including data extraction, cleaning, normalization, and loading, as well as orchestrating ETL workflows at scale. ETL developers and data engineers skilled in SQL or in programming languages such as Scala or Python can use AWS Glue to extract, load, and orchestrate data at scale. AWS recently introduced AWS Glue Studio to help them author, run, and monitor ETL jobs without writing any code.

The new visual data preparation tool publishes the prepared data to Amazon S3, making it easy for customers to use it immediately in analytics and ML. AWS Glue DataBrew is serverless and fully managed, so customers never need to provision, configure, or manage any compute resources.

Raju Gulabani, Vice President, AWS Database, Analytics & Machine Learning at Amazon, commented: “AWS customers are using data for analytics and machine learning at an unprecedented pace. However, these customers regularly tell us that their teams spend too much time on the undifferentiated, repetitive, and mundane tasks associated with data preparation.” He added: “Customers love the scalability and flexibility of code-based data preparation services like AWS Glue, but they could also benefit from allowing business users, data analysts, and data scientists to visually explore and experiment with data independently, without writing code. AWS Glue DataBrew features an easy-to-use visual interface that helps data analysts and data scientists of all technical levels understand, combine, clean, and transform data.”

AWS Glue DataBrew customer comments

Customers such as INVISTA, NTT DOCOMO, Inc., and bp are using AWS Glue DataBrew.

Takashi Ito, Senior Vice President of Corporate Affairs at NTT DOCOMO USA, Inc., commented: “AWS Glue DataBrew provides a visual interface that enables both our technical and non-technical users to analyze data quickly and easily. Its advanced data profiling capability helps us better understand our data and monitor data quality. AWS Glue DataBrew and other AWS analytics services have allowed us to streamline our workflow and increase productivity.”

John Maio, Director, Data & Analytics Platforms Architecture at bp, commented: “We see AWS Glue DataBrew as a way to help us better manage our data platform and improve efficiencies in our data pipelines.”

Tanner Gonzalez, Analytics & Cloud Leader at INVISTA, commented: “AWS Glue DataBrew will empower our analysts and data scientists to perform advanced data engineering activities, giving them the freedom to explore their data and decreasing the time to derive new insights.”

Regional availability

AWS Glue DataBrew is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), Asia Pacific (Tokyo), and Asia Pacific (Sydney) Regions, with availability in additional Regions coming soon.

AWS Glue DataBrew, the new visual data preparation tool, is expected to expand rapidly in the coming years.