GCP DATA
GCP Practice Questions about Storage, Databases, and Big Data
The database administration team has asked you to help them improve the performance of their new database server running on Google Compute Engine. The database is for importing and normalizing their performance statistics and is built with MySQL running on Debian Linux. They have an n1-standard-8 virtual machine with 80 GB of SSD persistent disk. What should they change to get better performance from this system?
Which of these is NOT a type of trigger that applies to Dataflow?
What is the recommended minimum amount of data to store in Bigtable?
Which Hadoop ecosystem service is best suited to being replaced by BigQuery?
Which types of Bigtable row keys can lead to hotspotting, that is, increased read/write loads on a particular Bigtable node? (Choose two)
For this question, refer to the MJTelco case study here: https://cloud.google.com/certification/guides/data-engineer/casestudy-mjtelco. MJTelco needs to be able to reliably handle ever-increasing amounts of streaming telemetry data, process it, and economically store analyzed data. What services should they use for this task?
Which of these open source technologies is the direct equivalent to Google BigQuery?
What are the different partitioning methods on BigQuery? (Choose two)
Which of the following statements on BigQuery are true?
To run a local training job using the Google Cloud SDK, what command would you run?
You have hundreds of IoT devices that generate 1 TB of streaming data per day. Due to latency, messages will often be delayed compared to when they were generated. You must be able to account for data arriving late within your processing pipeline. What should you do?
For this question, refer to the TerramEarth case study: https://cloud.google.com/certification/guides/cloud-architect/casestudy-terramearth. To speed up data retrieval, more vehicles will be upgraded to cellular connections and be able to transmit data to the ETL process. The current FTP process is error-prone and restarts the data transfer from the start of the file when connections fail, which happens often. You want to improve the reliability of the solution and minimize data transfer time on the cellular connections. What should you do?
In order to save on bandwidth costs, you need to load data into BigQuery in a compressed state. What is the preferred file format to balance size and performance?
You currently have a Bigtable instance you've been using for development, running a development instance type and using HDDs for storage. You are ready to upgrade your development instance to a production instance for increased performance. You also want to upgrade your storage to SSDs, as you need maximum performance for your instance. What should you do?
You are upgrading your existing (development) Cloud Bigtable instance for use in your production environment. The instance contains a large amount of data that you want to make available for production immediately. You need to design for fastest performance. What should you do?
What is a deep neural network?
What is a difference between example (training) data and test data?
You are building a data pipeline on Google Cloud. You need to select services that will host a deep neural network machine-learning model also hosted on Google Cloud. You also need to monitor and run jobs that could occasionally fail. What should you do?
For this question, refer to the TerramEarth case study: https://cloud.google.com/certification/guides/cloud-architect/casestudy-terramearth. Operational parameters such as oil pressure are adjustable on each of TerramEarth's vehicles to increase their efficiency, depending on their environmental conditions. Your primary goal is to increase the operating efficiency of all 20 million cellular and unconnected vehicles in the field. How can you accomplish this goal?
In a Dataflow processing pipeline, which concept describes timestamps attached to incoming messages?
You have a collection of media files, each over 5 GB, that you need to migrate to Google Cloud Storage. The files are in your on-premises data center. What migration method can you use to help speed up the transfer process?
You need to export Avro formatted data from BigQuery into Cloud Storage. What is the best method of doing so from the web console?
For this question, refer to the TerramEarth case study: Based on TerramEarth's current data flow environment (refer to the image in the case study), what are the direct GCP services needed to replicate the same structure for batch uploads?
Captionless Image
Your company is evaluating moving to Google Cloud. You will need to migrate your 3 TB on-premises MySQL databases to a managed database service in order to reduce administrative overhead. Minimal modification to the database is desired for the move. What managed database service would best meet this requirement?
You are asked to design the next generation of smart helmets for an accident detection and reporting system. Each helmet will push 10 KB of biometric data in JSON format every second to a collection platform that will process the data and use a trained machine learning model to predict and detect whether an accident has happened and send a notification. Management has tasked you with architecting the platform, ensuring the following requirements are met:
· Provide the ability for real-time analytics of the inbound biometric data
· Ensure processing of the biometric data is highly durable, elastic, and parallel
· The results of the analytic processing should be persisted for data mining to improve the accident detection ML model in the future
Which architecture outlined below will meet the initial requirements for the platform?
Which of the following statements are true?
Your company wants to reduce cost on infrequently accessed data by moving it to the cloud. The data will still be accessed approximately once a month to refresh historical charts. In addition, data older than 5 years is no longer needed. How should you store and manage the data?
Your company is making the move to Google Cloud and has chosen to use a managed database service to reduce overhead. Your existing database is used for a product catalog that provides real-time inventory tracking for a retailer. Your database is 500 GB in size. The data is semi-structured and does not need full atomicity. You are looking for a truly no-ops/serverless solution. What storage option should you choose?
You are setting up Cloud Dataproc to perform some data transformations using Apache Spark jobs. The data will be used for a new set of non-critical experiments in your marketing group. You want to set up a cluster that can transform a large amount of data in the most cost-effective way. What should you do?
You are viewing the details of a recent large query and notice that Stage 1 has a full purple bar. What does this tell you?
Your CI/CD pipeline process is shown in the diagram. Which GCP services should you use in boxes 1, 2, and 3?
Captionless Image
Which of the following does the Cloud SQL Proxy need to connect to a Cloud SQL instance?
Which of the following is a GCP Machine Learning service?
You are developing an application that will process thousands of images and scan for explicit content. You need to develop your learning model quickly, and are not familiar with working in Tensorflow. How can you complete this task as quickly as possible while saving on costs?
You are designing storage for event data as part of building a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying individual values over time windows. Which storage service and schema design should you use?
A GCP user requires a service that can be used to build meta data on image catalog, moderate offensive content or perform image sentiment analysis. Which of these GCP services would you recommend?
Which of the following are GCP Structured Data Services?
Which of these is NOT a valid reason to choose an HDD storage type over SSD in a Bigtable instance?
Which of the following statements are true?
Which of the following statements on Cloud Dataproc are true?
Your Datastore database has a large number of properties per entity. Some of these entities have only one possible value assigned to them. You want to avoid explosive indexing, which will hurt performance and increase storage. What step should you take to avoid this?
You host structured data for analysis for multiple clients in BigQuery. For organizational purposes, you need to store all of the different clients' data in a single project. You also need to be able to give your clients the ability to query their own data without having access to other clients' data. How can you best achieve this?
For this question, refer to the MJTelco case study here: https://cloud.google.com/certification/guides/data-engineer/casestudy-mjtelco. MJTelco is streaming telemetry data into BigQuery for long-term storage (2 years) and analysis, at the rate of about 100 million records per day. They need to be able to run queries against certain time periods of data without incurring the costs of querying all available records. What two options would you recommend for doing so? (Choose all that apply)
Your company’s architecture is shown in the diagram. You want to keep data in sync across Region 1 and Region 2. Which product should you use?
Captionless Image
Which of the following statements on Cloud BigQuery are true?
What is the command for creating a storage bucket that has once per month access and is named 'archive_bucket'?
Choose the components that are created when you type 'datalab create (instance_name)'. (Choose all that apply)
Your team has decided to use Datalab for interactive machine learning exercises. You want your team members to share their work and progress with each other. How do you accomplish this?
You created a job which runs daily to import highly sensitive data from an on-premises location to Cloud Storage. You also set up a streaming data insert into Cloud Storage via a Kafka node that is running on a Compute Engine instance. You need to encrypt the data at rest and supply your own encryption key. Your key should not be stored in the Google Cloud. What should you do?
Your company has successfully migrated to the cloud and wants to analyze their data stream to optimize operations. They do not have any existing code for this analysis, so they are exploring all their options. These options include a mix of batch and stream processing, as they are running some hourly jobs and live-processing some data as it comes in. Which technology should they use for this?
For this question, refer to the Mountkirk Games case study https://cloud.google.com/certification/guides/cloud-architect/casestudy-mountkirkgames. Mountkirk Games wants to set up a real-time analytics platform for their new game. The new platform must meet their technical requirements. Which combination of Google technologies will meet all of their requirements?
You have a project using BigQuery. You want to list all BigQuery jobs for that project. You want to set this project as the default for the bq command-line tool. What should you do?
You are building a data pipeline on Google Cloud. You need to prepare source data for a machine-learning model. This involves quickly deduplicating rows from three input tables and also removing outliers from data columns where you do not know the data distribution. What should you do?
What is the purpose of hyperparameters in a machine learning training model?
One of your primary business objectives is being able to trust the data stored in your application. You want to log all changes to the application data. How can you design your logging system to verify authenticity of your logs?
You have been asked to select the storage system for the click-data of your company's large portfolio of websites. This data is streamed in from a custom website analytics package at a typical rate of 6,000 clicks per minute, with bursts of up to 8,500 clicks per second. It must be stored for future analysis by your data science and user experience teams. Which storage infrastructure should you choose?
What IAM role do you need to grant to service accounts for Dataproc workloads, while offering the smallest scope of permissions?
You need to run analytical queries using SQL syntax against data formatted in JSON format. What should you do? Choose the best answer.
You need to choose a managed database solution for an upcoming application your company is designing. The database will store transactional, non-relational data, however, atomicity is required for strong consistency. Which managed database solution should you choose?
Your company plans to migrate a multi-petabyte data set to the cloud. The data set must be available 24 hours a day. Your business analysts have experience only with using a SQL interface. How should you store the data to optimize it for ease of analysis?
If reads and writes are not evenly distributed in a Bigtable database, performance can take a hit.
A GCP user wishes to develop an application that annotates video footage into a variety of formats, identifying key entities within video and when they occur, and making the video content searchable and discoverable. Which of the following services would you recommend?
You are using a Compute Engine instance to manage your Cloud Dataflow processing workloads. What IAM role do you need to grant to the instance so that it has the necessary access?
Your application has a large international audience and runs stateless virtual machines within a managed instance group across multiple locations. One feature of the application lets users upload files and share them with other users. Files must be available for 30 days; after that, they are removed from the system entirely. Which storage solution should you choose?
How can you set up your Dataproc environment to use BigQuery as an input and output source?
As part of your backup plan, you set up regular snapshots of Compute Engine instances that are running. You want to be able to restore these snapshots using the fewest possible steps for replacement instances. What should you do?
You are evaluating a storage solution for your data. Your data is in a structured, non-relational format, and will be used for analysis. You need the lowest latency read and write speeds possible. Your data is about 3 TB in size, predicted to grow to up to 5 TB. What solution should you use?
Your company has a mission-critical application that serves users globally. You need to select a transactional and relational data storage system for this application. Which two products should you choose?
You have 250,000 devices which produce a JSON device status event every 10 seconds. You want to capture this event data for outlier time series analysis. What should you do?
For this question, refer to the TerramEarth case study: https://cloud.google.com/certification/guides/cloud-architect/casestudy-terramearth. TerramEarth's 20 million vehicles are scattered around the world. Based on the vehicle's location, its telemetry data is stored in a Google Cloud Storage (GCS) regional bucket (US, Europe, or Asia). The CTO has asked you to run a report on the raw telemetry data to determine why vehicles are breaking down after 100K miles. You want to run this job on all the data. What is the most cost-effective way to run this job?
You are developing an application on Google Cloud that will label famous landmarks in users’ photos. You are under competitive pressure to develop the predictive model quickly. You need to keep service costs low. What should you do?
You are creating a machine learning model for predicting a person's income given a variety of factors such as age, race, occupation, and others. What type of problem are we trying to solve in our prediction values?
Which of the following statements on Cloud Storage are true?
Your BigQuery table needs to be accessed by team members who are not proficient in technology. You want to simplify the columns they need to query to avoid confusion. How can you do this while preserving all of the data in your table?
You have a mission-critical database running on an instance on Google Compute Engine. You need to automate a database backup once per day to another disk. The database must remain fully operational and functional and can have no downtime. How can you best perform an automated backup of the database with minimal downtime and minimal costs?
In machine learning, what is the difference between test and training data?
You are planning the design of your Bigtable table, which will be used to collect speed limit data on highways. You anticipate needing to query by highway name, mile marker, and timestamp of the measurement taken. How should you design your schema in order to maximize efficiency, query all necessary data, and avoid hotspots in the row key?
For this question, refer to the Dress4Win case study: https://cloud.google.com/certification/guides/cloud-architect/casestudy-dress4win. Dress4Win is evaluating how their current database structure would translate to Google Cloud. They need to know which databases can be converted to a managed service, and which ones will need to remain unmanaged. They do not want to re-engineer their databases into a different format. Choose the two correct answers for their available options for database hosting, keeping in mind to use managed services where applicable. (Choose two)
You want to optimize the performance of an accurate, real-time, weather-charting application. The data comes from 50,000 sensors sending 10 readings a second, in the format of a timestamp and sensor reading. Where should you store the data?
While conducting BigQuery queries against a large table with many columns, you notice in the details section that you have a very large purple bar in the first stage of your query execution. How can you troubleshoot this to increase performance and reduce costs? (Choose all that apply)
Which of these open source frameworks is best suited to process simultaneous batch and streaming in a single data pipeline?
Your company is developing a next-generation pet collar that collects biometric information to assist potentially millions of families with promoting healthy lifestyles for their pets. Each collar will push 30 KB of biometric data in JSON format every 2 seconds to a collection platform that will process and analyze the data, providing health trending information back to the pet owners and veterinarians via a web portal. Management has tasked you with architecting the collection platform, ensuring the following requirements are met. 1. Provide the ability for real-time analytics of the inbound biometric data. 2. Ensure processing of the biometric data is highly durable, elastic, and parallel. 3. The results of the analytic processing should be persisted for data mining. Which architecture outlined below will meet the initial requirements for the platform?
You are transferring a very large number of small files to Google Cloud Storage from an on-premises location. You need to speed up the transfer of your files. Assuming a fast network connection, what two actions can you take to help speed up the process? Choose the 2 correct answers:
You are working on a project with two compliance requirements. The first requirement states that your developers should be able to see the Google Cloud Platform billing charges for only their projects. The second requirement states that your finance team members can set budgets and view the current charges for all projects in the organization. The finance team should not be able to view the project contents. You want to set permissions. What should you do?
You need to estimate the annual cost of running a BigQuery query that is scheduled to run nightly. What should you do?
What open source software is Datalab based on?
Which of the following Cloud Storage related statements are correct?
You are creating a machine learning model to predict the likelihood of fraud from credit card transaction data. What type of learning model problem is this?
You need to give a team member the ability to use a training model for predictions, but not have the ability to create or delete models. What IAM role should you assign to achieve this task with the minimum necessary permissions?
Your company processes high volumes of IoT data that are time-stamped. The total data volume can be several petabytes. The data needs to be written and changed at a high speed. You want to use the most performant storage option for your data. Which product should you use?
You are migrating your existing data center environment to Google Cloud Platform. You have a 1 petabyte Storage Area Network (SAN) that needs to be migrated. What GCP service will this data map to?
What is the process of loading Cloud SQL data into BigQuery for analysis?
Which of these numbers are adjusted by a machine learning neural network as it works with its training dataset? (Choose all that apply)
For this question, refer to the Dress4Win case study. As part of their new application experience, Dress4Win allows customers to upload images of themselves. The customer has exclusive control over who may view these images. Customers should be able to upload images with minimal latency and also be shown their images quickly on the main application page when they log in. Which configuration should Dress4Win use?
Which of these statements is true regarding BigQuery caching?
Pick two benefits of using denormalized data in BigQuery. (Choose all that apply)
Cloud Dataflow fully automates the management of processing resources.
Choose two best practices for creating more efficient queries and saving costs.
You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably. What should you do?
You want to display aggregate view counts for your YouTube channel data in Data Studio. You want to see the video titles and view counts summarized over the last 30 days. You also want to segment the data by the Country Code using the fewest possible steps. What should you do?
Your company currently hosts an AWS S3 bucket. You need to keep the contents of this bucket in sync with a new Google Cloud Storage bucket to support a backup storage destination. What is the best method to achieve this?
For future phases, Dress4Win is looking at options to deploy data analytics to the Google Cloud. Which option meets their business and technical requirements?
Your infrastructure includes two 100-TB enterprise file servers. You need to perform a one-way, one-time migration of this data to the Google Cloud securely. Only users in Germany will access this data. You want to create the most cost-effective solution. What should you do?
You are creating a solution to remove backup files older than 90 days from your backup Cloud Storage bucket. You want to optimize ongoing Cloud Storage spend. What should you do?
Your customer is moving their storage product to Google Cloud Storage (GCS). The data contains personally identifiable information (PII) and sensitive customer information. What security strategy should you use for GCS?
Which of the following statements are true?
How would you best connect your Dataflow pipeline to Bigtable for output?
Your BigQuery dataset contains 1500 tables. When conducting a query, you are limited to a maximum of 1000 tables that you can query at once. You need to query data across all 1500 tables. What should you do?
Which of the following can be used to get data into cloud storage?
What is the open source equivalent to Cloud Pub/Sub?
You have a streaming Dataflow pipeline that you need to shut down. You want data already in the pipeline to finish and be sent to output before shutting down. Which shutdown option should you use to complete the shutdown process?
You need to create a model that predicts stock prices given a variety of factors. What type of problem are we modeling for?
For this question, refer to the Mountkirk Games case study (https://cloud.google.com/certification/guides/cloud-architect/casestudy-mountkirkgames): Mountkirk Games needs to set up their game backend database. Based on their requirements, which storage service best fits their needs?
Your company is forecasting a sharp increase in the number and size of Apache Spark and Hadoop jobs being run on your local datacenter. You want to utilize the cloud to help you scale this upcoming demand with the least amount of operations work and code change. Which product should you use?
You are selecting a streaming service for log messages that must include final result message ordering as part of building a data pipeline on Google Cloud. You want to stream input for 5 days and be able to query the most recent message value. You will be storing the data in a searchable repository. How should you set up the input messages?
Which of these are valid IAM control options in BigQuery? (Choose all that apply)
For this question, refer to the JencoMart case study: https://cloud.google.com/certification/guides/cloud-architect/casestudy-jencomart. JencoMart has decided to migrate user profile storage to Google Cloud Datastore and the application servers to Google Compute Engine (GCE). During the migration, the existing infrastructure will need access to Datastore to upload the data. What service account key-management strategy should you recommend?
Which of the following statements are true?
You are working on a project with two compliance requirements. The first requirement states that your developers should be able to see the Google Cloud Platform billing charges for only their own projects. The second requirement states that your finance team members can set budgets and view the current charges for all projects in the organization. The finance team should not be able to view the project contents. You want to set permissions. What should you do?
Which of the following statements are true?
What happens to your Bigtable data when a Bigtable node suffers a critical failure?
Which of the following statements are true?
Your company is planning the infrastructure for a new large-scale application that will need to store more than 100 TB, possibly up to a petabyte, of data in NoSQL format for low-latency read/write and high-throughput analytics. Which storage option should you use?
You are building an application that needs to convert recorded customer service calls into text format, and will then examine call transcripts to determine customer sentiment. What is the most time effective method of doing this?
When developing your machine learning model, you need to tune your hyperparameters. Which of these answers are examples of hyperparameters? (Choose all that apply)
Your organization has migrated their Hadoop workloads to Cloud Dataproc. To fully take advantage of the cloud, you want to decouple your Hadoop storage and compute, and be able to destroy your cluster when compute is complete in order to save costs while preserving your data. What should you do?
You need to extract an address field from a multi-column element using Dataflow. Which mechanism is able to help with this task?
What software libraries does Cloud ML Engine support?
Your company wants to track whether someone is present in a meeting room reserved for a scheduled meeting. There are 1000 meeting rooms across 5 offices on 3 continents. Each room is equipped with a motion sensor that reports its status every second. The data from the motion detector includes only a sensor ID and several different discrete items of information. Analysts will use this data, together with information about account owners and office locations. Which database type should you use?
Your company wants to try out the cloud with low risk. They want to archive approximately 100 TB of their log data to the cloud and test the analytics features available to them there, while also retaining that data as a long-term disaster recovery backup. Which two steps should they take? Choose 2 answers
Which of the following statements on Cloud Bigtable are true?
Your App Engine application needs to store stateful data in a proper storage service. Your data is non-relational database data. You do not expect the database size to grow beyond 10 GB and you need to have the ability to scale down to zero to avoid unnecessary costs. Which storage service should you use?