GCP DATA
GCP Practice Questions about Storage, Databases, and Big Data
The database administration team has asked you to help them improve the performance of their new database server running on Google Compute Engine. The database is for importing and normalizing their performance statistics and is built with MySQL running on Debian Linux. They have an n1-standard-8 virtual machine with 80 GB of SSD persistent disk. What should they change to get better performance from this system?
What is Cloud Dataprep?
Your company is evaluating moving to Google Cloud. You will need to migrate your 3 TB on-premises MySQL databases to a managed database service in order to reduce administrative overhead. Minimal modification to the database is desired for the move. What managed database service would best meet this requirement?
Your company wants to reduce cost on infrequently accessed data by moving it to the cloud. The data will still be accessed approximately once a month to refresh historical charts. In addition, data older than 5 years is no longer needed. How should you store and manage the data?
Which of the following statements on Cloud BigQuery are true?
Your BigQuery table needs to be accessed by team members who are not proficient in technology. You want to simplify the columns they need to query to avoid confusion. How can you do this while preserving all of the data in your table?
You are building storage for files for a data pipeline on Google Cloud. You want to support JSON files. The schema of these files will occasionally change. Your analyst teams will run aggregate ANSI SQL queries on this data. What should you do?
The application reliability team at your company has added a debug feature to their backend service to send all server events to Google Cloud Storage for eventual analysis. The event records are at least 50 KB and at most 15 MB and are expected to peak at 3,000 events per second. You want to minimize data loss. Which process should you implement?
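A rough sizing sketch for the scenario above, assuming decimal units (1 MB = 1,000 KB, 1 GB = 1,000 MB). The constant names are illustrative and not part of the question; the sketch only bounds the peak ingest rate implied by the stated record sizes and event rate.

```python
# Figures taken from the question text.
PEAK_EVENTS_PER_SECOND = 3_000
MIN_EVENT_KB = 50          # smallest event record, in KB
MAX_EVENT_MB = 15          # largest event record, in MB

# Lower bound: every event at peak is the minimum size (KB -> MB).
min_mb_per_second = PEAK_EVENTS_PER_SECOND * MIN_EVENT_KB / 1_000

# Upper bound: every event at peak is the maximum size (MB -> GB).
max_gb_per_second = PEAK_EVENTS_PER_SECOND * MAX_EVENT_MB / 1_000

print(f"Peak ingest is between {min_mb_per_second:.0f} MB/s "
      f"and {max_gb_per_second:.0f} GB/s")
```

Whatever process is chosen has to sustain a rate somewhere in this range without dropping records.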
What are wide neural networks good for, compared to deep neural networks?
You have hundreds of IoT devices that generate 1 TB of streaming data per day. Due to latency, messages will often be delayed compared to when they were generated. You must be able to account for data arriving late within your processing pipeline. What should you do?
How would you best connect your Dataflow pipeline to Bigtable for output?
You host structured data for analysis for multiple clients in BigQuery. For organizational purposes, you need to store all of the different clients' data in a single project. You also need to be able to give your clients the ability to query their own data without having access to other clients' data. How can you best achieve this?
You created a job which runs daily to import highly sensitive data from an on-premises location to Cloud Storage. You also set up a streaming data insert into Cloud Storage via a Kafka node that is running on a Compute Engine instance. You need to encrypt the data at rest and supply your own encryption key. Your key should not be stored in the Google Cloud. What should you do?
In machine learning, what is the difference between test and training data?
What is the command for creating a storage bucket that has once per month access and is named 'archive_bucket'?
While conducting BigQuery queries against a large table with many columns, you notice in the details section that you have a very large purple bar in the first stage of your query execution. How can you troubleshoot this to increase performance and reduce costs? (Choose all that apply)
For this question, refer to the MJTelco case study here: https://cloud.google.com/certification/guides/data-engineer/casestudy-mjtelco. MJTelco is streaming telemetry data into BigQuery for long-term storage (2 years) and analysis, at the rate of about 100 million records per day. They need to be able to run queries against certain time periods of data without incurring the costs of querying all available records. What two options would you recommend for doing so? (Choose all that apply)
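A quick back-of-the-envelope count of the records involved, assuming 365-day years. The variable names are illustrative; the sketch only shows how much data a query would scan if it could not be limited to a time period.

```python
# Figures taken from the question text.
RECORDS_PER_DAY = 100_000_000
RETENTION_YEARS = 2
DAYS_PER_YEAR = 365  # assumption: leap days ignored

# Total records held once the full retention window is populated.
total_records = RECORDS_PER_DAY * DAYS_PER_YEAR * RETENTION_YEARS

# Share of the stored data that a single day represents.
one_day_fraction = RECORDS_PER_DAY / total_records

print(f"Total records retained: {total_records:,}")
print(f"One day is {one_day_fraction:.4%} of the stored data")
```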
A GCP user requires a service that can break down sentences supplied by users into tokens, identify the nouns, verbs, adjectives, and other parts of speech, and figure out the relationships among the words. Which of these GCP services would you recommend?
What happens to your Bigtable data when a Bigtable node suffers a critical failure?
You are creating a solution to remove backup files older than 90 days from your backup Cloud Storage bucket. You want to optimize ongoing Cloud Storage spend. What should you do?
Which Hadoop ecosystem service is most suited to storing on BigQuery instead?
Your App Engine application needs to store stateful data in a proper storage service. Your data is non-relational database data. You do not expect the database size to grow beyond 10 GB and you need to have the ability to scale down to zero to avoid unnecessary costs. Which storage service should you use?
Your company has successfully migrated to the cloud and wants to analyze their data stream to optimize operations. They do not have any existing code for this analysis, so they are exploring all their options. These options include a mix of batch and stream processing, as they are running some hourly jobs and live-processing some data as it comes in. Which technology should they use for this?
You are viewing the details of a recent large query and notice that Stage 1 has a full purple bar. What does this tell you?
You are designing storage for CSV files and using an I/O-intensive custom Apache Spark transform as part of deploying a data pipeline on Google Cloud. You are using ANSI SQL to run queries for your analysts. You want to support complex aggregate queries and reuse existing code. How should you transform the input data?
You have a project using BigQuery. You want to list all BigQuery jobs for that project. You want to set this project as the default for the bq command-line tool. What should you do?
Your company is forecasting a sharp increase in the number and size of Apache Spark and Hadoop jobs being run on your local datacenter. You want to utilize the cloud to help you scale this upcoming demand with the least amount of operations work and code change. Which product should you use?
What are the different partitioning methods on BigQuery? (Choose two)
Your company collects and stores security camera footage in Google Cloud Storage. Within the first 30 days, footage is processed regularly for threat detection, object detection, trend analysis, and suspicious behavior detection. You want to minimize the cost of storing all the data. How should you store the videos?
You need to extract an address field from a multi-column element using Dataflow. Which mechanism is able to help with this task?
Your company is developing a next-generation pet collar that collects biometric information to help potentially millions of families promote healthy lifestyles for their pets. Each collar will push 30 KB of biometric data in JSON format every 2 seconds to a collection platform that will process and analyze the data, providing health trending information back to pet owners and veterinarians via a web portal. Management has tasked you with architecting the collection platform, ensuring the following requirements are met: 1. Provide the ability for real-time analytics of the inbound biometric data. 2. Ensure processing of the biometric data is highly durable, elastic, and parallel. 3. The results of the analytic processing should be persisted for data mining. Which architecture outlined below will meet the initial requirements for the platform?
In Cloud ML Engine, what does the CUSTOM tier allow you to configure? Choose the best answer.
You are migrating your existing data center environment to Google Cloud Platform. You have a 1 petabyte Storage Area Network (SAN) that needs to be migrated. What GCP service will this data map to?
You are planning the design of your Bigtable table, which will be used to collect speed limit data on highways. You anticipate needing to query by highway name, mile marker, and timestamp of the measurement taken. How should you design your schema in order to maximize efficiency, query all necessary data, and avoid hotspots in the row key?
Which of the following is a GCP Machine Learning service?
For this question, refer to the MountKirk Games case study (https://cloud.google.com/certification/guides/cloud-architect/casestudy-mountkirkgames): MountKirk Games needs to build out their streaming data analytics pipeline to feed from their game backend application. What GCP services in which order will achieve this?
Your company is planning the infrastructure for a new large-scale application that will need to store over 100 TB, or even a petabyte, of data in NoSQL format for low-latency reads and writes and high-throughput analytics. Which storage option should you use?
For this question, refer to the Flowlogistic case study here: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic. Flowlogistic is ready to migrate their Hadoop workloads to Google Cloud. For the data migration, they need a cost-effective 'data lake' that will scale to their growing data needs and be able to easily connect to their Hadoop workloads in the cloud. What two actions should they perform? (Choose all that apply)
What is the open source equivalent to Cloud Pub/Sub?
You are developing an application that will process thousands of images and scan for explicit content. You need to develop your learning model quickly, and are not familiar with working in TensorFlow. How can you complete this task as quickly as possible while saving on costs?
Which of the following statements on Cloud Dataproc are true?
Which of these is not a valid BigQuery data format?
What types of jobs does Cloud Dataproc support? (Choose all that apply)
You are setting up Cloud Dataproc to perform some data transformations using Apache Spark jobs. The data will be used for a new set of non-critical experiments in your marketing group. You want to set up a cluster that can transform a large amount of data in the most cost-effective way. What should you do?
You are working on a project with two compliance requirements. The first requirement states that your developers should be able to see the Google Cloud Platform billing charges for only their projects. The second requirement states that your finance team members can set budgets and view the current charges for all projects in the organization. The finance team should not be able to view the project contents. You want to set permissions. What should you do?
Which of the following can be used to get data into Cloud Storage?
Your BigQuery dataset contains 1500 tables. When conducting a query, you are limited to a maximum of 1000 tables that you can query at once. You need to query data across all 1500 tables. What should you do?
Choose two best practices for creating more efficient queries and saving costs.
You are upgrading your existing (development) Cloud Bigtable instance for use in your production environment. The instance contains a large amount of data that you want to make available for production immediately. You need to design for fastest performance. What should you do?
What are two benefits of using denormalized data in BigQuery? (Choose all that apply)
Your customer is moving their storage product to Google Cloud Storage (GCS). The data contains personally identifiable information (PII) and sensitive customer information. What security strategy should you use for GCS?
You regularly use prefetch caching with a Data Studio report to visualize the results of BigQuery queries. You want to minimize service costs. What should you do?
Which of the following statements on BigQuery are true?
For this question, refer to the Flowlogistic case study here: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic. Flowlogistic's Kafka server cluster has been unable to scale to the demands of their data ingest needs. How can they migrate this functionality to Google Cloud to be able to scale for future growth?
What software libraries does Cloud ML Engine support?
Your company processes high volumes of IoT data that are time-stamped. The total data volume can be several petabytes. The data needs to be written and changed at a high speed. You want to use the most performant storage option for your data. Which product should you use?
Your company plans to host a large donation website on Google Cloud Platform. You anticipate a large and undetermined amount of traffic that will create many database writes. You want to be certain that you do not drop any writes to a database hosted on GCP. Which managed service should you use?
What IAM role do you need to grant to service accounts for Dataproc workloads, while offering the smallest scope of permissions?
For this question, refer to the TerramEarth case study: https://cloud.google.com/certification/guides/cloud-architect/casestudy-terramearth. TerramEarth plans to connect all 20 million vehicles in the field to the cloud. This increases the volume to 20 million 600-byte records per second, or about 40 TB an hour. How should you design the data ingestion?
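The volume figures in this question can be checked with a short calculation. This is only an arithmetic sketch with illustrative names; it reports the hourly total in both decimal terabytes and binary tebibytes, since the question does not say which unit it means.

```python
# Figures taken from the question text.
RECORDS_PER_SECOND = 20_000_000
RECORD_SIZE_BYTES = 600
SECONDS_PER_HOUR = 3_600

bytes_per_second = RECORDS_PER_SECOND * RECORD_SIZE_BYTES
bytes_per_hour = bytes_per_second * SECONDS_PER_HOUR

gb_per_second = bytes_per_second / 10**9   # decimal gigabytes
tb_per_hour = bytes_per_hour / 10**12      # decimal terabytes
tib_per_hour = bytes_per_hour / 2**40      # binary tebibytes

print(f"{gb_per_second:.1f} GB/s sustained")
print(f"{tb_per_hour:.1f} TB/hour ({tib_per_hour:.1f} TiB/hour)")
```

Both hourly values are close to the 40 TB an hour stated in the question, so the figures are consistent with each other.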
Which of the following Cloud Storage related statements are correct?
What is a difference between example (training) data and test data?
You are selecting a streaming service for log messages that must include final result message ordering as part of building a data pipeline on Google Cloud. You want to stream input for 5 days and be able to query the most recent message value. You will be storing the data in a searchable repository. How should you set up the input messages?
You have data stored in a Cloud Storage dataset and also in a BigQuery dataset. You need to secure the data and provide 3 different types of access levels for your Google Cloud Platform users: administrator, read/write, and read-only. You want to follow Google-recommended practices. What should you do?
Which of the following statements are true?
You want to display aggregate view counts for your YouTube channel data in Data Studio. You want to see the video titles and view counts summarized over the last 30 days. You also want to segment the data by the Country Code using the fewest possible steps. What should you do?
For this question, refer to the JencoMart case study. JencoMart wants to move their User Profiles database to Google Cloud Platform. Which Google Database should they use?
You are creating a machine learning model to predict the likelihood of fraud from credit card transaction data. What type of learning model problem is this?
For this question, refer to the Dress4Win case study https://cloud.google.com/certification/guides/cloud-architect/casestudy-dress4win. You want to ensure Dress4Win’s sales and tax records remain available for infrequent viewing by auditors for at least 10 years. Cost optimization is your top priority. Which cloud services should you choose?
Which of the following statements on Cloud BigQuery are true?
To run a local training job using the Google Cloud SDK, what command would you run?
Which of the following are supported Cloud Storage Object Lifecycle management features?
For this question, refer to the MJTelco case study here: https://cloud.google.com/certification/guides/data-engineer/casestudy-mjtelco. MJTelco needs to be able to reliably handle ever-increasing amounts of streaming telemetry data, process it, and economically store analyzed data. What services should they use for this task?
Your company has a mission-critical application that serves users globally. You need to select a transactional and relational data storage system for this application. Which two products should you choose?
You are configuring your Cloud Pub/Sub subscription. Assuming that all requirements are met, which subscription delivery method offers better 'near real-time' delivery of messages?
What types of Bigtable row keys can lead to hotspotting? (Choose all that apply)
As part of a complex rollout, you have hired a third party developer consultant to assist with creating your Dataflow processing pipeline. The data that this pipeline will process is very confidential, and the consultant cannot be allowed to view the data itself. What actions should you take so that they have the ability to help build the pipeline but cannot see the data it will process?
Which of the following statements are true?
Why would you want to train a machine learning model locally before deploying to Cloud ML Engine? (Choose all that apply)
You need to give a team member the ability to use a training model for predictions, but not have the ability to create or delete models. What IAM role should you assign to achieve this task with the minimum necessary permissions?
Which of these statements is true regarding BigQuery caching?
As part of your backup plan, you set up regular snapshots of Compute Engine instances that are running. You want to be able to restore these snapshots using the fewest possible steps for replacement instances. What should you do?
You are transferring a very large number of small files to Google Cloud Storage from an on-premises location. You need to speed up the transfer of your files. Assuming a fast network connection, what two actions can you take to help speed up the process? (Choose two)
When should you use a hard disk drive (HDD) in Bigtable vs. a solid state drive (SSD)? (Choose all that apply)
Why do you want to train a machine learning model locally before training on cloud resources? (Choose all that apply)
You need to choose a managed database solution for an upcoming application your company is designing. The database will store transactional, non-relational data; however, atomicity is required for strong consistency. Which managed database solution should you choose?
Which of these open source frameworks is best suited to process simultaneous batch and streaming in a single data pipeline?
You are creating a machine learning model for predicting a person's income given a variety of factors such as age, race, occupation, and others. What type of problem are we trying to solve in our prediction values?
A GCP user wishes to develop an application that annotates video footage into a variety of formats, identifying key entities within video and when they occur, and making the video content searchable and discoverable. Which of the following services would you recommend?
You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably. What should you do?
You currently have a Bigtable instance you've been using for development running a development instance type, using HDDs for storage. You are ready to upgrade your development instance to a production instance for increased performance. You also want to upgrade your storage to SSDs as you need maximum performance for your instance. What should you do?
You have a mission-critical database running on an instance on Google Compute Engine. You need to automate a database backup o