Skip to main content

Streaming data from Cloud Storage into BigQuery using Cloud Functions


Using Cloud Storage from Google Cloud Platform (GCP) helps you securely store your data and integrate storage into your apps. For real-time analysis of Cloud Storage objects, you can use GCP’s BigQuery. There are a bunch of options for streaming data from a Cloud Storage bucket into a BigQuery table. We’ve put together a new solution guide to help you stream data quickly into BigQuery from Cloud Storage. We’ll discuss in this post how to continually copy newly created objects in Cloud Storage to BigQuery using Cloud Functions.
Using Cloud Functions lets you automate the process of copying objects to BigQuery for quick analysis, allowing you near real-time access to data uploaded to Cloud Storage. This means you can get better information faster, and respond more quickly to events that are happening in your business.
Cloud Functions is GCP’s event-driven, serverless compute platform, which provides automatic scaling, high availability, and fault tolerance with no servers to provision, manage, update, or patch. Streaming data using Cloud Functions lets you connect and extend other GCP services, while paying only when your app is running. 
Note that it’s also possible to stream data into BigQuery using Cloud Dataflow. Cloud Dataflow uses the Apache Beam framework, which provides windowing and session analysis primitives, as well as an ecosystem of source and sink connectors in Java, Python, and some other languages. 
However, if you’re not fluent in the Apache Beam API and you’re trying to ingest files without considering windowing or complex transformations, such as streaming small files directly into tables, Cloud Functions is a simple and effective option. 
You can also choose to use both Cloud Dataflow (Apache Beam) for complex ETL and large data sets, and Cloud Functions for small files and simpler transformations. 
How this Cloud Functions solution worksThe following architecture diagram illustrates the components and flow of a streaming pipeline created with Cloud Functions. This pipeline assumes that you’re uploading JSON files into Cloud Storage, so you’ll have to make minor changes to support other file formats.
Cloud Functions solution.png

You can see in Step 1 that JSON files are uploaded to Cloud Storage. Every time a new file is added to the 
FILES_SOURCE bucket, the streaming Cloud Function is triggered. This function parses the data, streams inserts into BigQuery (Step 3), logs the ingestion status into Cloud Firestore for deduping (Step 4), and publishes a message in one of the two Cloud Pub/Sub topics: streaming_success_topic, if everything went right; or streaming_error_topic, if any issue has happened (Step 5). At the end, either the streaming_error or the streaming_success Cloud Functions move the JSON file from the source bucket to either FILES_ERROR bucket or FILES_SUCCESS bucket (Step 6).
Using Cloud Functions means this architecture is not only simple but also flexible and powerful. Beyond lightweight scaling up and down to fit your file uploading capacity, Cloud Functions also allows you to implement custom functionalities, such as the use of Cloud Firestore.
Get started by visiting the solutions page tutorial, where you can get detailed information on how to implement this type of architecture. And try GCP to explore further.




Comments

Popular posts from this blog

Use Vault for Gmail Confidential Messages and Jamboard Files

Google vault will be supporting two new formats in the future, Gmail confidential mode emails & Jamboard files stored in Google Drive.
Google Vault gives you a chance to retain, hold, search, and export data to support your organization’s retention and eDiscovery needs. This dispatch includes support for new information types with the goal that you can thoroughly oversee your association's information.
What happens when individuals in your association sends confidential messages? Vault can hold, retain, search, and export all confidential mode messages sent by users in your association. Messages are constantly accessible to Vault, notwithstanding when the sender sets a termination date or denies access to private messages.
Here’s an example of what admin@ink-42.com will see in Vault when they search for sam@ink-42.com and preview this email sent by lisa@ink-42.com.
But It’ll not work vise versa. Admins can hold, retain, search and export message headers and subjects of external c…

Set start times and import reminders in Tasks

Here comes one of the most awaited features. Tasks is one of the goals to follow what you have to do in G Suite. These new updates will help ensure the majority of your to-dos are in Tasks, and guarantee that you can monitor the due dates related with them. Moreover, importing reminders to Tasks can support your users if your association is at present changing from Inbox to Gmail.
Set a date and time for your tasks and receive notifications - You’ll find a place to add date & time. Create repeating tasks - Also you can make an event recur.

Import reminders into Tasks
This import tool will pull your reminders (from Inbox/Gmail, Calendar, or the Assistant) into Tasks.When importing reminders into Tasks, we’ll copy over the title, date, time and recurrence of the reminder. Please note, reminders with locations associated will not be imported. Additionally, this is a one-time import and not a constant sync.
- When you open Tasks on the web or your mobile app, you’ll see a prompt to cop…

Save time with new scheduling features in Calendar

Save time with this new update. It allows you to schedule meetings easily.
Peek at calendars and automatically add guests - when you add a calendar in “search for people” box on the left hand side panel you can temporarily view coworkers’ calendars. Also at the same time those coworkers will be added automatically when creating an event.
More fields in the creation pop-up dialog - The Guests, Rooms, Location, Conferencing, and Description fields are currently editable  in the meeting creation popup dialog. When you include your colleagues' calendar, they'll load right in the background, making it much simpler and quicker to locate an accessible time for everybody. 









Follow us on -

Our Website - http://fcpl.biz/ Our Facebook page - https://lnkd.in/f6JpUPi?
Our LinkedIn page - https://www.linkedin.com/company/finetech-consultancy/ Talk to our experts - 071 0326326