Hadoop cannot connect to Google Cloud Storage
I'm trying to connect Hadoop running on a Google Cloud VM to Google Cloud Storage. I have:

- Modified core-site.xml to include the properties fs.gs.impl and fs.AbstractFileSystem.gs.impl
- Downloaded gcs-connector-latest-hadoop2.jar and referenced it in a generated hadoop-env.sh
- Authenticated via gcloud auth login using my personal account (instead of a service account)

I'm able to run gsutil ls gs://mybucket/ without any issues, but when I execute hadoop fs -ls gs://mybucket/ I get the output:

14/09/30 23:29:31 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.2.9-hadoop2
ls: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token

What steps am I missing for Hadoop to be able to see Google Cloud Storage? Thanks!
By default, when running on Google Compute Engine the gcs-connector is optimized for the built-in service-account mechanism. To force it to use the OAuth2 flow instead, a few extra configuration keys need to be set. You can borrow the same “client_id” and “client_secret” that gcloud auth uses and add them to your core-site.xml, also disabling fs.gs.auth.service.account.enable:
<property>
  <name>fs.gs.auth.service.account.enable</name>
  <value>false</value>
</property>
<property>
  <name>fs.gs.auth.client.id</name>
  <value>32555940559.apps.googleusercontent.com</value>
</property>
<property>
  <name>fs.gs.auth.client.secret</name>
  <value>ZmssLNjJy2998hD4CTg2ejr2</value>
</property>
You can optionally also set fs.gs.auth.client.file to something other than its default location.
If you do this, then when you run hadoop fs -ls gs://mybucket you’ll see a new prompt, similar to the “gcloud auth login” prompt, where you’ll visit a browser and enter a verification code again. Unfortunately, the connector can’t quite consume a “gcloud” generated credential directly, even though it can possibly share a credentialstore file, since it asks explicitly for the GCS scopes that it needs (you’ll notice that the new auth flow will ask only for GCS scopes, as opposed to a big list of services like “gcloud auth login”).
Make sure you’ve also set fs.gs.project.id in your core-site.xml, since the GCS connector likewise doesn’t automatically infer a default project from the related gcloud auth:

<property>
  <name>fs.gs.project.id</name>
  <value>your-project-id</value>
</property>
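Putting the pieces from this answer together, a complete core-site.xml might look roughly like the sketch below. The two implementation class names are the ones the gcs-connector jar normally provides; the question only mentions the property names, so treat those values as an assumption to check against your connector version. your-project-id is a placeholder.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Register the connector for the gs:// scheme.
       Class names are assumed from the gcs-connector jar; verify for your version. -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>

  <!-- The connector does not infer a project from gcloud; placeholder value. -->
  <property>
    <name>fs.gs.project.id</name>
    <value>your-project-id</value>
  </property>

  <!-- Turn off the metadata-server service-account lookup and use the
       installed-app OAuth2 flow with the client id/secret from this answer. -->
  <property>
    <name>fs.gs.auth.service.account.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.gs.auth.client.id</name>
    <value>32555940559.apps.googleusercontent.com</value>
  </property>
  <property>
    <name>fs.gs.auth.client.secret</name>
    <value>ZmssLNjJy2998hD4CTg2ejr2</value>
  </property>
</configuration>
```

With a file like this in place, the first hadoop fs -ls gs://mybucket should trigger the browser verification prompt described above.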
Thanks very much for both of your answers! Your answers led me to the configuration as noted in Migrating 50TB data from local Hadoop cluster to Google Cloud Storage.
I was able to utilize the fs.gs.auth.service.account.keyfile by generating a new service account and then applying the service account email address and p12 key.
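For anyone following the same route, a service-account setup along those lines might look like the sketch below. Only fs.gs.auth.service.account.keyfile is named above; the fs.gs.auth.service.account.email key is assumed to follow the same naming pattern, and the email address and key path are hypothetical placeholders.

```xml
<!-- Sketch only: added to core-site.xml in place of the client id/secret settings. -->
<property>
  <name>fs.gs.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <!-- Assumed key name; placeholder service account email address. -->
  <name>fs.gs.auth.service.account.email</name>
  <value>your-service-account@developer.gserviceaccount.com</value>
</property>
<property>
  <!-- Placeholder path to the downloaded p12 key file. -->
  <name>fs.gs.auth.service.account.keyfile</name>
  <value>/path/to/your-key.p12</value>
</property>
```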
It looks like the instance itself isn’t configured to use the correct service account (but the gsutil command line utility is). The Hadoop file system adaptor looks like it’s not pulling those credentials.
Hope this helps!