Problem description
I have data in BigQuery that I want to analyze on a Spark cluster. According to the documentation, a Spark cluster I create should come with a BigQuery connector. I looked for sample code that does this and found one in PySpark, but could not find any C# examples. I also found some documentation on the functions in the Dataproc API NuGet package.
I am looking for a sample that starts a Spark cluster in Google Cloud using C#.
After installing the Google.Apis.Dataproc.v1 NuGet package (version 1.10.0.40 or higher), you can use the following quick console app to create a Dataproc cluster in C#:
```csharp
using Google.Apis.Auth.OAuth2;
using Google.Apis.Services;
using Google.Apis.Dataproc.v1;
using Google.Apis.Dataproc.v1.Data;
using System;
using System.Threading;

namespace DataprocSample
{
    class Program
    {
        static void Main(string[] args)
        {
            string project = "YOUR PROJECT HERE";
            // "global" was the region used when this sample was written. Newer Dataproc
            // documentation uses a specific region (for example "us-east1") containing the zone below.
            string dataprocGlobalRegion = "global";
            string zone = "us-east1-b";
            string machineType = "n1-standard-4";
            string clusterName = "sample-cluster";
            int numWorkers = 2;

            // See the docs for Application Default Credentials (ADC):
            // https://developers.google.com/identity/protocols/application-default-credentials
            // When running as yourself, run 'gcloud auth application-default login' first.
            // If running from a VM, ensure the VM was started such that its service account
            // has the cloud-platform scope.
            // .Result blocks until the credential is loaded, which is acceptable in a console app.
            GoogleCredential credential = GoogleCredential.GetApplicationDefaultAsync().Result;
            if (credential.IsCreateScopedRequired)
            {
                credential = credential.CreateScoped(new[] { DataprocService.Scope.CloudPlatform });
            }

            DataprocService service = new DataprocService(
                new BaseClientService.Initializer()
                {
                    HttpClientInitializer = credential,
                    ApplicationName = "Dataproc Sample",
                });

            // Describe the new cluster: one master and numWorkers workers, all in the same zone.
            Cluster newCluster = new Cluster
            {
                ClusterName = clusterName,
                Config = new ClusterConfig
                {
                    GceClusterConfig = new GceClusterConfig
                    {
                        ZoneUri = String.Format(
                            "https://www.googleapis.com/compute/v1/projects/{0}/zones/{1}",
                            project, zone),
                    },
                    MasterConfig = new InstanceGroupConfig
                    {
                        NumInstances = 1,
                        MachineTypeUri = String.Format(
                            "https://www.googleapis.com/compute/v1/projects/{0}/zones/{1}/machineTypes/{2}",
                            project, zone, machineType),
                    },
                    WorkerConfig = new InstanceGroupConfig
                    {
                        NumInstances = numWorkers,
                        MachineTypeUri = String.Format(
                            "https://www.googleapis.com/compute/v1/projects/{0}/zones/{1}/machineTypes/{2}",
                            project, zone, machineType),
                    },
                },
            };

            // Create returns a long-running Operation, not the finished cluster.
            Operation createOperation =
                service.Projects.Regions.Clusters.Create(newCluster, project, dataprocGlobalRegion).Execute();

            // Poll the operation once per second until Dataproc reports it as done:
            while (!IsDone(createOperation))
            {
                Console.WriteLine("Polling operation {0}", createOperation.Name);
                createOperation =
                    service.Projects.Regions.Operations.Get(createOperation.Name).Execute();
                Thread.Sleep(1000);
            }

            // A finished operation can still have failed; Error is set in that case.
            if (createOperation.Error != null)
            {
                Console.WriteLine("Cluster creation failed: {0}", createOperation.Error.Message);
                return;
            }
            Console.WriteLine("Done creating cluster {0}", newCluster.ClusterName);
        }

        static bool IsDone(Operation op)
        {
            // Done is a nullable bool; treat "not set" as "not finished yet".
            return op.Done ?? false;
        }
    }
}
```
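The polling loop in the sample waits a fixed second between checks and never gives up. If you want a bounded wait, one option is a small generic helper with a timeout and exponential backoff. The sketch below is an assumption-based illustration, not part of the original answer or of the Google.Apis client libraries: `OperationPoller` and `PollUntilDone` are hypothetical names, and the helper uses only the .NET base library, so it can be tested without a Google Cloud project.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the Google.Apis client libraries.
// Calls fetch() until isDone(result) is true, doubling the wait between
// calls (capped at maxInterval) and throwing once the timeout is used up.
public static class OperationPoller
{
    public static T PollUntilDone<T>(
        Func<T> fetch,
        Func<T, bool> isDone,
        TimeSpan initialInterval,
        TimeSpan maxInterval,
        TimeSpan timeout)
    {
        if (initialInterval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(initialInterval));
        if (maxInterval < initialInterval)
            throw new ArgumentOutOfRangeException(nameof(maxInterval));

        Stopwatch elapsed = Stopwatch.StartNew();
        TimeSpan interval = initialInterval;
        T current = fetch();

        while (!isDone(current))
        {
            // Stop polling once the overall time budget has been spent.
            if (elapsed.Elapsed >= timeout)
                throw new TimeoutException($"Operation not done after {timeout}.");

            Thread.Sleep(interval);

            // Exponential backoff: double the interval, but never exceed maxInterval.
            TimeSpan doubled = TimeSpan.FromTicks(interval.Ticks * 2);
            interval = doubled < maxInterval ? doubled : maxInterval;

            current = fetch();
        }
        return current;
    }
}
```

In the console app, the `while` loop could be replaced with `OperationPoller.PollUntilDone(() => service.Projects.Regions.Operations.Get(opName).Execute(), IsDone, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), TimeSpan.FromMinutes(20))`, where `opName` holds `createOperation.Name`. The 30-second cap and 20-minute timeout are arbitrary illustrative values. You still need to check `Error` on the returned operation.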