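Trying Out Flink 1.17 on JDK 17

Preface

I had not even finished getting to grips with the older versions when Flink 1.17 arrived. Still, one has to keep looking ahead.

According to the official documentation: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/try-flink/local_installation/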

Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and Cygwin (for Windows). You need to have Java 11 installed

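So JDK 1.8 is clearly on its way out. Flink has been steering users toward JDK 11 since 1.15 (the release that marked Java 8 support as deprecated), and in my view the main attraction is being able to use ZGC, which is a better fit than G1. A throughput drop of around 15% can be offset by adding roughly 20% more machines, and a problem that a limited amount of money can solve is not much of a problem. But if an older collector hits a stop-the-world pause of several seconds, Flink's promise of sub-second real-time response goes out the window. ZGC keeps pauses within about 15 ms even on heaps of up to 4 TB, which suits Flink well.

ZGC reached GA in JDK 15 (enable it with -XX:+UseZGC). After 1.8 and 11, the next LTS release that Oracle promotes is 17, so JDK 17 is the future. Other teams have been running Flink 1.15 on JDK 17 in production for quite a while without any earth-shattering incidents, so stability should be acceptable.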
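
Whether a JVM really runs with ZGC can be checked from inside the process. The following is a minimal sketch using only the standard management API; GcCheck is a made-up helper name and has nothing to do with Flink itself.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Flink): reports which garbage collectors the running JVM uses. */
class GcCheck {

    /** Returns the names of the collector beans registered in this JVM. */
    static List<String> activeCollectors() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = activeCollectors();
        System.out.println("Active collectors: " + names);
        // Assumption: ZGC registers beans whose names contain "ZGC".
        boolean zgc = names.stream().anyMatch(name -> name.contains("ZGC"));
        System.out.println(zgc ? "ZGC is active" : "ZGC is not active (try -XX:+UseZGC)");
    }
}
```

Running it once with and once without -XX:+UseZGC should show the difference in the printed bean names.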

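However, on a Hive cluster, compatibility with Hive means JDK 1.8 is the only option: https://lizhiyong.blog.csdn.net/article/details/130799342

I have already been bitten by this. In that situation it is better to settle for an older Flink version.

Deploying JDK 17

Go to the Oracle website: https://www.oracle.com/java/technologies/downloads/#jdk17-windows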

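Windows 10 users can just download the Windows installer from that page and click Next all the way through; there is nothing special to configure.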

C:\Users\zhiyong>java -version
java version "17.0.7" 2023-04-18 LTS
Java(TM) SE Runtime Environment (build 17.0.7+8-LTS-224)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.7+8-LTS-224, mixed mode, sharing)

C:\Users\zhiyong>

Open CMD (Win+R, then cmd); if it prints the output above, JDK 17 has been deployed successfully.
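
The same check can be done in code, which is handy when an IDE quietly picks a different JDK than the one on the PATH. This is a small sketch; JdkGuard and the minimum of 17 are my own choices for illustration.

```java
/** Hypothetical start-up guard: fail fast when a job is launched on an unexpected JDK. */
class JdkGuard {

    /** Returns true when the given feature release (8, 11, 17, ...) meets the minimum. */
    static boolean isAtLeast(int actualFeature, int minimumFeature) {
        return actualFeature >= minimumFeature;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature(); // e.g. 17 for "17.0.7"
        if (!isAtLeast(feature, 17)) {
            throw new IllegalStateException("Expected JDK 17 or newer but found " + feature);
        }
        System.out.println("Running on JDK " + feature);
    }
}
```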

Linux users can download the Linux build directly from the same page.

On Ubuntu it is even simpler, so I will not go into the details.

Project configuration for Flink 1.17

As described on the official site: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/configuration/overview/

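Clearly the move away from Scala that started in Flink 1.15 has been thorough. In the future it will no longer be possible to write Flink jobs in Scala, so Scala users may want to switch to Spark sooner rather than later; Flink is courting Java users. Flink jobs written in Scala should also be ported to Java soon, because Flink 2.0 may well drop Scala altogether.

So there is no point agonizing over developing Flink jobs in Scala any more. Avoid it where possible.

Flink POM dependencies

The three core dependencies of a Flink project are therefore flink-streaming-java, flink-table-api-java, and flink-table-api-java-bridge.

Connectors and other dependencies are added only as needed. The Table Planner, for example, is a must for SQL-only developers, but Java developers may never need it. Trimming unnecessary dependencies is what keeps the fat JAR small.

Dependencies can be looked up on the Maven repository index: https://mvnrepository.com/

Times change, and many old dependencies stopped being updated long ago. One example is the Flink-to-HDFS connector I used to be very fond of: https://mvnrepository.com/artifact/org.apache.flink/flink-connector-filesystem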

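It has not been updated since the Flink 1.11 line, whose final patch release came out at the end of 2021. So it pays to keep up with the times.

Versions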

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
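        <!-- Note: the planner artifactId further down is built from scala.version, so it must hold the binary version (2.12). -->
        <scala.version>2.12</scala.version>
        <scala.binary.version>2.12.12</scala.binary.version>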
        <flink.version>1.17.0</flink.version>
        <encoding>UTF-8</encoding>
    </properties>

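Although Flink is abandoning Scala, a basic big data environment still cannot do without the Scala runtime, so these properties stay for now.

Core dependencies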

<dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
            <!--            <scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime</artifactId>
            <version>${flink.version}</version>
            <!--            <scope>test</scope>-->
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-planner -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <!--            <scope>test</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge</artifactId>
            <version>${flink.version}</version>
            <!--            <scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web</artifactId>
            <version>${flink.version}</version>
            <!--            <scope>test</scope>-->
        </dependency>
    </dependencies>

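Compare with the earlier Flink 1.14 setup: https://lizhiyong.blog.csdn.net/article/details/124161096

In Flink 1.17, many of the dependencies no longer carry a Scala version suffix.

Packaging

These days Flink jobs generally run on YARN or Kubernetes (K8s).

When it comes to multi-tenant resource isolation, multi-environment isolation, and dynamic scaling, YARN is the junior player; before long it will probably be washed away by the newer K8s, just as Mesos was.

The layer underneath all kinds of SaaS offerings is of course K8s as well. In a K8s environment, building a fat JAR and then building and deploying an image from it takes a lot of pressure off operations.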

<build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <exclude>com.google.code.findbugs:jsr305</exclude>
                                </excludes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <!-- Do not copy the signatures in the META-INF folder.
                                    Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <!-- Replace this with the main class of the fat JAR -->
                                    <mainClass>com.zhiyong.FlinkTest1</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

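The main thing to change is the mainClass element, which sets the entry point of the fat JAR. Flink jobs are usually written Java SE style, with one main method per job. Web developers used to Spring need not overthink this.

It has to be edited by hand before every build. Hmm.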
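
One way around the manual edit, sketched here under assumptions: keep a single fixed main class in the manifest and let the first program argument choose the job. JobLauncher, the Job interface and the job names test1/test2 are hypothetical, and the lambdas only print instead of calling the real job classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical dispatcher: one fixed mainClass in the shaded JAR, job chosen by the first argument. */
class JobLauncher {

    /** A job entry point with the same shape as a main method. */
    interface Job {
        void run(String[] args) throws Exception;
    }

    private static final Map<String, Job> JOBS = new LinkedHashMap<>();

    static {
        // In a real project these would delegate to e.g. FlinkTest1.main(args).
        JOBS.put("test1", args -> System.out.println("would start FlinkTest1"));
        JOBS.put("test2", args -> System.out.println("would start FlinkTest2"));
    }

    static boolean isKnown(String jobName) {
        return JOBS.containsKey(jobName);
    }

    /** Looks up the job named by args[0] and runs it with the remaining arguments. */
    static void dispatch(String[] args) throws Exception {
        if (args.length == 0 || !isKnown(args[0])) {
            throw new IllegalArgumentException("Usage: <job> [args...], known jobs: " + JOBS.keySet());
        }
        JOBS.get(args[0]).run(Arrays.copyOfRange(args, 1, args.length));
    }

    public static void main(String[] args) throws Exception {
        dispatch(args.length > 0 ? args : new String[] {"test1"});
    }
}
```

When a job is submitted through the Flink CLI, flink run -c with the fully qualified class name is another way to pick the entry class without rebuilding the JAR.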

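Stateless application

See the earlier Flink 1.14 article: https://lizhiyong.blog.csdn.net/article/details/123649447

The changes from 1.14 to 1.17 are somewhat smaller than those from 1.7 to 1.14.

Let us write a simple stateless application. First define an object: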

package com.zhiyong.common;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;

/**
 * @program: zhiyong_flink17
 * @description: car speed log record
 * @author: zhiyong
 * @create: 2023-05-04 22:55
 **/

// Requires the Lombok dependency (org.projectlombok:lombok), which is not listed in the POM above.
@Data
@AllArgsConstructor
@NoArgsConstructor
@ToString
public class CarLog {
    private String carCode;   // car code
    private String vin;     // vehicle identification number
    private int speed;      // speed in km/h
    private long logTime;   // record timestamp in ms

    private String gpsLongitude; // GPS longitude
    private String gpsLatitude; // GPS latitude
}

To keep things simple, Kafka is left out; a mock data source is written directly:

package com.zhiyong.common;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Random;

/**
 * @program: zhiyong_flink17
 * @description: car speed source that emits one record per second
 * @author: zhiyong
 * @create: 2023-05-04 22:49
 **/
public class CarSpeedSource1ps implements SourceFunction<CarLog> {

    // volatile: cancel() is called from a different thread than run()
    private volatile boolean needRun = true;

    @Override
    public void run(SourceContext<CarLog> sourceContext) throws Exception {
        Random random = new Random();
        // the car code is picked once per run() call
        String carCode = "car_" + random.nextInt(5);

        while (needRun) {
            // emit a fresh object per record instead of mutating a shared one
            CarLog carLog = new CarLog();
            carLog.setCarCode(carCode);
            // simulate a record written 50 to 549 ms ago
            carLog.setLogTime(System.currentTimeMillis() - 50 - random.nextInt(500));
            carLog.setSpeed(random.nextInt(150)); // 0 to 149 km/h

            sourceContext.collect(carLog);
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        needRun = false;
    }
}
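
The run/cancel pair of a SourceFunction is a cooperative stop: Flink calls cancel() from a different thread than the one executing run(), so the flag has to be visible across threads. The sketch below shows that contract in plain Java with no Flink classes; CancellableLoop and its counter are stand-ins invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java sketch of the run/cancel contract; no Flink classes involved. */
class CancellableLoop implements Runnable {

    // volatile: cancel() runs on another thread, so the write must be visible to run().
    private volatile boolean running = true;
    private final AtomicInteger emitted = new AtomicInteger();

    @Override
    public void run() {
        while (running) {
            emitted.incrementAndGet(); // stands in for sourceContext.collect(...)
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void cancel() {
        running = false;
    }

    int emittedCount() {
        return emitted.get();
    }

    /** Starts the loop, cancels it after a short while and reports whether the worker stopped. */
    static boolean stopsAfterCancel() throws InterruptedException {
        CancellableLoop loop = new CancellableLoop();
        Thread worker = new Thread(loop, "mock-source");
        worker.start();
        Thread.sleep(50);
        loop.cancel();
        worker.join(1000);
        return !worker.isAlive() && loop.emittedCount() > 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stopped after cancel: " + stopsAfterCancel());
    }
}
```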

A quick demonstration:

package com.zhiyong;

import com.zhiyong.common.CarLog;
import com.zhiyong.common.CarSpeedSource1ps;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @program: zhiyong_flink17
 * @description: Flink environment check
 * @author: zhiyong
 * @create: 2023-05-04 20:53
 **/
public class FlinkTest1 {
    public static void main(String[] args) throws Exception {
        System.out.println("JDK17");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // avoids the "insufficient network resources" error

        CarSpeedSource1ps carSpeedSource1ps = new CarSpeedSource1ps();
        DataStreamSource<CarLog> source1 = env.addSource(carSpeedSource1ps);
        DataStreamSource<CarLog> source2 = env.addSource(carSpeedSource1ps);


        source1.print("source1=>>>");
        source2.print("source2=>>>");


        env.execute();
    }
}

Running it at this point fails straight away. A pitfall.

The error

JDK17
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @7e0e6aa2
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
	at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:178)
	at java.base/java.lang.reflect.Field.setAccessible(Field.java:172)
	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:106)
	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:132)
	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:132)
	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:2317)
	at org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:202)
	at org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1244)
	at org.apache.flink.streaming.api.datastream.DataStream.print(DataStream.java:970)
	at com.zhiyong.FlinkTest1.main(FlinkTest1.java:28)

Process finished with exit code 1

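Clearly the Java version is the cause: the reflective access fails. Since JDK 16/17 the JDK internals are strongly encapsulated (JEP 396 and JEP 403), so the setAccessible call that Flink's ClosureCleaner makes on a private field of java.lang.String is rejected unless the package is explicitly opened.

In this case, specify the following VM options in IDEA:

--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.math=ALL-UNNAMED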
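
To check whether the options actually reached the JVM, a tiny probe can repeat the same kind of reflective call outside Flink. This is only a sketch; AddOpensProbe is a made-up name, and it tests java.lang alone, not the other packages in the option list.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

/** Hypothetical probe: checks whether java.lang is opened for deep reflection in this JVM. */
class AddOpensProbe {

    /** Returns "open" if String's private field can be made accessible, otherwise "blocked". */
    static String probeJavaLang() {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true); // the same kind of call that fails in the stack trace
            return "open";
        } catch (InaccessibleObjectException e) {
            return "blocked";
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("Unexpected JDK layout", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.base/java.lang is " + probeJavaLang());
    }
}
```

On JDK 17 it should report blocked without the options and open once --add-opens=java.base/java.lang=ALL-UNNAMED is passed.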

Now it runs normally:

JDK17
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
source2=>>>> CarLog(carCode=car_4, vin=null, speed=100, logTime=1683218141638, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=115, logTime=1683218141779, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=56, logTime=1683218142768, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=56, logTime=1683218142874, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=113, logTime=1683218143666, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=30, logTime=1683218143674, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=19, logTime=1683218144790, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=101, logTime=1683218144653, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=118, logTime=1683218145619, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=96, logTime=1683218146054, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=3, logTime=1683218146832, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=57, logTime=1683218147022, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=119, logTime=1683218147948, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=71, logTime=1683218147915, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_2, vin=null, speed=67, logTime=1683218148722, gpsLongitude=null, gpsLatitude=null)
source2=>>>> CarLog(carCode=car_4, vin=null, speed=20, logTime=1683218148742, gpsLongitude=null, gpsLatitude=null)

Process finished with exit code 130

The mock source produces the expected data, and the print operator displays it.

filterFunction

Write a simple filter class to detect speeding:

package com.zhiyong.trans.filter;

import com.zhiyong.common.CarLog;
import org.apache.flink.api.common.functions.FilterFunction;

/**
 * @program: zhiyong_flink17
 * @description: simple overspeed filter for cars
 * @author: zhiyong
 * @create: 2023-05-05 00:41
 **/
public class CarOverspeedFilter1 implements FilterFunction<CarLog> {
    @Override
    public boolean filter(CarLog carLog) throws Exception {
        // keep only records strictly above 120 km/h
        return carLog.getSpeed() > 120;
    }
}
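
The rule itself is just a predicate, so its boundary behaviour can be tried out without starting Flink. The sketch below restates it in plain Java; OverspeedRule and the SPEED_LIMIT_KMH constant are names I introduce here, not part of the project.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Plain-Java sketch of the overspeed rule; the named threshold is for illustration only. */
class OverspeedRule {

    static final int SPEED_LIMIT_KMH = 120;

    /** Strictly greater than the limit counts as speeding, so exactly 120 km/h passes. */
    static final Predicate<Integer> IS_OVERSPEED = speed -> speed > SPEED_LIMIT_KMH;

    static List<Integer> overspeedOnly(List<Integer> speeds) {
        return speeds.stream()
                .filter(IS_OVERSPEED)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(overspeedOnly(List.of(118, 67, 120, 121, 140))); // prints [121, 140]
    }
}
```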

sinkFunction

Write a simple sink class to display the result:

package com.zhiyong.sink;

import com.zhiyong.common.CarLog;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * @program: zhiyong_flink17
 * @description: simple sink that prints the result
 * @author: zhiyong
 * @create: 2023-05-05 00:46
 **/
public class CarOverspeedSink1 implements SinkFunction<CarLog> {
    //SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // date format

    @Override
    public void invoke(CarLog value, Context context) throws Exception {
        // Message reads "<car> at <time> speeding. Speed: <n>km/h"; the literals stay in Chinese to match the logged output.
        System.out.println(value.getCarCode() + "于" + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date(value.getLogTime())) + " 超速。速度:" + value.getSpeed() + "km/h");
    }
}
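
The sink builds a new SimpleDateFormat for every record. A possible alternative, sketched here, is a shared java.time formatter, which is immutable and thread-safe. LogTimeFormat is a hypothetical helper, and the Asia/Shanghai zone is my assumption, chosen because it reproduces the timestamps in the sample output of this post.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: formats epoch milliseconds with a shared, thread-safe formatter. */
class LogTimeFormat {

    // Assumption: records are rendered in UTC+8.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of("Asia/Shanghai"));

    static String format(long epochMillis) {
        return FORMATTER.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(1683219463688L)); // prints 2023-05-05 00:57:43
    }
}
```

Keeping the formatter in a static field also sidesteps serialization of the sink, since DateTimeFormatter itself is not Serializable.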

Viewing the result

SingleOutputStreamOperator<CarLog> overspeedLog = source1
        .filter(new CarOverspeedFilter1());

overspeedLog.addSink(new CarOverspeedSink1());

Wiring the filter and the sink together like this is all it takes; the run prints:

source1=>>>> CarLog(carCode=car_1, vin=null, speed=26, logTime=1683219460556, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=62, logTime=1683219461684, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=33, logTime=1683219462664, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=140, logTime=1683219463688, gpsLongitude=null, gpsLatitude=null)
car_1于2023-05-05 00:57:43 超速。速度:140km/h
source1=>>>> CarLog(carCode=car_1, vin=null, speed=20, logTime=1683219464711, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=38, logTime=1683219465884, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=21, logTime=1683219466499, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=110, logTime=1683219467705, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=98, logTime=1683219468560, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=122, logTime=1683219469470, gpsLongitude=null, gpsLatitude=null)
car_1于2023-05-05 00:57:49 超速。速度:122km/h
source1=>>>> CarLog(carCode=car_1, vin=null, speed=123, logTime=1683219470752, gpsLongitude=null, gpsLatitude=null)
car_1于2023-05-05 00:57:50 超速。速度:123km/h
source1=>>>> CarLog(carCode=car_1, vin=null, speed=80, logTime=1683219471615, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=16, logTime=1683219472708, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=118, logTime=1683219473563, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=67, logTime=1683219474634, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=120, logTime=1683219475746, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=121, logTime=1683219476736, gpsLongitude=null, gpsLatitude=null)
car_1于2023-05-05 00:57:56 超速。速度:121km/h
source1=>>>> CarLog(carCode=car_1, vin=null, speed=70, logTime=1683219477705, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=106, logTime=1683219478665, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=105, logTime=1683219479805, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=46, logTime=1683219480845, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=60, logTime=1683219481482, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=35, logTime=1683219482773, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=5, logTime=1683219483457, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_1, vin=null, speed=128, logTime=1683219484734, gpsLongitude=null, gpsLatitude=null)
car_1于2023-05-05 00:58:04 超速。速度:128km/h
source1=>>>> CarLog(carCode=car_1, vin=null, speed=135, logTime=1683219485590, gpsLongitude=null, gpsLatitude=null)
car_1于2023-05-05 00:58:05 超速。速度:135km/h
source1=>>>> CarLog(carCode=car_1, vin=null, speed=38, logTime=1683219486750, gpsLongitude=null, gpsLatitude=null)

Process finished with exit code 130

The overspeed records show up as expected.

Using the Table API

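// Needs: import static org.apache.flink.table.api.Expressions.$;
// plus EnvironmentSettings and Table (org.apache.flink.table.api)
// and StreamTableEnvironment (org.apache.flink.table.api.bridge.java).
EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().build();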
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

Table table1 = tableEnv.fromDataStream(source1);
Table table2 = table1.where($("speed").isGreater(120));
table2.printSchema();

tableEnv.toDataStream(table2,CarLog.class).print("Table API运算后的超速数据:");

After running:

(
  `carCode` STRING,
  `vin` STRING,
  `speed` INT NOT NULL,
  `logTime` BIGINT NOT NULL,
  `gpsLongitude` STRING,
  `gpsLatitude` STRING
)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=0, logTime=1683220824050, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=97, logTime=1683220824886, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=117, logTime=1683220825843, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=90, logTime=1683220827020, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=35, logTime=1683220828225, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=12, logTime=1683220828882, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=28, logTime=1683220830090, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=27, logTime=1683220831220, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=130, logTime=1683220832300, gpsLongitude=null, gpsLatitude=null)
Table API运算后的超速数据:> CarLog(carCode=car_3, vin=null, speed=130, logTime=1683220832300, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=49, logTime=1683220832950, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=64, logTime=1683220833906, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=105, logTime=1683220835212, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=33, logTime=1683220836274, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=110, logTime=1683220837162, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=44, logTime=1683220838144, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=49, logTime=1683220839290, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=148, logTime=1683220840343, gpsLongitude=null, gpsLatitude=null)
Table API运算后的超速数据:> CarLog(carCode=car_3, vin=null, speed=148, logTime=1683220840343, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=47, logTime=1683220841115, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=121, logTime=1683220842419, gpsLongitude=null, gpsLatitude=null)
Table API运算后的超速数据:> CarLog(carCode=car_3, vin=null, speed=121, logTime=1683220842419, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_3, vin=null, speed=147, logTime=1683220843040, gpsLongitude=null, gpsLatitude=null)
Table API运算后的超速数据:> CarLog(carCode=car_3, vin=null, speed=147, logTime=1683220843040, gpsLongitude=null, gpsLatitude=null)

Process finished with exit code 130

Clearly the Table API can also be used for this kind of data processing.

Using SQL

Table table3 = tableEnv.sqlQuery(
        "select " +
                "concat_ws(''," +
                "cast(carCode as string)," +
                "cast('于' as string)," +
                "cast(to_timestamp(from_unixtime(logTime/1000,'yyyy-MM-dd HH:mm:ss')) as string)," +// column names are case-sensitive; the BIGINT cannot be cast to a timestamp directly
                "cast('超速,速度:' as string)," +
                "cast(speed as string)," +
                "cast('km/h' as string)" +
                ") as message " +
                "from " + table1 +
                " where speed>120"
);
table3.printSchema();
tableEnv.toDataStream(table3, String.class).print("Table Sql运算后的超速数据:");

After running:

source1=>>>> CarLog(carCode=car_4, vin=null, speed=106, logTime=1683221978994, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_4, vin=null, speed=134, logTime=1683221979902, gpsLongitude=null, gpsLatitude=null)
Table Sql运算后的超速数据:> car_4于2023-05-05 01:39:39.000超速,速度:134km/h
source1=>>>> CarLog(carCode=car_4, vin=null, speed=142, logTime=1683221980929, gpsLongitude=null, gpsLatitude=null)
Table Sql运算后的超速数据:> car_4于2023-05-05 01:39:40.000超速,速度:142km/h
source1=>>>> CarLog(carCode=car_4, vin=null, speed=148, logTime=1683221981766, gpsLongitude=null, gpsLatitude=null)
Table Sql运算后的超速数据:> car_4于2023-05-05 01:39:41.000超速,速度:148km/h
source1=>>>> CarLog(carCode=car_4, vin=null, speed=148, logTime=1683221982592, gpsLongitude=null, gpsLatitude=null)
Table Sql运算后的超速数据:> car_4于2023-05-05 01:39:42.000超速,速度:148km/h
source1=>>>> CarLog(carCode=car_4, vin=null, speed=42, logTime=1683221983907, gpsLongitude=null, gpsLatitude=null)
source1=>>>> CarLog(carCode=car_4, vin=null, speed=110, logTime=1683221984943, gpsLongitude=null, gpsLatitude=null)

Process finished with exit code 130

Clearly SQL can also be used for stateless applications.
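
The .000 at the end of each timestamp follows from the query: logTime/1000 is an integer division, so the milliseconds are dropped before the value is turned back into a timestamp. The sketch below repeats that arithmetic in plain Java and does not call Flink; UnixTimeTruncation is a made-up name, and the UTC+8 zone and the three-digit fraction are assumptions taken from the sample output.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Plain-Java rendering of the arithmetic in the query; it does not call Flink. */
class UnixTimeTruncation {

    // Assumption: session time zone UTC+8, as the sample output suggests.
    private static final DateTimeFormatter MILLIS_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneId.of("Asia/Shanghai"));

    /** Mirrors logTime/1000: integer division drops the millisecond part before formatting. */
    static String truncatedToSeconds(long epochMillis) {
        long epochSeconds = epochMillis / 1000;
        return MILLIS_FORMAT.format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        System.out.println(truncatedToSeconds(1683221979902L)); // prints 2023-05-05 01:39:39.000
    }
}
```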

Errors encountered

A few errors came up during development, for example:

Exception in thread "main" org.apache.flink.table.api.ValidationException: SQL validation failed. From line 1, column 97 to line 1, column 103: Column 'logtime' not found in any table; did you mean 'logTime'?
	at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:187)
	at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:113)
	at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:281)
	at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:106)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:738)
	at com.zhiyong.FlinkTest1.main(FlinkTest1.java:51)
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, column 97 to line 1, column 103: Column 'logtime' not found in any table; did you mean 'logTime'?
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
	at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:932)
	at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:917)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5163)
	at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:268)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:6350)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$SelectExpander.visit(SqlValidatorImpl.java:6510)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$SelectExpander.visit(SqlValidatorImpl.java:6495)
	at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:324)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:134)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:101)
	at org.apache.calcite.sql.SqlOperator.acceptCall(SqlOperator.java:954)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visitScoped(SqlValidatorImpl.java:6370)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:54)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:37)
	at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:134)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:101)
	at org.apache.calcite.sql.SqlOperator.acceptCall(SqlOperator.java:954)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visitScoped(SqlValidatorImpl.java:6370)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:54)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:37)
	at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:134)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:101)
	at org.apache.calcite.sql.SqlOperator.acceptCall(SqlOperator.java:954)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visitScoped(SqlValidatorImpl.java:6370)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:54)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:37)
	at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:134)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:101)
	at org.apache.calcite.sql.SqlOperator.acceptCall(SqlOperator.java:954)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visitScoped(SqlValidatorImpl.java:6370)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:54)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:37)
	at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:134)
	at org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:101)
	at org.apache.calcite.sql.SqlAsOperator.acceptCall(SqlAsOperator.java:121)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visitScoped(SqlValidatorImpl.java:6370)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:54)
	at org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:37)
	at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
	at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.go(SqlValidatorImpl.java:6339)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectExpr(SqlValidatorImpl.java:5910)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:426)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelectList(SqlValidatorImpl.java:4324)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3570)
	at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:64)
	at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:89)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1042)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1017)
	at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:247)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:992)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:741)
	at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:183)
	... 5 more
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Column 'logtime' not found in any table; did you mean 'logTime'?
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
	at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
	... 61 more

Process finished with exit code 1

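Clearly Flink parses SQL with Calcite, and identifiers turn out to be case-sensitive: the column has to be written logTime, exactly as the field is declared. The usual SQL dialects (Hive QL, Spark SQL, MySQL, Oracle) do not behave this way for unquoted column names, and I am at a loss for words. Not being able to cast the epoch value straight to a timestamp is another nuisance, but fortunately there are functions to wrap around it.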
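
Since the column names come from the POJO's field names, a pre-flight check against the class can catch a wrong-case name before the query is submitted. The following sketch only imitates the "did you mean" hint from the error message and is not how Calcite works internally; ColumnNameCheck and the CarLogFields stand-in are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical pre-flight check; it imitates the hint in the error message, not Calcite itself. */
class ColumnNameCheck {

    /** Stand-in for CarLog with the same field names, so the sketch has no Lombok dependency. */
    static class CarLogFields {
        String carCode;
        String vin;
        int speed;
        long logTime;
    }

    /** Returns the exactly-cased field name when the column matches a declared field ignoring case. */
    static Optional<String> suggest(Class<?> type, String column) {
        return Arrays.stream(type.getDeclaredFields())
                .map(Field::getName)
                .filter(name -> name.equalsIgnoreCase(column))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(suggest(CarLogFields.class, "logtime")); // prints Optional[logTime]
        System.out.println(suggest(CarLogFields.class, "mileage")); // prints Optional.empty
    }
}
```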

Stateful application

Demo

package com.zhiyong;

import com.zhiyong.common.CarLog;
import com.zhiyong.source.CarSpeedSource1ps;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * @program: zhiyong_flink17_study
 * @description: Verifying a stateful Flink application
 * @author: zhiyong
 * @create: 2023-07-01 09:11
 **/
public class FlinkTest2 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        CarSpeedSource1ps carSpeedSource1ps = new CarSpeedSource1ps();

        DataStreamSource<CarLog> source1 = env.addSource(carSpeedSource1ps);
        DataStreamSource<CarLog> source2 = env.addSource(carSpeedSource1ps);

//        source1.print("source1=>>>");
//        source2.print("source2=>>>");

        DataStream<CarLog> source3 = source1.union(source2);

        source3.print("source3=>>>");

        SingleOutputStreamOperator<Object> source4 = source3
                .keyBy(new KeySelector<CarLog, Object>() {
                    @Override
                    public Object getKey(CarLog carLog) throws Exception {
                        return carLog.getCarCode();
                    }
                })
                .flatMap(new RichFlatMapFunction<CarLog, Object>() {

                    ValueState<Integer> valueState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        super.open(parameters);

                        // Integer is not a generic type, so passing the class is enough; no TypeHint is required.
                        ValueStateDescriptor<Integer> overSpeedCount =
                                new ValueStateDescriptor<>("overSpeedCount", Integer.class);

                        valueState = getRuntimeContext().getState(overSpeedCount);
                    }

                    @Override
                    public void flatMap(CarLog carLog, Collector<Object> collector) throws Exception {
                        Integer value = valueState.value();

                        if (null == value) {
                            value = Integer.valueOf(0);
                        }

                        if (carLog.getSpeed() > 120) {
                            value += 1;
                            valueState.update(value);
                        }

                        collector.collect(Tuple2.of(carLog.getCarCode(), value));

                    }
                });

        source4.print("source4=>>>");

        env.execute();
    }
}

This also runs fine:

source3=>>>> CarLog(carCode=car_3, vin=null, speed=24, logTime=1688178031808, gpsLongitude=null, gpsLatitude=null)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=10, logTime=1688178032050, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_2,0)
source4=>>>> (car_3,0)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=5, logTime=1688178032826, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_3,0)
source4=>>>> (car_2,0)
source3=>>>> CarLog(carCode=car_3, vin=null, speed=93, logTime=1688178033145, gpsLongitude=null, gpsLatitude=null)
source3=>>>> CarLog(carCode=car_3, vin=null, speed=130, logTime=1688178034046, gpsLongitude=null, gpsLatitude=null)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=8, logTime=1688178034202, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_2,0)
source4=>>>> (car_3,1)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=142, logTime=1688178034806, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_3,1)
source4=>>>> (car_2,1)
source3=>>>> CarLog(carCode=car_3, vin=null, speed=113, logTime=1688178034939, gpsLongitude=null, gpsLatitude=null)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=34, logTime=1688178036024, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_2,1)
source4=>>>> (car_3,2)
source3=>>>> CarLog(carCode=car_3, vin=null, speed=142, logTime=1688178035808, gpsLongitude=null, gpsLatitude=null)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=115, logTime=1688178037145, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_3,3)
source4=>>>> (car_2,1)
source3=>>>> CarLog(carCode=car_3, vin=null, speed=129, logTime=1688178036853, gpsLongitude=null, gpsLatitude=null)
source3=>>>> CarLog(carCode=car_2, vin=null, speed=99, logTime=1688178037973, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_2,1)
source3=>>>> CarLog(carCode=car_3, vin=null, speed=12, logTime=1688178037941, gpsLongitude=null, gpsLatitude=null)
source4=>>>> (car_3,3)

Process finished with exit code 130

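Stateful programming like this, an application that accumulates a per-car count of speeding events, also works without problems.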
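What the keyed ValueState does in the demo can be modeled without Flink as one counter per carCode. The sketch below is a simplification under that assumption: a HashMap stands in for Flink's keyed state (which Flink additionally scopes by key automatically and checkpoints), and the class name OverSpeedCounterSketch and the method onRecord are hypothetical. The threshold of 120 is taken from the demo.

```java
import java.util.HashMap;
import java.util.Map;

final class OverSpeedCounterSketch {
    static final int SPEED_LIMIT = 120;

    // Stand-in for keyed state: one overspeed counter per carCode.
    private final Map<String, Integer> countByCar = new HashMap<>();

    /** Processes one record and returns that car's overspeed count so far. */
    public int onRecord(String carCode, int speed) {
        // A missing entry plays the role of a null ValueState and counts as 0.
        int count = countByCar.getOrDefault(carCode, 0);
        if (speed > SPEED_LIMIT) {
            count += 1;
            countByCar.put(carCode, count); // corresponds to valueState.update(value)
        }
        return count;
    }

    public static void main(String[] args) {
        OverSpeedCounterSketch counter = new OverSpeedCounterSketch();
        // Speeds reported for car_3 in the log above, in order.
        int[] car3Speeds = {24, 93, 130, 113, 142, 129, 12};
        for (int speed : car3Speeds) {
            System.out.println("(car_3," + counter.onRecord("car_3", speed) + ")");
        }
    }
}
```

Fed the car_3 speeds from the log above, the counter yields 0, 0, 1, 1, 2, 3, 3, which matches the source4 lines for car_3.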

So far, this shows that Flink on JDK17 can basically handle normal use.

More pitfalls will have to be found along the way.

Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/131525151


07-04 07:21