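Chapter 2 of Hadoop: The Definitive Guide ("About MapReduce") discusses a weather data analysis problem and solves it with a Unix script. I took some time to download the weather data, put it on a server, and test it with the script from the book. In practice I found mistakes in the book. I have read few books over the years, and this is the first time I have spotted errors in one. They are circled in red: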
[Images: Hadoop weather data analysis, Unix script verification (awk)]
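Following the approach above, I downloaded the files from ftp://ftp.ncdc.noaa.gov/pub/data/noaa, uploaded them to the server with FlashFXP over SFTP, used 10 years of data, and then wrote a script to test: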

    [yangkai@localhost ~]$ cat max_temperatuer.sh
    #!/bin/bash
    for year in raw/*
    #for year in all/*
    do
    echo -ne $(basename $year)"\t"
    #echo -ne ${year}"\n"
    gunzip -c ${year}/* |\
    awk '{temp=substr($0,88,5)+0;
    q=substr($0,93,1);
    if(temp !=9999 && q~/[01459]/ && temp>max)max=temp}
    END {print max}'
    done
    exit
    [yangkai@localhost ~]$ sh max_temperatuer.sh
    1901 317
    1902 244
    1903 289
    1904 256
    1905 283
    1906 294
    1907 283
    1908 289
    1909 278
    1910 294
    [yangkai@localhost ~]$ ls
    029070-99999-1901 for.sh max_temperatuer.sh raw yjdmdp.tar.gz
    [yangkai@localhost ~]$ ll raw/
    total 40
    drwxr-xr-x 2 yangkai root 4096 May 19 10:50 1901
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1902
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1903
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1904
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1905
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1906
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1907
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1908
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1909
    drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1910
    [yangkai@localhost ~]$ ll raw/1901/
    total 72
    -rw-r--r-- 1 yangkai root 11445 Nov 23 2004 029070-99999-1901.gz
    -rw-r--r-- 1 yangkai root 11210 Nov 23 2004 029500-99999-1901.gz
    -rw-r--r-- 1 yangkai root 11647 Nov 23 2004 029600-99999-1901.gz
    -rw-r--r-- 1 yangkai root 10998 Nov 23 2004 029720-99999-1901.gz
    -rw-r--r-- 1 yangkai root 11999 Nov 23 2004 029810-99999-1901.gz
    -rw-r--r-- 1 yangkai root 11132 Nov 23 2004 227070-99999-1901.gz
    [yangkai@localhost ~]$
    [yangkai@localhost ~]$ gunzip -c ./029720-99999-1901.gz |head
    0029029720999991901010106004+60450+022267FM-12+001499999V0209991C000019999999N0000001N9-02061+99999102601ADDGF108991999999999999999999
    0029029720999991901010113004+60450+022267FM-12+001499999V0202001N001019999999N0000001N9-01561+99999102621ADDGF108991999999999999999999
    0029029720999991901010120004+60450+022267FM-12+001499999V0201801N001019999999N0000001N9-01391+99999102461ADDGF108991999999999999999999
    0029029720999991901010206004+60450+022267FM-12+001499999V0202301N009319999999N0000001N9-00781+99999102311ADDGF108991999999999999999999
    0029029720999991901010213004+60450+022267FM-12+001499999V0202301N012319999999N0000001N9-00391+99999102321ADDGF108991999999999999999999
    0029029720999991901010220004+60450+022267FM-12+001499999V0202501N012319999999N0000001N9-00331+99999102241ADDGF108991999999999999999999
    0029029720999991901010306004+60450+022267FM-12+001499999V0202701N015419999999N0000001N9-00391+99999102391ADDGF108991999999999999999999
    0029029720999991901010313004+60450+022267FM-12+001499999V0202301N015419999999N0000001N9-00331+99999102301ADDGF108991999999999999999999
    0029029720999991901010320004+60450+022267FM-12+001499999V0202701N015419999999N0000001N9-00391+99999102161ADDGF108991999999999999999999
    0029029720999991901010406004+60450+022267FM-12+001499999V0202301N002619999999N0000001N9-00331+99999102191ADDGF108991999999999999999999
    [yangkai@localhost ~]$
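To check the column offsets the awk filter relies on, here is a minimal sketch that runs the same `substr` calls against two of the sample records printed above. The helper names `extract_temp_quality` and `max_temp` are hypothetical and not part of the book's script. `max_temp` is also a variation of my own, not the book's filter: it seeds `max` from the first valid reading, because awk treats an unset `max` as 0 in the comparison, so input whose valid readings are all negative (like these January records) would otherwise leave `max` empty.

```shell
#!/bin/bash
# Two records copied from the gunzip output above (station 029720, January 1901).
record='0029029720999991901010106004+60450+022267FM-12+001499999V0209991C000019999999N0000001N9-02061+99999102601ADDGF108991999999999999999999'
record2='0029029720999991901010206004+60450+022267FM-12+001499999V0202301N009319999999N0000001N9-00781+99999102311ADDGF108991999999999999999999'

# Hypothetical helper: print the temperature (tenths of a degree Celsius)
# and the quality code of a single record, separated by a tab.
extract_temp_quality() {
  printf '%s\n' "$1" | awk '{
    temp = substr($0, 88, 5) + 0   # columns 88-92, signed; adding 0 forces a number
    q    = substr($0, 93, 1)       # column 93, quality code
    printf "%d\t%s\n", temp, q
  }'
}

# Hypothetical variant of the filter (an assumption, not the book version):
# max is seeded from the first valid reading instead of starting out unset.
max_temp() {
  awk '{
    temp = substr($0, 88, 5) + 0
    q    = substr($0, 93, 1)
    if (temp != 9999 && q ~ /[01459]/ && (!seen || temp > max)) {
      max = temp
      seen = 1
    }
  }
  END { if (seen) print max }'
}

extract_temp_quality "$record"                      # prints -206<TAB>1
printf '%s\n%s\n' "$record" "$record2" | max_temp   # prints -78
```

For the ten-year run above this makes no difference, since every year has readings above zero, but it matters when the filter is tried on a small sample such as the `head` output.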
The end.
09-25 14:15