Question
The code below reads from a data source (following all the reading rules of io.Reader) whose text consists only of characters with one-byte UTF-8 encodings:
package main

import (
	"fmt"
	"io"
)

type MyStringData struct {
	str       string
	readIndex int
}

func (myStringData *MyStringData) Read(p []byte) (n int, err error) {
	// convert `str` string to slice of bytes
	strBytes := []byte(myStringData.str)

	// if `readIndex` is GTE source length, return `EOF` error
	if myStringData.readIndex >= len(strBytes) {
		return 0, io.EOF // `0` bytes read
	}

	// get next readable limit (exclusive)
	nextReadLimit := myStringData.readIndex + len(p)
	if nextReadLimit >= len(strBytes) {
		nextReadLimit = len(strBytes)
		err = io.EOF
	}

	// get next bytes to copy and set `n` to its length
	nextBytes := strBytes[myStringData.readIndex:nextReadLimit]
	n = len(nextBytes)

	// copy all bytes of `nextBytes` into `p` slice
	copy(p, nextBytes)

	// increment `readIndex` to `nextReadLimit`
	myStringData.readIndex = nextReadLimit

	// return values
	return
}

func main() {
	// create data source
	src := MyStringData{str: "Hello Amazing World!"} // 学中文
	p := make([]byte, 3)                             // slice of length `3`

	// read `src` until an error is returned
	for {
		// read up to `len(p)` bytes from `src`
		n, err := src.Read(p)
		fmt.Printf("%d bytes read, data:%s\n", n, p[:n])

		// handle error
		if err == io.EOF {
			fmt.Println("--end-of-file--")
			break
		} else if err != nil {
			fmt.Println("Oops! some error occurred!", err)
			break
		}
	}
}
Output:
$
$
$ go run src/../Main.go
3 bytes read, data:Hel
3 bytes read, data:lo
3 bytes read, data:Ama
3 bytes read, data:zin
3 bytes read, data:g W
3 bytes read, data:orl
2 bytes read, data:d!
--end-of-file--
$
$
However, the above code cannot correctly read a data source whose text contains characters with UTF-8 encodings longer than one byte, as shown below:
src := MyStringData{str: "Hello Amazing World!学中文"}
The output is:
$
$
$ go run src/../Main.go
3 bytes read, data:Hel
3 bytes read, data:lo
3 bytes read, data:Ama
3 bytes read, data:zin
3 bytes read, data:g W
3 bytes read, data:orl
3 bytes read, data:d!�
3 bytes read, data:���
3 bytes read, data:���
2 bytes read, data:��
--end-of-file--
$
$
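To see why the three-byte buffer garbles the tail of the text, it may help to look at how many bytes each character occupies. The following is only a sketch using the standard unicode/utf8 package; the helper name runeSizes is made up for illustration and is not part of the original code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeSizes is a hypothetical helper: it returns the UTF-8 encoded
// length in bytes of every rune in s.
func runeSizes(s string) []int {
	sizes := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		sizes = append(sizes, utf8.RuneLen(r))
	}
	return sizes
}

func main() {
	s := "d!学中文"

	// byte length vs. number of runes
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 11 5

	// each Chinese character takes three bytes
	fmt.Println(runeSizes(s)) // [1 1 3 3 3]

	// a 3-byte read starting at "d" ends after the first byte of 学,
	// and that prefix is not valid UTF-8 on its own
	fmt.Println(utf8.ValidString(s[:3])) // false
}
```

Because Read works on bytes and knows nothing about rune boundaries, any fixed-size buffer can end in the middle of a multi-byte character.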
Following the comments that suggested using strings.NewReader(), the code was modified as below:
// create data source
src := strings.NewReader("Hello Amazing World!学中文") // 学中文
// p := make([]byte, 3) // slice of length `3`

// read `src` until an error is returned
for {
	// read one rune from `src`
	ch, n, err := src.ReadRune()
	// n, err := src.Read(p)
	fmt.Printf("%d bytes read, data:%c\n", n, ch)

	// handle error
	if err == io.EOF {
		fmt.Println("--end-of-file--")
		break
	} else if err != nil {
		fmt.Println("Oops! some error occurred!", err)
		break
	}
}
How can Unicode characters be read without a character (for example 学) being split across two Read calls?
Recommended answer
Use something from bufio, e.g. a bufio.Reader's ReadRune method, or a bufio.Scanner with a split function that only returns one or more complete runes (using DecodeRune and FullRune from unicode/utf8 to validate, as the stdlib bufio.ScanRunes does).
You could do it yourself by buffering incomplete runes in a slice and appending to it with successive reads, but that would just duplicate what Scanner does.