I'm trying to use Vision and Core ML to perform style transfer on a tracked object as close to real time as possible. I'm using AVKit to capture video, and AVCaptureVideoDataOutputSampleBufferDelegate to get each frame (a rough capture-setup sketch is included after the pipeline outline below).
At a high level, my pipeline is:
1) Face detection
2) Update the preview layer to draw the bounding boxes at the right positions on screen
3) Crop the original image to the detected face
4) Run the face image through the Core ML model, which outputs a new image
5) Fill the bounding boxes on the preview layer (wherever they are) with the new images
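For context, here is a minimal sketch (my assumptions, not code from the question) of how the capture side of such a pipeline is commonly set up: a dedicated serial queue for sample-buffer delivery, with late frames discarded so a slow consumer never builds up a backlog. The class and queue names are placeholders.

import AVFoundation

final class CaptureController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    // Frames are delivered on this serial queue, never on the main thread.
    private let videoOutputQueue = DispatchQueue(label: "VideoOutputQueue") // hypothetical name

    func configureSession() throws {
        guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front) else { return }
        session.beginConfiguration()
        session.addInput(try AVCaptureDeviceInput(device: camera))

        let output = AVCaptureVideoDataOutput()
        // Drop frames that arrive while captureOutput(_:didOutput:from:) is still busy,
        // instead of queueing them up behind a slow Vision / Core ML stage.
        output.alwaysDiscardsLateVideoFrames = true
        output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
        output.setSampleBufferDelegate(self, queue: videoOutputQueue)
        session.addOutput(output)
        session.commitConfiguration()
    }

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Vision tracking / landmark detection happens here (see the question's code below).
    }
}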
I was hoping to place the bounding boxes as soon as they are computed (on the main thread) and then fill them in once inference completes. However, I'm finding that when I add the Core ML inference to the pipeline (on AVCaptureOutputQueue or CoreMLQueue), the bounding boxes don't update their positions until inference finishes. Maybe I'm missing something about how queues are handled in closures. The (hopefully) relevant parts of the code are below.
I'm adapting the code from https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time.
public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
                          from connection: AVCaptureConnection) {
    // omitting stuff that gets pixelBuffers etc. formatted for use with Vision
    // and sets up tracking requests

    // Perform landmark detection on tracked faces
    for trackingRequest in newTrackingRequests {
        let faceLandmarksRequest = VNDetectFaceLandmarksRequest(completionHandler: { (request, error) in
            guard let landmarksRequest = request as? VNDetectFaceLandmarksRequest,
                  let results = landmarksRequest.results as? [VNFaceObservation] else {
                return
            }

            // Perform all UI updates (drawing) on the main queue,
            // not the background queue on which this handler is being called.
            DispatchQueue.main.async {
                self.drawFaceObservations(results) // <<- places bounding box on the preview layer
            }

            CoreMLQueue.async { // queue for Core ML work
                // get region of picture to crop for Core ML
                let boundingBox = results[0].boundingBox
                // crop the input frame to the detected object
                let image: CVPixelBuffer = self.cropFrame(pixelBuffer: pixelBuffer, region: boundingBox)
                // infer on region
                let styleImage: CGImage = self.performCoreMLInference(on: image)
                // on the main thread, place styleImage into the bounding box (CAShapeLayer)
                DispatchQueue.main.async {
                    self.boundingBoxOverlayLayer?.contents = styleImage
                }
            }
        })

        do {
            try requestHandler.perform([faceLandmarksRequest])
        } catch let error as NSError {
            NSLog("Failed Request: %@", error)
        }
    }
}
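One way I could imagine decoupling the two stages (a sketch of an idea, not code from the original post) is to keep drawing the box on every frame, but only hand a frame to Core ML when no inference is already in flight, so the capture/Vision callbacks never wait on the model. It reuses the helper names from above (drawFaceObservations, cropFrame, performCoreMLInference, boundingBoxOverlayLayer, CoreMLQueue) and gates inference with a semaphore:

// Assumed to live in the same view controller as the code above.
private let inferenceGate = DispatchSemaphore(value: 1)

func handleLandmarkResults(_ results: [VNFaceObservation], pixelBuffer: CVPixelBuffer) {
    // 1) Always update the box position immediately; this never waits on Core ML.
    DispatchQueue.main.async {
        self.drawFaceObservations(results)
    }

    // 2) If an inference is still running, drop this frame rather than queueing it.
    guard let boundingBox = results.first?.boundingBox,
          inferenceGate.wait(timeout: .now()) == .success else { return }

    CoreMLQueue.async {
        defer { self.inferenceGate.signal() }
        let cropped = self.cropFrame(pixelBuffer: pixelBuffer, region: boundingBox)
        let styled = self.performCoreMLInference(on: cropped)
        // 3) Fill the box with the stylized image once it is ready.
        DispatchQueue.main.async {
            self.boundingBoxOverlayLayer?.contents = styled
        }
    }
}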
Aside from the queue/synchronization issue, I suspect that cropping the pixel buffer down to the region of interest may also be contributing to the slowdown. I'm out of ideas here; any help would be appreciated.
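Regarding the crop cost, here is a hedged sketch (my assumption of what cropFrame might look like, not the poster's implementation) using Core Image. It assumes the bounding box arrives in Vision's normalized, lower-left-origin coordinates and converts it with VNImageRectForNormalizedRect; the CIContext is created once and reused, since rebuilding it per frame is a common source of slowdown. Allocating a fresh CVPixelBuffer each frame (as below) is itself not free; a CVPixelBufferPool would be the next step if this shows up in profiling.

import CoreImage
import CoreVideo
import Vision

// Created once and reused for every frame.
private let ciContext = CIContext()

func cropFrame(pixelBuffer: CVPixelBuffer, region: CGRect) -> CVPixelBuffer {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    // Convert the normalized Vision rect into pixel coordinates.
    let cropRect = VNImageRectForNormalizedRect(region, width, height)

    let cropped = CIImage(cvPixelBuffer: pixelBuffer).cropped(to: cropRect)

    // Render the crop into a new pixel buffer sized to the crop region.
    var output: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault,
                        Int(cropRect.width), Int(cropRect.height),
                        CVPixelBufferGetPixelFormatType(pixelBuffer),
                        nil, &output)
    guard let outputBuffer = output else { return pixelBuffer }
    // Shift the cropped image so its origin lands at (0, 0) in the output buffer.
    ciContext.render(cropped.transformed(by: CGAffineTransform(translationX: -cropRect.origin.x,
                                                               y: -cropRect.origin.y)),
                     to: outputBuffer)
    return outputBuffer
}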
Best answer
I solved the synchronization issues with a pipeline built on https://github.com/maxvol/RxAVFoundation and https://github.com/maxvol/RxVision.
A basic example:
let textRequest: RxVNDetectTextRectanglesRequest<CVPixelBuffer> = VNDetectTextRectanglesRequest.rx.request(reportCharacterBoxes: true)
var session = AVCaptureSession.rx.session()
var requests = [RxVNRequest<CVPixelBuffer>]()

self.requests = [self.textRequest]

// Vision results arrive as an observable stream on the main scheduler,
// so result handling and UI updates stay ordered per frame.
self
    .textRequest
    .observable
    .observeOn(Scheduler.main)
    .subscribe { [unowned self] (event) in
        switch event {
        case .next(let completion):
            self.detectTextHandler(value: completion.value, request: completion.request, error: completion.error)
        default:
            break
        }
    }
    .disposed(by: disposeBag)

// Frames flow from the capture session; each sample buffer is handed to Vision
// together with the pixel buffer it came from, which keeps the two in sync.
self.session
    .flatMapLatest { [unowned self] (session) -> Observable<CaptureOutput> in
        let imageLayer = session.previewLayer
        imageLayer.frame = self.imageView.bounds
        self.imageView.layer.addSublayer(imageLayer)
        return session.captureOutput
    }
    .subscribe { [unowned self] (event) in
        switch event {
        case .next(let captureOutput):
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(captureOutput.sampleBuffer) else {
                return
            }
            var requestOptions: [VNImageOption: Any] = [:]
            if let camData = CMGetAttachment(captureOutput.sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) {
                requestOptions = [.cameraIntrinsics: camData]
            }
            let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: requestOptions)
            do {
                try imageRequestHandler.rx.perform(self.requests, with: pixelBuffer)
            } catch {
                os_log("error: %@", "\(error)")
            }
        case .error(let error):
            os_log("error: %@", "\(error)")
        case .completed:
            // never happens
            break
        }
    }
    .disposed(by: disposeBag)