pub trait Utf8Validator {
    unsafe fn new() -> Self
    where
        Self: Sized
; unsafe fn update(&mut self, input: &[u8]); unsafe fn finalize(self) -> Result<(), Utf8Error>; }
Expand description

A low-level interfacne for streaming validation of UTF-8 data. It is meant to be integrated in high-performance data processing pipelines.

Data can be streamed in arbitrarily-sized chunks using the Self::update() method. There is no way to find out if the input so far was valid UTF-8 during the validation. Only when the validation is completed with the Self::finalize() method the result of the validation is returned. Use ChunkedUtf8Validator if possible for highest performance.

This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to use it if the required CPU features are not available which is why all trait methods are unsafe.

General usage:

use simdutf8::basic::imp::Utf8Validator;
use std::io::{stdin, Read, Result};

fn main() -> Result<()> {
    unsafe {
        if !std::is_x86_feature_detected!("avx2") {
            panic!("This example only works with CPUs supporting AVX 2");
        }

        let mut validator = simdutf8::basic::imp::x86::avx2::Utf8ValidatorImp::new();
        let mut buf = vec![0; 8192];
        loop {
            let bytes_read = stdin().read(buf.as_mut())?;
            if bytes_read == 0 {
                break;
            }
            validator.update(&buf);
        }

        if validator.finalize().is_ok() {
            println!("Input is valid UTF-8");
        } else {
            println!("Input is not valid UTF-8");
        }
    }

    Ok(())
}

Required Methods§

Creates a new validator.

Safety

This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to call it if the required CPU features are not available.

Updates the validator with input.

Safety

This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to call it if the required CPU features are not available.

Finishes the validation and returns Ok(()) if the input was valid UTF-8.

Errors

A basic::Utf8Error is returned if the input was not valid UTF-8. No further information about the location of the error is provided.

Safety

This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to call it if the required CPU features are not available.

Implementors§