Trait simdutf8::basic::imp::Utf8Validator
pub trait Utf8Validator {
    unsafe fn new() -> Self
    where
        Self: Sized;
    unsafe fn update(&mut self, input: &[u8]);
    unsafe fn finalize(self) -> Result<(), Utf8Error>;
}
A low-level interface for streaming validation of UTF-8 data. It is meant to be integrated into high-performance data processing pipelines.
Data can be streamed in arbitrarily sized chunks using the Self::update() method. There is no way to find out during validation whether the input so far was valid UTF-8. The result is returned only when validation is completed with the Self::finalize() method. Use ChunkedUtf8Validator if possible for highest performance.
This implementation requires the CPU SIMD features specified by the module it resides in. It is undefined behavior to use it if the required CPU features are not available, which is why all trait methods are unsafe.
General usage:
use simdutf8::basic::imp::Utf8Validator;
use std::io::{stdin, Read, Result};

fn main() -> Result<()> {
    unsafe {
        // The AVX2 implementation must only be used if the CPU supports AVX2.
        if !std::is_x86_feature_detected!("avx2") {
            panic!("This example only works with CPUs supporting AVX 2");
        }

        let mut validator = simdutf8::basic::imp::x86::avx2::Utf8ValidatorImp::new();
        let mut buf = vec![0; 8192];
        loop {
            let bytes_read = stdin().read(buf.as_mut())?;
            if bytes_read == 0 {
                break;
            }
            // Pass only the bytes filled by this read; the rest of `buf` is stale.
            validator.update(&buf[..bytes_read]);
        }

        if validator.finalize().is_ok() {
            println!("Input is valid UTF-8");
        } else {
            println!("Input is not valid UTF-8");
        }
    }

    Ok(())
}
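The streaming contract described above (chunks of any size, verdict only at the end) can be illustrated without SIMD. The following is a minimal sketch built on `std::str::from_utf8` only; `StdStreamingValidator` and `validate_in_chunks` are hypothetical names introduced for illustration and are not part of simdutf8, nor do they reflect how its implementations work internally. The sketch shows why a chunk boundary may fall inside a multi-byte character: an incomplete trailing sequence is carried over to the next `update()` call and only counts as an error if it is still incomplete at `finalize()`.

```rust
/// Illustrative stand-in built on `std` only; NOT simdutf8's implementation.
pub struct StdStreamingValidator {
    /// Bytes of a possibly incomplete trailing sequence from earlier chunks.
    pending: Vec<u8>,
    /// Set once any chunk contained a definitely invalid sequence.
    failed: bool,
}

impl StdStreamingValidator {
    pub fn new() -> Self {
        Self { pending: Vec::new(), failed: false }
    }

    /// Feeds one chunk. No verdict is reported here, mirroring the trait.
    pub fn update(&mut self, input: &[u8]) {
        if self.failed {
            return;
        }
        self.pending.extend_from_slice(input);
        // Decide how many leading bytes can be dropped after this chunk.
        let validated = match std::str::from_utf8(&self.pending) {
            // Everything seen so far is complete and valid.
            Ok(_) => self.pending.len(),
            // `error_len() == None` means the input ended mid-character:
            // keep the incomplete tail and wait for more data.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // A definitely invalid sequence: remember the failure.
            Err(_) => {
                self.failed = true;
                self.pending.len()
            }
        };
        self.pending.drain(..validated);
    }

    /// Reports the result; a leftover incomplete sequence is an error.
    pub fn finalize(self) -> Result<(), ()> {
        if self.failed || !self.pending.is_empty() {
            Err(())
        } else {
            Ok(())
        }
    }
}

/// Hypothetical helper: validates `data` in chunks of `chunk_size` bytes.
/// `chunk_size` must be non-zero because `slice::chunks` panics on zero.
pub fn validate_in_chunks(data: &[u8], chunk_size: usize) -> bool {
    let mut validator = StdStreamingValidator::new();
    for chunk in data.chunks(chunk_size) {
        validator.update(chunk);
    }
    validator.finalize().is_ok()
}
```

Calling `validate_in_chunks("héllo".as_bytes(), 1)` feeds each byte separately, so the two bytes of `é` arrive in different chunks and the input is still accepted.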
Required Methods
unsafe fn new() -> Self
where
    Self: Sized,
Creates a new validator.
Safety
This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to call it if the required CPU features are not available.
unsafe fn update(&mut self, input: &[u8])
Updates the validator with input.
Safety
This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to call it if the required CPU features are not available.
unsafe fn finalize(self) -> Result<(), Utf8Error>
Finishes the validation and returns Ok(()) if the input was valid UTF-8.
Errors
A basic::Utf8Error is returned if the input was not valid UTF-8. No further information about the location of the error is provided.
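Because the error carries no position, a caller that needs one has to obtain it another way. One possible approach, sketched here under the assumption that the complete input is still in memory, is to re-scan it with the standard library after a failure; `first_invalid_offset` is a hypothetical helper name, not a simdutf8 API.

```rust
/// Hypothetical helper: returns the byte offset of the first invalid or
/// incomplete UTF-8 sequence in `input`, or `None` if the input is valid.
/// Uses only the standard library, so it runs at scalar speed and is
/// meant for the (rare) error path.
pub fn first_invalid_offset(input: &[u8]) -> Option<usize> {
    std::str::from_utf8(input)
        .err()
        .map(|error| error.valid_up_to())
}
```

This only works when the data has been retained; in a pure streaming pipeline the earlier chunks may already be gone.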
Safety
This implementation requires CPU SIMD features specified by the module it resides in. It is undefined behavior to call it if the required CPU features are not available.
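The safety requirement shared by all three methods is usually met by checking the CPU feature at runtime before entering any unsafe code, and falling back to a portable path otherwise. The sketch below shows that dispatch pattern using only the standard library. `validate_avx2_stand_in` is a hypothetical placeholder for a routine that would use AVX2 instructions (here it simply calls `std::str::from_utf8` so the example stays self-contained), and `is_valid_utf8` is likewise an illustrative name.

```rust
/// Hypothetical stand-in for an AVX2-accelerated routine. A real one would
/// use AVX2 intrinsics; this body is scalar so the sketch needs no crates.
/// Safety: callers must ensure the CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn validate_avx2_stand_in(input: &[u8]) -> bool {
    std::str::from_utf8(input).is_ok()
}

/// Safe wrapper: picks the feature-gated path only after a runtime check.
pub fn is_valid_utf8(input: &[u8]) -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed by the check above.
            return unsafe { validate_avx2_stand_in(input) };
        }
    }
    // Portable fallback for other architectures or CPUs without AVX2.
    std::str::from_utf8(input).is_ok()
}
```

Keeping the check and the unsafe call inside one safe function means the rest of the program never has to reason about CPU features.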