bstr

Function decode_last_utf8

pub fn decode_last_utf8<B: AsRef<[u8]>>(slice: B) -> (Option<char>, usize)

UTF-8 decode a single Unicode scalar value from the end of a slice.

When successful, the corresponding Unicode scalar value is returned along with the number of bytes it was encoded with. The number of bytes consumed for a successful decode is always between 1 and 4, inclusive.

When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.

Examples

Basic usage:

use bstr::decode_last_utf8;

// Decoding a valid codepoint.
let (ch, size) = decode_last_utf8(b"\xE2\x98\x83");
assert_eq!(Some('☃'), ch);
assert_eq!(3, size);

// Decoding an incomplete codepoint.
let (ch, size) = decode_last_utf8(b"\xE2\x98");
assert_eq!(None, ch);
assert_eq!(2, size);

This example shows how to iterate over all codepoints in UTF-8 encoded bytes in reverse, while replacing invalid UTF-8 sequences with the replacement codepoint:

use bstr::{B, decode_last_utf8};

let mut bytes = B(b"\xE2\x98\x83\xFF\xF0\x9D\x9E\x83\xE2\x98\x61");
let mut chars = vec![];
while !bytes.is_empty() {
    let (ch, size) = decode_last_utf8(bytes);
    bytes = &bytes[..bytes.len()-size];
    chars.push(ch.unwrap_or('\u{FFFD}'));
}
assert_eq!(vec!['a', '\u{FFFD}', '𝞃', '\u{FFFD}', '☃'], chars);
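
To make the return contract described above more concrete, here is a rough sketch of a function with the same shape, written against the standard library only. It is not bstr's implementation. The name `decode_last_sketch` is hypothetical, and the sketch assumes that `std::str::Utf8Error::error_len` reports invalid sequences using the same "maximal prefix" convention as this function.

```rust
use std::str;

/// Hypothetical, std-only approximation of the contract documented above.
/// This is an illustration, not bstr's actual code.
fn decode_last_sketch(slice: &[u8]) -> (Option<char>, usize) {
    // A size of 0 is only reported for an empty slice.
    if slice.is_empty() {
        return (None, 0);
    }
    // Walk backwards over at most three continuation bytes (0b10xx_xxxx)
    // to find where the final sequence could begin.
    let limit = slice.len().saturating_sub(4);
    let mut start = slice.len() - 1;
    while start > limit && (slice[start] & 0b1100_0000) == 0b1000_0000 {
        start -= 1;
    }
    // Decode forwards from the candidate start.
    let tail = &slice[start..];
    let (ch, size) = match str::from_utf8(tail) {
        Ok(s) => {
            let c = s.chars().next().expect("tail is non-empty");
            (Some(c), c.len_utf8())
        }
        // The tail begins with a valid scalar value followed by stray bytes.
        Err(e) if e.valid_up_to() > 0 => {
            let valid = str::from_utf8(&tail[..e.valid_up_to()]).expect("validated prefix");
            let c = valid.chars().next().expect("prefix is non-empty");
            (Some(c), c.len_utf8())
        }
        // `error_len` is `None` when the input ends in the middle of a
        // sequence; the whole tail is then an incomplete but valid prefix.
        Err(e) => (None, e.error_len().unwrap_or(tail.len())),
    };
    // If the forward decode did not end exactly at the end of the slice,
    // the last byte is a stray byte and is reported on its own.
    if start + size == slice.len() {
        (ch, size)
    } else {
        (None, 1)
    }
}
```

Under those assumptions, `decode_last_sketch(b"\xE2\x98\x83")` yields `(Some('☃'), 3)`, the incomplete input `b"\xE2\x98"` yields `(None, 2)`, and an empty slice yields `(None, 0)`.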