Floating Parsing: Eisel-Lemire algorithm

컴퓨터/Rust 2021. 9. 14. 18:13

728x90

Eisel-Lemire algorithm

lemire: Fast function to parse strings into double (binary64) floating-point values, enforces the RF

Fast function to parse strings into double (binary64) floating-point values, enforces the RFC 7159 (JSON standard) grammar: 4x faster than strtod - GitHub - lemire/fast_double_parser: Fast function...

github.com

Introduction

Eisel lemire 알고리즘은 나온지 얼마되지 않은 플롯 파싱 알고리즘이다.

플롯 파싱 알고리즘은 "123424" 와 같은 문자 같은 변수들을 float 타입으로 변환하는 기술이다.

C언어의 strtod() 함수를 떠올리면 되고 이것보다 약 9배 이상 빠르다고 한다.

#include "fast_double_parser.h" // the file is in the include directory


double x;
char * string = ...
const char * endptr = fast_double_parser::parse_number(string, &x);

현재 Rust (1.55+), Go(1.16+)에 공식으로 내장 되어있다.

@Go버전 링크

@Rust버전 링크

How?

@논문 링크

위 논문을 이해할 수준까진 안되어서 궁금하신 분들은 논문을 확인하면 될 것 같다.

개인적으로 이해한 점

핵심: C, C++, Rust, Go와 같은 언어들에서는 floating point 수들은 기본적으로 64비트이다.

근데 strings에서 float으로 바꾸는 알고리즘들이 굉장히 느리다.

64비트 기준 floating-point 수들을 (IEEE 754) 나타내려면 최대로 한 17 digits이 필요한데,

미리 계산된 table과 정수부 (significand)를 합치면 1번이나 2번의 64비트 곱셈으로도 가장 근접한 floating-point 수를 찾을 수 있다.

구현

구현된 버전을 보면 그리 길지도 않은데 수학적이라 어려운 것 같다.

func eiselLemire64(man uint64, exp10 int, neg bool) (f float64, ok bool) {
	// The terse comments in this function body refer to sections of the
	// https://nigeltao.github.io/blog/2020/eisel-lemire.html blog post.

	// Exp10 Range.
	if man == 0 {
		if neg {
			f = math.Float64frombits(0x80000000_00000000) // Negative zero.
		}
		return f, true
	}
	if exp10 < detailedPowersOfTenMinExp10 || detailedPowersOfTenMaxExp10 < exp10 {
		return 0, false
	}

	// Normalization.
	clz := bits.LeadingZeros64(man)
	man <<= clz
	const float64ExponentBias = 1023
	retExp2 := uint64(217706*exp10>>16+64+float64ExponentBias) - uint64(clz)

	// Multiplication.
	xHi, xLo := bits.Mul64(man, detailedPowersOfTen[exp10-detailedPowersOfTenMinExp10][1])

	// Wider Approximation.
	if xHi&0x1FF == 0x1FF && xLo+man < man {
		yHi, yLo := bits.Mul64(man, detailedPowersOfTen[exp10-detailedPowersOfTenMinExp10][0])
		mergedHi, mergedLo := xHi, xLo+yHi
		if mergedLo < xLo {
			mergedHi++
		}
		if mergedHi&0x1FF == 0x1FF && mergedLo+1 == 0 && yLo+man < man {
			return 0, false
		}
		xHi, xLo = mergedHi, mergedLo
	}

	// Shifting to 54 Bits.
	msb := xHi >> 63
	retMantissa := xHi >> (msb + 9)
	retExp2 -= 1 ^ msb

	// Half-way Ambiguity.
	if xLo == 0 && xHi&0x1FF == 0 && retMantissa&3 == 1 {
		return 0, false
	}

	// From 54 to 53 Bits.
	retMantissa += retMantissa & 1
	retMantissa >>= 1
	if retMantissa>>53 > 0 {
		retMantissa >>= 1
		retExp2 += 1
	}
	// retExp2 is a uint64. Zero or underflow means that we're in subnormal
	// float64 space. 0x7FF or above means that we're in Inf/NaN float64 space.
	//
	// The if block is equivalent to (but has fewer branches than):
	//   if retExp2 <= 0 || retExp2 >= 0x7FF { etc }
	if retExp2-1 >= 0x7FF-1 {
		return 0, false
	}
	retBits := retExp2<<52 | retMantissa&0x000FFFFF_FFFFFFFF
	if neg {
		retBits |= 0x80000000_00000000
	}
	return math.Float64frombits(retBits), true
}

참고

https://arxiv.org/abs/2101.11408

Number Parsing at a Gigabyte per Second

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-precisio

arxiv.org

https://nigeltao.github.io/blog/2020/eisel-lemire.html

The Eisel-Lemire ParseNumberF64 Algorithm

The Eisel-Lemire ParseNumberF64 Algorithm Summary: ParseNumberF64, StringToDouble and similarly named functions take a string like "12.5" (one two dot five) and return a 64-bit double-precision floating point number like 12.5 (twelve point five). Some numb

nigeltao.github.io

https://internals.rust-lang.org/t/implementing-a-fast-correct-float-parser/14670

Implementing a Fast, Correct Float Parser

~2 years ago, I posted about making Rust float-parsing fast and correct. Although I had planned on working towards integrating these changes into the Rust core library, due to personal reasons (mental health), I needed to take a step back. I'm much better

internals.rust-lang.org

728x90

저작자표시 비영리 변경금지 (새창열림)

'컴퓨터 > Rust' 카테고리의 다른 글

Rust fleet (cargo 대체 빌드 툴) (0)	2022.05.04
Rust: chrono, timezone 예제 (0)	2021.09.08
Rust: Cross compile to linux on windows (0)	2021.08.30

인기포스트 MORE POST

ABOUT ME

STUDY BITS STUDY BITS

Eisel-Lemire algorithm

Introduction

How?

구현

참고

'컴퓨터 > Rust' 카테고리의 다른 글

티스토리툴바

인기포스트 MORE POST

ABOUT ME

Eisel-Lemire algorithm

Introduction

How?

구현

참고

'컴퓨터 > Rust' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바