Prevent constantly moving bytes around for better performance on large chunked records

master
JustAnotherArchivist, 3 years ago
parent
commit
01274e461a
1 changed file with 4 additions and 3 deletions
warc-tiny | +4 -3

@@ -162,13 +162,14 @@ def iter_warc(f):
     else:
         httpDecompressor = DummyDecompressor()
     if chunked:
+        pos = 0
         while True:
             try:
-                chunkLineEnd = httpBody.index(b'\r\n')
+                chunkLineEnd = httpBody.index(b'\r\n', pos)
             except ValueError:
                 print('Error: could not find chunk line end in record {}, skipping'.format(recordID), file = sys.stderr)
                 break
-            chunkLine = httpBody[:chunkLineEnd]
+            chunkLine = httpBody[pos:chunkLineEnd]
             if b';' in chunkLine:
                 chunkLength = chunkLine[:chunkLine.index(b';')].strip()
             else:
@@ -181,7 +182,7 @@ def iter_warc(f):
                 break
             chunk = httpDecompressor.decompress(httpBody[chunkLineEnd + 2 : chunkLineEnd + 2 + chunkLength])
             yield HTTPBodyChunk(chunk)
-            httpBody = httpBody[chunkLineEnd + 2 + chunkLength + 2:]
+            pos = chunkLineEnd + 2 + chunkLength + 2
     else:
         yield HTTPBodyChunk(httpDecompressor.decompress(httpBody))
 else:
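
The change above stops rebinding httpBody to a fresh slice after every chunk (each such slice copies the rest of the buffer, which adds up to quadratic work on records with many chunks) and instead advances an offset. A minimal standalone sketch of that offset-tracking approach follows. The function name iter_chunks and its error handling are hypothetical and not part of warc-tiny; it also skips decompression and trailer handling.

```python
def iter_chunks(body: bytes):
    """Yield the payload of each chunk in a chunked HTTP body.

    Hypothetical helper for illustration only; not warc-tiny's API.
    """
    pos = 0  # offset of the next chunk-size line; body is never re-sliced
    while True:
        # bytes.index accepts a start offset, so the search resumes at pos
        # without copying the remaining bytes. Raises ValueError if no CRLF.
        line_end = body.index(b'\r\n', pos)
        # Drop optional chunk extensions (everything after ';').
        size_field = body[pos:line_end].split(b';', 1)[0].strip()
        size = int(size_field, 16)  # chunk sizes are hexadecimal
        if size == 0:
            break  # last-chunk marker
        start = line_end + 2
        yield body[start:start + size]
        pos = start + size + 2  # skip the payload and its trailing CRLF


if __name__ == '__main__':
    sample = b'4\r\nWiki\r\n5;ext=1\r\npedia\r\n0\r\n\r\n'
    print(list(iter_chunks(sample)))
```

Only the per-chunk payload slices are copied here, so total copying stays proportional to the body size regardless of how many chunks it contains.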

