Work around warcio not writing a block digest for warcinfo records (https://github.com/webrecorder/warcio/issues/87)

The length has to be set manually because otherwise warcio will automatically remove the WARC-Block-Digest header again.
tags/v0.2.0
JustAnotherArchivist 4 years ago
parent
commit
f14a664b1c
1 changed file with 6 additions and 2 deletions
qwarc/warc.py (+6, -2)

@@ -88,11 +88,15 @@ class WARC:
             },
             'extra': self._specDependencies.extra,
         }
+        payload = io.BytesIO(json.dumps(data, indent = 2).encode('utf-8'))
+        digester = warcio.utils.Digester('sha1')
+        digester.update(payload.getvalue())
         record = self._warcWriter.create_warc_record(
             'urn:X-qwarc:warcinfo',
             'warcinfo',
-            payload = io.BytesIO(json.dumps(data, indent = 2).encode('utf-8')),
-            warc_headers_dict = {'Content-Type': 'application/json; charset=utf-8'},
+            payload = payload,
+            warc_headers_dict = {'Content-Type': 'application/json; charset=utf-8', 'WARC-Block-Digest': str(digester)},
+            length = len(payload.getvalue()),
         )
         self._warcWriter.write_record(record)
         return record.rec_headers.get_header('WARC-Record-ID')
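
For readers unfamiliar with the header value, the following is a minimal standard-library sketch of what the change computes for the warcinfo block: a digest label of the form `sha1:<Base32 digest>` plus the block length. It does not use warcio; `block_digest` and the sample `block` are hypothetical names introduced here for illustration, and the commit itself relies on `warcio.utils.Digester` instead.

```python
import base64
import hashlib


def block_digest(block: bytes, algorithm: str = 'sha1') -> str:
    """Return a digest label of the form '<algorithm>:<Base32 digest>'."""
    raw = hashlib.new(algorithm, block).digest()
    return algorithm + ':' + base64.b32encode(raw).decode('ascii')


# Hypothetical stand-in for the JSON warcinfo body built in qwarc/warc.py.
block = b'{"example": true}'

# Header dictionary shaped like the one passed as warc_headers_dict in the diff.
headers = {
    'Content-Type': 'application/json; charset=utf-8',
    'WARC-Block-Digest': block_digest(block),
}

# The block length in bytes, which the commit passes explicitly as `length`.
length = len(block)
```

The digest is taken over exactly the bytes that are written as the record block, so the same byte string should be used for both the digest and the length.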

