Wire Sysio Wire Sysion 1.0.0
sysio::trace_api::compressed_file Class Reference

#include <compressed_file.hpp>

Public Member Functions

 compressed_file (fc::path file_path)
 
 ~compressed_file ()
 
 compressed_file (compressed_file &&)
 
compressed_file & operator= (compressed_file &&)
 
void open ()
 
bool is_open () const
 
void seek (long loc)
 
void read (char *d, size_t n)
 
void close ()
 
auto get_file_path () const
 
compressed_file_datastream create_datastream ()
 

Static Public Member Functions

static bool process (const fc::path &input_path, const fc::path &output_path, size_t seek_point_stride)
 

Detailed Description

Wrapper for read-only access to a compressed file. Compressed files support seeking and reading.

Seeking is less efficient than in an uncompressed file, because each seek translates to:

  • 2 seeks + 1 read to load and process the seek-point mapping
  • potentially a read/decompress/discard of the data between the seek point and the requested offset

More seek points lower the average amount of data that must be read, decompressed, and discarded to reach any given offset. However, each seek point increases the file size, because it represents a flush of the compressor, which can degrade the compression ratio.
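
The cost described above can be sketched as a lookup over the seek-point mapping. This is a minimal illustration, not the library's implementation: the `seek_point` and `seek_plan` structs and the `plan_seek` function are hypothetical names, and the mapping is assumed to be sorted by uncompressed offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical mirror of one mapping entry: two 64-bit offsets.
struct seek_point {
   uint64_t original_offset;    // offset in the uncompressed data
   uint64_t compressed_offset;  // offset in the compressed file
};

// Hypothetical result: where to start decompressing and how much to throw away.
struct seek_plan {
   uint64_t compressed_start;   // compressed offset the decompressor starts from
   uint64_t bytes_to_discard;   // decompressed bytes to skip before the target
};

// Pick the last seek point at or before `target`. If there is none, start from
// the beginning of the file. `points` must be sorted by original_offset.
seek_plan plan_seek(const std::vector<seek_point>& points, uint64_t target) {
   auto it = std::upper_bound(points.begin(), points.end(), target,
      [](uint64_t value, const seek_point& p) { return value < p.original_offset; });
   if (it == points.begin()) {
      return {0, target};
   }
   --it;
   return {it->compressed_offset, target - it->original_offset};
}
```

With seek points at uncompressed offsets 100 and 200, a seek to offset 150 starts at the first seek point and discards 50 bytes, while a seek to offset 50 starts at the beginning of the file. A smaller stride shrinks the discarded range at the price of more flushes.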

A compressed file looks like this on the filesystem:

    /====================\ file offset 0
    |                    |
    |  Compressed Data   |
    |  with seek points  |
    |                    |
    |--------------------| file offset END - 2 - (16 * seek point count)
    |                    |
    |     mapping of     |
    |   orig offset to   |
    |   seek pt offset   |
    |                    |
    |--------------------| file offset END - 2
    |   seek pt count    |
    \====================/ file offset END
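
The layout above can be read back to front. The following is a sketch only: it assumes a 2-byte count and 16-byte entries made of two 64-bit offsets in host byte order (consistent with the offsets in the diagram), and `seek_point`, `trailer`, and `parse_trailer` are hypothetical names rather than part of the class.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of one mapping entry (assumed to be 16 bytes).
struct seek_point {
   uint64_t original_offset;
   uint64_t compressed_offset;
};
static_assert(sizeof(seek_point) == 16, "the layout assumes 16-byte entries");

// Hypothetical view of the regions of the file.
struct trailer {
   std::size_t compressed_size;     // length of the compressed data region
   std::vector<seek_point> points;  // the offset mapping
};

// Parse the trailer of a whole file held in memory.
trailer parse_trailer(const std::vector<char>& file) {
   constexpr std::size_t count_size = 2;
   constexpr std::size_t entry_size = sizeof(seek_point);
   if (file.size() < count_size) {
      throw std::runtime_error("file too small to hold a seek point count");
   }

   // The count sits in the last 2 bytes: file offset END - 2.
   uint16_t count = 0;
   std::memcpy(&count, file.data() + file.size() - count_size, count_size);

   // The mapping starts at END - 2 - (16 * seek point count).
   const std::size_t map_size = entry_size * count;
   if (file.size() - count_size < map_size) {
      throw std::runtime_error("file too small to hold the seek point mapping");
   }
   const std::size_t map_offset = file.size() - count_size - map_size;

   trailer result{map_offset, std::vector<seek_point>(count)};
   if (count > 0) {
      std::memcpy(result.points.data(), file.data() + map_offset, map_size);
   }
   return result;
}
```

Everything before the mapping offset is compressed data, which is why the sketch reports that offset as the size of the compressed region.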

A "seek point" is a point in the compressed data stream from which the decompressor can start reading without having read any of the prior data. Seek points should be traversable by a decompressor, so that reads which span seek points do not have to be aware of them.

In zlib, a seek point is created by doing a complete flush of the stream (Z_FULL_FLUSH).

Definition at line 45 of file compressed_file.hpp.

Constructor & Destructor Documentation

◆ compressed_file() [1/2]

sysio::trace_api::compressed_file::compressed_file ( fc::path file_path)
explicit

Definition at line 171 of file compressed_file.cpp.

:file_path(std::move(file_path))
,file_ptr(nullptr)
,impl(std::make_unique<compressed_file_impl>())
{
   // use the member here: the parameter of the same name was moved from above
   impl->file_size = fc::file_size(this->file_path);
}

◆ ~compressed_file()

sysio::trace_api::compressed_file::~compressed_file ( )

Definition at line 179 of file compressed_file.cpp.

{}

◆ compressed_file() [2/2]

sysio::trace_api::compressed_file::compressed_file ( compressed_file && )
default

Provide default move construction/assignment

Member Function Documentation

◆ close()

void sysio::trace_api::compressed_file::close ( )
inline

Close the underlying fc::cfile

Definition at line 96 of file compressed_file.hpp.

{
   file_ptr.reset();
}

◆ create_datastream()

compressed_file_datastream sysio::trace_api::compressed_file::create_datastream ( )
inline

Definition at line 155 of file compressed_file.hpp.

{
   return compressed_file_datastream(*this);
}

◆ get_file_path()

auto sysio::trace_api::compressed_file::get_file_path ( ) const
inline

Return the file path associated with this compressed_file

Returns
the fc::path associated with this file

Definition at line 104 of file compressed_file.hpp.

{
   return file_path;
}

◆ is_open()

bool sysio::trace_api::compressed_file::is_open ( ) const
inline

Query whether the underlying file is open or not

Returns
true if the file is open

Definition at line 71 of file compressed_file.hpp.

{ return (bool)file_ptr; }

◆ open()

void sysio::trace_api::compressed_file::open ( )
inline

Open the underlying fc::cfile for reading

Definition at line 60 of file compressed_file.hpp.

{
   file_ptr = std::make_unique<fc::cfile>();
   file_ptr->set_file_path(file_path);
   file_ptr->open("rb");
}

◆ operator=()

compressed_file & sysio::trace_api::compressed_file::operator= ( compressed_file && )
default

◆ process()

bool sysio::trace_api::compressed_file::process ( const fc::path & input_path,
const fc::path & output_path,
size_t seek_point_stride )
static

Convert the file that exists at input_path into a compressed_file written to output_path.

Parameters
input_path - the path to the input file
output_path - the path to write the output file to (overwriting an existing file at that path)
seek_point_stride - the number of uncompressed bytes between seek points
Returns
true if successful, false if there was no error but the process could not complete
Exceptions
std::ios_base::failure if the input_path does not exist or is empty, or the output_path cannot be written to
compressed_file_error if there is an issue during compression of the data stream

Definition at line 197 of file compressed_file.cpp.

{
   if (!fc::exists(input_path)) {
      throw std::ios_base::failure(std::string("Attempting to create compressed_file from file that does not exist: ") + input_path.generic_string());
   }

   const size_t input_size = fc::file_size(input_path);
   if (input_size == 0) {
      throw std::ios_base::failure(std::string("Attempting to create compressed_file from file that is empty: ") + input_path.generic_string());
   }

   // subtract 1 to make sure that the truncated division will only create a seek point if there is at least one byte
   // in the next stride. So, a file size of N and a stride >= N results in 0 seek points. N + 1 will have a seek
   // point for the last byte as will XN + 1 which will create X seek points (the last of which is for the last byte)
   // of the file
   const auto seek_point_count = (input_size - 1) / seek_point_stride;
   std::vector<seek_point_entry> seek_point_map(seek_point_count);

   fc::cfile input_file;
   input_file.set_file_path(input_path);
   input_file.open("rb");

   fc::cfile output_file;
   output_file.set_file_path(output_path);
   output_file.open("wb");

   z_stream strm;
   strm.zalloc = Z_NULL;
   strm.zfree = Z_NULL;
   strm.opaque = Z_NULL;

   if (deflateInit2(&strm, Z_BEST_COMPRESSION, Z_DEFLATED, raw_zlib_window_bits, 8, Z_DEFAULT_STRATEGY) != Z_OK) {
      return false;
   }

   constexpr size_t buffer_size = 64*1024;
   auto input_buffer = std::vector<uint8_t>(buffer_size);
   auto output_buffer = std::vector<uint8_t>(buffer_size);

   auto bytes_remaining_before_sync = seek_point_stride;
   int next_sync_point = 0;

   // process a single chunk of input completely,
   // this may sometime loop multiple times if the compressor state combined with input data creates more than a
   // single buffer's worth of data
   //
   auto process_chunk = [&]( size_t input_size, int mode ) {
      strm.avail_in = input_size;
      strm.next_in = input_buffer.data();

      do {
         strm.avail_out = output_buffer.size();
         strm.next_out = output_buffer.data();
         auto ret = deflate(&strm, mode);

         const bool success = ret == Z_OK || (mode == Z_FINISH && ret == Z_STREAM_END);
         if (!success) {
            return ret;
         }

         output_file.write(reinterpret_cast<const char*>(output_buffer.data()), output_buffer.size() - strm.avail_out);
      } while (strm.avail_out == 0);

      return Z_OK;
   };

   size_t read_offset = 0;
   while (read_offset < input_size) {
      const auto bytes_remaining = input_size - read_offset;
      const auto read_size = std::min({ buffer_size, bytes_remaining, bytes_remaining_before_sync });
      input_file.read(reinterpret_cast<char*>(input_buffer.data()), read_size);

      auto ret = process_chunk(read_size, Z_NO_FLUSH);
      if (ret != Z_OK) {
         throw compressed_file_error(std::string("deflate failed: ") + std::to_string(ret));
      }
      read_offset += read_size;

      if (read_size == bytes_remaining ) {
         // finish the file out by draining remaining output
         ret = process_chunk(0, Z_FINISH);
         if (ret != Z_OK) {
            throw compressed_file_error(std::string("failed to finalize file compression: ") + std::to_string(ret));
         }
      } else if ( read_size == bytes_remaining_before_sync ) {
         // create a sync point by flushing the compressor so a decompressor can start at this offset
         ret = process_chunk(0, Z_FULL_FLUSH);
         if (ret != Z_OK) {
            throw compressed_file_error(std::string("failed to create sync point: ") + std::to_string(ret));
         }

         seek_point_map.at(next_sync_point++) = {read_offset, output_file.tellp()};

         if (next_sync_point == seek_point_count) {
            // if we are out of sync points, set this value one past the end (disabling it)
            bytes_remaining_before_sync = input_size - read_offset + 1;
         } else {
            bytes_remaining_before_sync = seek_point_stride;
         }
      } else {
         bytes_remaining_before_sync -= read_size;
      }
   }

   deflateEnd(&strm);
   input_file.close();

   // write out the seek point table
   if (seek_point_map.size() > 0) {
      output_file.write(reinterpret_cast<const char*>(seek_point_map.data()), seek_point_map.size() * sizeof(seek_point_entry));
   }

   // write out the seek point count
   output_file.write(reinterpret_cast<const char*>(&seek_point_count), sizeof(seek_point_count_type));

   output_file.close();
   return true;
}
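
The chunking logic in the listing above decides where seek points land. The sketch below replays only that bookkeeping, with no compression or file I/O, to show which uncompressed offsets would be recorded. `plan_seek_points` is a hypothetical helper written for this illustration, and it assumes a non-zero stride and a non-empty input (the listing rejects empty files).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Return the uncompressed offsets at which a seek point would be recorded,
// following the same chunking rules as the listing above.
// Assumes input_size > 0 and stride > 0.
std::vector<std::size_t> plan_seek_points(std::size_t input_size, std::size_t stride,
                                          std::size_t buffer_size = 64 * 1024) {
   // Subtracting 1 means a seek point exists only if at least one byte follows it.
   const std::size_t count = (input_size - 1) / stride;

   std::vector<std::size_t> offsets;
   std::size_t before_sync = stride;
   std::size_t read_offset = 0;

   while (read_offset < input_size) {
      const std::size_t remaining = input_size - read_offset;
      const std::size_t read_size = std::min({buffer_size, remaining, before_sync});
      read_offset += read_size;

      if (read_size == remaining) {
         break;                            // final chunk: the stream is finished here
      } else if (read_size == before_sync) {
         offsets.push_back(read_offset);   // a full flush would happen at this offset
         // Once all seek points are placed, push the next one past the end of the input.
         before_sync = (offsets.size() == count) ? input_size - read_offset + 1 : stride;
      } else {
         before_sync -= read_size;         // buffer filled before reaching the stride
      }
   }
   return offsets;
}
```

For a 9-byte input with a stride of 4, the sketch reports seek points at offsets 4 and 8. For an 8-byte input it reports only offset 4, because no byte follows offset 8.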

◆ read()

void sysio::trace_api::compressed_file::read ( char * d,
size_t n )

Read a given number of uncompressed bytes to the buffer pointed to by d.

This interface is made to match fc::cfile for easy integration

Parameters
d - buffer to write data to
n - the number of bytes to read
Exceptions
std::ios_base::failure if this would result in reading past the end of the uncompressed file
compressed_file_error if the compressed data stream is corrupt or unreadable

Definition at line 187 of file compressed_file.cpp.

{
   impl->read(d, n, *file_ptr);
}
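
Because seek() and read() follow the shape of the fc::cfile interface, calling code can be written once against that shape. The sketch below is an illustration under assumptions: `read_at` is a hypothetical helper, and `memory_file` is an in-memory stand-in used only to keep the example self-contained. It is not part of the library.

```cpp
#include <cstddef>
#include <cstring>
#include <ios>
#include <string>

// Works with any type that exposes seek(long) and read(char*, size_t).
template <typename File>
std::string read_at(File& file, long offset, std::size_t length) {
   std::string out(length, '\0');
   file.seek(offset);
   if (length > 0) {
      file.read(&out[0], length);
   }
   return out;
}

// In-memory stand-in with the same two calls, for illustration only.
struct memory_file {
   std::string data;
   std::size_t pos = 0;

   void seek(long loc) {
      if (loc < 0 || static_cast<std::size_t>(loc) > data.size()) {
         throw std::ios_base::failure("seek past the end of the file");
      }
      pos = static_cast<std::size_t>(loc);
   }

   void read(char* d, std::size_t n) {
      if (n > data.size() - pos) {
         throw std::ios_base::failure("read past the end of the file");
      }
      std::memcpy(d, data.data() + pos, n);
      pos += n;
   }
};
```

A caller that only uses `read_at` does not need to know whether the bytes come from a plain file or a compressed one, provided the object has been opened first.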

◆ seek()

void sysio::trace_api::compressed_file::seek ( long loc)

Seek to the given uncompressed offset

Parameters
loc - the byte offset in the uncompressed file to seek to
Exceptions
std::ios_base::failure if this would seek past the end of the file
compressed_file_error if the compressed data stream is corrupt or unreadable

Definition at line 182 of file compressed_file.cpp.

{
   impl->seek(loc, *file_ptr);

}

The documentation for this class was generated from the following files: