on feb 22nd, 2005
tagged nerd, php
i had mentioned that this notaweblog code has been getting slower now that my log_data file is 560KB and 10,315 lines, so i figured i'd do something about it.
i either had to change log_data to a database (db(3) style or sqlite), or keep log_data in plaintext and run some kind of conversion program after each change to rebuild a separate database.
neither of those sounded appealing. i didn't want to give up being able to manage the log data in vim and cvs, and i didn't want to have to run something after every commit and make sure the files are in sync all the time.
so... i rewrote the file parsing logic so it scans the file once very quickly, storing the byte offset of each date as it goes. then when it needs to pull out the data for a given day, it knows which byte to fseek() to, jumps there, parses that day, and stops without parsing the rest of the file. this lets it get the data for the given day but still know the dates of every other entry in the file, so it can draw the calendar to the left.
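here's a rough sketch of that scan-then-seek idea. it's in python rather than the actual php, and it assumes a made-up log_data format where each day starts with a line holding nothing but an iso date (like 2005-02-22); the real format isn't shown here. the names DATE_LINE, build_index and read_day are hypothetical.

```python
import re

# ASSUMPTION: a day starts with a line holding only an ISO date, e.g. "2005-02-22".
# The real log_data format is not described in the post.
DATE_LINE = re.compile(rb"^(\d{4}-\d{2}-\d{2})\s*$")


def build_index(path):
    """Scan the file once, mapping each date to the byte offset of its header line."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            m = DATE_LINE.match(line)
            if m:
                index[m.group(1).decode("ascii")] = offset
            offset += len(line)  # binary mode, so len() is the true byte count
    return index


def read_day(path, index, date):
    """Seek straight to one day's entry and stop at the next date header."""
    body = []
    with open(path, "rb") as f:
        f.seek(index[date])
        f.readline()  # skip the date header itself
        for line in f:
            if DATE_LINE.match(line):
                break  # next day reached, no need to read the rest of the file
            body.append(line.decode("utf-8").rstrip("\n"))
    return body
```

the keys of the index are enough to draw a calendar of every day that has an entry, while read_day() only touches the bytes for the one day being shown.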
that made it a little faster, so i added some caching: it writes out a hints file storing all of those day->byte pairs. when the page is loaded again, it stat()'s the hints file and the log_data file, and if the modification times are the same (the hints file's modification time is set to that of the log_data file when it's written), it loads the hints file instead of scanning log_data. then it can fseek() right to the day and doesn't have to scan the whole file again until it changes (at which point it automatically rewrites the hints file).
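the mtime trick could look something like this. again a python sketch and not the real php; storing the hints as json, the load_index name, and the build callable (anything that scans log_data and returns a date -> byte offset dict) are all assumptions for illustration.

```python
import json
import os


def load_index(log_path, hints_path, build):
    """Return the date -> byte offset index, reusing the hints file when it is fresh.

    `build` is a hypothetical callable that scans log_path and returns a dict.
    """
    log_stat = os.stat(log_path)
    try:
        hints_stat = os.stat(hints_path)
    except FileNotFoundError:
        hints_stat = None  # no hints yet: fall through and build them

    # Equal modification times (compared to the second) mean log_data has not
    # changed since the hints file was written.
    if hints_stat and hints_stat.st_mtime_ns // 10**9 == log_stat.st_mtime_ns // 10**9:
        with open(hints_path, "r", encoding="utf-8") as f:
            return json.load(f)

    # Stale or missing hints: rescan, rewrite, and stamp the hints file
    # with log_data's own timestamps so the next check matches.
    index = build(log_path)
    with open(hints_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    os.utime(hints_path, ns=(log_stat.st_atime_ns, log_stat.st_mtime_ns))
    return index
```

since the hints file is only trusted when the timestamps match, deleting it or editing log_data just costs one full scan on the next page load.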
so all of that gives it a database-style index that:
- is updated automatically,
- is not required for operation (and is still somewhat quick without it), and
- does not require having to change anything in the log_data text file.