digitalmars.D - dpaste and the wayback machine
- Andrei Alexandrescu (12/12) Feb 07 2016 Dpaste currently does not expire pastes by default. I was thinking it
- Wyatt (22/33) Feb 08 2016 You want it in Wayback? Sounds like you need some WARC [0].
- Andrei Alexandrescu (11/46) Feb 09 2016 That's intense. I think a simple page (or chained linked collection of
- Jesse Phillips (7/19) Feb 08 2016 I'm not sure if the wayback machine should be used for version
- Wyatt (6/11) Feb 08 2016 I'm pretty sure that's Andrei's thought, too. It's a pastebin;
Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine so that they are archived redundantly. I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of. I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.

Thoughts?

Andrei
Feb 07 2016
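The snapshot link in the post above follows a visible pattern: a fixed prefix, a 14-digit timestamp, then the original URL. A minimal Python sketch of composing and parsing such links is below. The function names are hypothetical, and treating the timestamp as UTC is an assumption (it is consistent with the post time shown in this thread, but is not stated in it).

```python
from datetime import datetime, timezone
from typing import Tuple

# Prefix taken from the snapshot link quoted in the post.
WAYBACK_PREFIX = "https://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # 14 digits: YYYYMMDDhhmmss


def wayback_snapshot_url(original_url: str, captured_at: datetime) -> str:
    """Compose a snapshot link; captured_at must be timezone-aware."""
    # Assumption: the timestamp in the link is expressed in UTC.
    stamp = captured_at.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


def parse_wayback_snapshot_url(snapshot_url: str) -> Tuple[datetime, str]:
    """Split a snapshot link into (capture time, original URL)."""
    if not snapshot_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback snapshot link")
    rest = snapshot_url[len(WAYBACK_PREFIX):]
    # The timestamp ends at the first slash; the remainder is the original URL.
    stamp, _, original_url = rest.partition("/")
    captured_at = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return captured_at, original_url
```

Parsing the link from the post and composing it again should round-trip to the same string.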
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu wrote:
> Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine such that they are archived redundantly. I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of. I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec. Thoughts?

You want it in Wayback? Sounds like you need some WARC [0].

Since anyone can upload to IA (using a nice S3-like API, even [1]), this should be pretty uncomplicated. If you can get a list of all the paste URLs, you can use wget [2] to build the WARC fairly trivially. [3]

Then I'd suggest getting a dlang account and making an item [4] out of it. Just make sure it's set to mediatype:web and it should get ingested by Wayback.

After that? Generate a WARC when a paste is made and use the dlang S3 keys to add it to the previous item (or maybe just do it daily or weekly so as to not stress the derive queue too much).

I'm pretty sure that's all that's needed.

-Wyatt

[0] http://fileformats.archiveteam.org/wiki/WARC
[1] https://archive.org/help/abouts3.txt
[2] -i, --input-file=FILE download URLs found in local or external FILE.
[3] http://www.archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[4] https://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/
Feb 08 2016
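The WARC route described above might look roughly like the Python sketch below. It only builds the wget argument list and the upload request pieces; it does not run anything. The helper names, the file names, and the item name are hypothetical. The header names and the s3.us.archive.org endpoint follow my reading of the S3-like API notes linked as [1], so treat them as assumptions to check against that document.

```python
from typing import Dict, List


def wget_warc_command(url_list_file: str, warc_basename: str) -> List[str]:
    """Build a wget invocation that fetches every URL listed in
    url_list_file and records the traffic in a WARC file."""
    return [
        "wget",
        f"--input-file={url_list_file}",  # the -i option quoted in [2]
        f"--warc-file={warc_basename}",   # wget appends .warc.gz by default
        "--page-requisites",              # also fetch CSS/JS the pastes need
    ]


def ia_upload_url(item: str, filename: str) -> str:
    """URL to PUT a file into an archive.org item (assumed endpoint)."""
    return f"https://s3.us.archive.org/{item}/{filename}"


def ia_upload_headers(access_key: str, secret_key: str) -> Dict[str, str]:
    """Headers for the PUT request: the S3-style keys plus the
    mediatype:web setting mentioned in the post."""
    return {
        "authorization": f"LOW {access_key}:{secret_key}",
        "x-archive-meta-mediatype": "web",
    }
```

The list from `wget_warc_command("pastes.txt", "dpaste-2016-02-08")` could be handed to `subprocess.run`, and the resulting `dpaste-2016-02-08.warc.gz` sent with any HTTP client using the URL and headers above. Batching daily or weekly, as suggested in the post, only changes how often this runs.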
On 2/8/16 11:44 AM, Wyatt wrote:
> On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu wrote:
>> Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine such that they are archived redundantly. I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of. I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec. Thoughts?
>
> You want it in Wayback? Sounds like you need some WARC [0]. Since anyone can upload to IA (using a nice S3-like API, even [1]), this should be pretty uncomplicated. If you can get a list of all the paste URLs, you can use wget [2] to build the WARC fairly trivially. [3] Then I'd suggest getting a dlang account and make an item [4] out of it. Just make sure it's set to mediatype:web and it should get ingested by Wayback. After that? Generate a WARC when a paste is made and use the dlang S3 keys to add it to the previous item (or maybe just do it daily or weekly so as to not stress the derive queue too much). I'm pretty sure that's all that's needed.

That's intense. I think a simple page (or a chain of linked pages) containing links to all pastes would suffice. For example, consider having dpaste.dzfl.pl contain a link to dpaste.dzfl.pl/today.html. That page would contain e.g. the links generated today and a "More" button linked to dpaste.dzfl.pl/2016-02-08.html (which would be yesterday). That in turn would contain links to yesterday's pastes and a link to the day before, etc. My understanding is this is enough to have Wayback archive all pastes.

> I'm pretty sure that's Andrei's thought, too. It's a pastebin; people use it to make web links to pasted things. If it were to disappear, a lot of links would break very permanently because Heritrix has no way to index and crawl the site.

Yah.

Andrei
Feb 09 2016
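The chained daily pages proposed above can be sketched in a few lines of Python. This is an illustration under assumptions, not dpaste's actual code: the function names and the HTML layout are made up, and only the YYYY-MM-DD.html naming and the "More" link come from the post.

```python
from datetime import date, timedelta
from html import escape
from typing import Iterable


def day_page_name(day: date) -> str:
    """File name of the index page for one day, e.g. 2016-02-08.html."""
    return f"{day.isoformat()}.html"


def render_day_page(day: date, paste_urls: Iterable[str], has_older: bool = True) -> str:
    """Render one day's index: a link per paste, plus a "More" link
    to the previous day's page so a crawler can walk the whole chain."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        "<body>",
        f"<h1>Pastes for {day.isoformat()}</h1>",
        "<ul>",
    ]
    for url in paste_urls:
        safe = escape(url, quote=True)
        lines.append(f'<li><a href="{safe}">{safe}</a></li>')
    lines.append("</ul>")
    if has_older:
        # Link to yesterday; the oldest page would pass has_older=False.
        older = day_page_name(day - timedelta(days=1))
        lines.append(f'<a href="/{older}">More</a>')
    lines += ["</body>", "</html>"]
    return "\n".join(lines)
```

In this sketch the page for the current day would be served as today.html and saved under its dated name once the day ends. Every paste is then reachable by plain links from the front page, which is the property the post relies on.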
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu wrote:
> Dpaste currently does not expire pastes by default. I was thinking it would be nice if it saved them in the Wayback Machine such that they are archived redundantly. I'm not sure what's the way to do it - probably linking the newly-generated paste URLs from a page that the Wayback Machine already knows of. I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the WM does not see a link that is searched for, it offers the option to archive it) obtaining https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec. Thoughts? Andrei

I'm not sure if the wayback machine should be used for version control. If you want to keep a history of your past, I suggest using gist.github.com. I view the wayback machine as a view of what the web used to look like, not necessarily of what information was in it.
Feb 08 2016
On Monday, 8 February 2016 at 20:02:41 UTC, Jesse Phillips wrote:
> I'm not sure if the wayback machine should be used for version control, if you want to keep a history of your past I suggest using a gist.github.com. I view the wayback machine as a view for what the web used to look like not necessarily what information was in it.

I'm pretty sure that's Andrei's thought, too. It's a pastebin; people use it to make web links to pasted things. If it were to disappear, a lot of links would break very permanently because Heritrix has no way to index and crawl the site.

-Wyatt
Feb 08 2016