digitalmars.D - dpaste and the wayback machine

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Dpaste currently does not expire pastes by default. I was thinking it 
would be nice if it saved them in the Wayback Machine such that they are 
archived redundantly.

I'm not sure what's the way to do it - probably linking the 
newly-generated paste URLs from a page that the Wayback Machine already 
knows of.

I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when the 
WM does not see a link that is searched for, it offers the option to 
archive it) obtaining 
https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.
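
The capture address above follows the Wayback Machine's usual pattern: a fixed prefix, a 14-digit timestamp, then the original URL. A minimal Python sketch of building and splitting such addresses follows. The helper names `snapshot_url` and `parse_snapshot_url` are made up for illustration, and reading the timestamp as UTC is an assumption based on how these addresses are normally interpreted.

```python
from datetime import datetime, timezone
from typing import Tuple

WAYBACK_PREFIX = "https://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # 14 digits, e.g. 20160207215546


def snapshot_url(original_url: str, captured_at: datetime) -> str:
    """Build the address of a capture of original_url taken at captured_at."""
    stamp = captured_at.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


def parse_snapshot_url(url: str) -> Tuple[datetime, str]:
    """Split a capture address into (capture time, original URL)."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback Machine capture address")
    # Everything up to the first slash is the timestamp; the rest is the page.
    stamp, _, original = url[len(WAYBACK_PREFIX):].partition("/")
    # Assumption: the timestamp is expressed in UTC.
    captured = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return captured, original
```

Passing the capture address from this post to `parse_snapshot_url` gives back the original dpaste link together with the capture time encoded in the address.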


Thoughts?

Andrei
Feb 07 2016
Wyatt <wyatt.epp gmail.com> writes:
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu 
wrote:
 Dpaste currently does not expire pastes by default. I was 
 thinking it would be nice if it saved them in the Wayback 
 Machine such that they are archived redundantly.

 I'm not sure what's the way to do it - probably linking the 
 newly-generated paste URLs from a page that the Wayback Machine 
 already knows of.

 I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec 
 (when the WM does not see a link that is searched for, it offers 
 the option to archive it) obtaining 
 https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.


 Thoughts?
You want it in Wayback? Sounds like you need some WARC [0].

Since anyone can upload to IA (using a nice S3-like API, even [1]), this should be pretty uncomplicated. If you can get a list of all the paste URLs, you can use wget [2] to build the WARC fairly trivially [3]. Then I'd suggest getting a dlang account and making an item [4] out of it. Just make sure it's set to mediatype:web and it should get ingested by Wayback.

After that? Generate a WARC when a paste is made and use the dlang S3 keys to add it to the previous item (or maybe just do it daily or weekly so as to not stress the derive queue too much).

I'm pretty sure that's all that's needed.

-Wyatt

[0] http://fileformats.archiveteam.org/wiki/WARC
[1] https://archive.org/help/abouts3.txt
[2] -i, --input-file=FILE download URLs found in local or external FILE.
[3] http://www.archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[4] https://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/
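
The two steps described in this post (build a WARC with wget, then add it to an archive.org item over the S3-like API) could be sketched in Python roughly as below. This is only an illustration under stated assumptions: the function names, the item name, and the file names are hypothetical, and the endpoint and headers reflect my reading of the document linked as [1]. The code only builds the wget command line and the upload request; it does not run wget or contact archive.org.

```python
import urllib.request
from typing import List
from urllib.parse import quote


def wget_warc_command(url_list: str, warc_name: str) -> List[str]:
    """Command line that fetches every URL in url_list into warc_name.warc.gz."""
    return [
        "wget",
        f"--input-file={url_list}",   # one paste URL per line
        f"--warc-file={warc_name}",   # wget appends .warc.gz itself
        "--page-requisites",          # also fetch CSS/JS/images each page needs
        "--delete-after",             # keep only the WARC, not the loose files
        "--no-verbose",
    ]


def ia_upload_request(item: str, filename: str, payload: bytes,
                      access_key: str, secret_key: str) -> urllib.request.Request:
    """PUT request that would add one file to an archive.org item."""
    return urllib.request.Request(
        url=f"https://s3.us.archive.org/{quote(item)}/{quote(filename)}",
        data=payload,
        method="PUT",
        headers={
            # Assumption: key pair of the (hypothetical) dlang account.
            "Authorization": f"LOW {access_key}:{secret_key}",
            # Create the item on first upload and mark it as web content.
            "x-archive-auto-make-bucket": "1",
            "x-archive-meta-mediatype": "web",
        },
    )
```

A daily job could pass the list from `wget_warc_command` to `subprocess.run`, read the resulting `.warc.gz` file, and send the request with `urllib.request.urlopen`. Batching per day rather than per paste follows the suggestion above about not stressing the derive queue.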
Feb 08 2016
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/8/16 11:44 AM, Wyatt wrote:
 On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu
 wrote:
 Dpaste currently does not expire pastes by default. I was thinking
 it would be nice if it saved them in the Wayback Machine such that
 they are archived redundantly.

 I'm not sure what's the way to do it - probably linking the
 newly-generated paste URLs from a page that the Wayback Machine
 already knows of.

 I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec (when
  the WM does not see a link that is searched for, it offers the
 option to archive it) obtaining
 https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.




 Thoughts?
 You want it in Wayback? Sounds like you need some WARC [0].

 Since anyone can upload to IA (using a nice S3-like API, even [1]),
 this should be pretty uncomplicated. If you can get a list of all
 the paste URLs, you can use wget [2] to build the WARC fairly
 trivially [3]. Then I'd suggest getting a dlang account and making
 an item [4] out of it. Just make sure it's set to mediatype:web and
 it should get ingested by Wayback.

 After that? Generate a WARC when a paste is made and use the dlang
 S3 keys to add it to the previous item (or maybe just do it daily
 or weekly so as to not stress the derive queue too much).

 I'm pretty sure that's all that's needed.
That's intense. I think a simple page (or chained linked collection of pages) containing links to all pastes defined would suffice.

For example, consider defining dpaste.dzfl.pl containing a link to dpaste.dzfl.pl/today.html. That would contain e.g. the links generated today and a button "More" linked to dpaste.dzfl.pl/2016-02-08.html (which would be yesterday). That in turn would contain links to yesterday's pastes and a link to the day before, etc.

My understanding is this is enough to have Wayback archive all pastes.
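
The chained daily pages could be generated along these lines. This is a minimal Python sketch, not how dpaste works: the function names are hypothetical, and the page layout is an assumption beyond the "More" link and the date-named files mentioned above.

```python
from datetime import date, timedelta
from html import escape
from typing import List

BASE = "http://dpaste.dzfl.pl"


def day_page_name(day: date) -> str:
    """File name of the index page for one day, e.g. 2016-02-08.html."""
    return f"{day.isoformat()}.html"


def render_day_page(day: date, paste_ids: List[str]) -> str:
    """HTML listing the pastes of one day plus a link to the previous day."""
    previous = day - timedelta(days=1)
    items = "\n".join(
        f'<li><a href="{BASE}/{escape(pid)}">{escape(pid)}</a></li>'
        for pid in paste_ids
    )
    return (
        f"<html><body><h1>Pastes for {day.isoformat()}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        # The "More" link is what lets a crawler walk back day by day.
        f'<a href="{BASE}/{day_page_name(previous)}">More</a>\n'
        "</body></html>\n"
    )
```

Serving the output for the current day as today.html, and writing it to its dated file once the day is over, would give a crawler that starts at the front page a path to every paste.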
 I'm pretty sure that's Andrei's thought, too. It's a pastebin; people
 use it to make web links to pasted things. If it were to disappear, a
 lot of links would break very permanently because Heritrix has no way
 to index and crawl the site.
Yah.

Andrei
Feb 09 2016
Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Sunday, 7 February 2016 at 21:59:00 UTC, Andrei Alexandrescu 
wrote:
 Dpaste currently does not expire pastes by default. I was 
 thinking it would be nice if it saved them in the Wayback 
 Machine such that they are archived redundantly.

 I'm not sure what's the way to do it - probably linking the 
 newly-generated paste URLs from a page that the Wayback Machine 
 already knows of.

 I just saved this by hand: http://dpaste.dzfl.pl/2012caf872ec 
 (when the WM does not see a link that is searched for, it offers 
 the option to archive it) obtaining 
 https://web.archive.org/web/20160207215546/http://dpaste.dzfl.pl/2012caf872ec.


 Thoughts?

 Andrei
I'm not sure the Wayback Machine should be used for version control; if you want to keep a history of your paste, I suggest using gist.github.com.

I view the Wayback Machine as a view of what the web used to look like, not necessarily of what information was in it.
Feb 08 2016
Wyatt <wyatt.epp gmail.com> writes:
On Monday, 8 February 2016 at 20:02:41 UTC, Jesse Phillips wrote:
 I'm not sure if the wayback machine should be used for version 
 control, if you want to keep a history of your paste I suggest 
 using a gist.github.com.

 I view the wayback machine as a view for what the web used to 
 look like not necessarily what information was in it.
I'm pretty sure that's Andrei's thought, too. It's a pastebin; people use it to make web links to pasted things. If it were to disappear, a lot of links would break very permanently because Heritrix has no way to index and crawl the site.

-Wyatt
Feb 08 2016