    Topic: Raw data of bitcointalk forum - how to avoid or correctly scrap information
    JeremyB (OP)
    Sr. Member

    Activity: 812
    Merit: 270



    March 18, 2018, 08:33:25 AM
    Last edit: March 18, 2018, 10:17:19 AM by JeremyB
     #1

    I recently discovered some posts about the Merit system that are based on a raw data file provided by theymos:

    Here you go: https://bitcointalk.org/merit.txt.xz

    Similar to trust.txt.xz, it'll be updated weekly. It will show only the last 120 days of data; someone else should archive the old ones if you want them.
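
    For anyone who wants to look inside such a dump, here is a minimal Python sketch of decompressing an .xz file and reading it as rows. The tab-separated layout and the column names are assumptions (neither is stated in this thread), so compare them with the first line of the real file. `parse_dump` and `ASSUMED_COLUMNS` are hypothetical names, and the demo runs on synthetic data rather than the actual merit.txt.xz.

```python
import csv
import io
import lzma

# Assumed column layout; verify against the header line of the real file.
ASSUMED_COLUMNS = ["time", "number", "msg", "user_from", "user_to"]


def parse_dump(xz_bytes, columns=ASSUMED_COLUMNS):
    """Decompress an .xz dump and return its rows as a list of dicts."""
    text = lzma.decompress(xz_bytes).decode("utf-8")
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = []
    for fields in reader:
        if not fields or fields == columns:
            continue  # skip blank lines and a header row, if present
        rows.append(dict(zip(columns, fields)))
    return rows


# Self-contained demo on synthetic data; no network access is needed.
sample = (
    "time\tnumber\tmsg\tuser_from\tuser_to\n"
    "1521360000\t1\t100.msg200\t10\t20\n"
)
rows = parse_dump(lzma.compress(sample.encode("utf-8")))
print(rows[0]["user_to"])
```

    For the real file, the downloaded bytes would be passed to `parse_dump` in place of the synthetic sample.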

    Then there is another post from LoyceV that points to a username/ID mapping (on pastebin), but I don't know who provided it, how, or when it was last updated.

    For some time I have been thinking about retrieving information from the forum to build some statistics, but that would involve scraping a lot of pages.

    So my first question is: is there a list of the raw data available for use? I have heard about the trust data (https://bitcointalk.org/trust.txt.xz), but what I am really looking for relates to the forum architecture: thread parent/children, the parent thread of each message, message author, user ID/name mapping, etc.
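
    To make the kind of data I mean concrete, here is a rough Python sketch of those relations. Every class and field name is hypothetical and for illustration only; none of them reflects an actual forum dump format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    user_id: int
    name: str


@dataclass
class Thread:
    thread_id: int
    parent_id: Optional[int]  # hypothetical: None for a top-level entry


@dataclass
class Message:
    message_id: int
    thread_id: int  # parent thread of the message
    author_id: int  # refers to User.user_id


def messages_by_thread(messages):
    """Group message ids under the id of their parent thread."""
    grouped = {}
    for m in messages:
        grouped.setdefault(m.thread_id, []).append(m.message_id)
    return grouped


def names_by_id(users):
    """Build the user id -> name mapping."""
    return {u.user_id: u.name for u in users}


# Synthetic example data.
users = [User(1, "alice"), User(2, "bob")]
messages = [Message(10, 100, 1), Message(11, 100, 2), Message(12, 101, 1)]
grouped = messages_by_thread(messages)
names = names_by_id(users)
```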

    Second question: if this kind of data is not available, what are the policies on scraping the forum?

    EDIT: I just found theymos's thread about new data dumps here: https://bt.irlbtc.com/view/3151741.0
    So I have locked this thread.