[WIP] Way to Build Grophup

Grophup is my new personal project to study big data analysis.

Data set

Tencent QQ group data, leaked around 2012.

Tech Stack:

  • SQL Server 2008 R2, to restore the original data set.
  • Neo4j, a graph database to store the data and its relations.
  • Python 3, with these libraries:
      • pipenv: sets up the virtualenv
      • pymssql: Python library for connecting to SQL Server

Day 0

  • install SQL Server
  • WIP: import the data into SQL Server
      • import query: sp_attach_single_file_db @dbname='GroupData5_Data', @physname='[path to your data set folder]\GroupData5_Data.MDF'

Day 1, 2017-03-30

  • init git repo, GitHub
  • set up a Python virtualenv with pipenv
  • connect to SQL Server from a Python script
      • SQL Server must first be configured to accept TCP/IP connections (doc). If you can't find the Configuration Manager, run C:\Windows\SysWOW64\SQLServerManager10.msc to open it.
      • in the Python script: conn = pymssql.connect(server='SX-DEV', database="GroupData1_Data")
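
A minimal sketch of the connection step, assuming pymssql is installed and TCP/IP is already enabled on the server. The helper names connection_settings, list_tables and main are hypothetical; the server and database names are the ones from the notes above. Listing sys.tables is just a quick way to check that the restored database is reachable.

```python
def connection_settings(server="SX-DEV", database="GroupData1_Data"):
    """Collect the pymssql.connect() keyword arguments in one place."""
    return {"server": server, "database": database}


def list_tables(conn):
    """Return the names of the user tables in the connected database."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sys.tables ORDER BY name")
    return [row[0] for row in cursor.fetchall()]


def main():
    # Imported here so the helpers above can be used without pymssql installed.
    import pymssql

    conn = pymssql.connect(**connection_settings())
    try:
        for name in list_tables(conn):
            print(name)
    finally:
        conn.close()
```

Calling main() on the machine that hosts SQL Server should print one table name per line if the connection works.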

Day 2, 2017-04-05

  • add Django to the project
  • set up a local Neo4j with Docker
  • install Python 3.6
      • use Python 3.6 for the pipenv environment; on Windows: pipenv install --python=E:\python36\python.exe
      • pymssql does not support Python 3.6 yet, so I still need to use Python 3.5
  • create a Django command to port the data

Day 3, 2017-04-07

  • set up dotenv
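
A rough sketch of what the dotenv setup might look like. The notes don't say which dotenv library was used, so this assumes the python-dotenv package. The variable names MSSQL_SERVER, MSSQL_DATABASE and NEO4J_URI, and both function names, are hypothetical.

```python
import os


def load_settings(env=None):
    """Read connection settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "mssql_server": env.get("MSSQL_SERVER", "localhost"),
        "mssql_database": env.get("MSSQL_DATABASE", ""),
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
    }


def load_settings_from_dotenv():
    """Load a .env file into os.environ, then read the settings."""
    # python-dotenv is an assumption; imported here so load_settings()
    # works without it.
    from dotenv import load_dotenv

    load_dotenv()  # variables that are already set are not overridden
    return load_settings()
```

The point of the split is that machine-specific values (server name, passwords) stay in a .env file outside version control, while the code only reads environment variables.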

Day 4, 2017-04-08

  • set up the Neo4j connector, py2neo
  • add methods for creating group nodes
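
A minimal sketch of a method that adds group nodes, assuming py2neo v4 or later and a Neo4j version that accepts the $param syntax. The Group label, the group_id and name properties, the function names, and the URI and credentials are all hypothetical placeholders, not the project's actual schema. It uses MERGE rather than CREATE so that running it twice for the same group does not produce a duplicate node.

```python
MERGE_GROUP = (
    "MERGE (g:Group {group_id: $group_id}) "
    "SET g.name = $name"
)


def add_group_node(graph, group_id, name):
    """Create the Group node if it is missing, then set its name."""
    graph.run(MERGE_GROUP, group_id=group_id, name=name)


def connect_graph(uri="bolt://localhost:7687", user="neo4j", password="password"):
    """Open a py2neo Graph; the URI and credentials are placeholders."""
    # Imported here so add_group_node() can be used with any object
    # that offers a compatible run() method.
    from py2neo import Graph

    return Graph(uri, auth=(user, password))
```

Usage would be along the lines of add_group_node(connect_graph(), 12345, "some group") once the Docker container is up.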

Day 5, 2017-04-12, I got engaged today!

  • improve the port command to handle exceptions; node creation should resume where it stopped.
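
One way the resume behaviour could be sketched, not necessarily how the actual command does it. It assumes rows arrive as (id, name) tuples in ascending id order and that the last ported id is kept in a small JSON checkpoint file. All names here (load_checkpoint, save_checkpoint, port_rows, create_node) are hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return the last successfully ported row id, or None on a first run."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)["last_id"]


def save_checkpoint(path, last_id):
    """Record the id of the last row that was ported successfully."""
    with open(path, "w") as fh:
        json.dump({"last_id": last_id}, fh)


def port_rows(rows, create_node, checkpoint_path):
    """Port rows in id order, skipping those at or below the checkpoint.

    Returns True if every row was ported, False if an error stopped the run.
    """
    last_id = load_checkpoint(checkpoint_path)
    for row_id, name in rows:
        if last_id is not None and row_id <= last_id:
            continue  # already ported in an earlier run
        try:
            create_node(row_id, name)
        except Exception as exc:
            print("stopped at row {}: {}".format(row_id, exc))
            return False
        save_checkpoint(checkpoint_path, row_id)
        last_id = row_id
    return True
```

Writing the checkpoint after every row keeps the sketch simple; for a data set this large, saving it every few thousand rows would probably be faster, at the cost of re-porting a few rows after a crash.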