HIKM 2020 Tutorial 4 Knowledge Discovery and Data Science from the Command Line (Unix shell)

Andrew Stranieri

Monday, 3 February 2020 from 1:30 pm to 4:30 pm (AEDT)


Ticket Information

Type: General
Remaining: 16 tickets
Sales end: 31/01/2020
Price: Free


Event Details

Title: Knowledge Discovery and Data Science from the Command Line (Unix shell)

Organizers:

Prof. Dr. Andreas Schmidt (1, 2), Dr. Steffen Scholz (1)


  1. Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology

    Karlsruhe, Germany

    email: andreas.schmidt@kit.edu, steffen.scholz@kit.edu


  2. Department of Computer Science and Business Information Systems, Karlsruhe University of Applied Sciences

    Karlsruhe, Germany

    email: andreas.schmidt@hs-karlsruhe.de

Primary email contact: andreas.schmidt@kit.edu

Short Bio:


Prof. Dr. Andreas Schmidt is a professor at the Department of Computer Science and Business Information Systems of the Karlsruhe University of Applied Sciences (Germany). He lectures in the fields of database information systems, data analytics, and model-driven software development.

Additionally, he is a senior research fellow in computer science at the Institute for Applied Computer Science of the Karlsruhe Institute of Technology (KIT). His research focuses on database technology, knowledge extraction from unstructured data/text, Big Data, and generative programming. Andreas Schmidt received his diploma in computer science from the University of Karlsruhe in 1995 and his PhD in mechanical engineering in 2000. Dr. Schmidt has published numerous papers in the fields of database technology and information extraction. He regularly gives tutorials at international conferences on Big Data-related topics and model-driven software development.

Prof. Schmidt has accepted sabbatical invitations from renowned institutions such as the Systems Group at ETH Zurich in Switzerland and the Database Group at the Max Planck Institute for Informatics in Saarbrücken, Germany.


Dipl.-Ing. Dr. Steffen G. Scholz has more than 15 years of R&D experience in the field of polymer micro and nano replication, with a special focus on injection moulding and the relevant tool-making technologies. He is an expert in process optimization and in algorithm design and development for micro replication processes. He studied mechanical engineering with a special focus on plastics processing and micro injection moulding, and obtained his degree from the University of Aachen (RWTH). He obtained his PhD from Cardiff University in the field of process monitoring and optimization in micro injection moulding, and led a team in micro tool making and micro replication at Cardiff University. Dr. Scholz joined KIT in 2012, where he now leads the group for process optimization, information management and applications (PIA).

Tutorial description:

For data analysis and knowledge discovery, we typically load the data into a dedicated tool, such as a relational database, the statistics program R, Mathematica, or some other specialized tool, to perform our analysis.


But there is often another option, available on nearly every computer with sufficient mass storage. Unix-like environments, with shells such as bash, csh, …, provide a set of powerful command-line tools to manipulate and transform data and to perform some kinds of analysis, such as aggregation. Besides being freely available, these tools have the advantage that they can be used immediately, without first transforming and loading the data into a target system. Another important point is that they are typically stream-based, so huge amounts of data can be processed without running out of main memory. With the additional use of gnuplot, sophisticated plots can easily be generated.
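As a minimal sketch of stream-based aggregation, the following pipeline computes per-group totals, roughly what SQL's `GROUP BY` with `SUM` does. The file `sales.csv` and its two columns (region, amount) are hypothetical sample data created inline. Because awk reads its input one line at a time, memory use depends on the number of distinct groups rather than on the size of the file.

```shell
# Use byte-order collation so the final sort is locale-independent
export LC_ALL=C

# Hypothetical sample data: region,amount (no header line)
printf '%s\n' 'north,10' 'south,5' 'north,7' 'east,3' 'south,2' > sales.csv

# Single pass over the stream; awk keeps one running total per region.
# Roughly: SELECT region, SUM(amount) FROM sales GROUP BY region
awk -F',' '{ total[$1] += $2 } END { for (r in total) print r "," total[r] }' sales.csv |
  sort   # awk's for-in order is unspecified, so sort for stable output
```

With the sample data this prints `east,3`, `north,17` and `south,7`, one per line.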


The aim of this tutorial is to present the most useful tools, such as cat, grep, tr, sed, awk, comm, uniq, join, split, bzip2, wget, etc., and to give an introduction to how they can be combined. For example, as the tutorial will show, many queries that would typically be formulated in SQL can also be performed with these tools.
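As a sketch of how an SQL query can map onto these tools, the pipeline below approximates `SELECT c.name, o.amount FROM customers c JOIN orders o ON c.id = o.cust_id WHERE o.amount > 100`. The files `customers.csv` and `orders.csv` are hypothetical sample tables. `join` expects both inputs to be sorted on the join field, which is why they are sorted first; awk then plays the role of the `WHERE` clause and the column projection.

```shell
# Byte-order collation keeps sort and join consistent with each other
export LC_ALL=C

# Hypothetical tables as comma-separated files: customers(id,name), orders(cust_id,amount)
printf '%s\n' '1,Alice' '2,Bob' '3,Carol' > customers.csv
printf '%s\n' '2,150' '1,80' '3,120' '1,200' > orders.csv

# join requires both inputs sorted on the join field (field 1 in both files)
sort -t',' -k1,1 customers.csv > customers.sorted
sort -t',' -k1,1 orders.csv > orders.sorted

# JOIN ... ON id = cust_id: join emits id,name,amount for every matching pair;
# awk then applies WHERE amount > 100 and projects name and amount
join -t',' customers.sorted orders.sorted |
  awk -F',' '$3 > 100 { print $2 "," $3 }'
```

With the sample data this prints `Alice,200`, `Bob,150` and `Carol,120`; Alice's order of 80 is filtered out by the awk condition.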


The tutorial will also include hands-on parts in which the participants perform a number of practical data analysis, transformation, and visualization tasks.


Target Audience:

Level: Intermediate. Participants should be familiar with using a shell such as bash, csh, DOS shell, …


Materials to be distributed to the attendees:

  • Slideset

  • Command refcard

  • Practical exercises

Duration: 3 hours

  • Introduction 15 min.

  • Commands/tools for structured data 45 min.

  • Hands-on Part I 30 min.

  • Commands/tools for unstructured data 30 min.

  • Visualization 30 min.

  • Hands-on Part II 30 min.

Software requirements for the hands-on parts:

  • Unix and Mac users: none; the required tools are already part of your distribution.

  • Windows users: please install Cygwin on your computer (https://www.cygwin.com/). gnuplot must additionally be selected during the Cygwin installation process.

Relevant publications by the organizers (previously given tutorials):

A. Schmidt, S. Scholz: “How to build a Search-Engine with Common Unix-Tools”. The Tenth International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2018), Nice, France, 20-24 May 2018.

A. Schmidt: “A Practical Approach for Teaching Model Driven Software Development – A plea for the ‘from scratch’ Approach”. Workshop at EDUCON 2018, IEEE Global Engineering Education Conference, Santa Cruz de Tenerife, Spain, 17-20 April 2018.

A. Schmidt, S. Scholz: “Data Science using the Shell”. 20th International Conference on Enterprise Information Systems (ICEIS), Funchal, Madeira, 21-24 March 2018.

A. Schmidt, S. Scholz: “An Introduction into statistical computing with R”. Seventh International Conference on Internet Technologies & Applications, Wrexham, North Wales, September 2017.

A. Schmidt, S. Scholz: “Data Manipulation and Data Transformation using the Shell”. Ninth International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA-2017), Barcelona, Spain, 21-25 May 2017.

A. Schmidt, S. Scholz: “An Introduction into statistical computing and natural language processing with R”. Tutorial at the 8th International Conference on Advances in Databases, Knowledge, and Data Applications, Lisbon, Portugal, June 2016.

D. Kimmig, A. Schmidt: “The Hadoop Core – Understanding Map Reduce and the Hadoop Distributed File System”. Tutorial, DataSys 2013, Lisbon, Portugal, 17-22 November 2013.

A. Schmidt: “Overview of the Hadoop Ecosystem”. Fifth International Conference on Internet Technologies & Applications (ITA-13), Wrexham, Wales, September 2013.

A. Schmidt: “The power of regular expressions in the software development process”. International Conference on Software Engineering and Applications (SEA 2010), Marina del Rey, USA, 8-10 November 2010.

A. Schmidt: “Building a Multi-Purpose Generator Engine”. 11th International Conference on Internet and Multimedia Systems and Applications (IMSA 2007), Honolulu, Hawaii, USA, 20-22 August 2007.

A. Schmidt: “Programming Patterns and Architecture of Web-based Database Applications”. 8th International Conference on Internet and Multimedia Systems and Applications, Kauai, USA, 16-18 August 2004.


When & Where


Swinburne University of Technology
John Street
Hawthorn, VIC 3122
Australia

Monday, 3 February 2020 from 1:30 pm to 4:30 pm (AEDT)



Organiser

Andrew Stranieri

HIKM 2020, the Australasian Workshop on Health Informatics and Knowledge Management, is held in conjunction with the Australasian Computer Science Week (ACSW), the premier event for computer science researchers across Australasia. ACSW is attended by many national and international delegates, including academics, industry representatives, and higher degree by research (HDR) students.
