Posted by: Pieter Van Gorp
on Jul 05, 2010
Tagged in: Untagged
Posted by: Pieter Van Gorp
on Jul 01, 2010
Tagged in: Untagged
Steffen Mazanek is currently presenting the live contest challenge for TTC. The domain under consideration is lambda calculus. This may sound theoretical at first, but the concrete challenges involve many practical model-to-model and model-to-text issues.
The reference solution to the challenge is contained in this ZIP archive and can be evaluated in SHARE here. Note that during the contest you don't really need the SHARE demo, since the ZIP archive contains sufficient input and output models (a test suite for all transformation solutions).
Have fun, be sharp!
Posted by: Pieter Van Gorp
on Jul 01, 2010
Tagged in: Untagged
The slides describing the workshop program, publication strategy, etc. are available at Slideboom.
Regards, and enjoy the workshop, Pieter
Posted by: Pieter Van Gorp
on Apr 20, 2010
I have finished a new screencast (PDF copy) as documentation for the new SHARE feature that enables people to stress-test each other's solutions using input models that they have stored locally.
Last year's tool contest indicated that many people required some mechanism to upload content to SHARE and at the time, we had to give people full internet access in the remote virtual machines... which is obviously undesirable on a larger scale since:
- SHARE contains various artefacts that should not be downloadable to the local machines of SHARE users,
- one could abuse SHARE for hacking other sites, sending spam, etc. (Although we could trace this back based on IP information, it is better to prevent such activities.)
In a nutshell, SHARE users now have a private folder on the virtual machine server (currently we do not allow more than 500MB per user, but we will revisit such restrictions on a regular basis). Users can use rsync to transfer data from their local machine to their remote SHARE folder. This is useful, for example, when a reviewer wants to test the correctness of a research contribution further by using some of his own test inputs.
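As a rough illustration of the upload step, the sketch below checks a local folder against the 500MB limit and then assembles an rsync command line. The account name, host name and remote path are hypothetical placeholders (the actual SHARE connection details are not given in this post), and reading 500MB as 500 × 1024 × 1024 bytes is an assumption.

```python
import os

# Assumption: the 500MB per-user limit is read as 500 * 1024 * 1024 bytes.
QUOTA_BYTES = 500 * 1024 * 1024


def folder_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def fits_quota(local_dir, quota=QUOTA_BYTES):
    """Check whether the local folder would fit into the private folder."""
    return folder_size(local_dir) <= quota


def build_rsync_command(local_dir, user, host, remote_dir):
    """Assemble an rsync command (as an argument list) for uploading local_dir.

    -a keeps the directory structure and file attributes, -z compresses
    data in transit. user, host and remote_dir are hypothetical placeholders.
    """
    source = local_dir.rstrip("/") + "/"  # trailing slash: copy the folder's contents
    return ["rsync", "-az", source, f"{user}@{host}:{remote_dir}"]
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` once `fits_quota` has returned `True`; building the command separately makes it easy to inspect before anything is transferred.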
As a convenient side-effect of this new feature, users can now also store data across immutable sessions. Previously, state transfer between sessions was only possible when using a mutable clone, but now the 500MB folder will persist across any session.
Personally, I already find this very useful in the context of various teaching activities, since now TU/e students that use a Mac can work for example with CPNtools without requiring a mutable disk (which saves us a lot of hard-drive space). Moreover, in a research context, it makes a lot of sense to have a private server folder in SHARE because that enables you to test for example all tool contest solutions to the BPMN2BPEL case study using the same input BPMN models.
Enjoy, and please do not hesitate to send me some feedback (via e-mail or via the discussion features of this site),
Pieter Van Gorp
Posted by: Ralf Laemmel
on Mar 30, 2010
Tagged in: Untagged
Giving away your corpus (in empirical language analysis or elsewhere) is perhaps not a well-established practice, but it's nothing original or strange either. It makes a lot of sense! Stop limiting the impact of your research efforts! Stop wasting the time of your community members! Sharing corpora is one of the many good ideas of Research 2.0: see SSE'10 (and friend events), eScience @ Microsoft, R2oSE, ...
Computer Science vs. Science
When you do academic CS research in programming- or software development-related contexts, then the culture of validation is these days such that you are often expected to provide online access to your program, application, library, tool, what have you as an implementation or illustration. There are various open-source repositories that are used to this end--as a backend (a storage facility), but author-hosted download locations of any sort are also widely used. In basic terms, if you write a paper, you include a URL. (There is one exception: if your work leverages Haskell, you can usually include the complete source code right in your paper so that one gets convenient access through copy and paste. Sorry for the silly joke.) Metadata-wise, common practice is nowhere near perfect, but it's perfect compared to what follows.
When you do empirical analysis in CS, which results in some statements or data about software/IT artifacts, then the culture of validation is essentially the one of science. In particular, reproducibility is the crucial requirement. You describe the methodology of your analysis in a detailed manner. So you define your hypotheses, your input, your techniques for measurement, your results (which you also interpret), your threats to validity, what have you. Downloads aren't integral to science. What would you want to download anyway?
Message of this post?!
I suggest that various artifacts of an empirical analysis in CS in general, and in empirical language analysis in particular, qualify for a valuable download. In this post, I want to call out the corpora (as in corpora of source projects, buildable projects, built projects, runnable projects, ran projects, demos, etc.).
Beyond reproducibility in CS
What's indeed not yet commonplace (if ever done) is convenient access to the corpora underlying empirical analyses. Consider for example Baxter et al.'s paper on structural properties of Java software, or Cranor et al.'s paper on P3P deployment. These are two seminal papers in their fields. I would lose a night of sleep over each of the two corpora.
Wouldn't it be helpful for researchers if such corpora were made available for one-click download incl. useful metadata, potentially even tooling? Let's suppose such convenient access became a best practice. First, reproducibility would be improved. Second, derived research would be simplified. Third, incentives for collaboration would be added.
I contend that convenient access adds little pain for the original author, but adds huge value for the scientific community. Why should we need to execute the description of some corpus from some paper, if it requires substantial work for us, when the corpus could easily be shared by the primary author? Why should we work hard to "reproduce the corpus" if a little help from the original authors would make reproducibility (of the corpus and most of the research work, perhaps) a charm?
Naysayers -- get lost
I can think of many reasons why 'convenient access' is not getting off the ground. Here are a few obvious ones:
- "It's extra work, even if it is little extra work." This problem can be solved if incentives are created. For instance, publications on empirical analysis with 'convenient access' to the corpus could be rated higher than those w/o. Also, just like tool papers in many venues, there could be corpus papers.
- "There is sufficient, inconvenient access available already." At least, for one of the two examples above, I fully understand how I could go about gathering the corpus myself, but I have not executed this plan, even though I could really use this corpus in some ongoing research activity. It's just too much work for me. I am effectively hampered in benefitting from the authors' research beyond their immediate results.
- "Provision of convenient access is too difficult." Think of a corpus of Java programs. Suddenly, an access provider gets into the business of configuration management. After all, convenience would imply that the corpus builds and runs out of the box. I think the short-term answer is that access to the corpus w/o extra "out-of-the-box" magic is still more convenient than no access. The long-term answer is that we may need a notion of remote access to corpora, where I can give you access to my corpus in my environment, through appropriate, web-based or service-oriented interfaces.
- "Convenient access gives a head start to the competition." I refuse to believe that this is really too relevant in academic practice. For instance, I am sure that the research groups behind the above-mentioned papers have no "corpus monopoly" in mind. I have not done much work on empirical analysis, but I have experience with papers that "give away details", and I must say that those papers which give away the most typically coincide with those which have the highest impact in all possible ways.
- "There is copyright & Co. in the way." Yes, there is. This is a serious problem, and we had better focus on solving it soon if we want to get anywhere with science and (IT) society in this age. This post would just explode if I tried to comment on that issue here. There are many good ideas around on this issue, and we all understand that some amount of sharing works even now in this very imperfect world as we have it. If you are pro-Research 2.0, don't get bogged down by this red herring.
Well, I can think of quite a number of other reasons, but I reckon that all the usual suspects have been named, and everything else can be delegated perhaps to some discussion on this blog or elsewhere.
Regards,
Ralf Lämmel
PS: CS is of an age that empirical research is becoming viral and vital. I am grateful for talking to Jean-Marie occasionally with his lucid vision of Research 2.0 and linguistics for software languages---two topics that are strongly connected. Empirical analysis of software languages has got to be an integral part of software language linguistics. Specialized software-engineering conferences like SLE, ICPC and MSR or even big ones like ICSE or OOPSLA have included empirical research for a while now.
Posted by: Pieter Van Gorp
on Feb 28, 2010
Tagged in: Untagged
Hi all,
I really look forward to R2oSE 2010.
As an online warm-up, I would like to get some community input for the following issue: during the yearly transformation tool contest [report], we typically have an interactive session to derive a feature matrix for comparing the solutions that were submitted to our case studies. We start with some brainstorming, then cluster features on a blackboard, and finally make three feature matrices that we print on paper and hand out to all workshop participants. This enables a live peer review: solution submitters get about 15 minutes to demonstrate their transformation program and the audience fills in one matrix column in the meanwhile.
By assigning weights to the different features, we then analyze which solution satisfies the criteria that were proposed by the audience the best. The following table aggregates the data from 9 evaluations of 10 contributions: BPMN2BPEL solution evaluation results.
The advantage of this interactive approach is that the winner of our contest is usually supported by most of our workshop attendees. Currently, though, the manual process of holding the brainstorm session, clustering on the blackboard and then creating a spreadsheet is very time-consuming, and the final step (creating the actual spreadsheet, documenting the meaning of a criterion, ...) needs to be done by one of the organizers (or we simply run out of time for considering the actual solutions...). This is undesirable, since it allows (probably accidental) bias and is too error-prone.
Therefore, I have been thinking of web 2.0 support for streamlining this process. The idea is to involve workshop participants as soon as possible in the creation of the digital evaluation form (or underlying feature model) and to enable some online conflict resolution/integration. I am currently looking for a website that provides:
- an account system (since we want to avoid anonymous rant) [example: planet-research20.org, but several alternatives are out there],
- a (semi-)structured editor for feature matrices [could be google spreadsheets, a feature diagram editor, ...]
- (optional) a mechanism for merging contributed feature matrices (reconciling different documents/models created using the editor from the previous item),
- a mechanism to assign weights to features (to enable the automatic computation of the "best" solution),
- (optional) a mechanism to override the weights (on a user basis) after the workshop. That would enable end-users to assign a winner according to their own preferences. Usually, they would do that for finding which solution they could use out of the box or which solution they would like to extend, or ... (whatever we currently cannot imagine yet)
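The two weighting items in the list above can be sketched in a few lines of Python. The solution names, feature names, scores and weights below are invented purely for illustration, and a plain weighted sum is only one plausible way to aggregate; it is not necessarily the scheme used at the contest.

```python
def weighted_scores(matrix, weights):
    """Return {solution: weighted sum of its feature scores}.

    matrix  maps solution name -> {feature name -> score}
    weights maps feature name -> weight; features without a weight count as 0.
    """
    return {
        solution: sum(weights.get(feature, 0) * score
                      for feature, score in features.items())
        for solution, features in matrix.items()
    }


def best_solution(matrix, weights, overrides=None):
    """Pick the top-scoring solution, optionally with per-user weight overrides."""
    effective = dict(weights)
    effective.update(overrides or {})
    scores = weighted_scores(matrix, effective)
    return max(scores, key=scores.get)


# Hypothetical data: two solutions scored on three features.
matrix = {
    "solution_a": {"completeness": 4, "readability": 2, "performance": 5},
    "solution_b": {"completeness": 3, "readability": 5, "performance": 3},
}
workshop_weights = {"completeness": 3, "readability": 1, "performance": 1}

print(best_solution(matrix, workshop_weights))
print(best_solution(matrix, workshop_weights, {"readability": 4}))
```

With the workshop weights, solution_a scores 19 against 17 for solution_b; a user who overrides the readability weight to 4 gets solution_b instead (32 against 25). Keeping the overrides separate from the shared weights means the workshop result stays intact while each user can still derive a personal winner.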
I can imagine this might at first sound rather specific to our transformation tool contest (TTC workshop) but I think that such a platform would be very useful for various other research purposes (researchers often want to structure a discussion about which algorithm/tool/... performs "best" according to too implicit criteria!) Once we better understand how to use (& improve) such a platform, we have a new means to perform survey research: instead of doing a large literature review, installing some tools, creating a feature model and assessing the result based on our personal understanding, we have a more efficient (and probably more effective) community based approach.
Therefore, I would like to ask the Research 2.0 community to suggest some candidate platforms that either already support some of the above functionality or that could be used to build that functionality. Notice the popularity of light-weight platforms such as doodle (when it comes to meeting planning) so perhaps we rather need a clever integration of light-weight web applications here as well (google docs + ???) rather than an inflexible "dedicated" platform?
I welcome all your input (comments, suggestions, questions, ...), please do not hesitate to contact me.
Sincerely, Pieter Van Gorp r2ose2010 co-organizer Eindhoven University of Technology
Posted by: Adrian Kuhn
on Nov 05, 2009
Tagged in: Untagged
I joined twitter one and a half years ago, after having a handwritten RSS feed for years. And gosh, I did not have the slightest idea what kind of journey I was embarking on! While my handwritten feed was limited to published papers, I soon learned to love the social aspect of twitter. All of a sudden, folks would react to my news posts, like when Kent Beck contacted me regarding the code maps of JUnit. However, I have to admit that since the user explosion in 2009, twitter has become much less personal.
One of the success factors of twitter is that everyone can (and has to) find their own personal usage model.
Here is how I use twitter for me and my research:
One of my motivations is to "maximize the value of your keystrokes". For example, whenever I recommend a paper to a friend in private, I repeat this on twitter, etc. I first read of this on Jeff Atwood's blog, but I guess the idea goes back to Jon Udell.
Often, however, we cannot maximize our keystrokes because research 1.0 does not allow us to do so (or at least we fear that we cannot do so). In these cases I try to find solutions like publishing the wordle cloud of a paper under submission, which I first saw done on Tom Zimmermann's blog.
A nice use case of twitter is to follow the hashtag of a conference. Alas, most scientific conferences do not have good coverage (or none at all). There is e.g. not much point in tweeting from a talk without mentioning the speaker! In my experience, technology and software conferences get better coverage than research conferences, but things have started to change (see e.g. the recent MODELS/GPCE conference, or the GTTSE summer school).
Of course, I also follow other researchers as well as practitioners (hey, the whole C2 crowd again united in one place!) Alas, since twitter broke replies, it's much harder to stumble upon new folks.
Creating twitter accounts for your research projects. Both of my current projects (jexample and codemap) have their own twitter account. I did this after reading about the awesome success of the mars phoenix account by one NASA engineer, but after some time I stopped blogging in the first person as if I were the project.
In the beginning the project accounts were limited to updates and communication with users, but over time I started to post anything related to the project's topic that I'd stumble upon (maximize the value of your keystrokes, again). One of the downsides that I see with splitting off these two accounts is that folks that visit my main account will not learn about my major fields of research, which I try to relax with occasional retweets.
Some folks in our lab use the feed of their research project to post all SVN commit messages. I am unsure about this practice; putting unfiltered streams from other sources on twitter does not seem to live up to the full potential of this platform.
Recently I created a twitter account for a lecture that I coach as teaching assistant. I cannot yet tell where this leads; the lecture starts in February. However, I have used a blog in the last years and I assume it will lead me in a similar direction, i.e. posting links to additional reading material as well as useful tools and Eclipse plugins.
Lukas uses twitter as well in the current software engineering lecture. However, his use is very specific to that lecture since the students are building a twitter client, and thus not suitable for other lectures.
I also created an account for our upcoming ICSE workshop on software search engines (suite2010), but again I cannot yet tell where this will lead. We will certainly find better use than just posting deadlines...
Over time, my whole research group joined twitter as well, including Oscar. Most of them joined after I told them about maximizing keystrokes and how awesome following a conference is. (In the meantime, I even stopped being the most active twitterer in the group.) We found that the recent twitter list feature is a great way to present your research group on twitter. We created a new account for the group including lists for staff, alumni, students, projects, etc.
See you on twitter :)
Posted by: Pieter Van Gorp
on Nov 05, 2009
Tagged in: Untagged
Hi all, Jean-Marie is so kind as to present SHARE on my behalf.
In the following slideset, you find some more background information and screenshots: http://www.slideboom.com/presentations/109003/SHARE:-Sharing-Hosted-Autonomous-Research-Environments-(IBM-CASCON)
Best regards, Pieter Van Gorp
Posted by: Ian Sommerville
on Nov 05, 2009
Tagged in: Untagged
There is a UK network interested in Web 2.0 for research - not just software engineering but e-science also. Not much on the web site yet but I attended a workshop this week and slides should be available soon.
Ian S.
Posted by: Jean-Marie Favre
on Nov 03, 2009
Tagged in: Untagged
Here is a new book that is very relevant to Research 2.0 and in particular Innovation 2.0. This book, entitled "Innovation Passport: The IBM First-of-a-Kind (FOAK) Journey From Research to Reality", has been written by Mary Jo Frederich and Peter Andrews. Mary Jo is the director of the IBM Industry Solutions Labs and First-of-a-Kind (FOAK) program. I highly recommend reading this book if you want to know more about research-industry interplay and innovation. Some chapters are particularly interesting from a Research 2.0 / Enterprise 2.0 / Innovation 2.0 point of view. In particular, have a look at the conclusion. It is worth it! For those who are currently attending CASCON, I have this book here (I'm in the demo booth), and will be happy to give it to you if you want to have a look at it ;-)